Adding DataFrame.Dtypes support #635

Merged: 34 commits, Aug 21, 2020

Conversation

Niharikadutta
Collaborator

@Niharikadutta Niharikadutta commented Aug 19, 2020

This PR adds support for dtypes and addresses part of #621.

Scala side:
DataFrame.dtypes => returns Array[(String, String)]

When called on a DataFrame, dtypes returns all column names and their data types as an array of tuples. For example, for a DataFrame df with the following schema:
root
|-- age: long (nullable = true)
|-- name: string (nullable = true)

Calling dtypes:

val result = df.dtypes
result: Array[(String, String)] = Array((age,LongType), (name,StringType))

C# side:
DataFrame.DTypes() => returns IEnumerable<Tuple<string, string>>

Calling DTypes():

IEnumerable<Tuple<string, string>> result = df.DTypes();
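
As a minimal sketch of how a caller might consume that return type, the example below indexes the (name, type) pairs by column name. It does not touch Spark: the sequence is built by hand as a stand-in for df.DTypes(), with values copied from the Scala example above, and the class name DTypesUsageSketch is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DTypesUsageSketch
{
    public static void Main()
    {
        // Stand-in for df.DTypes(); the pairs mirror the Scala example
        // in this description and are not produced by Spark here.
        IEnumerable<Tuple<string, string>> result = new[]
        {
            Tuple.Create("age", "LongType"),
            Tuple.Create("name", "StringType"),
        };

        // Index the (column name, type name) pairs by column name.
        Dictionary<string, string> typeByColumn =
            result.ToDictionary(pair => pair.Item1, pair => pair.Item2);

        // Look up a single column's type without scanning the sequence.
        if (typeByColumn.TryGetValue("age", out string ageType))
        {
            Console.WriteLine($"age has type {ageType}");
        }

        // Or walk every pair in the order the sequence yields them.
        foreach (Tuple<string, string> column in result)
        {
            Console.WriteLine($"{column.Item1}: {column.Item2}");
        }
    }
}
```

Column names in a DataFrame are not guaranteed to be unique, and ToDictionary throws on duplicate keys, so the dictionary step only suits schemas where names are known to be distinct.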

@Niharikadutta Niharikadutta changed the title from "[WIP] Adding DataFrame.Dtypes support" to "Adding DataFrame.Dtypes support" Aug 20, 2020
@suhsteve
Member

@Niharikadutta I think you made this a bit more complicated than it needed to be. How about we do something like this instead?

public IEnumerable<Tuple<string, string>> DTypes() =>
    Schema().Fields.Select(
        f => new Tuple<string, string>(f.Name, f.DataType.SimpleString));
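
To see the shape of this projection in isolation, here is a rough, self-contained sketch. The DataType, StructField, and StructType classes below are hypothetical stand-ins written for this example, not the Microsoft.Spark types, and the simple-string values are placeholders rather than what the library actually returns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the schema types; NOT the Microsoft.Spark classes.
public sealed class DataType
{
    public DataType(string simpleString) => SimpleString = simpleString;

    public string SimpleString { get; }
}

public sealed class StructField
{
    public StructField(string name, DataType dataType)
    {
        Name = name;
        DataType = dataType;
    }

    public string Name { get; }

    public DataType DataType { get; }
}

public sealed class StructType
{
    public StructType(IEnumerable<StructField> fields) => Fields = fields.ToList();

    public List<StructField> Fields { get; }
}

public static class DTypesProjectionSketch
{
    // Same projection as the suggestion above, applied to the stand-in schema:
    // one (name, simple type string) tuple per field, in schema order.
    public static IEnumerable<Tuple<string, string>> DTypes(StructType schema) =>
        schema.Fields.Select(
            f => new Tuple<string, string>(f.Name, f.DataType.SimpleString));

    public static void Main()
    {
        // Placeholder type strings, chosen only for illustration.
        var schema = new StructType(new[]
        {
            new StructField("age", new DataType("long")),
            new StructField("name", new DataType("string")),
        });

        foreach (Tuple<string, string> column in DTypes(schema))
        {
            Console.WriteLine($"{column.Item1}: {column.Item2}");
        }
    }
}
```

The point of the sketch is that the result is derived entirely from the schema already available on the C# side, so no extra call into Spark's dtypes is involved.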

@Niharikadutta
Collaborator Author

@suhsteve Good point, I didn't think about just exposing the functionality instead of calling Spark's dtypes. This also returns the C#-side data type names, such as integer and string, instead of Spark's IntegerType and StringType, which I think is good?

@imback82
Contributor

Yeah, this should be fine. Please check the PySpark implementation: https://github.com/apache/spark/blob/master/python/pyspark/sql/dataframe.py#L1010

Contributor

@imback82 imback82 left a comment

LGTM except for minor comments.

@imback82
Contributor

Can you remove .ionide/symbolCache.db from this PR?

Member

@suhsteve suhsteve left a comment

LGTM. Thanks @Niharikadutta!

@suhsteve
Member

Can you update your description to reflect the latest changes?

Contributor

@imback82 imback82 left a comment

LGTM, thanks @Niharikadutta!

@imback82 imback82 merged commit 2aa9bc4 into dotnet:master Aug 21, 2020
@Niharikadutta Niharikadutta deleted the dtypesSupport branch August 24, 2020 01:25
@imback82 imback82 added the enhancement (New feature or request) label Aug 26, 2020
@imback82 imback82 added this to the 1.0.0 milestone Aug 26, 2020
4 participants