Adding DataFrame.Dtypes support #635
merging latest from master
src/csharp/Microsoft.Spark.E2ETest/IpcTests/Sql/DataFrameTests.cs
@Niharikadutta I think you made this a bit more complicated than it needed to be. How about we do something like this instead?
public IEnumerable<Tuple<string, string>> DTypes() =>
    Schema().Fields.Select(
        f => new Tuple<string, string>(f.Name, f.DataType.SimpleString));
@suhsteve Good point, I didn't think about just exposing the functionality instead of calling Spark's
Yeah, this should be fine. Please check the PySpark implementation: https://github.com/apache/spark/blob/master/python/pyspark/sql/dataframe.py#L1010
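The suggestion in this thread can be tried without a Spark session. The following is only a minimal sketch: DataType, StructField, StructType, and FakeDataFrame are hypothetical stand-ins written for illustration, not the real Microsoft.Spark types, and the type strings "bigint" and "string" are illustrative values rather than output captured from Spark. Only the body of DTypes() follows the shape proposed above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a Spark data type; only SimpleString is modeled.
public sealed class DataType
{
    public DataType(string simpleString) => SimpleString = simpleString;
    public string SimpleString { get; }
}

// Hypothetical stand-in for a schema field (column name plus data type).
public sealed class StructField
{
    public StructField(string name, DataType dataType)
    {
        Name = name;
        DataType = dataType;
    }

    public string Name { get; }
    public DataType DataType { get; }
}

// Hypothetical stand-in for a schema: an ordered list of fields.
public sealed class StructType
{
    public StructType(IEnumerable<StructField> fields) => Fields = fields.ToList();
    public List<StructField> Fields { get; }
}

// Hypothetical stand-in for DataFrame that holds nothing but a schema.
public sealed class FakeDataFrame
{
    private readonly StructType _schema;

    public FakeDataFrame(StructType schema) => _schema = schema;

    public StructType Schema() => _schema;

    // Same shape as the reviewer's suggestion: project each schema field
    // to a (column name, simple type string) pair.
    public IEnumerable<Tuple<string, string>> DTypes() =>
        Schema().Fields.Select(
            f => new Tuple<string, string>(f.Name, f.DataType.SimpleString));
}

public static class Program
{
    public static void Main()
    {
        // Illustrative schema with two columns; the type strings are assumptions.
        var df = new FakeDataFrame(new StructType(new[]
        {
            new StructField("age", new DataType("bigint")),
            new StructField("name", new DataType("string")),
        }));

        // Prints one "column: type" line per field, in schema order.
        foreach (Tuple<string, string> pair in df.DTypes())
        {
            Console.WriteLine($"{pair.Item1}: {pair.Item2}");
        }
    }
}
```

Because the pairs are derived from the schema object already held on the C# side, this approach needs no extra call into the JVM, which appears to be the simplification the reviewer was asking for.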
LGTM except minor comments
Can you remove |
LGTM. Thanks @Niharikadutta!
Can you update your description to reflect the updated changes?
LGTM, thanks @Niharikadutta!
This PR adds support for dtypes and addresses part of #621.

Scala side:
DataFrame.dtypes => returns Array[(String, String)]
dtypes, when called on a DataFrame, returns all column names and their data types as an array of tuples. For example, for a DataFrame df with the following schema:
root
|-- age: long (nullable = true)
|-- name: string (nullable = true)
Calling dtypes:

C# side:
DataFrame.DTypes() => returns IEnumerable<Tuple<string, string>>
Calling DTypes():