Adding DataFrame.Dtypes support #635

Merged 34 commits on Aug 21, 2020
Commits
03b7939
Adding section for UDF serialization
Niharikadutta Apr 20, 2020
4ef693d
removing guides from master
Niharikadutta Apr 20, 2020
81145ca
Merge latest from master
Niharikadutta May 6, 2020
e4b81af
merging latest from master
Niharikadutta May 7, 2020
6bab996
CountVectorizer
Jul 27, 2020
e2a566b
moving private methods to bottom
Jul 27, 2020
5f682a6
changing wrap method
Jul 28, 2020
31371db
setting min version required
Jul 31, 2020
60eb82f
undoing csproj change
Jul 31, 2020
ed36375
member doesnt need to be internal
Jul 31, 2020
c7baf72
too many lines
Jul 31, 2020
d13303c
removing whitespace change
Jul 31, 2020
f5b477c
removing whitespace change
Jul 31, 2020
73db52b
ionide
Jul 31, 2020
4c5d502
Merge remote-tracking branch 'upstream/master'
Niharikadutta Aug 10, 2020
a766146
Merge branch 'master' into ml/countvectorizer
GoEddie Aug 12, 2020
ad6bced
Merge branch 'ml/countvectorizer' of https://github.com/GoEddie/spark
Niharikadutta Aug 13, 2020
8e1685c
Revert "Merge branch 'master' into ml/countvectorizer"
Niharikadutta Aug 13, 2020
255515e
Revert "Merge branch 'ml/countvectorizer' of https://github.com/GoEdd…
Niharikadutta Aug 13, 2020
a44c882
Merge remote-tracking branch 'upstream/master'
Niharikadutta Aug 14, 2020
3c2c936
fixing merge errors
Niharikadutta Aug 14, 2020
69e91fa
first commit adding dtypes support
Niharikadutta Aug 19, 2020
5bdc6a3
working changes, returns string[][]
Niharikadutta Aug 20, 2020
9e3e75d
Adding SerDe changes to all spark version jars
Niharikadutta Aug 20, 2020
005e773
removing unnecessary test
Niharikadutta Aug 20, 2020
110ad03
PR review comments
Niharikadutta Aug 20, 2020
c2fb0ce
comments
Niharikadutta Aug 20, 2020
28d8c1a
PR review
Niharikadutta Aug 20, 2020
550b7cf
Adding test for dtypes return type validation
Niharikadutta Aug 21, 2020
eb10574
Merge branch 'master' into dtypesSupport
Niharikadutta Aug 21, 2020
cfc2159
changes
Niharikadutta Aug 21, 2020
5843287
Merge branch 'dtypesSupport' of github.com:Niharikadutta/spark into d…
Niharikadutta Aug 21, 2020
57c7028
PR comments
Niharikadutta Aug 21, 2020
c893c78
removing ionide
Niharikadutta Aug 21, 2020
@@ -3,6 +3,7 @@
// See the LICENSE file in the project root for more information.

using System;
using System.Collections.Generic;
using System.Linq;
using Apache.Arrow;
using Microsoft.Data.Analysis;
@@ -459,6 +460,13 @@ public void TestSignaturesV2_3_X()

Assert.Equal(2, _df.Columns().ToArray().Length);

var expected = new List<Tuple<string, string>>
{
    new Tuple<string, string>("age", "integer"),
    new Tuple<string, string>("name", "string")
};
Assert.Equal(expected, _df.DTypes());

Assert.IsType<bool>(_df.IsLocal());

Assert.IsType<bool>(_df.IsStreaming());
8 changes: 8 additions & 0 deletions src/csharp/Microsoft.Spark/Sql/DataFrame.cs
@@ -84,6 +84,14 @@ public void Explain(bool extended = false)
Console.WriteLine((string)execution.Invoke(extended ? "toString" : "simpleString"));
}

/// <summary>
/// Returns all column names and their data types as an IEnumerable of Tuples.
/// </summary>
/// <returns>IEnumerable of Tuple of strings</returns>
public IEnumerable<Tuple<string, string>> DTypes() =>
    Schema().Fields.Select(
        f => new Tuple<string, string>(f.Name, f.DataType.SimpleString));

/// <summary>
/// Returns all column names.
/// </summary>
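
For readers who want to see the shape of what `DTypes()` returns without a running Spark session, here is a minimal sketch of the same projection. It assumes stand-in types: `FieldSketch` and `DTypesSketch` are hypothetical names introduced only for illustration and are not part of Microsoft.Spark. The real method reads the fields from `Schema()` as shown in the diff above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a schema field; NOT the Microsoft.Spark field type.
public sealed class FieldSketch
{
    public FieldSketch(string name, string simpleString)
    {
        Name = name;
        SimpleString = simpleString;
    }

    public string Name { get; }
    public string SimpleString { get; }
}

public static class DTypesSketch
{
    // Same projection shape as DTypes() in the diff: one (name, type) tuple per field.
    public static IEnumerable<Tuple<string, string>> DTypes(IEnumerable<FieldSketch> fields) =>
        fields.Select(f => new Tuple<string, string>(f.Name, f.SimpleString));

    public static void Main()
    {
        var fields = new[]
        {
            new FieldSketch("age", "integer"),
            new FieldSketch("name", "string")
        };

        var expected = new List<Tuple<string, string>>
        {
            new Tuple<string, string>("age", "integer"),
            new Tuple<string, string>("name", "string")
        };

        // Tuple<T1, T2> overrides Equals with element-wise comparison,
        // so the sequences compare by content rather than by reference.
        Console.WriteLine(DTypes(fields).SequenceEqual(expected));

        foreach (Tuple<string, string> column in DTypes(fields))
        {
            Console.WriteLine($"{column.Item1}: {column.Item2}");
        }
    }
}
```

Because `Tuple<string, string>` compares element by element, a content-based comparison like the `Assert.Equal(expected, _df.DTypes())` check in the test hunk can succeed even though the tuples are distinct objects. The sketch returns a lazy `IEnumerable`, so each enumeration re-runs the projection; callers who enumerate more than once may prefer to call `ToList()` first.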