From 03b79393e71910a33a39864e563fcbeb2de56658 Mon Sep 17 00:00:00 2001
From: Niharika Dutta
Date: Sun, 19 Apr 2020 22:31:05 -0700
Subject: [PATCH 01/36] Adding section for UDF serialization

---
 docs/broadcast-guide.md |  92 +++++++++++++++++++++
 docs/udf-guide.md       | 172 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 264 insertions(+)
 create mode 100644 docs/broadcast-guide.md
 create mode 100644 docs/udf-guide.md

diff --git a/docs/broadcast-guide.md b/docs/broadcast-guide.md
new file mode 100644
index 000000000..4286c569e
--- /dev/null
+++ b/docs/broadcast-guide.md
@@ -0,0 +1,92 @@
+# Guide to using Broadcast Variables
+
+This is a guide to show how to use broadcast variables in .NET for Apache Spark.
+
+## What are Broadcast Variables
+
+[Broadcast variables in Apache Spark](https://spark.apache.org/docs/2.2.0/rdd-programming-guide.html#broadcast-variables) are a mechanism for sharing read-only variables across executors. They allow the programmer to keep a read-only variable cached on each machine rather than shipping a copy of it with tasks. They can be used, for example, to give every node a copy of a large input dataset in an efficient manner.
+
+### How to use broadcast variables in .NET for Apache Spark
+
+Broadcast variables are created from a variable `v` by calling `SparkContext.Broadcast(v)`. The broadcast variable is a wrapper around `v`, and its value can be accessed by calling the `Value()` method on it.
+
+Example:
+
+```csharp
+string v = "Variable to be broadcasted";
+Broadcast<string> bv = SparkContext.Broadcast(v);
+
+// Using the broadcast variable in a UDF:
+Func<Column, Column> udf = Udf<string, string>(
+    str => $"{str}: {bv.Value()}");
+```
+
+The type of the broadcast variable is captured by using generics in C#, as can be seen in the above example.
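+
+The same pattern works for any serializable type, not just `string`. As a minimal sketch (the dictionary contents and the DataFrame `df` with a string column `name` are illustrative assumptions, following the conventions of the example above), a lookup table can be broadcast and used from a UDF:
+
+```csharp
+// Assumes a DataFrame df with a string column "name".
+Dictionary<string, int> ages = new Dictionary<string, int>
+{
+    { "Michael", 30 }, { "Andy", 20 }, { "Justin", 19 }
+};
+Broadcast<Dictionary<string, int>> bAges = SparkContext.Broadcast(ages);
+
+// Each executor reads the cached dictionary instead of receiving a copy per task:
+Func<Column, Column> ageUdf = Udf<string, int>(name => bAges.Value()[name]);
+df.Select(ageUdf(df["name"])).Show();
+```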
+
+### Deleting broadcast variables
+
+The broadcast variable can be deleted from all executors by calling the `Destroy()` method on it.
+
+```csharp
+// Destroying the broadcast variable bv:
+bv.Destroy();
+```
+
+> Note: `Destroy` deletes all data and metadata related to the broadcast variable. Use this with caution; once a broadcast variable has been destroyed, it cannot be used again.
+
+#### Caveat of using Destroy
+
+One important thing to keep in mind while using broadcast variables in UDFs is to limit the scope of the variable to only the UDF that references it. The [guide to using UDFs](udf-guide.md) describes this phenomenon in detail. This is especially crucial when calling `Destroy` on the broadcast variable. If the destroyed broadcast variable is visible to or accessible from other UDFs, it gets picked up for serialization by all of those UDFs, even if they never reference it. This throws an error, because .NET for Apache Spark is not able to serialize the destroyed broadcast variable.
+
+Example to demonstrate:
+
+```csharp
+string v = "Variable to be broadcasted";
+Broadcast<string> bv = SparkContext.Broadcast(v);
+
+// Using the broadcast variable in a UDF:
+Func<Column, Column> udf1 = Udf<string, string>(
+    str => $"{str}: {bv.Value()}");
+
+// Destroying bv
+bv.Destroy();
+
+// Calling udf1 after destroying bv throws the following expected exception:
+// org.apache.spark.SparkException: Attempted to use Broadcast(0) after it was destroyed
+df.Select(udf1(df["_1"])).Show();
+
+// Different UDF udf2 that is not referencing bv
+Func<Column, Column> udf2 = Udf<string, string>(
+    str => $"{str}: not referencing broadcast variable");
+
+// Calling udf2 throws the following (unexpected) exception:
+// [Error] [JvmBridge] org.apache.spark.SparkException: Task not serializable
+df.Select(udf2(df["_1"])).Show();
+```
+
+The recommended way of implementing the desired behavior above:
+
+```csharp
+string v = "Variable to be broadcasted";
+// Restricting the visibility of bv to only the UDF referencing it
+{
+    Broadcast<string> bv = SparkContext.Broadcast(v);
+
+    // Using the broadcast variable in a UDF:
+    Func<Column, Column> udf1 = Udf<string, string>(
+        str => $"{str}: {bv.Value()}");
+
+    // Destroying bv
+    bv.Destroy();
+}
+
+// Different UDF udf2 that is not referencing bv
+Func<Column, Column> udf2 = Udf<string, string>(
+    str => $"{str}: not referencing broadcast variable");
+
+// Calling udf2 works fine as expected
+df.Select(udf2(df["_1"])).Show();
+```
+
+This ensures that destroying `bv` does not affect calling `udf2` through unexpected serialization behavior.
+
+Broadcast variables are very useful for transmitting read-only data to all executors: the data is sent only once, which gives a large performance benefit compared with local variables that get shipped to the executors with each task. Please refer to the [official documentation](https://spark.apache.org/docs/2.2.0/rdd-programming-guide.html#broadcast-variables) to get a deeper understanding of broadcast variables and why they are used.
\ No newline at end of file
diff --git a/docs/udf-guide.md b/docs/udf-guide.md
new file mode 100644
index 000000000..bb308815d
--- /dev/null
+++ b/docs/udf-guide.md
@@ -0,0 +1,172 @@
+# Guide to User-Defined Functions (UDFs)
+
+This is a guide to show how to use UDFs in .NET for Apache Spark.
+
+## What are UDFs
+
+[User-Defined Functions (UDFs)](https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/expressions/UserDefinedFunction.html) are a feature of Spark that allow developers to use custom functions to extend the system's built-in functionality. They transform values from a single row within a table to produce a single corresponding output value per row, based on the logic defined in the UDF.
+
+Let's take the following as an example of a UDF definition:
+
+```csharp
+string s1 = "hello";
+Func<Column, Column> udf = Udf<string, string>(
+    str => $"{s1} {str}");
+```
+The UDF defined above takes a `string` as input (in the form of a [Column](https://github.com/dotnet/spark/blob/master/src/csharp/Microsoft.Spark/Sql/Column.cs#L14) of a [DataFrame](https://github.com/dotnet/spark/blob/master/src/csharp/Microsoft.Spark/Sql/DataFrame.cs#L24)) and returns a `string` with `hello` prepended to the input.
+
+For a sample DataFrame, let's take the following DataFrame `df`:
+
+```text
++-------+
+|   name|
++-------+
+|Michael|
+|   Andy|
+| Justin|
++-------+
+```
+
+Now let's apply the `udf` defined above to the DataFrame `df`:
+
+```csharp
+DataFrame udfResult = df.Select(udf(df["name"]));
+```
+
+This would return the following as the DataFrame `udfResult`:
+
+```text
++-------------+
+|         name|
++-------------+
+|hello Michael|
+|   hello Andy|
+| hello Justin|
++-------------+
+```
+To get a better understanding of how to implement UDFs, please take a look at the [UDF helper functions](https://github.com/dotnet/spark/blob/master/src/csharp/Microsoft.Spark/Sql/Functions.cs#L3616) and some [test examples](https://github.com/dotnet/spark/blob/master/src/csharp/Microsoft.Spark.E2ETest/UdfTests/UdfSimpleTypesTests.cs#L49).
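+
+Putting the pieces together, here is a minimal end-to-end sketch. It is illustrative rather than definitive: the app name, the inline `VALUES` query used to build `df`, and the `Main` wrapper are assumptions.
+
+```csharp
+using System;
+using Microsoft.Spark.Sql;
+using static Microsoft.Spark.Sql.Functions;
+
+public class UdfExample
+{
+    public static void Main(string[] args)
+    {
+        SparkSession spark = SparkSession
+            .Builder()
+            .AppName("udf-example")
+            .GetOrCreate();
+
+        // Build the sample DataFrame from this guide with an inline table.
+        DataFrame df = spark.Sql(
+            "SELECT * FROM VALUES ('Michael'), ('Andy'), ('Justin') AS t(name)");
+
+        string s1 = "hello";
+        Func<Column, Column> udf = Udf<string, string>(str => $"{s1} {str}");
+
+        // The UDF runs on the workers; s1 travels with the serialized closure.
+        df.Select(udf(df["name"])).Show();
+
+        spark.Stop();
+    }
+}
+```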
+
+## UDF serialization
+
+Since UDFs are functions that need to be executed on the workers, they have to be serialized and sent to the workers as part of the payload from the driver. This involves serializing the [delegate](https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/delegates/), which is a reference to the method, along with its [target](https://docs.microsoft.com/en-us/dotnet/api/system.delegate.target?view=netframework-4.8), which is the class instance on which the current delegate invokes the instance method. Please take a look at this [code](https://github.com/dotnet/spark/blob/master/src/csharp/Microsoft.Spark/Utils/CommandSerDe.cs#L149) to get a better understanding of how UDF serialization is done.
+
+## Good to know while implementing UDFs
+
+One behavior to be aware of while implementing UDFs in .NET for Apache Spark is how the target of the UDF gets serialized. .NET for Apache Spark uses .NET Core, which does not support serializing delegates, so serialization is instead done by using reflection on the target in which the delegate is defined. When multiple delegates are defined in a common scope, they have a shared closure that becomes the target of reflection for serialization. Let's take an example to illustrate what that means.
+
+The following code snippet defines two string variables that are referenced in two function delegates, which simply return the respective strings:
+
+```csharp
+using System;
+
+public class C {
+    public void M() {
+        string s1 = "s1";
+        string s2 = "s2";
+        Func<string, string> a = str => s1;
+        Func<string, string> b = str => s2;
+    }
+}
+```
+
+The compiler lowers the above C# code to the following (decompiled with [sharplab.io](https://sharplab.io)):
+
+```csharp
+public class C
+{
+    [CompilerGenerated]
+    private sealed class <>c__DisplayClass0_0
+    {
+        public string s1;
+
+        public string s2;
+
+        internal string <M>b__0(string str)
+        {
+            return s1;
+        }
+
+        internal string <M>b__1(string str)
+        {
+            return s2;
+        }
+    }
+
+    public void M()
+    {
+        <>c__DisplayClass0_0 <>c__DisplayClass0_ = new <>c__DisplayClass0_0();
+        <>c__DisplayClass0_.s1 = "s1";
+        <>c__DisplayClass0_.s2 = "s2";
+        Func<string, string> func = new Func<string, string>(<>c__DisplayClass0_.<M>b__0);
+        Func<string, string> func2 = new Func<string, string>(<>c__DisplayClass0_.<M>b__1);
+    }
+}
+```
+As can be seen in the above lowered code, both `func` and `func2` share the same closure `<>c__DisplayClass0_0`, which is the target that is serialized when serializing the delegates `func` and `func2`. Hence, even though `Func<string, string> a` only references `s1`, `s2` also gets serialized when the bytes are sent to the workers.
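+
+This closure sharing can be observed directly with plain .NET reflection, outside of Spark. The following is a small standalone sketch (an editorial illustration built on the snippet above, not part of the .NET for Apache Spark API):
+
+```csharp
+using System;
+using System.Reflection;
+
+public class ClosureDemo
+{
+    public static void Main()
+    {
+        string s1 = "s1";
+        string s2 = "s2";
+        Func<string, string> a = str => s1;
+        Func<string, string> b = str => s2;
+
+        // Both delegates point at the same compiler-generated closure object:
+        Console.WriteLine(ReferenceEquals(a.Target, b.Target)); // True
+
+        // And that closure holds every captured variable, not just the one
+        // each delegate actually uses:
+        foreach (FieldInfo field in a.Target.GetType().GetFields())
+        {
+            Console.WriteLine($"{field.Name} = {field.GetValue(a.Target)}"); // s1, s2
+        }
+    }
+}
+```
+
+Serializing `a` by serializing `a.Target` therefore drags `s2` along as well, which is exactly what happens to UDF payloads.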
+
+This can lead to some unexpected behaviors at runtime (as in the case of using [broadcast variables](broadcast-guide.md)), which is why we recommend restricting the visibility of the variables used in a function to that function's scope. Revisiting the above example, the recommended way to implement the desired behavior of the previous code snippet is:
+
+```csharp
+using System;
+
+public class C {
+    public void M() {
+        {
+            string s1 = "s1";
+            Func<string, string> a = str => s1;
+        }
+        {
+            string s2 = "s2";
+            Func<string, string> b = str => s2;
+        }
+    }
+}
+```
+
+The compiler lowers the above C# code to the following (decompiled with [sharplab.io](https://sharplab.io)):
+
+```csharp
+public class C
+{
+    [CompilerGenerated]
+    private sealed class <>c__DisplayClass0_0
+    {
+        public string s1;
+
+        internal string <M>b__0(string str)
+        {
+            return s1;
+        }
+    }
+
+    [CompilerGenerated]
+    private sealed class <>c__DisplayClass0_1
+    {
+        public string s2;
+
+        internal string <M>b__1(string str)
+        {
+            return s2;
+        }
+    }
+
+    public void M()
+    {
+        <>c__DisplayClass0_0 <>c__DisplayClass0_ = new <>c__DisplayClass0_0();
+        <>c__DisplayClass0_.s1 = "s1";
+        Func<string, string> func = new Func<string, string>(<>c__DisplayClass0_.<M>b__0);
+        <>c__DisplayClass0_1 <>c__DisplayClass0_2 = new <>c__DisplayClass0_1();
+        <>c__DisplayClass0_2.s2 = "s2";
+        Func<string, string> func2 = new Func<string, string>(<>c__DisplayClass0_2.<M>b__1);
+    }
+}
+```
+
+Here we see that `func` and `func2` no longer share a closure; they have their own separate closures, `<>c__DisplayClass0_0` and `<>c__DisplayClass0_1` respectively. When used as the target for serialization, nothing other than the variables referenced by the delegate gets serialized.
+
+The above behavior is important to keep in mind while implementing multiple UDFs in a common scope.
+To learn more about UDFs in general, please review the following articles that explain UDFs and how to use them: [UDFs in Databricks (Scala)](https://docs.databricks.com/spark/latest/spark-sql/udf-scala.html), [Spark UDFs and some gotchas](https://medium.com/@achilleus/spark-udfs-we-can-use-them-but-should-we-use-them-2c5a561fde6d).
\ No newline at end of file
From 4ef693dbf7616b738a6ae70d1e9dc8c12dd8e5d3 Mon Sep 17 00:00:00 2001
From: Niharika Dutta
Date: Sun, 19 Apr 2020 22:32:56 -0700
Subject: [PATCH 02/36] removing guides from master

---
 docs/broadcast-guide.md |  92 ---------------------
 docs/udf-guide.md       | 172 ----------------------------------------
 2 files changed, 264 deletions(-)
 delete mode 100644 docs/broadcast-guide.md
 delete mode 100644 docs/udf-guide.md

diff --git a/docs/broadcast-guide.md b/docs/broadcast-guide.md
deleted file mode 100644
index 4286c569e..000000000
--- a/docs/broadcast-guide.md
+++ /dev/null
@@ -1,92 +0,0 @@
[92 deleted lines: the docs/broadcast-guide.md contents added in PATCH 01 above, removed verbatim.]
diff --git a/docs/udf-guide.md b/docs/udf-guide.md
deleted file mode 100644
index bb308815d..000000000
--- a/docs/udf-guide.md
+++ /dev/null
@@ -1,172 +0,0 @@
[172 deleted lines: the docs/udf-guide.md contents added in PATCH 01 above, removed verbatim.]
From 6bab99604db5cc8b8528b54216085afb96cbaff7 Mon Sep 17 00:00:00 2001
From: GOEddieUK
Date: Mon, 27 Jul 2020 21:10:51 +0100
Subject: [PATCH 03/36] CountVectorizer

---
 .../ML/Feature/CountVectorizerModelTests.cs |  73 +++++++
 .../ML/Feature/CountVectorizerTests.cs      |  70 +++++++
 .../ML/Feature/CountVectorizer.cs           | 195 ++++++++++++++++++
 .../ML/Feature/CountVectorizerModel.cs      | 170 +++++++++++++++
 4 files changed, 508 insertions(+)
 create mode 100644 src/csharp/Microsoft.Spark.E2ETest/IpcTests/ML/Feature/CountVectorizerModelTests.cs
 create mode 100644 src/csharp/Microsoft.Spark.E2ETest/IpcTests/ML/Feature/CountVectorizerTests.cs
 create mode 100644 src/csharp/Microsoft.Spark/ML/Feature/CountVectorizer.cs
 create mode 100644 src/csharp/Microsoft.Spark/ML/Feature/CountVectorizerModel.cs

diff --git a/src/csharp/Microsoft.Spark.E2ETest/IpcTests/ML/Feature/CountVectorizerModelTests.cs b/src/csharp/Microsoft.Spark.E2ETest/IpcTests/ML/Feature/CountVectorizerModelTests.cs
new file mode 100644
index 000000000..3c3132dd9
--- /dev/null
+++ b/src/csharp/Microsoft.Spark.E2ETest/IpcTests/ML/Feature/CountVectorizerModelTests.cs
@@ -0,0 +1,73 @@
+// Licensed to the .NET Foundation under one or more agreements.
+// The .NET Foundation licenses this file to you under the MIT license.
+// See the LICENSE file in the project root for more information.
+
+using System;
+using System.Collections.Generic;
+using System.IO;
+using Microsoft.Spark.ML.Feature;
+using Microsoft.Spark.Sql;
+using Microsoft.Spark.UnitTest.TestUtils;
+using Xunit;
+
+namespace Microsoft.Spark.E2ETest.IpcTests.ML.Feature
+{
+    [Collection("Spark E2E Tests")]
+    public class CountVectorizerModelTests
+    {
+        private readonly SparkSession _spark;
+
+        public CountVectorizerModelTests(SparkFixture fixture)
+        {
+            _spark = fixture.Spark;
+        }
+
+        [Fact]
+        public void Test_CountVectorizerModel()
+        {
+            DataFrame input = _spark.Sql("SELECT array('hello', 'I', 'AM', 'a', 'string', 'TO', " +
+                "'TOKENIZE') as input from range(100)");
+
+            const string inputColumn = "input";
+            const string outputColumn = "output";
+            const double minTf = 10.0;
+            const bool binary = false;
+
+            List<string> vocabulary = new List<string>()
+            {
+                "hello",
+                "I",
+                "AM",
+                "TO",
+                "TOKENIZE"
+            };
+
+            var countVectorizerModel = new CountVectorizerModel(vocabulary);
+
+            Assert.IsType<CountVectorizerModel>(new CountVectorizerModel("my-uid", vocabulary));
+
+            countVectorizerModel = countVectorizerModel
+                .SetInputCol(inputColumn)
+                .SetOutputCol(outputColumn)
+                .SetMinTF(minTf)
+                .SetBinary(binary);
+
+            Assert.Equal(inputColumn, countVectorizerModel.GetInputCol());
+            Assert.Equal(outputColumn, countVectorizerModel.GetOutputCol());
+            Assert.Equal(minTf, countVectorizerModel.GetMinTF());
+            Assert.Equal(binary, countVectorizerModel.GetBinary());
+
+            using (var tempDirectory = new TemporaryDirectory())
+            {
+                string savePath = Path.Join(tempDirectory.Path, "countVectorizerModel");
+                countVectorizerModel.Save(savePath);
+
+                CountVectorizerModel loadedModel = CountVectorizerModel.Load(savePath);
+                Assert.Equal(countVectorizerModel.Uid(), loadedModel.Uid());
+            }
+
+            Assert.IsType<int>(countVectorizerModel.GetVocabSize());
+            Assert.NotEmpty(countVectorizerModel.ExplainParams());
+            Assert.NotEmpty(countVectorizerModel.ToString());
+        }
+    }
+}
diff --git a/src/csharp/Microsoft.Spark.E2ETest/IpcTests/ML/Feature/CountVectorizerTests.cs b/src/csharp/Microsoft.Spark.E2ETest/IpcTests/ML/Feature/CountVectorizerTests.cs
new file mode 100644
index 000000000..d54bfe376
--- /dev/null
+++ b/src/csharp/Microsoft.Spark.E2ETest/IpcTests/ML/Feature/CountVectorizerTests.cs
@@ -0,0 +1,70 @@
+// Licensed to the .NET Foundation under one or more agreements.
+// The .NET Foundation licenses this file to you under the MIT license.
+// See the LICENSE file in the project root for more information.
+
+using System;
+using System.IO;
+using Microsoft.Spark.ML.Feature;
+using Microsoft.Spark.Sql;
+using Microsoft.Spark.UnitTest.TestUtils;
+using Xunit;
+
+namespace Microsoft.Spark.E2ETest.IpcTests.ML.Feature
+{
+    [Collection("Spark E2E Tests")]
+    public class CountVectorizerTests
+    {
+        private readonly SparkSession _spark;
+
+        public CountVectorizerTests(SparkFixture fixture)
+        {
+            _spark = fixture.Spark;
+        }
+
+        [Fact]
+        public void Test_CountVectorizer()
+        {
+            DataFrame input = _spark.Sql("SELECT array('hello', 'I', 'AM', 'a', 'string', 'TO', " +
+                "'TOKENIZE') as input from range(100)");
+
+            const string inputColumn = "input";
+            const string outputColumn = "output";
+            const double minDf = 1;
+            const double maxDf = 100;
+            const double minTf = 10;
+            const int vocabSize = 10000;
+            const bool binary = false;
+
+            var countVectorizer = new CountVectorizer();
+
+            countVectorizer
+                .SetInputCol(inputColumn)
+                .SetOutputCol(outputColumn)
+                .SetMinDF(minDf)
+                .SetMaxDF(maxDf)
+                .SetMinTF(minTf)
+                .SetVocabSize(vocabSize);
+
+            Assert.IsType<CountVectorizerModel>(countVectorizer.Fit(input));
+            Assert.Equal(inputColumn, countVectorizer.GetInputCol());
+            Assert.Equal(outputColumn, countVectorizer.GetOutputCol());
+            Assert.Equal(minDf, countVectorizer.GetMinDF());
+            Assert.Equal(maxDf, countVectorizer.GetMaxDF());
+            Assert.Equal(minTf, countVectorizer.GetMinTF());
+            Assert.Equal(vocabSize, countVectorizer.GetVocabSize());
+            Assert.Equal(binary, countVectorizer.GetBinary());
+
+            using (var tempDirectory = new TemporaryDirectory())
+            {
+                string savePath = Path.Join(tempDirectory.Path, "countVectorizer");
+                countVectorizer.Save(savePath);
+
+                CountVectorizer loadedVectorizer = CountVectorizer.Load(savePath);
+                Assert.Equal(countVectorizer.Uid(), loadedVectorizer.Uid());
+            }
+
+            Assert.NotEmpty(countVectorizer.ExplainParams());
+            Assert.NotEmpty(countVectorizer.ToString());
+        }
+    }
+}
diff --git a/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizer.cs b/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizer.cs
new file mode 100644
index 000000000..41e0dbdd0
--- /dev/null
+++ b/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizer.cs
@@ -0,0 +1,195 @@
+// Licensed to the .NET Foundation under one or more agreements.
+// The .NET Foundation licenses this file to you under the MIT license.
+// See the LICENSE file in the project root for more information.
+
+using Microsoft.Spark.Interop;
+using Microsoft.Spark.Interop.Ipc;
+using Microsoft.Spark.Sql;
+
+namespace Microsoft.Spark.ML.Feature
+{
+    public class CountVectorizer : FeatureBase<CountVectorizer>, IJvmObjectReferenceProvider
+    {
+        private static readonly string s_countVectorizerClassName =
+            "org.apache.spark.ml.feature.CountVectorizer";
+
+        /// <summary>
+        /// Create a <see cref="CountVectorizer"/> without any parameters
+        /// </summary>
+        public CountVectorizer() : base(s_countVectorizerClassName)
+        {
+        }
+
+        /// <summary>
+        /// Create a <see cref="CountVectorizer"/> with a UID that is used to give the
+        /// <see cref="CountVectorizer"/> a unique ID
+        /// </summary>
+        /// <param name="uid">An immutable unique ID for the object and its derivatives.</param>
+        public CountVectorizer(string uid) : base(s_countVectorizerClassName, uid)
+        {
+        }
+
+        internal CountVectorizer(JvmObjectReference jvmObject) : base(jvmObject)
+        {
+        }
+
+        JvmObjectReference IJvmObjectReferenceProvider.Reference => _jvmObject;
+
+        /// <summary>Fits a model to the input data.</summary>
+        /// <param name="dataFrame">The <see cref="DataFrame"/> to fit the model to.</param>
+        /// <returns><see cref="CountVectorizerModel"/></returns>
+        public CountVectorizerModel Fit(DataFrame dataFrame) =>
+            new CountVectorizerModel((JvmObjectReference)_jvmObject.Invoke("fit", dataFrame));
+
+        /// <summary>
+        /// Loads the <see cref="CountVectorizer"/> that was previously saved using Save
+        /// </summary>
+        /// <param name="path">
+        /// The path the previous <see cref="CountVectorizer"/> was saved to
+        /// </param>
+        /// <returns>New <see cref="CountVectorizer"/> object</returns>
+        public static CountVectorizer Load(string path) =>
+            WrapAsType((JvmObjectReference)
+                SparkEnvironment.JvmBridge.CallStaticJavaMethod(
+                    s_countVectorizerClassName, "load", path));
+
+        /// <summary>
+        /// Gets the binary toggle to control the output vector values. If True, all nonzero counts
+        /// (after minTF filter applied) are set to 1. This is useful for discrete probabilistic
+        /// models that model binary events rather than integer counts. Default: false
+        /// </summary>
+        /// <returns>boolean</returns>
+        public bool GetBinary() => (bool)_jvmObject.Invoke("getBinary");
+
+        /// <summary>
+        /// Sets the binary toggle to control the output vector values. If True, all nonzero counts
+        /// (after minTF filter applied) are set to 1. This is useful for discrete probabilistic
+        /// models that model binary events rather than integer counts. Default: false
+        /// </summary>
+        /// <param name="value">Turn the binary toggle on or off</param>
+        /// <returns><see cref="CountVectorizer"/> with the new binary toggle value set</returns>
+        public CountVectorizer SetBinary(bool value) =>
+            WrapAsCountVectorizer((JvmObjectReference)_jvmObject.Invoke("setBinary", value));
+
+        private static CountVectorizer WrapAsCountVectorizer(object obj) =>
+            new CountVectorizer((JvmObjectReference)obj);
+
+        /// <summary>
+        /// Gets the column that the <see cref="CountVectorizer"/> should read from and convert
+        /// into buckets. This would have been set by SetInputCol
+        /// </summary>
+        /// <returns>string, the input column</returns>
+        public string GetInputCol() => _jvmObject.Invoke("getInputCol") as string;
+
+        /// <summary>
+        /// Sets the column that the <see cref="CountVectorizer"/> should read from.
+        /// </summary>
+        /// <param name="value">The name of the column to use as the source.</param>
+        /// <returns><see cref="CountVectorizer"/> with the input column set</returns>
+        public CountVectorizer SetInputCol(string value) =>
+            WrapAsCountVectorizer((JvmObjectReference)_jvmObject.Invoke("setInputCol", value));
+
+        /// <summary>
+        /// The <see cref="CountVectorizer"/> will create a new column in the DataFrame; this is
+        /// the name of the new column.
+        /// </summary>
+        /// <returns>The name of the output column.</returns>
+        public string GetOutputCol() => _jvmObject.Invoke("getOutputCol") as string;
+
+        /// <summary>
+        /// The <see cref="CountVectorizer"/> will create a new column in the DataFrame; this
+        /// is the name of the new column.
+        /// </summary>
+        /// <param name="value">The name of the output column which will be created.</param>
+        /// <returns>New <see cref="CountVectorizer"/> with the output column set</returns>
+        public CountVectorizer SetOutputCol(string value) =>
+            WrapAsCountVectorizer((JvmObjectReference)_jvmObject.Invoke("setOutputCol", value));
+
+        /// <summary>
+        /// Gets the maximum number of different documents a term could appear in to be included in
+        /// the vocabulary. A term that appears more than the threshold will be ignored. If this is
+        /// an integer greater than or equal to 1, this specifies the maximum number of documents
+        /// the term could appear in; if this is a double in [0,1), then this specifies the maximum
+        /// fraction of documents the term could appear in.
+        /// </summary>
+        /// <returns>The maximum document term frequency</returns>
+        public double GetMaxDF() => (double)_jvmObject.Invoke("getMaxDF");
+
+        /// <summary>
+        /// Sets the maximum number of different documents a term could appear in to be included in
+        /// the vocabulary. A term that appears more than the threshold will be ignored. If this is
+        /// an integer greater than or equal to 1, this specifies the maximum number of documents
+        /// the term could appear in; if this is a double in [0,1), then this specifies the maximum
+        /// fraction of documents the term could appear in.
+        /// </summary>
+        /// <param name="value">The maximum document term frequency</param>
+        /// <returns>New <see cref="CountVectorizer"/> with the max df value set</returns>
+        public CountVectorizer SetMaxDF(double value) =>
+            WrapAsCountVectorizer((JvmObjectReference)_jvmObject.Invoke("setMaxDF", value));
+
+        /// <summary>
+        /// Gets the minimum number of different documents a term must appear in to be included in
+        /// the vocabulary. If this is an integer greater than or equal to 1, this specifies the
+        /// number of documents the term must appear in; if this is a double in [0,1), then this
+        /// specifies the fraction of documents.
+        /// </summary>
+        /// <returns>The minimum document term frequency</returns>
+        public double GetMinDF() => (double)_jvmObject.Invoke("getMinDF");
+
+        /// <summary>
+        /// Sets the minimum number of different documents a term must appear in to be included in
+        /// the vocabulary. If this is an integer greater than or equal to 1, this specifies the
+        /// number of documents the term must appear in; if this is a double in [0,1), then this
+        /// specifies the fraction of documents.
+        /// </summary>
+        /// <param name="value">The minimum document term frequency</param>
+        /// <returns>New <see cref="CountVectorizer"/> with the min df value set</returns>
+        public CountVectorizer SetMinDF(double value) =>
+            WrapAsCountVectorizer((JvmObjectReference)_jvmObject.Invoke("setMinDF", value));
+
+        /// <summary>
+        /// Filter to ignore rare words in a document. For each document, terms with
+        /// frequency/count less than the given threshold are ignored. If this is an integer
+        /// greater than or equal to 1, then this specifies a count (of times the term must appear
+        /// in the document); if this is a double in [0,1), then this specifies a fraction (out of
+        /// the document's token count).
+        ///
+        /// Note that the parameter is only used in transform of CountVectorizerModel and does not
+        /// affect fitting.
+        /// </summary>
+        /// <returns>Minimum term frequency</returns>
+        public double GetMinTF() => (double)_jvmObject.Invoke("getMinTF");
+
+        /// <summary>
+        /// Filter to ignore rare words in a document. For each document, terms with
+        /// frequency/count less than the given threshold are ignored. If this is an integer
+        /// greater than or equal to 1, then this specifies a count (of times the term must appear
+        /// in the document); if this is a double in [0,1), then this specifies a fraction (out of
+        /// the document's token count).
+        ///
+        /// Note that the parameter is only used in transform of CountVectorizerModel and does not
+        /// affect fitting.
+        /// </summary>
+        /// <param name="value">Minimum term frequency</param>
+        /// <returns>New <see cref="CountVectorizer"/> with the min term frequency set</returns>
+        public CountVectorizer SetMinTF(double value) =>
+            WrapAsCountVectorizer((JvmObjectReference)_jvmObject.Invoke("setMinTF", value));
+
+        /// <summary>
+        /// Gets the max size of the vocabulary. CountVectorizer will build a vocabulary that only
+        /// considers the top vocabSize terms ordered by term frequency across the corpus.
+        /// </summary>
+        /// <returns>The max size of the vocabulary</returns>
+        public int GetVocabSize() => (int)_jvmObject.Invoke("getVocabSize");
+
+        /// <summary>
+        /// Sets the max size of the vocabulary. <see cref="CountVectorizer"/> will build a
+        /// vocabulary that only considers the top vocabSize terms ordered by term frequency across
+        /// the corpus.
+        /// </summary>
+        /// <param name="value">The max vocabulary size</param>
+        /// <returns><see cref="CountVectorizer"/> with the max vocab value set</returns>
+        public CountVectorizer SetVocabSize(int value) =>
+            WrapAsCountVectorizer(_jvmObject.Invoke("setVocabSize", value));
+    }
+}
diff --git a/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizerModel.cs b/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizerModel.cs
new file mode 100644
index 000000000..8a6e427df
--- /dev/null
+++ b/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizerModel.cs
@@ -0,0 +1,170 @@
+// Licensed to the .NET Foundation under one or more agreements.
+// The .NET Foundation licenses this file to you under the MIT license.
+// See the LICENSE file in the project root for more information.
+
+using System.Collections.Generic;
+using Microsoft.Spark.Interop;
+using Microsoft.Spark.Interop.Ipc;
+
+namespace Microsoft.Spark.ML.Feature
+{
+    public class CountVectorizerModel : FeatureBase<CountVectorizerModel>
+        , IJvmObjectReferenceProvider
+    {
+        private static readonly string s_countVectorizerModelClassName =
+            "org.apache.spark.ml.feature.CountVectorizerModel";
+
+        /// <summary>
+        /// Create a <see cref="CountVectorizerModel"/> with the given vocabulary
+        /// </summary>
+        /// <param name="vocabulary">The vocabulary to use</param>
+        public CountVectorizerModel(List<string> vocabulary) :
+            this(SparkEnvironment.JvmBridge.CallConstructor(
+                s_countVectorizerModelClassName, vocabulary))
+        {
+        }
+
+        /// <summary>
+        /// Create a <see cref="CountVectorizerModel"/> with a UID that is used to give the
+        /// <see cref="CountVectorizerModel"/> a unique ID
+        /// </summary>
+        /// <param name="uid">An immutable unique ID for the object and its derivatives.</param>
+        /// <param name="vocabulary">The vocabulary to use</param>
+        public CountVectorizerModel(string uid, List<string> vocabulary) :
+            this(SparkEnvironment.JvmBridge.CallConstructor(
+                s_countVectorizerModelClassName, uid, vocabulary))
+        {
+        }
+
+        internal CountVectorizerModel(JvmObjectReference jvmObject) : base(jvmObject)
+        {
+        }
+
+        JvmObjectReference IJvmObjectReferenceProvider.Reference => _jvmObject;
+
+        /// <summary>
+        /// Loads the <see cref="CountVectorizerModel"/> that was previously saved using Save
+        /// </summary>
+        /// <param name="path">
+        /// The path the previous <see cref="CountVectorizerModel"/> was saved to
+        /// </param>
+        /// <returns>New <see cref="CountVectorizerModel"/> object</returns>
+        public static CountVectorizerModel Load(string path) =>
+            WrapAsType((JvmObjectReference)
+                SparkEnvironment.JvmBridge.CallStaticJavaMethod(
+                    s_countVectorizerModelClassName, "load", path));
+
+        /// <summary>
+        /// Gets the binary toggle to control the output vector values. If True, all nonzero counts
+        /// (after minTF filter applied) are set to 1. This is useful for discrete probabilistic
+        /// models that model binary events rather than integer counts. Default: false
+        /// </summary>
+        /// <returns>boolean</returns>
+        public bool GetBinary() => (bool)_jvmObject.Invoke("getBinary");
+
+        /// <summary>
+        /// Sets the binary toggle to control the output vector values. If True, all nonzero counts
+        /// (after minTF filter applied) are set to 1. This is useful for discrete probabilistic
+        /// models that model binary events rather than integer counts. Default: false
+        /// </summary>
+        /// <param name="value">Turn the binary toggle on or off</param>
+        /// <returns>
+        /// <see cref="CountVectorizerModel"/> with the new binary toggle value set
+        /// </returns>
+        public CountVectorizerModel SetBinary(bool value) =>
+            WrapAsCountVectorizerModel((JvmObjectReference)_jvmObject.Invoke("setBinary", value));
+
+        private static CountVectorizerModel WrapAsCountVectorizerModel(object obj) =>
+            new CountVectorizerModel((JvmObjectReference)obj);
+
+        /// <summary>
+        /// Gets the column that the <see cref="CountVectorizerModel"/> should read from and
+        /// convert into buckets. This would have been set by SetInputCol
+        /// </summary>
+        /// <returns>string, the input column</returns>
+        public string GetInputCol() => _jvmObject.Invoke("getInputCol") as string;
+
+        /// <summary>
+        /// Sets the column that the <see cref="CountVectorizerModel"/> should read from.
+        /// </summary>
+        /// <param name="value">The name of the column to use as the source.</param>
+        /// <returns><see cref="CountVectorizerModel"/> with the input column set</returns>
+        public CountVectorizerModel SetInputCol(string value) =>
+            WrapAsCountVectorizerModel(
+                (JvmObjectReference)_jvmObject.Invoke("setInputCol", value));
+
+        /// <summary>
+        /// The <see cref="CountVectorizerModel"/> will create a new column in the DataFrame; this
+        /// is the name of the new column.
+        /// </summary>
+        /// <returns>The name of the output column.</returns>
+        public string GetOutputCol() => _jvmObject.Invoke("getOutputCol") as string;
+
+        /// <summary>
+        /// The <see cref="CountVectorizerModel"/> will create a new column in the DataFrame;
+        /// this is the name of the new column.
+        /// </summary>
+        /// <param name="value">The name of the output column which will be created.</param>
+        /// <returns>New <see cref="CountVectorizerModel"/> with the output column set</returns>
+        public CountVectorizerModel SetOutputCol(string value) =>
+            WrapAsCountVectorizerModel(
+                (JvmObjectReference)_jvmObject.Invoke("setOutputCol", value));
+
+        /// <summary>
+        /// Gets the maximum number of different documents a term could appear in to be included in
+        /// the vocabulary. A term that appears more than the threshold will be ignored. If this is
+        /// an integer greater than or equal to 1, this specifies the maximum number of documents
+        /// the term could appear in; if this is a double in [0,1), then this specifies the maximum
+        /// fraction of documents the term could appear in.
+        /// </summary>
+        /// <returns>The maximum document term frequency</returns>
+        public double GetMaxDF() => (double)_jvmObject.Invoke("getMaxDF");
+
+        /// <summary>
+        /// Gets the minimum number of different documents a term must appear in to be included in
+        /// the vocabulary. If this is an integer greater than or equal to 1, this specifies the
+        /// number of documents the term must appear in; if this is a double in [0,1), then this
+        /// specifies the fraction of documents.
+        /// </summary>
+        /// <returns>The minimum document term frequency</returns>
+        public double GetMinDF() => (double)_jvmObject.Invoke("getMinDF");
+
+        /// <summary>
+        /// Filter to ignore rare words in a document. For each document, terms with
+        /// frequency/count less than the given threshold are ignored. If this is an integer
+        /// greater than or equal to 1, then this specifies a count (of times the term must appear
+        /// in the document); if this is a double in [0,1), then this specifies a fraction (out of
+        /// the document's token count).
+        ///
+        /// Note that the parameter is only used in transform of CountVectorizerModel and does not
+        /// affect fitting.
+        /// </summary>
+        /// <returns>Minimum term frequency</returns>
+        public double GetMinTF() => (double)_jvmObject.Invoke("getMinTF");
+
+        /// <summary>
+        /// Filter to ignore rare words in a document. For each document, terms with
+        /// frequency/count less than the given threshold are ignored. If this is an integer
+        /// greater than or equal to 1, then this specifies a count (of times the term must appear
+        /// in the document); if this is a double in [0,1), then this specifies a fraction (out of
+        /// the document's token count).
+        ///
+        /// Note that the parameter is only used in transform of CountVectorizerModel and does not
+        /// affect fitting.
+        /// </summary>
+        /// <param name="value">Minimum term frequency</param>
+        /// <returns>
+        /// New <see cref="CountVectorizerModel"/> with the min term frequency set
+        /// </returns>
+        public CountVectorizerModel SetMinTF(double value) =>
+            WrapAsCountVectorizerModel((JvmObjectReference)_jvmObject.Invoke("setMinTF", value));
+
+        /// <summary>
+        /// Gets the max size of the vocabulary. <see cref="CountVectorizerModel"/> will build a
+        /// vocabulary that only considers the top vocabSize terms ordered by term frequency across
+        /// the corpus.
+        /// </summary>
+        /// <returns>The max size of the vocabulary</returns>
+        public int GetVocabSize() => (int)_jvmObject.Invoke("getVocabSize");
+    }
+}
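For orientation, the following is an editorial sketch (not part of the patch) of how the API added in PATCH 03 is intended to be used, based on the test code above; the running SparkSession `spark` and the save path are assumptions:

```csharp
using Microsoft.Spark.ML.Feature;
using Microsoft.Spark.Sql;

// Assumes a running SparkSession named `spark`.
DataFrame input = spark.Sql(
    "SELECT array('hello', 'I', 'AM', 'a', 'string') AS input FROM range(10)");

CountVectorizer countVectorizer = new CountVectorizer()
    .SetInputCol("input")
    .SetOutputCol("features")
    .SetVocabSize(10000);

// Fit builds the vocabulary over the input column and returns a model:
CountVectorizerModel model = countVectorizer.Fit(input);

// The model can be persisted and loaded back, mirroring the Spark ML API:
model.Save("/tmp/countVectorizerModel");
CountVectorizerModel loaded = CountVectorizerModel.Load("/tmp/countVectorizerModel");
```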
From e2a566b1f4b29775be9b57616a258802e294f304 Mon Sep 17 00:00:00 2001
From: GOEddieUK
Date: Mon, 27 Jul 2020 21:24:35 +0100
Subject: [PATCH 04/36] moving private methods to bottom

---
 src/csharp/Microsoft.Spark/ML/Feature/CountVectorizer.cs | 6 +++---
 .../Microsoft.Spark/ML/Feature/CountVectorizerModel.cs   | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizer.cs b/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizer.cs
index 41e0dbdd0..cf68f7c4a 100644
--- a/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizer.cs
+++ b/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizer.cs
@@ -71,9 +71,6 @@ public static CountVectorizer Load(string path) =>
         public CountVectorizer SetBinary(bool value) =>
             WrapAsCountVectorizer((JvmObjectReference)_jvmObject.Invoke("setBinary", value));
 
-        private static CountVectorizer WrapAsCountVectorizer(object obj) =>
-            new CountVectorizer((JvmObjectReference)obj);
-
         /// <summary>
         /// Gets the column that the <see cref="CountVectorizer"/> should read from and convert
         /// into buckets. This would have been set by SetInputCol
@@ -191,5 +188,8 @@ public CountVectorizer SetMinTF(double value) =>
         /// <returns><see cref="CountVectorizer"/> with the max vocab value set</returns>
         public CountVectorizer SetVocabSize(int value) =>
             WrapAsCountVectorizer(_jvmObject.Invoke("setVocabSize", value));
+
+        private static CountVectorizer WrapAsCountVectorizer(object obj) =>
+            new CountVectorizer((JvmObjectReference)obj);
     }
 }
diff --git a/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizerModel.cs b/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizerModel.cs
index 8a6e427df..8e225a179 100644
--- a/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizerModel.cs
+++ b/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizerModel.cs
@@ -74,9 +74,6 @@ public static CountVectorizerModel Load(string path) =>
         public CountVectorizerModel SetBinary(bool value) =>
             WrapAsCountVectorizerModel((JvmObjectReference)_jvmObject.Invoke("setBinary", value));
 
-        private static CountVectorizerModel WrapAsCountVectorizerModel(object obj) =>
-            new CountVectorizerModel((JvmObjectReference)obj);
-
         /// <summary>
         /// Gets the column that the <see cref="CountVectorizerModel"/> should read from and
         /// convert into buckets. This would have been set by SetInputCol
@@ -166,5 +163,8 @@ public CountVectorizerModel SetMinTF(double value) =>
         /// <returns>The max size of the vocabulary</returns>
         public int GetVocabSize() => (int)_jvmObject.Invoke("getVocabSize");
+
+        private static CountVectorizerModel WrapAsCountVectorizerModel(object obj) =>
+            new CountVectorizerModel((JvmObjectReference)obj);
     }
 }
From 5f682a601ec783f1609e6fd6e32c4d83ff1491d1 Mon Sep 17 00:00:00 2001
From: GOEddieUK
Date: Tue, 28 Jul 2020 20:47:31 +0100
Subject: [PATCH 05/36] changing wrap method

---
 src/csharp/Microsoft.Spark/ML/Feature/CountVectorizer.cs      | 2 +-
 src/csharp/Microsoft.Spark/ML/Feature/CountVectorizerModel.cs | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizer.cs b/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizer.cs
index cf68f7c4a..b3fa0ef8a 100644
--- a/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizer.cs
+++ b/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizer.cs
@@ -49,7 +49,7 @@ public CountVectorizerModel Fit(DataFrame dataFrame) =>
         /// <returns>New <see cref="CountVectorizer"/> object</returns>
         public static CountVectorizer Load(string path) =>
-            WrapAsType((JvmObjectReference)
+            WrapAsCountVectorizer((JvmObjectReference)
                 SparkEnvironment.JvmBridge.CallStaticJavaMethod(
                     s_countVectorizerClassName, "load", path));
 
diff --git a/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizerModel.cs b/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizerModel.cs
index 8e225a179..52bbd72c3 100644
--- a/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizerModel.cs
+++ b/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizerModel.cs
@@ -50,7 +50,7 @@ internal CountVectorizerModel(JvmObjectReference jvmObject) : base(jvmObject)
         /// <returns>New <see cref="CountVectorizerModel"/> object</returns>
         public static CountVectorizerModel Load(string path) =>
-            WrapAsType((JvmObjectReference)
+            WrapAsCountVectorizerModel((JvmObjectReference)
                 SparkEnvironment.JvmBridge.CallStaticJavaMethod(
                     s_countVectorizerModelClassName, "load", path));
 
From 31371db73b4faa653c07fdb8082e7aed02c0a031 Mon Sep 17 00:00:00 2001
From: GOEddieUK
Date: Fri, 31 Jul 2020 18:45:46 +0100
Subject: [PATCH 06/36] setting min version required

---
 .../IpcTests/ML/Feature/CountVectorizerTests.cs   | 14 ++++++++++----
 .../Microsoft.Spark/ML/Feature/CountVectorizer.cs |  2 ++
 .../Microsoft.Spark/ML/Feature/FeatureBase.cs     |  3 ++-
 src/csharp/Microsoft.Spark/Microsoft.Spark.csproj |  5 +----
 4 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/src/csharp/Microsoft.Spark.E2ETest/IpcTests/ML/Feature/CountVectorizerTests.cs b/src/csharp/Microsoft.Spark.E2ETest/IpcTests/ML/Feature/CountVectorizerTests.cs
index d54bfe376..95b9bc504 100644
--- a/src/csharp/Microsoft.Spark.E2ETest/IpcTests/ML/Feature/CountVectorizerTests.cs
+++ b/src/csharp/Microsoft.Spark.E2ETest/IpcTests/ML/Feature/CountVectorizerTests.cs
@@ -4,6 +4,7 @@
 
 using System;
 using System.IO;
+using Microsoft.Spark.E2ETest.Utils;
 using Microsoft.Spark.ML.Feature;
 using Microsoft.Spark.Sql;
 using Microsoft.Spark.UnitTest.TestUtils;
 using Xunit;
@@ -30,7 +31,6 @@ public void Test_CountVectorizer()
             const string inputColumn = "input";
             const string outputColumn = "output";
             const double minDf = 1;
-            const double maxDf = 100;
             const double minTf = 10;
             const int vocabSize = 10000;
             const bool binary = false;
@@ -41,7 +41,6 @@ public void Test_CountVectorizer()
                 .SetInputCol(inputColumn)
                 .SetOutputCol(outputColumn)
                 .SetMinDF(minDf)
-                .SetMaxDF(maxDf)
                 .SetMinTF(minTf)
                 .SetVocabSize(vocabSize);
 
@@ -49,7 +48,6 @@ public void Test_CountVectorizer()
             Assert.IsType<CountVectorizerModel>(countVectorizer.Fit(input));
             Assert.Equal(inputColumn, countVectorizer.GetInputCol());
             Assert.Equal(outputColumn, countVectorizer.GetOutputCol());
             Assert.Equal(minDf, countVectorizer.GetMinDF());
-            Assert.Equal(maxDf, countVectorizer.GetMaxDF());
             Assert.Equal(minTf, countVectorizer.GetMinTF());
             Assert.Equal(vocabSize, countVectorizer.GetVocabSize());
             Assert.Equal(binary, countVectorizer.GetBinary());
@@ -65,6 +63,14 @@ public void Test_CountVectorizer()
 
             Assert.NotEmpty(countVectorizer.ExplainParams());
             Assert.NotEmpty(countVectorizer.ToString());
-        }
+        }
+
+        [SkipIfSparkVersionIsLessThan(Versions.V2_4_0)]
+        public void CountVectorizer_MaxDF()
+        {
+            const double maxDf = 100;
+            CountVectorizer countVectorizer = new CountVectorizer().SetMaxDF(maxDf);
+            Assert.Equal(maxDf, countVectorizer.GetMaxDF());
+        }
     }
 }
diff --git a/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizer.cs b/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizer.cs
index b3fa0ef8a..5689e19fd 100644
--- a/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizer.cs
+++ b/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizer.cs
@@ -110,6 +110,7 @@ public CountVectorizer SetOutputCol(string value) =>
         /// fraction of documents the term could appear in.
         /// </summary>
         /// <returns>The maximum document term frequency</returns>
+        [Since(Versions.V2_4_0)]
         public double GetMaxDF() => (double)_jvmObject.Invoke("getMaxDF");
 
@@ -121,6 +122,7 @@ public double GetMaxDF() => (double)_jvmObject.Invoke("getMaxDF");
         /// <param name="value">The maximum document term frequency</param>
         /// <returns>New <see cref="CountVectorizer"/> with the max df value set</returns>
+        [Since(Versions.V2_4_0)]
         public CountVectorizer SetMaxDF(double value) =>
             WrapAsCountVectorizer((JvmObjectReference)_jvmObject.Invoke("setMaxDF", value));
 
diff --git a/src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs b/src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs
index fcc90b43d..0895dace1 100644
--- a/src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs
+++ b/src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs
@@ -98,7 +98,7 @@ public Param.Param GetParam(string paramName) =>
         public T Set(Param.Param param, object value) =>
             WrapAsType((JvmObjectReference)_jvmObject.Invoke("set", param, value));
 
-        private static T WrapAsType(JvmObjectReference reference)
+        internal static T WrapAsType(JvmObjectReference reference)
         {
             ConstructorInfo constructor = typeof(T)
                 .GetConstructors(BindingFlags.NonPublic | BindingFlags.Instance)
@@ -111,5 +111,6 @@ private static T WrapAsType(JvmObjectReference reference)
 
             return (T)constructor.Invoke(new object[] {reference});
         }
+
     }
 }
diff --git a/src/csharp/Microsoft.Spark/Microsoft.Spark.csproj b/src/csharp/Microsoft.Spark/Microsoft.Spark.csproj
index 2cddc5627..f284de8c6 100644
--- a/src/csharp/Microsoft.Spark/Microsoft.Spark.csproj
+++ b/src/csharp/Microsoft.Spark/Microsoft.Spark.csproj
@@ -38,10 +38,7 @@
[The removed and added csproj lines were XML elements that did not survive this copy of the patch.]
From 60eb82f40ac37c553ca00a3ab4d0e404e4447dca Mon Sep 17 00:00:00 2001
From: GOEddieUK
Date: Fri, 31 Jul 2020 19:52:23 +0100
Subject: [PATCH 07/36] undoing csproj change

---
 .ionide/symbolCache.db                     | Bin 28672 -> 0 bytes
 .../Microsoft.Spark/Microsoft.Spark.csproj |   5 ++++-
 2 files changed, 4 insertions(+), 1 deletion(-)
 delete mode 100644 .ionide/symbolCache.db

diff --git a/.ionide/symbolCache.db b/.ionide/symbolCache.db
deleted file mode 100644
index 43e567d6d682d85dd32b3baebb0fdf61f67c1643..0000000000000000000000000000000000000000
GIT binary patch
[28,672 bytes of base85-encoded binary data omitted]
diff --git a/src/csharp/Microsoft.Spark/Microsoft.Spark.csproj b/src/csharp/Microsoft.Spark/Microsoft.Spark.csproj
index f284de8c6..2cddc5627 100644
--- a/src/csharp/Microsoft.Spark/Microsoft.Spark.csproj
+++ b/src/csharp/Microsoft.Spark/Microsoft.Spark.csproj
@@ -38,7 +38,10 @@
[The removed and added csproj lines were XML elements that did not survive this copy of the patch.]
From ed36375561e3495a675f9ac14ab80f79f3fbb38d Mon Sep 17 00:00:00 2001
From: GOEddieUK
Date: Fri, 31 Jul 2020 19:55:49 +0100
Subject: [PATCH 08/36] member doesnt need to be internal

---
 src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs b/src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs
index 0895dace1..8446b9f4e 100644
--- a/src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs
+++ b/src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs
@@ -98,7 +98,7 @@ public Param.Param GetParam(string paramName) =>
         public T Set(Param.Param param, object value) =>
             WrapAsType((JvmObjectReference)_jvmObject.Invoke("set", param, value));
 
-        internal static T WrapAsType(JvmObjectReference reference)
+        private static T WrapAsType(JvmObjectReference reference)
         {
             ConstructorInfo constructor = typeof(T)
                 .GetConstructors(BindingFlags.NonPublic | BindingFlags.Instance)
From c7baf7231914b10300175e67158b604d646b97d4 Mon Sep 17 00:00:00 2001
From: GOEddieUK
Date: Fri, 31 Jul 2020 19:56:29 +0100
Subject: [PATCH 09/36] too many lines

---
 src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs b/src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs
index 8446b9f4e..9ccd64d5b 100644
--- a/src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs
+++ b/src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs
@@ -106,11 +106,10 @@ private static T WrapAsType(JvmObjectReference reference)
             {
                 ParameterInfo[] parameters = c.GetParameters();
                 return (parameters.Length == 1) &&
-                    (parameters[0].ParameterType == typeof(JvmObjectReference));
+                        (parameters[0].ParameterType == typeof(JvmObjectReference));
             });
 
             return (T)constructor.Invoke(new object[] {reference});
         }
-
     }
 }
From d13303ccaeb691691c4d294d96e0995f3597becb Mon Sep 17 00:00:00 2001
From: GOEddieUK
Date: Fri, 31 Jul 2020 20:01:07 +0100
Subject: [PATCH 10/36] removing whitespace change

---
 src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs b/src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs
index 9ccd64d5b..326268a5e 100644
--- a/src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs
+++ b/src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs
@@ -105,7 +105,7 @@ private static T WrapAsType(JvmObjectReference reference)
             .Single(c =>
             {
                 ParameterInfo[] parameters = c.GetParameters();
-                return (parameters.Length == 1) &&
+                return (parameters.Length == 1) && 
                     (parameters[0].ParameterType == typeof(JvmObjectReference));
             });
 
From f5b477c72158599b1c6552c7eb1af20edfab7779 Mon Sep 17 00:00:00 2001
From: GOEddieUK
Date: Fri, 31 Jul 2020 20:01:57 +0100
Subject: [PATCH 11/36] removing whitespace change

---
 src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs b/src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs
index 326268a5e..9ccd64d5b 100644
--- a/src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs
+++ b/src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs
@@ -105,7 +105,7 @@ private static T WrapAsType(JvmObjectReference reference)
         .Single(c =>
             {
                 ParameterInfo[] parameters = c.GetParameters();
-                return (parameters.Length == 1) &&
+                return (parameters.Length == 1) &&
                     (parameters[0].ParameterType == typeof(JvmObjectReference));
             });

From 73db52b400637585b2216f44aac616828800b9d2 Mon Sep 17 00:00:00 2001
From: GOEddieUK
Date: Fri, 31 Jul 2020 20:06:12 +0100
Subject: [PATCH 12/36] ionide

---
 .ionide/symbolCache.db                        | Bin 0 -> 28672 bytes
 .../Microsoft.Spark/ML/Feature/FeatureBase.cs |   2 +-
 2 files changed, 1 insertion(+), 1 deletion(-)
 create mode 100644 .ionide/symbolCache.db

diff --git a/.ionide/symbolCache.db b/.ionide/symbolCache.db
new file mode 100644
index 0000000000000000000000000000000000000000..43e567d6d682d85dd32b3baebb0fdf61f67c1643
GIT binary patch
literal 28672
[... base85-encoded binary data omitted ...]
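Patches 08 through 11 all adjust the same `WrapAsType` method, so it helps to see the reflection pattern they are editing as one self-contained piece. The sketch below is editorial context rather than part of the patch series: `Handle` and `Feature` are hypothetical stand-ins for `JvmObjectReference` and an ML feature class, while the reflection calls mirror the hunks above.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in for JvmObjectReference.
public sealed class Handle { }

// Hypothetical stand-in for an ML feature type that, by convention,
// exposes a single non-public constructor taking the reference.
public class Feature
{
    internal Feature(Handle reference) => Reference = reference;

    internal Handle Reference { get; }
}

public static class Wrapper
{
    // Same shape as FeatureBase<T>.WrapAsType in the hunks above: pick the
    // unique non-public instance constructor that takes exactly one Handle
    // and invoke it with the supplied reference.
    public static T WrapAsType<T>(Handle reference)
    {
        ConstructorInfo constructor = typeof(T)
            .GetConstructors(BindingFlags.NonPublic | BindingFlags.Instance)
            .Single(c =>
            {
                ParameterInfo[] parameters = c.GetParameters();
                return (parameters.Length == 1) &&
                    (parameters[0].ParameterType == typeof(Handle));
            });

        return (T)constructor.Invoke(new object[] { reference });
    }
}

public static class Program
{
    public static void Main()
    {
        Feature feature = Wrapper.WrapAsType<Feature>(new Handle());
        Console.WriteLine(feature.Reference != null); // True
    }
}
```

The `Single` call is what enforces the convention: each wrapped type must expose exactly one non-public constructor taking the reference type, or construction fails loudly.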
diff --git a/src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs b/src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs
index 9ccd64d5b..326268a5e 100644
--- a/src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs
+++ b/src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs
@@ -105,7 +105,7 @@ private static T WrapAsType(JvmObjectReference reference)
         .Single(c =>
             {
                 ParameterInfo[] parameters = c.GetParameters();
-                return (parameters.Length == 1) &&
+                return (parameters.Length == 1) &&
                     (parameters[0].ParameterType == typeof(JvmObjectReference));
             });

From 8e1685cd270657c5e7a6769e732bf85d5ae6cb2e Mon Sep 17 00:00:00 2001
From: Niharika Dutta
Date: Thu, 13 Aug 2020 12:59:34 -0700
Subject: [PATCH 13/36] Revert "Merge branch 'master' into ml/countvectorizer"

This reverts commit a766146f56014ccae4118b35495b84da588af94f, reversing
changes made to 73db52b400637585b2216f44aac616828800b9d2.

Reverting countvectorizer changes
---
 .gitignore                                  |   3 ---
 .ionide/symbolCache.db                      | Bin 0 -> 28672 bytes
 .../Processor/BroadcastVariableProcessor.cs |   3 +--
 3 files changed, 1 insertion(+), 5 deletions(-)
 create mode 100644 .ionide/symbolCache.db

diff --git a/.gitignore b/.gitignore
index faada9c8a..251cfa7e2 100644
--- a/.gitignore
+++ b/.gitignore
@@ -367,6 +367,3 @@ hs_err_pid*
 
 # The target folder contains the output of building
 **/target/**
-
-# F# vs code
-.ionide/
diff --git a/.ionide/symbolCache.db b/.ionide/symbolCache.db
new file mode 100644
index 0000000000000000000000000000000000000000..43e567d6d682d85dd32b3baebb0fdf61f67c1643
GIT binary patch
literal 28672
[... base85-encoded binary data omitted ...]
diff --git a/src/csharp/Microsoft.Spark.Worker/Processor/BroadcastVariableProcessor.cs b/src/csharp/Microsoft.Spark.Worker/Processor/BroadcastVariableProcessor.cs
index bf8f48ed8..41c817d02 100644
--- a/src/csharp/Microsoft.Spark.Worker/Processor/BroadcastVariableProcessor.cs
+++ b/src/csharp/Microsoft.Spark.Worker/Processor/BroadcastVariableProcessor.cs
@@ -54,8 +54,7 @@ internal BroadcastVariables Process(Stream stream)
             else
             {
                 string path = SerDe.ReadString(stream);
-                using FileStream fStream =
-                    File.Open(path, FileMode.Open, FileAccess.Read, FileShare.Read);
+                using FileStream fStream = File.Open(path, FileMode.Open, FileAccess.Read);
                 object value = formatter.Deserialize(fStream);
                 BroadcastRegistry.Add(bid, value);
             }

From 255515eecbd6cb8e7919fbd2b857d99e335c66d2 Mon Sep 17 00:00:00 2001
From: Niharika Dutta
Date: Thu, 13 Aug 2020 13:04:05 -0700
Subject: [PATCH 14/36] Revert "Merge branch 'ml/countvectorizer' of
 https://github.com/GoEddie/spark"

This reverts commit ad6bcede69de012c22178825e76c6b175c770b8f, reversing
changes made to 4c5d502a9f56e79ea071b12d2a49dced3873dea8.
reverting countvectorizer changes -2 --- .../ML/Feature/CountVectorizerModelTests.cs | 73 ------- .../ML/Feature/CountVectorizerTests.cs | 76 ------- .../ML/Feature/CountVectorizer.cs | 197 ------------------ .../ML/Feature/CountVectorizerModel.cs | 170 --------------- .../Microsoft.Spark/ML/Feature/FeatureBase.cs | 4 +- 5 files changed, 2 insertions(+), 518 deletions(-) delete mode 100644 src/csharp/Microsoft.Spark.E2ETest/IpcTests/ML/Feature/CountVectorizerModelTests.cs delete mode 100644 src/csharp/Microsoft.Spark.E2ETest/IpcTests/ML/Feature/CountVectorizerTests.cs delete mode 100644 src/csharp/Microsoft.Spark/ML/Feature/CountVectorizer.cs delete mode 100644 src/csharp/Microsoft.Spark/ML/Feature/CountVectorizerModel.cs diff --git a/src/csharp/Microsoft.Spark.E2ETest/IpcTests/ML/Feature/CountVectorizerModelTests.cs b/src/csharp/Microsoft.Spark.E2ETest/IpcTests/ML/Feature/CountVectorizerModelTests.cs deleted file mode 100644 index 3c3132dd9..000000000 --- a/src/csharp/Microsoft.Spark.E2ETest/IpcTests/ML/Feature/CountVectorizerModelTests.cs +++ /dev/null @@ -1,73 +0,0 @@ -// Licensed to the .NET Foundation under one or more agreements. -// The .NET Foundation licenses this file to you under the MIT license. -// See the LICENSE file in the project root for more information. - -using System; -using System.Collections.Generic; -using System.IO; -using Microsoft.Spark.ML.Feature; -using Microsoft.Spark.Sql; -using Microsoft.Spark.UnitTest.TestUtils; -using Xunit; - -namespace Microsoft.Spark.E2ETest.IpcTests.ML.Feature -{ - [Collection("Spark E2E Tests")] - public class CountVectorizerModelTests - { - private readonly SparkSession _spark; - - public CountVectorizerModelTests(SparkFixture fixture) - { - _spark = fixture.Spark; - } - - [Fact] - public void Test_CountVectorizerModel() - { - DataFrame input = _spark.Sql("SELECT array('hello', 'I', 'AM', 'a', 'string', 'TO', " + - "'TOKENIZE') as input from range(100)"); - - const string inputColumn = "input"; - const string outputColumn = "output"; - const double minTf = 10.0; - const bool binary = false; - - List vocabulary = new List() - { - "hello", - "I", - "AM", - "TO", - "TOKENIZE" - }; - - var countVectorizerModel = new CountVectorizerModel(vocabulary); - - Assert.IsType(new CountVectorizerModel("my-uid", vocabulary)); - - countVectorizerModel = countVectorizerModel - .SetInputCol(inputColumn) - .SetOutputCol(outputColumn) - .SetMinTF(minTf) - .SetBinary(binary); - - Assert.Equal(inputColumn, countVectorizerModel.GetInputCol()); - Assert.Equal(outputColumn, countVectorizerModel.GetOutputCol()); - Assert.Equal(minTf, countVectorizerModel.GetMinTF()); - Assert.Equal(binary, countVectorizerModel.GetBinary()); - using (var tempDirectory = new TemporaryDirectory()) - { - string savePath = Path.Join(tempDirectory.Path, "countVectorizerModel"); - countVectorizerModel.Save(savePath); - - CountVectorizerModel loadedModel = CountVectorizerModel.Load(savePath); - Assert.Equal(countVectorizerModel.Uid(), loadedModel.Uid()); - } - - Assert.IsType(countVectorizerModel.GetVocabSize()); - Assert.NotEmpty(countVectorizerModel.ExplainParams()); - Assert.NotEmpty(countVectorizerModel.ToString()); - } - } -} diff --git a/src/csharp/Microsoft.Spark.E2ETest/IpcTests/ML/Feature/CountVectorizerTests.cs b/src/csharp/Microsoft.Spark.E2ETest/IpcTests/ML/Feature/CountVectorizerTests.cs deleted file mode 100644 index 95b9bc504..000000000 --- a/src/csharp/Microsoft.Spark.E2ETest/IpcTests/ML/Feature/CountVectorizerTests.cs +++ /dev/null @@ -1,76 +0,0 @@ -// Licensed to 
the .NET Foundation under one or more agreements. -// The .NET Foundation licenses this file to you under the MIT license. -// See the LICENSE file in the project root for more information. - -using System; -using System.IO; -using Microsoft.Spark.E2ETest.Utils; -using Microsoft.Spark.ML.Feature; -using Microsoft.Spark.Sql; -using Microsoft.Spark.UnitTest.TestUtils; -using Xunit; - -namespace Microsoft.Spark.E2ETest.IpcTests.ML.Feature -{ - [Collection("Spark E2E Tests")] - public class CountVectorizerTests - { - private readonly SparkSession _spark; - - public CountVectorizerTests(SparkFixture fixture) - { - _spark = fixture.Spark; - } - - [Fact] - public void Test_CountVectorizer() - { - DataFrame input = _spark.Sql("SELECT array('hello', 'I', 'AM', 'a', 'string', 'TO', " + - "'TOKENIZE') as input from range(100)"); - - const string inputColumn = "input"; - const string outputColumn = "output"; - const double minDf = 1; - const double minTf = 10; - const int vocabSize = 10000; - const bool binary = false; - - var countVectorizer = new CountVectorizer(); - - countVectorizer - .SetInputCol(inputColumn) - .SetOutputCol(outputColumn) - .SetMinDF(minDf) - .SetMinTF(minTf) - .SetVocabSize(vocabSize); - - Assert.IsType(countVectorizer.Fit(input)); - Assert.Equal(inputColumn, countVectorizer.GetInputCol()); - Assert.Equal(outputColumn, countVectorizer.GetOutputCol()); - Assert.Equal(minDf, countVectorizer.GetMinDF()); - Assert.Equal(minTf, countVectorizer.GetMinTF()); - Assert.Equal(vocabSize, countVectorizer.GetVocabSize()); - Assert.Equal(binary, countVectorizer.GetBinary()); - - using (var tempDirectory = new TemporaryDirectory()) - { - string savePath = Path.Join(tempDirectory.Path, "countVectorizer"); - countVectorizer.Save(savePath); - - CountVectorizer loadedVectorizer = CountVectorizer.Load(savePath); - Assert.Equal(countVectorizer.Uid(), loadedVectorizer.Uid()); - } - - Assert.NotEmpty(countVectorizer.ExplainParams()); - Assert.NotEmpty(countVectorizer.ToString()); - } - - [SkipIfSparkVersionIsLessThan(Versions.V2_4_0)] - public void CountVectorizer_MaxDF() - { - const double maxDf = 100; - CountVectorizer countVectorizer = new CountVectorizer().SetMaxDF(maxDf); - Assert.Equal(maxDf, countVectorizer.GetMaxDF()); - } - } -} diff --git a/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizer.cs b/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizer.cs deleted file mode 100644 index 5689e19fd..000000000 --- a/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizer.cs +++ /dev/null @@ -1,197 +0,0 @@ -// Licensed to the .NET Foundation under one or more agreements. -// The .NET Foundation licenses this file to you under the MIT license. -// See the LICENSE file in the project root for more information. - -using Microsoft.Spark.Interop; -using Microsoft.Spark.Interop.Ipc; -using Microsoft.Spark.Sql; - -namespace Microsoft.Spark.ML.Feature -{ - public class CountVectorizer : FeatureBase, IJvmObjectReferenceProvider - { - private static readonly string s_countVectorizerClassName = - "org.apache.spark.ml.feature.CountVectorizer"; - - /// - /// Create a without any parameters - /// - public CountVectorizer() : base(s_countVectorizerClassName) - { - } - - /// - /// Create a with a UID that is used to give the - /// a unique ID - /// - /// An immutable unique ID for the object and its derivatives. 
- public CountVectorizer(string uid) : base(s_countVectorizerClassName, uid) - { - } - - internal CountVectorizer(JvmObjectReference jvmObject) : base(jvmObject) - { - } - - JvmObjectReference IJvmObjectReferenceProvider.Reference => _jvmObject; - - /// Fits a model to the input data. - /// The to fit the model to. - /// - public CountVectorizerModel Fit(DataFrame dataFrame) => - new CountVectorizerModel((JvmObjectReference)_jvmObject.Invoke("fit", dataFrame)); - - /// - /// Loads the that was previously saved using Save - /// - /// - /// The path the previous was saved to - /// - /// New object - public static CountVectorizer Load(string path) => - WrapAsCountVectorizer((JvmObjectReference) - SparkEnvironment.JvmBridge.CallStaticJavaMethod( - s_countVectorizerClassName,"load", path)); - - /// - /// Gets the binary toggle to control the output vector values. If True, all nonzero counts - /// (after minTF filter applied) are set to 1. This is useful for discrete probabilistic - /// models that model binary events rather than integer counts. Default: false - /// - /// boolean - public bool GetBinary() => (bool)_jvmObject.Invoke("getBinary"); - - /// - /// Sets the binary toggle to control the output vector values. If True, all nonzero counts - /// (after minTF filter applied) are set to 1. This is useful for discrete probabilistic - /// models that model binary events rather than integer counts. Default: false - /// - /// Turn the binary toggle on or off - /// with the new binary toggle value set - public CountVectorizer SetBinary(bool value) => - WrapAsCountVectorizer((JvmObjectReference)_jvmObject.Invoke("setBinary", value)); - - /// - /// Gets the column that the should read from and convert - /// into buckets. This would have been set by SetInputCol - /// - /// string, the input column - public string GetInputCol() => _jvmObject.Invoke("getInputCol") as string; - - /// - /// Sets the column that the should read from. - /// - /// The name of the column to as the source. - /// with the input column set - public CountVectorizer SetInputCol(string value) => - WrapAsCountVectorizer((JvmObjectReference)_jvmObject.Invoke("setInputCol", value)); - - /// - /// The will create a new column in the DataFrame, this is - /// the name of the new column. - /// - /// The name of the output column. - public string GetOutputCol() => _jvmObject.Invoke("getOutputCol") as string; - - /// - /// The will create a new column in the DataFrame, this - /// is the name of the new column. - /// - /// The name of the output column which will be created. - /// New with the output column set - public CountVectorizer SetOutputCol(string value) => - WrapAsCountVectorizer((JvmObjectReference)_jvmObject.Invoke("setOutputCol", value)); - - /// - /// Gets the maximum number of different documents a term could appear in to be included in - /// the vocabulary. A term that appears more than the threshold will be ignored. If this is - /// an integer greater than or equal to 1, this specifies the maximum number of documents - /// the term could appear in; if this is a double in [0,1), then this specifies the maximum - /// fraction of documents the term could appear in. - /// - /// The maximum document term frequency - [Since(Versions.V2_4_0)] - public double GetMaxDF() => (double)_jvmObject.Invoke("getMaxDF"); - - /// - /// Sets the maximum number of different documents a term could appear in to be included in - /// the vocabulary. A term that appears more than the threshold will be ignored. 
If this is - /// an integer greater than or equal to 1, this specifies the maximum number of documents - /// the term could appear in; if this is a double in [0,1), then this specifies the maximum - /// fraction of documents the term could appear in. - /// - /// The maximum document term frequency - /// New with the max df value set - [Since(Versions.V2_4_0)] - public CountVectorizer SetMaxDF(double value) => - WrapAsCountVectorizer((JvmObjectReference)_jvmObject.Invoke("setMaxDF", value)); - - /// - /// Gets the minimum number of different documents a term must appear in to be included in - /// the vocabulary. If this is an integer greater than or equal to 1, this specifies the - /// number of documents the term must appear in; if this is a double in [0,1), then this - /// specifies the fraction of documents. - /// - /// The minimum document term frequency - public double GetMinDF() => (double)_jvmObject.Invoke("getMinDF"); - - /// - /// Sets the minimum number of different documents a term must appear in to be included in - /// the vocabulary. If this is an integer greater than or equal to 1, this specifies the - /// number of documents the term must appear in; if this is a double in [0,1), then this - /// specifies the fraction of documents. - /// - /// The minimum document term frequency - /// New with the min df value set - public CountVectorizer SetMinDF(double value) => - WrapAsCountVectorizer((JvmObjectReference)_jvmObject.Invoke("setMinDF", value)); - - /// - /// Filter to ignore rare words in a document. For each document, terms with - /// frequency/count less than the given threshold are ignored. If this is an integer - /// greater than or equal to 1, then this specifies a count (of times the term must appear - /// in the document); if this is a double in [0,1), then this specifies a fraction (out of - /// the document's token count). - /// - /// Note that the parameter is only used in transform of CountVectorizerModel and does not - /// affect fitting. - /// - /// Minimum term frequency - public double GetMinTF() => (double)_jvmObject.Invoke("getMinTF"); - - /// - /// Filter to ignore rare words in a document. For each document, terms with - /// frequency/count less than the given threshold are ignored. If this is an integer - /// greater than or equal to 1, then this specifies a count (of times the term must appear - /// in the document); if this is a double in [0,1), then this specifies a fraction (out of - /// the document's token count). - /// - /// Note that the parameter is only used in transform of CountVectorizerModel and does not - /// affect fitting. - /// - /// Minimum term frequency - /// New with the min term frequency set - public CountVectorizer SetMinTF(double value) => - WrapAsCountVectorizer((JvmObjectReference)_jvmObject.Invoke("setMinTF", value)); - - /// - /// Gets the max size of the vocabulary. CountVectorizer will build a vocabulary that only - /// considers the top vocabSize terms ordered by term frequency across the corpus. - /// - /// The max size of the vocabulary - public int GetVocabSize() => (int)_jvmObject.Invoke("getVocabSize"); - - /// - /// Sets the max size of the vocabulary. will build a - /// vocabulary that only considers the top vocabSize terms ordered by term frequency across - /// the corpus. 
- /// - /// The max vocabulary size - /// with the max vocab value set - public CountVectorizer SetVocabSize(int value) => - WrapAsCountVectorizer(_jvmObject.Invoke("setVocabSize", value)); - - private static CountVectorizer WrapAsCountVectorizer(object obj) => - new CountVectorizer((JvmObjectReference)obj); - } -} diff --git a/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizerModel.cs b/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizerModel.cs deleted file mode 100644 index 52bbd72c3..000000000 --- a/src/csharp/Microsoft.Spark/ML/Feature/CountVectorizerModel.cs +++ /dev/null @@ -1,170 +0,0 @@ -// Licensed to the .NET Foundation under one or more agreements. -// The .NET Foundation licenses this file to you under the MIT license. -// See the LICENSE file in the project root for more information. - -using System.Collections.Generic; -using Microsoft.Spark.Interop; -using Microsoft.Spark.Interop.Ipc; - -namespace Microsoft.Spark.ML.Feature -{ - public class CountVectorizerModel : FeatureBase - , IJvmObjectReferenceProvider - { - private static readonly string s_countVectorizerModelClassName = - "org.apache.spark.ml.feature.CountVectorizerModel"; - - /// - /// Create a without any parameters - /// - /// The vocabulary to use - public CountVectorizerModel(List vocabulary) : - this(SparkEnvironment.JvmBridge.CallConstructor( - s_countVectorizerModelClassName, vocabulary)) - { - } - - /// - /// Create a with a UID that is used to give the - /// a unique ID - /// - /// An immutable unique ID for the object and its derivatives. - /// The vocabulary to use - public CountVectorizerModel(string uid, List vocabulary) : - this(SparkEnvironment.JvmBridge.CallConstructor( - s_countVectorizerModelClassName, uid, vocabulary)) - { - } - - internal CountVectorizerModel(JvmObjectReference jvmObject) : base(jvmObject) - { - } - - JvmObjectReference IJvmObjectReferenceProvider.Reference => _jvmObject; - - /// - /// Loads the that was previously saved using Save - /// - /// - /// The path the previous was saved to - /// - /// New object - public static CountVectorizerModel Load(string path) => - WrapAsCountVectorizerModel((JvmObjectReference) - SparkEnvironment.JvmBridge.CallStaticJavaMethod( - s_countVectorizerModelClassName,"load", path)); - - /// - /// Gets the binary toggle to control the output vector values. If True, all nonzero counts - /// (after minTF filter applied) are set to 1. This is useful for discrete probabilistic - /// models that model binary events rather than integer counts. Default: false - /// - /// boolean - public bool GetBinary() => (bool)_jvmObject.Invoke("getBinary"); - - /// - /// Sets the binary toggle to control the output vector values. If True, all nonzero counts - /// (after minTF filter applied) are set to 1. This is useful for discrete probabilistic - /// models that model binary events rather than integer counts. Default: false - /// - /// Turn the binary toggle on or off - /// - /// with the new binary toggle value set - /// - public CountVectorizerModel SetBinary(bool value) => - WrapAsCountVectorizerModel((JvmObjectReference)_jvmObject.Invoke("setBinary", value)); - - /// - /// Gets the column that the should read from and - /// convert into buckets. This would have been set by SetInputCol - /// - /// string, the input column - public string GetInputCol() => _jvmObject.Invoke("getInputCol") as string; - - /// - /// Sets the column that the should read from. - /// - /// The name of the column to as the source. 
- /// with the input column set - public CountVectorizerModel SetInputCol(string value) => - WrapAsCountVectorizerModel( - (JvmObjectReference)_jvmObject.Invoke("setInputCol", value)); - - /// - /// The will create a new column in the DataFrame, this - /// is the name of the new column. - /// - /// The name of the output column. - public string GetOutputCol() => _jvmObject.Invoke("getOutputCol") as string; - - /// - /// The will create a new column in the DataFrame, - /// this is the name of the new column. - /// - /// The name of the output column which will be created. - /// New with the output column set - public CountVectorizerModel SetOutputCol(string value) => - WrapAsCountVectorizerModel( - (JvmObjectReference)_jvmObject.Invoke("setOutputCol", value)); - - /// - /// Gets the maximum number of different documents a term could appear in to be included in - /// the vocabulary. A term that appears more than the threshold will be ignored. If this is - /// an integer greater than or equal to 1, this specifies the maximum number of documents - /// the term could appear in; if this is a double in [0,1), then this specifies the maximum - /// fraction of documents the term could appear in. - /// - /// The maximum document term frequency - public double GetMaxDF() => (double)_jvmObject.Invoke("getMaxDF"); - - /// - /// Gets the minimum number of different documents a term must appear in to be included in - /// the vocabulary. If this is an integer greater than or equal to 1, this specifies the - /// number of documents the term must appear in; if this is a double in [0,1), then this - /// specifies the fraction of documents. - /// - /// The minimum document term frequency - public double GetMinDF() => (double)_jvmObject.Invoke("getMinDF"); - - /// - /// Filter to ignore rare words in a document. For each document, terms with - /// frequency/count less than the given threshold are ignored. If this is an integer - /// greater than or equal to 1, then this specifies a count (of times the term must appear - /// in the document); if this is a double in [0,1), then this specifies a fraction (out of - /// the document's token count). - /// - /// Note that the parameter is only used in transform of CountVectorizerModel and does not - /// affect fitting. - /// - /// Minimum term frequency - public double GetMinTF() => (double)_jvmObject.Invoke("getMinTF"); - - /// - /// Filter to ignore rare words in a document. For each document, terms with - /// frequency/count less than the given threshold are ignored. If this is an integer - /// greater than or equal to 1, then this specifies a count (of times the term must appear - /// in the document); if this is a double in [0,1), then this specifies a fraction (out of - /// the document's token count). - /// - /// Note that the parameter is only used in transform of CountVectorizerModel and does not - /// affect fitting. - /// - /// Minimum term frequency - /// - /// New with the min term frequency set - /// - public CountVectorizerModel SetMinTF(double value) => - WrapAsCountVectorizerModel((JvmObjectReference)_jvmObject.Invoke("setMinTF", value)); - - /// - /// Gets the max size of the vocabulary. will build a - /// vocabulary that only considers the top vocabSize terms ordered by term frequency across - /// the corpus. 
-    ///
-    /// The max size of the vocabulary
-    public int GetVocabSize() => (int)_jvmObject.Invoke("getVocabSize");
-
-    private static CountVectorizerModel WrapAsCountVectorizerModel(object obj) =>
-        new CountVectorizerModel((JvmObjectReference)obj);
-    }
-}
diff --git a/src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs b/src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs
index 326268a5e..fcc90b43d 100644
--- a/src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs
+++ b/src/csharp/Microsoft.Spark/ML/Feature/FeatureBase.cs
@@ -105,8 +105,8 @@ private static T WrapAsType(JvmObjectReference reference)
         .Single(c =>
             {
                 ParameterInfo[] parameters = c.GetParameters();
-                return (parameters.Length == 1) &&
-                    (parameters[0].ParameterType == typeof(JvmObjectReference));
+                return (parameters.Length == 1) &&
+                    (parameters[0].ParameterType == typeof(JvmObjectReference));
             });
 
         return (T)constructor.Invoke(new object[] {reference});

From 3c2c936b007d7b5d761fda737625dc8f7d03728b Mon Sep 17 00:00:00 2001
From: Niharika Dutta
Date: Fri, 14 Aug 2020 13:32:54 -0700
Subject: [PATCH 15/36] fixing merge errors

---
 .gitignore                                  | 3 +++
 .../Processor/BroadcastVariableProcessor.cs | 3 ++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/.gitignore b/.gitignore
index 251cfa7e2..8e67b5699 100644
--- a/.gitignore
+++ b/.gitignore
@@ -367,3 +367,6 @@ hs_err_pid*
 
 # The target folder contains the output of building
 **/target/**
+
+# F# vs code
+.ionide/
\ No newline at end of file
diff --git a/src/csharp/Microsoft.Spark.Worker/Processor/BroadcastVariableProcessor.cs b/src/csharp/Microsoft.Spark.Worker/Processor/BroadcastVariableProcessor.cs
index 41c817d02..bf8f48ed8 100644
--- a/src/csharp/Microsoft.Spark.Worker/Processor/BroadcastVariableProcessor.cs
+++ b/src/csharp/Microsoft.Spark.Worker/Processor/BroadcastVariableProcessor.cs
@@ -54,7 +54,8 @@ internal BroadcastVariables Process(Stream stream)
             else
             {
                 string path = SerDe.ReadString(stream);
-                using FileStream fStream = File.Open(path, FileMode.Open, FileAccess.Read);
+                using FileStream fStream =
+                    File.Open(path, FileMode.Open, FileAccess.Read, FileShare.Read);
                 object value = formatter.Deserialize(fStream);
                 BroadcastRegistry.Add(bid, value);
             }

From 88e834d53b7be8931147a095a7b0df3c08cd9aa8 Mon Sep 17 00:00:00 2001
From: Niharika Dutta
Date: Wed, 19 Aug 2020 19:24:14 -0700
Subject: [PATCH 16/36] removing ionid

---
 .gitignore             |   2 +-
 .ionide/symbolCache.db | Bin 28672 -> 0 bytes
 2 files changed, 1 insertion(+), 1 deletion(-)
 delete mode 100644 .ionide/symbolCache.db

diff --git a/.gitignore b/.gitignore
index 8e67b5699..faada9c8a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -369,4 +369,4 @@ hs_err_pid*
 **/target/**
 
 # F# vs code
-.ionide/
\ No newline at end of file
+.ionide/
diff --git a/.ionide/symbolCache.db b/.ionide/symbolCache.db
deleted file mode 100644
index 43e567d6d682d85dd32b3baebb0fdf61f67c1643..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001

literal 28672
[... base85-encoded binary data omitted ...]
From f92820fd2f989bf1a70e27f5487081f115f83eef Mon Sep 17 00:00:00 2001
From: Niharika Dutta
Date: Wed, 28 Oct 2020 11:13:26 -0700
Subject: [PATCH 17/36] first commit

---
 README.md                                 |   2 +-
 benchmark/scala/pom.xml                   |   2 +-
 docs/release-notes/1.0.1/release-1.0.1.md | 181 ++++++++++++++++++++++
 eng/Versions.props                        |   2 +-
 src/scala/pom.xml                         |   2 +-
 5 files changed, 185 insertions(+), 4 deletions(-)
 create mode 100644 docs/release-notes/1.0.1/release-1.0.1.md

diff --git a/README.md b/README.md
index 03dd08e39..7813372a2 100644
--- a/README.md
+++ b/README.md
@@ -39,7 +39,7 @@
     2.3
-    v1.0.0
+    v1.0.1
     2.4*
diff --git a/benchmark/scala/pom.xml b/benchmark/scala/pom.xml
index c1f61f288..b6eb7acff 100644
--- a/benchmark/scala/pom.xml
+++ b/benchmark/scala/pom.xml
@@ -3,7 +3,7 @@
     4.0.0
     com.microsoft.spark
     microsoft-spark-benchmark
-    1.0.0
+    1.0.1
     2019
     UTF-8
diff --git a/docs/release-notes/1.0.1/release-1.0.1.md b/docs/release-notes/1.0.1/release-1.0.1.md
new file mode 100644
index 000000000..cbde5b303
--- /dev/null
+++ b/docs/release-notes/1.0.1/release-1.0.1.md
@@ -0,0 +1,181 @@
+# .NET for Apache Spark 1.0.1 Release Notes
+
+### New Features/Improvements
+
+* Use pattern matching in arrow test utils to improve readability ([#725](https://github.com/dotnet/spark/pull/725))
+* Support for Arrow 2.0 and GroupedMapUdf in Spark 3.0.0 ([#654](https://github.com/dotnet/spark/issues/654))
+
+### Bug Fixes
+
+* Fix signer information mismatch issue ([752](https://github.com/dotnet/spark/pull/752))
+
+### Infrastructure / Documentation / Etc.
+
+* Fix flaky CallbackTests.TestCallbackHandlers Test ([#745](https://github.com/dotnet/spark/pull/745))
+
+### Breaking Changes
+
+* None
+
+### Known Issues
+
+* Broadcast variables do not work with [dotnet-interactive](https://github.com/dotnet/interactive) ([#561](https://github.com/dotnet/spark/pull/561))
+* UDFs defined using class objects with closures does not work with [dotnet-interactive](https://github.com/dotnet/interactive) ([#619](https://github.com/dotnet/spark/pull/619))
+* In [dotnet-interactive](https://github.com/dotnet/interactive) blocking Spark methods that require external threads to unblock them does not work. ie `StreamingQuery.AwaitTermination` requires `StreamingQuery.Stop` to unblock ([#736](https://github.com/dotnet/spark/pull/736))
+
+### Compatibility
+
+#### Backward compatibility
+
+The following table describes the oldest version of the worker that the current version is compatible with, along with new features that are incompatible with the worker.
+
+| Oldest compatible Microsoft.Spark.Worker version | Incompatible features |
+|---|---|
+| v1.0.0 | |
+
+#### Forward compatibility
+
+The following table describes the oldest version of .NET for Apache Spark release that the current worker is compatible with.
+
+| Oldest compatible .NET for Apache Spark release version |
+|---|
+| v1.0.0 |
+
+### Supported Spark Versions
+
+The following table outlines the supported Spark versions along with the microsoft-spark JAR to use with:
+
+| Spark Version | microsoft-spark JAR |
+|---|---|
+| 2.3.* | microsoft-spark-2-3_2.11-1.0.0.jar |
+| 2.4.0 | microsoft-spark-2-4_2.11-1.0.0.jar |
+| 2.4.1 | microsoft-spark-2-4_2.11-1.0.0.jar |
+| 2.4.3 | microsoft-spark-2-4_2.11-1.0.0.jar |
+| 2.4.4 | microsoft-spark-2-4_2.11-1.0.0.jar |
+| 2.4.5 | microsoft-spark-2-4_2.11-1.0.0.jar |
+| 2.4.6 | microsoft-spark-2-4_2.11-1.0.0.jar |
+| 2.4.7 | microsoft-spark-2-4_2.11-1.0.0.jar |
+| 2.4.2 | Not supported |
+| 3.0.0 | microsoft-spark-3-0_2.12-1.0.0.jar |
+| 3.0.1 | microsoft-spark-3-0_2.12-1.0.0.jar |
+
+### Supported Delta Versions
+
+The following table outlines the supported Delta versions along with the Microsoft.Spark.Extensions version to use with:
+
+| Delta Version | Microsoft.Spark.Extensions.Delta |
+|---|---|
+| 0.1.0 | 1.0.0 |
+| 0.2.0 | 1.0.0 |
+| 0.3.0 | 1.0.0 |
+| 0.4.0 | 1.0.0 |
+| 0.5.0 | 1.0.0 |
+| 0.6.0 | 1.0.0 |
+| 0.6.1 | 1.0.0 |
+| 0.7.0 | 1.0.0 |
+
+### Supported Hyperspace Versions
+
+The following table outlines the supported Hyperspace versions along with the Microsoft.Spark.Extensions version to use with:
+
+| Hyperspace Version | Microsoft.Spark.Extensions.Hyperspace |
+|---|---|
+| 0.1.0 | 1.0.0 |
+| 0.2.0 | 1.0.0 |
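The known-issue bullet above about blocking methods ([#736]) is the one item in these notes that implies a concrete usage pattern, so a short illustration may help. Only `StreamingQuery.AwaitTermination` and `StreamingQuery.Stop` are taken from the notes; the socket source, host, port, and timeout below are placeholder assumptions for the sketch:

```csharp
using System.Threading.Tasks;
using Microsoft.Spark.Sql;
using Microsoft.Spark.Sql.Streaming;

class AwaitTerminationPattern
{
    static void Main()
    {
        SparkSession spark = SparkSession.Builder().GetOrCreate();

        // Hypothetical socket source; any streaming source behaves the same way.
        DataFrame lines = spark
            .ReadStream()
            .Format("socket")
            .Option("host", "localhost")
            .Option("port", "9999")
            .Load();

        StreamingQuery query = lines
            .WriteStream()
            .Format("console")
            .Start();

        // Stop the query from another thread; otherwise AwaitTermination()
        // blocks the calling thread indefinitely, which is what makes it
        // unusable inside dotnet-interactive (see #736 above).
        Task.Delay(10_000).ContinueWith(_ => query.Stop());

        query.AwaitTermination();
    }
}
```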
diff --git a/eng/Versions.props b/eng/Versions.props
index d5551afa3..0b98f7489 100644
--- a/eng/Versions.props
+++ b/eng/Versions.props
@@ -1,7 +1,7 @@
-    1.0.0
+    1.0.1
     prerelease
     $(RestoreSources);
diff --git a/src/scala/pom.xml b/src/scala/pom.xml
index bb0b408ae..d2539d120 100644
--- a/src/scala/pom.xml
+++ b/src/scala/pom.xml
@@ -7,7 +7,7 @@
     ${microsoft-spark.version}
     UTF-8
-    1.0.0
+    1.0.1

From 1478b1cc05b858027b1420a51b3267ec3441223d Mon Sep 17 00:00:00 2001
From: Niharika Dutta
Date: Wed, 28 Oct 2020 11:19:37 -0700
Subject: [PATCH 18/36] formatting

---
 docs/release-notes/1.0.1/release-1.0.1.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/release-notes/1.0.1/release-1.0.1.md b/docs/release-notes/1.0.1/release-1.0.1.md
index cbde5b303..8faf13b20 100644
--- a/docs/release-notes/1.0.1/release-1.0.1.md
+++ b/docs/release-notes/1.0.1/release-1.0.1.md
@@ -7,7 +7,7 @@
 ### Bug Fixes
 
-* Fix signer information mismatch issue ([752](https://github.com/dotnet/spark/pull/752))
+* Fix signer information mismatch issue ([#752](https://github.com/dotnet/spark/pull/752))
@@ -39,6 +39,7 @@ The following table describes the oldest version of the worker that the current
     v1.0.0
+

From cfb81544ab1ea709a89d19e9a6154e623991a086 Mon Sep 17 00:00:00 2001
From: Niharika Dutta
Date: Wed, 28 Oct 2020 11:20:57 -0700
Subject: [PATCH 19/36] formatting

---
 docs/release-notes/1.0.1/release-1.0.1.md | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/docs/release-notes/1.0.1/release-1.0.1.md b/docs/release-notes/1.0.1/release-1.0.1.md
index 8faf13b20..a2bd97341 100644
--- a/docs/release-notes/1.0.1/release-1.0.1.md
+++ b/docs/release-notes/1.0.1/release-1.0.1.md
@@ -38,15 +38,9 @@ The following table describes the oldest version of the worker that the current
-    v1.0.0
+    v1.0.0
-
-
-
-
-
-

From a0e556a4b66069ab54b40a6adf7af056d8e2373f Mon Sep 17 00:00:00 2001
From: Niharika Dutta
Date: Wed, 28 Oct 2020 11:22:03 -0700
Subject: [PATCH 20/36] update jar name

---
 docs/release-notes/1.0.1/release-1.0.1.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/release-notes/1.0.1/release-1.0.1.md b/docs/release-notes/1.0.1/release-1.0.1.md
index a2bd97341..ce429b00b 100644
--- a/docs/release-notes/1.0.1/release-1.0.1.md
+++ b/docs/release-notes/1.0.1/release-1.0.1.md
@@ -75,11 +75,11 @@ The following table outlines the supported Spark versions along with the microso
     2.3.*
-    microsoft-spark-2-3_2.11-1.0.0.jar
+    microsoft-spark-2-3_2.11-1.0.1.jar
     2.4.0
-    microsoft-spark-2-4_2.11-1.0.0.jar
+    microsoft-spark-2-4_2.11-1.0.1.jar
     2.4.1
@@ -105,7 +105,7 @@ The following table outlines the supported Spark versions along with the microso
     3.0.0
-    microsoft-spark-3-0_2.12-1.0.0.jar
+    microsoft-spark-3-0_2.12-1.0.1.jar
     3.0.1

From 0d5e89c67b3fd9f30ffc143e11ebe8b1f2bd20ae Mon Sep 17 00:00:00 2001
From: Niharika Dutta
Date: Wed, 28 Oct 2020 11:23:26 -0700
Subject: [PATCH 21/36] fix table

---
 docs/release-notes/1.0.1/release-1.0.1.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/docs/release-notes/1.0.1/release-1.0.1.md b/docs/release-notes/1.0.1/release-1.0.1.md
index ce429b00b..0f38dcb53 100644
--- a/docs/release-notes/1.0.1/release-1.0.1.md
+++ b/docs/release-notes/1.0.1/release-1.0.1.md
@@ -33,13 +33,11 @@ The following table describes the oldest version of the worker that the current
     Oldest compatible Microsoft.Spark.Worker version
-    Incompatible features
     v1.0.0
-

From b9801c8a943a5897d7d671ea1aa1eb796df2408a Mon Sep 17
00:00:00 2001 From: Niharika Dutta Date: Wed, 4 Nov 2020 03:18:42 -0800 Subject: [PATCH 22/36] Updating release notes --- docs/release-notes/1.0.1/release-1.0.1.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/docs/release-notes/1.0.1/release-1.0.1.md b/docs/release-notes/1.0.1/release-1.0.1.md index 0f38dcb53..52191e1dc 100644 --- a/docs/release-notes/1.0.1/release-1.0.1.md +++ b/docs/release-notes/1.0.1/release-1.0.1.md @@ -2,16 +2,19 @@ ### New Features/Improvements -* Use pattern matching in arrow test utils to improve readability ([#725](https://github.com/dotnet/spark/pull/725)) * Support for Arrow 2.0 and GroupedMapUdf in Spark 3.0.0 ([#654](https://github.com/dotnet/spark/issues/654)) +* Use pattern matching in arrow test utils to improve readability ([#725](https://github.com/dotnet/spark/pull/725)) +* Check whether file is found before trying to dereference it ([#759](https://github.com/dotnet/spark/pull/759)) ### Bug Fixes * Fix signer information mismatch issue ([#752](https://github.com/dotnet/spark/pull/752)) +* Fix package-worker.ps1 to handle output path with ":" ([#742](https://github.com/dotnet/spark/pull/742)) ### Infrastructure / Documentation / Etc. * Fix flaky CallbackTests.TestCallbackHandlers Test ([#745](https://github.com/dotnet/spark/pull/745)) +* Run E2E tests on Linux in build pipeline and add Backward/Forward E2E tests ([#737](https://github.com/dotnet/spark/pull/737)) ### Breaking Changes @@ -125,7 +128,7 @@ The following table outlines the supported Delta versions along with the Microso 0.1.0 - 1.0.0 + 1.0.1 0.2.0 @@ -165,7 +168,7 @@ The following table outlines the supported Hyperspace versions along with the Mi 0.1.0 - 1.0.0 + 1.0.1 0.2.0 From b66f23fca6649fdaffc7880b6e2b73739a69fefe Mon Sep 17 00:00:00 2001 From: Niharika Dutta Date: Wed, 4 Nov 2020 17:51:18 -0800 Subject: [PATCH 23/36] PR comments --- docs/release-notes/1.0.1/release-1.0.1.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/docs/release-notes/1.0.1/release-1.0.1.md b/docs/release-notes/1.0.1/release-1.0.1.md index 52191e1dc..1373e0575 100644 --- a/docs/release-notes/1.0.1/release-1.0.1.md +++ b/docs/release-notes/1.0.1/release-1.0.1.md @@ -25,6 +25,7 @@ * Broadcast variables do not work with [dotnet-interactive](https://github.com/dotnet/interactive) ([#561](https://github.com/dotnet/spark/pull/561)) * UDFs defined using class objects with closures does not work with [dotnet-interactive](https://github.com/dotnet/interactive) ([#619](https://github.com/dotnet/spark/pull/619)) * In [dotnet-interactive](https://github.com/dotnet/interactive) blocking Spark methods that require external threads to unblock them does not work. 
ie `StreamingQuery.AwaitTermination` requires `StreamingQuery.Stop` to unblock ([#736](https://github.com/dotnet/spark/pull/736)) +* UDFs don't work in Linux with Spark 2.3.0 ([#753]https://github.com/dotnet/spark/issues/753)) ### Compatibility @@ -36,11 +37,13 @@ The following table describes the oldest version of the worker that the current Oldest compatible Microsoft.Spark.Worker version + Incompatible features - v1.0.0 + v1.0.0 + GroupedMap in Spark 3.0 is not compatible with Worker 1.0 (#654) From a58fa74854bb06121d82985f637c758772c8e005 Mon Sep 17 00:00:00 2001 From: Niharika Dutta Date: Wed, 4 Nov 2020 17:52:47 -0800 Subject: [PATCH 24/36] fixed formatting --- docs/release-notes/1.0.1/release-1.0.1.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/release-notes/1.0.1/release-1.0.1.md b/docs/release-notes/1.0.1/release-1.0.1.md index 1373e0575..122dc0bb4 100644 --- a/docs/release-notes/1.0.1/release-1.0.1.md +++ b/docs/release-notes/1.0.1/release-1.0.1.md @@ -25,7 +25,7 @@ * Broadcast variables do not work with [dotnet-interactive](https://github.com/dotnet/interactive) ([#561](https://github.com/dotnet/spark/pull/561)) * UDFs defined using class objects with closures does not work with [dotnet-interactive](https://github.com/dotnet/interactive) ([#619](https://github.com/dotnet/spark/pull/619)) * In [dotnet-interactive](https://github.com/dotnet/interactive) blocking Spark methods that require external threads to unblock them does not work. ie `StreamingQuery.AwaitTermination` requires `StreamingQuery.Stop` to unblock ([#736](https://github.com/dotnet/spark/pull/736)) -* UDFs don't work in Linux with Spark 2.3.0 ([#753]https://github.com/dotnet/spark/issues/753)) +* UDFs don't work in Linux with Spark 2.3.0 ([#753](https://github.com/dotnet/spark/issues/753)) ### Compatibility From a64fd508d30daed6dfcfbb706d1416d6c57f941a Mon Sep 17 00:00:00 2001 From: Niharika Dutta Date: Fri, 6 Nov 2020 00:51:35 -0800 Subject: [PATCH 25/36] update --- docs/release-notes/1.0.1/release-1.0.1.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/docs/release-notes/1.0.1/release-1.0.1.md b/docs/release-notes/1.0.1/release-1.0.1.md index 122dc0bb4..11e64e0e6 100644 --- a/docs/release-notes/1.0.1/release-1.0.1.md +++ b/docs/release-notes/1.0.1/release-1.0.1.md @@ -5,16 +5,19 @@ * Support for Arrow 2.0 and GroupedMapUdf in Spark 3.0.0 ([#654](https://github.com/dotnet/spark/issues/654)) * Use pattern matching in arrow test utils to improve readability ([#725](https://github.com/dotnet/spark/pull/725)) * Check whether file is found before trying to dereference it ([#759](https://github.com/dotnet/spark/pull/759)) +* Ml/feature hasher has only internal contructors ([#761](https://github.com/dotnet/spark/pull/761)) ### Bug Fixes * Fix signer information mismatch issue ([#752](https://github.com/dotnet/spark/pull/752)) * Fix package-worker.ps1 to handle output path with ":" ([#742](https://github.com/dotnet/spark/pull/742)) +* Fixes for TimestampType and DateType conversion ([#765](https://github.com/dotnet/spark/pull/765)) ### Infrastructure / Documentation / Etc. 
* Fix flaky CallbackTests.TestCallbackHandlers Test ([#745](https://github.com/dotnet/spark/pull/745)) * Run E2E tests on Linux in build pipeline and add Backward/Forward E2E tests ([#737](https://github.com/dotnet/spark/pull/737)) +* Add comments and cleanup azure pipeline ([#764](https://github.com/dotnet/spark/pull/764)) ### Breaking Changes From 4fe60a9f529950910aeffbff99f8d3a26fc36cee Mon Sep 17 00:00:00 2001 From: Niharika Dutta Date: Fri, 6 Nov 2020 14:25:25 -0800 Subject: [PATCH 26/36] updating release notes --- docs/release-notes/1.0.1/release-1.0.1.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/release-notes/1.0.1/release-1.0.1.md b/docs/release-notes/1.0.1/release-1.0.1.md index 11e64e0e6..279e7ba90 100644 --- a/docs/release-notes/1.0.1/release-1.0.1.md +++ b/docs/release-notes/1.0.1/release-1.0.1.md @@ -12,6 +12,7 @@ * Fix signer information mismatch issue ([#752](https://github.com/dotnet/spark/pull/752)) * Fix package-worker.ps1 to handle output path with ":" ([#742](https://github.com/dotnet/spark/pull/742)) * Fixes for TimestampType and DateType conversion ([#765](https://github.com/dotnet/spark/pull/765)) +* Fix for using Broadcast variables in Databricks ([#766](https://github.com/dotnet/spark/pull/766)) ### Infrastructure / Documentation / Etc. From 12361131ee8f03ed24f5a082b5616c83f2c4b5ab Mon Sep 17 00:00:00 2001 From: Niharika Dutta Date: Fri, 26 Mar 2021 12:11:38 -0700 Subject: [PATCH 27/36] prep for 1.1 --- README.md | 2 +- .../release-1.0.1.md => 1.1/release-1.1.md} | 19 +++++++++++++++++-- eng/Versions.props | 2 +- src/scala/pom.xml | 2 +- 4 files changed, 20 insertions(+), 5 deletions(-) rename docs/release-notes/{1.0.1/release-1.0.1.md => 1.1/release-1.1.md} (76%) diff --git a/README.md b/README.md index 7813372a2..8e2f068ec 100644 --- a/README.md +++ b/README.md @@ -39,7 +39,7 @@ 2.3 - v1.0.1 + v1.1 2.4* diff --git a/docs/release-notes/1.0.1/release-1.0.1.md b/docs/release-notes/1.1/release-1.1.md similarity index 76% rename from docs/release-notes/1.0.1/release-1.0.1.md rename to docs/release-notes/1.1/release-1.1.md index 279e7ba90..04e691250 100644 --- a/docs/release-notes/1.0.1/release-1.0.1.md +++ b/docs/release-notes/1.1/release-1.1.md @@ -1,11 +1,22 @@ -# .NET for Apache Spark 1.0.1 Release Notes +# .NET for Apache Spark 1.1 Release Notes ### New Features/Improvements -* Support for Arrow 2.0 and GroupedMapUdf in Spark 3.0.0 ([#654](https://github.com/dotnet/spark/issues/654)) +* Support for Arrow 2.0 and GroupedMapUdf in Spark 3.0.0 ([#711](https://github.com/dotnet/spark/pull/711)) * Use pattern matching in arrow test utils to improve readability ([#725](https://github.com/dotnet/spark/pull/725)) * Check whether file is found before trying to dereference it ([#759](https://github.com/dotnet/spark/pull/759)) * Ml/feature hasher has only internal contructors ([#761](https://github.com/dotnet/spark/pull/761)) +* Support for stop words removers ([#726](https://github.com/dotnet/spark/pull/726)) +* Support for adding NGram functionality ([#734](https://github.com/dotnet/spark/pull/734)) +* Add support for SQLTransformer ML feature ([#781](https://github.com/dotnet/spark/pull/781)) +* Add skeletal support for FileSystem extension ([#787](https://github.com/dotnet/spark/pull/787)) +* Using (processId, threadId) as key to mantain threadpool executor instead of only threadId ([#793](https://github.com/dotnet/spark/pull/793)) +* Support for Hyperspace 0.4.0 ([#815](https://github.com/dotnet/spark/pull/815)) +* Support for Delta Lake 0.8.0 
([#823](https://github.com/dotnet/spark/pull/823)) +* Add support for Spark 3.0.2 ([#833](https://github.com/dotnet/spark/pull/833)) +* Migrating master to main branch ([#847](https://github.com/dotnet/spark/pull/847), [#849](https://github.com/dotnet/spark/pull/849)) +* Add DOTNET_WORKER__DIR environment variable ([#861](https://github.com/dotnet/spark/pull/861)) +* Add spark.dotnet.ignoreSparkPatchVersionCheck conf to ignore patch version in DotnetRunner ([#862](https://github.com/dotnet/spark/pull/862)) ### Bug Fixes @@ -13,12 +24,16 @@ * Fix package-worker.ps1 to handle output path with ":" ([#742](https://github.com/dotnet/spark/pull/742)) * Fixes for TimestampType and DateType conversion ([#765](https://github.com/dotnet/spark/pull/765)) * Fix for using Broadcast variables in Databricks ([#766](https://github.com/dotnet/spark/pull/766)) +* Fix macOS Catalina Permissions ([#784](https://github.com/dotnet/spark/pull/784)) +* Fix for memory leak in JVMObjectTracker ([#801](https://github.com/dotnet/spark/pull/801)) ### Infrastructure / Documentation / Etc. * Fix flaky CallbackTests.TestCallbackHandlers Test ([#745](https://github.com/dotnet/spark/pull/745)) * Run E2E tests on Linux in build pipeline and add Backward/Forward E2E tests ([#737](https://github.com/dotnet/spark/pull/737)) * Add comments and cleanup azure pipeline ([#764](https://github.com/dotnet/spark/pull/764)) +* Update dotnet-interactive deprecated feed ([#807](https://github.com/dotnet/spark/pull/807), [#808](https://github.com/dotnet/spark/pull/808)) +* Remove unnecessary RestoreSources ([#812](https://github.com/dotnet/spark/pull/812)) ### Breaking Changes diff --git a/eng/Versions.props b/eng/Versions.props index 91b34b10c..510b6b475 100644 --- a/eng/Versions.props +++ b/eng/Versions.props @@ -1,7 +1,7 @@ - 1.0.1 + 1.1 prerelease diff --git a/src/scala/pom.xml b/src/scala/pom.xml index 8dc6f4695..6ecc4ef20 100644 --- a/src/scala/pom.xml +++ b/src/scala/pom.xml @@ -7,7 +7,7 @@ ${microsoft-spark.version} UTF-8 - 1.0.1 + 1.1 From 907efc63d8a14ec246769bba8789f89e334524ba Mon Sep 17 00:00:00 2001 From: Niharika Dutta Date: Fri, 26 Mar 2021 12:20:48 -0700 Subject: [PATCH 28/36] changes --- README.md | 2 +- benchmark/scala/pom.xml | 2 +- docs/release-notes/1.1/release-1.1.md | 13 +++++++++++-- eng/Versions.props | 2 +- src/scala/pom.xml | 2 +- 5 files changed, 15 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index 8e2f068ec..20f570bdd 100644 --- a/README.md +++ b/README.md @@ -39,7 +39,7 @@ 2.3 - v1.1 + v1.1.0 2.4* diff --git a/benchmark/scala/pom.xml b/benchmark/scala/pom.xml index b6eb7acff..656297d99 100644 --- a/benchmark/scala/pom.xml +++ b/benchmark/scala/pom.xml @@ -3,7 +3,7 @@ 4.0.0 com.microsoft.spark microsoft-spark-benchmark - 1.0.1 + 1.1.0 2019 UTF-8 diff --git a/docs/release-notes/1.1/release-1.1.md b/docs/release-notes/1.1/release-1.1.md index 04e691250..5650e0496 100644 --- a/docs/release-notes/1.1/release-1.1.md +++ b/docs/release-notes/1.1/release-1.1.md @@ -128,10 +128,13 @@ The following table outlines the supported Spark versions along with the microso 3.0.0 - microsoft-spark-3-0_2.12-1.0.1.jar + microsoft-spark-3-0_2.12-1.1.0.jar 3.0.1 + + + 3.0.2 @@ -172,6 +175,9 @@ The following table outlines the supported Delta versions along with the Microso 0.7.0 + + + 0.8.0 @@ -190,10 +196,13 @@ The following table outlines the supported Hyperspace versions along with the Mi 0.1.0 - 1.0.1 + 1.1.0 0.2.0 + + + 0.4.0 diff --git a/eng/Versions.props b/eng/Versions.props index 510b6b475..fc82f0815 
100644 --- a/eng/Versions.props +++ b/eng/Versions.props @@ -1,7 +1,7 @@ - 1.1 + 1.1.0 prerelease diff --git a/src/scala/pom.xml b/src/scala/pom.xml index 6ecc4ef20..37d691701 100644 --- a/src/scala/pom.xml +++ b/src/scala/pom.xml @@ -7,7 +7,7 @@ ${microsoft-spark.version} UTF-8 - 1.1 + 1.1.0 From 63d2648931815c9b13c388cde90871391d14a387 Mon Sep 17 00:00:00 2001 From: Niharika Dutta Date: Fri, 26 Mar 2021 12:25:10 -0700 Subject: [PATCH 29/36] fix formatting --- docs/release-notes/1.1/release-1.1.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/release-notes/1.1/release-1.1.md b/docs/release-notes/1.1/release-1.1.md index 5650e0496..8158ff1d8 100644 --- a/docs/release-notes/1.1/release-1.1.md +++ b/docs/release-notes/1.1/release-1.1.md @@ -98,11 +98,11 @@ The following table outlines the supported Spark versions along with the microso 2.3.* - microsoft-spark-2-3_2.11-1.0.1.jar + microsoft-spark-2-3_2.11-1.1.0.jar 2.4.0 - microsoft-spark-2-4_2.11-1.0.1.jar + microsoft-spark-2-4_2.11-1.1.0.jar 2.4.1 @@ -153,7 +153,7 @@ The following table outlines the supported Delta versions along with the Microso 0.1.0 - 1.0.1 + 1.1.0 0.2.0 @@ -196,7 +196,7 @@ The following table outlines the supported Hyperspace versions along with the Mi 0.1.0 - 1.1.0 + 1.1.0 0.2.0 From 1fceb0fbd9c2ed4c737422675882d184454c5577 Mon Sep 17 00:00:00 2001 From: Niharika Dutta Date: Fri, 26 Mar 2021 18:21:39 -0700 Subject: [PATCH 30/36] PR review comment --- docs/release-notes/1.1/release-1.1.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/release-notes/1.1/release-1.1.md b/docs/release-notes/1.1/release-1.1.md index 8158ff1d8..434749c2c 100644 --- a/docs/release-notes/1.1/release-1.1.md +++ b/docs/release-notes/1.1/release-1.1.md @@ -22,10 +22,10 @@ * Fix signer information mismatch issue ([#752](https://github.com/dotnet/spark/pull/752)) * Fix package-worker.ps1 to handle output path with ":" ([#742](https://github.com/dotnet/spark/pull/742)) -* Fixes for TimestampType and DateType conversion ([#765](https://github.com/dotnet/spark/pull/765)) * Fix for using Broadcast variables in Databricks ([#766](https://github.com/dotnet/spark/pull/766)) * Fix macOS Catalina Permissions ([#784](https://github.com/dotnet/spark/pull/784)) * Fix for memory leak in JVMObjectTracker ([#801](https://github.com/dotnet/spark/pull/801)) +* Bug Fix for Spark 3.x - Avoid converting converted Row values ([#868](https://github.com/dotnet/spark/pull/868)) ### Infrastructure / Documentation / Etc. 
From 2219cdd153eb9eb6d38e3ad8c4eec8d3963a2451 Mon Sep 17 00:00:00 2001
From: Niharika Dutta
Date: Fri, 26 Mar 2021 18:23:21 -0700
Subject: [PATCH 31/36] PR review comment

---
 docs/release-notes/1.1/release-1.1.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/docs/release-notes/1.1/release-1.1.md b/docs/release-notes/1.1/release-1.1.md
index 434749c2c..c396d0d10 100644
--- a/docs/release-notes/1.1/release-1.1.md
+++ b/docs/release-notes/1.1/release-1.1.md
@@ -196,10 +196,13 @@ The following table outlines the supported Hyperspace versions along with the Mi
     0.1.0
-    1.1.0
+    1.1.0
     0.2.0
+
+
+    0.3.0
     0.4.0
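An illustrative aside on the Hyperspace versions tabulated above: they ship through the `Microsoft.Spark.Extensions.Hyperspace` package. Below is a minimal usage sketch, assuming an existing `SparkSession spark`, a `DataFrame df`, and made-up index and column names.

```csharp
using Microsoft.Spark.Extensions.Hyperspace;
using Microsoft.Spark.Extensions.Hyperspace.Index;

// Create an index over df and list the indexes known to Hyperspace.
var hyperspace = new Hyperspace(spark);
hyperspace.CreateIndex(df, new IndexConfig(
    "sampleIndex",           // index name (illustrative)
    new[] { "id" },          // indexed columns (illustrative)
    new[] { "value" }));     // included columns (illustrative)
hyperspace.Indexes().Show();
```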
From f4439338a87f2699dac4aa94358c25957df33970 Mon Sep 17 00:00:00 2001
From: Niharika Dutta
Date: Fri, 26 Mar 2021 21:30:44 -0700
Subject: [PATCH 32/36] PR review comment

---
 docs/release-notes/1.1/release-1.1.md | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/docs/release-notes/1.1/release-1.1.md b/docs/release-notes/1.1/release-1.1.md
index c396d0d10..b6532983b 100644
--- a/docs/release-notes/1.1/release-1.1.md
+++ b/docs/release-notes/1.1/release-1.1.md
@@ -1,5 +1,10 @@
 # .NET for Apache Spark 1.1 Release Notes
 
+### Deprecation notice for Spark 2.3
+
+We are planning to drop the support for Spark 2.3 in the 2.0 release, which is scheduled in April 2021.
+The last Spark 2.3 release (2.3.4) was back in September 2019, and no new release is planned for Spark 2.3. Since there have been no new features introduced for Spark 2.3 in the last few releases of .NET for Apache Spark, if you are relying on Spark 2.3, you should be able to continue using .NET for Apache Spark v1.0.0.
+
 ### New Features/Improvements
@@ -14,7 +19,6 @@
 * Support for Hyperspace 0.4.0 ([#815](https://github.com/dotnet/spark/pull/815))
 * Support for Delta Lake 0.8.0 ([#823](https://github.com/dotnet/spark/pull/823))
 * Add support for Spark 3.0.2 ([#833](https://github.com/dotnet/spark/pull/833))
-* Migrating master to main branch ([#847](https://github.com/dotnet/spark/pull/847), [#849](https://github.com/dotnet/spark/pull/849))
 * Add DOTNET_WORKER_<VERSION>_DIR environment variable ([#861](https://github.com/dotnet/spark/pull/861))
 * Add spark.dotnet.ignoreSparkPatchVersionCheck conf to ignore patch version in DotnetRunner ([#862](https://github.com/dotnet/spark/pull/862))
@@ -26,6 +30,7 @@
 * Fix macOS Catalina Permissions ([#784](https://github.com/dotnet/spark/pull/784))
 * Fix for memory leak in JVMObjectTracker ([#801](https://github.com/dotnet/spark/pull/801))
 * Bug Fix for Spark 3.x - Avoid converting converted Row values ([#868](https://github.com/dotnet/spark/pull/868))
+* Add 'Z' to the string format in Timestamp.ToString() to indicate UTC time ([#871](https://github.com/dotnet/spark/pull/871))
 
 ### Infrastructure / Documentation / Etc.
@@ -34,6 +39,7 @@
 * Add comments and cleanup azure pipeline ([#764](https://github.com/dotnet/spark/pull/764))
 * Update dotnet-interactive deprecated feed ([#807](https://github.com/dotnet/spark/pull/807), [#808](https://github.com/dotnet/spark/pull/808))
 * Remove unnecessary RestoreSources ([#812](https://github.com/dotnet/spark/pull/812))
+* Migrating master to main branch ([#847](https://github.com/dotnet/spark/pull/847), [#849](https://github.com/dotnet/spark/pull/849))
@@ -62,10 +68,11 @@ The following table describes the oldest version of the worker that the current
     v1.0.0
-    GroupedMap in Spark 3.0 is not compatible with Worker 1.0 (#654)
+    GroupedMap in Spark 3.0 is not compatible with Worker 1.0 (#654)\*
+\* This is not a breaking change since this feature never worked with Worker 1.0.0.
 
 #### Forward compatibility

From 893b09f9bd78d3f549ac31912ead1828f7266b9f Mon Sep 17 00:00:00 2001
From: Niharika Dutta
Date: Fri, 26 Mar 2021 21:32:46 -0700
Subject: [PATCH 33/36] change

---
 docs/release-notes/1.1/release-1.1.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/release-notes/1.1/release-1.1.md b/docs/release-notes/1.1/release-1.1.md
index b6532983b..c451d0755 100644
--- a/docs/release-notes/1.1/release-1.1.md
+++ b/docs/release-notes/1.1/release-1.1.md
@@ -68,11 +68,11 @@ The following table describes the oldest version of the worker that the current
     v1.0.0
-    GroupedMap in Spark 3.0 is not compatible with Worker 1.0 (#654)\*
+    GroupedMap in Spark 3.0 is not compatible with Worker 1.0 (#654)*
 
-\* This is not a breaking change since this feature never worked with Worker 1.0.0.
+* This is not a breaking change since this feature never worked with Worker 1.0.0.
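An illustrative aside on the `Timestamp.ToString()` change ([#871]) recorded above: a sketch of the intended behavior, where the exact rendered string is an assumption rather than output captured from the patch.

```csharp
using System;
using Microsoft.Spark.Sql.Types;

var ts = new Timestamp(new DateTime(2021, 3, 26, 12, 0, 0, DateTimeKind.Utc));

// After [#871], the rendered string carries a trailing 'Z' to mark UTC,
// e.g. something like "2021-03-26 12:00:00.000000Z" (format illustrative).
Console.WriteLine(ts.ToString());
```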
From d6acb4352b5c6b959d40d446a1332407101fd40e Mon Sep 17 00:00:00 2001
From: Terry Kim
Date: Sat, 27 Mar 2021 08:43:36 -0700
Subject: [PATCH 34/36] Update docs/release-notes/1.1/release-1.1.md

---
 docs/release-notes/1.1/release-1.1.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/release-notes/1.1/release-1.1.md b/docs/release-notes/1.1/release-1.1.md
index c451d0755..6a8a39c8f 100644
--- a/docs/release-notes/1.1/release-1.1.md
+++ b/docs/release-notes/1.1/release-1.1.md
@@ -3,7 +3,7 @@
 ### Deprecation notice for Spark 2.3
 
 We are planning to drop the support for Spark 2.3 in the 2.0 release, which is scheduled in April 2021.
-The last Spark 2.3 release (2.3.4) was back in September 2019, and no new release is planned for Spark 2.3. Since there have been no new features introduced for Spark 2.3 in the last few releases of .NET for Apache Spark, if you are relying on Spark 2.3, you should be able to continue using .NET for Apache Spark v1.0.0.
+The last Spark 2.3 release (2.3.4) was back in September 2019, and no new release is planned for Spark 2.3. Since there have been no new features introduced for Spark 2.3 in the last few releases of .NET for Apache Spark, if you are relying on Spark 2.3, you should be able to continue using .NET for Apache Spark 1.x.
 
 ### New Features/Improvements

From fb8ce2f2521fa92f1bab3412051b804a47f4aa22 Mon Sep 17 00:00:00 2001
From: Terry Kim
Date: Sat, 27 Mar 2021 08:43:59 -0700
Subject: [PATCH 35/36] Update docs/release-notes/1.1/release-1.1.md

---
 docs/release-notes/1.1/release-1.1.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/docs/release-notes/1.1/release-1.1.md b/docs/release-notes/1.1/release-1.1.md
index 6a8a39c8f..fa227f5ef 100644
--- a/docs/release-notes/1.1/release-1.1.md
+++ b/docs/release-notes/1.1/release-1.1.md
@@ -36,7 +36,6 @@ The last Spark 2.3 release (2.3.4) was back in September 2019, and no new releas
 * Fix flaky CallbackTests.TestCallbackHandlers Test ([#745](https://github.com/dotnet/spark/pull/745))
 * Run E2E tests on Linux in build pipeline and add Backward/Forward E2E tests ([#737](https://github.com/dotnet/spark/pull/737))
-* Add comments and cleanup azure pipeline ([#764](https://github.com/dotnet/spark/pull/764))
 * Update dotnet-interactive deprecated feed ([#807](https://github.com/dotnet/spark/pull/807), [#808](https://github.com/dotnet/spark/pull/808))
 * Remove unnecessary RestoreSources ([#812](https://github.com/dotnet/spark/pull/812))
 * Migrating master to main branch ([#847](https://github.com/dotnet/spark/pull/847), [#849](https://github.com/dotnet/spark/pull/849))

From af0f631b1e3fa8b0dcd108c2446735b1aa2d0c56 Mon Sep 17 00:00:00 2001
From: Terry Kim
Date: Sat, 27 Mar 2021 08:45:22 -0700
Subject: [PATCH 36/36] Update docs/release-notes/1.1/release-1.1.md

---
 docs/release-notes/1.1/release-1.1.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/release-notes/1.1/release-1.1.md b/docs/release-notes/1.1/release-1.1.md
index fa227f5ef..3503d1076 100644
--- a/docs/release-notes/1.1/release-1.1.md
+++ b/docs/release-notes/1.1/release-1.1.md
@@ -2,7 +2,7 @@
 
 ### Deprecation notice for Spark 2.3
 
-We are planning to drop the support for Spark 2.3 in the 2.0 release, which is scheduled in April 2021.
+We are planning to drop the support for Spark 2.3 in the 2.0 release, which will be the next release.
 The last Spark 2.3 release (2.3.4) was back in September 2019, and no new release is planned for Spark 2.3. Since there have been no new features introduced for Spark 2.3 in the last few releases of .NET for Apache Spark, if you are relying on Spark 2.3, you should be able to continue using .NET for Apache Spark 1.x.
 
 ### New Features/Improvements
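A closing illustrative aside on the Delta Lake 0.8.0 support called out in these notes: a minimal sketch using the `Microsoft.Spark.Extensions.Delta` package, where the table path is a placeholder and `spark` is an existing session.

```csharp
using Microsoft.Spark.Extensions.Delta.Tables;

// Load an existing Delta table by path and inspect its contents.
DeltaTable deltaTable = DeltaTable.ForPath(spark, "/tmp/delta-table");
deltaTable.ToDF().Show();

// Clean up files no longer referenced by the table (default retention).
deltaTable.Vacuum();
```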