Commit 5266c54

Update README.rst
The block_size parameter passed in the examples should be named bsize, matching the parameter names of the ArrayRDD and DictRDD classes.
1 parent 4193104 commit 5266c54

1 file changed (+5, -5 lines)

README.rst

Lines changed: 5 additions & 5 deletions
@@ -36,7 +36,7 @@ Sparkit-learn introduces two important distributed data format:
 rdd = sc.parallelize(data, 2) # each partition with 10 elements
 # ArrayRDD
 # each partition will contain blocks with 5 elements
-X = ArrayRDD(rdd, block_size=5) # 4 blocks, 2 in each partition
+X = ArrayRDD(rdd, bsize=5) # 4 blocks, 2 in each partition

 Basic operations:
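
The comments in this hunk describe what bsize controls: 20 elements in 2 partitions of 10, cut into blocks of 5, give 4 blocks with 2 in each partition. The plain-Python sketch below reproduces that arithmetic without Spark. Both helpers (split_into_partitions and block_partitions) are hypothetical illustrations, not part of sparkit-learn or PySpark, and the sketch assumes the last block of a partition may be shorter than bsize.

```python
def split_into_partitions(data, num_partitions):
    """Hypothetical helper: split data into contiguous, nearly equal slices,
    similar in spirit to sc.parallelize(data, num_partitions) for a small list."""
    n = len(data)
    return [data[i * n // num_partitions:(i + 1) * n // num_partitions]
            for i in range(num_partitions)]


def block_partitions(partitions, bsize):
    """Hypothetical helper: cut each partition into blocks of at most bsize elements."""
    return [
        [part[start:start + bsize] for start in range(0, len(part), bsize)]
        for part in partitions
    ]


data = list(range(20))
partitions = split_into_partitions(data, 2)   # 2 partitions, 10 elements each
blocked = block_partitions(partitions, bsize=5)

num_blocks = sum(len(p) for p in blocked)
print(num_blocks)                  # 4
print([len(p) for p in blocked])   # [2, 2]
print(blocked[0][0])               # [0, 1, 2, 3, 4]
```

Changing bsize to 10 in this sketch would give one block per partition, which is the trade-off the parameter exposes: fewer, larger blocks versus more, smaller ones.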

@@ -84,19 +84,19 @@ Sparkit-learn introduces two important distributed data format:
 X = range(20)
 y = range(2) * 10
 # PySpark RDD with 2 partitions
-X_rdd = sc.parallelize(data_X, 2) # each partition with 10 elements
-y_rdd = sc.parallelize(data_y, 2) # each partition with 10 elements
+X_rdd = sc.parallelize(X, 2) # each partition with 10 elements
+y_rdd = sc.parallelize(y, 2) # each partition with 10 elements
 zipped_rdd = X_rdd.zip(y_rdd) # zip the two rdd's together
 # DictRDD
 # each partition will contain blocks with 5 elements
-Z = DictRDD(zipped_rdd, columns=('X', 'y'), block_size=5) # 4 blocks, 2/partition
+Z = DictRDD(zipped_rdd, columns=('X', 'y'), bsize=5) # 4 blocks, 2/partition

 # or:
 import numpy as np

 data = np.array([range(20), range(2)*10]).T
 rdd = sc.parallelize(data, 2)
-Z = DictRDD(rdd, columns=('X', 'y'), block_size=5)
+Z = DictRDD(rdd, columns=('X', 'y'), bsize=5)

 Basic operations:
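
The DictRDD example zips X and y element by element and then groups the pairs into blocks of bsize=5, one column per name in columns=('X', 'y'). The sketch below mimics that layout in plain Python, without Spark. The helpers are hypothetical, and representing each block as a dict keyed by column name is only an approximation of how DictRDD actually stores its blocks. Note that the README's range(2) * 10 is Python 2 syntax; in Python 3 it has to be written list(range(2)) * 10.

```python
def split_into_partitions(data, num_partitions):
    """Hypothetical helper: contiguous, nearly equal slices of a list."""
    n = len(data)
    return [data[i * n // num_partitions:(i + 1) * n // num_partitions]
            for i in range(num_partitions)]


def block_columns(partition, columns, bsize):
    """Hypothetical helper: group zipped records of one partition into blocks of
    at most bsize records, storing each column as its own list."""
    blocks = []
    for start in range(0, len(partition), bsize):
        chunk = partition[start:start + bsize]
        # transpose records [(x0, y0), (x1, y1), ...] into per-column lists
        blocks.append({name: [record[i] for record in chunk]
                       for i, name in enumerate(columns)})
    return blocks


X = list(range(20))
y = list(range(2)) * 10   # Python 3 spelling of range(2) * 10

X_parts = split_into_partitions(X, 2)
y_parts = split_into_partitions(y, 2)
# like X_rdd.zip(y_rdd): pair up elements partition by partition
zipped_parts = [list(zip(xp, yp)) for xp, yp in zip(X_parts, y_parts)]

Z = [block_columns(part, ("X", "y"), bsize=5) for part in zipped_parts]
print(sum(len(p) for p in Z))   # 4
print(Z[0][0])                  # {'X': [0, 1, 2, 3, 4], 'y': [0, 1, 0, 1, 0]}
```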
