
Commit 0fa0927

Initial code commit
1 parent c9038c3 commit 0fa0927

4 files changed

+396
-0
lines changed


.gitignore

Lines changed: 3 additions & 0 deletions
@@ -32,3 +32,6 @@ node_modules
3232

3333
# Optional REPL history
3434
.node_repl_history
35+
36+
# Webstorm
37+
.idea

README.md

Lines changed: 107 additions & 0 deletions
@@ -1,2 +1,109 @@
11
# node-csv-reader
2+
3+
[![npm Version](https://badge.fury.io/js/csv-reader.png)](https://npmjs.org/package/csv-reader)
4+
25
A CSV stream reader with many features, able to work with the largest datasets
6+
7+
Included features (each can be turned on or off):
8+
9+
* Support for excel-style multiline cells wrapped in quotes
10+
* Choosing a different delimiter instead of the comma
11+
* Automatic skipping of empty lines
12+
* Automatic parsing of numbers and booleans
13+
* Automatic trimming
14+
* Because it is a stream transformer, you can `.pause()` when you need time to process a row and `.resume()` when you are ready to receive more rows.
15+
* Consumes and emits rows one by one, allowing you to process datasets of any size.
16+
* Automatically strips the BOM if one exists (Node.js stream readers do not do this for you)
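
As a rough illustration of the BOM point above: once a file is decoded to text, a byte order mark shows up as a leading `U+FEFF` character in the first chunk. The sketch below uses a hypothetical `stripBom` helper (not part of this package's API) and assumes the input is already a decoded string.

```javascript
// Hypothetical helper for illustration only: drop a leading U+FEFF
// from already-decoded text, and leave everything else untouched.
function stripBom(text) {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

var firstChunk = '\ufeffname,age\n';
console.log(JSON.stringify(stripBom(firstChunk))); // "name,age\n"
```

Only the very first chunk of a stream needs this check; later chunks can contain `U+FEFF` as ordinary data.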
17+
18+
The options you can pass are:
19+
20+
Name | Type | Default | Explanation
21+
---- | ---- | ------- | -----------
22+
`delimiter` | `String` | `,` | The character that separates cells
23+
`multiline` | `Boolean` | `true` | Allow multiline cells, when the cell is wrapped with quotes ("...\n...")
24+
`allowQuotes` | `Boolean` | `true` | Should quotes be treated as a special character that wraps cells?
25+
`skipEmptyLines` | `Boolean` | `false` | Should empty lines be automatically skipped?
26+
`parseNumbers` | `Boolean` | `false` | Should numbers be automatically parsed? Cells that look like plain decimal numbers, scientific notation, `Infinity` or `NaN` are converted with `parseFloat`.
27+
`parseBooleans` | `Boolean` | `false` | Automatically parse booleans (strictly lowercase `true` and `false`)
28+
`ltrim` | `Boolean` | `false` | Automatically left-trims columns
29+
`rtrim` | `Boolean` | `false` | Automatically right-trims columns
30+
`trim` | `Boolean` | `false` | If true, then both 'ltrim' and 'rtrim' are set to true
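
To make the interaction of the trimming and parsing options concrete, here is a minimal sketch of per-cell post-processing. `postProcessCell` and the `NUMERIC` pattern are assumptions made for this illustration, not the package's API; the package's own numeric test may accept a different set of strings.

```javascript
// Assumed pattern for the sketch: optional sign, decimal digits,
// optional exponent, plus Infinity and NaN.
var NUMERIC = /^[-+]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][-+]?[0-9]+)?$|^[-+]?Infinity$|^NaN$/;

// Hypothetical helper: apply trim, boolean and number parsing to one cell.
function postProcessCell(cell, options) {
  options = options || {};
  var ltrim = options.ltrim || options.trim;
  var rtrim = options.rtrim || options.trim;

  if (ltrim) cell = cell.replace(/^\s+/, '');
  if (rtrim) cell = cell.replace(/\s+$/, '');

  if (options.parseBooleans) {
    // Strictly lowercase, as the table states
    if (cell === 'true') return true;
    if (cell === 'false') return false;
  }

  // Test the whole cell first, because parseFloat alone would accept "12abc"
  if (options.parseNumbers && NUMERIC.test(cell)) {
    return parseFloat(cell);
  }

  return cell;
}

var opts = { trim: true, parseBooleans: true, parseNumbers: true };
var row = ['  42 ', 'true', ' 1.5e3', 'TRUE', '12abc'].map(function (cell) {
  return postProcessCell(cell, opts);
});
console.log(row); // [ 42, true, 1500, 'TRUE', '12abc' ]
```

Trimming runs before parsing, so a padded cell such as `'  42 '` still becomes a number when both options are on.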
31+
32+
Usage example:
33+
34+
```javascript
35+
36+
var fs = require('fs');
37+
var CsvReadableStream = require('csv-reader');
38+
39+
var inputStream = fs.createReadStream('my_data.csv', 'utf8');
40+
41+
inputStream
42+
.pipe(CsvReadableStream({ parseNumbers: true, parseBooleans: true, trim: true }))
43+
.on('data', function (row) {
44+
console.log('A row arrived: ', row);
45+
}).on('end', function (data) {
46+
console.log('No more rows!');
47+
});
48+
49+
```
50+
51+
A common issue with CSVs is that Microsoft Excel, for some reason, *does not save UTF-8 files*. Microsoft never liked standards.
52+
To automatically handle files with ANSI encodings arriving from user input, you can use [autodetect-decoder-stream](https://www.npmjs.com/package/autodetect-decoder-stream) like this:
53+
54+
```javascript
55+
56+
var fs = require('fs');
57+
var CsvReadableStream = require('csv-reader');
58+
var AutoDetectDecoderStream = require('autodetect-decoder-stream');
59+
60+
var inputStream = fs.createReadStream('my_data.csv')
61+
.pipe(new AutoDetectDecoderStream({ defaultEncoding: '1255' })); // If the encoding cannot be guessed, default to 1255
62+
63+
// The AutoDetectDecoderStream will know if the stream is UTF8, windows-1255, windows-1252 etc.
64+
// It will pass properly decoded data to the CsvReader.
65+
66+
inputStream
67+
.pipe(CsvReadableStream({ parseNumbers: true, parseBooleans: true, trim: true }))
68+
.on('data', function (row) {
69+
console.log('A row arrived: ', row);
70+
}).on('end', function (data) {
71+
console.log('No more rows!');
72+
});
73+
74+
```
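
For intuition about why a decoding step is needed, the sketch below shows one simple fallback strategy on a whole buffer: try strict UTF-8 first, and fall back to a single-byte encoding if the bytes are not valid UTF-8. This is not how autodetect-decoder-stream works internally; `decodeWithFallback` is a hypothetical helper, the `latin1` fallback is an assumption chosen for the example, and the code assumes a Node.js build with the global `TextDecoder` and ICU support (the default in official builds).

```javascript
// Hypothetical helper, not part of csv-reader or autodetect-decoder-stream.
function decodeWithFallback(buffer) {
  try {
    // fatal: true makes decode() throw on byte sequences that are not valid UTF-8
    return new TextDecoder('utf-8', { fatal: true }).decode(buffer);
  } catch (err) {
    // Assumed fallback for this sketch: map each byte to one character
    return buffer.toString('latin1');
  }
}

var utf8Bytes = Buffer.from('café', 'utf8');
var ansiBytes = Buffer.from([0x63, 0x61, 0x66, 0xe9]); // "café" in a single-byte encoding

console.log(decodeWithFallback(utf8Bytes)); // café
console.log(decodeWithFallback(ansiBytes)); // café
```

A streaming decoder has to do more than this, since a multi-byte character can be split across two chunks.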
75+
76+
## Contributing
77+
78+
If you have anything to contribute, or find functionality lacking, you are more than welcome to participate!
79+
If anyone wishes to contribute unit tests, that would be great too :-)
80+
81+
## Me
82+
* Hi! I am Daniel Cohen Gindi. Or in short- Daniel.
83+
* danielgindi@gmail.com is my email address.
84+
* That's all you need to know.
85+
86+
## Help
87+
88+
If you want to buy me a beer, you are very welcome to
89+
[![Donate](https://www.paypalobjects.com/en_US/i/btn/btn_donate_LG.gif)](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=G6CELS3E997ZE)
90+
Thanks :-)
91+
92+
## License
93+
94+
All the code here is under the MIT license, which means you can do virtually anything with it.
95+
I will appreciate it very much if you keep an attribution where appropriate.
96+
97+
The MIT License (MIT)
98+
99+
Copyright (c) 2013 Daniel Cohen Gindi (danielgindi@gmail.com)
100+
101+
Permission is hereby granted, free of charge, to any person obtaining a copy
102+
of this software and associated documentation files (the "Software"), to deal
103+
in the Software without restriction, including without limitation the rights
104+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
105+
copies of the Software, and to permit persons to whom the Software is
106+
furnished to do so, subject to the following conditions:
107+
108+
The above copyright notice and this permission notice shall be included in all
109+
copies or substantial portions of the Software.

index.js

Lines changed: 257 additions & 0 deletions
@@ -0,0 +1,257 @@
1+
var stream = require('stream');
2+
var util = require('util');
3+
4+
/**
5+
* @const
6+
* @type {RegExp}
7+
*/
8+
var PARSE_FLOAT_TEST = /^[-+]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][-+]?[0-9]+)?$|^[-+]?Infinity$|^[-+]?NaN$/;
9+
10+
var Transform = stream.Transform;
11+
12+
/**
13+
* @param {Object?} options
14+
* @param {String?} options.delimiter=',' - Specify what is the CSV delimiter
15+
* @param {Boolean=true} options.multiline - Support Excel-like multiline CSV
16+
* @param {Boolean=true} options.allowQuotes - Allow quotation marks to wrap columns
17+
* @param {Boolean=false} options.skipEmptyLines - Should empty lines be automatically skipped?
18+
* @param {Boolean=false} options.parseNumbers - Automatically parse numbers (with a . as the decimal separator)
19+
* @param {Boolean=false} options.parseBooleans - Automatically parse booleans (strictly lowercase `true` and `false`)
20+
* @param {Boolean=false} options.ltrim - Automatically left-trims columns
21+
* @param {Boolean=false} options.rtrim - Automatically right-trims columns
22+
* @param {Boolean=false} options.trim - If true, then both 'ltrim' and 'rtrim' are set to true
23+
* @returns {CsvReadableStream}
24+
* @constructor
25+
*/
26+
var CsvReadableStream = function (options) {
27+
options = options || {};
28+
29+
//noinspection JSUndefinedPropertyAssignment
30+
options.objectMode = true;
31+
32+
if (!(this instanceof CsvReadableStream)) {
33+
return new CsvReadableStream(options);
34+
}
35+
36+
var data = null
37+
, dataIndex = null
38+
, nextIndex = null
39+
, dataLen = null
40+
, columns = []
41+
, column = ''
42+
, lastLineEndCR = false
43+
, lookForBOM = true
44+
, isQuoted = false
45+
46+
, multiline = !!options.multiline || typeof options.multiline === 'undefined'
47+
, delimiter = options.delimiter != null ? options.delimiter.toString() || ',' : ','
48+
, allowQuotes = !!options.allowQuotes || typeof options.allowQuotes === 'undefined'
49+
, skipEmptyLines = !!options.skipEmptyLines
50+
, parseNumbers = !!options.parseNumbers
51+
, parseBooleans = !!options.parseBooleans
52+
, ltrim = !!options.ltrim || !!options.trim
53+
, rtrim = !!options.rtrim || !!options.trim
54+
, trim = ltrim && rtrim
55+
56+
, postProcessingEnabled = parseNumbers || parseBooleans || ltrim || rtrim;
57+
58+
var postProcessColumn = function (column) {
59+
60+
if (trim) {
61+
column = column.replace(/^\s+|\s+$/g, '');
62+
}
63+
else if (ltrim) {
64+
column = column.replace(/^\s+/, '');
65+
}
66+
else if (rtrim) {
67+
column = column.replace(/\s+$/, '');
68+
}
69+
70+
if (parseBooleans) {
71+
if (column === 'true') {
72+
return true;
73+
}
74+
if (column === 'false') {
75+
return false;
76+
}
77+
}
78+
79+
if (parseNumbers) {
80+
if (PARSE_FLOAT_TEST.test(column)) {
81+
return parseFloat(column);
82+
}
83+
}
84+
85+
return column;
86+
};
87+
88+
this._processChunk = function (newData) {
89+
90+
if (newData) {
91+
if (data) {
92+
data = data.substr(dataIndex) + newData;
93+
} else {
94+
data = newData;
95+
}
96+
dataLen = data.length;
97+
dataIndex = 0;
98+
}
99+
100+
// Node doesn't strip BOMs, that's in user's land
101+
if (lookForBOM) {
102+
if (newData && newData.charCodeAt(0) === 0xfeff) {
103+
dataIndex++;
104+
}
105+
lookForBOM = false;
106+
}
107+
108+
var isFinishedLine = false;
109+
110+
for (; dataIndex < dataLen; dataIndex++) {
111+
var c = data[dataIndex];
112+
113+
if (c === '\n' || c === '\r') {
114+
if (!isQuoted || !multiline) {
115+
if (lastLineEndCR && c === '\n') {
116+
lastLineEndCR = false;
117+
continue;
118+
}
119+
lastLineEndCR = c === '\r';
120+
dataIndex++;
121+
isFinishedLine = true;
122+
123+
if (!multiline) {
124+
isQuoted = false;
125+
}
126+
127+
break;
128+
}
129+
}
130+
131+
if (isQuoted) {
132+
if (c === '"') {
133+
nextIndex = dataIndex + 1;
134+
135+
// Do we have enough data to peek at the next character?
136+
if (nextIndex >= dataLen && !this._isStreamDone) {
137+
// Wait for more data to arrive
138+
break;
139+
}
140+
141+
if (nextIndex < dataLen && data[nextIndex] === '"') {
142+
column += '"';
143+
dataIndex++;
144+
} else {
145+
isQuoted = false;
146+
}
147+
}
148+
else {
149+
column += c;
150+
}
151+
}
152+
else {
153+
if (c === delimiter) {
154+
columns.push(column);
155+
column = '';
156+
}
157+
else if (c === '"' && allowQuotes) {
158+
if (column.length) {
159+
column += c;
160+
}
161+
else {
162+
isQuoted = true;
163+
}
164+
}
165+
else {
166+
column += c;
167+
}
168+
}
169+
}
170+
171+
if (dataIndex === dataLen) {
172+
data = null;
173+
}
174+
175+
if (isFinishedLine || (data === null && this._isStreamDone)) {
176+
177+
if (columns.length || column || data || !this._isStreamDone) {
178+
179+
// We have a row, send it to the callback
180+
181+
// Commit this row
182+
columns.push(column);
183+
var row = columns;
184+
185+
// Clear row state data
186+
columns = [];
187+
column = '';
188+
isQuoted = false;
189+
190+
// Is this row full or empty?
191+
if (row.length > 1 || row[0].length || !skipEmptyLines) {
192+
193+
// Post processing
194+
if (postProcessingEnabled) {
195+
for (var i = 0, rowSize = row.length; i < rowSize; i++) {
196+
row[i] = postProcessColumn(row[i]);
197+
}
198+
}
199+
200+
// Emit the parsed row
201+
//noinspection JSUnresolvedFunction
202+
this.push(row);
203+
}
204+
205+
// Look to see if there are more rows in available data
206+
this._processChunk();
207+
208+
} else {
209+
// We just ran into a newline at the end of the file, ignore it
210+
}
211+
212+
} else {
213+
214+
if (data) {
215+
216+
// Let more data come in.
217+
// We are probably waiting for a "peek" at the next character
218+
219+
} else {
220+
221+
// We have probably hit end of file.
222+
// Let the end event come in.
223+
224+
}
225+
226+
}
227+
228+
};
229+
230+
Transform.call(this, options);
231+
};
232+
233+
util.inherits(CsvReadableStream, Transform);
234+
235+
//noinspection JSUnusedGlobalSymbols
236+
CsvReadableStream.prototype._transform = function (chunk, enc, cb) {
237+
238+
this._processChunk(chunk);
239+
240+
cb();
241+
};
242+
243+
//noinspection JSUnusedGlobalSymbols
244+
CsvReadableStream.prototype._flush = function (cb) {
245+
246+
this._isStreamDone = true;
247+
248+
this._processChunk();
249+
250+
cb();
251+
};
252+
253+
/**
254+
* @module
255+
* @type {CsvReadableStream}
256+
*/
257+
module.exports = CsvReadableStream;
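
The quote handling in the parsing loop above can be illustrated on a single complete record. The following is a minimal sketch, not the committed code: `splitRecord` is a hypothetical helper that assumes the whole record is already in memory, so it leaves out chunk boundaries, BOM stripping and line-ending tracking.

```javascript
// Hypothetical helper: split one CSV record into cells, honoring "..." quoting.
function splitRecord(record, delimiter) {
  delimiter = delimiter || ',';
  var cells = [];
  var cell = '';
  var inQuotes = false;

  for (var i = 0; i < record.length; i++) {
    var c = record[i];

    if (inQuotes) {
      if (c === '"') {
        if (record[i + 1] === '"') {
          cell += '"'; // a doubled quote inside quotes is a literal quote
          i++;
        } else {
          inQuotes = false; // closing quote
        }
      } else {
        cell += c; // delimiters and newlines are literal inside quotes
      }
    } else if (c === delimiter) {
      cells.push(cell);
      cell = '';
    } else if (c === '"' && cell.length === 0) {
      inQuotes = true; // a quote only opens a cell at its very start
    } else {
      cell += c;
    }
  }

  cells.push(cell);
  return cells;
}

var cells = splitRecord('a,"b ""quoted"", with comma","line1\nline2",5"');
console.log(cells);
// [ 'a', 'b "quoted", with comma', 'line1\nline2', '5"' ]
```

In a streaming parser the "peek at the next character" step is the tricky part: when a quote is the last character of a chunk, the parser has to wait for more data before it can tell a closing quote from the first half of a doubled one.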
