New features: chunking and compression #93
This sounds very interesting and is in line with some work we are planning to do this fiscal year (the plan was to use HDF5 compression under netCDF). If this is planned to go into production in the short term, we would also include it in the work that we do.
Since you mentioned HDF5, I wonder if there is anything missing in HDF5 that is in this PnetCDF PR.
No, we basically want to give users some options. Some prefer HDF5-based files and some prefer native-format files. It also gives us flexibility: if we have problems, we can try the other format and see whether it has a similar issue. Dealing with compression filter plugins on a netCDF-4 file is somewhat problematic, since at read time a user needs to know which filter was used to write the file and make sure that filter is available. That is the main reason we have delayed implementing some of the compression filters: we can't guarantee that a reader of the file will have the correct filter, and even given a raw HDF5 file, how do we know which filter is needed to read it?
That is indeed a challenge. Another thing I learned is that users may want to use different compression
We have not investigated that yet. Currently we use the same parameters for all datasets. We do have some integer and some double-precision floating-point datasets, so different algorithms might help there, but we haven't looked into it yet.
If you plan to give this PR a try, I very much welcome your feedback. FYI, PnetCDF also supports compression/decompression in its nonblocking
@wkliao It looks like the SZ headers are installed under a nonstandard location. Because of that, the following check fails to find sz.h.
A possible fix would be to adjust the include path.
For such an unusual installation, I suggest setting the CPPFLAGS environment variable.
Actually, that is the default installation layout of SZ. Setting the CPPFLAGS environment variable to the SZ include path before calling configure works:
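For reference, a sketch of the kind of invocation described above; the install prefix `/path/to/SZ` is a placeholder, not a path from this thread:

```shell
# Point the preprocessor at the SZ headers (and the linker at its
# libraries) before running configure. Paths below are placeholders.
export CPPFLAGS="-I/path/to/SZ/include"
export LDFLAGS="-L/path/to/SZ/lib"
./configure --with-sz=/path/to/SZ
```

Setting CPPFLAGS/LDFLAGS in the environment is the standard Autoconf mechanism for headers and libraries installed outside the compiler's default search paths.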
@wkliao Notes:
Hi @dqwu, thanks. I was able to reproduce the error.
The problem has been fixed in the latest feature branch, thanks.
This PR adds data chunking and compression features. By using additional metadata and manipulating the space between data object file offsets, these features can be implemented without breaking the NetCDF file format specification.
More information about the design and implementation is described in: K. Hou, Q. Kang, S. Lee, A. Agrawal, A. Choudhary, and W. Liao, "Supporting Data Compression in PnetCDF," in the International Conference on Big Data, 2021.
An example program is available in ./examples/C/chunk_compress.c