Commit ccb1bbc
Update TFDS to 4.2.0
API:
* Add `tfds build` to the CLI. See [documentation](https://www.tensorflow.org/datasets/cli#tfds_build_download_and_prepare_a_dataset).
* DownloadManager now returns [Pathlib-like](https://docs.python.org/3/library/pathlib.html#basic-use) objects
* Datasets returned by `tfds.as_numpy` are compatible with `len(ds)`
* New `tfds.features.Dataset` to represent nested datasets
* Add `tfds.ReadConfig(add_tfds_id=True)` to add a unique identifier to each example as `ex['tfds_id']` (e.g. `b'train.tfrecord-00012-of-01024__123'`)
* Add `num_parallel_calls` option to `tfds.ReadConfig` to override the default `AUTOTUNE` option
* `tfds.ImageFolder` now supports `tfds.decode.SkipDecoder`
* Add multichannel audio support to `tfds.features.Audio`
* Better `tfds.as_dataframe` visualization (ffmpeg video if installed, bounding boxes,...)
* Add `try_gcs` to `tfds.builder(..., try_gcs=True)`
* Simpler `BuilderConfig` definition: global `VERSION` and `RELEASE_NOTES` are applied to all `BuilderConfig`. Config description is now optional.
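The `tfds_id` example above suggests the identifier encodes the split, the shard, and the example's position in that shard. As a minimal sketch using only the standard library, the snippet below splits such an id into its parts. The pattern is inferred from the single example in these notes and is an assumption, not a documented format. `TfdsId` and `parse_tfds_id` are hypothetical names, not part of the TFDS API.

```python
import re
from typing import NamedTuple


class TfdsId(NamedTuple):
    """Hypothetical container for the parts of a `tfds_id` value."""
    split: str
    shard: int
    num_shards: int
    index: int


# Assumption: the layout is `<split>.tfrecord-<shard>-of-<num_shards>__<index>`,
# inferred only from the example `b'train.tfrecord-00012-of-01024__123'`.
_TFDS_ID_RE = re.compile(
    rb"(?P<split>[^.]+)\.tfrecord-(?P<shard>\d+)-of-(?P<num_shards>\d+)__(?P<index>\d+)"
)


def parse_tfds_id(tfds_id: bytes) -> TfdsId:
    """Splits a `tfds_id` byte string into its components."""
    match = _TFDS_ID_RE.fullmatch(tfds_id)
    if match is None:
        raise ValueError(f"Unrecognized tfds_id: {tfds_id!r}")
    return TfdsId(
        split=match["split"].decode("utf-8"),
        shard=int(match["shard"]),
        num_shards=int(match["num_shards"]),
        index=int(match["index"]),
    )


parsed = parse_tfds_id(b"train.tfrecord-00012-of-01024__123")
print(parsed.split, parsed.shard, parsed.num_shards, parsed.index)
```

Since the layout is not guaranteed by these notes, code that only needs a stable per-example key may be better off treating the id as an opaque byte string.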
Breaking compatibility changes:
* Removed non-plain-text configs of text datasets and removed the config name: `multi_nli/plain_text` -> `multi_nli`
* To guarantee better determinism, new validations are performed on the keys when creating a dataset (to avoid filenames as keys, which are non-deterministic, and to restrict keys to `str`, `bytes` and `int`). New errors likely indicate an issue in the dataset implementation.
* `tfds.core.benchmark` now returns a `pd.DataFrame` (instead of a `dict`)
* `tfds.units` is not visible anymore from the public API
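To illustrate the key restriction described above, here is a minimal sketch of the kind of check it implies. It is not the TFDS implementation: `validate_key` is a hypothetical helper, and the error wording is invented for illustration. The check accepts only `str`, `bytes` and `int` keys, so a path object used as a key is rejected.

```python
from pathlib import Path
from typing import Union

# Key types allowed according to the release notes.
Key = Union[str, bytes, int]


def validate_key(key: object) -> Key:
    """Hypothetical check: accept only `str`, `bytes` or `int` keys."""
    if not isinstance(key, (str, bytes, int)):
        raise TypeError(
            f"Invalid key {key!r} of type {type(key).__name__}: "
            "keys must be str, bytes or int."
        )
    return key


# Plain values pass through unchanged.
print(validate_key("example_0001"))
print(validate_key(42))

# A path object is not an allowed key type and is rejected.
try:
    validate_key(Path("images/cat.png"))
except TypeError as err:
    print("rejected:", err)
```

A dataset that hits such an error can usually switch to a key derived from stable content, such as a record id or a counter, instead of a file path.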
Bug fixes:
* Support 0-len sequence with images of dynamic shape (Fix #2616)
* Progress bar correctly updated when copying files.
* Many bug fixes (GPath consistency with pathlib, S3 compatibility, TQDM visual artifacts, GCS crash on Windows, re-download when checksums are updated, ...)
* Better debugging and error messages (e.g. human-readable size, ...)
* Allow `max_examples_per_splits=0` in `tfds build --max_examples_per_splits=0` to test `_split_generators` only (without `_generate_examples`).
And of course, new datasets and many dataset updates.
Thank you to the community for their many valuable contributions and for supporting us in this project!
PiperOrigin-RevId: 350344016
1 parent 7a40c59 commit ccb1bbc
3 files changed: +461 -58 lines changed