v2.0.0
- This is the last version of TFDS that will support Python 2. Going forward, we'll only support and test against Python 3.
- The default versions of all datasets are now using the S3 slicing API. See the guide for details, and the slicing sketch after this list.
- The previous split API is still available, but is deprecated. If you wrote `DatasetBuilder`s outside the TFDS repository, please make sure they do not use `experiments={tfds.core.Experiment.S3: False}`. This will be removed in the next version, as well as the `num_shards` kwarg from `SplitGenerator` (see the builder sketch after this list).
- Several new datasets. Thanks to all the contributors!
- API changes and new features:
  - `shuffle_files` defaults to False so that dataset iteration is deterministic by default. You can customize the reading pipeline, including shuffling and interleaving, through the new `read_config` parameter in `tfds.load` (see the read-config sketch after this list).
  - `urls` kwarg renamed `homepage` in `DatasetInfo`.
  - Support for nested `tfds.features.Sequence` and `tf.RaggedTensor` (see the ragged-features sketch after this list).
  - Custom `FeatureConnector`s can override the `decode_batch_example` method for efficient decoding when wrapped inside a `tfds.features.Sequence(my_connector)` (see the connector sketch after this list).
  - Declaring a dataset in Colab won't register it, which allows re-running the cell without having to change the dataset name.
  - Beam datasets can use a `tfds.core.BeamMetadataDict` to store additional metadata computed as part of the Beam pipeline.
  - Beam datasets' `_split_generators` accepts an additional `pipeline` kwarg to define a pipeline shared between all splits (see the Beam sketch after this list).
- Various other bug fixes and performance improvements. Thank you for all the reports and fixes!
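A minimal sketch of the S3 slicing syntax referenced above; `"mnist"` is only a placeholder dataset name and the percentages are illustrative.

```python
import tensorflow_datasets as tfds

# The S3 API lets splits be sliced and combined with a string syntax,
# replacing the legacy subsplit mechanism.
train_ds = tfds.load("mnist", split="train[:75%]")   # first 75% of train
val_ds = tfds.load("mnist", split="train[75%:]")     # remaining 25%
all_ds = tfds.load("mnist", split="train+test")      # union of two splits
```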
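A sketch of what a `DatasetBuilder` defined outside the repository looks like under the new defaults: no `experiments` override, no `num_shards` kwarg on `SplitGenerator`, and `homepage` instead of `urls` in `DatasetInfo`. The builder name, feature spec, and file path are hypothetical.

```python
import tensorflow_datasets as tfds

class MyDataset(tfds.core.GeneratorBasedBuilder):
  """Hypothetical builder following the v2.0.0 conventions."""
  VERSION = tfds.core.Version("1.0.0")

  def _info(self):
    return tfds.core.DatasetInfo(
        builder=self,
        features=tfds.features.FeaturesDict({"text": tfds.features.Text()}),
        homepage="https://example.com",  # `urls` kwarg is now `homepage`
    )

  def _split_generators(self, dl_manager):
    # No `num_shards` kwarg anymore; sharding is handled automatically.
    return [
        tfds.core.SplitGenerator(
            name=tfds.Split.TRAIN,
            gen_kwargs={"filepath": "train.txt"},
        ),
    ]

  def _generate_examples(self, filepath):
    # With the S3 API, examples are yielded as (key, example) pairs.
    with open(filepath) as f:
      for i, line in enumerate(f):
        yield i, {"text": line.strip()}
```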
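The `read_config` parameter from the list above, in a minimal sketch. The `tfds.ReadConfig` name and its fields are taken from the TFDS API as I understand it and may differ slightly between versions; the values are arbitrary.

```python
import tensorflow_datasets as tfds

# shuffle_files now defaults to False, so iteration is deterministic
# unless shuffling is requested explicitly.
read_config = tfds.ReadConfig(
    shuffle_seed=42,  # seed used when shuffle_files=True
)
ds = tfds.load(
    "mnist",
    split="train",
    shuffle_files=True,   # opt back in to file shuffling
    read_config=read_config,
)
```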
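A sketch of the nested `Sequence` support; the feature names are made up.

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# Each example holds a variable number of sentences, each of which is a
# variable-length list of token ids.
features = tfds.features.FeaturesDict({
    "sentences": tfds.features.Sequence(      # outer, variable length
        tfds.features.Sequence(tf.int64)      # inner, variable length
    ),
})
# With two variable-length levels, the decoded "sentences" feature comes
# back as a tf.RaggedTensor.
```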
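A sketch of a custom connector overriding `decode_batch_example`; `ScaledTensor` is a hypothetical connector, not part of TFDS.

```python
import tensorflow as tf
import tensorflow_datasets as tfds

class ScaledTensor(tfds.features.Tensor):
  """Hypothetical connector rescaling stored uint8 values to [0, 1]."""

  def decode_example(self, example):
    return tf.cast(example, tf.float32) / 255.0

  def decode_batch_example(self, examples):
    # Called when the connector is wrapped in
    # tfds.features.Sequence(ScaledTensor(...)). The default behavior maps
    # decode_example over every element; casting the whole batch here is a
    # single vectorized op instead.
    return tf.cast(examples, tf.float32) / 255.0
```

It would be declared as `tfds.features.Sequence(ScaledTensor(shape=(), dtype=tf.uint8))`.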
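Finally, a sketch covering the two Beam-related items: the shared `pipeline` kwarg in `_split_generators` and `tfds.core.BeamMetadataDict`. The builder, feature spec, and the statistic being computed are all hypothetical.

```python
import apache_beam as beam
import tensorflow as tf
import tensorflow_datasets as tfds

class MyBeamDataset(tfds.core.BeamBasedBuilder):
  """Hypothetical Beam builder."""
  VERSION = tfds.core.Version("1.0.0")

  def _info(self):
    return tfds.core.DatasetInfo(
        builder=self,
        features=tfds.features.FeaturesDict(
            {"value": tfds.features.Tensor(shape=(), dtype=tf.int64)}),
        # BeamMetadataDict stores values computed inside the Beam pipeline.
        metadata=tfds.core.BeamMetadataDict(),
    )

  def _split_generators(self, dl_manager, pipeline):
    # `pipeline` is shared between all splits, so a quantity can be
    # computed once and recorded as metadata.
    self.info.metadata["num_inputs"] = (
        pipeline
        | "CreateInputs" >> beam.Create(range(1000))
        | beam.combiners.Count.Globally())
    return [
        tfds.core.SplitGenerator(
            name=tfds.Split.TRAIN,
            gen_kwargs={"num_examples": 1000},
        ),
    ]

  def _build_pcollection(self, pipeline, num_examples):
    # As with the S3 API, each element is a (key, example) pair.
    return (pipeline
            | "CreateExamples" >> beam.Create(range(num_examples))
            | beam.Map(lambda i: (i, {"value": i})))
```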