v2.0.0
- This is the last version of TFDS that will support Python 2. Going forward, we'll only support and test against Python 3.
- The default versions of all datasets are now using the S3 slicing API. See the guide for details, and the slicing sketch after this list.
- The previous split API is still available, but is deprecated. If you wrote `DatasetBuilder`s outside the TFDS repository, please make sure they do not use `experiments={tfds.core.Experiment.S3: False}`. This will be removed in the next version, as well as the `num_shards` kwarg from `SplitGenerator` (see the builder sketch after this list).
- Several new datasets. Thanks to all the contributors!
- API changes and new features:
  - `shuffle_files` defaults to False so that dataset iteration is deterministic by default. You can customize the reading pipeline, including shuffling and interleaving, through the new `read_config` parameter in `tfds.load` (see the read-config sketch after this list).
  - `urls` kwarg renamed `homepage` in `DatasetInfo`.
  - Support for nested `tfds.features.Sequence` and `tf.RaggedTensor` (see the ragged-features sketch after this list).
  - Custom `FeatureConnector`s can override the `decode_batch_example` method for efficient decoding when wrapped inside a `tfds.features.Sequence(my_connector)` (see the connector sketch after this list).
  - Declaring a dataset in Colab won't register it, which allows re-running the cell without having to change the dataset name.
  - Beam datasets can use a `tfds.core.BeamMetadataDict` to store additional metadata computed as part of the Beam pipeline.
  - Beam datasets' `_split_generators` accepts an additional `pipeline` kwarg to define a pipeline shared between all splits (see the Beam sketch after this list).
- Various other bug fixes and performance improvements. Thank you for all the reports and fixes!
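A minimal sketch of the S3 slicing syntax referenced above; `"mnist"` is only a placeholder dataset name and the percentages are illustrative.

```python
import tensorflow_datasets as tfds

# The S3 API lets splits be sliced and combined with a string syntax,
# replacing the legacy subsplit mechanism.
train_ds = tfds.load("mnist", split="train[:75%]")   # first 75% of train
val_ds = tfds.load("mnist", split="train[75%:]")     # remaining 25%
all_ds = tfds.load("mnist", split="train+test")      # union of two splits
```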
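A sketch of what a `DatasetBuilder` defined outside the repository looks like under the new defaults: no `experiments` override, no `num_shards` kwarg on `SplitGenerator`, and `homepage` instead of `urls` in `DatasetInfo`. The builder name, feature spec, and file path are hypothetical.

```python
import tensorflow_datasets as tfds

class MyDataset(tfds.core.GeneratorBasedBuilder):
  """Hypothetical builder following the v2.0.0 conventions."""
  VERSION = tfds.core.Version("1.0.0")

  def _info(self):
    return tfds.core.DatasetInfo(
        builder=self,
        features=tfds.features.FeaturesDict({"text": tfds.features.Text()}),
        homepage="https://example.com",  # `urls` kwarg is now `homepage`
    )

  def _split_generators(self, dl_manager):
    # No `num_shards` kwarg anymore; sharding is handled automatically.
    return [
        tfds.core.SplitGenerator(
            name=tfds.Split.TRAIN,
            gen_kwargs={"filepath": "train.txt"},
        ),
    ]

  def _generate_examples(self, filepath):
    # With the S3 API, examples are yielded as (key, example) pairs.
    with open(filepath) as f:
      for i, line in enumerate(f):
        yield i, {"text": line.strip()}
```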
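The `read_config` parameter from the list above, in a minimal sketch. The `tfds.ReadConfig` name and its fields are taken from the TFDS API as I understand it and may differ slightly between versions; the values are arbitrary.

```python
import tensorflow_datasets as tfds

# shuffle_files now defaults to False, so iteration is deterministic
# unless shuffling is requested explicitly.
read_config = tfds.ReadConfig(
    shuffle_seed=42,  # seed used when shuffle_files=True
)
ds = tfds.load(
    "mnist",
    split="train",
    shuffle_files=True,   # opt back in to file shuffling
    read_config=read_config,
)
```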
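A sketch of the nested `Sequence` support; the feature names are made up.

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# Each example holds a variable number of sentences, each of which is a
# variable-length list of token ids.
features = tfds.features.FeaturesDict({
    "sentences": tfds.features.Sequence(      # outer, variable length
        tfds.features.Sequence(tf.int64)      # inner, variable length
    ),
})
# With two variable-length levels, the decoded "sentences" feature comes
# back as a tf.RaggedTensor.
```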
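A sketch of a custom connector overriding `decode_batch_example`; `ScaledTensor` is a hypothetical connector, not part of TFDS.

```python
import tensorflow as tf
import tensorflow_datasets as tfds

class ScaledTensor(tfds.features.Tensor):
  """Hypothetical connector rescaling stored uint8 values to [0, 1]."""

  def decode_example(self, example):
    return tf.cast(example, tf.float32) / 255.0

  def decode_batch_example(self, examples):
    # Called when the connector is wrapped in
    # tfds.features.Sequence(ScaledTensor(...)). The default behavior maps
    # decode_example over every element; casting the whole batch here is a
    # single vectorized op instead.
    return tf.cast(examples, tf.float32) / 255.0
```

It would be declared as `tfds.features.Sequence(ScaledTensor(shape=(), dtype=tf.uint8))`.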
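Finally, a sketch covering the two Beam-related items: the shared `pipeline` kwarg in `_split_generators` and `tfds.core.BeamMetadataDict`. The builder, feature spec, and the statistic being computed are all hypothetical.

```python
import apache_beam as beam
import tensorflow as tf
import tensorflow_datasets as tfds

class MyBeamDataset(tfds.core.BeamBasedBuilder):
  """Hypothetical Beam builder."""
  VERSION = tfds.core.Version("1.0.0")

  def _info(self):
    return tfds.core.DatasetInfo(
        builder=self,
        features=tfds.features.FeaturesDict(
            {"value": tfds.features.Tensor(shape=(), dtype=tf.int64)}),
        # BeamMetadataDict stores values computed inside the Beam pipeline.
        metadata=tfds.core.BeamMetadataDict(),
    )

  def _split_generators(self, dl_manager, pipeline):
    # `pipeline` is shared between all splits, so a quantity can be
    # computed once and recorded as metadata.
    self.info.metadata["num_inputs"] = (
        pipeline
        | "CreateInputs" >> beam.Create(range(1000))
        | beam.combiners.Count.Globally())
    return [
        tfds.core.SplitGenerator(
            name=tfds.Split.TRAIN,
            gen_kwargs={"num_examples": 1000},
        ),
    ]

  def _build_pcollection(self, pipeline, num_examples):
    # As with the S3 API, each element is a (key, example) pair.
    return (pipeline
            | "CreateExamples" >> beam.Create(range(num_examples))
            | beam.Map(lambda i: (i, {"value": i})))
```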