v2.0.0

@Conchylicultor Conchylicultor released this 24 Jan 20:02
  • This is the last version of TFDS that will support Python 2. Going forward, we'll only support and test against Python 3.
  • The default versions of all datasets are now using the S3 slicing API. See the guide for details.
  • The previous split API is still available, but is deprecated. If you wrote DatasetBuilders outside the TFDS repository, please make sure they do not use experiments={tfds.core.Experiment.S3: False}. This option will be removed in the next version, along with the num_shards kwarg of SplitGenerator.
  • Several new datasets. Thanks to all the contributors!
  • API changes and new features:
    • shuffle_files defaults to False so that dataset iteration is deterministic by default. You can customize the reading pipeline, including shuffling and interleaving, through the new read_config parameter in tfds.load.
    • The urls kwarg of DatasetInfo is renamed to homepage.
    • Support for nested tfds.features.Sequence and tf.RaggedTensor
    • Custom FeatureConnectors can override the decode_batch_example method for efficient decoding when wrapped inside a tfds.features.Sequence(my_connector)
    • Declaring a dataset in Colab won't register it, which allows re-running the cell without having to change the name.
    • Beam datasets can use a tfds.core.BeamMetadataDict to store additional metadata computed as part of the Beam pipeline.
    • Beam datasets' _split_generators accepts an additional pipeline kwarg to define a pipeline shared between all splits.
  • Various other bug fixes and performance improvements. Thank you for all the reports and fixes!
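
As a rough illustration of the slicing and reading changes listed above, the sketch below requests a percentage slice of a split and opts back into file shuffling with a fixed seed. The helper names s3_load_kwargs and load_subset, the "mnist" dataset, and the 10% slice are assumptions chosen for the example and are not part of the release. It also assumes that tfds.ReadConfig accepts a shuffle_seed argument.

```python
def s3_load_kwargs(percent=10, shuffle_files=False):
    """Build keyword arguments for tfds.load (hypothetical helper).

    The split string uses the slicing syntax, e.g. "train[:10%]".
    shuffle_files is passed explicitly; per the notes it defaults to False.
    """
    return {
        "split": "train[:{}%]".format(percent),
        "shuffle_files": shuffle_files,
    }


def load_subset(name="mnist", percent=10, seed=None):
    """Load the first `percent` of the train split (illustrative only).

    Without a seed, files are not shuffled, so iteration order is
    deterministic. With a seed, file shuffling is enabled and the seed is
    passed through read_config.
    """
    # Imported here so that the helper above can be used without TFDS installed.
    import tensorflow_datasets as tfds

    kwargs = s3_load_kwargs(percent, shuffle_files=seed is not None)
    if seed is not None:
        # Assumption: ReadConfig exposes a shuffle_seed field.
        kwargs["read_config"] = tfds.ReadConfig(shuffle_seed=seed)
    return tfds.load(name, **kwargs)
```

Calling load_subset() downloads and prepares the dataset on first use, so it is best tried in an environment where that is acceptable.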