
@ForeverAngry commented Sep 27, 2025

Summary

This PR adds (fairly) comprehensive snapshot expiration functionality to iceberg-rust, enabling cleanup of old snapshots and their associated files according to configurable retention policies.

Features Added

Core Functionality

  • Snapshot Expiration API: New ExpireSnapshots builder with fluent configuration
  • Multiple Retention Strategies:
    • Time-based expiration (expire_older_than)
    • Count-based retention (retain_last)
    • Combined criteria with precedence rules
  • Safety Guarantees:
    • Never expire current snapshots
    • Preserve snapshots referenced by branches/tags
    • Atomic operations following Iceberg's commit model
  • File Cleanup: Optional cleanup of orphaned manifest and data files
  • Dry Run Mode: Preview which snapshots and files would be removed, without making any changes
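The retention rules listed above can be illustrated with a small, self-contained sketch. This is not the PR's `ExpireSnapshots` code: `SnapshotInfo` and `select_expired` are hypothetical names. The sketch assumes the combined semantics are "expire only snapshots that are older than the cutoff *and* outside the newest `retain_last`", which matches the "keeping at least 5 snapshots" wording of the API example. It also simplifies `retain_last` to "newest N by timestamp" rather than walking each branch's ancestry.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a snapshot entry in table metadata.
#[derive(Debug, Clone)]
struct SnapshotInfo {
    snapshot_id: i64,
    timestamp_ms: i64,
}

/// Returns the IDs (ascending) of snapshots eligible for expiration.
/// Assumed rules: a snapshot expires only if it is outside the newest
/// `retain_last`, older than `older_than_ms` (when set), not the current
/// snapshot, and not referenced by any branch or tag.
fn select_expired(
    snapshots: &[SnapshotInfo],
    current_snapshot_id: Option<i64>,
    referenced_ids: &HashSet<i64>,
    older_than_ms: Option<i64>,
    retain_last: usize,
) -> Vec<i64> {
    // Sort newest first so the first `retain_last` entries are always kept.
    let mut by_age: Vec<&SnapshotInfo> = snapshots.iter().collect();
    by_age.sort_by(|a, b| b.timestamp_ms.cmp(&a.timestamp_ms));

    let mut expired: Vec<i64> = by_age
        .iter()
        .skip(retain_last)
        // Time criterion: with no cutoff, every non-retained snapshot qualifies.
        .filter(|s| older_than_ms.map_or(true, |cutoff| s.timestamp_ms < cutoff))
        // Safety: never expire the current snapshot.
        .filter(|s| Some(s.snapshot_id) != current_snapshot_id)
        // Safety: never expire snapshots that branches or tags point at.
        .filter(|s| !referenced_ids.contains(&s.snapshot_id))
        .map(|s| s.snapshot_id)
        .collect();
    expired.sort_unstable();
    expired
}
```

Applying the count rule before the time rule is what makes `retain_last` act as a floor: even a very old snapshot survives if it is among the newest N.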

API Example

```rust
// Expire snapshots older than 7 days, keeping at least 5 snapshots
let result = table.expire_snapshots()
    .expire_older_than(chrono::Utc::now().timestamp_millis() - 7 * 24 * 60 * 60 * 1000)
    .retain_last(5)
    .clean_orphan_files(true)
    .execute()
    .await?;

println!("Expired {} snapshots", result.expired_snapshot_ids.len());
```
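The cutoff passed to `expire_older_than` is an absolute timestamp in milliseconds since the Unix epoch, so "7 days" is 7 × 24 × 60 × 60 × 1000 = 604,800,000 ms. As a sketch, the same cutoff can be computed with only the standard library; `days_to_millis` and `cutoff_millis` are hypothetical helpers, not part of iceberg-rust.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Milliseconds in `days` days: days * 24 h * 60 min * 60 s * 1000 ms.
fn days_to_millis(days: i64) -> i64 {
    days * 24 * 60 * 60 * 1000
}

/// Cutoff (ms since the Unix epoch) for "older than `days` days",
/// read from the system clock without depending on chrono.
fn cutoff_millis(days: i64) -> i64 {
    let now_ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is set before the Unix epoch")
        .as_millis() as i64;
    now_ms - days_to_millis(days)
}
```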

Files Added/Modified

New Files

  • iceberg-rust/src/table/maintenance/ - New maintenance module
  • iceberg-rust/src/table/maintenance/expire_snapshots.rs - Core implementation (1,007 lines)
  • iceberg-rust/examples/expire_snapshots.rs - Usage examples
  • iceberg-rust/tests/snapshot_expiration.rs - Comprehensive integration tests

Modified Files

  • iceberg-rust/src/table/mod.rs - Added expire_snapshots() method to Table
  • iceberg-rust/src/lib.rs - Exposed maintenance module

Testing

Comprehensive Test Coverage

  • 9 Unit Tests: Core functionality, edge cases, validation logic
  • 6 Integration Tests: End-to-end scenarios with real table metadata
  • All Tests Passing: ✅ 15/15 tests pass (I included my supporting tests as well)

Test Categories

  • Time-based expiration logic
  • Count-based retention logic
  • Current snapshot protection
  • Reference-aware cleanup
  • Combined criteria precedence
  • Empty metadata handling
  • File cleanup identification
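"File cleanup identification" comes down to a set difference: a file may be deleted only if it is reachable from an expired snapshot and from no retained snapshot. The sketch below assumes each snapshot's reachable paths (manifest list, manifests, data files) have already been collected into a map; `orphaned_files` and that input shape are hypothetical, and a real implementation would have to read the manifests to build it.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Returns paths reachable only from expired snapshots (sorted).
/// `files_by_snapshot`: hypothetical map of snapshot ID -> referenced paths.
fn orphaned_files(
    files_by_snapshot: &HashMap<i64, Vec<String>>,
    expired_ids: &HashSet<i64>,
) -> BTreeSet<String> {
    // Every path still reachable from a retained snapshot must survive.
    let retained: HashSet<&str> = files_by_snapshot
        .iter()
        .filter(|(id, _)| !expired_ids.contains(*id))
        .flat_map(|(_, paths)| paths.iter().map(String::as_str))
        .collect();

    // Candidates: referenced by an expired snapshot, not by any retained one.
    files_by_snapshot
        .iter()
        .filter(|(id, _)| expired_ids.contains(*id))
        .flat_map(|(_, paths)| paths.iter())
        .filter(|p| !retained.contains(p.as_str()))
        .cloned()
        .collect()
}
```

Files shared between an expired and a retained snapshot (common after appends, since new snapshots reuse existing manifests) are kept by construction.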

Implementation Details

Documentation

  • Module-level documentation explaining concepts
  • Example (demo) code demonstrating common usage patterns
  • Updated the main README to cover the new features

Compatibility

  • ✅ All existing tests pass
  • ✅ No breaking changes to existing APIs
  • ✅ Compatible with latest upstream changes (merged 62 commits)

This implementation provides a solid foundation for table maintenance operations in iceberg-rust. It follows the Iceberg specification and is similar to the pyiceberg implementation (which I also worked on 😮‍💨), now in Rust!

JanKaul and others added 30 commits August 28, 2025 14:09
…ented

provide more context for unimplemented macro
…ented

provide more context for unimplemented macro
…or-unimplemented

Revert "provide more context for unimplemented macro"
use separate manifest files for delete files
fix: make version-hint.text compatible with other readers
fix: conflict when projecting a field not present in equality deletes
fix: update `last-updated-ms` field when updating a table
fix: use correct number of splits for deletes
docs: default compression level is 3
JanKaul and others added 4 commits August 28, 2025 14:13
- Implemented snapshot expiration logic in the maintenance module.
- Added methods for time-based and count-based snapshot retention.
- Created examples demonstrating the usage of snapshot expiration.
- Updated README with new features and examples.
- Added integration tests for snapshot expiration functionality.
…erg-rust-spec APIs

- Fix imports for SnapshotReference, SnapshotRetention, and SnapshotBuilder
- Update create_test_snapshot to use correct builder pattern
- Update create_test_ref to match current SnapshotReference structure
- Remove unused imports from test file
- All tests passing (9 unit tests + 6 integration tests)
- Resolved conflicts in Cargo.lock, datafusion_iceberg/Cargo.toml, manifest_list.rs, and transaction/operation.rs
- Updated duckdb version to 1.4 in dev dependencies
- Accepted upstream changes for files not modified by expire_snapshots feature
@ForeverAngry (Author)

@JanKaul let me know what you think :)
