enhancement: select crawl by exact timestamp #66

@laurieburchell

Querying the index brings back a status, timestamp, url triple, e.g.:

$ cdxt --cc --crawl CC-MAIN-2025-43 iter 'commoncrawl.org/get-started'  

status 200, timestamp 20251014220259, url https://www.commoncrawl.org/get-started
status 200, timestamp 20251016192109, url https://commoncrawl.org/get-started

It would be good to have a direct method to bring back a particular record based on the timestamp alone. I'm aware you can do something like cdxt --cc --crawl CC-MAIN-2025-43 --from 20251016192109 --limit 1 warc 'commoncrawl.org/get-started', but a direct --timestamp flag or similar would be useful, given how the index records are presented.
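
To make the requested behaviour concrete, here is a rough sketch of exact-timestamp selection, written as a filter over the text that iter prints. It is not how cdxt works internally and not a proposed patch. The helper names parse_index_line and select_by_timestamp are hypothetical, and the sketch assumes the output always has the "status ..., timestamp ..., url ..." shape shown above.

```python
import re

# Matches one line of `cdxt ... iter` output in the format quoted in this issue:
#   status 200, timestamp 20251014220259, url https://www.commoncrawl.org/get-started
LINE_RE = re.compile(r"^status (\S+), timestamp (\d{14}), url (\S+)$")


def parse_index_line(line):
    """Hypothetical helper: turn one output line into a dict, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    status, timestamp, url = m.groups()
    return {"status": status, "timestamp": timestamp, "url": url}


def select_by_timestamp(lines, timestamp):
    """Hypothetical helper: return the first record whose timestamp matches exactly."""
    for line in lines:
        record = parse_index_line(line)
        if record is not None and record["timestamp"] == timestamp:
            return record
    return None


# The two records quoted in this issue.
sample = [
    "status 200, timestamp 20251014220259, url https://www.commoncrawl.org/get-started",
    "status 200, timestamp 20251016192109, url https://commoncrawl.org/get-started",
]

match = select_by_timestamp(sample, "20251016192109")
print(match)
```

A built-in --timestamp flag would presumably do the equivalent comparison on the index side. Unlike --from with --limit 1, an exact match returns nothing when no record carries that timestamp, rather than the next later capture.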
