RFC a new field to store information about validated runs #3746

@tiborsimko

Current behaviour

For CMS open data analysis workflows, it is necessary to read the information about validated runs.

For a given dataset, this information currently lives in the abstract field as a link.

Example of a record pointing to one list of validated runs: record 1

$ jq -S '.[] | select(.recid=="1") | .abstract' data/records/cms-primary-datasets.json
{
  "description": "<p>BTau primary dataset in AOD format from RunB of 2010</p> <p>The list of validated runs, which must be applied to all analyses, can be found in</p>",
  "links": [
    {
      "recid": "1000"
    }
  ]
}

Example of a record pointing to two lists of validated runs: record 30560

$ jq -S '.[] | select(.recid=="30560") | .abstract' data/records/cms-primary-datasets-Run2016.json
{
  "description": "<p>MuOnia primary dataset in NANOAOD format from RunH of 2016. Run period from run number 281613 to 284044.</p><p>The list of validated runs, which must be applied to all analyses, either with the full validation or for an analysis requiring only muons, can be found in:</p>",
  "links": [
    {
      "description": "Validated runs, full validation",
      "recid": "14220"
    },
    {
      "description": "Validated runs, muons only",
      "recid": "14221"
    }
  ]
}

This is usable but not optimal: in principle the abstract field can carry any kind of link, not only links to lists of validated runs, so a machine consumer would have to analyse what each link points to.
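To illustrate the problem, here is a minimal Python sketch of the guesswork a consumer needs today. The function name `guess_validated_run_links` and the matching rule are hypothetical, and the abstract descriptions are abridged copies of the two records shown above. It is only an illustration of the heuristic, not an existing tool.

```python
# Abridged copies of the abstract fields of records 1 and 30560 shown above.
ABSTRACTS = {
    "1": {
        "description": "<p>The list of validated runs, which must be applied "
                       "to all analyses, can be found in</p>",
        "links": [{"recid": "1000"}],
    },
    "30560": {
        "description": "<p>The list of validated runs, which must be applied "
                       "to all analyses, can be found in:</p>",
        "links": [
            {"description": "Validated runs, full validation", "recid": "14220"},
            {"description": "Validated runs, muons only", "recid": "14221"},
        ],
    },
}


def guess_validated_run_links(abstract):
    """Heuristically pick the links that look like validated-run lists.

    Hypothetical rule: a link counts if its own label mentions
    "validated runs", or if it has no label at all and the abstract
    text mentions validated runs.
    """
    text_mentions_runs = "validated runs" in abstract.get("description", "").lower()
    guessed = []
    for link in abstract.get("links", []):
        label = link.get("description", "").lower()
        if "validated runs" in label or (not label and text_mentions_runs):
            guessed.append(link["recid"])
    return guessed


for recid, abstract in ABSTRACTS.items():
    print(recid, guess_validated_run_links(abstract))
```

Any abstract that links to something else, or words the sentence differently, would defeat this kind of matching.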

Proposed behaviour

It would be good to introduce a new dedicated JSON field to hold the information about validated runs for the given dataset, so that this information could be easily machine-accessible in data production workflows.

For example:

  "validated_runs": [
    {
      "recid": "14221",
      "validation": "muonsonly"
    },
    {
      "recid": "14220",
      "validation": "full"
    }
  ]
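With a dedicated field like the one above, a consumer would no longer need heuristics. The following Python sketch assumes the field is named `validated_runs` and uses the `validation` values from the example; the helper name `validated_run_recids` is hypothetical, and the final schema may well differ.

```python
# Record 30560 with the proposed field, as in the example above.
record = {
    "recid": "30560",
    "validated_runs": [
        {"recid": "14221", "validation": "muonsonly"},
        {"recid": "14220", "validation": "full"},
    ],
}


def validated_run_recids(record, validation=None):
    """Return recids of validated-run lists, optionally filtered by type.

    Records without the proposed field yield an empty list.
    """
    return [
        entry["recid"]
        for entry in record.get("validated_runs", [])
        if validation is None or entry.get("validation") == validation
    ]


print(validated_run_recids(record))          # all validated-run lists
print(validated_run_recids(record, "full"))  # full validation only
```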

Any suggestions?

Notes

Would it also be desirable to enrich the old CMS records, such as record 1 cited above, so that they also offer muons-only validated run information? Or is the full validation sufficient for these Run1 datasets?

CC @IssaAlBawwab @nasseralbess @jmhogan @katilp @tpmccauley
