Description
Current behaviour
For CMS open data analysis workflows, it is necessary to read the information about validated runs.
For a given dataset, this information currently lives in the abstract field as a link.
Example of a record pointing to one list of validated runs: record 1
$ jq -S '.[] | select(.recid=="1") | .abstract' data/records/cms-primary-datasets.json
{
  "description": "<p>BTau primary dataset in AOD format from RunB of 2010</p> <p>The list of validated runs, which must be applied to all analyses, can be found in</p>",
  "links": [
    {
      "recid": "1000"
    }
  ]
}
Example of a record pointing to two lists of validated runs: record 30560
$ jq -S '.[] | select(.recid=="30560") | .abstract' data/records/cms-primary-datasets-Run2016.json
{
  "description": "<p>MuOnia primary dataset in NANOAOD format from RunH of 2016. Run period from run number 281613 to 284044.</p><p>The list of validated runs, which must be applied to all analyses, either with the full validation or for an analysis requiring only muons, can be found in:</p>",
  "links": [
    {
      "description": "Validated runs, full validation",
      "recid": "14220"
    },
    {
      "description": "Validated runs, muons only",
      "recid": "14221"
    }
  ]
}
This is usable but not optimal: in principle the abstract field can carry any kind of link, not only links to lists of validated runs, so an automated consumer would have to analyse what each link points to.
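To illustrate the problem, here is a minimal Python sketch of the guesswork a consumer has to do today. The function name `guess_validated_run_recids` and the text-matching heuristic are assumptions made for illustration, not existing tooling, and the `description` texts are shortened versions of the ones in records 1 and 30560.

```python
import json

# Abstracts modelled on records 1 and 30560 above; the long "description"
# texts are shortened here for readability.
ABSTRACT_RECORD_1 = json.loads("""
{
  "description": "<p>The list of validated runs, which must be applied to all analyses, can be found in</p>",
  "links": [{"recid": "1000"}]
}
""")

ABSTRACT_RECORD_30560 = json.loads("""
{
  "description": "<p>The list of validated runs, which must be applied to all analyses, can be found in:</p>",
  "links": [
    {"description": "Validated runs, full validation", "recid": "14220"},
    {"description": "Validated runs, muons only", "recid": "14221"}
  ]
}
""")

MARKER = "validated runs"  # hypothetical heuristic: match on free text


def guess_validated_run_recids(abstract):
    """Guess which links in an abstract point to validated-run lists."""
    links = abstract.get("links", [])
    # Prefer links whose own description mentions validated runs.
    labelled = [
        link["recid"]
        for link in links
        if "recid" in link and MARKER in link.get("description", "").lower()
    ]
    if labelled:
        return labelled
    # Links without a description (as in record 1): fall back to the
    # abstract text and assume every linked record is a validated-run list.
    if MARKER in abstract.get("description", "").lower():
        return [link["recid"] for link in links if "recid" in link]
    return []


print(guess_validated_run_recids(ABSTRACT_RECORD_1))      # ['1000']
print(guess_validated_run_recids(ABSTRACT_RECORD_30560))  # ['14220', '14221']
```

The fallback branch is the fragile part: it would silently pick up any unrelated link that gets added to such an abstract later.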
Proposed behaviour
It would be good to introduce a new dedicated JSON field to hold the information about validated runs for the given dataset, so that this information could be easily machine-accessible in data production workflows.
For example:
"validated_runs": [
  {
    "recid": "14221",
    "validation": "muonsonly"
  },
  {
    "recid": "14220",
    "validation": "full"
  }
]
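With such a field, a consumer would no longer need any text matching. The following Python sketch assumes the field layout proposed above; the helper name `validated_run_recids` and the surrounding record dictionary are hypothetical.

```python
# A record carrying the proposed field; only the keys needed here are shown.
record = {
    "recid": "30560",
    "validated_runs": [
        {"recid": "14221", "validation": "muonsonly"},
        {"recid": "14220", "validation": "full"},
    ],
}


def validated_run_recids(record, validation=None):
    """Return recids of validated-run lists, optionally filtered by type."""
    return [
        entry["recid"]
        for entry in record.get("validated_runs", [])
        if validation is None or entry.get("validation") == validation
    ]


print(validated_run_recids(record, "full"))       # ['14220']
print(validated_run_recids(record, "muonsonly"))  # ['14221']
```

Records without the field simply yield an empty list, so older records would remain readable by the same code.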
Any suggestions?
Notes
Would it also be desirable to enrich the old CMS records, such as record 1 cited above, with muons-only validated run information? Or is the full validation sufficient for these Run1 datasets?