RFC a new field to store information about validated runs

### Current behaviour

For CMS open data analysis workflows, it is necessary to read the information about validated runs.

For a given dataset, this information currently lives in the abstract field as a link.

Example of a record pointing to one list of validated runs: [record 1](https://opendata.cern.ch/record/1)

```console
$ jq -S '.[] | select(.recid=="1") | .abstract' data/records/cms-primary-datasets.json
{
  "description": "BTau primary dataset in AOD format from RunB of 2010 The list of validated runs, which must be applied to all analyses, can be found in",
  "links": [
    {
      "recid": "1000"
    }
  ]
}
```

Example of a record pointing to two lists of validated runs: [record 30560](https://opendata-qa.cern.ch/record/30560)

```console
$ jq -S '.[] | select(.recid=="30560") | .abstract' data/records/cms-primary-datasets-Run2016.json
{
  "description": "MuOnia primary dataset in NANOAOD format from RunH of 2016. Run period from run number 281613 to 284044.The list of validated runs, which must be applied to all analyses, either with the full validation or for an analysis requiring only muons, can be found in:",
  "links": [
    {
      "description": "Validated runs, full validation",
      "recid": "14220"
    },
    {
      "description": "Validated runs, muons only",
      "recid": "14221"
    }
  ]
}
```

This is usable but not optimal: in principle the abstract's `links` array can hold any kind of link, not only links to lists of validated runs, so an automated consumer would have to analyse what each link points to.
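To illustrate the problem, here is a minimal sketch of what a consumer has to do today. The function name `guess_validated_run_links` and the text-matching heuristic are assumptions made for illustration, not existing code in the repository; the two records are trimmed copies of the examples above.

```python
def guess_validated_run_links(record):
    """Heuristically pick abstract links that look like validated-run lists.

    Hypothetical helper: the matching rule below is an assumption,
    not an agreed convention.
    """
    abstract = record.get("abstract", {})
    abstract_text = abstract.get("description", "").lower()
    guessed = []
    for link in abstract.get("links", []):
        link_text = link.get("description", "").lower()
        # Accept a link if its own description mentions validated runs, or,
        # when the link carries no description, if the abstract text does.
        if "validated runs" in link_text or (
            not link_text and "validated runs" in abstract_text
        ):
            guessed.append(link["recid"])
    return guessed


record_1 = {
    "recid": "1",
    "abstract": {
        "description": "BTau primary dataset in AOD format from RunB of 2010 "
        "The list of validated runs, which must be applied to all analyses, "
        "can be found in",
        "links": [{"recid": "1000"}],
    },
}

record_30560 = {
    "recid": "30560",
    "abstract": {
        "description": "MuOnia primary dataset in NANOAOD format from RunH of 2016.",
        "links": [
            {"description": "Validated runs, full validation", "recid": "14220"},
            {"description": "Validated runs, muons only", "recid": "14221"},
        ],
    },
}

print(guess_validated_run_links(record_1))
print(guess_validated_run_links(record_30560))
```

Note that the heuristic would wrongly pick up any unlabeled, unrelated link in an abstract that happens to mention validated runs, and it cannot tell the full validation from the muons-only one without parsing free text.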

### Proposed behaviour

It would be good to introduce a new dedicated JSON field to hold the information about validated runs for the given dataset, so that this information could be easily machine-accessible in data production workflows.

For example:

```json
  "validated_runs": [
    {
      "recid": "14221",
      "validation": "muonsonly"
    },
    {
      "recid": "14220",
      "validation": "full"
    }
  ]
```
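With such a field, reading the information becomes a plain lookup. The following is only a sketch under the assumption that the field is named `validated_runs` and uses the `recid` and `validation` keys exactly as in the example above; the helper name `validated_run_recids` is hypothetical.

```python
def validated_run_recids(record, validation=None):
    """Return recids of validated-run lists, optionally filtered by validation type.

    Assumes the proposed (not yet existing) `validated_runs` field.
    """
    entries = record.get("validated_runs", [])
    return [
        entry["recid"]
        for entry in entries
        if validation is None or entry.get("validation") == validation
    ]


# Record 30560 as it might look with the proposed field (illustrative only).
record = {
    "recid": "30560",
    "validated_runs": [
        {"recid": "14221", "validation": "muonsonly"},
        {"recid": "14220", "validation": "full"},
    ],
}

print(validated_run_recids(record))
print(validated_run_recids(record, validation="full"))
```

Records without the field simply yield an empty list, so a consumer could fall back to the abstract links during a transition period.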

Any suggestions?

### Notes

Would it also be desirable to enrich the older CMS records, such as record 1 cited above, with muons-only validated-run information? Or is the full validation sufficient for these Run1 datasets?

CC @IssaAlBawwab @nasseralbess @jmhogan @katilp @tpmccauley
