
[Question] Output of CLI command different from documentation? (v6.1.1) #251


Open
Coco501 opened this issue Apr 17, 2025 · 4 comments

@Coco501

Coco501 commented Apr 17, 2025

Hello! I have a few questions regarding the SIRIUS CLI, specifically the output when running commands. I am using SIRIUS version 6.1.1 on Ubuntu inside a Docker container based on the eclipse-temurin:21 Java image.

So far, my commands have used the following format (based on the quick-start CLI documentation):
sirius --input tests/test.mgf --output query-results/sirius-output formula -p orbitrap fingerprint compound-classes write-summaries --output query-results/sirius-summary

Based on the documentation, I would expect this command to create a formula_candidates.tsv file under the summary directory with the ten most likely compounds (along with other files). When I run this command, I get a sirius-output.sirius file and a sirius-summary folder under query-results/, which is expected, but inside the sirius-summary folder I get 6 files: canopus_formula_summary.tsv, canopus_structure_summary.tsv, denovo_structure_identifications.tsv, spectral_matches.tsv, structure_identifications.tsv, and formula_identifications.tsv. The formula_identifications.tsv file is the only one with any content besides headers, and that content is a single compound identified by SIRIUS from the MGF file. Why is there only one compound listed? Is it because SIRIUS is highly confident that this one is correct? Or should I be seeing a list of 10 compounds ordered by rank? And is this formula_identifications.tsv file the same as the formula_candidates.tsv file mentioned in the documentation? If not, where is that file, and why is it not being produced?
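
One way to check which summary files actually contain data rows is sketched below. It assumes the summary directory path from the command above and that every .tsv file starts with exactly one header row; `count_tsv_rows` is a hypothetical helper name and not part of SIRIUS.

```python
import csv
from pathlib import Path


def count_tsv_rows(summary_dir):
    """Return {file name: number of data rows} for every .tsv file in summary_dir."""
    counts = {}
    for path in sorted(Path(summary_dir).glob("*.tsv")):
        with path.open(newline="", encoding="utf-8") as handle:
            rows = list(csv.reader(handle, delimiter="\t"))
        # Assumption: the first row is the header, everything after it is data.
        counts[path.name] = max(len(rows) - 1, 0)
    return counts


if __name__ == "__main__":
    # Path taken from the write-summaries --output argument in the command above.
    for name, n in count_tsv_rows("query-results/sirius-summary").items():
        print(f"{name}: {n} data row(s)")
```

With the output described here, this would be expected to report one data row for formula_identifications.tsv and zero for the other five files.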

Also, I am struggling to find information on the structure command. What does adding structure --database pubchem to my command do? Which database does the command use if none is supplied? Is it always better to use PubChem than no specified database? I read that PubChem has over 111 million formulas?

Another question I had was regarding the ordering of the commands. I noticed that some commands do not work when you change their order. Is there a documentation page that shows which commands are expected in which position? I couldn't find any information on it. sirius --help gave me a little more insight, but I am still unsure. Thank you!

Here is the testing MGF file I have been using:

BEGIN IONS
PEPMASS=413.26611887841
CHARGE=1+

189.48956 1.9
283.62076 3.4
301.22977 66.3
311.08008 1.3
399.99106 2.3

END IONS
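
For anyone who wants to sanity-check the input before handing it to SIRIUS, here is a minimal sketch of reading this MGF block in Python. It only handles the subset of the format shown above (BEGIN IONS / END IONS, KEY=VALUE lines, and two-column peak lines); `parse_mgf` is a hypothetical helper, not a SIRIUS function.

```python
def parse_mgf(text):
    """Parse MGF text into a list of {'params': {...}, 'peaks': [(mz, intensity), ...]}."""
    spectra, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines inside or between blocks are ignored
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line:
                # Header line such as PEPMASS=... or CHARGE=...
                key, value = line.split("=", 1)
                current["params"][key.strip().upper()] = value.strip()
            else:
                # Peak line: m/z followed by intensity
                mz, intensity = line.split()[:2]
                current["peaks"].append((float(mz), float(intensity)))
    return spectra


EXAMPLE = """BEGIN IONS
PEPMASS=413.26611887841
CHARGE=1+

189.48956 1.9
283.62076 3.4
301.22977 66.3
311.08008 1.3
399.99106 2.3

END IONS
"""

spectrum = parse_mgf(EXAMPLE)[0]
base_peak = max(spectrum["peaks"], key=lambda peak: peak[1])
print(spectrum["params"]["PEPMASS"], len(spectrum["peaks"]), base_peak)
```

For the file above this finds one spectrum with five peaks, the most intense being the one at m/z 301.22977.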

And the output inside formula_identifications.tsv has been the following (how come some entries are empty, like overallFeatureQuality?):
1 C21H42O5 [M + K]+ C21H42KO5+ 0.677 14.089 14.089 0.000 1 0.882 1.862 1.862 -0.638 413.266 700493078913457792 700493074744319612 700493074744319612_UNKNOWN_FEATURE_1
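
As a cross-check of the row above, the theoretical m/z of the reported precursor formula C21H42KO5+ can be computed by hand and compared with the PEPMASS in the MGF file. The sketch below uses standard monoisotopic masses; the helper names are hypothetical, and mapping the result to a specific column of the row is an assumption, since the column headers are not shown here.

```python
# Monoisotopic masses in Da (most abundant isotope of each element) and the electron mass.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503223, "O": 15.99491461957, "K": 38.9637064864}
ELECTRON = 0.000548579909


def ion_mz(composition, charge=1):
    """m/z of a positively charged ion whose composition already includes the adduct atoms."""
    neutral = sum(MONOISOTOPIC[element] * count for element, count in composition.items())
    return (neutral - charge * ELECTRON) / charge


def ppm_error(measured, theoretical):
    """Relative deviation of measured from theoretical, in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


# C21H42O5 with a potassium adduct, [M + K]+, i.e. C21H42KO5+
theoretical = ion_mz({"C": 21, "H": 42, "K": 1, "O": 5})
measured = 413.26611887841  # PEPMASS from the MGF file
error = ppm_error(measured, theoretical)
print(round(theoretical, 5), round(error, 3))
```

The deviation comes out at roughly -0.64 ppm, which appears to line up with the -0.638 value in the row, so that entry is probably the precursor mass error in ppm.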

Thank you so much for your time!

Joe

@MartinHoffmannJena
Collaborator

Hi,

You might be looking at outdated documentation. This is the correct link for SIRIUS 6: https://v6.docs.sirius-ms.io. It contains information on most of your questions. Could you please let me know where you got the link to the old documentation, so that we can replace it there as well?

For questions not answered in the docs, our Gitter community at https://matrix.to/#/#sirius-ms:gitter.im would be a better place to ask.

@Coco501
Author

Coco501 commented Apr 17, 2025

Hi Martin, thank you for the reply!

It seems that, even though I was aware of the deprecated v5 documentation, I somehow ended up reading parts of it for the Quick Start CLI guide. That's where my confusion about formula_candidates.tsv came from. My apologies. I still have a few questions after re-reading the proper documentation, though.

How many compounds can be expected in formula_identifications.tsv? My understanding of the documentation is that there should be several per command, ranked by SIRIUS score, but I always receive only one. I am assuming that is because SIRIUS has very high confidence in that one compound, but I am not sure. The reason I'm skeptical is that when I provide exact MSMS and PEPMASS values I get one result, and when I lower the precision of the PEPMASS, for example from 403.4015132 to 403.4015, I get a different result. I am expecting to see several compounds listed with different SIRIUS scores, and I would expect the previous result to be among them, but maybe my lack of knowledge in this field is causing this confusion. If so, I apologize. Any clarification on that would be great.
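
To put the PEPMASS rounding into numbers, the shift between the two values can be expressed in ppm. This is only a small arithmetic sketch; the 5 ppm tolerance used for comparison is an illustrative assumption and not a value taken from this thread or from the SIRIUS configuration.

```python
def ppm_shift(original, rounded):
    """Relative change from original to rounded, in parts per million."""
    return (rounded - original) / original * 1e6


ASSUMED_TOLERANCE_PPM = 5.0  # hypothetical tolerance, for comparison only

shift = ppm_shift(403.4015132, 403.4015)
print(f"shift: {shift:.4f} ppm, within assumed tolerance: {abs(shift) < ASSUMED_TOLERANCE_PPM}")
```

The rounding moves the precursor by only about 0.03 ppm, so under that assumption the change in mass on its own is very small. This only quantifies the change; it does not explain why the reported result differs.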

I have joined the gitter community and will forward my questions there from now on. Thank you again!

Joe

@kaibioinfo
Contributor

kaibioinfo commented Apr 18, 2025

The summary files for the project should contain only one answer per compound (because it is a "summary"). You can access the original ranking via the REST API.

@Coco501
Author

Coco501 commented Apr 18, 2025

Thank you Kai!
