Commit 53fa7e1

Merge pull request #49 from emdb-empiar/readme
Readme
2 parents 53d0dfa + fb7a6ba commit 53fa7e1

2 files changed: 98 additions, 2 deletions

README.md

Lines changed: 98 additions & 1 deletion
@@ -1 +1,98 @@
-# added_annotations
+# Added annotations: EMICSS (**EM**DB **I**ntegration with **C**omplexes, **S**tructures and **S**equences)

This repository provides tools and scripts for extracting annotations and adding them to EMDB entries. These annotations enhance the metadata associated with EM datasets.

### Table of Contents

* Installation
* Configuration
* Usage
* Further information

### Installation

To install the necessary dependencies, run:

    pip install -r requirements.txt

### Configuration

The scripts read their settings from a config.ini file, which is not included in the repository. Create this file in the root directory of the project with the following structure:

```
[file_paths]
uniprot_tab: <path_to_file>/uniprot.tsv
CP_ftp: <path_to_file>/complextab
components_cif: <path_to_file>/components.cif
chem_comp_list: <path_to_file>/chem_comp_list.xml
pmc_ftp_gz: <path_to_file>/PMID_PMCID_DOI.csv.gz
pmc_ftp: <path_to_file>/PMID_PMCID_DOI.csv
emdb_pubmed: <path_to_file>/emdb_pubmed.log
emdb_orcid: <path_to_file>/emdb_orcid.log
assembly_ftp: <path_to_file>/assembly/
BLAST_DB: <path_to_file>/ncbi-blast-2.13.0+/database/uniprot_sprot
BLASTP_BIN: blastp
sifts_GO: <path_to_file>/pdb_chain_go.csv
GO_obo: <path_to_file>/go.obo
GO_interpro: /nfs/ftp/pub/databases/GO/goa/external2go/interpro2go
sifts: <path_to_file>/split_xml/
alphafold_ftp: <path_to_file>/accession_ids.txt
rfam_ftp: <path_to_file>/rfam_files_combined.txt

[api]
pmc: https://www.ebi.ac.uk/europepmc/webservices/rest/searchPOST
```
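
Before running the scripts, it can help to check that config.ini parses and that the configured paths exist. The following is a minimal sketch using Python's standard `configparser`, not the repository's own loader: the helper names `load_config` and `missing_paths` are hypothetical, and keeping option names case-sensitive is an assumption, since the scripts may read them differently.

```python
import configparser
from pathlib import Path


def load_config(path="config.ini"):
    """Parse config.ini and return the [file_paths] and [api] sections as dicts."""
    # interpolation=None keeps any '%' characters in values literal.
    parser = configparser.ConfigParser(interpolation=None)
    # Assumption: keep option names such as CP_ftp and BLAST_DB as written;
    # by default configparser lower-cases them.
    parser.optionxform = str
    if not parser.read(path):
        raise FileNotFoundError(f"Could not read {path}")
    return dict(parser["file_paths"]), dict(parser["api"])


def missing_paths(file_paths, skip=("BLASTP_BIN",)):
    """Return the option names whose configured path does not exist on disk."""
    return [
        name
        for name, value in file_paths.items()
        if name not in skip and not Path(value).exists()
    ]
```

Entries that are not plain paths can be passed in `skip`: `BLASTP_BIN` is a command name, `BLAST_DB` is a database prefix rather than a single file, and `emdb_pubmed` and `emdb_orcid` are only created once the publication mapping has run.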

#### File Sources and Download Links
| File | Description | Download Link |
|------|-------------|---------------|
| uniprot.tsv | UniProt annotations | https://rest.uniprot.org/uniprotkb/stream?fields=accession,xref_pdb,protein_name&query=((database:pdb))&format=tsv&compressed=false |
| complextab | Complex Portal data | https://ftp.ebi.ac.uk/pub/databases/complexportal/complexes.tab.gz |
| components.cif | Chemical components data | https://ftp.ebi.ac.uk/pub/databases/msd/pdbechem_v2/ccd/components.cif |
| chem_comp_list.xml | Chemical component list | https://ftp.ebi.ac.uk/pub/databases/msd/pdbechem_v2/ccd/chem_comp_list.xml |
| PMID_PMCID_DOI.csv.gz | Europe PMC dataset (compressed) | https://europepmc.org/pub/databases/pmc/DOI/PMID_PMCID_DOI.csv.gz |
| PMID_PMCID_DOI.csv | Unzipped version of the Europe PMC dataset | https://ftp.ebi.ac.uk/pub/databases/pmc/DOI/PMID_PMCID_DOI.csv |
| emdb_pubmed | Mapping file created after running PublicationMapping.py | emdb_pubmed.log |
| emdb_orcid | Mapping file created after running PublicationMapping.py | emdb_orcid.log |
| assembly_ftp | PDB assemblies | https://ftp.ebi.ac.uk/pub/databases/msd/assemblies/split/ |
| BLAST_DB | UniProt BLAST database | https://ftp.uniprot.org/pub/databases/uniprot/uniprot_sprot/uniprot_sprot.fasta.gz |
| sifts_GO | PDB chain Gene Ontology mapping | https://ftp.ebi.ac.uk/pub/databases/msd/sifts/pdb_chain_go.csv |
| GO_obo | Gene Ontology definitions | https://current.geneontology.org/ontology/go.obo |
| GO_interpro | InterPro to GO mapping | https://ftp.ebi.ac.uk/pub/databases/GO/goa/external2go/interpro2go |
| sifts | SIFTS data | https://ftp.ebi.ac.uk/pub/databases/msd/sifts/split_xml/ |
| alphafold_ftp | AlphaFold DB accession IDs | https://ftp.ebi.ac.uk/pub/databases/alphafold/accession_ids.csv |
| rfam_ftp | Rfam files | https://www.ebi.ac.uk/pdbe/search/pdb/select?q=emdb_id:*%20AND%20rfam:%5B*%20TO%20*%5D&wt=csv&fl=emdb_id,pdb_id,rfam,rfam_id,entity_id&rows=9999999 |
| emd-xxxx-v30.xml | EMDB metadata | https://ftp.ebi.ac.uk/pub/databases/emdb/ |
| xxxxx.xml | EMPIAR metadata | https://ftp.ebi.ac.uk/pub/databases/emtest/empiar |
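
The single-file entries in the table can be fetched with the Python standard library alone. The sketch below is one possible approach, not part of the repository: `fetch` and `gunzip` are hypothetical helper names, and the `data/` directory in the commented example is an assumption that should match the paths set in config.ini.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path


def fetch(url, dest):
    """Stream the resource at `url` into the local file `dest`."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def gunzip(src, dest):
    """Decompress a .gz file, e.g. PMID_PMCID_DOI.csv.gz to PMID_PMCID_DOI.csv."""
    with gzip.open(src, "rb") as compressed, open(dest, "wb") as out:
        shutil.copyfileobj(compressed, out)
    return Path(dest)


# Example (hypothetical target directory "data"):
# archive = fetch(
#     "https://europepmc.org/pub/databases/pmc/DOI/PMID_PMCID_DOI.csv.gz",
#     "data/PMID_PMCID_DOI.csv.gz",
# )
# gunzip(archive, "data/PMID_PMCID_DOI.csv")
```

Directory listings such as the assemblies and SIFTS `split_xml/` entries hold many files and are better mirrored with a dedicated transfer tool than fetched one URL at a time.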

### Usage

To use the tools and scripts in this repository, clone it and make sure the config.ini file is configured as described above.

#### Executing the scripts:

Execute the scripts independently in the following recommended order:
##### EMPIAR mapping
```
python fetch_empiar.py -w <output_dir_to_store_annotated_empiar_files> -f <path_to_empiar_metadata_files>
```
##### Publication mapping
```
python fetch_pubmed.py -w <output_dir_to_store_annotated_pubmed_files> -f <path_to_emdb_metadata_files>
```
##### Protein, complexes and ligands mapping
```
python added_annotations.py -w <output_dir_to_store_added_annotations> -f <path_to_emdb_metadata_files> --all -t <number_of_threads>
```
##### AlphaFold DB mapping
```
python fetch_afdb.py -w <output_dir_to_store_annotated_alphafdb_files>
```
##### Write files
```
python write_xml.py <output_dir_to_store_EMICSS_xml_files>
```
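
The recommended order above can also be driven from a small wrapper. This is a sketch only: `build_commands` and `run_all` are hypothetical helpers that are not part of the repository, the default thread count is arbitrary, and the flags are copied from the commands listed above rather than from the scripts' argument parsers.

```python
import subprocess
import sys


def build_commands(empiar_in, emdb_in, empiar_out, pubmed_out,
                   annotations_out, afdb_out, xml_out, threads=4):
    """Return the five README commands, in the recommended order."""
    py = sys.executable
    return [
        [py, "fetch_empiar.py", "-w", empiar_out, "-f", empiar_in],
        [py, "fetch_pubmed.py", "-w", pubmed_out, "-f", emdb_in],
        [py, "added_annotations.py", "-w", annotations_out, "-f", emdb_in,
         "--all", "-t", str(threads)],
        [py, "fetch_afdb.py", "-w", afdb_out],
        [py, "write_xml.py", xml_out],
    ]


def run_all(commands):
    """Run each command in turn and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Calling `run_all(build_commands(...))` from the repository root runs the steps one after another. `check=True` raises `subprocess.CalledProcessError` as soon as a script exits with a non-zero status, so later steps do not run on incomplete output.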

### Further information

For more information about EMICSS, visit the official EMICSS website (https://www.ebi.ac.uk/emdb/emicss), which describes the EMDB/EMICSS project in detail.

fetch_pubmed.py

Lines changed: 0 additions & 1 deletion
@@ -41,7 +41,6 @@ def call_ePubmedCentral(pubmed_list, uri):
     if response.status_code == 200:
         try:
             pmcjdata = json.loads(response.text)
-            #hitCount = pmcjdata['hitCount']
             if 'result' in pmcjdata['resultList']:
                 result = pmcjdata['resultList']['result']
                 for pub_data in result:
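
For context, the hunk above decodes a Europe PMC JSON reply and iterates over `resultList.result`. The sketch below shows that access pattern in isolation and is not the code in `fetch_pubmed.py`: the function name `extract_results` is hypothetical, and the defensive handling of missing keys is an addition.

```python
import json


def extract_results(response_text):
    """Return the list of publication records from a Europe PMC JSON reply."""
    try:
        data = json.loads(response_text)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, dict):
        return []
    # 'resultList' may be absent or null, so fall back to an empty dict.
    result_list = data.get("resultList") or {}
    if not isinstance(result_list, dict):
        return []
    return result_list.get("result", [])
```

Returning an empty list for malformed or empty replies lets the caller loop over the records without a separate `'result' in ...` check.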
