|
13 | 13 | " - `communities/`: Contains all the files that will be generated following the code in this notebook.\n", |
14 | 14 | " - `bedToBigBed`: Program to convert .bed to bigBed format as explained here: https://genome.ucsc.edu/goldenPath/help/bigBed.html\n", |
15 | 15 | " - `genomes.txt` / `hub.txt`: Files needed for Tracking Hub\n", |
16 | | - " - `hg19.chrom.size`: File needed to execute this notebook, downloaded from https://genome.ucsc.edu/goldenPath/help/bigBed.html\n", |
| 16 | + " - `hg38.chrom.sizes`: File needed to execute this notebook, downloaded from https://github.com/igvteam/igv/blob/master/genomes/sizes/hg38.chrom.sizes\n", |
17 | 17 | " - `process_beds.sh`: Script that will convert all .bed files to bigBed format, deleting all .bed files. In practice, it executes Example \\#2 from this link: https://genome.ucsc.edu/goldenPath/help/bigBed.html\n", |
18 | 18 | "\n", |
19 | 19 | "\n", |
20 | | - "Link provided to Track Hub: https://raw.githubusercontent.com/tjiagoM/gtex-transcriptome-modelling/master/track_hub/hub.txt" |
| 20 | + "The link provided to Track Hub is: https://raw.githubusercontent.com/tjiagoM/gtex-transcriptome-modelling/master/track_hub/hub.txt\n", |
| 21 | + "\n", |
| 22 | + "For a complete set of genes for all communities, we also provide the file `outputs/all_communities_genes.txt`. Some genes could not be mapped by the code in this notebook, so the Track Hub contains an incomplete set of genes, whereas `all_communities_genes.txt` is complete. As in the paper, we only consider communities with more than 3 genes." |
21 | 23 | ] |
22 | 24 | }, |
23 | 25 | { |
|
56 | 58 | "# Saving the chromosome's limits, based on the file `hg19.chrom.sizes` downloaded from https://genome.ucsc.edu/goldenPath/help/bigBed.html\n", |
57 | 59 | "\n", |
58 | 60 | "chr_limits = dict()\n", |
59 | | - "with open('track_hub/hg19.chrom.sizes', 'r') as reader:\n", |
| 61 | + "with open('track_hub/hg38.chrom.sizes', 'r') as reader:\n", |
60 | 62 | " lines = reader.readlines()\n", |
61 | 63 | " for line in lines:\n", |
62 | 64 | " line_info = line.split('\\t')\n", |
|
104 | 106 | }, |
105 | 107 | { |
106 | 108 | "cell_type": "code", |
107 | | - "execution_count": 5, |
| 109 | + "execution_count": 6, |
108 | 110 | "metadata": {}, |
109 | 111 | "outputs": [], |
110 | 112 | "source": [ |
111 | | - "with open('track_hub/communities/trackDb.txt', 'w') as f_track_hubs:\n", |
| 113 | + "with open('track_hub/hg38/trackDb.txt', 'w') as f_track_hubs:\n", |
112 | 114 | " for tissue in TISSUES:\n", |
113 | 115 | " try:\n", |
114 | 116 | " for community_id in range(1, 999999):\n", |
115 | 117 | " arr_com = []\n", |
116 | 118 | " dic_community = pickle.load(open(\"svm_results/\" + tissue + '_' + str(community_id) + \".pkl\", \"rb\"))\n", |
117 | 119 | " len_common = len(dic_community['genes'])\n", |
118 | 120 | "\n", |
119 | | - " with open(f'track_hub/communities/{tissue}_{community_id}.bed', 'w') as f:\n", |
| 121 | + " with open(f'track_hub/hg38/{tissue}_{community_id}.bed', 'w') as f:\n", |
120 | 122 | " for gene in dic_community['genes']:\n", |
121 | 123 | " if gene in dic_all_genes_info.keys():\n", |
122 | 124 | " gene_info = dic_all_genes_info[gene]\n", |
123 | 125 | " f.write(f'{gene_info[\"chr\"]}\\t{gene_info[\"chr_start\"]}\\t{gene_info[\"chr_end\"]}\\n')\n", |
124 | 126 | " \n", |
125 | 127 | " f_track_hubs.write(f'track {tissue}_{community_id}\\n')\n", |
126 | | - " f_track_hubs.write(f'bigDataUrl https://raw.githubusercontent.com/tjiagoM/gtex-transcriptome-modelling/master/track_hub/communities/{tissue}_{community_id}.bb\\n')\n", |
| 128 | + " f_track_hubs.write(f'bigDataUrl https://raw.githubusercontent.com/tjiagoM/gtex-transcriptome-modelling/master/track_hub/hg38/{tissue}_{community_id}.bb\\n')\n", |
127 | 129 | " f_track_hubs.write(f'shortLabel {tissue}_{community_id}\\n')\n", |
128 | 130 | " f_track_hubs.write(f'longLabel {tissue}_{community_id}\\n')\n", |
129 | 131 | " f_track_hubs.write(f'type bigBed\\n')\n", |
|
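Note on the `chr_limits` hunk: the diff cuts off right after `line.split('\t')`, so the rest of that loop is not visible here. As a minimal sketch of what reading a two-column `chrom.sizes` file into a dictionary could look like, assuming the usual UCSC layout of chromosome name and length separated by a tab, the following may help when checking the hg19 to hg38 switch. The function name `read_chrom_sizes` and the in-memory sample are illustrative assumptions, not code from the notebook.

```python
import io


def read_chrom_sizes(handle):
    """Parse a chrom.sizes-style stream into {chromosome: length}.

    Hypothetical helper: assumes each non-empty line holds a chromosome
    name and an integer length separated by a tab.
    """
    limits = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        name, size = line.split('\t')[:2]
        limits[name] = int(size)
    return limits


# Small in-memory sample in the same layout as hg38.chrom.sizes
sample = io.StringIO("chr1\t248956422\nchrM\t16569\n")
chr_limits = read_chrom_sizes(sample)
```

In the notebook itself, the handle would be the file opened from `track_hub/hg38.chrom.sizes`, as in the changed line of the diff.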