Fixed typos

jghanaim04 · web-flow · commit 7bd2570253d9 · 2025-03-17T16:49:45.000-04:00
diff --git a/GoogleCloud/01-RNA-Seq/RNA-seq.ipynb b/GoogleCloud/01-RNA-Seq/RNA-seq.ipynb
@@ -26,7 +26,7 @@
     "    <b>[1] Reference:</b> Virkud, Y. V., Kelly, R. S., Wood, C., & Lasky-Su, J. A. (2019). The nuts and bolts of omics for the clinical allergist. Annals of Allergy, Asthma and Immunology, 123(6), 558-563.\n",
     "</div>\n",
     "\n",
-    "The central dogma of molecular biology is the representation of omes and omics. Omic data of various kinds are frequently employed in human medical research. The fields of omic research include the omic data produced by the central dogma, which includes the fields of genomics (DNA), transcriptomics (RNA), proteomics (proteins), and metabolomics (small molecules, including amino acids, fatty acids, carbohydrates, vitamins, lipids, and nucleotides); however, new types of omic data have emerged, including the fields of epigenomics (methyl tags and histones), exposomics (allergens, toxins, diet (bacteria and microorganisms). As a result, the majority of early scientific efforts were devoted to describing the genome, transcriptome, and proteome. However, seven major omics disciplines are currently being investigated in great detail: the genome (DNA), transcriptome (RNA), proteome (proteins), epigenome (DNA modifications that influence expression), metabolome (metabolites), microbiome (microbiota), and exposome (exposures).\n",
+    "The central dogma of molecular biology is the representation of omes and omics. Omic data of various kinds are frequently employed in human medical research. The fields of omic research include the omic data produced by the central dogma: genomics (DNA), transcriptomics (RNA), proteomics (proteins), and metabolomics (small molecules, including amino acids, fatty acids, carbohydrates, vitamins, lipids, and nucleotides). However, new types of omic data have emerged, including epigenomics (methyl tags and histones) and exposomics (allergens, toxins, diet, bacteria and microorganisms). Early scientific efforts were largely devoted to describing the genome, transcriptome, and proteome. Today, seven major omics disciplines are being investigated in great detail: the genome (DNA), transcriptome (RNA), proteome (proteins), epigenome (DNA modifications that influence expression), metabolome (metabolites), microbiome (microbiota), and exposome (exposures).\n",
     "\n",
     "### <span> Next Generation Sequencing Technique for RNA-Seq. <span>\n",
     "\n",
@@ -121,7 +121,7 @@
     "\n",
     "A bioinformatics pipeline called nf-core/rnaseq can be used to analyze RNA sequencing data from organisms with an annotated reference genome. This pipeline represents different stages of the analysis. It contains all the analysis steps, starting from preprocessing of the fastq data followed by genome alignment and quantification. Gene expression levels are generated from mRNA and miRNA sequencing data using RNA-Seq quantification. The next step is pseudo-alignment and quantification, followed by post-processing of the data, and then the final quality control of the input data is performed. The different colors of the pipeline represent the different methods of processing the fastq files. For example, the black line represents STAR, quantification, and salmon software usage to process the files. The user can choose any method of their choice while processing their files. \n",
     "\n",
-    "This step is **optional** as it is the preprocessing step to let you experience generating your own gene counts table. To save on computational and storage resources, we have already provided the gene count table with this module that will be copied from our bucket in step 3. The gene counts can also be extracted from the NCBI's GEO website using the same data acccession under the supplementary files section.  \n",
+    "This step is **optional**: it is a preprocessing step that lets you generate your own gene counts table. To save on computational and storage resources, we have already provided the gene count table with this module that will be copied from our bucket in step 3. The gene counts can also be downloaded from NCBI's GEO website using the same data accession, under the supplementary files section.  \n",
     "\n",
     "If however you want to try the nextflow analysis, here are a few tips to help you along. First, if you are not using NIH Cloud Lab as your environment, you need to configure the Nextflow Service Account following [this guide](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/HowToCreateNextflowServiceAccount.md). Second, you will need to configure your config file to point to Google Batch. We provide a template that you can modify with your GCP bucket (need to create one, `gsutil mb gs://UNIQUE-BUCKET-NAME` and your project ID. Again, you only need to create the service account if not using NIH Cloud Lab. For further details on how to use Nextflow for RNA Seq analysis, please refer to [nf-core/rnaseq](https://nf-co.re/rnaseq) or [Transcriptome-Assembly-Refinement-and-Applications](https://github.com/NIGMS/Transcriptome-Assembly-Refinement-and-Applications) module to learn more about pre-processing through Nextflow."
    ]
@@ -233,7 +233,7 @@
    "outputs": [],
    "source": [
     "system('export NXF_MODE=google') \n",
-    "#Install nexflow, make it exceutable, and update it\n",
+    "#Install nextflow, make it executable, and update it\n",
     "system('curl https://get.nextflow.io | bash' , intern=TRUE)\n",
     "system('chmod +x nextflow' , intern=TRUE)\n",
     "system('./nextflow self-update' , intern=TRUE)"
@@ -275,7 +275,7 @@
    "source": [
     "<div class=\"alert alert-block alert-info\">\n",
     "    <i class=\"fa fa-lightbulb-o\" aria-hidden=\"true\"></i>\n",
-    "    <b>Tip: </b> If you don't immediately see a output on your screen check your output directory you have pointed to in your config file to insure that Nextflow is running. You should see some output directories/files.\n",
+    "    <b>Tip: </b> If you don't immediately see any output on your screen, check the output directory you pointed to in your config file to ensure that Nextflow is running. You should see some output directories/files.\n",
     "</div>"
    ]
   },
@@ -340,7 +340,7 @@
    "source": [
     "<div class=\"alert alert-block alert-success\">\n",
     "    <i class=\"fa fa-hand-paper-o\" aria-hidden=\"true\"></i>\n",
-    "    <b>Note: </b>  If you've used Nextflow to produce your gene counts table and would like to use it for the down processing analysis instead of the provided counts table enter your own files into the code above by copying the <b>salmon.merged.gene_counts.tsv</b> from the salmon subdirectory within your Nextflow output directory.\n",
+    "    <b>Note: </b>  If you've used Nextflow to produce your gene counts table and would like to use it for the downstream analysis instead of the provided counts table, enter your own files into the code above by copying the <b>salmon.merged.gene_counts.tsv</b> from the salmon subdirectory within your Nextflow output directory.\n",
     "</div>"
    ]
   },
@@ -406,7 +406,7 @@
    "outputs": [],
    "source": [
     "DESeq.ds <- DESeq.ds[ rowSums(counts(DESeq.ds)) > 0, ]\n",
-    "#Inspect data after manupalation\n",
+    "#Inspect data after manipulation\n",
     "rowSums(counts(DESeq.ds)) %>% head\n",
     "colSums(readcounts)\n",
     "colSums(counts(DESeq.ds)) \n",
@@ -776,8 +776,22 @@
   }
  ],
  "metadata": {
+  "kernelspec": {
+   "display_name": "conda_python3",
+   "language": "python",
+   "name": "conda_python3"
+  },
   "language_info": {
-   "name": "python"
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.16"
   }
  },
  "nbformat": 4,
diff --git a/GoogleCloud/02-RRBS/RRBS-downstream.ipynb b/GoogleCloud/02-RRBS/RRBS-downstream.ipynb
@@ -39,7 +39,7 @@
     "\n",
     "<figure>\n",
     "<img src=\"../../images/epigenetic-mech.jpeg\" width=\"700\" height=\"500\">\n",
-    "<figcaption align = \"center\"> <b> Fig 1: Affect of epigenetic mechanisms on health. [1] </b> </figcaption>\n",
+    "<figcaption align = \"center\"> <b> Fig 1: Effect of epigenetic mechanisms on health. [1] </b> </figcaption>\n",
     "    \n",
     "</figure>\n",
     "    \n",
@@ -76,7 +76,7 @@
     "    \n",
     "This figure represents the analysis architecture followed in this module. The module has been designed according to the resources and the availability of data. The analysis steps represent the pipeline that can be implemented using the Nextflow nf-core/methylseq module. In this figure, the analysis steps to perform methyl seq are shown. Now, there are two different workflows that can be followed to implement this pipeline. The first one is Bismark workflow, where it shows all the tools which can be used for each step of the analysis. We have a similar tools list for each step for the bwa-meth workflow. Both of them are very popular workflows to implement methylseq pipeline.\n",
     "    \n",
-    "The sample command to run nf-core methylseq pipeline to generate quality control reports and extract methylation call and coverage file is provided below. #### This step is <u>optional</u> as it is the preprocessing step to let you experience generating your own methylation coverage file. To save on computational and storage resources, we have already provided the methylation coverage file you will use in the down processing analysis in step 3. \n",
+    "The sample command to run the nf-core methylseq pipeline to generate quality control reports and extract methylation calls and coverage files is provided below. This step is <u>optional</u>: it is a preprocessing step that lets you generate your own methylation coverage file. To save on computational and storage resources, we have already provided the methylation coverage file you will use in the downstream analysis in step 3. \n",
     "    \n",
     "If you choose to generate your own methylation coverage file then refer to the instructions outlined in the RNAseq submodule, and refer to the nf-core [methylseq](https://nf-co.re/methylseq). Again, you will need to modify the config file to include your bucket and project ID. "
    ]
@@ -213,7 +213,7 @@
    "source": [
     "<div class=\"alert alert-block alert-info\">\n",
     "    <i class=\"fa fa-lightbulb-o\" aria-hidden=\"true\"></i>\n",
-    "    <b>Tip: </b> If you don't immediately see a output on your screen check your output directory you have pointed to in your config file to insure that Nextflow is running. You should see some output directories/files.\n",
+    "    <b>Tip: </b> If you don't immediately see any output on your screen, check the output directory you pointed to in your config file to ensure that Nextflow is running. You should see some output directories/files.\n",
     "</div>"
    ]
   },
@@ -300,7 +300,7 @@
    "source": [
     "<div class=\"alert alert-block alert-success\">\n",
     "    <i class=\"fa fa-hand-paper-o\" aria-hidden=\"true\"></i>\n",
-    "    <b>Note: </b>  If you've used Nextflow to produce your methylation coverage files and would like to use them for the down processing analysis instead of the test data provided enter your own files into the two previous code cells above with by copying them from the <b>bismark</b> subdirectory within your Nextflow outputs directory.\n",
+    "    <b>Note: </b>  If you've used Nextflow to produce your methylation coverage files and would like to use them for the downstream analysis instead of the provided test data, enter your own files into the two code cells above by copying them from the <b>bismark</b> subdirectory within your Nextflow output directory.\n",
     "</div>"
    ]
   },
@@ -351,7 +351,7 @@
    "metadata": {},
    "source": [
     "### Filter Step\n",
-    "Filtering samples based on coverage can often be useful. Specifically, if samples have overamplification or PCR bias, it can be useful to discard bases that have a very high read coverage. Bases with a very low read coverage should also be discarded because they tend to produce statistics that are unreliable and unstable in the downstream analyses. The code shown below filters a methylRawList and discards bases that have covereage below 10 reads, which was already done when the files were read in. Additionally, the code below discards bases with more than 99.9th percentile coverage in each sample."
+    "Filtering samples based on coverage can often be useful. Specifically, if samples have overamplification or PCR bias, it can be useful to discard bases that have a very high read coverage. Bases with a very low read coverage should also be discarded because they tend to produce statistics that are unreliable and unstable in the downstream analyses. The code shown below filters a methylRawList and discards bases that have coverage below 10 reads, which was already done when the files were read in. Additionally, the code below discards bases whose coverage exceeds the 99.9th percentile in each sample."
    ]
   },
   {
@@ -540,7 +540,7 @@
    "source": [
     "### <span> Differential Methylation </span>\n",
     "### Single CpG Sites\n",
-    "Once we have confirmed that the basic statistics and data structures of the samples are reasonable, we can proceed to differential methylation. Differential DNA methylation is usually calculated by comparing the proportion of methylated Cs in a test sample relative to a control. The Fisher's Exact Test and similar methods can be applied when there are no replicates for the test and control cases. This can allow us to make simple comparisons between the pairs of samples such as the test and control. When replicates are present, regression based methods are typically used to model the methylation levels relative to the sample groups and variation between the replicates. Regression methods also have another additional advantage over the use of the Fisher's Exact test in that they all for the inclusion of sample specific covariates (categorical or continuous) as well as the ability to adjust for confounding variables. \n",
+    "Once we have confirmed that the basic statistics and data structures of the samples are reasonable, we can proceed to differential methylation. Differential DNA methylation is usually calculated by comparing the proportion of methylated Cs in a test sample relative to a control. Fisher's Exact Test and similar methods can be applied when there are no replicates for the test and control cases. This allows simple pairwise comparisons, such as test versus control. When replicates are present, regression-based methods are typically used to model the methylation levels relative to the sample groups and the variation between replicates. Regression methods have a further advantage over Fisher's Exact Test: they allow the inclusion of sample-specific covariates (categorical or continuous) and can adjust for confounding variables. \n",
     "\n",
     "There are three options provided to get the differential methylation results namely Fisher’s Exact Test, Betabinomial Distribution Based Test, and Logistic Regression Based Test as you will see below. Only the Fisher’s exact test and the Logistic Regression based test will be explored. If you plan to use Betabinomial Distribution Based Test or compare the results of all three types of tests, the code can be uncommented. "
    ]
@@ -615,7 +615,7 @@
    "metadata": {},
    "source": [
     "### Optional: Betabinomial-Distribution-Based Tests\n",
-    "The beta-binominal model for calculating the differential methylation can be accessed through the code below. This accounts for both sampling and epigenetic variablity, and is useful for better modeling of the variance. This model follows the binominal distribution of the number of reads which is similar to how logistic regression works. However, the beta distribution can have varying methylation proportions across samples.\n",
+    "The beta-binomial model for calculating the differential methylation can be accessed through the code below. This model accounts for both sampling and epigenetic variability, which gives a better model of the variance. Like logistic regression, it treats the number of methylated reads as binomially distributed; unlike logistic regression, it places a beta distribution on the methylation proportion, so that the proportion can vary across samples.\n",
     "\n",
     "If you plan to use Betabinomial Distribution Based Test or compare the results of all three types of tests, the code can be uncommented. "
    ]
@@ -947,8 +947,22 @@
   }
  ],
  "metadata": {
+  "kernelspec": {
+   "display_name": "conda_python3",
+   "language": "python",
+   "name": "conda_python3"
+  },
   "language_info": {
-   "name": "python"
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.16"
   }
  },
  "nbformat": 4,
diff --git a/GoogleCloud/03-Integration/Integration.ipynb b/GoogleCloud/03-Integration/Integration.ipynb