Commit 8e1493a

Merge pull request #1502 from cal-itp/portfolio-readme
add more to portfolio README
2 parents 1bd65ca + 6723ec0 commit 8e1493a

1 file changed

+56
-0
lines changed

portfolio/README.md

Lines changed: 56 additions & 0 deletions
@@ -4,3 +4,59 @@ This website contains data analysis and reports developed by Cal-ITP data analys
## Source code

All source code for these analyses and reports may be found [on GitHub](https://github.com/cal-itp/data-analyses).

## Workflow

1. Create a parameterized notebook.
    This is a single Jupyter notebook, set up so that its captions, charts, maps, and headings are constructed from parameters.
    * [pointers for styling notebooks](https://docs.calitp.org/data-infra/publishing/sections/4_notebooks_styling.html)

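
As a rough illustration of what "constructed from parameters" can mean, the sketch below shows plain Python of the kind that might sit in a notebook's cells. The parameter names (`district`, `operator`), the helper functions, and the sample metric are hypothetical and are not taken from any Cal-ITP notebook.

```python
# Hypothetical parameters cell: the build step would replace these
# defaults with the values for one page of the report.
district = "01 - Eureka"
operator = "First Operator"


def make_heading(district: str, operator: str) -> str:
    """Build the markdown heading for one parameterized page."""
    return f"# {operator} (District {district})"


def make_caption(operator: str, metric: str) -> str:
    """Build a chart caption that names the operator being shown."""
    return f"{metric} for {operator}"


heading = make_heading(district, operator)
caption = make_caption(operator, "Monthly ridership")  # illustrative metric name
print(heading)
print(caption)
```

Because every heading and caption is derived from the two parameters, the same notebook can be rendered once per operator without editing any cell by hand.
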
2. Add a `site_name.yml` to `portfolio/sites/`. This file controls the parameterization of the notebooks.
    JupyterBooks have a table of contents and organize content into chapters and sections ([read more on JupyterBook structure](https://jupyterbook.org/en/stable/structure/configure.html)).
    We allow both chapters and sections to be parameterized, and examples are given for the most common types of supported parameterized reports.
    * the path to the parameterized notebook must be defined (`project_folder/report.ipynb`)
    * the path to the README must be defined (`project_folder/README.md`)
    * how to organize the chapters and sections must be defined. The 2 most common layouts are (1) by Caltrans district (chapter), where each transit operator within the district gets a page (section), and (2) by Caltrans district (chapter), where each district is its own page (no sections).
    * [various parameterization examples](https://docs.calitp.org/data-infra/publishing/sections/5_analytics_portfolio_site.html)

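
The two common layouts can be sketched as data. The snippet below is only an illustration in Python: the dictionary keys (`caption`, `params`, `sections`) and the sample operator names are assumptions made for this example, not the schema that `portfolio.py` reads. The linked parameterization examples show the real YAML format.

```python
from typing import Dict, List

# Hypothetical sample data: Caltrans district -> transit operators in it.
OPERATORS_BY_DISTRICT: Dict[str, List[str]] = {
    "district_01_eureka": ["first_operator", "second_operator"],
}


def chapters_with_sections(data: Dict[str, List[str]]) -> List[dict]:
    """Layout (1): one chapter per district, one section (page) per operator."""
    return [
        {
            "caption": district,
            "sections": [
                {"params": {"district": district, "operator": operator}}
                for operator in operators
            ],
        }
        for district, operators in data.items()
    ]


def chapters_without_sections(data: Dict[str, List[str]]) -> List[dict]:
    """Layout (2): one chapter per district, and the district is its own page."""
    return [
        {"caption": district, "params": {"district": district}}
        for district in data
    ]


layout_one = chapters_with_sections(OPERATORS_BY_DISTRICT)
layout_two = chapters_without_sections(OPERATORS_BY_DISTRICT)
print(layout_one)
print(layout_two)
```

In layout (1) the parameters vary per section, so one district produces several pages. In layout (2) the parameters vary per chapter, so one district produces exactly one page.
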
3. Build and deploy a new parameterized JupyterBook.
    * There are two commands: `clean` and `build`.
    * `python portfolio/portfolio.py clean MY_NEW_REPORT` removes the local folder `portfolio/MY_NEW_REPORT/`.
    * `python portfolio/portfolio.py build MY_NEW_REPORT` parameterizes the notebook specified in `portfolio/sites/MY_NEW_REPORT.yml`. See the [JupyterBook docs on building a book](https://jupyterbook.org/en/stable/start/build.html).
        * Local files are created in `portfolio/MY_NEW_REPORT/`. All files described below are within this newly created sub-directory.
        * The supporting files JupyterBook needs are `toc.yml`, `README.md`, and `config.yml`. These need to be checked into GitHub.
        * Additional files or directories hold the parameterized notebooks. The names of the notebooks are constructed programmatically, but for illustrative purposes, we'll call the 2 notebooks `first_operator` and `second_operator`. This folder needs to be checked into GitHub.
          Example: `portfolio/MY_NEW_REPORT/district_01_eureka/first_operator.ipynb`, `portfolio/MY_NEW_REPORT/district_01_eureka/second_operator.ipynb`.

          `first_operator.ipynb` uses `project_folder/report.ipynb` to display the first operator's information in all the cells.
          `second_operator.ipynb` uses `project_folder/report.ipynb` to display the second operator's information in all the cells.
        * A second folder, `portfolio/MY_NEW_REPORT/_build/`, is created to [build the JupyterBook](https://jupyterbook.org/en/stable/start/build.html#aside-source-vs-build-files). Within that directory, there are 2 more sub-directories (`_build/jupyter_execute/` and `_build/html/`). This second folder is stored locally and **not** checked into GitHub. JupyterBook makes a distinction between the **source** and **build** files.
            * `_build/jupyter_execute/` holds copies of the parameterized notebooks. The notebooks `portfolio/MY_NEW_REPORT/_build/jupyter_execute/district_01_eureka/first_operator.ipynb` and `portfolio/MY_NEW_REPORT/_build/jupyter_execute/district_01_eureka/second_operator.ipynb` are essentially equivalent to the parameterized notebooks created in `portfolio/MY_NEW_REPORT/`.
            * `_build/html/` holds the rendered HTML pages corresponding to the parameterized notebooks.
                * The notebooks are replaced with HTML files: `_build/html/district_01_eureka/first_operator.html` and `_build/html/district_01_eureka/second_operator.html`.
                * There are also additional folders within `_build/html/` (`_sources`, `_sphinx_design_static`, and `_static`) and other files such as `genindex.html`, `index.html`, `search.html`, `searchindex.js`, `README.html`, and `objects.inv`.
    * `python portfolio/portfolio.py build MY_NEW_REPORT --deploy` builds and deploys. When we deploy to Netlify, the HTML files in `portfolio/MY_NEW_REPORT/_build/html` are published as a Netlify site at `https://{MY_NEW_REPORT}--cal-itp-data-analyses.netlify.app`.

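
To make the source-versus-build distinction concrete, the sketch below derives the three locations of one page from its site, chapter, and page names, following the example paths above. The helper names are hypothetical; `portfolio.py` constructs the real names programmatically and may do so differently.

```python
from pathlib import PurePosixPath

PORTFOLIO_DIR = PurePosixPath("portfolio")


def source_notebook(site: str, chapter: str, page: str) -> PurePosixPath:
    """Parameterized notebook that is checked into GitHub."""
    return PORTFOLIO_DIR / site / chapter / f"{page}.ipynb"


def executed_notebook(site: str, chapter: str, page: str) -> PurePosixPath:
    """Local-only copy of the notebook under _build/jupyter_execute/."""
    return PORTFOLIO_DIR / site / "_build" / "jupyter_execute" / chapter / f"{page}.ipynb"


def html_page(site: str, chapter: str, page: str) -> PurePosixPath:
    """Local-only rendered HTML page under _build/html/."""
    return PORTFOLIO_DIR / site / "_build" / "html" / chapter / f"{page}.html"


def is_checked_in(path: PurePosixPath, site: str) -> bool:
    """True for source files, False for anything under the site's _build/ folder."""
    relative = path.relative_to(PORTFOLIO_DIR / site)
    return relative.parts[0] != "_build"


def netlify_preview_url(site: str) -> str:
    """Preview URL pattern quoted in the deploy step above."""
    return f"https://{site}--cal-itp-data-analyses.netlify.app"


args = ("MY_NEW_REPORT", "district_01_eureka", "first_operator")
for path in (source_notebook(*args), executed_notebook(*args), html_page(*args)):
    print(path, "checked in" if is_checked_in(path, args[0]) else "local only")
print(netlify_preview_url(args[0]))
```
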
4. After checking the Netlify site and making sure everything looks good, publish it to the main `analysis.calitp.org` page with `python portfolio/portfolio.py index --deploy --prod`.

5. All these steps are documented in the [Makefile](https://github.com/cal-itp/data-analyses/blob/main/Makefile). Some of the steps are commented out; uncomment them depending on your use case.
    * If you've already checked your site into GitHub, then the next time you deploy your portfolio (for example, the following month), run both `git rm portfolio/$(site)/ -rf` and `python portfolio/portfolio.py clean $(site)`, where `$(site)` is the name of your site based on `portfolio/sites/site_name.yml`. The `git rm` cleans up whatever is checked in, and the `clean` removes the local folders that are not checked in. **Both are needed.** Skipping either one can leave your `toc.yml` and HTML out of sync.
    * If you're testing changes to your site, finish that up before you run `make production_portfolio`.
    * Check the files in with `make git_check_sections`, which adds all the parameterized notebooks in `portfolio/MY_NEW_REPORT/*.ipynb`, but doesn't check in the notebooks or HTML files in `_build/`.

```
build_portfolio_site:
	cd portfolio/ && pip install -r requirements.txt && cd ../
	#need git rm because otherwise, just local removal, but git change is untracked
	#git rm portfolio/$(site)/ -rf
	python portfolio/portfolio.py clean $(site)
	python portfolio/portfolio.py build $(site) --deploy
	git add portfolio/sites/$(site).yml
	#make production_portfolio #(deploy onto the main page)

git_check_sections:
	git add portfolio/$(site)/*.ipynb # this one is most common, where operators nested under district
```

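
The ordering rule for a redeploy (remove the checked-in copy, clean the local copy, then build) can be sketched in Python. This is a minimal illustration that only assembles and prints the commands quoted above. The function name and the `already_checked_in` flag are hypothetical and are not part of the Makefile or `portfolio.py`.

```python
import shlex
from typing import List


def redeploy_commands(site: str, already_checked_in: bool) -> List[List[str]]:
    """Return the commands for one deploy of `site`, in the order they should run."""
    commands: List[List[str]] = []
    if already_checked_in:
        # Removes the copy tracked by git; `clean` alone only removes local files.
        commands.append(["git", "rm", f"portfolio/{site}/", "-rf"])
    commands.append(["python", "portfolio/portfolio.py", "clean", site])
    commands.append(["python", "portfolio/portfolio.py", "build", site, "--deploy"])
    commands.append(["git", "add", f"portfolio/sites/{site}.yml"])
    return commands


for command in redeploy_commands("MY_NEW_REPORT", already_checked_in=True):
    print(shlex.join(command))
```

To actually run the sequence rather than print it, each command list could be passed to `subprocess.run(command, check=True)` from the repository root.
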
6. We use Git Large File Storage (`git lfs`) to store these parameterized notebooks. In the long run, however, we are moving to storing them in Google Cloud Storage.
    * This means swapping out the `git add` and `git rm` steps. With `gcsfs`, we can use `fs.put` and `fs.rm` to cache the parameterized notebooks and built HTML files for JupyterBook.
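
A minimal sketch of that swap is shown below, assuming a filesystem object that exposes `exists`, `rm`, and `put` the way `gcsfs.GCSFileSystem` does. The `cache_site` helper, the bucket name, and the `RecordingFS` stand-in are hypothetical. The stand-in exists only so the example runs without cloud credentials.

```python
class RecordingFS:
    """Stand-in for a filesystem object; it records calls instead of touching storage."""

    def __init__(self, existing=()):
        self.existing = set(existing)
        self.calls = []

    def exists(self, path):
        return path in self.existing

    def rm(self, path, recursive=False):
        self.calls.append(("rm", path, recursive))
        self.existing.discard(path)

    def put(self, lpath, rpath, recursive=False):
        self.calls.append(("put", lpath, rpath, recursive))
        self.existing.add(rpath)


def cache_site(fs, site: str, bucket_prefix: str) -> str:
    """Replace the cached copy of a site's files with the current local copy."""
    local_dir = f"portfolio/{site}/"
    remote_dir = f"{bucket_prefix}/{site}/"
    if fs.exists(remote_dir):
        fs.rm(remote_dir, recursive=True)  # plays the role of `git rm`
    fs.put(local_dir, remote_dir, recursive=True)  # plays the role of `git add`
    return remote_dir


# Hypothetical bucket name, used only for this demonstration.
demo_fs = RecordingFS(existing={"gs://hypothetical-bucket/portfolio/MY_NEW_REPORT/"})
cache_site(demo_fs, "MY_NEW_REPORT", "gs://hypothetical-bucket/portfolio")
for call in demo_fs.calls:
    print(call)
```

With real storage, `fs = gcsfs.GCSFileSystem()` would take the place of `RecordingFS`. It is worth checking the resulting folder layout in the bucket after the first upload, since recursive copies can treat trailing slashes differently.
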
