Restructure of workflow(s) into separate components #143
Conversation
test: added testing for genbank
fix: fixed amplicon_covs import error
fix: replace unnecessary f-strings with normal strings
fix: fixed some errors spotted by the Copilot PR review
test: added unit test for vcf_to_tsv
test: wrote unit test for prepare_refs
…-rework-scripts-dir
refactor: change ViroConstrictor workflow stages into their own directories
refactor: change structure of workflow scripts to be linked to stage-workflows
refactor: add general workflow helper functionalities; change location of helper files
refactor: correct the pathing of scripts/configs in dockerfiles
refactor: propagate module location changes throughout package
test: write unit tests for some scripts
chore: add AminoExtract dev branch to env.yml
… base. This way scripts are standardized, testing is standardized, and code can be shared via the base class (logging, etc.)
test: scripts - added tests to some scripts, and test files with starts for the rest; all scripts can now be run individually to check when something went wrong
test: data - test data added for all scripts and e2e tests; EQA2024 public data is used for the e2e tests
refactor: workflow.smk - changed execution of the scripts to run as a package instead of as standalone scripts
chore: env files - use the AminoExtract and TrueConsense dev branches instead of main, because quick edits to those repos were needed; also added biovalid
fix: dockerfiles - added adduser command because it is no longer in the base image
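The commit above describes a shared base class that standardizes logging and lets every script run individually. A minimal sketch of that pattern, with purely illustrative names (`BaseScript` appears in the diff below, but `VcfToTsv` and its behaviour here are assumptions, not the actual ViroConstrictor implementation):

```python
import logging
import sys
from abc import ABC, abstractmethod


class BaseScript(ABC):
    """Shared behaviour (logging, a uniform CLI entry point) for workflow scripts."""

    def __init__(self, name: str):
        self.log = logging.getLogger(name)

    @abstractmethod
    def run(self, args: list) -> int:
        """Do the actual work; return an exit code."""

    def main(self, argv=None) -> int:
        # Shared setup lives here, so each script can also be run on its own.
        logging.basicConfig(level=logging.INFO)
        return self.run(argv if argv is not None else sys.argv[1:])


class VcfToTsv(BaseScript):
    """Illustrative subclass; the real vcf_to_tsv logic would go in run()."""

    def run(self, args: list) -> int:
        self.log.info("converting: %s", args)
        return 0
```

With this shape, `VcfToTsv("vcf_to_tsv").main()` works both when invoked directly and when called from the workflow, and the exit code can be tested without spawning a subprocess.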
- conda-forge
- bioconda
- nodefaults
- conda-forge
dependencies:
  - conda-forge::python=3.8
  - conda-forge::pip
  - conda-forge::pandas==1.3.5
  - conda-forge::libffi==3.3
  - conda-forge::biopython==1.79
  - conda-forge::parmap==1.5.3
  - conda-forge::tqdm=4.62
  - bioconda::pysam=0.16
  - bioconda::pysamstats==1.1.2
Is this correct?
This makes sense to me as a temporary measure for the new version of AminoExtract and its own dependencies, but I assume we should restore these dependencies once that is published?
Yes, it is correct, and yes, we should restore and re-test all dependencies once the dev branches of AminoExtract and TrueConsense are merged into their respective main branches.
Is it feasible to add this to our GitHub Actions workflow for automated testing?
I.e. after container creation, so we can immediately test both the Conda workflow and the container workflow with the newly created containers (if they were made).
Yes, but it would require some work. I'll make sure it's in Jira.
     fetch_recipes(f"{Path(os.path.dirname(os.path.realpath(__file__))).parent}/envs/")
 )
 script_files = sorted(
-    fetch_scripts(f"{os.path.dirname(os.path.realpath(__file__))}/scripts/")
+    fetch_scripts(f"{os.path.dirname(os.path.realpath(__file__))}/match_ref/scripts/")
 )
-config_files = sorted(
-    fetch_files(f"{os.path.dirname(os.path.realpath(__file__))}/files/")
-)
+script_files.extend(
+    fetch_scripts(f"{os.path.dirname(os.path.realpath(__file__))}/main/scripts/")
+)
+# config_files = sorted(
+#     fetch_files(f"{os.path.dirname(os.path.realpath(__file__))}/files/")
+# )

 # Calculate hashes for script files
 script_hashes = calculate_hashes(script_files)

-# Calculate hashes for config files
-config_hashes = calculate_hashes(config_files)
+# config_hashes = calculate_hashes(config_files)

 # Sort the hashes of the scripts and the configs
 script_hashes = dict(sorted(script_hashes.items()))
-config_hashes = dict(sorted(config_hashes.items()))
+# config_hashes = dict(sorted(config_hashes.items()))

-# Join the hashes of the scripts and the configs, and create a new hash of the joined hashes
-merged_hashes = hashlib.sha256(
-    "".join(list(script_hashes.values()) + list(config_hashes.values())).encode()
-).hexdigest()[:6]
+merged_hashes = hashlib.sha256("".join(list(script_hashes.values())).encode()).hexdigest()[:6]
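The hashing in this hunk fingerprints the bundled script files so that a short version tag changes whenever any script's content changes. A self-contained sketch of that idea; `calculate_hashes` here is a guess at the helper's behaviour based only on how the diff uses it, not the project's actual code:

```python
import hashlib
from pathlib import Path


def calculate_hashes(files):
    """Map each file path to the sha256 hex digest of its contents."""
    return {str(f): hashlib.sha256(Path(f).read_bytes()).hexdigest() for f in files}


def merged_hash(files) -> str:
    """Join the per-file digests in sorted-path order and hash the result.

    Sorting makes the tag deterministic regardless of the order in which
    the files were discovered on disk.
    """
    hashes = dict(sorted(calculate_hashes(files).items()))
    return hashlib.sha256("".join(hashes.values()).encode()).hexdigest()[:6]
```

Truncating to six hex characters keeps the tag short enough for container labels while still changing with near-certainty whenever any input file changes.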
Note to self: this needs cleaning up.
import pandas as pd
import pysam

from ..base_script_class import BaseScript
The scripts relating to match_ref also use the "from ..base_script_class import BaseScript" import. Previously the scripts for 'match_ref' and 'main' were in a shared directory (one level apart), while now they are truly in their own directories with a larger distance between them.
Does the relative import used in the 'match_ref' scripts still work? @raaijmag
And whether it does or not, would it be helpful and easier to understand if we moved the "base_script_class.py" file out of the individual script directories and placed it in the more generic "helpers" dir instead?
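For context on the question above: a two-dot relative import resolves against the importing module's package, so it only works when the base class sits exactly one package level above the script. A runnable sketch demonstrating this with a throwaway package built in a temp directory (the layout and the `demo_workflow` name are hypothetical, chosen to mirror the directory structure discussed here):

```python
import sys
import tempfile
from pathlib import Path

# Build demo_workflow/match_ref/scripts/ as a proper package tree.
root = Path(tempfile.mkdtemp())
scripts = root / "demo_workflow" / "match_ref" / "scripts"
scripts.mkdir(parents=True)
for d in (scripts, scripts.parent, scripts.parent.parent):
    (d / "__init__.py").write_text("")

# Base class lives ONE package level above "scripts": a two-dot import works.
(scripts.parent / "base_script_class.py").write_text("class BaseScript:\n    pass\n")
(scripts / "tool.py").write_text("from ..base_script_class import BaseScript\n")
# If "scripts" were nested one level deeper, the same line would need THREE
# dots (from ...base_script_class import BaseScript) or an absolute import.

sys.path.insert(0, str(root))
from demo_workflow.match_ref.scripts import tool  # triggers tool.py's relative import

print(tool.BaseScript)
```

This is also why moving base_script_class.py into a shared helpers package with absolute imports tends to be more robust: the import no longer depends on each script directory's depth.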
I checked quickly, and you are correct: this currently does not work. I'll add a Jira task to fix it.
refactor: add workflow root path as a standard binding in containers
chore: add init files to allow for correct importing of Python module paths
refactor: add consistent pathing for scripts in both container mode and Conda mode
fix: correct the inputs and outputs for all Python modules in the match_ref workflow
fix: use the new AminoExtract API functions in GFF-related scripts
chore: add black line-length configuration to pyproject.toml
…g allowed to inherit from reference genbank file
Just some minor changes, no big deal.
This still requires some cleanup throughout the project, as I applied some temporary band-aids just to get this branch working now, BUT this branch and new structure should work without too many issues, if any at all.
Workflows are now centered in dedicated folders with their own scripts and components; the latter are smaller subsets of Snakemake rules that all surround a specific task (or type of task).
Workflows should work in both Conda and container mode (although you will need to rebuild the containers locally first).
Let me know what you all think.