Skip to content

Conversation

florianzwagemaker
Copy link
Contributor

just some minor changes no big deal

This still requires some cleanup throughout the project as did some temporary bandaids just to get this branch working now, BUT this branch and new structure should work without too many issues, if any at all.

workflows are now centered in dedicated folders with their own scripts and components, the latter are smaller subsets of snakemake rules that all surround a specific task (or type of task).

Workflows should work both for Conda as well as Container mode (although you need to locally rebuild containers first)

Let me know what you all think

raaijmag and others added 11 commits July 10, 2025 09:28
test: added testing for genbank
fix: fixed amplicon_covs import error
fix: replace unnecessary f-strings with normal strings
fix: fixed some copilot PR review spotted errors
tests: wrote unit test for prepare_refs
refactor: change virconstrictor workflow stages into their own directories

refactor: change structure of workflow scripts to be linked to stage-workflows

refactor: addition of general workflow helper functionalities, change location of helper-files

refactor: correct the pathing of scripts/configs in dockerfiles

refactor: propagate module location changes throughout package
test: write unit tests for some scripts
chore: add aminoextract dev branch to env.yml
… base. This way scripts are standardized, testing is standardized, and code can be shared via the baseclass (logging, etc)

test: Scripts - added tests to some scripts, test files with starts for the rest. Now all scripts can be run individually to check when somethine went wrong.
test: data - test data added for all scripts and e2e tests. EQA2024 public data is used for e2e tests.
refactor: workflow.smk - changed execution of rules of the script to be executed as a package instead of script.
chore: env files - used AminoExtract and TrueConsense dev branches instead of main because I needed to make quick edits to those repos. Also added biovalid.
fix: dockerfiles - added adduser command because it is no longer in base
Comment on lines -3 to -16
- conda-forge
- bioconda
- nodefaults
- conda-forge
dependencies:
- conda-forge::python=3.8
- conda-forge::pip
- conda-forge::pandas==1.3.5
- conda-forge::libffi==3.3
- conda-forge::biopython==1.79
- conda-forge::parmap==1.5.3
- conda-forge::tqdm=4.62

- bioconda::pysam=0.16
- bioconda::pysamstats==1.1.2
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct?
This makes sense to me temporarily for the new version of aminoextract and its own dependencies, but i assume once that is published we should restore these dependencies?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is correct, and yes we should restore/re-test all dependencies once the dev branches of aminoextract and trueconsense are merged to their respective main branches

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it feasible to add this to our GH-actions workflow for automated testing?
i.e. post-container creation so we can immediately test both the conda workflow and the container workflow with the newly created containers (if they were made)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it would require some work. I'll make sure it is in Jira

Comment on lines 110 to 133
fetch_recipes(f"{Path(os.path.dirname(os.path.realpath(__file__))).parent}/envs/")
)
script_files = sorted(
fetch_scripts(f"{os.path.dirname(os.path.realpath(__file__))}/scripts/")
fetch_scripts(f"{os.path.dirname(os.path.realpath(__file__))}/match_ref/scripts/")
)
config_files = sorted(
fetch_files(f"{os.path.dirname(os.path.realpath(__file__))}/files/")
script_files.extend(
fetch_scripts(f"{os.path.dirname(os.path.realpath(__file__))}/main/scripts/")
)
# config_files = sorted(
# fetch_files(f"{os.path.dirname(os.path.realpath(__file__))}/files/")
# )

# Calculate hashes for script files
script_hashes = calculate_hashes(script_files)

# Calculate hashes for config files
config_hashes = calculate_hashes(config_files)
# config_hashes = calculate_hashes(config_files)

# Sort the hashes of the scripts and the configs
script_hashes = dict(sorted(script_hashes.items()))
config_hashes = dict(sorted(config_hashes.items()))
# config_hashes = dict(sorted(config_hashes.items()))

# Join the hashes of the scripts and the configs, and create a new hash of the joined hashes
merged_hashes = hashlib.sha256(
"".join(list(script_hashes.values()) + list(config_hashes.values())).encode()
).hexdigest()[:6]
merged_hashes = hashlib.sha256("".join(list(script_hashes.values())).encode()).hexdigest()[:6]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to self; this needs cleaning up.

import pandas as pd
import pysam

from ..base_script_class import BaseScript
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scripts relating to match_ref also use the from ..base_script_class import BaseScript import. Previously the scripts for 'match_ref' and 'main' were in a shared directory (with one level distance) while now they are truly in their own directories with a larger distance between them.

Does the relative import used in the 'match_ref' scripts still work? @raaijmag
And if it does (or if it doesn't), would it be helpful and easier to understand if we move the "base_script_class.py" file away from the individual scripts directories and place it in the more generic "helpers" dir instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked quickly, and you are correct. This currently does not work. Will add a jira task to fix this

refactor: add workflow root path as a standard binding in containers

chore: add init files to allow for correct importing of python module paths

refactor: add consistent pathing for scripts both in container mode and conda mode

fix: correct the inputs and outputs for all python modules in the match_ref workflow

fix: use the new AminoExtract api functions in GFF related scripts

chore: add black line-length configuration to pyproject.toml
…g allowed to inherit from reference genbank file
Copy link

@florianzwagemaker florianzwagemaker merged commit 636e1f2 into version_1.6.0 Sep 22, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants