Restructure of workflow(s) into separate components #143
Conversation
test: added testing for genbank
fix: fixed amplicon_covs import error
fix: replace unnecessary f-strings with normal strings
fix: fixed some errors spotted by the Copilot PR review
test: added unit test for vcf_to_tsv
test: wrote unit test for prepare_refs
…-rework-scripts-dir
refactor: change ViroConstrictor workflow stages into their own directories
refactor: change structure of workflow scripts to be linked to stage-workflows
refactor: add general workflow helper functionalities; change location of helper files
refactor: correct the pathing of scripts/configs in dockerfiles
refactor: propagate module location changes throughout package
test: write unit tests for some scripts
chore: add AminoExtract dev branch to env.yml
… base. This way scripts are standardized, testing is standardized, and code can be shared via the base class (logging, etc.)
test: scripts - added tests to some scripts, and test files with starts for the rest; all scripts can now be run individually to check when something went wrong
test: data - test data added for all scripts and e2e tests; EQA2024 public data is used for the e2e tests
refactor: workflow.smk - changed execution of the scripts to run as a package instead of as standalone scripts
chore: env files - use the AminoExtract and TrueConsense dev branches instead of main, because quick edits to those repos were needed; also added biovalid
fix: dockerfiles - added adduser command because it is no longer in the base image
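The commit above describes a shared base class that standardizes logging and lets every script run individually. A minimal sketch of that pattern, with purely illustrative names (`BaseScript` appears in the diff below, but `VcfToTsv` and its behaviour here are assumptions, not the actual ViroConstrictor implementation):

```python
import logging
import sys
from abc import ABC, abstractmethod


class BaseScript(ABC):
    """Shared behaviour (logging, a uniform CLI entry point) for workflow scripts."""

    def __init__(self, name: str):
        self.log = logging.getLogger(name)

    @abstractmethod
    def run(self, args: list) -> int:
        """Do the actual work; return an exit code."""

    def main(self, argv=None) -> int:
        # Shared setup lives here, so each script can also be run on its own.
        logging.basicConfig(level=logging.INFO)
        return self.run(argv if argv is not None else sys.argv[1:])


class VcfToTsv(BaseScript):
    """Illustrative subclass; the real vcf_to_tsv logic would go in run()."""

    def run(self, args: list) -> int:
        self.log.info("converting: %s", args)
        return 0
```

With this shape, `VcfToTsv("vcf_to_tsv").main()` works both when invoked directly and when called from the workflow, and the exit code can be tested without spawning a subprocess.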
- conda-forge
- bioconda
- nodefaults
- conda-forge
dependencies:
  - conda-forge::python=3.8
  - conda-forge::pip
  - conda-forge::pandas==1.3.5
  - conda-forge::libffi==3.3
  - conda-forge::biopython==1.79
  - conda-forge::parmap==1.5.3
  - conda-forge::tqdm=4.62
  - bioconda::pysam=0.16
  - bioconda::pysamstats==1.1.2
Is this correct?
This makes sense to me as a temporary measure for the new version of AminoExtract and its own dependencies, but I assume we should restore these dependencies once that is published?
Yes, it is correct, and yes, we should restore and re-test all dependencies once the dev branches of AminoExtract and TrueConsense are merged into their respective main branches.
Is it feasible to add this to our GitHub Actions workflow for automated testing?
I.e. after container creation, so we can immediately test both the Conda workflow and the container workflow with the newly created containers (if they were made).
Yes, but it would require some work. I'll make sure it's in Jira.
     fetch_recipes(f"{Path(os.path.dirname(os.path.realpath(__file__))).parent}/envs/")
 )
 script_files = sorted(
-    fetch_scripts(f"{os.path.dirname(os.path.realpath(__file__))}/scripts/")
+    fetch_scripts(f"{os.path.dirname(os.path.realpath(__file__))}/match_ref/scripts/")
 )
-config_files = sorted(
-    fetch_files(f"{os.path.dirname(os.path.realpath(__file__))}/files/")
-)
+script_files.extend(
+    fetch_scripts(f"{os.path.dirname(os.path.realpath(__file__))}/main/scripts/")
+)
+# config_files = sorted(
+#     fetch_files(f"{os.path.dirname(os.path.realpath(__file__))}/files/")
+# )

 # Calculate hashes for script files
 script_hashes = calculate_hashes(script_files)

-# Calculate hashes for config files
-config_hashes = calculate_hashes(config_files)
+# config_hashes = calculate_hashes(config_files)

 # Sort the hashes of the scripts and the configs
 script_hashes = dict(sorted(script_hashes.items()))
-config_hashes = dict(sorted(config_hashes.items()))
+# config_hashes = dict(sorted(config_hashes.items()))

-# Join the hashes of the scripts and the configs, and create a new hash of the joined hashes
-merged_hashes = hashlib.sha256(
-    "".join(list(script_hashes.values()) + list(config_hashes.values())).encode()
-).hexdigest()[:6]
+merged_hashes = hashlib.sha256("".join(list(script_hashes.values())).encode()).hexdigest()[:6]
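The hashing in this hunk fingerprints the bundled script files so that a short version tag changes whenever any script's content changes. A self-contained sketch of that idea; `calculate_hashes` here is a guess at the helper's behaviour based only on how the diff uses it, not the project's actual code:

```python
import hashlib
from pathlib import Path


def calculate_hashes(files):
    """Map each file path to the sha256 hex digest of its contents."""
    return {str(f): hashlib.sha256(Path(f).read_bytes()).hexdigest() for f in files}


def merged_hash(files) -> str:
    """Join the per-file digests in sorted-path order and hash the result.

    Sorting makes the tag deterministic regardless of the order in which
    the files were discovered on disk.
    """
    hashes = dict(sorted(calculate_hashes(files).items()))
    return hashlib.sha256("".join(hashes.values()).encode()).hexdigest()[:6]
```

Truncating to six hex characters keeps the tag short enough for container labels while still changing with near-certainty whenever any input file changes.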
Note to self: this needs cleaning up.
import pandas as pd
import pysam

from ..base_script_class import BaseScript
The scripts relating to match_ref also use the "from ..base_script_class import BaseScript" import. Previously the scripts for 'match_ref' and 'main' were in a shared directory (one level apart), while now they are truly in their own directories with a larger distance between them.
Does the relative import used in the 'match_ref' scripts still work? @raaijmag
And whether it does or not, would it be helpful and easier to understand if we moved the "base_script_class.py" file out of the individual script directories and placed it in the more generic "helpers" dir instead?
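For context on the question above: a two-dot relative import resolves against the importing module's package, so it only works when the base class sits exactly one package level above the script. A runnable sketch demonstrating this with a throwaway package built in a temp directory (the layout and the `demo_workflow` name are hypothetical, chosen to mirror the directory structure discussed here):

```python
import sys
import tempfile
from pathlib import Path

# Build demo_workflow/match_ref/scripts/ as a proper package tree.
root = Path(tempfile.mkdtemp())
scripts = root / "demo_workflow" / "match_ref" / "scripts"
scripts.mkdir(parents=True)
for d in (scripts, scripts.parent, scripts.parent.parent):
    (d / "__init__.py").write_text("")

# Base class lives ONE package level above "scripts": a two-dot import works.
(scripts.parent / "base_script_class.py").write_text("class BaseScript:\n    pass\n")
(scripts / "tool.py").write_text("from ..base_script_class import BaseScript\n")
# If "scripts" were nested one level deeper, the same line would need THREE
# dots (from ...base_script_class import BaseScript) or an absolute import.

sys.path.insert(0, str(root))
from demo_workflow.match_ref.scripts import tool  # triggers tool.py's relative import

print(tool.BaseScript)
```

This is also why moving base_script_class.py into a shared helpers package with absolute imports tends to be more robust: the import no longer depends on each script directory's depth.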
I checked quickly, and you are correct: this currently does not work. I'll add a Jira task to fix it.
refactor: add workflow root path as a standard binding in containers
chore: add init files to allow for correct importing of Python module paths
refactor: add consistent pathing for scripts in both container mode and Conda mode
fix: correct the inputs and outputs for all Python modules in the match_ref workflow
fix: use the new AminoExtract API functions in GFF-related scripts
chore: add black line-length configuration to pyproject.toml
…g allowed to inherit from reference genbank file
Just some minor changes, no big deal.
This still requires some cleanup throughout the project, as I applied some temporary band-aids just to get this branch working now, BUT this branch and new structure should work without too many issues, if any at all.
Workflows are now centered in dedicated folders with their own scripts and components; the latter are smaller subsets of Snakemake rules that all surround a specific task (or type of task).
Workflows should work in both Conda and container mode (although you will need to rebuild the containers locally first).
Let me know what you all think.