Fixes for NaN Handling & Plot Generation Issues; Tips for Large Datasets #19
Commits 1-3 look good to me. Regarding Commit 4, a couple of comments:
@ym2877 Thank you for your comments! The Sam2pmp function does not support multithreading, but it consumes minimal memory (less than 1 GB) and has a long runtime (1~6 hours for a 16 GB fastq file). Therefore, it is suitable for processing multiple samples concurrently. Regarding To enhance clarity and functionality, I have introduced two new arguments to the Thank you for your consideration. I apologize for any confusion caused. Please feel free to adjust the code and documentation I committed as needed.
Fixed bugs
Commit 1:
fix: Handle NaN values in Spearman correlation by replacing with 0
This commit addresses the issue where NaN values were generated during Spearman correlation computation, leading to errors in the code. By replacing NaN values with 0, the code now handles constant input arrays gracefully and computes the correlation correctly.
Changes:
The `_spearman_dissim` function now replaces NaN values with 0 in the Spearman correlation computation.
This change resolves issue 10 and prevents the downstream errors that those NaN values caused.
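For reviewers who want to see the idea in isolation, here is a minimal standard-library sketch of the NaN-to-0 fallback. It is not the project's `_spearman_dissim` code: the helper names `average_ranks`, `spearman`, and `spearman_or_zero` are hypothetical, and the real function may compute the correlation differently (for example through SciPy).

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation (Pearson on ranks); NaN if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return math.nan  # the correlation is undefined for a constant array
    return cov / math.sqrt(var_x * var_y)


def spearman_or_zero(x, y):
    """Hypothetical wrapper: treat an undefined (NaN) correlation as 0."""
    rho = spearman(x, y)
    return 0.0 if math.isnan(rho) else rho
```

Mapping NaN to 0 means a constant array is treated as uncorrelated with everything, which is a modelling choice rather than a mathematical identity.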
Commit 2:
fix: Exclude gene position plot when no taxonomy file or gene position file is provided
This commit resolves an issue where the gene position plot failed when no taxonomy file was provided. It ensures that the plotting functionality works correctly.
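The guard can be sketched roughly as follows. This is an illustration only: `build_panels` and the dictionary "panels" are hypothetical stand-ins for the real Bokeh figures, and only the names `p2` and `geneposs` come from this PR.

```python
def build_panels(coverage, geneposs=None):
    """Return the panels to lay out, skipping the gene position panel if unavailable."""
    # Stand-in for the main plot, which is always drawn.
    p1 = {"kind": "coverage", "data": coverage}

    # The gene position panel is optional: leave it as None when no
    # gene position data was loaded, instead of trying to draw it.
    p2 = None
    if geneposs is not None:
        p2 = {"kind": "gene_positions", "data": geneposs}

    # Drop missing panels so the layout code never receives None.
    return [panel for panel in (p1, p2) if panel is not None]
```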
Changes:
`p2` to `None` when `geneposs` is `None`
Commit 3:
fix: Fix Bokeh error when generating the plot
This commit addresses a Bokeh error encountered during plot generation.
Changes:
Commit 4:
fix: Introduce `sam2pmp.py` script for parallel execution of the `sam2pmp` function
This commit introduces `sam2pmp.py`, a script dedicated to independently executing the `sam2pmp` function outside the main project context. The `sam2pmp` function consumes significantly less memory than the `icra` iterations, making it suitable for parallel execution on large datasets. This approach aims to reduce total processing time by preparing data before `icra` execution.
Changes:
`sam2pmp.py` script for standalone execution of the `sam2pmp` function.
This change improves efficiency in data processing workflows, particularly for large-scale datasets.
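As a rough illustration of running the conversion for several samples at once, the sketch below uses one worker process per sample, which suits a step that is single-threaded and light on memory. It does not call the real `sam2pmp` function or the `sam2pmp.py` command line: `convert_sample` is a hypothetical placeholder, and `convert_all` and its parameters are assumptions made for this example.

```python
import concurrent.futures as cf
import os


def convert_sample(sam_path):
    """Hypothetical placeholder for the real per-sample conversion.

    It only derives an output name; a real worker would run the
    conversion and write the result to disk.
    """
    base, _ext = os.path.splitext(sam_path)
    return base + ".pmp"


def convert_all(sam_paths, worker=convert_sample, max_workers=4,
                executor_cls=cf.ProcessPoolExecutor):
    """Run `worker` once per sample and map each input path to its result."""
    results = {}
    with executor_cls(max_workers=max_workers) as pool:
        # Submit one independent job per sample.
        futures = {pool.submit(worker, path): path for path in sam_paths}
        for future in cf.as_completed(futures):
            # result() re-raises any exception raised inside the worker.
            results[futures[future]] = future.result()
    return results
```

When using the default process pool, call `convert_all` from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Since each job is reported to need less than 1 GB, `max_workers` would in practice be limited by the number of CPU cores rather than by memory.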