Adding functions to drop a percentage of counts and plot FMS #24
base: main
Pull Request Overview
This PR adds functionality to analyze factorization performance under reduced count conditions by implementing multinomial downsampling and FMS comparison metrics.
- Implements multinomial count downsampling to simulate sequencing depth variations
- Creates FMS comparison framework between full and downsampled datasets
- Adds visualization capabilities for count drop analysis
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| pf2rnaseq/factorization.py | Adds core functions for multinomial downsampling and FMS analysis with count reduction |
| pf2rnaseq/figures/commonFuncs/plotGeneral.py | Implements plotting function for FMS vs count drop percentage visualization |
| pf2rnaseq/figures/figureCountFMS.py | Creates figure demonstrating FMS analysis on cytokine dataset with various count drop percentages |
aarmey
left a comment
Good stuff. Just a couple of minor comments.
pf2rnaseq/factorization.py
Outdated
    return sampled_data


    def fms_percent_drop_counts_multinomial(
I'm not generally a fan of having functions like this that combine functionality and loops. Make a function that does the processing for one situation, then you can put the loops elsewhere.
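One way to read this suggestion is sketched below: a function that handles exactly one drop fraction, plus a separate thin loop over runs and fractions. All names here (`score_one_drop`, `sweep_drops`, `toy_downsample`, `toy_score`) are hypothetical and not taken from the PR; the toy downsampler uses binomial thinning purely as a stand-in so the sketch runs on its own.

```python
import numpy as np


def score_one_drop(counts, drop_fraction, downsample, score, seed=None):
    """Handle exactly one condition: downsample once, then score once.

    `downsample` and `score` are caller-supplied callables (hypothetical
    stand-ins for the PR's downsampling and FMS steps).
    """
    sampled = downsample(counts, drop_fraction, seed)
    return score(counts, sampled)


def sweep_drops(counts, drop_fractions, runs, downsample, score):
    """Keep the loops separate from the per-condition logic."""
    results = np.zeros((runs, len(drop_fractions)))
    for j in range(runs):
        for i, fraction in enumerate(drop_fractions):
            results[j, i] = score_one_drop(
                counts, fraction, downsample, score, seed=j
            )
    return results


def toy_downsample(counts, drop_fraction, seed=None):
    # Stand-in only: binomial thinning, NOT the PR's multinomial sampler.
    rng = np.random.default_rng(seed)
    return rng.binomial(counts, 1.0 - drop_fraction)


def toy_score(full, sampled):
    # Stand-in only: fraction of counts retained, NOT an FMS.
    return sampled.sum() / full.sum()
```

With this split, a single condition such as a 0.5 drop can be checked by calling `score_one_drop` directly, and the sweep becomes optional.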
pf2rnaseq/factorization.py
Outdated
    results = np.zeros((runs, len(percentList)))

    # Main loop
    for j in range(runs):
I'm not sure that you need to sweep across lots of runs. The result should be obvious at 0.5 downsampling, or there isn't a meaningful difference.
    data[start_idx:end_idx] = new_counts.astype(cell_data.dtype)

    # Create new sparse matrix
    sampled_csr = sp.csr_matrix((data, indices, indptr), shape=original_csr.shape)
Ohh preserving the sparsity is clever.
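For readers without the full diff, here is a minimal sketch of how per-cell multinomial downsampling can reuse the CSR structure. It is reconstructed only from the visible hunk lines, so the function name `downsample_csr_multinomial`, its signature, and the rounding of the target total are assumptions rather than the PR's actual code.

```python
import numpy as np
import scipy.sparse as sp


def downsample_csr_multinomial(counts, drop_fraction, seed=None):
    """Drop roughly `drop_fraction` of each row's counts (rows = cells).

    Only the stored nonzero values are resampled; `indices` and `indptr`
    are reused, so the sparsity pattern of the input is preserved.
    """
    rng = np.random.default_rng(seed)
    original_csr = sp.csr_matrix(counts)
    data = original_csr.data.copy()
    indices = original_csr.indices.copy()
    indptr = original_csr.indptr.copy()

    for row in range(original_csr.shape[0]):
        start_idx, end_idx = indptr[row], indptr[row + 1]
        cell_data = data[start_idx:end_idx]
        total = cell_data.sum()
        if total == 0:
            continue  # empty cell, nothing to resample
        # Assumption: the retained total is rounded to the nearest integer.
        target = int(round(total * (1.0 - drop_fraction)))
        pvals = cell_data.astype(np.float64) / total
        new_counts = rng.multinomial(target, pvals)
        data[start_idx:end_idx] = new_counts.astype(cell_data.dtype)

    return sp.csr_matrix((data, indices, indptr), shape=original_csr.shape)
```

Because multinomial draws are made with replacement, an individual gene can occasionally end up with more counts than it started with; only the per-cell total is fixed. Entries that are drawn as zero stay stored explicitly unless `eliminate_zeros()` is called on the result.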
Added two functions:
- downsample_counts_multinomial produces a dataset in which each cell's counts are reduced by a specified percentage, using multinomial sampling to construct the downsampled dataset.
- fms_percent_drop_counts_multinomial downsamples the data, normalizes it, and then factors it. The resulting factorization is compared to the baseline pf2 factorization with full counts. The function assumes that low-expressing genes have already been filtered out, and sets geneThreshold to 0 when calling prepare_dataset so that the full-count and downsampled datasets contain the same number of genes.
The figure CountFMS plots this using our cytokine dataset.
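The requirement that both datasets keep the same genes follows from how a factor match score compares factor matrices column by column. The sketch below is a simplified, generic FMS (product of per-mode cosine similarities, best component matching, no weight term); it is an illustration under those assumptions and not necessarily how this repository computes FMS. The name `factor_match_score` and its signature are hypothetical here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def factor_match_score(factors_a, factors_b):
    """Simplified FMS between two lists of factor matrices.

    Each list holds one (dimension x rank) matrix per mode. Columns are
    assumed to be nonzero. Matrices for the same mode must have the same
    shape, which is why the gene dimension has to match.
    """
    rank = factors_a[0].shape[1]
    similarity = np.ones((rank, rank))
    for mat_a, mat_b in zip(factors_a, factors_b):
        if mat_a.shape != mat_b.shape:
            raise ValueError("factor matrices for a mode must share a shape")
        unit_a = mat_a / np.linalg.norm(mat_a, axis=0, keepdims=True)
        unit_b = mat_b / np.linalg.norm(mat_b, axis=0, keepdims=True)
        # Cosine similarity between every pair of components in this mode.
        similarity *= unit_a.T @ unit_b
    similarity = np.abs(similarity)
    # Match components one-to-one so the total similarity is maximal.
    rows, cols = linear_sum_assignment(similarity, maximize=True)
    return float(similarity[rows, cols].mean())
```

A score near 1 means every component in one factorization has a close counterpart in the other; if the gene factor matrices had different numbers of rows, the comparison would not be defined.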