🔥 add chromosome scaffold filtering #88

Open
wants to merge 1 commit into
base: old

Conversation

mbuttner

This PR fixes the issue reported in aertslab/scenicplus#61.
How it's done: When export_pseudobulk loads the fragments as a DataFrame, scaffold chromosomes are filtered out by matching the chromosome names against a regular expression with the pandas pd.Series.str.contains() function. The pattern is passed through a new export_pseudobulk parameter, chrom_filter, which defaults to None.
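
The filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper name filter_scaffolds, the column name "Chromosome", and the toy fragment table are assumptions made for the example, and treating chrom_filter = None as "no filtering" is likewise an assumption.

```python
import pandas as pd


def filter_scaffolds(fragments, chrom_filter=None, chrom_col="Chromosome"):
    """Drop rows whose chromosome name matches the regex chrom_filter.

    Hypothetical helper for illustration; with chrom_filter=None the
    input DataFrame is returned unchanged (an assumed behaviour).
    """
    if chrom_filter is None:
        return fragments
    # str.contains performs a regex search anywhere in the chromosome name
    is_scaffold = fragments[chrom_col].astype(str).str.contains(chrom_filter, regex=True)
    return fragments.loc[~is_scaffold].reset_index(drop=True)


# Toy fragment table (made-up coordinates) with two scaffold entries
fragments = pd.DataFrame({
    "Chromosome": ["chr1", "GL000194.1", "chr2", "KI270711.1", "chrX"],
    "Start": [100, 200, 300, 400, 500],
    "End": [150, 260, 380, 450, 590],
})

filtered = filter_scaffolds(fragments, chrom_filter="GL|KI")
n_removed = len(fragments) - len(filtered)
```

Because str.contains searches anywhere in the name, a pattern such as "GL|KI" removes every chromosome whose name contains either substring; an anchored pattern such as "^(GL|KI)" would be stricter if that matters for a given genome.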

Example following the SCENIC+ tutorial for 10X multiome data:

import os
from pycisTopic.pseudobulk_peak_calling import export_pseudobulk

# cell_data, chromsizes, work_dir, fragments_dict and tmp_dir are assumed to be defined as in the tutorial
bw_paths, bed_paths = export_pseudobulk(input_data = cell_data,
                 variable = 'celltype',                                                                     # variable by which to generate pseudobulk profiles, in this case we want pseudobulks per celltype
                 sample_id_col = 'sample_id',
                 chromsizes = chromsizes,
                 bed_path = os.path.join(work_dir, 'scATAC/consensus_peak_calling/pseudobulk_bed_files/'),  # specify where pseudobulk_bed_files should be stored
                 bigwig_path = os.path.join(work_dir, 'scATAC/consensus_peak_calling/pseudobulk_bw_files/'),# specify where pseudobulk_bw_files should be stored
                 path_to_fragments = fragments_dict,                                                        # location of fragment files
                 chrom_filter = "GL|KI",                                                                    # new parameter: regular expression for the chromosome names to filter out
                 n_cpu = 8,                                                                                 # specify the number of cores to use, we use ray for multi processing
                 normalize_bigwig = True,
                 remove_duplicates = True,
                 _temp_dir = tmp_dir,
                 split_pattern = '-')

Output:

2023-08-22 11:16:41,366 cisTopic     INFO     Reading fragments from ../atac_fragments.tsv.gz
2023-08-22 11:19:37,550 cisTopic     INFO     Filtering out 33056 fragments.
2023-08-22 11:20:42,732	INFO worker.py:1627 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265/ 
(export_pseudobulk_ray pid=3011257) 2023-08-22 11:20:46,836 cisTopic     INFO     Creating pseudobulk for CT1
(export_pseudobulk_ray pid=3011259) 2023-08-22 11:20:46,829 cisTopic     INFO     Creating pseudobulk for CT2
(export_pseudobulk_ray pid=3011259) 2023-08-22 11:20:47,958 cisTopic     INFO     Creating pseudobulk for CT3
(export_pseudobulk_ray pid=3011259) 2023-08-22 11:20:50,278 cisTopic     INFO     CT3 done!

Thank you for considering.
