This solver needs samples of at least 2 classes in the data #12

Open
skudashev opened this issue Jun 19, 2024 · 0 comments

My issue is very similar to #9 (comment)

Parsing BAM file: chr22_alignments.sorted.bam
Identified 182998 introns
Annotated introns file /ei/projects/8/8289c66d-2d56-4706-a307-5a9a3eb3747e/data/Annotations/gencode.v44.annotated_juncs.bed provided
Identified 402454 annotated introns
debug: Tree structure:
debug: |--- jad <= 71.50
debug: |   |--- class: 0
debug: |--- jad >  71.50
debug: |   |--- is_canonical_motif <= 0.50
debug: |   |   |--- class: 0
debug: |   |--- is_canonical_motif >  0.50
debug: |   |   |--- class: 0
debug: Decision tree 1 confusion matrix:
debug: [[177013      0]
debug:  [  5985      0]]
Fetching junction sequences from /ei/projects/3/31655266-640a-41d2-8663-59bba38bc3c4/data/data/References/hg38_sequin.fa
Identified 132451 unique donors and 127498 unique acceptors
Scoring donor sequences with LR...
pgrep: /nbi/software/production/bin/core/../..//hpccore/5/x86_64/lib/liblzma.so.5: no version information available (required by /lib64/libsystemd.so.0)
pgrep: /nbi/software/production/bin/core/../..//hpccore/5/x86_64/lib/liblzma.so.5: no version information available (required by /lib64/libsystemd.so.0)
pgrep: /nbi/software/production/bin/core/../..//hpccore/5/x86_64/lib/liblzma.so.5: no version information available (required by /lib64/libsystemd.so.0)
pgrep: /nbi/software/production/bin/core/../..//hpccore/5/x86_64/lib/liblzma.so.5: no version information available (required by /lib64/libsystemd.so.0)
pgrep: /nbi/software/production/bin/core/../..//hpccore/5/x86_64/lib/liblzma.so.5: no version information available (required by /lib64/libsystemd.so.0)
pgrep: /nbi/software/production/bin/core/../..//hpccore/5/x86_64/lib/liblzma.so.5: no version information available (required by /lib64/libsystemd.so.0)
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/ei/software/testing/python_miniconda/4.10.3_py3.9_sk/x86_64/envs/2passtools/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 436, in _process_worker
    r = call_item()
  File "/ei/software/testing/python_miniconda/4.10.3_py3.9_sk/x86_64/envs/2passtools/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 288, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/ei/software/testing/python_miniconda/4.10.3_py3.9_sk/x86_64/envs/2passtools/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "/ei/software/testing/python_miniconda/4.10.3_py3.9_sk/x86_64/envs/2passtools/lib/python3.6/site-packages/joblib/parallel.py", line 263, in __call__
    for func, args, kwargs in self.items]
  File "/ei/software/testing/python_miniconda/4.10.3_py3.9_sk/x86_64/envs/2passtools/lib/python3.6/site-packages/joblib/parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]
  File "/ei/software/testing/python_miniconda/4.10.3_py3.9_sk/x86_64/envs/2passtools/lib/python3.6/site-packages/lib2pass/seqlr.py", line 39, in train_and_predict
    lr.fit(X_train, y_train)
  File "/ei/software/testing/python_miniconda/4.10.3_py3.9_sk/x86_64/envs/2passtools/lib/python3.6/site-packages/sklearn/linear_model/_logistic.py", line 1376, in fit
    " class: %r" % classes_[0])
ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 0
"""

I followed the instructions in that comment and then ran 2passtools with DEBUG logging enabled:

paftools.js gff2bed -j gencode.v44.annotation.gtf > gencode.v44.annotated_juncs.bed 
2passtools score -v DEBUG -f /ei/projects/3/31655266-640a-41d2-8663-59bba38bc3c4/data/data/References/hg38_sequin.fa -p 24 \
    -a /ei/projects/8/8289c66d-2d56-4706-a307-5a9a3eb3747e/data/Annotations/gencode.v44.annotated_juncs.bed --classifier-type decision_tree \
    -m "GTAG|GCAG|ATAG" -j 4 --keep-all-annot -o iPSC.merged.juncs.all.bed $subset_bam 
head -n 5  /ei/projects/8/8289c66d-2d56-4706-a307-5a9a3eb3747e/data/Annotations/gencode.v44.annotated_juncs.bed
chr1	12227	12612	ENST00000456328.2|lncRNA|DDX11L2	1000	+
chr1	12721	13220	ENST00000456328.2|lncRNA|DDX11L2	1000	+
chr1	12057	12178	ENST00000450305.2|transcribed_unprocessed_pseudogene|DDX11L1	1000	+
chr1	12227	12612	ENST00000450305.2|transcribed_unprocessed_pseudogene|DDX11L1	1000	+
chr1	12697	12974	ENST00000450305.2|transcribed_unprocessed_pseudogene|DDX11L1	1000	+

Could this have something to do with my canonical motifs? Also, I set the JAD threshold to 4 (-j 4), but the tree structure shows a split at jad <= 71.50. Is this expected?
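
For reference, the confusion matrix in the debug output may already explain the ValueError. The sketch below is a minimal check, assuming rows are true labels and columns are predicted labels (the layout of sklearn.metrics.confusion_matrix), and assuming the labels passed to lr.fit in seqlr.py are derived from the decision tree 1 predictions. Neither assumption is confirmed by the log itself, although every leaf in the printed tree being class 0 is consistent with the first one. The helper names are hypothetical.

```python
# Confusion matrix copied from the "Decision tree 1" debug output above.
# Assumption: rows = true labels, columns = predicted labels.
confusion = [
    [177013, 0],
    [5985, 0],
]


def predicted_class_counts(matrix):
    """Sum each column to get the number of samples predicted per class."""
    n_classes = len(matrix[0])
    return [sum(row[col] for row in matrix) for col in range(n_classes)]


def predicted_classes(matrix):
    """Return the class indices that received at least one prediction."""
    counts = predicted_class_counts(matrix)
    return [cls for cls, n in enumerate(counts) if n > 0]


counts = predicted_class_counts(confusion)
classes = predicted_classes(confusion)

print("predicted per class:", counts)
print("classes present in predictions:", classes)

if len(classes) < 2:
    # Under the stated assumptions, labels built from these predictions
    # contain a single class, which LogisticRegression.fit rejects.
    print("only one class predicted; a logistic regression fit on these labels would fail")
```

The column sums add up to the 182998 introns reported at the top of the log, all of them in class 0, so under these assumptions no positive examples would reach the LR step.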

Kind regards,
Sofia
