polyA estimation from cDNA: polyA estimated, but polyT unreported #1183

Open
liuqianhn opened this issue Dec 17, 2024 · 6 comments
Labels
polyA Issue related to polyA tail estimation

Comments

@liuqianhn

I ran the command below on my cDNA data. A polyA estimate is reported for reads that contain a polyA stretch, but not for all reads that contain a polyT stretch. How can I solve this issue?
Thank you.

Run environment:

  • Dorado version: 0.8.3+98456f7
  • Dorado command: dorado basecaller --device "cuda:0" --recursive --estimate-poly-a
  • Operating system: UNIX
  • Hardware (CPUs, Memory, GPUs): GPU
  • Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): pod5
  • Source data location (on device or networked drive - NFS, etc.): nfs
@malton-ont
Collaborator

Hi @liuqianhn,

Could you please provide us with a little more information:

  • What kit/chemistry is your sample using?
  • Are you using custom primers?
  • What length polyA/T are you expecting?
  • Does your tail prep include a linker?
  • What proportion of polyT reads are failing to call a tail?
  • How are you determining that the polyTs have not been called? Please note that dorado makes no distinction in its reporting between polyA and polyT, in case you are expecting two sets of data

Can you please post the output of your command with the addition of the -v flag to turn on verbose logging?

Are you able to share any data with us to investigate?
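
Since dorado reports polyA and polyT estimates through the same `pt:i` tag, one way to measure how many reads received an estimate is to count that tag in the SAM output. The following is a minimal sketch under that assumption: the function name `count_tail_calls` and the example records are hypothetical, and it only tests for the tag's presence without interpreting its value.

```python
from typing import Iterable, Tuple


def count_tail_calls(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (reads_with_pt_tag, total_reads) for SAM-formatted text lines."""
    with_tail = 0
    total = 0
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # skip header and blank lines
        fields = line.split("\t")
        if len(fields) < 11:
            continue  # fewer than the 11 mandatory SAM columns
        total += 1
        # Optional tags start at column 12 (index 11).
        if any(tag.startswith("pt:i:") for tag in fields[11:]):
            with_tail += 1
    return with_tail, total


# Hypothetical records for illustration only (not real data).
example = [
    "@HD\tVN:1.6\tSO:unknown",
    "read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t!!!!\tpt:i:87",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\t!!!!",
]
called, total = count_tail_calls(example)
print(f"{called}/{total} reads carry a pt tag")
```

The same function can be fed the text lines produced by `samtools view` on the basecalled BAM file.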

@malton-ont added the polyA label (Issue related to polyA tail estimation) on Dec 18, 2024
@liuqianhn
Author

@malton-ont Here is the information I have so far:

  • What kit/chemistry is your sample using? https://nanoporetech.com/document/single-cell-transcriptomics-with-cdna-prepared-using-10x
  • Are you using custom primers? No.
  • What length polyA/T are you expecting? Unknown; these are patient samples.
  • Does your tail prep include a linker? No.
  • What proportion of polyT reads are failing to call a tail? About 50% in several samples.

I cannot share the data now since they are patient samples.

Thank you for your help!

@malton-ont
Collaborator

Hi @liuqianhn,

If I'm reading that document correctly, this means you are using SQK-LSK114? According to the dorado documentation, polyA/T estimation for cDNA is supported for PCS and PCB kits only.

@liuqianhn
Author

@malton-ont Thank you for your reply. Yes, LSK114 is used. Why are only PCS and PCB supported? Many of my LSK114 reads do have a polyA call in the pt tag: what do these calls mean? Are there specific settings for PCB and PCS, and where is this difference documented? Also, where can I find the default configuration file for polyA estimation? Thank you.

@malton-ont
Collaborator

@liuqianhn,

If I understand correctly, that prep does not include the primers the dorado polyA estimation is looking for. You can specify your own primers by passing a configuration file using the --poly-a-config option. Details of the format are shown in the documentation.

Details of the built-in defaults (including the primer sequences) are available in the code here.
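
As a rough illustration of passing custom primers, the sketch below writes a small TOML file and assembles a command line that uses `--poly-a-config`. Several things here are assumptions: the `[anchors]` table and the `front_primer`/`rear_primer` key names are recalled from the dorado documentation's example and should be verified against the version in use; the primer sequences `ACTG` and `GTCA`, the model name `hac`, the `pod5_dir/` path, and the helper names are placeholders, not the sequences for this prep.

```python
from pathlib import Path
from typing import List


def build_poly_a_config(front_primer: str, rear_primer: str) -> str:
    """Render a minimal TOML config. Key names are assumed; check the dorado docs."""
    for name, seq in (("front_primer", front_primer), ("rear_primer", rear_primer)):
        if not seq or set(seq.upper()) - set("ACGT"):
            raise ValueError(f"{name} must be a non-empty ACGT sequence")
    return (
        "[anchors]\n"
        f'front_primer = "{front_primer.upper()}"\n'
        f'rear_primer = "{rear_primer.upper()}"\n'
    )


def build_command(model: str, pod5_dir: str, config_path: str) -> List[str]:
    """Assemble the basecaller arguments; model and pod5_dir are placeholders."""
    return [
        "dorado", "basecaller", model, pod5_dir,
        "--device", "cuda:0", "--recursive",
        "--estimate-poly-a", "--poly-a-config", config_path,
    ]


# Placeholder primers: replace with the sequences used in the actual prep.
config_text = build_poly_a_config("ACTG", "GTCA")
Path("polya_config.toml").write_text(config_text)

command = build_command("hac", "pod5_dir/", "polya_config.toml")
print(" ".join(command))
```

The command is only printed, not executed, so the arguments can be reviewed before running the basecaller.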

@liuqianhn
Author

@malton-ont Thank you very much for your clarification!
