Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

many to many warning when assigning annotations to markers in scRNAseq #112

Open
hwick opened this issue Dec 15, 2023 · 0 comments
Open

Comments

@hwick
Copy link
Contributor

hwick commented Dec 15, 2023

In https://hbctraining.github.io/scRNA-seq_online/lessons/09_merged_SC_marker_identification.html
adding annotations in the loop using the following method results in a many-to-many warning (when using methods to download annotations using annotationHub). This is because unique(annotations[, c("gene_name", "description")]) only finds unique gene_name/description combinations, but there are often gene repeats where genes with the same gene_name have slightly different descriptions due to being from different seq_names (such as being from a patch). For me, the many-to-many warning does not pop up when in the loop below but pops up if assigning to a single object, like the results of running FindAllMarkers. Despite the warning not popping up in the loop, I found instances of repeat gene names in my FindConservedMarkers output, with the same p values, fold changes, etc (this is what tipped me off to investigate this problem).

I downloaded annotations from annotationHub using methods outlined in another lesson section, rather than the annotations.csv example file. I'm not sure if this issue exists in the annotations.csv file, but regardless it creates some issue applying the lesson to real world analysis.

get_conserved <- function(cluster){
  FindConservedMarkers(seurat_integrated,
                       ident.1 = cluster,
                       grouping.var = "sample",
                       only.pos = TRUE) %>%
    rownames_to_column(var = "gene") %>%
    left_join(y = unique(annotations[, c("gene_name", "description")]),
               by = c("gene" = "gene_name")) %>%
    cbind(cluster_id = cluster, .)
  }

> dim(annotations) #full size
[1] 71440     5
> dim( unique(annotations[, c("gene_name", "description")])) #just unique gene_name/description combos, as used in the above loops
[1] 47618     2
> length(unique(annotations$gene_name)) #just unique gene_names
[1] 40902
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant