Principle of separation of types? #65

Open
Lao-Tz opened this issue Nov 8, 2023 · 8 comments

@Lao-Tz

Lao-Tz commented Nov 8, 2023

Hello, I'm currently using BayesPrism for deconvolution and I have a question.

I'm working with single-cell sequencing data that includes equal numbers of tumor cells and normal (non-tumor) cells. The bulk data also contains both tumor and normal cells. Suppose I've annotated 30 state subgroups, including CD8+, Plasma cells, etc., and then merged them into 8 type subgroups according to cell type, such as Lymphocytes, Stromal cells, etc. However, I found that 10 of the state subgroups are found only in Tumor and 5 are found only in Normal. Viewed at the type level, some of these 10 and 5 subgroups belong to the same type, such as Lymphocytes, while others do not.

I performed deconvolution in two ways: 1. Assign each state subgroup to its actual cell type. 2. Label the state subgroups that are found only in tumor or only in normal as type Tumor or Normal.
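The two labelling schemes described above can be sketched on toy metadata. This is only an illustration under stated assumptions: the data frame `meta`, its columns, and the cell counts are made up, and the resulting vectors are meant as the kind of per-cell type labels one would hand to the deconvolution, not as the poster's actual code.

```r
# Toy metadata: one row per cell. All names and values are illustrative.
meta <- data.frame(
  state = c("LT1", "LT1", "LT3", "LB3", "EG1", "EG1", "EV2"),
  type  = c("Lymphocytes", "Lymphocytes", "Lymphocytes", "Lymphocytes",
            "Epithelial Cells", "Epithelial Cells", "Endothelial Cells"),
  group = c("Normal", "Tumor", "Tumor", "Normal", "Normal", "Tumor", "Tumor"),
  stringsAsFactors = FALSE
)

# Count cells per state in each sample group.
counts <- table(meta$state, meta$group)
tumor_only  <- rownames(counts)[counts[, "Normal"] == 0]
normal_only <- rownames(counts)[counts[, "Tumor"] == 0]

# Scheme 1: every state keeps its biological cell type.
labels_scheme1 <- meta$type

# Scheme 2: states seen in only one group are relabelled "Tumor" or "Normal".
labels_scheme2 <- meta$type
labels_scheme2[meta$state %in% tumor_only]  <- "Tumor"
labels_scheme2[meta$state %in% normal_only] <- "Normal"

# Cross-tabulate to see which cells change label between the two schemes.
print(table(scheme1 = labels_scheme1, scheme2 = labels_scheme2))
```

In this toy example the lymphocyte states LT3 and LB3 leave the Lymphocytes type under scheme 2, which is the kind of shift that could make the two deconvolution results diverge.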

The single-cell data used in the BayesPrism paper did not include normal cells. After reading the BayesPrism paper, I started to dislike the method of CIBERSORT. However, my knowledge is limited and I currently do not have the ability to understand the underlying logic of BayesPrism. I'm not sure whether my analysis design is feasible, so I would like to ask for your opinion.

Both methods of analysis contain some collinearity (probably because there is redundancy in my cell subgroup division). I'm inclined to make the second method interpretable so that I can have a broader subsequent analysis.
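One generic way to look for the redundancy mentioned above is to correlate the average expression profiles of the subgroups. The sketch below uses a made-up reference matrix `ref` and an arbitrary threshold of 0.95; it is a plain base-R diagnostic, not a BayesPrism procedure.

```r
# Toy reference: rows are cell states, columns are genes (made-up counts).
ref <- rbind(
  LT1 = c(50, 40, 5, 2, 1, 0),
  LT2 = c(48, 42, 6, 1, 2, 0),   # deliberately close to LT1
  EG1 = c(1, 2, 60, 55, 3, 1),
  SF1 = c(0, 1, 4, 3, 70, 65)
)
colnames(ref) <- paste0("gene", 1:6)

# Scale each profile to 10,000 total counts, then log-transform.
prof <- log2(ref / rowSums(ref) * 1e4 + 1)

# Pairwise Pearson correlation between state profiles.
cor_mat <- cor(t(prof))

# List state pairs whose correlation exceeds the (arbitrary) threshold.
threshold <- 0.95
idx <- which(cor_mat > threshold & upper.tri(cor_mat), arr.ind = TRUE)
high_pairs <- data.frame(
  state_a = rownames(cor_mat)[idx[, "row"]],
  state_b = colnames(cor_mat)[idx[, "col"]],
  r = cor_mat[idx]
)
print(high_pairs)
```

Pairs flagged this way are candidates for merging before deconvolution, although the right threshold depends on the data.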

By the way, the result of the first method is similar to CIBERSORT, but the result of the second method is quite different.

Lao-Tz changed the title from "Principle of separation of types and states?" to "Principle of separation of types?" on Nov 8, 2023
@tinyi
Collaborator

tinyi commented Nov 14, 2023 via email

Hi. Sorry for the late reply. I am not sure I am quite following. Could you elaborate a bit on the relationship between cell states and tumor/normal state? For example, how can one cell state be found in both tumor and normal samples? It is also unclear to me how you were trying to construct the reference. Were you trying to construct the reference using scRNA datasets from both normal and tumor samples?

@Lao-Tz
Author

Lao-Tz commented Nov 22, 2023

Thanks for your reply!
My input data consists of:

  • Single-cell RNA sequencing data: 40 samples, including 30,000 normal cells and 100,000 cancer cells.
  • Bulk RNA sequencing data: Obtained from TCGA, including 350+ cancer samples and 40+ normal samples.

I utilized the LIGER package for semi-supervised data dimensionality reduction and the Seurat package's FindClusters function for clustering. This resulted in the identification of over 30 subclusters. Upon examining the composition of these subclusters in terms of Tumor and Normal, I discovered that more than half of the subclusters were exclusively present in either Tumor or Normal. Consequently, I merged the subclusters exclusive to Tumor or Normal into two types, despite the possibility of dissimilar expression profiles between the subclusters distributed in Normal or Tumor. I set the key as 'Tumor'.

My current approach involves conducting two rounds of BayesPrism analysis. In the first round, I include both Tumor and Normal in the type definition. After deconvolution, I analyze whether the theta values of the types show significant differences between cancer and adjacent tissue in the bulk data. Upon identifying significant differences, I proceed with the second round of deconvolution, using only the subclusters from Tumor and Normal. However, I set their types based on the original cell types. I then analyze the theta values of the type results and perform single-factor Cox survival analysis to select major subclusters associated with survival for further analysis.
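The first-round comparison of theta values between cancer and adjacent tissue could look roughly like the following. Everything here is assumed for illustration: the `theta` matrix holds made-up fractions rather than real deconvolution output, and the Wilcoxon rank-sum test with Benjamini-Hochberg adjustment is one reasonable choice, not necessarily the test used in the thread.

```r
# Toy cell-type fractions (theta): rows are bulk samples, columns are types.
# Values are made up; real fractions would come from the deconvolution output.
theta <- rbind(
  c(0.50, 0.30, 0.20),
  c(0.55, 0.25, 0.20),
  c(0.60, 0.22, 0.18),
  c(0.52, 0.28, 0.20),
  c(0.10, 0.60, 0.30),
  c(0.12, 0.58, 0.30),
  c(0.08, 0.62, 0.30),
  c(0.15, 0.55, 0.30)
)
colnames(theta) <- c("Tumor Cells", "Lymphocytes", "Stromal Cells")
sample_group <- factor(rep(c("Cancer", "Adjacent"), each = 4))

# Wilcoxon rank-sum test per cell type, cancer vs adjacent samples.
# exact = FALSE uses the normal approximation, which tolerates tied values.
p_values <- apply(theta, 2, function(x) {
  wilcox.test(x[sample_group == "Cancer"],
              x[sample_group == "Adjacent"], exact = FALSE)$p.value
})

# Adjust for testing several cell types at once (Benjamini-Hochberg).
p_adjusted <- p.adjust(p_values, method = "BH")
print(round(p_adjusted, 3))
```

The same per-sample theta columns could then be used as covariates in a univariate Cox model (for example with `survival::coxph`) for the survival step.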

@tinyi
Collaborator

tinyi commented Nov 23, 2023 via email

@Lao-Tz
Author

Lao-Tz commented Nov 24, 2023

# Extracting the 'minor_cluster' and 'group' columns
minor_cluster <- sce@meta.data$minor_cluster
group <- sce@meta.data$group

# Creating a table that lists the count of 'minor_cluster' in each group
cluster_table <- table(minor_cluster, group)

# Finding the 'minor_cluster' with a count of 0 in the 'Normal' and 'Tumor' groups
tumor <- row.names(cluster_table)[cluster_table[, "Normal"] == 0]
normal <- row.names(cluster_table)[cluster_table[, "Tumor"] == 0]

# Setting the corresponding 'major_cluster' and 'minor_cluster' of these clusters as "Tumor Cells" and "Normal Cells"
sce@meta.data$major_cluster[sce@meta.data$minor_cluster %in% tumor] <- "Tumor Cells"
#sce@meta.data$major_cluster[sce@meta.data$minor_cluster %in% normal] <- "Normal Cells"

This code does not assign Normal Cells, because I copied it from my current working environment, where that line is commented out. The line will be enabled when BayesPrism is run, so the major_cluster column in the data below does not contain Normal Cells.

# first round
> table(sce$minor_cluster,sce$group)
     
      Normal Tumor
  EE1    983     0
  EG1   1705  2248
  EG2    601  1524
  EG3    683    13
  EV1   2413     0
  EV2      0  1183
  EV3      0    31
  GC1   1381  2216
  LB1   3689    97
  LB2      0  2901
  LB3   1330     0
  LB4      0  1184
  LB5      0  2788
  LB6    243     0
  LB7      0   135
  LT1   4011  1501
  LT2   2118  2585
  LT3      0  3067
  LT4      0  1230
  LT5    139   603
  LT6    242     0
  LT7    150     0
  LT8      0   118
  MM1   1471   358
  MM2      0   973
  MM3      0   544
  MN1   1050   473
  MY1    558   443
  NN1   1346     0
  SC1    807     0
  SC2      0   307
  SF1   3234     0
  SM1      0  1110
  TT1      0   535

> table(sce$major_cluster,sce$group)
                   
                    Normal Tumor
  Endocrine Cells     1346     0
  Endothelial Cells   2413     0
  Epithelial Cells    5353  6001
  Lymphocytes        11922  4786
  Myeloid Cells       2521   831
  Stromal Cells       4599   443
  Tumor Cells            0 16106

> table(sce$major_cluster,sce$minor_cluster)
                   
                     EE1  EG1  EG2  EG3  EV1  EV2  EV3  GC1  LB1  LB2  LB3  LB4  LB5  LB6  LB7  LT1  LT2  LT3  LT4  LT5  LT6  LT7  LT8  MM1  MM2  MM3  MN1  MY1  NN1  SC1  SC2  SF1  SM1  TT1
  Endocrine Cells      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 1346    0    0    0    0    0
  Endothelial Cells    0    0    0    0 2413    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
  Epithelial Cells   983 3953 2125  696    0    0    0 3597    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
  Lymphocytes          0    0    0    0    0    0    0    0 3786    0 1330    0    0  243    0 5512 4703    0    0  742  242  150    0    0    0    0    0    0    0    0    0    0    0    0
  Myeloid Cells        0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 1829    0    0 1523    0    0    0    0    0    0    0
  Stromal Cells        0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 1001    0  807    0 3234    0    0
  Tumor Cells          0    0    0    0    0 1183   31    0    0 2901    0 1184 2788    0  135    0    0 3067 1230    0    0    0  118    0  973  544    0    0    0    0  307    0 1110  535
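#second round  (Another Rscript)
library(dplyr)  # provides %>%, filter() and select()
Idents(sce) <- "minor_cluster"
# Keep only the subclusters that have zero cells in either Normal or Tumor
NT_keep = table(sce$minor_cluster,sce$group) %>% as.data.frame() %>% filter(Freq == 0) %>% select(Var1)
sce <- subset(sce, idents = as.character(NT_keep$Var1))  # as.character() is a defensive conversion of the factor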

My Tumor subgroups were obtained by stratified sampling and then merged manually according to the number of cells. My subgroup annotation is based on the top 50 genes from the FindAllMarkers function in the Seurat package; for some subgroups the cell type is apparent from just the top 10 or even the top 5 genes.
I am a novice in single-cell sequencing analysis, and I have always wondered how everyone manages to annotate tumor cells when they all look like expression states of tumor microenvironment cells.
Thanks!

major_cluster      minor_cluster1                        minor_cluster2
Lymphocytes        T cells                               LT1
Lymphocytes        T cells                               LT2
Lymphocytes        B cells                               LB1
Epithelial Cells   Gastric Endocrine Cells               EG1
Lymphocytes        T cells                               LT3
Epithelial Cells   Gastric Endocrine Cells               EG2
Stromal Cells      Fibroblasts                           SF1
Endothelial Cells  Vascular Endothelial Cells            EV1
Myeloid Cells      Macrophages                           MM1
Lymphocytes        B cells                               LB2
Myeloid Cells      Neutrophils                           MN1
Lymphocytes        B cells                               LB3
Epithelial Cells   Gastric Chief Cells                   GC1
Lymphocytes        B cells                               LB4
Lymphocytes        B cells                               LB5
Stromal Cells      Mast Cells                            SM1
Lymphocytes        T cells                               LT4
Myeloid Cells      Macrophages                           MM2
Stromal Cells      Myofibroblasts                        MY1
Stromal Cells      Cancer-associated fibroblasts (CAFs)  SC1
Lymphocytes        T cells                               LT5
Epithelial Cells   Epithelial Cells                      EE1
Endocrine Cells    Neuroendocrine Cells                  NN1
Lymphocytes        T cells                               LT6
Lymphocytes        B cells                               LB6
Lymphocytes        T cells                               LT7
Epithelial Cells   Gastric Endocrine Cells               EG3
Endothelial Cells  Vascular Endothelial Cells            EV2
Tumor Cells        Tumor Cells                           TT1
Myeloid Cells      Monocytes                             MM3
Stromal Cells      Cancer-associated fibroblasts (CAFs)  SC2
Lymphocytes        B cells                               LB7
Lymphocytes        T cells                               LT8
Endothelial Cells  Vascular Endothelial Cells            EV3

@tinyi
Collaborator

tinyi commented Nov 26, 2023 via email

@Lao-Tz
Author

Lao-Tz commented Nov 27, 2023

My "Tumor" and "Normal" labels correspond to the "Cancer" and "Adjacent tissues" annotations in the original data. I don't have enough experience to distinguish malignant from non-malignant cells, and I don't know how others do it, because I am the only one in our laboratory working on single-cell sequencing analysis.

Using pie charts, I examined how the subgroups were distributed after LIGER dimensionality reduction and FindClusters clustering, and I tried to choose the parameters that gave the greatest difference between cancer and adjacent tissue. This resulted in lymphocytes and other cell types appearing among the "Tumor" and "Normal" subgroups.

Therefore, since the state subgroups found only in "Tumor" or only in "Normal" account for almost half of all subgroups, I am considering extracting just those state subgroups, merging them into type subgroups, and running BayesPrism without setting the key. I think this is still convincing.

Next, I intend to take the type subgroups screened here into Monocle and iTalk analyses, run WGCNA on the state-level results and on the CIBERSORT results, keep whichever performs better, and intersect the outputs of these steps to find key prognostic genes and build a gene model. That would complete my exploration of the single-cell data at this stage.

Can you give some advice to a beginner?
Thank you for your reply!

@Lao-Tz
Author

Lao-Tz commented Nov 28, 2023

I found my problem. There are so many zero values because my merge function doesn't match. It's over. I have to do it again.

@Lao-Tz
Author

Lao-Tz commented Dec 2, 2023

I tried copyKAT, but the results were not very good, so I ran inferCNV with endothelial cells as the reference in the annotations_file. It showed that half of the epithelial cell subsets were clearly malignant, but this is far from the number of malignant cells in the BayesPrism paper. I also found that my scRNA data contains many lymphocytes after dimensionality reduction and clustering. These lymphocytes show TCR or BCR copy number variation, yet the lymphocytes in the cancer I study do not seem to be malignant. So I still have doubts about how the type labels should be constructed.
