Spatial Principal Component Analysis (spatial PCA), proposed by L. Shang and X. Zhou (Nature Communications, 2022), was developed to project single-cell data into a lower-dimensional space while integrating spatial information into the model. Here, we propose an adaptation of the method for whole slide images (WSIs). To obtain a low-dimensional representation of these very large images (~20,000 x 20,000 pixels), they are sliced into patches called tiles. For each tile, a feature vector is computed with a trained deep learning model; see our Barlow Twins implementation for WSIs. These encoded vectors are independent of the tile positions within a WSI. However, we can assume that tiles that are close to each other are more likely to have similar representations in feature space than distant tiles, as they are more likely to share morphological features. To model this assumption, we adapted spatial PCA by removing variable selection and using a multi-sample strategy. Because the memory and time costs of the algorithm are quadratic, a random subset of vectors must be selected for each patient (~185 tiles per patient); experimentally, 50,000 encoded vectors are sufficient to produce a consistent latent space. Intermediate matrices extracted from the resulting SpatialPCA R object are then used to project new vectors into the low-dimensional space created by the spatial PCA (see supplementary methods, equation 13, of L. Shang and X. Zhou, Nature Communications, 2022).
Method used to obtain a spatially informed low-dimensional representation in "Assessment of the current and emerging criteria for the histopathological classification of lung neuroendocrine tumours in the lungNENomics project." ESMO Open 2023 (under review)
Installation
Clone this repository. The scripts were tested with R 4.1.2.
All required packages are installed automatically when the script is launched.
Please note that the original functions of the SpatialPCA package are overridden by those in ImgSpatialPCA.R and ImgSpatialPCAMultipleSamles.R.
Organization of the repository
RunMultiSPCARandomSampling.R runs the Spatial PCA.
ImgSpatialPCA.R contains the base function that creates the Spatial PCA and overrides the CreateSpatialPCAObject function of the original package.
ImgSpatialPCAMultipleSamles.R adapts the Spatial PCA to several samples and overrides the SpatialPCA_Multiple_Sample function of the original package.
Step 1: Creation of the Spatial PCA latent space
To create a spatial PCA R object, run RunMultiSPCARandomSampling.R. An example configuration is given in RunSpatialPCA50K.sh.
Command line for running on a cluster with Slurm:
sbatch RunSpatialPCA50K.sh
Description of the process
Load the encoded vectors created by a deep-learning model. They must be concatenated in a single csv file with the following structure (see argline path2projectors):
| | X0 | X1 | X2 | X3 | ... | X124 | X125 | X126 | X127 | img_id | sample_id | img_id_c | x | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.010731053 | -0.017491885 | -0.05379057 | 0.0060576447 | ... | -0.021526879 | 0.038895514 | 0.021861676 | -0.0008289963 | TNE1019_30721_19585 | TNE1019 | TNE1019_30721_19585 | 30721 | 19585 |
| 2 | 0.0031735892 | -0.0024470983 | -0.04042089 | 7.895916e-05 | ... | -0.01900657 | -0.0067212125 | 0.0070669674 | -0.015635846 | TNE1019_33409_28801 | TNE1019 | TNE1019_33409_28801 | 33409 | 28801 |
Extract n random rows from the data frame (n = n_tiles).
Create per-sample lists of feature tables and coordinate tables.
Create the Spatial PCA, keeping the first 20 principal components.
Save the SpatialPCA R object and the coordinates in output_folder.
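The sampling and grouping steps above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the repository's R code: the function name `sample_and_group`, the `seed` parameter, and the toy csv (two feature columns instead of 128, made-up values) are assumptions introduced for the example. The Spatial PCA fit itself is not reproduced here.

```python
import csv
import io
import random
from collections import defaultdict

# Metadata columns taken from the table layout described above.
META_COLS = ("img_id", "sample_id", "img_id_c", "x", "y")

# Toy input: hypothetical feature values, same column layout as the real file.
DEMO_CSV = """,X0,X1,img_id,sample_id,img_id_c,x,y
1,0.01,-0.02,TNE1019_30721_19585,TNE1019,TNE1019_30721_19585,30721,19585
2,0.00,-0.01,TNE1019_33409_28801,TNE1019,TNE1019_33409_28801,33409,28801
3,0.03,0.02,TNE0001_8065_37633,TNE0001,TNE0001_8065_37633,8065,37633
4,-0.01,0.04,TNE0001_22657_31489,TNE0001,TNE0001_22657_31489,22657,31489
"""


def sample_and_group(csv_text, n_tiles, seed=0):
    """Draw n_tiles random rows, then group features and coordinates by sample_id."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    picked = rng.sample(rows, min(n_tiles, len(rows)))
    # Feature columns: every named column that is not metadata
    # (the unnamed leading index column is skipped by the `if c` test).
    feature_cols = [c for c in rows[0] if c and c not in META_COLS]
    features = defaultdict(list)
    coords = defaultdict(list)
    for row in picked:
        sid = row["sample_id"]
        features[sid].append([float(row[c]) for c in feature_cols])
        coords[sid].append((int(row["x"]), int(row["y"])))
    return dict(features), dict(coords)


if __name__ == "__main__":
    feats, coords = sample_and_group(DEMO_CSV, n_tiles=3)
    for sid in feats:
        print(sid, len(feats[sid]), "tiles", coords[sid])
```

Keeping features and coordinates in parallel per-sample lists mirrors the "lists of tables per sample" described in the process.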
⚠️ WARNING ⚠️
For a representation containing 100,000 encoded vectors, a machine with 300 GB of RAM is required, and the R object that is created has a size of 6 GB.
The encoded vectors must not be normalised beforehand; this step is included in the pipeline.
Step 2: Projection
To project additional tile representations onto the low-dimensional space created by the spatial PCA, use the script ProjectionElaboratedByPatient.R. An example configuration is given in Sbacth_ProjectionByPatient.sh.
Command line for running on a cluster with Slurm:
sbatch Sbacth_ProjectionByPatient.sh
Description of the process
Load the R SpatialPCA object created in the previous step (see argline parameter spca_obj)
Load the encoded vectors created by a deep-learning model. They must be centred and standardised, and must have the following structure (see argline proj_tab_norm):
| | X0 | X1 | X2 | X3 | ... | X124 | X125 | X126 | X127 | img_id | sample_id | img_id_c | x | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.5090191117 | -0.9064313876 | -2.726900674 | 0.274636068 | ... | 1.0504566226 | 1.9215368440 | 1.0672475244 | -0.0707460975 | TNE1019_30721_19585 | TNE1019 | TNE1019_30721_19585 | 30721 | 19585 |
| 2 | 0.1726495568 | -0.1714783594 | -2.496432701 | -0.016819896 | ... | 0.0436065054 | 1.2325113930 | 1.7371222537 | 0.3325003079 | TNE1019_33409_28801 | TNE1019 | TNE1019_33409_28801 | 33409 | 28801 |
Extract the encoded vectors belonging to the patient of interest (see argline sample_id).
Project the patient's encoded vectors into the latent space of the spatial PCA.
Save the new tile representations in the folder defined by the outdir argument, under the file name {outdir}/Proj_{sample_id}.csv.
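The input preparation for this step can be sketched as follows, in Python for illustration only (the repository's scripts are in R). The helper names `standardise_columns`, `select_patient` and `projection_path` are hypothetical. The sketch assumes column-wise z-scoring with the sample standard deviation; whether the repository standardises with the statistics of the new vectors or with those of the vectors used to build the latent space is not stated here. The projection itself relies on matrices stored in the SpatialPCA object and is not reproduced.

```python
import statistics


def standardise_columns(vectors):
    """Centre each column to mean 0 and scale it to unit sample standard deviation.

    Assumes at least two rows and no constant column (otherwise the
    standard deviation is undefined or zero).
    """
    columns = list(zip(*vectors))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in vectors]


def select_patient(rows, sample_id):
    """Keep only the rows (dicts with a 'sample_id' key) of one patient."""
    return [row for row in rows if row["sample_id"] == sample_id]


def projection_path(outdir, sample_id):
    """Build the output file name pattern described in the last step."""
    return f"{outdir}/Proj_{sample_id}.csv"


if __name__ == "__main__":
    toy = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]  # hypothetical values
    print(standardise_columns(toy))
    print(projection_path("results", "TNE1019"))
```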
Step 3: Search Leiden communities
To search for morphological clusters, we applied the Leiden community detection method to the low-dimensional representations of the tiles obtained through the spatial PCA.
The algorithm is implemented in LeidenCommunitySpatialPCA.R; the configuration used in our ESMO Open paper is described in RunLeidenCommunity.sh.
Description of the process
Load all spatial PCA projections concatenated in a single csv file (see argline proj_tab_SPCA) with the following structure:
| | img_id_c | axis_1 | axis_2 | axis_3 | axis_4 | ... | axis_19 | axis_20 | sample_id | x | y |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | TNE0001_8065_37633 | -0.2425984449 | -1.5822019878 | 0.2216062175 | -0.7004538129 | ... | 0.0645403598 | 0.1015841795 | TNE0001 | 8065 | 37633 |
| 2 | TNE0001_22657_31489 | -0.8694107393 | -0.3258183767 | -0.3124274849 | -0.1520251365 | ... | 0.08048248997 | -0.03595781844 | TNE0001 | 22657 | 31489 |
Randomly sample n rows (see argline ntiles).
Create a graph based on the K-nearest neighbours of each projection (see argline KNN).
Search for communities of nodes using the Leiden method (see argline Resolution).
Save the cluster centroids in a file named {outputdir}/SPCA_centroids_leiden_ntiles_{ntiles}_KNN_{KNN}_Res_{Resolution}_ncluster_{n_clusters_leiden}.csv
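The graph construction and centroid steps above can be sketched as follows, in Python for illustration only (the repository's implementation is LeidenCommunitySpatialPCA.R). The function names are hypothetical, Euclidean distance is an assumption, and the brute-force neighbour search is only suitable for toy inputs. The Leiden step is not implemented here: the community labels are passed in as if a Leiden implementation had already produced them.

```python
import math
from collections import defaultdict


def knn_edges(points, k):
    """Return the undirected edges linking each point to its k nearest neighbours."""
    edges = set()
    for i, p in enumerate(points):
        # Brute-force search: distances from point i to every other point.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def community_centroids(points, labels):
    """Average the coordinates of the points that share a community label."""
    groups = defaultdict(list)
    for p, label in zip(points, labels):
        groups[label].append(p)
    return {label: [sum(axis) / len(axis) for axis in zip(*pts)]
            for label, pts in groups.items()}


def centroid_filename(outputdir, ntiles, knn, resolution, n_clusters):
    """Fill in the file name template given in the last step."""
    return (f"{outputdir}/SPCA_centroids_leiden_ntiles_{ntiles}_KNN_{knn}"
            f"_Res_{resolution}_ncluster_{n_clusters}.csv")


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]  # toy 2-D projections
    print(sorted(knn_edges(pts, k=1)))
    print(community_centroids(pts, labels=[1, 1, 2, 2]))  # labels are made up
```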
Step 4: Assigning a community to each spatial PCA projection
The ClosestCentroids.R script assigns a community to each spatial PCA projection, based on the minimum distance between the projection and the centroids of the Leiden communities. An example Slurm submission script is given in RunClosestCentroids.sh.
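The assignment rule can be sketched as follows, in Python for illustration only (the repository's implementation is ClosestCentroids.R). The function name is hypothetical, and Euclidean distance is an assumption: the text above only specifies "minimum distance".

```python
import math


def assign_to_closest_centroid(projections, centroids):
    """Map each projection to the label of its nearest centroid.

    `centroids` is a dict {community label: coordinates}; distance is
    Euclidean, which is an assumption of this sketch.
    """
    return [min(centroids, key=lambda label: math.dist(p, centroids[label]))
            for p in projections]


if __name__ == "__main__":
    # Toy 2-D example; real projections have 20 axes.
    centroids = {1: (0.0, 0.0), 2: (10.0, 10.0)}
    projections = [(1.0, 1.0), (9.0, 8.0), (4.0, 4.0)]
    print(assign_to_closest_centroid(projections, centroids))
```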
Process description
Load all spatial PCA projections concatenated in a single csv file (see argline proj_tab_SPCA). This must be the same file as the one loaded in step 3.
Extract the projections of the patient of interest (see argline sample_id).
Load the coordinates of the centroids of the Leiden communities (see argline centroids_tab). This table must have the following format:
| | cluster | axis_1 | axis_2 | axis_3 | axis_4 | ... | axis_19 | axis_20 |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1.0776234132 | 0.3351948348 | -0.561474021 | -1.1364130733 | ... | -0.2101122186 | -0.1931117565 |
| 2 | 2 | -1.4632848979 | 0.8883086482 | -0.3643381155 | -0.8784518651 | ... | -0.0111574198 | 0.03596174487 |
Each projection is assigned to a community according to the minimum distance to one of the centroids of the Leiden communities.
For the patient of interest, the spatial PCA projections and the Leiden community assigned to each projection are saved in a file named {outdir}/SPCA_centroids_leiden_ntiles100000_KNN_6000_Res_01_{sample_id}.csv. This table has the following format: