Spatial PCA for WSIs:

Spatial Principal Component Analysis (spatial PCA), proposed by L. Shang and X. Zhou (Nat. Commun. 2022), projects single-cell data into a lower-dimensional space while integrating spatial information into the model. Here, we propose an adaptation of the method for whole slide images (WSIs). To obtain a low-dimensional representation of these very large images (~20,000 x 20,000 pixels), each WSI is sliced into patches called tiles. For each tile, a feature vector is computed with a trained deep learning model; see our Barlow Twins implementation for WSIs. These encoded vectors are independent of the tile positions within a WSI. However, we can assume that neighbouring tiles are more likely to have similar representations in feature space than distant tiles, because they are more likely to share morphological features. To model this assumption, we adapted spatial PCA by removing the variable selection step and using a multi-sample strategy. Because the memory and time costs of the algorithm are quadratic, a random subset of vectors must be selected for each patient (~185 tiles per patient); experimentally, 50,000 encoded vectors are sufficient to produce a consistent latent space. Intermediate matrices extracted from the resulting SpatialPCA R object are then used to project new vectors into the low-dimensional space created by the spatial PCA (see equation 13 of the supplementary methods of L. Shang and X. Zhou, Nat. Commun. 2022).

Installation

Organization of the repository

Step 1: Creation of the Spatial PCA latent space

sbatch RunSpatialPCA50K.sh

Description of the process

  1. Load the encoded vectors created by a deep-learning model. They must be concatenated in a single csv file with the following structure (see argument path2projectors):

|   | X0 | X1 | X2 | X3 | ... | X124 | X125 | X126 | X127 | img_id | sample_id | img_id_c | x | y |
|---|----|----|----|----|-----|------|------|------|------|--------|-----------|----------|---|---|
| 1 | 0.010731053 | -0.017491885 | -0.05379057 | 0.0060576447 | ... | -0.021526879 | 0.038895514 | 0.021861676 | -0.0008289963 | TNE1019_30721_19585 | TNE1019 | TNE1019_30721_19585 | 30721 | 19585 |
| 2 | 0.0031735892 | -0.0024470983 | -0.04042089 | 7.895916e-05 | ... | -0.01900657 | -0.0067212125 | 0.0070669674 | -0.015635846 | TNE1019_33409_28801 | TNE1019 | TNE1019_33409_28801 | 33409 | 28801 |

  2. Extract n random rows from the data frame (n = n_tiles).
  3. Create, for each sample, a list of feature tables and a list of coordinate tables.
  4. Compute the spatial PCA, keeping the first 20 principal components.
  5. Save the SpatialPCA R object and the coordinates in output_folder.
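
Items 2 and 3 of the list above (random row extraction and the per-sample split) can be illustrated with a minimal Python sketch. It is not the repository's code: the function names `sample_rows` and `split_by_sample`, the fixed seed, and the toy table with 4 features instead of 128 are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_rows(rows, n_tiles, seed=0):
    """Draw n_tiles rows at random, without replacement."""
    if n_tiles >= len(rows):
        return list(rows)
    return random.Random(seed).sample(rows, n_tiles)


def split_by_sample(rows, n_features):
    """Group rows per sample_id into feature vectors and (x, y) coordinates."""
    features = defaultdict(list)
    coords = defaultdict(list)
    for row in rows:
        sid = row["sample_id"]
        features[sid].append([float(row[f"X{i}"]) for i in range(n_features)])
        coords[sid].append((int(row["x"]), int(row["y"])))
    return dict(features), dict(coords)


# Toy table (hypothetical values): two samples, 4 features per tile.
rows = [
    {"X0": 0.1, "X1": -0.2, "X2": 0.0, "X3": 0.3, "sample_id": "A", "x": 1, "y": 1},
    {"X0": 0.2, "X1": -0.1, "X2": 0.1, "X3": 0.2, "sample_id": "A", "x": 1, "y": 2},
    {"X0": -0.3, "X1": 0.4, "X2": 0.2, "X3": 0.0, "sample_id": "B", "x": 5, "y": 7},
    {"X0": -0.2, "X1": 0.5, "X2": 0.1, "X3": 0.1, "sample_id": "B", "x": 6, "y": 7},
    {"X0": 0.0, "X1": 0.0, "X2": 0.3, "X3": -0.1, "sample_id": "B", "x": 6, "y": 8},
]

subset = sample_rows(rows, n_tiles=3)
features, coords = split_by_sample(subset, n_features=4)
```

The two dictionaries share the same keys, so each sample's feature table lines up row by row with its coordinate table.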

Step 2: Projection

sbatch Sbacth_ProjectionByPatient.sh

Description of the process

  1. Load the SpatialPCA R object created in the previous step (see argument spca_obj).
  2. Load the encoded vectors created by a deep-learning model. They must be centred and standardised, and must have the following structure (see argument proj_tab_norm):

|   | X0 | X1 | X2 | X3 | ... | X124 | X125 | X126 | X127 | img_id | sample_id | img_id_c | x | y |
|---|----|----|----|----|-----|------|------|------|------|--------|-----------|----------|---|---|
| 1 | 0.5090191117 | -0.9064313876 | -2.726900674 | 0.274636068 | ... | 1.0504566226 | 1.9215368440 | 1.0672475244 | -0.0707460975 | TNE1019_30721_19585 | TNE1019 | TNE1019_30721_19585 | 30721 | 19585 |
| 2 | 0.1726495568 | -0.1714783594 | -2.496432701 | -0.016819896 | ... | 0.0436065054 | 1.2325113930 | 1.7371222537 | 0.3325003079 | TNE1019_33409_28801 | TNE1019 | TNE1019_33409_28801 | 33409 | 28801 |

  3. Extract the encoded vectors belonging to the patient of interest (see argument sample_id).
  4. Project the patient's encoded vectors into the latent space of the spatial PCA.
  5. Save the new tile representations in the folder defined by the outdir argument, under the file name {outdir}/Proj_{sample_id}.csv.
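
The input table of this step must already be centred and standardised. The sketch below shows one way this preprocessing and the patient selection could look in Python. It rests on several assumptions that the text above does not settle: the statistics are computed over the table passed in, the sample standard deviation (n - 1) is used, and the helper names are hypothetical. The projection itself relies on matrices from the SpatialPCA R object and is not reproduced here.

```python
import statistics


def standardise_columns(vectors):
    """Centre each column to mean 0 and scale it to unit sample standard deviation."""
    columns = list(zip(*vectors))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.stdev(col) for col in columns]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in vectors
    ]


def select_patient(rows, sample_id):
    """Keep only the rows that belong to the patient of interest."""
    return [row for row in rows if row["sample_id"] == sample_id]


def projection_path(outdir, sample_id):
    """Build the output file name used for a patient's projections."""
    return f"{outdir}/Proj_{sample_id}.csv"


# Toy table (hypothetical values): 2 features per tile instead of 128.
rows = [
    {"sample_id": "TNE0001", "features": [1.0, 10.0]},
    {"sample_id": "TNE0001", "features": [2.0, 20.0]},
    {"sample_id": "TNE0002", "features": [3.0, 30.0]},
]

scaled = standardise_columns([row["features"] for row in rows])
for row, vec in zip(rows, scaled):
    row["features"] = vec

patient_rows = select_patient(rows, "TNE0001")
out_file = projection_path("results", "TNE0001")
```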

Step 3: Search Leiden communities

Description of the process

  1. Load all spatial PCA projections concatenated in a single csv file (see argument proj_tab_SPCA) with the following structure:

|   | img_id_c | axis_1 | axis_2 | axis_3 | axis_4 | ... | axis_19 | axis_20 | sample_id | x | y |
|---|----------|--------|--------|--------|--------|-----|---------|---------|-----------|---|---|
| 1 | TNE0001_8065_37633 | -0.2425984449 | -1.5822019878 | 0.2216062175 | -0.7004538129 | ... | 0.0645403598 | 0.1015841795 | TNE0001 | 8065 | 37633 |
| 2 | TNE0001_22657_31489 | -0.8694107393 | -0.3258183767 | -0.3124274849 | -0.1520251365 | ... | 0.08048248997 | -0.03595781844 | TNE0001 | 22657 | 31489 |

  2. Randomly sample n rows (see argument ntiles).
  3. Create a graph based on the K-nearest neighbors of each projection (see argument KNN).
  4. Search for communities of nodes with the Leiden method (see argument Resolution).
  5. Save the cluster centroids in a file named {outputdir}/SPCA_centroids_leiden_ntiles_{ntiles}_KNN_{KNN}_Res_{Resolution}_ncluster_{n_clusters_leiden}.csv
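
As an illustration of the graph construction and of the centroid computation, here is a minimal Python sketch. It assumes Euclidean distance and a brute-force neighbor search, neither of which is stated above. The Leiden step itself requires a dedicated community-detection library and is not reproduced: the `labels` list is hand-assigned and only stands in for its output. All names are hypothetical.

```python
import math


def knn_edges(points, k):
    """Return directed edges (i, j) linking each point to its k nearest neighbors."""
    edges = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, closest first.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        edges.extend((i, j) for _, j in dists[:k])
    return edges


def cluster_centroids(points, labels):
    """Average the points of each cluster, axis by axis."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {
        lab: [sum(col) / len(col) for col in zip(*pts)]
        for lab, pts in groups.items()
    }


# Toy projections in 2 dimensions instead of 20 (hypothetical values).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
edges = knn_edges(points, k=2)

# Stand-in for the community labels that the Leiden method would return.
labels = [0, 0, 0, 1, 1, 1]
centroids = cluster_centroids(points, labels)
```

The brute-force search is quadratic in the number of projections, so it only suits small examples; a large table would call for an approximate or tree-based neighbor search.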

Step 4: Assigning a community to each spatial PCA projection

Process description

  1. Load all spatial PCA projections concatenated in a single csv file (see argument proj_tab_SPCA). This must be the same file as in step 3.1.
  2. Extract the projections of the patient of interest (see argument sample_id).
  3. Load the coordinates of the centroids of the Leiden communities (see argument centroids_tab). This table must have the following format:

|   | cluster | axis_1 | axis_2 | axis_3 | axis_4 | ... | axis_19 | axis_20 |
|---|---------|--------|--------|--------|--------|-----|---------|---------|
| 1 | 1 | 1.0776234132 | 0.3351948348 | -0.561474021 | -1.1364130733 | ... | -0.2101122186 | -0.1931117565 |
| 2 | 2 | -1.4632848979 | 0.8883086482 | -0.3643381155 | -0.8784518651 | ... | -0.0111574198 | 0.03596174487 |

  4. Assign each projection to the community whose centroid is closest to it.
  5. For the patient of interest, save the spatial PCA projections and their associated Leiden communities in {outdir}/SPCA_centroids_leiden_ntiles100000_KNN_6000_Res_01_{sample_id}.csv. This table has the following format:

|   | img_id_c | axis_1 | axis_2 | axis_3 | axis_4 | ... | axis_19 | axis_20 | sample_id | x | y | cluster |
|---|----------|--------|--------|--------|--------|-----|---------|---------|-----------|---|---|---------|
| 1 | TNE0001_8065_37633 | -0.2574942826 | -1.6276659801 | 0.1956646737 | -0.7829603307 | ... | 0.0844771901 | 0.1201035516 | TNE0001 | 8065 | 37633 | 5 |
| 2 | TNE0001_22657_31489 | -0.8777365627 | -0.3758480951 | -0.299188705 | -0.2703297597 | ... | 0.04097749198 | -0.09903248588 | TNE0001 | 22657 | 31489 | 8 |
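
The assignment rule of this step can be sketched in a few lines of Python. This is an illustration only: it assumes Euclidean distance, which the description does not specify, and the function name and toy values are hypothetical.

```python
import math


def assign_to_nearest_centroid(projections, centroids):
    """Return, for each projection, the label of the closest centroid."""
    return [
        min(centroids, key=lambda lab: math.dist(vec, centroids[lab]))
        for vec in projections
    ]


# Toy centroids and projections in 2 dimensions instead of 20.
centroids = {1: [0.0, 0.0], 2: [5.0, 5.0]}
projections = [[0.2, -0.1], [4.8, 5.3], [2.0, 2.0]]

clusters = assign_to_nearest_centroid(projections, centroids)
```

Each entry of `clusters` corresponds to the value stored in the cluster column of the output table.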

TO DO LIST