Point-Cloud-Pattern-Mining (PCPM)


This library can be used to compare entities described as sets of 3D point clouds (such as cortical sulci or fiber bundles).

The analysis starts with the calculation of a geometrical distance between the N point clouds (aka subjects), resulting in an N-by-N distance matrix. Multidimensional scaling via Isomap is then applied to infer the relationships that link entities sharing common features. The Isomap algorithm estimates the intrinsic geometry of the data manifold from a rough estimate of each data point's neighbors, then recalculates the geodesic distances among the subjects on the inferred manifold. The top n eigenvectors of the geodesic distance matrix give the coordinates in the new n-dimensional Euclidean space.
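
As an illustration, the same two steps (distance matrix, then Isomap) can be reproduced with scikit-learn. This is a minimal sketch on random stand-in data, not the library's own implementation:

import numpy as np
from sklearn.manifold import Isomap

# stand-in data: N random 3D points, one "subject" per point
rng = np.random.default_rng(0)
points = rng.normal(size=(100, 3))

# N-by-N matrix of pairwise Euclidean distances
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

# Isomap on the precomputed distance matrix yields n-dimensional coordinates
iso = Isomap(n_neighbors=3, n_components=5, metric="precomputed")
coords = iso.fit_transform(dist)   # shape (100, 5)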

The point clouds are then studied in this new n-dimensional space. However, direct observation of all the point clouds is impractical, especially when the number of subjects is large. To obtain a synthetic description of the result, a set of weighted averages is calculated along a chosen axis (see figure 1) or on given regions of the new n-dimensional space, possibly specified by a clustering algorithm; a sketch of such a weighting scheme follows.
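
To make the weighting concrete, here is a minimal sketch (a hypothetical helper, not the library's implementation) of a Gaussian-weighted average along one embedding axis, accumulating the clouds into a common voxel grid:

import numpy as np

def weighted_average_density(pcs, axis_coord, center, sigma, grid_shape=(64, 64, 64)):
    """Accumulate point clouds into a voxel grid; each subject is weighted by a
    Gaussian of its coordinate on the chosen embedding axis around `center`.
    Assumes the point coordinates already lie in the voxel grid."""
    density = np.zeros(grid_shape)
    for name, points in pcs.items():
        weight = np.exp(-0.5 * ((axis_coord[name] - center) / sigma) ** 2)
        voxels = np.clip(points.astype(int), 0, np.array(grid_shape) - 1)
        np.add.at(density, (voxels[:, 0], voxels[:, 1], voxels[:, 2]), weight)
    return density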

Install

Clone this repo somewhere you intend to keep it, then install with pip3 from the root folder:

$ cd point-cloud-pattern-mining
$ pip3 install -e .

The -e option links the installed module to the cloned folder instead of copying it to the default module location.

Example

import pcpm
# The point clouds must be stored in a dictionary mapping subject names to arrays.
# This example uses 1765 point-clouds representing the Central Sulcus
# of the brain of different human subjects.
len(pcs)
1765
# One point-cloud is an array of 3D points
pcs['L295146'].shape
(2382, 3)
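
The origin of pcs is not shown above; the library simply expects a dict mapping subject names to (n_points, 3) arrays. For example, assuming (hypothetically) one NumPy file per subject:

from pathlib import Path
import numpy as np

# build {subject_name: (n_points, 3) array} from one .npy file per subject
pcs = {path.stem: np.load(path) for path in Path("Data/point_clouds").glob("*.npy")}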

(figure: example point-clouds)

Distance calculation


The first step consists of calculating a pair-wise distance between the point-clouds, resulting in an N-by-N distance matrix.

This can be done, for example, via ICP (Iterative Closest Point).

For this purpose, the pcpm_icp command-line tool, installed with this package, can be used.
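
For intuition, here is a much simpler (non-ICP) point-cloud distance: the symmetric mean nearest-neighbour distance. This is not what pcpm_icp computes (ICP additionally optimises a rigid transform before measuring), so this sketch is illustrative only:

import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(a, b):
    """Symmetric mean nearest-neighbour distance between two (n, 3) arrays."""
    d_ab = cKDTree(b).query(a)[0].mean()   # each point of a to its closest in b
    d_ba = cKDTree(a).query(b)[0].mean()   # each point of b to its closest in a
    return 0.5 * (d_ab + d_ba)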

# We have calculated the ICP on a subset of 800 subjects with the pcpm_icp command line tool
# The results can be loaded with the following command
icp = pcpm.load_icp_result("Data/output/CSSyl_sulci_bigpieceRedo_data_distances.csv")

# select only the subset of the point clouds present in the ICP results
names = icp.names # the subjects names
pcs = pcpm.subset_of_pcs(pcs, names)
len(pcs)
800
# Each subject is associated with a vector of coordinates in a space called the
# 'embedding', here given by its distances from all the other subjects.
embedding = icp.dist_min     # use the ICP minimum distance matrix as embedding for the point clouds
embedding.head()
           R397861   R211417   R114419   L467351   R100610  ...   L594156
R397861   0.000000  5.664300  5.737526  5.210066  7.683856  ...  8.577278
R211417   5.664300  0.000000  3.242321  4.050629  4.090161  ...  4.157319
R114419   5.737526  3.242321  0.000000  4.365764  7.590033  ...  3.889136
L467351   5.210066  4.050629  4.365764  0.000000  7.923952  ...  8.746338
R100610   7.683856  4.090161  7.590033  7.923952  0.000000  ...  7.164085

5 rows × 800 columns


Alignment of the point-clouds


The point-clouds can be aligned to a particular subject. In this case, ICP is used to calculate and apply the rotations and translations that minimize the distance of every point-cloud to the reference.
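
Conceptually, each ICP iteration alternates nearest-neighbour matching with a least-squares rigid fit. The rigid-fit step (the Kabsch algorithm) is sketched below for already-matched point pairs; pcpm.align_pcs wraps the full iterative procedure:

import numpy as np

def kabsch(src, dst):
    """Rotation R and translation t minimising ||(src @ R.T + t) - dst||
    for matched (n, 3) point arrays src and dst."""
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_mean).T @ (dst - dst_mean)      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t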

# the central subject is the point-cloud which is closest to all the others in the embedding
central_subject = pcpm.find_central_pcs_name(icp.dist_min)
print("Center subject:", central_subject)
Center subject: R192540
# the return values are dictionaries with the same keys as pcs
aligned_pcs, aligned_rot, aligned_transl = pcpm.align_pcs(pcs, central_subject)
>>> INFO pcpm.transform - using 45 cores out of 48
Aligning point-clouds to R192540: 100%|██████████| 800/800 [02:48<00:00,  4.76it/s]

Dimensionality Reduction by ISOMAP


The following step is a dimensionality reduction of the embedding by ISOMAP.

embedding = pcpm.get_isomap(icp.dist_min, neighbors=3, components=5)
embedding.head()
                1         2         3         4         5
R397861 -7.521278  2.663364 -0.522464  4.029337 -1.052588
R211417 -0.180709 -2.385808 -0.892769  0.586591 -1.362598
R114419 -2.792592 -2.784301  1.283677 -0.720942 -0.313043
L467351 -4.249241 -3.364337  1.571591  2.045751  4.365674
R100610 -0.627027  3.599063 -6.467169 -1.535264 -2.066531
# Plot the embedding.
# The diagonal of this plot shows the histograms of the columns of the embedding matrix.
# The off-diagonal scatter-plots represent a sub-space of the embedding, where
# each point corresponds to a subject and its coordinates are taken from two
# different columns of the embedding matrix.
pcpm.plot.isomap_embedding(embedding)

(figure: pairwise scatter plots and histograms of the Isomap embedding)

Clustering


In the search for patterns in the point clouds, we cluster the subjects in the embedding.

The choice of the clustering method is left to the user. Here we give examples with k-means and DBSCAN; a quick way to sanity-check the number of clusters is sketched below.
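
For example, a silhouette-score scan can help choose the number of k-means clusters (our suggestion, not a pcpm feature):

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = embedding.loc[:, [1, 3]].values
scores = {k: silhouette_score(X, KMeans(n_clusters=k, random_state=0).fit_predict(X))
          for k in range(2, 8)}
best_k = max(scores, key=scores.get)   # k with the highest silhouette score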

# K-MEANS ==========================
from sklearn.cluster import KMeans
# select a 2D subspace of the embedding
sub_embedding = embedding.loc[:,[1,3]]
# do the clustering you prefer
k=3
kmeans = KMeans(n_clusters=k, random_state=0).fit(sub_embedding.values)
cluster_centers = kmeans.cluster_centers_
# split the aligned point-clouds in clusters
pcpm.plot.clustering(sub_embedding, labels=kmeans.labels_)

(figure: k-means clustering of the 2D sub-embedding)

# DBSCAN ==========================
from sklearn.cluster import DBSCAN
sub_embedding = embedding.loc[:,[1,3]]
dbscan = DBSCAN(eps=0.48, min_samples=3).fit(sub_embedding.values)
pcpm.plot.clustering(sub_embedding, labels=dbscan.labels_)

(figure: DBSCAN clustering of the 2D sub-embedding)

# Split the point-clouds in clusters
clusters = pcpm.split_pcs_in_clusters(aligned_pcs, embedding=sub_embedding, labels=dbscan.labels_)

n = 2
largest_clusters, other_pcs = pcpm.clusters.get_n_largest_clusters(clusters, n)
cluster_names = list(largest_clusters.keys())
cluster_names

Averages


Now that the clusters are defined, the next step is to calculate the cluster averages.

# calculate the average of the point-clouds within each cluster
averages = pcpm.average_each_cluster(largest_clusters,
                                     sub_embedding,
                                     FWHM=10e4,     # Full Width at Half Maximum for the Gaussian weights
                                     align=False    # if True, align the point-clouds to the central point-cloud of the cluster before averaging
                                    )
av = averages['0']
av.central_name
'R142626'
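
For reference, assuming standard Gaussian weights (the exact convention used inside pcpm is not verified here), the FWHM parameter relates to the Gaussian width sigma as:

import numpy as np

fwhm = 10e4
sigma = fwhm / (2 * np.sqrt(2 * np.log(2)))   # sigma ≈ 0.4247 * FWHM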

Meshes


This section builds meshes of the moving averages. It requires BrainVISA; the meshing helpers are accessed through the dtb module (its import is not shown here).

Meshes of the averages

mesh_params = dict(
    gblur_sigma=0.5,
    threshold=0.5,
    deciReductionRate=99,
    deciMaxError=1.0,
    deciMaxClearance=3.0,
    smoothRate=0.4,
    smoothIt=30
)

meshes = dtb.mesh_of_averages(averages, in_embedding=False, **mesh_params)

# shift the meshes along one axis
# meshes_shifted = {label:dtb.mesh.shift_aims_mesh_along_axis(m, int(label)*30, 1) for label,m in meshes.items()}

meshes_in_embedding = dtb.mesh_of_averages(averages, in_embedding=True, embedding_scale=8, **mesh_params)

(figure: meshes of the cluster averages)

Meshes of the point clouds

# mesh the point-clouds of a cluster
cluster_meshes = dtb.mesh_of_point_clouds(clusters['0'], embedding=sub_embedding, embedding_scale=8)
meshing...: 100%|██████████| 225/225 [00:04<00:00, 50.56it/s]
rebuilding results: 100%|██████████| 225/225 [00:24<00:00,  9.28it/s]
# shift the meshes according to the distance from the center of the cluster
distances = pcpm.clusters.calc_distances_from_central(clusters['0'], sub_embedding)

shifted_meshes = {}
for name,mesh in cluster_meshes.items():
    shifted_meshes[name] = dtb.mesh.shift_aims_mesh_along_axis(mesh, offset=distances[name], axis=1, scale=200)
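
For intuition, the equivalent shift on a plain (n_vertices, 3) vertex array would look like this (dtb operates on AIMS mesh objects; this numpy version is illustrative only):

import numpy as np

def shift_vertices(vertices, offset, axis=1, scale=200):
    """Translate a copy of the vertices by offset*scale along the given axis."""
    shifted = np.array(vertices, dtype=float)
    shifted[:, axis] += offset * scale
    return shifted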

(figure: meshes of the point clouds in cluster '0', shifted along one axis)