This library can be used to compare entities described as sets of 3D point clouds (like cortical sulci or fiber bundles).
The analysis starts with the calculation of a geometric distance between the N point clouds (one per subject), resulting in an N-by-N distance matrix. Multidimensional scaling via Isomap is then applied to infer the relationships that link entities sharing common features. The Isomap algorithm estimates the intrinsic geometry of the data manifold from a rough estimate of each data point's neighbors; geodesic distances between the subjects are then recomputed on the inferred manifold. The top n eigenvectors of the geodesic distance matrix give the coordinates of the subjects in a new n-dimensional Euclidean space.

The point clouds are then studied in this new n-dimensional space. Nevertheless, direct observation of all the point clouds is not simple, especially when the number of subjects is large. To produce a synthetic description of the analysis result, a set of weighted averages is calculated along a chosen axis (see figure 1) or on given regions of the newly defined n-dimensional space, possibly specified by a clustering algorithm.
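The pipeline above (distance matrix, then Isomap embedding) can be sketched with scikit-learn's `Isomap` on a precomputed distance matrix. This is only an illustration of the idea, not pcpm's actual implementation: the centroid distance below is a crude stand-in for the ICP distance that pcpm computes.

```python
import numpy as np
from sklearn.manifold import Isomap

# toy data: 10 random point clouds (50 points each, in 3D)
rng = np.random.default_rng(0)
clouds = [rng.normal(size=(50, 3)) + i for i in range(10)]

# a rough pairwise distance between clouds: here the Euclidean distance
# between centroids (a stand-in for ICP, just to show the matrix shape)
centroids = np.array([c.mean(axis=0) for c in clouds])
D = np.linalg.norm(centroids[:, None] - centroids[None, :], axis=-1)

# Isomap on the precomputed N x N distance matrix yields coordinates
# in a new low-dimensional Euclidean space
iso = Isomap(n_neighbors=3, n_components=2, metric="precomputed")
coords = iso.fit_transform(D)
print(coords.shape)  # (10, 2)
```

Each row of `coords` is the position of one subject in the new 2D space.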
Clone this repository somewhere you will keep it, then install it from the root folder with pip3:

```shell
$ cd point-cloud-pattern-mining
$ pip3 install -e .
```

The `-e` option causes the install to link the module to the cloned folder, instead of installing it in the default module location.
```python
import pcpm

# The point clouds need to be stored in a dictionary.
# This example uses 1765 point clouds representing the central sulcus
# of the brain of different human subjects.
len(pcs)
# 1765

# One point cloud is an array of 3D points
pcs['L295146'].shape
# (2382, 3)
```
The first step consists in calculating a pair-wise distance between the point clouds, resulting in an NxN distance matrix. This can be done, for example, via ICP (Iterative Closest Point). For this purpose it is possible to use the `pcpm_icp` command-line tool, which is installed with this package.
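The idea behind an ICP-based distance can be sketched as follows. This is a toy implementation for illustration only (the `pcpm_icp` tool is the supported way to compute the matrix): it alternates nearest-neighbour matching with the Kabsch solution for the optimal rotation, and returns the residual mean nearest-point distance.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_distance(src, dst, iterations=20):
    """Toy ICP sketch: rigidly align src to dst, then return the mean
    nearest-point distance (illustration only, not pcpm's implementation)."""
    cur = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    tree = cKDTree(dst_c)
    for _ in range(iterations):
        _, idx = tree.query(cur)               # nearest neighbours in dst
        matched = dst_c[idx]
        mu_c, mu_m = cur.mean(axis=0), matched.mean(axis=0)
        H = (cur - mu_c).T @ (matched - mu_m)  # cross-covariance matrix
        U, _, Vt = np.linalg.svd(H)            # Kabsch: optimal rotation
        if np.linalg.det(Vt.T @ U.T) < 0:      # avoid reflections
            Vt[-1] *= -1
        R = Vt.T @ U.T
        cur = (cur - mu_c) @ R.T + mu_m        # apply rotation + translation
    return tree.query(cur)[0].mean()

# a cloud compared with a rotated copy of itself gives a near-zero distance
rng = np.random.default_rng(0)
a = rng.normal(size=(200, 3))
theta = 0.2
Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0, 0, 1]])
d = icp_distance(a @ Rz.T, a)
```

Running this pairwise over all subjects would fill the NxN distance matrix.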
```python
# We have calculated the ICP on a subset of 800 subjects with the pcpm_icp
# command-line tool. The results can be loaded with the following command:
icp = pcpm.load_icp_result("Data/output/CSSyl_sulci_bigpieceRedo_data_distances.csv")

# select only the corresponding subset of the point clouds
names = icp.names  # the subject names
pcs = pcpm.subset_of_pcs(pcs, names)
len(pcs)
# 800

# To each subject is associated a vector of coordinates, in a space called
# 'embedding', corresponding to its distances from all the others.
embedding = icp.dist_min  # use the ICP minimum-distance matrix as embedding for the point clouds
embedding.head()
```
| | R397861 | R211417 | R114419 | L467351 | R100610 | ... |
|---|---|---|---|---|---|---|
| R397861 | 0.000000 | 5.664300 | 5.737526 | 5.210066 | 7.683856 | ... |
| R211417 | 5.664300 | 0.000000 | 3.242321 | 4.050629 | 4.090161 | ... |
| R114419 | 5.737526 | 3.242321 | 0.000000 | 4.365764 | 7.590033 | ... |
| L467351 | 5.210066 | 4.050629 | 4.365764 | 0.000000 | 7.923952 | ... |
| R100610 | 7.683856 | 4.090161 | 7.590033 | 7.923952 | 0.000000 | ... |

5 rows × 800 columns
The point clouds can be aligned to a particular subject. In this case, ICP is used to calculate and apply the rotations and translations that minimize the distance of every point cloud to the reference.
```python
# the central subject is the point cloud which is closest to all the others in the embedding
central_subject = pcpm.find_central_pcs_name(icp.dist_min)
print("Center subject:", central_subject)
# Center subject: R192540

# the return values are dictionaries, with the same keys as pcs
aligned_pcs, aligned_rot, aligned_transl = pcpm.align_pcs(pcs, central_subject)
# >>> INFO pcpm.transform - using 45 cores out of 48
# Aligning point-clouds to R192540: 100%|██████████| 800/800 [02:48<00:00,  4.76it/s]
```
The next step is a dimensionality reduction of the embedding by Isomap.
```python
embedding = pcpm.get_isomap(icp.dist_min, neighbors=3, components=5)
embedding.head()
```
| | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| R397861 | -7.521278 | 2.663364 | -0.522464 | 4.029337 | -1.052588 |
| R211417 | -0.180709 | -2.385808 | -0.892769 | 0.586591 | -1.362598 |
| R114419 | -2.792592 | -2.784301 | 1.283677 | -0.720942 | -0.313043 |
| L467351 | -4.249241 | -3.364337 | 1.571591 | 2.045751 | 4.365674 |
| R100610 | -0.627027 | 3.599063 | -6.467169 | -1.535264 | -2.066531 |
```python
# Plot the embedding.
# The diagonal of this plot shows the histograms of the columns of the embedding matrix.
# The off-diagonal scatter plots represent sub-spaces of the embedding, where each point
# corresponds to a subject and its coordinates are taken from two different columns
# of the embedding matrix.
pcpm.plot.isomap_embedding(embedding)
```
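This kind of pair plot (histograms on the diagonal, pairwise scatter plots elsewhere) can also be reproduced with pandas' `scatter_matrix`; the random DataFrame below is only a stand-in for the real embedding.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
from pandas.plotting import scatter_matrix

# random stand-in for the 5-component embedding DataFrame
emb = pd.DataFrame(np.random.default_rng(0).normal(size=(100, 5)),
                   columns=[1, 2, 3, 4, 5])

# histograms on the diagonal, pairwise scatter plots off-diagonal
axes = scatter_matrix(emb, diagonal="hist")
print(axes.shape)  # (5, 5)
```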
In the search for patterns in the point clouds, we cluster the subjects in the embedding. The choice of the clustering method is left to the user. Here we give examples with k-means and DBSCAN.
```python
# K-MEANS ==========================
from sklearn.cluster import KMeans

# select a 2D subspace of the embedding
sub_embedding = embedding.loc[:, [1, 3]]

# do the clustering you prefer
k = 3
kmeans = KMeans(n_clusters=k, random_state=0).fit(sub_embedding.values)
cluster_centers = kmeans.cluster_centers_

# plot the clustered embedding
pcpm.plot.clustering(sub_embedding, labels=kmeans.labels_)
```
```python
# DBSCAN ==========================
from sklearn.cluster import DBSCAN

sub_embedding = embedding.loc[:, [1, 3]]
dbscan = DBSCAN(eps=0.48, min_samples=3).fit(sub_embedding.values)
pcpm.plot.clustering(sub_embedding, labels=dbscan.labels_)
```
```python
# Split the point clouds in clusters
clusters = pcpm.split_pcs_in_clusters(aligned_pcs, embedding=sub_embedding, labels=dbscan.labels_)

# keep the n largest clusters
n = 2
larges_clusters, other_pcs = pcpm.clusters.get_n_largest_clusters(clusters, n)
cluster_names = list(larges_clusters.keys())
cluster_names
```
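Selecting the n largest clusters presumably amounts to sorting the cluster dictionary by size. A plain-Python sketch of that behavior (`get_n_largest_clusters` may differ in details, e.g. in how the remaining subjects are returned):

```python
# Hypothetical illustration of selecting the n largest clusters from a
# {label: point_clouds} dictionary; not pcpm's actual implementation.
def n_largest(clusters, n):
    ordered = sorted(clusters.items(), key=lambda kv: len(kv[1]), reverse=True)
    largest = dict(ordered[:n])              # the n biggest clusters
    rest = dict(ordered[n:])                 # everything else
    return largest, rest

toy = {'0': list(range(120)), '1': list(range(40)), '2': list(range(300))}
largest, rest = n_largest(toy, 2)
print(sorted(largest))  # ['0', '2']
```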
Now that the clusters are defined, the next step consists in calculating the cluster averages.
```python
# calculate the average of the point clouds within each cluster
averages = pcpm.average_each_cluster(
    larges_clusters,
    sub_embedding,
    FWHM=10e4,   # Full Width at Half Maximum for the Gaussian weights
    align=False  # if True, align the point clouds to the central point cloud in the cluster before averaging
)
av = averages['0']
av.central_name
# 'R142626'
```
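The FWHM parameter controls the Gaussian weighting of the average. A minimal sketch of how an FWHM converts to Gaussian weights (the exact weighting scheme inside `pcpm.average_each_cluster` may differ):

```python
import numpy as np

def gaussian_weights(positions, center, fwhm):
    # convert the FWHM to the standard deviation of a Gaussian
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    w = np.exp(-0.5 * ((positions - center) / sigma) ** 2)
    return w / w.sum()   # normalize so the weights sum to 1

# toy example: 5 subjects at increasing distances along one embedding axis
pos = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
w = gaussian_weights(pos, center=0.0, fwhm=2.0)
print(w.round(3))
```

By construction, a subject at half the FWHM from the center gets half the (unnormalized) weight of the central subject, so a very large FWHM (like `10e4` above) makes the average nearly uniform.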
Meshes of the moving averages can then be built. This part requires BrainVISA.
```python
mesh_params = dict(
    gblur_sigma=0.5,
    threshold=0.5,
    deciReductionRate=99,
    deciMaxError=1.0,
    deciMaxClearance=3.0,
    smoothRate=0.4,
    smoothIt=30
)

meshes = dtb.mesh_of_averages(averages, in_embedding=False, **mesh_params)

# shift the meshes on one axis
# meshes_shifted = {label: dtb.mesh.shift_aims_mesh_along_axis(m, int(label)*30, 1) for label, m in meshes.items()}

meshes_in_embedding = dtb.mesh_of_averages(averages, in_embedding=True, embedding_scale=8, **mesh_params)

# mesh the point clouds in a cluster
cluster_meshes = dtb.mesh_of_point_clouds(clusters['0'], embedding=sub_embedding, embedding_scale=8)
# meshing...: 100%|██████████| 225/225 [00:04<00:00, 50.56it/s]
# rebuilding results: 100%|██████████| 225/225 [00:24<00:00,  9.28it/s]
```
```python
# shift the meshes according to the distance from the center of the cluster
distances = pcpm.clusters.calc_distances_from_central(clusters['0'], sub_embedding)
shifted_meshes = {}
for name, mesh in cluster_meshes.items():
    shifted_meshes[name] = dtb.mesh.shift_aims_mesh_along_axis(mesh, offset=distances[name], axis=1, scale=200)
```
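The shift itself is just a translation of mesh vertices along one axis. On a plain `(N, 3)` vertex array it would look like the following (a stand-in for `dtb.mesh.shift_aims_mesh_along_axis`, which operates on AIMS mesh objects):

```python
import numpy as np

def shift_vertices_along_axis(vertices, offset, axis=1, scale=1.0):
    """Translate an (N, 3) vertex array along one axis by offset*scale
    (illustration of the idea only; the real function works on AIMS meshes)."""
    out = np.asarray(vertices, dtype=float).copy()
    out[:, axis] += offset * scale
    return out

verts = np.zeros((4, 3))
shifted = shift_vertices_along_axis(verts, offset=0.5, axis=1, scale=200)
print(shifted[0])  # [  0. 100.   0.]
```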