Point-Cloud-Pattern-Mining (PCPM)


This library can be used to compare entities described as sets of 3D point clouds (such as cortical sulci or fiber bundles).

The analysis starts with the calculation of a geometrical distance between the N point clouds (aka subjects), resulting in an N-by-N distance matrix. Multidimensional scaling via Isomap is then applied to infer the relationships that link entities sharing common features. The Isomap algorithm estimates the intrinsic geometry of the data manifold from a rough estimate of each data point's neighbors, then recalculates the geodesic distances among the subjects on the inferred manifold. The top n eigenvectors of the geodesic distance matrix give the coordinates in the new n-dimensional Euclidean space.
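
As an illustration, the same two steps (distance matrix, then Isomap) can be reproduced with scikit-learn. This is a minimal sketch on random stand-in data, not the library's own implementation:

import numpy as np
from sklearn.manifold import Isomap

# stand-in data: N random 3D points, one "subject" per point
rng = np.random.default_rng(0)
points = rng.normal(size=(100, 3))

# N-by-N matrix of pairwise Euclidean distances
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

# Isomap on the precomputed distance matrix yields n-dimensional coordinates
iso = Isomap(n_neighbors=3, n_components=5, metric="precomputed")
coords = iso.fit_transform(dist)   # shape (100, 5)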

The point clouds are then studied in this new n-dimensional space. However, direct observation of all the point clouds is impractical, especially when the number of subjects is large. To obtain a synthetic description of the result, a set of weighted averages is calculated along a chosen axis (see figure 1) or on given regions of the new n-dimensional space, possibly specified by a clustering algorithm; a sketch of such a weighting scheme follows.
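
To make the weighting concrete, here is a minimal sketch (a hypothetical helper, not the library's implementation) of a Gaussian-weighted average along one embedding axis, accumulating the clouds into a common voxel grid:

import numpy as np

def weighted_average_density(pcs, axis_coord, center, sigma, grid_shape=(64, 64, 64)):
    """Accumulate point clouds into a voxel grid; each subject is weighted by a
    Gaussian of its coordinate on the chosen embedding axis around `center`.
    Assumes the point coordinates already lie in the voxel grid."""
    density = np.zeros(grid_shape)
    for name, points in pcs.items():
        weight = np.exp(-0.5 * ((axis_coord[name] - center) / sigma) ** 2)
        voxels = np.clip(points.astype(int), 0, np.array(grid_shape) - 1)
        np.add.at(density, (voxels[:, 0], voxels[:, 1], voxels[:, 2]), weight)
    return density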

Install

Clone this repo somewhere you intend to keep it, then install with pip3 from the root folder:

$ cd point-cloud-pattern-mining
$ pip3 install -e .

The -e option links the installed module to the cloned folder instead of copying it to the default module location.

Example

import pcpm
# The point clouds must be stored in a dictionary mapping subject names to arrays.
# This example uses 1765 point-clouds representing the Central Sulcus
# of the brain of different human subjects.
len(pcs)
1765
# One point-cloud is an array of 3D points
pcs['L295146'].shape
(2382, 3)
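
The origin of pcs is not shown above; the library simply expects a dict mapping subject names to (n_points, 3) arrays. For example, assuming (hypothetically) one NumPy file per subject:

from pathlib import Path
import numpy as np

# build {subject_name: (n_points, 3) array} from one .npy file per subject
pcs = {path.stem: np.load(path) for path in Path("Data/point_clouds").glob("*.npy")}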

(figure: example point-clouds)

Distance calculation


The first step consists of calculating a pair-wise distance between the point-clouds, resulting in an N-by-N distance matrix.

This can be done, for example, via ICP (Iterative Closest Point).

For this purpose, the pcpm_icp command-line tool, installed with this package, can be used.
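
For intuition, here is a much simpler (non-ICP) point-cloud distance: the symmetric mean nearest-neighbour distance. This is not what pcpm_icp computes (ICP additionally optimises a rigid transform before measuring), so this sketch is illustrative only:

import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(a, b):
    """Symmetric mean nearest-neighbour distance between two (n, 3) arrays."""
    d_ab = cKDTree(b).query(a)[0].mean()   # each point of a to its closest in b
    d_ba = cKDTree(a).query(b)[0].mean()   # each point of b to its closest in a
    return 0.5 * (d_ab + d_ba)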

# We have calculated the ICP on a subset of 800 subjects with the pcpm_icp command line tool
# The results can be loaded with the following command
icp = pcpm.load_icp_result("Data/output/CSSyl_sulci_bigpieceRedo_data_distances.csv")

# select only the subset of the point clouds present in the ICP results
names = icp.names # the subjects names
pcs = pcpm.subset_of_pcs(pcs, names)
len(pcs)
800
# Each subject is associated with a vector of coordinates in a space called the
# 'embedding', here given by its distances from all the other subjects.
embedding = icp.dist_min     # use the ICP minimum distance matrix as embedding for the point clouds
embedding.head()
           R397861   R211417   R114419   L467351   R100610  ...   L594156
R397861   0.000000  5.664300  5.737526  5.210066  7.683856  ...  8.577278
R211417   5.664300  0.000000  3.242321  4.050629  4.090161  ...  4.157319
R114419   5.737526  3.242321  0.000000  4.365764  7.590033  ...  3.889136
L467351   5.210066  4.050629  4.365764  0.000000  7.923952  ...  8.746338
R100610   7.683856  4.090161  7.590033  7.923952  0.000000  ...  7.164085

5 rows × 800 columns


Alignment of the point-clouds


The point-clouds can be aligned to a particular subject. In this case, ICP is used to calculate and apply the rotations and translations that minimize the distance of every point-cloud to the reference.
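
Conceptually, each ICP iteration alternates nearest-neighbour matching with a least-squares rigid fit. The rigid-fit step (the Kabsch algorithm) is sketched below for already-matched point pairs; pcpm.align_pcs wraps the full iterative procedure:

import numpy as np

def kabsch(src, dst):
    """Rotation R and translation t minimising ||(src @ R.T + t) - dst||
    for matched (n, 3) point arrays src and dst."""
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_mean).T @ (dst - dst_mean)      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t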

# the central subject is the point-cloud which is closest to all the others in the embedding
central_subject = pcpm.find_central_pcs_name(icp.dist_min)
print("Center subject:", central_subject)
Center subject: R192540
# the return values are dictionaries with the same keys as pcs
aligned_pcs, aligned_rot, aligned_transl = pcpm.align_pcs(pcs, central_subject)
>>> INFO pcpm.transform - using 45 cores out of 48
Aligning point-clouds to R192540: 100%|██████████| 800/800 [02:48<00:00,  4.76it/s]

Dimensionality Reduction by ISOMAP


The following step is a dimensionality reduction of the embedding by ISOMAP.

embedding = pcpm.get_isomap(icp.dist_min, neighbors=3, components=5)
embedding.head()
                1         2         3         4         5
R397861 -7.521278  2.663364 -0.522464  4.029337 -1.052588
R211417 -0.180709 -2.385808 -0.892769  0.586591 -1.362598
R114419 -2.792592 -2.784301  1.283677 -0.720942 -0.313043
L467351 -4.249241 -3.364337  1.571591  2.045751  4.365674
R100610 -0.627027  3.599063 -6.467169 -1.535264 -2.066531
# Plot the embedding.
# The diagonal of this plot shows the histograms of the columns of the embedding matrix.
# The off-diagonal scatter-plots represent a sub-space of the embedding, where
# each point corresponds to a subject and its coordinates are taken from two
# different columns of the embedding matrix.
pcpm.plot.isomap_embedding(embedding)

(figure: pairwise scatter plots and histograms of the Isomap embedding)

Clustering


In the search for patterns in the point clouds, we cluster the subjects in the embedding.

The choice of the clustering method is left to the user. Here we give examples with k-means and DBSCAN; a quick way to sanity-check the number of clusters is sketched below.
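
For example, a silhouette-score scan can help choose the number of k-means clusters (our suggestion, not a pcpm feature):

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = embedding.loc[:, [1, 3]].values
scores = {k: silhouette_score(X, KMeans(n_clusters=k, random_state=0).fit_predict(X))
          for k in range(2, 8)}
best_k = max(scores, key=scores.get)   # k with the highest silhouette score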

# K-MEANS ==========================
from sklearn.cluster import KMeans
# select a 2D subspace of the embedding
sub_embedding = embedding.loc[:,[1,3]]
# do the clustering you prefer
k=3
kmeans = KMeans(n_clusters=k, random_state=0).fit(sub_embedding.values)
cluster_centers = kmeans.cluster_centers_
# split the aligned point-clouds in clusters
pcpm.plot.clustering(sub_embedding, labels=kmeans.labels_)

(figure: k-means clustering of the 2D sub-embedding)

# DBSCAN ==========================
from sklearn.cluster import DBSCAN
sub_embedding = embedding.loc[:,[1,3]]
dbscan = DBSCAN(eps=0.48, min_samples=3).fit(sub_embedding.values)
pcpm.plot.clustering(sub_embedding, labels=dbscan.labels_)

(figure: DBSCAN clustering of the 2D sub-embedding)

# Split the point-clouds in clusters
clusters = pcpm.split_pcs_in_clusters(aligned_pcs, embedding=sub_embedding, labels=dbscan.labels_)

n = 2
largest_clusters, other_pcs = pcpm.clusters.get_n_largest_clusters(clusters, n)
cluster_names = list(largest_clusters.keys())
cluster_names

Averages


Now that the clusters are defined, the next step is to calculate the cluster averages.

# calculate the average of the point-clouds within each cluster
averages = pcpm.average_each_cluster(largest_clusters,
                                     sub_embedding,
                                     FWHM=10e4,     # Full Width at Half Maximum for the Gaussian weights
                                     align=False    # if True, align the point-clouds to the central point-cloud of the cluster before averaging
                                    )
av = averages['0']
av.central_name
'R142626'
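
For reference, assuming standard Gaussian weights (the exact convention used inside pcpm is not verified here), the FWHM parameter relates to the Gaussian width sigma as:

import numpy as np

fwhm = 10e4
sigma = fwhm / (2 * np.sqrt(2 * np.log(2)))   # sigma ≈ 0.4247 * FWHM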

Meshes


This section builds meshes of the moving averages. It requires BrainVISA; the meshing helpers are accessed through the dtb module (its import is not shown here).

Meshes of the averages

mesh_params = dict(
    gblur_sigma=0.5,
    threshold=0.5,
    deciReductionRate=99,
    deciMaxError=1.0,
    deciMaxClearance=3.0,
    smoothRate=0.4,
    smoothIt=30
)

meshes = dtb.mesh_of_averages(averages, in_embedding=False, **mesh_params)

# shift the meshes along one axis
# meshes_shifted = {label:dtb.mesh.shift_aims_mesh_along_axis(m, int(label)*30, 1) for label,m in meshes.items()}

meshes_in_embedding = dtb.mesh_of_averages(averages, in_embedding=True, embedding_scale=8, **mesh_params)

(figure: meshes of the cluster averages)

Meshes of the point clouds

# mesh the point-clouds of a cluster
cluster_meshes = dtb.mesh_of_point_clouds(clusters['0'], embedding=sub_embedding, embedding_scale=8)
meshing...: 100%|██████████| 225/225 [00:04<00:00, 50.56it/s]
rebuilding results: 100%|██████████| 225/225 [00:24<00:00,  9.28it/s]
# shift the meshes according to the distance from the center of the cluster
distances = pcpm.clusters.calc_distances_from_central(clusters['0'], sub_embedding)

shifted_meshes = {}
for name,mesh in cluster_meshes.items():
    shifted_meshes[name] = dtb.mesh.shift_aims_mesh_along_axis(mesh, offset=distances[name], axis=1, scale=200)
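
For intuition, the equivalent shift on a plain (n_vertices, 3) vertex array would look like this (dtb operates on AIMS mesh objects; this numpy version is illustrative only):

import numpy as np

def shift_vertices(vertices, offset, axis=1, scale=200):
    """Translate a copy of the vertices by offset*scale along the given axis."""
    shifted = np.array(vertices, dtype=float)
    shifted[:, axis] += offset * scale
    return shifted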

(figure: meshes of the point clouds in cluster '0', shifted along one axis)