Commit
0 parents
commit 39b0796
Showing 4 changed files with 254 additions and 0 deletions.
@@ -0,0 +1,13 @@
*~
.~*#
.nfs*
*.mat
*.fig
*.aux
*.log
*.blg
*.out
*.gz
*.ods
*.eps
@@ -0,0 +1,66 @@
# User Manual

### Summary

A Matlab/Octave script which generates 2D data for clustering; data is
created along straight lines, which can be more or less parallel
depending on the selected input parameters.
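
How parallel the lines are is governed by *slopeStd*. The function draws each cluster's slope as the base *slope* plus normally distributed noise scaled by *slopeStd*, so `slopeStd = 0` yields perfectly parallel lines and larger values spread the directions. The standalone sketch below mirrors that one step with illustrative parameter values (not defaults of `generateData`) and converts the slopes to angles for comparison.

```matlab
% Illustrative values (assumed for this sketch)
slope = 1;        % base slope shared by all lines
numClusts = 5;    % number of lines/clusters

% slopeStd = 0: every cluster gets exactly the base slope (parallel lines)
slopesParallel = slope + 0 * randn(numClusts, 1);

% slopeStd = 0.5: each slope is perturbed by N(0, 0.5^2) noise
slopesSpread = slope + 0.5 * randn(numClusts, 1);

% Convert slopes to angles in degrees to compare line directions
anglesParallel = atan(slopesParallel) * 180 / pi;
anglesSpread = atan(slopesSpread) * 180 / pi;
disp([anglesParallel anglesSpread]);
```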

### Synopsis

    [data, clustPoints, idx, centers, slopes, lengths] =
        generateData(slope, slopeStd, numClusts, xClustAvgSep,
        yClustAvgSep, lengthAvg, lengthStd, lateralStd,
        totalPoints)

### Input parameters

Parameter      | Description
-------------- | -----------
*slope*        | Base direction of the lines on which clusters are based
*slopeStd*     | Standard deviation of the slope; used to obtain a random slope variation from the normal distribution, which is added to the base slope in order to obtain the final slope of each cluster
*numClusts*    | Number of clusters (and therefore of lines) to generate
*xClustAvgSep* | Average separation of line centers along the X axis
*yClustAvgSep* | Average separation of line centers along the Y axis
*lengthAvg*    | The base length of lines on which clusters are based
*lengthStd*    | Standard deviation of line length; used to obtain a random length variation from the normal distribution, which is added to the base length in order to obtain the final length of each line
*lateralStd*   | "Cluster fatness", i.e., the standard deviation of the distance from each point to the respective line, in both *x* and *y* directions; this distance is obtained from the normal distribution
*totalPoints*  | Total points in generated data (will be randomly divided among clusters)
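
In `generateData`, the random division of *totalPoints* works by normalizing the absolute values of normal samples into proportions, rounding, and then adjusting counts one point at a time until the total matches and no cluster is empty. The sketch below is a condensed, standalone version of that procedure with illustrative values; its loops are a simplification for reading, not a drop-in replacement for the original code.

```matlab
numClusts = 5;      % illustrative value
totalPoints = 200;  % illustrative value (must be >= numClusts)

% Random proportions: absolute normal samples normalized to sum to 1
clustPoints = abs(randn(numClusts, 1));
clustPoints = clustPoints / sum(clustPoints);
clustPoints = round(clustPoints * totalPoints);

% Rounding can leave the total off by a few points; fix one point at a time
while sum(clustPoints) < totalPoints
    [~, k] = min(clustPoints);          % add to the smallest cluster
    clustPoints(k) = clustPoints(k) + 1;
end
while sum(clustPoints) > totalPoints
    [~, k] = max(clustPoints);          % remove from the largest cluster
    clustPoints(k) = clustPoints(k) - 1;
end

% Guarantee every cluster has at least one point
for e = find(clustPoints == 0)'
    [~, k] = max(clustPoints);          % borrow a point from the largest
    clustPoints(k) = clustPoints(k) - 1;
    clustPoints(e) = 1;
end
disp(clustPoints');
```

Borrowing from the largest cluster is always safe here: if some of the *numClusts* clusters is empty while the total is at least *numClusts*, the largest cluster must hold at least two points.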

### Return values

Value         | Description
------------- | -----------
*data*        | Matrix (*totalPoints* x *2*) with the generated data
*clustPoints* | Vector (*numClusts* x *1*) containing the number of points in each cluster
*idx*         | Vector (*totalPoints* x *1*) containing the cluster index of each point
*centers*     | Matrix (*numClusts* x *2*) containing the centers from which the clusters were generated
*slopes*      | Vector (*numClusts* x *1*) containing the effective slopes used to generate the clusters
*lengths*     | Vector (*numClusts* x *1*) containing the effective lengths used to generate the clusters
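
Because the function fills *data* cluster by cluster, *idx* consists of contiguous runs of labels whose lengths are given by *clustPoints*. The sketch below uses a hypothetical, hand-picked *clustPoints* to rebuild *idx* the same way and checks that counting the labels with `accumarray` recovers *clustPoints*.

```matlab
clustPoints = [3; 1; 2];            % hypothetical output: points per cluster
numClusts = numel(clustPoints);
totalPoints = sum(clustPoints);

% Points are stored cluster by cluster, so idx is a run of 1s, then 2s, ...
idx = zeros(totalPoints, 1);
first = 1;
for c = 1:numClusts
    idx(first : first + clustPoints(c) - 1) = c;
    first = first + clustPoints(c);
end
% idx is now [1 1 1 2 3 3]'

% Counting how often each label occurs recovers clustPoints
counts = accumarray(idx, 1);
```

The same check, `isequal(accumarray(idx, 1), cp)`, can be applied to the outputs of an actual `generateData` call.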

### Usage example

    [data cp idx] = generateData(1, 0.5, 5, 15, 15, 5, 1, 2, 200);

The previous command creates 5 clusters with a total of 200 points, with
a base slope of 1 (*std*=0.5), separated on average by 15 units in both
*x* and *y* directions, with an average length of 5 units (*std*=1) and a
"fatness" or spread of 2 units.
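
One informal way to read "separated on average by 15 units": in `generateData`, each center coordinate is drawn uniformly over an interval of width *xClustAvgSep* x *numClusts* (respectively *yClustAvgSep* x *numClusts*) centered on zero. With the parameters above that interval is 15 x 5 = 75 units wide, i.e. [-37.5, 37.5], and 75 units shared among 5 clusters gives 15 units per cluster. The sketch below reproduces only this center-drawing step; the interpretation of the average spacing is this sketch's, not a guarantee stated by the function.

```matlab
xClustAvgSep = 15;   % values from the usage example above
yClustAvgSep = 15;
numClusts = 5;

% Centers are uniform over an interval of width xClustAvgSep * numClusts
% (here 15 * 5 = 75 units), centered on zero
xCenters = xClustAvgSep * numClusts * (rand(numClusts, 1) - 0.5);
yCenters = yClustAvgSep * numClusts * (rand(numClusts, 1) - 0.5);

halfWidth = xClustAvgSep * numClusts / 2;   % 37.5
fprintf('x centers lie in [%g, %g]\n', -halfWidth, halfWidth);

% Spreading 75 units over 5 clusters gives 15 units per cluster on average
avgSpacing = (2 * halfWidth) / numClusts;   % 15
```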

To take a quick look at the clusters just do:

    scatter(data(:,1), data(:,2), 8, idx);

### Reference

If you use this script in your work, please use the following reference:

- Fachada, N., Figueiredo, M.A.T., Lopes, V.V., Martins, R.C., Rosa,
  A.C., [Spectrometric differentiation of yeast strains using minimum volume
  increase and minimum direction change clustering criteria](http://www.sciencedirect.com/science/article/pii/S0167865514000889),
  Pattern Recognition Letters, vol. 45, pp. 55-61 (2014), doi: http://dx.doi.org/10.1016/j.patrec.2014.03.008

### License

This script is made available under the [Simplified BSD License](license.txt).
@@ -0,0 +1,151 @@
function [data, clustPoints, idx, centers, slopes, lengths] = ...
    generateData( ...
        slope, ...
        slopeStd, ...
        numClusts, ...
        xClustAvgSep, ...
        yClustAvgSep, ...
        lengthAvg, ...
        lengthStd, ...
        lateralStd, ...
        totalPoints ...
    )
% GENERATEDATA Generates 2D data for clustering; data is created along
%              straight lines, which can be more or less parallel depending
%              on the slopeStd argument.
%
% [data clustPoints idx centers slopes lengths] =
%    GENERATEDATA(slope, slopeStd, numClusts, xClustAvgSep, yClustAvgSep, ...
%                 lengthAvg, lengthStd, lateralStd, totalPoints)
%
% Inputs:
%   slope        - Base direction of the lines on which clusters are based.
%   slopeStd     - Standard deviation of the slope; used to obtain a random
%                  slope variation from the normal distribution, which is
%                  added to the base slope in order to obtain the final
%                  slope of each cluster.
%   numClusts    - Number of clusters (and therefore of lines) to generate.
%   xClustAvgSep - Average separation of line centers along the X axis.
%   yClustAvgSep - Average separation of line centers along the Y axis.
%   lengthAvg    - The base length of lines on which clusters are based.
%   lengthStd    - Standard deviation of line length; used to obtain a
%                  random length variation from the normal distribution,
%                  which is added to the base length in order to obtain the
%                  final length of each line.
%   lateralStd   - "Cluster fatness", i.e., the standard deviation of the
%                  distance from each point to the respective line, in both
%                  x and y directions; this distance is obtained from the
%                  normal distribution.
%   totalPoints  - Total points in generated data (will be randomly divided
%                  among clusters).
%
% Outputs:
%   data        - Matrix (totalPoints x 2) with the generated data
%   clustPoints - Vector (numClusts x 1) containing the number of points in
%                 each cluster
%   idx         - Vector (totalPoints x 1) containing the cluster index of
%                 each point
%   centers     - Matrix (numClusts x 2) containing the centers from which
%                 the clusters were generated
%   slopes      - Vector (numClusts x 1) containing the effective slopes
%                 used to generate the clusters
%   lengths     - Vector (numClusts x 1) containing the effective lengths
%                 used to generate the clusters
%
% ----------------------------------------------------------
% Usage example:
%
%    [data cp idx] = GENERATEDATA(1, 0.5, 5, 15, 15, 5, 1, 2, 200);
%
% This creates 5 clusters with a total of 200 points, with a base slope
% of 1 (std=0.5), separated on average by 15 units in both x and y
% directions, with an average length of 5 units (std=1) and a "fatness"
% or spread of 2 units.
%
% To take a quick look at the clusters just do:
%
%    scatter(data(:,1), data(:,2), 8, idx);

% N. Fachada
% Instituto Superior Técnico, Lisboa, Portugal

% Make sure totalPoints >= numClusts
if totalPoints < numClusts
    error('Number of points must be equal to or larger than the number of clusters.');
end;

% Determine number of points in each cluster
clustPoints = abs(randn(numClusts, 1));
clustPoints = clustPoints / sum(clustPoints);
clustPoints = round(clustPoints * totalPoints);

% Make sure totalPoints is respected
while sum(clustPoints) < totalPoints
    % If a point is missing, add it to the smallest cluster
    [C,I] = min(clustPoints);
    clustPoints(I(1)) = C + 1;
end;
while sum(clustPoints) > totalPoints
    % If there is an extra point, remove it from the largest cluster
    [C,I] = max(clustPoints);
    clustPoints(I(1)) = C - 1;
end;

% Make sure there are no empty clusters
emptyClusts = find(clustPoints == 0);
if ~isempty(emptyClusts)
    % If there are empty clusters...
    numEmptyClusts = size(emptyClusts, 1);
    for i=1:numEmptyClusts
        % ...get a point from the largest cluster and assign it to the
        % empty cluster
        [C,I] = max(clustPoints);
        clustPoints(I(1)) = C - 1;
        clustPoints(emptyClusts(i)) = 1;
    end;
end;

% Initialize data matrix
data = zeros(sum(clustPoints), 2);

% Initialize idx (vector containing the cluster index of each point)
idx = zeros(totalPoints, 1);

% Initialize lengths vector
lengths = zeros(numClusts, 1);

% Determine cluster centers
xCenters = xClustAvgSep * numClusts * (rand(numClusts, 1) - 0.5);
yCenters = yClustAvgSep * numClusts * (rand(numClusts, 1) - 0.5);
centers = [xCenters yCenters];

% Determine cluster slopes
slopes = slope + slopeStd * randn(numClusts, 1);

% Create clusters
for i=1:numClusts
    % Determine the length of the line on which this cluster will be based
    lengths(i) = abs(lengthAvg + lengthStd*randn);
    % Determine how many points have been assigned to previous clusters
    sumClustPoints = 0;
    if i > 1
        sumClustPoints = sum(clustPoints(1:(i - 1)));
    end;
    % Create points for this cluster
    for j=1:clustPoints(i)
        % Determine where on the line the next point will be projected
        position = lengths(i) * rand - lengths(i) / 2;
        % Determine x coordinate of point projection
        delta_x = cos(atan(slopes(i))) * position;
        % Determine y coordinate of point projection
        delta_y = delta_x * slopes(i);
        % Add a random lateral offset from the line in the x coordinate
        delta_x = delta_x + lateralStd * randn;
        % Add a random lateral offset from the line in the y coordinate
        delta_y = delta_y + lateralStd * randn;
        % Determine the actual point
        data(sumClustPoints + j, :) = [(xCenters(i) + delta_x) (yCenters(i) + delta_y)];
    end;
    % Update idx
    idx(sumClustPoints + 1 : sumClustPoints + clustPoints(i)) = i;
end;
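
The inner loop above places each point at a uniformly random signed distance `position` along a segment of the cluster's length, centered on the cluster center. Since `cos(atan(m))` equals `1/sqrt(1 + m^2)`, the projected offset `[delta_x delta_y]` has length `|position|`, so points cover the full line length whatever the slope. The standalone sketch below replays that loop for one cluster with hypothetical parameters and `lateralStd = 0`, then checks that every point lies on the line and no farther than half the length from the center.

```matlab
% Hypothetical parameters for a single cluster (assumed for this sketch)
center = [2 -1];       % line center (x, y)
m = 0.75;              % effective slope of this cluster
L = 10;                % effective line length
lateralStd = 0;        % no lateral noise, so points fall exactly on the line
n = 50;                % number of points

pts = zeros(n, 2);
for j = 1:n
    position = L * rand - L / 2;     % signed distance along the line
    dx = cos(atan(m)) * position;    % x component of that distance
    dy = dx * m;                     % y component follows from the slope
    dx = dx + lateralStd * randn;    % lateral noise in x (zero here)
    dy = dy + lateralStd * randn;    % lateral noise in y (zero here)
    pts(j, :) = center + [dx dy];
end

% Every point satisfies y - yc = m * (x - xc) up to rounding error...
resid = (pts(:, 2) - center(2)) - m * (pts(:, 1) - center(1));
% ...and lies at most L/2 from the center
offsets = bsxfun(@minus, pts, center);
dist = sqrt(sum(offsets .^ 2, 2));
```

With a nonzero `lateralStd`, the same loop scatters points around the line with independent normal noise in *x* and *y*, which is what the *lateralStd* ("cluster fatness") parameter controls.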
@@ -0,0 +1,24 @@
Copyright (c) 2012, Nuno Fachada
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in
      the documentation and/or other materials provided with the distribution

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.