Update citation reference
nunofachada committed May 30, 2020
1 parent f69fe12 commit 7f97812
Showing 2 changed files with 33 additions and 14 deletions.
23 changes: 19 additions & 4 deletions README.md
@@ -1,5 +1,6 @@
[![Latest release](https://img.shields.io/github/release/fakenmc/generateData.svg)](https://github.com/fakenmc/generateData/releases)
[![MIT Licence](https://img.shields.io/badge/license-MIT-yellowgreen.svg)](https://opensource.org/licenses/MIT/)
+[![View Generate Data for Clustering on File Exchange](https://www.mathworks.com/matlabcentral/images/matlab-file-exchange.svg)](https://www.mathworks.com/matlabcentral/fileexchange/37435-generate-data-for-clustering)

# generateData

@@ -100,14 +101,28 @@ rand("state", 123);
randn("state", 123);
```

+## Previous behaviors and reproducibility of results
+
+Before [v2.0.0](https://github.com/fakenmc/generateData/tree/v2.0.0), lines
+supporting clusters were parameterized with slopes instead of angles. We found
+this caused difficulties when choosing line orientation, thus the change to
+angles, which are much easier to work with.
+Version [v1.3.0](https://github.com/fakenmc/generateData/tree/v1.3.0) still
+uses slopes, for those who prefer this behavior.
+
+For reproducing results in studies published before May 2020, use version
+[v1.2.0](https://github.com/fakenmc/generateData/tree/v1.2.0) instead.
+Subsequent versions were optimized in a way that changed the order in which
+the required random values are generated, thus producing slightly different
+results.
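
The reproducibility note above can be illustrated with a minimal sketch. It is not code from generateData: the variable names and the two "orderings" are hypothetical and only show why, with an identical seed, changing the order of random draws changes which value each quantity receives. The seeding call follows the `randn("state", 123)` form used in the README for Octave; in current MATLAB releases `rng(123)` is the usual alternative.

```matlab
% Hypothetical illustration, not taken from generateData.m.
n = 3;  % number of made-up clusters

% Ordering A: draw the angle perturbations first, then the lengths.
randn('state', 123);
anglesA  = randn(n, 1);
lengthsA = randn(n, 1);

% Ordering B: same seed, but the lengths are drawn first.
randn('state', 123);
lengthsB = randn(n, 1);
anglesB  = randn(n, 1);

% The random stream is the same in both cases; only the assignment of
% values to quantities differs.
disp(isequal(anglesA, lengthsB));  % first draw after seeding is identical
disp(isequal(anglesA, anglesB));   % the "angles" themselves differ
```

Pinning a tagged version, as the README suggests, sidesteps the issue because the draw order is then fixed along with the seed.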

## Reference

If you use this function in your work, please cite the following reference:

-- Fachada, N., Figueiredo, M.A.T., Lopes, V.V., Martins, R.C., Rosa,
-A.C., [Spectrometric differentiation of yeast strains using minimum volume
-increase and minimum direction change clustering criteria](http://www.sciencedirect.com/science/article/pii/S0167865514000889),
-Pattern Recognition Letters, vol. 45, pp. 55-61 (2014), doi: http://dx.doi.org/10.1016/j.patrec.2014.03.008
+- Fachada, N., & Rosa, A. C. (2020).
+[generateData—A 2D data generator](https://doi.org/10.1016/j.simpa.2020.100017).
+Software Impacts, 4:100017. doi: [10.1016/j.simpa.2020.100017](https://doi.org/10.1016/j.simpa.2020.100017)

## License

24 changes: 14 additions & 10 deletions generateData.m
@@ -11,11 +11,11 @@
totalPoints, ...
varargin ...
)
-% GENERATEDATA Generates 2D data for clustering. Data is created along
+% GENERATEDATA Generates 2D data for clustering. Data is created along
% straight lines, which can be more or less parallel
% depending on the angleStd parameter.
%
-% [data clustPoints idx centers angles lengths] =
+% [data clustPoints idx centers angles lengths] =
% GENERATEDATA(angleMean, angleStd, numClusts, xClustAvgSep, ...
% yClustAvgSep, lengthMean, lengthStd, lateralStd, ...
% totalPoints, ...)
@@ -31,7 +31,7 @@
% Line lengths are drawn from the folded normal
% distribution.
% lengthStd - Standard deviation of line lengths.
-% lateralStd - Cluster "fatness", i.e., the standard deviation of the
+% lateralStd - Cluster "fatness", i.e., the standard deviation of the
% distance from each point to its projection on the
% line. The way this distance is obtained is controlled by
% the optional 'pointOffset' parameter.
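
The help text in the hunk above states that line lengths are drawn from the folded normal distribution. As a minimal sketch of what that means (the parameter values are assumed for illustration, and this is not necessarily how generateData.m implements it), a folded normal sample is the absolute value of a normal sample:

```matlab
% Assumed example parameters, not taken from the commit.
numClusts  = 5;
lengthMean = 5;
lengthStd  = 1;

% Folded normal: take the absolute value of normally distributed values,
% so every sampled length is non-negative.
lengths = abs(lengthMean + lengthStd * randn(numClusts, 1));

disp(lengths');
```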
@@ -64,18 +64,18 @@
% of each point.
% centers - Matrix (numClusts x 2) containing centers from where
% clusters were generated.
-% angles - Vector (numClusts x 1) containing the effective angles
+% angles - Vector (numClusts x 1) containing the effective angles
% of the lines used to generate clusters.
-% lengths - Vector (numClusts x 1) containing the effective lengths
+% lengths - Vector (numClusts x 1) containing the effective lengths
% of the lines used to generate clusters.
%
% ----------------------------------------------------------
% Usage example:
%
% [data cp idx] = GENERATEDATA(pi / 2, pi / 8, 5, 15, 15, 5, 1, 2, 200);
%
-% This creates 5 clusters with a total of 200 points, with a mean angle
-% of pi/2 (std=pi/8), separated in average by 15 units in both x and y
+% This creates 5 clusters with a total of 200 points, with a mean angle
+% of pi/2 (std=pi/8), separated in average by 15 units in both x and y
% directions, with mean length of 5 units (std=1) and a "fatness" or
% spread of 2 units.
%
@@ -84,8 +84,12 @@
% scatter(data(:, 1), data(:, 2), 8, idx);

% Copyright (c) 2012-2020 Nuno Fachada
-% Distributed under the MIT License (See accompanying file LICENSE or copy
+% Distributed under the MIT License (See accompanying file LICENSE or copy
% at http://opensource.org/licenses/MIT)
+%
+% Reference:
+% Fachada, N., & Rosa, A. C. (2020). generateData—A 2D data generator.
+% Software Impacts, 4:100017. doi: 10.1016/j.simpa.2020.100017

% Known distributions for sampling points along lines
pointDists = {'unif', 'norm'};
@@ -225,7 +229,7 @@
% each point
perpAngles = angles(i) + sign(points_dist) * pi / 2;
perpVecs = [cos(perpAngles) sin(perpAngles)];

% Set vector magnitudes
perpVecs = abs(points_dist) .* perpVecs;
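
The perpendicular-offset step in the hunk above can be tried in isolation. The following is a minimal sketch rather than the function's full logic: `lineAngle`, `projs`, and the sample values are assumptions made for illustration, and only the three statements computing `perpAngles` and `perpVecs` mirror the lines shown in the diff.

```matlab
% Illustrative inputs (assumed values, not taken from generateData.m).
lineAngle   = pi / 4;             % direction of the cluster-supporting line
points_dist = [1.5; -0.5; 2.0];   % signed lateral distance of each point
projs       = [0 0; 1 1; 2 2];    % projections of the points on the line

% Rotate the line direction by +pi/2 or -pi/2 depending on the sign of the
% distance, so each point lands on the matching side of the line.
perpAngles = lineAngle + sign(points_dist) * pi / 2;
perpVecs   = [cos(perpAngles) sin(perpAngles)];

% Scale the unit vectors by the absolute distances. This relies on implicit
% expansion (MATLAB R2016b or later, or GNU Octave broadcasting).
perpVecs = abs(points_dist) .* perpVecs;

% Final point coordinates: projection plus perpendicular offset.
points = projs + perpVecs;

% Sanity checks: offsets are orthogonal to the line direction and have the
% requested magnitudes.
lineDir = [cos(lineAngle); sin(lineAngle)];
assert(all(abs(perpVecs * lineDir) < 1e-12));
assert(all(abs(sqrt(sum(perpVecs .^ 2, 2)) - abs(points_dist)) < 1e-12));
```

Taking the sign into the angle and the magnitude into the scale factor keeps the two concerns separate, which is one plausible reason for the form used in the source.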

@@ -253,4 +257,4 @@

% Update idx
idx(cumSumPoints(i) + 1 : cumSumPoints(i + 1)) = i;
-end;
+end;
