Feat: data clustering workflow #94

d-schindler · 2024-04-04T14:31:40Z

I've implemented a DataClustering class that constructs a graph from data and then applies PyGenStability. The implementation follows sklearn conventions. An example with synthetic data sampled from two multiscale circles is provided.
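For readers unfamiliar with the pattern, here is a minimal sketch of the graph-construction step written in the sklearn `fit` convention. It is not the PR's implementation: the class name `KNNGraphSketch`, the helper `sample_two_circles`, the choice of a symmetrised k-nearest-neighbour graph, and the use of two concentric circles as stand-in data are all assumptions made for illustration. The PyGenStability step that follows graph construction is omitted.

```python
import math
import random


def sample_two_circles(n_per_circle=50, radii=(1.0, 3.0), noise=0.05, seed=0):
    """Sample noisy 2D points from two concentric circles (illustrative data only)."""
    rng = random.Random(seed)
    points = []
    for radius in radii:
        for _ in range(n_per_circle):
            angle = rng.uniform(0.0, 2.0 * math.pi)
            r = radius + rng.gauss(0.0, noise)
            points.append((r * math.cos(angle), r * math.sin(angle)))
    return points


class KNNGraphSketch:
    """Hypothetical sklearn-style estimator that only builds a kNN graph."""

    def __init__(self, n_neighbors=5):
        self.n_neighbors = n_neighbors

    def fit(self, X):
        """Build a symmetric, unweighted kNN adjacency matrix from the points X."""
        n = len(X)
        adjacency = [[0.0] * n for _ in range(n)]
        for i in range(n):
            # Rank all other points by Euclidean distance to point i.
            neighbours = sorted(
                (j for j in range(n) if j != i),
                key=lambda j: math.dist(X[i], X[j]),
            )
            for j in neighbours[: self.n_neighbors]:
                # Symmetrise so that the resulting graph is undirected.
                adjacency[i][j] = adjacency[j][i] = 1.0
        # Trailing underscore marks a fitted attribute, as in sklearn.
        self.adjacency_ = adjacency
        return self


X = sample_two_circles()
model = KNNGraphSketch(n_neighbors=5).fit(X)
```

Returning `self` from `fit` and storing the result in a trailing-underscore attribute mirrors the sklearn estimator conventions mentioned above; the adjacency matrix is the kind of object a graph clustering step would then consume.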

Proper docstrings are missing because I wanted to wait for your first review, @arnaudon. I look forward to your feedback.

d-schindler · 2024-04-04T14:39:14Z

Later we will also have to update the readme, the documentation, and the tests.

arnaudon

You can also run tox -e format and tox -e lint to check whether the style is correct. I'll try to run the code later today.

src/pygenstability/data_clustering.py

d-schindler · 2024-04-17T09:49:28Z

I'm working on the revised code and will commit later. I use black for formatting. Which style is required to pass the tests?

arnaudon · 2024-04-17T11:39:44Z

You need tox -e format and tox -e lint. The first improves the formatting; the second reports the errors it could not fix. It has black inside and a bit more, you'll see.

d-schindler · 2024-04-17T16:35:02Z

I addressed all your requests and will look into tox later. Will we also have to write tests for this part of the software?

src/pygenstability/data_clustering.py

arnaudon · 2024-04-18T07:40:06Z

> I addressed all your requests and will look into tox later. Will we also have to write tests for this part of the software?

No worries, I can run tox when I try the code. For the tests, we could write some to improve coverage, yes. What is your plan? Are you OK to keep using this branch, or would you prefer to merge it ASAP to share it more easily? We can polish it a bit more, merge, and open a new PR if there is something to fix later.

d-schindler · 2024-04-18T12:19:05Z

I think we can work a bit longer on this branch and polish it, write tests, update documentation etc. No need to merge ASAP.

d-schindler · 2024-04-19T15:33:54Z

I've added plot_sankey as a method and started to update the readme. Will work on improving the documentation later.

@arnaudon, could you help with the tests?

d-schindler · 2024-04-24T06:41:30Z

The open tasks are now improving the docstrings and documentation, and writing tests. Once this is done, could we merge this branch, assign a new version number, and release the updated online documentation?

d-schindler · 2024-04-26T10:34:16Z

The documentation is ready. Only the tests are left now.

d-schindler · 2024-04-26T13:32:38Z

I accidentally changed the formatting of pygenstability.py when I improved a small bit of the docstring.

implemented data clustering

98126cd

d-schindler added the enhancement (New feature or request) label Apr 4, 2024

d-schindler requested review from arnaudon and mauriciobarahona April 4, 2024 14:31

d-schindler added 2 commits April 16, 2024 18:12

add scale selection method

ebbf04d

add precomputed option

12df4b7

arnaudon requested changes Apr 17, 2024

first improvements

4e0e050

second improvement

9ec9f12

arnaudon reviewed Apr 18, 2024

src/pygenstability/data_clustering.py (outdated, resolved)

arnaudon reviewed Apr 18, 2024

src/pygenstability/data_clustering.py (outdated, resolved)

arnaudon reviewed Apr 18, 2024

src/pygenstability/data_clustering.py (resolved)

d-schindler added 3 commits April 18, 2024 14:30

more improvements

a830445

add plot_sankey method

8595f2e

update readme

c48914b

d-schindler added 2 commits April 22, 2024 17:39

syntax highlighting

5807ea6

formatting

41ab488

d-schindler requested a review from arnaudon April 24, 2024 06:38

d-schindler added 2 commits April 26, 2024 12:32

write proper documentation

266d361

improve documentation

e09c18c

arnaudon added 5 commits April 29, 2024 11:36

lint

ed8f0d4

add tests

1a31152

fix

a7f4e11

fix doc

db89e51

added test files

442c28d

arnaudon approved these changes Apr 29, 2024

arnaudon added 2 commits April 29, 2024 14:02

missing files

ae30894

fix

719052f

arnaudon changed the title from "data clustering with PyGenStability" to "Feat: data clustering workflow" Apr 29, 2024

remove codecov

4461268

arnaudon merged commit 88b8122 into master Apr 29, 2024
2 checks passed

arnaudon deleted the data-clustering branch April 29, 2024 12:33
