Added implementation for KMeans algorithm in GIL #587

Open
wants to merge 8 commits into
base: develop


Sayan-Chaudhuri

This is an implementation of the K-means clustering algorithm for Boost.GIL. The input image filename (absolute or relative) must be passed as a command-line argument. During execution, enter the required number of iterations and the K value, which sets the number of clusters.
The output is stored in a file named output-kmeans.tif.
I have attached the example output of my implementation for the input image frog.jpg (attached), with the number of clusters and iterations set to 10 and 1 respectively.
I would request the community to kindly provide feedback on my implementation.
I would also like to use the implementation of this algorithm for my GIL competency test for GSoC 2021.
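
For readers following the thread, here is a minimal sketch of the K-means iteration (Lloyd's algorithm) that the description refers to. It is not the code submitted in this PR: all names (pixel_t, nearest_centroid, kmeans) are hypothetical, and it works on a plain vector of RGB samples rather than GIL image views so that it stays self-contained. Initial centroids are assumed to be supplied by the caller.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical type for illustration: one RGB sample, not a GIL pixel.
using pixel_t = std::array<double, 3>;

// Squared Euclidean distance between two samples.
double squared_distance(pixel_t const& a, pixel_t const& b)
{
    double sum = 0.0;
    for (std::size_t c = 0; c < a.size(); ++c)
    {
        double const d = a[c] - b[c];
        sum += d * d;
    }
    return sum;
}

// Index of the centroid closest to p.
std::size_t nearest_centroid(pixel_t const& p, std::vector<pixel_t> const& centroids)
{
    std::size_t best = 0;
    double best_dist = std::numeric_limits<double>::max();
    for (std::size_t k = 0; k < centroids.size(); ++k)
    {
        double const d = squared_distance(p, centroids[k]);
        if (d < best_dist)
        {
            best_dist = d;
            best = k;
        }
    }
    return best;
}

// Runs `iterations` rounds of assignment + update, modifying `centroids`
// in place, and returns the cluster label of every sample.
std::vector<std::size_t> kmeans(std::vector<pixel_t> const& pixels,
                                std::vector<pixel_t>& centroids,
                                std::size_t iterations)
{
    assert(!centroids.empty());
    std::vector<std::size_t> labels(pixels.size(), 0);

    auto assign = [&]() {
        for (std::size_t i = 0; i < pixels.size(); ++i)
            labels[i] = nearest_centroid(pixels[i], centroids);
    };

    for (std::size_t it = 0; it < iterations; ++it)
    {
        assign(); // assignment step

        // Update step: move each centroid to the mean of its members.
        std::vector<pixel_t> sums(centroids.size(), pixel_t{});
        std::vector<std::size_t> counts(centroids.size(), 0);
        for (std::size_t i = 0; i < pixels.size(); ++i)
        {
            for (std::size_t c = 0; c < 3; ++c)
                sums[labels[i]][c] += pixels[i][c];
            ++counts[labels[i]];
        }
        for (std::size_t k = 0; k < centroids.size(); ++k)
        {
            if (counts[k] == 0)
                continue; // an empty cluster keeps its previous centroid
            for (std::size_t c = 0; c < 3; ++c)
                centroids[k][c] = sums[k][c] / static_cast<double>(counts[k]);
        }
    }

    assign(); // make the labels consistent with the final centroids
    return labels;
}
```

To produce an output image like the one described above, each pixel would then be replaced by the centroid of its cluster. How the initial centroids are chosen (for example, K randomly picked pixels) is left to the caller in this sketch.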

Description

References

Tasklist

  • Add test case(s)
  • Ensure all CI builds pass
  • Review and approve

@mloskot added the "example" label (Examples of how to use GIL) on Mar 26, 2021
@lpranam
Member

lpranam commented Mar 28, 2021

This algorithm is used everywhere so shouldn't this be introduced as a usable functionality of the library instead of an example?

Member

@lpranam lpranam left a comment

We don't have to provide an output file.

@Sayan-Chaudhuri
Author

I had thought that my implementation would first be checked for correctness, so I submitted it as an example. But if you suggest, I shall turn it into usable functionality and then raise a pull request. I am also removing the output files, following your suggestion.

@lpranam
Member

lpranam commented Mar 28, 2021

First example and then the actual implementation just increases the amount of work reviewers and you will have to do. Let's just review the final thing.

Turned the example file into usable functionality (.hpp) for K-means clustering, as suggested by lpranam, and added it to the image_processing folder of Boost.
@codecov

codecov bot commented Mar 28, 2021

Codecov Report

Merging #587 (ca9e727) into develop (6e91e4b) will increase coverage by 0.13%.
The diff coverage is n/a.

❗ Current head ca9e727 differs from pull request most recent head e823ebd. Consider uploading reports for the commit e823ebd to get more accurate results

@@             Coverage Diff             @@
##           develop     #587      +/-   ##
===========================================
+ Coverage    78.59%   78.72%   +0.13%     
===========================================
  Files          117      118       +1     
  Lines         5003     5034      +31     
===========================================
+ Hits          3932     3963      +31     
  Misses        1071     1071              

@Sayan-Chaudhuri
Author

@lpranam I have made the necessary changes as suggested.

@lpranam
Member

lpranam commented Mar 29, 2021

new functionality also needs tests...

@Sayan-Chaudhuri
Author

@lpranam OK, I shall make the changes soon and post an update here.

@Sayan-Chaudhuri
Author

@lpranam Is it OK if I use a static dataset for the test file? I saw that the test files of other algorithms fix the input and the expected output data, such as the pixel values of images.

@lpranam
Member

lpranam commented Mar 30, 2021

@Sayan-Chaudhuri Yes, as long as it covers the cases, it is okay.

@Sayan-Chaudhuri
Author

@lpranam I wish to upload the test file along with the KMeans implementation but I want to clarify certain things.

  1. I have used a single dataset generated with the make_blobs() function in Python. I have tested my implementation against existing K-means implementations with random centre initialization, such as those in sklearn and OpenCV, and I have used the silhouette scores obtained with those implementations on that dataset as the benchmark. Over a set of 20 runs of the algorithm with different centre initializations, I obtain a silhouette score above 0.8 in 90% of the runs. With the existing implementations in sklearn and OpenCV, the score comes to 0.83 over the same number of runs. Is it OK if I submit my implementation then?
  2. I implemented the entire silhouette score algorithm while testing my implementation. The silhouette score is very common in clustering, and Boost does not have a separate implementation of it. So should I make a separate header file for the silhouette score implementation so that it can be reviewed and later merged into the library if deemed OK? I have already searched and found no API for silhouette score calculation in Boost.

If you can kindly find time to clarify these doubts, I will be highly grateful.
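
For context on the metric discussed in the two points above, here is a minimal sketch of a mean silhouette score using the standard definition s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from sample i to the other members of its own cluster and b(i) is the smallest mean distance from i to the members of any other cluster. This is not the header proposed in this PR: the names point_t, euclidean_distance and silhouette_score are hypothetical, and scoring singleton clusters as 0 is a convention assumed here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical type for illustration: one sample of arbitrary dimension.
using point_t = std::vector<double>;

double euclidean_distance(point_t const& a, point_t const& b)
{
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t c = 0; c < a.size(); ++c)
    {
        double const d = a[c] - b[c];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Mean silhouette score over all samples; labels must lie in [0, cluster_count).
double silhouette_score(std::vector<point_t> const& points,
                        std::vector<std::size_t> const& labels,
                        std::size_t cluster_count)
{
    assert(points.size() == labels.size());
    if (points.empty())
        return 0.0;

    std::vector<std::size_t> sizes(cluster_count, 0);
    for (std::size_t const label : labels)
    {
        assert(label < cluster_count);
        ++sizes[label];
    }

    double total = 0.0;
    for (std::size_t i = 0; i < points.size(); ++i)
    {
        // Sum of distances from sample i to the members of each cluster.
        std::vector<double> dist_sum(cluster_count, 0.0);
        for (std::size_t j = 0; j < points.size(); ++j)
        {
            if (j != i)
                dist_sum[labels[j]] += euclidean_distance(points[i], points[j]);
        }

        std::size_t const own = labels[i];
        if (sizes[own] <= 1)
            continue; // singleton cluster: s(i) = 0 by the assumed convention

        double const a = dist_sum[own] / static_cast<double>(sizes[own] - 1);
        double b = std::numeric_limits<double>::max();
        bool has_other = false;
        for (std::size_t k = 0; k < cluster_count; ++k)
        {
            if (k == own || sizes[k] == 0)
                continue;
            b = std::min(b, dist_sum[k] / static_cast<double>(sizes[k]));
            has_other = true;
        }
        if (!has_other)
            continue; // only one non-empty cluster: contributes 0

        double const denom = std::max(a, b);
        if (denom > 0.0)
            total += (b - a) / denom;
    }
    return total / static_cast<double>(points.size());
}
```

The score lies in [-1, 1], with values near 1 indicating compact, well-separated clusters. This direct form costs O(n²) distance evaluations, which is fine for a small fixed test dataset but expensive for full images.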

@Sayan-Chaudhuri
Author

Upon your clarification, I shall push the files accordingly

@lpranam
Member

lpranam commented Apr 1, 2021

You should upload it. If it does not fit, it can obviously be removed, but we need to have a look at it.

@lpranam
Member

lpranam commented Apr 1, 2021

Also, your PR must pass all the CI checks.

@Sayan-Chaudhuri
Author

Added a new header file for Kmeans

@Sayan-Chaudhuri
Author

@lpranam

@Sayan-Chaudhuri
Author

@lpranam I have also added the test file

@mloskot added the "google-summer-of-code" label (All items related to GSoC activities) on Apr 7, 2021
@meshtag
Member

meshtag commented May 20, 2021

Hi @Sayan-Chaudhuri,
I had a look at this build, which contains the following message:

error: toolset gcc initialization:
error: version '8' requested but 'g++-8' not found and version '7.5.0' of default 'g++' does not match

I don't think this is related to any changes you made in this PR. You should probably update this branch with the latest develop branch of Boost.GIL.
Pushing again after updating should probably solve this issue.

PS: I encountered a similar error here

@Sayan-Chaudhuri
Author

@meshtag how to do so what you have mentioned?

@meshtag
Member

meshtag commented May 23, 2021

There are a couple of ways to do this; you can look here to understand some common ones.

@mloskot
Member

mloskot commented May 23, 2021

@Sayan-Chaudhuri

how to do so what you have mentioned?

Please read the section of CONTRIBUTING.md on updating your PR.
Updating or syncing a PR on GitHub is a common operation, so it is very well documented on the web, in the GitHub docs, and on StackOverflow.

Not to mention the lazy button-based way recently offered by GitHub:
https://github.blog/changelog/2021-05-06-sync-an-out-of-date-branch-of-a-fork-from-the-web/

@mloskot added the "status/need-feedback" (Asking for more details about the problem) and "status/work-in-progress" (Do NOT merge yet until this label has been removed!) labels on Jun 3, 2022
4 participants