Google Summer of Code
Jump straight to ideas below.
Orange is an open-source, cross-platform, component-based data mining and machine learning software suite. It features a friendly yet powerful and flexible visual programming front-end for exploratory data analysis, visualization, model construction, evaluation, and forecasting. It includes a comprehensive set of components (we call them widgets) for data preprocessing, feature scoring and filtering, modeling, model evaluation, and exploration. It is maintained and developed at the Bioinformatics Laboratory of the Faculty of Computer and Information Science, University of Ljubljana, Slovenia.
Website: http://orange.biolab.si/
Wikipedia: https://en.wikipedia.org/wiki/Orange_(software)
Since Orange is mostly written in Python 3, the ideal student will have strong skills in idiomatic Python 🐍 and NumPy with at least two years of experience. They would not be uncomfortable reading technical and scientific articles. 🎓 They would also possess some knowledge of Git and GitHub because this is how we roll. Some understanding of how GUI widget toolkits behave and work (such as Qt in particular) is a strong plus.
The best candidates will understand the foundations of the Unix Philosophy and will chant its koans regularly before bedtime. The best candidates use a POSIX system because everyone knows you can't develop on Windos.
The Google Summer of Code selection process is quite competitive. Accepted students typically have thoroughly researched the technologies of their proposed project and have been in frequent contact with potential mentors.
Here's the recipe to win:
- Follow the DOs and DON'Ts for students.
- Read the relevant sections in the Student Manual.
- Import the GSoC timeline (iCal) into your Google Calendar (or your other preferred calendar software).
If you're set to win with Orange:
- Brush up on your Git skills.
- Download and try Orange. Click around. Get a feel for what Orange can (and can't) do. Install add-ons (Options → Add-ons). Follow the tutorials on YouTube, on our blog, and in the documentation.
- Join us on gitter chat. Talk about what you want to do. If we see you're willing to get involved, we are sure to review your proposal more favorably.
- Review the projects that were contributed through GSoC in the previous years.
- Fill in the application forms.
If you intend to come visit us in person, we share lunch (pizzas or Indian food) on Fridays!
- Student Application Form (ours) – you need to fill this in, and you may edit it multiple times before the deadline (25 March).
- Student Application Form (Google's) – you need to sign up here as well to be in the system. Also submit a copy of your proposal here.
- GSoC 2016 Timeline – Don't miss the important dates. You should import the iCal file into your calendar.
- GSoC Student Manual
- GSoC General Resources
Open-source development includes open and transparent communication. To get in touch with us, please use one of the preferred means of communication:
- gitter chat for saying hi and for discussing your GSoC application, proposed ideas, implementation details, and so on,
- Orange issue tracker for queries that look like issues (bugs, legitimate feature requests),
- pull-requests section for issues that include patches 👍 that fix them.
Listed in no particular order, sometimes vague and incomplete, are some ideas for projects that might be interesting to carry out during this year's Google Summer of Code program.
If you'd like to discuss a particular implementation of any of these ideas, or if you have questions regarding your own idea, please join us on gitter chat and don't be shy about asking questions there. 😃
To be clear, your own ideas that complement Orange are most welcome!
Orange can currently connect to PostgreSQL and MSSQL databases. Support for alternative databases could be provided via a (well-maintained) third-party Python library that supports connections to ODBC data sources. Besides a working backend, the end result should include automated tests that run on Travis and/or AppVeyor, documentation of the feature, and an installation guide. Any external packages used should be installable on all supported platforms (Windows, macOS, Linux) via pip and conda.
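To make the idea a little more concrete, here is a minimal sketch of what talking to an ODBC data source from Python might look like. It assumes pyodbc as the third-party library, which is only one candidate and not a decision made by the project. The helper names, the driver name in the usage note, and the set of connection-string keywords are illustrative assumptions; real drivers differ in which keywords they accept.

```python
def odbc_connection_string(driver, server, database, user=None, password=None):
    """Build an ODBC connection string from its parts.

    Hypothetical helper: DRIVER/SERVER/DATABASE/UID/PWD are common keywords,
    but each ODBC driver documents its own set.
    """
    parts = {"DRIVER": "{" + driver + "}", "SERVER": server, "DATABASE": database}
    if user is not None:
        parts["UID"] = user
        parts["PWD"] = password or ""
    return ";".join(f"{key}={value}" for key, value in parts.items())


def connect(connection_string):
    """Open a connection through pyodbc (assumed to be installed)."""
    # Imported lazily so the string helper above works without the dependency.
    import pyodbc
    return pyodbc.connect(connection_string)
```

For example, `odbc_connection_string("Some ODBC Driver", "localhost", "test")` gives `DRIVER={Some ODBC Driver};SERVER=localhost;DATABASE=test`, which could then be passed to `connect`. A real backend would also have to deal with SQL dialect differences between databases, which this sketch ignores.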
Intensity: moderate
Extent: moderate
Involves: Python, SQL, ODBC
Mentors: @astaric
Constructing new features is a crucial task in data mining. There is a widget for this in Orange (Feature Constructor), but it has some unnecessary limitations and, most of all, is not very intuitive or easy to use. The need for better solutions has already led to the introduction of the Create Class widget, which is limited to a more specific case but handles it much better. However, a good general Feature Constructor is still needed.
The widget should be redone / extensively improved, with a focus on:
- the best user experience possible
- high efficiency and effectiveness
- compromises between ease of use and advanced features (somehow hidden at first?)
- very good documentation and in-widget help (probably with some examples etc)
We would like to see a good first proposal with ideas and suggestions, but expect a lot of coordination about design decisions with other developers after that.
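As a conversation starter only, the core of feature construction can be sketched as evaluating a user-supplied expression once per data row. The function name, the row format (plain dictionaries), and the whitelist of helper functions below are all assumptions made for illustration; this is not how the existing widget is implemented, and `eval` with an emptied `__builtins__` is not a real security sandbox.

```python
import math

# Names an expression may use besides column names (an assumption of this sketch).
ALLOWED = {"sqrt": math.sqrt, "log": math.log, "abs": abs, "min": min, "max": max}


def construct_feature(rows, expression):
    """Evaluate `expression` once per row, binding column names to row values.

    Column names must be valid Python identifiers for this to work.
    """
    code = compile(expression, "<feature>", "eval")
    namespace = {"__builtins__": {}, **ALLOWED}
    return [eval(code, namespace, dict(row)) for row in rows]


rows = [
    {"sepal_length": 5.1, "sepal_width": 3.5},
    {"sepal_length": 6.0, "sepal_width": 3.0},
]
ratios = construct_feature(rows, "sepal_length / sepal_width")
```

Most of the actual project lies in everything around such a core: how expressions are entered, validated, explained, and previewed in the widget.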
Intensity: moderate
Extent: limited
Involves: Python, Qt
Mentors: @janezd, @lanzagar
To make Orange a complete data analysis package, we wish to introduce simple statistics widgets. The new add-on would include t-tests (one-sample, independent, and paired-samples t-tests, plus their Bayesian counterparts), ANOVA (plus Bayesian ANOVA), Pearson's r, correlations, normalization, etc. We also wish to extend the Box Plot widget to output basic statistics (mean, median, variance, confidence intervals, standard deviation) to a Data Table.
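The basic statistics mentioned for the Box Plot output can be sketched with nothing but the Python standard library. The function name and the returned dictionary layout are assumptions for illustration, and the confidence interval uses a normal approximation, which is a simplification: a real implementation would more likely use the t-distribution, especially for small samples.

```python
from statistics import NormalDist, mean, median, stdev, variance


def basic_statistics(values, confidence=0.95):
    """Return summary statistics for a sequence of numbers.

    The interval is a normal-approximation confidence interval for the mean
    (an assumption of this sketch, not a decision about the widget).
    """
    n = len(values)
    m = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sd / n ** 0.5
    return {
        "mean": m,
        "median": median(values),
        "variance": variance(values),  # sample variance
        "std": sd,
        "ci": (m - half_width, m + half_width),
    }


stats = basic_statistics([2, 4, 4, 4, 5, 5, 7, 9])
```

In the widget, each key would correspond to a column of the output Data Table, with one row per box in the plot.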
Intensity: moderate
Extent: extensive
Involves: Python, Qt
Mentors: @janezd, @ajdapretnar, @kernc
Orange has had a long history. There was a period when most of the Orange core was custom C++ code, with glue code and widgets in Python 2 and PyQwt. It eventually proved hard to manage, so we migrated to the current basic stack: Python 3, PyQtGraph, NumPy.
We haven't yet managed to port all the widgets (there's so much to do!), and we would really like to see the following widgets from Orange 2 ported: Interaction Graph, SOM (self-organizing maps) with viewer, Reliability (classifier reliability estimation), Ensemble, and multiple visualization widgets.
Any of those widgets can be a proposal in itself!
Intensity: hard
Extent: extensive
Involves: Orange 2, Cython
Mentors: @janezd, @BlazZupan
GNU/Linux is one of our three target platforms. While the inferior platforms enjoy prebuilt executable packages, GNU/Linux users are left to themselves. This is all fine and dandy, as GNU/Linux users usually find their way around pretty easily, but it would still be convenient if users could simply dpkg that deb or apt-get orange onto their 'buntu boxen.
The only way to ever become a Debian maintainer and positively affect millions is to start.
Intensity: easy
Extent: limited
Involves: git-buildpackage, Open Build Service
Mentors: @kernc
Imagine analyzing some data in Orange. You build a model that you are satisfied with, but you just want to make some final adjustments to your model in code. Unfortunately, there is currently no way to export the widgets' functionality and the OWS workflow you built into the underlying Python code (think of IPython Notebook's `--to script`).
It would be useful if we could transparently export (linearize) Orange workflows into raw Python code that one could further edit and run as a Python script.
Note, the mentors have no idea how this could be done. They just wish it was.
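One possible starting point, offered purely as an assumption and not as a plan, is to treat the workflow as a dependency graph and emit one code snippet per widget in topological order. In the sketch below, the per-widget code templates, the link format, and the function names inside the templates are all hypothetical; the hard part of the project (deriving real code from each widget's settings and signals) is not addressed. It relies on `graphlib`, which is in the standard library from Python 3.9 onward.

```python
from graphlib import TopologicalSorter


def linearize(templates, links):
    """Return code lines for a workflow, ordered so inputs come before use.

    templates: dict mapping a node name to a line of code (hypothetical format)
    links: iterable of (source, sink) pairs describing the workflow's links
    """
    sorter = TopologicalSorter({name: set() for name in templates})
    for source, sink in links:
        sorter.add(sink, source)  # the sink depends on the source
    # static_order() raises graphlib.CycleError if the links form a cycle.
    return [templates[name] for name in sorter.static_order()]


# Deliberately listed out of order to show that the links decide the sequence.
templates = {
    "model": "model = fit(clean)",
    "file": "data = read_data()",
    "preprocess": "clean = preprocess(data)",
}
links = [("file", "preprocess"), ("preprocess", "model")]
script = "\n".join(linearize(templates, links))
```

Here `script` holds the three lines in the order file, preprocess, model. Workflows with several independent branches have more than one valid ordering, so a real exporter would need a rule for choosing a stable one.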
Intensity: hard
Extent: broad
Involves: Python
Mentors: @kernc