Google Summer of Code

kernc edited this page Aug 24, 2016 · 8 revisions

Jump straight to ideas below.

Orange - Data mining fruitful & fun!

About Orange

Orange is an open-source, cross-platform, component-based data mining and machine learning software suite. It features a friendly yet powerful and flexible visual programming front-end for exploratory data analysis, visualization, model construction, evaluation, and forecasting. It includes a comprehensive set of components (we call them widgets) for data preprocessing, feature scoring and filtering, modeling, model evaluation, and exploration. It is maintained and developed at the Bioinformatics Laboratory of the Faculty of Computer and Information Science, University of Ljubljana, Slovenia.

Website: http://orange.biolab.si/
Wikipedia: https://en.wikipedia.org/wiki/Orange_(software)

Orange workflow screenshot

Ideal student candidate

Since Orange is mostly written in Python 3, the ideal student will have strong skills in idiomatic Python 🐍 and NumPy, with at least two years of experience. They should be comfortable reading technical and scientific articles. 🎓 They should also have some knowledge of Git and GitHub because this is how we roll. :octocat: Some understanding of how GUI widget toolkits behave and work (Qt in particular) is a strong plus.

The best candidates will understand the foundations of the Unix Philosophy and will chant its koans regularly before bedtime. The best candidates use a POSIX system because everyone knows you can't develop on Windows.

The Google Summer of Code selection process is quite competitive. Accepted students typically have thoroughly researched the technologies of their proposed project and have been in frequent contact with potential mentors.

How to start

Here's the recipe to win:

If you're set to win with Orange:

  • Brush up on your Git skills.
  • Download and try Orange. (If you're on GNU/Linux, please find partly outdated installation instructions somewhere in the Wiki.) Click around. Get a feel for what Orange can (and can't) do. Install add-ons (Options → Add-ons). Follow the tutorials on our blog and in documentation.
  • Join the mailing list. Talk about what you want to do. If we see you're willing to get involved, we are sure to review your proposal more favorably.
  • Fill out the application forms.

If you intend to come visit us in person, we share lunch (pizzas or Indian food) on Fridays!

GSoC application resources

Contact

Open-source development includes open and transparent communication. To get in contact with us, please use one of the preferred means of communication:

  • the mailing list to discuss your GSoC application, proposed ideas, idea implementation details, ...,
  • Orange issue tracker for queries that look like issues (bugs, legitimate feature requests),
  • pull-requests section for issues that include patches 👍 that fix them,
  • (please join) #orange-dev IRC channel on Freenode for low-latency discussions and almost instant feedback (provided we're around),
  • email info at biolab.si for private queries,
  • contact the mentors directly (only if all else fails).

Project ideas

Listed in no particular order, sometimes vague and incomplete, are some ideas for projects that might be interesting to carry out during this year's Google Summer of Code program.

If you'd like to discuss a particular implementation of any of these ideas, or if you have questions regarding your own idea, please join our mailing list and don't be shy about asking questions there. 😃

To be clear, your own ideas that complement Orange are most welcome!

Widgets in separate threads

Widgets in Orange currently run in a single thread. As widgets are conceptually mostly independent, given their inputs, they could frequently appear more responsive by working in parallel (in a real data-flow manner). The objective of this task is to modify Orange so that each widget could run in its own thread or process. The problem is exacerbated by the requirement of Qt GUI widgets (and everything that touches them) to run in the main application thread. The task requires one to navigate around this limitation with as little change as possible to existing widgets' codebase.
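The constraint described above (computation off the main thread, GUI objects touched only on it) can be illustrated with a toolkit-agnostic sketch. It uses only the Python standard library, and all names in it (`submit`, `drain`, the `results` mailbox) are hypothetical; this is not Orange's widget API, and a Qt-based solution would more likely rely on signals and slots or a timer in the main event loop.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

# Hypothetical names throughout; this is not Orange's actual widget API.
results = queue.Queue()                      # thread-safe mailbox, read only by the main thread
executor = ThreadPoolExecutor(max_workers=4)


def submit(widget_name, compute, *inputs):
    """Run compute(*inputs) in a worker thread and post the outcome to the mailbox."""
    def job():
        results.put((widget_name, compute(*inputs)))
    return executor.submit(job)


def drain(on_result):
    """Call from the main (GUI) thread, e.g. periodically, to apply finished results."""
    handled = 0
    while True:
        try:
            name, value = results.get_nowait()
        except queue.Empty:
            return handled
        on_result(name, value)               # the only place GUI objects would be touched
        handled += 1
```

In this arrangement the worker threads never call into the GUI; they only hand data over, which is one way existing widget code could stay largely unchanged.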

Intensity: hard
Extent: limited
Severity: wishlist
Involves: Qt, threading
Mentors: @kernc, @ales-erjavec

Modernize/clean/unify widget GUI construction API

Most Orange widgets rely on the Orange/widgets/gui.py module, which, to be honest, shows its age and the multitude of developers that worked on it. The result is that widgets' windows don't always look uniform, some widgets' windows' widgets don't behave as convenience would dictate, some important and helpful widgets are unavailable, the API of gui.py functions is not uniform, ...

The task envisions a much nicer gui2.py that still plays reasonably well with existing widgets' code and also accounts for the Widgets in separate threads task.

Intensity: moderate
Extent: extensive
Severity: important
Involves: Orange widgets, Qt, software architecture design
Mentors: @kernc

Orange Add-on: Internet of Things

The recent availability of single-board computers allows for collection of large amounts of sensor and streaming data. With a proper tool for analysis of such data, there is huge potential for extracting relevant information for further use.

This task includes porting Orange to Raspberry Pi and developing widgets for reading data from Raspberry Pi sensors. Furthermore, this task also includes collecting streaming data over the network to support reading data from other devices on the network or from an online source such as a weather station.

Intensity: hard
Extent: extensive
Severity: wishlist
Involves: Python, RPi
Mentors: @acopar

Orange Add-on: Statistics

In a desire to make Orange a complete data analysis software, we wish to introduce simple statistics widgets to Orange. The new add-on would include t-tests (one-sample t-test, independent t-test, paired samples t-test + Bayesian counterparts), ANOVA (+ Bayesian ANOVA), Pearson's r, correlations, normalization, etc. We also wish to extend the Box Plot widget to output basic statistics (mean, median, variance, confidence intervals, standard deviation) to a Data Table.
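As a rough illustration of the basic statistics listed above, here is a minimal sketch using only the Python standard library. The function name `basic_statistics` and the output keys are hypothetical, and the confidence interval uses a normal approximation rather than the t distribution an actual widget would probably want for small samples.

```python
import math
import statistics


def basic_statistics(values, confidence=0.95):
    """Summarize one numeric column; the interval is a normal approximation."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.stdev(values)           # sample standard deviation (n - 1)
    # Two-sided critical value of the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * std / math.sqrt(n)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "variance": statistics.variance(values),
        "std": std,
        "ci": (mean - half_width, mean + half_width),
    }
```

Each key would correspond to one column of the output table, with one row per variable or per group shown in the plot.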

Intensity: moderate
Extent: extensive
Severity: wishlist
Involves: Python, Qt
Mentors: @janezd, @ajdapretnar, @kernc

Porting Orange 2 widgets to Orange 3

Orange has had a long history. There was a period when most of the Orange core was custom C++ code, with glue code and widgets in Python 2 and PyQwt. It eventually proved hard to manage, so we migrated to the current basic stack: Python 3, PyQtGraph, NumPy.

We haven't yet managed to port all the widgets (there's so much to do!), and we would really like to see the following widgets from Orange 2 ported: Nomogram, Interaction Graph, SOM (self-organizing maps) with viewer, Reliability (classifier reliability estimation), Ensemble, and multiple visualization widgets.

Any of those widgets can be a proposal in itself!

Intensity: hard
Extent: extensive
Severity: important
Involves: Orange 2, Cython
Mentors: @janezd, @BlazZupan

Orange package for Debian/Ubuntu GNU/Linux

GNU/Linux is one of our three target platforms. While the inferior platforms enjoy prebuilt executable packages, GNU/Linux users are left to themselves. This is all fine and dandy as GNU/Linux users usually find their way around pretty easily, but it would still be convenient if users could more simply just dpkg that deb or apt-get orange onto their 'buntu boxen.

The only way to ever become a Debian maintainer and positively affect millions is to start.

Intensity: easy
Extent: limited
Severity: wishlist
Involves: git-buildpackage, Open Build Service
Mentors: @kernc

Telemetry and automatic bug submission

We would like to know how our users use Orange, which widgets they use most often, and whether they experience any crashes and, if so, what they are.

This task requires engineering of an opt-in telemetry and automatic GitHub bug submission solution for when the application eventually crashes.
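To make the opt-in part concrete, here is a minimal sketch of how a crash report might be assembled from an unhandled exception. The function name `build_crash_report`, the `opted_in` flag, and the report fields are all hypothetical; actually submitting the report (for example, to an issue tracker) is deliberately left out.

```python
import platform
import sys
import traceback


def build_crash_report(exc_type, exc, tb, opted_in):
    """Return a report dict for an unhandled exception, or None without user consent."""
    if not opted_in:
        return None                          # nothing is collected unless the user agreed
    frames = traceback.extract_tb(tb)
    last = frames[-1] if frames else None
    return {
        "exception": exc_type.__name__,
        "message": str(exc),
        "location": f"{last.name}:{last.lineno}" if last else "unknown",
        "traceback": "".join(traceback.format_exception(exc_type, exc, tb)),
        "python": platform.python_version(),
        "platform": sys.platform,
    }
```

A function like this could be called from a custom `sys.excepthook`, which receives the same three exception arguments. Note that tracebacks can contain file paths, so a real solution would need to consider what is safe to send.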

Intensity: moderate
Extent: broad
Severity: important
Involves: Qt, web back-end programming, GitHub API
Mentors: @kernc, @VesnaT, @ales-erjavec

Export OWS to Python code

Imagine analyzing some data in Orange. You build a model that you are satisfied with, but you just want to make some final adjustments to your model in code. Unfortunately, there is currently no way to export widgets' functionality and the built OWS workflow into the underlying Python code (think of IPython Notebook's nbconvert --to script).

It'd be useful if we could transparently export (linearize) Orange workflow schemes into raw Python code one could further edit and run as a Python script.

Note, the mentors have no idea how this could be done. They just wish it was.
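One conceivable starting point for the linearization step is a topological sort of the workflow graph. The sketch below assumes the workflow has already been parsed into a dictionary of nodes and a list of links; the `linearize` function and the function names in the generated code are hypothetical, and the hard part (mapping each widget and its settings to equivalent Python code) is not addressed.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def linearize(nodes, links):
    """nodes: {node_id: function_name}; links: [(source_id, sink_id)].

    Returns one line of Python source per node, ordered so that every
    node appears after all the nodes it receives input from.
    """
    graph = {node_id: set() for node_id in nodes}
    for source, sink in links:
        graph[sink].add(source)              # a sink depends on its sources
    lines = []
    for node_id in TopologicalSorter(graph).static_order():
        args = ", ".join(sorted(graph[node_id]))
        lines.append(f"{node_id} = {nodes[node_id]}({args})")
    return lines
```

For a workflow where a data node feeds both a model node and a scoring node, this yields three assignment statements in dependency order.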

Intensity: hard
Extent: broad
Severity: wishlist
Involves: Python
Mentors: @kernc