[Epic] Automated Data Science bootstrap from curated content sets #402
Comments
/kind key-result |
We have discussed an approach that would reuse the logic for template projects. Let's sync on whether we want to develop and maintain this type of logic in Kebechet. |
I suggest making this the first milestone. It's ok to assume the user knows which stacks are available. So that
does that make sense? |
/sig user-experience |
There is a prerequisite to this, which is that the stack content is readily usable. This fits with Frido's comment about template projects.
.. and install the bot.
These 3 steps can potentially be reduced to one by just using the template project logic. As this is a prerequisite anyway, I suggest we make that the first milestone. The bot automation can be a follow-up. I updated the description a bit to reflect that, with the template logic being "phase 1". Makes sense? |
Not sure how you plan to implement this, but it sounds like it would require an ever-growing set of template repos (is that right?). Have you considered using a cookiecutter-like [1] repo that would serve as a single repo, with dynamic options applied when a new repo is created, based on the user's specific needs? You can also look at [2], our ds cookie cutter repo, for a very simple example. [1] https://github.com/cookiecutter/cookiecutter |
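To make the single-repo idea concrete: cookiecutter treats a list value in `cookiecutter.json` as a choice variable, with the first entry as the default. The sketch below builds such a context in Python. The variable names and every option except `nlp` (taken from the ps-nlp repo mentioned in this thread) are hypothetical placeholders, not an agreed design.

```python
import json

# Hypothetical cookiecutter context for ONE template repo covering several stacks.
# In cookiecutter.json, a list value is a choice variable; the first item is the default.
# Only "nlp" comes from this thread (ps-nlp); the other names are assumptions.
cookiecutter_context = {
    "project_name": "my-ds-project",
    "stack": ["nlp", "image-processing", "computer-vision"],
    "overlay": ["default", "gpu"],
}


def default_choices(context):
    """Return the values cookiecutter would pick when run without user input."""
    return {
        key: (value[0] if isinstance(value, list) else value)
        for key, value in context.items()
    }


if __name__ == "__main__":
    # This JSON is what would be saved as cookiecutter.json in the template repo.
    print(json.dumps(cookiecutter_context, indent=2))
    print(default_choices(cookiecutter_context))
```

Each new stack or overlay adds one list entry here, but also more conditional logic inside the template itself, which is the complexity concern raised in the replies.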
Correct, that was the idea: start initially with the 3 "predictable stacks" we have been working on, but eventually offer more options.
I saw cookie-cutter and the work you are doing with it, and I was planning to look closer at it, starting with one of the repos (see the mention of cookie-cutter/your template in thoth-station/ps-nlp#154 as an option to explore). But I was still thinking of separate repos. Honestly, this did not occur to me:
Thanks for the suggestion! I will look closer. An initial question that comes to mind, though: wouldn't that single repo become too big/complex? e.g. the NLP stack alone already has 4 overlays. One goal of this functionality is to be simple, easy to understand - it is meant to bootstrap/get started, and I am wondering if we would potentially be over-complicating the starting point. |
It's certainly a trade-off to consider: managing one complex repo vs. the complexity of managing multiple simple repos. Again, it depends on how you plan to implement this. I was just presenting a possible suggestion/alternative. Would it be as complex as, or more complex than, https://github.com/operate-first/apps ? |
Yes, and thanks again for the suggestion, it is being considered.
It would not be as complex as that one, no. /milestone OKR review Q2 2022 |
/milestone OKR review Q2 2022 |
/triage accepted |
/remove-lifecycle active |
Problem statement
As a Data Scientist,
I want a service that provides me an easy mechanism to bootstrap a new Data Science project, starting from a curated software stack that is appropriate for my project’s goals and available in a shared environment,
so that I can quickly start working on the Data Science project tasks without having to invest time in preparing a working environment, and I can be confident that the project is reproducible and maintainable.
High-level Goals
Starting a new Data Science project from scratch, the user interacts with a git forge to obtain a git repository populated from a relevant curated software stack, with bots that keep it up to date with recommendations and make the git project readily available for starting work on the Data Science tasks.
This involves:
Proposal description
Phase 1
As a Data Scientist,
I want to be able to bootstrap a new GitHub repository from an existing template that contains a curated software stack that is relevant to my project.
Phase 1.5 is: automate phase 1 with a script.
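A minimal sketch of what such a script could look like, assuming the curated stack repository is marked as a template repository on GitHub. It uses the GitHub REST endpoint for creating a repository from a template (`POST /repos/{template_owner}/{template_repo}/generate`). The function names, the example owner and repository names, and the token placeholder are illustrative assumptions, not part of the proposal.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_generate_request(template_owner, template_repo, owner, name, token, private=False):
    """Build (but do not send) the request that creates a repo from a template."""
    url = f"{GITHUB_API}/repos/{template_owner}/{template_repo}/generate"
    payload = {"owner": owner, "name": name, "private": private}
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"), headers=headers, method="POST"
    )


def create_from_template(request):
    """Send the request and return the decoded JSON description of the new repo."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Assumption: thoth-station/ps-nlp is usable as a template; the other values are placeholders.
    req = build_generate_request(
        "thoth-station", "ps-nlp", "my-user", "my-nlp-project", token="<personal-access-token>"
    )
    print(req.get_method(), req.full_url)
    # create_from_template(req)  # uncomment to actually create the repository
```

Keeping request construction separate from sending makes the script easy to dry-run before any repository is created.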
Phase 2
As a Data Scientist,
I want to open an Issue "please create an Image Processing notebook" on GitHub that triggers the Thoth bot to start populating my repository:
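One way the bot could decide which stack an issue is asking for is sketched here, assuming the issue title follows the wording in the example above. The pattern, the function name, and the mapping are hypothetical; only `thoth-station/ps-nlp` is a repository named in this thread, and the Image Processing entry is a placeholder.

```python
import re

# Hypothetical mapping from a requested stack to the template it is populated from.
STACK_TEMPLATES = {
    "nlp": "thoth-station/ps-nlp",
    "image processing": "example-org/ps-image-processing",  # placeholder, not a real repo
}

# Matches titles such as "please create an Image Processing notebook".
TITLE_PATTERN = re.compile(r"please create an? (?P<stack>.+?) notebook", re.IGNORECASE)


def stack_from_issue_title(title):
    """Return the template for the stack named in the title, or None if unrecognised."""
    match = TITLE_PATTERN.search(title)
    if match is None:
        return None
    return STACK_TEMPLATES.get(match.group("stack").strip().lower())


if __name__ == "__main__":
    print(stack_from_issue_title("please create an Image Processing notebook"))
```

Returning `None` for unknown titles lets the bot ignore unrelated issues or reply with the list of available stacks.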
Alternatives
User manually doing each step
Additional context
Acceptance Criteria