
Baby Steps Towards De-Sci (Decentralized Science) with Datalad + git annex + IPFS (InterPlanetary File System) #95

hebbianloop opened this issue Dec 11, 2020 · 4 comments



hebbianloop commented Dec 11, 2020

Project info

Title:

Baby Steps Towards a Decentralized Science (De-Sci) with Datalad + git annex + InterPlanetary File System and a Dash of Ethereum


Project lead:
Shady El Damaty - @hebbianloop

Project collaborators:
@nkhalsa

Registered Brainhack Global 2020 Event:
Brainhack DC

Project Description:
A substantial barrier to open science practice is the sharing and accessibility of datasets. Often datasets are stored in a centralized location such as a lab's server or in costly enterprise cloud systems.

There are multiple problems with centralized data storage: 1) outages can make data temporarily unavailable, 2) data can disappear forever if the central location fails, and 3) centralization enables censorship and can limit accessibility.

The DataLad version control software takes steps to address this: it uses git-annex as its back-end, and git-annex supports multiple types of "special remotes" for downloading and publishing datasets. However, DataLad's suite of supported remotes does not yet include a decentralized file storage protocol.
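DataLad hands annexed content to git-annex, and git-annex talks to an *external* special remote by exchanging one-line text messages with a helper program over stdin/stdout. With `git annex initremote <remote-name> type=external externaltype=<name> encryption=none`, git-annex launches a program named `git-annex-remote-<name>` from the `PATH`. The following is a minimal sketch of such a helper, assuming the request and reply names of git-annex's external special remote protocol (`INITREMOTE`, `PREPARE`, `TRANSFER`, `CHECKPRESENT`, `REMOVE`). The `InMemoryBackend` class is a hypothetical placeholder for where IPFS calls would go; because it keeps data only in memory, it is useful for exploring the protocol, not for storing datasets.

```python
import sys
from typing import Dict, TextIO


class InMemoryBackend:
    """Hypothetical stand-in for IPFS: maps git-annex keys to file contents.

    Contents live only as long as this process, so this is for exploring
    the protocol, not for storing real data.
    """

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}


def handle(line: str, backend: InMemoryBackend) -> str:
    """Return the reply a special remote sends for one request from git-annex."""
    # Split into at most 4 fields so a file path containing spaces stays whole.
    parts = line.rstrip("\r\n").split(" ", 3)
    cmd = parts[0]

    if cmd == "INITREMOTE":
        return "INITREMOTE-SUCCESS"
    if cmd == "PREPARE":
        return "PREPARE-SUCCESS"
    if cmd == "TRANSFER" and len(parts) == 4:
        direction, key, path = parts[1], parts[2], parts[3]
        try:
            if direction == "STORE":
                # Copy the annexed file's bytes into the backend.
                with open(path, "rb") as src:
                    backend.blobs[key] = src.read()
            elif direction == "RETRIEVE":
                data = backend.blobs[key]  # KeyError if never stored
                with open(path, "wb") as dst:
                    dst.write(data)
            else:
                return "UNSUPPORTED-REQUEST"
            return f"TRANSFER-SUCCESS {direction} {key}"
        except (OSError, KeyError) as exc:
            return f"TRANSFER-FAILURE {direction} {key} {type(exc).__name__}"
    if cmd == "CHECKPRESENT" and len(parts) == 2:
        key = parts[1]
        status = "SUCCESS" if key in backend.blobs else "FAILURE"
        return f"CHECKPRESENT-{status} {key}"
    if cmd == "REMOVE" and len(parts) == 2:
        # Removing a key that is already absent still counts as success.
        backend.blobs.pop(parts[1], None)
        return f"REMOVE-SUCCESS {parts[1]}"
    # EXTENSIONS, GETCOST, ...: decline anything this sketch does not implement.
    return "UNSUPPORTED-REQUEST"


def serve(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Announce the protocol version, then answer requests line by line."""
    backend = InMemoryBackend()
    print("VERSION 1", file=stdout, flush=True)
    for line in stdin:
        print(handle(line, backend), file=stdout, flush=True)


if __name__ == "__main__":
    serve()
```

Swapping `InMemoryBackend` for calls to an IPFS node, and recording the returned content identifier per key, is the part this project would have to design.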

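The InterPlanetary File System (IPFS) allows peer-to-peer sharing and storage of data on distributed networks. It is related to systems such as BitTorrent (which inspired its block-exchange protocol), Filecoin (an incentivized storage network built on IPFS), and Cloudflare (which operates a public IPFS gateway).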
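Both IPFS and git-annex are content-addressed: data is named by a hash of its bytes rather than by where it is stored, which is part of why they pair naturally. As a small illustration, the sketch below computes a git-annex key for the `SHA256` backend, which has the form `SHA256-s<size>--<hex digest>`. This is not an IPFS content identifier (CID); IPFS chunks and encodes a file before hashing it, so the same file gets a different-looking address there.

```python
import hashlib


def annex_sha256_key(data: bytes) -> str:
    """Build a git-annex SHA256 backend key for the given file contents.

    The key depends only on the bytes (size and SHA-256 digest), never on
    the file's name or on which server or peer holds it.
    """
    digest = hashlib.sha256(data).hexdigest()
    return f"SHA256-s{len(data)}--{digest}"


# The same bytes produce the same key wherever they are stored.
print(annex_sha256_key(b""))
```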

Storing data on these distributed networks also makes it possible to tokenize individual datasets on the Ethereum blockchain, an important first step toward data marketplaces for the peer-to-peer exchange of data and models.
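One common way to link a token to off-chain data is the ERC-721 metadata extension, in which a contract's `tokenURI` points to a JSON document with fields such as `name` and `description`. The sketch below builds such a document for a dataset stored on IPFS. The `dataset` field and the placeholder CID are hypothetical choices for this project, not part of the ERC-721 schema, and minting the token itself is out of scope here.

```python
import json


def dataset_token_metadata(name: str, description: str, dataset_cid: str) -> str:
    """Return ERC-721-style metadata JSON pointing at a dataset on IPFS.

    `name` and `description` follow the optional ERC-721 metadata schema;
    the `dataset` field is a hypothetical extension for this project.
    """
    if not dataset_cid:
        raise ValueError("dataset_cid must be non-empty")
    metadata = {
        "name": name,
        "description": description,
        # An ipfs:// URI names the content by hash, not by server location.
        "dataset": f"ipfs://{dataset_cid}",
    }
    return json.dumps(metadata, indent=2, sort_keys=True)


print(dataset_token_metadata("Example dataset", "Test upload", "PLACEHOLDER_CID"))
```

Keeping the dataset itself off-chain and storing only a content address in the token keeps on-chain costs small while still letting anyone verify that the data they fetch matches the token.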

This project aims to explore the requirements for, and feasibility of, extending DataLad to support IPFS by adding wrapper code that defines an IPFS special remote. Once implemented, this special remote would provide one of the tools needed to automate the tokenization of datasets on the Ethereum blockchain.

What we are Doing
Adding IPFS special remote capability to DataLad

For Whom?
For Decentralized Science!

Why?
Centralized data storage is not sustainable in the era of web 3.0

Resources
Git Annex IPFS
Datalad FAQ
IPFS
Infura (IPFS API)
A tokenized brain

Data to use:
OpenNeuro

Link to project repository/sources:

Goals for Brainhack Global 2020:

  • Create a test IPFS repository with git-annex

  • Research the DataLad CLI and outline a strategy for modifying special remotes. Open a well-documented, clear issue on the DataLad GitHub repository.

  • Implement a modification to DataLad that supports IPFS as a special remote

  • Test the modification with open data, hosting the data on IPFS

  • Tokenize an example dataset on the Ethereum blockchain
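The first and fourth goals amount to a short round trip: create a dataset, register IPFS as a special remote, publish a file, drop the local copy, and fetch it back. The sketch below lists that sequence as commands and prints them as a dry-run checklist. It assumes an IPFS external special remote helper is installed (with `externaltype=ipfs`, git-annex looks for a program named `git-annex-remote-ipfs`) and that an IPFS daemon is running; the remote name `ipfs`, the dataset path, and the file name are illustrative choices, not project decisions.

```python
import shlex
import subprocess
from typing import List


def ipfs_round_trip_plan(dataset: str, filename: str) -> List[List[str]]:
    """Return the commands for one store/drop/get round trip through IPFS.

    Assumes `git-annex-remote-ipfs` is on PATH, an IPFS daemon is running,
    and `filename` (relative to the dataset) is placed in the dataset
    before the `datalad save` step.
    """
    annex = ["git", "-C", dataset, "annex"]
    return [
        ["datalad", "create", dataset],
        annex + ["initremote", "ipfs", "type=external",
                 "externaltype=ipfs", "encryption=none"],
        ["datalad", "save", "-d", dataset, "-m", f"Add {filename}"],
        annex + ["copy", "--to", "ipfs", filename],  # publish content to IPFS
        annex + ["drop", filename],                  # remove the local copy
        annex + ["get", filename],                   # fetch it back from IPFS
    ]


def run(plan: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


run(ipfs_round_trip_plan("ipfs-test", "data.txt"))
```

`git annex drop` only succeeds if git-annex can confirm (via `CHECKPRESENT`) that the IPFS remote still holds the content, so a successful drop followed by a successful get is a useful end-to-end check.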

Good first issues:

  1. How does DataLad work with special remotes under the hood? Can you set up your own FTP/SSH special remote?
  2. Demonstrate a git-annex special remote with IPFS.
  3. Add a special remote wrapper/plugin to DataLad core.
  4. Test on multiple machines/environments.
  5. Open a pull request on the DataLad repository.
  6. Create a tokenized dataset on the Ethereum blockchain.
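For the first good first issue, it can help to practice with git-annex's built-in special remotes before touching IPFS: `type=directory` stores content in a local or mounted folder, and `type=rsync` stores it on a host reachable over SSH. The sketch below only builds the `git annex initremote` command for those two types; the remote names and paths in the usage are hypothetical, and FTP is deliberately left out because this sketch covers only these two built-in types.

```python
from typing import List

# Which parameter names the storage location for each built-in type.
LOCATION_PARAM = {"directory": "directory", "rsync": "rsyncurl"}


def initremote_argv(name: str, kind: str, location: str) -> List[str]:
    """Build a `git annex initremote` command for a built-in special remote.

    kind='directory': location is a local or mounted folder.
    kind='rsync': location is an SSH-reachable 'host:/path'.
    """
    if kind not in LOCATION_PARAM:
        raise ValueError(f"unsupported special remote type: {kind}")
    return ["git", "annex", "initremote", name, f"type={kind}",
            f"{LOCATION_PARAM[kind]}={location}", "encryption=none"]


print(" ".join(initremote_argv("backup", "directory", "/mnt/usb/annex")))
print(" ".join(initremote_argv("labserver", "rsync", "server:/data/annex")))
```

After running one of these inside a dataset, `git annex copy --to backup <file>` publishes content there, and a later `datalad get` can retrieve it from that remote, which is the same path an IPFS special remote would plug into.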

Skills:
You don't need much background beyond familiarity with the terminal and the command line in a Unix-like environment. We will work together to research how to add the special remote. Familiarity with git is highly recommended.

Tools/Software/Methods to Use:
git
git annex
datalad
python

Communication channels:
https://mattermost.brainhack.org/brainhack/channels/bhg-washingtondc

Project labels

  • Type of project:
    #coding_methods, #data_management

  • Project development status:
    #0_concept_no_content

  • Topic of the project:
    #reproducible_scientific_methods

  • Tools used in the project:
    #BIDS, #Datalad, #Jupyter

  • Tools skill level required to enter the project (more than one possible):
    #familiar, #no_skills_required

  • Programming language used in the project:
    #Python, R, #shell_scripting, #Unix_command_line, #Web, workflows

  • Modalities involved in the project (if any):
    none

  • Git skills required to enter the project (more than one possible):
    #2_branches_PRs

  • I added all of the labels I want associated with my project

Project Submission

Submission checklist

Once the issue is submitted, please check items in this list as you add them under ‘Additional project info’

  • Link to your project: could be a code repository, a shared document, etc.
  • Goals for Brainhack Global 2020: describe what you want to achieve during this brainhack.
  • Flesh out at least 2 “good first issues”: those are tasks that do not require any prior knowledge about your project, could be defined as issues in a GitHub repository, or in a shared document.
  • Skills: list skills that would be particularly suitable for your project. We ask you to include at least one non-coding skill. Use the issue labels for this purpose.
  • Chat channel: A link to a chat channel that will be used during the Brainhack Global 2020 event. This can be an existing channel or a new one. We recommend using the Brainhack space on Mattermost.

We would like you to think about how you will credit and onboard new members to your project. If you’d like to share your thoughts with future project participants, you can include information about:

  • Specify how you will acknowledge contributions (e.g. listing members on a contributing page).
  • Provide links to onboarding documents if you have some:

@hebbianloop

Hi @Brainhack-Global/project-monitors: my project is ready!

@complexbrains

Dear @seldamat, thank you very much for submitting your project to Brainhack Global 2020 🎉 The project seems ready; it is only missing an image of its own, which we need to create your project-specific card like the other examples here. As soon as we have the image (you can put it anywhere in the issue), we will publish it 🚀

Looking forward to hearing from you 🤗

@hebbianloop

thank you! done!

@complexbrains

@seldamat your project has been published https://brainhack.org/global2020/project/project_95/ and tweeted https://twitter.com/brainhackorg/status/1337770184187240453

We hope you enjoy participating in Brainhack 🤗
