diff --git a/.nojekyll b/.nojekyll index cd6972f7..acb7f826 100644 --- a/.nojekyll +++ b/.nojekyll @@ -1 +1 @@ -0d39069e \ No newline at end of file +6279f136 \ No newline at end of file diff --git a/cards/AlbaMartinez.html b/cards/AlbaMartinez.html index a3b4977b..1f0d3180 100644 --- a/cards/AlbaMartinez.html +++ b/cards/AlbaMartinez.html @@ -2,7 +2,7 @@ - + diff --git a/cards/JARomero.html b/cards/JARomero.html index 73d87355..04119fdb 100644 --- a/cards/JARomero.html +++ b/cards/JARomero.html @@ -2,7 +2,7 @@ - + diff --git a/develop/01_RDM_intro.html b/develop/01_RDM_intro.html index 8cc4f07b..f8e3424a 100644 --- a/develop/01_RDM_intro.html +++ b/develop/01_RDM_intro.html @@ -2,7 +2,7 @@ - + @@ -292,7 +292,7 @@

1. Introduction to RDM

Modified
-

August 20, 2024

+

September 13, 2024

diff --git a/develop/02_DMP.html b/develop/02_DMP.html index 6569cf3c..2226548e 100644 --- a/develop/02_DMP.html +++ b/develop/02_DMP.html @@ -2,7 +2,7 @@ - + @@ -258,7 +258,7 @@

2. Data Management Plan

Modified
-

August 20, 2024

+

September 13, 2024

diff --git a/develop/03_DOD.html b/develop/03_DOD.html index 180f41bb..39492582 100644 --- a/develop/03_DOD.html +++ b/develop/03_DOD.html @@ -2,7 +2,7 @@ - + @@ -308,7 +308,7 @@

3. Data organization and storage

Modified
-

August 20, 2024

+

September 13, 2024

diff --git a/develop/04_metadata.html b/develop/04_metadata.html index 1f00434c..8f0e6ac9 100644 --- a/develop/04_metadata.html +++ b/develop/04_metadata.html @@ -2,7 +2,7 @@ - + @@ -302,7 +302,7 @@

4. Documentation for biodata

Modified
-

August 20, 2024

+

September 13, 2024

diff --git a/develop/05_VC.html b/develop/05_VC.html index 5405c4cb..0d56e236 100644 --- a/develop/05_VC.html +++ b/develop/05_VC.html @@ -2,7 +2,7 @@ - + @@ -261,7 +261,7 @@

5. Version Control with Git and GitHub

Modified
-

August 20, 2024

+

September 13, 2024

@@ -316,25 +316,6 @@

Best Pract -
-
-
- -
-
-Take our course on Git & Github -
-
-
-

if you’re interested in delving deeper, explore our course on Git and GitHub.

-

Alternatively, here are some examples and online resources to expand your understanding:

- -
-

Version control using Git

Git is a widely adopted version control system that empowers developers and researchers to efficiently manage their project’s history, collaborate seamlessly, track changes, and ensure data integrity. Git operates on core principles and mechanisms:

@@ -354,6 +335,7 @@

Version control

GitHub Hosting for Git

In addition to exploring Git, we will also explore GitHub, a collaborative platform for hosting Git repositories. GitHub enhances Git’s capabilities by offering features like issue tracking, security measures to protect repositories, and GitHub Pages for creating project websites. Additionally, GitHub provides the option to set repositories as private until you are ready to share your work publicly.

+

The difference between Git and GitHub is that Git is a version control system used to track changes in code, while GitHub is a cloud-based platform that hosts Git repositories and facilitates collaboration. Essentially, GitHub serves as an online access point for managing and sharing repositories.

@@ -381,7 +363,7 @@

GitHub Hosting for

-

We will discuss repositories for archiving experimental or large datasets in lesson 7. However, if you are interested in version-controlling large files, we recommend the use of git-annex. It is important to store files with a checksum (MD5, SHA1, SHA256) to verify that files are not altered or corrupted by recomputing their signature.

+

We will discuss repositories for archiving experimental or large datasets in lesson 7. However, if you are interested in version-controlling large files, we recommend the use of git-annex. It is also important to archive files with a checksum (MD5, SHA1, SHA256) so that you can verify, by recomputing the checksum, that files have not been altered or corrupted.
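
As a minimal sketch of the checksum idea, the Python snippet below computes a file's digest with the standard-library `hashlib` module and later recomputes it to check that the file is unchanged. The function names `file_checksum` and `verify_checksum` are hypothetical helpers introduced here for illustration; they are not part of the course material or of git-annex.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks.

    Reading in chunks keeps memory use low for large data files.
    `algorithm` can be any name accepted by hashlib.new, e.g. "md5",
    "sha1" or "sha256".
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected, algorithm="sha256"):
    """Recompute the checksum and compare it with the stored value."""
    return file_checksum(path, algorithm) == expected.strip().lower()
```

A typical use would be to store the value returned by `file_checksum` next to the archived file (for example in a small text file) and to call `verify_checksum` with that stored value after a transfer or download. SHA256 is generally a safer default than MD5 or SHA1 when deliberate tampering is a concern.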

@@ -423,7 +405,7 @@
Con
-
+
@@ -432,7 +414,7 @@
Con
-
+
@@ -500,7 +482,24 @@

Wrap up

In this lesson, we explored version control and utilized Git and GitHub to establish data analysis repositories from our Project folders. Additionally, we delved into creating a GitHub organization and leveraging GitHub Pages to showcase data analysis scripts and notebooks publicly. Remember to complete the corresponding exercise from the practical workshop to reinforce your knowledge.

Sources

+
+
+
+ +
+
+Take our Git & GitHub course at KU +
+
+
+

If you’re interested in delving deeper, explore our course on Git and GitHub.

+
+
+

Alternatively, here are some examples and online resources to expand your understanding:

diff --git a/develop/06_pipelines.html b/develop/06_pipelines.html index 52cb8a4b..eee1219f 100644 --- a/develop/06_pipelines.html +++ b/develop/06_pipelines.html @@ -2,7 +2,7 @@ - + @@ -243,7 +243,7 @@

6. Processing and analyzing biodata

Modified
-

August 20, 2024

+

September 13, 2024

diff --git a/develop/07_repos.html b/develop/07_repos.html index 6e378f98..2bd4f827 100644 --- a/develop/07_repos.html +++ b/develop/07_repos.html @@ -2,7 +2,7 @@ - + @@ -235,6 +235,7 @@

On this page

-

While platforms like GitHub excel in version control and collaborative coding, repositories like Zenodo, Gene Expression Omnibus, and Annotare specialize in archiving and sharing scientific data, ensuring long-term accessibility for the global research community.

+

While platforms like GitHub excel in version control and collaborative coding, repositories such as Zenodo specialize in archiving and sharing scientific data, ensuring long-term accessibility for the global research community.

+

What to archive and how?

+

A framework for reproducibility in computational research can generally be divided into three key, though sometimes overlapping, categories:

+
+
  1. Readable Components: This includes elements such as literature reviews, code documentation, data documentation, and workflow descriptions outlining how the code interacts with the data.
  2. Executable Components: These are the actual code, scripts, and software that need to be compiled and run to reproduce results.
  3. Interpretable Components: This refers to the data itself (raw or processed) that the code and scripts work on.
+

Researchers are typically more accustomed to archiving readable components, such as papers or data documentation, compared to executable components like scripts and code. However, for research to be fully reproducible, it is crucial that all key components, including executable ones, are properly archived.

+

When choosing an archival solution, it’s important to recognize that there is no one-size-fits-all option. Several factors must be considered, including data size, format requirements, licensing conditions, cost, and tools for data attribution and citation. Each of these features plays a crucial role in selecting the most suitable archive for your needs.

Data Repositories and Archives

Specialized repositories and archives securely store, curate, and disseminate scientific data, ensuring long-term preservation, transparency, and citability of research findings through standardized formats and rigorous curation processes.

@@ -312,7 +322,7 @@

Data Reposi

-

Check the registry of research data repositoriesre3data.org for a full overview. You can browse by subject if you are looking within a specific field.

+

Check the registry of research data repositories, re3data.org, for a full overview. You can browse by subject if you are looking within a specific field.

There are two types of repositories:

@@ -425,24 +435,15 @@

Domain-specif

By adhering to standards, repositories ensure that submitted data is high quality, well-documented, and compliant with community best practices, promoting data discovery, reproducibility, and interoperability within the scientific community.

Following all the recommendations in this course makes it straightforward to provide the necessary documentation and information for these repositories. For instance, repositories specific to NGS data will require the raw FASTQ files, sample metadata, and protocols as well as final pre-processing results (for instance, read count matrices in BED files).

-
-
-
- -
-
-Warning -
-
-
-

Keep in mind that these repositories are not intended for downstream analysis data and associated code. However, you should already have those versions controlled by GitHub, which eliminates any concerns. You can then archive such repositories in a general repository like Zenodo.

+

+
+

Software and computations archives

+

Keep in mind that data repositories are not intended for downstream analysis data and associated code. However, you should already have these under version control on GitHub, which addresses this concern. You can then archive such GitHub repositories in a general repository like Zenodo.

Archives for software source code are essential for long-term accessibility and reproducibility and are becoming very popular. Check Software Heritage if you are developing software.

- -

General repositories

-

There are plenty of data archiving repositories. We recommend to check the Longwood Research Data management website at Harvard for a quick overview. Some of the most well-known are:

+

There are plenty of general archiving repositories. We recommend checking the Longwood Research Data Management website at Harvard for a quick overview. Some of the most well-known are:

  • Dataverse
  • Dryad
  • @@ -1051,7 +1052,7 @@

    Wrap up

    }); - -