-
-
@@ -791,23 +791,23 @@ Sample metadata fie
Sample metadata fie
-
-
@@ -1376,23 +1376,23 @@ Assay metadata field
-
-
@@ -1980,23 +1980,23 @@ Assay metadata field
-
-
diff --git a/search.json b/search.json
index 1a71e96c..bc645fb3 100644
--- a/search.json
+++ b/search.json
@@ -15,7 +15,7 @@
"href": "develop/06_pipelines.html",
"title": "6. Processing and analyzing biodata",
"section": "",
- "text": "In this section, we explore essential elements of reproducibility and efficiency in computational research, highlighting techniques and tools for creating robust and transparent code and workflows. By prioritizing reproducibility and replicability, researchers can enhance the credibility and impact of their findings while fostering collaboration and knowledge dissemination within the scientific community.\n\n\n\n\n\n\nBefore you start…\n\n\n\n\nChoose a folder structure (e.g., using cookiecutter)\nChoose a file naming system\nAdd a README describing the project (and the naming conventions)\nInstall and set up version control (e.g., Git and GitHub)\nChoose a coding style!\n\n\nPython: PEP 8 or Google’s style guide\nR: Google’s style guide or Tidyverse’s style guide\n\n\n\n\n\nThrough techniques such as scripting, containerization (e.g., Docker), and virtual environments, researchers can create reproducible analyses that enable others to validate and build upon their work. Documenting data processing steps, parameters, and results ensures transparency and accountability in research outputs. To write clear and reproducible code, take the following approach: write functions, code defensively (e.g., validate inputs and handle errors), add comments, write tests, and maintain proper documentation.\nTools for reproducibility:\n\nCode notebooks: Use tools like Jupyter Notebook and R Markdown to combine code with descriptive text and visualizations, improving data documentation.\n\nReport-generation and experiment-tracking tools: Consider tools such as knitr or MLflow to streamline code development and documentation processes.\nPipeline frameworks or workflow management systems: Use systems like Nextflow and Snakemake to automate data analysis steps (including data extraction, transformation, validation, visualization, and more). 
They also support interoperability by integrating the different components or stages of an analysis.\n\n\n\nComputational notebooks (e.g., Jupyter, R Markdown) provide researchers with a versatile platform for exploratory and interactive data analysis. They make it easier to share insights with collaborators and to document analysis procedures.\n\n\n\nTools such as Nextflow and Snakemake streamline and automate data analysis steps, enabling parallel processing and integration with existing tools. Write portable code and use relative paths so that your analysis can be transferred between users and systems.\n\nNextflow: supports scalable and portable NGS data analysis pipelines that run across diverse computing environments.\nSnakemake: uses a Python-based workflow language to build flexible and automated NGS data analysis pipelines, supporting parallel processing and integration with other tools.\n\nOnce your computational workflow is ready to be shared, publish it on WorkflowHub.\n\n\n\nEach computer or HPC (High-Performance Computing) platform has a unique computational environment that includes its operating system, installed software, versions of software packages, and other features. If an analysis depends on any of these factors, moving the project to a different computer or platform may prevent it from running or change its results.\nFor research to be reproducible, the original computational environment must be recorded so others can replicate it. 
There are several ways to achieve this:\n\nContainerization platforms (e.g., Docker, Singularity): let researchers package their software and dependencies into a standardized container image.\nVirtual machines (e.g., VirtualBox): let researchers share an entire virtualized computing environment (OS, software, and dependencies).\nEnvironment managers: provide an isolated environment in which specific packages and dependencies can be installed without affecting the system-wide configuration. These environments are particularly useful for managing conflicting dependencies and ensuring reproducibility. Configuration files can automate the setup of the computational environment:\n\nconda: lets users export environment specifications (software and dependencies) to a YAML file (conda env export > environment.yml), so the environment can be recreated on another system (conda env create -f environment.yml).\nPython virtualenv: a tool for creating isolated environments that hold the dependencies of a specific project.\nrequirements.txt: lists the Python packages, ideally with pinned versions, that pip installs with pip install -r requirements.txt. It does not cover system-level dependencies (e.g., packages installed with apt-get), system settings, or environment variables; record those separately (e.g., in a Dockerfile). Package managers can be used to install, upgrade, and manage packages.\nR’s renv: the ‘renv’ package creates isolated, project-specific environments in R.\n\nEnvironment descriptors\n\nsessionInfo() or devtools::session_info(): in R, these functions provide detailed information about the current session.\nIn Python, there is no built-in sessionInfo(); the third-party session_info package (session_info.show()) reports similar information. 
Some libraries also report their own environment details, e.g., pd.show_versions() in pandas or np.show_config() in NumPy.\n\n\nEnvironment managers are easy to use and share across systems, lightweight, and quick to start. Docker containers, in contrast, isolate the full environment, including the operating system’s user space (the host kernel is shared), which gives more consistent behavior across different systems.\n\n\n\n\nTo keep the data analysis process clear and organized, adopt best practices such as:\n\nData documentation: create a README.md file that gives an overview of the project and its structure, plus metadata that explains the context of your analysis.\nAnnotate your pipelines and comment your code (look for tutorials and templates such as this one from freeCodeCamp).\nFollow a coding style guide (code layout, whitespace in expressions, comments, naming conventions, annotations…) to maintain consistency.\nLabel files numerically to organize the entire data analysis process (scripts, notebooks, pipelines, etc.).\n\n00.preprocessing., 01.data_analysis_step1., etc.\n\nProvide environment files for reproducing the computational environment (such as ‘requirements.txt’ for Python or ‘environment.yml’ for Conda). At a minimum, document your dependencies by reporting the packages and versions used to run your analysis.\nVersion control: use a version control system (e.g., Git) and upload your code to a code repository (see Lesson 5).\nIntegrated development environments (e.g., RStudio, PyCharm) offer tools for writing, testing, and debugging code.\nLeverage curated pipelines, such as those developed by the nf-core community, to follow community standards and guidelines.\nAdd a LICENSE file to clarify usage permissions, and update the project regularly to facilitate collaboration.\n\n\n\n\n\n\n\nPractical HPC pipes\n\n\n\nWe provide a hands-on workshop on computational environments and pipelines. 
Keep an eye out for upcoming events on the Sandbox website.\nIf you’re interested in delving deeper, check out the HPC best practices module we’ve developed here.",
+ "text": "In this section, we explore essential elements of reproducibility and efficiency in computational research, highlighting techniques and tools for creating robust and transparent code and workflows. By prioritizing reproducibility and replicability, researchers can enhance the credibility and impact of their findings while fostering collaboration and knowledge dissemination within the scientific community.\n\n\n\n\n\n\nBefore you start…\n\n\n\n\nChoose a folder structure (e.g., using cookiecutter)\nChoose a file naming system\nAdd a README describing the project (and the naming conventions)\nInstall and set up version control (e.g., Git and GitHub)\nChoose a coding style!\n\n\nPython: PEP 8 or Google’s style guide\nR: Google’s style guide or Tidyverse’s style guide\n\n\n\n\n\nThrough techniques such as scripting, containerization (e.g., Docker), and virtual environments, researchers can create reproducible analyses that enable others to validate and build upon their work. Documenting data processing steps, parameters, and results ensures transparency and accountability in research outputs. To write clear and reproducible code, take the following approach: write functions, code defensively (e.g., validate inputs and handle errors), add comments, write tests, and maintain proper documentation.\nTools for reproducibility:\n\nCode notebooks: Use tools like Jupyter Notebook and R Markdown to combine code with descriptive text and visualizations, improving data documentation.\n\nReport-generation and experiment-tracking tools: Consider tools such as knitr or MLflow to streamline code development and documentation processes.\nPipeline frameworks or workflow management systems: Use systems like Nextflow and Snakemake to automate data analysis steps (including data extraction, transformation, validation, visualization, and more). 
They also support interoperability by integrating the different components or stages of an analysis.\n\n\n\nComputational notebooks (e.g., Jupyter, R Markdown) provide researchers with a versatile platform for exploratory and interactive data analysis. They make it easier to share insights with collaborators and to document analysis procedures.\n\n\n\nTools such as Nextflow and Snakemake streamline and automate data analysis steps, enabling parallel processing and integration with existing tools. Write portable code and use relative paths so that your analysis can be transferred between users and systems.\n\nNextflow: supports scalable and portable NGS data analysis pipelines that run across diverse computing environments.\nSnakemake: uses a Python-based workflow language to build flexible and automated NGS data analysis pipelines, supporting parallel processing and integration with other tools.\n\nOnce your computational workflow is ready to be shared, publish it on WorkflowHub.\n\n\n\nEach computer or HPC (High-Performance Computing) platform has a unique computational environment that includes its operating system, installed software, versions of software packages, and other features. If an analysis depends on any of these factors, moving the project to a different computer or platform may prevent it from running or change its results.\nFor research to be reproducible, the original computational environment must be recorded so others can replicate it. 
There are several ways to achieve this:\n\nContainerization platforms (e.g., Docker, Singularity): let researchers package their software and dependencies into a standardized container image.\nVirtual machines (e.g., VirtualBox): let researchers share an entire virtualized computing environment (OS, software, and dependencies).\nEnvironment managers: provide an isolated environment in which specific packages and dependencies can be installed without affecting the system-wide configuration. These environments are particularly useful for managing conflicting dependencies and ensuring reproducibility. Configuration files can automate the setup of the computational environment:\n\nconda: lets users export environment specifications (software and dependencies) to a YAML file (conda env export > environment.yml), so the environment can be recreated on another system (conda env create -f environment.yml).\nPython virtualenv: a tool for creating isolated environments that hold the dependencies of a specific project.\nrequirements.txt: lists the Python packages, ideally with pinned versions, that pip installs with pip install -r requirements.txt. It does not cover system-level dependencies (e.g., packages installed with apt-get), system settings, or environment variables; record those separately (e.g., in a Dockerfile). Package managers can be used to install, upgrade, and manage packages.\nR’s renv: the ‘renv’ package creates isolated, project-specific environments in R.\n\nEnvironment descriptors\n\nsessionInfo() or devtools::session_info(): in R, these functions provide detailed information about the current session.\nIn Python, there is no built-in sessionInfo(); the third-party session_info package (session_info.show()) reports similar information. 
Some libraries also report their own environment details, e.g., pd.show_versions() in pandas or np.show_config() in NumPy.\n\n\nEnvironment managers are easy to use and share across systems, lightweight, and quick to start. Docker containers, in contrast, isolate the full environment, including the operating system’s user space (the host kernel is shared), which gives more consistent behavior across different systems.\n\n\n\n\nTo keep the data analysis process clear and organized, adopt best practices such as:\n\nData documentation: create a README.md file that gives an overview of the project and its structure, plus metadata that explains the context of your analysis.\nAnnotate your pipelines and comment your code (look for tutorials and templates such as this one from freeCodeCamp).\nFollow a coding style guide (code layout, whitespace in expressions, comments, naming conventions, annotations…) to maintain consistency.\nLabel files numerically to organize the entire data analysis process (scripts, notebooks, pipelines, etc.).\n\n00.preprocessing., 01.data_analysis_step1., etc.\n\nProvide environment files for reproducing the computational environment (such as ‘requirements.txt’ for Python or ‘environment.yml’ for Conda). At a minimum, document your dependencies by reporting the packages and versions used to run your analysis.\nVersion control: use a version control system (e.g., Git) and upload your code to a code repository (see Lesson 5).\nIntegrated development environments (e.g., RStudio, PyCharm) offer tools for writing, testing, and debugging code.\nLeverage curated pipelines, such as those developed by the nf-core community, to follow community standards and guidelines.\nAdd a LICENSE file to clarify usage permissions, and update the project regularly to facilitate collaboration.\n\n\n\n\n\n\n\nPractical HPC pipes\n\n\n\nWe provide a hands-on workshop on computational environments and pipelines. 
Keep an eye out for upcoming events on the Sandbox website. If you’re interested in delving deeper, check out the HPC best practices module we’ve developed here.",
"crumbs": [
"Course material",
"Key practices",
@@ -27,7 +27,7 @@
"href": "develop/06_pipelines.html#code-and-pipelines-for-data-analysis",
"title": "6. Processing and analyzing biodata",
"section": "",
- "text": "In this section, we explore essential elements of reproducibility and efficiency in computational research, highlighting techniques and tools for creating robust and transparent code and workflows. By prioritizing reproducibility and replicability, researchers can enhance the credibility and impact of their findings while fostering collaboration and knowledge dissemination within the scientific community.\n\n\n\n\n\n\nBefore you start…\n\n\n\n\nChoose a folder structure (e.g., using cookiecutter)\nChoose a file naming system\nAdd a README describing the project (and the naming conventions)\nInstall and set up version control (e.g., Git and GitHub)\nChoose a coding style!\n\n\nPython: PEP 8 or Google’s style guide\nR: Google’s style guide or Tidyverse’s style guide\n\n\n\n\n\nThrough techniques such as scripting, containerization (e.g., Docker), and virtual environments, researchers can create reproducible analyses that enable others to validate and build upon their work. Documenting data processing steps, parameters, and results ensures transparency and accountability in research outputs. To write clear and reproducible code, take the following approach: write functions, code defensively (e.g., validate inputs and handle errors), add comments, write tests, and maintain proper documentation.\nTools for reproducibility:\n\nCode notebooks: Use tools like Jupyter Notebook and R Markdown to combine code with descriptive text and visualizations, improving data documentation.\n\nReport-generation and experiment-tracking tools: Consider tools such as knitr or MLflow to streamline code development and documentation processes.\nPipeline frameworks or workflow management systems: Use systems like Nextflow and Snakemake to automate data analysis steps (including data extraction, transformation, validation, visualization, and more). 
They also support interoperability by integrating the different components or stages of an analysis.\n\n\n\nComputational notebooks (e.g., Jupyter, R Markdown) provide researchers with a versatile platform for exploratory and interactive data analysis. They make it easier to share insights with collaborators and to document analysis procedures.\n\n\n\nTools such as Nextflow and Snakemake streamline and automate data analysis steps, enabling parallel processing and integration with existing tools. Write portable code and use relative paths so that your analysis can be transferred between users and systems.\n\nNextflow: supports scalable and portable NGS data analysis pipelines that run across diverse computing environments.\nSnakemake: uses a Python-based workflow language to build flexible and automated NGS data analysis pipelines, supporting parallel processing and integration with other tools.\n\nOnce your computational workflow is ready to be shared, publish it on WorkflowHub.\n\n\n\nEach computer or HPC (High-Performance Computing) platform has a unique computational environment that includes its operating system, installed software, versions of software packages, and other features. If an analysis depends on any of these factors, moving the project to a different computer or platform may prevent it from running or change its results.\nFor research to be reproducible, the original computational environment must be recorded so others can replicate it. 
There are several ways to achieve this:\n\nContainerization platforms (e.g., Docker, Singularity): let researchers package their software and dependencies into a standardized container image.\nVirtual machines (e.g., VirtualBox): let researchers share an entire virtualized computing environment (OS, software, and dependencies).\nEnvironment managers: provide an isolated environment in which specific packages and dependencies can be installed without affecting the system-wide configuration. These environments are particularly useful for managing conflicting dependencies and ensuring reproducibility. Configuration files can automate the setup of the computational environment:\n\nconda: lets users export environment specifications (software and dependencies) to a YAML file (conda env export > environment.yml), so the environment can be recreated on another system (conda env create -f environment.yml).\nPython virtualenv: a tool for creating isolated environments that hold the dependencies of a specific project.\nrequirements.txt: lists the Python packages, ideally with pinned versions, that pip installs with pip install -r requirements.txt. It does not cover system-level dependencies (e.g., packages installed with apt-get), system settings, or environment variables; record those separately (e.g., in a Dockerfile). Package managers can be used to install, upgrade, and manage packages.\nR’s renv: the ‘renv’ package creates isolated, project-specific environments in R.\n\nEnvironment descriptors\n\nsessionInfo() or devtools::session_info(): in R, these functions provide detailed information about the current session.\nIn Python, there is no built-in sessionInfo(); the third-party session_info package (session_info.show()) reports similar information. 
Some libraries also report their own environment details, e.g., pd.show_versions() in pandas or np.show_config() in NumPy.\n\n\nEnvironment managers are easy to use and share across systems, lightweight, and quick to start. Docker containers, in contrast, isolate the full environment, including the operating system’s user space (the host kernel is shared), which gives more consistent behavior across different systems.\n\n\n\n\nTo keep the data analysis process clear and organized, adopt best practices such as:\n\nData documentation: create a README.md file that gives an overview of the project and its structure, plus metadata that explains the context of your analysis.\nAnnotate your pipelines and comment your code (look for tutorials and templates such as this one from freeCodeCamp).\nFollow a coding style guide (code layout, whitespace in expressions, comments, naming conventions, annotations…) to maintain consistency.\nLabel files numerically to organize the entire data analysis process (scripts, notebooks, pipelines, etc.).\n\n00.preprocessing., 01.data_analysis_step1., etc.\n\nProvide environment files for reproducing the computational environment (such as ‘requirements.txt’ for Python or ‘environment.yml’ for Conda). At a minimum, document your dependencies by reporting the packages and versions used to run your analysis.\nVersion control: use a version control system (e.g., Git) and upload your code to a code repository (see Lesson 5).\nIntegrated development environments (e.g., RStudio, PyCharm) offer tools for writing, testing, and debugging code.\nLeverage curated pipelines, such as those developed by the nf-core community, to follow community standards and guidelines.\nAdd a LICENSE file to clarify usage permissions, and update the project regularly to facilitate collaboration.\n\n\n\n\n\n\n\nPractical HPC pipes\n\n\n\nWe provide a hands-on workshop on computational environments and pipelines. 
Keep an eye out for upcoming events on the Sandbox website.\nIf you’re interested in delving deeper, check out the HPC best practices module we’ve developed here.",
+ "text": "In this section, we explore essential elements of reproducibility and efficiency in computational research, highlighting techniques and tools for creating robust and transparent code and workflows. By prioritizing reproducibility and replicability, researchers can enhance the credibility and impact of their findings while fostering collaboration and knowledge dissemination within the scientific community.\n\n\n\n\n\n\nBefore you start…\n\n\n\n\nChoose a folder structure (e.g., using cookiecutter)\nChoose a file naming system\nAdd a README describing the project (and the naming conventions)\nInstall and set up version control (e.g., Git and GitHub)\nChoose a coding style!\n\n\nPython: PEP 8 or Google’s style guide\nR: Google’s style guide or Tidyverse’s style guide\n\n\n\n\n\nThrough techniques such as scripting, containerization (e.g., Docker), and virtual environments, researchers can create reproducible analyses that enable others to validate and build upon their work. Documenting data processing steps, parameters, and results ensures transparency and accountability in research outputs. To write clear and reproducible code, take the following approach: write functions, code defensively (e.g., validate inputs and handle errors), add comments, write tests, and maintain proper documentation.\nTools for reproducibility:\n\nCode notebooks: Use tools like Jupyter Notebook and R Markdown to combine code with descriptive text and visualizations, improving data documentation.\n\nReport-generation and experiment-tracking tools: Consider tools such as knitr or MLflow to streamline code development and documentation processes.\nPipeline frameworks or workflow management systems: Use systems like Nextflow and Snakemake to automate data analysis steps (including data extraction, transformation, validation, visualization, and more). 
They also support interoperability by integrating the different components or stages of an analysis.\n\n\n\nComputational notebooks (e.g., Jupyter, R Markdown) provide researchers with a versatile platform for exploratory and interactive data analysis. They make it easier to share insights with collaborators and to document analysis procedures.\n\n\n\nTools such as Nextflow and Snakemake streamline and automate data analysis steps, enabling parallel processing and integration with existing tools. Write portable code and use relative paths so that your analysis can be transferred between users and systems.\n\nNextflow: supports scalable and portable NGS data analysis pipelines that run across diverse computing environments.\nSnakemake: uses a Python-based workflow language to build flexible and automated NGS data analysis pipelines, supporting parallel processing and integration with other tools.\n\nOnce your computational workflow is ready to be shared, publish it on WorkflowHub.\n\n\n\nEach computer or HPC (High-Performance Computing) platform has a unique computational environment that includes its operating system, installed software, versions of software packages, and other features. If an analysis depends on any of these factors, moving the project to a different computer or platform may prevent it from running or change its results.\nFor research to be reproducible, the original computational environment must be recorded so others can replicate it. 
There are several ways to achieve this:\n\nContainerization platforms (e.g., Docker, Singularity): let researchers package their software and dependencies into a standardized container image.\nVirtual machines (e.g., VirtualBox): let researchers share an entire virtualized computing environment (OS, software, and dependencies).\nEnvironment managers: provide an isolated environment in which specific packages and dependencies can be installed without affecting the system-wide configuration. These environments are particularly useful for managing conflicting dependencies and ensuring reproducibility. Configuration files can automate the setup of the computational environment:\n\nconda: lets users export environment specifications (software and dependencies) to a YAML file (conda env export > environment.yml), so the environment can be recreated on another system (conda env create -f environment.yml).\nPython virtualenv: a tool for creating isolated environments that hold the dependencies of a specific project.\nrequirements.txt: lists the Python packages, ideally with pinned versions, that pip installs with pip install -r requirements.txt. It does not cover system-level dependencies (e.g., packages installed with apt-get), system settings, or environment variables; record those separately (e.g., in a Dockerfile). Package managers can be used to install, upgrade, and manage packages.\nR’s renv: the ‘renv’ package creates isolated, project-specific environments in R.\n\nEnvironment descriptors\n\nsessionInfo() or devtools::session_info(): in R, these functions provide detailed information about the current session.\nIn Python, there is no built-in sessionInfo(); the third-party session_info package (session_info.show()) reports similar information. 
Some libraries also report their own environment details, e.g., pd.show_versions() in pandas or np.show_config() in NumPy.\n\n\nEnvironment managers are easy to use and share across systems, lightweight, and quick to start. Docker containers, in contrast, isolate the full environment, including the operating system’s user space (the host kernel is shared), which gives more consistent behavior across different systems.\n\n\n\n\nTo keep the data analysis process clear and organized, adopt best practices such as:\n\nData documentation: create a README.md file that gives an overview of the project and its structure, plus metadata that explains the context of your analysis.\nAnnotate your pipelines and comment your code (look for tutorials and templates such as this one from freeCodeCamp).\nFollow a coding style guide (code layout, whitespace in expressions, comments, naming conventions, annotations…) to maintain consistency.\nLabel files numerically to organize the entire data analysis process (scripts, notebooks, pipelines, etc.).\n\n00.preprocessing., 01.data_analysis_step1., etc.\n\nProvide environment files for reproducing the computational environment (such as ‘requirements.txt’ for Python or ‘environment.yml’ for Conda). At a minimum, document your dependencies by reporting the packages and versions used to run your analysis.\nVersion control: use a version control system (e.g., Git) and upload your code to a code repository (see Lesson 5).\nIntegrated development environments (e.g., RStudio, PyCharm) offer tools for writing, testing, and debugging code.\nLeverage curated pipelines, such as those developed by the nf-core community, to follow community standards and guidelines.\nAdd a LICENSE file to clarify usage permissions, and update the project regularly to facilitate collaboration.\n\n\n\n\n\n\n\nPractical HPC pipes\n\n\n\nWe provide a hands-on workshop on computational environments and pipelines. 
Keep an eye out for upcoming events on the Sandbox website. If you’re interested in delving deeper, check out the HPC best practices module we’ve developed here.",
"crumbs": [
"Course material",
"Key practices",
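The indexed page text in the search.json diff above points to R's sessionInfo() for recording the computational environment and asks for something similar in Python. Python has no built-in equivalent, so the following is only a minimal standard-library sketch. It assumes Python 3.8+ (for importlib.metadata). The function name describe_environment and the example package list are hypothetical and not part of the course material.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def describe_environment(packages):
    """Summarize the interpreter, operating system, and selected package versions.

    Hypothetical helper: a rough Python counterpart to R's sessionInfo().
    """
    info = {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": {},
    }
    for name in packages:
        try:
            info["packages"][name] = version(name)
        except PackageNotFoundError:
            # Record missing packages explicitly instead of failing.
            info["packages"][name] = None
    return info


if __name__ == "__main__":
    # Hypothetical package list; replace with the packages your analysis imports.
    report = describe_environment(["numpy", "pandas", "snakemake"])
    print(f"Python {report['python']} ({report['implementation']}) on {report['platform']}")
    for name, ver in sorted(report["packages"].items()):
        print(f"  {name}: {ver if ver is not None else 'not installed'}")
```

Saving this output next to the results (for example, in a log file per run) gives readers a lightweight record of the environment. It does not capture system libraries the way a container image does.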
diff --git a/sitemap.xml b/sitemap.xml
index 9e04b9d0..e6521cbb 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -2,74 +2,74 @@
https://hds-sandbox.github.io/RDM_NGS_course/use_cases.html
- 2024-05-09T11:48:15.632Z
+ 2024-05-09T11:48:51.746Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/06_pipelines.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/practical_workshop.html
- 2024-05-09T11:48:15.632Z
+ 2024-05-09T11:48:51.742Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/04_metadata.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/05_VC.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/07_repos.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/examples/proteomics_metadata.html
- 2024-05-09T11:48:15.608Z
+ 2024-05-09T11:48:51.726Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/examples/NGS_management.html
- 2024-05-09T11:48:15.608Z
+ 2024-05-09T11:48:51.726Z
https://hds-sandbox.github.io/RDM_NGS_course/cards/AlbaMartinez.html
- 2024-05-09T11:48:15.588Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/cards/JARomero.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/examples/mkdocs_pages.html
- 2024-05-09T11:48:15.608Z
+ 2024-05-09T11:48:51.726Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/examples/NGS_OS_FAIR.html
- 2024-05-09T11:48:15.608Z
+ 2024-05-09T11:48:51.726Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/examples/NGS_metadata.html
- 2024-05-09T11:48:15.608Z
+ 2024-05-09T11:48:51.726Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/contributors.html
- 2024-05-09T11:48:15.608Z
+ 2024-05-09T11:48:51.722Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/03_DOD.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/01_RDM_intro.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/02_DMP.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/index.html
- 2024-05-09T11:48:15.632Z
+ 2024-05-09T11:48:51.742Z
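The page text indexed in the search.json diff recommends documenting dependencies by reporting the packages and versions used to run an analysis. Running pip freeze > requirements.txt does this for a whole environment. The sketch below is a hypothetical alternative, assuming Python 3.8+, that pins only a chosen list of direct dependencies via importlib.metadata. It skips anything that is not installed rather than guessing a version. The function name pinned_requirements is illustrative only.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(packages):
    """Return requirements.txt-style 'name==version' lines for installed packages.

    Hypothetical helper: packages that are not installed are skipped.
    """
    lines = []
    for name in sorted(packages, key=str.lower):
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Skip rather than guess a version for a missing package.
            continue
    return lines


if __name__ == "__main__":
    # Hypothetical direct dependencies; redirect stdout to requirements.txt.
    print("\n".join(pinned_requirements(["pandas", "numpy"])))
```

If saved as a script (the name pin_deps.py is hypothetical), it could be run as python pin_deps.py > requirements.txt. Pinning only direct dependencies keeps the file short, but transitive dependencies can still drift. A full pip freeze or a conda environment.yml records more of the environment.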
-
-
@@ -1376,23 +1376,23 @@ Assay metadata field
Assay metadata field
-
-
@@ -1980,23 +1980,23 @@ Assay metadata field
-
-
diff --git a/search.json b/search.json
index 1a71e96c..bc645fb3 100644
--- a/search.json
+++ b/search.json
@@ -15,7 +15,7 @@
"href": "develop/06_pipelines.html",
"title": "6. Processing and analyzing biodata",
"section": "",
- "text": "In this section, we explore essential elements of reproducibility and efficiency in computational research, highlighting techniques and tools for creating robust and transparent coding and workflows. By prioritizing reproducibility and replicability, researchers can enhance the credibility and impact of their findings while fostering collaboration and knowledge dissemination within the scientific community.\n\n\n\n\n\n\nBefore you start…\n\n\n\n\nChoose a folder structure (e.g., using cookiecutter)\nChoose a file naming system\nAdd a README describing the project (and the naming conventions)\nInstall and set up version control (e.g., Git and Github)\nChoose a coding style!\n\n\nPython: Python’s PEP or Google’s style guide\nR: Google’s style guide or Tidyverse’s style guide\n\n\n\n\n\nThrough techniques such as scripting, containerization (e.g., Docker), and virtual environments, researchers can create reproducible analyses that enable others to validate and build upon their work. Emphasizing the documentation of data processing steps, parameters, and results ensures transparency and accountability in research outputs. To write clear and reproducible code, take the following approach: write functions, code defensively (such as input validation, error handling, etc.), add comments, conduct testing, and maintain proper documentation.\nTools for reproducibility:\n\nCode notebooks: Utilize tools like Jupyter Notebook and R Markdown to combine code with descriptive text and visualizations, enhancing data documentation.\n\nIntegrated development environments: Consider using platforms such as (knitr or MLflow) to streamline code development and documentation processes.\nPipeline frameworks or workflow management systems: Implement systems like Nextflow and Snakemake to automate data analysis steps (including data extraction, transformation, validation, visualization, and more). 
Additionally, they contribute to ensuring interoperability by facilitating seamless integration and interaction between different components or stages.\n\n\n\nComputational notebooks (e.g., Jupyter, R Markdown) provide researchers with a versatile platform for exploratory and interactive data analysis. These notebooks facilitate sharing insights with collaborators and documentation of analysis procedures.\n\n\n\nTools such as Nextflow and Snakemake streamline and automate various data analysis steps, enabling parallel processing and seamless integration with existing tools. Remember to create portable code and use relative paths to ensure transferability between users.\n\nNextflow: offers scalable and portable NGS data analysis pipelines, facilitating data processing across diverse computing environments.\nSnakemake: Utilizing Python-based scripting, Snakemake allows for flexible and automated NGS data analysis pipelines, supporting parallel processing and integration with other tools.\n\nOnce your scientific computational workflow is ready to be shared, publish your scientific computational workflow on WorkflowHub.\n\n\n\nEach computer or HPC (High-Performance Computing) platform has a unique computational environment that includes its operating system, installed software, versions of software packages, and other features. If a research project is moved to a different computer or platform, the analysis might not run or produce consistent results if it depends on any of these factors.\nFor research to be reproducible, the original computational environment must be recorded so others can replicate it. 
There are several methods to achieve this:\n\nContainerization platforms (e.g., Docker, Singularity): allow the researcher to package their software and dependencies into a standardized container image.\nVirtual Machines (e.g., VirtualBox): can share an entire virtualized computing environment (OS, software and dependencies)\nEnvironment managers: provide an isolated environment with specific packages and dependencies that can be installed without affecting the system-wide configuration. These environments are particularly useful for managing conflicting dependencies and ensuring reproducibility. Configuration files can automate the setup of the computational environment:\n\nconda: allows users to export environment specifications (software and dependencies) to YAML files enabling easy recreation of the environment on another system\nPython virtualenv: is a tool for creating isolated environments to manage dependencies specific to a project\nrequirements.txt: may contain commands for installing packages (such as pip for Python packages or apt-get for system-level dependencies), configuring system settings, and setting environment variables. Package managers can be used to install, upgrade and manage packages.\nR’s renv: The ‘renv’ package creates isolated environments in R.\n\nEnvironment descriptors\n\nsessionInfo() or devtools::session_info(): In R, these functions provide detailed information about the current session\nsessionInfo(), similarly, in Python. 
Libraries like NumPy and Pandas have show_versions() methods to display package versions.\n\n\nWhile environment managers are very easy to use and share across different systems, and are lightweight and efficient, offering fast start-up times, Docker containers provide a full env isolation (including the operating system) which ensures consistent behavior across different systems.\n\n\n\n\nTo maintain clarity and organization in the data analysis process, adopt best practices such as:\n\nData documentation: create a README.md file to provide an overview of the project and its structure, and metadata for understanding the context of your analysis.\nAnnotate your pipelines and comment your code (look for tutorials and templates such as this one from freeCodeCamp).\nUse coding style guides (code lay-out, whitespace in expressions, comments, naming conventions, annotations…) to maintain consistency.\nLabel files numerically to organize the entire data analysis process (scripts, notebooks, pipelines, etc.).\n\n00.preprocessing., 01.data_analysis_step1., etc.\n\nProvide environment files for reproducing the computational environment (such as ‘requirements.txt’ for Python or ‘environment.yml’ for Conda). The simplest way is to document the dependencies by reporting the packages and their versions used to run your analysis.\nData versioning: use version control systems (e.g., Git) and upload your code to a code repository Lesson 5.\nIntegrated development environments (e.g., RStudio, PyCharm) offer tools and features for writing, testing, and debugging code\nLeverage curated pipelines such as the ones developed by the nf-core community, further ensuring adherence to community standards and guidelines.\nAdd a LICENSE file and perform regular updates: clarifying usage permissions and facilitating collaboration.\n\n\n\n\n\n\n\nPractical HPC pipes\n\n\n\nWe provide a hand-on workshop on computational environments and pipelines. 
Keep an eye on the upcoming events on the Sandbox website.\nIf you’re interested in delving deeper, check out the HPC best practices module we’ve developed here.",
+ "text": "In this section, we explore essential elements of reproducibility and efficiency in computational research, highlighting techniques and tools for creating robust and transparent code and workflows. By prioritizing reproducibility and replicability, researchers can enhance the credibility and impact of their findings while fostering collaboration and knowledge dissemination within the scientific community.\n\n\n\n\n\n\nBefore you start…\n\n\n\n\nChoose a folder structure (e.g., using cookiecutter)\nChoose a file naming system\nAdd a README describing the project (and the naming conventions)\nInstall and set up version control (e.g., Git and GitHub)\nChoose a coding style!\n\n\nPython: PEP 8 or Google’s style guide\nR: Google’s style guide or the tidyverse style guide\n\n\n\n\n\nThrough techniques such as scripting, containerization (e.g., Docker), and virtual environments, researchers can create reproducible analyses that enable others to validate and build upon their work. Documenting data processing steps, parameters, and results ensures transparency and accountability in research outputs. To write clear and reproducible code, write functions, code defensively (e.g., validate inputs and handle errors), add comments, test your code, and maintain proper documentation.\nTools for reproducibility:\n\nCode notebooks: Use tools like Jupyter Notebook and R Markdown to combine code with descriptive text and visualizations, enhancing data documentation.\n\nReport generation and experiment tracking: Consider tools such as knitr (dynamic reports from R code) or MLflow (tracking of machine learning experiments, parameters, and results) to streamline code development and documentation.\nPipeline frameworks or workflow management systems: Use systems like Nextflow and Snakemake to automate data analysis steps (including data extraction, transformation, validation, visualization, and more). 
They also support interoperability by integrating the different components or stages of an analysis.\n\n\n\nComputational notebooks (e.g., Jupyter, R Markdown) provide researchers with a versatile platform for exploratory and interactive data analysis. They make it easy to share insights with collaborators and to document analysis procedures.\n\n\n\nTools such as Nextflow and Snakemake streamline and automate data analysis steps, enabling parallel processing and integration with existing tools. Write portable code and use relative paths so that the analysis can be transferred between users and systems.\n\nNextflow: offers scalable and portable NGS data analysis pipelines that run across diverse computing environments.\nSnakemake: built on a Python-based workflow language, Snakemake enables flexible, automated NGS data analysis pipelines, supporting parallel processing and integration with other tools.\n\nOnce your computational workflow is ready to be shared, publish it on WorkflowHub.\n\n\n\nEach computer or HPC (High-Performance Computing) platform has a unique computational environment, including its operating system, installed software, software package versions, and other features. If a research project is moved to a different computer or platform and depends on any of these factors, the analysis might not run or might produce inconsistent results.\nFor research to be reproducible, the original computational environment must be recorded so that others can replicate it. 
There are several ways to achieve this:\n\nContainerization platforms (e.g., Docker, Singularity): allow researchers to package their software and dependencies into a standardized container image.\nVirtual machines (e.g., VirtualBox): package an entire virtualized computing environment (OS, software, and dependencies) that can be shared.\nEnvironment managers: provide an isolated environment with specific packages and dependencies that can be installed without affecting the system-wide configuration. These environments are particularly useful for managing conflicting dependencies and ensuring reproducibility. Configuration files can automate the setup of the computational environment:\n\nconda: exports environment specifications (software and dependencies) to YAML files, so the environment can be recreated on another system.\nPython virtualenv: creates isolated environments to manage project-specific dependencies.\nrequirements.txt: lists the Python packages a project needs, ideally pinned to exact versions, so that pip can reinstall them (pip install -r requirements.txt). System-level dependencies (e.g., installed with apt-get) and environment variables are not captured by this file and must be documented separately, for example in a Dockerfile or setup script. Package managers can be used to install, upgrade, and manage packages.\nR’s renv: creates isolated, project-specific package libraries in R and records package versions in a lockfile (renv.lock).\n\nEnvironment descriptors\n\nsessionInfo() or devtools::session_info(): in R, these functions report detailed information about the current session, including the R version, platform, and loaded packages with their versions.\nPython has no built-in sessionInfo(); the third-party session_info package (session_info.show()) provides similar output. 
pandas provides pd.show_versions() to display the versions of pandas and its dependencies, and NumPy provides np.show_config() to display its build and configuration information.\n\n\nEnvironment managers are easy to use and share across systems, lightweight, and fast to start. Docker containers go further: they also isolate the operating system’s user space (system libraries and tools), which gives more consistent behavior across different systems.\n\n\n\n\nTo maintain clarity and organization in the data analysis process, adopt best practices such as:\n\nData documentation: create a README.md file that gives an overview of the project and its structure, and provide metadata that explains the context of your analysis.\nAnnotate your pipelines and comment your code (look for tutorials and templates such as this one from freeCodeCamp).\nUse coding style guides (code layout, whitespace in expressions, comments, naming conventions, annotations…) to maintain consistency.\nLabel files numerically to organize the entire data analysis process (scripts, notebooks, pipelines, etc.).\n\n00.preprocessing., 01.data_analysis_step1., etc.\n\nProvide environment files for reproducing the computational environment (such as ‘requirements.txt’ for Python or ‘environment.yml’ for Conda). At a minimum, document the dependencies by reporting the packages and versions used to run your analysis.\nVersion control: use a version control system (e.g., Git) and upload your code to a code repository (see Lesson 5).\nIntegrated development environments (e.g., RStudio, PyCharm) offer tools and features for writing, testing, and debugging code.\nLeverage curated pipelines such as those developed by the nf-core community to ensure adherence to community standards and guidelines.\nAdd a LICENSE file to clarify usage permissions, and update the project regularly to facilitate collaboration.\n\n\n\n\n\n\n\nPractical HPC pipes\n\n\n\nWe provide a hands-on workshop on computational environments and pipelines. 
Keep an eye on the upcoming events on the Sandbox website. If you’re interested in delving deeper, check out the HPC best practices module we’ve developed here.",
"crumbs": [
"Course material",
"Key practices",
@@ -27,7 +27,7 @@
"href": "develop/06_pipelines.html#code-and-pipelines-for-data-analysis",
"title": "6. Processing and analyzing biodata",
"section": "",
- "text": "In this section, we explore essential elements of reproducibility and efficiency in computational research, highlighting techniques and tools for creating robust and transparent coding and workflows. By prioritizing reproducibility and replicability, researchers can enhance the credibility and impact of their findings while fostering collaboration and knowledge dissemination within the scientific community.\n\n\n\n\n\n\nBefore you start…\n\n\n\n\nChoose a folder structure (e.g., using cookiecutter)\nChoose a file naming system\nAdd a README describing the project (and the naming conventions)\nInstall and set up version control (e.g., Git and Github)\nChoose a coding style!\n\n\nPython: Python’s PEP or Google’s style guide\nR: Google’s style guide or Tidyverse’s style guide\n\n\n\n\n\nThrough techniques such as scripting, containerization (e.g., Docker), and virtual environments, researchers can create reproducible analyses that enable others to validate and build upon their work. Emphasizing the documentation of data processing steps, parameters, and results ensures transparency and accountability in research outputs. To write clear and reproducible code, take the following approach: write functions, code defensively (such as input validation, error handling, etc.), add comments, conduct testing, and maintain proper documentation.\nTools for reproducibility:\n\nCode notebooks: Utilize tools like Jupyter Notebook and R Markdown to combine code with descriptive text and visualizations, enhancing data documentation.\n\nIntegrated development environments: Consider using platforms such as (knitr or MLflow) to streamline code development and documentation processes.\nPipeline frameworks or workflow management systems: Implement systems like Nextflow and Snakemake to automate data analysis steps (including data extraction, transformation, validation, visualization, and more). 
Additionally, they contribute to ensuring interoperability by facilitating seamless integration and interaction between different components or stages.\n\n\n\nComputational notebooks (e.g., Jupyter, R Markdown) provide researchers with a versatile platform for exploratory and interactive data analysis. These notebooks facilitate sharing insights with collaborators and documentation of analysis procedures.\n\n\n\nTools such as Nextflow and Snakemake streamline and automate various data analysis steps, enabling parallel processing and seamless integration with existing tools. Remember to create portable code and use relative paths to ensure transferability between users.\n\nNextflow: offers scalable and portable NGS data analysis pipelines, facilitating data processing across diverse computing environments.\nSnakemake: Utilizing Python-based scripting, Snakemake allows for flexible and automated NGS data analysis pipelines, supporting parallel processing and integration with other tools.\n\nOnce your scientific computational workflow is ready to be shared, publish your scientific computational workflow on WorkflowHub.\n\n\n\nEach computer or HPC (High-Performance Computing) platform has a unique computational environment that includes its operating system, installed software, versions of software packages, and other features. If a research project is moved to a different computer or platform, the analysis might not run or produce consistent results if it depends on any of these factors.\nFor research to be reproducible, the original computational environment must be recorded so others can replicate it. 
There are several methods to achieve this:\n\nContainerization platforms (e.g., Docker, Singularity): allow the researcher to package their software and dependencies into a standardized container image.\nVirtual Machines (e.g., VirtualBox): can share an entire virtualized computing environment (OS, software and dependencies)\nEnvironment managers: provide an isolated environment with specific packages and dependencies that can be installed without affecting the system-wide configuration. These environments are particularly useful for managing conflicting dependencies and ensuring reproducibility. Configuration files can automate the setup of the computational environment:\n\nconda: allows users to export environment specifications (software and dependencies) to YAML files enabling easy recreation of the environment on another system\nPython virtualenv: is a tool for creating isolated environments to manage dependencies specific to a project\nrequirements.txt: may contain commands for installing packages (such as pip for Python packages or apt-get for system-level dependencies), configuring system settings, and setting environment variables. Package managers can be used to install, upgrade and manage packages.\nR’s renv: The ‘renv’ package creates isolated environments in R.\n\nEnvironment descriptors\n\nsessionInfo() or devtools::session_info(): In R, these functions provide detailed information about the current session\nsessionInfo(), similarly, in Python. 
Libraries like NumPy and Pandas have show_versions() methods to display package versions.\n\n\nWhile environment managers are very easy to use and share across different systems, and are lightweight and efficient, offering fast start-up times, Docker containers provide a full env isolation (including the operating system) which ensures consistent behavior across different systems.\n\n\n\n\nTo maintain clarity and organization in the data analysis process, adopt best practices such as:\n\nData documentation: create a README.md file to provide an overview of the project and its structure, and metadata for understanding the context of your analysis.\nAnnotate your pipelines and comment your code (look for tutorials and templates such as this one from freeCodeCamp).\nUse coding style guides (code lay-out, whitespace in expressions, comments, naming conventions, annotations…) to maintain consistency.\nLabel files numerically to organize the entire data analysis process (scripts, notebooks, pipelines, etc.).\n\n00.preprocessing., 01.data_analysis_step1., etc.\n\nProvide environment files for reproducing the computational environment (such as ‘requirements.txt’ for Python or ‘environment.yml’ for Conda). The simplest way is to document the dependencies by reporting the packages and their versions used to run your analysis.\nData versioning: use version control systems (e.g., Git) and upload your code to a code repository Lesson 5.\nIntegrated development environments (e.g., RStudio, PyCharm) offer tools and features for writing, testing, and debugging code\nLeverage curated pipelines such as the ones developed by the nf-core community, further ensuring adherence to community standards and guidelines.\nAdd a LICENSE file and perform regular updates: clarifying usage permissions and facilitating collaboration.\n\n\n\n\n\n\n\nPractical HPC pipes\n\n\n\nWe provide a hand-on workshop on computational environments and pipelines. 
Keep an eye on the upcoming events on the Sandbox website.\nIf you’re interested in delving deeper, check out the HPC best practices module we’ve developed here.",
+ "text": "In this section, we explore essential elements of reproducibility and efficiency in computational research, highlighting techniques and tools for creating robust and transparent code and workflows. By prioritizing reproducibility and replicability, researchers can enhance the credibility and impact of their findings while fostering collaboration and knowledge dissemination within the scientific community.\n\n\n\n\n\n\nBefore you start…\n\n\n\n\nChoose a folder structure (e.g., using cookiecutter)\nChoose a file naming system\nAdd a README describing the project (and the naming conventions)\nInstall and set up version control (e.g., Git and GitHub)\nChoose a coding style!\n\n\nPython: PEP 8 or Google’s style guide\nR: Google’s style guide or the tidyverse style guide\n\n\n\n\n\nThrough techniques such as scripting, containerization (e.g., Docker), and virtual environments, researchers can create reproducible analyses that enable others to validate and build upon their work. Documenting data processing steps, parameters, and results ensures transparency and accountability in research outputs. To write clear and reproducible code, write functions, code defensively (e.g., validate inputs and handle errors), add comments, test your code, and maintain proper documentation.\nTools for reproducibility:\n\nCode notebooks: Use tools like Jupyter Notebook and R Markdown to combine code with descriptive text and visualizations, enhancing data documentation.\n\nReport generation and experiment tracking: Consider tools such as knitr (dynamic reports from R code) or MLflow (tracking of machine learning experiments, parameters, and results) to streamline code development and documentation.\nPipeline frameworks or workflow management systems: Use systems like Nextflow and Snakemake to automate data analysis steps (including data extraction, transformation, validation, visualization, and more). 
They also support interoperability by integrating the different components or stages of an analysis.\n\n\n\nComputational notebooks (e.g., Jupyter, R Markdown) provide researchers with a versatile platform for exploratory and interactive data analysis. They make it easy to share insights with collaborators and to document analysis procedures.\n\n\n\nTools such as Nextflow and Snakemake streamline and automate data analysis steps, enabling parallel processing and integration with existing tools. Write portable code and use relative paths so that the analysis can be transferred between users and systems.\n\nNextflow: offers scalable and portable NGS data analysis pipelines that run across diverse computing environments.\nSnakemake: built on a Python-based workflow language, Snakemake enables flexible, automated NGS data analysis pipelines, supporting parallel processing and integration with other tools.\n\nOnce your computational workflow is ready to be shared, publish it on WorkflowHub.\n\n\n\nEach computer or HPC (High-Performance Computing) platform has a unique computational environment, including its operating system, installed software, software package versions, and other features. If a research project is moved to a different computer or platform and depends on any of these factors, the analysis might not run or might produce inconsistent results.\nFor research to be reproducible, the original computational environment must be recorded so that others can replicate it. 
There are several ways to achieve this:\n\nContainerization platforms (e.g., Docker, Singularity): allow researchers to package their software and dependencies into a standardized container image.\nVirtual machines (e.g., VirtualBox): package an entire virtualized computing environment (OS, software, and dependencies) that can be shared.\nEnvironment managers: provide an isolated environment with specific packages and dependencies that can be installed without affecting the system-wide configuration. These environments are particularly useful for managing conflicting dependencies and ensuring reproducibility. Configuration files can automate the setup of the computational environment:\n\nconda: exports environment specifications (software and dependencies) to YAML files, so the environment can be recreated on another system.\nPython virtualenv: creates isolated environments to manage project-specific dependencies.\nrequirements.txt: lists the Python packages a project needs, ideally pinned to exact versions, so that pip can reinstall them (pip install -r requirements.txt). System-level dependencies (e.g., installed with apt-get) and environment variables are not captured by this file and must be documented separately, for example in a Dockerfile or setup script. Package managers can be used to install, upgrade, and manage packages.\nR’s renv: creates isolated, project-specific package libraries in R and records package versions in a lockfile (renv.lock).\n\nEnvironment descriptors\n\nsessionInfo() or devtools::session_info(): in R, these functions report detailed information about the current session, including the R version, platform, and loaded packages with their versions.\nPython has no built-in sessionInfo(); the third-party session_info package (session_info.show()) provides similar output. 
pandas provides pd.show_versions() to display the versions of pandas and its dependencies, and NumPy provides np.show_config() to display its build and configuration information.\n\n\nEnvironment managers are easy to use and share across systems, lightweight, and fast to start. Docker containers go further: they also isolate the operating system’s user space (system libraries and tools), which gives more consistent behavior across different systems.\n\n\n\n\nTo maintain clarity and organization in the data analysis process, adopt best practices such as:\n\nData documentation: create a README.md file that gives an overview of the project and its structure, and provide metadata that explains the context of your analysis.\nAnnotate your pipelines and comment your code (look for tutorials and templates such as this one from freeCodeCamp).\nUse coding style guides (code layout, whitespace in expressions, comments, naming conventions, annotations…) to maintain consistency.\nLabel files numerically to organize the entire data analysis process (scripts, notebooks, pipelines, etc.).\n\n00.preprocessing., 01.data_analysis_step1., etc.\n\nProvide environment files for reproducing the computational environment (such as ‘requirements.txt’ for Python or ‘environment.yml’ for Conda). At a minimum, document the dependencies by reporting the packages and versions used to run your analysis.\nVersion control: use a version control system (e.g., Git) and upload your code to a code repository (see Lesson 5).\nIntegrated development environments (e.g., RStudio, PyCharm) offer tools and features for writing, testing, and debugging code.\nLeverage curated pipelines such as those developed by the nf-core community to ensure adherence to community standards and guidelines.\nAdd a LICENSE file to clarify usage permissions, and update the project regularly to facilitate collaboration.\n\n\n\n\n\n\n\nPractical HPC pipes\n\n\n\nWe provide a hands-on workshop on computational environments and pipelines. 
Keep an eye on the upcoming events on the Sandbox website. If you’re interested in delving deeper, check out the HPC best practices module we’ve developed here.",
"crumbs": [
"Course material",
"Key practices",
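A minimal sketch, in Python, of the course text’s advice to document dependencies by reporting the packages and versions used to run an analysis. It relies only on the standard library (importlib.metadata, platform, sys). The functions pinned_versions and environment_report are hypothetical helpers written for this illustration, and the package list at the bottom is only an example. The output merely mimics pinned requirements.txt lines; it is not produced by pip.

```python
"""Record interpreter, platform, and package versions (illustrative sketch)."""
from importlib import metadata
import platform
import sys


def pinned_versions(packages):
    """Return 'name==version' lines for each installed package.

    Packages that are not installed are reported as comment lines instead
    of raising an error, so the report can always be written.
    """
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"# {name}: not installed")
    return lines


def environment_report(packages):
    """Combine interpreter and OS information with pinned package versions."""
    header = [
        f"# Python {platform.python_version()} ({sys.implementation.name})",
        f"# Platform: {platform.platform()}",
    ]
    return "\n".join(header + pinned_versions(packages))


if __name__ == "__main__":
    # Hypothetical package list: replace it with the packages your analysis imports.
    print(environment_report(["pip", "numpy", "pandas"]))
```

If you save this report alongside your results, you have a record of the versions that produced them. Commands such as pip freeze or conda env export capture the full environment instead of a hand-picked list.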
diff --git a/sitemap.xml b/sitemap.xml
index 9e04b9d0..e6521cbb 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -2,74 +2,74 @@
https://hds-sandbox.github.io/RDM_NGS_course/use_cases.html
- 2024-05-09T11:48:15.632Z
+ 2024-05-09T11:48:51.746Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/06_pipelines.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/practical_workshop.html
- 2024-05-09T11:48:15.632Z
+ 2024-05-09T11:48:51.742Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/04_metadata.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/05_VC.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/07_repos.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/examples/proteomics_metadata.html
- 2024-05-09T11:48:15.608Z
+ 2024-05-09T11:48:51.726Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/examples/NGS_management.html
- 2024-05-09T11:48:15.608Z
+ 2024-05-09T11:48:51.726Z
https://hds-sandbox.github.io/RDM_NGS_course/cards/AlbaMartinez.html
- 2024-05-09T11:48:15.588Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/cards/JARomero.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/examples/mkdocs_pages.html
- 2024-05-09T11:48:15.608Z
+ 2024-05-09T11:48:51.726Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/examples/NGS_OS_FAIR.html
- 2024-05-09T11:48:15.608Z
+ 2024-05-09T11:48:51.726Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/examples/NGS_metadata.html
- 2024-05-09T11:48:15.608Z
+ 2024-05-09T11:48:51.726Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/contributors.html
- 2024-05-09T11:48:15.608Z
+ 2024-05-09T11:48:51.722Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/03_DOD.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/01_RDM_intro.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/02_DMP.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/index.html
- 2024-05-09T11:48:15.632Z
+ 2024-05-09T11:48:51.742Z
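A minimal Python sketch of two recommendations from the course text: coding defensively (input validation, error handling) and using relative paths so code transfers between users. Everything specific here is a hypothetical choice made for illustration: the tab-separated sample sheet, the column names sample_id and condition, and the function load_samples. Relative paths are resolved against a project root, which defaults to the current working directory.

```python
"""Defensive loading of a sample sheet with paths resolved relative to a project root.

Hypothetical example: the TSV layout and column names are illustrative only.
"""
import csv
from pathlib import Path

# Hypothetical columns that every sample sheet must contain.
REQUIRED_COLUMNS = {"sample_id", "condition"}


def load_samples(path, root=None):
    """Read a tab-separated sample sheet and validate it before analysis.

    Relative paths are resolved against `root` (default: the current working
    directory), so no user-specific absolute path is hard-coded.
    """
    root = Path.cwd() if root is None else Path(root)
    path = Path(path)
    if not path.is_absolute():
        path = root / path

    # Fail early with clear messages instead of crashing later in the analysis.
    if not path.is_file():
        raise FileNotFoundError(f"Sample sheet not found: {path}")
    with path.open(newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"{path.name} is missing columns: {sorted(missing)}")
        rows = list(reader)

    if not rows:
        raise ValueError(f"{path.name} contains no samples")
    ids = [row["sample_id"] for row in rows]
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    if duplicates:
        raise ValueError(f"Duplicate sample_id values: {duplicates}")
    return rows
```

A call such as load_samples("data/samples.tsv") would then work for anyone who runs the code from the project directory. The checks raise errors before any expensive processing starts, which makes problems easier to trace in long pipelines.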
-
-
@@ -1980,23 +1980,23 @@ Assay metadata field
Assay metadata field
-
-
diff --git a/search.json b/search.json
index 1a71e96c..bc645fb3 100644
--- a/search.json
+++ b/search.json
@@ -15,7 +15,7 @@
"href": "develop/06_pipelines.html",
"title": "6. Processing and analyzing biodata",
"section": "",
- "text": "In this section, we explore essential elements of reproducibility and efficiency in computational research, highlighting techniques and tools for creating robust and transparent coding and workflows. By prioritizing reproducibility and replicability, researchers can enhance the credibility and impact of their findings while fostering collaboration and knowledge dissemination within the scientific community.\n\n\n\n\n\n\nBefore you start…\n\n\n\n\nChoose a folder structure (e.g., using cookiecutter)\nChoose a file naming system\nAdd a README describing the project (and the naming conventions)\nInstall and set up version control (e.g., Git and Github)\nChoose a coding style!\n\n\nPython: Python’s PEP or Google’s style guide\nR: Google’s style guide or Tidyverse’s style guide\n\n\n\n\n\nThrough techniques such as scripting, containerization (e.g., Docker), and virtual environments, researchers can create reproducible analyses that enable others to validate and build upon their work. Emphasizing the documentation of data processing steps, parameters, and results ensures transparency and accountability in research outputs. To write clear and reproducible code, take the following approach: write functions, code defensively (such as input validation, error handling, etc.), add comments, conduct testing, and maintain proper documentation.\nTools for reproducibility:\n\nCode notebooks: Utilize tools like Jupyter Notebook and R Markdown to combine code with descriptive text and visualizations, enhancing data documentation.\n\nIntegrated development environments: Consider using platforms such as (knitr or MLflow) to streamline code development and documentation processes.\nPipeline frameworks or workflow management systems: Implement systems like Nextflow and Snakemake to automate data analysis steps (including data extraction, transformation, validation, visualization, and more). 
Additionally, they contribute to ensuring interoperability by facilitating seamless integration and interaction between different components or stages.\n\n\n\nComputational notebooks (e.g., Jupyter, R Markdown) provide researchers with a versatile platform for exploratory and interactive data analysis. These notebooks facilitate sharing insights with collaborators and documentation of analysis procedures.\n\n\n\nTools such as Nextflow and Snakemake streamline and automate various data analysis steps, enabling parallel processing and seamless integration with existing tools. Remember to create portable code and use relative paths to ensure transferability between users.\n\nNextflow: offers scalable and portable NGS data analysis pipelines, facilitating data processing across diverse computing environments.\nSnakemake: Utilizing Python-based scripting, Snakemake allows for flexible and automated NGS data analysis pipelines, supporting parallel processing and integration with other tools.\n\nOnce your scientific computational workflow is ready to be shared, publish your scientific computational workflow on WorkflowHub.\n\n\n\nEach computer or HPC (High-Performance Computing) platform has a unique computational environment that includes its operating system, installed software, versions of software packages, and other features. If a research project is moved to a different computer or platform, the analysis might not run or produce consistent results if it depends on any of these factors.\nFor research to be reproducible, the original computational environment must be recorded so others can replicate it. 
There are several methods to achieve this:\n\nContainerization platforms (e.g., Docker, Singularity): allow the researcher to package their software and dependencies into a standardized container image.\nVirtual Machines (e.g., VirtualBox): can share an entire virtualized computing environment (OS, software and dependencies)\nEnvironment managers: provide an isolated environment with specific packages and dependencies that can be installed without affecting the system-wide configuration. These environments are particularly useful for managing conflicting dependencies and ensuring reproducibility. Configuration files can automate the setup of the computational environment:\n\nconda: allows users to export environment specifications (software and dependencies) to YAML files enabling easy recreation of the environment on another system\nPython virtualenv: is a tool for creating isolated environments to manage dependencies specific to a project\nrequirements.txt: may contain commands for installing packages (such as pip for Python packages or apt-get for system-level dependencies), configuring system settings, and setting environment variables. Package managers can be used to install, upgrade and manage packages.\nR’s renv: The ‘renv’ package creates isolated environments in R.\n\nEnvironment descriptors\n\nsessionInfo() or devtools::session_info(): In R, these functions provide detailed information about the current session\nsessionInfo(), similarly, in Python. 
Libraries like NumPy and Pandas have show_versions() methods to display package versions.\n\n\nWhile environment managers are very easy to use and share across different systems, and are lightweight and efficient, offering fast start-up times, Docker containers provide a full env isolation (including the operating system) which ensures consistent behavior across different systems.\n\n\n\n\nTo maintain clarity and organization in the data analysis process, adopt best practices such as:\n\nData documentation: create a README.md file to provide an overview of the project and its structure, and metadata for understanding the context of your analysis.\nAnnotate your pipelines and comment your code (look for tutorials and templates such as this one from freeCodeCamp).\nUse coding style guides (code lay-out, whitespace in expressions, comments, naming conventions, annotations…) to maintain consistency.\nLabel files numerically to organize the entire data analysis process (scripts, notebooks, pipelines, etc.).\n\n00.preprocessing., 01.data_analysis_step1., etc.\n\nProvide environment files for reproducing the computational environment (such as ‘requirements.txt’ for Python or ‘environment.yml’ for Conda). The simplest way is to document the dependencies by reporting the packages and their versions used to run your analysis.\nData versioning: use version control systems (e.g., Git) and upload your code to a code repository Lesson 5.\nIntegrated development environments (e.g., RStudio, PyCharm) offer tools and features for writing, testing, and debugging code\nLeverage curated pipelines such as the ones developed by the nf-core community, further ensuring adherence to community standards and guidelines.\nAdd a LICENSE file and perform regular updates: clarifying usage permissions and facilitating collaboration.\n\n\n\n\n\n\n\nPractical HPC pipes\n\n\n\nWe provide a hand-on workshop on computational environments and pipelines. 
Keep an eye on the upcoming events on the Sandbox website.\nIf you’re interested in delving deeper, check out the HPC best practices module we’ve developed here.",
+ "text": "In this section, we explore essential elements of reproducibility and efficiency in computational research, highlighting techniques and tools for creating robust and transparent coding and workflows. By prioritizing reproducibility and replicability, researchers can enhance the credibility and impact of their findings while fostering collaboration and knowledge dissemination within the scientific community.\n\n\n\n\n\n\nBefore you start…\n\n\n\n\nChoose a folder structure (e.g., using cookiecutter)\nChoose a file naming system\nAdd a README describing the project (and the naming conventions)\nInstall and set up version control (e.g., Git and Github)\nChoose a coding style!\n\n\nPython: Python’s PEP or Google’s style guide\nR: Google’s style guide or Tidyverse’s style guide\n\n\n\n\n\nThrough techniques such as scripting, containerization (e.g., Docker), and virtual environments, researchers can create reproducible analyses that enable others to validate and build upon their work. Emphasizing the documentation of data processing steps, parameters, and results ensures transparency and accountability in research outputs. To write clear and reproducible code, take the following approach: write functions, code defensively (such as input validation, error handling, etc.), add comments, conduct testing, and maintain proper documentation.\nTools for reproducibility:\n\nCode notebooks: Utilize tools like Jupyter Notebook and R Markdown to combine code with descriptive text and visualizations, enhancing data documentation.\n\nIntegrated development environments: Consider using platforms such as (knitr or MLflow) to streamline code development and documentation processes.\nPipeline frameworks or workflow management systems: Implement systems like Nextflow and Snakemake to automate data analysis steps (including data extraction, transformation, validation, visualization, and more). 
Additionally, they contribute to ensuring interoperability by facilitating seamless integration and interaction between different components or stages.\n\n\n\nComputational notebooks (e.g., Jupyter, R Markdown) provide researchers with a versatile platform for exploratory and interactive data analysis. These notebooks facilitate sharing insights with collaborators and documentation of analysis procedures.\n\n\n\nTools such as Nextflow and Snakemake streamline and automate various data analysis steps, enabling parallel processing and seamless integration with existing tools. Remember to create portable code and use relative paths to ensure transferability between users.\n\nNextflow: offers scalable and portable NGS data analysis pipelines, facilitating data processing across diverse computing environments.\nSnakemake: Utilizing Python-based scripting, Snakemake allows for flexible and automated NGS data analysis pipelines, supporting parallel processing and integration with other tools.\n\nOnce your scientific computational workflow is ready to be shared, publish your scientific computational workflow on WorkflowHub.\n\n\n\nEach computer or HPC (High-Performance Computing) platform has a unique computational environment that includes its operating system, installed software, versions of software packages, and other features. If a research project is moved to a different computer or platform, the analysis might not run or produce consistent results if it depends on any of these factors.\nFor research to be reproducible, the original computational environment must be recorded so others can replicate it. 
There are several methods to achieve this:\n\nContainerization platforms (e.g., Docker, Singularity): allow the researcher to package their software and dependencies into a standardized container image.\nVirtual Machines (e.g., VirtualBox): can share an entire virtualized computing environment (OS, software and dependencies)\nEnvironment managers: provide an isolated environment with specific packages and dependencies that can be installed without affecting the system-wide configuration. These environments are particularly useful for managing conflicting dependencies and ensuring reproducibility. Configuration files can automate the setup of the computational environment:\n\nconda: allows users to export environment specifications (software and dependencies) to YAML files enabling easy recreation of the environment on another system\nPython virtualenv: is a tool for creating isolated environments to manage dependencies specific to a project\nrequirements.txt: may contain commands for installing packages (such as pip for Python packages or apt-get for system-level dependencies), configuring system settings, and setting environment variables. Package managers can be used to install, upgrade and manage packages.\nR’s renv: The ‘renv’ package creates isolated environments in R.\n\nEnvironment descriptors\n\nsessionInfo() or devtools::session_info(): In R, these functions provide detailed information about the current session\nsessionInfo(), similarly, in Python. 
Libraries like NumPy and Pandas have show_versions() methods to display package versions.\n\n\nWhile environment managers are very easy to use and share across different systems, and are lightweight and efficient, offering fast start-up times, Docker containers provide a full env isolation (including the operating system) which ensures consistent behavior across different systems.\n\n\n\n\nTo maintain clarity and organization in the data analysis process, adopt best practices such as:\n\nData documentation: create a README.md file to provide an overview of the project and its structure, and metadata for understanding the context of your analysis.\nAnnotate your pipelines and comment your code (look for tutorials and templates such as this one from freeCodeCamp).\nUse coding style guides (code lay-out, whitespace in expressions, comments, naming conventions, annotations…) to maintain consistency.\nLabel files numerically to organize the entire data analysis process (scripts, notebooks, pipelines, etc.).\n\n00.preprocessing., 01.data_analysis_step1., etc.\n\nProvide environment files for reproducing the computational environment (such as ‘requirements.txt’ for Python or ‘environment.yml’ for Conda). The simplest way is to document the dependencies by reporting the packages and their versions used to run your analysis.\nData versioning: use version control systems (e.g., Git) and upload your code to a code repository Lesson 5.\nIntegrated development environments (e.g., RStudio, PyCharm) offer tools and features for writing, testing, and debugging code\nLeverage curated pipelines such as the ones developed by the nf-core community, further ensuring adherence to community standards and guidelines.\nAdd a LICENSE file and perform regular updates: clarifying usage permissions and facilitating collaboration.\n\n\n\n\n\n\n\nPractical HPC pipes\n\n\n\nWe provide a hand-on workshop on computational environments and pipelines. 
Keep an eye on the upcoming events on the Sandbox website. If you’re interested in delving deeper, check out the HPC best practices module we’ve developed here.",
"crumbs": [
"Course material",
"Key practices",
@@ -27,7 +27,7 @@
"href": "develop/06_pipelines.html#code-and-pipelines-for-data-analysis",
"title": "6. Processing and analyzing biodata",
"section": "",
- "text": "In this section, we explore essential elements of reproducibility and efficiency in computational research, highlighting techniques and tools for creating robust and transparent coding and workflows. By prioritizing reproducibility and replicability, researchers can enhance the credibility and impact of their findings while fostering collaboration and knowledge dissemination within the scientific community.\n\n\n\n\n\n\nBefore you start…\n\n\n\n\nChoose a folder structure (e.g., using cookiecutter)\nChoose a file naming system\nAdd a README describing the project (and the naming conventions)\nInstall and set up version control (e.g., Git and Github)\nChoose a coding style!\n\n\nPython: Python’s PEP or Google’s style guide\nR: Google’s style guide or Tidyverse’s style guide\n\n\n\n\n\nThrough techniques such as scripting, containerization (e.g., Docker), and virtual environments, researchers can create reproducible analyses that enable others to validate and build upon their work. Emphasizing the documentation of data processing steps, parameters, and results ensures transparency and accountability in research outputs. To write clear and reproducible code, take the following approach: write functions, code defensively (such as input validation, error handling, etc.), add comments, conduct testing, and maintain proper documentation.\nTools for reproducibility:\n\nCode notebooks: Utilize tools like Jupyter Notebook and R Markdown to combine code with descriptive text and visualizations, enhancing data documentation.\n\nIntegrated development environments: Consider using platforms such as (knitr or MLflow) to streamline code development and documentation processes.\nPipeline frameworks or workflow management systems: Implement systems like Nextflow and Snakemake to automate data analysis steps (including data extraction, transformation, validation, visualization, and more). 
Additionally, they contribute to ensuring interoperability by facilitating seamless integration and interaction between different components or stages.\n\n\n\nComputational notebooks (e.g., Jupyter, R Markdown) provide researchers with a versatile platform for exploratory and interactive data analysis. These notebooks facilitate sharing insights with collaborators and documentation of analysis procedures.\n\n\n\nTools such as Nextflow and Snakemake streamline and automate various data analysis steps, enabling parallel processing and seamless integration with existing tools. Remember to create portable code and use relative paths to ensure transferability between users.\n\nNextflow: offers scalable and portable NGS data analysis pipelines, facilitating data processing across diverse computing environments.\nSnakemake: Utilizing Python-based scripting, Snakemake allows for flexible and automated NGS data analysis pipelines, supporting parallel processing and integration with other tools.\n\nOnce your scientific computational workflow is ready to be shared, publish your scientific computational workflow on WorkflowHub.\n\n\n\nEach computer or HPC (High-Performance Computing) platform has a unique computational environment that includes its operating system, installed software, versions of software packages, and other features. If a research project is moved to a different computer or platform, the analysis might not run or produce consistent results if it depends on any of these factors.\nFor research to be reproducible, the original computational environment must be recorded so others can replicate it. 
There are several methods to achieve this:\n\nContainerization platforms (e.g., Docker, Singularity): allow the researcher to package their software and dependencies into a standardized container image.\nVirtual Machines (e.g., VirtualBox): can share an entire virtualized computing environment (OS, software and dependencies)\nEnvironment managers: provide an isolated environment with specific packages and dependencies that can be installed without affecting the system-wide configuration. These environments are particularly useful for managing conflicting dependencies and ensuring reproducibility. Configuration files can automate the setup of the computational environment:\n\nconda: allows users to export environment specifications (software and dependencies) to YAML files enabling easy recreation of the environment on another system\nPython virtualenv: is a tool for creating isolated environments to manage dependencies specific to a project\nrequirements.txt: may contain commands for installing packages (such as pip for Python packages or apt-get for system-level dependencies), configuring system settings, and setting environment variables. Package managers can be used to install, upgrade and manage packages.\nR’s renv: The ‘renv’ package creates isolated environments in R.\n\nEnvironment descriptors\n\nsessionInfo() or devtools::session_info(): In R, these functions provide detailed information about the current session\nsessionInfo(), similarly, in Python. 
Libraries like NumPy and Pandas have show_versions() methods to display package versions.\n\n\nWhile environment managers are very easy to use and share across different systems, and are lightweight and efficient, offering fast start-up times, Docker containers provide a full env isolation (including the operating system) which ensures consistent behavior across different systems.\n\n\n\n\nTo maintain clarity and organization in the data analysis process, adopt best practices such as:\n\nData documentation: create a README.md file to provide an overview of the project and its structure, and metadata for understanding the context of your analysis.\nAnnotate your pipelines and comment your code (look for tutorials and templates such as this one from freeCodeCamp).\nUse coding style guides (code lay-out, whitespace in expressions, comments, naming conventions, annotations…) to maintain consistency.\nLabel files numerically to organize the entire data analysis process (scripts, notebooks, pipelines, etc.).\n\n00.preprocessing., 01.data_analysis_step1., etc.\n\nProvide environment files for reproducing the computational environment (such as ‘requirements.txt’ for Python or ‘environment.yml’ for Conda). The simplest way is to document the dependencies by reporting the packages and their versions used to run your analysis.\nData versioning: use version control systems (e.g., Git) and upload your code to a code repository Lesson 5.\nIntegrated development environments (e.g., RStudio, PyCharm) offer tools and features for writing, testing, and debugging code\nLeverage curated pipelines such as the ones developed by the nf-core community, further ensuring adherence to community standards and guidelines.\nAdd a LICENSE file and perform regular updates: clarifying usage permissions and facilitating collaboration.\n\n\n\n\n\n\n\nPractical HPC pipes\n\n\n\nWe provide a hand-on workshop on computational environments and pipelines. 
Keep an eye on the upcoming events on the Sandbox website.\nIf you’re interested in delving deeper, check out the HPC best practices module we’ve developed here.",
+ "text": "In this section, we explore essential elements of reproducibility and efficiency in computational research, highlighting techniques and tools for creating robust and transparent coding and workflows. By prioritizing reproducibility and replicability, researchers can enhance the credibility and impact of their findings while fostering collaboration and knowledge dissemination within the scientific community.\n\n\n\n\n\n\nBefore you start…\n\n\n\n\nChoose a folder structure (e.g., using cookiecutter)\nChoose a file naming system\nAdd a README describing the project (and the naming conventions)\nInstall and set up version control (e.g., Git and Github)\nChoose a coding style!\n\n\nPython: Python’s PEP or Google’s style guide\nR: Google’s style guide or Tidyverse’s style guide\n\n\n\n\n\nThrough techniques such as scripting, containerization (e.g., Docker), and virtual environments, researchers can create reproducible analyses that enable others to validate and build upon their work. Emphasizing the documentation of data processing steps, parameters, and results ensures transparency and accountability in research outputs. To write clear and reproducible code, take the following approach: write functions, code defensively (such as input validation, error handling, etc.), add comments, conduct testing, and maintain proper documentation.\nTools for reproducibility:\n\nCode notebooks: Utilize tools like Jupyter Notebook and R Markdown to combine code with descriptive text and visualizations, enhancing data documentation.\n\nIntegrated development environments: Consider using platforms such as (knitr or MLflow) to streamline code development and documentation processes.\nPipeline frameworks or workflow management systems: Implement systems like Nextflow and Snakemake to automate data analysis steps (including data extraction, transformation, validation, visualization, and more). 
Additionally, they contribute to ensuring interoperability by facilitating seamless integration and interaction between different components or stages.\n\n\n\nComputational notebooks (e.g., Jupyter, R Markdown) provide researchers with a versatile platform for exploratory and interactive data analysis. These notebooks facilitate sharing insights with collaborators and documentation of analysis procedures.\n\n\n\nTools such as Nextflow and Snakemake streamline and automate various data analysis steps, enabling parallel processing and seamless integration with existing tools. Remember to create portable code and use relative paths to ensure transferability between users.\n\nNextflow: offers scalable and portable NGS data analysis pipelines, facilitating data processing across diverse computing environments.\nSnakemake: Utilizing Python-based scripting, Snakemake allows for flexible and automated NGS data analysis pipelines, supporting parallel processing and integration with other tools.\n\nOnce your scientific computational workflow is ready to be shared, publish your scientific computational workflow on WorkflowHub.\n\n\n\nEach computer or HPC (High-Performance Computing) platform has a unique computational environment that includes its operating system, installed software, versions of software packages, and other features. If a research project is moved to a different computer or platform, the analysis might not run or produce consistent results if it depends on any of these factors.\nFor research to be reproducible, the original computational environment must be recorded so others can replicate it. 
There are several methods to achieve this:\n\nContainerization platforms (e.g., Docker, Singularity): allow the researcher to package their software and dependencies into a standardized container image.\nVirtual Machines (e.g., VirtualBox): can share an entire virtualized computing environment (OS, software and dependencies)\nEnvironment managers: provide an isolated environment with specific packages and dependencies that can be installed without affecting the system-wide configuration. These environments are particularly useful for managing conflicting dependencies and ensuring reproducibility. Configuration files can automate the setup of the computational environment:\n\nconda: allows users to export environment specifications (software and dependencies) to YAML files enabling easy recreation of the environment on another system\nPython virtualenv: is a tool for creating isolated environments to manage dependencies specific to a project\nrequirements.txt: may contain commands for installing packages (such as pip for Python packages or apt-get for system-level dependencies), configuring system settings, and setting environment variables. Package managers can be used to install, upgrade and manage packages.\nR’s renv: The ‘renv’ package creates isolated environments in R.\n\nEnvironment descriptors\n\nsessionInfo() or devtools::session_info(): In R, these functions provide detailed information about the current session\nsessionInfo(), similarly, in Python. 
Libraries like NumPy and Pandas have show_versions() methods to display package versions.\n\n\nWhile environment managers are very easy to use and share across different systems, and are lightweight and efficient, offering fast start-up times, Docker containers provide a full env isolation (including the operating system) which ensures consistent behavior across different systems.\n\n\n\n\nTo maintain clarity and organization in the data analysis process, adopt best practices such as:\n\nData documentation: create a README.md file to provide an overview of the project and its structure, and metadata for understanding the context of your analysis.\nAnnotate your pipelines and comment your code (look for tutorials and templates such as this one from freeCodeCamp).\nUse coding style guides (code lay-out, whitespace in expressions, comments, naming conventions, annotations…) to maintain consistency.\nLabel files numerically to organize the entire data analysis process (scripts, notebooks, pipelines, etc.).\n\n00.preprocessing., 01.data_analysis_step1., etc.\n\nProvide environment files for reproducing the computational environment (such as ‘requirements.txt’ for Python or ‘environment.yml’ for Conda). The simplest way is to document the dependencies by reporting the packages and their versions used to run your analysis.\nData versioning: use version control systems (e.g., Git) and upload your code to a code repository Lesson 5.\nIntegrated development environments (e.g., RStudio, PyCharm) offer tools and features for writing, testing, and debugging code\nLeverage curated pipelines such as the ones developed by the nf-core community, further ensuring adherence to community standards and guidelines.\nAdd a LICENSE file and perform regular updates: clarifying usage permissions and facilitating collaboration.\n\n\n\n\n\n\n\nPractical HPC pipes\n\n\n\nWe provide a hand-on workshop on computational environments and pipelines. 
Keep an eye on the upcoming events on the Sandbox website. If you’re interested in delving deeper, check out the HPC best practices module we’ve developed here.",
"crumbs": [
"Course material",
"Key practices",
diff --git a/sitemap.xml b/sitemap.xml
index 9e04b9d0..e6521cbb 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -2,74 +2,74 @@
https://hds-sandbox.github.io/RDM_NGS_course/use_cases.html
- 2024-05-09T11:48:15.632Z
+ 2024-05-09T11:48:51.746Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/06_pipelines.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/practical_workshop.html
- 2024-05-09T11:48:15.632Z
+ 2024-05-09T11:48:51.742Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/04_metadata.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/05_VC.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/07_repos.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/examples/proteomics_metadata.html
- 2024-05-09T11:48:15.608Z
+ 2024-05-09T11:48:51.726Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/examples/NGS_management.html
- 2024-05-09T11:48:15.608Z
+ 2024-05-09T11:48:51.726Z
https://hds-sandbox.github.io/RDM_NGS_course/cards/AlbaMartinez.html
- 2024-05-09T11:48:15.588Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/cards/JARomero.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/examples/mkdocs_pages.html
- 2024-05-09T11:48:15.608Z
+ 2024-05-09T11:48:51.726Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/examples/NGS_OS_FAIR.html
- 2024-05-09T11:48:15.608Z
+ 2024-05-09T11:48:51.726Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/examples/NGS_metadata.html
- 2024-05-09T11:48:15.608Z
+ 2024-05-09T11:48:51.726Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/contributors.html
- 2024-05-09T11:48:15.608Z
+ 2024-05-09T11:48:51.722Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/03_DOD.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/01_RDM_intro.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/02_DMP.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/index.html
- 2024-05-09T11:48:15.632Z
+ 2024-05-09T11:48:51.742Z
-
-
diff --git a/search.json b/search.json
index 1a71e96c..bc645fb3 100644
--- a/search.json
+++ b/search.json
@@ -15,7 +15,7 @@
"href": "develop/06_pipelines.html",
"title": "6. Processing and analyzing biodata",
"section": "",
- "text": "In this section, we explore essential elements of reproducibility and efficiency in computational research, highlighting techniques and tools for creating robust and transparent coding and workflows. By prioritizing reproducibility and replicability, researchers can enhance the credibility and impact of their findings while fostering collaboration and knowledge dissemination within the scientific community.\n\n\n\n\n\n\nBefore you start…\n\n\n\n\nChoose a folder structure (e.g., using cookiecutter)\nChoose a file naming system\nAdd a README describing the project (and the naming conventions)\nInstall and set up version control (e.g., Git and Github)\nChoose a coding style!\n\n\nPython: Python’s PEP or Google’s style guide\nR: Google’s style guide or Tidyverse’s style guide\n\n\n\n\n\nThrough techniques such as scripting, containerization (e.g., Docker), and virtual environments, researchers can create reproducible analyses that enable others to validate and build upon their work. Emphasizing the documentation of data processing steps, parameters, and results ensures transparency and accountability in research outputs. To write clear and reproducible code, take the following approach: write functions, code defensively (such as input validation, error handling, etc.), add comments, conduct testing, and maintain proper documentation.\nTools for reproducibility:\n\nCode notebooks: Utilize tools like Jupyter Notebook and R Markdown to combine code with descriptive text and visualizations, enhancing data documentation.\n\nIntegrated development environments: Consider using platforms such as (knitr or MLflow) to streamline code development and documentation processes.\nPipeline frameworks or workflow management systems: Implement systems like Nextflow and Snakemake to automate data analysis steps (including data extraction, transformation, validation, visualization, and more). 
Additionally, they contribute to ensuring interoperability by facilitating seamless integration and interaction between different components or stages.\n\n\n\nComputational notebooks (e.g., Jupyter, R Markdown) provide researchers with a versatile platform for exploratory and interactive data analysis. These notebooks facilitate sharing insights with collaborators and documentation of analysis procedures.\n\n\n\nTools such as Nextflow and Snakemake streamline and automate various data analysis steps, enabling parallel processing and seamless integration with existing tools. Remember to create portable code and use relative paths to ensure transferability between users.\n\nNextflow: offers scalable and portable NGS data analysis pipelines, facilitating data processing across diverse computing environments.\nSnakemake: Utilizing Python-based scripting, Snakemake allows for flexible and automated NGS data analysis pipelines, supporting parallel processing and integration with other tools.\n\nOnce your scientific computational workflow is ready to be shared, publish your scientific computational workflow on WorkflowHub.\n\n\n\nEach computer or HPC (High-Performance Computing) platform has a unique computational environment that includes its operating system, installed software, versions of software packages, and other features. If a research project is moved to a different computer or platform, the analysis might not run or produce consistent results if it depends on any of these factors.\nFor research to be reproducible, the original computational environment must be recorded so others can replicate it. 
There are several methods to achieve this:\n\nContainerization platforms (e.g., Docker, Singularity): allow the researcher to package their software and dependencies into a standardized container image.\nVirtual Machines (e.g., VirtualBox): can share an entire virtualized computing environment (OS, software and dependencies)\nEnvironment managers: provide an isolated environment with specific packages and dependencies that can be installed without affecting the system-wide configuration. These environments are particularly useful for managing conflicting dependencies and ensuring reproducibility. Configuration files can automate the setup of the computational environment:\n\nconda: allows users to export environment specifications (software and dependencies) to YAML files enabling easy recreation of the environment on another system\nPython virtualenv: is a tool for creating isolated environments to manage dependencies specific to a project\nrequirements.txt: may contain commands for installing packages (such as pip for Python packages or apt-get for system-level dependencies), configuring system settings, and setting environment variables. Package managers can be used to install, upgrade and manage packages.\nR’s renv: The ‘renv’ package creates isolated environments in R.\n\nEnvironment descriptors\n\nsessionInfo() or devtools::session_info(): In R, these functions provide detailed information about the current session\nsessionInfo(), similarly, in Python. 
Libraries like NumPy and Pandas have show_versions() methods to display package versions.\n\n\nWhile environment managers are very easy to use and share across different systems, and are lightweight and efficient, offering fast start-up times, Docker containers provide a full env isolation (including the operating system) which ensures consistent behavior across different systems.\n\n\n\n\nTo maintain clarity and organization in the data analysis process, adopt best practices such as:\n\nData documentation: create a README.md file to provide an overview of the project and its structure, and metadata for understanding the context of your analysis.\nAnnotate your pipelines and comment your code (look for tutorials and templates such as this one from freeCodeCamp).\nUse coding style guides (code lay-out, whitespace in expressions, comments, naming conventions, annotations…) to maintain consistency.\nLabel files numerically to organize the entire data analysis process (scripts, notebooks, pipelines, etc.).\n\n00.preprocessing., 01.data_analysis_step1., etc.\n\nProvide environment files for reproducing the computational environment (such as ‘requirements.txt’ for Python or ‘environment.yml’ for Conda). The simplest way is to document the dependencies by reporting the packages and their versions used to run your analysis.\nData versioning: use version control systems (e.g., Git) and upload your code to a code repository Lesson 5.\nIntegrated development environments (e.g., RStudio, PyCharm) offer tools and features for writing, testing, and debugging code\nLeverage curated pipelines such as the ones developed by the nf-core community, further ensuring adherence to community standards and guidelines.\nAdd a LICENSE file and perform regular updates: clarifying usage permissions and facilitating collaboration.\n\n\n\n\n\n\n\nPractical HPC pipes\n\n\n\nWe provide a hand-on workshop on computational environments and pipelines. 
Keep an eye on the upcoming events on the Sandbox website.\nIf you’re interested in delving deeper, check out the HPC best practices module we’ve developed here.",
+ "text": "In this section, we explore essential elements of reproducibility and efficiency in computational research, highlighting techniques and tools for creating robust and transparent code and workflows. By prioritizing reproducibility and replicability, researchers can enhance the credibility and impact of their findings while fostering collaboration and knowledge dissemination within the scientific community.\n\n\n\n\n\n\nBefore you start…\n\n\n\n\nChoose a folder structure (e.g., using cookiecutter)\nChoose a file naming system\nAdd a README describing the project (and the naming conventions)\nInstall and set up version control (e.g., Git and GitHub)\nChoose a coding style!\n\n\nPython: Python’s PEP 8 or Google’s style guide\nR: Google’s style guide or Tidyverse’s style guide\n\n\n\n\n\nThrough techniques such as scripting, containerization (e.g., Docker), and virtual environments, researchers can create reproducible analyses that enable others to validate and build upon their work. Documenting data processing steps, parameters, and results ensures transparency and accountability in research outputs. To write clear and reproducible code, write functions, code defensively (e.g., validate inputs and handle errors), add comments, write tests, and maintain proper documentation.\nTools for reproducibility:\n\nCode notebooks: Use tools like Jupyter Notebook and R Markdown to combine code with descriptive text and visualizations, enhancing data documentation.\n\nReport generation and experiment tracking: Consider using tools such as knitr (dynamic reports in R) or MLflow (tracking parameters, metrics, and models of machine learning experiments) to streamline code development and documentation processes.\nPipeline frameworks or workflow management systems: Implement systems like Nextflow and Snakemake to automate data analysis steps (including data extraction, transformation, validation, visualization, and more). 
Additionally, they support interoperability by integrating the different components or stages of an analysis.\n\n\n\nComputational notebooks (e.g., Jupyter, R Markdown) provide researchers with a versatile platform for exploratory and interactive data analysis. These notebooks make it easier to share insights with collaborators and to document analysis procedures.\n\n\n\nTools such as Nextflow and Snakemake streamline and automate various data analysis steps, enabling parallel processing and seamless integration with existing tools. Remember to write portable code and use relative paths so that your workflow can be transferred between users.\n\nNextflow: offers scalable and portable NGS data analysis pipelines, facilitating data processing across diverse computing environments.\nSnakemake: uses a Python-based workflow language to define flexible and automated NGS data analysis pipelines, supporting parallel processing and integration with other tools.\n\nOnce your scientific computational workflow is ready to be shared, publish it on WorkflowHub.\n\n\n\nEach computer or HPC (High-Performance Computing) platform has a unique computational environment that includes its operating system, installed software, versions of software packages, and other features. If a research project is moved to a different computer or platform, the analysis might not run or produce consistent results if it depends on any of these factors.\nFor research to be reproducible, the original computational environment must be recorded so others can replicate it. 
There are several methods to achieve this:\n\nContainerization platforms (e.g., Docker, Singularity): allow the researcher to package their software and dependencies into a standardized container image.\nVirtual Machines (e.g., VirtualBox): let you share an entire virtualized computing environment (OS, software and dependencies).\nEnvironment managers: provide an isolated environment with specific packages and dependencies that can be installed without affecting the system-wide configuration. These environments are particularly useful for managing conflicting dependencies and ensuring reproducibility. Configuration files can automate the setup of the computational environment:\n\nconda: allows users to export environment specifications (software and dependencies) to YAML files, enabling easy recreation of the environment on another system\nPython virtualenv: a tool for creating isolated environments to manage dependencies specific to a project\nrequirements.txt: lists the Python packages (optionally with pinned versions) for pip to install, e.g., with pip install -r requirements.txt. Package managers can be used to install, upgrade and manage packages.\nR’s renv: The ‘renv’ package creates isolated, project-specific environments in R.\n\nEnvironment descriptors\n\nsessionInfo() or devtools::session_info(): In R, these functions report detailed information about the current session, including the R version, platform, and loaded packages\nsession_info.show(): In Python, the third-party session_info package reports similar information about the modules imported in the current session. 
Libraries such as pandas and scikit-learn also provide show_versions() functions that display the versions of their dependencies.\n\n\nEnvironment managers are easy to use and share across systems, and they are lightweight, with fast start-up times. Docker containers, in contrast, also isolate the operating-system layer (system libraries and tools, but not the host kernel), which ensures more consistent behavior across systems.\n\n\n\n\nTo maintain clarity and organization in the data analysis process, adopt best practices such as:\n\nData documentation: create a README.md file that gives an overview of the project and its structure, and provide metadata for understanding the context of your analysis.\nAnnotate your pipelines and comment your code (look for tutorials and templates such as this one from freeCodeCamp).\nUse coding style guides (code layout, whitespace in expressions, comments, naming conventions, annotations…) to maintain consistency.\nLabel files numerically to organize the entire data analysis process (scripts, notebooks, pipelines, etc.).\n\n00.preprocessing., 01.data_analysis_step1., etc.\n\nProvide environment files for reproducing the computational environment (such as ‘requirements.txt’ for Python or ‘environment.yml’ for Conda). At a minimum, document the packages and versions used to run your analysis.\nCode versioning: use a version control system (e.g., Git) and upload your code to a code repository (see Lesson 5).\nIntegrated development environments (e.g., RStudio, PyCharm) offer tools and features for writing, testing, and debugging code.\nLeverage curated pipelines such as the ones developed by the nf-core community, which helps ensure adherence to community standards and guidelines.\nAdd a LICENSE file to clarify usage permissions, and update your code regularly to facilitate collaboration.\n\n\n\n\n\n\n\nPractical HPC pipes\n\n\n\nWe provide a hands-on workshop on computational environments and pipelines. 
Keep an eye on the upcoming events on the Sandbox website. If you’re interested in delving deeper, check out the HPC best practices module we’ve developed here.",
"crumbs": [
"Course material",
"Key practices",
@@ -27,7 +27,7 @@
"href": "develop/06_pipelines.html#code-and-pipelines-for-data-analysis",
"title": "6. Processing and analyzing biodata",
"section": "",
- "text": "In this section, we explore essential elements of reproducibility and efficiency in computational research, highlighting techniques and tools for creating robust and transparent coding and workflows. By prioritizing reproducibility and replicability, researchers can enhance the credibility and impact of their findings while fostering collaboration and knowledge dissemination within the scientific community.\n\n\n\n\n\n\nBefore you start…\n\n\n\n\nChoose a folder structure (e.g., using cookiecutter)\nChoose a file naming system\nAdd a README describing the project (and the naming conventions)\nInstall and set up version control (e.g., Git and Github)\nChoose a coding style!\n\n\nPython: Python’s PEP or Google’s style guide\nR: Google’s style guide or Tidyverse’s style guide\n\n\n\n\n\nThrough techniques such as scripting, containerization (e.g., Docker), and virtual environments, researchers can create reproducible analyses that enable others to validate and build upon their work. Emphasizing the documentation of data processing steps, parameters, and results ensures transparency and accountability in research outputs. To write clear and reproducible code, take the following approach: write functions, code defensively (such as input validation, error handling, etc.), add comments, conduct testing, and maintain proper documentation.\nTools for reproducibility:\n\nCode notebooks: Utilize tools like Jupyter Notebook and R Markdown to combine code with descriptive text and visualizations, enhancing data documentation.\n\nIntegrated development environments: Consider using platforms such as (knitr or MLflow) to streamline code development and documentation processes.\nPipeline frameworks or workflow management systems: Implement systems like Nextflow and Snakemake to automate data analysis steps (including data extraction, transformation, validation, visualization, and more). 
Additionally, they contribute to ensuring interoperability by facilitating seamless integration and interaction between different components or stages.\n\n\n\nComputational notebooks (e.g., Jupyter, R Markdown) provide researchers with a versatile platform for exploratory and interactive data analysis. These notebooks facilitate sharing insights with collaborators and documentation of analysis procedures.\n\n\n\nTools such as Nextflow and Snakemake streamline and automate various data analysis steps, enabling parallel processing and seamless integration with existing tools. Remember to create portable code and use relative paths to ensure transferability between users.\n\nNextflow: offers scalable and portable NGS data analysis pipelines, facilitating data processing across diverse computing environments.\nSnakemake: Utilizing Python-based scripting, Snakemake allows for flexible and automated NGS data analysis pipelines, supporting parallel processing and integration with other tools.\n\nOnce your scientific computational workflow is ready to be shared, publish your scientific computational workflow on WorkflowHub.\n\n\n\nEach computer or HPC (High-Performance Computing) platform has a unique computational environment that includes its operating system, installed software, versions of software packages, and other features. If a research project is moved to a different computer or platform, the analysis might not run or produce consistent results if it depends on any of these factors.\nFor research to be reproducible, the original computational environment must be recorded so others can replicate it. 
There are several methods to achieve this:\n\nContainerization platforms (e.g., Docker, Singularity): allow the researcher to package their software and dependencies into a standardized container image.\nVirtual Machines (e.g., VirtualBox): can share an entire virtualized computing environment (OS, software and dependencies)\nEnvironment managers: provide an isolated environment with specific packages and dependencies that can be installed without affecting the system-wide configuration. These environments are particularly useful for managing conflicting dependencies and ensuring reproducibility. Configuration files can automate the setup of the computational environment:\n\nconda: allows users to export environment specifications (software and dependencies) to YAML files enabling easy recreation of the environment on another system\nPython virtualenv: is a tool for creating isolated environments to manage dependencies specific to a project\nrequirements.txt: may contain commands for installing packages (such as pip for Python packages or apt-get for system-level dependencies), configuring system settings, and setting environment variables. Package managers can be used to install, upgrade and manage packages.\nR’s renv: The ‘renv’ package creates isolated environments in R.\n\nEnvironment descriptors\n\nsessionInfo() or devtools::session_info(): In R, these functions provide detailed information about the current session\nsessionInfo(), similarly, in Python. 
Libraries like NumPy and Pandas have show_versions() methods to display package versions.\n\n\nWhile environment managers are very easy to use and share across different systems, and are lightweight and efficient, offering fast start-up times, Docker containers provide a full env isolation (including the operating system) which ensures consistent behavior across different systems.\n\n\n\n\nTo maintain clarity and organization in the data analysis process, adopt best practices such as:\n\nData documentation: create a README.md file to provide an overview of the project and its structure, and metadata for understanding the context of your analysis.\nAnnotate your pipelines and comment your code (look for tutorials and templates such as this one from freeCodeCamp).\nUse coding style guides (code lay-out, whitespace in expressions, comments, naming conventions, annotations…) to maintain consistency.\nLabel files numerically to organize the entire data analysis process (scripts, notebooks, pipelines, etc.).\n\n00.preprocessing., 01.data_analysis_step1., etc.\n\nProvide environment files for reproducing the computational environment (such as ‘requirements.txt’ for Python or ‘environment.yml’ for Conda). The simplest way is to document the dependencies by reporting the packages and their versions used to run your analysis.\nData versioning: use version control systems (e.g., Git) and upload your code to a code repository Lesson 5.\nIntegrated development environments (e.g., RStudio, PyCharm) offer tools and features for writing, testing, and debugging code\nLeverage curated pipelines such as the ones developed by the nf-core community, further ensuring adherence to community standards and guidelines.\nAdd a LICENSE file and perform regular updates: clarifying usage permissions and facilitating collaboration.\n\n\n\n\n\n\n\nPractical HPC pipes\n\n\n\nWe provide a hand-on workshop on computational environments and pipelines. 
Keep an eye on the upcoming events on the Sandbox website.\nIf you’re interested in delving deeper, check out the HPC best practices module we’ve developed here.",
+ "text": "In this section, we explore essential elements of reproducibility and efficiency in computational research, highlighting techniques and tools for creating robust and transparent code and workflows. By prioritizing reproducibility and replicability, researchers can enhance the credibility and impact of their findings while fostering collaboration and knowledge dissemination within the scientific community.\n\n\n\n\n\n\nBefore you start…\n\n\n\n\nChoose a folder structure (e.g., using cookiecutter)\nChoose a file naming system\nAdd a README describing the project (and the naming conventions)\nInstall and set up version control (e.g., Git and GitHub)\nChoose a coding style!\n\n\nPython: Python’s PEP 8 or Google’s style guide\nR: Google’s style guide or Tidyverse’s style guide\n\n\n\n\n\nThrough techniques such as scripting, containerization (e.g., Docker), and virtual environments, researchers can create reproducible analyses that enable others to validate and build upon their work. Documenting data processing steps, parameters, and results ensures transparency and accountability in research outputs. To write clear and reproducible code, write functions, code defensively (e.g., validate inputs and handle errors), add comments, write tests, and maintain proper documentation.\nTools for reproducibility:\n\nCode notebooks: Use tools like Jupyter Notebook and R Markdown to combine code with descriptive text and visualizations, enhancing data documentation.\n\nReport generation and experiment tracking: Consider using tools such as knitr (dynamic reports in R) or MLflow (tracking parameters, metrics, and models of machine learning experiments) to streamline code development and documentation processes.\nPipeline frameworks or workflow management systems: Implement systems like Nextflow and Snakemake to automate data analysis steps (including data extraction, transformation, validation, visualization, and more). 
Additionally, they support interoperability by integrating the different components or stages of an analysis.\n\n\n\nComputational notebooks (e.g., Jupyter, R Markdown) provide researchers with a versatile platform for exploratory and interactive data analysis. These notebooks make it easier to share insights with collaborators and to document analysis procedures.\n\n\n\nTools such as Nextflow and Snakemake streamline and automate various data analysis steps, enabling parallel processing and seamless integration with existing tools. Remember to write portable code and use relative paths so that your workflow can be transferred between users.\n\nNextflow: offers scalable and portable NGS data analysis pipelines, facilitating data processing across diverse computing environments.\nSnakemake: uses a Python-based workflow language to define flexible and automated NGS data analysis pipelines, supporting parallel processing and integration with other tools.\n\nOnce your scientific computational workflow is ready to be shared, publish it on WorkflowHub.\n\n\n\nEach computer or HPC (High-Performance Computing) platform has a unique computational environment that includes its operating system, installed software, versions of software packages, and other features. If a research project is moved to a different computer or platform, the analysis might not run or produce consistent results if it depends on any of these factors.\nFor research to be reproducible, the original computational environment must be recorded so others can replicate it. 
There are several methods to achieve this:\n\nContainerization platforms (e.g., Docker, Singularity): allow the researcher to package their software and dependencies into a standardized container image.\nVirtual Machines (e.g., VirtualBox): let you share an entire virtualized computing environment (OS, software and dependencies).\nEnvironment managers: provide an isolated environment with specific packages and dependencies that can be installed without affecting the system-wide configuration. These environments are particularly useful for managing conflicting dependencies and ensuring reproducibility. Configuration files can automate the setup of the computational environment:\n\nconda: allows users to export environment specifications (software and dependencies) to YAML files, enabling easy recreation of the environment on another system\nPython virtualenv: a tool for creating isolated environments to manage dependencies specific to a project\nrequirements.txt: lists the Python packages (optionally with pinned versions) for pip to install, e.g., with pip install -r requirements.txt. Package managers can be used to install, upgrade and manage packages.\nR’s renv: The ‘renv’ package creates isolated, project-specific environments in R.\n\nEnvironment descriptors\n\nsessionInfo() or devtools::session_info(): In R, these functions report detailed information about the current session, including the R version, platform, and loaded packages\nsession_info.show(): In Python, the third-party session_info package reports similar information about the modules imported in the current session. 
Libraries such as pandas and scikit-learn also provide show_versions() functions that display the versions of their dependencies.\n\n\nEnvironment managers are easy to use and share across systems, and they are lightweight, with fast start-up times. Docker containers, in contrast, also isolate the operating-system layer (system libraries and tools, but not the host kernel), which ensures more consistent behavior across systems.\n\n\n\n\nTo maintain clarity and organization in the data analysis process, adopt best practices such as:\n\nData documentation: create a README.md file that gives an overview of the project and its structure, and provide metadata for understanding the context of your analysis.\nAnnotate your pipelines and comment your code (look for tutorials and templates such as this one from freeCodeCamp).\nUse coding style guides (code layout, whitespace in expressions, comments, naming conventions, annotations…) to maintain consistency.\nLabel files numerically to organize the entire data analysis process (scripts, notebooks, pipelines, etc.).\n\n00.preprocessing., 01.data_analysis_step1., etc.\n\nProvide environment files for reproducing the computational environment (such as ‘requirements.txt’ for Python or ‘environment.yml’ for Conda). At a minimum, document the packages and versions used to run your analysis.\nCode versioning: use a version control system (e.g., Git) and upload your code to a code repository (see Lesson 5).\nIntegrated development environments (e.g., RStudio, PyCharm) offer tools and features for writing, testing, and debugging code.\nLeverage curated pipelines such as the ones developed by the nf-core community, which helps ensure adherence to community standards and guidelines.\nAdd a LICENSE file to clarify usage permissions, and update your code regularly to facilitate collaboration.\n\n\n\n\n\n\n\nPractical HPC pipes\n\n\n\nWe provide a hands-on workshop on computational environments and pipelines. 
Keep an eye on the upcoming events on the Sandbox website. If you’re interested in delving deeper, check out the HPC best practices module we’ve developed here.",
"crumbs": [
"Course material",
"Key practices",
diff --git a/sitemap.xml b/sitemap.xml
index 9e04b9d0..e6521cbb 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -2,74 +2,74 @@
https://hds-sandbox.github.io/RDM_NGS_course/use_cases.html
- 2024-05-09T11:48:15.632Z
+ 2024-05-09T11:48:51.746Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/06_pipelines.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/practical_workshop.html
- 2024-05-09T11:48:15.632Z
+ 2024-05-09T11:48:51.742Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/04_metadata.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/05_VC.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/07_repos.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/examples/proteomics_metadata.html
- 2024-05-09T11:48:15.608Z
+ 2024-05-09T11:48:51.726Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/examples/NGS_management.html
- 2024-05-09T11:48:15.608Z
+ 2024-05-09T11:48:51.726Z
https://hds-sandbox.github.io/RDM_NGS_course/cards/AlbaMartinez.html
- 2024-05-09T11:48:15.588Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/cards/JARomero.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/examples/mkdocs_pages.html
- 2024-05-09T11:48:15.608Z
+ 2024-05-09T11:48:51.726Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/examples/NGS_OS_FAIR.html
- 2024-05-09T11:48:15.608Z
+ 2024-05-09T11:48:51.726Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/examples/NGS_metadata.html
- 2024-05-09T11:48:15.608Z
+ 2024-05-09T11:48:51.726Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/contributors.html
- 2024-05-09T11:48:15.608Z
+ 2024-05-09T11:48:51.722Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/03_DOD.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/01_RDM_intro.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/develop/02_DMP.html
- 2024-05-09T11:48:15.592Z
+ 2024-05-09T11:48:51.706Z
https://hds-sandbox.github.io/RDM_NGS_course/index.html
- 2024-05-09T11:48:15.632Z
+ 2024-05-09T11:48:51.742Z