From 8ab2298112e3c231099df5b7f246f5283aea9f8c Mon Sep 17 00:00:00 2001
From: Gammerdinger
Date: Mon, 15 Apr 2024 11:08:37 -0400
Subject: [PATCH] Polish after seeing HTML

---
 Finding_and_summarizing_colossal_files/lessons/03_sed.md | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/Finding_and_summarizing_colossal_files/lessons/03_sed.md b/Finding_and_summarizing_colossal_files/lessons/03_sed.md
index ebef4507..19431923 100644
--- a/Finding_and_summarizing_colossal_files/lessons/03_sed.md
+++ b/Finding_and_summarizing_colossal_files/lessons/03_sed.md
@@ -83,10 +83,11 @@ sed 's/jungle/rainforest/2g' ecosystems.txt
1. In annotation files (e.g., gtf, gff3) chromosomes are generally written as CHR1 *or* chr1. Some programs will want one or the other and `sed` can switch between the two easily. Use `sed` to alter the `chr` to `CHR` in the `hg38_subset.gff` file and save the output as `hg38_subset.uppercase.gff`:
- Click here to see the answer
+ Click here for the answer
sed 's/chr/CHR/g' hg38_subset.gff > hg38_subset.uppercase.gff
+
2. Your colleague has prepared a FASTA file for you. Take a look by typing `cat mygenes.fasta`. They named the genes (lines that start with `>`) with sequences! Rename the genes GAA and GAA_2 replacing GAA with gene1 with a **single command that does not alter any of the sequences**.
@@ -169,8 +170,9 @@ Let's extract only the quality scores from the `Mov10_oe_1.subset.fq` file and w
|3|Always begins with a '+', and sometimes the same info as in line 1|
|4|Has a string of characters representing the quality scores; must have same number of characters as line 2|
+
- Click here to see the answer
+ Click here for the answer
sed -n '4~4p' Mov10_oe_1.subset.fq > quality_scores.txt
@@ -243,7 +245,7 @@ You have the vcf file `test.vcf` in your `advanced_shell` directory. Use `sed` t
Click here for the answer To remove lines that START with ##:
sed '/^##/d' test.vcf > vcf_noheader.vcf
- to remove any lines that contain ##:
sed '/##/d' test.vcf > vcf_nodoublehash.vcf
+ To remove any lines that contain ##:
sed '/##/d' test.vcf > vcf_nodoublehash.vcf
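
Reviewer note (not part of the patch): the two answers in the last hunk differ only by the `^` anchor, which may be easier to see on a tiny input. The sketch below is a minimal illustration. The four-line file is a made-up stand-in for `test.vcf`, not the lesson's actual data, and the temporary directory and variable names are assumptions for the demo.

```shell
# Hypothetical stand-in for test.vcf; the real lesson file is not reproduced here.
tmpdir=$(mktemp -d)
printf '%s\n' \
  '##fileformat=VCFv4.2' \
  '#CHROM POS ID' \
  'chr1 100 note##inline' \
  'chr1 200 rs1' > "$tmpdir/test.vcf"

# Anchored pattern: delete only lines that START with ##
sed '/^##/d' "$tmpdir/test.vcf" > "$tmpdir/vcf_noheader.vcf"

# Unanchored pattern: delete every line that contains ## anywhere
sed '/##/d' "$tmpdir/test.vcf" > "$tmpdir/vcf_nodoublehash.vcf"

# Count the surviving lines in each output (tr strips padding some wc builds add)
noheader_lines=$(wc -l < "$tmpdir/vcf_noheader.vcf" | tr -d ' ')
nodoublehash_lines=$(wc -l < "$tmpdir/vcf_nodoublehash.vcf" | tr -d ' ')
echo "noheader: $noheader_lines lines; nodoublehash: $nodoublehash_lines lines"
```

On this sample input the anchored command keeps the record whose third field happens to contain `##`, while the unanchored command drops it along with the header line.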