CenterForMedicalGeneticsGhent · tsertijn · Mar 5, 2024 · Apr 3, 2024
diff --git a/.eslintignore b/.eslintignore
diff --git a/.github/workflows/build-and-deploy.yml b/.github/workflows/build-and-deploy.yml
@@ -2,7 +2,7 @@ name: Build and Deploy
 
 on:
   push:
-    branches: [master]
+    branches: [main]
   pull_request:
 
 jobs:
@@ -11,7 +11,7 @@ jobs:
 
     runs-on: ubuntu-latest
 
-    if: github.ref == 'refs/heads/master'
+    if: github.ref == 'refs/heads/main'
 
     steps:
       - uses: actions/checkout@v2

diff --git a/.yarn/install-state.gz b/.yarn/install-state.gz
diff --git a/.yarnrc.yml b/.yarnrc.yml
@@ -0,0 +1 @@
+nodeLinker: node-modules
diff --git a/README.md b/README.md
@@ -56,3 +56,6 @@ http://localhost:3000/docs
 
 Please cite the [following publication](10.31219/osf.io/pyqrx):
 > Sehi L’Yi, Dominika Maziec, Victoria Stevens, Trevor Manz, Alexander Veit, Michele Berselli, Peter J. Park, Dominik Głodzik, and Nils Gehlenborg. Chromoscope: interactive multiscale visualization for structural variation in human genomes. Nat Methods 20, 1834–1835 (2023). https://doi.org/10.1038/s41592-023-02056-x
+
+## Funding
+Chromoscope is funded in part through a grant awarded by [Innovation in Cancer Informatics](https://www.the-ici-fund.org/).
diff --git a/docs/docs/loading-data/data-formats.md b/docs/docs/loading-data/data-formats.md
@@ -9,7 +9,7 @@ This page describes file formats used in Chromoscope. To find a list of required
 ## Structural Variants (BEDPE)
 <!-- https://bedtools.readthedocs.io/en/latest/content/general-usage.html#bedpe-format -->
 
-The structural variants are stored in a BEDPE file. The following columns are used in the browser:
+The structural variants are stored in a BEDPE file with a header row. The columns can appear in any order. The following columns are used in the browser:
 
 | Property | Type | Note |
 |---|---|---|
@@ -43,7 +43,7 @@ In Chromosope, strands are mapped with the following types of SVs.
 ## CNV (TSV)
 <!-- https://bedtools.readthedocs.io/en/latest/content/general-usage.html#bedpe-format -->
 
-The CNV is stored in a tab-delimited file that is visualized as three tracks: CNV, Gain, and LOH.
+The CNV is stored in a tab-delimited file with a header row that is visualized as three tracks: CNV, Gain, and LOH. The columns can appear in any order.
 
 | Property | Type | Note |
 |---|---|---|
@@ -63,7 +63,7 @@ https://s3.amazonaws.com/gosling-lang.org/data/SV/7a921087-8e62-4a93-a757-fd8cdb
 ## Drivers (TSV or JSON)
 <!-- https://bedtools.readthedocs.io/en/latest/content/general-usage.html#bedpe-format -->
 
-The drivers are stored in a tab-delimited file. When this file is present, the browser will show drivers that are included in the file only.
+The drivers are stored in a tab-delimited file with a header row. When this file is present, the browser shows only the drivers that are included in the file.
 
 The order of the columns does not need to be in the exact same order.
 

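The data-formats changes above state that the files carry a header row and that column order does not matter. A minimal sketch of what that implies for a loader, using only the Python standard library; the column names `chrom1`, `start1`, and `end1` follow the bedtools BEDPE convention and are assumptions here, since the property table is not part of this hunk, and `read_headed_tsv` is a hypothetical helper rather than Chromoscope's actual parser:

```python
import csv
import io


def read_headed_tsv(text, required):
    """Parse tab-delimited text with a header row into a list of dicts.

    Rows are keyed by column name, so the order of columns is irrelevant.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in required if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [dict(row) for row in reader]


# Two files with the same content but a different column order.
file_a = "chrom1\tstart1\tend1\nchr1\t100\t200\n"
file_b = "end1\tchrom1\tstart1\n200\tchr1\t100\n"

required_columns = ["chrom1", "start1", "end1"]  # assumed names
rows_a = read_headed_tsv(file_a, required_columns)
rows_b = read_headed_tsv(file_b, required_columns)
print(rows_a == rows_b)
```

Looking columns up by name rather than by position is what makes a header row mandatory: without it there is nothing to key on.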
diff --git a/docs/docs/loading-data/through-data-config.md b/docs/docs/loading-data/through-data-config.md
@@ -23,7 +23,7 @@ For each sample, you need to prepare the following information in a JSON object.
 | `cancer` | `string` | Required. Type of a cancer. |
 | `assembly` | `'hg38'` or `'hg19'` | Required. Assembly. |
 | `sv` | `string` | Required. An URL of the SV bedpe file (`.bedpe`). |
-| `cnv` | `string` | Required. An URL of the CNV text file (`.tsv`). |
+| `cnv` | `string` | Optional. An URL of the CNV text file (`.tsv`). |
 | `drivers` | `string` | Optional. An URL of a file that contains drivers (`.tsv` or `.json`). |
 | `vcf` | `string` | Optional. An URL of the point mutation file (`.vcf`). |
 | `vcfIndex` | `string` | Optional. An URL of the point mutation index file (`.tbi`). |

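With `cnv` now optional, a sample entry can omit it. A minimal sketch of a check that reflects the table rows visible in this hunk; the full table may list further required fields that are not shown here, the `example.org` URLs are placeholders, and `missing_required` is a hypothetical helper, not part of Chromoscope:

```python
import json

# Fields taken from the table rows visible in this hunk only.
REQUIRED_FIELDS = ("cancer", "assembly", "sv")
OPTIONAL_FIELDS = ("cnv", "drivers", "vcf", "vcfIndex")


def missing_required(sample):
    """Return the required fields that are absent from a sample object."""
    return [name for name in REQUIRED_FIELDS if name not in sample]


# A sample without a CNV file (placeholder URL).
sample = {
    "cancer": "breast",
    "assembly": "hg19",
    "sv": "https://example.org/sample.bedpe",
}

print(missing_required(sample))
print(json.dumps([sample], indent=2))
```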
diff --git a/docs/docs/visualizations/genome-view.md b/docs/docs/visualizations/genome-view.md
@@ -11,5 +11,5 @@ The genome view shows the selected sample in a circular visualization. This uses
 
 ## Interactions
 
-- You can move or resize an interactive brush (light blue) using the mouse. This is linked with a [variant view](./cohort-view) that is shown on the bottom of the genome view.
+- You can move or resize an interactive brush (light blue) using the mouse. This is linked with a [variant view](./variant-view) that is shown on the bottom of the genome view.
 - You can move your mouse on top of a structural variant to see detailed information on a tooltip.
diff --git a/docs/src/pages/about/index.md b/docs/src/pages/about/index.md
@@ -18,3 +18,7 @@ Chromoscope is developed and maintained by the members of [Department of Biomedi
 When citing Chromoscope in your paper, please cite the [following publication](https://10.31219/osf.io/pyqrx).
 
 > Sehi L’Yi, Dominika Maziec, Victoria Stevens, Trevor Manz, Alexander Veit, Michele Berselli, Peter J. Park, Dominik Głodzik, and Nils Gehlenborg. Chromoscope: interactive multiscale visualization for structural variation in human genomes. _Nat Methods_ 20, 1834–1835 (2023). https://doi.org/10.1038/s41592-023-02056-x
+
+## Funding
+
+Chromoscope is funded in part through a grant awarded by [Innovation in Cancer Informatics](https://www.the-ici-fund.org/).
diff --git a/package.json b/package.json
@@ -22,8 +22,9 @@
         "@types/react": "^17.0.37",
         "@types/react-dom": "^17.0.11",
         "@types/react-router-dom": "^5.2.0",
+        "bootstrap": "^5.3.3",
         "buffer": "^6.0.3",
-        "gosling.js": "^0.11.0",
+        "gosling.js": "^0.17.0",
         "idb": "^7.0.2",
         "lodash": "^4.17.21",
         "path": "^0.12.7",
@@ -54,5 +55,8 @@
         "hooks": {
             "pre-commit": "yarn format && git add ."
         }
+    },
+    "peerDependencies": {
+        "@popperjs/core": "*"
     }
 }
diff --git a/scripts/README.md b/scripts/README.md
@@ -1,3 +1,15 @@
 # Chromoscope
 
+## presigned_url_scripts
 This folder contains scripts that are used to process datasets and for generating AWS presigned URLS for a cohort.
+
+
+## S3_Bucket_Query_to_chromoscope
+
+This folder contains scripts for querying a locally hosted S3 bucket, selecting files of interest, and either generating a `config.json` file or visualizing the data directly in Chromoscope using the external URL parameter.
+
+**Important:** Before running the scripts, ensure that the S3 bucket name and credentials are properly configured in `app.py` and `scripts.js`.
+
+## file_creation_for_higlassserver
+
+This folder contains scripts that create the file types used in HiGlass, such as multivec files and BED files. The `beddb_file_creation_from_tsv.sh` bash script runs `tsv_to_bed_bash.py` to create a BED file, then encodes it and uploads it to a local HiGlass server.
diff --git a/scripts/S3_Bucket_Query_to_Chromoscope.zip b/scripts/S3_Bucket_Query_to_Chromoscope.zip
diff --git a/scripts/clustering/run_clustering.sh b/scripts/clustering/run_clustering.sh
diff --git a/scripts/file_creation_for_higlassserver/beddb_file_creation_from_tsv.sh b/scripts/file_creation_for_higlassserver/beddb_file_creation_from_tsv.sh
@@ -0,0 +1,19 @@
+#!/usr/bin/env bash
+
+# Check that both required arguments were given.
+if [[ "$#" -ne 2 ]]; then
+        echo "Incorrect number of arguments. Usage: $0 filepath filetype (e.g. baf)" >&2
+        exit 1
+fi
+
+filepath="$1"
+# Strip the final extension to get the base name.
+base_name="${filepath%.*}"
+echo "$base_name"
+# The BED file to aggregate, and the .beddb file to ingest.
+new_bed_file_name="${base_name}.bed"
+uploaded_file_name="${new_bed_file_name}.beddb"
+#python tsv_to_bed_bash.py "$1" "$2"
+
+clodius aggregate bedfile --chromsizes-filename hg19.chrom.sizes "$new_bed_file_name"
+higlass-manage ingest "$uploaded_file_name" --filetype bedfile --datatype bedlike --assembly hg19
diff --git a/scripts/file_creation_for_higlassserver/create_all_files_for_multivec_formatting.py b/scripts/file_creation_for_higlassserver/create_all_files_for_multivec_formatting.py
@@ -0,0 +1,60 @@
+import pandas as pd
+import os
+import h5py
+os.chdir("../..")
+os.chdir("documents2/stage/test_data")
+
+# Prefix the values in the chromosome column with "chr"; hopla outputs bare chromosome numbers.
+df = pd.read_csv("combined_filtered_CNV_data.tsv", sep="\t")
+df.rename(columns={'seqnames':'chromosome'}, inplace=True)
+grouped = df.groupby("chromosome")
+df['chromosome'] = df['chromosome'].apply(lambda x: f'chr{x}')
+
+chromosome_info = {}
+
+# Move the sample column to the front for convenience and drop the columns that cause problems for
+# the clodius aggregate function
+temp_cols = df.columns.tolist()
+new_cols = temp_cols[-1:] + temp_cols[:-1]
+df = df[new_cols]
+df = df.drop(columns=['range', 'mask', 'threshold', 'seg_threshold'])
+unique_values = df['sample'].unique()
+df.set_index('sample', inplace=True)
+
+# Write a file listing all sample types; clodius needs it
+with open('sample_types.txt', 'w') as f:
+    for value in unique_values:
+        f.write(str(value) + '\n')
+#for column in df.columns:
+
+    #df[column] = df[column].astype('S')
+
+#df.to_hdf("combined_csv_test.hdf5",key="df", mode='w', complevel=5)
+# create a hdf5 file from the input tsv file in order to use in the aggregate function
+with h5py.File('blah.h5', 'w') as f:
+    # Group by the chromosome column
+    for chromosome, group in df.groupby('chromosome'):
+        # Convert the group (DataFrame) to a numpy array, excluding the chromosome column
+        data = group.drop(columns=['chromosome'])
+
+        # Create a dataset for each chromosome
+        dset = f.create_dataset(chromosome, data.shape, data=data, compression='gzip')
+
+# Create a matching chromsizes file from the number of rows per chromosome
+for chrom_name, group in grouped:
+    # Determine the number of rows for this chromosome
+    num_rows = len(group)
+    # Assuming all columns from 'start' onward are data columns (adjust as needed)
+    num_columns = group.shape[1] - 3  # Adjust based on actual column positions
+
+    # Store the size of this chromosome for the chromsizes file
+    chromosome_info[chrom_name] = num_rows
+
+with open("chromsizes.txt", 'w') as f:
+    for chrom_name, size in chromosome_info.items():
+        f.write(f"chr{chrom_name}\t{size}\n")
+
+# checking sizes in order to make sure that no errors occur during the whole aggregate process.
+with h5py.File('blah.h5', 'r') as f:
+    for key in f.keys():
+        print(f"Dataset {key}: shape {f[key].shape}, dtype {f[key].dtype}")
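The chromsizes step in the script above records one size per chromosome, where the size is the number of rows for that chromosome rather than its length in base pairs. A minimal standard-library sketch of the same idea, assuming the chromosome values arrive as bare names such as `1` or `X`; `chromsizes_lines` is a hypothetical helper and does not reproduce the pandas-based script:

```python
from collections import Counter


def chromsizes_lines(chromosomes):
    """Build chromsizes lines ("chr<name><TAB><row count>") from a column of names."""
    counts = Counter(chromosomes)  # keeps first-seen order of the names
    return [f"chr{name}\t{size}" for name, size in counts.items()]


# One entry per data row, as read from the chromosome column.
chromosome_column = ["1", "1", "2", "1", "X"]
lines = chromsizes_lines(chromosome_column)
print("\n".join(lines))
```

Because the dataset written per chromosome has one row per bin, the row count is what has to match during aggregation, which is why a real `hg19.chrom.sizes` file would not fit here.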
diff --git a/scripts/file_creation_for_higlassserver/hg19.chrom.sizes b/scripts/file_creation_for_higlassserver/hg19.chrom.sizes
@@ -0,0 +1,93 @@
+chr1	249250621
+chr2	243199373
+chr3	198022430
+chr4	191154276
+chr5	180915260
+chr6	171115067
+chr7	159138663
+chrX	155270560
+chr8	146364022
+chr9	141213431
+chr10	135534747
+chr11	135006516
+chr12	133851895
+chr13	115169878
+chr14	107349540
+chr15	102531392
+chr16	90354753
+chr17	81195210
+chr18	78077248
+chr20	63025520
+chrY	59373566
+chr19	59128983
+chr22	51304566
+chr21	48129895
+chr6_ssto_hap7	4928567
+chr6_mcf_hap5	4833398
+chr6_cox_hap2	4795371
+chr6_mann_hap4	4683263
+chr6_apd_hap1	4622290
+chr6_qbl_hap6	4611984
+chr6_dbb_hap3	4610396
+chr17_ctg5_hap1	1680828
+chr4_ctg9_hap1	590426
+chr1_gl000192_random	547496
+chrUn_gl000225	211173
+chr4_gl000194_random	191469
+chr4_gl000193_random	189789
+chr9_gl000200_random	187035
+chrUn_gl000222	186861
+chrUn_gl000212	186858
+chr7_gl000195_random	182896
+chrUn_gl000223	180455
+chrUn_gl000224	179693
+chrUn_gl000219	179198
+chr17_gl000205_random	174588
+chrUn_gl000215	172545
+chrUn_gl000216	172294
+chrUn_gl000217	172149
+chr9_gl000199_random	169874
+chrUn_gl000211	166566
+chrUn_gl000213	164239
+chrUn_gl000220	161802
+chrUn_gl000218	161147
+chr19_gl000209_random	159169
+chrUn_gl000221	155397
+chrUn_gl000214	137718
+chrUn_gl000228	129120
+chrUn_gl000227	128374
+chr1_gl000191_random	106433
+chr19_gl000208_random	92689
+chr9_gl000198_random	90085
+chr17_gl000204_random	81310
+chrUn_gl000233	45941
+chrUn_gl000237	45867
+chrUn_gl000230	43691
+chrUn_gl000242	43523
+chrUn_gl000243	43341
+chrUn_gl000241	42152
+chrUn_gl000236	41934
+chrUn_gl000240	41933
+chr17_gl000206_random	41001
+chrUn_gl000232	40652
+chrUn_gl000234	40531
+chr11_gl000202_random	40103
+chrUn_gl000238	39939
+chrUn_gl000244	39929
+chrUn_gl000248	39786
+chr8_gl000196_random	38914
+chrUn_gl000249	38502
+chrUn_gl000246	38154
+chr17_gl000203_random	37498
+chr8_gl000197_random	37175
+chrUn_gl000245	36651
+chrUn_gl000247	36422
+chr9_gl000201_random	36148
+chrUn_gl000235	34474
+chrUn_gl000239	33824
+chr21_gl000210_random	27682
+chrUn_gl000231	27386
+chrUn_gl000229	19913
+chrM	16571
+chrUn_gl000226	15008
+chr18_gl000207_random	4262
diff --git a/scripts/file_creation_for_higlassserver/tsv_to_bed.py b/scripts/file_creation_for_higlassserver/tsv_to_bed.py
@@ -0,0 +1,12 @@
+import pandas as pd
+import os
+import fuc
+import re
+os.chdir("../..")
+os.chdir("documents2/stage/test_data")
+df = pd.read_csv("D2201410_new.tsv", delimiter="\t")
+df = df.drop(columns=['Sample'])
+df.columns = ['Chromosome', 'Start', 'End','REF', 'ALT','BAF']
+df['Chromosome'] = 'chr' + df['Chromosome'].astype(str)
+bf = fuc.pybed.BedFrame.from_frame(meta=[], data=df)
+bf.to_file('D2201410_new.bed')
diff --git a/scripts/file_creation_for_higlassserver/tsv_to_bed_bash.py b/scripts/file_creation_for_higlassserver/tsv_to_bed_bash.py
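@@ -0,0 +1,21 @@
+#!/usr/bin/python3
+import os
+import pandas as pd
+import sys
+import fuc
+
+# Validate the arguments before reading them.
+if len(sys.argv) < 3:
+    print("2 additional arguments are needed. Usage: tsv_to_bed_bash.py filepath filetype")
+    sys.exit(1)
+filepath = sys.argv[1]
+type_file = sys.argv[2]
+# Base name without the final extension (str.split would return a list here).
+filename = os.path.splitext(filepath)[0]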
+df = pd.read_csv(filepath, delimiter="\t")
+if type_file.upper() == "BAF":
+    df = df.drop(columns=['Sample'])
+    df.columns = ['Chromosome', 'Start', 'End','REF', 'ALT','BAF']
+    df['Chromosome'] = df['Chromosome'].apply(lambda x: f'chr{x}')
+bf = fuc.pybed.BedFrame.from_frame(meta=[], data=df)
+bf.to_file(f'{filename}.bed')
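Both the bash wrapper and the Python converter derive the `.bed` name from the input path, and the wrapper then expects a `.beddb` file named after the BED file. A minimal sketch of that naming scheme, assuming the final extension is the only part to strip; `output_names` is a hypothetical helper:

```python
import os


def output_names(filepath):
    """Return the (.bed, .beddb) file names derived from an input path."""
    base, _extension = os.path.splitext(filepath)
    bed_name = base + ".bed"
    beddb_name = bed_name + ".beddb"
    return bed_name, beddb_name


bed_name, beddb_name = output_names("data/D2201410_new.tsv")
print(bed_name)
print(beddb_name)

# For comparison: str.split returns a list, so formatting it into a
# file name embeds the list's repr instead of the base name.
split_based_name = f'{"D2201410_new.tsv".split(".")}.bed'
print(split_based_name)
```

`os.path.splitext` removes only the last extension, so a name such as `sample.v2.tsv` keeps its `sample.v2` stem.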