Inclusion of all changes made during the 2024 internship #11

Open
wants to merge 68 commits into
base: dev
Commits
86a6e51
feat: make CNV file optional (#132)
crfmc Mar 5, 2024
4c9f8bc
fix: cnv track
sehilyi Mar 5, 2024
4a550fd
Update README.md
dominikglodzikhms Apr 3, 2024
f98d0a9
Update README.md
dominikglodzikhms Apr 3, 2024
822f4c6
Update README.md
dominikglodzikhms Apr 3, 2024
ae0fc4f
Update index.md
sehilyi Apr 3, 2024
bf82344
fix: update PTEN position based on hg38 (#140)
sehilyi Apr 17, 2024
3104b02
feat: make pe_support optional in sv bedpe data (#137)
sehilyi Apr 17, 2024
83c293a
Update driver.custom.json
dominikglodzikhms Apr 18, 2024
b00adca
Update genome-view.md
dominikglodzikhms Apr 29, 2024
4d0191d
feat: upgrade gosling.js to 0.17.0 (#143)
sehilyi May 16, 2024
d1c1969
feat: minimal_mode for iframe embedding (#138)
crfmc May 16, 2024
3c6278e
Create baf
tsertijn Jul 1, 2024
9b141e8
Rename baf to baf.ts
tsertijn Jul 1, 2024
8657d2e
Update baf.ts
tsertijn Jul 1, 2024
a9259b0
Delete src/track directory
tsertijn Jul 5, 2024
bed22cf
Add files via upload
tsertijn Jul 5, 2024
2663345
Delete src/App.tsx
tsertijn Jul 5, 2024
eeb8f80
Add files via upload
tsertijn Jul 5, 2024
915bf0d
Add files via upload
tsertijn Jul 5, 2024
7ee40ba
Add files via upload
tsertijn Jul 5, 2024
6bdc6ca
Delete src/mid-spec.ts
tsertijn Jul 5, 2024
5178320
Add files via upload
tsertijn Jul 5, 2024
14e5d26
Update samples.ts
tsertijn Jul 5, 2024
fca41b2
test
Jul 5, 2024
d136da7
Update main.tsx to include the dev/codec path instead of app
tsertijn Jul 5, 2024
b5876dc
Update vite.config.ts to allow for the dev/codec to work
tsertijn Jul 5, 2024
df1c8f1
hope is a nice thing
Aug 8, 2024
4025551
feat: updates to the minimal mode version (#146)
crfmc Aug 9, 2024
8ae8b20
Update build-and-deploy.yml
sehilyi Aug 9, 2024
36e7acd
Update build-and-deploy.yml
sehilyi Aug 9, 2024
6118728
Add Parent Mapping track and Mendelian Errors track
Maximvan Aug 19, 2024
9418f87
Merge branch 'main' of https://github.com/tsertijn/chromoscope
Maximvan Aug 19, 2024
64bf9d5
Add Haplotyping track and changed CNVs track
nicolasdebusschere Aug 19, 2024
9417216
fixed Lines
tsertijn Aug 19, 2024
4dfb120
Merge branch 'main' of https://github.com/tsertijn/chromoscope into main
tsertijn Aug 19, 2024
0893dc2
Merge branch 'main' of https://github.com/tsertijn/chromoscope
nicolasdebusschere Aug 19, 2024
35caebd
Merge branch 'main' of https://github.com/tsertijn/chromoscope
nicolasdebusschere Aug 19, 2024
8caaaa7
added zoomlimit to lower the loadburden of the mutaionstrack
tsertijn Aug 19, 2024
c770e1f
Delete src/ui/genomic-table.tsx
tsertijn Aug 19, 2024
cab03b4
Add thresholds to the cnv track, different colors per point now show up
nicolasdebusschere Aug 22, 2024
f2370d8
Merge branch 'main' of https://github.com/tsertijn/chromoscope
nicolasdebusschere Aug 22, 2024
72590ad
Add changes to cnv track so that the segmented mean also shows up in …
nicolasdebusschere Aug 29, 2024
1aa562b
New tracks and summary view
Maximvan Aug 29, 2024
33b2c95
Merge branch 'main' of https://github.com/tsertijn/chromoscope
Maximvan Aug 29, 2024
c5eabee
Merge remote-tracking branch 'upstream/main'
nicolasdebusschere Aug 29, 2024
f4139d1
Update question mark placement to be responsive with tracks shown
Maximvan Aug 30, 2024
9175c58
Add inline to cnv track
nicolasdebusschere Aug 30, 2024
630dfce
Fixing Layout of main specs and region search.
Maximvan Sep 2, 2024
9944452
Merge branch 'main' of https://github.com/tsertijn/chromoscope
nicolasdebusschere Sep 2, 2024
2817fec
add explanation and popover images for CNV, haplo and BAF
nicolasdebusschere Sep 2, 2024
3c35de0
Merge branch 'main' of https://github.com/tsertijn/chromoscope
nicolasdebusschere Sep 2, 2024
b1f4842
Customized genome and variant view
nicolasdebusschere Sep 2, 2024
e66efe0
Layout changes
Maximvan Sep 2, 2024
647a506
Merge branch 'main' of https://github.com/tsertijn/chromoscope
Maximvan Sep 2, 2024
dda9b31
added a track to run baf as from a higlass server, these changes also…
tsertijn Sep 4, 2024
62fae2b
Update create_all_files_for_multivec_formatting.py
tsertijn Sep 4, 2024
5ddaef2
Update tsv_to_bed_bash.py
tsertijn Sep 5, 2024
07571d7
Add S3 Bucket Query and implement external URL configuration
Maximvan Sep 5, 2024
090e5d4
Merge branch 'main' of https://github.com/tsertijn/chromoscope
Maximvan Sep 5, 2024
c7d40fc
Add description of S3 Bucket Query
Maximvan Sep 5, 2024
41b6f88
Update README.md
Maximvan Sep 5, 2024
3f9ef6b
Update README.md
Maximvan Sep 5, 2024
32f4db0
Delete src/lib/S3search_scripts.js
Maximvan Sep 5, 2024
a7ace83
Added new filetypes to the side-menu
tsertijn Sep 5, 2024
48cdcf0
Update README.md
tsertijn Sep 5, 2024
c6003e5
Update Bi_allel_via_higlass.ts
tsertijn Sep 5, 2024
642ac56
Update App.tsx
tsertijn Sep 5, 2024
1 change: 0 additions & 1 deletion .eslintignore

This file was deleted.

4 changes: 2 additions & 2 deletions .github/workflows/build-and-deploy.yml
@@ -2,7 +2,7 @@ name: Build and Deploy

on:
push:
branches: [master]
branches: [main]
pull_request:

jobs:
@@ -11,7 +11,7 @@ jobs:

runs-on: ubuntu-latest

if: github.ref == 'refs/heads/master'
if: github.ref == 'refs/heads/main'

steps:
- uses: actions/checkout@v2
Binary file added .yarn/install-state.gz
Binary file not shown.
1 change: 1 addition & 0 deletions .yarnrc.yml
@@ -0,0 +1 @@
nodeLinker: node-modules
3 changes: 3 additions & 0 deletions README.md
@@ -56,3 +56,6 @@ http://localhost:3000/docs

Please cite the [following publication](https://doi.org/10.31219/osf.io/pyqrx):
> Sehi L’Yi, Dominika Maziec, Victoria Stevens, Trevor Manz, Alexander Veit, Michele Berselli, Peter J. Park, Dominik Głodzik, and Nils Gehlenborg. Chromoscope: interactive multiscale visualization for structural variation in human genomes. Nat Methods 20, 1834–1835 (2023). https://doi.org/10.1038/s41592-023-02056-x

## Funding
Chromoscope is funded in part through a grant awarded by [Innovation in Cancer Informatics](https://www.the-ici-fund.org/).
6 changes: 3 additions & 3 deletions docs/docs/loading-data/data-formats.md
@@ -9,7 +9,7 @@ This page describes file formats used in Chromoscope. To find a list of required
## Structural Variants (BEDPE)
<!-- https://bedtools.readthedocs.io/en/latest/content/general-usage.html#bedpe-format -->

The structural variants are stored in a BEDPE file. The following columns are used in the browser:
The structural variants are stored in a headed BEDPE file. The columns do not need to appear in this exact order. The following columns are used in the browser:

| Property | Type | Note |
|---|---|---|
@@ -43,7 +43,7 @@ In Chromoscope, strands are mapped with the following types of SVs.
## CNV (TSV)
<!-- https://bedtools.readthedocs.io/en/latest/content/general-usage.html#bedpe-format -->

The CNV is stored in a tab-delimited file that is visualized as three tracks: CNV, Gain, and LOH.
The CNV is stored in a headed tab-delimited file that is visualized as three tracks: CNV, Gain, and LOH. The columns do not need to appear in this exact order.

| Property | Type | Note |
|---|---|---|
@@ -63,7 +63,7 @@ https://s3.amazonaws.com/gosling-lang.org/data/SV/7a921087-8e62-4a93-a757-fd8cdb
## Drivers (TSV or JSON)
<!-- https://bedtools.readthedocs.io/en/latest/content/general-usage.html#bedpe-format -->

The drivers are stored in a tab-delimited file. When this file is present, the browser will show drivers that are included in the file only.
The drivers are stored in a headed tab-delimited file. When this file is present, the browser will show drivers that are included in the file only.

The columns do not need to appear in this exact order.
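
Because these files are headed, a reader can look columns up by name instead of by position, which is what makes the column order irrelevant. The following is a minimal Python sketch of that idea using only the standard library. The column names `chr`, `pos`, and `gene`, the helper `read_headed_tsv`, and the sample rows are illustrative assumptions, not the documented schema or Chromoscope's own loader.

```python
import csv
import io


def read_headed_tsv(text, wanted):
    """Return rows as dicts restricted to the wanted columns, looked up by header name."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    # Fail early if the header row lacks a column we need.
    missing = [name for name in wanted if name not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [{name: row[name] for name in wanted} for row in reader]


# Two made-up files holding the same record with the columns in a different order.
file_a = "chr\tpos\tgene\nchr10\t100\tGENE_A\n"
file_b = "gene\tchr\tpos\nGENE_A\tchr10\t100\n"

rows_a = read_headed_tsv(file_a, ["chr", "pos", "gene"])
rows_b = read_headed_tsv(file_b, ["chr", "pos", "gene"])
```

Both calls yield the same list of records, since each value is fetched through its header name.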

2 changes: 1 addition & 1 deletion docs/docs/loading-data/through-data-config.md
@@ -23,7 +23,7 @@ For each sample, you need to prepare the following information in a JSON object.
| `cancer` | `string` | Required. Type of a cancer. |
| `assembly` | `'hg38'` or `'hg19'` | Required. Assembly. |
| `sv` | `string` | Required. An URL of the SV bedpe file (`.bedpe`). |
| `cnv` | `string` | Required. An URL of the CNV text file (`.tsv`). |
| `cnv` | `string` | Optional. An URL of the CNV text file (`.tsv`). |
| `drivers` | `string` | Optional. An URL of a file that contains drivers (`.tsv` or `.json`). |
| `vcf` | `string` | Optional. An URL of the point mutation file (`.vcf`). |
| `vcfIndex` | `string` | Optional. An URL of the point mutation index file (`.tbi`). |
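
With `cnv` now optional, a sample entry can omit it. The sketch below checks one entry against the rows visible in this hunk only; rows outside the hunk are not covered. The helper `check_sample`, the constant `VISIBLE_REQUIRED`, and the example URL are hypothetical and not part of Chromoscope.

```python
# Keys marked "Required" in the rows shown in this hunk (other rows are not visible here).
VISIBLE_REQUIRED = {"cancer", "assembly", "sv"}


def check_sample(sample):
    """Return a list of problems found in one sample entry (an empty list means OK)."""
    problems = [
        f"missing required key: {key}"
        for key in sorted(VISIBLE_REQUIRED - sample.keys())
    ]
    # The table allows exactly two assemblies.
    if "assembly" in sample and sample["assembly"] not in ("hg38", "hg19"):
        problems.append("assembly must be 'hg38' or 'hg19'")
    return problems


# A made-up entry without `cnv`, which the updated table permits.
sample = {
    "cancer": "breast",
    "assembly": "hg38",
    "sv": "https://example.com/sample.bedpe",
}
problems = check_sample(sample)
```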
2 changes: 1 addition & 1 deletion docs/docs/visualizations/genome-view.md
@@ -11,5 +11,5 @@ The genome view shows the selected sample in a circular visualization. This uses

## Interactions

- You can move or resize an interactive brush (light blue) using the mouse. This is linked with a [variant view](./cohort-view) that is shown on the bottom of the genome view.
- You can move or resize an interactive brush (light blue) using the mouse. This is linked with a [variant view](./variant-view) that is shown on the bottom of the genome view.
- You can move your mouse on top of a structural variant to see detailed information on a tooltip.
4 changes: 4 additions & 0 deletions docs/src/pages/about/index.md
@@ -18,3 +18,7 @@ Chromoscope is developed and maintained by the members of [Department of Biomedi
When citing Chromoscope in your paper, please cite the [following publication](https://doi.org/10.31219/osf.io/pyqrx).

> Sehi L’Yi, Dominika Maziec, Victoria Stevens, Trevor Manz, Alexander Veit, Michele Berselli, Peter J. Park, Dominik Głodzik, and Nils Gehlenborg. Chromoscope: interactive multiscale visualization for structural variation in human genomes. _Nat Methods_ 20, 1834–1835 (2023). https://doi.org/10.1038/s41592-023-02056-x

## Funding

Chromoscope is funded in part through a grant awarded by [Innovation in Cancer Informatics](https://www.the-ici-fund.org/).
6 changes: 5 additions & 1 deletion package.json
@@ -22,8 +22,9 @@
"@types/react": "^17.0.37",
"@types/react-dom": "^17.0.11",
"@types/react-router-dom": "^5.2.0",
"bootstrap": "^5.3.3",
"buffer": "^6.0.3",
"gosling.js": "^0.11.0",
"gosling.js": "^0.17.0",
"idb": "^7.0.2",
"lodash": "^4.17.21",
"path": "^0.12.7",
@@ -54,5 +55,8 @@
"hooks": {
"pre-commit": "yarn format && git add ."
}
},
"peerDependencies": {
"@popperjs/core": "*"
}
}
12 changes: 12 additions & 0 deletions scripts/README.md
@@ -1,3 +1,15 @@
# Chromoscope

## presigned_url_scripts
This folder contains scripts for processing datasets and for generating AWS presigned URLs for a cohort.


## S3_Bucket_Query_to_chromoscope

This folder contains scripts for querying a locally hosted S3 bucket, selecting files of interest, and either generating a `config.json` file or visualizing the data directly in Chromoscope using the external URL parameter.

**Important:** Before running the scripts, ensure that the S3 bucket name and credentials are properly configured in `app.py` and `scripts.js`.

## file_creation_for_higlassserver

This folder contains scripts that create the different file types used in HiGlass, such as multivec files and BED files. The `beddb_file_creation_from_tsv.sh` bash script runs `tsv_to_bed_bash.py` to create a BED file, then encodes it and uploads it to a local HiGlass server.
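
The TSV-to-BED step described here can be sketched with the Python standard library alone. This is an illustration under assumptions, not the script in this PR, which uses pandas and fuc: the input header (`Sample`, `Chromosome`, `Start`, `End`, `REF`, `ALT`, `BAF`) and the helper `tsv_to_bed_lines` are hypothetical, and the sketch adds the `chr` prefix only when it is absent.

```python
import csv
import io


def tsv_to_bed_lines(tsv_text, drop=("Sample",)):
    """Convert a headed TSV into header-less, tab-separated BED-style lines."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    # Keep every column except the ones listed in `drop`.
    keep = [i for i, name in enumerate(header) if name not in drop]
    lines = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        fields = [row[i] for i in keep]
        # Assumption: the first kept column is the chromosome.
        if not fields[0].startswith("chr"):
            fields[0] = "chr" + fields[0]
        lines.append("\t".join(fields))
    return lines


# A made-up single-row input.
example = "Sample\tChromosome\tStart\tEnd\tREF\tALT\tBAF\nS1\t1\t100\t101\tA\tG\t0.5\n"
bed_lines = tsv_to_bed_lines(example)
```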
Binary file added scripts/S3_Bucket_Query_to_Chromoscope.zip
Binary file not shown.
Empty file modified scripts/clustering/run_clustering.sh
100755 → 100644
Empty file.
@@ -0,0 +1,22 @@
#!/usr/bin/env bash

#check if length of the arguments gives all needed info
if [[ "$#" -ne 2 ]]; then
    echo "incorrect number of arguments: eg: script filepath filetype (eg. baf)"
    exit 1
fi

filepath="$1"
filepath_array=($(echo $filepath| cut -d. -f1))
for i in "${filepath_array[@]}"
do
    echo $i
done
new_bed_file_name=${filepath_array[0]}
new_bed_file_name+=".bed"
uploaded_file_name=$new_bed_file_name
uploaded_file_name+=".beddb"
#python tsv_to_bed_bash.py $1 $2

clodius aggregate bedfile --chromsizes-filename hg19.chrom.sizes $new_bed_file_name
higlass-manage ingest $uploaded_file_name --filetype bedfile --datatype bedlike --assembly hg19
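
The file-name handling in the script above takes everything before the first dot of the input path, appends `.bed`, then appends `.beddb` to that. A small Python sketch of the same naming, next to a suffix-based variant, shows where the two differ. Both helper names are hypothetical and the sample paths are made up.

```python
from pathlib import PurePosixPath


def names_like_script(filepath):
    """Mirror `cut -d. -f1`: keep the text before the FIRST dot, then add suffixes."""
    stem = filepath.split(".", 1)[0]
    bed = stem + ".bed"
    return bed, bed + ".beddb"


def names_by_suffix(filepath):
    """Variant that only swaps the final extension, so dots in directories survive."""
    bed = str(PurePosixPath(filepath).with_suffix(".bed"))
    return bed, bed + ".beddb"


simple = names_like_script("sample.tsv")
dotted_script = names_like_script("./data/sample.tsv")
dotted_suffix = names_by_suffix("./data/sample.tsv")
```

For a plain name such as `sample.tsv` both approaches agree. For a path that starts with `./`, the first-dot rule leaves an empty stem, which is one reason to prefer a suffix-based approach when paths may contain other dots.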
@@ -0,0 +1,60 @@
import pandas as pd
import os
import h5py
os.chdir("../..")
os.chdir("documents2/stage/test_data")

# Prefix the chromosome column with 'chr', since hopla outputs bare chromosome numbers
df = pd.read_csv("combined_filtered_CNV_data.tsv", sep="\t")
df.rename(columns={'seqnames':'chromosome'}, inplace=True)
grouped = df.groupby("chromosome")
df['chromosome'] = df['chromosome'].apply(lambda x: f'chr{x}')

chromosome_info = {}

# Move the sample column to the front for convenience and drop the columns that cause problems for
# the clodius aggregate function
temp_cols = df.columns.tolist()
new_cols = temp_cols[-1:] + temp_cols[:-1]
df = df[new_cols]
df = df.drop(columns=['range', 'mask', 'threshold', 'seg_threshold'])
unique_values = df['sample'].unique()
df.set_index('sample', inplace=True)

# Write a file listing all sample types (needed for clodius)
with open('sample_types.txt', 'w') as f:
    for value in unique_values:
        f.write(str(value) + '\n')
#for column in df.columns:

    #df[column] = df[column].astype('S')

#df.to_hdf("combined_csv_test.hdf5",key="df", mode='w', complevel=5)
# Create an HDF5 file from the input TSV file for use in the aggregate function
with h5py.File('blah.h5', 'w') as f:
    # Group by the chromosome column
    for chromosome, group in df.groupby('chromosome'):
        # Take the group's rows, excluding the chromosome column
        data = group.drop(columns=['chromosome'])

        # Create a dataset for each chromosome
        dset = f.create_dataset(chromosome, data.shape, data=data, compression='gzip')

# Create a matching chromsizes file from the number of rows per chromosome
for chrom_name, group in grouped:
    # Determine the number of rows for this chromosome
    num_rows = len(group)
    # Assuming all columns from 'start' onward are data columns (adjust as needed)
    num_columns = group.shape[1] - 3  # Adjust based on actual column positions

    # Store the size of this chromosome for the chromsizes file
    chromosome_info[chrom_name] = num_rows

with open("chromsizes.txt", 'w') as f:
    for chrom_name, size in chromosome_info.items():
        f.write(f"chr{chrom_name}\t{size}\n")

# Check the dataset sizes to make sure no errors occur during the aggregate process.
with h5py.File('blah.h5', 'r') as f:
    for key in f.keys():
        print(f"Dataset {key}: shape {f[key].shape}, dtype {f[key].dtype}")
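
The chromsizes step in the script above records the number of rows per chromosome rather than base-pair lengths. That counting can be sketched without pandas. The helper `row_counts_as_chromsizes` and the input values are hypothetical and only illustrate the bookkeeping.

```python
from collections import Counter


def row_counts_as_chromsizes(chromosomes):
    """Return 'chr<name><TAB><row count>' lines, in order of first appearance."""
    counts = Counter(f"chr{c}" for c in chromosomes)
    return [f"{name}\t{size}" for name, size in counts.items()]


# Made-up chromosome column: two rows on 1, one each on 2 and X.
chromsizes_lines = row_counts_as_chromsizes([1, 1, 2, "X"])
```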
93 changes: 93 additions & 0 deletions scripts/file_creation_for_higlassserver/hg19.chrom.sizes
@@ -0,0 +1,93 @@
chr1 249250621
chr2 243199373
chr3 198022430
chr4 191154276
chr5 180915260
chr6 171115067
chr7 159138663
chrX 155270560
chr8 146364022
chr9 141213431
chr10 135534747
chr11 135006516
chr12 133851895
chr13 115169878
chr14 107349540
chr15 102531392
chr16 90354753
chr17 81195210
chr18 78077248
chr20 63025520
chrY 59373566
chr19 59128983
chr22 51304566
chr21 48129895
chr6_ssto_hap7 4928567
chr6_mcf_hap5 4833398
chr6_cox_hap2 4795371
chr6_mann_hap4 4683263
chr6_apd_hap1 4622290
chr6_qbl_hap6 4611984
chr6_dbb_hap3 4610396
chr17_ctg5_hap1 1680828
chr4_ctg9_hap1 590426
chr1_gl000192_random 547496
chrUn_gl000225 211173
chr4_gl000194_random 191469
chr4_gl000193_random 189789
chr9_gl000200_random 187035
chrUn_gl000222 186861
chrUn_gl000212 186858
chr7_gl000195_random 182896
chrUn_gl000223 180455
chrUn_gl000224 179693
chrUn_gl000219 179198
chr17_gl000205_random 174588
chrUn_gl000215 172545
chrUn_gl000216 172294
chrUn_gl000217 172149
chr9_gl000199_random 169874
chrUn_gl000211 166566
chrUn_gl000213 164239
chrUn_gl000220 161802
chrUn_gl000218 161147
chr19_gl000209_random 159169
chrUn_gl000221 155397
chrUn_gl000214 137718
chrUn_gl000228 129120
chrUn_gl000227 128374
chr1_gl000191_random 106433
chr19_gl000208_random 92689
chr9_gl000198_random 90085
chr17_gl000204_random 81310
chrUn_gl000233 45941
chrUn_gl000237 45867
chrUn_gl000230 43691
chrUn_gl000242 43523
chrUn_gl000243 43341
chrUn_gl000241 42152
chrUn_gl000236 41934
chrUn_gl000240 41933
chr17_gl000206_random 41001
chrUn_gl000232 40652
chrUn_gl000234 40531
chr11_gl000202_random 40103
chrUn_gl000238 39939
chrUn_gl000244 39929
chrUn_gl000248 39786
chr8_gl000196_random 38914
chrUn_gl000249 38502
chrUn_gl000246 38154
chr17_gl000203_random 37498
chr8_gl000197_random 37175
chrUn_gl000245 36651
chrUn_gl000247 36422
chr9_gl000201_random 36148
chrUn_gl000235 34474
chrUn_gl000239 33824
chr21_gl000210_random 27682
chrUn_gl000231 27386
chrUn_gl000229 19913
chrM 16571
chrUn_gl000226 15008
chr18_gl000207_random 4262
12 changes: 12 additions & 0 deletions scripts/file_creation_for_higlassserver/tsv_to_bed.py
@@ -0,0 +1,12 @@
import pandas as pd
import os
import fuc
import re
os.chdir("../..")
os.chdir("documents2/stage/test_data")
df = pd.read_csv("D2201410_new.tsv", delimiter="\t")
df = df.drop(columns=['Sample'])
df.columns = ['Chromosome', 'Start', 'End','REF', 'ALT','BAF']
df['Chromosome'] = 'chr' + df['Chromosome'].astype(str)
bf = fuc.pybed.BedFrame.from_frame(meta=[], data=df)
bf.to_file('D2201410_new.bed')
18 changes: 18 additions & 0 deletions scripts/file_creation_for_higlassserver/tsv_to_bed_bash.py
@@ -0,0 +1,18 @@
#!/usr/bin/python3
import os
import sys

import pandas as pd
import fuc

# Validate the arguments before indexing into sys.argv.
if len(sys.argv) < 3:
    print("2 additional arguments are needed: format: tsv_to_bed_bash.py filepath filetype")
    sys.exit(1)
filepath = sys.argv[1]
type_file = sys.argv[2]
# Strip the extension so the output is written as <name>.bed.
filename = os.path.splitext(filepath)[0]
df = pd.read_csv(filepath, delimiter="\t")
if type_file.upper() == "BAF":
    # BAF input: drop the sample column and apply BED-style column names
    # (the extent of this block is assumed; the diff view lost the indentation).
    df = df.drop(columns=['Sample'])
    df.columns = ['Chromosome', 'Start', 'End','REF', 'ALT','BAF']
    df['Chromosome'] = df['Chromosome'].apply(lambda x: f'chr{x}')
bf = fuc.pybed.BedFrame.from_frame(meta=[], data=df)
bf.to_file(f'{filename}.bed')