
Commit

update expected files with new genes and cluster
JeanMainguy committed Oct 15, 2024
1 parent 7cb227c commit 6507c57
Showing 3 changed files with 11 additions and 8 deletions.
7 changes: 5 additions & 2 deletions ppanggolin/cluster/cluster.py
@@ -475,8 +475,11 @@ def read_clustering_file(families_tsv_path: Path) -> Tuple[pd.DataFrame, bool]:
     families_df["representative"] = families_df["representative"].astype(str)
 
     # Check for duplicate gene IDs
-    if families_df["gene"].duplicated().any():
-        raise Exception("It seems that there is duplicated gene id in your clustering.")
+    duplicates = families_df[families_df["gene"].duplicated()]["gene"].unique()
+
+    if len(duplicates) > 0:
+        raise ValueError(f"Duplicate gene IDs found in your clustering: {', '.join(duplicates)}")
+
 
     return families_df[["family", "representative", "gene", "is_frag"]], families_df["is_frag"].any()
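
The duplicate check in this hunk can be tried in isolation. The following is a minimal sketch, not the PPanGGOLiN code itself: the helper name `find_duplicate_genes` and the toy table are assumptions made for illustration. It shows that `duplicated()` flags the second and later occurrences of a value, and that `unique()` collapses a gene repeated several times into a single entry in the error message.

```python
import pandas as pd


def find_duplicate_genes(families_df: pd.DataFrame) -> list:
    """Return each gene ID that occurs more than once (hypothetical helper)."""
    # duplicated() marks every occurrence after the first as True
    dup_mask = families_df["gene"].duplicated()
    # unique() keeps one entry per gene, even if it is repeated three or more times
    return families_df.loc[dup_mask, "gene"].unique().tolist()


# Toy clustering table (illustrative data, not from the test dataset)
toy = pd.DataFrame(
    {
        "family": ["famA", "famA", "famB", "famB", "famC"],
        "gene": ["g1", "g2", "g2", "g3", "g2"],
    }
)

duplicates = find_duplicate_genes(toy)
if duplicates:
    print(f"Duplicate gene IDs found in your clustering: {', '.join(duplicates)}")
```

Listing the offending IDs, as the new `ValueError` does, lets a user locate the problem rows directly instead of searching the clustering file by hand.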

4 changes: 2 additions & 2 deletions testingDataset/expected_info_files/checksum.txt
@@ -1,5 +1,5 @@
 9d6219523e890b08c467c936590b37436e915840a797ea161c9707041c3b821c mybasicpangenome/gene_families.tsv
 9d6219523e890b08c467c936590b37436e915840a797ea161c9707041c3b821c stepbystep/gene_families.tsv
-bd71918de23737aec5a262c883c1154a507ae616bdc47a152db0e76319e7484d readclusterpang/gene_families.tsv
-b893343f249102dff23a9542752f83cdee93efd34bc951b48a0c986fcb5d26a4 myannopang/gene_families.tsv
+5cb88afbc1a49e7d21531d36d5ee3471a87834f4bb82578347dc92231ba0a452 readclusterpang/gene_families.tsv
+5cb88afbc1a49e7d21531d36d5ee3471a87834f4bb82578347dc92231ba0a452 myannopang/gene_families.tsv
 41511d7c482c0c400a504d5c564605427081917e54aee1e5f37ffda220d2cdb9 test_config/gene_families.tsv
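
An expected-checksum file like this one can be compared against freshly generated outputs. The sketch below assumes the 64-character hex digests are SHA-256 and that each line is `<digest> <relative path>`. The function names are hypothetical, and this is not necessarily how the project's test workflow performs the comparison.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large outputs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksums(text: str) -> dict:
    """Map each relative path to its expected digest (assumed line format)."""
    expected = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        checksum, rel_path = line.split(maxsplit=1)
        expected[rel_path.strip()] = checksum
    return expected


def find_mismatches(expected: dict, root: Path) -> list:
    """Return the relative paths whose current digest differs from the expected one."""
    return [rel for rel, checksum in expected.items() if sha256_of(root / rel) != checksum]
```

Calling `find_mismatches(parse_checksums(Path("checksum.txt").read_text()), Path("."))` from the directory that holds the output folders would list any `gene_families.tsv` whose content has changed.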
8 changes: 4 additions & 4 deletions testingDataset/expected_info_files/myannopang_info.yaml
@@ -11,10 +11,10 @@ Status:
     PPanGGOLiN_Version: 2.1.2
 
 Content:
-    Genes: 47961
+    Genes: 47986
     Genomes: 53
-    Families: 1084
-    Edges: 1318
+    Families: 1086
+    Edges: 1315
     Persistent:
         Family_count: 871
         min_genomes_frequency: 0.89
@@ -28,7 +28,7 @@ Content:
         sd_genomes_frequency: 0.18
         mean_genomes_frequency: 0.52
     Cloud:
-        Family_count: 157
+        Family_count: 159
         min_genomes_frequency: 0.02
         max_genomes_frequency: 0.23
         sd_genomes_frequency: 0.04