Improve reading of .SAM files
The data section of a .SAM file does not always begin at line 3.
This change locates the first line that does not start with '@', so
files with a different number of header lines are read correctly.
Donaim committed Jul 9, 2024
1 parent 1819951 commit 05aac4f
Showing 1 changed file with 11 additions and 1 deletion.
12 changes: 11 additions & 1 deletion gene_splicer/utils.py
@@ -334,7 +334,17 @@ def splice_aligned_genes(query, target, samfile, annotation):


 def load_samfile(samfile_path):
-    result = pd.read_table(samfile_path, skiprows=2, header=None)
+    # Open the SAM file and find the starting point for data
+    with open(samfile_path, 'r') as file:
+        # Skip meta fields
+        lines = file.readlines()
+        data_start_index = 0
+        for i, line in enumerate(lines):
+            if not line.startswith('@'):
+                data_start_index = i
+                break
+
+    result = pd.read_table(samfile_path, skiprows=data_start_index, header=None)
     result['cigar'] = result.apply(split_cigar, axis=1)
     return result
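
The header scan in this change can also be written without loading the whole file into memory. The following is a minimal sketch, not part of the commit: `count_header_lines` is a hypothetical helper name, and the sample SAM content is made up for illustration. It counts the leading lines that start with '@' and stops at the first data line. Note one difference from the committed loop: if every line starts with '@', the committed code leaves `data_start_index` at 0, whereas this sketch returns the total number of header lines.

```python
import os
import tempfile


def count_header_lines(samfile_path):
    """Return the number of leading '@' header lines in a SAM file.

    Hypothetical helper, not part of the commit.
    """
    count = 0
    with open(samfile_path, "r") as file:
        # Iterate lazily so only the header lines are actually read.
        for line in file:
            if not line.startswith("@"):
                break
            count += 1
    return count


# Illustrative input: three header lines followed by one alignment record.
sample = (
    "@HD\tVN:1.6\n"
    "@SQ\tSN:ref\tLN:100\n"
    "@PG\tID:aligner\n"
    "read1\t0\tref\t1\t60\t4M\t*\t0\t0\tACGT\t*\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".sam", delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

header_count = count_header_lines(sample_path)
os.remove(sample_path)
print(header_count)
```

Under these assumptions, the returned count could be passed as `skiprows` to `pd.read_table` in `load_samfile`, in place of `data_start_index`.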
