Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VCF Overhaul #105

Merged
merged 27 commits into from
Sep 27, 2021
Merged

VCF Overhaul #105

merged 27 commits into from
Sep 27, 2021

Conversation

charlesgregory
Copy link
Contributor

Lots of changes went into this.

fixes #96.

New Features:

  • HeaderRecord wrapper struct for getting VCFHeader records in a structured format along with getters and setters and VCFHeader can now accept this type to add to a header.
  • ability to remove header lines removeHeaderLines.
  • Moved functions from VCFReader to VCFHeader as appropriate.
  • New InfoField and FormatField wrapper structs to get and transform VCF format and info data into D types (data is copied).
  • New functions (getInfos and getFormats) to return a hashmap of all info and format data using new wrapper structs.
  • New function to remove info fields removeInfo.
  • New enum replacements for htslib BCF* enums.
  • lots of new unittests

Though I did introduce a lot of new features, I have tried to leave old functionality behind. (Though with the reorganization and the enums, the API will change for most of dhtslib.vcf).

Fixes:

Concerns:

  1. We sometimes report errors through hts_log and have the function return, sometimes we report errors through hts_log and have the function throw an exception, or sometimes we just throw an exception. We should decide formally on how this should be done across the whole package, Functional style Optional and Result return types #99 probably relevant.
  2. Naming in general. I am not sold on my naming of the new enums, I just wanted them to be more informative than the htslib enums.
  3. Could use more asserts and checks

@codecov-commenter
Copy link

codecov-commenter commented Sep 5, 2021

Codecov Report

Merging #105 (da10a93) into develop (b4e5e01) will increase coverage by 3.41%.
The diff coverage is 91.29%.

❗ Current head da10a93 differs from pull request most recent head 1768e09. Consider uploading reports for the commit 1768e09 to get more accurate results
Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #105      +/-   ##
===========================================
+ Coverage    80.40%   83.82%   +3.41%     
===========================================
  Files           40       40              
  Lines         2822     3424     +602     
===========================================
+ Hits          2269     2870     +601     
- Misses         553      554       +1     
Impacted Files Coverage Δ
source/dhtslib/sam/record.d 76.61% <79.59%> (-4.64%) ⬇️
source/dhtslib/sam/tagvalue.d 95.90% <83.33%> (-0.63%) ⬇️
source/dhtslib/vcf/record.d 91.27% <87.07%> (+0.05%) ⬆️
source/dhtslib/sam/reader.d 87.60% <94.44%> (+2.96%) ⬆️
source/dhtslib/vcf/header.d 95.58% <95.40%> (+70.58%) ⬆️
source/dhtslib/vcf/reader.d 96.92% <96.92%> (+2.05%) ⬆️
source/dhtslib/sam/cigar.d 89.76% <100.00%> (+0.33%) ⬆️
source/dhtslib/vcf/writer.d 100.00% <100.00%> (+47.54%) ⬆️
source/htslib/sam.d 52.17% <100.00%> (+2.17%) ⬆️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b4e5e01...1768e09. Read the comment docs.

Copy link
Member

@jblachly jblachly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything needs to be run through dfmt also

source/dhtslib/vcf/package.d Outdated Show resolved Hide resolved
source/dhtslib/vcf/package.d Show resolved Hide resolved
source/dhtslib/vcf/package.d Show resolved Hide resolved
source/dhtslib/vcf/package.d Outdated Show resolved Hide resolved
source/dhtslib/vcf/package.d Show resolved Hide resolved
source/dhtslib/vcf/header.d Show resolved Hide resolved
source/dhtslib/vcf/header.d Show resolved Hide resolved
source/dhtslib/vcf/header.d Outdated Show resolved Hide resolved
source/dhtslib/vcf/header.d Outdated Show resolved Hide resolved
source/dhtslib/vcf/header.d Show resolved Hide resolved
@charlesgregory
Copy link
Contributor Author

Alright I added several changes to the enums. They now refer directly to the htslib BCF_* enums. I expanded some of the enum names to make them more self-explanatory. I made a function for calling bcf_unpack to standardize it for all functions along with unittests to confirm it works.

I also changed the default MAX_UNPACK value to UnpackLevels.None as this should help performance by default and help us better catch bugs where we have not correctly unpacked data before using it. You may disagree with this sentiment. Let me know.

Still todo:

  • Decide on no-copy vs copy for InfoField and FormatField
  • Convert integration tests to actual unittests
  • move integration tests to tests folder
  • dfmt

Copy link
Member

@jblachly jblachly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ALmost there. There were a few conversations not addressed (minor stuff) and a few you did comment on that I replied to.

The only major change I am requesting is no-copy VCFRecord, to match SAMRecord ; I could be persuaded either way on INFO and FORMAT structs

source/dhtslib/vcf/package.d Outdated Show resolved Hide resolved
source/dhtslib/vcf/reader.d Show resolved Hide resolved
source/dhtslib/vcf/record.d Show resolved Hide resolved
source/dhtslib/vcf/record.d Show resolved Hide resolved
source/dhtslib/vcf/record.d Outdated Show resolved Hide resolved
These structs now hold references to underlying
info and format data. It follows a scheme similar to
Cigar and TagValue (scheme in develop not in this branch vcf_overhaul).
They increment and decrement the VCFRecord refct via a pointer
they recieve via ctor.
@charlesgregory
Copy link
Contributor Author

I think this should be ready to go.

@charlesgregory charlesgregory merged commit 9dd381c into develop Sep 27, 2021
@charlesgregory charlesgregory deleted the vcf_overhaul branch September 29, 2021 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

VCF Overhaul dhtslib.vcf.record poor test coverage VCFReader doesn't conform to Range rules
3 participants