Missing GFF file for SPECTRA_MSD/TB_COVID_GFP #2

Open
Tasmin153 opened this issue May 1, 2024 · 2 comments

Comments

@Tasmin153

Hi,

I am currently working on phenotype prediction from unique strains of antitubercular agents. Thanks for this great work; SPECTRA would be helpful for testing the model that I am working on. I was trying to reproduce the analysis for the TB drug splits and have downloaded the required data files from the dataverse page. However, "run_baseline.py" refers to some files that I cannot find, such as input_gff_file, reference_nucleotide, and full_reference_sequence. Could you let me know how I can get these files and run "run_baseline.py" correctly?

Input GFF File
reference_nucleotide

@yashaektefaie
Collaborator

Apologies for the delayed response! You do not actually need those files. They were needed back when I ran the data processing step by step, but I only did that once and now provide the processed data files. So if you pass None for those entries, the script should run. If there are more errors, though, let me know!
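
As a rough illustration of the suggestion above, the three entries could be set to None before the parameters are unpacked into run_baseline. The traceback later in this thread shows the script calling run_baseline(**params_to_use), and the three parameter names come from the original question. Everything else here is an assumption, including the stand-in function body and the way None is handled, and it is not the repository's actual code.

```python
# Hypothetical stand-in for the real run_baseline; the actual signature in
# run_baseline.py may differ. Only the three parameter names come from the issue.
def run_baseline(input_gff_file=None, reference_nucleotide=None,
                 full_reference_sequence=None, **other_params):
    raw_inputs = {
        "input_gff_file": input_gff_file,
        "reference_nucleotide": reference_nucleotide,
        "full_reference_sequence": full_reference_sequence,
    }
    # Assumed behaviour: entries left as None are skipped, so only the
    # raw inputs that were explicitly supplied would need to be read.
    return [name for name, path in raw_inputs.items() if path is not None]


# Pass None for the raw inputs that are no longer required.
params_to_use = {
    "input_gff_file": None,
    "reference_nucleotide": None,
    "full_reference_sequence": None,
}
needed_raw_files = run_baseline(**params_to_use)
print(needed_raw_files)
```

With all three entries set to None, the sketch asks for no raw input files, which mirrors the idea that the provided processed data files are sufficient.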

@Tasmin153
Author

Hi, I am trying to run the script with the following command: python run_baseline.py 0 INH logistic_regression 0.1 binary --trial_run True. Several files are missing from the SPECTRA_paper/SPECTRA_MSD/TB_Covid_GFP/utils folder: callbacks.py, constants.py, generate_barcode.py, mod_alignment_utils.py, and check_mutational_splits.py. I am guessing most of these methods have been moved to general_utility_functions.py (https://github.com/mims-harvard/SPECTRA/blob/75b59639dcae6adad92af4a34313a75196a2659c/SPECTRA_paper/SPECTRA_MSD/TB_Covid_GFP/utils/general_utility_functions.py). For now, I am commenting out those import statements and importing this file instead.

However, I could not find "GenerateBarcode", as referenced in this line:

self.fetcher = GenerateBarcode(drug, use_pregenerated)
in any of the utils files or in the Sequence_Dataset.py file. I tried commenting it out, but that does not work. There are some barcode-related methods in the Sequence_Dataset file, but I could not find one matching these parameters: GenerateBarcode(drug, use_pregenerated).

Traceback (most recent call last):
  File "run_baseline.py", line 395, in <module>
    run_baseline(**params_to_use)
  File "run_baseline.py", line 140, in run_baseline
    sequence_dataset.initialize_encoder()
  File "./SPECTRA/SPECTRA_paper/SPECTRA_MSD/TB_Covid_GFP/run/Sequence_Dataset.py", line 109, in initialize_encoder
    all_train_outputs = [self.__getitem__(i) for i in tqdm(self.return_train_strains(), total=len(self.train_strains))]
  File "./SPECTRA/SPECTRA_paper/SPECTRA_MSD/TB_Covid_GFP/run/Sequence_Dataset.py", line 109, in <listcomp>
    all_train_outputs = [self.__getitem__(i) for i in tqdm(self.return_train_strains(), total=len(self.train_strains))]
  File "./SPECTRA/SPECTRA_paper/SPECTRA_MSD/TB_Covid_GFP/run/Sequence_Dataset.py", line 376, in __getitem__
    sequences = self.get_sequences_barcode(i)
  File "./SPECTRA/SPECTRA_paper/SPECTRA_MSD/TB_Covid_GFP/run/Sequence_Dataset.py", line 370, in get_sequences_barcode
    return self.fetcher.barcode(strain)
AttributeError: 'Sequence_Dataset' object has no attribute 'fetcher'
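
The traceback above is what Python raises when an attribute is read without ever having been assigned: once the self.fetcher = GenerateBarcode(drug, use_pregenerated) line is commented out, self.fetcher no longer exists by the time get_sequences_barcode runs. The sketch below reproduces that situation with a heavily simplified, hypothetical stand-in class and shows one way to fail with a clearer message. SequenceDatasetSketch and DummyFetcher are illustrative names only; neither is part of SPECTRA, and DummyFetcher is not a replacement for the real GenerateBarcode.

```python
class SequenceDatasetSketch:
    """Hypothetical, heavily simplified stand-in for Sequence_Dataset."""

    def __init__(self, fetcher=None):
        # In the real class, this is roughly where
        # `self.fetcher = GenerateBarcode(drug, use_pregenerated)` appears.
        # If that assignment is commented out, the attribute is never created.
        if fetcher is not None:
            self.fetcher = fetcher

    def get_sequences_barcode(self, strain):
        # Check for the attribute first, so a missing fetcher produces a
        # descriptive error instead of a bare AttributeError.
        fetcher = getattr(self, "fetcher", None)
        if fetcher is None:
            raise RuntimeError(
                "No barcode fetcher is configured; the GenerateBarcode "
                "assignment appears to be missing or commented out."
            )
        return fetcher.barcode(strain)


class DummyFetcher:
    """Placeholder exposing the one method the traceback shows being called."""

    def barcode(self, strain):
        return f"barcode-for-{strain}"


# With a fetcher supplied, the lookup succeeds.
dataset = SequenceDatasetSketch(fetcher=DummyFetcher())
result = dataset.get_sequences_barcode("strain_1")

# Without one, the guarded method reports what is actually missing.
try:
    SequenceDatasetSketch().get_sequences_barcode("strain_1")
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)
```

A guard like this does not resolve the missing GenerateBarcode class itself; it only makes the failure easier to diagnose until the original generate_barcode.py module is available.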
