CreateScaffoldedFasta.pl error #46

Open
bhagya-ct opened this issue Apr 25, 2018 · 2 comments

Comments

@bhagya-ct

mml@mml:/media/mml/6f60ef75-45fb-4532-9f2a-1a5d642a3093/3C_data/Ctrp_WT$ CreateScaffoldedFasta.pl PacBio_denovo.fasta out
Wed Apr 25 14:14:42 2018: CreateScaffoldedFasta.pl with input fasta = PacBio_denovo.fasta, OUTPUT_DIR = out
Wed Apr 25 14:14:42 2018: Found 7 ordering files ('group*.ordering' in out/main_results/).
Wed Apr 25 14:14:42 2018: Reading in sequences from assembly file PacBio_denovo.fasta
Wed Apr 25 14:14:42 2018: Found 141 contigs/scaffolds in assembly.
ERROR: Ordering file out/main_results/group0.ordering includes contig named 'tig00000015', not found in fasta file PacBio_denovo.fasta
Wed Apr 25 14:14:42 2018: Creating a scaffold from file out/main_results/group0.ordering...

However, PacBio_denovo.fasta does contain tig00000015.

I am unable to figure out how to fix this.

Bhagya C T

@phillip-mcclurg-driscolls

I have run into this problem as well with the FASTA output of FALCON. The function "LoadFasta" does not appear to parse the header lines correctly: instead of splitting off the contig name (the text immediately following ">"), it stores the entire header line (without ">") in the variable contig_name, so looking up a name such as 'tig00000015' from an ordering file fails. The following modification of "LoadFasta" keeps only the first space-delimited word of each header, and with this change I successfully created the Lachesis assembly FASTA file. I had not looked at Perl code for some time, so this is a workaround, perhaps not the solution the authors would have chosen:

# LoadFasta: Convert a fasta file to contigs.
# Outputs:
# 1. An array of contig names.
# 2. A hash of contig name to contig sequence.
sub LoadFasta( $ ) {

    #print localtime() . ": LoadFasta: $_[0]\n";

    open IN, '<', $_[0] or die "ERROR: LoadFasta: Can't open $_[0]: $!";

    my $contig_name;
    my @contig_names;
    my %contig_seqs;
    while (<IN>) {
        chomp;
        if ( /^\>(.+)/ ) {
            # Header line: keep only the first space-delimited token
            # (e.g. 'tig00000015') and drop any metadata after the name,
            # so the key matches the names in the group*.ordering files.
            my $header = $1;
            ($contig_name) = split (/ /, $header);
            push @contig_names, $contig_name;
        }
        else {
            # Sequence line: append it to the current contig.
            $contig_seqs{$contig_name} .= $_;
        }
    }

    close IN;

    die "ERROR: LoadFasta: Couldn't parse file $_[0] properly.  Are you sure this is a FASTA file?" unless scalar @contig_names >= 1 && scalar keys %contig_seqs >= 1;

    return ( \@contig_names, \%contig_seqs );
}
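A minimal standalone Perl sketch of why the lookup fails, separate from Lachesis itself: it keys one hash by the whole header text and another by the first space-delimited token, then looks up the contig name as it would appear in an ordering file. The header string and its metadata fields are hypothetical; real assemblers append different fields after the name.

```perl
use strict;
use warnings;

# Hypothetical FASTA header; the fields after the name are illustrative only.
my $header = '>tig00000015 len=123456 reads=789 class=contig';

# Name as it would be listed in a group*.ordering file (assumed for this demo).
my $name_in_ordering = 'tig00000015';

# Behaviour described above: everything after '>' becomes the hash key.
my ($whole) = $header =~ /^>(.+)/;
my %by_whole = ( $whole => 'ACGT' );

# Workaround: keep only the first space-delimited token as the hash key.
my ($first) = split / /, $whole;
my %by_first = ( $first => 'ACGT' );

printf "whole-header key found: %s\n", exists $by_whole{$name_in_ordering} ? 'yes' : 'no';
printf "first-token key found:  %s\n", exists $by_first{$name_in_ordering} ? 'yes' : 'no';
```

Running it should print "no" for the whole-header key and "yes" for the first-token key, which mirrors the "not found in fasta file" error even though the contig is present.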

I hope this helps!

@bhagya-ct
Author

@pwmcclurg,

Thank you for your reply. I was able to fix the error with the help of my friend and extracted the .FASTA file.
