Friday, August 5, 2016

NCBI BLAST error with pipe symbols in FASTA headers

I am running NCBI BLAST version 2.2.28 for my analysis. I encountered an error while building the database for a local BLAST search.

Error:

$ ~/Documents/Blast/makeblastdb -in HSA_GRCh37.p13_EnsGenes.fasta -dbtype nucl -out HSA_GRCh37.p13_EnsGenes.BlastDB -parse_seqids

Building a new DB, current time: 08/05/2016 11:11:15
New DB name:   HSA_GRCh37.p13_EnsGenes.BlastDB
New DB title:  HSA_GRCh37.p13_EnsGenes.fasta
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1073741824B

No volumes were created because no sequences were found.


Error: NCBI C++ Exception:

    "/am/ncbiapdata/release/blast/src/2.2.25/Linux64-Suse-icc/c++/ICC1010-ReleaseMT64--Linux64-Suse-icc/../src/objects/seq/../seqloc/Seq_id.cpp", line 1637: Error: ncbi::objects::CSeq_id::x_Init() - Unsupported ID type ENSG00000003137

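The FASTA sequences themselves were fine; the problem was in the headers, which contained the pipe symbol "|". With -parse_seqids, makeblastdb most likely tries to read a pipe-delimited header as an NCBI-style sequence identifier, so the Ensembl gene ID before the first pipe was rejected as an unsupported ID type. I replaced all the pipes with an underscore "_":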


$ sed -i 's/|/_/g' HSA_GRCh37.p13_EnsGenes.fasta
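
The sed command above edits the file in place and touches every line. A slightly more cautious variant, shown here only as a sketch, restricts the substitution to header lines (those starting with ">") and writes to a new file so the original is kept. The function name sanitize_fasta_headers and the file names example.fasta and example.clean.fasta are made up for illustration and are not part of BLAST.

```shell
# Hypothetical helper: replace '|' with '_' on FASTA header lines only.
# The result goes to stdout; the input file is left untouched.
sanitize_fasta_headers() {
    sed '/^>/ s/|/_/g' "$1"
}

# Build a small made-up FASTA file to try it on.
printf '>seq1|geneA\nACGT\n>seq2|geneB\nTTGA\n' > example.fasta

# Count header lines that contain a pipe before cleaning.
grep -c '^>.*|' example.fasta

# Write the cleaned copy next to the original.
sanitize_fasta_headers example.fasta > example.clean.fasta
```

After checking the cleaned file, it can be passed to makeblastdb in place of the original.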

After that, the database was created successfully.

$ ~/Documents/Blast/makeblastdb -in HSA_GRCh37.p13_EnsGenes.fasta -dbtype nucl -out HSA_GRCh37.p13_EnsGenes.BlastDB -parse_seqids


Building a new DB, current time: 08/05/2016 11:12:05

New DB name:   HSA_GRCh37.p13_EnsGenes.BlastDB
New DB title:  HSA_GRCh37.p13_EnsGenes.fasta
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1073741824B
Adding sequences from FASTA; added 215647 sequences in 15.894 seconds.

