This bedtools error troubled me a lot while parsing some data. I finally resolved it for my case, so I am writing down what went wrong and how I fixed it.
filename: test_genome_lengths
KP5_Contig1 1099843
KP5_Contig2 939199
KP5_Contig3 804334
KP5_Contig4 704755
KP5_Contig5 490858
KP5_Contig6 445261
KP5_Contig7 336421
KP5_Contig8 205120
KP5_Contig9 173756
KP5_Contig10 63375
KP5_Contig11 4752
filename: original.bed
KP5_Contig1 2 378871
KP5_Contig1 378872 812978
KP5_Contig1 814316 1099843
KP5_Contig10 27093 28206
KP5_Contig10 30740 42583
KP5_Contig10 43383 46800
KP5_Contig10 47283 51877
KP5_Contig10 52485 57209
KP5_Contig10 57496 57838
KP5_Contig11 1 902
KP5_Contig11 3859 4197
KP5_Contig11 4429 4752
KP5_Contig2 1 939199
KP5_Contig3 1 8672
bedtools complement -i original.bed -g test_genome_lengths
Error: Sorted input specified, but the file has the following out of order record with a different sort order than the genomeFile
KP5_Contig2 1 939199
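This happened because the contigs in my BED file were in a different order from the genome file. original.bed was sorted alphabetically, so KP5_Contig10 and KP5_Contig11 came before KP5_Contig2, while the genome file lists the contigs in numeric order (1, 2, 3, ... 10, 11). The order I needed was: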
filename: corrected.bed
KP5_Contig1 2 378871
KP5_Contig1 378872 812978
KP5_Contig1 814316 1099843
KP5_Contig2 1 939199
KP5_Contig3 1 8672
KP5_Contig10 27093 28206
KP5_Contig10 30740 42583
KP5_Contig10 43383 46800
KP5_Contig10 47283 51877
KP5_Contig10 52485 57209
KP5_Contig10 57496 57838
KP5_Contig11 1 902
KP5_Contig11 3859 4197
KP5_Contig11 4429 4752
bedtools complement -i corrected.bed -g test_genome_lengths
Now the command runs without the error.
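Before running bedtools, the contig order of a BED file can be checked against the genome file. The following is only a rough sketch: bed_follows_genome is a hypothetical helper name of my own, it assumes the contig name is the first whitespace-separated column in both files, and it checks contig order only, not start coordinates within a contig.

```shell
# bed_follows_genome GENOME BED  (hypothetical helper, not part of bedtools)
# Succeeds when the contigs in BED appear in the same relative order as in
# GENOME; otherwise prints the first offending record and fails.
bed_follows_genome() {
    awk '
        # First file: remember the line number (rank) of each contig.
        NR == FNR { rank[$1] = FNR; next }
        # Second file: every contig must be known and must not go backwards.
        !($1 in rank)   { print "unknown contig: " $1; bad = 1; exit }
        rank[$1] < last { print "out of order: " $0;   bad = 1; exit }
        { last = rank[$1] }
        END { exit bad }
    ' "$1" "$2"
}
```

It would be called as: bed_follows_genome test_genome_lengths original.bed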
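To get corrected.bed, I sorted original.bed numerically on the contig number and then on the start position. The contig number begins at character 11 of the first field, right after the 10-character prefix "KP5_Contig", so this only works when every contig name shares that prefix:
sort -k1.11,1n -k2,2n original.bed > corrected.bed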
For more information on this use of sort, see this Stack Overflow post.
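When contig names do not share a fixed-length prefix, one alternative is to take the order directly from the genome file instead of from the digits in the name. This is a minimal sketch under my own assumptions: sort_bed_by_genome is a hypothetical helper name, the contig name is the first column of both files, and the start coordinate is the second column of the BED file. Contigs missing from the genome file get an empty rank and sort to the top.

```shell
# sort_bed_by_genome GENOME BED  (hypothetical helper, not part of bedtools)
# Prints the BED records ordered by the contig order of GENOME, then by start.
sort_bed_by_genome() {
    # 1. awk: prefix each BED record with the line number of its contig in GENOME.
    # 2. sort: order by that rank (column 1), then by start (now column 3).
    # 3. cut: drop the temporary rank column again.
    awk 'NR == FNR { rank[$1] = FNR; next } { print rank[$1] "\t" $0 }' "$1" "$2" |
        sort -k1,1n -k3,3n |
        cut -f2-
}
```

It would be called as: sort_bed_by_genome test_genome_lengths original.bed > corrected.bed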