Tuesday, November 20, 2018

Error: Sorted input specified, but the file has the following out of order record with a different sort order than the genomeFile

This error gave me a lot of trouble while parsing some data. Here is how I finally resolved it in my case.

filename: test_genome_lengths
KP5_Contig1    1099843
KP5_Contig2    939199
KP5_Contig3    804334
KP5_Contig4    704755
KP5_Contig5    490858
KP5_Contig6    445261
KP5_Contig7    336421
KP5_Contig8    205120
KP5_Contig9    173756
KP5_Contig10    63375
KP5_Contig11    4752

filename: original.bed
KP5_Contig1    2    378871
KP5_Contig1    378872    812978
KP5_Contig1    814316    1099843
KP5_Contig10    27093    28206
KP5_Contig10    30740    42583
KP5_Contig10    43383    46800
KP5_Contig10    47283    51877
KP5_Contig10    52485    57209
KP5_Contig10    57496    57838
KP5_Contig11    1    902
KP5_Contig11    3859    4197
KP5_Contig11    4429    4752
KP5_Contig2    1    939199
KP5_Contig3    1    8672

bedtools complement -i original.bed -g test_genome_lengths

Error: Sorted input specified, but the file has the following out of order record with a different sort order than the genomeFile
KP5_Contig2    1    939199

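The error occurs because the contigs in my BED file were not in the same order as in the genome file. original.bed is in lexical order, which puts KP5_Contig10 and KP5_Contig11 before KP5_Contig2, while the genome file lists the contigs in numeric order (KP5_Contig1, KP5_Contig2, and so on up to KP5_Contig11). bedtools stops at the first record that breaks the genome file's order, here the KP5_Contig2 line. The correct order I needed to input was: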

filename: corrected.bed
KP5_Contig1    2    378871
KP5_Contig1    378872    812978
KP5_Contig1    814316    1099843
KP5_Contig2    1    939199
KP5_Contig3    1    8672
KP5_Contig10    27093    28206
KP5_Contig10    30740    42583
KP5_Contig10    43383    46800
KP5_Contig10    47283    51877
KP5_Contig10    52485    57209
KP5_Contig10    57496    57838
KP5_Contig11    1    902
KP5_Contig11    3859    4197
KP5_Contig11    4429    4752

bedtools complement -i corrected.bed -g test_genome_lengths

Now the command runs without the error.
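
To check ahead of time whether a BED file follows the genome file's contig order, something like the sketch below can help. check_bed_order is a hypothetical helper name of mine, not part of bedtools. It assumes whitespace-separated columns with the contig name in column 1 and the start in column 2, and it only approximates the check bedtools performs.

```shell
# Hypothetical helper (not a bedtools command): report the first BED record
# whose contig order disagrees with the genome file, or whose start position
# goes backwards within a contig.
# Usage: check_bed_order GENOME_FILE BED_FILE
check_bed_order() {
    awk '
        # First file (genome file): remember each contig position in the list.
        NR == FNR { rank[$1] = FNR; next }
        # Second file (BED): a contig missing from the genome file is an error.
        !($1 in rank) { print "unknown contig: " $0; bad = 1; exit }
        # Contig appears earlier in the genome file than the previous record,
        # or same contig with a smaller start than the previous record.
        rank[$1] < prev || (rank[$1] == prev && $2 + 0 < prev_start) {
            print "out of order: " $0; bad = 1; exit
        }
        { prev = rank[$1]; prev_start = $2 + 0 }
        # Exit status 0 when everything is in order, 1 otherwise.
        END { exit bad + 0 }
    ' "$1" "$2"
}
```

Called as check_bed_order test_genome_lengths original.bed, this should print the KP5_Contig2 line, the same record bedtools complained about, and print nothing for corrected.bed.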

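To get corrected.bed, I sorted original.bed numerically on two keys. The first key starts at character 11 of column 1, just past the 10-character prefix "KP5_Contig", so the contig number is compared as a number. The second key is the start coordinate in column 2:

sort -k1.11,1n -k2,2n original.bed > corrected.bed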

For more info on this sort command, check this Stack Overflow post.
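
The sort key above depends on every contig name sharing the same 10-character prefix. As a more general sketch, the order can be taken from the genome file itself. sort_bed_by_genome is a hypothetical helper name, and the sketch assumes whitespace-separated columns with the contig name first in both files.

```shell
# Hypothetical helper: reorder a BED file so its contigs follow the order of
# the genome file, with records sorted by start position within each contig.
# Usage: sort_bed_by_genome GENOME_FILE BED_FILE > sorted.bed
sort_bed_by_genome() {
    # 1. awk prefixes every BED record with the line number at which its
    #    contig appears in the genome file (contigs missing from the genome
    #    file get an empty prefix and therefore sort first).
    # 2. sort orders by that prefix, then by start (now the third column).
    # 3. cut drops the prefix again, leaving the original record untouched.
    awk 'NR == FNR { rank[$1] = FNR; next }
         { print rank[$1] "\t" $0 }' "$1" "$2" |
        sort -k1,1n -k3,3n |
        cut -f2-
}
```

With the files from this post, sort_bed_by_genome test_genome_lengths original.bed > corrected.bed should give the same order as corrected.bed above.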