This error troubled me for a long time while parsing some data. I finally resolved it, at least for my case.
filename: test_genome_lengths
KP5_Contig1    1099843
KP5_Contig2    939199
KP5_Contig3    804334
KP5_Contig4    704755
KP5_Contig5    490858
KP5_Contig6    445261
KP5_Contig7    336421
KP5_Contig8    205120
KP5_Contig9    173756
KP5_Contig10    63375
KP5_Contig11    4752
filename: original.bed 
KP5_Contig1    2    378871
KP5_Contig1    378872    812978
KP5_Contig1    814316    1099843
KP5_Contig10    27093    28206
KP5_Contig10    30740    42583
KP5_Contig10    43383    46800
KP5_Contig10    47283    51877
KP5_Contig10    52485    57209
KP5_Contig10    57496    57838
KP5_Contig11    1    902
KP5_Contig11    3859    4197
KP5_Contig11    4429    4752
KP5_Contig2    1    939199
KP5_Contig3    1    8672
bedtools complement -i original.bed -g test_genome_lengths
Error: Sorted input specified, but the file has the following out of order record with a different sort order than the genomeFile
KP5_Contig2    1    939199
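The error happens because bedtools complement expects the chromosomes in the BED file to appear in the same order as in the genome file. My BED file was sorted lexicographically (KP5_Contig10 and KP5_Contig11 come before KP5_Contig2), while test_genome_lengths lists the contigs in numeric order. The correct order I needed to input was: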
filename: corrected.bed
KP5_Contig1    2    378871
KP5_Contig1    378872    812978
KP5_Contig1    814316    1099843
KP5_Contig2    1    939199
KP5_Contig3    1    8672 
KP5_Contig10    27093    28206
KP5_Contig10    30740    42583
KP5_Contig10    43383    46800
KP5_Contig10    47283    51877
KP5_Contig10    52485    57209
KP5_Contig10    57496    57838
KP5_Contig11    1    902
KP5_Contig11    3859    4197
KP5_Contig11    4429    4752
bedtools complement -i corrected.bed -g test_genome_lengths
Now the command runs without the error.
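To check ahead of time whether a BED file is in the order bedtools expects, one option is to compare the order in which chromosomes first appear in the BED file with their order in the genome file. The sketch below recreates a trimmed version of the two files from this post as tab-separated text (an assumption, since the post shows the columns separated by spaces); bed_order.txt and genome_order.txt are just illustrative names.

```shell
#!/usr/bin/env bash
# Sketch: compare the chromosome order of a BED file with the genome file order.
# Assumes both files are tab-separated.
set -euo pipefail

# Trimmed, hypothetical copies of the post's inputs (tab-separated).
printf '%s\t%s\n' KP5_Contig1 1099843 KP5_Contig2 939199 KP5_Contig3 804334 \
  KP5_Contig10 63375 KP5_Contig11 4752 > test_genome_lengths
printf '%s\t%s\t%s\n' KP5_Contig1 2 378871 KP5_Contig10 27093 28206 \
  KP5_Contig11 1 902 KP5_Contig2 1 939199 KP5_Contig3 1 8672 > original.bed

# Order in which chromosomes first appear in the BED file.
cut -f1 original.bed | uniq > bed_order.txt

# Genome-file order, keeping only chromosomes that also occur in the BED file.
awk 'NR == FNR { in_bed[$1] = 1; next } ($1 in in_bed) { print $1 }' \
  bed_order.txt test_genome_lengths > genome_order.txt

if diff bed_order.txt genome_order.txt > /dev/null; then
  echo "BED chromosome order matches the genome file"
else
  echo "BED chromosome order differs from the genome file (BED | genome):"
  paste bed_order.txt genome_order.txt
fi
```

For the original.bed shown here, the second line already differs: KP5_Contig10 in the BED file versus KP5_Contig2 in the genome file.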
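To get corrected.bed, I sorted numerically on the contig number and then on the start coordinate. The key -k1.11,1n starts at character 11 of the first field, which is where the number begins in names such as KP5_Contig10:
sort -k1.11,1n -k2,2n original.bed > corrected.bed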
For more on these sort options, check this Stack Overflow post.
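The -k1.11 key only works because every contig name has exactly ten characters before the number. A more general option is to sort by each chromosome's position in the genome file itself, which is the order bedtools complement checks against. This is my own awk/sort sketch, not a bedtools feature; it assumes tab-separated files and that every chromosome in the BED file is also listed in the genome file (a missing name gets an empty rank and would not be placed correctly).

```shell
#!/usr/bin/env bash
# Sketch: sort a BED file into the chromosome order of a genome file,
# then numerically by start coordinate. Assumes tab-separated input.
set -euo pipefail

# Trimmed, hypothetical copies of the post's inputs (tab-separated).
printf '%s\t%s\n' KP5_Contig1 1099843 KP5_Contig2 939199 KP5_Contig3 804334 \
  KP5_Contig10 63375 KP5_Contig11 4752 > test_genome_lengths
printf '%s\t%s\t%s\n' KP5_Contig1 378872 812978 KP5_Contig1 2 378871 \
  KP5_Contig10 27093 28206 KP5_Contig11 1 902 KP5_Contig2 1 939199 \
  KP5_Contig3 1 8672 > original.bed

# 1. Prefix each BED line with the rank (line number) of its chromosome
#    in the genome file.
# 2. Sort by that rank, then numerically by start (now column 3).
# 3. Drop the rank column again.
awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { rank[$1] = FNR; next }
     { print rank[$1], $0 }' test_genome_lengths original.bed \
  | sort -t $'\t' -k1,1n -k3,3n \
  | cut -f2- > corrected.bed

cat corrected.bed
```

On GNU systems, sort -k1,1V -k2,2n original.bed > corrected.bed (version sort on the contig name) also gives the KP5_Contig1 to KP5_Contig11 order here, but only because this genome file happens to list the contigs in natural numeric order. The rank-based approach follows whatever order the genome file uses.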