Thursday, 30 November 2017

bedtools intersect for large genomes works with the -sorted option

When trying to intersect two BED files for regions of the barley genome, I got the following error:


bedtools intersect -a A_input.bed -b B_input.bed 
ERROR: Received illegal bin number 37453 from getBin call.
ERROR: Unable to add record to tree.


This seems to be due to the very large size of the barley chromosomes, which are up to almost 770 Mbp long. Curiously, bedtools handles the same files without complaint when run with the -sorted option. Using -sorted is recommended anyway, because it makes bedtools intersect faster and more memory-efficient, but it requires both input files to be sorted by chromosome and then by start position.
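To see whether a file is likely to trigger the error, it can help to list the chromosomes whose features reach past a given coordinate. The sketch below assumes the limit is 512 Mbp (2^29 = 536870912 bp), which is my understanding of the range covered by the UCSC-style binning scheme; the error message itself does not state that threshold. The function name max_end_over_limit is made up for this post.

```shell
# Hypothetical helper: print "chrom<TAB>largest end" for every chromosome
# whose largest end coordinate exceeds a limit.
# The default limit of 2^29 bp is an assumption, not taken from bedtools.
max_end_over_limit() {
    bed="$1"
    limit="${2:-536870912}"
    awk -v limit="$limit" 'BEGIN { FS = OFS = "\t" }
        # skip comments and header-style lines with fewer than 3 columns
        NF < 3 || /^#/ { next }
        # remember the largest end coordinate seen per chromosome
        $3 + 0 > max[$1] { max[$1] = $3 + 0; str[$1] = $3 }
        END {
            for (chrom in max)
                if (max[chrom] > limit) print chrom, str[chrom]
        }' "$bed" | sort -k1,1
}
```

Calling max_end_over_limit A_input.bed prints nothing if all features end below the limit; a second argument overrides the limit.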

So, after sorting the files with

sort -k 1,1 -k2,2n input.bed

or the slower

bedtools sort -i input.bed

the intersection can be accomplished with

bedtools intersect -sorted -a A_input.bed -b B_input.bed  
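Put together, the whole procedure might look like the sketch below. The function names and the *_sorted.bed file names are placeholders of mine. Setting LC_ALL=C is my own addition: the order sort produces depends on the locale, and both files need to end up with the same chromosome order. The -s flag in the check is supported by GNU and BSD sort but is not part of POSIX.

```shell
# Sort a BED file by chromosome, then numerically by start position.
# LC_ALL=C forces byte-wise ordering so both files get the same chromosome order.
sort_bed() {
    LC_ALL=C sort -k1,1 -k2,2n "$1"
}

# Succeed (exit status 0) only if the file is already in that order.
# -s disables the whole-line tie-break, so only the two keys are checked.
is_sorted_bed() {
    LC_ALL=C sort -s -c -k1,1 -k2,2n "$1" 2>/dev/null
}

# Usage (file names are placeholders):
#   sort_bed A_input.bed > A_sorted.bed
#   sort_bed B_input.bed > B_sorted.bed
#   is_sorted_bed A_sorted.bed && is_sorted_bed B_sorted.bed &&
#       bedtools intersect -sorted -a A_sorted.bed -b B_sorted.bed
```

Writing the sorted output to new files keeps the original inputs untouched, and the check is a cheap safeguard before starting a long intersect run.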
