Thursday, 30 November 2017

bedtools intersect for large genomes works with the -sorted option

When trying to intersect two BED files covering regions of the barley genome, I got the following error:


bedtools intersect -a A_input.bed -b B_input.bed 
ERROR: Received illegal bin number 37453 from getBin call.
ERROR: Unable to add record to tree.


This seems to be due to the very large size of the barley chromosomes, which are up to almost 770 Mbp long. Curiously, bedtools can handle the files when intersecting with the -sorted option. Using -sorted is recommended anyway, because it makes bedtools intersect faster and more memory-efficient.

So, after sorting the files with

sort -k 1,1 -k2,2n input.bed

or the slower

bedtools sort -i input.bed

the intersection can be accomplished with

bedtools intersect -sorted -a A_input.bed -b B_input.bed  
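
The steps above can be put together in a minimal sketch. The output file names (A_sorted.bed, B_sorted.bed, intersection.bed) and the sort_bed helper are made up for illustration. LC_ALL=C is my own addition, on the assumption that both files must end up in exactly the same chromosome order for -sorted to work. A fixed locale keeps sort from ordering names differently on different machines.

```shell
# Sort a BED file by chromosome name, then numerically by start position.
# LC_ALL=C is an assumption added here to get the same order everywhere.
sort_bed() {
    LC_ALL=C sort -k1,1 -k2,2n "$1"
}

# Only run the real thing if bedtools and both input files are present.
# A_sorted.bed, B_sorted.bed and intersection.bed are hypothetical names.
if command -v bedtools >/dev/null 2>&1 && [ -f A_input.bed ] && [ -f B_input.bed ]; then
    sort_bed A_input.bed > A_sorted.bed
    sort_bed B_input.bed > B_sorted.bed
    bedtools intersect -sorted -a A_sorted.bed -b B_sorted.bed > intersection.bed
fi
```

Writing the sorted data to new files keeps the unsorted originals around in case something goes wrong.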

Thursday, 22 June 2017

Run a jupyter notebook on a compute-cluster, use it locally




For those of you who want to run a jupyter-notebook on a cluster-server, so that you can run large processes or submit cluster-jobs directly from within the notebook:


- Start a shell.
- Type "ssh -N -f -L localhost:8888:localhost:8889 urusrname@hpc002" (with your own user name and server) to open a "tunnel" between the server and your computer.
Don't worry if you don't see anything: -N runs no remote command and -f sends ssh to the background. The tunnel is there, it's just invisible (magic).
- Then, in the same shell, type "ssh urusrname@hpc002" to log in to the same server.
- Now start a jupyter-notebook, but without opening it in the browser. Instead, make it listen on the tunnel's port: "jupyter-notebook --no-browser --port=8889"
You get output like this:


"    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://localhost:8889/?token=e76a6b5e3b22d1ae1cc985be277c2d81e120faf10fa0014a
"
Open a browser and paste the link into it, replacing localhost:8889 with localhost:8888 (the local end of the tunnel).


Voila.
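
If you do this often, the two fiddly bits (the tunnel command and swapping the port in the URL) can be wrapped in small helpers. This is only a sketch: the variable names and the functions tunnel_command and local_url are my own invention, and the user name, host and ports are the placeholders from the steps above.

```shell
# Placeholders taken from the steps above; adjust to your own setup.
REMOTE_USER="urusrname"
REMOTE_HOST="hpc002"
LOCAL_PORT=8888    # port on your own computer
REMOTE_PORT=8889   # port the notebook listens on at the server

# Print the ssh command that opens the tunnel (it is not executed here).
tunnel_command() {
    printf 'ssh -N -f -L localhost:%s:localhost:%s %s@%s\n' \
        "$LOCAL_PORT" "$REMOTE_PORT" "$REMOTE_USER" "$REMOTE_HOST"
}

# Rewrite the URL printed by jupyter so it points at the local tunnel end.
local_url() {
    printf '%s\n' "$1" | sed "s/localhost:${REMOTE_PORT}/localhost:${LOCAL_PORT}/"
}

tunnel_command
local_url "http://localhost:8889/?token=e76a6b5e3b22d1ae1cc985be277c2d81e120faf10fa0014a"
```

The helpers only print text, so you can check the command and the URL before actually running or opening anything.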

Inspired by
https://coderwall.com/p/ohk6cg/remote-access-to-ipython-notebooks-via-ssh

Wednesday, 15 February 2017

Install R locally with anaconda to use ballgown for RNA-Seq

A mail I just sent to a couple of colleagues, explaining how I installed the RNA-Seq analysis package ballgown on Debian "Jessie".
It should work on Unix systems in general.



Hello,

after mapping the RNA-Seq reads with HISAT2
and calculating transcripts and read counts with StringTie,
it is time to analyse the data with

ballgown
https://github.com/alyssafrazee/ballgown


First, it needs to be installed.
This is not straightforward, because it cannot be installed in
the default R. At least it didn't work for me.

This is how I did it:

install anaconda (this automatically installs a local R version in
/home/ries/anaconda2/)
 
https://docs.continuum.io/anaconda/install


install the most important R packages
(I found this command, which installs a number of popular R packages):
conda install -c r r-essentials

it also automatically installs the bioconductor installer,
which can then be used to install ballgown:

start your local R:
/home/ries/anaconda2/bin/R

and from within R:
source("http://bioconductor.org/biocLite.R")
biocLite("ballgown")
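
For the record, the steps after installing anaconda could be scripted roughly like this. It is a sketch only, not what I actually ran: the run wrapper, the DRY_RUN switch and the install_ballgown function are made up for illustration, and I assume that Rscript sits next to R in /home/ries/anaconda2/bin/. By default it only prints the commands.

```shell
# Path from the mail above; change it to your own anaconda directory.
ANACONDA_HOME="/home/ries/anaconda2"
# Hypothetical switch: 1 = only print the commands, 0 = really run them.
DRY_RUN="${DRY_RUN:-1}"

# Print a command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

install_ballgown() {
    # Install R and a set of common R packages through conda.
    run conda install -c r r-essentials
    # Install ballgown from within the anaconda R (assumes Rscript is there).
    run "$ANACONDA_HOME/bin/Rscript" -e 'source("http://bioconductor.org/biocLite.R"); biocLite("ballgown")'
}

install_ballgown
```

Set DRY_RUN=0 before running the script once the printed commands look right.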


Good bye from
ries@home