Naive python implementation of a de Bruijn Graph

Neat implementation of simple de Bruijn assembler in Python

Bits of Bioinformatics

De Bruijn graphs are widely used in assembly algorithms, and I guess writing your own assembler is a rite of passage. I got this question from Mick Watson, who was hoping for some answers. Since I had a script lying around that I had used for validation, I thought I would share it.

So let’s see how this is done. First we’ll need to work with k-mers, which are substrings of a fixed length k. Since we’re keeping this as simple as possible, we’ll use Python strings for k-mers. Here are some functions to work with those:

The yield statement in Python gives us an iterator object, so we can write stuff like

which will print TATA, TATC, TATG and TATT, i.e. all forward neighbors of the k-mer “ATAT”. If we need to convert the iterator to a list the easiest thing to do is list(fw("ATAT")).
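The excerpt above refers to code that isn’t shown here; a minimal sketch of what the `fw` helper (and a matching backward-neighbor helper, a name assumed here) might look like, using plain Python strings as described:

```python
# Sketch of the k-mer helpers the excerpt describes; only fw("ATAT") is
# named in the text, so bw is an assumed counterpart for illustration.
def fw(km):
    """Yield the four forward neighbors of k-mer km: drop the first base, append each base."""
    for x in 'ACGT':
        yield km[1:] + x

def bw(km):
    """Yield the four backward neighbors of k-mer km: drop the last base, prepend each base."""
    for x in 'ACGT':
        yield x + km[:-1]

print(list(fw("ATAT")))  # ['TATA', 'TATC', 'TATG', 'TATT']
```

Because `fw` is a generator, iterating over it (`for km in fw("ATAT"): print(km)`) prints the neighbors one at a time without building a list.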

To keep track of all the k-mers…
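The excerpt cuts off here. One common way to keep track of all k-mers, sketched under the assumption (typical for toy assemblers) that a dict keyed by k-mer string holds the counts — the reads and helper below are illustrative, not from the original post:

```python
from collections import Counter

def kmers(seq, k):
    """Yield every k-mer (substring of length k) of a sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

# Count k-mers across a toy read set (illustrative data)
reads = ["ATATC", "TATCG"]
k = 4
counts = Counter(km for read in reads for km in kmers(read, k))
print(counts)  # TATC occurs in both reads, so it has count 2
```

From such a table, the de Bruijn graph edges are simply the pairs (km, neighbor) where a k-mer’s forward neighbor is also present in the table.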



A hybrid model for a High-Performance Computing infrastructure for bioinformatics

In between lines of code

I work for the Norwegian High-Throughput Sequencing Centre (NSC), but at the Centre for Ecological and Evolutionary Synthesis (CEES). At CEES, numerous researchers run bioinformatic analyses, or other computation-heavy analyses, for their projects. With this post, I want to describe the infrastructure we use for calculations and storage, and the reason why we chose to set these up the way we did.

In general, when one needs high-performance computing (HPC) infrastructure, a (group of) researcher(s) can purchase machines and locate them in or around the office, or use a cloud solution. Many, if not most, universities offer a computer cluster for their researchers’ analysis needs. We chose a hybrid model between using the university’s HPC infrastructure and setting up our own. In other words, our infrastructure is a mix of self-owned and shared resources that we either apply for, or rent.


Reading documents on Kindle

Recently, I wanted to read some documents (e.g. from Google Docs) on my Kindle. I wanted to export to .mobi, as reading PDFs on a Kindle is really annoying.
I found that converting .doc into .mobi is not a trivial task. Luckily, nearly everything is possible in Ubuntu with a few steps:

  1. Download your document as .doc / .docx
  2. Open the .doc / .docx in LibreOffice / OpenOffice and export to ePub (you’ll need to install the Writer2ePub extension first)
  3. Import the ePub into Calibre and export it to your favourite eBook reader.

Inspired by Quora.

Transfer WordPress to Amazon EC2

After a rather successful year of using WordPress, I have decided to move my blog to AWS. I had been considering the move for a long time, motivated by the Free Tier, and finally I found some time to do it.

At first, I created a WordPress stack using CloudFormation, but I personally prefer Ubuntu over Amazon Linux, so I will focus on the configuration of an Ubuntu EC2 instance here.

  1. Export your existing blog
    WP-Admin > Tools > Export

  2. Log in to the AWS console and create a Key Pair
  3. Launch EC2 instance
    I use Ubuntu HVM. I recommend t2.micro, as it’s free for the first year. You should specify the key you created/uploaded.

  4. Log in to your EC2 instance using its Public DNS or IP and your key
    [bash]ssh -i .aws/your_key.pem ubuntu@ec2xxxxx.compute.amazonaws.com[/bash]
    NOTE: your key should be readable only by you. To achieve that, you can do:
    [bash]chmod 600 .aws/your_key.pem[/bash]

  5. Configure Ubuntu
    [bash]
    sudo apt-get update && sudo apt-get upgrade
    sudo apt-get install apache2 php5 php5-mysql libapache2-mod-php5 libapache2-mod-auth-mysql mysql-server
    [/bash]

  6. Configure MySQL
    [bash]
    sudo mysql_secure_installation
    mysql -uroot -p

    CREATE DATABASE wordpress;
    CREATE USER 'wordpress' IDENTIFIED BY 'SOMEPASS';
    GRANT ALL ON wordpress.* TO 'wordpress';
    [/bash]

  7. Configure wordpress
    [bash]
    sudo -i
    cd /var/www/html/
    wget https://wordpress.org/latest.tar.gz
    tar xpfz latest.tar.gz
    rm latest.tar.gz
    cd wordpress/
    mv wp-config-sample.php wp-config.php
    sudo chown -R www-data:www-data /var/www/html

    # edit wp-config.php
    define('DB_NAME', 'wordpress');
    define('DB_USER', 'wordpress');
    define('DB_PASSWORD', 'SOMEPASS');
    define('DB_HOST', 'localhost');
    [/bash]

  8. Configure Apache
    [bash]
    # create /etc/apache2/sites-available/wordpress.conf with:

    <VirtualHost *:80>
        ServerName ec2xxxxx.compute.amazonaws.com
        ServerAlias YOURDOMAIN.COM
        DocumentRoot /var/www/html/wordpress
        DirectoryIndex index.php

        <Directory /var/www/html/wordpress>
            AllowOverride All
            # Apache 2.2 syntax; on Apache 2.4 use "Require all granted" instead
            Order Deny,Allow
            Allow from all
        </Directory>
    </VirtualHost>

    # enable wordpress in apache2
    sudo a2ensite wordpress
    sudo service apache2 restart
    [/bash]

  9. Enable HTTP access to your EC2 instance
    Go to EC2 console > Instances > Select your instance > Description >
    Click on your `Security group` > Select Inbound > Edit > Add rule > HTTP > Save

  10. Point your web browser to your EC2 instance: http://ec2xxxxx.compute.amazonaws.com/
  11. Setup your wordpress account
  12. Upload dumped wordpress data
    WP-Admin > Tools > Import > WordPress > Upload file and import
    NOTE: you will need to install the WordPress Importer plugin.

  13. Assign posts to the correct user.
    Don’t forget to Import Attachments!

  14. Install your favourite plugins and themes
    As for plugins, I strongly recommend: JetPack, SyntaxHighlighter Evolved and Google Analytics Dashboard for WP.

  15. Add favicon
    Copy your chosen favicon.ico to /var/www/html/wordpress

Voilà!
BTW: You may want to harden the security of your instance and set up swap, just in case memory usage exceeds your EC2 instance’s RAM.

Speed of USB 3.0 external hard drives

I’m using several external drives for backup purposes. I was interested in whether there are any differences in read/write speed and access time between them. I tested two drives, each connected both directly to a USB 3.0 port and through a USB 3.0 hub:

  1. HITACHI Touro 2.5″ 500GB USB 3.0
  2. WD My Passport 2TB USB 3.0

(Benchmark screenshots: Touro 500GB USB 3.0, direct and via hub; WD My Passport 2TB USB 3.0, direct and via hub.)

There are two interesting results. First of all, there is no speed difference between the two drives (read/write/access: 83 MB/s / 49 MB/s / 17 ms). Thus buying a super-performance (read: super-expensive) external drive makes no sense, as the drive speed will most likely be limited by the USB 3.0 connection anyway.
Secondly, there is no speed difference whether the drive is attached directly to the computer’s USB 3.0 port or through a USB 3.0 hub. However, I noticed mouse lag when the mouse was connected through the same USB hub. So if you expect to perform intensive disk reads/writes, it’s better to connect the drive directly to the computer, or at least avoid putting many important devices on the same hub.

Shrink a dynamically growing disk from VirtualBox

I’m preparing an Ubuntu image for a course. After installing the required software and datasets, the system image grew a lot. I decided to shrink it by disabling swap and removing unused data. But the system image didn’t shrink automatically after the files were removed; you need to do it manually:

  1. Run your system from a LiveCD
  2. Install zerofree and zero out the unused space:
    [bash]
    sudo apt-get install zerofree
    # takes ~10 min
    sudo zerofree -v /dev/sda1
    [/bash]
  3. In the host system, release and remove the image from VirtualBox (but keep it in the filesystem!)
  4. Compact the image:
    [bash]
    VBoxManage modifyhd Ubuntu1404/Ubuntu1404.vdi --compact
    [/bash]
  5. Finally, add the .vdi image back to the VirtualBox machine.

In my case, the .vdi image shrunk from 7.2G to 6.2G (exactly the amount of space freed inside Ubuntu), so I think it’s worth the effort.

Inspired by AskUbuntu.