Connecting to MySQL without a password prompt

If you are (like me) annoyed by typing a password at every mysql login, you can skip it. It also makes programmatic access to any MySQL db easier, as no password prompt is necessary 🙂
Create a `~/.my.cnf` file.
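A minimal version uses the standard `[client]` section (`username` and `yourpassword` are placeholders for your own credentials):

[client]
user=username
password=yourpassword

Since the file stores your password in plain text, restrict its permissions with `chmod 600 ~/.my.cnf`.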

And log in without the `-p` parameter:
mysql -h host -u username dbname

If you want to use the `~/.my.cnf` file with MySQLdb, just connect like this:
import MySQLdb

# credentials are read from ~/.my.cnf, so no passwd argument is needed
cnx = MySQLdb.connect(host=host, port=port, read_default_file="~/.my.cnf")
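
A quick way to check that the option file is being picked up; a sketch, where the host and the query are arbitrary placeholders:

import MySQLdb

# connect using credentials from ~/.my.cnf and run a trivial query
cnx = MySQLdb.connect(host="localhost", read_default_file="~/.my.cnf")
cur = cnx.cursor()
cur.execute("SELECT VERSION()")
print cur.fetchone()[0]
cnx.close()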


Get gene names for a set of RefSeq IDs

Today I needed to annotate a set of RefSeq IDs in a .bed file with gene names.
First, I looked for a way to get gene names for RefSeq IDs and found a simple solution on BioStars.

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -N -A -D danRer10 \
  -e "select name,name2 from refGene" > refSeq2gene.txt

This writes a two-column, tab-separated mapping of RefSeq accession (name) to gene symbol (name2) into refSeq2gene.txt.

Second, I wrote a simple Python script that adds the gene name to the .bed file in place of the score column, which I don't need at all.

#!/usr/bin/env python
# Add gene name instead of score to BED file

# USAGE: cat in.bed | ./refSeq2gene.py refSeq2gene.txt > with_gene_names.bed

import sys

# load RefSeq ID -> gene name mapping
fn = sys.argv[1]
refSeq2gene = {}
for l in open(fn):
    refSeq, gene = l[:-1].split('\t')
    refSeq2gene[refSeq] = gene

sys.stderr.write(" %s accessions loaded!\n" % len(refSeq2gene))

# replace the score column (5th field) with the gene name
for l in sys.stdin:
    ldata = l[:-1].split('\t')
    chrom, s, e, refSeq = ldata[:4]
    if refSeq in refSeq2gene:
        ldata[4] = refSeq2gene[refSeq]
    else:
        ldata[4] = "-"
    sys.stdout.write("\t".join(ldata) + "\n")

Hope someone will find it useful.

Progress of long processes in BASH

You can view the progress of long-running processes in UNIX using pv or bar. With pv, you can even report the progress of multiple stages of your pipeline.
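
In its simplest form, pv just sits in a pipe and reports throughput and ETA for a single stream; a minimal sketch, where the file name is a placeholder:

pv backup.sql.gz | gunzip > backup.sql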

This is very useful for tracing large database dump/restore progress:

pv -cN gzip backup.sql.gz | gzip -d | pv -cN mysql | mysql
  mysql: 799MiB 0:06:30 [1.68MiB/s] [ <=> ]
   gzip: 173MiB 0:06:30 [ 250kiB/s] [=> ] 4% ETA 2:25:09
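
If the process is already running, newer pv versions can also attach to it and watch its open file descriptors; a sketch, where 12345 is a made-up PID and `-d` requires a reasonably recent pv:

pv -d 12345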

Fetching from MySQL in batches

Today I ran into problems fetching a large MySQL table: the server was running out of memory because of the table size.

I knew you can fetch a subset of data from a MySQL table using LIMIT. The standard `LIMIT N` returns the first N rows of your table.

However, there is a second way of calling LIMIT, `LIMIT S, N`, where S is the (0-based) offset of the first row to return and N is the number of rows to return. Thus, you can easily plug it into a for loop and fetch a large MySQL table in, for example, 1-million-row batches:

# connection settings and batch size (placeholders; adjust to your setup)
user=USER
pass=PASS
db=DB
t=TABLE
step=1000000
header="col1\tcol2\tcol3"

#get row count
c=`mysql -u$user -p$pass $db -N -A -e "select count(*) from $t"`
echo `date` $t "with $c rows";
#write table header
echo -e $header | gzip > $t.txt.gz
#get table in batches; the LIMIT offset is 0-based, so start from 0
for i in `seq 0 $step $c`; do
  cmd="select * from $t limit $i, $step"
  echo -en "\r$i / $c"
  mysql -u$user -p$pass $db -A -N -e "$cmd" | gzip >> $t.txt.gz
done
echo ""
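
The same LIMIT trick works from Python with MySQLdb; a minimal sketch, where the connection details, table name, and row handling are placeholders:

import MySQLdb

# connect and count rows (placeholder credentials and table)
cnx = MySQLdb.connect(host="localhost", user="USER", passwd="PASS", db="DB")
cur = cnx.cursor()
cur.execute("select count(*) from mytable")
total = cur.fetchone()[0]

# fetch in 1-million-row batches using LIMIT offset, count
step = 1000000
for offset in xrange(0, total, step):
    cur.execute("select * from mytable limit %s, %s", (offset, step))
    for row in cur.fetchall():
        pass  # process the row here

cnx.close()

Alternatively, MySQLdb's `SSCursor` (`MySQLdb.cursors.SSCursor`) streams rows from the server instead of buffering the whole result set, which sidesteps the memory problem in a single query.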

Hope someone will find it useful :)