Ryan's Blog

LaTeX scientific notation, made easy

Posted in latex by ryanlayer on January 13, 2012

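Define a shorthand macro:

\providecommand{\e}[1]{\ensuremath{\times 10^{#1}}}

Then, typing

The [111] crystal planes are 3.2\e{-10} m apart.

typesets the spacing as 3.2 × 10^{-10} m. The \ensuremath wrapper lets \e work in both text and math mode.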

http://www.tapdancinggoats.com/easy-scientific-notation-in-latex.htm

R plot magic

Posted in Uncategorized by ryanlayer on January 12, 2012

To move the axis labels/lines:

par(mgp=c(, , ))

The three values are the margin lines for the axis title, the tick labels, and the axis line. The default is mgp=c(3,1,0), but mgp=c(1.75, 0.5, 0) works.

To move margins:

par(mar=c(,,,))

The values are the bottom, left, top, and right margins, in lines of text. The default is mar=c(5,4,4,2)+0.1, but mar=c(3,3,0,0)+0.1 works.

Using awk to randomly sample a file

Posted in Uncategorized by ryanlayer on January 6, 2012

Create a file with 1000 lines:

for i in {1..1000};do echo $i; done > f


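Then print each line independently with probability P (here 0.5), so roughly half of the lines survive:

P=0.5
awk -v p="$P" 'BEGIN{srand()} {r = rand(); if (r <= p) print}' f

Called with no argument, srand() seeds from the current time, so two runs within the same second select the same lines.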
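The same per-line coin flip can be sketched in Python, which makes it easier to see that the sample size is random rather than fixed. This is only a sketch: the function name sample_lines and its seed parameter are illustrative and are not part of the awk one-liner.

```python
import random

def sample_lines(lines, p, seed=None):
    """Keep each line independently with probability p."""
    rng = random.Random(seed)  # seed=None lets Python pick its own seed
    # rng.random() is uniform on [0, 1), so the test passes with probability p.
    return [line for line in lines if rng.random() < p]

if __name__ == "__main__":
    lines = [str(i) for i in range(1, 1001)]  # same content as the file f
    kept = sample_lines(lines, 0.5)
    print(len(kept))  # close to 500, but it varies from run to run
```

Because each line is decided on its own, the output keeps the original order and its length only averages p times the number of input lines. If an exact sample size is needed, a different technique (for example reservoir sampling) would be required.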

Best LaTeX Web site EVER

Posted in Uncategorized by ryanlayer on November 10, 2011

http://detexify.kirelabs.org/classify.html

General Papers

Posted in research by ryanlayer on November 22, 2010

Next-generation gap

http://www.nature.com/nmeth/journal/v6/n11s/full/nmeth.f.268.html

 


Downloading an Entire Web Site with wget

Posted in Uncategorized by ryanlayer on October 8, 2010
$ wget \
     --recursive \
     --no-clobber \
     --page-requisites \
     --html-extension \
     --convert-links \
     --restrict-file-names=windows \
     --domains website.org \
     --no-parent \
         www.website.org/tutorials/html/

This command downloads the Web site www.website.org/tutorials/html/.

The options are:

  • --recursive: download the entire Web site.
  • --domains website.org: don’t follow links outside website.org.
  • --no-parent: don’t follow links outside the directory tutorials/html/.
  • --page-requisites: get all the elements that compose the page (images, CSS and so on).
  • --html-extension: save files with the .html extension.
  • --convert-links: convert links so that they work locally, off-line.
  • --restrict-file-names=windows: modify filenames so that they will work in Windows as well.
  • --no-clobber: don’t overwrite any existing files (used in case the download is interrupted and
    resumed).

http://www.linuxjournal.com/content/downloading-entire-web-site-wget

My 5c

Posted in Uncategorized by ryanlayer on June 21, 2010

Local Installation and Use of R Packages

Posted in research by ryanlayer on June 1, 2010

http://csg.sph.umich.edu/docs/R/localpackages.html

1. Specifying a local library search location

R can use several library trees of add-on packages. The easiest way to tell R about them is via a ‘dotfile’: create the file ‘$HOME/.Renviron’ with the following content (watch the quotes and the ~ character):

  R_LIBS_USER="~/R/library"

This sets the environment variable R_LIBS_USER to a colon-separated list of directories at which R library trees are rooted. You do not have to list the default tree for R packages.

If necessary, create a place for your R libraries

  mkdir ~/R ~/R/library         # Only need do this once

Set your R library path

  echo 'R_LIBS_USER="~/R/library"' >  $HOME/.Renviron

2. Installing to a local library search location

Installation is dead easy. Start up R and tell R to fetch your package from CRAN, compile whatever needs compiling and set everything else up.

Beware – each package will only work for the platform (i.e. Linux or Solaris) where you installed it. If you want a package on both Linux and Solaris, you’ll need to install it in different directories for each system type.

  R                     # Invoke R
  > install.packages("name-of-your-package",lib="~/R/library")

Novoalign Alignment Scores

Posted in research by ryanlayer on May 31, 2010

Base Qualities and Alignment Scores

Novoalign aligns reads against a reference genome using qualities and ambiguous nucleotide codes. The initial alignment process finds alignment locations in the indexed sequence that are possible sources of the read sequence. The alignment locations are scored using the Needleman-Wunsch algorithm with affine gap penalties and with position-specific scoring derived from the read base qualities and any ambiguous codes in the reference sequence. User-defined affine gap penalties are used for scoring insert/deletes.

The gap opening penalty should be set to -10\log_{10}(P_{gap}) - G_{extend}, where P_{gap} is the probability of an insertion/deletion mutation vs the reference genome and G_{extend} is the gap extension penalty. Likewise, the gap extension penalty can be set to -10\log_{10}(P_{gap2} / P_{gap1}), where P_{gap1} is the probability of a single-base indel and P_{gap2} is the probability of a two-base insert/delete mutation. The default gap penalties were derived from the frequency of short insert/deletes in human genome resequencing projects.

Base quality values are used to calculate base penalties for the Needleman-Wunsch algorithm. The base qualities are converted to base probabilities and then to score penalties.
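The two gap penalty formulas above can be tried out with a short Python sketch. The function names and the probabilities in the example are made up for illustration; they are not Novoalign's defaults or its internal code.

```python
import math

def gap_extend_penalty(p_gap1, p_gap2):
    """-10*log10(P_gap2 / P_gap1): cost of lengthening a gap by one base."""
    return -10 * math.log10(p_gap2 / p_gap1)

def gap_open_penalty(p_gap, g_extend):
    """-10*log10(P_gap) - G_extend: cost of opening a gap."""
    return -10 * math.log10(p_gap) - g_extend

if __name__ == "__main__":
    # Hypothetical indel probabilities, chosen only to give round numbers.
    p_gap = 1e-4    # probability of an insert/delete vs the reference
    p_gap1 = 1e-4   # probability of a single-base indel
    p_gap2 = 1e-5   # probability of a two-base indel
    g_extend = gap_extend_penalty(p_gap1, p_gap2)
    g_open = gap_open_penalty(p_gap, g_extend)
    print(round(g_open, 2), round(g_extend, 2))
```

With these made-up values the extension penalty comes out at about 10 and the opening penalty at about 30, so a one-base gap costs roughly 40 in total, which is -10\log_{10}(P_{gap}).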

PRB Quality to Score Conversion

The prb file has quality score Q(b,i) for each base, b, at each position, i, in the read. The quality
value is converted to a probability, Pr(b,i) and then to a penalty P(b, i).

Pr(b,i) = \frac{10^{\frac{Q(b,i)}{10}}}{1 + 10^{\frac{Q(b,i)}{10}}}

P(b,i) = -10\log_{10}(Pr(b,i))
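As a quick check of the two conversions above, here is a minimal Python sketch. The function names are illustrative, and the quality values in the loop are arbitrary examples rather than values from a real prb file.

```python
import math

def prb_to_probability(q):
    """Pr(b,i) = 10^(Q/10) / (1 + 10^(Q/10))."""
    odds = 10 ** (q / 10)
    return odds / (1 + odds)

def base_penalty(q):
    """P(b,i) = -10*log10(Pr(b,i))."""
    return -10 * math.log10(prb_to_probability(q))

if __name__ == "__main__":
    for q in (-40, 0, 40):  # arbitrary example qualities
        print(q, prb_to_probability(q), base_penalty(q))
```

A quality of 0 corresponds to Pr = 0.5 and a penalty of about 3. Large positive qualities push the penalty toward 0, and strongly negative qualities give a penalty close to -Q.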

Alignment Score and Threshold

The alignment score is -10\log_{10}(P(R|A_i)) where P(R | A_i) is the probability of the read sequence
given the alignment location i.

A threshold of 75 would allow for alignment of reads with two mismatches at high-quality base positions plus one or two mismatches at low-quality positions or to ambiguous characters in the reference sequence.

If a threshold is not specified, Novoalign calculates a threshold for each read such that an alignment to a non-repetitive sequence will have an alignment quality of at least 20. That is, the iterative process of finding an alignment terminates before it finds a low-quality chance alignment. Alignments to repetitive sequences may still have qualities less than 20.

Posterior Alignment Probabilities and Quality Scores

The posterior alignment probability calculation includes all the alignments found; the probability that the read came from a repeat-masked region or from any region coded in the reference genome as N’s; and an allowance for a chance hit above the threshold, based on the mutual information content of the read and the genome.

A posterior alignment probability, P(A_i | R, G), is calculated as:

P(A_i | R, G) = \frac{P(R | A_i, G)}{P(R | N, G) + \sum_j P(R | A_j, G)}

where P(R | N, G) is the probability of finding the read by chance in any masked reference sequence or any region of the reference sequence coded as N’s, and where \sum_j is the sum over all the alignments found plus a factor for chance alignments calculated using the usable read and genome lengths.

The P(R | N, G) term allows for the fact that a fragment could have been sourced from portions of the genome that are not represented in the reference sequence. For instance, in human genome build 36 approximately 7% of the sequence is represented by large blocks of N’s.

A quality score is calculated as -10\log_{10}(1 - P(A_i | R, G)), where P(A_i | R, G) is the probability of the alignment given the read and the genome.
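To see how alignment scores turn into a quality score, here is a rough Python sketch of the calculation described above. It assumes each alignment score is -10\log_{10} of the read likelihood, as stated earlier in this post. The parameter p_unrepresented is a hypothetical stand-in for P(R|N,G) plus the chance-alignment factor; how Novoalign actually derives those terms is not reproduced here.

```python
import math

def read_likelihood(score):
    """Invert score = -10*log10(P(R|A_i)) to recover P(R|A_i)."""
    return 10 ** (-score / 10)

def alignment_quality(scores, index, p_unrepresented=0.0):
    """Quality of alignment `index`, given the scores of all alignments found.

    p_unrepresented is an assumed placeholder for P(R|N,G) and the
    chance-alignment allowance; it is not computed from the genome here.
    """
    likelihoods = [read_likelihood(s) for s in scores]
    denom = p_unrepresented + sum(likelihoods)
    # 1 - posterior is everything in the denominator except this alignment.
    others = sum(l for j, l in enumerate(likelihoods) if j != index)
    p_wrong = (p_unrepresented + others) / denom
    if p_wrong == 0.0:
        return math.inf  # no competing probability mass at all
    return -10 * math.log10(p_wrong)

if __name__ == "__main__":
    scores = [30, 60]  # made-up alignment scores for one read
    for i in range(len(scores)):
        print(i, alignment_quality(scores, i))
```

With the made-up scores 30 and 60, the better alignment gets a quality of about 30 and the worse one a quality near 0. Two alignments with equal scores each get a quality of about 3, which is why reads in repetitive sequence end up with low qualities.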


Mus musculus (laboratory mouse) Chromosome

Posted in research, Uncategorized by ryanlayer on May 31, 2010
GenBank ID  chr    length (bp)
NC_000067   chr1   197195432
NC_000068   chr2   181748087
NC_000069   chr3   159599783
NC_000070   chr4   155630120
NC_000071   chr5   152537259
NC_000072   chr6   149517037
NC_000073   chr7   152524553
NC_000074   chr8   131738871
NC_000075   chr9   124076172
NC_000076   chr10  129993255
NC_000077   chr11  121843856
NC_000078   chr12  121257530
NC_000079   chr13  120284312
NC_000080   chr14  125194864
NC_000081   chr15  103494974
NC_000082   chr16   98319150
NC_000083   chr17   95272651
NC_000084   chr18   90772031
NC_000085   chr19   61342430
NC_000086   chrX   166650296
NC_000087   chrY    15902555