LaTeX scientific notation, made easy
\providecommand{\e}[1]{\ensuremath{\times 10^{#1}}}
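Then, typing
The [111] crystal planes are 3.2\e{-10} m apart.
typesets the spacing as 3.2 × 10^{-10} m. Because of \ensuremath, \e works in both text and math mode.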
http://www.tapdancinggoats.com/easy-scientific-notation-in-latex.htm
R plot magic
To move the axis labels/lines:
par(mgp=c(, , ))
default is mgp=c(3,1,0), but mgp=c(1.75, 0.5, 0) works
To move margins:
par(mar=c(,,,))
default is mar=c(5,4,4,2)+0.1, but mar=c(3,3,0,0)+0.1 works
Using awk to randomly sample a file
Create a file with 1000 lines:
for i in {1..1000};do echo $i; done > f
Then print each line independently with probability P (roughly half of the lines here):
export P=0.5
awk -v p="$P" 'BEGIN{srand()} {r = rand(); if (r <= p) print}' f
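For comparison, the same idea can be sketched in Python. This is only an illustration of per-line (Bernoulli) sampling, not a translation of any particular tool; the function name `sample_lines` and the `seed` parameter are made up for this example.

```python
import random


def sample_lines(lines, p, seed=None):
    """Keep each line independently with probability p (Bernoulli sampling).

    `seed` is a hypothetical convenience parameter so a run can be repeated.
    """
    rng = random.Random(seed)
    # rng.random() is uniform on [0, 1), so p=0 keeps nothing and p=1 keeps everything.
    return [line for line in lines if rng.random() < p]


# Same setup as the shell example: 1000 numbered lines, p = 0.5.
lines = [str(i) for i in range(1, 1001)]
sample = sample_lines(lines, p=0.5, seed=42)
print(len(sample), "of", len(lines), "lines kept")
```

Like the awk one-liner, this keeps about p × N lines rather than an exact count, and the kept lines stay in their original order.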
General Papers
Next-generation gap
http://www.nature.com/nmeth/journal/v6/n11s/full/nmeth.f.268.html
Downloading an Entire Web Site with wget
$ wget \
--recursive \
--no-clobber \
--page-requisites \
--html-extension \
--convert-links \
--restrict-file-names=windows \
--domains website.org \
--no-parent \
www.website.org/tutorials/html/
This command downloads the Web site www.website.org/tutorials/html/.
The options are:
- --recursive: download the entire Web site.
- --domains website.org: don’t follow links outside website.org.
- --no-parent: don’t follow links outside the directory tutorials/html/.
- --page-requisites: get all the elements that compose the page (images, CSS and so on).
- --html-extension: save files with the .html extension.
- --convert-links: convert links so that they work locally, off-line.
- --restrict-file-names=windows: modify filenames so that they will work in Windows as well.
- --no-clobber: don’t overwrite any existing files (used in case the download is interrupted and resumed).
http://www.linuxjournal.com/content/downloading-entire-web-site-wget
Local Installation and Use of R Packages
http://csg.sph.umich.edu/docs/R/localpackages.html
1. Specifying a local library search location
You can use several library trees of add-on packages. The easiest way to tell R about them is via a ‘dotfile’: create the file ‘$HOME/.Renviron’ with the following content (watch the quotes and the ~ character):
R_LIBS_USER="~/R/library"
This sets an environment variable (R_LIBS_USER) that holds a colon-separated list of directories at which R library trees are rooted. You do not have to specify the default tree for R packages.
If necessary, create a place for your R libraries:
mkdir -p ~/R/library # Only needs doing once
Set your R library path:
echo 'R_LIBS_USER="~/R/library"' >> $HOME/.Renviron # >> appends, so existing settings are kept
2. Installing to a local library search location
Installation is dead easy. Start up R and tell R to fetch your package from CRAN, compile whatever needs compiling and set everything else up.
Beware – each package will only work for the platform (i.e. Linux or Solaris) where you installed it. If you want a package on both Linux and Solaris, you’ll need to install it in different directories for each system type.
R # Invoke R
> install.packages("name-of-your-package",lib="~/R/library")
Novoalign Alignment Scores
Base Qualities and Alignment Scores
Novoalign aligns reads against a reference genome using qualities and ambiguous nucleotide codes.
The initial alignment process finds alignment locations in the indexed sequence that are possible
sources of the read sequence. The alignment locations are scored using the Needleman-Wunsch
algorithm with affine gap penalties and with position specific scoring derived from the read base
qualities and any ambiguous codes in the reference sequence. User defined affine gap penalties are
used for scoring insert/deletes.
Novoalign uses Needleman-Wunsch alignments with affine gap penalties. The gap opening penalty
should be set to $-10\log_{10}(P_{gap}) - G_{extend}$, where $P_{gap}$ is the probability of an insertion/deletion
mutation vs the reference genome and $G_{extend}$ is the gap extension penalty. Likewise, the gap extend
penalty can be set to $-10\log_{10}(P_{gap2}/P_{gap1})$, where $P_{gap1}$ is the probability of a single-base indel
and $P_{gap2}$ is the probability of a 2-base insert/delete mutation. The default gap penalties were
derived from the frequency of short insert/deletes in human genome resequencing projects.
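To make the link between indel probabilities and gap penalties concrete, here is a small Python sketch. It assumes the phred-style forms open = -10·log10(P_gap1) - extend and extend = -10·log10(P_gap2 / P_gap1), and it treats the single-base indel probability as the overall gap probability. The two probabilities are made-up illustrative values, not Novoalign's defaults.

```python
import math


def phred(p):
    """Phred-scale a probability: -10 * log10(p)."""
    return -10 * math.log10(p)


# Hypothetical indel probabilities, chosen only for illustration.
p_gap1 = 1e-4    # probability of a single-base indel
p_gap2 = 2.5e-5  # probability of a 2-base indel

# Assumed forms of the two penalties (see the lead-in above).
gap_extend = phred(p_gap2 / p_gap1)
gap_open = phred(p_gap1) - gap_extend

print(f"gap open   = {gap_open:.2f}")
print(f"gap extend = {gap_extend:.2f}")

# Sanity check: with an affine cost of open + n * extend, a 1-base gap
# costs phred(p_gap1) and a 2-base gap costs phred(p_gap2).
print(f"1-base gap cost = {gap_open + gap_extend:.2f}")
print(f"2-base gap cost = {gap_open + 2 * gap_extend:.2f}")
```

Under these assumptions the affine penalties reproduce the phred-scaled probabilities of 1-base and 2-base indels exactly, which is the point of subtracting the extension penalty from the opening penalty.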
Base quality values are used to calculate base penalties for the Needleman-Wunsch algorithm. The
base qualities are converted to base probabilities and then to score penalties.
PRB Quality to Score Conversion
The prb file has a quality score for each base, $b$, at each position, $i$, in the read. The quality
value $Q_{b,i}$ is converted to a probability, $P_{b,i} = 10^{Q_{b,i}/10} / (1 + 10^{Q_{b,i}/10})$, and then to a penalty,
$-10\log_{10}(P_{b,i})$.
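The quality-to-penalty conversion can be sketched in a few lines of Python. This assumes that prb values are Solexa-style log-odds qualities, Q = 10·log10(p / (1 - p)), and that the penalty is the phred-scaled probability. The function names and the example prb values are invented for illustration and are not part of Novoalign.

```python
import math


def solexa_to_prob(q):
    """Solexa log-odds quality -> probability that this base is the true base."""
    odds = 10 ** (q / 10)
    return odds / (1 + odds)


def prob_to_penalty(p):
    """Probability -> phred-scaled score penalty, -10 * log10(p)."""
    return -10 * math.log10(p)


# One read position with a quality for each of the four bases
# (made-up values: a confident 'A' call).
prb = {"A": 40, "C": -40, "G": -40, "T": -40}

for base, q in prb.items():
    p = solexa_to_prob(q)
    print(f"{base}: Q={q:>4}  P={p:.6f}  penalty={prob_to_penalty(p):.2f}")
```

A base with a high quality gets a penalty close to zero when it matches the reference, while the unlikely bases at the same position carry a penalty of roughly minus their quality.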
Alignment Score and Threshold
The alignment score is $-10\log_{10}(P(R|A_i))$, where $P(R|A_i)$ is the probability of the read sequence $R$
given the alignment location $A_i$.
A threshold of 75 would allow for alignment of reads with two mismatches at high quality base
positions plus one or two mismatches at low quality positions or to ambiguous characters in the
reference sequence.
If a threshold is not specified then Novoalign will calculate a threshold for each read such that an
alignment to a nonrepetitive sequence will have an alignment quality of at least 20, i.e. the
iterative process of finding an alignment will terminate before finding a low quality chance
alignment. Alignments to repetitive sequences may still have qualities less than 20.
Posterior Alignment Probabilities and Quality Scores
The posterior alignment probability calculation includes all the alignments found; the probability
that the read came from a repeat masked region or from any regions coded in the reference genome
as N’s; and an allowance for a chance hit above the threshold based on the mutual information
content of the read and the genome.
A posterior alignment probability, $P(A_i|R,G)$, is calculated as:
$$P(A_i|R,G) = \frac{P(R|A_i)}{P(R|N) + \sum_j P(R|A_j)}$$
where $P(R|N)$ is the probability of finding the read by chance in any masked reference sequence
or any region of the reference sequence coded as N’s, and where $\sum_j P(R|A_j)$ is the sum over all the
alignments found plus a factor for chance alignments calculated using the usable read and genome
lengths.
The $P(R|N)$ term allows for the fact that a fragment could have been sourced from portions of the
genome that are not represented in the reference sequence. For instance in Human genome build 36
there is approximately 7% of sequence represented by large blocks of N’s.
A quality score is calculated as $-10\log_{10}(1 - P(A_i|R,G))$, where $P(A_i|R,G)$ is the probability of the
alignment given the read and the genome.
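As a rough illustration of how alignment scores turn into posterior probabilities and quality scores, here is a Python sketch. It assumes each alignment score is the phred-scaled likelihood -10·log10(P(R|A_i)) and that the quality is -10·log10(1 - posterior). The parameters `p_masked` (the masked/N-region term) and `p_chance` (the chance-alignment factor) are hypothetical inputs; how Novoalign actually estimates them is not reproduced here.

```python
import math


def alignment_qualities(scores, p_masked=0.0, p_chance=0.0):
    """Return (posterior, quality) for each alignment score.

    scores   : phred-scaled alignment scores, one per alignment found
    p_masked : assumed probability mass for masked / N-coded regions
    p_chance : assumed probability mass for chance alignments
    """
    likelihoods = [10 ** (-s / 10) for s in scores]
    denom = p_masked + p_chance + sum(likelihoods)
    results = []
    for lik in likelihoods:
        posterior = lik / denom
        # Compute 1 - posterior directly to avoid cancellation near 1.
        p_wrong = (denom - lik) / denom
        quality = -10 * math.log10(p_wrong) if p_wrong > 0 else float("inf")
        results.append((posterior, quality))
    return results


# A unique best hit (score 30) with one much worse alternative (score 60).
for posterior, quality in alignment_qualities([30, 60]):
    print(f"posterior={posterior:.6f}  quality={quality:.1f}")

# Two equally good hits, as in a perfect repeat: each gets posterior 0.5.
for posterior, quality in alignment_qualities([30, 30]):
    print(f"posterior={posterior:.6f}  quality={quality:.1f}")
```

The second case shows why alignments to repetitive sequence end up with low qualities: equally good alternative locations split the posterior probability between them.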
Mus musculus (laboratory mouse) chromosomes
| RefSeq accession | chr | length (bp) |
| --- | --- | --- |
| NC_000067 | chr1 | 197195432 |
| NC_000068 | chr2 | 181748087 |
| NC_000069 | chr3 | 159599783 |
| NC_000070 | chr4 | 155630120 |
| NC_000071 | chr5 | 152537259 |
| NC_000072 | chr6 | 149517037 |
| NC_000073 | chr7 | 152524553 |
| NC_000074 | chr8 | 131738871 |
| NC_000075 | chr9 | 124076172 |
| NC_000076 | chr10 | 129993255 |
| NC_000077 | chr11 | 121843856 |
| NC_000078 | chr12 | 121257530 |
| NC_000079 | chr13 | 120284312 |
| NC_000080 | chr14 | 125194864 |
| NC_000081 | chr15 | 103494974 |
| NC_000082 | chr16 | 98319150 |
| NC_000083 | chr17 | 95272651 |
| NC_000084 | chr18 | 90772031 |
| NC_000085 | chr19 | 61342430 |
| NC_000086 | chrX | 166650296 |
| NC_000087 | chrY | 15902555 |