Ryan's Blog

Reading mapped files and fastq files

Posted in programming, research by ryanlayer on December 2, 2009

When reading the files, we deal with two types of file, three files in total.  Two of them (the .out files) are the result of mapping the fastq files to the reference; each one holds all the tags from one side of the pair.  The third is a fastq file, which can represent either side of the pair.

We assume that all files are ordered.

For each file, we want to extract the entries.

In the mapped file, we ignore any line that begins with ‘#’ or has a status other than ‘U’, which indicates a unique mapping.  Each line read from a mapped file has one of three outcomes, and the read returns the corresponding integer value:

  1. valid entry (1)
  2. invalid entry (status != ‘U’, line begins with ‘#’, etc.) (0)
  3. end of file (-1)

These different return values allow us to loop for the next valid entry and stop looping when we reach the end:

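while(read_line(mapped.file) == 0) {;} will loop until we have a valid entry or have reached the end of the file.  Because the loop exits on either 1 or -1, the caller has to keep the last return value to tell a valid entry from the end of the file.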
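A minimal sketch of that loop in C, keeping the return value so the caller can tell a valid entry from the end of the file. The post does not describe the layout of the .out files, so the tab-separated fields, the `status_col` parameter, the `next_valid_entry` wrapper, and the buffer size are all assumptions made for illustration, not the original implementation.

```c
#include <stdio.h>
#include <string.h>

#define MAPPED_MAX_LINE 4096

/* The three return values described in the post. */
enum { READ_EOF = -1, READ_INVALID = 0, READ_VALID = 1 };

/*
 * Read one line from a mapped file.
 * ASSUMPTION: fields are tab-separated and the status sits in column
 * `status_col` (0-based); the real .out layout is not given in the post.
 * ASSUMPTION: no line is longer than MAPPED_MAX_LINE - 1 characters.
 * On READ_VALID the line (without its newline) is copied into `entry`.
 */
int read_line(FILE *fp, int status_col, char *entry, size_t entry_len)
{
    char line[MAPPED_MAX_LINE];

    if (fgets(line, sizeof line, fp) == NULL)
        return READ_EOF;

    line[strcspn(line, "\r\n")] = '\0';      /* strip the line ending */

    if (line[0] == '#' || line[0] == '\0')   /* comment or blank line */
        return READ_INVALID;

    /* Walk forward to the start of the status column. */
    const char *field = line;
    for (int i = 0; i < status_col; i++) {
        field = strchr(field, '\t');
        if (field == NULL)
            return READ_INVALID;             /* too few columns */
        field++;
    }

    /* Only a status of exactly "U" (unique mapping) is accepted. */
    if (strcspn(field, "\t") != 1 || field[0] != 'U')
        return READ_INVALID;

    snprintf(entry, entry_len, "%s", line);
    return READ_VALID;
}

/* Skip invalid lines; returns READ_VALID or READ_EOF, never READ_INVALID. */
int next_valid_entry(FILE *fp, int status_col, char *entry, size_t entry_len)
{
    int rc;
    while ((rc = read_line(fp, status_col, entry, entry_len)) == READ_INVALID)
        ;
    return rc;
}
```

A caller would then loop with something like `while (next_valid_entry(fp, col, buf, sizeof buf) == READ_VALID) { ... }` and stop cleanly at the end of the file.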

In the fastq file, there are four lines of data per tag.  We are only concerned with the first, so we simply skip three lines between entry reads.
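The fastq side can be sketched the same way: read the first line of a record, then discard the next three. The function name `read_fastq_entry`, the buffer size, and the choice to treat a truncated final record as end of file are assumptions for illustration only.

```c
#include <stdio.h>
#include <string.h>

#define FASTQ_MAX_LINE 4096

/*
 * Copy the first line of the next fastq record into `name` and skip the
 * remaining three lines of that record.
 * Returns 1 on success and -1 at end of file (a truncated final record
 * is also reported as -1).
 * ASSUMPTION: every record is exactly four lines, and no line is longer
 * than FASTQ_MAX_LINE - 1 characters.
 */
int read_fastq_entry(FILE *fp, char *name, size_t name_len)
{
    char line[FASTQ_MAX_LINE];

    /* First line of the record: the only one we care about. */
    if (fgets(line, sizeof line, fp) == NULL)
        return -1;
    line[strcspn(line, "\r\n")] = '\0';
    snprintf(name, name_len, "%s", line);

    /* Skip the other three lines of the record. */
    for (int i = 0; i < 3; i++) {
        if (fgets(line, sizeof line, fp) == NULL)
            return -1;
    }
    return 1;
}
```

Using the same 1 / -1 convention as the mapped-file reader keeps the loops over the two file types symmetric; there is no "invalid" case here because every record is used.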
