The data structure consists of:
– a set of chromosomes
– each chromosome is represented by a name and an ordered, doubly linked list of nodes
– each node represents one tag of a pair
– each node has a two-way link (next and previous pointers) to its neighbors in chromosome order, where the next node is the one with the next-larger offset, and a list of the nodes it is paired with. The links to pair partners are one-way.
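As a minimal sketch of this layout in C, assuming hypothetical field names (offset, prev, next, pairs, head, tail) and a growable array for each node's pair list, the structures and the two basic operations might look like this. The text above fixes only the relationships, not these names or the storage choices.

```c
#include <assert.h>
#include <stdlib.h>

struct Chromosome;

/* One tag of a pair. Field names are hypothetical. */
typedef struct Node {
    long offset;                 /* position of this tag on its chromosome */
    struct Chromosome *chrom;    /* back-pointer to the owning chromosome */
    struct Node *prev;           /* two-way links in chromosome (offset) order */
    struct Node *next;
    struct Node **pairs;         /* one-way links to partner nodes */
    size_t n_pairs;
    size_t cap_pairs;
} Node;

typedef struct Chromosome {
    char *name;                  /* stored once here, not in every node */
    Node *head;                  /* node with the smallest offset */
    Node *tail;                  /* node with the largest offset */
} Chromosome;

/* Link node into chrom's list, keeping nodes sorted by ascending offset. */
void chrom_insert_sorted(Chromosome *chrom, Node *node) {
    assert(chrom != NULL && node != NULL);
    node->chrom = chrom;
    /* Find the last node whose offset is <= node->offset, scanning from the tail. */
    Node *after = chrom->tail;
    while (after != NULL && after->offset > node->offset)
        after = after->prev;
    node->prev = after;
    node->next = after ? after->next : chrom->head;
    if (node->next) node->next->prev = node; else chrom->tail = node;
    if (after) after->next = node; else chrom->head = node;
}

/* Record a one-way pair link from a to b; b is not modified.
   Returns 0 on success, -1 if the pair list could not grow. */
int node_add_pair(Node *a, Node *b) {
    assert(a != NULL && b != NULL);
    if (a->n_pairs == a->cap_pairs) {
        size_t cap = a->cap_pairs ? 2 * a->cap_pairs : 2;
        Node **grown = realloc(a->pairs, cap * sizeof *grown);
        if (grown == NULL) return -1;
        a->pairs = grown;
        a->cap_pairs = cap;
    }
    a->pairs[a->n_pairs++] = b;
    return 0;
}
```

Scanning from the tail keeps insertion cheap when tags arrive roughly in offset order. If both tags of a pair should see each other, node_add_pair would be called once from each side.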
The chromosome name is not part of the node struct. Instead, each node has a pointer back to its chromosome struct, and that struct holds the name. This keeps a potentially long chromosome name from being stored many times (once per node). Because the node has no name field, the function that reads nodes from a file must also take a char pointer into which it writes the chromosome name. We then use that name to put the node into the proper chromosome structure.
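A minimal sketch of that last step, once the reader has filled the char buffer with the chromosome name: look the chromosome up by name, create it (copying the name exactly once) if it is new, and link the node in offset order. The Genome container, its linear search, and the functions genome_get_chrom and genome_add_node are hypothetical; the text above does not say how the set of chromosomes is stored.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Chromosome Chromosome;

typedef struct Node {
    long offset;
    Chromosome *chrom;           /* back-pointer; the name lives in *chrom */
    struct Node *prev, *next;    /* pair list omitted for brevity */
} Node;

struct Chromosome {
    char *name;                  /* the single heap copy of this name */
    Node *head, *tail;
};

/* Hypothetical container for "a set of chromosomes". */
typedef struct {
    Chromosome **chroms;
    size_t count, cap;
} Genome;

/* Return the chromosome called name, creating it on first sight.
   The name is copied, so the caller's buffer can be reused for the next node. */
Chromosome *genome_get_chrom(Genome *g, const char *name) {
    for (size_t i = 0; i < g->count; i++)
        if (strcmp(g->chroms[i]->name, name) == 0)
            return g->chroms[i];
    if (g->count == g->cap) {
        size_t cap = g->cap ? 2 * g->cap : 8;
        Chromosome **grown = realloc(g->chroms, cap * sizeof *grown);
        if (grown == NULL) return NULL;
        g->chroms = grown;
        g->cap = cap;
    }
    Chromosome *c = calloc(1, sizeof *c);
    if (c == NULL) return NULL;
    c->name = malloc(strlen(name) + 1);
    if (c->name == NULL) { free(c); return NULL; }
    strcpy(c->name, name);
    g->chroms[g->count++] = c;
    return c;
}

/* Place a freshly read node into the chromosome named chrom_name,
   keeping ascending offset order. Returns 0 on success, -1 on allocation failure. */
int genome_add_node(Genome *g, const char *chrom_name, Node *node) {
    assert(g != NULL && chrom_name != NULL && node != NULL);
    Chromosome *c = genome_get_chrom(g, chrom_name);
    if (c == NULL) return -1;
    node->chrom = c;
    Node *after = c->tail;
    while (after != NULL && after->offset > node->offset)
        after = after->prev;
    node->prev = after;
    node->next = after ? after->next : c->head;
    if (node->next) node->next->prev = node; else c->tail = node;
    if (after) after->next = node; else c->head = node;
    return 0;
}
```

A linear scan over chromosome names is simple and adequate when there are few chromosomes; with many contigs, a hash table keyed on the name would be the natural replacement.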