Examples
Trail finding
Suppose CoMetGeNe.py (trail finding) is executed as follows:
python2 CoMetGeNe.py eco data/eco/ -dG 2 -dD 1 -o eco.out
Metabolic pathway maps for Escherichia coli K-12 MG1655 (eco) are automatically downloaded from KEGG and stored under data/eco/. At most two genes (-dG 2) and one reaction (-dD 1) can be skipped. Results are saved in the output file eco.out.
For the above example, CoMetGeNe identifies 238 trails with spans ranging from 2 to 10 (i.e., each of the 238 trails contains between 2 and 10 unique metabolic reactions). Below is the output corresponding to a trail of span 3:
path_eco00564.kgml: Found a trail of span 3 containing skipped vertices
path_eco00564.kgml: 110 -> [139] -> 104 -> 123
path_eco00564.kgml: R02054 -> [R04864] -> R02053 -> R03416
path_eco00564.kgml: eco:b3821 -> [eco:b2836] -> eco:b3821 -> eco:b3825
path_eco00564.kgml: 3.1.1.32 -> [2.3.1.40] -> 3.1.1.4 -> 3.1.1.5
path_eco00564.kgml: Skipped genes: eco:b3823, eco:b3822
- path_eco00564.kgml is the file name for the pathway map 00564 in eco (glycerophospholipid metabolism), retrieved automatically from KEGG by CoMetGeNe.
- The four lines with entities separated by arrows (->) represent the trail R02054 -> R02053 -> R03416 in four distinct manners, using:
  - The KGML identifiers of the reactions in the trail (110 -> 104 -> 123).
  - The KEGG R numbers associated with the reactions in the trail (R02054 -> R02053 -> R03416). Span is computed as the number of distinct R numbers in the trail.
  - The names of genes whose products are involved in reactions in the trail (eco:b3821 -> eco:b3821 -> eco:b3825).
  - The EC numbers associated with the reactions in the trail (3.1.1.32 -> 3.1.1.4 -> 3.1.1.5).
- The reaction R04864 was skipped (allowed because CoMetGeNe.py was executed with the option -dD 1): it is shown in square brackets, along with the corresponding KGML identifier (139), associated gene (b2836), and EC number (2.3.1.40).
- Two genes were skipped (allowed because CoMetGeNe.py was executed with the option -dG 2): eco:b3823 and eco:b3822.
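The structure described above can be checked with a short script. The following is a minimal sketch, not part of CoMetGeNe: it assumes the output always has the six-line layout of the sample (four arrow-separated lines in the order KGML identifiers, R numbers, genes, EC numbers), and the helper names `split_trail` and `parse_trail_block` are hypothetical.

```python
# Sample output copied from the span-3 example above.
TRAIL_OUTPUT = """\
path_eco00564.kgml: Found a trail of span 3 containing skipped vertices
path_eco00564.kgml: 110 -> [139] -> 104 -> 123
path_eco00564.kgml: R02054 -> [R04864] -> R02053 -> R03416
path_eco00564.kgml: eco:b3821 -> [eco:b2836] -> eco:b3821 -> eco:b3825
path_eco00564.kgml: 3.1.1.32 -> [2.3.1.40] -> 3.1.1.4 -> 3.1.1.5
path_eco00564.kgml: Skipped genes: eco:b3823, eco:b3822
"""


def split_trail(line):
    """Split one arrow-separated line into (kept, skipped) item lists."""
    # Drop the "path_....kgml: " prefix (first ": " only).
    body = line.split(": ", 1)[1]
    kept, skipped = [], []
    for item in body.split(" -> "):
        item = item.strip()
        if item.startswith("[") and item.endswith("]"):
            skipped.append(item[1:-1])  # bracketed items are skipped vertices
        else:
            kept.append(item)
    return kept, skipped


def parse_trail_block(text):
    """Parse one trail block, assuming the line order shown in the sample."""
    lines = text.strip().splitlines()
    arrow_lines = [l for l in lines if " -> " in l]
    keys = ["kgml_ids", "reactions", "genes", "ec_numbers"]
    trail, skipped = {}, {}
    for key, line in zip(keys, arrow_lines):
        trail[key], skipped[key] = split_trail(line)
    skipped_genes = []
    for l in lines:
        if "Skipped genes:" in l:
            tail = l.split("Skipped genes:", 1)[1]
            skipped_genes = [g.strip() for g in tail.split(",")]
    # Span is the number of distinct R numbers kept in the trail.
    span = len(set(trail["reactions"]))
    return trail, skipped, skipped_genes, span


trail, skipped, skipped_genes, span = parse_trail_block(TRAIL_OUTPUT)
print(trail["reactions"], skipped["reactions"], skipped_genes, span)
```

Note that the gene line repeats eco:b3821 while the span is still 3, because span counts distinct R numbers rather than distinct genes.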
Trail grouping
Trail grouping by genes
Suppose trail finding was performed for species aae, bbn, eco, and mpn. A small part of the CSV obtained when grouping CoMetGeNe trails by genes, with eco as the reference species, is reproduced below (slightly reformatted for readability):
eco_gene;chr;str;aae;bbn;mpn
b0114; chr; + ; . ; . ; x
b0115; chr; + ; . ; . ; x
b0116; chr; + ; x ; . ; x
From the table above, it can be seen that:
- Species aae has at least two neighboring homologues to the gene b0116 in eco;
- Species bbn has no neighboring homologues for any of the three genes in eco;
- Species mpn has neighboring homologues for all three genes in eco.
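The same reading can be reproduced programmatically. The sketch below is an illustration only: it assumes the CSV is semicolon-delimited as in the excerpt, and it takes `x` as the marker for neighboring homologues, which is how the interpretation above reads the table. The helper name `read_gene_rows` is hypothetical.

```python
import csv

# Excerpt of the gene-grouping CSV copied from the table above.
GENE_CSV = """\
eco_gene;chr;str;aae;bbn;mpn
b0114; chr; + ; . ; . ; x
b0115; chr; + ; . ; . ; x
b0116; chr; + ; x ; . ; x
"""


def read_gene_rows(text):
    """Return one dict per data row, with padding whitespace stripped."""
    reader = csv.reader(text.splitlines(), delimiter=";")
    rows = [[cell.strip() for cell in row] for row in reader if row]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


gene_rows = read_gene_rows(GENE_CSV)
target_species = ["aae", "bbn", "mpn"]

# For each target species, list the eco genes marked with "x".
genes_with_x = {
    sp: [row["eco_gene"] for row in gene_rows if row[sp] == "x"]
    for sp in target_species
}
print(genes_with_x)
```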
Trail grouping by reactions
Suppose trail finding was performed for species aae, bbn, eco, and mpn. A small part of the CSV obtained when grouping CoMetGeNe trails by reactions, with eco as the reference species, is reproduced below (slightly reformatted for readability):
reaction;eco_gene;pathway; aae;bbn;mpn
R07618; b0116; 00010 00020 00280 00620 00640; . ; o ; x
R03270; b0114; 00010 00020 00620; o ; o ; x
R00014; b0114; 00010 00020 00620; o ; o ; x
R02569; b0115; 00010 00020 00620; o ; o ; x
From the table above, it can be seen that:
- Species aae performs reaction R07618 but none of the other three reactions;
- Species bbn performs none of the four reactions;
- Species mpn performs all four reactions using products of neighboring genes.
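A script can derive the same conclusions from the reaction table. This is a minimal sketch under assumptions inferred from the interpretation above rather than from a documented legend: `o` is taken to mean the species does not perform the reaction, `x` that it performs the reaction using products of neighboring genes, and `.` that it performs the reaction otherwise. The pathway column is assumed to hold space-separated map numbers.

```python
import csv

# Excerpt of the reaction-grouping CSV copied from the table above.
REACTION_CSV = """\
reaction;eco_gene;pathway; aae;bbn;mpn
R07618; b0116; 00010 00020 00280 00620 00640; . ; o ; x
R03270; b0114; 00010 00020 00620; o ; o ; x
R00014; b0114; 00010 00020 00620; o ; o ; x
R02569; b0115; 00010 00020 00620; o ; o ; x
"""

reader = csv.reader(REACTION_CSV.splitlines(), delimiter=";")
table = [[cell.strip() for cell in row] for row in reader if row]
header, body = table[0], table[1:]
reaction_rows = [dict(zip(header, row)) for row in body]

species_columns = ["aae", "bbn", "mpn"]

# Reactions each species performs at all (assumed: any symbol except "o").
performed = {
    sp: [r["reaction"] for r in reaction_rows if r[sp] != "o"]
    for sp in species_columns
}

# Reactions performed using products of neighboring genes (assumed: "x").
via_neighbors = {
    sp: [r["reaction"] for r in reaction_rows if r[sp] == "x"]
    for sp in species_columns
}

# Pathway maps in which each reaction appears.
pathways = {r["reaction"]: r["pathway"].split() for r in reaction_rows}

print(performed)
print(via_neighbors)
```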