Once CoMetGeNe trails are identified for several species, the conservation of metabolic and genomic organizational motifs can be investigated at an interspecific level. For simplicity, trails of metabolic reactions catalyzed by products of neighboring genes for a given species are called metabolic and genomic patterns. The role of trail grouping is to identify such patterns that are conserved across several species.
Given a reference species S among the ones for which trail finding has been performed (using CoMetGeNe.py or CoMetGeNe_launcher.py), the script grouping.py can exploit the trails of the reference species in either of the following two ways:
- CoMetGeNe trails of S are analyzed in terms of genomic conservation across the other species in the data set, referred to as grouping trails by genes. This consists in determining whether the genes of S involved in these trails have neighboring homologues in other species. See Genomic conservation patterns (grouping by genes) below.
- CoMetGeNe trails of S are analyzed in terms of metabolic conservation across the other species in the data set, referred to as grouping trails by reactions. This consists in determining whether reactions in the CoMetGeNe trails of S are also performed by products of neighboring genes in other species. See Metabolic conservation patterns (grouping by reactions) below.
Trail grouping is provided with a user manual. You can also check out a few examples.
Note regarding directory structure
It is important to preserve this type of directory structure if CoMetGeNe.py is launched directly; if CoMetGeNe_launcher.py was used, this particular directory structure is ensured.
Genomic conservation patterns (grouping by genes)
Trail grouping by genes identifies conservation patterns between a reference species and the other species in the data set in terms of genomic organization.
Syntax
This results in detecting genomic conservation patterns (trail grouping by genes) for species eco, using CoMetGeNe results stored in results/ and metabolic pathway maps stored in KGML format in data/ (see the Note regarding directory structure above). The output of trail grouping by genes is stored in CSV format in the file tsg_eco.csv.
Output format
The CSV file contains a line for every gene of the reference species S involved in CoMetGeNe trails of S that are common to S and at least one other species from the data set. Groups of neighboring genes in S involved in CoMetGeNe trails of S are separated by the line ***. In this CSV file, the line for a gene g of S contains:
- The name of gene g.
- The name of the chromosome on which g is located.
- The strand on the chromosome on which g is located (+ for the positive strand, - for the negative strand).
- A column for every other species in the data set that can take either of the following values:
- A cross (x) if g has a homologue in the other species that is a neighbor of at least one other gene involved in the trail;
- A dot (.) if g has no such homologue.
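The output format described above can be read back with a few lines of Python. The following is a minimal sketch, not part of CoMetGeNe: the function name `read_gene_groups`, the comma delimiter, the optional header handling, and the sample gene and chromosome names are all assumptions made for illustration. Adjust them to match the actual contents of your file.

```python
import csv
import io


def read_gene_groups(handle, delimiter=",", has_header=False):
    """Split a grouping-by-genes CSV into groups of gene rows.

    Assumptions (not confirmed by the documentation): fields are separated
    by `delimiter`, and a group separator is a row whose first field is '***'.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    if has_header:
        next(reader, None)  # skip a header row, if the file has one
    groups, current = [], []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        if row[0].strip() == "***":
            if current:
                groups.append(current)
                current = []
            continue
        # First three columns: gene, chromosome, strand; the rest: one mark per species.
        gene, chromosome, strand, *marks = [cell.strip() for cell in row]
        current.append({
            "gene": gene,
            "chromosome": chromosome,
            "strand": strand,
            # True where the cell is a cross (x), False where it is a dot (.)
            "conserved": [mark == "x" for mark in marks],
        })
    if current:
        groups.append(current)
    return groups


# Made-up sample with two other species; real files come from grouping.py.
sample = io.StringIO(
    "geneA,chr1,+,x,.\n"
    "geneB,chr1,+,x,x\n"
    "***\n"
    "geneC,chr1,-,.,x\n"
)
groups = read_gene_groups(sample)
for number, group in enumerate(groups, start=1):
    print(number, [(row["gene"], row["conserved"]) for row in group])
```

To read an actual result file, pass an open file handle instead of the `io.StringIO` sample, for example `open("tsg_eco.csv", newline="")`.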
Metabolic conservation patterns (grouping by reactions)
Trail grouping by reactions identifies conservation patterns between a reference species and the other species in the data set in terms of metabolic organization.
Syntax
This results in detecting metabolic conservation patterns (trail grouping by reactions) for species eco, using CoMetGeNe results stored in results/ and metabolic pathway maps stored in KGML format in data/ (see the Note regarding directory structure above). The output of trail grouping by reactions is stored in CSV format in the file tsr_eco.csv.
Output format
The CSV file contains a line for every reaction of the reference species S involved in CoMetGeNe trails of S. Groups of reactions involved in CoMetGeNe trails of S are separated by the line ***. Note that a given reaction may appear several times in the CSV file, if it occurs in several CoMetGeNe trails of S. In this CSV file, the line for a reaction r in a CoMetGeNe trail of S contains:
- The KEGG R number for reaction r.
- The gene name(s) of the gene(s) of S involved in reaction r.
- The KEGG pathway map ID(s) for the pathway(s) in which the R number associated with r occurs.
- A column for every other species S' in the data set that can take one of the following three values:
- A cross (x) if r is performed in species S' by the product of at least one gene neighboring at least one other gene involved in the CoMetGeNe trail to which reaction r belongs;
- A dot (.) if r is performed in species S', but only by the product of a gene that does not neighbor any other gene involved in the CoMetGeNe trail to which reaction r belongs;
- A circle (o) if r is absent from species S'.
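As an illustration of how the three symbols can be used, the sketch below counts crosses, dots, and circles per species in a grouping-by-reactions file. It is not part of CoMetGeNe: the function name `tally_reaction_marks`, the comma delimiter, the absence of a header row, and the sample R numbers, gene names, map IDs, and species codes are assumptions. The caller supplies the species names in column order, because the description above does not say whether the file names them.

```python
import csv
import io
from collections import Counter


def tally_reaction_marks(handle, species, delimiter=","):
    """Count x / . / o marks per species in a grouping-by-reactions CSV.

    Assumptions (not confirmed by the documentation): fields are separated
    by `delimiter`, there is no header row, and `species` lists the other
    species in the same order as the columns after the first three.
    """
    tallies = {name: Counter() for name in species}
    for row in csv.reader(handle, delimiter=delimiter):
        if not row or row[0].strip() == "***":
            continue  # skip blank lines and group separators
        # Columns 0-2: R number, gene name(s), pathway map ID(s).
        marks = [cell.strip() for cell in row[3:]]
        for name, mark in zip(species, marks):
            tallies[name][mark] += 1
    return tallies


# Made-up sample with two other species; real files come from grouping.py.
sample = io.StringIO(
    "R00001,geneA,map00010,x,o\n"
    "R00002,geneB,map00010,.,x\n"
    "***\n"
    "R00001,geneC,map00020,x,.\n"
)
tallies = tally_reaction_marks(sample, species=["sp1", "sp2"])
for name, counts in tallies.items():
    print(name, dict(counts))
```

Since a reaction may appear in several trails, these counts are per occurrence in the file rather than per distinct reaction.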