Program Responsible: Guillermo Dávila, PhD

Research Projects

Microevolutionary Genomics of Rhizobium etli.

Earlier population genetic studies showed that R. etli has high levels of diversity as measured by multilocus enzyme electrophoresis (MLEE). Strains also differ in plasmid number and size. Recently, we obtained the complete genome sequence of R. etli CFN42. This strain contains six large plasmids and a circular chromosome that together account for 6,530,234 bp. In this context, we want to evaluate the levels of molecular variation among strains of R. etli to answer the following questions:

  1. What is the amount of variation among the R. etli strains?
  2. Is there a difference among the rates of evolution of the different replicons of R. etli?
  3. Is the symbiotic plasmid (or any other replicon, set of genes, or genes) under positive selection?
  4. What is the main evolutionary force that drives the diversification of R. etli: mutation or recombination?

Our approach consists of collecting random sequences from the genomes of 10 R. etli strains of distinct geographical origin to a depth of 0.5x. This will allow us to cover at least 60% of the genes determined for R. etli CFN42. By pairwise comparisons we will assess the number of single nucleotide polymorphisms (SNPs) between each strain and the CFN42 strain. Several evolutionary models will be used to determine the rates of evolution and selection on replicons or individual genes.
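The relation between a sequencing depth of 0.5x and the fraction of genes sampled can be sketched with a simple Poisson (Lander-Waterman style) model. This is only an illustrative calculation, not the project's own estimate: the read length (500 bp) and the gene length (1000 bp) are assumptions introduced here, and reads are assumed to start at uniformly random positions.

```python
import math


def base_fraction_covered(depth):
    """Expected fraction of bases hit by at least one read (Poisson model)."""
    return 1.0 - math.exp(-depth)


def gene_fraction_touched(depth, gene_len, read_len):
    """Expected fraction of genes overlapped by at least one read.

    A read overlaps a gene when it starts inside a window of
    gene_len + read_len - 1 positions. Read starts are assumed to be
    uniform and independent, so the number of overlapping reads is Poisson.
    """
    reads_per_base = depth / read_len
    expected_overlapping_reads = reads_per_base * (gene_len + read_len - 1)
    return 1.0 - math.exp(-expected_overlapping_reads)


if __name__ == "__main__":
    depth = 0.5      # sequencing depth stated in the text
    read_len = 500   # ASSUMPTION: typical read length, not given in the text
    gene_len = 1000  # ASSUMPTION: typical bacterial gene length
    print(f"bases covered: {base_fraction_covered(depth):.2f}")
    print(f"genes touched: {gene_fraction_touched(depth, gene_len, read_len):.2f}")
```

Under these assumptions only about 39% of individual bases are expected to be covered, while most genes are overlapped by at least one read, which is compatible with sampling a majority of the CFN42 genes at low depth.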

To evaluate the role of recombination in the diversification of R. etli populations, we will use a local population already characterized by MLEE and a set of polymorphic gene markers derived from the previous experiment. These markers, located throughout the chromosome and the other replicons of the CFN42 strain, will represent the ranges of variability (high, medium or low) found among geographically distant R. etli strains. Pairs of oligonucleotides will be designed for each marker, and PCR products from 30 local strains will be obtained and sequenced. Inferences on the degree of polymorphism and the rates of recombination will be made with population genetics techniques (e.g. determination of heterozygosity and linkage disequilibrium).
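As a minimal sketch of the two statistics mentioned above, the following code computes Nei's unbiased gene diversity (expected heterozygosity) at one locus and the linkage disequilibrium coefficients D and r² between two biallelic loci. The function names and the toy haplotypes are illustrative assumptions; each strain is treated as one haploid haplotype.

```python
from collections import Counter


def gene_diversity(alleles):
    """Nei's unbiased gene diversity: n/(n-1) * (1 - sum of squared allele frequencies)."""
    n = len(alleles)
    if n < 2:
        raise ValueError("at least two sampled alleles are required")
    freqs = [count / n for count in Counter(alleles).values()]
    return n / (n - 1) * (1.0 - sum(f * f for f in freqs))


def linkage_disequilibrium(haplotypes):
    """Return (D, r2) for two biallelic loci.

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2), one per strain.
    The alleles of the first haplotype are used as the reference alleles.
    """
    n = len(haplotypes)
    a_ref, b_ref = haplotypes[0]
    p_a = sum(1 for a, _ in haplotypes if a == a_ref) / n
    p_b = sum(1 for _, b in haplotypes if b == b_ref) / n
    p_ab = sum(1 for a, b in haplotypes if a == a_ref and b == b_ref) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r2 is undefined when either locus is monomorphic in the sample
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


if __name__ == "__main__":
    # Toy data (hypothetical): four strains typed at two markers
    linked = [("A", "G"), ("A", "G"), ("T", "C"), ("T", "C")]
    print(gene_diversity([a for a, _ in linked]))
    print(linkage_disequilibrium(linked))
```

Values of r² close to 1 indicate that alleles at the two markers are inherited together, whereas values close to 0 are what frequent recombination would be expected to produce.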


Dr. Víctor González, experimental design and analyses.
Biol. José L. Acosta, evaluation of recombination in local populations.
M. C. Rosa I. Santamaría, DNA sequencing and polymorphism analyses.
M. C. José L. Fernández, DNA sequencing.
Q. I. Patricia Bustos, polymorphism analyses.
M. C. Ismael L. Hernández, bioinformatics.
Dr. Guillermo Dávila, data analysis.

Evolution of Insertion Sequences in Rhizobium.

Insertion sequences (ISs) are mobile elements that consist essentially of a gene encoding the transposase (an enzyme required for transposition) flanked on both sides by inverted repeat sequences (IRs). ISs have been grouped into families according to their similarity, transposase type, transposition mechanism and the sequence of the IRs. The representation of IS families varies widely among bacteria. Some species lack ISs, whereas others contain more than 100 ISs, which may belong to different families or to a single predominant family. Neither the origin of ISs nor their cellular function is known, except for some that code for adaptive phenotypes (e.g. antibiotic resistance).
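The defining structural feature described above, inverted repeats at both ends of the element, can be checked with a short sketch. It measures the length of the perfect terminal inverted repeat of a candidate element; the function names and the example sequence are hypothetical, and real IRs are often imperfect, which this simplified version does not handle.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def terminal_inverted_repeat_length(element):
    """Length of the perfect inverted repeat shared by the two ends of `element`.

    The left end is compared, base by base, with the reverse complement of
    the right end. Only the first half of the element is scanned so that the
    two repeats cannot overlap.
    """
    seq = element.upper()
    rc = reverse_complement(seq)
    length = 0
    for left_base, right_base in zip(seq[: len(seq) // 2], rc):
        if left_base != right_base:
            break
        length += 1
    return length


if __name__ == "__main__":
    # Hypothetical element: 8-bp IR, a stand-in "transposase" core, 8-bp IR
    left_ir = "GGCATTCA"
    element = left_ir + "ATGCCCTTTAAA" + reverse_complement(left_ir)
    print(terminal_inverted_repeat_length(element))
```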

There are numerous ISs in the genomes of the symbiotic nitrogen-fixing bacteria sequenced so far. They are particularly abundant in symbiotic plasmids and symbiotic islands. In R. etli CFN42 there are 153 ORFs related to ISs. They are distributed in the symbiotic plasmid p42d, the conjugative plasmid p42a, and the chromosome. There are no ISs in the other four plasmids (p42b, p42c, p42e, p42f) that compose the genome of R. etli CFN42. It is also noteworthy that no IS interrupts any ORF, known or hypothetical. Altogether, these observations suggest that ISs are associated with horizontal gene transfer through conjugative plasmids. Therefore, they might represent a link with the origin of symbiosis and the lifestyle of R. etli. The project aims to answer the following questions:

  1. What is the distribution of IS families among the species of the order Rhizobiales?
  2. What is the degree of divergence among the ISs of the same family across species of the order Rhizobiales?
  3. Are the ISs subject to selective pressures?
  4. What are the dynamics of ISs in the R. etli population?
  5. Are they inherited by vertical or horizontal descent?

We have started a systematic search for ISs in the complete genomes available for species of the order Rhizobiales. It should produce a map of the distribution and abundance of the ISs, as well as a first view of their dynamics. Starting from the already characterized ISs of the R. etli genome, we are looking for homologous ISs in syntenic regions of R. etli strains from distinct geographical locations and from a single locality. Inferences about IS movement will be drawn from differences in the size of the amplified fragments for homologous regions of the genome. Finally, phylogenetic analysis will be employed to reconstruct the dynamics of one or two IS families across the species of the order Rhizobiales.
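The inference from amplicon sizes can be sketched as follows, assuming that the primers flank an IS in the reference strain, so that a strain lacking the IS yields a fragment shorter by the IS length. The function name, the size tolerance and the example sizes are hypothetical and serve only to illustrate the reasoning.

```python
def classify_amplicon(observed_bp, reference_bp, is_length_bp, tolerance_bp=100):
    """Classify a PCR fragment relative to the reference amplicon that contains the IS.

    tolerance_bp is an ASSUMED sizing error (for example, gel resolution).
    """
    if is_length_bp <= 2 * tolerance_bp:
        raise ValueError("IS length must exceed twice the sizing tolerance")
    if abs(observed_bp - reference_bp) <= tolerance_bp:
        return "IS present"
    if abs(observed_bp - (reference_bp - is_length_bp)) <= tolerance_bp:
        return "IS absent"
    # Any other size points to a different event (another insertion, a deletion, ...)
    return "other rearrangement"


if __name__ == "__main__":
    # Hypothetical sizes: 2500-bp reference amplicon containing a 1300-bp IS
    for size in (2480, 1230, 4000):
        print(size, classify_amplicon(size, reference_bp=2500, is_length_bp=1300))
```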

Participants

M. C. Luis Lozano, experimental design, experiments and data analysis.
Dr. Víctor González, experimental design and data analysis.
Dr. Guillermo Dávila, data analysis.

Evolutionary Coherence among Orthologs of the Order Rhizobiales.

The order Rhizobiales of the α-proteobacteria is one of the taxa with the most complete genomes available. This makes it possible to design comparative and evolutionary studies at high taxonomic levels. Moreover, the order Rhizobiales groups species with very distinct and contrasting lifestyles: pathogens of animals and plants, autotrophic species and symbionts of plants. The differences in lifestyle probably reflect adaptations that require particular sets of genes, and could explain the variability in genome structure among these species. Nonetheless, if the order Rhizobiales is a natural clade, it should conserve a common core of genes across all its species. Therefore, the core of rhizobial orthologs would be expected to produce coherent phylogenies that might reveal the speciation process. The following questions will be addressed in this project:

  1. What is the set of orthologs common to the species of the order Rhizobiales?
  2. Are they coherent with the species divergence?
  3. If the answer to question 2 is negative, what fraction of the orthologs is incongruent, and why are they incongruent?
  4. Have the orthologs conserved the same function?
  5. Is it possible to identify the genes responsible for specific adaptations?

We will take R. etli CFN42 as the reference species for this study and use all the orthologs detected in the order Rhizobiales for phylogenetic reconstruction. First, we will determine the evolutionary histories of the entire set of orthologs. The coherence among the topologies of all the resulting trees will then be tested with different congruence and distance tests. Since we start from a working definition of orthology (the best bi-directional hits criterion), our approach aims to contrast it with the species divergence. We expect to find a diversity of evolutionary histories according to the selective regimes and lifestyle of each species.
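The best bi-directional hits criterion used as the working definition of orthology can be sketched as follows. The input format (dictionaries of similarity scores keyed by query and subject gene) and the gene names are assumptions made for illustration; in practice the scores would come from a sequence similarity search between two genomes.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    scores: dict {(query, subject): score}. On ties, the first pair
    encountered is kept.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


if __name__ == "__main__":
    # Hypothetical scores between genome A and genome B
    a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 50, ("a2", "b2"): 120, ("a3", "b1"): 90}
    b_vs_a = {("b1", "a1"): 195, ("b2", "a2"): 118, ("b2", "a1"): 40}
    print(bidirectional_best_hits(a_vs_b, b_vs_a))
```

In this toy example a3 finds b1 as its best hit, but b1 prefers a1, so a3 is not assigned an ortholog. Such one-directional hits are the typical signature of paralogs or lineage-specific genes.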


M. C. Santiago Castillo, experimental design, experiments and data analysis.
Dr. Víctor González, experimental design and data analysis.

Evolution of Cell Organelle Genomes.

In eukaryotic cells there are two important organelles, mitochondria and plastids, which respectively produce energy and synthesize substances for cell metabolism. These organelles are enclosed by two membranes, carry multiple copies of their genome, and are inherited in a non-Mendelian fashion. It is now accepted that these organelles originated from ancient endosymbionts of the eukaryotic cell. Therefore, the genetic information from organelle genomes can provide an independent view of the phylogeny of their host organisms.

Both mitochondria and plastids have evolved via a process of genome reduction. In plants, the size of the mitochondrial genome varies from 200 to 2500 kb, while plastid genomes are much less variable in size, ranging from 35 to 217 kb, with the majority between 115 and 165 kb. So far, 49 plastid genomes have been reported, and only a few of the 774 sequenced mitochondrial genomes belong to plants. It is clear that the vast majority of plastid proteins are nucleus-encoded. Plastid genomes have retained only a small portion of the sequences of their ancient endosymbiotic ancestor and have transferred the majority of their genes to the nucleus of the host cell. There is no proof yet of the transfer of nucleus-encoded genes to organelle genomes. Recent evidence indicates that chloroplast genes can be transferred to mitochondria. On the other hand, intergenomic recombination between mtDNA and cpDNA seems to be a relatively infrequent phenomenon. Consequently, it could be advantageous to use organelle genes for the analysis of plant phylogeny.

In our research line, we first focus on the study of the chloroplast genome of the common bean (Phaseolus vulgaris) and several of its varieties. The chloroplast genome of the bean will be sequenced, analyzed, and compared. In addition, some nuclear genes will be sequenced to gain insight into the phylogeny of this important crop. Meanwhile, we will test for rearrangements within the plastid genome and for gene transfer among the three genome compartments. The following questions will be addressed in our research:

  1. What is the general structure of the chloroplast genome of Phaseolus vulgaris and the divergence pattern of its varieties?
  2. What is the phylogenetic relationship between the nucleus-encoded genes and chloroplast genes?
  3. Are there intra- or inter-molecular rearrangements within chloroplast genomes?
  4. What is the amount of gene flow between the genome compartments of beans?
  5. What is the effect of self-fertilization of the plant on the evolution of its plastid genome?


Xianwu Guo, experimental design, experiments and data analysis.
Víctor González, experimental design and data analysis.
Oscar Brito, Bachelor's thesis, DNA purification and template preparation.
José Luis Fernández, construction of DNA libraries.
Rosa I. Santamaría, DNA sequencing.
Patricia Bustos, assembly.
Guillermo Dávila, data analysis.

Identification of Promoters in the Rhizobium etli Genome.

All bacteria, except Deinococcus radiodurans, contain a main sigma subunit, also known as housekeeping sigma factor or sigma 70 (σ70), which controls the expression of most genes under any growth condition. The σ70 subunits of E. coli and Bacillus subtilis recognize two types of promoter consensus sequence. The first one, the "canonical promoter", consists of two hexamer sequences, [TTGACA] and [TATAAT], centered respectively at positions -35 and -10 relative to the transcription start site. The second one, the "extended -10 promoter", consists of a [TATAAT] element plus two nucleotides [TG], located at the -12 and -13 positions. In this promoter type there is no -35 region or equivalent sequence. Though these promoter systems have been thoroughly studied in E. coli and B. subtilis, nothing is known about promoter structure in the majority of bacteria. Rhizobium etli possesses a housekeeping σ70 (SigA) that is larger than the E. coli σ70. Most of the σ70 promoters already characterized in R. etli are not transcribed by the E. coli RNA polymerase in vivo or in vitro. In contrast, the RNA polymerase holoenzyme of R. etli can initiate transcription from typical E. coli σ70 promoters. This observation suggests some differences between the transcriptional machinery of E. coli and that of R. etli, perhaps at the level of promoter recognition by the σ70 factor. To analyze the molecular basis of gene expression in R. etli, in this study we want to identify, characterize and sequence promoters of R. etli that are active under exponential growth conditions.
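As an illustration of the canonical promoter described above, the following sketch scans a DNA sequence for a TTGACA hexamer followed, after a spacer, by a TATAAT hexamer. The spacer lengths (16 to 18 bp) and the mismatch threshold are assumptions based on the E. coli consensus; since the promoter structure of R. etli is precisely what the project sets out to determine, this is not a model of R. etli promoters.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_canonical_promoters(seq, minus35="TTGACA", minus10="TATAAT",
                             spacers=(16, 17, 18), max_mismatches=2):
    """Return (pos35, pos10, spacer, mismatches) for each candidate promoter.

    pos35 and pos10 are 0-based start positions of the two hexamers.
    `spacers` and `max_mismatches` are ASSUMED search parameters.
    """
    seq = seq.upper()
    box = len(minus35)
    hits = []
    for i in range(len(seq) - box + 1):
        mm35 = hamming(seq[i:i + box], minus35)
        if mm35 > max_mismatches:
            continue
        for spacer in spacers:
            j = i + box + spacer
            candidate = seq[j:j + len(minus10)]
            if len(candidate) < len(minus10):
                continue  # -10 hexamer would run past the end of the sequence
            total = mm35 + hamming(candidate, minus10)
            if total <= max_mismatches:
                hits.append((i, j, spacer, total))
    return hits


if __name__ == "__main__":
    # Hypothetical test sequence with a perfect consensus and a 17-bp spacer
    seq = "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GGGG"
    print(find_canonical_promoters(seq))
```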


Dr. Miguel Ramírez, experimental design, experiments and data analysis.
C. Gamaliel López Leal, promoter analysis by saturation mutagenesis.
Dr. Guillermo Dávila, data analysis.

Transcriptional Regulation of the Heat Shock Regulon in Rhizobium etli.

In all organisms, a sudden increase in temperature leads to the accumulation of partially unfolded proteins, which tend to form life-threatening aggregates within cells. These non-native proteins induce the activation of the heat shock regulon. Bioinformatic analysis of the R. etli genome suggests that the heat shock regulon comprises about 30 different genes whose expression is controlled by two copies of the alternative sigma factor RpoH (σ32). Preliminary results indicate that RpoH1 and RpoH2 are not functionally equivalent. To understand the different roles of the two copies of RpoH, we want to identify the conditions under which the rpoH1 and rpoH2 genes are expressed, the set of genes induced by RpoH1 and/or RpoH2, and the promoters recognized by these proteins.


Biol. Alma Ruth Reyes González, doctoral thesis.
Dr. Miguel A. Ramírez, Director.

Functional characterization of regulatory networks in Rhizobium etli.

The general goal of this research line is to contribute to the understanding of the regulatory networks that control gene expression in R. etli in response to different external signals during free-living growth and symbiosis. An integrative global regulatory model requires the identification of the target genes of a given transcription factor and the characterization of the molecular mechanisms that participate in their expression. Focusing on new ORFs identified in the complete genome sequence of R. etli, the objectives of my research include:

  1. Characterization of additional regulatory elements for fix genes in R. etli. (Participant: Manuel Granados)
  2. Analysis of R. etli FixL-FixJ-like proteins as members of putative two-component regulatory systems. (Participant: Cristian Arriaga)
  3. Functional characterization of the global mechanism used by R. etli to respond to the presence of nitrogen oxides. (Participant: Nicolás Gómez)
  4. Evaluation of the role of the FNR-type regulators present in the R. etli genome as regulatory proteins in the bacterial response to different environmental conditions. (Participant: Analilia Arroyo)


Dr. Lourdes Girard Cuesy
Manuel Granados
Cristian Arriaga
Nicolás Gómez
Analilia Arroyo

Molecular analysis of the repABC replication systems of R. etli.

The continuity of life depends on two highly coordinated processes: the duplication of the genetic material (replication) and its accurate and equal distribution into the daughter cells (segregation). In bacteria, chromosomes and low-copy-number plasmids are transmitted from one generation to the next with remarkable precision, but the molecular mechanisms underlying this phenomenon remain poorly understood.

The genes involved in the replication and segregation of low-copy-number plasmids are organized in at least two modules: the first comprises the genes that control the initiation of plasmid replication, and the other contains the genes encoding the components of an active segregation machinery. Each of these modules is subject to independent and complex regulation, which makes them difficult to study. The exceptions are the plasmids belonging to the repABC family. In these plasmids both replication and partitioning genes are encoded within a single operon.

The repABC operons are present not only on large low-copy-number plasmids of some α-proteobacteria such as Rhizobium, Mesorhizobium, Sinorhizobium, Agrobacterium, Rhodobacter, Oligotropha, Ruegeria, Nitrobacter and Paracoccus, but also in the chromosomes of Agrobacterium and Brucella.

The basic repABC replicons consist of an operon of three protein-encoding genes (repA, repB and repC), a par site and a highly conserved gene for a small antisense RNA (ctRNA) located between repB and repC. The first two genes of the repABC operon encode proteins that show sequence and functional similarities to ParA and ParB, the proteins of the best-characterized segregation system of chromosomes and plasmids. RepA, by itself or together with RepB, has been implicated in the negative transcriptional regulation of the repABC operon. RepA has also been identified as a trans-incompatibility factor. RepC is essential for replication and for this reason has been suggested to be the initiator protein. Both RepA and the ctRNA were found to be incompatibility determinants.

The par site is a centromere-like sequence that can be located upstream of the repA gene, between the repA and repB genes, or downstream of repC. This sequence is the target of RepB, one of the components of the segregation machinery. It has been shown that the par site is also a strong incompatibility factor.

In our group we have four research lines:

  • We want to understand, at the molecular level, the mechanisms involved in the genetic regulation of the repABC operon.
  • We are studying the elements involved in the segregation of the repABC plasmids, their interactions and their roles in plasmid partitioning.
  • We are investigating the mechanisms of replication of these plasmids, with special emphasis on the replication initiation process.
  • We are interested in the mechanisms that determine the position of the repABC plasmids within the cell during the cell cycle.


Dr. Miguel Ángel Cevallos Gaos