
Software tools developed by Bruno Contreras-Moreira and Pablo Vinuesa

On this page I provide links to the software tools developed in collaboration with my colleague Bruno Contreras-Moreira for pan-genome analyses, phylogenomics, population genetics, genomic taxonomy and the design of lineage-specific degenerate PCR primers.

GET_HOMOLOGUES: a versatile, powerful and user-friendly software package for microbial pangenomics

GET_HOMOLOGUES is an open-source software package for microbial pangenomics and comparative genomics that we have recently released to the public under the GNU General Public License. It is written in Perl and R and has been designed to run on Linux and Mac OS X computer systems. It implements a fully automatic and highly customizable analysis pipeline. The pipeline covers genome data download, extraction of user-selected sequence features, running of BLAST and HMMER jobs, and indexing, clustering and parsing of results. It can take advantage of modern multiprocessor architectures, as well as computer clusters, to parallelize time-consuming BLAST and HMMER jobs. It can handle large data sets on reasonably modest machines in two ways: by using Berkeley DB to write temporary data to disk, and/or by calling a heuristic version of our bidirectional best-hit (BDBH) algorithm. Auxiliary scripts are integrated to facilitate the parsing and generation of gene families. These include the computation of consensus clusters recovered by combinations of the supported sequence-clustering algorithms. Other scripts, which call R functions, are provided for the statistical analysis and graphical display of results, including core- and pan-genome plots. Diverse comparative-genomics analyses can also be performed. Finally, an installation script simplifies setup, and a detailed manual with hands-on tutorials makes the package reasonably user-friendly.
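The bidirectional best-hit (BDBH) idea mentioned above can be illustrated with a short Python sketch. It assumes that BLAST hits for both search directions have already been parsed into hypothetical (query, subject, bitscore) tuples. It deliberately ignores the E-value, alignment-coverage and tie-handling rules that a real pipeline applies. This is not GET_HOMOLOGUES' own implementation.

```python
"""Minimal bidirectional best-hit (BDBH) sketch between two genomes.

Assumption: hits are (query, subject, bitscore) tuples, as could be parsed
from BLAST tabular output; E-value/coverage filters are not modeled here.
"""
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Return, for each query, the subject with the highest bit score."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def bdbh_pairs(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Keep (a, b) only if b is a's best hit AND a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


if __name__ == "__main__":
    # Toy hits with hypothetical gene names.
    a_vs_b = [("a1", "b1", 250.0), ("a1", "b2", 90.0), ("a2", "b2", 180.0)]
    b_vs_a = [("b1", "a1", 245.0), ("b2", "a1", 95.0), ("b2", "a2", 60.0)]
    print(bdbh_pairs(a_vs_b, b_vs_a))  # prints [('a1', 'b1')]
```

In the toy example, gene a2 is dropped. Its best hit, b2, prefers a1 in the reverse search. This reciprocity requirement is what distinguishes a BDBH from a one-way best hit.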

GET_HOMOLOGUES is freely available for academic use only. You can download the source code here and from GitHub.

If you find our code useful in your work, please cite any of the following publications:

Analysis of Plant Pan-Genomes and Transcriptomes with GET_HOMOLOGUES-EST, a Clustering Solution for Sequences of the Same Species

The pan-genome of a species is defined as the union of all the genes and non-coding sequences found in all its individuals. However, constructing a pan-genome for plants with large genomes is daunting, both in sequencing cost and in the scale of the required computational analysis. A more affordable alternative is to focus on the genic repertoire by using transcriptomic data. GET_HOMOLOGUES-EST is designed to compare two kinds of data sets: the transcriptomes of different tissues of one individual, or those from the same tissue and developmental stage of different individuals/ecotypes of a species. The goal was to sample their pan-genomes and classify sequences as core, if detected in all accessions, or accessory, when absent from some of them. The resulting sequence clusters were used to simulate pan-genome growth and to compile Average Nucleotide Identity matrices that summarize intra-species variation. Read more: Front. Plant Sci., 14 February 2017
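The core/accessory classification and the pan-genome growth simulation described above can be sketched in Python. The sketch assumes that each sequence cluster is already summarized as the set of accessions in which it was detected. Accession and cluster names are hypothetical. Growth is estimated by averaging over random orderings of accessions. That is one common way to draw such curves, not necessarily the exact procedure used by GET_HOMOLOGUES-EST.

```python
"""Sketch: core/accessory classification and pan-genome growth curves.

Assumption: each cluster maps to the set of accessions in which it was
detected; all names are hypothetical, not GET_HOMOLOGUES-EST output.
"""
import random
from typing import Dict, List, Set, Tuple


def classify(clusters: Dict[str, Set[str]], accessions: Set[str]) -> Tuple[List[str], List[str]]:
    """Core = detected in every accession; accessory = absent from at least one."""
    core = sorted(c for c, present in clusters.items() if present >= accessions)
    accessory = sorted(c for c, present in clusters.items() if not present >= accessions)
    return core, accessory


def growth_curves(
    clusters: Dict[str, Set[str]], accessions: Set[str], n_perm: int = 100, seed: int = 1
) -> Tuple[List[float], List[float]]:
    """Mean pan- and core-genome sizes after sampling 1..N accessions in random order."""
    rng = random.Random(seed)
    order = sorted(accessions)
    pan_totals = [0.0] * len(order)
    core_totals = [0.0] * len(order)
    for _ in range(n_perm):
        rng.shuffle(order)  # one random accession order per permutation
        sampled: Set[str] = set()
        for i, acc in enumerate(order):
            sampled.add(acc)
            # Pan-genome: clusters seen in at least one sampled accession.
            pan_totals[i] += sum(1 for present in clusters.values() if present & sampled)
            # Core genome: clusters seen in every sampled accession.
            core_totals[i] += sum(1 for present in clusters.values() if present >= sampled)
    return [t / n_perm for t in pan_totals], [t / n_perm for t in core_totals]


if __name__ == "__main__":
    accessions = {"acc1", "acc2", "acc3"}
    clusters = {
        "c1": {"acc1", "acc2", "acc3"},
        "c2": {"acc1", "acc2"},
        "c3": {"acc3"},
    }
    print(classify(clusters, accessions))  # (['c1'], ['c2', 'c3'])
    pan, core = growth_curves(clusters, accessions)
    print("pan:", pan)
    print("core:", core)
```

As more accessions are added, the pan-genome curve can only grow and the core curve can only shrink. Averaging over permutations removes the dependence on the order in which accessions happen to be listed.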


The primers4clades web server: tree-guided design of degenerate PCR oligonucleotide primers from multiple sequence alignments for metagenomic and molecular systematic studies


primers4clades is an easy-to-use web server for researchers who want to design PCR primers for cross-species amplification of novel sequences. Targets can be metagenomic DNA or uncharacterized organisms belonging to user-specified phylogenetic clades or taxonomic groups. It implements an extended CODEHOP primer design strategy based on both DNA and protein multiple sequence alignments of coding sequences. It evaluates a comprehensive set of thermodynamic properties of the oligonucleotide pairs. It also evaluates the phylogenetic information content of the theoretical amplicons, computed from the branch support values of maximum likelihood phylogenies estimated for each molecular marker. Phylogenetic trees displayed on screen make it easy to target the primer design to particular species groups or sequence clusters selected by the user. The server is useful for designing taxon-specific PCR primers for molecular ecology and systematic studies of bacteria and eukaryotes.
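A central quantity when designing degenerate primers is the degeneracy of the oligonucleotide, that is, the number of distinct sequences it encodes. The Python sketch below derives the IUPAC consensus of an aligned block and its degeneracy. It assumes an ungapped block of aligned DNA sequences containing only A, C, G and T. It illustrates only this one concept. primers4clades' extended CODEHOP strategy also uses protein alignments, consensus clamps and thermodynamic evaluation, none of which are modeled here.

```python
"""Sketch: IUPAC-degenerate consensus and degeneracy of a primer region.

Assumption: the input is an ungapped block of aligned DNA sequences of equal
length using only A/C/G/T; this is not primers4clades' code.
"""
from math import prod
from typing import List, Tuple

# IUPAC nucleotide codes keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def degenerate_consensus(block: List[str]) -> Tuple[str, int]:
    """Return the IUPAC consensus of aligned sequences and its degeneracy."""
    length = len(block[0])
    if any(len(seq) != length for seq in block):
        raise ValueError("sequences must be aligned to the same length")
    codes, sizes = [], []
    for column in zip(*block):
        bases = frozenset(b.upper() for b in column)
        codes.append(IUPAC[bases])  # KeyError for gaps or non-ACGT symbols
        sizes.append(len(bases))
    # Degeneracy = product of bases per position = number of distinct oligos encoded.
    return "".join(codes), prod(sizes)


if __name__ == "__main__":
    print(degenerate_consensus(["ATGGC", "ATGAC", "ACGGC"]))  # ('AYGRC', 4)
```

Degeneracy grows multiplicatively with each variable position. This is one reason CODEHOP-style designs confine degeneracy to a short 3' core and use a consensus 5' clamp.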

Rarefaction curves demonstrate the power of the lineage-targeted metagenomics approach

Figure: rarefaction curves for rpoB amplicon libraries from two Mexican soils vs. a 16S rRNA library from Amazonian soil.

The results presented above (Fig. 6 of Sachman et al., 2011) demonstrate the power of the lineage-targeted approach to metagenomics. For the genus Mycobacterium, nearly saturated sampling of taxa at the species level can be achieved by sequencing rpoB amplicon libraries of as few as 50 clones generated from two Mexican tropical soils. Here, species level corresponds to a p-distance cutoff of ~0.05 for the rpoB gene. In contrast, sequencing 16S rRNA amplicon libraries (100 clones) generated with universal primers from Amazonian soil DNA does not provide a representative sampling of the community.
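Rarefaction curves like those discussed above plot the expected number of OTUs against the number of clones sequenced. A curve that flattens indicates near-saturated sampling. The Python sketch below computes such curves analytically with Hurlbert's expected-richness formula. It assumes clones have already been binned into OTUs, for example at a p-distance cutoff of ~0.05. The abundance counts are hypothetical, not data from Sachman et al. (2011), and the original study may have used different software.

```python
"""Sketch: p-distance and analytical rarefaction from OTU abundance counts.

Assumption: clones are already binned into OTUs; counts are hypothetical.
Hurlbert's expected richness: E[S_n] = sum_i (1 - C(N - N_i, n) / C(N, n)).
"""
from math import comb
from typing import List


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites between two aligned sequences (gap sites skipped)."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable (ungapped) sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def rarefy(otu_counts: List[int], n: int) -> float:
    """Expected number of OTUs in a random subsample of n clones (without replacement)."""
    total = sum(otu_counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the number of clones")
    denom = comb(total, n)
    # comb(total - count, n) is 0 when n > total - count, i.e. the OTU is always sampled.
    return sum(1 - comb(total - count, n) / denom for count in otu_counts)


def rarefaction_curve(otu_counts: List[int], step: int = 1) -> List[float]:
    """Expected OTU richness for subsample sizes 0, step, 2*step, ... up to N."""
    return [rarefy(otu_counts, n) for n in range(0, sum(otu_counts) + 1, step)]


if __name__ == "__main__":
    counts = [5, 3, 1, 1]  # hypothetical clones per OTU
    print(p_distance("ACGTACGTAC", "ACGTACGTAT"))  # 0.1
    print(rarefy(counts, 10))  # 4.0
    print(rarefaction_curve(counts))
```

Computing the expectation analytically avoids the run-to-run noise of random resampling. With the full library (n = N), the curve ends exactly at the observed number of OTUs.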

Primers4clades is developed by Bruno Contreras-Moreira (Laboratorio de Biología Computacional, Estación Experimental Aula Dei, CSIC, Spain) and Pablo Vinuesa (Center for Genomic Sciences, UNAM, Mexico) and is mirrored at two sites: primers4clades - Mexico and primers4clades - Spain. If you use it, we would greatly appreciate your feedback to help us improve the documentation and extend the FAQ list!

Other useful PCR-related sites and webpages:
