Carlos Francisco Méndez Cruz

Computational Genomics Researcher

Assistant Professor

Academic history

Carlos-Francisco Méndez-Cruz studied Information Technology at the National Autonomous University of Mexico (Universidad Nacional Autónoma de México, UNAM) and obtained his bachelor's degree in 1997. He then worked for seven years for the Board of Trustees of the UNAM (Patronato Universitario), starting as a computer programmer and finishing as head of the information technology area. His interests then shifted towards natural language processing and computational linguistics, and in 2005 he began postgraduate studies. He obtained master's and PhD degrees in linguistics from the UNAM. His doctoral research addressed the unsupervised learning of Spanish morphology, and he received his doctorate in 2013. From 2014 to 2015 he was a postdoctoral researcher at the Engineering Institute of the UNAM, working on author profiling, news processing, and the development of a digital library of Mexican art. He is currently an Assistant Professor (Investigador Asociado C) in the Computational Genomics Research Program at the Center for Genomic Sciences, UNAM.
His work focuses on extracting knowledge from the biomedical literature using natural language processing and text mining methods. Life science researchers use this extracted knowledge to formulate new research plans, and biocurators use it to populate biological databases such as RegulonDB. His ongoing projects are “Automatic summarization of transcription factor properties”, “Automatic extraction of regulatory interactions and growth conditions”, and “Automatic extraction of gene-disease events for pulmonary diseases”. As a collaborator on a project led by Dr. Julio Collado-Vides (NIGMS-NIH), he also coordinates the development of a new curation infrastructure with text mining capabilities.
He has been a lecturer since 2000 in master's programs (Linguistics; Management of Information Technology) and undergraduate programs (Linguistics, Computational Engineering, Information Technology). His most recent courses are “Introduction to Natural Language Processing for Biomedical Literature” and “Introduction to Text Mining for Biomedical Literature” in the Undergraduate Program on Genomic Sciences. He has also supervised several undergraduate theses and one master's thesis.



Selected publications

Méndez-Cruz, C.-F., Gama-Castro, S., Mejía-Almonte, C., Castillo-Villalba, M.-P., Muñiz-Rascado, L.-J., & Collado-Vides, J. (2017). First steps in automatic summarization of transcription factor properties for RegulonDB: classification of sentences about structural domains and regulated processes. Database: The Journal of Biological Databases and Curation, 2017, article ID bax070. PMID: 29220462.
Méndez-Cruz, C.-F., Medina-Urrea, A., & Sierra, G. (2016). Unsupervised morphological segmentation based on affixality measurements. Pattern Recognition Letters, 84, 127-133. DOI: 10.1016/j.patrec.2016.09.001.
Méndez-Cruz, C.-F., Torres-Moreno, J.-M., Medina, A., & Sierra, G. (2013). Extrinsic Evaluation on Automatic Summarization Tasks. Testing Affixality Measurements for Statistical Word Stemming. In I. Batyrshin & M. González Mendoza (Eds.), Advances in Computational Intelligence, MICAI 2012, Part II (LNAI Vol. 7630, pp. 46-57). Heidelberg: Springer. ISBN 978-3-642-37797-6, ISSN 0302-9743.

Telephone: (777) 3132063


Santos-Zavaleta, A., Salgado, H., Gama-Castro, S., Sánchez-Pérez, M., Gómez-Romero, L., Ledezma-Tejeida, D., García-Sotelo, J. S., Alquicira-Hernández, K., Muñiz-Rascado, L., Peña-Loredo, P., Ishida-Gutiérrez, C., Velázquez-Ramírez, D. A., Del Moral-Chávez, V., Bonavides-Martínez, C., Méndez-Cruz, C.-F., Galagan, J., & Collado-Vides, J. (2019). RegulonDB v 10.5: tackling challenges to unify classic and high throughput knowledge of gene regulation in E. coli K-12. Nucleic Acids Research, 47(D1), 212-220. DOI: 10.1093/nar/gky1077. PMID: 30395280.
Méndez-Cruz, C.-F., Gama-Castro, S., Mejía-Almonte, C., Castillo-Villalba, M.-P., Muñiz-Rascado, L., & Collado-Vides, J. (2017). First steps in automatic summarization of transcription factor properties for RegulonDB: classification of sentences about structural domains and regulated processes. Database: The Journal of Biological Databases and Curation, 2017, bax070. DOI: 10.1093/database/bax070. PMID: 29220462.
Méndez-Cruz, C.-F., Medina-Urrea, A., & Sierra, G. (2016). Unsupervised morphological segmentation based on affixality measurements. Pattern Recognition Letters, 84, 127-133. DOI: 10.1016/j.patrec.2016.09.001.
Arroyo Fernández, I., Méndez-Cruz, C.-F., Sierra, G., Sidorov, G., & Torres-Moreno, J.-M. (2019). Unsupervised Sentence Representations as Word Information Series: Revisiting TF-IDF. Computer Speech & Language, 56, 107-129. DOI: 10.1016/j.csl.2019.01.005.
Arroyo Fernández, I., Carrasco Ruiz, M., & Méndez-Cruz, C.-F. (2018). Procesamiento de lenguaje natural en la conservación de la herencia cultural: una red semántica del Popol Vuh [Natural language processing in the preservation of cultural heritage: a semantic network of the Popol Vuh]. In Latest Computational Approaches to Cultural Heritage.
Méndez-Cruz, C.-F., & Arroyo Fernández, I. (2018). Análisis automático de unidades morfológicas: segmentación y agrupamiento en español, maya y náhuatl [Automatic analysis of morphological units: segmentation and clustering in Spanish, Maya, and Nahuatl]. In Ámbitos Morfológicos. Descripciones y métodos.
Rinaldi, F., Lithgow, O., Gama-Castro, S., Solano, H., López-Fuentes, A., Muñiz-Rascado, L., Ishida-Gutiérrez, C., Méndez-Cruz, C.-F., & Collado-Vides, J. (2017). Strategies towards digital and semi-automated curation in RegulonDB. Database: The Journal of Biological Databases and Curation, 2017, bax012. DOI: 10.1093/database/bax012. PMID: 28365731.
Méndez-Cruz, C.-F., Torres-Moreno, J.-M., Medina, A., & Sierra, G. (2013). Extrinsic Evaluation on Automatic Summarization Tasks. Testing Affixality Measurements for Statistical Word Stemming. In MICAI 2012: Advances in Computational Intelligence, Part II, 46-57.
(2011). The GIL-UNAM-3 summarizer: an experiment in the track QA@INEX'10. In INEX 2010: Comparative Evaluation of Focused Retrieval (Vol. 6932, pp. 282-289).
(2006). Arquitectura del Corpus Histórico del Español de México (CHEM) [Architecture of the Historical Corpus of Mexican Spanish (CHEM)]. In Avances en la Ciencia de la Computación, 248-253.
(2005). Extractive summarization based on word information and sentence position. In Computational Linguistics and Intelligent Text Processing, 653-656.
(2004). CLI: An Open Linguistic Corpus for Engineering. In Proceedings of the IX Ibero-American Workshop on Artificial Intelligence, 203-208.