Ph.D., Columbia University, 2008
Computational biology; molecular networks; human genetics; human disease; applied statistical and machine learning; biomedical text-mining; molecular evolution
I work in the field of computational biology. I apply advanced machine learning and statistical modeling techniques to massive amounts of biomedical data. I focus on building and refining models of molecular networks and on applying these models to address biological and medical problems.
The biomedical literature is a rich source of unstructured data about molecular interactions. Both the tremendous growth of the scientific literature and the need to study biological systems at a global, system-wide level call for automated approaches to extracting the information locked in text. I have worked on the development of the GeneWays system, which addresses this need. One problem with information extracted from the literature is that not all published statements are correct. Evidence that such errors exist comes both from the fact that some papers are eventually retracted and from observed inconsistencies between statements in different articles. In particular, in our automatically extracted data, we observe sets of statements about a particular interaction that contain both positive assertions (claiming that the two entities interact) and negative assertions (claiming that the two entities do not interact). We developed global statistical models describing the database of all extracted facts, which allowed us to resolve such inconsistencies and also to uncover curious trends in the collaborative dynamics of a scientific community. For example, we observed with high statistical confidence that previously published statements influence the interpretation of current experimental results.
Common complex hereditary disorders are believed to be both multifactorial and heterogeneous. Traditional methods of genetic analysis work successfully on "simple" Mendelian disorders, in which variation at a single genomic locus almost deterministically influences whether an individual will get the disease. These methods fail when applied to complex (multifactorial and heterogeneous) diseases, and the need for new methods to trace the etiology of common disorders is apparent. I believe that a successful approach should incorporate both new technologies (such as next-generation sequencing) for the ascertainment of genetic variants and the automated interpretation of the generated data in the context of the vast amounts of biological knowledge stored in structured form in a number of biomedical databases or locked in the scientific literature. I develop such methods and apply them in my efforts to understand the genetics of a diverse set of disorders, including autism, schizophrenia, ataxia, and leukemia.
Please visit the Iossifov Lab home page.
Rodriguez-Esteban, R. & Iossifov, I. Figure and table mining for biomedical research. (2009) Bioinformatics 25: 2082–4
Liu, J., Ghanim, M., Xue, L., Brown, C. D., Iossifov, I., Angeletti, C., Hua, S., Nègre, N., Ludwig, M., Stricker, T., Al-Ahmadie, H. A., Tretiakova, M., Camp, R. L., Perera-Alberto, M., Rimm, D. L., Xu, T., Rzhetsky, A., & White, K. P. Integrated genomic analysis of the Drosophila segmentation network leads to identification of a highly specific biomarker for human kidney cancer. (2009) Science 323: 1218–22
Iossifov, I., Zheng, T., Baron, M., Gilliam, T. C., & Rzhetsky, A. Genetic-linkage mapping of complex hereditary disorders to a whole-genome molecular-interaction network. (2008) Genome Research
Cokol, M., Iossifov, I., Rodriguez-Esteban, R., & Rzhetsky, A. How many scientific papers should be retracted? (2007) EMBO reports 8: 422–3
Rodriguez-Esteban, R., Iossifov, I., & Rzhetsky, A. Imitating manual curation of text-mined facts in biomedicine. (2006) PLoS Comput Biol. 2: e118