We are excited to present our next “People in ME/CFS Research” spotlight this week: my interview with Dr. Peter Robinson. Peter and his group have developed an international standard for the computational analysis of human disease that is widely used by scientists and clinicians, and he is also the Computational Biology Lead for the ME/CFS CRC at The Jackson Laboratory. I recently sat down with Peter to learn more about his background and his plans for the ME/CFS project. Keep reading to find out what he thinks!
Hi Peter! Thanks for meeting with me and answering some questions. So, you’re an MD, but you’re currently doing research. Can you tell me a little about your background and how you ended up at JAX?
Hey, Courtney! So, when I was in college, I studied Mathematics and Computer Science. I was actually accepted into a couple of PhD programs in Computer Science, but my father, who was a physician, said, “why don’t you study medicine?” After a couple of conversations about it, it seemed like a good idea, so I switched and wound up studying Medicine at UPenn [University of Pennsylvania]. I then did what in America would be called a residency in Pediatrics, at the Charité University Hospital in Germany. At that point, my plan was to be a clinician. But then I got involved in a research project on the genetics and diagnostics of Marfan Syndrome and Neurofibromatosis, and I developed an interest in genetics. So, after I finished the residency, I switched to the Genetics Department of the hospital, and for the first time I was doing mainly research instead of clinical work. Sometime after I made the switch, my wife happened to be between jobs. She’s an Ophthalmologist and she was doing her Master’s in Medical Informatics, so she said, “hey, can you help me with this homework? I have to program something in Java,” so I started helping and I was like, “wow, this is a lot of fun!” Anyway, I heard there was a field called Bioinformatics, and I looked it up and said, “hey, this would be perfect for me.” So, I studied Computer Science online through Columbia University and got a diploma in it in 2004. Then one of my mentors in Germany financed a Master’s student, and together we did our first bioinformatics project, which resulted in a publication in Human Molecular Genetics, which is quite a good journal. Every year I added another person to the group, and I ended up with a pretty big group.
And then in terms of coming to JAX: after being at this hospital in Germany for almost 20 years, it just seemed like the right time for a change. Often you just want new opportunities and new challenges. So, I had been looking for a new position, and I liked the mix of people here at JAX and the possibility to work on things relatively close to basic science and to genomics, while still doing bioinformatics. And there is the dedication to research here: instead of spending half of my time teaching and 20% of my time doing clinical diagnostics, with little time left for research, now I have that time. I had a pretty big team at the hospital that was very efficient, but I personally wanted to be doing research all day myself too.
So what research are you currently working on?
Well, we’re working on a number of projects. One of my biggest projects is the Human Phenotype Ontology [HPO], which is essentially a semantic network that describes the signs and symptoms of human disease. Any one patient will have a certain collection of signs and symptoms, and what the HPO does very well is allow you to compute over those symptoms and calculate similarities to disease models and to model organisms. We have been developing software to combine that with genome sequencing for diagnostics, and we’re now in the process of expanding that. So, we have improved algorithms for analyzing genome sequencing data, and we’re working on a new algorithm for using HPO in a way that will be more realistic for normal clinical settings outside of research. We’re starting to look into integrating HPO into electronic health records. We’re also working on genomics, and we have had a long-standing interest in ChIP-seq [ChIP sequencing is a method used to analyze protein interactions with DNA], which is one way of looking at gene regulation. Another thing is that we will be looking at sex differences in some datasets that have about 20,000 samples. We’ll be looking at cancer, and of course there’s the ME/CFS project. We initially developed the HPO for rare disease, but for something like ME/CFS, the time course is also important, and some areas, like the amount of pain and fatigue, need to be captured in more detail. So, we’re working on acquiring the resources to be able to do that.
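To illustrate what “computing over symptoms” can mean, here is a minimal Python sketch of ontology-aware similarity. The ontology fragment is a hand-made toy, not the real HPO hierarchy, and the measure (a Jaccard index over each profile’s terms plus all of their ancestors) is a simple stand-in; published HPO-based tools generally use more refined, information-content-based measures.

```python
# Hypothetical toy ontology: each term maps to its parent terms ("is-a" links).
# This is NOT the real HPO hierarchy; labels and structure are illustrative.
ONTOLOGY = {
    "Phenotypic abnormality": [],
    "Constitutional symptom": ["Phenotypic abnormality"],
    "Abnormality of the nervous system": ["Phenotypic abnormality"],
    "Fatigue": ["Constitutional symptom"],
    "Fever": ["Constitutional symptom"],
    "Pain": ["Abnormality of the nervous system"],
    "Headache": ["Pain"],
    "Myalgia": ["Pain"],
}


def ancestors(term):
    """Return the term together with every ancestor reachable via is-a links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in ONTOLOGY[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def expand(profile):
    """Union of ancestor sets: a patient with 'Headache' implicitly has 'Pain'."""
    terms = set()
    for term in profile:
        terms |= ancestors(term)
    return terms


def similarity(profile_a, profile_b):
    """Jaccard index of the ancestor-expanded profiles (1.0 = identical)."""
    a, b = expand(profile_a), expand(profile_b)
    return len(a & b) / len(a | b)


patient = {"Fatigue", "Headache"}
disease = {"Fatigue", "Myalgia"}
print(round(similarity(patient, disease), 3))  # 0.714 (5 shared of 7 terms)
```

Compared on raw terms alone, these two profiles share only one of three terms; the ancestor expansion gives credit for the fact that headache and myalgia are both kinds of pain.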
Going off of that, what exactly will you be doing for the ME/CFS project?
So, the clinical data captured in the project will take the form of a series of yes/no questions or a series of numbers. If you have a yes/no question such as “did you have a fever in the last 7 days?”, you can divide your entire patient group into two cohorts, those that had a fever and those that didn’t, and then do an association test. So, you can test for an association with anything you can measure, like gene expression or immune activation. This is fine for some clinical questions, but it’s not obvious how to create a profile of related signs and symptoms this way, because ME/CFS is a very complicated disease with a very complex phenotype, including neurological and immunological manifestations, pain, and fatigue. So measuring just one feature at a time is not necessarily the best way of capturing the phenotype.
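The yes/no association test Peter describes can be sketched as an exact permutation test: split patients by their answer, compare the mean of some measurement between the two groups, and ask how often a random split of the same patients gives a difference at least as large. The answers and expression values below are made-up placeholders, and a real analysis might well use a different test; this only shows the logic.

```python
from itertools import combinations
from statistics import mean


def permutation_test(answers, values):
    """Exact two-sided permutation test for a difference in group means.

    answers: one bool per patient (e.g. "fever in the last 7 days?")
    values:  one measurement per patient (e.g. expression of one gene)
    Both groups must be non-empty.
    """
    n = len(values)
    yes = [i for i, answer in enumerate(answers) if answer]

    def mean_difference(group):
        inside = set(group)
        in_group = [values[i] for i in range(n) if i in inside]
        out_group = [values[i] for i in range(n) if i not in inside]
        return mean(in_group) - mean(out_group)

    observed = abs(mean_difference(yes))
    # Every way of choosing a "yes" group of the same size from the cohort.
    splits = list(combinations(range(n), len(yes)))
    # Small tolerance so floating-point noise does not hide exact ties.
    extreme = sum(1 for g in splits if abs(mean_difference(g)) >= observed - 1e-12)
    return observed, extreme / len(splits)


# Hypothetical toy data: 3 patients with fever, 3 without.
fever = [True, True, True, False, False, False]
expression = [5.1, 4.8, 5.5, 3.2, 3.0, 3.9]
diff, p_value = permutation_test(fever, expression)
print(round(diff, 3), p_value)  # 1.767 0.1
```

With only three patients per group there are 20 possible splits, so the smallest achievable two-sided p-value is 2/20 = 0.1, one concrete reason why small cohorts are underpowered.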
Our ontology framework basically allows you to capture phenotypic clusters in which patients are similar but not identical. It’s not a yes/no question where it’s really clear that you fall in this group or that group; instead, there are clusters of phenotypic profiles that resemble each other in a very flexible way. The goal is to develop a framework that will allow us to divide patients into clusters and then see if these clusters correlate with one or more of the features in the genomic data. That’s really the first component of the project.
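One generic way to group patients who are “similar but not identical” is agglomerative clustering on pairwise profile similarity. The sketch below uses average linkage and plain Jaccard similarity on hypothetical symptom sets; in practice an ontology-aware similarity would replace `jaccard`, and the threshold is an arbitrary illustrative choice. This is not a description of the project’s actual pipeline.

```python
def jaccard(a, b):
    """Fraction of terms shared by two symptom sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def average_linkage(profiles, threshold):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    highest average pairwise similarity until no pair reaches `threshold`."""
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        sims = [jaccard(profiles[x], profiles[y]) for x in c1 for y in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        score, i, j = max(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if score < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i stays valid
    return clusters


# Hypothetical patient symptom profiles.
profiles = {
    "p1": {"fatigue", "post-exertional malaise", "unrefreshing sleep"},
    "p2": {"fatigue", "post-exertional malaise", "headache"},
    "p3": {"myalgia", "arthralgia", "fever"},
    "p4": {"myalgia", "arthralgia", "sore throat"},
}
print(average_linkage(profiles, threshold=0.3))  # [['p1', 'p2'], ['p3', 'p4']]
```

No two patients here have identical profiles, yet each pair shares enough terms to end up in the same cluster, while the threshold keeps the two unrelated groups apart.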
The other component is to develop a machine learning framework that will also try to cluster other features, such as the distribution of bacterial species, gene expression, immune profiles, and metabolism. We have several major data sources in the project, and the overall goal is to create a machine learning framework that looks for clusters that interrelate these data sources. If we find significant co-clustering in two or more of these data sources, then my interpretation is that this would represent something real rather than some kind of statistical blip.
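One standard way to check whether clusters found in two data sources agree, for example clusters from microbiome data and from immune profiles of the same patients, is the adjusted Rand index (ARI): it is 1.0 for identical partitions (up to relabeling) and close to 0 for the agreement expected by chance. The sketch below computes it from scratch; the labels are hypothetical, and judging whether an agreement is “significant” would additionally need something like a permutation test.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same patients."""
    n = len(labels_a)
    # Pairs of patients placed together in both clusterings.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together in each clustering separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:  # degenerate partitions; treat as perfect agreement
        return 1.0
    return (together_both - expected) / (maximum - expected)


# Hypothetical cluster labels for six patients from two data sources.
microbiome_clusters = [0, 0, 0, 1, 1, 1]
immune_clusters = [1, 1, 1, 0, 0, 0]  # same grouping, different label names
print(adjusted_rand_index(microbiome_clusters, immune_clusters))  # 1.0
```

Because the index only looks at which patients are grouped together, the arbitrary cluster numbers produced by two separate analyses do not need to match.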
Can you explain what machine learning is?
A lot of machine learning is about finding a classification. Let’s say you’re interested in a particular disease, so you want to classify people as either having the disease or being healthy. Machine learning takes as input data that is arranged in features: the level of your blood sugar, your temperature, your heart rate, your breathing rate; these could all be features. When you start off trying to do a classification, it’s unlikely that all of these features will actually be relevant, so machine learning uses statistical techniques to identify combinations of relevant features and then combines them algorithmically in ways that provide a classification. When I say combine, imagine that you’re in essence adding up the weighted values of all of these individual features, and that gives you an answer, true or false. Machine learning uses a clever algorithm to come up with a formula for you: you put in data, and it outputs a prediction.
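Peter’s picture of adding up weighted features to get a true/false answer is what a linear classifier does. The sketch below trains the simplest such model, a perceptron, on two hypothetical standardized features (think of them as, say, temperature and heart rate after centering and scaling). The data are made up, and real projects typically use more robust methods such as regularized logistic regression.

```python
def predict(weights, bias, features):
    """Weighted sum of the features plus a bias; positive means 'has disease'."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return score > 0


def train_perceptron(samples, labels, epochs=100):
    """Perceptron rule: nudge the weights toward every misclassified sample."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in zip(samples, labels):
            if predict(weights, bias, features) != label:
                sign = 1 if label else -1
                weights = [w + sign * x for w, x in zip(weights, features)]
                bias += sign
                mistakes += 1
        if mistakes == 0:  # every training sample classified correctly
            break
    return weights, bias


# Hypothetical standardized features: (temperature, heart rate).
samples = [(2.0, 1.0), (3.0, 2.0), (-1.0, -2.0), (-2.0, -1.0)]
labels = [True, True, False, False]
weights, bias = train_perceptron(samples, labels)
print(weights, bias)                       # [2.0, 1.0] 1.0
print(predict(weights, bias, (1.0, 0.5)))  # True
```

The learned weights are the “formula”: a new patient’s features go in, the weighted sum is compared with zero, and a true/false prediction comes out.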
So what are some of the technologies you use to accomplish machine learning?
Ontologies are written in OWL, the Web Ontology Language. It basically defines individual concepts with reference to other concepts. For example, our ontology defines its individual terms with reference to other ontologies for anatomy, biochemistry, histology, pathology, and a few others. There are ways of combining all of these ontologies and using what’s called computer reasoning to find new associations. We are additionally using a number of other algorithmic ideas, graph algorithms, and quite a few frameworks.
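Real OWL reasoning is done by dedicated reasoners working over description-logic axioms, but the flavor of finding new associations by combining ontologies can be shown with a hand-rolled rule. In this hypothetical sketch, phenotype terms are defined as “abnormality of” a structure from a toy anatomy ontology, and a part-of chain lets us infer subclass links nobody asserted directly. The terms, the single-parent part-of map, and the inference rule are all simplifying assumptions, a drastic simplification of what an OWL reasoner does.

```python
# Hypothetical toy anatomy ontology: structure -> the larger structure it is part of.
PART_OF = {
    "mitral valve": "heart",
    "heart": "cardiovascular system",
}

# Hypothetical phenotype terms, each defined as "abnormality of <structure>".
PHENOTYPE_DEFINITIONS = {
    "Mitral valve abnormality": "mitral valve",
    "Heart abnormality": "heart",
    "Cardiovascular abnormality": "cardiovascular system",
}


def containing_structures(structure):
    """Follow part-of links upward (assumes each structure has one parent)."""
    chain = []
    while structure in PART_OF:
        structure = PART_OF[structure]
        chain.append(structure)
    return chain


def infer_subclasses():
    """Rule: if A is part of B, then 'abnormality of A' is a subclass of
    'abnormality of B'. Returns a set of (subclass, superclass) pairs."""
    term_for = {anatomy: term for term, anatomy in PHENOTYPE_DEFINITIONS.items()}
    inferred = set()
    for term, anatomy in PHENOTYPE_DEFINITIONS.items():
        for whole in containing_structures(anatomy):
            if whole in term_for:
                inferred.add((term, term_for[whole]))
    return inferred


for sub, sup in sorted(infer_subclasses()):
    print(f"{sub} is_a {sup}")
```

None of the three subclass links is stated in the phenotype definitions themselves; each follows from combining them with the anatomy ontology.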
What do you hope to accomplish in the first year of the ME/CFS project?
For the project, I would feel successful after the first year if we have been able to establish the ontology resources that we will need for the project. And I would also hope that we would have been able to import some patient data and test some clustering procedures.
What do you think are the important advances happening right now in the ME/CFS field?
Looking at the literature, what’s striking to me is that the main problem is still identifying patients that have this disease and making a definition for it. It seems there are varying inclusion criteria depending on what study you look at, such that it’s very difficult to compare the results of two different studies. It also seems that a lot of the studies have been underpowered, so I don’t really know of any studies that have been replicated in an independent cohort. There have been a lot of interesting leads, especially with immune abnormalities, but I’m not currently convinced that there’s a good handle on the pathophysiology.
For a disease as heterogeneous as this one, you need someone to say, “I have a rule for identifying patients that have a subtype of this disease that has the same mechanism,” because medications treat a mechanism. I don’t think anyone really knows right now whether ME/CFS is one disease or several diseases, and I don’t think anyone really knows how to reliably identify, let’s say at their first visit, a patient who is going to have this typical course and who doesn’t have one of the differential diagnoses. I think the whole field is kind of at the beginning, and it’s obvious to many people that there is a disease entity, or a set of disease entities, that is a very serious problem for a large number of people. I think what the community needs is a lot more rigor in defining patient cohorts, and I’m hoping that the NIH consortium will converge on case definitions that not only extend some of the clinical definitions but also include some biomarkers, for instance immunological parameters. I don’t think there would be agreement in the community right now on what they should be, but I think one of the outcomes of the several groups that have been funded will be to identify a shortlist of parameters that can serve as the starting point for more rigorous clinical studies.
Another thing we want to do with this project is not only to develop this clinical ontology, but also to use (epi)genomics to identify features that define a cohort of patients who not only have ME/CFS but also share a common cause of the disease.
Great, thanks for all the information about your part of the project, Peter!