Joe Gray from the Oregon Health and Science University explains how its cloud-based cancer research project with Intel is progressing
The worldwide roll-out of the Intel-backed Collaborative Cancer Cloud (CCC) research initiative is a step closer, with two further research institutions joining the cause.
The CCC is a joint project led by Intel and the US-based Oregon Health and Science University (OHSU), with the Ontario Institute for Cancer Research in Toronto and the Dana-Farber Cancer Institute in Boston now joining the fray.
Speaking at the Intel Cloud Day in San Francisco, Joe Gray from OHSU said the announcement marks an important next step in expanding the initiative internationally.
“The idea is to deploy the first stage of the Collaborative Cancer Cloud, where we have instances in the three institutions,” said Gray.
“What we’re working together to do is develop the sociological rules of the road about how to effect information exchange. This is just the first step in the international expansion of the whole concept.”
Using cloud to share cancer data
The CCC’s aim is to create a cloud-based platform for use by medical researchers and hospital staff to share genomic, imaging and clinical data that can be used to provide patients with cancer-specific treatment options – a process known as “precision medicine”.
The difficulty researchers face in achieving this is that cancers are “remarkably heterogeneous”, said Gray, which is why they are using cloud to effectively create a global database of information that can be drawn on for diagnostic purposes.
“Every cancer, at some level, is its own unique disease. What we’re trying to understand in this chaos are the patterns that characterise a particular cancer. We can use that information to define how we’re going to treat that cancer,” he said.
“We need to have a database we can use to compare our individual cancer. To have enough statistical power to identify relevant patterns, we’re going to have to compare our cancers to samples from millions of patients to really make this precise.”
Big problems with big data
The project has been several years in the making, with formal details first announced at the Intel Developer Forum in August 2015.
At the Intel Cloud Day, Gray outlined some of the big data analytics challenges the research team has faced to date as it seeks to cut treatment decision times for cancer patients to less than a day by 2020.
From a workflow perspective, this would see a patient undergo a biopsy, have the information gleaned from that immediately analysed and swiftly compared with other samples to establish how best to treat it.
“The problem with this workflow is not only do we have to manage the data and interpret it, we have to do so quickly,” said Gray.
“We can’t take months and months to do this. We have to be able to render a decision in a few days, and our goal by working with Intel is to do all that in a day in 2020.”
Achieving this requires data collection on a massive scale, as each patient – just from a genomics point of view – gives rise to a couple of terabytes of information, said Gray.
This will undoubtedly rise over time to exabytes of data, as the sophistication of the technologies used to track a patient’s genomic, clinical and image data improves.
“The measurement technologies we have today generate the amount of data that they do because that’s all their computer capabilities allow. As they advance, the amount of data that we’re going to be developing is going to scale likewise,” he said. “It’s going to get worse, so we really have to figure out how to manage the data.”
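As a rough check on the scale Gray describes, the arithmetic can be sketched as follows. The figures are illustrative assumptions rather than CCC numbers: "a couple of terabytes" is read as 2TB per patient, "millions" as one million patients, and decimal units are used throughout.

```python
# Back-of-the-envelope estimate of genomic data volume.
# All inputs are assumptions chosen to match the article's rough wording.
TB = 10**12  # bytes in a terabyte (decimal)
EB = 10**18  # bytes in an exabyte (decimal)

per_patient_bytes = 2 * TB  # assumption: "a couple of terabytes" = 2TB
patients = 1_000_000        # assumption: low end of "millions"

total_bytes = per_patient_bytes * patients
total_eb = total_bytes / EB

print(f"Estimated total: {total_eb} EB")
```

On these assumptions, genomic data alone for one million patients already lands in the exabyte range, before any imaging or clinical records are counted.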
Conducting data in the cloud
This is why the CCC is taking a cloud-based approach to managing this data, while many other medical institutions are spending huge sums of money trying to build their own in-house hardware and analytics stacks, said Gray.
“What we’ve been doing in collaboration with Intel over the past two years is to design CCC to meet some of these challenges. The system we have come up with is really federated computing,” said Gray.
“Rather than trying to centralise the data, what we’re going to do is allow each data-generating institution to keep control of their data and orchestrate these federated computer centres in a way that will allow us to seamlessly send our compute to the data.”
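The idea of sending compute to the data can be illustrated with a minimal sketch. This is not the CCC's actual software: the `Site` class, the `count_by_mutation` job and the toy records are all hypothetical, invented here purely to show the pattern of running the same analysis at each institution and pooling only the summaries.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Job = Callable[[List[dict]], Dict[str, int]]


@dataclass
class Site:
    """Hypothetical data-holding institution; raw records stay inside it."""
    name: str
    records: List[dict] = field(default_factory=list, repr=False)

    def run(self, job: Job) -> Dict[str, int]:
        # The job executes where the data lives; only its summary is returned.
        return job(self.records)


def count_by_mutation(records: List[dict]) -> Dict[str, int]:
    """Example job: count samples per mutation label at a single site."""
    counts: Dict[str, int] = {}
    for record in records:
        counts[record["mutation"]] = counts.get(record["mutation"], 0) + 1
    return counts


def federated_counts(sites: List[Site], job: Job) -> Dict[str, int]:
    """Send the same job to every site and merge the per-site summaries."""
    total: Dict[str, int] = {}
    for site in sites:
        for key, n in site.run(job).items():
            total[key] = total.get(key, 0) + n
    return total


# Toy data for illustration only.
sites = [
    Site("site_a", [{"mutation": "BRAF"}, {"mutation": "KRAS"}]),
    Site("site_b", [{"mutation": "BRAF"}]),
]
result = federated_counts(sites, count_by_mutation)
print(result)
```

The coordinator in this sketch only ever sees aggregate counts, which is the property that lets each institution keep control of its own patient data.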
04 Apr 2016