Unraveling the Complexities of Life Sciences Data

Big Data, Volume 1, Number 1, 2013
Roger Higdon, Winston Haynes, Larissa Stanberry, Elizabeth Stewart, Gregory Yandl, Chris Howard, William Broomall, Natali Kolker and Eugene Kolker

The life sciences have entered the realm of big data and data-enabled science, where data can either empower
or overwhelm. These data bring with them the challenges of the 5 Vs of big data: volume, veracity, velocity, variety,
and value. Both independently and through our involvement with DELSA Global (Data-Enabled Life Sciences
Alliance International), the Kolker Lab is creating partnerships that identify data challenges
and solve community needs. We specialize in solutions to complex biological data challenges, as exemplified by
the community resource MOPED (Model Organism Protein Expression Database)
and the analysis pipeline SPIRE (Systematic Protein Investigative Research Environment, PROTEINSPIRE.org).
Our work extends into the computationally intensive tasks of analysis and visualization of millions of protein
sequences through innovative implementations of sequence alignment algorithms and creation of the Protein
Sequence Universe tool (PSU). Pushing into the future, our lab is pursuing integration of multi-omics data and
exploration of biological pathways, as well as assigning function to proteins and porting solutions to the cloud. Big
data have come to the life sciences; discovering the knowledge in the data will bring breakthroughs.