Data: Wanted Dead or Alive

Looking back to my time in the lab as a process chemist for a pharmaceutical company, I remember my daily laboratory routine like it was yesterday. Outside of meetings and discussions on our target API, it was very much a design-synthesize-analyze-report routine. We didn’t have an electronic lab notebook (ELN) at the time, so everything was on paper. Even back then, as I stowed away yet another lab notebook on a shelf with hundreds of others, I couldn’t help but wonder, “Does anyone ever look back at these notebooks? And if they do, how can they possibly find what they are looking for?”

Several years later, the old paper lab notebook routine has all but disappeared, especially in large pharmaceutical and biotech laboratory environments, thanks to the emergence of the ELN. The movement was driven largely by intellectual property considerations, but there was also an expectation that it would shed light on the dark data and experiments buried away in old paper lab notebooks. Yet among all the scientific data being generated in the industry to support chemical and biological workflows, a significant amount still ends up in a place where it can never be re-accessed or re-used. Indeed, a joint Scientific Computing and IDBS survey from 2011 found that, despite the emergence of modern laboratory informatics systems, 88 percent of R&D organizations lack adequate systems to automatically collect data for reporting, analysis, re-use, and future decision making.

In the old days, our data transactions were very much of the “one-and-done” variety; we’d acquire data, print it out, review it, and then either throw it away or, if it was really important, glue it into our lab notebooks. Unfortunately, the workflow hasn’t changed much, even though we have much better technology and much fancier ways to process and visualize the data. For example, chemists will run a reaction and acquire some analytical data to confirm that they have made the right stuff. After that, they will generally convert their information-rich data into a PDF as proof of the transaction and attach it to their ELN. The analytical data is being treated almost exactly as it was 20 years ago; the processed and interpreted data is buried, never to be interrogated again.

What value does this legacy transactional data really have? Imagine that you are in the lab doing a separation or purification on your compound of interest. What happens when you discover that dreaded new impurity peak in your chromatogram? Why have you never seen it before? Has someone else seen it before? In reality, data is often acquired and interpreted from scratch, without any knowledge of previous investigations. Perhaps someone else has already fully characterized and studied the very impurity you are concerned about. This unproductive environment is the result of the one-and-done data lifecycle. Given the technology we have at our disposal today, this really should not be the case.

Managing in-house data is one thing, but what if 70 percent of your data is being generated by contract research organizations in various locations? In our increasingly outsourced world, data is dispersed all over the globe and, therefore, is commonly shared and distributed via PDF files, text documents, and spreadsheets. Certainly, the file sizes are kept small and the applications for viewing the data are universal, but when I was at ACD/Labs we called this ‘dead’ data. Why? Because all of the rich information within the file has been completely stripped away, reducing it to a series of text strings, tables, or images. Dead data is difficult to search, and impossible to re-process, re-analyze, or compare with newly acquired ‘live’ data sets.

Crucially, such dead-data workflows prevent scientists from re-using, re-purposing, and leveraging legacy and existing data sets. In the past, if we had questions, we could turn to our colleague across the hall, or take a short walk to another department to speak to a long-term specialist. But the world has changed. We are being asked to collaborate with colleagues across the globe whom we have never met; sometimes, we may not even share a common language. I strongly believe that the future of medicine, science, and technology demands that we evaluate our changing landscape and tackle emerging issues head on with the myriad of technologies already available.

So, is your data dead or alive?

About the Author
Ryan Sasaki

Ryan Sasaki is a former Director of Global Strategy at ACD/Labs.
