Data: Wanted Dead or Alive
Can R&D organizations really afford to kill off analytical data that could be used for future decision-making? Perhaps it’s time to breathe life back into analytical data environments.
Ryan Sasaki
Looking back to my time in the lab as a process chemist for a pharmaceutical company, I remember my daily laboratory routine like it was yesterday. Outside of meetings and discussions on our target API, it was very much a design-synthesize-analyze-report routine. We didn’t have an electronic lab notebook (ELN) at the time, so everything was on paper. Even back then, as I stowed away yet another lab notebook on a shelf with hundreds of others, I couldn’t help but wonder, “Does anyone ever look back at these notebooks? And if they do, how can they possibly ever find what they are looking for?”
In the years since, that old paper lab notebook routine has all but disappeared, especially in large pharmaceutical and biotech laboratory environments, with the emergence of the ELN. The move was driven in large part by intellectual property considerations, but there was also the expectation that it would shed light on the dark data and experiments buried away in old paper lab notebooks. Yet of all the scientific data being generated across the industry to support chemical and biological workflows, a significant amount still ends up in a place where it can never be re-accessed or re-used. Indeed, a joint Scientific Computing and IDBS survey from 2011 claims that, despite the emergence of modern laboratory informatics systems, 88 percent of R&D organizations lack adequate systems to automatically collect data for reporting, analysis, re-use, and future decision making.
In the old days, our data transactions were very much of the “one-and-done” variety: we’d acquire data, print it out, review it, and then either throw it away or, if it was really important, glue it into our lab notebooks. Unfortunately, the workflow hasn’t changed much, even though we have much better technology and much fancier ways to process and visualize the data. For example, chemists will run a reaction and acquire some analytical data to confirm that they have made the right stuff. After that, they will generally convert their information-rich data into a PDF as proof of the transaction and attach it to their ELN. The analytical data is being treated almost exactly as it was 20 years ago; the processed and interpreted data is buried, never to be interrogated again.
What value does this legacy transactional data really have? Imagine that you are in the lab performing a separation or purification on your compound of interest. What happens when you discover that dreaded new impurity peak in your chromatogram? Why have you never seen it before? Has someone else seen it? In reality, data is often acquired and interpreted from scratch, without any knowledge of previous investigations. Perhaps someone else has already fully characterized and studied the very impurity you are concerned about. This unproductive environment is the result of the one-and-done data lifecycle, and given the technology at our disposal today, it really should not be the case.
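To make the alternative concrete, here is a minimal, purely illustrative Python sketch of the kind of lookup a searchable analytical data environment would enable. The ImpurityRecord structure, the example impurities, and the retention-time tolerance are all hypothetical; this is not any particular vendor’s API.

```python
# A minimal, hypothetical sketch: searching an in-house library of previously
# characterized impurities by chromatographic retention time.
from dataclasses import dataclass

@dataclass
class ImpurityRecord:
    name: str                  # working name given by the original investigator
    retention_time_min: float  # retention time in minutes (hypothetical values)
    project: str               # where the impurity was first characterized

# In a "live" data environment these records would be queried from a shared
# database, populated automatically as data is acquired, rather than re-created
# from scratch as they are here for the example.
LIBRARY = [
    ImpurityRecord("des-methyl analogue", 6.42, "Project A"),
    ImpurityRecord("oxidative degradant", 8.17, "Project B"),
]

def find_matches(rt_min: float, tolerance: float = 0.1) -> list:
    """Return previously characterized impurities eluting near rt_min."""
    return [r for r in LIBRARY if abs(r.retention_time_min - rt_min) <= tolerance]

# A chemist spots an unknown peak at 8.2 minutes: has anyone seen it before?
for match in find_matches(8.2):
    print(f"Possible match: {match.name} (first characterized in {match.project})")
```

The point is simply that a question like “has anyone seen this peak before?” becomes a query rather than an archaeology project.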
Managing in-house data is one thing, but what if 70 percent of your data is being generated by contract research organizations in various locations? In our increasingly outsourced world, data is dispersed all over the globe and, therefore, is commonly shared and distributed via PDF files, text documents, and spreadsheets. Certainly, the file sizes are small and the applications for viewing the data are universal; but when I was at ACD/Labs, we called this ‘dead’ data. Why? Because all of the rich information within the file has been completely stripped away, reducing it to a series of text strings, tables, or images. Dead data is difficult to search, and impossible to re-process, re-analyze, or compare with newly acquired ‘live’ data sets.
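As a rough illustration of the distinction, consider the following sketch. The record layout, field names, and peak values are invented for the example; they stand in for whatever structured, machine-readable format (a database record, a JCAMP-DX file, and so on) a real system would preserve.

```python
# A minimal sketch of 'live' versus 'dead' data. The field names and values
# are hypothetical; a real system might use JCAMP-DX files or database records.
import json

# Live data: the raw numbers and metadata survive alongside any report, so the
# record can be searched, re-processed, or compared with new acquisitions.
live_record = {
    "technique": "1H NMR",
    "solvent": "CDCl3",
    "peaks_ppm": [7.26, 3.71, 1.25],   # hypothetical peak positions
    "intensities": [1.0, 2.1, 3.0],
}

# Dead data: the same experiment flattened into a PDF. The peaks still exist,
# but only as pixels and text strings that cannot be queried by content.
dead_record = {
    "filename": "spectrum_report.pdf",
}

def searchable_by_content(record: dict) -> bool:
    """Can we query this record by chemical content without re-keying it?"""
    return "peaks_ppm" in record

print(json.dumps({
    "live record searchable": searchable_by_content(live_record),
    "dead record searchable": searchable_by_content(dead_record),
}, indent=2))
```

Once the numbers are gone, no amount of clever software on the receiving end can bring them back; the stripping happens at the moment of export.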
Crucially, such dead-data workflows prevent scientists from re-using, re-purposing, and re-leveraging legacy and existing data sets. In the past, if we had questions, we could turn to a colleague across the hall, or take a short walk to another department to speak with a long-term specialist. But the world has changed. We are being asked to collaborate with colleagues across the globe whom we have never met; sometimes, we may not even share a common language. I strongly believe that the future of medicine, science, and technology demands that we evaluate our changing landscape and tackle emerging issues head on with the myriad technologies already available.
So, is your data dead or alive?
Ryan Sasaki is former Director of Global Strategy at ACD/Labs.