
Data Storage Evolves

By entimo Staff

Increasing volumes and changing types of data require new approaches.

Considerations of cost and ownership are encouraging companies in pharmaceutical research to separate their data storage from their computation needs. The widespread perception that storage is inexpensive and easily extensible compared to computing power is not necessarily correct. In practice, storage and computing needs do not always rise in tandem, so organizations that scale the two separately are better placed to realize savings. A closer look also reveals that ownership of the data used in analysis and computational processes can be complex. A computational environment might not be considered the owner of the data it uses, in which case that data is stored elsewhere. Similarly, data silos within organizations may be set up in such a way that data cannot be used directly for downstream processes, so exports or copies have to be created to perform the computations.

Clinical data used to be more homogeneous.

Another development affecting how organizations evaluate storage and computation is the changing nature of the clinical data gathered in the course of trials. In previous years, clinical data was primarily stored as SAS datasets natively in file systems, along with additional items derived from or working with those datasets, such as documents, programs, output, and logs. As data volumes have grown (thanks to more studies, bigger studies, more types of data used in studies, and the practice of keeping data for secondary use in the active systems), the industry standard has shifted away from clinical/tabular data and toward heterogeneous collections such as x-omics data, biomarker data, and healthcare data from wearables. The need for version control and audit trails imposes new and different storage requirements on these kinds of collections. With SAS datasets in a file system, by contrast, one of the main challenges was rapid availability, stemming from the need for rapid input-output when working with SAS.


The desire for end-to-end traceability, along with regulatory requirements, has led entimo to follow the principle that its products, entimICE DARE and entimICE FastTrack, own the data in their repository. This applies to all metadata, and also to the data itself, which is mostly held in a central file system. Following an evolutionary path, entimICE FastTrack is designed to offer more flexibility and connectivity for external storage. Within entimICE DARE, there is still an option to store clinical data in a database, with entimICE retaining ownership.

As the size and nature of data gathered during clinical trials expand and change, the tools needed to analyze and understand that data are also evolving. Consider the example of a trial sponsor that decides to build clinical data storage for all data that comes in from EDC systems and other sources. While the storage is organized in a fashion specific to the company, it uses approaches that are well understood. Further, the sponsor needs all other tools to interact with the data in this cluster.

Engineers are working to ensure that entimICE FastTrack can adapt to this new environment by projecting data from the external storage into its own repository, following a proxy concept and other comparable methods. The software projects only relevant data, tracks changes to the projected data, and mirrors this information in its own traceability records. This capability is important for keeping all information needed for compliance within entimICE FastTrack.
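To make the proxy idea more concrete, here is a minimal sketch in Python. It is not entimICE FastTrack's implementation or API: the `Projection` and `AuditRecord` classes, the dictionary standing in for the sponsor's external storage, and the example file paths are all hypothetical. The sketch assumes that a projection keeps only references and content fingerprints for the relevant items, and that each synchronization writes any detected change to a local traceability log.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def digest(content: bytes) -> str:
    """Fingerprint used to detect changes without keeping a copy of the data."""
    return hashlib.sha256(content).hexdigest()


@dataclass
class AuditRecord:
    path: str
    event: str  # "added", "changed", or "removed"
    digest: str


@dataclass
class Projection:
    """Hypothetical proxy: stores fingerprints of relevant items, not the data."""
    external: Dict[str, bytes]          # stands in for the external storage
    is_relevant: Callable[[str], bool]  # decides which items are projected
    known: Dict[str, str] = field(default_factory=dict)
    audit: List[AuditRecord] = field(default_factory=list)

    def sync(self) -> List[AuditRecord]:
        """Compare the external storage with the last known state and log differences."""
        current = {
            path: digest(content)
            for path, content in self.external.items()
            if self.is_relevant(path)
        }
        new_records = []
        for path, d in sorted(current.items()):
            if path not in self.known:
                new_records.append(AuditRecord(path, "added", d))
            elif self.known[path] != d:
                new_records.append(AuditRecord(path, "changed", d))
        for path in sorted(set(self.known) - set(current)):
            new_records.append(AuditRecord(path, "removed", self.known[path]))
        self.known = current
        self.audit.extend(new_records)
        return new_records

    def read(self, path: str) -> bytes:
        """Reads pass through to the external storage; only projected items are visible."""
        if path not in self.known:
            raise KeyError(f"{path} is not projected")
        return self.external[path]


# Illustrative data only: one dataset is relevant, one text file is not.
store = {"study01/sdtm/dm.xpt": b"v1", "study01/raw/notes.txt": b"n"}
proj = Projection(store, lambda p: p.endswith(".xpt"))
first = proj.sync()                   # the dataset is seen for the first time
store["study01/sdtm/dm.xpt"] = b"v2"  # the dataset changes in the external storage
second = proj.sync()                  # the change is detected and logged
```

In this sketch the data stays where the sponsor stores it, while the record of what changed, and when it was noticed, accumulates on the projecting side.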

Using a clinical repository approach, entimICE FastTrack can have full ownership of and control over dedicated production lifecycle areas in the external data. This guarantees control of production runs within entimICE, where greater oversight of specific workflows is necessary to ensure regulatory compliance. The file repository as well as the data projection remain transparent for end users. For example, in a study’s development lifecycle area, programs from the file system and datasets from the projection can appear side by side, and can both be accessed and used.

Contemporary data is more heterogeneous.

The combination of these approaches enables entimICE FastTrack to evolve alongside the changes in how organizations throughout the drug development process handle increasing amounts of data, which arrive not only in increasingly heterogeneous formats but also from data sources outside of entimICE.