[Audio] Harmonizing Electronic Health Records through a common data model. Seminar 1: Duke Clinical Research Datamart.
[Audio] The first Informatics seminar in the Fall 2021 series discussed the development of the Duke Clinical Research Datamart (CRDM) to integrate Electronic Health Records (EHR) data using a Common Data Model (CDM). EHR data are an important resource for population health and clinical research, but many obstacles must be overcome to aggregate them into a usable and meaningful data set.
[Audio] As described by Hurst et al. in their 2021 paper on the development of the Duke CRDM, "While EHR systems represent an important research data source, these data are highly complex and can be difficult to access... one way to make EHR data more accessible and actionable for research purposes is to organize it into smaller relational databases, referred to as datamarts. These datamarts are typically organized under Common Data Models (CDMs)."
[Audio] Because Duke University already participated in a nationwide datamart, the National Patient-Centered Clinical Research Network (PCORnet), developing the CRDM meant adding containers for additional data to the existing common data model. These are referred to in the presentation as "data sidecars".
[Audio] How does disparate EHR data get into a common data model? This presentation will explore the design of metadata ontologies in implementing Common Data Models for Electronic Health Records. I am beginning work as a consultant with the North Carolina Department of Health and Human Services Injury & Violence Prevention Branch on a project to develop a Common Data Model for Injury & Violence data, and I am using this presentation to expand my background knowledge for that work.
[Audio] There are many types of CDM for EHR data. PCORnet is structured like a traditional relational database with many tables, and is built on the Mini-Sentinel data model (Klann et al, 2019). Mini-Sentinel is a component of the US Food and Drug Administration (FDA) Sentinel Initiative, an active surveillance system using EHR data (Curtis et al, 2012). The All of Us Research Program (AOU), a project of the National Institutes of Health, is building a database containing EHR and genomic data from one million patients. AOU is built on Informatics for Integrating Biology and the Bedside (i2b2), which "uses a star-schema format, pioneered by General Mills in the 1970s and widely used in retail data warehouses" (Klann et al, 2019). This format stores observations in a single central fact table, rather than spreading them across many related domain tables. The AOU project required transforming i2b2 data into the Observational Medical Outcomes Partnership (OMOP) model, described as a hybrid that includes domain tables like those in PCORnet and also a "fact" table like i2b2.
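The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration: the table names, column names, and sample values below are assumptions made for the example and are not taken from the PCORnet, i2b2, or OMOP specifications.

```python
# Domain-table layout: one table per kind of data, each with its own columns.
# All names and values here are hypothetical.
diagnosis = [
    {"patid": "A1", "dx": "E11.9", "dx_date": "2021-03-02"},
]
lab_result = [
    {"patid": "A1", "lab_loinc": "4548-4", "result_num": 7.2,
     "result_date": "2021-03-02"},
]


def to_fact_rows(diagnosis_rows, lab_rows):
    """Flatten both domain tables into one fact table with a shared shape."""
    facts = []
    for row in diagnosis_rows:
        facts.append({
            "patient": row["patid"],
            "concept": "ICD10CM:" + row["dx"],
            "date": row["dx_date"],
            "value": None,  # a diagnosis carries no numeric value
        })
    for row in lab_rows:
        facts.append({
            "patient": row["patid"],
            "concept": "LOINC:" + row["lab_loinc"],
            "date": row["result_date"],
            "value": row["result_num"],
        })
    return facts


facts = to_fact_rows(diagnosis, lab_result)
for fact in facts:
    print(fact)
```

In the fact-table shape, the kind of observation moves out of the table name and into the `concept` column, which is why the meaning of each row depends on a separate vocabulary or ontology.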
[Audio] "Healthcare institutions tend to support at most one CDM, and the choice often depends on which national initiatives a site participates in. Each of these models have their own quirks, value sets, terminologies, and value representations, making each one unique enough to impede interoperability... Increasingly, in order to participate in multiple national initiatives, sites must support all three data models" (Klann et al, 2019). To populate PCORnet, partner data contributors transform their local EHR data into the PCORnet Common Data Model (PCORnet, 2020), which describes the specifications for the tables and fields (shown on the next slide).
[Audio] This slide shows an example of a populated table in PCORnet and the table specification, which describes each column in the table and the format for the data it should contain.
[Audio] EHR data come in many different formats, which must be reconciled in order to be meaningful when aggregated. Common Data Models also come in many different formats, and a given institution may need to translate data between different CDMs in order to engage in national projects. This is time-consuming and complex.
[Audio] "A clinic might code birthdate as 'Date_of_Birth,' a health system might call it 'Birth_DT,' and a health registry might use the code 'DOB.' In this example, the CDM offers a universal variable: 'BIRTH_DATE.' Each organization's system can map its own birth date, no matter how it is labeled, to this variable. By using the CDM, your organization keeps control over your members' data, but you can also collaborate with other groups or researchers to run more effective observational research and clinical trials." (PCORnet, 2017).
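The birth date example in the quote can be sketched as a simple lookup. This is a minimal illustration, not PCORnet tooling: the field names come from the quote, while the `to_cdm` helper, the `patient` field, and the sample records are assumptions made for the example.

```python
# Local field names from the quoted example, each mapped to the CDM variable.
LOCAL_TO_CDM = {
    "Date_of_Birth": "BIRTH_DATE",  # clinic
    "Birth_DT": "BIRTH_DATE",       # health system
    "DOB": "BIRTH_DATE",            # health registry
}


def to_cdm(record):
    """Rename known local fields to their CDM variable; keep other fields as-is."""
    return {LOCAL_TO_CDM.get(name, name): value for name, value in record.items()}


# Hypothetical records from two different organizations.
clinic_row = {"Date_of_Birth": "1980-04-12", "patient": "A1"}
registry_row = {"DOB": "1975-09-30", "patient": "B2"}

harmonized = [to_cdm(clinic_row), to_cdm(registry_row)]
for row in harmonized:
    print(row)
```

Each organization maintains only its own entries in the mapping, which is how it can keep control of its local data while still contributing records in a shared shape.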
[Audio] The i2b2 data harmonization article describes a process similar to that which occurs for data entering PCORnet: "In our previous work, we developed a 'PCORnet Information Model' in i2b2, modeled as an i2b2 ontology, that exactly represents the data structure and permissible data elements of PCORnet CDM. [10] Local sites adopt this ontology and use our mapping methodology to 'redirect' ontology elements to the sites' local codes, without modifying their underlying data. Then data is transformed 'on-the-fly' through the ontology module when it is queried using i2b2 or the multi-site i2b2 query system, the Shared Health Research Informatics Network (SHRINE) tool." (Klann et al, 2019).
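The idea of redirecting ontology elements to local codes at query time can be sketched as follows. This is not the i2b2 ontology module or its API; the `ONTOLOGY` structure, the element names, the local codes, and the `query` function are all hypothetical and only illustrate the principle that the stored data are never rewritten.

```python
# Hypothetical ontology: each shared element lists the local codes mapped to it.
ONTOLOGY = {
    "SEX:F": {"female", "F", "2"},
    "SEX:M": {"male", "M", "1"},
}

# Local data stay in their original coding; nothing below modifies them.
LOCAL_FACTS = [
    {"patient": "A1", "sex_code": "female"},
    {"patient": "B2", "sex_code": "1"},
    {"patient": "C3", "sex_code": "F"},
]


def query(cdm_element):
    """Resolve a shared element to local codes at query time, then match rows."""
    local_codes = ONTOLOGY[cdm_element]
    return [row["patient"] for row in LOCAL_FACTS if row["sex_code"] in local_codes]


females = query("SEX:F")
print(females)
```

Because the translation happens inside the query, correcting a mapping means editing the ontology entry rather than reloading the data.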
[Audio] Ontology: "In computer science and information science, an ontology encompasses a representation, formal naming and definition of the categories, properties and relations between the concepts, data and entities that substantiate one, many, or all domains of discourse. More simply, an ontology is a way of showing the properties of a subject area and how they are related, by defining a set of concepts and categories that represent the subject." (Wikimedia, 2021).
[Audio] But how do the data get there? Processes such as Dynamic Extraction, Transformation and Loading (D-ETL) have been developed for health data. Extraction, Transformation and Loading is typically performed in two phases by personnel with different skills: a subject matter expert who extracts the original data, and a database programmer who transforms the data. D-ETL is an approach that automates some parts of the process and keeps others manual, "lowering technical barriers for health data domain experts to play the main role in ETL operations by simplifying data transformation processes." (Ong et al, 2017).
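One way to picture this division of labor is a rule table that a domain expert edits, applied by a small generic engine. The sketch below is an assumption-laden illustration and does not reproduce the actual D-ETL rule syntax or software described by Ong et al; the rule format, transform names, and field names are all hypothetical.

```python
from datetime import datetime

# Reusable transforms, written once by a programmer (names are hypothetical).
TRANSFORMS = {
    "copy": lambda value: value,
    "upper": lambda value: value.upper(),
    "us_date_to_iso": lambda value: datetime.strptime(value, "%m/%d/%Y").date().isoformat(),
}

# Rules a domain expert could maintain as plain data, without writing code.
RULES = [
    {"source": "Birth_DT", "target": "BIRTH_DATE", "transform": "us_date_to_iso"},
    {"source": "Gender", "target": "SEX", "transform": "upper"},
    {"source": "MRN", "target": "PATID", "transform": "copy"},
]


def apply_rules(record, rules):
    """Build one target record by applying each rule to the source record."""
    return {
        rule["target"]: TRANSFORMS[rule["transform"]](record[rule["source"]])
        for rule in rules
    }


loaded = apply_rules({"Birth_DT": "04/12/1980", "Gender": "f", "MRN": "A1"}, RULES)
print(loaded)
```

The automated part is the engine and its transforms; the manual part is deciding which source field maps to which target and how, which is where subject matter knowledge matters most.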
[Audio] EHR data promise to transform healthcare by allowing analysis of huge amounts of real-world data. While EHR systems have become widespread throughout clinical care settings, integrating those data into structures that are accessible and useful to researchers remains a complex, ongoing challenge.