The Launch of the Toxin Vocabulary: Our Solution to Medical Research Complexity


In the rapidly evolving medical field, synthesizing reliable and insightful data is the basis for innovation and progress. Thanks to its global adoption, the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) has become an important tool for advancing drug safety monitoring and healthcare outcome prediction. However, it has a gap: it does not represent toxic substances and environmental exposures, which are central to deepening our understanding of their impact on human health.

The Toxin Vocabulary, our solution to this gap, is designed to improve the representation of toxic substances within the OMOP CDM, enabling better analysis of the complex interplay between environmental factors and health outcomes.

In this article, we describe the approaches and collaborative efforts behind the creation of our Toxin Vocabulary. The Vocabulary aims to empower researchers, healthcare professionals, and organizations to gain deeper insights into the relationship between environmental exposures and human health outcomes.

Let’s explore what insights are possible with our Toxin Vocabulary!

Background: Expanding OMOP CDM Usage

Now, let’s get into more detail about the current approach and the problems researchers face.

The OMOP CDM was established as an open community data standard to harmonize the structure and content of observational data and to enable efficient analyses that produce reliable evidence. It has since been widely adopted by researchers and healthcare organizations around the globe, simplifying drug safety monitoring, comparative effectiveness research, clinical trial design, and healthcare outcome prediction. However, environmental epidemiology has a crucial need that the OMOP CDM does not yet meet: the representation of toxic substances and environmental exposures.

Environmental epidemiology investigates how exposure to toxic substances affects human health, considering both short-term and long-term effects. To support such studies, researchers use Geographic Information Systems (GIS) to analyze the spatial distribution of exposures and assess their potential health consequences. While recent efforts have aimed to integrate GIS data with the OMOP CDM, the lack of adequate standards has hindered a comprehensive evaluation of environmental exposures and their associated health risks.

To address this issue, we developed a hierarchical Toxin Vocabulary that improves the representation of the environmental exposome within the OMOP CDM. This standardized terminology was built through a systematic review of toxicological literature, analysis of open toxin databases, and consultation with experts in the field. By synthesizing the most relevant and up-to-date toxin terminology, our Vocabulary aims to facilitate environmental exposure assessment, support toxicological and epidemiological research, and enable the integration of GIS-related data into the OMOP CDM.

Our Journey of Creating the Toxin Vocabulary

Developing the Toxin Vocabulary required a systematic approach that combined a comprehensive review of toxicological literature, analysis of open-source toxin databases, and consultation with domain experts. These steps were essential for building a thorough and accurate representation of toxic substances within the OMOP CDM.

First, we conducted a systematic review of toxicological literature to identify relevant terms and classifications associated with various toxins and their impact on human health. By examining research papers, regulatory documents, and other authoritative sources, we built a comprehensive understanding of the diverse range of toxins and their associated semantic attributes.

In parallel with the literature review, we analyzed open-source toxin databases. The resource that stood out in terms of comprehensiveness and reliability was the Toxin and Toxin Target Database (T3DB). T3DB provided a vast repository of toxin terminology, including descriptions of over 3,000 toxins with 41,602 synonyms. It covers a wide range of toxins, including pollutants, pesticides, drugs, and food toxins, and provides extensive metadata fields for each toxin record (ToxCard), such as chemical properties, toxicity values, molecular and cellular interactions, and medical details.
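To make the notion of a ToxCard more concrete, here is a minimal sketch of a reduced toxin record and a parser for a tabular export. The column names (`accession`, `name`, `cas`, `synonyms`, `categories`), the `|` separator for multi-valued cells, and the sample rows (including the accession values) are assumptions made for illustration; they are not the official T3DB download schema.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToxCard:
    """A reduced, hypothetical view of a T3DB toxin record (ToxCard)."""

    accession: str                      # T3DB-style identifier (placeholder values below)
    name: str
    cas: Optional[str]                  # CAS Registry Number, if the record has one
    synonyms: List[str] = field(default_factory=list)
    categories: List[str] = field(default_factory=list)


def split_multi(value: str, sep: str = "|") -> List[str]:
    """Split a multi-valued cell, trimming whitespace and dropping empty items."""
    return [item.strip() for item in value.split(sep) if item.strip()]


def parse_toxcards(text: str) -> List[ToxCard]:
    """Parse a CSV export with hypothetical columns into ToxCard objects."""
    cards = []
    for row in csv.DictReader(io.StringIO(text)):
        cards.append(
            ToxCard(
                accession=row["accession"].strip(),
                name=row["name"].strip(),
                cas=row["cas"].strip() or None,  # empty cell -> no CAS number
                synonyms=split_multi(row["synonyms"]),
                categories=split_multi(row["categories"]),
            )
        )
    return cards


SAMPLE = """accession,name,cas,synonyms,categories
T3D0001,Arsenic,7440-38-2,As|Arsenic black|Grey arsenic,Metalloid|Pollutant
T3D9999,Example toxin,,Toxin X,Food toxin
"""

cards = parse_toxcards(SAMPLE)
for card in cards:
    print(card.accession, card.name, card.cas, len(card.synonyms))
```

Keeping synonyms as a list on each record matters later: with more than 41,000 synonyms in the source, they are the main input for matching toxins to existing standard concepts.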

To build the Toxin Vocabulary, we combined the information obtained from the literature review and T3DB. We first uploaded the source data automatically to a PostgreSQL database using Python. Afterward, we extracted essential metadata, established cross-term connections, and performed a semi-automated mapping of selected terms to the OMOP standard vocabularies.
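The upload and metadata-extraction steps can be sketched as follows. So that the example runs without a database server, it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL; a PostgreSQL driver such as psycopg follows the same DB-API pattern (`connect`, `execute`, `executemany`, `commit`), but it uses `%s` placeholders instead of `?`. The table names, column names, and rows are hypothetical.

```python
import sqlite3
from typing import Dict

# Illustrative rows: (accession, name, cas_number) and (accession, synonym).
TOXINS = [
    ("T3D0001", "Arsenic", "7440-38-2"),
    ("T3D9999", "Example toxin", None),
]
SYNONYMS = [
    ("T3D0001", "Arsenic black"),
    ("T3D0001", "Grey arsenic"),
    ("T3D9999", "Toxin X"),
]


def load_source(conn: sqlite3.Connection) -> None:
    """Create hypothetical source tables and bulk-insert the parsed records."""
    conn.executescript(
        """
        CREATE TABLE toxin (
            accession  TEXT PRIMARY KEY,
            name       TEXT NOT NULL,
            cas_number TEXT
        );
        CREATE TABLE toxin_synonym (
            accession TEXT NOT NULL REFERENCES toxin (accession),
            synonym   TEXT NOT NULL
        );
        """
    )
    # One parameterized statement executed for every row in the sequence.
    conn.executemany("INSERT INTO toxin VALUES (?, ?, ?)", TOXINS)
    conn.executemany("INSERT INTO toxin_synonym VALUES (?, ?)", SYNONYMS)
    conn.commit()


def synonym_counts(conn: sqlite3.Connection) -> Dict[str, int]:
    """Example metadata extraction: number of synonyms per toxin."""
    rows = conn.execute(
        """
        SELECT t.accession, COUNT(s.synonym)
        FROM toxin t
        LEFT JOIN toxin_synonym s ON s.accession = t.accession
        GROUP BY t.accession
        ORDER BY t.accession
        """
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
load_source(conn)
print(synonym_counts(conn))
```

The `LEFT JOIN` keeps toxins that have no synonyms at all, which is the kind of gap worth surfacing before mapping begins.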

To ensure compatibility and seamless integration with the existing OMOP CDM standard vocabularies, the Toxin Vocabulary was mapped to relevant terminologies, including SNOMED CT, RxNorm, and RxNorm Extension. During mapping, we associated concepts from the Toxin Vocabulary with the appropriate standard concepts in the OMOP CDM. This linked toxin terms to established clinical concepts, enabling comprehensive analysis and integration of environmental exposures with other healthcare-related data.
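One way to make such mapping semi-automated is sketched below, under a rule we chose for illustration: a toxin is auto-mapped only when its normalized name or one of its synonyms matches exactly one standard concept name, and everything else goes to manual review. `StandardConcept`, `propose_mappings`, and the concept IDs are hypothetical placeholders; in a real OMOP instance, the candidates would come from the CONCEPT table restricted to standard concepts.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class StandardConcept:
    concept_id: int        # placeholder IDs in the example below, not real OMOP IDs
    concept_name: str
    vocabulary_id: str     # e.g. "SNOMED", "RxNorm", "RxNorm Extension"


def normalize(term: str) -> str:
    """Lower-case and collapse runs of non-alphanumeric characters into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", term.lower()).strip()


def propose_mappings(
    toxin_terms: Dict[str, List[str]],
    standard_concepts: List[StandardConcept],
) -> Tuple[Dict[str, StandardConcept], List[str]]:
    """Auto-accept a mapping only when exactly one standard concept matches.

    A toxin matches a concept when its normalized name, or one of its synonyms,
    equals the normalized concept name. Toxins with zero or several candidates
    are returned in `needs_review` for a human curator.
    """
    index: Dict[str, List[StandardConcept]] = {}
    for concept in standard_concepts:
        index.setdefault(normalize(concept.concept_name), []).append(concept)

    accepted: Dict[str, StandardConcept] = {}
    needs_review: List[str] = []
    for code, names in toxin_terms.items():
        candidates = {c for name in names for c in index.get(normalize(name), [])}
        if len(candidates) == 1:
            accepted[code] = candidates.pop()
        else:
            needs_review.append(code)  # no match, or an ambiguous one
    return accepted, needs_review


standard = [
    StandardConcept(1, "Arsenic", "SNOMED"),
    StandardConcept(2, "Mercury", "SNOMED"),
]
toxins = {
    "7440-38-2": ["Arsenic", "Grey arsenic"],
    "T3D9999": ["Example toxin", "Toxin X"],
}
accepted, review = propose_mappings(toxins, standard)

# Each accepted pair would become a "Maps to" relationship row.
maps_to = [(code, c.concept_id, "Maps to") for code, c in accepted.items()]
print(maps_to)
print(review)
```

`Maps to` is the OMOP relationship that links a source concept to its standard counterpart. Sending ambiguous matches to review, rather than picking one automatically, keeps the human check where errors are most likely.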

As unique vocabulary identifiers, we used CAS codes (CAS Registry Numbers) because they align with GIS data and come from the CAS Registry, one of the largest registries, encompassing around 204 million organic substances. Toxins without CAS codes were assigned their unique T3DB codes, ensuring proper identification and classification.
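CAS Registry Numbers carry a built-in check digit, so identifiers can be validated before they are used as concept codes. The sketch below implements the standard CAS check-digit rule and then chooses a code. The helper names are hypothetical, and falling back to the T3DB code when a CAS number fails validation (not only when it is missing) is our assumption for illustration. `T3D9999` is a placeholder.

```python
import re
from typing import Optional

# CAS Registry Numbers look like 7732-18-5: 2-7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas: str) -> bool:
    """Validate the format and check digit of a CAS Registry Number.

    The check digit is the sum of all other digits, each multiplied by its
    position counted from the right (starting at 1), modulo 10.
    """
    match = CAS_PATTERN.match(cas)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    total = sum(pos * int(d) for pos, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def vocabulary_code(cas: Optional[str], t3db_id: str) -> str:
    """Use the CAS number as the concept code when valid; otherwise the T3DB code."""
    if cas and is_valid_cas(cas):
        return cas
    return t3db_id


print(is_valid_cas("7732-18-5"))   # water: True
print(is_valid_cas("7732-18-4"))   # wrong check digit: False
print(vocabulary_code(None, "T3D9999"))
```

For water, the digits 773218 weighted from the right give 8·1 + 1·2 + 2·3 + 3·4 + 7·5 + 7·6 = 105, and 105 mod 10 = 5, which matches the check digit. Since GIS-related data reference substances by the same codes, validating them on both sides before joining helps catch transcription errors.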

We incorporated the Toxin Vocabulary into an OMOP instance by first organizing the information in staging tables, following the standard OHDSI vocabulary contribution process, and making sure each piece of data was accurately placed and interconnected. These staging tables were instrumental in capturing the Toxin Vocabulary's semantic and syntactic aspects, and they ensured the compatibility of the vocabulary with the existing OMOP CDM framework.
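In OHDSI vocabulary tooling, new content is typically prepared in staging tables such as `concept_stage`, `concept_synonym_stage`, and `concept_relationship_stage` before being merged into the vocabulary tables. The sketch below models one row of each as a Python dataclass and builds the rows for a single toxin. The vocabulary ID `Toxin`, the domain, the concept class, the non-standard flag, and the parent code are hypothetical placeholders, not the values the team actually used.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

VOCAB = "Toxin"                        # hypothetical vocabulary_id
START = date(1970, 1, 1)               # conventional default start date
END = date(2099, 12, 31)               # OMOP convention for "still valid"


@dataclass
class ConceptStage:
    concept_name: str
    domain_id: str
    vocabulary_id: str
    concept_class_id: str
    standard_concept: Optional[str]
    concept_code: str
    valid_start_date: date = START
    valid_end_date: date = END
    invalid_reason: Optional[str] = None


@dataclass
class ConceptRelationshipStage:
    concept_code_1: str
    concept_code_2: str
    vocabulary_id_1: str
    vocabulary_id_2: str
    relationship_id: str
    valid_start_date: date = START
    valid_end_date: date = END
    invalid_reason: Optional[str] = None


@dataclass
class ConceptSynonymStage:
    synonym_name: str
    synonym_concept_code: str
    synonym_vocabulary_id: str
    language_concept_id: int = 4180186   # OMOP concept for the English language


def stage_toxin(
    code: str,
    name: str,
    synonyms: List[str],
    parent_code: Optional[str] = None,
    domain_id: str = "Observation",      # placeholder domain for this sketch
    concept_class_id: str = "Toxin",     # placeholder concept class
) -> Tuple[List[ConceptStage], List[ConceptSynonymStage], List[ConceptRelationshipStage]]:
    """Build staging rows for one toxin: concept, synonyms, optional 'Is a' link."""
    # standard_concept=None: assumed non-standard here, i.e. mapped to SNOMED/RxNorm.
    concepts = [ConceptStage(name, domain_id, VOCAB, concept_class_id, None, code)]
    # Skip the preferred name itself; keep each distinct alternative label once.
    syns = [ConceptSynonymStage(s, code, VOCAB) for s in dict.fromkeys(synonyms) if s != name]
    rels = []
    if parent_code is not None:
        rels.append(ConceptRelationshipStage(code, parent_code, VOCAB, VOCAB, "Is a"))
    return concepts, syns, rels


concepts, synonyms, relationships = stage_toxin(
    "7440-38-2", "Arsenic", ["Arsenic", "Grey arsenic", "Grey arsenic"],
    parent_code="CLASS-METALLOIDS",     # hypothetical parent concept code
)
print(concepts[0])
print([s.synonym_name for s in synonyms])
print(relationships[0].relationship_id)
```

Relationships in the staging tables reference concepts by `concept_code` and `vocabulary_id` rather than by `concept_id`, so rows can be prepared before any IDs are assigned.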

The picture below shows how our vocabulary works, the decisions we made, and their impact on the OMOP CDM structure.


Revealing the Results

Our Toxin Vocabulary provides a hierarchical and expansive representation of toxic substances within the OMOP CDM. It contains 79,377 internal relationships that map the complex interconnections between toxins, cellular structures, relevant diseases, biological processes, and more, offering researchers an unprecedented level of detail in their analysis.
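A hierarchy like this becomes easy to query once its transitive closure is precomputed, which is what OMOP's CONCEPT_ANCESTOR table stores (with both minimum and maximum levels of separation). The sketch below computes only the minimum level from hypothetical `Is a` pairs; the class names are illustrative and do not reflect the vocabulary's actual hierarchy.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple


def build_ancestors(is_a_pairs: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Map each ancestor to {descendant: minimum levels of separation}.

    Each pair is (child, parent). Every concept is its own ancestor at level 0,
    mirroring the OMOP CONCEPT_ANCESTOR convention.
    """
    parents: Dict[str, Set[str]] = defaultdict(set)
    concepts: Set[str] = set()
    for child, parent in is_a_pairs:
        parents[child].add(parent)
        concepts.update((child, parent))

    ancestors: Dict[str, Dict[str, int]] = defaultdict(dict)
    for start in concepts:
        # Breadth-first search upward finds the shortest path to each ancestor.
        levels = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for parent in parents.get(node, ()):
                if parent not in levels:
                    levels[parent] = levels[node] + 1
                    queue.append(parent)
        for ancestor, level in levels.items():
            ancestors[ancestor][start] = level
    return dict(ancestors)


pairs = [
    ("Arsenic", "Metalloid"),
    ("Metalloid", "Inorganic compound"),
    ("Lead", "Metal"),
    ("Metal", "Inorganic compound"),
]
anc = build_ancestors(pairs)
print(sorted(anc["Inorganic compound"].items()))
```

With the closure in place, a question such as "all exposures to any inorganic compound" becomes a single lookup instead of a recursive walk over the relationships at query time.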

But the Vocabulary's strength doesn't end here. Its integration with standardized vocabularies such as SNOMED CT and RxNorm extends its capabilities, because toxin concepts can be analyzed alongside the clinical concepts already used in OMOP CDM data. This synergy lays the foundation for a deeper and more detailed exploration of the exposome and a more nuanced understanding of how toxins relate to health outcomes.

Furthermore, it opens up new possibilities in drug safety monitoring, clinical trial design, and health outcome prediction, empowering researchers and healthcare professionals to harness rich, GIS-related data for advancing toxicoepidemiological research.

The Toxin Vocabulary is not just a tool: it is a gateway to a deeper understanding of how the environment affects health.

What's Next: Presenting Our Vocabulary

In the world of open science, innovative approaches and tools play a key role. The Medical Team at our company, Sciforce, is proud to contribute to this development, with a focus on OHDSI vocabularies. Our Vocabulary was first presented at the OHDSI GIS Working Group. Once its validation is complete, we look forward to presenting it publicly at the Global OHDSI Symposium (New Jersey, USA) on October 20, 2023, and officially integrating it into the OHDSI ecosystem. This opens up new opportunities in the fields of geographic epidemiology and toxicoepidemiology.

We are happy to introduce the Toxin Vocabulary and its integration into the OMOP CDM, setting a new standard for healthcare analytics and research.