Automated Orchestration of Observational Research

Published: January 16, 2025

# Healthcare

# Big Data

Large data volumes are reshaping many industries, including medical research. The growing availability of data enables observational studies that draw statistical evidence from sources around the world. However, conducting such studies can be labor-intensive and prone to inconsistencies because communication is scattered across disconnected channels such as repositories, emails, forums, and chats. Moreover, adapting code to each environment during the execution phase tends to produce analytical frameworks that neither scale nor can be reused. In response, the OHDSI community is developing ARACHNE, a platform designed to streamline observational research by fostering collaboration among life sciences, healthcare, academia, and organizations that handle patient-level data.

Challenge

As an OHDSI partner on several projects, our team played an instrumental role in developing the ARACHNE Collaboration Network, a platform for running federated observational studies that generate real-world evidence (RWE). Sciforce's contributions included shaping the platform's architecture, leading both back-end and front-end development, and implementing core DevOps practices.

Solution

Our starting premise was that ARACHNE should automate evidence generation through distributed research studies. The implementation is split into two software applications:

ARACHNE Central

Registers and initializes new research projects.
Creates packages with necessary R scripts for each project.
Uses Docker to simplify installation of the R distribution: the software is wrapped together with all required components, so it runs the same way regardless of environment.
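
The registration and packaging steps above can be illustrated with a minimal sketch. It is not ARACHNE's actual code (the platform's back end is written in Java): the `Study` and `Central` classes, their method names, and the choice of a zip archive as the package format are all assumptions made for illustration, and the Docker step is left out.

```python
import io
import zipfile
from dataclasses import dataclass, field


@dataclass
class Study:
    """A registered research project (hypothetical model, not ARACHNE's schema)."""
    study_id: int
    title: str
    scripts: dict = field(default_factory=dict)  # file name -> R source text


class Central:
    """Toy stand-in for ARACHNE Central: registers studies and builds packages."""

    def __init__(self):
        self._studies = {}
        self._next_id = 1

    def register_study(self, title):
        """Create a new study record and assign it the next free id."""
        study = Study(study_id=self._next_id, title=title)
        self._studies[study.study_id] = study
        self._next_id += 1
        return study

    def add_script(self, study_id, name, source):
        """Attach an R script (as text) to an already registered study."""
        self._studies[study_id].scripts[name] = source

    def build_package(self, study_id):
        """Return a zip archive (as bytes) holding the study's R scripts."""
        study = self._studies[study_id]
        buffer = io.BytesIO()
        with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
            for name, source in sorted(study.scripts.items()):
                archive.writestr(name, source)
        return buffer.getvalue()


def list_package(package_bytes):
    """List the file names inside a package, e.g. to check it before sending."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        return sorted(archive.namelist())
```

In this sketch the package is built in memory, so the same bytes could be handed to whatever transport delivers them to a Data Node.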

ARACHNE Daemon

Is installed at each Data Node and communicates with ARACHNE Central.
Runs relevant R scripts and executes embedded SQL against the CDM database.
Sends resulting tables back to ARACHNE Central.
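
The Daemon's role can be sketched in the same hedged way. The example below covers only the SQL step and skips running R. It uses an in-memory SQLite database as a stand-in for the CDM database, with a made-up three-row `person` table. The function names and the shape of the returned result are assumptions, not ARACHNE's API.

```python
import sqlite3


def make_demo_cdm():
    """Build an in-memory SQLite database with a tiny, made-up `person` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person (person_id INTEGER PRIMARY KEY, "
        "gender_concept_id INTEGER, year_of_birth INTEGER)"
    )
    # Invented rows for illustration only; no real patient data.
    conn.executemany(
        "INSERT INTO person VALUES (?, ?, ?)",
        [(1, 8507, 1970), (2, 8532, 1985), (3, 8532, 1990)],
    )
    return conn


def run_analysis(conn, sql):
    """Execute the study's SQL locally and return the resulting table."""
    cursor = conn.execute(sql)
    columns = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    return {"columns": columns, "rows": rows}


# Hypothetical study query: count persons per gender concept.
STUDY_SQL = (
    "SELECT gender_concept_id, COUNT(*) AS n "
    "FROM person GROUP BY gender_concept_id ORDER BY gender_concept_id"
)

if __name__ == "__main__":
    # The returned dictionary is what this toy Daemon would send to Central.
    print(run_analysis(make_demo_cdm(), STUDY_SQL))
```

The point of the sketch is the direction of data flow: the query runs where the data lives, and only the resulting table travels back to the central application.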

Development Journey

Back-end

Language: Java 8
Framework: Spring (Spring Boot for automated deployment and Spring Security for authorization and access management)
ORM: Hibernate (simplifies data access with object-relational mapping)
Database: PostgreSQL (for user data)
Search Engine: Apache Solr (for advanced search capabilities)
Build Tool: Maven (for dependency management and project build)

Front-end

Libraries: React + Redux + Reselect
Styling: Sass (using BEM methodology)
Communication: WebSockets (for real-time data transfer)
Bundling: Webpack (for module bundling and asset optimization)

Tools

Containerization: Docker (for R script isolation and ease of deployment)
Cloud: AWS (for scalable infrastructure)
Project Management: Atlassian tools (Confluence for documentation, Jira for task tracking, Bitbucket for version control)

Data Storage

Big Data: Apache Cassandra (for storing research results)

Impact

We built most of the front-end application and its user interface, along with a file storage system. We also added features for faster communication and set up data indexing and multi-user support. The result is a user-friendly tool that makes research more straightforward, transparent, and secure. It connects everyone involved in the research process, from data providers to analysts, to support a smooth end-to-end study.
