Patient Similarity Networks Development To Guide Clinical Decision-Making

Published: January 21, 2025
# Healthcare
# Data Science
The client is a private research organization that evaluates the safety and effectiveness of medications, using advanced data analysis to support healthcare decisions and meet regulatory standards. For this project, they set out to assess the safety of hydroxychloroquine, used alone or with azithromycin, for treating rheumatoid arthritis. The study focused on identifying short-term side effects and long-term risks, especially those related to heart health and to the use of multiple medications together. The client needed a data-driven approach to fill gaps in existing evidence by combining different clinical data sources, identifying patient groups with similar characteristics, and addressing factors that could affect treatment outcomes.

Challenge

1. Data Integration and Standardization

Integrating datasets from EHRs, genetic data, lab results, and patient-reported outcomes was challenging because the sources varied in format and quality. Mismatches between coding systems such as ICD-10 and SNOMED CT caused inconsistencies, while missing data and conflicting details had to be resolved and tracked. These issues complicated the creation of a clean, unified dataset for analysis.

2. Patient Variability and Subgroup Representation

Differences in demographics (age, gender, ethnicity), medical histories (comorbidities, disease severity), medications, and lifestyle factors (e.g., smoking, activity levels) made accounting for variability challenging. Hidden factors, such as undiagnosed conditions or environmental influences, introduced biases that could distort results.

3. Patient Grouping and Similarity Analysis

Creating meaningful patient groupings was difficult due to variations in medical histories, lab results, genetic markers, symptoms, and social factors like income and access to care. Many patients didn’t fit neatly into a single group, making clustering methods like k-means and hierarchical clustering challenging to implement effectively.

4. Reliable and Reproducible Results

Ensuring reliable results required handling complex data and addressing uncertainties. Missing data and confounding variables posed significant challenges, demanding advanced techniques like survival analysis and mixed-effects models. Probabilistic predictions added complexity, requiring external validation to confirm relevance in real-world scenarios.

5. Treatment Risk Analysis and Impact Assessment

Evaluating treatment risks was challenging due to differences in patient demographics, pre-existing conditions, concurrent medications, and external factors like seasonal trends and socioeconomic disparities. Issues like inconsistent adherence and variable data collection further complicated efforts to ensure findings reflected real-world conditions.

Solution

To address the challenges of evaluating hydroxychloroquine safety, we developed a comprehensive solution that integrates advanced data processing, analysis, and modeling techniques. The product offers tools to:

1. Integrate and Standardize Diverse Medical Data:

Consolidates large datasets from multiple sources (EHRs, genetic data, lab results, patient-reported outcomes) into the OMOP Common Data Model (CDM), ensuring consistency and compatibility.
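
As a rough illustration of the standardization step, the sketch below maps source codes from two vocabularies onto a shared standard concept and keeps track of codes that have no mapping. The lookup table, the concept IDs (1001, 1002), and the `ConditionRecord` and `map_conditions` names are hypothetical and only serve the example; a real pipeline would read mappings from the OMOP vocabulary tables. The output field names follow the OMOP `condition_occurrence` table.

```python
from dataclasses import dataclass

# Hypothetical lookup: (source vocabulary, source code) -> standard concept ID.
# The concept IDs are placeholders for illustration, not real OMOP concept IDs.
SOURCE_TO_STANDARD = {
    ("ICD10", "M06.9"): 1001,      # rheumatoid arthritis, unspecified
    ("SNOMED", "69896004"): 1001,  # rheumatoid arthritis
    ("ICD10", "I49.9"): 1002,      # cardiac arrhythmia, unspecified
}

UNMAPPED_CONCEPT_ID = 0  # OMOP convention: no standard mapping -> concept ID 0


@dataclass
class ConditionRecord:
    person_id: int
    vocabulary: str
    source_code: str


def map_conditions(records):
    """Return (mapped rows, unmapped records) so gaps are tracked, not dropped."""
    mapped, unmapped = [], []
    for rec in records:
        key = (rec.vocabulary, rec.source_code)
        concept_id = SOURCE_TO_STANDARD.get(key, UNMAPPED_CONCEPT_ID)
        mapped.append({
            "person_id": rec.person_id,
            "condition_concept_id": concept_id,
            "condition_source_value": rec.source_code,
        })
        if concept_id == UNMAPPED_CONCEPT_ID:
            unmapped.append(rec)
    return mapped, unmapped


records = [
    ConditionRecord(1, "ICD10", "M06.9"),
    ConditionRecord(2, "SNOMED", "69896004"),
    ConditionRecord(3, "ICD10", "Z99.9"),  # not in the lookup table
]
mapped_rows, unmapped_records = map_conditions(records)
```

In this toy run the first two patients end up with the same standard concept even though their source systems used different vocabularies, and the third record is kept with concept ID 0 and returned separately for manual review.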

2. Build Patient Similarity Networks (PSNs):

Groups patients with similar characteristics (clinical, genetic, and phenotypic) to enhance risk analysis and treatment outcome prediction.

3. Analyze Risks and Long-Term Safety:

Provides detailed insights into treatment risks and safety, accounting for variability across patient subgroups and external factors.

4. Support Research and Decision-Making:

Enables evidence-based findings for scientific publications and regulatory reports, facilitating informed clinical decisions.

1. Unified Patient Network Framework

Creates a streamlined system for identifying patient similarities by combining clinical histories, genetic data, phenotypic traits, and social determinants of health (SDOH). It integrates this information from various sources into a unified network that supports precise and meaningful analysis.
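
One simple way to picture such a network is as a graph in which patients are nodes and an edge connects two patients whose feature vectors are sufficiently similar. The sketch below is a dependency-free illustration under that assumption; the feature layout, the cosine measure, the 0.95 threshold, and the function names are choices made for the example, not details taken from the project.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_similarity_network(features, threshold=0.9):
    """features: dict patient_id -> numeric vector (assumed already scaled).

    Returns an adjacency dict: patient_id -> {neighbour_id: similarity}.
    """
    network = {pid: {} for pid in features}
    for p, q in combinations(features, 2):
        sim = cosine_similarity(features[p], features[q])
        if sim >= threshold:
            network[p][q] = sim
            network[q][p] = sim
    return network


# Toy vectors: [scaled age, cardiovascular comorbidity flag, scaled lab marker]
patients = {
    "P1": [0.8, 1.0, 0.6],
    "P2": [0.7, 1.0, 0.5],
    "P3": [0.2, 0.0, 0.9],
}
network = build_similarity_network(patients, threshold=0.95)
```

Here P1 and P2 are linked because their profiles point in nearly the same direction, while P3 stays isolated. In practice the choice of features, scaling, and threshold would need clinical input.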

2. Context-Aware Clustering Algorithms

Uses advanced clustering methods like hierarchical, density-based, and k-means to group patients accurately. These algorithms combine data such as clinical histories, genetic markers, phenotypic traits, and SDOH to create detailed and practical health profiles.
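
For readers unfamiliar with k-means, the following is a minimal sketch of the underlying procedure (Lloyd's algorithm) in plain Python. It is not the project's implementation, which per the Technical Highlights relied on Scikit-learn; the starting centroids and the two-dimensional toy features are assumptions chosen to keep the example deterministic.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm. Returns (labels, centroids)."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Toy data: two scaled features per patient.
points = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
labels, centroids = k_means(points, initial_centroids=[points[0], points[2]])
```

K-means assigns every patient to exactly one cluster, which is one reason patients who sit between groups are hard to handle with it; density-based or hierarchical methods relax that constraint in different ways.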

3. Confounder-Resilient Matching System

Uses methods like propensity score matching, inverse probability weighting, and stratification to carefully account for confounders. These techniques reduce bias, improve fairness in comparisons, and ensure results are precise and relevant for clinical decisions.
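
To make the matching idea concrete, here is a small sketch of greedy 1:1 nearest-neighbour matching on propensity scores. It assumes the scores have already been estimated (for example with a logistic regression of treatment on covariates); the caliper of 0.05, the patient IDs, and the function name are illustrative assumptions rather than settings reported for this project.

```python
def match_on_propensity(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    treated, controls: dict patient_id -> propensity score in [0, 1].
    Returns a list of (treated_id, control_id) pairs within the caliper.
    """
    available = dict(controls)
    pairs = []
    # Process treated patients in a fixed order so the result is reproducible.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs


treated = {"T1": 0.30, "T2": 0.62, "T3": 0.90}
controls = {"C1": 0.28, "C2": 0.33, "C3": 0.60, "C4": 0.10}
pairs = match_on_propensity(treated, controls)
```

T3 is left unmatched because no remaining control has a score within the caliper. Dropping such patients trades sample size for comparability, so it is worth reporting how many were excluded.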

4. Dynamic Sensitivity Exploration

Performs thorough sensitivity analyses, including E-values, tipping point scenarios, and leave-one-out tests, to ensure findings are stable and reliable. It highlights potential weaknesses caused by unmeasured confounders, checks the validity of model assumptions, and adjusts methods to address variations and complexities in real-world data.
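
The E-value has a closed form that is easy to compute. For a risk ratio RR above 1 it equals RR + sqrt(RR × (RR − 1)); protective estimates (RR below 1) are inverted first. The sketch below implements that formula in Python for illustration (the project itself used the R EValue package); the function names are ours.

```python
import math


def e_value(risk_ratio):
    """E-value for a point estimate on the risk-ratio scale."""
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    # Protective effects are inverted so the formula applies to RR >= 1.
    rr = risk_ratio if risk_ratio >= 1 else 1 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1))


def e_value_for_ci(lower, upper):
    """E-value for the confidence limit closest to the null (RR = 1)."""
    if lower <= 1 <= upper:
        return 1.0  # the interval already includes the null
    return e_value(lower) if lower > 1 else e_value(upper)


point = e_value(2.0)                 # about 3.41
interval = e_value_for_ci(1.2, 3.3)  # based on the lower limit, 1.2
```

An E-value of about 3.41 means an unmeasured confounder would need to be associated with both treatment and outcome by a risk ratio of at least 3.41 each to fully explain away an observed risk ratio of 2.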

5. Insight-Enriched Clinical Tools

Features interactive dashboards for subgroup-specific risk assessments, predictive models for treatment outcomes, scenario-driven decision support tools, detailed effectiveness reports, and a proactive alert system for potential adverse events.

Development Journey

Iterative Methodology Refinement

  • Began with clustering approaches based solely on demographic data, but low accuracy necessitated incorporating genetic and phenotypic data to refine models.
  • Adjusted methods repeatedly to address unexpected data inconsistencies, ensuring robustness.

Adaptation to Rheumatoid Arthritis Patients

  • Factored in comorbidities, such as cardiovascular conditions, by adding specific adjustment coefficients.
  • Collaborated with clinicians to validate and refine these adjustments for clinical relevance.

Multidisciplinary Team Coordination

  • Brought together experts from clinical medicine, statistics, and informatics to balance rigorous statistical analysis with practical usability.
  • Facilitated continuous communication to align objectives and resolve interdisciplinary challenges.

Addressing Key Data Challenges

  • Terminology Mismatches: Automated mapping with SQL and the OHDSI Usagi tool to resolve coding differences between datasets (e.g., SNOMED CT and ICD-10).
  • Underrepresented Subgroups: Applied reweighting techniques and generative methods to model characteristics of underrepresented patient groups.
  • Seasonality Effects: Used STL (Seasonal and Trend decomposition using Loess) to remove seasonal components from data and added seasonal indicators to statistical models to capture residual effects.
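
The source does not say which reweighting technique was applied to underrepresented subgroups. As one common and simple possibility, the sketch below computes post-stratification weights, where each patient is weighted by the ratio of their subgroup's target share to its share in the sample. The subgroup labels and target shares are made up for the example.

```python
from collections import Counter


def poststratification_weights(groups, target_shares):
    """Weight each record by target share / observed share of its subgroup.

    groups: list of subgroup labels, one per patient.
    target_shares: dict label -> desired population share (summing to 1).
    """
    counts = Counter(groups)
    n = len(groups)
    return [target_shares[g] / (counts[g] / n) for g in groups]


# Toy sample in which subgroup "B" is underrepresented (20% instead of 50%).
groups = ["A"] * 8 + ["B"] * 2
weights = poststratification_weights(groups, {"A": 0.5, "B": 0.5})
```

Each "A" patient receives a weight of 0.625 and each "B" patient 2.5, so the weighted sample reflects the target mix. Large weights on very small subgroups inflate variance, which is one reason generative methods may be considered alongside reweighting.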

Technical Highlights

Data Integration and Standardization:

OMOP CDM:

Standardizes and integrates diverse medical data, including EHRs, laboratory results, and genetic profiles, ensuring consistency and compatibility for robust analysis.

OHDSI Tools:

  • Rabbit-in-a-Hat: Designs and documents the ETL mappings from source data to the OMOP CDM, streamlining the integration of complex datasets from multiple sources.
  • Data Quality Dashboard: Automates the validation and quality assessment of data inputs, ensuring accuracy and reliability.

Data Processing, Analysis, and Sensitivity Assessment:

Python Libraries:

  • Pandas and NumPy: Handle data manipulation, transformation, and numerical computations efficiently.
  • Scikit-learn: Provides tools for clustering (e.g., K-means, hierarchical), dimensionality reduction (e.g., PCA, t-SNE), machine learning, and predictive modeling.
  • Statsmodels: Facilitates statistical modeling, hypothesis testing, mixed-effects models, and sensitivity analyses.
  • Lifelines: Supports survival and time-to-event analysis for deeper insights into treatment outcomes.
  • SciPy: Offers advanced computations and similarity metrics like cosine similarity and Euclidean distance.
  • spaCy and NLTK: Enable natural language processing for tasks such as tokenization, named entity recognition, and analysis of unstructured data.
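
To show what the survival and time-to-event analysis mentioned in the list above computes, here is a plain-Python sketch of the Kaplan-Meier estimator. It is written out by hand for illustration only; in practice a library such as Lifelines would be used. The follow-up times and event flags are invented.

```python
def kaplan_meier(durations, event_observed):
    """Return [(time, survival probability)] at each distinct event time.

    durations: follow-up time per patient.
    event_observed: 1 if the event occurred at that time, 0 if censored.
    """
    data = sorted(zip(durations, event_observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            events += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored patients both leave the risk set
    return curve


# Toy follow-up in months; 0 marks a censored patient.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

Censored patients contribute to the risk set up to their last follow-up but never count as events, which is what distinguishes this estimate from a naive proportion of patients with an event.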

R Packages:

  • EValue: Calculates E-values to assess robustness against unmeasured confounders.
  • boot: Performs bootstrap resampling for estimating confidence intervals and validating model stability.
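
The bootstrap idea behind the boot package can be sketched in a few lines. The example below is a Python illustration of a percentile bootstrap confidence interval, not the R code used in the project; the number of resamples, the seed, and the toy adverse-event indicators are assumptions.

```python
import random
import statistics


def bootstrap_ci(values, statistic=statistics.mean, n_resamples=2000,
                 alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    n = len(values)
    estimates = sorted(
        statistic([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower_idx = int((alpha / 2) * n_resamples)
    upper_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return estimates[lower_idx], estimates[upper_idx]


# Toy data: 1 = adverse event observed, 0 = none.
events = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
lower, upper = bootstrap_ci(events)
```

The interval describes how much the estimated event rate (0.3 here) could vary under resampling of the same patients. With only ten observations it will be wide, which is the expected behaviour rather than a flaw.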

SQL:

  • Efficiently queries and manages large-scale relational datasets, supporting data integration and analysis.

Visualization and Dashboard Development:

Python Libraries:

  • Dash: Creates interactive, web-based dashboards for data exploration and visualization.
  • Plotly: Delivers high-quality visualizations integrated seamlessly into dashboards.
  • Flask: Supports server-side integration and API deployment for dashboard hosting.

Deployment and Hosting:

AWS:

  • Provides scalable, secure hosting for dashboards and analytics pipelines, ensuring accessibility and reliability.

Impact

Starting Point (Point A):

At the beginning of the project, the client faced significant challenges:

  • Data came from heterogeneous sources, including EHRs, lab results, and genetic profiles, and lacked standardization.
  • Missing and incomplete data, together with conflicts caused by differing terminologies (e.g., SNOMED CT vs. ICD-10), hindered analysis.
  • No clear analytical framework existed to accurately evaluate the risks and efficacy of hydroxychloroquine.
  • The client lacked tools to group patients with similar characteristics or account for critical risk factors.

Final Outcome (Point B):

The project delivered an integrated solution centered on Patient Similarity Networks (PSNs), providing:

  • Accurate Patient Grouping: PSNs formed patient cohorts with similar demographic, phenotypic, and genetic traits for precise risk and treatment analysis.
  • Interactive Dashboards: Enabled data visualization, subgroup analysis, risk prediction, and scenario modeling for both short- and long-term treatment effects.
  • Validated Findings: Demonstrated short-term safety of hydroxychloroquine while identifying potential long-term risks, including cardiovascular mortality.
  • Foundation for Research: Established a robust evidence base for personalized recommendations and future studies.
