Claim Denial Management

Healthcare
Machine Learning
Computer Vision
Natural Language Processing
Digital Healthcare

Healthcare's intertwining with insurance has made claim denials a costly issue for individuals and institutions alike. In the U.S., $262 billion of the $3 trillion in claims submitted in 2016 were denied, averaging nearly $5 million per hospital. Hospitals deal with a huge variety of insurance plans, while payers aim to cover as little as possible, and appealing denials consumes resources and time that hospitals can rarely spare. Many are unsure how to build an efficient denial-prevention program without overwhelming their teams.

Challenge

CD_1.png

The project aims to create an AI model that assesses the risks of claim denials. The current system consists of two main components: the Autoencoder and the Denial Brain.

Solution

1. Autoencoder

CD_2.png

free text.png

Components

  • PDF Parsing and Computer Vision Module: Converts PDF to structured text.
  • Free Text Coder: Converts text information about procedure or diagnosis into CPT/HCPCS/ICD-10 codes.

Workflow

NLP Parsing: Extracts text from PDF files and identifies keywords using regular expressions.

Code Extraction and Validation: Uses pattern matching and the Free Text Coder to convert values into codes, and validates against a code dictionary.

RegEx Extractor: Tries to extract codes from text using regular expressions.

Code Validator: Verifies obtained codes against a dictionary.
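The RegEx Extractor and Code Validator steps can be sketched as follows. This is a minimal illustration: the patterns are simplified versions of real CPT and ICD-10-CM formats, and the code dictionary here is a toy stand-in for the full vocabulary used in the project.

```python
import re

# Simplified patterns: CPT codes are five digits; ICD-10-CM codes start
# with a letter, then two digits, with an optional subcategory after a dot.
CPT_PATTERN = re.compile(r"\b\d{5}\b")
ICD10_PATTERN = re.compile(r"\b[A-TV-Z]\d{2}(?:\.\d{1,4})?\b")

# Toy code dictionary standing in for the full CPT/HCPCS/ICD-10 vocabulary.
KNOWN_CODES = {"99213", "93000", "J45.909", "E11.9"}

def extract_codes(text: str) -> list[str]:
    """RegEx Extractor: pull candidate codes out of free text."""
    return CPT_PATTERN.findall(text) + ICD10_PATTERN.findall(text)

def validate_codes(codes: list[str]) -> list[str]:
    """Code Validator: keep only candidates found in the dictionary."""
    return [c for c in codes if c in KNOWN_CODES]

candidates = extract_codes("Office visit 99213, asthma J45.909, ref 12345")
print(validate_codes(candidates))  # "12345" is dropped: not in the dictionary
```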

TFIDF Extractor: Extracts codes based on text similarity, returning the nearest code in the vocabulary.
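When regex extraction fails, the TFIDF Extractor falls back to text similarity. A minimal sketch of the idea, assuming a lookup table of code descriptions (the descriptions below are illustrative, not the project's actual vocabulary):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy code-description vocabulary; the real system uses the full
# CPT/HCPCS/ICD-10 descriptions.
CODE_DESCRIPTIONS = {
    "99213": "office outpatient visit established patient",
    "93000": "electrocardiogram routine ecg with interpretation",
    "J45.909": "unspecified asthma uncomplicated",
}

codes = list(CODE_DESCRIPTIONS)
vectorizer = TfidfVectorizer()
code_matrix = vectorizer.fit_transform(CODE_DESCRIPTIONS.values())

def nearest_code(text: str) -> str:
    """Return the code whose description is most similar to the text."""
    sims = cosine_similarity(vectorizer.transform([text]), code_matrix)
    return codes[sims.argmax()]

print(nearest_code("routine ECG with interpretation and report"))  # 93000
```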

CD_3.png

  • Computer Vision: Converts the PDF to an image and feeds it into a model that yields text blocks. Keywords are located within boxes using letterwise matching and Levenshtein distance.

  • Merge Outputs: Combines NLP and Computer Vision outputs to create the final Autoencoder output.

2. Denial Brain

Data Preparation:

  • LabelEncoder is used for categorical variables.
  • The target is set to the amount paid for the claim.
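The preparation step can be sketched as below. The column names and values are illustrative only, not the project's actual schema:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy claims data; the real dataset has millions of rows.
claims = pd.DataFrame({
    "procedure": ["99213", "93000", "99213"],
    "payer": ["Aetna", "Cigna", "Aetna"],
    "bill_amount": [180.0, 95.0, 210.0],
    "paid_amount": [120.0, 60.0, 150.0],   # regression target
})

# Encode each categorical column; keep the encoders for inverse lookups.
encoders = {}
for col in ["procedure", "payer"]:
    encoders[col] = LabelEncoder()
    claims[col] = encoders[col].fit_transform(claims[col])

X = claims.drop(columns="paid_amount")
y = claims["paid_amount"]
print(claims["payer"].tolist())  # [0, 1, 0]
```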

Data Processing

  • The dataset contained over 7 million samples, split into train and test sets (80%/20%).
  • The most important features were the bill amount and the procedure.

Model:

  • Used a Random Forest as a regression algorithm to predict the paid amount.
  • Hyperparameters: n_estimators=10, max_depth=None, max_features='auto', with min_samples_split tuned as needed.

Performance Evaluation: Used the mean squared error (MSE) score to estimate prediction performance.
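Putting the split, model, and evaluation together on synthetic data (the real training used roughly 7 million claims; the feature-generating code here is made up, and max_features is left at its default since the 'auto' alias has been removed from recent scikit-learn versions):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 1000
bill_amount = rng.uniform(50, 500, n)
procedure = rng.integers(0, 20, n)
X = np.column_stack([bill_amount, procedure])
# Paid amount depends mostly on the bill, echoing the feature importances.
y = 0.7 * bill_amount + 2.0 * procedure + rng.normal(0, 10, n)

# 80%/20% train/test split, as in the project.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = RandomForestRegressor(n_estimators=10, max_depth=None, random_state=0)
model.fit(X_train, y_train)

mse = mean_squared_error(y_test, model.predict(X_test))
print(f"test MSE: {mse:.1f}")
```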

Impact

CD_5.png

Tech Stack

CD_4.png

Programming Languages

  • Python 3
  • SQL (specifically AWS Redshift dialect)

AI Stack:

  • Prediction Modelling: Random Forest, Extreme Gradient Boosting Machines, Generalized Linear Models
  • Computer Vision: Image Segmentation, Text Detection, Optical Character Recognition (OCR)
  • Natural Language Processing: Regular Expressions, TF-IDF Models

Containerization: Docker

Frameworks and Libraries:

Data Manipulation and Analysis:

  • NumPy
  • pandas

Visualization:

  • matplotlib
  • seaborn

Machine Learning and AI:

  • scikit-learn
  • xgboost
  • TensorFlow
  • Keras

Database Connectivity:

  • psycopg2
  • pypyodbc

Web Frameworks:

  • Flask
  • Werkzeug

Utilities:

  • easydict
  • tqdm
  • Cython

Computer Vision and Image Processing:

  • opencv-python
  • pdf2image
  • python-pdfbox
  • scipy

We developed a solution that reduces claim denial rates and streamlines payment processes. Our AI model automates claim processing, cutting human capital costs and speeding up payment plans, which shortens accounts receivable (A/R) times. As the first nationwide solution for managing claims data, it suits hospitals and large physician practices. With a deep understanding of claim denial management, we are ready to adapt our model to other countries' healthcare systems.
