Ticket Sales Prediction

Published: January 22, 2025

# Entertainment

# Data Science

# NLP

In this case study, we developed a ticket sales prediction system that evaluates the potential success of events. The system draws on historical data, audience sentiment, and economic factors, and returns each prediction together with the factors behind it.

Challenge

We were commissioned to create a prediction system that returns several outputs.

The likelihood of an event's market success, presented as a percentage, based on comparing the following elements with historical data points:

Type of event;
Participants of the event (whether they include participants of past award-winning events);
Similar events that won awards;
The venue of the event;
The opening date of the event;
The number of seats in the venue.

Social media followers broken down by channel and categories:

Age;
Geographic area;
Gender;
Event name;
Total number of comments they left, broken down by channel.

Comments on an event:

Classification of the comment sentiment as positive or negative;
Detecting whether the person who commented on the event or its participants attended the event (people who attended an event mention that they bought a ticket).
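The attendance check described above can be approximated with a simple phrase match. The following is a minimal sketch, not the production logic: the function name `mentions_ticket_purchase` and the pattern list are hypothetical, and sentiment classification itself (handled by Watson NLU in this project) is not shown.

```python
import re

# Hypothetical phrases that suggest the commenter bought a ticket.
# This pattern list is an illustrative assumption, not the real rule set.
_TICKET_PATTERNS = [
    r"\b(?:bought|purchased|got|booked)\s+(?:my|our|a|the|two|\d+)?\s*tickets?\b",
    r"\bhave\s+(?:my|our)\s+tickets?\b",
]
_TICKET_RE = re.compile("|".join(_TICKET_PATTERNS), re.IGNORECASE)


def mentions_ticket_purchase(comment: str) -> bool:
    """Return True if the comment contains a ticket-purchase phrase."""
    return _TICKET_RE.search(comment) is not None
```

A heuristic like this ignores negation ("I haven't bought a ticket" would still match), so in practice it would likely serve as one signal among several rather than as a final decision.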

Reviews published on social media by newspaper writers:

Classification of the review as positive or negative;
Comparison of running-time histories for events that had positive reviews versus those that had negative reviews.

Channel, category, and sentiment breakdown of social media followers for each type of event, from both the participant side and the organizer side.

Analysis of geographic data to find out how many participants traveled to the event and from which regions.
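Once visitor and venue locations have been geocoded into coordinates, the travel analysis reduces to a distance calculation and a count per region. The sketch below assumes a great-circle (haversine) distance and a hypothetical 50 km cutoff for counting someone as a traveler. The function names, the cutoff, and the `(region, lat, lon)` record shape are all illustrative assumptions.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def travelers_by_region(venue, visitors, min_km=50.0):
    """Count visitors living at least min_km from the venue, grouped by region.

    venue is a (lat, lon) pair; visitors is an iterable of (region, lat, lon).
    """
    counts = Counter()
    for region, lat, lon in visitors:
        if haversine_km(venue[0], venue[1], lat, lon) >= min_km:
            counts[region] += 1
    return counts
```

The cutoff is a judgment call: a lower value counts suburban visitors as travelers, while a higher one restricts the count to visitors coming from other regions.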

Analysis of economic data that correlates with venue-goers' willingness to spend money on attending an event.
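One simple way to check whether an economic indicator moves together with spending is the Pearson correlation coefficient. This is a sketch under that assumption. The case study does not say which statistic was used, and the function name `pearson_r` is our own.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

Here `xs` could hold an economic indicator per period and `ys` the ticket revenue for the same periods. A value near 1 or -1 suggests a strong linear relationship, while a value near 0 suggests little linear relationship.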

Analysis of event visitors based on social comments, sentiment, and tone.

Solution

To fulfill the task, we developed a system that presents each result together with an AI-driven explanation of the success factors behind it.

The model of choice is an ensemble of boosted regression trees, a supervised model used for regression tasks. Tree-based models such as regression trees and random forests make each prediction easy to explain: we can trace the decision path and find the most critical inputs and their importance scores for each case. Along with each prediction, the model therefore outputs a list of contributions, one per input.
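To illustrate how a per-input contribution list can be derived from trees, here is a minimal hand-rolled sketch of path-based attribution: each split along the decision path credits the change in node value to the feature that was split on, so the root value plus all contributions equals the prediction. The `Node` structure, the function names, the feature names, and the numbers are hypothetical and chosen only for illustration. This is not the project's actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    value: float                    # mean training target at this node
    feature: Optional[str] = None   # split feature (None for a leaf)
    threshold: float = 0.0
    left: Optional["Node"] = None   # taken when x[feature] <= threshold
    right: Optional["Node"] = None  # taken otherwise


def explain_tree(root: Node, x: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Return (bias, contributions); bias + sum(contributions) is the prediction."""
    contributions: Dict[str, float] = {}
    node = root
    while node.feature is not None:
        child = node.left if x[node.feature] <= node.threshold else node.right
        # Credit the change in node value to the feature used for this split.
        delta = child.value - node.value
        contributions[node.feature] = contributions.get(node.feature, 0.0) + delta
        node = child
    return root.value, contributions


def explain_ensemble(trees: List[Node], x: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Sum biases and contributions over an additive ensemble of trees."""
    bias, total = 0.0, {}
    for tree in trees:
        tree_bias, contribs = explain_tree(tree, x)
        bias += tree_bias
        for name, c in contribs.items():
            total[name] = total.get(name, 0.0) + c
    return bias, total


# Hypothetical two-tree ensemble with made-up feature names and values.
tree_1 = Node(100.0, "seats", 500.0,
              left=Node(60.0),
              right=Node(150.0, "award_winners", 0.5,
                         left=Node(120.0), right=Node(200.0)))
tree_2 = Node(0.0, "seats", 1000.0, left=Node(-5.0), right=Node(10.0))

bias, contribs = explain_ensemble([tree_1, tree_2],
                                  {"seats": 800, "award_winners": 1})
prediction = bias + sum(contribs.values())
```

Sorting `contribs` by absolute value gives a ranked list of the inputs that mattered most for that prediction. XGBoost also offers a built-in route to per-feature contributions through the `pred_contribs=True` option of `Booster.predict`. Whether the project relied on it is not stated.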

Development Journey

The development of the product was a multi-stage process that combined several technologies for different subtasks.

For subtasks such as modeling, sentiment analysis, text categorization, entity detection, news retrieval, data scraping, storage, and geocoding, we used the following tools:

random forest regressor, lasso regression, and elastic-net regression from the scikit-learn Python library;
boosted regression trees from the XGBoost library;
sentiment analysis, text categorization, and entity detection from Watson NLU;
news retrieval from the Discovery News Collection of Watson Discovery Service;
Scrapy for scraping data from different sources;
PostgreSQL for storing all the data;
AWS as the cloud infrastructure provider;
Nominatim as a geocoding service.

Impact

The chart for this case study shows an example of predictions for several events, based on data obtained for similar events: the orange line represents the system's prediction, and the blue line shows the actual numbers.
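The gap between a predicted series and an actual series like the ones described above can also be summarized numerically. The sketch below uses mean absolute error and mean absolute percentage error as two common choices. The case study does not state which metrics were used, so both the metric choice and the function names are assumptions.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """MAPE in percent; points where the actual value is zero are skipped."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have equal length")
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actual values to compare against")
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)
```

The absolute error is expressed in tickets, which is easy to communicate, while the percentage error makes small and large events comparable.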
