Ticket Sales Prediction

Published: January 22, 2025
Tags: Entertainment, Data Science, NLP

In this case study, we developed a ticket sales prediction system that estimates the likely success of an event. The system draws on historical data, audience sentiment, and economic factors, and it reports the factors behind each prediction so that the output can be acted on.

Challenge

We were commissioned to create a prediction system that returns several outputs. The first is the likelihood of an event's market success, expressed as a percentage and based on comparing the following elements with historical data points:

  • Type of event;
  • Participants of the event (whether they include participants of past award-winning events);
  • Similar events that won awards;
  • The venue of the event;
  • The opening date of the event;
  • The number of seats in the venue.

Social media followers, broken down by channel and by the following categories:

  • Age;
  • Geographic area;
  • Gender;
  • Event name;
  • Total number of comments they left, broken down by channel.

Comments on an event:

  • Classification of the comment sentiment as positive or negative;
  • Detection of whether the person who commented on the event or its participants actually attended it (people who attended tend to mention that they bought a ticket).
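
The attendance check in the last item can be illustrated with a simple keyword heuristic. This is a minimal sketch only, and it is not the method used in the project: the phrase patterns and the function name `mentions_ticket_purchase` are assumptions made up for illustration.

```python
import re

# Hypothetical phrases suggesting that the commenter bought a ticket.
# The list is illustrative and not taken from the project.
TICKET_PATTERNS = [
    r"\b(bought|purchased|got|booked)\s+(my|our|a|the|two|2)?\s*tickets?\b",
    r"\bhave\s+(my|our)\s+tickets?\b",
]
_TICKET_RE = re.compile("|".join(TICKET_PATTERNS), re.IGNORECASE)


def mentions_ticket_purchase(comment: str) -> bool:
    """Return True if the comment contains a ticket-purchase phrase."""
    return _TICKET_RE.search(comment) is not None


comments = [
    "Just bought my tickets for Saturday, can't wait!",
    "Looks interesting, might go.",
    "We booked two tickets last week.",
]
# One boolean per comment: likely attendee or not.
flags = [mentions_ticket_purchase(c) for c in comments]
```

A rule like this is easy to audit but misses paraphrases, so in practice it would more likely serve as a baseline or as one input feature than as the final classifier.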

Reviews published on social media by newspaper writers:

  • Classification of each review as positive or negative;
  • Comparison of running-time histories for events that received positive reviews versus those that received negative reviews.

Channel, category, and sentiment breakdown of social media followers for each type of event, on both the participant and the organizer side.

Analysis of geographic data to find out how many participants traveled to the event and from which regions.
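
One way to approach this kind of geographic analysis is to measure the distance between each person's home location and the venue, then count by region those who came from farther than some threshold. The sketch below assumes that locations have already been geocoded to latitude and longitude. The record layout, the 50 km threshold, the function names, and the sample coordinates are all illustrative assumptions, not details from the project.

```python
from collections import Counter
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def travelers_by_region(people, venue, min_km=50.0):
    """Count people whose home is at least min_km away from the venue."""
    counts = Counter()
    for region, lat, lon in people:
        if haversine_km(lat, lon, venue[0], venue[1]) >= min_km:
            counts[region] += 1
    return counts


# Made-up example records: (region, latitude, longitude).
venue = (51.5074, -0.1278)  # central London
people = [
    ("Greater London", 51.5155, -0.0922),
    ("Greater Manchester", 53.4808, -2.2426),
    ("Greater Manchester", 53.4106, -2.1575),
    ("Paris region", 48.8566, 2.3522),
]
counts = travelers_by_region(people, venue)
```

Here the person living a few kilometres from the venue is not counted as a traveler, while the other three are grouped under their home regions.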

Analysis of economic data that correlates with venue-goers' willingness to spend money on attending an event.

Analysis of event visitors based on social comments, sentiment, and tone.

Solution

To fulfill the task, we developed a system that presents each prediction together with an AI-driven explanation of the success factors behind it.

The model of choice is a boosted regression tree ensemble, a supervised model used for regression tasks. Tree-based models such as regression trees and random forests make it comparatively easy to explain each prediction: we can trace the decision path and find the most critical inputs, with an importance score for each, in every individual case. Alongside the prediction, the model therefore also outputs a list of contributions for each input.
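
To make the idea of per-input contributions concrete, here is a minimal pure-Python sketch of path-based attribution on a tiny hand-built ensemble. It is an illustration under stated assumptions, not the project's implementation: the trees, feature names (`seats`, `award_winners`, `positive_share`), and numbers are invented, and the source does not say which attribution method was actually used. Each node stores the mean target value of the training samples reaching it, and every split credits the change in that value to the feature it tests.

```python
from collections import defaultdict

# Two hand-built toy trees. Feature names and values are invented
# for illustration only.
TREES = [
    {
        "value": 500.0, "feature": "seats", "threshold": 1000,
        "left": {"value": 300.0},
        "right": {
            "value": 700.0, "feature": "award_winners", "threshold": 1,
            "left": {"value": 600.0},
            "right": {"value": 900.0},
        },
    },
    {
        "value": 0.0, "feature": "positive_share", "threshold": 0.5,
        "left": {"value": -50.0},
        "right": {"value": 40.0},
    },
]


def explain(trees, x):
    """Return (prediction, bias, contributions) for one input row."""
    bias = 0.0
    contributions = defaultdict(float)
    for tree in trees:
        node = tree
        bias += node["value"]  # the root value is the tree's baseline
        while "feature" in node:
            name = node["feature"]
            branch = "left" if x[name] < node["threshold"] else "right"
            child = node[branch]
            # Credit the change in expected value to the split feature.
            contributions[name] += child["value"] - node["value"]
            node = child
    prediction = bias + sum(contributions.values())
    return prediction, bias, dict(contributions)


event = {"seats": 1500, "award_winners": 2, "positive_share": 0.7}
prediction, bias, contributions = explain(TREES, event)
```

By construction, the baseline plus the sum of the contributions equals the prediction, which is what lets each forecast be broken down input by input. XGBoost can produce a comparable per-feature breakdown through the `pred_contribs=True` option of `Booster.predict`.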

Development Journey

The development of the product was a multi-stage process that combined several technologies for different subtasks.

For subtasks such as modeling, sentiment analysis, text categorization, entity detection, and news retrieval from the Discovery News Collection, we used the following tools:

  • random forest regressor, lasso regression, and elastic-net regression from the scikit-learn Python library;
  • boosted regression trees from the XGBoost library;
  • sentiment analysis, text categorization, and entity detection from Watson NLU;
  • news retrieval from the Discovery News Collection of Watson Discovery Service;
  • Scrapy for scraping data from different sources;
  • PostgreSQL for storing all the data;
  • AWS as the cloud infrastructure provider;
  • Nominatim as the geocoding service.

Impact

The charts below show example predictions for several events, based on data obtained for similar events. The orange line represents the system's prediction, and the blue line shows the actual numbers.

[Figures 01_Ticket.jpg through 04_Ticket.jpg: predicted (orange) versus actual (blue) values for example events]
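
A visual comparison of predicted and actual series can also be summarized with standard error metrics. The sketch below computes the mean absolute error and the mean absolute percentage error. The sample numbers are made up for illustration and are not results from the project, and the source does not state which metrics were used.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """Average absolute error as a percentage of the actual value."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("actual values must be non-zero")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Made-up illustrative series, not project data.
actual = [100, 200, 400, 500]
predicted = [110, 180, 400, 450]

mae = mean_absolute_error(actual, predicted)
mape = mean_absolute_percentage_error(actual, predicted)
```

The absolute error is expressed in the same units as the sales figures, while the percentage error makes events of different sizes comparable.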
