We were commissioned to create a prediction system that returns several outputs, as follows:
The likelihood of an event's market success, presented as a percentage, based on a comparison of the following elements with historical data points:
Social media followers broken down by channel and categories:
Comments on an event:
Reviews published on social media by newspaper writers:
A breakdown of social media followers by channel, category, and sentiment for each type of event, from both the participant and the organizer side.
Analysis of the geographic data to find out how many participants traveled to the event and from what region.
Analysis of economic data that correlates with the willingness of venue-goers to pay to attend an event.
Analysis of event visitors based on social comments, sentiment, and tone.
To fulfill the task, we developed a system that presents each result together with an AI-driven explanation of the success factors behind it.
The model of choice is a boosted regression trees ensemble, a supervised ensemble model for regression tasks. Tree-based models such as regression trees and random forests make it easy to explain each prediction: we can trace the decision path and find the most critical inputs and their importance scores for each case. Along with each prediction, the model outputs a list of contributions for each input.
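One common way to obtain such per-input contributions from a tree ensemble is to walk each tree's decision path and credit every change in node value to the feature that was split on. The following is a minimal sketch of that idea, not the system's actual implementation. The trees are hand-written rather than learned, the feature names (`followers`, `avg_sentiment`) and all numbers are hypothetical, and the ensemble is assumed to be purely additive, with any learning rate already folded into the node values.

```python
from collections import defaultdict

# Hypothetical hand-written trees; a real system would learn them from data.
# Every node stores "value": the mean target of the training rows reaching it.
TREE_1 = {
    "feature": "followers", "threshold": 10_000, "value": 50.0,
    "left": {"value": 30.0},
    "right": {
        "feature": "avg_sentiment", "threshold": 0.5, "value": 70.0,
        "left": {"value": 60.0},
        "right": {"value": 80.0},
    },
}
TREE_2 = {
    "feature": "avg_sentiment", "threshold": 0.2, "value": 0.0,
    "left": {"value": -10.0},
    "right": {"value": 5.0},
}


def explain_tree(tree, row):
    """Return (bias, contributions) for one tree and one input row."""
    node = tree
    bias = node["value"]  # root value: the tree's output before any split
    contributions = defaultdict(float)
    while "feature" in node:  # descend until a leaf is reached
        feature = node["feature"]
        side = "left" if row[feature] <= node["threshold"] else "right"
        child = node[side]
        # Credit the change in node value to the feature that was split on.
        contributions[feature] += child["value"] - node["value"]
        node = child
    return bias, dict(contributions)


def explain_ensemble(trees, row):
    """Sum biases and per-feature contributions over an additive ensemble."""
    total_bias = 0.0
    total = defaultdict(float)
    for tree in trees:
        bias, contributions = explain_tree(tree, row)
        total_bias += bias
        for feature, delta in contributions.items():
            total[feature] += delta
    return total_bias, dict(total)


# Hypothetical input row for a single event.
row = {"followers": 25_000, "avg_sentiment": 0.7}
bias, contributions = explain_ensemble([TREE_1, TREE_2], row)
prediction = bias + sum(contributions.values())

print(f"prediction = {prediction}")
for feature, delta in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"  {feature}: {delta:+.1f}")
```

By construction, the bias plus the contributions always equals the ensemble's prediction, so the list can be read as a breakdown of the result. For this row, the prediction of 85.0 splits into a bias of 50.0, +20.0 from `followers`, and +15.0 from `avg_sentiment`.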
The development of the product was a multi-stage process that combined several technologies for different subtasks.
For subtasks such as sentiment analysis, text categorization, entity detection, and adding news from the Discovery News Collection, we used the following tools:
Below is an example of predictions for several events, based on data obtained for similar events. The orange line represents the system's prediction, and the blue line shows the actual numbers.