I-COM Data Science Hackathon: 36 Hours – 18 Teams – 2 Problems
The I-COM Data Science Hackathon is a 36-hour marathon, where competing teams develop algorithms using data science analytics to solve predictive modelling challenges on marketers’ datasets.
This year, it was hosted in the beautiful Cruise Terminal in Porto. Unilever and Intel provided the challenges. The attending teams were a mix of academic data scientists from universities and analytics and marketing specialists from agencies.
The team from OMD EMEA took on the Intel challenge –
Business Challenge: What is the impact of discussions in social media and brand health indicators on advertising effectiveness for high consideration purchases such as consumer PC sales in the US?
Prediction Challenge: Predict the sales revenue by CPU brand/device brand combination by month for Jan and Feb 2017.
A sample of the data was provided by the sponsors and Intel in advance of the Hackathon for teams to interrogate. The shared data included social (Twitter volume), Millward Brown brand health survey data, search, ad spend and sales data.
To cover all aspects of the challenge, OMD EMEA sent a team with a blended skill set. Our team included Paul Cuckoo (Global Channel Planning Manager), Harry Daniels (Analyst), Cate McVeigh (Head of Marketing Sciences, Intel team) and Adam Abu-Nab (Social Intelligence Exec Director).
The Hackathon –
The OMD EMEA team created a predictive sales estimator that combined MMM (marketing mix modelling) with data output to show how varying ad spend can affect revenue.
What worked: Consideration was the most effective predictor of sales. The consideration data from Millward Brown proved to be a strong driver of revenue in our model, and we were also able to isolate a strong December/Christmas trend.
What didn’t work: Twitter data effects. We weren’t able to truly isolate the effects of the Twitter data on media effectiveness.
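For readers who want a feel for what a model of this kind looks like, here is a minimal sketch of a regression-style MMM. It is not the model we built at the Hackathon: the log transform on spend, the function names (`design_matrix`, `fit_mmm`, `predict_revenue`) and every number are assumptions made purely for illustration, and the data is synthetic rather than the Intel dataset.

```python
import numpy as np

def design_matrix(ad_spend, consideration, is_december):
    """Build the regressors: intercept, log ad spend, consideration, December flag."""
    ad_spend = np.asarray(ad_spend, dtype=float)
    return np.column_stack([
        np.ones(len(ad_spend)),
        np.log1p(ad_spend),  # assumed diminishing-returns shape for spend
        np.asarray(consideration, dtype=float),
        np.asarray(is_december, dtype=float),
    ])

def fit_mmm(ad_spend, consideration, is_december, revenue):
    """Fit revenue on the regressors by ordinary least squares."""
    X = design_matrix(ad_spend, consideration, is_december)
    coefs, *_ = np.linalg.lstsq(X, np.asarray(revenue, dtype=float), rcond=None)
    return coefs

def predict_revenue(coefs, ad_spend, consideration, is_december):
    """Apply fitted coefficients to planned spend and expected consideration."""
    return design_matrix(ad_spend, consideration, is_december) @ coefs

# Synthetic monthly history (illustrative only, not the Hackathon data).
rng = np.random.default_rng(0)
months = np.arange(36)
is_december = (months % 12 == 11).astype(float)
ad_spend = rng.uniform(50, 200, size=36)
consideration = rng.uniform(0.2, 0.5, size=36)
revenue = (100 + 20 * np.log1p(ad_spend) + 300 * consideration
           + 80 * is_december + rng.normal(0, 5, size=36))

coefs = fit_mmm(ad_spend, consideration, is_december, revenue)

# Forecast two future months (e.g. Jan and Feb) from hypothetical plans.
forecast = predict_revenue(coefs, [120.0, 110.0], [0.35, 0.33], [0.0, 0.0])
print(forecast)
```

The December flag is what lets a model like this separate the seasonal uplift from the effect of consideration and spend.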
Finalist teams from Ebiquity and Analytic Partners also presented MMM solutions, but used a nested approach instead. This approach first models the relationships between Twitter, brand health and spend, then nests the result in a final revenue model.
Interestingly, dashboard solutions were also presented as outputs. These dashboards could forecast spend required to meet revenue targets based on brand health/twitter indicators.
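As a rough illustration of the nested idea and the spend-to-target calculation behind such a dashboard, the sketch below fits brand health (consideration) on spend and tweet volume first, feeds the fitted values into a revenue model, then inverts the chain to get the spend implied by a revenue target. The finalists' actual specifications were not shared in detail, so the functional forms, the function names (`fit_nested`, `predict_nested`, `required_spend`) and the synthetic data are all assumptions.

```python
import numpy as np

def ols(X, y):
    """Least squares with an intercept; returns [intercept, slopes...]."""
    X = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

def brand_health(a, ad_spend, tweet_volume):
    """Stage 1 prediction: consideration from log spend and log tweet volume."""
    return a[0] + a[1] * np.log1p(ad_spend) + a[2] * np.log1p(tweet_volume)

def fit_nested(ad_spend, tweet_volume, consideration, is_december, revenue):
    # Stage 1: brand health as a function of spend and Twitter volume.
    a = ols(np.column_stack([np.log1p(ad_spend), np.log1p(tweet_volume)]),
            consideration)
    # Stage 2: revenue on the fitted brand health plus a December flag.
    fitted = brand_health(a, ad_spend, tweet_volume)
    b = ols(np.column_stack([fitted, is_december]), revenue)
    return a, b

def predict_nested(a, b, ad_spend, tweet_volume, is_december):
    return b[0] + b[1] * brand_health(a, ad_spend, tweet_volume) + b[2] * is_december

def required_spend(a, b, target_revenue, tweet_volume, is_december):
    """Invert both stages: spend needed to reach a revenue target."""
    health_needed = (target_revenue - b[0] - b[2] * is_december) / b[1]
    log_spend = (health_needed - a[0] - a[2] * np.log1p(tweet_volume)) / a[1]
    return np.expm1(log_spend)

# Synthetic monthly history (illustrative only, not the Hackathon data).
rng = np.random.default_rng(0)
n = 36
is_december = (np.arange(n) % 12 == 11).astype(float)
ad_spend = rng.uniform(50, 200, size=n)
tweet_volume = rng.uniform(1_000, 10_000, size=n)
consideration = (0.1 + 0.04 * np.log1p(ad_spend)
                 + 0.01 * np.log1p(tweet_volume) + rng.normal(0, 0.002, size=n))
revenue = 50 + 1000 * consideration + 80 * is_december + rng.normal(0, 2, size=n)

a, b = fit_nested(ad_spend, tweet_volume, consideration, is_december, revenue)
spend_needed = required_spend(a, b, target_revenue=420.0,
                              tweet_volume=5000.0, is_december=0.0)
print(spend_needed)
```

A sketch like this ignores the uncertainty carried over from the first stage into the second, so the implied spend should be read as a point estimate rather than a guarantee.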
Our key takeaways –
Do your prep: Sample data prep is the secret ingredient to success at Hackathons. Teams that pre-formatted the sample data and did as much prep work as possible in advance freed up valuable time. This meant they had more time to spend on modelling their solutions, as well as conceptualising and visualising the story they wanted to tell.
A fuller data eco-system is needed for business application: A challenge for all teams was the limitations of the data sources provided. For there to be actual business applications, a fuller eco-system of data sources and metrics would need to be provided.
For example, a common problem teams faced was that the search and social data provided (tweets) was solely volume over time, with owned (brand driven) and earned (user) mentions mixed together. As a result, the data showed predictable peaks around owned campaign activity and seasonal campaign trends (Black Friday, Christmas, Apple launches). The volume also counted mentions rather than the reach of each mention, and reach could prove to have a stronger correlation with ad effectiveness/intent/sales.
The semantics are as important as the numbers: For social to be used as an indicator for purchase, you need to be able to isolate where the real user discussion is happening and draw the richer semantics out of it. For example, are these mentions positive, negative, intent or consideration based? Can they be correlated and validated with intent/consideration survey data from Millward Brown? For search, what is the context in which people are searching for your brand, not just the volume?
At OMD EMEA, we have the capability to use tools that can cut social and search in these more meaningful ways. The differing audience discussion environments also need to be considered. A parallel test we ran using our social tools found that more Intel sales/intent discussions and social video views were happening on wider social platforms. For example, YouTube and Twitch resonated with gamers, while forums were preferred by B2B tech-heads in particular.
Given the nature of a hackathon, it’s understandable that the amount of data provided to teams is managed so that a solution can be turned around in 24 hours. What it has allowed teams to do is test some interesting ideas and models, which they can then plug into the broader data sets they work with day to day.
MMM still the most useful for marketers
Data science is an exciting field, with new techniques that could revolutionise accurate prediction with minimal data. However, when it comes to properly answering business questions, it will be a long time before regression modelling in the form of MMM is beaten. Feeding that model with all the correct data sources is key to its accuracy.