Tales from the War Room: I-COM Data Science Hackathon 2018
Following OMD’s first entry into I-COM’s annual Hackathon in Porto in 2017, Paul Cuckoo, Chris Morris, Giuseppe Angele and I, Aaron Brace, flew to San Sebastian on the northern coast of Spain to compete in this year’s event. The Hackathon itself consists of a gruelling 24 hours spent answering the two components of a data and marketing challenge. This year’s sponsor, Intel, set the first, quantitative component: predicting January 2018’s digital engagement with artificial intelligence, using only limited historical data from indexed Google Trends, Kantar’s reporting of media spend, Twitter comments, and web activity over the period. The second, qualitative component was to develop the right marketing strategies to be seen as a leader in AI, and the best tactics to engage customers and audiences. Competing with us were a blend of media agencies, specialist data-science consultancies, major European academic institutions, and commercial and financial companies. Needless to say, the competition was fierce.
The enormity of the situation washed over me like the unexpectedly stormy Spanish weather that ambushed teams en route, as we were allocated our room: a windowless music rehearsal space with a 30-ft ceiling. With acoustics designed for instrumental practice, all echo and reverberation were stripped from our voices, and our now alien timbre made the room feel low on oxygen. This is where it would be won or lost: our War Room, until we handed over the file to the competition jury.
Our ambition was to develop not only an accurate predictive model, but also an application or tool that would allow a user to simulate predictions and visualise core components of the data used. We began by isolating our KPIs, processing 13 million tweets in the AI space and a similar volume of web traffic hits. Once cleaned, aggregated and processed, these would become our two key variables to predict for the month of January 2018, forming the first component of the challenge. We hypothesised, however, that the Twitter data might also serve another function. If we could determine the frequency of specific keywords in the social data (using a list mined from web-traffic URLs and search terms), this might allow us to measure engagement with specific brands and keywords, and the extent to which that engagement was associated with brand perceptions or predicted by brand-specific media spend.
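To give a flavour of the keyword idea, here is a minimal Python sketch. It is an illustration under stated assumptions rather than the code used at the event: the keyword list, the sample tweets and the function name `keyword_counts_by_month` are all hypothetical, and each tweet is assumed to arrive as a (date, text) pair.

```python
import re
from collections import Counter, defaultdict

# Hypothetical keyword list; the real one was mined from web-traffic URLs and search terms.
KEYWORDS = ["machine learning", "deep learning", "intel"]


def keyword_counts_by_month(tweets, keywords=KEYWORDS):
    """Count how many tweets mention each keyword, grouped by month (YYYY-MM).

    `tweets` is an iterable of (iso_date, text) pairs, an assumed input format.
    """
    # Whole-word, case-insensitive match for each keyword.
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    counts = defaultdict(Counter)
    for date, text in tweets:
        month = date[:7]  # "2017-11-03" -> "2017-11"
        for kw, pattern in patterns.items():
            if pattern.search(text):
                counts[month][kw] += 1  # one count per tweet, not per occurrence
    return {month: dict(c) for month, c in counts.items()}


# Made-up sample data, for illustration only.
SAMPLE_TWEETS = [
    ("2017-11-03", "Intel announces a new deep learning chip"),
    ("2017-11-09", "Machine learning is everywhere"),
    ("2017-12-01", "Is deep learning overhyped? Intel thinks not."),
]

result = keyword_counts_by_month(SAMPLE_TWEETS)
```

Monthly counts like these could then be lined up against brand-specific media spend to look for an association, which is the relationship we were hoping to explore.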
Despite our best preparations with the small amount of sample data provided ahead of time, the period from 9am until after lunch was spent primarily on data wrangling: getting the data into the format required for the modelling and prediction work to begin. Building the models long into the small hours, Paul and Giuseppe knew that they had to produce predictions with sufficient time for me to feed the data into the data tool, and for Chris to construct the story of the presentation and the ‘strategic implications for leadership’ section of the task. By 5am the models were complete, Paul and Giuseppe having used Gradient Boosting and ARIMA time series algorithms to make their predictions. The stage was set for me to frantically pipe the data into our application and finalise the functionality, and for Chris to craft the media tale and make some sense of the last 20 hours of chaos. With the 9.30am deadline looming and none of us having slept, there was still sufficient time for me to have some (not metaphorical) last-minute data frustrations before Paul ran to submit our USB drive at 9.28am. A cool 2 minutes to spare, never in doubt.
We now waited for the presentations of findings, which were split into two groups: Tier 2 (designed for those with less experience, or university entrants) and, intimidatingly, our tier, Tier 1 (for experts or more experienced data scientists). The submissions and stories presented by teams varied widely in the qualitative element: the University of Kiev presented a solution based on using conferences to drive social engagement in the AI space, and Ekimetrics proposed a strategy built closely around the use of traditional print media. Other teams’ strategies were constructed around the concept of tweet sentiment, using positive sentiment as a measure of engagement with a brand. In my opinion, however, it may have been optimistic to expect sentiment to vary enough to predict engagement in tech contexts. Our solution was built primarily around the idea of AI strategy piggybacking on ‘cognitive consonance’, aligning the targeting for the product with the product itself: if you want to influence through technology, use technology. With only 6 minutes to showcase 24 hours’ work, Chris’ presentation had to be more than concise, and then it was over. Whilst we narrowly missed out on being chosen as one of the top two finalists, our position did leave us the best-performing media agency in Tier 1.
One key learning from competing across this 24-hour period was that you need everything to go your way early on, from the choice of models and the efficiency of the code used to merge and manipulate your data, right down to the choice of technology at hand. In hindsight, 24 hours is a very short period in which to be playing from behind, which we were at times, despite being satisfied with our performance.
We learned a huge amount, both from seeing other teams’ approaches to data challenges and from witnessing first hand how the industry engaged with and responded to the proposed strategies. We have gained key experience of how data science is viewed by tech leaders like Intel themselves, and of how even pioneering brands in the tech space see its value in marketing and in developing business solutions, as AI becomes more relevant in our space and across the industry.
We’d love to chat to anyone interested in our experiences, or anyone who feels there is something they could take from our learnings, whether from a data-science, analytics, tool-building or data-storytelling perspective. Please contact us at [email protected]