Master Europa

Dundee DDD conference

I’ve spent the last few days at the University of Dundee, working on my master’s project and preparing for the DDD day.
Here’s a quick review of the sessions I went to – it was a great day, shame it’s only once a year.

Data mining the social web

Gary’s (@garyshort) talk focused on Twitter for marketing analysis. He’s an accomplished, very natural presenter who easily held my attention for an hour, so not much else to say on that.

I picked out the calculations used to measure the data. I probably would have called the session “reporting on the social web” rather than data mining, but maybe that’s being too pedantic and not in the spirit of the day.

Posts by time of day – if people are posting, they’re probably reading too.
Busiest hour = best time to put out marketing messages
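
Gary didn’t show code, so this is just my own rough sketch of how the busiest hour might be found, assuming you already have each tweet’s timestamp as a Python datetime. The function name and sample data are made up for illustration.

```python
from collections import Counter
from datetime import datetime

def busiest_hour(timestamps):
    """Return (hour, count) for the hour of day with the most posts."""
    counts = Counter(ts.hour for ts in timestamps)
    # most_common(1) returns a list holding the single top (hour, count) pair
    return counts.most_common(1)[0]

# Hypothetical sample timestamps, not real data from the talk
sample = [
    datetime(2012, 1, 2, 9, 15),
    datetime(2012, 1, 2, 13, 5),
    datetime(2012, 1, 2, 13, 40),
    datetime(2012, 1, 3, 13, 20),
    datetime(2012, 1, 3, 18, 0),
]

print(busiest_hour(sample))
```

With real data you would probably want to convert everything to the audience’s local time zone first, otherwise the “busiest hour” could be several hours out.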

Acceleration graph – how fast do they start to talk? First derivative of tweet volume over time.
Shows conversations – how engaging is the product?
Use standard deviation to alert when acceleration falls outside the norm
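
Here is my own guess at what the acceleration alert could look like (again, not code from the talk). I’m assuming tweet counts per fixed interval, taking the difference between neighbouring intervals as the “acceleration”, and flagging any change more than k standard deviations from the mean change. The function names and the default k = 2 are my assumptions.

```python
from statistics import mean, stdev

def acceleration(counts):
    """First difference: change in tweet count between consecutive intervals."""
    return [b - a for a, b in zip(counts, counts[1:])]

def unusual_changes(counts, k=2.0):
    """Return indexes of intervals whose change lies more than k standard
    deviations away from the mean change."""
    acc = acceleration(counts)
    mu, sigma = mean(acc), stdev(acc)
    # acc[i] is the change going into interval i + 1
    return [i + 1 for i, a in enumerate(acc) if abs(a - mu) > k * sigma]

# Hypothetical tweets-per-hour counts with a sudden spike at the end
counts = [10, 12, 11, 13, 12, 11, 13, 12, 14, 13, 45]

print(acceleration(counts))
print(unusual_changes(counts))
```

One catch with this simple version is that a big spike inflates the standard deviation it is being compared against, so in practice it may be better to work out the mean and deviation from a quiet historical period.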

Share of tweets by day, in market space you’re in. Buzz or volume of tweets
Doesn’t show sentiment, but all tweets will be referenced by search engines
I thought this would be difficult to measure in practice: getting all the tweets for the market space you’re in could end up with many different definitions.

Top 10 words by frequency. tab
Reveals words to use in marketing – Google AdWords
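
A quick sketch of a word count like this, as I imagine it (not from the talk). The stop-word list is a tiny made-up one and the regular expression is deliberately crude, so hashtags, mentions and links would need proper handling on real tweets.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer
STOP_WORDS = {"the", "a", "to", "is", "and", "of", "in", "for", "rt"}

def top_words(tweets, n=10):
    """Return the n most frequent words as (word, count) pairs."""
    words = []
    for text in tweets:
        # Lower-case the text and pull out runs of letters and apostrophes
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOP_WORDS:
                words.append(word)
    return Counter(words).most_common(n)

# Hypothetical tweets
tweets = [
    "Loving the new phone",
    "New phone is great",
    "Phone battery is poor",
]

print(top_words(tweets, 2))
```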

Most frequent posters – Influencers

Most retweets – also influencers, remove the bots (whose tweets are retweeted the most)
A measure of how much people agree with what you say
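
My own sketch of ranking authors by retweets while leaving out the bots. I’m assuming each tweet is a dictionary with "author" and "retweets" keys and that you already have a list of known bot accounts; both are assumptions of mine, not anything shown in the session.

```python
from collections import Counter

def most_retweeted(tweets, known_bots=frozenset(), n=10):
    """Total retweets per author, skipping known bots, highest first."""
    totals = Counter()
    for tweet in tweets:
        if tweet["author"] not in known_bots:
            totals[tweet["author"]] += tweet["retweets"]
    return totals.most_common(n)

# Hypothetical data; the account names are made up
tweets = [
    {"author": "alice", "retweets": 5},
    {"author": "dealbot", "retweets": 50},
    {"author": "bob", "retweets": 2},
    {"author": "alice", "retweets": 3},
]

print(most_retweeted(tweets, known_bots={"dealbot"}))
```

Working out which accounts are bots is the hard part, and this sketch just assumes someone has already done it.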

Evangelist engagement – replies by post (number of replies / number of posts)
Indication of quality of relationship

Lexical diversity – measurement of vocabulary (distinct words / all words)
Indication of new content injected into community for each new tweet.

10% equals 1 in 10 tweets contains new information
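
The distinct words / all words ratio is simple enough to sketch. This is my own version, splitting on whitespace only, so punctuation and links would skew the figure on real tweets.

```python
def lexical_diversity(tweets):
    """Ratio of distinct words to all words across a list of tweet texts."""
    words = [word for text in tweets for word in text.lower().split()]
    if not words:
        return 0.0  # avoid dividing by zero when there is no text
    return len(set(words)) / len(words)

# Hypothetical tweets: 6 words in total, 3 of them distinct
tweets = ["good phone", "good battery", "good phone"]

print(lexical_diversity(tweets))
```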

What do conversations look like?
Outliers from main network are called cliques – need to be brought into network to be engaged

Where are my customers – geospatial info. Shows where the tweet was made from

How do I make links more effective? Link in 2/3 of tweet
Bit.ly API to show how many links were clicked…

Sentiment – 48 hours, minimal, hygiene
Trigrams, collocations

Overall a very good presentation. I would have liked to hear more on how the location data works and the sentiment info; I have no idea how a trigram works! Also a shame not to see the code, but that boy can talk LOL

SQL, one language to rule them all

Duncan Irving presented on this topic, showing the Teradata approach to data management following the acquisition of Aster Data.

Good breakdown of data-related tasks into 3 areas:

Knowledge discovery – data science, deriving new information – vaguer questions, how does something respond to factors
Decision support – operational, business-grade data
Deep freeze – data storage from an IT point of view

He then looked at the overlap between KD and DS as operational usage, and at the elements of data analysis and their users:
Mining – statisticians
Management – DBAs, data architects
Analysis – marketeers
Development – programmers

I like this definition of the law of big data:
more data outperforms more complex models.
I’d not heard that before… Also the phrase “repurposing data” is a great one: changing the way data is analysed, stored or formatted to fit a new purpose.

The rise of investigative analytics requires an investigative architecture.

Duncan gave a brief overview of the day-to-day role of a data scientist (big buzzword today):
Computer science, maths, data mining, choosing when a data warehouse or distributed processing is appropriate.
Integration
Investigation
Implementation – feedback to integrate
Output – data-driven products, insight for decisions, data warehouse

He finished with the different types of data activity, and the point that the EDW should be good at handling all of them.

The Teradata approach is to bring the world of NoSQL under the RDBMS umbrella using the Aster Data connectors, which allow SQL querying of all data.

Nice idea to fit it all together if you can afford it! One of the main ideas behind Hadoop is cheap commodity kit; not sure how that translates to the Teradata world.

It must be difficult for speakers tied to a particular company to present at an open event like DDD, but Duncan did an excellent job. It never felt like a sales pitch, it was clear, and he’s an entertaining speaker. Also loved the oil and gas seismology images.

Mobile CouchDB

Next I saw the talk on mobile CouchDB by Dale Harvey.
Nice relaxed style with clear simple slides (even if it was on a Mac).
Started with some great stats:
3 billion users online by 2015
With 15 billion connected devices

Mobile issues with:
Reliability
Latency
Bandwidth
Security
Topology

0.5 sec delay = 20% drop in traffic

I knew nothing about DBs on mobile devices or Couch. I thought Dale pitched it just right, so that a novice like me or some of the more experienced programmers could get something out of it. I’ll definitely download the app on my iPad and have a play around… I like the idea of creating a shopping list that gets sent to my wife’s phone.

Getting Started with Hadoop

Next up was my talk on getting started with Hadoop. I’ll leave you to tell me what you thought!

My impression was that, as usual, I speak too fast, miss stuff out and bugger about with the PC too much, but hey, I’m a geek, that’s all allowed! Only one bit of code messed up, when I forgot to transfer the tweet file to HDFS, and I didn’t trip up or knock anything over.

I must get in contact with Gary Short re: creating all his measures in Hadoop.

Then I had to shoot off to get to Edinburgh airport to fly home, so missed Prof. Whitehorn’s talk – hope Andy videoed it!
