Editor’s Note: We found this great example of data mining and thought it would be a shame not to share it with you. Neil Kodner analysed the data from all the tribute messages that were sent to Apple after Steve Jobs passed away and checked for patterns and trends in what people were saying. Here is how he did it…
Neil Kodner.com
Two weeks have passed since Apple’s Co-Founder/CEO Steve Jobs passed away. Upon his passing, Apple encouraged people to share their memories, thoughts, and feelings by emailing rememberingsteve@apple.com. Earlier this week, Apple posted a site (http://www.apple.com/stevejobs) in tribute to Steve Jobs. According to the site, over a million people have submitted messages. The site cycles through the submitted messages.
I decided to take a closer look at what people are saying about Steve Jobs, as a whole. Looking at how the site updates, it appears to use Ajax to retrieve and display new messages. Using Chrome’s developer tools, I monitored the requests it was making to get the new messages.
Once I found the location of the individual messages, it was trivial to download all of them. The message endpoint URLs are in the format
and a sample message looks like
The site makes a request to http://www.apple.com/stevejobs/messages/main.json which returns
So it appears that the site cycles through 10975 messages. I didn’t dig into the JavaScript powering the site to confirm this; I just made an assumption. I tried querying values greater than 10975 and they returned 404. I wrote a quick Python program to download the messages:
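A download loop along these lines might look like the sketch below. It is not the original program. The per-message URL pattern is not reproduced in this post, so `url_template` is a hypothetical parameter the caller has to supply, and `fetch_json` and `download_messages` are illustrative names. The sketch writes one JSON document per line and skips any message number that returns an HTTP error such as 404.

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_json(url):
    """Fetch a URL and decode the response body as JSON."""
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def download_messages(url_template, message_ids, out_path, fetch=fetch_json):
    """Save each message as one JSON line in out_path; return how many were saved.

    url_template is an assumption: a string with an {n} placeholder for the
    message number. The real endpoint format is not shown in this post.
    """
    saved = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for n in message_ids:
            try:
                message = fetch(url_template.format(n=n))
            except HTTPError:
                # Numbers outside the valid range returned 404; skip them.
                continue
            out.write(json.dumps(message) + "\n")
            saved += 1
    return saved
```

Calling `download_messages(template, range(1, 10976), "stevejobs_tribute.txt")` would try message numbers 1 through 10975. Whether numbering starts at 0 or 1 is also an assumption here.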
So now we have over ten thousand tribute messages saved to the file stevejobs_tribute.txt. What I was most interested in was seeing how many of these messages contain a reference to a particular Apple product.
I came up with a few search terms based on some legendary Apple product names including
- Newton
- Macintosh
- MacBook
- iBook
- Mac
- iPhone
- iPod
- iMac
- iPad
- Apple II family
- OSX
- iMovie
- Apple TV
- iTunes
- LaserWriter (yes, Laserwriter)
Each product received an entry in a Python dictionary. The value is another dictionary containing a regex for the product name and a count for the running total. Some of the regular expressions are as simple as testing for an optional s at the end of the product name; some are a little more complex. Check the Apple II regular expression, which is meant to match the entire Apple II product line. As I’m OK but not great with regular expressions, I welcome your corrections.
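To make the structure concrete, here is a minimal sketch of such a dictionary for a handful of the products. The patterns are illustrative guesses, not the original regular expressions, and `make_entry`, `build_products`, and `count_mentions` are hypothetical names. Each message counts at most once per product, and the function also returns how many messages mention at least one product.

```python
import re


def make_entry(pattern):
    """One product entry: a compiled, case-insensitive regex and a running count."""
    return {"regex": re.compile(pattern, re.IGNORECASE), "count": 0}


def build_products():
    # Assumed patterns for a subset of the products listed above.
    return {
        # \b keeps "Mac" from matching inside "iMac", "MacBook" or "Macintosh".
        "Mac": make_entry(r"\bMacs?\b"),
        "iMac": make_entry(r"\biMacs?\b"),
        "iPhone": make_entry(r"\biPhones?\b"),
        "iPod": make_entry(r"\biPods?\b"),
        # Apple II, Apple 2, Apple ][, Apple //, with an optional e, c, gs or +
        # suffix; the lookahead rejects longer names such as "Apple III".
        "Apple II": make_entry(
            r"\bApple\s*(?:\]\[|//|II|2)\s*(?:e|c|gs|\+)?(?![A-Za-z0-9])"
        ),
    }


def count_mentions(messages, products):
    """Update each product's count; return the number of messages naming any product."""
    any_product = 0
    for text in messages:
        mentioned = False
        for entry in products.values():
            if entry["regex"].search(text):
                entry["count"] += 1
                mentioned = True
        if mentioned:
            any_product += 1
    return any_product
```

Dividing the returned total by the number of messages gives the share of messages that name a product.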
Here’s a screenshot of me testing the Apple II regular expression, using the excellent Regexr.
Overall, out of 10975 messages downloaded (as of now), 2,186, or just under 20%, mentioned an Apple product by name. Here’s the breakdown of the products mentioned:
More than one out of every ten messages included a reference to a Mac! Nearly one in ten mentioned an iPhone – not bad for a device that’s been out a fraction of the time the Mac has been available.