Data journalism – is it worth it?

IN PUBLISHING – By Paul Bradshaw

Whether it is the desire to replicate the enormous sales successes of the MPs’ expenses and WikiLeaks revelations, or publishers wanting to expand into selling data services, it seems everyone wants to do something with data. The only question, writes Paul Bradshaw, is: where to start?

When Simon Rogers first asked to publish data on the Guardian website, someone asked: “Who on earth would want to look at a spreadsheet online?” It turned out that over 100,000 people would regularly hit the website to do just that. One person’s audit, it seemed, was another’s sticky content. And the past few years have seen data transformed from conversation killer to hot topic – in both newsroom and boardroom.

Tapping into development talent

For some publishers, the advantage of a data-driven approach to news production is that it allows them to tap into latent development talent within the readership. The Guardian and the New York Times are among an increasing number of media organisations to publish APIs – Application Programming Interfaces – that allow web developers to build new products with their content and – equally importantly – the data surrounding it. In return, the new services can carry advertising sold by the publisher, drive new traffic to the original site, or act as market research to demonstrate demand for a more developed proposition (as happened, for example, with the Guardian’s mobile app).

To stimulate this development, publishers hold ‘Hack Days’ where developers are invited to spend a day or a weekend creating quick editorial ‘hacks’. The investment is minimal compared with the cost of doing everything in-house: a small amount of staff time, and a lot of pizza.

Hack day events have led to all sorts of outcomes, from personalised mobile editions and applications that alert people to events and route them to the location, to a tool that suggests recipes based on an image uploaded by the user. The Guardian say they benefit from “being able to reach new markets that we might not otherwise find. We grow our vertical ad network through high quality partners [taking part in hack days]. We’re also able to offer our end users innovative, clever and useful interactive services provided by experts outside of our domain.” [Read more…]

7 ways to get data out of PDFs

HELP ME INVESTIGATE – By Paul Bradshaw

A frequent obstacle in data journalism is when the information you want to analyse is locked away in a PDF. Here are 6 ways to tackle that problem – with space for a 7th:

1) For simple PDFs: Google Docs’ conversion facility

Google Docs recently added a feature that allows you to convert a PDF to a ‘Google document’ when you upload it. It’s pretty powerful, and about the simplest way you can extract information.

It does not work, however, if the PDF was generated by scanning – in other words if it is an image, rather than a document that has been converted to PDF.
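Once a text-based PDF has been converted, tables often come out as lines of text in which the columns are separated by runs of spaces. As a minimal sketch, assuming exactly that layout (two or more spaces between columns, single spaces only inside values), the Python below turns such lines into CSV that any spreadsheet can open. The function name `rows_to_csv` and the sample rows are illustrative and not part of any tool mentioned here; real documents usually need per-document tweaks, for example for empty cells, which this approach cannot detect.

```python
import csv
import io
import re

# Assumption: columns are separated by two or more whitespace characters,
# while single spaces only occur inside cell values (e.g. "North West").
COLUMN_GAP = re.compile(r"\s{2,}")


def rows_to_csv(text: str) -> str:
    """Split each non-blank line on wide gaps and return the rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # quotes cells that contain commas, e.g. "1,200"
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines left over from page breaks
        writer.writerow(COLUMN_GAP.split(line.strip()))
    return buffer.getvalue()


if __name__ == "__main__":
    # Illustrative sample only, not real data.
    sample = """Area          Spend (GBP)   Year
North West    1,200         2010
"""
    print(rows_to_csv(sample))
```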

2) For scanned documents and pulling out key players: Document Cloud

Document Cloud is a tool for journalists to convert PDFs to text. It also adds ‘semantic’ information along the way, identifying which organisations, people and other ‘entities’ (such as dates and locations) are mentioned in a document, and it has some useful features that allow you to present documents for others to comment on.

The good news is that it works very well with scanned documents, using Optical Character Recognition (OCR). The bad news is that you need to ask permission to use it, so if you don’t work as a professional journalist you may not be able to use it. Still, there’s no harm in asking. [Read more…]

Nato operations in Libya: data journalism breaks down which country does what

THE GUARDIAN – By 

How much is each Nato country contributing to operations in Libya? Here’s the most comprehensive analysis yet of who is doing what

Graphic: Nato operations in Libya, broken down by data journalism

Nato’s Libya operations are costing millions and involve thousands of airmen and sailors. But who is contributing to Operation Unified Protector? That’s the official name for the attacks on the Gaddafi regime’s bases and tanks by Nato aircraft and ships, plus the enforcement of the no-fly zone and the arms embargo.

Data journalism can help us find out. Nato, which has been running operations in Libya since the beginning of April, doesn’t give out details of individual members’ efforts, so we went direct to each country’s defence ministry to find out for ourselves.

We wanted answers to some specific questions, covering the period up to the end of the first week of May. We set precise parameters: details for the first week of operations, operations taking place in the week commencing 2 May, and totals for the whole operation up to 5 May. We asked each country:

• How many aircraft, ships and military personnel are in the region?
• How many attacks and sorties has each country been involved in?
• Which base are they operating from?

By combining official responses, scraping the defence ministry websites of each country and news reports, we assembled the most complete breakdown of the Nato operation yet published. [Read more…]
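The Guardian doesn’t describe how it merged the three kinds of source. Purely as an illustration, here is a sketch of one way to reconcile figures like these, assuming a simple rule that an official response beats a figure scraped from a ministry website, which in turn beats a news report. The `Figure` record, the `SOURCE_PRIORITY` ranking and the function names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical trust ranking (lower wins): official response, then a figure
# scraped from a defence ministry website, then a news report.
SOURCE_PRIORITY = {"official": 0, "scraped": 1, "news": 2}


@dataclass
class Figure:
    country: str
    metric: str   # e.g. "aircraft", "ships", "sorties"
    value: int
    source: str   # one of the keys of SOURCE_PRIORITY


def combine(figures: list[Figure]) -> dict[str, dict[str, int]]:
    """Keep, per country and metric, the value from the most trusted source."""
    best: dict[tuple[str, str], Figure] = {}
    for fig in figures:
        key = (fig.country, fig.metric)
        current = best.get(key)
        if current is None or SOURCE_PRIORITY[fig.source] < SOURCE_PRIORITY[current.source]:
            best[key] = fig
    table: dict[str, dict[str, int]] = {}
    for (country, metric), fig in best.items():
        table.setdefault(country, {})[metric] = fig.value
    return table


def shares(table: dict[str, dict[str, int]], metric: str) -> dict[str, float]:
    """Each country's percentage of the total for one metric; countries without it are skipped."""
    values = {c: m[metric] for c, m in table.items() if metric in m}
    total = sum(values.values())
    return {c: 100 * v / total for c, v in values.items()} if total else {}
```

For example, if one country’s sorties were reported as 10 by a news story and 12 in an official response, `combine` keeps 12, and `shares(table, "sorties")` then gives each country’s percentage of the combined total.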

#Sparktweets: Wall Street Journal visualising data in tweets

NEWS:REWIRED – by Sarah Marshall

The Wall Street Journal has started using data visualisation (albeit in a fairly simple form) in tweets, using an online tool called Sparkblocks. The tweets are being called “sparktweets”.

Other so-called sparktweets have since been created.

We tracked the use of the hashtag #sparktweets using Hashtags.org.

Zach Seward’s blog explains how the Wall Street Journal’s unemployment sparktweet came about. He says that the team first tried using Unicode to display graphics in tweets, but found there were problems when viewing on Macs. [Read more…]
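We don’t know exactly how Sparkblocks builds its charts, but one common technique for text sparklines relies on Unicode’s eight “lower block” characters (U+2581 to U+2588), which increase in height. Each number in a series is scaled between the series’ minimum and maximum and mapped onto one of those bars. The function name `sparkline` is our own, and how well the bars render depends on the fonts on the reader’s device.

```python
# The eight Unicode "lower block" characters, U+2581 to U+2588, shortest to tallest.
BLOCKS = "▁▂▃▄▅▆▇█"


def sparkline(values: list[float]) -> str:
    """Map each value onto one of eight bar heights, scaled between min and max."""
    if not values:
        return ""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        return BLOCKS[0] * len(values)  # flat series: draw every bar at the lowest height
    last = len(BLOCKS) - 1
    return "".join(BLOCKS[round((v - low) / span * last)] for v in values)
```

For example, `sparkline([3, 5, 9, 4])` returns `"▁▃█▂"`, a string short enough to fit in a tweet alongside some text.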

16 Awesome Data Visualization Tools

MASHABLE – by 

From navigating the Web in entirely new ways to seeing where in the world tweets are coming from, data visualization tools are changing the way we view content. We found the following 16 apps both visually stunning and delightfully useful.

Visualize Your Network with Fidg’t
Fidg’t is a desktop application that aims to let you visualize your network and its predisposition for different types of things like music and photos. Currently, the service has integrated with Flickr and last.fm, so for example, Fidg’t might show you if your network is attracted or repelled by Coldplay, or if it has a predisposition to taking photos of its weekend partying. As the service expands to support other networks (they suggest integrations with Facebook, Digg, del.icio.us and several others are in the works), this one could become very interesting.

See Where Flickr Photos are Coming From
Flickrvision combines Google Maps and Flickr to provide a real-time view of where in the world Flickr photos are being uploaded from. You can then enlarge the photo or go directly to the user’s Flickr page.

See Where Tweets are Coming From
From the maker of Flickrvision (David Troy) comes Twittervision, which, you guessed it, shows where in the world the most recent tweets are coming from. Troy has taken things one step further with Twittervision and has given each user a page where you can see all of their location updates.

New Ways to Visualize Real-Time Activity on Digg
Digg Labs offers three different ways to visualize activity in real-time on the site, building on the original Digg Spy feature.

BigSpy places stories at the top of the screen as they are dugg. Stories with more diggs show up in a bigger font, and next to each one you can see the number of diggs in red.

[Read more…]

Simon Rogers, guardian of the Data Store [VIDEO]

The Guardian is one of the most respected newspapers when it comes to data journalism and data visualizations. Their website has a section dedicated to data where people can enjoy beautiful infographics made by the likes of David McCandless and other data visionaries.

We met Simon at his desk at the Guardian to talk about the Data Blog and the impact of WikiLeaks on journalism. Look out for his tips on good data visualizations!

[vimeo 27072059]

Ad Agency Bloodline [Infographic]

AGENCY SPY

The Barbarian Group has been busy with some pretty interesting projects as of late, and here’s yet another notch on the totem. The digital shop sent us this ambitious effort, a team-up with newly launched Aquent unit Vitamin Talent, which is essentially a lovely visual display of the ad business (including the seven major holding companies and stats on the rest) through its 180-some-odd-year history. We’d like to provide you with a worthy synopsis of this infographic, but it wouldn’t do it justice. See the full image and the original post on Agency Spy.

Interested in data-driven journalism? Get your voice heard!

The DJB supports good causes, and when we heard that the European Journalism Centre was running this survey on data-driven journalism, we couldn’t help but blog about it! By getting involved and answering the survey, you could not only win 100€ worth of Amazon vouchers but also make a good contribution to the future of data journalism. What a great feeling… Needless to say, we’ve all done it. What are YOU waiting for?

by Liliana Bounegru from EJC

The European Journalism Centre (EJC), in collaboration with Mirko Lorenz (Deutsche Welle), created a survey that aims to gather journalists’ opinions on the emerging practice of data-driven journalism and to understand their training needs in this field.

Data has always been used as a source for reporting, especially by investigative journalists, and will play an increasingly important role in journalism in the future. In the past, however, data-driven investigative operations required a lot of time and resources, and with increasing pressure on newsrooms to be more time- and cost-efficient, they remained a marginal practice.

Why data-driven journalism?

Data-driven journalism enables journalists and media outlets to produce value and revenue without the large investments of time and resources that data-driven investigative operations required in the past, and so has the potential to spread the practice more evenly across newsrooms. This is partly due to the growing availability of open data catalogues, which reduce the time journalists need to get hold of valuable data, and of free and open tools for data interrogation and visualization that lend themselves to non-expert use, making data-driven reporting easier to undertake. The most notable data journalism operation in Europe, the Guardian Data Blog, works mainly with Excel or Google spreadsheets and free tools for data interrogation and visualization; until not long ago it was a one-man show, at times using crowdsourcing for data analysis.

How to understand what journalists need?

To enable more journalists and newsrooms across Europe to tap into the potential of data-driven journalism, the European Journalism Centre plans to organize a series of training sessions this year and next. To understand what journalists need in order to practice data journalism, we created a survey. The survey has 16 questions asking journalists for their opinions on data journalism, on aspects of working with data in their newsrooms, and on what they are interested in learning.

Answer the survey and get your voice heard!

We’ve had a good start: in just over a week, more than 80 journalists responded. If you are a journalist, we would be grateful if you took 10 minutes of your time to take the survey and help us understand what is useful for journalists, so that we can organize training that fits real needs. To say thank you, one of the entries will win a 100€ Amazon gift voucher.

The insights from this survey will be made freely available. We would also much appreciate help with tweeting, blogging or forwarding this to relevant people you might know.

DATA VISUALISING THE STORY OF FOOD AND EMOTION

OWNI.eu by EKATERINA YUDIN

How do we even begin to visualize, and draw connections within, the intimately complex relationship between food and emotion? Here is a great article by Ekaterina Yudin that we picked for its compelling data visualisations. You can find the original version on the Masters of Media website, or read on! It is worth it.

Can we discover patterns amongst global food trends and global emotional trends? Could data visualization help us weave a story, and make use of the complex streams of data surrounding food and its consumption, to reveal insights otherwise invisible to the naked eye? And why would we try to do so in the first place?

To begin, let’s just establish that one has an ambitious appetite.

For our group information visualization project we have set out to measure global food sentiment. The main objective of our project matches the very definition of information visualization first put forth by Card et al. (1999): using computer-supported, interactive, visual representations of data to amplify cognition, where the main goals of insight are discovery, decision making (as investigated in my last post), and explanation. Our mission is to gauge and visualize, in real time, the planet’s feelings towards particular foods using Twitter data: does pizza make everyone happy, do salads make people sad, does cake comfort us? Will there be a correspondence between foods and nations?
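The post doesn’t say how the group scores sentiment. One of the simplest possible approaches, sketched below under that assumption, is to count words from a positive and a negative list in each tweet and average the result per country and food. The word lists, the `FOODS` set and the `food_sentiment` function are hypothetical placeholders; a real project would use an established sentiment lexicon, and plain word counting misses negation and sarcasm (“not happy” scores as positive).

```python
import re
from collections import defaultdict

# Hypothetical word lists; a real project would use a published sentiment lexicon.
POSITIVE = {"love", "happy", "yum", "delicious", "comfort"}
NEGATIVE = {"hate", "sad", "gross", "sick", "guilty"}
FOODS = {"pizza", "salad", "cake"}

WORD = re.compile(r"[a-z']+")


def food_sentiment(tweets: list[tuple[str, str]]) -> dict[tuple[str, str], float]:
    """Average sentiment per (country, food) pair.

    Each tweet is a (country, text) pair. A tweet scores +1 per positive word
    and -1 per negative word, and counts towards every food it mentions.
    """
    totals: dict[tuple[str, str], int] = defaultdict(int)
    counts: dict[tuple[str, str], int] = defaultdict(int)
    for country, text in tweets:
        words = WORD.findall(text.lower())
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        for food in FOODS.intersection(words):
            totals[(country, food)] += score
            counts[(country, food)] += 1
    return {key: totals[key] / counts[key] for key in counts}
```

Because the word lists are English, a scorer like this inherits the same English-only bias the project already acknowledges.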

Setting the visualization against the backdrop of country GDP and obesity levels, we can begin to ponder how social, political and cultural issues will play out and what reflections of globalization will emerge. Will richer countries be more obese? It should be noted that being restricted to English-language tweets for now creates a huge bias in our visualization, and the snapshot of data will not be fully representative of the entire world; for example, in developing countries it is likely that mostly richer, more modern people both speak English AND use Twitter.
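One way to begin answering “Will richer countries be more obese?” is to compute, across countries, the correlation between GDP per head and obesity rate. The sketch below calculates Pearson’s correlation coefficient r, which runs from -1 to +1; the function name is our own, and the per-country figures would have to be gathered separately. A value near +1 would only show that the two measures move together in the sample, not that wealth causes obesity.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient r between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # r = covariance / (std_x * std_y); the 1/n factors cancel out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return cov / sqrt(var_x * var_y)
```

On Python 3.10 or later, `statistics.correlation` from the standard library computes the same coefficient.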

The relationships between all the variables are already enigmatic, particularly since each carries its own layers of baggage, so a narrative of complexity emerges even before the visualization can be realized. Incidentally, this is the story the data is already beginning to weave, which makes it a perfect case for data visualization: to reduce the complexity, present it in a meaningful way we can understand, and use the power of storytelling to understand our puzzling relationships towards food. It is a story worth discovering.

WHY FOOD?

Food is at the core of our daily survival, with broad-ranging effects on personal health, and it is a particularly hot topic these days, with everyone having some opinion about it; after all, everyone needs it, which makes food intrinsically emotional. So it is no surprise that a wealth of conversations about food emerge when today’s increased citizen interest, health focus and demand for a transparent food industry collide; to top it off, this is all happening amid concerns about food security, shortages, rising food prices, obesity, hunger, addiction and disease. With data related to food increasingly open, the benefits of using data visualization, as well as the empowerment that access to layers of hidden information produces, are already being explored on the web.

A brief survey of food visualizations reveals: the ten most carnivorous countries, world hunger visualization, how the U.S.A. was much thinner not that long ago, snacks available in middle and high school vending machines, calories per dollar, driving is why you’re fat, where Twinkies come from, and so on.

Health issues related to food feature heavily in the corpus of visualizations, which is no surprise. With improved access to information about food (sources, ingredients, effects, consumption statistics, etc.) presented in a visually engaging way, we can begin to distill the essential changes that could then impact our food-purchasing choices, enable better health, and enhance the design of an open food movement. [An additional reel of 60 food/health infographics can be found here].

Food is not just a lifestyle that is essential and important to the world. It can also be one of the most effective ways to reshape health, poverty issues, and relationships; and because it touches all facets of life, it shouldn’t be treated as just a lifestyle’y sort of thing. –Nicola Twilley (FoodandTechConnect Interview)

What’s the insight worth?

Beyond helping us discover new understandings in a profoundly complicated world, where massive amounts of information create a problem of scale, a great visualization can help create a shared view of a situation and align people on needed action; it can often make people realize they are more similar than different, and that they agree more than they disagree. And it is precisely via stories, which are compelling and have always been used to convey information, experiences, ideas and cultural values, that we can begin to better understand the world and transform the interdependent factors of food and sentiment discussions into a visual form that makes sense. In this way food, a naturally social phenomenon, can become our lens for revealing patterns in society.

A multitude of blogs, projects and companies such as GOOD’s Food Studies, Food+Tech Connect, The Foodprint Project, innovation series (like the interactive future of food research) and, lest we forget, Jamie Oliver’s Food Revolution, to name just a few, propel the exploration, understanding and reshaping of conversations about food, health and technology today and in the future (Food+Tech Connect, 2011). But it is the newest wave of infographics and data visualizations that seeks to draw our attention to epidemics such as food shortages and obesity by illustrating the meaning in the numbers so that people can truly see and understand the implications.

A WEB OF FEELINGS

We also can’t entirely separate feelings from food. People consistently experience varying emotional levels (see Natalie’s post on this very subject), and these play key roles in our daily decision-making. Emotions, too, have now begun to be mapped in visualizations ranging from a map of a nation’s well-being to a view of the world’s mean happiness.

Taking food and emotion together, we come to understand that this data of the everyday paints a picture and hyper-digitizes life in such a way that self-portraits and global portraits of food consumption patterns begin to emerge. As psychology researchers have shown us, people are capable of a diverse range of emotions. And because food provides a sense of place, a soothing and comforting feeling, it evokes strong emotions that tie it right back to people (Resnick, 2009).

Now that we spend much of our time online, our feelings and raw emotions, too, find their way to the web. We can visualize this phenomenon with projects like We Feel Fine, which taps into our own and other people’s emotions by scanning the blogosphere and mapping the entire range of human emotions (thereby essentially painting a picture of international human emotion); I want you to want me, which explores the complex relationship of love and hope amongst people; Lovelines, which illuminates the emotional landscape between love and hate; and The Whale Hunt, which explores death and anxiety.

What all these visualizations have in common is the critical component of an emotional aesthetic: the display of people’s bubbling feelings, which are often removed from visualizations but are the very human aspect we tend to remember. This is in line with the philosophy Gert Nielsen shared with the audience at the Wireless Stories conference early last month: that you can’t take the human being out of the visualization, or else you take out the emotion, too. The key, it seems, is that data should ‘enrich’ the human stuff and the powerful human stories that are waiting to be captured and told.

MAKING DISCOVERIES AND SPREADING AWARENESS IN A SEA OF DATA

Which brings us to our data-deluge world. We’re increasingly dependent on data while perpetually creating it at the same time. But creating data isn’t the issue (at least not for Western and emerging countries, whereas producing relevant data for developing countries is still quite a challenge). The question is whether someone is paying attention to the data, and whether someone is using it usefully is an even larger question (Resnick, 2009).

The age of data accessibility, information sharing and connectivity allows people, cultures and institutions to share and influence each other daily via a plethora of broadcast platforms available on the web; these function as a public shout box for daily chatter, emotional self-expression, social interaction and commiseration. Twitter, the social media network, twenty-four-hour news site and conversation platform that connects those with access across the world, is also the chosen data pool for our project. It’s a place to share just as much as it is to peek into other lives and conversations. And precisely because millions of people express feelings and opinions there about every issue, distilling knowledge from this huge amount of unstructured data becomes a challenging task. In this case, visualization can serve to extend the digital landscape to better understand broadcasts of human interaction. Our digital lives, and the conversations within them, are full of traces we leave behind. But by transcoding and mapping these into visual images, representations and associations, we can begin to comprehend their meaning.

Twitter is also a narrative domain, and serves as a platform for Web 2.0 storytelling: the telling of stories using Web 2.0 tools, technologies, and strategies (Alexander & Levine, 2008). Alexander and Levine (2008) distinguish such Web 2.0 projects as having features of micro-content (small chunks of content, each conveying a primary idea or concept) and social media (platforms that are structured around people). With the number of distributed discussions across Twitter, a new environment for storytelling emerges, one we will explore to uncover and analyze global patterns amongst conversations surrounding food sentiment.

SO WHAT’S THE FOOD + EMOTION STORY?

As put forth by Segel & Heer (2010), each data point has a story behind it, in the same way that every character in a book has a past, present and future, with interactions and relationships that exist between the data points themselves. Thus, to reveal the information and stories hiding behind the data, we can turn to the storytelling potential of data visualization, where visualization can serve to create new stories and insights that can ultimately function in place of a written story. These new types of stories, made possible by data visualization, open the door to the free exploration and filtering of visual data, which according to Ben Shneiderman also allows people to become more engaged (Singer, 2011).

To date, the storytelling potential of data visualization has been explored and popularized by news organizations such as the NY Times and the Guardian, where visualizations of news data are used to convince us of something (humanize us), compel us to action, enlighten us with new information, or force us to question our own preconceptions (Yau, 2008). There is a growing sense of the importance of making complex data visually comprehensible, and this was the very motivation behind our project: linking food and emotion sentiment with country GDP and obesity to see if insightful patterns emerge using this new visual language. With our visualization still in progress and data still dispersed, I’m still wondering what the story is and what the story of our visualization could become. Will the visualization of our data streams produce something insightful? What will we be able to say about how people feel towards foods in different countries? At this point it’s only a matter of time until we dig deeper into the complexities of our real-world data to understand the (food <–> emotion) <–> (income <–> obesity) paradox.

This post was originally published on Masters of Media

Photo Credits: The New York Times; R. Veenhoven, World Database of Happiness, Trend in Nations, Erasmus University Rotterdam; World Food Program; GOOD and Hyperakt; A Wing, A prayer, Zut Alors, Inc. and GOOD; and Flickr CC Kokotron

References:

Alexander, B. & Levine, A. (2008). “Web 2.0 Storytelling: Emergence of a New Genre”. Web. Educause. Accessed on 19/04/11

Card, S. K., Mackinlay, J. D., & Shneiderman, B. (1999). “Readings in Information Visualization: Using Vision to Think”. Morgan Kaufmann, San Francisco, CA, USA.

Resnick, M. (2009). “The Moveable Feast of Memory”. Web. PsychologyToday.com. Accessed on 20/04/11

Segel, E. & Heer, J. (2010). “Narrative Visualization: Telling Stories with Data”. IEEE Transactions on Visualization and Computer Graphics.

Singer, N. (2011). “When the Data Struts Its Stuff”. Web. NYTimes.com. Accessed on 19/04/11

Yau, N. (2008). “Great Data Visualization Tells a Great Story”. Web. FlowingData.com. Accessed on 20/04/11


Opening the data, with Rufus Pollock [AUDIO]

Rufus Pollock is the co-founder of the Open Knowledge Foundation. He spent the past few months travelling across Europe to promote the rise of open data and to make people aware that they need more of it, as well as more transparency from their governments and big organisations.

We met with him in the busy Hub in London to ask him why this “data openness” isn’t widespread yet…

[audio:https://www.datajournalismblog.com/wp-content/uploads/2011/05/Rufus-Pollock-for-DJB1.mp3|titles=Rufus Pollock for DJB1]