Data-Driven Journalism In A Box: what do you think needs to be in it?

The following post is from Liliana Bounegru (European Journalism Centre), Jonathan Gray (Open Knowledge Foundation), and Michelle Thorne (Mozilla), who are planning a Data-Driven Journalism in a Box session at the Mozilla Festival 2011, which we recently blogged about here. This post is cross-posted at DataDrivenJournalism.net and on the Mozilla Festival Blog.

We’re currently organising a session on Data-Driven Journalism in a Box at the Mozilla Festival 2011, and we want your input!

In particular:

  • What skills and tools are needed for data-driven journalism?
  • What is missing from existing tools and documentation?

If you’re interested in the idea, please come and say hello on our data-driven-journalism mailing list!

Following is a brief outline of our plans so far…

What is it?

The last decade has seen an explosion of publicly available data sources – from government databases, to data from NGOs and companies, to large collections of newsworthy documents. There is increasing pressure on journalists to be equipped with the tools and skills they need to bring value from these data sources to the newsroom and to their readers.

But where can you start? How do you know what tools are available, and what those tools are capable of? How can you harness external expertise to help to make sense of complex or esoteric data sources? How can you take data-driven journalism into your own hands and explore this promising, yet often daunting, new field?

A group of journalists, developers, and data geeks want to compile a Data-Driven Journalism In A Box, a user-friendly kit that includes the most essential tools and tips for working with data. What is needed to find, clean, sort, create, and visualize data — and ultimately produce a story out of data?

There are many tools and resources already out there, but we want to bring them together into one easy-to-use, neatly packaged kit, specifically catered to the needs of journalists and news organisations. We also want to draw attention to missing pieces and encourage sprints to fill in the gaps as well as tighten documentation.

What’s needed in the Box?

  • Introduction
    • What is data?
    • What is data-driven journalism?
    • Different approaches: Journalist coders vs. Teams of hacks & hackers vs. Geeks for hire
    • Investigative journalism vs. online eye candy
  • Understanding/interpreting data
    • Analysis: resources on statistics, university course material, etc. (OER)
    • Visualization tools & guidelines – Tufte 101, bubbles or graphs?
  • Acquiring data
    • Guide to data sources
    • Methods for collecting your own data
    • FOI / open data
    • Scraping
  • Working with data
    • Guide to tools for non-technical people
    • Cleaning
  • Publishing data
    • Rights clearance
    • How to publish data openly
    • Feedback loop on correcting, annotating, adding to data
    • How to integrate data story with existing content management systems

What bits are already out there?

What bits are missing?

  • Tools that are shaped to newsroom use
  • Guide to browser plugins
  • Guide to web-based tools

Opportunities with Data-Driven Journalism:

  • Reduce costs and time by building on existing data sources, tools, and expertise.
  • Harness external expertise more effectively
  • Towards more trust in, and accountability for, journalistic outputs by publishing supporting data with stories. Towards a “scientific journalism” approach that values transparent, empirically backed sources.
  • News outlets can find their own story leads rather than relying on press releases
  • Increased autonomy when journalists can produce their own datasets
  • Local media can better shape and inform media campaigns. Information can be tailored to local audiences (hyperlocal journalism)
  • Increase traffic by making sense of complex stories with visuals.
  • Interactive data visualizations allow users to see the big picture & zoom in to find information relevant to them
  • Improved literacy. Better understanding of statistics, datasets, how data is obtained & presented.
  • Towards employable skills.

KF Alum to Lead Knight-Mozilla Effort

Photo by Daniel X. O'Neill

KNIGHT GARAGE – By PAM MAPLES

Dan Sinker, a 2008 Knight Fellow, is joining Mozilla to lead the Knight-Mozilla News Technology Partnership.

The program is funded by the Knight Foundation and run by Mozilla, makers of the Firefox web browser. The goal is to help create deeper collaboration between journalists and technologists through a series of design challenges like this one in San Francisco last spring, learning labs and a fellowship program that puts developers in residence at newsrooms around the world. This year, the partner newsrooms for fellows are Al Jazeera, the BBC, the Guardian, Die Zeit and the Boston Globe. [Read more…]

#wjchat Working towards the future of journalism

Some people think that the future of journalism is uncertain. At the DJB we believe that it is exciting, more data-friendly, and a bit scary and challenging.

It is not a big risk to say that most of it will happen online, but what is really interesting is witnessing it being made…

Every Wednesday, around midnight, while most people are scouring the internet or hypnotically scrolling down their Facebook page hoping to fall asleep, a Twitter community is actually working toward the future of online journalism.

#wjchat is a weekly online conversation for web journalists that tackles all things content, technology, ethics and business of journalism on the web.

It may not sound like the most exciting thing at first, but a Twitter search for the hashtag #wjchat and a look at its profile will show you that 2,300 people already follow its tweets, and that hundreds of tweets are posted every week on topics such as “innovation, culture and engagement”, “multi-media and visual storytelling”, or even “jobs and internships”.

Last week #wjchat was about the future of online journalism, hosted by the Chicago Tribune’s very own data guru, Brian Boyer. You can find out about what he does here.

Brian is also in charge of the very exciting PANDA project, one of this year’s Knight News Challenge winners, which is said to be set to revolutionise data journalism in a very practical way (more on the PANDA project here).

You can follow #wjchat discussions in many ways but our favourites are via Tweetchat and on Google+.

And like everyone else, #wjchat is also on Facebook.

Dozens of people from around the globe joined the discussion last week. The full archive of the #wjchat discussion is available on its website, but we have compiled the best tweets posted that day. Enjoy, and keep the discussion going in the comments!

 


2011-07-21T00:09:26Z
brianboyer (Brian Boyer)
A preface to tonight’s chat: “Take a set of encyclopedias and ask, ‘How do I make this digital?’ You get a Microsoft Encarta CD.”… #wjchat


2011-07-21T00:10:18Z
brianboyer (Brian Boyer)
“Take the philosophy of encyclopedia-making and ask…” #wjchat

2011-07-21T00:10:30Z
brianboyer (Brian Boyer)
“‘How does digital change our engagement with this?’ You get Wikipedia.” –Craig Mod

2011-07-21T00:11:21Z
brianboyer (Brian Boyer)
So! Journalism is a myopic business. We need to be more speculative, we need to think way beyond existing technologies. #wjchat

 


2011-07-21T00:12:00Z
brianboyer (Brian Boyer)
Someone’s gonna make the future, and it might as well be us. With that in mind, let’s begin! #wjchat


2011-07-21T00:13:21Z
wjchat (wjchat)
Q1 If Twitter is the telegraph that we’ll all laugh at when we’re old what’s the future? Crazy ideas, please. #wjchat


2011-07-21T00:17:42Z
TheChalkOutline (Scott Schwebke)
Another crazy idea- Computer programs that will be able to predict tomorrow’s news based on yesterday’s events #wjchat

 


2011-07-21T00:21:45Z
rynk (Stephen M Rynkiewicz)
#wjchat Q1 Twitter is a telegraph with better hardware. The future device? Embedded in clothes, notebooks, refrigerator magnets.


2011-07-21T00:22:15Z
wjchat (wjchat)
Q2 What are some examples of great digital-native journalism? @BrianBoyer will define “digital-native journalism” #wjchat


2011-07-21T00:22:17Z
brianboyer (Brian Boyer)
Preface for Q2: It seems to me that we’re still doing print journalism, and shoehorning it onto the web. #wjchat


2011-07-21T00:23:12Z
acnatta (André Natta)
Q2 data-visualization based pieces? #wjchat

 

2011-07-21T00:24:14Z
brianboyer (Brian Boyer)
A2: I can name two — Politifact and Everyblock. Both relate newsworthy information in a web-native fashion. #wjchat


2011-07-21T00:33:09Z
brianboyer (Brian Boyer)
I would like to take this opportunity to say STOP MAKING INTERACTIVE GRAPHICS. (usually, sometimes they’re awesome. sometimes.) #wjchat


2011-07-21T00:33:52Z
gteresa (Teresa Gorman)
@brianboyer can you expand on that? What’s one that really works? #wjchat


2011-07-21T00:34:33Z
webjournalist (Robert Hernandez)
@brianboyer Do you mean stop lame interactive graphics? Be more specific. #wjchat

 


2011-07-21T00:34:43Z
brianboyer (Brian Boyer)
A counterexample to my previous statement: http://nyti.ms/3dLjAe #wjchat


2011-07-21T00:37:08Z
schwanksta (Ken Schwencke)
@brianboyer Well, you (we) sometimes are. But it’s a big box! Perhaps: “Stop taking print graphics and adding animation” #wjchat


2011-07-21T00:39:44Z
wjchat (wjchat)
Q3 Reinterpreting Mod’s piece: Wrong q: How do we change journalism to make it digital? Instead: How does digital change journalism? #wjchat


2011-07-21T00:39:45Z
ivanlajara (Ivan Lajara)
I found this Arab Spring timeline by @guardian very useful http://bit.ly/pGsHbw #wjchat

 


2011-07-21T00:43:09Z
FatFighterTV (FatFighterTV)
A3 More accessible, fluid, and more sources – which means you have to be more careful and know which ones to trust. #wjchat


2011-07-21T00:43:22Z
brianboyer (Brian Boyer)
A3: When everyone with a mobile is a source, reporter, and publisher… I think a lot is gonna change. #wjchat


2011-07-21T01:14:00Z
wjchat (wjchat)
LR3 What sources (books, blogs, anything) do you recommend for fellow future-makers? #wjchat

 


2011-07-21T01:17:37Z
brianboyer (Brian Boyer)
LR3: Attend your local open-source user groups. Start a Hacks/Hackers chapter. #wjchat


2011-07-21T01:20:18Z
wjchat (wjchat)
Q6 What do you see other mediums doing online that journalism can steal? Give examples (Tools, ideas, etc). #wjchat


2011-07-21T01:22:57Z
knowtheory (Ted Han)
@wjchat #wjchat Steal the open source development model. engage users, foster their interest & support, and turn them into collaborators


2011-07-21T01:27:02Z
wjchat (wjchat)
Q6B What technological advances that are coming are you most excited about harnessing for journalism? #wjchat

 


2011-07-21T01:27:39Z
roeberg (Roei Eisenberg)
A6B: Augmented reality, w/o a question. #wjchat


2011-07-21T01:29:35Z
knowtheory (Ted Han)
@wjchat #wjchat q6b) better analysis and workflow tools. I want systems that help me understand big chunks of info.


2011-07-21T01:30:28Z
roeberg (Roei Eisenberg)
A6B: Also, making Excel easier and more intuitive. We need to be able to crunch numbers w/o hassle. #wjchat


2011-07-21T01:33:05Z
wjchat (wjchat)
Q7 So! You’re not a programmer? So what. What’s your role to help bring journalism into the future?

 


2011-07-21T01:34:53Z
DLoxLA (D Lock)
A7: Find someone who is a programmer and team-up. #wjchat #sharingIScaring


2011-07-21T01:41:27Z
knowtheory (Ted Han)
@wjchat #wjchat BonusQ) interested ppl should go checkout the @knightmozilla MoJo project and #moznewslab experimenting w/ new ideas

 

Will PANDA save data journalism?

Panda image used under a Creative Commons license from Jenn and Tony Bot

Over the past few years, the Knight Foundation’s News Challenge has helped develop amazing projects such as DocumentCloud and LocalWiki.

Data and its use in journalism were a big trend among this year’s winners. Needless to say, we were excited to see this burst of ideas dedicated to data journalism.

The project that caught our attention, and not just because of its cute name, is PANDA, a newsroom data application that would help journalists find context and relationships between datasets in the blink of an eye.

“While national news organizations often have the staff and know-how to handle federal data, smaller news organizations are at a disadvantage. City and state data are messier, and newsroom staff often lack the tools to use it,” John Bracken from the Knight Foundation explains. The PANDA project will “help news organisations better use public information.”

Brian Boyer, the news applications editor at the Chicago Tribune, in partnership with Investigative Reporters & Editors (IRE) and The Spokane Spokesman-Review, will build a set of open-source, web-based tools that will make it easier for journalists to use and analyze data. “The goal is to have a system that each news organization can put to their own use,” Boyer said. “I want this to be something an editor can set up for you, not your IT department.”

In the following PPT slides, Brian Boyer explains the concept of PANDA and how it could revolutionize data journalism:

 

As you may have guessed by now, the name unfortunately has nothing to do with the furry animal: PANDA is a recursive acronym for “PANDA A News Data Application”.

One of the backbones of the project will be Google Refine, a tool launched last year that cleans up messy datasets and detects patterns in them. “One of the added benefits of Google Refine,” Boyer said, “is that it can help draw relationships across data.” It would also make it easy for newsrooms that can’t afford developers to integrate PANDA into their workplace.

The PANDA project received a $150,000 grant. The money will mainly be used to hire a developer to build the application and to give the project a polished look and easy-to-use features.
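To make “cleans up messy datasets” a little more concrete: one of the clustering methods Google Refine offers is key collision using a “fingerprint” key, which groups variant spellings of the same value. The Python below is a minimal sketch of that idea, assuming a simplified fingerprint (trim, lowercase, strip accents and punctuation, de-duplicate and sort the words); it approximates rather than reproduces Refine’s own implementation, and the function names `fingerprint` and `cluster` are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalised key for a messy string (a simplified 'fingerprint')."""
    # Trim surrounding whitespace and lowercase the value.
    value = value.strip().lower()
    # Strip accents: decompose characters, then drop anything non-ASCII.
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Remove punctuation.
    value = re.sub(r"[^\w\s]", "", value)
    # Split into words, de-duplicate and sort them so word order does not matter.
    tokens = sorted(set(value.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values whose fingerprints collide; keep only groups with variants."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [members for members in groups.values() if len(set(members)) > 1]


if __name__ == "__main__":
    # Hypothetical messy column, e.g. agency names typed in by hand.
    names = [
        "Chicago Police Department",
        "chicago police department ",
        "Department, Chicago Police",
        "Cook County Clerk",
    ]
    for group in cluster(names):
        print(group)
```

Here the three spellings of the police department share the key `chicago department police` and land in one group, while “Cook County Clerk” has no variants and is left out. A reporter would still review each group before merging, since key collision can occasionally lump together values that only look alike.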

The first step in this project will be to survey journalists on how they would like PANDA to work in their newsroom. The team will then have to implement those needs and scale the project across newsrooms of different sizes.

Dealing with big datasets requires a lot of storage, and Boyer said that the best option would be for PANDA to work with a cloud storage system, although the team hasn’t worked out any specifics yet.

Other data-related projects that received Knight funding include ScraperWiki (you can find our interview with their media partner manager here), OpenBlock Rural, Overview and SwiftRiver.

Here is a video from the Knight Foundation website giving an overview of all the projects:

(For Brian Boyer’s talk about the PANDA project, go to 9:42)

[vimeo 25222167]

Data journalism, data tools, and the newsroom stack

O’REILLY RADAR – By 

New York Times 365/360 - 1984 (in color) By blprnt_van

MIT’s recent Civic Media Conference and the latest batch of Knight News Challenge winners made one reality crystal clear: as a new era of technology-fueled transparency, innovation and open government dawns, it won’t depend on any single CIO or federal program. It will be driven by a distributed community of media, nonprofits, academics and civic advocates focused on better outcomes, more informed communities and the new news, whatever form it is delivered in.

The themes that unite this class of Knight News Challenge winners were data journalism and platforms for civic connections. Each theme draws from central realities of the information ecosystems of today. Newsrooms and citizens are confronted by unprecedented amounts of data and an expanded number of news sources, including a social web populated by our friends, family and colleagues. Newsrooms, the traditional hosts for information gathering and dissemination, are now part of a flattened environment for news, where news breaks first on social networks, is curated by a combination of professionals and amateurs, and then analyzed and synthesized into contextualized journalism.

 

Data journalism and data tools

 

In an age of information abundance, journalists and citizens alike all need better tools, whether we’re curating the samizdat of the 21st century in the Middle East, like Andy Carvin, processing a late night data dump, or looking for the best way to visualize water quality to a nation of consumers. As we grapple with the consumption challenges presented by this deluge of data, new publishing platforms are also empowering us to gather, refine, analyze and share data ourselves, turning it into information. [Read more…]

ProPublica’s newest news app uses education data to get more social

NIEMANLAB – By Megan Garber

Yesterday, the U.S. Department of Education’s Office for Civil Rights released a data set, the most comprehensive to date, documenting student access to advanced classes and special programs in public high schools. Known in shorthand as the Civil Rights survey, the data tracks the availability of offerings such as Advanced Placement courses, gifted-and-talented programs, and higher-level math and science classes, which studies suggest are important factors in educational attainment and in success later in life.

ProPublica reporters used the Ed data to produce a story package, “The Opportunity Gap,” that analyzes the OCR info and other federal education data; their analysis found among other things that, overall and unsurprisingly, high-poverty schools are less likely than their wealthier counterparts to have students enrolled in those beneficial programs. The achievement gap, the data suggest, isn’t just about students’ educational attainment; it’s also about the educational opportunities provided to those students in the first place. And it’s individual states that are making the policy decisions that affect the quality of those opportunities. ProPublica’s analysis, says senior editor Eric Umansky, is aimed at answering one key question: “Are states giving their kids a fair shake?”
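At its core, a finding like “high-poverty schools are less likely to have these programs” comes from grouping schools and comparing rates between the groups. The Python below is a minimal sketch of that kind of comparison, assuming a hypothetical, simplified record with `poverty_band` and `offers_ap` fields; the rows are invented for illustration, and this is neither the OCR data schema nor ProPublica’s actual method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class School:
    name: str
    poverty_band: str  # hypothetical field, e.g. "high" or "low"
    offers_ap: bool    # hypothetical field: does the school offer AP courses?


def share_offering_ap(schools):
    """Return, for each poverty band, the fraction of schools offering AP."""
    totals = defaultdict(int)
    offering = defaultdict(int)
    for s in schools:
        totals[s.poverty_band] += 1
        if s.offers_ap:
            offering[s.poverty_band] += 1
    return {band: offering[band] / totals[band] for band in totals}


if __name__ == "__main__":
    # Invented rows for illustration only; not real OCR data.
    sample = [
        School("A", "high", False),
        School("B", "high", True),
        School("C", "high", False),
        School("D", "low", True),
        School("E", "low", True),
    ]
    for band, share in sorted(share_offering_ap(sample).items()):
        print(f"{band}-poverty schools offering AP: {share:.0%}")
```

With these made-up rows, one in three high-poverty schools offers AP against all of the low-poverty ones. A real analysis would also need to weigh school size, missing values and how poverty is defined before drawing any conclusion.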

The fact that the OCR data set is relatively comprehensive — reporting on districts with more than 3,000 students, it covers 85,000 schools, and around 75 percent of all public high schoolers in the U.S. — means that the OCR data set is also enormous. And while ProPublica’s text-based takes on the info have done precisely the thing you’d want them to do — find surprises, find trends, make it meaningful, make it human — the outfit’s reporters wanted to go beyond the database-to-narrative formula with the OCR trove. Their solution: a news app that encourages, even more than your typical app, public participation. And that looks to Facebook for social integration. [Read more…]

 

OKCon 2011: Introduction and a Look to the Future

OPEN KNOWLEDGE FOUNDATION – By Rufus Pollock

This is a blog post by Rufus Pollock, co-Founder and Director of the Open Knowledge Foundation.

OKCon, the annual Open Knowledge Conference, kicked off today and it’s been great so far. For those not here in Berlin with us, you can follow the main track talks via video streaming: http://www.ustream.tv/channel/open-knowlegde

Below are the slides from my introductory talk, which gives an overview of the Foundation and its activities and then looks at the challenges facing the open data community going forward.

Looking to the Future

Over the last several decades the world has seen an explosion of digital technologies which have the potential to transform the way knowledge is disseminated.

This world is rapidly evolving and one of its more striking possibilities is the creation of an open data ecosystem in which information is freely used, extended and built on. [Read more…]

Data journalism – is it worth it?

IN PUBLISHING – By Paul Bradshaw

Whether it is the desire to replicate the enormous sales successes of the MPs’ expenses and WikiLeaks revelations, or publishers wanting to expand into selling data services, it seems everyone wants to do something with data. The only question, writes Paul Bradshaw, is: where to start?

When Simon Rogers first asked to publish data on the Guardian website, someone asked: “Who on earth would want to look at a spreadsheet online?” It turned out that over 100,000 people would regularly hit the website to do just that. One person’s audit, it seemed, was another’s sticky content. And the past few years have seen data transformed from conversation killer to hot topic – in both newsroom and boardroom.

Tapping into development talent

For some publishers, the advantage of a data-driven approach to news production is that it allows them to tap into latent development talent within the readership. The Guardian and the New York Times are among an increasing number of media organisations to publish APIs – Application Programming Interfaces – that allow web developers to build new products with their content and – equally importantly – the data surrounding it. In return, the new services can carry advertising sold by the publisher, drive new traffic to the original site, or act as market research to demonstrate demand for a more developed proposition (as happened, for example, with the Guardian’s mobile app).

To stimulate this development, organisations hold ‘Hack Days’ where developers are invited to spend a day or a weekend creating quick editorial ‘hacks’. The investment is minimal compared to the cost of doing everything in-house: a small amount of staff time, and a lot of pizza.

Hack day events have led to all sorts of outcomes: personalised mobile editions, applications that alert people to events and route them to the location, and even a tool that suggests recipes based on an image uploaded by the user. The Guardian say they benefit from “being able to reach new markets that we might not otherwise find. We grow our vertical ad network through high quality partners [taking part in hack days]. We’re also able to offer our end users innovative, clever and useful interactive services provided by experts outside of our domain.” [Read more…]