This post is for people who are new to data sourcing, or interested in Data Journalism but unsure of where to begin.
First, it is useful to start with an idea, question or hypothesis. In Story-Based Inquiry, Mark Lee Hunter emphasises the importance of knowing what you are looking for in the data.
He said: “We do not think that the only issue is finding information. Instead, we think that the core task is telling the story. Stories are the cement which holds together every step of the investigative process, from conception to research, writing, quality control and publication.”
Data stories and visualisations are part of journalism and, when looking for information, a good starting place is to use traditional journalistic methods. Contacts, tip-offs, interviews and research can all point you in the direction of interesting data, and of questions that could be answered by statistics. This is known as Active Data Journalism.
Alternatively, you can start with the data and look for the story. This was the way that Heather Brooke broke the MPs' expenses scandal in 2009. However, it does help to have an idea of which data is worth looking into.
Release dates listed on websites such as the Office for National Statistics, www.ons.gov.uk, show you when sets of public data are due to be published. On sites like this you can watch for interesting data as it comes out and write a timely story based upon it.
Another good source of public data is www.Data.gov.uk, which holds data from a variety of government departments. Local sites such as OpenlyLocal and Mash the State are good at making government data accessible.
In addition, private companies and charities often publish data on their websites. Media sites like the Guardian and the New York Times have a DataStore that routinely publishes downloadable data in raw form.
To get non-public data it may be necessary to send a Freedom of Information (FOI) request. All public bodies are open to FOI requests, including local councils, police forces and travel organisations. It is best to specify the format in which you wish to receive the information. An electronic spreadsheet is the easiest format to clean and analyse, so it is good to request the information that way.
Being part of a data community helps. Here at the Data Journalism Blog we aim to provide a platform for such a community! Other good community sites are GetTheData.org, the Open Data Cookbook and the Wolfram Alpha forum. Here you can talk to other data enthusiasts, see what data others are playing with and be inspired.
Finally, always check that you have the legal right to republish data before you use it.
Once you have the data you can clean, scrape and analyse it to turn it into a story, visualisation or interactive graphic….
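To give a flavour of what that cleaning step can look like, here is a minimal sketch in Python using only the standard library. It assumes the data arrived as a CSV spreadsheet; the council names, the column names and the figures are all invented for illustration, and the `load_rows` helper is hypothetical rather than part of any tool mentioned above.

```python
import csv
import io

# Made-up sample standing in for a downloaded or FOI-supplied spreadsheet.
# The council names and figures are invented for illustration only.
SAMPLE_CSV = """council,year,spend_gbp
Northtown, 2010 ,"1,250,000"
Southville,2010,"980,500"
Eastham,2010,
"""


def load_rows(text):
    """Parse CSV text into a list of cleaned dictionaries."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        # Strip stray spaces and the thousands separators spreadsheets often add.
        spend = raw["spend_gbp"].replace(",", "").strip()
        rows.append({
            "council": raw["council"].strip(),
            "year": int(raw["year"].strip()),
            # Keep missing values as None rather than guessing a number.
            "spend_gbp": int(spend) if spend else None,
        })
    return rows


rows = load_rows(SAMPLE_CSV)
known = [r for r in rows if r["spend_gbp"] is not None]
total = sum(r["spend_gbp"] for r in known)
print(f"{len(known)} of {len(rows)} councils reported spending, total {total:,} GBP")
```

Note that the blank figure is kept as a gap and left out of the total instead of being treated as zero. A missing number can be a story in itself, so it is worth knowing where the holes are before you start analysing.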