A data journalist’s guide to sports data

The 2018 Winter Olympics in PyeongChang, South Korea, are just a few weeks away, and the 2018 football World Cup is not far off either. While many journalists around the world are preparing their coverage, we wonder: how do you get ready for these big sporting events? What’s the difference between a sports data journalism project and any other data project? Where do you find data and analytics on this topic?

 

From top left, clockwise: ‘The Tennis Racket’ project by BuzzFeed News, ‘Who is your Olympic body match?’ by the BBC, the ‘One-handed backhand’ project by The New York Times, and ‘Could you be an assistant referee?’ by The Times.

 

We’ve gathered four experts from both sides of the pond to answer these questions and share tips on how to best work with sports data in the newsroom.

Steve Doig from ASU’s Cronkite School of Journalism (US), Paula Lavigne from ESPN (US), Nassos Stylianou from the BBC (UK), and Malcolm Coles, digital publishing strategy consultant, formerly with the Telegraph and Trinity Mirror (UK), all joined the conversation. Here is a compilation of what we’ve learned.

 

The main differences between sports data and other types of data

All our experts agreed that working with sports data is a little different from working with other types of data.

Here are the four main differences they pointed out during our discussion:

  • You don’t have to fight a public records battle to get it
  • There is such a flood of it that people are still trying to find ways to separate the signal from the noise
  • The data is often very granular (up-to-the-minute data, or even up-to-the-second data, is quite common)
  • Fans have a huge interest in it

“Sports is the one part of a news organisation where the consumers really care about numbers. It’s a lot harder to sell a data story in other news contexts,” Steve Doig (ASU’s Cronkite School of Journalism, US).

The fastest 100m times ever. Those caught doping struck out in red.

— @jonbir90

 

As the example above shows, there’s a whole data ecosystem of what you can call the ‘obsessed fans’, some of whom ‘have gone on to create viable business models of gathering and adding value to the raw data’, Doig argued.

 

Steve Doig shared with us this glossary of some “moneyball” metrics that have been created, often by fans rather than the pros themselves.

 

Where do you find sports data?

“In the US, certainly, the major pro sports leagues have opened up their data streams to just about anyone…and much of it can be played with using simple computer tools like Excel,” Steve Doig (ASU’s Cronkite School of Journalism, US).

 

 

Opta

Opta is one of the world’s leading providers of live, detailed sports data. Many of its stats are proprietary, but many news organisations around the world have agreements with it.

 


 

 

Transfermarkt

Transfermarkt is a German-based website owned by Axel Springer that has footballing information, such as scores, results, statistics, transfer news, and fixtures.

 


 

 

WhoScored

WhoScored brings you live scores, match results and player ratings from the top football leagues and competitions.

 


 

 

StatsBomb Services

Many clubs are interested in incorporating statistics into their workflow, but few have staff who know where to start. StatsBomb Services organises and parses all the data, delivers cutting-edge visualisations and analysis, and is useful to journalists too.

 


 

 

Sport-reference websites (US)

In the US, good sources of data are the various *-Reference.com sites, where the asterisk stands for the name of the sport, such as baseball or pro football (American style).

 


 

CIES Football Observatory

Since 2013, the CIES Football Observatory has developed a powerful approach to estimate the transfer value of professional footballers on a scientific basis.

 


NBA Stats

The leagues themselves, such as the NBA, supply data on players, teams, scores, lineups, and more.

 


 

 

ESPN Cricinfo

For cricket data, ESPN Cricinfo is fantastic. It gathers very granular information on all matches and series from the past few years, ordered by country or by team.

 


 

Wikipedia

Scroll down Wikipedia pages and they often have tables of data that you can grab.
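For readers who want to go one step beyond copy and paste, here is a minimal Python sketch of how a table could be pulled out of a saved Wikipedia page using only the standard library. The names `TableGrabber` and `grab_tables` are made up for this illustration, and the sample HTML is a made-up stand-in for a real page, not real results. It assumes simple, non-nested tables with explicit closing tags.

```python
from html.parser import HTMLParser


class TableGrabber(HTMLParser):
    """Collect the rows of every <table> in a page as lists of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables, each one a list of rows
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def grab_tables(html):
    """Return every table found in the HTML string (hypothetical helper)."""
    parser = TableGrabber()
    parser.feed(html)
    parser.close()
    return parser.tables


# Made-up stand-in for a saved Wikipedia page; the values are placeholders.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Games</th><th>Gold</th><th>Time</th></tr>
  <tr><td>Games A</td><td>Athlete X</td><td>10.00</td></tr>
  <tr><td>Games B</td><td>Athlete Y</td><td>9.90</td></tr>
</table>
"""

tables = grab_tables(SAMPLE_HTML)
header, *rows = tables[0]
# One dictionary per row, keyed by the column headings.
records = [dict(zip(header, row)) for row in rows]
```

From there, `records` can be written out to a CSV file or opened in a spreadsheet. Real Wikipedia tables often contain merged cells and footnote markers, so expect some cleaning afterwards.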

 


 

Where do you find Olympics data?

When it comes to the Olympic Games, it is usually the Olympic Data Feed that has all the data:

 

The Olympic Data Feed is used by many news organisations worldwide

 

Alternatively, you can always look at Wikipedia, where a lot of data tables are available. For example, here is a table about the 100 metres at the Olympics:

 

Wikipedia offers a lot of historical data related to the Olympics

“What is fantastic with Olympic Games is the very different attributes of the athletes (age, height, weight) which you do not really get with other sports,” Nassos Stylianou from the BBC (UK).

Here is a project the BBC ended up doing for the Rio Olympics:

 

The project used data on more than 10,500 of the roughly 11,500 athletes in the official Olympic Data Feed (ODF).

 

Is verification a big issue in sports data?

“Verification is tricky, but not in the same way as data verification for other topics. It could be tricky when different data organisations or websites have different methodologies in their data collection,” Nassos Stylianou from the BBC (UK).
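One simple way to spot where two providers disagree is to line their figures up side by side. The Python sketch below is only an illustration: the function name `find_disagreements`, the provider names and the numbers are all made up, and it assumes each source has already been reduced to a name-to-value table.

```python
def find_disagreements(source_a, source_b):
    """Compare two {name: value} tables and report where they differ.

    Returns {name: (value_in_a, value_in_b)}; None marks a missing entry.
    """
    report = {}
    for name in sorted(set(source_a) | set(source_b)):
        a = source_a.get(name)
        b = source_b.get(name)
        if a != b:
            report[name] = (a, b)
    return report


# Made-up assist counts from two hypothetical data providers.
provider_a = {"Player 1": 7, "Player 2": 4, "Player 3": 2}
provider_b = {"Player 1": 7, "Player 2": 5, "Player 4": 1}

disagreements = find_disagreements(provider_a, provider_b)
for name, (a, b) in disagreements.items():
    print(f"{name}: provider A says {a}, provider B says {b}")
```

A mismatch does not mean either source is wrong. It is a prompt to check how each one defines and collects the figure.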

How do you choose which data to go after?

Nassos Stylianou: From our point of view, presenting data in a way that the audience understands is key. So wherever possible really, ‘industry standards’ are great, if they are meaningful and can provide interesting stories. But sometimes, it is the analysis of that data in a slightly different way that could provide a new and interesting angle. I don’t think that is different to any other type of data journalism really. Ask the right questions of your data, ask why certain things could be happening, try to visualise them in a way that answers all these questions.

 

The “One race, every medalist ever” project by The New York Times

 

 

Malcolm Coles: It depends what you’re trying to achieve. Are you looking to illuminate a specific event or match? Or trying to tell a story? Even for the latter, I think something like the project ‘One race, every medalist ever’ by The New York Times is doable with just Wikipedia data. But if you wanted to tell the story of how Bolt dominates, you would need split times for every 10m and you can’t get that from Wikipedia.
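To show why split times matter for that kind of story, here is a small Python sketch that turns cumulative 10m split times into the average speed over each segment. The function name `segment_speeds` is made up for this illustration and the split times are placeholders, not figures from a real race.

```python
def segment_speeds(cumulative_splits, segment_length_m=10.0):
    """Turn cumulative split times (seconds) into average speed (m/s) per segment."""
    speeds = []
    previous = 0.0
    for split in cumulative_splits:
        segment_time = split - previous
        if segment_time <= 0:
            raise ValueError("split times must be strictly increasing")
        speeds.append(segment_length_m / segment_time)
        previous = split
    return speeds


# Made-up cumulative times at 10m, 20m, 30m and 40m; not a real race.
example_splits = [2.0, 3.25, 4.25, 5.25]

speeds = segment_speeds(example_splits)
# Index of the first segment where the runner reached their highest average speed.
fastest_segment = max(range(len(speeds)), key=speeds.__getitem__)
```

With real split data, the same calculation shows where in the race a sprinter pulls away, which a table of finishing times alone cannot tell you.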

 

Interesting examples to look at

The video-led project below is a good example of an analysis of technique that worked really well alongside some data.

 

The “One-handed backhand” project by The New York Times

 

And this one is an example of The Times working with the Football Association to build a game that shows its audience how difficult or easy it is to referee (The Wall Street Journal did a similar one about being a tennis line judge). So working with analysts really does help.

 

 

 

What makes a good sports data story?

Steve Doig: Much of my career has been in investigative work, so I lean towards stories that investigate problems. A good example is the ‘Tennis Racket’ investigation by BuzzFeed’s John Templon and Heidi Blake.

 

The Tennis Racket investigation by BuzzFeed News

 

I also like fun stories, which can be created out of novel use of data. I’ve always argued that data journalism in general adds evidence to stories that otherwise would be collections of anecdotes. So sports data can do the same, I think. The data at least adds weight to the arguments being made about strategies or player choices, etc.

Nassos Stylianou: I don’t think this is different from any news story really, although it can be a lot more fun! So as with data journalism in general, a [good sports data story is a] story that tells you something new in a visually engaging way.

Malcolm Coles: A good sports data story is the same as any other good story really. I’ve tended to be more interested in how you can use data to visualise a story that you would otherwise tell in lots of complicated words.

Tips on visualising sports data

Nassos Stylianou: Always think of who your audience is. Many sports fans could be used to a certain type of visualisation that makes sense to them but makes no sense to other people. If you are aiming your story in their direction, you can work with that in mind but if you want this to go beyond the sport obsessive, that’s not always the best strategy.

 

 

Malcolm Coles: I think a good visualisation is one that works on a mobile phone … I get shown this visualisation (pictured left) on the 2010 World Cup every year. It’s just fixtures data visualised — was great for its time. I get asked to build one like it every year, yet it won’t work on a mobile.

Steve Doig: Be aware of the growing number of sports analytics conferences being organized. The original, I believe, is the MIT Sloan Sports Analytics Conference held each year in Boston. About 1,800 young MBA students from all over the country (and now the world) show up trying to get hired as data analysts by sports leagues.

 

How do you get ready for big sports events like the Olympics, the Super Bowl, or the Football World Cup?

Steve Doig: I’d say, do the same thing the on-air commentators do: gather all the relevant historical stats and be ready to use them in your stories. It’s also good to have a stable of data analytics experts whose voices you can add to your stories.

Nassos Stylianou: Yep, prep well in advance. The great thing with these big events is also to build things that will work throughout the tournament.

Malcolm Coles: Try and build stuff outside of one-off stories or investigations that you can reuse when the big tournament is over.

 


To see the full discussion, check out previous ones and take part in future ones, join the Data Journalism Awards community on Slack!

Over the past six years, the Global Editors Network has organised the Data Journalism Awards competition to celebrate and credit outstanding work in the field of data-driven journalism worldwide. To see the full list of winners, read about the categories or join the competition yourself, go to our website.

 



Marianne Bouchart is the founder and director of HEI-DA, a nonprofit organisation promoting news innovation, the future of data journalism and open data. She runs data journalism programmes in various regions around the world as well as HEI-DA’s Sensor Journalism Toolkit project and manages the Data Journalism Awards competition.

Before launching HEI-DA, Marianne spent 10 years in London where she worked as a web producer, data journalism and graphics editor for Bloomberg News, amongst others. She created the Data Journalism Blog in 2011 and gives lectures at journalism schools, in the UK and in France.

 

How three women are influencing data journalism and what you can learn from them

This article was originally published on the Data Journalism Awards Medium Publication managed by the Global Editors Network. You can find the original version right here.


 

Stephanie Sy of Thinking Machines (Philippines), Yolanda Ma of Data Journalism China and Esra Dogramaci of Deutsche Welle, formerly Al Jazeera (Germany), new members of the Data Journalism Awards jury, talk innovation, data journalism in Asia and the Middle East, and women in news.

Left to right: Yolanda Ma (Data Journalism China), Esra Dogramaci (Deutsche Welle, formerly BBC and Al Jazeera), and Stephanie Sy (Thinking Machines) join the DJA jury

 

We welcomed three new members to the Data Journalism Awards jury last year (pictured above). They are all women, strong-willed and inspiring women, and they represent two regions that are often overlooked in the world of data journalism: Asia and the Middle East.

What was your first project in data journalism or interactive news and what memory do you keep from it?

Esra Dogramaci: In 2012, Invisible Children launched a campaign to seek out Lord’s Resistance Army (LRA) leader Joseph Kony and highlight the exploitation of child soldiers. Then, at Al Jazeera, we wanted to see what people in northern Uganda, who lived in one of the areas affected by the LRA, actually had to say about it. They would ‘speak to tweet’ and we would map their reactions on Ushahidi using a Google Fusion table in the background.

 
Uganda Speaks by Al Jazeera

 

Although Al Jazeera had started doing this kind of project back in 2009 during the war on Gaza (the experiment’s page of the Al Jazeera Lab website has now disappeared but can be viewed through WebArchive.org), it picked up steam during Egypt’s 2011 Arab Spring where, due to lack of broadcast media coverage, protesters were using social media to bring attention to what was happening.

Interactive story by Thinking Machines

 

Stephanie Sy: Our first data journalism project as a team at Thinking Machines was a series of interactive stories on traffic accidents in Metro Manila. We cleaned and analysed a set of Excel sheets of 90,000 road accidents spanning 10 years.

It was the first project we worked on as a mixed team of journalists, designers, and data scientists, and the first time we tried to build something from scratch with d3.js! I worked on the d3 charts, and remember being in utter despair at how hard it was to get the interactive transitions to render nicely across different browser types. It was surprisingly well received by the local civic community, and that positive feedback emboldened us to keep working.

 
Connected China, Thomson Reuters

 

Yolanda Ma: One of my first projects was Connected China for Thomson Reuters, which tracked and visualised the people, institutions and relationships that form China’s elite power structure (learn more about it here).

This project taught me the importance of facts and every piece of data in it (thousands, if not millions in total) went through a rigid fact-checking process (by human beings, not machines, unfortunately). I learned by doing that facts are the bones of data journalism, not fancy visualisations, even though this project turned out to be fancy and cool, which is good too.

 

Now, what was the latest project you worked on and how do the two compare?

 

ED: Towards the end of last year, I taught a data journalism module to City University London Master’s students who were able to pull together their own data visualisation projects in the space of an hour. The biggest difference is how vastly the interfaces have improved and how quick and intuitive the designs and interactive software are now. A lot more companies are switched on to storytelling beyond TV or text. With all that knowledge combined, how do you stand out in the world of online news?

Complementary to that, Al Jazeera was always a front runner because they were willing to take risks and try something new when no one else was. In the newsrooms I’ve worked at or seen since, there is still a general aversion to risk taking in favour of safety, though everyone knows that to survive and thrive in this digital media landscape, it’s risk taking and innovation that are going to push those boundaries and really get you places.

SS: Our latest related data story is a piece we put together visualising traffic jams across Metro Manila during the holiday rush season. This time we were looking at gigabytes of Waze jams data that we accessed through the Waze API. It definitely grew out of our early work in transit data stories, but reflects a huge amount of growth in our ability to handle complex data, and in our understanding of what appeals to our audience.

One big piece of learning we got from this is that our audience in the Philippines mainly interacts with the news through mobile phones and via Facebook, so complex d3 interactives don’t work for them. What we do now is to build gifs on top of the interactives, which we then share on Facebook. You can see an example of that in the linked story. That gets us a tremendous amount of reach, as we’re able to communicate complex results in a format that’s friendly for our audience.

YM: I’ve been doing data journalism training mostly in the past few years and helping others do their data projects, so nothing comparable really. The latest project I worked on is this Data Journalism MOOC with HKU in partnership with Google News Lab. It is tailor-made for practitioners in Asia, and it’s restarting soon (begins March 6), so go on and register before it’s too late!

 

What excites you about the future of data journalism and interactive news?

 

ED: The ability to tell stories in a cleaner, more engaging way. Literally everything can be turned into a story just by interrogating the data, being curious and asking questions. The digital news world has always been driven by data and it’s exciting to see how “traditional” journalism is embracing this more. I love this example from Berliner Morgenpost where they charted this bus line in Berlin, combined with a dash cam, comparing various data such as demographics and voting. It’s an ingenious way of taking complex data and breaking it down in a meaningful, engaging way, rather than using pie charts.

M29 from Berliner Morgenpost

 

SS: There are tremendous amounts of data being generated in this digital age, and I think data journalism is a very natural evolution of the field. Investigative journalists should be able to use computer science skills to find their way through messy datasets and big data. It’s absolutely reasonable to expect that a news organization might get a 1 terabyte dump of files from a source.

YM: It excites me because it is the future. We live in the age of data, and the inevitably increasing amount of data available means there is huge and growing potential for data journalism. People’s news consumption is also changing and I believe personalisation is one of the key characteristics for the new generation of consumers, which means interactive news, interactive in many different ways, will thrive.

 

How are Asian and Middle Eastern media organisations (depending on your experience) doing in terms of data journalism and interactive news compared to the rest of the world?

 

ED: I think Al Jazeera has always been a pioneer in this. They have a great interactive team that drew together people from various disciplines within the organisation — coders, video people, designers, journalists — before everyone else was doing it and they’ve been able to shed light on stories that wouldn’t usually be picked up on by mainstream media radars.

Example that illustrates my point: The project “Broken homes, a record year of home demolitions in occupied East Jerusalem” by Al Jazeera

“Broken homes, a record year of home demolitions in occupied East Jerusalem” by Al Jazeera

 

SS: We have a few media organisations like the Philippine Center for Investigative Journalism, Rappler, and Inquirer who have been integrating data analysis into their reporting, but there isn’t anyone regularly producing complex data journalism pieces.

Our key problem is the lack of useful datasets. A huge amount of work goes into acquiring, cleaning, and triple checking the raw data. Analysis is “garbage in, garbage out” and we can’t create good data journalism without the presence of good data. This is where the European and North American media organisations have an edge. Their governments and civic society organisations follow open data standards, and citizens can request data [via FOIA]! The Philippine government has been making serious progress towards more open data sharing, and I hope they’re able to sustain that commitment.

Example that illustrates my point: PCIJ’s Money Politics project is a great example of an organisation doing the data janitorial work of acquiring and validating hard-to-find data. During our last presidential elections in 2016, GMA News Network and Rappler both created hugely popular election tracking live data stories.

PCIJ’s Money Politics

 

YM: Media organisations in Asia are catching up on data journalism and interactive news. There are some challenges of course, for example, lack of data in less developed countries, lack of skills and talent (and limited training opportunities), and even poor infrastructure or unstable internet, especially in rural areas, that would limit the presentation of news stories. Despite the difficulties, we do see good work emerging, though not necessarily in English. Check out some of the stories from the last GIJN’s Investigative Journalism Conference held in Nepal and you’ll get an idea.

Example that illustrates my point: This Caixin Media data story analysed and visualised the property market in China for the past few years.

 

Another New Normal, Caixin Media

 

What view do you have on the role of women in the world of news today? How is it being a woman in your respective work environment? Do you feel it makes a difference? If so, which one and why?

 

ED: Women are underrepresented not just in news coverage but in leadership positions too. I have to admit though that being at Deutsche Welle, I see a lot more women in senior management and it feels like a much more egalitarian working environment. However, looking at my overall experience as a woman in news, you do face a lot of sexism and prejudice. Every woman I know has a story to tell and when the latest story about Uber came out a lot of my female colleagues around me were nodding their heads.

What got me through challenging times is having a fantastic network of female role models and mentors who are there to support you. That was one piece of advice I gave to prior teams: get a mentor. A lot of women feel isolated or feel the way they are treated is normal but it’s not. Women should also be aware that there is a real risk you will be punished if you speak up, challenge the status quo and don’t toe the party line. If this happens, it’s an environment or team you probably shouldn’t be in anyway.

SS: It’s alarming to see parties around the world trying to stifle the voices of anyone who doesn’t belong and calling any news that doesn’t flatter them “fake news”. It’s important for us to speak up as women, and to practice intersectionality when it comes to other marginalised communities. As people who work with data, we can see past the aggregates and look at the complex messy truth. We must be able to communicate that complexity in order for our work to make a difference.

YM: Most of the data journalism teams in China are led by women, and I think they are doing really well 🙂

 

What do you think makes a great data journalism project? What will you be looking for when marking projects for the Data Journalism Awards this year?

 

ED: Simplicity. It’s easy to get lost in data and try to do too much, but it’s often about taking something complex and making it accessible for a wider audience, getting them to think about something they haven’t, or perhaps to consider it in a different way. I’ll be looking for the why: why does this matter, does this story or project make a dent in the universe?

After all, isn’t that what telling stories is about? The obvious thing that comes through is passion. It may sound obvious, but you can tell when a person or team has cared and really invested in the work versus projects being rolled off a conveyor belt.

SS: A great data journalism project involves three things: novel data, clever analytical methods, and great communication through the project’s medium of choice. I’m hoping to see a wide variety of mediums this year!

Will someone be submitting an audio data journalism project? With all the very exciting advances in the field of artificial intelligence this year, I’m also hoping to see projects that incorporate machine learning, and artificial intelligence.

YM: I believe data journalism is after all journalism: it has to reveal truth and tell stories, based on or driven by data. I’ll be looking for stories that do make an impact in one way or another.

 

If you had one piece of advice for people applying for the Data Journalism Awards competition, what would it be?

 

ED: Don’t be intimidated by the competition or past award winners. Focus on what you do best. I say this especially for those applying for the first time. I see a lot of hesitation and negative self talk of ‘I’m not good enough’ etc. In every experience there’s something to learn, so don’t hesitate.

SS: Don’t forget to tell a story! With data science methods, it’s easy to get lost in fancy math and lose track of the narrative.

YM: Tell us a bit about the story behind your story — say, we may not know how hard it might be to get certain data in your country.

 

What was the best piece of advice you were ever given in your years of experience in the media industry?

 

ED: Take every opportunity. That’s related to a quote that has been coming up over and over again for the past week or so, “success is when preparation meets opportunity.”

SS: One of my best former bosses told me to imagine that a hungover, unhappy man with a million meetings that day was the only reader of my work. He haunts me to this day.

YM: I started my career with the ambition (like many idealistic young people) to change China. My first (and second) boss Reg Chua once said to me, don’t worry about changing China but focus on making small changes and work with a long-term vision. Sounds cliche.

He said that to me in 2012. The next year, together with two friends, I started DJChina.org, which began in 2013 as a small blog and has now grown to be one of the best educational platforms for data journalism practitioners in China. The year after, in 2014, Open Data China was launched (using the domain name I registered a few years back), signalling a bottom-up movement to push for more open data, which was incorporated into national policy within a year. So I guess all this proved that Reg was right, and it could be applied anywhere, to anything. Think big, act small, one story (or project) at a time, and changes will happen.

 


Left to right: Yolanda Ma (Data Journalism China), Esra Dogramaci (Deutsche Welle, formerly BBC and Al Jazeera), and Stephanie Sy (Thinking Machines)

 

Stephanie Sy is the founder of Thinking Machines, a data science and data engineering team based in the Philippines. She brings to the jury her expertise in data science, engineering and storytelling.

Yolanda Ma is the co-founder of Data Journalism China, one of the best educational platforms for data journalism practitioners in China. Not only representing the biggest country in Asia, she also has experience teaching data skills to journalists and a great knowledge of data journalism from her region.

Esra Dogramaci has now joined Deutsche Welle and formerly worked with the BBC, Al Jazeera in Qatar and Turkey, as well as the UN Headquarters and UNICEF. She brings to the DJA jury significant experience in digital transformation across news and current affairs, particularly in social video and off platform growth and development.

 


The Data Journalism Awards are the first international awards recognising outstanding work in the field of data journalism worldwide. Started in 2012, the competition is organised by the Global Editors Network, with support from the Google News Lab, the John S. and James L. Knight Foundation, and in partnership with Chartbeat. More info about cash prizes, categories and more, can be found on the DJA 2017 website.

