
Data Quality Alert!

There's a new piece over at Hockey Analytics which I heartily recommend to those interested in furthering the use of statistics related to NHL hockey. Alan Ryder pioneered the investigation of Shot Quality, which attempts to measure the characteristics of shots (distance, type, situation) to provide a more finely detailed view of offensive and defensive performance. I use a slightly simplified version of Alan's SQ techniques in my analysis here quite often, so when the article entitled "Product Recall Notice for Shot Quality" was posted, it definitely caught my eye. While it is obvious to anyone who has read through the NHL's play-by-play files that data quality problems exist, the presumption has been that these errors are basically random and cancel each other out over the course of 70,000+ shots in an NHL season.

By looking at arena-by-arena details, however, Alan has raised some pretty serious issues with the data, basically demonstrating that scorers in different venues seem to have systematic biases in how shots are recorded. Games played at Madison Square Garden, for example, consistently have the most dangerous shots recorded in the logs, whereas scorers in Buffalo and Tampa tend towards the opposite view. The implications are, first, that we always need to keep in mind the limitations of the data the NHL presents to us, and second, that we should look into possible means of correcting for such biases (using something like the "park effect" adjustment that baseball stats junkies use). I guess I've got one more thing added to my summer to-do list...

Back in March I did something similar with the Giveaway/Takeaway stats, as well as with how frequently different scorers record Missed Shots vs. Saves. In my Give/Take and Missed Shot pieces, for example, I looked at how teams performed at home, how they performed on the road, and how visitors performed in their building, in order to isolate the effect of the official scorer. It was interesting to see that games in Chicago feature an absurdly low number of Giveaways and Takeaways by either team, while in Montreal or Edmonton the per-game figures are five times higher or more!
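For anyone who wants to try the home/road comparison themselves, here is a minimal sketch of a baseball-style park factor applied to a rink, assuming you have one record per game with the home team, the visiting team, and the total count of some event (Giveaways plus Takeaways, say) recorded for both teams. The `Game` record, the `rink_factors` function, and the sample numbers are all hypothetical and made up for illustration; they are not real NHL figures. The idea is that each team appears in both samples, so if a building's per-game count is far above or below that same team's road games, the official scorer is a likely suspect.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Game:
    home: str
    away: str
    events: int  # hypothetical: events recorded for BOTH teams in this game


def rink_factors(games):
    """Per-game event rate in each team's building divided by the rate in its road games."""
    home_totals = defaultdict(lambda: [0, 0])  # team -> [events, games] in its own building
    road_totals = defaultdict(lambda: [0, 0])  # team -> [events, games] in other buildings
    for g in games:
        home_totals[g.home][0] += g.events
        home_totals[g.home][1] += 1
        road_totals[g.away][0] += g.events
        road_totals[g.away][1] += 1

    factors = {}
    for team, (h_events, h_games) in home_totals.items():
        r_events, r_games = road_totals.get(team, (0, 0))
        if r_games == 0 or r_events == 0:
            continue  # no road baseline to compare against
        factors[team] = (h_events / h_games) / (r_events / r_games)
    return factors


# Invented toy numbers, only to show the calculation.
sample = [
    Game("CHI", "MTL", 4),
    Game("MTL", "CHI", 30),
    Game("CHI", "EDM", 6),
    Game("EDM", "CHI", 26),
]

for team, factor in sorted(rink_factors(sample).items()):
    print(f"{team}: {factor:.2f}")
```

A factor well below 1.0 suggests a stingy scorer and one well above 1.0 a generous one; dividing a building's raw counts by its factor is one simple way to adjust them. This plain ratio ignores schedule imbalance and small samples, so treat it as a starting point rather than a finished correction.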

The potential for statistical analysis to extend our understanding of professional hockey remains largely untapped, but the quality of the data being recorded is a critical obstacle that needs to be overcome if we're to make the best progress we can. I'm not quite sure how best to pursue this issue with the NHL, but I'm open to suggestions.
