Just to review, the basic idea behind win probability models like PythagenPuck is that over the course of a season, the Goals For and Goals Against numbers can be used to derive a team's winning percentage, within a surprisingly narrow margin of error. For instance, if you only had GF/GA information, you could make a very good stab at projecting what the standings would look like. The wildcard that remains outside PythagenPuck's reach is the shootout - we don't have a good way to predict how often teams will go to a shootout, nor how well they'll do once they get there. Keep that in mind, as I've removed SO Win points from the analysis below.
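For anyone who wants to see the arithmetic, here is a minimal Python sketch of a Pythagorean-style win expectation. The function name `expected_win_pct` and the default exponent of 2 (the classic baseball form) are my own assumptions for illustration; PythagenPuck uses its own exponent, which I'm not reproducing here, and the goal totals in the usage line are made up.

```python
def expected_win_pct(goals_for: float, goals_against: float,
                     exponent: float = 2.0) -> float:
    """Pythagorean-style expected winning percentage.

    exponent=2.0 is an assumed placeholder, not PythagenPuck's actual value.
    """
    if goals_for < 0 or goals_against < 0:
        raise ValueError("goal totals must be non-negative")
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    if gf + ga == 0:
        raise ValueError("need at least one goal scored or allowed")
    return gf / (gf + ga)


# Hypothetical team: 120 goals for, 100 against.
sample = expected_win_pct(120, 100)
print(f"Expected winning percentage: {sample:.3f}")
```

The exponent is left as a parameter so that a hockey-specific value can be dropped in without touching the rest of the calculation.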
The left-hand side of this table shows what the data looked like in November - looking at Ottawa as an example, it said that based on Goals For and Against, Ottawa's winning percentage would be expected to be .550, but in fact they stood at only .389 at the time (and media outlets were rife with "fire the coach!" and "break up the team!" articles). In Boston, the Bruins were clinging to playoff contention with a .469 winning percentage, while their GF/GA ratio would have predicted a .349 mark. Those gaps between Expected and Actual figures were the largest in the league in each direction, and my suggestion at the time was that those gaps would close, and that given the choice between GF/GA and Winning Percentage, I expected both teams' Actual Winning Percentage to make the move. Sure enough, as we see in the right side of the table, the Senators have soared back over .500 and into solid playoff position in the East (although, to be noted, their GF/GA has improved as well), and the Bruins have tailed off. Their Gap value is still the highest in the league, but it has shrunk by roughly 25% from the November mark.
It was nice to see that PythagenPuck was that useful back in November, but I wanted to try to get a handle on just how useful it was, so I decided to run some correlation numbers. Comparing the Actual Winning Percentage in November to the Actual results since then, I came up with a value of 0.483, a pretty strong correlation. This makes sense, as most teams doing well (or poorly) early on will continue doing so, albeit with some significant exceptions. Then I ran the figures comparing Expected Win Percentage from November to the Expected Win Percentage since that time, and came up with an even stronger number: 0.603. In other words, by taking our snapshot at the quarter-pole of the season, a team's Goals For/Against numbers were more likely to hold true for the rest of the year than their Winning Percentage. Lastly, I ran the numbers comparing the Expected Win Percentage from November against the Actual Winning Percentages since that time. That number came in at 0.594, another strong figure, and noticeably stronger than the Actual/Actual correlation of 0.483.
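If you'd like to run this kind of comparison yourself, here is a rough sketch of a correlation calculation in Python. I'm assuming a plain Pearson correlation, and the helper name `pearson` and every number in the sample lists are hypothetical placeholders rather than the league data discussed above.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Placeholder values only: one entry per team, in the same team order.
early_expected = [0.610, 0.540, 0.480, 0.450, 0.390]  # expected win% at snapshot
early_actual = [0.580, 0.470, 0.520, 0.500, 0.360]    # actual win% at snapshot
later_actual = [0.600, 0.550, 0.470, 0.430, 0.410]    # actual win% afterwards

print("Actual -> Actual:  ", round(pearson(early_actual, later_actual), 3))
print("Expected -> Actual:", round(pearson(early_expected, later_actual), 3))
```

Swapping in each team's real snapshot and rest-of-season figures gives the Actual/Actual and Expected/Actual comparisons side by side.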
That's an interesting result, because what it basically says is that to get an idea of where teams will finish in the standings, you were better off looking at the GF and GA columns in the paper than at the actual Win/Loss results at that time! While I happened to pick the quarter-pole to run that analysis, I'm curious to run a time-series of such correlations to see at what point the GF/GA data gives us a comfortably acceptable picture of a team's ability. Is that mark 10 games, 20 games, 30 games? I don't know quite yet, but the answer could help determine whether teams that made major moves after early disappointments like Philadelphia, Columbus and Chicago really had enough information to act on, or whether other teams like St. Louis should perhaps have acted earlier if they hoped to actually turn this season's fortunes around.
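One way such a time-series might be set up is sketched below in Python. Everything here is an assumption on my part: the function names, the exponent of 2, the Pearson correlation, the treatment of tied scores as non-wins, and above all the game logs, which are randomly generated stand-ins rather than real results.

```python
import random
from math import sqrt


def pythag(gf, ga, exponent=2.0):
    """Pythagorean-style expected win% (exponent is an assumed placeholder)."""
    if gf + ga == 0:
        return 0.5  # no goals either way: treat as a coin flip
    return gf ** exponent / (gf ** exponent + ga ** exponent)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)


def correlation_by_cutoff(seasons, cutoffs, exponent=2.0):
    """For each cutoff N, correlate expected win% over the first N games
    with actual win% over the remaining games.

    seasons maps a team name to a list of (goals_for, goals_against) per game.
    """
    results = {}
    for n in cutoffs:
        early_expected, later_actual = [], []
        for games in seasons.values():
            if not 0 < n < len(games):
                raise ValueError("cutoff must leave games on both sides")
            early, later = games[:n], games[n:]
            gf = sum(f for f, _ in early)
            ga = sum(a for _, a in early)
            early_expected.append(pythag(gf, ga, exponent))
            # Tied scores count as non-wins in this sketch.
            wins = sum(1 for f, a in later if f > a)
            later_actual.append(wins / len(later))
        results[n] = pearson(early_expected, later_actual)
    return results


# Synthetic stand-in data: 30 hypothetical teams, 82 games each.
rng = random.Random(0)
synthetic = {}
for team in range(30):
    edge = rng.uniform(-0.5, 0.5)  # made-up team strength
    synthetic[f"team_{team}"] = [
        (max(0, round(rng.gauss(2.8 + edge, 1.5))),
         max(0, round(rng.gauss(2.8 - edge, 1.5))))
        for _ in range(82)
    ]

by_cutoff = correlation_by_cutoff(synthetic, [10, 20, 30])
for n, r in by_cutoff.items():
    print(f"first {n} games vs. rest of season: r = {r:.3f}")
```

With real game logs in place of the synthetic ones, the point where the correlation levels off would be a reasonable candidate for the "enough games" mark.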