Back to the Basics: Baseball's Pythagorean Theorem
[Editor's Note: This is the first article in a new series - Back to the Basics. Please welcome our new contributor, Jeff Meehan. Jeff, a JD/MBA candidate at Suffolk University, recently interned with the Boston Red Sox.]
The majority of the sports fan population [including yours truly] suffers from the belief that they would be the perfect General Manager [GM] for their favorite sports organizations. However, with the increased importance placed on sports analytics, most of these fans lack the skill set sought by today’s front offices.
Other industries, such as corporate finance, accounting, or even architecture, have structured career paths that prepare you to succeed in those fields. The sports industry, by contrast, does not. Those looking to develop the skill set to succeed in a GM-type role will likely begin by combing through the piles of information available on the web or in various publications. Until now, however, much of this information has been scattered, and it has taken a great deal of time to find and bring together.
The “Back to the Basics” series is designed to explore the foundational principles of statistical analysis across the four major American sports. The series will provide readers with an understanding of how teams approach roster construction and why certain decisions are made both on and off the field. Readers will also be directed to additional information sources, such as websites, books, or even magazine articles, that could substantially increase their knowledge of the subject at hand. Now that you know why we are doing this, let’s explore today’s topic.
Baseball’s Pythagorean [Win-Loss] Theorem
Pythagorean Win-Loss is a metric used in baseball to determine a club’s Expected Win Percentage based on two factors: (1) the number of runs scored by the team [RS]; and (2) the number of runs allowed by the team [RA]. In essence, the formula allows us to reduce any club’s season to the following: “Tell me how many runs a team scored and how many it allowed and I’ll tell you how many games it won.”
The mathematical formula looks like this:
Exp. WP% = [RS]^2 / ([RS]^2 + [RA]^2)
Bill James first published the formula in the early 1980s. Today, it is widely accepted that the most accurate exponent is 1.83, not 2.
Therefore, the current formula looks like this:
Exp. WP% = [RS]^1.83 / ([RS]^1.83 + [RA]^1.83)
If you know a club's expected win percentage, you can calculate the number of games it can expect to win by multiplying that percentage by the number of games played. For example, in the 2013 regular season, the New York Yankees scored 650 runs and allowed 671. Plugging those figures into the formula yields a win percentage of .4856.
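For readers who prefer to see the arithmetic spelled out, here is a minimal Python sketch of the formula above. The function name `expected_win_pct` and its parameter names are my own choices for illustration; only the formula, the 1.83 exponent, and the Yankees' run totals come from the article.

```python
def expected_win_pct(runs_scored: float, runs_allowed: float,
                     exponent: float = 1.83) -> float:
    """Pythagorean expected win percentage: RS^x / (RS^x + RA^x)."""
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals cannot be negative")
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    if rs + ra == 0:
        raise ValueError("at least one run total must be positive")
    return rs / (rs + ra)


# 2013 New York Yankees: 650 runs scored, 671 runs allowed
yankees_2013 = expected_win_pct(650, 671)
print(f"Expected win percentage: {yankees_2013:.3f}")  # prints 0.485
```

A club that scores exactly as many runs as it allows comes out at .500, and the exponent controls how quickly a run surplus or deficit pulls the figure away from that midpoint.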
Bill James is widely considered the founder of baseball statistical analysis and is credited with coining the term “sabermetrics.” Much of his work is found in his Baseball Abstract publications, and he currently writes for the Hardball Times and consults with the Boston Red Sox.
Clay Davenport and the team at Baseball Prospectus have refined the exponent even further. In their model, the exponent changes or “floats” based on the run environment (i.e. home park) a club plays in.
Returning to the Yankees example: over a 162-game Major League Baseball [MLB] schedule, a .4856 expected win percentage means the 2013 New York Yankees should have won 78.67 games. Expected Wins typically falls within about three games of a club’s actual win total. Therefore, under these conditions, the New York Yankees should have won between 76 and 82 games.
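The step from win percentage to Expected Wins and the three-game range can be sketched in a few lines of Python. This is only an illustration: the helper names `expected_wins` and `expected_win_range` are hypothetical, and rounding the range endpoints to whole games is my assumption about how the 76-to-82 figure was reached.

```python
def expected_wins(win_pct: float, games: int = 162) -> float:
    """Expected Wins = expected win percentage x games played."""
    return win_pct * games


def expected_win_range(wins: float, margin: float = 3.0) -> tuple[int, int]:
    """Whole-game range of +/- margin around Expected Wins (assumed rounding)."""
    return round(wins - margin), round(wins + margin)


# Using the article's .4856 figure for the 2013 Yankees
wins = expected_wins(0.4856)
low, high = expected_win_range(wins)
print(f"Expected Wins: {wins:.2f}")                # prints 78.67
print(f"Expected Win Range: {low} to {high}")      # prints 76 to 82
```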
However, the Yankees actually won 85 games in 2013. That is more than six wins above their Expected Win calculation and three wins above the 82 wins at the top of the Expected Win Range. The 2013 Yankees make a great example because they show that Baseball’s Pythagorean Theorem is not an exact calculation. Deviations as large as this one are typically attributed to the quality of the club’s bullpen or the presence of clutch or timely hitting and pitching. The strength of the theorem lies in projecting wins based on the projected performances of the talent on the roster.
The goal, under this theorem, is to acquire talent that will either increase the number of runs scored by a club or decrease the number of runs it allows, leading to a greater projected win percentage and therefore a larger number of Expected Wins. While it should not be used as an exact measure of team talent, Pythagorean Win-Loss is a good and simple rule of thumb for determining whether a club is over- or underperforming.
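To make the roster-building logic concrete, here is a rough what-if sketch in Python. It starts from the 2013 Yankees' run totals and asks what happens if a front office adds 30 runs of offense or removes 30 runs allowed. The 30-run figure is purely hypothetical, chosen for illustration, and the function name `expected_wins` is my own.

```python
def expected_wins(runs_scored: float, runs_allowed: float,
                  games: int = 162, exponent: float = 1.83) -> float:
    """Expected Wins from run totals via the Pythagorean formula."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


HYPOTHETICAL_RUN_SWING = 30  # assumed value, not taken from any real transaction

baseline = expected_wins(650, 671)
better_offense = expected_wins(650 + HYPOTHETICAL_RUN_SWING, 671)
better_pitching = expected_wins(650, 671 - HYPOTHETICAL_RUN_SWING)

print(f"Baseline Expected Wins:       {baseline:.1f}")
print(f"With 30 more runs scored:     {better_offense:.1f} "
      f"(+{better_offense - baseline:.1f})")
print(f"With 30 fewer runs allowed:   {better_pitching:.1f} "
      f"(+{better_pitching - baseline:.1f})")
```

Either move lifts Expected Wins by a few games, which is the kind of back-of-the-envelope comparison the theorem is suited for.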
Quick Stat: In 2013, the Pythagorean Win-Loss theorem predicted the actual win totals of 18 of the 30 teams to within three games in either direction. Of the 12 teams that deviated by more than three games, three were four games off, three were five games off, and six were six or more games off their actual win totals. [See Table Below]
Note: ESPN uses 2 as the exponent when calculating Exp. Win-Loss Record. I redid the calculations using 1.83 as the exponent.
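If you want to see how much the choice of exponent matters, the short Python sketch below runs the 2013 Yankees through both versions of the formula. The function name and output layout are my own; the two exponents and the run totals are the ones discussed in this article.

```python
def expected_win_pct(runs_scored: float, runs_allowed: float,
                     exponent: float) -> float:
    """Pythagorean expected win percentage for a given exponent."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


GAMES = 162
results = {}
for exponent in (2.0, 1.83):
    pct = expected_win_pct(650, 671, exponent)
    results[exponent] = pct
    print(f"Exponent {exponent}: win pct {pct:.4f}, "
          f"Expected Wins {pct * GAMES:.1f}")
```

For a club this close to .500 the two exponents differ by only a fraction of a win. The gap widens for teams with lopsided run differentials.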