Saturday, May 20, 2017

Looking forward

Now that I have exactly 0 followers and 29 page views of my blog (at least 15 of them are me, which leaves about 14 attributable to my girlfriend), I am planning on outlining the future of this blog. For one, I'd like to continue to bring my evidence-free and technique-free musings on college and professional football one week at a time. So in the coming weeks, I hope to recap a variety of the spring games that were played. My hope is to review the following:

1. Stanford, a spring game that I actually attended
2. University of Southern California
3. University of Oklahoma
4. Oklahoma State University
5. University of Alabama
6. University of Oregon
7. University of Michigan 

Separately, though, given my extensive interest in statistical methods, I'd like this blog to be an incubator for some of my ideas on machine learning and statistical methodology that I may then be able to use in my real job.

My first idea is to address the fact that current sports statistics are woefully unadjusted. Take quarterback passer rating (a combination of completions/attempt, yards/attempt, TD/attempt, and interceptions/attempt), passing yards, and so on: these numbers account for neither the talent of the opposing secondary nor the talent of the quarterback's targets. If you put anybody with at least one arm and two legs in the shotgun with Hunter Renfrow, Mike Williams, and Deon Cain as his receivers, he'll look pretty decent. In fact, I could probably get a couple of completions, and I'm 5'9", 143 lbs, and haven't thrown a football since middle school PE class (circa 1996).
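To make the "unadjusted" point concrete, here is a minimal sketch of the NFL version of passer rating (college football uses a different formula). The function name and argument names are my own for illustration. Notice that nothing about the defense or the receivers appears anywhere in the inputs.

```python
def _clamp(x, lo=0.0, hi=2.375):
    """Each component of the NFL rating is capped to the range [0, 2.375]."""
    return max(lo, min(hi, x))


def nfl_passer_rating(completions, attempts, yards, touchdowns, interceptions):
    """NFL passer rating from raw box-score totals (hypothetical helper name)."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    a = _clamp((completions / attempts - 0.3) * 5)      # completions/attempt
    b = _clamp((yards / attempts - 3) * 0.25)           # yards/attempt
    c = _clamp((touchdowns / attempts) * 20)            # TD/attempt
    d = _clamp(2.375 - (interceptions / attempts) * 25) # interceptions/attempt
    return (a + b + c + d) / 6 * 100


# Example line: 20 of 30 for 300 yards, 3 TDs, no interceptions
print(round(nfl_passer_rating(20, 30, 300, 3, 0), 1))
```

The same stat line produces the same rating whether it came against the best secondary in the country or the worst, which is exactly the problem.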

Instead, I propose the following:
Part A: My thought is to use an Elo system to arrive at a rating for each player over the last several years, and to use these ratings to predict passing yards, rushing yards, kick returns, first downs, 3rd down conversions, touchdowns, turnovers, sacks, and tackles. This involves:
1. Scraping a variety of websites to evaluate play-by-play matchups: receivers vs. corners/safeties, running backs + O-line vs. D-line, quarterbacks vs. corners/safeties, and quarterbacks vs. linebackers/defensive ends.
2. Introducing the effect of offensive coordinators and defensive coordinators.
3. Finally, including down and yardage.
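For anyone who hasn't seen Elo before, the core update is tiny. Below is a rough sketch of how a single matchup could be scored, not a finished design. The K-factor of 32 and the 400-point scale are just the usual chess conventions, and treating one play as a win or loss (say, a completion is a "win" for the receiver over the corner) is purely my assumption at this point.

```python
def expected_score(rating_a, rating_b):
    """Probability that A 'beats' B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=32.0):
    """Return updated (rating_a, rating_b) after one matchup.

    score_a is 1.0 if A won the matchup, 0.0 if A lost, 0.5 for a draw.
    k=32 is a chess convention, assumed here only for illustration.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Hypothetical matchup: a receiver and a corner both start at 1500,
# and the receiver wins the play.
receiver, corner = update_elo(1500.0, 1500.0, score_a=1.0)
print(receiver, corner)
```

Beating a highly rated opponent moves a rating more than beating a weak one, which is the adjustment that raw yardage totals lack.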

Part B: The second part would be to employ an ensemble method to predict the final score of each game based on the passing yards, rushing yards, kick return yardage, first downs, 3rd down conversions, touchdowns, turnovers, sacks, and tackles. Off the top of my head, I'm thinking about combining LASSO- or ridge-penalized linear regression, partial least squares, and a bootstrap regression technique to predict the actual score, though I'm open to other ideas. There's also the possibility of using a LASSO-penalized Poisson regression to try to predict the counts of TDs and field goals.
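To show where the Poisson idea would lead, here is a small sketch of only the last step: turning predicted TD and field goal rates into a score. It does not do any LASSO fitting. The rates would come from the fitted model, and the numbers below are made up. It also assumes TD and field goal counts are independent and that every TD is worth 7 points, both of which are simplifications.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def expected_points(td_rate, fg_rate, td_points=7, fg_points=3):
    """Expected score given the mean number of TDs and field goals."""
    return td_points * td_rate + fg_points * fg_rate


def score_distribution(td_rate, fg_rate, max_count=15, td_points=7, fg_points=3):
    """Approximate distribution over final scores.

    Assumes TD and FG counts are independent Poisson variables and
    truncates each count at max_count.
    """
    dist = {}
    for tds in range(max_count + 1):
        for fgs in range(max_count + 1):
            p = poisson_pmf(tds, td_rate) * poisson_pmf(fgs, fg_rate)
            points = td_points * tds + fg_points * fgs
            dist[points] = dist.get(points, 0.0) + p
    return dist


# Made-up rates: a team expected to score 3 TDs and 1.5 field goals
print(expected_points(3.0, 1.5))
dist = score_distribution(3.0, 1.5)
print(max(dist, key=dist.get))  # most likely single score under these assumptions
```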

Any other ideas out there?
