Saturday, May 27, 2017

Web scraping with R

Since this is a football blog written by a self-professed stats nerd, I have to write a little about the progress I'm making with respect to my efforts to perform web-scraping. Given my limited ability to program using HTML and something called "CSS," it has been a bit of an uphill battle... not unlike my social life has been for the last 31 years... but I digress.

This is code that I found online:

#Loading the rvest package
library('rvest')

#Specifying the url for desired website to be scraped
url <- 'http://www.imdb.com/search/title?count=100&release_date=2016,2016&title_type=feature'

#Reading the HTML code from the website
webpage <- read_html(url)

#Using CSS selectors to scrape the rankings section
rank_data_html <- html_nodes(webpage,'.text-primary')

#Converting the ranking data to text
rank_data <- html_text(rank_data_html)

#Let's have a look at the rankings
head(rank_data)
 

The MOST important part about all this code is figuring out what the CSS selector is. Some folks have developed web extensions for Chrome or Firefox. I went about it in a couple of different ways:
1. In Mozilla Firefox, hit Ctrl + Shift + I.
2. Scroll through things and hover over the code, and see what it's pointing at.
3. Figure out the CSS selector and plug that into the html_nodes call.

The slightly easier way is to...
1. Right click on something on a webpage and click "Inspect Element."
2. This should bring you to the inspector at the line you want.
3. Right click on the line of code.
4. Choose Copy --> CSS Selector.
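To show where the copied selector ends up, here is a minimal sketch. The HTML string is a made-up stand-in for a real page (the class names and movie titles are assumptions for illustration only), so it runs without hitting any website.

```r
# A minimal sketch: the HTML below is a made-up stand-in for a real page,
# so the class names and titles are assumptions for illustration only.
library(rvest)

page_source <- '<html><body>
  <div class="item"><span class="text-primary">1.</span> <a class="title">First Movie</a></div>
  <div class="item"><span class="text-primary">2.</span> <a class="title">Second Movie</a></div>
</body></html>'

# read_html() accepts a literal HTML string as well as a URL
webpage <- read_html(page_source)

# Paste the selector copied from the inspector here
rank_selector <- ".text-primary"
rank_text <- html_text(html_nodes(webpage, rank_selector))

# Strip everything that is not a digit, then convert to integers
rank_data <- as.integer(gsub("[^0-9]", "", rank_text))
print(rank_data)
```

Selectors copied from the inspector tend to be long and very specific to one element, so it is usually worth trimming them down to the class or id that all the items you want have in common.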

Your web scraping pain is just a tiny bit less.

You're welcome.

Saturday, May 20, 2017

Looking forward

Now that I have exactly 0 followers and 29 page views of my blog (at least 15 of them are me, so that leaves about 14 of them being attributable to my girlfriend), I am planning on outlining the future of this blog. For one, I'd like to continue to bring my evidence-free and technique-free musings on college and professional football one week at a time. So in the coming weeks, I hope to recap a variety of the Spring games that were played. My hope is to review the following:

1. Stanford, a spring game that I actually attended
2. University of Southern California
3. University of Oklahoma
4. Oklahoma State University
5. University of Alabama
6. University of Oregon
7. University of Michigan 

Separately, though, given my extensive interest in statistical methods, I'd like this blog to be an incubator for some of my ideas on machine learning and statistical methodology that I may then be able to use in my real job.

My first idea is to address the fact that current sports statistics are woefully unadjusted. With respect to quarterback passer rating (a combination of completions/attempt, yards/attempt, TD/attempt, and interceptions/attempt), passing yards, etc., these numbers account for neither the talent of the secondary nor the talent of your target. If you put anybody with at least one arm and two legs in the shotgun with Hunter Renfrow, Mike Williams, and Deon Cain as your receivers, he'll look pretty decent. In fact, I could probably get a couple completions, and I'm 5'9" 143lbs and haven't thrown a football since middle school PE class (circa 1996).
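For reference, here is a sketch of the standard NFL passer rating formula, which makes the "unadjusted" point concrete: the only inputs are the quarterback's own counting stats. The function name and the example stat line are made up for illustration.

```r
# Sketch of the standard NFL passer rating formula.
# Each of the four components is clamped to the range [0, 2.375].
passer_rating <- function(completions, attempts, yards, touchdowns, interceptions) {
  clamp <- function(x) max(0, min(x, 2.375))
  comp_part <- clamp((completions / attempts - 0.3) * 5)
  yard_part <- clamp((yards / attempts - 3) * 0.25)
  td_part   <- clamp((touchdowns / attempts) * 20)
  int_part  <- clamp(2.375 - (interceptions / attempts) * 25)
  (comp_part + yard_part + td_part + int_part) / 6 * 100
}

# Hypothetical stat line: 20 of 30 for 250 yards, 2 TD, 1 INT
print(passer_rating(20, 30, 250, 2, 1))
```

Nothing in those arguments knows who was in coverage or who was running the route, which is exactly the gap I'd like to fill.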

Instead, I propose the following:
Part A: My thought is to use an Elo system to arrive at a rating for each player over the last several years and to use this data to predict passing yards, rushing yards, kick returns, first downs, 3rd down conversions, touchdowns, turnovers, sacks, and tackles. This involves:
1. Web scraping a variety of websites to evaluate play-by-play matchups: receivers vs corners/safeties, running backs + O-line vs. D-line, quarterbacks vs. corner/safeties, quarterbacks vs. linebackers/defensive ends.
2. Introducing the effect of offensive coordinators and defensive coordinators.
3. Finally, including down and yardage.
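As a starting point for Part A, here is a minimal sketch of the classic Elo update applied to a single matchup. The starting rating of 1500, the K-factor of 32, and the idea of scoring a play as a "win" for the receiver are all assumptions for illustration; how to score a play is the part that still needs real thought.

```r
# Classic Elo: expected score for player A against player B
elo_expected <- function(rating_a, rating_b) {
  1 / (1 + 10^((rating_b - rating_a) / 400))
}

# Update both ratings after one matchup.
# score_a is 1 if A "won" the play, 0 if A lost, 0.5 for a draw.
# k = 32 is a common default, assumed here rather than tuned.
elo_update <- function(rating_a, rating_b, score_a, k = 32) {
  delta <- k * (score_a - elo_expected(rating_a, rating_b))
  c(a = rating_a + delta, b = rating_b - delta)
}

# Hypothetical play: an evenly rated receiver (a) beats a corner (b)
print(elo_update(1500, 1500, score_a = 1))
```

Coordinator effects and down and yardage would then come in as adjustments to the expected score rather than as separate ratings, though that is only one way to set it up.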

Part B: The second part would be to employ an ensemble method to predict the final score of the games based on the passing yards, rushing yards, kick return yardage, first downs, 3rd down conversions, touchdowns, turnovers, sacks, and tackles. Off the top of my head, I'm thinking about using LASSO or Ridge Regression on linear regression, partial least squares, and a bootstrap regression technique to predict the actual score, though I'm open to other ideas. There's the possibility of using a LASSO on Poisson to try to predict the count of TDs and Field Goals.
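To make the penalized regression piece a little more concrete, here is a rough sketch of ridge regression using its closed-form solution in base R. The data are simulated, the column names and "true" coefficients are invented, and lambda = 10 is an arbitrary choice; in practice a package with cross-validation would pick the penalty. LASSO has no closed form, so it is not shown here.

```r
# Closed-form ridge regression: beta = (X'X + lambda * I)^-1 X'y
# Assumes X is centered and scaled and y is centered (so no intercept).
ridge_fit <- function(X, y, lambda) {
  p <- ncol(X)
  solve(crossprod(X) + lambda * diag(p), crossprod(X, y))
}

# Simulated data only: these are NOT real game statistics
set.seed(1)
n <- 50
X <- scale(matrix(rnorm(n * 3), n, 3))
colnames(X) <- c("pass_yds", "rush_yds", "turnovers")
y <- drop(X %*% c(3, 2, -4) + rnorm(n))
y <- y - mean(y)

beta_ols   <- ridge_fit(X, y, lambda = 0)   # lambda = 0 is ordinary least squares
beta_ridge <- ridge_fit(X, y, lambda = 10)  # the penalty shrinks the coefficients
print(cbind(ols = drop(beta_ols), ridge = drop(beta_ridge)))
```

An ensemble in the sense I have in mind would average the predictions from several fits like these rather than trusting any single one.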

Any other ideas out there?

Sunday, May 14, 2017

NFL Draft Analysis: Part 3 -- Running Backs

This year's running back class was full of talent. Full like Blake_201 after a 4000 calorie tray of waffles, eggs, bacon, and hashbrowns. Look at the size of that food baby...

Anyway. Odd penchants for binge eating aside, we're talking about running the football. There are a few really nice things about the way the modern game of football has developed, and the running backs in the current draft class fit exceedingly well into a variety of schemes. For one, as the game has become increasingly pass-oriented, there has been a shift towards nickel defenses (that's a defensive backfield with 5 backs; the 5th guy being the nickelback). If you have 5 defensive backs, then there will either be 3 down linemen and 3 linebackers, or you have 4 down linemen and 2 linebackers. In either situation, there are 6 defensive players in the box, and that means the offense may be able to take advantage of double teams in the run game. The other way to look at this is that a running back who can catch out of the backfield or can go into the slot will easily be able to step in and augment the pass game.

Christian McCaffrey (Stanford, RB) Christian McCaffrey was one of my favorite players of the entire college football season, and is the player I'm most excited to watch play in the NFL. A friend who used to play wide receiver on the Stanford Football team has only great things to say about the kid's work ethic, attitude, and leadership. These are some of the qualities that are sometimes overlooked as we found out when some folks plagued by off-the-field antics still got drafted, though perhaps not quite as high as they otherwise would have been.

Much has been made of McCaffrey's speed (40yd dash @4.48), but in a flat out horse race he isn't quite as fast as the Panthers' 2nd round pick, Curtis Samuel (40yd dash @4.31). He's not even close to the speed that Marvin Bracy (60m @6.48 and 100m @9.93) is potentially bringing to the Panthers as an undrafted free agent. As a comparison, Christian Coleman (Tennessee) ran the 40yd dash in 4.12, which is as fast as the legend of Bo Jackson. Coleman's 60m best is 6.45 and 100m is 9.95. Moreover, there have been some concerns about McCaffrey's durability given his size (5'11" and 202lbs), strength (10 reps of 225), and his workload at Stanford: he averaged 24 carries per game in his record breaking 2015 season (3,864 all-purpose yards) and 23 carries per game in 2016. I disagree strongly. Bench press is something for the linemen and maybe the linebackers. Excepting Heisman trophy style stiff arms and pass-protection, there aren't too many functional reasons for a tailback to have exceptional upper body pushing strength. Additionally, I suspect that nobody was really thinking about drafting McCaffrey to be a part of any sort of 6-man (blocking 4 down linemen and 2 of 3 linebackers in a 4-3 scheme) or 7-man (accounting for 4 down linemen and 3 linebackers necessitating full back and tail back pass pro assignments) pass protection scheme, and in that case his size is somewhat less important. I submit that a better strength test for running backs might be any combination of the following: average of 10 consecutive vertical leaps, average of 10 consecutive 40yd dashes, repping out 225lbs full-squat, and maximum pull-ups. I specifically suggest the full-squat, because generating force (not power) at acute angles of knee flexion is probably the most similar to running through contact.

Christian McCaffrey has two qualities that make him really stand out: he's nimble (AF as the kids say these days) and he has great rhythm. Watch him doing position drills, and you'll notice that he is incredibly light on his feet with good knees and really great foot speed.



This contributes greatly to his ability to change directions as evidenced by his incredible 3-cone drill (6.57 seconds) and 60yd shuttle (11.03 seconds): McCaffrey was far-and-away the top performer in both of these. By comparison, Joe Williams, who ran 4.41 in the 40, posted a time of 7.19 seconds in the 3-cone drill. This is infinitely more important than straight line speed. Straight line speed is of value when the back is in the open field, and in that situation your running back is trying to outrun a linebacker or safety. This matchup is less about speed than it is about anticipation and making the defensive player miss. The difference between a 4.3 and a 4.48 in the 40 is negligible.



In each of these plays, the element that makes Christian McCaffrey so devastating is his elusiveness. It's the quick stutter stepping as he waits for the defensive player to commit before turning on a dime. Once he's out in the open field, this kind of running is what you would expect of any half-decent running back. Getting the defenders to miss. That's something special. It'll be harder at the top level when very fit linebackers will be able to move and make arm tackles that bring down a somewhat undersized running back, but only time will tell how it plays out.

Finally, McCaffrey's ability to catch out of the backfield gives him the opportunity to be an additional weapon for the Panthers who appear to run a variety of QB counter, QB power-read, and QB Buck Sweep. With Cam Newton, an athletic running quarterback who was the key feature of the Auburn QB Power-Read (rushing for nearly 1500 yards and 20 touchdowns), I'm excited to see how they take advantage of McCaffrey's versatility. In a Buck Sweep, the linebackers have to go with the threat of the pass to McCaffrey who is motioning out allowing the pulling guards to get to the second level to block for Newton. This is going to be something.

Leonard Fournette (LSU, RB) Fournette is a traditional workhorse running back. He is someone who can carry the ball through contact and has the physical size to withstand a beating. Allegedly. With the nagging ankle injuries, no amount of size is going to fix that, though, and this is one of the reasons I am not as big a fan of Fournette as I am of McCaffrey. With ankle issues, will he still be able to cut and spin his way between the tackles running it up the gut? Or bouncing it outside? Having had a number of ankle issues myself when I was younger, the feeling of instability running the curve in a 200m or worse the seemingly interminable turns on the indoor track gives me some pause when considering him as the top running back in this draft class.

The ankle issues and potentially some weight problems represent two of the largest problems that teams faced in evaluating his draftability. Some people like to see that he was able to drop 12 pounds between the NFL combine (March 1st: Medical Examinations for Group 3: running backs) and the LSU pro-day (April 5th). That's 12 pounds in approximately 5 weeks; assuming that it's all fat (and it probably isn't), then that would be a calorie deficit of 1200 per day. This shows great discipline, but my question is why didn't he show the discipline going into the combine since that had the potential to be a major concern? With a lot of money on the line, most people would be motivated to lose weight. Just look at The Biggest Loser. The problem with this rapid weight loss is the resulting metabolic derangement and the tendency to regain all the weight. This is not to say that Leonard Fournette's weight loss is bad. It's just to say that if weight is an issue, then yo-yoing is not the solution. More so than for The Biggest Loser contestants, this may be more an issue of discipline.
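For anyone checking my math on the calorie deficit, here is the arithmetic spelled out. It leans on the common rule of thumb of roughly 3,500 kcal per pound of fat, which is an approximation I am assuming rather than anything measured for Fournette.

```r
# Dates taken from the paragraph above
combine_date <- as.Date("2017-03-01")
pro_day_date <- as.Date("2017-04-05")
days_elapsed <- as.numeric(pro_day_date - combine_date)

pounds_lost <- 12
kcal_per_pound_fat <- 3500  # rule-of-thumb assumption, not a measured value

# Average daily deficit if every pound lost were fat
daily_deficit <- pounds_lost * kcal_per_pound_fat / days_elapsed
print(c(days = days_elapsed, kcal_per_day = daily_deficit))
```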

Despite these issues, Fournette's game tape is excellent. He has incredible speed for such a large guy (240lbs, 4.51 in the 40yd dash), though his explosiveness left something to be desired with a meager 28.5" in the vertical jump.
Fournette's ability to cut is solid, but it's really his ability to run through contact and force missed tackles (or rather run through defenders) that makes him look a lot like Marshawn Lynch. Whether he'll have the same success with this strategy in the NFL is up in the air. NFL linebackers won't be pushed over easily, though the violence with which Fournette runs may force some defenders to lie down and play dead.



Samaje Perine (Oklahoma, RB) I thought about discussing Dalvin Cook (another extraordinarily elusive running back who is great at running through contact and forcing missed tackles), but I wanted to highlight a running back who can run up the gut but also may be great when it comes to pass pro. In a pass oriented game, the fact that Samaje Perine is bringing you both of these elements is critical. Here's a guy who is 230 pounds and can bench press 225lbs, i.e. his weight, 30 times. As discussed above, bench press is a lot more important for a lineman than a running back, but if you're tasked with slowing down an edge rusher, then it may be slightly more important. This may be the guy who can actually slow down a Myles Garrett or Jonathan Allen.



Whether he'll be able to run in a typical zone scheme is uncertain especially if he's trying to bend it back or bounce outside against the nimbler linebackers that he'll be encountering in the NFL (especially in comparison to what he saw in the Big-12--Sorry, guys), but he is definitely a guy who can drive the pile and carry the ball in a downhill power scheme. Looking forward to seeing how he develops!

Thursday, May 4, 2017

NFL Draft Analysis: Part 2 -- Quarterbacks

In part 2 of my (somewhat late: my girlfriend was sick of hearing about football and suggested that I start a blog recently) NFL draft analysis, we're going to look at the winners and losers of the quarterback class.

One of the challenges of quarterbacks transitioning from college is that the spread offense (or its logical progression: the run-and-shoot or air raid) and the option offense don't work quite as well in the NFL on account of the size, speed, and smarts of the NFL defensive players. As a result, many extraordinarily successful college quarterbacks, who dominated by virtue of their incredible athleticism, have faced significant challenges in the NFL: Tim Tebow (Florida), RG3 (Baylor), and Johnny Manziel (Texas A&M) just to name a few recent Heisman winners. The modern successful NFL quarterback needs to have excellent accuracy, impeccable timing, and solid decision making. While the pundits and talking heads continue to obsess over athleticism and arm strength, it is pocket presence, field vision, and accuracy that are almost definitely more important.

I'd like to talk a little about this year's draft class. Later this spring, I'll be reviewing a variety of Spring Games in which we will also check out next year's draft eligible quarterbacks.

Deshaun Watson, QB (Clemson) As the quarterback of the recently crowned College Football Playoff National Champion Clemson Tigers who also played in the previous year's championship game, Deshaun Watson probably has the best pedigree of the bunch. There aren't many players who have demonstrated the ability to handle pressure on the big stage. The game winning drive in the most recent CFP championship game was a thing of beauty; however, the threat of the read-sweep/QB power being augmented by the fake-toss/QB counter was not only a show of unique athleticism on the part of Deshaun Watson but also brilliant preparation by Dabo Swinney.



However, is he worth trading this year's first round pick and next year's first round pick to nab him as the 12th overall? As demonstrated by the 1:2 TD to interception ratio and modest 64% completions (admittedly one of his worse games this year against a talented secondary), there may be two issues with his passing game. Firstly, his decision making in throwing into double coverage (Cover-1/man with the deep safety helping out) is a mild problem. But more importantly, there are some moderate issues with accuracy. These accuracy issues appear slightly more pronounced when throwing to his left as evidenced by his interception reel below. The leftward thrown interceptions seem to be more commonly overthrown or off target while the rightward thrown interceptions look more like they are just thrown into coverage (bad decision).



Interestingly, though, these problems are less prominent at the Combine.


What it comes down to is the challenge that running quarterbacks who rely on the inside/outside zone read schemes and power-read concepts face when entering the NFL. On the one hand, there's Chip Kelly, who brought the Oregon Ducks to prominence with the inside/outside zone read, experiencing severe difficulties with the 49ers; on the other side, there's Carolina's varying success with the QB Counter Read, Power Read/Inverted Veer, and QB Buck Sweep: 15-1 in 2015 vs. 6-10 in 2016. While Deshaun Watson is a terrific athlete, only time will tell if he will be able to develop as a pocket passer and whether the read-option will be effective for the Houston Texans. I specifically mention read-option as compared to the zone-read (note Chip Kelly's differentiation between Zone-Read and Read-Option), because the athleticism and strength of the NFL defensive linemen and outside linebackers is such that reading the defensive end or outside linebacker may not be an effective means of getting him out of the play.


Pat Mahomes II, QB (Texas Tech) When it comes to Pat Mahomes, people have a tendency to use terms like "gunslinger" and "cowboy." While I do like cowboy boots (I'm wearing a pair of Lucchese ostrich boots this very second), I don't know how well it pays off at the professional level where the matchups really are not as favorable as they were in college. Especially when college was in the "defense optional" Big-12: Oklahoma vs. Texas Tech was 66 - 59. That's pretty close to the Oklahoma vs. Texas Tech basketball game (69 - 77). In that game, Mahomes threw for 734 yards, and if you watch the game tape, he looked exceedingly good rolling out of the pocket to the left, turning quickly, and throwing both intermediate and long balls. The ease with which he opens up the left hip to throw left even under pressure is very impressive. On top of that his accuracy is excellent. Some of this probably comes from being a pitcher with a 96mph fastball in high school who was also clocked at 62mph to beat David Carr (57mph) in a throw off.



The thing I don't like as much about him is that he's coming out of an air raid offense that focuses less on play calling and more on pace and matchups. This eliminates a lot of the tactical cognitive component that is so necessary in the NFL. While it's apparent that he has the physical talents, who knows how he'll develop as a play caller in the League? He's definitely not the next Andrew Luck (Stanford), who was a 3x pro-bowler in his first 3 years and took the Colts from 2-14 to back-to-back-to-back 11 win regular seasons. It's obvious that the Kansas City Chiefs felt that his coachability and the opportunity to develop under the wing of Alex Smith overcame this deficiency, though I find it exceedingly curious that the Chiefs picked him 10th overall by trading their 2017 first and third round picks and their 2018 first round pick. I had him on my board as a 2nd rounder and likely the best long term developmental prospect.


Mitchell Trubisky, QB (UNC) No one has experienced as much hype in this year's NFL draft as Mitchell Trubisky. Epic poems in dactylic hexameter have been written about his athleticism and accuracy, and I believe as much of that as I do the Odyssey. Much like Odysseus faced Scylla and Charybdis, Mitchell Trubisky will be facing defensive linemen on one side and judgmental Chicago fans on the other. There was nobody in this draft who was worth trading the 3rd overall pick, a 3rd and 4th round pick, a 3rd round 2018 pick, naming rights to the owner's grandchildren, and all the brick and wood in Settlers of Catan... and the Bears did this for Mitchell Trubisky, who they almost definitely would have gotten unless someone else traded the 49ers for that spot and also had not watched any of the guy's game tape. This is also in the face of paying Mike Glennon a truckload of money: a stack of 100 bills is 2.6" x 6.14" x 0.43". Assuming that we're paying in $20 bills, that's $2,000 per stack and 9250 stacks, which comes out to 63,500 cubic inches, which is a 10+" tall stack of money in the bed of a 6.5' box Ford F150.
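The truckload math above can be checked in a few lines. The stack dimensions, bill denomination, and stack count come straight from the paragraph; the truck bed floor of roughly 78 by 65 inches is my own ballpark assumption for a 6.5' box, not a manufacturer spec.

```r
# Figures from the paragraph above
stack_volume_in3  <- 2.6 * 6.14 * 0.43  # one stack of 100 bills, in cubic inches
dollars_per_stack <- 100 * 20           # 100 bills at $20 each
n_stacks          <- 9250

total_dollars    <- n_stacks * dollars_per_stack
total_volume_in3 <- n_stacks * stack_volume_in3

# Ballpark bed floor for a 6.5' box (assumed, not an official dimension)
bed_length_in <- 78
bed_width_in  <- 65
pile_height_in <- total_volume_in3 / (bed_length_in * bed_width_in)

print(c(dollars = total_dollars,
        cubic_inches = round(total_volume_in3),
        pile_height_in = round(pile_height_in, 1)))
```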



Don't get me wrong. I do think his arm is good. His TD-to-interception ratio (30:6) speaks well to his decision making and accuracy. His size and athleticism are appropriate. Having said that, he has only ever started 13 games, and played in a handful more. He is still learning a nuanced and complicated game that has as much strategy as tactics. Indeed, when he was pressured in the Sun Bowl, he faltered again and again. This loss was against a Stanford team that was hobbled by Christian McCaffrey electing to sit out, and Keller Chryst getting injured.



Unfortunately, this is likely another case of one stellar game deciding a player's reputation. Playing against a very strong Florida State team (a team that made Michigan look pretty silly for 3 quarters), his passing was stellar (81% completion for 405 yards, 3 TD and no interceptions). No one will ever be able to take that away from him. But one good game doesn't make a #2 draft pick, and looking deeper into the FSU and Virginia Tech games, he had negative yards in both and took 3 and 2 sacks, respectively. In both games, it seems he either held onto the ball too long or was rattled when flushed out of the pocket.

On a separate note, I also don't like his face. He kinda looks like every kid who beat me up and took my lunch money in grade school. And middle school. Pretty much through medical school.


On the one hand, I think another year of ACC play would have told the world a lot more about his skillset. He would have been able to learn more about directing a team. He would have been able to develop his field vision. On the other hand, his draft stock could never possibly be any higher than the #2 pick, and the prospect of going into the 2018 draft against some stellar quarterbacks (a topic for my spring game analyses in the next series of posts) was no doubt daunting. The possibility of losing that by putting more football on tape or worse... getting injured (think: Jake Butt, Chad Kelly) would push anyone to declare early for the draft. Very clever, Team Trubisky. Very clever.


I think everyone is going to be watching the development of these 3 players with great interest over the next few years given how much draft-capital was expended.

Coming up on Draft Analysis...
NFL Draft Analysis: Part 3 -- Running backs