Since this is a football blog written by a self-professed stats nerd, I have to write a little about the progress I'm making with web scraping. Given my limited grasp of HTML and something called "CSS," it has been a bit of an uphill battle... not unlike my social life for the last 31 years... but I digress.
This is code that I found online:
#Loading the rvest package
library('rvest')
#Specifying the url for the desired website to be scraped
url <- 'http://www.imdb.com/search/title?count=100&release_date=2016,2016&title_type=feature'
#Reading the HTML code from the website
webpage <- read_html(url)
#Using CSS selectors to scrape the rankings section
rank_data_html <- html_nodes(webpage,'.text-primary')
#Converting the ranking data to text
rank_data <- html_text(rank_data_html)
#Let's have a look at the rankings
head(rank_data)
The MOST important part about all this code is figuring out which CSS selector points at the data you want. Some folks have developed browser extensions for Chrome or Firefox to help with this. I went about it in a couple different ways:
1. In Mozilla Firefox, hit Ctrl + Shift + I.
2. Scroll through the code in the inspector, hover over it, and see what it's pointing at on the page.
3. Figure out the CSS selector and plug that into the html_nodes call.
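To make step 3 a little more concrete, here is a minimal sketch of plugging a selector into html_nodes. The HTML snippet and movie titles are made up so the example runs without hitting a real website; only the '.text-primary' class comes from the code above.

```r
library(rvest)

# Made-up HTML standing in for a real page (an assumption, so this runs offline)
page <- read_html('<html><body>
  <div class="lister-item"><span class="text-primary">1.</span> First Movie</div>
  <div class="lister-item"><span class="text-primary">2.</span> Second Movie</div>
</body></html>')

# Plug the CSS selector you found in the inspector into html_nodes()
rank_nodes <- html_nodes(page, ".text-primary")

# Pull the text out of every element the selector matched
ranks <- html_text(rank_nodes)
ranks
```

If the selector is right, you get one entry per matching element; if it comes back empty, the selector is probably off.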
The slightly easier way is to...
1. Right-click on something on a webpage and click "Inspect Element."
2. This should bring you to the inspector at the line you want.
3. Right-click on that line of code.
4. Choose Copy --> CSS Selector.
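One thing to watch for, as far as I can tell: the selector the browser copies tends to point at the single element you clicked, so you may need to trim it down to grab the whole list. Here is a rough sketch of the difference. The HTML, the "teams" id, and both selectors are made up for illustration and did not come from a real page.

```r
library(rvest)

# Made-up HTML (an assumption, so the example runs offline)
page <- read_html('<html><body><ul id="teams">
  <li class="team">Packers</li>
  <li class="team">Bears</li>
  <li class="team">Vikings</li>
</ul></body></html>')

# A selector in the style the inspector copies: it targets one specific element
copied_selector <- "#teams > li:nth-child(2)"
one_team <- html_text(html_nodes(page, copied_selector))

# A trimmed-down selector that matches every item in the list
all_teams <- html_text(html_nodes(page, "#teams > li.team"))

one_team
all_teams
```

The copied selector is still a handy starting point; you just have to decide whether you want one element or all of them.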
Your web scraping pain is just a tiny bit less.
You're welcome.
 