Before we start
- Use RStudio to write and run R programs.
- Use install.packages() to install packages (libraries).
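For example, the rvest package used in these notes can be installed and loaded as shown in this minimal sketch. The requireNamespace() check is an added convenience that skips the download when rvest is already installed.

```r
# Install rvest only if it is not already available, then load it.
if (!requireNamespace("rvest", quietly = TRUE)) {
  install.packages("rvest")
}
library(rvest)

# Print the installed version to confirm the package is available.
print(packageVersion("rvest"))
```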
Creating a web page
- A basic web page is an annotated text file: plain text marked up with HTML tags that tell the browser how to structure and display it.
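To make "annotated text file" concrete, here is a minimal sketch in R. The page content and title are invented for illustration. It writes a tiny HTML page to a temporary file as ordinary text, then parses it with rvest's read_html() and html_element() to read back the text inside individual tags (html_element() assumes rvest 1.0 or later).

```r
library(rvest)

# A basic web page: plain text annotated with HTML tags (hypothetical content).
page_source <- '
<html>
  <head><title>My first page</title></head>
  <body>
    <h1>Hello</h1>
    <p>This text is a <b>paragraph</b>.</p>
  </body>
</html>'

# Save it as an ordinary text file; a browser can open this .html file directly.
page_file <- tempfile(fileext = ".html")
writeLines(page_source, page_file)

# Parse the file and read back the text inside individual tags.
page <- read_html(page_file)
page_title <- html_text(html_element(page, "title"))
heading <- html_text(html_element(page, "h1"))
paragraph <- html_text(html_element(page, "p"))

print(page_title)
print(paragraph)
```

Note that the text of the paragraph includes the text of the nested <b> tag: the tags annotate the text without changing it.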
Introduction: What is web scraping?
- Humans are good at categorizing information; computers are not.
- Often, data on a web site is formatted for people to read rather than structured for programs, which makes extracting it difficult.
- Web scraping is the process of automating the extraction of data from web sites.
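As a small illustration of automated extraction, the sketch below uses rvest's html_table() to turn table markup into a data frame. The HTML snippet, city names and numbers are hypothetical, and the HTML is parsed from a string rather than downloaded, so the example runs offline.

```r
library(rvest)

# Hypothetical HTML table, as it might appear in a page's source.
html <- '
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Alpha</td><td>1200</td></tr>
  <tr><td>Beta</td><td>3400</td></tr>
</table>'

page <- read_html(html)

# html_table() converts the <table> markup into a data frame (a tibble);
# the <th> row becomes the column names and numeric text is converted.
cities <- html_table(html_element(page, "table"))
print(cities)
```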
Web scraping using R and rvest
- rvest is an R package that can be used to scrape content from the web.
- With rvest, we can define which elements of a page to scrape, for example by using CSS selectors.
- Because the data is scraped in R, it can be analysed in R directly, without first exporting it to another tool.
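The sketch below puts these three points together, assuming a hypothetical page where each book sits in a div with class "book". CSS selectors passed to html_elements() and html_element() define what to scrape, and the result is analysed with base R straight away. The markup, class names, titles and prices are invented for illustration.

```r
library(rvest)

# Hypothetical listing; the class names "book" and "price" are made up.
html <- '
<div class="book"><h2>Book A</h2><span class="price">9.99</span></div>
<div class="book"><h2>Book B</h2><span class="price">4.50</span></div>
<div class="book"><h2>Book C</h2><span class="price">12.00</span></div>'

page <- read_html(html)

# Define what to scrape with CSS selectors: one node per book,
# then the title and the price inside each book.
books <- html_elements(page, "div.book")
scraped <- data.frame(
  title = html_text(html_element(books, "h2")),
  price = as.numeric(html_text(html_element(books, "span.price")))
)
print(scraped)

# The data is already in R, so analysis can follow immediately.
print(mean(scraped$price))
print(scraped[scraped$price > 5, ])
```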
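Conclusion
- In general, web scraping is legal and unlikely to get you into trouble, as long as you take the precautions below.
- There are a few things to be careful about, notably: don't overwhelm a web server with too many requests, and don't steal content (for example, by republishing it without permission).
- Be nice. When in doubt, ask the site owner.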
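One way to avoid overwhelming a server is to request pages one at a time with a pause in between. The function polite_read() below is a hypothetical helper, not part of rvest; it uses base R's Sys.sleep() for the pause, and the delay values are arbitrary assumptions. The usage example reads local files, so it runs without a network connection.

```r
library(rvest)

# Hypothetical helper: read pages one at a time, pausing between requests
# so the web server is not overwhelmed.
polite_read <- function(sources, delay = 2) {
  pages <- vector("list", length(sources))
  for (i in seq_along(sources)) {
    if (i > 1) Sys.sleep(delay)  # wait before every request after the first
    pages[[i]] <- read_html(sources[[i]])
  }
  pages
}

# Usage with two local files; with real URLs, choose a delay the site
# can comfortably handle.
files <- c(tempfile(fileext = ".html"), tempfile(fileext = ".html"))
writeLines("<html><body><p>Page one</p></body></html>", files[1])
writeLines("<html><body><p>Page two</p></body></html>", files[2])

pages <- polite_read(files, delay = 0.5)
texts <- vapply(pages, function(p) html_text(html_element(p, "p")), character(1))
print(texts)
```

Fetching sequentially is slower than fetching in parallel, but it keeps the load on the server predictable, which is the point of being nice.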