Before we Start


  • Use RStudio to write and run R programs.
  • Use install.packages() to install packages (libraries).
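
As a minimal sketch of the install step, assuming the package needed later is rvest (the package named in this lesson) and that a CRAN mirror is reachable from your machine:

```r
# Install rvest only if it is not already available, then load it.
# Assumption: a CRAN mirror is configured and reachable (RStudio sets one by default).
if (!requireNamespace("rvest", quietly = TRUE)) {
  install.packages("rvest")
}

# Attach the package so its functions can be called directly.
library(rvest)
```

Checking with requireNamespace() first avoids reinstalling the package every time the script runs.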

Creating a Webpage


  • A basic web page is an annotated text file
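
To illustrate the point above, here is a small sketch that writes a web page as ordinary text from R. The page content and the variable names (page_source, page_file) are made up for illustration.

```r
# A tiny web page written as plain text.
# The tags (<h1>, <p>, ...) are the annotations that tell the browser
# what each piece of text is.
page_source <- c(
  "<!DOCTYPE html>",
  "<html>",
  "  <head><title>My first page</title></head>",
  "  <body>",
  "    <h1>Hello</h1>",
  "    <p>This is a paragraph.</p>",
  "  </body>",
  "</html>"
)

# Save the text to a temporary .html file; opening it in a browser renders the page.
page_file <- tempfile(fileext = ".html")
writeLines(page_source, page_file)

# Reading the file back shows that it is still just text.
cat(readLines(page_file), sep = "\n")
```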

Introduction: What is web scraping?


  • Humans are good at categorizing information, computers not so much.
  • Often, data on a web site is not properly structured, making its extraction difficult.
  • Web scraping is the process of automating the extraction of data from web sites.

Web scraping using R and rvest


  • rvest is an R package that can be used to scrape content from the web.
  • With rvest, we can define what elements to scrape from a page.
  • Because the scraping is done in R, the scraped data can be analyzed in the same environment.
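
The points above can be sketched as follows. The HTML snippet, class names, and values are hypothetical and inlined so the example runs without network access; on a real site you would pass a URL to read_html() and choose selectors that match that page's structure.

```r
library(rvest)

# Hypothetical page content (not from a real site).
page_source <- '
<html><body>
  <ul>
    <li class="member"><span class="name">Ada</span> <span class="age">36</span></li>
    <li class="member"><span class="name">Grace</span> <span class="age">45</span></li>
    <li class="member"><span class="name">Alan</span> <span class="age">41</span></li>
  </ul>
</body></html>'

# Parse the HTML text into a document that rvest can query.
page <- read_html(page_source)

# Define which elements to scrape using CSS selectors.
name_nodes <- html_elements(page, ".member .name")
age_nodes  <- html_elements(page, ".member .age")

# Extract the text of each element; ages are converted to numbers.
member_names <- html_text2(name_nodes)
member_ages  <- as.numeric(html_text2(age_nodes))

# Collect the scraped values in a data frame and run a simple analysis.
members <- data.frame(name = member_names, age = member_ages,
                      stringsAsFactors = FALSE)
mean_age <- mean(members$age)
print(members)
print(mean_age)
```

Because the result is an ordinary data frame, any further analysis or plotting can continue in the same R session.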

Conclusion


  • Web scraping is, in general, legal and won’t get you into trouble.
  • There are a few things to be careful about: notably, don’t overwhelm a web server and don’t steal content.
  • Be nice. When in doubt, ask.
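
One simple way to avoid overwhelming a server is to pause between requests. The sketch below assumes a one-second delay and uses made-up URLs and a placeholder fetch_page() function; none of these come from the lesson, and a suitable delay depends on the site.

```r
# Hypothetical list of pages to visit; replace with pages you are allowed to scrape.
urls <- c("https://example.org/page1", "https://example.org/page2")

# Hypothetical placeholder for the real download step
# (for example, a call to rvest::read_html(url)).
fetch_page <- function(url) {
  paste("fetched", url)
}

# Assumed pause between requests, in seconds.
delay_seconds <- 1

results <- character(length(urls))
for (i in seq_along(urls)) {
  results[i] <- fetch_page(urls[i])
  # Wait before the next request so the server is not flooded.
  if (i < length(urls)) Sys.sleep(delay_seconds)
}
```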