Lab Scrape Emails
Update:
It appears that if the Ridgewater URL receives too many requests with the same User-Agent, its anti-bot measures kick in and the server responds with a 403 Forbidden error.
Websites that describe how to overcome this:
How to customize Your User-Agent with Python Requests
How to Effectively Use User Agents for Web Scraping
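For context, one likely reason a script gets singled out is that Python's HTTP libraries identify themselves with a fixed, non-browser User-Agent unless you override it. The minimal sketch below assumes the standard library's urllib.request (your lab code may use the requests library instead, whose default header has the form python-requests/<version>). It prints the User-Agent that urllib sends by default; default_user_agent is a hypothetical helper name.

```python
import urllib.request


def default_user_agent() -> str:
    """Return the User-Agent that urllib sends when you don't set one yourself."""
    opener = urllib.request.build_opener()
    # OpenerDirector keeps its default headers as a list of (name, value) pairs.
    return dict(opener.addheaders)["User-agent"]


if __name__ == "__main__":
    # Prints something like Python-urllib/3.12, depending on your Python version.
    print(default_user_agent())
```

Every run of an unmodified script sends this same string, which appears to be the kind of repeated User-Agent the site's anti-bot measures react to.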
The fix is to send your own browser's User-Agent string with your requests instead of the default one.
To find your User-Agent string, use your browser's console:
Open the developer tools in Google Chrome, Microsoft Edge, Mozilla Firefox, Safari, or another browser. Press F12 or Ctrl+Shift+I on Windows/Linux, or Cmd (⌘)+Option (⌥)+I on macOS, then switch to the Console tab.
Type navigator.userAgent at the > prompt and press Enter (or Ctrl+Enter). The console returns a string containing your browser's User-Agent.
Copy that string and use it in your Python code.
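Once you have the string, a minimal sketch of using it in Python might look like the following. It assumes the standard library's urllib.request; USER_AGENT, build_request, and fetch_html are hypothetical names, and the User-Agent shown is only a placeholder for the value your own console returned. The Ridgewater URL is not given here, so pass in the page address from the lab.

```python
import urllib.error
import urllib.request

# Placeholder only: replace with the string navigator.userAgent returned for you.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/124.0.0.0 Safari/537.36"
)


def build_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """Create a GET request that sends a browser User-Agent instead of urllib's default."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url: str, user_agent: str = USER_AGENT) -> str:
    """Download a page as text; print a note and re-raise if the server answers 403."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            # Use the charset the server declares, falling back to UTF-8.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except urllib.error.HTTPError as err:
        if err.code == 403:
            print("403 Forbidden: the site is still blocking this request.")
        raise
```

If your lab code uses the requests library instead, the same header is passed as requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=10).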
This lab is to be done on your own.
Submit all these files to the D2L dropbox:
The image below shows the start of an example run of the program.
The image below shows the end of an example run of the program.