Web Scraping Workshop

Web scraping is a method data scientists use to gather data from websites. This workshop introduces the basics of web scraping and reviews common web scraping methodologies. Although there are many ways to scrape data from websites, this workshop covers some of the most widely used Python libraries.


Software Configuration

 

To prepare, please follow the steps below to download the programs and we’ll teach you the rest!

1. Navigate to https://www.anaconda.com/download/ and download the Python 3.6 version of Anaconda. Click ‘run’ on the popup, then click through the setup window, keeping the default options. If the installer offers to install Microsoft Visual Studio Code, you can skip it; we will not be using it in this workshop.

python1.jpg

2. Open Anaconda Navigator from your Start menu. You can add it as a desktop shortcut if you like. Mac users can add Anaconda Navigator to the Applications folder.

python2.jpg

3. Click ‘launch’ under the Jupyter Notebook icon. It should open a tab in your default browser (Chrome, etc.).

python3.jpg

4. Now, navigate to DSI’s Web Scraping GitHub repository, https://github.com/dsiufl/Web_Scraping, and click Download ZIP. Find the .zip file on your computer and unzip it.

scrape1.jpg

5. In the file browser inside Jupyter, navigate to the unzipped folder and open the .ipynb file for the workshop you are attending, in this case Web Scraping. Open the student version (Web_Scraping_Student.ipynb) to code along with the instructor at the workshop, or the instructor version to see the completed workshop.

s.jpg

6. The notebook should now be open and look like this. Thanks for coming and enjoy the workshop!

s1.jpg
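If you prefer to do the unzip step from Python, or want to confirm that the notebooks ended up where Jupyter can find them, the following is a minimal sketch using only the standard library. The helper name `unzip_and_find_notebooks` and the file name `Web_Scraping-master.zip` in the Downloads folder are assumptions for illustration; use whatever name and location your browser actually saved.

```python
import sys
import zipfile
from pathlib import Path


def unzip_and_find_notebooks(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the .ipynb files found there.

    Hypothetical helper for illustration; it is not part of the workshop
    repository.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(str(zip_path)) as archive:
        archive.extractall(str(dest))
    # Search recursively, since the archive may unpack into a subfolder.
    return sorted(dest.rglob("*.ipynb"))


if __name__ == "__main__":
    # Confirm which Python version is running (the steps above install 3.6).
    print("Python version:", sys.version.split()[0])

    # Assumed download location and file name; adjust to match your computer.
    zip_path = Path.home() / "Downloads" / "Web_Scraping-master.zip"
    if zip_path.exists():
        target = zip_path.parent / "Web_Scraping"
        for notebook in unzip_and_find_notebooks(zip_path, target):
            print(notebook)
    else:
        print("ZIP file not found at", zip_path)
```

You can run this in a new Jupyter notebook cell or from a terminal. The printed paths show which folder to navigate to in the Jupyter file browser.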

Remember, don’t hesitate to ask questions if you need help, and come to the workshop early if you are having trouble downloading or installing any of the software so our instructors can assist you. Feel free to post on the Facebook event page or DSI Facebook group if you have any questions!