Nick Cassady


Month: March 2018


Processing Wikipedia Pages, Part 2

By Nick

  • Category: Global Timeline
  • Tags: global timeline, python, regex, scraping, scripting
  • 14 Mar

Before reading this, I recommend you read Part 1.

File Handling

Now that I have the Wikipedia pages downloaded, I need to parse through the text and markup to get just the information I need for my other project. The first step in parsing through all these files was to open every file in a specified ...
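As a rough illustration of the file-handling step described in this excerpt, here is a minimal sketch of reading every downloaded page from one folder. It is not the code from the post: the helper name `read_pages`, the flat folder layout, and the UTF-8 encoding are all assumptions made for the example.

```python
from pathlib import Path


def read_pages(directory):
    """Yield (file name, text) for every regular file in `directory`.

    Hypothetical helper: assumes the saved pages sit directly in one
    folder and are stored as UTF-8 text.
    """
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():  # skip subfolders
            # errors="replace" keeps one badly encoded page from stopping the run.
            yield path.name, path.read_text(encoding="utf-8", errors="replace")
```

A loop such as `for name, text in read_pages("pages"):` (where `"pages"` is a placeholder folder name) would then hand each page's text to whatever parsing comes next.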


Processing Wikipedia Pages, Part 1

By Nick

  • Category: Global Timeline
  • Tags: encoding, global timeline, python, scraping, scripting
  • 13 Mar

As part of a larger project (Global Timeline), I set out to write a script to scrape Wikipedia pages for events, dates, and locations for an initial set of data. There were several major steps involved, each bringing its own challenge.

Gathering Pages

Which Pages?

The first step I had to accomplish was to find ...

