Jacob Kaplan-Moss

7 items tagged “scraping”

📌 Scraping Google Groups « Saturnboy

Tool (PHP) to scrape the archives out of Google Groups.

📌 Wrangler

“An interactive tool for data cleaning and transformation.” A fantastic tool for data cleanup. Watch the video; it’s like magic.

📌 Scraping for Journalism: A Guide for Collecting Data - ProPublica

Really not just “for journalism”: this series (five parts plus an intro) is a wonderful introduction to ways of extracting data from hard-to-process places. A perfect resource for those just starting down the scraping journey, but there are also some cool tricks for the experienced.

📌 MySpace cyber-bullying conviction tentatively dismissed - Los Angeles Times

The cyber-bullying aspect of this case is utterly uninteresting to me. However, the ruling contains this tidbit: Judge Wu finds that violating a site’s terms of service cannot be considered a crime. Big news for anyone doing data scraping!

📌 Boomerang: A bidirectional programming language for ad-hoc data

Really interesting: a language for writing bidirectional text-processing programs. Each program can transform data both “forwards” and “backwards.”
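To make “forwards” and “backwards” concrete, here is a rough sketch of the idea in plain Python. It is not Boomerang syntax, and the record layout (`name, year, color`) and the function names `get` and `put` are assumptions made up for illustration. Going forwards hides a field; going backwards merges an edited view into the original so the hidden field is restored.

```python
def get(source):
    """Forwards: turn 'name, year, color' lines into 'name, color' lines."""
    view_lines = []
    for line in source.splitlines():
        name, _year, color = [field.strip() for field in line.split(",")]
        view_lines.append(f"{name}, {color}")
    return "\n".join(view_lines)


def put(view, source):
    """Backwards: push an edited view into the source, restoring the year.

    The hidden year is looked up by name; names that are new in the view
    get the placeholder 'unknown' (an arbitrary choice for this sketch).
    """
    year_by_name = {}
    for line in source.splitlines():
        name, year, _color = [field.strip() for field in line.split(",")]
        year_by_name[name] = year

    source_lines = []
    for line in view.splitlines():
        name, color = [field.strip() for field in line.split(",")]
        source_lines.append(f"{name}, {year_by_name.get(name, 'unknown')}, {color}")
    return "\n".join(source_lines)


source = "widget, 2009, blue\ngadget, 2011, green"
view = get(source)                       # the year column is hidden
edited = view.replace("green", "red")    # edit the simplified view
updated = put(edited, source)            # the year column comes back
```

The useful property is that the two directions agree with each other: putting back an unedited view returns the original source, and reading a freshly updated source gives back exactly the view that was put in.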

📌 templatemaker - Google Code

Adrian’s “reverse template engine”: take a series of pages and construct a template that could have been used to generate them. Obviously extremely useful for data scraping.
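A minimal sketch of the “reverse template” idea, for the curious. This is not templatemaker’s own code or API: it swaps in Python’s standard `difflib.SequenceMatcher` to find the text that all pages share, and the names `HOLE`, `merge`, `learn`, and `extract` are hypothetical. Whatever the pages have in common becomes the template; whatever differs becomes a hole.

```python
import re
from difflib import SequenceMatcher
from functools import reduce

HOLE = "\x00"  # marks a variable region; assumed never to appear in the pages


def merge(template, sample):
    """Keep only the text shared by template and sample, with holes between."""
    matcher = SequenceMatcher(None, template, sample, autojunk=False)
    parts = []
    t_end = s_end = 0
    # The last block is a zero-length sentinel, which lets the same check
    # also catch differing text at the very end.
    for t_start, s_start, size in matcher.get_matching_blocks():
        if t_start > t_end or s_start > s_end:
            parts.append(HOLE)  # something differed before this shared block
        parts.append(template[t_start:t_start + size])
        t_end, s_end = t_start + size, s_start + size
    return "".join(parts)


def learn(samples):
    """Fold all samples into a single template string."""
    return reduce(merge, samples)


def extract(template, text):
    """Return the hole contents for text, or None if it does not fit."""
    pattern = "(.*?)".join(re.escape(part) for part in template.split(HOLE))
    match = re.fullmatch(pattern, text, flags=re.DOTALL)
    return match.groups() if match else None


pages = [
    "<h1>Alice</h1><p>age 30</p>",
    "<h1>Bob</h1><p>age 41</p>",
    "<h1>Carol</h1><p>age 7</p>",
]
template = learn(pages)
print(template.replace(HOLE, "{}"))  # <h1>{}</h1><p>age {}</p>
print(extract(template, "<h1>Dave</h1><p>age 52</p>"))  # ('Dave', '52')
```

A character-level diff like this can be fooled when the variable parts happen to share characters (two names that both contain “ar”, say), so treat it as an illustration of the concept rather than a robust scraper.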

📌 The ElementSoup Module ::: www.effbot.org

Fan-frickin-tastic!