Jacob Kaplan-Moss

51 items tagged “data”

📌 2018 in Review – Ellen Chisa – Medium

I tend not to have the discipline to track my personal data this closely, but I always want to. Anyway, the major takeaway for me here is the “most important task of the day” idea. I already do this weekly, so doing it daily would be a minor (and, I think, useful) addition. #

📌 Tiller

Like Mint &c, but syncs financial data to Google Sheets instead of a web app. I’m a huge fan of Sheets: it’s a far more powerful product than you’d think, with strong scripting support and a pretty easy API. Using it for my own financial analysis seems perfect; I’m looking forward to giving this a try. #

📌 Performance Dashboard - Snacks - NB PCT 2018 #

📌 Quartz/bad-data-guide

“An exhaustive reference to problems seen in real-world data along with suggestions on how to resolve them.” #

📌 Shiny

Did the motion graphics for XOXO. #

📌 Sophisticated Security: Bitcoin Private Key Necromancy

Ignore the Bitcoin part; this is a cool story of forensic data recovery from a thrashed hard drive. #

📌 Captricity

“Captricity is the easiest, fastest, and most cost-effective way to capture data trapped on paper—such as thousands of hand-completed survey forms—and convert it into digital data that can be searched, stored, shared, and studied.” #

📌 obfuscurity.

“An awesome collection of examples for using D3 and general visualization work. Pro: You don’t have to scour the web for these yourself. Con: It’s unlikely you’ll ever fully consume all the awesome.” #

📌 d3.js

Really good-looking visualization library. #

📌 pskomoroch's dataset Bookmarks on Delicious

Lots of data sets available on the Internets #

📌 How to process a million songs in 20 minutes « Music Machinery

A good “introduction to big data”: this one uses the Million Song Dataset I just linked. #
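For anyone new to the idea, here is a minimal, single-process sketch of the MapReduce pattern that introductions like this one generally build on. The record shape (artist, duration in seconds) and the per-artist total are hypothetical stand-ins, not the Million Song Dataset's real schema or the post's actual job; a real run would spread the map and reduce phases across many machines.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical record shape: (artist, duration in seconds).
Song = tuple[str, float]

def mapper(song: Song) -> Iterator[tuple[str, float]]:
    """Map phase: emit (key, value) pairs, here artist -> duration."""
    artist, duration = song
    yield artist, duration

def reducer(key: str, values: list[float]) -> tuple[str, float]:
    """Reduce phase: combine every value for one key, here a total duration."""
    return key, sum(values)

def map_reduce(records: Iterable[Song]) -> dict[str, float]:
    # Shuffle phase: group all mapper output by key.
    groups: dict[str, list[float]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # One reducer call per distinct key.
    return dict(reducer(k, vs) for k, vs in groups.items())

songs = [("A", 200.0), ("B", 180.5), ("A", 240.0)]
print(map_reduce(songs))  # {'A': 440.0, 'B': 180.5}
```

The point of the split is that mappers and reducers only ever see their own slice of the data, which is what lets a cluster chew through hundreds of gigabytes in minutes.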

📌 Million Song Dataset | scaling MIR research

Looks like an awesome large data set (280 GB) to fool with. #

📌 Hi My Name is John... // RailsTips by John Nunemaker

A nice intro to Graphite and Statsd. It’s a shame that Graphite’s such a PITA to install because this is hawt. #
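Part of what makes Statsd so pleasant is its wire format: one plaintext line per metric, fired over UDP. A minimal sketch, assuming a Statsd daemon on its usual default port 8125; the metric names are made up, and real clients also support extras like sample rates ("|@0.1") that are omitted here.

```python
import socket

def statsd_line(name: str, value: float, metric_type: str) -> bytes:
    """Format one metric in Statsd's plaintext protocol: <name>:<value>|<type>."""
    if metric_type not in {"c", "ms", "g"}:  # counter, timer, gauge
        raise ValueError(f"unsupported metric type: {metric_type}")
    return f"{name}:{value}|{metric_type}".encode("ascii")

def send_metric(name: str, value: float, metric_type: str,
                host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget over UDP; Statsd aggregates and flushes to Graphite."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(statsd_line(name, value, metric_type), (host, port))

print(statsd_line("myapp.signups", 1, "c"))        # b'myapp.signups:1|c'
print(statsd_line("myapp.page.render", 42, "ms"))  # b'myapp.page.render:42|ms'
```

UDP means the app never blocks or crashes if the metrics box is down; you just lose a few data points.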

📌 Wrangler

“An interactive tool for data cleaning and transformation.” A fantastic tool for data cleanup. Watch the video; it’s like magic. #

📌 Scraping for Journalism: A Guide for Collecting Data - ProPublica

Really not just “for journalism”: this series (five parts plus an intro) is a wonderful introduction to ways of extracting data from hard-to-process places. It’s a perfect resource for those just starting down the scraping journey, but there are also some cool tricks for the experienced. #
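A lot of scraping boils down to pulling rows out of HTML tables. As a rough illustration (not code from the ProPublica guides), here is a table scraper built only on Python's standard-library html.parser; the sample table and its contents are made up.

```python
from __future__ import annotations

from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped into rows."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None   # row currently being built
        self._cell: list[str] | None = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Ignore whitespace and text that sits outside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

html = """
<table>
  <tr><th>Name</th><th>Amount</th></tr>
  <tr><td>Acme Corp</td><td>$1,200</td></tr>
</table>
"""
scraper = TableScraper()
scraper.feed(html)
print(scraper.rows)  # [['Name', 'Amount'], ['Acme Corp', '$1,200']]
```

Real-world pages are messier (nested tables, colspans, missing close tags), which is exactly where the series' tricks come in.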

📌 MnDNR Data Deli

GIS data from the MN DNR. #

📌 Cartographer.js – thematic mapping for Google Maps

Heat maps, point clustering, etc. Built on top of Raphael. #
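Point clustering is the trick of replacing a pile of overlapping pins with one numbered marker. One common way to do it is to bucket points into a square grid; the sketch below shows that approach in Python. It is an assumption for illustration, not Cartographer.js's actual algorithm, and the cell size and coordinates are made up.

```python
from collections import defaultdict

def grid_cluster(points, cell_size):
    """Group (lat, lon) points into square grid cells; return one marker per cell.

    Each marker is (mean_lat, mean_lon, count), suitable for drawing a single
    numbered pin in place of every point that fell into that cell.
    """
    cells = defaultdict(list)
    for lat, lon in points:
        # Floor division assigns each point to a grid cell (works for negatives too).
        key = (int(lat // cell_size), int(lon // cell_size))
        cells[key].append((lat, lon))

    markers = []
    for members in cells.values():
        n = len(members)
        mean_lat = sum(p[0] for p in members) / n
        mean_lon = sum(p[1] for p in members) / n
        markers.append((mean_lat, mean_lon, n))
    return markers

points = [(44.97, -93.26), (44.98, -93.27), (46.79, -92.10)]
markers = grid_cluster(points, cell_size=0.5)
print(len(markers))  # 2
```

A real map would tie cell_size to the zoom level so clusters split apart as you zoom in.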

📌 Boomerang: A bidirectional programming language for ad-hoc data

Really interesting: a language for writing text-processing programs that are bidirectional, meaning each program can transform data both “forwards” and “backwards.” #
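Boomerang programs are “lenses”: a get direction that extracts a view from a source, and a put direction that writes an edited view back while keeping whatever the view left out. The sketch below is a hand-written Python lens, not Boomerang syntax (Boomerang builds lenses from combinators and checks them for you); the "name,email" record format is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Lens:
    """A bidirectional transformation between a source and a view."""
    get: Callable[[str], str]       # source -> view
    put: Callable[[str, str], str]  # (new_view, old_source) -> new_source

# Hypothetical source format: one "name,email" record per line.
def _get(source: str) -> str:
    # Forwards: keep only the names.
    return "\n".join(line.split(",", 1)[0] for line in source.splitlines())

def _put(view: str, source: str) -> str:
    # Backwards: splice edited names back in, restoring the hidden emails.
    names = view.splitlines()
    emails = [line.split(",", 1)[1] for line in source.splitlines()]
    if len(names) != len(emails):
        raise ValueError("this toy lens needs the same number of records")
    return "\n".join(f"{n},{e}" for n, e in zip(names, emails))

names_lens = Lens(get=_get, put=_put)

src = "Ada,ada@example.com\nAlan,alan@example.com"
print(names_lens.get(src))
print(names_lens.put("Ada L.\nAlan T.", src))
```

The interesting part is the round-trip laws a well-behaved lens obeys: putting back an unedited view returns the original source, and reading a view you just put returns that view. This toy version only honors them when names contain no commas.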

📌 Timetric: making sense of statistics

A platform for storing, updating, and embedding time series analysis data. Complete with a (very good) API, OpenID support, nice looking graphs, etc. Really quite cool. #

📌 GiantBomb API

Andy’s spent the last month or so working on APIs for all our sites. GiantBomb’s API launched today; the other sites will follow shortly. There’s a crap ton of data here; I’m looking forward to seeing what gets built on top of it. #

📌 How Nate Silver Went From Forecasting Baseball Games to Forecasting Elections -- New York Magazine

FiveThirtyEight has firmly wedged its way into the small list of sites I read obsessively. #

📌 ThinkGeek :: Car Chip Pro Engine Performance Monitor

Want. #

📌 National TIGER/Line Shapefiles

TIGER/Line data in SHP format. Importing this into a GeoDjango site is easy-peasy. #

📌 Girl Turk: Mechanical Turk Meets Girl Talk's "Feed the Animals" - Waxy.org

This is why I love the Internet. #

📌 Geeking with Greg: Clever method of near duplicate detection

A slick algorithm to “fingerprint” text based on chains of words following stop words. #
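A minimal sketch of the idea as summarized above: every stop word, plus the next few non-stop words after it, becomes a signature, and two documents are compared by the overlap (Jaccard similarity) of their signature sets. The stop-word list and chain length here are assumptions for illustration, not the parameters from the linked post.

```python
STOP_WORDS = {"a", "an", "the", "is", "was", "of", "to", "in"}  # assumed, tiny list

def signatures(text: str, chain_len: int = 2) -> set[tuple[str, ...]]:
    """Fingerprint text: each stop word followed by the next chain_len non-stop words."""
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    words = [w for w in words if w]
    sigs = set()
    for i, w in enumerate(words):
        if w not in STOP_WORDS:
            continue
        chain = []
        for nxt in words[i + 1:]:
            if nxt not in STOP_WORDS:
                chain.append(nxt)
                if len(chain) == chain_len:
                    break
        if len(chain) == chain_len:  # drop chains cut short by end of text
            sigs.add((w, *chain))
    return sigs

def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

doc1 = "The senator said the bill is a disaster for the economy."
doc2 = "BREAKING: The senator said the bill is a disaster for the economy, sources say."
print(jaccard(signatures(doc1), signatures(doc2)))  # 0.8
```

Anchoring on stop words is what makes it robust: boilerplate like navigation and ads tends to be stop-word-poor, so the signatures come mostly from the actual prose.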

📌 django-tables: A QuerySet renderer

Really neat base class for doing presentation of tabular data in Django. Very well-done. #

📌 Star Wars Kid: The Data Dump - Waxy.org

“Be warned, this is more detail than you’ll ever want about the origins of the Star Wars Kid meme and how it spread. You don’t care about this level of detail.” #

📌 Official Google Docs Blog: Stop sharing spreadsheets, start collecting information

New in Google Spreadsheets: publish a web form which submits data into your spreadsheet. This is completely brilliant. I can see this being incredibly useful. Yet another reason never to use Excel again. #

📌 Google Spreadsheet: Functions for external data

“=importXML("www.google.com", "//a/@href")”. Wow. #
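That formula fetches a page and returns every link target the XPath //a/@href selects. Outside a spreadsheet, a rough equivalent with Python's standard library looks like the sketch below; it assumes well-formed markup and skips the fetching step, whereas the Sheets function tolerates real-world HTML.

```python
import xml.etree.ElementTree as ET

def hrefs(xhtml: str) -> list[str]:
    """Return every <a href> value, roughly what //a/@href selects."""
    root = ET.fromstring(xhtml)
    # ElementTree's limited XPath can't select attributes directly,
    # so find <a> elements that have an href and read the attribute.
    return [a.get("href") for a in root.iterfind(".//a[@href]")]

page = (
    "<html><body>"
    "<a href='/one'>1</a>"
    "<p><a href='http://example.com/two'>2</a></p>"
    "<a name='anchor'>not a link</a>"
    "</body></html>"
)
print(hrefs(page))  # ['/one', 'http://example.com/two']
```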