Jacob Kaplan-Moss

3 items tagged “mapreduce”

📌 How to process a million songs in 20 minutes « Music Machinery

A good “introduction to big data”, this one using the Million Song Dataset I just linked.

📌 Writing An Hadoop MapReduce Program In Python - Michael G. Noll

Neat. I hadn’t realized that Hadoop map/reduce jobs could be bog-standard shell scripts. It’s especially cute that testing the job comes down to “cat data | map | reduce”.
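To make that pipeline concrete, here is a minimal word-count sketch in the streaming style: the mapper emits tab-separated key/value lines, and the reducer sums runs of identical keys. The function names (`map_lines`, `reduce_lines`, `run_local`) are hypothetical, and the in-process `sorted()` call is my assumption standing in for the sort that happens between the map and reduce steps; this is not code from the linked article.

```python
import itertools
from typing import Iterable, Iterator, List


def map_lines(lines: Iterable[str]) -> Iterator[str]:
    """Mapper: emit one 'word<TAB>1' record for every word seen."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Reducer: sum the counts for each run of identical keys.

    The input must already be sorted by key, otherwise the same word
    would show up in more than one group.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in itertools.groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_local(lines: Iterable[str]) -> List[str]:
    """Local stand-in for the shell pipeline: map, sort, then reduce."""
    return list(reduce_lines(sorted(map_lines(lines))))


if __name__ == "__main__":
    for record in run_local(["foo bar foo", "bar baz"]):
        print(record)
```

Split into two scripts that read `sys.stdin` and print to stdout, the same two functions would slot into a shell pipeline, with a `sort` between them when testing locally.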

📌 Google Scalability Conference Trip Report: MapReduce, BigTable, and Other Distributed System Abstractions for Handling Large Datasets

More details (than I’ve seen so far) on the Google architecture. If I’m doing my math right, the data given here tells us that Google has around 25 exabytes (25 × 10^18 bytes).