Archive of posts filed under the BigData category.

AmpCamp 3 – The Stack

The Berkeley Data Analytics Stack (BDAS) was the central subject at AmpCamp 3. Spark is the core of the stack, and it was recently accepted for incubation as an Apache project. True to form for a fast-moving OSS project, we actually used the 0.8.0 version from the git repo, rather than the 0.7.3 that you will find […]

AmpCamp 3 – Intro

I cannot say enough about AmpCamp 3, a two-day workshop at UC Berkeley that has just wrapped up. The Berkeley AMP Lab (Algorithms, Machines and People) put on another great open source community-building event, with state-of-the-art tech, precision execution, and the sort of fun that comes from a job well done. Many interesting people, […]

Sizing Up California – Homes

I live in California, and it’s a big place. I was reviewing some records regarding residential homes. Using some simple stats, I broke the records into partitioned tables in PostgreSQL by county, and then let the rest fall into a general bucket. There is no one correct answer for this kind of analysis setup, but […]