Archive of posts filed under the BigData category.

PostgreSQL 9.3 plus Hadoop File System

It seems that things just got a little bit more interesting with the release of Pg 9.3.

Data Characterization and the Live

People may already know about the OSGeo Live project. It's a great base for a VM since a) it is stable and very well tested, and b) it has a lot of software installed, but in a way that is transparent through install scripts, so customization is as straightforward as it gets. I was faced with a […]

AmpCamp 3 – HDFS

As described in a previous post, I used Cloudera .debs to install Hadoop/HDFS on an Ubuntu 12.04 ‘Precise’ single node. Now, to put some data into HDFS and use it. (Note: this did not work in a 32-bit VM.) The Hadoop/HDFS install consisted of two steps: obtain and install the cdh4-repository .deb, which enables […]

AmpCamp 3 – The Stack

The Berkeley Data Analytics Stack (BDAS) was the central subject at AmpCamp 3. Spark is the core of the stack, and it was recently accepted for incubation as an Apache project. True to form for a fast-moving OSS project, we actually used the 0.8.0 version from the git repo, rather than the 0.7.3 that you will find […]

AmpCamp 3 – Intro

I cannot say enough about AmpCamp 3, a two-day workshop at UC Berkeley that has just concluded. The Berkeley AMP Lab (Algorithms, Machines and People) put on another great Open Source community-building event, with state-of-the-art tech, precision execution, and the sort of fun that comes from a job well done. Many interesting people, […]

Sizing Up California – Homes

I live in California, and it’s a big place. I was reviewing some records regarding residential homes. Using some simple stats, I broke the records into partitioned tables in PostgreSQL by county, and then let the rest fall into a general bucket. There is no one correct answer for this kind of analysis setup, but […]