Archive of posts filed under the BigData category.


The Group on Earth Observations (GEO) plenary conference on the Global Earth Observation System of Systems (GEOSS) was held in November 2016. 2017 Work Plan -LINK- Earth System Grid -PAGE-

AmpCamp 2014

BDAS, the Berkeley Data Analytics Stack. At a minimum, suffice it to say I participated online in roughly twelve hours of lecture and lab on Nov 20 and 21, 2014 at AmpCamp 5 (I also attended one in Fall 2012). I put an emphasis on Python, IPython Notebook, and SQL. Once again this year, the […]

Numeric Stats on Bay Area Intersection Counts

In preparing for an upcoming Datathon, a column of data in PostgreSQL numeric format needed formatting for presentation. “Intersection Count” intersection_density_sqkm is a count of street intersections per unit area – a quick way to measure density of the built environment. A table of grid cells (covering the nine-county San Francisco Bay Area) that the […]

Worldwide Forestry Inventory Published, Nov13

Dozens of major news outlets posted articles yesterday profiling a paper published in the journal ‘Science’ by a team led by Matthew Hansen, a remote sensing scientist at the University of Maryland, along with extensive data. ‘Published by Hansen, Potapov, Moore, Hancher et al. * Powered by Google Earth Engine’

California POIs 2013

I recently took a short course on the US Census at the University of California, Berkeley D-Lab. Of course, the first topic was the shutdown. People using the convenient, interactive Census API have been left without access to census data. Two hours of condensed lecture was just enough time to cover the basics […]

Five Colors for Stats

I am building some visualization layers in GeoServer from PostGIS, which requires .sld files (until GeoServer catches up with the CSS styling world – oh wait, look here). It is convenient to show ranges using ColorBrewer2 colors in a set of one plus five: a color for NoValue, then what I call little0, little1, central […]

PostgreSQL 9.3 plus Hadoop File System

It seems that things just got a little bit more interesting with the release of Pg 9.3.

Data Characterization and the Live

People may already know about the OSGeo Live project. It's a great base as a VM since a) it is stable and very well tested, and b) it has much software installed, but in a way that is transparent through install scripts, so customization is as straightforward as it gets. I was faced with a […]

AmpCamp 3 – HDFS

As described in a previous post, I used Cloudera .debs to install Hadoop/HDFS on an Ubuntu 12.04 ‘Precise’ single node. Now, to put some data into the HDFS system and use it. (Note: this did not work in a 32-bit VM.) The Hadoop/HDFS install consisted of two steps: obtain and install .deb cdh4-repository, which enables […]

AmpCamp 3 – The Stack

The Berkeley Data Analytics Stack (BDAS) was the central subject at AmpCamp 3. Spark is the core of the stack. It was recently adopted for incubation as an Apache project. True to form for a fast-moving OSS project, we actually used the 0.8.0 git repo version, rather than the 0.7.3 that you will find […]