
GEOSS XIII

geo_logo

The Group on Earth Observations (GEO) plenary conference on the Global Earth Observation System of Systems (GEOSS) was held in November 2016.

2017 Work Plan -LINK-

Earth System Grid -PAGE-

 

geo-xiii-2-inf-03_geoss_components

OSM Software Meta

There is a non-obvious relationship between the big engines like Mapnik and the rest of the OpenStreetMap activity. While building OSGeo-Live v10, I am trying to make sense of “the whole of OpenStreetMap software”: to make a map of it, so to speak, but a map of logical groupings, organized by purpose and weighted by popularity and utility. Server-side to client-side is represented as one spectrum, right to left; then come separate classes of activity, such as the difference between data pipelines for maintenance, rendering, and, more recently, analysis; and finally the nouns, the actual software projects, some of which are quite large, like Mapnik.

base_osm_sfwrF

related links:
http://wiki.openstreetmap.org/wiki/Develop#How_the_pieces_fit_together

Mapnik: main site; OSM wiki page; wiki; tutorial; repo; python interfaces; python-mapnik quickstart

OSMIUM repo; pyosmium; and other OSM Code
 
osm2pgsql repo and a tutorial
 
Imposm3 repo and tutorial

OSM Node One http://www.openstreetmap.org/node/1

OSM dot-org Internal Git https://git.openstreetmap.org/

OSM Packaging in Debian -blends- -ref-

OSM TagInfo language example

 
US TIGER Data

A representative example of US Census Bureau TIGER data, integrated into OSM. -Here-

 
OSMBuildings

Sonoma State University in osmbuildings

osmlab labuildings gitter channel

OSM Wiki – Multipolygons -link-

 
OSM-Analytics

-here- Presented by Mikel Maron and Jennings Anderson at SOTM-US in this video. The odd thing here may be that the “unit of analysis is the tile”; so, in a twist, the delivery unit for the graphics becomes the unit of analytics. MapBox blog post on osm-qa tiles

 
Overpass-Turbo

Openstreetmap Wiki Overpass-turbo

OSM Future Directions have been brewing for a long time

osm2vectortiles osm2vectortiles-logo

 
Other Notable Resources
 
3rd Party OSM WMTS OWS Service via MapProxy

Wikimedia Foundation Maps https://www.mediawiki.org/wiki/Maps

Omniscale Gmbh and Co. KG, OSM https://osm.omniscale.de

Overpass Turbo http://overpass-turbo.eu/

OSM Software Watchlist -here-

OSM Geometry Inspector -link-

MapBox Mapping -Repo- -Wiki-

OpenSolarMap -hackpad- http://opensolarmap.org -Github-
http://2016.stateofthemap.org/2016/opensolarmap-crowdsourcing-and-machine-learning-to-classify-roofs/
 

OSM Basemaps -LINK-

OSGeo-Live 9.5 Released

osgeolive_menu6

The OSGeo Community has announced the immediate availability of the OSGeo-Live reference distribution of geospatial open-source software, version 9.5. OSGeo-Live is available now as both 32-bit and 64-bit .iso images, as well as a 64-bit Virtual Machine (VM), ready to run. Users across the globe can depend on OSGeo-Live, which includes overview and introductory examples for every major software package on the disk, translated into twelve languages. LINK

New Applications:
Project Jupyter (formerly the IPython Notebook) with examples
istSOS – Sensor Observation Service
NASA World Wind – Desktop Virtual Globe

Twenty-two geospatial programs have been updated to newer versions, including:

QGIS 2.14 LTR with more than one hundred new features added or improved since the last QGIS LTR release (version 2.8), sponsored by dozens of geospatial data providers, private sector companies and public sector governing bodies around the world.
MapServer 7.0 with major new features, including complex filtering pushed down to the database backends, improved labeling performance, and the ability to render non-Latin scripts per layer. See the complete list of new features.
Cesium JavaScript library for world-class 3D globes and maps
PostGIS 2.2 with optional SFCGAL geometry engine
GeoNetwork 3.0

Analytics and Geospatial Data Science:
R geostatistics
Python reference libraries including Iris, SciPy, PySAL, GeoPandas

About OSGeo-Live

OSGeo-Live is a self-contained bootable USB flash drive, DVD and Virtual Machine, pre-installed with robust open source geospatial software, which can be trialled without installing any software.

• Over 50 quality geospatial Open Source applications installed and pre-configured
• Free world maps and sample datasets
• Project Overview and step-by-step Quickstart for each application
• Lightning presentation of all applications, along with speaker’s script
• Overviews of key OGC standards
• Translations to multiple languages
• Based upon the rock-solid Lubuntu 14.04 LTS GNU/Linux distribution, combined with the light-weight LXDE desktop interface for ease of use.

Homepage: http://live.osgeo.org
Download details: http://live.osgeo.org/en/download.html
Post release glitches collected here: http://wiki.osgeo.org/wiki/Live_GIS_Disc/Errata/9.5

Winter California 2015

For those who have been following the climate change story over the years, this satellite imagery tells the story quite vividly, with no modelling uncertainty involved.

ca-feb2015-noaa-viz

AmpCamp 2014

spark_logo_sm

BDAS, the Berkeley Data Analytics Stack

Suffice it to say that I participated online in roughly twelve hours of lecture and lab on November 20 and 21, 2014, at AmpCamp 5 (I also attended one in Fall 2012). I put an emphasis on Python, the IPython Notebook, and SQL.

Once again this year, the camp mechanics went very smoothly: readable and succinct online exercises, and good Spark docs. Spark's Python interface, called pyspark, is advancing, although some interfaces may not be available to Python yet; Spark SQL appears to be usable.

To set up on my own Linux box, I unzipped the following files:
ampcamp5-usb.zip ampcamp-pipelines.zip training-downloads.zip

The resulting directories provided a pre-built Spark 1.1, using Scala version 2.10.4 (OpenJDK 64-Bit Server VM, Java 1.7.0_65).

The lab exercises are almost all available in both Scala and Python. Tools for the first labs:

$SPARK_HOME/bin/spark-shell  $SPARK_HOME/bin/pyspark

and for extra practice

$SPARK_HOME/bin/spark-submit  $SPARK_HOME/bin/run-example
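
For example, a minimal standalone script to feed to spark-submit might look like the sketch below (the file name and input path are my own placeholders, not part of the camp materials):

# wordcount.py : a minimal standalone PySpark job (illustrative names and paths)
from pyspark import SparkContext

sc = SparkContext(appName="WordCount")

# count words in any local text file
counts = (sc.textFile("README.md")
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))

# show the ten most frequent words
for word, n in counts.takeOrdered(10, key=lambda kv: -kv[1]):
    print("%s\t%d" % (word, n))

sc.stop()

Run it with $SPARK_HOME/bin/spark-submit wordcount.py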

IPython Notebook

An online teaching assistant (TA) suggested a command line to launch the Notebook – here are my notes:

##-- TA suggestion
IPYTHON_OPTS="notebook --pylab inline" ./bin/pyspark --master "local[4]"

##-- a server already setup with a Notebook, options
--matplotlib inline --ip=192.168.1.200 --no-browser --port=8888

##-- COMBINE
IPYTHON_OPTS="notebook --matplotlib inline --ip=192.168.1.200 --no-browser --port=8888" $SPARK_HOME/bin/pyspark --master "local[4]"

The IPython Notebook worked! Lots of conveniences, interactivity, and visualization potential were immediately available against the pyspark environment. I created several Notebooks in short order to test and explore, for example with SQL.

The SQL exercise reads data from a format that was new to me, called Parquet.
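
A rough sketch of that kind of exercise, assuming the Spark 1.1-era pyspark API and a placeholder Parquet path (not the actual lab data):

##-- inside pyspark or a Notebook cell, where sc already exists
from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)

# load a Parquet file into a SchemaRDD and register it as a temporary table
records = sqlContext.parquetFile("data/example_parquet")   # placeholder path
records.registerTempTable("records")

# run SQL against it and pull the result back to the driver
result = sqlContext.sql("SELECT COUNT(*) FROM records").collect()
print(result)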

 
Part 1.2

After rest and recuperation, I wanted to try Python on the almost-ready Spark 1.2 branch. It turned out to build and run easily. First, get the Spark code:

 https://github.com/apache/spark/tree/branch-1.2

Make sure Maven is installed on your system, then run:

./make-distribution.sh

Afterwards, I set $SPARK_HOME to this directory and launched the IPython Notebook again. All the examples and experiments I had built worked without modification. Success!

Other Links

http://databricks.com/blog/2014/03/26/spark-sql-manipulating-structured-data-using-spark-2.html
http://spark-summit.org/2014/training
https://github.com/amplab-extras

http://www.planetscala.com/

experimental
https://github.com/ooyala/spark-jobserver

JSONb First Looks

PostgreSQL_logo.3colors.120x120
PostgreSQL 9.4 beta 3 on Linux

-- Simple JSON/JSONb compare, by Oleg
-- json: text storage, as is
-- jsonb: whitespace dissolved, no duplicate keys (last in wins), keys sorted
SELECT 
  '{"c":0,   "a":2, "a":1}'::json,
  '{"c":0,   "a":2, "a":1}'::jsonb;

          json           |      jsonb       
-------------------------+------------------
 {"c":0,   "a":2, "a":1} | {"a": 1, "c": 0}
(1 row)


-- emit JSON text from Census corpus
--
SELECT json_agg(row_to_json(p)) from 
(  
  select gid,fullname,'feat' as ftype from tiger_data.ca_featnames 
  where fullname ~ '^Az' ) as p;

          json_agg  (formatting added) 
-------------------------------------------
[ 
  {"gid":5048,"fullname":"Aztec Way","ftype":"feat"},
  {"gid":9682,"fullname":"Azalea Ct","ftype":"feat"},
    ...
  {"gid":4504601,"fullname":"Azure Pl","ftype":"feat"}
]

-- return a dict with metadata fields, and an array of dicts
select row_to_json(a.*) from 
(select 
  'census_acs_2013' as origin,
  'ca' as state,
  'ca_featnames' as table,
  (
    SELECT json_agg(row_to_json(p)) from (  
      select gid,fullname,'feat' as ftype from tiger_data.ca_featnames 
      where fullname ~ '^Az' ) as p
  ) as rows
) a;

          row_to_json  (formatting added) 
---------------------------------------------------------
{   "origin":"census_acs_2013",
    "state":"ca",
    "table":"ca_featnames",
    "rows": [
      {"gid":5048,"fullname":"Aztec Way","ftype":"feat"},
      ...
      {"gid":4519032,"fullname":"Azalea Way","ftype":"feat"}
  ]
}
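
On the client side, psycopg2 (2.5 or later) hands that JSON back as plain Python structures. A minimal sketch, with an example connection string and the same illustrative filter as above:

import psycopg2

conn = psycopg2.connect("dbname=census")   # example connection string
cur = conn.cursor()
cur.execute("""
    SELECT json_agg(row_to_json(p))
    FROM ( SELECT gid, fullname, 'feat' AS ftype
           FROM tiger_data.ca_featnames
           WHERE fullname ~ '^Az' ) AS p;
""")
rows = cur.fetchone()[0]        # the json column arrives as a Python list of dicts
print(len(rows))
print(rows[0]['fullname'])
cur.close(); conn.close()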

GeoPandas and NaturalEarth2 tryout

Things are looking good with GeoPandas.

gpd_ex0
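
A minimal GeoPandas session along the lines of the screenshot above might look like this (the Natural Earth shapefile path is an assumption about what is on disk):

import geopandas as gpd
import matplotlib.pyplot as plt

# read a Natural Earth layer from a local shapefile (path is illustrative)
world = gpd.read_file("natural_earth/ne_110m_admin_0_countries.shp")

print(world.crs)       # coordinate reference system of the layer
print(world.head())    # attributes as a pandas DataFrame

world.plot()           # quick rendering via matplotlib
plt.show()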

Census Tract and 150 Meter Grids Compare

In this screenshot of central Silicon Valley, Census tracts have been combined with a constraints layer and then cut with a 150 meter grid in the EPSG:3310 projection. Values for each grid cell are then computed from imputation tables and external sources. The result is a statistically defensible, higher-resolution, and readily applicable set of grid cells. A rough sketch of the cutting step follows the screenshot below.

tracts_150m_comp
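
As promised above, here is a sketch of the cutting step, driven from Python with psycopg2; the extent coordinates, table names, and the omission of the imputation step are all simplifications of my own:

import psycopg2

conn = psycopg2.connect("dbname=gisdb")   # example connection
cur = conn.cursor()

# build a 150 meter fishnet in EPSG:3310 over an illustrative extent
cur.execute("""
    CREATE TABLE grid_150m AS
    SELECT row_number() OVER () AS cell_id,
           ST_MakeEnvelope(x, y, x + 150, y + 150, 3310) AS geom
    FROM generate_series(-260000, -140000, 150) AS x,
         generate_series( -90000,   60000, 150) AS y;
""")

# cut the tracts-plus-constraints layer with the grid
cur.execute("""
    CREATE TABLE tract_cells AS
    SELECT g.cell_id, t.tract_geoid,
           ST_Intersection(g.geom, t.geom) AS geom
    FROM grid_150m g
    JOIN tracts_constrained t ON ST_Intersects(g.geom, t.geom);
""")
conn.commit()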

ACS 5yr Viz Processing

A systematic way to choose, extract, and visualize data from the massive American Community Survey (ACS) 5-Year census product is a challenge. I have written Python code to ingest the raw inputs into tables, plus a small relational engine to handle the verbose naming.

An extraction and visualization process is underway… something like the following:

0) bulk tables in all geographies for all states
1a)   define a batch of tables to extract by table_id
1b)   choose a state or territory
1c)   choose a geographic summary level

for example:

STATE  California (FIPS 06)
TABLE  ('B01001', 'SEX BY AGE', 'Age-Sex', 'Universe:  Total population')
  GEO  Tracts (Summary level 140 - State-County-Census Tract)

Once the choice is made, SQL + Python is executed, either as a standalone program in Linux or in the IPython Notebook. The code creates a working schema in PostgreSQL, copies table subsets into the new schema, and JOINs them with TIGER geometry to get spatial data. A preliminary, working version looks something like this:

domaketractstable
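
In spirit, the step boils down to something like the sketch below; every schema, table, and column name here is a placeholder for the real ACS sequence tables and TIGER layers:

import psycopg2

conn = psycopg2.connect("dbname=census")   # example connection
cur = conn.cursor()

state, sumlevel = 'ca', '140'   # the example choice above, for table B01001

cur.execute("CREATE SCHEMA IF NOT EXISTS work_acs;")

# copy the chosen table subset into the working schema and
# JOIN it with TIGER tract geometry (placeholder names throughout)
cur.execute("""
    CREATE TABLE work_acs.b01001_ca_tracts AS
    SELECT g.geoid, d.*, t.the_geom
    FROM acs5yr.b01001 d
    JOIN acs5yr.geoheader g USING (logrecno)
    JOIN tiger_data.ca_tract t ON t.geoid = g.geoid
    WHERE g.sumlevel = %s AND g.stusab = %s;
""", (sumlevel, state))

conn.commit()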

Graphical browsing of the results in QGIS:

acs5yr_viz_progress1

Geographic summaries are defined in ACS_2008-2012_SF_Tech_Doc:
Appendix F: ACS 5-year Summary Levels/Components for Detailed Tables

Numeric Stats on Bay Area Intersection Counts

pg_logo

In preparing for an upcoming Datathon, a column of data in PostgreSQL numeric format needed formatting for presentation. “Intersection count” (intersection_density_sqkm) is a count of street intersections per unit area, a quick way to measure the density of the built environment. The table of grid cells it comes from, covering the nine-county San Francisco Bay Area, consists of roughly 814,000 cells. How to quickly characterize the data contents? Use SQL and the PostgreSQL quantile extension to look at ranges, with and without the zeroes.
 

SELECT 
   min(intersection_density_sqkm), 
    quantile(intersection_density_sqkm,ARRAY[0.02,0.25,0.5,0.75,0.92]), 
   max(intersection_density_sqkm) 

 FROM uf_singleparts.ba_intersection_density_sqkm
-[ RECORD 1 ]-----------------------------------------------------------------------------
min      | 0.0
quantile | {0.0, 0.0, 0.683937...,3.191709...,25.604519...}
max      | 116.269430...

The quantile extension's aggregate takes a column name and an ARRAY of percentile positions, returning one value per position, as in the example above.


How Many Grid Cells Have Non-zero Data?

select count(*) from ba_intersection_density_sqkm;
count => 814439

select count(*) from ba_intersection_density_sqkm where intersection_density_sqkm <> 0;
count => 587504

 Select stats on non-zero data

 SELECT 
   min(intersection_density_sqkm), 
    quantile(intersection_density_sqkm,ARRAY[0.02,0.25,0.5,0.75,0.92]), 
   max(intersection_density_sqkm) 

 FROM uf_singleparts.ba_intersection_density_sqkm

 where intersection_density_sqkm <> 0;
-[ RECORD 1 ]-----------------------------------------------------------------------------
min      | 0.227979...
quantile | {0.227979...,0.455958...,1.367875...,7.751295...,31.461139...}
max      | 116.269430...

 
And what does the high end of the range look like? Use SQL for a quick visual inspection, looking for either outliers or smooth transitions:

SELECT intersection_density_sqkm 
FROM ba_intersection_density_sqkm 
 ORDER BY  intersection_density_sqkm   desc limit 12;

 intersection_density_sqkm 
---------------------------
      116.2694300518134736
      115.5854922279792768
      115.3575129533678764
      115.1295336787564760
      114.9015544041450756
      114.9015544041450756
      114.4455958549222792
      113.7616580310880824
      112.6217616580310892
      112.6217616580310892
      112.1658031088082884
      112.1658031088082884

 

Recall that the natural log of 1.0 is 0, the natural log of 116 is slightly over 4.75, and the natural log of a number less than 1 is negative. To simplify the range for visualization, add a float column called data, set it to the natural log of (intersection_density_sqkm + 1), and use a simple multiply-then-divide technique to limit the precision to two digits (screenshot from an IPython Notebook session using psycopg2).

ipython_notebook UPDATE sql
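
The UPDATE in that screenshot amounts to something like the following, run through psycopg2; the exact rounding expression is my reconstruction of the multiply-then-divide step:

import psycopg2

conn = psycopg2.connect("dbname=gisdb")   # example connection
cur = conn.cursor()

# add the float column, then fill it with ln(count + 1) truncated to two digits
cur.execute("""
    ALTER TABLE uf_singleparts.ba_intersection_density_sqkm
      ADD COLUMN data double precision;
""")
cur.execute("""
    UPDATE uf_singleparts.ba_intersection_density_sqkm
    SET data = floor(ln(intersection_density_sqkm + 1) * 100) / 100.0;
""")
conn.commit()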

 select quantile(data,ARRAY[0.02,0.25,0.5,0.75,0.92]) from ba_intersection_density_sqkm;
  { 0, 0, 0.52, 1.43, 3.28 }
SELECT
  min(data), 
    quantile(data,ARRAY[0.02,0.25,0.5,0.75,0.92]), 
  max(data) 
FROM ba_intersection_density_sqkm
WHERE data <> 0;

 min  |          quantile          | max  
------+----------------------------+------
 0.21 | {0.21,0.38,0.86,2.17,3.48} | 4.76
(1 row)

 

Final Results in GeoServer 2.5 CSS styler:

geoserv_css_style_inters

PS: a full sequential scan on this table takes about four seconds, on a Western Digital Black Label 1 TB disk with an ext4 filesystem, on Linux.