

21 Dec 16
Data Integration pre-Setup — California AB 802 Support

* Berkeley ImageSEG
– email license terms with James

* Similarity Measures JTS Interface

* Example Work Context
– EPA POINT in Davis, California -LINK-

14 Dec 16
Data Integration pre-Setup — California AB 802 Support


LA Buildings Import Project status misc -LINK-

Openstreetmap osm2pgsql 0.92 RC1 — Polygon Validity -LINK-

Computer Vision / Image Recognition Berkeley ImageSEG quote -LINK-

Base Map Imagery for Tracing, QA & QC
National Agricultural Imagery Program (NAIP) -LINK- Add 2014 County assets to an existing 2008 NAIP set.
Processing NAIP 2014 Imagery — origin: State of California Atlas website -EX-LINK-

## DOWNLOAD new data
dbb@i7d:/sg22/geodata_misc/CA_NAIP_2014$ at 1 am tomorrow -f
# Kings  Madera  Merced  SanJoaquin Stanislaus
# ElDorado  Imperial  Riverside  SanBernardino  Ventura
## DECOMPRESS from SID format to TIFF
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/sg22/geodata_misc/LizardTech/MrSID_DSDK-
sudo ldconfig
cd /sg22/geodata_misc/CA_NAIP_2014/
mrsiddecode -j 8 -o ortho_1-1_1n_s_ca075_2014_1.tif -i ortho_1-1_1n_s_ca075_2014_1.sid
dbb@i7d:/sg22/geodata_misc/CA_NAIP_2014$ mrsiddecode -j 8 -o /black3/ca_naip_2014_misc/ortho_1-1_1n_s_ca029_2014_1.tif -i ortho_1-1_1n_s_ca029_2014_1.sid
mrsiddecode: Copyright (c) 2010 LizardTech. All rights reserved.

  width:         244281
  height:        121710
  upper-left X:  205124.500000
  upper-left Y:  3968328.500000
  X scale:       1.000000
  Y scale:       -1.000000

  width:         244281
  height:        121710
  format:        GeoTIFF

Scripted local tooling runs a decompress / convert from MrSID format to JPEG 90 YCbCr, and adds embedded overviews.
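The scripted tooling itself is not shown in this post. As a rough sketch only, the four steps named in these notes (nearblack, warp, compress to JPEG YCbCr, add overviews) could be assembled as GDAL command lines like this. The step order, the intermediate file prefixes `nb_` and `warp_`, the `TILED=YES` option, the overview levels, and the optional `t_srs` parameter are all assumptions made for illustration; only the utility names, JPEG quality 90, YCbCr, and the `final_` prefix come from the notes and listings here.

```python
from pathlib import Path


def naip_commands(src_tif, work_dir, t_srs=None, quality=90):
    """Return four GDAL command lines (as argument lists) for one county file.

    Nothing is executed here; the lists can be passed to subprocess.run().
    """
    src = Path(src_tif)
    work = Path(work_dir)
    nb = work / f"nb_{src.name}"        # hypothetical intermediate name
    warped = work / f"warp_{src.name}"  # hypothetical intermediate name
    final = work / f"final_{src.name}"  # prefix seen in the directory listing

    warp = ["gdalwarp"]
    if t_srs:                           # target SRS is an assumption, off by default
        warp += ["-t_srs", t_srs]
    warp += [str(nb), str(warped)]

    return [
        # 1. clean up near-black collar pixels
        ["nearblack", "-o", str(nb), str(src)],
        # 2. warp
        warp,
        # 3. compress to JPEG-in-TIFF with YCbCr photometric interpretation
        ["gdal_translate",
         "-co", "COMPRESS=JPEG", "-co", "PHOTOMETRIC=YCBCR",
         "-co", f"JPEG_QUALITY={quality}", "-co", "TILED=YES",
         str(warped), str(final)],
        # 4. add internal (embedded) overviews, also JPEG YCbCr
        ["gdaladdo", "-r", "average",
         "--config", "COMPRESS_OVERVIEW", "JPEG",
         "--config", "PHOTOMETRIC_OVERVIEW", "YCBCR",
         str(final), "2", "4", "8", "16"],
    ]


for cmd in naip_commands("ortho_1-1_1n_s_ca075_2014_1.tif", "work"):
    print(" ".join(cmd))
```

Building the commands as lists keeps the sketch testable without GDAL installed, and makes it easy to time each step separately when one of them (nearblack, in these notes) is the slow one.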

Performance Metric: one of the smallest county-wide SID files, San_Mateo, takes just over 40 minutes on i7d to convert to JPEG YCbCr with overviews. By comparison, Fresno County imagery is about 8x bigger, and Kern County is 12x bigger. Note 2: jobs started with the at command get a default NICE value of two; combined with the RAM requirements, all three running jobs were stalling. Reaction: killed one job and reniced the other two (-2); disk and CPU throughput are now much better. Approx. 3GB of dedicated RAM is shared between the two warp processes, Fresno (019) and Kern (029).
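For planning purposes, the figures above can be turned into a naive run-time estimate. This sketch assumes conversion time scales linearly with file size, which is only an assumption; the notes in this entry suggest the largest files behave worse than that.

```python
BASE_MINUTES = 40  # San Mateo, "just over 40 minutes" on i7d

# Relative sizes quoted in the note above (San Mateo = 1)
RELATIVE_SIZE = {"San Mateo": 1, "Fresno": 8, "Kern": 12}


def estimate_minutes(relative_size, base_minutes=BASE_MINUTES):
    """Linear-scaling estimate; a lower bound if large files degrade."""
    return relative_size * base_minutes


for county, factor in RELATIVE_SIZE.items():
    minutes = estimate_minutes(factor)
    print(f"{county}: ~{minutes} min (~{minutes / 60:.1f} h)")
```

Any job that runs far longer than this estimate is a candidate for the kind of I/O or memory investigation described in this entry.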

California NAIP 2014, Counties Acquired Locally 09dec11

California NAIP 2014, Counties Acquired Locally 09dec16. County cousub colored [red, mf17] [green,commercial] [brown,both] Whole selected county dark red

California NAIP 2014, Counties Acquired Locally 11dec16



Polygon Repair
Academic research tool pprepair not ready for “primetime” ?? -LINK-


dbb      pid  4.7 10.5 1459796 1287088 - D+   10:08   0:09 
  gdal_translate ...  (options)

dbb      pid  2.9 10.3 1445140 1263924 -     D< 04:28  10:08 
  gdal_translate ...  (options)

dbb      pid  100 12.6 1878196 1538268 -     S<l  09:35  37:26 
  gdalwarp ... (options)  

I started the conversion on Friday night
27 files, four steps per file.. the file sizes range from 2GB to 127GB
I have made changes and adjustments, but basically, the three largest files are not done yet, while many smaller ones finished easily
I suggest there is some non-obvious behavior with the OS and the largest files
the source disk, most of the time, is a Western Digital Black label 1TB .. which is a fast disk with good characteristics.. yet I see very low I/O and large RAM caches
if it was CPU bound that would be fine, but there is also low CPU usage at many times
the motherboard is high-end consumer grade.. it would be surprising to me if there were serious bus problems on that model.. but the motherboard is not dual-xeon or other high end.. it is a fast i7 first-generation
profiling and other careful investigation is possible.. but I cannot do it today
I hope to be constructive, with real world feedback.. I appreciate your insights
nearblack seems to be the most puzzling, and low-performance
the other steps are   warp;  compress to JPEG YCbCr; and   add overviews
.. some kind of cache-thrashing, or other things going on .. not sure
there are only TWO processes now.. both are showing I/O starved.. (D)
they are both reading from the same disk.. and that disk is showing a grand total of 10MB/sec reads
that disk is my newest disk, so I was relying on it..
however, with this context, I am becoming suspicious of that one disk's behavior
that is WAY too slow.. and there is very little contention on the system right now
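A quick back-of-envelope check shows why 10MB/sec is a problem for the largest intermediate file (127G in the listing below). This is a sketch that treats the listed size as GiB and the observed rate as MiB/s, which is an assumption about units; the 100 MB/s comparison rate is purely hypothetical, not a measured figure for this disk.

```python
def hours_to_read(size_gib, mib_per_s):
    """Hours needed for one full sequential pass over a file."""
    seconds = size_gib * 1024 / mib_per_s
    return seconds / 3600


observed = hours_to_read(127, 10)       # rate seen during the stalled jobs
hypothetical = hours_to_read(127, 100)  # assumed healthy rate, for contrast
print(f"one pass at 10 MB/s:  {observed:.1f} h")
print(f"one pass at 100 MB/s: {hypothetical:.1f} h")
```

With four steps per file, each needing at least one pass over the data, the observed rate alone accounts for many hours per large county.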

-rw-rw-r-- 1 dbb dbb 1.2G Dec 11 12:20 final_ortho_1-1_1n_s_ca001_2014_1.tif
-rw-rw-r-- 1 dbb dbb 1.1G Dec 11 12:57 final_ortho_1-1_1n_s_ca013_2014_1.tif
-rw-rw-r-- 1 dbb dbb 2.2G Dec 14 13:06 final_ortho_1-1_1n_s_ca019_2014_1.tif
-rw-rw-r-- 1 dbb dbb 8.3G Dec 14 12:19 final_ortho_1-1_1n_s_ca029_2014_1.tif
-rw-rw-r-- 1 dbb dbb 1.6G Dec 13 00:37 final_ortho_1-1_1n_s_ca031_2014_1.tif
-rw-rw-r-- 1 dbb dbb 7.9G Dec 12 18:52 final_ortho_1-1_1n_s_ca037_2014_1.tif
-rw-rw-r-- 1 dbb dbb 1.1G Dec 13 01:32 final_ortho_1-1_1n_s_ca041_2014_1.tif
-rw-rw-r-- 1 dbb dbb 1.5G Dec 13 02:29 final_ortho_1-1_1n_s_ca055_2014_1.tif
-rw-rw-r-- 1 dbb dbb 1.5G Dec 11 14:05 final_ortho_1-1_1n_s_ca059_2014_1.tif
-rw-rw-r-- 1 dbb dbb 2.7G Dec 11 18:47 final_ortho_1-1_1n_s_ca061_2014_1.tif
-rw-rw-r-- 1 dbb dbb 1.5G Dec 11 20:28 final_ortho_1-1_1n_s_ca067_2014_1.tif
-rw-rw-r-- 1 dbb dbb 2.1G Dec 12 01:59 final_ortho_1-1_1n_s_ca069_2014_1.tif
-rw-rw-r-- 1 dbb dbb 221M Dec 11 20:03 final_ortho_1-1_1n_s_ca075_2014_1.tif
-rw-rw-r-- 1 dbb dbb 1.9G Dec 12 08:31 final_ortho_1-1_1n_s_ca077_2014_1.tif
-rw-rw-r-- 1 dbb dbb 692M Dec 12 05:22 final_ortho_1-1_1n_s_ca079_2014_1.tif
-rw-rw-r-- 1 dbb dbb 1.1G Dec 13 12:07 final_ortho_1-1_1n_s_ca081_2014_1.tif
-rw-rw-r-- 1 dbb dbb 4.0G Dec 13 17:26 final_ortho_1-1_1n_s_ca083_2014_1.tif
-rw-rw-r-- 1 dbb dbb 2.0G Dec 14 05:29 final_ortho_1-1_1n_s_ca085_2014_1.tif
-rw-rw-r-- 1 dbb dbb 1.3G Dec 13 17:32 final_ortho_1-1_1n_s_ca087_2014_1.tif
-rw-rw-r-- 1 dbb dbb 1.5G Dec 13 17:39 final_ortho_1-1_1n_s_ca095_2014_1.tif
-rw-rw-r-- 1 dbb dbb 3.0G Dec 13 17:58 final_ortho_1-1_1n_s_ca097_2014_1.tif
-rw-rw-r-- 1 dbb dbb 914M Dec 14 09:35 final_ortho_1-1_1n_s_ca101_2014_1.tif
-rw-rw-r-- 1 dbb dbb 127G Dec 13 08:54 ortho_1-1_1n_s_ca019_2014_1.tif
-rw-rw-r-- 1 dbb dbb  52G Dec 13 09:10 ortho_1-1_1n_s_ca073_2014_1.tif
-rw-rw-r-- 1 dbb dbb 9.7G Dec 13 09:25 ortho_1-1_1n_s_ca081_2014_1_bak.tif
-rw-rw-r-- 1 dbb dbb  53G Dec 11 22:29 ortho_1-1_1n_s_ca111_2014_1.tif
-rw-rw-r-- 1 dbb dbb  17G Dec 10 22:57 ortho_1-1_1n_s_ca113_2014_1.tif
-rw-rw-r-- 1 dbb dbb  16G Oct 20 09:00 ortho_1-1_1n_s_ca115_2014_1.tif
-rw-rw-r-- 1 dbb dbb 972M Dec 11 00:52 v1final_ortho_1-1_1n_s_ca001_2014_1.tif
-rw-rw-r-- 1 dbb dbb 898M Dec 11 01:12 v1final_ortho_1-1_1n_s_ca013_2014_1.tif
-rw-rw-r-- 1 dbb dbb 1.2G Dec 11 08:54 v2final_ortho_1-1_1n_s_ca001_2014_1.tif
-rw-rw-r-- 1 dbb dbb 1.1G Dec 11 09:17 v2final_ortho_1-1_1n_s_ca013_2014_1.tif

07 Dec 16
Data Integration pre-Setup — California AB 802 Support

Housekeeping Misc:

Cal GIS Council Meeting 08Dec16

Data: the LA Buildings Import file is a post-processed result of
LARIAC_Buildings_2014.gdb, using:

– LA assessors data -LINK-
– public discussion -LINK-
Data: AMBAG (Santa Cruz County & Monterey County)

INFO: Open of `RGNL_Footprints/Footprints_RGNL.shp'
      using driver `ESRI Shapefile' successful.

1: Footprints_RGNL (3D Measured Polygon)
Layer name: Footprints_RGNL
Geometry: 3D Measured Polygon
Feature Count: 238115
Extent: (5653657.710255, 1962160.892429) - (5940174.894438, 2291789.620637)

Task 2 — Data Integration Workflow

Commercial Raster-to-Vector Software: Trimble eCognition -LINK-

Berkeley ImageSEG -LINK-

Internal Process
  Build a workflow that can combine new, flawed layers with existing layers, in sets, and produce a new “best” set. Do that over and over again.

Some assumptions: layers originate from various sources, including vector construction from raster sources. Those vector POLYGONS will have flaws and omissions. Any layer may have correct information. The content of every layer is incomplete.

Polygons from disparate sources may show small differences which impede identification
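One cheap way to tolerate such small differences is to score candidate pairs by overlap rather than demanding identical geometry. The sketch below uses intersection-over-union of bounding boxes as a first-pass similarity measure. It is an illustration only, not the JTS similarity interface mentioned elsewhere in these notes, and the 0.7 threshold in the usage lines is an arbitrary assumption.

```python
def bbox(ring):
    """Bounding box (xmin, ymin, xmax, ymax) of a ring of (x, y) vertices."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def bbox_iou(ring_a, ring_b):
    """Intersection-over-union of the two bounding boxes, in [0, 1]."""
    ax0, ay0, ax1, ay1 = bbox(ring_a)
    bx0, by0, bx1, by1 = bbox(ring_b)
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


# Two footprints of the "same" building, offset by one unit in x
a = [(0, 0), (10, 0), (10, 10), (0, 10)]
b = [(1, 0), (11, 0), (11, 10), (1, 10)]
score = bbox_iou(a, b)
print(round(score, 3), "candidate match" if score > 0.7 else "no match")
```

A bounding-box score is coarse, so it is best used to shortlist pairs before a true polygon-overlap or Hausdorff-style comparison.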



Theory: Java Conflation Suite — GIS Conflation using Open Source Tools -LINK-

from the JCS Whitepaper (note conflation means integration):
1.4.2 Vertical Conflation
 Vertical Conflation involves matching and/or eliminating discrepancies between datasets that occupy the same area in space. Examples include road network matching between two representations of roads in the same region (or building polygons ed.).

 Two important kinds of vertical conflation are Version Matching and Feature Alignment.
 In the case of Version Matching the input datasets consist of different versions of the same features. The conflation process is intended to identify matching features. Attributes may be transferred between matched features, and unmatched features may be transferred in their entirety. An example of this is matching different versions of road networks for the same geographical area (or building polygons ed.).
 In case of Feature Alignment the input data consists of features from two or more different feature classes that bear some defined relationship to each other. The conflation process is intended to remove discrepancies between the datasets that causes this relationship to fail to hold. A common relationship is that of geometric alignment. An example of this is aligning the boundaries of different kinds of feature classes such as municipal districts and lot parcels. (or building polygons ed.)

Many conflation tasks have in the past been carried out in a manual process, with a human operator identifying matches or errors, determining alignments or corrective actions, and manually correcting geometry or attribution. Although various tasks within this process may have been carried out with the assistance of software, scripts and macros, the overall process would generally be characterized as ‘computer assisted’ manual conflation.
 Increasingly sophisticated software tools have recently enabled ‘human-assisted’ approaches to conflation, inverting the traditional ‘computer-assisted’ paradigm. Now, human expert judgment and intervention is required only in relatively rare situations within an essentially automated process. Such approaches, if they can be applied to a given problem, greatly improve the productivity and reliability of conflation projects.

Pre-processing → Quality Assurance → Alignment → Feature Matching → Geometry Alignment and/or Information Transfer.

— end of JCS Whitepaper quote
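To make Version Matching concrete, here is a minimal sketch under simplifying assumptions: each feature is reduced to a representative point, matching is greedy nearest-neighbour within a tolerance, matched features inherit missing attributes from the older version, and unmatched older features are carried over whole. This is not the JCS algorithm; the function name, data layout, and tolerance are hypothetical.

```python
import math


def match_versions(old, new, tol):
    """old/new: dict id -> {"pt": (x, y), "attrs": {...}}. Returns merged list."""
    used_old = set()
    merged = []
    for new_id, feat in new.items():
        # find the closest unused old feature within the tolerance
        best_id, best_d = None, tol
        for old_id, cand in old.items():
            if old_id in used_old:
                continue
            d = math.dist(feat["pt"], cand["pt"])
            if d <= best_d:
                best_id, best_d = old_id, d
        attrs = dict(feat["attrs"])
        if best_id is not None:
            used_old.add(best_id)
            # attribute transfer: fill gaps from the matched older version
            for key, value in old[best_id]["attrs"].items():
                attrs.setdefault(key, value)
        merged.append({"id": new_id, "pt": feat["pt"],
                       "attrs": attrs, "matched": best_id})
    # unmatched old features are transferred in their entirety
    for old_id, cand in old.items():
        if old_id not in used_old:
            merged.append({"id": old_id, "pt": cand["pt"],
                           "attrs": dict(cand["attrs"]), "matched": None})
    return merged


old = {"a": {"pt": (0, 0), "attrs": {"height": 12}},
       "b": {"pt": (100, 100), "attrs": {"height": 30}}}
new = {"n1": {"pt": (0.5, 0), "attrs": {"name": "Hall"}}}
for feature in match_versions(old, new, tol=2.0):
    print(feature)
```

Greedy matching is order-dependent; it is enough to show the data flow, but a production workflow would want a symmetric or globally optimal assignment.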

OSM Editing Layers


Easily installed osm_editor_layer_index, a web page featuring recent links to various authoritative sources of basemap materials. OSM Index Layers Bug Report -LINK-
Other OSM Web Resources:
* Wikimedia vs TIGER Battle Grid -LINK-
* OSM Inspector a.k.a. OSM-I -LINK-



JOSM building tools -LINK-

JOSM Building Outlines setup in Fresno, California (bg=0601900000)

Drawing Building Outlines in Fresno, California (bg='060190045051')

OSM-Dev Channel
The Openstreetmap Project convenes on many public communication channels, including text-based chat (IRC) on the OFTC network. The Open and Free Technology Community (OFTC) aims to provide stable and effective collaboration services to members of the community in any part of the world. OFTC is a member project of Software in the Public Interest, a non-profit organization which was founded to help organizations develop and distribute open hardware and software.

In the past months, I have gradually made my presence known on the #osm-dev IRC channel, as a member of the LA Buildings Import project (with Ben Discoe), and as a software developer interested in “building polygons in California.” My blog post on OSM Fresno made it into the Openstreetmap Weekly News 332, last week. -LINK-

On Monday, I spoke extensively with core OSM developers: simonpoole, Yuri Astrakhan aka yurik of the Wikimedia Foundation, Nicolás Alvarez aka nicolas17, Paul Norman and others, regarding “automated scripts to repair multipolygons in Openstreetmap.” To summarize, the culture at the Wikimedia Foundation (wikipedia) has long since accepted and developed automated tasks that read, connect and, in some cases, repair content. In Openstreetmap, the culture is to “encourage community members to map local knowledge.” Automated edits are called “mechanical edits” and there is quite a bit of resistance to the practice. However, the conversation was generally constructive.

OSM Wiki:
* Buildings -PAGE- -TALK-
* Area -PAGE-
* Future-of-Area -PAGE-

Topics including:


01 Dec 16
Data Analysis — California AB 802 Support

* OSM Fresno post (see main blog page)

* Towards a Graphical Reporting System
In order to track and understand the contents of the developing catalog, a graphical reporting system is desirable. Using direct SQL against a master file database is a modern approach.

The “look and feel” of the reports can vary quite a bit.

OSM Buildings In the News
How you can help StatCan test crowdsourcing in mapping Ottawa’s buildings
Canada’s national statistics agency is testing the power of crowdsourcing through an unusual project that aims to map Ottawa’s buildings. As part of its pilot project, Statistics Canada is asking people who live in Ottawa-Gatineau to contribute information about local buildings to an open source map.

* OSM Landuse Viz -LINK-


16 Nov 16
Data Analysis — California AB 802 Support

Task One Status Sheet

  • County
  • CoStar Locations — MultiFamily 17+; Commercial Single-tenant and Multi-tenant
  • Report “roll-up” for building footprints inventory
-- get PZ, County and OSM Buildings, statewide
SELECT (pd.pz_id, pd.countyfp, pd.county_name), count(*)
FROM
  pz_region_defs pd, 
  tl_2016_us_county c,
  alt_bldgs.multipolygons  osm_bldgs
WHERE
    c.statefp = '06'    AND
    pd.countyfp = c.countyfp  AND
    st_intersects( c.geom, osm_bldgs.wkb_geometry )
GROUP BY
  (pd.pz_id, pd.countyfp, pd.county_name)
ORDER BY
  (pd.pz_id, pd.countyfp, pd.county_name);

-- costar all intersect osm bldgs all, but not LA

SELECT (pd.pz_id, pd.countyfp, pd.county_name), count(*)
FROM
  pz_region_defs pd, 
  tl_2016_us_county c,
  alt_bldgs.multipolygons  osm_bldgs,
  tmp_costar_res costar_p
WHERE
    c.statefp = '06'    AND
    c.countyfp != '037'   AND   --exclude LA 
    pd.countyfp = c.countyfp  AND
    osm_bldgs.building is not null   AND    -- rough count in OSM buildings
    st_intersects( c.geom, osm_bldgs.wkb_geometry )  AND
    st_intersects( costar_p.geom, osm_bldgs.wkb_geometry )
GROUP BY
  (pd.pz_id, pd.countyfp, pd.county_name)
ORDER BY
  (pd.pz_id, pd.countyfp, pd.county_name);

backend API notes: Apache libcloud -link-


09 Nov 16
Data Analysis — California AB 802 Support

Building Footprints from Authoritative Sources, by County

Building Footprints from Authoritative Sources, by County 08Nov16

County: [ Sacramento, Solano, Marin, San Francisco, San Mateo, Santa Cruz, Los Angeles]
City: [ (Placer) Roseville, (Orange) Newport Beach, (Kern) Bakersfield, (Sonoma) Petaluma ]

Non-Authoritative Sources by County Subdivisions (cousub)
Openstreetmap snapshots from high-density cousubs not otherwise covered:

San Joaquin: Stockton
Alameda: Fremont, Hayward, Livermore, Oakland
Contra Costa: Concord
San Diego: San Diego
Santa Clara: San Jose
Sonoma: Santa Rosa

costar top30 concentrations with Planning Zone 7 (LA) highlighted.

costar “top 30” concentrations

Report Summary Units (cousub) misc:

  • no cousub in the state has zero costar locations in the sample
  • six cousubs contain as many costar sample points as the rest of the state combined
  • 73% of all costars occur in the combined top 30 cousubs
  • 241 cousubs in fifty-eight California counties; seven cousubs per county average
  • six of fifty-eight counties have more than fifteen cousubs, with Los Angeles having the most, at twenty
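Concentration figures like the ones in this list can be recomputed from a per-cousub count table. A minimal sketch follows; the counts in it are made-up toy values chosen only to exercise the function, not the CoStar sample.

```python
def top_n_share(counts, n):
    """Fraction of all points that fall in the n largest groups."""
    ordered = sorted(counts.values(), reverse=True)
    total = sum(ordered)
    return sum(ordered[:n]) / total if total else 0.0


# Hypothetical toy data: points per cousub
toy_counts = {"cousub_A": 50, "cousub_B": 30, "cousub_C": 15, "cousub_D": 5}
print(f"top 2 share: {top_n_share(toy_counts, 2):.0%}")
```

Running the same function with n=6 and n=30 against the real per-cousub counts would reproduce the "six largest" and "top 30" statements.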

yellow/black map of Assets; -costars-top30- -costars-per-cousub- TXT JSON CSV


Fifty percent of costar samples exist in the six largest cousubs. More than a third of all samples occur in Planning Zone 7 alone


Special Note on the name Los Angeles: there are at least four common uses of that name, each meaning a very different thing: County of Los Angeles (pop. 10 million+); City of Los Angeles (pop. 4 million); US Census County Subdivision (cousub) Los Angeles (pop. approx. 6 million); Los Angeles-Long Beach-Anaheim, CA Metropolitan Statistical Area (MSA) (pop. approx. 12 million).

Cities of Los Angeles County

Cities of Los Angeles County -wikipedia- with the City of Los Angeles highlighted



LARIAC14 building footprints on statewide parcels set -- nov16.

LARIAC14 building footprints on statewide parcels set — Inglewood Test Area, Los Angeles — nov16.


Los Angeles County GIS Data Portal
Countywide Building Outlines – 2014 Update – Public Domain Release -LINK-

November 1, 2016 – this data is now public

The Countywide building outline dataset contains building outlines (over 3,000,000) for all buildings in Los Angeles County, including building height, building area, and the parcel number (also known as building footprints). This data was captured from stereo imagery as part of the LAR-IAC2 Project (2008 acquisition) and was updated as part of the LARIAC4 (2014) imagery acquisition.

There are a number of sources. All buildings were updated to include changes between 2008 and 2014.

  • City of Palmdale – building outlines from the LAR-IAC (2006) imagery – derived from orthogonal imagery.
  • City of Pasadena – building outlines from earlier imagery, updated with LAR-IAC2 (2008) imagery in 2008
  • City of Glendale – building outlines from earlier imagery, updated with LAR-IAC2 (2008) imagery in 2008
  • City of Los Angeles – building outlines from LAR-IAC2 (2008), stereo generated, for all buildings > 64 square feet
  • The rest of the County – building outlines from LAR-IAC2 (2008), stereo generated, for all buildings > 400 square feet

Most of the buildings in this dataset were generated using stereo imagery. This means that the person capturing the buildings actually saw them in 3-D, and therefore was able to more accurately capture the location of the roof line, since this method eliminated the impacts of building lean (where the height of the building impacts its apparent location). Basically – this is the most accurate method for capturing building outlines. In many cases the location is more accurate than our aerial photography and parcel boundaries.


LARIAC Buildings (2014) -LINK- note 1 Gb download

LARIAC2 Buildings (2008) -LINK-

Oct 29, 2015 – Parcels CA 2014. -LINK-
Los Angeles County Parcels Tax Roll. -LINK-


Note: data described may be archived on 50GB BD-R DL discs

LARIAC Other -main- -2014-Imagery-

LARIAC Buildings (2014)
This file contains a file geodatabase which has two feature classes:

  • LARIAC4_BUILDINGS_2014 – this is the current set of buildings as of 2014
  • LARIAC2_BUILDINGS_DELETED_2014 – these are the buildings from LARIAC2 that have been modified or deleted. These can be for change analysis and detection.


Current Building Data Fields
 CODE       (Building or courtyard)
 BLD_ID      Unique ID
 HEIGHT     (Height in feet)
 Elevation  (Ground elevation)
 Area       (Building roofline in Square Feet)
 Source     (which provenance)
 Date       (data acquired)
 AIN        (Parcel ID)
 Status     (Unchanged, New, Replacement, Modified)
 OLD_BLD_ID (connects to the Deleted Buildings BLD_ID field)

As an addition, LARIAC requested that the parcel ID number
(AIN) as of July 2014 be assigned to each building based
upon the building centroid.  
We have attached the information from the Assessor's Local
Roll based upon this join, which will provide address
information as well as many other attributes such as use
type, etc.  
This is a one-time effort -- 
 it can be updated by participants at their leisure later.

Deleted Building Data Structure
 CODE       (Building or courtyard)
 BLD_ID      Unique ID
 HEIGHT     (Height in feet)
 Elevation  (Ground elevation)
 Area       (Building roofline in Square Feet)
 Source     (which provenance)
 Date       (data acquired)
 AIN        (Parcel ID)
 Status     (Destroyed, Modified)
 NEW_BLD_ID (connects to the Current Buildings BLD_ID field)
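The AIN assignment described above is a centroid-in-parcel join. The sketch below shows the idea in plain Python with toy coordinates (all names and geometries are hypothetical), and also shows a known weakness: for a concave footprint the area centroid can fall outside the building itself, which is one reason a point-on-surface may be preferred as the representative point.

```python
def area_centroid(ring):
    """Area-weighted centroid of a simple ring [(x, y), ...] (shoelace formula)."""
    a = cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5
    return cx / (6 * a), cy / (6 * a)


def contains(ring, pt):
    """Ray-casting point-in-polygon test for a simple ring."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def assign_parcel(building_ring, parcels):
    """Return the id of the first parcel containing the building centroid."""
    pt = area_centroid(building_ring)
    for ain, ring in parcels.items():
        if contains(ring, pt):
            return ain
    return None


parcels = {"P1": [(0, 0), (10, 0), (10, 10), (0, 10)],
           "P2": [(10, 0), (20, 0), (20, 10), (10, 10)]}
building = [(2, 2), (6, 2), (6, 6), (2, 6)]
print(assign_parcel(building, parcels))

# U-shaped footprint: the centroid lands in the notch, outside the building
u_shape = [(0, 0), (6, 0), (6, 6), (4, 6), (4, 2), (2, 2), (2, 6), (0, 6)]
print(contains(u_shape, area_centroid(u_shape)))
```

The same caveat applies to parcels: a building that straddles a parcel line is assigned to whichever parcel happens to contain its representative point.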

Extract | Transform | Load:

ogr2ogr -f PostgreSQL PG:dbname=geo_datamine_f2 -lco GEOMETRY_NAME=geom -lco SCHEMA=la14 -lco FID=gid LARIAC_Buildings_2014.gdb 'LARIAC2_BUILDINGS_DELETED_2014' -t_srs EPSG:4326

ogr2ogr -f PostgreSQL PG:dbname=geo_datamine_f2 -lco GEOMETRY_NAME=geom -lco SCHEMA=la14 -lco FID=gid LARIAC_Buildings_2014.gdb 'LARIAC4_BUILDINGS_2014' -t_srs EPSG:4326

-- SQL begin --------------
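CREATE TABLE la14.la_bldgs14_invalid as
SELECT *
FROM la14.lariac4_buildings_2014
 WHERE not ST_IsValid(geom);
DELETE from la14.lariac4_buildings_2014
 WHERE not ST_IsValid(geom);
SELECT gid, ST_IsValidReason(geom) 
  FROM la14.la_bldgs14_invalid   -- assumed: the FROM clause is missing in the original post
 WHERE not ST_IsValid(geom);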
--(422 Rows)

DROP TABLE IF EXISTS la14.la_bldgs14_pt cascade;
CREATE TABLE la14.la_bldgs14_pt as
SELECT gid, code, bld_id, height, elev, lariac_buildings_2014_area, 
       source, date_, ain, status, old_bld_id,
       apn, situshouseno, situsfraction, situsdirection, 
       situsunit, situsstreet, situsaddress, situscity, situszip, taxratearea, 
       agencyclassno, agencyname, agencytype, usecode, usecode_2, usetype, 
       usedescription, yearbuilt1, effectiveyear1, recdate, recdocno, 
       ownername, owneroverflow, secondowner, specialname, mailhouseno, 
       mailfraction, maildirection, mailunit, mailstreet, mailcity, 
       mailzip, roll_year, roll_landvalue, roll_impvalue, roll_perspropvalue, 
       roll_fixturevalue, roll_homeownersexemp, roll_realestateexemp, 
       roll_perspropexemp, roll_fixtureexemp, roll_landbaseyear, roll_impbaseyear, 
       spatialchangedate, parcelcreatedate, assr_map, assr_index_map, 
         st_POINTONSURFACE(geom) as geom
  FROM la14.lariac4_buildings_2014;
ALTER TABLE la14.la_bldgs14_pt add PRIMARY KEY (gid);
CREATE INDEX lb14_geom_idx on la14.la_bldgs14_pt using GIST(geom);
ANALYZE la14.la_bldgs14_pt;

DROP TABLE IF EXISTS la14.la_bldgs14_del_pt cascade;
CREATE TABLE la14.la_bldgs14_del_pt as
SELECT gid, code, bld_id, height, elev, area, source, date_, ain, 
       status, new_bld_id, 
           st_POINTONSURFACE(geom) as geom
  FROM la14.lariac2_buildings_deleted_2014;
ALTER TABLE la14.la_bldgs14_del_pt add PRIMARY KEY (gid);
CREATE INDEX lb14d_geom_idx on la14.la_bldgs14_del_pt using GIST(geom);
ANALYZE la14.la_bldgs14_del_pt;

--  geo_datamine_f2
SELECT 'la_bldgs14_invalid' as invd, count(*) as bldg_cnt
FROM  la14.la_bldgs14_invalid;
        invd        | bldg_cnt 
 la_bldgs14_invalid |      422
SELECT 'la_bldgs14' as inv, count(*) as bldg_cnt
FROM  la14.lariac4_buildings_2014;
    inv     | bldg_cnt 
 la_bldgs14 |  3118551
SELECT 'la_bldgs14_del' as ladel, count(*) as bldg_cnt
FROM  la14.lariac2_buildings_deleted_2014;
     ladel      | bldg_cnt 
 la_bldgs14_del |   219666


LARIAC14 LA Buildings Import Comparison
Between two layers, LA Buildings (2008) and (2014), additions are shown in red for several Inglewood, LA blockgroups .



internal ECN -invalid-polygon-

NAIP Processing Framework -opengdp- Task 2 billable — setting up the NAIP imagery is time-consuming, with many variables contributing to platform stability. It is a benefit to the project deadlines, to have critical path setup in place before it becomes a bottleneck.

Google to end MapMaker Community Mapping -link-


01 Nov 16
Data Analysis — California AB 802 Support

Geonames -link- is a geographic gazetteer of POINT; a current snapshot yields about 64,000 entries of class ‘S’ in California, which includes buildings. Data sources -link-

--  wget -c
--  unzip

CREATE TABLE geonames.geonames_16 (
    geonameid integer NOT NULL,
    name character varying(200),
    asciiname character varying(200),
    -- (remaining column definitions are truncated in this post)
 drop table if exists gn16_ca cascade;
 create table gn16_ca as 
    select distinct on (geonameid) * from gn16_view 
    where admin1 ~* 'California'  AND  feature_class = 'S';
 alter table gn16_ca add PRIMARY KEY (geonameid );
create index gn16ca_geom_idx on gn16_ca using GIST(geom);
analyze gn16_ca;
geo_datamine_f2=# select count(*), feature_name from gn16_ca group by feature_name order by count(*) desc;

 count |       feature_name       
 12966 | school
 10150 | church
  7390 | building(s)
  7189 | 
  6602 | hotel
  3333 | mine(s)
  2898 | camp(s)
  2201 | mall
  1460 | dam
  1192 | post office
  1130 | library
  1094 | hospital
   963 | tower
   892 | cemetery
   604 | airport
   481 | golf course
   354 | heliport
   348 | resort
   327 | museum
   311 | military installation
   275 | farm
   183 | restaurant
   162 | bridge
   129 | abandoned camp
    94 | marina
    92 | stadium
    90 | abandoned airfield
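The same class 'S' filter can be applied directly to the tab-delimited GeoNames dump before loading it into PostgreSQL. This sketch assumes the standard dump column order (feature class at index 6, feature code at 7, country code at 8, admin1 code at 10) and counts by feature code rather than by the descriptive feature name used in the query above. The sample rows are invented placeholders, not real gazetteer entries.

```python
import csv
from collections import Counter

# Column positions in the tab-delimited GeoNames dump (0-based)
F_CLASS, F_CODE, COUNTRY, ADMIN1 = 6, 7, 8, 10


def count_ca_structures(lines):
    """Count class 'S' (spot, building, farm) rows in California by feature code."""
    counts = Counter()
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) <= ADMIN1:
            continue  # skip short or malformed rows
        if row[COUNTRY] == "US" and row[ADMIN1] == "CA" and row[F_CLASS] == "S":
            counts[row[F_CODE]] += 1
    return counts


def fake_row(gid, name, fclass, fcode, country, admin1):
    """Build a 19-column placeholder row for the example below."""
    cols = [""] * 19
    cols[0], cols[1] = str(gid), name
    cols[F_CLASS], cols[F_CODE], cols[COUNTRY], cols[ADMIN1] = fclass, fcode, country, admin1
    return "\t".join(cols)


sample = [
    fake_row(1, "Example School", "S", "SCH", "US", "CA"),
    fake_row(2, "Example Church", "S", "CH", "US", "CA"),
    fake_row(3, "Example Peak", "T", "MT", "US", "CA"),
    fake_row(4, "Example Hotel", "S", "HTL", "US", "NV"),
]
print(count_ca_structures(sample).most_common())
```

With a real dump, the function can be fed an open file handle, for example `open(path, encoding="utf-8", newline="")`.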

NAIP 2014 -link-

~/CEC_i7d/Code_Misc_repo/naip_fetch$ ls    


$ at 1 am tomorrow -f 
$ at 2 am tomorrow -f
$ at 3 am tomorrow -f 
$ at 4 am tomorrow -f 
$ at 5 am tomorrow -f 
$ at 6 am tomorrow -f

-rw-rw-r-- 1 dbb dbb 897M Apr  8  2016 ortho_1-1_1n_s_ca001_2014_1.sid

~/CEC_i7d/Code_Misc_repo/naip_fetch$ ogrinfo -al ./CA_NAIP_2014/ortho_1-1_1n_s_ca001_2014_1.shp -so

INFO: Open of `.../CA_NAIP_2014/ortho_1-1_1n_s_ca001_2014_1.shp'
      using driver `ESRI Shapefile' successful.

Layer name: ortho_1-1_1n_s_ca001_2014_1
Geometry: Polygon
Feature Count: 22
Extent: (549934.358000, 4123967.911000) - (643548.295000, 4214567.898000)
Layer SRS WKT:

SHAPE_AREA: Real (19.11)

## MrSID Decode 
$ wget

$ tar xf *
$ cd MrSID_DSDK-

$ ls .../geodata_misc/LizardTech/MrSID_DSDK-


$ sudo ldconfig

$ .../bin/mrsidinfo  .../CA_NAIP_2014/ortho_1-1_1n_s_ca115_2014_1.sid

$ .../bin/mrsiddecode  -help

$ .../bin/mrsiddecode  -j 8 -o .../ca_naip_2014_misc/ortho_1-1_1n_s_ca115_2014_1.tif -i .../CA_NAIP_2014/ortho_1-1_1n_s_ca115_2014_1.sid

-rw-rw-r-- 1 dbb dbb 16G Oct 20 09:00 ortho_1-1_1n_s_ca115_2014_1.tif

.../CA_NAIP_2014$ gdalinfo .../ca_naip_2014_misc/*
Driver: GTiff/GeoTIFF
Files: .../ca_naip_2014_misc/ortho_1-1_1n_s_ca115_2014_1.tif
Size is 59722, 91784
Coordinate System is:
PROJCS["NAD83 / UTM zone 10N",
            SPHEROID["GRS 1980",6378137,298.257222101,
Origin = (612852.000000000000000,4395286.000000000000000)
Pixel Size = (1.000000000000000,-1.000000000000000)
Image Structure Metadata:
Corner Coordinates:
Upper Left  (  612852.000, 4395286.000) (121d41' 1.28"W, 39d41'59.90"N)
Lower Left  (  612852.000, 4303502.000) (121d41'56.68"W, 38d52'23.27"N)
Upper Right (  672574.000, 4395286.000) (120d59'14.72"W, 39d41'23.96"N)
Lower Right (  672574.000, 4303502.000) (121d 0'39.39"W, 38d51'48.37"N)
Center      (  642713.000, 4349394.000) (121d20'43.23"W, 39d16'55.78"N)
Band 1 Block=59722x1 Type=Byte, ColorInterp=Red
Band 2 Block=59722x1 Type=Byte, ColorInterp=Green
Band 3 Block=59722x1 Type=Byte, ColorInterp=Blue


OSM Multipolygon
pnorman blog -index- -p2--alt-rendering-

wiki -link-


15 Oct 16
Data Analysis — California AB 802 Support

CA State Court of Appeals Interprets Cost Formula -link-

CA NAIP 2014 Online -dir- ALAMEDA ⚈ 2014 LA County Acquired 18Oct16

LA County Assessors’ Use Codes -link-

LA Skylines -info-

Associating Buildings of Interest

table_name Description
costar_mf_pts0 CoStar Set (multi-family, 17 units and up)
parcels_la_2014_4326 LA Parcels Assessors Roll 2014
(LA County parcels)
la_bldgs_raw LA Buildings Import Source
ca_osm_bldgs_pt OSM Snapshot (current)

Simple Case:

   cos_ps : costar_mf_pts0    ->  parcels_la_2014_4326
 (parcels)  (where pt in LA)

  bldg_ps : la_bldgs_raw      ->  parcels_la_2014_4326
 (parcels)  (where units > 16)

-result_set- : -in_set_A-  'sql_ops' -> -in_set_B-


  • all attributes of both the parcel layer and la_bldgs_raw are now available per-CoStar site
  • category la_bldgs_costar ‘LA buildings of interest’ is available via bldg_ps

Each and every costar_mf_pts0 will be included in the CEC benchmarking.

  • What contribution do the non-CoStar layers bring to the CoStar data set?
  • Where is CoStar deficient ? How so ?
  • Where is external data deficient ? How so ?
  • What additional sets would contribute ?


  • osm identification, building to building
  • document data assets for each table
  • reproducible workflows for ingestion, analysis


  • list graphics here


  • LA (geo-boundary), CoStar POINT, parcel

count denotes the volume of a set; density shows groupings; match/miss shows data quality and coverage


Example — Inglewood Parcels


This screenshot shows parcels, center points for buildings, building footprints, and one CoStar mf17 POINT as a red dot. Observations:

* parcels appear to be mis-aligned by some meters to the East.
* many parcels bear more than one building
* alignment of buildings to parcels is problematic
* center points tend to be, but are not always true to a parcel


Example — (CoStar MF17 -> LA_Parcels) and (LA Buildings -> LA_Parcels)


Highlight of CoStar sample intersect LA Parcels; minus: LA_Bldgs with (units::integer > 16) intersect LA Parcels

OSM Alt_Bldgs Import

An experimental import sequence to ingest the current California OSM snapshot is under development. The toolchain starts with downloading a California snapshot in .pbf format. A translation and filter tool, either OSMOSIS or libOSMIUM, prepares the file. Next, the ogr2ogr OSM driver with a customized osmconf.ini generates SQL, which is loaded into PostGIS. Post-processing SQL is applied to the tables, and the result is exported via pg_dump, then loaded into a dedicated schema alt_bldgs in the main analytics database.

San Mateo County building footprints from Openstreetmap, imported locally and displayed using -WMS- over NAIP 2009:



Approx. 3.5 million multipolygons are generated as alt_bldgs, including buildings; however, there are some invalid geometries (most recently, less than 0.5% in CA), as shown by this query:

geo_datamine_f2=# select st_isvalidreason(wkb_geometry) from alt_bldgs.multipolygons
where not st_isvalid(wkb_geometry);
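One common reason a polygon fails a validity check is a ring that crosses itself (a "bow-tie"). The sketch below detects only that case, using a brute-force pairwise segment test in plain Python. It is an illustration of the concept, not a replacement for ST_IsValid, which checks many more conditions; touching or collinear overlaps are deliberately ignored here.

```python
def _orient(p, q, r):
    """Signed area test: >0 if r is left of p->q, <0 if right, 0 if collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def _properly_cross(a, b, c, d):
    """True if segment a-b and segment c-d cross at an interior point."""
    d1, d2 = _orient(a, b, c), _orient(a, b, d)
    d3, d4 = _orient(c, d, a), _orient(c, d, b)
    return d1 * d2 < 0 and d3 * d4 < 0


def has_self_intersection(ring):
    """Check every pair of non-adjacent edges of a ring [(x, y), ...]."""
    n = len(ring)
    edges = [(ring[i], ring[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # first and last edge share a vertex
            if _properly_cross(*edges[i], *edges[j]):
                return True
    return False


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
bow_tie = [(0, 0), (2, 2), (2, 0), (0, 2)]
print(has_self_intersection(square), has_self_intersection(bow_tie))
```

The pairwise loop is quadratic in the number of vertices, which is acceptable for single building footprints but not for bulk checks over millions of rows.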

Fixing Polygons in OSM -link- -link2-

Polygon Extraction -code-



01 Oct 16
Data Acquisition — California AB 802 Support

Spatial Inventories

  • Files retrieved from the field; description, extent, QA
  • Area Coverage — where are the assets overall
  • Contents — classify, count and crosswalk to internal schemas

In the preliminary stages of the project, these descriptions and measurements (meta-data) are built by hand -report-. Longer-term cataloging is TBD.
Reporting access is TBD.

Early asset collection shows a wide difference between contents: flat “shapes” with almost no identifying information on the one hand, and fully populated datasets on the other.

Cataloging, combined with state-wide base maps, begins to show the “missing maps” story. Urban Areas (uac) are an excellent predictor of Residential MultiFamily 17 units or more.

A substantial portion of urban California is represented in the LA Bldgs export (6.3 GB).

Note in Openstreetmap LA Maps, building data has been captured in two forms: in OSM native .pbf; and in pre-export form courtesy of the LA Buildings Import Project -link-

County Bldg Footprints - 01Oct16



Classification and Reporting on LA Bldgs

Using table la_bldgs_pt, and Subdivisions of Los Angeles County, count total buildings available, and the number of “Residential - Five or more units” buildings in each set. Show the SQL used.

                         csub | bldg_cnt | csub_res5p | res5p_cnt
                    Inglewood |   100623 | res5p      |      6956
                      Newhall |    69822 | res5p      |      1963
                     Torrance |    48424 | res5p      |      1351
                     Pasadena |   178277 | res5p      |      9532
                      Compton |   106024 | res5p      |      2454
                  Los Angeles |   706297 | res5p      |     47428
                 Palos Verdes |    35751 | res5p      |       616
                     Whittier |   113813 | res5p      |      2206
                 Santa Monica |    24156 | res5p      |      3523
               Downey-Norwalk |   119760 | res5p      |      5329
          Agoura Hills-Malibu |    27751 | res5p      |       411
           South Gate-East LA |   127402 | res5p      |      8977
          Long Beach-Lakewood |   181583 | res5p      |      8447
          San Fernando Valley |   577742 | res5p      |     20933
             South Bay Cities |    50015 | res5p      |      1608
        South Antelope Valley |    81832 | res5p      |      1402
        North Antelope Valley |    73238 | res5p      |      2104
 Southwest San Gabriel Valley |   103087 | res5p      |      5340
     Upper San Gabriel Valley |   109889 | res5p      |      5149
      East San Gabriel Valley |   284619 | res5p      |      7392
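
The raw counts are easier to compare as a share of each subdivision's total. A minimal Python sketch, using three rows copied from the table; the names counts, res5p_share and shares are illustrative and not part of the project code.

```python
# Rows copied from the table above: csub -> (bldg_cnt, res5p_cnt)
counts = {
    "Los Angeles": (706297, 47428),
    "Santa Monica": (24156, 3523),
    "Palos Verdes": (35751, 616),
}


def res5p_share(bldg_cnt, res5p_cnt):
    """Return res5p buildings as a percentage of all buildings."""
    return 100.0 * res5p_cnt / bldg_cnt


shares = {csub: res5p_share(*row) for csub, row in counts.items()}

# Print subdivisions from highest to lowest share
for csub, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{csub:>28} | {pct:5.1f}%")
```

The same calculation could be done in SQL at query time; it is shown here only as a check on the reported numbers.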





Prioritizing Progress — Planning Zones (PZ)

Currently, there are three main ways to present graphical spatial results: via WMS to any capable client; local vectors from Shapefile (and PostGIS to any networked node); and Python Jupyter Notebook from GeoJSON, Shapefile and PostGIS, with different libraries. Colors, labels and line weights are handled differently in each case.

Analytics / Reporting Core Misc:

  • PZ simplified shapes -img-
  • OpenJUMP spatial SQL plus PZ -img-
  • PZ1 priority report -test-
  • PZ report base -img- plus WP sub-pages (under construction)
  • County Contacts Worksheet -scratch-
  • postgis scratch -sql-


 ID | light   | dark    | Counties | Cousubs | Informal Name
  1 | #7FC97F | #008400 |       17 |      86 | Northern California
  2 | #FF7F00 | #006600 |        6 |      20 | Sierra
  3 | #F0207F | #E03090 |        9 |      45 | SF Bay Area
  4 | #66A0FF | #335500 |        6 |      38 | SACOG
  5 | #CCFFCC | #66DD66 |        8 |      84 | Central Valley
  6 | #DD5522 | #AA3311 |        5 |      33 | Central Coast
  7 | #FFDDDD | #DD6666 |        6 |      77 | SCAG
  8 | #FF6F6F | #DD3333 |        1 |      13 | San Diego

The table above is a first pass at a standardized color palette for each of the eight PZ.
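
Since each presentation path handles styling differently, one option is to keep the palette in a single lookup and derive client-specific values from it. A minimal Python sketch, with hex values copied from the table; PZ_PALETTE, hex_to_rgb and pz_style are hypothetical names, and using light as fill and dark as outline is an assumption, not a project decision.

```python
# Palette copied from the table above: PZ id -> (light, dark) hex colors
PZ_PALETTE = {
    1: ("#7FC97F", "#008400"),
    2: ("#FF7F00", "#006600"),
    3: ("#F0207F", "#E03090"),
    4: ("#66A0FF", "#335500"),
    5: ("#CCFFCC", "#66DD66"),
    6: ("#DD5522", "#AA3311"),
    7: ("#FFDDDD", "#DD6666"),
    8: ("#FF6F6F", "#DD3333"),
}


def hex_to_rgb(hex_color):
    """Convert '#RRGGBB' to an (r, g, b) tuple of 0-255 integers."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def pz_style(pz_id, line_weight=1.0):
    """Build a client-neutral style record for one Planning Zone.

    Assumption: light is the fill color, dark is the outline color.
    """
    light, dark = PZ_PALETTE[pz_id]
    return {"fill": light, "stroke": dark, "stroke_width": line_weight}


print(pz_style(1))
print(hex_to_rgb(PZ_PALETTE[1][0]))
```

A notebook plot can use the hex strings directly, while a client that expects numeric RGB values can go through hex_to_rgb.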




Aggregation — Which Container?

Which geographic territory might be an efficient way to focus the search?
Use Bakersfield, Kern County to examine three possibilities, looking for MF units.



Census Urban Areas
  • Very large areas
  • TIGER Standard -link-
  • can cross two or up to five CA counties


Census Place
  • Small and large areas
  • TIGER Standard -link-
  • can cross two or up to five CA counties


FMMP Urban Areas
  • Detailed (similar to block level)
  • Non-standard
  • Can be cut to county boundaries

Note: reporting and aggregation by county is always available.


Cal State GIS Data Portal
Note that the “Structures” category is locked -link-

LA Buildings Import
Stamen Design — using-open-data-to-learn-about-los-angeles

CA State Ag Data Portal

OSM Community Base Maps

OSM Buildings


14 Sep 16
Project Infrastructure — California AB 802 Support


07 Sep 16
Data Snapshots — California AB 802 Support

## Import OSM California

  wget -c
  # osm2pgsql version 0.91.0-dev (64 bit id space)

  osm2pgsql_git/build$ ./osm2pgsql -c -d osm_ca_osm2pgsql_trunk -C 8000 -l \

  # Osm2pgsql took 404s overall

  # execute osm_post_import.sql

## Summary Stats from OSM California Buildings as POINT
##  using a template - repeat for each Planning Zone (PZ) 

 zone | count
 PZ_1 | 13138
 PZ_2 | 11784
 PZ_3 | 631228
 PZ_4 | 39149
 PZ_5 | 299287
 PZ_6 | 202205
 PZ_7 | 1434736
 PZ_8 | 38848

##  Generate a count for a given Planning Zone
##   complete SQL in osm_bldg_pt_count0.sql

Select 'PZ_8' as zone, count(*)
  from tl_2016_us_county c, ca_osm_bldgs_pt p
  where st_intersects( c.geom, p.geom)  AND
  statefp = '06' AND
  countyfp in   -- ( county FIPS codes for the zone; see the lists below )

#--- PZ_1
-- 'Alpine','Butte','Colusa','Del Norte','Glenn','Humboldt','Lake','Lassen','Mendocino','Modoc','Nevada','Plumas','Shasta','Sierra','Siskiyou','Tehama','Trinity'

#--- PZ_2
-- 'Amador','Calaveras','Inyo','Mariposa','Mono','Tuolumne'
#--- PZ_3
-- 'Alameda','Contra Costa','Marin','Napa','San Francisco','San Mateo','Santa Clara','Solano','Sonoma'
#--- PZ_4
-- 'El Dorado','Placer','Sacramento','Sutter','Yolo','Yuba'
#--- PZ_5
-- 'Fresno','Kern','Kings','Madera','Merced','San Joaquin','Stanislaus','Tulare'
 '031', '029', '019', '039', '077', '047', '107', '099'

#--- PZ_6
-- 'Monterey','San Benito','San Luis Obispo','Santa Barbara','Santa Cruz'
 '083', '053', '069','087','079'

#--- PZ_7
-- 'Imperial','Los Angeles','Orange','Riverside','San Bernardino','Ventura'
 '111', '037', '071', '065', '059', '025'

#--- PZ_8
-- 'San Diego'
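
The "repeat for each Planning Zone" step can be scripted rather than edited by hand. A minimal Python sketch that fills the count query for each zone whose county FIPS codes are listed above (PZ_5, PZ_6, PZ_7); the other zones are left out because their codes are not recorded in these notes. PZ_COUNTYFP, SQL_TEMPLATE and pz_count_sql are hypothetical names, qualifying statefp and countyfp with the county alias is an assumption, and this is not necessarily how osm_bldg_pt_count0.sql is organized.

```python
# County FIPS codes per Planning Zone, copied from the notes above.
# Only zones whose codes appear in the notes are included.
PZ_COUNTYFP = {
    "PZ_5": ["031", "029", "019", "039", "077", "047", "107", "099"],
    "PZ_6": ["083", "053", "069", "087", "079"],
    "PZ_7": ["111", "037", "071", "065", "059", "025"],
}

# Same tables and predicates as the query above; the c. qualifiers are assumed.
SQL_TEMPLATE = """\
select '{zone}' as zone, count(*)
  from tl_2016_us_county c, ca_osm_bldgs_pt p
 where st_intersects(c.geom, p.geom)
   and c.statefp = '06'
   and c.countyfp in ({fips});"""


def pz_count_sql(zone):
    """Return the building-count query text for one Planning Zone."""
    fips = ", ".join(f"'{code}'" for code in PZ_COUNTYFP[zone])
    return SQL_TEMPLATE.format(zone=zone, fips=fips)


for zone in PZ_COUNTYFP:
    print(pz_count_sql(zone))
    print()
```

Plain string formatting is acceptable here only because the zone names and codes are fixed constants in the script; values from outside the script should go through query parameters instead.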

OSM Extracted Data Products from

example -here- Geo Asset


01 Sep 16
Project Assets — California AB 802 Support


LA Buildings Import – OSM Wiki,_California/Buildings_Import

LA Buildings Import – -GitHub-

latimes-graphics-media lariac_buildings_2008 Building Data -BINARY-

  Other OSM Building Import

PDX Buildings Import -GitHub-



AB-802 Background Reading -here-

Mayors’ Announcement -here-

A Regional Approach to Tally and Reporting

There are formal planning districts in the State of California,
representing more than 90% of the state’s population. Each is
composed of member counties. For a statewide mapping effort, it is
convenient to track and focus work by the following divisions:

 region_id |     region_name     
         1 | northern_california
         2 | sierra
         3 | bay_area
         4 | sacramento
         5 | central_valley
         6 | central_coast
         7 | southern_california
         8 | san_diego

1) Northern counties (very low populations)

2) Sierra counties (very low populations)

3) Association of Bay Area Governments

4) Sacramento Area Council of Governments

5) Central Valley

6) Central Coast

7) Southern California Association of Governments

8) San Diego Association of Governments

A preliminary survey of county GIS resources -here-

US Census — Place Definitions Overview -here-

Resources for Openstreetmap Los Angeles Buildings Import

* Geofabrik Openstreetmap Extracts, California -here-

* Los Angeles Geoportal Data -here- Notes -here-

* Openstreetmap Extract API Server Links -here-

* OSM Hollywood Hills Extract -BINARY-

Openstreetmap LA Buildings Import Samples

Custom Tags Question -here-