15 Feb 17
Automated Ingestion Setup — California AB 802 Support
========================================================
Primary Components of Image Recognition
* Automated data loading from primary sources
* Automated Retrieval of specific content
* Engine(s) applied to content
* Review and Feedback
Previously, we focused on automating the vector sources (auth and osm).
This time, the focus is on new digital imagery content, an automated/scriptable viewer,
and first steps in recognition. Also note the content in /ECN.
DOQQ California 2014
What is a DOQQ? -LINK- CA 2014: statewide, 1 meter resolution, 4-band (true color and color-infrared) GeoTIFF tiles; county composites typically lack the infrared (IR) band.
The 2016 imagery is 0.6 meter resolution. 11TB of hard disks delivered to:
Bruce Nielsen CA State GIS Coordinator
Bruce.Nielsen@ca.usda.gov
## A prioritized queue for DOQQ Downloading
...
Priority 1 => commercial; 2 => MF (multifamily); 3 => both; 0 => neither
subtract whole counties already done
...
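The priority codes above behave like a two-bit flag: 1 for commercial, 2 for MF, and 3 when both apply. A minimal Python sketch of that encoding follows. The function name `doqq_priority` and its boolean arguments are hypothetical and are not taken from the project code.

```python
def doqq_priority(has_commercial: bool, has_mf: bool) -> int:
    """Return the queue priority code for one DOQQ.

    Encoding from the notes: 1 => commercial, 2 => MF, 3 => both, 0 => neither.
    """
    return (1 if has_commercial else 0) | (2 if has_mf else 0)


# Print the code for every combination of the two flags.
for flags in [(False, False), (True, False), (False, True), (True, True)]:
    print(flags, "->", doqq_priority(*flags))
```

Treating the code as a bit flag means a DOQQ that is later found to contain a second building class only needs one more bit set, rather than a lookup table.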
auth_buildings=# select priority, count(*) from doqq_processing group by
priority order by priority;
 priority | count
----------+-------
        0 |  5683
        1 |   957
        2 |   415
        3 |  4007

 doqqid | priority | fetched | processed
--------+----------+---------+-----------
   6684 |        0 | f       | f
   1021 |        0 | f       | f
   4944 |        3 | f       | f
   6490 |        1 | f       | f
   3467 |        3 | f       | f
   4380 |        3 | t       | f
...
As of 13Feb17, a little over 1/4 of the priority 3 DOQQs have been fetched and
processed, as have about 22% of all prioritized (priority 1 to 3) DOQQs. Fetching is the
limiting factor, with 4,184 left to fetch.
Est. 4,184 * 5 min / 60 min/hr = 349 hrs, or 15 days to finish fetching.
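The estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: it assumes the 4,184 remaining DOQQs are all priority 1 to 3 and that fetching runs continuously at the stated 5 minutes per DOQQ. The function name `fetch_estimate` is made up for illustration.

```python
import math

# Counts per priority from the doqq_processing query above.
counts = {0: 5683, 1: 957, 2: 415, 3: 4007}
remaining = 4184        # DOQQs left to fetch
minutes_per_fetch = 5   # average fetch time used in the estimate above


def fetch_estimate(n_tiles, minutes_each):
    """Return (hours, days) of continuous fetching for n_tiles."""
    hours = n_tiles * minutes_each / 60
    return hours, hours / 24


hours, days = fetch_estimate(remaining, minutes_per_fetch)

# Assumption: the remaining tiles are all priority 1 to 3.
prioritized = counts[1] + counts[2] + counts[3]
fetched_share = (prioritized - remaining) / prioritized

print(f"{hours:.0f} hrs, about {math.ceil(days)} days")
print(f"fetched so far: {fetched_share:.0%} of {prioritized} prioritized DOQQs")
```

Under those assumptions the fetched share comes out at roughly 22%, which agrees with the figure quoted in the notes.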
Update 15Feb17: year 2014 DOQQs processed, compressed, and IR-extracted: 1484
NAIP Imagery Misc
http://ucanr.edu/blogs/blogcore/
http://www.atlas.ca.gov/download.html
2D Building Extraction other
GEM User Guide -LINK- -HOME-
IEEE Xplore -LINK-
NYPL-Spacetime FOSS vectorizer -LINK-
NYPL-Spacetime FOSS building browser -LINK-
Facebook ML Object Recognition -LINK-
osmb/?zoom=18&lat=40.90375&lon=-124.08229&layers=0000BTFFFFFF
Too much detail in one scene:
Thought Experiment:
If you were a computer program, which parts would you identify as "buildings" ?
Is the color version better ?

##--- BIS Session record - 14Feb17
##- confirm operation
dbb@i7d:~/CEC_i7d/bis-2/bisapi-hamlin/bis-workd$ source /home/dbb/CEC_i7d/bis-2/bisenv/bin/activate bisenv
(bisenv) dbb@i7d:~/CEC_i7d/bis-2/bisapi-hamlin/bis-workd$ python segment.py --test
##-- pick a sample image
## tiffinfo generic TIFF details
## rio info short description
##
(bisenv) dbb@i7d:~/CEC_i7d/bis-2/bisapi-hamlin/bis-workd$ tiffinfo /wd4m/ca_naip_2014_quads/final-ir/m_4012408_sw_10_1_20140607.tif
...
DocumentName: Arcata North SW 4012408
DateTime: 2014:09:18 10:08:51
...
##--
(bisenv) dbb@i7d:~/CEC_i7d/bis-2/bisapi-hamlin/bis-workd$ rio info /wd4m/ca_naip_2014_quads/final-ir/m_4012408_sw_10_1_20140607.tif
{
"res" : [ 1.02267103930246e-05, 1.02267103930246e-05 ],
"shape" : [ 6764, 7021 ],
"lnglat" : [
-124.093772354795,
40.9062477860959
],
"dtype" : "uint8",
"driver" : "GTiff",
"blockysize" : 256,
"bounds" : [ -124.129673, 40.87166105, -124.0578714, 40.94083452 ],
"count" : 2,
"crs" : "EPSG:4326",
"width" : 7021,
"interleave" : "pixel",
"nodata" : null,
"height" : 6764,
"transform" : [ 1.02267103930246e-05, 0, -124.12967322163, 0, -1.02267103930246e-05, 40.9408345206451 ],
"tiled" : true,
"colorinterp" : [ "gray", "alpha" ],
"blockxsize" : 256
}
##--
## run file - Too Large, crash
## hand extract smaller sample
## run two segmentations at very low threshold
TFILE=/wd4m/Arcata_Ex0.png
(bisenv) dbb@i7d:~/CEC_i7d/bis-2/bisapi-hamlin/bis-workd$ python segment.py ${TFILE} -t [20,50] --tile True --xt 10
thresh 1
...
##--
## more samples
python segment.py ${TFILE} -t [16,32,64,96] --tile True --xt 10
python segment.py ${TFILE} -t [108,124,142] --tile True --xt 10
python segment.py ${TFILE} -t [150,180,210,240] --tile True --xt 10
##-- End Session
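The `rio info` output in the session above reports the pixel size in degrees, because the file is stored in EPSG:4326. The sketch below checks that the pixel size, image shape, and bounds agree with each other, and converts the pixel size to approximate ground meters. The figure of 111,320 m per degree is a spherical-Earth approximation introduced here, not a value from the session.

```python
import math

# Values copied from the `rio info` output in the session record.
res_deg = 1.02267103930246e-05   # pixel size in degrees (EPSG:4326)
width, height = 7021, 6764
west, south, east, north = -124.129673, 40.87166105, -124.0578714, 40.94083452
center_lat = 40.9062477860959

# 1) Pixel count times pixel size should reproduce the bounding-box extent.
extent_x = width * res_deg
extent_y = height * res_deg
print(f"x extent: {extent_x:.6f} deg vs bounds {east - west:.6f} deg")
print(f"y extent: {extent_y:.6f} deg vs bounds {north - south:.6f} deg")

# 2) Approximate ground size of one pixel.
#    Assumption: about 111,320 m per degree of latitude (spherical Earth);
#    a degree of longitude shrinks by cos(latitude).
M_PER_DEG = 111_320.0
pixel_ns_m = res_deg * M_PER_DEG
pixel_ew_m = res_deg * M_PER_DEG * math.cos(math.radians(center_lat))
print(f"pixel is roughly {pixel_ew_m:.2f} m east-west by {pixel_ns_m:.2f} m north-south")
```

Because the pixels are square in degrees, they are not square on the ground at this latitude. That may matter if segment sizes are later interpreted as building footprints in meters.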
OSM Buildings Load Chain
fetch-osm-ca-latest
adds OSM California into OSM hot folder
reload-osm-buildings
DROP osm_buildings database
C++ tool filters out 2D buildings by tag
RELOAD SUMMARY:
Analyzing building data ... Wed Feb 15 12:18:35 PST 2017
select key, count(*) as cnt from (select (each(tags)).* from areas) as foo group by key order by cnt desc;
key | cnt
---------------------------------------+---------
building | 4442239
lacounty:bld_id | 2924011
lacounty:ain | 2922837
height | 2898947
ele | 2890401
start_date | 2795684
building:units | 2680758
...
$psql osm_buildings -f osm_bldg_pt_count0.sql
zone | cnt | zone | cnt
------+-------+------+-------
PZ_1 | 17354 | PZ_2 | 12207
(1 row)
zone | cnt | zone | cnt
------+--------+------+-------
PZ_3 | 739953 | PZ_4 | 39836
(1 row)
zone | cnt | zone | cnt
------+--------+------+--------
PZ_5 | 303110 | PZ_6 | 204510
(1 row)
zone | cnt | zone | cnt
------+---------+------+-------
PZ_7 | 3085052 | PZ_8 | 39997
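These per-zone counts can be compared with the earlier run recorded in the 01 Feb 17 entry of these notes. The sketch below copies both sets of numbers by hand and prints the change per zone. The helper name `zone_deltas` is illustrative only.

```python
# Building-point counts per zone, copied from the two runs in these notes.
counts_01feb = {
    "PZ_1": 16347, "PZ_2": 12140, "PZ_3": 668975, "PZ_4": 39829,
    "PZ_5": 302767, "PZ_6": 204249, "PZ_7": 2240489, "PZ_8": 39908,
}
counts_15feb = {
    "PZ_1": 17354, "PZ_2": 12207, "PZ_3": 739953, "PZ_4": 39836,
    "PZ_5": 303110, "PZ_6": 204510, "PZ_7": 3085052, "PZ_8": 39997,
}


def zone_deltas(old, new):
    """Return {zone: new count minus old count} for every zone in `old`."""
    return {zone: new[zone] - old[zone] for zone in sorted(old)}


deltas = zone_deltas(counts_01feb, counts_15feb)
for zone, delta in deltas.items():
    pct = 100.0 * delta / counts_01feb[zone]
    print(f"{zone}: {delta:+d} ({pct:+.1f}%)")
```

Most of the growth between the two reloads falls in PZ_7 and PZ_3. The other zones change by about a thousand points or fewer.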
OSM Misc
https://tile.openstreetmap.org/cgi-bin/debug
https://taginfo.openstreetmap.org/keys/building#wiki
01 Feb 17
Automated Ingestion Setup — California AB 802 Support
========================================================
Data Provenance v0.01 -LINK-
Tripartite Solution
Three parts to iterative building definition: building outlines from authoritative sources (auth_buildings); building outlines from OpenStreetMap (osm_buildings); and machine-assisted recognition (ma_buildings).
Code-driven workflow components work together to form the basis of this “human-in-the-loop” discovery and definition environment. Think of it as a “power assist” for a person rather than a fully autonomous discovery agent. Inference, association, and record-keeping are provided as a service to the human operator. A key element is a rigorous, reproducible workflow. Two guiding examples are “Reproducible Scientific Workflows for Data Science” -LINK- and the German Spatial Data Infrastructure (GD-SDI) ingestion system -LINK- -WIKIPEDIA-. These are not “the most advanced systems” available, but rather a “good fit” to the nature of this project and the actual resources available in the required time, and a good balance between cutting-edge tools and tools that are well understood and time-tested. Especially in high-tech, there is benefit to a stable tool chain.
Let’s look at the first two legs of this three-legged data system today:
OSM_Buildings
bin/fetch-osm-ca-latest
721M Jan 27 14:43 california-170128.osm.pbf
bin/reload-osm-buildings
Usage: reload-osm-buildings -go [-f]
-go go ahead and do it
-f fetch new osm data
#--
dbb@i7d:~/CEC_i7d/Code_Misc_repo$ psql osm_buildings -f osm_bldg_pt_count0.sql
zone | cnt | zone | cnt
------+-------+------+-------
PZ_1 | 16347 | PZ_2 | 12140
zone | cnt | zone | cnt
------+--------+------+-------
PZ_3 | 668975 | PZ_4 | 39829
zone | cnt | zone | cnt
------+--------+------+--------
PZ_5 | 302767 | PZ_6 | 204249
zone | cnt | zone | cnt
------+---------+------+-------
PZ_7 | 2240489 | PZ_8 | 39908
bin/osm_building_stats tool; stats on osm_buildings; 31Jan2017
pz_id | pz_name | fips | county_name | osm_bldgs_pts_count
-------+---------------------+------+-----------------+---------------------
8 | san_diego | 073 | San Diego | 39908
7 | scag | 025 | Imperial | 648
7 | scag | 037 | Los Angeles | 2154919
7 | scag | 059 | Orange | 36089
7 | scag | 065 | Riverside | 27398
7 | scag | 071 | San Bernardino | 18728
7 | scag | 111 | Ventura | 2707
6 | central_coast | 053 | Monterey | 6359
6 | central_coast | 069 | San Benito | 23822
6 | central_coast | 079 | San Luis Obispo | 152462
6 | central_coast | 083 | Santa Barbara | 14433
6 | central_coast | 087 | Santa Cruz | 7173
5 | central_valley | 019 | Fresno | 9208
5 | central_valley | 029 | Kern | 189405
5 | central_valley | 031 | Kings | 366
5 | central_valley | 039 | Madera | 308
5 | central_valley | 047 | Merced | 23432
5 | central_valley | 077 | San Joaquin | 74145
5 | central_valley | 099 | Stanislaus | 3904
4 | sacog | 017 | El Dorado | 5407
4 | sacog | 061 | Placer | 5934
4 | sacog | 067 | Sacramento | 15399
4 | sacog | 101 | Sutter | 109
4 | sacog | 113 | Yolo | 12802
4 | sacog | 115 | Yuba | 178
3 | bay_area | 001 | Alameda | 65732
3 | bay_area | 013 | Contra Costa | 11140
3 | bay_area | 041 | Marin | 2942
3 | bay_area | 055 | Napa | 3493
3 | bay_area | 075 | San Francisco | 159723
3 | bay_area | 081 | San Mateo | 215838
3 | bay_area | 085 | Santa Clara | 204109
3 | bay_area | 095 | Solano | 2229
3 | bay_area | 097 | Sonoma | 3769
2 | sierra | 003 | Alpine | 1367
2 | sierra | 005 | Amador | 329
2 | sierra | 009 | Calaveras | 223
2 | sierra | 027 | Inyo | 351
2 | sierra | 043 | Mariposa | 1763
2 | sierra | 051 | Mono | 9266
2 | sierra | 057 | Nevada | 843
2 | sierra | 091 | Sierra | 611
2 | sierra | 107 | Tulare | 1999
2 | sierra | 109 | Tuolumne | 208
1 | northern_california | 007 | Butte | 1328
1 | northern_california | 011 | Colusa | 43
1 | northern_california | 015 | Del Norte | 59
1 | northern_california | 021 | Glenn | 147
1 | northern_california | 023 | Humboldt | 1599
1 | northern_california | 033 | Lake | 5668
1 | northern_california | 035 | Lassen | 142
1 | northern_california | 045 | Mendocino | 1612
1 | northern_california | 049 | Modoc | 175
1 | northern_california | 063 | Plumas | 67
1 | northern_california | 089 | Shasta | 2348
1 | northern_california | 093 | Siskiyou | 264
1 | northern_california | 103 | Tehama | 63
1 | northern_california | 105 | Trinity | 11
Auth_Buildings
bin/create-auth-buildings
bin/add-auth-building-layer
--
psql (9.6.1)
auth_buildings=# \dt *.*
Schema | Name | Type | Owner
--------------------+--------------------------------+-------+----------
ambag | bldgs | table | dbb
bkrsfld | bldgs | table | dbb
census_p | pz_region_defs | table | dbb
census_p | tl_2016_06_bg | table | dbb
census_p | tl_2016_06_cousub | table | dbb
census_p | tl_2016_uac10_ca | table | dbb
census_p | tl_2016_us_county | table | dbb
la14 | la_bldgs14_del_pt | table | dbb
la14 | la_bldgs14_invalid | table | dbb
la14 | la_bldgs14_pt | table | dbb
la14 | lariac2_buildings_deleted_2014 | table | dbb
la14 | lariac4_buildings_2014 | table | dbb
marin0 | bldgs | table | dbb
newportb | bldgs | table | dbb
newportb | bldgs_orig | table | dbb
petaluma | bldgs | table | dbb
roseville | bldgs | table | dbb
sacramento0 | bldgs | table | dbb
sangis | bldgs | table | dbb
santa_cruz | bldgs | table | dbb
santacruz0 | bldgs | table | dbb
sf0 | bldgs | table | dbb
solano | bldgs | table | dbb
solano0 | bldgs | table | dbb
Machine-Assisted Buildings
sample session -LINK- -PAPER-
MapServer 7.0.4 Install
– many details to install, not for beginners
– overview of the software -LINK-
– database object IDs are visible at higher zoom levels
– some interesting links:
San Mateo County, Menlo Park mapserv -LINK-
Using the layers picker, one can find many differences between OSM buildings and others.
Tour: before changing any settings, look at what is presented. What do you notice ?
Next, click the blue ‘plus’ sign on the upper-right side, notice the contents.
* State of Jupyter Article -LINK-
– Notebooks are central to this solution
Big News from Google Earth -LINK-
However, the rumor is that they are not going to release the client app.
