15 Feb 17
Automated Ingestion Setup — California AB 802 Support
========================================================
Primary Components of Image Recognition
* Automated data loading from primary sources
* Automated Retrieval of specific content
* Engine(s) applied to content
* Review and Feedback
Previously, we focused on automating the vector sources (auth and osm). This time, the focus is on new digital imagery content, an automated/scriptable viewer, and first steps in recognition. Also note content in /ECN
DOQQ California 2014
What is a DOQQ ? -LINK- CA 2014 – statewide, 1 meter resolution, 4-band (true color and color-infrared) GeoTIFF tiles; County-composites typically lack the Infrared (IR) band.
2016 is 0.6 meter resolution. 11TB of hard disks delivered to:
Bruce Nielsen CA State GIS Coordinator
Bruce.Nielsen@ca.usda.gov
## A prioritized queue for DOQQ Downloading
...
Priority 1 => commercial; 2 => MF; 3 => both; 0 => neither
subtract whole counties already done
...
auth_buildings=# select priority, count(*) from doqq_processing group by priority order by priority;
 priority | count
----------+-------
        0 |  5683
        1 |   957
        2 |   415
        3 |  4007

 doqqid | priority | fetched | processed
--------+----------+---------+-----------
   6684 |        0 | f       | f
   1021 |        0 | f       | f
   4944 |        3 | f       | f
   6490 |        1 | f       | f
   3467 |        3 | f       | f
   4380 |        3 | t       | f
...
As of 13Feb17, a little over 1/4 of the priority 3 DOQQs have been fetched and processed, and about 22% of all prioritized DOQQs (priority 1 to 3).
Fetching is the limiting factor, with 4,184 left to fetch.
Est. 4,184 * 5 min / 60 min/hr = 349 hrs, or about 15 days to finish fetching.
update 15Feb17: Year 2014 DOQQs processed, compressed and IR extracted: 1484
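The priority codes read like a two-bit flag (1 for commercial, 2 for MF, 3 for both), and the fetch estimate is plain arithmetic. The following is a minimal Python sketch under that reading; the function names are hypothetical and are not part of the project's tooling, and the 5 minutes per DOQQ figure is the one quoted in the estimate.

```python
import math

def doqq_priority(has_commercial: bool, has_mf: bool) -> int:
    """Return 0..3, assuming bit 0 = commercial and bit 1 = MF."""
    return int(has_commercial) | (int(has_mf) << 1)

def fetch_eta(remaining: int, minutes_each: float = 5.0):
    """Return (hours, days), rounded up, for fetching DOQQs one at a time."""
    hours = remaining * minutes_each / 60.0
    return math.ceil(hours), math.ceil(hours / 24.0)

# Counts per priority, copied from the doqq_processing query output.
counts = {0: 5683, 1: 957, 2: 415, 3: 4007}
prioritized = sum(n for p, n in counts.items() if p > 0)

hours, days = fetch_eta(4184)
print(prioritized, hours, days)
```

With 5,379 prioritized DOQQs in total, 4,184 left to fetch corresponds to the roughly 22% completion noted in the status line.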
NAIP Imagery Misc
http://ucanr.edu/blogs/blogcore/
http://www.atlas.ca.gov/download.html
2D Building Extraction other
GEM User Guide -LINK- -HOME-
IEEE Xplore -LINK-
NYPL-Spacetime FOSS vectorizer -LINK-
NYPL-Spacetime FOSS building browser -LINK-
Facebook ML Object Recognition -LINK-
osmb/?zoom=18&lat=40.90375&lon=-124.08229&layers=0000BTFFFFFF
Too much detail in one scene:
Thought Experiment: If you were a computer program, which parts would you identify as "buildings"? Is the color version better?
##--- BIS Session record - 14Feb17
##- confirm operation
dbb@i7d:~/CEC_i7d/bis-2/bisapi-hamlin/bis-workd$ source /home/dbb/CEC_i7d/bis-2/bisenv/bin/activate bisenv
(bisenv) dbb@i7d:~/CEC_i7d/bis-2/bisapi-hamlin/bis-workd$ python segment.py --test
##-- pick a sample image
##   tiffinfo   generic TIFF details
##   rio info   short description
##
(bisenv) dbb@i7d:~/CEC_i7d/bis-2/bisapi-hamlin/bis-workd$ tiffinfo /wd4m/ca_naip_2014_quads/final-ir/m_4012408_sw_10_1_20140607.tif
...
DocumentName: Arcata North SW 4012408
DateTime: 2014:09:18 10:08:51
...
##--
(bisenv) dbb@i7d:~/CEC_i7d/bis-2/bisapi-hamlin/bis-workd$ rio info /wd4m/ca_naip_2014_quads/final-ir/m_4012408_sw_10_1_20140607.tif
{
  "res" : [ 1.02267103930246e-05, 1.02267103930246e-05 ],
  "shape" : [ 6764, 7021 ],
  "lnglat" : [ -124.093772354795, 40.9062477860959 ],
  "dtype" : "uint8",
  "driver" : "GTiff",
  "blockysize" : 256,
  "bounds" : [ -124.129673, 40.87166105, -124.0578714, 40.94083452 ],
  "count" : 2,
  "crs" : "EPSG:4326",
  "width" : 7021,
  "interleave" : "pixel",
  "nodata" : null,
  "height" : 6764,
  "transform" : [ 1.02267103930246e-05, 0, -124.12967322163, 0, -1.02267103930246e-05, 40.9408345206451 ],
  "tiled" : true,
  "colorinterp" : [ "gray", "alpha" ],
  "blockxsize" : 256
}
##--
## run file - Too Large, crash
## hand extract smaller sample
## run two segmentations at very low threshold
TFILE=/wd4m/Arcata_Ex0.png
(bisenv) dbb@i7d:~/CEC_i7d/bis-2/bisapi-hamlin/bis-workd$ python segment.py ${TFILE} -t [20,50] --tile True --xt 10
thresh 1 ...
##--
## more samples
python segment.py ${TFILE} -t [16,32,64,96] --tile True --xt 10
python segment.py ${TFILE} -t [108,124,142] --tile True --xt 10
python segment.py ${TFILE} -t [150,180,210,240] --tile True --xt 10
##-- End Session
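Two quick checks follow from the `rio info` output in this session: what the degree-based "res" means on the ground, and how a 7021 x 6764 raster could be cut into smaller windows instead of hand-extracting a sample. This is a rough Python sketch only. The meters-per-degree constant is a spherical-Earth approximation, the 1024-pixel tile size is an arbitrary choice, and `tile_windows` is a hypothetical helper unrelated to the `--tile` option of segment.py.

```python
import math

# Values copied from the `rio info` output (CRS is EPSG:4326, so "res" is in degrees).
RES_DEG = 1.02267103930246e-05
LAT_DEG = 40.9062477860959
WIDTH, HEIGHT = 7021, 6764

# Assumption: spherical-Earth approximation, about 111.32 km per degree of latitude.
M_PER_DEG = 111_320.0

def pixel_size_m(res_deg, lat_deg):
    """Approximate (east-west, north-south) pixel size in meters."""
    north_south = res_deg * M_PER_DEG
    east_west = north_south * math.cos(math.radians(lat_deg))
    return east_west, north_south

def tile_windows(width, height, tile=1024):
    """Yield (col_off, row_off, tile_width, tile_height) covering the raster."""
    for row in range(0, height, tile):
        for col in range(0, width, tile):
            yield col, row, min(tile, width - col), min(tile, height - row)

ew, ns = pixel_size_m(RES_DEG, LAT_DEG)
windows = list(tile_windows(WIDTH, HEIGHT))
print(f"pixel ~ {ew:.2f} m x {ns:.2f} m, {len(windows)} tiles")
```

The pixel works out to roughly 0.9 m by 1.1 m, which is consistent with the nominal 1 meter resolution of the 2014 DOQQs.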
OSM Buildings Load Chain
fetch-osm-ca-latest
    adds OSM California into OSM hot folder
reload-osm-buildings
    DROP osm_buildings database
    C++ tool filters out 2D buildings by tag
    RELOAD

SUMMARY:
Analyzing building data ... Wed Feb 15 12:18:35 PST 2017

select key, count(*) as cnt
  from (select (each(tags)).* from areas) as foo
  group by key order by cnt desc;

                  key                  |   cnt
---------------------------------------+---------
 building                              | 4442239
 lacounty:bld_id                       | 2924011
 lacounty:ain                          | 2922837
 height                                | 2898947
 ele                                   | 2890401
 start_date                            | 2795684
 building:units                        | 2680758
 ...

$ psql osm_buildings -f osm_bldg_pt_count0.sql
 zone |  cnt  | zone |  cnt
------+-------+------+-------
 PZ_1 | 17354 | PZ_2 | 12207
(1 row)

 zone |  cnt   | zone |  cnt
------+--------+------+-------
 PZ_3 | 739953 | PZ_4 | 39836
(1 row)

 zone |  cnt   | zone |  cnt
------+--------+------+--------
 PZ_5 | 303110 | PZ_6 | 204510
(1 row)

 zone |   cnt   | zone |  cnt
------+---------+------+-------
 PZ_7 | 3085052 | PZ_8 | 39997
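To see what changed between reloads, the zone counts from this run can be set against the counts recorded in the 01 Feb 17 entry. A small Python sketch with both sets of numbers copied from these notes; `zone_deltas` is a hypothetical helper, not one of the bin/ tools.

```python
# Zone counts from the 01 Feb 17 entry and from the 15 Feb 17 reload.
feb01 = {"PZ_1": 16347, "PZ_2": 12140, "PZ_3": 668975, "PZ_4": 39829,
         "PZ_5": 302767, "PZ_6": 204249, "PZ_7": 2240489, "PZ_8": 39908}
feb15 = {"PZ_1": 17354, "PZ_2": 12207, "PZ_3": 739953, "PZ_4": 39836,
         "PZ_5": 303110, "PZ_6": 204510, "PZ_7": 3085052, "PZ_8": 39997}

def zone_deltas(before, after):
    """Return the per-zone change in OSM building point counts."""
    return {zone: after[zone] - before[zone] for zone in sorted(before)}

deltas = zone_deltas(feb01, feb15)
for zone, change in deltas.items():
    print(f"{zone}  {change:+d}")
```

Most of the growth falls in PZ_7 and PZ_3; the notes do not say why, so treat any explanation as something to verify against the OSM changesets.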
OSM Misc
https://tile.openstreetmap.org/cgi-bin/debug
https://taginfo.openstreetmap.org/keys/building#wiki
01 Feb 17
Automated Ingestion Setup — California AB 802 Support
========================================================
Data Provenance v0.01 -LINK-
Tripartite Solution
Three parts to iterative building definition:
* building outlines from authoritative sources: auth_buildings
* building outlines from OpenStreetMap: osm_buildings
* machine-assisted recognition: ma_buildings
Code-driven workflow components work together to form the basis of this “human-in-the-loop” discovery and definition environment. Think of it as a “power assist” for a person, instead of a fully-autonomous discovery agent. Inference, association and record-keeping are provided as a service to the human operator. A key element is the rigor of a reproducible workflow. Two guiding examples are “Reproducible Scientific Workflows for Data Science” -LINK- and the German Spatial Data Infrastructure (GD-SDI) ingestion system -LINK- -WIKIPEDIA-. These are not “the most advanced systems” available, but rather a “good fit” to the nature of this project and the actual resources available in the required time, and a good balance between cutting-edge tools and tools that are well-understood and time-tested. Especially in high-tech, there is benefit to a stable tool chain.
Let’s look at the first two legs of this three-legged data system today:
OSM_Buildings
bin/fetch-osm-ca-latest
    721M Jan 27 14:43 california-170128.osm.pbf
bin/reload-osm-buildings
    Usage: reload-osm-buildings -go [-f]
      -go   go ahead and do it
      -f    fetch new osm data
#--
dbb@i7d:~/CEC_i7d/Code_Misc_repo$ psql osm_buildings -f osm_bldg_pt_count0.sql
 zone |  cnt  | zone |  cnt
------+-------+------+-------
 PZ_1 | 16347 | PZ_2 | 12140

 zone |  cnt   | zone |  cnt
------+--------+------+-------
 PZ_3 | 668975 | PZ_4 | 39829

 zone |  cnt   | zone |  cnt
------+--------+------+--------
 PZ_5 | 302767 | PZ_6 | 204249

 zone |   cnt   | zone |  cnt
------+---------+------+-------
 PZ_7 | 2240489 | PZ_8 | 39908
bin/osm_building_stats
    tool; stats on osm_buildings; 31Jan2017

 pz_id |       pz_name       | fips |   county_name   | osm_bldgs_pts_count
-------+---------------------+------+-----------------+---------------------
     8 | san_diego           | 073  | San Diego       |               39908
     7 | scag                | 025  | Imperial        |                 648
     7 | scag                | 037  | Los Angeles     |             2154919
     7 | scag                | 059  | Orange          |               36089
     7 | scag                | 065  | Riverside       |               27398
     7 | scag                | 071  | San Bernardino  |               18728
     7 | scag                | 111  | Ventura         |                2707
     6 | central_coast       | 053  | Monterey        |                6359
     6 | central_coast       | 069  | San Benito      |               23822
     6 | central_coast       | 079  | San Luis Obispo |              152462
     6 | central_coast       | 083  | Santa Barbara   |               14433
     6 | central_coast       | 087  | Santa Cruz      |                7173
     5 | central_valley      | 019  | Fresno          |                9208
     5 | central_valley      | 029  | Kern            |              189405
     5 | central_valley      | 031  | Kings           |                 366
     5 | central_valley      | 039  | Madera          |                 308
     5 | central_valley      | 047  | Merced          |               23432
     5 | central_valley      | 077  | San Joaquin     |               74145
     5 | central_valley      | 099  | Stanislaus      |                3904
     4 | sacog               | 017  | El Dorado       |                5407
     4 | sacog               | 061  | Placer          |                5934
     4 | sacog               | 067  | Sacramento      |               15399
     4 | sacog               | 101  | Sutter          |                 109
     4 | sacog               | 113  | Yolo            |               12802
     4 | sacog               | 115  | Yuba            |                 178
     3 | bay_area            | 001  | Alameda         |               65732
     3 | bay_area            | 013  | Contra Costa    |               11140
     3 | bay_area            | 041  | Marin           |                2942
     3 | bay_area            | 055  | Napa            |                3493
     3 | bay_area            | 075  | San Francisco   |              159723
     3 | bay_area            | 081  | San Mateo       |              215838
     3 | bay_area            | 085  | Santa Clara     |              204109
     3 | bay_area            | 095  | Solano          |                2229
     3 | bay_area            | 097  | Sonoma          |                3769
     2 | sierra              | 003  | Alpine          |                1367
     2 | sierra              | 005  | Amador          |                 329
     2 | sierra              | 009  | Calaveras       |                 223
     2 | sierra              | 027  | Inyo            |                 351
     2 | sierra              | 043  | Mariposa        |                1763
     2 | sierra              | 051  | Mono            |                9266
     2 | sierra              | 057  | Nevada          |                 843
     2 | sierra              | 091  | Sierra          |                 611
     2 | sierra              | 107  | Tulare          |                1999
     2 | sierra              | 109  | Tuolumne        |                 208
     1 | northern_california | 007  | Butte           |                1328
     1 | northern_california | 011  | Colusa          |                  43
     1 | northern_california | 015  | Del Norte       |                  59
     1 | northern_california | 021  | Glenn           |                 147
     1 | northern_california | 023  | Humboldt        |                1599
     1 | northern_california | 033  | Lake            |                5668
     1 | northern_california | 035  | Lassen          |                 142
     1 | northern_california | 045  | Mendocino       |                1612
     1 | northern_california | 049  | Modoc           |                 175
     1 | northern_california | 063  | Plumas          |                  67
     1 | northern_california | 089  | Shasta          |                2348
     1 | northern_california | 093  | Siskiyou        |                 264
     1 | northern_california | 103  | Tehama          |                  63
     1 | northern_california | 105  | Trinity         |                  11
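The county rows should roll up to the per-zone counts reported by osm_bldg_pt_count0.sql. A short Python sketch of that cross-check, using only the scag (PZ_7) and sacog (PZ_4) rows copied from the table above; `zone_totals` is a hypothetical helper and not the actual osm_building_stats implementation.

```python
from collections import defaultdict

# (pz_id, county_name, osm_bldgs_pts_count) rows copied from the stats table.
rows = [
    (7, "Imperial", 648), (7, "Los Angeles", 2154919), (7, "Orange", 36089),
    (7, "Riverside", 27398), (7, "San Bernardino", 18728), (7, "Ventura", 2707),
    (4, "El Dorado", 5407), (4, "Placer", 5934), (4, "Sacramento", 15399),
    (4, "Sutter", 109), (4, "Yolo", 12802), (4, "Yuba", 178),
]

def zone_totals(county_rows):
    """Sum county building-point counts into per-zone totals keyed as PZ_<id>."""
    totals = defaultdict(int)
    for pz_id, _county, count in county_rows:
        totals[f"PZ_{pz_id}"] += count
    return dict(totals)

totals = zone_totals(rows)
print(totals)
```

Both totals agree with the PZ_7 and PZ_4 counts from the psql run, and Los Angeles County alone accounts for about 96% of the PZ_7 points.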
Auth_Buildings
bin/create-auth-buildings
bin/add-auth-building-layer
--
psql (9.6.1)
auth_buildings=# \dt *.*
       Schema       |              Name              | Type  |  Owner
--------------------+--------------------------------+-------+----------
 ambag              | bldgs                          | table | dbb
 bkrsfld            | bldgs                          | table | dbb
 census_p           | pz_region_defs                 | table | dbb
 census_p           | tl_2016_06_bg                  | table | dbb
 census_p           | tl_2016_06_cousub              | table | dbb
 census_p           | tl_2016_uac10_ca               | table | dbb
 census_p           | tl_2016_us_county              | table | dbb
 la14               | la_bldgs14_del_pt              | table | dbb
 la14               | la_bldgs14_invalid             | table | dbb
 la14               | la_bldgs14_pt                  | table | dbb
 la14               | lariac2_buildings_deleted_2014 | table | dbb
 la14               | lariac4_buildings_2014         | table | dbb
 marin0             | bldgs                          | table | dbb
 newportb           | bldgs                          | table | dbb
 newportb           | bldgs_orig                     | table | dbb
 petaluma           | bldgs                          | table | dbb
 roseville          | bldgs                          | table | dbb
 sacramento0        | bldgs                          | table | dbb
 sangis             | bldgs                          | table | dbb
 santa_cruz         | bldgs                          | table | dbb
 santacruz0         | bldgs                          | table | dbb
 sf0                | bldgs                          | table | dbb
 solano             | bldgs                          | table | dbb
 solano0            | bldgs                          | table | dbb
Machine-Assisted Buildings
sample session -LINK- -PAPER-
Mapserver 7.04 Install
– many details to install, not for beginners
– overview of the software -LINK-
– database object IDs are visible at higher zoom levels
– some interesting links:
San Mateo County, Menlo Park mapserv -LINK-
Using the layers picker, one can find many differences between OSM buildings and others.
Tour: before changing any settings, look at what is presented. What do you notice ?
Next, click the blue ‘plus’ sign on the upper-right side, notice the contents.
* State of Jupyter Article -LINK-
– Notebooks are central to this solution
Big News from Google Earth -LINK-
However, the rumor is that they are not going to release the client app.