{"id":1166,"date":"2013-04-03T11:29:46","date_gmt":"2013-04-03T18:29:46","guid":{"rendered":"http:\/\/blog.light42.com\/wordpress\/?page_id=1166"},"modified":"2022-07-07T10:58:27","modified_gmt":"2022-07-07T17:58:27","slug":"re2","status":"publish","type":"page","link":"http:\/\/blog.light42.com\/wordpress\/?page_id=1166","title":{"rendered":"RE2"},"content":{"rendered":"<p>30 Mar 17<br \/>\nDocs and Handoff &#8212; WA18<br \/>\n========================================================<\/p>\n<p><strong>Prediction Run &#8212; Pass II<\/strong><\/p>\n<p>* rebuilt tiling<br \/>\n  &#8211; there was a bug in the tiling code that caused gaps due to floating-point truncation: the %f format used for the float input to the transform had insufficient resolution, so the tiles were slightly malformed<br \/>\n  &#8211; after retiling Inglewood and Humboldt Arcata, re-run training and prediction<br \/>\n  &#8211; take advantage of the change, and clip Humboldt to a water layer<\/p>\n<p>* made a quick spatial join of the Predicted set to the Polys <code>htarg2<\/code><br \/>\n  &#8211; sent to garlynn for preliminary QC<br \/>\n  &#8211; created a few QGIS visualizations<\/p>\n<p>Humboldt Data<br \/>\n   dist2road<\/p>\n<p>Inglewood \/ Training<br \/>\n   ctr_targ T\/F;  ctr_seg T\/F; perc_overlap (0-1); <\/p>\n<p>Humboldt \/ Prediction<br \/>\n   coverage2 value (0-1);<\/p>\n<p><strong>MGRS Grid Reference<\/strong><br \/>\n<div id=\"attachment_3384\" style=\"width: 310px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/mgrs_import_scrn.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-3384\" src=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/mgrs_import_scrn-300x198.png\" alt=\"\" width=\"300\" height=\"198\" class=\"size-medium wp-image-3384\" srcset=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/mgrs_import_scrn-300x198.png 300w, 
http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/mgrs_import_scrn-768x507.png 768w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/mgrs_import_scrn-1024x676.png 1024w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/mgrs_import_scrn.png 1087w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><p id=\"caption-attachment-3384\" class=\"wp-caption-text\">MGRS 10t 10s 11s start<\/p><\/div><\/p>\n<p>buildings.buildings final<\/p>\n<p>5563240 from all tables<br \/>\n5457201 removing duplicates<\/p>\n<p>&nbsp;<\/p>\n<p>Google Earth Enterprise on GitHub<br \/>\nhttp:\/\/www.opengee.org\/<\/p>\n<p>Development Seed is using SegNet-based ML<br \/>\nhttps:\/\/github.com\/alexgkendall\/caffe-segnet<br \/>\nhttps:\/\/github.com\/developmentseed<br \/>\nhttps:\/\/www.developmentseed.org\/projects\/<\/p>\n<p>24 Mar 17<br \/>\nRecognition Trials &#8212; WA18<br \/>\n========================================================<\/p>\n<p><strong>Machine Learning &#8212; Tuning the Training Process<\/strong><\/p>\n<p>  The <strong>BIS2<\/strong> segmentation engine has three parameters: threshold (t), shape (s), and compactness (c). For this series of tests <code>evaltest<\/code>, the imagery zoom level is always &#8220;full image resolution&#8221; and only one NAIP DOQQ is segmented (a portion of Inglewood). Iterating through the input parameters, segmentation is performed and the resulting polygons are saved. This is repeated many times, and for several <em>&#8216;modes&#8217;<\/em> of the imagery. Valid imagery <em>&#8216;modes&#8217;<\/em> start with true color &#8216;tc&#8217;, false color &#8216;fc&#8217;, and infrared &#8216;ir&#8217;; new products such as a Sobel filter result can be added. (20mar17 &#8211; first sobel-enhanced segmentation added)<\/p>\n<p>  In <code>evaltest<\/code>, 675 segmentation run combinations of (t,s,c,mode) at full resolution on NAIP 2016 imagery were executed and the results are stored as shapefiles on disk. 
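<\/p>
<p>A minimal sketch of the sweep bookkeeping, in Python; the <code>table_name<\/code> helper is an assumption, inferred from the <code>evaltest_evaltest1_MODE_T_S_C<\/code> naming pattern described below:<\/p>
<pre>
```python
from itertools import product

# Parameter grid from the evaltest series (values listed below)
thresholds = [10, 20, 30, 40, 50, 60, 70, 80, 90]   # t
shapes     = [0.1, 0.3, 0.5, 0.7, 0.9]              # s
compacts   = [0.1, 0.3, 0.5, 0.7, 0.9]              # c
modes      = ['ir', 'fc', 'tc']                     # imagery mode

def table_name(mode, t, s, c, jobid='evaltest', project='evaltest1'):
    # Encode run parameters into a stats-table name,
    # e.g. evaltest_evaltest1_tc_20_07_07  (s and c scaled by 10)
    return '{}_{}_{}_{:02d}_{:02d}_{:02d}'.format(
        jobid, project, mode, t, int(round(s * 10)), int(round(c * 10)))

runs = list(product(modes, thresholds, shapes, compacts))
print(len(runs))                        # 675 = 3 * 9 * 5 * 5
print(table_name('tc', 20, 0.7, 0.7))   # evaltest_evaltest1_tc_20_07_07
```
<\/pre>
<p>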
A small Python program (<code>load-seg-data.py<\/code>) loads the shapefiles into a spatial table. <\/p>\n<p>The 675 test tables cover:<br \/>\n&nbsp;9 values for threshold (t) 10,20,30,40,50,60,70,80,90<br \/>\n&nbsp;5 values for shape (s) 0.1,0.3,0.5,0.7,0.9<br \/>\n&nbsp;5 values for compact (c) 0.1,0.3,0.5,0.7,0.9<br \/>\n&nbsp;3 values for mode ir,fc,tc<\/p>\n<p>&nbsp; &nbsp; 675 = 9 * 5 * 5 * 3<\/p>\n<p>In the next step, statistics are calculated on the previous results (code in <code>prep-relevance.py<\/code>). These stats are saved in tables in the <code>relevance<\/code> schema of the <code>ma_buildings<\/code> database. Tables have parameter values embedded in their names in a structured way (see below).  <\/p>\n<p>This step concludes by choosing the best parameter set for a given search objective. In this experiment, we are matching <strong>shape-to-shape<\/strong>.  The Python script <code>eval-stats.py<\/code> generates SQL containing a regular expression that filters the 675 run-result stats tables by encoded name, performs summary calculations on all rows in those tables (connected in the output by UNION ALL), and sorts all of that by a desired stats output column to discover outcomes in the test runs. Built-in Postgres math functions <strong>min() max() avg() stddev()<\/strong> are executed on three columns from each table, along with counts of two boolean columns. A SQL UNION ALL construct lists the chosen stats for all considered tables, one row per table, which can then be sorted in any way that is convenient to pick winners. Using this generalized format, statistics for any model run can quickly be compared. 
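<\/p>
<p>The shape of the generated summary SQL can be sketched as follows; the helper name is hypothetical, but the summarized columns and the UNION ALL structure follow the description above (a dollar-quoted literal carries the table name, avoiding embedded quotes):<\/p>
<pre>
```python
# Build one summary SELECT per matching stats table and stitch them
# together with UNION ALL, sorted on the chosen output column.
def build_summary_sql(tables, column='coverage1', order='average'):
    parts = []
    for t in tables:
        parts.append(
            'select $${0}$$ as tbl, count(*) as n, '
            'min({1}) as minimum, max({1}) as maximum, '
            'avg({1}) as average, stddev({1}) as stddev, '
            'count(*) filter (where centr_seg) as centr_seg, '
            'count(*) filter (where centr_trg) as centr_trg '
            'from relevance.{0}'.format(t, column))
    return ' union all '.join(parts) + ' order by {} asc'.format(order)

sql = build_summary_sql(['evaltest_evaltest1_tc_20_07_07',
                         'evaltest_evaltest1_tc_20_07_03'])
print(sql)
```
<\/pre>
<p>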
On the testing hardware, 450 tables of a few thousand rows each, are summarized and sorted for comparison in about 1.5 seconds.<\/p>\n<div id=\"attachment_3351\" style=\"width: 310px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/osmb_seg_evals_mapserver.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-3351\" src=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/osmb_seg_evals_mapserver-300x166.png\" alt=\"\" width=\"300\" height=\"166\" class=\"size-medium wp-image-3351\" srcset=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/osmb_seg_evals_mapserver-300x166.png 300w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/osmb_seg_evals_mapserver-768x424.png 768w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/osmb_seg_evals_mapserver-1024x566.png 1024w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/osmb_seg_evals_mapserver.png 1216w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><p id=\"caption-attachment-3351\" class=\"wp-caption-text\">Visual display of segmentation in a Training Area of Inglewood, California.<\/p><\/div>\n<pre>\r\n-- Five Spatial Tests are Performed on Every Segment Intersecting a Training Building --\r\n\r\nCREATE TABLE relevance.evaltest_evaltest1_MODE_T_S_C\r\n(\r\n  gid integer PRIMARY KEY,\r\n  class integer,      -- 0 not considered;  2 commercial bldg\r\n  pctoverlap double precision,\r\n  coverage1 double precision,\r\n  coverage2 double precision,\r\n  centr_seg boolean,\r\n  centr_trg boolean\r\n)\r\n\r\n-- Structure of a model run stats table  23mar17\r\n--\r\nma_buildings=# select * from \"5bandt1_test2_5b_50_03_03\" limit 5;\r\n  gid  | class |    pctoverlap     |     coverage1     |    coverage2     | centr_seg | centr_trg 
\r\n-------+-------+-------------------+-------------------+------------------+-----------+-----------\r\n 64311 |     2 | 0.547329611192823 | 0.970785850687081 | 35.9228431691447 | t         | f\r\n 64559 |     2 | 0.137682284191657 | 0.836077333010945 | 0.54216208047878 | f         | f\r\n 64107 |     2 |                 1 | 0.972281692288727 |   70.15447713594 | t         | f\r\n 64698 |     2 | 0.728239585461471 | 0.924875366545267 | 17.6592649379458 | t         | f\r\n 64721 |     2 | 0.923712040345278 | 0.943590470848252 | 30.8265007068801 | t         | f\r\n(5 rows)\r\n\r\n\r\n\r\nRelevance Query:\r\n--    segmented poly (a)   < =  polys from a segmentation run w\/ given params  \r\n--    target poly (b)      <=  training polys, what we are matching  \r\n--    raw areas (c)        <=  all known bldg polys in the area\r\n--\r\n\r\n        | insert into relevance.evaltest_evaltest1_fc_50_09_05                                                                                                                 \r\n        |   select a.gid,                                                                                                                                                      \r\n        |          1::integer,                                                                                                                                                 \r\n        |          st_area(st_intersection(a.geom,c.geom))\/st_area(a.geom),                                                                                                    \r\n        |          st_area(c.geom) - st_area(st_intersection(a.geom,c.geom))                                                                                                   \r\n        |            + abs(st_area(a.geom) - st_area(c.geom)),                                                                                                                 \r\n        |          (st_area(c.geom) - 
st_area(st_intersection(a.geom,c.geom)))                                                                                                 \r\n        |            \/ st_area(a.geom),                                                                                                                                        \r\n        |          st_intersects(st_centroid(a.geom), c.geom),                                                                                                                 \r\n        |          st_intersects(st_centroid(c.geom), a.geom)                                                                                                                  \r\n        |     from seg_polys a                                                                                                                                                 \r\n        |          join raw.areas c on st_intersects(a.geom, c.geom)                                                                                                           \r\n        |           left outer join \r\n                       relevance.evaltest_evaltest1_fc_50_09_05 b on a.gid=b.gid                                                                                   \r\n        |    where b.class is null  and  \r\n               jobid='evaltest'  and  project='evaltest1'  and  \r\n               mode='fc'  and  t=50.000000  and  s=9.000000  and  c=5.000000  \r\n                 on conflict do nothing \r\n<\/pre>\n<p><strong>Relevance Tests<\/strong><\/p>\n<p>&nbsp; <strong>pctoverlap<\/strong> =>  the area of the intersection between the target training poly and a segmented poly, divided by the total area of the segmented poly; meaning, the percentage of the segmented poly that is intersecting the target, converges to 1.0 on a segmented poly that is entirely intersecting the target<\/p>\n<p>&nbsp; <strong>cov1<\/strong> =>  the area of a building that is not covered by the intersection with the segmentation 
poly <em>plus<\/em> the absolute value of the difference between the segmented poly and the building poly; meaning, a measure of non-fit; converges to zero on a perfect fit <em>(actually, we changed this again, see code)<\/em><\/p>\n<p>&nbsp; <strong>cov2<\/strong> =>  the area of the segmented poly outside of the intersection with the target, divided by the area of the segmented poly; meaning, the proportion of the poly that is not part of the target fit; converges to 0 on a perfect fit<\/p>\n<pre>\r\neval-stats.py -m tc -f cov1 -o avg -g\r\n\r\n  search all result tables in schema 'relevance' for test run summary tables matching\r\nimagery mode 'tc' True Color with any parameters; from those tables, \r\ncalc min() max() avg() and stddev() on all measures, but emit only the totals for the chosen column 'cov1', \r\none line per table; sort those results on column 'average' with the default sort order  \r\nof smallest first. In this example, segmentation parameter 20-07-07 gave the lowest\r\naverage value of cov1 for all test runs, with 20-07-03 closely following. 
\r\n\r\n  table                         class count minimum   maximum   average   stddev  centr_seg centr_trg\r\n----------------------------------------------------------------------------------------------------------\r\nevaltest_evaltest1_tc_20_07_07   1   7237   0.000324  0.801199  0.052418  0.083101   3420   1414\r\nevaltest_evaltest1_tc_20_07_03   1   6694   0.000476  0.800666  0.052979  0.083563   3158   1336\r\nevaltest_evaltest1_tc_30_05_09   1   5404   0.000354  0.800875  0.053037  0.084356   2406   1232\r\nevaltest_evaltest1_tc_30_03_09   1   6731   0.000324  0.800875  0.053139  0.082947   3202   1362\r\nevaltest_evaltest1_tc_40_01_07   1   5310   0.000468  0.899413  0.053150  0.084180   2394   1223\r\nevaltest_evaltest1_tc_30_05_03   1   5252   0.000320  0.801222  0.053332  0.082887   2362   1209\r\nevaltest_evaltest1_tc_30_01_01   1   8033   0.000331  0.801431  0.053378  0.082763   3932   1425\r\nevaltest_evaltest1_tc_30_01_05   1   8069   0.000331  0.800875  0.053403  0.081330   3936   1451\r\nevaltest_evaltest1_tc_30_01_07   1   8212   0.000394  0.801083  0.053454  0.082504   4053   1459\r\nevaltest_evaltest1_tc_20_05_03   1   9560   0.000320  0.801222  0.053536  0.081324   4769   1530\r\nevaltest_evaltest1_tc_30_05_05   1   5379   0.000531  0.800875  0.053862  0.083052   2403   1210\r\nevaltest_evaltest1_tc_20_05_05   1   9607   0.000476  0.801410  0.053940  0.083886   4795   1542\r\nevaltest_evaltest1_tc_30_03_05   1   6658   0.000320  0.800875  0.054063  0.083892   3125   1324\r\n<\/pre>\n<pre>\r\n-- Classified Segments with a Given Band and Segmentation --\r\n\r\nma_buildings=# select class, count(*) from evaltest_evaltest1_ir_50_01_03 group by class;\r\n class | count \r\n-------+-------\r\n     1 |  4708\r\n     2 |   324\r\n(2 rows)\r\n<\/pre>\n<hr \/>\n<h4>Topics in Training Target Selection<\/h4>\n<p><strong>Process Steps<\/strong><\/p>\n<ul>\n<li>Generate 2000x2000 pixel tiles in the Training Area and the Search Area<br \/>\n<em>done -- 80 
 in Inglewood and 130 in Humboldt<\/em><\/li>\n<li>Segment those tiles <em>done -- chose 50_03_03<\/em><\/li>\n<li>Load that segment data into the database <em>done -- table sd_data.seg_polys_5b<\/em><\/li>\n<li>Feed the segment data to ML tools (either train or predict) <br \/>\n<em>predict-segments.py and train-segments.py<\/em><\/li>\n<li>Write the ML results back to the database.<\/li>\n<\/ul>\n<p><strong>Building Size<\/strong><br \/>\n &nbsp; Very small buildings are more likely to be confused with other kinds of visible features; therefore, remove training target buildings smaller than 90 m-sq<\/p>\n<p>Next, what is a reasonable cutoff for buildings between 90 m-sq and MAX_SIZE?<br \/>\nThis is a statistical distribution question. We know the buildings of interest (commercial) look very much like other buildings. What are the identifying characteristics of commercial buildings that will help distinguish them from all buildings? The distribution of features over all buildings forms one curve, and the distribution over the buildings we are looking for in the training set forms a different curve. Non-commercial buildings will have more occurrences in the low sq-m sizes, making false matches to commercial buildings more likely. <\/p>\n<p>Let's start simple.<\/p>\n<p>A quick look at the full training set shows that roughly 90,000 of the 250,000 bldgs are less than 150 m-sq, which means that 160,000 or 64% are >= 150 m-sq. My first reaction is ... look for >= 150 m-sq<\/p>\n<p>--<\/p>\n<p>20Mar17 <strong>Next Up<\/strong><\/p>\n<p>Segmentation is a general-purpose topic in imaging right now, so inevitably there will be other toolkits and advanced researchers that use it. Some parts of the current setup would have to be generalized to handle other segmentation engines, but that is \"out of scope\" for this pass. For example, masking, poly splitting, and poly merging are well known in image segmentation. 
We have many options to explore, but must prioritize as we make the best use of the BIS2 tool we started with. <\/p>\n<p>We have successfully added a fifth band to the raster input, 'mode'='sobel'.<br \/>\nUsing this five-band input, combined with segmentation trials, definitely<br \/>\nimproves the result. The toolchain did not have to change other than adding the new layer.<br \/>\nDocumenting the details of this is better left at the code level.<\/p>\n<p>revised relevance query:<\/p>\n<pre>\r\ninsert into relevance.\"inglewood_run1_5b_50_03_03\"\r\n  select a.gid,\r\n     1::integer,\r\n     -- pctoverlap\r\n        st_area(st_intersection(a.geom,c.geom))\/st_area(a.geom),\r\n     -- coverage1\r\n        (st_area(c.geom) + st_area(a.geom) -\r\n         2.0*st_area(st_intersection(a.geom,c.geom))) \/\r\n          (st_area(c.geom) + st_area(a.geom)),\r\n     -- coverage2
                                                                                                                       \r\n        (st_area(c.geom) - st_area(st_intersection(a.geom,c.geom)))                                                                                                                                            \r\n          \/ st_area(a.geom),                                                                                                                                                                                   \r\n     -- centr_seg                                                                                                                                                                                           \r\n        st_intersects(st_centroid(a.geom), c.geom),                                                                                                                                                            \r\n     -- centr_trg                                                                                                                                                                                           \r\n        st_intersects(st_centroid(c.geom), a.geom)                                                                                                                                                          \r\n   from seg_polys_5b a                                 -- (3) segment polygons                                                                                                                            \r\n    join la14.lariac4_buildings_2014 c on st_intersects(a.geom, c.geom) -- (4) building polygons                                                                                                            \r\n      left outer join relevance.\"inglewood_run1_5b_50_03_03\" b on a.gid=b.gid                                                                                                                              
   \r\n       where b.class is null  and  jobid='inglewood'  and  project='run1'  and  \r\n         mode='5b'  and  t=50.000000  and  s=3.000000  and  c=3\r\n\r\n -- previous\r\n select a.gid,                                                                                       \r\n    1::integer,                                                                                  \r\n    -- pctoverlap                                                                                                                                          \r\n    st_area(st_intersection(a.geom,c.geom))\/st_area(a.geom),                                                                                               \r\n    -- coverage1                                                                                                                                           \r\n    (st_area(c.geom) + st_area(a.geom) -                                                                                                                   \r\n      2.0*st_area(st_intersection(a.geom,c.geom))) \/                                                                                                      \r\n        (st_area(c.geom) + st_area(a.geom)),                                                                                                                \r\n    -- coverage2                                                                                                                                           \r\n    (st_area(c.geom) - st_area(st_intersection(a.geom,c.geom)))                                                                                            \r\n      \/ st_area(a.geom),                                                                                                                                   \r\n    -- centr_seg                                                                                                                                           \r\n      
st_intersects(st_centroid(a.geom), c.geom),                                                                                                            \r\n    -- centr_trg                                                                                                                                           \r\n      st_intersects(st_centroid(c.geom), a.geom)                                                                                                             \r\n from seg_polys_5b a                                                                                                                                         \r\n        join raw.areas c on st_intersects(a.geom, c.geom)                                                                                                      \r\n          left outer join relevance.\"5bandt1_test2_5b_50_03_04\" b on a.gid=b.gid                                                                                 \r\n where \r\n    b.class is null  and  jobid='5bandt1'  and  project='test2'  and  mode='5b'  and  \r\n    t=50.000000  and  s=3.000000  and  c=4.000000  \r\n      on conflict do nothing \r\n\r\n--\r\ntrain-segments.py\r\n  from sklearn.neighbors import KNeighborsClassifier\r\n  from sklearn.ensemble  import GradientBoostingClassifier\r\n\r\n\r\n--\r\nma_buildings=# \\d inglewood_run1_5b_50_03_03\r\nTable \"relevance.inglewood_run1_5b_50_03_03\"\r\n   Column   |       Type       | Modifiers \r\n------------+------------------+-----------\r\n gid        | integer          | not null\r\n class      | integer          | \r\n pctoverlap | double precision | \r\n coverage1  | double precision | \r\n coverage2  | double precision | \r\n centr_seg  | boolean          | \r\n centr_trg  | boolean          | \r\nIndexes:\r\n    \"inglewood_run1_5b_50_03_03_pkey\" PRIMARY KEY, btree (gid)\r\n\r\n<\/pre>\n<p><a href=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/osmb_tile5_50_03_03.png\"><img 
loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/osmb_tile5_50_03_03-300x198.png\" alt=\"\" width=\"300\" height=\"198\" class=\"aligncenter size-medium wp-image-3334\" srcset=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/osmb_tile5_50_03_03-300x198.png 300w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/osmb_tile5_50_03_03-768x507.png 768w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/osmb_tile5_50_03_03.png 1020w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p><strong>50-03-03<\/strong><br \/>\n  We selected this parameter set for the test area. It is good at selecting a lot of area from a large, single roof.  With the training tests we have, complex roofs with differently colored parts do not get joined using these settings.<\/p>\n<p><strong>dist2road<\/strong><br \/>\n  This measure seems important, yet it is costly to calculate, at least the way we are doing it now.<\/p>\n<pre>\r\n 5415961.887 ms  statement:\r\n   update sd_data.seg_polys_5b a\r\n     set dist2road=(select st_distance(a.geom, b.geom) from roads b\r\n         order by b.geom <-> a.geom limit 1)\r\n     where dist2road is null and a.gid between 400001 and 500001\r\n<\/pre>\n<hr \/>\n<h4>Completed Prediction Run<\/h4>\n<div id=\"attachment_3378\" style=\"width: 310px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/Humboldt_predict_c0_03272017.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-3378\" src=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/Humboldt_predict_c0_03272017-300x166.png\" alt=\"\" width=\"300\" height=\"166\" class=\"size-medium wp-image-3378\" 
srcset=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/Humboldt_predict_c0_03272017-300x166.png 300w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/Humboldt_predict_c0_03272017-768x425.png 768w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/Humboldt_predict_c0_03272017-1024x566.png 1024w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/Humboldt_predict_c0_03272017.png 1460w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><p id=\"caption-attachment-3378\" class=\"wp-caption-text\">NAIP 2016 60cm segmented; predicted and classified based on Inglewood_5b model, sklearn  Gradient Boost Tree (GBT).<\/p><\/div>\n<p>&nbsp;<\/p>\n<div id=\"attachment_3381\" style=\"width: 310px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/Inglewood_seg5b_bldgs_train.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-3381\" src=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/Inglewood_seg5b_bldgs_train-300x169.png\" alt=\"\" width=\"300\" height=\"169\" class=\"size-medium wp-image-3381\" srcset=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/Inglewood_seg5b_bldgs_train-300x169.png 300w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/Inglewood_seg5b_bldgs_train-768x433.png 768w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/Inglewood_seg5b_bldgs_train-1024x578.png 1024w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/Inglewood_seg5b_bldgs_train.png 1444w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><p id=\"caption-attachment-3381\" class=\"wp-caption-text\">Inglewood_5b training polygons, alongside classified segmented polygons<\/p><\/div>\n<pre>\r\n## Complete Machine-Learning Pipeline pass\r\n##  23March17 -dbb\r\n\r\n# tile the cousub for inglewood\r\n.\/tile-area.py -o 
inglewood -f 0603791400 -s 2000 -m 5b\r\n\r\n# segment all the tiles in the inglewood directory\r\n.\/segment-dir.py -m 5b -t 50 -s 0.3 -c 0.3 inglewood\/\r\n\r\n# load all the segment polygons into the database, compute dist2road\r\n# with jobid=inglewood, project=run1\r\n# segments go into sd_data.seg_polys_5b\r\n.\/load-seg-data.py -p run1 -m 5b inglewood inglewood\/\r\n\r\n# compare the segment polygons to the training polygons\r\n# filtering out any training polygons less than 150 sq-m in area\r\n# save the results in relevance.inglewood_run1_5b_50_03_03\r\n.\/prep-relevance.py -b inglewood_run1_5b_50_03_03 -j inglewood -p run1 -m 5b -t 50 -s 0.3 -c 0.3 -z 150\r\n\r\n# Using scikit-learn and the Gradient Boost Tree ML module,\r\n# train it using the segments in the database\r\n# and save the trained object in inglewood_run1_5b_50_03_03-r0.5.pkl.gz\r\n.\/train-segments.py -j inglewood -p run1 -m 5b -r 0.5 -b inglewood_run1_5b_50_03_03 inglewood_run1_5b_50_03_03-r0.5.pkl.gz\r\n\r\n# tile an area for Humboldt based on a bounding box\r\n# and place the tiles in directory humboldt\r\n.\/tile-area.py -o humboldt -b -124.17664,40.85411,-124.05923,41.00326 -s 2000 -m 5b\r\n\r\n# segment all the tiles in the humboldt directory\r\n.\/segment-dir.py -m 5b -t 50 -s 0.3 -c 0.3 humboldt\/\r\n\r\n# load all the segment polygons into the database, compute dist2road\r\n# with jobid=humboldt, project=target\r\n# segments go into sd_data.seg_polys_5b\r\n.\/load-seg-data.py -p target -m 5b humboldt humboldt\/\r\n\r\n# run the trained classifier inglewood_run1_5b_50_03_03-r0.5.pkl.gz\r\n# against the humboldt segments and save the results to\r\n# predict.humboldt_target_5b_50_03_03_r005\r\n.\/predict-segments.py -b humboldt_target_5b_50_03_03_r005 -j humboldt -p target -m 5b inglewood_run1_5b_50_03_03-r0.5.pkl.gz\r\n\r\n<\/pre>\n<hr \/>\n<p>Visualized output per parameter set:<\/p>\n<p>  
http:\/\/ct.light42.com\/\/osmb\/?zoom=19&lat=33.87306&lon=-118.37408&layers=B000000FFFFFTFFT&seg=evaltest_evaltest1_tc_70_03_07<\/p>\n<p>  http:\/\/ct.light42.com\/osmb\/?zoom=19&lat=33.87306&lon=-118.37408&layers=B000000FFFFFTFFT&seg=evaltest_evaltest1_tc_70_03_05<\/p>\n<hr \/>\n<h4>Other Topics<\/h4>\n<p><strong>ECN OSM Import<\/strong><br \/>\nhttps:\/\/github.com\/darkblue-b\/ECN_osm_import\/blob\/master\/load-osm-buildings.cpp<\/p>\n<p><strong>BIDS<\/strong><br \/>\n&nbsp; https:\/\/bids.berkeley.edu\/news\/searchable-datasets-python-images-across-domains-experiments-algorithms-and-learning<\/p>\n<p><strong>OBIA<\/strong><br \/>\nhttp:\/\/wiki.landscapetoolbox.org\/doku.php\/remote_sensing_methods:object-based_classification<\/p>\n<p>https:\/\/www.ioer.de\/segmentation-evaluation\/results.html<br \/>\nhttps:\/\/www.ioer.de\/segmentation-evaluation\/news.html<\/p>\n<p>&nbsp;<\/p>\n<p><strong>OSM<\/strong><br \/>\n&nbsp; https:\/\/wiki.openstreetmap.org\/wiki\/United_States_admin_level#cite_ref-35<\/p>\n<p>Jochen Topf reports that the multi-polygon fixing effort is going well and provides an issue thread to track the evolution.<br \/>\n  https:\/\/blog.jochentopf.com\/2017-03-09-multipolygon-fixing-effort.html<br \/>\n  https:\/\/github.com\/osmlab\/fixing-polygons-in-osm\/issues\/15<br \/>\n-<\/p>\n<p><a href=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/import_03222017_viz.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/import_03222017_viz-300x260.png\" alt=\"\" width=\"300\" height=\"260\" class=\"alignright size-medium wp-image-3339\" srcset=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/import_03222017_viz-300x260.png 300w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/import_03222017_viz-768x665.png 768w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/import_03222017_viz.png 935w\" 
sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><br \/>\nOn Mar 22, 2017 9:59 AM, \"Brad Neuhauser\" <brad .neuhauser@gmail.com> wrote:<\/p>\n<p>> I think this is it?<br \/>\n>   http:\/\/wiki.openstreetmap.org\/wiki\/Available_Building_Footprints<br \/>\n><br \/>\n> On Wed, Mar 22, 2017 at 11:54 AM, Rihards <richlv @nakts.net> wrote:<br \/>\n><br \/>\n>> On 2017.03.22. 18:37, Clifford Snow wrote:<br \/>\n>> > I am happy to announce that Microsoft has made available approximately<br \/>\n>> > 9.8 million building footprints including building heights in key<br \/>\n>> > metropolitan areas. These footprints are licensed ODbL to allow<br \/>\n>> > importing into OSM. These footprints where manually created using high<br \/>\n>> > resolution imagery. The data contains no funny field names such as<br \/>\n>> > tiger:cfcc or gnis:featureid or fcode=46003, just building height.<br \/>\n>> ><br \/>\n>> ><br \/>\n>> > Please remember to follow the import guidelines.<br \/>\n>> ><br \/>\n>> > The wiki [1] has more information on these footprints as well as links<br \/>\n>> > to download.<br \/>\n>><br \/>\n>> the link seems to be a copypaste mistake \ud83d\ude42<br \/>\n>><br \/>\n>> > [1] http:\/\/www.openstreetmap.org\/user\/dieterdreist\/diary\/40727<br \/>\n>> ><br \/>\n>> > Enjoy,<br \/>\n>> > Clifford Snow<br \/>\n>> ><br \/>\n>> ><br \/>\n>> > --<br \/>\n>> > @osm_seattle<br \/>\n>> > osm_seattle.snowandsnow.us <http: \/\/osm_seattle.snowandsnow.us><br \/>\n>> > OpenStreetMap: Maps with a human touch<br \/>\n>> ><br \/>\n>> ><\/p>\n<p><strong>NGA<\/strong><br \/>\nhttps:\/\/apps.nga.mil\/Home<\/p>\n<p><strong>Development Seed<\/strong><br \/>\nhttps:\/\/developmentseed.org\/blog\/2017\/01\/30\/machine-learning-learnings\/<\/p>\n<hr \/>\n<p>&nbsp;<\/p>\n<p>17 Mar 17<br \/>\nRecognition Trials -- WA18<br \/>\n========================================================<\/p>\n<blockquote><p>\n &ldquo;<em>object-based approaches to segmentation ... 
are becoming increasingly important as<br \/>\n&nbsp; &nbsp;the spatial resolution of images gets smaller relative to the size of the objects of interest.<\/em> &rdquo;<\/p>\n<p>  Andy Lyons, PhD<br \/>\n  University of California, Division of Agriculture and Natural Resources<br \/>\n  Informatics in GIS <a href=\"http:\/\/igis.ucanr.edu\/\" target=\"_blank\" rel=\"noopener\">-LINK-<\/a>\n<\/p><\/blockquote>\n<p>&nbsp;<br \/>\nTopics for <strong>Detecting Buildings in Aerial Imagery<\/strong>:<br \/>\n- Object-based Image Analysis versus Pixel-based Image Analysis <em>redux<\/em><br \/>\n- Change-detection using multi-temporal imagery<br \/>\n- Inherent Difficulty of Identifying Buildings in Imagery, even for a human<br \/>\n- LIDAR can increase detection dramatically, but is difficult to scale<\/p>\n<p>Topics in <strong>Segmentation<\/strong><br \/>\n- Object-based Image Analysis relies on Segmentation<br \/>\n- removing roads before segmentation<br \/>\n- removing vegetation before segmentation using IR layer<br \/>\n- using height information and shadow-detection<\/p>\n<p><strong>Measuring Success<\/strong><br \/>\n- positive match rate to ground truth<br \/>\n- miss rate to ground truth<br \/>\n- false positives<br \/>\n- false negatives <\/p>\n<blockquote><p>\nAlthough many image segmentation methods have been developed,<br \/>\nthere is a strong need for new and more sophisticated segmentation<br \/>\nmethods to produce more accurate segmentation results for urban object<br \/>\nidentification on a fine scale <em>(Li et al., 2011)<\/em>.<br \/>\nSegmentation can be especially problematic in areas with low contrast<br \/>\nor where different appearance does not imply different meaning. 
In this<br \/>\ncase the outcomes are represented as wrongly delineated image objects<br \/>\n<em>(Kanjir et al., 2010)<\/em>.<\/p>\n<p><em>excerpted from:<\/em><br \/>\n&nbsp;  URBAN OBJECT EXTRACTION FROM DIGITAL SURFACE MODEL AND DIGITAL AERIAL IMAGES   Grigillo, Kanjir <a href=\"http:\/\/www.isprs-ann-photogramm-remote-sens-spatial-inf-sci.net\/I-3\/215\/2012\/isprsannals-I-3-215-2012.pdf\" target=\"_blank\" rel=\"noopener\">-LINK-<\/a>\n<\/p><\/blockquote>\n<hr \/>\n<p><em>research with examples of Object-based Image Analysis (OBIA):<\/em><\/p>\n<p>An object-oriented approach to urban land cover change detection. Doxani, G.; Siachalou, S.; Tsakiri-Strati, M.; Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci 2008, 37, 1655\u20131660. [<a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=An%20object-oriented%20approach%20to%20urban%20land%20cover%20change%20detection&#038;author=Doxani,+G.&#038;author=Siachalou,+S.&#038;author=Tsakiri-Strati,+M.&#038;publication_year=2008&#038;journal=Int.+Arch.+Photogramm.+Remote+Sens.+Spat.+Inf.+Sci&#038;volume=37&#038;pages=1655%E2%80%931660\" target=\"_blank\" rel=\"noopener\">Google Scholar<\/a>]<\/p>\n<p>Monitoring urban changes based on scale-space filtering and object-oriented classification. Doxani, G.; Karantzalos, K.; Strati, M.T.; Int. J. Appl. Earth Obs. Geoinf 2012, 15, 38\u201348. 
[<a href=\"http:\/\/www.mdpi.com\/2072-4292\/6\/9\/8310\/htm#b18-remotesensing-06-08310\" target=\"_blank\" rel=\"noopener\">Google Scholar<\/a>]<\/p>\n<p>Building Change Detection from Historical Aerial Photographs<br \/>\n Using Dense Image Matching and Object-Based Image Analysis <a href=\"http:\/\/www.mdpi.com\/2072-4292\/6\/9\/8310\/htm\" target=\"_blank\" rel=\"noopener\">-LINK-<\/a><br \/>\nNebiker S., Lack N., Deuber M.; Institute of Geomatics Engineering, FHNW University of Applied Sciences and Arts Northwestern Switzerland, Gr\u00fcndenstrasse 40, 4132 Muttenz, CH<\/p>\n<p>&nbsp;  more research <a href=\"http:\/\/ct.light42.com\/ECN\/research_misc\" target=\"_blank\" rel=\"noopener\">-LINK-<\/a><\/p>\n<p>A Few Definitions:<br \/>\n<strong>GSD<\/strong> -- ground sampling distance; real distance between pixel centers in imagery<br \/>\n<strong>DSM<\/strong>  -- Digital Surface Model; similar to Digital Elevation Model<br \/>\n&nbsp;&nbsp; eDSM -- extracted Digital Surface Model; post-process result, varying definitions<br \/>\n&nbsp;&nbsp; nDSM -- normalized Digital Surface Model; varying definitions<br \/>\n<strong>FBM<\/strong> -- Feature-based mapping; store only features, not empty spaces<br \/>\n<strong>SGM<\/strong> -- Semi-global matching; computationally efficient pixel analysis<br \/>\nmorphological filtering -- computationally efficient edge methods<br \/>\nautomated shadow detection -- in OBIA<br \/>\nHough Transform <a href=\"https:\/\/en.wikipedia.org\/wiki\/Hough_transform\" target=\"_blank\" rel=\"noopener\">-LINK-<\/a><br \/>\n&nbsp;<br \/>\nRepresentative Public Test Process:<br \/>\n&nbsp;&nbsp;ISPRS Commission III, WG III\/4<br \/>\n&nbsp;&nbsp;ISPRS Test Project on Urban Classification and 3D Building Reconstruction<\/p>\n<p>&nbsp;<\/p>\n<hr \/>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/scikit-image-logo.png\" alt=\"\" width=\"284\" height=\"70\" 
class=\"alignright wp-image-3238\" srcset=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/scikit-image-logo.png 568w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/scikit-image-logo-300x74.png 300w\" sizes=\"(max-width: 284px) 100vw, 284px\" \/><em>\"... you dont want to be fooling with a lot of parameters.. you want paramater-free functions whenever you can.. thats one thing I think is exciting about neural networks (and the like), is that you just throw some images in there, and out come some magic answers. I think we will be seeing more of that.\"<\/em><br \/>\nscikit-image: Image Analysis in Python \/ Intermediate<br \/>\nSciPy 2016 Tutorial *<strong> Stefan van der Walt<\/strong><\/p>\n<p>&nbsp;<br \/>\n<strong>Machine Learning Pipeline Components<\/strong><br \/>\n<div id=\"attachment_3256\" style=\"width: 232px\" class=\"wp-caption alignright\"><a href=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/ma_training_bldgs_Inglewood0.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-3256\" src=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/ma_training_bldgs_Inglewood0-222x300.png\" alt=\"\" width=\"222\" height=\"300\" class=\"size-medium wp-image-3256\" srcset=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/ma_training_bldgs_Inglewood0-222x300.png 222w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/ma_training_bldgs_Inglewood0.png 476w\" sizes=\"(max-width: 222px) 100vw, 222px\" \/><\/a><p id=\"caption-attachment-3256\" class=\"wp-caption-text\">Training buildings available in Inglewood, Calif.<\/p><\/div><em>Introduction:<\/em> Source imagery is fed to a segmentation engine using parameter combinations of threshold (t), shape count (s) and compactness\/smoothness (c). The results are saved. 
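<\/p>
<p>The sweep described above can be sketched as a small driver loop. This is a minimal sketch, not the actual harness: the grid values below are illustrative, and <code>run_names<\/code> only reproduces the naming pattern visible in the result tables (e.g. <code>evaltest_evaltest1_tc_70_03_07<\/code>); the real work is done by <code>segment-dir.py<\/code> and the other pipeline tools.<\/p>

```python
from itertools import product

def run_names(job, project, modes, thresholds, shapes, compacts):
    """Enumerate (mode, t, s, c) combinations and name each run the way
    the result tables appear to be named: job_project_mode_t_s_c,
    with s and c written as zero-padded tenths (0.3 -> "03")."""
    def tenths(x):
        return f"{int(round(x * 10)):02d}"
    for mode, t, s, c in product(modes, thresholds, shapes, compacts):
        yield f"{job}_{project}_{mode}_{t}_{tenths(s)}_{tenths(c)}"

# illustrative grid only -- the real evaltest grid is larger (675 runs)
runs = list(run_names("evaltest", "evaltest1",
                      modes=["tc", "fc", "ir"],
                      thresholds=[30, 50, 70],
                      shapes=[0.1, 0.3, 0.5],
                      compacts=[0.3, 0.5, 0.7]))
```

<p>Each generated name identifies one saved set of segmentation polygons, so the downstream relevance and training steps can filter by the same (t, s, c, mode) tuple.<\/p>
<p>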
For every chosen entry in a table of known-good training polygons, intersection is calculated and statistics are stored.  Search is performed to find optimal \"goodness of fit\" by some criteria. The resulting parameters are then propagated to subsequent segmentation of new imagery; those results are stored and searched for likely matches.<\/p>\n<p>Known-good Building Polygons all have the following attributes available:<\/p>\n<pre>\r\nma_buildings=# \d train_bldgs_08mar17\r\n                 Table \"sd_data.train_bldgs_08mar17\"\r\n          Column           |            Type             | Modifiers \r\n---------------------------+-----------------------------+-----------\r\n gid                       | integer                     | not null\r\n building_type_id          | integer                     | \r\n building_type_name        | text                        | \r\n situscity                 | text                        | \r\n shp_area_m                | integer                     | \r\n geom                      | geometry(MultiPolygon,4326) | \r\n naip_meta_1               | integer                     | \r\n naip_meta_2               | text                        | \r\n urban_ldc                 | integer                     | \r\n compact_ldc               | integer                     | \r\n standard_ldc              | integer                     | \r\n intersection_density_sqmi | double precision            | \r\n acres_grid_gf             | double precision            | \r\n acres_grid_con            | double precision            | \r\n use_du_dens               | double precision            | \r\n<\/pre>\n<p>&nbsp;<br \/>\nTheory of <strong>Determination of Relevance<\/strong> -- Clinton_Scarborough_2010_PERS<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/CLINTON_relevance_excerpt.png\" alt=\"\" width=\"475\" height=\"520\" class=\" wp-image-3214\" 
\/><\/p>\n<p><em>from<\/em> <code>Accuracy Assessment Measures for Object-based Image Segmentation Goodness: Clinton, et al<\/code><\/p>\n<p>&nbsp;<br \/>\nPipeline Tools to date:<br \/>\n&nbsp;<br \/>\n<img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/st_overlaps03.png\" alt=\"\" width=\"200\" height=\"200\" class=\"alignright wp-image-3210\" srcset=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/st_overlaps03.png 210w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/st_overlaps03-150x150.png 150w\" sizes=\"(max-width: 200px) 100vw, 200px\" \/><\/p>\n<pre>\r\nprep-relevance.py  &lt;options&gt;\r\n  create a table associated with a subset of seg_polys and assign\r\n  to a class based on mutual percent overlap with a truth polygon\r\n \r\n  options:\r\n    -h|--help\r\n    -i|--info  - report available jobid, project, t, s, c parameters and exit\r\n    -b|--table tablename - table to create with relevance and class code (required)\r\n    -j|--jobid name          -+\r\n    -p|--project name         |\r\n    -m|--mode ir|tc|fc|4b     |- filters for selecting specific records\r\n    -t|--threshold n.n        |\r\n    -s|--shape n.n            |\r\n    -c|--compact n.n         -+\r\n    -v|--verbose\r\n\r\n<\/pre>\n<p>Segmentation Trials <a href=\"http:\/\/ct.light42.com\/osmb\/doc\/lineup\/\" target=\"_blank\" rel=\"noopener\">-LINEUP-<\/a><br \/>\n&nbsp;<\/p>\n<pre>\r\ntrain-segments.py\r\nUsage: train-segments.py &lt;options&gt; trained-object.pkl.gz\r\n    -i|--info  - report available jobid, project, t, s, c parameters\r\n    -b|--table tablename - table with relevance and class code\r\n    -j|--jobid name    -+\r\n    -p|--project name   |\r\n    -t|--threshold n.n  |- filter dataset to be extracted\r\n    -s|--shape n.n      |\r\n    -c|--compact n.n   -+\r\n    -r|--relevance n.n - min relevance value to be part of class, default: 0.5\r\n    
-v|--verbose\r\n    -x|--test - Use 50% of data to train and 50% to predict and report\r\n    -a|--algorithm - KNN - KNearestNeighbors|\r\n                     GBT - Gradient Boost Tree, default: GBT\r\n\r\n   conn = psycopg2.connect(\"dbname=<strong>ma_buildings<\/strong>\")\r\n\r\npredict-segments &lt;options&gt; trained-object.pkl.gz\r\n    -i|--info  - report available jobid, project, t, s, c parameters\r\n    -b|--table tablename - table to create with relevance and class code\r\n    -j|--jobid name    -+\r\n    -p|--project name   |\r\n    -t|--threshold n.n  |- filter dataset to be extracted\r\n    -s|--shape n.n      |\r\n    -c|--compact n.n   -+\r\n    -x|--test - do the prediction but don't write to the database\r\n\r\n\r\neval-stats.py\r\n    scan intermediate relevance tables and compute descriptive statistics\r\n\r\neval-segmentation.py\r\n  evaluate multiple t, s, c parameters of the segmentation process\r\n  and try to pick the one that gives the best results\r\n\r\n  for a given training set:\r\n    * take 40% of the polygons and use them for training\r\n    * take 40% of the polygons and use them for evaluation\r\n    * save 20% of the polygons as a hold-back set for final evaluation\r\n\r\n  there are currently two processes planned to do this:\r\n  1. a short path:\r\n     compute the relevance for all the polygons\r\n     and try to increase the average mutual coverage\r\n     and decrease the stddev on the average mutual coverage\r\n     take the best 2-3 sets of parameters and run them through 2.\r\n\r\n  2. 
a long path: (NOT IMPLEMENTED YET)\r\n     compute the relevance for all the polygons\r\n     create a trained classifier using the training polygons\r\n     run the test polygons through the trained classifier\r\n     evaluate the goodness of the fit\r\n\r\n  In either case, save the parameters that gave the best fit.\r\n\r\n  Relevance tables will be generated based on &lt;jobid&gt; &lt;project&gt; t s c\r\n\r\n-------------------------------------------------------\r\nload-seg-data.py [-c] jobid dir\r\n  -c    - optionally drop and re-create tables\r\n  jobid - string to uniquely identify this data\r\n  dir   - segmentizer output directory to walk and load\r\n\r\nFor a given input file like project\/tcase\/test-fc.vrt\r\nthe segmentizer generates files in path project\/tcase\/:\r\n\r\ntest-fc.vrt_30_05_03.tif\r\ntest-fc.vrt_30_05_03.tif_line.tif\r\ntest-fc.vrt_30_05_03.tif_rgb.tif\r\ntest-fc.vrt_30_05_03.tif_stats.csv\r\ntest-fc.vrt_30_05_03.tif.shp\r\n\r\nWe can generate files given this matrix:\r\n\r\n  mode |  t  |  s  |  c  |  z  |\r\n  ------------------------------\r\n   fc  | 30  | 05  | 03  | 19  |\r\n   ir  | 50  | ... | 07  | ... |\r\n   tc  | ... | ... | ... | ... |\r\n  ------------------------------\r\n\r\nWe can generate any number of values for t, s, c, and z.\r\nSegmented results are stored in table seg_polys:\r\n\r\ncreate table sd_data.seg_polys (\r\n  gid serial not null primary key,\r\n  project text,               -- from file path\r\n  tcase text,                 -- from file path\r\n  ...\r\n\r\n--\r\nsegment-dir options dir\r\n    Call the segmentation engine for each file in a directory, with the given settings\r\n    Produce an HTML overview for human interpretation of results\r\n\r\n      [-t|--threshold T]   - T1[,T2,...] thresholds, default: 60\r\n      [-s|--shape S]       - S1[,S2,...] shape rate, default: 0.9\r\n      [-c|--compact C]     - C1[,C2,...] 
compact\/smoothness, default: 0.5\r\n      [-v]                 - be verbose\r\n      [--nostats]          - don't generate stats.csv file\r\n      [--nolineup]         - don't generate the lineup image\r\n\r\n\r\n--\r\nThe following database table enables reference of any dataset from a mapfile:\r\n\r\ncreate table sd_data.seg_tileindex (\r\n  gid serial not null primary key,\r\n  project text,               -- from file path\r\n  tcase text,                 -- from file path\r\n  ...\r\n\r\n<\/pre>\n<p>&nbsp;<br \/>\nOther Imaging Filters Dept.<br \/>\n &nbsp;A <a href=\"https:\/\/en.wikipedia.org\/wiki\/Sobel_operator\" target=\"_blank\" rel=\"noopener\">Sobel Filter<\/a> from scikit-image may be worth adding to the chain<br \/>\nThe filter detects edges well (though the edges are not squared off, and their weights vary).<br \/>\nA target area could be passed through the filter and added as a segmentation clue.<\/p>\n<p>&nbsp;<\/p>\n<p>08 Mar 17<br \/>\nRecognition Trials -- WA18<br \/>\n========================================================<\/p>\n<p><strong>Project Big Picture<\/strong><br \/>\n&nbsp;  \"tripartite solution\" -- Authoritative Sources of 2D building footprints, with varying attributes (<code>dbname=auth_buildings<\/code>), updates and ingestion are <em>semi-manual<\/em>; OpenStreetMap buildings with minimal attributes (<code>dbname=osm_buildings<\/code>), updates are <em>automated<\/em>; Machine-Learning generated building locations (<code>dbname=ma_buildings<\/code>), updates are machine-generated with supplementary attributes.<\/p>\n<p><strong>ML Theory<\/strong><br \/>\n&nbsp; Unsupervised Classification with pixel-based analysis (see below)<br \/>\n&nbsp; Supervised Classification with Object-based analysis (what we are using, newer)<\/p>\n<p><strong>Project Details<\/strong><br \/>\n&nbsp; Database <code>ma_buildings<\/code> schemas now include: building_types, a classification system for buildings in all use_codes; US Census TIGER, road network, block 
groups, county subportions; Grid 150m Los Angeles County, vast, high-resolution landuse and urban metrics data; LARIAC4 Buildings 2014, every building in LA from SCAG; sd_data training_buildings, a table-driven source of ML training targets (see below), picked from 249,927 classified geometry records of commercial and MF5+ buildings in LA County.<\/p>\n<p>Building a Training Set:  filter a set of 2D building polygons of interest from LA County (<code>table=sd_data.train_bldgs<\/code>); run a segmentation on remote sensing data layers NAIP 2016, store the results (<code>table=seg_polys<\/code>).<\/p>\n<p>Recognition: Segment a target search area using remote sensing layers NAIP 2016; use ML engine and training product to recognize likely matches; characterize the results.<\/p>\n<p><strong>Documentation &amp;  Handoff<\/strong><br \/>\n&nbsp; <em>discussion<\/em><\/p>\n<p>&nbsp;<\/p>\n<hr \/>\n<p>* Training Set, continued:<\/p>\n<p>- Garlynn's crosswalk JOIN table gives a LOT of category information on the buildings themselves. 
In effect, it takes 300,000 or so non-residential buildings and puts a very fine-grained classification on them.<\/p>\n<pre>\r\n\"Each and Every Building in LA, provided by the County\"\r\nma_buildings  la14.lariac_buildings_2014\r\n----------------------------------------\r\ncode, bld_id, height, elev,\r\nlariac_buildings_2014_area, source, date_, ain, shape_leng, status, old_bld_id,\r\n ...\r\n\r\n\"Building Categorization from Vision_California release\"\r\nbuilding_types\r\n--------------------------------------------------------\r\n  building_type_category text,\r\n  building_type_id integer NOT NULL,\r\n  building_type_name text,\r\n  building_height_floors integer,\r\n  total_far integer\r\n  ...\r\n\r\nUseCode_2_crosswalk_UF_Building_Types2.csv\r\n-------------------------------------------\r\nusecode_2,usetype,usedescription,bt_id,mf5_plus_flag,include_flag\r\n10,Commercial,Commercial,41,,1\r\n11,Commercial,Stores,41,,1\r\n12,Commercial,Store Combination,39,,1\r\n13,Commercial,Department Stores,39,,1\r\n14,Commercial,Supermarkets,41,,1\r\n15,Commercial,\"Shopping Centers (Neighborhood, community)\",41,,\r\n16,Commercial,Shopping Centers (Regional),39,,1\r\n17,Commercial,Office Buildings,32,,1\r\n18,Commercial,Hotel & Motels,38,,1\r\n19,Commercial,Professional Buildings,32,,1\r\n....\r\n<\/pre>\n<p>A first attempt at a JOIN with a VIEW did not work for QGis and had permissions problems within the database. A second attempt simply defines a TABLE in the <code>sd_data<\/code> schema. 
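<\/p>
<p>A hedged sketch of that second attempt, building the SQL as a string and executing it with <code>psycopg2<\/code> (which the pipeline already uses). Only the table and column names come from the excerpts above; the join keys and the crosswalk table name <code>sd_data.usecode_crosswalk<\/code> are assumptions, and the geometry and remaining LARIAC columns are omitted for brevity.<\/p>

```python
# Materialize the training-set join as a plain TABLE in the sd_data
# schema, rather than a VIEW.  NOTE: the ON clauses are assumed join
# keys, and sd_data.usecode_crosswalk is a hypothetical name for the
# loaded UseCode_2_crosswalk_UF_Building_Types2.csv.
TRAIN_TABLE_SQL = """
CREATE TABLE sd_data.train_bldgs AS
SELECT b.bld_id, b.height,
       bt.building_type_id, bt.building_type_name
  FROM la14.lariac_buildings_2014 AS b
  JOIN sd_data.usecode_crosswalk  AS xw ON xw.usecode_2 = b.code
  JOIN building_types             AS bt ON bt.building_type_id = xw.bt_id
 WHERE xw.include_flag = 1;
"""

def create_train_table(conn):
    """Run the CREATE TABLE ... AS statement on an open connection,
    e.g. conn = psycopg2.connect("dbname=ma_buildings")."""
    with conn.cursor() as cur:
        cur.execute(TRAIN_TABLE_SQL)
    conn.commit()
```

<p>Because the result is a real table rather than a VIEW, QGis can read it directly and the per-view permission problems do not apply.<\/p>
<p>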
<strong>Success<\/strong> !<\/p>\n<div id=\"attachment_3129\" style=\"width: 310px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/qgisViz_training_set_first_pass.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-3129\" src=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/qgisViz_training_set_first_pass-300x170.png\" alt=\"\" width=\"300\" height=\"170\" class=\"size-medium wp-image-3129\" srcset=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/qgisViz_training_set_first_pass-300x170.png 300w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/qgisViz_training_set_first_pass-768x436.png 768w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/qgisViz_training_set_first_pass-1024x582.png 1024w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/qgisViz_training_set_first_pass.png 1146w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><p id=\"caption-attachment-3129\" class=\"wp-caption-text\">First pass at a finely categorized non-residential building set in LA County as ML Training Set candidates.<\/p><\/div>\n<p><strong>Misc URLs<\/strong><br \/>\n&nbsp; <a href=\"http:\/\/ct.light42.com\/osmb\/work\/lineup.html\" target=\"_blank\" rel=\"noopener\">http:\/\/ct.light42.com\/osmb\/work\/lineup.html<\/a><br \/>\n&nbsp; <a href=\"http:\/\/ct.light42.com\/osmb\/la14-50-60-09-05\/lineup.html\" target=\"_blank\" rel=\"noopener\">http:\/\/ct.light42.com\/osmb\/la14-50-60-09-05\/lineup.html<\/a><br \/>\n&nbsp; <a href=\"http:\/\/ct.light42.com\/osmb\/la14-50-60-09-01_05_07\/lineup.html\" target=\"_blank\" rel=\"noopener\">http:\/\/ct.light42.com\/osmb\/la14-50-60-09-01_05_07\/lineup.html<\/a><\/p>\n<p>.. 
and, Machine Learning <em>problematic data <\/em><\/p>\n<p>&nbsp; <a href=\"http:\/\/ct.light42.com\/osmb\/?zoom=18&#038;lat=33.92182&#038;lon=-118.35224&#038;layers=000000BFFTFFTFF\" target=\"_blank\" rel=\"noopener\">http:\/\/ct.light42.com\/osmb\/?zoom=18&lat=33.92182&lon=-118.35224&layers=000000BFFTFFTFF<\/a><\/p>\n<p><strong>Characterize Training Set Buildings<\/strong><br \/>\n  - residential multifamily five units and up - are included<br \/>\n  - how many buildings ?  what sizes ?  no filter on \"small bldgs\" yet<\/p>\n<p><strong> Attributes and Sources For Training Set<\/strong> (gw):<br \/>\n- Should record<strong> imagery attributes <\/strong>(direction of sun angle, year, etc). This will come from the NAIP metadata. <strong><em>WHY<\/em><\/strong>: It obviously matters from which direction the sun is shining, as shadows will be found on the opposite sides of objects from the sun. Time of year will tell about the presence or absence of leaves on deciduous vegetation. The year allows for time-series comparisons between different vintages of the same dataset.<br \/>\n- <strong>City<\/strong> (name) \u2014 this can come from a number of sources, including the LARIAC buildings shapes themselves (SitusCity). <strong><em>WHY<\/em><\/strong>: Different cities may have different attributes, citywide, in a meaningful manner.<br \/>\n- The <strong>area of the polygon<\/strong>, from LARIAC buildings training set (Shape_Area). <strong><em>WHY<\/em><\/strong>: We want to be able to filter by this attribute for only large buildings.<br \/>\n-<strong> urban_ldc<\/strong> from the UF base grid. <strong><em>WHY<\/em><\/strong>: These are super-urban areas, more than 150 walkable intersections\/sqmi, most buildings here are skyscrapers.<br \/>\n- <strong>compact_ldc <\/strong>from the UF base grid. 
<strong><em>WHY<\/em><\/strong>: These are compact development types, more than 150 walkable intersections\/sqmi, with a walkable street grid.<br \/>\n- <strong>standard_ldc<\/strong> from the UF base grid. <strong><em>WHY<\/em><\/strong>: These are suburban development types, less than 150 walkable intersections\/sqmi,<br \/>\n- <strong>intersection_density_sqmi<\/strong> from the UF base grid. <strong><em>WHY<\/em><\/strong>: Intersection density is a critical variable in urbanism. It could be that the machine finds a strong correlation between it and other things. Let\u2019s find out.<br \/>\n- <strong>acres_grid_gf<\/strong> from the UF base grid. <strong><em>WHY<\/em><\/strong>: Greenfield acres tell us how many non-urbanized acres are within the grid. If this is above some threshold, say 85% of the grid cell, we can probably skip that grid cell for ML analysis.<br \/>\n- <strong>acres_grid_con<\/strong> from the UF base grid. <strong><em>WHY<\/em><\/strong>: Constrained acres also tell us how many non-urbanized acres are within the grid. If this is above some threshold, say 85% of the grid cell, we can probably skip that grid cell for ML analysis.<br \/>\n- <strong>use_du_dens<\/strong> from the UF base grid. <strong><em>WHY<\/em><\/strong>: We may be able to do something clever, like filter out the residential records from the buildings dataset by only those grid cells from the UF base load with a DU use density above some threshold (I\u2019m thinking 10-20 or so), which should get rid of some of the noise in the data.<br \/>\n- Building Size -- Filter all records by our threshold size: 2,000? 10,000? ...of 2D footprint. Maybe don\u2019t try this during the first pass? Keep it in your back pocket for later? 
Not sure.<\/p>\n<p>&nbsp;<\/p>\n<hr \/>\n<p><strong>Server Layout<\/strong><\/p>\n<pre>\r\n \/var\/local\/osmb_ab\/\r\n    bin        # all automation, plus small utils\r\n        add-auth-building-layer  do-backup            json-to-postgres    osm_building_stats\r\n        create-auth-buildings    extract-image.py     load-osm-buildings  reload-osm-buildings\r\n        createdb-dee1            fetch-osm-ca-latest  load-roads          wget-roads\r\n \r\n    bis2       # the Image Segmentation engine\r\n    cgi-bin    # mapserver(tm) delivers the view of data\r\n    data       # links to large data stores, raw inputs, csv files\r\n        auth_buildings  dbdump  misc  naip  sd_data  tiger2016-roads\r\n    html       # mapserver site \r\n    maps       # mapserver layer definitions\r\n    src        # misc code\r\n        code_dbb  ECN_osm_import  naip_fetch  naip-fetch2  naip-process  osm-buildings  sql\r\n\r\n<\/pre>\n<p>&nbsp;<\/p>\n<hr \/>\n<p>California <strong>Department of Conservation<\/strong><br \/>\nnew maps site:  <a href=\"https:\/\/maps.conservation.ca.gov\/\" target=\"_blank\" rel=\"noopener\">https:\/\/maps.conservation.ca.gov\/<\/a><br \/>\n\"... For further information or suggestions regarding the data on this site, please contact the DOGGR Technical Services Unit at 801 K Street, MS 20-20, Sacramento, CA 95814 or email DOGGRWebmaster@conservation.ca.gov.  \"<\/p>\n<p><em>also<\/em>:<br \/>\n  http:\/\/statewidedatabase.org\/   &lt;- referred from a wikipedia page, no other info<\/p>\n<p>&nbsp;<\/p>\n<hr \/>\n<p><strong>Some Breadth on (pre)Machine Learning Topics<\/strong><br \/>\n (<em>see gw\/rdir\/snippets_misc for more like this<\/em>)<br \/>\n&nbsp;<br \/>\nPixel Matching -- Notes on Maximum Likelihood Classification Analysis<br \/>\n  \"... 
is a statistical decision criterion to assist in the classification of overlapping signatures; pixels are assigned to the class of highest probability.\" <a href=\"http:\/\/www.sc.chula.ac.th\/courseware\/2309507\/Lecture\/remote18.htm\" target=\"_blank\" rel=\"noopener\">-LINK-<\/a><a href=\"http:\/\/ct.light42.com\/ECN\/research_misc\/ECHO.pdf\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/Kettig_Landgrebe_1976_core.png\" alt=\"\" width=\"378\" height=\"102\" class=\"alignright size-full wp-image-3145\" srcset=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/Kettig_Landgrebe_1976_core.png 378w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/Kettig_Landgrebe_1976_core-300x81.png 300w\" sizes=\"(max-width: 378px) 100vw, 378px\" \/><\/a><br \/>\n  \"one approach is to use Bayes' Rule and maximize the agreement between a model and the observed data\"<\/p>\n<p>&nbsp; Chapter 5 - Model Estimation, page 116<br \/>\n<code> The Nature of Mathematical Modeling; Gershenfeld, Neil, Cambridge University Press 1999<\/code><\/p>\n<p>  Maximum Likelihood Estimation <a href=\"https:\/\/en.wikipedia.org\/wiki\/Maximum_likelihood_estimation\" target=\"_blank\" rel=\"noopener\">-WIKIPEDIA-<\/a><br \/>\n  Bayes' Rule <a href=\"https:\/\/en.wikipedia.org\/wiki\/Bayes%27_rule\" target=\"_blank\" rel=\"noopener\">-WIKIPEDIA-<\/a><\/p>\n<p>&nbsp;<\/p>\n<p>01 Mar 17<br \/>\nRecognition Trials -- WA18<br \/>\n========================================================<\/p>\n<p>* prep for segmentation<\/p>\n<pre>\r\n$ extract-image.py -o  \/sand480\/extractions_work\/t2.tif -z 2 40.87427 -124.08024\r\n\/sand480\/extractions_work\/t2.tif-tc.tif\r\n\/sand480\/extractions_work\/t2.tif-ir.tif\r\n<\/pre>\n<p>&nbsp; - web-visible results <a href=\"http:\/\/ct.light42.com\/ECN\/extractions_work\" target=\"_blank\" 
rel=\"noopener\">-LINK-<\/a><\/p>\n<p>&nbsp; <a href=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/extraction_test2.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/extraction_test2-300x265.png\" alt=\"\" width=\"300\" height=\"265\" class=\"alignleft size-medium wp-image-3081\" srcset=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/extraction_test2-300x265.png 300w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/extraction_test2-768x679.png 768w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/extraction_test2.png 988w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a>&nbsp; <a href=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/extraction_test2_qgis.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/extraction_test2_qgis-300x300.png\" alt=\"\" width=\"300\" height=\"300\" class=\"alignnone size-medium wp-image-3082\" srcset=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/extraction_test2_qgis-300x300.png 300w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/extraction_test2_qgis-150x150.png 150w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/extraction_test2_qgis-768x770.png 768w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/extraction_test2_qgis.png 829w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p><a href=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/t2_IR_qgis.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/t2_IR_qgis-300x298.png\" alt=\"\" width=\"300\" height=\"298\" class=\"alignnone size-medium wp-image-3084\" 
srcset=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/t2_IR_qgis-300x298.png 300w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/t2_IR_qgis-150x150.png 150w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/t2_IR_qgis.png 742w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<hr \/>\n<p>&nbsp;<\/p>\n<p><em>from<\/em> <strong>Multiresolution Segmentation:<\/strong> an optimization approach for high quality multi-scale image segmentation -- Martin BAATZ und Arno SCH\u00c4PE <a href=\"http:\/\/ct.light42.com\/ECN\/research_misc\/405_baatz_fp_12.pdf\" target=\"_blank\" rel=\"noopener\">-LINK-<\/a><br \/>\nDesign Goals:  <em>(for a generalized analysis method ed.)<\/em><\/p>\n<blockquote><p>\n<strong>Multiresolution<\/strong>: Objects of interest  typically  appear  on  different  scales  in  an  image simultaneously. The extraction of meaningful image objects needs to take into account the scale of the problem to be solved. Therefore the scale of resulting image objects should be free adaptable to fit to the scale of task.<\/p>\n<p><strong>Similar Resolution<\/strong>: Almost all attributes of image objects \u2013 color, texture or form -<br \/>\nare more or less scale-dependent. Only structures of similar size respectively scale<br \/>\nare of comparable quality. Hence the size of all resulting image objects should be of comparable scale.<\/p>\n<p><strong>Reproducibility<\/strong>: Segmentation results should be reproducible<\/p>\n<p><strong>Universality<\/strong>: Applicable to arbitrary types of data and problems<\/p>\n<p><strong>Speed<\/strong>: Reasonable performance even on large image data sets<\/p>\n<\/blockquote>\n<p><em>commentary:<\/em>  Let us focus here on the specific segmentation\/extraction parameters (the \"scale of the problem\" is ambiguous in English, as it could refer to the total ambitions of the result set size). 
That is, the content of a scene, measured by the number of objects to be recognized, and the information in the scene, measured by the detail available in the recorded bits. Both the number of targets in a scene and the detail available on each target vary directly with scale -- aka zoom level.<\/p>\n<p>The NAIP CA 2016 imagery has both IR and RGB layers, and with both, any reasonable desired \"zoom level\" is available, greater or lesser than the native resolution. Some questions here include:<\/p>\n<p>* What combination of zoom level + segmentation + analysis, within reasonable compute bounds given existing software and hardware, will yield accurate and consistent recognition for the size of object we are searching for?<\/p>\n<p>* What segmentation detail will be most useful for identifying likely targets?<\/p>\n<p>* Can we combine segmentation results from RGB and IR\/Gray, in some way with our setup, to increase recognition? Do we need two training sets for ML, or can the two layers be combined?<\/p>\n<p>* If this search is focused on \"big buildings\", can we completely disregard \"small building\" footprints, such as a typical residential single-family home? Or do we need the similarly shaped, but smaller-footprint, buildings in the result? Generally speaking, one building looks more similar to other buildings than to other kinds of features in the scene... except maybe parking lots!<\/p>\n<p>* Can negative examples be used in a training set? In other words, a segmented parking lot is \"not a building\".<\/p>\n<p>-&nbsp;<\/p>\n<p>We can think in terms of \"how big is a typical target, at a given scale\" and \"how many segments typically compose one target\". Some explanation of these topics follows.<\/p>\n<p>Consider a test area that is only 20% larger than a typical building of a certain class of interest. 
There is a huge amount of detail to match on, but how many image analysis operations would it take to accomplish a survey?<\/p>\n<p>A counter-example is analysis of an image at a zoom level such that there are one hundred targets of interest. Each target is no longer a detailed polygon, but instead could be identified by the pattern of segments in the surrounding group, e.g. pixels for the building, its shadow, the road leading to it, cleared land around it, etc.<\/p>\n<p>Can information from analysis at different scales be combined in some rigorous way? Does this overlap with combining non-segmentation data with segmentation data -- for example, the presence of OSM polygons in a search area?<\/p>\n<p>&nbsp;<\/p>\n<hr \/>\n<p><a href=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/test2-aaa-tc.tif_lineup-x175.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/test2-aaa-tc.tif_lineup-x175-300x300.jpg\" alt=\"\" width=\"300\" height=\"300\" class=\"aligncenter size-medium wp-image-3102\" srcset=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/test2-aaa-tc.tif_lineup-x175-300x300.jpg 300w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/test2-aaa-tc.tif_lineup-x175-150x150.jpg 150w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/test2-aaa-tc.tif_lineup-x175-768x766.jpg 768w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/test2-aaa-tc.tif_lineup-x175.jpg 899w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<hr \/>\n<p>In case of boredom, a boot disk dropping off of its RAID is a remedy...<\/p>\n<pre>\r\n$ cat \/proc\/mdstat\r\nPersonalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] \r\n[raid4] [raid10]\r\nmd0 : active raid1 sdh[2] sda1[0]\r\n244007744 blocks super 1.2 [2\/1] [U_]\r\n[====>................] 
recovery = 23.5% (57386432\/244007744) \r\nfinish=53.0min speed=58672K\/sec\r\n\r\nunused devices: &lt;none&gt;\r\n<\/pre>\n<p><strong>Mask Files?<\/strong><br \/>\n- It seems there may be something missing in the extraction chain. There is an option in <code>gdal_translate<\/code> to \"generate internal masks\", but the way the processing chain stands right now, the dot-msk files were left behind as a side product. This may need to change.<\/p>\n<p><strong>Map URLs as the Basis of Linking<\/strong><br \/>\n  The short form of the map URLs has zoom information and a center LatLon. This is useful at all steps in the analysis, from human review to machine internals. Adapt the <code>extract-image.py<\/code> tool to use those zoom levels when reading NAIP imagery.<\/p>\n<p><em>Search Notes:<\/em><br \/>\n* We think that \"distance to a road\" should be included in a preliminary search as a weight,<br \/>\nand that making an effort to eliminate whole areas from the search is worthwhile.<\/p>\n<p>24 Feb 17<br \/>\nRecognition Trials -- California AB 802 Support<br \/>\n========================================================<\/p>\n<p><em>James S. writes:<\/em><\/p>\n<pre>\r\nThere is probably no documentation you haven't seen. The Python API source code is documentation\r\n of sorts. There is also a brief FAQ on imageseg.com\r\n\r\nHonestly, different customers will either use or not use different parts for their specific\r\nworkflow. We lay it all out and let the integration happen at the customer end. As such there \r\nare different pieces and you are welcome to use, disregard, or innovate on top of them.\r\n\r\nYou're certainly putting more software engineering effort into it than most.\r\n\r\nThe rank.py is optional for doing the shape comparison to determine the \"goodness\" of the\r\nsegmentation match to a ground truth shapefile. 
That embodies the gist of the Clinton paper.\r\nAnother potential post processing is supervised classification on the image stats.\r\n\r\nNote automate.py and workflow.py modules that wrap the suggested workflow.\r\n\r\nYou are certainly welcome to build your own front end in Jupyter if that works for you. That\r\nsounds like a good iterative\/exploratory take on it. Some other customer in Africa just wanted \r\na GUI with four boxes. The land cover mapping customer in Mexico just did the whole country \r\nat 10,0.5,0.5 and moved on. As I say, fit for purpose.\r\n<\/pre>\n<p>&nbsp;<br \/>\n<strong>BIS Gui<\/strong><\/p>\n<ul>\n<li>Threshold | 1-255 ? | close to 0 is \"many colors in final\"; 255 is \"few colors in final\"<\/li>\n<li>Shape Rate | [0.0,1.0] | 0 -> \"many shapes\"; \"few shapes\" -> 1.0<\/li>\n<li>Compact\/Smoothness | [0.0,1.0] | 0 -> \"compact\"; \"spread\" -> 1.0<\/li>\n<\/ul>\n<p><em>Base Example<\/em> <a href=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/Arcata_Ex0.png\" target=\"_blank\" rel=\"noopener\">-IMG-<\/a> <code> Arcata_Ex0.png<\/code><br \/>\n&nbsp; A harsh first test; much detail, many similar objects...<br \/>\n&nbsp; run in grayscale (IR) only<br \/>\n&nbsp; &nbsp; <em>roughly 70 variations are produced (see exArcata dir)<\/em><\/p>\n<p>Example Vectorization using BIS:<br \/>\n&nbsp; Here is an IR layer image of Arcata, close up and shown at 50% intensity. Using the three inputs to BIS (threshold, shape rate, smoothness), get a vector layer back. Do this twice, varying only the first parameter (threshold). Notice how the same edges are picked in both cases, only more of them with a lower threshold. 
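The (threshold, shape rate, compactness) inputs are swept as a grid of combinations in these trials; a minimal sketch of enumerating such a grid, where the `sweep_tags` helper and its two-digit tag encoding are illustrative assumptions mirroring the `ArcataEx0 50_04_04` run-naming convention:

```python
from itertools import product

def sweep_tags(thresholds, shape_rates, compactness):
    """Enumerate BIS parameter combinations and build run tags
    like '50_04_04' (threshold, shape rate, compactness).
    The tag format is an assumption based on the run names seen here."""
    tags = []
    for t, s, c in product(thresholds, shape_rates, compactness):
        # encode 0.4 -> '04', 50 -> '50'
        tags.append("%02d_%02d_%02d" % (t, round(s * 10), round(c * 10)))
    return tags

# vary only the first parameter, as in the two-run comparison
print(sweep_tags([50, 30], [0.4], [0.4]))  # ['50_04_04', '30_04_04']
```

Varying only the threshold, as in the two-run comparison, yields tags that differ only in the leading field.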
<\/p>\n<p><a href=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/Arcata_IR_close1.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/Arcata_IR_close1-150x150.png\" alt=\"\" width=\"150\" height=\"150\" class=\"alignnone size-thumbnail wp-image-3019\" \/><\/a>&nbsp;<a href=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/Arcata_IR_close_vec.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/Arcata_IR_close_vec-150x150.png\" alt=\"\" width=\"150\" height=\"150\" class=\"alignnone size-thumbnail wp-image-3012\" \/><\/a>&nbsp;&nbsp;&nbsp;<a href=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/Arcata_IR_close_vec2.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/Arcata_IR_close_vec2-150x150.png\" alt=\"\" width=\"150\" height=\"150\" class=\"alignnone size-thumbnail wp-image-3013\" \/><\/a><\/p>\n<ul>\n<li>ArcataEx0<\/li>\n<li>ArcataEx0 50_04_04<\/li>\n<li>ArcataEx0 30_04_04<\/li>\n<\/ul>\n<p>&nbsp;<br \/>\nSome observations: these vector result edges do follow the edge of change in pixel values, exactly. So the odd shapes of shadow and lighting, do carry through to the vector results. There appears to be no way for the vectorizing engine to estimate and resolve implied straight edges through shadows, for example. 
<em>note:<\/em> internally, BIS calls <code>gdal.Polygonize()<\/code> <a href=\"http:\/\/www.gdal.org\/gdal_polygonize.html\" target=\"_blank\" rel=\"noopener\">-LINK-<\/a> to perform the vectorization.<\/p>\n<pre>\r\n-- Some buildings in Chico, at zoom level 14 in Infrared (IR)\r\n &nbsp; \/osmb\/?zoom=14&lat=39.72545&lon=-121.80761&layers=0000BFFFTFFF\r\n-- RGB Color, same scene in 2014\r\n &nbsp; \/osmb\/?zoom=14&lat=39.72545&lon=-121.80761&layers=000B0FFFTFFF\r\n-- osm outlines, more to do !\r\n &nbsp; \/osmb\/?zoom=14&lat=39.72545&lon=-121.80761&layers=B0000FFFTTFF\r\n<\/pre>\n<p>Another form of output from the BIS engine is a dot-csv statistics file: every geometry is summarized, including area in square units, mean luminance per pixel, and other statistical measures. These statistics can be used to calibrate engine response in a feedback loop, within larger frameworks.<\/p>\n<p><em>Questioning Machine Learning:<\/em><br \/>\n &nbsp;&nbsp;<em>What is building and what is not building, in this image?<\/em><\/p>\n<p>One question an ML process can answer is: is this noisy thing similar to that other noisy thing? That is not the same as producing a clean line in vector geometry! We may be asking several questions at once. \"Which pixels in the image are building, and which are not\" is a different question than \"provide a simple geometry representing each building\" -- identifying building versus non-building in an image is different from producing a good representation of each building.<\/p>\n<p>* Developing a training set for supervised ML is work - a training set shows the learning system what a \"right answer\" is. More than one hundred thousand well-chosen examples is not uncommon in a training set. Developing the training set is not to be taken for granted.<\/p>\n<p>* We have targets that do not change in life, but the view of them changes, due to weather, lighting, focus, and image processing. 
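The per-geometry dot-csv statistics can drive a simple candidate filter in such a feedback loop; a minimal sketch, where the column names `id`, `area`, `mean_lum` and the `filter_candidates` helper are hypothetical (check the actual header of the BIS csv output):

```python
import csv
import io

def filter_candidates(csv_text, min_area, max_area):
    """Keep geometry ids whose area falls within a plausible
    building-footprint range; column names are hypothetical."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["id"] for row in reader
            if min_area <= float(row["area"]) <= max_area]

# toy rows: a small shadow-sized polygon and a building-sized one
stats = "id,area,mean_lum\np1,12.5,40.1\np2,850.0,180.3\n"
print(filter_candidates(stats, 100.0, 5000.0))  # ['p2']
```

The same pattern extends to other columns (e.g. a luminance band), letting larger frameworks score a parameter run by how many geometries survive the filter.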
<\/p>\n<pre>\r\nExample BIS Automation Run from the Command Line using IPython\r\n\r\n$ cd \/home\/dbb\/CEC_i7d\/bis-2\/bisapi-hamlin\/bis-workd\r\n$ ipython\r\nPython 2.7.6 (default, Oct 26 2016, 20:30:19) \r\nType \"copyright\", \"credits\" or \"license\" for more information.\r\n\r\nIPython 5.2.2 -- An enhanced Interactive Python.\r\n...\r\nIn [3]: from automate import automate\r\n...\r\nIn [4]: \r\n    ...: files = automate('\/wd4m\/exArcata\/Arcata_Ex0.png', t=[42,52,62,72,82], \r\n    ...:   s=[0.5], c=[0.3,0.7],do_stats=True, do_colorize=False,\r\n    ...:   do_train=False, do_classify=False, do_outline=True, do_lineup=True)\r\n\r\n<\/pre>\n<pre>\r\n## -- BIS virtualenv Fresh Install --\r\nipython \/ jupyter \/ scipy \/ pyside  virtualenv re-install\r\n\r\n    source \/home\/dbb\/CEC_i7d\/bis-2\/bisenv\/bin\/activate bisenv\r\n    pip install ipykernel\r\n    python  -m ipykernel install --user --name bisenv --display-name \"bisenv\"\r\n    pip install shapely fiona\r\n    pip install scikit-image\r\n    pip install numpy rasterio\r\n    pip install pyproj cartopy\r\n    pip install psycopg2\r\n    pip install gdal   < - gives 2.1.3 which appears to work ok with 2.2dev\r\n\r\n\r\n## - BIS working set of packages in virtualenv-\r\n##\r\n(bisenv) dbb@i7d:~\/CEC_i7d\/bis-2\/bisapi-hamlin\/bis-workd$ pip freeze\r\naffine==2.0.0.post1  Cartopy==0.14.2   Fiona==1.7.3    GDAL==2.1.3\r\nipykernel==4.5.2    ipython==5.2.2    ipython-genutils==0.1.0\r\njupyter-client==4.4.0    jupyter-core==4.3.0\r\nmatplotlib==2.0.0    numpy==1.12.0    Pillow==4.0.0    psycopg2==2.6.2    ...\r\npyproj==1.9.5.1    pyshp==1.2.10    PySide==1.2.4    ...\r\nrasterio==0.36.0    scipy==0.18.1    Shapely==1.5.17    ...\r\nsix==1.10.0    subprocess32==3.2.7    tornado==4.4.2    ...\r\n\r\n## Fiona CLI tool - (no PG driver)\r\n$fio env\r\n  ...\r\n  ESRI Shapefile (modes 'r', 'a', 'w')\r\n  GPKG (modes 'r', 'w')\r\n  GPSTrackMaker (modes 'r', 'a', 'w')\r\n  GPX (modes 'r', 'a', 'w')\r\n  
GeoJSON (modes 'r', 'w')\r\n  OpenFileGDB (modes 'r')\r\n  ...\r\n\r\n<\/pre>\n<hr \/>\n<p>i7d Disk Updates:<br \/>\n &nbsp;* naip-fetch2\/process-doqqs.py<br \/>\n &nbsp; &nbsp;- add dir \/sand480\/tmp  <\/p>\n<p>&nbsp;<\/p>\n<p>Google Search \\  filetype:pdf inurl:asprs object<\/p>\n<p>http:\/\/wiki.openstreetmap.org\/wiki\/OpenSolarMap<br \/>\nhttps:\/\/github.com\/opensolarmap\/solml<\/p>\n<hr \/>\n<p><strong> BIS Test 2<\/strong><\/p>\n<p> visually picked an area in IR; important variables: zoom level, color\/gray; composition<\/p>\n<p><code>\/osmb\/?zoom=17&lat=40.8835&lon=-124.08918&layers=0000BTFFFFFF<\/code> <a href=\"http:\/\/ct.light42.com\/\/osmb\/?zoom=17&#038;lat=40.8835&#038;lon=-124.08918&#038;layers=0000BTFFFFFF\" target=\"_blank\" rel=\"noopener\">-LINK-<\/a><\/p>\n<p>  Use the DOQQ labels layer in the map to find the filename of the right image on disk; picked out a subset using <code>rio<\/code> tool<\/p>\n<p><code>rio info final-ir\/m_4012408_sw_10_1_20140607.tif<\/code><\/p>\n<p>  take one-hundredth of a degree in each direction, from some buildings<\/p>\n<p><code><br \/>\n$ rio clip input.tif output.tif --bounds xmin ymin xmax ymax<br \/>\n  --bounds #(-124.087-0.01) (40.878-0.01) (-124.087+0.01)   (40.878+0.01)<br \/>\nrio clip final-ir\/m_4012408_sw_10_1_20140607.tif \/tmp\/work1.tif --bounds -124.097 40.868 -124.077 40.888<\/code><\/p>\n<p>  try expanding the file with gdal to 2x resolution<\/p>\n<p><code>gdal_translate -outsize 200% 0 \/tmp\/work1.tif \/tmp\/work2.tif<br \/>\n   ...<br \/>\n-rw-rw-r-- 1 dbb dbb 24M Feb 20 19:09 work2.tif<br \/>\n-rw-rw-r-- 1 dbb dbb 6.0M Feb 20 19:00 \/tmp\/work1.tif<br \/>\n<\/code><br \/>\n   but, the image result looks exactly the same, just 4x the size on disk.<br \/>\n   onward -- go to BIS work dir<\/p>\n<p><code>dbb@i7d:~\/CEC_i7d\/bis-2\/bisapi-hamlin\/bis-workd$ source \/home\/dbb\/CEC_i7d\/bis-2\/bisenv\/bin\/activate bisenv<br \/>\n(bisenv) dbb@i7d:~\/CEC_i7d\/bis-2\/bisapi-hamlin\/bis-workd$ 
ipython<\/p>\n<p>In [1]: from automate import automate<\/p>\n<p>  files = automate('\/tmp\/work1.tif', t=[42,48,54,60,72], s=[0.1,0.2,0.4,0.6,0.9], c=[0.5], do_stats=True, do_colorize=False, do_train=False, do_classify=False, do_outline=True, do_lineup=True)<\/code><\/p>\n<p>  CRASH in outline burn()<br \/>\n  repeat with do_outline=False<br \/>\n  also note: turn off stats if they are not being used<\/p>\n<p><code>files = automate('\/tmp\/work1.tif', t=[42,48,54,60,72], s=[0.1,0.2,0.4,0.6,0.9], c=[0.5], do_stats=False, do_colorize=False, do_train=False, do_classify=False, do_outline=False, do_lineup=True)<\/code><\/p>\n<p>  <code>automate()<\/code> invokes the engine five times; on each invocation, thresholds are evaluated from one up to the largest threshold in t, and len(t) outputs are emitted. A run in this example allocates 12GB of RAM and takes about 40 seconds -- 5 runs with 5 outputs each, in this example.<\/p>\n<p><em>example output:<\/em><\/p>\n<p>  <a href=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/work5_52_02_05.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/work5_52_02_05-150x150.png\" alt=\"\" width=\"150\" height=\"150\" class=\"alignleft size-thumbnail wp-image-3039\" \/><\/a>&nbsp;<a href=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/work5_orig.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/work5_orig-150x150.png\" alt=\"\" width=\"150\" height=\"150\" class=\"alignnone size-thumbnail wp-image-3040\" \/><\/a>&nbsp;<a href=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/work5_52_02_05_comb.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/work5_52_02_05_comb-150x150.png\" alt=\"\" width=\"150\" height=\"150\" class=\"alignnone size-thumbnail wp-image-3043\" 
\/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>New Data - <strong>NAIP CA 2016 0.6m<\/strong><br \/>\n<em>copying ...<\/em><\/p>\n<pre>$ \/naip_16\/d2\/naip\/ca_60cm_2016\/40124$ ls -lh m_4012408_sw_10_h_20160528.tif\r\n-rwxrwxrwx 2 root root 439M Sep 29 04:22 m_4012408_sw_10_h_20160528.tif\r\n\r\n##-- test area in Arcata, CA\r\nrio info \/naip_16\/d2\/naip\/ca_60cm_2016\/40124\/m_4012408_sw_10_h_20160528.tif\r\n{\"count\": 4, \"crs\": \"EPSG:26910\", \"colorinterp\": [\"red\", \"green\", \"blue\", \"undefined\"], \"interleave\": \"pixel\", \"dtype\": \"uint8\", \"driver\": \"GTiff\", \"transform\": [0.6, 0.0, 405054.0, 0.0, -0.6, 4532580.0], \"lnglat\": [-124.0937491743777, 40.90625190492096], \"height\": 12180, \"width\": 9430, \"shape\": [12180, 9430], \"tiled\": false, \"res\": [0.6, 0.6], \"nodata\": null, \"bounds\": [405054.0, 4525272.0, 410712.0, 4532580.0]}\r\n\r\n<\/pre>\n<p><div id=\"attachment_3061\" style=\"width: 310px\" class=\"wp-caption alignnone\"><a href=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/NAIP_ca_2016_color_arcata4.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-3061\" src=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/NAIP_ca_2016_color_arcata4-300x181.png\" alt=\"\" width=\"300\" height=\"181\" class=\"size-medium wp-image-3061\" srcset=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/NAIP_ca_2016_color_arcata4-300x181.png 300w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/NAIP_ca_2016_color_arcata4-768x464.png 768w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/NAIP_ca_2016_color_arcata4-1024x619.png 1024w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/04\/NAIP_ca_2016_color_arcata4.png 1495w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><p id=\"caption-attachment-3061\" class=\"wp-caption-text\">NAIP California 2016, 0.6 meter per pixel, compressed JPEG YCbCr in 
4326<\/p><\/div><br \/>\n&nbsp;<\/p>\n<p><strong>Prioritizing LA County in NAIP 2016<\/strong><\/p>\n<pre>auth_buildings=# \r\n  update doqq_processing as a \r\n    set priority=6 from tl_2016_us_county b, naip_3_16_1_1_ca c \r\n      where \r\n        b.statefp='06' and b.countyfp='037' and \r\n        st_intersects(b.geom, c.geom) and a.doqqid=c.gid;\r\n<\/pre>\n<p>&nbsp;<\/p>\n<p>Feb 15 <a href=\"http:\/\/blog.light42.com\/wordpress\/?page_id=3322\" target=\"_blank\" rel=\"noopener\">-ARCHIVE-<\/a><\/p>\n<p>Jan 17 <a href=\"http:\/\/blog.light42.com\/wordpress\/?page_id=3114\" target=\"_blank\" rel=\"noopener\">-ARCHIVE-<\/a><\/p>\n<p>2016 ARCHIVE Main <a href=\"http:\/\/blog.light42.com\/wordpress\/?page_id=3118\" target=\"_blank\" rel=\"noopener\">-LINK-<\/a><\/http:><\/richlv><\/brad><\/p>\n","protected":false},"excerpt":{"rendered":"<p>30 Mar 17 Docs and Handoff &#8212; WA18 ======================================================== Prediction Run &#8212; Pass II * rebuilt tiling &#8211; there was a bug in the tiling code that caused gaps, due to a floating point truncation; using %f format for the float, as input to the transform, was insufficient resolution and the tiles were slightly malformed 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"_links":{"self":[{"href":"http:\/\/blog.light42.com\/wordpress\/index.php?rest_route=\/wp\/v2\/pages\/1166"}],"collection":[{"href":"http:\/\/blog.light42.com\/wordpress\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/blog.light42.com\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/blog.light42.com\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/blog.light42.com\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1166"}],"version-history":[{"count":1204,"href":"http:\/\/blog.light42.com\/wordpress\/index.php?rest_route=\/wp\/v2\/pages\/1166\/revisions"}],"predecessor-version":[{"id":3773,"href":"http:\/\/blog.light42.com\/wordpress\/index.php?rest_route=\/wp\/v2\/pages\/1166\/revisions\/3773"}],"wp:attachment":[{"href":"http:\/\/blog.light42.com\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1166"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}