{"id":1558,"date":"2014-08-05T11:24:41","date_gmt":"2014-08-05T18:24:41","guid":{"rendered":"http:\/\/blog.light42.com\/wordpress\/?p=1558"},"modified":"2014-08-05T21:38:02","modified_gmt":"2014-08-06T04:38:02","slug":"numeric-stats-on-bay-area-intersection-counts","status":"publish","type":"post","link":"http:\/\/blog.light42.com\/wordpress\/?p=1558","title":{"rendered":"Numeric Stats on Bay Area Intersection Counts"},"content":{"rendered":"<p><img decoding=\"async\" alt=\"pg_logo\" src=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2012\/06\/PostgreSQL_logo.3colors.120x120.png\" width=\"108px\" height=\"108px\" align=\"right\" style=\"padding-left: 16px;\"\/><\/p>\n<p>In preparing for an upcoming <a href=\"http:\/\/asa-datathon.github.io\/\" title=\"Datathon\" target=\"_blank\">Datathon<\/a>, a column of data in PostgreSQL <code>numeric<\/code> format needed formatting for presentation. &#8220;Intersection Count&#8221; <code>intersection_density_sqkm<\/code>  is a count of street intersections per unit area &#8211; a quick way to measure density of the built environment.  A table of grid cells (covering the nine-county San Francisco Bay Area) that the column comes from consists of roughly 814,000 cells. How to quickly characterize the data contents?  Use SQL and the psql <strong>quantile<\/strong> extension to look at ranges, with and without the zeroes.<br \/>\n&nbsp;<\/p>\n<pre>\r\nSELECT \r\n   min(intersection_density_sqkm), \r\n    quantile(intersection_density_sqkm,ARRAY[0.02,0.25,0.5,0.75,0.92]), \r\n   max(intersection_density_sqkm) \r\n\r\n FROM uf_singleparts.ba_intersection_density_sqkm\r\n<\/pre>\n<blockquote>\n<pre>\r\n-[ RECORD 1 ]-----------------------------------------------------------------------------\r\nmin      | 0.0\r\nquantile | {0.0, 0.0, 0.683937...,3.191709...,25.604519...}\r\nmax      | 116.269430...\r\n<\/pre>\n<\/blockquote>\n<p>Psql extension <code>quantile<\/code> takes as arguments a column name and an ARRAY for <em>N<\/em> positional elements by percentage, <em>e.g. above<\/em><\/p>\n<hr \/>\n<h4>How Many Gridcells Have Non-zero Data ?<\/h4>\n<pre>\r\nselect count(*) from ba_intersection_density_sqkm;\r\ncount => 814439\r\n\r\nselect count(*) from ba_intersection_density_sqkm where intersection_density_sqkm <> 0;\r\ncount => 587504\r\n<\/pre>\n<p>&nbsp;Select stats on non-zero data<\/p>\n<pre>\r\n SELECT \r\n   min(intersection_density_sqkm), \r\n    quantile(intersection_density_sqkm,ARRAY[0.02,0.25,0.5,0.75,0.92]), \r\n   max(intersection_density_sqkm) \r\n\r\n FROM uf_singleparts.ba_intersection_density_sqkm\r\n\r\n where intersection_density_sqkm <> 0;\r\n<\/pre>\n<blockquote>\n<pre>\r\n-[ RECORD 1 ]-----------------------------------------------------------------------------\r\nmin      | 0.227979...\r\nquantile | {0.227979...,0.455958...,1.367875...,7.751295...,31.461139...}\r\nmax      | 116.269430...\r\n<\/pre>\n<\/blockquote>\n<p>&nbsp;<br \/>\nand, what does the high-end of the range look like ?  Use SQL for a quick visual inspection for either outliers or smooth transitions:<\/p>\n<pre>\r\nSELECT intersection_density_sqkm \r\nFROM ba_intersection_density_sqkm \r\n ORDER BY  intersection_density_sqkm   desc limit 12;\r\n\r\n intersection_density_sqkm \r\n---------------------------\r\n      116.2694300518134736\r\n      115.5854922279792768\r\n      115.3575129533678764\r\n      115.1295336787564760\r\n      114.9015544041450756\r\n      114.9015544041450756\r\n      114.4455958549222792\r\n      113.7616580310880824\r\n      112.6217616580310892\r\n      112.6217616580310892\r\n      112.1658031088082884\r\n      112.1658031088082884\r\n<\/pre>\n<p>&nbsp;<\/p>\n<p>So, recall that a natural log <em>e<\/em> of 1.0 is 0; a natural log of 116 is slightly over 4.75; a natural log of a number less than 1 is a negative number. To simplify the range for visualization, add a float column called <i>data<\/i>, set the new <i>data<\/i> column to the natural log of <em>(intersection_density_sqkm + 1);<\/em> use a simple multiply-then-divide technique to limit the precision to two digits (screenshot from an <strong><a href=\"http:\/\/ipython.org\/notebook.html\" title=\"IPython Notebook\" target=\"_blank\">IPython Notebook<\/a><\/strong> session using <strong><a href=\"http:\/\/initd.org\/psycopg\/\" title=\"psycopg2\" target=\"_blank\">psycopg2<\/a><\/strong>).<\/p>\n<p><a href=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2014\/08\/ipynb_sql_update.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2014\/08\/ipynb_sql_update.png\" alt=\"ipython_notebook UPDATE sql\" width=\"841\" height=\"147\" class=\"alignnone size-full wp-image-1577\" srcset=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2014\/08\/ipynb_sql_update.png 841w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2014\/08\/ipynb_sql_update-300x52.png 300w\" sizes=\"(max-width: 841px) 100vw, 841px\" \/><\/a><\/p>\n<pre>\r\n select quantile(data,ARRAY[0.02,0.25,0.5,0.75,0.92]) from ba_intersection_density_sqkm;\r\n&nbsp;&nbsp;{ 0, 0, 0.52, 1.43, 3.28 }\r\n<\/pre>\n<pre>\r\nSELECT\r\n  min(data), \r\n    quantile(data,ARRAY[0.02,0.25,0.5,0.75,0.92]), \r\n  max(data) \r\nFROM ba_intersection_density_sqkm\r\nWHERE data <> 0;\r\n\r\n min  |          quantile          | max  \r\n------+----------------------------+------\r\n 0.21 | {0.21,0.38,0.86,2.17,3.48} | 4.76\r\n(1 row)\r\n<\/pre>\n<p>&nbsp;<\/p>\n<h4>Final Results in GeoServer 2.5 CSS styler:<\/h4>\n<p><a href=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2014\/08\/geoserv_css_style_inters.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2014\/08\/geoserv_css_style_inters.png\" alt=\"geoserv_css_style_inters\" width=\"831\" height=\"678\" class=\"alignnone size-full wp-image-1587\" srcset=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2014\/08\/geoserv_css_style_inters.png 831w, http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2014\/08\/geoserv_css_style_inters-300x244.png 300w\" sizes=\"(max-width: 831px) 100vw, 831px\" \/><\/a><\/p>\n<p>ps- <em>a full sequential scan on this table takes about four seconds, on a Western Digital Black Label 1TB disk, ext4 filesystem, on Linux.<br \/>\n<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In preparing for an upcoming Datathon, a column of data in PostgreSQL numeric format needed formatting for presentation. &#8220;Intersection Count&#8221; intersection_density_sqkm is a count of street intersections per unit area &#8211; a quick way to measure density of the built environment. A table of grid cells (covering the nine-county San Francisco Bay Area) that the [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11,5],"tags":[],"_links":{"self":[{"href":"http:\/\/blog.light42.com\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/1558"}],"collection":[{"href":"http:\/\/blog.light42.com\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/blog.light42.com\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/blog.light42.com\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/blog.light42.com\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1558"}],"version-history":[{"count":50,"href":"http:\/\/blog.light42.com\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/1558\/revisions"}],"predecessor-version":[{"id":1610,"href":"http:\/\/blog.light42.com\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/1558\/revisions\/1610"}],"wp:attachment":[{"href":"http:\/\/blog.light42.com\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1558"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/blog.light42.com\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1558"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/blog.light42.com\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1558"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}