Skip to content

KNN GIST Index in Postgresql

With version 9.1 of PostgreSQL and above, the KNN GIST Index is available.

There are two new operators defined:  <-> and <#> 

With version 2.0 of PostGIS and above, the KNN GIST Index functionality is exposed for PostGIS geometry types.

<-> - Returns the distance between two points. For point / point checks it uses floating point accuracy (as opposed to the double precision accuracy of the underlying point geometry). For other geometry types the distance between the floating point bounding box centroids is returned. Useful for doing distance ordering and nearest neighbor limits using KNN gist functionality.

<#> - Returns the distance between bounding box of 2 geometries. For point / point checks it's almost the same as distance (though may be different since the bounding box is at floating point accuracy and geometries are double precision). Useful for doing distance ordering and nearest neighbor limits using KNN gist functionality.

Let’s look at how they operators affect search results in a real-world example – proximity of parcels from a point on the street. Note that one of the geometries in this example is a very thin polygon just inside of a larger one against the street.
This first example shows bounding box ordered distance <#>

 

 

PostGIS Topology of Berkeley Streets

list schemas the low level way

pg_logo

SELECT 
 n.nspname AS "Name",  
 pg_catalog.pg_get_userbyid(n.nspowner) AS "Owner"
FROM 
 pg_catalog.pg_namespace n
WHERE 
 n.nspname !~ '^pg_' AND n.nspname <> 'information_schema'
ORDER BY 1;

One More Graph, SQL to R


import psycopg2
conn = psycopg2.connect( "dbname=example_db" )
curs = conn.cursor()

tSQL = '''
select emp_gden_min, res_units_gden_min, emp_gden_max, res_units_gden_max
from p_translation_table;
'''
curs.execute( tSQL )
resT = curs.fetchall()

from rpy2.robjects.packages import importr
grdevices = importr( 'grDevices' )
grdevices.pdf(file='/home/shared/aab.pdf')

r_plot_str = '''
plot(c(0,1000),c(0,2000),type="n",log="xy",xlim=c(0.1,1000),ylim=c(0.1,2000), xlab="employment", ylab="residence")
'''
rpy2.robjects.r( r_plot_str )

tC = 0
for n in resT:
  tStr2 = "rect( %s,%s,%s,%s, density=10, col=rgb(%s,%s,%s) )" \
    % (n[0],n[1],n[2],n[3], tC,tC,tC)
  rpy2.robjects.r( tStr2 )
  tC += 1.0/len(resT)

grdevices.dev_off()

Killing a Pg Query on Linux

pg_logo

kill => linux utility to send a signal to a process
pid => a process ID

Using the linux kill utility, a user with sufficient privileges could
terminate a running Postgres query specified by process ID (pid). This is a bad idea though, because it will cause PostgreSQL to panic. It is useful to know the pid of the query, as we shall see below. To find the pid of a query, execute the following:

## PG10 Syntax
SELECT datname,usename,pid,client_addr,wait_event,query_start,query
FROM pg_stat_activity ;

or simply

SELECT pg_backend_pid() ;

With the pid, a user might try a termination signal first (bad idea):

kill -SIGTERM pid
kill -15 pid

Or a SIGKILL signal could be issued (also a bad idea). The Linux scheduler executes the command, the process is never notified. There are three equivalent forms :

kill -SIGKILL pid
kill -KILL pid
kill -9 pid

PostgreSQL provides ways to stop a running query (good idea).

pg_terminate_backend(pid int) boolean Terminate a backend
pg_cancel_backend(pid int) boolean Cancel a backend's current query

Lastly, the Postgres utility pg_ctl can issue the SIGKILL signal, and is available on non-Linux platforms.

see

http://www.postgresonline.com/journal/archives/134-Terminating-Annoying-Back-Ends.html

http://www.postgresql.org/docs/9.1/static/functions-admin.html

http://www.postgresql.org/docs/9.1/static/app-pg-ctl.html

Postgres 9.2 release notes:
* Allow non-superusers to use pg_cancel_backend() and pg_terminate_backend() on other sessions belonging to the same user (Magnus Hagander, Josh Kupershmidt, Dan Farina) Previously only superusers were allowed to use these functions.

Peter Eisentraut writes:
  “You shouldn’t run pg_ctl directly under Ubuntu/Debian. Use pg_ctlcluster instead, which is installed by postgresql-common. See its man page for documentation.”
example:  sudo pg_ctlcluster 10 main stop --mode immediate
alt:  sudo systemctl stop postgresql@10-main

postscript from IRC chat

peerce pg_ctl kill TERM sends things a SIGTERM, they have to be listening for it to exit .. SIGKILL, now that will kill it, but thats usually very bad for postgres as it causes immediate exit without any cleanup.

RhodiumToad use pg_cancel_backend (which sends SIGINT, not SIGTERM) and pg_terminate_backend (which does send SIGTERM) exclusively. if you find something that doesn’t die in the face of either of those, the next stop is pg_ctl restart -mimmediate. do not ever send SIGKILL to postgres processes. (if you do kill a backend with -KILL, then pg will do a panic restart, and if you mess up and kill the postmaster with -KILL, you end up with a serious mess)

dbb if you send a kill signal to a query it will certainly panic postgresql ?
RhodiumToad yes

RhodiumToad because pg then has no way to know what the dead process was doing with shared memory, so the only option is to SIGKILL every other process, throw the shared memory segment away, and restart with a new one, which requires doing a recovery from WAL. pg_ctl stop -mimmediate or restart -mimmediate also does the recovery from WAL, but in a slightly more controlled fashion, for a query not to die from pg_terminate_backend after a reasonably short time interval is a bug, though (last one I saw was due to index corruption making vacuum go round in circles) pg_cancel_backend will likewise kill any query that isn’t actually catching a query_canceled exception

Pg Jobs Active

pg_logo

I keep this one line as .sql in my home directory of all the work machines:

## PG10 Syntax
SELECT datname,usename,pid,client_addr,wait_event,query_start,query
FROM pg_stat_activity ;

checking what is going on is easy

psql -f short_queries_list.sql

per the manual

http://www.postgresql.org/docs/current/static/monitoring-stats.html

Sacramento Vehicle Miles Travelled

early visualizations of a model written in Python and PostGIS, in 2011 at Calthorpe Associates, Berkeley.

 
Lens on Daily Transportation Patterns
Sacramento, California 

VMT_sac_15nov10

Zoned, Urban, Highways and Parks Bay Area

zones_residential_ba0

Extract and Sum Classified Areas by Grid Cell

PostgreSQL_logo.3colors.120x120

Given a set of classified polygons, in this example classified into three categories by name, apply a reference grid, and sum the areas for each classification into float fields in a destination table. This code contains test harness pieces in it also, so that a smaller example can be executed and verified, before applying the process to large sets.

drop table if exists ca_lt1 cascade; --test_geom_out;
create table ca_lt1 --test_geom_out
as    select 
  g.gridcode, 
  g.the_geom,
  SUM(d.acres_urban) as acres_urban,
  SUM(d2.acres_constrained) as acres_constrained,
  SUM(d3.acres_greenfield) as acres_greenfield
FROM 
  grid150m_uniq g
  --test_grid g
LEFT JOIN 
( select 
     sum(st_area( st_intersection( land.wkb_geometry, g2.the_geom))) as acres_urban,
     g2.gridcode
  from 
     grid150m_uniq g2, --test_geom land
     ca_landtypes_df land
   where st_intersects( land.wkb_geometry, g2.the_geom) AND
     land.landtype = 'Urban' 
     AND st_intersects( st_geomfromEWKT('SRID=3310;POLYGON((-147472.090  71129.874, -113634.730  71129.874,   -113634.730  50253.940,   -147472.090 50253.940,  -147472.090 71129.874))'), g2.the_geom)
  group by g2.gridcode
   ) d
ON ( g.gridcode = d.gridcode )

LEFT JOIN 
( select 
     sum(st_area( st_intersection( land.wkb_geometry, g2.the_geom))) as acres_constrained,
     g2.gridcode
  from 
     grid150m_uniq g2, --test_geom land
     ca_landtypes_df land
   where st_intersects( land.wkb_geometry, g2.the_geom) AND
     land.landtype = 'Constrained' 
     AND st_intersects( st_geomfromEWKT('SRID=3310;POLYGON((-147472.090  71129.874, -113634.730  71129.874,   -113634.730  50253.940,   -147472.090 50253.940,  -147472.090 71129.874))'), g2.the_geom)
  group by g2.gridcode
   ) d2
ON ( g.gridcode = d2.gridcode )

LEFT JOIN 
( select 
     sum(st_area( st_intersection( land.wkb_geometry, g2.the_geom))) as acres_greenfield,
     g2.gridcode
  from 
     grid150m_uniq g2, --test_geom land
     ca_landtypes_df land
   where st_intersects( land.wkb_geometry, g2.the_geom) AND
     land.landtype = 'Greenfield' 
     AND st_intersects( st_geomfromEWKT('SRID=3310;POLYGON((-147472.090  71129.874, -113634.730  71129.874,   -113634.730  50253.940,   -147472.090 50253.940,  -147472.090 71129.874))'), g2.the_geom)
  group by g2.gridcode
   ) d3
ON ( g.gridcode = d3.gridcode )

  WHERE st_intersects( st_geomfromtext('SRID=3310;POLYGON((-147472.090  71129.874, -113634.730  71129.874,   -113634.730  50253.940,   -147472.090 50253.940,  -147472.090 71129.874))'), g.the_geom)

GROUP BY 
  g.gridcode,
  g.the_geom;


sandag_landtypes_sm

sandag_minus_parcels3

p_gridded


 

Early UrbanFootprint Grid Development

a few variations on grid visualization:

6-placetyped_constraints_perc

1-set_gridded_analysis

2-set_gridded_load_constraint

4-set_gridded_constraints

5-set_gridded_constraints_perc

percentages_c