In California, there are multi-county, regional government agencies that perform some functions like planning public transportation systems. The capital city of California is Sacramento, and the regional council is called the Sacramento Area Council of Governments (SACOG).
Using a database of train and light rail stations in a projection where distance measurement is accurate, and the 2010 US Census data, it is possible to find distance from a train station to a census tract or, better yet a census block, like so:
-- return the distance of each transit station from census tract 1768 SELECT b.ogc_fid, st_distance( t.wkb_geometry, b.wkb_geometry)::integer as dist FROM transit_stations_updated t, census_2010_tracts b WHERE t.region = 'SACOG' and b.ogc_fid = 1768 ORDER BY st_distance( t.wkb_geometry, b.wkb_geometry);
But wait, if we want to get the distance to a transit stop, it is far more useful to find the distance to the nearest transit stop.. and getting distance to one transit stop is straightforward, but what about all transit stops in the district, using one SQL statement for performance?
For this, we can use the handy WINDOW FUNCTIONS introduced in PostgreSQL 8.4. Let’s break the problem into parts.
First, list the distances from one census block, in order, using a RANK() function too:
-- return the ranked distance of each transit station from census block 433144 SELECT b.ogc_fid as census_block_id, t.ogc_fid as transit_station_id, st_distance( t.wkb_geometry, b.wkb_geometry)::integer as dist, rank() OVER (partition by b.ogc_fid ORDER BY st_distance( t.wkb_geometry, b.wkb_geometry)) FROM transit_stations_updated t, census_2010_blocks b WHERE t.region = 'SACOG' and b.ogc_fid = 433144 ORDER BY st_distance( t.wkb_geometry, b.wkb_geometry)::integer; -- census_block_id | transit_station_id | dist | rank -----------------+--------------------+-------+------ 433144 | 758 | 176 | 1 433144 | 715 | 369 | 2 433144 | 716 | 1152 | 3 433144 | 1137 | 1377 | 4 ...
Windowing functions have been written to be very general, so the scope of any description is far larger than this post. What we can observe here is that a partition has been declared by census block, and the order within the partition is by distance between census block and transit stop. Now we have almost all of the components in place to solve: for every census block, return only the nearest transit stop. Here I have written a solution using a subselect – there are definitely other ways to write this! Note that a judicious use of indexing can speed things up quite a bit.
-- return only the nearest transit station to each census block and if less than 100 meters SELECT census_block_id, transit_station_id, dist FROM ( SELECT b.ogc_fid as census_block_id, t.ogc_fid as transit_station_id, st_distance( t.wkb_geometry, b.wkb_geometry)::integer as dist, rank() OVER (partition by b.ogc_fid ORDER BY st_distance( t.wkb_geometry, b.wkb_geometry)) as rank FROM transit_stations_updated t, census_2010_blocks b WHERE st_distance(t.wkb_geometry, b.wkb_geometry) < 100 AND b.countyfp10 = '067' AND t.region = 'SACOG' ) t1 WHERE t1.rank = 1;
Finally, a variation of this problem is to take all locations within a certain buffer distance of at least one transit stop, and record them along with the distance to the nearest transit stop.