from pathlib import Path
import tarfile
from urllib.request import urlretrieve
# Download data
= "https://github.com/iamgeoknight/common_datasets/raw/main/parquet/202003-citibike-tripdata.tar.xz"
url = Path("202003-citibike-tripdata.tar.xz")
tar_path = Path("202003-citibike-tripdata.parquet")
parquet_path
if not tar_path.exists():
urlretrieve(url, tar_path)
if not parquet_path.exists():
with tarfile.open(tar_path, "r:xz") as t:
"202003-citibike-tripdata.parquet") t.extract(
Spatial Dev Guru wrote a great tutorial that walks you through a step-by-step geospatial analysis of bike sharing data using DuckDB.
Ibis has support for all the geospatial functions used on the tutorial, and we decided to replicate it and share it with you.
Installation
Install Ibis with the dependencies needed to work with geospatial data using DuckDB:
$ pip install 'ibis-framework[duckdb,geospatial]'
Data
The parquet file used in the original tutorial is available at https://github.com/iamgeoknight/common_datasets/tree/main/parquet. The original data is also available from the citibike source but as a .csv
file.
Now that we have the data, we import Ibis and turn on the interactive mode, to easily explore the output of our queries.
from ibis.interactive import *
Let’s get started
Because this dataset does not contain any geometries, we have to load the spatial extension. If the dataset included any geometry columns, Ibis is smart enough to load the extension for us upon reading the data.
= ibis.duckdb.connect("biketrip.ddb")
con "spatial")
con.load_extension(
# read data and rename columns to use snake case
= con.read_parquet("202003-citibike-tripdata.parquet").rename("snake_case")
biketrip biketrip
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┓ ┃ tripduration ┃ starttime ┃ stoptime ┃ start_station_id ┃ start_station_name ┃ start_station_latitude ┃ start_station_longitude ┃ end_station_id ┃ end_station_name ┃ end_station_latitude ┃ end_station_longitude ┃ bikeid ┃ usertype ┃ birth_year ┃ gender ┃ ┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━┩ │ int64 │ string │ string │ int64 │ string │ float64 │ float64 │ int64 │ string │ float64 │ float64 │ int64 │ string │ int64 │ int64 │ ├──────────────┼──────────────────────────┼──────────────────────────┼──────────────────┼───────────────────────────────┼────────────────────────┼─────────────────────────┼────────────────┼─────────────────────────────┼──────────────────────┼───────────────────────┼────────┼────────────┼────────────┼────────┤ │ 1589 │ 2020-03-01 00:00:03.6400 │ 2020-03-01 00:26:32.9860 │ 224 │ Spruce St & Nassau St │ 40.711464 │ -74.005524 │ 3574 │ Prospect Pl & Underhill Ave │ 40.676969 │ -73.965790 │ 16214 │ Subscriber │ 1980 │ 1 │ │ 389 │ 2020-03-01 00:00:16.7560 │ 2020-03-01 00:06:46.0620 │ 293 │ Lafayette St & E 8 St │ 40.730207 │ -73.991026 │ 223 │ W 13 St & 7 Ave │ 40.737815 │ -73.999947 │ 29994 │ Subscriber │ 1991 │ 2 │ │ 614 │ 2020-03-01 00:00:20.0580 │ 2020-03-01 00:10:34.2200 │ 379 │ W 31 St & 7 Ave │ 40.749156 │ -73.991600 │ 515 │ W 43 St & 10 Ave │ 40.760094 │ -73.994618 │ 39853 │ Subscriber │ 1991 │ 1 │ │ 597 │ 2020-03-01 00:00:24.3510 │ 2020-03-01 00:10:22.3390 │ 3739 │ Perry St & Greenwich Ave │ 40.735918 │ -74.000939 │ 325 │ E 19 St & 3 Ave │ 40.736245 │ -73.984738 │ 42608 │ Subscriber │ 1989 │ 1 │ │ 1920 │ 2020-03-01 00:00:26.1120 │ 2020-03-01 00:32:26.2680 │ 236 │ St Marks Pl & 2 Ave │ 40.728419 │ -73.987140 │ 3124 │ 46 Ave & 5 St │ 40.747310 │ -73.954510 │ 36288 │ Subscriber │ 1993 │ 1 │ │ 2055 │ 2020-03-01 00:00:46.1010 │ 2020-03-01 00:35:01.2880 │ 471 │ Grand St & Havemeyer St │ 40.712868 │ -73.956981 │ 497 │ E 17 St & Broadway │ 40.737050 │ -73.990093 │ 31313 │ Subscriber │ 1985 │ 1 │ │ 764 │ 2020-03-01 00:01:04.6650 │ 2020-03-01 00:13:49.0300 │ 83 │ Atlantic Ave & Fort Greene Pl │ 40.683826 │ -73.976323 │ 3579 │ Sterling Pl & Bedford Ave │ 40.672695 │ -73.954131 │ 31216 │ Subscriber │ 1980 │ 2 │ │ 1719 │ 2020-03-01 00:01:04.9250 │ 2020-03-01 00:29:44.0140 │ 3809 │ W 55 St & 6 Ave │ 40.763189 │ -73.978434 │ 3372 │ E 74 St & 1 Ave │ 40.768974 │ -73.954823 │ 27715 │ Subscriber │ 1968 │ 2 │ │ 441 │ 2020-03-01 00:01:09.3050 │ 2020-03-01 00:08:30.8670 │ 3463 │ E 16 St & Irving Pl │ 40.735367 │ -73.987974 │ 174 │ E 25 St & 1 Ave │ 40.738177 │ -73.977387 │ 42015 │ Subscriber │ 1970 │ 1 │ │ 601 │ 2020-03-01 00:01:09.4420 │ 2020-03-01 00:11:11.2240 │ 406 │ Hicks St & Montague St │ 40.695128 │ -73.995951 │ 421 │ Clermont Ave & Park Ave │ 40.695734 │ -73.971297 │ 27592 │ Subscriber │ 1985 │ 1 │ │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ └──────────────┴──────────────────────────┴──────────────────────────┴──────────────────┴───────────────────────────────┴────────────────────────┴─────────────────────────┴────────────────┴─────────────────────────────┴──────────────────────┴───────────────────────┴────────┴────────────┴────────────┴────────┘
We have the information about the longitude and latitude for start and end stations, to create geometry points and put the spatial features to use.
Create bike trip table
In the original tutorial, Spatial Dev Guru creates a table with transformed “Pickup” and “Dropoff” points. In DuckDB the st_transform
function takes by default points as YX
(latitude/longitude) while in Ibis, we assume data in the form XY
(longitude/latitude) to be consistent with PostGIS and Geopandas.
# Notice longitude/latitude order
= _.start_station_longitude.point(_.start_station_latitude)
pickup = _.end_station_longitude.point(_.end_station_latitude)
dropoff
# convert is the equivalent of `st_transform`
= biketrip.mutate(
biketrip =pickup.convert("EPSG:4326", "EPSG:3857"),
pickup_point=dropoff.convert("EPSG:4326", "EPSG:3857"),
dropoff_point
)
"pickup_point", "dropoff_point"]] biketrip[[
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ pickup_point ┃ dropoff_point ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ geospatial:geometry │ geospatial:geometry │ ├────────────────────────────────────┼────────────────────────────────────┤ │ <POINT (-8238257.278 4969875.321)> │ <POINT (-8233834.079 4964810.852)> │ │ <POINT (-8236643.369 4972628.277)> │ <POINT (-8237636.375 4973746.028)> │ │ <POINT (-8236707.235 4975412.341)> │ <POINT (-8237043.245 4977019.779)> │ │ <POINT (-8237746.848 4973467.317)> │ <POINT (-8235943.322 4973515.398)> │ │ <POINT (-8236210.701 4972365.622)> │ <POINT (-8232578.395 4975141.09)> │ │ <POINT (-8232853.487 4970081.63)> │ <POINT (-8236539.472 4973633.6)> │ │ <POINT (-8235006.638 4965817.34)> │ <POINT (-8232536.205 4964183.464)> │ │ <POINT (-8235241.603 4977474.595)> │ <POINT (-8232613.208 4978324.842)> │ │ <POINT (-8236303.583 4973386.377)> │ <POINT (-8235125.009 4973799.125)> │ │ <POINT (-8237191.547 4967476.65)> │ <POINT (-8234447.08 4967565.556)> │ │ … │ … │ └────────────────────────────────────┴────────────────────────────────────┘
Using mutate
we add two new columns to our biketrip
table with transformed spatial points for pickup and dropoff locations, that are in the Web Mercator projection (EPSG:3857).
Identify popular starts and end stations
The following queries retrieve a list of bike start and end stations with their respective trip count in descending order.
Top 10 start stations by trip count
=ibis._.count()).order_by(
biketrip.group_by(biketrip.start_station_name).agg(trips"trips")
ibis.desc( )
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓ ┃ start_station_name ┃ trips ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩ │ string │ int64 │ ├───────────────────────────────┼───────┤ │ Pershing Square North │ 6679 │ │ W 21 St & 6 Ave │ 6058 │ │ West St & Chambers St │ 5808 │ │ E 17 St & Broadway │ 5585 │ │ Broadway & W 60 St │ 5361 │ │ 8 Ave & W 31 St │ 5287 │ │ Broadway & E 22 St │ 5201 │ │ 12 Ave & W 40 St │ 5029 │ │ W 41 St & 8 Ave │ 4809 │ │ Christopher St & Greenwich St │ 4786 │ │ … │ … │ └───────────────────────────────┴───────┘
Similarly, in Ibis you can use the topk
operation:
10) biketrip.start_station_name.topk(
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓ ┃ start_station_name ┃ CountStar() ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩ │ string │ int64 │ ├───────────────────────────────┼─────────────┤ │ Pershing Square North │ 6679 │ │ W 21 St & 6 Ave │ 6058 │ │ West St & Chambers St │ 5808 │ │ E 17 St & Broadway │ 5585 │ │ Broadway & W 60 St │ 5361 │ │ 8 Ave & W 31 St │ 5287 │ │ Broadway & E 22 St │ 5201 │ │ 12 Ave & W 40 St │ 5029 │ │ W 41 St & 8 Ave │ 4809 │ │ Christopher St & Greenwich St │ 4786 │ └───────────────────────────────┴─────────────┘
Top 10 end stations by trip count
10) biketrip.end_station_name.topk(
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓ ┃ end_station_name ┃ CountStar() ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩ │ string │ int64 │ ├───────────────────────────────┼─────────────┤ │ Pershing Square North │ 6584 │ │ E 17 St & Broadway │ 6179 │ │ W 21 St & 6 Ave │ 6108 │ │ West St & Chambers St │ 5857 │ │ Broadway & E 22 St │ 5751 │ │ 8 Ave & W 31 St │ 5436 │ │ Broadway & W 60 St │ 5231 │ │ 12 Ave & W 40 St │ 5180 │ │ Christopher St & Greenwich St │ 4849 │ │ E 13 St & Avenue A │ 4827 │ └───────────────────────────────┴─────────────┘
Explore trip patterns by user type
We can also calculate the average trip duration and distance traveled for each user type. According to the data dictionary, user type can be “customer” or “subscriber” where:
- Customer = 24-hour pass or 3-day pass user
- Subscriber = Annual Member
biketrip.group_by(_.usertype).aggregate(=_.tripduration.mean(),
avg_duration=_.pickup_point.distance(_.dropoff_point).mean()
avg_distance )
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓ ┃ usertype ┃ avg_duration ┃ avg_distance ┃ ┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩ │ string │ float64 │ float64 │ ├────────────┼──────────────┼──────────────┤ │ Customer │ 2619.627797 │ 3017.728172 │ │ Subscriber │ 914.437774 │ 2457.477582 │ └────────────┴──────────────┴──────────────┘
Analyzing efficiency: trip duration vs linear distance
The original tutorial defines efficiency_ratio
as the trip_duration
/ linear_distance
, where a higher efficiency ratio could mean a more direct route or faster travel times.
# linear distance
= biketrip.pickup_point.distance(biketrip.dropoff_point)
trip_distance
= biketrip.mutate(
biketrip =trip_distance,
linear_distance=_.tripduration / trip_distance,
efficiency_ratio
)
"pickup_point", "dropoff_point", "linear_distance", "efficiency_ratio"]] biketrip[[
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓ ┃ pickup_point ┃ dropoff_point ┃ linear_distance ┃ efficiency_ratio ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩ │ geospatial:geometry │ geospatial:geometry │ float64 │ float64 │ ├────────────────────────────────────┼────────────────────────────────────┼─────────────────┼──────────────────┤ │ <POINT (-8238257.278 4969875.321)> │ <POINT (-8233834.079 4964810.852)> │ 6724.101099 │ 0.236314 │ │ <POINT (-8236643.369 4972628.277)> │ <POINT (-8237636.375 4973746.028)> │ 1495.135118 │ 0.260177 │ │ <POINT (-8236707.235 4975412.341)> │ <POINT (-8237043.245 4977019.779)> │ 1642.181058 │ 0.373893 │ │ <POINT (-8237746.848 4973467.317)> │ <POINT (-8235943.322 4973515.398)> │ 1804.166804 │ 0.330901 │ │ <POINT (-8236210.701 4972365.622)> │ <POINT (-8232578.395 4975141.09)> │ 4571.309382 │ 0.420011 │ │ <POINT (-8232853.487 4970081.63)> │ <POINT (-8236539.472 4973633.6)> │ 5118.884569 │ 0.401455 │ │ <POINT (-8235006.638 4965817.34)> │ <POINT (-8232536.205 4964183.464)> │ 2961.856125 │ 0.257946 │ │ <POINT (-8235241.603 4977474.595)> │ <POINT (-8232613.208 4978324.842)> │ 2762.494638 │ 0.622264 │ │ <POINT (-8236303.583 4973386.377)> │ <POINT (-8235125.009 4973799.125)> │ 1248.758342 │ 0.353151 │ │ <POINT (-8237191.547 4967476.65)> │ <POINT (-8234447.08 4967565.556)> │ 2745.907046 │ 0.218871 │ │ … │ … │ … │ … │ └────────────────────────────────────┴────────────────────────────────────┴─────────────────┴──────────────────┘
Let’s take take a look at the table in descending order for the linear_distance
, for trips that are longer than 0 meters.
filter(_.linear_distance > 0).order_by(ibis.desc("linear_distance")) biketrip.
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓ ┃ tripduration ┃ starttime ┃ stoptime ┃ start_station_id ┃ start_station_name ┃ start_station_latitude ┃ start_station_longitude ┃ end_station_id ┃ end_station_name ┃ end_station_latitude ┃ end_station_longitude ┃ bikeid ┃ usertype ┃ birth_year ┃ gender ┃ pickup_point ┃ dropoff_point ┃ linear_distance ┃ efficiency_ratio ┃ ┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩ │ int64 │ string │ string │ int64 │ string │ float64 │ float64 │ int64 │ string │ float64 │ float64 │ int64 │ string │ int64 │ int64 │ geospatial:geometry │ geospatial:geometry │ float64 │ float64 │ ├──────────────┼──────────────────────────┼──────────────────────────┼──────────────────┼────────────────────────────────────┼────────────────────────┼─────────────────────────┼────────────────┼─────────────────────────────────────┼──────────────────────┼───────────────────────┼────────┼────────────┼────────────┼────────┼────────────────────────────────────┼────────────────────────────────────┼─────────────────┼──────────────────┤ │ 7660 │ 2020-03-20 19:12:59.0210 │ 2020-03-20 21:20:39.5060 │ 3403 │ 4 Ave & 9 St │ 40.670513 │ -73.988766 │ 3881 │ 12 Ave & W 125 St │ 40.818299 │ -73.960405 │ 30277 │ Customer │ 1968 │ 2 │ <POINT (-8236391.739 4963863.283)> │ <POINT (-8233234.623 4985577.637)> │ 21942.664921 │ 0.349092 │ │ 9536 │ 2020-03-07 07:16:42.8100 │ 2020-03-07 09:55:39.1630 │ 3545 │ Broadway & W 122 St │ 40.812056 │ -73.961255 │ 3661 │ Montgomery St & Franklin Ave │ 40.666439 │ -73.960556 │ 33590 │ Customer │ 1998 │ 2 │ <POINT (-8233329.212 4984659.395)> │ <POINT (-8233251.476 4963265.338)> │ 21394.198653 │ 0.445728 │ │ 9541 │ 2020-03-07 07:16:42.9730 │ 2020-03-07 09:55:44.7680 │ 3545 │ Broadway & W 122 St │ 40.812056 │ -73.961255 │ 3661 │ Montgomery St & Franklin Ave │ 40.666439 │ -73.960556 │ 30524 │ Customer │ 2000 │ 2 │ <POINT (-8233329.212 4984659.395)> │ <POINT (-8233251.476 4963265.338)> │ 21394.198653 │ 0.445962 │ │ 6557 │ 2020-03-18 12:43:05.5510 │ 2020-03-18 14:32:22.5670 │ 3543 │ Morningside Dr & Amsterdam Ave │ 40.810285 │ -73.957365 │ 3317 │ 10 St & 5 Ave │ 40.668627 │ -73.987001 │ 36139 │ Subscriber │ 1994 │ 1 │ <POINT (-8232896.163 4984398.898)> │ <POINT (-8236195.224 4963586.452)> │ 21072.297158 │ 0.311167 │ │ 5361 │ 2020-03-27 15:32:39.0650 │ 2020-03-27 17:02:00.8680 │ 3881 │ 12 Ave & W 125 St │ 40.818299 │ -73.960405 │ 3384 │ Smith St & 3 St │ 40.678724 │ -73.995991 │ 27737 │ Customer │ 1982 │ 2 │ <POINT (-8233234.623 4985577.637)> │ <POINT (-8237196.038 4965068.429)> │ 20888.283661 │ 0.256651 │ │ 9164 │ 2020-03-07 07:23:19.9250 │ 2020-03-07 09:56:04.2370 │ 3536 │ W 116 St & Broadway │ 40.808200 │ -73.964100 │ 3661 │ Montgomery St & Franklin Ave │ 40.666439 │ -73.960556 │ 38264 │ Customer │ 1998 │ 2 │ <POINT (-8233645.949 4984092.238)> │ <POINT (-8233251.476 4963265.338)> │ 20830.634948 │ 0.439929 │ │ 8898 │ 2020-03-07 07:27:08.5700 │ 2020-03-07 09:55:27.5190 │ 3536 │ W 116 St & Broadway │ 40.808200 │ -73.964100 │ 3661 │ Montgomery St & Franklin Ave │ 40.666439 │ -73.960556 │ 21194 │ Customer │ 1999 │ 2 │ <POINT (-8233645.949 4984092.238)> │ <POINT (-8233251.476 4963265.338)> │ 20830.634948 │ 0.427159 │ │ 5504 │ 2020-03-10 20:45:55.0120 │ 2020-03-10 22:17:39.2290 │ 3405 │ 5 St & 6 Ave │ 40.670484 │ -73.982090 │ 3629 │ Adam Clayton Powell Blvd & W 126 St │ 40.809495 │ -73.947765 │ 37653 │ Customer │ 1979 │ 2 │ <POINT (-8235648.551 4963858.895)> │ <POINT (-8231827.538 4984282.75)> │ 20778.209576 │ 0.264893 │ │ 5915 │ 2020-03-10 15:59:38.5130 │ 2020-03-10 17:38:13.8720 │ 3534 │ Frederick Douglass Blvd & W 117 St │ 40.805159 │ -73.954692 │ 3596 │ Sullivan Pl & Bedford Ave │ 40.664241 │ -73.957469 │ 37257 │ Customer │ 1969 │ 0 │ <POINT (-8232598.655 4983645)> │ <POINT (-8232907.745 4962942.662)> │ 20704.645089 │ 0.285685 │ │ 4755 │ 2020-03-24 07:41:42.1440 │ 2020-03-24 09:00:58.0820 │ 3349 │ Grand Army Plaza & Plaza St West │ 40.672968 │ -73.970880 │ 3529 │ Lenox Ave & W 130 St │ 40.810792 │ -73.943068 │ 36100 │ Subscriber │ 1965 │ 2 │ <POINT (-8234400.677 4964223.518)> │ <POINT (-8231304.689 4984473.487)> │ 20485.271859 │ 0.232118 │ │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ └──────────────┴──────────────────────────┴──────────────────────────┴──────────────────┴────────────────────────────────────┴────────────────────────┴─────────────────────────┴────────────────┴─────────────────────────────────────┴──────────────────────┴───────────────────────┴────────┴────────────┴────────────┴────────┴────────────────────────────────────┴────────────────────────────────────┴─────────────────┴──────────────────┘
Analyzing bike trips within a 500 meters radius
In the original tutorial, the author choses a point (first point on the table), and it creates a buffer of 500 m radius around it. In our table we already have the point in meters, since we converted them in a previous query.
The following query shows all the bike trips whose pickup point falls within a 500 meter radius of the first point of the table with long=-74.00552427
and lat=40.71146364
.
# grab the first row of the data
= biketrip.limit(1)
first_point
= biketrip.filter(
trips_within_500 buffer(500)).to_array())
_.pickup_point.within(first_point.select(_.pickup_point.
)
trips_within_500
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓ ┃ tripduration ┃ starttime ┃ stoptime ┃ start_station_id ┃ start_station_name ┃ start_station_latitude ┃ start_station_longitude ┃ end_station_id ┃ end_station_name ┃ end_station_latitude ┃ end_station_longitude ┃ bikeid ┃ usertype ┃ birth_year ┃ gender ┃ pickup_point ┃ dropoff_point ┃ linear_distance ┃ efficiency_ratio ┃ ┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩ │ int64 │ string │ string │ int64 │ string │ float64 │ float64 │ int64 │ string │ float64 │ float64 │ int64 │ string │ int64 │ int64 │ geospatial:geometry │ geospatial:geometry │ float64 │ float64 │ ├──────────────┼──────────────────────────┼──────────────────────────┼──────────────────┼───────────────────────┼────────────────────────┼─────────────────────────┼────────────────┼─────────────────────────────┼──────────────────────┼───────────────────────┼────────┼────────────┼────────────┼────────┼────────────────────────────────────┼────────────────────────────────────┼─────────────────┼──────────────────┤ │ 1589 │ 2020-03-01 00:00:03.6400 │ 2020-03-01 00:26:32.9860 │ 224 │ Spruce St & Nassau St │ 40.711464 │ -74.005524 │ 3574 │ Prospect Pl & Underhill Ave │ 40.676969 │ -73.965790 │ 16214 │ Subscriber │ 1980 │ 1 │ <POINT (-8238257.278 4969875.321)> │ <POINT (-8233834.079 4964810.852)> │ 6724.101099 │ 0.236314 │ │ 1813 │ 2020-03-01 00:09:02.6260 │ 2020-03-01 00:39:16.4950 │ 3783 │ Cliff St & Fulton St │ 40.708380 │ -74.004950 │ 410 │ Suffolk St & Stanton St │ 40.720664 │ -73.985180 │ 41200 │ Subscriber │ 1995 │ 1 │ <POINT (-8238193.35 4969422.472)> │ <POINT (-8235992.538 4971226.629)> │ 2845.796350 │ 0.637080 │ │ 463 │ 2020-03-01 00:24:52.7870 │ 2020-03-01 00:32:36.6160 │ 330 │ Reade St & Broadway │ 40.714505 │ -74.005628 │ 311 │ Norfolk St & Broome St │ 40.717227 │ -73.988021 │ 27788 │ Subscriber │ 1999 │ 1 │ <POINT (-8238268.813 4970321.91)> │ <POINT (-8236308.805 4970721.816)> │ 2000.388975 │ 0.231455 │ │ 454 │ 2020-03-01 00:25:06.9590 │ 2020-03-01 00:32:41.9440 │ 330 │ Reade St & Broadway │ 40.714505 │ -74.005628 │ 311 │ Norfolk St & Broome St │ 40.717227 │ -73.988021 │ 34083 │ Subscriber │ 1968 │ 1 │ <POINT (-8238268.813 4970321.91)> │ <POINT (-8236308.805 4970721.816)> │ 2000.388975 │ 0.226956 │ │ 492 │ 2020-03-01 00:40:27.3060 │ 2020-03-01 00:48:40.2190 │ 330 │ Reade St & Broadway │ 40.714505 │ -74.005628 │ 309 │ Murray St & West St │ 40.714979 │ -74.013012 │ 37858 │ Subscriber │ 1975 │ 1 │ <POINT (-8238268.813 4970321.91)> │ <POINT (-8239090.808 4970391.552)> │ 824.940265 │ 0.596407 │ │ 2020 │ 2020-03-01 01:14:59.0300 │ 2020-03-01 01:48:39.4420 │ 224 │ Spruce St & Nassau St │ 40.711464 │ -74.005524 │ 236 │ St Marks Pl & 2 Ave │ 40.728419 │ -73.987140 │ 36388 │ Subscriber │ 1986 │ 2 │ <POINT (-8238257.278 4969875.321)> │ <POINT (-8236210.701 4972365.622)> │ 3223.364013 │ 0.626674 │ │ 794 │ 2020-03-01 01:15:33.0460 │ 2020-03-01 01:28:47.6180 │ 376 │ John St & William St │ 40.708621 │ -74.007222 │ 3746 │ 6 Ave & Broome St │ 40.724308 │ -74.004730 │ 37334 │ Subscriber │ 1998 │ 1 │ <POINT (-8238446.219 4969457.928)> │ <POINT (-8238168.9 4971761.857)> │ 2320.558910 │ 0.342159 │ │ 816 │ 2020-03-01 01:16:07.4430 │ 2020-03-01 01:29:43.7260 │ 376 │ John St & William St │ 40.708621 │ -74.007222 │ 3746 │ 6 Ave & Broome St │ 40.724308 │ -74.004730 │ 18888 │ Customer │ 1969 │ 0 │ <POINT (-8238446.219 4969457.928)> │ <POINT (-8238168.9 4971761.857)> │ 2320.558910 │ 0.351639 │ │ 238 │ 2020-03-01 01:28:46.2460 │ 2020-03-01 01:32:44.7130 │ 376 │ John St & William St │ 40.708621 │ -74.007222 │ 415 │ Pearl St & Hanover Square │ 40.704718 │ -74.009260 │ 29327 │ Subscriber │ 1960 │ 1 │ <POINT (-8238446.219 4969457.928)> │ <POINT (-8238673.167 4968884.671)> │ 616.546132 │ 0.386021 │ │ 120 │ 2020-03-01 01:33:37.7920 │ 2020-03-01 01:35:38.2580 │ 330 │ Reade St & Broadway │ 40.714505 │ -74.005628 │ 146 │ Hudson St & Reade St │ 40.716250 │ -74.009106 │ 28153 │ Subscriber │ 1988 │ 1 │ <POINT (-8238268.813 4970321.91)> │ <POINT (-8238655.983 4970578.277)> │ 464.354418 │ 0.258423 │ │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ └──────────────┴──────────────────────────┴──────────────────────────┴──────────────────┴───────────────────────┴────────────────────────┴─────────────────────────┴────────────────┴─────────────────────────────┴──────────────────────┴───────────────────────┴────────┴────────────┴────────────┴────────┴────────────────────────────────────┴────────────────────────────────────┴─────────────────┴──────────────────┘
Acknowledgements and resources
Thank you to Spatial Dev Guru, for the amazing tutorial showcasing DuckDB spatial features. It was fun to replicate the tutorial using Ibis.
If you are interested in learning more about Ibis-DuckDB geospatial support, here is another blog post bis + DuckDB geospatial: a match made on Earth.
Here are some resources to learn more about Ibis:
Chat with us on Zulip: