Projections#

Xvec supports handling Coordinate Reference Systems (CRS) using the PyPROJ package. Each GeometryIndex has a .crs attribute storing the CRS as a pyproj.CRS object (similarly to GeoPandas), and the same object is also stored in the attrs of the respective DataArray.

import geopandas as gpd
import pandas as pd
import xarray as xr
import xvec

You can create the same Dataset of NYC taxi trips as in the previous notebook, just backed by a sparse array to save memory this time.

Hide code cell source
# Load the data
trips = pd.read_parquet(
    "https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2022-01.parquet"
)  # 33MB
zones = gpd.read_file(
    "https://d37ci6vzurychx.cloudfront.net/misc/taxi_zones.zip"
)  # 1MB
lookup = pd.read_csv("https://d37ci6vzurychx.cloudfront.net/misc/taxi+_zone_lookup.csv")

# create variables for day and hour
trips["date"] = trips.tpep_pickup_datetime.dt.date
trips["hour"] = trips.tpep_pickup_datetime.dt.hour

# use groupby over five columns to create a mutli-indexed DataFrame with aggregations
# and create a Dataset backed by sparse arrays
taxi_trips = xr.Dataset.from_dataframe(
    trips[  # filter only trips with known locations
        trips.PULocationID.isin(zones.LocationID)
        & trips.DOLocationID.isin(zones.LocationID)
    ]
    .groupby(["payment_type", "date", "hour", "PULocationID", "DOLocationID"])
    .agg(
        {
            "trip_distance": "mean",
            "VendorID": "count",
            "tip_amount": "mean",
            "fare_amount": "mean",
        }
    ),
)

# Replace int codes with labels
taxi_trips["payment_type"] = [
    "Credit card",
    "Cash",
    "No charge",
    "Dispute",
    "Unknown",
    "Voided trip",
]

# create linkable geometry variable
taxi_zones = (
    lookup.merge(
        zones.dissolve("LocationID")[["zone", "geometry"]],
        left_on="Zone",
        right_on="zone",
        how="left",
    )
    .set_index("LocationID")
    .geometry
)
# replace location IDs with actual geometries
taxi_trips["PULocationID"] = taxi_zones.loc[taxi_trips.PULocationID].values
taxi_trips["DOLocationID"] = taxi_zones.loc[taxi_trips.DOLocationID].values

# rename
taxi_trips = taxi_trips.rename(
    {"PULocationID": "origin", "DOLocationID": "destination", "VendorID": "trips_count"}
)

# assing GeometryIndex
taxi_trips = taxi_trips.xvec.set_geom_indexes(["origin", "destination"], crs=zones.crs)
taxi_trips
<xarray.Dataset>
Dimensions:        (payment_type: 6, date: 40, hour: 24, origin: 252,
                    destination: 257)
Coordinates:
  * payment_type   (payment_type) <U11 'Credit card' 'Cash' ... 'Voided trip'
  * date           (date) object 2008-12-31 2009-01-01 ... 2022-04-06 2022-05-18
  * hour           (hour) int32 0 1 2 3 4 5 6 7 8 ... 15 16 17 18 19 20 21 22 23
  * origin         (origin) object POLYGON ((933100.9183527103 192536.0856972...
  * destination    (destination) object POLYGON ((933100.9183527103 192536.08...
Data variables:
    trip_distance  (payment_type, date, hour, origin, destination) float64 na...
    trips_count    (payment_type, date, hour, origin, destination) float64 na...
    tip_amount     (payment_type, date, hour, origin, destination) float64 na...
    fare_amount    (payment_type, date, hour, origin, destination) float64 na...
Indexes:
    origin         GeometryIndex (crs=EPSG:2263)
    destination    GeometryIndex (crs=EPSG:2263)

You can see that each GeometryIndex has a CRS assigned.

taxi_trips.xindexes
Indexes:
    payment_type  PandasIndex
    date          PandasIndex
    hour          PandasIndex
    origin        GeometryIndex (crs=EPSG:2263)
    destination   GeometryIndex (crs=EPSG:2263)

The pyproj.CRS class is available via the .crs attribute.

taxi_trips.origin.crs
<Projected CRS: EPSG:2263>
Name: NAD83 / New York Long Island (ftUS)
Axis Info [cartesian]:
- X[east]: Easting (US survey foot)
- Y[north]: Northing (US survey foot)
Area of Use:
- name: United States (USA) - New York - counties of Bronx; Kings; Nassau; New York; Queens; Richmond; Suffolk.
- bounds: (-74.26, 40.47, -71.8, 41.3)
Coordinate Operation:
- name: SPCS83 New York Long Island zone (US survey foot)
- method: Lambert Conic Conformal (2SP)
Datum: North American Datum 1983
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich

And in the attrs. This duplication ensures that the CRS information is preserved when the dimension loses GeometryIndex due to some operations.

taxi_trips.origin.attrs["crs"]
<Projected CRS: EPSG:2263>
Name: NAD83 / New York Long Island (ftUS)
Axis Info [cartesian]:
- X[east]: Easting (US survey foot)
- Y[north]: Northing (US survey foot)
Area of Use:
- name: United States (USA) - New York - counties of Bronx; Kings; Nassau; New York; Queens; Richmond; Suffolk.
- bounds: (-74.26, 40.47, -71.8, 41.3)
Coordinate Operation:
- name: SPCS83 New York Long Island zone (US survey foot)
- method: Lambert Conic Conformal (2SP)
Datum: North American Datum 1983
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich

Setting a Projection#

Like in GeoPandas, there are two relevant operations for projections: setting a projection and re-projecting.

Unlike GeoPandas, the vector data cube is more often created in memory from other objects than read from a file, so a user is responsible for setting a projection to every DataArray with geometries.

The most common way to set a projection is to create a GeometryIndex using the .xvec.set_geom_indexes() method. It takes a crs attribute that assigns the CRS to all indexes created. Above, the CRS is assigned on this line, taking the CRS information directly from the original GeoDataFrame:

taxi_trips.xvec.set_geom_indexes(["origin", "destination"], crs=zones.crs)

If you don’t assign a CRS at this stage or each of your GeometryIndex uses a different one, you can use the .xvec.set_crs() method later. However, the method does not allow you to override existing CRS that is different unless you specify allow_override=True. Note that you should rarely need to override a CRS. Note that assignment of a CRS via .xvec.set_crs() does NOT change the underlying geometry. If you want to re-project to a different CRS, use .xvec.to_crs() instead.

taxi_trips.xvec.set_crs(origin=4326, destination=3857, allow_override=True)
<xarray.Dataset>
Dimensions:        (payment_type: 6, date: 40, hour: 24, origin: 252,
                    destination: 257)
Coordinates:
  * payment_type   (payment_type) <U11 'Credit card' 'Cash' ... 'Voided trip'
  * date           (date) object 2008-12-31 2009-01-01 ... 2022-04-06 2022-05-18
  * hour           (hour) int32 0 1 2 3 4 5 6 7 8 ... 15 16 17 18 19 20 21 22 23
  * origin         (origin) object POLYGON ((933100.9183527103 192536.0856972...
  * destination    (destination) object POLYGON ((933100.9183527103 192536.08...
Data variables:
    trip_distance  (payment_type, date, hour, origin, destination) float64 na...
    trips_count    (payment_type, date, hour, origin, destination) float64 na...
    tip_amount     (payment_type, date, hour, origin, destination) float64 na...
    fare_amount    (payment_type, date, hour, origin, destination) float64 na...
Indexes:
    origin         GeometryIndex (crs=EPSG:4326)
    destination    GeometryIndex (crs=EPSG:3857)

Re-projecting#

Geometries can be re-projected from one CRS to another using the .xvec.to_crs() method. You can see that not only the CRS information has changed but also the coordinates of geometries.

taxi_trips.xvec.to_crs(origin=4326, destination=3857)
<xarray.Dataset>
Dimensions:        (payment_type: 6, date: 40, hour: 24, origin: 252,
                    destination: 257)
Coordinates:
  * payment_type   (payment_type) <U11 'Credit card' 'Cash' ... 'Voided trip'
  * date           (date) object 2008-12-31 2009-01-01 ... 2022-04-06 2022-05-18
  * hour           (hour) int32 0 1 2 3 4 5 6 7 8 ... 15 16 17 18 19 20 21 22 23
  * origin         (origin) object POLYGON ((-74.1844528021319 40.69499597816...
  * destination    (destination) object POLYGON ((-8258175.510710959 4967457....
Data variables:
    trip_distance  (payment_type, date, hour, origin, destination) float64 na...
    trips_count    (payment_type, date, hour, origin, destination) float64 na...
    tip_amount     (payment_type, date, hour, origin, destination) float64 na...
    fare_amount    (payment_type, date, hour, origin, destination) float64 na...
Indexes:
    origin         GeometryIndex (crs=EPSG:4326)
    destination    GeometryIndex (crs=EPSG:3857)

CRS input#

You can pass CRS information in nearly any common format. The pyproj.CRS class is created via the pyproj.CRS.from_user_input() method that can take anything like:

  • PROJ string

  • Dictionary of PROJ parameters

  • PROJ keyword arguments for parameters

  • JSON string with PROJ parameters

  • CRS WKT string

  • An authority string [i.e. "epsg:4326"]

  • An EPSG integer code [i.e. 4326]

  • A tuple of (“auth_name”: “auth_code”) [i.e ("epsg", "4326")]

  • An object with a to_wkt method.

  • A pyproj.crs.CRS class

When passing a CRS from a GeoPandas object, you pass a pyproj.crs.CRS class.