Extracting points from a geospatial raster#

When you have an N-dimensional raster cube (Dataset or DataArray) composed of geospatial data indexed by latitude and longitude or any projected coordinates, you may be interested in the values from that cube only for a small subset of locations. Vector data cubes are an ideal data structure for such a use case as they preserve the structure of the original cube and all its attributes while allowing you to index it by a point geometry. The geometry can represent any arbitrary points within the bounds of the original raster and does not have to be constrained to a grid.

import geopandas as gpd
import numpy as np
import shapely
import xarray as xr
import xvec

The example using the ERA-Interim reanalysis, monthly averages of upper level data:

ds = xr.tutorial.open_dataset("eraint_uvz")
ds
<xarray.Dataset>
Dimensions:    (longitude: 480, latitude: 241, level: 3, month: 2)
Coordinates:
  * longitude  (longitude) float32 -180.0 -179.2 -178.5 ... 177.8 178.5 179.2
  * latitude   (latitude) float32 90.0 89.25 88.5 87.75 ... -88.5 -89.25 -90.0
  * level      (level) int32 200 500 850
  * month      (month) int32 1 7
Data variables:
    z          (month, level, latitude, longitude) float64 ...
    u          (month, level, latitude, longitude) float64 ...
    v          (month, level, latitude, longitude) float64 ...
Attributes:
    Conventions:  CF-1.0
    Info:         Monthly ERA-Interim data. Downloaded and edited by fabien.m...

This Dataset is indexed by longitude and latitude representing the spatial grid. When sampling points using ds.xvec.extract_points, you are replacing these two dimensions with a single one with shapely geometry.

Array of shapely geometries#

Create an array of points used for sampling. Usually, you would have specific locations, like weather stations, cities or anything else of interest. Here, we can create points as a range within the bounds.

points = shapely.points(
    np.linspace(ds.longitude.min(), ds.longitude.max(), 10),
    np.linspace(ds.latitude.min(), ds.latitude.max(), 10),
)
points
array([<POINT (-180 -90)>, <POINT (-140.083 -70)>, <POINT (-100.167 -50)>,
       <POINT (-60.25 -30)>, <POINT (-20.333 -10)>, <POINT (19.583 10)>,
       <POINT (59.5 30)>, <POINT (99.417 50)>, <POINT (139.333 70)>,
       <POINT (179.25 90)>], dtype=object)

Using the .xvec.extract_points method with a numpy array of geometries will create a Dataset indexed by the GeometryIndex:

extracted = ds.xvec.extract_points(points, x_coords="longitude", y_coords="latitude")
extracted
<xarray.Dataset>
Dimensions:   (level: 3, month: 2, geometry: 10)
Coordinates:
  * level     (level) int32 200 500 850
  * month     (month) int32 1 7
  * geometry  (geometry) object POINT (-180 -90) ... POINT (179.25 90)
Data variables:
    z         (month, level, geometry) float64 ...
    u         (month, level, geometry) float64 ...
    v         (month, level, geometry) float64 ...
Indexes:
    geometry  GeometryIndex (crs=None)
Attributes:
    Conventions:  CF-1.0
    Info:         Monthly ERA-Interim data. Downloaded and edited by fabien.m...

However, since the numpy array of geometries does not hold any information on CRS, the resulting GeometryIndex has no CRS assigned.

extracted.xindexes
Indexes:
    level     PandasIndex
    month     PandasIndex
    geometry  GeometryIndex (crs=None)

In that situation, you can (and should) specify it manually:

extracted = ds.xvec.extract_points(
    points, x_coords="longitude", y_coords="latitude", crs=4326
)
extracted
<xarray.Dataset>
Dimensions:   (level: 3, month: 2, geometry: 10)
Coordinates:
  * level     (level) int32 200 500 850
  * month     (month) int32 1 7
  * geometry  (geometry) object POINT (-180 -90) ... POINT (179.25 90)
Data variables:
    z         (month, level, geometry) float64 ...
    u         (month, level, geometry) float64 ...
    v         (month, level, geometry) float64 ...
Indexes:
    geometry  GeometryIndex (crs=EPSG:4326)
Attributes:
    Conventions:  CF-1.0
    Info:         Monthly ERA-Interim data. Downloaded and edited by fabien.m...
extracted.geometry.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

GeoPandas GeoSeries#

If your points are stored as a GeoPandas GeoSeries or GeometryArray, Xvec will retrieve the CRS automatically.

gs = gpd.GeoSeries(points, crs=4326)
gs
0    POINT (-180.00000 -90.00000)
1    POINT (-140.08333 -70.00000)
2    POINT (-100.16667 -50.00000)
3     POINT (-60.25000 -30.00000)
4     POINT (-20.33333 -10.00000)
5       POINT (19.58333 10.00000)
6       POINT (59.50000 30.00000)
7       POINT (99.41667 50.00000)
8      POINT (139.33333 70.00000)
9      POINT (179.25000 90.00000)
dtype: geometry
extracted = ds.xvec.extract_points(gs, x_coords="longitude", y_coords="latitude")
extracted
<xarray.Dataset>
Dimensions:   (level: 3, month: 2, geometry: 10)
Coordinates:
  * level     (level) int32 200 500 850
  * month     (month) int32 1 7
  * geometry  (geometry) object POINT (-180 -90) ... POINT (179.25 90)
Data variables:
    z         (month, level, geometry) float64 ...
    u         (month, level, geometry) float64 ...
    v         (month, level, geometry) float64 ...
Indexes:
    geometry  GeometryIndex (crs=EPSG:4326)
Attributes:
    Conventions:  CF-1.0
    Info:         Monthly ERA-Interim data. Downloaded and edited by fabien.m...

When using a GeoSeries, you can also store its index as another coordinate of the geometry dimension. This may be useful when a GeoSeries comes from a GeoDataFrame and you want to later link the results of the extraction back to the GeoDataFrame. The index is stored automatically if it is named or non-default, unless you specify index=False.

extracted = ds.xvec.extract_points(
    gs, x_coords="longitude", y_coords="latitude", index=True
)
extracted
<xarray.Dataset>
Dimensions:   (level: 3, month: 2, geometry: 10)
Coordinates:
  * level     (level) int32 200 500 850
  * month     (month) int32 1 7
  * geometry  (geometry) object POINT (-180 -90) ... POINT (179.25 90)
    index     (geometry) int64 0 1 2 3 4 5 6 7 8 9
Data variables:
    z         (month, level, geometry) float64 ...
    u         (month, level, geometry) float64 ...
    v         (month, level, geometry) float64 ...
Indexes:
    geometry  GeometryIndex (crs=EPSG:4326)
Attributes:
    Conventions:  CF-1.0
    Info:         Monthly ERA-Interim data. Downloaded and edited by fabien.m...

DataArray#

It will also be used if you have a DataArray of shapely geometries with a "crs" key with the CRS information in its attributes. The typical situation is to reuse a DataArray created by xvec before.

extracted.geometry
<xarray.DataArray 'geometry' (geometry: 10)>
array([<POINT (-180 -90)>, <POINT (-140.083 -70)>, <POINT (-100.167 -50)>,
       <POINT (-60.25 -30)>, <POINT (-20.333 -10)>, <POINT (19.583 10)>,
       <POINT (59.5 30)>, <POINT (99.417 50)>, <POINT (139.333 70)>,
       <POINT (179.25 90)>], dtype=object)
Coordinates:
  * geometry  (geometry) object POINT (-180 -90) ... POINT (179.25 90)
    index     (geometry) int64 0 1 2 3 4 5 6 7 8 9
Indexes:
    geometry  GeometryIndex (crs=EPSG:4326)
Attributes:
    crs:      EPSG:4326

You can see above that the extracted.geometry has a crs stored as an attribute.

extracted_da = ds.xvec.extract_points(
    extracted.geometry, x_coords="longitude", y_coords="latitude"
)
extracted_da
<xarray.Dataset>
Dimensions:   (level: 3, month: 2, geometry: 10)
Coordinates:
  * level     (level) int32 200 500 850
  * month     (month) int32 1 7
  * geometry  (geometry) object POINT (-180 -90) ... POINT (179.25 90)
    index     (geometry) int64 0 1 2 3 4 5 6 7 8 9
Data variables:
    z         (month, level, geometry) float64 ...
    u         (month, level, geometry) float64 ...
    v         (month, level, geometry) float64 ...
Indexes:
    geometry  GeometryIndex (crs=EPSG:4326)
Attributes:
    Conventions:  CF-1.0
    Info:         Monthly ERA-Interim data. Downloaded and edited by fabien.m...