Extracting points from a geospatial raster#
When you have an N-dimensional raster cube (Dataset or DataArray) composed of geospatial data indexed by latitude and longitude or any projected coordinates, you may be interested in the values from that cube only for a small subset of locations. Vector data cubes are an ideal data structure for such a use case as they preserve the structure of the original cube and all its attributes while allowing you to index it by a point geometry. The geometry can represent any arbitrary points within the bounds of the original raster and does not have to be constrained to a grid.
import geopandas as gpd
import numpy as np
import shapely
import xarray as xr
import xvec
The example using the ERA-Interim reanalysis, monthly averages of upper level data:
ds = xr.tutorial.open_dataset("eraint_uvz")
ds
/Users/martin/dev/xvec/.pixi/envs/default/lib/python3.13/site-packages/xarray/conventions.py:200: SerializationWarning: variable 'z' has non-conforming '_FillValue' np.float64(nan) defined, dropping '_FillValue' entirely.
var = coder.decode(var, name=name)
/Users/martin/dev/xvec/.pixi/envs/default/lib/python3.13/site-packages/xarray/conventions.py:200: SerializationWarning: variable 'u' has non-conforming '_FillValue' np.float64(nan) defined, dropping '_FillValue' entirely.
var = coder.decode(var, name=name)
/Users/martin/dev/xvec/.pixi/envs/default/lib/python3.13/site-packages/xarray/conventions.py:200: SerializationWarning: variable 'v' has non-conforming '_FillValue' np.float64(nan) defined, dropping '_FillValue' entirely.
var = coder.decode(var, name=name)
<xarray.Dataset> Size: 17MB
Dimensions: (longitude: 480, latitude: 241, level: 3, month: 2)
Coordinates:
* longitude (longitude) float32 2kB -180.0 -179.2 -178.5 ... 178.5 179.2
* latitude (latitude) float32 964B 90.0 89.25 88.5 ... -88.5 -89.25 -90.0
* level (level) int32 12B 200 500 850
* month (month) int32 8B 1 7
Data variables:
z (month, level, latitude, longitude) float64 6MB ...
u (month, level, latitude, longitude) float64 6MB ...
v (month, level, latitude, longitude) float64 6MB ...
Attributes:
Conventions: CF-1.0
Info: Monthly ERA-Interim data. Downloaded and edited by fabien.m...This Dataset is indexed by longitude and latitude representing the spatial grid. When sampling points using ds.xvec.extract_points, you are replacing these two dimensions with a single one with shapely geometry.
Array of shapely geometries#
Create an array of points used for sampling. Usually, you would have specific locations, like weather stations, cities or anything else of interest. Here, we can create points as a range within the bounds.
points = shapely.points(
np.linspace(ds.longitude.min().item(), ds.longitude.max().item(), 10),
np.linspace(ds.latitude.min().item(), ds.latitude.max().item(), 10),
)
points
array([<POINT (-180 -90)>, <POINT (-140.083 -70)>, <POINT (-100.167 -50)>,
<POINT (-60.25 -30)>, <POINT (-20.333 -10)>, <POINT (19.583 10)>,
<POINT (59.5 30)>, <POINT (99.417 50)>, <POINT (139.333 70)>,
<POINT (179.25 90)>], dtype=object)
Using the .xvec.extract_points method with a numpy array of geometries will create a Dataset indexed by the GeometryIndex:
extracted = ds.xvec.extract_points(points, x_coords="longitude", y_coords="latitude")
extracted
<xarray.Dataset> Size: 2kB
Dimensions: (level: 3, month: 2, geometry: 10)
Coordinates:
* level (level) int32 12B 200 500 850
* month (month) int32 8B 1 7
* geometry (geometry) object 80B POINT (-180 -90) ... POINT (179.25 90)
Data variables:
z (month, level, geometry) float64 480B ...
u (month, level, geometry) float64 480B ...
v (month, level, geometry) float64 480B ...
Indexes:
geometry GeometryIndex (crs=None)
Attributes:
Conventions: CF-1.0
Info: Monthly ERA-Interim data. Downloaded and edited by fabien.m...However, since the numpy array of geometries does not hold any information on CRS, the resulting GeometryIndex has no CRS assigned.
extracted.xindexes
Indexes:
level PandasIndex
month PandasIndex
geometry GeometryIndex (crs=None)
In that situation, you can (and should) specify it manually:
extracted = ds.xvec.extract_points(
points, x_coords="longitude", y_coords="latitude", crs=4326
)
extracted
<xarray.Dataset> Size: 2kB
Dimensions: (level: 3, month: 2, geometry: 10)
Coordinates:
* level (level) int32 12B 200 500 850
* month (month) int32 8B 1 7
* geometry (geometry) object 80B POINT (-180 -90) ... POINT (179.25 90)
Data variables:
z (month, level, geometry) float64 480B ...
u (month, level, geometry) float64 480B ...
v (month, level, geometry) float64 480B ...
Indexes:
geometry GeometryIndex (crs=EPSG:4326)
Attributes:
Conventions: CF-1.0
Info: Monthly ERA-Interim data. Downloaded and edited by fabien.m...extracted.geometry.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
GeoPandas GeoSeries#
If your points are stored as a GeoPandas GeoSeries or GeometryArray, Xvec will retrieve the CRS automatically.
gs = gpd.GeoSeries(points, crs=4326)
gs
0 POINT (-180 -90)
1 POINT (-140.08333 -70)
2 POINT (-100.16667 -50)
3 POINT (-60.25 -30)
4 POINT (-20.33333 -10)
5 POINT (19.58333 10)
6 POINT (59.5 30)
7 POINT (99.41667 50)
8 POINT (139.33333 70)
9 POINT (179.25 90)
dtype: geometry
extracted = ds.xvec.extract_points(gs, x_coords="longitude", y_coords="latitude")
extracted
<xarray.Dataset> Size: 2kB
Dimensions: (level: 3, month: 2, geometry: 10)
Coordinates:
* level (level) int32 12B 200 500 850
* month (month) int32 8B 1 7
* geometry (geometry) object 80B POINT (-180 -90) ... POINT (179.25 90)
Data variables:
z (month, level, geometry) float64 480B ...
u (month, level, geometry) float64 480B ...
v (month, level, geometry) float64 480B ...
Indexes:
geometry GeometryIndex (crs=EPSG:4326)
Attributes:
Conventions: CF-1.0
Info: Monthly ERA-Interim data. Downloaded and edited by fabien.m...When using a GeoSeries, you can also store its index as another coordinate of the geometry dimension. This may be useful when a GeoSeries comes from a GeoDataFrame and you want to later link the results of the extraction back to the GeoDataFrame. The index is stored automatically if it is named or non-default, unless you specify index=False.
extracted = ds.xvec.extract_points(
gs, x_coords="longitude", y_coords="latitude", index=True
)
extracted
<xarray.Dataset> Size: 2kB
Dimensions: (level: 3, month: 2, geometry: 10)
Coordinates:
* level (level) int32 12B 200 500 850
* month (month) int32 8B 1 7
* geometry (geometry) object 80B POINT (-180 -90) ... POINT (179.25 90)
index (geometry) int64 80B 0 1 2 3 4 5 6 7 8 9
Data variables:
z (month, level, geometry) float64 480B ...
u (month, level, geometry) float64 480B ...
v (month, level, geometry) float64 480B ...
Indexes:
geometry GeometryIndex (crs=EPSG:4326)
Attributes:
Conventions: CF-1.0
Info: Monthly ERA-Interim data. Downloaded and edited by fabien.m...DataArray#
It will also be used if you have an one-dimensional DataArray of shapely geometries with a "crs" key with the CRS information in its attributes. The typical situation is to reuse a DataArray created by xvec before.
extracted.geometry
<xarray.DataArray 'geometry' (geometry: 10)> Size: 80B
array([<POINT (-180 -90)>, <POINT (-140.083 -70)>, <POINT (-100.167 -50)>,
<POINT (-60.25 -30)>, <POINT (-20.333 -10)>, <POINT (19.583 10)>,
<POINT (59.5 30)>, <POINT (99.417 50)>, <POINT (139.333 70)>,
<POINT (179.25 90)>], dtype=object)
Coordinates:
* geometry (geometry) object 80B POINT (-180 -90) ... POINT (179.25 90)
index (geometry) int64 80B 0 1 2 3 4 5 6 7 8 9
Indexes:
geometry GeometryIndex (crs=EPSG:4326)
Attributes:
crs: EPSG:4326You can see above that the extracted.geometry has a crs stored as an attribute.
extracted_da = ds.xvec.extract_points(
extracted.geometry, x_coords="longitude", y_coords="latitude"
)
extracted_da
<xarray.Dataset> Size: 2kB
Dimensions: (level: 3, month: 2, geometry: 10)
Coordinates:
* level (level) int32 12B 200 500 850
* month (month) int32 8B 1 7
* geometry (geometry) object 80B POINT (-180 -90) ... POINT (179.25 90)
index (geometry) int64 80B 0 1 2 3 4 5 6 7 8 9
Data variables:
z (month, level, geometry) float64 480B ...
u (month, level, geometry) float64 480B ...
v (month, level, geometry) float64 480B ...
Indexes:
geometry GeometryIndex (crs=EPSG:4326)
Attributes:
Conventions: CF-1.0
Info: Monthly ERA-Interim data. Downloaded and edited by fabien.m...