Extracting points from a geospatial raster
Contents
Extracting points from a geospatial raster#
When you have an N-dimensional raster cube (Dataset
or DataArray
) composed of geospatial data indexed by latitude and longitude or any projected coordinates, you may be interested in the values from that cube only for a small subset of locations. Vector data cubes are an ideal data structure for such a use case as they preserve the structure of the original cube and all its attributes while allowing you to index it by a point geometry. The geometry can represent any arbitrary points within the bounds of the original raster and does not have to be constrained to a grid.
import geopandas as gpd
import numpy as np
import shapely
import xarray as xr
import xvec
The example using the ERA-Interim reanalysis, monthly averages of upper level data:
ds = xr.tutorial.open_dataset("eraint_uvz")
ds
<xarray.Dataset> Dimensions: (longitude: 480, latitude: 241, level: 3, month: 2) Coordinates: * longitude (longitude) float32 -180.0 -179.2 -178.5 ... 177.8 178.5 179.2 * latitude (latitude) float32 90.0 89.25 88.5 87.75 ... -88.5 -89.25 -90.0 * level (level) int32 200 500 850 * month (month) int32 1 7 Data variables: z (month, level, latitude, longitude) float64 ... u (month, level, latitude, longitude) float64 ... v (month, level, latitude, longitude) float64 ... Attributes: Conventions: CF-1.0 Info: Monthly ERA-Interim data. Downloaded and edited by fabien.m...
This Dataset is indexed by longitude and latitude representing the spatial grid. When sampling points using ds.xvec.extract_points
, you are replacing these two dimensions with a single one with shapely geometry.
Array of shapely geometries#
Create an array of points used for sampling. Usually, you would have specific locations, like weather stations, cities or anything else of interest. Here, we can create points as a range within the bounds.
points = shapely.points(
np.linspace(ds.longitude.min(), ds.longitude.max(), 10),
np.linspace(ds.latitude.min(), ds.latitude.max(), 10),
)
points
array([<POINT (-180 -90)>, <POINT (-140.083 -70)>, <POINT (-100.167 -50)>,
<POINT (-60.25 -30)>, <POINT (-20.333 -10)>, <POINT (19.583 10)>,
<POINT (59.5 30)>, <POINT (99.417 50)>, <POINT (139.333 70)>,
<POINT (179.25 90)>], dtype=object)
Using the .xvec.extract_points
method with a numpy array of geometries will create a Dataset
indexed by the GeometryIndex
:
extracted = ds.xvec.extract_points(points, x_coords="longitude", y_coords="latitude")
extracted
<xarray.Dataset> Dimensions: (level: 3, month: 2, geometry: 10) Coordinates: * level (level) int32 200 500 850 * month (month) int32 1 7 * geometry (geometry) object POINT (-180 -90) ... POINT (179.25 90) Data variables: z (month, level, geometry) float64 ... u (month, level, geometry) float64 ... v (month, level, geometry) float64 ... Indexes: geometry GeometryIndex (crs=None) Attributes: Conventions: CF-1.0 Info: Monthly ERA-Interim data. Downloaded and edited by fabien.m...
However, since the numpy array of geometries does not hold any information on CRS, the resulting GeometryIndex
has no CRS assigned.
extracted.xindexes
Indexes:
level PandasIndex
month PandasIndex
geometry GeometryIndex (crs=None)
In that situation, you can (and should) specify it manually:
extracted = ds.xvec.extract_points(
points, x_coords="longitude", y_coords="latitude", crs=4326
)
extracted
<xarray.Dataset> Dimensions: (level: 3, month: 2, geometry: 10) Coordinates: * level (level) int32 200 500 850 * month (month) int32 1 7 * geometry (geometry) object POINT (-180 -90) ... POINT (179.25 90) Data variables: z (month, level, geometry) float64 ... u (month, level, geometry) float64 ... v (month, level, geometry) float64 ... Indexes: geometry GeometryIndex (crs=EPSG:4326) Attributes: Conventions: CF-1.0 Info: Monthly ERA-Interim data. Downloaded and edited by fabien.m...
extracted.geometry.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
GeoPandas GeoSeries#
If your points are stored as a GeoPandas GeoSeries
or GeometryArray
, Xvec will retrieve the CRS automatically.
gs = gpd.GeoSeries(points, crs=4326)
gs
0 POINT (-180.00000 -90.00000)
1 POINT (-140.08333 -70.00000)
2 POINT (-100.16667 -50.00000)
3 POINT (-60.25000 -30.00000)
4 POINT (-20.33333 -10.00000)
5 POINT (19.58333 10.00000)
6 POINT (59.50000 30.00000)
7 POINT (99.41667 50.00000)
8 POINT (139.33333 70.00000)
9 POINT (179.25000 90.00000)
dtype: geometry
extracted = ds.xvec.extract_points(gs, x_coords="longitude", y_coords="latitude")
extracted
<xarray.Dataset> Dimensions: (level: 3, month: 2, geometry: 10) Coordinates: * level (level) int32 200 500 850 * month (month) int32 1 7 * geometry (geometry) object POINT (-180 -90) ... POINT (179.25 90) Data variables: z (month, level, geometry) float64 ... u (month, level, geometry) float64 ... v (month, level, geometry) float64 ... Indexes: geometry GeometryIndex (crs=EPSG:4326) Attributes: Conventions: CF-1.0 Info: Monthly ERA-Interim data. Downloaded and edited by fabien.m...
When using a GeoSeries, you can also store its index as another coordinate of the geometry dimension. This may be useful when a GeoSeries comes from a GeoDataFrame and you want to later link the results of the extraction back to the GeoDataFrame. The index is stored automatically if it is named or non-default, unless you specify index=False
.
extracted = ds.xvec.extract_points(gs, x_coords="longitude", y_coords="latitude", index=True)
extracted
<xarray.Dataset> Dimensions: (level: 3, month: 2, geometry: 10) Coordinates: * level (level) int32 200 500 850 * month (month) int32 1 7 * geometry (geometry) object POINT (-180 -90) ... POINT (179.25 90) index (geometry) int64 0 1 2 3 4 5 6 7 8 9 Data variables: z (month, level, geometry) float64 ... u (month, level, geometry) float64 ... v (month, level, geometry) float64 ... Indexes: geometry GeometryIndex (crs=EPSG:4326) Attributes: Conventions: CF-1.0 Info: Monthly ERA-Interim data. Downloaded and edited by fabien.m...
DataArray#
It will also be used if you have a DataArray of shapely geometries with a "crs"
key with the CRS information in its attributes. The typical situation is to reuse a DataArray created by xvec before.
extracted.geometry
<xarray.DataArray 'geometry' (geometry: 10)> array([<POINT (-180 -90)>, <POINT (-140.083 -70)>, <POINT (-100.167 -50)>, <POINT (-60.25 -30)>, <POINT (-20.333 -10)>, <POINT (19.583 10)>, <POINT (59.5 30)>, <POINT (99.417 50)>, <POINT (139.333 70)>, <POINT (179.25 90)>], dtype=object) Coordinates: * geometry (geometry) object POINT (-180 -90) ... POINT (179.25 90) index (geometry) int64 0 1 2 3 4 5 6 7 8 9 Indexes: geometry GeometryIndex (crs=EPSG:4326) Attributes: crs: epsg:4326
You can see above that the extracted.geometry
has a crs
stored as an attribute.
extracted_da = ds.xvec.extract_points(
extracted.geometry, x_coords="longitude", y_coords="latitude"
)
extracted_da
<xarray.Dataset> Dimensions: (level: 3, month: 2, geometry: 10) Coordinates: * level (level) int32 200 500 850 * month (month) int32 1 7 * geometry (geometry) object POINT (-180 -90) ... POINT (179.25 90) index (geometry) int64 0 1 2 3 4 5 6 7 8 9 Data variables: z (month, level, geometry) float64 ... u (month, level, geometry) float64 ... v (month, level, geometry) float64 ... Indexes: geometry GeometryIndex (crs=EPSG:4326) Attributes: Conventions: CF-1.0 Info: Monthly ERA-Interim data. Downloaded and edited by fabien.m...