GeoPandas interface#

Due to the nature of operations done with the vector data cubes, you will often need to convert the Xarray object to GeoPandas GeoDataFrames. Xvec offers a set of methods extending support for exporting Xarray to pandas.

import geopandas as gpd
import numpy as np
import pandas as pd
import xarray as xr
import xvec

from geodatasets import get_path

Create a dummy DataArray with two dimensions indexed by geometry, mimicking transport counts between communities of Chicago.

Xvec offers .xvec.to_geopandas() and .xvec.to_geodataframe() methods as counterparts to the default to_pandas() and to_dataframe() methods built-in Xarray. While the former works on arrays with 2 dimensions or less, the latter is generalisable and always returns all coordinates. Consider the following examples.

Select one day, one time and one mode and convert the data to a GeoDataFrame.

traffic_counts.sel(date="2023-02-28", time=12, mode="bike").xvec.to_geodataframe(
    geometry="origin"
)

	origin	destination	mode	date	time	traffic_counts
0	POLYGON ((-87.60914 41.84469, -87.60915 41.844...	POLYGON ((-87.60914 41.84469, -87.60915 41.844...	bike	2023-02-28	12	19
1	POLYGON ((-87.60914 41.84469, -87.60915 41.844...	POLYGON ((-87.59215 41.81693, -87.59231 41.816...	bike	2023-02-28	12	23
2	POLYGON ((-87.60914 41.84469, -87.60915 41.844...	POLYGON ((-87.6288 41.80189, -87.62879 41.8017...	bike	2023-02-28	12	20
3	POLYGON ((-87.60914 41.84469, -87.60915 41.844...	POLYGON ((-87.60671 41.81681, -87.6067 41.8165...	bike	2023-02-28	12	13
4	POLYGON ((-87.60914 41.84469, -87.60915 41.844...	POLYGON ((-87.59215 41.81693, -87.59215 41.816...	bike	2023-02-28	12	13
...	...	...	...	...	...	...
5924	POLYGON ((-87.80676 42.00084, -87.80676 42.000...	POLYGON ((-87.69646 41.70714, -87.69644 41.706...	bike	2023-02-28	12	53
5925	POLYGON ((-87.80676 42.00084, -87.80676 42.000...	POLYGON ((-87.64215 41.68508, -87.64249 41.685...	bike	2023-02-28	12	47
5926	POLYGON ((-87.80676 42.00084, -87.80676 42.000...	MULTIPOLYGON (((-87.83658 41.9864, -87.83658 4...	bike	2023-02-28	12	26
5927	POLYGON ((-87.80676 42.00084, -87.80676 42.000...	POLYGON ((-87.65456 41.99817, -87.65456 41.998...	bike	2023-02-28	12	75
5928	POLYGON ((-87.80676 42.00084, -87.80676 42.000...	POLYGON ((-87.80676 42.00084, -87.80676 42.000...	bike	2023-02-28	12	62

5929 rows × 6 columns

You end up with two geometry columns. Because none is primary by default, the cell above contains geometry="origin" to specify which of them shall become the active geometry of the GeoDataFrame. You can see that all coordinates are present, but mode, date and time are constants.

However, exporting the same using .xvec.to_geopandas() does not work because both index and columns are arrays of geometries.

traffic_counts.sel(date="2023-02-28", time=12, mode="bike").xvec.to_geopandas()

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[4], line 1
----> 1 traffic_counts.sel(date="2023-02-28", time=12, mode="bike").xvec.to_geopandas()

File ~/Git/xvec/xvec/accessor.py:748, in XvecAccessor.to_geopandas(self)
    742     raise ValueError(
    743         f"Cannot convert arrays with {self._obj.ndim} dimensions into pandas "
    744         "objects. Requires 2 or fewer dimensions."
    745     )
    747 if len(self._geom_indexes) > 1:
--> 748     raise ValueError(
    749         "Multiple coordinates based on xvec.GeometryIndex are not supported "
    750         "as GeoPandas.GeoDataFrame cannot be indexed by geometry. Try using "
    751         "`.xvec.to_geodataframe()` instead."
    752     )
    754 # DataArray
    755 if isinstance(self._obj, xr.DataArray):

ValueError: Multiple coordinates based on xvec.GeometryIndex are not supported as GeoPandas.GeoDataFrame cannot be indexed by geometry. Try using `.xvec.to_geodataframe()` instead.

You can use the method if you ensure that only one array of geometries is returned.

traffic_counts.sel(date="2023-02-28", time=12, origin=origin[0]).xvec.to_geopandas()

mode	destination	car	bike	foot
0	POLYGON ((-87.60914 41.84469, -87.60915 41.844...	48	19	97
1	POLYGON ((-87.59215 41.81693, -87.59231 41.816...	77	23	25
2	POLYGON ((-87.6288 41.80189, -87.62879 41.8017...	43	20	59
3	POLYGON ((-87.60671 41.81681, -87.6067 41.8165...	96	13	42
4	POLYGON ((-87.59215 41.81693, -87.59215 41.816...	38	13	48
...	...	...	...	...
72	POLYGON ((-87.69646 41.70714, -87.69644 41.706...	54	13	22
73	POLYGON ((-87.64215 41.68508, -87.64249 41.685...	76	10	82
74	MULTIPOLYGON (((-87.83658 41.9864, -87.83658 4...	49	82	14
75	POLYGON ((-87.65456 41.99817, -87.65456 41.998...	84	28	16
76	POLYGON ((-87.80676 42.00084, -87.80676 42.000...	59	25	12

77 rows × 4 columns

Compared to the to_geodataframe, you can see that constants are removed, and the final GeoDataFrame is much smaller.

traffic_counts.sel(date="2023-02-28", time=12, origin=origin[0]).xvec.to_geodataframe()

	destination	date	time	origin	traffic_counts
mode
car	POLYGON ((-87.60914 41.84469, -87.60915 41.844...	2023-02-28	12	POLYGON ((-87.60914 41.84469, -87.60915 41.844...	48
car	POLYGON ((-87.59215 41.81693, -87.59231 41.816...	2023-02-28	12	POLYGON ((-87.60914 41.84469, -87.60915 41.844...	77
car	POLYGON ((-87.6288 41.80189, -87.62879 41.8017...	2023-02-28	12	POLYGON ((-87.60914 41.84469, -87.60915 41.844...	43
car	POLYGON ((-87.60671 41.81681, -87.6067 41.8165...	2023-02-28	12	POLYGON ((-87.60914 41.84469, -87.60915 41.844...	96
car	POLYGON ((-87.59215 41.81693, -87.59215 41.816...	2023-02-28	12	POLYGON ((-87.60914 41.84469, -87.60915 41.844...	38
...	...	...	...	...	...
foot	POLYGON ((-87.69646 41.70714, -87.69644 41.706...	2023-02-28	12	POLYGON ((-87.60914 41.84469, -87.60915 41.844...	22
foot	POLYGON ((-87.64215 41.68508, -87.64249 41.685...	2023-02-28	12	POLYGON ((-87.60914 41.84469, -87.60915 41.844...	82
foot	MULTIPOLYGON (((-87.83658 41.9864, -87.83658 4...	2023-02-28	12	POLYGON ((-87.60914 41.84469, -87.60915 41.844...	14
foot	POLYGON ((-87.65456 41.99817, -87.65456 41.998...	2023-02-28	12	POLYGON ((-87.60914 41.84469, -87.60915 41.844...	16
foot	POLYGON ((-87.80676 42.00084, -87.80676 42.000...	2023-02-28	12	POLYGON ((-87.60914 41.84469, -87.60915 41.844...	12

231 rows × 5 columns

Long form and wide form#

Unlike generic tabular data, geospatial data are often designed as wide form tables to ensure that each geometry is present exactly one in the GeoDataFrame, which is not the case above. For this reason, Xvec allows you to export a wide form directly from Xarray, given you have exactly one dimension indexed by geometry.

The default is the same as for vanilla Xarray, so a long form data frame.

demography.xvec.to_geodataframe()

	county	population	unemployment	divorce	age
year
1960	POLYGON ((-95.34258 48.5467, -95.34081 48.7151...	4304	7.900000	1.858974	28.800000
1970	POLYGON ((-95.34258 48.5467, -95.34081 48.7151...	3987	9.000000	2.619808	30.500000
1980	POLYGON ((-95.34258 48.5467, -95.34081 48.7151...	3764	5.902579	3.746594	34.500000
1990	POLYGON ((-95.34258 48.5467, -95.34081 48.7151...	4076	3.894790	7.388535	35.500000
1960	POLYGON ((-118.8505 47.94969, -118.84732 48.47...	3889	8.200000	2.863278	25.900000
...	...	...	...	...	...
1990	POLYGON ((-77.53178 38.56506, -77.72094 38.840...	250377	3.353981	5.750177	28.466667
1960	POLYGON ((-111.37152 44.74516, -111.36839 45.3...	39260	5.218039	2.777586	28.250000
1970	POLYGON ((-111.37152 44.74516, -111.36839 45.3...	43766	5.444291	2.724604	26.400000
1980	POLYGON ((-111.37152 44.74516, -111.36839 45.3...	55800	7.017943	4.782264	28.966667
1990	POLYGON ((-111.37152 44.74516, -111.36839 45.3...	65077	5.488553	7.414559	35.333333

12340 rows × 5 columns

You can pass long=False to generate a wide form with multi-indexed columns instead.

demography.xvec.to_geodataframe(long=False)

	county	population				unemployment				divorce				age
year		1960	1970	1980	1990	1960	1970	1980	1990	1960	1970	1980	1990	1960	1970	1980	1990
0	POLYGON ((-95.34258 48.5467, -95.34081 48.7151...	4304	3987	3764	4076	7.900000	9.000000	5.902579	3.894790	1.858974	2.619808	3.746594	7.388535	28.80	30.5	34.500000	35.500000
1	POLYGON ((-118.8505 47.94969, -118.84732 48.47...	3889	3655	5811	6295	8.200000	15.400000	15.422886	16.811594	2.863278	3.686007	6.625442	11.543135	25.90	27.1	27.200000	32.800000
2	POLYGON ((-117.43777 48.04422, -117.54113 48.0...	17884	17405	28979	30948	10.100000	9.000000	13.574064	10.700794	2.711447	2.976378	5.448223	9.123212	29.60	31.8	28.700000	34.500000
3	POLYGON ((-118.97096 47.93928, -118.97293 47.9...	25520	25867	30639	33350	7.500000	10.500000	12.700195	10.203544	3.372041	4.090386	7.122333	9.245627	29.40	31.1	31.200000	35.000000
4	POLYGON ((-117.4375 49, -117.03098 49, -117.02...	6914	6025	8580	8915	10.600000	13.400000	18.148999	14.991023	3.008988	4.010695	5.287860	10.158351	31.20	33.8	31.300000	36.100000
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
3080	POLYGON ((-114.51984 33.02767, -114.5583 33.03...	46235	60827	90554	120739	4.600000	4.800000	8.012906	10.910351	3.702838	4.160574	5.730308	7.798840	25.70	24.6	27.900000	33.900000
3081	POLYGON ((-107.19484 34.58352, -107.7178 34.58...	39085	40576	61115	69029	6.100000	3.800000	8.375127	11.002271	1.880621	2.274648	5.372952	9.078873	20.10	21.7	25.500000	30.450000
3082	POLYGON ((-76.39569 37.10771, -76.4027 37.0905...	21583	33203	44189	53427	2.500000	3.105065	5.037045	4.102888	1.113223	1.550323	3.257639	4.850746	24.40	24.4	29.800000	34.000000
3083	POLYGON ((-77.53178 38.56506, -77.72094 38.840...	50164	111102	166665	250377	3.300000	2.361559	3.708886	3.353981	1.672391	1.468343	3.742207	5.750177	22.30	21.8	25.666667	28.466667
3084	POLYGON ((-111.37152 44.74516, -111.36839 45.3...	39260	43766	55800	65077	5.218039	5.444291	7.017943	5.488553	2.777586	2.724604	4.782264	7.414559	28.25	26.4	28.966667	35.333333

3085 rows × 17 columns

Differences compared to the default pandas exports#

While both methods wrap native Xarray methods to_pandas and to_dataframe, there are differences in behavior. Since GeoPandas doesn’t support GeometryArray as an index (or columns), Xvec transforms the resulting DataFrame if that happens. It means either resetting the index to get geometries as a standard GeoSeries or transposing the DataFrame and then resetting the index (if geometries are indexing columns).

GeoPandas interface

Contents

GeoPandas interface#

Long form and wide form#

Differences compared to the default pandas exports#