GeoPandas interface#

Due to the nature of operations done with the vector data cubes, you will often need to convert the Xarray object to GeoPandas GeoDataFrames. Xvec offers a set of methods extending support for exporting Xarray to pandas.

import geopandas as gpd
import numpy as np
import pandas as pd
import xarray as xr
import xvec

from geodatasets import get_path

Create a dummy DataArray with two dimensions indexed by geometry, mimicking transport counts between communities of Chicago.

Hide code cell source
chicago = gpd.read_file(get_path("geoda.chicago health"))

origin = destination = chicago.geometry.array
mode = ["car", "bike", "foot"]
date = pd.date_range("2023-01-01", periods=100)
hours = range(24)
rng = np.random.default_rng(1)
data = rng.integers(1, 100, size=(3, 100, 24, len(chicago), len(chicago)))
traffic_counts = xr.DataArray(
    data,
    coords=(mode, date, hours, origin, destination),
    dims=["mode", "date", "time", "origin", "destination"],
    name="traffic_counts",
).xvec.set_geom_indexes(["origin", "destination"], crs=chicago.crs)
traffic_counts
<xarray.DataArray 'traffic_counts' (mode: 3, date: 100, time: 24, origin: 77,
                                    destination: 77)>
array([[[[[47, 51, 75, ..., 72, 53, 87],
          [46, 37,  7, ...,  3, 49, 67],
          [45, 91, 95, ..., 18, 70, 77],
          ...,
          [39, 93, 18, ..., 85,  5, 76],
          [91, 96, 10, ..., 50, 96, 22],
          [66, 11,  6, ..., 35, 34, 15]],

         [[ 2, 30, 42, ..., 87, 37, 30],
          [ 2, 74, 64, ..., 78, 40,  3],
          [93, 86, 92, ..., 24, 63, 84],
          ...,
          [18, 32, 35, ..., 20, 65, 72],
          [ 9, 92, 95, ..., 63, 60, 59],
          [74, 82, 71, ..., 58, 27, 68]],

         [[45, 63, 94, ..., 61, 78, 60],
          [90, 50, 94, ..., 57, 97, 37],
          [47, 67, 73, ..., 35, 89, 70],
          ...,
...
          ...,
          [69, 11, 20, ..., 44, 86,  3],
          [75, 80, 55, ..., 50, 64, 58],
          [70, 12, 36, ..., 82, 25, 18]],

         [[40, 64, 76, ..., 90, 79, 88],
          [31, 96, 25, ..., 38, 50,  2],
          [36, 93, 99, ..., 31, 31, 38],
          ...,
          [43, 94, 80, ..., 39, 53, 34],
          [47, 13, 90, ..., 92, 94,  5],
          [27, 21, 59, ..., 46, 90, 31]],

         [[99, 42, 31, ..., 51, 74, 31],
          [46, 83, 96, ..., 11, 86, 20],
          [18, 38,  6, ..., 91, 48, 54],
          ...,
          [28, 88, 56, ...,  2,  4, 44],
          [88, 73, 55, ..., 17, 52,  9],
          [ 2, 29, 57, ..., 99, 50, 82]]]]])
Coordinates:
  * mode         (mode) <U4 'car' 'bike' 'foot'
  * date         (date) datetime64[ns] 2023-01-01 2023-01-02 ... 2023-04-10
  * time         (time) int64 0 1 2 3 4 5 6 7 8 9 ... 15 16 17 18 19 20 21 22 23
  * origin       (origin) object POLYGON ((-87.60914087617012 41.844692503461...
  * destination  (destination) object POLYGON ((-87.60914087617012 41.8446925...
Indexes:
    origin       GeometryIndex (crs=GEOGCS["WGS84(DD)",DATUM["WGS_1984",SPHEROID["WGS84",6378137,29 ...)
    destination  GeometryIndex (crs=GEOGCS["WGS84(DD)",DATUM["WGS_1984",SPHEROID["WGS84",6378137,29 ...)

Xvec offers .xvec.to_geopandas() and .xvec.to_geodataframe() methods as counterparts to the default to_pandas() and to_dataframe() methods built-in Xarray. While the former works on arrays with 2 dimensions or less, the latter is generalisable and always returns all coordinates. Consider the following examples.

Select one day, one time and one mode and convert the data to a GeoDataFrame.

traffic_counts.sel(date="2023-02-28", time=12, mode="bike").xvec.to_geodataframe(
    geometry="origin"
)
origin destination mode date time traffic_counts
0 POLYGON ((-87.60914 41.84469, -87.60915 41.844... POLYGON ((-87.60914 41.84469, -87.60915 41.844... bike 2023-02-28 12 19
1 POLYGON ((-87.60914 41.84469, -87.60915 41.844... POLYGON ((-87.59215 41.81693, -87.59231 41.816... bike 2023-02-28 12 23
2 POLYGON ((-87.60914 41.84469, -87.60915 41.844... POLYGON ((-87.62880 41.80189, -87.62879 41.801... bike 2023-02-28 12 20
3 POLYGON ((-87.60914 41.84469, -87.60915 41.844... POLYGON ((-87.60671 41.81681, -87.60670 41.816... bike 2023-02-28 12 13
4 POLYGON ((-87.60914 41.84469, -87.60915 41.844... POLYGON ((-87.59215 41.81693, -87.59215 41.816... bike 2023-02-28 12 13
... ... ... ... ... ... ...
5924 POLYGON ((-87.80676 42.00084, -87.80676 42.000... POLYGON ((-87.69646 41.70714, -87.69644 41.706... bike 2023-02-28 12 53
5925 POLYGON ((-87.80676 42.00084, -87.80676 42.000... POLYGON ((-87.64215 41.68508, -87.64249 41.685... bike 2023-02-28 12 47
5926 POLYGON ((-87.80676 42.00084, -87.80676 42.000... MULTIPOLYGON (((-87.83658 41.98640, -87.83658 ... bike 2023-02-28 12 26
5927 POLYGON ((-87.80676 42.00084, -87.80676 42.000... POLYGON ((-87.65456 41.99817, -87.65456 41.998... bike 2023-02-28 12 75
5928 POLYGON ((-87.80676 42.00084, -87.80676 42.000... POLYGON ((-87.80676 42.00084, -87.80676 42.000... bike 2023-02-28 12 62

5929 rows × 6 columns

You end up with two geometry columns. Because none is primary by default, the cell above contains geometry="origin" to specify which of them shall become the active geometry of the GeoDataFrame. You can see that all coordinates are present, but mode, date and time are constants.

However, exporting the same using .xvec.to_geopandas() does not work because both index and columns are arrays of geometries.

traffic_counts.sel(date="2023-02-28", time=12, mode="bike").xvec.to_geopandas()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[4], line 1
----> 1 traffic_counts.sel(date="2023-02-28", time=12, mode="bike").xvec.to_geopandas()

File ~/Git/xvec/xvec/accessor.py:754, in XvecAccessor.to_geopandas(self)
    748     raise ValueError(
    749         f"Cannot convert arrays with {self._obj.ndim} dimensions into pandas "
    750         "objects. Requires 2 or fewer dimensions."
    751     )
    753 if len(self._geom_indexes) > 1:
--> 754     raise ValueError(
    755         "Multiple coordinates based on xvec.GeometryIndex are not supported "
    756         "as GeoPandas.GeoDataFrame cannot be indexed by geometry. Try using "
    757         "`.xvec.to_geodataframe()` instead."
    758     )
    760 # DataArray
    761 if isinstance(self._obj, xr.DataArray):

ValueError: Multiple coordinates based on xvec.GeometryIndex are not supported as GeoPandas.GeoDataFrame cannot be indexed by geometry. Try using `.xvec.to_geodataframe()` instead.

You can use the method if you ensure that only one array of geometries is returned.

traffic_counts.sel(date="2023-02-28", time=12, origin=origin[0]).xvec.to_geopandas()
mode destination car bike foot
0 POLYGON ((-87.60914 41.84469, -87.60915 41.844... 48 19 97
1 POLYGON ((-87.59215 41.81693, -87.59231 41.816... 77 23 25
2 POLYGON ((-87.62880 41.80189, -87.62879 41.801... 43 20 59
3 POLYGON ((-87.60671 41.81681, -87.60670 41.816... 96 13 42
4 POLYGON ((-87.59215 41.81693, -87.59215 41.816... 38 13 48
... ... ... ... ...
72 POLYGON ((-87.69646 41.70714, -87.69644 41.706... 54 13 22
73 POLYGON ((-87.64215 41.68508, -87.64249 41.685... 76 10 82
74 MULTIPOLYGON (((-87.83658 41.98640, -87.83658 ... 49 82 14
75 POLYGON ((-87.65456 41.99817, -87.65456 41.998... 84 28 16
76 POLYGON ((-87.80676 42.00084, -87.80676 42.000... 59 25 12

77 rows × 4 columns

Compared to the to_geodataframe, you can see that constants are removed, and the final GeoDataFrame is much smaller.

traffic_counts.sel(date="2023-02-28", time=12, origin=origin[0]).xvec.to_geodataframe()
destination date time origin traffic_counts
mode
car POLYGON ((-87.60914 41.84469, -87.60915 41.844... 2023-02-28 12 POLYGON ((-87.60914 41.84469, -87.60915 41.844... 48
car POLYGON ((-87.59215 41.81693, -87.59231 41.816... 2023-02-28 12 POLYGON ((-87.60914 41.84469, -87.60915 41.844... 77
car POLYGON ((-87.62880 41.80189, -87.62879 41.801... 2023-02-28 12 POLYGON ((-87.60914 41.84469, -87.60915 41.844... 43
car POLYGON ((-87.60671 41.81681, -87.60670 41.816... 2023-02-28 12 POLYGON ((-87.60914 41.84469, -87.60915 41.844... 96
car POLYGON ((-87.59215 41.81693, -87.59215 41.816... 2023-02-28 12 POLYGON ((-87.60914 41.84469, -87.60915 41.844... 38
... ... ... ... ... ...
foot POLYGON ((-87.69646 41.70714, -87.69644 41.706... 2023-02-28 12 POLYGON ((-87.60914 41.84469, -87.60915 41.844... 22
foot POLYGON ((-87.64215 41.68508, -87.64249 41.685... 2023-02-28 12 POLYGON ((-87.60914 41.84469, -87.60915 41.844... 82
foot MULTIPOLYGON (((-87.83658 41.98640, -87.83658 ... 2023-02-28 12 POLYGON ((-87.60914 41.84469, -87.60915 41.844... 14
foot POLYGON ((-87.65456 41.99817, -87.65456 41.998... 2023-02-28 12 POLYGON ((-87.60914 41.84469, -87.60915 41.844... 16
foot POLYGON ((-87.80676 42.00084, -87.80676 42.000... 2023-02-28 12 POLYGON ((-87.60914 41.84469, -87.60915 41.844... 12

231 rows × 5 columns

Long form and wide form#

Unlike generic tabular data, geospatial data are often designed as wide form tables to ensure that each geometry is present exactly one in the GeoDataFrame, which is not the case above. For this reason, Xvec allows you to export a wide form directly from Xarray, given you have exactly one dimension indexed by geometry.

Hide code cell source
counties = gpd.read_file(get_path("geoda.natregimes"))

deomgraphy = xr.Dataset(
    data_vars=dict(
        population=(["county", "year"], counties[["PO60", "PO70", "PO80", "PO90"]]),
        unemployment=(["county", "year"], counties[["UE60", "UE70", "UE80", "UE90"]]),
        divorce=(["county", "year"], counties[["DV60", "DV70", "DV80", "DV90"]]),
        age=(["county", "year"], counties[["MA60", "MA70", "MA80", "MA90"]]),
    ),
    coords=dict(county=counties.geometry, year=[1960, 1970, 1980, 1990]),
).xvec.set_geom_indexes("county", crs=counties.crs)
deomgraphy
<xarray.Dataset>
Dimensions:       (county: 3085, year: 4)
Coordinates:
  * county        (county) object POLYGON ((-95.34258270263672 48.54670333862...
  * year          (year) int64 1960 1970 1980 1990
Data variables:
    population    (county, year) int64 4304 3987 3764 4076 ... 43766 55800 65077
    unemployment  (county, year) float64 7.9 9.0 5.903 ... 5.444 7.018 5.489
    divorce       (county, year) float64 1.859 2.62 3.747 ... 2.725 4.782 7.415
    age           (county, year) float64 28.8 30.5 34.5 ... 26.4 28.97 35.33
Indexes:
    county   GeometryIndex (crs=EPSG:4326)

The default is the same as for vanilla Xarray, so a long form data frame.

deomgraphy.xvec.to_geodataframe()
county population unemployment divorce age
year
1960 POLYGON ((-95.34258 48.54670, -95.34081 48.715... 4304 7.900000 1.858974 28.800000
1970 POLYGON ((-95.34258 48.54670, -95.34081 48.715... 3987 9.000000 2.619808 30.500000
1980 POLYGON ((-95.34258 48.54670, -95.34081 48.715... 3764 5.902579 3.746594 34.500000
1990 POLYGON ((-95.34258 48.54670, -95.34081 48.715... 4076 3.894790 7.388535 35.500000
1960 POLYGON ((-118.85050 47.94969, -118.84732 48.4... 3889 8.200000 2.863278 25.900000
... ... ... ... ... ...
1990 POLYGON ((-77.53178 38.56506, -77.72094 38.840... 250377 3.353981 5.750177 28.466667
1960 POLYGON ((-111.37152 44.74516, -111.36839 45.3... 39260 5.218039 2.777586 28.250000
1970 POLYGON ((-111.37152 44.74516, -111.36839 45.3... 43766 5.444291 2.724604 26.400000
1980 POLYGON ((-111.37152 44.74516, -111.36839 45.3... 55800 7.017943 4.782264 28.966667
1990 POLYGON ((-111.37152 44.74516, -111.36839 45.3... 65077 5.488553 7.414559 35.333333

12340 rows × 5 columns

You can pass long=False to generate a wide form with multi-indexed columns instead.

deomgraphy.xvec.to_geodataframe(long=False)
county population unemployment divorce age
year 1960 1970 1980 1990 1960 1970 1980 1990 1960 1970 1980 1990 1960 1970 1980 1990
0 POLYGON ((-95.34258 48.54670, -95.34081 48.715... 4304 3987 3764 4076 7.900000 9.000000 5.902579 3.894790 1.858974 2.619808 3.746594 7.388535 28.80 30.5 34.500000 35.500000
1 POLYGON ((-118.85050 47.94969, -118.84732 48.4... 3889 3655 5811 6295 8.200000 15.400000 15.422886 16.811594 2.863278 3.686007 6.625442 11.543135 25.90 27.1 27.200000 32.800000
2 POLYGON ((-117.43777 48.04422, -117.54113 48.0... 17884 17405 28979 30948 10.100000 9.000000 13.574064 10.700794 2.711447 2.976378 5.448223 9.123212 29.60 31.8 28.700000 34.500000
3 POLYGON ((-118.97096 47.93928, -118.97293 47.9... 25520 25867 30639 33350 7.500000 10.500000 12.700195 10.203544 3.372041 4.090386 7.122333 9.245627 29.40 31.1 31.200000 35.000000
4 POLYGON ((-117.43750 49.00000, -117.03098 49.0... 6914 6025 8580 8915 10.600000 13.400000 18.148999 14.991023 3.008988 4.010695 5.287860 10.158351 31.20 33.8 31.300000 36.100000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3080 POLYGON ((-114.51984 33.02767, -114.55830 33.0... 46235 60827 90554 120739 4.600000 4.800000 8.012906 10.910351 3.702838 4.160574 5.730308 7.798840 25.70 24.6 27.900000 33.900000
3081 POLYGON ((-107.19484 34.58352, -107.71780 34.5... 39085 40576 61115 69029 6.100000 3.800000 8.375127 11.002271 1.880621 2.274648 5.372952 9.078873 20.10 21.7 25.500000 30.450000
3082 POLYGON ((-76.39569 37.10771, -76.40270 37.090... 21583 33203 44189 53427 2.500000 3.105065 5.037045 4.102888 1.113223 1.550323 3.257639 4.850746 24.40 24.4 29.800000 34.000000
3083 POLYGON ((-77.53178 38.56506, -77.72094 38.840... 50164 111102 166665 250377 3.300000 2.361559 3.708886 3.353981 1.672391 1.468343 3.742207 5.750177 22.30 21.8 25.666667 28.466667
3084 POLYGON ((-111.37152 44.74516, -111.36839 45.3... 39260 43766 55800 65077 5.218039 5.444291 7.017943 5.488553 2.777586 2.724604 4.782264 7.414559 28.25 26.4 28.966667 35.333333

3085 rows × 17 columns

Differences compared to the default pandas exports#

While both methods wrap native Xarray methods to_pandas and to_dataframe, there are differences in behavior. Since GeoPandas doesn’t support GeometryArray as an index (or columns), Xvec transforms the resulting DataFrame if that happens. It means either resetting the index to get geometries as a standard GeoSeries or transposing the DataFrame and then resetting the index (if geometries are indexing columns).