GeoPandas interface#
Due to the nature of operations done with the vector data cubes, you will often need to convert the Xarray object to GeoPandas GeoDataFrames. Xvec offers a set of methods extending support for exporting Xarray to pandas.
import geopandas as gpd
import numpy as np
import pandas as pd
import xarray as xr
from geodatasets import get_path
import xvec
Create a dummy DataArray with two dimensions indexed by geometry, mimicking transport counts between communities of Chicago.
Show code cell source
chicago = gpd.read_file(get_path("geoda.chicago health"))
origin = destination = chicago.geometry.array
mode = ["car", "bike", "foot"]
date = pd.date_range("2023-01-01", periods=100)
hours = range(24)
rng = np.random.default_rng(1)
data = rng.integers(1, 100, size=(3, 100, 24, len(chicago), len(chicago)))
traffic_counts = xr.DataArray(
data,
coords=(mode, date, hours, origin, destination),
dims=["mode", "date", "time", "origin", "destination"],
name="traffic_counts",
).xvec.set_geom_indexes(["origin", "destination"], crs=chicago.crs)
traffic_counts
<xarray.DataArray 'traffic_counts' (mode: 3, date: 100, time: 24, origin: 77, destination: 77)> array([[[[[47, 51, 75, ..., 72, 53, 87], [46, 37, 7, ..., 3, 49, 67], [45, 91, 95, ..., 18, 70, 77], ..., [39, 93, 18, ..., 85, 5, 76], [91, 96, 10, ..., 50, 96, 22], [66, 11, 6, ..., 35, 34, 15]], [[ 2, 30, 42, ..., 87, 37, 30], [ 2, 74, 64, ..., 78, 40, 3], [93, 86, 92, ..., 24, 63, 84], ..., [18, 32, 35, ..., 20, 65, 72], [ 9, 92, 95, ..., 63, 60, 59], [74, 82, 71, ..., 58, 27, 68]], [[45, 63, 94, ..., 61, 78, 60], [90, 50, 94, ..., 57, 97, 37], [47, 67, 73, ..., 35, 89, 70], ..., ... ..., [69, 11, 20, ..., 44, 86, 3], [75, 80, 55, ..., 50, 64, 58], [70, 12, 36, ..., 82, 25, 18]], [[40, 64, 76, ..., 90, 79, 88], [31, 96, 25, ..., 38, 50, 2], [36, 93, 99, ..., 31, 31, 38], ..., [43, 94, 80, ..., 39, 53, 34], [47, 13, 90, ..., 92, 94, 5], [27, 21, 59, ..., 46, 90, 31]], [[99, 42, 31, ..., 51, 74, 31], [46, 83, 96, ..., 11, 86, 20], [18, 38, 6, ..., 91, 48, 54], ..., [28, 88, 56, ..., 2, 4, 44], [88, 73, 55, ..., 17, 52, 9], [ 2, 29, 57, ..., 99, 50, 82]]]]]) Coordinates: * mode (mode) <U4 'car' 'bike' 'foot' * date (date) datetime64[ns] 2023-01-01 2023-01-02 ... 2023-04-10 * time (time) int64 0 1 2 3 4 5 6 7 8 9 ... 15 16 17 18 19 20 21 22 23 * origin (origin) object POLYGON ((-87.60914087617012 41.844692503461... * destination (destination) object POLYGON ((-87.60914087617012 41.8446925... Indexes: origin GeometryIndex (crs=GEOGCS["WGS84(DD)",DATUM["WGS_1984",SPHEROID["WGS84",6378137,29 ...) destination GeometryIndex (crs=GEOGCS["WGS84(DD)",DATUM["WGS_1984",SPHEROID["WGS84",6378137,29 ...)
Xvec offers .xvec.to_geopandas()
and .xvec.to_geodataframe()
methods as counterparts to the default to_pandas()
and to_dataframe()
methods built-in Xarray. While the former works on arrays with 2 dimensions or less, the latter is generalisable and always returns all coordinates. Consider the following examples.
Select one day, one time and one mode and convert the data to a GeoDataFrame.
traffic_counts.sel(date="2023-02-28", time=12, mode="bike").xvec.to_geodataframe(
geometry="origin"
)
origin | destination | mode | date | time | traffic_counts | |
---|---|---|---|---|---|---|
0 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | bike | 2023-02-28 | 12 | 19 |
1 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | POLYGON ((-87.59215 41.81693, -87.59231 41.816... | bike | 2023-02-28 | 12 | 23 |
2 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | POLYGON ((-87.6288 41.80189, -87.62879 41.8017... | bike | 2023-02-28 | 12 | 20 |
3 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | POLYGON ((-87.60671 41.81681, -87.6067 41.8165... | bike | 2023-02-28 | 12 | 13 |
4 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | POLYGON ((-87.59215 41.81693, -87.59215 41.816... | bike | 2023-02-28 | 12 | 13 |
... | ... | ... | ... | ... | ... | ... |
5924 | POLYGON ((-87.80676 42.00084, -87.80676 42.000... | POLYGON ((-87.69646 41.70714, -87.69644 41.706... | bike | 2023-02-28 | 12 | 53 |
5925 | POLYGON ((-87.80676 42.00084, -87.80676 42.000... | POLYGON ((-87.64215 41.68508, -87.64249 41.685... | bike | 2023-02-28 | 12 | 47 |
5926 | POLYGON ((-87.80676 42.00084, -87.80676 42.000... | MULTIPOLYGON (((-87.83658 41.9864, -87.83658 4... | bike | 2023-02-28 | 12 | 26 |
5927 | POLYGON ((-87.80676 42.00084, -87.80676 42.000... | POLYGON ((-87.65456 41.99817, -87.65456 41.998... | bike | 2023-02-28 | 12 | 75 |
5928 | POLYGON ((-87.80676 42.00084, -87.80676 42.000... | POLYGON ((-87.80676 42.00084, -87.80676 42.000... | bike | 2023-02-28 | 12 | 62 |
5929 rows × 6 columns
You end up with two geometry columns. Because none is primary by default, the cell above contains geometry="origin"
to specify which of them shall become the active geometry of the GeoDataFrame. You can see that all coordinates are present, but mode
, date
and time
are constants.
However, exporting the same using .xvec.to_geopandas()
does not work because both index and columns are arrays of geometries.
traffic_counts.sel(date="2023-02-28", time=12, mode="bike").xvec.to_geopandas()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[4], line 1
----> 1 traffic_counts.sel(date="2023-02-28", time=12, mode="bike").xvec.to_geopandas()
File ~/Git/xvec/xvec/accessor.py:748, in XvecAccessor.to_geopandas(self)
742 raise ValueError(
743 f"Cannot convert arrays with {self._obj.ndim} dimensions into pandas "
744 "objects. Requires 2 or fewer dimensions."
745 )
747 if len(self._geom_indexes) > 1:
--> 748 raise ValueError(
749 "Multiple coordinates based on xvec.GeometryIndex are not supported "
750 "as GeoPandas.GeoDataFrame cannot be indexed by geometry. Try using "
751 "`.xvec.to_geodataframe()` instead."
752 )
754 # DataArray
755 if isinstance(self._obj, xr.DataArray):
ValueError: Multiple coordinates based on xvec.GeometryIndex are not supported as GeoPandas.GeoDataFrame cannot be indexed by geometry. Try using `.xvec.to_geodataframe()` instead.
You can use the method if you ensure that only one array of geometries is returned.
traffic_counts.sel(date="2023-02-28", time=12, origin=origin[0]).xvec.to_geopandas()
mode | destination | car | bike | foot |
---|---|---|---|---|
0 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | 48 | 19 | 97 |
1 | POLYGON ((-87.59215 41.81693, -87.59231 41.816... | 77 | 23 | 25 |
2 | POLYGON ((-87.6288 41.80189, -87.62879 41.8017... | 43 | 20 | 59 |
3 | POLYGON ((-87.60671 41.81681, -87.6067 41.8165... | 96 | 13 | 42 |
4 | POLYGON ((-87.59215 41.81693, -87.59215 41.816... | 38 | 13 | 48 |
... | ... | ... | ... | ... |
72 | POLYGON ((-87.69646 41.70714, -87.69644 41.706... | 54 | 13 | 22 |
73 | POLYGON ((-87.64215 41.68508, -87.64249 41.685... | 76 | 10 | 82 |
74 | MULTIPOLYGON (((-87.83658 41.9864, -87.83658 4... | 49 | 82 | 14 |
75 | POLYGON ((-87.65456 41.99817, -87.65456 41.998... | 84 | 28 | 16 |
76 | POLYGON ((-87.80676 42.00084, -87.80676 42.000... | 59 | 25 | 12 |
77 rows × 4 columns
Compared to the to_geodataframe
, you can see that constants are removed, and the final GeoDataFrame is much smaller.
traffic_counts.sel(date="2023-02-28", time=12, origin=origin[0]).xvec.to_geodataframe()
destination | date | time | origin | traffic_counts | |
---|---|---|---|---|---|
mode | |||||
car | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | 2023-02-28 | 12 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | 48 |
car | POLYGON ((-87.59215 41.81693, -87.59231 41.816... | 2023-02-28 | 12 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | 77 |
car | POLYGON ((-87.6288 41.80189, -87.62879 41.8017... | 2023-02-28 | 12 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | 43 |
car | POLYGON ((-87.60671 41.81681, -87.6067 41.8165... | 2023-02-28 | 12 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | 96 |
car | POLYGON ((-87.59215 41.81693, -87.59215 41.816... | 2023-02-28 | 12 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | 38 |
... | ... | ... | ... | ... | ... |
foot | POLYGON ((-87.69646 41.70714, -87.69644 41.706... | 2023-02-28 | 12 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | 22 |
foot | POLYGON ((-87.64215 41.68508, -87.64249 41.685... | 2023-02-28 | 12 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | 82 |
foot | MULTIPOLYGON (((-87.83658 41.9864, -87.83658 4... | 2023-02-28 | 12 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | 14 |
foot | POLYGON ((-87.65456 41.99817, -87.65456 41.998... | 2023-02-28 | 12 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | 16 |
foot | POLYGON ((-87.80676 42.00084, -87.80676 42.000... | 2023-02-28 | 12 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | 12 |
231 rows × 5 columns
Long form and wide form#
Unlike generic tabular data, geospatial data are often designed as wide form tables to ensure that each geometry is present exactly one in the GeoDataFrame, which is not the case above. For this reason, Xvec allows you to export a wide form directly from Xarray, given you have exactly one dimension indexed by geometry.
Show code cell source
counties = gpd.read_file(get_path("geoda.natregimes"))
demography = xr.Dataset(
data_vars=dict(
population=(["county", "year"], counties[["PO60", "PO70", "PO80", "PO90"]]),
unemployment=(["county", "year"], counties[["UE60", "UE70", "UE80", "UE90"]]),
divorce=(["county", "year"], counties[["DV60", "DV70", "DV80", "DV90"]]),
age=(["county", "year"], counties[["MA60", "MA70", "MA80", "MA90"]]),
),
coords=dict(county=counties.geometry, year=[1960, 1970, 1980, 1990]),
).xvec.set_geom_indexes("county", crs=counties.crs)
demography
<xarray.Dataset> Dimensions: (county: 3085, year: 4) Coordinates: * county (county) object POLYGON ((-95.34258270263672 48.54670333862... * year (year) int64 1960 1970 1980 1990 Data variables: population (county, year) int64 4304 3987 3764 4076 ... 43766 55800 65077 unemployment (county, year) float64 7.9 9.0 5.903 ... 5.444 7.018 5.489 divorce (county, year) float64 1.859 2.62 3.747 ... 2.725 4.782 7.415 age (county, year) float64 28.8 30.5 34.5 ... 26.4 28.97 35.33 Indexes: county GeometryIndex (crs=EPSG:4326)
The default is the same as for vanilla Xarray, so a long form data frame.
demography.xvec.to_geodataframe()
county | population | unemployment | divorce | age | |
---|---|---|---|---|---|
year | |||||
1960 | POLYGON ((-95.34258 48.5467, -95.34081 48.7151... | 4304 | 7.900000 | 1.858974 | 28.800000 |
1970 | POLYGON ((-95.34258 48.5467, -95.34081 48.7151... | 3987 | 9.000000 | 2.619808 | 30.500000 |
1980 | POLYGON ((-95.34258 48.5467, -95.34081 48.7151... | 3764 | 5.902579 | 3.746594 | 34.500000 |
1990 | POLYGON ((-95.34258 48.5467, -95.34081 48.7151... | 4076 | 3.894790 | 7.388535 | 35.500000 |
1960 | POLYGON ((-118.8505 47.94969, -118.84732 48.47... | 3889 | 8.200000 | 2.863278 | 25.900000 |
... | ... | ... | ... | ... | ... |
1990 | POLYGON ((-77.53178 38.56506, -77.72094 38.840... | 250377 | 3.353981 | 5.750177 | 28.466667 |
1960 | POLYGON ((-111.37152 44.74516, -111.36839 45.3... | 39260 | 5.218039 | 2.777586 | 28.250000 |
1970 | POLYGON ((-111.37152 44.74516, -111.36839 45.3... | 43766 | 5.444291 | 2.724604 | 26.400000 |
1980 | POLYGON ((-111.37152 44.74516, -111.36839 45.3... | 55800 | 7.017943 | 4.782264 | 28.966667 |
1990 | POLYGON ((-111.37152 44.74516, -111.36839 45.3... | 65077 | 5.488553 | 7.414559 | 35.333333 |
12340 rows × 5 columns
You can pass long=False
to generate a wide form with multi-indexed columns instead.
demography.xvec.to_geodataframe(long=False)
county | population | unemployment | divorce | age | |||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
year | 1960 | 1970 | 1980 | 1990 | 1960 | 1970 | 1980 | 1990 | 1960 | 1970 | 1980 | 1990 | 1960 | 1970 | 1980 | 1990 | |
0 | POLYGON ((-95.34258 48.5467, -95.34081 48.7151... | 4304 | 3987 | 3764 | 4076 | 7.900000 | 9.000000 | 5.902579 | 3.894790 | 1.858974 | 2.619808 | 3.746594 | 7.388535 | 28.80 | 30.5 | 34.500000 | 35.500000 |
1 | POLYGON ((-118.8505 47.94969, -118.84732 48.47... | 3889 | 3655 | 5811 | 6295 | 8.200000 | 15.400000 | 15.422886 | 16.811594 | 2.863278 | 3.686007 | 6.625442 | 11.543135 | 25.90 | 27.1 | 27.200000 | 32.800000 |
2 | POLYGON ((-117.43777 48.04422, -117.54113 48.0... | 17884 | 17405 | 28979 | 30948 | 10.100000 | 9.000000 | 13.574064 | 10.700794 | 2.711447 | 2.976378 | 5.448223 | 9.123212 | 29.60 | 31.8 | 28.700000 | 34.500000 |
3 | POLYGON ((-118.97096 47.93928, -118.97293 47.9... | 25520 | 25867 | 30639 | 33350 | 7.500000 | 10.500000 | 12.700195 | 10.203544 | 3.372041 | 4.090386 | 7.122333 | 9.245627 | 29.40 | 31.1 | 31.200000 | 35.000000 |
4 | POLYGON ((-117.4375 49, -117.03098 49, -117.02... | 6914 | 6025 | 8580 | 8915 | 10.600000 | 13.400000 | 18.148999 | 14.991023 | 3.008988 | 4.010695 | 5.287860 | 10.158351 | 31.20 | 33.8 | 31.300000 | 36.100000 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
3080 | POLYGON ((-114.51984 33.02767, -114.5583 33.03... | 46235 | 60827 | 90554 | 120739 | 4.600000 | 4.800000 | 8.012906 | 10.910351 | 3.702838 | 4.160574 | 5.730308 | 7.798840 | 25.70 | 24.6 | 27.900000 | 33.900000 |
3081 | POLYGON ((-107.19484 34.58352, -107.7178 34.58... | 39085 | 40576 | 61115 | 69029 | 6.100000 | 3.800000 | 8.375127 | 11.002271 | 1.880621 | 2.274648 | 5.372952 | 9.078873 | 20.10 | 21.7 | 25.500000 | 30.450000 |
3082 | POLYGON ((-76.39569 37.10771, -76.4027 37.0905... | 21583 | 33203 | 44189 | 53427 | 2.500000 | 3.105065 | 5.037045 | 4.102888 | 1.113223 | 1.550323 | 3.257639 | 4.850746 | 24.40 | 24.4 | 29.800000 | 34.000000 |
3083 | POLYGON ((-77.53178 38.56506, -77.72094 38.840... | 50164 | 111102 | 166665 | 250377 | 3.300000 | 2.361559 | 3.708886 | 3.353981 | 1.672391 | 1.468343 | 3.742207 | 5.750177 | 22.30 | 21.8 | 25.666667 | 28.466667 |
3084 | POLYGON ((-111.37152 44.74516, -111.36839 45.3... | 39260 | 43766 | 55800 | 65077 | 5.218039 | 5.444291 | 7.017943 | 5.488553 | 2.777586 | 2.724604 | 4.782264 | 7.414559 | 28.25 | 26.4 | 28.966667 | 35.333333 |
3085 rows × 17 columns
Differences compared to the default pandas exports#
While both methods wrap native Xarray methods to_pandas
and to_dataframe
, there are differences in behavior. Since GeoPandas doesn’t support GeometryArray as an index (or columns), Xvec transforms the resulting DataFrame if that happens. It means either resetting the index to get geometries as a standard GeoSeries or transposing the DataFrame and then resetting the index (if geometries are indexing columns).