Reading and writing files#

Vector data cubes do not have an interoperable file format that can be used to save any Xarray object with geometric dimensions to a file and read it back or in other software like {stars} in the R-Spatial ecosystem.

CF conventions and NetCDF offer a way of saving a Dataset with a single dimension indexed by geometry, but this still needs to be implemented in Xvec. While we are working on a solution (see this issue), some options can be used in the meantime.

GIS file formats#

Another option is to convert the data to a GeoDataFrame and save it to any geospatial file format or a GeoParquet if there are multiple geometry columns.

Long table form#

Long table form needs to save each value in a row with all its dimension coordinates. While this option is possible, it can be inefficient, leading to a high degree of repetition and very long tables. However, it is readily supported by Xarray and Xvec.

long = cube.xvec.to_geodataframe()
long

	county	population	unemployment	divorce	age
year
1960	POLYGON ((-95.34258 48.5467, -95.34081 48.7151...	4304	7.900000	1.858974	28.800000
1970	POLYGON ((-95.34258 48.5467, -95.34081 48.7151...	3987	9.000000	2.619808	30.500000
1980	POLYGON ((-95.34258 48.5467, -95.34081 48.7151...	3764	5.902579	3.746594	34.500000
1990	POLYGON ((-95.34258 48.5467, -95.34081 48.7151...	4076	3.894790	7.388535	35.500000
1960	POLYGON ((-118.8505 47.94969, -118.84732 48.47...	3889	8.200000	2.863278	25.900000
...	...	...	...	...	...
1990	POLYGON ((-77.53178 38.56506, -77.72094 38.840...	250377	3.353981	5.750177	28.466667
1960	POLYGON ((-111.37152 44.74516, -111.36839 45.3...	39260	5.218039	2.777586	28.250000
1970	POLYGON ((-111.37152 44.74516, -111.36839 45.3...	43766	5.444291	2.724604	26.400000
1980	POLYGON ((-111.37152 44.74516, -111.36839 45.3...	55800	7.017943	4.782264	28.966667
1990	POLYGON ((-111.37152 44.74516, -111.36839 45.3...	65077	5.488553	7.414559	35.333333

12340 rows × 5 columns

You can see that the result is longer than the original GeoDataFrame that had each year encoded as a column.

You can then save it using to_file or to_parquet.

long.to_file("cube.gpkg")

long.to_parquet("cube.parquet")

You can then re-create the Dataset using the from_dataframe method with the geometry column set as an index.

Wide table form#

The other option is to create a wide table form which is what we started with. Each combination of coordinate and data variables except geometry is encoded as a column (e.g. ["population_1960", "population_1970", "population_1980", "population_1990"] or using the MultiIndex like below). However, this works only when the Dataset or DataArray is indexed by one geometry dimension only.

To create a valid GeoDataFrame, you would need to do something like this:

wide = cube.xvec.to_geodataframe(long=False)
wide

	county	population				unemployment				divorce				age
year		1960	1970	1980	1990	1960	1970	1980	1990	1960	1970	1980	1990	1960	1970	1980	1990
0	POLYGON ((-95.34258 48.5467, -95.34081 48.7151...	4304	3987	3764	4076	7.900000	9.000000	5.902579	3.894790	1.858974	2.619808	3.746594	7.388535	28.80	30.5	34.500000	35.500000
1	POLYGON ((-118.8505 47.94969, -118.84732 48.47...	3889	3655	5811	6295	8.200000	15.400000	15.422886	16.811594	2.863278	3.686007	6.625442	11.543135	25.90	27.1	27.200000	32.800000
2	POLYGON ((-117.43777 48.04422, -117.54113 48.0...	17884	17405	28979	30948	10.100000	9.000000	13.574064	10.700794	2.711447	2.976378	5.448223	9.123212	29.60	31.8	28.700000	34.500000
3	POLYGON ((-118.97096 47.93928, -118.97293 47.9...	25520	25867	30639	33350	7.500000	10.500000	12.700195	10.203544	3.372041	4.090386	7.122333	9.245627	29.40	31.1	31.200000	35.000000
4	POLYGON ((-117.4375 49, -117.03098 49, -117.02...	6914	6025	8580	8915	10.600000	13.400000	18.148999	14.991023	3.008988	4.010695	5.287860	10.158351	31.20	33.8	31.300000	36.100000
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
3080	POLYGON ((-114.51984 33.02767, -114.5583 33.03...	46235	60827	90554	120739	4.600000	4.800000	8.012906	10.910351	3.702838	4.160574	5.730308	7.798840	25.70	24.6	27.900000	33.900000
3081	POLYGON ((-107.19484 34.58352, -107.7178 34.58...	39085	40576	61115	69029	6.100000	3.800000	8.375127	11.002271	1.880621	2.274648	5.372952	9.078873	20.10	21.7	25.500000	30.450000
3082	POLYGON ((-76.39569 37.10771, -76.4027 37.0905...	21583	33203	44189	53427	2.500000	3.105065	5.037045	4.102888	1.113223	1.550323	3.257639	4.850746	24.40	24.4	29.800000	34.000000
3083	POLYGON ((-77.53178 38.56506, -77.72094 38.840...	50164	111102	166665	250377	3.300000	2.361559	3.708886	3.353981	1.672391	1.468343	3.742207	5.750177	22.30	21.8	25.666667	28.466667
3084	POLYGON ((-111.37152 44.74516, -111.36839 45.3...	39260	43766	55800	65077	5.218039	5.444291	7.017943	5.488553	2.777586	2.724604	4.782264	7.414559	28.25	26.4	28.966667	35.333333

3085 rows × 17 columns

And a re-creation of the Dataset.

Reading and writing files

Contents

Reading and writing files#

Pickle, joblib#

Pickle#

Joblib#

GIS file formats#

Long table form#

Wide table form#