Slider labels do not correspond to the title - python

I have this xarray dataset defined as ds:
<xarray.Dataset>
Dimensions: (bnds: 2, lag: 61, plev: 63)
Coordinates:
* plev (plev) float64 1e+03 925.0 850.0 800.0 780.0 750.0 700.0 ...
* lag (lag) int64 -30 -29 -28 -27 -26 -25 -24 -23 -22 -21 -20 -19 ...
* bnds (bnds) int64 0 1
Data variables:
time_bnds (lag, plev, bnds) float64 -3.468e+04 -3.468e+04 -3.468e+04 ...
lat_bnds (lag, plev, bnds) float64 -48.24 -51.95 -48.24 -51.95 -48.24 ...
lon_bnds (lag, plev, bnds) float64 -318.8 -322.5 -318.7 -322.5 -318.7 ...
plev_bnds (lag, plev, bnds) float64 -1e+05 -9.25e+04 -9.25e+04 -8.5e+04 ...
accelogw (lag, plev) float64 -0.001869 0.05221 0.04774 0.02534 0.02233 ...
where the plev coordinate (pressure level) is decreasing from 1000 to 0.0007 hPa.
I define a GeoViews dataset like this:
import geoviews as gv
import holoviews as hv
kdims = ['plev', 'lag']
vdims = ['accelogw']
dataset = gv.Dataset(ds, kdims=kdims, vdims = vdims)
I create the holoviews Curve object in this way:
%%opts Curve [xrotation=25] NdOverlay [fig_size=300 aspect=1.2]
dataset.to(hv.Curve, 'lag')
resulting in frames beginning with this one:
As you can see, the slider label shows a pressure level of 0.0007 hPa, whereas the title shows a pressure level of 1000 hPa. Is this a bug, or is it the default behavior of HoloViews/GeoViews for dimensions?
Thanks for your time.
EDIT: I have holoviews on v1.6.2, geoviews on v1.1.0 and xarray on v0.8.2.
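A hedged diagnostic, assuming the mismatch is related to plev being stored in descending order (this is not confirmed above): sort the dataset so plev increases before wrapping it in gv.Dataset, then check whether the slider and the title agree. The sketch below shows only the xarray side, on a small synthetic dataset whose sizes and values are made up. Dataset.sortby may require an xarray release newer than the 0.8.2 mentioned above, and its effect on HoloViews 1.6.2 is untested.

```python
import numpy as np
import xarray as xr

# Hypothetical small stand-in for ds: accelogw on (lag, plev), with plev stored descending
plev = np.array([1000.0, 925.0, 850.0, 0.0007])
lag = np.arange(-3, 4)
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {"accelogw": (("lag", "plev"), rng.random((lag.size, plev.size)))},
    coords={"lag": lag, "plev": plev},
)

# Reorder along plev so it increases; each data value moves together with its label
ds_sorted = ds.sortby("plev")
print(ds_sorted["plev"].values)
```

The sorted dataset would then be passed to gv.Dataset(ds_sorted, kdims=kdims, vdims=vdims) in the same way as before.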

Related

OSError when extracting values from xarray DataArray

I have a dataset containing windspeeds at multiple pressure levels for 3 consecutive months:
import xarray as xr
da = xr.open_dataset('autumn_data.grib', engine='cfgrib')
In[1]: da
Out[1]:
<xarray.Dataset>
Dimensions: (time: 2208, isobaricInhPa: 11, latitude: 161, longitude: 401)
Coordinates:
number int32 ...
* time (time) datetime64[ns] 2020-08-01 ... 2020-10-31T23:00:00
step timedelta64[ns] ...
* isobaricInhPa (isobaricInhPa) float64 1e+03 950.0 900.0 ... 550.0 500.0
* latitude (latitude) float64 70.0 69.75 69.5 69.25 ... 30.5 30.25 30.0
* longitude (longitude) float64 -90.0 -89.75 -89.5 ... 9.5 9.75 10.0
valid_time (time) datetime64[ns] ...
Data variables:
u (time, isobaricInhPa, latitude, longitude) float32 ...
v (time, isobaricInhPa, latitude, longitude) float32 ...
Attributes:
GRIB_edition: 1
GRIB_centre: ecmf
GRIB_centreDescription: European Centre for Medium-Range Weather Forecasts
GRIB_subCentre: 0
Conventions: CF-1.7
institution: European Centre for Medium-Range Weather Forecasts
history: 2022-06-19T13:42 GRIB to CDM+CF via cfgrib-0.9.1...
I loop over this dataset to make a numpy array with windspeeds in both directions for 40 consecutive hours.
import numpy as np
RUNTIME = 40
TIMES = da.time.values
for t in range(len(TIMES) - RUNTIME + 1):
    # Stack u and v for RUNTIME consecutive time steps into one array
    WIND = np.stack((
        da['u'].sel(time=xr.DataArray(TIMES[t:t+RUNTIME])).values,
        da['v'].sel(time=xr.DataArray(TIMES[t:t+RUNTIME])).values
    ))
This worked fine until, at some point, I got an error:
In[2]: da['u'].sel(time = xr.DataArray(TIMES[716:716+RUNTIME])).values
Traceback (most recent call last):
Input In [2] in <cell line: 1>
da['u'].sel(time = xr.DataArray(TIMES[716:716+RUNTIME])).values
File ~\anaconda3\envs\thesis\lib\site-packages\xarray\core\dataarray.py:646 in values
return self.variable.values
File ~\anaconda3\envs\thesis\lib\site-packages\xarray\core\variable.py:519 in values
return _as_array_or_item(self._data)
File ~\anaconda3\envs\thesis\lib\site-packages\xarray\core\variable.py:259 in _as_array_or_item
data = np.asarray(data)
File ~\anaconda3\envs\thesis\lib\site-packages\xarray\core\indexing.py:551 in __array__
self._ensure_cached()
File ~\anaconda3\envs\thesis\lib\site-packages\xarray\core\indexing.py:548 in _ensure_cached
self.array = NumpyIndexingAdapter(np.asarray(self.array))
File ~\anaconda3\envs\thesis\lib\site-packages\xarray\core\indexing.py:521 in __array__
return np.asarray(self.array, dtype=dtype)
File ~\anaconda3\envs\thesis\lib\site-packages\xarray\core\indexing.py:422 in __array__
return np.asarray(array[self.key], dtype=None)
File ~\anaconda3\envs\thesis\lib\site-packages\cfgrib\xarray_plugin.py:144 in __getitem__
return xr.core.indexing.explicit_indexing_adapter(
File ~\anaconda3\envs\thesis\lib\site-packages\xarray\core\indexing.py:711 in explicit_indexing_adapter
result = raw_indexing_method(raw_key.tuple)
File ~\anaconda3\envs\thesis\lib\site-packages\cfgrib\xarray_plugin.py:150 in _getitem
return self.array[key]
File ~\anaconda3\envs\thesis\lib\site-packages\cfgrib\dataset.py:342 in __getitem__
message = self.index.get_field(message_ids[0]) # type: ignore
File ~\anaconda3\envs\thesis\lib\site-packages\cfgrib\messages.py:472 in get_field
return ComputedKeysAdapter(self.fieldset[message_id], self.computed_keys)
File ~\anaconda3\envs\thesis\lib\site-packages\cfgrib\messages.py:332 in __getitem__
return self.message_from_file(file, offset=item)
File ~\anaconda3\envs\thesis\lib\site-packages\cfgrib\messages.py:328 in message_from_file
return Message.from_file(file, offset, **kwargs)
File ~\anaconda3\envs\thesis\lib\site-packages\cfgrib\messages.py:91 in from_file
file.seek(offset)
OSError: [Errno 22] Invalid argument
This seems very strange, because I didn't get this error in the previous iterations, and I can still create the corresponding DataArray:
In[3]: da['u'].sel(time = xr.DataArray(TIMES[716:716+RUNTIME]))
Out[3]:
<xarray.DataArray 'u' (dim_0: 40, isobaricInhPa: 11, latitude: 161, longitude: 401)>
[28406840 values with dtype=float32]
Coordinates:
number int32 0
time (dim_0) datetime64[ns] 2020-08-30T20:00:00 ... 2020-09-01T...
step timedelta64[ns] 00:00:00
* isobaricInhPa (isobaricInhPa) float64 1e+03 950.0 900.0 ... 550.0 500.0
* latitude (latitude) float64 70.0 69.75 69.5 69.25 ... 30.5 30.25 30.0
* longitude (longitude) float64 -90.0 -89.75 -89.5 ... 9.5 9.75 10.0
valid_time (dim_0) datetime64[ns] 2020-08-30T20:00:00 ... 2020-09-01T...
Dimensions without coordinates: dim_0
Attributes:
GRIB_paramId: 131
GRIB_dataType: an
GRIB_numberOfPoints: 64561
GRIB_typeOfLevel: isobaricInhPa
GRIB_stepUnits: 1
GRIB_stepType: instant
GRIB_gridType: regular_ll
GRIB_NV: 0
GRIB_Nx: 401
GRIB_Ny: 161
GRIB_cfName: eastward_wind
GRIB_cfVarName: u
GRIB_gridDefinitionDescription: Latitude/Longitude Grid
GRIB_iDirectionIncrementInDegrees: 0.25
GRIB_iScansNegatively: 0
GRIB_jDirectionIncrementInDegrees: 0.25
GRIB_jPointsAreConsecutive: 0
GRIB_jScansPositively: 0
GRIB_latitudeOfFirstGridPointInDegrees: 70.0
GRIB_latitudeOfLastGridPointInDegrees: 30.0
GRIB_longitudeOfFirstGridPointInDegrees: -90.0
GRIB_longitudeOfLastGridPointInDegrees: 10.0
GRIB_missingValue: 9999
GRIB_name: U component of wind
GRIB_shortName: u
GRIB_totalNumber: 0
GRIB_units: m s**-1
long_name: U component of wind
units: m s**-1
standard_name: eastward_wind
Why does this occur? And how can I fix this?
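The loop above selects each window with a DataArray of timestamps, which (as Out[3] shows) replaces time with a new dim_0 dimension. Below is a minimal sketch of the same windowing using positional slicing with .isel(), on a small synthetic in-memory dataset; the sizes, values and the wind_windows helper are hypothetical. It does not explain the OSError, and whether it avoids the cfgrib seek failure on the real GRIB file is untested. It only shows an indexing pattern that keeps the time dimension and selects one contiguous run of time steps per window.

```python
import numpy as np
import pandas as pd
import xarray as xr

RUNTIME = 40

# Hypothetical in-memory stand-in for the GRIB dataset: 100 hourly steps on a tiny grid
times = pd.to_datetime("2020-08-01") + pd.to_timedelta(np.arange(100), unit="h")
dims = ("time", "isobaricInhPa", "latitude", "longitude")
shape = (times.size, 2, 3, 4)
rng = np.random.default_rng(0)
da = xr.Dataset(
    {
        "u": (dims, rng.random(shape, dtype=np.float32)),
        "v": (dims, rng.random(shape, dtype=np.float32)),
    },
    coords={
        "time": times,
        "isobaricInhPa": [1000.0, 950.0],
        "latitude": [70.0, 69.75, 69.5],
        "longitude": [-90.0, -89.75, -89.5, -89.25],
    },
)


def wind_windows(ds, runtime):
    """Yield (2, runtime, ...) arrays of u and v for each run of consecutive time steps."""
    for t in range(ds.sizes["time"] - runtime + 1):
        # Positional slice: keeps the "time" dimension instead of creating "dim_0"
        window = ds[["u", "v"]].isel(time=slice(t, t + runtime))
        yield np.stack((window["u"].values, window["v"].values))


windows = list(wind_windows(da, RUNTIME))
```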

Is there another alternative to DataArray.sel(time = slice(x, y)) in xarray?

For some reason, DataArray.sel(time=slice(x, y)) works without any problem for January to June, where x and y are both set to the month number (from 1 for January to 6 for June). However, it does not work for July to December. I have checked the input data, a netCDF4 file, and it is not corrupted. Therefore, I am looking for an alternative to DataArray.sel(time=slice(x, y)) in xarray to extract the data for July to December.
The code is as follows:
import xarray as xr
td = xr.open_dataset(r'C:\Users\abc\Desktop\misc\netcdf_to_geotiff\ECLIPSEv5_monthly_patterns.nc')
td_agr = td.agr
td_agrtime = td_agr.sel(time = slice('1', '1'))
which gives the output:
In [7]: td_agrtime
Out[7]:
<xarray.DataArray 'agr' (time: 1, lat: 360, lon: 720)>
[259200 values with dtype=float64]
Coordinates:
* lat (lat) float64 -89.75 -89.25 -88.75 -88.25 ... 88.75 89.25 89.75
* lon (lon) float64 -179.8 -179.2 -178.8 -178.2 ... 178.8 179.2 179.8
* time (time) int32 1
Attributes:
long_name: Monthly weights - Agriculture (animals, rice, soil)
sector: Agriculture (animals, rice, soil)
If the 1 is changed to 7 in the code as follows:
td_agrtime = td_agr.sel(time=slice('7', '7'))
the output is:
In [7]: td_agrtime
Out[9]:
<xarray.DataArray 'agr' (time: 6, lat: 360, lon: 720)>
[1555200 values with dtype=float64]
Coordinates:
* lat (lat) float64 -89.75 -89.25 -88.75 -88.25 ... 88.75 89.25 89.75
* lon (lon) float64 -179.8 -179.2 -178.8 -178.2 ... 178.8 179.2 179.8
* time (time) int32 7 8 9 10 11 12
Attributes:
long_name: Monthly weights - Agriculture (animals, rice, soil)
sector: Agriculture (animals, rice, soil)
Thanks to Robert Davy for his comment. The answer is to use .isel() instead of .sel(): .isel() selects by position, whereas .sel() selects by coordinate label.
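To illustrate the difference, here is a minimal sketch on a synthetic DataArray with the same kind of int32 time coordinate (the values and grid size are made up). Because the coordinate holds integers, the sketch passes integers to .sel() rather than the strings '1' and '7' used above; .isel() instead takes 0-based positions, as in the answer.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for td.agr: monthly weights with an int32 "time" coordinate 1..12
months = np.arange(1, 13, dtype="int32")
rng = np.random.default_rng(0)
agr = xr.DataArray(
    rng.random((12, 2, 3)),
    dims=("time", "lat", "lon"),
    coords={"time": months},
    name="agr",
)

# Label-based: integer labels match the int32 coordinate; both slice ends are inclusive
july_by_label = agr.sel(time=slice(7, 7))

# Position-based: month 7 is at position 6 (0-based); the slice end is exclusive
july_by_position = agr.isel(time=slice(6, 7))

# July to December, selected both ways
second_half_by_label = agr.sel(time=slice(7, 12))
second_half_by_position = agr.isel(time=slice(6, 12))
```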

Python Change Dimension and Coordinates Xarray Dataset

I have an xarray Dataset that looks like the one below. I need to be able to plot any of the three data variables (si10, si10_u, avg) by latitude and longitude. However, I cannot figure out how to change the dimensions from index_id to latitude and longitude, or how to delete "index_id" from Coordinates: when I tried that, 'latitude' and 'longitude' disappeared from "Coordinates" as well. Thank you for any suggestions.
Here is my xarray Dataset:
<xarray.Dataset>
Dimensions: (index: 2448, index_id: 2448)
Coordinates:
* index_id (index_id) MultiIndex
- latitude (index_id) float64 58.0 58.0 58.0 58.0 ... 23.0 23.0 23.0 23.0
- longitude (index_id) float64 -130.0 -129.0 -128.0 ... -65.0 -64.0 -63.0
Dimensions without coordinates: index
Data variables:
si10 (index) float32 1.7636629 1.899161 ... 5.9699616 5.9121003
si10_u (index) float32 1.6784391 1.7533684 ... 6.13361 6.139127
avg (index) float32 1.721051 1.8262646 ... 6.0517855 6.025614
You have two issues. First, you need to replace 'index' with 'index_id' so your data is indexed consistently. Second, to unstack 'index_id', you're looking for xr.Dataset.unstack:
ds = ds.unstack('index_id')
As an example, here's a dataset like yours:
In [16]: y = np.arange(58, 23, -1)
...: x = np.arange(-130, -63, 1)
In [17]: ds = xr.Dataset(
...: data_vars={
...: v: (("index",), np.random.random(len(x) * len(y)))
...: for v in ["si10", "si10_u", "avg"]
...: },
...: coords={
...: "index_id": pd.MultiIndex.from_product(
...: [y, x], names=["latitude", "longitude"],
...: ),
...: },
...: )
In [18]: ds
Out[18]:
<xarray.Dataset>
Dimensions: (index: 2345, index_id: 2345)
Coordinates:
* index_id (index_id) MultiIndex
- latitude (index_id) int64 58 58 58 58 58 58 58 58 ... 24 24 24 24 24 24 24
- longitude (index_id) int64 -130 -129 -128 -127 -126 ... -68 -67 -66 -65 -64
Dimensions without coordinates: index
Data variables:
si10 (index) float64 0.9412 0.7395 0.6843 ... 0.03979 0.4259 0.09203
si10_u (index) float64 0.7359 0.1984 0.5919 ... 0.5535 0.2867 0.4093
avg (index) float64 0.04257 0.1442 0.008705 ... 0.1911 0.2669 0.1498
First, reorganize your data to have consistent dims:
In [19]: index_id = ds['index_id']
In [20]: ds = (
...: ds.drop("index_id")
...: .rename({"index": "index_id"})
...: .assign_coords(index_id=index_id)
...: )
Then, ds.unstack reorganizes the data to be the combinatorial product of all dimensions in the MultiIndex:
In [21]: ds.unstack("index_id")
Out[21]:
<xarray.Dataset>
Dimensions: (latitude: 35, longitude: 67)
Coordinates:
* latitude (latitude) int64 24 25 26 27 28 29 30 31 ... 52 53 54 55 56 57 58
* longitude (longitude) int64 -130 -129 -128 -127 -126 ... -67 -66 -65 -64
Data variables:
si10 (latitude, longitude) float64 0.9855 0.1467 ... 0.6569 0.9479
si10_u (latitude, longitude) float64 0.4672 0.2664 ... 0.4894 0.128
avg (latitude, longitude) float64 0.3738 0.01793 ... 0.1264 0.21
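An alternative route through pandas, not the method used in the answer above, is to pair the MultiIndex with the flat data arrays in a DataFrame and let xr.Dataset.from_dataframe expand the index into one dimension per level. A minimal sketch on synthetic data built the same way as the answer's example; the helper name grid_from_multiindex is hypothetical:

```python
import numpy as np
import pandas as pd
import xarray as xr


def grid_from_multiindex(mindex, data):
    """Build a (latitude, longitude) Dataset from flat arrays aligned with a MultiIndex.

    mindex: pandas MultiIndex with levels named "latitude" and "longitude"
    data:   mapping of variable name -> 1-D array, in the same order as mindex
    """
    df = pd.DataFrame(dict(data), index=mindex)
    # from_dataframe expands the MultiIndex into one dimension per level
    return xr.Dataset.from_dataframe(df)


# Synthetic example mirroring the answer's construction
y = np.arange(58, 23, -1)
x = np.arange(-130, -63, 1)
mindex = pd.MultiIndex.from_product([y, x], names=["latitude", "longitude"])
rng = np.random.default_rng(0)
flat = {v: rng.random(len(mindex)) for v in ["si10", "si10_u", "avg"]}

gridded = grid_from_multiindex(mindex, flat)
```

For the dataset in the question, the inputs would presumably be ds.indexes["index_id"] and {v: ds[v].values for v in ["si10", "si10_u", "avg"]}. This assumes, as the answer does, that position i along 'index' and along 'index_id' refer to the same point.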

Multiply xarray datasets with different dimensions

I have two NetCDF files: one covers the continental US (dataset2) and the other only the northeast (dataset1). I'm trying to multiply the two variables together to create one dataset, but I get a ValueError when doing the multiplication.
import xarray
dataset1=xarray.open_dataset('../data/precip.nc')
print(dataset1)
Output:
<xarray.Dataset>
Dimensions: (time: 24, x: 180, y: 235)
Coordinates:
* time (time) datetime64[ns] 2019-02-14 ... 2019-02-14T23:00:00
* y (y) float64 -4.791e+06 -4.786e+06 ... -3.681e+06 -3.677e+06
* x (x) float64 2.234e+06 2.238e+06 2.243e+06 ... 3.081e+06 3.086e+06
lat (y, x) float64 ...
lon (y, x) float64 ...
Data variables:
z (y, x) float64 ...
crs int32 ...
PRECIP (time, y, x) float32 ...
dataset2=xarray.open_dataset('../data/ratio.nc')
print(dataset2)
Output:
<xarray.Dataset>
Dimensions: (lat: 272, lon: 480, nv: 2)
Coordinates:
* lat (lat) float64 21.06 21.19 21.31 21.44 ... 54.69 54.81 54.94
* lon (lon) float64 -125.9 -125.8 -125.7 ... -66.31 -66.19 -66.06
Dimensions without coordinates: nv
Data variables:
lat_bounds (lat, nv) float64 ...
lon_bounds (lon, nv) float64 ...
crs int16 ...
Data (lat, lon) float32 ...
# Merge datasets
data=xarray.merge([dataset1, dataset2], compat='override')
print(data)
Output:
<xarray.Dataset>
Dimensions: (lat: 272, lon: 480, nv: 2, time: 24, x: 180, y: 235)
Coordinates:
* time (time) datetime64[ns] 2019-02-14 ... 2019-02-14T23:00:00
* y (y) float64 -4.791e+06 -4.786e+06 ... -3.681e+06 -3.677e+06
* x (x) float64 2.234e+06 2.238e+06 ... 3.081e+06 3.086e+06
* lat (lat) float64 21.06 21.19 21.31 21.44 ... 54.69 54.81 54.94
* lon (lon) float64 -125.9 -125.8 -125.7 ... -66.31 -66.19 -66.06
Dimensions without coordinates: nv
Data variables:
z (y, x) float64 ...
crs int32 ...
PRECIP (time, y, x) float32 ...
lat_bounds (lat, nv) float64 ...
lon_bounds (lon, nv) float64 ...
Data (lat, lon) float32 ...
# Get first hour of precip data
precip=data.PRECIP[0:, :, :]
# Get ratio data
slr=data.Data
# Multiply to get snowfall
snow=slr*precip
That last line then gives me this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-41-b9ed8e05f451> in <module>
----> 1 snow=slr*precip
~/.local/lib/python3.7/site-packages/xarray/core/dataarray.py in func(self, other)
2597 variable = (
2598 f(self.variable, other_variable)
-> 2599 if not reflexive
2600 else f(other_variable, self.variable)
2601 )
~/.local/lib/python3.7/site-packages/xarray/core/variable.py in func(self, other)
2034 new_data = (
2035 f(self_data, other_data)
-> 2036 if not reflexive
2037 else f(other_data, self_data)
2038 )
ValueError: iterator is too large
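Solved following https://gis.stackexchange.com/questions/339463/using-xarray-to-resample-and-merge-two-datasets, by interpolating the ratio data onto the precipitation grid's lat/lon before multiplying:
precip = dataset1.PRECIP  # keeps the 2-D lat/lon coordinates of the precip grid
slr = dataset2.Data
slr_interpolate = slr.interp(lat=precip["lat"], lon=precip["lon"])
mpe_snowfall = slr_interpolate * precip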
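The .interp() call above needs SciPy. As a dependency-free illustration of the same idea (bring the regular lat/lon ratio grid onto the projected grid's 2-D lat/lon, so both share (y, x) and broadcasting only adds time), here is a minimal sketch that swaps in nearest-neighbour lookup with .sel(..., method="nearest") instead of interpolation. All arrays are synthetic stand-ins with hypothetical names, sizes and values; with real data, interp would give interpolated ratios rather than the nearest grid value.

```python
import numpy as np
import xarray as xr

# Hypothetical coarse regular grid standing in for dataset2.Data (ratio on 1-D lat/lon)
lat = np.arange(20.0, 56.0, 1.0)
lon = np.arange(-126.0, -65.0, 1.0)
ratio = xr.DataArray(
    np.repeat(lat[:, None], lon.size, axis=1),  # ratio = latitude, so results are easy to check
    dims=("lat", "lon"),
    coords={"lat": lat, "lon": lon},
    name="Data",
)

# Hypothetical projected grid standing in for dataset1.PRECIP (2-D lat/lon over y, x)
ny, nx = 4, 5
lat2d = xr.DataArray(np.repeat(np.linspace(40.2, 44.8, ny)[:, None], nx, axis=1), dims=("y", "x"))
lon2d = xr.DataArray(np.repeat(np.linspace(-75.3, -70.6, nx)[None, :], ny, axis=0), dims=("y", "x"))
precip = xr.DataArray(
    np.ones((3, ny, nx), dtype="float32"),
    dims=("time", "y", "x"),
    coords={"lat": lat2d, "lon": lon2d},
    name="PRECIP",
)

# Pointwise (vectorized) lookup: one ratio value per (y, x) cell, from the nearest grid point
ratio_on_precip = ratio.sel(lat=lat2d, lon=lon2d, method="nearest")
# Drop the selected grid lat/lon so they do not clash with precip's own coordinates
ratio_on_precip = ratio_on_precip.drop_vars(["lat", "lon"], errors="ignore")

# Both now share (y, x); precip comes first so the result keeps its (time, y, x) order
snow = precip * ratio_on_precip
```

The key point is that the ratio is evaluated at the precipitation grid's cells before multiplying; multiplying arrays on unrelated (lat, lon) and (y, x) dimensions broadcasts them into one very large array instead.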

Xarray: Make two DataArrays in the same Dataset use the same coordinate system

I have an ArviZ InferenceData posterior trace, which is an xarray Dataset.
In it, the posterior traces for two of my random variables, a_mu_org and b_mu_org, are DataArrays. Their dimensions are:
a_mu_org: (chain, draw, a_mu_org), with lengths (1, 2000, 15) respectively.
b_mu_org: (chain, draw, b_mu_org), with lengths (1, 2000, 15) respectively.
Semantically, a_mu_org and b_mu_org should really be indexed by a single categorical coordinate system of 15 organisms, rather than be separate indexes.
For a bit more clarity, here is the full dataset string repr:
<xarray.Dataset>
Dimensions: (L_dim_0: 34281, a_dim_0: 456260, a_prot_shift_dim_0: 34281, b_dim_0: 456260, b_mu_org_dim_0: 15, b_prot_shift_dim_0: 34281, chain: 1, draw: 2000, organism: 15, sigma_dim_0: 34281, t50_org_dim_0: 15, t50_prot_dim_0: 39957)
Coordinates:
* chain (chain) int64 0
* draw (draw) int64 0 1 2 3 4 5 ... 1995 1996 1997 1998 1999
* a_prot_shift_dim_0 (a_prot_shift_dim_0) object 'A0A023PXQ4_YMR173W-A' ... 'Z4YNA9_AB124611'
* b_prot_shift_dim_0 (b_prot_shift_dim_0) object 'A0A023PXQ4_YMR173W-A' ... 'Z4YNA9_AB124611'
* L_dim_0 (L_dim_0) object 'A0A023PXQ4_YMR173W-A' ... 'Z4YNA9_AB124611'
a_mu_org_dim_0 (organism) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
* a_dim_0 (a_dim_0) object 'ytzI' 'mtlF' ... 'atpG2' 'atpB2'
* b_mu_org_dim_0 (b_mu_org_dim_0) int64 0 1 2 3 4 5 ... 9 10 11 12 13 14
* b_dim_0 (b_dim_0) object 'ytzI' 'mtlF' ... 'atpG2' 'atpB2'
* t50_prot_dim_0 (t50_prot_dim_0) <U65 'Bacillus subtilis_168_lysate_R1-C0H3Q1_ytzI' ... 'Oleispira antarctica_RB-8_lysate_R1-R4YVF0_atpB2'
* t50_org_dim_0 (t50_org_dim_0) <U43 'Arabidopsis thaliana seedling lysate' ... 'Thermus thermophilus HB27 lysate'
* sigma_dim_0 (sigma_dim_0) object 'A0A023PXQ4_YMR173W-A' ... 'Z4YNA9_AB124611'
Dimensions without coordinates: organism
Data variables:
a_org_pop (chain, draw) float32 519.3236 518.8292 ... 517.84784
a_prot_shift (chain, draw, a_prot_shift_dim_0) float32 ...
b_org_pop (chain, draw) float32 11.509291 11.445394 ... 11.929538
b_prot_shift (chain, draw, b_prot_shift_dim_0) float32 ...
L_pop (chain, draw) float32 3.445896 3.4300675 ... 3.3917112
L (chain, draw, L_dim_0) float32 ...
a_mu_org (chain, draw, organism) float32 430.56827 ... 813.2518
a (chain, draw, a_dim_0) float32 ...
b_mu_org (chain, draw, b_mu_org_dim_0) float32 9.997488 ... 8.389757
b (chain, draw, b_dim_0) float32 ...
t50_prot (chain, draw, t50_prot_dim_0) float32 39.249863 ... 52.19809
t50_org (chain, draw, t50_org_dim_0) float32 43.067646 ... 96.93388
sigma (chain, draw, sigma_dim_0) float32 ...
Attributes:
created_at: 2020-04-23T08:54:58.300091
arviz_version: 0.7.0
inference_library: pymc3
inference_library_version: 3.8
I would like a_mu_org and b_mu_org to take on dimensions (chain, draw, organism) instead of their separate a_mu_org_dim_0 and b_mu_org_dim_0 dimensions. Things I have already tried include:
Adding a coordinate called organism, and then doing trace.posterior.swap_dims({"a_mu_org_dim_0": "organism"}), but I get an error stating that "replacement dimension 'organism' is not a 1D variable along the old dimension 'a_mu_org_dim_0'".
Renaming the dimension a_mu_org_dim_0 to organism, but then I also can't swap b_mu_org_dim_0 to the new organism.
Is what I'm trying to accomplish possible?
I am not sure my solution is good practice; it feels a little hacky. Also, the terminology is tricky: I'll try to stick to xarray terminology but may not always manage. The trick is to remove the coordinates so that a_dim_0 and b_dim_0 become plain dimensions (dimensions without coordinates). They can then be renamed to the same name and assigned a new coordinate. Here is an example:
Starting from the following dataset called ds:
<xarray.Dataset>
Dimensions: (a_dim_0: 15, b_dim_0: 15, chain: 4, draw: 100)
Coordinates:
* chain (chain) int64 0 1 2 3
* draw (draw) int64 0 1 2 3 4 5 6 7 8 9 ... 90 91 92 93 94 95 96 97 98 99
* a_dim_0 (a_dim_0) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
* b_dim_0 (b_dim_0) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Data variables:
a (chain, draw, a_dim_0) float64 0.8152 1.189 ... 1.32 -0.2023
b (chain, draw, b_dim_0) float64 0.6447 -0.8059 ... -0.06435 -0.8666
the following three chained calls do the trick (the position of the assign_coords call does not seem to affect the output, which makes sense, but it is key to first remove the coordinates and then rename):
organism_names = [f"o{i}" for i in range(15)]
ds.reset_index(["a_dim_0", "b_dim_0"], drop=True) \
.assign_coords(organism=organism_names) \
.rename({"a_dim_0": "organism", "b_dim_0": "organism"})
Output:
<xarray.Dataset>
Dimensions: (chain: 4, draw: 100, organism: 15)
Coordinates:
* chain (chain) int64 0 1 2 3
* draw (draw) int64 0 1 2 3 4 5 6 7 8 9 ... 90 91 92 93 94 95 96 97 98 99
* organism (organism) <U3 'o0' 'o1' 'o2' 'o3' ... 'o11' 'o12' 'o13' 'o14'
Data variables:
a (chain, draw, organism) float64 0.8152 1.189 ... 1.32 -0.2023
b (chain, draw, organism) float64 0.6447 -0.8059 ... -0.8666
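The same steps can also be written with drop_vars in place of reset_index(..., drop=True), which removes the integer coordinates while keeping both dimensions. A minimal sketch on a synthetic dataset shaped like ds above (the random values are placeholders). Note that the rename pairs a and b purely by position, so it assumes both variables list the organisms in the same order.

```python
import numpy as np
import xarray as xr

# Synthetic dataset shaped like the answer's example (values are random placeholders)
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {
        "a": (("chain", "draw", "a_dim_0"), rng.normal(size=(4, 100, 15))),
        "b": (("chain", "draw", "b_dim_0"), rng.normal(size=(4, 100, 15))),
    },
    coords={
        "chain": np.arange(4),
        "draw": np.arange(100),
        "a_dim_0": np.arange(15),
        "b_dim_0": np.arange(15),
    },
)

organism_names = [f"o{i}" for i in range(15)]

merged = (
    ds.drop_vars(["a_dim_0", "b_dim_0"])  # keep the dimensions, drop their coordinates
    .rename({"a_dim_0": "organism", "b_dim_0": "organism"})  # both now share one dimension
    .assign_coords(organism=organism_names)  # label the shared dimension
)
```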
