I am trying to multiply two Gaussian distributions to obtain the posterior for GMM data. To do that, I am trying to use the .prob() function from tf.contrib.distributions.MultivariateNormalDiag, but I keep getting the same error, even though I am providing the argument as float64.
I am using TensorFlow 1.8 version.
x = tf.placeholder(tf.float64, [None,2], name="input")
likelihood = tf.contrib.distributions.MultivariateNormalDiag(loc = [0., 0., 0.], scale_diag= [1., 1., 1.])
y_LL = likelihood.prob(x).eval()
TypeError: Input had dtype <dtype: 'float32'> but expected <dtype: 'float64'>.
I am confused about whether I am doing this the wrong way. Can someone please help me with this?
In this example, you declare x as tf.float64, but unless you explicitly specify otherwise, TensorFlow auto-converts Python list inputs to tf.float32, so the distribution parameters end up as float32. You want to do something like the following (not executable code on its own, but demonstrating that you need to signal float64):
import numpy as np
likelihood = tf.contrib.distributions.MultivariateNormalDiag(loc=np.float64([0., 0., 0.]), scale_diag=np.float64([1., 1., 1.]))
y_LL = likelihood.prob(x).eval()
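For reference, a fuller sketch of the same idea that can be run end to end (note I use a 3-component placeholder here so it matches the 3-component loc; adjust both to your actual event size):
import numpy as np
import tensorflow as tf
tfd = tf.contrib.distributions
# The placeholder and the distribution parameters must share the same dtype (float64 here).
x = tf.placeholder(tf.float64, [None, 3], name="input")
likelihood = tfd.MultivariateNormalDiag(loc=np.zeros(3, dtype=np.float64),
                                        scale_diag=np.ones(3, dtype=np.float64))
y_LL = likelihood.prob(x)
with tf.Session() as sess:
    print(sess.run(y_LL, feed_dict={x: np.zeros((5, 3), dtype=np.float64)}))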
I'm working on a multidimensional dataset using xarray and had some issues with eofs, the EOF analysis package, and particularly with its xarray interface.
My xarray DataArray looks like this:
<xarray.DataArray 'timeMonthly_avg_flux' (time: 1800, y: 601, x: 601)>
array([[[0., 0., ..., 0., 0.],
[0., 0., ..., 0., 0.],
...,
[0., 0., ..., 0., 0.],
[0., 0., ..., 0., 0.]],
[[0., 0., ..., 0., 0.],
[0., 0., ..., 0., 0.],
...,
[0., 0., ..., 0., 0.],
[0., 0., ..., 0., 0.]]])
Coordinates:
lat (y, x) float64 ...
lon (y, x) float64 ...
time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2150-12-31
x (x) float64 -3e+06 -2.99e+06 -2.98e+06 ... 2.98e+06 2.99e+06 3e+06
y (y) float64 -3e+06 -2.99e+06 -2.98e+06 ... 2.98e+06 2.99e+06 3e+06
The problem arises when I run the following:
from eofs.xarray import Eof
solver = Eof(flux) # flux is the above DataArray
flux_eofs = solver.eofs()
for which I get the following TypeError:
TypeError: Using a DataArray object to construct a variable is ambiguous, please extract the data using the .data property.
Other methods of this solver work as intended; for example, I am able to compute the principal components as below:
flux_pcs = solver.pcs()
The dataset does have NaN values, but as far as I can tell, the eofs.xarray module has been designed to handle NaNs. For now, my workaround has been to convert the dataset into a Numpy array and use the eofs.standard interface instead, and convert the outputs back into xarray Datasets/DataArrays as required. All methods work as intended when I do this:
from eofs.standard import Eof
flux_np = flux.to_numpy()
solver = Eof(flux_np)
flux_eofs = solver.eofs()
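Wrapping the numpy output back into a DataArray then looks roughly like this (just a sketch, assuming the standard interface returns the modes with the original spatial shape):
import numpy as np
import xarray as xr
# Rebuild a DataArray from the numpy EOFs, reusing the spatial coordinates of flux.
flux_eofs_da = xr.DataArray(
    flux_eofs,
    dims=("mode", "y", "x"),
    coords={"mode": np.arange(flux_eofs.shape[0]), "y": flux.y, "x": flux.x},
)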
I could find two other instances of this error being raised: one as part of the w2w package, where it seems to have been something to do with the Python environment, and one as part of the PyWake project, but it's not clear to me what the problem was there.
For everyone encountering this issue: this is a bug which has also been raised and discussed on GitHub (also by the author of this question :)). The xarray compatibility is a bit broken currently.
For the time being this can be fixed manually by editing the file eofs/lib/eofs/xarray.py, changing lines 638 to 640 from
# Add non-dimension coordinates.
pcs.coords.update({coord.name: (coord.dims, coord)
for coord in time_ndcoords})
to
# Add non-dimension coordinates.
pcs.coords.update({coord.name: (coord.dims, coord.data)
for coord in time_ndcoords})
There is a pull request fixing this that has unfortunately not been merged yet.
Sorry for the shameless self-promotion ;) you may want to give xeofs a try. It provides EOF analysis (and more) in xarray.
I recently ran into the same error message in my project (it is different in nature to yours). I pip uninstalled the latest version of xarray on my PC (0.20.2) and installed an older version (0.16.0), and (at least) that error went away.
I need to efficiently process very large 1D arrays, extracting some statistics per bin, and I have found the function binned_statistic from scipy.stats very useful, as its 'statistic' argument works quite efficiently.
I would like to perform a 'count' function but without considering zero values.
In parallel I am working with sliding windows (the pandas rolling function) over the same arrays, and there it works nicely to substitute NaN for the zeros, but that behavior does not carry over to binned_statistic.
This is a toy example of what I am doing:
import numpy as np
import pandas as pd
from scipy.stats import binned_statistic
# As example with sliding windows, this returns just the length of each window:
a = np.array([1., 0., 0., 1.])
pd.Series(a).rolling(2).count() # Returns [1.,2.,2.,2.]
# You can make the count to do it only if not zero:
nonzero_a = a.copy()
nonzero_a[nonzero_a==0.0] = np.nan
pd.Series(nonzero_a).rolling(2).count() # Returns [1.,1.,0.,1.]
# However, with binned_statistic I am not able to do anything similar:
binned_statistic(range(4), a, bins=2, statistic='count')[0]
binned_statistic(range(4), nonzero_a, bins=2, statistic='count')[0]
binned_statistic(range(4), np.array([1., False, None, 1.]), bins=2, statistic='count')[0]
All the previous runs provide the same output: [2., 2.] but I am expecting [1., 1.].
The only option I have found is to pass a custom function, but on real cases it performs considerably worse than the built-in statistics:
binned_statistic(range(4), a, bins=2, statistic=np.count_nonzero)
I have found an easy way to replicate the nonzero count by transforming the array to 0-1 and applying 'sum':
# Transform all non-zero to 1s
a = np.array([1., 0., 0., 2.])
nonzero_a = a.copy()
nonzero_a[nonzero_a>0.0]=1.0 # nonzero_a = [1., 0., 0., 1.]
binned_statistic(np.arange(len(nonzero_a)), nonzero_a, bins=2, statistic='sum')[0] # Returns [1.0, 1.0]
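Equivalently, the 0-1 transformation can be done with a boolean mask, without modifying a copy in place (same idea, just more compact):
a = np.array([1., 0., 0., 2.])
# Count nonzero values per bin by summing the 0/1 mask.
binned_statistic(np.arange(len(a)), (a != 0).astype(float), bins=2, statistic='sum')[0]  # Returns [1.0, 1.0]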
My dataset contains one numerical feature and one categorical feature. It only has 20 observations (for the question purpose).
X is a numpy array of shape (20,1) and is like:
array([[10],
[465],
[3556],
[899],
[90],
....]]
encoded_x is a numpy array of shape (20,4) and is like:
array([[ 0., 1., 0., 0.],
[ 1., 0., 0., 0.],
[ 0., 0., 1., 0.],
[ 0., 0., 1., 0.],
...................]]
Question: Now, how can I merge those array to give them as input to Xgboost?
How should the final array look like?
My understanding is that numerical features should not be one-hot encoded, which is why I have two distinct arrays.
The XGBoost approach is a bit different from, say, neural networks. It requires a single numerical matrix as input, and this makes you think differently about what a feature is.
From your point of view, there are 2 features: one categorical and one numerical. But XGBoost sees 5 features, 4 of which, for some reason, take just two values: 0 or 1. XGBoost doesn't know about one-hot encoding, it sees only numbers.
As a result, no matter how you encode your categorical feature (ordinal or one-hot), you should just concatenate all of the resulting arrays into a single 2D array and fit the model on it.
x1 = np.arange(20).reshape([-1, 1]) # numerical feature
x2 = np.random.randint(0, 2, size=[20, 4]) # not one-hot, but still ok for XGBoost
x = np.concatenate([x1, x2], axis=1) # now it's 5 XGBoost features
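From there, fitting is the usual call; a minimal sketch with the scikit-learn wrapper and a made-up binary target y (y is not part of the question, it is only here for illustration):
import numpy as np
import xgboost as xgb
y = np.random.randint(0, 2, size=20)       # hypothetical target, 20 labels to match the 20 rows
model = xgb.XGBClassifier(n_estimators=10)
model.fit(x, y)                             # x is the (20, 5) concatenated matrix from above
print(model.predict(x[:5]))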
I'm following this repository (https://github.com/gitlimlab/SSGAN-Tensorflow) and trying to use my own dataset. As mentioned there:
Store your data as an h5py file datasets/YOUR_DATASET/data.hy and each data point contains
'image': has shape [h, w, c], where c is the number of channels (grayscale images: 1, color images: 3)
'label': represented as a one-hot vector
I could not find anything that helps with creating a file with the same extension (data.hy), but I tried to follow the main h5py tutorial:
import h5py
f = h5py.File("dataset.hy", "w")
dataset = f.create_dataset("default", shape=(3,10)) #I have ten classes
but to check that the initialization is correct I printed dataset[0], which gave the following output:
In [7]: dataset.shape
Out[7]: (3, 10)
In [8]: dataset[0]
Out[8]: array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)
This obviously means that I did not shape the dataset correctly, but I don't know how to fix it. I know that h5py follows the same shaping conventions as numpy, but I am not sure how to fix it here.
EDIT:
What I want to do is fix the shape of the dataset so that each data point has two fields, each holding a 1-d vector with a different number of elements, e.g.
[[h,w,c],[0,1,2,3,4,5,6,7,8,9]]
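For what it's worth, a minimal sketch of the group-per-data-point layout I believe the repo describes (the group names, image size, and dtype are just placeholders on my part):
import h5py
import numpy as np
h, w, c = 32, 32, 3            # placeholder image size
n_points, n_classes = 3, 10
with h5py.File("datasets/YOUR_DATASET/data.hy", "w") as f:
    for i in range(n_points):
        grp = f.create_group(str(i))                               # one group per data point
        grp.create_dataset("image", data=np.zeros((h, w, c), np.float32))
        label = np.zeros(n_classes, np.float32)
        label[i % n_classes] = 1.0                                 # one-hot label
        grp.create_dataset("label", data=label)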
I have a data structure that looks like a list of values, and I am trying to compute the (x, y) 2d Hermite functions from them using numpy. I'm trying to use as many numpy arrays as possible due to the performance boost you get from getting to Fortran as quickly as possible (in practice I expect x to contain many thousands of 3-arrays). Specifically, my code looks like this:
x = np.array([[1., 2., 3.], [4., 5., 6.]])
coefs = np.array([[[1., 0.],[0., 1.]], [[0., 1.], [1., 0.]]])
z = np.array([0., 0.])
z[:] = hermval2d(x[:,0], x[:,1], coefs[:])
This returns an error about the shape. Running the hermval2d call on its own, instead of assigning its result to z, gives:
In [XX]: hermval2d(x[:,0], x[:,1], coefs[:])
Out[XX]:
array([[ 9., 81.],
[ 6., 18.]])
I would expect hermval2d to return a scalar for every x, y, and coefficient matrix, which is what you would expect from the documentation. So what am I missing here? What's the score?
It's right there in the docs :)
hermval2d(x, y, c)
[...]
The shape of the result will be c.shape[2:] + x.shape
In your case this seems to return the Hermite values for x and y evaluated for each ith 2d array in c[:,:,i].
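If what you want is one scalar per row of x, each paired with its own coefficient matrix, the straightforward way (a sketch of that reading, not necessarily the fastest) is to evaluate the points one at a time:
import numpy as np
from numpy.polynomial.hermite import hermval2d
x = np.array([[1., 2., 3.], [4., 5., 6.]])
coefs = np.array([[[1., 0.], [0., 1.]], [[0., 1.], [1., 0.]]])
# Pair each (x, y) point with the corresponding 2x2 coefficient matrix.
z = np.array([hermval2d(xi, yi, ci) for (xi, yi), ci in zip(x[:, :2], coefs)])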