why is numpy.fromstring reading numbers wrong? - python

I am writing code that uses numpy.fromstring to read arrays from xml element text.
It runs with no error, but what it reads is very strange.
for example
import numpy as np
nr = 24
r_string = '''
0.0000 0.0100 0.0200 0.0300 0.0400 0.0500 0.0600 0.0700
0.0800 0.0900 0.1000 0.1100 0.1200 0.1300 0.1400 0.1500
0.1600 0.1700 0.1800 0.1900 0.2000 0.2100 0.2200 0.2300
'''
r = np.fromstring(r_string, count = nr)
print(r)
prints the following(garbage)
[1.20737375e-153 1.48440234e-076 1.30354286e-076 6.96312257e-077
6.01356142e-154 1.20737830e-153 1.82984908e-076 1.30354286e-076
6.96312257e-077 3.22522589e-086 6.01347037e-154 6.03686893e-154
1.39804459e-076 9.72377416e-072 3.24245662e-086 6.01347037e-154
6.03686880e-154 1.39939399e-076 1.79371973e-052 1.91654811e-076
8.54289848e-072 6.96312257e-077 6.01356142e-154 1.20738399e-153]
What is going on here?
I will appreciate help here.

you need to declare sep=' '
>>> r = np.fromstring(r_string, count = nr, sep=' ')
>>> r
array([0. , 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1 ,
0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2 , 0.21,
0.22, 0.23])

Running np.fromstring without the sep specified will actually throw the warning:
DeprecationWarning: The binary mode of fromstring is deprecated, as it behaves surprisingly on unicode inputs.
You need to specify your seperator, like:
np.fromstring(r_string, sep="\t")
Output:
array([0. , 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1 ,
0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2 , 0.21,
0.22, 0.23])

Related

Plotly python regression in ternary space

I'm trying to draw a regression line in plotly python in ternary space, but there doesn't seem to be an option like "trendline = 'loess' for scatter ternaries. Is there another way to achieve the same result for ternaries? Code from a previous post that makes a spline line but not a regression.
import numpy as np
import plotly.graph_objects as go
a = np.array([0.15, 0.15, 0.17, 0.2 , 0.21, 0.24, 0.26, 0.27, 0.27, 0.29, 0.32, 0.35, 0.39, 0.4 , 0.4 , 0.41, 0.47, 0.48, 0.51, 0.52, 0.54, 0.56, 0.59, 0.62, 0.63, 0.65, 0.69, 0.73, 0.74])
b = np.array([0.14, 0.15, 0.1 , 0.17, 0.17, 0.18, 0.05, 0.16, 0.17, 0.04, 0.03, 0.14, 0.13, 0.13, 0.14, 0.14, 0.13, 0.13, 0.14, 0.14, 0.15, 0.16, 0.18, 0.2 , 0.21, 0.22, 0.24, 0.25, 0.25])
c = np.array([0.71, 0.7 , 0.73, 0.63, 0.62, 0.58, 0.69, 0.57, 0.56, 0.67, 0.65, 0.51, 0.48, 0.47, 0.46, 0.45, 0.4 , 0.39, 0.35, 0.34, 0.31, 0.28, 0.23, 0.18, 0.16, 0.13, 0.07, 0.02, 0.01])
fig = go.Figure()
curve_portion = np.where((b < 0.15) & (c > 0.6))
curve_other_portion = np.where(~((b < 0.15) & (c > 0.6)))
def add_plot_spline_portions(fig, indices_groupings):
for indices in indices_groupings:
fig.add_trace(go.Scatterternary({
'mode': 'lines',
'connectgaps': True,
'a': a[indices],
'b': b[indices],
'c': c[indices],
'line': {'color': 'black', 'shape': 'spline', 'smoothing': 1},
'marker': {'size': 2, 'line': {'width': 0.1}}
})
)
add_plot_spline_portions(fig, [curve_portion, curve_other_portion])
fig.show(renderer='png')
I can outline what I think is a general sort of solution - it doesn't have as much mathematical rigor as I would like, and involves some guess and check type work - but hopefully it's helpful.
The first consideration is that for this regression on a ternary plot, there are only two degrees of freedom because A+B+C=1 (you might find this explanation helpful). This means it only makes sense to consider the relationship between two of the variables at a time. What we really want to do is create a regression between two of the variables, then determine the value of the third variable using the equation A+B+C=1.
The second consideration is bit harder to define, but since you are after a regression that captures the "reversing" nature of the variable A, we want a regression where A can take on repeated values. I think the most straightforward way to achieve this is for A to be the variable you are predicting.
For simplicity sake, let's say we use a degree 2 polynomial regression that predicts A from either B or C. We can make a scatter and choose whichever polynomial will have a better fit for our purposes.
Here is a quick eda:
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
a = np.array([0.15, 0.15, 0.17, 0.2 , 0.21, 0.24, 0.26, 0.27, 0.27, 0.29, 0.32, 0.35, 0.39, 0.4 , 0.4 , 0.41, 0.47, 0.48, 0.51, 0.52, 0.54, 0.56, 0.59, 0.62, 0.63, 0.65, 0.69, 0.73, 0.74])
b = np.array([0.14, 0.15, 0.1 , 0.17, 0.17, 0.18, 0.05, 0.16, 0.17, 0.04, 0.03, 0.14, 0.13, 0.13, 0.14, 0.14, 0.13, 0.13, 0.14, 0.14, 0.15, 0.16, 0.18, 0.2 , 0.21, 0.22, 0.24, 0.25, 0.25])
c = np.array([0.71, 0.7 , 0.73, 0.63, 0.62, 0.58, 0.69, 0.57, 0.56, 0.67, 0.65, 0.51, 0.48, 0.47, 0.46, 0.45, 0.4 , 0.39, 0.35, 0.34, 0.31, 0.28, 0.23, 0.18, 0.16, 0.13, 0.07, 0.02, 0.01])
## eda to determine polynomial of best fit to predict A
fig_eda = make_subplots(rows=1, cols=2)
fig_eda.add_trace(go.Scatter(x=b, y=a, mode='markers'),row=1, col=1)
coefficients = np.polyfit(b,a,2)
p = np.poly1d(coefficients)
b_vals = np.linspace(min(b),max(b))
a_pred = np.array([p(x) for x in b_vals])
fig_eda.add_trace(go.Scatter(x=b_vals, y=a_pred, mode='lines'),row=1, col=1)
fig_eda.add_trace(go.Scatter(x=c, y=a, mode='markers'),row=1, col=2)
coefficients = np.polyfit(c,a,2)
p = np.poly1d(coefficients)
c_vals = np.linspace(min(c),max(c))
a_pred = np.array([p(x) for x in c_vals])
fig_eda.add_trace(go.Scatter(x=c_vals, y=a_pred, mode='lines'),row=1, col=2)
Notice how predicting A from B looks like it captures the reversing nature of A better than predicting A from C. If we try to make a degree 2 polynomial regression on A from C, we can see A is not going to repeat within the domain of C: [0,1] because of the very low sloping nature of that polynomial.
So let's proceed with this regression with C as the predictor variable, and A as the predicted variable (and B also being a predicted variable using B = 1 - (A + C).
fig = go.Figure()
fig.add_trace(go.Scatterternary({
'mode': 'markers',
'connectgaps': True,
'a': a,
'b': b,
'c': c
}))
## since A+B+C = 100, we only need to fit a polynomial between two of the variables
## fit an n-degree polynomial to 2 of your variables
## source https://numpy.org/doc/stable/reference/generated/numpy.polyfit.html
coefficients = np.polyfit(b,a,2)
p = np.poly1d(coefficients)
## we use the entire domain of the input variable B
b_vals = np.linspace(0,1)
a_pred = np.array([p(x) for x in b_vals])
c_pred = 1 - (b_vals + a_pred)
fig.add_trace(go.Scatterternary({
'mode': 'lines',
'connectgaps': True,
'a': a_pred,
'b': b_vals,
'c': c_pred,
'marker': {'size': 2, 'color':'red', 'line': {'width': 0.1}}
}))
fig.show()
This is the lowest degree polynomial regression that allows for repeated values of A (a linear regression to predict A would be the wouldn't allow A to take on repeated values). However, you can definitely experiment with increasing the degree of the polynomial you are using, and predicting A from either variables B or C.

Generation of a nested list with different ranges

Generation of a list of many lists each with different ranges
Isc_act = [0.1, 0.2, 0.3]
I_cel = []
a = []
for i in range(0,len(Isc_act)):
a = np.arange(0, Isc_act[i], 0.1*Isc_act[i])
I_cel[i].append(a)
print(I_cel)
Output is:
IndexError: list index out of range
My code is giving error. But, I want to get I_cel = [[0,0.01,..,0.1],[0,0.02,0.04,...,0.2],[0, 0.03, 0.06,...,0.3]]. Hence, the 'nested list' I_cel has three lists and each list has 10 values.
The simplest fix to your code, probably what you were intending to do:
Isc_act = [0.1, 0.2, 0.3]
I_cel = []
for i in range(0,len(Isc_act)):
a = np.arange(0, Isc_act[i], 0.1*Isc_act[i])
I_cel.append(a)
print(I_cel)
Note that the endpoint will be one step less than you wanted! For example row zero, you have to pick two of the below:
Steps of size 0.01
Start point 0.0 and end point 0.1
10 elements total
You can not have all three.
More numpythonic approach:
>>> Isc_act = [0.1, 0.2, 0.3]
>>> (np.linspace(0, 1, 11).reshape(11,1) # [Isc_act]).T
array([[0. , 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1 ],
[0. , 0.02, 0.04, 0.06, 0.08, 0.1 , 0.12, 0.14, 0.16, 0.18, 0.2 ],
[0. , 0.03, 0.06, 0.09, 0.12, 0.15, 0.18, 0.21, 0.24, 0.27, 0.3 ]])
linspace gives better control of the end point when dealing with floats:
In [84]: [np.linspace(0,x,11) for x in [.1,.2,.3]]
Out[84]:
[array([0. , 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1 ]),
array([0. , 0.02, 0.04, 0.06, 0.08, 0.1 , 0.12, 0.14, 0.16, 0.18, 0.2 ]),
array([0. , 0.03, 0.06, 0.09, 0.12, 0.15, 0.18, 0.21, 0.24, 0.27, 0.3 ])]
Or we could scale just one array (arange with integers is predictable):
In [86]: np.array([.1,.2,.3])[:,None]*np.arange(0,11)
Out[86]:
array([[0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ],
[0. , 0.2, 0.4, 0.6, 0.8, 1. , 1.2, 1.4, 1.6, 1.8, 2. ],
[0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 2.1, 2.4, 2.7, 3. ]])

Numpy advanced indexing usage

Case 1 (solved): Array A has shape (say) (300,50). Array B is an indices array with the shape (300,5), such that B[i,j] indicate for the row i the index of another row to "concate" next to the row i. The end result is an array C with the shape (300,5,50), such that C[i,j,:] = A[B[i,j],:]. This can be done by calling A[B,:].
Here is small script example for case 1:
import numpy as np
## A is the data array
A = np.arange(20).reshape((5,4))
## B indicate for each row which rows to pull together
B = np.array([[0,2],[1,2],[2,0],[3,4],[4,1]])
A[B,:] #The desired result
Case 2 (unsolved): Same problem, only now A is shaped (100,300,50). If B is the indicies matrix shaped (100,300,5), the end result would be an array C with the shape (100,300,5,50) such that C[i,j,k,:] = A[i,B[i,j,k],:]. A[B,:] doesn't work anymore, because it result with a shape (100,300,5,300,50), due to broadcasting.
How should I approach this with indexing?
One approach would be reshaping to 2D keeping the number of columns intact and then indexing into the first axis with the flattened B indices and finally reshaping back to the desired one.
Thus, the implementation would be -
A.reshape(-1,A.shape[-1])[B.ravel()].reshape(100,300,5,50)
Those reshaping being merely views into the arrays, should be quite efficient.
This solves both cases. Here's a sample run for the case #1 -
1) Inputs :
In [667]: A = np.random.rand(3,4)
...: B = np.random.randint(0,3,(3,5))
...:
2) Original method :
In [668]: A[B,:]
Out[668]:
array([[[ 0.1 , 0.91, 0.1 , 0.98],
[ 0.1 , 0.91, 0.1 , 0.98],
[ 0.1 , 0.91, 0.1 , 0.98],
[ 0.45, 0.16, 0.02, 0.02],
[ 0.1 , 0.91, 0.1 , 0.98]],
[[ 0.45, 0.16, 0.02, 0.02],
[ 0.48, 0.6 , 0.96, 0.21],
[ 0.48, 0.6 , 0.96, 0.21],
[ 0.1 , 0.91, 0.1 , 0.98],
[ 0.45, 0.16, 0.02, 0.02]],
[[ 0.48, 0.6 , 0.96, 0.21],
[ 0.45, 0.16, 0.02, 0.02],
[ 0.48, 0.6 , 0.96, 0.21],
[ 0.45, 0.16, 0.02, 0.02],
[ 0.45, 0.16, 0.02, 0.02]]])
3) Proposed method :
In [669]: A.reshape(-1,A.shape[-1])[B.ravel()].reshape(3,5,4)
Out[669]:
array([[[ 0.1 , 0.91, 0.1 , 0.98],
[ 0.1 , 0.91, 0.1 , 0.98],
[ 0.1 , 0.91, 0.1 , 0.98],
[ 0.45, 0.16, 0.02, 0.02],
[ 0.1 , 0.91, 0.1 , 0.98]],
[[ 0.45, 0.16, 0.02, 0.02],
[ 0.48, 0.6 , 0.96, 0.21],
[ 0.48, 0.6 , 0.96, 0.21],
[ 0.1 , 0.91, 0.1 , 0.98],
[ 0.45, 0.16, 0.02, 0.02]],
[[ 0.48, 0.6 , 0.96, 0.21],
[ 0.45, 0.16, 0.02, 0.02],
[ 0.48, 0.6 , 0.96, 0.21],
[ 0.45, 0.16, 0.02, 0.02],
[ 0.45, 0.16, 0.02, 0.02]]])

sorting of 2d array min to max in tensorflow

I have an array
x1 = tf.Variable([[0.51, 0.52, 0.53, 0.94, 0.35],
[0.32, 0.72, 0.83, 0.74, 0.55],
[0.23, 0.72, 0.63, 0.64, 0.35],
[0.11, 0.02, 0.03, 0.14, 0.15],
[0.01, 0.72, 0.73, 0.04, 0.75]],tf.float32)
I want to sort the elements in each row from min to max. Is there any function for doing such ?
In the example here they are using tf.nn.top_k2d array,using this I can loop to create the max to min.
def sort(instance):
sorted = []
rows = tf.shape(instance)[0]
col = tf.shape(instance)[1]
for i in range(rows.eval()):
matrix.append([tf.gather(instance[i], tf.nn.top_k(instance[i], k=col.eval()).indices)])
return matrix
Is there any thing similar for finding the min to max or how to reverse the array in each row ?
As suggested by #Yaroslav you can just use the top_k values.
a = tf.Variable([[0.51, 0.52, 0.53, 0.94, 0.35],
[0.32, 0.72, 0.83, 0.74, 0.55],
[0.23, 0.72, 0.63, 0.64, 0.35],
[0.11, 0.02, 0.03, 0.14, 0.15],
[0.01, 0.72, 0.73, 0.04, 0.75]],tf.float32)
row_size = a.get_shape().as_list()[-1]
top_k = tf.nn.top_k(-a, k=row_size)
sess.run(-top_k.values)
this prints for me
array([[ 0.34999999, 0.50999999, 0.51999998, 0.52999997, 0.94 ],
[ 0.31999999, 0.55000001, 0.72000003, 0.74000001, 0.82999998],
[ 0.23 , 0.34999999, 0.63 , 0.63999999, 0.72000003],
[ 0.02 , 0.03 , 0.11 , 0.14 , 0.15000001],
[ 0.01 , 0.04 , 0.72000003, 0.73000002, 0.75 ]], dtype=float32)

How to compare two arrays and find the optimal match in Python?

I have two arrays X and Y, X is the base array and Y is operated in a loop. As the loop runs I want to compare the arrays to find the nearest value of Y to X or in other words where is Y most close to X. As an example I have attached the reproducible code:
from __future__ import division
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
x = np.array([[0.12, 0.11, 0.1, 0.09, 0.08],
[0.13, 0.12, 0.11, 0.1, 0.09],
[0.15, 0.14, 0.12, 0.11, 0.1],
[0.17, 0.15, 0.14, 0.12, 0.11],
[0.19, 0.17, 0.16, 0.14, 0.12],
[0.22, 0.19, 0.17, 0.15, 0.13],
[0.24, 0.22, 0.19, 0.16, 0.14],
[0.27, 0.24, 0.21, 0.18, 0.15],
[0.29, 0.26, 0.22, 0.19, 0.16]])
y = np.array([[0.07, 0.06, 0.05, 0.04, 0.03],
[0.08, 0.07, 0.06, 0.05, 0.04],
[0.10, 0.09, 0.07, 0.06, 0.05],
[0.14, 0.12, 0.11, 0.09, 0.08],
[0.16, 0.14, 0.13, 0.11, 0.09],
[0.19, 0.16, 0.14, 0.12, 0.10],
[0.22, 0.20, 0.17, 0.14, 0.12],
[0.25, 0.22, 0.19, 0.16, 0.13],
[0.27, 0.24, 0.20, 0.17, 0.14]])
for i in range(100):
y = y + (i / 10000)
I want to break the loop when the closest values have been found. By closest I mean, the values should be within ±10% of the original values or some other percentage. How can this be done in Python?
You can compute the Euclidean distance between the two matrices:
import numpy as np
import scipy.spatial.distance
import matplotlib.pyplot as plt
x = np.array([[0.12, 0.11, 0.1, 0.09, 0.08],
[0.13, 0.12, 0.11, 0.1, 0.09],
[0.15, 0.14, 0.12, 0.11, 0.1],
[0.17, 0.15, 0.14, 0.12, 0.11],
[0.19, 0.17, 0.16, 0.14, 0.12],
[0.22, 0.19, 0.17, 0.15, 0.13],
[0.24, 0.22, 0.19, 0.16, 0.14],
[0.27, 0.24, 0.21, 0.18, 0.15],
[0.29, 0.26, 0.22, 0.19, 0.16]])
y = np.array([[0.07, 0.06, 0.05, 0.04, 0.03],
[0.08, 0.07, 0.06, 0.05, 0.04],
[0.10, 0.09, 0.07, 0.06, 0.05],
[0.14, 0.12, 0.11, 0.09, 0.08],
[0.16, 0.14, 0.13, 0.11, 0.09],
[0.19, 0.16, 0.14, 0.12, 0.10],
[0.22, 0.20, 0.17, 0.14, 0.12],
[0.25, 0.22, 0.19, 0.16, 0.13],
[0.27, 0.24, 0.20, 0.17, 0.14]])
dists = []
for i in range(100):
y = y + (i / 10000.)
dists.append(scipy.spatial.distance.euclidean(x.flatten(), y.flatten()))
plt.plot(dists)
will return this graph, which is the evolution of the Euclidean distance between your 2 matrices:
To break the loop at the minimum, you can use:
dist = np.inf
for i in range(100):
y = y + (i / 10000.)
d = scipy.spatial.distance.euclidean(x.flatten(), y.flatten())
if d < dist:
dist = d
else:
break
print dist
# 0.0838525491562 #(the minimal distance)
print y
#[[ 0.1051 0.0951 0.0851 0.0751 0.0651]
#[ 0.1151 0.1051 0.0951 0.0851 0.0751]
#[ 0.1351 0.1251 0.1051 0.0951 0.0851]
#[ 0.1751 0.1551 0.1451 0.1251 0.1151]
#[ 0.1951 0.1751 0.1651 0.1451 0.1251]
#[ 0.2251 0.1951 0.1751 0.1551 0.1351]
#[ 0.2551 0.2351 0.2051 0.1751 0.1551]
#[ 0.2851 0.2551 0.2251 0.1951 0.1651]
#[ 0.3051 0.2751 0.2351 0.2051 0.1751]]

Categories