IndexError on 3-dimensional arrays - python

Noob question, but I can't seem to figure out why this is throwing an error: IndexError: index 4 is out of bounds for axis 2 with size 4
import numpy as np

numP = 4
P = np.zeros((3, 3, numP))
P[:,:,1] = np.array([[0.50, 0.25, 0.25],
                     [0.20, 0.55, 0.25],
                     [0.20, 0.30, 0.50]])
P[:,:,2] = np.array([[0.70, 0.20, 0.10],
                     [0.05, 0.75, 0.20],
                     [0.10, 0.20, 0.70]])
P[:,:,3] = np.array([[0.45, 0.35, 0.20],
                     [0.20, 0.65, 0.15],
                     [0.00, 0.30, 0.70]])
P[:,:,4] = np.array([[0.60, 0.20, 0.20],
                     [0.20, 0.60, 0.20],
                     [0.05, 0.05, 0.90]])

Python is 0-indexed (as in list[0] refers to the first element in the list, list[1] refers to the second element, etc.), so the valid indices along axis 2 of a size-4 array are 0 through 3. Start the assignments at P[:,:,0] so that the last one is P[:,:,3].
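For reference, the same assignments with every slice index shifted down by one (only the indices change; the data is as above):
import numpy as np

numP = 4
P = np.zeros((3, 3, numP))
P[:,:,0] = np.array([[0.50, 0.25, 0.25],
                     [0.20, 0.55, 0.25],
                     [0.20, 0.30, 0.50]])
P[:,:,1] = np.array([[0.70, 0.20, 0.10],
                     [0.05, 0.75, 0.20],
                     [0.10, 0.20, 0.70]])
P[:,:,2] = np.array([[0.45, 0.35, 0.20],
                     [0.20, 0.65, 0.15],
                     [0.00, 0.30, 0.70]])
P[:,:,3] = np.array([[0.60, 0.20, 0.20],
                     [0.20, 0.60, 0.20],
                     [0.05, 0.05, 0.90]])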

How can I reduce time complexity on this algorithm?

I have this exercise, and the goal is to solve it with complexity less than O(n^2):
You have an array of length N filled with event probabilities. Create another array in which, for each element i, you calculate the probability of all events up to position i happening.
I have coded this O(n^2) solution. Any ideas how to improve it?
probabilityTable = [0.1, 0.54, 0.34, 0.11, 0.55, 0.75, 0.01, 0.06, 0.96]
finalTable = list()
for i in range(len(probabilityTable)):
    finalTable.append(1)
    for j in range(i):
        finalTable[i] *= probabilityTable[j]
for item in finalTable:
    print(item)
You can get this down to O(n) by reusing the previous prefix product instead of recomputing it from scratch for every position:
probabilityTable = [0.1, 0.54, 0.34, 0.11, 0.55, 0.75, 0.01, 0.06, 0.96]
finalTable = probabilityTable.copy()
for i in range(1, len(probabilityTable)):
    finalTable[i] = finalTable[i] * finalTable[i - 1]
for item in finalTable:
    print(item)
The same idea in a slightly different shape (note the running value has to be multiplied, not added, since we want the probability of all events up to i):
new_probs = [probabilityTable[0]]
for prob in probabilityTable[1:]:
    new_probs.append(new_probs[-1] * prob)
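A prefix product like this is also a one-liner with NumPy's cumprod (a sketch, assuming NumPy is acceptable for the exercise):
import numpy as np

probabilityTable = [0.1, 0.54, 0.34, 0.11, 0.55, 0.75, 0.01, 0.06, 0.96]
# cumprod returns the running product: [p0, p0*p1, p0*p1*p2, ...]
finalTable = np.cumprod(probabilityTable).tolist()
print(finalTable)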

np.arange producing elements with many decimals

I have the following loop.
import numpy as np

x_array = []
for x in np.arange(0.01, 0.1, 0.01):
    x_array.append(x)
Why do some of the elements in x_array have so many decimals?
[0.01, 0.02, 0.03, 0.04, 0.05, 0.060000000000000005, 0.06999999999999999, 0.08, 0.09]
This is normal binary floating-point behaviour: 0.01 has no exact base-2 representation, so the values np.arange generates accumulate tiny rounding errors. If you want your list of numbers without "additional" digits in the fractional part, try the following code:
x_array = np.arange(0.01, 0.1, 0.01).round(2).tolist()
As you see, you don't even need any explicit loop.
The result is just what you want, i.e.:
[0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09]
Another choice is:
x_array = (np.arange(1, 10) / 100).tolist()
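To see where the extra digits come from, you can inspect the exact binary value Python stores for 0.01 (a quick check with the standard decimal module; nothing NumPy-specific here):
from decimal import Decimal

# 0.01 cannot be represented exactly in binary floating point;
# this prints the value that is actually stored, slightly above 0.01
print(Decimal(0.01))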

Two different behaviours for .tolist() IndexError

I am looping over dataframe df1 to look up each maximum order, and then I want to take discount_first and assign it to that max order.
For one dataset everything goes OK:
new_rate_1 = []
for value in df1["maximum_order"]:
    new_val = df[df["New_Order_Lines"] == value]["discount_first"]
    new_val = new_val.tolist()[0]
    new_rate_1.append(new_val)
new_rate_1
[-1.3, -1.3, 0.35, 0.8, 0.75, 0.55, 0.8, 0.85, 0.4, 0.75, 0.85, 0.85, 0.55,
 0.45, 0.8, 0.65, 0.55, 0.85, 0.35, 0.85, 0.9, 0.5, 0.55, -0.6, 0.85, 0.75,
 0.35, 0.15, 0.55, 0.7, 0.8, 0.85, 0.75, 0.65, 0.75, 0.75, 0.35, 0.85, 0.4, ...]
For the other dataset I start getting an error:
IndexError: list index out of range
If I don't index the list within the loop, I don't get the error and the output looks like this:
[[0.8], [0.8], [0.55], [0.55], [0.55], [0.85], [0.55], [0.85], [0.85], [0.65], [0.65], [0.75], [0.7], ...]
Any suggestions/advice on how I can get rid of this behaviour?
Thanks in advance
How about using this:
# new_val = new_val.tolist()[0]
new_val = new_val.values.flatten()[0]
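Note that taking position 0 still fails whenever a lookup value has no matching row at all (the filtered Series is then empty, which is the usual cause of this IndexError). A defensive version of the loop might look like this (a sketch; using np.nan as the fallback for missing matches is an assumption):
import numpy as np

new_rate_1 = []
for value in df1["maximum_order"]:
    match = df.loc[df["New_Order_Lines"] == value, "discount_first"]
    # guard against values with no matching row in df
    new_rate_1.append(match.iloc[0] if not match.empty else np.nan)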
Why loop at all when you can do it without one? You can use the isin() and tolist() methods:
new_rate_1 = df.loc[df["New_Order_Lines"].isin(df1["maximum_order"]), "discount_first"].tolist()
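One caveat: the isin() version returns values in df's row order and silently skips lookup values with no match, so it may not line up one-to-one with df1. A left merge keeps exactly one value per row of df1, with NaN for missing matches (a sketch under that assumption; duplicated New_Order_Lines values in df would add rows):
new_rate_1 = (
    df1[["maximum_order"]]
    .merge(df[["New_Order_Lines", "discount_first"]],
           left_on="maximum_order", right_on="New_Order_Lines",
           how="left")["discount_first"]
    .tolist()
)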

How to L2 Normalize a list of lists in Python using Sklearn

from sklearn.preprocessing import normalize

s2 = [[0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194],
      [0.2, 0.4892574205256839, 0.2, 0.2, 0.383258146374831],
      [0.3193817886456925, 0.16666666666666666, 0.16666666666666666, 0.16666666666666666, 0.3193817886456925, 0.3193817886456925],
      [0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194]]
X = normalize(s2)
This is throwing an error:
ValueError: setting an array element with a sequence.
How can I L2-normalize a list of lists in Python using sklearn?
Since I don't have enough reputation to comment, I'm posting this as an answer.
Let's quickly look at your data.
I converted the given data into a NumPy array. Since the inner lists don't all have the same length, it looks like this:
>>> n2 = np.array([[0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194], [0.2, 0.4892574205256839, 0.2, 0.2, 0.383258146374831], [0.3193817886456925, 0.16666666666666666, 0.16666666666666666, 0.16666666666666666, 0.3193817886456925, 0.3193817886456925], [0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194]])
>>> n2
array([list([0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194]),
       list([0.2, 0.4892574205256839, 0.2, 0.2, 0.383258146374831]),
       list([0.3193817886456925, 0.16666666666666666, 0.16666666666666666, 0.16666666666666666, 0.3193817886456925, 0.3193817886456925]),
       list([0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194])],
      dtype=object)
You can see that the converted result is an object array of lists, not a rectangular sequence of values. To get the latter, you need to keep the internal lists the same length (it looks like 0.16666666666666666 was copied one time too many in your array; if not, fix the length some other way). It will then look like:
>>> n3 = np.array([[0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194], [0.2, 0.4892574205256839, 0.2, 0.2, 0.383258146374831], [0.3193817886456925, 0.16666666666666666, 0.16666666666666666, 0.16666666666666666, 0.319381788645692], [0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194]])
>>> n3
array([[0.2       , 0.2       , 0.2       , 0.30216512, 0.24462871],
       [0.2       , 0.48925742, 0.2       , 0.2       , 0.38325815],
       [0.31938179, 0.16666667, 0.16666667, 0.16666667, 0.31938179],
       [0.2       , 0.2       , 0.2       , 0.30216512, 0.24462871]])
As you can see, n3 is now a proper 2-D array of values, and if you use the normalize function, it simply works:
>>> X = normalize(n3)
>>> X
array([[0.38408524, 0.38408524, 0.38408524, 0.58028582, 0.46979139],
       [0.28108867, 0.6876236 , 0.28108867, 0.28108867, 0.53864762],
       [0.59581303, 0.31091996, 0.31091996, 0.31091996, 0.59581303],
       [0.38408524, 0.38408524, 0.38408524, 0.58028582, 0.46979139]])
For how to use NumPy arrays to avoid this issue, please have a look at this SO link: ValueError: setting an array element with a sequence.
Important: I removed one element from the 3rd list so that all the lists have the same length.
I did that because I really believe it's a copy-paste error. If not, comment below and I will modify my answer.
import numpy as np
from sklearn.preprocessing import normalize

s2 = [[0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194],
      [0.2, 0.4892574205256839, 0.2, 0.2, 0.383258146374831],
      [0.3193817886456925, 0.16666666666666666, 0.16666666666666666, 0.3193817886456925, 0.3193817886456925],
      [0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194]]
X = normalize(np.array(s2))
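If the rows really are meant to have different lengths, sklearn's normalize won't accept them, but you can still L2-normalize each row on its own (a sketch in plain NumPy):
import numpy as np

# divide every inner list by its own Euclidean (L2) norm
X = [(np.asarray(row) / np.linalg.norm(row)).tolist() for row in s2]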

Equivalent of pandas.Series.unique() for non-hashable elements

I would like to know if there is an equivalent for pandas.Series.unique() when the series contains non-hashable elements (in my case, lists).
For instance, with
>> ds
XTR
s0b0_VARC-0.200 [0.05, 0.05]
s0b0_VARC-0.100 [0.05, 0.05]
s0b0_VARC0.000 [0.05, 0.05]
s0b0_VARC0.100 [0.05, 0.05]
s0b1_VARC-0.200 [0.05, 0.05]
s0b1_VARC0.000 [0.05, 0.05]
s0b1_VARC0.100 [0.05, 0.05]
s0b2_VARC-0.200 [0.05, 0.05]
s0b2_VARC-0.100 [0.06, 0.025]
s0b2_VARC0.000 [0.05, 0.05]
s0b2_VARC0.100 [0.05, 0.05]
I would like to get
>> ds.unique()
2
Thanks @Quang Hoang.
Inspired by this SO answer, I wrote the following function (not sure how robust it is, though):
def count_unique_values(series):
    try:
        # lists are unhashable, so convert each element to a hashable tuple
        tuples = [tuple(x) for x in series.values]
        series = pd.Series(tuples)
        nb = len(series.unique())
        print(nb)
    except TypeError:
        # elements are not iterable, so they were hashable all along
        nb = len(series.unique())
    return nb
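A shorter route to the same count, assuming every element of the Series is list-like (a sketch): map each list to a tuple, which is hashable, and count distinct values with nunique():
n_unique = ds.map(tuple).nunique()
print(n_unique)  # 2 for the example above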
