Equivalent of pandas.Series.unique() for non-hashable elements

I would like to know if there is an equivalent for pandas.Series.unique() when the series contains non-hashable elements (in my case, lists).
For instance, with
>> ds
XTR
s0b0_VARC-0.200 [0.05, 0.05]
s0b0_VARC-0.100 [0.05, 0.05]
s0b0_VARC0.000 [0.05, 0.05]
s0b0_VARC0.100 [0.05, 0.05]
s0b1_VARC-0.200 [0.05, 0.05]
s0b1_VARC0.000 [0.05, 0.05]
s0b1_VARC0.100 [0.05, 0.05]
s0b2_VARC-0.200 [0.05, 0.05]
s0b2_VARC-0.100 [0.06, 0.025]
s0b2_VARC0.000 [0.05, 0.05]
s0b2_VARC0.100 [0.05, 0.05]
I would like to get
>> ds.unique()
2

Thanks @Quang Hoang.
Inspired by this SO answer, I wrote the following function (I am not sure how robust it is, though):
import pandas as pd

def count_unique_values(series):
    try:
        # Lists are not hashable, so convert each element to a tuple first
        tuples = [tuple(x) for x in series.values]
        nb = len(pd.Series(tuples).unique())
    except TypeError:
        # Elements are not iterable (e.g. plain numbers): count them directly
        nb = len(series.unique())
    return nb
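A variant of the same idea that skips the intermediate Series: convert each list to a tuple and count the distinct tuples with a plain set. This is only a sketch. The helper name count_unique_lists is hypothetical, and it assumes every element is a flat list of hashable values.

```python
def count_unique_lists(values):
    """Count distinct list-like elements by hashing them as tuples."""
    # tuple(x) makes each list hashable so it can be stored in a set
    return len({tuple(x) for x in values})


sample = [[0.05, 0.05], [0.05, 0.05], [0.06, 0.025], [0.05, 0.05]]
print(count_unique_lists(sample))  # 2
```

Because the function only iterates over its argument, it should accept a pandas Series of lists as well as a plain Python list.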

Related

How can I reduce time complexity on this algorithm?

I have this exercise, and the goal is to solve it with complexity lower than O(n^2).
You have an array of length N filled with event probabilities. Create another array in which each element i holds the probability that all events up to position i happen.
I have coded this O(n^2) solution. Any ideas on how to improve it?
probabilityTable = [0.1, 0.54, 0.34, 0.11, 0.55, 0.75, 0.01, 0.06, 0.96]
finalTable = list()
for i in range(len(probabilityTable)):
    finalTable.append(1)
    for j in range(i):
        finalTable[i] *= probabilityTable[j]
for item in finalTable:
    print(item)

probabilityTable = [0.1, 0.54, 0.34, 0.11, 0.55, 0.75, 0.01, 0.06, 0.96]
finalTable = probabilityTable.copy()
for i in range(1, len(probabilityTable)):
    finalTable[i] = finalTable[i] * finalTable[i - 1]
for item in finalTable:
    print(item)

new_probs = [probabilityTable[0]]
for prob in probabilityTable[1:]:
    # Multiply (not add): the probability that all events so far happen is a running product
    new_probs.append(new_probs[-1] * prob)
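A single-pass running product can also be written with the standard library. The sketch below uses itertools.accumulate with operator.mul; the variable names are illustrative, and it assumes that the element at position i should include probability i itself.

```python
from itertools import accumulate
from operator import mul

probability_table = [0.1, 0.54, 0.34, 0.11, 0.55, 0.75, 0.01, 0.06, 0.96]

# accumulate(..., mul) yields p0, p0*p1, p0*p1*p2, ... in a single O(n) pass
final_table = list(accumulate(probability_table, mul))

for item in final_table:
    print(item)
```

Each output element reuses the previous product, so the work per element is constant.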

np.arange producing elements with many decimals

I have the following loop.
x_array = []
for x in np.arange(0.01, 0.1, 0.01):
    x_array.append(x)
Why do some of the elements in x_array have so many decimals?
[0.01,
0.02,
0.03,
0.04,
0.05,
0.060000000000000005,
0.06999999999999999,
0.08,
0.09]

If you want your list of numbers without "additional" digits in the
fractional part, try the following code:
x_array = np.arange(0.01, 0.1, 0.01).round(2).tolist()
As you see, you don't even need any explicit loop.
The result is just what you want, i.e.:
[0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09]
Another choice is:
x_array = (np.arange(1, 10) / 100).tolist()
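The extra digits come from binary floating point: most multiples of 0.01 have no exact binary representation, so a value computed as start + i * step can land one rounding step away from the literal you expect. The plain-Python sketch below illustrates this without NumPy. Treating np.arange as computing roughly start + i * step is an assumption made here for illustration.

```python
import math

start, step = 0.01, 0.01

# Values built by float multiplication and addition may pick up rounding error
computed = [start + i * step for i in range(9)]

# Dividing integers by 100 rounds only once per value
scaled = [i / 100 for i in range(1, 10)]

print(computed)
print(scaled)

# Compare floats with a tolerance rather than with ==
print(all(math.isclose(a, b) for a, b in zip(computed, scaled)))
```

This is why building the values from integers, or rounding afterwards, gives the clean list.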

IndexError on 3-dimensional arrays

Noob question, but I can't seem to figure out why this is throwing an error: IndexError: index 4 is out of bounds for axis 2 with size 4
import numpy as np
numP = 4;
P = np.zeros((3,3,numP))
P[:,:,1] = np.array([[0.50, 0.25, 0.25],
                     [0.20, 0.55, 0.25],
                     [0.20, 0.30, 0.50]])
P[:,:,2] = np.array([[0.70, 0.20, 0.10],
                     [0.05, 0.75, 0.20],
                     [0.10, 0.20, 0.70]])
P[:,:,3] = np.array([[0.45, 0.35, 0.20],
                     [0.20, 0.65, 0.15],
                     [0.00, 0.30, 0.70]])
P[:,:,4] = np.array([[0.60, 0.20, 0.20],
                     [0.20, 0.60, 0.20],
                     [0.05, 0.05, 0.90]])

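Python is 0-indexed (list[0] refers to the first element in the list, list[1] to the second, and so on).
With numP = 4, the valid indices on the last axis are 0 to 3, so the four assignments should use P[:,:,0] through P[:,:,3] instead of P[:,:,1] through P[:,:,4]. Changing only the last assignment to P[:,:,3] would overwrite the third matrix.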
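One way to avoid hand-written indices altogether is to keep the matrices in a list and let enumerate supply the positions. This is a sketch using the values from the question; the names num_p and matrices are illustrative.

```python
import numpy as np

num_p = 4
P = np.zeros((3, 3, num_p))

matrices = [
    [[0.50, 0.25, 0.25], [0.20, 0.55, 0.25], [0.20, 0.30, 0.50]],
    [[0.70, 0.20, 0.10], [0.05, 0.75, 0.20], [0.10, 0.20, 0.70]],
    [[0.45, 0.35, 0.20], [0.20, 0.65, 0.15], [0.00, 0.30, 0.70]],
    [[0.60, 0.20, 0.20], [0.20, 0.60, 0.20], [0.05, 0.05, 0.90]],
]

# enumerate starts at 0, so k takes the valid indices 0, 1, 2, 3
for k, matrix in enumerate(matrices):
    P[:, :, k] = np.array(matrix)

print(P.shape)
```

With the index taken from enumerate, the last axis can never be addressed past num_p - 1 as long as the list holds num_p matrices.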

How to L2 Normalize a list of lists in Python using Sklearn

s2 = [[0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194], [0.2, 0.4892574205256839, 0.2, 0.2, 0.383258146374831], [0.3193817886456925, 0.16666666666666666, 0.16666666666666666, 0.16666666666666666, 0.3193817886456925, 0.3193817886456925], [0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194]]
from sklearn.preprocessing import normalize
X = normalize(s2)
This throws the following error:
ValueError: setting an array element with a sequence.
How can I L2-normalize a list of lists in Python using sklearn?

I don't have enough reputation to comment, so I am posting this as an answer.
Let's take a quick look at your data.
I converted it to a NumPy array. Because the inner lists do not all have the same length, the result is a 1-D array of list objects (recent NumPy versions raise a ValueError here unless dtype=object is passed explicitly):
>>> n2 = np.array([[0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194], [0.2, 0.4892574205256839, 0.2, 0.2, 0.383258146374831], [0.3193817886456925, 0.16666666666666666, 0.16666666666666666, 0.16666666666666666, 0.3193817886456925, 0.3193817886456925], [0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194]])
>>> n2
array([list([0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194]),
list([0.2, 0.4892574205256839, 0.2, 0.2, 0.383258146374831]),
list([0.3193817886456925, 0.16666666666666666, 0.16666666666666666, 0.16666666666666666, 0.3193817886456925, 0.3193817886456925]),
list([0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194])],
dtype=object)
As you can see, the result is an object array of lists rather than a 2-D array of numbers. To get a 2-D array, all inner lists must have the same length (it looks like 0.16666666666666666 is repeated several times in your array; if that is not a mistake, fix the length another way). After that it looks like this:
>>> n3 = np.array([[0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194], [0.2, 0.4892574205256839, 0.2, 0.2, 0.383258146374831], [0.3193817886456925, 0.16666666666666666, 0.16666666666666666, 0.16666666666666666, 0.319381788645692], [0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194]])
>>> n3
array([[0.2 , 0.2 , 0.2 , 0.30216512, 0.24462871],
[0.2 , 0.48925742, 0.2 , 0.2 , 0.38325815],
[0.31938179, 0.16666667, 0.16666667, 0.16666667, 0.31938179],
[0.2 , 0.2 , 0.2 , 0.30216512, 0.24462871]])
As you can see, n3 is now a regular 2-D array of numbers,
and the normalize function simply works on it:
>>> X = normalize(n3)
>>> X
array([[0.38408524, 0.38408524, 0.38408524, 0.58028582, 0.46979139],
[0.28108867, 0.6876236 , 0.28108867, 0.28108867, 0.53864762],
[0.59581303, 0.31091996, 0.31091996, 0.31091996, 0.59581303],
[0.38408524, 0.38408524, 0.38408524, 0.58028582, 0.46979139]])
For more on how this error arises with NumPy arrays, have a look at this SO link: ValueError: setting an array element with a sequence

Important: I removed one element from the 3rd list so that all lists have the same length.
I did that because I really believe it is a copy-paste error. If not, comment below and I will modify my answer.
import numpy as np
from sklearn.preprocessing import normalize
s2 = [[0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194], [0.2, 0.4892574205256839, 0.2, 0.2, 0.383258146374831], [0.3193817886456925, 0.16666666666666666, 0.16666666666666666, 0.3193817886456925, 0.3193817886456925], [0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194]]
X = normalize(np.array(s2))
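If the rows really are meant to have different lengths, they cannot be passed to normalize as one matrix, but each row can still be scaled to unit L2 norm on its own. Below is a minimal sketch in plain Python, assuming that per-row normalization is what is wanted. l2_normalize_rows is a hypothetical helper, not part of sklearn, and the sample data is made up for illustration.

```python
import math


def l2_normalize_rows(rows):
    """Scale each row to unit Euclidean length; rows may differ in length."""
    result = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        # Leave all-zero rows unchanged to avoid dividing by zero
        result.append([x / norm for x in row] if norm else list(row))
    return result


ragged = [[3.0, 4.0], [1.0, 2.0, 2.0], [0.0, 0.0]]
print(l2_normalize_rows(ragged))
```

The output is a list of lists with the same shape as the input, so ragged rows are preserved.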

Sorting a nested list by increasing numbers

How do you sort all the values in a nested list so that each sublist keeps its original length, but the values move between sublists as needed, leaving them sorted overall and not just within each sublist?
For instance:
list1=[[0.10, 0.90, 0.20], [0.15, 0.80], [0.68, 0.08, 0.30]]
becomes:
list1=[[0.08, 0.10, 0.15], [0.20, 0.30], [0.68, 0.80, 0.90]]
Any help is appreciated.

This works.
list1=[[0.10, 0.90, 0.20], [0.15, 0.80], [0.68, 0.08, 0.30]]
list_lengths = [len(x) for x in list1]
flattened = [item for items in list1 for item in items]
items_sorted = sorted(flattened)
loc = 0
lists2 = []
for length in list_lengths:
    lists2.append(items_sorted[loc:loc+length])
    loc += length
print(lists2)
You need the sublist lengths at some point to build the final lists2. To get the values in the right order, you flatten and sort the list, then build lists2 by slicing the sorted items.
Note that this works for lists and tuples of arbitrary length.

You can use chain.from_iterable to chain the lists, sort them and create an iterator. Then you can just iterate over the original lists and create a result using next:
>>> from itertools import chain
>>> l = [[0.10, 0.90, 0.20], [0.15, 0.80], [0.68, 0.08, 0.30]]
>>> it = iter(sorted(chain.from_iterable(l)))
>>> [[next(it) for _ in l2] for l2 in l]
[[0.08, 0.1, 0.15], [0.2, 0.3], [0.68, 0.8, 0.9]]

I would use itertools for this and confine the whole thing inside one function:
import itertools
def join_sort_slice(iterable):
    it = iter(sorted(itertools.chain(*iterable)))
    output = []
    for i in map(len, iterable):
        output.append(list(itertools.islice(it, i)))
    return output
Use it:
lst = [[0.10, 0.90, 0.20], [0.15, 0.80], [0.68, 0.08, 0.30]]
join_sort_slice(lst)
# [[0.08, 0.1, 0.15], [0.2, 0.3], [0.68, 0.8, 0.9]]
The idea is to chain all sublists together and then sort the outcome. This sorted output is then sliced based on the lengths of the original list of lists.
I hope this helps.

Similar to @Evan's answer:
import itertools
import numpy as np
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)
list1=[[0.10, 0.90, 0.20], [0.15, 0.80], [0.68, 0.08, 0.30]]
# get the sizes of each of the sublists and where they start
sizes = [len(l) for l in list1]
sizes.insert(0,0)
offsets = np.cumsum(sizes)
# flatten and sort
flat_list = sorted(itertools.chain(*list1))
nested = [flat_list[begin:end] for begin, end in pairwise(offsets)]
print(nested)

Another variation with itertools:
import itertools
list1=[[0.10, 0.90, 0.20], [0.15, 0.80], [0.68, 0.08, 0.30]]
sorted_l = sorted(itertools.chain.from_iterable(list1))
result = []
k=0
for i in (len(i) for i in list1):
    result.append(sorted_l[k:k+i])
    k=k+i
print(result)
The output:
[[0.08, 0.1, 0.15], [0.2, 0.3], [0.68, 0.8, 0.9]]
