Converting collections.Counters of combinations frequency from dataframe multi-index into string


I'd like some advice on how to do this properly. I'm new to Python.
Initially I wanted to find the frequency of each combination in a multi-index. I tried a few approaches, such as loops, itertuples and iterrows, and found that collections.Counter was the fastest with the least overhead.
However, it uses the multi-index combinations, as tuples, for the Counter's dict keys. Tuple keys make the subsequent processing awkward.
So I am trying to work out how to turn them into strings with separators, to make the later processing easier to manage.
For example, take the multi-index below:
import collections
import pandas as pd
# testing
def testing():
    testing_df = pd.read_csv("data/testing.csv", float_precision="high")
    testing_df = testing_df.set_index(["class", "table", "seat"]).sort_index()
    print("\n1: \n" + str(testing_df.to_string()))
    print("\n2 test: \n" + str(testing_df.index))
    occurrences = collections.Counter(testing_df.index)
    print("\n3: \n" + str(occurrences))
Output:
1:
                    random_no
class   table seat
Emerald 1     0         55.00
Ruby    0     0         33.67
              0         24.01
              1         87.00
Topaz   0     0         67.00
2 test:
MultiIndex([('Emerald', 1, 0),
( 'Ruby', 0, 0),
( 'Ruby', 0, 0),
( 'Ruby', 0, 1),
( 'Topaz', 0, 0)],
names=['class', 'table', 'seat'])
3:
Counter({('Ruby', 0, 0): 2, ('Emerald', 1, 0): 1, ('Ruby', 0, 1): 1, ('Topaz', 0, 0): 1})
As we can see from 3), it returns the combinations as tuples of mixed data types for the dict keys, which makes them hard to process.
I tried to split them up or turn them into strings so that processing would be easier.
I tried the following, which raised an error:
x = "|".join(testing_df.index)
print(x)
x = "|".join(testing_df.index)
TypeError: sequence item 0: expected str instance, tuple found
and the following, which also raised an error:
x = "|".join(testing_df.index[0])
print(x)
x = "|".join(testing_df.index[0])
TypeError: sequence item 1: expected str instance, numpy.int64 found
Basically, it's either:
1. turn the combinations into strings before passing them to collections.Counter, or
2. build the collections.Counter first, where all the keys are tuples, and then convert those keys into strings.
How do I do this properly?
Thank you very much!

I can offer a solution for option 2, converting the tuple keys into strings:
from collections import Counter
# recreate your problem
occurrences = Counter([('Ruby', 0, 0),
                       ('Ruby', 0, 0),
                       ('Emerald', 1, 0),
                       ('Ruby', 0, 1),
                       ('Topaz', 0, 0)])
# convert tuple keys to string keys
new_occurrences = {'|'.join(str(index) for index in key) : value for key,value in occurrences.items()}
print(new_occurrences)
{'Ruby|0|0': 2, 'Emerald|1|0': 1, 'Ruby|0|1': 1, 'Topaz|0|0': 1}
Counter is a subclass of dict, so you can use things like dict comprehensions and .items() to loop over keys and values at the same time.
Depending on how you intend to process your data further, it might be more useful to convert the Counter's result to a pandas DataFrame, simply because pandas offers more, and easier, functionality for processing.
Here's how:
import pandas as pd
df = pd.DataFrame({'class': [k[0] for k in occurrences.keys()],
                   'table': [k[1] for k in occurrences.keys()],
                   'seat': [k[2] for k in occurrences.keys()],
                   'counts': [v for _,v in occurrences.items()]})
df.head()
     class  table  seat  counts
0     Ruby      0     0       2
1  Emerald      1     0       1
2     Ruby      0     1       1
3    Topaz      0     0       1
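For option 1 (turning each combination into a string before counting), a minimal sketch might look like the following. Plain tuples stand in for the MultiIndex entries here; index_tuples is a hypothetical name, and with a real DataFrame the tuples would come from iterating over testing_df.index.

```python
from collections import Counter

# Stand-in for the tuples produced by iterating over the MultiIndex
# (hypothetical data mirroring the example in the question).
index_tuples = [('Emerald', 1, 0),
                ('Ruby', 0, 0),
                ('Ruby', 0, 0),
                ('Ruby', 0, 1),
                ('Topaz', 0, 0)]

# str() each level first, because str.join only accepts strings.
string_keys = ['|'.join(str(level) for level in combo) for combo in index_tuples]

occurrences = Counter(string_keys)
print(occurrences)
```

Calling str() on every level is what avoids both TypeErrors from the question, since it handles the tuple elements one by one, whatever their type.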

Related

Obtain the value that occurs the maximum number of times in a tuple

Tuple Expected
(0 , 0, 1) 0
(-1, 0, -1) -1
(0, -1, 1) 0
(0, 1, 1) 1
If no value found return zero.

You can use this:
from collections import Counter
a=(-1,0,-1)
b=Counter(a) #Counter({-1: 2, 0: 1})
out=max(b,key=lambda x:b[x]) #-1
If you also want the count, use this:
out=max(b.items(),key=lambda x:x[1]) #(-1,2), meaning -1 occurred 2 times, which is the maximum.
All of this can be combined into one line:
max(Counter(a).items(),key=lambda x:x[1])
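Counter also has a most_common method, which returns (element, count) pairs sorted from most to least frequent and may read more clearly than max with a key function. A short sketch using the same tuple:

```python
from collections import Counter

a = (-1, 0, -1)

# most_common(1) returns a one-element list holding the top (element, count) pair.
value, count = Counter(a).most_common(1)[0]
print(value, count)  # -1 2
```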

You can use collections.Counter to count element occurrences in tuples.
collections.Counter returns a dictionary where the keys are the elements of the tuple and the values are the occurrences of those elements.
from collections import Counter
def elem_max(tup):
    # count the element occurrences
    count_dict = Counter(tup)
    # check whether two different values occur the same number of times
    for key, val in count_dict.items():
        for key1, val1 in count_dict.items():
            if key != key1 and val == val1:
                return 0
    # check if each value occurs only once
    if len(set(count_dict.values())) <= 1:
        return 0
    else:
        # sort the dict keys by values in ascending order
        # and select the key with the max value
        count_max = sorted(count_dict, key=(lambda key: count_dict[key]))[-1]
        return count_max
Use the function like this
a = (1, 0, -1)
b = (1, 1, 0)
print(elem_max(a))
print(elem_max(b))
This should show:
>> 0
>> 1
This even works when two elements occur the same number of times. The function returns zero in that case.
c = (1, 1, 0, 0, 2)
This returns:
>> 0

Credit: @Redowan Delowar and @Ch3steR
from collections import Counter
def elem_max(tup):
    b = Counter(tup)
    out = max(b.items(), key=lambda x: x[1])
    if out[1] == 1:
        return 0
    else:
        return out[0]
Test case
tup1 = (1,-1,1)
tup2 = (1, 0, -1)
print(elem_max(tup1)) # 1
print(elem_max(tup2)) # 0
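One case that may need extra care is a tie for the highest count above 1, for example (1, 1, 0, 0), where max simply returns one of the tied values. If the intent is to return 0 whenever no single value is the most frequent (an assumption about the requirement), a sketch along these lines could work. The name elem_max_strict is hypothetical.

```python
from collections import Counter

def elem_max_strict(tup):
    # Two most frequent (element, count) pairs; fewer if tup has < 2 distinct values.
    top = Counter(tup).most_common(2)
    if not top:
        return 0  # empty input: nothing to report
    if len(top) == 2 and top[0][1] == top[1][1]:
        return 0  # tie for the highest count, so no single winner
    return top[0][0]

print(elem_max_strict((0, 0, 1)))     # 0
print(elem_max_strict((-1, 0, -1)))   # -1
print(elem_max_strict((0, -1, 1)))    # 0
print(elem_max_strict((1, 1, 0, 0)))  # 0
```

Note that this variant returns the element itself for a one-element tuple, since a single value has no competitor.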

multiple dimensional permutations [duplicate]

Given a list of non-zero integers like [2, 3, 4, 2],
generate a list of all the possible permutations, where each element above gives the number of values allowed at its position (I am sure there is a better way to express this, but I don't have the math background). Each element in the above array can be considered a dimension: the 2 would allow the values 0 and 1; the 3 would allow the values 0, 1 and 2; and so on.
The result would be a list of zero-based tuples:
[(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 0), (0, 0, 1, 1), (0, 0, 2, 0)...
and so on until (1, 2, 3, 1)]
The length of the array could vary, from 1 element to x.

You can use itertools.product.
Try this:
from itertools import product
limits = [2, 3, 4, 2]
result = list(product(*[range(x) for x in limits]))
print(result)
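Because product returns a lazy iterator, the tuples can also be consumed one at a time instead of building the full list, which may matter when the limits are large. The total number of tuples is simply the product of the limits. A small sketch (math.prod needs Python 3.8 or later):

```python
from itertools import islice, product
from math import prod

limits = [2, 3, 4, 2]

# One range per dimension; product yields tuples lazily,
# with the rightmost position varying fastest.
combos = product(*(range(x) for x in limits))

first_five = list(islice(combos, 5))
print(first_five)
print(prod(limits))  # 48 tuples in total
```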

What you're basically doing is trying to represent integers in a changing base. In your example, some of the digits are base 2, some base 3, and some base 4. So you can use an algorithm that converts base 10 to any base, and have the base you convert to depend on the current digit. Here's what I threw together; I'm not sure it's completely clear how it works.
n = [2, 3, 4, 2]
max_val = 1
for i in n:
    max_val *= i
ans = [] # will hold the generated lists
for i in range(max_val):
    current_value = i
    current_perm = []
    for j in n[::-1]: # For you, the 'least significant bit' is on the right
        current_perm.append(current_value % j)
        current_value //= j # integer division in python 3
    ans.append(current_perm[::-1]) # flip it back around!
print(ans)
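The same idea can be packaged as a function that converts a single index into its tuple, which is handy when only the i-th combination is needed. This is a sketch; the name to_mixed_radix is hypothetical.

```python
def to_mixed_radix(index, limits):
    """Return the digits of index, where position k counts in base limits[k]."""
    digits = []
    # Work from the rightmost (least significant) position to the left.
    for base in reversed(limits):
        index, digit = divmod(index, base)
        digits.append(digit)
    return tuple(reversed(digits))

limits = [2, 3, 4, 2]
print(to_mixed_radix(0, limits))   # (0, 0, 0, 0)
print(to_mixed_radix(2, limits))   # (0, 0, 1, 0)
print(to_mixed_radix(47, limits))  # (1, 2, 3, 1)
```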

So you basically just want to count, but you have a different limit for each position?
limits = [2,3,4,2]
counter = [0] * len(limits)
def check_limits():
    for i in range(len(limits)-1, 0, -1):
        if counter[i] >= limits[i]:
            counter[i] = 0
            counter[i-1] += 1
    return not counter[0] >= limits[0]
while True:
    counter[len(counter)-1] += 1
    check = check_limits()
    if check:
        print(counter)
    else:
        break
Not a list of tuples, but you get the idea...
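The counting-with-carry idea can also be written as a generator that yields tuples and includes the all-zero starting state. This is one possible sketch rather than a rewrite of the code above; the name odometer is hypothetical, and itertools.product is imported only to cross-check the result.

```python
from itertools import product

def odometer(limits):
    counter = [0] * len(limits)
    while True:
        yield tuple(counter)
        # Increment the rightmost position and carry to the left on overflow.
        pos = len(counter) - 1
        while pos >= 0:
            counter[pos] += 1
            if counter[pos] < limits[pos]:
                break
            counter[pos] = 0
            pos -= 1
        else:
            # Every position overflowed: all combinations have been produced.
            return

limits = [2, 3, 4, 2]
result = list(odometer(limits))
print(result[:3], result[-1])
print(result == list(product(*(range(x) for x in limits))))  # True
```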

Count the duplicates in a list of tuples

I have a list of tuples: a = [(1,2),(1,4),(1,2),(6,7),(2,9)] I want to check if one of the individual elements of each tuple matches the same position/element in another tuple, and how many times this occurs.
For example: If only the 1st element in some tuples has a duplicate, return the tuple and how many times it's duplicated.
I can do that with the following code:
a = [(1,2), (1,4), (1,2), (6,7), (2,9)]
coll_list = []
for t in a:
    coll_cnt = 0
    for b in a:
        if b[0] == t[0]:
            coll_cnt = coll_cnt + 1
    print("%s,%d" % (t, coll_cnt))
    coll_list.append((t,coll_cnt))
print(coll_list)
I want to know if there is a more efficient way to do this.

You can use a Counter
from collections import Counter
a = [(1,2),(1,4),(1,2),(6,7),(2,9)]
counter = Counter(a)
print(counter)
This will output:
Counter({(1, 2): 2, (6, 7): 1, (2, 9): 1, (1, 4): 1})
It is a dictionary-like object with the items (tuples in this case) as the keys and, for each key, a value holding the number of times that key was seen. Your (1,2) tuple is seen twice, while all the others are seen only once.
>>> counter[(1,2)]
2
If you are interested in each individual portion of the tuple, you can utilize the same logic for each element in the tuple.
first_element = Counter([x for (x,y) in a])
second_element = Counter([y for (x,y) in a])
first_element and second_element now contain a Counter of the number of times values are seen per element in the tuple
>>> first_element
Counter({1: 3, 2: 1, 6: 1})
>>> second_element
Counter({2: 2, 9: 1, 4: 1, 7: 1})
Again, these are dictionary like objects, so you can check how frequent a specific value appeared directly:
>>> first_element[2]
1
In the first element of your list of tuples, the value 2 appeared 1 time.
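To reproduce the question's output (each tuple paired with the number of tuples sharing its first element) without a nested loop, the first-element Counter can be reused as a lookup table. A sketch, with coll_list named after the question's variable:

```python
from collections import Counter

a = [(1, 2), (1, 4), (1, 2), (6, 7), (2, 9)]

# Count how often each first element occurs (one pass over the list).
first_counts = Counter(x for x, _ in a)

# Pair every tuple with the count of its first element (second pass, O(1) lookups).
coll_list = [(t, first_counts[t[0]]) for t in a]
print(coll_list)
# [((1, 2), 3), ((1, 4), 3), ((1, 2), 3), ((6, 7), 1), ((2, 9), 1)]
```

This takes two linear passes over the list instead of comparing every tuple with every other one.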

Use the collections library. In the following code, val_1 and val_2 give you the counts of the first and second elements of the tuples, respectively.
import collections
val_1=collections.Counter([x for (x,y) in a])
val_2=collections.Counter([y for (x,y) in a])
>>> print val_1
<<< Counter({1: 3, 2: 1, 6: 1})
This is the number of occurrences of the first element of each tuple
>>> print val_2
<<< Counter({2: 2, 9: 1, 4: 1, 7: 1})
This is the number of occurrences of the second element of each tuple
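If the tuples can have more than two elements, the per-position counting can be generalised with zip(*a), which regroups the tuples by position. A minimal sketch; per_position is a hypothetical name.

```python
from collections import Counter

a = [(1, 2), (1, 4), (1, 2), (6, 7), (2, 9)]

# zip(*a) regroups the tuples by position: (1, 1, 1, 6, 2) and (2, 4, 2, 7, 9).
per_position = [Counter(column) for column in zip(*a)]

print(per_position[0][1])  # 3: the value 1 appears three times in first position
print(per_position[1][2])  # 2: the value 2 appears twice in second position
```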

You can make a count_map and store the count of each tuple as the value.
>>> count_map = {}
>>> for t in a:
...     count_map[t] = count_map.get(t, 0) + 1
...
>>> count_map
{(1, 2): 2, (6, 7): 1, (2, 9): 1, (1, 4): 1}

Using pandas this is simple and very fast:
import pandas
print(pandas.Series(data=[(1,2),(1,4),(1,2),(6,7),(2,9)]).value_counts())
(1, 2) 2
(1, 4) 1
(6, 7) 1
(2, 9) 1
dtype: int64

A dictionary may work better. In your code you traverse the list twice (a nested loop), which makes its complexity O(n^2), and that is not a good thing :)
A better way is to traverse the list once and use one or two conditions on each pass. Here is my first solution for this kind of problem.
a = [(1,2),(1,4),(1,2),(6,7),(2,9)]
counts = {}  # renamed from dict to avoid shadowing the built-in
for (i,j) in a:
    if i in counts:
        counts[i] += 1
    else:
        counts[i] = 1
print(counts)
This code gives the output:
{1: 3, 2: 1, 6: 1}
I hope it will be helpful.

Python: is index() buggy at all?

I'm working through this thing on pyschools and it has me mystified.
Here's the code:
def convertVector(numbers):
    totes = []
    for i in numbers:
        if i!= 0:
            totes.append((numbers.index(i),i))
    return dict((totes))
It's supposed to take a 'sparse vector' as input (ex: [1, 0, 1 , 0, 2, 0, 1, 0, 0, 1, 0])
and return a dict mapping the index of each non-zero entry to its value,
so a dict with 0:1, 2:1, etc., where each key is the index of a non-zero item in the list and each value is the item itself.
So for the example input it wants this: {0: 1, 9: 1, 2: 1, 4: 2, 6: 1}
but instead it gives me this: {0: 1, 4: 2}. Before it's turned into a dict it looks like this:
[(0, 1), (0, 1), (4, 2), (0, 1), (0, 1)]
My plan is for i to iterate through numbers, create a tuple of that number and its index, and then turn that into a dict. The code seems straightforward, so I'm at a loss.
It just looks to me like numbers.index(i) is not returning the index, but instead returning some other, unexpected number.
Is my understanding of index() defective? Are there known index issues?
Any ideas?

index() only returns the first match:
>>> a = [1,2,3,3]
>>> help(a.index)
Help on built-in function index:
index(...)
L.index(value, [start, [stop]]) -> integer -- return first index of value.
Raises ValueError if the value is not present.
If you want both the number and the index, you can take advantage of enumerate:
>>> for i, n in enumerate([10,5,30]):
...     print i,n
...
0 10
1 5
2 30
and modify your code appropriately:
def convertVector(numbers):
    totes = []
    for i, number in enumerate(numbers):
        if number != 0:
            totes.append((i, number))
    return dict((totes))
which produces
>>> convertVector([1, 0, 1 , 0, 2, 0, 1, 0, 0, 1, 0])
{0: 1, 9: 1, 2: 1, 4: 2, 6: 1}
[Although, as someone pointed out though I can't find it now, it'd be easier to write totes = {} and assign to it directly using totes[i] = number than go via a list.]

What you're trying to do can be done in one line:
>>> dict((index,num) for index,num in enumerate(numbers) if num != 0)
{0: 1, 2: 1, 4: 2, 6: 1, 9: 1}

Yes, your understanding of list.index is incorrect. It finds the position of the first item in the list that compares equal to the argument.
To get the index of the current item, iterate with enumerate:
for index, item in enumerate(iterable):
    # blah blah
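A short demonstration of the difference, using the list from the question (positions_of_one is a hypothetical name):

```python
numbers = [1, 0, 1, 0, 2, 0, 1, 0, 0, 1, 0]

# index() scans from the left and stops at the first equal item.
print(numbers.index(1))     # 0

# The optional start argument begins the search at a later position.
print(numbers.index(1, 1))  # 2

# enumerate() pairs every item with its own position.
positions_of_one = [i for i, n in enumerate(numbers) if n == 1]
print(positions_of_one)     # [0, 2, 6, 9]
```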

The problem is that .index() looks for the first occurrence of its argument. So for your example it always returns 0 when called with the argument 1.
You could make use of the built-in enumerate function like this:
for index, value in enumerate(numbers):
    if value != 0:
        totes.append((index, value))

Check the documentation for index:
Return the index in the list of the first item whose value is x. It is
an error if there is no such item.
According to this definition, the following code appends, for each non-zero value in numbers, a tuple made of the first position of that value in the whole list and the value itself.
totes = []
for i in numbers:
    if i!= 0:
        totes.append((numbers.index(i),i))
The result in the totes list is correct: [(0, 1), (0, 1), (4, 2), (0, 1), (0, 1)].
When it is turned into a dict, the result is again correct, since for each possible value you get the position of its first occurrence in the original list.
You would get the result you want by using i as the index instead:
result = {}
for i in range(len(numbers)):
    if numbers[i] != 0:
        result[i] = numbers[i]

index() returns the index of the first occurrence of the item in the list. Your list has duplicates which is the cause of your confusion. So index(1) will always return 0. You can't expect it to know which of the many instances of 1 you are looking for.
I would write it like this:
totes = {}
for i, num in enumerate(numbers):
    if num != 0:
        totes[i] = num
and avoid the intermediate list altogether.

Riffing on @DSM:
def convertVector(numbers):
    return dict((i, number) for i, number in enumerate(numbers) if number)
Or, on re-reading, as @Rik Poggi actually suggests.
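On Python 2.7 and later (including Python 3), the same thing can be written as a dict comprehension. A sketch, with convert_vector as a hypothetical snake_case name:

```python
def convert_vector(numbers):
    # Map index -> value for every non-zero entry.
    return {i: number for i, number in enumerate(numbers) if number != 0}

print(convert_vector([1, 0, 1, 0, 2, 0, 1, 0, 0, 1, 0]))
```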

How to rewrite the code more elegant

I wrote this function. The input and expected results are indicated in the docstring.
import numpy
from collections import defaultdict
def summarize_significance(sign_list):
    """Summarizes a series of individual significance data in a list of occurrences.
    For a group of e.g. 5 measurements and two different states, the input data
    has the form:
    sign_list = [[-1, 1],
                 [0, 1],
                 [0, 0],
                 [0,-1],
                 [0,-1]]
    where -1, 0, 1 indicates decrease, no change or increase respectively.
    The result is a list of 3-item lists indicating how many measurements
    decrease, do not change or increase (as list items 0,1,2 respectively) for each state:
    returns: [[1, 4, 0], [2, 1, 2]]
    """
    swapped = numpy.swapaxes(sign_list, 0, 1)
    summary = []
    for row in swapped:
        mydd = defaultdict(int)
        for item in row:
            mydd[item] += 1
        summary.append([mydd.get(-1, 0), mydd.get(0, 0), mydd.get(1, 0)])
    return summary
I am wondering if there is a more elegant, efficient way of doing the same thing. Any ideas?

Here's one that uses less code and is probably more efficient, because it iterates through sign_list just once without calling swapaxes, and doesn't build a bunch of dictionaries.
def summarize_significance(sign_list):
    summary = [[0,0,0] for _ in sign_list[0]]
    for row in sign_list:
        for index,sign in enumerate(row):
            summary[index][sign+1] += 1
    return summary

No, just more complex ways of doing so.
import itertools
def summarize_significance(sign_list):
    res = []
    for s in zip(*sign_list):
        d = dict((x[0], len(list(x[1]))) for x in itertools.groupby(sorted(s)))
        res.append([d.get(x, 0) for x in (-1, 0, 1)])
    return res

For starters, you could do:
def summarize_significance(sign_list):
    swapped = numpy.swapaxes(sign_list, 0, 1)
    summary = []
    for row in swapped:
        mydd = {-1:0, 0:0, 1:0}
        for item in row:
            mydd[item] += 1
        summary.append([mydd[-1], mydd[0], mydd[1]])
    return summary
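Another option, sketched here with only the standard library, is to combine zip(*sign_list) with collections.Counter, which returns 0 for keys it has never seen. This is one possible variant, not a measured improvement over the versions above.

```python
from collections import Counter

def summarize_significance(sign_list):
    summary = []
    # zip(*sign_list) yields one tuple per state (column).
    for column in zip(*sign_list):
        counts = Counter(column)
        # A Counter returns 0 for keys it has never seen.
        summary.append([counts[-1], counts[0], counts[1]])
    return summary

sign_list = [[-1, 1], [0, 1], [0, 0], [0, -1], [0, -1]]
print(summarize_significance(sign_list))  # [[1, 4, 0], [2, 1, 2]]
```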
