How to count frequency of such list using basic libraries? - python

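The list looks like this: each item pairs an ASCII character with a number value. I want to count the occurrences of each ASCII character for the values 0, 1 and 2.
So for A the result would be something like {0: 10, 1: 2, 2: 12}, and likewise for the other characters.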
[('P', 0),
('S', 2),
('R', 1),
('O', 1),
('J', 1),
('E', 1),
('C', 1),
('T', 1),
('G', 1),
('U', 1),
('T', 1),
('E', 1),
('N', 1)]
I have tried
char_freq = {c: [0, 0, 0] for c in string.ascii_uppercase}
and also
for i in range(3):
    for x, i in a:
        print(x, i)
I want to count each character X for every value i, where X is in [A-Z].
It should give me a result like
Character | 0  | 1 | 2
A         | 10 | 5 | 4

Although you don't supply enough example data to actually reproduce your desired output, I think this is what you're looking for:
from collections import Counter
import pandas as pd
l = [('P', 0),
('S', 2),
('R', 1),
('O', 1),
('J', 1),
('E', 1),
('C', 1),
('T', 1),
('G', 1),
('U', 1),
('T', 1),
('E', 1),
('N', 1)]
df = pd.DataFrame(l)
counts = df.groupby(0)[1].agg(Counter)
returns:
C {1: 1}
E {1: 2}
G {1: 1}
J {1: 1}
N {1: 1}
O {1: 1}
P {0: 1}
R {1: 1}
S {2: 1}
T {1: 2}
U {1: 1}
This gives you each ASCII character, along with each unique number and how many times that number occurs for it.

from collections import Counter
l = [('A', 1),
('A', 1),
('A', 2),
('A', 2),
('B', 1),
('B', 2),
('B', 3),
('B', 4)]
data = {}
for k, v in l:
    data[k] = [v] if k not in data else data[k] + [v]
char_freq = {k: dict(Counter(v)) for k, v in data.items()}
print(char_freq)
Outputs:
{'A': {1: 2, 2: 2}, 'B': {1: 1, 2: 1, 3: 1, 4: 1}}

Your code looks fine; you just need a small change to the char_freq variable to get the expected result:
import string
char_freq = {c: {0: 0, 1: 0, 2: 0} for c in string.ascii_uppercase}
for x, i in a:
    char_freq[x][i] += 1
To avoid having the whole alphabet in char_freq, you could use only the characters that actually occur:
char_freq = {c: {0: 0, 1: 0, 2: 0} for c in {t[0] for t in a}}
for x, i in a:
    char_freq[x][i] += 1
Output:
{'O': {0: 0, 1: 1, 2: 0},
'T': {0: 0, 1: 2, 2: 0},
'N': {0: 0, 1: 1, 2: 0},
'G': {0: 0, 1: 1, 2: 0},
'U': {0: 0, 1: 1, 2: 0},
'E': {0: 0, 1: 2, 2: 0},
'J': {0: 0, 1: 1, 2: 0},
'R': {0: 0, 1: 1, 2: 0},
'C': {0: 0, 1: 1, 2: 0},
'S': {0: 0, 1: 0, 2: 1},
'P': {0: 1, 1: 0, 2: 0}}
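For a version that needs only the standard library and also prints the table the question asks for, here is a minimal sketch. It assumes the values are always 0, 1 or 2; the names pairs, count_by_value and format_table are made up for illustration.

```python
from collections import Counter, defaultdict

# Sample data from the question.
pairs = [('P', 0), ('S', 2), ('R', 1), ('O', 1), ('J', 1), ('E', 1), ('C', 1),
         ('T', 1), ('G', 1), ('U', 1), ('T', 1), ('E', 1), ('N', 1)]


def count_by_value(pairs):
    """Map each character to a Counter of the values seen with it."""
    counts = defaultdict(Counter)
    for char, value in pairs:
        counts[char][value] += 1
    return counts


def format_table(counts, values=(0, 1, 2)):
    """Build one header line plus one line per character, sorted by character."""
    lines = ["Character | " + " | ".join(str(v) for v in values)]
    for char in sorted(counts):
        # A Counter returns 0 for values that never occurred.
        cells = " | ".join(str(counts[char][v]) for v in values)
        lines.append(f"{char:<9} | {cells}")
    return lines


counts = count_by_value(pairs)
print("\n".join(format_table(counts)))
```

Using a Counter per character means values that never occur simply read as 0, so no pre-filled {0: 0, 1: 0, 2: 0} dict is needed.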

Related

In Python: In a list of tuples that represent X, Y points, what is the syntax to access a particular X or Y?

For example if
list = [{0: 1}, {0: 2}, {1: 0}, {1: 2}, {2: 0}, {2: 1}, {2: 2}]
what is the syntax to get the x and y of the second item?
x = list[1] ?
y = list[1] ?
The list you have shown is:
[{0: 1}, {0: 2}, {1: 0}, {1: 2}, {2: 0}, {2: 1}, {2: 2}]
This is a list of dicts, each with a single element. For the second item (at index 1), the dictionary has one key/value pair with key 0 and value 2. To access this, you could do this:
myList = [{0: 1}, {0: 2}, {1: 0}, {1: 2}, {2: 0}, {2: 1}, {2: 2}]
myDict = myList[1]
key, value = next(iter(myDict.items()))
print(key, value)
If what you want instead is a list of tuples, that would be something like:
[(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1), (2, 2)]
You could access the second tuple in the list as follows:
myList = [(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1), (2, 2)]
x = myList[1][0]
y = myList[1][1]
print(x, y)
Alternatively, you could use this more concise assignment syntax:
x, y = myList[1]
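If the data arrives as the list of single-entry dicts from the question, it can be converted to a list of tuples once, after which plain indexing and unpacking work. A small sketch, assuming every dict holds exactly one key/value pair; points and as_tuples are illustrative names.

```python
points = [{0: 1}, {0: 2}, {1: 0}, {1: 2}, {2: 0}, {2: 1}, {2: 2}]

# Each dict has exactly one item, so take that single (key, value) pair.
as_tuples = [next(iter(d.items())) for d in points]

x, y = as_tuples[1]  # unpack the second point
print(as_tuples)
print(x, y)
```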

how to group a list of dictionaries to get a list of their corresponding indices?

I have a list of dictionaries, something like this:
l = [{'a':25}, {'a':25}, {'b':30}, {'c':200}, {'b':30}]
I want to find the distinct elements and their corresponding indices, something like this:
[
({'a':25}, [0,1]),
({'b':30}, [2,4]),
({'c':200}, [3]),
]
I tried itertools.groupby but couldn't make it work; perhaps I'm missing something. Any other directions are welcome too.
Consider this list of dictionaries:
>>> dicts
[{'a': 3},
{'d': 4, 'a': 3, 'c': 1},
{'d': 8, 'c': 0, 'b': 9},
{'c': 3, 'a': 9},
{'a': 5, 'd': 8},
{'d': 5, 'b': 5, 'a': 0},
{'b': 7, 'c': 7},
{'d': 6, 'b': 7, 'a': 6},
{'a': 4, 'c': 1, 'd': 5, 'b': 2},
{'d': 7}]
Assuming you want the indices of every key-value pair across all the dictionaries:
idxs = {}
for i, d in enumerate(dicts):
    for pair in d.items():
        idxs.setdefault(pair, []).append(i)
This produces what I would consider more useful output, as it allows you to look up the indices of any specific key-value pair:
{('a', 3): [0, 1],
('d', 4): [1],
('c', 1): [1, 8],
('d', 8): [2, 4],
('c', 0): [2],
('b', 9): [2],
('c', 3): [3],
('a', 9): [3],
('a', 5): [4],
('d', 5): [5, 8],
('b', 5): [5],
('a', 0): [5],
('b', 7): [6, 7],
('c', 7): [6],
('d', 6): [7],
('a', 6): [7],
('a', 4): [8],
('b', 2): [8],
('d', 7): [9]}
However, if you must convert to List[Tuple[Dict[str, int], List[int]]], you can produce it very easily from the previous output:
>>> [(dict((p,)), l) for p, l in idxs.items()]
[({'a': 3}, [0, 1]),
({'d': 4}, [1]),
({'c': 1}, [1, 8]),
({'d': 8}, [2, 4]),
({'c': 0}, [2]),
({'b': 9}, [2]),
({'c': 3}, [3]),
({'a': 9}, [3]),
({'a': 5}, [4]),
({'d': 5}, [5, 8]),
({'b': 5}, [5]),
({'a': 0}, [5]),
({'b': 7}, [6, 7]),
({'c': 7}, [6]),
({'d': 6}, [7]),
({'a': 6}, [7]),
({'a': 4}, [8]),
({'b': 2}, [8]),
({'d': 7}, [9])]
Turn the dictionaries into tuples so you can use them as keys in a dictionary. Then iterate over the list, adding the indexes to this dictionary.
locations_dict = {}
for i, d in enumerate(l):
    dtuple = tuple(d.items())
    locations_dict.setdefault(dtuple, []).append(i)
locations = [(dict(key), value) for key, value in locations_dict.items()]
from collections import defaultdict
indices = defaultdict(list)
for idx, val in enumerate(l):
    # tuple(*val.items()) only works because each dict has exactly one item
    indices[tuple(*val.items())].append(idx)
print(indices)
# output
defaultdict(<class 'list'>, {('a', 25): [0, 1], ('b', 30): [2, 4], ('c', 200): [3]})
Another way of doing it:
import ast
l = [{'a':25}, {'a':25}, {'b':30}, {'c':200}, {'b':30}]
n_dict = {}
for a, b in enumerate(l):
    n_dict[str(b)] = n_dict.get(str(b), []) + [a]
print(list(zip( [ast.literal_eval(i) for i in n_dict.keys()], n_dict.values() )))
Great idea with the dicts/defaultdicts. This also seems to work:
import itertools
l = [{'a':25}, {'a':25}, {'b':30}, {'c':200}, {'b':30}, {'a': 25}]
sorted_values = sorted(enumerate(l), key=lambda x: str(x[1]))
grouped = itertools.groupby(sorted_values, lambda x: x[1])
grouped_indices = [(k, [x[0] for x in g]) for k, g in grouped]
print(grouped_indices)
The idea is that once the list is sorted (keeping the original indices alongside each element), itertools.groupby, which only groups adjacent equal items, behaves much like a SQL or pandas groupby.
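The approaches above that key on tuple(d.items()) or str(d) depend on key insertion order, so {'a': 1, 'b': 2} and {'b': 2, 'a': 1} would land in different groups even though they compare equal. One possible way around that, sketched here under the assumption that all values are hashable, is to key on frozenset(d.items()). The name group_indices is invented for this example.

```python
def group_indices(dicts):
    """Return [(dict, [indices])] with equal dicts grouped together."""
    groups = {}
    for i, d in enumerate(dicts):
        # frozenset ignores key order; values must be hashable.
        key = frozenset(d.items())
        # Keep the first dict seen as the representative of its group.
        groups.setdefault(key, (d, []))[1].append(i)
    return list(groups.values())


l = [{'a': 25}, {'a': 25}, {'b': 30}, {'c': 200}, {'b': 30}]
print(group_indices(l))
# [({'a': 25}, [0, 1]), ({'b': 30}, [2, 4]), ({'c': 200}, [3])]
```

Groups come back in order of first appearance because dicts preserve insertion order in Python 3.7+.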

How to iterate over list of key-value tuples?

I have sorted the dictionary using the sorted() function, which returns a list of key-value tuples:
MyDict = {'a': 8, 'b': 4, 'c': 3, 'd': 1, 'e': 0, 'f': 0, 'g': 1, 'h' :1}
data = sorted(MyDict.items(), key=lambda x: x[1], reverse=True)
print(data)
Output:
[('a', 8), ('b', 4), ('c', 3), ('d', 1), ('g', 1), ('h', 1), ('e', 0), ('f', 0)]
Now I want to iterate over it with a for loop and print the keys and values for which the value is divisible by k (k can be any integer). This seems like a basic question, but I am a beginner and it seems tricky to me. Help would be really appreciated.
k = 4
data = [('a', 8), ('b', 4), ('c', 3), ('d', 1), ('g', 1), ('h', 1), ('e', 0), ('f', 0)]
for item in data:
    if item[1] % k == 0:
        print('Key = %s, Value = %d' % (item[0], item[1]))
Output:
Key = a, Value = 8
Key = b, Value = 4
Key = e, Value = 0
Key = f, Value = 0
Try the following:
num = 1
for key, value in data:
    if value % num == 0:
        print("Key: {0} Value: {1}".format(key, value))
You don't need to convert the dictionary to a list of tuples to do that.
Try this:
MyDict = {'a': 8, 'b': 4, 'c': 3, 'd': 1, 'e': 0, 'f': 0, 'g': 1, 'h' :1}
k = 2 # it can be any value, as you say
for key, val in MyDict.items():  # items() in Python 3; iteritems() exists only in Python 2
    if val % k == 0:
        print("Key - {0} and value - {1}".format(key, val))
If you still need to convert the given dictionary to list of tuples and then use it, here is how you can do it.
MyDict = {'a': 8, 'b': 4, 'c': 3, 'd': 1, 'e': 0, 'f': 0, 'g': 1, 'h' :1}
data = sorted(MyDict.items(), key=lambda x: x[1], reverse=True)
k = 2 # any int value you want.
for item in data:
    if item[1] % k == 0:
        print("Key - {0} and value - {1}".format(item[0], item[1]))
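The loops above can also be written as a single comprehension. A short sketch, with one added assumption spelled out: k must be non-zero, since value % 0 raises ZeroDivisionError. divisible_items is an illustrative helper name, not something from the answers.

```python
def divisible_items(pairs, k):
    """Return the (key, value) pairs whose value is divisible by k."""
    if k == 0:
        raise ValueError("k must be non-zero")
    return [(key, value) for key, value in pairs if value % k == 0]


my_dict = {'a': 8, 'b': 4, 'c': 3, 'd': 1, 'e': 0, 'f': 0, 'g': 1, 'h': 1}
data = sorted(my_dict.items(), key=lambda item: item[1], reverse=True)

for key, value in divisible_items(data, 4):
    print(f"Key = {key}, Value = {value}")
```

Unpacking each tuple as key, value in the loop avoids the item[0] / item[1] indexing.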

Pandas MultiIndex (more than 2 levels) DataFrame to Nested Dict/JSON

This question is similar to this one, but I want to take it a step further. Is it possible to extend the solution to work with more levels? Multilevel dataframes' .to_dict() method has some promising options, but most of them will return entries that are indexed by tuples (i.e. (A, 0, 0): 274.0) rather than nesting them in dictionaries.
For an example of what I'm looking to accomplish, consider this multiindex dataframe:
import pandas as pd
data = {0: {
('A', 0, 0): 274.0,
('A', 0, 1): 19.0,
('A', 1, 0): 67.0,
('A', 1, 1): 12.0,
('B', 0, 0): 83.0,
('B', 0, 1): 45.0
},
1: {
('A', 0, 0): 254.0,
('A', 0, 1): 11.0,
('A', 1, 0): 58.0,
('A', 1, 1): 11.0,
('B', 0, 0): 76.0,
('B', 0, 1): 56.0
}
}
df = pd.DataFrame(data).T
df.index = ['entry1', 'entry2']
df
# output:
            A                           B
            0             1             0
            0      1      0      1      0      1
entry1  274.0   19.0   67.0   12.0   83.0   45.0
entry2  254.0   11.0   58.0   11.0   76.0   56.0
You can imagine that we have many records here, not just two, and that the index names could be longer strings. How could you turn this into nested dictionaries (or directly to JSON) that look like this:
[
{'entry1': {'A': {0: {0: 274.0, 1: 19.0}, 1: {0: 67.0, 1: 12.0}},
'B': {0: {0: 83.0, 1: 45.0}}},
'entry2': {'A': {0: {0: 254.0, 1: 11.0}, 1: {0: 58.0, 1: 11.0}},
'B': {0: {0: 76.0, 1: 56.0}}}}
]
I'm thinking some amount of recursion could potentially be helpful, maybe something like this, but have so far been unsuccessful.
So, you really need to do two things here:
1) Call df.to_dict().
2) Convert the result to a nested dictionary.
df.to_dict(orient='index') gives you a dictionary with the index as keys; it looks like this:
>>> df.to_dict(orient='index')
{'entry1': {('A', 0, 0): 274.0,
('A', 0, 1): 19.0,
('A', 1, 0): 67.0,
('A', 1, 1): 12.0,
('B', 0, 0): 83.0,
('B', 0, 1): 45.0},
'entry2': {('A', 0, 0): 254.0,
('A', 0, 1): 11.0,
('A', 1, 0): 58.0,
('A', 1, 1): 11.0,
('B', 0, 0): 76.0,
('B', 0, 1): 56.0}}
Now you need to nest this. Here's a trick from Martijn Pieters to do that:
def nest(d: dict) -> dict:
    result = {}
    for key, value in d.items():
        target = result
        for k in key[:-1]:  # traverse all keys but the last
            target = target.setdefault(k, {})
        target[key[-1]] = value
    return result
Putting this all together:
def df_to_nested_dict(df: pd.DataFrame) -> dict:
    d = df.to_dict(orient='index')
    return {k: nest(v) for k, v in d.items()}
Output:
>>> df_to_nested_dict(df)
{'entry1': {'A': {0: {0: 274.0, 1: 19.0}, 1: {0: 67.0, 1: 12.0}},
'B': {0: {0: 83.0, 1: 45.0}}},
'entry2': {'A': {0: {0: 254.0, 1: 11.0}, 1: {0: 58.0, 1: 11.0}},
'B': {0: {0: 76.0, 1: 56.0}}}}
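Since the question also mentions JSON, the nested result can be handed to the standard json module. The sketch below skips pandas and starts from a hand-written dict shaped like the df.to_dict(orient='index') output above (trimmed to a few entries), so it runs on its own; flat, nested and as_json are illustrative names. Note that json.dumps turns the integer keys into strings.

```python
import json


def nest(d):
    """Turn {(k1, k2, ..., kn): value} into nested dicts."""
    result = {}
    for key, value in d.items():
        target = result
        for k in key[:-1]:  # walk or create every level but the last
            target = target.setdefault(k, {})
        target[key[-1]] = value
    return result


# Hand-written stand-in for df.to_dict(orient='index'), trimmed for brevity.
flat = {
    'entry1': {('A', 0, 0): 274.0, ('A', 0, 1): 19.0, ('B', 0, 0): 83.0},
    'entry2': {('A', 0, 0): 254.0, ('A', 0, 1): 11.0, ('B', 0, 0): 76.0},
}

nested = {entry: nest(columns) for entry, columns in flat.items()}
as_json = json.dumps(nested)
print(as_json)
```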
I took the idea from the previous answer and slightly modified it.
1) Took the function nested_dict from Stack Overflow to create the dictionary:
from collections import defaultdict
def nested_dict(n, type):
    if n == 1:
        return defaultdict(type)
    else:
        return defaultdict(lambda: nested_dict(n-1, type))
2) Wrote the following function:
def df_to_nested_dict(self, df, type):
    # Get the number of levels
    temp = df.index.names
    lvl = len(temp)
    # Create the target dictionary
    new_nested_dict = nested_dict(lvl, type)
    # Convert the dataframe to a dictionary
    temp_dict = df.to_dict(orient='index')
    for x, y in temp_dict.items():
        dict_keys = ''
        # Process the individual items from the key
        for item in x:
            dkey = '[%d]' % item
            dict_keys = dict_keys + dkey
        # Create a string and execute it
        dict_update = 'new_nested_dict%s = y' % dict_keys
        exec(dict_update)
    return new_nested_dict
It is the same idea, but done slightly differently.

Getting an empty list when appending values with multiprocessing

I am using Python 3.5 to edit a list, which in this case is predictions_dict['D'], included in the dictionary predictions_dict. This is the code that I use:
import multiprocessing as multip
predictions_dict = {'A': [],
'B': [],
'C': [],
'D': [],
'E': [],
'F': [],
'Def': []}
data = [{'index': 1, 'rank': 'A'}, {'index': 2, 'rank': 'D'}, {'index': 3, 'rank': 'E'}]
prediction = [(1, 'C'), (2, 'D'), (3, 'D')]
def create_predictions_dict(index, rank):
    for j in data:
        if j['index'] == index:
            predictions_dict[rank].append((index, j['rank'], rank))
            break
np = multip.cpu_count()
p = multip.Pool(processes=np)
_ = p.starmap(create_predictions_dict, prediction)
p.close()
p.join()
print('final list:', predictions_dict['D'])
When I execute this code, the output I get is:
final list: []
And I don't understand why, as I would expect to get:
final list: [(2, 'D', 'D'), (3, 'E', 'D')]
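I have worked out a solution after the comments identified the problem: processes don't share state, so each worker appended to its own copy of predictions_dict. Returning the values from the workers and appending them in the parent process works: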
import multiprocessing as multip
predictions_dict = {'A': [],
'B': [],
'C': [],
'D': [],
'E': [],
'F': [],
'Def': []}
data = [{'index': 1, 'rank': 'A'}, {'index': 2, 'rank': 'D'}, {'index': 3, 'rank': 'E'}]
prediction = [(1, 'C'), (2, 'D'), (3, 'D')]
def create_predictions_dict(index, rank):
    for j in data:
        if j['index'] == index:
            return index, j['rank'], rank
np = multip.cpu_count()
p = multip.Pool(processes=np)
sk = p.starmap(create_predictions_dict, prediction)
p.close()
p.join()
for elem in sk:
    predictions_dict[elem[2]].append(elem)
print('final list:', predictions_dict['D'])
