Retrieve keys Set from pandas Series of dict values - python

Given a pandas Series of dict values with str keys:
Series
------
{'a': 1, 'b' : 2, 'c' : 3}
{'b': 3, 'd': 5}
{'d': 7, 'e': 7}
How can the Series be scanned to retrieve a set of the dictionary keys? The resulting output would be a plain python set:
{'a', 'b', 'c', 'd', 'e'}
Thank you in advance for your consideration and response.

Use a list comprehension to flatten the keys, then convert to a set:
a = set([y for x in s for y in x])
print (a)
{'e', 'a', 'd', 'c', 'b'}
Or use itertools.chain.from_iterable:
from itertools import chain
a = set(chain.from_iterable(s))
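Another option is sketched below on the Series from the question. `set().union` accepts any number of iterables, and iterating over a dict yields its keys, so unpacking the Series passes every dict straight to `union`. This assumes every element of `s` is a dict; a NaN standing in for a missing row would raise `TypeError`.

```python
import pandas as pd

# Series from the question: each element is a dict with str keys
s = pd.Series([{'a': 1, 'b': 2, 'c': 3}, {'b': 3, 'd': 5}, {'d': 7, 'e': 7}])

# Iterating a Series yields its values (the dicts); iterating a dict yields its keys,
# so set().union(*s) merges the keys of every dict into one set
keys = set().union(*s)
print(sorted(keys))  # ['a', 'b', 'c', 'd', 'e']
```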

Maybe this:
import pandas as pd
s = pd.Series(...)
a = set(list(pd.DataFrame(s.tolist())))
# {'a', 'e', 'b', 'c', 'd'}
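If you need the keys as a list in first-seen order rather than an unordered set, one sketch uses `dict.fromkeys`, relying on dicts keeping insertion order (guaranteed from Python 3.7). A plain list of dicts stands in for the Series here; the same expression should work on `s`, since iterating a Series yields its values.

```python
# A plain list stands in for the Series s; iterating either one yields the dicts
records = [{'a': 1, 'b': 2, 'c': 3}, {'b': 3, 'd': 5}, {'d': 7, 'e': 7}]

# dict.fromkeys keeps only the first occurrence of each key, in order
ordered_keys = list(dict.fromkeys(k for d in records for k in d))
print(ordered_keys)  # ['a', 'b', 'c', 'd', 'e']
```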

Related

Extract all possible combinations of unique elements in dict of lists

I have this input:
d = {'a': ['A', 'B', 'C'], 'b': ['A', 'B', 'C'], 'c': ['D', 'E'], 'd': ['E', 'F', 'G']}
How can I extract all the possible unique samplings per list?
One of the possible output is for example:
d = {'a': 'A', 'b': 'B', 'c': 'D', 'd': 'E'}
or
d = {'a': 'B', 'b': 'A', 'c': 'E', 'd': 'F'}
and so on.
Any idea?
Thank you
This is what you are looking for:
import itertools
keys, values = zip(*d.items())
permutations_dicts = [dict(zip(keys, v)) for v in itertools.product(*values)]
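The product above lets the same letter be picked for several keys (for example `{'a': 'A', 'b': 'A', ...}`). If "unique" means no letter may be used twice, which is an assumption based on the two sample outputs in the question, one sketch is to filter the product and keep only tuples without repeats:

```python
import itertools

d = {'a': ['A', 'B', 'C'], 'b': ['A', 'B', 'C'], 'c': ['D', 'E'], 'd': ['E', 'F', 'G']}
keys, values = zip(*d.items())

# Every combination taking one element per list (repeats across keys allowed)
all_combos = [dict(zip(keys, v)) for v in itertools.product(*values)]

# Keep only the combinations in which no element appears twice
distinct_combos = [
    dict(zip(keys, v))
    for v in itertools.product(*values)
    if len(set(v)) == len(v)
]

print(len(all_combos), len(distinct_combos))  # 54 30
```

The unfiltered count is 3 * 3 * 2 * 3 = 54; the filter removes the cases where `'a'` and `'b'` share a letter or where `'c'` and `'d'` are both `'E'`.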

Finding the second level keys of a multi key dictionary Python?

I have a nested (multi-key) dict in the following format. I am trying to get a list of the second-level keys; however, my code returns a list of dict_keys objects instead. What I am trying to get is ['a', 'b', 'c', 'd', 'e', 'f']
dictTest={}
dictTest[1]={'a':1, 'b':2}
dictTest[2]={'c':1, 'd':2}
dictTest[3]={'e':1, 'f':2}
print(dictTest)
print(list([dictTest[i].keys() for i in dictTest.keys()]))
{1: {'a': 1, 'b': 2}, 2: {'c': 1, 'd': 2}, 3: {'e': 1, 'f': 2}}
[dict_keys(['a', 'b']), dict_keys(['c', 'd']), dict_keys(['e', 'f'])]
You could use itertools.chain in combination with mapping dict.keys over all of the dict's values:
from itertools import chain
dictTest = {1: {'a': 1, 'b': 2}, 2: {'c': 1, 'd': 2}, 3: {'e': 1, 'f': 2}}
print(list(chain(*map(dict.keys, dictTest.values()))))
['a', 'b', 'c', 'd', 'e', 'f']
Or, with a nested list comprehension:
>>> [v2 for v1 in dictTest.values() for v2 in v1]
['a', 'b', 'c', 'd', 'e', 'f']
Try this:
# sum([list(b.keys()) for b in dictTest.values()], [])
# syntax improvement by @wwii
sum([list(b) for b in dictTest.values()], [])
Output:
['a', 'b', 'c', 'd', 'e', 'f']
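For larger inputs, note that `sum(..., [])` builds a new list at every addition, so its cost grows quadratically with the number of inner dicts; `chain.from_iterable` avoids that as well as the `*` unpacking. A sketch follows, with a `dict.fromkeys` variant for the case where inner dicts share keys (an assumption, not part of the question):

```python
from itertools import chain

dictTest = {1: {'a': 1, 'b': 2}, 2: {'c': 1, 'd': 2}, 3: {'e': 1, 'f': 2}}

# Iterating an inner dict yields its keys, so chaining the values flattens them
second_level = list(chain.from_iterable(dictTest.values()))
print(second_level)  # ['a', 'b', 'c', 'd', 'e', 'f']

# If inner dicts may repeat keys, drop duplicates while keeping first-seen order
unique_second_level = list(dict.fromkeys(chain.from_iterable(dictTest.values())))
```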

Find corresponding columns in python

I have a dataset similar to this one:
p = {'A': [0,1,0,1], 'B': [1,1,1,1], 'C': [0,0,1,1], 'D': [1,1,1,0]}
df5 = pd.DataFrame(data=p)
df5
Now I would like to create a list with the names of the columns that contain a 1 in each row, which I am currently doing like this:
cols = df5.dot(df5.columns).map(set).values.tolist()
cols
However, if a column is named 'AA' instead of 'A', this no longer works, because map(set) splits the concatenated names into single characters. Is there a workaround for this?
You can add a separator to the column names, remove the trailing one with Series.str.rstrip, and then use Series.str.split:
p = {'AA': [0,1,0,1], 'B': [1,1,1,1], 'C': [0,0,1,1], 'D': [1,1,1,0]}
df5 = pd.DataFrame(data=p)
cols = df5.dot(df5.columns + ',').str.rstrip(',').str.split(',').map(set).values.tolist()
print (cols)
[{'D', 'B'}, {'B', 'D', 'AA'}, {'C', 'D', 'B'}, {'B', 'AA', 'C'}]
Another solution is to use NumPy boolean indexing:
c = df5.columns.to_numpy()
cols = [set(c[x]) for x in df5.to_numpy().astype(bool)]
print (cols)
[{'D', 'B'}, {'B', 'D', 'AA'}, {'C', 'D', 'B'}, {'B', 'AA', 'C'}]
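A third option, sketched below, skips both string concatenation and NumPy: convert the 0/1 values to booleans and, for each row, keep the labels whose value is True. This assumes the frame holds only 0/1 values; `iterrows` is slow on large frames, so the NumPy version above is likely faster there.

```python
import pandas as pd

p = {'AA': [0, 1, 0, 1], 'B': [1, 1, 1, 1], 'C': [0, 0, 1, 1], 'D': [1, 1, 1, 0]}
df5 = pd.DataFrame(data=p)

# 0/1 -> False/True, so each row can be used as a boolean mask on itself
mask = df5.astype(bool)

# row[row] keeps only the True entries; their index holds the column names
cols = [set(row[row].index) for _, row in mask.iterrows()]
print(cols)
```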
Replace the 1 values with the column names:
import numpy as np
import pandas as pd
df5.replace(1, pd.Series(df5.columns, df5.columns), inplace=True)
Replace the 0 values with NaNs, then use stack to drop them, group by row and convert to a list:
cols = df5.replace(0, np.nan).stack().groupby(level=0).apply(list).tolist()
cols
This returns a list of lists instead of a list of sets:
[['B', 'D'], ['A', 'B', 'D'], ['B', 'C', 'D'], ['A', 'B', 'C']]
The stacking is there to remove the zeroes. If you are okay keeping them (and perhaps removing them in a separate step), df5.values.tolist() will do.

Replace placeholder in string with dictionary keys

Let's assume I have:
a placeholder string "aabbaaa"
and a dictionary: {'A': 'a', 'B': 'a', 'C': 'b', 'D': 'a', 'E': 'b', 'F': 'a', 'G': 'b'}.
How can I generate in Python all possible permutations of the dictionary keys for the placeholder string, where each position gets a key whose value matches that character?
The expected result would, for example, be:
AACCAAA, AACCAAB, AACCABA, ..., AACEAAA, ..., AAEEAAA, ..., FFGGFFF, etc.
The solution could be (shown with the short placeholder "ab"; use "aabbaaa" for the full case):
>>> import itertools
>>> from collections import defaultdict
>>> dict_ = defaultdict(list)
>>> placeholder = "ab"
>>> _dict = {'A': 'a', 'B': 'a', 'C': 'b', 'D': 'a', 'E': 'b', 'F': 'a', 'G': 'b'}
>>> for k, v in _dict.items():
...     dict_[v].append(k)
...
>>> _iterables = [dict_[character] for character in placeholder]
>>> output = [''.join(tup) for tup in itertools.product(*_iterables)]
>>> output
['AC', 'AE', 'AG', 'BC', 'BE', 'BG', 'DC', 'DE', 'DG', 'FC', 'FE', 'FG']
Let me know if it helps!!
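The same approach can be wrapped into a reusable function and run on the full placeholder from the question. This is a sketch; the name `expand_placeholder` is hypothetical. A character in the placeholder with no matching key produces an empty result, because one of the product's inputs is empty.

```python
import itertools
from collections import defaultdict

def expand_placeholder(placeholder, mapping):
    """Return every string that fits `placeholder`, one key per character."""
    # Invert the mapping: placeholder character -> keys that map to it
    by_char = defaultdict(list)
    for key, char in mapping.items():
        by_char[char].append(key)
    # One list of candidate keys per position; product picks one from each
    choices = [by_char[ch] for ch in placeholder]
    return [''.join(combo) for combo in itertools.product(*choices)]

mapping = {'A': 'a', 'B': 'a', 'C': 'b', 'D': 'a', 'E': 'b', 'F': 'a', 'G': 'b'}
results = expand_placeholder("aabbaaa", mapping)
print(len(results))             # 9216, i.e. 4**5 * 3**2
print(results[0], results[-1])  # AACCAAA FFGGFFF
```

The list grows multiplicatively with the placeholder length, so for long placeholders iterating over the product lazily instead of building a list may be preferable.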
You can build all the permutations with backtracking.
First, the dict is more useful reversed, so invert it:
from collections import defaultdict
orig_str = "aabbaaa"
d = {'A': 'a', 'B': 'a', 'C': 'b', 'D': 'a', 'E': 'b', 'F': 'a', 'G': 'b'}
reverse_d = defaultdict(list)
for k, el in d.items():
    reverse_d[el].append(k)
Now reverse_d maps each character to its keys: {'a': ['A', 'B', 'D', 'F'], 'b': ['C', 'E', 'G']}
Next, write a backtracking function that, for each character of the string, tries every possible key in turn:
def permut(orig_str, index, chars_till_now):
    # Base case: every position has been filled, so emit the result
    if index == len(orig_str):
        print("".join(chars_till_now))
        return
    # Copy the prefix and reserve a slot for the current position
    chars = chars_till_now[:]
    chars.append("")
    for possibility in reverse_d[orig_str[index]]:
        chars[-1] = possibility
        permut(orig_str, index + 1, chars)
You can modify the function to collect the permutations instead of printing them, or to take the reversed dictionary as a parameter instead of using a global; it depends on what you need.
To call the function, just run:
permut(orig_str, 0, [])
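One way to apply those modifications is sketched below: a generator that yields each permutation instead of printing it and takes the reversed dictionary as a parameter. The name `permut_gen` is hypothetical. Instead of copying the prefix at every level, it appends a choice, recurses, and pops the choice again, which is the usual choose/undo form of backtracking.

```python
from collections import defaultdict

def permut_gen(orig_str, reverse_d, index=0, chars_till_now=None):
    """Yield every string that fits orig_str, one key per character."""
    if chars_till_now is None:
        chars_till_now = []
    if index == len(orig_str):
        yield "".join(chars_till_now)
        return
    for possibility in reverse_d[orig_str[index]]:
        chars_till_now.append(possibility)  # choose a key for this position
        yield from permut_gen(orig_str, reverse_d, index + 1, chars_till_now)
        chars_till_now.pop()  # undo the choice before trying the next key

d = {'A': 'a', 'B': 'a', 'C': 'b', 'D': 'a', 'E': 'b', 'F': 'a', 'G': 'b'}
reverse_d = defaultdict(list)
for k, el in d.items():
    reverse_d[el].append(k)

results = list(permut_gen("aabbaaa", reverse_d))
print(len(results), results[0], results[-1])  # 9216 AACCAAA FFGGFFF
```

Because it is a generator, a caller can also stop early (for example with `next()` or `itertools.islice`) without computing all 9216 strings.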

One line iteration through dictionaries in dictionaries

G2 = {'a': {'c': 1, 'b': 1}, 'b': {'a': 1, 'c': 1}}
b = G2.values()
result = []
for i in b:
    for key, value in i.items():
        result.append(key)
# result: ['c', 'b', 'a', 'c']
Can I get the same result using a list comprehension?
I tried it like this:
list2 = [key for key, value in i.items() for i in b]
# but I get: ['a', 'a', 'c', 'c']
Just chain the inner dictionaries (iterating over a dict yields its keys) using itertools.chain.from_iterable, and convert to a list to print the result:
import itertools
G2 = {'a': {'c': 1, 'b': 1}, 'b': {'a': 1, 'c': 1}}
#['c', 'b', 'a', 'c']
result = list(itertools.chain.from_iterable(G2.values()))
print(result)
Output (Python 3.7+, where dicts preserve insertion order):
['c', 'b', 'a', 'c']
On older Python versions dict order was not guaranteed, so the same code could print something like ['c', 'b', 'c', 'a'].
A variant without itertools, using a flattening double loop inside a comprehension (probably closer to your attempt):
result = [x for values in G2.values() for x in values]
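The attempt in the question returns `['a', 'a', 'c', 'c']` because the `for` clauses of a comprehension run in the same order as nested `for` statements, outermost first. In `[key for key, value in i.items() for i in b]`, the first clause iterates over `i.items()` where `i` is still the last inner dict left over from the earlier loop (`{'a': 1, 'c': 1}`), and the second clause merely repeats each key once per element of `b`. Swapping the two clauses fixes it, as this sketch shows (assuming Python 3.7+ for the dict order):

```python
G2 = {'a': {'c': 1, 'b': 1}, 'b': {'a': 1, 'c': 1}}
b = G2.values()

# Nested-loop version from the question
result = []
for i in b:
    for key, value in i.items():
        result.append(key)

# Same order of for clauses as the loops above: outer loop first, inner loop second
list2 = [key for i in b for key, value in i.items()]
print(list2)  # ['c', 'b', 'a', 'c']
```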
