Find corresponding columns in Python

I have a dataset similar to this one:
import pandas as pd

p = {'A': [0,1,0,1], 'B': [1,1,1,1], 'C': [0,0,1,1], 'D': [1,1,1,0]}
df5 = pd.DataFrame(data=p)
df5
Now I would like to create a list with the corresponding columns per row, which I am currently doing like this:
cols = df5.dot(df5.columns).map(set).values.tolist()
cols
However, if a column is named 'AA' instead of 'A', this no longer works. Is there a workaround for this?
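For context, the reason this breaks: .dot concatenates the matching column names into a single string per row, and map(set) then splits that string into individual characters, so 'AA' is reduced to 'A'. A quick sketch with the multi-letter column:
import pandas as pd

p = {'AA': [0,1,0,1], 'B': [1,1,1,1], 'C': [0,0,1,1], 'D': [1,1,1,0]}
df5 = pd.DataFrame(data=p)

# dot builds one concatenated string of column names per row...
print(df5.dot(df5.columns).tolist())
# ['BD', 'AABD', 'BCD', 'AABC']

# ...and set() then splits each string into single characters, losing 'AA':
print(df5.dot(df5.columns).map(set).values.tolist())
# e.g. [{'B', 'D'}, {'A', 'B', 'D'}, {'B', 'C', 'D'}, {'A', 'B', 'C'}] -- 'AA' has collapsed to 'A'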

You can add a separator to the column names, remove the trailing one with Series.str.rstrip, and then use Series.str.split:
p = {'AA': [0,1,0,1], 'B': [1,1,1,1], 'C': [0,0,1,1], 'D': [1,1,1,0]}
df5 = pd.DataFrame(data=p)
cols = df5.dot(df5.columns + ',').str.rstrip(',').str.split(',').map(set).values.tolist()
print (cols)
[{'D', 'B'}, {'B', 'D', 'AA'}, {'C', 'D', 'B'}, {'B', 'AA', 'C'}]
Another solution is to use NumPy indexing:
c = df5.columns.to_numpy()
cols = [set(c[x]) for x in df5.to_numpy().astype(bool)]
print (cols)
[{'D', 'B'}, {'B', 'D', 'AA'}, {'C', 'D', 'B'}, {'B', 'AA', 'C'}]
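The boolean cast turns each row into a mask over the array of column names; a minimal illustration for the first row:
import numpy as np

c = np.array(['AA', 'B', 'C', 'D'])
mask = np.array([0, 1, 0, 1]).astype(bool)   # first row of df5

print(c[mask])        # ['B' 'D']
print(set(c[mask]))   # {'B', 'D'} (set ordering may vary)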

Replace the 1 values with the column names:
df5.replace(1, pd.Series(df5.columns, df5.columns), inplace=True)
Replace the 0 values with NaNs, then use stack to drop them and convert to a list:
import numpy as np

cols = df5.replace(0, np.nan).stack().groupby(level=0).apply(list).tolist()
cols
This returns a list of lists instead of a list of sets:
[['B', 'D'], ['A', 'B', 'D'], ['B', 'C', 'D'], ['A', 'B', 'C']]
The stacking is there to remove the zeroes. If you are okay with keeping them (and maybe removing them in a different step), df5.values.tolist() will do.
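To illustrate that last point, keeping the zeroes would look like this (a sketch using the original single-letter columns, after the replace step above):
print(df5.values.tolist())
# [[0, 'B', 0, 'D'], ['A', 'B', 0, 'D'], [0, 'B', 'C', 'D'], ['A', 'B', 'C', 0]]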

Related

Extract all possible combinations of unique elements in dict of lists

I have this input:
d = {'a': ['A', 'B', 'C'], 'b': ['A', 'B', 'C'], 'c': ['D', 'E'], 'd': ['E', 'F', 'G']}
How can I extract all the possible unique samplings per list?
One possible output is, for example:
d = {'a': 'A', 'b': 'B', 'c': 'D', 'd': 'E'}
or
d = {'a': 'B', 'b': 'A', 'c': 'E', 'd': 'F'}
and so on..
Any idea?
Thank you
This is what you are looking for:
import itertools
keys, values = zip(*d.items())
permutations_dicts = [dict(zip(keys, v)) for v in itertools.product(*values)]
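For reference, a quick check of what this produces for the sample input (the product has 3 x 3 x 2 x 3 = 54 combinations). If, as in the example outputs, every key should also get a distinct value, the product can additionally be filtered with len(set(v)) == len(v):
import itertools

d = {'a': ['A', 'B', 'C'], 'b': ['A', 'B', 'C'], 'c': ['D', 'E'], 'd': ['E', 'F', 'G']}
keys, values = zip(*d.items())

permutations_dicts = [dict(zip(keys, v)) for v in itertools.product(*values)]
print(len(permutations_dicts))   # 54
print(permutations_dicts[0])     # {'a': 'A', 'b': 'A', 'c': 'D', 'd': 'E'}

# Optional: keep only combinations where every key gets a distinct value
distinct = [dict(zip(keys, v)) for v in itertools.product(*values) if len(set(v)) == len(v)]
print(distinct[0])               # {'a': 'A', 'b': 'B', 'c': 'D', 'd': 'E'}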

Finding the second level keys of a multi key dictionary Python?

I have a multi-key dict in the following format. I am trying to access the list of second-level keys; however, it returns them as dict_keys objects. What I am trying to get is ['a', 'b', 'c', 'd', 'e', 'f'].
dictTest={}
dictTest[1]={'a':1, 'b':2}
dictTest[2]={'c':1, 'd':2}
dictTest[3]={'e':1, 'f':2}
print(dictTest)
print(list([dictTest[i].keys() for i in dictTest.keys()]))
{1: {'a': 1, 'b': 2}, 2: {'c': 1, 'd': 2}, 3: {'e': 1, 'f': 2}}
[dict_keys(['a', 'b']), dict_keys(['c', 'd']), dict_keys(['e', 'f'])]
You could use itertools.chain in combination with mapping dict.keys over the dict's values:
from itertools import chain
dictTest = {1: {'a': 1, 'b': 2}, 2: {'c': 1, 'd': 2}, 3: {'e': 1, 'f': 2}}
print(list(chain(*map(dict.keys, dictTest.values()))))
['a', 'b', 'c', 'd', 'e', 'f']
>>> [v2 for v1 in dictTest.values() for v2 in v1]
['a', 'b', 'c', 'd', 'e', 'f']
Try this:
# sum([list(b.keys()) for b in dictTest.values()], [])
# syntax improvement by #wwii
sum([list(b) for b in dictTest.values()], [])
Output:
['a', 'b', 'c', 'd', 'e', 'f']

How to find common values in list using Python?

I have a dataframe that looks like this:
names
   year           name
0  1990  'a', 'b', 'c'
1  2001  'a', 'd', 'c'
2  2004  'e', 'b', 'c'
And I want to find the common values in names and how often each occurs, such that:
c:3, a:2, b:2, d:1, e:1
I am not sure how to approach this.
But what I thought of is to convert the name column to a list:
names_list = name['name'].tolist()
names_list = ['a', 'b', 'c', 'a', 'd', 'c', 'e', 'b', 'c']
And then, use the below function I found in another post to get the most common value:
def most_common(lst):
    return max(set(lst), key=lst.count)
most_common(names_list)
'c'
But it only gives the single most common value, and I'm trying to get at least the top 3 values from the list. How can I do this?
Let us do mode after split and explode
df.name.str.split(', ').explode().mode()
Return the count
df.name.str.split(', ').explode().value_counts()
# if you only want the highest count:
# df.name.str.split(', ').explode().value_counts().sort_values().tail(1)
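Since the question asks for at least the top 3, and value_counts already sorts in descending order, a small sketch:
df.name.str.split(', ').explode().value_counts().head(3)  # top three values and their counts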
If you have
names_list = ['a', 'b', 'c', 'a', 'd', 'c', 'e', 'b', 'c']
then you might use collections.Counter the following way:
import collections
names_list = ['a', 'b', 'c', 'a', 'd', 'c', 'e', 'b', 'c']
occurs = collections.Counter(names_list)
print(occurs)
Output:
Counter({'c': 3, 'a': 2, 'b': 2, 'e': 1, 'd': 1})
Note that collections.Counter is a subclass of dict, so occurs has .keys(), .values(), .items(), and so on.
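Counter also answers the "top 3" part of the question directly via most_common:
print(occurs.most_common(3))
# [('c', 3), ('a', 2), ('b', 2)]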

Retrieve keys Set from pandas Series of dict values

Given a pandas Series of dict values with str keys:
Series
------
{'a': 1, 'b' : 2, 'c' : 3}
{'b': 3, 'd': 5}
{'d': 7, 'e': 7}
How can the Series be scanned to retrieve a set of the dictionary keys? The resulting output would be a plain python set:
{'a', 'b', 'c', 'd', 'e'}
Thank you in advance for your consideration and response.
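For a runnable setup, the Series can be built from the values shown above (a minimal sketch; the answers below refer to it as s):
import pandas as pd

s = pd.Series([
    {'a': 1, 'b': 2, 'c': 3},
    {'b': 3, 'd': 5},
    {'d': 7, 'e': 7},
])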
Use a list comprehension with flattening and convert to a set:
a = set([y for x in s for y in x])
print (a)
{'e', 'a', 'd', 'c', 'b'}
Or use itertools.chain.from_iterable:
from itertools import chain
a = set(chain.from_iterable(s))
Maybe this:
s = pd.Series(...)
a = set(list(pd.DataFrame(s.tolist())))
# {'a', 'e', 'b', 'c', 'd'}

Converting list of lists to a dictionary with multiple values for a key

I need to write a function that accepts a list of lists representing friends for each person and converts it into a dictionary.
so an input of [['A','B'],['A','C'],['A','D'],['B','A'],['C','B'],['C','D'],['D','B'],['E']] should return {A:[B,C,D],B:[A],C:[B,D],D:[B],E:None}
Input:
[['A','B'],['A','C'],['A','D'],['B','A'],['C','B'],['C','D'],['D','B'],['E']]
Expected Output:
{A:[B,C,D],B:[A],C:[B,D],D:[B],E:None}
Currently I am trying the following:
s=[['A','B'],['A','C'],['A','D'],['B','A'],['C','B'],['C','D'],['D','B'],['E']]
output=dict.fromkeys((set([x[0] for x in s])),[ ])
for x in s:
    if len(x) > 1:
        output[x[0]].append(x[1])
    else:
        output[x[0]].append(None)
But the output is giving me all values for every key rather than only the corresponding values.
The output I am getting is:
{
'A': ['B', 'C', 'D', 'A', 'B', 'D', 'B', None],
'B': ['B', 'C', 'D', 'A', 'B', 'D', 'B', None],
'C': ['B', 'C', 'D', 'A', 'B', 'D', 'B', None],
'D': ['B', 'C', 'D', 'A', 'B', 'D', 'B', None],
'E': ['B', 'C', 'D', 'A', 'B', 'D', 'B', None]
}
You can iterate through the key-value pairs in the list of lists, but unpack the value as a list to accommodate the possible lack of a value:
s = [['A','B'],['A','C'],['A','D'],['B','A'],['C','B'],['C','D'],['D','B'],['E']]
output = {}
for k, *v in s:
    if v:
        output.setdefault(k, []).extend(v)
    else:
        output[k] = None
output becomes:
{'A': ['B', 'C', 'D'], 'B': ['A'], 'C': ['B', 'D'], 'D': ['B'], 'E': None}
Or if you don't mind that keys without a value get an empty list instead of None, you can simply do:
output = {}
for k, *v in s:
    output.setdefault(k, []).extend(v)
output would then become:
{'A': ['B', 'C', 'D'], 'B': ['A'], 'C': ['B', 'D'], 'D': ['B'], 'E': []}
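If the None for keys without friends is still wanted afterwards, the empty lists can be converted in one pass (a small follow-up sketch):
output = {k: (v or None) for k, v in output.items()}
# {'A': ['B', 'C', 'D'], 'B': ['A'], 'C': ['B', 'D'], 'D': ['B'], 'E': None}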
The issue is that the list you feed to dict.fromkeys is only one reference shared across all keys.
Your desired result is inconsistent. I recommend you choose an empty list for 'E', even though None may seem more appropriate. With this adjusted requirement, you can use collections.defaultdict.
from collections import defaultdict
L = [['A','B'],['A','C'],['A','D'],['B','A'],['C','B'],['C','D'],['D','B'],['E']]
dd = defaultdict(list)
for lst in L:
    if len(lst) > 1:
        dd[lst[0]].append(lst[1])
    else:
        dd[lst[0]]  # merely accessing the key creates an empty list entry
print(dd)
defaultdict(list,
            {'A': ['B', 'C', 'D'],
             'B': ['A'],
             'C': ['B', 'D'],
             'D': ['B'],
             'E': []})
You should use a dict comprehension to initialize the dict:
output = {x[0]: [] for x in s}
dict.fromkeys gives every key a reference to the same value object. With a mutable value, that is a problem. The comprehension gives each key an independent list object, in addition to being more readable.
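A quick demonstration of the difference (a sketch, independent of the input above):
shared = dict.fromkeys(['A', 'B'], [])
shared['A'].append(1)
print(shared)        # {'A': [1], 'B': [1]} -- both keys point at the same list

independent = {k: [] for k in ['A', 'B']}
independent['A'].append(1)
print(independent)   # {'A': [1], 'B': []}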
One way to solve this is given below:
friend_combi = [['A','B'],['A','C'],['A','D'],['B','A'],['C','B'],['C','D'],['D','B'],['E']] # Input to be processed
final_dict = {} #Empty dict to store result
for i in friend_combi:  # loop through each element in the list
    if final_dict.get(i[0]):  # if data is already present in the dict then append, else add
        final_dict[i[0]].append(i[1])
    else:
        final_dict[i[0]] = [i[1]] if i[1:] else None  # check if a value exists in the list, else save None
print (final_dict)
#Output --> {'A': ['B', 'C', 'D'], 'B': ['A'], 'C': ['B', 'D'], 'D': ['B'], 'E': None}
I hope this helps :)
You can define a function named get_dictionary() as shown in the code below.
>>> def get_dictionary(l):
...     d = {}
...     for arr in l:
...         if len(arr) == 2:
...             key = arr[0]
...             if key in d:
...                 d[key].append(arr[1])
...             else:
...                 d[key] = [arr[1]]
...         else:
...             d[arr[0]] = None
...     return d
...
>>> l = [['A','B'], ['A','C'], ['A','D'], ['B','A'], ['C','B'], ['C','D'], ['D','B'], ['E']]
>>>
>>> get_dictionary(l)
{'A': ['B', 'C', 'D'], 'B': ['A'], 'C': ['B', 'D'], 'D': ['B'], 'E': None}
>>>
Pretty printing the dictionary as JSON.
>>> import json
>>>
>>> d = get_dictionary(l)
>>>
>>> print(json.dumps(d, indent=4))
{
    "A": [
        "B",
        "C",
        "D"
    ],
    "B": [
        "A"
    ],
    "C": [
        "B",
        "D"
    ],
    "D": [
        "B"
    ],
    "E": null
}
>>>
