I want to remove a dictionary from a nested dictionary based on a condition.
dict :
{1: {'A': [1, 2, 3, 0], 'B': ['ss', 'dd', 'ff', 'aa']},
2: {'A': [0, 1, 2, 3], 'B': ['ee', 'ff', 'bb', 'gg']},
3: {'A': [0, 1, 2], 'B': ['ar', 'hh', 'ww']},
4: {'A': [ 1, 0], 'B': [ 'll', 'jj']}}
For each inner dictionary, wherever 'A' == 0, if the corresponding 'B' value doesn't start with 'a', I want to delete that inner dictionary.
Expected:
{1: {'A': [1, 2, 3, 0], 'B': ['ss', 'dd', 'ff', 'aa']},
2: {'A': [0, 1, 2], 'B': ['ar', 'hh', 'ww']},
}
Check this, using pandas (d is your dictionary):
import pandas as pd
s = pd.DataFrame(d)
new_d = s.loc[:,s.loc['B'].str[0].str[0]=='a'].to_dict()
Out[99]:
{1: {'A': [0, 1, 2, 3], 'B': ['aa', 'ss', 'dd', 'ff']},
3: {'A': [0, 1, 2], 'B': ['ar', 'hh', 'ww']}}
I assume the change of keys in the new dictionary is on purpose, so here is a simple solution with no extra libraries (nested is your dictionary):
temp = [x for x in nested.values() if 0 in x['A'] and any([i[0] == 'a' for i in x['B']])]
new_dict = dict(enumerate(temp, 1))
If you want, you can make it a one-liner, or you can split the condition out into a function, like this:
def check(x):
    if 0 in x['A']:
        return any([i[0] == 'a' for i in x['B']])
    return True
temp = [x for x in nested.values() if check(x)]
Now, if you prefer filter(), the two variations above become:
temp = filter(lambda x: 0 in x['A'] and any(i[0] == 'a' for i in x['B']), nested.values())
and if you declare check() as above, you can do it in this compact way (notice I assign to new_dict here instead of temp, since it doesn't make sense to split it when it is so compact):
new_dict = dict(enumerate(filter(check, nested.values()), 1))
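As a minimal, self-contained sketch of the filtering idea above: it assumes the intended rule is positional, that is, wherever 'A' holds 0, the 'B' value at the same index must start with 'a'. That reading is an assumption inferred from the expected output in the question, and the helper name keep_inner is hypothetical.

```python
nested = {
    1: {'A': [1, 2, 3, 0], 'B': ['ss', 'dd', 'ff', 'aa']},
    2: {'A': [0, 1, 2, 3], 'B': ['ee', 'ff', 'bb', 'gg']},
    3: {'A': [0, 1, 2], 'B': ['ar', 'hh', 'ww']},
    4: {'A': [1, 0], 'B': ['ll', 'jj']},
}


def keep_inner(inner):
    """Hypothetical helper: True when every 'B' value paired with a 0 in 'A' starts with 'a'.

    An inner dict with no 0 in 'A' is kept, because all() of an empty sequence is True.
    """
    return all(
        b.startswith('a')
        for a, b in zip(inner['A'], inner['B'])
        if a == 0
    )


# Keep the matching inner dicts and renumber them from 1, as in the question.
kept = [inner for inner in nested.values() if keep_inner(inner)]
new_dict = dict(enumerate(kept, 1))
print(new_dict)
```

Pairing the two lists with zip is what makes the check positional; the `0 in x['A'] and any(...)` form above instead accepts an 'a' anywhere in 'B'.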
Your question needs more clarity.
Assuming you intend to delete
{
1: {'A': [0, 1, 2, 3], 'B': ['aa', 'ss', 'dd', 'ff']},
2: {'A': [0, 1, 2], 'B': ['ar', 'hh', 'ww']},
}
from
{1: {'A': [0, 1, 2, 3], 'B': ['aa', 'ss', 'dd', 'ff']},
2: {'A': [0, 1, 2, 3], 'B': ['ee', 'ff', 'bb', 'gg']},
3: {'A': [0, 1, 2], 'B': ['ar', 'hh', 'ww']},
4: {'A': [0, 1], 'B': ['jj', 'll']}}
You can try this:
import re
r = re.compile("a.*")
input_dictionary = {
1: {'A': [0, 1, 2, 3], 'B': ['aa', 'ss', 'dd', 'ff']},
2: {'A': [0, 1, 2, 3], 'B': ['ee', 'ff', 'bb', 'gg']},
3: {'A': [0, 1, 2], 'B': ['ar', 'hh', 'ww']},
4: {'A': [0, 1], 'B': ['jj', 'll']}}
key_list = list(input_dictionary.keys())
for key in key_list:
    value = input_dictionary[key]
    if any(r.match(b_element) for b_element in value['B']) and 0 in value['A']:
        del input_dictionary[key]
        print("Deleting key: ", key)
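If mutating the dictionary in place is not required, the same condition can be applied with a dict comprehension that builds a new dictionary and leaves the input untouched. This is only a sketch: should_delete and remaining are hypothetical names, and str.startswith stands in for the regular expression, which for the pattern "a.*" gives the same yes/no answer.

```python
input_dictionary = {
    1: {'A': [0, 1, 2, 3], 'B': ['aa', 'ss', 'dd', 'ff']},
    2: {'A': [0, 1, 2, 3], 'B': ['ee', 'ff', 'bb', 'gg']},
    3: {'A': [0, 1, 2], 'B': ['ar', 'hh', 'ww']},
    4: {'A': [0, 1], 'B': ['jj', 'll']},
}


def should_delete(inner):
    # Same condition as the loop: 'A' contains 0 and some 'B' entry starts with 'a'.
    return 0 in inner['A'] and any(b.startswith('a') for b in inner['B'])


# Build a new dict with the original keys; input_dictionary is not modified.
remaining = {
    key: inner
    for key, inner in input_dictionary.items()
    if not should_delete(inner)
}
print(sorted(remaining))
```

Because nothing is deleted during iteration, there is no need to copy the key list first.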
I am trying to subset a dataset based on a condition, picking rows from where the condition is met until the next 0 in column A.
Condition, if Column A == 0, column B should start with 'a'.
Dataset:
A B
0 aa
1 ss
2 dd
3 ff
0 ee
1 ff
2 bb
3 gg
0 ar
1 hh
2 ww
0 jj
1 ll
Expected:
{0: {'A': [0, 1, 2, 3], 'B': ['aa', 'ss', 'dd', 'ff']}, 1: {'A': [0, 1, 2], 'B': ['ar', 'hh', 'ww']}}
Each series starts where column A == 0 and runs until the next 0.
In total there are 4 different dictionaries in that dataframe.
Maybe try it with cumsum as well:
{x: y.to_dict('list') for x, y in df.groupby(df['A'].eq(0).cumsum())}
Out[87]:
{1: {'A': [0, 1, 2, 3], 'B': ['aa', 'ss', 'dd', 'ff']},
2: {'A': [0, 1, 2, 3], 'B': ['ee', 'ff', 'bb', 'gg']},
3: {'A': [0, 1, 2], 'B': ['rr', 'hh', 'ww']},
4: {'A': [0, 1], 'B': ['jj', 'll']}}
Do a cumsum on the condition to identify the groups, then groupby:
groups = (df['A'].eq(0) & df['B'].str.startswith('a')).cumsum()
{k:v.to_dict(orient='list') for k,v in df.groupby(groups)}
Output:
{1: {'A': [0, 1, 2, 3], 'B': ['aa', 'ss', 'dd', 'ff']},
2: {'A': [0, 1, 2, 3], 'B': ['ae', 'ff', 'bb', 'gg']},
3: {'A': [0, 1, 2, 0, 1], 'B': ['ar', 'hh', 'ww', 'jj', 'll']}}
This answers this question's revision 2020-11-04 19:29:39Z. Later additions/edits to the question or additional requirements in the comments will not be considered.
First find the desired rows and select them into a new dataframe. Group the rows and convert them to dicts.
g = (df.A.eq(0).astype(int) + df.B.str.startswith('a')).replace(0, method='ffill') - 1
df_BeqA = df[g.astype('bool')]
{x: y.to_dict('list') for x , y in df_BeqA.groupby(df_BeqA.A.eq(0).cumsum() - 1)}
Out:
{0: {'A': [0, 1, 2, 3], 'B': ['aa', 'ss', 'dd', 'ff']},
1: {'A': [0, 1, 2], 'B': ['ar', 'hh', 'ww']}}
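For readers who want to see the grouping logic without pandas, here is a plain-Python sketch of the same idea: start a new group at every row where A == 0, then keep only the groups whose first B value starts with 'a'. The function name split_groups and the tuple-based rows list are assumptions made for illustration; the data is the dataset from the question.

```python
rows = [
    (0, 'aa'), (1, 'ss'), (2, 'dd'), (3, 'ff'),
    (0, 'ee'), (1, 'ff'), (2, 'bb'), (3, 'gg'),
    (0, 'ar'), (1, 'hh'), (2, 'ww'),
    (0, 'jj'), (1, 'll'),
]


def split_groups(rows):
    """Hypothetical helper: start a new group whenever A == 0 (mirrors A.eq(0).cumsum())."""
    groups = []
    for a, b in rows:
        if a == 0 or not groups:
            groups.append({'A': [], 'B': []})
        groups[-1]['A'].append(a)
        groups[-1]['B'].append(b)
    return groups


groups = split_groups(rows)

# Keep only the groups that open with A == 0 and a B value starting with 'a'.
selected = [g for g in groups if g['A'][0] == 0 and g['B'][0].startswith('a')]
result = dict(enumerate(selected))
print(result)
```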
I have a DataFrame which looks like:
Users Date
['A', 'B'] 2017-10-21
['B', 'C'] 2017-10-21
['A', 'D'] 2017-10-21
['D', 'E'] 2017-10-22
['A', 'E'] 2017-10-22
['A', 'E', 'D'] 2017-10-22
['C', 'B', 'E'] 2017-10-23
['D', 'C', 'F'] 2017-11-23
I need to make a new DataFrame from this DataFrame which would count the number of times each item shows up in the lists on each day. The count, therefore, would be across different rows on the same date.
For example, the new DataFrame would look like:
Users Date
[A=2, B=2, C=1, D=1] 2017-10-21
[E=3, D=2, A=2] 2017-10-22
[B=1, C=2, D=1, E=1, F=1] 2017-10-23
Some things to note: all the items in the Users column of the first dataset are lists whose individual elements are strings. The Date column is of DateTime type.
I understand there would be a groupby on the Date column, but I can't figure out how to write the function to apply to each group.
Using groupby and apply with collections.Counter:
import collections
df.groupby('Date').Users.sum().apply(collections.Counter)
Date
2017-10-21 {'A': 2, 'B': 2, 'C': 1, 'D': 1}
2017-10-22 {'D': 2, 'E': 3, 'A': 2}
2017-10-23 {'C': 1, 'B': 1, 'E': 1}
2017-11-23 {'D': 1, 'C': 1, 'F': 1}
Name: Users, dtype: object
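To make the counting step concrete without pandas, the sketch below does the same per-date tally with collections.Counter alone. It assumes the rows are available as (users, date) tuples and that dates are plain strings; both are simplifications for illustration, and counts_by_date is a hypothetical name.

```python
from collections import Counter, defaultdict

# The rows from the question, as (users, date) pairs.
rows = [
    (['A', 'B'], '2017-10-21'),
    (['B', 'C'], '2017-10-21'),
    (['A', 'D'], '2017-10-21'),
    (['D', 'E'], '2017-10-22'),
    (['A', 'E'], '2017-10-22'),
    (['A', 'E', 'D'], '2017-10-22'),
    (['C', 'B', 'E'], '2017-10-23'),
    (['D', 'C', 'F'], '2017-11-23'),
]

# One Counter per date; update() adds the occurrences from each row's list.
counts_by_date = defaultdict(Counter)
for users, date in rows:
    counts_by_date[date].update(users)

for date, counts in sorted(counts_by_date.items()):
    print(date, dict(counts))
```

The pandas version reaches the same result by first concatenating the lists per date (the sum() call) and then counting the concatenated list once.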
If you have multiple columns that you want to count per group:
Setup
import collections
import random
import pandas as pd
s = 'ABCDE'
df = pd.DataFrame({
    'Users': [random.sample(s, random.randint(1, 5)) for _ in range(10)],
    'Tools': [random.sample(s, random.randint(1, 5)) for _ in range(10)],
    'Hours': [random.sample(s, random.randint(1, 5)) for _ in range(10)],
    'Date': ['2017-10-21', '2017-10-21', '2017-10-21', '2017-10-22',
             '2017-10-22', '2017-10-22', '2017-10-23', '2017-10-23', '2017-10-23', '2017-11-23']
})
Using agg:
df.groupby('Date').sum().agg({
    'Users': collections.Counter,
    'Tools': collections.Counter,
    'Hours': collections.Counter
})
Users Tools Hours
Date
2017-10-21 {'C': 2, 'E': 2, 'A': 2, 'B': 2, 'D': 1} {'E': 3, 'A': 2, 'B': 3, 'D': 2, 'C': 2} {'B': 2, 'C': 2, 'E': 1, 'A': 1, 'D': 1}
2017-10-22 {'D': 2, 'A': 2, 'E': 1, 'C': 1, 'B': 2} {'E': 2, 'B': 3, 'A': 3, 'D': 1, 'C': 1} {'B': 1, 'C': 2, 'E': 2, 'A': 2, 'D': 2}
2017-10-23 {'B': 2, 'A': 2, 'D': 1, 'E': 1, 'C': 2} {'D': 3, 'E': 2, 'B': 2, 'C': 3, 'A': 2} {'C': 3, 'E': 2, 'D': 2, 'B': 1, 'A': 2}
2017-11-23 {'D': 1, 'B': 1, 'C': 1} {'B': 1} {'C': 1, 'E': 1}
I have sets of values that I want to apply as parameters to a function:
params = {
'a': [1, 2, 3],
'b': [5, 6, 7],
'x': [None, 'eleven', 'f'],
# et cetera
}
I want to run myfunc() with all possible combinations, so myfunc(a=1, b=5, x=None ...), myfunc(a=2, b=5, x=None ...) ... myfunc(a=3, b=7, x='f' ...). Is there something (for example in itertools) that can help? I thought about using itertools.product() but that doesn't keep the names of the parameters and just gives me tuples of the combinations.
You can use itertools.product to get all combinations of arguments:
>>> import itertools
>>> for xs in itertools.product([1,2], [5,6], ['eleven', 'f']):
...     print(xs)
...
(1, 5, 'eleven')
(1, 5, 'f')
(1, 6, 'eleven')
(1, 6, 'f')
(2, 5, 'eleven')
(2, 5, 'f')
(2, 6, 'eleven')
(2, 6, 'f')
With argument unpacking (the ** operator), you can call myfunc with all combinations of keyword arguments:
params = {
'a': [1, 2, 3],
'b': [5, 6, 7],
'x': [None, 'eleven', 'f'],
}
def myfunc(**args):
    print(args)
import itertools
keys = list(params)
for values in itertools.product(*map(params.get, keys)):
    myfunc(**dict(zip(keys, values)))
output:
{'a': 1, 'x': None, 'b': 5}
{'a': 1, 'x': None, 'b': 6}
{'a': 1, 'x': None, 'b': 7}
{'a': 1, 'x': 'eleven', 'b': 5}
{'a': 1, 'x': 'eleven', 'b': 6}
{'a': 1, 'x': 'eleven', 'b': 7}
{'a': 1, 'x': 'f', 'b': 5}
...
The orderings of .keys() and .values() are guaranteed to correspond in all Python versions (unless the dict is altered in between, which does not happen here), so this can be simplified to:
from itertools import product
for vals in product(*params.values()):
    myfunc(**dict(zip(params, vals)))
You can find the guarantee in the docs:
If keys, values and items views are iterated over with no intervening
modifications to the dictionary, the order of items will directly
correspond.
Demo:
for vals in product(*params.values()):
    print(dict(zip(params, vals)))
{'a': 1, 'x': None, 'b': 5}
{'a': 1, 'x': None, 'b': 6}
{'a': 1, 'x': None, 'b': 7}
{'a': 1, 'x': 'eleven', 'b': 5}
{'a': 1, 'x': 'eleven', 'b': 6}
{'a': 1, 'x': 'eleven', 'b': 7}
{'a': 1, 'x': 'f', 'b': 5}
{'a': 1, 'x': 'f', 'b': 6}
{'a': 1, 'x': 'f', 'b': 7}
...
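The pattern above can be wrapped in a small reusable generator. This is a sketch only: named_product is a hypothetical name, the params dict is shortened for brevity, and myfunc here simply returns its keyword arguments so the calls can be inspected. On Python 3.7+ dicts preserve insertion order, so the combinations come out with keys in the order a, b, x and the last key varying fastest (which differs from the key order in the older output shown above).

```python
from itertools import product


def named_product(params):
    """Hypothetical helper: yield one dict per combination of the value lists in params."""
    keys = list(params)
    for values in product(*(params[key] for key in keys)):
        yield dict(zip(keys, values))


def myfunc(**kwargs):
    # Stand-in for the real function; it just echoes the arguments it received.
    return kwargs


params = {'a': [1, 2], 'b': [5, 6], 'x': [None, 'f']}

combos = list(named_product(params))
results = [myfunc(**combo) for combo in combos]

for combo in combos:
    print(combo)
```

Keeping the combinations as dicts means each one can be logged or stored alongside the result it produced.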
I developed combu, a library that provides this.
Install combu
pip install combu
Use combu
# Case 1: Call directly
import combu
for res, param in combu.execute(myfunc, params):
    print(res, param)
# Case 2: Use the class
from combu import Combu
comb = Combu(myfunc)
for res, param in comb.execute(params):
    print(res, param)
Here is my list of dicts:
l = [{'a': 2, 'c': 1, 'b': 3},
{'a': 2, 'c': 3, 'b': 1},
{'a': 1, 'c': 2, 'b': 3},
{'a': 1, 'c': 3, 'b': 2},
{'a': 2, 'c': 5, 'b': 3}]
Now I want to sort the list by keys and orders provided by the user. For instance:
keys = ['a', 'c', 'b']
orders = [1, -1, 1]
I tried using a lambda in the sort() method, but it failed in a weird way:
>>> l.sort(key=lambda x: (order * x[key] for (key, order) in zip(keys, orders)))
>>> l
[{'a': 2, 'c': 5, 'b': 3},
{'a': 1, 'c': 3, 'b': 2},
{'a': 1, 'c': 2, 'b': 3},
{'a': 2, 'c': 3, 'b': 1},
{'a': 2, 'c': 1, 'b': 3}]
Does anyone know how to solve this?
You were almost there; your lambda returns generator objects, which happen to be ordered by their memory address in Python 2 and raise TypeError: '<' not supported between instances of 'generator' and 'generator' in Python 3.
Use a list comprehension instead:
l.sort(key=lambda x: [order * x[key] for (key, order) in zip(keys, orders)])
Demo:
>>> l = [{'a': 1, 'c': 2, 'b': 3},
... {'a': 1, 'c': 3, 'b': 2},
... {'a': 2, 'c': 1, 'b': 3},
... {'a': 2, 'c': 5, 'b': 3},
... {'a': 2, 'c': 3, 'b': 1}]
>>> keys = ['a', 'c', 'b']
>>> orders = [1, -1, 1]
>>> l.sort(key=lambda x: [order * x[key] for (key, order) in zip(keys, orders)])
>>> from pprint import pprint
>>> pprint(l)
[{'a': 1, 'b': 2, 'c': 3},
{'a': 1, 'b': 3, 'c': 2},
{'a': 2, 'b': 3, 'c': 5},
{'a': 2, 'b': 1, 'c': 3},
{'a': 2, 'b': 3, 'c': 1}]
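Multiplying by the order only works when the values are numeric. As a hedged alternative for values that cannot be negated (strings, for example), the sketch below relies on the stability of Python's sort and sorts once per key, from the least significant key to the most significant. The function name sort_by_keys is hypothetical, and this is a swapped-in technique rather than the approach of the answer above.

```python
def sort_by_keys(rows, keys, orders):
    """Hypothetical helper: return rows sorted by keys, where order 1 is ascending and -1 descending."""
    result = list(rows)
    # Sort by the last key first; each later (more significant) pass is stable,
    # so ties keep the order established by the earlier passes.
    for key, order in reversed(list(zip(keys, orders))):
        result.sort(key=lambda row: row[key], reverse=(order < 0))
    return result


l = [{'a': 2, 'c': 1, 'b': 3},
     {'a': 2, 'c': 3, 'b': 1},
     {'a': 1, 'c': 2, 'b': 3},
     {'a': 1, 'c': 3, 'b': 2},
     {'a': 2, 'c': 5, 'b': 3}]

sorted_rows = sort_by_keys(l, ['a', 'c', 'b'], [1, -1, 1])

# The same helper works for string values, which cannot be multiplied by -1.
people = [{'name': 'bob', 'age': 3},
          {'name': 'amy', 'age': 3},
          {'name': 'cat', 'age': 5}]
sorted_people = sort_by_keys(people, ['age', 'name'], [1, -1])
print(sorted_rows)
print(sorted_people)
```

The trade-off is one sort pass per key instead of a single pass with a composite key.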
I wanted to create a dictionary of dictionaries in Python:
Suppose I already have a list which contains the keys:
keys = ['a', 'b', 'c', 'd', 'e']
value = [1, 2, 3, 4, 5]
Suppose I have a data field with numeric values (20 of them)
I want to define a dictionary which stores 4 different dictionaries, each mapping the given keys to the corresponding values:
dictionary = {}
for i in range(0, 4):
    dictionary[i] = {}
    for j in range(0, 5):
        dictionary[i][keys[j]] = value[j]
So basically, it should be like:
dictionary[0] = {'a' : 1, 'b' : 2, 'c' : 3, 'd': 4, 'e':5}
dictionary[1] = {'a' : 1, 'b' : 2, 'c' : 3, 'd': 4, 'e':5}
dictionary[2] = {'a' : 1, 'b' : 2, 'c' : 3, 'd': 4, 'e':5}
dictionary[3] = {'a' : 1, 'b' : 2, 'c' : 3, 'd': 4, 'e':5}
What is the best way to achieve this?
Use a list comprehension and dict(zip(keys,value)) will return the dict for you.
>>> keys = ['a', 'b', 'c', 'd', 'e']
>>> value = [1, 2, 3, 4, 5]
>>> dictionary = [dict(zip(keys, value)) for _ in range(4)]
>>> from pprint import pprint
>>> pprint(dictionary)
[{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5},
{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5},
{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5},
{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}]
If you want a dict of dicts then use a dict comprehension:
>>> keys = ['a', 'b', 'c', 'd', 'e']
>>> value = [1, 2, 3, 4, 5]
>>> dictionary = {i: dict(zip(keys, value)) for i in range(4)}
>>> pprint(dictionary)
{0: {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5},
1: {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5},
2: {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5},
3: {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}}
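An alternative that only zips once (in Python 3, zip returns a one-shot iterator, so it is materialized with list() before being repeated):
from itertools import repeat
list(map(dict, repeat(list(zip(keys, value)), 4)))
Or, maybe, just use dict.copy and construct the dict once:
[d.copy() for d in repeat(dict(zip(keys, value)), 4)]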
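A short sketch of why the copy matters: repeating one dict object gives four references to the same dictionary, while copy() gives four independent ones. The names template, shared and independent are hypothetical, chosen only for this illustration.

```python
keys = ['a', 'b', 'c', 'd', 'e']
value = [1, 2, 3, 4, 5]

template = dict(zip(keys, value))

# Four references to the SAME dict object.
shared = [template] * 4

# Four independent (shallow) copies.
independent = [template.copy() for _ in range(4)]

independent[0]['a'] = 100   # affects only the first copy
shared[0]['b'] = 200        # visible through every entry of shared

print(independent[1]['a'])  # still 1
print(shared[3]['b'])       # 200, because all entries are one object
```

Note that copy() is shallow, which is sufficient here because the values are plain integers.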
For a list of dictionaries:
dictionary = [dict(zip(keys, value)) for i in range(4)]
If you really want a dictionary of dictionaries, as you said:
dictionary = dict((i, dict(zip(keys, value))) for i in range(4))
I suppose a dict lets you use pop or other dict methods with arbitrary keys, which you could not do with a list.
BTW: if this is really a data/number-crunching application, I'd suggest moving on to numpy and/or pandas, which are great modules.
Edit re: OP comments:
If you want indices for the type of data you are talking about:
# dict keys must be tuples and not lists
[(i, j) for i in range(4) for j in range(3)]
# the same can come from itertools.product
from itertools import product
list(product(range(4), range(3)))