I have a dictionary of the form:
{"level": [1, 2, 3],
"conf": [-1, 1, 2],
"text": ["here", "hel", "llo"]}
I want to filter the lists so that the item at index i is removed from every list whenever the value at index i of "conf" is not > 0.
So for the above dict, the output should be this:
{"level": [2, 3],
"conf": [1, 2],
"text": ["hel", "llo"]}
This is because the first value of conf is not > 0.
I have tried something like this:
new_dict = {i: [a for a in j if a >= min_conf] for i, j in my_dict.items()}
But that would work just for one key.
Try this:
from operator import itemgetter
def filter_dictionary(d):
    positive_indices = [i for i, item in enumerate(d['conf']) if item > 0]
    f = itemgetter(*positive_indices)
    return {k: list(f(v)) for k, v in d.items()}
d = {"level": [1, 2, 3], "conf": [-1, 1, 2], "text": ["-1", "hel", "llo"]}
print(filter_dictionary(d))
output:
{'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}
I first find which indices of 'conf' hold positive values, then use itemgetter to pick those indices from each list in the dictionary.
A more compact version that avoids the temporary list by using a generator expression instead:
def filter_dictionary(d):
    f = itemgetter(*(i for i, item in enumerate(d['conf']) if item > 0))
    return {k: list(f(v)) for k, v in d.items()}
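One caveat with the itemgetter approach is worth checking against your data: when exactly one index survives, itemgetter returns the bare item rather than a tuple, and when none survive, calling itemgetter() with no arguments raises a TypeError. A minimal sketch that sidesteps both cases builds a boolean mask and applies it with itertools.compress. The function name filter_with_mask and its key parameter are illustrative assumptions, not part of the answer above.

```python
from itertools import compress

def filter_with_mask(d, key="conf"):
    # True where the controlling list holds a positive value
    mask = [value > 0 for value in d[key]]
    # compress() keeps only the items whose mask entry is truthy
    return {k: list(compress(v, mask)) for k, v in d.items()}

d = {"level": [1, 2, 3], "conf": [-1, 1, 2], "text": ["here", "hel", "llo"]}
print(filter_with_mask(d))
# {'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}
```

Because the mask is an ordinary list, the same code also handles inputs where one or zero items remain.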
Here's a one-liner:
dct = {k: [x for i, x in enumerate(v) if d['conf'][i] > 0] for k, v in d.items()}
Output:
>>> dct
{'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}
With sample data:
d = {"level":[1,2,3], "conf":[-1,1,2], "text":["here","hel","llo"]}
I would keep the indexes of valid elements (those greater than 0) with:
kept_keys = [i for i in range(len(my_dict['conf'])) if my_dict['conf'][i] > 0]
And then you can filter each list checking if the index of a certain element in the list is contained in kept_keys:
{k: list(map(lambda x: x[1], filter(lambda x: x[0] in kept_keys, enumerate(my_dict[k])))) for k in my_dict}
Output:
{'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}
The structure of the data you're describing sounds like it might be more naturally modelled as a pandas DataFrame: you are essentially viewing your data as a 2-D grid, and you want to filter out rows of that grid based on the value in one column.
The following snippet will do what you need using a DataFrame as an intermediate representation:
import pandas as pd
data = {"level":[1,2,3], "conf":[-1,1,2], "text":["here","hel","llo"]}
df = pd.DataFrame(data)
df = df.loc[df["conf"] > 0]
result = df.to_dict(orient="list")
Output:
{'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}
However, note that if you represent your data as a DataFrame in the first place, and keep it in that form when you're done, this simplifies to:
data = pd.DataFrame({
    "level":[1,2,3],
    "conf":[-1,1,2],
    "text":["here","hel","llo"],
})
result = data.loc[data["conf"] > 0]
Output:
   level  conf text
1      2     1  hel
2      3     2  llo
This is terser, more expressive, and (on large inputs) typically faster than a "pure dict" solution.
If the other operations you want to do on this data are similar (in the sense of really being '2D array' operations), it is likely that they will also be more naturally expressed in terms of DataFrames, and so keeping your data as a DataFrame is likely to be advantageous vs converting back to a dict.
I solved it with this:
from typing import Dict, List, Any, Set
d = {"level":[1,2,3], "conf":[-1,1,2], "text":["-1", "hel", "llo"]}
# First, we create a set that stores the indices which should be kept.
# I chose a set instead of a list because it has a O(1) lookup time.
# We only want to keep the items on indices where the value in d["conf"] is greater than 0
filtered_indexes = {i for i, value in enumerate(d.get('conf', [])) if value > 0}
def filter_dictionary(d: Dict[str, List[Any]], filtered_indexes: Set[int]) -> Dict[str, List[Any]]:
    filtered_dictionary = d.copy() # We'll return a modified copy of the original dictionary
    for key, list_values in d.items():
        # In the next line the actual filtering for each key/value pair takes place.
        # The original lists get overwritten with the filtered lists.
        filtered_dictionary[key] = [value for i, value in enumerate(list_values) if i in filtered_indexes]
    return filtered_dictionary
print(filter_dictionary(d, filtered_indexes))
Output:
{'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}
You can have a function which works out which indexes to keep and reformulate each list with only those indexes:
my_dict = {"level":[1,2,3], "conf":[-1,1,2],'text':["-1","hel","llo"]}
def remove_corresponding_items(d, key):
    keep_indexes = [idx for idx, value in enumerate(d[key]) if value>0]
    for key, lst in d.items():
        d[key] = [lst[idx] for idx in keep_indexes]
remove_corresponding_items(my_dict, 'conf')
print(my_dict)
Output as requested
Here's a numpy way of doing it:
import numpy as np
dct = {"level":[1,2,3], "conf":[-1,1,2], "text":["here","hel","llo"]}
# convert every list to an array so it can be indexed with a boolean mask
a = {k: np.array(v) for k, v in dct.items()}
dct = {k: v[a['conf'] > 0].tolist() for k, v in a.items()}
Output:
>>> dct
{'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}
Lots of good answers. Here's another 2-pass approach:
mydict = {"level": [1, 2, 3], "conf": [-1, 1, 2], 'text': ["-1", "hel", "llo"]}
for i, v in enumerate(mydict['conf']):
    if v <= 0:
        for key in mydict.keys():
            mydict[key][i] = None
for key in mydict.keys():
    mydict[key] = [v for v in mydict[key] if v is not None]
print(mydict)
Output:
{'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}
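Note that marking removed slots with None also drops any None that was legitimately in the data, which can leave the lists with different lengths. A hedged sketch of the same two-pass idea with a private sentinel object instead; the names _REMOVED and drop_non_positive are made up for illustration.

```python
_REMOVED = object()  # unique marker that cannot collide with real data

def drop_non_positive(d, key="conf"):
    # Pass 1: mark every position whose controlling value is not positive
    for i, value in enumerate(d[key]):
        if value <= 0:
            for lst in d.values():
                lst[i] = _REMOVED
    # Pass 2: rebuild each list without the marked positions
    for k in d:
        d[k] = [v for v in d[k] if v is not _REMOVED]
    return d

data = {"level": [1, 2, 3], "conf": [-1, 1, 2], "text": [None, None, "llo"]}
print(drop_non_positive(data))
# {'level': [2, 3], 'conf': [1, 2], 'text': [None, 'llo']}
```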
a = {"level":[1,2,3,4], "conf": [-1,1,2,-1],"text": ["-1","hel","llo","test"]}
# inefficient solution
# for k, v in a.items():
# if k == "conf":
# start_search = 0
# to_delete = [] #it will store the index numbers of the conf that you want to delete(conf<0)
# for element in v:
# if element < 0:
# to_delete.append(v.index(element,start_search))
# start_search = v.index(element) + 1
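# more efficient and elegant solution
# (<= 0 so that zeros are removed too, since only conf > 0 should be kept)
to_delete = [i for i, element in enumerate(a["conf"]) if element <= 0]
# pop from the end first so the earlier positions stay valid
for position in reversed(to_delete):
    for k, v in a.items():
        v.pop(position)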
and the result will be
>>> a
{'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}
Try this, simple and easy to understand, especially for beginners:
a_dict = {"level": [1, 2, 3, 4, 5, 8], "conf": [-1, 1, -1, -2], "text": ["-1", "hel", "llo", "ai", 0, 9]}
# iterate backwards over the list keeping the indexes
for index, item in reversed(list(enumerate(a_dict["conf"]))):
    if item <= 0:
        for lists in a_dict.values():
            del lists[index]
print(a_dict)
Output:
{'level': [2, 5, 8], 'conf': [1], 'text': ['hel', 0, 9]}
I believe this will work:
For each list other than conf, we keep only the values at positions where conf is positive, and after that we filter conf itself.
d = {"level":[1,2,3], "conf":[-1,1,2], "text":["-1","hel","llo"]}
for key in d:
    if key != "conf":
        d[key] = [d[key][i] for i in range(len(d[key])) if d["conf"][i] > 0]
d["conf"] = [i for i in d["conf"] if i > 0]
print(d)
A simpler solution is exactly the same but uses a single dict comprehension, so we don't need to handle conf separately from the rest:
d = {"level":[1,2,3], "conf":[-1,1,2], "text":["-1","hel","llo"]}
d = {i: [d[i][j] for j in range(len(d[i])) if d["conf"][j] > 0] for i in d}
Output:
{'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}
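The answers above all filter column by column. Since the lists really form rows of a table, a row-wise sketch is also possible: transpose with zip, keep the rows whose conf entry is positive, and transpose back. The function name filter_rows and its key parameter are illustrative assumptions, and the sketch assumes all lists have the same length.

```python
def filter_rows(d, key="conf"):
    keys = list(d)
    pos = keys.index(key)  # column position of the controlling key
    # zip(*values) turns the columns into rows: (level, conf, text)
    rows = [row for row in zip(*d.values()) if row[pos] > 0]
    # transpose back; fall back to empty columns when no row survives
    columns = zip(*rows) if rows else [[] for _ in keys]
    return {k: list(col) for k, col in zip(keys, columns)}

d = {"level": [1, 2, 3], "conf": [-1, 1, 2], "text": ["here", "hel", "llo"]}
print(filter_rows(d))
# {'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}
```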
I am trying to implement a simple task. I have a dictionary with keys (ti, wi)
y={('t1', 'w1'): 1, ('t2', 'w1'): 2, ('t3', 'w1'): 3, ('t1', 'w2'): 4, ('t2', 'w2'): 5, ('t3', 'w2'): 6}
I want to create a new dictionary where keys will be wi, and value is a list of all ti. So I want to have an output dictionary like:
{'w1': [1, 2, 3], 'w2': [4, 5, 6]}
I wrote the following code:
y={('t1', 'w1'): 1, ('t2', 'w1'): 2, ('t3', 'w1'): 3, ('t1', 'w2'): 4, ('t2', 'w2'): 5, ('t3', 'w2'): 6}
y_w={}
y_t=[]
for w in range(1,3):
    y_t.clear()
    for t in range(1,4):
        print('t= ', t, 'w= ', w, 'y=' , y['t{0}'.format(t), 'w{0}'.format(w)])
        y_t.append(y['t{0}'.format(t), 'w{0}'.format(w)])
    print(y_t)
    y_w['w{0}'.format(w)]=y_t
print(y_w)
But the result I am getting is
{'w1': [4, 5, 6], 'w2': [4, 5, 6]}
I cannot understand where the first list disappeared to. Can someone help explain where I went wrong? Is there a nicer way to do it, maybe without for loops?
Your problem lies in the assumption that setting the value in the dictionary somehow freezes the list.
It's no accident the lists have the same values: they are identical, two references to the same list object. Observe:
>>> a_dict = {}
>>> a_list = []
>>> a_list.append(23)
>>> a_dict["a"] = a_list
>>> a_list.clear()
>>> a_list.append(42)
>>> a_dict["b"] = a_list
>>> a_dict
{'a': [42], 'b': [42]}
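To make the aliasing visible, here is a small sketch that stores both the shared list and a snapshot copy of it; the variable names are illustrative only. The is operator confirms that both dictionary values are the very same object.

```python
shared = []
aliases = {}   # will hold references to the one shared list
copies = {}    # will hold independent snapshots

for name, values in (("a", [23]), ("b", [42])):
    shared.clear()
    shared.extend(values)
    aliases[name] = shared        # stores a reference, not a copy
    copies[name] = list(shared)   # list() makes a new, separate list

print(aliases)                        # {'a': [42], 'b': [42]}
print(copies)                         # {'a': [23], 'b': [42]}
print(aliases["a"] is aliases["b"])   # True
```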
You could fix your solution by replacing y_t.clear() with y_t = [], which does create a new list:
y = {('t1', 'w1'): 1, ('t2', 'w1'): 2, ('t3', 'w1'): 3, ('t1', 'w2'): 4, ('t2', 'w2'): 5, ('t3', 'w2'): 6}
y_w = {}
for w in range(1,3):
    y_t = []
    for t in range(1,4):
        print('t= ', t, 'w= ', w, 'y=' , y['t{0}'.format(t), 'w{0}'.format(w)])
        y_t.append(y['t{0}'.format(t), 'w{0}'.format(w)])
    print(y_t)
    y_w['w{0}'.format(w)]=y_t
print(y_w)
But there are, as you suspect, easier ways of doing this, for example the defaultdict solution shown by Riccardo Bucco.
Try this:
from collections import defaultdict
d = defaultdict(list)
for k, v in y.items():
    d[k[1]].append(v)
d = dict(d)
The call to y_t.clear() is causing the problem; if you replace it with y_t = [], it will work as you expect.
You could first find all unique keys:
unique_keys = set(list(zip(*y))[1])
and then create the dict with list-values using those:
{u: [v for k, v in y.items() if k[1] == u] for u in unique_keys}
According to your output here's what you can try:
y = {('t1', 'w1'): 1, ('t2', 'w1'): 2, ('t3', 'w1'): 3, ('t1', 'w2'): 4, ('t2', 'w2'): 5, ('t3', 'w2'): 6}
def new_dict_with_keys(dictionary):
    new_dictionary = dict()
    # Go through the dictionary keys to read each key's value
    for tuple_key in dictionary:
        if "w1" in tuple_key or "w2" in tuple_key:
            # Determine which key to use
            if "w1" in tuple_key:
                key = "w1"
            else:
                key = "w2"
            # Check if the new dictionary has "w1" or "w2" as an item
            # If it does not, create a new list
            if new_dictionary.get(key) is None:
                new_dictionary[key] = list()
            # Append the value in the respective key
            new_dictionary[key].append(dictionary[tuple_key])
    # Return the dictionary with the items
    return new_dictionary
print(new_dict_with_keys(y))
# Prints: {'w1': [1, 2, 3], 'w2': [4, 5, 6]}
Here's a solution using itertools.groupby:
import itertools as it
from operator import itemgetter
items = sorted((k, v) for (_, k), v in y.items())
groups = it.groupby(items, key=itemgetter(0))
result = {k: [v for _, v in vs] for k, vs in groups}
# {'w1': [1, 2, 3], 'w2': [4, 5, 6]}
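The sorted() call in the groupby solution matters: groupby only merges consecutive items with equal keys. A short sketch with made-up sample pairs shows the difference.

```python
from itertools import groupby
from operator import itemgetter

pairs = [("w1", 1), ("w2", 4), ("w1", 2)]  # hypothetical, deliberately unsorted

def collect(items):
    # materialise each group, since groupby yields lazy iterators
    return [(k, [v for _, v in g]) for k, g in groupby(items, key=itemgetter(0))]

unsorted_groups = collect(pairs)
sorted_groups = collect(sorted(pairs, key=itemgetter(0)))

print(unsorted_groups)  # [('w1', [1]), ('w2', [4]), ('w1', [2])]
print(sorted_groups)    # [('w1', [1, 2]), ('w2', [4])]
```

Without sorting, 'w1' appears twice, and building a dict from those groups would silently keep only the last one.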
I am working on a function that excludes all zeros from a list and returns the remaining values, together with their indices, in a dictionary. For example:
for a list input:
x = [0,0,1,2,3,0,0,]
output:
{"inds":[2,3,4],"vals":[1,2,3]}
My current solution is very ugly:
def function(x):
    b = list()
    c = list()
    d = {'inds': [], 'vals': []}
    a = list(enumerate(x))
    for i in a:
        if i[1]!=0:
            b.append(i[1])
            c.append(i[0])
    d["inds"] = c
    d["vals"] = b
    return d
I am looking for a concise solution.
You're basically there, you have the concept in mind. There are just a few ways to clean up your code.
There's no need to create lists b and c, when you can simply append the new data into the dictionary:
x = [0, 0, 1, 2, 3, 0, 0]
d = {'inds': [], 'vals': []}
for i, j in enumerate(x):
    if j != 0:
        d['inds'].append(i)
        d['vals'].append(j)
print(d)
# Prints: {'inds': [2, 3, 4], 'vals': [1, 2, 3]}
There's also no need to call list() around enumerate(). I'm going to assume you use Python 3 here and that when you do enumerate(), you see something like:
<enumerate object at 0x102c579b0>
This is ok! This is because enumerate returns a special object of its own which is iterable just like a list, so you can simply loop through a. Also, since the list will have two values per item, you can do for i, j like I have.
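A brief sketch of that behaviour: enumerate produces (index, value) pairs lazily, and each pair can be unpacked directly in a loop or comprehension. The variable names here are illustrative.

```python
x = [0, 0, 1, 2, 3, 0, 0]

pairs = enumerate(x)   # an iterator, not a list
first = next(pairs)    # pulls the first (index, value) pair
print(first)           # (0, 0)

# A fresh enumerate, unpacked straight into index and value
inds = [i for i, v in enumerate(x) if v != 0]
vals = [v for i, v in enumerate(x) if v != 0]
print({"inds": inds, "vals": vals})
# {'inds': [2, 3, 4], 'vals': [1, 2, 3]}
```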
idx, vals = zip(*[[n, v] for n, v in enumerate(x) if v])
# zip returns tuples, so convert them to lists
d = {"inds": list(idx), "vals": list(vals)}
>>> d
{'inds': [2, 3, 4], 'vals': [1, 2, 3]}
Your solution is ok, but you have some superfluous lines.
def function(x):
    d = {'inds': [], 'vals': []}
    for index, value in enumerate(x):
        if value != 0:
            d['inds'].append(index)
            d['vals'].append(value)
    return d
If performance is an issue for very long arrays, you could also use numpy:
import numpy as np
def function(x):
    x_arr = np.array(x)
    mask = x_arr != 0
    indices = np.argwhere(mask)[:,0]
    values = x_arr[mask]
    # tolist() converts numpy scalars back to plain Python numbers
    return {'inds': indices.tolist(), 'vals': values.tolist()}
You can also do it like this:
d = dict((i, v) for i, v in enumerate(x) if v)
d = {'inds': list(d.keys()), 'vals': list(d.values())}
EDIT:
If order matters, then like this (thanks to comments):
import collections
d = collections.OrderedDict((i, v) for i, v in enumerate(x) if v)
d = {'inds': list(d.keys()), 'vals': list(d.values())}
It can be done with this ugly functional one-liner:
{'inds': list(map(lambda p: p[0], filter(lambda p: p[1] != 0, enumerate(x)))), 'vals': list(filter(lambda v: v != 0, x))}
output:
{'inds': [2, 3, 4], 'vals': [1, 2, 3]}
This gives you inds:
list(map(lambda p: p[0], filter(lambda p: p[1] != 0, enumerate(x))))
This gives you vals:
list(filter(lambda v: v != 0, x))
Imagine the following dicts:
a = {'key1': {'subkey1': [1, 2, 3]}}
b = {'key1': {'subkey2': [1, 2, 3]}}
I'd like to merge them to get
c = {'key1': {'subkey1': [1, 2, 3],
'subkey2': [1, 2, 3]}}
Extra nice would be a solution that returns deep-copies from a and b which I can alter without altering a or b.
c = {**a, **b}
looks nice but seems to be the same as c = copy(a).update(b) which returns same as b in my case because key1 gets overwritten by the update.
You can of course do this by hand like this (found in another answer):
import collections.abc
def combine_dict(map1: dict, map2: dict):
    def update(d: dict, u: dict):
        for k, v in u.items():
            if isinstance(v, collections.abc.Mapping):
                r = update(d.get(k, {}), v)
                d[k] = r
            else:
                d[k] = u[k]
        return d
    _result = {}
    update(_result, map1)
    update(_result, map2)
    return _result
But we have Python 3.5 now - maybe things have changed?
You need recursion to accomplish this. Luckily milanboers on GitHub saved us from hours of work and possible brain damage.
def deep_merge(dict1: dict, dict2: dict) -> dict:
    """ Merges two dicts. If keys are conflicting, dict2 is preferred. """
    def _val(v1, v2):
        if isinstance(v1, dict) and isinstance(v2, dict):
            return deep_merge(v1, v2)
        return v2 or v1
    return {k: _val(dict1.get(k), dict2.get(k)) for k in dict1.keys() | dict2.keys()}
a = {'key1': {'subkey1': [1, 2, 3]}}
b = {'key1': {'subkey2': [1, 2, 3]}}
a = deep_merge(a, b)
print(a)
Results in:
{'key1': {'subkey2': [1, 2, 3], 'subkey1': [1, 2, 3]}}
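Two things to be aware of with deep_merge: the v2 or v1 expression falls back to v1 whenever v2 is falsy (0, an empty list, an empty string), and the leaf lists in the result are still the same objects as in a and b. A hedged sketch of a variant that addresses both by using copy.deepcopy follows; the name deep_merge_copy is an assumption for illustration, not part of the code quoted above.

```python
import copy

def deep_merge_copy(dict1, dict2):
    # Start from an independent deep copy so dict1 is never touched
    result = copy.deepcopy(dict1)
    for key, value in dict2.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are dicts: merge them recursively
            result[key] = deep_merge_copy(result[key], value)
        else:
            # dict2 wins on conflict, even when its value is falsy
            result[key] = copy.deepcopy(value)
    return result

a = {'key1': {'subkey1': [1, 2, 3]}}
b = {'key1': {'subkey2': [1, 2, 3]}}
c = deep_merge_copy(a, b)
c['key1']['subkey1'].append(4)   # changes c only
print(a)  # {'key1': {'subkey1': [1, 2, 3]}}
print(c)  # {'key1': {'subkey1': [1, 2, 3, 4], 'subkey2': [1, 2, 3]}}
```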
I would like to update a dictionary's items in a for loop. Here is what I have:
>>> d = {}
>>> for i in range(0,5):
...     d.update({"result": i})
>>> d
{'result': 4}
But I want d to have following items:
{'result': 0,'result': 1,'result': 2,'result': 3,'result': 4}
As mentioned, the whole idea of dictionaries is that they have unique keys.
What you can do is have 'result' as the key and a list as the value, then keep appending to the list.
>>> d = {}
>>> for i in range(0,5):
...     d.setdefault('result', [])
...     d['result'].append(i)
>>> d
{'result': [0, 1, 2, 3, 4]}
Keys have to be unique in a dictionary, so what you are trying to achieve is not possible. When you assign another item with the same key, you simply overwrite the previous entry, hence the result you see.
Maybe this would be useful to you?
>>> d = {}
>>> for i in range(3):
...     d['result_' + str(i)] = i
>>> d
{'result_0': 0, 'result_1': 1, 'result_2': 2}
You can modify this to fit your needs.
In a dictionary, the same key can't appear more than once, as it does in your example:
{'result': 0,'result': 1,'result': 2,'result': 3,'result': 4}
You can use a list of multiple dicts instead:
[{},{},{},{}]
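A minimal sketch of that list-of-dicts idea, assuming each loop iteration should produce its own single-key dict; the variable names are illustrative.

```python
# One small dict per iteration, collected in a list
results = [{"result": i} for i in range(5)]
print(results)
# [{'result': 0}, {'result': 1}, {'result': 2}, {'result': 3}, {'result': 4}]

# The individual values are still easy to get back out
values = [item["result"] for item in results]
print(values)  # [0, 1, 2, 3, 4]
```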
You can't have different values for the same key in your dictionary. One option would be to number the result:
d = {}
for i in range(0,5):
    result = 'result' + str(i)
    d[result] = i
print(d)
# {'result0': 0, 'result1': 1, 'result2': 2, 'result3': 3, 'result4': 4}
d = {"key1": [8, 22, 38], "key2": [7, 3, 12], "key3": [5, 6, 71]}
print(d)
for key, value in d.items():
    value_new = [sum(value)]
    d.update({key: value_new})
print(d)
>>> d = {"result": []}
>>> for i in range(0,5):
...     d["result"].append(i)
...
>>> d
{'result': [0, 1, 2, 3, 4]}