I have arrays like this:
['[camera_positive,3]', '[lens_positive,1]', '[camera_positive,2]', '[lens_positive,1]', '[lens_positive,1]', '[camera_positive,1]']
How do I sum all the values at index [1] that share the same string at index [0]?
Example:
camera_positive = 3 + 2 + 1 = 6
lens_positive = 1 + 1 + 1 = 3
You could use a set to extract the unique keys and then a generator expression inside sum() to compute the total for each key:
data = [['camera_positive', 3],
['lens_positive', 1],
['camera_positive', 2],
['lens_positive', 1],
['lens_positive', 1],
['camera_positive', 1]]
keys = set(key for key, value in data)
for key1 in keys:
    total = sum(value for key2, value in data if key1 == key2)
    print("key='{}', sum={}".format(key1, total))
This gives (the order of the keys may vary, since sets are unordered):
key='camera_positive', sum=6
key='lens_positive', sum=3
I'm assuming that you have a list of lists, not a list of strings as shown in the question; otherwise you'll have to do some parsing first. That said, I would solve this problem by creating a dictionary, then iterating over the pairs and adding each value to the dictionary as you go.
The defaultdict allows this program to work without raising a KeyError, because defaultdict(int) assumes 0 if the key does not exist yet. You can read up on defaultdict here: https://docs.python.org/3.3/library/collections.html#collections.defaultdict
lmk if that helps!
>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> d
defaultdict(<class 'int'>, {})
>>> lst=[['a',1], ['b', 2], ['a',4]]
>>> for k, v in lst:
...     d[k] += v
...
>>> d
defaultdict(<class 'int'>, {'a': 5, 'b': 2})
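If your data really is a list of strings like '[camera_positive,3]', as shown in the question, each string has to be parsed before summing. A minimal sketch, assuming every string has exactly the form [name,integer] with no spaces and no extra commas:

```python
from collections import defaultdict

raw = ['[camera_positive,3]', '[lens_positive,1]', '[camera_positive,2]',
       '[lens_positive,1]', '[lens_positive,1]', '[camera_positive,1]']

totals = defaultdict(int)
for item in raw:
    # strip the surrounding brackets, then split on the single comma
    name, count = item.strip('[]').split(',')
    totals[name] += int(count)

print(dict(totals))  # {'camera_positive': 6, 'lens_positive': 3}
```

If the real strings can contain spaces or nested commas, a stricter parser (for example a regular expression) would be needed.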
You could group the entries by their first index using groupby with lambda x: x[0] or operator.itemgetter(0) as key.
This is maybe a bit less code than what Nick Brady showed. However, you would need to sort the list by the same key first, so it is likely slower than his approach (an O(n log n) sort versus a single O(n) pass).
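A sketch of the groupby approach, using operator.itemgetter(0) as the key and sorting first, because groupby only merges consecutive items with equal keys:

```python
from itertools import groupby
from operator import itemgetter

data = [['camera_positive', 3], ['lens_positive', 1], ['camera_positive', 2],
        ['lens_positive', 1], ['lens_positive', 1], ['camera_positive', 1]]

key = itemgetter(0)
# sort by the name so that equal names are adjacent, then sum each group
totals = {name: sum(value for _, value in group)
          for name, group in groupby(sorted(data, key=key), key=key)}

print(totals)  # {'camera_positive': 6, 'lens_positive': 3}
```

Using lambda x: x[0] instead of itemgetter(0) gives the same result.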
I know how to write something simple and slow with a loop, but I need it to run very fast at large scale.
input:
lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]
desired output:
d = {1: ["txt1", "txt2"], 2: "txt3"}
Is there something built into Python that makes dict() extend a key instead of replacing it?
dict(list(zip(lst[0], lst[1])))  # gives {1: 'txt2', 2: 'txt3'}: 'txt2' replaces 'txt1'
One option is to use dict.setdefault:
out = {}
for k, v in zip(*lst):
    out.setdefault(k, []).append(v)
Output:
{1: ['txt1', 'txt2'], 2: ['txt3']}
If you want the bare element instead of a singleton list, one way is to add a condition that checks for it while you build the output dictionary:
out = {}
for k, v in zip(*lst):
    if k in out:
        if isinstance(out[k], list):
            out[k].append(v)
        else:
            out[k] = [out[k], v]
    else:
        out[k] = v
Or, if lst[0] is sorted (as it is in your sample), you could use itertools.groupby:
from itertools import groupby
out = {}
pos = 0
for k, v in groupby(lst[0]):
    length = len([*v])
    if length > 1:
        out[k] = lst[1][pos:pos+length]
    else:
        out[k] = lst[1][pos]
    pos += length
Output:
{1: ['txt1', 'txt2'], 2: 'txt3'}
But as @timgeb notes, this is probably not what you want, because afterwards you'll have to check the data type (list or not) every time you access this dictionary. That is an unnecessary problem you can avoid by making all values lists.
If you're dealing with large datasets it may be useful to add a pandas solution.
>>> import pandas as pd
>>> lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]
>>> s = pd.Series(lst[1], index=lst[0])
>>> s
1 txt1
1 txt2
2 txt3
>>> s.groupby(level=0).apply(list).to_dict()
{1: ['txt1', 'txt2'], 2: ['txt3']}
Note that this also produces lists for single elements (e.g. ['txt3']), which I highly recommend. Having both lists and strings as possible values will result in bugs, because both of those types are iterable. You'd need to remember to check the type each time you process a dict value.
You can use a defaultdict to group the strings by their corresponding key, then make a second pass through the list to extract the strings from singleton lists. Regardless of what you do, you'll need to access every element in both lists at least once, so some iteration structure is necessary (and even if you don't explicitly use iteration, whatever you use will almost definitely use iteration under the hood):
from collections import defaultdict
lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]
result = defaultdict(list)
for key, value in zip(lst[0], lst[1]):
    result[key].append(value)
for key in result:
    if len(result[key]) == 1:
        result[key] = result[key][0]
print(dict(result))  # Prints {1: ['txt1', 'txt2'], 2: 'txt3'}
I have my program's output as a Python dictionary, and I want a list of keys from dictn:
s = "cool_ice_wifi"
r = ["water_is_cool", "cold_ice_drink", "cool_wifi_speed"]
good_list = s.split("_")
dictn = {}
for i in range(len(r)):
    split_review = r[i].split("_")
    counter = 0
    for good_word in good_list:
        if good_word in split_review:
            counter = counter + 1
    d1 = {i: counter}
    dictn.update(d1)
print(dictn)
The conditions for ordering the keys are:
Keys with the same value keep their original (index) order in the output list.
Keys with the highest values come first, followed by the lower ones.
dictn = {0: 1, 1: 1, 2: 2}
Expected output = [2, 0, 1]
You can use a list comp (or just sorted() on its own, since it already returns a list):
[key for key in sorted(dictn, key=dictn.get, reverse=True)]
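The tie rule (equal values keep their original order) holds because sorted() is stable, and the Python documentation states that reverse=True preserves that stability. A quick check, where the scores dictionary is made-up example data:

```python
dictn = {0: 1, 1: 1, 2: 2}

# sorted() is stable, and reverse=True keeps that stability, so keys
# with equal values stay in their original (insertion) order
order = sorted(dictn, key=dictn.get, reverse=True)
print(order)  # [2, 0, 1]

# made-up data with ties at two different levels
scores = {0: 1, 1: 3, 2: 1, 3: 3, 4: 2}
tied = sorted(scores, key=scores.get, reverse=True)
print(tied)  # [1, 3, 4, 0, 2]
```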
In Python 3 you can use the built-in sorted() function to sort the dictionary's keys in any order you choose.
Check out the documentation; in the simplest case you pass the dictionary's .get method as the key function, while for more complex orderings you define a key function yourself.
Since Python 3.7, dictionaries preserve insertion order, so another option is to sort at the moment of dictionary creation, or you could use an OrderedDict.
Here's an example of the first option in action, which I think is the easiest:
>>> a = {}
>>> a[0] = 1
>>> a[1] = 1
>>> a[2] = 2
>>> print(a)
{0: 1, 1: 1, 2: 2}
>>>
>>> [(k) for k in sorted(a, key=a.get, reverse=True)]
[2, 0, 1]
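The other option, sorting at the moment of dictionary creation, could look like this. It is a sketch that relies on dicts preserving insertion order (Python 3.7+):

```python
dictn = {0: 1, 1: 1, 2: 2}

# build a new dict whose insertion order follows the values, highest first;
# sorted() is stable, so tied keys keep their original order
by_value = dict(sorted(dictn.items(), key=lambda kv: kv[1], reverse=True))

print(by_value)        # {2: 2, 0: 1, 1: 1}
print(list(by_value))  # [2, 0, 1]
```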
I have a list of data in which each line has two values:
a 12
a 11
a 5
a 12
a 11
I would like to use a dictionary so that I end up with a list of values for each key. Column 1 may have a different entry, like 'b', so I want to arrange the data with column 1 as the key and column 2 as the data for that key:
{'a': [12, 11, 5]}
How do I achieve this? From what I read, if two values have the same key, the last one overrides the previous one, so only one value per key is kept in the dictionary.
d = {}
for line in results:
    templist = line.split(' ')
    thekey = templist[0]
    thevalue = templist[1]
    if thekey in d:  # test the key; "thevalue in d" would never be true here
        d[thekey].append(thevalue)
    else:
        d[thekey] = [thevalue]
Am I approaching the problem using the wrong way?
Python dicts can have only one value for a key, so you cannot assign multiple values in the fashion you are trying to.
Instead, store the multiple values in a list, so that the list becomes the one value corresponding to the key:
d = {}
d["a"] = []
d["a"].append(1)
d["a"].append(2)
>>> print(d)
{'a': [1, 2]}
You can use a defaultdict to simplify this; it initialises a missing key with an empty list, as below:
from collections import defaultdict
d = defaultdict(list)
d["a"].append(1)
d["a"].append(2)
>>> print(d)
defaultdict(<class 'list'>, {'a': [1, 2]})
If you don't want repeat values for a key, you can use a set instead of list. But do note that a set is unordered.
from collections import defaultdict
d = defaultdict(set)
d["a"].add(1)
d["a"].add(2)
d["a"].add(1)
>>> print(d)
defaultdict(<class 'set'>, {'a': {1, 2}})
If you need to maintain order, either use sorted at runtime, or use a list with an if clause that checks for existing values:
from collections import defaultdict
d = defaultdict(list)
for item in (1, 2, 1, 2, 3, 4, 1, 2):
    if item not in d["a"]:
        d["a"].append(item)
>>> print(d)
defaultdict(<class 'list'>, {'a': [1, 2, 3, 4]})
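Applied to the data from the question, a sketch assuming results is a list of strings such as "a 12" and that the values should be stored as integers. The second step drops repeated values while keeping first-seen order, which matches the [a:12,11,5] output the question asks for:

```python
from collections import defaultdict

# assumed input format: each line is "<key> <value>", as in the question
results = ["a 12", "a 11", "a 5", "a 12", "a 11"]

grouped = defaultdict(list)
for line in results:
    key, value = line.split()
    grouped[key].append(int(value))

print(dict(grouped))  # {'a': [12, 11, 5, 12, 11]}

# dict keys are unique and keep insertion order (3.7+), so dict.fromkeys
# removes repeats without reordering
unique = {k: list(dict.fromkeys(v)) for k, v in grouped.items()}
print(unique)  # {'a': [12, 11, 5]}
```

Unlike the "if item not in list" check, dict.fromkeys avoids scanning the list on every insert.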
I am pretty new to all of this, so this might be a newbie question, but I am looking to find the length of dictionary values, and I do not know how this can be done.
So for example,
d = {'key':['hello', 'brave', 'morning', 'sunset', 'metaphysics']}
I was wondering whether there is a way to find the len, or number of items, of a dictionary value.
Thanks
Sure. In this case, you'd just do:
length_key = len(d['key']) # length of the list stored at `'key'` ...
It's hard to say why you actually want this, but, perhaps it would be useful to create another dict that maps the keys to the length of values:
length_dict = {key: len(value) for key, value in d.items()}
length_key = length_dict['key'] # length of the list stored at `'key'` ...
Let's do some experimentation to see how we can get and interpret the lengths of list values in a dict.
Create our test dict using list and dict comprehensions:
>>> my_dict = {x:[i for i in range(x)] for x in range(4)}
>>> my_dict
{0: [], 1: [0], 2: [0, 1], 3: [0, 1, 2]}
Get the length of the value of a specific key:
>>> my_dict[3]
[0, 1, 2]
>>> len(my_dict[3])
3
Get a dict of the lengths of the values of each key:
>>> key_to_value_lengths = {k:len(v) for k, v in my_dict.items()}
>>> key_to_value_lengths
{0: 0, 1: 1, 2: 2, 3: 3}
>>> key_to_value_lengths[2]
2
Get the sum of the lengths of all values in the dict:
>>> [len(x) for x in my_dict.values()]
[0, 1, 2, 3]
>>> sum([len(x) for x in my_dict.values()])
6
To find all of the lengths of the values in a dictionary you can do this:
lengths = [len(v) for v in d.values()]
A common use case I have is a dictionary of numpy arrays or lists where I know they're all the same length, and I just need to know one of them (e.g. I'm plotting timeseries data and each timeseries has the same number of timesteps). I often use this:
length = len(next(iter(d.values())))
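Note that next(iter(d.values())) raises StopIteration on an empty dict and simply trusts that all lengths match. If you want that assumption checked, a small helper could look like this (common_length is a hypothetical name, not a library function):

```python
def common_length(d):
    """Return the length shared by every value in d (None for an empty dict).

    Raises ValueError if the values do not all have the same length.
    """
    lengths = {len(v) for v in d.values()}
    if not lengths:
        return None
    if len(lengths) > 1:
        raise ValueError(f"values have different lengths: {sorted(lengths)}")
    return lengths.pop()


# hypothetical timeseries data: every series has three timesteps
series = {"t": [0, 1, 2], "x": [5.0, 6.0, 7.0]}
print(common_length(series))  # 3
```

The trade-off is that this touches every value, whereas the one-liner above only looks at the first.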
Let the dictionary be:
d = {'key': ['value1', 'value2']}
If you know the key:
print(len(d['key']))
Otherwise:
val = [len(i) for i in d.values()]
print(val[0])
# prints the length of the first key's value, which is the length of every value if all keys hold the same number of items
d = {1: 'a', 2: 'b'}
count = 0
for i in range(0, len(d), 1):
    count = count + 1
print(count)  # counts the entries, so this is the same as len(d)
Output: 2
This seems like such an obvious thing that I feel like I'm missing out on something, but how do you find out if two different keys in the same dictionary have the exact same value? For example, if you have the dictionary test with the keys a, b, and c and the keys a and b both have the value of 10, how would you figure that out? (For the point of the question, please assume a large number of keys, say 100, and you have no knowledge of how many duplicates there are, if there are multiple sets of duplicates, or if there are duplicates at all). Thanks.
len(dictionary.values()) == len(set(dictionary.values()))
This is under the assumption that the only thing you want to know is if there are any duplicate values, not which values are duplicates, which is what I assumed from your question. Let me know if I misinterpreted the question.
Basically, this just checks whether any entries were removed when the dictionary's values were converted to a set, which by definition cannot contain duplicates.
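Wrapped in a small function (has_duplicate_values is a hypothetical name), the check reads as follows. It requires the values to be hashable, since they go into a set:

```python
def has_duplicate_values(d):
    # len(d) equals the number of values; the set keeps one copy of each
    # distinct value, so it is smaller exactly when some value repeats
    return len(d) != len(set(d.values()))


test = {"a": 10, "b": 10, "c": 20}
print(has_duplicate_values(test))              # True
print(has_duplicate_values({"a": 1, "b": 2}))  # False
```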
If the above doesn't work for your purposes, this should be a better solution:
set(k for k, v in d.items() if list(d.values()).count(v) > 1)
The second version collects the keys whose value appears more than once among the dictionary's values. It is O(n²), because count() scans all the values again for every key.
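A linear-time variant of the second version, swapping in collections.Counter so that each value is counted only once (this still assumes hashable values):

```python
from collections import Counter

d = {"a": 10, "b": 15, "c": 10}

counts = Counter(d.values())  # value -> number of keys holding it
dup_keys = {k for k, v in d.items() if counts[v] > 1}

print(sorted(dup_keys))  # ['a', 'c']
```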
To detect all of these cases:
>>> import collections
>>> d = {"a": 10, "b": 15, "c": 10}
>>> value_to_key = collections.defaultdict(list)
>>> for k, v in d.items():
...     value_to_key[v].append(k)
...
>>> value_to_key
defaultdict(<class 'list'>, {10: ['a', 'c'], 15: ['b']})
@hivert makes the excellent point that this only works if the values are hashable. If they are not, there is no nice O(n) solution (sadly). This is the best I can come up with:
d = {"a": [10, 15], "b": [10, 20], "c": [10, 15]}
values = []
for k, v in d.items():
    must_insert = True
    for val in values:
        if val[0] == v:
            val[1].append(k)
            must_insert = False
            break
    if must_insert:
        values.append([v, [k]])
print([v for v in values if len(v[1]) > 1])  # prints [[[10, 15], ['a', 'c']]]
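When the unhashable values are lists whose items are themselves hashable, one workaround is to convert each list to a tuple and reuse the O(n) reverse-index idea. A sketch under that assumption:

```python
from collections import defaultdict

d = {"a": [10, 15], "b": [10, 20], "c": [10, 15]}

# assumption: each value is a list of hashable items, so tuple(v) is hashable
by_value = defaultdict(list)
for k, v in d.items():
    by_value[tuple(v)].append(k)

dups = {value: keys for value, keys in by_value.items() if len(keys) > 1}
print(dups)  # {(10, 15): ['a', 'c']}
```

Nested lists would need a recursive conversion, and values such as dicts would need a different canonical form.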
You can tell which are the duplicate values by means of a reverse index - where the key is the duplicate value and the value is the set of keys that have that value (this will work as long as the values in the input dictionary are hashable):
from collections import defaultdict
d = {'w':20, 'x':10, 'y':20, 'z':30, 'a':10}
dd = defaultdict(set)
for k, v in d.items():
    dd[v].add(k)
dd = { k : v for k, v in dd.items() if len(v) > 1 }
dd
=> {10: set(['a', 'x']), 20: set(['y', 'w'])}
From that last result it's easy to obtain the set of keys with duplicate values:
set.union(*dd.values())
=> set(['y', 'x', 'a', 'w'])
dico = {'a':0, 'b':0, 'c':1}
result = {}
for val in dico:
    if dico[val] in result:
        result[dico[val]].append(val)
    else:
        result[dico[val]] = [val]
>>> result
{0: ['a', 'b'], 1: ['c']}
Then you can filter for the keys of result whose value (a list) has more than one element, i.e. where a duplicate has been found.
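A sketch of that filtering step. It rebuilds result with dict.setdefault (a swap for the if/else above, same outcome) and keeps only the values held by more than one key:

```python
dico = {'a': 0, 'b': 0, 'c': 1}

# value -> list of keys holding that value
result = {}
for key, val in dico.items():
    result.setdefault(val, []).append(key)

# keep only the values that more than one key shares
duplicates = {val: keys for val, keys in result.items() if len(keys) > 1}
print(duplicates)  # {0: ['a', 'b']}
```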
Build another dict mapping the values of the first dict to all keys that hold that value:
import collections
inverse_dict = collections.defaultdict(list)
for key in original_dict:
    inverse_dict[original_dict[key]].append(key)
keys = set()
for key1 in d:
    for key2 in d:
        if key1 == key2: continue
        if d[key1] == d[key2]:
            keys |= {key1, key2}
That is Θ(n²). The reason is that a dict does not provide Θ(1) lookup of a key given its value, so rethink your data structure choices if that's not good enough.
You can use a list in conjunction with the dictionary to find duplicate elements!
Here is some simple code demonstrating this:
d={"val1":4,"val2":4,"val3":5,"val4":3}
l = []
for key in d:
    l.append(d[key])
l.sort()
print(l)
for i in range(len(l) - 1):  # stop one short so that l[i + 1] stays in range
    if l[i] == l[i + 1]:
        print("true, there are duplicate elements.")
        print("the keys having duplicate elements are: ")
        for key in d:
            if d[key] == l[i]:
                print(key)
        break
output:
[3, 4, 4, 5]
true, there are duplicate elements.
the keys having duplicate elements are:
val1
val2
When you sort the list, equal values always end up next to each other!
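The same sort-then-compare-neighbours idea can report every group of duplicates, not just the first, by sorting the (key, value) pairs and grouping them with itertools.groupby. A sketch; it only needs the values to be comparable with each other, not hashable:

```python
from itertools import groupby

d = {"val1": 4, "val2": 4, "val3": 5, "val4": 3}


def by_value(item):
    return item[1]


# after sorting, equal values are adjacent, so groupby collects them together
pairs = sorted(d.items(), key=by_value)
dups = []
for value, group in groupby(pairs, key=by_value):
    keys = [k for k, _ in group]
    if len(keys) > 1:
        dups.append((value, keys))

print(dups)  # [(4, ['val1', 'val2'])]
```

Because dups is a list of pairs rather than a dict, the same code also works for list values such as [10, 15].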