Python object of arrays count number of similar array value occurrences - python

Apologies for the wording of this question. I have the list below, which contains sub-objects (dictionaries) that in turn contain key/value pairs.
l = [{'melissa': ["power"]}, {'Linda': ["power", "a"]}, {'Rachel': ["power", "document"]}]
This is my current solution, which counts the number of occurrences of each string across the objects' lists, as expected:
cnt = {}
for i in l:
    for x in i.values():
        for j in x:
            if j not in cnt:
                cnt[j] = 1
            else:
                cnt[j] += 1
final = list(map(list, cnt.items()))
print(final)
Output:
final = [['power', 3], ['a', 1], ['document', 1]]
Is there a better/more succinct method of doing this?
I would still like the output to be a list of sub-lists.
Thanks

Use collections.Counter:
from collections import Counter
l = [
    {"melissa": ["power"]},
    {"Linda": ["power", "a"]},
    {"Rachel": ["power", "document"]},
]
out = Counter(i for d in l for lst in d.values() for i in lst)
out = list(map(list, out.items()))
print(out)
Prints:
[['power', 3], ['a', 1], ['document', 1]]
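The generator expression above flattens every value list before counting. An equivalent sketch, assuming the same sample data, does the flattening with itertools.chain.from_iterable instead; the names value_lists and counts are only illustrative and not part of the original answer:

```python
from collections import Counter
from itertools import chain

l = [
    {"melissa": ["power"]},
    {"Linda": ["power", "a"]},
    {"Rachel": ["power", "document"]},
]

# Collect every value list from every dict into one stream of lists.
value_lists = chain.from_iterable(d.values() for d in l)

# Flatten the lists into single strings and count them.
counts = Counter(chain.from_iterable(value_lists))

# Convert the (string, count) pairs to lists, as requested in the question.
out = [[word, n] for word, n in counts.items()]
print(out)
```

Whether this reads better than the nested generator expression is a matter of taste; both make a single pass over the data.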

Related

Python: Adding integer elements of a nested list to a list

So, I have two lists whose integer elements need to be added.
nested_lst_1 = [[6],[7],[8,9]]
lst = [1,2,3]
I need to add them so that every element in each sublist of the nested list is added to its corresponding integer in 'lst', giving another nested list.
nested_list_2 = [[6 + 1],[7 + 2],[8 + 3,9 + 3]]
or
nested_list_2 = [[7],[9],[11,12]]
Then, I need to use the integers from nested_list_1 and nested_list_2 as indices to extract a substring from a string.
nested_list_1 = [[6],[7],[8,9]] *obtained above*
nested_list_2 = [[7],[9],[11,12]] *obtained above*
string = 'AGTCATCGTACGATCATCGAAGCTAGCAGCATGAC'
string[6:7] = 'CG'
string[7:9] = 'GTA'
string[8:11] = 'TACG'
string[9:12] = 'ACGA'
Then, I need to create a nested list of the substrings obtained:
nested_list_substrings = [['CG'],['GTA'],['TACG','ACGA']]
Finally, I need to use these substrings as the values in a dictionary whose keys are also strings.
keys = ['GG', 'GTT', 'TCGG']
nested_list_substrings = [['CG'],['GTA'],['TACG','ACGA']]
DNA_mutDNA = {'GG':['CG'], 'GTT':['GTA'], 'TCGG':['TACG','ACGA']}
I understand that this is a multi-step problem, but if you could assist in any way, I would really appreciate it.
Assuming you don't need the intermediate variables, you can do all this with a dictionary comprehension:
a = [[6],[7],[8,9]]
b = [1,2,3]
keys = ['GG', 'GTT', 'TCGG']
s = 'AGTCATCGTACGATCATCGAAGCTAGCAGCATGAC'
DNA_mutDNA = {k: [s[start:start+length+1] for start in starts]
              for k, starts, length in zip(keys, a, b)}
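The +1 in the slice is there because Python slices stop before the end index, while the substrings listed in the question include the character at the end index. A small check of that, assuming the question's sample string (the names start and length are illustrative):

```python
s = 'AGTCATCGTACGATCATCGAAGCTAGCAGCATGAC'

start, length = 6, 1

# A plain slice excludes the end index, so this is a single character.
exclusive = s[start:start + length]

# Adding 1 to the end index includes the character at start + length.
inclusive = s[start:start + length + 1]

print(exclusive)  # C
print(inclusive)  # CG
```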
You can produce the substring list directly with a nested list comprehension; nested_lst_2 isn't necessary.
nested_lst_1 = [[6],[7],[8,9]]
lst = [1,2,3]
string = 'AGTCATCGTACGATCATCGAAGCTAGCAGCATGAC'
keys = ['GG', 'GTT', 'TCGG']
substrings = [[string[v:i+v+1] for v in u] for i, u in zip(lst, nested_lst_1)]
print(substrings)
DNA_mutDNA = dict(zip(keys, substrings))
print(DNA_mutDNA)
output
[['CG'], ['GTA'], ['TACG', 'ACGA']]
{'GG': ['CG'], 'GTT': ['GTA'], 'TCGG': ['TACG', 'ACGA']}
In[2]: nested_lst_1 = [[6],[7],[8,9]]
...: lst = [1,2,3]
...: string = 'AGTCATCGTACGATCATCGAAGCTAGCAGCATGAC'
...: keys = ['GG', 'GTT', 'TCGG']
In[3]: nested_lst_2 = [[elem + b for elem in a] for a, b in zip(nested_lst_1, lst)]
In[4]: nested_list_substrings = []
...: for a, b in zip(nested_lst_1, nested_lst_2):
...:     nested_list_substrings.append([string[c:d + 1] for c, d in zip(a, b)])
...:
In[5]: {k: v for k, v in zip(keys, nested_list_substrings)}
Out[5]: {'GG': ['CG'], 'GTT': ['GTA'], 'TCGG': ['TACG', 'ACGA']}
Surely not the most readable way to do it, but here is a bit of functional-style fun:
nested_lst_1 = [[6], [7], [8,9]]
lst = [1, 2, 3]
nested_lst_2 = list(map(
    list,
    map(map, map(lambda n: (lambda x: n+x), lst), nested_lst_1)))
nested_lst_2
Result looks as expected:
[[7], [9], [11, 12]]
Then:
from itertools import starmap
from operator import itemgetter
make_slices = lambda l1, l2: starmap(slice, zip(l1, map(lambda n: n+1, l2)))
string = 'AGTCATCGTACGATCATCGAAGCTAGCAGCATGAC'
get_slice = lambda s: itemgetter(s)(string)
nested_list_substrings = list(map(
    lambda slices: list(map(get_slice, slices)),
    starmap(make_slices, zip(nested_lst_1, nested_lst_2))))
nested_list_substrings
Result:
[['CG'], ['GTA'], ['TACG', 'ACGA']]
And finally:
keys = ['GG', 'GTT', 'TCGG']
DNA_mutDNA = dict(zip(keys, nested_list_substrings))
DNA_mutDNA
Final result:
{'GG': ['CG'], 'GTT': ['GTA'], 'TCGG': ['TACG', 'ACGA']}

Sum lists with different lengths in python

I have 3 lists with different lengths. They are made like this:
final_list = [[1230, 0], [1231,0],[1232,0], [1233, 0], [1234, 0]]
list2 = [[1232, 20], [1233, 30]]
list3 = [[1230, 10], [1231,20],[1232,40]]
I want to obtain final_list like this:
final_list = [[1230, 10], [1231,20],[1232,60], [1233, 30], [1234, 0]]
(That is, for each element of list2 and list3 whose first value equals the first value of an element of final_list, the second value of that final_list element should become the sum of the matching second values.)
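Not a clean solution, but it is easy to grasp and might save your day.
f = {}
# Turn every [key, value] pair into a one-item dict, then sum the values per key.
# final_list is included so that keys with no match (such as 1234) keep their 0.
dcts = map(lambda l: dict([l]), final_list + list2 + list3)
for dct in dcts:
    for k in dct:
        f[k] = f.get(k, 0) + dct[k]
final_list = sorted(map(list, f.items()))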
However, if you are familiar with itertools, you can use groupby (it only groups consecutive items, so the merged list is sorted first):
from itertools import groupby
merged = sorted(final_list + list2 + list3)
final_list = []
for key, group in groupby(merged, key=lambda e: e[0]):
    final_list.append([key, sum(j for i, j in group)])
or as a one-liner:
[[k, sum(j for i, j in g)] for k, g in groupby(merged, key=lambda e: e[0])]
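To see why the order of the input matters for itertools.groupby, here is a small sketch using list2 and list3 from the question. The helper sum_by_key is a hypothetical name introduced only for this illustration:

```python
from itertools import groupby

list2 = [[1232, 20], [1233, 30]]
list3 = [[1230, 10], [1231, 20], [1232, 40]]

def sum_by_key(pairs):
    """Sum the second items of consecutive pairs that share a first item."""
    return [[k, sum(v for _, v in g)]
            for k, g in groupby(pairs, key=lambda e: e[0])]

# Unsorted input: 1232 appears in two separate runs, so it is reported twice.
unsorted_result = sum_by_key(list2 + list3)

# Sorted input: equal keys are adjacent, so each key is summed exactly once.
sorted_result = sum_by_key(sorted(list2 + list3))

print(unsorted_result)
print(sorted_result)
```

Sorting costs O(n log n); if that matters, a dictionary-based sum avoids the sort entirely.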
I created a temp_list and appended all three lists to it.
Then I created a dictionary dic and looped through temp_list to sum up each pair based on its key.
Finally, I turned dic back into a list and sorted it.
I admit this is not the most efficient way to do this, but it is a solution.
temp_list = []
temp_list.append(final_list)
temp_list.append(list2)
temp_list.append(list3)
dic = {}
for lst in temp_list:
    for tp in lst:
        if tp[0] in dic:
            dic[tp[0]] = dic[tp[0]] + tp[1]
        else:
            dic[tp[0]] = tp[1]
result = []
for key, value in dic.items():
    temp = [key, value]
    result.append(temp)
result.sort()
result:
[[1230, 10], [1231, 20], [1232, 60], [1233, 30], [1234, 0]]
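The same dictionary-summing idea can be written more compactly. This is only a sketch, assuming collections.defaultdict is acceptable and using the question's sample data; totals and result are illustrative names:

```python
from collections import defaultdict

final_list = [[1230, 0], [1231, 0], [1232, 0], [1233, 0], [1234, 0]]
list2 = [[1232, 20], [1233, 30]]
list3 = [[1230, 10], [1231, 20], [1232, 40]]

# Missing keys start at 0, so no "if key in dict" check is needed.
totals = defaultdict(int)
for key, value in final_list + list2 + list3:
    totals[key] += value

# Rebuild the list of [key, total] pairs in ascending key order.
result = [[key, totals[key]] for key in sorted(totals)]
print(result)
```

Including final_list in the loop keeps keys such as 1234 that have no match in the other lists.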

Converting this loop to a list comprehension

I would like to generate a list of unique Ids by only keeping the list that has the minimum value in element 2.
For example, given the list:
list1 = [['Id1', 1, 40],['Id1', 2, 30],['Id2', 10,40]]
Expected output:
[['Id1', 1, 40],['Id2', 10,40]]
Here's my working example, but it's pretty clunky. I think it could probably be done with a single list comprehension.
list1 = [['Id1', 1, 40],['Id1', 2, 30],['Id2', 10,40]]
unique_list = list(set([x[0] for x in list1]))
unique_list = [[x] for x in unique_list]
for x in unique_list:
    id = x[0]
    min_val = min([y[1:] for y in list1 if y[0] == id])
    x.extend(min_val)
print(unique_list)
You can use itertools.groupby to group by the first element of the sublists; then you can get the minimum with a key argument that compares the remaining elements of each sublist.
>>> from itertools import groupby
>>> [min(list(g), key = lambda i: i[1:]) for k, g in groupby(list1, lambda i: i[0])]
[['Id1', 1, 40], ['Id2', 10, 40]]
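itertools.groupby only groups adjacent rows, so the approach above relies on list1 already being ordered by Id. If that is not guaranteed, one possible sketch keeps the smallest row per Id in a dictionary instead. The sample list here is deliberately reordered, and best is an illustrative name:

```python
# Same rows as in the question, but with the two 'Id1' rows not adjacent.
list1 = [['Id1', 2, 30], ['Id2', 10, 40], ['Id1', 1, 40]]

best = {}
for row in list1:
    key = row[0]
    # Keep the row whose remaining elements compare smallest for this Id.
    if key not in best or row[1:] < best[key][1:]:
        best[key] = row

# Dicts preserve insertion order in Python 3.7+, so Ids appear as first seen.
unique_list = list(best.values())
print(unique_list)
```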
A very naive approach but easily understandable.
list1 = [['Id1', 1, 40],['Id1', 2, 30],['Id2', 10,40]]
unique_list = []
for list_element in list1:
    appendable = True
    for temp_list in unique_list:
        if list_element[0] == temp_list[0]:
            if temp_list[1] < list_element[1]:
                appendable = False
            else:
                unique_list.remove(temp_list)
    if appendable:
        unique_list.append(list_element)
unique_list.sort()
print(unique_list)

How to convert dictionary of indices to list of keys?

Say you have a dictionary listing the indices where each unique value appears. For example, if your alphabet is just a and b, then this dictionary will look something like: d = {'a': [1, 2, 6], 'b': [3, 7]}. I would like to convert it to the raw list that shows the right value at the right index, so that for this example, l = ['a','a','b',None,None,'a','b']. I would prefer a small, easy solution rather than one with tedious for loops. Thanks!
Obviously doing this without for loops is a terrible idea, because the easiest way is (it's not perfect, but it does the job):
r = {}
for key, value in d.items():
    for element in value:
        r[element] = key
l = [r.get(i) for i in range(1, max(r) + 1)]
But if you REALLY want to know how to do this without any for, then have a look:
m = {}
i = 0
d_keys = list(d.keys())
max_value = 0
while i < len(d):
    d_i = d[d_keys[i]]
    j = 0
    while j < len(d_i):
        d_i_j = d_i[j]
        if max_value < d_i_j:
            max_value = d_i_j
        m[d_i_j] = d_keys[i]
        j += 1
    i += 1
l = []
i = 1
while i <= max_value:
    l.append(m.get(i))
    i += 1
It's quite easy, isn't it?
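For comparison, a variant of the loop-based idea that fills a preallocated list directly instead of building an intermediate dictionary. This is a sketch assuming the 1-based indices from the question; size is an illustrative name:

```python
d = {'a': [1, 2, 6], 'b': [3, 7]}

# The largest index determines the length of the result.
size = max(i for indices in d.values() for i in indices)

# Start with None everywhere, then write each key at its (1-based) positions.
l = [None] * size
for key, indices in d.items():
    for i in indices:
        l[i - 1] = key

print(l)
```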
I don't know why you need that, but here is a dirty answer, without loops.
d = {'a': [1, 2, 6], 'b': [3, 7]}
map(lambda x: x[0] if x else None, map(lambda x: filter(lambda l: x in d[l], d), range(1, max(reduce(lambda x, y: x+y, map(lambda x:d[x], d)))+1)))
d.keys()
keys()
Return a copy of the dictionary’s list of keys. See the note for dict.items()
from Python Docs

Get a unique list of items that occur more than once in a list

I have a list of items:
mylist = ['A','A','B','C','D','E','D']
I want to return a unique list of items that appear more than once in mylist, so that my desired output would be:
[A,D]
Not sure how to even begin this, but my thought process is to first append a count of each item, then remove anything with a count of 1, then dedupe. This seems like a really roundabout, inefficient way to do it, so I am looking for advice.
You can use collections.Counter to do what you have described easily:
from collections import Counter
mylist = ['A','A','B','C','D','E','D']
cnt = Counter(mylist)
print([k for k, v in cnt.items() if v > 1])
# ['A', 'D']
>>> mylist = ['A','A','B','C','D','E','D']
>>> set([i for i in mylist if mylist.count(i)>1])
set(['A', 'D'])
import collections
cc = collections.Counter(mylist) # Counter({'A': 2, 'D': 2, 'C': 1, 'B': 1, 'E': 1})
cc.subtract(cc.keys()) # Counter({'A': 1, 'D': 1, 'C': 0, 'B': 0, 'E': 0})
cc += collections.Counter() # remove zeros (trick from the docs)
print(list(cc)) # ['A', 'D']
Try something like this:
a = ['A','A','B','C','D','E','D']
import collections
print([x for x, y in collections.Counter(a).items() if y > 1])
['A', 'D']
Reference: How to find duplicate elements in array using for loop in Python?
OR
def list_has_duplicate_items(mylist):
    return len(mylist) > len(set(mylist))
def get_duplicate_items(mylist):
    return [item for item in set(mylist) if mylist.count(item) > 1]
mylist = ['oranges', 'apples', 'oranges', 'grapes']
print('List: ', mylist)
print('Does list have duplicate item(s)? ', list_has_duplicate_items(mylist))
print('Redundant item(s) in list: ', get_duplicate_items(mylist))
Reference https://www.daniweb.com/software-development/python/threads/286996/get-redundant-items-in-list
Using a similar approach to others here, here's my attempt:
from collections import Counter
def return_more_then_one(my_list):
    counts = Counter(my_list)
    out_list = [i for i in counts if counts[i] > 1]
    return out_list
It can be as simple as ...
print(list(set([i for i in mylist if mylist.count(i) > 1])))
Use a set to help you do that, maybe like this:
X = ['A','A','B','C','D','E','D']
Y = set(X)
Z = []
for val in Y:
    occurrences = X.count(val)
    if occurrences > 1:
        #print(val,'occurs',occurrences,'times')
        Z.append(val)
print(Z)
The list Z collects the items that occur more than once. The commented-out print (#) shows the number of occurrences of each such item.
Might not be as fast as internal implementations, but takes linear time on average (since set lookup is constant time on average)
mylist = ['A','A','B','C','D','E','D']
myset = set()
dups = set()
for x in mylist:
    if x in myset:
        dups.add(x)
    else:
        myset.add(x)
dups = list(dups)
print(dups)
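Converting a set to a list gives no guaranteed order. If the order in which duplicates are first detected matters, the same single-pass idea can be wrapped in a function along these lines. This is a sketch; find_duplicates is a hypothetical name, and it relies on dicts preserving insertion order (Python 3.7+):

```python
def find_duplicates(items):
    """Return each item that occurs more than once, in order of first repeat."""
    seen = set()
    dups = {}  # used as an ordered set: the keys are the duplicates found
    for x in items:
        if x in seen:
            dups[x] = None
        else:
            seen.add(x)
    return list(dups)

mylist = ['A', 'A', 'B', 'C', 'D', 'E', 'D']
print(find_duplicates(mylist))  # ['A', 'D']
```

Like the loop above, this makes one pass over the list and requires the items to be hashable.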
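Another solution, as written (note that it returns the list with repeats removed, rather than the repeated items):
def delete_rep(list_):
    new_list = []
    for idx, i in enumerate(list_):
        # keep i only if it does not occur again later in the list
        if i not in list_[idx + 1:]:
            new_list.append(i)
    return new_list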
This is my approach without using packages:
result = []
for e in listy:
    if listy.count(e) > 1:
        result.append(e)
    else:
        pass
print(list(set(result)))
