indices of all occurrences of all elements - python

Given a list, I would like to get the indices of all occurrences of all elements, collected in a dictionary. I am looking for a Python equivalent of the group function offered by the Q programming language. I am expecting a simpler solution than the code below:
l = [2, 1, 1, 7, 2]
d = {}
for e, v in enumerate(l):
    if v in d.keys():
        d[v].append(e)
    else:
        d[v] = [e]
print(d)
{2: [0, 4], 1: [1, 2], 7: [3]}

You could use a defaultdict to streamline your code, but other than that, it's fine:
from collections import defaultdict
d = defaultdict(list)
for i, x in enumerate(l):
    d[x].append(i)
Alternatively, you can use the setdefault method to access dict values:
d = {}
for i, x in enumerate(l):
    d.setdefault(x, []).append(i)
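As a quick check, either snippet reproduces the grouping from the question when run on the example list (a small demo; output spacing may differ slightly):
from collections import defaultdict

l = [2, 1, 1, 7, 2]
d = defaultdict(list)
for i, x in enumerate(l):
    d[x].append(i)

print(dict(d))
# {2: [0, 4], 1: [1, 2], 7: [3]}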

Related

Python Dictionary Comprehension Not Outputting As Expected

I am playing around with dictionaries and wondered how I would create a dictionary using comprehensions. I thought
{k:v for k in [0,1,2] for v in [5,8,7]}
would print as
{0:5, 1:8, 2:7}
But instead it prints as
{0: 7, 1: 7, 2: 7}
Why is this happening and what modifications would I need to make to get the first output?
Your dict comprehension is equivalent to nested loops:
result = {}
for v in [5, 8, 7]:
    for k in [0, 1, 2]:
        result[k] = v
So each iteration of the outer loop sets all the keys to that value, and at the end you have the last value in all of them.
Use zip() to iterate over two lists in parallel.
{k: v for k, v in zip([0, 1, 2], [5, 8, 7])}
You can also just use the dict() constructor:
dict(zip([0, 1, 2], [5, 8, 7]))
Whenever you have trouble with a comprehension, unroll it into the equivalent loops, which in this case go:
mydict = {}
for v in [5, 8, 7]:
    for k in [0, 1, 2]:
        mydict[k] = v
Each successive assignment to mydict[k] overwrites the previous one.
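Printing the dictionary after each pass of the outer loop makes the overwriting visible (a small trace added for illustration):
mydict = {}
for v in [5, 8, 7]:
    for k in [0, 1, 2]:
        mydict[k] = v
    print(mydict)
# {0: 5, 1: 5, 2: 5}
# {0: 8, 1: 8, 2: 8}
# {0: 7, 1: 7, 2: 7}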

Filtering a (Nx1) list in Python

I have a list of the form
[(2,3),(4,3),(3,4),(1,4),(5,4),(6,5)]
I want to scan the list and return the first elements of those pairs whose second element is repeated. (I apologize I couldn't frame this better.)
For example, in the given list the pairs (2,3) and (4,3) share the second element 3, so I wish to return 2 and 4. Similarly, from (3,4), (1,4), (5,4) I will return 3, 1, and 5 because 4 is repeated.
I have implemented a brute-force pairwise comparison, but that is obviously very slow.
for i in range(0, p):          # p is the length of arr
    for j in range(i + 1, p):
        if arr[i][1] == arr[j][1]:
            print(arr[i][0], arr[j][0])
How do I go about it?
You can use collections.defaultdict. This will return a mapping from the second item to a list of first items. You can then filter for repetition via a dictionary comprehension.
from collections import defaultdict
lst = [(2,3),(4,3),(3,4),(1,4),(5,4),(6,5)]
d = defaultdict(list)
for i, j in lst:
    d[j].append(i)
print(d)
# defaultdict(list, {3: [2, 4], 4: [3, 1, 5], 5: [6]})
res = {k: v for k, v in d.items() if len(v)>1}
print(res)
# {3: [2, 4], 4: [3, 1, 5]}
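If you only want the first elements themselves, as the question asks (2 and 4, then 3, 1, and 5), you can flatten the filtered values; a small follow-up sketch (relies on dicts preserving insertion order, Python 3.7+):
flat = [i for v in res.values() for i in v]
print(flat)
# [2, 4, 3, 1, 5]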
Using numpy allows you to avoid explicit for loops:
import numpy as np
l = [(2,3),(4,3),(3,4),(1,4),(5,4),(6,5)]
a = np.array(l)
items, counts = np.unique(a[:,1], return_counts=True)
is_duplicate = np.isin(a[:,1], items[counts > 1]) # get elements that have more than one count
print(a[is_duplicate, 0]) # return elements with duplicates
# tuple(map(tuple, a[is_duplicate, :])) # use this to get tuples in output
(uncomment the last line to get the output as tuples)
pandas is another option:
import pandas as pd
l = [(2,3),(4,3),(3,4),(1,4),(5,4),(6,5)]
df = pd.DataFrame(l, columns=['first', 'second'])
df.groupby('second').filter(lambda x: len(x) > 1)
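If you then want just the first elements, you can pull the 'first' column out of the filtered frame (a sketch building on the same DataFrame):
dups = df.groupby('second').filter(lambda x: len(x) > 1)
print(dups['first'].tolist())
# [2, 4, 3, 1, 5]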

getting list of indices of each value of list in a pythonic way

I have a list data of values, and I want to return a dictionary mapping each value of data to a list of the indices where this value appears.
This can be done using this code:
import numpy as np

data = np.array(data)
{val: list(np.where(data == val)[0]) for val in data}
but this runs in O(n^2), which is too slow for long lists. Can an O(n) solution be written with a "pythonic" syntax? (It can be done by creating an empty dictionary and updating it in a loop, but I understand this is not recommended in Python.)
You can use a defaultdict of lists to achieve this in O(n):
from collections import defaultdict
d = defaultdict(list)
for idx, item in enumerate(data):
    d[item].append(idx)
For example, if data contains the string 'abcabccbazzzqa':
d = defaultdict(list)
for idx, item in enumerate('abcabccbazzzqa'):
    d[item].append(idx)
>>> d
defaultdict(<type 'list'>, {'a': [0, 3, 8, 13], 'q': [12], 'c': [2, 5, 6], 'b': [1, 4, 7], 'z': [9, 10, 11]})
>>> d['a']
[0, 3, 8, 13]
Try this out:
import numpy as np

data = np.array(data)
dic = {}
for i, val in enumerate(data):
    if val in dic:
        dic[val].append(i)
    else:
        dic[val] = [i]
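The same idea can be written more compactly with dict.setdefault, which avoids the explicit membership test (a variant sketch; it also skips the numpy conversion, which is not needed for the lookup):
dic = {}
for i, val in enumerate(data):
    dic.setdefault(val, []).append(i)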

Python - split list of lists by value

I want to split the following list of lists
a = [["aa",1,3]
["aa",3,3]
["sdsd",1,3]
["sdsd",6,0]
["sdsd",2,5]
["fffffff",1,3]]
into the three following lists of lists:
a1 = [["aa",1,3]
["aa",3,3]]
a2 = [["sdsd",1,3]
["sdsd",6,0]
["sdsd",2,5]]
a3 = [["fffffff",1,3]]
That is, according to the first value of each list. I need to do this for a list of lists with thousands of elements... How can I do it efficiently?
You're better off making a dictionary. If you really want to make a bunch of variables, you'll have to use globals(), which isn't really recommended.
a = [["aa",1,3]
["aa",3,3]
["sdsd",1,3]
["sdsd",6,0]
["sdsd",2,5]
["fffffff",1,3]]
d = {}
for sub in a:
    key = sub[0]
    if key not in d:
        d[key] = []
    d[key].append(sub)
OR
import collections
d = collections.defaultdict(list)
for sub in a:
    d[sub[0]].append(sub)
If input is sorted on first element:
from itertools import groupby
from operator import itemgetter
a = [["aa",1,3],
["aa",3,3],
["sdsd",1,3],
["sdsd",6,0],
["sdsd",2,5],
["fffffff",1,3]]
b = {k: list(v) for k, v in groupby(a, itemgetter(0))}
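Note that itertools.groupby only groups consecutive runs, so if the input is not already sorted on the first element you would sort it first (a sketch based on the same snippet):
a_sorted = sorted(a, key=itemgetter(0))
b = {k: list(v) for k, v in groupby(a_sorted, itemgetter(0))}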
Create a dictionary with the first element as the key and the matching lists as the value. You will get a dictionary where the value of each key is the group of lists sharing that first element. For example,
a = [["aa", 1, 3],
["aa", 3, 3],
["sdsd", 1, 3],
["sdsd", 6, 0],
["sdsd", 2, 5],
["fffffff", 1, 3]]
d = {}
for e in a:
    d[e[0]] = d.get(e[0]) or []   # start a new list the first time this key is seen
    d[e[0]].append(e)
Now you can simply get the lists separately:
a1 = d['aa']
a2 = d['sdsd']
A defaultdict will work nicely here:
a = [["aa",1,3],
["aa",3,3],
["sdsd",1,3],
["sdsd",6,0],
["sdsd",2,5],
["fffffff",1,3]]
from collections import defaultdict
d = defaultdict(list)
for thing in a:
    d[thing[0]] += thing,   # "thing," is a one-element tuple, so += appends thing to the list
for separate_list in d.values():
    print(separate_list)
Output
[['aa', 1, 3], ['aa', 3, 3]]
[['sdsd', 1, 3], ['sdsd', 6, 0], ['sdsd', 2, 5]]
[['fffffff', 1, 3]]

Removing Elements in A from B

I have two lists, let's say:
a = [1,2,3]
b = [1,2,3,1,2,3]
I would like to remove 1, 2 and 3 from list b, but not all occurrences. The resulting list should be:
b = [1,2,3]
I currently have:
for element in a:
    try:
        b.remove(element)
    except ValueError:
        pass
However, this has poor performance when a and b get very large. Is there a more efficient way of getting the same results?
EDIT
To clarify 'not all occurrences', I mean I do not wish to remove both '1's from b, as there was only one '1' in a.
I would do this:
set_a = set(a)
new_b = []
for x in b:
    if x in set_a:
        set_a.remove(x)
    else:
        new_b.append(x)
Unlike the other set solutions, this maintains order in b (if you care about that).
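Run against the lists from the question, this produces exactly the remainder the question expects (quick check):
a = [1, 2, 3]
b = [1, 2, 3, 1, 2, 3]

set_a = set(a)
new_b = []
for x in b:
    if x in set_a:
        set_a.remove(x)
    else:
        new_b.append(x)

print(new_b)
# [1, 2, 3]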
I would do something like this:
from collections import defaultdict
a = [1, 2, 3]
b = [1, 2, 3, 1, 2, 3]
# Build up the count of occurrences in b
d = defaultdict(int)
for bb in b:
    d[bb] += 1

# Remove one for each occurrence in a
for aa in a:
    d[aa] -= 1

# Create a list for all elements that still have a count of one or more
result = []
for k, v in d.iteritems():
    if v > 0:
        result += [k] * v
Or, if you are willing to be slightly more obscure:
from operator import iadd
result = reduce(iadd, [[k] * v for k, v in d.iteritems() if v > 0], [])
defaultdict generates a count of the occurrences of each key. Once it has been built up from b, it is decremented for each occurrence of a key in a. Then we collect the elements that still have a positive count, allowing them to occur multiple times.
defaultdict works with Python 2.6 and up. If you are using a later Python (2.7 and up, I believe), you can look into collections.Counter.
Later: you can also generalize this and create subtractions of counter-style defaultdicts:
from collections import defaultdict
from operator import iadd
a = [1, 2, 3, 4, 5, 6]
b = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
def build_dd(lst):
    d = defaultdict(int)
    for item in lst:
        d[item] += 1
    return d

def subtract_dd(left, right):
    return {k: left[k] - v for k, v in right.iteritems()}

db = build_dd(b)
da = build_dd(a)

result = reduce(iadd,
                [[k] * v for k, v in subtract_dd(db, da).iteritems() if v > 0],
                [])
print result
But the reduce expression is pretty obscure now.
Later still: in python 2.7 and later, using collections.Counter, it looks like this:
from collections import Counter
base = [1, 2, 3]
missing = [4, 5, 6]
extra = [7, 8, 9]
a = base + missing
b = base * 4 + extra
result = Counter(b) - Counter(a)
print result
assert result == dict([(k, 3) for k in base] + [(k, 1) for k in extra])
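Counter also has an elements() method that expands the counts back into a flat list, which yields the question's original example directly (a small sketch using the question's a and b; the order of elements() follows insertion order on modern Pythons, so it may differ from b):
from collections import Counter

a = [1, 2, 3]
b = [1, 2, 3, 1, 2, 3]

remainder = Counter(b) - Counter(a)
print(list(remainder.elements()))
# [1, 2, 3]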
Generally, you want to avoid list.remove() (you are right, it hurts performance badly when the lists get large). Also, it is much faster (O(1)) to look up elements in a dictionary or a set than in a list, so create a set out of your first list (and, if order doesn't matter, out of your second list too).
Something like this:
sa = set(a)
new_b = [x for x in b if x not in sa]
# this creates a third list rather than modifying b in place, but that's usually fine
However, I have no idea what your actual rule is for choosing which elements to remove. Please elaborate on "but not all occurrences".
