Create sublists of indexes of equal values from list - python

I'm trying to split a list of integers into sublists of the indexes of equal integers. So say I have a list:
original_list = [1,2,1,4,4,4,3,4,4,1,4,3,3]
The desired output would be:
indexes : [[0,2,9], [1], [6,11,12], [3,4,5,7,8,10]]
# corresponds to sublists: [[1,1,1], [2], [3,3,3], [4,4,4,4,4,4]]
I can't figure out how to do this though, as most solutions require you to first sort the original list, but in my case, this messes up the indices. Itertools or np.arrays have not helped me for this reason, as they only group sequential equal elements.
Does anyone know of a solution for this problem? I would love to hear!

You can use enumerate:
original_list = [1,2,1,4,4,4,3,4,4,1,4,3,3]
groups = {a:[i for i, c in enumerate(original_list) if c == a] for a in set(original_list)}
Output:
{1: [0, 2, 9], 2: [1], 3: [6, 11, 12], 4: [3, 4, 5, 7, 8, 10]}

Using enumerate and a defaultdict, you can build a mapping of values to their indices with
from collections import defaultdict
dd = defaultdict(list)
for index, value in enumerate(original_list):
    dd[value].append(index)
print(dd)
# defaultdict(<class 'list'>, {1: [0, 2, 9], 2: [1], 4: [3, 4, 5, 7, 8, 10], 3: [6, 11, 12]})

You can use collections.defaultdict for a one-pass solution. Then use sorted if you need, as in your desired result, to sort your indices by value.
original_list = [1,2,1,4,4,4,3,4,4,1,4,3,3]
from collections import defaultdict
from operator import itemgetter
dd = defaultdict(list)
for idx, value in enumerate(original_list):
    dd[value].append(idx)
keys, values = zip(*sorted(dd.items(), key=itemgetter(0)))
print(keys, values, sep='\n')
(1, 2, 3, 4)
([0, 2, 9], [1], [6, 11, 12], [3, 4, 5, 7, 8, 10])
For comparison, the values of dd are insertion ordered in Python 3.6+ (officially in 3.7+, as a CPython implementation detail in 3.6):
print(list(dd.values()))
[[0, 2, 9], [1], [3, 4, 5, 7, 8, 10], [6, 11, 12]]

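Here is how I would do it with numpy, using the argsort function I linked in the comments. A stable sort keeps the indices within each group in ascending order.
import numpy as np
original = [1,2,1,4,4,4,3,4,4,1,4,3,3]
indexes = []
s = set()
for n in np.argsort(original, kind='stable'):
    n = int(n)  # plain int instead of a NumPy integer
    if original[n] in s:
        indexes[-1].append(n)
    else:
        indexes.append([n])
        s.add(original[n])
print(indexes)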
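A vectorized variant of the same idea, as a rough sketch: it assumes NumPy is installed and combines a stable argsort with np.unique and np.split, none of which appear in the answer above. The names arr, order, starts and groups are only illustrative.

```python
import numpy as np

original = [1, 2, 1, 4, 4, 4, 3, 4, 4, 1, 4, 3, 3]
arr = np.asarray(original)

# Stable sort: indices of equal values stay in ascending order.
order = np.argsort(arr, kind="stable")

# Position in the sorted array where each distinct value starts.
values, starts = np.unique(arr[order], return_index=True)

# Cut the index array at those boundaries (the first boundary is 0, so skip it).
groups = [chunk.tolist() for chunk in np.split(order, starts[1:])]
print(groups)
```

The groups come out ordered by value, which matches the layout asked for in the question.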

This can be achieved with a list comprehension.
>>> x = [1,2,1,4,4,4,3,4,4,1,4,3,3]
>>> [[i for i in range(len(x)) if x[i]==y] for y in sorted(set(x))]
[[0, 2, 9], [1], [6, 11, 12], [3, 4, 5, 7, 8, 10]]

Related

Get fixed size combinations of a list of lists in python?

I am looking for modified version of itertools.product(*a). This command returns combinations by selecting elements from each list but I need to restrict size.
Suppose,
mylist = [[6, 7, 8], [3, 5, 9], [2, 1, 4]]
output: (6, 3), (6, 2),....(3, 2)... when size is 2
Number of lists and size are not fixed. I need something that can be dynamic enough.
You can try:
from itertools import product, combinations, chain
mylist=[[6, 7, 8], [3, 5, 9], [2, 1]]
size = 2
results = chain.from_iterable(product(*t) for t in combinations(mylist, size))
print(list(results))
Perhaps you can try this:
from itertools import chain, combinations
l=[[6, 7, 8], [3, 5, 9], [2, 1, 4]]
x=list(combinations(chain.from_iterable(l),2))
print(x)
Solution:
import itertools
size = 2
mylist = [[6, 7, 8], [3, 5, 9], [2, 1, 4]]
res = []
for x in list(itertools.product(*mylist)):
    res += itertools.combinations(x, size)
print(set(res))
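The first approach above can be wrapped in a small helper so that the number of lists and the size stay dynamic. This is only a sketch; the name sized_product is made up for illustration. Unlike the answer that flattens the lists with chain first, it never pairs two elements taken from the same inner list.

```python
from itertools import chain, combinations, product

def sized_product(lists, size):
    """Yield every tuple that takes one element from each of `size` distinct lists."""
    # Pick `size` of the inner lists, then form the product of just those lists.
    return chain.from_iterable(
        product(*subset) for subset in combinations(lists, size)
    )

mylist = [[6, 7, 8], [3, 5, 9], [2, 1, 4]]
pairs = list(sized_product(mylist, 2))
print(len(pairs))   # 3 pairs of lists, 9 tuples each
print(pairs[:3])
```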

Element wise concatenation of a list of lists with different lengths

I have a sample list of lists like:
lol = [[1,2,3,4],[5,6],[7,8,9,0,11],[21]]
the expected combined list is:
cl = [1,5,7,21,2,6,8,3,9,4,0,11]
Is there an elegant way of doing this preferably without nested for loops?
You can use itertools.zip_longest:
from itertools import zip_longest
lol = [[1, 2, 3, 4], [5, 6], [7, 8, 9, 0, 11], [21]]
out = [i for v in zip_longest(*lol) for i in v if i is not None]
print(out)
Prints:
[1, 5, 7, 21, 2, 6, 8, 3, 9, 4, 0, 11]
itertools is your friend. Use zip_longest to zip ignoring the differing lengths, chain it to flatten the zipped lists, and then just filter the Nones.
import itertools
lol = [[1,2,3,4],[5,6],[7,8,9,0,11],[21]]
print([x for x in itertools.chain.from_iterable(itertools.zip_longest(*lol)) if x is not None])
In case it helps, a lazy (generator) version of this interleaving is available as more_itertools.interleave_longest.
from more_itertools import interleave_longest, take
lol = [[1, 2, 3, 4], [5, 6], [7, 8, 9, 0, 11], [21]]
gen_from_lol = interleave_longest(*lol)
print(next(gen_from_lol), next(gen_from_lol))
print(take(6, gen_from_lol))
print(next(gen_from_lol))
print(next(gen_from_lol), next(gen_from_lol))
Output
1 5
[7, 21, 2, 6, 8, 3]
9
4 0
Note that interleave_longest(*iterables) is basically the same as chain.from_iterable(zip_longest(*iterables)), with the fill values skipped.
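One caveat with the answers that filter on None: a None that is genuinely part of the data gets dropped too. A minimal sketch of a workaround, assuming a private sentinel object is acceptable as the fill value (the names _MISSING and interleave are hypothetical):

```python
from itertools import chain, zip_longest

_MISSING = object()  # unique fill value that cannot collide with real data

def interleave(*iterables):
    """Element-wise interleave, skipping positions where an iterable ran out."""
    zipped = zip_longest(*iterables, fillvalue=_MISSING)
    return [x for x in chain.from_iterable(zipped) if x is not _MISSING]

lol = [[1, 2, 3, 4], [5, 6], [7, 8, 9, 0, 11], [21]]
print(interleave(*lol))
print(interleave([1, None, 3], [4]))  # the real None is kept
```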

Splitting nested list of floats to columns [duplicate]

Let's take:
l = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
The result I'm looking for is
r = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
and not
r = [(1, 4, 7), (2, 5, 8), (3, 6, 9)]
Python 3:
# short circuits at shortest nested list if table is jagged:
list(map(list, zip(*l)))
# discards no data if jagged and fills short nested lists with None
list(map(list, itertools.zip_longest(*l, fillvalue=None)))
Python 2:
map(list, zip(*l))
[[1, 4, 7], [2, 5, 8], [3, 6, 9]]
Explanation:
There are two things we need to know to understand what's going on:
The signature of zip: zip(*iterables) This means zip expects an arbitrary number of arguments each of which must be iterable. E.g. zip([1, 2], [3, 4], [5, 6]).
Unpacked argument lists: Given a sequence of arguments args, f(*args) will call f such that each element in args is a separate positional argument of f.
itertools.zip_longest does not discard any data if the nested lists do not all have the same number of elements (that is, if the table is jagged); instead it fills in the shorter nested lists and then zips them up.
Coming back to the input from the question l = [[1, 2, 3], [4, 5, 6], [7, 8, 9]], zip(*l) would be equivalent to zip([1, 2, 3], [4, 5, 6], [7, 8, 9]). The rest is just making sure the result is a list of lists instead of a list of tuples.
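The two points above can be checked directly. A small sketch (the variable names are only illustrative) showing that unpacking with * is the same as passing each inner list as its own argument:

```python
l = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

unpacked = list(zip(*l))                               # f(*args) form
explicit = list(zip([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # arguments written out

print(unpacked == explicit)

# zip yields tuples, so convert each column to a list.
transposed = [list(column) for column in unpacked]
print(transposed)
```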
Equivalently to Jena's solution:
>>> l=[[1,2,3],[4,5,6],[7,8,9]]
>>> [list(i) for i in zip(*l)]
[[1, 4, 7], [2, 5, 8], [3, 6, 9]]
One way to do it is with NumPy transpose. For a list, l:
>>> import numpy as np
>>> np.array(l).T.tolist()
[[1, 4, 7], [2, 5, 8], [3, 6, 9]]
Or another one without zip (python < 3):
>>> map(list, map(None, *l))
[[1, 4, 7], [2, 5, 8], [3, 6, 9]]
Or for python >= 3:
>>> list(map(lambda *x: list(x), *l))
[[1, 4, 7], [2, 5, 8], [3, 6, 9]]
Just for fun, assuming the rows form a valid rectangle and that m[0] exists:
>>> m = [[1,2,3],[4,5,6],[7,8,9]]
>>> [[row[i] for row in m] for i in range(len(m[0]))]
[[1, 4, 7], [2, 5, 8], [3, 6, 9]]
Methods 1 and 2 work in Python 2 or 3, and they work on ragged, rectangular 2D lists. That means the inner lists do not need to have the same lengths as each other (ragged) or as the outer lists (rectangular). The other methods, well, it's complicated.
the setup
import itertools
import six
list_list = [[1,2,3], [4,5,6, 6.1, 6.2, 6.3], [7,8,9]]
method 1 — map(), zip_longest()
>>> list(map(list, six.moves.zip_longest(*list_list, fillvalue='-')))
[[1, 4, 7], [2, 5, 8], [3, 6, 9], ['-', 6.1, '-'], ['-', 6.2, '-'], ['-', 6.3, '-']]
six.moves.zip_longest() becomes
itertools.izip_longest() in Python 2
itertools.zip_longest() in Python 3
The default fillvalue is None. Thanks to @jena's answer, where map() is changing the inner tuples to lists. Here it is turning iterators into lists. Thanks to @Oregano's and @badp's comments.
In Python 3, pass the result through list() to get the same 2D list as method 2.
method 2 — list comprehension, zip_longest()
>>> [list(row) for row in six.moves.zip_longest(*list_list, fillvalue='-')]
[[1, 4, 7], [2, 5, 8], [3, 6, 9], ['-', 6.1, '-'], ['-', 6.2, '-'], ['-', 6.3, '-']]
The @inspectorG4dget alternative.
method 3 — map() of map() — broken in Python 3.6
>>> map(list, map(None, *list_list))
[[1, 4, 7], [2, 5, 8], [3, 6, 9], [None, 6.1, None], [None, 6.2, None], [None, 6.3, None]]
This extraordinarily compact @SiggyF second alternative works with ragged 2D lists, unlike his first code which uses numpy to transpose and pass through ragged lists. But None has to be the fill value. (No, the None passed to the inner map() is not the fill value. It means there is no function to process each column. The columns are just passed through to the outer map() which converts them from tuples to lists.)
Somewhere in Python 3, map() stopped putting up with all this abuse: the first parameter cannot be None, and ragged iterators are just truncated to the shortest. The other methods still work because this only applies to the inner map().
method 4 — map() of map() revisited
>>> list(map(list, map(lambda *args: args, *list_list)))
[[1, 4, 7], [2, 5, 8], [3, 6, 9], [None, 6.1, None], [None, 6.2, None], [None, 6.3, None]] // Python 2.7
[[1, 4, 7], [2, 5, 8], [3, 6, 9]] // 3.6+
Alas the ragged rows do NOT become ragged columns in Python 3, they are just truncated. Boo hoo progress.
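For Python 3 only, the difference between truncating and padding can be seen without six. This is a sketch using the same list_list as above; the names truncated and padded are illustrative.

```python
from itertools import zip_longest

list_list = [[1, 2, 3], [4, 5, 6, 6.1, 6.2, 6.3], [7, 8, 9]]

# zip stops at the shortest row, so the extra values are dropped.
truncated = [list(col) for col in zip(*list_list)]

# zip_longest keeps going and pads the short rows with the fill value.
padded = [list(col) for col in zip_longest(*list_list, fillvalue='-')]

print(truncated)
print(padded)
```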
Three options to choose from:
1. Map with Zip
solution1 = map(list, zip(*l))
2. List Comprehension
solution2 = [list(i) for i in zip(*l)]
3. For Loop Appending
solution3 = []
for i in zip(*l):
    solution3.append(list(i))
And to view the results:
print(*solution1)
print(*solution2)
print(*solution3)
# [1, 4, 7] [2, 5, 8] [3, 6, 9]
import numpy as np
r = list(map(list, np.transpose(l)))
One more way, for a square matrix only. It uses neither numpy nor itertools and (effectively) swaps elements in place:
def transpose(m):
    for i in range(1, len(m)):
        for j in range(i):
            m[i][j], m[j][i] = m[j][i], m[i][j]
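A quick check of the in-place swap, restated so the snippet runs on its own (the name transpose_in_place is only illustrative). Note that the function returns nothing and mutates its argument.

```python
def transpose_in_place(m):
    """Swap m[i][j] with m[j][i] below the diagonal; m must be square."""
    for i in range(1, len(m)):
        for j in range(i):
            m[i][j], m[j][i] = m[j][i], m[i][j]

m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
transpose_in_place(m)  # modifies m itself
print(m)
```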
Maybe not the most elegant solution, but here's one using nested while loops (as written, it assumes a square matrix):
def transpose(lst):
    newlist = []
    i = 0
    while i < len(lst):
        j = 0
        colvec = []
        while j < len(lst):
            colvec.append(lst[j][i])
            j = j + 1
        newlist.append(colvec)
        i = i + 1
    return newlist
more_itertools.unzip() is easy to read, and it also works with generators.
import more_itertools
l = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
r = more_itertools.unzip(l) # a tuple of generators.
r = list(map(list, r)) # a list of lists
or equivalently
import more_itertools
l = more_itertools.chunked(range(1,10), 3)
r = more_itertools.unzip(l) # a tuple of generators.
r = list(map(list, r)) # a list of lists
matrix = [[1,2,3],
          [1,2,3],
          [1,2,3],
          [1,2,3],
          [1,2,3],
          [1,2,3],
          [1,2,3]]
rows = len(matrix)
cols = len(matrix[0])
transposed = []
while len(transposed) < cols:
    transposed.append([])
    while len(transposed[-1]) < rows:
        transposed[-1].append(0)
for i in range(rows):
    for j in range(cols):
        transposed[j][i] = matrix[i][j]
for i in transposed:
    print(i)
Just for fun: If you then want to make them all into dicts.
In [1]: l = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
...: fruits = ["Apple", "Pear", "Peach",]
...: [dict(zip(fruits, j)) for j in [list(i) for i in zip(*l)]]
Out[1]:
[{'Apple': 1, 'Pear': 4, 'Peach': 7},
{'Apple': 2, 'Pear': 5, 'Peach': 8},
{'Apple': 3, 'Pear': 6, 'Peach': 9}]
Here is a solution for transposing a list of lists that is not necessarily square:
maxCol = len(l[0])
for row in l:
    rowLength = len(row)
    if rowLength > maxCol:
        maxCol = rowLength
lTrans = []
for colIndex in range(maxCol):
    lTrans.append([])
    for row in l:
        if colIndex < len(row):
            lTrans[colIndex].append(row[colIndex])
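The same loop can be condensed into a function. This is a sketch with a hypothetical name, transpose_ragged; like the loop above, it skips missing cells instead of padding them, so values from short rows do not keep their row position.

```python
def transpose_ragged(rows):
    """Transpose a possibly ragged list of lists without padding."""
    # The widest row decides how many output columns there are.
    max_cols = max((len(row) for row in rows), default=0)
    return [
        [row[c] for row in rows if c < len(row)]
        for c in range(max_cols)
    ]

print(transpose_ragged([[1, 2, 3], [4, 5], [6]]))
```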
# Import functions from library
from numpy import size, array
# Transpose a 2D list
def transpose_list_2d(list_in_mat):
    list_out_mat = []
    array_in_mat = array(list_in_mat)
    array_out_mat = array_in_mat.T
    nb_lines = size(array_out_mat, 0)
    for i_line_out in range(0, nb_lines):
        array_out_line = array_out_mat[i_line_out]
        list_out_line = list(array_out_line)
        list_out_mat.append(list_out_line)
    return list_out_mat

Python: group a list into sublists by equality of projected value

Is there a nice pythonic way of grouping a list into a list of lists where each of the inner lists contain only those elements that have the same projection, defined by the user as a function?
Example:
>>> x = [0, 1, 2, 3, 4, 5, 6, 7]
>>> groupby(x, projection=lambda e: e % 3)
[[0, 3, 6], [1, 4, 7], [2, 5]]
I don't care about the projection itself, just that if it is equal for some elements these must end up in the same sublist.
I'm basically looking for a Python equivalent of the Haskell function GHC.Exts.groupWith:
Prelude> import GHC.Exts
Prelude GHC.Exts> groupWith (`mod` 3) [0..7]
[[0,3,6],[1,4,7],[2,5]]
The itertools module in the standard-library contains a groupby() function that should do what you want.
Note that the input to groupby() should be sorted by the group key to yield each group only once, but it's easy to use the same key function for sorting. So if your key function (projection) computes a number's remainder modulo 3, it would look like this:
from itertools import groupby
x = [0, 1, 2, 3, 4, 5, 6, 7]
def projection(val):
    return val % 3
x_sorted = sorted(x, key=projection)
x_grouped = [list(it) for k, it in groupby(x_sorted, projection)]
print(x_grouped)
[[0, 3, 6], [1, 4, 7], [2, 5]]
Note that while this version only uses standard Python features, if you are dealing with more than maybe 100,000 values, you should look into pandas (see @ayhan's answer).
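To see why the sort matters, here is a short sketch comparing groupby() on the unsorted and the sorted input (the names unsorted_groups and sorted_groups are only illustrative). Without the sort, every change of key starts a new group.

```python
from itertools import groupby

def projection(val):
    return val % 3

x = [0, 1, 2, 3, 4, 5, 6, 7]

# Keys alternate 0, 1, 2, 0, 1, 2, ... so each element ends up alone.
unsorted_groups = [list(g) for _, g in groupby(x, projection)]

# Sorting by the same key first makes equal keys adjacent.
sorted_groups = [list(g) for _, g in groupby(sorted(x, key=projection), projection)]

print(unsorted_groups)
print(sorted_groups)
```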
No need to sort.
from collections import defaultdict
def groupby(iterable, projection):
    result = defaultdict(list)
    for item in iterable:
        result[projection(item)].append(item)
    return result
x = [0, 1, 2, 3, 4, 5, 6, 7]
groups = groupby(x, projection=lambda e: e % 3)
print(groups)
print(groups[0])
Output:
defaultdict(<class 'list'>, {0: [0, 3, 6], 1: [1, 4, 7], 2: [2, 5]})
[0, 3, 6]
A pandas version would be like this:
import pandas as pd
x = [0, 1, 2, 3, 4, 5, 6, 7]
pd.Series(x).groupby(lambda t: t%3).groups
Out[13]: {0: [0, 3, 6], 1: [1, 4, 7], 2: [2, 5]}
Or
pd.Series(x).groupby(lambda t: t%3).groups.values()
Out[32]: dict_values([[0, 3, 6], [1, 4, 7], [2, 5]])
Here is one approach using compress from itertools:
from itertools import compress
import numpy as np
L = [i %3 for i in x]
[list(compress(x, np.array(L)==i)) for i in set(L)]
#[[0, 3, 6], [1, 4, 7], [2, 5]]

Extract list of duplicate values and locations from array

Given an array a of length N, which is a list of integers, I want to extract the duplicate values, where I have a separate list for each value containing the location of the duplicates. In pseudo-math:
If |M| > 1:
val -> M = { i | a[i] == val }
Example (N=11):
a = [0, 3, 1, 6, 8, 1, 3, 3, 2, 10, 10]
should give the following lists:
3 -> [1, 6, 7]
1 -> [2, 5]
10 -> [9, 10]
I added the python tag since I'm currently programming in that language (numpy and scipy are available), but I'm more interested in a general algorithm of how to do it. Code examples are fine, though.
One idea, which I did not yet flesh out: Construct a list of tuples, pairing each entry of a with its index: (i, a[i]). Sort the list with the second entry as key, then check consecutive entries for which the second entry is the same.
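The sort-and-scan idea just described could look roughly like this. It is a sketch only: it uses itertools.groupby to do the "check consecutive entries" step, and the names pairs and duplicates are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

a = [0, 3, 1, 6, 8, 1, 3, 3, 2, 10, 10]

# Pair each entry with its index, then sort by value.
# sorted() is stable, so indices of equal values stay in ascending order.
pairs = sorted(enumerate(a), key=itemgetter(1))

duplicates = {}
for value, run in groupby(pairs, key=itemgetter(1)):
    positions = [index for index, _ in run]
    if len(positions) > 1:  # keep only values that occur more than once
        duplicates[value] = positions

print(duplicates)
```

The sort makes this O(N log N); the dictionary-based answers do the same job in a single pass.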
Here's an implementation using a python dictionary (actually a defaultdict, for convenience)
a = [0, 3, 1, 6, 8, 1, 3, 3, 2, 10, 10]
from collections import defaultdict
d = defaultdict(list)
for k, item in enumerate(a):
    d[item].append(k)
finalD = {key : value for key, value in d.items() if len(value) > 1} # Keep only the values that occur more than once.
print(finalD)
# {3: [1, 6, 7], 1: [2, 5], 10: [9, 10]}
The idea is to create a dictionary mapping the values to the list of the position where it appears.
This can be done in a simple way with setdefault. This can also be done using defaultdict.
>>> a = [0, 3, 1, 6, 8, 1, 3, 3, 2, 10, 10]
>>> dup={}
>>> for i,x in enumerate(a):
...     dup.setdefault(x,[]).append(i)
...
>>> dup
{0: [0], 3: [1, 6, 7], 1: [2, 5], 6: [3], 8: [4], 2: [8], 10: [9, 10]}
Then, actual duplicates can be extracted using a dict comprehension to filter out elements appearing only once (in Python 2, use dup.iteritems() instead of dup.items()).
>>> {i:x for i,x in dup.items() if len(x)>1}
{3: [1, 6, 7], 1: [2, 5], 10: [9, 10]}
Populate a dictionary whose keys are the values of the integers, and whose values are the lists of positions of those keys. Then go through that dictionary and remove all key/value pairs with only one position. You will be left with the ones that are duplicated.
