Split a list into increasing sequences using itertools - python

I have a list with mixed sequences like
[1,2,3,4,5,2,3,4,1,2]
I want to know how I can use itertools to split the list into increasing sequences cutting the list at decreasing points. For instance the above would output
[[1, 2, 3, 4, 5], [2, 3, 4], [1, 2]]
This is obtained by noting that the sequence first decreases at 2, so we cut the first piece there; the next decrease is at 1, so we cut again there.
Another example is with the sequence
[3,2,1]
the output should be
[[3], [2], [1]]
If the given sequence is already increasing, the result is the whole sequence as a single piece. For example
[1,2,3]
gives
[[1, 2, 3]]
For a repeating list like
[ 1, 2,2,2, 1, 2, 3, 3, 1,1,1, 2, 3, 4, 1, 2, 3, 4, 5, 6]
the output should be
[[1, 2, 2, 2], [1, 2, 3, 3], [1, 1, 1, 2, 3, 4], [1, 2, 3, 4, 5, 6]]
What I did to achieve this is define the following function
def splitter(L):
    result = []
    tmp = 0
    initialPoint = 0
    for i in range(len(L)):
        if L[i] < tmp:
            tmpp = L[initialPoint:i]
            result.append(tmpp)
            initialPoint = i
        tmp = L[i]
    result.append(L[initialPoint:])
    return result
The function works, but I would like to do the same with itertools to improve the efficiency of my code. Is there a way to do this with the itertools package and avoid the explicit loop?

With numpy, you can use numpy.split, which takes the indices of the split positions. Since you want to split where the value decreases, use numpy.diff to compute the differences between consecutive elements, check where they are negative, and use numpy.where to retrieve the corresponding indices (adding 1, because a negative difference at index i means the new piece starts at i + 1). An example with the last case in the question:
import numpy as np
lst = [ 1, 2,2,2, 1, 2, 3, 3, 1,1,1, 2, 3, 4, 1, 2, 3, 4, 5, 6]
np.split(lst, np.where(np.diff(lst) < 0)[0] + 1)
# [array([1, 2, 2, 2]),
# array([1, 2, 3, 3]),
# array([1, 1, 1, 2, 3, 4]),
# array([1, 2, 3, 4, 5, 6])]

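Psidom already has you covered with a good answer, but another NumPy solution would be to use scipy.signal.argrelmax to acquire the local maxima, then np.split. Note that argrelmax compares each element strictly against both neighbours, so it misses a drop that follows a plateau or another drop (for example, it finds no split points in [3, 2, 1]).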
import numpy as np
from scipy.signal import argrelmax
arr = np.random.randint(1000, size=10**6)
splits = np.split(arr, argrelmax(arr)[0]+1)

Assume your original input array:
a = [1, 2, 3, 4, 5, 2, 3, 4, 1, 2]
First find the places where the splits shall occur:
p = [ i+1 for i, (x, y) in enumerate(zip(a, a[1:])) if x > y ]
Then create slices for each such split:
print([ a[m:n] for m, n in zip([ 0 ] + p, p + [ None ]) ])
This will print this:
[[1, 2, 3, 4, 5], [2, 3, 4], [1, 2]]
I suggest using more descriptive names than p, n, m, etc. ;-)
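Since the question asks specifically for itertools, here is one possible sketch that avoids explicit index bookkeeping. The name split_increasing is made up for this example. It marks each drop with a 1, turns the marks into run ids with accumulate, and groups elements by run id with groupby. The work still happens element by element, so it is not guaranteed to be faster than the original loop.

```python
from itertools import accumulate, chain, groupby


def split_increasing(seq):
    """Split seq into maximal non-decreasing runs (hypothetical helper)."""
    seq = list(seq)
    if not seq:
        return []
    # 1 where an element is smaller than its predecessor, else 0;
    # the leading 0 stands for the first element, which never starts a cut.
    drops = chain([0], (int(b < a) for a, b in zip(seq, seq[1:])))
    # The running sum of the drop marks gives every element a run id.
    run_ids = accumulate(drops)
    return [
        [value for _, value in group]
        for _, group in groupby(zip(run_ids, seq), key=lambda pair: pair[0])
    ]


print(split_increasing([1, 2, 3, 4, 5, 2, 3, 4, 1, 2]))
# [[1, 2, 3, 4, 5], [2, 3, 4], [1, 2]]
```

Equal neighbours stay in the same run, which matches the repeating-list example in the question.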

Related

Slicing a list to extract the last k and next k elements given an index

I'm trying to write a smart snippet of code that does the following:
Given a list and an integer-valued parameter k:
k = 2
myList = [1, 2, 3, 4, 5]
I would like to find a way of slicing my list such that I can later construct the following dictionary:
{1: [5, 4, 2, 3],
2: [1, 5, 3, 4],
3: [2, 1, 4, 5],
4: [3, 2, 5, 1],
5: [4, 3, 1, 2]}
i.e., I need to slice my list and extract the previous k and next k elements around a given index (the order of the elements in the result does not matter).
For example, if my index is 0, then I would expect [5, 4, 2, 3].
My problem is similar to this question, but not exactly the same.
I'd appreciate any help, hint, or reference to any source.
You could do:
k = 2
myList = [1, 2, 3, 4, 5]
offset = len(myList)
padded_list = myList * 3
result = {myList[i]: padded_list[j - k: j][::-1] + padded_list[j + 1: j + k + 1]
          for i, j in zip(range(offset), range(offset, 2 * offset))}
print(result)
Output
{1: [5, 4, 2, 3], 2: [1, 5, 3, 4], 3: [2, 1, 4, 5], 4: [3, 2, 5, 1], 5: [4, 3, 1, 2]}
The idea is to pad the list with itself before and after, and then iterate over the middle section. This works without problems as long as k < len(myList).
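The same wrap-around can also be expressed with modulo indexing instead of padding. The following is only a sketch; the function name cyclic_neighbours is an assumption made for this example and does not come from the answer above.

```python
def cyclic_neighbours(items, k):
    """Map each element to its k previous and k next elements, wrapping around."""
    n = len(items)
    return {
        items[i]: [items[(i - d) % n] for d in range(1, k + 1)]    # previous k, nearest first
                  + [items[(i + d) % n] for d in range(1, k + 1)]  # next k, nearest first
        for i in range(n)
    }


print(cyclic_neighbours([1, 2, 3, 4, 5], 2))
# {1: [5, 4, 2, 3], 2: [1, 5, 3, 4], 3: [2, 1, 4, 5], 4: [3, 2, 5, 1], 5: [4, 3, 1, 2]}
```

Because every index is reduced modulo the list length, no copy of the list is needed; for k larger than the list length the neighbours simply repeat.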
I think simple python list slicing should suffice. Basically I sliced the list two times and concatenated the resulting two lists.
>>> l = [1, 2, 3, 4, 5]
>>> k = 2
>>> l[:k-1] + l[k:]
[1, 3, 4, 5]
Good luck!

How would you reshuffle this array efficiently?

I have an array arr_val, which stores the values of a certain function at a large number of locations (for illustration, let's take just 4). I also have another array, loc_array, which stores the locations, again with 4 entries. However, loc_array is multidimensional: each location index holds 4 sub-locations, and each sub-location is a pair of coordinates. To illustrate:
arr_val = np.array([1, 2, 3, 4])
loc_array = np.array([[[1,1],[2,3],[3,1],[3,2]],[[1,2],[2,4],[3,4],[4,1]],
[[2,1],[1,4],[1,3],[3,3]],[[4,2],[4,3],[2,2],[4,4]]])
The two arrays above mean that the value of some parameter of interest at, for example, locations [1,1],[2,3],[3,1],[3,2] is 1, and so on. However, I would like to re-express the same information in a different form: instead of scattered points, I would like to have the coordinates in the following tractable form
coord = [[[1,1],[1,2],[1,3],[1,4]],[[2,1],[2,2],[2,3],[2,4]],[[3,1],[3,2],
[3,3],[3,4]],[[4,1],[4,2],[4,3],[4,4]]]
and the values at respective coordinates given as
val = [[1, 2, 3, 3],[3, 4, 1, 2],[1, 1, 3, 2], [2, 4, 4, 4]]
What would be a very efficient way to achieve the above for large numpy arrays?
You can use lexsort like so:
>>> order = np.lexsort(loc_array.reshape(-1, 2).T[::-1])
>>> arr_val.repeat(4)[order].reshape(4, 4)
array([[1, 2, 3, 3],
       [3, 4, 1, 2],
       [1, 1, 3, 2],
       [2, 4, 4, 4]])
If you know for sure that loc_array is a permutation of all possible locations then you can avoid the sort:
>>> out = np.empty((4, 4), arr_val.dtype)
>>> out.ravel()[np.ravel_multi_index((loc_array-1).reshape(-1, 2).T, (4, 4))] = arr_val.repeat(4)
>>> out
array([[1, 2, 3, 3],
       [3, 4, 1, 2],
       [1, 1, 3, 2],
       [2, 4, 4, 4]])
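To see what the index arithmetic above does, here is a plain-Python sketch of the same scatter for the small example. It is meant only as a readable reference for checking a vectorised result on small inputs, not as an efficient replacement; the variable name grid and the assumption that coordinates are 1-based and cover the whole 4 x 4 grid come from the example data.

```python
arr_val = [1, 2, 3, 4]
loc_array = [
    [[1, 1], [2, 3], [3, 1], [3, 2]],
    [[1, 2], [2, 4], [3, 4], [4, 1]],
    [[2, 1], [1, 4], [1, 3], [3, 3]],
    [[4, 2], [4, 3], [2, 2], [4, 4]],
]

n = 4
grid = [[None] * n for _ in range(n)]
for value, locations in zip(arr_val, loc_array):
    for row, col in locations:
        # Coordinates in the example are 1-based, list indices are 0-based.
        grid[row - 1][col - 1] = value

print(grid)
# [[1, 2, 3, 3], [3, 4, 1, 2], [1, 1, 3, 2], [2, 4, 4, 4]]
```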
This may not be the answer you want, but it works.
val = [[1, 2, 3, 3],[3, 4, 1, 2],[1, 1, 3, 2], [2, 4, 4, 4]]
temp = ""
int_list = []
for element in val:
    temp_int = temp.join(map(str, element))
    int_list.append(int(temp_int))
int_list.sort()
print(int_list)
## result ##
[1132, 1233, 2444, 3412]
1. Convert each row into an int and build int_list.
2. Sort int_list.
3. Construct a 2D np.array from int_list.
I skipped the last step; you can find how to do it on the web.

Repeat list if index range is out of bounds

I have a Python list
a = [1, 2, 3, 4]
and I'd like to get a range of indices such that if I select the indices 0 through N, I'm getting (for N=10) the repeated
[1, 2, 3, 4, 1, 2, 3, 4, 1, 2]
I could of course repeat the list via (int(float(N) / len(a) - 0.5) + 1) * a first and select the range [0:10] out of that, but that feels rather clumsy.
Any hints?
You can simply use the modulo operator when accessing the list, i.e.
a[i % len(a)]
This will give you the same result, but doesn't require actually storing the redundant elements.
You can use itertools.cycle and itertools.islice:
from itertools import cycle, islice
my_list = list(islice(cycle(my_list), 10))
Note that if you just want to iterate over this once, you should avoid calling list and just iterate over the iterable, since this avoids allocating repeated elements.
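As a small sketch of that lazy variant: the helper name take_cyclic is made up for this example, and it returns an iterator rather than a list, so nothing is copied until the caller consumes it.

```python
from itertools import cycle, islice


def take_cyclic(items, n):
    """Lazily yield the first n elements of items repeated cyclically."""
    return islice(cycle(items), n)


a = [1, 2, 3, 4]
for value in take_cyclic(a, 10):
    print(value, end=" ")
# 1 2 3 4 1 2 3 4 1 2
```

Wrapping the call in list(...) gives the materialised list when one is really needed.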
One easy way is to use modulo with list comprehensions à la
a = [1, 2, 3, 4]
[a[k % len(a)] for k in range(10)]
>>> a = [1, 2, 3, 4]
>>> (a*3)[:-2]
[1, 2, 3, 4, 1, 2, 3, 4, 1, 2]
Thought I would offer a solution using the * operator for lists.
def repeat_iterable(a, N):
    # Floor division keeps factor an int; one extra copy guarantees at least N items
    factor = N // len(a) + 1
    repeated_list = a * factor
    return repeated_list[:N]
Sample Output:
>>> print(repeat_iterable([1, 2, 3, 4], 10))
[1, 2, 3, 4, 1, 2, 3, 4, 1, 2]
>>> print(repeat_iterable([1, 2, 3, 4], 3))
[1, 2, 3]
>>> print(repeat_iterable([1, 2, 3, 4], 0))
[]
>>> print(repeat_iterable([1, 2, 3, 4], 14))
[1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2]
How about faking it? Python is good at faking.
class InfiniteList(object):
    def __init__(self, data):
        self.data = data
    def __getitem__(self, i):
        return self.data[i % len(self.data)]
x = InfiniteList([10, 20, 30])
x[0]   # 10
x[34]  # 20
Of course, you could add __iter__, support for slices etc. You could also add a limit (N), but this is the general idea.

Generating circular shifts / reduced Latin Squares in Python

Was just wondering what's the most efficient way of generating all the circular shifts of a list in Python. In either direction. For example, given a list [1, 2, 3, 4], I want to generate either:
[[1, 2, 3, 4],
[4, 1, 2, 3],
[3, 4, 1, 2],
[2, 3, 4, 1]]
where the next permutation is generated by moving the last element to the front, or:
[[1, 2, 3, 4],
[2, 3, 4, 1],
[3, 4, 1, 2],
[4, 1, 2, 3]]
where the next permutation is generated by moving the first element to the back.
The second case is slightly more interesting to me because it results in a reduced Latin square (the first case also gives a Latin square, just not reduced), which is what I'm trying to use to do experimental block design. It actually isn't that different from the first case since they're just re-orderings of each other, but order does still matter.
The current implementation I have for the first case is:
def gen_latin_square(mylist):
    tmplist = mylist[:]
    latin_square = []
    for i in range(len(mylist)):
        latin_square.append(tmplist[:])
        tmplist = [tmplist.pop()] + tmplist
    return latin_square
For the second case it's:
def gen_latin_square(mylist):
    tmplist = mylist[:]
    latin_square = []
    for i in range(len(mylist)):
        latin_square.append(tmplist[:])
        tmplist = tmplist[1:] + [tmplist[0]]
    return latin_square
The first case seems like it should be reasonably efficient to me, since it uses pop(), but you can't do that in the second case, so I'd like to hear ideas about how to do this more efficiently. Maybe there's something in itertools that will help? Or maybe a double-ended queue for the second case?
You can use collections.deque:
from collections import deque
g = deque([1, 2, 3, 4])
for i in range(len(g)):
    print(list(g))  # or do anything with the permutation
    g.rotate(1)     # right rotation; use g.rotate(-1) for left rotation
It prints:
[1, 2, 3, 4]
[4, 1, 2, 3]
[3, 4, 1, 2]
[2, 3, 4, 1]
To change it for left rotation just replace g.rotate(1) with g.rotate(-1).
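Wrapped in a function that returns the square instead of printing it, the deque approach could look like the sketch below. The function name circular_shifts and the direction parameter are made up for this example.

```python
from collections import deque


def circular_shifts(items, direction=1):
    """Return all rotations of items; direction=1 rotates right, -1 rotates left."""
    d = deque(items)
    shifts = []
    for _ in range(len(d)):
        shifts.append(list(d))  # copy the current rotation
        d.rotate(direction)     # rotate in place for the next row
    return shifts


print(circular_shifts([1, 2, 3, 4], direction=-1))
# [[1, 2, 3, 4], [2, 3, 4, 1], [3, 4, 1, 2], [4, 1, 2, 3]]
```

With direction=-1 the result is the reduced Latin square from the question; with the default direction=1 it is the first variant.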
For the first part, the most concise way probably is
a = [1, 2, 3, 4]
n = len(a)
[[a[i - j] for i in range(n)] for j in range(n)]
# [[1, 2, 3, 4], [4, 1, 2, 3], [3, 4, 1, 2], [2, 3, 4, 1]]
and for the second part
[[a[i - j] for i in range(n)] for j in range(n, 0, -1)]
# [[1, 2, 3, 4], [2, 3, 4, 1], [3, 4, 1, 2], [4, 1, 2, 3]]
These should also be much more efficient than your code, though I did not do any timings.
A variation on the slicing "conservation law" a = a[:i] + a[i:]:
ns = list(range(5))
ns
Out[34]: [0, 1, 2, 3, 4]
[ns[i:] + ns[:i] for i in range(len(ns))]
Out[36]:
[[0, 1, 2, 3, 4],
[1, 2, 3, 4, 0],
[2, 3, 4, 0, 1],
[3, 4, 0, 1, 2],
[4, 0, 1, 2, 3]]
[ns[-i:] + ns[:-i] for i in range(len(ns))]
Out[38]:
[[0, 1, 2, 3, 4],
[4, 0, 1, 2, 3],
[3, 4, 0, 1, 2],
[2, 3, 4, 0, 1],
[1, 2, 3, 4, 0]]
more_itertools is a third-party library that offers a tool for cyclic permutations:
import more_itertools as mit
mit.circular_shifts(range(1, 5))
# [(1, 2, 3, 4), (2, 3, 4, 1), (3, 4, 1, 2), (4, 1, 2, 3)]
See also Wikipedia:
A circular shift is a special kind of cyclic permutation, which in turn is a special kind of permutation.
The answer by @Bruno Lenzi does not seem to work:
In [10]: from itertools import cycle
In [11]: x = cycle('ABCD')
In [12]: print [[x.next() for _ in range(4)] for _ in range(4)]
[['A', 'B', 'C', 'D'], ['A', 'B', 'C', 'D'], ['A', 'B', 'C', 'D'], ['A', 'B', 'C', 'D']]
I give a correct version below, however the solution by @f5r5e5d is faster.
In [45]: def use_cycle(a):
   ....:     x=cycle(a)
   ....:     for _ in a:
   ....:         x.next()
   ....:         print [x.next() for _ in a]
   ....:
In [46]: use_cycle([1,2,3,4])
[2, 3, 4, 1]
[3, 4, 1, 2]
[4, 1, 2, 3]
[1, 2, 3, 4]
In [50]: def use_slice(a):
   ....:     print [ a[n:] + a[:n] for n in range(len(a)) ]
   ....:
In [51]: use_slice([1,2,3,4])
[[1, 2, 3, 4], [2, 3, 4, 1], [3, 4, 1, 2], [4, 1, 2, 3]]
In [54]: timeit.timeit('use_cycle([1,2,3,4])','from __main__ import use_cycle',number=100000)
Out[54]: 0.4884989261627197
In [55]: timeit.timeit('use_slice([1,2,3,4])','from __main__ import use_slice',number=100000)
Out[55]: 0.3103291988372803
In [58]: timeit.timeit('use_cycle([1,2,3,4]*100)','from __main__ import use_cycle',number=100)
Out[58]: 2.4427831172943115
In [59]: timeit.timeit('use_slice([1,2,3,4]*100)','from __main__ import use_slice',number=100)
Out[59]: 0.12029695510864258
I removed the print statement in use_cycle and use_slice for timing purposes.
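The session above is Python 2 (print statement, x.next()). A rough Python 3 port for repeating the comparison might look like the sketch below; the names shifts_with_cycle, shifts_with_slices and compare are assumptions for this example, and the timings will differ from the numbers quoted above depending on machine and interpreter.

```python
import timeit
from itertools import cycle


def shifts_with_cycle(a):
    x = cycle(a)
    out = []
    for _ in a:
        next(x)  # skip one element so the next row starts one position later
        out.append([next(x) for _ in a])
    return out


def shifts_with_slices(a):
    return [a[n:] + a[:n] for n in range(len(a))]


def compare(data, number=100):
    """Return (cycle_seconds, slice_seconds) for `number` runs of each variant."""
    t_cycle = timeit.timeit(lambda: shifts_with_cycle(data), number=number)
    t_slice = timeit.timeit(lambda: shifts_with_slices(data), number=number)
    return t_cycle, t_slice


print(compare([1, 2, 3, 4] * 100))
```

Both functions return the rows instead of printing them, which matches the note about removing print for timing. The two variants produce the same rows, but the cycle version starts with the first shift rather than the unshifted list.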
Using itertools to avoid indexing:
x = itertools.cycle(a)
[[x.next() for i in a] for j in a]
This will be my solution.
# given list
a = [1, 2, 3, 4]
# loop once per element
for i in range(len(a)):
    # insert the last element at the front
    a.insert(0, a[len(a) - 1])
    # drop the (now duplicated) last element
    a = a[:len(a) - 1]
    # print if you want to
    print(a)
This will output the following:
[4, 1, 2, 3]
[3, 4, 1, 2]
[2, 3, 4, 1]
[1, 2, 3, 4]
You can also use pop instead of using list slicing but the problem with pop is that it will return something.
Also the above code will work for any length of list. I have not checked for performance of the code. I am assuming that it will work better.
You should have a look at Python docs for getting a good understanding of List slicing.

python: sampling without replacement from a 2D grid

I need a sample, without replacement, from among all possible tuples of numbers from range(n). That is, I have a collection of (0,0), (0,1), ..., (0,n), (1,0), (1,1), ..., (1,n), ..., (n,0), (n,1), (n,n), and I'm trying to get a sample of k of those elements. I am hoping to avoid explicitly building this collection.
I know random.sample(range(n), k) is simple and efficient if I needed a sample from a sequence of numbers rather than tuples of numbers.
Of course, I can explicitly build the list containing all possible (n * n = n^2) tuples, and then call random.sample. But that probably is not efficient if k is much smaller than n^2.
I am not sure if things work the same in Python 2 and 3 in terms of efficiency; I use Python 3.
Depending on how many of these you're selecting, it might be simplest to just keep track of what things you've already picked (via a set) and then re-pick until you get something that you haven't picked already.
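A minimal sketch of that re-picking idea, assuming the grid is range(n) x range(n); the function name sample_grid_by_rejection is made up for this example. It works best when k is small compared with n * n, because collisions become frequent as the grid fills up.

```python
import random


def sample_grid_by_rejection(n, k):
    """Pick k distinct (row, col) pairs from an n x n grid by re-drawing duplicates."""
    if k > n * n:
        raise ValueError("cannot sample more cells than the grid contains")
    chosen = set()
    while len(chosen) < k:
        # Adding an already chosen pair leaves the set unchanged, so we just retry.
        chosen.add((random.randrange(n), random.randrange(n)))
    return list(chosen)


print(sample_grid_by_rejection(1000, 5))
```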
The other option is to just use some simple math:
numbers_in_nxn = random.sample(range(n*n), k) # Use xrange in Python 2.x
tuples_in_nxn = [divmod(x,n) for x in numbers_in_nxn]
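Packaged as a function, the divmod approach above could be sketched as follows. The name sample_grid is an assumption for this example. In Python 3, range(n * n) is a lazy sequence, so random.sample does not need the full list of cells in memory.

```python
import random


def sample_grid(n, k):
    """Sample k distinct (row, col) pairs from an n x n grid without building it."""
    # Each integer in range(n * n) encodes exactly one cell: x == row * n + col.
    return [divmod(x, n) for x in random.sample(range(n * n), k)]


print(sample_grid(1000, 5))
```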
You say:
Of course, I can explicitly build the list containing all possible (n * n = n^2) tuples, and then call random.sample. But that probably is not efficient if k is much smaller than n^2.
Well, how about building the tuple after you have randomly picked one? That is, if you can build the tuples before you randomly choose which ones to pick, you can instead do the picking first and the building later.
I don't understand exactly how your tuples are supposed to look, but here is an example. Your tuples are all of the same length, unlike these sequences, but it shows the principle:
Instead of doing this:
>>> import random
>>> all_sequences = [range(x) for x in range(10)]
>>> all_sequences
[[], [0], [0, 1], [0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5, 6], [0, 1, 2, 3, 4, 5, 6, 7], [0, 1, 2, 3, 4, 5, 6, 7, 8]]
>>> random.sample(all_sequences, 3)
[[0, 1, 2, 3, 4, 5, 6, 7], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5, 6, 7, 8]]
You would do this:
>>> import random
>>> selection = random.sample(range(10), 3)
>>> [range(x) for x in selection]  # builds only the 3 selected sequences; output varies per run
Without trying (no python at hand):
random.shuffle(range(n))[:k]
see comments. Didn't sleep enough...
