Iterating over consecutive sublists in Python - python

Does Python offer a way to iterate over all "consecutive sublists" of a given list L - i.e. sublists of L where any two consecutive elements are also consecutive in L - or should I write my own?
(Example: if L = [1, 2, 3], then the set over which I want to iterate is {[1], [2], [3], [1, 2], [2,3], [1, 2, 3]}. [1, 3] is skipped since 1 and 3 are not consecutive in L.)

I don't think there's a built-in for exactly that, but it wouldn't be difficult to code by hand: loop over every possible length from 1 to len(L), and take all sublists of each length.
You could use itertools.chain() to combine the sequences for each length into a single iterator over all of them.
Example:
>>> import itertools
>>> a = [1,2,3,4]
>>> list(
...     itertools.chain(
...         *[[a[i:i+q] for q in range(1,len(a)-i+1)] for i in range(len(a))]
...     )
... )
[[1], [1, 2], [1, 2, 3], [1, 2, 3, 4], [2], [2, 3], [2, 3, 4], [3], [3, 4], [4]]
If you prefer them in the increasing-length-then-lexicographical order that you described, you'd want this instead:
itertools.chain(*[[a[q:i+q] for q in range(len(a)-i+1)] for i in range(1,len(a)+1)])
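As a lazy variant of the same idea, here is a minimal sketch that uses itertools.chain.from_iterable with generator expressions, so no intermediate list of lists is built. The function name consecutive_sublists is made up for this illustration, and the code assumes Python 3.

```python
from itertools import chain


def consecutive_sublists(seq):
    """Lazily yield every consecutive sublist of seq, shortest first."""
    n = len(seq)
    # Outer generator: one inner generator per sublist length.
    # Inner generator: every start position that fits that length.
    return chain.from_iterable(
        (seq[start:start + length] for start in range(n - length + 1))
        for length in range(1, n + 1)
    )


print(list(consecutive_sublists([1, 2, 3])))
# [[1], [2], [3], [1, 2], [2, 3], [1, 2, 3]]
```

Because chain.from_iterable exhausts each inner generator before asking for the next one, each slice is only produced when the caller requests it.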

Try something like this:
def iter_sublists(l):
    n = len(l)+1
    for i in range(n):
        for j in range(i+1, n):
            yield l[i:j]

>>> print(list(iter_sublists([1,2,3])))
[[1], [1, 2], [1, 2, 3], [2], [2, 3], [3]]
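The two nested loops above enumerate every pair of cut positions i < j. The same pairs can probably be obtained from itertools.combinations, as in this sketch (the function name iter_sublists_by_index_pairs is hypothetical, chosen only for this example):

```python
from itertools import combinations


def iter_sublists_by_index_pairs(seq):
    # Each pair i < j of cut positions in 0..len(seq) gives one slice seq[i:j].
    for i, j in combinations(range(len(seq) + 1), 2):
        yield seq[i:j]


print(list(iter_sublists_by_index_pairs([1, 2, 3])))
# [[1], [1, 2], [1, 2, 3], [2], [2, 3], [3]]
```

A list of length n has n + 1 cut positions, so this yields n * (n + 1) / 2 sublists.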

This should work:
def sublists(lst):
    for sublen in range(1,len(lst)+1):
        for idx in range(0,len(lst)-sublen+1):
            yield lst[idx:idx+sublen]

Related

Understanding recursion using power set example

I have written a simple piece of code to print all subsets of a set using recursion. I have a hard time understanding the output. For example, the first line in the output shows an empty set and a singleton of 3 whereas I was expecting an empty set and singleton of 1 to be printed followed by an empty set, singleton of 1, singleton of 2 etc. However, that is not what gets printed. I do not know how to visualize recursion tree. Are there any general techniques to accomplish the visualisation? I tried drawing a tree but it quickly gets confusing.
def subsets(self, nums):
    inp = nums
    out = []
    result=[]
    def helper(inp,out,index):
        if index==len(inp):
            result.append(out)
            return
        helper(inp,out,index+1)
        helper(inp,out+[inp[index]],index+1)
        print(result)
    helper(inp,out,0)
    return result
The output from the print statement for the input '[1,2,3]' is shown below
[[], [3]]
[[], [3], [2], [2, 3]]
[[], [3], [2], [2, 3]]
[[], [3], [2], [2, 3], [1], [1, 3]]
[[], [3], [2], [2, 3], [1], [1, 3], [1, 2], [1, 2, 3]]
[[], [3], [2], [2, 3], [1], [1, 3], [1, 2], [1, 2, 3]]
[[], [3], [2], [2, 3], [1], [1, 3], [1, 2], [1, 2, 3]]
If you add an "indentation" parameter to your function, while you explore it, you can immediately see which function calls which:
def subsets(nums):
    inp = nums
    out = []
    result=[]
    def helper(indent,inp,out,index):
        print(f"{indent}->helper({inp},{out},{index})")
        if index==len(inp):
            result.append(out)
            return
        helper(indent+'--',inp,out,index+1)
        helper(indent+'--',inp,out+[inp[index]],index+1)
    helper('',inp,out,0)
    return result
The result will look like:
->helper([1, 2, 3],[],0)
--->helper([1, 2, 3],[],1)
----->helper([1, 2, 3],[],2)
------->helper([1, 2, 3],[],3)
------->helper([1, 2, 3],[3],3)
----->helper([1, 2, 3],[2],2)
------->helper([1, 2, 3],[2],3)
------->helper([1, 2, 3],[2, 3],3)
--->helper([1, 2, 3],[1],1)
----->helper([1, 2, 3],[1],2)
------->helper([1, 2, 3],[1],3)
------->helper([1, 2, 3],[1, 3],3)
----->helper([1, 2, 3],[1, 2],2)
------->helper([1, 2, 3],[1, 2],3)
------->helper([1, 2, 3],[1, 2, 3],3)
So you can immediately see why you get [] first: you get it when you go all the way through the list without including anything in the result. You get [3] next because you backtrack to the call where you add 3 and then go to the end. You get [2] by backtracking a bit further, to where you include 2 in the output, and then down the path that doesn't add 3. Then you get [2,3] because you backtrack one level up, to the call that has 2 included in the result, and this time take the path that adds 3.
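To check that order without reading a trace, here is a small sketch that rewrites the same recursion as a generator, so the subsets come out in exactly the order the leaf calls are reached. The name subsets_in_visit_order is made up for this example, and it assumes Python 3 (for yield from).

```python
def subsets_in_visit_order(nums):
    def helper(out, index):
        if index == len(nums):
            # Leaf call: every element has been either skipped or included.
            yield out
            return
        # First explore the branch that skips nums[index] ...
        yield from helper(out, index + 1)
        # ... then the branch that includes it.
        yield from helper(out + [nums[index]], index + 1)

    return list(helper([], 0))


print(subsets_in_visit_order([1, 2, 3]))
# [[], [3], [2], [2, 3], [1], [1, 3], [1, 2], [1, 2, 3]]
```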
It probably isn't the easiest way to compute a power set, though. There is a one-to-one correspondence between the subsets of a set of size n and the binary numbers between 0 and 2**n-1. For each number, the 1-bits indicate which elements to include in the subset. So you can also compute the power set like this:
def subsets(nums):
    return [
        [nums[j] for j, b in enumerate(reversed(format(i, 'b'))) if b == '1']
        for i in range(2**len(nums))
    ]
It runs in exponential time, but so does the recursive version, and that is unavoidable when the output is exponential in the size of the input.
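Another option, sketched here along the lines of the powerset recipe in the itertools documentation, is to chain together the combinations of every size. Note that it yields tuples rather than lists, and in order of increasing size.

```python
from itertools import chain, combinations


def powerset(iterable):
    # Materialise the input so its length is known.
    items = list(iterable)
    # Combinations of size 0, then size 1, ..., up to len(items).
    return chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )


print(list(powerset([1, 2, 3])))
# [(), (1,), (2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]
```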

Indices of duplicate lists in a nested list

I am trying to solve a problem that is a part of my genome alignment project. The problem goes as follows:
if given a nested list
y = [[1,2,3],[1,2,3],[3,4,5],[6,5,4],[4,2,5],[4,2,5],[1,2,8],[1,2,3]]
extract indices of unique lists into a nested list again.
For example, the output for the above nested list should be
[[0,1,7],[2],[3],[4,5],[6]].
This is because list [1,2,3] is present in 0,1,7th index positions, [3,4,5] in 2nd index position and so on.
Since I will be dealing with large lists, what could be the most optimal way of achieving this in Python?
You could create a dictionary (or an OrderedDict on Python versions before 3.7, where plain dicts do not guarantee insertion order). The keys of the dict will be tuples of the sub-lists and the values will be lists of indexes. After looping through, the dictionary values will hold your answer:
from collections import OrderedDict
y = [[1,2,3],[1,2,3],[3,4,5],[6,5,4],[4,2,5],[4,2,5],[1,2,8],[1,2,3]]
lookup = OrderedDict()
for idx,l in enumerate(y):
    lookup.setdefault(tuple(l), []).append(idx)
list(lookup.values())
# [[0, 1, 7], [2], [3], [4, 5], [6]]
You could use a list comprehension and range to find the indexes of each duplicate and append them to the result. Note that this rescans the whole list for every element, so it takes quadratic time.
result = []
for num in range(len(y)):
    occurrences = [i for i, x in enumerate(y) if x == y[num]]
    if occurrences not in result:
        result.append(occurrences)
result
#[[0, 1, 7], [2], [3], [4, 5], [6]]
Consider numpy to solve this:
import numpy as np
y = [
[1, 2, 3],
[1, 2, 3],
[3, 4, 5],
[6, 5, 4],
[4, 2, 5],
[4, 2, 5],
[1, 2, 8],
[1, 2, 3]
]
# Returns unique values of array, indices of that
# array, and the indices that would rebuild the original array
unique, indices, inverse = np.unique(y, axis=0, return_index=True, return_inverse=True)
Here's a print out of each variable:
unique = [
[1 2 3]
[1 2 8]
[3 4 5]
[4 2 5]
[6 5 4]]
indices = [0 6 2 4 3]
inverse = [0 0 2 4 3 3 1 0]
If we look at inverse, we can see that we do indeed get [0, 1, 7] as the index positions of our first unique element [1,2,3]. All we need to do now is group them appropriately.
new_list = []
for i in np.argsort(indices):
    new_list.append(np.where(inverse == i)[0].tolist())
Output:
new_list = [[0, 1, 7], [2], [3], [4, 5], [6]]
Finally, refs for the code above:
Numpy - unique, where, argsort
One more solution:
y = [[1, 2, 3], [1, 2, 3], [3, 4, 5], [6, 5, 4], [4, 2, 5], [4, 2, 5], [1, 2, 8], [1, 2, 3]]
occurrences = {}
for i, v in enumerate(y):
    v = tuple(v)
    if v not in occurrences:
        occurrences.update({v: []})
    occurrences[v].append(i)
print(list(occurrences.values()))
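The same grouping can be written a little more compactly with collections.defaultdict, which creates the empty list on first access. This is only a sketch. It assumes Python 3.7 or later, where plain dicts (and defaultdict) keep insertion order, and the name index_groups is made up for the example.

```python
from collections import defaultdict

y = [[1, 2, 3], [1, 2, 3], [3, 4, 5], [6, 5, 4],
     [4, 2, 5], [4, 2, 5], [1, 2, 8], [1, 2, 3]]

index_groups = defaultdict(list)
for i, sub in enumerate(y):
    # Lists are unhashable, so use a tuple copy as the dictionary key.
    index_groups[tuple(sub)].append(i)

print(list(index_groups.values()))
# [[0, 1, 7], [2], [3], [4, 5], [6]]
```

This makes a single pass over the input, so the work grows linearly with the number of sub-lists.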

Data structure to represent multiple equivalent keys in set in Python?

I am looking for the right data structure to meet the following requirement.
There are multiple arrays with unordered elements, for example,
[1, 2], [2, 1], [3, 2, 2], [2], [2, 1, 3], [2, 2, 3]
After processing the data, the result is,
[1, 2], [2, 2, 3], [2], [1, 2, 3]
with the elements of each array sorted and duplicate arrays filtered out.
Here are my thoughts:
Data structure Set(Arrays)? - Failed. It seems there is only one array in the built-in set
set([])
Data structure Array(Sets)? - Failed. There are no duplicate elements in the built-in set. Is there a data structure in Python like the multiset in C++?
Transform each list into a tuple (so that it can be an item of a set), then back into a list.
>>> [list(i) for i in set([tuple(sorted(i)) for i in a])]
[[1, 2], [2], [2, 2, 3], [1, 2, 3]]
lst = [[1, 2], [2, 1], [3, 2, 2], [2], [2, 1, 3], [2, 2, 3]]
list(map(list, set(map(tuple, map(sorted, lst)))))
Output:
[[1, 2], [2], [2, 2, 3], [1, 2, 3]]
Try this:
[list(i) for i in set(map(tuple, a))]
EDIT:
This assumes that each list is already sorted. Thanks to #PM2RING for reminding me.
If not, add this line above:
a = [sorted(i) for i in a]
Thanks again to #PM2RING for the one-liner:
[list(i) for i in set(map(tuple, (sorted(i) for i in a)))]
Some of the solutions currently here are destroying ordering. I'm not sure if that's important to you or not, but here is a version which preserves original ordering:
>>> from collections import OrderedDict
>>> A = [[1, 2], [2, 1], [3, 2, 2], [2], [2, 1, 3], [2, 2, 3]]
>>> [list(k) for k in OrderedDict.fromkeys(tuple(sorted(a)) for a in A)]
[[1, 2], [2, 2, 3], [2], [1, 2, 3]]
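On Python 3.7 and later, plain dicts also keep insertion order, so the same trick should work without the import. A minimal sketch under that assumption:

```python
A = [[1, 2], [2, 1], [3, 2, 2], [2], [2, 1, 3], [2, 2, 3]]

# dict.fromkeys keeps the first occurrence of each key, in insertion order.
unique_sorted = [list(k) for k in dict.fromkeys(tuple(sorted(a)) for a in A)]

print(unique_sorted)
# [[1, 2], [2, 2, 3], [2], [1, 2, 3]]
```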
No, Python doesn't have a built-in multiset; the closest equivalent in the standard modules is collections.Counter, which is a type of dictionary. A Counter may be suitable for your needs, but it's hard to tell without more context.
Note that sets do not preserve order of addition. If you need to preserve the initial ordering of the lists, you can do what you want like this:
data = [[1, 2], [2, 1], [3, 2, 2], [2], [2, 1, 3], [2, 2, 3]]
a = set()
outlist = []
for s in data:
    t = tuple(sorted(s))
    if t not in a:
        a.add(t)
        outlist.append(list(t))
print(outlist)
output
[[1, 2], [2, 2, 3], [2], [1, 2, 3]]
If the number of input lists is fairly small you don't need the set (and the list<->tuple conversions), just test membership in outlist. However, that's not efficient for larger input lists since it performs a linear search on the list.
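For the collections.Counter suggestion mentioned above, here is one way it might be used as a multiset for this task. It is a sketch rather than a recommendation: the variable names are illustrative, and freezing the Counter's items into a frozenset is just one possible way to get a hashable key.

```python
from collections import Counter

data = [[1, 2], [2, 1], [3, 2, 2], [2], [2, 1, 3], [2, 2, 3]]

# Counters compare equal when every element has the same count,
# regardless of the order in which the elements were seen.
print(Counter([3, 2, 2]) == Counter([2, 2, 3]))  # True
print(Counter([1, 2]) == Counter([2]))           # False

seen = set()
unique = []
for items in data:
    counts = Counter(items)
    # A Counter is not hashable, so freeze its (element, count) pairs
    # to get a key that can be stored in a set.
    key = frozenset(counts.items())
    if key not in seen:
        seen.add(key)
        unique.append(sorted(counts.elements()))

print(unique)
# [[1, 2], [2, 2, 3], [2], [1, 2, 3]]
```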

How to retrieve list(s) that contains specific query items

I am trying to group list of items relevant to a query item. Below is an example of the problem and my attempt at it:
>>> _list=[[1,2,3],[2,3,4]]
>>> querylist=[1,2,4]
>>> relvant=[]
>>> for x in querylist:
...     for y in _list:
...         if x in y:
...             relvant.append(y)
...
My output:
>>> relvant
[[1, 2, 3], [1, 2, 3], [2, 3, 4], [2, 3, 4]]
Desired output:
[[[1, 2, 3]], [[1, 2, 3], [2, 3, 4]],[[2, 3, 4]]]
The issue is after each loop of a query item, I expected the relevant lists to be grouped but that isn't the case with my attempt.
Thanks for your suggestions.
I think it's clearer to use a list comprehension:
>>> _list = [[1,2,3],[2,3,4]]
>>> querylist = [1,2,4]
>>> [[l for l in _list if x in l] for x in querylist]
[[[1, 2, 3]], [[1, 2, 3], [2, 3, 4]], [[2, 3, 4]]]
The inner expression [l for l in _list if x in l] describes the list of all sublists that contain x. The outer expression's job is to get that list for all values of x in the query list.
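The comprehension above rescans every sublist for every query item. If the query list is long, one possible alternative is to build a lookup table once and then read the groups from it. The following is only a sketch of that idea; the names containing and relevant are made up for the example.

```python
from collections import defaultdict

_list = [[1, 2, 3], [2, 3, 4]]
querylist = [1, 2, 4]

# Map each item to the sublists that contain it (built in one pass).
containing = defaultdict(list)
for sub in _list:
    for item in set(sub):  # set() avoids adding a sublist twice for repeated items
        containing[item].append(sub)

# .get() returns an empty list for items that appear in no sublist,
# without adding a new key to the table.
relevant = [containing.get(x, []) for x in querylist]

print(relevant)
# [[[1, 2, 3]], [[1, 2, 3], [2, 3, 4]], [[2, 3, 4]]]
```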
With minimal changes to your code, you can create a temporary list to collect the matches for each query item, and append it to the main list once the inner loop has finished.
_list=[[1,2,3],[2,3,4]]
querylist=[1,2,4]
relvant=[]
for x in querylist:
    dummy = []
    for y in _list:
        if x in y:
            dummy.append(y)
    relvant.append(dummy)
print(relvant)
# [[[1, 2, 3]], [[1, 2, 3], [2, 3, 4]], [[2, 3, 4]]]

A strange behavior when I append to a list in Python

I am working on the Josephus problem, but the result is not what I expected. Why?
def J(n,x):
    li=list(range(1,n+1))
    k=0
    res=[]
    while len(li)>1:
        k= (x+k-1) % len(li)
        li.pop(k)
        res.append(li)
        #print(li)
    return res
print(J(5,3))
Expected Output:
[1, 2, 4, 5]
[2, 4, 5]
[2, 4]
[4]
Actual Output:
[[4], [4], [4], [4]]
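You need to append a copy of the list here:
res.append(li[:]) # <-- not res.append(li) !!!
The underlying reason is that a list is a mutable data structure in Python: res.append(li) stores a reference to the same list object each time, so every later pop() shows up in all of the entries. Look at this snippet: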
>>> l = [1,2,3]
>>> p = [l,l,l]
>>> p
[[1, 2, 3], [1, 2, 3], [1, 2, 3]]
>>> l.pop()
3
>>> p
[[1, 2], [1, 2], [1, 2]]
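Putting the fix into the original function gives something like the following sketch. The function is renamed josephus_steps here purely for readability, and list(range(...)) is used so that it also runs on Python 3.

```python
def josephus_steps(n, x):
    li = list(range(1, n + 1))
    k = 0
    res = []
    while len(li) > 1:
        k = (x + k - 1) % len(li)
        li.pop(k)
        # Store a snapshot (shallow copy), not a reference to the live list.
        res.append(li[:])
    return res


for step in josephus_steps(5, 3):
    print(step)
# [1, 2, 4, 5]
# [2, 4, 5]
# [2, 4]
# [4]
```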
