How to convert this using list comprehension? - python

l = []
for i in range(11):
l.append(1)
for j in range(i):
l.append(0)
print(l)
The output follows a (1, 1, 0, 1, 0, 0, 1, 0, 0, 0, ...) pattern. I do not know however how to convert nested-for loops using list comprehension.

This is tricky to write as a comprehension because of the need to add two separate elements in each iteration. It can be implemented using a list sum:
l = sum([[1] + [0] * i for i in range(11)], [])
Output:
[
1,
1, 0,
1, 0, 0,
1, 0, 0, 0,
1, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
]

You can use the following:
l = [0 if j else 1 for i in range(11) for j in range(i+1)]
or in a slightly shorter albeit more obfuscated form:
l = [int(not j) for i in range(11) for j in range(i+1)]

Related

Using List Comprehension for a number sequence (1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1 .....) in Python

I'm doing exercise questions from A Practical Introduction to Python Programming by Brian Heinold (pg 83) and there was a simpler question:
Using a for loop, create the list below, which consists of ones separated by increasingly many
zeroes. The last two ones in the list should be separated by ten zeroes.
[1,1,0,1,0,0,1,0,0,0,1,0,0,0,0,1,....]
# Question 11
L = [1]
for i in range(11):
if i == 0:
L.append(1)
else:
for j in range(i):
L.append(0)
L.append(1)
print(L)
Use a list comprehension to create the list below, which consists of ones separated by increasingly many zeroes. The last two ones in the list should be separated by ten zeroes.
[1,1,0,1,0,0,1,0,0,0,1,0,0,0,0,1,....]
I'm having difficulty with converting it into one line using list comprehension. Best I could do was
# Question 15
L = [[1, [0]*i] for i in range(11)]
print(L)
L = [j for row in L for j in row]
print(L)
Which would print:
[[1, []], [1, [0]], [1, [0, 0]], [1, [0, 0, 0]], [1, [0, 0, 0, 0]], [1, [0, 0, 0, 0, 0]], [1, [0, 0, 0, 0, 0, 0]], [1, [0, 0, 0, 0, 0, 0, 0]], [1, [0, 0, 0, 0, 0, 0, 0, 0]], [1, [0, 0, 0, 0, 0, 0, 0, 0, 0]], [1, [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]]
[1, [], 1, [0], 1, [0, 0], 1, [0, 0, 0], 1, [0, 0, 0, 0], 1, [0, 0, 0, 0, 0], 1, [0, 0, 0, 0, 0, 0], 1, [0, 0, 0, 0, 0, 0, 0], 1, [0, 0, 0, 0, 0, 0, 0, 0], 1, [0, 0, 0, 0, 0, 0, 0, 0, 0], 1, [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
Flattening the list any further would result in a TypeError: 'int' object is not iterable and I would have to add another 1 at the end of the list using a seperate command which I guess would be fine. To add the code for flattening the list kinda goes over my head.
This is a second attempt using if/else statements, that gives the produces the same output and gives the same TypeError for when I try to flatten it completely.
L = [1 if i%2 == 0 else [0]*(i//2) for i in range(22)]
L = [j for row in L for j in row]
print(L)
I just want this as output:
[1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
Thanks!
You can solve it like this:
L = [j for i in range(11) for j in [1, *([0]*i)]]
or
L = [j for i in range(11) for j in [1, *(0 for k in range(i))]]
Edit
I missed that it should end with a one. Here's a solution that does that:
L = [1, *(j for i in range(11) for j in [*(0 for k in range(i)), 1])]
or with list comprehension instead of generators:
L = [1, *[j for i in range(11) for j in [*[0 for k in range(i)], 1]]]
You could use a nested comprehension that produces an increasing number of values that you convert to 0s and 1s by returning 0s for all but the last of each group:
n = 10
r = [1]+[b//z for z in range(1,n+2) for b in range(1,z+1)]
print(r)
[1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
If you want to avoid the need to add the initial [1], you could use a similar approach starting z at -1 instead of 1:
r = [ 1-(b<z) for z in range(-1,n+1) for b in range(z+1 or 1)]
I know it's not allowed in the problem, but assume we can use math library, there's a really interesting solution that goes like this:
from math import sqrt, floor
print([1] + [1 * ((-1+sqrt(1+8*i))/2 > 0 and floor((-1+sqrt(1+8*i))/2) == (-1+sqrt(1+8*i))/2 or (-1-sqrt(1+8*i))/2 > 0 and floor((-1-sqrt(1+8*i))/2) == (-1-sqrt(1+8*i))/2) for i in range(1, 67)])
The key observation is that all ones in the target list has an index that's a triangular number except for the one on index 0(no pun intended). So we can just check if the current index is a triangular number.
The nth triangular number can be expressed as:
Let i be the current index we are checking. All we need to do is to solve:
And check if one of the roots is a natural number.
Again, I know this is not a valid answer. I just want to show it as the idea is interesting.
p=[]
for i in range(11):
p.append(1)
for j in range(i):
p.append(i*0)
print(p)

Printing a list with specific seperations

I have been working for a small program that in the end creates a list with 100 randomized numbers in it. I am supposed to print the list in a 10x10 space, and, since I'm a newbie in code I can't really wrap my head around how to do it. Any suggestions?
could look like this:
list1 = [0]*100
and end result should look like this:
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Thanks in advance!
you can achieve this with two for loops. something like this:
import random
final_list =[]
for i in range(0, 10):
row_list = []
for j in range(0, 10):
row_list.append(random.randint(1, 10000))
final_list.append(row_list)
break down:
you could use a library called random for creating a random number. you can look at its document here.
in random.randint(1, 10000) I gave a range which is the range of random numbers. you could change it to suit your requirements.
there might be some better way to do this but it's more understandable :)
what you need is just the built-in function print:
print(*(list1[i: i + 10] for i in range(0, len(list1), 10)), sep='\n')
output:
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
maybe the code is a bit complicated since you are trying to learn but I will try to explain:
there is a generator that splits your initial list into smaller lists of 10 elements:
(list1[i: i + 10] for i in range(0, len(list1), 10))
then I'm using unpack operator * to unpack the generator into print arguments
I'm separating each list by a new line using sep='\n'
a more simple version to achieve your end result is to use a single for loop:
for index in range(0, len(list1), 10):
print(list1[i: i + 10])
ouptu:
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
I have pretty much solved the problem, at least to my satisfacion, it looks like this:
example_list = [0]*100
index_print = 0
for x in range(10):
for y in range(10):
print(example_list[index_print], end=' ')
index_print += 1
print('\n')
Could be inporved, and i will probably do that too, but right now im satisfied.
Sadly i have discovered some problems with my code that i have not solved yet, but thats for anoher question. Thanks for the many answers, would not have expected to get so many on my first post!

Avoid a deepcopy when doing a BFS

I'm currently solving the second exercise in this assignment (this is not homework, I'm actually trying to solve this other problem). My solution uses a BFS to search for the minimal solution to a variant of the "Lights Out" problem, in which pressing a light will flip the state of every light on the same row and the same column.
I think that my implementation is correct, but it's a bit too slow: it's currently taking 12+ seconds to run on my computer (which is unacceptable for my purposes).
from copy import deepcopy
from itertools import chain
from Queue import PriorityQueue
# See: http://www.seas.upenn.edu/~cis391/Homework/Homework2.pdf
class Puzzle(object):
def __init__(self, matrix):
self.matrix = matrix
self.dim = len(matrix)
def __repr__(self):
return str(self.matrix)
def solved(self):
return sum([sum(row) for row in self.matrix]) == 0
def move(self, i, j):
for k in range(self.dim):
self.matrix[i][k] = (self.matrix[i][k] + 1) % 2
self.matrix[k][j] = (self.matrix[k][j] + 1) % 2
self.matrix[i][j] = (self.matrix[i][j] + 1) % 2
return self
def copy(self):
return deepcopy(self)
def next(self):
result = []
for i in range(self.dim):
for j in range(self.dim):
result.append(self.copy().move(i, j))
return result
def solve(self):
q = PriorityQueue()
v = set()
q.put((0, self))
while True:
c = q.get()
if c[1].solved():
return c[0]
else:
for el in c[1].next():
t = el.tuple()
if t not in v:
v.add(t)
q.put((c[0] + 1, el))
def tuple(self):
return tuple(chain.from_iterable(self.matrix))
The culprit, according to cProfile, appears to be the deepcopy call. On the other hand, I see no alternatives: I need to add to the queue another Puzzle object containing a fresh copy of self.matrix.
How can I speed up my implementation?
Here's the test case that I'm using:
print Puzzle([
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
]).solve()
which should return 1 (we only need to press the light in the lower right corner).
EDIT: Here's another gnarly test case:
print Puzzle([
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
]).solve()
Its solution is at most 14: press all lights on the diagonal that were already on. Unfortunately, the impressive speedup by #zch isn't enough to solve this problem, leading me to believe that, due to the high branching factor, a BFS wasn't the right way to solve this problem.
There is a number of optimizations to be done.
First, avoid deepcopy, implement it your own copying (this by itself worked for me 5x faster):
class Puzzle(object):
def __init__(self, matrix):
self.matrix = [list(row) for row in matrix]
self.dim = len(matrix)
def copy(self):
return Puzzle(self.matrix)
Second, in BFS you don't need priority queue, use Queue or implement your own queuing. This gives some speedup. And third, check for being solved before putting it into the queue, not after taking things out. This should allow you to go one level deeper in comparable time:
def solve(self):
v = set()
q = [(0, self)]
i = 0
while True:
c = q[i]
i += 1
for el in c[1].next():
t = el.tuple()
if t not in v:
if el.solved():
return c[0] + 1
v.add(t)
q.append((c[0] + 1, el))
Further, using a list of list of bits is very memory-inefficient. You can pack all the bits into a single integer and get much faster solution. Additionally you can precompute masks for allowed moves:
def bits(iterable):
bit = 1
res = 0
for elem in iterable:
if elem:
res |= bit
bit <<= 1
return res
def mask(dim, i, j):
res = 0
for idx in range(dim * i, dim * (i + 1)):
res |= 1 << idx
for idx in range(j, dim * dim, dim):
res |= 1 << idx
return res
def masks(dim):
return [mask(dim, i, j) for i in range(dim) for j in range(dim)]
class Puzzle(object):
def __init__(self, matrix):
if isinstance(matrix, Puzzle):
self.matrix = matrix.matrix
self.dim = matrix.dim
self.masks = matrix.masks
else:
self.matrix = bits(sum(matrix, []))
self.dim = len(matrix)
self.masks = masks(len(matrix))
def __repr__(self):
return str(self.matrix)
def solved(self):
return self.matrix == 0
def next(self):
for mask in self.masks:
puzzle = Puzzle(self)
puzzle.matrix ^= mask
yield puzzle
def solve(self):
v = set()
q = [(0, self)]
i = 0
while True:
c = q[i]
i += 1
for el in c[1].next():
t = el.matrix
if t not in v:
if el.solved():
return c[0] + 1
v.add(t)
q.append((c[0] + 1, el))
And finally for another factor of 5 you can pass around just bit matrices, instead of whole Puzzle objects and additionally inline some most often used function.
def solve(self):
v = set()
q = [(0, self.matrix)]
i = 0
while True:
dist, matrix = q[i]
i += 1
for mask in self.masks:
t = matrix ^ mask
if t not in v:
if t == 0:
return dist + 1
v.add(t)
q.append((dist + 1, t))
For me these optimizations combined give speedup of about 250 times.
I changed solve to
def solve(self):
q = PriorityQueue()
v = set()
q.put((0, self))
while True:
c = q.get()
if c[1].solved():
return c[0]
else:
for i in range(self.dim):
for j in range(self.dim):
el = c[1].move(i, j) # do the move
t = el.tuple()
if t not in v:
v.add(t)
q.put((c[0] + 1, el.copy())) # copy only as needed
c[1].move(i, j) # undo the move
As .move(i, j) is its own inverse. Copies are made but only when the state has not been visited. This reduces the time from 7.405s to 5.671s. But this is not as big an improvement as expected.
Then replacing def tuple(self): with:
def tuple(self):
return tuple(tuple(r) for r in self.matrix)
reduces the time from 5.671s to 0.531s. That should do it.
Full listing:
from copy import deepcopy
from Queue import PriorityQueue
# See: http://www.seas.upenn.edu/~cis391/Homework/Homework2.pdf
class Puzzle(object):
def __init__(self, matrix):
self.matrix = matrix
self.dim = len(matrix)
def __repr__(self):
return str(self.matrix)
def solved(self):
return sum([sum(row) for row in self.matrix]) == 0
def move(self, i, j):
for k in range(self.dim):
self.matrix[i][k] = (self.matrix[i][k] + 1) % 2
self.matrix[k][j] = (self.matrix[k][j] + 1) % 2
self.matrix[i][j] = (self.matrix[i][j] + 1) % 2
return self
def copy(self):
return deepcopy(self)
def solve(self):
q = PriorityQueue()
v = set()
q.put((0, self))
while True:
c = q.get()
if c[1].solved():
return c[0]
else:
for i in range(self.dim):
for j in range(self.dim):
el = c[1].move(i, j) # do the move
t = el.tuple()
if t not in v:
v.add(t)
q.put((c[0] + 1, el.copy())) # copy only as needed
c[1].move(i, j) # undo the move
def tuple(self):
return tuple(tuple(r) for r in self.matrix)
print Puzzle([
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
]).solve()

Longest Common Sequence -Index Error

I am trying to find the LCS between two sequences: TACGCTGGTACTGGCAT and AGCTGGTCAGAA. I want my answer to output as a matrix so that I can backtrack which sequence is common (GCTGGT). When I use my code below, I am getting the following error. IndexError: list index out of range. How can I avoid this error in my code below?
def LCS(x, y):
m = len(x)
n = len(y)
C = []
for i in range(m):
for j in range(n):
if x[i] == y[j]:
C[i][j] == C[i-1][j-1] + 1
else:
C[i][j] == 0
return C
x = "TACGCTGGTACTGGCAT"
y = "AGCTGGTCAGAA"
m = len(x)
n = len(y)
C = LCS(x, y)
print C
You need to append to your list, because the index [i][j] does not exist yet.
def LCS(x, y):
m = len(x)
n = len(y)
C = []
for i in range(m):
C.append([]) # append a blank list at index [i]
for j in range(n):
if x[i] == y[j]:
C[i].append(C[i-1][j-1] + 1) # append current element to [i]
else:
C[i].append(0)
return C
Testing
x = "TACGCTGGTACTGGCAT"
y = "AGCTGGTCAGAA"
LCS(x,y)
Output
[[0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1],
[0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 3, 0, 0, 1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 4, 1, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 1, 5, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 1, 0, 0, 6, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1],
[0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 3, 1, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 1, 4, 0, 0, 0, 1, 0, 0],
[0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1, 1],
[0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0]]

Compare a varying number of lists in Python

I'm writing a script that I'm going to use quite often, with datasets of different sizes, and I have to do some comparisons that I just can't get straight in Python.
There will be multiple lists (around 20 or more, but I've reduced them to three for example and testing purposes), all with the same number of integer items in a certain order. I want to compare items on the same position in every list to find differences.
For a defined number of lists, this is easy:
a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
b = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 4, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
c = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
for x,y,z in zip(a,b,c):
if x != y != z:
print x, y, z
I've tried wrapping that loop in a function, so the number of arguments can vary, but there I got stuck.
def compare(*args):
for x in zip(args):
???
In the final script I will have not multiple single lists, but all together in one list of list. Would that help? If I loop through a list of lists, I won't get every list at once...
Forget the function, it's not really useful anyway as it will be part of a bigger script and it's too difficult defining the different arguments.
I'm now comparing two lists at a time, saving those that are identical. That way, I can later easily remove all those from my whole list and keep only the unique ones.
l_o_l = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 4, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
for i in range(0, (len(l_o_l)-1)):
for j in range((i+1), len(l_o_l)):
if l_o_l[i] == l_o_l[j]:
duplicates.append(key_list[i])
duplicates.append(key_list[j])
dup = list(set(duplicates))
uniques = [x for x in key_list if x not in dup]
where the key_list contains, from a dictionary, identifiers for my lists.
Any suggestions for improvement?
Maybe something like this
def compare(*args):
for things in zip(*args):
yield all(x == things[0] for x in things)
You can then use it like this
a = range(10)
b = range(10)
c = range(10)
d = range(11, 20)
for match in compare(a,b,c):
print match
for match in compare(a,b,c,d):
print match
Here is a demo using your example (its a generator, so you have to iter over it or exhaust it using list)
a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
b = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 4, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
c = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print list(compare(a,b,c))
def compare(*args):
for x in zip(args):
values_list = list(x[0]) # x[0] because x is a tuple
different_values = set(values_list) # a set does not contain identical values
if len(different_values) != 1: # if you have more than 1 value you have different values in your list
print 'different values', values_list
gives you
a = [0, 0, 1]
b = [0, 1, 1]
c = [1, 1, 1]
compare(a, b, c)
>>> different values [0, 0, 1]
>>> different values [0, 1, 1]
Assuming the lists are similar to the ones in the example, I would use:
def compare(*args):
for x in zip(args):
if min(x) != max(x):
print x
def compare(elements):
return len(set(elements)) == bool(elements)
If you want to know whether all the lists are the same you can simply do:
all(compare(elements) for elements in zip(the_lists))
An alternative could be to transform the lists into tuples and use set there:
len(set(tuple(the_list) for the_list in the_lists) == bool(the_lists)
If you simply want to remove duplicates this should be faster:
the_lists = [list(elem) for elem in set(tuple(the_list) for the_list in the_lists)]
Example usage:
>>> a = range(100)
>>> b = range(100, 200)
>>> c = range(200, 300)
>>> d = a[:]
>>> e = b[:]
>>> the_lists = [a,b,c,d,e]
>>> the_lists2 = [list(elem) for elem in set(tuple(the_list) for the_list in the_lists)]
>>> [a,b,c] == sorted(the_lists2) #order is not maintained by set
True
It seems to be pretty fast:
>>> timeit.timeit('[list(elem) for elem in set(tuple(the_list) for the_list in the_lists)]', 'from __main__ import the_lists', number=1000000)
7.949447154998779
Less than 8 seconds for executing 1 million times. (Where the_lists is the same used before.)
Edit:
If you want to remove only the duplicated list then the simplest algorithm I can think of is sorting the list-of-lists and using itertools.groupby:
>>> a = range(100)
>>> b = range(100,200)
>>> c = range(200,300)
>>> d = a[:]
>>> e = b[:]
>>> the_lists = [a,b,c,d,e]
>>> the_lists.sort()
>>> import itertools as it
>>> for key, group in it.groupby(the_lists):
... if len(list(group)) == 1:
... print key
...
[200, 201, 202, ..., 297, 298, 299]
I think trying to get clever with *args and zip is just confusing the issue. I would write it something like this:
def compare(list_of_lists):
# assuming not an empty data set
inner_len = len(list_of_lists[0])
for index in range(inner_len):
expected = list_of_lists[0][index]
for inner_list in list_of_lists:
if inner_list[index] != expected:
# report difference at this index

Categories