Lists in python using itertools (moving only 2 items, permutations)

I come from C++ as my first programming language. I'm just getting into Python and I'm looking for a way to swap numbers in a list. In C++ this would be done with pointers, moving them around in loops; this time, however, I need to generate all the permutations that lead from a list A to a list B in Python.
List A is the starting list and list B is the result list:
A = 1234
B = 4231
The program has to show all the possible combinations in order, moving only 2 numbers at a time, until list A becomes list B (the following example is simplified to 4 numbers and might not show all the combinations):
[1,2,3,4]
[1,2,4,3]
[1,4,2,3]
[4,1,2,3]
[4,2,1,3]
[4,2,3,1]
To accomplish this, I found the itertools module, which contains a lot of functions, but I haven't been able to put many of them to use so far. The following code roughly does what is needed; however, it does not move the numbers in pairs, nor in order:
import itertools
from itertools import product, permutations
A = [1, 2, 3, 4]
B = [4, 2, 3, 1]
print("\n")
print(list(permutations(sorted(B), 4)))
I'm thinking about adding a while (A != B) loop that stops the permutations once the lists match. I already tried this, but I'm not familiar with Python's syntax. Any help on how I can accomplish this would be appreciated.

I am assuming that sorting the input list is not really required.
from itertools import permutations
A = [4, 3, 2, 1]
B = [1, 2, 4, 3]
def print_combinations(start, target):
    # use list(permutations(sorted(start), len(start))) if sorting of start is really required
    all_perms = list(permutations(start, len(start)))
    if tuple(target) not in all_perms:
        # return empty list if target is not found in all permutations
        return []
    # return all combinations till target(inclusive)
    # using list slicing
    temp = all_perms[: all_perms.index(tuple(target)) + 1]
    return temp
print(print_combinations(A, B))

On the assumption that you are asking the best way to solve this permutation question - here is a different answer:
Think of all the permutations as a set. itertools.permutations generates all those permutations in some order. That is just what you want if you want to find all or some of those permutations. But that's not what you are looking for. You are trying to find paths through those permutations. itertools.permutations generates all the permutations in an order but not necessarily the order you want. And certainly not all the orders: it only generates them once.
So, you could generate all the permutations and consider them as the nodes of a network. Then you could link nodes whenever they are connected by a single swap, to get a graph. This is called the permutohedron. Then you could do a search on that graph to find all the loop-free paths from a to b that you are interested in. This is certainly possible but it's not really optimal. Building the whole graph in advance is an unnecessary step since it can easily be generated on demand.
Here is some Python code that does just that: it generates a depth first search over the permutohedron by generating the neighbours for a node when needed. It doesn't use itertools though.
a = (1,2,3,4)
b = (4,2,3,1)
def children(current):
    for i in range(len(a)-1):
        yield (current[:i] + (current[i+1],current[i]) +
               current[i+2:])
def dfs(current,path,path_as_set):
    path.append(current)
    path_as_set.add(current)
    if current == b:
        yield path
    else:
        for next_perm in children(current):
            if next_perm in path_as_set:
                continue
            for path in dfs(next_perm,path,path_as_set):
                yield path
    path.pop()
    path_as_set.remove(current)
for path in dfs(a,[],set()):
    print(path)
If you are really interested in using itertools.permutations, then the object you are trying to study is actually:
itertools.permutations(itertools.permutations(a))
This generates all possible paths through the set of permutations. You could work through this rejecting any that don't start at a and that contain steps that are not a single swap. But that is a very bad approach: this list is very long.

It's not completely clear what you are asking. I think you are asking for a pythonic way to swap two elements in a list. In Python it is usual to separate data structures into immutable and mutable. In this case you could be talking about tuples or lists.
Let's assume you want to swap elements i and j, with j larger.
For immutable tuples the pythonic approach will be to generate a new tuple via slicing like this:
next = (current[:i] + current[j:j+1] + current[i+1:j]
+ current[i:i+1] + current[j+1:])
For mutable lists it would be pythonic to do much the same as C++, though it's prettier in Python:
list[i],list[j] = list[j],list[i]
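As a minimal sketch of both approaches, the two swaps can be wrapped in small helper functions. The names swap_tuple and swap_list are made up for this illustration and are not part of any library; swap_tuple assumes i < j, as stated above.

```python
def swap_tuple(current, i, j):
    """Return a new tuple with elements i and j exchanged (assumes i < j)."""
    return (current[:i] + current[j:j+1] + current[i+1:j]
            + current[i:i+1] + current[j+1:])

def swap_list(items, i, j):
    """Exchange elements i and j of a list in place."""
    items[i], items[j] = items[j], items[i]

original = (1, 2, 3, 4)
swapped = swap_tuple(original, 0, 3)   # original is left untouched

values = [1, 2, 3, 4]
swap_list(values, 0, 3)                # values itself is modified

print(swapped, values)
```

The tuple version leaves the original untouched, which is convenient when permutations are stored in a set or used as dictionary keys.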
Alternatively you could be asking about how to solve your permutation question, in which case the answer is that itertools does not really provide much help. I would advise depth first search.

I think the following is a simpler way. I had nearly the same issue (I wanted the swapped pairs as well). Append a copy of the list to itself (list = list + list) and then run:
from itertools import combinations_with_replacement
mylist = ['a', 'b']
list(set(combinations_with_replacement(mylist + mylist, r=2)))
results:
[('a', 'b'), ('b', 'a'), ('b', 'b'), ('a', 'a')]

Related

How exactly does Python check through a list?

I was doing one of the course exercises on Codecademy for Python and I had a few questions I couldn't seem to find an answer to:
For this block of code, how exactly does python check whether something is "in" or "not in" a list? Does it run through each item in the list to check or does it use a quicker process?
Also, how would this code be affected if it were running with a massive list of numbers (thousands or millions)? Would it slow down as the list size increases, and are there better alternatives?
numbers = [1, 1, 2, 3, 5, 8, 13]
def remove_duplicates(list):
    new_list = []
    for i in list:
        if i not in new_list:
            new_list.append(i)
    return new_list
remove_duplicates(numbers)
Thanks!
P.S. Why does this code not function the same?
numbers = [1, 1, 2, 3, 5, 8, 13]
def remove_duplicates(list):
    new_list = []
    new_list.append(i for i in list if i not in new_list)
    return new_list

In order to execute i not in new_list Python has to do a linear scan of the list. The scanning loop breaks as soon as the result of the test is known, but if i is actually not in the list the whole list must be scanned to determine that. It does that at C speed, so it's faster than doing a Python loop to explicitly check each item. Doing the occasional in some_list test is ok, but if you need to do a lot of such membership tests it's much better to use a set.
On average, with random data, testing membership has to scan through half the list items, and in general the time taken to perform the scan is proportional to the length of the list. In the usual notation the size of the list is denoted by n, and the time complexity of this task is written as O(n).
In contrast, determining membership of a set (or a dict) can be done (on average) in constant time, so its time complexity is O(1). Please see TimeComplexity in the Python Wiki for further details on this topic. Thanks, Serge, for that link.
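A rough way to see the difference is to time the same membership test against a list and a set. This is only a sketch: the size n and the repeat count are arbitrary choices, and the absolute timings depend on the machine.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
missing = -1  # not present, so the list scan has to visit every element

# Time 100 membership tests against each container.
list_time = timeit.timeit(lambda: missing in as_list, number=100)
set_time = timeit.timeit(lambda: missing in as_set, number=100)

print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```

Doubling n should roughly double the list timing while leaving the set timing about the same.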
Of course, if you're using a set then you get de-duplication for free, since it's impossible to add duplicate items to a set.
One problem with sets is that they generally don't preserve order. But you can use a set as an auxiliary collection to speed up de-duping. Here is an illustration of one common technique to de-dupe a list, or other ordered collection, which does preserve order. I'll use a string as the data source because I'm too lazy to type out a list. ;)
new_list = []
seen = set()
for c in "this is a test":
    if c not in seen:
        new_list.append(c)
        seen.add(c)
print(new_list)
output
['t', 'h', 'i', 's', ' ', 'a', 'e']
Please see How do you remove duplicates from a list whilst preserving order? for more examples. Thanks, Jean-François Fabre, for the link.
As for your PS, that code appends a single generator object to new_list; it doesn't append what the generator would produce.
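A quick check makes this visible. The sketch below reuses the question's data, renaming the list parameter to numbers to avoid shadowing the built-in:

```python
numbers = [1, 1, 2, 3, 5, 8, 13]

new_list = []
# append() adds its argument as ONE element, here a generator object.
new_list.append(i for i in numbers if i not in new_list)

element_count = len(new_list)
element_type = type(new_list[0]).__name__
print(element_count, element_type)  # 1 generator
```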
I assume you already tried to do it with a list comprehension:
new_list = [i for i in list if i not in new_list]
That doesn't work, because the new_list doesn't exist until the list comp finishes running, so doing in new_list would raise a NameError. And even if you did new_list = [] before the list comp, it won't be modified by the list comp, and the result of the list comp would simply replace that empty list object with a new one.
BTW, please don't use list as a variable name (even in example code) since that shadows the built-in list type, which can lead to mysterious error messages.

You are asking multiple questions and one of them asking if you can do this more efficiently. I'll answer that.
OK, let's say you had thousands or millions of numbers. From where exactly? Let's say they were stored in some kind of text file; then you would probably want to use numpy (if you are sticking with Python, that is). Example:
import numpy as np
numbers = np.array([1, 1, 2, 3, 5, 8, 13], dtype=np.int32)
numbers = np.unique(numbers).tolist()
This will be more efficient (above all, more memory-efficient) than reading the data with plain Python and performing a list(set(...)):
numbers = [1, 1, 2, 3, 5, 8, 13]
numbers = list(set(numbers))

You are asking for the algorithmic complexity of this function. To find that you need to see what is happening at each step.
You are scanning the list one item at a time, and each retrieval takes 1 unit of work, because retrieving something from a list by index is O(1).
The list you add to grows by at most 1 item at a time, so at any point the unique-items list holds at most n items.
Deciding whether to add the item you picked to the unique-items list takes up to n work in the worst case, because every item already in it has to be scanned.
So if you sum up the total work over all steps, it would be 1 + 2 + 3 + 4 + 5 + ... + n, which is n(n + 1)/2. With a million items, you can estimate the work by substituting n = 1,000,000 into the formula.
This is not entirely accurate because of how list works internally, but it is a helpful way to visualize the cost.
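Plugging a million into the formula gives a feel for the scale. This is just the arithmetic from the sum above, under the same worst-case assumption that every item is unique:

```python
n = 1_000_000

# Worst case: the k-th item is compared against up to k stored items,
# so the total is 1 + 2 + ... + n = n(n + 1)/2.
worst_case_comparisons = n * (n + 1) // 2
print(worst_case_comparisons)  # 500000500000
```

That is about half a trillion comparisons, which is why a set-based approach is preferred for large inputs.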

To answer the question in the title: Python has more efficient data types, but a list is just a plain array. If you want a more efficient way to search for values, you can use a dict (or a set), which uses a hash of the stored object to place it in a hash table; I assume that is what you were thinking of when you mentioned "a quicker process".
As to the second code snippet:
list.append() adds whatever value you give it to the end of the list. i for i in list if i not in new_list is a generator expression, so append inserts that generator as a single object into the list. list.extend() does what you want: it takes an iterable and appends all of its elements to the list.

How to use itertools product with a huge data

I would like to make a list of 81-tuples using three elements, namely 1, 2, 3, in Python.
I tried to find a solution, and then I found these useful links:
How to use itertools to compute all combinations with repeating elements?
and
Which itertools generator doesn't skip any combinations?
According to the above links, I should do the following:
import itertools
list = []
for p in itertools.product(range(1, 3 + 1), repeat=81):
    list.append(p)
print(list)
But my computer hangs. I think there is too much data in the list.
I want to know whether there is a command that prints only the first 100 elements of the list, or the 101st to 200th elements.

You can use itertools.islice:
import itertools
p = itertools.product(range(1, 3 + 1), repeat=81)
s = itertools.islice(p, 100, 200)  # 101st to 200th element (0-based start, exclusive stop)
print(list(s))
This will, however, iterate through all the elements until it reaches the starting index of the slice. So for ranges towards the end of an iterator with a gazillion elements (yours has 3**81 = 443426488243037769948249630619149892803 or in other words: too many to process let alone store), this will run into similar problems.
For those later ranges, you would have to calculate the n-th element by hand and generate successors from there... See How to select specific item from cartesian product without calculating every other item for some inspiration.
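One way to compute the n-th element by hand is to treat the index as a number written in base len(values), since itertools.product varies the last position fastest. The helper name nth_product below is hypothetical and not part of itertools; this is a sketch of the idea, not the linked answer's code.

```python
import itertools

def nth_product(values, repeat, index):
    """Return the tuple at 0-based position `index` of
    itertools.product(values, repeat=repeat) without iterating to it."""
    base = len(values)
    if not 0 <= index < base ** repeat:
        raise IndexError("index out of range")
    digits = []
    for _ in range(repeat):
        # Peel off the least significant base-`base` digit.
        index, digit = divmod(index, base)
        digits.append(values[digit])
    # The last tuple position varies fastest, so reverse the digits.
    return tuple(reversed(digits))

values = [1, 2, 3]
start = 10 ** 30  # far beyond what islice could reach by iterating
window = [nth_product(values, 81, k) for k in range(start, start + 3)]
for row in window:
    print(row)
```

Each call costs only `repeat` divisions, so any window of the 3**81 tuples can be produced directly.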

Which is more pythonic in a for loop: zip or enumerate?

Which one of these is considered the more pythonic, taking into account scalability and readability?
Using enumerate:
group = ['A','B','C']
tag = ['a','b','c']
for idx, x in enumerate(group):
    print(x, tag[idx])
or using zip:
for x, y in zip(group, tag):
    print(x, y)
The reason I ask is that I have been using a mix of both. I should keep to one standard approach, but which should it be?

No doubt, zip is more pythonic. It doesn't require that you use a variable to store an index (which you don't otherwise need), and using it allows handling the lists uniformly, while with enumerate, you iterate over one list, and index the other list, i.e. non-uniform handling.
However, you should be aware of the caveat that zip runs only up to the shorter of the two lists. To avoid duplicating someone else's answer I'd just include a reference here: someone else's answer.
@user3100115 aptly points out that in Python 2 you should prefer itertools.izip over zip, due to its lazy nature (faster and more memory efficient). In Python 3, zip already behaves like Python 2's izip.
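The truncation caveat is easy to demonstrate. In this sketch the tag list is deliberately one item short; itertools.zip_longest (Python 3) is one way to keep the unmatched items instead of dropping them silently:

```python
from itertools import zip_longest

group = ['A', 'B', 'C']
tag = ['a', 'b']  # deliberately one item short

short = list(zip(group, tag))                             # stops at the shorter list
padded = list(zip_longest(group, tag, fillvalue=None))    # pads the shorter list

print(short)   # [('A', 'a'), ('B', 'b')]
print(padded)  # [('A', 'a'), ('B', 'b'), ('C', None)]
```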

While others have pointed out that zip is in fact more pythonic than enumerate, I came here to see if it was any more efficient. According to my tests, zip is around 10 to 20% faster than enumerate when simply accessing and using items from multiple lists in parallel.
Here I have three lists of (the same) increasing length being accessed in parallel. When the lists are more than a couple of items in length, the time ratio of zip/enumerate is below one, meaning zip is faster.
Code I used:
import timeit
setup = \
"""
import random
size = {}
a = [ random.randint(0,i+1) for i in range(size) ]
b = [ random.random()*i for i in range(size) ]
c = [ random.random()+i for i in range(size) ]
"""
code_zip = \
"""
data = []
for x,y,z in zip(a,b,c):
    data.append(x+z+y)
"""
code_enum = \
"""
data = []
for i,x in enumerate(a):
    data.append(x+c[i]+b[i])
"""
runs = 10000
sizes = [ 2**i for i in range(16) ]
data = []
for size in sizes:
    formatted_setup = setup.format(size)
    time_zip = timeit.timeit(code_zip, formatted_setup, number=runs)
    time_enum = timeit.timeit(code_enum, formatted_setup, number=runs)
    ratio = time_zip/time_enum
    row = (size,time_zip,time_enum,ratio)
    data.append(row)
with open("testzipspeed.csv", 'w') as csv_file:
    csv_file.write("size,time_zip,time_enumerate,ratio\n")
    for row in data:
        csv_file.write(",".join([ str(i) for i in row ])+"\n")

The answer to the question asked in your title, "Which is more pythonic; zip or enumerate...?" is: they both are. enumerate is just a special case of zip.
The answer to your more specific question about that for loop is: use zip, but not for the reasons you've seen so far.
The biggest advantage of zip in that loop has nothing to do with zip itself. It has to do with avoiding the assumptions made in your enumerate loop. To explain, I'll make two different generators based on your two examples:
def process_items_and_tags(items, tags):
    "Do something with two iterables: items and tags."
    for item, tag in zip(items, tags):
        yield process(item, tag)
def process_items_and_list_of_tags(items, tags_list):
    "Do something with an iterable of items and an indexable collection of tags."
    for idx, item in enumerate(items):
        yield process(item, tags_list[idx])
Both generators can take any iterable as their first argument (items), but they differ in how they handle their second argument. The enumerate-based approach can only process tags in a list-like collection with [] indexing. That rules out a huge number of iterables, like file streams and generators, for no good reason.
Why is one parameter more tightly constrained than the other? The restriction isn't inherent in the problem the user is trying to solve, since the generator could just as easily have been written the other way 'round:
def process_list_of_items_and_tags(items_list, tags):
    "Do something with an indexable collection of items and an iterable of tags."
    for idx, tag in enumerate(tags):
        yield process(items_list[idx], tag)
Same result, different restriction on the inputs. Why should your caller have to know or care about any of that?
As an added penalty, anything of the form some_list[some_index] could raise an IndexError, which you would have to either catch or prevent in some way. That's not normally a problem when your loop both enumerates and accesses the same list-like collection, but here you're enumerating one and then accessing items from another. You'd have to add more code to handle an error that could not have happened in the zip-based version.
Avoiding the unnecessary idx variable is also nice, but hardly the deciding difference between the two approaches.
For more on the subject of iterables, generators, and functions that use them, see Ned Batchelder's PyCon US 2013 talk, "Loop Like a Native" (text, 30-minute video).
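To make the restriction concrete, here is a small sketch in which the tags come from a generator. The generator function tags is made up for this illustration. zip consumes it without trouble, while the indexing that the enumerate-based version relies on fails:

```python
def tags():
    """A generator: iterable, but it does not support [] indexing."""
    yield from ('a', 'b', 'c')

items = ['A', 'B', 'C']

# zip only needs to iterate, so a generator is fine.
pairs = list(zip(items, tags()))
print(pairs)

# Indexing the generator, as tags_list[idx] would, raises TypeError.
try:
    tags()[0]
except TypeError as exc:
    error = str(exc)
    print("indexing failed:", error)
```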

zip is more Pythonic, as already said, since it doesn't require another variable. You could also use:
import sys
from collections import deque
deque(map(lambda x, y: sys.stdout.write(x + " " + y + "\n"), group, tag), maxlen=0)
Since we are only printing output here, the values returned by sys.stdout.write need to be discarded (hence the deque with maxlen=0); this also requires that your lists are the same length.
Update: in this case it may not be as good, because you are only printing the group and tag values and map produces a sequence of unwanted return values from sys.stdout.write. In practice, if you needed to fetch values, it would be better.

zip might be more Pythonic, but it has a gotcha. If you want to change elements in place, you need to use indexing. Iterating over the elements will not work. For example:
x = [1,2,3]
for elem in x:
    elem *= 10
print(x)
Output: [1, 2, 3]
y = [1,2,3]
for index in range(len(y)):
    y[index] *= 10
print(y)
Output: [10, 20, 30]
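If the index is needed only for writing back, enumerate gives both the index and the element, which avoids range(len(...)). A minimal sketch of the same in-place update:

```python
y = [1, 2, 3]
for index, elem in enumerate(y):
    # Assigning through the index changes the list itself;
    # rebinding elem alone would not.
    y[index] = elem * 10
print(y)  # [10, 20, 30]
```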

This is a basic starter question. I think range(len(some_list)) isn't Pythonic; it is what you try when coming from a non-Python background.
After thinking about it and reading the excellent Python documentation (I really like docs written in the numpy format style), I'd say that in simple Pythonic code enumerate is the solution for iterables if you need a for loop, while building an iterable is best done with a comprehension.
list_a = ['a', 'b', 'c']
list_2 = ['1', '2', '3']
[print(a) for a in list_a]
executes the print for each item; perhaps better is a generator,
item = generator_item = (print(i, a) for i, a in enumerate(list_a) if a.find('a') == 0)
next(item)
For multi-line and more complex for loops, we can use enumerate(zip(...)).
for i, (arg1, arg2) in enumerate(zip(list_a, list_2)):
    print('multiline') # do complex code
But perhaps in more elaborate Pythonic code we can use another, more complex form with itertools; note idx as the last argument, which counts along for the length of list_a.
from itertools import count as idx
for arg1, arg2, i in zip(list_a, list_2, idx(start=1)):
    print(f'multiline {i}: {arg1}, {arg2}') # do complex code

alternative to recursion based merge sort logic

Here is merge sort logic in Python (this is the first part; ignore the function merge()). The question is how to convert the recursive logic into a while loop.
Code courtesy: Rosettacode Merge Sort
def merge_sort(m):
    if len(m) <= 1:
        return m
    middle = len(m) // 2
    left = m[:middle]
    right = m[middle:]
    left = merge_sort(left)
    right = merge_sort(right)
    return list(merge(left, right))
Is it possible to do this dynamically in a while loop, where each left and right array is broken in two and some sort of pointer keeps advancing based on the number of left and right arrays, splitting them until only lists of length one remain?
Every time the next split happens on both the left and right sides, the array keeps breaking down, so the number of left-side (left-left, left-right) and right-side (right-left, right-right) pieces keeps increasing until every list has size 1.

One possible implementation might be this:
def merge_sort(m):
    l = [[x] for x in m] # split each element to its own list
    while len(l) > 1: # while there's merging to be done
        for x in range(len(l) >> 1): # take the first len/2 lists
            l[x] = merge(l[x], l.pop()) # and merge with the last len/2 lists
    return l[0] if len(l) else []
Stack frames in the recursive version are used to store progressively smaller lists that need to be merged. You correctly identified that at the bottom of the stack, there's a one-element list for each element in whatever you're sorting. So, by starting from a series of one-element lists, we can iteratively build up larger, merged lists until we have a single, sorted list.

Reposted from alternative to recursion based merge sort logic at the request of a reader:
One way to eliminate recursion is to use a queue to manage the outstanding work. For example, using the built-in collections.deque:
from collections import deque
from heapq import merge
def merge_sorted(iterable):
    """Return a list consisting of the sorted elements of 'iterable'."""
    queue = deque([i] for i in iterable)
    if not queue:
        return []
    while len(queue) > 1:
        queue.append(list(merge(queue.popleft(), queue.popleft())))
    return queue[0]

It's said that every recursive function can be written in a non-recursive manner, so the short answer is: yes, it's possible. The only solution I can think of is the stack-based approach. When a recursive function invokes itself, it puts some context (its arguments and return address) on the internal call stack, which isn't accessible to you. Basically, what you need to do in order to eliminate recursion is to write your own stack, and every time you would make a recursive call, put the arguments onto this stack instead.
For more information you can read this article, or refer to the section named 'Eliminating Recursion' in Robert Lafore's "Data Structures and Algorithms in Java" (although all the examples in this book are given in Java, it's pretty easy to grasp the main idea).
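As a rough sketch of the stack-based approach applied to merge sort: each stack entry records a pending action, so a "split" entry stands in for a recursive call and a "merge" entry stands in for the work done after both calls return. The function name merge_sort_iterative and the action labels are invented for this illustration, and heapq.merge is used as the merge step since the question's merge() is not shown.

```python
from heapq import merge

def merge_sort_iterative(items):
    """Merge sort driven by an explicit stack instead of recursive calls."""
    results = []                      # finished (sorted) sublists
    stack = [("split", list(items))]  # pending work items
    while stack:
        action, payload = stack.pop()
        if action == "split":
            if len(payload) <= 1:
                results.append(payload)
            else:
                middle = len(payload) // 2
                # Pushed in reverse order: the left half is processed
                # first, then the right half, then the merge.
                stack.append(("merge", None))
                stack.append(("split", payload[middle:]))
                stack.append(("split", payload[:middle]))
        else:
            # "merge": combine the two most recently finished sublists.
            right = results.pop()
            left = results.pop()
            results.append(list(merge(left, right)))
    return results[0]

print(merge_sort_iterative([5, 2, 9, 1, 5, 6]))
```

The explicit stack grows only as deep as the recursion would have, but it lives on the heap, so Python's recursion limit no longer applies.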

Going with Dan's solution above and taking the advice on pop, I still tried eliminating the while loop and other not-so-Pythonic constructs. Here is the solution I came up with:
PS: l = len
My doubt about Dan's solution: what if L.pop() and L[x] refer to the same element and a conflict is created, as in the case of an odd-length list after iterating over half of the length of L?
def merge_sort(m):
    L = [[x] for x in m] # split each element to its own list
    for x in range(l(L)):
        if x > 0:
            L[x] = merge(L[x-1], L[x])
    return L[-1]
This could go on as an academic discussion, but I got my answer: an alternative to the recursive method.

How to identify an odd item in a list of items using python

My goal is to identify the odd element in the list below.
list_1=['taska1', 'taska2', 'taska3', 'taskb2', 'taska7']
The odd item is taskb2, as the other four items are under taska.
They all have equal length, hence discriminating using the len function will not work.
Any ideas? thanks.

If you simply want to find the item that does not start with 'taska', then you could use the following list comprehension:
>>> list_1=['taska1', 'taska2', 'taska3', 'taskb2', 'taska7']
>>> print([l for l in list_1 if not l.startswith('taska')])
['taskb2']
Another option is to use filter + lambda:
>>> list(filter(lambda l: not l.startswith('taska'), list_1))
['taskb2']

Seems to be an easy problem solved by alphabetical sort.
print(sorted(list_1)[-1])
Don't wanna sort? Try an O(n) time-complexity solution with O(1) space complexity:
print(max(list_1))

If you know what the basic structure of the items will be, then it's easy.
If you don't know the structure of your items a priori, one approach is to score the items according to their similarity against each other. Using info from this question for the standard library module difflib,
import difflib
import itertools
list_1=['taska1', 'taska2', 'taska3', 'taskb2', 'taska7']
# Initialize a dict, keyed on the items, with 0.0 score to start
score = dict.fromkeys(list_1, 0.0)
# Arrange the items in pairs with each other
for w1, w2 in itertools.combinations(list_1, 2):
    # Performs the matching function - see difflib docs
    seq=difflib.SequenceMatcher(a=w1, b=w2)
    # increment the "match" score for each
    score[w1]+=seq.ratio()
    score[w2]+=seq.ratio()
# Print the results
>>> score
{'taska1': 3.166666666666667,
'taska2': 3.3333333333333335,
'taska3': 3.166666666666667,
'taska7': 3.1666666666666665,
'taskb2': 2.833333333333333}
It turns out that taskb2 has the lowest score!
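To pick the odd item programmatically rather than by reading the printout, one option is to take the key with the smallest total score. This is only a sketch: it repeats the scoring loop so that it runs on its own, and the name odd_one is introduced here for illustration.

```python
import difflib
import itertools

list_1 = ['taska1', 'taska2', 'taska3', 'taskb2', 'taska7']

# Same scoring as above: sum each item's similarity to every other item.
score = dict.fromkeys(list_1, 0.0)
for w1, w2 in itertools.combinations(list_1, 2):
    ratio = difflib.SequenceMatcher(a=w1, b=w2).ratio()
    score[w1] += ratio
    score[w2] += ratio

# The item least similar to the rest has the smallest total.
odd_one = min(score, key=score.get)
print(odd_one)
```

With more than one outlier, sorting the items by score and inspecting the lowest few would be the natural extension.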
