How does `sum` flatten lists? - python

A multidimensional list like l=[[1,2],[3,4]] could be converted to a 1D one by doing sum(l,[]). How does this happen?
(This doesn't work directly for more deeply nested lists, but it can be repeated to handle those cases. For example, if A is a 3D list, then sum(sum(A,[]),[]) will flatten A to a 1D list.)

If your list nested is, as you say, "2D" (meaning that you only want to go one level down, and all 1-level-down items of nested are lists), a simple list comprehension:
flat = [x for sublist in nested for x in sublist]
is the approach I'd recommend -- much more efficient than summing would be (sum is intended for numbers -- it was just too much of a bother to somehow make it block all attempts to "sum" non-numbers... I was the original proposer and first implementer of sum in the Python standard library, so I guess I should know;-).
If you want to go down "as deep as it takes" (for deeply nested lists), recursion is the simplest way, although by eliminating the recursion you can get higher performance (at the price of higher complication).
This recipe suggests a recursive solution, a recursion elimination, and other approaches
(all instructive, though none as simple as the one-liner I suggested earlier in this answer).
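As a rough illustration of the two styles mentioned above (this is not the code from the linked recipe; the names flatten_recursive and flatten_iterative are made up for this sketch, and only list instances are treated as nested), a recursive version and a recursion-free version with an explicit stack might look like this:

```python
def flatten_recursive(nested):
    """Recursively flatten arbitrarily nested lists into one flat list."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten_recursive(item))
        else:
            flat.append(item)
    return flat


def flatten_iterative(nested):
    """Same result without recursion, using an explicit stack of iterators."""
    flat = []
    stack = [iter(nested)]
    while stack:
        for item in stack[-1]:
            if isinstance(item, list):
                # Descend into the sublist; the parent iterator keeps its position.
                stack.append(iter(item))
                break
            flat.append(item)
        else:
            # The iterator on top of the stack is exhausted; go back to its parent.
            stack.pop()
    return flat


sample = [1, [2, [3, 4]], [], 5]
deep_recursive = flatten_recursive(sample)
deep_iterative = flatten_iterative(sample)
```

The iterative version avoids hitting the recursion limit on very deep nesting, at the cost of being harder to read.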

sum adds the items of a sequence together using the + operator, e.g. sum([1,2,3]) == 6. The second parameter is an optional start value, which defaults to 0, e.g. sum([1,2,3], 10) == 16.
In your example it computes [] + [1,2] + [3,4], where + on two lists concatenates them. The result is therefore [1,2,3,4].
The empty list is required as the second parameter because, as mentioned above, sum starts from 0 by default (i.e. 0 + [1,2] + [3,4]), which would raise TypeError: unsupported operand type(s) for +: 'int' and 'list'.
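To make the mechanics concrete, here is a rough pure-Python stand-in for sum (sum_like is a hypothetical name used only for this sketch; the real built-in is implemented in C and also rejects strings) together with the failing call that omits the start value:

```python
def sum_like(iterable, start=0):
    """Rough pure-Python equivalent of the built-in sum, for illustration only."""
    total = start
    for item in iterable:
        total = total + item  # for lists, + builds a new concatenated list each time
    return total


nested = [[1, 2], [3, 4]]
flattened = sum_like(nested, [])   # [] + [1, 2] + [3, 4]
builtin_result = sum(nested, [])

try:
    sum(nested)  # start defaults to 0, so this attempts 0 + [1, 2]
except TypeError as exc:
    error_message = str(exc)
```

Because every + creates a new list, the cost grows quadratically with the total number of elements, which is why a list comprehension or itertools.chain is generally preferred for flattening.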
This is the relevant section of the help for sum:
sum(sequence[, start]) -> value
Returns the sum of a sequence of
numbers (NOT strings) plus the value
of parameter 'start' (which defaults
to 0).
Note
As wallacoloo commented, this is not a general solution for flattening any multidimensional list. It only works for a list of 1D lists, because of the behavior described above.
Update
For a way to flatten 1 level of nesting see this recipe from the itertools page:
from itertools import chain

def flatten(listOfLists):
    "Flatten one level of nesting"
    return chain.from_iterable(listOfLists)
To flatten more deeply nested lists (including irregularly nested lists) see the accepted answer to this question (there are also some other questions linked to from that question itself.)
Note that the recipe returns an itertools.chain object (which is iterable) and the other question's answer returns a generator object so you need to wrap either of these in a call to list if you want the full list rather than iterating over it. e.g. list(flatten(my_list_of_lists)).

For any kind of multidimensional list, this code will flatten it to one dimension:
def flatten(l):
    try:
        return flatten(l[0]) + (flatten(l[1:]) if len(l) > 1 else []) if type(l) is list else [l]
    except IndexError:
        return []

It looks to me more like you're looking for a final answer of:
[3, 7]
For that you're best off with a list comprehension
>>> l=[[1,2],[3,4]]
>>> [x+y for x,y in l]
[3, 7]

I wrote a program to do multi-dimensional flattening using recursion. If anyone has comments on making the program better, you can always see me smiling:
def flatten(l):
    lf=[]
    li=[]
    ll=[]
    p=0
    for i in l:
        if type(i).__name__=='list':
            li.append(i)
        else:
            lf.append(i)
    ll=[x for i in li for x in i]
    lf.extend(ll)
    for i in lf:
        if type(i).__name__ =='list':
            #not completely flattened
            flatten(lf)
        else:
            p=p+1
            continue
    if p==len(lf):
        print(lf)

I've written this function:
def make_array_single_dimension(l):
    l2 = []
    for x in l:
        if type(x).__name__ == "list":
            l2 += make_array_single_dimension(x)
        else:
            l2.append(x)
    return l2
It works as well!

The + operator concatenates lists and the starting value is [] an empty list.

Related

Check number not a sum of 2 ints on a list

Given a list of integers, I want to check a second list and remove from the first only those which can not be made from the sum of two numbers from the second. So given a = [3,19,20] and b = [1,2,17], I'd want [3,19].
Seems like a cinch with two nested loops, except that I've gotten stuck with the break and continue commands.
Here's what I have:
def myFunction(list_a, list_b):
    for i in list_a:
        for a in list_b:
            for b in list_b:
                if a + b == i:
                    break
            else:
                continue
            break
        else:
            continue
        list_a.remove(i)
    return list_a
I know what I need to do, just the syntax seems unnecessarily confusing. Can someone show me an easier way? TIA!
You can do it like this:
In [13]: from itertools import combinations
In [15]: [item for item in a if item in [sum(i) for i in combinations(b,2)]]
Out[15]: [3, 19]
combinations yields every pair of elements from b, and the inner list holds the sums of those pairs. Each item of a is kept only if it appears among those sums.
Edit
If you don't want to use itertools, you can write a function for it, like this:
def comb(s):
    for i, v1 in enumerate(s):
        for j in range(i+1, len(s)):
            yield [v1, s[j]]
result = [item for item in a if item in [sum(i) for i in comb(b)]]
Comments on code:
It's very dangerous to delete elements from a list while iterating over it. Perhaps you could append items you want to keep to a new list, and return that.
Your current algorithm is O(nm^2), where n is the size of list_a, and m is the size of list_b. This is pretty inefficient, but a good start to the problem.
There are also a lot of unnecessary continue and break statements, which can lead to complicated code that is hard to debug.
You also put everything into one function. Consider splitting each task into its own function, such as one dedicated to finding pairs and one for checking each item in list_a against list_b. This is a way of splitting a problem into smaller problems and using them to solve the bigger one.
Overall I think your function is doing too much, and the logic could be condensed into much simpler code by breaking down the problem.
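The first comment is easy to see in a small experiment. The sketch below uses made-up helper names (remove_while_iterating, keep_in_new_list, is_even) and is only meant to show the effect, not to solve the original problem:

```python
def remove_while_iterating(values, should_remove):
    """Buggy pattern: mutating the list being iterated over skips elements."""
    for value in values:
        if should_remove(value):
            values.remove(value)
    return values


def keep_in_new_list(values, should_remove):
    """Safer pattern: leave the input alone and build a new list."""
    return [value for value in values if not should_remove(value)]


def is_even(n):
    return n % 2 == 0


buggy = remove_while_iterating([2, 4, 6, 7], is_even)  # 4 is skipped and survives
safe = keep_in_new_list([2, 4, 6, 7], is_even)
```

After each removal the remaining items shift left by one position while the loop's internal index still advances, so the element that slid into the freed slot is never examined.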
Another approach:
Since I found this task interesting, I decided to try it myself. My outlined approach is illustrated below.
1. You can first check if a list has a pair of a given sum in O(n) time using hashing:
def check_pairs(lst, sums):
    lookup = set()
    for x in lst:
        current = sums - x
        if current in lookup:
            return True
        lookup.add(x)
    return False
2. Then you could use this function to check whether any pair in list_b sums to each number iterated in list_a:
def remove_first_sum(list_a, list_b):
    new_list_a = []
    for x in list_a:
        check = check_pairs(list_b, x)
        if check:
            new_list_a.append(x)
    return new_list_a
This keeps the numbers in list_a that equal the sum of two numbers in list_b.
3. The above can also be written with a list comprehension:
def remove_first_sum(list_a, list_b):
    return [x for x in list_a if check_pairs(list_b, x)]
Both versions work as follows:
>>> remove_first_sum([3,19,20], [1,2,17])
[3, 19]
>>> remove_first_sum([3,19,20,18], [1,2,17])
[3, 19, 18]
>>> remove_first_sum([1,2,5,6],[2,3,4])
[5, 6]
Note: check_pairs runs in O(m) time for a list_b of m items, so the whole algorithm is O(n*m) for a list_a of n items, and it doesn't require anything too complicated. It does need O(m) extra auxiliary space, because a set is kept to record which items have been seen.
You can do it by first creating all possible sum combinations, then filtering out elements which don't belong to that combination list
Define the input lists
>>> a = [3,19,20]
>>> b = [1,2,17]
Next we will define all possible combinations of sum of two elements
>>> y = [i+j for k,j in enumerate(b) for i in b[k+1:]]
Next we apply a function to every element of list a that returns the element if it is present in the list calculated above and None otherwise; map accepts a lambda containing such a conditional expression. We then filter the result to remove the None values. Note that filter(None, ...) drops every falsy value, so a legitimate 0 would be removed as well.
>>> list(filter(None, map(lambda x: x if x in y else None,a)))
The above operation will output:
[3, 19]
You can also combine all of these lines into a one-liner, but I don't recommend it.
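A variation on the same idea, sketched here under the assumption that a pair means two elements at different positions of b: collect the pair sums in a set and filter with a list comprehension. The function name keep_pair_sums is made up for this example.

```python
from itertools import combinations


def keep_pair_sums(list_a, list_b):
    """Keep the items of list_a that equal the sum of two different elements of list_b."""
    # A set gives constant-time membership tests on average.
    pair_sums = {x + y for x, y in combinations(list_b, 2)}
    return [item for item in list_a if item in pair_sums]


kept = keep_pair_sums([3, 19, 20], [1, 2, 17])
kept_zero = keep_pair_sums([0, 5], [-1, 1])
```

Filtering with a comprehension also keeps a legitimate 0, which the filter(None, ...) version would silently discard.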
You can try something like this:
a = [3,19,20]
b= [1,2,17,5]
n_m_s=[]
data=[n_m_s.append(i+j) for i in b for j in b if i+j in a]
print(set(n_m_s))
print("after remove")
final_data=[]
for j,i in enumerate(a):
    if i not in n_m_s:
        final_data.append(i)
print(final_data)
output:
{19, 3}
after remove
[20]

python array creation syntax [for in range]

I came across the following syntax to create a python array. It is strange to me.
Can anyone explain it to me? And how should I learn this kind of syntax?
[str(index) for index in range(100)]
First of all, this is not an array; it is a list. Python does have built-in arrays, but they are rarely used (google the array module if you're interested). The structure you see is called a list comprehension. It is usually the fastest way to apply an operation to every element in pure Python. Let's go through some examples.
Simple list comprehensions are written this way:
[item for item in iterable] - this will build a list containing all items of an iterable.
Actually, you can do something with each item using an expression or a function: [item**2 for item in iterable] - this will square each element, or [f(item) for item in iterable] - f is a function.
You can even add if and else clauses: [number for number in range(10) if not number % 2] creates a list of even numbers, and ['even' if not number % 2 else 'odd' for number in range(10)] shows how to use else.
You can nest list comprehensions [[character for character in word] for word in words] - this will create a list of lists. List comprehensions are similar to generator expressions, so you should google Python docs for additional information.
List comprehensions and generator expressions are among the most powerful and valuable Python features. Just start an interactive session and play for a while.
P.S.
There are other types of comprehensions that create sets and dictionaries. They use the same concept. Google them for additional information.
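For reference, a minimal sketch of those other forms; the variable names and sample data are made up for illustration:

```python
words = ["apple", "bob", "cat", "bob"]

unique_lengths = {len(word) for word in words}        # set comprehension
length_by_word = {word: len(word) for word in words}  # dict comprehension
lazy_squares = (n * n for n in range(5))              # generator expression, evaluated lazily

total = sum(lazy_squares)  # consumes the generator: 0 + 1 + 4 + 9 + 16
```

The syntax differs only in the surrounding brackets and, for dictionaries, in the key: value pair at the front.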
The list comprehension is a concept derived from set comprehension (set-builder notation) in mathematics, where to get a new set you specify a parent set and a rule for filtering its elements.
In its simplest but complete form, a list comprehension looks like this:
[f(i) for i in range(1000) if i % 2 == 0]
range(1000) - the values you iterate through. It could be any iterable (list, tuple, etc.). range produces consecutive integers, e.g. list(range(4)) -> [0, 1, 2, 3]
i - the variable assigned on each iteration.
if i % 2 == 0 - the condition used to filter values. If the condition is not true, the resulting list will not contain that element.
f(i) - any Python expression or function call on i; its result goes into the resulting list.
To understand the concept of list comprehensions, try them out in the Python console and look at the output. Here are some of them:
[i for i in [1,2,3,4]]
[i for i in range(10)]
[i**2 for i in range(10)]
[max(4, i) for i in range(10)]
[(1 if i>5 else -1) for i in range(10)]
[i for i in range(10) if i % 2 == 0]
I recommend unwrapping every comprehension you come across into a for loop, to better understand its mechanics and syntax, until you get used to them. For example, your comprehension can be unwrapped this way:
newlist = []
for index in range(100):
    newlist.append(str(index))
I hope it's clear now.
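The same unwrapping works when the comprehension has a filter: the if clause simply becomes an if statement inside the loop. A small sketch with made-up variable names:

```python
comprehension = [i ** 2 for i in range(10) if i % 2 == 0]

# Equivalent explicit loop: the trailing "if" becomes a nested if statement.
unwrapped = []
for i in range(10):
    if i % 2 == 0:
        unwrapped.append(i ** 2)
```

Both names end up bound to equal lists, which is an easy way to check that an unwrapped loop matches the comprehension it came from.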

Understanding python policy for finding the minimum in a list of list

I have the following list of lists of values and I want to find the min value among all the values.
Q = [[8.85008011807927, 4.129896248976861, 5.556804136197901],
[8.047707185696948, 7.140707521433818, 7.150610818529693],
[7.5326340018228555, 7.065307672838521, 6.862894377422498]]
I was planning to do something like:
min(min(Q))
I tried this approach on a smaller example and it works:
>>> b = [[2,2],[1,9]]
>>> min(b)
[1, 9]
>>> min(min(b))
1
But using this on my original list Q it returns the wrong result:
>>> min(Q)
[7.5326340018228555, 7.065307672838521, 6.862894377422498]
>>> min(min(Q))
6.862894377422498
Why is this approach wrong?
Lists are compared using their lexicographical order [1] (i.e. first elements compared, then the second, then the third and so on), so just because list_a < list_b doesn't mean that the smallest element in list_a is less than the smallest element in list_b, which is why your approach doesn't work in the general case.
For example, consider this:
>>> l1 = [3, 0]
>>> l2 = [2, 1]
>>>
>>> min(l1, l2)
[2, 1]
min(l1, l2) is [2, 1] because the first element of l1 (3) is compared first with that of l2 (2). Since 2 < 3, l2 is returned as the minimum without any further comparisons. However, it is l1 that contains the smallest number of both lists (0), which occurs after the first element. Therefore, taking the min of min(l1, l2) gives us the incorrect result of 1.
A good way to address this would be to find the minimum of the "flattened" list, which can be obtained with a generator:
>>> Q = [[8.85008011807927, 4.129896248976861, 5.556804136197901],
... [8.047707185696948, 7.140707521433818, 7.150610818529693],
... [7.5326340018228555, 7.065307672838521, 6.862894377422498]]
>>>
>>> min(a for sub in Q for a in sub) # <--
4.129896248976861
(+1 to @Ffisegydd for posting a solution along these lines first.)
[1] From http://docs.python.org/3/tutorial/datastructures.html#comparing-sequences-and-other-types:
Sequence objects may be compared to other objects with the same sequence type. The comparison uses lexicographical ordering: first the first two items are compared, and if they differ this determines the outcome of the comparison; if they are equal, the next two items are compared, and so on, until either sequence is exhausted. If two items to be compared are themselves sequences of the same type, the lexicographical comparison is carried out recursively. If all items of two sequences compare equal, the sequences are considered equal. If one sequence is an initial sub-sequence of the other, the shorter sequence is the smaller (lesser) one.
Your approach didn't work properly because of how Python compares sequences: lists are compared element by element, not by their smallest values.
I want to find the min value among all the values.
If you want to find the minimum of all the values, you can do something like this:
print(min(map(min, Q)))
# 4.129896248976861
You can use a generator expression coupled with the min function to find the answer:
Q = [[8.85008011807927, 4.129896248976861, 5.556804136197901],
[8.047707185696948, 7.140707521433818, 7.150610818529693],
[7.5326340018228555, 7.065307672838521, 6.862894377422498]]
minimum = min(i for j in Q for i in j)
print(minimum) # 4.129896248976861
This generator expression flattens your list of lists and then simply returns the minimum value.
min(map(min,Q)) is the command you're looking for.
min(Q) returns the "minimum" list in Q, which is the list that has the smallest first element.
Therefore, min(min(Q)) returns the smallest element of the list with the smallest first element, which is not what you want.
You could use
min(min(x) for x in Q)
instead, which returns the smallest of the minimums of all lists in Q.
What you really want is to flatten that list and then find the minimum:
min(value for row in Q for value in row)
There are lots of answers, but the easiest way IMHO is to make the 'list of lists' into a single list using itertools.chain.from_iterable:
from itertools import chain
min(chain.from_iterable(Q))
or the shorter and just as easy to read (to me) version:
min(chain(*Q))
I think I found why,
min applied on a list of lists will compare the first values of each sublist.
>>> b=[[3,1],[2,5]]
>>> min(b)
[2, 5]
min(Q) does not always return the list that contains the minimum of all values. That's why your approach is wrong.
You must find the min value of each list, collect those values into another list, and then find the min of that list.
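The approaches from the answers above can be checked side by side on the Q from the question. This is only a comparison sketch; the variable names are chosen for illustration:

```python
from itertools import chain

Q = [[8.85008011807927, 4.129896248976861, 5.556804136197901],
     [8.047707185696948, 7.140707521433818, 7.150610818529693],
     [7.5326340018228555, 7.065307672838521, 6.862894377422498]]

wrong = min(min(Q))                               # min of the lexicographically smallest row
by_rows = min(map(min, Q))                        # smallest of the per-row minimums
by_generator = min(v for row in Q for v in row)   # min over a flattening generator
by_chain = min(chain.from_iterable(Q))            # min over a chained iterator
```

Only the first expression depends on which row happens to compare smallest; the other three always inspect every value.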

python list.iteritems replacement

I've got a list in which some items shall be moved into a separate list (by a comparator function). Those elements are pure dicts. The question is how should I iterate over such list.
When iterating the simplest way, for element in mylist, I don't know the index of the element. There's no .iteritems() method for lists, which could be useful here. So I've tried for index in range(len(mylist)):, which [1] seems over-complicated for Python and [2] does not satisfy me, since range(len(mylist)) is calculated once at the beginning, and if I remove an element from the list during iteration, I'll get IndexError: list index out of range.
Finally, my question is - how should I iterate over a python list, to be able to remove elements from the list (using a comparator function and put them in another list)?
You can use enumerate function and make a temporary copy of the list:
for i, value in enumerate(old_list[:]):
    # i == index of value in the copy
    # value == dictionary
    # you can safely call old_list.remove(value) here because we are iterating over a copy
    ...
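A fuller sketch of this idea follows, with a made-up function name (move_matching) and sample data. One assumption to note: items are removed by value rather than by index, because the indices produced by enumerate refer to the copy and drift out of step with old_list after the first removal.

```python
def move_matching(old_list, comparator):
    """Move the items for which comparator(item) is true out of old_list and return them."""
    moved = []
    for value in old_list[:]:        # iterate over a shallow copy
        if comparator(value):
            old_list.remove(value)   # remove by value; a stale index could hit the wrong item
            moved.append(value)
    return moved


items = [{"id": 1, "done": True}, {"id": 2, "done": False}, {"id": 3, "done": True}]
done_items = move_matching(items, lambda d: d["done"])
```

Each remove call scans the list, so this is quadratic in the worst case; for large lists, building two new lists is likely the better choice.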
Creating a new list really isn't much of a problem compared to removing items from the old one. Similarly, iterating twice is a very minor performance hit, probably swamped by other factors. Unless you have a very good reason to do otherwise, backed by profiling your code, I'd recommend iterating twice and building two new lists:
from itertools import filterfalse
l1 = list(filter(condition, l))
l2 = list(filterfalse(condition, l))
You can slice-assign the contents of one of the new lists into the original if you want:
l[:] = l1
If you're absolutely sure you want a 1-pass solution, and you're absolutely sure you want to modify the original list in place instead of creating a copy, the following avoids quadratic performance hits from popping from the middle of a list:
j = 0
l2 = []
for i in range(len(l)):
    if condition(l[i]):
        l[j] = l[i]
        j += 1
    else:
        l2.append(l[i])
del l[j:]
We move each element of the list directly to its final position without wasting time shifting elements that don't really need to be shifted. We could use for item in l if we wanted, and it'd probably be a bit faster, but when the algorithm involves modifying the thing we're iterating over, I prefer the explicit index.
I prefer not to touch the original list and do as @Martol1ni, but one way to do it in place and not be affected by the removal of elements would be to iterate backwards:
for i in reversed(range(len(mylist))):
    # do the filtering...
That will affect only the indices of elements that you have tested/removed already
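Filled in, the backward loop could look like the sketch below. The function name extract_backwards and the sample comparator are assumptions made for illustration:

```python
def extract_backwards(mylist, comparator):
    """Pop matching items from mylist in place, walking from the end to the start."""
    removed = []
    for i in reversed(range(len(mylist))):
        if comparator(mylist[i]):
            # Popping index i only shifts items at higher indices, which were already visited.
            removed.append(mylist.pop(i))
    removed.reverse()  # restore the original relative order
    return removed


numbers = [1, 2, 3, 4, 5, 6]
evens = extract_backwards(numbers, lambda n: n % 2 == 0)
```

Popping from the middle of a list is still linear per call, so this stays quadratic in the worst case even though the indices remain valid.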
Try the filter command, and you can override the original list with it too if you don't need it.
def cmp(i): #Comparator function returning a boolean for a given item
    ...
# mylist is the initial list
mylist = filter(cmp, mylist)
In Python 3, mylist is now a lazy iterator over the suitable items. Use list(mylist) if you need to use it more than once.
Haven't tried this yet but.. i'll give it a quick shot:
new_list = [old.pop(i) for i, x in reversed(list(enumerate(old))) if comparator(x)]
You can do this, might be one line too much though.
new_list1 = [x for x in old_list if your_comparator(x)]
new_list2 = [x for x in old_list if x not in new_list1]

Comparisons with loop in python

I have a problem:
list = [1,2,3,4,5]
a = 3
if a==[item for item in list]:
    print(sth)
Why does the program never print?
Thanks...
You're comparing an integer to a list, which will never return True as they are different types. Note that [item for item in list] is exactly the same as just saying list.
You're probably wondering if 3 is in the list; so you can do:
if a in list:
    print(sth)
Or even:
if any(a == item for item in list):
    print(sth)
(Although you really should just use the first option. I only put the second option in as it looks similar to your example :p)
As a side note, you shouldn't name lists list, or dictionaries dict, as those are the names of built-in types and you're just shadowing them :p.
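Both points can be seen in a short sketch. The names shadowing_demo and numbers are made up for this example, and the shadowing is confined to a function so it does not leak into the rest of the program:

```python
def shadowing_demo():
    """Show what happens when a local variable is named after the built-in list type."""
    list = [1, 2, 3]          # this local name now hides the built-in type
    try:
        list("abc")           # tries to call the list object, not the type
    except TypeError as exc:
        return str(exc)


message = shadowing_demo()

numbers = [1, 2, 3, 4, 5]     # a name that does not clash with a built-in
a = 3
equal_to_list = (a == numbers)   # an int never compares equal to a list
found = a in numbers             # membership test, which is what was intended
```

Inside the function the call fails because list refers to the data, while outside it the built-in is untouched.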
