Unwanted side effects when flattening a list [duplicate] - python

This question already has answers here:
Content of list change unexpected in Python 3
(2 answers)
Closed 2 years ago.
I was just testing some algorithms to flatten a list, so I created 3 lists inside a list and then tried to flatten it. I never touch the original list and the variables are named differently, but when I look at the original list afterwards, it has been modified. Any idea why this is happening?
In [63]: xxx = [['R', 'L', 'D'], ['U', 'O', 'E'], ['C', 'S', 'O']]
In [64]: def flat_ind(lst):
...: one = lst[0]
...: for l in lst[1:]:
...: one += l
...: return one
...:
In [65]: flat = flat_ind(xxx)
In [66]: flat
Out[66]: ['R', 'L', 'D', 'U', 'O', 'E', 'C', 'S', 'O']
In [67]: xxx
Out[67]:
[['R', 'L', 'D', 'U', 'O', 'E', 'C', 'S', 'O'],
['U', 'O', 'E'],
['C', 'S', 'O']]
I understand that one is still pointing to the original lst and that is the reason it is modifying it, but I thought that, since this was inside a function, it would not happen. More importantly:
how do I make this not happen?
Thanks!

"I understand that one is still pointing to the original lst and that is the reason it is modifying it, but still, I though that, since this was inside a function, it would not happen,"
That doesn't make any sense. It doesn't matter where you mutate an object, it will still be mutated.
In any case, the mutation occurs because of this:
one += l
which is an in-place modification. You could use
one = one + l
instead, but that would be highly inefficient. As others have pointed out, you could just copy that first list,
one = lst[0][:]
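Putting that fix into the question's own function gives, as a quick sketch:
def flat_ind(lst):
    one = lst[0][:]   # copy the first sublist so the caller's list is left alone
    for l in lst[1:]:
        one += l      # += now extends only the copy
    return one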
But the idiomatic way to flatten a regularly nested list like this is to simply:
flat = [x for sub in xxx for x in sub]
Or,
from itertools import chain
flat = list(chain.from_iterable(xxx))

How does Python's random.shuffle() alter the value of a list without returning anything and is this something any Python coder can do as well? [duplicate]

This question already has answers here:
Why can a function modify some arguments as perceived by the caller, but not others?
(13 answers)
Why does random.shuffle return None?
(5 answers)
Closed 2 years ago.
edit:
When I asked this I did not properly understand the concept of mutable and immutable objects, and the variables that point to them
I just noticed I wasn't getting a return from random.shuffle(). I realised this makes sense as you would logically want to work with the original list unless specified otherwise.
>>> import string
>>> import random
>>> letters = list(string.ascii_lowercase)
>>> print(letters)
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
>>> rand_alpha = random.shuffle(letters)
>>> print(rand_alpha)
None
>>> print(letters)
['f', 'c', 'n', 'u', 'x', 'y', 'q', 'j', 's', 'v', 'w', 'o', 'p', 'z', 't', 'm', 'k', 'd', 'e', 'a', 'g', 'i', 'h', 'l', 'r', 'b']
This led me to wonder whether altering a list in another scope is something people do (and should be cautious not to do accidentally), or whether this is something special within the Python standard library.
I checked the code - and did a few searches - but I didn't find anything that made this clearer for me.
There are functions that work in place and others that return their result. For the second kind, you have to assign the return value to something in order to make use of it.
Probably the easiest/most common example is the .sort list method versus the sorted function; the first works in place, much like your shuffle, while the second returns a sorted copy of the list passed in.
Does this help?
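A small sketch of that contrast, assuming you want a shuffled copy rather than an in-place shuffle (random.sample drawn with the full length is one common way to get one):
import random

letters = ['d', 'a', 'c', 'b']

letters.sort()                   # in place, returns None
sorted_copy = sorted(letters)    # returns a new sorted list

random.shuffle(letters)                               # in place, returns None
shuffled_copy = random.sample(letters, len(letters))  # returns a new shuffled list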
The shuffle function shuffles the list in place and stores the result in that same list; it does not make a separate list. If you want a separate list, copy the list first and then shuffle the copy, like this:
import random
letters = ["a", "v"]
lettersCopy = letters[:]      # an actual copy; lettersCopy = letters would just be a second name for the same list
random.shuffle(lettersCopy)
You should always alter data (lists) carefully, because user-entered data cannot be recovered once it has been overwritten. Altering a list in place from another scope is usually done to save memory.

How to create a new sublist from a list whose length depends on another list [duplicate]

This question already has answers here:
Splitting a string by list of indices
(4 answers)
Closed 4 years ago.
I would like to create a new list containing sub-lists whose lengths depend on another list. For example I have:
a = [1,3,2,5,4]
b = ['a','b','c','d','e','f','g','h','i','l','m','n','o','p','q']
and I would like to have a nested list of the form:
[['a'],
['b', 'c', 'd'],
['e', 'f'],
['g', 'h', 'i', 'l', 'm'],
['n', 'o', 'p', 'q']]
Steps to solve this:
create an empty list
create an int done = 0 that tells you how many items you have already sliced from your data
loop over all elements of your "how to cut the other list into parts" list
slice from done to done + the current element of that list and append the slice to the empty list
increment done by the number of items you just sliced
add the remainder of your data, if needed
print it.
Docs:
Understanding Python's slice notation
How to "properly" print a list?
and infos about looping here: looping techniques PyTut
You can read about why we do not solve your homework for you here: open letter to students with homework problems
a = [1,3,2,5,4]
b = ['a','b','c','d','e','f','g','h','i','l','m','n','o','p','q']
out = []
for number in a:
    out.append(b[:number])
    b = b[number:]
print(out)
# [['a'], ['b', 'c', 'd'], ['e', 'f'], ['g', 'h', 'i', 'l', 'm'], ['n', 'o', 'p', 'q']]
Description
out is the final output list. The loop iterates through each element number in list a and appends a slice of that many elements from the front of list b to the output list; it then rebinds b so that those elements are removed.
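If you would rather not consume b while building the result, here is a minimal sketch using itertools.islice (split_by_lengths is just an illustrative name):
from itertools import islice

def split_by_lengths(data, lengths):
    it = iter(data)
    # each islice call pulls the next n items off the shared iterator
    return [list(islice(it, n)) for n in lengths]

a = [1,3,2,5,4]
b = ['a','b','c','d','e','f','g','h','i','l','m','n','o','p','q']
print(split_by_lengths(b, a))
# [['a'], ['b', 'c', 'd'], ['e', 'f'], ['g', 'h', 'i', 'l', 'm'], ['n', 'o', 'p', 'q']]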

Merging a list of strings and a list of lists

This may be a duplicate, but I couldn't find a specific answer.
I did find one approach while composing this question, but I would like to know if there is a better option, or one that works without knowing which argument is the list of strings.
My question:
la=['a', 'b', 'c']
lb=[['d','e'], ['f','g'], ['i','j']]
I would like:
[['a','d','e'], ['b','f','g'], ['c','i','j']]
I discovered the following works specifically for my example;
la=['a', 'b', 'c']
lb=[['d','e'], ['f','g'], ['i','j']]
[ [x] + y for x,y in zip(la, lb)]
[['a', 'd', 'e'], ['b', 'f', 'g'], ['c', 'i', 'j']]
It works because I wrap each string in a one-element list before concatenating, which avoids the TypeError: cannot concatenate 'str' and 'list' objects.
Is there a more elegant solution?
You can use numpy.column_stack:
>>> la=['a', 'b', 'c']
>>> lb=[['d','e'], ['f','g'], ['i','j']]
>>> import numpy as np
>>> np.column_stack((la,lb))
array([['a', 'd', 'e'],
['b', 'f', 'g'],
['c', 'i', 'j']],
dtype='|S1')
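Note that column_stack returns a NumPy array of fixed-width strings rather than a plain list of lists; if you need the latter, .tolist() converts it back (a small sketch, assuming the same la and lb):
>>> np.column_stack((la, lb)).tolist()
[['a', 'd', 'e'], ['b', 'f', 'g'], ['c', 'i', 'j']]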
If you want an expression, I can't think of anything better than using zip as above. If you want to explicitly insert the elements from la at the head of the corresponding elements of lb, I'd do
for i in range(len(la)):
    lb[i].insert(0, la[i])
which avoids having to know what zip is or does. Maybe also check first:
if len(la) != len(lb): raise IndexError("List lengths differ")
Without that it will "work" when lb is longer than la. By the way, this isn't exactly the same as the zip version with respect to corner cases / duck typing; it seems safer to use insert, a method that should exist only on a list-like object, than "+".
Also, purely personally, I'd write the above on one line:
for i in range(len(la)): lb[i].insert(0, la[i])

Removing item from list causes the list to become NoneType [duplicate]

This question already has answers here:
Why do these list operations (methods: clear / extend / reverse / append / sort / remove) return None, rather than the resulting list?
(6 answers)
Closed 5 months ago.
I imagine there is a simple solution that I am overlooking. Better that than a complicated one, right?
Simply put:
var = ['p', 's', 'c', 'x', 'd'].remove('d')
causes var to be of type None. What is going on here?
remove doesn't return anything. It modifies the existing list in-place. No assignment needed.
Replace
var = ['p', 's', 'c', 'x', 'd'].remove('d')
with
var = ['p', 's', 'c', 'x', 'd']
var.remove('d')
Now var will have a value of ['p', 's', 'c', 'x'].
remove mutates the list in place and returns None. You have to bind the list to a variable first, then call remove on it:
>>> var = ['p', 's', 'c', 'x', 'd']
>>> var.remove('d') # Notice how it doesn't return anything.
>>> var
['p', 's', 'c', 'x']
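If you would rather leave the original list alone and get a new list in one expression, a list comprehension is one option; note that, unlike remove, it drops every 'd', not just the first (a small sketch):
>>> src = ['p', 's', 'c', 'x', 'd']
>>> var = [item for item in src if item != 'd']
>>> var
['p', 's', 'c', 'x']
>>> src
['p', 's', 'c', 'x', 'd']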

Ordered Sets Python 2.7

I have a list that I'm attempting to remove duplicate items from. I'm using Python 2.7.1, so I can simply use the set() function; however, this reorders my list, which for my particular case is unacceptable.
Below is a function I wrote that does this. However, I'm wondering if there's a better/faster way. Any comments on it would also be appreciated.
def ordered_set(list_):
    newlist = []
    lastitem = None
    for item in list_:
        if item != lastitem:
            newlist.append(item)
            lastitem = item
    return newlist
The above function assumes that none of the items will be None and that the duplicates are grouped together (e.g. ['a', 'a', 'a', 'b', 'b', 'c', 'd']), which it returns as ['a', 'b', 'c', 'd'].
Another very fast method with set:
def remove_duplicates(lst):
    dset = set()
    # relies on the fact that dset.add() always returns None
    return [item for item in lst
            if item not in dset and not dset.add(item)]
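For example, with the list from the question:
>>> remove_duplicates(['a', 'a', 'a', 'b', 'b', 'c', 'd'])
['a', 'b', 'c', 'd']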
Use an OrderedDict:
from collections import OrderedDict

l = ['a', 'a', 'a', 'b', 'b', 'c', 'd']
d = OrderedDict()
for x in l:
    d[x] = True

# prints a b c d
for x in d:
    print x,
print
Assuming the input sequence is unordered, here's an O(N) solution (both in space and time).
It produces a sequence with duplicates removed, while leaving unique items in the same relative order as they appeared in the input sequence.
>>> def remove_dups_stable(s):
...     seen = set()
...     for i in s:
...         if i not in seen:
...             yield i
...             seen.add(i)
...
>>> list(remove_dups_stable(['q', 'w', 'e', 'r', 'q', 'w', 'y', 'u', 'i', 't', 'e', 'p', 't', 'y', 'e']))
['q', 'w', 'e', 'r', 'y', 'u', 'i', 't', 'p']
I know this has already been answered, but here's a one-liner (plus import):
from collections import OrderedDict
def dedupe(_list):
    return OrderedDict((item, None) for item in _list).keys()
>>> dedupe(['q', 'w', 'e', 'r', 'q', 'w', 'y', 'u', 'i', 't', 'e', 'p', 't', 'y', 'e'])
['q', 'w', 'e', 'r', 'y', 'u', 'i', 't', 'p']
I think this is perfectly OK. You get O(n) performance which is the best you could hope for.
If the list were unordered, then you'd need a helper set to contain the items you've already visited, but in your case that's not necessary.
If your list isn't sorted, then your question doesn't make sense:
e.g. [1, 2, 1] could become [1, 2] or [2, 1].
If your list is large, you may want to write your result back into the same list using a slice assignment to save memory:
>>> x=['a', 'a', 'a', 'b', 'b', 'c', 'd']
>>> x[:]=[x[i] for i in range(len(x)) if i==0 or x[i]!=x[i-1]]
>>> x
['a', 'b', 'c', 'd']
For inline deleting, see Remove items from a list while iterating or Remove items from a list while iterating without using extra memory in Python.
One trick you can use: if you know x is sorted and you know x[i] == x[i+j], then you don't need to check anything between x[i] and x[i+j] (and if you don't need to delete these j values, you can just copy the values you want into a new list).
So while you can't beat n operations if everything in the list is unique, i.e. len(set(x)) == len(x),
there is probably an algorithm that needs n comparisons in the worst case but can get by with n/2 in the best case (or fewer than n/2 if you somehow know in advance that len(x)/len(set(x)) > 2 because of the data you've generated).
The optimal algorithm would probably use binary search to find the maximum j for each minimum i in a divide-and-conquer approach, with initial divisions of roughly len(x) / approximated(len(set(x))). Hopefully it could be carried out so that even if len(x) == len(set(x)) it still uses only n operations.
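A minimal sketch of that idea for a sorted list, using bisect_right to jump past each run of equal values (unique_sorted is just an illustrative name; it does roughly one binary search per distinct value):
from bisect import bisect_right

def unique_sorted(x):
    out = []
    i = 0
    while i < len(x):
        out.append(x[i])
        # jump past the whole run of values equal to x[i]
        i = bisect_right(x, x[i], i)
    return out

print(unique_sorted(['a', 'a', 'a', 'b', 'b', 'c', 'd']))   # ['a', 'b', 'c', 'd']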
There is a unique_everseen solution described in
http://docs.python.org/2/library/itertools.html
from itertools import ifilterfalse

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in ifilterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
Looks ok to me. If you really want to use sets do something like this:
def ordered_set(_list):
    result = set()
    lastitem = None
    for item in _list:
        if item != lastitem:
            result.add(item)
            lastitem = item
    return sorted(tuple(result))
I don't know what performance you will get; you should test it. It will probably be about the same because of the method-call overhead!
If you really are paranoid, just like me, read here:
http://wiki.python.org/moin/HowTo/Sorting/
http://wiki.python.org/moin/PythonSpeed/PerformanceTips
Just remembered this (it contains the answer):
http://www.peterbe.com/plog/uniqifiers-benchmark
