Python: Joining characters in sublists of list of lists - python

I have a huge list of lists, this is a section of it:
[['cusA', 'zupT', 'rcnA', 'cusA', 'zupT', 'zupT']]
I did the following operation on the entire list of lists:
[list(x) for x in set(tuple(x) for x in my_list)]
because I would like to have unique information in the sublists. This returned the following:
[['c', 'u', 's', 'A'], ['r', 'c', 'n', 'A'], ['z', 'u', 'p', 'T']]
Which is great, since it did become unique, but now I need them to be in their original form, without being broken up character by character.
Is there any way to re-join them inside the sublists?

Instead of list(x), use ''.join(x).
But you can just put the strings themselves in a set instead of calling tuple: list(set(my_list)).

If the ordering of the contents of the inner lists does not matter, you can turn them into a set, which is an unordered collection of unique elements, and then turn that set back into a list:
result = [list(set(li)) for li in my_list]
Prints:
[['cusA', 'rcnA', 'zupT']]

as you already mentioned: you can join the strings:
print(''.join(['c', 'u', 's', 'A'])) # cusA
for your whole list you could do this:
lst = [['c', 'u', 's', 'A'], ['r', 'c', 'n', 'A'], ['z', 'u', 'p', 'T']]
str_lst = [''.join(item) for item in lst]
print(str_lst) # ['cusA', 'rcnA', 'zupT']
note that there is no point in creating a list of single characters; a string itself behaves much like a list of characters (an immutable one, though), so you could directly do this:
print(set(['cusA', 'zupT', 'rcnA', 'cusA', 'zupT', 'zupT']))
# {'zupT', 'cusA', 'rcnA'}
# if you need a list again instead of a set:
print(list(set(['cusA', 'zupT', 'rcnA', 'cusA', 'zupT', 'zupT'])))
# ['zupT', 'cusA', 'rcnA']
that will not preserve the order though...
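If the original order matters, one option is to de-duplicate through dict.fromkeys instead of set. This is a minimal sketch, assuming Python 3.7 or later (where plain dicts keep insertion order); the variable names and the second sublist are made up for illustration:

```python
genes = ['cusA', 'zupT', 'rcnA', 'cusA', 'zupT', 'zupT']

# dict keys are unique and (Python 3.7+) keep first-seen order
unique_in_order = list(dict.fromkeys(genes))
print(unique_in_order)  # ['cusA', 'zupT', 'rcnA']

# the same idea applied to each sublist of a list of lists
# (the second sublist is hypothetical sample data)
my_list = [genes, ['rcnA', 'rcnA', 'cusA']]
result = [list(dict.fromkeys(sub)) for sub in my_list]
print(result)  # [['cusA', 'zupT', 'rcnA'], ['rcnA', 'cusA']]
```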

Related

Python rearrange list based on another list

I want to rearrange a list based on another list which have common elements between them.
my_list = ['q','s','b','f','l','c','x','a']
base_list = ['z','a','b','c']
The lists above have 'a', 'b' and 'c' as common elements. The expected outcome is as below:
my_result = ['a','b','c','q','s','f','l','x']
Thanks in Advance
Sky
my_list = ['q','s','b','f','l','c','x','a']
base_list = ['z','a','b','c']
res1=[x for x in base_list if x in my_list] # common elements
res2=[x for x in my_list if x not in res1] # remaining elements, in their original order
res3=res1+res2
Output :
['a', 'b', 'c', 'q', 's', 'f', 'l', 'x']
Create a custom key for sorted as shown in this document. Set the value arbitrarily high for the letters that don't appear in base_list so they end up at the back. Since sorted is stable, the letters that aren't in base_list keep their original relative order.
l = ['q','s','b','f','l','c','x','a']
base_list = ['z','a','b','c']
def custom_key(letter):
    try:
        return base_list.index(letter)
    except ValueError:
        return 1_000
sorted(l, key=custom_key)
['a', 'b', 'c', 'q', 's', 'f', 'l', 'x']
A (probably non optimal) way:
>>> sorted(my_list, key=lambda x: base_list.index(x) if x in base_list
...        else len(base_list)+1)
['a', 'b', 'c', 'q', 's', 'f', 'l', 'x']
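Both key functions above call base_list.index for every element, which scans base_list each time. For longer lists, one possible variation is to precompute each element's position in a dict. This is only a sketch; the name rank is an assumption, not something from the answers:

```python
my_list = ['q', 's', 'b', 'f', 'l', 'c', 'x', 'a']
base_list = ['z', 'a', 'b', 'c']

# map each base element to its position (hypothetical helper name)
rank = {value: i for i, value in enumerate(base_list)}

# elements missing from base_list all get the same high rank;
# because sorted is stable, they keep their original relative order
result = sorted(my_list, key=lambda x: rank.get(x, len(base_list)))
print(result)  # ['a', 'b', 'c', 'q', 's', 'f', 'l', 'x']
```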

Unwanted side effects when flattening a list [duplicate]

I was just testing some algorithms to flatten a list, so I created 3 lists inside a list, and then tried to flatten it. I never touch the original list, the variables are named different, but when I try to see the original list, it has been modified, any idea why this is happening?
In [63]: xxx = [['R', 'L', 'D'], ['U', 'O', 'E'], ['C', 'S', 'O']]
In [64]: def flat_ind(lst):
    ...:     one = lst[0]
    ...:     for l in lst[1:]:
    ...:         one += l
    ...:     return one
    ...:
In [65]: flat = flat_ind(xxx)
In [66]: flat
Out[66]: ['R', 'L', 'D', 'U', 'O', 'E', 'C', 'S', 'O']
In [67]: xxx
Out[67]:
[['R', 'L', 'D', 'U', 'O', 'E', 'C', 'S', 'O'],
['U', 'O', 'E'],
['C', 'S', 'O']]
I understand that one is still pointing to the original lst and that is the reason it is modifying it, but still, I thought that, since this was inside a function, it would not happen. More importantly,
how do I make this not happen?
Thanks!
"I understand that one is still pointing to the original lst and that is the reason it is modifying it, but still, I thought that, since this was inside a function, it would not happen,"
That doesn't make any sense. It doesn't matter where you mutate an object, it will still be mutated.
In any case, the mutation occurs because of this:
one += l
which is an in-place modification. You could use
one = one + l
instead, but that would be highly inefficient. As others have pointed out, you could just copy that first list,
one = lst[0][:]
But the idiomatic way to flatten a regularly nested list like this is to simply:
flat = [x for sub in xxx for x in sub]
Or,
from itertools import chain
flat = list(chain.from_iterable(xxx))
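The difference between the in-place += and a rebinding + can be checked directly. The sketch below is an illustration only; flat_copy is a hypothetical variant of the question's flat_ind that copies the first sublist before extending it, and it assumes a non-empty outer list:

```python
# += on a list extends the existing object, so every name bound to it sees the change
a = [1, 2]
b = a
b += [3]
print(a)  # [1, 2, 3]

# + builds a new list and rebinds only the name on the left
c = a
c = c + [4]
print(a)  # [1, 2, 3]
print(c)  # [1, 2, 3, 4]

def flat_copy(lst):
    """Hypothetical variant of flat_ind that leaves its argument untouched."""
    one = list(lst[0])   # shallow copy of the first sublist
    for sub in lst[1:]:
        one += sub       # still in place, but only on the copy
    return one

xxx = [['R', 'L', 'D'], ['U', 'O', 'E'], ['C', 'S', 'O']]
flat = flat_copy(xxx)
print(flat)    # ['R', 'L', 'D', 'U', 'O', 'E', 'C', 'S', 'O']
print(xxx[0])  # ['R', 'L', 'D']
```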

How to create a new sublist from a list whose length depends on another list [duplicate]

I would like to create a new list with sub-lists inside whose length depends on another list, for example I have:
a = [1,3,2,5,4]
b = ['a','b','c','d','e','f','g','h','i','l','m','n','o','p','q']
and I would like to have a nested list of the form:
[['a'],
['b', 'c', 'd'],
['e', 'f'],
['g', 'h', 'i', 'l', 'm'],
['n', 'o', 'p', 'q']]
Steps to solve this:
create an empty list
create an int done=0 that tells you how many things you already sliced from your data
loop over all elements of your "how to cut the other list in parts"-list
slice from done to done + whatever the current element of your "how to cut the other list in parts" is and append it to the empty list
increment done by whatever you just sliced
add the (if needed) remainder of your data
print it.
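The steps above could be sketched as follows. This is one possible reading of them, using the sample data from the question; the names out, done and size are illustrative:

```python
a = [1, 3, 2, 5, 4]
b = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'l', 'm', 'n', 'o', 'p', 'q']

out = []   # the empty result list
done = 0   # how many items of b have been sliced off so far

for size in a:
    out.append(b[done:done + size])  # slice from done to done + size
    done += size                     # remember how far we got

# add the remainder, if the sizes in a did not use up all of b
if done < len(b):
    out.append(b[done:])

print(out)
# [['a'], ['b', 'c', 'd'], ['e', 'f'], ['g', 'h', 'i', 'l', 'm'], ['n', 'o', 'p', 'q']]
```

Unlike repeatedly rebinding b to a shorter slice, this leaves b itself unchanged.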
Docs:
Understanding Python's slice notation
How to "properly" print a list?
and info about looping here: looping techniques PyTut
You can read about why we do not solve your homework for you here: open letter to students with homework problems
a = [1,3,2,5,4]
b = ['a','b','c','d','e','f','g','h','i','l','m','n','o','p','q']
out=[]
for number in a:
    out.append(b[:number])
    b=b[number:]
print(out)
#[['a'], ['b', 'c', 'd'], ['e', 'f'], ['g', 'h', 'i', 'l', 'm'], ['n', 'o', 'p', 'q']]
Description
The out is the final output list. The loop iterates through each element in list a (say 'number') and appends a list of that many elements from the start of list b to our output list. Then it proceeds to update list b so that those elements are removed.
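The loop described above copies the tail of b on every pass. An alternative, offered here only as a sketch and not as part of the original answer, is to walk b once with an iterator and take chunks from it with itertools.islice:

```python
from itertools import islice

a = [1, 3, 2, 5, 4]
b = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'l', 'm', 'n', 'o', 'p', 'q']

it = iter(b)  # a single iterator shared by all chunks
# each islice call consumes the next `number` items from the iterator
out = [list(islice(it, number)) for number in a]

print(out)
# [['a'], ['b', 'c', 'd'], ['e', 'f'], ['g', 'h', 'i', 'l', 'm'], ['n', 'o', 'p', 'q']]
```

If b is shorter than sum(a), the later chunks simply come out short or empty rather than raising an error.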

String vs list membership check

So I'm wondering why this:
'alpha' in 'alphanumeric'
is True, but
list('alpha') in list('alphanumeric')
is False.
Why does x in s succeed when x is a substring of s, but x in l doesn't when x is a sublist of l?
When you use the list function with any iterable, a new list object is created with the elements of the iterable as its individual elements.
In your case, strings are valid Python iterables, so
>>> list('alpha')
['a', 'l', 'p', 'h', 'a']
>>> list('alphanumeric')
['a', 'l', 'p', 'h', 'a', 'n', 'u', 'm', 'e', 'r', 'i', 'c']
So, you are effectively checking if one list is a sublist of another list.
In Python, only strings (and the similar bytes types) use the in operator to check whether one sequence is part of another. For all other collections, in only tests for individual members. Quoting the documentation,
The operators in and not in test for collection membership. x in s evaluates to true if x is a member of the collection s, and false otherwise. x not in s returns the negation of x in s. The collection membership test has traditionally been bound to sequences; an object is a member of a collection if the collection is a sequence and contains an element equal to that object. However, it make sense for many other object types to support membership tests without being a sequence. In particular, dictionaries (for keys) and sets support membership testing.
For the list and tuple types, x in y is true if and only if there exists an index i such that x == y[i] is true.
For the Unicode and string types, x in y is true if and only if x is a substring of y. An equivalent test is y.find(x) != -1. Note, x and y need not be the same type; consequently, u'ab' in 'abc' will return True. Empty strings are always considered to be a substring of any other string, so "" in "abc" will return True.
For the second one you are asking if
['a', 'l', 'p', 'h', 'a'] in ['a', 'l', 'p', 'h', 'a', 'n', 'u', 'm', 'e', 'r', 'i', 'c']
and there is no sub-list in the second list, only characters.
['a', 'l', 'p', 'h', 'a'] in [['a', 'l', 'p', 'h', 'a'], ['b', 'e', 't', 'a']]
would be true
Lists determine membership by checking whether an item is equal to one of the list members.
Strings determine whether string a is in string b by checking whether a substring of b is equal to a.
I suppose you are looking for the fact that string and list have different implementations of the __contains__ magic method.
https://docs.python.org/2/reference/datamodel.html#object.contains
This is why 'alpha' in 'alphanumeric' is True, but
list('alpha') in list('alphanumeric') is False
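Maybe you should try the issubset method.
>>> set('alpha').issubset(set('alphanumeric'))
True
although set('alpha') returns set(['a', 'p', 'l', 'h']), and set('alphanumeric') returns set(['a', 'c', 'e', 'i', 'h', 'm', 'l', 'n', 'p', 'r', 'u']).
The set function builds a set, ignoring repeated elements, so this only checks that every character of 'alpha' occurs somewhere in 'alphanumeric'; order and repetition are not taken into account.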
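If what you want is the list analogue of a substring test (the same items, contiguous and in the same order), neither in nor issubset does it. A minimal sketch of such a check follows; contains_sublist is a hypothetical helper, not a built-in:

```python
def contains_sublist(haystack, needle):
    """Return True if needle occurs as a contiguous run inside haystack."""
    n = len(needle)
    # compare every window of length n against needle
    return any(haystack[i:i + n] == needle
               for i in range(len(haystack) - n + 1))

print(contains_sublist(list('alphanumeric'), list('alpha')))  # True
print(contains_sublist(list('alphanumeric'), list('ahpla')))  # False
# issubset ignores order, so it accepts the reversed word as well
print(set('ahpla').issubset(set('alphanumeric')))             # True
```

Like the string test, this treats an empty needle as contained in any list.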

Ordered Sets Python 2.7

I have a list that I'm attempting to remove duplicate items from. I'm using Python 2.7.1, so I can simply use the set() function. However, this reorders my list, which for my particular case is unacceptable.
Below is a function I wrote that does this. However, I'm wondering if there's a better/faster way. Also, any comments on it would be appreciated.
def ordered_set(list_):
    newlist = []
    lastitem = None
    for item in list_:
        if item != lastitem:
            newlist.append(item)
            lastitem = item
    return newlist
The above function assumes that none of the items will be None, and that the items are in order (i.e., ['a', 'a', 'a', 'b', 'b', 'c', 'd']).
The above function returns ['a', 'a', 'a', 'b', 'b', 'c', 'd'] as ['a', 'b', 'c', 'd'].
Another very fast method with set:
def remove_duplicates(lst):
    dset = set()
    # relies on the fact that dset.add() always returns None.
    return [item for item in lst
            if item not in dset and not dset.add(item)]
Use an OrderedDict:
from collections import OrderedDict
l = ['a', 'a', 'a', 'b', 'b', 'c', 'd']
d = OrderedDict()
for x in l:
    d[x] = True
# prints a b c d
for x in d:
    print x,
print
Assuming the input sequence is unordered, here's an O(N) solution (both in space and time).
It produces a sequence with duplicates removed, while leaving unique items in the same relative order as they appeared in the input sequence.
>>> def remove_dups_stable(s):
...     seen = set()
...     for i in s:
...         if i not in seen:
...             yield i
...             seen.add(i)
>>> list(remove_dups_stable(['q', 'w', 'e', 'r', 'q', 'w', 'y', 'u', 'i', 't', 'e', 'p', 't', 'y', 'e']))
['q', 'w', 'e', 'r', 'y', 'u', 'i', 't', 'p']
I know this has already been answered, but here's a one-liner (plus import):
from collections import OrderedDict
def dedupe(_list):
    return OrderedDict((item,None) for item in _list).keys()
>>> dedupe(['q', 'w', 'e', 'r', 'q', 'w', 'y', 'u', 'i', 't', 'e', 'p', 't', 'y', 'e'])
['q', 'w', 'e', 'r', 'y', 'u', 'i', 't', 'p']
I think this is perfectly OK. You get O(n) performance which is the best you could hope for.
If the list were unordered, then you'd need a helper set to contain the items you've already visited, but in your case that's not necessary.
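For input where equal items always sit next to each other, as the question assumes, itertools.groupby can collapse each run to a single item without a sentinel such as lastitem = None. This is a sketch rather than a replacement for the function above; collapse_runs is a hypothetical name:

```python
from itertools import groupby

def collapse_runs(items):
    """Keep one item from each run of consecutive equal items."""
    # groupby yields one (key, group) pair per run of equal neighbours
    return [key for key, _group in groupby(items)]

print(collapse_runs(['a', 'a', 'a', 'b', 'b', 'c', 'd']))  # ['a', 'b', 'c', 'd']
# None is handled like any other value
print(collapse_runs([None, None, 'a']))                    # [None, 'a']
# non-adjacent duplicates are NOT removed
print(collapse_runs([1, 2, 1]))                            # [1, 2, 1]
```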
if your list isn't sorted then your question doesn't make sense.
e.g. [1,2,1] could become [1,2] or [2,1]
if your list is large you may want to write your result back into the same list using a SLICE to save on memory:
>>> x=['a', 'a', 'a', 'b', 'b', 'c', 'd']
>>> x[:]=[x[i] for i in range(len(x)) if i==0 or x[i]!=x[i-1]]
>>> x
['a', 'b', 'c', 'd']
for inline deleting see Remove items from a list while iterating or Remove items from a list while iterating without using extra memory in Python
One trick you can use: if you know x is sorted, and you know x[i] == x[i+j], then you don't need to check anything between x[i] and x[i+j] (and if you don't need to delete these j values, you can just copy the values you want into a new list).
So while you can't beat n operations if everything in the list is unique, i.e. len(set(x)) == len(x), there is probably an algorithm that has n comparisons as its worst case but can have n/2 comparisons as its best case (or lower than n/2 if you somehow know in advance that len(x)/len(set(x)) > 2 because of the data you've generated).
The optimal algorithm would probably use binary search to find the maximum j for each minimum i in a divide-and-conquer type approach. Initial divisions would probably be of length len(x)/approximated(len(set(x))). Hopefully it could be carried out such that even if len(x) == len(set(x)) it still uses only n operations.
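One simple way to try the binary-search idea is with the standard bisect module: for each new value, jump straight past its whole run. This is only a sketch under the assumption that x is sorted in ascending order; unique_sorted is a hypothetical name. It is not the hoped-for optimal algorithm, since it costs roughly log n comparisons per distinct value and is therefore slower than a linear scan when most items are unique:

```python
from bisect import bisect_right

def unique_sorted(x):
    """De-duplicate a list that is already sorted in ascending order."""
    result = []
    i = 0
    while i < len(x):
        result.append(x[i])
        # index just past the last item equal to x[i], searching from i onward
        i = bisect_right(x, x[i], i)
    return result

print(unique_sorted(['a', 'a', 'a', 'b', 'b', 'c', 'd']))  # ['a', 'b', 'c', 'd']
print(unique_sorted([7] * 1000 + [9] * 1000))              # [7, 9]
```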
There is unique_everseen solution described in
http://docs.python.org/2/library/itertools.html
from itertools import ifilterfalse  # Python 2; renamed filterfalse in Python 3

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in ifilterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
Looks ok to me. If you really want to use sets do something like this:
def ordered_set(_list):
    result = set()
    lastitem = None
    for item in _list:
        if item != lastitem:
            result.add(item)
            lastitem = item
    return sorted(tuple(result))
I don't know what performance you will get; you should test it. It will probably be about the same because of the method call overhead!
If you really are paranoid, just like me, read here:
http://wiki.python.org/moin/HowTo/Sorting/
http://wiki.python.org/moin/PythonSpeed/PerformanceTips
Just remembered this(it contains the answer):
http://www.peterbe.com/plog/uniqifiers-benchmark
