Splitting a list - python

I've scoured various resources and can't figure out how to do a rather simple operation.
Right now, I have a list as follows:
li = [['a=b'],['c=d']]
I want to transform this into:
li = [['a','b'],['c','d']]
As I understand it, split("=") only applies to string types. Is there an equivalent method for lists?
Pardon the simplicity of my question...
-Dan

You want this:
[x[0].split('=') for x in li]
# prints [['a', 'b'], ['c', 'd']]
To grab a question from a comment further down the post, the reason split works for x[0] is that x represents the inner list. That's accomplished by the for x in li. Also, I fixed mine to read for x in li and not for x in test as I had assigned your examples to a variable called 'test' on my system.

You can use map():
>>> li = [['a=b'],['c=d']]
>>> list(map(lambda x: x[0].split('='), li))  # list() is needed on Python 3, where map returns an iterator
[['a', 'b'], ['c', 'd']]
This traverses the list li and applies the lambda function to every element. As every element of the list is again a list with one element, x[0] takes this element, which is a string, splits it and returns a new list with both values.

Warning: it's been a while since I did any Python, but your issue is more general.
You are correct in that split applies to strings.
What you need to do is split the VALUE contained in your list not the list itself.
So you would do something like
newValue = li[0][0].split('=')
li[0] = newValue

Is this what you are looking for?
list(map(lambda y: y.split('='), map(lambda x: x[0], li)))

You can do it with this:
[k[0].split("=") for k in li]

Presuming each sublist consists of individual strings of the form a=b:
>>> [el[i].split('=') for el in li for i in range(len(el))]
[['a', 'b'], ['c', 'd']]
(Indeed, what you're splitting is the inner string a=b. So the split() string method works fine.)
EDIT: A much more elegant way of doing this double list comprehension is:
>>> [a.split('=') for el in li for a in el]
[['a', 'b'], ['c', 'd']]
There have been a number of good suggestions made, so the OP should be able to learn a good amount of Python from them. The important thing to remember is that what is being split is li[i][j], i.e. an item of a list that is itself an item of the list li.
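The suggestions above can be gathered into one small helper. This is only a sketch: the name split_items and its sep parameter are made up for illustration, and it assumes every inner item is a string.

```python
def split_items(li, sep='='):
    """Split every string found in the sublists of li on sep."""
    # li[i][j] is the string being split; each one yields its own list.
    return [item.split(sep) for sub in li for item in sub]


li = [['a=b'], ['c=d']]
result = split_items(li)
print(result)  # [['a', 'b'], ['c', 'd']]
```

Because it loops over every item of every sublist, it also copes with sublists that hold more than one string.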


Replace elements in a list of lists python

I have a list of lists as follows:
list=[]
*some code to append elements to list*
list=[['a','bob'],['a','bob'],['a','john']]
I want to go through this list and change all instances of 'bob' to 'b' and leave others unchanged.
for x in list:
    for a in x:
        if "bob" in a:
            a.replace("bob", 'b')
After this, printing the list shows that it is unchanged, rather than the expected result:
list=[['a','b'],['a','b'],['a','john']]
Why is the change not being reflected in list?
Because str.replace doesn't work in-place; it returns a copy. Since strings are immutable, you need to assign the new strings back to the elements of your list of lists.
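A minimal sketch of that behaviour, using throwaway names (s, t, row) that are not part of the question:

```python
s = 'bob'
t = s.replace('bob', 'b')  # replace returns a new string

print(s)  # bob  (the original string is untouched)
print(t)  # b

row = ['a', 'bob']
row[1] = row[1].replace('bob', 'b')  # store the result back in the list
print(row)  # ['a', 'b']
```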
You can assign directly to your list of lists if you extract indexing integers via enumerate:
L = [['a','bob'],['a','bob'],['a','john']]
for i, x in enumerate(L):
    for j, a in enumerate(x):
        if 'bob' in a:
            L[i][j] = a.replace('bob', 'b')
Result:
[['a', 'b'], ['a', 'b'], ['a', 'john']]
More Pythonic would be to use a list comprehension to create a new list. For example, if only the second of two values contains names which need checking:
L = [[i, j if j != 'bob' else 'b'] for i, j in L]
You can try using a Python dictionary (this version relies on NumPy and assumes every sublist has exactly two elements):
import numpy as np
L = [['a','bob'],['a','bob'],['a','john']]
dic = {'bob':'b'} # you can specify more changes here
new_list = [dic.get(n, n) for n in np.concatenate(L)]
print(np.reshape(new_list,[-1,2]).tolist())
Result is
[['a', 'b'], ['a', 'b'], ['a', 'john']]
I'm going to use a simple example, but basically x is another variable and isn't linked to the list element. You have to change the list element directly in order to alter the list.
l=[1,2,3,4]
for x in l:
    x=x+1
This doesn't change the list
l=[1,2,3,4]
for i,x in enumerate(l):
    l[i]=x+1
This changes the list
I might be a little late to the party, but a more Pythonic way of doing this is using map inside a list comprehension. It can operate on a list of lists with any number of values.
l = [['a','bob'],['a','bob'],['a','john']]
[list(map(lambda x: x if x != 'bob' else 'b', i)) for i in l]
it gives you the desired output
[['a', 'b'], ['a', 'b'], ['a', 'john']]
The main idea is that the outer comprehension iterates over the inner lists, and map applies the simple lambda function to each item of an inner list to perform the replacement.
I hope that this helps anyone else who is looking out for something similar.
This is the case because you are only changing the temporary variable a.
list = [1,2,3]
for i in list:
    i+=1
list will still be [1,2,3]
you have to edit the string based on its index in the list
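Applied to the list of lists from the question, that could look roughly like the sketch below. The helper name replace_exact is hypothetical, and it matches whole items with == rather than the substring test used in the question, so a name such as 'bobby' would be left alone.

```python
def replace_exact(rows, old, new):
    """Replace items equal to old with new, modifying rows in place."""
    for i, row in enumerate(rows):
        for j, item in enumerate(row):
            if item == old:
                # Assign through the indices, not the loop variable.
                rows[i][j] = new


names = [['a', 'bob'], ['a', 'bob'], ['a', 'john']]
replace_exact(names, 'bob', 'b')
print(names)  # [['a', 'b'], ['a', 'b'], ['a', 'john']]
```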

Categorize list in Python

What is the best way to categorize a list in python?
for example:
totalist is below
totalist[1] = ['A','B','C','D','E']
totalist[2] = ['A','B','X','Y','Z']
totalist[3] = ['A','F','T','U','V']
totalist[4] = ['A','F','M','N','O']
Say I want to get the lists where the first two items are ['A','B'], basically list[1] and list[2]. Is there an easy way to get these without iterating one item at a time? Something like this?
if ['A','B'] in totalist
I know that doesn't work.
You could check the first two elements of each list.
for totalist in all_lists:
    if totalist[:2] == ['A', 'B']:
        pass  # Do something.
Note: The one-liner solutions suggested by Kasramvd are quite nice too. I found my solution more readable. Though I should say comprehensions are slightly faster than regular for loops. (Which I tested myself.)
Just for fun, an itertools solution to push per-element work to the C layer:
from future_builtins import map # Py2 only; omit this line on Py3
from itertools import compress
from operator import itemgetter
# Generator
prefixes = map(itemgetter(slice(2)), totalist)
selectors = map(['A','B'].__eq__, prefixes)
# If you need them one at a time, just skip list wrapping and iterate
# compress output directly
matches = list(compress(totalist, selectors))
This could all be one-lined to:
matches = list(compress(totalist, map(['A','B'].__eq__, map(itemgetter(slice(2)), totalist))))
but I wouldn't recommend it. Incidentally, if totalist might be a generator, not a re-iterable sequence, you'd want to use itertools.tee to double it, adding:
totalist, forselection = itertools.tee(totalist, 2)
and changing the definition of prefixes to map over forselection, not totalist; since compress iterates both iterators in parallel, tee won't have meaningful memory overhead.
Of course, as others have noted, even moving to C, this is a linear algorithm. Ideally, you'd use something like a collections.defaultdict(list) to map from two element prefixes of each list (converted to tuple to make them legal dict keys) to a list of all lists with that prefix. Then, instead of linear search over N lists to find those with matching prefixes, you just do totaldict['A', 'B'] and you get the results with O(1) lookup (and less fixed work too; no constant slicing).
Example precompute work:
from collections import defaultdict
totaldict = defaultdict(list)
for x in totalist:
    totaldict[tuple(x[:2])].append(x)
# Optionally, to prevent autovivification later:
totaldict = dict(totaldict)
Then you can get matches effectively instantly for any two element prefix with just:
matches = totaldict['A', 'B']
You could do this.
>>> for i in totalist:
...     if ['A','B']==i[:2]:
...         print(i)
Basically, you can't avoid iterating over a nested list in Python. But if you are looking for an optimized approach, here are some ways:
Use a simple list comprehension, by comparing the intended list with only first two items of sub lists:
>>> [sub for sub in totalist if sub[:2] == ['A', 'B']]
[['A', 'B', 'C', 'D', 'E'], ['A', 'B', 'X', 'Y', 'Z']]
If you want the indices use enumerate:
>>> [ind for ind, sub in enumerate(totalist) if sub[:2] == ['A', 'B']]
[0, 1]
And here is an approach in NumPy, which is well optimized when you are dealing with large data sets:
>>> import numpy as np
>>>
>>> totalist = np.array([['A','B','C','D','E'],
...                      ['A','B','X','Y','Z'],
...                      ['A','F','T','U','V'],
...                      ['A','F','M','N','O']])
>>> totalist[(totalist[:,:2]==['A', 'B']).all(axis=1)]
array([['A', 'B', 'C', 'D', 'E'],
       ['A', 'B', 'X', 'Y', 'Z']],
      dtype='|S1')
As a functional alternative to the list comprehension, if you don't want to write a loop, you can use the filter function, although it is not as optimized as a list comprehension:
>>> list(filter(lambda x: x[:2]==['A', 'B'], totalist))
[['A', 'B', 'C', 'D', 'E'], ['A', 'B', 'X', 'Y', 'Z']]
You imply that you are concerned about performance (cost). If so, you need a different data structure. It adds a little cost when you are making the lists, but saves time when filtering them.
If the need to filter on the first two elements is fixed (that is, it doesn't have to generalise to the first n elements), then I would add the lists, as they are made, to a dict where the key is a tuple of the first two elements and the value is a list of lists.
Then you simply retrieve your lists with a dict lookup. This is easy to do and can bring potentially large speed-ups, at almost no cost in memory or time while making the lists.
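A rough sketch of building such a dict while the lists are being made. The names index and add_list are invented for illustration, and the four lists are the ones from the question.

```python
from collections import defaultdict

index = defaultdict(list)  # hypothetical name for the prefix index


def add_list(lst):
    """File lst under the tuple of its first two elements."""
    index[tuple(lst[:2])].append(lst)


for lst in (['A', 'B', 'C', 'D', 'E'],
            ['A', 'B', 'X', 'Y', 'Z'],
            ['A', 'F', 'T', 'U', 'V'],
            ['A', 'F', 'M', 'N', 'O']):
    add_list(lst)

# .get avoids creating an empty entry when the prefix is unknown.
matches = index.get(('A', 'B'), [])
print(matches)  # [['A', 'B', 'C', 'D', 'E'], ['A', 'B', 'X', 'Y', 'Z']]
```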

Remove list element without mutation

Assume you have a list
>>> m = ['a','b','c']
I'd like to make a new list n that has everything except for a given item in m (for example the item 'a'). However, when I use
>>> m.remove('a')
>>> m
['b', 'c']
the original list is mutated (the value 'a' is removed from the original list). Is there a way to get a new list sans-'a' without mutating the original? So I mean that m should still be [ 'a', 'b', 'c' ], and I will get a new list, which has to be [ 'b', 'c' ].
I assume you mean that you want to create a new list without a given element, instead of changing the original list. One way is to use a list comprehension:
m = ['a', 'b', 'c']
n = [x for x in m if x != 'a']
n is now a copy of m, but without the 'a' element.
Another way would of course be to copy the list first
m = ['a', 'b', 'c']
n = m[:]
n.remove('a')
If removing a value by index, it is even simpler
n = m[:index] + m[index+1:]
There is a simple way to do that using the built-in function filter.
Here is an example:
a = [1, 2, 3, 4]
b = list(filter(lambda x: x != 3, a))  # list() is needed on Python 3, where filter returns an iterator
If the order is unimportant, you can use set (besides, the removal seems to be fast in sets):
list(set(m) - set(['a']))
This will remove duplicate elements from your original list though
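To make the difference concrete, here is a small comparison on a list that contains a duplicate. The variable names via_set and via_comp are only for illustration.

```python
m = ['a', 'b', 'c', 'b']

via_set = list(set(m) - {'a'})         # order not guaranteed, duplicates collapsed
via_comp = [x for x in m if x != 'a']  # order and duplicates preserved

print(sorted(via_set))  # ['b', 'c']
print(via_comp)         # ['b', 'c', 'b']
print(m)                # ['a', 'b', 'c', 'b']
```

In both cases the original list m is left untouched.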
We can do it with the list's built-in copy() method (Python 3); just assign the copy to a new name:
m = ['a','b','c']
m_copy=m.copy()
m_copy.remove('a')
print (m)
['a', 'b', 'c']
print(m_copy)
['b', 'c']
You can create a new list without the offending element with a list-comprehension. This will preserve the value of the original list.
l = ['a', 'b', 'c']
[s for s in l if s != 'a']
An alternative to a list comprehension is numpy:
>>> import numpy
>>> a = [1, 2, 3, 4]
>>> numpy.delete(a, a.index(3)).tolist()
[1, 2, 4]
We can do it without the built-in remove method and without creating a new list variable:
Code:
# List m
m = ['a', 'b', 'c']
# Updated list m, without creating new list variable
m = [x for x in m if x != 'a']
print(m)
Output:
['b', 'c']
The question is useful because I sometimes have a list that I use throughout a script, but at a certain step I need to apply some logic to a subset of its elements. In that case I find it useful to keep the same list and exclude the unwanted element only for that step, without creating a whole new list under a different name. For this you can use either:
list comprehension: say you have l=['a','b','c']; to exclude 'b', you can use [x for x in l if x!='b']
set [only if order is unimportant]: list(set(l) - set(['b'])); note that you pass 'b' as the list ['b']

How to numerically sort string in list

I have a list inside of a list, and the inner list has strings of numbers (float) and words.
What I need to sort the list by, is in position list[0]. So for example,
list = [['8.34', 'a'],['3.55', 'c'],['5.92', 'b']]
I'm trying to sort the list numerically to look like
list = [['3.55', 'c'],['5.92', 'b'],['8.34', 'a']]
I've tried
sorted(list, key = float)
but I get an error message: 'float() argument must be a string or a number' and I've tried using lambda as well. Neither works. Could someone help please?
You can try passing a lambda function:
sorted(my_list, key = lambda x : float(x[0]))
x will be an element of the list (which is also a list, because my_list is a list of lists), and float(x[0]) will return the float representation of the first element of that list.
Demo:
>>> my_list = [['8.34', 'a'],['3.55', 'c'],['5.92', 'b']]
>>> print sorted(my_list, key = lambda x : float(x[0]))
[['3.55', 'c'], ['5.92', 'b'], ['8.34', 'a']]
Note:
Don't use list as the name of a variable, because you will hide its built-in implementation.
Your list contains lists, so you cannot use float directly. You need to use a function that returns float value of the first item in each list.
>>> lis = [['8.34', 'a'],['3.55', 'c'],['5.92', 'b']]
>>> lis.sort(key=lambda x: float(x[0]))
>>> lis
[['3.55', 'c'], ['5.92', 'b'], ['8.34', 'a']]
This earlier answer can be used in your case also:
How to sort a list of lists by a specific index of the inner list?
from operator import itemgetter
list = [['8.34', 'a'],['3.55', 'c'],['5.92', 'b']]
print sorted(list, key=itemgetter(0))
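gives the desired output for this data (note that itemgetter(0) compares the strings lexicographically, so a value like '10.5' would sort before '3.55'; convert with float in the key if the numbers can have different numbers of digits):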
[['3.55', 'c'], ['5.92', 'b'], ['8.34', 'a']]
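The difference between comparing the strings and comparing their float values only shows up once the numbers have different numbers of digits. A small sketch, using a made-up extra row ['10.5', 'd'] that is not in the question:

```python
rows = [['8.34', 'a'], ['10.5', 'd'], ['3.55', 'c']]

by_text = sorted(rows, key=lambda r: r[0])           # compares strings
by_number = sorted(rows, key=lambda r: float(r[0]))  # compares floats

print(by_text)    # [['10.5', 'd'], ['3.55', 'c'], ['8.34', 'a']]
print(by_number)  # [['3.55', 'c'], ['8.34', 'a'], ['10.5', 'd']]
```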

The condition skips 2 members of a list [duplicate]

Possible Duplicate:
Modifying list while iterating
I have been given a task to write code in Python that removes all members that occur more than once in a list, leaving one copy of each.
Condition: it should be case-insensitive
So I've written down the following code:
string = raw_input()
list1 = string.split(",")
low_case_list = list1[:] #for case-insensitive comparison
for i in range(len(low_case_list)):
    low_case_list[i] = low_case_list[i].lower()
for member in low_case_list:
    if(low_case_list.count(member) > 1):
        del list1[low_case_list.index(member)]
        del low_case_list[low_case_list.index(member)]
After the input I get this list: [a,b,c,d,A,B,C,D,a,b,c,d]
and after I do the operation on it: [B,D,a,b,c,d]
My question is: why does it skip 'B' and 'D' when it removes the members?
Why not just convert your list into a set with all elements converted to lower-case, and then back to a list. You can use a generator for converting every element to lowercase.
You can do it like this:
>>> l = ['a', 'b', 'c', 'A', 'B', 'C', 'a', 'b', 'c']
>>> new_list = list(set(elem.lower() for elem in l))
>>> new_list
['a', 'c', 'b']
Note that the order may change, because a set does not maintain the order of its elements.
You could try something like this instead:
input = raw_input().split(',')
unique = set([s.lower() for s in input])
result = list(unique)
Try this, should be simple.
Given your list li:
lowcase = [elem.lower() for elem in li]
output = []
for el in lowcase:
    if el not in output:
        output.append(el)
return output # if necessary, otherwise a simple li = output
Or, more compactly, you could replace the whole for loop with the following (though using a comprehension only for its side effects is usually discouraged):
[output.append(el) for el in lowcase if el not in output]
Your code is buggy because you delete elements by index while looping over the list; the list changes size during the loop, so the indices shift and some members are skipped.
EDIT: didn't think about sets, obviously they're the best solution here.
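If the order and the original spelling of the first occurrence matter, one option is to build a new list while tracking the lower-cased values already seen, instead of deleting from the list being iterated. This is a sketch only; the function name dedupe_ignore_case is made up, and the input is the sample from the question.

```python
def dedupe_ignore_case(items):
    """Keep the first occurrence of each item, comparing case-insensitively."""
    seen = set()
    result = []
    for item in items:
        key = item.lower()
        if key not in seen:
            seen.add(key)
            result.append(item)  # keep the original spelling
    return result


data = 'a,b,c,d,A,B,C,D,a,b,c,d'.split(',')
print(dedupe_ignore_case(data))  # ['a', 'b', 'c', 'd']
```

Nothing is removed from data during the loop, so no member can be skipped.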
