Splitting string values in list into individual values, Python - python

I have a list of values in which some values are words separated by commas, but are considered single strings as shown:
l = ["a",
"b,c",
"d,e,f"]
#end result should be
#new_list = ['a','b','c','d','e','f']
I want to split those strings and was wondering if there's a one-liner or something similarly short for this transformation. So far, I was thinking of just iterating through l, calling .split(',') on each element, and then merging the results, but that seems like it would take a while to run.

import itertools
new_list = list(itertools.chain(*[x.split(',') for x in l]))
print(new_list)
# ['a', 'b', 'c', 'd', 'e', 'f']
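A possible variation on the answer above, offered as a sketch rather than a benchmarked recommendation: itertools.chain.from_iterable accepts the iterable of lists directly, so neither the * unpacking nor the intermediate list is needed. The name split_all is made up for this example.

```python
from itertools import chain

def split_all(values, sep=","):
    # split_all is a hypothetical helper name, not part of any library.
    # The generator expression hands each split result to chain lazily.
    return list(chain.from_iterable(v.split(sep) for v in values))

l = ["a", "b,c", "d,e,f"]
new_list = split_all(l)
print(new_list)  # ['a', 'b', 'c', 'd', 'e', 'f']
```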

Kind of unusual, but you could join all your elements with ',' and then split the result:
l = ["a",
"b,c",
"d,e,f"]
newList = ','.join(l).split(',')
print(newList)
Output:
['a', 'b', 'c', 'd', 'e', 'f']

Here's a one-liner using a (nested) list comprehension:
new_list = [item for csv in l for item in csv.split(',')]
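For readers new to nested comprehensions, the one-liner reads left to right like the two loops sketched below. The second part rests on an assumption that goes beyond the question: if the input might contain spaces around the commas (the question's data does not), strip() can be applied to each piece.

```python
l = ["a", "b,c", "d,e,f"]

# Equivalent explicit loops: the outer `for` comes first, the inner second.
new_list = []
for csv in l:
    for item in csv.split(","):
        new_list.append(item)
print(new_list)  # ['a', 'b', 'c', 'd', 'e', 'f']

# Hypothetical messier input with spaces around the commas.
messy = ["a", "b, c", "d ,e, f"]
cleaned = [item.strip() for csv in messy for item in csv.split(",")]
print(cleaned)  # ['a', 'b', 'c', 'd', 'e', 'f']
```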

Not exactly a one-liner, but 2 lines:
>>> l = ["a", "b,c", "d,e,f"]
>>> ll =[]
>>> [ll.extend(x.split(',')) for x in l]
[None, None, None]
>>> ll
['a', 'b', 'c', 'd', 'e', 'f']
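The accumulator has to be created separately because the result of x.split(',') cannot be unpacked inside a comprehension. Note that the comprehension is used only for its side effect here; the [None, None, None] it returns is discarded.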
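If building a throwaway list of None values is unwanted, a plain loop does the same work. This is only a sketch of the same idea written without the comprehension, not a different algorithm.

```python
l = ["a", "b,c", "d,e,f"]

ll = []
for x in l:
    # extend appends every piece produced by the split, one by one
    ll.extend(x.split(","))

print(ll)  # ['a', 'b', 'c', 'd', 'e', 'f']
```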

Related

Delete a list from collection of lists

I have collection of lists like this:
example = [['a','b','c'],['d','e','f'],[ ],['z'],['g','h','i'],[ ],['z']]
I want to remove [] and ['z'] from the list.
The desired output is:
example = [['a','b','c'],['d','e','f'],['g','h','i']]
How can I do it? Can I remove both using a one-liner?
I am familiar with the .pop() and .remove() methods, but I doubt whether they will work for an empty list like [ ].
You can use list comprehension for the filtering:
example = [['a','b','c'],['d','e','f'],[ ],['z'],['g','h','i'],[ ],['z']]
output = [sublst for sublst in example if sublst not in ([], ['z'])]
print(output) # [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
You can also remove them with filter:
example = list(filter(lambda val: val != [] and val != ['z'], example))
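Both answers above build a new list. If other names refer to the same list object and should see the change too, slice assignment is one option. The sketch below assumes that requirement, which the question does not state, and the alias variable exists only to demonstrate it.

```python
example = [['a', 'b', 'c'], ['d', 'e', 'f'], [], ['z'], ['g', 'h', 'i'], [], ['z']]
alias = example  # a second reference to the same list object (illustrative)

unwanted = ([], ['z'])
# Slice assignment replaces the contents in place instead of rebinding the name.
example[:] = [sub for sub in example if sub not in unwanted]

print(example)           # [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
print(alias is example)  # True
```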

How to remove elements from a list that appear less than k = 2?

I am trying to keep elements of a list that appear at least twice, and remove the elements that appear less than twice.
For example, my list can look like:
letters = ['a', 'a', 'b', 'b', 'b', 'c']
I want to get a list with the letters that appear at least twice, to get the following list:
letters_appear_twice = ['a', 'b']
But since this is part of a bigger code, I don't know exactly what my lists looks like, only that I want to keep the letters that are repeated at least twice. But for the sake of understanding, we can assume I know what the list looks like!
I have tried the following:
letters = ['a', 'a', 'b', 'b', 'b', 'c']
for x in set(letters):
    if letters.count(x) > 2:
        while x in letters:
            letters.remove(x)
print(letters)
But this doesn't quite work the way I want it to...
Thank you in advance for any help!
letters = ['a', 'a', 'b', 'b', 'b', 'c']
res = []
for x in set(letters):
    if letters.count(x) >= 2:
        res.append(x)
print(res)
Prints:
['b', 'a']
Using your code above, you can make a new list and append to it:
new_list = []
for x in set(letters):
    if letters.count(x) >= 2:
        new_list.append(x)
print(new_list)
Output
['b', 'a']
It is easier to create a new list than to manipulate the source list:
def letters_more_or_equal_to_k(letters, k):
    result = []
    for x in set(letters):
        if letters.count(x) >= k:
            result.append(x)
    result.sort()
    return result
def main():
    letters = ['a', 'a', 'b', 'b', 'b', 'c']
    k = 2
    result = letters_more_or_equal_to_k(letters, k)
    print(result) # prints ['a', 'b']
if __name__ == "__main__":
    main()
If you don't mind shuffling the values, here's one possible solution:
from collections import Counter
letters = ['a', 'a', 'b', 'b', 'b', 'c']
c = Counter(letters)
to_remove = {x for x, i in c.items() if i < 2}
result = list(set(letters) - to_remove)
print(result)
Output:
['a', 'b']
You can always sort later.
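Counter tallies every element in a single pass, whereas calling letters.count(x) for each unique element rescans the whole list every time, so this solution tends to pay off for lists with more than ~10 unique elements.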
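If the original order matters, the set difference can be avoided. The sketch below relies on the fact that Counter is a dict subclass and therefore iterates in first-seen order on Python 3.7 and later; at_least_k is a made-up name for this example.

```python
from collections import Counter

def at_least_k(items, k=2):
    # at_least_k is a hypothetical name used only for this sketch.
    counts = Counter(items)  # one pass over the input
    # Keys come back in the order they were first seen (Python 3.7+).
    return [x for x, n in counts.items() if n >= k]

letters = ['a', 'a', 'b', 'b', 'b', 'c']
print(at_least_k(letters))     # ['a', 'b']
print(at_least_k(letters, 3))  # ['b']
```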

Putting column values from text file into a list in python

I have a text file like this:
a w
b x
c,d y
e,f z
And I want to get the values of the first column into a list without duplicates. For now I get the values from the first column, which I am doing like this:
f=open("file.txt","r")
lines=f.readlines()
firstCol=[]
for x in lines:
    firstCol.append(x.split(' ')[0])
f.close()
In the next step I want to separate the values by a comma delimiter the same way I did before, but then I get an output like this:
[['a'], ['b'], ['c', 'd'], ['e', 'f']]
How can I convert this into a one dimensional thing to be able to remove duplicates afterwards?
I am a beginner in python.
You can split the first column on commas right after the first split, but you must use extend instead of append:
f=open("file.txt","r")
lines=f.readlines()
firstCol=[]
for x in lines:
    firstCol.extend(x.split(' ')[0].split(','))
f.close()
print(firstCol)
Result
['a', 'b', 'c', 'd', 'e', 'f']
Or, if you want to keep firstCol as it is:
f=open("file.txt","r")
lines=f.readlines()
firstCol=[]
for x in lines:
    firstCol.append(x.split(' ')[0])
f.close()
one_dimension = []
for col in firstCol:
    one_dimension.extend(col.split(','))
print(firstCol)
print(one_dimension)
Result
['a', 'b', 'c,d', 'e,f']
['a', 'b', 'c', 'd', 'e', 'f']
You can use itertools.chain to flatten your list of lists, and then use the built-in set class to remove the duplicates:
from itertools import chain
l = [['a'], ['b'], ['c', 'd'], ['e', 'f']]
set(chain.from_iterable(l))
# {'a', 'b', 'c', 'd', 'e', 'f'}
To flatten your list you can also use a list comprehension:
my_l = [e for i in l for e in i]
# ['a', 'b', 'c', 'd', 'e', 'f']
The same with two simple for loops:
my_l = []
for i in l:
    for e in i:
        my_l.append(e)
Possible solution 1
If you are fine with your code, you can keep it as it is and remove duplicates from the list of lists with the following:
import itertools
firstCol.sort()
firstCol = list(x for x,_ in itertools.groupby(firstCol))
Possible solution 2
If you want to convert the list of lists into one list of items:
firstCol = [x for y in firstCol for x in y]
If you want to also remove duplicates:
firstCol = list(set([x for y in firstCol for x in y]))
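Putting the pieces from the answers together, one possible end-to-end sketch follows. It makes a few assumptions that are not in the question: the function name first_column_values is invented, blank lines are skipped, the first-seen order should be kept (set does not guarantee any order), and an extra "a v" line is added to the sample so that there is a duplicate to remove. io.StringIO stands in for the file so the example runs on its own.

```python
import io

def first_column_values(lines):
    # first_column_values is a hypothetical helper for this sketch.
    values = []
    for line in lines:
        fields = line.split()   # split on any run of whitespace
        if not fields:          # skip blank lines
            continue
        values.extend(fields[0].split(','))
    # dict.fromkeys drops duplicates and keeps first-seen order (Python 3.7+).
    return list(dict.fromkeys(values))

# With a real file this would be:
#     with open("file.txt") as f:
#         result = first_column_values(f)
sample = io.StringIO("a w\nb x\nc,d y\ne,f z\na v\n")
result = first_column_values(sample)
print(result)  # ['a', 'b', 'c', 'd', 'e', 'f']
```

Using with open(...) also closes the file automatically, so the explicit f.close() is no longer needed.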

List with duplicated values and suffix

I have a list, a:
a = ['a','b','c']
and need to duplicate some values with the suffix _ind added this way (order is important):
['a', 'a_ind', 'b', 'b_ind', 'c', 'c_ind']
I tried:
b = [[x, x + '_ind'] for x in a]
c = [item for sublist in b for item in sublist]
print(c)
['a', 'a_ind', 'b', 'b_ind', 'c', 'c_ind']
Is there some better, more pythonic solution?
You could make it a generator:
def mygen(lst):
    for item in lst:
        yield item
        yield item + '_ind'
>>> a = ['a','b','c']
>>> list(mygen(a))
['a', 'a_ind', 'b', 'b_ind', 'c', 'c_ind']
You could also do it with itertools.product, itertools.starmap, itertools.chain, or nested comprehensions, but in most cases I would prefer a simple-to-understand custom generator function.
With Python 3.3 or later, you can also use yield from (generator delegation) to make this elegant solution just a bit more concise:
def mygen(lst):
    for item in lst:
        yield from (item, item + '_ind')
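The answer above mentions itertools.product without showing it. One way it could look is sketched below; the suffixes tuple with an empty string is my own device for keeping the unmodified value, not something taken from the answer.

```python
from itertools import product

a = ['a', 'b', 'c']
suffixes = ('', '_ind')  # the empty suffix keeps the original value

# product varies the last iterable fastest, so each item is paired with
# '' first and '_ind' second before moving on to the next item.
result = [item + suffix for item, suffix in product(a, suffixes)]
print(result)  # ['a', 'a_ind', 'b', 'b_ind', 'c', 'c_ind']
```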
It can be shortened a little bit by moving the options to the inner for loop in the list comprehension:
a = ['a','b','c']
[item for x in a for item in (x, x + '_ind')]
# ['a', 'a_ind', 'b', 'b_ind', 'c', 'c_ind']
Another alternative with slice assignment (Python 2.x, 3.x):
result = [None] * len(a) * 2
result[::2], result[1::2] = a, map(lambda x: x + '_ind', a)
result
# ['a', 'a_ind', 'b', 'b_ind', 'c', 'c_ind']
You can use itertools.chain():
import itertools
l = ['a','b','c']
new_list = list(itertools.chain.from_iterable([[i, i+"_ind"] for i in l]))
print(new_list)
Output:
['a', 'a_ind', 'b', 'b_ind', 'c', 'c_ind']
Before list comprehensions and generators were invented/became widespread, people used to think much simpler*:
>>> a = ['a', 'b', 'c']
>>> b = []
>>> for x in a: b.extend([x, x+'_ind'])
...
>>> b
['a', 'a_ind', 'b', 'b_ind', 'c', 'c_ind']
* I don't mean that those constructs/tools are evil, just wanted to point out that there is a simple solution.
Since you asked for "simple", I thought I'd throw this in (although it may not be the most Pythonic way):
mylist1 = []
for i in mylist:
    mylist1.append(i)
    mylist1.append(i + '_ind')

Python: filtering lists by indices

In Python I have a list of elements aList and a list of indices myIndices. Is there any way I can retrieve all at once those items in aList having as indices the values in myIndices?
Example:
>>> aList = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> myIndices = [0, 3, 4]
>>> aList.A_FUNCTION(myIndices)
['a', 'd', 'e']
I don't know any method to do it. But you could use a list comprehension:
>>> [aList[i] for i in myIndices]
Definitely use a list comprehension, since list has no method that does this, but here is a function that does it. This is arguably a misuse of itemgetter; I am posting it only for the sake of completeness.
>>> from operator import itemgetter
>>> a_list = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> my_indices = [0, 3, 4]
>>> itemgetter(*my_indices)(a_list)
('a', 'd', 'e')
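One reason to be careful with itemgetter here: its return shape depends on how many indices it receives. With two or more it returns a tuple, with exactly one it returns the bare item, and with none it raises TypeError. The sketch below illustrates this; pick is a hypothetical helper name showing that a comprehension behaves uniformly.

```python
from operator import itemgetter

a_list = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

print(itemgetter(*[0, 3, 4])(a_list))  # ('a', 'd', 'e')
print(itemgetter(*[3])(a_list))        # d  (a bare item, not a 1-tuple)

def pick(seq, indices):
    # pick is a hypothetical helper; it returns a list for any number of indices.
    return [seq[i] for i in indices]

print(pick(a_list, [3]))  # ['d']
print(pick(a_list, []))   # []
```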
Indexing by lists can be done in numpy. Convert your base list to a numpy array and then apply another list as an index:
>>> from numpy import array
>>> array(aList)[myIndices]
array(['a', 'd', 'e'],
      dtype='|S1')
If you need, convert back to a list at the end:
>>> from numpy import array
>>> a = array(aList)[myIndices]
>>> a.tolist()
['a', 'd', 'e']
In some cases this solution can be more convenient than list comprehension.
You could use map:
list(map(aList.__getitem__, myIndices))
or operator.itemgetter:
import operator
f = operator.itemgetter(*myIndices)
f(aList)
If you do not need a list with simultaneous access to all elements, but just want to iterate over the items of the sub-list (or pass them to something that will), it is more efficient to use a generator expression rather than a list comprehension:
(aList[i] for i in myIndices)
Alternatively, you could go with a functional approach using map and a lambda function:
>>> list(map(lambda i: aList[i], myIndices))
['a', 'd', 'e']
I wasn't happy with these solutions, so I created a Flexlist class that simply extends the list class, and allows for flexible indexing by integer, slice or index-list:
class Flexlist(list):
    def __getitem__(self, keys):
        if isinstance(keys, (int, slice)):
            return list.__getitem__(self, keys)
        return [self[k] for k in keys]
Then, for your example, you could use it with:
aList = Flexlist(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
myIndices = [0, 3, 4]
vals = aList[myIndices]
print(vals) # ['a', 'd', 'e']
