Splitting strings properly in Python 3 [duplicate]

I have to take a large list of words in the form:
['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']
and then using the strip function, turn it into:
['this', 'is', 'a', 'list', 'of', 'words']
I thought that what I had written would work, but I keep getting an error saying:
"'list' object has no attribute 'strip'"
Here is the code that I tried:
strip_list = []
for lengths in range(1,20):
    strip_list.append(0) #longest word in the text file is 20 characters long
for a in lines:
    strip_list.append(lines[a].strip())

You can either use a list comprehension
my_list = ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']
stripped = [s.strip() for s in my_list]
or alternatively use map():
stripped = list(map(str.strip, my_list))
In Python 2, map() directly returned a list, so you didn't need the call to list. In Python 3, the list comprehension is more concise and generally considered more idiomatic.
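A small sketch of why the list() call matters in Python 3 (the variable names are only illustrative): map() returns a lazy iterator that is used up after one pass.

```python
my_list = ['this\n', 'is\n', 'a\n']

lazy = map(str.strip, my_list)   # a map object, not a list
first_pass = list(lazy)          # consumes the iterator
second_pass = list(lazy)         # nothing left to consume

print(first_pass)    # ['this', 'is', 'a']
print(second_pass)   # []
```

So if you need to index the result or loop over it more than once, convert it to a list first.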

list comprehension?
[x.strip() for x in lst]

You can use a list comprehension:
strip_list = [item.strip() for item in lines]
Or the map function (wrapped in list(), since map returns an iterator in Python 3):
# with a lambda
strip_list = list(map(lambda it: it.strip(), lines))
# without a lambda
strip_list = list(map(str.strip, lines))

This can be done using list comprehensions as defined in PEP 202
[w.strip() for w in ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']]

All the other answers, mainly the ones about list comprehensions, are great. But just to explain your error:
strip_list = []
for lengths in range(1,20):
    strip_list.append(0) #longest word in the text file is 20 characters long
for a in lines:
    strip_list.append(lines[a].strip())
a is a member of your list, not an index. What you could write is this:
[...]
for a in lines:
    strip_list.append(a.strip())
Another important comment: you can create a list pre-filled with default values this way:
strip_list = [0] * 20
But this is not so useful here, as .append adds items to the end of the list, after the 20 zeros. In your case there is no point in creating a list with default values, as you'll build it item by item when appending the stripped strings.
So your code should look like:
strip_list = []
for a in lines:
    strip_list.append(a.strip())
But, for sure, the best version is this one, which does exactly the same thing:
stripped = [line.strip() for line in lines]
If you have something more complicated than just a .strip, put it in a function and do the same. That's the most readable way to work with lists.
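As a sketch of that last suggestion, assuming a hypothetical clean_word helper (the name and the lower-casing step are made up for illustration):

```python
def clean_word(raw):
    """Strip surrounding whitespace and lower-case the word."""
    return raw.strip().lower()

lines = ['This\n', 'IS\n', ' a \n']
cleaned = [clean_word(line) for line in lines]
print(cleaned)  # ['this', 'is', 'a']
```

The comprehension stays one line long no matter how much logic ends up inside the helper.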

If you need to remove just trailing whitespace, you could use str.rstrip(), which should be slightly more efficient than str.strip():
>>> lst = ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']
>>> [x.rstrip() for x in lst]
['this', 'is', 'a', 'list', 'of', 'words']
>>> list(map(str.rstrip, lst))
['this', 'is', 'a', 'list', 'of', 'words']
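To make the difference between the variants concrete, here is a small sketch on a made-up string with leading and trailing whitespace:

```python
word = '  padded word \n'

both_sides = word.strip()          # removes whitespace at both ends
right_only = word.rstrip()         # removes trailing whitespace only
newline_only = word.rstrip('\n')   # removes trailing newlines, keeps the space

print(repr(both_sides))     # 'padded word'
print(repr(right_only))     # '  padded word'
print(repr(newline_only))   # '  padded word '
```

Passing '\n' explicitly is the safest choice when leading or trailing spaces are meaningful and only the line ending should go.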

my_list = ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']
print([l.strip() for l in my_list])
Output:
['this', 'is', 'a', 'list', 'of', 'words']

Related

Split The Second String of Every Element in a List into Multiple Strings

I am very very new to python so I'm still figuring out the basics.
I have a nested list with each element containing two strings like so:
mylist = [['Wowza', 'Here is a string'],['omg', 'yet another string']]
I would like to iterate through each element in mylist, and split the second string into multiple strings so it looks like:
mylist = [['wowza', 'Here', 'is', 'a', 'string'],['omg', 'yet', 'another', 'string']]
I have tried so many things, such as unzipping and
for elem in mylist:
    mylist.append(elem)
NewList = [item[1].split(' ') for item in mylist]
print(NewList)
and even
for elem in mylist:
    NewList = ' '.join(elem)
def Convert(string):
    li = list(string.split(' '))
    return li
print(Convert(NewList))
Both attempts just give me a variable that contains a bunch of lists.
I know I'm way overcomplicating this, so any advice would be greatly appreciated.

You can use list comprehension
mylist = [['Wowza', 'Here is a string'],['omg', 'yet another string']]
req_list = [[i[0]]+ i[1].split() for i in mylist]
# [['Wowza', 'Here', 'is', 'a', 'string'], ['omg', 'yet', 'another', 'string']]

I agree with @DeepakTripathi's list comprehension suggestion (+1) but I would structure it more descriptively:
>>> mylist = [['Wowza', 'Here is a string'], ['omg', 'yet another string']]
>>> newList = [[tag, *words.split()] for (tag, words) in mylist]
>>> print(newList)
[['Wowza', 'Here', 'is', 'a', 'string'], ['omg', 'yet', 'another', 'string']]
>>>

You can use the + operator on lists to combine them:
a = ['hi', 'multiple words here']
b = []
for i in a:
    b += i.split()
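Applied to the nested list from the question, the same += idea might look like the sketch below (new_list, tag, words and row are illustrative names):

```python
mylist = [['Wowza', 'Here is a string'], ['omg', 'yet another string']]

new_list = []
for tag, words in mylist:
    row = [tag]            # keep the first string as it is
    row += words.split()   # += extends the row with the split words
    new_list.append(row)

print(new_list)
# [['Wowza', 'Here', 'is', 'a', 'string'], ['omg', 'yet', 'another', 'string']]
```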

Removing specific set of characters in a list of strings

I have a list of strings and a second list of unwanted substrings, and I want to remove every occurrence of those substrings from the strings in my list. For the data below, the output would be foo, bar, foobar, foofoo... I have tried a few things, for example:
mylist = ['foo!', 'bar\\n', 'foobar!!??!!', 'foofoo::!*']
remove_list = ['\\n', '!', '*', '?', ':']
for remove in remove_list:
    for strings in mylist:
        strings = strings.replace(bad, ' ')
The above code doesn't work. At one point I did assign the result to a new variable and append that afterwards, but that wasn't working well, because if there were two issues in a string it would be appended twice.

You changed the temporary variable, not the original list. Instead, assign the result back into mylist
for bad in remove_list:
    for pos, string in enumerate(mylist):
        mylist[pos] = string.replace(bad, ' ')
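A minimal sketch of the difference, using a shortened made-up list: assigning to the loop variable only rebinds that name, while assigning through the index changes the list itself.

```python
words = ['foo!', 'bar!']

for w in words:
    w = w.replace('!', '')   # rebinds the local name w only
snapshot = list(words)       # copy taken to show the list is unchanged
print(snapshot)              # ['foo!', 'bar!']

for pos, w in enumerate(words):
    words[pos] = w.replace('!', '')   # writes the result back into the list
print(words)                 # ['foo', 'bar']
```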

Try this:
mylist = ['foo!', 'bar\\n', 'foobar!!??!!', 'foofoo::!*']
bads = ['\\n', '!', '*', '?', ':']
result = []
for s in mylist:
    # s is a temporary name for the current string
    for bad in bads:
        s = s.replace(bad, '') # remove every occurrence of bad
    result.append(s)
print(result)
This could be written more concisely, but this way it's easier to understand.

I had a hard time interpreting the question, but I see you have the result desired at the top of your question.
mylist = ['foo!', 'bar\\n', 'foobar!!??!!', 'foofoo::!*']
remove_list = ['\\n', '!', '*', '?', ':']
output = []
for strings in mylist:
    for remove in remove_list:
        strings = strings.replace(remove, '')
    output.append(strings)

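import re
regex = re.compile(r'[^A-Za-z]')  # assumption: regex is never defined in this snippet; this pattern matches any non-letter
for list1 in mylist:
    t = regex.sub('', list1)
    print(t)
If you just want to get rid of non-letter characters, do this. It works a lot better than comparing two separate lists.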

Why not have regex do the work for you? No nested loops this way (just make sure to escape correctly):
import re
mylist = ['foo!', 'bar\\n', 'foobar!!??!!', 'foofoo::!*']
remove_list = [r'\\n', r'\!', r'\*', r'\?', ':']
removals = re.compile('|'.join(remove_list))
print([removals.sub('', s) for s in mylist])
['foo', 'bar', 'foobar', 'foofoo']
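If you would rather not escape by hand, re.escape from the standard library can build the pattern for you. A sketch using the same data (pattern and cleaned are illustrative names):

```python
import re

mylist = ['foo!', 'bar\\n', 'foobar!!??!!', 'foofoo::!*']
remove_list = ['\\n', '!', '*', '?', ':']

# re.escape makes each entry match literally, so '*' and '?' lose their regex meaning.
pattern = re.compile('|'.join(re.escape(bad) for bad in remove_list))
cleaned = [pattern.sub('', s) for s in mylist]
print(cleaned)  # ['foo', 'bar', 'foobar', 'foofoo']
```

This keeps remove_list as plain strings, so the same list also works with str.replace.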

Another solution is a list comprehension that applies every removal to each word in turn (here with functools.reduce):
from functools import reduce
list_good = [reduce(lambda w, bad: w.replace(bad, ''), remove_list, word) for word in mylist]

my_list = ["foo!", "bar\\n", "foobar!!??!!", "foofoo::*!"]
to_remove = ["!", "\\n", "?", ":", "*"]
for index, item in enumerate(my_list):
    for char in to_remove:
        if char in item:
            item = item.replace(char, "")
    my_list[index] = item
print(my_list) # outputs ['foo', 'bar', 'foobar', 'foofoo']

Splitting a single index list into multiple list indexes?

I have a list:
lst = ['words in a list']
and I was hoping to split each one of these words in the string into their own separate indexes. So for example, it would look something like this:
lst = ['words','in','a','list']
I'm wondering if this is possible? I thought initially this would be just a simple lst.split() with a loop, but it seems like this will throw an error.
Thanks for the help!

Use this:
print(lst[0].split())
If the list has more elements:
print([x for i in lst for x in i.split()])

Split only works for a string type. So you need to index the list item first and then split.
lst = lst[0].split()
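A short sketch of what goes wrong and how indexing fixes it (the variable words is an illustrative name):

```python
lst = ['words in a list']

try:
    lst.split()             # lists have no split method
except AttributeError as err:
    print(err)              # reports that the list object has no such attribute

words = lst[0].split()      # split the string stored at index 0
print(words)                # ['words', 'in', 'a', 'list']
```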

Use this when you have a list of strings or a single string inside a list:
lst = ['this is string1', 'this is string2', 'this is string3']
result =' '.join(lst).split()
print(result)
# output : ['this', 'is', 'string1', 'this', 'is', 'string2', 'this', 'is', 'string3']

Confusion with split function in Python

I am trying to alphabetically sort the words from a file. However, the program sorts the lines, not the words, according to their first words. Here it is.
fname = raw_input("Enter file name: ")
fh = open(fname)
lst = list()
for line in fh:
    lst2 = line.strip()
    words = lst2.split()
    lst.append(words)
    lst.sort()
print lst
Here is my input file
But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief
And this is what I'm hoping to get
['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'the', 'through', 'what', 'window', 'with', 'yonder']

lst.append(words) appends the list words as a single element at the end of lst; it does not concatenate lst and words. You need to use lst.extend(words) or lst += words.
Also, you should not sort the list at each iteration but only once, after the loop:
lst = []
for line in fh:
    lst2 = line.strip()
    words = lst2.split()
    lst.extend(words)
lst.sort()
print lst
If you don't want repeated words, use a set:
st = set()
for line in fh:
    lst2 = line.strip()
    words = lst2.split()
    st.update(words)
lst = list(st)
lst.sort()
print lst
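For Python 3, the set-based version could be sketched as follows. The io.StringIO object is only a stand-in for the opened file, and it holds just the first two lines of the sample input:

```python
import io

# Stand-in for open(fname), so the sketch runs without a file on disk.
fh = io.StringIO(
    "But soft what light through yonder window breaks\n"
    "It is the east and Juliet is the sun\n"
)

unique_words = set()
for line in fh:
    unique_words.update(line.split())   # split() already ignores the trailing newline

result = sorted(unique_words)           # uppercase letters sort before lowercase ones
print(result)
```

sorted() returns a new list directly, so the separate list(st) and lst.sort() steps are not needed.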

lst.append(words) is adding the list as a member to the outer list. For instance:
lst = []
lst.append(['another','list'])
lst ## [['another','list']]
So you're getting a nested list. Use .extend(...) instead:
fname = raw_input("Enter file name: ")
fh = open(fname)
lst = list()
for line in fh:
    lst2 = line.strip()
    words = lst2.split()
    lst.extend(words)
lst.sort()
print lst

line.split() returns a list of strings. Now you want to join those words with the list of strings you've already accumulated with the previous lines. When you call lst.append(words) you're just adding the list of words to your list, so you end up with a list of lists. What you probably want is extend() which simply adds all the elements of one list to the other.
So instead of doing lst.append(words), you would want lst.extend(words).
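The difference is easy to see side by side. A small sketch with two shortened lines from the input (nested and flat are illustrative names):

```python
nested = []
flat = []
for line in ['But soft what light', 'It is the east']:
    words = line.split()
    nested.append(words)   # adds the whole list as one element
    flat.extend(words)     # adds each word individually

print(nested)  # [['But', 'soft', 'what', 'light'], ['It', 'is', 'the', 'east']]
print(flat)    # ['But', 'soft', 'what', 'light', 'It', 'is', 'the', 'east']
```

Sorting nested orders the inner lists by their first word, which is exactly the behaviour described in the question.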

The problem is that words is a list of the words produced by the split. When you append words to lst, you are making a list of lists, and sorting it only sorts those inner lists relative to each other.
I believe you want to do something like:
for x in words:
    lst.append(x)
lst.sort()
Edit: I have tried this with your text file, and the following code works for me:
inp=open('test.txt','r')
lst=list()
for line in inp:
    tokens=line.split('\n')[0].split() #This splits off the newline character, though it shouldn't affect the result
    for x in tokens:
        lst.append(x)
lst.sort()
lst
