I have a list of strings:
['Hello Yes', 'Good Now Order', 'Been There Before', 'Because']
I want to rewrite this as:
['Hello\nYes', 'Good\nNow\nOrder', 'Been\nThere\nBefore', 'Because']
So the \n goes into every space except the beginning or end of the string.
I have tried .split(' ') inside a for loop, but that gets messy, and I'm unsure how to rejoin the pieces at the end.
You can try the following comprehension, which uses the split-and-join approach you suggested:
l = ['Hello Yes', 'Good Now Order', 'Been There Before', 'Because']
l_new = ['\n'.join(s.split()) for s in l]
# this would replace any sequence of whitespace by a single line break
or perhaps more readable and to the point, using str.replace:
l_new = [s.replace(' ', '\n') for s in l]
# this will replace all and only space characters
Try a list comprehension (this returns a new list with every element the same as in the old one, but with spaces replaced with linebreaks):
[element.replace(' ', '\n') for element in ls]
where ls is your list.
Alternatively, you could write an explicit for-loop using the replace method.
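A minimal sketch of that for-loop version, building a new list rather than mutating in place:

```python
ls = ['Hello Yes', 'Good Now Order', 'Been There Before', 'Because']

result = []
for element in ls:
    # replace every space in this element with a line break
    result.append(element.replace(' ', '\n'))

print(result)
```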
l = ['Hello Yes', 'Good Now Order', 'Been There Before', 'Because']
list(map('\n'.join, map(str.split, l)))
# ['Hello\nYes', 'Good\nNow\nOrder', 'Been\nThere\nBefore', 'Because']
So this is a smaller version of my sentences/phrase list:
search = ['More than words', 'this way', 'go round', 'the same', 'end With', 'be sure', 'care for', 'see you in hell', 'see you', 'according to', 'they say', 'to go', 'stay with', 'Golf pro', 'Country Club']
What I would like to do is remove any terms that are more or less than 2 words in length. I basically just want another list with all 2 word terms. Is there a way to do this in Python? From my searching, I have only managed to find how to erase words of a certain number of characters and not entire phrases.
You can get 2 word phrases using this:
ans = list(filter(lambda s: len(s.split()) == 2, search))
Another way is using list comprehension:
ans = [w for w in search if len(w.split()) == 2]
As the OP asked in a comment about removing duplicates, adding that here:
ans = list(set(filter(lambda s: len(s.split()) == 2, search)))
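Note that set() does not preserve the original order. If order matters, dict.fromkeys can deduplicate while keeping first-seen order; a sketch using the sample list above:

```python
search = ['More than words', 'this way', 'go round', 'the same',
          'end With', 'be sure', 'care for', 'see you in hell',
          'see you', 'according to', 'they say', 'to go',
          'stay with', 'Golf pro', 'Country Club']

# keep only the 2-word phrases, then deduplicate while preserving order
two_word = [s for s in search if len(s.split()) == 2]
ans = list(dict.fromkeys(two_word))
print(ans)
```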
Well, if you take the simple approach and say every word must be separated by a single space, then any two-word string has exactly 1 space character, a three-word string has 2 space characters, and so on (in general, an n-word string has (n-1) spaces).
You can just break the two lists up like this (if you want one with strings <= two words, and one with strings > 2 words):
twoWords = []
moreThanTwo = []
for s in search:
    if s.count(" ") == 1:
        twoWords.append(s)
    else:
        moreThanTwo.append(s)
Or to simplify with a lambda expression and just extract a single list of all two-word strings:
twoWords = list(filter(lambda s: s.count(" ") == 1, search))
As you want a new list containing only the phrases that are exactly 2 words long, use a list comprehension:
new_list = [sentence for sentence in search if len(sentence.split(' ')) == 2]
I need to extract the hashtagged words from a string passed to a function.
Here's what I've done:
def hashtag(str):
    lst = []
    for i in str.split():
        if i[0] == "#":
            lst.append(i[1:])
    return lst
My code does work, but it splits words. So for the example string: "Python is #great #Computer#Science" it'll return the list: ['great', 'Computer#Science'] instead of ['great', 'Computer', 'Science'].
Without using RegEx please.
You can first find the first index where # occurs and split the slice on #:
text = 'Python is #great #Computer#Science'
text[text.find('#')+1:].split('#')
Out[214]: ['great ', 'Computer', 'Science']
You can even use strip afterwards to remove the unnecessary whitespace.
[tag.strip() for tag in text[text.find('#')+1:].split('#')]
Out[215]: ['great', 'Computer', 'Science']
Split into words, and then filter for the ones beginning with an octothorpe (hash).
[word for word in str.replace("#", " #").split() if word.startswith('#')]
The steps are
Insert a space in front of each hash, to make sure we separate on them
Split the string at spaces
Keep the words that start with a hash.
Result:
['#great', '#Computer', '#Science']
split by #
take all tokens except the first one
strip spaces
s = "Python is #great #Computer#Science"
out = [w.split()[0] for w in s.split('#')[1:]]
out
['great', 'Computer', 'Science']
When you split the string using default separator (space), you get the following result:
['Python', 'is', '#great', '#Computer#Science']
You can do a replace (adding a space before each hashtag) before splitting:
def hashtag(str):
    lst = []
    str = str.replace('#', ' #')
    for i in str.split():
        if i[0] == "#":
            lst.append(i[1:])
    return lst
Is there a way to split a list of strings per character?
Here is a simple list that I want split per "!":
name1 = ['hello! i like apples!', ' my name is ! alfred!']
first = name1.split("!")
print(first)
I know it's not expected to run; I essentially want a new list of strings, now split on "!". So the output could be:
["hello", "i like apples", "my name is", "alfred"]
Based on your given output, I've "solved" the problem.
So basically what I do is:
1.) Create one big string by simply concatenating all of the strings contained in your list.
2.) Split the big string by character "!"
Code:
lst = ['hello! i like apples!', 'my name is ! alfred!']
s = "".join(lst)
result = s.split('!')
print(result)
Output:
['hello', ' i like apples', 'my name is ', ' alfred', '']
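To match the exact desired output (no leading/trailing spaces and no empty strings), a strip-and-filter pass can be added on top of this, e.g.:

```python
lst = ['hello! i like apples!', 'my name is ! alfred!']
s = "".join(lst)
# strip surrounding whitespace from each piece and drop empty pieces
result = [piece.strip() for piece in s.split('!') if piece.strip()]
print(result)  # ['hello', 'i like apples', 'my name is', 'alfred']
```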
Just loop on each string and flatten its split result to a new list:
name1=['hello! i like apples!',' my name is ! alfred!']
print([s.strip() for sub in name1 for s in sub.split('!') if s])
Gives:
['hello', 'i like apples', 'my name is', 'alfred']
Try this:
name1 = ['hello! i like apples!', 'my name is ! alfred!']
new_list = []
for l in range(0, len(name1)):
    new_list += name1[l].split('!')
    new_list.remove('')
print(new_list)
Prints:
['hello', ' i like apples', 'my name is ', ' alfred']
What is the best way to split a string like
text = "hello there how are you"
in Python?
So I'd end up with an array like such:
['hello there', 'there how', 'how are', 'are you']
I have tried this:
liste = re.findall('((\S+\W*){'+str(2)+'})', text)
for a in liste:
print(a[0])
But I'm getting:
hello there
how are
you
How can I make the findall function move only one token when searching?
Here's a solution with re.findall:
>>> import re
>>> text = "hello there how are you"
>>> re.findall(r"(?=(?:(?:^|\W)(\S+\W\S+)(?:$|\W)))", text)
['hello there', 'there how', 'how are', 'are you']
Have a look at the Python docs for re: https://docs.python.org/3/library/re.html
(?=...) Lookahead assertion
(?:...) Non-capturing regular parentheses
If regex isn't required, you could do something like:
l = text.split(' ')
out = []
for i in range(len(l)):
    try:
        out.append(l[i] + ' ' + l[i+1])
    except IndexError:
        continue
Explanation:
First, split the string on the space character. The result is a list where each element is a word of the sentence. Instantiate an empty list to hold the result, then loop over the list of words, adding each two-word combination separated by a space to the output list. This throws an IndexError when reaching the last word in the list; just catch it and continue, since you don't seem to want that lone word in your result anyway.
I don't think you actually need regex for this.
I understand you want a list, in which each element contains two words, the latter also being the former of the following element. We can do this easily like this:
string = "Hello there how are you"
liste = string.split(" ")
for i in range(len(liste)-1):
    liste[i] = liste[i] + " " + liste[i+1]
liste.pop(-1)
# we remove the last element, as otherwise we'd keep an element with only one word
I don't know if it's mandatory for you to use regex, but I'd do it this way.
First, you can get the list of words with the str.split() method.
>>> sentence = "hello there how are you"
>>> splited_sentence = sentence.split(" ")
>>> splited_sentence
['hello', 'there', 'how', 'are', 'you']
Then, you can make pairs.
>>> output = []
>>> for i in range(1, len(splited_sentence)):
...     output += [splited_sentence[i-1] + ' ' + splited_sentence[i]]
...
>>> output
['hello there', 'there how', 'how are', 'are you']
An alternative is just to split, zip, then join like so...
sentence = "Hello there how are you"
words = sentence.split()
[' '.join(i) for i in zip(words, words[1:])]
Another possible solution using findall.
>>> liste = list(map(''.join, re.findall(r'(\S+(?=(\s+\S+)))', text)))
>>> liste
['hello there', 'there how', 'how are', 'are you']
I've been tasked with writing a for loop to remove some punctuation in a list of strings, storing the answers in a new list. I know how to do this with one string, but not in a loop.
For example: phrases = ['hi there!', 'thanks!'] etc.
import string
new_phrases = []
for i in phrases:
if i not in string.punctuation
Then I get a bit stuck at this point. Do I append? I've tried yield and return, but realised that's for functions.
You can either update your current list in place or append each new value to another list. Updating in place is better here because it takes constant extra space, while building a new list takes O(n) extra space.
phrases = ['hi there!', 'thanks!']
i = 0
for el in phrases:
    new_el = el.replace("!", "")
    phrases[i] = new_el
    i += 1
print(phrases)
will give output: ['hi there', 'thanks']
Give this a go:
import re
new_phrases = []
for word in phrases:
    new_phrases.append(re.sub(r'[^\w\s]', '', word))
This uses the re library to substitute every character that is neither a word character nor whitespace with an empty string, essentially removing the punctuation.
You can use the re module and a list comprehension to do it in a single line:
phrases = ['hi there!', 'thanks!']
import string
import re
new_phrases = [re.sub('[{}]'.format(string.punctuation), '', i) for i in phrases]
new_phrases
#['hi there', 'thanks']
If a phrase contains any punctuation, replace it with "" and append the result to new_phrases:
import string
new_phrases = []
phrases = ['hi there!', 'thanks!']
for i in phrases:
    for pun in string.punctuation:
        if pun in i:
            i = i.replace(pun, "")
    new_phrases.append(i)
print(new_phrases)
OUTPUT
['hi there', 'thanks']
Following your forma mentis, I'd do it like this:
new_phrases = []
for word in phrases:                    # for each phrase
    for punct in string.punctuation:    # for each punctuation character
        word = word.replace(punct, '')  # remove the punctuation character
    new_phrases.append(word)            # add the "punctuationless" text to your output
I suggest using the powerful translate() method on each string of your input list, which seems really appropriate here. This gives the following code, iterating over the input list through a list comprehension, which is short and easily readable:
import string
phrases = ['hi there!', 'thanks!']
translationRule = str.maketrans({k:"" for k in string.punctuation})
new_phrases = [phrase.translate(translationRule) for phrase in phrases]
print(new_phrases)
# ['hi there', 'thanks']
Or to only allow spaces and letters:
phrases=[''.join(x for x in i if x.isalpha() or x==' ') for i in phrases]
Now:
print(phrases)
Is:
['hi there', 'thanks']
You should use a list comprehension, where process is whatever transformation you want to apply to a single string:
new_list = [process(string) for string in phrases]
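For instance, a sketch where `process` is a hypothetical helper (not defined in the original answer) that strips punctuation via str.translate:

```python
import string

def process(s):
    # hypothetical helper: remove all punctuation via a translation table
    return s.translate(str.maketrans('', '', string.punctuation))

phrases = ['hi there!', 'thanks!']
new_list = [process(p) for p in phrases]
print(new_list)  # ['hi there', 'thanks']
```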