Python split() String into a list with spaces

user_words = raw_input()
word_list = user_words.split()
user_words = []
for word in word_list:
    user_words.append(word.capitalize())
user_words = " ".join(user_words)
print(user_words)
Current Output:
Input:
hello  world (two spaces in between)
Output:
Hello World
Desired Output:
Input:
hello  world (two spaces in between)
Output:
Hello  World (two spaces in between)
Note: I want to split the string on spaces but still keep the extra spaces that the user typed between words.

If you split on the space character explicitly, you get an extra '' in your list for each additional space:
>>> "Hello  world".split()
['Hello', 'world']
>>> "Hello  world".split(' ')
['Hello', '', 'world']
Those empty strings regenerate the extra spaces after a join:
>>> ' '.join(['Hello', '', 'world'])
'Hello  world'
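Putting this together for the original question, here is a minimal sketch (the helper name `capitalize_words` is hypothetical, and it assumes the words are separated by plain spaces, not tabs): split on a single space, capitalize each piece, and join with a single space. The empty strings produced by consecutive spaces survive the round trip, so the original spacing is kept.

```python
def capitalize_words(text):
    # split(' ') keeps an empty string for every extra space
    pieces = text.split(' ')
    # ''.capitalize() is still '', so the empty pieces stay empty
    return ' '.join(piece.capitalize() for piece in pieces)

print(repr(capitalize_words("hello  world")))  # 'Hello  World'
```

Like the question's code, str.capitalize also lowercases the rest of each word.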

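Use re.split with a capturing group, so the whitespace between words is kept in the list, and join the pieces back together:
import re
user_words = raw_input()
# The capturing group keeps each whitespace run: ['hello', '  ', 'world']
word_list = re.split(r"(\s+)", user_words)
# Capitalizing a whitespace run leaves it unchanged, so ''.join restores the original spacing
user_words = "".join(part.capitalize() for part in word_list)
print(user_words)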

Related

split string by using regex in python

What is the best way to split a string like
text = "hello there how are you"
in Python, so that I end up with a list like this:
['hello there', 'there how', 'how are', 'are you']
I have tried this:
liste = re.findall(r'((\S+\W*){' + str(2) + '})', text)
for a in liste:
    print(a[0])
But I'm getting:
hello there
how are
you
How can I make the findall function move only one token when searching?
Here's a solution with re.findall:
>>> import re
>>> text = "hello there how are you"
>>> re.findall(r"(?=(?:(?:^|\W)(\S+\W\S+)(?:$|\W)))", text)
['hello there', 'there how', 'how are', 'are you']
Have a look at the Python docs for re: https://docs.python.org/3/library/re.html
(?=...) Lookahead assertion
(?:...) Non-capturing regular parentheses
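To see why the lookahead helps: a lookahead consumes no characters, so after each zero-width match the search moves forward by one position and the next match can overlap the previous one. A minimal sketch on an illustrative string 'abcd':

```python
import re

# The capturing group inside (?=...) records the text, but the match itself
# is zero-width, so the scan resumes one character later.
print(re.findall(r"(?=(\w\w))", "abcd"))  # ['ab', 'bc', 'cd']

# Without the lookahead, each match consumes its characters, so pairs don't overlap.
print(re.findall(r"\w\w", "abcd"))  # ['ab', 'cd']
```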
If regex isn't required, you could do something like:
l = text.split(' ')
out = []
for i in range(len(l)):
    try:
        out.append(l[i] + ' ' + l[i+1])
    except IndexError:
        continue
Explanation:
First, split the string on the space character; the result is a list where each element is a word in the sentence. Then create an empty list to hold the result and loop over the words, appending each two-word combination, separated by a space, to the output list. Accessing l[i+1] raises an IndexError for the last word; just catch it and continue, since you don't seem to want that lone word in your result anyway.
I don't think you actually need regex for this.
I understand you want a list in which each element contains two words, the second word also being the first word of the following element. We can do this easily like this:
string = "Hello there how are you"
liste = string.split(" ")
for i in range(len(liste) - 1):
    liste[i] = liste[i] + " " + liste[i+1]
# Remove the last element: it still holds a single word
liste.pop(-1)
I don't know whether you're required to use regex, but I'd do it this way.
First, you can get the list of words with the str.split() method.
>>> sentence = "hello there how are you"
>>> splited_sentence = sentence.split(" ")
>>> splited_sentence
['hello', 'there', 'how', 'are', 'you']
Then, you can make pairs.
>>> output = []
>>> for i in range(1, len(splited_sentence)):
...     output += [splited_sentence[i-1] + ' ' + splited_sentence[i]]
...
>>> output
['hello there', 'there how', 'how are', 'are you']
An alternative is just to split, zip, then join like so...
sentence = "Hello there how are you"
words = sentence.split()
[' '.join(i) for i in zip(words, words[1:])]
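The zip approach also generalizes to windows of n words, which matches the str(2) parameter in the question's regex. A sketch; `word_windows` is a hypothetical helper name:

```python
def word_windows(text, n=2):
    """Return every run of n consecutive words, joined by single spaces."""
    words = text.split()
    # zip stops at the shortest slice, so no incomplete window is produced
    return [" ".join(group) for group in zip(*(words[i:] for i in range(n)))]

print(word_windows("hello there how are you"))
# ['hello there', 'there how', 'how are', 'are you']
print(word_windows("hello there how are you", 3))
# ['hello there how', 'there how are', 'how are you']
```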
Another possible solution using findall.
>>> liste = list(map(''.join, re.findall(r'(\S+(?=(\s+\S+)))', text)))
>>> liste
['hello there', 'there how', 'how are', 'are you']

Write a for loop to remove punctuation

I've been tasked with writing a for loop to remove some punctuation in a list of strings, storing the answers in a new list. I know how to do this with one string, but not in a loop.
For example: phrases = ['hi there!', 'thanks!'] etc.
import string
new_phrases = []
for i in phrases:
    if i not in string.punctuation
Then I get a bit stuck at this point. Do I append? I've tried yield and return, but realised that's for functions.
You can either update your current list in place or append the new values to another list. Updating in place avoids allocating a second list, whereas appending to a new list uses O(n) extra space for that list (the new strings are created either way).
phrases = ['hi there!', 'thanks!']
for i, el in enumerate(phrases):
    phrases[i] = el.replace("!", "")
print(phrases)
Output: ['hi there', 'thanks']
Give this a go:
import re
new_phrases = []
for word in phrases:
    new_phrases.append(re.sub(r'[^\w\s]', '', word))
This uses the re module to replace every character that is neither a word character nor whitespace with an empty string, which removes the punctuation.
You can use the re module and a list comprehension to do it in a single line:
import re
import string
phrases = ['hi there!', 'thanks!']
# re.escape makes characters such as \ and ] safe inside the character class
new_phrases = [re.sub('[{}]'.format(re.escape(string.punctuation)), '', i) for i in phrases]
print(new_phrases)
# ['hi there', 'thanks']
For each phrase, replace every punctuation character with "" and append the result to new_phrases:
import string
new_phrases = []
phrases = ['hi there!', 'thanks!']
for i in phrases:
    for pun in string.punctuation:
        if pun in i:
            i = i.replace(pun, "")
    new_phrases.append(i)
print(new_phrases)
OUTPUT
['hi there', 'thanks']
Following your way of thinking, I'd do it like this:
import string
new_phrases = []
for word in phrases:  # for each phrase
    w = word
    for punct in string.punctuation:  # for each punctuation character
        w = w.replace(punct, '')  # replace the punctuation character with nothing (remove it)
    new_phrases.append(w)  # add the new "punctuation-less" text to your output
I suggest using the str.translate() method on each string of your input list, which fits this task well. Iterating over the input list with a list comprehension keeps the code short and easily readable:
import string
phrases = ['hi there!', 'thanks!']
translationRule = str.maketrans({k:"" for k in string.punctuation})
new_phrases = [phrase.translate(translationRule) for phrase in phrases]
print(new_phrases)
# ['hi there', 'thanks']
Or to only allow spaces and letters:
phrases=[''.join(x for x in i if x.isalpha() or x==' ') for i in phrases]
Now:
print(phrases)
Is:
['hi there', 'thanks']
You should use a list comprehension, where process stands for whatever function removes the punctuation from a single string:
new_list = [process(s) for s in phrases]
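To make the placeholder `process` concrete, here is one possible version, a hypothetical helper built on the three-argument form of str.maketrans, whose third argument lists the characters to delete:

```python
import string

# Built once: a translation table that deletes every character in string.punctuation
_PUNCT_TABLE = str.maketrans('', '', string.punctuation)

def process(s):
    """Return s with all characters from string.punctuation removed."""
    return s.translate(_PUNCT_TABLE)

phrases = ['hi there!', 'thanks!']
new_list = [process(s) for s in phrases]
print(new_list)  # ['hi there', 'thanks']
```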

Split string in words before nth occurrence of hashtags in python

I am using the following code in Python to split string into words:
keywords=re.sub(r'[][)(!,;]', ' ', str(row[0])).split()
Imagine the input is:
"Hello #world I am in #London and it is #sunny today"
I need it to be split into words only up to the second hashtag, with the rest left unsplit, which means the output should be:
['Hello','#world','I','am','in']
Is there any solution to split the string into keywords in such way in Python?
str.find takes a start position, so once you find the first '#', use that index + 1 to start looking for the second, then split the substring before it:
s = "Hello #world I am in #London and it is #sunny today"
i = s.find("#", s.find("#") + 1)
print(s[:i].split())
['Hello', '#world', 'I', 'am', 'in']
You can also do the same with index:
s = "Hello #world I am in #London and it is #sunny today"
i = s.index("#", s.index("#") + 1)
print(s[:i].split())
The difference is that index raises a ValueError if the substring is not found, whereas find returns -1.
The split method accepts a separator to split on; without one, it splits on runs of whitespace.
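One caveat with the find version: if the string has fewer than two '#', find returns -1 and s[:-1] silently drops the last character. A minimal sketch that guards against that, assuming you want the whole string split in that case (the function name `words_before_second_hashtag` is hypothetical):

```python
def words_before_second_hashtag(s):
    # If there is no first '#', find returns -1 and the search starts at 0,
    # which also yields -1, so both "missing" cases end up here as -1.
    second = s.find("#", s.find("#") + 1)
    # With fewer than two hashtags, fall back to splitting the whole string
    end = second if second != -1 else len(s)
    return s[:end].split()

print(words_before_second_hashtag("Hello #world I am in #London and it is #sunny today"))
# ['Hello', '#world', 'I', 'am', 'in']
print(words_before_second_hashtag("no hashtags here"))
# ['no', 'hashtags', 'here']
```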
string_to_split = "Hello #world I am in #London and it is #sunny today"
# Split on all occurrences of #
temp = string_to_split.split("#")
# Join the first two entries with a '#' and remove any trailing whitespace
temp_two = '#'.join(temp[:2]).strip()
# split on spaces
final = temp_two.split(' ')
In the interactive interpreter:
>>> string_to_split = "Hello #world I am in #London and it is #sunny today"
>>> temp = string_to_split.split("#")
>>> temp_two = '#'.join(temp[:2]).strip()
>>> final = temp_two.split(' ')
>>> final
['Hello', '#world', 'I', 'am', 'in']
Edit: fixed [2:] to [:2]; I always get them mixed up.
Edit: fixed the extra whitespace issue.
In interactive Python:
>>> s = "Hello #world I am in #London and it is #sunny today"
>>> hash_indices = [i for i, element in enumerate(s) if element == '#']
>>> hash_indices
[6, 21, 39]
>>> s[0:hash_indices[1]].split()
['Hello', '#world', 'I', 'am', 'in']
>>> s[hash_indices[1]:]
'#London and it is #sunny today'
Using a regex and split:
import re
source = "Hello #world I am in #London and it is #sunny today"
# Match everything up to and including the second '#'
reg_out = re.search(r'[^#]*#[^#]*#', source)
split_out = reg_out.group().split()
# Drop the trailing '#' token
print(split_out[:-1])
Output: ['Hello', '#world', 'I', 'am', 'in']

Removal of white space

I want to get rid of the white space at the end of each line.
w = input("Words: ")
w = w.split()
k = 1
length = []
for ws in w:
    length.append(len(ws))
y = sorted(length)
while k <= y[-1]:
    if k in length:
        for ws in w:
            if len(ws) != k:
                continue
            else:
                print(ws, end=" ")
        print("")
    k += 1
The output gives me lines of words in ascending order of length, e.g. if I type in I do love QI, I get:
I
do QI
love
But there is white space at the end of each line. If I try to .rstrip() it, I also delete the spaces between the words and get:
I
doQI
love
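Use " ".join() on a list of the words instead of printing each word with end=" "; join puts the words on the same line with spaces only between them (you will need to collect the words in a list rather than passing a single string).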
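A minimal sketch of that suggestion applied to the question's loop: collect the words of each length into a list and print them with " ".join, so the separator only goes between words. The input is hard-coded here instead of being read with input(), and the helper name `lines_by_length` is hypothetical.

```python
def lines_by_length(words):
    """Group words by length, shortest first, one line per length."""
    lines = []
    for k in sorted(set(len(w) for w in words)):
        same_length = [w for w in words if len(w) == k]
        # join puts ' ' only between items, so no trailing space is added
        lines.append(" ".join(same_length))
    return lines

for line in lines_by_length("I do love QI".split()):
    print(line)
```

Words of the same length keep their input order, as in the original loop.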
You can use re.sub from the re module to remove the trailing spaces from a line x:
re.sub(r"[ ]*$", "", x)
You need to use rstrip.
Demo:
>>> 'hello '.rstrip()
'hello'
rstrip removes any whitespace from the right;
lstrip removes whitespace from the left:
>>> ' hello '.lstrip()
'hello '
while strip removes it from both ends:
>>> ' hello '.strip()
'hello'
You need to use split to convert a string to a list:
>>> "hello,how,are,you".split(',') # if ',' is the delimiter
['hello', 'how', 'are', 'you']
>>> "hello how are you".split() # if whitespace is delimiter
['hello', 'how', 'are', 'you']

Python and Line Breaks

With Python I know that the "\n" breaks to the next line in a string, but what I am trying to do is replace every "," in a string with a '\n'. Is that possible? I am kind of new to Python.
Try this:
text = 'a, b, c'
text = text.replace(',', '\n')
print(text)
For lists:
text = ['a', 'b', 'c']
text = '\n'.join(text)
print(text)
>>> s = 'Hello, world'
>>> s = s.replace(',', '\n')
>>> print(s)
Hello
 world
>>> str_list = s.split('\n')
>>> print(str_list)
['Hello', ' world']
For further operations, you may check: http://docs.python.org/library/stdtypes.html
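Note that replace keeps the space that followed each comma, which is why ' world' starts with a space above. If you don't want that, one option (a sketch, not the only way) is to split on ',', strip each piece, and join the pieces with '\n':

```python
text = 'a, b, c'
# Split on commas, drop the surrounding spaces, then put each item on its own line
result = '\n'.join(part.strip() for part in text.split(','))
print(result)
```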
You can insert a literal \n into your string by escaping the backslash, e.g.
>>> print('\n')  # outputs a line break, not the characters \n
>>> print('\\n')  # prints the two characters \n
\n
The same principle applies to regular expressions. Use this expression to replace every "," in a string with a newline:
>>> import re
>>> re.sub(",", "\\n", "flurb, durb, hurr")
'flurb\n durb\n hurr'
