I want to get rid of the white space at the end of each line.
w = input("Words: ")
w = w.split()
k = 1
length = []
for ws in w:
    length.append(len(ws))
y = sorted(length)
while k <= y[-1]:
    if k in length:
        for ws in w:
            if len(ws) != k:
                continue
            else:
                print(ws, end=" ")
        print("")
    k += 1
The output gives me lines of words in ascending length, e.g. if I type in "I do love QI":
I
do QI
love
But it has whitespace at the end of each line. If I try to .rstrip() it I also delete the spaces between the words and get:
I
doQI
love
Use " ".join(ws) instead and it will auto but them on the same line (you will need to create a list rather than a string)
You can use re.sub from the re module:
re.sub(r"[ ]*$", "", x)
you need to use rstrip
demo:
>>> 'hello '.rstrip()
'hello'
rstrip removes any whitespace from the right.
lstrip removes whitespace from the left:
>>> ' hello '.lstrip()
'hello '
while strip removes from both ends:
>>> ' hello '.strip()
'hello'
you need to use split to convert the string to a list:
>>> "hello,how,are,you".split(',') # if ',' is the delimiter
['hello', 'how', 'are', 'you']
>>> "hello how are you".split() # if whitespace is delimiter
['hello', 'how', 'are', 'you']
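Applied to the original problem: build each output line as one string and rstrip the assembled line (not the individual words), so only the trailing space goes away and the inner spaces survive. A sketch:

```python
words = "I do love QI".split()
lines = []
for k in sorted(set(map(len, words))):
    line = ""
    for w in words:
        if len(w) == k:
            line += w + " "
    # rstrip only removes the trailing space; inner spaces are untouched
    lines.append(line.rstrip())
print("\n".join(lines))
```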
I need to extract the "#" from a function that receives a string.
Here's what I've done:
def hashtag(str):
    lst = []
    for i in str.split():
        if i[0] == "#":
            lst.append(i[1:])
    return lst
My code does work, but it splits words. So for the example string: "Python is #great #Computer#Science" it'll return the list: ['great', 'Computer#Science'] instead of ['great', 'Computer', 'Science'].
Without using RegEx please.
You can first find the first index where # occurs and split the slice on #:
text = 'Python is #great #Computer#Science'
text[text.find('#')+1:].split('#')
Out[214]: ['great ', 'Computer', 'Science']
You can even use strip at last to remove unnecessary white space.
[tag.strip() for tag in text[text.find('#')+1:].split('#')]
Out[215]: ['great', 'Computer', 'Science']
Split into words, and then filter for the ones beginning with an octothorpe (hash).
[word for word in str.replace("#", " #").split()
if word.startswith('#')
]
The steps are
Insert a space in front of each hash, to make sure we separate on them
Split the string at spaces
Keep the words that start with a hash.
Result:
['#great', '#Computer', '#Science']
split by #
take all tokens except the first one
strip spaces
s = "Python is #great #Computer#Science"
out = [w.split()[0] for w in s.split('#')[1:]]
out
['great', 'Computer', 'Science']
When you split the string using default separator (space), you get the following result:
['Python', 'is', '#great', '#Computer#Science']
You can make a replace (adding a space before a hashtag) before splitting
def hashtag(str):
    lst = []
    str = str.replace('#', ' #')
    for i in str.split():
        if i[0] == "#":
            lst.append(i[1:])
    return lst
What is the best way to split a string like
text = "hello there how are you"
in Python?
So I'd end up with an array like such:
['hello there', 'there how', 'how are', 'are you']
I have tried this:
liste = re.findall('((\S+\W*){'+str(2)+'})', text)
for a in liste:
print(a[0])
But I'm getting:
hello there
how are
you
How can I make the findall function move only one token when searching?
Here's a solution with re.findall:
>>> import re
>>> text = "hello there how are you"
>>> re.findall(r"(?=(?:(?:^|\W)(\S+\W\S+)(?:$|\W)))", text)
['hello there', 'there how', 'how are', 'are you']
Have a look at the Python docs for re: https://docs.python.org/3/library/re.html
(?=...) Lookahead assertion
(?:...) Non-capturing regular parentheses
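To see why the zero-width lookahead matters for overlapping matches, here is a minimal comparison (the \b anchor is a simplification of the (?:^|\W) anchoring used in the answer above):

```python
import re

text = "hello there how are you"

# a plain match consumes both words, so the next search resumes
# after "there" and the overlapping pairs are lost
print(re.findall(r"\S+\s\S+", text))
# ['hello there', 'how are']

# a lookahead captures the pair without consuming it, so the scan
# advances one position at a time; \b restricts matches to word starts
print(re.findall(r"\b(?=(\S+\s\S+))", text))
# ['hello there', 'there how', 'how are', 'are you']
```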
If regex isn't required you could do something like:
l = text.split(' ')
out = []
for i in range(len(l)):
    try:
        out.append(l[i] + ' ' + l[i+1])
    except IndexError:
        continue
Explanation:
First split the string on the space character. The result will be a list where each element is a word in the sentence. Instantiate an empty list to hold the result. Loop over the list of words, adding the two-word combinations separated by a space to the output list. This will throw an IndexError when accessing the last word in the list; just catch it and continue, since you don't seem to want that lone word in your result anyway.
I don't think you actually need regex for this.
I understand you want a list in which each element contains two words, with the second word of each element also being the first word of the next. We can do this easily like this:
string = "Hello there how are you"
liste = string.split(" ").pop(-1)
# we remove the last index, as otherwise we'll crash, or have an element with only one word
for i in range(len(liste)-1):
liste[i] = liste[i] + " " + liste[i+1]
I don't know if it's mandatory for you to use regex, but I'd do it this way.
First, you can get the list of words with the str.split() method.
>>> sentence = "hello there how are you"
>>> splited_sentence = sentence.split(" ")
>>> splited_sentence
['hello', 'there', 'how', 'are', 'you']
Then, you can make pairs.
>>> output = []
>>> for i in range(1, len(splited_sentence)):
...     output += [splited_sentence[i-1] + ' ' + splited_sentence[i]]
...
>>> output
['hello there', 'there how', 'how are', 'are you']
An alternative is just to split, zip, then join like so...
sentence = "Hello there how are you"
words = sentence.split()
[' '.join(i) for i in zip(words, words[1:])]
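For reference, a runnable version of that split/zip/join approach:

```python
sentence = "Hello there how are you"
words = sentence.split()
# zip pairs each word with its successor; the last word has no
# successor, so it is dropped automatically
pairs = [' '.join(p) for p in zip(words, words[1:])]
print(pairs)
# ['Hello there', 'there how', 'how are', 'are you']
```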
Another possible solution using findall.
>>> liste = list(map(''.join, re.findall(r'(\S+(?=(\s+\S+)))', text)))
>>> liste
['hello there', 'there how', 'how are', 'are you']
I was wondering if it would be possible to split a string such as
string = 'hello world [Im nick][introduction]'
into an array such as
['hello', 'world', '[Im nick][introduction]']
It doesn't have to be efficient; I just need a way to split a sentence into words, except that anything inside brackets stays together unsplit.
I need this because I have a markdown file with sentences such as
- What is the weather in [San antonio, texas][location]
I need the san antonio texas to be a full sentence inside of an array, would this be possible? The array would look like:
array = ['what', 'is', 'the', 'weather', 'in', 'San antonio, texas][location]']
Maybe this could work for you:
>>> s = 'What is the weather in [San antonio, texas][location]'
>>> i1 = s.index('[')
>>> i2 = s.index('[', i1 + 1)
>>> part_1 = s[:i1].split() # everything before the first bracket
>>> part_2 = [s[i1:i2], ] # first bracket pair
>>> part_3 = [s[i2:], ] # second bracket pair
>>> parts = part_1 + part_2 + part_3
>>> s
'What is the weather in [San antonio, texas][location]'
>>> parts
['What', 'is', 'the', 'weather', 'in', '[San antonio, texas]', '[location]']
It searches for the left brackets and uses that as a reference before splitting by spaces.
This assumes:
that there is no other text between the first closing bracket and the second opening bracket.
that there is nothing after the second closing bracket
Here is a more robust solution:
def do_split(s):
    parts = []
    while '[' in s:
        start = s.index('[')
        end = s.index(']', s.index(']') + 1) + 1  # looks for the second closing bracket
        parts.extend(s[:start].split())  # everything before the opening bracket
        parts.append(s[start:end])       # 2 pairs of brackets
        s = s[end:]                      # remove the processed part of the string
    parts.extend(s.split())  # add the remainder
    return parts
This yields:
>>> do_split('What is the weather in [San antonio, texas][location] on [friday][date]?')
['What', 'is', 'the', 'weather', 'in', '[San antonio, texas][location]', 'on', '[friday][date]', '?']
Maybe this short snippet can help you. But note that this only works if everything you said holds true for all the entries in the file.
s = 'What is the weather in [San antonio, texas][location]'
s = s.split(' [')
s[1] = '[' + s[1] # add back the split character
mod = s[0] # store in a variable
mod = mod.split(' ') # split the first part on space
mod.append(s[1]) # attach back the right part
print(mod)
Outputs:
['What', 'is', 'the', 'weather', 'in', '[San antonio, texas][location]']
and for s = 'hello world [Im nick][introduction]'
['hello', 'world', '[Im nick][introduction]']
For a one-liner, use functional-programming tools such as reduce from the functools module (note that split(" [") consumes the opening bracket, so it has to be added back):
from functools import reduce
reduce(lambda x, y: x + ["[" + y] if y.endswith("]") else x + y.split(), s.split(" ["), [])
or, slightly shorter, using map and sum with standard operators:
sum(map(lambda x: ["[" + x] if x.endswith("]") else x.split(), s.split(" [")), [])
The code below will work with your example. Hope it helps :)
I'm sure it can be better but now I have to go. Please enjoy.
string = 'hello world [Im nick][introduction]'
words = string.split(' ')
finall = []
for idx, elem in enumerate(words):
    currentelem = elem
    if currentelem[0] == '[' and currentelem[-1] != ']':
        # re-join the bracketed chunk with the space it was split on
        currentelem += ' ' + words[(idx + 1) % len(words)]
        finall.append(currentelem)
    elif currentelem[0] != '[' and currentelem[-1] != ']':
        finall.append(currentelem)
print(finall)
Let me offer an alternative to the ones above:
import re
string = 'hello world [Im nick][introduction]'
re.findall(r'(\[.+\]|\w+)', string)
Produces:
['hello', 'world', '[Im nick][introduction]']
you can use a regex split with a lookahead so that spaces inside brackets are left alone; note it is simpler to filter out empty entries with a list comprehension than to avoid them in the regex. The pattern below splits on a space only when no closing bracket follows before the next opening one:
import re
s = 'sss sss bbb [zss sss][zsss ss] sss sss bbb [ss sss][sss ss]'
[x for x in re.split(r" (?![^\[]*\])", s) if x]
user_words = raw_input()
word_list = user_words.split()
user_words = []
for word in word_list:
    user_words.append(word.capitalize())
user_words = " ".join(user_words)
print(user_words)
Current Output:
Input:
hello  world (two spaces in between)
Output:
Hello World
Desired Output:
Input:
hello  world (two spaces in between)
Output:
Hello  World (two spaces in between)
Note: I want to be able to split the string by spaces, but still keep the extra spaces between words that were in the original string entered by the user.
If you split using the space character, you'll get extra '' entries in your list:
>>> "Hello  world".split()
['Hello', 'world']
>>> "Hello  world".split(' ')
['Hello', '', 'world']
Those generate the extra spaces again after a join:
>>> ' '.join(['Hello', '', 'world'])
'Hello  world'
Use re.split for this, capturing the whitespace so you can join with the exact spaces the original string has.
import re

user_words = raw_input()
word_list = re.split(r"(\s+)", user_words)
user_words = []
user_words.append(word_list[0].capitalize())
user_words.append(word_list[2].capitalize())
user_words = word_list[1].join(user_words)
print(user_words)
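The snippet above assumes exactly two words (it indexes word_list[0..2]). A sketch generalizing the same captured-separator idea to any number of words: re.split with a capturing group keeps the whitespace runs in the result, so capitalizing every piece and re-joining preserves the original spacing.

```python
import re

def capitalize_preserving_spaces(s):
    # the capturing group keeps each whitespace run as its own element
    pieces = re.split(r"(\s+)", s)
    # str.capitalize leaves whitespace-only pieces unchanged
    return "".join(piece.capitalize() for piece in pieces)

print(capitalize_preserving_spaces("hello  world"))
# Hello  World
```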
I have a list as shown below:
exclude = ["please", "hi", "team"]
I have a string as follows:
text = "Hi team, please help me out."
I want my string to look as:
text = ", help me out."
effectively stripping out any word that might appear in the list exclude
I tried the below:
if any(e in text.lower()) for e in exclude:
print text.lower().strip(e)
But the above if statement returns a boolean value and hence I get the below error:
NameError: name 'e' is not defined
How do I get this done?
Something like this?
>>> from string import punctuation
>>> ' '.join(x for x in (word.strip(punctuation) for word in text.split())
if x.lower() not in exclude)
'help me out'
If you want to keep the trailing/leading punctuation with the words that are not present in exclude:
>>> ' '.join(word for word in text.split()
if word.strip(punctuation).lower() not in exclude)
'help me out.'
First one is equivalent to:
>>> out = []
>>> for word in text.split():
...     word = word.strip(punctuation)
...     if word.lower() not in exclude:
...         out.append(word)
>>> ' '.join(out)
'help me out'
You can use this (remember it is case-sensitive):
for word in exclude:
    text = text.replace(word, "")
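One caveat with this replace loop, shown on the example from the question: str.replace is case-sensitive and matches substrings anywhere, so capitalized words survive and a stopword hiding inside a longer word (e.g. "hi" in "this") would be removed too.

```python
exclude = ["please", "hi", "team"]
text = "Hi team, please help me out."
for word in exclude:
    text = text.replace(word, "")
# "Hi" survives because of the capital H; punctuation and the
# spaces around removed words are also left behind
print(text)
# Hi ,  help me out.
```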
This replaces with spaces everything that is not alphanumeric or belongs to the stopwords list, then splits the result into the words you want to keep. Finally, the list is joined into a string with single spaces between words. Note: case-sensitive.
' '.join(re.sub(r'\W|' + '|'.join(stopwords), ' ', sentence).split())
Example usage:
>>> import re
>>> stopwords = ['please', 'hi', 'team']
>>> sentence = 'hi team, please help me out.'
>>> ' '.join(re.sub(r'\W|' + '|'.join(stopwords), ' ', sentence).split())
'help me out'
Using simple methods:
import re

exclude = ["please", "hi", "team"]
text = "Hi team, please help me out."
l = []
te = re.findall(r"[\w]*", text)
for a in te:
    b = ''.join(a)
    if b.upper() not in (name.upper() for name in exclude) and a:
        l.append(b)
print(" ".join(l))
Hope it helps
if you are not worried about punctuation:
>>> import re
>>> text = "Hi team, please help me out."
>>> text = re.findall("\w+",text)
>>> text
['Hi', 'team', 'please', 'help', 'me', 'out']
>>> " ".join(x for x in text if x.lower() not in exclude)
'help me out'
In the above code, re.findall will find all words and put them in a list.
\w matches A-Za-z0-9 and the underscore
+ means one or more occurrences