Given a sentence, e.g. "Im SHORTING #RSR here", I need to extract the word that follows the "#" symbol (starting after the "#", which is not included, and ending at the next space).
Obviously, the "#" symbol can be anywhere in the string.
Thanks.
You could use:
sentence = "Im SHORTING #RSR here"
words = [word.lstrip('#') for word in sentence.split() if word.startswith('#')]
The result will contain all hashtagged words in the sentence.
If there's always exactly one, just use words[0].
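If the hashtag can be attached to punctuation (for example "#RSR," or "(#RSR)"), a regular expression is one alternative. This is only a minimal sketch, assuming a tag consists of word characters (letters, digits, underscore); the helper name extract_hashtags is made up for illustration:

```python
import re

def extract_hashtags(sentence):
    # '#' followed by one or more word characters;
    # the group captures only the part after the '#'.
    return re.findall(r"#(\w+)", sentence)

print(extract_hashtags("Im SHORTING #RSR here"))  # ['RSR']
```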
Try this:
phrase = 'Im SHORTING #RSR here'
# split the input on spaces
words = phrase.split(' ')
# init empty list
comments = []
# iterate through each word
for word in words:
    # check if the first character of the word is '#'
    # (startswith is also safe for empty strings from repeated spaces)
    if word.startswith('#'):
        # add the word, without the leading '#', to the list
        comments.append(word[1:])
# let's see what we have!
print(comments)
Write a function that will redact every third word in a sentence. Make use of the hashtag (#) symbol to redact the characters that make up the word, i.e. if the word is five characters long then a string of five hashtags should replace that word. However, this should not redact any of the following punctuation marks:
apostrophes (')
quotations (")
full stops (.)
commas (,)
exclamations (!)
question marks (?)
colons (:)
semicolons (;)
Arguments:
sentence (string) → sentence that needs to be redacted.
Return:
redacted sentence (string) → every third word should be redacted.
This is the function, but I haven't tried anything; I'm just confused.
def redact_words(sentence):
    sentence = sentence.split()
    for word in sentence[2:3]:
        for i in word:
            word.replace('i', '#')
    redacted_sentence =
    return redacted_sentence
### END FUNCTION
Expected output
sentence = "My dear Explorer, do you understand the nature of the given question?"
redact_words(sentence) == 'My dear ########, do you ########## the nature ## the given ########?'
sentence = "Explorer, this is why you shouldn't come to a test unprepared."
redact_words(sentence)=="Explorer, this ## why you #######'# come to # test unprepared."
Please help.
def redact_words(sentence):
    # split the sentence into individual words
    words = sentence.split(" ")
    # initialize a variable to hold the redacted sentence
    redacted_sentence = ""
    # set a counter for iteration
    i = 1
    for word in words:
        # this selects every third word in the sentence
        if i % 3 == 0:
            # when the word has been selected, build its redacted
            # form in a separate variable
            redaction = ""
            for character in word:
                # at this point, we want to separate the punctuation from
                # the letters in the selected word
                if character in '\'".,!?;:':
                    redaction = redaction + character
                else:
                    redaction = redaction + '#'
            # piece the redacted word back together with the sentence
            redacted_sentence = redacted_sentence + " " + redaction
        else:
            # piece back together the word not redacted
            redacted_sentence = redacted_sentence + " " + word
        i += 1
    # return the redacted sentence without the leading space
    return redacted_sentence.strip()
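The same idea can be written more compactly with enumerate and str.join. This is just a sketch of an alternative, not the required solution; the names PUNCTUATION, redact_word and redact_words_compact are made up here, and unlike the version above it does not strip leading or trailing spaces from the input:

```python
# Punctuation marks that must survive redaction (from the task statement).
PUNCTUATION = "'\".,!?:;"

def redact_word(word):
    # Replace every character except the listed punctuation with '#'.
    return "".join(ch if ch in PUNCTUATION else "#" for ch in word)

def redact_words_compact(sentence):
    # Count words from 1 so that positions 3, 6, 9, ... are redacted.
    return " ".join(
        redact_word(word) if position % 3 == 0 else word
        for position, word in enumerate(sentence.split(" "), start=1)
    )

print(redact_words_compact("Explorer, this is why you shouldn't come to a test unprepared."))
# Explorer, this ## why you #######'# come to # test unprepared.
```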
Implement filescounter, which takes a string in any variety and returns the number of capitalized words in that string, inclusive of the last and first character.
def filescounter(s):
    sr = 0
    for words in s:
        #...
    return sr
I'm stuck on how to go about this.
Split the text on whitespace then iterate through the words:
def countCapitalized(text):
    count = 0
    for word in text.split():
        if word.isupper():
            count += 1
    return count
If, by capitalized, you mean only the first letter needs to be capitalized, then you can replace word.isupper() with word[0].isupper().
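To make the difference between the two readings concrete, here is a small sketch that counts both ways. The helper count_matching and the sample sentence are made up for illustration:

```python
def count_matching(text, predicate):
    # Count the whitespace-separated words for which predicate(word) is true.
    count = 0
    for word in text.split():
        if predicate(word):
            count += 1
    return count

sample = "NASA launched the Artemis mission from Florida"

# Words written entirely in upper case: only "NASA".
print(count_matching(sample, str.isupper))  # 1

# Words whose first character is upper case: "NASA", "Artemis", "Florida".
print(count_matching(sample, lambda word: word[0].isupper()))  # 3
```

Indexing word[0] is safe here because text.split() with no arguments never produces empty strings.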
Use this:
def count_upper_words(text):
    return sum(1 for word in text.split() if word.isupper())
Explanation:
split() with no arguments splits the text into words on any whitespace (spaces, tabs, newlines)
the generator expression passed to sum() is more compact than an explicit for-loop and avoids building an intermediate list
I have a list of words like substring = ["one","multiple words"], and I want to check whether a sentence contains any of these words.
sentence1 = 'This Sentence has ONE word'
sentence2 = ' This sentence has Multiple Words'
My code to check using any():
any(sentence1.lower() in s for s in substring)
This gives me False even though the word is present in my sentence. I don't want to use regex, as it would be an expensive operation for huge data.
Is there any other approach to this?
I think you should reverse your order:
any(s in sentence1.lower() for s in substring)
This way you're checking whether each substring is part of your sentence, NOT whether your sentence is part of any of your substrings.
As mentioned in other answers, this is what will get you the correct answer if you want to detect substrings:
any(s in sentence1.lower() for s in substring)
However, if your goal is to find words instead of substrings, this is incorrect. Consider:
sentence = "This is an aircraft"
words = ["air", "hi"]
any(w in sentence.lower() for w in words) # True.
The words "air" and "hi" are not in the sentence, but it returns True anyway. Instead, if you want to check for words, you should use:
any(w in sentence.lower().split(' ') for w in words)
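Note that splitting on spaces can never match a multi-word entry such as "multiple words" from the question, and punctuation attached to a word (as in "word.") also prevents a match. One possible way to handle both without regex is sketched below, assuming that whitespace-separated tokens with surrounding punctuation stripped count as words; contains_any is a hypothetical helper name:

```python
import string

def contains_any(sentence, phrases):
    # Normalize: lowercase, strip punctuation around each token,
    # and rejoin with single spaces.
    tokens = [tok.strip(string.punctuation) for tok in sentence.lower().split()]
    padded = " " + " ".join(tokens) + " "
    # Padding both sides with a space means a phrase only matches
    # when it starts and ends on word boundaries.
    return any(" " + phrase.lower() + " " in padded for phrase in phrases)

substring = ["one", "multiple words"]
print(contains_any("This Sentence has ONE word", substring))         # True
print(contains_any(" This sentence has Multiple Words", substring))  # True
print(contains_any("This is an aircraft", ["air", "hi"]))            # False
```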
You can also use str.find:
a="Hello Moto"
a.find("Hello")
It returns the index of the first match. If the string is not there, it returns -1.
After a thorough search I could find how to delete all characters before a specific letter but not before any letter.
I am trying to turn a string from this:
" This is a sentence. #contains symbol and whitespace
To this:
This is a sentence. #No symbols or whitespace
I have tried the following code, but strings such as the first example still appear.
for ch in ['\"', '[', ']', '*', '_', '-']:
    if ch in sen1:
        sen1 = sen1.replace(ch,"")
Not only does this fail to delete the double quote in the example for some unknown reason but also wouldn't work to delete the leading whitespace as it would delete all of the whitespace.
Thank you in advance.
Instead of just removing whitespace, to remove every character before the first letter, do this:
# s is your string
pos = len(s)  # if there is no letter at all, the result is empty
for i, x in enumerate(s):
    if x.isalpha():  # True if it's a letter
        pos = i  # first letter position
        break
new_str = s[pos:]
import re
s = " sthis is a sentence"
r = re.compile(r'.*?([a-zA-Z].*)')
print(r.findall(s)[0])
Strip all whitespace and punctuation:
>>> import string
>>> text.lstrip(string.punctuation + string.whitespace)
'This is a sentence. #contains symbol and whitespace'
Or, an alternative, find the first character that is an ascii letter. For example:
>>> pos = next(i for i, x in enumerate(text) if x in string.ascii_letters)
>>> text[pos:]
'This is a sentence. #contains symbol and whitespace'
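One caveat: the next() call above raises StopIteration when the string contains no ASCII letter at all. A sketch of one way around that, assuming an empty result is acceptable in that case, is to give next() a default value; drop_leading_non_letters is a made-up name:

```python
import string

def drop_leading_non_letters(text):
    # Index of the first ASCII letter, or len(text) if there is none,
    # so that a string without letters yields an empty result.
    pos = next(
        (i for i, ch in enumerate(text) if ch in string.ascii_letters),
        len(text),
    )
    return text[pos:]

print(drop_leading_non_letters('"  This is a sentence.'))  # This is a sentence.
print(repr(drop_leading_non_letters("123 !!")))            # ''
```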
This is a very basic version; i.e. it uses syntax that beginners in Python will easily understand.
your_string = "1324 $$ '!' '' # this is a sentence."
while len(your_string) > 0 and your_string[0] not in "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz":
    your_string = your_string[1:]
print(your_string)
# prints "this is a sentence."
Pros: Simple, no imports
Cons: The while loop could be avoided if you feel comfortable using list comprehensions.
Also, the string of letters that you're comparing against could be replaced by string.ascii_letters or a regex.
Drop everything up to the first alpha character.
import itertools as it
s = " - .] * This is a sentence. #contains symbol and whitespace"
"".join(it.dropwhile(lambda x: not x.isalpha(), s))
# 'This is a sentence. #contains symbol and whitespace'
Alternatively, iterate over the string and test whether each character is in a blacklist. If it is, strip that character from the left; otherwise return the result immediately.
def lstrip(s, blacklist=" "):
    for c in s:
        if c in blacklist:
            s = s.lstrip(c)
            continue
        return s
    return s  # every character was in the blacklist
lstrip(s, blacklist='\"[]*_-. ')
# 'This is a sentence. #contains symbol and whitespace'
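You can use re.sub, anchoring the pattern to the start of the string so that only the leading non-letters are removed:
import re
text = " This is a sentence. #contains symbol and whitespace"
re.sub(r"^[^a-zA-Z]+", "", text)
# 'This is a sentence. #contains symbol and whitespace'
The arguments are re.sub(pattern, replacement, string).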
I have some text in Python which is composed of numbers and letters. Something like this:
s = "12 word word2"
From the string s, I want to remove all the words containing only numbers
So I want the result to be
s = "word word2"
This is the regex I have, but it also acts on the letters, i.e. it replaces each letter with a space.
re.sub('[\ 0-9\ ]+', ' ', line)
Can someone tell me what is wrong? Also, is there a more time-efficient way to do this than regex?
Thanks!
You can use this regex:
>>> s = "12 word word2"
>>> print(re.sub(r'\b[0-9]+\b\s*', '', s))
word word2
\b is used for word boundary and \s* will remove 0 or more spaces after your number word.
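To see how the word boundaries behave, here is a short sketch that wraps the same pattern in a function; remove_number_words and the second sample string are made up for illustration. Digits inside a word such as "word2" are left alone because there is no boundary between "d" and "2", and a number at the very end leaves the space that preceded it:

```python
import re

def remove_number_words(text):
    # \b...\b matches only standalone runs of digits;
    # \s* also consumes the whitespace that follows them.
    return re.sub(r"\b[0-9]+\b\s*", "", text)

print(remove_number_words("12 word word2"))          # word word2
print(repr(remove_number_words("word 34 word2 5")))  # 'word word2 '
```

If the trailing space matters, calling .strip() on the result is one option.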
Using a regex is probably a bit overkill here, depending on whether you need to preserve whitespace:
s = "12 word word2"
s2 = ' '.join(word for word in s.split() if not word.isdigit())
# 'word word2'
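The whitespace caveat shows up when the input has irregular spacing. A small comparison sketch; the sample string is made up:

```python
import re

s = "12  word\tword2   7 end"

# split()/join() collapses every run of whitespace into a single space.
joined = " ".join(word for word in s.split() if not word.isdigit())

# The regex removes only the number and the whitespace right after it,
# leaving the remaining whitespace exactly as it was.
subbed = re.sub(r"\b[0-9]+\b\s*", "", s)

print(repr(joined))  # 'word word2 end'
print(repr(subbed))  # 'word\tword2   end'
```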
Without using any external library you could do:
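stringToFormat = "12 word word2"
words = ""
for word in stringToFormat.split(" "):
    try:
        # words that parse as integers are skipped
        int(word)
    except ValueError:
        words += "{} ".format(word)
# strip the trailing space added after the last word
print(words.strip())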