I have a long string (28 MB) of normal sentences. I want to remove all words that are entirely in capital letters (like TNT, USA, OMG).
So from the sentence:
Jump over TNT in There.
I would like to get:
Jump over in There.
Is there any way to do it without splitting the text into a list and iterating? Is it possible to use a regex somehow to do this?
You can match runs of capital letters, [A-Z]+, between word boundaries (\b):
import re
line = 'Jump over TNT in There NOW'
m = re.sub(r'\b[A-Z]+\b', '', line)
# 'Jump over  in There '  (the spaces around each removed word stay behind)
Use the re module:
import re
line = 'Jump over TNT in There.'
new_line = re.sub(r'[A-Z]+(?![a-z])', '', line)
print(new_line)
# Output
Jump over in There.
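Both substitutions above delete only the word itself, so the spaces on either side of it remain. Here is a minimal sketch of one way to tidy that up. It assumes runs of spaces may be collapsed to a single space and that single capitals such as "I" should be removed too. The names remove_all_caps_words, ALL_CAPS_WORD and EXTRA_SPACES are illustrative and do not come from the answers.

```python
import re

# Precompile once; this matters little for one call but keeps the intent clear.
ALL_CAPS_WORD = re.compile(r'\b[A-Z]+\b')
EXTRA_SPACES = re.compile(r' {2,}')

def remove_all_caps_words(text):
    """Drop words made only of A-Z, then collapse the leftover spaces."""
    without_caps = ALL_CAPS_WORD.sub('', text)
    return EXTRA_SPACES.sub(' ', without_caps).strip()

print(remove_all_caps_words('Jump over TNT in There.'))
```

Because the pattern is applied to the whole string in one pass, no intermediate list of words is built.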
I would do something like this:
import string
def onlyUpper(word):
    # True only if every character of the word is uppercase
    for c in word:
        if not c.isupper():
            return False
    return True
s = "Jump over TNT in There."
# Replace punctuation with spaces so it does not stick to the words
for char in string.punctuation:
    s = s.replace(char, ' ')
words = s.split()
good_words = []
for w in words:
    if not onlyUpper(w):
        good_words.append(w)
result = ""
for w in good_words:
    result = result + w + " "
print(result)
I want to write a Python script that will read a text from the console and output the average number of characters per word, but I have some problems with punctuation and newline characters.
Here is my code:
def main():
    allwords = []
    while True:
        words = input()
        if words == "Amen.":
            break
        allwords.extend(words.split())
    txt = " ".join( allwords)
    n_pismen = len([c for c in txt if c.isalpha()])
    n_slov = len([i for i in range(len(txt) - 1) if txt[i].isalpha() and not txt[i + 1].isalpha()])
    for char in (',', '.', ';'):
        txt = txt.replace(char, '')
    txt.replace('\n', ' ')
    words = txt.split()
    print(sum(len(word) for word in words) / len(words))
    if words:
        average = sum(len(words) for words in allwords) / len( allwords)
if __name__ == '__main__':
    main()
Our Father, which art in heaven,
hallowed be thy name;
thy kingdom come;
thy will be done,
in earth as it is in heaven.
Give us this day our daily bread.
And forgive us our trespasses,
as we forgive them that trespass against us.
And lead us not into temptation,
but deliver us from evil.
For thine is the kingdom,
the power, and the glory,
For ever and ever.
Amen.
The expected output is 4.00, but I just get 1.00.
Not sure what's wrong in your example, but this will work. I used "test" as the string name, you can modify that as desired:
counts = []  # List to store number of characters per word
for t in test.split():  # Split to substrings at whitespace
    counts.append(len([c for c in t if c.isalpha()]))  # Calculate the length of each word ignoring non-letters
print(sum(counts)/len(counts))  # Compute the average
You can do this as follows (where strng is the passage of text):
# Remove all of the 'bad' characters
for char in (',', '.', ';'):
    strng = strng.replace(char, '')
# str.replace returns a new string, so the result must be assigned back
strng = strng.replace('\n', ' ')
# Split on whitespace
words = strng.split()
# Calculate the average length
print(sum(len(word) for word in words) / len(words))
I would match every word using a regex, then keep track of the number of words and the total number of characters:
import re
total_number = 0
n_words = 0
pattern = re.compile("[a-z]+", re.IGNORECASE)
with open({PATH_TO_YOUR_FILE}, "r") as f:
    for line in f:
        words = pattern.findall(line)
        n_words += len(words)
        total_number += sum([len(x) for x in words])
print(total_number/n_words)
OUTPUT
4.0
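The same counting idea can be wrapped in a function that works on a string already in memory rather than on a file. This is only a sketch: the name average_word_length, the ASCII-only letter class, and the choice to return 0.0 for text with no words are assumptions, not part of the answer above.

```python
import re

# A "word" here is a run of ASCII letters; punctuation and newlines never match.
WORD = re.compile(r"[A-Za-z]+")

def average_word_length(text):
    """Return the mean number of letters per word, or 0.0 if there are no words."""
    words = WORD.findall(text)
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)

print(average_word_length("Our Father, which art"))
```

Guarding against an empty word list avoids the ZeroDivisionError that the plain division would raise on empty input.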
Try this:
import string
s = input()
# Delete every punctuation character in one pass
s = s.translate(str.maketrans('', '', string.punctuation))
s = s.replace('\n', ' ')
words = s.split()
print(sum(map(len, words)) / len(words))
Write a function named string_processing that takes a list of
strings as input and returns an all-lowercase string with no
punctuation. There should be a space between each word. You do not
have to check for edge cases.
Here is my code:
import string
def string_processing(string_list):
    str1 = ""
    for word in string_list:
        str1 += ''.join(x for x in word if x not in string.punctuation)
    return str1
string_processing(['hello,', 'world!'])
string_processing(['test...', 'me....', 'please'])
My output:
'helloworld'
'testmeplease'
Expected output:
'hello world'
'test me please'
How do I add a space just between the words?
You just need to keep all the words separate and then join them later with a space between them:
import string
def string_processing(string_list):
    ret = []
    for word in string_list:
        ret.append(''.join(x for x in word if x not in string.punctuation))
    return ' '.join(ret)
print(string_processing(['hello,', 'world!']))
print(string_processing(['test...', 'me....', 'please']))
Output:
hello world
test me please
Using regex, remove every non-letter and then join with a space:
import re
def string_processing(string_list):
    return ' '.join(re.sub(r'[^a-zA-Z]', '', word) for word in string_list)
print(string_processing(['hello,', 'world!']))
print(string_processing(['test...', 'me....', 'please']))
Gives:
hello world
test me please
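The task statement also asks for an all-lowercase result, which the sample inputs do not exercise. Below is a hedged variant of the join-based approach that lowercases each word and skips items that become empty once punctuation is stripped. Skipping empty items is an assumption about the desired behaviour, not something the task states.

```python
import string

def string_processing(string_list):
    """Strip punctuation, lowercase, and join the remaining words with spaces."""
    cleaned = []
    for word in string_list:
        stripped = ''.join(ch for ch in word if ch not in string.punctuation)
        if stripped:  # assumption: an item that was only punctuation is dropped
            cleaned.append(stripped.lower())
    return ' '.join(cleaned)

print(string_processing(['Hello,', 'World!']))
```

Collecting the words in a list and joining once at the end puts spaces only between words, so no trailing space has to be trimmed.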
Try:
import string
def string_processing(string_list):
    str1 = ""
    for word in string_list:
        st = ''.join(x for x in word if x not in string.punctuation)
        str1 += f"{st} "  #<-------- here
    return str1.rstrip()  #<------- here
string_processing(['hello,', 'world!'])
string_processing(['test...', 'me....', 'please'])
Using regex:
import re
li = ['hello...,', 'world!']
# Join with a space first so adjacent words without punctuation do not merge
st = " ".join(re.compile(r'\w+').findall(" ".join(li)))
The following code could help.
import string
def string_processing(string_list):
    for i,word in enumerate(string_list):
        string_list[i] = word.translate(str.maketrans('', '', string.punctuation)).lower()
    str1 = " ".join(string_list)
    return str1
string_processing(['hello,', 'world!'])
string_processing(['test...', 'me....', 'please'])
We can use the re library to process the words and add a space between them:
import re
string = 'HelloWorld'
# A space is inserted before every capital, so strip the leading one
print(re.sub('([A-Z])', r' \1', string).strip())
Output:
Hello World
So, I want to be able to scramble words in a sentence, but:
Word order in the sentence(s) is left the same.
If the word started with a capital letter, the jumbled word must also start with a capital letter (i.e., the first letter gets capitalised).
Punctuation marks . , ; ! and ? need to be preserved.
For instance, for the sentence "Tom and I watched Star Wars in the cinema, it was fun!" a jumbled version would be "Mto nad I wachtde Tars Rswa ni het amecin, ti wsa fnu!".
from random import shuffle
def shuffle_word(word):
    word = list(word)
    if word.title():
        ???? #then keep first capital letter in same position in word?
    elif char == '!' or '.' or ',' or '?':
        ???? #then keep their position?
    else:
        shuffle(word)
    return''.join(word)
L = input('try enter a sentence:').split()
print([shuffle_word(word) for word in L])
I understand how to jumble each word in the sentence, but I am struggling with the if statements needed for these special cases. Please help!
Here is my code. Little different from your logic. Feel free to optimize the code.
import random
def shuffle_word(words):
    words_new = words.split(" ")
    out=''
    for word in words_new:
        l = list(word)
        if word.istitle():
            result = ''.join(random.sample(word, len(word)))
            out = out + ' ' + result.title()
        elif any(i in word for i in ('!','.',',')):
            result = ''.join(random.sample(word[:-1], len(word)-1))
            out = out + ' ' + result+word[-1]
        else:
            result = ''.join(random.sample(word, len(word)))
            out = out +' ' + result
    return (out[1:])
L = "Tom and I watched Star Wars in the cinema, it was fun!"
print(shuffle_word(L))
Output of above code execution:
Mto nda I whaecdt Atsr Swra in hte ienamc, ti wsa nfu!
Hope it helps. Cheers!
Glad to see you've figured out most of the logic.
To maintain the capitalization of the first letter, you can check it beforehand and capitalize the "new" first letter later.
first_letter_is_cap = word[0].isupper()
shuffle(word)
if first_letter_is_cap:
    # Re-capitalize first letter
    word[0] = word[0].upper()
To maintain the position of a trailing punctuation, strip it first and add it back afterwards:
last_char = word[-1]
if last_char in ".,;!?":
    # Strip the punctuation
    word = word[:-1]
shuffle(word)
if last_char in ".,;!?":
    # Add it back
    word.append(last_char)
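Put together, the two fragments above might look like the following sketch. It assumes each word carries at most one trailing punctuation mark and that words are separated by single spaces. It also lowercases the letters before shuffling, so the original capital cannot end up in the middle of the word; that step is an addition, not part of the fragments. The names PUNCTUATION and shuffle_sentence are illustrative.

```python
import random

PUNCTUATION = ".,;!?"

def shuffle_word(word):
    """Shuffle one word, keeping a trailing punctuation mark and initial capital."""
    trailing = ""
    if word and word[-1] in PUNCTUATION:
        # Set the punctuation aside so it is not shuffled
        word, trailing = word[:-1], word[-1]
    if not word:
        return trailing
    first_letter_is_cap = word[0].isupper()
    letters = list(word.lower())
    random.shuffle(letters)
    if first_letter_is_cap:
        # Re-capitalize whichever letter landed in first position
        letters[0] = letters[0].upper()
    return "".join(letters) + trailing

def shuffle_sentence(sentence):
    """Shuffle every word while leaving the word order unchanged."""
    return " ".join(shuffle_word(w) for w in sentence.split(" "))

print(shuffle_sentence("Tom and I watched Star Wars in the cinema, it was fun!"))
```

The output differs from run to run because the shuffle is random; a word can also come back unchanged by chance.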
Since this is a string processing algorithm I would consider using regular expressions. Regex gives you more flexibility, cleaner code and you can get rid of the conditions for edge cases. For example this code handles apostrophes, numbers, quote marks and special phrases like date and time, without any additional code and you can control these just by changing the pattern of regular expression.
from random import shuffle
import re
# Characters considered part of words
pattern = r"[A-Za-z']+"
# shuffle and lowercase word characters
def shuffle_word(word):
    w = list(word)
    shuffle(w)
    return ''.join(w).lower()
# function to shuffle word used in replace
def replace_func(match):
    return shuffle_word(match.group())
def shuffle_str(str):
    # replace words with their shuffled version
    shuffled_str = re.sub(pattern, replace_func, str)
    # find original uppercase letters
    uppercase_letters = re.finditer(r"[A-Z]", str)
    # make new characters in uppercase positions uppercase
    char_list = list(shuffled_str)
    for match in uppercase_letters:
        uppercase_index = match.start()
        char_list[uppercase_index] = char_list[uppercase_index].upper()
    return ''.join(char_list)
print(shuffle_str('''Tom and I watched "Star Wars" in the cinema's new 3D theater yesterday at 8:00pm, it was fun!'''))
This works with any sentence, even if it has several "special" characters in a row, and preserves all the punctuation marks:
from random import sample
def shuffle_word(sentence):
    new_sentence=""
    word=""
    for i,char in enumerate(sentence+' '):
        if char.isalpha():
            word+=char
        else:
            if word:
                if len(word)==1:
                    new_sentence+=word
                else:
                    new_word=''.join(sample(word,len(word)))
                    if word==word.title():
                        new_sentence+=new_word.title()
                    else:
                        new_sentence+=new_word
                word=""
            new_sentence+=char
    return new_sentence
text="Tom and I watched Star Wars in the cinema, it was... fun!"
print(shuffle_word(text))
Output:
Mto nda I hctawed Rast Aswr in the animec, ti asw... fnu!
I want to put AEM into brackets, so that text will look like: Agnico Eagle Mines Limited (AEM)
text = "Agnico Eagle Mines Limited AEM"
def add_brackets(test):
    for word in test:
        if word.isupper():
            word = "(" + word + ")"
    print(test)
print(add_brackets(text))
What's wrong with the code? I get the original text.
Two things. First, you are iterating per character, not per word. Second, you are not modifying the text: you only rebind word and never do anything with it.
text = "Agnico Eagle Mines Limited AEM"
def add_brackets(test):
    outstr = ""
    for word in test.split(" "):
        if word.isupper():
            outstr += " (" + word + ")"
        else:
            outstr += " " + word
    return outstr.strip()
print(add_brackets(text))
Edit: Fancier
text = "Agnico Eagle Mines Limited AEM"
def add_brackets(test):
    return " ".join(["({})".format(word) if word.isupper() else word for word in test.split(" ")])
print(add_brackets(text))
This would be pretty concise with a regular expression substitution:
>>> import re
>>> text = "Agnico Eagle Mines Limited AEM"
>>> re.sub(r'\b([A-Z]+)\b', r'(\1)', text)
'Agnico Eagle Mines Limited (AEM)'
This looks for one or more uppercase characters together, with word boundaries (such as whitespace or the end of the string) on either side, then substitutes the matched group with the same text (the \1) wrapped in parentheses.
In a function:
>>> import re
>>> def add_brackets(s):
...     return re.sub(r'\b([A-Z]+)\b', r'(\1)', s)
...
>>> print(add_brackets(text))
Agnico Eagle Mines Limited (AEM)
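One difference between the approaches is worth knowing. str.isupper() ignores non-letters, so the split-based versions would wrap "AEM," as "(AEM,)", whereas the \b pattern leaves the comma outside the parentheses. Both also bracket a lone capital such as "I". If that is not wanted, a minimum length can be added to the pattern. The min_len parameter below is a hypothetical addition, shown only as a sketch.

```python
import re

def add_brackets(s, min_len=2):
    """Wrap all-caps words of at least min_len letters in parentheses."""
    # Builds e.g. \b([A-Z]{2,})\b for min_len=2
    pattern = r'\b([A-Z]{' + str(min_len) + r',})\b'
    return re.sub(pattern, r'(\1)', s)

print(add_brackets("Agnico Eagle Mines Limited AEM"))
print(add_brackets("I own AEM, not XYZ."))
```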
Write a simple program that reads a line from the keyboard and outputs the same line where every word is reversed. A word is defined as a continuous sequence of alphanumeric characters or hyphen (‘-’). For instance, if the input is
“Can you help me!”
the output should be
“naC uoy pleh em!”
I just tried the following code, but there is a problem with it:
print("Enter the string:")
str1 = input()
print(' '.join((str1[::-1]).split(' ')[::-1]))
It prints "naC uoy pleh !em". Look at the exclamation mark (!): that is the problem here. Can anybody help me?
The easiest is probably to use the re module to split the string:
import re
# The capturing group makes split() keep the separators in the result
pattern = re.compile(r'(\W)')
string = input('Enter the string: ')
print(''.join(x[::-1] for x in pattern.split(string)))
When run, you get:
Enter the string: Can you help me!
naC uoy pleh em!
You could use re.sub() to find each word and reverse it:
In [8]: import re
In [9]: s = "Can you help me!"
In [10]: re.sub(r'[-\w]+', lambda w:w.group()[::-1], s)
Out[10]: 'naC uoy pleh em!'
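Note that \w also matches the underscore and, in Python 3, non-ASCII letters. A stricter sketch that follows the task's definition of a word (alphanumerics or hyphen) is shown below. It assumes ASCII input only, and the name reverse_words is illustrative.

```python
import re

# A word is a run of ASCII letters, digits, or hyphens, per the task statement.
WORD = re.compile(r'[A-Za-z0-9-]+')

def reverse_words(line):
    """Reverse each word in place; everything else stays where it is."""
    return WORD.sub(lambda m: m.group()[::-1], line)

print(reverse_words("Can you help me!"))
```

Since only the matched runs are rewritten, spacing and punctuation keep their original positions without any special handling.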
My answer, more verbose though. It handles more than one punctuation mark at the end as well as punctuation marks within the sentence.
import string
import re
valid_punctuation = string.punctuation.replace('-', '')
# re.escape keeps characters such as ] and \ from breaking the character class
word_pattern = re.compile(r'([\w-]+)([' + re.escape(valid_punctuation) + r']*)$')
# reverses word. ignores punctuation at the end.
# assumes a single word (i.e. no spaces)
def word_reverse(w):
    m = word_pattern.match(w)
    if m is None:  # nothing word-like to reverse, e.g. a bare "!"
        return w
    return m.group(1)[::-1] + m.group(2)
def sentence_reverse(s):
    return ' '.join([word_reverse(w) for w in re.split(r'\s+', s)])
str1 = input('Enter the sentence: ')
print(sentence_reverse(str1))
Simple solution without using the re module:
print('Enter the string:')
string = input()
line = word = ''
for char in string:
    if char.isalnum() or char == '-':
        # Prepend, so the word is built in reverse
        word = char + word
    else:
        if word:
            line += word
            word = ''
        line += char
print(line + word)
You can do this, although any punctuation attached to a word is reversed along with it:
print("Enter the string:")
str1 = input()
print(' '.join(str1[::-1].split(' ')[::-1]))
Or this:
print(' '.join([w[::-1] for w in str1.split(' ')]))