I want to put AEM into brackets, so that text will look like: Agnico Eagle Mines Limited (AEM)
text = "Agnico Eagle Mines Limited AEM"
def add_brackets(test):
    for word in test:
        if word.isupper():
            word = "(" + word + ")"
    print(test)
print(add_brackets(text))
What's wrong with the code? I get the original text.
Two things. First, you are checking per character, not per word, because looping over a string yields single characters. Second, you are not modifying the text; you only rebind the loop variable word and never do anything with it.
text = "Agnico Eagle Mines Limited AEM"
def add_brackets(test):
    outstr = ""
    for word in test.split(" "):
        if word.isupper():
            outstr += " (" + word + ")"
        else:
            outstr += " " + word
    return outstr.strip()
print(add_brackets(text))
Edit: Fancier
text = "Agnico Eagle Mines Limited AEM"
def add_brackets(test):
    return " ".join(["({})".format(word) if word.isupper() else word for word in test.split(" ")])
print(add_brackets(text))
This would be pretty concise with a regular expression substitution:
>>> import re
>>> text = "Agnico Eagle Mines Limited AEM"
>>> re.sub(r'\b([A-Z]+)\b', r'(\1)', text)
'Agnico Eagle Mines Limited (AEM)'
This looks for one or more uppercase characters together, with word boundaries (such as whitespace or the end of the string) on either side, then replaces the match with the same text (the \1 refers to the captured group) wrapped in parentheses.
In a function:
>>> import re
>>> def add_brackets(s):
...     return re.sub(r'\b([A-Z]+)\b', r'(\1)', s)
...
>>> print(add_brackets(text))
Agnico Eagle Mines Limited (AEM)
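One thing to watch with [A-Z]+ is that it also matches single capital letters standing alone, such as "I" or "A". If that is not wanted, a minimal sketch is to require at least two capitals. The function name add_brackets_min2 is made up for this illustration, and the two-letter threshold is an assumption about what counts as a ticker.

```python
import re

def add_brackets_min2(s):
    # {2,} requires at least two capitals, so a lone "I" or "A" is left alone
    return re.sub(r"\b([A-Z]{2,})\b", r"(\1)", s)

print(add_brackets_min2("I own Agnico Eagle Mines Limited AEM"))
```

With the original [A-Z]+ pattern the leading "I" in that sentence would be wrapped in parentheses as well.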
I have strings like "hey what is up!", "what did you say?", "he said 'well'", etc., and a regular expression like so: [!%&'\(\)$#\"\/\\*+,-.:;<=>?#\[\]^_´{|}~]´. These are my delimiters, and a space should be inserted around them in the strings shown, like so: "hey what is up !", "what did you say ?", "he said ' well '". So if one of the delimiters is in front of another character sequence, add a space, and if it is after one, add a space as well.
How can I achieve this? I do not want to split by these delimiters.
Here's my solution but I would be curious how to solve it with regex.
space = set("[!%&'()$#\"/\*+,-.:;<=>?#[]^_´`{|}~]")
for sent in self.sentences:
    sent = list(sent)
    for i, char in enumerate(sent):
        # Make sure to respect length of string when indexing
        if i != 0:
            # insert space in front if char is punctuation
            if sent[i] in space and sent[i - 1] != " ":
                sent.insert(i, " ")
        if i != len(sent)-1:
            # insert space after if char is punctuation
            if sent[i] in space and sent[i + 1] != " ":
                sent.insert(i + 1, " ")
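You could expand your pattern to also catch optional whitespace around each delimiter, then replace the match with the capture group plus a space before and after (the loop is only for the demo, not necessary):
import re
strings = ["hey what is up!", "what did you say?", "he said 'well'"]
# Whitespace sits outside the capture group so existing spaces are not doubled
pattern = r'\s*([!%&\'\(\)$#\"\/\\*+,-.:;<=>?#\[\]^_´{|}~])\s*'
for string in strings:
    # strip() drops the padding added at the very start or end of the string
    print(re.sub(pattern, r' \1 ', string).strip())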
This will give this output:
hey what is up !
what did you say ?
he said ' well '
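A case the demo strings do not cover is a run of adjacent delimiters such as "?!", where padding each one separately can leave doubled spaces. A minimal sketch of one way to handle that is to pad every delimiter and then collapse whitespace. It assumes string.punctuation is an acceptable stand-in for the delimiter set in the question, and pad_punctuation is a name invented for this example.

```python
import re
import string

# Assumption: string.punctuation stands in for the question's delimiter set.
DELIMS = re.compile("([" + re.escape(string.punctuation) + "])")

def pad_punctuation(text):
    # Put a space on both sides of every delimiter
    padded = DELIMS.sub(r" \1 ", text)
    # Collapse runs of whitespace and trim the ends
    return re.sub(r"\s+", " ", padded).strip()

for s in ["hey what is up!", "he said 'well'", "really?!"]:
    print(pad_punctuation(s))
```

Note that collapsing whitespace also normalises any multiple spaces that were already in the input, which may or may not be desirable.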
Without the aid of the re module you could simply do this:
punctuation = "!%&'()$#\"/\\*+,-.:;<=>?#[]^_´{|}~"
mystring = "Well hello! How are you?"
mylist = list(mystring)
i = 0
for c in mystring:
    if c in punctuation:
        mylist.insert(i, ' ')
        i += 2
    else:
        i += 1
print(''.join(mylist))
You can loop through your string and, when you find a punctuation character, use slicing to cut the string in two and concatenate the halves with a space in between.
For example:
yourString = "hey what is up!"
for i, char in enumerate(yourString):
    if char == '!':
        # slice around index i, keeping the '!' itself
        newString = yourString[:i] + " " + yourString[i:]
It only checks for "!", but you could replace that test with a membership check against a set of punctuation characters.
I have a list of names that has sizes in inches in it. Such as:
Asus VP248QG 24''
BenQ XYZ123456 32"
As you can see, the first name uses two single quotes for inches, while the second name uses a regular double quote.
I have this code to remove these sizes, because I do not need them:
def monitor_fix(s):
    if ('"' in s):
        return re.sub(r'\s+\d+(?:\.\d+)"\s*$', '', str(s))
    if ("''" in s):
        return re.sub(r"\s+\d+(?:\.\d+)''\s*$", '', str(s))
But it only removes ordinary double-quote sign, not the double single-quote sign. How to deal with this?
You can simply remove the last 4 or 5 characters with slicing:
list = ["Asus VP248QG 24''", 'BenQ XYZ123456 32"']
for i in range(len(list)):
    if "''" in list[i]:
        list[i] = list[i][:-5]
    if '"' in list[i]:
        list[i] = list[i][:-4]
    print(list[i])
Assuming the sizes are always well separated with spaces, we can simply remove the "word" that contains quotes. Bonus point because the size can be anywhere in the string too.
products = ["Asus VP248QG 24'' silver", 'BenQ XYZ123456 32"']
for n, product in enumerate(products):
    product_without_size = ""
    for word in product.split(" "):
        if not("''" in word or '"' in word):  # If the current word is not a size,
            product_without_size += word + " "  # add it to the product name (else skip it).
    products[n] = product_without_size.rstrip(" ")
print(products) # ['Asus VP248QG silver', 'BenQ XYZ123456']
Using the format of your original post, it would look like this:
def monitor_fix(product):
    product_without_size = ""
    for word in product.split(" "):
        if not("''" in word or '"' in word):  # If the current word is not a size,
            product_without_size += word + " "  # add it to the product name (else skip it).
    return product_without_size.rstrip(" ")
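For anyone who prefers to keep the regex approach from the question: in those patterns the group (?:\.\d+) is not optional, so a size without a decimal part, such as 24'', cannot match. A minimal sketch that makes the decimal part optional and accepts either quote style in one pattern might look like this. It assumes the size always sits at the end of the name; SIZE_SUFFIX is a name introduced only for this example.

```python
import re

# Optional decimal part, then either two single quotes or one double quote,
# anchored at the end of the string (assumption: the size comes last).
SIZE_SUFFIX = re.compile(r"""\s+\d+(?:\.\d+)?(?:''|")\s*$""")

def monitor_fix(s):
    # Names without a trailing size are returned unchanged
    return SIZE_SUFFIX.sub("", str(s))

for name in ["Asus VP248QG 24''", 'BenQ XYZ123456 32"']:
    print(monitor_fix(name))
```

Unlike the version in the question, this always returns a string, so names with no size do not come back as None.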
So, I want to be able to scramble words in a sentence, but:
Word order in the sentence(s) is left the same.
If the word started with a capital letter, the jumbled word must also start with a capital letter (i.e., the first letter gets capitalised).
Punctuation marks . , ; ! and ? need to be preserved.
For instance, for the sentence "Tom and I watched Star Wars in the cinema, it was fun!" a jumbled version would be "Mto nad I wachtde Tars Rswa ni het amecin, ti wsa fnu!".
from random import shuffle
def shuffle_word(word):
    word = list(word)
    if word.title():
        ???? #then keep first capital letter in same position in word?
    elif char == '!' or '.' or ',' or '?':
        ???? #then keep their position?
    else:
        shuffle(word)
    return''.join(word)
L = input('try enter a sentence:').split()
print([shuffle_word(word) for word in L])
I understand how to jumble each word in the sentence, but I am struggling with the if statements for the special cases. Please help!
Here is my code. It is a little different from your logic. Feel free to optimize it.
import random
def shuffle_word(words):
    words_new = words.split(" ")
    out=''
    for word in words_new:
        l = list(word)
        if word.istitle():
            result = ''.join(random.sample(word, len(word)))
            out = out + ' ' + result.title()
        elif any(i in word for i in ('!','.',',')):
            result = ''.join(random.sample(word[:-1], len(word)-1))
            out = out + ' ' + result+word[-1]
        else:
            result = ''.join(random.sample(word, len(word)))
            out = out +' ' + result
    return (out[1:])
L = "Tom and I watched Star Wars in the cinema, it was fun!"
print(shuffle_word(L))
Output of above code execution:
Mto nda I whaecdt Atsr Swra in hte ienamc, ti wsa nfu!
Hope it helps. Cheers!
Glad to see you've figured out most of the logic.
To maintain the capitalization of the first letter, you can check it beforehand and capitalize the "new" first letter later.
first_letter_is_cap = word[0].isupper()
shuffle(word)
if first_letter_is_cap:
    # Re-capitalize first letter
    word[0] = word[0].upper()
To maintain the position of a trailing punctuation, strip it first and add it back afterwards:
last_char = word[-1]
if last_char in ".,;!?":
    # Strip the punctuation
    word = word[:-1]
shuffle(word)
if last_char in ".,;!?":
    # Add it back
    word.append(last_char)
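Put together, the two fragments above could be combined roughly as follows. This is only a sketch: lowercasing the letters before shuffling is an added assumption (so that a capital from the original first position does not end up mid-word), and shuffle_sentence is a helper name introduced here for the demo.

```python
from random import shuffle

def shuffle_word(word):
    chars = list(word)
    # Set aside a trailing punctuation mark, if there is one
    tail = ""
    if chars and chars[-1] in ".,;!?":
        tail = chars.pop()
    if not chars:
        return tail
    first_letter_is_cap = chars[0].isupper()
    # Assumption: only the first letter should end up capitalised
    chars = [c.lower() for c in chars]
    shuffle(chars)
    if first_letter_is_cap:
        # Re-capitalize whatever letter landed in first position
        chars[0] = chars[0].upper()
    return "".join(chars) + tail

def shuffle_sentence(sentence):
    # Word order is preserved; each word is jumbled on its own
    return " ".join(shuffle_word(w) for w in sentence.split())

print(shuffle_sentence("Tom and I watched Star Wars in the cinema, it was fun!"))
```

The output differs from run to run, but every word keeps its position, its leading capital and its trailing punctuation mark.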
Since this is a string-processing algorithm, I would consider using regular expressions. A regex gives you more flexibility and cleaner code, and you can get rid of the conditions for edge cases. For example, this code handles apostrophes, numbers, quotation marks and special phrases like dates and times without any additional code, and you can control all of this just by changing the regular expression pattern.
from random import shuffle
import re
# Characters considered part of words
pattern = r"[A-Za-z']+"
# shuffle and lowercase word characters
def shuffle_word(word):
    w = list(word)
    shuffle(w)
    return ''.join(w).lower()
# function to shuffle word used in replace
def replace_func(match):
    return shuffle_word(match.group())
def shuffle_str(str):
    # replace words with their shuffled version
    shuffled_str = re.sub(pattern, replace_func, str)
    # find original uppercase letters
    uppercase_letters = re.finditer(r"[A-Z]", str)
    # make new characters in uppercase positions uppercase
    char_list = list(shuffled_str)
    for match in uppercase_letters:
        uppercase_index = match.start()
        char_list[uppercase_index] = char_list[uppercase_index].upper()
    return ''.join(char_list)
print(shuffle_str('''Tom and I watched "Star Wars" in the cinema's new 3D theater yesterday at 8:00pm, it was fun!'''))
This works with any sentence, even if there are several "special" characters in a row, and preserves all the punctuation marks:
from random import sample
def shuffle_word(sentence):
    new_sentence=""
    word=""
    for i,char in enumerate(sentence+' '):
        if char.isalpha():
            word+=char
        else:
            if word:
                if len(word)==1:
                    new_sentence+=word
                else:
                    new_word=''.join(sample(word,len(word)))
                    if word==word.title():
                        new_sentence+=new_word.title()
                    else:
                        new_sentence+=new_word
                word=""
            new_sentence+=char
    return new_sentence
text="Tom and I watched Star Wars in the cinema, it was... fun!"
print(shuffle_word(text))
Output:
Mto nda I hctawed Rast Aswr in the animec, ti asw... fnu!
This is the string, for example:
'I have an apple. I want to eat it. But it is so sore.'
and I want to convert it to this one:
'I have an apple want to eat it it is so sore'
Here is a way to do it without regexes, using del as you have mentioned:
def remove_after_sym(s, sym):
    # Find first word
    first = s.strip().split(' ')[0]
    # Split the string using the symbol
    l = []
    s = s.strip().split(sym)
    # Split words by space in each sentence
    for a in s:
        x = a.strip().split(' ')
        del x[0]
        l.append(x)
    # Join words in each sentence
    for i in range(len(l)):
        l[i] = ' '.join(l[i])
    # Combine sentences
    final = first + ' ' + ' '.join(l)
    final = final.strip() + '.'
    return final
Here, sym is a str (a single character).
Also, I have used the word 'sentence' very liberally: in your example, sym is a dot, but here a sentence really means any part of the string delimited by the symbol you choose.
Here is what it outputs.
In [1]: remove_after_sym(string, '.')
Out[1]: 'I have an apple want to eat it it is so sore.'
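For comparison, the same transformation can be sketched with a regular expression that deletes the symbol together with the word that follows it. This is a minimal sketch, not a drop-in replacement: remove_after_sym_re is a made-up name, and it assumes a "word" is whatever \w+ matches.

```python
import re

def remove_after_sym_re(s, sym):
    # Match the symbol, any whitespace after it, and the next word
    pattern = re.escape(sym) + r"\s*\w+"
    return re.sub(pattern, "", s)

string = 'I have an apple. I want to eat it. But it is so sore.'
print(remove_after_sym_re(string, '.'))
```

The final dot survives because no word follows it, so the pattern cannot match there.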
I have a long string (28 MB) of normal sentences. I want to remove all words that are fully in capital letters (like TNT, USA, OMG).
So from the sentence:
Jump over TNT in There.
I would like to get:
Jump over in There.
Is there any way to do it without splitting the text into a list and iterating? Is it possible to use a regex somehow to do this?
You can use the set of capital letters [A-Z] captured with word boundary \b:
import re
line = 'Jump over TNT in There NOW'
m = re.sub(r'\b[A-Z]+\b', '', line)
#'Jump over in There '
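Two side effects of the word-boundary pattern are worth knowing: the whitespace around each removed word stays behind, and single capitals such as "I" or "A" are removed too. A hedged sketch that addresses both follows. It assumes a minimum length of two capitals is what you want, and remove_all_caps is a name chosen only for this example.

```python
import re

# \s* swallows the whitespace in front of the word;
# {2,} leaves single capitals such as "I" and "A" alone (an assumption).
ALL_CAPS_WORD = re.compile(r"\s*\b[A-Z]{2,}\b")

def remove_all_caps(text):
    # lstrip() covers the case where the text starts with an all-caps word
    return ALL_CAPS_WORD.sub("", text).lstrip()

print(remove_all_caps("Jump over TNT in There."))
```

A compiled pattern applied with a single sub call works on the whole 28 MB string at once, with no need to split it into a list.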
Use the re module:
import re
line = 'Jump over TNT in There.'
new_line = re.sub(r'[A-Z]+(?![a-z])', '', line)
print(new_line)
# Output
Jump over in There.
I would do something like this:
import string
def onlyUpper(word):
    for c in word:
        if not c.isupper():
            return False
    return True
s = "Jump over TNT in There."
for char in string.punctuation:
    s = s.replace(char, ' ')
words = s.split()
good_words = []
for w in words:
    if not onlyUpper(w):
        good_words.append(w)
result = ""
for w in good_words:
    result = result + w + " "
print(result)