I need to write a function that returns the first letter of each word in a text, uppercased, like:
shortened = shorten("Don't repeat yourself")
print(shortened)
Expected output:
DRY
and:
shortened = shorten("All terrain armoured transport")
print(shortened)
Expected output:
ATAT
Use list comprehension and join
shortened = "".join([x[0] for x in text.title().split(' ') if x])
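Wrapped into the function the question asks for, this might look like the following minimal sketch. It uses split() with no argument, so runs of whitespace do not produce empty words, and it uppercases only the first character instead of calling title(). Both choices are assumptions about the desired behaviour.

```python
def shorten(text):
    """Return the uppercased first letter of every whitespace-separated word."""
    # split() with no argument skips repeated, leading and trailing whitespace
    return "".join(word[0].upper() for word in text.split())


print(shorten("Don't repeat yourself"))            # DRY
print(shorten("All terrain armoured transport"))   # ATAT
```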
Using regex you can match all characters except the first letter of each word, replace them with an empty string to remove them, then capitalize the resulting string:
import re
def shorten(sentence):
    return re.sub(r"\B[\S]+\s*","",sentence).upper()
print(shorten("Don't repeat yourself"))
Output:
DRY
text = 'this is a test'
output = ''.join(char[0] for char in text.title().split(' '))
print(output)
TIAT
Let me explain how this works.
My first step is to capitalize the first letter of each word
text.title()
Now I want to be able to separate each word by the space in between, this will become a list
text.title().split(' ')
With that I'd end up with 'This','Is','A','Test' so now I obviously only want the first character of each word in the list
for word in text.title().split(' '):
    print(word[0]) # T I A T
Now I can lump all that into something called list comprehension
output = [char[0] for char in text.title().split(' ')]
# ['T','I','A','T']
I can use ''.join() to combine them together. I don't need the [] brackets anymore because join accepts any iterable, so a generator expression works and no list has to be built
output = ''.join(char[0] for char in text.title().split(' '))
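One caveat with split(' '), sketched here under the assumption that the input may contain consecutive spaces: split(' ') then yields empty strings, and indexing char[0] on an empty string raises IndexError. Calling split() with no argument avoids this.

```python
text = 'this  is a test'   # note the double space

# split(' ') keeps an empty string between the two spaces
print(text.split(' '))     # ['this', '', 'is', 'a', 'test']

# split() with no argument treats any run of whitespace as one separator
words = text.title().split()
output = ''.join(word[0] for word in words)
print(output)              # TIAT
```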
I have a regex in Python and I want to prevent it from matching substrings. I want to add '#' at the beginning of words that consist of alphanumeric and _ characters and are 4 to 15 characters long. But it also matches substrings of larger words. I have this method:
def add_atsign(sents):
    for i, sent in enumerate(sents):
        sents[i] = re.sub(r'([a-zA-Z0-9_]{4,15})', r'#\1', str(sent))
    return sents
And the example is :
mylist = list()
mylist.append("ali_s ali_t ali_u aabs:/t.co/kMMALke2l9")
add_atsign(mylist)
And the answer is :
['#ali_s #ali_t #ali_u #aabs:/t.co/#kMMALke2l9']
As you can see, it puts '#' at the beginning of 'aabs' and 'kMMALke2l9'. That is wrong.
I tried to edit the code as below:
def add_atsign(sents):
    for i, sent in enumerate(sents):
        sents[i] = re.sub(r'((^|\s)[a-zA-Z0-9_]{4,15}(\s|$))', r'#\1', str(sent))
    return sents
But the result will become like this :
['#ali_s ali_t# ali_u aabs:/t.co/kMMALke2l9']
As you can see, it makes the wrong replacements.
The correct result I expect is:
"#ali_s #ali_t #ali_u aabs:/t.co/kMMALke2l9"
Could anyone help?
Thanks
This is a pretty interesting question. If I understand correctly, the issue is that you want to divide the string by spaces, and then do the replacement only if the entire word matches, and not catch a substring.
I think the best way to do this is to first split by spaces, and then add assertions to your regex that catch only an entire string:
def add_atsign(sents):
    new_list = []
    for string in sents:
        new_list.append(' '.join(re.sub(r'^([a-zA-Z0-9_]{4,15})$', r'#\1', w)
                                 for w in string.split()))
    return new_list
mylist = ["ali_s ali_t ali_u aabs:/t.co/kMMALke2l9"]
add_atsign(mylist)
>
['#ali_s #ali_t #ali_u aabs:/t.co/kMMALke2l9']
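i.e., we split, then replace only if the entire word matches, then rejoin.
By the way, your regex can be simplified to r'^(\w{4,15})$' (note that in Python 3, \w also matches non-ASCII letters and digits unless you pass the re.ASCII flag):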
def add_atsign(sents):
    new_list = []
    for string in sents:
        new_list.append(' '.join(re.sub(r'^(\w{4,15})$', r'#\1', w)
                                 for w in string.split()))
    return new_list
You can restrict the match to space-separated words by adding (?<=\s) to the start and \s to the end of your first expression. Note that because of the trailing \s, a word at the very end of the string is not matched.
def add_atsign(sents):
    for i, sent in enumerate(sents):
        sents[i] = re.sub(r'((^|(?<=\s))[a-zA-Z0-9_]{4,15}\s)', r'#\1', str(sent))
    return sents
The result will be like this:
['#ali_s #ali_t #ali_u aabs:/t.co/kMMALke2l9']
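A possible variant, offered as a sketch rather than the answerer's method: use lookarounds on both sides, so no whitespace is consumed and a word at the end of the string is handled too. The pattern (?<!\S) means "not preceded by a non-space character" and (?!\S) means "not followed by one". The name add_hash is made up for this example.

```python
import re

# Whole "words" only: no non-space character directly before or after
WORD = re.compile(r'(?<!\S)([a-zA-Z0-9_]{4,15})(?!\S)')


def add_hash(sents):
    """Return a new list with '#' prefixed to every standalone 4-15 char word."""
    return [WORD.sub(r'#\1', str(sent)) for sent in sents]


print(add_hash(["ali_s ali_t ali_u aabs:/t.co/kMMALke2l9"]))
# ['#ali_s #ali_t #ali_u aabs:/t.co/kMMALke2l9']
print(add_hash(["see ali_s"]))
# ['see #ali_s']
```

Unlike the in-place version, this returns a new list and leaves the input untouched.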
I am not sure what you are trying to accomplish, but the reason it puts the # in the wrong places is that once you add \s or ^ to the regex, the whitespace becomes part of the match, and the # is therefore inserted before the whitespace.
You could try to split it into two steps:
check at the beginning of the string and put the # at the first position, and
check after every whitespace and put the # at the second position
I'm aware it's not optimal, but maybe I can help if you clarify in a bit more detail what the regex is supposed to match and what it shouldn't.
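The two-position idea above can be sketched in a single substitution, assuming the goal is to prefix whole space-separated words only: capture the leading whitespace (or the start of the string) in one group and the word in another, then write the # between them. The function name prefix_words is hypothetical.

```python
import re


def prefix_words(sent):
    # Group 1: start of string or one whitespace character (kept as-is)
    # Group 2: the 4-15 character word that receives the '#'
    # The lookahead checks for whitespace or end of string without consuming it
    return re.sub(r'(^|\s)([a-zA-Z0-9_]{4,15})(?=\s|$)', r'\1#\2', sent)


print(prefix_words("ali_s ali_t ali_u aabs:/t.co/kMMALke2l9"))
# #ali_s #ali_t #ali_u aabs:/t.co/kMMALke2l9
```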
So for an assignment I have to create an empty list variable empty_list = [], then have Python loop over a string and add each word that starts with a 't' to that list. My attempt:
text = "this is a text sentence with words in it that start with letters"
empty_list = []
for twords in text:
    if text.startswith('t') == True:
        empty_list.append(twords)
        break
print(empty_list)
This just prints a single [t]. I'm pretty sure I'm not using startswith() correctly. How would I go about making this work correctly?
text = "this is a text sentence with words in it that start with letters"
print([word for word in text.split() if word.startswith('t')])
This is a working solution for you. In your own loop, you need to iterate over text.split() instead of text and replace text.startswith('t') with twords.startswith('t'), because twords is the variable that holds each word of the sentence stored in text. You also used break, which makes your code leave the for loop after the first match, so only one item is ever collected. To get all the words beginning with t, get rid of the break.
text = "this is a text sentence with words in it that start with letters"
empty_list = []
for twords in text.split():
    if twords.startswith('t'):
        empty_list.append(twords)
print(empty_list)
> ['this', 'text', 'that']
Try something like this:
text = "this is a text sentence with words in it that start with letters"
t = text.split(' ')
ls = [s for s in t if s.startswith('t')]
ls will be the resulting list.
Python is great for list comprehensions.
The code below works:
empty_list = []
for i in text.split(" "):
    if i.startswith("t"):
        empty_list.append(i)
print(empty_list)
The problem in your code is that you are iterating over each letter of the string instead of over each word.
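To see why the original loop only ever sees single letters, compare iterating over the string itself with iterating over text.split(). This is only an illustration, using a shortened example sentence:

```python
text = "this is a text"

# Iterating over a string yields one character at a time
print([item for item in text][:4])      # ['t', 'h', 'i', 's']

# Iterating over text.split() yields whole words
print([item for item in text.split()])  # ['this', 'is', 'a', 'text']

t_words = [word for word in text.split() if word.startswith('t')]
print(t_words)                          # ['this', 'text']
```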
I have a task that was assigned to me for homework. Basically the problem is:
Write a program that can get rid of the brand names and replace them with the generic names.
The table below shows some brand names that have generic names. The mapping has also been provided to you in your program as the BRANDS dictionary.
BRANDS = {
    'Velcro': 'hook and loop fastener',
    'Kleenex': 'tissues',
    'Hoover': 'vacuum',
    'Bandaid': 'sticking plaster',
    'Thermos': 'vacuum flask',
    'Dumpster': 'garbage bin',
    'Rollerblade': 'inline skate',
    'Asprin': 'acetylsalicylic acid'
}
This is my code:
sentence = input('Sentence: ')
sentencelist = sentence.split()
for c in sentencelist:
    if c in BRANDS:
        d = c.replace(c, BRANDS[c])
        print(d, end=' ')
    else:
        print(c, end=' ')
My output:
Sentence: I bought some Velcro shoes.
I bought some hook and loop fastener shoes.
Expected output:
Sentence: I bought some Velcro shoes.
I bought some hook and loop fastener shoes.
It looks the same, but in my output there was an extra whitespace after 'shoes.' when there isn't supposed to be a whitespace. So how do I remove this whitespace?
I know you could do rstrip() or replace() and I tried it, but it would just jumble everything together when I just need to remove the trailing whitespace and not remove any other whitespace. If the user put the brand name in the middle of the sentence, and I used rstrip(), it would join the brand name and the rest of the sentence together.
The key is to use a string's join method to concatenate everything for you. For example, to put a space between a bunch of strings without putting a space after the last bit, do
' '.join(bunch_of_strings)
The strings have to be in an iterable, like a list, for that to work. You could make the list like this (sentence_list here is the result of sentence.split(), which your code calls sentencelist):
edited_list = []
for word in sentence_list:
    if word in BRANDS:
        edited_list.append(BRANDS[word])
    else:
        edited_list.append(word)
A much shorter alternative would be
edited_list = [BRANDS.get(word, word) for word in sentence_list]
Either way, you can combine the edited sentence using the join method:
print(' '.join(edited_list))
This being Python, you can do the whole thing as a one-liner without using an intermediate list at all:
print(' '.join(BRANDS.get(word, word) for word in sentence_list))
Finally, you could do the joining in print itself using splat notation. Here, you would pass in each element of your list as a separate argument, and use the default sep argument to insert the spaces:
print(*edited_list)
As an aside, d = c.replace(c, BRANDS[c]) is a needlessly roundabout equivalent of just d = BRANDS[c]. Since strings are immutable, c.replace(c, ...) returns a new string, and that new string is simply the replacement, written in a somewhat illegible manner.
The problem is that print(c, end=' ') will always print a space after c. Here is a pretty minimal change to fix that:
sentence = input('Sentence: ')
sentencelist = sentence.split()
is_first = True
for c in sentencelist:
    if not is_first:
        print(' ', end='')
    is_first = False
    if c in BRANDS:
        d = c.replace(c, BRANDS[c])
        print(d, end='')
    else:
        print(c, end='')
As others have pointed out, this can be tidied up, e.g., d = c.replace(c, BRANDS[c]) is equivalent to d = BRANDS[c], and if you change it to c = BRANDS[c], then you could use a single print call and no else clause.
But you also have to be careful with your approach, because it will fail for sentences like "I bought a Hoover." The sentence.split() operation will keep "Hoover." as a single item, and that will fail the c in BRANDS test due to the extra period. You could try to separate words from punctuation, but that won't be easy. Another solution would be to apply all the replacements to each element, or equivalently, to the whole sentence. That should work fine in this case since you may not have to worry about replacement words that could be embedded in longer words (e.g., accidentally replacing 'cat' embedded in 'caterpillar'). So something like this may work OK:
new_sentence = sentence
for brand, generic in BRANDS.items():
new_sentence = new_sentence.replace(brand, generic)
print(new_sentence)
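If replacing embedded words ever does become a concern, one option is a regular expression with \b word boundaries, sketched below. The BRANDS dictionary is abbreviated here and the function name genericise is made up for the example. Punctuation such as the period in "Hoover." is left in place because \b matches between the last letter and the period.

```python
import re

BRANDS = {
    'Velcro': 'hook and loop fastener',
    'Hoover': 'vacuum',
}  # abbreviated copy of the assignment's dictionary

# One alternation of all brand names, each escaped, wrapped in word boundaries
BRAND_RE = re.compile(r'\b(' + '|'.join(map(re.escape, BRANDS)) + r')\b')


def genericise(sentence):
    # Look up the generic name for whichever brand matched
    return BRAND_RE.sub(lambda m: BRANDS[m.group(1)], sentence)


print(genericise("I bought a Hoover."))           # I bought a vacuum.
print(genericise("I bought some Velcro shoes."))  # I bought some hook and loop fastener shoes.
```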
Your end=' ' unconditionally appends extra spaces to your output. There is no consistent way to undo this (echoing a backspace character only works for terminals, seeking only works for files, etc.).
The trick is to avoid printing it in the first place:
sentence = input('Sentence: ')
sentencelist = sentence.split()
result = []
for c in sentencelist:
    # Perform replacement if needed
    if c in BRANDS:
        c = BRANDS[c]  # c.replace(c, BRANDS[c]) is a weird way to spell BRANDS[c]
    # Append possibly replaced value to list of results
    result.append(c)
# Add spaces only in between elements, not at the end, then print all at once
print(' '.join(result))
# Or as a trick to let print add the spaces and convert non-strings to strings:
print(*result)
You don't have to loop over the words and print each one separately.
Try this code; it works and avoids the trailing whitespace issue:
sentence = ' '.join(str(BRANDS.get(word, word)) for word in input_words)
Here, input_words is a list of the words you want to process, e.g. the result of splitting the input sentence.
Happy Learning!
say_d = ["say", "tell me"]
a = input("Please Type An Action For Me To Do: ")
if any(word in a for word in say_d):
    print(a)
This is the program that prints out the typed input, if any keyword from say_d is in it. But it will also print the keyword. Is there any way to remove the keyword from the supposed output? Like:
say_d = ["say", "tell me"]
a = input("Please Type An Action For Me To Do: ")
if any(word in a for word in say_d):
    print(a-say_d)
You can use either regex or str.replace to replace the common words with empty string:
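import re
say_d = ["say", "tell me"]
a = input("Please Type An Action For Me To Do: ")
if any(word in a for word in say_d):
    # Raw strings keep \b as a regex word boundary ('\b' in a normal string is a backspace)
    pattern = r'\b(?:' + '|'.join(map(re.escape, say_d)) + r')\b'
    print(re.sub(pattern, '', a))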
But note that if you only want to remove the common words when they exist in the input, you don't need any(), because both functions (re.sub and str.replace) replace the string only if it exists in your text.
Also, the part word in a checks membership within the entire input string, not among its words. That is, if one of the words in the input string contains a word from say_d, it returns True, like sayulita, which contains the word say.
To get rid of this problem, you can check the membership by splitting the input string and looping over the words, or use a regex.
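A sketch of the regex approach with whole-word matching, assuming keywords should only be removed when they appear as separate words. remove_keywords is a hypothetical helper, not part of the original code.

```python
import re

say_d = ["say", "tell me"]


def remove_keywords(text, keywords):
    # re.escape guards against keywords containing regex metacharacters
    pattern = r'\b(?:' + '|'.join(re.escape(k) for k in keywords) + r')\b'
    # Collapse the whitespace left behind by the removed keyword
    return ' '.join(re.sub(pattern, '', text).split())


print(remove_keywords("say hello world", say_d))     # hello world
print(remove_keywords("tell me a joke", say_d))      # a joke
print(remove_keywords("sayulita is lovely", say_d))  # sayulita is lovely
```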
This is what you are looking for:
say_d = ["say", "tell me"]
a = input("Please Type An Action For Me To Do: ")
if any(word in a for word in say_d):
    # Iterating over a directly would yield single characters, so remove each phrase instead
    result = a
    for phrase in say_d:
        result = result.replace(phrase, '')
    print(result.strip())
Without any example input and output in the question, I assume that you want to remove every candidate phrase (those in the list say_d) from the input string, so it can be as simple as this:
from functools import reduce
say_d = ["say", "tell me"]
a = input("Please Type An Action For Me To Do: ")
# Apply str.replace once per phrase, feeding each result into the next call
ret = reduce(lambda r, w: r.replace(w, ''), say_d, a)
if len(ret) != len(a):
    print(ret)
I am trying to translate morse code into words and sentences and it all works fine... except for one thing. My entire output is lowercased and I want to be able to capitalize every first letter of every sentence.
This is my current code:
text = input()
if is_morse(text):
    lst = text.split(" ")
    text = ""
    for e in lst:
        text += TO_TEXT[e].lower()
    print(text)
Each element in the split list is equal to a character (but in morse), NOT a WORD. 'TO_TEXT' is a dictionary. Does anyone have an easy solution to this? I am a beginner in programming and Python btw, so I might not understand some solutions...
Maintain a flag telling you whether or not this is the first letter of a new sentence. Use that to decide whether the letter should be upper-case.
text = input()
if is_morse(text):
    lst = text.split(" ")
    text = ""
    first_letter = True
    for e in lst:
        if first_letter:
            this_letter = TO_TEXT[e].upper()
        else:
            this_letter = TO_TEXT[e].lower()
        # A period heralds a new sentence; whitespace after it keeps the flag set.
        if this_letter == ".":
            first_letter = True
        elif not this_letter.isspace():
            first_letter = False
        text += this_letter
    print(text)
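An alternative sketch, assuming sentences end with '.', '!' or '?' followed by whitespace: decode everything in lower case first, then capitalize the sentence starts in a second pass. capitalize_sentences is a hypothetical helper that works on the already decoded text, so it does not depend on TO_TEXT.

```python
import re


def capitalize_sentences(text):
    # Match a lowercase letter at the start of the text or after ., ! or ? plus whitespace
    pattern = r'(^\s*|[.!?]\s+)([a-z])'
    return re.sub(pattern, lambda m: m.group(1) + m.group(2).upper(), text)


print(capitalize_sentences("hello there. how are you? fine."))
# Hello there. How are you? Fine.
```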
From what I can understand from your code, you can use Python's str.title() method.
For a more stringent result, you can use the capwords() function from the string module.
This is what the Python docs say about capwords:
Split the argument into words using str.split(), capitalize each word using str.capitalize(), and join the capitalized words using str.join(). If the optional second argument sep is absent or None, runs of whitespace characters are replaced by a single space and leading and trailing whitespace are removed, otherwise sep is used to split and join the words.
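A quick comparison of the two, to illustrate the difference. Note that both capitalize every word, not just the first word of each sentence, so they may not be exactly what the question is after:

```python
import string

text = "don't repeat yourself"

# str.title() starts a new "word" after any non-letter, including the apostrophe
print(text.title())            # Don'T Repeat Yourself

# string.capwords() splits on whitespace only, then capitalizes each piece
print(string.capwords(text))   # Don't Repeat Yourself
```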