Python 3 - Regular Expression - Match string with one character less - python

I want to write a regex that matches a word even when the word is one character shorter than the pattern. For example:
import re

wordList = ['inherit', 'inherent']
for word in wordList:
    if re.match('^inhe....', word):
        print(word)
In theory, it would print both inherit and inherent, but I can only get it to print inherent. How can I match a word that is one letter short without simply erasing one of the dots (.)?

(Edited)
For matching only inherent, you could use .{4}:
re.match('^inhe.{4}', word)
Or ....$:
re.match('^inhe....$', word)
The trailing $ also rejects words longer than eight characters, such as inheritance.
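If the goal is to accept both words, as the question asks, one possibility is a bounded quantifier. The following is a minimal sketch, assuming Python 3.4 or later for re.fullmatch; the extra word inheritance and the names word_list and matches are illustrative additions, not part of the original post.

```python
import re

word_list = ['inherit', 'inherent', 'inheritance']

# .{3,4} accepts three or four characters after the fixed prefix,
# so a word that is one letter short still matches.
pattern = re.compile(r'inhe.{3,4}')

# fullmatch requires the whole word to fit the pattern, so longer words fail.
matches = [word for word in word_list if pattern.fullmatch(word)]
print(matches)
```

Here fullmatch plays the role of the ^ and $ anchors, which keeps the pattern itself a little shorter.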

A regex may not be the best tool here. If you just want to know whether word Y starts with the first N-1 letters of word X, do this:
if Y.startswith(X[:-1]):
    pass  # Do whatever you were trying to do.
X[:-1] gets all but the last character of X (or the empty string if X is the empty string).
Y.startswith('blah') returns True if Y starts with 'blah'.
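The check above can be wrapped in a small helper to see how it behaves. This is only a sketch; the function name starts_with_all_but_last and the sample words are made up for illustration.

```python
def starts_with_all_but_last(y, x):
    """Return True if y begins with every character of x except the last."""
    return y.startswith(x[:-1])

# 'inherit'[:-1] is 'inheri'
print(starts_with_all_but_last('inheritance', 'inherit'))
print(starts_with_all_but_last('inherent', 'inherit'))
# An empty x gives an empty prefix, which every string starts with.
print(starts_with_all_but_last('anything', ''))
```

Note that the empty-string case returns True, which may or may not be what a caller wants.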

Related

how to change ith letter of a word in capital letter in python?

I want to change the second-to-last letter of each word to a capital letter. But when my sentence contains a one-letter word, the program raises IndexError: string index out of range. Here is my code. It works with words of more than one letter. If I write, for example, str="Python is best programming language", it works because there is no one-letter word.
str = "I Like Studying Python Programming"
array1 = str.split()
result = []
for i in array1:
    result.append(i[:-2].lower() + i[-2].upper() + i[-1].lower())
print(" ".join(result))
Your problem is quite amenable to using regular expressions, so I would recommend that here:
import re

str = "I Like Studying Python Programming"
output = re.sub(r'(\w)(?=\w\b)', lambda m: m.group(1).upper(), str)
print(output)
This prints:
I LiKe StudyiNg PythOn ProgrammiNg
Note that this approach will not target any single-letter words, since their only letter is not followed by another word character.
Another option using a regex is to narrow down which characters get uppercased: use the negated character class [^\W_\d] to match a word character other than a digit or underscore, followed by a single non-whitespace character that ends the word.
For example, this will uppercase a) to A) but will not match the 3 in 3d.
Explanation
[^\W_\d](?=\S(?!\S))
[^\W_\d] Match a word char except _ or a digit
(?= Positive lookahead, assert what is directly to the right is
\S(?!\S) Match a non whitespace char followed by a whitespace boundary
) Close lookahead
Example
import re
regex = r"[^\W_\d](?=\S(?!\S))"
s = ("I Like Studying Python Programming\n\n"
"a) This is a test with 3d\n")
output = re.sub(regex, lambda m: m.group(0).upper(), s)
print(output)
Output
I LiKe StudyiNg PythOn ProgrammiNg
A) ThIs Is a teSt wiTh 3d
Using the PyPI regex module, you could also use \p{Ll} to match a lowercase letter that has an uppercase variant.
\p{Ll}(?=\S(?!\S))
Simply check whether the length of each word is greater than one. Only then convert the second-to-last letter to uppercase and append the word to result; if the word has length one, append it to result unchanged.
Here is the code:
str = "I Like Studying Python Programming"
array1 = str.split()
result = []
for i in array1:
    if len(i) > 1:
        result.append(i[:-2].lower() + i[-2].upper() + i[-1].lower())
    else:
        result.append(i)
print(" ".join(result))
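The length check can also be packaged as a reusable function. The sketch below keeps the same logic; the function name capitalize_second_last is hypothetical. One thing worth noticing is that, unlike the regex answers, this approach lowercases every other letter of a multi-letter word.

```python
def capitalize_second_last(sentence):
    """Uppercase the second-to-last letter of each word longer than one letter."""
    result = []
    for word in sentence.split():
        if len(word) > 1:
            # Lowercase everything except the second-to-last letter.
            result.append(word[:-2].lower() + word[-2].upper() + word[-1].lower())
        else:
            # One-letter words have no second-to-last letter; keep them as-is.
            result.append(word)
    return " ".join(result)

print(capitalize_second_last("I Like Studying Python Programming"))
```

With this input the leading capitals are lost (Like becomes liKe), so the result differs from the regex version, which leaves the other letters untouched.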

Creating a mapper that finds the capitalized words in a text

Implement filescounter, which takes a string in any variety and returns the number of capitalized words in that string, inclusive of the last and first character.
def filescounter(s):
    sr=0
    for words in text:
        #...
    return sr
I'm stuck on how to go about this.
Split the text on whitespace then iterate through the words:
def countCapitalized(text):
    count = 0
    for word in text.split():
        if word.isupper():
            count += 1
    return count
If, by capitalized, you mean only the first letter needs to be capitalized, then you can replace word.isupper() with word[0].isupper().
Use this:
def count_upper_words(text):
    return sum(1 for word in text.split() if word.isupper())
Explanation:
split() with no arguments splits the text into words on any run of whitespace (spaces, tabs, or newlines)
the generator expression passed to sum() is more compact than an explicit for-loop and looks nicer
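The two readings of "capitalized" mentioned in these answers give different counts. Here is a small side-by-side sketch; the function names and the sample sentence are invented for illustration.

```python
def count_all_caps(text):
    """Count words in which every cased character is uppercase."""
    return sum(1 for word in text.split() if word.isupper())

def count_initial_caps(text):
    """Count words whose first character is an uppercase letter."""
    # split() never yields empty strings, so word[0] is always safe here.
    return sum(1 for word in text.split() if word[0].isupper())

sample = "The NASA probe reached Mars in 2021"
print(count_all_caps(sample))
print(count_initial_caps(sample))
```

Only NASA is fully uppercase, while The, NASA, and Mars start with a capital; the number 2021 counts in neither case because it has no cased characters.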

Python Regex - Replace specific word (without hash) with another word

I want to replace only a specific word in a string. However, some other words contain that word, and I don't want them to be changed.
For example, in the string z below I only want to replace x with y where x is not preceded by a hash. How do I do that?
x = "112-224"
y = "hello"
z = "This is the number 112-224 not #112-224"
When I do re.sub(r'\b' + x + r'\b', y, z) I get 'This is the number hello not #hello', so this regex doesn't work. I am really not good with regex. What's the right way to do this so that I get This is the number hello not #112-224?
How about this:
pattern = r'(?:^|(?<=[\w\s]))' + re.escape(x) + r'(?=[\w\s]|$)'
With the complete code being:
import re

x = "112-234"
y = "hello"
z = "112-234this is 112-234 not #112-234"
pattern = r'(?:^|(?<=[\w\s]))' + re.escape(x) + r'(?=[\w\s]|$)'
print(re.sub(pattern, y, z))  # hellothis is hello not #112-234
Here, I'm using a positive lookbehind and a positive lookahead.
The regex states that the match should be at the start of the string or preceded by a word character or whitespace, and should be followed by a word character, whitespace, or the end of the string.
Note: keep ^ and $ outside the square brackets. Inside a character class they lose their anchor meaning: a leading ^ negates the class, while an escaped \^ or a $ matches only the literal character.
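Using a lookahead:
re.sub(r"\d{3}-\d{3}(?=\s)", y, z)
'This is the number hello not #112-224'
The above assumes that there are always exactly three digits on each side of the hyphen. It leaves #112-224 alone only because that occurrence sits at the end of the string, where no whitespace follows it.
Alternatively:
re.sub(r"\d.*\d(?=\s)", "hello", z)
'This is the number hello not #112-224'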
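Another way to express "not preceded by a hash" directly is a negative lookbehind. This is an alternative sketch rather than something taken from the answers above; it assumes that x should be treated as literal text (hence re.escape) and that a match glued to other word characters should also be skipped.

```python
import re

x = "112-224"
y = "hello"
z = "This is the number 112-224 not #112-224"

# (?<![#\w]) : the match must not be preceded by '#' or a word character
# (?!\w)     : the match must not be followed by a word character
pattern = r'(?<![#\w])' + re.escape(x) + r'(?!\w)'

result = re.sub(pattern, y, z)
print(result)
```

Because both lookarounds are negative, they also succeed at the very start and end of the string, so no special handling of the string boundaries is needed.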

Search through a list of strings for a word that has a variable character

Basically, I start with the word "brand", replace a single character in it with an underscore, and try to find all words that match the remaining characters. For example:
"b_and" would return: "band", "brand", "bland" .... etc.
I started by using re.sub to substitute the underscore. But I'm really lost on where to go next. I only want words that differ at the underscore, either by dropping it or by replacing it with a letter. If the word "under" were run through the list, I wouldn't want it to return "understood" or "thunder", just words with a single-character difference. Any ideas would be great!
I tried replacing the character with every letter of the alphabet first, then checking whether that word is in the dictionary, but that took such a long time. I really want to know if there's a faster way.
from itertools import chain
dictionary=open("Scrabble.txt").read().split('\n')
import re,string
#after replacing the word with "_", we find words in the dictionary that match the pattern
new=[]
for letter in string.ascii_lowercase:
    underscore=re.sub('_', letter, word)
    if underscore in dictionary:
        new.append(underscore)
if new == []:
    pass
else:
    return new
IIUC this should do it. I'm doing it outside a function so you have a working example, but it's straightforward to do it inside a function.
import re

string = 'band brand bland cat dand bant bramd branding blandisher'
word = 'brand'
new = []
for n, letter in enumerate(word):
    # \b on both sides keeps 'brand' inside 'branding' from counting as a match
    pattern = r'\b' + word[:n] + r'\w?' + word[n+1:] + r'\b'
    new.extend(re.findall(pattern, string))
new = list(set(new))
Output (the order may vary, because a set is unordered):
['bland', 'brand', 'bramd', 'band']
Explanation:
We're using regex to do what you're looking for. In every iteration we take one letter out of "brand" and make the algorithm look for any word that matches. So it'll look for:
_rand, b_and, br_nd, bra_d, bran_
For the case of "b_and" the core of the pattern is b\w?and, which means: find a word with b, then one word character that may or may not appear, and then 'and'.
Then it adds all words that match to the list.
Finally I remove duplicates with list(set(new)).
Edit: forgot to add the string variable.
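When the candidates are already available as a list of words rather than one long string, the same idea can be applied with re.fullmatch, which makes substring hits impossible. This is a sketch under the assumption that word contains only letters (otherwise it would need re.escape); the function name one_letter_variants is made up.

```python
import re

def one_letter_variants(word, candidates):
    """Return candidates equal to word with one letter dropped or replaced."""
    # One pattern per position: that position becomes an optional word character.
    patterns = [
        re.compile(word[:n] + r'\w?' + word[n + 1:])
        for n in range(len(word))
    ]
    # fullmatch only succeeds when the whole candidate fits the pattern.
    return [c for c in candidates if any(p.fullmatch(c) for p in patterns)]

words = 'band brand bland cat dand bant bramd branding blandisher'.split()
print(one_letter_variants('brand', words))
```

Since the result is built by filtering the candidate list, it keeps the original order and needs no deduplication step.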
Here's a version of Juan C's answer that's a bit more Pythonic
import re
dictionary = open("Scrabble.txt").read().split('\n')
pattern = "b_and" # change to what you need
pattern = pattern.replace('_', '.?')
pattern += '\\b'
matching_words = [word for word in dictionary if re.match(pattern, word)]
Edit: fixed the regex according to your comment, quick explanation:
pattern = "b_and"
pattern = pattern.replace('_', '.?') # pattern is now b.?and, .? matches any one character (or none at all)
pattern += '\\b' # \b prevents matching with words like "bandit" or words longer than "b_and"
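The slowness described in the question probably comes from testing membership in a list, which scans the whole list for every candidate. A regex-free sketch that generates the 27 possible words and looks them up in a set is shown below; the function name fill_underscore and the short word list are stand-ins for the real Scrabble dictionary, and the pattern is assumed to contain exactly one underscore.

```python
import string

def fill_underscore(pattern, words):
    """Return the words obtained by dropping '_' or replacing it with a letter."""
    word_set = set(words)  # set lookups take constant time on average
    # One option with the underscore removed, plus one per lowercase letter.
    options = {pattern.replace('_', '', 1)}
    options.update(pattern.replace('_', letter, 1) for letter in string.ascii_lowercase)
    return sorted(option for option in options if option in word_set)

dictionary = ['band', 'brand', 'bland', 'bandit', 'cat']
print(fill_underscore('b_and', dictionary))
```

If many patterns are checked against the same dictionary, building the set once outside the function would avoid repeating that work.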

python Using Regular Expression to find letters in a string

I want to find the first vowel in a word, remove all the letters before the first occurrence of a vowel, and return the rest of the word. I thought I could use a list to do that: first find 'a' in the word and take the part split off by 'a', then find 'e', and so on. But I want to simplify it with a regular expression. Is there a way to find all five vowels at the same time and get the index of the first one? Then the next step would be easy. I am new to regular expressions. Does anyone have an idea about this?
I have a problem again. This is the code I wrote according to the suggestion made by #Martijin.
import re
def pigify():
    user_input=raw_input()
    sentence=re.sub(r'\b([aeiou])([a-z]*)\b',r'\1\2'+'hay',user_input,re.I)
    sentence1=re.sub(r'\b(qu)([a-z]*)\b',r'\2\1'+'ay',sentence,re.I)
    sentence2=re.sub(r'\b([^aeiou]*)(\w*)\b',r'\2\1'+'ay',sentence1,re.I)
    print sentence2
    return
pigify()
If I input:
quiet askhj a dhjsadf skdhyksj qdksdj y
I would like to get:
ietquay askhjhay ahay adfdhjsay yksjskdhay qdksdjay yay
But so far I've only accomplished the first two steps: 1. find each word that starts with a vowel and add 'hay' to the end of it; 2. find each word that starts with 'qu', move 'qu' to the end, then add 'ay'. The third step is to take the remaining words in the sentence and, for every word, find the first vowel or 'y' (when 'y' is not the first letter), move all the letters before it to the end, and add 'ay'. The code currently produces this result:
ietquayayaskhjhay ay ahay dhjsadf skdhyksj qdksdj y
I guess I didn't use \b the right way, because re.sub replaces the matched blocks. How do I get it right? By the way, I've written another version with a 'for' loop and 'if|else'. This is the code; I think there must be a way to simplify it.
def SieveWord(user_input):
    return user_input.split(' ')
def UpperToLower(user_input):
    return user_input.lower()
vowel=['a','e','i','o','u']
transform_input=UpperToLower(raw_input())
input_list=SieveWord(transform_input)
u=[]
for word in input_list:
    if len(word)!=1:
        if word[0] in vowel:
            word+='h'
        else:
            if word[0]+word[1]=='qu':
                word=word[2:]+'qu'
            else:
                for letter in word:
                    if letter in vowel or (letter=='y' and word[0]!='y'):
                        position=word.index(letter)
                        removepart=word[0:position]
                        word=word[position:]+removepart
                        break
    elif word in vowel:
        word+='h'
    u.append(word+'ay')
for d in u:
    print d,
You can use a regular expression to remove all non-vowels at the start of a word:
re.sub(r'\b[^aeoui]*', '', inputstring, flags=re.I)
Demo:
>>> import re
>>> inputstring = 'School'
>>> re.sub(r'\b[^aeoui]*', '', inputstring, flags=re.I)
'ool'
The [^...] negated character class matches anything that is not a vowel (with the re.I flag making sure it ignores case). The \b anchor matches the position in a string just before or after a word. In the example above, \b matches the start, and the negated class matches the Sch characters, as they are not in the class.
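For the full transformation described in the question, one possible approach is to handle each word separately, since [^aeoui] also matches whitespace and would swallow the spaces between words. The following is a sketch in Python 3 (the question's code is Python 2); the function names pigify_word and pigify are illustrative, and the rules are taken from the question's description, not from the answer above.

```python
import re

def pigify_word(word):
    """Apply the three rules from the question to a single word."""
    lower = word.lower()
    if lower[0] in 'aeiou':
        return lower + 'hay'
    if lower.startswith('qu'):
        return lower[2:] + 'quay'
    # First vowel, or a 'y' that is not the first letter of the word.
    match = re.search(r'[aeiou]|(?<=.)y', lower)
    if match is None:
        return lower + 'ay'
    cut = match.start()
    # Move everything before the first vowel to the end, then add 'ay'.
    return lower[cut:] + lower[:cut] + 'ay'

def pigify(sentence):
    return ' '.join(pigify_word(word) for word in sentence.split())

print(pigify('quiet askhj a dhjsadf skdhyksj qdksdj y'))
```

On the regex attempt in the question: the fourth positional parameter of re.sub is count, not flags, so a flag such as re.I has to be passed as flags=re.I, as the answer above does.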
