Regex in python search - python

I'm trying to make something in Python where you enter the number of letters in a word, and it then searches a word list for words with that many characters.
My code:
import sys
import re
def search(pattern):
    print("Searching...\n\n")
    for i, line in enumerate(open(sys.argv[1])):
        for match in re.finditer(pattern, line):
            print(match.groups())
    print("\n\nFinished.")
while True:
    word = ""
    put = int(input("Amount of Letters in Word: "))
    if put > 25 or put < 3:
        print("Invalid amount of letters.")
    else:
        for n in range(0, put):
            word = word + "."
        word = "^" + word + "$"
        pattern = re.compile(word)
        search(pattern)
I want it to show all words with the amount of letters that you put.
https://i.imgur.com/Kgusvyh.png
List of words:
word
1234
okay
0000
asdfg
asdfh
asdgj
Why does it show ()?

Fixed by replacing match.groups() with match.group().
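The empty tuples appear because match.groups() returns only the capturing groups (the parenthesized parts of the pattern), and a pattern such as ^....$ has none. match.group() with no argument returns the whole match. A minimal sketch using the sample word list above; the in-memory list WORDS and the helper name words_of_length are assumptions for illustration, not part of the original script:

```python
import re

# Sample words from the question, held in memory instead of a file (assumption)
WORDS = ["word", "1234", "okay", "0000", "asdfg", "asdfh", "asdgj"]

def words_of_length(words, n):
    # ^.{n}$ matches the same strings as the pattern built from n dots
    pattern = re.compile(r"^.{%d}$" % n)
    found = []
    for w in words:
        match = pattern.search(w)
        if match:
            # group() is the full matched text; groups() would be ()
            found.append(match.group())
    return found

m = re.search("^....$", "word\n")
print(m.groups())  # no capturing groups, so an empty tuple
print(m.group())   # the whole match
print(words_of_length(WORDS, 5))
```

Using a quantifier such as .{n} is optional; the loop that concatenates dots builds an equivalent pattern.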

Related

All possible combinations of added letters

Greetings, I need help with the code below.
How can I insert letters between the letters of a word?
example
input
east
output
aeast
eaast
eaast
easat
easta
aeast
beast
ebast
eabst
easbt
eastb
etc...
leters = 'abcdefghijklmnopqrstuvwxyz'
words = ('east')
for letersinput in leters:
    for rang in range(1):
        print(letersinput+""+str(words)+"")
I'm looking for the exact opposite of this, how to do it?
def missing_char(word, n):
    n = abs(n)
    front = word[:n]
    back = word[n+1:]
    return front + back
Iterate through the words and letters, and for each position, slice the word at the position and insert the letter:
letters = 'abcdefghijklmnopqrstuvwxyz'
words = ['east']
for word in words:
    for letter in letters:
        for i in range(len(word)+1):
            print(word[:i] + letter + word[i:])
        print()
Output:
aeast
eaast
eaast
easat
easta
beast
ebast
eabst
easbt
eastb
...
zeast
ezast
eazst
easzt
eastz
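Note that the output contains repeats such as eaast, because inserting a letter just before or just after an identical letter gives the same string. If unique results are wanted, a small variation could collect the candidates and skip the ones already seen. This is only a sketch; the function name insert_letters is made up for the example:

```python
def insert_letters(word, letters="abcdefghijklmnopqrstuvwxyz"):
    """Return every distinct string made by inserting one letter into word."""
    results = []
    seen = set()
    for letter in letters:
        # len(word) + 1 positions: before each character and at the end
        for i in range(len(word) + 1):
            candidate = word[:i] + letter + word[i:]
            if candidate not in seen:
                seen.add(candidate)
                results.append(candidate)
    return results

variants = insert_letters("east")
print(variants[:4])
print(len(variants))
```

The list keeps the original generation order, and the set is only used for the membership check.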

How to take non alpha characters out of a string and put them back into the string in the same order?

I'm trying to make a Pig Latin translator, but I have an issue with my code where it doesn't work properly when a word such as "hi" or /chair/ is input. This is because I need my code to detect that the input has a non-alpha character, take it out of the string, and put it back in when it's done changing the string. I am struggling to make this, though.
# Pig Latin 11/11/20
#!/usr/bin/env python3
vowels = ("A", "E", "I", "O", "U")
message = input("Input text to be translated to Pig Latin\n")
message = message.split()
not_alpha = {}
new_message = []
for word in message:
The commented-out section below is how I tried to solve this problem. Before a word went through the editing process, it would go through this section, which removes the non-alpha characters and stores them in a dictionary called not_alpha, with the character as the key and its index in the string as the value. Then, at the end, I would loop through every letter in the word and reconstruct it with all the non-alpha characters in order.
    # for letter in word:
    #     if not letter.isalpha():
    #         not_alpha[letter] = word.index(letter)
    # word = word
    # for k in not_alpha.keys():
    #     word.replace(k, "")
    letter_editing = word[0]
    if word.isalpha():
        if letter_editing.upper() in vowels:
            word += "yay"
        else:
            letter_editing = word[0:2]
            if letter_editing.upper() in vowels:
                word = word[1:] + word[0] + "ay"
            else:
                word = word[2:] + word[0:2] + "ay"
    # for letter in word:
    #     if word.index(letter) in not_alpha.values():
While I am not sure of all the Pig Latin rules you need to apply, from what I see you are only applying two:
Rule 1 - First letter is a consonant: move the first letter to the end of the word and add "ay".
Rule 2 - First letter is a vowel: simply add "ay" to the end of the word.
Given these two rules and the following observations:
The input message is a stream of alphanumeric characters, punctuation characters, and whitespace characters of length L.
The start of a word within the message is delineated by one or more punctuation or whitespace characters preceding the word.
The end of a word is delineated by a punctuation character, a whitespace character, or the end of the message.
You can accomplish the translation as follows:
from string import whitespace, punctuation
vowels = 'aeiou'
message = "Oh! my what a beautiful day for the fox to jump the fence!"
output = "" #The output buffer
temp_buf = "" #Temp storage for latin add-ons
word_found = False #Flag to identify a word has been found
cp = 0 # Character pointer to letter of interest in message
while cp < len(message):
    ch = message[cp]
    if whitespace.find(ch) >= 0 or punctuation.find(ch) >= 0:
        word_found = False
        if temp_buf:
            output += temp_buf
            temp_buf = ""
        output += ch
    else:
        if word_found:
            output += ch
        else:
            word_found = True
            if vowels.find(ch.lower()) >= 0:
                temp_buf = "ay"
                output += ch
            else:
                temp_buf += ch + "ay"
    cp += 1
if temp_buf:
    output += temp_buf
print(output)
I'd implement this using the callback form of re.sub, where you can have a function determine the replacement for the regular expression match.
import re
vowels = "aeiou"
def mangle_word(match):
    word = match.group(0)
    if word[0].lower() in vowels:
        return word + "ay"
    return word[1:] + word[0] + "ay"
message = "Oh! my what a beautiful day for the fox to jump 653 fences!"
print(re.sub("[a-z]+", mangle_word, message, flags=re.I))
outputs
Ohay! ymay hatway aay eautifulbay ayday orfay hetay oxfay otay umpjay 653 encesfay!

How can I count exactly how many lowercase characters come after an uppercase character?

For example: "He Is a small man" has 2 lowercase characters that come after an uppercase. I have tried googling but I haven't found anything similar there. I have this code where I'm counting the lowercase and uppercase characters:
letters = input("Enter string: ")
count1=0
count2=0
for i in letters:
    if(i.islower()):
        count1=count1+1
    elif(i.isupper()):
        count2=count2+1
print("The number of lowercase characters is:")
print(count1)
print("The number of uppercase characters is:")
print(count2)
You can use Python's re module to split the string into substrings that each follow a <UPPERCASE><not uppercase> pattern.
For example, the following pattern
([A-Z][^A-Z]*)
splits your sample string "He Is a small man" into the following substrings
He
Is a small man
then you can further split each substring into <UPPERCASE> and <not uppercase>
H, e
I, s a small man
and then finally get the len() of the <not uppercase> substring.
Here's some sample code:
import re
pattern = re.compile("(([A-Z])([^A-Z]*))")
matches = re.findall(pattern, input("Enter string: "))
for m in matches:
    print(f"substring: {m[0]}")
    uppercase = m[1]
    print(f"uppercase: {uppercase}")
    lowercases = m[2].replace(" ", "")
    num_lowercases = len(lowercases)
    print(f"lowercase chars: {num_lowercases}")
That outputs:
Enter string: He Is a small man
substring: He
uppercase: H
lowercase chars: 1
substring: Is a small man
uppercase: I
lowercase chars: 10
The matches are retrieved using findall, which returns all matches of the pattern in the string as a list of tuples (one tuple per match, because the pattern contains groups), for example:
('He ', 'H', 'e ')
[0]: substring that matches the pattern
[1]: the uppercase character
[2]: the rest of the non-uppercase characters
Notice that I added .replace(" ", "") to exclude spaces from the count of lowercase characters. If you also want the total count of uppercase and lowercase, you could just track the total similar to your count1 and count2 variables:
import re
total_uppercase = 0
total_lowercase = 0
pattern = re.compile("(([A-Z])([^A-Z]*))")
matches = re.findall(pattern, input("Enter string: "))
for m in matches:
    print(f"substring: {m[0]}")
    uppercase = m[1]
    print(f"uppercase: {uppercase}")
    total_uppercase += 1
    lowercases = m[2].replace(" ", "")
    num_lowercases = len(lowercases)
    print(f"lowercase chars: {num_lowercases}")
    total_lowercase += num_lowercases
print(f"total uppercase: {total_uppercase}")
print(f"total lowercase: {total_lowercase}")
which outputs:
Enter string: He Is a small man
...
total uppercase: 2
total lowercase: 11
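The question's expected answer of 2 suggests a narrower reading: count only the lowercase characters that come immediately after an uppercase one (the e in He and the s in Is). Assuming that interpretation, a sketch without regex could pair each character with the one before it; the function name count_lower_after_upper is made up for the example:

```python
def count_lower_after_upper(text):
    # zip(text, text[1:]) yields each character together with its successor
    return sum(
        1
        for prev, cur in zip(text, text[1:])
        if prev.isupper() and cur.islower()
    )

print(count_lower_after_upper("He Is a small man"))
```

This prints 2 for the sample string, which matches the count given in the question.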

Shuffle words' characters while maintaining sentence structure and punctuations

So, I want to be able to scramble words in a sentence, but:
Word order in the sentence(s) is left the same.
If the word started with a capital letter, the jumbled word must also start with a capital letter (i.e., the first letter gets capitalised).
Punctuation marks . , ; ! and ? need to be preserved.
For instance, for the sentence "Tom and I watched Star Wars in the cinema, it was fun!" a jumbled version would be "Mto nad I wachtde Tars Rswa ni het amecin, ti wsa fnu!".
from random import shuffle
def shuffle_word(word):
    word = list(word)
    if word.title():
        ???? #then keep first capital letter in same position in word?
    elif char == '!' or '.' or ',' or '?':
        ???? #then keep their position?
    else:
        shuffle(word)
    return''.join(word)
L = input('try enter a sentence:').split()
print([shuffle_word(word) for word in L])
I understand how to jumble each word in the sentence, but I'm struggling with the if statement for these special cases. Please help!
Here is my code. It is a little different from your logic. Feel free to optimize it.
import random
def shuffle_word(words):
    words_new = words.split(" ")
    out=''
    for word in words_new:
        l = list(word)
        if word.istitle():
            result = ''.join(random.sample(word, len(word)))
            out = out + ' ' + result.title()
        elif any(i in word for i in ('!','.',',')):
            result = ''.join(random.sample(word[:-1], len(word)-1))
            out = out + ' ' + result+word[-1]
        else:
            result = ''.join(random.sample(word, len(word)))
            out = out +' ' + result
    return (out[1:])
L = "Tom and I watched Star Wars in the cinema, it was fun!"
print(shuffle_word(L))
Output of above code execution:
Mto nda I whaecdt Atsr Swra in hte ienamc, ti wsa nfu!
Hope it helps. Cheers!
Glad to see you've figured out most of the logic.
To maintain the capitalization of the first letter, you can check it beforehand and capitalize the "new" first letter later.
first_letter_is_cap = word[0].isupper()
shuffle(word)
if first_letter_is_cap:
    # Re-capitalize first letter
    word[0] = word[0].upper()
To maintain the position of a trailing punctuation, strip it first and add it back afterwards:
last_char = word[-1]
if last_char in ".,;!?":
    # Strip the punctuation
    word = word[:-1]
shuffle(word)
if last_char in ".,;!?":
    # Add it back
    word.append(last_char)
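Put together, the two fragments above might look roughly like the sketch below. It is one possible arrangement, not the only one: it assumes at most one trailing punctuation mark per word, and it lowercases the original capital before shuffling so that a capital does not end up in the middle of the word. The helper shuffle_sentence is added only for the demonstration:

```python
from random import shuffle

PUNCTUATION = ".,;!?"

def shuffle_word(word):
    chars = list(word)
    # Strip a single trailing punctuation mark, if there is one
    tail = ""
    if chars and chars[-1] in PUNCTUATION:
        tail = chars.pop()
    if not chars:
        return word
    first_letter_is_cap = chars[0].isupper()
    if first_letter_is_cap:
        # Lowercase the old capital so it can land anywhere in the word
        chars[0] = chars[0].lower()
    shuffle(chars)
    if first_letter_is_cap:
        # Re-capitalize whatever letter is now first
        chars[0] = chars[0].upper()
    # Add the punctuation back at the end
    return "".join(chars) + tail

def shuffle_sentence(sentence):
    # Word order is kept; only the letters inside each word move
    return " ".join(shuffle_word(w) for w in sentence.split())

print(shuffle_sentence("Tom and I watched Star Wars in the cinema, it was fun!"))
```

The output differs on every run, since shuffle is random.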
Since this is a string processing algorithm, I would consider using regular expressions. Regex gives you more flexibility and cleaner code, and lets you drop the conditions for edge cases. For example, this code handles apostrophes, numbers, quote marks, and special phrases like dates and times without any additional code, and you can control this just by changing the regular expression pattern.
from random import shuffle
import re
# Characters considered part of words
pattern = r"[A-Za-z']+"
# shuffle and lowercase word characters
def shuffle_word(word):
    w = list(word)
    shuffle(w)
    return ''.join(w).lower()
# function to shuffle word used in replace
def replace_func(match):
    return shuffle_word(match.group())
def shuffle_str(text):
    # replace words with their shuffled version
    shuffled_str = re.sub(pattern, replace_func, text)
    # find original uppercase letters
    uppercase_letters = re.finditer(r"[A-Z]", text)
    # make new characters in uppercase positions uppercase
    char_list = list(shuffled_str)
    for match in uppercase_letters:
        uppercase_index = match.start()
        char_list[uppercase_index] = char_list[uppercase_index].upper()
    return ''.join(char_list)
print(shuffle_str('''Tom and I watched "Star Wars" in the cinema's new 3D theater yesterday at 8:00pm, it was fun!'''))
This works with any sentence, even if it has several "special" characters in a row, preserving all the punctuation marks:
from random import sample
def shuffle_word(sentence):
    new_sentence=""
    word=""
    for i,char in enumerate(sentence+' '):
        if char.isalpha():
            word+=char
        else:
            if word:
                if len(word)==1:
                    new_sentence+=word
                else:
                    new_word=''.join(sample(word,len(word)))
                    if word==word.title():
                        new_sentence+=new_word.title()
                    else:
                        new_sentence+=new_word
                word=""
            new_sentence+=char
    return new_sentence
text="Tom and I watched Star Wars in the cinema, it was... fun!"
print(shuffle_word(text))
Output:
Mto nda I hctawed Rast Aswr in the animec, ti asw... fnu!

Comparing variables to all items in list in Python

I need to write a function for an edx Python course. The idea is to figure out the number of letters and words in a series of strings and return the average letter count while only using for/while loops and conditionals. My code comes close to being able to do this, but I cannot. For the life of me. Figure out why it doesn't work. I've been bashing my head against it for two days, now, and I know it's probably something really simple that I'm too idiotic to see (sense my frustration?), but I do not know what it is.
If I'm looking at line 14, the logic makes sense: if i in the string is punctuation (not a letter) and the previous character (char, in this case) is not punctuation (therefore a letter), it should be a word. But it's still counting double punctuation as words. But not all of them.
def averageWordLength(myString):
    char = ""
    punctuation = [" ", "!", "?", ".", ","]
    letters = 0
    words = 0
    if not myString == str(myString):
        return "Not a string"
    try:
        for i in myString:
            if i not in punctuation:
                letters += 1
            elif i in punctuation:
                if char not in punctuation:
                    words += 1
                elif char in punctuation:
                    pass
            char = i
        if letters == 0:
            return "No words"
        else:
            average = letters / (words + 1)
            return letters, words + 1, average
    except TypeError:
        return "No words"
print(averageWordLength("Hi"))
print(averageWordLength("Hi, Lucy"))
print(averageWordLength(" What big spaces you have!"))
print(averageWordLength(True))
print(averageWordLength("?!?!?! ... !"))
print(averageWordLength("One space. Two spaces. Three spaces. Nine spaces. "))
Desired output:
2, 1, 2.0
6, 2, 3.0
20, 6, 4.0
Not a string
No words
38, 8, 4.75
What in blazes am I doing wrong?!
٩๏̯͡๏۶
Final correction:
for i in myString:
    if i not in punctuation:
        letters += 1
        if char in punctuation:
            words += 1
    char = i
else:
    average = letters / (words + 1)
    return letters, words + 1, average
You're adding 1 to words by default, which is not valid in all cases, "Hi!" being a good example. This is actually what is throwing off all of your strings: any time a string does not end in a word, your function will be off.
Hint: You only want to add one if there is no punctuation after the last word.
A problem happens when the string begins with a punctuation character: the previous character is still "" and is not considered a punctuation character, so a non-existent word is counted.
You could add "" to the list of symbols, or do:
punctuation = " !?.,"
because testing c in s returns True if c is a substring of s, i.e. if c is a character of s, and the empty string is contained in every string.
A second problem occurs at the end: if the string terminates with a word, that word is not counted (was your words + 1 a way to fix that?), but if the string terminates with punctuation, the last word is counted.
Add this just after the for loop:
if char not in punctuation:
    words += 1
And now there will be no need to add 1, just use
average = letters / words
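For reference, here is a sketch of how the whole function could look with both fixes applied. The snake_case names are a stylistic choice for this example, and isinstance is used for the type check in place of the comparison from the question:

```python
def average_word_length(my_string):
    if not isinstance(my_string, str):
        return "Not a string"
    # A string, so that "" counts as "in punctuation" for the first character
    punctuation = " !?.,"
    prev = ""
    letters = 0
    words = 0
    for ch in my_string:
        if ch not in punctuation:
            letters += 1
        elif prev not in punctuation:
            # Punctuation right after a letter closes a word
            words += 1
        prev = ch
    # A string that ends in a letter still has one unclosed word
    if prev not in punctuation:
        words += 1
    if letters == 0:
        return "No words"
    return letters, words, letters / words

print(average_word_length("Hi"))
print(average_word_length("Hi, Lucy"))
print(average_word_length(True))
print(average_word_length("?!?!?! ... !"))
```

The four calls print (2, 1, 2.0), (6, 2, 3.0), Not a string, and No words.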
This is made more difficult since I assume you're not allowed to use inbuilt string functions like split().
The way I would approach this is to:
Split the sentence into a list of words.
Count the letters in each word.
Take the average amount of letters.
def averageWordLength(myString):
    punctuation = [" ", "!", "?", ".", ","]
    if not isinstance(myString, str):
        return "Not a string"
    split_values = []
    word = ''
    for char in myString:
        if char in punctuation:
            if word:
                split_values.append(word)
            word = ''
        else:
            word += char
    if word:
        split_values.append(word)
    letter_count = []
    for word in split_values:
        letter_count.append(len(word))
    if len(letter_count):
        return sum(letter_count)/len(letter_count)
    else:
        return "No words."
