Compare for a specific string containing a substring of ranging numbers - python

How can I check in Python whether a string matches a fixed pattern that contains a varying number?
Example: I have the strings "t (1)", "t (2)" and "t (3)". They all have the form "t (*)", where * is always a number. In my use case, it will always be "t " followed by a number in parentheses.
I'm not sure how to essentially do:
if (string == "t (*)"):
where * stands for any number in the range.
I googled variations of string comparison methods in Python, but I don't know the right search term to use. I assume it involves regex.

Probably the easiest way to do this is using regex.
import re
s = "t (99)"
match = re.search(r't \(\d+\)', s)
if match:
    print("found the string")
else:
    print("did not find the string")
See demo on regex101.com.
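Since the question asks whether the whole string has the form "t (N)", re.fullmatch may be a closer fit than re.search, which also accepts strings that merely contain the pattern somewhere. A minimal sketch, where the helper name is_t_number is made up for illustration:

```python
import re

# Compiled once: "t (" followed by one or more digits and ")".
T_NUMBER = re.compile(r"t \(\d+\)")

def is_t_number(s):
    """Return True if s is exactly "t (N)" for some run of digits N."""
    # fullmatch succeeds only if the pattern covers the entire string.
    return T_NUMBER.fullmatch(s) is not None

print(is_t_number("t (1)"))     # True
print(is_t_number("t (42)"))    # True
print(is_t_number("xt (1) y"))  # False, although re.search would find a match here
```

If a substring match is really what is wanted, re.search as in the answer above is the right call.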

Related

I need help automatically de-censoring a text (lots of text to be processed)

I have a web story in which some words are censored with asterisks.
Right now I'm doing it with a simple and dumb str.replace,
but as you can imagine this is a pain, since I need to search the text to find every variant of the censoring.
Here are the variants of "bastard" alone: capitalized, plural, and with asterisks in different places:
toReplace = toReplace.replace("b*stard", "bastard")
toReplace = toReplace.replace("b*stards", "bastards")
toReplace = toReplace.replace("B*stard", "Bastard")
toReplace = toReplace.replace("B*stards", "Bastards")
toReplace = toReplace.replace("b*st*rd", "bastard")
toReplace = toReplace.replace("b*st*rds", "bastards")
toReplace = toReplace.replace("B*st*rd", "Bastard")
toReplace = toReplace.replace("B*st*rds", "Bastards")
Is there a way to compare every word containing "*" (or any other replacement character) against an already compiled dict and replace it with the uncensored version of the word?
Maybe regex, but I don't think so.
Using regex alone will likely not result in a full solution for this. You would likely have an easier time if you have a simple list of the words that you want to restore, and use Levenshtein distance to determine which one is closest to a given word that you have found a * in.
One library that may help with this is fuzzywuzzy.
The two approaches that I can think of quickly:
Split the text so that you have 1 string per word. For each word, if '*' in word, then compare it to the list of replacements to find which is closest.
Use re.sub to identify the words that contain a * character, and write a function that you would use as the repl argument to determine which replacement it is closest to and return that replacement.
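The second approach could be sketched roughly as follows. To stay within the standard library, this uses difflib.get_close_matches, which ranks candidates by a similarity ratio rather than by true Levenshtein distance; fuzzywuzzy or another library could be swapped in. WORDLIST, restore and decensor are hypothetical names, and the 0.6 cutoff is an arbitrary choice.

```python
import difflib
import re

# Hypothetical list of uncensored words to restore (lowercase).
WORDLIST = ["bastard", "bastards"]

# A run of word characters containing at least one asterisk.
CENSORED = re.compile(r"\w*(?:\*\w*)+")

def restore(match):
    """Replacement callback for re.sub: return the closest known word."""
    censored = match.group(0)
    # n=1 keeps only the best candidate; cutoff=0.6 is an arbitrary threshold.
    closest = difflib.get_close_matches(censored.lower(), WORDLIST, n=1, cutoff=0.6)
    if not closest:
        return censored  # leave unknown words untouched
    word = closest[0]
    # Carry over an initial capital from the censored form.
    return word.capitalize() if censored[0].isupper() else word

def decensor(text):
    return CENSORED.sub(restore, text)

print(decensor("Those B*st*rds! What a b*stard."))
# Those Bastards! What a bastard.
```

With a larger word list, short censored words may match the wrong entry, so it is probably worth reviewing the replacements before trusting them.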
Additional resources:
Python: find closest string (from a list) to another string
Find closest string match from list
How to find closest match of a string from a list of different length strings python?
You can use the re module to find matches between the censored word and the words in your wordlist.
Replace * with . (the dot has a special meaning in regex: it matches any single character) and then use re.match:
import re
wordlist = ["bastard", "apple", "orange"]
def find_matches(censored_word, wordlist):
    pat = re.compile(censored_word.replace("*", "."))
    return [w for w in wordlist if pat.match(w)]
print(find_matches("b*st*rd", wordlist))
Prints:
['bastard']
Note: If you want to match the exact word, add $ at the end of your pattern. That way appl* will not match applejuice in your dictionary, for example.
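A variant of the same idea, as a sketch: re.fullmatch makes the exact-word behaviour the default, and re.escape protects any other regex metacharacters that happen to be in the censored word. The function name find_exact_matches and the case-insensitive flag are assumptions added for illustration.

```python
import re

def find_exact_matches(censored_word, wordlist):
    # Escape each literal chunk, then join the chunks with "." so that
    # every "*" stands for exactly one arbitrary character.
    pattern = ".".join(re.escape(part) for part in censored_word.split("*"))
    regex = re.compile(pattern, re.IGNORECASE)
    # fullmatch requires the pattern to cover the whole word.
    return [w for w in wordlist if regex.fullmatch(w)]

wordlist = ["bastard", "bastards", "apple", "applejuice"]
print(find_exact_matches("b*st*rd", wordlist))  # ['bastard']
print(find_exact_matches("appl*", wordlist))    # ['apple']
```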

Return a string of country codes from an argument that is a string of prices

So here's the question:
Write a function that will return a string of country codes from an argument that is a string of prices (containing dollar amounts following the country codes). Your function will take as an argument a string of prices like the following: "US$40, AU$89, JP$200". In this example, the function would return the string "US, AU, JP".
Hint: You may want to break the original string into a list, manipulate the individual elements, then make it into a string again.
Example:
> testEqual(get_country_codes("NZ$300, KR$1200, DK$5"),
>           "NZ, KR, DK")
As of now, I'm clueless as to how to separate the $ and the numbers. I'm very lost.
I would advise looking into regular expressions:
https://docs.python.org/2/library/re.html
If you use re.findall, it returns a list of all matching strings, and you can use a pattern like [A-Z]{2}(?=\$) to find every pair of capital letters that is directly followed by a $ sign.
After that you can just create a string from the resulting list.
Let me know if that is not clear.
def test(string):
    return ", ".join([item.split("$")[0] for item in string.split(", ")])
string = "NZ$300, KR$1200, DK$5"
print(test(string))
Use a regular expression pattern and join the matches into a string. (\w{2})\$ matches exactly 2 word characters followed by a $.
import re
def get_country_codes(string):
    matches = re.findall(r"(\w{2})\$", string)
    return ", ".join(matches)

Retrieve part of string, variable length

I'm trying to learn how to use regular expressions with Python. I want to retrieve an ID number (in parentheses) from the end of a string that looks like this:
"This is a string of variable length (561401)"
The ID number (561401 in this example) can be of variable length, as can the text.
"This is another string of variable length (99521199)"
My code fails:
import re
import selenium
# [Code omitted here, I use selenium to navigate a web page]
result = driver.find_element_by_class_name("class_name")
print(result.text) # [This correctly prints the whole string "This is a text of variable length (561401)"]
id = re.findall("??????", result.text) # [Not sure what to do here]
print(id)
This should work for your example:
(?<=\()[0-9]*
(?<=...) is a lookbehind: it matches something that precedes the text you are looking for but doesn't consume it. In this case, I used \(. ( is a special character, so it has to be escaped with \. [0-9] matches any digit. The * means match any number of repetitions of the directly preceding rule, so [0-9]* means match as many digits as there are.
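As a quick sketch of how the lookbehind pattern could be used with re.search: this version uses [0-9]+ (one or more digits) rather than [0-9]*, which is a small change from the pattern above so that an empty "()" produces no match at all.

```python
import re

text = "This is a string of variable length (561401)"

# (?<=\() asserts that a "(" sits immediately before the match
# without making it part of the match itself.
match = re.search(r"(?<=\()[0-9]+", text)
if match:
    print(match.group(0))  # 561401
```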
Solved this thanks to Kaz's link, very useful:
http://regex101.com/
id = re.findall(r"(\d+)", result.text)
print(id[0])
You can use this simple solution:
>>> origin_string = "This is a string of variable length (561401)"
>>> str1 = origin_string.replace("(", " ")
>>> str1
'This is a string of variable length  561401)'
>>> str2 = str1.replace(")", " ")
>>> str2
'This is a string of variable length  561401 '
>>> [int(s) for s in str2.split() if s.isdigit()]
[561401]
First, I replace the parentheses with spaces, and then I search the new string for integers.
There is no real need for regular expressions here. If the ID is always at the end and always in parentheses, you can split the string, take the last element, and remove the parentheses by slicing ([1:-1]). Regexes are relatively expensive in time.
line = "This is another string of variable length (99521199)"
print(line.split()[-1][1:-1])
If you did want to use regular expressions, I would do this:
import re
line = "This is another string of variable length (99521199)"
id_match = re.match(r'.*\((\d+)\)', line)
if id_match:
    print(id_match.group(1))

How to remove special characters from a string

I've been looking into how to create a program that removes any whitespace and special characters from user input. I want to be left with just a string of digits, but I've not been able to work out quite how to do this. Can anyone help?
x = (input("Enter a debit card number: "))
x.translate(None, '!.;,')
print(x)
The code I have created is possibly too basic, and it also doesn't work. Can anyone please help? :) I'm using Python 3.
The way str.translate works is different in Py 3.x - it requires mapping a dictionary of ordinals to values, so instead you use:
x = input("Enter a debit card number: ")
result = x.translate(str.maketrans({ord(ch):None for ch in '!.;,'}))
Although you're better off just removing all non digits:
import re
result = re.sub('[^0-9]', '', x)
Or using builtins:
result = ''.join(ch for ch in x if ch.isdigit())
It's important to note that strings are immutable and their methods return a new string - be sure to either assign back to the object or some other object to retain the result.
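To illustrate the point about immutability, here is a small sketch with a made-up input string: calling a string method without assigning the result leaves the original unchanged.

```python
x = "1234-5678 9012;3456"  # made-up sample input

# The return value is discarded here, so x stays exactly as it was.
x.replace("-", "")
print(x)  # 1234-5678 9012;3456

# Assigning the result keeps the cleaned-up string.
digits = "".join(ch for ch in x if ch.isdigit())
print(digits)  # 1234567890123456
```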
You don't need translate for this. Instead, you can use a regex:
import re
x = input("Enter a debit card number: ")
x = re.sub(r'[\s!.;,]*','',x)
[\s!.;,]* matches a single character present in the list below:
Quantifier *: between zero and unlimited times, as many times as possible, giving back as needed [greedy]
\s matches any whitespace character [\r\n\t\f ]
!.;, matches a single character in the list !.;, literally
re.sub(pattern, repl, string, count=0, flags=0)
Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl.
Assuming that you want only the digits (of a credit card number):
import re  # or 'from re import sub'
s = re.sub('[^0-9]+', "", my_str)
I've used this as the input:
my_str = "663388191712-483498-39434347!2848484;290850 2332049832048 23042 2 2"
What I got is only the digits (because you mentioned that you want to be left with only numbers):
66338819171248349839434347284848429085023320498320482304222

replace multiple words - python

There can be an input "some word".
I want to replace this input with "<strong>some</strong> <strong>word</strong>" in some other text which contains this input
I am trying with this code:
input = "some word".split()
pattern = re.compile('(%s)' % input, re.IGNORECASE)
result = pattern.sub(r'<strong>\1</strong>',text)
but it is failing, and I know why: I am wondering how to pass all the elements of the list input to compile() so that (%s) can catch each of them.
I'd appreciate any help.
The right approach, since you're already splitting the list, is to surround each item of the list directly (never using a regex at all):
sterm = "some word".split()
result = " ".join("<strong>%s</strong>" % w for w in sterm)
In case you're wondering, the pattern you were looking for was:
pattern = re.compile('(%s)' % '|'.join(sterm), re.IGNORECASE)
This works on your string because the regular expression would become
(some|word)
which means "matches some or matches word".
However, this is not a good approach as it does not work for all strings. For example, consider cases where one word contains another, such as
a banana and an apple
which becomes:
<strong>a</strong> <strong>banana</strong> <strong>a</strong>nd <strong>a</strong>n <strong>a</strong>pple
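One way to soften this problem, as a sketch: anchor the alternation with \b word boundaries so that a term only matches whole words, and escape each term with re.escape in case it contains regex metacharacters. The function name highlight is made up for illustration.

```python
import re

def highlight(terms, text):
    # Longest terms first, so a longer term wins over a shorter one
    # wherever both could match at the same position.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in ordered)
    # \b on both sides restricts matches to whole words.
    pattern = re.compile(r"\b(%s)\b" % alternation, re.IGNORECASE)
    return pattern.sub(r"<strong>\1</strong>", text)

print(highlight(["a", "banana"], "a banana and an apple"))
# <strong>a</strong> <strong>banana</strong> and an apple
```

Note that \b is defined in terms of word characters, so terms that begin or end with punctuation would need different handling.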
It looks like you want to search for multiple words: this word or that word. That means you need to separate your search terms with |, as in the script below:
import re
text = "some word many other words"
input = '|'.join('some word'.split())
pattern = re.compile('(%s)' % input, flags=0)
print(pattern.sub(r'<strong>\1</strong>',text))
I'm not completely sure I understand what you're asking, but if you want to pass all the elements of input as separate arguments in a function call, you can use *input instead of input; * unpacks the list into its elements. Note that re.compile takes a single pattern string, though, so here you could instead join the list with | and add ( at the beginning and ) at the end.
Alternatively, you can use the join method with a list comprehension to create the intended result.
text = "some word many other words".split()
result = ' '.join(['<strong>'+i+'</strong>' for i in text])
