I'd like to make strings lowercase only after an right arrow (and until it hits a comma) in Python.
Also, I prefer to write it in one-line, if I could.
Here's my code:
import re
line = "For example, →settle ACCOUNTs. After that, UPPER CASEs are OK."
string1 = re.sub(r'→([A-Za-z ]+)', r'→\1', line)
# string1 = re.sub(r'→([A-Za-z ]+)', r'→\1.lower()', line) # Pseudo-code in my brain
print(string1)
# Expecting: "For example, →settle accounts. After that, UPPERCASEs are OK."
# Should I always write two lines of code, like this:
string2 = re.findall(r'→([A-Za-z ]+)', line)
print('→' + string2[0].lower())
# ... and add "For example, " and ". After that, UPPER CASEs are OK." ... later?
I believe that there must be a better way. How would you guys do?
Thank you in advance.
import re
line = "For example, →settle ACCOUNTs. After that, UPPER CASEs are OK."
string1 = re.sub(r'→[A-Za-z ]+', lambda match: match.group().lower(), line)
print(string1)
# For example, →settle accounts. After that, UPPER CASEs are OK.
From the documentation:
re.sub(pattern, repl, string, count=0, flags=0)
...repl can be a string or a function...
If repl is a function, it is called for every non-overlapping occurrence of pattern. The function takes a single match object argument, and returns the replacement string.
Related
I've been able to isolate the list (or string) of characters I want excluded from a user entered string. But I don't see how to then remove all these unwanted characters. After I do this, I think I can try joining the user string so it all becomes one alphabet input like the instructions say.
Instructions:
Remove all non-alpha characters
Write a program that removes all non-alpha characters from the given input.
For example, if the input is:
-Hello, 1 world$!
the output should be:
Helloworld
My code:
userEntered = input()
makeList = userEntered.split()
def split(userEntered):
return list(userEntered)
if userEntered.isalnum() == False:
for i in userEntered:
if i.isalpha() == False:
#answer = userEntered[slice(userEntered.index(i))]
reference = split(userEntered)
excludeThis = i
print(excludeThis)
When I print excludeThis, I get this as my output:
-
,
1
$
!
So I think I might be on the right track. I need to figure it out how to get these characters out of the user input. Any help is appreciated.
Loop over the input string. If the character is alphabetic, add it to the result string.
userEntered = input()
result = ''
for char in userEntered:
if char.isalpha():
result += char
print(result)
This can also be done with a regular expression:
import re
userEntered = input()
result = re.sub(r'[^a-z]', '', userEntered, flags=re.I)
The regexp [^a-z] matches anything except an alphabetic character. The re.I flag makes it case-insensitive. These are all replaced with an empty string, which removes them.
There's basically two main parts to this: distinguish alpha from non-alpha, and get a string with only the former. If isalpha() is satisfactory for the former, then that leaves the latter. My understanding is that the solution that is considered most Pythonic would be to join a comprehension. This would like this:
''.join(char for char in userEntered if char.isalpha())
BTW, there are several places in the code where you are making it more complicated than it needs to be. In Python, you can iterate over strings, so there's no need to convert userEntered to a list. isalnum() checks whether the string is all alphanumeric, so it's rather irrelevant (alphanumeric includes digits). You shouldn't ever compare a boolean to True or False, just use the boolean. So, for instance, if i.isalpha() == False: can be simplified to just if not i.isalpha():.
I'm currently using the find function and found a slight problem.
theres gonna be a fire here
If I have a sentence with the word "here" and "theres" and I use find() to find "here"s index, I instead get "theres"
I thought find() would be like
if thisword in thatword:
as it would find the word, not a substring within a string.
Is there another function that may work similarly? I'm using find() quite heavily would like to know of alternatives before I clog the code with string.split() then iterate until I find the exact match with an index counter on the side.
MainLine = str('theres gonna be a fire here')
WordtoFind = str('here')
#String_Len = MainLine.find(WordtoFind)
split_line = MainLine.split()
indexCounter = 0
for i in range (0,len(split_line)):
indexCounter += (len(split_line[i]) + 1)
if WordtoFind in split_line[i]:
#String_Len = MainLine.find(split_line[i])
String_Len = indexCounter
break
The best route would be regular expressions. To find a "word" just make sure that the leading and ending characters are not alphanumeric. It uses no splits, has no exposed loops, and even works when you run into a weird sentence like "There is a fire,here". A find_word function might look like this
import re
def find_word_start(word, string):
pattern = "(?<![a-zA-Z0-9])"+word+"(?![a-zA-Z0-9])"
result = re.search(pattern, string)
return result.start()
>> find_word_start("here", "There is a fire,here")
>> 16
The regex I made uses a trick called lookarounds that make sure that the characters preceding and after the word are not letters or numbers. https://www.regular-expressions.info/lookaround.html. The term [a-zA-Z0-9] is a character set that is comprised of a single character in the sets a-z, A-Z, and 0-9. Look up the python re module to find out more about regular expressions.
There are probably several ways to solve this problem, so I'm open to any ideas.
I have a file, within that file is the string "D133330593" Note: I do have the exact position within the file this string exists, but I don't know if that helps.
Following this string, there are 6 digits, I need to replace these 6 digits with 6 other digits.
This is what I have so far:
def editfile():
f = open(filein,'r')
filedata = f.read()
f.close()
#This is the line that needs help
newdata = filedata.replace( -TOREPLACE- ,-REPLACER-)
#Basically what I need is something that lets me say "D133330593******"
#->"D133330593123456" Note: The following 6 digits don't need to be
#anything specific, just different from the original 6
f = open(filein,'w')
f.write(newdata)
f.close()
Use the re module to define your pattern and then use the sub() function to substitute occurrence of that pattern with your own string.
import re
...
pat = re.compile(r"D133330593\d{6}")
re.sub(pat, "D133330593abcdef", filedata)
The above defines a pattern as -- your string ("D133330593") followed by six decimal digits. Then the next line replaces ALL occurrences of this pattern with your replacement string ("abcdef" in this case), if that is what you want.
If you want a unique replacement string for each occurrence of pattern, then you could use the count keyword argument in the sub() function, which allows you to specify the number of times the replacement must be done.
Check out this library for more info - https://docs.python.org/3.6/library/re.html
Let's simplify your problem to you having a string:
s = "zshisjD133330593090909fdjgsl"
and you wanting to replace the 6 characters after "D133330593" with "123456" to produce:
"zshisjD133330594123456fdjgsl"
To achieve this, we can first need to find the index of "D133330593". This is done by just using str.index:
i = s.index("D133330593")
Then replace the next 6 characters, but for this, we should first calculate the length of our string that we want to replace:
l = len("D133330593")
then do the replace:
s[:i+l] + "123456" + s[i+l+6:]
which gives us the desired result of:
'zshisjD133330593123456fdjgsl'
I am sure that you can now integrate this into your code to work with a file, but this is how you can do the heart of your problem .
Note that using variables as above is the right thing to do as it is the most efficient compared to calculating them on the go. Nevertheless, if your file isn't too long (i.e. efficiency isn't too much of a big deal) you can do the whole process outlined above in one line:
s[:s.index("D133330593")+len("D133330593")] + "123456" + s[s.index("D133330593")+len("D133330593")+6:]
which gives the same result.
I'm trying to learn how to use Regular Expressions with Python. I want to retrieve an ID number (in parentheses) in the end from a string that looks like this:
"This is a string of variable length (561401)"
The ID number (561401 in this example) can be of variable length, as can the text.
"This is another string of variable length (99521199)"
My coding fails:
import re
import selenium
# [Code omitted here, I use selenium to navigate a web page]
result = driver.find_element_by_class_name("class_name")
print result.text # [This correctly prints the whole string "This is a text of variable length (561401)"]
id = re.findall("??????", result.text) # [Not sure what to do here]
print id
This should work for your example:
(?<=\()[0-9]*
?<= Matches something preceding the group you are looking for but doesn't consume it. In this case, I used \(. ( is a special character, so it has to be escaped with \. [0-9] matches any number. The * means match any number of the directly preceding rule, so [0-9]* means match as many numbers as there are.
Solved this thanks to Kaz's link, very useful:
http://regex101.com/
id = re.findall("(\d+)", result.text)
print id[0]
You can use this simple solution :
>>> originString = "This is a string of variable length (561401)"
>>> str1=OriginalString.replace("("," ")
'This is a string of variable length 561401)'
>>> str2=str1.replace(")"," ")
'This is a string of variable length 561401 '
>>> [int(s) for s in string.split() if s.isdigit()]
[561401]
First, I replace parantheses with space. and then I searched the new string for integers.
No need to really use regular expressions here, if it is always at the end and always in parenthesis you can split, extract last element and remove the parenthesis by taking the substring ([1:-1]). Regexes are relatively time expensive.
line = "This is another string of variable length (99521199)"
print line.split()[-1][1:-1]
If you did want to use regular expressions I would do this:
import re
line = "This is another string of variable length (99521199)"
id_match = re.match('.*\((\d+)\)',line)
if id_match:
print id_match.group(1)
There can be an input "some word".
I want to replace this input with "<strong>some</strong> <strong>word</strong>" in some other text which contains this input
I am trying with this code:
input = "some word".split()
pattern = re.compile('(%s)' % input, re.IGNORECASE)
result = pattern.sub(r'<strong>\1</strong>',text)
but it is failing and i know why: i am wondering how to pass all elements of list input to compile() so that (%s) can catch each of them.
appreciate any help
The right approach, since you're already splitting the list, is to surround each item of the list directly (never using a regex at all):
sterm = "some word".split()
result = " ".join("<strong>%s</strong>" % w for w in sterm)
In case you're wondering, the pattern you were looking for was:
pattern = re.compile('(%s)' % '|'.join(sterm), re.IGNORECASE)
This works on your string because the regular expression would become
(some|word)
which means "matches some or matches word".
However, this is not a good approach as it does not work for all strings. For example, consider cases where one word contains another, such as
a banana and an apple
which becomes:
<strong>a</strong> <strong>banana</strong> <strong>a</strong>nd <strong>a</strong>n <strong>a</strong>pple
It looks like you're wanting to search for multiple words - this word or that word. Which means you need to separate your searches by |, like the script below:
import re
text = "some word many other words"
input = '|'.join('some word'.split())
pattern = re.compile('(%s)' % input, flags=0)
print pattern.sub(r'<strong>\1</strong>',text)
I'm not completely sure if I know what you're asking but if you want to pass all the elements of input in as parameters in the compile function call, you can just use *input instead of input. * will split the list into its elements. As an alternative, could't you just try joining the list with and adding at the beginning and at the end?
Alternatively, you can use the join operator with a list comprehension to create the intended result.
text = "some word many other words".split()
result = ' '.join(['<strong>'+i+'</strong>' for i in text])