Retrieve part of string, variable length - python

I'm trying to learn how to use Regular Expressions with Python. I want to retrieve an ID number (in parentheses) in the end from a string that looks like this:
"This is a string of variable length (561401)"
The ID number (561401 in this example) can be of variable length, as can the text.
"This is another string of variable length (99521199)"
My coding fails:
import re
import selenium
# [Code omitted here, I use selenium to navigate a web page]
result = driver.find_element_by_class_name("class_name")
print result.text # [This correctly prints the whole string "This is a text of variable length (561401)"]
id = re.findall("??????", result.text) # [Not sure what to do here]
print id

This should work for your example:
(?<=\()[0-9]*
?<= Matches something preceding the group you are looking for but doesn't consume it. In this case, I used \(. ( is a special character, so it has to be escaped with \. [0-9] matches any number. The * means match any number of the directly preceding rule, so [0-9]* means match as many numbers as there are.

Solved this thanks to Kaz's link, very useful:
http://regex101.com/
id = re.findall("(\d+)", result.text)
print id[0]

You can use this simple solution :
>>> originString = "This is a string of variable length (561401)"
>>> str1=OriginalString.replace("("," ")
'This is a string of variable length 561401)'
>>> str2=str1.replace(")"," ")
'This is a string of variable length 561401 '
>>> [int(s) for s in string.split() if s.isdigit()]
[561401]
First, I replace parantheses with space. and then I searched the new string for integers.

No need to really use regular expressions here, if it is always at the end and always in parenthesis you can split, extract last element and remove the parenthesis by taking the substring ([1:-1]). Regexes are relatively time expensive.
line = "This is another string of variable length (99521199)"
print line.split()[-1][1:-1]
If you did want to use regular expressions I would do this:
import re
line = "This is another string of variable length (99521199)"
id_match = re.match('.*\((\d+)\)',line)
if id_match:
print id_match.group(1)

Related

Using strip() to remove only one element

I have a word within two opening and closing parenthesis, like this ((word)).
I want to remove the first and the last parenthesis, so they are not duplicate, in order to obtain something like this: (word).
I have tried using strip('()') on the variable that contains ((word)). However, it removes ALL parentheses at the beginning and at the end. Result: word.
Is there a way to specify that I only want the first and last one removed?
For this you could slice the string and only keep from the second character until the second to last character:
word = '((word))'
new_word = word[1:-1]
print(new_word)
Produces:
(word)
For varying quantities of parenthesis, you could count how many exist first and pass this to the slicing as such (this leaves only 1 bracket on each side, if you want to remove only the first and last bracket you can use the first suggestion);
word ='((((word))))'
quan = word.count('(')
new_word = word[quan-1:1-quan]
print(new_word)
Produces;
(word)
You can use regex.
import re
word = '((word))'
re.findall('(\(?\w+\)?)', word)[0]
This only keeps one pair of brackets.
instead use str.replace, so you would do str.replace('(','',1)
basically you would replace all '(' with '', but the third argument will only replace n instances of the specified substring (as argument 1), hence you will only replace the first '('
read the documentation :
replace(...)
S.replace (old, new[, count]) -> string
Return a copy of string S with all occurrences of substring
old replaced by new. If the optional argument count is
given, only the first count occurrences are replaced.
you can replace double opening and double closing parentheses, and set the max parameter to 1 for both operations
print('((word))'.replace('((','(',1).replace('))',')',1) )
But this will not work if there are more occurrences of double closing parentheses
Maybe reversing the string before replacing the closing ones will help
t= '((word))'
t = t.replace('((','(',1)
t = t[::-1] # see string reversion topic [https://stackoverflow.com/questions/931092/reverse-a-string-in-python]
t = t.replace('))',')',1) )
t = t[::-1] # and reverse again
Well , I used regular expression for this purpose and substitute a bunch of brackets with a single one using re.sub function
import re
s="((((((word)))))))))"
t=re.sub(r"\(+","(",s)
g=re.sub(r"\)+",")",t)
print(g)
Output
(word)
Try below:
>>> import re
>>> w = '((word))'
>>> re.sub(r'([()])\1+', r'\1', w)
'(word)'
>>> w = 'Hello My ((word)) into this world'
>>> re.sub(r'([()])\1+', r'\1', w)
'Hello My (word) into this world'
>>>
try this one:
str="((word))"
str[1:len(str)-1]
print (str)
And output is = (word)

Return a string of country codes from an argument that is a string of prices

So here's the question:
Write a function that will return a string of country codes from an argument that is a string of prices (containing dollar amounts following the country codes). Your function will take as an argument a string of prices like the following: "US$40, AU$89, JP$200". In this example, the function would return the string "US, AU, JP".
Hint: You may want to break the original string into a list, manipulate the individual elements, then make it into a string again.
Example:
> testEqual(get_country_codes("NZ$300, KR$1200, DK$5")
> "NZ, KR, DK"
As of now, I'm clueless as to how to separate the $ and the numbers. I'm very lost.
I would advice using and looking up regex expressions
https://docs.python.org/2/library/re.html
If you use re.findall it will return you a list of all matching strings, and you can use a regex expression like /[A-Z]{2}$ to find all the two letter capital words in the list.
After that you can just create a string from the resulting list.
Let me know if that is not clear
def test(string):
return ", ".join([item.split("$")[0] for item in string.split(", ")])
string = "NZ$300, KR$1200, DK$5"
print test(string)
Use a regular expression pattern and append the matches to a string. (\w{2})\$ matches exactly 2 word characters followed by by a $.
def get_country_codes(string):
matches = re.findall(r"(\w{2})\$", string)
return ", ".join(match for match in matches)

From list of strings, extract only characters within brackets

I have a list of strings that have variable construction but have a character sequence enclosed in square brackets. I want to extract only the sequence enclosed by the square brackets. There is only one instance of square brackets per string, which simplifies the process.
I am struggling to do so in an elegant manner, and this is clearly a simple problem with Python's large string library.
What is a simple expression to do this?
Check regular expression, "re"
Something like this should do the trick
import re
s = "hello_from_adele[this_is_the_string_i_am_looking_for]this_is_not_it"
match = re.search(r"\[([A-Za-z0-9_]+)\]", s)
print match.group(1)
If you provide an example, we can be more specific
You don't even need re to do this:
In [11]: strng = "This is some text [that has brackets] followed by more text"
In [12]: strng[strng.index("[")+1:strng.index("]")]
Out[12]: 'that has brackets'
This uses string slicing to return the characters inside the brackets. index() returns the 0-based position of its argument. Since we don't want to include the [ at the beginning, we add 1. The second argument of the slice is the stop position, but it is not included in the returned substring, so we don't need to add anything to it.
If you prefer not to use regex for whatever reason, it should be easy to do with string splitting since you're guaranteed to have one and only one instance of [ and ].
s = "some[string]to check"
_, midright = s.split("[")
target, _ = midright.split("]")
or
target = s.split("[")[1].split("]")[0] # ewww

replace multiple words - python

There can be an input "some word".
I want to replace this input with "<strong>some</strong> <strong>word</strong>" in some other text which contains this input
I am trying with this code:
input = "some word".split()
pattern = re.compile('(%s)' % input, re.IGNORECASE)
result = pattern.sub(r'<strong>\1</strong>',text)
but it is failing and i know why: i am wondering how to pass all elements of list input to compile() so that (%s) can catch each of them.
appreciate any help
The right approach, since you're already splitting the list, is to surround each item of the list directly (never using a regex at all):
sterm = "some word".split()
result = " ".join("<strong>%s</strong>" % w for w in sterm)
In case you're wondering, the pattern you were looking for was:
pattern = re.compile('(%s)' % '|'.join(sterm), re.IGNORECASE)
This works on your string because the regular expression would become
(some|word)
which means "matches some or matches word".
However, this is not a good approach as it does not work for all strings. For example, consider cases where one word contains another, such as
a banana and an apple
which becomes:
<strong>a</strong> <strong>banana</strong> <strong>a</strong>nd <strong>a</strong>n <strong>a</strong>pple
It looks like you're wanting to search for multiple words - this word or that word. Which means you need to separate your searches by |, like the script below:
import re
text = "some word many other words"
input = '|'.join('some word'.split())
pattern = re.compile('(%s)' % input, flags=0)
print pattern.sub(r'<strong>\1</strong>',text)
I'm not completely sure if I know what you're asking but if you want to pass all the elements of input in as parameters in the compile function call, you can just use *input instead of input. * will split the list into its elements. As an alternative, could't you just try joining the list with and adding at the beginning and at the end?
Alternatively, you can use the join operator with a list comprehension to create the intended result.
text = "some word many other words".split()
result = ' '.join(['<strong>'+i+'</strong>' for i in text])

Splice a string based on certain characters

I'm looking for a way to examine only certain characters within a string. For example:
#Given the string
s= '((hello+world))'
s[1:')'] #This obviously doesn't work because you can only splice a string using ints
Basically I want the program to start at the second occurence of ( and then from there splice until it hits the first occurence of ). So then maybe from there I can return it to another fucntion or whatever. Any solutions?
You can do it as follows: (assuming you want the innermost parenthesis)
s[s.rfind("("):s.find(")")+1] if you want "(hello+world)"
s[s.rfind("(")+1:s.find(")")] if you want "hello+world"
You can strip parenthesis (if, in your case, they always appear at the beginning and the end of the string):
>>> s= '((hello+world))'
>>> s.strip('()')
'hello+world'
Another option is to use regular expression to extract what is inside the double parenthesis:
>>> re.match('\(\((.*?)\)\)', s).group(1)
'hello+world'

Categories