How can I select this specific part of a string using regex? - python

Hi and thank you for your time.
I have the following example string: "Hola Luis," but the string template will always be "Hola {{name}},".
What regex would match any name? You can assume the name follows "Hola" and a single blank space, and that it has a comma right after it.
Thank you!

You can use the following regular expression, assuming that as you mention, the format is always the same:
import re
s = "Hola Luis,"
re.search(r'Hola (\w+),', s).group(1)
# 'Luis'

s = 'Hola test'
re.match(r'Hola (\w+)', s).groups()[0]
results:
'test'

Continuing from @yatu's answer,
Without regex:
print("Hola Luis,".split(" ")[1].strip(","))
Explanation:
split(" ") # to split the string with spaces
[1] # to take the second piece, i.e. the name with its trailing comma
strip(",") # to strip off any ','
OUTPUT:
Luis

According to Falsehoods Programmers Believe About Names and your requirements, I'll use the following regex: (?<=Hola )[^,]+(?=,).
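A minimal sketch of the lookaround pattern above, assuming the greeting always has the form "Hola {{name}},". The helper name extract_name is hypothetical and only used for illustration. Unlike \w+, the class [^,]+ also accepts names that contain spaces, hyphens or apostrophes.

```python
import re

# The lookbehind anchors the match right after "Hola ",
# the lookahead stops it just before the comma.
NAME_PATTERN = re.compile(r"(?<=Hola )[^,]+(?=,)")

def extract_name(greeting):
    """Return the name in 'Hola <name>,' or None if the template does not match."""
    match = NAME_PATTERN.search(greeting)
    return match.group() if match else None

print(extract_name("Hola Luis,"))        # Luis
print(extract_name("Hola María José,"))  # María José
print(extract_name("Adiós Luis,"))       # None
```

Returning None instead of calling .group() directly avoids an AttributeError when the input does not follow the template.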

Related

How to add quotes around words in python using regex?

I have following json type string text
blockAddress:{strandId:"C1DYN7Cag8oDCRRoIJ1uAz",
sequenceNo:68794},
transactionId:"AYj8Vf4kQ9EE6BJJbvt3js",
blockTimestamp:2019-12-03T08:00:04.899000001Z,
blockHash:{{gdOVqf7AsgaQf90ZK1Hsva2lzPckHnxGmm3plDRBeGA=}},
entriesHash:{{n8oUjERAqT9kL+Cr59P6UPJbIdyPvaP0R9ey9+Njdzc=}}
I want to add double quotes around every word that consists of [a-zA-Z] characters and ends with a colon (:).
My string above should then look as follows:
"blockAddress":{"strandId":"C1DYN7Cag8oDCRRoIJ1uAz",
"sequenceNo":68794},
"transactionId":"AYj8Vf4kQ9EE6BJJbvt3js",
"blockTimestamp":2019-12-03T08:00:04.899000001Z,
"blockHash":{{gdOVqf7AsgaQf90ZK1Hsva2lzPckHnxGmm3plDRBeGA=}},
"entriesHash":{{n8oUjERAqT9kL+Cr59P6UPJbIdyPvaP0R9ey9+Njdzc=}}
I am trying re.sub(r'([a-zA-Z]+:)', r'"\1"', s), but the closing quote ends up after the colon, like this:
"blockAddress:"{"strandId:""C1DYN7Cag8oDCRRoIJ1uAz",
"sequenceNo:"68794},
"transactionId:""AYj8Vf4kQ9EE6BJJbvt3js",
"blockTimestamp:"2019-12-03T08:00:04.899000001Z,
"blockHash:"{{gdOVqf7AsgaQf90ZK1Hsva2lzPckHnxGmm3plDRBeGA=}},
"entriesHash:"{{n8oUjERAqT9kL+Cr59P6UPJbIdyPvaP0R9ey9+Njdzc=}}
What do I need to change in the regex above? Or is there a different approach in Python?
Regex101:
txt = '''blockAddress:{strandId:"C1DYN7Cag8oDCRRoIJ1uAz",
sequenceNo:68794},
transactionId:"AYj8Vf4kQ9EE6BJJbvt3js",
blockTimestamp:2019-12-03T08:00:04.899000001Z,
blockHash:{{gdOVqf7AsgaQf90ZK1Hsva2lzPckHnxGmm3plDRBeGA=}},
entriesHash:{{n8oUjERAqT9kL+Cr59P6UPJbIdyPvaP0R9ey9+Njdzc=}}'''
import re
print( re.sub(r'([a-zA-Z]+):', r'"\1":', txt) )
Prints:
"blockAddress":{"strandId":"C1DYN7Cag8oDCRRoIJ1uAz",
"sequenceNo":68794},
"transactionId":"AYj8Vf4kQ9EE6BJJbvt3js",
"blockTimestamp":2019-12-03T08:00:04.899000001Z,
"blockHash":{{gdOVqf7AsgaQf90ZK1Hsva2lzPckHnxGmm3plDRBeGA=}},
"entriesHash":{{n8oUjERAqT9kL+Cr59P6UPJbIdyPvaP0R9ey9+Njdzc=}}
Sounds like what you want is:
re.sub(r'([a-zA-Z]+):', r'"\1":', s)
Using a lookahead assertion:
re.sub(r'([a-zA-Z]+)(?=:)', r'"\1"', s)

How can I remove the string from a character in python?

I have some URLs and I need some of them to be stripped from the question mark (?)
Ex. https://www.yelp.com/biz/starbucks-san-leandro-4?large_photo=1
I need it to return https://www.yelp.com/biz/starbucks-san-leandro-4
How can I do that?
You can also use the .split() method.
The split() method splits a string into a list.
You can specify the separator, default separator is any whitespace.
Syntax
string.split(separator, maxsplit)
data = 'https://www.yelp.com/biz/starbucks-san-leandro-4?large_photo=1'
print (data.split('?')[0])
output:
https://www.yelp.com/biz/starbucks-san-leandro-4
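You could use rfind and slice the string up to the returned index (if the string contains no '?', rfind returns -1 and the slice would drop the last character, so check for that case first):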
s = 'https://www.yelp.com/biz/starbucks-san-leandro-4?large_photo=1'
s[:s.rfind('?')]
# 'https://www.yelp.com/biz/starbucks-san-leandro-4'
Go for a regular expression
import re
new_string = re.sub(r'\?.+$', '', your_string)
See a demo on regex101.com.
I would parse the URL and then rebuild it with the parts that you want to keep. For example, you can use urllib.parse.
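A minimal sketch of the parse-and-rebuild approach with urllib.parse, assuming you want to keep only the scheme, host and path. The function name strip_query is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

def strip_query(url):
    """Return url without its query string and fragment."""
    parts = urlsplit(url)
    # Keep scheme, host and path; blank out the query and the fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

print(strip_query("https://www.yelp.com/biz/starbucks-san-leandro-4?large_photo=1"))
# https://www.yelp.com/biz/starbucks-san-leandro-4
print(strip_query("https://www.yelp.com/biz/starbucks-san-leandro-4"))
# https://www.yelp.com/biz/starbucks-san-leandro-4
```

Because the URL is parsed rather than cut at a character, a URL without any question mark comes back unchanged.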

Getting word from string

How can I get the word example from a string like this:
str = "http://test-example:123/wd/hub"
I wrote something like this:
print(str[10:str.rfind(':')])
but it does not work correctly if the string looks like
"http://tests-example:123/wd/hub"
You can use this regex to capture the value preceded by - and followed by : using lookarounds
(?<=-).+(?=:)
Python code,
import re
str = "http://test-example:123/wd/hub"
print(re.search(r'(?<=-).+(?=:)', str).group())
Outputs,
example
Non-regex way to get the same is using these two splits,
str = "http://test-example:123/wd/hub"
print(str.split(':')[1].split('-')[1])
Prints,
example
You can use the following non-regex approach if you know that example is a 7-letter word:
s.split('-')[1][:7]
For any arbitrary word, that would change to:
s.split('-')[1].split(':')[0]
There are many ways.
Using splitting:
example_str = str.split('-')[-1].split(':')[0]
This is fragile, and could break if there are more hyphens or colons in the string.
Using regex:
import re
pattern = re.compile(r'-(.*):')
example_str = pattern.search(str).group(1)
This still expects a particular format, but is more easily adaptable (if you know how to write regexes).
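One way to make either approach less fragile, sketched here under the assumption that the wanted word is always the part of the host name after its first hyphen: let urllib.parse isolate the host first, so hyphens or colons in the path or port cannot interfere. The function name word_after_hyphen is hypothetical.

```python
from urllib.parse import urlsplit

def word_after_hyphen(url):
    """Return the part of the host name after its first hyphen, or None."""
    host = urlsplit(url).hostname  # e.g. "test-example"; the port is excluded
    if not host or "-" not in host:
        return None
    return host.partition("-")[2]

print(word_after_hyphen("http://test-example:123/wd/hub"))   # example
print(word_after_hyphen("http://tests-example:123/wd/hub"))  # example
```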
I am not sure why you want to get a particular word from the string. I guess you want to check whether the word occurs in the given string.
If that is the case, the code below can be used:
import re
str1 = "http://tests-example:123/wd/hub"
matched = re.findall('example',str1)
Split on the -, and then on :
s = "http://test-example:123/wd/hub"
print(s.split('-')[1].split(':')[0])
#example
Using re:
import re
text = "http://test-example:123/wd/hub"
m = re.search('(?<=-).+(?=:)', text)
if m:
    print(m.group())
Python strings have a built-in find method:
a="http://test-example:123/wd/hub"
b="http://test-exaaaample:123/wd/hub"
print(a.find('example'))
print(b.find('example'))
will return:
12
-1
The return value is the index of the found substring. If it equals -1, the substring was not found in the string. You can also use the in keyword:
'example' in 'http://test-example:123/wd/hub'
True

How to get string following some specific letters?

How can I get string from some specific characters? (more specifically, get "test" from "A8 test")
In this case, "A8" is following a pattern like "[A-Z]+[0-9]+".
So it can also be "C6 test","X90 test" and etc.
I've tried in Python using "(?<=[A-Z]+[0-9]).+", which throws an Exception:
"sre_constants.error: look-behind requires fixed-width pattern."
It means I should use fixed-width pattern such as "(?<=[A-Z]{1}[0-9]{1})".
But actually it's not fixed-width. What can I do?
If you mean getting the rest of the string after the pattern "[A-Z]+[0-9]+", you can try this:
import re
s1 = 'A8 test'
s2 = 'C6 123'
s3 = 'X90 test32'
# the parentheses capture the part you want
p = re.compile(r"[A-Z]+[0-9]+ (\w+)")
print(p.findall(s1))
print(p.findall(s2))
print(p.findall(s3))
output:
['test']
['123']
['test32']
Hope that will help you, and comment if you have further questions. : )
You can use a capture group to get what you need.
>>> import re
>>> regexp = r"[A-Z]+[0-9]+ (.+)"
>>> re.search(regexp, "C6 test")[1]
'test'
>>> re.search(regexp, "X90 test")[1]
'test'
>>> re.search(regexp, "CBF58456 test")[1]
'test'
Note that the current pattern you show would pick up on any number of uppercase letters followed by any number of digits, as long as there's at least one of each. Also note that my example above would require a blank between the first part and the test string to capture.
You could also use re.sub to remove the part of the string you do not need, by passing an empty string as the replacement (the second argument):
import re
text = "X90 test"
t = re.sub("[A-Z]+[0-9]+ ","",text)
print(t) #test
import re
ex = r"[A-Z]+[0-9]+ (.+)"
print(re.search(ex , "X90 test")[1])
print(re.search(ex , "C6 test")[1])
print(re.search(ex , "CBF58456 test")[1])
Output
test
test
test
You can split the string, then get your string.
>>> re.split(r'([A-Z]+[0-9]+ )(test)', 'A8 test')
['', 'A8 ', 'test', '']
Or you can write a simple function that finds your string without using regex.
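A minimal sketch of such a regex-free function, assuming the code ("A8", "X90", ...) sits at the start of the string and is separated from the rest by a single space. The name text_after_code is hypothetical, and str.isascii needs Python 3.7 or later.

```python
def text_after_code(text):
    """Return the text after a leading code such as 'A8 ' or 'X90 ', else None."""
    code, sep, rest = text.partition(" ")
    letters = code.rstrip("0123456789")  # the code without its trailing digits
    digits = code[len(letters):]         # the trailing digits themselves
    # The code must be uppercase ASCII letters followed by at least one digit.
    if (sep and letters and digits
            and letters.isascii() and letters.isalpha() and letters.isupper()):
        return rest
    return None

print(text_after_code("A8 test"))      # test
print(text_after_code("X90 test32"))   # test32
print(text_after_code("hello world"))  # None
```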

Returning all characters before the first underscore

Using re in Python, I would like to return all of the characters in a string that precede the first appearance of an underscore. In addition, I would like the string that is being returned to be in all uppercase and without any non-alphanumeric characters.
For example:
AG.av08_binloop_v6 = AGAV08
TL.av1_binloopv2 = TLAV1
I am pretty sure I know how to return a string in all uppercase using string.upper() but I'm sure there are several ways to remove the . efficiently. Any help would be greatly appreciated. I am still learning regular expressions slowly but surely. Each tip gets added to my notes for future use.
To further clarify, my above examples aren't the actual strings. The actual string would look like:
AG.av08_binloop_v6
With my desired output looking like:
AGAV08
And the next example would be the same. String:
TL.av1_binloopv2
Desired output:
TLAV1
Again, thanks all for the help!
Even without re:
text.split('_', 1)[0].replace('.', '').upper()
Try this:
re.sub(r"[^A-Z\d]", "", re.search(r"^[^_]*", text).group(0).upper())
Since everyone is giving their favorite implementation, here's mine that doesn't use re:
>>> for s in ('AG.av08_binloop_v6', 'TL.av1_binloopv2'):
...     print(''.join(c for c in s.split('_', 1)[0] if c.isalnum()).upper())
...
AGAV08
TLAV1
I put .upper() on the outside of the generator so it is only called once.
You don't have to use re for this. Simple string operations would be enough based on your requirements:
tests = """
AG.av08_binloop_v6 = AGAV08
TL.av1_binloopv2 = TLAV1
"""
for t in tests.strip().splitlines():  # strip() skips the empty first line
    print(t[:t.find('_')].replace('.', '').upper())
# Returns:
# AGAV08
# TLAV1
Or if you absolutely must use re:
import re
pat = r'([a-zA-Z0-9.]+)_.*'
pat_re = re.compile(pat)
for t in tests.strip().splitlines():  # strip() avoids an IndexError on the empty first line
    print(re.sub(r'\.', '', pat_re.findall(t)[0]).upper())
# Returns:
# AGAV08
# TLAV1
Hey, just for fun, another option to get the text before the first underscore is:
before_underscore, sep, after_underscore = text.partition('_')
So all in one line could be:
re.sub(r"[^A-Z\d]", "", text.partition('_')[0].upper())
import re
re.sub(r"[^A-Z\d]", "", yourstr.split('_', 1)[0].upper())
