How to add quotes around words in python using regex? - python

I have the following JSON-like string:
blockAddress:{strandId:"C1DYN7Cag8oDCRRoIJ1uAz",
sequenceNo:68794},
transactionId:"AYj8Vf4kQ9EE6BJJbvt3js",
blockTimestamp:2019-12-03T08:00:04.899000001Z,
blockHash:{{gdOVqf7AsgaQf90ZK1Hsva2lzPckHnxGmm3plDRBeGA=}},
entriesHash:{{n8oUjERAqT9kL+Cr59P6UPJbIdyPvaP0R9ey9+Njdzc=}}
I want to add quotes (" ") around each word that consists of [a-zA-Z] characters and ends with a colon (:).
My string above should then look like this:
"blockAddress":{"strandId":"C1DYN7Cag8oDCRRoIJ1uAz",
"sequenceNo":68794},
"transactionId":"AYj8Vf4kQ9EE6BJJbvt3js",
"blockTimestamp":2019-12-03T08:00:04.899000001Z,
"blockHash":{{gdOVqf7AsgaQf90ZK1Hsva2lzPckHnxGmm3plDRBeGA=}},
"entriesHash":{{n8oUjERAqT9kL+Cr59P6UPJbIdyPvaP0R9ey9+Njdzc=}}
I am trying re.sub(r'([a-zA-Z]+:)', r'"\1"', s), but I get the quotes after the colon, like this:
"blockAddress:"{"strandId:""C1DYN7Cag8oDCRRoIJ1uAz",
"sequenceNo:"68794},
"transactionId:""AYj8Vf4kQ9EE6BJJbvt3js",
"blockTimestamp:"2019-12-03T08:00:04.899000001Z,
"blockHash:"{{gdOVqf7AsgaQf90ZK1Hsva2lzPckHnxGmm3plDRBeGA=}},
"entriesHash:"{{n8oUjERAqT9kL+Cr59P6UPJbIdyPvaP0R9ey9+Njdzc=}}
What do I need to change in the regex above? Or is there a different approach in Python?

Regex101:
import re
txt = '''blockAddress:{strandId:"C1DYN7Cag8oDCRRoIJ1uAz",
sequenceNo:68794},
transactionId:"AYj8Vf4kQ9EE6BJJbvt3js",
blockTimestamp:2019-12-03T08:00:04.899000001Z,
blockHash:{{gdOVqf7AsgaQf90ZK1Hsva2lzPckHnxGmm3plDRBeGA=}},
entriesHash:{{n8oUjERAqT9kL+Cr59P6UPJbIdyPvaP0R9ey9+Njdzc=}}'''
print(re.sub(r'([a-zA-Z]+):', r'"\1":', txt))
Prints:
"blockAddress":{"strandId":"C1DYN7Cag8oDCRRoIJ1uAz",
"sequenceNo":68794},
"transactionId":"AYj8Vf4kQ9EE6BJJbvt3js",
"blockTimestamp":2019-12-03T08:00:04.899000001Z,
"blockHash":{{gdOVqf7AsgaQf90ZK1Hsva2lzPckHnxGmm3plDRBeGA=}},
"entriesHash":{{n8oUjERAqT9kL+Cr59P6UPJbIdyPvaP0R9ey9+Njdzc=}}

Sounds like what you want is:
re.sub(r'([a-zA-Z]+):', r'"\1":', s)

Using a lookahead assertion, which checks for the colon without consuming it:
re.sub(r'([a-zA-Z]+)(?=:)', r'"\1"', s)
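Why the original attempt misplaces the quote: the colon sits inside capture group 1, so r'"\1"' wraps the colon too. A minimal sketch comparing the broken pattern with both fixes on a shortened copy of the question's text (the three-line sample is an assumption, not the full input):

```python
import re

# Shortened copy of the question's input (assumed to have the same shape as the full text)
sample = '''blockAddress:{strandId:"C1DYN7Cag8oDCRRoIJ1uAz",
sequenceNo:68794},
blockTimestamp:2019-12-03T08:00:04.899000001Z,'''

# Original attempt: the colon is inside group 1, so the closing quote lands after it
broken = re.sub(r'([a-zA-Z]+:)', r'"\1"', sample)

# Fix 1: match the colon outside the group and write it back after the closing quote
fixed_group = re.sub(r'([a-zA-Z]+):', r'"\1":', sample)

# Fix 2: the lookahead (?=:) requires a colon next but leaves it in the string
fixed_lookahead = re.sub(r'([a-zA-Z]+)(?=:)', r'"\1"', sample)

print(broken.splitlines()[0])
print(fixed_group.splitlines()[0])
print(fixed_group == fixed_lookahead)
```

Both fixes quote only the letter run directly before a colon, so digit-led pieces such as T08: in the timestamp are left alone.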

Related

How to remove text before a particular character or string in multi-line text?

I want to remove all the text before and including */ in a string.
For example, consider:
string = ''' something
other things
etc. */ extra text.
'''
Here I want "extra text." as the output.
I tried:
string = re.sub("^(.*)(?=*/)", "", string)
I also tried:
string = re.sub(re.compile(r"^.\*/", re.DOTALL), "", string)
But when I print string, the operation has not been performed and the whole string is printed.
I suppose you're fine without regular expressions:
string[string.index("*/ ")+3:]
And if you want to strip that newline:
string[string.index("*/ ")+3:].rstrip()
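The problem with your first regex is that . does not match newlines, as you noticed; in addition, the unescaped * at the start of (?=*/) is not valid (Python's re reports "nothing to repeat"). With your second one you were closer, but you forgot the * after the dot that time. This would work: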
string = re.sub(re.compile(r"^.*\*/", re.DOTALL), "", string)
You can also just get the part of the string that comes after your "*/":
string = re.search(r"(\*/)(.*)", string, re.DOTALL).group(2)
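A minimal sketch of what re.DOTALL changes, using the question's sample string. Note that .* is greedy, so with several */ markers this strips through the last one; .*? would stop at the first.

```python
import re

text = ''' something
other things
etc. */ extra text.
'''

# Without re.DOTALL, "." stops at the first newline, so "^.*\*/" cannot reach "*/"
without_dotall = re.sub(r"^.*\*/", "", text)

# With re.DOTALL, "." also matches "\n", so everything up to and including "*/" is removed
with_dotall = re.sub(r"^.*\*/", "", text, flags=re.DOTALL)

print(repr(without_dotall))  # the input, unchanged
print(repr(with_dotall))
```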
Update: After doing some research, I found that the pattern (\n|.) for matching everything including newlines is inefficient, so I've updated the answer to use [\s\S] instead, as shown in the answer I linked.
The problem is that . in Python regex matches everything except newlines by default. For a regex solution, you can do the following:
import re
strng = ''' something
other things
etc. */ extra text.
'''
print(re.sub(r"[\s\S]+\*/", "", strng))
# extra text.
Add in a .strip() if you want to remove that remaining leading whitespace.
To keep the text before that symbol instead, you can do:
split_str = string.split(' ')
boundary = split_str.index('*/')
new = ' '.join(split_str[0:boundary])
print(new)
which gives you:
something
other things
etc.
string_list = string.split('*/')[1:]
string = '*/'.join(string_list)
print(string)
leaves string equal to ' extra text.\n' (the leading space and trailing newline are kept).
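A non-regex alternative, sketched with str.partition, which splits at the first occurrence of a separator. The helper name text_after_marker is hypothetical; unlike string.index(), it does not raise when the marker is missing.

```python
def text_after_marker(s, marker="*/"):
    """Hypothetical helper: return the text after the first occurrence of marker.

    Returns s unchanged when the marker is absent.
    """
    _, sep, after = s.partition(marker)
    return after if sep else s


string = ''' something
other things
etc. */ extra text.
'''
print(repr(text_after_marker(string)))
print(repr(text_after_marker(string).strip()))
```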

Getting word from string

How can I get the word example from a string like this:
str = "http://test-example:123/wd/hub"
I wrote something like this:
print(str[10:str.rfind(':')])
but it doesn't work if the string is like this:
"http://tests-example:123/wd/hub"
You can use this regex with lookarounds to capture the value preceded by - and followed by :
(?<=-).+(?=:)
Regex Demo
Python code:
import re
url = "http://test-example:123/wd/hub"  # named url to avoid shadowing the built-in str
print(re.search(r'(?<=-).+(?=:)', url).group())
Outputs:
example
A non-regex way to get the same result is to use two splits:
url = "http://test-example:123/wd/hub"
print(url.split(':')[1].split('-')[1])
Prints:
example
You can use the following non-regex approach, since you know example is a 7-letter word:
s.split('-')[1][:7]
For any arbitrary word, that would change to:
s.split('-')[1].split(':')[0]
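A quick check that the split-based version handles both strings from the question, including the "tests-example" case where slicing at a fixed index failed:

```python
urls = [
    "http://test-example:123/wd/hub",
    "http://tests-example:123/wd/hub",  # the case where the fixed-index slice broke
]

# Take everything after the first "-", then cut at the first ":"
words = [s.split('-')[1].split(':')[0] for s in urls]
print(words)
```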
There are many ways.
Using splitting:
example_str = str.split('-')[-1].split(':')[0]
This is fragile, and could break if there are more hyphens or colons in the string.
Using regex:
import re
pattern = re.compile(r'-(.*):')
example_str = pattern.search(str).group(1)
This still expects a particular format, but is more easily adaptable (if you know how to write regexes).
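To reduce the fragility mentioned above, one option (a swapped-in technique, not from the answers above) is to parse the URL first with urllib.parse.urlsplit, so that colons in the port and hyphens in the path cannot interfere. The helper word_after_hyphen is hypothetical and assumes the word of interest follows the first "-" in the host name.

```python
from urllib.parse import urlsplit


def word_after_hyphen(url):
    """Hypothetical helper: return the part of the host name after its first "-"."""
    host = urlsplit(url).hostname  # e.g. "test-example"; scheme, port and path are dropped
    if host is None or "-" not in host:
        return None
    return host.split("-", 1)[1]


print(word_after_hyphen("http://test-example:123/wd/hub"))
print(word_after_hyphen("http://tests-example:123/wd/hub"))
print(word_after_hyphen("http://test-example:123/wd/hub-x"))  # hyphen in the path is ignored
```

Note that hostname is lowercased by urlsplit, so this sketch would not preserve capital letters in the host.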
I am not sure why you want to get a particular word from a string. I guess you want to check whether this word is present in the given string.
If that is the case, the code below can be used.
import re
str1 = "http://tests-example:123/wd/hub"
matched = re.findall('example',str1)
Split on the -, and then on :
s = "http://test-example:123/wd/hub"
print(s.split('-')[1].split(':')[0])
#example
Using re:
import re
text = "http://test-example:123/wd/hub"
m = re.search(r'(?<=-).+(?=:)', text)
if m:
    print(m.group())
Python strings have a built-in method, find:
a="http://test-example:123/wd/hub"
b="http://test-exaaaample:123/wd/hub"
print(a.find('example'))
print(b.find('example'))
prints:
12
-1
It returns the index of the found substring; -1 means the substring is not in the string. You can also use the in keyword:
'example' in 'http://test-example:123/wd/hub'
True

How can I select this specific part of a string using regex?

Hi and thank you for your time.
I have the following example string: "Hola Luis," but the string template will always be "Hola {{name}},".
What regex would match any name? You can assume the name follows "Hola" and a blank space, and is immediately followed by a comma.
Thank you!
You can use the following regular expression, assuming that as you mention, the format is always the same:
import re
s = "Hola Luis,"
re.search(r'Hola (\w+),', s).group(1)
# 'Luis'
s = 'Hola test'
re.match(r'Hola (\w+)', s).groups()[0]
results:
'test'
Continuing from @yatu's answer,
Without regex:
print("Hola Luis,".split(" ")[1].strip(","))
Explanation:
split(" ") # to split the string with spaces
[1] # to get the part after "Hola"
strip(",") # to strip off any ','
OUTPUT:
Luis
According to Falsehoods Programmers Believe About Names and your requirements, I'll use the following regex: (?<=Hola )[^,]+(?=,).
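A minimal sketch of why the lookaround version is more permissive than \w+: a name containing a space (the name below is a hypothetical example) defeats the \w+ pattern, while [^,]+ accepts anything up to the comma.

```python
import re

greeting = "Hola José María,"  # hypothetical name with an accent and a space

# \w+ stops at the space, and the pattern then requires a comma, so there is no match
word_match = re.search(r'Hola (\w+),', greeting)
print(word_match)

# The lookaround version accepts any run of non-comma characters after "Hola "
match = re.search(r'(?<=Hola )[^,]+(?=,)', greeting)
print(match.group())
```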

regex replace '...' at the end of the string

I have a string like:
text1 = 'python...is...fun...'
I want to replace multiple '.'s with a single '.' only when they are at the end of the string. I want the output to be:
python...is...fun.
When there is only one '.' at the end of the string, it should not be replaced:
text2 = 'python...is...fun.'
and the output is just the same as text2
My regex is like this:
text = re.sub(r'(.*)\.{2,}$', r'\1.', text)
I intended it to match any string followed by two or more '.' at the end of the string, but the output is:
python...is...fun..
Any ideas how to do this?
Thanks in advance!
You are making it a bit complex; you can simply use the regex \.+$ and replace the match with a single . character.
>>> import re
>>> text1 = 'python...is...fun...'
>>> re.sub(r"\.+$", ".", text1)
'python...is...fun.'
You may extend this regex further to handle inputs such as ... alone, but the main point is that there is no need to count the number of dots, as you did in your question.
Just look for the string ending with two or more periods, and replace them with a single one.
import re
x = "foo...bar...quux..."
print(re.sub(r'\.{2,}$', '.', x))
# foo...bar...quux.
import re
print(re.sub(r'\.{2,}$', '.', 'I...love...python...'))
As simple as that. Note that you need to escape the ., because otherwise it matches any character except \n.
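The question's own pattern fails because the greedy (.*) keeps one of the trailing dots in group 1 before \.{2,} gets the rest. A minimal sketch showing that making the group lazy with (.*?) is enough to fix it:

```python
import re

text1 = 'python...is...fun...'
text2 = 'python...is...fun.'

# Greedy (.*) backtracks only as far as needed, so group 1 ends with one dot
greedy = re.sub(r'(.*)\.{2,}$', r'\1.', text1)

# Lazy (.*?) grows as little as possible, leaving the whole dot run to \.{2,}
lazy = re.sub(r'(.*?)\.{2,}$', r'\1.', text1)

# A single trailing dot does not satisfy \.{2,}, so text2 is returned unchanged
lazy_single = re.sub(r'(.*?)\.{2,}$', r'\1.', text2)

print(greedy)
print(lazy)
print(lazy_single)
```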
I want to replace the multiple '.'s to one '.' only when they are at the end of the string
For such a simple case it's easier to substitute without importing the re module, by checking the last 3 characters:
text1 = 'python...is...fun...'
text1 = text1[:-2] if text1[-3:] == '...' else text1
print(text1)
The output:
python...is...fun.
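The slicing approach above only handles exactly three trailing dots. A sketch of a non-regex variant using str.rstrip that collapses a trailing run of any length of two or more; the helper name collapse_trailing_dots is hypothetical.

```python
def collapse_trailing_dots(s):
    """Hypothetical helper: turn two or more trailing dots into one, without re."""
    stripped = s.rstrip('.')
    if len(s) - len(stripped) >= 2:
        return stripped + '.'
    return s  # zero or one trailing dot: leave the string as is


for t in ['python...is...fun...', 'python...is...fun.', 'python...is...fun.....', 'no dots']:
    print(collapse_trailing_dots(t))
```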

How to find a string between two special characters in Python?

I have a set of strings like this:
uc001acu.2;C1orf159;chr1:1046736-1056736;uc001act.2;C1orf159;
I need to extract the substring between two semicolons, and I only need the first occurrence.
The result should be: C1orf159
I have tried this code, but it does not work:
import re
info = "uc001acu.2;C1orf159;chr1:1046736-1056736;uc001act.2;C1orf159;"
name = re.search(r'\;(.*)\;', info)
print(name.group())
Please help me.
Thanks
You can split the string and limit it to two splits.
x = info.split(';',2)[1]
import re
pattern=re.compile(r".*?;([a-zA-Z0-9]+);.*")
print(pattern.match(info).groups())
This looks for the first ;, eating up characters non-greedily through .*?. Then it captures the alphanumeric string until the next ; is found. Then it eats up the rest of the string. The captured match is returned by .groups().
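The question's attempt returns too much because .* is greedy and runs to the last semicolon. A minimal sketch comparing the greedy pattern with a lazy quantifier and with a negated character class (the unnecessary \; escapes are dropped here):

```python
import re

info = "uc001acu.2;C1orf159;chr1:1046736-1056736;uc001act.2;C1orf159;"

# Greedy: .* runs to the LAST ";", so group 1 spans almost the whole string
greedy = re.search(r';(.*);', info).group(1)

# Lazy: .*? stops at the next ";", so group 1 is the first field between semicolons
lazy = re.search(r';(.*?);', info).group(1)

# Negated class: [^;]* cannot cross a ";" at all, which states the intent directly
negated = re.search(r';([^;]*);', info).group(1)

print(greedy)
print(lazy)
print(negated)
```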
