How to format data in JSON with double curly braces - python

I get JSON from another server, like this:
{"field":"zxczxczcx_{{name}}_qweqweqwe"}
How can I format that value?
I've tried:
d = {"field":"zxczxczcx_{{name}}_qweqweqwe"}
d['field'].format('any_string')
But that just removes one pair of curly braces and outputs:
"zxczxczcx_{name}_qweqweqwe"

Maybe you can use the replace method?
d = {"field":"zxczxczcx_{{name}}_qweqweqwe"}
d['field'] = d['field'].replace('{{name}}','any_string')
print(d)
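Based on your comments, this uses the re module (regular expressions) to find every {{x}} pattern and replace it with the value stored under the key x in d:
import re
# Non-greedy match, so that each {{...}} token is found separately
tokens_to_replace = re.findall(r'\{\{.*?\}\}', d['field'])
for token in tokens_to_replace:
    # token[2:-2] strips the braces ('{{name}}' -> 'name'); this assumes d also has that key
    d['field'] = d['field'].replace(token, d[token[2:-2]])
Here tokens_to_replace will have the value ['{{name}}'].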
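As a minimal sketch of the same idea, the lookup and the replacement can be done in one pass with re.sub and a separate mapping of values. The helper name render and the values dict are hypothetical, not part of the original answers, and unknown placeholders are left untouched by assumption:

```python
import re

def render(template, values):
    """Replace each {{key}} placeholder with values[key]; leave unknown keys as they are."""
    def substitute(match):
        key = match.group(1)
        # Fall back to the original '{{key}}' text when the key is missing
        return str(values.get(key, match.group(0)))
    return re.sub(r"\{\{(\w+)\}\}", substitute, template)

d = {"field": "zxczxczcx_{{name}}_qweqweqwe"}
d["field"] = render(d["field"], {"name": "any_string"})
print(d)
```

Keeping the replacement values in their own dict avoids a KeyError when the JSON itself does not contain the key named in the placeholder.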

python3 - json.loads for a string that contains " in a value

I'm trying to convert a string that contains a dict into a dict object using json.
But the data contains a " character.
Example:
string = '{"key1":"my"value","key2":"my"value2"}'
js = json.loads(string, strict=False)
It raises json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 13 (char 12), because " is a delimiter and there are too many of them.
What is the best way to achieve my goal?
The solution I have found is to perform several .replace calls on the string, replacing each legitimate " with a pattern until only the illegal " remain, and then to replace the patterns back with the legitimate ".
After that I can use json.loads and then replace the remaining pattern with the illegal ".
But there must be another way.
Example:
string = '{"key1":"my"value","key2":"my"value2"}'
string = string.replace('{"','__pattern_1')
string = string.replace('}"','__pattern_2')
...
...
string = string.replace('"','__pattern_42')
string = string.replace('__pattern_1','{"')
string = string.replace('__pattern_2','}"')
...
...
js = json.loads(string, strict=False)
This should work. The idea is to replace all the expected double quotes with placeholders, remove the unwanted double quotes, and then convert the placeholders back.
import re
import json
def fix_json_string(st):
    # Swap the structural quote sequences for placeholders
    # (this assumes "!!", "--", "{{" and "}}" never occur in the data itself)
    st = re.sub(r'","', "!!", st)
    st = re.sub(r'":"', "--", st)
    st = re.sub(r'{"', "{{", st)
    st = re.sub(r'"}', "}}", st)
    # Any double quotes that remain are the unwanted ones
    st = st.replace('"', '')
    # Restore the structural quotes
    st = re.sub(r'}}', '"}', st)
    st = re.sub(r'{{', '{"', st)
    st = re.sub(r'--', '":"', st)
    st = re.sub(r'!!', '","', st)
    return st
broken_string = '{"key1":"my"value","key2":"my"value2"}'
fixed_string = fix_json_string(broken_string)
print(fixed_string)
# json.loads parses the repaired string without evaluating it as Python code
js = json.dumps(json.loads(fixed_string))
print(js)
Output -
{"key1":"myvalue","key2":"myvalue2"} # str
{"key1": "myvalue", "key2": "myvalue2"} # converted to json
The variable string is not a valid JSON string.
The correct string should be:
string = '{"key1":"my\\"value","key2":"my\\"value2"}'
The problem is that the string is not valid JSON.
In the string '{"key1": "my"value", "key2": "my"value2"}', the value of key1 ends at "my", and the additional characters value" violate the format.
You can use character escaping; valid JSON would look like:
{"key1": "my\"value", "key2": "my\"value2"}
Since you are defining it as a Python string literal, you then need to escape the escape characters:
string = '{"key1": "my\\"value", "key2": "my\\"value2"}'
There is a lot of educational material online about character escaping. I recommend checking it out if something is not clear.
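If you control the side that produces the string, a small sketch like the following shows how json.dumps takes care of the escaping for you. It assumes the data starts out as a Python dict, which the question does not state:

```python
import json

# Assumption: the producer has the data as a dict before serializing it
data = {"key1": 'my"value', "key2": 'my"value2'}

encoded = json.dumps(data)     # inner double quotes are escaped as \"
print(encoded)                 # {"key1": "my\"value", "key2": "my\"value2"}

decoded = json.loads(encoded)  # round-trips back to the original dict
print(decoded == data)         # True
```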
Edit: If you insist on fixing the string in code (which I don't recommend, see comment), you can do something like this:
import re
import json
string = '{"key1":"my"value","key2":"my"value2"}'
# Find the contents of keys and values, assuming that a real closing double quote
# is followed by one of these characters: , } : ]
m = re.finditer(r'"([^:]+?)"(?:[,}:\]])', string)
new_string = string
# Work backwards so that the earlier spans stay valid while the string grows
for i in reversed(list(m)):
    ss, se = i.span(1)  # the first group holds the content
    # Escape the double quotes in the content and put everything back together.
    # Note that this is not efficient; a larger number of replacements would need a different approach to concatenation.
    new_string = new_string[:ss] + new_string[ss:se].replace('"', '\\"') + new_string[se:]
json.loads(new_string)
This assumes that the real closing double quotes are followed by one of , : } ]. In other cases this won't work.

I want to extract data using regular expression in python

I have a string = "ProductId%3D967164%26Colour%3Dbright-royal" and I want to extract data using a regex so that the output will be 967164bright-royal.
I have tried (?:ProductId%3D|Colour%3D)(.*) in Python, but I get 967164%26Colour%3Dbright-royal as the output.
Can anyone please help me find a regex for it?
You don't need a regex here; use the urllib.parse module:
from urllib.parse import parse_qs, unquote
qs = "ProductId%3D967164%26Colour%3Dbright-royal"
d = parse_qs(unquote(qs))
print(d)
# Output:
{'ProductId': ['967164'], 'Colour': ['bright-royal']}
Final output:
>>> ''.join(i[0] for i in d.values())
'967164bright-royal'
Update (with a regex):
>>> import re
>>> ''.join(re.findall(r'%3D(\S*?)(?=%26|$)', qs))
'967164bright-royal'
The alternation matches at the first part, and the greedy (.*) then consumes the rest of the string, so you cannot get the two separate parts from that single match.
If you want to capture both values using a regex with a capture group:
(?:ProductId|Colour)%3D(\S*?)(?=%26|$)
import re
pattern = r"(?:ProductId|Colour)%3D(\S*?)(?=%26|$)"
s = "ProductId%3D967164%26Colour%3Dbright-royal"
print(''.join(re.findall(pattern, s)))
Output
967164bright-royal
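To make the difference concrete, here is a small side-by-side sketch of the pattern from the question and the bounded pattern above. The variable names greedy and bounded are only illustrative:

```python
import re

s = "ProductId%3D967164%26Colour%3Dbright-royal"

# Pattern from the question: (.*) is greedy and runs to the end of the string
greedy = re.findall(r"(?:ProductId%3D|Colour%3D)(.*)", s)
print(greedy)   # ['967164%26Colour%3Dbright-royal']

# Lazy group that stops before the next %26 (an encoded '&') or at the end of the string
bounded = re.findall(r"(?:ProductId|Colour)%3D(\S*?)(?=%26|$)", s)
print(bounded)  # ['967164', 'bright-royal']
```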
If you must use a regular expression and you can guarantee that the string will always be formatted the way you expect, you could try this.
import re
pattern = r"ProductId%3D(\d+)%26Colour%3D(.*)"
string = "ProductId%3D967164%26Colour%3Dbright-royal"
matches = re.match(pattern, string)
if matches:  # re.match returns None when the string is not in the expected format
    print(f"{matches[1]}{matches[2]}")

How to extract substrings between brackets while ignoring those between nested brackets in Python?

I have a string:
phy = '(s1:0.6507212936,((s2:0.4186036213,s3:0.4186036213):0.1428084058,((s4:0.1429514535,s5:0.1429514535):0.1695879844,s6:0.3125394379):0.2488725892):0.08930926654);'
How can I extract only the substrings that are enclosed between brackets and that do not contain any brackets within each substring? So, from my example I require two outputs: "s2:0.4186036213,s3:0.4186036213" and "s4:0.1429514535,s5:0.1429514535".
You can use regular expressions:
import re
phy = '(s1:0.6507212936,((s2:0.4186036213,s3:0.4186036213):0.1428084058,((s4:0.1429514535,s5:0.1429514535):0.1695879844,s6:0.3125394379):0.2488725892):0.08930926654);'
re.findall(r'\(([^\(\)]*)\)', phy)
# ['s2:0.4186036213,s3:0.4186036213', 's4:0.1429514535,s5:0.1429514535']
This captures everything non-brackety enclosed in opening-closing brackets. It does not, however, validate correct nesting levels.
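If you also want to check the nesting before extracting anything, a simple depth counter is usually enough. This is only a sketch; the function name is_balanced is hypothetical and it handles a single bracket type:

```python
def is_balanced(text, opening="(", closing=")"):
    """Return True if every closing bracket has a matching earlier opening bracket."""
    depth = 0
    for ch in text:
        if ch == opening:
            depth += 1
        elif ch == closing:
            depth -= 1
            if depth < 0:  # a closing bracket appeared before its opening bracket
                return False
    return depth == 0      # nothing may be left open at the end

phy = '(s1:0.6507212936,((s2:0.4186036213,s3:0.4186036213):0.1428084058,((s4:0.1429514535,s5:0.1429514535):0.1695879844,s6:0.3125394379):0.2488725892):0.08930926654);'
print(is_balanced(phy))      # True
print(is_balanced('(a))('))  # False
```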
Try this:
# Map each opening bracket to its closing bracket
bracket_dict = {
    '(': ')',
    '{': '}',
    '[': ']'
}
phy = '(s1:0.6507212936,((s2:0.4186036213,s3:0.4186036213):0.1428084058,((s4:0.1429514535,s5:0.1429514535):0.1695879844,s6:0.3125394379):0.2488725892):0.08930926654);'
inner_items = []
start_index = None  # index of the most recent opening bracket that is still unmatched
for i, ch in enumerate(phy):
    if ch in bracket_dict:
        # A new opening bracket: anything opened before it is not innermost
        start_index = i
    elif start_index is not None and ch == bracket_dict[phy[start_index]]:
        # A closing bracket directly after the latest opening one: an innermost pair
        inner_items.append(phy[start_index + 1:i])
        start_index = None
print(inner_items)
#['s2:0.4186036213,s3:0.4186036213', 's4:0.1429514535,s5:0.1429514535']
Use regex:
import re
reg = re.compile(r'[(]([^()]+)[)]')
phy = '(s1:0.6507212936,((s2:0.4186036213,s3:0.4186036213):0.1428084058,((s4:0.1429514535,s5:0.1429514535):0.1695879844,s6:0.3125394379):0.2488725892):0.08930926654)'
print(reg.findall(phy))
Output :
C:\Users\Desktop>py x.py
['s2:0.4186036213,s3:0.4186036213', 's4:0.1429514535,s5:0.1429514535']

regex matching and get into a python list

I have the following saved as a string in a variable:
window.dataLayer=[{"articleCondition":"New","categoryNr":"12345","sellerCustomerNr":"88888888","articleStatus":"Open"}]
How do I extract the values of each element?
Goal would be to have something like this:
articleCondition = 'new'
categoryNr = '12345'
...
In Python there are many ways to get a value out of a string: you can use a regex, Python's eval function (or the safer ast.literal_eval), and probably other ways that I may not know.
Method 1
import ast
value = 'window.dataLayer=[{"articleCondition":"New","categoryNr":"12345","sellerCustomerNr":"88888888","articleStatus":"Open"}]'
value = value.split('=')[1]
# ast.literal_eval only evaluates literals, so it is safer than eval on untrusted input
data = ast.literal_eval(value)[0]
articleCondition = data['articleCondition']
Method 2
Using a regex:
import re
re.findall(r'"articleCondition":"(\w*)"', value)
With a regex you can be more creative and write a more general pattern.
You have a list of dictionaries. Use the dictionary key to get the value.
Example:
dataLayer=[{"articleCondition":"New","categoryNr":"12345","sellerCustomerNr":"88888888","articleStatus":"Open"}]
print(dataLayer[0]["articleCondition"])
print(dataLayer[0]["categoryNr"])
Output:
New
12345
Use json. Your string is:
>>> s = 'window.dataLayer=[{"articleCondition":"New","categoryNr":"12345","sellerCustomerNr":"88888888","articleStatus":"Open"}]'
You can get the right-hand side of the = with a split:
>>> s.split('=')[1]
'[{"articleCondition":"New","categoryNr":"12345","sellerCustomerNr":"88888888","articleStatus":"Open"}]'
Then parse it with the json module:
>>> import json
>>> t = json.loads(s.split('=')[1])
>>> t[0]['articleCondition']
'New'
Please note that this works because you have double quotes in the RHS. Single quotes are not allowed in JSON.
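The two steps above can be wrapped into a small helper. This is a sketch only; the name parse_data_layer is hypothetical, and str.partition is used instead of split so that an = inside one of the values does not cut the JSON short:

```python
import json

def parse_data_layer(text):
    """Parse the JSON on the right-hand side of the first '=' in text."""
    # partition splits only at the first '=', so '=' inside the values is preserved
    _, _, rhs = text.partition("=")
    return json.loads(rhs)

s = 'window.dataLayer=[{"articleCondition":"New","categoryNr":"12345","sellerCustomerNr":"88888888","articleStatus":"Open"}]'
record = parse_data_layer(s)[0]
for key, val in record.items():
    print(key, "=", repr(val))
```

Iterating over record.items() gives every key and value without hard-coding the field names.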

Why doesn't this regular expression match in this string?

I want to be able to replace a string in a file using regular expressions. But my function isn't finding a match. So I've mocked up a test to replicate what's happening.
I have defined the string I want to replace as follows:
string = 'buf = O_strdup("ONE=001&TYPE=PUZZLE&PREFIX=EXPRESS&");'
I want to replace the "TYPE=PUZZLE&PREFIX=EXPRESS&" part with something else. (NB: the string won't always contain exactly "PUZZLE" and "EXPRESS" in the original file, but it will be of that format.)
So first I tried testing that I got the correct match.
import re
obj = re.search(r'TYPE=([\^&]*)\&PREFIX=([\^&]*)\&', string)
if obj:
    print(obj.group())
else:
    print("No match!!")
I was thinking that ([\^&]*) would match any number of characters that are NOT an ampersand.
But I always get "No match!!".
However,
obj = re.search(r'TYPE=([\^&]*)', string)
gives me "TYPE=".
Why doesn't my first one work?
Since the ^ sign is escaped with \, the part ([\^&]*) matches any sequence made up of the characters ^ and &, not a sequence of characters other than &.
Try replacing it with ([^&]*).
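A quick way to see the difference is to run both character classes against the string from the question. The variable names below are only illustrative:

```python
import re

string = 'buf = O_strdup("ONE=001&TYPE=PUZZLE&PREFIX=EXPRESS&");'

# [\^&] is the set {'^', '&'}: after 'TYPE=' it can only match zero characters
escaped_short = re.search(r'TYPE=([\^&]*)', string)
print(escaped_short.group())  # TYPE=

# The full pattern with the escaped caret cannot get past 'PUZZLE', so there is no match
escaped_full = re.search(r'TYPE=([\^&]*)&PREFIX=([\^&]*)&', string)
print(escaped_full)           # None

# [^&] is the negated set: any character except '&'
negated = re.search(r'TYPE=([^&]*)&PREFIX=([^&]*)&', string)
print(negated.groups())       # ('PUZZLE', 'EXPRESS')
```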
In my regex tester, this does work: 'TYPE=(.*)\&PREFIX=(.*)\&'
Try this instead
obj = re.search(r'TYPE=(?P<type>[^&]*?)&PREFIX=(?P<prefix>[^&]*?)&', string)
The ?P<some_name> syntax defines a named capture group, which makes it a little easier to access the captured text: obj.group("type") returns 'PUZZLE'.
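Since the goal is to replace that part of the string, the same pattern can be passed to re.sub. This is a minimal sketch; the replacement text TYPE=a&PREFIX=b& is a made-up example value:

```python
import re

string = 'buf = O_strdup("ONE=001&TYPE=PUZZLE&PREFIX=EXPRESS&");'
pattern = re.compile(r'TYPE=(?P<type>[^&]*)&PREFIX=(?P<prefix>[^&]*)&')

match = pattern.search(string)
if match:
    # Named groups give access to the old values before they are replaced
    print(match.group("type"), match.group("prefix"))  # PUZZLE EXPRESS

# Hypothetical replacement text; everything outside the match is left untouched
replaced = pattern.sub('TYPE=a&PREFIX=b&', string)
print(replaced)  # buf = O_strdup("ONE=001&TYPE=a&PREFIX=b&");
```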
It might be better to use the functions urllib.parse.parse_qsl() and urllib.parse.urlencode() (urlparse.parse_qsl() and urllib.urlencode() in Python 2) instead of regular expressions. The code will be less error-prone:
from urllib.parse import parse_qsl, urlencode
s = "ONE=001&TYPE=PUZZLE&PREFIX=EXPRESS&"
a = parse_qsl(s)
d = dict(TYPE="a", PREFIX="b")
# Keep every key in its original order, swapping in the new value where d has one
print(urlencode([(key, d.get(key, val)) for key, val in a]))
# ONE=001&TYPE=a&PREFIX=b
