There is a JSON file with unknown structure.
I need to find an attribute of a known name in this file and, if it exists, return the name of its parent node, or nodes (if there are multiple instances of the attribute).
Example #1:
Input file:
{
  "attr1": {
    "attr2": {
      "attr3": "somevalue",
      "attr7": "someothervalue"
    }
  }
}
Attribute name: "attr7"
Expected return value: "attr2"
Example #2:
Input file:
{
"some": {
"deeply": {
"nested": {
"stuff": {
"array1": [
{"this":"value1"},
{"this":"value2"},
{"this":"value3"}
]
}
}
}
}
}
Attribute name: "this"
Expected return value: "array1"
Example #3:
(similar to #2 but with a duplicate)
Input file:
{
"some": {
"deeply": {
"nested": {
"this": {
"array1": [
{"this":"value1"},
{"this":"value2"},
{"this":"value3"}
]
}
}
}
}
}
Attribute name: "this"
Expected return value: "array1", "nested"
My starting point is:
import json

if __name__ == "__main__":
    jsonFileName = "file.json"
    attributeName = "this"
    with open(jsonFileName, "r") as jsonFile:
        jsonData = json.load(jsonFile)
    # ???
I found this question: "Access JSON element with parent name unknown", but it is not really applicable in my case because they know the structure of their data and I don't.
Any hints?
So, with a bit of a back and forth with a more experienced colleague I came up with the following solution:
def findKey(jsonData, keyName: str, accessPath: str):
    # Only dicts have keys to inspect; strings, numbers, etc. cannot contain the key
    if not isinstance(jsonData, dict):
        return None
    for key in jsonData.keys():
        if key == keyName:
            return accessPath + f"/{keyName};"
        if isinstance(jsonData[key], list):
            # Search every element of the list
            for jd in jsonData[key]:
                fk = findKey(jd, keyName, accessPath + "/[]" + key)
                if fk:
                    return fk
        elif isinstance(jsonData[key], dict):
            fk = findKey(jsonData[key], keyName, accessPath + "/{}" + key)
            if fk:
                return fk
    return None
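The function above returns an access path for the first match only. A minimal sketch of a variant that collects the parent name of every occurrence, as asked for in the three examples, might look like this. The name find_parents and its parameters are made up for illustration. It assumes that a key found at the top level should be reported with the parent None, and that items of a list count as children of the key that holds the list.

```python
import json


def find_parents(node, key_name, parent=None, found=None):
    """Collect the names of all nodes that directly contain key_name."""
    if found is None:
        found = []
    if isinstance(node, dict):
        # Record the parent once, even if several siblings match
        if key_name in node and parent not in found:
            found.append(parent)
        for key, value in node.items():
            find_parents(value, key_name, key, found)
    elif isinstance(node, list):
        # List items keep the name of the key that holds the list
        for item in node:
            find_parents(item, key_name, parent, found)
    return found


example1 = json.loads('{"attr1": {"attr2": {"attr3": "somevalue", "attr7": "someothervalue"}}}')
example3 = json.loads(
    '{"some": {"deeply": {"nested": {"this": {"array1": '
    '[{"this": "value1"}, {"this": "value2"}, {"this": "value3"}]}}}}}'
)

print(find_parents(example1, "attr7"))
print(find_parents(example3, "this"))
```

The parents come back in the order the traversal meets them, so for Example #3 "nested" is listed before "array1".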
Following "Update json nodes in Python using jsonpath", I would like to know how one might update the JSON data given a certain context.
So, say we pick the exact same JSON example:
{
"SchemeId": 10,
"nominations": [
{
"nominationId": 1
}
]
}
But this time I would like to double the original value, so some lambda function is needed that takes the current node value into account.
No need for lambdas; for example, to double SchemeId, something like this should work:
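import json
from jsonpath_ng import parse  # assumes the jsonpath_ng package

data = json.loads("""the json string above""")
jsonpath_expr = parse('$.SchemeId')
# Read the current value of the matched node, then write back twice that value
val = jsonpath_expr.find(data)[0].value
jsonpath_expr.update(data, val * 2)
print(json.dumps(data, indent=2))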
Output:
{
"SchemeId": 20,
"nominations": [
{
"nominationId": 1
}
]
}
Here is an example with a lambda expression:
import json
from jsonpath_ng import parse
settings = '''{
"choices": {
"atm": {
"cs": "Strom",
"en": "Tree"
},
"bar": {
"cs": "Dům",
"en": "House"
},
"sea": {
"cs": "Moře",
"en": "Sea"
}
}
}'''
json_data = json.loads(settings)
pattern = parse('$.choices.*')
def magic(f: dict, to_lang='cs'):
    return f[to_lang]

pattern.update(json_data,
               lambda data_field, data, field: data.update({field: magic(data[field])}))
json_data
returns
{
'choices': {
'atm': 'Strom',
'bar': 'Dům',
'sea': 'Moře'
}
}
I'm scraping a site, and the data I want is inside a script tag of an HTML page. I wrote a regular expression to find a match, but it seems I am doing it the wrong way.
Hub = {};
Hub.config = {
config: {},
get: function(key) {
if (key in this.config) {
return this.config[key];
} else {
return null;
}
},
set: function(key, val) {
this.config[key] = val;
}
};
Hub.config.set('sku', {
valCartInfo : {
itemId : '576938415361',
cartUrl: '//cart.mangolane.com/cart.htm'
},
apiRelateMarket : '//tui.mangolane.com/recommend?appid=16&count=4&itemid=576938415361',
apiAddCart : '//cart.mangolane.com/add_cart_item.htm?item_id=576938415361',
apiInsurance : '',
wholeSibUrl : '//detailskip.mangolane.com/service/getData/1/p1/item/detail/sib.htm?itemId=576938415361&sellerId=499095250&modules=dynStock,qrcode,viewer,price,duty,xmpPromotion,delivery,upp,activity,fqg,zjys,amountRestriction,couponActivity,soldQuantity,page,originalPrice,tradeContract',
areaLimit : '',
bigGroupUrl : '',
valPostFee : '',
coupon : {
couponApi : '//detailskip.mangolane.com/json/activity.htm?itemId=576938415361&sellerId=499095250',
couponWidgetDomain: '//assets.mgcdn.com',
cbUrl : '/cross.htm?type=weibo'
},
valItemInfo : {
defSelected: -1,
skuMap : {";20549:103189693;1627207:811754571;":{"price":"528.00","stock":"2","skuId":"4301611864655","oversold":false},
";20549:59280855;1627207:412796441;":{"price":"528.00","stock":"2","skuId":"4432149803707","oversold":false},
";20549:59280855;1627207:196576508;":{"price":"528.00","stock":"2","skuId":"4018119863100","oversold":false},
";20549:72380707;1627207:28341;":{"price":"528.00","stock":"2","skuId":"4166690818570","oversold":false},
";20549:418624880;1627207:28341;":{"price":"528.00","stock":"2","skuId":"4166690818566","oversold":false},
";20549:418624880;1627207:196576508;":{"price":"528.00","stock":"2","skuId":"4018119863098","oversold":false},
";20549:72380707;1627207:3224419;":{"price":"528.00","stock":"2","skuId":"4166690818571","oversold":false},
";20549:147478970;1627207:196576508;":{"price":"528.00","stock":"2","skuId":"4018119863094","oversold":false},
";20549:72380707;1627207:384366805;":{"price":"528.00","stock":"2","skuId":"4432149803708","oversold":false},
";20549:296172561;1627207:811754571;":{"price":"528.00","stock":"2","skuId":"4301611864659","oversold":false},
";20549:72380707;1627207:1150336209;":{"price":"528.00","stock":"2","skuId":"4301611864664","oversold":false},
";20549:147478970;1627207:93586002;":{"price":"528.00","stock":"2","skuId":"4018119863095","oversold":false}}
,propertyMemoMap: {"1627207:811754571":"黑色单里(预售) 年后2.29发货","1627207:93586002":"黑色加绒 现货","1627207:412796441":"黑色(兔毛) 现货","1627207:384366805":"米白色(兔毛) 现货","1627207:3224419":"驼色 现货","1627207:1150336209":"驼色单里(预售) 年后2.29发货","1627207:28341":"黑色 现货","1627207:196576508":"驼色加绒 现货"}
}
});
I need to get only the data passed to Hub.config.set('sku', ...).
I tried this, but it didn't work:
config_base_str = re.findall("Hub.config.set ({[\s\S]*?});", config)
where config is the string of data.
The period and parentheses have special meanings in regex. To match the literal characters, escape them with a backslash.
For example assuming the string:
config = """
Hub.config.set('sku', {
valCartInfo : {
itemId : '576938415361',
cartUrl: '//cart.mangolane.com/cart.htm'
},
.........
};
"""
If you only want the key, you can do something like this:
config_base_str = re.findall(r"Hub\.config\.set\('(\w*)", config) # ['sku']
If you want everything after the key within the brackets, you can do something like this instead:
config_base_str = re.findall(r"Hub\.config\.set\('\w*',\s*({[\s\S]*})", config) # ["{\n valCartInfo : {} ...}"]
https://regex101.com/r/QHdaG2/3/
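As a self-contained sketch of the two patterns, the snippet below runs them against a shortened copy of the script text. The sample string is an abbreviated stand-in for the real page, and anchoring the second pattern on the closing ");" is an extra assumption, added so that a greedy match does not run past the end of the set() call when more script follows.

```python
import re

# Shortened stand-in for the script content from the question
config = """
Hub.config.set('sku', {
    valCartInfo : {
        itemId : '576938415361',
        cartUrl: '//cart.mangolane.com/cart.htm'
    }
});
Hub.other = {};
"""

# Raw strings keep the backslashes intact for the regex engine
key = re.findall(r"Hub\.config\.set\('(\w*)", config)

# Capture everything between the first "{" and the "}" that precedes ");"
match = re.search(r"Hub\.config\.set\('\w*',\s*(\{[\s\S]*\})\s*\);", config)
body = match.group(1) if match else None

print(key)
print(body)
```

Note that the captured text is a JavaScript object literal with unquoted keys and single-quoted strings, so it is not valid JSON as it stands.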
How can I make a string from json text when the json text contains many, many quotation marks and string escapes?
For example, the following works:
json_string = """
{
"styles":[
{
"label":"Style",
"target":{
"label":"Target"
},
"overrides":{
"materialProperties":{
"CRYPTO_ID":{
"script":{
"binding":"name"
}
}
}
}
}
]
}
"""
However this does not, due to the escapes:
new_string = """
{
"styles":[
{
"label":"Style",
"target":{
"label":"Target",
"objectName":"*"
},
"overrides":{
"materialProperties":{
"perObj":{
"script":{
"code":"cvex myFn(string myObj=\"\"; export string perObj=\"\") { perObj = myObj; } ",
"bindings":{
"myObj":"myObj"
}
}
}
}
}
}
]
}
"""
Is there a smart way to break this up? I've had no luck breaking it out into chunks and re-assembling to form the same thing when joined and printed.
The text itself is valid JSON. However, in a regular Python string literal the backslashes are processed as escape characters, so each \" becomes a plain " before json.loads ever sees it.
Use a raw string by prefixing your string with r:
import json
new_string = r"""
{
"styles":[
{
"label":"Style",
"target":{
"label":"Target",
"objectName":"*"
},
"overrides":{
"materialProperties":{
"perObj":{
"script":{
"code":"cvex myFn(string myObj=\"\"; export string perObj=\"\") { perObj = myObj; } ",
"bindings":{
"myObj":"myObj"
}
}
}
}
}
}
]
}
"""
json.loads( new_string )
Or escape your \ characters:
import json
new_string = """
{
"styles":[
{
"label":"Style",
"target":{
"label":"Target",
"objectName":"*"
},
"overrides":{
"materialProperties":{
"perObj":{
"script":{
"code":"cvex myFn(string myObj=\\"\\"; export string perObj=\\"\\") { perObj = myObj; } ",
"bindings":{
"myObj":"myObj"
}
}
}
}
}
}
]
}
"""
json.loads( new_string )
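To see the difference between the variants in isolation, here is a small sketch using a one-line document instead of the full example. The variable names raw, escaped and broken are made up for illustration.

```python
import json

# Raw literal: the backslashes reach the JSON parser unchanged
raw = r'{"code": "a=\"\""}'

# Regular literal with doubled backslashes: same characters as the raw one
escaped = '{"code": "a=\\"\\""}'

# Regular literal with single backslashes: Python removes them,
# leaving unescaped quotes inside the JSON string value
broken = '{"code": "a=\"\""}'

print(raw == escaped)
print(json.loads(raw))

try:
    json.loads(broken)
    broken_is_valid = True
except json.JSONDecodeError:
    broken_is_valid = False
print(broken_is_valid)
```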
I would recommend reading from an actual JSON file rather than embedding it into your Python code:
with open('path/to/file.json') as f:
    json_string = f.read()
Or, if you need the JSON parsed into Python objects (dicts, lists etc.):
import json
with open('path/to/file.json') as f:
    json_data = json.load(f)
I have a JSON input that consists of a list of dictionaries as unicode characters:
Example:
input = u'[{
attributes: {
NAME: "Name_1ĂĂÎÎ",
TYPE: "Tip1",
LOC_JUD: "Bucharest",
LAT_LON: "234343/432545",
S70: "2342345",
MAP: "Map_one",
SCH: "1:5000",
SURSA: "PPP"
}
}, {
attributes: {
NAME: "NAME_2șțț",
TYPE: "Tip2",
LOC_JUD: "cea",
LAT_LON: "123/54645",
S70: "4324",
MAP: "Map_two",
SCH: "1:578000",
SURSA: "PPP"
}
}
]
'
How can I parse this string into a list of dictionaries? I tried to do this using:
import json
json_d = json.dumps(input)
print type(json_d) # string object / Not list of dicts
json_obj = json.loads(json_d) # unicode object / Not list of dicts
I cannot parse the contents of the JSON:
print json_obj[0]["attributes"]
TypeError: string indices must be integers
I am using Python 2.7.11. Thanks for any help!
Try a simplified example:
s = '[{attributes: { a: "foo", b: "bar" } }]'
The main problem is that your string is not valid JSON:
>>> json.loads(s)
[...]
JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 3 (char 2)
If the input is generated by you, then fix it. If it comes from somewhere else, then you will need to edit it before loading it with the json module.
Note how, given proper JSON, json.loads() works as expected:
>>> s = '[{"attributes": { "a": "foo", "b": "bar" } }]'
>>> json.loads(s)
[{'attributes': {'a': 'foo', 'b': 'bar'}}]
>>> type(json.loads(s))
list
As others have mentioned, your input data is not JSON. Ideally, that should be fixed upstream so that you do get valid JSON.
However, if that's out of your control you can convert that data to JSON.
The main problem is all those unquoted keys. We can fix that by using a regex to search for a valid name in the first field on each line. If a valid name is found we wrap it in double quotes.
import json
import re
source = u'''[{
attributes: {
NAME: "Name_1ĂĂÎÎ",
TYPE: "Tip1",
LOC_JUD: "Bucharest",
LAT_LON: "234343/432545",
S70: "2342345",
MAP: "Map_one",
SCH: "1:5000",
SURSA: "PPP"
}
}, {
attributes: {
NAME: "NAME_2șțț",
TYPE: "Tip2",
LOC_JUD: "cea",
LAT_LON: "123/54645",
S70: "4324",
MAP: "Map_two",
SCH: "1:578000",
SURSA: "PPP"
}
}
]
'''
# Split source into lines, then split lines into colon-separated fields
a = [s.strip().split(': ') for s in source.splitlines()]
# Wrap names in first field in double quotes
valid_name = re.compile(r'(^\w+$)')
for row in a:
    row[0] = valid_name.sub(r'"\1"', row[0])
# Recombine the data and load it
data = json.loads(' '.join([': '.join(row) for row in a]))
# Test
print data[0]["attributes"]
print '- ' * 30
print json.dumps(data, indent=4, ensure_ascii=False)
output
{u'LOC_JUD': u'Bucharest', u'NAME': u'Name_1\u0102\u0102\xce\xce', u'MAP': u'Map_one', u'SURSA': u'PPP', u'S70': u'2342345', u'TYPE': u'Tip1', u'LAT_LON': u'234343/432545', u'SCH': u'1:5000'}
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
[
{
"attributes": {
"LOC_JUD": "Bucharest",
"NAME": "Name_1ĂĂÎÎ",
"MAP": "Map_one",
"SURSA": "PPP",
"S70": "2342345",
"TYPE": "Tip1",
"LAT_LON": "234343/432545",
"SCH": "1:5000"
}
},
{
"attributes": {
"LOC_JUD": "cea",
"NAME": "NAME_2șțț",
"MAP": "Map_two",
"SURSA": "PPP",
"S70": "4324",
"TYPE": "Tip2",
"LAT_LON": "123/54645",
"SCH": "1:578000"
}
}
]
Note that this code is a little fragile. It works with data in the format shown in the question, but it won't work if there is more than one key-value pair on a line.
As I said earlier, the best way to fix this problem is upstream, where the non-JSON is being produced.
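If the upstream data cannot be fixed and several pairs may share a line, one possible alternative is to quote the keys with a single substitution over the whole text. This is only a sketch, written for Python 3 rather than the 2.7 used in the question. The helper name quote_keys is made up, and it assumes every key is a bare identifier that directly follows "{" or "," and that no string value contains text shaped like ", name:" which would be rewritten by mistake.

```python
import json
import re


def quote_keys(text):
    # Wrap bare identifiers that follow "{" or "," and precede ":" in double quotes
    return re.sub(r'([{,]\s*)([A-Za-z_]\w*)\s*:', r'\1"\2":', text)


# Several key-value pairs on one line, including a value that contains a colon
source = '[{attributes: { a: "foo", b: "bar" } }, {attributes: { a: "1:5000" } }]'

data = json.loads(quote_keys(source))
print(data[0]["attributes"])
print(data[1]["attributes"]["a"])
```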