How to get certain elements using jsonpath + Python?

I have a file named source.json, the content is
{
    "msg": "OK",
    "result": {
        "data": {
            "articles": [
                {
                    "idtArticle": "CF00002",
                    "promotionService": {
                        "bundleSales": [
                            {
                                "code": "201900001"
                            },
                            {
                                "code": "201900002"
                            }
                        ]
                    }
                },
                {
                    "idtArticle": "CF00003",
                    "promotionService": {
                        "bundleSales": [
                            {
                                "code": "201900001"
                            },
                            {
                                "code": "201900003"
                            }
                        ]
                    }
                }
            ]
        }
    }
}
I have the following Python code:
import json
import jsonpath

json_source = 'source.json'
with open(json_source, encoding='utf-8') as f:
    root = json.loads(f.read())

if __name__ == "__main__":
    result = jsonpath.jsonpath(root, """$..articles[?(@.idtArticle == "CF00002")]""")
    print(result)
The code works and I can get the article whose idtArticle is CF00002, but how do I get the list of articles where one of the bundleSales codes is 201900001?
Appreciate all the help!

jsonpath does not support projections, so I would do what you want in plain Python.
import json

json_source = 'source.json'
with open(json_source, encoding='utf-8') as f:
    root = json.loads(f.read())

if __name__ == "__main__":
    articles = root['result']['data']['articles']
    result = []
    for article in articles:
        bundleSales = article['promotionService']['bundleSales']
        for bundleSale in bundleSales:
            if bundleSale['code'] == "201900001":
                result.append(article['idtArticle'])
    print(result)
You can test it with an extended example:
{
    "msg": "OK",
    "result": {
        "data": {
            "articles": [
                {
                    "idtArticle": "CF00002",
                    "promotionService": {
                        "bundleSales": [
                            {
                                "code": "201900001"
                            },
                            {
                                "code": "201900002"
                            }
                        ]
                    }
                },
                {
                    "idtArticle": "CF00003",
                    "promotionService": {
                        "bundleSales": [
                            {
                                "code": "201900001"
                            },
                            {
                                "code": "201900003"
                            }
                        ]
                    }
                },
                {
                    "idtArticle": "CF00004",
                    "promotionService": {
                        "bundleSales": [
                            {
                                "code": "201900002"
                            },
                            {
                                "code": "201900003"
                            }
                        ]
                    }
                }
            ]
        }
    }
}
It prints ['CF00002', 'CF00003'].
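As a side note, the same lookup can be written as a single comprehension; using any() also guards against listing an article twice if more than one of its codes matched. A sketch over the structure above:

```python
# `root` as loaded from source.json in the answer above (abbreviated literal)
root = {"result": {"data": {"articles": [
    {"idtArticle": "CF00002", "promotionService": {"bundleSales": [
        {"code": "201900001"}, {"code": "201900002"}]}},
    {"idtArticle": "CF00003", "promotionService": {"bundleSales": [
        {"code": "201900001"}, {"code": "201900003"}]}},
]}}}

# any() stops at the first matching code, so each article appears at most once
result = [
    article['idtArticle']
    for article in root['result']['data']['articles']
    if any(sale['code'] == "201900001"
           for sale in article['promotionService']['bundleSales'])
]
print(result)  # ['CF00002', 'CF00003']
```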

Related

update nested json object in python

I have a JSON file named input, as follows:
{
    "abc": {
        "dbc": {
            "type": "string",
            "metadata": {
                "description": "Name of the namespace"
            }
        },
        "fgh": {
            "type": "string",
            "metadata": {
                "description": "Name of the Topic"
            }
        }
    },
    "resources": [
        {
            "sku": {
                "name": "[parameters('sku')]"
            },
            "properties": {},
            "resources": [
                {
                    "resources": [
                        {
                            "resources": [
                                {
                                    "properties": {
                                        "filterType": "SqlFilter",
                                        "sqlFilter": {
                                            "sqlExpression": "HAI"
                                        }
                                    }
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}
I want the "sqlExpression": "HAI" value to be replaced with BYE, as below:
"sqlExpression": "BYE"
I want Python code to do it. I tried the line below, but it does not work:
input['resources'][0]['resources'][0]['resources'][0]['resources'][0][properties][0][sqlFilter][0][sqlExpression][0]='BYE'
inp = {
    "abc": {
        "dbc": {
            "type": "string",
            "metadata": {
                "description": "Name of the namespace"
            }
        },
        "fgh": {
            "type": "string",
            "metadata": {
                "description": "Name of the Topic"
            }
        }
    },
    "resources": [
        {
            "sku": {
                "name": "[parameters('sku')]"
            },
            "properties": {},
            "resources": [
                {
                    "resources": [
                        {
                            "resources": [
                                {
                                    "properties": {
                                        "filterType": "SqlFilter",
                                        "sqlFilter": {
                                            "sqlExpression": "HAI"
                                        }
                                    }
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}
inp['resources'][0]['resources'][0]['resources'][0]['resources'][0]['properties']['sqlFilter']['sqlExpression']='BYE'
print(inp)
Result
{'abc': {'dbc': ...truncated... {'sqlExpression': 'BYE'}}}]}]}]}]}
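Worth noting, the hard-coded index chain breaks as soon as the nesting changes. A recursive walk (a sketch, not part of the original answer) replaces the key wherever it occurs:

```python
def replace_key(node, key, new_value):
    """Recursively replace every value stored under `key` in nested dicts/lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                node[k] = new_value
            else:
                replace_key(v, key, new_value)
    elif isinstance(node, list):
        for item in node:
            replace_key(item, key, new_value)

# works regardless of how deeply the key is buried (toy example)
doc = {"resources": [{"resources": [{"properties": {"sqlFilter": {"sqlExpression": "HAI"}}}]}]}
replace_key(doc, "sqlExpression", "BYE")
print(doc["resources"][0]["resources"][0]["properties"]["sqlFilter"]["sqlExpression"])  # BYE
```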

Extract specific column from csv stored in S3

I am querying Athena through Lambda, and the results are stored in CSV format in an S3 bucket.
The CSV file has two columns, eventTime and instanceId.
I am reading the CSV file via a function in my Lambda handler:
def read_instanceids(path):
    s3 = boto3.resource('s3')
    bucket = s3.Bucket('aws-athena-query-results-mybucket-us-east-1')
    obj = bucket.Object(key=path)
    response = obj.get()
    lines = response['Body'].read().decode('utf-8').split()
    return lines
Output:
[
    "\"eventTime\",\"instanceId\"",
    "\"2021-09-27T19:46:08Z\",\"\"\"i-0aa1f4dd\"\"\"",
    "\"2021-09-27T21:04:13Z\",\"\"\"i-0465c287\"\"\"",
    "\"2021-09-27T21:10:48Z\",\"\"\"i-08b75f79\"\"\"",
    "\"2021-09-27T19:40:43Z\",\"\"\"i-0456700b\"\"\"",
    "\"2021-03-29T21:58:40Z\",\"\"\"i-0724f99f\"\"\"",
    "\"2021-03-29T23:27:44Z\",\"\"\"i-0fafbe64\"\"\"",
    "\"2021-03-29T21:41:12Z\",\"\"\"i-0064a8552\"\"\"",
    "\"2021-03-29T23:19:09Z\",\"\"\"i-07f5f08e5\"\"\""
]
I want to store only the instance IDs in one array.
How can I achieve that? I can't use Pandas/NumPy.
If I use get_query_results and return the response, it is in the format below:
[
{
"Data": [
{
"VarCharValue": "eventTime"
},
{
"VarCharValue": "instanceId"
}
]
},
{
"Data": [
{
"VarCharValue": "2021-09-23T22:36:15Z"
},
{
"VarCharValue": "\"i-053090803\""
}
]
},
{
"Data": [
{
"VarCharValue": "2021-03-29T21:58:40Z"
},
{
"VarCharValue": "\"i-0724f62a\""
}
]
},
{
"Data": [
{
"VarCharValue": "2021-03-29T21:41:12Z"
},
{
"VarCharValue": "\"i-552\""
}
]
},
{
"Data": [
{
"VarCharValue": "2021-03-29T23:19:09Z"
},
{
"VarCharValue": "\"i-07f4e5\""
}
]
},
{
"Data": [
{
"VarCharValue": "2021-03-29T23:03:09Z"
},
{
"VarCharValue": "\"i-0eb453\""
}
]
},
{
"Data": [
{
"VarCharValue": "2021-03-30T19:18:11Z"
},
{
"VarCharValue": "\"i-062120\""
}
]
},
{
"Data": [
{
"VarCharValue": "2021-03-30T18:15:26Z"
},
{
"VarCharValue": "\"i-0121a04\""
}
]
},
{
"Data": [
{
"VarCharValue": "2021-03-29T23:27:44Z"
},
{
"VarCharValue": "\"i-0f213\""
}
]
},
{
"Data": [
{
"VarCharValue": "2021-03-30T18:07:05Z"
},
{
"VarCharValue": "\"i-0ee19d8\""
}
]
},
{
"Data": [
{
"VarCharValue": "2021-04-28T14:49:22Z"
},
{
"VarCharValue": "\"i-04ad3c29\""
}
]
},
{
"Data": [
{
"VarCharValue": "2021-04-28T14:38:43Z"
},
{
"VarCharValue": "\"i-7c6166\""
}
]
},
{
"Data": [
{
"VarCharValue": "2021-03-30T19:13:42Z"
},
{
"VarCharValue": "\"i-07bc579d\""
}
]
},
{
"Data": [
{
"VarCharValue": "2021-04-29T19:47:34Z"
},
{
"VarCharValue": "\"i-0b8bc7df5\""
}
]
}
]
You can use the result returned from Amazon Athena via get_query_results().
If the data variable contains the JSON shown in your question, you can extract a list of the instances with:
rows = [row['Data'][1]['VarCharValue'].replace('"', '') for row in data]
print(rows)
The output is:
['instanceId', 'i-053090803', 'i-0724f62a', 'i-552', 'i-07f4e5', 'i-0eb453', 'i-062120', 'i-0121a04', 'i-0f213', 'i-0ee19d8', 'i-04ad3c29', 'i-7c6166', 'i-07bc579d', 'i-0b8bc7df5']
You can skip the column header by referencing: rows[1:]
If your list were valid Python, you could do:
l = [ "eventTime",
"instanceId",
"2021-09-27T19:46:08Z",
"i-0aa1f4dd",
"2021-09-27T21:04:13Z",
"""i-0465c287""",
"2021-09-27T21:10:48Z",
"""i-08b75f79""",
"2021-09-27T19:40:43Z",
"""i-0456700b""",
"2021-03-29T21:58:40Z",
"""i-0724f99f""",
"2021-03-29T23:27:44Z",
"""i-0fafbe64""",
"2021-03-29T21:41:12Z",
"""i-0064a8552""",
"2021-03-29T23:19:09Z",
"""i-07f5f08e5""" ]
print(l[2:][1::2])
['i-0aa1f4dd', 'i-0465c287', 'i-08b75f79', 'i-0456700b', 'i-0724f99f', 'i-0fafbe64', 'i-0064a8552', 'i-07f5f08e5']
Python has a csv module in the standard library: https://docs.python.org/3/library/csv.html
But in this use case, if the instance IDs don't contain commas, you can split each line on the comma, take the second field, and strip the double quotes.
def read_instanceids(path):
    s3 = boto3.resource('s3')
    bucket = s3.Bucket('aws-athena-query-results-mybucket-us-east-1')
    obj = bucket.Object(key=path)
    response = obj.get()
    lines = response['Body'].read().decode('utf-8').split()
    return [line.split(',')[1].strip('"') for line in lines[1:]]
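If the instance IDs could ever contain commas or embedded quotes, the csv module handles the quoting rules for you. A sketch, assuming the same response body shown in the question:

```python
import csv
import io

def parse_instance_ids(body_text):
    """Parse the CSV body and return the instanceId column, header skipped."""
    reader = csv.reader(io.StringIO(body_text))
    next(reader)  # skip the "eventTime","instanceId" header row
    # Athena wraps each id in literal quotes ("""i-..."""), so strip them too
    return [row[1].strip('"') for row in reader]

sample = '"eventTime","instanceId"\n"2021-09-27T19:46:08Z","""i-0aa1f4dd"""\n'
print(parse_instance_ids(sample))  # ['i-0aa1f4dd']
```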

Property ID error when trying to insert data into MongoDB through Python notebook

I'm trying to insert data into a MongoDB database through the following command, written in Python (I'm using MongoDB's terminal to insert it):
db.sensores.insert(
I've tried editing it like this, but now I'm getting a different error:
{
"timestamp": "2020-05-25T10:30:00Z",
"sensor_id": 1,
"location_id": 1,
"Ubicacion": "Valladolid",
"Coordenadas": "41.638597, 4.740186",
"Medidas": [
{
"tipo_medida":"Temperatura",
"valor":22.08,
"unidad":"ºC"
},
{
"tipo_medida":"Humedad_relativa",
"valor":34.92,
"unidad":"%"
}
]
},
{
"timestamp": "2020-05-28T11:30:00Z",
"sensor_id": 1,
"location_id": 2,
"Ubicacion": "Sevilla",
"Coordenadas": "37.409311, -5.949939",
"Medidas": [
{
"tipo_medida":"Temperatura",
"valor":21.12,
"unidad":"ºC"
},
{
"tipo_medida":"Humedad_relativa",
"valor":37.7,
"unidad":"%"
}
]
},
{
"timestamp": "2020-05-28T1:30:00Z",
"sensor_id": 2,
"location_id": 2,
"Ubicacion":"Sevilla",
"Coordenadas": "37.409311, -5.949939",
"medidas":[
{
"tipo_medida":"Emision_CO2",
"valor":2.102,
"unidad":"gCO2/m2"
},
{
"tipo_medida":"Consumo_electrico",
"valor":0.00272,
"unidad":"kWh/m2"
}
]
},
{
"timestamp": "2020-05-25T10:30:00Z",
"sensor_id": 2,
"location_id": 1,
"Ubicacion": "Valladolid",
"Coordenadas": "41.638597, 4.740186",
"medidas":[
{
"tipo_medida":"Emision_CO2",
"valor":1.626,
"unidad":"gCO2/m2"
},
{
"tipo_medida":"Consumo_electrico",
"valor":0.00146,
"unidad":"kWh/m2"
}
]
}
]
)
Now I'm getting the following error:
"Parse error on line 19:
...%"
}
]
},
{
"times
--------------------^
Expecting 'EOF', got ','"
I've tried going over all the brackets and punctuation but haven't been able to work out what I'm doing wrong. Does anybody know what the error means?
There are multiple issues with your JSON. Here is a valid JSON version of your data; try it with this:
[
{
"timestamp":"2020-05-25T10:30:00Z",
"sensor_id":1,
"location_id":1,
"Ubicacion":"Valladolid",
"Coordenadas":"41.638597, 4.740186",
"Medidas":[
{
"tipo_medida":"Temperatura",
"valor":22.08,
"unidad":"ºC"
},
{
"tipo_medida":"Humedad_relativa",
"valor":34.92,
"unidad":"%"
}
]
},
{
"timestamp":"2020-05-28T11:30:00Z",
"sensor_id":1,
"location_id":2,
"Ubicacion":"Sevilla",
"Coordenadas":"37.409311, -5.949939",
"Medidas":[
{
"tipo_medida":"Temperatura",
"valor":21.12,
"unidad":"ºC"
},
{
"tipo_medida":"Humedad_relativa",
"valor":37.7,
"unidad":"%"
}
]
},
{
"timestamp":"2020-05-28T1:30:00Z",
"sensor_id":2,
"location_id":2,
"Ubicacion":"Sevilla",
"Coordenadas":"37.409311, -5.949939",
"medidas":[
{
"tipo_medida":"Emision_CO2",
"valor":2.102,
"unidad":"gCO2/m2"
},
{
"tipo_medida":"Consumo_electrico",
"valor":0.00272,
"unidad":"kWh/m2"
}
]
},
{
"timestamp":"2020-05-25T10:30:00Z",
"sensor_id":2,
"location_id":1,
"Ubicacion":"Valladolid",
"Coordenadas":"41.638597, 4.740186",
"medidas":[
{
"tipo_medida":"Emision_CO2",
"valor":1.626,
"unidad":"gCO2/m2"
},
{
"tipo_medida":"Consumo_electrico",
"valor":0.00146,
"unidad":"kWh/m2"
}
]
}
]
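One way to catch bracket mistakes like this before they reach the Mongo shell is to round-trip the document through Python's json module, since json.JSONDecodeError reports the offending line and column. A sketch, not part of the original answer:

```python
import json

def check_json(text):
    """Return the parsed documents, or a human-readable parse error."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        return f"parse error at line {e.lineno}, column {e.colno}: {e.msg}"

print(check_json('[{"sensor_id": 1}]'))         # [{'sensor_id': 1}]
print(check_json('{"sensor_id": 1}, {"x": 2}'))  # reports "Extra data", much like the shell's error
```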

Unhashable type 'dict' when trying to send an Elasticsearch query

I keep getting the following error in Python:
Exception has occurred: TypeError: unhashable type: 'dict'
It points at line 92:
"should": [],
"must_not": []
This is the query:
res = es.search(
scroll = '2m',
index = "logstash-*",
body = {
{
"aggs": {
"2": {
"terms": {
"field": "src_ip.keyword",
"size": 50,
"order": {
"1": "desc"
}
},
"aggs": {
"1": {
"cardinality": {
"field": "src_ip.keyword"
}
}
}
}
},
"size": 0,
"_source": {
"excludes": []
},
"stored_fields": [
"*"
],
"script_fields": {},
"docvalue_fields": [
{
"field": "@timestamp",
"format": "date_time"
},
{
"field": "flow.start",
"format": "date_time"
},
{
"field": "timestamp",
"format": "date_time"
},
{
"field": "tls.notafter",
"format": "date_time"
},
{
"field": "tls.notbefore",
"format": "date_time"
}
],
"query": {
"bool": {
"must": [
{
"range": {
"@timestamp": {
"gte": 1555777931992,
"lte": 1558369931992,
"format": "epoch_millis"
}
}
}
],
"filter": [
{
"match_all": {}
}
],
"should": [],
"must_not": []
}
}
}
})
The value of body is a set ({ } without key-value pairs is a set literal; e.g., {1, 2} is a set). Inside this set you have a dictionary.
Items in a set have to be hashable, and a dictionary isn't.
As the comment from @Carcigenicate says, it seems like a typo of having {{ }} instead of { } for the value of body.
The Elasticsearch documentation shows that body should be a dictionary.
More about sets in the Python docs.
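A minimal reproduction of the error, for illustration:

```python
query = {"match_all": {}}
try:
    body = {query}  # braces around a lone dict build a set; dicts can't be set members
except TypeError as e:
    print(e)  # unhashable type: 'dict'

body = query  # correct: pass the dictionary itself
```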

Customize Python JSON object_hook

I am trying to customize JSON data using object_hook in Python 3, but do not know how to get started; any pointers are much appreciated. I want to introduce a new key and move the existing data under it.
I am trying to convert below json text:
{
    "output": [
        {
            "Id": "101",
            "purpose": "xyz text",
            "array": [
                {
                    "data": "abcd"
                },
                {
                    "data": "ef gh ij"
                }
            ]
        },
        {
            "Id": "102",
            "purpose": "11xyz text",
            "array": [
                {
                    "data": "abcd"
                },
                {
                    "data": "java"
                },
                {
                    "data": "ef gh ij"
                }
            ]
        }
    ]
}
to
{
    "output": [
        {
            "Id": "101",
            "mydata": {
                "purpose": "xyz text",
                "array": [
                    {
                        "data": "abcd"
                    },
                    {
                        "data": "ef gh ij"
                    }
                ]
            }
        },
        {
            "Id": "102",
            "mydata": {
                "purpose": "11xyz text",
                "array": [
                    {
                        "data": "abcd"
                    },
                    {
                        "data": "java"
                    },
                    {
                        "data": "ef gh ij"
                    }
                ]
            }
        }
    ]
}
My Python JSON object hook is defined as:
class JSONObject:
    def __init__(self, dict):
        vars(self).update(dict)

    def toJSON(self):
        return json.dumps(self, default=lambda o: o.__dict__,
                          sort_keys=True, indent=4)
You can specify a custom object_pairs_hook (input_json is the string with your input JSON).
def mydata_hook(obj):
    obj_d = dict(obj)
    if 'Id' in obj_d:
        return {'Id': obj_d['Id'], 'mydata': {k: v for k, v in obj_d.items() if 'Id' not in k}}
    else:
        return obj_d

print(json.dumps(json.loads(input_json, object_pairs_hook=mydata_hook), indent=2))
And the output:
{
  "output": [
    {
      "mydata": {
        "array": [
          {
            "data": "abcd"
          },
          {
            "data": "ef gh ij"
          }
        ],
        "purpose": "xyz text"
      },
      "Id": "101"
    },
    {
      "mydata": {
        "array": [
          {
            "data": "abcd"
          },
          {
            "data": "java"
          },
          {
            "data": "ef gh ij"
          }
        ],
        "purpose": "11xyz text"
      },
      "Id": "102"
    }
  ]
}
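Since the question mentions object_hook specifically: a plain object_hook works the same way here, because it receives each decoded object as a dict, innermost objects first. A sketch; input_json below is a cut-down sample, not the full input:

```python
import json

def mydata_hook(d):
    # object_hook is called with every decoded JSON object, innermost first
    if 'Id' in d:
        return {'Id': d['Id'], 'mydata': {k: v for k, v in d.items() if k != 'Id'}}
    return d

input_json = '{"output": [{"Id": "101", "purpose": "xyz text"}]}'
result = json.loads(input_json, object_hook=mydata_hook)
print(result)  # {'output': [{'Id': '101', 'mydata': {'purpose': 'xyz text'}}]}
```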
