How to find data in JSON using Python and Watson Discovery News

{
"matching_results": 1264,
"results": [
{
"main_image_url": "https://s4.reutersmedia.net/resources_v2/images/rcom-default.png",
"enriched_text": {
"entities": [
{
"relevance": 0.33,
"disambiguation": {
"subtype": [
"Country"
]
},
"sentiment": {
"score": 0
},
"type": "Location",
"count": 1,
"text": "China"
},
{
"relevance": 0.33,
"disambiguation": {
"subtype": [
"Country"
]
},
"sentiment": {
"score": 0
},
The file is very large, so I want to extract only the "relevance" and "score" values using Python.
How can I fetch this information?

Regardless of how large it is, once parsed it is just a dictionary containing nested lists.
Iterate over the lists and extract the values by key.
for result in data['results']:
    for e in result['enriched_text']['entities']:
        print(e['relevance'])
        print(e['sentiment']['score'])
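A slightly fuller sketch of the loop above, assuming the response is available as a JSON string or file. The shortened SAMPLE string and the helper name iter_relevance_and_score are hypothetical and only serve as illustration; .get() is used on the assumption that some entities might lack a "sentiment" block.

```python
import json

# Hypothetical, shortened sample shaped like the response in the question.
SAMPLE = """
{
  "matching_results": 1264,
  "results": [
    {
      "enriched_text": {
        "entities": [
          {"relevance": 0.33, "sentiment": {"score": 0}, "type": "Location", "text": "China"},
          {"relevance": 0.33, "sentiment": {"score": 0}, "type": "Location"}
        ]
      }
    }
  ]
}
"""


def iter_relevance_and_score(data):
    """Yield (relevance, sentiment score) for every entity of every result."""
    for result in data.get("results", []):
        entities = result.get("enriched_text", {}).get("entities", [])
        for entity in entities:
            # .get() returns None instead of raising KeyError on a missing key.
            yield entity.get("relevance"), entity.get("sentiment", {}).get("score")


data = json.loads(SAMPLE)  # for a file on disk, use json.load(file_object)
pairs = list(iter_relevance_and_score(data))
for relevance, score in pairs:
    print(relevance, score)
```

Because the helper is a generator, the pairs can also be consumed one at a time without building a second large list.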

Related

Python Cubes OLAP Framework - How to sum a json column?

I recently started using the Python Cubes OLAP framework.
I'm trying to sum/average a JSON column in Postgres. How can I do this?
My database structure:
events
id
object_type
sn_name
spectra
id
snx_wavelengths (json column)
event_id
My model JSON:
{
"dimensions": [
{
"name": "event",
"levels": [
{
"name": "object_type",
"label": "Object Type",
"attributes": [
"object_type"
]
},
{
"name": "sn_name",
"label": "name",
"attributes": [
"sn_name"
]
}
]
},
{
"name": "spectra",
"levels": [
{
"name": "catalog_name",
"label": "Catalog Name",
"attributes": [
"catalog_name"
]
},
{
"name": "capture_date",
"label": "Capture Date",
"attributes": [
"capture_date"
]
}
]
},
{
"name": "date"
}
],
"cubes": [
{
"id": "uid",
"name": "14G31Yx98ZG8aEhFHjOWNNBmFOETg5APjZo5AiHaqog5YxLMK5",
"dimensions": [
"event",
"spectra",
"date"
],
"aggregates": [
{
"name": "event_snx_wavelengths_sum",
"function": "sum",
"measure": "event.snx_wavelengths"
},
{
"name": "record_count",
"function": "count"
}
],
"joins": [
{
"master": "14G31Yx98ZG8aEhFHjOWNNBmFOETg5APjZo5AiHaqog5YxLMK5.id",
"detail": "spectra.event_id"
},
],
"mappings": {
"event.sn_name": "sn_name",
"event.object_type": "object_type",
"spectra.catalog_name": "spectra.catalog_name",
"spectra.capture_date": "spectra.capture_date",
"event.snx_wavelengths": "spectra.snx_wavelengths",
"date": "spectra.capture_date"
}
}
]
}
I'm getting the following error:
Unknown attribute ''event.snx_wavelengths''
Can anyone help?
I already tried using MongoDB to do the sum, but had no success.

Using pandas to convert CSV into nested JSON with dynamic structure

I am new to Python and want to convert a CSV file into a JSON file. The JSON file is nested with a dynamic structure, and that structure is defined by the CSV header.
CSV input:
ID, Name, person_id/id_type, person_id/id_value,person_id_expiry_date,additional_info/0/name,additional_info/0/value,additional_info/1/name,additional_info/1/value,salary_info/details/0/grade,salary_info/details/0/payment,salary_info/details/0/amount,salary_info/details/1/next_promotion
1,Peter,PASSPORT,A452817,1-01-2055,Age,19,Gender,M,Manager,Monthly,8956.23,unknown
2,Jane,PASSPORT,B859804,2-01-2035,Age,38,Gender,F,Worker, Monthly,125980.1,unknown
Expected JSON output:
[
{
"ID": 1,
"Name": "Peter",
"person_id": {
"id_type": "PASSPORT",
"id_value": "A452817"
},
"person_id_expiry_date": "1-01-2055",
"additional_info": [
{
"name": "Age",
"value": 19
},
{
"name": "Gender",
"value": "M"
}
],
"salary_info": {
"details": [
{
"grade": "Manager",
"payment": "Monthly",
"amount": 8956.23
},
{
"next_promotion": "unknown"
}
]
}
},
{
"ID": 2,
"Name": "Jane",
"person_id": {
"id_type": "PASSPORT",
"id_value": "B859804"
},
"person_id_expiry_date": "2-01-2035",
"additional_info": [
{
"name": "Age",
"value": 38
},
{
"name": "Gender",
"value": "F"
}
],
"salary_info": {
"details": [
{
"grade": "Worker",
"payment": " Monthly",
"amount": 125980.1
},
{
"next_promotion": "unknown"
}
]
}
}
]
Can this be done with the existing pandas API, or do I have to write a lot of complex code to construct the JSON object dynamically? Thanks.
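One possible approach, sketched with the standard library only rather than pandas: split each header on "/", build nested dictionaries along that path, and afterwards turn every dictionary whose keys are all digits into a list. The CSV_TEXT sample is a shortened version of the input above, and the helper names (convert_scalar, lists_from_digit_keys, row_to_nested) are hypothetical. Guessing int/float from the cell text is an assumption about the desired typing.

```python
import csv
import io
import json

# Shortened version of the CSV from the question (fewer columns).
CSV_TEXT = """ID,Name,person_id/id_type,person_id/id_value,additional_info/0/name,additional_info/0/value,additional_info/1/name,additional_info/1/value
1,Peter,PASSPORT,A452817,Age,19,Gender,M
2,Jane,PASSPORT,B859804,Age,38,Gender,F
"""


def convert_scalar(text):
    """Turn '19' into 19 and '8956.23' into 8956.23; leave other text as-is."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def lists_from_digit_keys(node):
    """Recursively replace dicts keyed only by digits ('0', '1', ...) with lists."""
    if not isinstance(node, dict):
        return node
    converted = {key: lists_from_digit_keys(value) for key, value in node.items()}
    if converted and all(key.isdigit() for key in converted):
        return [converted[key] for key in sorted(converted, key=int)]
    return converted


def row_to_nested(row):
    """Build one nested object from a flat {'a/b/0/c': value} row."""
    root = {}
    for path, value in row.items():
        *parents, leaf = path.strip().split("/")
        node = root
        for part in parents:
            node = node.setdefault(part, {})  # create intermediate levels on demand
        node[leaf] = convert_scalar(value)
    return lists_from_digit_keys(root)


reader = csv.DictReader(io.StringIO(CSV_TEXT))
records = [row_to_nested(row) for row in reader]
print(json.dumps(records, indent=2))
```

The two-pass design (dictionaries first, lists afterwards) keeps the path-walking code free of special cases for list indices.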

How to browse and get only json position 0 in python [duplicate]

I have the following json output.
"detections": [
{
"source": "detection",
"uuid": "50594028",
"detectionTime": "2022-03-27T06:50:56Z",
"ingestionTime": "2022-03-27T07:04:50Z",
"filters": [
{
"id": "F2058",
"unique_id": "3638f7c0",
"level": "critical",
"name": "Possible Right-To-Left Override Attack",
"description": "Possible Right-To-Left Override Detected in the Filename",
"tactics": [
"TA0005"
],
"techniques": [
"T1036.002"
],
"highlightedObjects": [
{
"field": "fileName",
"type": "filename",
"value": [
"1465940311.,S=473394(NONAMEFL(Z00057-PI‮fdp.exe))"
]
},
{
"field": "filePathName",
"type": "fullpath",
"value": "/exports/10_19/mail/12/91/20193/new/1465940311.,S=473394(NONAMEFL(Z00057-PI‮fdp.exe))"
},
{
"field": "malName",
"type": "detection_name",
"value": "HEUR_RLOTRICK.A"
},
{
"field": "actResult",
"type": "text",
"value": [
"Passed"
]
},
{
"field": "scanType",
"type": "text",
"value": "REALTIME"
}
]
},
{
"id": "F2140",
"unique_id": "5a313874",
"level": "medium",
"name": "Malicious Software",
"description": "A malicious software was detected on an endpoint.",
"tactics": [],
"techniques": [],
"highlightedObjects": [
{
"field": "fileName",
"type": "filename",
"value": [
"1465940311.,S=473394(NONAMEFL(Z00057-PI‮fdp.exe))"
]
},
{
"field": "filePathName",
"type": "fullpath",
"value": "/exports/10_19/mail/12/91/rs001291-excluido-20193/new/1465940311.,S=473394(NONAMEFL(Z00057-PI‮fdp.exe))"
},
{
"field": "malName",
"type": "detection_name",
"value": "HEUR_RLOTRICK.A"
},
{
"field": "actResult",
"type": "text",
"value": [
"Passed"
]
},
{
"field": "scanType",
"type": "text",
"value": "REALTIME"
},
{
"field": "endpointIp",
"type": "ip",
"value": [
"xxx.xxx.xxx"
]
}
]
}
],
"entityType": "endpoint",
"entityName": "xxx(xxx.xxx.xxx)",
"endpoint": {
"name": "xxx",
"guid": "d1dd7e61",
"ips": [
"2xx.xxx.xxx"
]
}
}
Inside 'filters' there are two entries, one with level critical and one with level medium, and both have a 'name' key.
I want to print only the first name, but when I print 'name', it returns both names.
How do I print only the first one?
If I put the print inside the loop over filters, it returns both names:
If I put the print inside the loop over detections, it returns only the second 'name', which is not what I want:

If you only want to print the name of the first filter, why iterate over the filters at all? Just index the list and print the value under "name":
for d in r['detections']:
    print(d['filters'][0]['name'])
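A self-contained sketch of the same indexing, assuming a trimmed-down response with the shape shown in the question. The second detection with an empty "filters" list is made up to show why a guard can be useful, and first_filter_names is a hypothetical helper name.

```python
# Hypothetical, trimmed-down response with the same shape as in the question.
r = {
    "detections": [
        {
            "uuid": "50594028",
            "filters": [
                {"id": "F2058", "level": "critical",
                 "name": "Possible Right-To-Left Override Attack"},
                {"id": "F2140", "level": "medium", "name": "Malicious Software"},
            ],
        },
        # Made-up entry: indexing [0] on an empty list would raise IndexError.
        {"uuid": "no-filters", "filters": []},
    ]
}


def first_filter_names(response):
    """Return the name of the first filter of each detection that has one."""
    names = []
    for detection in response.get("detections", []):
        filters = detection.get("filters", [])
        if filters:  # skip detections without any filter
            names.append(filters[0]["name"])
    return names


names = first_filter_names(r)
print(names)
```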

How to convert multi-valued CSV to JSON

I have a CSV file with 4 columns of data, as below.
type,MetalType,Date,Acknowledge
Metal,abc123451,2018-05-26,Success
Metal,abc123452,2018-05-27,Success
Metal,abc123454,2018-05-28,Failure
Iron,abc123455,2018-05-29,Success
Iron,abc123456,2018-05-30,Failure
(I included a header in the example above, but my actual data has no header.)
How can I convert the above CSV file to JSON in the format below?
1st Column : belongs to --> "type": "Metal"
2nd Column : MetalType: "values" : "value": "abc123451"
3rd column : "Date": "values":"value": "2018-05-26"
4th Column : "Acknowledge": "values":"value": "Success"
All remaining fields have default values,
as in the format below:
{
"entities": [
{
"id": "XXXXXXX",
"type": "Metal",
"data": {
"attributes": {
"MetalType": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": "abc123451"
}
]
},
"Date": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": "2018-05-26"
}
]
},
"Acknowledge": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": "Success"
}
]
}
}
}
}
]
}

Even though jww is right, I built something for you:
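I import the CSV using pandas. Since your real file has no header row, I pass the column names explicitly:
import pandas as pd
df = pd.read_csv('data.csv', header=None, names=['type', 'MetalType', 'Date', 'Acknowledge'])
Then I create a template for the dictionaries you want to add: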
d_json = {"entities": []}
template = {
"id": "XXXXXXX",
"type": "",
"data": {
"attributes": {
"MetalType": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": ""
}
]
},
"Date": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": ""
}
]
},
"Acknowledge": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": ""
}
]
}
}
}
}
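Now you just need to fill in a fresh copy of the template for each row. The copy.deepcopy matters: with a plain d = template, every appended entry would be the same dictionary object, so all entities would end up holding the values of the last row.
import copy
for i in range(len(df)):
    d = copy.deepcopy(template)  # independent copy, including the nested dicts
    d['type'] = df['type'][i]
    d['data']['attributes']['MetalType']['values'][0]['value'] = df['MetalType'][i]
    d['data']['attributes']['Date']['values'][0]['value'] = df['Date'][i]
    d['data']['attributes']['Acknowledge']['values'][0]['value'] = df['Acknowledge'][i]
    d_json['entities'].append(d)
I know my way of iterating over the df is kind of ugly; maybe someone knows a cleaner way.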
Cheers!
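One possibly cleaner variant, sketched with the standard csv module instead of pandas: build a new dictionary per row in a small function, so no shared template has to be copied. The helper names wrap and row_to_entity are hypothetical, the sample rows are taken from the question, and the "source"/"locale"/"id" defaults are the placeholder values shown there.

```python
import csv
import io
import json

# Header-less sample rows copied from the question.
CSV_TEXT = """Metal,abc123451,2018-05-26,Success
Iron,abc123455,2018-05-29,Success
"""

COLUMNS = ["type", "MetalType", "Date", "Acknowledge"]


def wrap(value):
    """Wrap a raw value in the 'values' structure with the default fields."""
    return {"values": [{"source": "XYZ", "locale": "Australia", "value": value}]}


def row_to_entity(row):
    """Build a brand-new entity dict for one CSV row."""
    record = dict(zip(COLUMNS, row))
    return {
        "id": "XXXXXXX",
        "type": record["type"],
        "data": {
            # Every column except 'type' becomes an attribute.
            "attributes": {name: wrap(record[name]) for name in COLUMNS[1:]}
        },
    }


rows = csv.reader(io.StringIO(CSV_TEXT))  # for a file: csv.reader(open_file)
d_json = {"entities": [row_to_entity(row) for row in rows if row]}
print(json.dumps(d_json, indent=2))
```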

How to reshape DataFrame for nested JSON output

I am starting to work on some data manipulation and I need to create a new file (with new features) out of an old one. However, I could not figure out how to customize my DataFrame before calling ".to_json".
For example, I have a .csv as:
seller, customer, product, price
Roger, Will, 8129, 30
Roger, Markus, 1234, 100
Roger, Will, 2334, 50
Mike, Markus, 2295, 20
Mike, Albert, 1234, 100
...and I want to generate a .json file to support me in visualizing a network out of it. This should be more or less like:
{
"node": [
{"id":"Roger", "group": "seller" },
{"id":"Mike", "group": "seller" },
{"id":"Will", "group": "customer" },
{"id":"Markus", "group": "customer" },
{"id":"Albert", "group": "customer" }
],
"links":[
{"source":"Roger","target":"Will","product":8129,"price":30},
#...and so on
]
}
I tried to do something like:
df1 = pd.read_csv('file.csv')
seller_list = df1.seller.unique()
customer_list = df1.customer.unique()
...and I did get lists of unique items. However, I could not find out how to put them into a DataFrame in order to create a structure such as:
"node":[
...
{"id":"Mike", "group": "seller" },
{"id":"Markus", "group": "customer" },
...
]...#see above
Any support or hint on this is appreciated.

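This will be a two-step process. Here df is the DataFrame loaded from your CSV; if the file has spaces after the commas as in your sample, read it with skipinitialspace=True. First, create the list of nodes using melt + drop_duplicates + to_dict. The orient is spelled out as 'records' because recent pandas versions no longer accept the 'r' abbreviation:
nodes = df[['customer', 'seller']]\
    .melt(var_name='group', value_name='id')\
    .drop_duplicates()\
    .to_dict('records')
Now, create the list of links using rename + to_dict:
links = df.rename(columns={'seller': 'source', 'customer': 'target'}).to_dict('records')
Now, combine the data into one dictionary and dump it as JSON to a file:
import json
data = {'nodes': nodes, 'links': links}
with open('data.json', 'w') as f:
    json.dump(data, f, indent=4)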
Your data.json file should look like this -
{
"nodes": [
{
"id": "Will",
"group": "customer"
},
{
"id": "Markus",
"group": "customer"
},
{
"id": "Albert",
"group": "customer"
},
{
"id": "Roger",
"group": "seller"
},
{
"id": "Mike",
"group": "seller"
}
],
"links": [
{
"product": 8129,
"target": "Will",
"source": "Roger",
"price": 30
},
{
"product": 1234,
"target": "Markus",
"source": "Roger",
"price": 100
},
{
"product": 2334,
"target": "Will",
"source": "Roger",
"price": 50
},
{
"product": 2295,
"target": "Markus",
"source": "Mike",
"price": 20
},
{
"product": 1234,
"target": "Albert",
"source": "Mike",
"price": 100
}
]
}
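For comparison, a sketch of the same reshaping without pandas, using only the standard library. It swaps melt + drop_duplicates for an explicit loop with a "seen" set; the CSV text is the sample from the question, and converting "product" and "price" to int is an assumption about the desired types.

```python
import csv
import io
import json

# Sample data from the question, including the spaces after the commas.
CSV_TEXT = """seller, customer, product, price
Roger, Will, 8129, 30
Roger, Markus, 1234, 100
Roger, Will, 2334, 50
Mike, Markus, 2295, 20
Mike, Albert, 1234, 100
"""

# skipinitialspace=True drops the blank that follows each comma.
reader = csv.DictReader(io.StringIO(CSV_TEXT), skipinitialspace=True)
rows = list(reader)

# Nodes: one entry per unique (name, group) pair, in order of first appearance.
nodes = []
seen = set()
for group in ("customer", "seller"):
    for row in rows:
        key = (row[group], group)
        if key not in seen:
            seen.add(key)
            nodes.append({"id": row[group], "group": group})

# Links: one entry per CSV row, with seller/customer renamed to source/target.
links = [
    {
        "source": row["seller"],
        "target": row["customer"],
        "product": int(row["product"]),
        "price": int(row["price"]),
    }
    for row in rows
]

data = {"nodes": nodes, "links": links}
print(json.dumps(data, indent=4))
```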
