I'd like to preface this by saying that I'm VERY new to Python, so I apologize if the answer is obvious. :) I have a Python script that makes a couple of API calls and returns JSON data. The output currently looks similar to this:
IP Information is below:
{
"id": 318283,
"name": "Name",
"type": "IP4Address",
"properties": "VLAN=5|DeviceName=Device|Notes=This is a description address|Administration=Team that admins the system|Location-Code=Location|address=1.2.3.4|state=STATIC|"
}
IP Subnet Range Information is below:
{
"id": 118836,
"name": "VLAN Description",
"type": "IP4Network",
"properties": "Location-Code=Location|Notes=Description of the subnet|CIDR=1.2.3.0/25|allowDuplicateHost=disable|inheritAllowDuplicateHost=true|pingBeforeAssign=disable|inheritPingBeforeAssign=true|inheritDefaultDomains=true|defaultView=118346|inheritDefaultView=true|inheritDNSRestrictions=true|"
}
I'd like the response to not have the "id" string, and I'd like to split the |-delimited key=value pairs in properties into their own entries, so that it looks like this:
{
"id": 318283,
"name": "Name",
"type": "IP4Address",
"properties":{
"VLAN": "5",
"DeviceName": "Device",
"Notes": "Description",
"Administration": "Admin team",
"Location-Code": "Location",
"Address": "1.2.3.4",
"State": "STATIC"
}
}
{
"id": 118836,
"name": "Subnet name",
"type": "IP4Network",
"properties":{
"Location-Code": "Location",
"Notes": "Subnet description. ",
"CIDR": "1.2.3.0/25",
"allowDuplicateHost": "disable",
"inheritAllowDuplicateHost": "true",
"pingBeforeAssign": "disable",
"inheritPingBeforeAssign": "true",
"inheritDefaultDomains": "true",
"defaultView": "118346",
"inheritDefaultView": "true",
"inheritDNSRestrictions": "true"
}
}
Any suggestions are greatly appreciated!
Here's a way to handle your pipe-separated properties:
a = { "id": 318283, "name": "Name", "type": "IP4Address", "properties": "VLAN=5|DeviceName=Device|Notes=This is a description address|Administration=Team that admins the system|Location-Code=Location|address=1.2.3.4|state=STATIC|" }
new_properties = {}
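for key_val_pair in a['properties'].split("|"):
    # Skip the empty piece left by the trailing "|"
    if key_val_pair.strip() == "":
        continue
    # Split on the first "=" only, so values containing "=" stay intact
    key, val = key_val_pair.split("=", 1)
    new_properties[key] = val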
a["properties"] = new_properties
print(a)
Output:
{'id': 318283, 'name': 'Name', 'type': 'IP4Address', 'properties': {'VLAN': '5', 'DeviceName': 'Device', 'Notes': 'This is a description address', 'Administration': 'Team that admins the system', 'Location-Code': 'Location', 'address': '1.2.3.4', 'state': 'STATIC'}}
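The same idea can be wrapped in a small helper so it can be applied to both responses and printed as JSON again. This is only a sketch: parse_properties and reshape are hypothetical names made up for illustration, and the sample record is a shortened version of the subnet response from the question.

```python
import json

def parse_properties(raw):
    """Turn 'k1=v1|k2=v2|' into {'k1': 'v1', 'k2': 'v2'}."""
    result = {}
    for pair in raw.split("|"):
        if not pair.strip():
            continue  # skip the empty piece after the trailing "|"
        key, _, value = pair.partition("=")  # split on the first "=" only
        result[key] = value
    return result

def reshape(record):
    """Return a copy of the record with 'properties' expanded into a dict."""
    reshaped = dict(record)
    reshaped["properties"] = parse_properties(record["properties"])
    return reshaped

# Shortened sample based on the subnet response in the question
subnet = {
    "id": 118836,
    "name": "VLAN Description",
    "type": "IP4Network",
    "properties": "Location-Code=Location|CIDR=1.2.3.0/25|defaultView=118346|",
}
print(json.dumps(reshape(subnet), indent=2))
```

json.dumps is used for the final print so the output has double quotes like the desired JSON, rather than the single-quoted Python dict representation.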
I need to convert a complex JSON file to CSV using Python. I have tried a lot of code without success, so I came here for help. I updated the question: the JSON file has about a million entries like the ones below, and I need to convert them to CSV format.
csv file
{
"_id": {
"$oid": "2e3230"
},
"add": {
"address1": {
"address": "kvartira 14",
"zipcode": "10005",
},
"name": "Evgiya Kovava",
"address2": {
"country": "US",
"country_name": "NY",
}
}
}
{
"_id": {
"$oid": "2d118c8bo"
},
"add": {
"address1": {
"address": "kvartira 14",
"zipcode": "52805",
},
"name": "Eiya tceva",
"address2": {
"country": "US",
"country_name": "TX",
}
}
}
import pandas as pd
null = 'null'
data = {
"_id": {
"$oid": "2e3230s314i5dc07e118c8bo"
},
"add": {
"address": {
"address_type": "Door",
"address": "kvartira 14",
"city": "new york",
"region": null,
"zipcode": "10005",
},
"name": "Evgeniya Kovantceva",
"type": "Private person",
"code": null,
"additional_phone_nums": null,
"email": null,
"notifications": [],
"address": {
"address": "kvartira 14",
"city": "new york",
"region": null,
"zipcode": "10005",
"country": "US",
"country_name": "NY",
}
}
}
df = pd.json_normalize(data)
df.to_csv('yourpath.csv')
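Beware of the null values: null is not a Python name, which is why it is assigned the string 'null' above. Also, the nested "address" dictionary appears twice inside "add" with almost identical content; since a later duplicate key overwrites the earlier one, is that intended?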
EDIT
OK, given the information you added, it looks like json.JSONDecoder() is what you need.
Originally posted by @pschill at this link:
how to analyze json objects that are NOT separated by comma (preferably in Python)
I tried his code on your data:
import json
import pandas as pd
data = """{
"_id": {
"$oid": "2e3230"
},
"add": {
"address1": {
"address": "kvartira 14",
"zipcode": "10005"
},
"name": "Evgiya Kovava",
"address2": {
"country": "US",
"country_name": "NY"
}
}
}
{
"_id": {
"$oid": "2d118c8bo"
},
"add": {
"address1": {
"address": "kvartira 14",
"zipcode": "52805"
},
"name": "Eiya tceva",
"address2": {
"country": "US",
"country_name": "TX"
}
}
}"""
Keep in mind that your data also has trailing commas (the last comma right before each closing brace), which make it invalid JSON.
You have to remove them with a regex or some other approach that I am not familiar with. For the purpose of this answer I removed them manually.
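For reference, one way the regex cleanup could look is sketched below. strip_trailing_commas is a hypothetical helper name, and the approach is deliberately naive: it would also change a string value that happens to contain a comma directly before a closing brace or bracket, so treat it as a starting point rather than a general JSON repair tool.

```python
import json
import re

def strip_trailing_commas(text):
    """Remove commas that are directly followed by a closing } or ]."""
    # Keep the whitespace and the closing character, drop only the comma
    return re.sub(r",(\s*[}\]])", r"\1", text)

# One object from the question, reduced to a single level for brevity
raw = '{"address": "kvartira 14", "zipcode": "10005",}'
cleaned = strip_trailing_commas(raw)
print(json.loads(cleaned))
```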
So after that I tried this:
content = data
parsed_values = []
decoder = json.JSONDecoder()
while content:
    value, new_start = decoder.raw_decode(content)
    content = content[new_start:].strip()
    # You can handle the value directly in this loop:
    # print("Parsed:", value)
    # Or you can store it in a container and use it later:
    parsed_values.append(value)
which gave me an error but the list seems to get populated with all the values:
parsed_values
[{'_id': {'$oid': '2e3230'},
'add': {'address1': {'address': 'kvartira 14', 'zipcode': '10005'},
'name': 'Evgiya Kovava',
'address2': {'country': 'US', 'country_name': 'NY'}}},
{'_id': {'$oid': '2d118c8bo'},
'add': {'address1': {'address': 'kvartira 14', 'zipcode': '52805'},
'name': 'Eiya tceva',
'address2': {'country': 'US', 'country_name': 'TX'}}}]
next I did:
df = pd.json_normalize(parsed_values)
which worked fine.
You can always save that to a csv with:
df.to_csv('yourpath.csv')
Tell me if that helped.
Your json is quite problematic after all. Duplicate keys (problem), null value (unreadable), trailing commas (unreadable), not comma separated dicts... It didn't catch the eye at first :P
I have this JSON and I would like to get only the name from every array. How do I write that in Python?
Currently I have this: li = [item.get(data_new[0]'id') for item in data_new]
where data_new is my JSON data.
[
{
"id": "1687fbfa-8936-4b77-a7bc-123f9f276c49",
"attributes": [
{
"name": "status",
"value": "rejected",
"scope": "identity"
},
{
"name": "created_ts",
"value": "2020-06-25T16:22:07.578Z",
"scope": "system"
},
{
"name": "updated_ts",
"value": "2020-07-08T12:43:09.361Z",
"scope": "system"
},
{
"name": "artifact_name",
"value": "release-v10",
"scope": "inventory"
},
{
"name": "device_type",
"value": "proddemo-device",
"scope": "inventory"
}
],
"updated_ts": "2020-07-08T12:43:09.361Z"
},
{
"id": "0bf2a1fe-6004-473f-88b7-aab061972115",
"attributes": [
{
"name": "status",
"value": "rejected",
"scope": "identity"
},
{
"name": "created_ts",
"value": "2020-07-01T16:23:00.631Z",
"scope": "system"
},
{
"name": "updated_ts",
"value": "2020-07-08T17:41:16.45Z",
"scope": "system"
},
{
"name": "artifact_name",
"value": "Module_logs_v7",
"scope": "inventory"
},
{
"name": "cpu_model",
"value": "ARMv8 Processor",
"scope": "inventory"
},
{
"name": "device_type",
"value": "device",
"scope": "inventory"
},
{
"name": "hostname",
"value": "device004",
"scope": "inventory"
},
{
"name": "ipv4_br-d6eae8b3a339",
"value": "172.0.0.1/18",
"scope": "inventory"
}
],
"updated_ts": "2020-07-08T12:43:09.361Z"
}
]
This is an output snippet from my API. From this output I want to retrieve the value of the attribute whose name is hostname; as you can see, it is the second-to-last entry, where "name": "hostname".
So I want to retrieve the value only for the entry where the name is "hostname". How can I do that?
Please guide me through.
a = [{'id': '291ae0e5956c69c2267489213df4459d19ed48a806603def19d417d004a4b67e',
'attributes': [{'name': 'ip_addr',
'value': '1.2.3.4',
'descriptionName': 'IP address'},
{'name': 'ports', 'value': ['8080', '8081'], 'description': 'Open ports'}],
'updated_ts': '2016-10-03T16:58:51.639Z'},
{'id': '76f40e5956c699e327489213df4459d1923e1a806603def19d417d004a4a3ef',
'attributes': [{'name': 'mac',
'value': '00:01:02:03:04:05',
'descriptionName': 'MAC address'}],
'updated_ts': '2016-10-04T18:24:21.432Z'}]
descriptionName = []
for i in a:
    for j in i["attributes"]:
        for k in j:
            if k == "descriptionName":
                descriptionName.append(j[k])
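One-liner (the outer loop over a has to come first, otherwise i is not yet bound when i["attributes"] is evaluated):
[j["descriptionName"] for i in a for j in i["attributes"] if "descriptionName" in j]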
Output:
['IP address', 'MAC address']
Update 1:
To get all names (with a now holding the data from the question):
One-liner:
[j["name"] for i in a for j in i["attributes"] if "name" in j.keys()]
Output:
['status',
'created_ts',
'updated_ts',
'artifact_name',
'device_type',
'status',
'created_ts',
'updated_ts',
'artifact_name',
'cpu_model',
'device_type',
'hostname',
'ipv4_br-d6eae8b3a339']
To get the value for which name is "hostname":
[j["value"] for i in a for j in i["attributes"] if "name" in j.keys() and j["name"] == "hostname"]
Output:
['device004']
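If you need to know which device each hostname belongs to, a small lookup helper may be easier to read than a flat list. The sketch below is only an illustration: get_attribute is a hypothetical name, and the sample data is a heavily shortened version of the JSON from the question (ids truncated, most attributes dropped).

```python
def get_attribute(device, name):
    """Return the value of the first attribute called `name`, or None."""
    return next(
        (attr["value"] for attr in device["attributes"] if attr.get("name") == name),
        None,  # default when the device has no such attribute
    )

# Shortened sample data modelled on the question's JSON
devices = [
    {"id": "1687fbfa", "attributes": [{"name": "status", "value": "rejected"}]},
    {"id": "0bf2a1fe", "attributes": [{"name": "hostname", "value": "device004"}]},
]

# Map each device id to its hostname (None if it has none)
hostnames = {d["id"]: get_attribute(d, "hostname") for d in devices}
print(hostnames)
```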
I have a text file (record.txt) with the contents like this:
12-34,Doe,John:Art101,98:History201,56
56-78,Smith,Bob,bobsmith#email.com:Calculus300,45:Economics214,78:ECE415,84
The email field is optional so it may or may not be included for each person.
This is what the JSON should look like:
[{
"id": "12-34", "lastname": "Doe", "firstname": "John",
"classes":[{
"classname":"Art101", "grade":"98"},{
"classname":"History201","grade":"56"}]
},
{
"id": "56-78", "lastname": "Smith", "firstname": "Bob",
"email":"bobsmith#email.com",
"classes":[{
"classname":"Calculus300", "grade":"45"},{
"classname":"Economics214","grade":"78"},{
"classname":"ECE415", "grade":"84"}]
}]
I am new to Python and JSON so I am having a hard time wrapping my head around how to convert the contents in such a way where the email can be an optional field and how to serialize the classes for each person as well. I was unable to convert the data into JSON after multiple attempts.
Any suggestions or advice on how to tackle this would be greatly appreciated.
Thanks in advance!
Read each line, split it on : first, and then split every resulting element on ,. You can use len() to check whether the first part has 3 or 4 elements; if it has 4, there is an email.
import json
text = '''12-34,Doe,John:Art101,98:History201,56
56-78,Smith,Bob,bobsmith#email.com:Calculus300,45:Economics214,78:ECE415,84'''
all_data = []
for line in text.split('\n'):
    line = line.strip()
    parts = line.split(':')
    data = parts[0].split(',')
    classes = parts[1:]
    item = {
        'id': data[0],
        'lastname': data[1],
        'firstname': data[2],
        'classes': [],
    }
    if len(data) > 3:
        item['email'] = data[3]
    for class_ in classes:
        name, grade = class_.split(',')
        item['classes'].append({'classname': name, 'grade': grade})
    all_data.append(item)
print(json.dumps(all_data, indent=2))
Result:
[
{
"id": "12-34",
"lastname": "Doe",
"firstname": "John",
"classes": [
{
"classname": "Art101",
"grade": "98"
},
{
"classname": "History201",
"grade": "56"
}
]
},
{
"id": "56-78",
"lastname": "Smith",
"firstname": "Bob",
"classes": [
{
"classname": "Calculus300",
"grade": "45"
},
{
"classname": "Economics214",
"grade": "78"
},
{
"classname": "ECE415",
"grade": "84"
}
],
"email": "bobsmith#email.com"
}
]
I am not sure what you have tried so far.
Read each line and split it on the colon (:).
Split the element at index 0 on the comma (,). If the result has length 4, it includes the email; otherwise there is none.
Hope this is clear.
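The steps above could be sketched as a per-line function. parse_line is a hypothetical name, the two sample lines are taken from the question (the second one shortened), and the sketch assumes every line is well formed, with no error handling for missing fields.

```python
import json

def parse_line(line):
    """Convert one 'id,last,first[,email]:class,grade:...' line to a dict."""
    person, *classes = line.strip().split(":")  # first part is the person
    fields = person.split(",")
    item = {"id": fields[0], "lastname": fields[1], "firstname": fields[2]}
    if len(fields) == 4:  # the optional email is present
        item["email"] = fields[3]
    item["classes"] = [
        dict(zip(("classname", "grade"), entry.split(","))) for entry in classes
    ]
    return item

lines = [
    "12-34,Doe,John:Art101,98:History201,56",
    "56-78,Smith,Bob,bobsmith#email.com:Calculus300,45",
]
records = [parse_line(line) for line in lines if line.strip()]
print(json.dumps(records, indent=2))
```

To read from record.txt instead, the list of lines can be replaced by iterating over the opened file, since a file object yields one line at a time.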
I'm trying to index some pandas DataFrames into Elasticsearch. I'm having trouble parsing the JSON that I generate, and I think my problem comes from the mapping. Please find my code below.
import logging
from pprint import pprint
from elasticsearch import Elasticsearch
import pandas as pd
def create_index(es_object, index_name):
    created = False
    # index settings
    settings = {
        "settings": {
            "number_of_shards": 1,
            "number_of_replicas": 0
        },
        "mappings": {
            "danger": {
                "dynamic": "strict",
                "properties": {
                    "name": {
                        "type": "text"
                    },
                    "first_name": {
                        "type": "text"
                    },
                    "age": {
                        "type": "integer"
                    },
                    "city": {
                        "type": "text"
                    },
                    "sex": {
                        "type": "text",
                    },
                }
            }
        }
    }
    try:
        if not es_object.indices.exists(index_name):
            # ignore=400 means to ignore the "Index Already Exists" error
            es_object.indices.create(index=index_name, ignore=400,
                                     body=settings)
            print('Created Index')
        created = True
    except Exception as ex:
        print(str(ex))
    finally:
        return created
def store_record(elastic_object, index_name, record):
    is_stored = True
    try:
        outcome = elastic_object.index(index=index_name,doc_type='danger', body=record)
        print(outcome)
    except Exception as ex:
        print('Error in indexing data')
data = [['Hook', 'James','90', 'Austin','M'],['Sparrow','Jack','15', 'Paris', 'M'],['Kent','Clark','13', 'NYC', 'M'],['Montana','Hannah','28','Las Vegas', 'F'] ]
df = pd.DataFrame(data,columns=['name', 'first_name', 'age', 'city', 'sex'])
result = df.to_json(orient='records')
result = result[1:-1]
es = Elasticsearch()
if es is not None:
    if create_index(es, 'cracra'):
        out = store_record(es, 'cracra', result)
        print('Data indexed successfully')
I got the following error
POST http://localhost:9200/cracra/danger [status:400 request:0.016s]
Error in indexing data
RequestError(400, 'mapper_parsing_exception', 'failed to parse')
Data indexed successfully
I don't know where it is coming from. If anyone may help me to solve this, I would be grateful.
Thanks a lot !
Try to remove extra commas from your mappings:
"mappings": {
"danger": {
"dynamic": "strict",
"properties": {
"name": {
"type": "text"
},
"first_name": {
"type": "text"
},
"age": {
"type": "integer"
},
"city": {
"type": "text"
},
"sex": {
"type": "text", <-- here
}, <-- and here
}
}
}
UPDATE
It seems that the index is created successfully and the problem is in indexing the data. As Nishant Saini noted, you are probably trying to index several documents at once. That can be done using the Bulk API. Here is an example of a correct request that indexes two documents:
POST cracra/danger/_bulk
{"index": {"_id": 1}}
{"name": "Hook", "first_name": "James", "age": "90", "city": "Austin", "sex": "M"}
{"index": {"_id": 2}}
{"name": "Sparrow", "first_name": "Jack", "age": "15", "city": "Paris", "sex": "M"}
Every document in the request body must appear on its own line, preceded by a line of meta information. In this case the meta information contains only the id to assign to the document.
You can either build this request by hand or use the Elasticsearch helpers for Python, which can take care of adding the correct meta information.
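As a rough illustration of building the request body by hand, the sketch below produces the same newline-delimited layout as the example request using only the json module. build_bulk_body is a hypothetical helper, not part of the Elasticsearch client, and numbering the documents from 1 is just an assumption to match the example. In the question's code, a list of dicts like records could come from df.to_dict(orient='records').

```python
import json

def build_bulk_body(records):
    """Build a newline-delimited Bulk API body from a list of dicts."""
    lines = []
    for doc_id, record in enumerate(records, start=1):
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # meta information line
        lines.append(json.dumps(record))                      # document line
    return "\n".join(lines) + "\n"  # the body has to end with a newline

records = [
    {"name": "Hook", "first_name": "James", "age": "90", "city": "Austin", "sex": "M"},
    {"name": "Sparrow", "first_name": "Jack", "age": "15", "city": "Paris", "sex": "M"},
]
body = build_bulk_body(records)
print(body)
```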
I have a JSON schema validator where I need to check a specific field, email, to see if it's one of 4 possible emails. Let's call the possibilities ['test1', 'test2', 'test3', 'test4']. Sometimes the emails contain a \n newline separator, so I need to account for that as well. Is it possible to do a "string contains" check in JSON Schema?
Here is my schema without the email checks:
{
"type": "object",
"properties": {
"data": {
"type":"object",
"properties": {
"email": {
"type": "string"
}
},
"required": ["email"]
}
}
}
My input payload is:
{
"data": {
"email": "test3\njunktext"
}
}
I need the payload above to pass validation since it has test3 in it. Thanks!
I can think of two ways:
Using enum you can define a list of valid emails (the value has to match one of them exactly):
{
"type": "object",
"properties": {
"data": {
"type": "object",
"properties": {
"email": {
"enum": [
"test1",
"test2",
"test3",
"test4"
]
}
},
"required": [
"email"
]
}
}
}
Or with pattern, which lets you use a regular expression to match a valid email:
{
"type": "object",
"properties": {
"data": {
"type": "object",
"properties": {
"email": {
"pattern": "test"
}
},
"required": [
"email"
]
}
}
}
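To get a feel for how a "contains" pattern behaves on the payload from the question, here is a small sketch. It does not run a real JSON Schema validator: it uses Python's re.search as a stand-in, on the assumption that the validator treats pattern as an unanchored regular expression (which is how the keyword is specified). email_schema and matches_pattern are hypothetical names.

```python
import re

# Hypothetical schema fragment: accept any string that contains one of the four values
email_schema = {"type": "string", "pattern": "test1|test2|test3|test4"}

def matches_pattern(value, schema=email_schema):
    """Mimic the unanchored 'pattern' keyword with re.search."""
    return isinstance(value, str) and re.search(schema["pattern"], value) is not None

print(matches_pattern("test3\njunktext"))  # value from the question's payload
print(matches_pattern("test9"))            # not one of the four values
```

Because the pattern is unanchored, a value such as "mytest1x" would also match; anchoring with ^ would be needed if the email has to start with one of the four values.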