I have a text file (record.txt) with contents like this:
12-34,Doe,John:Art101,98:History201,56
56-78,Smith,Bob,bobsmith#email.com:Calculus300,45:Economics214,78:ECE415,84
The email field is optional so it may or may not be included for each person.
This is what the JSON output should look like:
[{
"id": "12-34", "lastname": "Doe", "firstname": "John",
"classes":[{
"classname":"Art101", "grade":"98"},{
"classname":"History201","grade":"56"}]
},
{
"id": "56-78", "lastname": "Smith", "firstname": "Bob",
"email":"bobsmith#email.com,
"classes":[{
"classname":"Calculus300", "grade":"45"},{
"classname":"Economics214","grade":"78"},
"classname":"ECE415", "grade":"84"}]
}]
I am new to Python and JSON, so I am having a hard time wrapping my head around how to convert the contents so that the email can be an optional field, and how to serialize the classes for each person as well. I was unable to convert the data into JSON after multiple attempts.
Any suggestions or advice on how to tackle this would be greatly appreciated.
Thanks in advance!
Take each line and first split it on : and then split every element on ,. You can use len() to check whether the first part has 3 or 4 elements - if it has 4, then there is an email.
import json
text = '''12-34,Doe,John:Art101,98:History201,56
56-78,Smith,Bob,bobsmith#email.com:Calculus300,45:Economics214,78:ECE415,84'''
all_data = []
for line in text.split('\n'):
    line = line.strip()
    parts = line.split(':')        # first the person fields, then one block per class
    data = parts[0].split(',')     # id, lastname, firstname[, email]
    classes = parts[1:]
    item = {
        'id': data[0],
        'lastname': data[1],
        'firstname': data[2],
        'classes': [],
    }
    if len(data) > 3:              # the optional email is the 4th element
        item['email'] = data[3]
    for class_ in classes:
        name, grade = class_.split(',')
        item['classes'].append({'classname': name, 'grade': grade})
    all_data.append(item)
print(json.dumps(all_data, indent=2))
Result:
[
{
"id": "12-34",
"lastname": "Doe",
"firstname": "John",
"classes": [
{
"classname": "Art101",
"grade": "98"
},
{
"classname": "History201",
"grade": "56"
}
]
},
{
"id": "56-78",
"lastname": "Smith",
"firstname": "Bob",
"classes": [
{
"classname": "Calculus300",
"grade": "45"
},
{
"classname": "Economics214",
"grade": "78"
},
{
"classname": "ECE415",
"grade": "84"
}
],
"email": "bobsmith#email.com"
}
]
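If you need this JSON in a file rather than printed, here is a small follow-up sketch (out.json is just an assumed filename):
# Write the same structure to a file instead of printing it
with open('out.json', 'w') as f:
    json.dump(all_data, f, indent=2)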
I am not sure how you tried it.
Read each line and split it on the colon (:).
Split the 0th element on the comma (,); if its length is 4, process the email, else ignore it.
Hope this is clear.
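A minimal sketch of that approach, assuming the data lives in record.txt as shown in the question (the file path is an assumption):
import json

people = []
with open('record.txt') as f:
    for line in f:
        blocks = line.strip().split(':')   # person block, then one block per class
        person = blocks[0].split(',')      # id, lastname, firstname[, email]
        item = {'id': person[0], 'lastname': person[1], 'firstname': person[2]}
        if len(person) == 4:               # the email is optional
            item['email'] = person[3]
        item['classes'] = [
            {'classname': c.split(',')[0], 'grade': c.split(',')[1]}
            for c in blocks[1:]
        ]
        people.append(item)

print(json.dumps(people, indent=2))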
I have this sample.json file with me:
{
"details":[
{
"name": "",
"class": "4",
"marks": "72.6"
},
{
"name": "David",
"class": "",
"marks": "78.2"
},
{
"name": "Emily",
"class": "4",
"marks": ""
}
]
}
As you can see, for the first entry the "name" field (a string) is empty.
For the second one, "class" (an integer) is empty.
And for the third one, "marks" (a float) is empty.
Now my task is to find the fields which are empty: if a string is empty, replace it with "BLANK"; if an integer is empty, replace it with 0; and if a float is empty, replace it with 0.0.
P.S.: I'm reading it with Python like this:
import json
path = open(r'D:\github repo\python\sample.json')   # raw string so the backslashes stay literal
df = json.load(path)
for i in df["details"]:
    print(i["name"])
Also, note that I don't want to hard-code the values, because here there are only 3 fields (name, class, marks), but what if I have more than 3? Then what? How will I find which fields are empty?
Like you see here:
{
"code": "AAA",
"lat": "-17.3595",
"lon": "-145.494",
"name": "Anaa Airport",
"city": "Anaa",
"state": "Tuamotu-Gambier",
"country": "French Polynesia",
"woeid": "12512819",
"tz": "Pacific\/Midway",
"phone": "",
"type": "Airports",
"email": "",
"url": "",
"runway_length": "4921",
"elev": "7",
"icao": "NTGA",
"direct_flights": "2",
"carriers": "1"
},
This is just one block; I have N-number of blocks like this. That's why I can't hard-code the values, right?
Can anybody help me with it?
Thank you so much!
Since the type info isn't available anywhere programmatically, and there seem to be only three hard-coded fields, I'd just check each of them explicitly.
Short-circuiting with the or operator would even allow you to achieve this fairly elegantly:
for d in df['details']:
    d['name'] = d['name'] or 'BLANK'
    d['class'] = d['class'] or '0'
    d['marks'] = d['marks'] or '0.0'
You could check whether the string is empty with a simple if statement, like so:
if i['name'] == "":
Alternatively, you could also do
if not i['name']:
The second if statement makes use of truthy and falsy values in Python.
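Putting that check into the loop from the question could look like this (just a sketch, still hard-coding the three known fields):
for i in df["details"]:
    if not i["name"]:      # an empty string is falsy
        i["name"] = "BLANK"
    if not i["class"]:
        i["class"] = "0"
    if not i["marks"]:
        i["marks"] = "0.0"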
You could create a dictionary empty_replacements mapping each key to its corresponding desired empty value:
import json
sample_json = {
"details": [
{
"name": "",
"class": "4",
"marks": "72.6"
},
{
"name": "David",
"class": "",
"marks": "78.2"
},
{
"name": "Emily",
"class": "4",
"marks": ""
}
]
}
empty_replacements = {"name": "BLANK", "class": "0", "marks": "0.0"}
sample_json["details"] = [{
k: v if v else empty_replacements[k]
for k, v in d.items()
} for d in sample_json["details"]]
print('sample_json after replacements: ')
print(json.dumps(
sample_json,
sort_keys=False,
indent=4,
))
Output:
sample_json after replacements:
{
"details": [
{
"name": "BLANK",
"class": "4",
"marks": "72.6"
},
{
"name": "David",
"class": "0",
"marks": "78.2"
},
{
"name": "Emily",
"class": "4",
"marks": "0.0"
}
]
}
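If other keys can appear that aren't listed in empty_replacements, one possible tweak is to fall back to a default replacement (default_empty is a made-up name for this sketch):
default_empty = "BLANK"   # hypothetical fallback for keys not listed in empty_replacements
sample_json["details"] = [
    {k: v if v else empty_replacements.get(k, default_empty) for k, v in d.items()}
    for d in sample_json["details"]
]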
I'm assuming from the dictionary you provided that marks & class are stored as strings.
li = []
for d in df["details"]:
    for k, v in d.items():
        if v == '':
            if k == 'name':
                d[k] = "BLANK"
            elif k == 'class':
                d[k] = '0'
            elif k == 'marks':
                d[k] = '0.0'
    li.append(d)
df['details'] = li
I'd like to preface this by saying that I'm VERY new to Python, so I apologize if the answer to this is obvious. :) I have a Python script that does a couple of API calls and returns JSON data. The output currently looks something like the following:
IP Information is below:
{
"id": 318283,
"name": "Name",
"type": "IP4Address",
"properties": "VLAN=5|DeviceName=Device|Notes=This is a description address|Administration=Team that admins the system|Location-Code=Location|address=1.2.3.4|state=STATIC|"
}
IP Subnet Range Information is below:
{
"id": 118836,
"name": "VLAN Description",
"type": "IP4Network",
"properties": "Location-Code=Location|Notes=Description of the subnet|CIDR=1.2.3.0/25|allowDuplicateHost=disable|inheritAllowDuplicateHost=true|pingBeforeAssign=disable|inheritPingBeforeAssign=true|inheritDefaultDomains=true|defaultView=118346|inheritDefaultView=true|inheritDNSRestrictions=true|"
I'd like the response to not have the "id" string, and I'd like to split the |-delimited string in properties into its own key/value pairs so it looks like below:
{
"id": 318283,
"name": "Name",
"type": "IP4Address",
"properties":{
"VLAN": “5”,
"DeviceName": “Device”,
"Notes": “Description”,
"Administration": “Admin team”,
"Location-Code": “Location”,
"Address": “1.2.3.4”,
"State": “STATIC”
}
}
{
"id": 118836,
"name": "Subnet name",
"type": "IP4Network",
"properties":{
"Location-Code": "Location",
"Notes": "Subnet description. ",
"CIDR": "1.2.3.0/25",
"allowDuplicateHost": "disable",
"inheritAllowDuplicateHost": "true",
"pingBeforeAssign": "disable",
"inheritPingBeforeAssign": "true",
"inheritDefaultDomains": "true",
"defaultView": "118346",
"inheritDefaultView": "true",
"inheritDNSRestrictions": "true"
}
}
Any suggestions are greatly appreciated!
Here's a way to handle your pipe-separated properties:
a = { "id": 318283, "name": "Name", "type": "IP4Address", "properties": "VLAN=5|DeviceName=Device|Notes=This is a description address|Administration=Team that admins the system|Location-Code=Location|address=1.2.3.4|state=STATIC|" }
new_properties = {}
for key_val_pair in a['properties'].split("|"):
    if key_val_pair.strip() == "":
        continue
    key, val = key_val_pair.split("=")
    new_properties[key] = val
a["properties"] = new_properties
print(a)
Output:
{'id': 318283, 'name': 'Name', 'type': 'IP4Address', 'properties': {'VLAN': '5', 'DeviceName': 'Device', 'Notes': 'This is a description address', 'Administration': 'Team that admins the system', 'Location-Code': 'Location', 'address': '1.2.3.4', 'state': 'STATIC'}}
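If any property value could itself contain an = sign (the Notes field looks like a candidate), a slightly more defensive sketch of the same idea splits only on the first =:
raw = a["properties"]          # the original pipe-separated string (use this before the loop above replaces it)
a["properties"] = dict(
    pair.split("=", 1)         # split only on the first "=", so values may contain "="
    for pair in raw.split("|")
    if pair.strip()            # skip the empty piece left by the trailing "|"
)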
I'm trying to index some pandas DataFrames into Elasticsearch. I'm having some trouble parsing the JSON that I'm generating. I think my problem is coming from the mapping. Please find my code below.
import logging
from pprint import pprint
from elasticsearch import Elasticsearch
import pandas as pd
def create_index(es_object, index_name):
    created = False
    # index settings
    settings = {
        "settings": {
            "number_of_shards": 1,
            "number_of_replicas": 0
        },
        "mappings": {
            "danger": {
                "dynamic": "strict",
                "properties": {
                    "name": {
                        "type": "text"
                    },
                    "first_name": {
                        "type": "text"
                    },
                    "age": {
                        "type": "integer"
                    },
                    "city": {
                        "type": "text"
                    },
                    "sex": {
                        "type": "text",
                    },
                }
            }
        }
    }
    try:
        if not es_object.indices.exists(index_name):
            # Ignore 400 means to ignore the "Index Already Exist" error
            es_object.indices.create(index=index_name, ignore=400,
                                     body=settings)
            print('Created Index')
            created = True
    except Exception as ex:
        print(str(ex))
    finally:
        return created

def store_record(elastic_object, index_name, record):
    is_stored = True
    try:
        outcome = elastic_object.index(index=index_name, doc_type='danger', body=record)
        print(outcome)
    except Exception as ex:
        print('Error in indexing data')

data = [['Hook', 'James', '90', 'Austin', 'M'], ['Sparrow', 'Jack', '15', 'Paris', 'M'], ['Kent', 'Clark', '13', 'NYC', 'M'], ['Montana', 'Hannah', '28', 'Las Vegas', 'F']]
df = pd.DataFrame(data, columns=['name', 'first_name', 'age', 'city', 'sex'])
result = df.to_json(orient='records')
result = result[1:-1]

es = Elasticsearch()
if es is not None:
    if create_index(es, 'cracra'):
        out = store_record(es, 'cracra', result)
        print('Data indexed successfully')
I got the following error
POST http://localhost:9200/cracra/danger [status:400 request:0.016s]
Error in indexing data
RequestError(400, 'mapper_parsing_exception', 'failed to parse')
Data indexed successfully
I don't know where it is coming from. If anyone can help me solve this, I would be grateful.
Thanks a lot!
Try to remove extra commas from your mappings:
"mappings": {
"danger": {
"dynamic": "strict",
"properties": {
"name": {
"type": "text"
},
first_name": {
"type": "text"
},
"age": {
"type": "integer"
},
"city": {
"type": "text"
},
"sex": {
"type": "text", <-- here
}, <-- and here
}
}
}
UPDATE
It seems that the index is created successfully and the problem is in the data indexing. As Nishant Saini noted, you are probably trying to index several documents at a time. That can be done using the Bulk API. Here is an example of a correct request that indexes two documents:
POST cracra/danger/_bulk
{"index": {"_id": 1}}
{"name": "Hook", "first_name": "James", "age": "90", "city": "Austin", "sex": "M"}
{"index": {"_id": 2}}
{"name": "Sparrow", "first_name": "Jack", "age": "15", "city": "Paris", "sex": "M"}
Every document in the request body must appear on a new line, with some meta information before it. In this case the meta information contains only the id that must be assigned to the document.
You can either make this query by hand or use the Elasticsearch helpers for Python, which can take care of adding the correct meta information.
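For completeness, here is a rough sketch of the helpers-based route, assuming the same df and index name as in the question (whether the _type entry is needed depends on your Elasticsearch version):
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()

# One action per DataFrame row; helpers.bulk adds the bulk meta lines for us.
actions = [
    {
        "_index": "cracra",
        "_type": "danger",   # only for Elasticsearch versions that still use mapping types
        "_id": i,
        "_source": row,
    }
    for i, row in enumerate(df.to_dict(orient="records"))
]

helpers.bulk(es, actions)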
I am trying to convert my JSON data into a dictionary keyed by the id of the JSON data. For example, let's say I have the following JSON:
{
"id": "1",
"name": "John",
"surname": "Smith"
},
{
"id": "2",
"name": "Steve",
"surname": "Ger"
}
And I want to construct a new dictionary which includes the id as a key, and save it to a file, so I wrote the following code:
json_dict = []
request = requests.get('http://example.com/...')
with open("data.json", "w") as out:
    loaded_data = json.loads(request.text)
    for list_item in loaded_data:
        json_dict.append({"id": list_item["id"], "data": list_item})
    out.write(json.dumps(json_dict))
In the file I get the following output:
[{"data": {"id":"1",
"name":"John",
"Surname":"Smith"
}
},
{"data": {"id":"2",
"name":"Steve",
"Surname":"Ger"
}
},
]
Why is the id not included in my dict before data?
I'm pretty sure you're looking at a ghost here; you probably tested something different from what you posted. It will go away when you try to create a minimal, complete, and verifiable example for us (i.e. with a fixed string as input instead of a request call, with a print instead of an out.write, etc.).
This is my test in which I could not reproduce the problem:
entries = [
{
"id": "1",
"name": "John",
"surname": "Smith"
},
{
"id": "2",
"name": "Steve",
"surname": "Ger"
}
]
json_dict = []
for i in entries:
    json_dict.append({"id": i["id"], "data": i})
print(json.dumps(json_dict, indent=2))
This prints as expected:
[
{
"id": "1",
"data": {
"id": "1",
"surname": "Smith",
"name": "John"
}
},
{
"id": "2",
"data": {
"id": "2",
"surname": "Ger",
"name": "Steve"
}
}
]
You could try this instead:
json_dict = []
with open('data.json', 'w') as f:
    loaded_data = json.loads(request.text)
    for list_item in loaded_data:
        json_dict.append({list_item['id']: list_item})
    f.write(json.dumps(json_dict))
I've previously succeeded in parsing data from a JSON file, but now I'm facing a problem with the function I want to achieve. I have a list of names, identification numbers and birthdates in a JSON file. What I want is to be able to let a user input a name in Python and retrieve his identification number and birthdate (if present).
This is my JSON example file:
[
{
"id_number": "SA4784",
"name": "Mark",
"birthdate": null
},
{
"id_number": "V410Z8",
"name": "Vincent",
"birthdate": "15/02/1989"
},
{
"id_number": "CZ1094",
"name": "Paul",
"birthdate": "27/09/1994"
}
]
To be clear, I want to input "V410Z8" and get his name and his birthdate.
I tried to write some code in Python, but I only succeeded in searching for "id_number" and not for what is inside "id_number", for example "V410Z8".
#!/usr/bin/python
# -*- coding: utf-8 -*-
import json
database = "example.json"
data = json.loads(open(database).read())
id_number = data[0]["id_number"]
print id_number
Thank you for your support, guys :)
You have to iterate over the list of dictionaries and search for the one with the given id_number. Once you find it you can print the rest of its data and break, assuming id_number is unique.
data = [
{
"id_number": "SA4784",
"name": "Mark",
"birthdate": None
},
{
"id_number": "V410Z8",
"name": "Vincent",
"birthdate": "15/02/1989"
},
{
"id_number": "CZ1094",
"name": "Paul",
"birthdate": "27/09/1994"
}
]
for i in data:
    if i['id_number'] == 'V410Z8':
        print(i['birthdate'])
        print(i['name'])
        break
If you have control over the data structure, a more efficient way would be to use the id_number as a key (again, assuming id_number is unique):
data = { "SA4784" : {"name": "Mark", "birthdate": None},
"V410Z8" : { "name": "Vincent", "birthdate": "15/02/1989"},
"CZ1094" : {"name": "Paul", "birthdate": "27/09/1994"}
}
Then all you need to do is try to access it directly:
try:
    print(data["V410Z8"]["name"])
except KeyError:
    print("ID doesn't exist")
>> "Vincent"
Using lambda in Python
data = [
{
"id_number": "SA4784",
"name": "Mark",
"birthdate": None
},
{
"id_number": "V410Z8",
"name": "Vincent",
"birthdate": "15/02/1989"
},
{
"id_number": "CZ1094",
"name": "Paul",
"birthdate": "27/09/1994"
}
]
Using Lambda and filter
print(list(filter(lambda x:x["id_number"]=="CZ1094",data)))
Output
[{'id_number': 'CZ1094', 'name': 'Paul', 'birthdate': '27/09/1994'}]
You can use list comprehension:
Given
data = [
{
"id_number": "SA4784",
"name": "Mark",
"birthdate": None
},
{
"id_number": "V410Z8",
"name": "Vincent",
"birthdate": "15/02/1989"
},
{
"id_number": "CZ1094",
"name": "Paul",
"birthdate": "27/09/1994"
}
]
to get the list item(s) with id_number equal to "V410Z8" you may use:
result = [x for x in data if x["id_number"]=="V410Z8"]
result will contain:
[{'id_number': 'V410Z8', 'name': 'Vincent', 'birthdate': '15/02/1989'}]
In case the if condition is not satisfied, result will contain an empty list: []
data = [
{
"id_number": "SA4784",
"name": "Mark",
"birthdate": None
},
{
"id_number": "V410Z8",
"name": "Vincent",
"birthdate": "14/02/1989"
},
{
"id_number": "CZ1093",
"name": "Paul",
"birthdate": "26/09/1994"
}
]
list(filter(lambda x: x["id_number"] == "CZ1093", data))
Output should be
[{
    "id_number": "CZ1093",
    "name": "Paul",
    "birthdate": "26/09/1994"
}]
If you are only interested in one result or a subset of the total results, I'd suggest a generator function as the fastest solution, since it will not unnecessarily iterate over every item and is more memory efficient:
def gen_func(data, search_term):
    for i in data:
        if i['id_number'] == search_term:
            yield i
You can then run the following to retrieve results for CZ1094:
foo = gen_func(data, 'CZ1094')
next(foo)
{'id_number': 'CZ1094', 'name': 'Paul', 'birthdate': '27/09/1994'}
NB: You'll need to handle StopIteration at end of iterable.
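For example, you can pass a default to next or catch the exception explicitly ('ZZ9999' is a made-up ID that doesn't exist in the data):
# Passing a default avoids the exception entirely
match = next(gen_func(data, 'ZZ9999'), None)
if match is None:
    print("ID doesn't exist")

# Or handle StopIteration explicitly
try:
    match = next(gen_func(data, 'ZZ9999'))
except StopIteration:
    print("ID doesn't exist")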