I'm trying to process a log from Symphony using Pandas, but I'm having trouble with malformed JSON that I can't parse.
An example of the log:
'{id:46025,
work_assignment:43313=>43313,
declaration:<p><strong>Bijkomende interventie.</strong></p>\r\n\r\n<p>H </p>\r\n\r\n<p><strong><em>Vaststellingen.</em></strong></p>\r\n\r\n<p><strong><em>CV. </em></strong>De.</p>=><p><strong>Bijkomende interventie.</strong></p>\r\n\r\n<p>He </p>\r\n\r\n<p><strong><em>Vaststellingen.</em></strong></p>\r\n\r\n<p><strong><em>CV. </em></strong>De.</p>,conclusions:<p>H </p>=><p>H </p>}'
What is the best way to process this?
For each part (id/work_assignment/declaration/etc) I would like to retrieve the old and new value (which are separated by "=>").
You can use the following code:
def clean(my_log):
    # str.replace returns a new string, so the result has to be assigned back
    my_log = my_log.replace("{", "").replace("}", "")  # Removes the unneeded { }
    my_items = my_log.split(",")  # Split at the comma to get the pairs
    my_dict = {}
    for i in my_items:
        key, value = i.split(":", 1)  # Split at the first colon to separate the key and value
        my_dict[key.strip()] = value  # Add to the dictionary
    return my_dict
The function returns a Python dictionary, which can be used directly or converted to JSON with a serializer if needed. Note that it assumes the values themselves contain no commas.
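The question also asks for the old and new value of each field. A minimal sketch of that step, assuming (as in the sample) that no value contains a comma and that "=>" appears at most once per field; parse_changes is a hypothetical helper name and the sample string is a shortened version of the log above:

```python
def parse_changes(raw_log):
    """Map each field to an (old, new) tuple; new is None when there is no '=>'."""
    body = raw_log.strip().strip("{}")  # drop the surrounding braces
    changes = {}
    for item in body.split(","):
        key, _, value = item.partition(":")    # split at the first colon only
        old, sep, new = value.partition("=>")  # split at the first '=>' only
        changes[key.strip()] = (old, new if sep else None)
    return changes


# Shortened version of the log from the question (assumption: no commas in values)
sample = "{id:46025,\nwork_assignment:43313=>43313,\nconclusions:<p>H </p>=><p>H </p>}"
changes = parse_changes(sample)
print(changes["work_assignment"])
```

The resulting dictionary could then be turned into a DataFrame if needed, for example with one row per field and two columns for old and new.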
Hope I helped :D
I am struggling to get a value from a JSON response. Following other posts I have managed to write this code, but when I look up the key I want (character_id) in the dictionary, Python says the key doesn't exist. My solution gets the JSON object from the response, converts it into a string with json.dumps(), and then converts it into a dictionary with json.loads(). Then I try to get 'character_id' from the dictionary, but it doesn't exist. I am guessing it is related to the format of the dictionary, but I have little to no experience in Python. The code that makes the query and tries to get the values is this (dataRequest is a function that makes the request and returns the response from the API):
import json
import sys

characterName = sys.argv[1]
response = dataRequest('http://census.daybreakgames.com/s:888/get/ps2:v2/character/?name.first_lower=' + characterName + '&c:show=character_id')
jsonString = json.dumps(response.json())
print(jsonString)
dic = json.loads(jsonString)
print(dic)
if 'character_id' in dic:
    print(dic['character_id'])
The output of the code is:
{"character_list": [{"character_id": "5428662532301799649"}], "returned": 1}
{'character_list': [{'character_id': '5428662532301799649'}], 'returned': 1}
Welcome @Prieto! From what I can see, you probably don't need to serialize and deserialize the JSON: response.json() already returns a Python dictionary object.
The issue is that you are looking for the 'character_id' key at the top-level of the dictionary, when it seems to be embedded inside another dictionary, that is inside a list. Try something like this:
# ...omitted code
for char_obj in dic["character_list"]:
    if "character_id" in char_obj:
        print(char_obj["character_id"])
If your dic looks like {"character_list": [{"character_id": "5428662532301799649"}], "returned": 1},
you get the value of character_id with
print(dic['character_list'][0]['character_id'])
The problem here is that you're trying to access a dictionary where the key is actually character_list.
What you need to do is to access the character_list value and iterate over or filter the character_id you want.
Like this:
print(jsonString)
dic = json.loads(jsonString)
print(dic)
character_information = dic['character_list'][0] # we access the character list and assume it is the first value
print(character_information["character_id"]) # this is your character id
The way I see it, the only hiccup with the code is this:
if 'character_id' in dic:
    print(dic['character_id'])
The problem is that the JSON actually consists of two dictionaries. The first is the main one, which has two keys, character_list and returned. The second is a sub-dictionary inside the array that is the value for the key character_list.
So your code should look something like this:
for i in dic["character_list"]:
    print(i["character_id"])
On a side note, it helps to look at the JSON this way:
{
    "character_list": [
        {
            "character_id": "5428662532301799649"
        }
    ],
    "returned": 1
}
where elements enclosed in curly brackets '{}' are in a dictionary, whereas elements enclosed in square brackets '[]' are in a list.
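Putting the answers above together, here is a small sketch that collects every character_id from the nested structure. It uses the response text from the question as a literal string in place of the real API call, and the use of .get() with an empty-list default is my own choice for tolerating a missing character_list key:

```python
import json

# Response body copied from the question's output; no network call is made here.
payload = '{"character_list": [{"character_id": "5428662532301799649"}], "returned": 1}'
dic = json.loads(payload)

# Walk the list stored under "character_list" and keep each entry's id.
character_ids = [
    entry["character_id"]
    for entry in dic.get("character_list", [])
    if "character_id" in entry
]
print(character_ids)
```

If the query matches no character, the list is simply empty, so there is no IndexError to handle as there would be with dic['character_list'][0].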
I am trying to pass in a JSON file and convert the data into a dictionary.
So far, this is what I have done:
import json
json1_file = open('json1')
json1_str = json1_file.read()
json1_data = json.loads(json1_str)
I'm expecting json1_data to be a dict type but it actually comes out as a list type when I check it with type(json1_data).
What am I missing? I need this to be a dictionary so I can access one of the keys.
Your JSON is an array with a single object inside, so when you read it in you get a list with a dictionary inside. You can access your dictionary by accessing item 0 in the list, as shown below:
json1_data = json.loads(json1_str)[0]
Now you can access the data stored in datapoints just as you were expecting:
datapoints = json1_data['datapoints']
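If you are not sure whether a file holds a bare object or an array wrapping one, a type check makes the unwrapping explicit. This is only a sketch: load_first_object is a hypothetical helper, and the sample text is made-up data in the same shape as described above, not the asker's file.

```python
import json


def load_first_object(text):
    """Parse JSON text and return a dict, unwrapping a top-level array if present."""
    parsed = json.loads(text)
    if isinstance(parsed, list):
        if not parsed:
            raise ValueError("JSON array is empty")
        parsed = parsed[0]  # same as json.loads(text)[0]
    if not isinstance(parsed, dict):
        raise TypeError("expected a JSON object, got %s" % type(parsed).__name__)
    return parsed


# Hypothetical sample: an array with a single object inside
sample = '[{"target": "example", "datapoints": [[1.5, 100], [2.5, 200]]}]'
json1_data = load_first_object(sample)
datapoints = json1_data["datapoints"]
print(type(json1_data).__name__, datapoints[0])
```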
I have one more question if anyone can bite: I am trying to take the average of the first elements in these datapoints(i.e. datapoints[0][0]). Just to list them, I tried doing datapoints[0:5][0] but all I get is the first datapoint with both elements as opposed to wanting to get the first 5 datapoints containing only the first element. Is there a way to do this?
datapoints[0:5][0] doesn't do what you're expecting. datapoints[0:5] returns a new list slice containing just the first 5 elements, and then adding [0] on the end of it will take just the first element from that resulting list slice. What you need to use to get the result you want is a list comprehension:
[p[0] for p in datapoints[0:5]]
Here's a simple way to calculate the mean:
sum(p[0] for p in datapoints[0:5])/5. # Result is 35.8
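To see the difference between the two expressions side by side, here is a short sketch with made-up datapoints (hypothetical values, not the data behind the 35.8 result above). It uses statistics.mean from the standard library instead of dividing by a hard-coded 5:

```python
from statistics import mean

# Hypothetical datapoints in [value, timestamp] form
datapoints = [[10.0, 1], [20.0, 2], [30.0, 3], [40.0, 4], [50.0, 5], [60.0, 6]]

# Slice first, then index: this is just the first datapoint of the slice
sliced_then_indexed = datapoints[0:5][0]

# List comprehension: the first element of each of the first five datapoints
first_elements = [p[0] for p in datapoints[0:5]]
average = mean(first_elements)

print(sliced_then_indexed, first_elements, average)
```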
If you're willing to install NumPy, then it's even easier:
import json
import numpy
json1_file = open('json1')
json1_str = json1_file.read()
json1_data = json.loads(json1_str)[0]
datapoints = numpy.array(json1_data['datapoints'])
avg = datapoints[0:5,0].mean()
# avg is now 35.8
Using the , operator with the slicing syntax for NumPy's arrays has the behavior you were originally expecting with the list slices.
Here is a simple snippet that reads a JSON text file into a dictionary. Note that your JSON file must follow the JSON standard, so it has to use " double quotes rather than ' single quotes.
Your JSON dump.txt File:
{"test":"1", "test2":123}
Python Script:
import json
with open('/your/path/to/a/dict/dump.txt') as handle:
    dictdump = json.loads(handle.read())
You can use the following:
import json
with open('<yourFile>.json', 'r') as JSON:
    json_dict = json.load(JSON)
# Now you can use it like a dictionary
# For example:
print(json_dict["username"])
The best way to load JSON data into a dictionary is to use the built-in json loader.
Below is a sample snippet that can be used.
import json
f = open("data.json")
data = json.load(f)
f.close()
type(data)
print(data[<keyFromTheJsonFile>])
I am working with Python code for a REST API, so this is for those who are working on similar projects.
I extract data from a URL using a POST request and the raw output is JSON. For some reason the output is already a dictionary, not a list, and I'm able to refer to the nested dictionary keys right away, like this:
datapoint_1 = json1_data['datapoints']['datapoint_1']
where datapoint_1 is inside the datapoints dictionary.
Pass the data from JavaScript with an AJAX GET request:
// JavaScript function
function addnewcustomer() {
    // This function runs when the button is clicked.
    // Get the values from the input boxes using getElementById.
    var new_cust_name = document.getElementById("new_customer").value;
    var new_cust_cont = document.getElementById("new_contact_number").value;
    var new_cust_email = document.getElementById("new_email").value;
    var new_cust_gender = document.getElementById("new_gender").value;
    var new_cust_cityname = document.getElementById("new_cityname").value;
    var new_cust_pincode = document.getElementById("new_pincode").value;
    var new_cust_state = document.getElementById("new_state").value;
    var new_cust_contry = document.getElementById("new_contry").value;
    // Create a JavaScript object (the equivalent of a Python dictionary).
    var data = {"cust_name":new_cust_name, "cust_cont":new_cust_cont, "cust_email":new_cust_email, "cust_gender":new_cust_gender, "cust_cityname":new_cust_cityname, "cust_pincode":new_cust_pincode, "cust_state":new_cust_state, "cust_contry":new_cust_contry};
    // Serialize the object to a JSON string.
    data = JSON.stringify(data);
    // Send the data to the server with an AJAX request.
    var send_data = new XMLHttpRequest();
    // encodeURIComponent keeps characters such as & and # in the JSON from breaking the query string.
    send_data.open("GET", "http://localhost:8000/invoice_system/addnewcustomer/?customerinfo=" + encodeURIComponent(data), true);
    // Register the handler before sending the request.
    send_data.onreadystatechange = function() {
        if (send_data.readyState == 4 && send_data.status == 200) {
            alert(send_data.responseText);
        }
    };
    send_data.send();
}
Django view:
import json
from django.http import HttpResponse

def addNewCustomer(request):
    # If the request method is GET, read the JSON sent by the JavaScript AJAX call.
    if request.method == "GET":
        # This line reads the JSON string from the query string.
        cust_info = request.GET.get("customerinfo")
        # JSON is a set of key-value pairs, so parse it once with json.loads
        # and then look up each key that was set on the JavaScript side.
        cust = json.loads(cust_info)
        cust_name = cust['cust_name']
        cust_cont = cust['cust_cont']
        cust_email = cust['cust_email']
        cust_gender = cust['cust_gender']
        cust_cityname = cust['cust_cityname']
        cust_pincode = cust['cust_pincode']
        cust_state = cust['cust_state']
        cust_contry = cust['cust_contry']
        # Print the values on the server.
        print(cust_name)
        print(cust_cont)
        print(cust_email)
        print(cust_gender)
        print(cust_cityname)
        print(cust_pincode)
        print(cust_state)
        print(cust_contry)
        return HttpResponse("Yes I am reach here.")
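To illustrate what happens to the JSON on its way through the query string, here is a standard-library sketch of the same round trip without a browser or Django. The customer values are made up, and only two of the fields are shown; the point is that the JSON has to be URL-encoded on the sending side and is decoded again before json.loads sees it.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical customer data; the "&" would break an unencoded query string
customer = {"cust_name": "A & B Traders", "cust_pincode": "12345"}

# Sending side: serialize to JSON, then URL-encode it as the customerinfo parameter
url = ("http://localhost:8000/invoice_system/addnewcustomer/?"
       + urlencode({"customerinfo": json.dumps(customer)}))

# Receiving side: decode the query string, then parse the JSON once
query = parse_qs(urlsplit(url).query)
decoded = json.loads(query["customerinfo"][0])
print(decoded["cust_name"])
```

In the Django view, request.GET.get("customerinfo") plays the role of the parse_qs step here.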
I am currently trying out something which I am unsure if it is possible.
I am trying to map API values from a JSON string (which has nested values) to a database field but I wish for it to be dynamic.
In the YAML example below, the key would be the database field name and the database field value would be where to obtain the information from the JSON string ("-" delimited for nested values). I am able to read the YAML config but what I don't understand is how to translate it to python code. If it were to be dynamic I have no idea how many [] I would have to put.
YAML: (PYYAML package)
employer: "properties-employer_name"
...
employee_name: "employee"
Python Code: (Python 3.8)
json_data = {"properties": {"employer_name": "XYZ"}, "employee": "Sam"}
employer = json_data["properties"]["employer_name"]  # How do I add [] dynamically, based on how nested the value is?
employee = json_data["employee"]
Many thanks!
You could try something like this:
def get_value(data, keys):
    # Go over each key and adjust data value to current level
    for key in keys:
        data = data[key]
    return data  # Once last key is reached return value
You would get your keys by splitting on '-' if that is how you have it in your yaml so in my example I just saved the value to a string and did it this way:
employer = "properties-employer_name"
keys = employer.split('-') # Gives us ['properties', 'employer_name']
Now we can call our get_value function defined above:
get_value(json_data, keys)
Which returns 'XYZ'
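To apply this to the whole config at once, you could loop over the mapping. The sketch below uses a plain dictionary standing in for what the YAML file would load to (an assumption, so that it runs without PyYAML), and resolve_path is a hypothetical variant of get_value that does the split itself:

```python
from functools import reduce


def resolve_path(data, path, sep="-"):
    """Follow a sep-delimited path of keys into nested dictionaries."""
    return reduce(lambda level, key: level[key], path.split(sep), data)


# Stand-in for the loaded YAML config: database field name -> path into the JSON
mapping = {
    "employer": "properties-employer_name",
    "employee_name": "employee",
}
json_data = {"properties": {"employer_name": "XYZ"}, "employee": "Sam"}

# Build one database row by resolving every configured path
row = {field: resolve_path(json_data, path) for field, path in mapping.items()}
print(row)
```

One limitation to keep in mind: a key that itself contains "-" cannot be expressed with this delimiter.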
I have data that look like this:
data = 'somekey:value4thekey&second-key:valu3-can.be?anything&third_k3y:it%can have spaces;too'
In a nice human-readable way it would look like this:
somekey : value4thekey
second-key : valu3-can.be?anything
third_k3y : it%can have spaces;too
How should I parse the data so when I do data['somekey'] I would get >>> value4thekey?
Note: The & is connecting all of the different items
How I am currently tackling it
Currently, I use this ugly solution:
parts = data.split('&')
for i in parts:
    if i.startswith('somekey'):
        print(i)
This solution is very bad due to multiple obvious limitations. It would be much better if I can somehow parse it into a python tree object.
I'd split the string by & to get a list of key-value strings, and then split each such string by : to get key-value pairs. Using dict and list comprehensions actually makes this quite elegant:
result = {k:v for k, v in (part.split(':') for part in data.split('&'))}
You can parse your data directly to a dictionary: split on the item separator &, then split again on the key-value separator ":":
table = {
    key: value for key, value in
    (item.split(':') for item in data.split('&'))
}
This allows you direct access to elements, e.g. as table['somekey'].
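One caveat with splitting on every colon: a value that itself contains ":" would make the unpacking fail. A small variation, assuming keys never contain a colon, is to split only at the first one with maxsplit=1. The tricky string below is a made-up example, not from the question:

```python
data = 'somekey:value4thekey&second-key:valu3-can.be?anything&third_k3y:it%can have spaces;too'

# split(':', 1) splits at the first colon only, so each item yields exactly two parts
table = dict(item.split(':', 1) for item in data.split('&'))
print(table['somekey'])

# Hypothetical input whose value contains a colon
tricky = dict(item.split(':', 1) for item in 'url:http://example.com&k:v'.split('&'))
print(tricky['url'])
```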
If you don't have objects within a value, you can parse it into a dictionary:
structure = {}
for ele in data.split('&'):
    ele_split = ele.split(':')
    structure[ele_split[0]] = ele_split[1]
You can now use structure to get the values:
print(structure["somekey"])
# prints value4thekey
Since the items share the common format key:value, you can use the separators as parameters to split on:
for i in data.split("&"):
    print(i.split(":"))
This prints a two-item list for each pair, with the key at index 0 and the value at index 1. Iterate through these lists and load them into a dictionary. You should be good!
I'd format the data as YAML and parse the YAML:
import re
import yaml
data = 'somekey:value4thekey&second-key:valu3-can.be?anything&third_k3y:it%can have spaces;too'
yaml_data = re.sub('[:]', ': ', re.sub('[&]', '\n', data))
# safe_load avoids constructing arbitrary Python objects
y = yaml.safe_load(yaml_data)
for k in y:
    print("%s : %s" % (k, y[k]))
Here's the output (the key order may differ depending on the Python version):
third_k3y : it%can have spaces;too
somekey : value4thekey
second-key : valu3-can.be?anything
I am currently in the process of using python to transmit a python dictionary from one raspberry pi to another over a 433Mhz link, using virtual wire (vw.py) to send data.
The issue with vw.py is that data being sent is in string format.
I am successfully receiving the data on PI_no2, and now I am trying to reformat the data so it can be placed back in a dictionary.
I have created a small snippet to test with, and created a temporary string in the same format it is received as from vw.py
So far I have successfully split the string at the colon, and I am now trying to get rid of the double quotes, without much success.
my_status = {}
# temp is in the format the data is received
temp = "'mycode':['1','2','firstname','Lastname']"
key, value = temp.split(':')
print(key)
print(value)
key = key.replace("'", '')
value = value.replace("'", '')
my_status.update({key: value})
print(my_status)
Gives the result
'mycode'
['1','2','firstname','Lastname']
{'mycode': '[1,2,firstname,Lastname]'}
I require the value to be in the format
['1','2','firstname','Lastname']
but the replace call gets rid of all the single quotes.
You can use ast.literal_eval
import ast
temp = "'mycode':['1','2','firstname','Lastname']"
key,value = map(ast.literal_eval, temp.split(':'))
status = {key: value}
Will output
{'mycode': ['1', '2', 'firstname', 'Lastname']}
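A related variation, offered as a sketch: since the received string already looks like the inside of a dict literal, you could wrap it in braces and let ast.literal_eval build the whole dictionary in one call. This assumes the sender always produces valid Python literal syntax, and it avoids splitting on ":" at all, which matters if a value ever contains a colon.

```python
import ast

temp = "'mycode':['1','2','firstname','Lastname']"

# Wrapping the received text in braces turns it into a dict literal
status = ast.literal_eval("{" + temp + "}")
print(status)
```

ast.literal_eval only evaluates literals (strings, numbers, lists, dicts and so on), so it will raise an error rather than run arbitrary code if the received text is something else.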
This shouldn't be hard to solve. What you need to do is strip away the [ ] in your list string, then split by ,. Once you've done this, iterate over the elements and add them to a list. Your code should look like this:
string = "[1,2,firstname,lastname]"
string = string.strip("[")
string = string.strip("]")
values = string.split(",")
final_list = []
for val in values:
    final_list.append(val)
print(final_list)
This will print:
['1', '2', 'firstname', 'lastname']
Then take this list and insert it into your dictionary:
d = {}
d['mycode'] = final_list
The advantage of this method is that you can handle each value independently. If you need to convert 1 and 2 to int then you'll be able to do that while leaving the other two as str.
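As a sketch of that per-value handling, the loop can convert anything that looks like a whole number to int and leave the rest as str. Using str.isdigit() as the test is my own assumption; it does not cover negative numbers or decimals.

```python
raw = "[1,2,firstname,lastname]"

# strip("[]") removes both bracket characters from the ends in one call
values = raw.strip("[]").split(",")

# Convert digit-only strings to int, keep everything else as str
final_list = [int(val) if val.isdigit() else val for val in values]

d = {}
d['mycode'] = final_list
print(d)
```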
As an alternative to cricket_007's suggestion of using a syntax tree parser: your format is very similar to the standard YAML format. YAML is pretty lightweight and intuitive, so I'll suggest it:
import yaml
a = "'mycode':['1','2','firstname','Lastname']"
print(yaml.safe_load(a.replace(":", ": ")))
# prints the dictionary {'mycode': ['1', '2', 'firstname', 'Lastname']}
The only difference between your format and YAML is that the colon needs a space after it.
It will also distinguish between primitive data types for you, if that's important. Drop the quotes around 1 and 2 and it determines that they're numerical.
Tadhg McDonald-Jensen suggested pickling in the comments. This will allow you to store more complicated objects, though you may lose the human-readable format you've been experimenting with
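For reference, a minimal sketch of the pickling idea. pickle.dumps produces bytes, so this example also base64-encodes them into plain text on the assumption that the link can only carry strings; whether vw.py needs that step is not something I have checked.

```python
import base64
import pickle

status = {"mycode": ["1", "2", "firstname", "Lastname"]}

# Sender: pickle to bytes, then base64-encode into an ASCII string
payload = base64.b64encode(pickle.dumps(status)).decode("ascii")

# Receiver: reverse both steps to get the dictionary back
restored = pickle.loads(base64.b64decode(payload))
print(restored)
```

Only unpickle data from a sender you trust, since loading a pickle can execute arbitrary code.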