I have a file that contains information similar to the example shown below. I would like to know if there is a way to read this file using python and use it as json. I am aware of how to open it and process the text but is there another way?
Example:
key={{name='foo', etc..},{},{},{}}
If you want your content to be treated as JSON, it needs to be valid JSON syntax:
{"key":[{"name": "foo"},{},{},{}]}
Then you can use the json library to convert it to a dictionary:
>>> import json
>>> json.loads('{"key":[{"name": "foo"},{},{},{}]}')
{u'key': [{u'name': u'foo'}, {}, {}, {}]}
Additionally, if your file content cannot be changed to a correct json syntax, then you will need to develop a parser for your specific case.
After searching through a few different languages, I determined that the data structure was a Lua container. I found a simple Lua-Python parser, currently available at github.com/SirAnthony/slpp, and was able to parse (decode) the Lua data structure. Below is an example of parsing a Lua container with slpp.py. I hope this helps anyone with a similar problem.
>>> from slpp import slpp as lua
>>> data = lua.decode('{ array = { 65, 23, 5 }, dict = { string = "value", array = { 3, 6, 4}, mixed = { 43, 54.3, false, string = "value", 9 } } }')
>>> print data
{'array': [65, 23, 5], 'dict': {'mixed': {0: 43, 1: 54.3, 2: False, 4: 9, 'string': 'value'}, 'array': [3, 6, 4], 'string': 'value'}}
This is probably a basic question to most, but I am looking to get specific values that follow certain words within a large amount of text, like so: (https://character-service.dndbeyond.com/character/v5/character/00000001)
In this instance if I wanted to take the value after baseHitPoints and also the value after removedHitPoints how would I go about this in the simplest way? The aim is to take these values and any others, even potentially text after a certain word, and store them as strings/integers to use in calculations.
I am also using BeautifulSoup to get this information, if that affects anything in your answers.
Thanks in advance!
Tried a few different approaches.
Learning JSON document structure
First of all, you don't actually need BeautifulSoup here. The URL points to a JSON document, which is easy to navigate and get values from once you know its structure.
To examine the structure of a JSON document, I can recommend browser extensions that prettify JSON, such as JSON lite for Opera. This is not the best approach if the JSON document is very large: the browser will struggle to render that much text, and the extension will struggle to format it. If the document is large, use the second approach.
You can download the JSON document and use a text editor with a JSON-prettifying plugin (such as Notepad++ with the JSFormat plugin).
You can also examine the structure of a JSON document directly in Python using the pretty-printing library pprint (this is also not the best option for large documents).
import pprint
# Named json_doc rather than json so it does not shadow the json module
json_doc = {"data": {"actions": {"items": [1, 2, 3], "colors": ['red', 'green', 'blue'], "items1": [1, 2, 3], "items2": {"numbers": [1, 2, 3], "characters": ['a', 'b', 'c']}}, "some_other_data": ['one', 'two', 'three']}, "new_data": [11, 22, 33, 44, 55]}
pprint.pprint(json_doc)
This prints the same data in a more readable layout, so you can easily see how to retrieve a given value from the JSON:
{'data': {'actions': {'colors': ['red', 'green', 'blue'],
                      'items': [1, 2, 3],
                      'items1': [1, 2, 3],
                      'items2': {'characters': ['a', 'b', 'c'],
                                 'numbers': [1, 2, 3]}},
          'some_other_data': ['one', 'two', 'three']},
 'new_data': [11, 22, 33, 44, 55]}
Retrieving data from your URL
As we can see from the JSON document in your URL, JSON-path to baseHitPoints is: data -> baseHitPoints. JSON-path to removedHitPoints is: data -> removedHitPoints.
Knowing that, we can now easily retrieve those values using the requests library:
import requests
URL = "https://character-service.dndbeyond.com/character/v5/character/00000001"
req = requests.get(URL)
json_data = req.json()
baseHitPoints = json_data['data']['baseHitPoints']
removedHitPoints = json_data['data']['removedHitPoints']
print(f"{baseHitPoints = }\n{removedHitPoints = }")
Output:
baseHitPoints = 8
removedHitPoints = 2
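If some keys may be missing from the response, a small helper can walk the path and fall back to a default instead of raising. This is only a sketch: get_path and the sample dictionary are hypothetical names made up for illustration, and the sample merely mirrors the two fields discussed above rather than the real response.

```python
def get_path(document, *keys, default=None):
    """Follow keys/indexes through nested dicts and lists; return default if a step is missing."""
    current = document
    for key in keys:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return default
    return current


# Hypothetical sample that only mirrors the two fields discussed above.
sample = {"data": {"baseHitPoints": 8, "removedHitPoints": 2}}

base_hit_points = get_path(sample, "data", "baseHitPoints", default=0)
removed_hit_points = get_path(sample, "data", "removedHitPoints", default=0)

# The values are plain integers, so they can be used directly in calculations.
current_hit_points = base_hit_points - removed_hit_points
print(current_hit_points)
```

With a real response you would pass req.json() instead of sample.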
What you have there is JSON, not really a standard string. Because that's JSON you can get the data out of there easily by doing something like this:
import json
myData = json.loads(myJSON)  # myJSON would be the text contents of the link you provided (https://character-service.dndbeyond.com/character/v5/character/00000001)
baseHitPoints = myData["data"]["baseHitPoints"]
removedHitPoints = myData["data"]["removedHitPoints"]
The values of those variables in my example using your JSON:
baseHitPoints = 8
removedHitPoints = 2
To make your life a little easier going forward I'd look for a tutorial for how to parse through JSON.
I have a nested dictionary which I am trying to convert to a JSON file using json.dumps(). Using the following code:
import json
dictionary={'Galicia':{'ACoruña':1,'Pontevedra':2,'Lugo':3,'Ourense':4},'Asturias':{'Oviedo':5},
'Castilla':{'Leon':6,'Burgos':7,'Avila':8}}
print(dictionary)
with open('prueba.txt','w') as outfile:
    json.dump(dictionary,outfile,ensure_ascii=False,indent=4)
I get this:
{
    "Galicia": {
        "ACoruña": 1,
        "Pontevedra": 2,
        "Lugo": 3,
        "Ourense": 4
    },
    "Asturias": {
        "Oviedo": 5
    },
    "Castilla": {
        "Leon": 6,
        "Burgos": 7,
        "Avila": 8
    }
}
But I would like to write my JSON so that each top-level key is on its own line, to make it easier to read. I would like it to look like this:
{
"Galicia": { "ACoruña": 1 , "Pontevedra": 2, "Lugo": 3, "Ourense": 4},
"Asturias": { "Oviedo": 5},
"Castilla": { "Leon": 6, "Burgos": 7, "Avila": 8}
}
Any ideas?
You might find that hard to achieve. Since JSON is used so much with JS, a lot of the formatters adopt K&R style output.
You could read up and see if there is a way to override the formatter with your own custom one, but that will likely be a fair amount of work.
Try this below:
import json

dictionary = {'Galicia': {'ACoruña': 1, 'Pontevedra': 2, 'Lugo': 3, 'Ourense': 4}, 'Asturias': {'Oviedo': 5},
              'Castilla': {'Leon': 6, 'Burgos': 7, 'Avila': 8}}
print(dictionary)
with open('prueba.txt', 'w', encoding='utf-8') as outfile:
    # Serialize each top-level key and its value on one line so the file stays valid JSON
    lines = ['    {0}: {1}'.format(json.dumps(key, ensure_ascii=False),
                                   json.dumps(value, ensure_ascii=False))
             for key, value in dictionary.items()]
    outfile.write('{\n' + ',\n'.join(lines) + '\n}')
I am using python's json to produce a json file and cannot manage to format it the way I want.
What I have is a dictionary with some keys, and each key has a list of numbers attached to it:
out = {"a": [1,2,3], "b": [4,5,6]}
What I want to do is produce a JSON string where each list is in its own line, like so:
{
"a": [1,2,3],
"b": [4,5,6]
}
However, I can only get
>>> json.dumps(out)
'{"a": [1, 2, 3], "b": [4, 5, 6]}'
which has no new lines, or
>>> print json.dumps(out, indent=2)
{
  "a": [
    1,
    2,
    3
  ],
  "b": [
    4,
    5,
    6
  ]
}
which has way too many. Is there a simple way to produce the string I want? I can do it manually, of course, but I am wondering if it is possible with json alone...
You can't do that with the json module, no. It was never the goal for the module to allow this much control over the output.
The indent option is only meant to aid debugging. JSON parsers don't care about how much whitespace is used in-between elements.
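If you do go the manual route, one possible workaround is to serialize each top-level value separately and join the pieces yourself. The sketch below is only an illustration under a few assumptions: dumps_one_value_per_line is a made-up helper name, the top-level object is a dict with string keys, and the compact separators are chosen to match the layout asked for in the question.

```python
import json


def dumps_one_value_per_line(mapping, indent=2):
    """Serialize a dict so that each top-level value stays on a single line."""
    pad = " " * indent
    lines = [
        # separators=(',', ':') drops the spaces json.dumps normally inserts
        f"{pad}{json.dumps(key)}: {json.dumps(value, separators=(',', ':'))}"
        for key, value in mapping.items()
    ]
    return "{\n" + ",\n".join(lines) + "\n}"


out = {"a": [1, 2, 3], "b": [4, 5, 6]}
text = dumps_one_value_per_line(out)
print(text)

# The result is still valid JSON and parses back to the original dict.
round_trip = json.loads(text)
```

Because every piece comes from json.dumps, escaping is still handled by the json module; only the line layout is done by hand.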
So I use the Java Debugger JSON in my python program because a few months ago I was told that this was the best way of opening a text file and making it into a dictionary and also saving the dictionary to a text file. However I am not sure how it works.
Below is how I am using it within my program:
with open("totals.txt", 'r') as f30:
    totaldict = json.load(f30)
and
with open("totals.txt", 'w') as f29:
    json.dump(totaldict, f29)
I need to explain how it works for my project so could anyone explain for me how exactly json works when loading a text file into dictionary format and when dumping contents into the text file?
Thanks.
Edit: please don't just post links to other articles as I have tried to look at these and they have offered me not much help as they are not in my context of using JSON for dictionaries and a bit overwhelming as I am only a beginner.
JSON is JavaScript Object Notation. It works in Python like it does anywhere else, by giving you a syntax for describing arbitrary things as objects.
Most JSON is primarily composed of JavaScript arrays, which look like this:
[1, 2, 3, 4, 5]
Or lists of key-value pairs describing an object, which look like this:
{"key1": "value1", "key2": "value2"}
These can also be nested in either direction:
[{"object1": "data1"}, {"object2": "data2"}]
{"object1": ["list", "of", "data"]}
Naturally, Python can very easily treat these types as lists and dicts, which is exactly what the json module tries to do.
>>> import json
>>> json.loads('[{"object1": "data1"}, {"object2": "data2"}]')
[{'object1': 'data1'}, {'object2': 'data2'}]
>>> json.dumps(_)
'[{"object1": "data1"}, {"object2": "data2"}]'
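The file-based functions from the question work the same way, just with a file object in between. Here is a rough sketch of one round trip; the dictionary contents are invented for illustration, and a temporary directory stands in for wherever totals.txt really lives.

```python
import json
import os
import tempfile

# Made-up data standing in for the real totals dictionary.
totals = {"apples": 3, "pears": 5}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "totals.txt")

    # json.dump converts the dict to JSON text and writes that text to the file object.
    with open(path, "w") as f:
        json.dump(totals, f)

    # The file now contains ordinary text that any editor could open.
    with open(path) as f:
        raw_text = f.read()

    # json.load reads the text from the file object and parses it back into a dict.
    with open(path) as f:
        loaded = json.load(f)

print(raw_text)
print(loaded == totals)
```

In other words, json.dump is json.dumps plus a write to the file, and json.load is a read from the file plus json.loads.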
Try this: Python Module of the Week
The json module provides an API similar to pickle for converting in-memory Python objects to a serialized representation known as JavaScript Object Notation (JSON). Unlike pickle, JSON has the benefit of having implementations in many languages (especially JavaScript)
Encoding and Decoding Simple Data Types
The encoder understands Python’s native types by default (string, unicode, int, float, list, tuple, dict).
import json
data = [ { 'a':'A', 'b':(2, 4), 'c':3.0 } ]
print 'DATA:', repr(data)
data_string = json.dumps(data)
print 'JSON:', data_string
Values are encoded in a manner very similar to Python’s repr() output.
$ python json_simple_types.py
DATA: [{'a': 'A', 'c': 3.0, 'b': (2, 4)}]
JSON: [{"a": "A", "c": 3.0, "b": [2, 4]}]
Encoding, then re-decoding may not give exactly the same type of object.
import json
data = [ { 'a':'A', 'b':(2, 4), 'c':3.0 } ]
data_string = json.dumps(data)
print 'ENCODED:', data_string
decoded = json.loads(data_string)
print 'DECODED:', decoded
print 'ORIGINAL:', type(data[0]['b'])
print 'DECODED :', type(decoded[0]['b'])
In particular, strings are converted to unicode and tuples become lists.
$ python json_simple_types_decode.py
ENCODED: [{"a": "A", "c": 3.0, "b": [2, 4]}]
DECODED: [{u'a': u'A', u'c': 3.0, u'b': [2, 4]}]
ORIGINAL: <type 'tuple'>
DECODED : <type 'list'>
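The examples above are written for Python 2. As a rough sketch of the same round trip on Python 3 (the sample data is made up for illustration): strings simply stay str, but tuples still come back as lists, and non-string dictionary keys come back as strings.

```python
import json

# Made-up sample with a tuple value and an integer key.
data = {"point": (2, 4), 1: "one", "ratio": 3.0}

encoded = json.dumps(data)
decoded = json.loads(encoded)

print("ENCODED:", encoded)
print("DECODED:", decoded)

# The tuple was written as a JSON array, so it is read back as a list.
print(type(data["point"]), type(decoded["point"]))

# JSON object keys are always strings, so the integer key 1 comes back as "1".
print(1 in decoded, "1" in decoded)
```

If the exact types matter, you have to convert them back yourself after loading.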
Suppose I need to have a database file consisting of a list of dictionaries:
file:
[
{"name":"Joe","data":[1,2,3,4,5]},
{ ... },
...
]
I need to have a function that receives a list of dictionaries as shown above and appends it to the file. Is there any way to achieve that, say using json (or any other method), without loading the file?
EDIT1:
Note: What I need, is to append new dictionaries to an already existing file on the disc.
You can use json to dump the dicts, one per line. Now each line is a single JSON dict that you've written. You lose the outer list, but you can add records with a simple append to the existing file.
import json

def append_record(record):
    with open('my_file', 'a') as f:
        json.dump(record, f)
        # In text mode '\n' is translated to the platform line ending automatically
        f.write('\n')

# demonstrate a program writing multiple records
for i in range(10):
    my_dict = {'number':i}
    append_record(my_dict)
The list can be assembled later
with open('my_file') as f:
    my_list = [json.loads(line) for line in f]
The file looks like
{"number": 0}
{"number": 1}
{"number": 2}
{"number": 3}
{"number": 4}
{"number": 5}
{"number": 6}
{"number": 7}
{"number": 8}
{"number": 9}
If the file must remain valid JSON, it can be done as follows:
import json
with open(filepath, mode="r+") as file:
    file.seek(0, 2)
    position = file.tell() - 1
    file.seek(position)
    file.write(",{}]".format(json.dumps(dictionary)))
This opens the file for both reading and writing. It then seeks to the end of the file (zero bytes from the end) to find the end position (relative to the beginning of the file) and steps back one byte, which in a JSON file is expected to be the character ]. Finally, it appends the new dictionary to the structure, overwriting the last character of the file and keeping it valid JSON. It does not read the file into memory. Tested with both ANSI and UTF-8 encoded files in Python 3.4.3 with small and huge (5 GB) dummy files.
A variation, if you also have os module imported:
import os, json
with open(filepath, mode="r+") as file:
    file.seek(os.stat(filepath).st_size - 1)
    file.write(",{}]".format(json.dumps(dictionary)))
It uses the file's size in bytes to seek to the position one byte before the end (as in the previous example).
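Both snippets assume the file ends exactly with ] and that the list already has at least one element. A slightly more defensive variant might look like the sketch below. It is an illustration only: append_to_json_array is a made-up helper name, the file is opened in binary mode so that byte offsets are well defined, and the file is assumed to hold a single top-level JSON array.

```python
import json
import os
import tempfile


def append_to_json_array(path, item):
    """Append item to the JSON array stored at path without reading the whole file."""
    with open(path, "rb+") as f:
        # Walk backwards from the end, skipping trailing whitespace such as a final newline.
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        while pos > 0:
            pos -= 1
            f.seek(pos)
            char = f.read(1)
            if not char.isspace():
                break
        else:
            raise ValueError("file contains no JSON array")
        if char != b"]":
            raise ValueError("file does not end with a JSON array")

        # Look at the previous non-whitespace byte: "[" means the array is empty,
        # so no separating comma is needed.
        prev = pos
        prev_char = b""
        while prev > 0:
            prev -= 1
            f.seek(prev)
            prev_char = f.read(1)
            if not prev_char.isspace():
                break
        separator = b"" if prev_char == b"[" else b","

        # Overwrite the closing bracket with the new item and a fresh bracket.
        f.seek(pos)
        f.write(separator + json.dumps(item).encode("utf-8") + b"]")
        f.truncate()


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "db.json")
    with open(path, "w") as f:
        f.write("[]\n")
    append_to_json_array(path, {"name": "Joe", "data": [1, 2, 3]})
    append_to_json_array(path, {"name": "Ann", "data": [4, 5]})
    with open(path) as f:
        result = json.load(f)

print(result)
```

Only the last few bytes of the file are read, so the cost does not depend on how large the file already is.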
If you are looking to not actually load the file, going about this with json is not really the right approach. You could use a memory mapped file… and never actually load the file to memory -- a memmap array can open the file and build an array "on-disk" without loading anything into memory.
Create a memory-mapped array of dicts:
>>> import numpy as np
>>> a = np.memmap('mydict.dat', dtype=object, mode='w+', shape=(4,))
>>> a[0] = {'name':"Joe", 'data':[1,2,3,4]}
>>> a[1] = {'name':"Guido", 'data':[1,3,3,5]}
>>> a[2] = {'name':"Fernando", 'data':[4,2,6,9]}
>>> a[3] = {'name':"Jill", 'data':[9,1,9,0]}
>>> a.flush()
>>> del a
Now read the array, without loading the file:
>>> a = np.memmap('mydict.dat', dtype=object, mode='r')
The contents of the file are loaded into memory when the list is created, but that's not required -- you can work with the array on-disk without loading it.
>>> a.tolist()
[{'data': [1, 2, 3, 4], 'name': 'Joe'}, {'data': [1, 3, 3, 5], 'name': 'Guido'}, {'data': [4, 2, 6, 9], 'name': 'Fernando'}, {'data': [9, 1, 9, 0], 'name': 'Jill'}]
It takes a negligible amount of time (e.g. nanoseconds) to create a memory-mapped array that can index a file regardless of size (e.g. 100 GB) of the file.
Using the same approach as user3500511...
Suppose we have two lists of dictionaries (dicts, dicts2). dicts is converted to a JSON-formatted string and saved to a new file, test.json. test.json is then reopened and dicts2 is appended with the proper delimiters: the final ] is overwritten and the leading [ of the second list is replaced with a comma, so the file remains a valid JSON array.
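import json

dicts = [{ "name": "Stephen", "Number": 1 }
        ,{ "name": "Glinda", "Number": 2 }
        ,{ "name": "Elphaba", "Number": 3 }
        ,{ "name": "Nessa", "Number": 4 }]
dicts2 = [{ "name": "Dorothy", "Number": 5 }
         ,{ "name": "Fiyero", "Number": 6 }]

# Write the first list as a JSON array
with open("test.json", "w") as f:
    f.write(json.dumps(dicts))

# Reopen in binary mode (Python 3 text files do not allow seeking relative to the end),
# then overwrite the closing "]" with ",<new items>]"
with open("test.json", "rb+") as f2:
    f2.seek(-1, 2)
    f2.write(json.dumps(dicts2).replace('[', ',', 1).encode("utf-8"))

# Read the result back to check it
with open("test.json", "r") as f3:
    print(f3.read())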