how to read in json file in lines - python

I have a JSON file in the following format, where each record spans several lines:
{
"A":0,
"B":2
}{
"A":3,
"B":4
}
How can I read it into a list?

If your data is exactly in that format, we can edit it into valid JSON.
import json
source = '''\
{
"A":0,
"B":2
}{
"A":3,
"B":4
}{
"C":5,
"D":6
}
'''
fixed = '[' + source.replace('}{', '},{') + ']'
lst = json.loads(fixed)
print(lst)
output
[{'A': 0, 'B': 2}, {'A': 3, 'B': 4}, {'C': 5, 'D': 6}]
This relies on each record being separated by '}{'. If that's not the case, we can use regex to do the search & replace operation.
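As an alternative to search and replace, here is a minimal sketch that walks through the text with the standard library's json.JSONDecoder.raw_decode, which parses one value starting at a given index and reports where it stopped. The helper name iter_json_objects is hypothetical, and the sketch assumes the whole input fits in memory as a string.

```python
import json

def iter_json_objects(text):
    """Yield each top-level JSON value found back to back in text."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # Returns the parsed value and the index just past it.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

source = '{\n"A":0,\n"B":2\n}{\n"A":3,\n"B":4\n}\n'
records = list(iter_json_objects(source))
print(records)
```

This does not depend on the exact '}{' separator, so it should also cope with records separated by newlines or spaces.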

Add [ and ] around your input and try this:
import json
with open('data.json') as data_file:
    data = json.load(data_file)
print(data)
This code returns this line
[{'A': 0, 'B': 2}, {'A': 3, 'B': 4}]
when I put this data into the file:
[
{
"A":0,
"B":2
},{
"A":3,
"B":4
}
]
If you can't edit the file data.json, you can read string from this file, add [ and ] around this string, and call json.loads().
Update: I see now that I added a comma separator between the JSON objects, so my code does not work on the original input. But maybe it would be better to modify whatever generates this file so that it adds the comma separator?

Untested
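import io
import pandas as pd
s = '{"A":0,"B":2}{"A":3,"B":4}'
# read_json cannot parse back-to-back objects; put one record per line and use lines=True
df = pd.read_json(io.StringIO(s.replace('}{', '}\n{')), lines=True)
df.to_dict('records')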

Related

How to Convert Text in Non-structured Format to JSON Format

Content of a Sample Input Text
{'key1':'value1','msg1':"content1"} //line 1
{'key2':'value2','msg2':"content2"} //line 2
{'key3':'value3','msg3':"content3"} //line 3
Also, pointing out some notable characteristics of the input text
Lacks a proper delimiter, currently each object {...} takes a new line "\n"
Contains single quotes, which can be an issue since JSON (the expected output) accepts only double quotes
Does not have the opening and closing curly brackets required by JSON
Expected Output JSON
{
{
"key1":"value1",
"msg1":"content1"
},
{
"key2":"value2",
"msg2":"content2"
},
{
"key3":"value3",
"msg3":"content3"
}
}
What I have tried, but failed
json.dumps(input_text), but it cannot identify "\n" as the "delimiter"
Appending a comma at the end of each object {...}, but encountered the issue of extra comma when it comes to the last object
If you have one dictionary per line, you can replace the newlines with commas and enclose the whole text in square brackets [ ] (which gives you a list of dictionaries).
You can then use ast.literal_eval to read your file as a list of dictionaries.
Finally, export it to JSON:
import json
import ast
with open("file.txt", "r") as f:
    dic_list = ast.literal_eval("[" + f.read().replace('\n',',') + "]")
print(json.dumps(dic_list, indent=4))
Output:
[
    {
        "key1": "value1",
        "msg1": "content1"
    },
    {
        "key2": "value2",
        "msg2": "content2"
    },
    {
        "key3": "value3",
        "msg3": "content3"
    }
]
Just use ast
import ast
with open('test.txt') as f:
    data = [ast.literal_eval(l.strip()) for l in f.readlines()]
print(data)
output
[{'key1': 'value1', 'msg1': 'content1'}, {'key2': 'value2', 'msg2': 'content2'}, {'key3': 'value3', 'msg3': 'content3'}]
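To get from that list to actual JSON text while tolerating blank lines in the input, something like the following sketch may help. The function name lines_to_json and the sample lines are made up for illustration; the sketch assumes every non-blank line is a valid Python dict literal.

```python
import ast
import json

def lines_to_json(lines):
    """Parse one Python-literal dict per non-blank line and return JSON text."""
    records = [ast.literal_eval(line.strip()) for line in lines if line.strip()]
    return json.dumps(records, indent=4)

# Sample lines standing in for the contents of the input file.
sample = [
    "{'key1':'value1','msg1':\"content1\"}\n",
    "\n",
    "{'key2':'value2','msg2':\"content2\"}\n",
]
result = lines_to_json(sample)
print(result)
```

An open file object can be passed in place of the sample list, since iterating over a file yields its lines.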

Python3 unpickle a string representation of bytes object

Is there a good way to load a bytes object that is represented as a string, so it can be unpickled?
Basic Example
Here is a dumb example:
import pickle
mydict = { 'a': 1111, 'b': 2222 }
string_of_bytes_obj = str(pickle.dumps(mydict)) # Deliberate string representation for this quick example.
unpickled_dict = pickle.loads(string_of_bytes_obj) # ERROR! Loads takes bytes-like object and not string.
Attempt at a Solution
One solution is of course to eval the string:
unpickled_dict = pickle.loads(eval(string_of_bytes_obj))
But it seems wrong to use eval, especially when the strings might be coming over a network or from a file.
...
Any suggestions for a better solution?
Thanks!
For safety, you can use ast.literal_eval instead of eval:
>>> import ast
>>> pickle.loads(ast.literal_eval(string_of_bytes_obj))
{'b': 2222, 'a': 1111}
You can use encoding="latin1" as an argument to str and then use bytes to convert back:
import pickle
mydict = { 'a': 1111, 'b': 2222 }
string_of_bytes_obj = str(pickle.dumps(mydict), encoding="latin1")
unpickled_dict = pickle.loads(bytes(string_of_bytes_obj, "latin1"))
Output:
>>> print(unpickled_dict)
{'a': 1111, 'b': 2222}
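Another common way to carry pickled bytes through a text-only channel is base64, sketched below under the assumption that both ends agree on the encoding. Note that pickle.loads can execute arbitrary code, so this is only appropriate for data from a trusted source, whichever text encoding is used.

```python
import base64
import pickle

mydict = {'a': 1111, 'b': 2222}

# Encode the pickled bytes as plain ASCII text.
encoded = base64.b64encode(pickle.dumps(mydict)).decode('ascii')

# Reverse the steps: text -> bytes -> original object.
restored = pickle.loads(base64.b64decode(encoded))
print(restored)
```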
Is there a reason you need to have it as a str? If you're just writing it to a file, you can open the file in 'wb' mode instead of 'w'. (https://pythontips.com/2013/08/02/what-is-pickle-in-python/)
import pickle
mydict = { 'a': 1111, 'b': 2222 }
dumped = pickle.dumps(mydict)
string_of_bytes_obj = str(dumped) # Deliberate string representation for this quick example.
unpickled_dict = pickle.loads(dumped)
First of all, I wouldn't use pickle to serialize data; use JSON instead.
My solution with pickle:
import pickle
mydict = { 'a': 1111, 'b': 2222 }
string_of_bytes_obj = pickle.dumps(mydict)  # Kept as bytes here; no str() conversion.
print(string_of_bytes_obj)
unpickled_dict = pickle.loads(string_of_bytes_obj)
print(unpickled_dict)
But with json:
import json
mydict = { 'a': 1111, 'b': 2222 }
string_of_bytes_obj = json.dumps(mydict)
print(string_of_bytes_obj)
unpickled_dict = json.loads(string_of_bytes_obj)
print(unpickled_dict)
I highly recommend using json to serialize your data.

Python - How to insert an element at the top of a dictionary?

I know there is no order in a Python dictionary, but I have a JSON file that contains dictionaries, and I want to insert an element into every dictionary in such a way that when I open the JSON file, I see the added element at the top.
Example: the JSON file contains:
{"b":2, "c":3}
Should be like this after adding:
{"a":1, "b":2, "c":3}
You can use a little trick with OrderedDict to do this.
Step 1: Have your old data stored somewhere.
import json
s = {"b":2, "c":3}
json.dump(s, open('file.json', 'w'))
The data looks like this: '{"b": 2, "c": 3}'
Step 2: When adding new data, use an OrderedDict to load your existing data.
from collections import OrderedDict
new_data = OrderedDict({'a' : 1})
# json.load (not json.loads) reads from a file object
with open('file.json') as f:
    new_data.update(json.load(f, object_pairs_hook=OrderedDict))
with open('file.json', 'w') as f:
    json.dump(new_data, f)
And now, the data looks like this: '{"a": 1, "b": 2, "c": 3}'
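On Python 3.7 and later, plain dicts preserve insertion order, so the same effect can probably be had without OrderedDict. A minimal sketch, using an in-memory string in place of the file:

```python
import json

# Stand-in for the existing file contents.
old = json.loads('{"b": 2, "c": 3}')

# Build a new dict with the new key first, then the old items in their original order.
new = {"a": 1, **old}

text = json.dumps(new)
print(text)
```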

How to save a dictionary into a file, keeping nice format?

If I have a dictionary like:
{
"cats": {
"sphinx": 3,
"british": 2
},
"dogs": {}
}
And try to save it to a text file, I get something like this:
{"cats": {"sphinx": 3}, {"british": 2}, "dogs": {}}
How can I save a dictionary in a pretty format, so that it is easy for a human to read?
You can import json and specify an indent level:
import json
d = {
"cats": {
"sphinx": 3,
"british": 2
},
"dogs": {}
}
j = json.dumps(d, indent=4)
print(j)
{
    "cats": {
        "sphinx": 3,
        "british": 2
    },
    "dogs": {}
}
Note that this is a string, however:
>>> j
'{\n    "cats": {\n        "sphinx": 3,\n        "british": 2\n    },\n    "dogs": {}\n}'
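Since the question is about saving to a file, here is a short sketch that writes the indented JSON straight to disk with json.dump and reads it back. The file name pets.json and the use of the temp directory are only for illustration.

```python
import json
import os
import tempfile

d = {"cats": {"sphinx": 3, "british": 2}, "dogs": {}}

# Hypothetical output location; any writable path works.
path = os.path.join(tempfile.gettempdir(), "pets.json")

# json.dump writes directly to the file object, indented by 4 spaces.
with open(path, "w") as f:
    json.dump(d, f, indent=4)

# Reading it back gives an equal dictionary.
with open(path) as f:
    loaded = json.load(f)
print(loaded)
```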
You can use pprint for that:
import pprint
pprint.pformat(thedict)
If you want to save it in a more standard format, you can also use, for example, a yaml file (and the related python package http://pyyaml.org/wiki/PyYAMLDocumentation), and the code would look like:
import yaml
dictionary = {"cats": {"sphinx": 3, "british": 2}, "dogs": {}}
with open('dictionary_file.yml', 'w') as yaml_file:
    yaml.dump(dictionary, stream=yaml_file, default_flow_style=False)
dump creates a string in the YAML format to be written to the file. Note that you can specify the stream and write the content to the file immediately. If you need the string for some reason before writing it to the file, leave out the stream argument and write the returned string afterwards with the file's write method.
Note also that the parameter default_flow_style gives a nicer format; in this example the file looks like this:
cats:
  british: 2
  sphinx: 3
dogs: {}
To load the YAML file back into a dictionary:
import yaml
with open('dictionary_file.yml', 'r') as yaml_file:
    dictionary = yaml.safe_load(yaml_file)
You can dump it by using the Python Object Notation module (pon: disclaimer I am the author of that module)
from pon import PON, loads
data = {
"cats": {
"sphinx": 3,
"british": 2
},
"dogs": {}
}
pon = PON(obj=data)
pon.dump()
which gives:
dict(
cats=dict(
sphinx=3,
british=2,
),
dogs=dict( ),
)
which again is valid Python, but uses dict(...) keyword arguments in place of the quoted strings needed for keys.
You can load this again with:
read_back = loads(open('file_name.pon').read())
print(read_back)
giving:
{'cats': {'sphinx': 3, 'british': 2}, 'dogs': {}}
Please note that loads() does not evaluate the string, it actually parses it safely using python's built-in parser.
PON also allows you to load Python dictionaries from files that have commented entries, and dump them while preserving the comments. This is where its real usefulness comes into play.
Alternatively, if you would like something arguably more readable, like the YAML format, you can use ruamel.yaml and do:
import ruamel.yaml
ruamel.yaml.round_trip_dump(data, stream=open('file_name.yaml', 'wb'), indent=4)
which gives you a file file_name.yaml with contents:
cats:
    sphinx: 3
    british: 2
dogs: {}
which uses the indent you seem to prefer (and is more efficient than @alberto's version)

get JSON var value in python

In Python, what is the easiest way to extract a line containing a JavaScript variable definition and get the value assigned to it? I'm scraping the JavaScript from web pages using BeautifulSoup. The value is contained within curly braces (i.e. {, }), and it may itself contain several levels of nested curly braces.
For example, with the input
var myVar = { "a": "123","b":"345", "c": {"c1":20,"c2":"c123", "c3": {"c3_1": {"c3_1_1":"12"}}}, "d":21, "e":["1","2"]}
I would like to get the complete myVar value as a string (as I want to convert this to a Python list after that),
{ "a": "123","b":"345", "c": {"c1":20,"c2":"c123", "c3": {"c3_1": {"c3_1_1":"12"}}}, "d":21, "e":["1","2"]}
Any help would be great as I am new to Python.
Use str.index to find where the JSON object starts, then re.sub (which turns a:"123" into "a":"123") together with str.replace (which changes the single quotes in ['1','2'] to double quotes) to repair the JSON:
import json
import re
var = '''var myVar = { a: "123",b:"345", c: {c1:20,c2:"c123", c3: {c3_1: {c3_1_1:"12"}}}, d:21, e:['1','2']}'''
v = var[var.index('{'):]
v = re.sub(r'(\w+):', r'"\1":', v)  # quote bare keys; \w+ never matches an empty key
v = v.replace("'", '"')  # JSON strings need double quotes
>>> v
'{ "a": "123","b":"345", "c": {"c1":20,"c2":"c123", "c3": {"c3_1": {"c3_1_1":"12"}}}, "d":21, "e":["1","2"]}'
>>> json.loads(v)
{u'a': u'123', u'c': {u'c3': {u'c3_1': {u'c3_1_1': u'12'}}, u'c2': u'c123', u'c1': 20}, u'b': u'345', u'e': [u'1', u'2'], u'd': 21}
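If the value is already valid JSON, as in the question's input, a sketch along these lines may be enough: json.JSONDecoder.raw_decode parses one value starting at a given index and ignores whatever follows, so nested braces are handled by the parser itself. The helper name extract_js_object and the trailing "var other" statement are made up for the example; unquoted keys or single quotes would still need the clean-up shown above.

```python
import json

def extract_js_object(script_text, var_name):
    """Return the JSON object assigned to `var <var_name>` in script_text."""
    marker = "var " + var_name
    # Find the first opening brace after the variable declaration.
    start = script_text.index("{", script_text.index(marker))
    # Parse exactly one JSON value; text after it is left untouched.
    obj, _end = json.JSONDecoder().raw_decode(script_text, start)
    return obj

script = ('var myVar = { "a": "123","b":"345", "c": {"c1":20,"c2":"c123", '
          '"c3": {"c3_1": {"c3_1_1":"12"}}}, "d":21, "e":["1","2"]}; var other = 1;')
my_var = extract_js_object(script, "myVar")
print(my_var["c"]["c3"])
```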
import json
a = json.dumps(myVar)
The variable a is of the type string in this example. You can manipulate it as you like.
