Is there a good way to load a bytes object that is represented as a string, so it can be unpickled?
Basic Example
Here is a dumb example:
import pickle
mydict = { 'a': 1111, 'b': 2222 }
string_of_bytes_obj = str(pickle.dumps(mydict)) # Deliberate string representation for this quick example.
unpickled_dict = pickle.loads(string_of_bytes_obj) # ERROR! Loads takes bytes-like object and not string.
Attempt at a Solution
One solution is of course to eval the string:
unpickled_dict = pickle.loads(eval(string_of_bytes_obj))
But it seems wrong to use eval, especially when the strings might come over a network or from a file.
...
Any suggestions for a better solution?
Thanks!
For safety, you can use ast.literal_eval instead of eval:
>>> import ast
>>> pickle.loads(ast.literal_eval(string_of_bytes_obj))
{'b': 2222, 'a': 1111}
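A minimal sketch of why this is safer, assuming the string came from calling str() on a bytes object: ast.literal_eval only accepts Python literals, so anything containing a function call is rejected instead of being executed. The `malicious` string is a made-up example, not something from the question.

```python
import ast
import pickle

mydict = {'a': 1111, 'b': 2222}
string_of_bytes_obj = str(pickle.dumps(mydict))  # text that looks like "b'...'"

# A bytes literal is a plain Python literal, so literal_eval returns real bytes.
raw = ast.literal_eval(string_of_bytes_obj)
restored = pickle.loads(raw)

# Hypothetical hostile input: a function call is not a literal, so it is rejected.
malicious = "__import__('os').getcwd()"
try:
    ast.literal_eval(malicious)
    rejected = False
except ValueError:
    rejected = True

print(restored, rejected)
```

Note that literal_eval only protects the string-to-bytes step. pickle.loads itself can still run arbitrary code if the bytes come from an untrusted source.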
You can use encoding="latin1" as an argument to str and then use bytes to convert back:
import pickle
mydict = { 'a': 1111, 'b': 2222 }
string_of_bytes_obj = str(pickle.dumps(mydict), encoding="latin1")
unpickled_dict = pickle.loads(bytes(string_of_bytes_obj, "latin1"))
Output:
>>> print(unpickled_dict)
{'a': 1111, 'b': 2222}
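If the string has to travel through something that is ASCII-only (JSON, logs, a text column), another option is Base64. This is a sketch of an alternative, not what the answer above does; the variable name `text` is just for illustration.

```python
import base64
import pickle

mydict = {'a': 1111, 'b': 2222}

# Encode the pickle bytes as ASCII-only text.
text = base64.b64encode(pickle.dumps(mydict)).decode('ascii')

# Reverse the two steps to get the original object back.
unpickled_dict = pickle.loads(base64.b64decode(text))
print(unpickled_dict)
```

The Base64 text is about a third larger than the raw bytes, but it contains no escape sequences or control characters.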
Is there a reason you need to have it as a str? If you're just writing it to a file, you can open the file with 'wb' instead of 'w'. (https://pythontips.com/2013/08/02/what-is-pickle-in-python/)
import pickle
mydict = { 'a': 1111, 'b': 2222 }
dumped = pickle.dumps(mydict)
string_of_bytes_obj = str(dumped) # Deliberate string representation for this quick example.
unpickled_dict = pickle.loads(dumped)
First of all, I wouldn't use pickle to serialize data; use JSON instead.
My solution with pickle:
import pickle
mydict = { 'a': 1111, 'b': 2222 }
string_of_bytes_obj = pickle.dumps(mydict) # Keep the pickled data as bytes; no str() conversion.
print(string_of_bytes_obj)
unpickled_dict = pickle.loads(string_of_bytes_obj)
print(unpickled_dict)
BUT with json
import json
mydict = { 'a': 1111, 'b': 2222 }
string_of_bytes_obj = json.dumps(mydict)
print(string_of_bytes_obj)
unpickled_dict = json.loads(string_of_bytes_obj)
print(unpickled_dict)
I highly recommend using json to serialize your data.
Related
I have the following string:
txt = "{\'legs_a\': 1,\'legs_b\': 0,\'score_a\': 304,\'score_b\': 334,\'turn\': B,\'i\': 2,\'z\': 19}"
When I print it, I see the below output in my console
{'legs_a': 1,'legs_b': 0,'score_a': 304,'score_b': 334,'turn': B,'i': 2,'z': 19}
I want to make a dictionary of the string by using ast.literal_eval()
import ast
d = ast.literal_eval(txt)
This yields the following error:
{ValueError}malformed node or string: <_ast.Name object at 0x7fb8b8ab4fa0>
Can you explain what's going wrong? How can I make a dictionary from the string? Thanks
B is either an undefined variable or an unquoted string.
Try quoting it:
txt = "{\'legs_a\': 1,\'legs_b\': 0,\'score_a\': 304,\'score_b\': 334,\'turn\': \'B\',\'i\': 2,\'z\': 19}"
d = ast.literal_eval(txt)
print(d)
Output:
{'legs_a': 1, 'legs_b': 0, 'score_a': 304, 'score_b': 334, 'turn': 'B', 'i': 2, 'z': 19}
Note that if you want to deserialize the string with json.loads() instead, you need to replace the single quotes with double quotes first.
data = json.loads(txt.replace("'", '"'))
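One caveat with the quote swap, sketched below with a made-up input: if any value contains an apostrophe, the blind replace produces invalid JSON, while ast.literal_eval still parses the string because it understands Python quoting.

```python
import ast
import json

# Hypothetical input: a dict repr whose value contains an apostrophe.
txt = '''{'name': "O'Brien", 'score': 304}'''

# literal_eval follows Python quoting rules, so this parses fine.
d = ast.literal_eval(txt)

# Blindly swapping quotes breaks the string value and the JSON parser rejects it.
try:
    json.loads(txt.replace("'", '"'))
    swap_ok = True
except json.JSONDecodeError:
    swap_ok = False

print(d, swap_ok)
```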
I have a JSON file in a format like this, where each record spans several lines:
{
"A":0,
"B":2
}{
"A":3,
"B":4
}
How can I read it into a list?
If your data is exactly in that format, we can edit it into valid JSON.
import json
source = '''\
{
"A":0,
"B":2
}{
"A":3,
"B":4
}{
"C":5,
"D":6
}
'''
fixed = '[' + source.replace('}{', '},{') + ']'
lst = json.loads(fixed)
print(lst)
output
[{'A': 0, 'B': 2}, {'A': 3, 'B': 4}, {'C': 5, 'D': 6}]
This relies on each record being separated by '}{'. If that's not the case, we can use regex to do the search & replace operation.
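As an alternative to a regex, here is a sketch that lets the JSON parser itself find where each record ends, using json.JSONDecoder.raw_decode. The helper name iter_json_objects is hypothetical; it should also cope with '}{' appearing inside a string value.

```python
import json

source = '''\
{
"A":0,
"B":2
}{
"A":3,
"B":4
}
'''

def iter_json_objects(text):
    """Yield each top-level JSON value found back to back in text."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it by hand.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        # raw_decode returns the parsed value and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

lst = list(iter_json_objects(source))
print(lst)
```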
Add [ and ] around your input and try this:
import json
with open('data.json') as data_file:
    data = json.load(data_file)
print(data)
This code prints this line:
[{'A': 0, 'B': 2}, {'A': 3, 'B': 4}]
when I put this data into the file:
[
{
"A":0,
"B":2
},{
"A":3,
"B":4
}
]
If you can't edit the file data.json, you can read its contents as a string, add [ and ] around that string, and call json.loads().
Update: I see that I also added a comma separator between the JSON objects, so my code doesn't work on the original input. But maybe it would be better to modify whatever generates this file so that it adds the comma separators?
Untested
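import io
import pandas as pd
raw = '{"A":0,"B":2}{"A":3,"B":4}'  # avoid naming a variable str, which shadows the built-in
# read_json cannot parse back-to-back objects, so put one record per line and use lines=True
df = pd.read_json(io.StringIO(raw.replace('}{', '}\n{')), lines=True)
records = df.to_dict('records')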
I have a bytes type object like this:
b"{'one': 1, 'two': 2}"
I need to get the dictionary from that using python code. I am converting it into string and then converting into dictionary as follows.
string = dictn.decode("utf-8")
print(type(string))
>> <class 'str'>
d = dict(toks.split(":") for toks in string.split(",") if toks)
But I am getting the below error:
------> d = dict(toks.split(":") for toks in string.split(",") if toks)
TypeError: 'bytes' object is not callable
I think a decode is also required to get a proper dict.
a= b"{'one': 1, 'two': 2}"
ast.literal_eval(a.decode('utf-8'))
**Output:** {'one': 1, 'two': 2}
The accepted answer yields
a= b"{'one': 1, 'two': 2}"
ast.literal_eval(repr(a))
**output:** b"{'one': 1, 'two': 2}"
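literal_eval hasn't worked reliably in much of my own code, so I personally prefer the json module for this. Note that json.loads() needs double-quoted strings, so the single quotes have to be replaced first:
import json
a= b"{'one': 1, 'two': 2}"
json.loads(a.decode('utf-8').replace("'", '"'))
**Output:** {'one': 1, 'two': 2}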
All you need is ast.literal_eval. Nothing fancier than that. No reason to mess with JSON unless you are specifically using non-Python dict syntax in your string.
# python3
import ast
byte_str = b"{'one': 1, 'two': 2}"
dict_str = byte_str.decode("UTF-8")
mydata = ast.literal_eval(dict_str)
print(repr(mydata))
See answer here. It also details how ast.literal_eval is safer than eval.
You could try like this:
import json
import ast
a= b"{'one': 1, 'two': 2}"
print(json.loads(a.decode("utf-8").replace("'",'"')))
print(ast.literal_eval(a.decode("utf-8")))
Here is the documentation for the modules:
1. ast doc
2. json doc
You can use the base64 library to convert a string representation of a dictionary to bytes, and then convert the decoded bytes back to a dictionary using the json library. Try the sample code below.
import base64
import json
input_dict = {'var1' : 0, 'var2' : 'some string', 'var3' : ['listitem1','listitem2',5]}
message = str(input_dict)
ascii_message = message.encode('ascii')
output_byte = base64.b64encode(ascii_message)
msg_bytes = base64.b64decode(output_byte)
ascii_msg = msg_bytes.decode('ascii')
# The json library converts the string dictionary to a real dict.
# JSON requires double quotes around strings.
ascii_msg = ascii_msg.replace("'", "\"")
output_dict = json.loads(ascii_msg) # convert string dictionary to dict format
# Show the input and output
print("input_dict:", input_dict, type(input_dict))
print()
print("base64:", output_byte, type(output_byte))
print()
print("output_dict:", output_dict, type(output_dict))
Simple 😁
data = eval(b"{'one': 1, 'two': 2}")
If I have a dictionary like:
{
"cats": {
"sphinx": 3,
"british": 2
},
"dogs": {}
}
And try to save it to a text file, I get something like this:
{"cats": {"sphinx": 3, "british": 2}, "dogs": {}}
How can I save a dictionary in a pretty format, so that it is easy for a human to read?
You can import json and specify an indent level:
import json
d = {
"cats": {
"sphinx": 3,
"british": 2
},
"dogs": {}
}
j = json.dumps(d, indent=4)
print(j)
{
    "cats": {
        "sphinx": 3,
        "british": 2
    },
    "dogs": {}
}
Note that this is a string, however:
>>> j
'{\n "cats": {\n "sphinx": 3, \n "british": 2\n }, \n "dogs": {}\n}'
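To get the indented text into a file rather than a string, json.dump accepts the same indent argument. A minimal sketch, assuming a temporary directory and the hypothetical file name pets.json:

```python
import json
import os
import tempfile

d = {"cats": {"sphinx": 3, "british": 2}, "dogs": {}}

# Hypothetical output path inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "pets.json")

# json.dump writes the indented text directly to the open file.
with open(path, "w", encoding="utf-8") as f:
    json.dump(d, f, indent=4)

# Reading it back gives an equal dictionary.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded)
```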
You can use pprint for that:
import pprint
pprint.pformat(thedict)
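pformat only returns a string, so it still has to be written out. A sketch under the assumption that the dictionary holds only plain literals (in which case ast.literal_eval can read the file back); the file name pets.txt is made up:

```python
import ast
import os
import pprint
import tempfile

thedict = {"cats": {"sphinx": 3, "british": 2}, "dogs": {}}

# A small width forces pformat to break the dict across several lines.
text = pprint.pformat(thedict, width=30)

path = os.path.join(tempfile.mkdtemp(), "pets.txt")  # hypothetical file name
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# The file holds a Python literal, so ast.literal_eval can parse it again.
with open(path, encoding="utf-8") as f:
    loaded = ast.literal_eval(f.read())

print(text)
```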
If you want to save it in a more standard format, you can also use, for example, a yaml file (and the related python package http://pyyaml.org/wiki/PyYAMLDocumentation), and the code would look like:
import yaml
dictionary = {"cats": {"sphinx": 3, "british": 2}, "dogs": {}}
with open('dictionary_file.yml', 'w') as yaml_file:
    yaml.dump(dictionary, stream=yaml_file, default_flow_style=False)
dump creates a string in YAML format. If you pass a stream, as above, the content is written to the file immediately. If you need the string itself before writing, omit the stream argument and write the returned string to the file yourself with the file's write method.
Note also that the default_flow_style parameter gives a nicer, block-style layout; in this example the file looks like this:
cats:
  british: 2
  sphinx: 3
dogs: {}
To load the YAML file back into a dictionary:
import yaml
with open('dictionary_file.yml', 'r') as yaml_file:
    dictionary = yaml.safe_load(yaml_file)  # safe_load avoids constructing arbitrary objects
You can dump it by using the Python Object Notation module (pon: disclaimer I am the author of that module)
from pon import PON, loads
data = {
"cats": {
"sphinx": 3,
"british": 2
},
"dogs": {}
}
pon = PON(obj=data)
pon.dump()
which gives:
dict(
cats=dict(
sphinx=3,
british=2,
),
dogs=dict( ),
)
which again is valid Python, but uses dict() keyword arguments so the keys don't need to be quoted.
You can load this again with:
read_back = loads(open('file_name.pon').read())
print(read_back)
giving:
{'cats': {'sphinx': 3, 'british': 2}, 'dogs': {}}
Please note that loads() does not evaluate the string, it actually parses it safely using python's built-in parser.
PON also allows you to load Python dictionaries from files that have commented entries, and to dump them while preserving the comments. This is where its real usefulness comes into play.
Alternatively, if you would like something arguably more readable, like the YAML format, you can use ruamel.yaml and do:
import ruamel.yaml
ruamel.yaml.round_trip_dump(data, stream=open('file_name.yaml', 'wb'), indent=4)
which gives you a file file_name.yaml with contents:
cats:
    sphinx: 3
    british: 2
dogs: {}
which uses the indent you seem to prefer (and is more efficient than @alberto's version)
In other words, what's the sprintf equivalent to pprint?
The pprint module has a function named pformat, for just that purpose.
From the documentation:
Return the formatted representation of object as a string. indent,
width and depth will be passed to the PrettyPrinter constructor as
formatting parameters.
Example:
>>> import pprint
>>> people = [
... {"first": "Brian", "last": "Kernighan"},
... {"first": "Dennis", "last": "Richie"},
... ]
>>> pprint.pformat(people, indent=4)
"[ { 'first': 'Brian', 'last': 'Kernighan'},\n { 'first': 'Dennis', 'last': 'Richie'}]"
Assuming you really do mean pprint from the pretty-print library, then you want
the pprint.pformat function.
If you just mean print, then you want str()
>>> import pprint
>>> pprint.pformat({'key1':'val1', 'key2':[1,2]})
"{'key1': 'val1', 'key2': [1, 2]}"
>>>
Are you looking for pprint.pformat?
Something like this:
import io, pprint
s = io.StringIO()  # Python 3: StringIO lives in the io module
pprint.pprint(some_object, s)
print(s.getvalue()) # displays the string