str = "{ u'source_ip', u'127.0.0.1'}, { u'db_ip', u'43.53.696.23'}, { u'db_port', u'3306'}, { u'user_name', u'uz,ifls'} "
How do I convert this string to a dict?
"source_ip":"127.0.0.1","db_ip":"43.53.696.23","db_port":"3306"
I have tried
str = dict(str)
but it didn't work
Those fragments look like python sets. If you run them through ast.literal_eval you get something close, but since sets are not ordered, you can't guarantee which of the two items is the key and which is the value. This is a total hack, but I replaced the curly braces with parens so they look more tuple-like and made the dictionary from there.
>>> mystr = "{ u'source_ip', u'127.0.0.1'}, { u'db_ip', u'43.53.696.23'}, { u'db_port', u'3306'}, { u'user_name', u'uz,ifls'} "
>>> mystr = mystr.replace('{', '(').replace('}', ')')
>>> import ast
>>> mydict = dict(ast.literal_eval(mystr))
>>> mydict
{u'user_name': u'uz,ifls', u'db_port': u'3306', u'source_ip': u'127.0.0.1', u'db_ip': u'43.53.696.23'}
>>>
A few points:
The top-level data structure is actually a tuple (because in Python, 1, 2, 3 is the same as (1, 2, 3)).
As others have pointed out, the inner data structures are set literals, which are not ordered.
Set literals are implemented in Python 2.6 but not in its ast.literal_eval function, which is arguably a bug.
As it turns out, you can make your own custom literal_eval function and make it do what you want.
from _ast import *
from ast import *
# This is mostly copied from `ast.py` in your Python source.
def literal_eval(node_or_string):
    """
    Safely evaluate an expression node or a string containing a Python
    expression. The string or node provided may only consist of the following
    Python literal structures: strings, bytes, numbers, tuples, lists, dicts,
    sets, booleans, and None.
    """
    if isinstance(node_or_string, str):
        node_or_string = parse(node_or_string, mode='eval')
    if isinstance(node_or_string, Expression):
        node_or_string = node_or_string.body
    def _convert(node):
        if isinstance(node, (Str)):
            return node.s
        elif isinstance(node, Tuple):
            return tuple(map(_convert, node.elts))
        elif isinstance(node, Set):
            # ** This is the interesting change.. when
            # we see a set literal, we return a tuple.
            return tuple(map(_convert, node.elts))
        elif isinstance(node, Dict):
            return dict((_convert(k), _convert(v)) for k, v
                        in zip(node.keys, node.values))
        raise ValueError('malformed node or string: ' + repr(node))
    return _convert(node_or_string)
Then we can do:
>>> s = "{ u'source_ip', u'127.0.0.1'}, { u'db_ip', u'43.53.696.23'}, { u'db_port', u'3306'}, { u'user_name', u'uz,ifls'} "
>>> dict(literal_eval(s))
{u'user_name': u'uz,ifls', u'db_port': u'3306', u'source_ip': u'127.0.0.1', u'db_ip': u'43.53.696.23'}
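On Python 3 the same idea can be sketched without copying `literal_eval`, because `ast` keeps the elements of each set literal in source order. This is only a sketch under assumptions: `pairs_to_dict` is a hypothetical helper name, and it assumes every fragment is a two-item set literal made of plain literals.

```python
import ast


def pairs_to_dict(text):
    """Turn "{k1, v1}, {k2, v2}, ..." into a dict, keeping source order."""
    tree = ast.parse(text.strip(), mode="eval").body
    # Several fragments parse as a tuple; a single fragment parses as one set.
    nodes = tree.elts if isinstance(tree, ast.Tuple) else [tree]
    result = {}
    for node in nodes:
        if not isinstance(node, ast.Set) or len(node.elts) != 2:
            raise ValueError("expected a two-item set literal")
        # literal_eval accepts AST nodes, so only literals are evaluated.
        key, value = (ast.literal_eval(elt) for elt in node.elts)
        result[key] = value
    return result


s = "{ u'source_ip', u'127.0.0.1'}, { u'db_ip', u'43.53.696.23'}, { u'db_port', u'3306'}, { u'user_name', u'uz,ifls'} "
print(pairs_to_dict(s))
```

Reading the key and value from the AST avoids ever building a real set, so the ordering problem described above does not arise.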
I don't know if you want to convert your entire input string to a dict or not, because the output you gave confuses me.
Otherwise, my answer will give you an output like the second highlighted text you want, in dict format:
a = "{ u'source_ip', u'127.0.0.1'}, { u'db_ip', u'43.53.696.23'}, { u'db_port', u'3306'}, { u'user_name', u'uz,ifls'} "
c = a.replace("{", '').replace("}","").replace(" u'", '').replace("'", '').replace(" ", "").split(",")
d, j = {}, 0
for i in range(len(c)):
    if j +2 > len(c):
        break
    if c[j] == "user_name":
        #d[c[j]] = "uz,ifls" #uncomment this line to have a complete dict
        continue
    d[c[j]] = c[j+1]
    j += 2
Output:
print d
{'db_port': '3306', 'source_ip': '127.0.0.1', 'db_ip': '43.53.696.23'}
print type(d)
<type 'dict'>
If you want to have a complete dict of your string uncomment the line which is commented above, and the output will be:
print d
{'user_name': 'uz,ifls', 'db_port': '3306', 'source_ip': '127.0.0.1', 'db_ip': '43.53.696.23'}
print type(d)
<type 'dict'>
Related
I have seen quite a few links but mostly it gives me errors:
ValueError: Parse error: unable to parse:
'hover_data=["Confirmed","Deaths","Recovered"],
animation_frame="Date",color_continuous_scale="Portland",radius=7,
zoom=0,height=700"'
For example I want to convert the following string into a dict:
abc= 'fn=True, lat="Lat", lon="Long", hover_name="Country/Province/State",hover_data=["Confirmed","Deaths","Recovered"], animation_frame="Date",color_continuous_scale="Portland",radius=7, zoom=0,height=700"'
Expected output:
{'fn': True, "lat":"Lat",
"lon":"Long",
"hover_name":"Country/Province/State",
"hover_data":["Confirmed","Deaths","Recovered"],
"animation_frame":"Date",
"color_continuous_scale":"Portland",
"radius":7,
"zoom":0,
"height":700}
I tried to use this reference's code:
import re
keyval_re = re.compile(r'''
\s* # Leading whitespace is ok.
(?P<key>\w+)\s*=\s*( # Search for a key followed by..
(?P<str>"[^"]*"|\'[^\']*\')| # a quoted string; or
(?P<float>\d+\.\d+)| # a float; or
(?P<int>\d+) # an int.
)\s*,?\s* # Handle comma & trailing whitespace.
|(?P<garbage>.+) # Complain if we get anything else!
''', re.VERBOSE)
def handle_keyval(match):
    if match.group('garbage'):
        raise ValueError("Parse error: unable to parse: %r" %
                         match.group('garbage'))
    key = match.group('key')
    if match.group('str') is not None:
        return (key, match.group('str')[1:-1]) # strip quotes
    elif match.group('float') is not None:
        return (key, float(match.group('float')))
    elif match.group('int') is not None:
        return (key, int(match.group('int')))
    elif match.group('list') is not None:
        return (key, int(match.group('list')))
    elif match.group('bool') is not None:
        return (key, int(match.group('bool')))
print(dict(handle_keyval(m) for m in keyval_re.finditer(abc)))
There seems to be an unwanted double-quote character as the last character of your string abc.
If that is removed, the following solution will work nicely:
eval("dict(" + abc + ")")
Output:
{'fn': True,
'lat': 'Lat',
'lon': 'Long',
'hover_name': 'Country/Province/State',
'hover_data': ['Confirmed', 'Deaths', 'Recovered'],
'animation_frame': 'Date',
'color_continuous_scale': 'Portland',
'radius': 7,
'zoom': 0,
'height': 700}
⚠️ DON'T USE EVAL.
import re, ast
test_string = 'fn=True, lat="Lat", lon="Long", hover_name="Country/Province/State",hover_data=["Confirmed","Deaths","Recovered"], animation_frame="Date",color_continuous_scale="Portland",radius=7, zoom=0,height=700'
items = re.split(r', |,(?=\w)', test_string)
d = {
key: ast.literal_eval(val)
for item in items
for key, val in [re.split(r'=|\s*=\s*', item)]
}
print(d)
I used a very simple method: I split the string on commas and then used a plain dict comprehension. I also used ast.literal_eval() to convert the value strings into their respective Python values and data types.
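Another way to avoid both eval and hand-written splitting is to let Python's own parser read the string as the keyword arguments of a call and then evaluate each value with ast.literal_eval. The following is a sketch, not a definitive implementation; `parse_kwargs` is a hypothetical name, and it assumes the stray trailing double quote has already been removed.

```python
import ast


def parse_kwargs(text):
    """Parse 'a=1, b="x"' style text into a dict without calling eval()."""
    # Wrapping the text in dict(...) lets the parser split the arguments,
    # so commas inside lists or quoted strings are handled correctly.
    call = ast.parse("dict(" + text + ")", mode="eval").body
    if call.args:
        raise ValueError("only key=value pairs are supported")
    # literal_eval accepts AST nodes and rejects anything but literals.
    return {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}


test_string = 'fn=True, lat="Lat", lon="Long", hover_name="Country/Province/State",hover_data=["Confirmed","Deaths","Recovered"], animation_frame="Date",color_continuous_scale="Portland",radius=7, zoom=0,height=700'
print(parse_kwargs(test_string))
```

Nothing in the string is executed: the `dict(...)` call is only parsed, never run, and a value such as `__import__("os")` makes literal_eval raise ValueError.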
Is there a more readable way to check if a key buried in a dict exists without checking each level independently?
Let's say I need to get this value from a nested object (example taken from Wikidata):
x = s['mainsnak']['datavalue']['value']['numeric-id']
To make sure that this does not end with a runtime error, it is necessary to check every level like so:
if 'mainsnak' in s and 'datavalue' in s['mainsnak'] and 'value' in s['mainsnak']['datavalue'] and 'numeric-id' in s['mainsnak']['datavalue']['value']:
    x = s['mainsnak']['datavalue']['value']['numeric-id']
The other way I can think of to solve this is to wrap it in a try/except construct, which I feel is also rather awkward for such a simple task.
I am looking for something like:
x = exists(s['mainsnak']['datavalue']['value']['numeric-id'])
which returns True if all levels exist.
To be brief: in Python, trust that it is easier to ask for forgiveness than permission (EAFP):
try:
    x = s['mainsnak']['datavalue']['value']['numeric-id']
except KeyError:
    pass
The answer
Here is how I deal with nested dict keys:
def keys_exists(element, *keys):
    '''
    Check if *keys (nested) exists in `element` (dict).
    '''
    if not isinstance(element, dict):
        raise AttributeError('keys_exists() expects dict as first argument.')
    if len(keys) == 0:
        raise AttributeError('keys_exists() expects at least two arguments, one given.')
    _element = element
    for key in keys:
        try:
            _element = _element[key]
        except KeyError:
            return False
    return True
Example:
data = {
"spam": {
"egg": {
"bacon": "Well..",
"sausages": "Spam egg sausages and spam",
"spam": "does not have much spam in it"
}
}
}
print 'spam (exists): {}'.format(keys_exists(data, "spam"))
print 'spam > bacon (do not exists): {}'.format(keys_exists(data, "spam", "bacon"))
print 'spam > egg (exists): {}'.format(keys_exists(data, "spam", "egg"))
print 'spam > egg > bacon (exists): {}'.format(keys_exists(data, "spam", "egg", "bacon"))
Output:
spam (exists): True
spam > bacon (do not exists): False
spam > egg (exists): True
spam > egg > bacon (exists): True
It loops over the given keys in order, testing each one against the current level of the element.
I prefer this to all the variable.get('key', {}) methods I found because it follows EAFP.
The function expects to be called like: keys_exists(dict_element_to_test, 'key_level_0', 'key_level_1', 'key_level_n', ..). At least two arguments are required, the element and one key, but you can add as many keys as you want.
If you need to pass the keys as a list, you can do something like:
expected_keys = ['spam', 'egg', 'bacon']
keys_exists(data, *expected_keys)
You could use .get with defaults:
s.get('mainsnak', {}).get('datavalue', {}).get('value', {}).get('numeric-id')
but this is almost certainly less clear than using try/except.
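A short sketch of how the chained `.get` behaves may help when weighing it against try/except. The sample dictionaries and the `numeric_id` wrapper are made up for illustration. The chain returns None when a level is missing, but it raises AttributeError when an intermediate key exists and holds a non-dict value such as None.

```python
def numeric_id(s):
    """Chain .get calls with empty-dict defaults for the intermediate levels."""
    return s.get('mainsnak', {}).get('datavalue', {}).get('value', {}).get('numeric-id')


complete = {'mainsnak': {'datavalue': {'value': {'numeric-id': 42}}}}
missing = {'mainsnak': {}}
null_level = {'mainsnak': {'datavalue': None}}

print(numeric_id(complete))   # the value is found
print(numeric_id(missing))    # a missing level yields None
try:
    numeric_id(null_level)    # 'datavalue' exists, so the {} default is not used
except AttributeError as exc:
    print("AttributeError:", exc)
```

The default passed to `.get` only applies when the key is absent, so a key that is present with the value None still breaks the chain.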
Python 3.8 +
dictionary = {
"main_key": {
"sub_key": "value",
},
}
if sub_key_value := dictionary.get("main_key", {}).get("sub_key"):
    print(f"The key 'sub_key' exists in dictionary[main_key] and its value is {sub_key_value}")
else:
    print("Key 'sub_key' doesn't exist or its value is falsy")
Extra
A little but important clarification.
In the previous code block, we verify not only that a key exists in a dictionary but also that its value is truthy.
Most of the time, this is what people are really looking for, and I think this is what the OP really wants. However, it is not really the most "correct" answer, since if the key exists but its value is False, the above code block will tell us that the key does not exist, which is not true.
So, here is a more correct answer:
dictionary = {
"main_key": {
"sub_key": False,
},
}
if "sub_key" in dictionary.get("main_key", {}):
    print(f"The key 'sub_key' exists in dictionary[main_key] and its value is {dictionary['main_key']['sub_key']}")
else:
    print("Key 'sub_key' doesn't exist")
Try/except seems to be most pythonic way to do that.
The following recursive function should work (returns None if one of the keys was not found in the dict):
def exists(obj, chain):
    _key = chain.pop(0)
    if _key in obj:
        return exists(obj[_key], chain) if chain else obj[_key]
myDict ={
'mainsnak': {
'datavalue': {
'value': {
'numeric-id': 1
}
}
}
}
result = exists(myDict, ['mainsnak', 'datavalue', 'value', 'numeric-id'])
print(result)
>>> 1
I suggest you use python-benedict, a solid Python dict subclass with full keypath support and many utility methods.
You just need to cast your existing dict:
s = benedict(s)
Now your dict has full keypath support and you can check if the key exists in the pythonic way, using the in operator:
if 'mainsnak.datavalue.value.numeric-id' in s:
    # do stuff
Here the library repository and the documentation:
https://github.com/fabiocaccamo/python-benedict
Note: I am the author of this project
You can use pydash to check whether a key exists: http://pydash.readthedocs.io/en/latest/api.html#pydash.objects.has
Or get the value (you can even set a default to return if it doesn't exist): http://pydash.readthedocs.io/en/latest/api.html#pydash.objects.get
Here is an example:
>>> from pydash import get
>>> get({'a': {'b': {'c': [1, 2, 3, 4]}}}, 'a.b.c[1]')
2
The try/except way is the cleanest, no contest. However, it also counts as an exception in my IDE, which halts execution while debugging.
Furthermore, I do not like using exceptions as in-method control statements, which is essentially what is happening with the try/except.
Here is a short solution which does not use recursion, and supports a default value:
def chained_dict_lookup(lookup_dict, keys, default=None):
    _current_level = lookup_dict
    for key in keys:
        if key in _current_level:
            _current_level = _current_level[key]
        else:
            return default
    return _current_level
The accepted answer is a good one, but here is another approach. It's a little less typing and a little easier on the eyes (in my opinion) if you end up having to do this a lot. It also doesn't require any additional package dependencies like some of the other answers. Have not compared performance.
import functools
def haskey(d, path):
    try:
        functools.reduce(lambda x, y: x[y], path.split("."), d)
        return True
    except KeyError:
        return False
# Throwing in this approach for nested get for the heck of it...
def getkey(d, path, *default):
    try:
        return functools.reduce(lambda x, y: x[y], path.split("."), d)
    except KeyError:
        if default:
            return default[0]
        raise
Usage:
data = {
"spam": {
"egg": {
"bacon": "Well..",
"sausages": "Spam egg sausages and spam",
"spam": "does not have much spam in it",
}
}
}
(Pdb) haskey(data, "spam")
True
(Pdb) haskey(data, "spamw")
False
(Pdb) haskey(data, "spam.egg")
True
(Pdb) haskey(data, "spam.egg3")
False
(Pdb) haskey(data, "spam.egg.bacon")
True
Original inspiration from the answers to this question.
EDIT: a comment pointed out that this only works with string keys. A more generic approach would be to accept an iterable path param:
def haskey(d, path):
    try:
        functools.reduce(lambda x, y: x[y], path, d)
        return True
    except KeyError:
        return False
(Pdb) haskey(data, ["spam", "egg"])
True
I had the same problem, and a recent Python library popped up:
https://pypi.org/project/dictor/
https://github.com/perfecto25/dictor
So in your case:
from dictor import dictor
x = dictor(s, 'mainsnak.datavalue.value.numeric-id')
Personal note:
I don't like the name 'dictor', since it doesn't hint at what it actually does. So I'm using it like:
from dictor import dictor as extract
x = extract(s, 'mainsnak.datavalue.value.numeric-id')
I couldn't come up with a better name than extract. Feel free to comment if you come up with a more suitable name. safe_get and robust_get didn't feel right for my case.
Another way:
def does_nested_key_exists(dictionary, nested_key):
    exists = nested_key in dictionary
    if not exists:
        for key, value in dictionary.items():
            if isinstance(value, dict):
                exists = exists or does_nested_key_exists(value, nested_key)
    return exists
The selected answer works well on the happy path, but I see a couple of obvious issues. If you were to search for ["spam", "egg", "bacon", "pizza"], it would throw a TypeError because it tries to index the string "Well.." with the string "pizza". Likewise, if you replaced "pizza" with 2, it would use that to get index 2 of "Well.." and wrongly report that the key exists.
Selected Answer Output Issues:
data = {
"spam": {
"egg": {
"bacon": "Well..",
"sausages": "Spam egg sausages and spam",
"spam": "does not have much spam in it"
}
}
}
print(keys_exists(data, "spam", "egg", "bacon", "pizza"))
>> TypeError: string indices must be integers
print(keys_exists(data, "spam", "egg", "bacon", 2))
>> True
I also feel that try/except can be a crutch that we rely on too quickly. Since I believe we already need to check the type, we might as well remove the try/except.
Solution:
# Note: `Undefined` is not defined in this snippet; it is presumably a sentinel
# defined elsewhere (the output below prints it as INVALID).
def dict_value_or_default(element, keys=[], default=Undefined):
    '''
    Check if keys (nested) exists in `element` (dict).
    Returns value if last key exists, else returns default value
    '''
    if not isinstance(element, dict):
        return default
    _element = element
    for key in keys:
        # Necessary to ensure _element is not a different indexable type (list, string, etc).
        # get() would have the same issue if that method name was implemented by a different object
        if not isinstance(_element, dict) or key not in _element:
            return default
        _element = _element[key]
    return _element
Output:
print(dict_value_or_default(data, ["spam", "egg", "bacon", "pizza"]))
>> INVALID
print(dict_value_or_default(data, ["spam", "egg", "bacon", 2]))
>> INVALID
print(dict_value_or_default(data, ["spam", "egg", "bacon"]))
>> Well..
Here's my small snippet based on #Aroust's answer:
def exist(obj, *keys: str) -> bool:
    _obj = obj
    try:
        for key in keys:
            _obj = _obj[key]
    except (KeyError, TypeError):
        return False
    return True
if __name__ == '__main__':
    obj = {"mainsnak": {"datavalue": {"value": "A"}}}
    answer = exist(obj, "mainsnak", "datavalue", "value", "B")
    print(answer)
I added TypeError because when _obj is a str, int, None, etc., indexing it with a key raises that error.
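To see which exception each kind of dead end produces, here is a small sketch. `lookup_error` is a hypothetical helper, the sample data is made up, and catching IndexError is an addition of this sketch that covers integer keys applied to strings or lists.

```python
def lookup_error(obj, *keys):
    """Walk the keys and return the name of the exception raised, or None."""
    try:
        for key in keys:
            obj = obj[key]
    except (KeyError, TypeError, IndexError) as exc:
        return type(exc).__name__
    return None


data = {"mainsnak": {"datavalue": {"value": "A", "empty": None}}}

print(lookup_error(data, "mainsnak", "datavalue", "value"))        # lookup succeeds
print(lookup_error(data, "mainsnak", "missing"))                   # missing dict key
print(lookup_error(data, "mainsnak", "datavalue", "value", "B"))   # str indexed by str
print(lookup_error(data, "mainsnak", "datavalue", "empty", "x"))   # None is not subscriptable
print(lookup_error(data, "mainsnak", "datavalue", "value", 5))     # str index out of range
```

Whether IndexError should count as "key does not exist" depends on the data; for JSON with lists it is usually worth catching.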
I wrote a data parsing library called dataknead for cases like this, basically because I got frustrated by the JSON the Wikidata API returns as well.
With that library you could do something like this:
from dataknead import Knead
numid = Knead(s).query("mainsnak/datavalue/value/numeric-id").data()
if numid:
    # Do something with `numeric-id`
Using dict with defaults is concise and appears to execute faster than using consecutive if statements.
Try it yourself:
import timeit
timeit.timeit("'x' in {'a': {'x': {'y'}}}.get('a', {})")
# 0.2874350370002503
timeit.timeit("'a' in {'a': {'x': {'y'}}} and 'x' in {'a': {'x': {'y'}}}['a']")
# 0.3466246419993695
I have written a handy library for this purpose.
I am iterating over ast of the dict and trying to check if a particular key is present or not.
Do check this out.
https://github.com/Agent-Hellboy/trace-dkey
If you can suffer testing a string representation of the object path then this approach might work for you:
def exists(str):
    try:
        eval(str)
        return True
    except:
        return False
exists("lst['sublist']['item']")
One can try this to check whether a key, nested key, or value is in a nested dict:
import yaml
#d - nested dictionary
if something in yaml.dump(d, default_flow_style=False):
    print(something, "is in", d)
else:
    print(something, "is not in", d)
There are many great answers. Here is my humble take on it, which also checks arrays of dictionaries. Please note that I am not checking the arguments' validity. I used part of Arnot's code above. I added this answer because I had a use case that requires checking arrays of dictionaries in my data.
Here is the code:
def keys_exists(element, *keys):
    '''
    Check if *keys (nested) exists in `element` (dict).
    '''
    retval=False
    if isinstance(element,dict):
        for key,value in element.items():
            for akey in keys:
                if element.get(akey) is not None:
                    return True
            if isinstance(value,dict) or isinstance(value,list):
                retval= keys_exists(value, *keys)
    elif isinstance(element, list):
        for val in element:
            if isinstance(val,dict) or isinstance(val,list):
                retval=keys_exists(val, *keys)
    return retval
In my application I am receiving a string 'abc[0]=123'
I want to convert this string to an array of items. I have tried eval(), but it didn't work for me. I know the array name abc, but the number of items will be different each time.
I can split the string, extract the array index, and do the insert myself. But I would like to know if there is any direct way to apply this string as an array insert.
I would greatly appreciate any suggestion.
Are you looking for something like this?
In [36]: s = "abc[0]=123"
In [37]: vars()[s[:3]] = []
In [38]: vars()[s[:3]].append(eval(s[s.find('=') + 1:]))
In [39]: abc
Out[39]: [123]
But this is not a good way to create a variable
Here's a function for parsing urls according to php rules (i.e. using square brackets to create arrays or nested structures):
import urlparse, re
def parse_qs_as_php(qs):
    def sint(x):
        try:
            return int(x)
        except ValueError:
            return x
    def nested(rest, base, val):
        curr, rest = base, re.findall(r'\[(.*?)\]', rest)
        while rest:
            curr = curr.setdefault(
                sint(rest.pop(0) or len(curr)),
                {} if rest else val)
        return base
    def dtol(d):
        if not hasattr(d, 'items'):
            return d
        if sorted(d) == range(len(d)):
            return [d[x] for x in range(len(d))]
        return {k:dtol(v) for k, v in d.items()}
    r = {}
    for key, val in urlparse.parse_qsl(qs):
        id, rest = re.match(r'^(\w+)(.*)$', key).groups()
        r[id] = nested(rest, r.get(id, {}), val) if rest else val
    return dtol(r)
Example:
qs = 'one=1&abc[0]=123&abc[1]=345&foo[bar][baz]=555'
print parse_qs_as_php(qs)
# {'abc': ['123', '345'], 'foo': {'bar': {'baz': '555'}}, 'one': '1'}
Your other application is doing it wrong. It should not be specifying index values in the parameter keys. The correct way to specify multiple values for a single key in a GET is to simply repeat the key:
http://my_url?abc=123&abc=456
The Python server side should correctly resolve this into a dictionary-like object: you don't say what framework you're running, but for instance Django uses a QueryDict which you can then access using request.GET.getlist('abc') which will return ['123', '456']. Other frameworks will be similar.
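As a framework-independent illustration of the repeated-key convention, the standard library's `urllib.parse.parse_qs` (Python 3) collects repeated keys into a list. This is a minimal sketch; the query string is made up for the example.

```python
from urllib.parse import parse_qs

# Repeating the key is the conventional way to send several values.
query = "abc=123&abc=456&one=1"
args = parse_qs(query)

# Every value is a list, even when the key appears only once.
print(args["abc"])
print(args["one"])
```

Values stay strings, so converting them to integers is left to the application.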
I am trying to implement a simple Twisted HTTP server that responds to requests by loading tiles from a database and returning them. However, I find the way it interprets request strings quite odd.
This is what I POST to the server:
curl -d "request=loadTiles&grid[0][x]=17&grid[0][y]=185&grid[1][x]=18&grid[1][y]=184" http://localhost:8080/fetch/
What I expect the request.args to be:
{'request': 'loadTiles', 'grid': [{'x': 17, 'y': 185}, {'x': 18, 'y': 184}]}
How Twisted interprets request.args:
{'grid[1][y]': ['184'], 'grid[0][y]': ['185'], 'grid[1][x]': ['18'], 'request': ['loadTiles'], 'grid[0][x]': ['17']}
Is it possible to have it automatically parse the request string and create a list for the grid parameter or do I have to do it manually?
I could JSON-encode the grid parameter and then decode it server side, but it seems like an unnecessary hack.
I don't know why you would expect your urlencoded data to be decoded according to some ad-hoc non-standard rules, or why you would consider the standard treatment "odd"; [ isn't special in query strings. What software decodes them this way?
In any event, this isn't really Twisted, but Python (and more generally speaking, the web-standard way of parsing this data). You can see the sort of data you'll get back via the cgi.parse_qs function interactively. For example:
>>> import cgi
>>> cgi.parse_qs("")
{}
>>> cgi.parse_qs("x=1")
{'x': ['1']}
>>> cgi.parse_qs("x[something]=1")
{'x[something]': ['1']}
>>> cgi.parse_qs("x=1&y=2")
{'y': ['2'], 'x': ['1']}
>>> cgi.parse_qs("x=1&y=2&x=3")
{'y': ['2'], 'x': ['1', '3']}
I hope that clears things up for you.
Maybe instead of a parser, how about something to post-process the request.args you are getting?
from pyparsing import Suppress, alphas, alphanums, nums, Word
from itertools import groupby
# you could do this with regular expressions too, if you prefer
LBRACK,RBRACK = map(Suppress, '[]')
ident = Word('_' + alphas, '_' + alphanums)
integer = Word(nums).setParseAction(lambda t : int(t[0]))
subscriptedRef = ident + 2*(LBRACK + (ident | integer) + RBRACK)
def simplify_value(v):
    if isinstance(v,list) and len(v)==1:
        return simplify_value(v[0])
    if v == integer:
        return int(v)
    return v
def regroup_args(dd):
    ret = {}
    subscripts = []
    for k,v in dd.items():
        # this is a pyparsing short-cut to see if a string matches a pattern
        # I also used it above in simplify_value to test for integerness of a string
        if k == subscriptedRef:
            subscripts.append(tuple(subscriptedRef.parseString(k))+
                              (simplify_value(v),))
        else:
            ret[k] = simplify_value(v)
    # sort all the matched subscripted args, and then use groupby to
    # group by name and list index
    # this assumes all indexes 0-n are present in the parsed arguments
    subscripts.sort()
    for name,nameitems in groupby(subscripts, key=lambda x:x[0]):
        ret[name] = []
        for idx,idxitems in groupby(nameitems, key=lambda x:x[1]):
            idd = {}
            for item in idxitems:
                name, i, attr, val = item
                idd[attr] = val
            ret[name].append(idd)
    return ret
request_args = {'grid[1][y]': ['184'], 'grid[0][y]': ['185'], 'grid[1][x]': ['18'], 'request': ['loadTiles'], 'grid[0][x]': ['17']}
print regroup_args(request_args)
prints
{'grid': [{'y': 185, 'x': 17}, {'y': 184, 'x': 18}], 'request': 'loadTiles'}
Note that this also simplifies the single-element lists to just the 0'th element value, and converts the numeric strings to actual integers.
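The comment in the code above mentions that regular expressions would work too. Here is a rough Python 3 sketch of that variant using only the standard library. The names `SUBSCRIPT_RE`, `to_scalar`, and `regroup_args_re` are made up for this example, and the pattern assumes keys of the exact form name[index][attr].

```python
import re
from collections import defaultdict

# Matches keys such as grid[0][x]: a name, a numeric index, an attribute.
SUBSCRIPT_RE = re.compile(r"^(\w+)\[(\d+)\]\[(\w+)\]$")


def to_scalar(values):
    """Unwrap single-element lists and convert digit-only strings to int."""
    value = values[0] if isinstance(values, list) and len(values) == 1 else values
    if isinstance(value, str) and value.isdigit():
        return int(value)
    return value


def regroup_args_re(args):
    result = {}
    grouped = defaultdict(dict)  # name -> {index -> {attr: value}}
    for key, values in args.items():
        match = SUBSCRIPT_RE.match(key)
        if match:
            name, index, attr = match.groups()
            grouped[name].setdefault(int(index), {})[attr] = to_scalar(values)
        else:
            result[key] = to_scalar(values)
    # Emit each group as a list ordered by its numeric index.
    for name, by_index in grouped.items():
        result[name] = [by_index[i] for i in sorted(by_index)]
    return result


request_args = {'grid[1][y]': ['184'], 'grid[0][y]': ['185'], 'grid[1][x]': ['18'], 'request': ['loadTiles'], 'grid[0][x]': ['17']}
print(regroup_args_re(request_args))
```

Unlike the groupby version, this one orders entries by index but simply skips gaps instead of assuming every index from 0 to n is present.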
Is there a Python module that can be used in the same way as Perl's Data::Dumper module?
Edit: Sorry, I should have been clearer. I was mainly after a module for inspecting data rather than persisting.
BTW Thanks for the answers. This is one awesome site!
Data::Dumper has two main uses: data persistence and debugging/inspecting objects. As far as I know, there isn't anything that's going to work exactly the same as Data::Dumper.
I use pickle for data persistence.
I use pprint to visually inspect my objects / debug.
I think the closest you will find is the pprint module.
>>> import pprint
>>> l = [1, 2, 3, 4]
>>> l.append(l)
>>> d = {1: l, 2: 'this is a string'}
>>> print d
{1: [1, 2, 3, 4, [...]], 2: 'this is a string'}
>>> pprint.pprint(d)
{1: [1, 2, 3, 4, <Recursion on list with id=47898714920216>],
2: 'this is a string'}
Possibly a couple of alternatives: pickle, marshal, shelve.
I too have been using Data::Dumper for quite some time and have gotten used to its way of displaying nicely formatted complex data structures. pprint, as mentioned above, does a pretty decent job, but I didn't quite like its formatting style. On top of that, pprint doesn't allow you to inspect objects like Data::Dumper does.
I searched the net and came across these:
https://gist.github.com/1071857#file_dumper.pyamazon
>>> y = { 1: [1,2,3], 2: [{'a':1},{'b':2}]}
>>> pp = pprint.PrettyPrinter(indent = 4)
>>> pp.pprint(y)
{ 1: [1, 2, 3], 2: [{ 'a': 1}, { 'b': 2}]}
>>> print(Dumper.dump(y)) # Dumper is the python module in the above link
{
1: [
1
2
3
]
2: [
{
'a': 1
}
{
'b': 2
}
]
}
>>> print(Dumper.dump(pp))
instance::pprint.PrettyPrinter
__dict__ :: {
'_depth': None
'_stream': file:: >
'_width': 80
'_indent_per_level': 4
}
Also worth checking is http://salmon-protocol.googlecode.com/svn-history/r24/trunk/salmon-playground/dumper.py It has its own style and seems useful too.
Here is a simple solution for dumping nested data made up of dictionaries, lists, or tuples (it works quite well for me):
Python2
def printStruct(struc, indent=0):
    if isinstance(struc, dict):
        print ' '*indent+'{'
        for key,val in struc.iteritems():
            if isinstance(val, (dict, list, tuple)):
                print ' '*(indent+1) + str(key) + '=> '
                printStruct(val, indent+2)
            else:
                print ' '*(indent+1) + str(key) + '=> ' + str(val)
        print ' '*indent+'}'
    elif isinstance(struc, list):
        print ' '*indent + '['
        for item in struc:
            printStruct(item, indent+1)
        print ' '*indent + ']'
    elif isinstance(struc, tuple):
        print ' '*indent + '('
        for item in struc:
            printStruct(item, indent+1)
        print ' '*indent + ')'
    else: print ' '*indent + str(struc)
Python3
def printStruct(struc, indent=0):
    if isinstance(struc, dict):
        print (' '*indent+'{')
        for key,val in struc.items():
            if isinstance(val, (dict, list, tuple)):
                print (' '*(indent+1) + str(key) + '=> ')
                printStruct(val, indent+2)
            else:
                print (' '*(indent+1) + str(key) + '=> ' + str(val))
        print (' '*indent+'}')
    elif isinstance(struc, list):
        print (' '*indent + '[')
        for item in struc:
            printStruct(item, indent+1)
        print (' '*indent + ']')
    elif isinstance(struc, tuple):
        print (' '*indent + '(')
        for item in struc:
            printStruct(item, indent+1)
        print (' '*indent + ')')
    else: print (' '*indent + str(struc))
See it at work:
>>> d = [{'a1':1, 'a2':2, 'a3':3}, [1,2,3], [{'b1':1, 'b2':2}, {'c1':1}], 'd1', 'd2', 'd3']
>>> printStruct(d)
[
{
a1=> 1
a3=> 3
a2=> 2
}
[
1
2
3
]
[
{
b1=> 1
b2=> 2
}
{
c1=> 1
}
]
d1
d2
d3
]
For serialization, there are many options.
One of the best is JSON, which is a language-agnostic standard for serialization. It is available in 2.6 in the stdlib json module and before that with the same API in the third-party simplejson module.
You do not want to use marshal, which is fairly low-level. If you wanted what it provides, you would use pickle.
I avoid using pickle: the format is Python-only and insecure. Deserializing using pickle can execute arbitrary code.
If you did use pickle, you want to use the C implementation thereof. (Do import cPickle as pickle.)
For debugging, you usually want to look at the object's repr or to use the pprint module.
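A brief sketch of both suggestions, using only the standard library: `json` for a readable, language-agnostic dump, and `pprint` on `vars(obj)` to look at an instance's attributes. The `Point` class and the sample data are made up for the example.

```python
import json
import pprint


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


data = {"name": "tile", "grid": [{"x": 17, "y": 185}], "tags": ("a", "b")}

# Serialization: indent and sort_keys give a stable, readable dump.
text = json.dumps(data, indent=2, sort_keys=True)
restored = json.loads(text)  # note: the tuple comes back as a list
print(text)

# Debugging: vars() exposes an instance's attribute dict for pprint.
attrs = vars(Point(1, 2))
formatted = pprint.pformat(attrs)
print(formatted)
```

`vars()` only works for objects that have a `__dict__`, so instances of classes using `__slots__` or many C-implemented types need `dir()` and `getattr` instead.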
I needed to return a Perl-like dump for an API request, so I came up with this. It doesn't pretty-print the output, but it does the job perfectly for me.
from decimal import Decimal
from datetime import datetime, date
def dump(obj):
    if obj is None:
        return "undef"
    if isinstance(obj, dict):
        return dump_dict(obj)
    if isinstance(obj, (list, tuple)):
        return dump_list(obj)
    if isinstance(obj, Decimal):
        return "'{:.05f}'".format(obj)
        # ... or handle it your way
    if isinstance(obj, (datetime, date)):
        return "'{}'".format(obj.isoformat(
            sep=' ',
            timespec='milliseconds'))
        # ... or handle it your way
    return "'{}'".format(obj)
def dump_dict(obj):
    result = []
    for key, val in obj.items():
        result.append(' => '.join((dump(key), dump(val))))
    return ' '.join(('{', ', '.join(result), '}'))
def dump_list(obj):
    result = []
    for val in obj:
        result.append(dump(val))
    return ' '.join(('[', ', '.join(result), ']'))
Using the above:
example_dict = {'a': 'example1', 'b': 'example2', 'c': [1, 2, 3, 'asd'], 'd': [{'g': 'something1', 'e': 'something2'}, {'z': 'something1'}]}
print(dump(example_dict))
will output:
{ 'b' => 'example2', 'a' => 'example1', 'd' => [ { 'g' => 'something1', 'e' => 'something2' }, { 'z' => 'something1' } ], 'c' => [ '1', '2', '3', 'asd' ] }
If you want something that works better than pprint, but doesn't require rolling your own, try importing dumper from pypi:
https://github.com/jric/Dumper.py or
https://github.com/ericholscher/pypi/blob/master/dumper.py
Saw this and realized Python has something that works akin to Data::Dumper in Dumper. The author describes it as
Dump Python data structures (including class instances) in a nicely-
nested, easy-to-read form. Handles recursive data structures properly,
and has sensible options for limiting the extent of the dump both by
simple depth and by some rules for how to handle contained instances.
Install it via pip. The Github repo is at https://github.com/jric/Dumper.py.
As far as inspecting your object goes, I found this a useful equivalent of Data::Dumper:
https://salmon-protocol.googlecode.com/svn-history/r24/trunk/salmon-playground/dumper.py
dumper.py archived from Google Code to Wayback Machine (archive.org)
or
dumper.py exported from Google Code to Github
It can handle unicode strings.
Isn't it true that all the answers ignore dumping anything other than a combination of the basic data types like lists, dicts, etc.? I have just run pprint on a Python Xlib Window object and got… <<class 'Xlib.display.Window'> 0x004001fe>… that's all. No data fields, no methods listed… Is there anything that does a real dump of an object, e.g. with all its attributes?