I have a JSON file with about 1,000 data entries. For example:
{"1":"Action","2":"Adventure",....."1000":"Mystery"}
The above is just an example.
I am using json.load after importing json.
How do I load only the first 10 data entries from the JSON?
{"1":"Action","2":"Adventure",....."10":"Thriller"}
You can iteratively parse json (that is to say, not "all at once") using ijson, and assuming your input really is as simple as your example:
import itertools
import heapq
import ijson

def iter_items(parser):
    for prefix, event, value in parser:
        if event == 'string':
            yield prefix, value

with open('filename.json') as infile:
    items = iter_items(ijson.parse(infile))
    # choose one of the following

    # first 10 items from the file regardless of keys
    print(dict(itertools.islice(items, 10)))

    # 10 smallest keys when considered as integers
    print(dict(heapq.nsmallest(10, items, key=lambda p: int(p[0]))))
Obviously the second of these still has to read the whole file; it just doesn't have to keep it all in memory at once. Avoiding that is premature optimization for only 1000 small key-value pairs, but whatever. I found the question interesting enough to use a library I'd never considered before, because sometimes JSON files are huge, and because of the close analogy with SAX parsers (which are event-based streaming parsers for XML).
By the way, if order is important then the producer of this JSON should probably have used an array. But perhaps as the consumer you can't do anything about that.
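For illustration, if the producer did emit an array, taking the first 10 entries in file order would be trivial; a minimal sketch, assuming a hypothetical file array.json shaped like [["1", "Action"], ["2", "Adventure"], ...]:
import json

with open('array.json') as infile:   # hypothetical array-shaped input
    pairs = json.load(infile)

first_ten = dict(pairs[:10])         # first 10 pairs, in file order
print(first_ten)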
JSON objects have no defined order (and neither do Python dictionaries before Python 3.7). You also cannot control how much of an object is loaded, not with the standard-library json module at any rate.
After loading, you could take the ten key-value pairs with the lowest key value:
import heapq
import json
data = json.loads(json_string)
limited = {k: data[k] for k in heapq.nsmallest(10, data, key=int)}
The heapq.nsmallest() will efficiently pick out the 10 smallest keys regardless of the size of data.
Of course, if the keys are always consecutive and always start at 1, you may as well use a range() here:
data = json.loads(json_string)
limited = {str(k): data[str(k)] for k in range(1, 11)}
If you want to capture the objects in file definition order you could use the object_pairs_hook argument to json.load() and json.loads():
class FirstTenDict(dict):
    def __init__(self, pairs):
        super(FirstTenDict, self).__init__(pairs[:10])

data = json.loads(json_string, object_pairs_hook=FirstTenDict)
Demo of the latter approach:
>>> import json
>>> class FirstTenDict(dict):
...     def __init__(self, pairs):
...         super(FirstTenDict, self).__init__(pairs[:10])
...
>>> json_data = '''\
... {"foo42": "bar", "foo31": "baz", "foo10": "spam", "foo44": "ham", "foo1": "eggs",
... "foo24": "vikings", "foo21": "monty", "foo88": "python", "foo11": "eric", "foo65": "idle",
... "foo13": "will", "foo31": "be", "foo76": "ignored"}
... '''
>>> json.loads(json_data)
{'foo1': 'eggs', 'foo88': 'python', 'foo44': 'ham', 'foo10': 'spam', 'foo76': 'ignored', 'foo42': 'bar', 'foo24': 'vikings', 'foo11': 'eric', 'foo31': 'be', 'foo13': 'will', 'foo21': 'monty', 'foo65': 'idle'}
>>> json.loads(json_data, object_pairs_hook=FirstTenDict)
{'foo1': 'eggs', 'foo88': 'python', 'foo44': 'ham', 'foo10': 'spam', 'foo24': 'vikings', 'foo11': 'eric', 'foo21': 'monty', 'foo42': 'bar', 'foo31': 'baz', 'foo65': 'idle'}
>>> import pprint
>>> pprint.pprint(_)
{'foo1': 'eggs',
 'foo10': 'spam',
 'foo11': 'eric',
 'foo21': 'monty',
 'foo24': 'vikings',
 'foo31': 'baz',
 'foo42': 'bar',
 'foo44': 'ham',
 'foo65': 'idle',
 'foo88': 'python'}
import json

file = 'data.json'
with open(file, 'rb') as f:
    content = json.load(f)   # pass the file object, not the filename

what_you_want = {int(k): v for k, v in content.items() if int(k) in range(1, 11)}
I don't think there is any other way.
You must load the entire thing, and only then can you extract the keys you want.
In short, you can't.
While each entry looks like a piece of JSON, only the file as a whole is a valid JSON document.
For example:
"1":"Action" is proper JSON format, but you cannot load it on its own.
In order to be able to import it as a JSON format, you'll need the full syntax of it {"1":"Action"}
What you'll need to do is still load the whole file, then assign first 10 lines to a variable.
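A minimal sketch of that approach, assuming a hypothetical filename data.json; itertools.islice just takes the first 10 pairs in whatever order json.load returns them:
import itertools
import json

with open('data.json') as f:   # hypothetical filename
    data = json.load(f)

first_ten = dict(itertools.islice(data.items(), 10))
print(first_ten)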
You have two options:
If you use Python >= 3.1 you can use
import json
from collections import OrderedDict

decoder = json.JSONDecoder(object_pairs_hook=OrderedDict)
data = decoder.decode(datastring)
This will decode the whole file, but keep all key-value pairs in the same order as they were in the file.
Then you can slice the first n items with something like
result = OrderedDict((k,v) for (k,v),i in zip(data.items(), range(n)))
This isn't efficient, but you will get the first 10 entries, as they were written in the JSON.
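Equivalently, itertools.islice can take the first n pairs straight from the ordered dict; a small sketch of that variant, reusing data and n from above:
from itertools import islice

result = OrderedDict(islice(data.items(), n))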
The second option, more efficient but harder, is to use an iterative JSON parser like ijson, as @steve-jessop mentioned.
If and only if your JSON files are always flat (they don't contain any sub-objects or lists), as in your example in the question, the following code will put the first 10 elements into result. More complex files need more complex parser code.
import ijson

result = {}
with open('data.json', 'rb') as f:
    for prefix, event, value in ijson.parse(f):
        # for a flat object, each value event's prefix is its top-level key
        if event == 'string' and prefix:
            result[prefix] = value
            if len(result) >= 10:
                break
I want to use the dictionary loaded from a JSON file. However, each item contains the character 'u'. I need to remove the 'u's.
I tried dumps, but it does not work.
import ast
import json
data = {u'dot',
        u'dog',
        u'fog',
        u'eeee'}
res = eval(json.dumps(data))
print res
I hope to get:
{'dot',
 'dog',
 'fog',
 'eeee'}
But the error is:
TypeError: set([u'eeee', u'fog', u'dog', u'dot']) is not JSON serializable
The strings that start with u are unicode strings.
In your case, this has nothing to do with the problem:
data = {u'dot',
        u'dog',
        u'fog',
        u'eeee'}
This creates a set and stores the results in the data variable. The json serializer can't handle sets since the json spec makes no mention of them. If you change this to be a list, the serializer can handle the data:
res = set(eval(json.dumps(list(data))))
Here I'm converting the data variable to a list to serialize it, then converting it back to a set to store the result in the res variable.
Alternatively, you can directly ask Python to convert the unicode strings to strings, using something like this:
res = {x.encode("utf-8") for x in data}
print(res)
Is there a simple way to create a dictionary from a list of formatted tuples? E.g. if I do something like:
d={"responseStatus":"SUCCESS","sessionId":"01234","userId":2000004904}
This creates a dictionary called d. However, if I want to create a dictionary from a string that contains the same content, I can't do that:
res=<some command that returns {"responseStatus":"SUCCESS","sessionId":"01234","userId":2000004904}>
print res
# returns {"responseStatus":"SUCCESS","sessionId":"01234","userId":2000004904}
d=dict(res)
This throws an error that says:
ValueError: dictionary update sequence element #0 has length 1; 2 is required
I strongly suspect that you have JSON on your hands.
import json
d = json.loads('{"responseStatus":"SUCCESS","sessionId":"01234","userId":2000004904}')
would give you what you want.
Use dict(zip(keys, values)):
>>> u = ("foo", "bar")
>>> v = ("blah", "zoop")
>>> d = dict(zip(u, v))
>>> d
{'foo': 'blah', 'bar': 'zoop'}
Note that if the two tuples have different lengths, this will not work as expected (zip() stops at the shorter one).
Based on what you gave, res is
# returns {"responseStatus":"SUCCESS","sessionId":"01234","userId":2000004904}
So the plan is to grab the string starting at the curly brace to the end and use json to decode it:
import json
# Discard the text before the curly brace
res = res[res.index('{'):]
# Turn that text into a dictionary
d = json.loads(res)
All you need to do in your particular case is
d = eval(res)
And please keep security in mind when using eval, especially if you're mixing it with ajax/json.
UPDATE
Since others pointed out you might be getting this data over the web and it isn't just a "how to make this work" question, use this:
import json
json.loads(res)
How can I turn this string
"((145541L, u'/.stats/'), (175706L, u'///')"
into a JSON object in Python, such as
{'145541' : '/.stats/',
'175706' : '///'
}
I've been trying tuple() and others but it doesn't work.
Thanks
Quick fix:
>>> import ast
>>> s = "((145541L, u'/.stats/'), (175706L, u'///')"
>>> {str(k): v for (k, v) in ast.literal_eval(s + ')')}
{'175706': u'///', '145541': u'/.stats/'}
But you should really try looking into json.loads instead.
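For json.loads to apply, the producer would have to emit real JSON rather than a Python repr (double quotes, no L or u prefixes); a minimal sketch of what that would look like, using the values from the question:
import json

s = '{"145541": "/.stats/", "175706": "///"}'   # what proper JSON input would look like
print(json.loads(s))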
You most probably have a tuple of tuples and want to create a dictionary. To do so, try the following:
data = ((145541L, u'/.stats/'), (175706L, u'///'))
result = dict(data)
If what you have is really a string, add the initial line:
data = "((145541L, u'/.stats/'), (175706L, u'///'))"
data = eval(data)
result = dict(data)
As pointed out by @Volatility, eval may be dangerous, since it evaluates any piece of code, not only literals. This means someone could execute arbitrary commands in your program if the strings you receive are malicious.
To avoid so, you may use ast.literal_eval instead:
from ast import literal_eval
data = "((145541L, u'/.stats/'), (175706L, u'///'))"
result = dict(literal_eval(data))
I am writing a program that stores data in a dictionary object, but this data needs to be saved at some point during the program execution and loaded back into the dictionary object when the program is run again.
How would I convert a dictionary object into a string that can be written to a file and loaded back into a dictionary object? This will hopefully support dictionaries containing dictionaries.
The json module is a good solution here. It has the advantages over pickle that it only produces plain text output, and is cross-platform and cross-version.
import json

json_string = json.dumps(my_dict)   # serialize your dictionary (here called my_dict) to a string
my_dict = json.loads(json_string)   # ...and parse it back later
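Since the question is about saving to a file and loading it back, here is a minimal sketch of the round trip with json.dump() and json.load(), using a hypothetical filename data.json; nested dictionaries survive the trip:
import json

data = {'one': 1, 'nested': {'two': 2}}

with open('data.json', 'w') as f:   # write the dictionary out as JSON text
    json.dump(data, f)

with open('data.json') as f:        # read it back into a dictionary
    restored = json.load(f)

print(restored == data)             # True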
If your dictionary isn't too big, maybe str + eval can do the job:
dict1 = {'one':1, 'two':2, 'three': {'three.1': 3.1, 'three.2': 3.2 }}
str1 = str(dict1)
dict2 = eval(str1)
print(dict1 == dict2)
You can use ast.literal_eval instead of eval for additional security if the source is untrusted.
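A small sketch of that safer variant, reusing dict1 and str1 from above:
import ast

dict2 = ast.literal_eval(str1)   # parses Python literals only, never arbitrary code
print(dict1 == dict2)            # True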
I use json:
import json
# convert to string
input_ = json.dumps({'id': id_ })
# load to dict
my_dict = json.loads(input_)
Why not use the built-in ast module's literal_eval function? It is better to use literal_eval instead of eval:
import ast
str_of_dict = "{'key1': 'key1value', 'key2': 'key2value'}"
ast.literal_eval(str_of_dict)
will give you the actual dictionary:
{'key1': 'key1value', 'key2': 'key2value'}
And if you want to convert a dictionary to a string, how about using Python's str()?
Suppose the dictionary is:
my_dict = {'key1': 'key1value', 'key2': 'key2value'}
Then it is as simple as:
str(my_dict)
This will print:
"{'key1': 'key1value', 'key2': 'key2value'}"
It's as easy as that.
Use the pickle module to save it to disk and load later on.
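A minimal sketch of that, assuming a hypothetical filename data.pickle:
import pickle

data = {'one': 1, 'nested': {'two': 2}}

with open('data.pickle', 'wb') as f:   # pickle writes bytes, so open in binary mode
    pickle.dump(data, f)

with open('data.pickle', 'rb') as f:
    restored = pickle.load(f)

print(restored == data)                # True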
Convert dictionary into JSON (string)
import json
mydict = { "name" : "Don",
"surname" : "Mandol",
"age" : 43}
result = json.dumps(mydict)
print(result[0:20])
will get you:
{"name": "Don", "sur
Convert string into dictionary
back_to_mydict = json.loads(result)
For Chinese text, you should make the following adjustments:
import codecs
import json

fout = codecs.open("xxx.json", "w", "utf-8")
dict_to_json = json.dumps({'text': "中文"}, ensure_ascii=False, indent=2)
fout.write(dict_to_json + '\n')
fout.close()
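In Python 3 the built-in open() takes an encoding argument, so codecs isn't strictly needed; a small sketch of the same idea:
import json

with open("xxx.json", "w", encoding="utf-8") as fout:
    fout.write(json.dumps({'text': "中文"}, ensure_ascii=False, indent=2) + '\n')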
You may find the json.dumps() method needs help handling some object types.
Credit goes to the top answer of this post for the following:
import json
json.dumps(my_dictionary, indent=4, sort_keys=True, default=str)
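For example, datetime values are not JSON serializable on their own, and default=str makes dumps() fall back to their string form; a small hypothetical illustration:
import json
from datetime import datetime

record = {'created': datetime(2020, 1, 1, 12, 0), 'name': 'example'}
print(json.dumps(record, indent=4, sort_keys=True, default=str))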
I think you should consider using the shelve module which provides persistent file-backed dictionary-like objects. It's easy to use in place of a "real" dictionary because it almost transparently provides your program with something that can be used just like a dictionary, without the need to explicitly convert it to a string and then write to a file (or vice-versa).
The main difference is needing to initially open() it before first use and then close() it when you're done (and possibly sync()ing it, depending on the writeback option being used). Any "shelf" file object created can contain regular dictionaries as values, allowing them to be logically nested.
Here's a trivial example:
import shelve

shelf = shelve.open('mydata')  # open for reading and writing, creating it if necessary
shelf.update({'one': 1, 'two': 2, 'three': {'three.1': 3.1, 'three.2': 3.2}})
shelf.close()

shelf = shelve.open('mydata')
print(dict(shelf))  # convert to a plain dict to display the shelf's contents
shelf.close()
Output:
{'three': {'three.1': 3.1, 'three.2': 3.2}, 'two': 2, 'one': 1}
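Since a shelf behaves like a dictionary, entries can also be read back individually, e.g.:
shelf = shelve.open('mydata')
print(shelf['three'])   # the nested dictionary comes back as a regular dict
shelf.close()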
If you care about the speed use ujson (UltraJSON), which has the same API as json:
import ujson
ujson.dumps([{"key": "value"}, 81, True])
# '[{"key":"value"},81,true]'
ujson.loads("""[{"key": "value"}, 81, true]""")
# [{u'key': u'value'}, 81, True]
I use yaml for that if it needs to be readable (neither JSON nor XML are, IMHO); if readability is not necessary, I use pickle.
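For the yaml route, a minimal sketch assuming the third-party PyYAML package (imported as yaml) is installed; the path is just an example:
import yaml

data = {'one': 1, 'two': 2, 'three': {'three.1': 3.1, 'three.2': 3.2}}

with open('/var/tmp/dump.yaml', 'w') as f:   # hypothetical path
    yaml.safe_dump(data, f, default_flow_style=False)

with open('/var/tmp/dump.yaml') as f:
    restored = yaml.safe_load(f)

print(restored == data)   # True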
Write
from pickle import dumps

x = dict(a=1, b=2)
y = dict(c=x, z=3)
res = dumps(y)
with open('/var/tmp/dump.txt', 'wb') as f:   # pickle data is bytes, so write in binary mode
    f.write(res)
Read back
from pickle import loads

with open('/var/tmp/dump.txt', 'rb') as f:
    rev = loads(f.read())
print(rev)
I figured out the problem was not with my dict object; it was the keys and values, which were of RubyString type after loading it with the RubyMarshal 'loads' method.
So I did this:
dic_items = dict.items()
new_dict = {str(key): str(value) for key, value in dic_items}