Convert json dict - python

I have pulled JSON data from a URL. The result is a dictionary. How can I transform this dictionary so that each metric is a column and the time is the index for each value?
Thanks in advance
time------------------------AdrActCnt-----BlkCnt------BlkSizeByte
2021-01-28T00:00:00.000Z----1097896.0-----145.0-------190568423.0
2021-01-29T00:00:00.000Z----1208741.0-----152.0-------199725189.0
2021-01-30T00:00:00.000Z----1087755.0-----136.0-------177349536.0
Output:
{"metricData":{"metrics":["AdrActCnt","BlkCnt","BlkSizeByte"],"series":
[{"time":"2021-01-28T00:00:00Z","values":["1097896.0","145.0","190568423.0"]},
{"time":"2021-01-29T00:00:00Z","values":["1208741.0","152.0","199725189.0"]},
{"time":"2021-01-30T00:00:00Z","values":["1087755.0","136.0","177349536.0"]}

You may be looking for a dict comprehension, which works like a list comprehension but builds a dictionary instead:
liststuff = [{"time":"2021-01-28T00:00:00.000Z","values":["1097896.0","145.0","190568423.0"]},{"time":"2021-01-29T00:00:00.000Z","values":["1208741.0","152.0","199725189.0"]},{"time":"2021-01-30T00:00:00.000Z","values":["1087755.0","136.0","177349536.0"]}]
dictstuff = {item['time']:item['values'] for item in liststuff}
print(dictstuff)
{'2021-01-28T00:00:00.000Z': ['1097896.0', '145.0', '190568423.0'], '2021-01-29T00:00:00.000Z': ['1208741.0', '152.0', '199725189.0'], '2021-01-30T00:00:00.000Z': ['1087755.0', '136.0', '177349536.0']}
liststuff is your data; it just needed wrapping in [] (I assume that's a typo in the question, since it's not valid JSON without the brackets). If you need help parsing the string, use json.loads() (from the json module) to turn it into actual Python data:
import json
jsonstuff = '[{"time":"2021-01-28T00:00:00.000Z","values":["1097896.0","145.0","190568423.0"]},{"time":"2021-01-29T00:00:00.000Z","values":["1208741.0","152.0","199725189.0"]},{"time":"2021-01-30T00:00:00.000Z","values":["1087755.0","136.0","177349536.0"]}]'
liststuff = json.loads(jsonstuff)
(here jsonstuff is the string you've downloaded)
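To get closer to the table the question asks for, one possible sketch pairs each metric name with its value for every timestamp. It assumes the response has the metricData/metrics/series shape shown in the question (with the closing brackets restored); the names raw and series_to_rows are made up for this illustration.

```python
import json

# Sample payload shaped like the question's output (closing brackets added).
raw = (
    '{"metricData":{"metrics":["AdrActCnt","BlkCnt","BlkSizeByte"],"series":['
    '{"time":"2021-01-28T00:00:00Z","values":["1097896.0","145.0","190568423.0"]},'
    '{"time":"2021-01-29T00:00:00Z","values":["1208741.0","152.0","199725189.0"]},'
    '{"time":"2021-01-30T00:00:00Z","values":["1087755.0","136.0","177349536.0"]}'
    ']}}'
)


def series_to_rows(payload):
    """Map each timestamp to a {metric name: float value} dictionary."""
    metric_data = payload["metricData"]
    metrics = metric_data["metrics"]
    return {
        entry["time"]: dict(zip(metrics, map(float, entry["values"])))
        for entry in metric_data["series"]
    }


rows = series_to_rows(json.loads(raw))
print(rows["2021-01-28T00:00:00Z"]["BlkCnt"])
```

If pandas is available, pandas.DataFrame.from_dict(rows, orient="index") should turn this mapping into a frame with the times as the index and one column per metric.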

Related

Remove 'u' in JSON dictionary

I want to use a dictionary loaded from a JSON file. However, each item is prefixed with the character 'u'. I need to remove the 'u's.
I tried dumps, but it does not work.
import ast
import json
data= {u'dot',
u'dog',
u'fog',
u'eeee'}
res = eval(json.dumps(data))
print res
I hope to get: {
'dot',
'dog',
'fog',
'eeee'
}
But the error is:
TypeError: set([u'eeee', u'fog', u'dog', u'dot']) is not JSON serializable
The strings that start with u are unicode strings.
In your case, this has nothing to do with the problem:
data= {u'dot',
u'dog',
u'fog',
u'eeee'}
This creates a set and stores the result in the data variable. The json serializer can't handle sets since the JSON spec makes no mention of them. If you change this to be a list, the serializer can handle the data:
res = set(eval(json.dumps(list(data))))
Here I'm converting the data variable to a list to serialize it, then converting it back to a set to store the result in the res variable.
Alternatively, you can directly ask Python to convert the unicode strings to strings, using something like this:
res = {x.encode("utf-8") for x in data}
print(res)
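On Python 3 every str is already Unicode and the u prefix no longer appears when the value is printed, so only the set problem remains. A minimal sketch, assuming Python 3 and swapping eval for json.loads in the round trip (the variable name encoded is introduced here for illustration):

```python
import json

data = {"dot", "dog", "fog", "eeee"}

# A set is not part of JSON, so serializing it directly raises TypeError.
try:
    json.dumps(data)
except TypeError as exc:
    print("cannot serialize a set:", exc)

# Convert to a (sorted) list for serialization, then back to a set after loading.
encoded = json.dumps(sorted(data))
res = set(json.loads(encoded))
print(res == data)
```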

Process malformed JSON string in Python

I'm trying to process a log from Symphony using Pandas, but have some trouble with a malformed JSON which I can't parse.
An example of the log:
'{id:46025,
work_assignment:43313=>43313,
declaration:<p><strong>Bijkomende interventie.</strong></p>\r\n\r\n<p>H </p>\r\n\r\n<p><strong><em>Vaststellingen.</em></strong></p>\r\n\r\n<p><strong><em>CV. </em></strong>De.</p>=><p><strong>Bijkomende interventie.</strong></p>\r\n\r\n<p>He </p>\r\n\r\n<p><strong><em>Vaststellingen.</em></strong></p>\r\n\r\n<p><strong><em>CV. </em></strong>De.</p>,conclusions:<p>H </p>=><p>H </p>}'
What is the best way to process this?
For each part (id/work_assignment/declaration/etc) I would like to retrieve the old and new value (which are separated by "=>").
Use the following code:
def clean(my_log):
    # Strip the unneeded { } (str.replace returns a new string, so reassign it)
    my_log = my_log.replace("{", "").replace("}", "")
    my_items = my_log.split(",")  # Split at the commas to get the pairs
    my_dict = {}
    for i in my_items:
        # Split at the first colon only, so colons inside a value are kept
        key, value = i.split(":", 1)
        my_dict[key.strip()] = value  # Add to the dictionary
    return my_dict
The function returns a Python dictionary, which can be used directly or converted to JSON with a serializer if needed.
Hope I helped :D
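To also separate the old and new values at "=>", as the question asks, a regular expression can be used to find the field boundaries. This is only a sketch: it assumes every field name consists of word characters directly followed by a colon, and that no value contains a comma followed by such a name. The function name parse_log and the shortened sample entry are made up for this illustration.

```python
import re


def parse_log(entry):
    """Return {field: (old, new)}; new is None when the field has no '=>'."""
    body = entry.strip().strip("{}")  # drop the enclosing braces
    # A field starts at the beginning of the string or after a comma: `name:`
    parts = re.split(r"(?:^|,)\s*(\w+):", body)
    # parts looks like ['', key1, value1, key2, value2, ...]
    fields = {}
    for key, value in zip(parts[1::2], parts[2::2]):
        old, sep, new = value.partition("=>")
        fields[key] = (old.strip(), new.strip() if sep else None)
    return fields


sample = "{id:46025,\nwork_assignment:43313=>43313,\nconclusions:<p>H </p>=><p>H </p>}"
parsed = parse_log(sample)
for name, (old, new) in parsed.items():
    print(name, "| old:", old, "| new:", new)
```

Because the split happens only where a comma is followed by a field name, commas inside the HTML values are left alone as long as they are not followed by text that looks like `name:`.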

Converting a dataframe into JSON (in pyspark) and then selecting desired fields

I'm new to Spark. I have a dataframe that contains the results of some analysis. I converted that dataframe into JSON so I could display it in a Flask App:
results = result.toJSON().collect()
An example entry in my json file is below. I then tried to run a for loop in order to get specific results:
{"userId":"1","systemId":"30","title":"interest"}
for i in results:
    print i["userId"]
This doesn't work at all and I get errors such as: Python (json) : TypeError: expected string or buffer
I used json.dumps and json.loads and still nothing - I keep on getting errors such as string indices must be integers, as well as the above error.
I then tried this:
print i[0]
This gave me the first character in the json "{" instead of the first line. I don't really know what to do, can anyone tell me where I'm going wrong?
Many Thanks.
If the result of result.toJSON().collect() is a JSON encoded string, then you would use json.loads() to convert it to a dict. The issue you're running into is that when you iterate a dict with a for loop, you're given the keys of the dict. In your for loop, you're treating the key as if it's a dict, when in fact it is just a string. Try this:
# toJSON() turns each row of the DataFrame into a JSON string;
# calling first() on the result fetches the first row.
results = json.loads(result.toJSON().first())
for key in results:
    print(results[key])

# To decode the entire DataFrame, iterate over the result
# of toJSON()
def print_rows(row):
    data = json.loads(row)
    for key in data:
        print("{key}:{value}".format(key=key, value=data[key]))

results = result.toJSON()
results.foreach(print_rows)
EDIT: The issue is that collect returns a list, not a dict. I've updated the code. Always read the docs.
collect(): Return a list that contains all of the elements in this RDD.
Note: This method should only be used if the resulting array is expected to be small, as all the data is loaded into the driver’s memory.
EDIT2: I can't emphasize enough, always read the docs.
EDIT3: Look here.
>>> import json
>>> df = sqlContext.read.table("n1")
>>> df.show()
+-----+-------+----+---------------+-------+----+
|   c1|     c2|  c3|             c4|     c5|  c6|
+-----+-------+----+---------------+-------+----+
|00001|Content|   1|Content-article|       |2018|
|00002|Content|null|Content-article|Content|2015|
+-----+-------+----+---------------+-------+----+
>>> results = df.toJSON().map(lambda j: json.loads(j)).collect()
>>> for i in results: print i["c1"], i["c6"]
...
00001 2018
00002 2015
Here is what worked for me:
df_json = df.toJSON()
for row in df_json.collect():
    # JSON string
    print(row)
    # JSON object, parsed into a Python dict
    line = json.loads(row)
    print(line[some_key])
Keep in mind that using .collect() is not advisable on large data, since it pulls the entire distributed DataFrame into the driver's memory, which defeats the purpose of using a distributed DataFrame.
To get an array of python dicts:
results = df.toJSON().map(json.loads).collect()
To get an array of JSON strings:
results = df.toJSON().collect()
To get a JSON string (i.e. a JSON string of an array):
results = df.toPandas().to_json(orient='records')
and using that to get an array of Python dicts:
results = json.loads(df.toPandas().to_json(orient='records'))
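The difference between the list of JSON strings and the list of dicts can be seen without a Spark cluster. In the sketch below a plain Python list stands in for the result of toJSON().collect(); the second sample row and the names collected, records and user_ids are assumptions made for illustration.

```python
import json

# Stand-in for result.toJSON().collect(): a list of JSON strings.
collected = [
    '{"userId":"1","systemId":"30","title":"interest"}',
    '{"userId":"2","systemId":"31","title":"other"}',
]

# Indexing a raw string with a key fails, as in the question.
try:
    collected[0]["userId"]
except TypeError as exc:
    print("raw string:", exc)

# Integer indexing returns a single character, hence the "{" seen in the question.
print(collected[0][0])

# Decoding each string first yields dicts that can be indexed by field name.
records = [json.loads(row) for row in collected]
user_ids = [record["userId"] for record in records]
print(user_ids)
```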

How can I parse a dictionary string?

I am trying to convert a string to a dictionary with the dict function, like this:
import json
p = "{'id':'12589456'}"
d = dict(p)
print d['id']
But I get the following error
ValueError: dictionary update sequence element #0 has length 1; 2 is required
Why does it fail? How can I fix this?
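What you have is a string, but the dict function expects a mapping or an iterable of key-value pairs to construct a dictionary. Iterating over a string yields single characters, which is why the error says element #0 has length 1 where 2 is required. See the examples given in dict's documentation.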
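As a quick illustration of that point, the sketch below (with made-up variable names) builds a dictionary from a list of pairs and then captures the error raised when the same call is given the string:

```python
# dict() accepts an iterable of (key, value) pairs.
pairs = [("id", "12589456")]
d = dict(pairs)
print(d["id"])

# Given a string, dict() iterates over single characters and fails.
try:
    dict("{'id':'12589456'}")
except ValueError as exc:
    message = str(exc)
    print(message)
```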
In this particular case, you can use ast.literal_eval to convert the string to the corresponding dict object, like this
>>> p = "{'id':'12589456'}"
>>> from ast import literal_eval
>>> d = literal_eval(p)
>>> d['id']
'12589456'
Since p is a string containing JSON (ish), you have to load it first to get back a Python dictionary. Then you can access items within it:
import json

p = '{"id":"12589456"}'
d = json.loads(p)
print(d["id"])
However, note that the value in p is not actually JSON; JSON demands (and the Python json module enforces) that strings are quoted with double-quotes, not single quotes. I've updated it in my example here, but depending on where you got your example from, you might have more to do.
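The two approaches can be compared side by side. In this small sketch (variable names are made up for illustration), ast.literal_eval handles the single-quoted Python literal, json.loads handles the double-quoted JSON, and json.loads rejects the single-quoted form:

```python
import ast
import json

single_quoted = "{'id':'12589456'}"   # Python literal, not JSON
double_quoted = '{"id":"12589456"}'   # valid JSON

from_literal = ast.literal_eval(single_quoted)
from_json = json.loads(double_quoted)
print(from_literal == from_json)

# JSON requires double quotes, so the single-quoted text is rejected.
try:
    json.loads(single_quoted)
    rejected = False
except ValueError as exc:
    rejected = True
    print("not valid JSON:", exc)
```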

Trying to iterate over a JSON object

I am trying to iterate over a JSON object, using simplejson.
def main(arg1):
    response = urllib2.urlopen("http://search.twitter.com/search.json?q=" + arg1) #+ "&rpp=100&page=15")
    twitsearch = simplejson.load(response)
    twitsearch = twitsearch['results']
    twitsearch = twitsearch['text']
    print twitsearch
I am passing a list of values to search for in Twitter, like "I'm", "Think", etc.
The problem is that there are multiple text fields, one each for every Tweet. I want to iterate over the entire JSON object, pulling out the "text" field.
How would I do this? I'm reading the documentation and can't see exactly where it talks about this.
EDIT: It appears to be stored as a list of JSON objects.
Trying to do this:
for x in twitsearch:
    x['text']
How would I store x['text'] in a list? Append?
Note that
twitsearch['results']
is a Python list. You can iterate over that list, storing the text component of each of those objects in your own list. A list comprehension would be a good thing to use here.
text_list = [x['text'] for x in twitsearch['results']]
Easy. Figured it out.
tweets = []
for x in twitsearch:
    tweets.append(x['text'])
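Both answers produce the same list. The sketch below checks this on a hypothetical decoded response; the sample tweets and the from_user field are invented for illustration and stand in for what simplejson.load(response) would return.

```python
# Hypothetical decoded response, shaped like the data in the question.
twitsearch = {
    "results": [
        {"text": "I'm reading the docs", "from_user": "alice"},
        {"text": "Think before you tweet", "from_user": "bob"},
    ]
}

# List comprehension, as in the first answer.
text_list = [x["text"] for x in twitsearch["results"]]

# Equivalent explicit loop with append, as in the second answer.
tweets = []
for x in twitsearch["results"]:
    tweets.append(x["text"])

print(text_list == tweets)
```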
