I am pulling data from an API by URL and keep getting a "string indices must be integers" error in Python. How can I fix this? (The URL is replaced with "url".)
import urllib.request
import json
link = "url"
f = urllib.request.urlopen(link)
data = f.read()
print (str(data, 'utf-8'))
weather = json.loads(data)
print('\n')
print(weather["name"]["temp"])
Here's a sample of the json data:
{"coord":{"lon":-94.2166,"lat":36.4676},"weather":[{"id":600,"main":"Snow","description":"light snow","icon":"13n"}],"base":"stations","main":{"temp":262.83,"feels_like":255.36,"temp_min":262.04,"temp_max":263.71,"pressure":1025,"humidity":92},"visibility":10000,"wind":{"speed":6.17,"deg":30},"clouds":{"all":90},"dt":1613195709,"sys":{"type":1,"id":5695,"country":"US","sunrise":1613135272,"sunset":1613174070},"timezone":-21600,"id":0,"name":"Bella Vista","cod":200}
Here is the data snippet you shared, formatted so that it is easier to see what is going on. weather["name"] is valid, but its value is the string 'Bella Vista', so weather["name"]["temp"] tries to index a string with another string, and that is what produces the error you are seeing. Use weather["main"]["temp"] instead to get the temperature value from that dictionary.
weather = {'base': 'stations',
'clouds': {'all': 90},
'cod': 200,
'coord': {'lat': 36.4676, 'lon': -94.2166},
'dt': 1613195709,
'id': 0,
'main': {'feels_like': 255.36,
'humidity': 92,
'pressure': 1025,
'temp': 262.83,
'temp_max': 263.71,
'temp_min': 262.04},
'name': 'Bella Vista',
'sys': {'country': 'US',
'id': 5695,
'sunrise': 1613135272,
'sunset': 1613174070,
'type': 1},
'timezone': -21600,
'visibility': 10000,
'weather': [{'description': 'light snow',
'icon': '13n',
'id': 600,
'main': 'Snow'}],
'wind': {'deg': 30, 'speed': 6.17}}
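A minimal sketch of the corrected lookup, using a trimmed copy of the sample JSON above instead of a live request (the `sample` string and the variable names `city` and `temp` are only for illustration):

```python
import json

# Trimmed copy of the sample response shown above; not a live API call.
sample = '{"main": {"temp": 262.83, "feels_like": 255.36}, "name": "Bella Vista"}'
weather = json.loads(sample)

# weather["name"] is a plain string, so weather["name"]["temp"] would raise TypeError.
city = weather["name"]

# The temperature lives in the nested "main" dictionary.
temp = weather["main"]["temp"]
print(city, temp)
```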
js part
$('#btnUpdate').click(function(){
var formData = JSON.stringify($("#contrast_rule_set").serializeArray());
$.ajax({
type: "POST",
url: "./contrast_rule_set",
data: formData,
success: function(){},
dataType: "json",
contentType : "application/json"
});
})
python part
@app.route('/get_test', methods=['GET','POST'])
def get_test():
    web_form_data = request.json
    print(web_form_data)
    print(type(web_form_data))
    print(jsonify(web_form_data))
    print(json.dumps(web_form_data))
The Python console prints:
[{'name': 'logic_1', 'value': '1'}, {'name': 'StudyDescription_1', 'value': ''}, {'name': 'SeriesDescription_1', 'value': 'C\\+'}, {'name': 'ImageComments_1', 'value': ''}, {'name': 'logic_2', 'value': '1'}, {'name': 'StudyDescription_2', 'value': '\\-C'}, {'name': 'SeriesDescription_2', 'value': '\\-C'}, {'name': 'ImageComments_2', 'value': '\\-C'}, {'name': 'logic_3', 'value': '1'}, {'name': 'StudyDescription_3', 'value': ''}, {'name': 'SeriesDescription_3', 'value': '\\+C'}, {'name': 'ImageComments_3', 'value': '\\+C'}]
<class 'list'>
How do I convert this list into JSON-like data? Should I adjust the HTML side or the Python side?
I hope to get data like the following (this sample is from another JSON file of mine):
{
'Logic': 'AND',
'StudyDescription': '',
'SeriesDescription': 'C\+',
'ImageComments': ''
},
{
'Logic': 'NOT',
'StudyDescription': '\-C',
'SeriesDescription': '\-C',
'ImageComments': '\-C'
},
{
'Logic': 'AND',
'StudyDescription': '',
'SeriesDescription': '\+C',
'ImageComments': '\+C'
}
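One way to do the regrouping on the Python side is sketched below. It assumes every field name ends in `_<number>`, as in the printed list; `group_form_fields` is a hypothetical helper name, and the short `form_data` list is a cut-down copy of the console output. The submitted values are kept as-is, so turning the logic value '1' into a word such as 'AND' or 'NOT' would need a lookup that the question does not specify.

```python
from collections import defaultdict

def group_form_fields(pairs):
    """Group [{'name': 'field_N', 'value': ...}, ...] into one dict per N."""
    grouped = defaultdict(dict)
    for pair in pairs:
        # Split "StudyDescription_2" into the field name and its numeric suffix.
        field, _, index = pair["name"].rpartition("_")
        grouped[int(index)][field] = pair["value"]
    # Return the groups ordered by their numeric suffix.
    return [grouped[i] for i in sorted(grouped)]

# Cut-down copy of the serialized form printed above.
form_data = [
    {"name": "logic_1", "value": "1"},
    {"name": "StudyDescription_1", "value": ""},
    {"name": "SeriesDescription_1", "value": "C\\+"},
    {"name": "logic_2", "value": "1"},
    {"name": "StudyDescription_2", "value": "\\-C"},
]

rules = group_form_fields(form_data)
print(rules)
```

The resulting list of dictionaries can be passed to `json.dumps` if a JSON string is needed.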
I am having issues pulling a JSON array from http://api.divesites.com.
When I manually set the data as a string in Python 3.7, I can see the array.
import json
def main():
    hardcode = """{"request":{"str":null,"timestamp":1572865590,"loc":{"lat":"50.442","lng":"-4.08279999999999"},"mode":"sites","dist":"9","api":1},"sites":[{"currents":null,"distance":"7.47","hazards":null,"lat":"50.3378","name":"Fort Bovisand","water":null,"marinelife":null,"description":null,"maxdepth":null,"mindepth":null,"predive":null,"id":"24387","equipment":null,"lng":"-4.1285"},{"currents":null,"distance":"7.93","hazards":null,"lat":"50.3352","name":"Plymouth Breakwater","water":null,"marinelife":null,"description":null,"maxdepth":null,"mindepth":null,"predive":null,"id":"24388","equipment":null,"lng":"-4.1485"}],"version":1,"loc":{"lat":"50.442","lng":"-4.08279999999999"},"result":true}
"""
    print("Original Data: ---", get_data(hardcode))
    print("Site Data:----", get_data(hardcode)['sites'])

def get_data(data):
    return json.loads(data)

if __name__=='__main__':
    main()
Output
Original Data: --- {'request': {'str': None, 'timestamp': 1572865590, 'loc': {'lat': '50.442', 'lng': '-4.08279999999999'}, 'mode': 'sites', 'dist': '9', 'api': 1}, 'sites': [{'currents': None, 'distance': '7.47', 'hazards': None, 'lat': '50.3378', 'name': 'Fort Bovisand', 'water': None, 'marinelife': None, 'description': None, 'maxdepth': None, 'mindepth': None, 'predive': None, 'id': '24387', 'equipment': None, 'lng': '-4.1285'}, {'currents': None, 'distance': '7.93', 'hazards': None, 'lat': '50.3352', 'name': 'Plymouth Breakwater', 'water': None, 'marinelife': None, 'description': None, 'maxdepth': None, 'mindepth': None, 'predive': None, 'id': '24388', 'equipment': None, 'lng': '-4.1485'}], 'version': 1, 'loc': {'lat': '50.442', 'lng': '-4.08279999999999'}, 'result': True}
Site Data:---- [{'currents': None, 'distance': '7.47', 'hazards': None, 'lat': '50.3378', 'name': 'Fort Bovisand', 'water': None, 'marinelife': None, 'description': None, 'maxdepth': None, 'mindepth': None, 'predive': None, 'id': '24387', 'equipment': None, 'lng': '-4.1285'}, {'currents': None, 'distance': '7.93', 'hazards': None, 'lat': '50.3352', 'name': 'Plymouth Breakwater', 'water': None, 'marinelife': None, 'description': None, 'maxdepth': None, 'mindepth': None, 'predive': None, 'id': '24388', 'equipment': None, 'lng': '-4.1485'}]
However, when I perform a GET request (using urllib.request.urlopen()), I receive the other objects but the 'sites' array is empty.
import urllib.request
import json
def get_jsonparsed_data(url):
    response = urllib.request.urlopen(url)
    data = response.read().decode("utf-8")
    return json.loads(data)

def main():
    url = 'http://api.divesites.com/?mode=sites&lat=-50.350874&lng=175.849890&dist=12'
    print("Original Data: ---", get_jsonparsed_data(url))
    print("Sites: ----", get_jsonparsed_data(url)['sites'])

if __name__=='__main__':
    main()
Output
Original Data: --- {'request': {'str': None, 'timestamp': 1572866211, 'loc': {'lat': '-50.350874', 'lng': '175.849890'}, 'mode': 'sites', 'dist': '12', 'api': 1}, 'sites': [], 'version': 1, 'loc': {'lat': '-50.350874', 'lng': '175.849890'}, 'result': True}
Site Data:---- []
Am I doing something wrong, or do I need an extra step?
Solved it. I spent so long debugging the code that I forgot to check the final API call. It turns out I either didn't set a large enough distance or used the wrong starting latitude and longitude; the values in the request were different from those in my hard-coded JSON.
The URL should have been: http://api.divesites.com/?mode=sites&lat=50.442&lng=-4.08279999999999&dist=9
Stupid mistake. Thanks for helping though. It got me to look.
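A small sketch of one way to make this kind of mismatch easier to spot: build the query string from named parameters with `urllib.parse.urlencode` rather than typing the URL by hand. The helper name `build_sites_url` is hypothetical; the parameter names and values are the ones from the corrected URL above.

```python
from urllib.parse import urlencode

def build_sites_url(lat, lng, dist, base="http://api.divesites.com/"):
    """Build the request URL from named parameters so each value is easy to check."""
    query = urlencode({"mode": "sites", "lat": lat, "lng": lng, "dist": dist})
    return f"{base}?{query}"

# Same coordinates and distance as in the hard-coded JSON.
url = build_sites_url("50.442", "-4.08279999999999", "9")
print(url)
```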
First, I'd use the requests library instead and then you can find the solution here
I have a CSV with 500+ rows where one column, "_source", is stored as JSON. I want to extract that into a pandas DataFrame, and I need each key to be its own column.
I have a 1 MB JSON file of online social media data, and I need to convert the dictionary keys and values into their own separate columns. The social media data comes from Facebook, Twitter, web crawls, etc. There are approximately 528 separate rows of posts/tweets/text, each with many dictionaries inside dictionaries. I am attaching a few steps from my Jupyter notebook below to give a more complete understanding. I need to turn all key-value pairs for dictionaries inside dictionaries into columns inside a DataFrame.
Thank you so much this will be a huge help!!!
I have tried changing it to a dataframe by doing this
source = pd.DataFrame.from_dict(source, orient='columns')
And it returns something like this... I thought it might unpack the dictionary but it did not.
#source.head()
#_source
#0 {'sub_organization_id': 'default', 'uid': 'aba...
#1 {'sub_organization_id': 'default', 'uid': 'ab0...
#2 {'sub_organization_id': 'default', 'uid': 'ac0...
below is the shape
#source.shape (528, 1)
Below is what an actual "_source" row looks like when expanded. There are many dictionaries and key:value pairs, and each key needs to be its own column. Thanks! The actual links have been altered/scrambled for privacy reasons.
{'sub_organization_id': 'default',
'uid': 'ac0fafe9ba98327f2d0c72ddc365ffb76336czsa13280b',
'project_veid': 'default',
'campaign_id': 'default',
'organization_id': 'default',
'meta': {'rule_matcher': [{'atribs': {'website': 'github.com/res',
'source': 'Explicit',
'version': '1.1',
'type': 'crawl'},
'results': [{'rule_type': 'hashtag',
'rule_tag': 'Far',
'description': None,
'project_veid': 'A7180EA-7078-0C7F-ED5D-86AD7',
'campaign_id': '2A6DA0C-365BB-67DD-B05830920',
'value': '#Far',
'organization_id': None,
'sub_organization_id': None,
'appid': 'ray',
'project_id': 'CDE2F42-5B87-C594-C900E578C',
'rule_id': '1838',
'node_id': None,
'metadata': {'campaign_title': 'AF',
'project_title': 'AF '}}]}],
'render': [{'attribs': {'website': 'github.com/res',
'version': '1.0',
'type': 'Page Render'},
'results': [{'render_status': 'success',
'path': 'https://east.amanaws.com/rays-ime-store/renders/b/b/70f7dffb8b276f2977f8a13415f82c.jpeg',
'image_hash': 'bb7674b8ea3fc05bfd027a19815f82c',
'url': 'https://discooprdapp.com/',
'load_time': 32}]}]},
'norm_attribs': {'website': 'github.com/res',
'version': '1.1',
'type': 'crawl'},
'project_id': 'default',
'system_timestamp': '2019-02-22T19:04:53.569623',
'doc': {'appid': 'subtter',
'links': [],
'response_url': 'https://discooprdapp.com',
'url': 'https://discooprdapp.com/',
'status_code': 200,
'status_msg': 'OK',
'encoding': 'utf-8',
'attrs': {'uid': '2ab8f2651cb32261b911c990a8b'},
'timestamp': '2019-02-22T19:04:53.963',
'crawlid': '7fd95-785-4dd259-fcc-8752f'},
'type': 'crawl',
'norm': {'body': '\n',
'domain': 'discordapp.com',
'author': 'crawl',
'url': 'https://discooprdapp.com',
'timestamp': '2019-02-22T19:04:53.961283+00:00',
'id': '7fc5-685-4dd9-cc-8762f'}}
Before you post, please make sure the code actually works for the data attached. Thanks!
I tried the code below, but it did not work; there was a syntax error that I could not figure out.
pd.io.json.json_normalize(source_data.[_source].apply(json.loads))
pd.io.json.json_normalize(source_data.[_source].apply(json.loads))
^
SyntaxError: invalid syntax
Whoever can help me with this will be a saint!
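The syntax error comes from `source_data.[_source]`: a column is selected with `source_data["_source"]`, with no dot before the bracket and with the column name quoted. A minimal sketch of a corrected call follows, assuming a pandas version that provides `pd.json_normalize` at the top level; the two-row `source_data` frame is made up for illustration. Because the row shown above uses single quotes and `None`, it looks like a Python literal rather than strict JSON, so this sketch parses each cell with `ast.literal_eval`; if the column holds real JSON, `json.loads` would be the parser to use instead.

```python
import ast
import pandas as pd

# Hypothetical two-row stand-in for the real 528-row "_source" column.
source_data = pd.DataFrame({
    "_source": [
        "{'uid': 'a1', 'type': 'crawl', 'doc': {'status_code': 200, 'encoding': 'utf-8'}}",
        "{'uid': 'b2', 'type': 'crawl', 'doc': {'status_code': 404, 'encoding': None}}",
    ]
})

# Select the column with ["_source"], parse each cell into a dict,
# then let json_normalize expand nested dicts into dotted column names.
records = source_data["_source"].apply(ast.literal_eval).tolist()
flat = pd.json_normalize(records)
print(flat.columns.tolist())
```

Nested dictionaries become columns such as `doc.status_code`. Lists of dictionaries, such as 'rule_matcher' in the row above, are not expanded by this call and stay as list objects in a single cell.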
I had to do something like that a while back. Basically, I used a function that completely flattens the JSON to identify the keys that will become columns, then iterated through the flattened keys to reconstruct a row and append each row to a "results" dataframe. With the data you provided, it created a row with 52 columns, and looking through it, every key appears to have ended up in its own column. Anything nested, for example 'meta': {'rule_matcher':[{'atribs': {'website': ...]}, should then get a column name built from the nested keys, such as meta.rule_matcher.atribs.website, where the '.' separates those nested keys (in the output below the separators come out as a mix of '_' and '.', for example atribs_website and results.rule_type).
data_source = {'sub_organization_id': 'default',
'uid': 'ac0fafe9ba98327f2d0c72ddc365ffb76336czsa13280b',
'project_veid': 'default',
'campaign_id': 'default',
'organization_id': 'default',
'meta': {'rule_matcher': [{'atribs': {'website': 'github.com/res',
'source': 'Explicit',
'version': '1.1',
'type': 'crawl'},
'results': [{'rule_type': 'hashtag',
'rule_tag': 'Far',
'description': None,
'project_veid': 'A7180EA-7078-0C7F-ED5D-86AD7',
'campaign_id': '2A6DA0C-365BB-67DD-B05830920',
'value': '#Far',
'organization_id': None,
'sub_organization_id': None,
'appid': 'ray',
'project_id': 'CDE2F42-5B87-C594-C900E578C',
'rule_id': '1838',
'node_id': None,
'metadata': {'campaign_title': 'AF',
'project_title': 'AF '}}]}],
'render': [{'attribs': {'website': 'github.com/res',
'version': '1.0',
'type': 'Page Render'},
'results': [{'render_status': 'success',
'path': 'https://east.amanaws.com/rays-ime-store/renders/b/b/70f7dffb8b276f2977f8a13415f82c.jpeg',
'image_hash': 'bb7674b8ea3fc05bfd027a19815f82c',
'url': 'https://discooprdapp.com/',
'load_time': 32}]}]},
'norm_attribs': {'website': 'github.com/res',
'version': '1.1',
'type': 'crawl'},
'project_id': 'default',
'system_timestamp': '2019-02-22T19:04:53.569623',
'doc': {'appid': 'subtter',
'links': [],
'response_url': 'https://discooprdapp.com',
'url': 'https://discooprdapp.com/',
'status_code': 200,
'status_msg': 'OK',
'encoding': 'utf-8',
'attrs': {'uid': '2ab8f2651cb32261b911c990a8b'},
'timestamp': '2019-02-22T19:04:53.963',
'crawlid': '7fd95-785-4dd259-fcc-8752f'},
'type': 'crawl',
'norm': {'body': '\n',
'domain': 'discordapp.com',
'author': 'crawl',
'url': 'https://discooprdapp.com',
'timestamp': '2019-02-22T19:04:53.961283+00:00',
'id': '7fc5-685-4dd9-cc-8762f'}}
Code:
def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out
flat = flatten_json(data_source)
import pandas as pd
import re
results = pd.DataFrame()
special_cols = []
columns_list = list(flat.keys())
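for item in columns_list:
    try:
        # Keys that came from a list contain an index such as "_0_".
        row_idx = re.findall(r'\_(\d+)\_', item )[0]
    except IndexError:
        # No list index in the key: treat it as a top-level column.
        special_cols.append(item)
        continue
    column = re.findall(r'\_\d+\_(.*)', item )[0]
    column = re.sub(r'\_\d+\_', '.', column)
    row_idx = int(row_idx)
    value = flat[item]
    results.loc[row_idx, column] = value

# Add the columns that had no list index to every row.
for item in special_cols:
    results[item] = flat[item]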
Output:
print (results.to_string())
atribs_website atribs_source atribs_version atribs_type results.rule_type results.rule_tag results.description results.project_veid results.campaign_id results.value results.organization_id results.sub_organization_id results.appid results.project_id results.rule_id results.node_id results.metadata_campaign_title results.metadata_project_title attribs_website attribs_version attribs_type results.render_status results.path results.image_hash results.url results.load_time sub_organization_id uid project_veid campaign_id organization_id norm_attribs_website norm_attribs_version norm_attribs_type project_id system_timestamp doc_appid doc_response_url doc_url doc_status_code doc_status_msg doc_encoding doc_attrs_uid doc_timestamp doc_crawlid type norm_body norm_domain norm_author norm_url norm_timestamp norm_id
0 github.com/res Explicit 1.1 crawl hashtag Far NaN A7180EA-7078-0C7F-ED5D-86AD7 2A6DA0C-365BB-67DD-B05830920 #Far NaN NaN ray CDE2F42-5B87-C594-C900E578C 1838 NaN AF AF github.com/res 1.0 Page Render success https://east.amanaws.com/rays-ime-store/render... bb7674b8ea3fc05bfd027a19815f82c https://discooprdapp.com/ 32.0 default ac0fafe9ba98327f2d0c72ddc365ffb76336czsa13280b default default default github.com/res 1.1 crawl default 2019-02-22T19:04:53.569623 subtter https://discooprdapp.com https://discooprdapp.com/ 200 OK utf-8 2ab8f2651cb32261b911c990a8b 2019-02-22T19:04:53.963 7fd95-785-4dd259-fcc-8752f crawl \n discordapp.com crawl https://discooprdapp.com 2019-02-22T19:04:53.961283+00:00 7fc5-685-4dd9-cc-8762f
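For comparison, here is a standard-library-only sketch of the flattening step that joins every level with '.', so the keys match the meta.rule_matcher.atribs.website style described earlier. It is an illustrative variant, not the code that produced the output above; the function name `flatten` and the small `record` are assumptions made for the example. List positions are kept in the key, and empty lists contribute no key, as in `flatten_json`.

```python
def flatten(obj, prefix=""):
    """Return a flat dict whose keys are dotted paths into obj."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}{index}."))
    else:
        # Leaf value: drop the trailing '.' from the accumulated path.
        flat[prefix[:-1]] = obj
    return flat

# Cut-down record with the same nesting pattern as the "_source" row.
record = {
    "uid": "ac0f",
    "meta": {"rule_matcher": [{"atribs": {"website": "github.com/res"}}]},
    "doc": {"links": [], "status_code": 200},
}

row = flatten(record)
print(row)
```

A list of such flat dictionaries, one per "_source" row, can then be handed to `pd.DataFrame` to get one column per key.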
I have a list of dictionaries, and I am trying to remove any dictionary in which the value of a certain key is None.
item_dict = [
{'code': 'aaa0000',
'id': 415294,
'index_range': '10-33',
'location': 'A010',
'type': 'True'},
{'code': 'bbb1458',
'id': 415575,
'index_range': '30-62',
'location': None,
'type': 'True'},
{'code': 'ccc3013',
'id': 415575,
'index_range': '14-59',
'location': 'C041',
'type': 'True'}
]
for item in item_dict:
    filtered = dict((k,v) for k,v in item.iteritems() if v is not None)
# Output Results
# Item - aaa0000 is missing
# {'index_range': '14-59', 'code': 'ccc3013', 'type': 'True', 'id': 415575, 'location': 'C041'}
In my example, the output is missing one of the dictionaries, and if I try to create a new list and append filtered to it, item bbb1458 is included in the list as well.
How can I rectify this?
[item for item in item_dict if None not in item.values()]
Each item in this list is a dictionary, and a dictionary is only kept in the new list if None does not appear among its values.
You can create a new list using a list comprehension, filtering on the condition that all values are not None:
item_dict = [
{'code': 'aaa0000',
'id': 415294,
'index_range': '10-33',
'location': 'A010',
'type': 'True'},
{'code': 'bbb1458',
'id': 415575,
'index_range': '30-62',
'location': None,
'type': 'True'},
{'code': 'ccc3013',
'id': 415575,
'index_range': '14-59',
'location': 'C041',
'type': 'True'}
]
filtered = [d for d in item_dict if all(value is not None for value in d.values())]
print(filtered)
#[{'index_range': '10-33', 'id': 415294, 'location': 'A010', 'type': 'True', 'code': 'aaa0000'}, {'index_range': '14-59', 'id': 415575, 'location': 'C041', 'type': 'True', 'code': 'ccc3013'}]
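If only one particular key should decide whether a dictionary is dropped, the same idea can be narrowed to that key. This is a small sketch with a hypothetical helper name, `drop_if_none`, and a shortened copy of the list; note that `dict.get` also returns None when the key is absent, so dictionaries without the key are dropped as well.

```python
# Shortened copy of the list from the question.
item_dict = [
    {"code": "aaa0000", "location": "A010"},
    {"code": "bbb1458", "location": None},
    {"code": "ccc3013", "location": "C041"},
]

def drop_if_none(items, key):
    """Keep only the dictionaries whose value for `key` is not None."""
    return [d for d in items if d.get(key) is not None]

filtered = drop_if_none(item_dict, "location")
codes = [d["code"] for d in filtered]
print(codes)
```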