I have several lists of data which look like this:
ISIN Currency Rates
26545555 Eur 0.12345
56554455 Eur 0.25665
75884554 Eur 0.89654
I want to save this data in a dictionary or JSON-like format.
So I am trying to store the following data:
id: 0, ISIN: 26545555, Currency: Eur, Rates: 0.12345
id: 1, ISIN: 56554455, Currency: Eur, Rates: 0.25665
The problem is I am trying to use the following dictionary:
dict_data = {'id': '', 'ISIN': '', 'Currency': ''}
But when I try to append data to it, it doesn't store all of the data.
I am getting the data from an Excel sheet using Pandas. If you think I should use something else, please let me know.
You should use a list of dicts:
[
{'id': 0, 'ISIN': '', 'Currency': ''},
{'id': 1, 'ISIN': '', 'Currency': ''}
]
or, if you want a dict:
{
0: {'ISIN': '', 'Currency': ''},
1: {'ISIN': '', 'Currency': ''}
}
or (partly based on Marco's suggestion):
{
0: [26545555, 'Eur', 0.12345],
1: [56554455, 'Eur', 0.25665]
}
Personally, I prefer the second variant if you need to access elements by ID.
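Since you are already reading the sheet with pandas, both shapes can be produced directly from the DataFrame. A minimal sketch, assuming the workbook is called rates.xlsx (a made-up name here) and the columns are named ISIN, Currency and Rates as in your sample:

import pandas as pd

df = pd.read_excel('rates.xlsx')  # hypothetical file name; columns: ISIN, Currency, Rates

# Variant 1: list of dicts, one per row (the row index becomes the id)
records = df.reset_index().rename(columns={'index': 'id'}).to_dict(orient='records')
# [{'id': 0, 'ISIN': 26545555, 'Currency': 'Eur', 'Rates': 0.12345}, ...]

# Variant 2: dict keyed by id, convenient for lookups by ID
by_id = df.to_dict(orient='index')
# {0: {'ISIN': 26545555, 'Currency': 'Eur', 'Rates': 0.12345}, ...}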
Make each item a dict, and group all the items in a list:
[
{'id': 0, 'ISIN': '', 'Currency': ''},
{'id': 1, 'ISIN': '', 'Currency': ''}
]
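For example, building it row by row (a sketch with the rows from your sample hard-coded; the key point is to create a new dict for each row instead of reusing a single one):

rows = [
    (26545555, 'Eur', 0.12345),
    (56554455, 'Eur', 0.25665),
    (75884554, 'Eur', 0.89654),
]

data = []
for i, (isin, currency, rate) in enumerate(rows):
    # A fresh dict per row -- reusing one dict would overwrite earlier rows.
    data.append({'id': i, 'ISIN': isin, 'Currency': currency, 'Rates': rate})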
When I click on a treeview item, it outputs:
{'text': 1, 'image': '', 'values': [1, '3:18:00', 'pm'], 'open': 0, 'tags': ''}
How do I retrieve specific values like the 1 or pm? I used:
queryResultTable.bind('<ButtonRelease-1>', select_item)
def select_item(a):
itemlibrary = queryResultTable.focus()
print(queryResultTable.item(itemlibrary))
I tried .get() but couldn't really get anywhere.
Try this:
a = {'text': 1, 'image': '', 'values': [1, '3:18:00', 'pm'], 'open': 0, 'tags': ''}
print(a['values'][0])
print(a['values'][2])
This should give you:
1
pm
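Applied to the callback from the question, a sketch (assuming queryResultTable is the ttk.Treeview you bound above):

def select_item(event):
    item_id = queryResultTable.focus()      # id of the clicked row
    item = queryResultTable.item(item_id)   # the dict shown in the question
    values = item['values']                 # e.g. [1, '3:18:00', 'pm']
    print(values[0])                        # -> 1
    print(values[2])                        # -> pm

queryResultTable.bind('<ButtonRelease-1>', select_item)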
I have a CSV with 500+ rows where one column, "_source", is stored as JSON, and I want to extract it into a pandas DataFrame with each key as its own column. The data is a roughly 1 MB JSON file of online social media content (Facebook, Twitter, web-crawled, etc.). There are approximately 528 rows of posts/tweets/text, each containing many dictionaries nested inside dictionaries. I am attaching a few steps from my Jupyter notebook below to give a more complete picture. I need to turn all the key/value pairs of these nested dictionaries into columns of a DataFrame.
Thank you so much this will be a huge help!!!
I have tried changing it to a DataFrame by doing this:
source = pd.DataFrame.from_dict(source, orient='columns')
And it returns something like this... I thought it might unpack the dictionary but it did not.
source.head()
#                                              _source
# 0  {'sub_organization_id': 'default', 'uid': 'aba...
# 1  {'sub_organization_id': 'default', 'uid': 'ab0...
# 2  {'sub_organization_id': 'default', 'uid': 'ac0...

Below is the shape:

source.shape
# (528, 1)
Below is what an actual "_source" row looks like stretched out. There are many dictionaries and key:value pairs where each key needs to be its own column. Thanks! The actual links have been altered/scrambled for privacy reasons.
{'sub_organization_id': 'default',
'uid': 'ac0fafe9ba98327f2d0c72ddc365ffb76336czsa13280b',
'project_veid': 'default',
'campaign_id': 'default',
'organization_id': 'default',
'meta': {'rule_matcher': [{'atribs': {'website': 'github.com/res',
'source': 'Explicit',
'version': '1.1',
'type': 'crawl'},
'results': [{'rule_type': 'hashtag',
'rule_tag': 'Far',
'description': None,
'project_veid': 'A7180EA-7078-0C7F-ED5D-86AD7',
'campaign_id': '2A6DA0C-365BB-67DD-B05830920',
'value': '#Far',
'organization_id': None,
'sub_organization_id': None,
'appid': 'ray',
'project_id': 'CDE2F42-5B87-C594-C900E578C',
'rule_id': '1838',
'node_id': None,
'metadata': {'campaign_title': 'AF',
'project_title': 'AF '}}]}],
'render': [{'attribs': {'website': 'github.com/res',
'version': '1.0',
'type': 'Page Render'},
'results': [{'render_status': 'success',
'path': 'https://east.amanaws.com/rays-ime-store/renders/b/b/70f7dffb8b276f2977f8a13415f82c.jpeg',
'image_hash': 'bb7674b8ea3fc05bfd027a19815f82c',
'url': 'https://discooprdapp.com/',
'load_time': 32}]}]},
'norm_attribs': {'website': 'github.com/res',
'version': '1.1',
'type': 'crawl'},
'project_id': 'default',
'system_timestamp': '2019-02-22T19:04:53.569623',
'doc': {'appid': 'subtter',
'links': [],
'response_url': 'https://discooprdapp.com',
'url': 'https://discooprdapp.com/',
'status_code': 200,
'status_msg': 'OK',
'encoding': 'utf-8',
'attrs': {'uid': '2ab8f2651cb32261b911c990a8b'},
'timestamp': '2019-02-22T19:04:53.963',
'crawlid': '7fd95-785-4dd259-fcc-8752f'},
'type': 'crawl',
'norm': {'body': '\n',
'domain': 'discordapp.com',
'author': 'crawl',
'url': 'https://discooprdapp.com',
'timestamp': '2019-02-22T19:04:53.961283+00:00',
'id': '7fc5-685-4dd9-cc-8762f'}}
Before you post, please make sure the actual code works with the data attached. Thanks!
I tried the code below, but it did not work; there was a syntax error that I could not figure out.
pd.io.json.json_normalize(source_data.[_source].apply(json.loads))
pd.io.json.json_normalize(source_data.[_source].apply(json.loads))
^
SyntaxError: invalid syntax
Whoever can help me with this will be a saint!
I had to do something like that a while back. Basically, I used a function that completely flattens the JSON to identify the keys that will become the columns, then iterated through the flattened JSON to reconstruct each row and append it to a "results" DataFrame. With the data you provided, it created a 52-column row, and looking through it, it appears every key got its own column. Anything nested, for example 'meta': {'rule_matcher': [{'atribs': {'website': ...}}]}, should then get a column name like meta.rule_matcher.atribs.website, where the '.' denotes the nested keys.
data_source = {'sub_organization_id': 'default',
'uid': 'ac0fafe9ba98327f2d0c72ddc365ffb76336czsa13280b',
'project_veid': 'default',
'campaign_id': 'default',
'organization_id': 'default',
'meta': {'rule_matcher': [{'atribs': {'website': 'github.com/res',
'source': 'Explicit',
'version': '1.1',
'type': 'crawl'},
'results': [{'rule_type': 'hashtag',
'rule_tag': 'Far',
'description': None,
'project_veid': 'A7180EA-7078-0C7F-ED5D-86AD7',
'campaign_id': '2A6DA0C-365BB-67DD-B05830920',
'value': '#Far',
'organization_id': None,
'sub_organization_id': None,
'appid': 'ray',
'project_id': 'CDE2F42-5B87-C594-C900E578C',
'rule_id': '1838',
'node_id': None,
'metadata': {'campaign_title': 'AF',
'project_title': 'AF '}}]}],
'render': [{'attribs': {'website': 'github.com/res',
'version': '1.0',
'type': 'Page Render'},
'results': [{'render_status': 'success',
'path': 'https://east.amanaws.com/rays-ime-store/renders/b/b/70f7dffb8b276f2977f8a13415f82c.jpeg',
'image_hash': 'bb7674b8ea3fc05bfd027a19815f82c',
'url': 'https://discooprdapp.com/',
'load_time': 32}]}]},
'norm_attribs': {'website': 'github.com/res',
'version': '1.1',
'type': 'crawl'},
'project_id': 'default',
'system_timestamp': '2019-02-22T19:04:53.569623',
'doc': {'appid': 'subtter',
'links': [],
'response_url': 'https://discooprdapp.com',
'url': 'https://discooprdapp.com/',
'status_code': 200,
'status_msg': 'OK',
'encoding': 'utf-8',
'attrs': {'uid': '2ab8f2651cb32261b911c990a8b'},
'timestamp': '2019-02-22T19:04:53.963',
'crawlid': '7fd95-785-4dd259-fcc-8752f'},
'type': 'crawl',
'norm': {'body': '\n',
'domain': 'discordapp.com',
'author': 'crawl',
'url': 'https://discooprdapp.com',
'timestamp': '2019-02-22T19:04:53.961283+00:00',
'id': '7fc5-685-4dd9-cc-8762f'}}
Code:
def flatten_json(y):
    """Flatten nested dicts/lists into a single dict with '_'-joined keys."""
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x  # drop the trailing '_'

    flatten(y)
    return out

flat = flatten_json(data_source)

import pandas as pd
import re

results = pd.DataFrame()
special_cols = []
columns_list = list(flat.keys())

for item in columns_list:
    # Keys containing a '_<digits>_' segment came from a list element;
    # the digits give the row index for that element.
    try:
        row_idx = re.findall(r'\_(\d+)\_', item)[0]
    except IndexError:
        # No list index in the key: a "special" column that applies to
        # every row, filled in afterwards.
        special_cols.append(item)
        continue
    # Drop everything up to and including the first '_<digits>_' to get the
    # column name, then replace any remaining '_<digits>_' segments with '.'
    # to mark the nesting.
    column = re.findall(r'\_\d+\_(.*)', item)[0]
    column = re.sub(r'\_\d+\_', '.', column)
    row_idx = int(row_idx)
    value = flat[item]
    results.loc[row_idx, column] = value

for item in special_cols:
    results[item] = flat[item]
Output:
print(results.to_string())
atribs_website atribs_source atribs_version atribs_type results.rule_type results.rule_tag results.description results.project_veid results.campaign_id results.value results.organization_id results.sub_organization_id results.appid results.project_id results.rule_id results.node_id results.metadata_campaign_title results.metadata_project_title attribs_website attribs_version attribs_type results.render_status results.path results.image_hash results.url results.load_time sub_organization_id uid project_veid campaign_id organization_id norm_attribs_website norm_attribs_version norm_attribs_type project_id system_timestamp doc_appid doc_response_url doc_url doc_status_code doc_status_msg doc_encoding doc_attrs_uid doc_timestamp doc_crawlid type norm_body norm_domain norm_author norm_url norm_timestamp norm_id
0 github.com/res Explicit 1.1 crawl hashtag Far NaN A7180EA-7078-0C7F-ED5D-86AD7 2A6DA0C-365BB-67DD-B05830920 #Far NaN NaN ray CDE2F42-5B87-C594-C900E578C 1838 NaN AF AF github.com/res 1.0 Page Render success https://east.amanaws.com/rays-ime-store/render... bb7674b8ea3fc05bfd027a19815f82c https://discooprdapp.com/ 32.0 default ac0fafe9ba98327f2d0c72ddc365ffb76336czsa13280b default default default github.com/res 1.1 crawl default 2019-02-22T19:04:53.569623 subtter https://discooprdapp.com https://discooprdapp.com/ 200 OK utf-8 2ab8f2651cb32261b911c990a8b 2019-02-22T19:04:53.963 7fd95-785-4dd259-fcc-8752f crawl \n discordapp.com crawl https://discooprdapp.com 2019-02-22T19:04:53.961283+00:00 7fc5-685-4dd9-cc-8762f
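As a side note on the syntax error in the question: source_data.[_source] is not valid Python; the column has to be selected with brackets and a quoted name. A sketch of the intended one-liner, assuming the DataFrame is named source_data and the column is literally called _source:

import json
import pandas as pd

# If the '_source' column holds JSON strings:
flat_df = pd.io.json.json_normalize(source_data['_source'].apply(json.loads).tolist())

# If it already holds Python dicts (as the repr above suggests), json.loads is not needed:
flat_df = pd.io.json.json_normalize(source_data['_source'].tolist())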
I have a JSON object in Python created through requests' built-in .json() function.
Here is a simplified sample of what I'm doing:
data = session.get(url)
obj = data.json()
s3object = s3.Object(s3_bucket, output_file)
s3object.put(Body=(bytes(json.dumps(obj).encode('UTF-8'))))
Example obj:
{'id': 'fab779b7-2586-4895-9f3b-c9518f34e028', 'project_id': 'a1a73e68-9943-4584-9d59-cc84a0d3e92b', 'created_at': '2017-10-23 02:57:03 -0700', 'sections': [{'section_name': '', 'items': [{'id': 'ffadc652-dd36-4b9f-817c-6539a4b462ab', 'created_at': '2017-10-23 03:36:13 -0700', 'updated_at': '2017-10-23 03:38:32 -0700', 'created_by': 'paul', 'question_text': 'Drawing Ref(s)', 'spec_ref': '', 'display_number': None, 'response': '', 'comment': 'see attached mh309', 'position': 1, 'is_conforming': 'N/A', 'display_type': 'text'}]}]}
I need to replace any occurrence of the string "N/A" with "Not Applicable", wherever it appears and regardless of its key or location, before I upload the JSON to S3. I cannot use local disk writes, hence why it is done this way.
Is this possible?
My original plan was to turn it into a string, do the replace, and then turn it back; is this inefficient?
Thanks,
As mentioned in the comments, obj is a dict. One way to replace N/A with Not Applicable regardless of location is to convert the dict to a string, use str.replace, and convert it back to a dict for further processing:
import json
#Original dict with N/A
obj = {'id': 'fab779b7-2586-4895-9f3b-c9518f34e028', 'project_id': 'a1a73e68-9943-4584-9d59-cc84a0d3e92b', 'created_at': '2017-10-23 02:57:03 -0700', 'sections': [{'section_name': '', 'items': [{'id': 'ffadc652-dd36-4b9f-817c-6539a4b462ab', 'created_at': '2017-10-23 03:36:13 -0700', 'updated_at': '2017-10-23 03:38:32 -0700', 'created_by': 'paul', 'question_text': 'Drawing Ref(s)', 'spec_ref': '', 'display_number': None, 'response': '', 'comment': 'see attached mh309', 'position': 1, 'is_conforming': 'N/A', 'display_type': 'text'}]}]}
#Convert to string and replace
obj_str = json.dumps(obj).replace('N/A', 'Not Applicable')
#Get obj back with replacement
obj = json.loads(obj_str)
Although @Devesh Kumar Singh's answer works with the sample JSON data in your question, converting the whole thing to a string and doing a wholesale bulk replace of the substring seems error-prone, because it could potentially change text in places other than the values associated with dictionary keys.
To avoid that I would suggest using the following, which is more selective even though it takes a few more lines of code:
import json
def replace_NA(obj):
    # Called by json.loads on every decoded dict; replaces 'N/A' in string values.
    def decode_dict(a_dict):
        for key, value in a_dict.items():
            try:
                a_dict[key] = value.replace('N/A', 'Not Applicable')
            except AttributeError:
                # Non-string values (numbers, lists, None, ...) are left untouched.
                pass
        return a_dict
    # Round-trip through JSON so the hook visits every nested dict.
    return json.loads(json.dumps(obj), object_hook=decode_dict)
obj = {'id': 'fab779b7-2586-4895-9f3b-c9518f34e028', 'project_id': 'a1a73e68-9943-4584-9d59-cc84a0d3e92b', 'created_at': '2017-10-23 02:57:03 -0700', 'sections': [{'section_name': '', 'items': [{'id': 'ffadc652-dd36-4b9f-817c-6539a4b462ab', 'created_at': '2017-10-23 03:36:13 -0700', 'updated_at': '2017-10-23 03:38:32 -0700', 'created_by': 'paul', 'question_text': 'Drawing Ref(s)', 'spec_ref': '', 'display_number': None, 'response': '', 'comment': 'see attached mh309', 'position': 1, 'is_conforming': 'N/A', 'display_type': 'text'}]}]}
obj = replace_NA(obj)
I guess the object you've pasted here must be of dict type; you can check with type(json_object) is dict. With that assumption, you can do it as:
keys = json_object.keys()
for i in keys:
    if json_object[i] == "N/A":
        json_object[i] = "Not Applicable"
Hope it helps!
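Note that this only touches top-level keys, while the sample in the question nests the 'N/A' inside lists and dicts. A recursive walk along these lines (a sketch, not tied to any particular library) handles the nesting:

def replace_na(node):
    # Recursively replace the exact string 'N/A' anywhere in nested dicts/lists.
    if isinstance(node, dict):
        return {k: replace_na(v) for k, v in node.items()}
    if isinstance(node, list):
        return [replace_na(v) for v in node]
    if node == 'N/A':
        return 'Not Applicable'
    return node

obj = replace_na(obj)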
I have a nested json string as follows:
[{'id': 'tfghnbkivbgdcse',
'authorization': None,
'operation_type': 'in',
'card': {'type': 'debit',
'brand': 'mastercard',
'address': None,
'card_number': '123456XXXXXX7890',
'holder_name': 'aaaa bbbb’,
'expiration_year': '21',
'expiration_month': '11',
'bank_name': 'XXXXBANK',
'bank_code': '000'},
'status': 'failed',
'creation_date': '2018-06-30T23:59:16-05:00',
'error_message': 'Bank authorization is required for this charge',
'order_id': '1743790',
'amount': 2668.0,
'currency': 'USD',
'customer': {'name': 'AAAA',
'last_name': 'BBBB',
'email': 'XXXX_1234#outlook.com',
'phone_number': '1234567890',
'address': None,
'creation_date': '2018-06-30T23:59:17-05:00',
'external_id': None,
'clabe': None},
'fee': {'amount': 0.95, 'tax': 0.152, 'currency': 'USD'}}]
I want to convert this JSON string into a DataFrame. I have used json_normalize from pandas.io.json, but I am getting an error.
It works if you:
Change all None to "None"
Change 'aaaa bbbb’ to 'aaaa bbbb' (The last character was a different single quote character)
Change all <'> to <">
It's probably just the quote character you need to fix, if the JSON data is part of Python code.
In [39]: ord("'")
Out[39]: 39
In [40]: ord("’")
Out[40]: 8217
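Putting that together, a minimal sketch (assuming the pasted data sits in a string variable raw; since it is a Python literal rather than strict JSON, ast.literal_eval copes with the None values and single quotes once the curly quote is normalised):

import ast
import pandas as pd

fixed = raw.replace('’', "'")       # fix the stray curly quote in 'aaaa bbbb’
records = ast.literal_eval(fixed)   # parse the Python-literal list of dicts
df = pd.json_normalize(records)     # pd.io.json.json_normalize on older pandas versions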
I have this:
items = {{'project':'Project 1','description':'Task description','time':1222222},
{'project':'Project 2','description':'Task description 2','time':1224322},
{'project':'Project 1','description':'Task description 3','time':13222152}}
And I need something like this:
resultitems = {
'project':'Project 1','pritems':{
{'description':'Task description','time':1222222},
{'description':'Task description 3','time':13222152}},
'project':'Project 2',pritems':{
{'description':'Task description 2','time':1224322}},
}
or simply the name of each project as a key.
I've tried this approach:
resultitems = {}
resultitems['Project 2'] = {}
resultitems['Project 2'].update(..)
update does not work, since it replaces the previous value
In PHP, it was easy:
$resultitems['Project 2'][] = array(...)
but I can't find a way to do this in Python.
result_items = {
    'house project': [{'task': 'cleaning', 'hours': 20}, {'task': 'painting', 'hours': 30}, ...],
    'garden project': [{'task': 'mowing the lawn', 'hours': 1}, ...],
    ...
}
Your variable 'items' is not correct. If it is a list of dictionaries, it should be:
items = [{...}, {...}, {...}]
Please tell us the source of the data (where do you get it from?), since that will determine how you fill in the desired dictionary. If you already have the data as in 'items' (i.e. a list of dictionaries), then here is how to convert it:
items = [{'project': 'Project 1', 'description': 'Task description', 'time': 1222222},
         {'project': 'Project 2', 'description': 'Task description 2', 'time': 1224322},
         {'project': 'Project 1', 'description': 'Task description 3', 'time': 13222152}]

dct = {}
for e in items:
    if e['project'] not in dct:
        dct[e['project']] = []
    # Copy everything except the 'project' key into the per-project list.
    dct[e['project']].append({k: v for k, v in e.items() if k != 'project'})

print(dct)
and output is:
{'Project 2': [{'description': 'Task description 2', 'time': 1224322}], 'Project 1': [{'description': 'Task description', 'time': 1222222}, {'description': 'Task description 3', 'time': 13222152}]}
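An equivalent, slightly more idiomatic sketch uses collections.defaultdict, so the empty list does not have to be created by hand:

from collections import defaultdict

dct = defaultdict(list)
for e in items:
    dct[e['project']].append({k: v for k, v in e.items() if k != 'project'})

print(dict(dct))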
Finally, I used this:
newdata = {}
for data in result['data']:
    try:
        newdata[data['project']].append({"description": data['description'], "start": data['start'], "time": data['dur']})
    except KeyError:
        newdata[data['project']] = []
        newdata[data['project']].append({"description": data['description'], "start": data['start'], "time": data['dur']})

print(newdata)
And the result was like this, which is what I needed:
{
u'Project 1': [
{'start': u'2015-07-09T18:09:41-03:00', 'description': u'Task 1 name', 'time': 1432000},
{'start': u'2015-07-09T17:42:36-03:00', 'description': u'Task 2 name', 'time': 618000}
],
u'Project 2': [
{'start': u'2015-07-09T20:14:16-03:00', 'description': u'Other Task Name', 'time': 4424000}
],
u'Project 3': [
{'start': u'2015-07-09T22:29:51-03:00', 'description': u'another task name for pr3', 'time': 3697000},
{'start': u'2015-07-09T19:38:02-03:00', 'description': u'something more to do', 'time': 59000},
{'start': u'2015-07-09T19:11:49-03:00', 'description': u'Base tests', 'time': 0},
{'start': u'2015-07-09T19:11:29-03:00', 'description': u'Domain', 'time': 0}
],
u'Project something': [
{'start': u'2015-07-09T19:39:30-03:00', 'description': u'Study more', 'time': 2069000},
{'start': u'2015-07-09T15:46:39-03:00', 'description': u'Study more (2)', 'time': 3800000},
{'start': u'2015-07-09T11:46:00-03:00', 'description': u'check forms', 'time': 660000}
]
}
By the way, I was not asking about the structure itself; what I needed was some way to program a "something like this" structure.