Normalize JSON API data to columns - python

I'm trying to get data from our HubSpot CRM database and convert it to a dataframe using pandas. I'm still a beginner in Python, and I can't get json_normalize to work.
The output from the database is in a JSON-like format like this:
{'archived': False,
'archived_at': None,
'associations': None,
'created_at': datetime.datetime(2019, 12, 21, 17, 56, 24, 739000, tzinfo=tzutc()),
'id': 'xxx',
'properties': {'createdate': '2019-12-21T17:56:24.739Z',
'email': 'xxxxx#xxxxx.com',
'firstname': 'John',
'hs_object_id': 'xxx',
'lastmodifieddate': '2020-04-22T04:37:40.274Z',
'lastname': 'Hansen'},
'updated_at': datetime.datetime(2020, 4, 22, 4, 37, 40, 274000, tzinfo=tzutc())}, {'archived': False,
'archived_at': None,
'associations': None,
'created_at': datetime.datetime(2019, 12, 21, 17, 52, 38, 485000, tzinfo=tzutc()),
'id': 'bbb',
'properties': {'createdate': '2019-12-21T17:52:38.485Z',
'email': 'bbb#bbb.dk',
'firstname': 'John2',
'hs_object_id': 'bbb',
'lastmodifieddate': '2020-05-19T07:18:28.384Z',
'lastname': 'Hansen2'},
'updated_at': datetime.datetime(2020, 5, 19, 7, 18, 28, 384000, tzinfo=tzutc())}, {'archived': False,
'archived_at': None,
'associations': None,
etc.
Trying to put it into a dataframe using this code:
import hubspot
import pandas as pd
import json
from pandas.io.json import json_normalize
import os
client = hubspot.Client.create(api_key='################')
all_contacts = contacts_client = client.crm.contacts.get_all()
df=pd.io.json.json_normalize(all_contacts,'properties')
df.head
df.to_csv ('All contacts.csv')
But I keep getting an error that I can't resolve.
I have also tried the
pd.dataframe(all_contacts)
and
pf.dataframe.from_dict(all_contacts)

The all_contacts variable is a list of dictionary-like elements. So to create the dataframe I have used a list comprehension to build a list that only contains the 'properties' of each dictionary-like element.
import datetime
import pandas as pd
from dateutil.tz import tzutc
data = ({'archived': False,
'archived_at': None,
'associations': None,
'created_at': datetime.datetime(2019, 12, 21, 17, 56, 24, 739000, tzinfo=tzutc()),
'id': 'xxx',
'properties': {'createdate': '2019-12-21T17:56:24.739Z',
'email': 'xxxxx#xxxxx.com',
'firstname': 'John',
'hs_object_id': 'xxx',
'lastmodifieddate': '2020-04-22T04:37:40.274Z',
'lastname': 'Hansen'},
'updated_at': datetime.datetime(2020, 4, 22, 4, 37, 40, 274000, tzinfo=tzutc())},
{'archived': False,
'archived_at': None,
'associations': None,
'created_at': datetime.datetime(2019, 12, 21, 17, 52, 38, 485000, tzinfo=tzutc()),
'id': 'bbb',
'properties': {
'createdate': '2019-12-21T17:52:38.485Z',
'email': 'bbb#bbb.dk',
'firstname': 'John2',
'hs_object_id': 'bbb',
'lastmodifieddate': '2020-05-19T07:18:28.384Z',
'lastname': 'Hansen2'},
'updated_at': datetime.datetime(2020, 5, 19, 7, 18, 28, 384000, tzinfo=tzutc())})
df = pd.DataFrame([row['properties'] for row in data])
print(df)
OUTPUT:
createdate email ... lastmodifieddate lastname
0 2019-12-21T17:56:24.739Z xxxxx#xxxxx.com ... 2020-04-22T04:37:40.274Z Hansen
1 2019-12-21T17:52:38.485Z bbb#bbb.dk ... 2020-05-19T07:18:28.384Z Hansen2
[2 rows x 6 columns]
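Since the question asked about json_normalize specifically, here is a minimal sketch of that route. It assumes each contact has already been turned into a plain dict (for example through a to_dict() method, if the client objects provide one; that is an assumption, not something shown above). The records list below is made-up stand-in data, and pd.json_normalize requires pandas 1.0 or later.

```python
import pandas as pd

# Stand-in data: plain dicts shaped like the contacts shown in the question.
records = [
    {"id": "xxx", "archived": False,
     "properties": {"email": "xxxxx#xxxxx.com", "firstname": "John", "lastname": "Hansen"}},
    {"id": "bbb", "archived": False,
     "properties": {"email": "bbb#bbb.dk", "firstname": "John2", "lastname": "Hansen2"}},
]

# Nested keys are flattened into dotted column names such as "properties.email".
df = pd.json_normalize(records)

# Drop the "properties." prefix so the nested fields read like ordinary columns.
prefix = "properties."
df.columns = [c[len(prefix):] if c.startswith(prefix) else c for c in df.columns]

print(df)
```

Unlike the list comprehension above, this keeps the top-level fields (id, archived) next to the nested properties.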

Related

Unable to convert JSON to DataFrame

Unable to convert JSON to a DataFrame; the following TypeError is raised.
The following data is created
Test_data =
{'archived': False,
'archived_at': None,
'associations': None,
'created_at': datetime.datetime(2020, 10, 30, 8, 3, 54, 190000, tzinfo=tzlocal()),
'id': '12345',
'properties': {'createdate': '[![2020-10-30T08:03:54.190Z][1]][1]',
'email': 'testmail#gmail.com',
'firstname': 'TestFirst',
'lastname': 'TestLast'},
'properties_with_history': None,
'updated_at': datetime.datetime(2022, 11, 10, 6, 44, 14, 5000, tzinfo=tzlocal())}
data = json.loads(test_data)
TypeError: the JSON object must be str, bytes or bytearray, not SimplePublicObjectWithAssociations
The following has been tried:
s1 = json.dumps(test_data)
d2 = json.loads(s1)
TypeError: Object of type SimplePublicObjectWithAssociations is not JSON serializable
Preferred output:
Can you try this:
df=pd.json_normalize(Test_data)
print(df)
'''
archived archived_at associations created_at id properties_with_history updated_at properties.createdate properties.email properties.firstname
0 False None None 2020-10-30T08:03:54.190Z 12345 2022-11-10T06:44:14.500Z [![2020-10-30T08:03:54.190Z][1]][1] testmail#gmail.com TestFirst
'''
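If you want specific columns:
df = df[['id','properties.createdate','properties.email','properties.firstname','properties.lastname']]
# regex=False treats 'properties.' as a literal prefix rather than a pattern where '.' matches any character
df.columns = df.columns.str.replace('properties.', '', regex=False)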
df
id createdate email firstname lastname
0 12345 [![2020-10-30T08:03:54.190Z][1]][1] testmail#gmail.com TestFirst TestLast
If you want to convert the createdate column to datetime:
import datefinder
df['createdate']=df['createdate'].apply(lambda x: list(datefinder.find_dates(x))[0])
df
id createdate email firstname lastname
0 12345 2020-10-30 08:03:54.190000+00:00 testmail#gmail.com TestFirst TestLast
Here is a partial solution. Selecting columns or unpivoting the dataframe from this starting point may be useful:
import pandas as pd
import datetime
import json
import jsonpickle
test_data ={'archived': False,
'archived_at': None,
'associations': None,
'created_at': datetime.datetime(2020, 10, 30, 8, 3, 54, 190000),
'id': '12345',
'properties': {'createdate': '[![2020-10-30T08:03:54.190Z][1]][1]',
'email': 'testmail#gmail.com',
'firstname': 'TestFirst',
'lastname': 'TestLast'},
'properties_with_history': None,
'updated_at': datetime.datetime(2022, 11, 10, 6, 44, 14, 5000)}
data = jsonpickle.encode(test_data, unpicklable=False)
pd.read_json(data)
I have tried melt and unstack, but I didn't reach your preferred output.
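The second TypeError in the question ("not JSON serializable") can also be approached with the standard library alone. The sketch below assumes test_data is already a plain dict, as written in the question; if it is really a client object, it would first have to be converted to a dict, which is not shown here. The helper name to_jsonable is made up for illustration.

```python
import datetime
import json

# Plain-dict version of the data from the question (timezone info omitted).
test_data = {
    "archived": False,
    "created_at": datetime.datetime(2020, 10, 30, 8, 3, 54, 190000),
    "id": "12345",
    "properties": {"email": "testmail#gmail.com", "firstname": "TestFirst"},
    "updated_at": datetime.datetime(2022, 11, 10, 6, 44, 14, 5000),
}

def to_jsonable(value):
    """Fallback used by json.dumps for objects it cannot serialize itself."""
    if isinstance(value, (datetime.datetime, datetime.date)):
        return value.isoformat()
    raise TypeError("Cannot serialize %r" % type(value))

s1 = json.dumps(test_data, default=to_jsonable)
d2 = json.loads(s1)  # now a dict of JSON-native types only
print(d2["created_at"])
```

json.dumps only calls the default hook for values it does not know how to encode, so strings, numbers, None and nested dicts pass through unchanged.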

How to change all datetime objects in a list to standard YYYY-MM-DD HH:MM:SS

When I query MySQL with Python and the query has datetime fields then I get this list as a result.
[{'_id': 1, 'name': 'index', '_cdate': datetime.datetime(2020, 10, 27, 9, 4, 34), 'title': 'DataExtract'}, {'_id': 2, 'name': 'topmenu', '_cdate': datetime.datetime(2020, 11, 4, 19, 52, 17), 'title': 'topmenu'}, {'_id': 3, 'name': 'functions_common', '_cdate': datetime.datetime(2020, 11, 4, 19, 52, 50), 'title': 'common functions'}, {'_id': 4, 'name': 'leftmenu', '_cdate': datetime.datetime(2020, 11, 4, 19, 53, 56), 'title': 'Left Menu'}, {'_id': 5, 'name': 'todo', '_cdate': datetime.datetime(2020, 11, 7, 8, 49, 38), 'title': 'Todo'}, {'_id': 6, 'name': 'cron_publish', '_cdate': datetime.datetime(2020, 12, 2, 19, 30, 11), 'title': 'Run Publish reports'}, {'_id': 7, 'name': 'test', '_cdate': datetime.datetime(2020, 12, 2, 22, 32, 54), 'title': 'test'}, {'_id': 8, 'name': 'help', '_cdate': datetime.datetime(2020, 12, 5, 7, 12, 44), 'title': 'Help'}, {'_id': 9, 'name': 'api', '_cdate': datetime.datetime(2020, 12, 5, 21, 22, 13), 'title': 'API'}, {'_id': 10, 'name': 'ben', '_cdate': datetime.datetime(2021, 10, 4, 11, 37, 3), 'title': 'List of Reports'}]
How do I either get the query to return the date fields in YYYY-MM-DD HH:MM:SS format, or convert them in the returned list? When I try to change them by enumerating over the results, Python throws an error that the dictionary has changed.
The datetime.datetime() objects you're getting are the standard representation of these objects. If you were expecting strings instead, you could simply convert them with value.strftime('%Y-%m-%d %H:%M:%S'), but keep in mind that the datetime object is a more flexible way of keeping the data around. I'd recommend only formatting the date in a specific way if you're writing it to the screen or a file format that expects a string.
Example:
import datetime
data = [{'_id': 1, 'name': 'index', '_cdate': datetime.datetime(2020, 10, 27, 9, 4, 34), 'title': 'DataExtract'}, {'_id': 2, 'name': 'topmenu', '_cdate': datetime.datetime(2020, 11, 4, 19, 52, 17), 'title': 'topmenu'}, {'_id': 3, 'name': 'functions_common', '_cdate': datetime.datetime(2020, 11, 4, 19, 52, 50), 'title': 'common functions'}, {'_id': 4, 'name': 'leftmenu', '_cdate': datetime.datetime(2020, 11, 4, 19, 53, 56), 'title': 'Left Menu'}, {'_id': 5, 'name': 'todo', '_cdate': datetime.datetime(2020, 11, 7, 8, 49, 38), 'title': 'Todo'}, {'_id': 6, 'name': 'cron_publish', '_cdate': datetime.datetime(2020, 12, 2, 19, 30, 11), 'title': 'Run Publish reports'}, {'_id': 7, 'name': 'test', '_cdate': datetime.datetime(2020, 12, 2, 22, 32, 54), 'title': 'test'}, {'_id': 8, 'name': 'help', '_cdate': datetime.datetime(2020, 12, 5, 7, 12, 44), 'title': 'Help'}, {'_id': 9, 'name': 'api', '_cdate': datetime.datetime(2020, 12, 5, 21, 22, 13), 'title': 'API'}, {'_id': 10, 'name': 'ben', '_cdate': datetime.datetime(2021, 10, 4, 11, 37, 3), 'title': 'List of Reports'}]
for rec in data:
    # strftime is called on the datetime value itself, with the format as its argument
    rec['date_str'] = rec['_cdate'].strftime('%Y-%m-%d %H:%M:%S')
That would add 'date_str' field to every record with the format you require. Of course, you could also modify it to overwrite the original value.
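If the error about the dictionary changing came from adding keys while looping over a record's own keys, one way around it is to build new dictionaries instead of editing in place. This is a minimal sketch; the function name stringify_datetimes is made up, and the two records are copied from the question.

```python
import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def stringify_datetimes(records, fmt=FMT):
    """Return new dicts in which every datetime value is replaced by a string."""
    return [
        {key: value.strftime(fmt) if isinstance(value, datetime.datetime) else value
         for key, value in rec.items()}
        for rec in records
    ]

data = [
    {'_id': 1, 'name': 'index', '_cdate': datetime.datetime(2020, 10, 27, 9, 4, 34), 'title': 'DataExtract'},
    {'_id': 2, 'name': 'topmenu', '_cdate': datetime.datetime(2020, 11, 4, 19, 52, 17), 'title': 'topmenu'},
]

converted = stringify_datetimes(data)
print(converted[0]['_cdate'])
```

The original list is left untouched, so the datetime objects stay available for any later date arithmetic.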

How to Iterate through an array of dictionaries to copy only relevant keys to new dictionary?

I want to iterate through a dictionary array like the following to only copy the 'symbol' and 'product_progress' keys and their corresponding values to new dictionary array.
[{'coin_name': 'Bitcoin', 'coin_id': 'bitcoin', 'symbol': 'btc', 'rank': 1, 'product_progress': 93, 'team': 100, 'token_fundamentals': 100, 'github_activity': 95, 'marketing': 5, 'partnership': 5, 'uniqueness': 5, 'total_score': 96, 'exchange_name': 'Bitfinex', 'exchange_link': 'https://www.bitfinex.com/t/BTCUSD', 'website': 'https://bitcoin.org/en/', 'twitter': 'https://twitter.com/Bitcoin', 'telegram': None, 'whitepaper': 'https://bitcoin.org/en/bitcoin-paper'}, {'coin_name': 'Ethereum', 'coin_id': 'ethereum', 'symbol': 'eth', 'rank': 2, 'product_progress': 87, 'team': 98, 'token_fundamentals': 97, 'github_activity': 100, 'marketing': 5, 'partnership': 5, 'uniqueness': 5, 'total_score': 94, 'exchange_name': 'Gemini', 'exchange_link': 'https://gemini.com/', 'website': 'https://www.ethereum.org/', 'twitter': 'https://twitter.com/ethereum', 'telegram': None, 'whitepaper': 'https://ethereum.org/en/whitepaper/'}] ...
The code I have so far is:
# need to iterate through list of dictionaries
for index in range(len(projectlist3)):
    for key in projectlist3[index]:
        d['symbol'] = projectlist3[index]['symbol']
        d['token_fundamentals'] = projectlist3[index]['token_fundamentals']
print(d)
It's just saving the last entry rather than all of the entries {'symbol': 'eth', 'token_fundamentals': 97}
Given your data:
l = [{
'coin_name': 'Bitcoin',
'coin_id': 'bitcoin',
'symbol': 'btc',
'rank': 1,
'product_progress': 93,
'team': 100,
'token_fundamentals': 100,
'github_activity': 95,
'marketing': 5,
'partnership': 5,
'uniqueness': 5,
'total_score': 96,
'exchange_name': 'Bitfinex',
'exchange_link': 'https://www.bitfinex.com/t/BTCUSD',
'website': 'https://bitcoin.org/en/',
'twitter': 'https://twitter.com/Bitcoin',
'telegram': None,
'whitepaper': 'https://bitcoin.org/en/bitcoin-paper'
}, {
'coin_name': 'Ethereum',
'coin_id': 'ethereum',
'symbol': 'eth',
'rank': 2,
'product_progress': 87,
'team': 98,
'token_fundamentals': 97,
'github_activity': 100,
'marketing': 5,
'partnership': 5,
'uniqueness': 5,
'total_score': 94,
'exchange_name': 'Gemini',
'exchange_link': 'https://gemini.com/',
'website': 'https://www.ethereum.org/',
'twitter': 'https://twitter.com/ethereum',
'telegram': None,
'whitepaper': 'https://ethereum.org/en/whitepaper/'
}]
You can use a list comprehension:
new_l = [{field: d[field] for field in ['symbol', 'token_fundamentals']}
         for d in l]
which is a more concise equivalent of this:
new_l = []
for d in l:
    new_d = {}
    for field in ['symbol', 'token_fundamentals']:
        new_d[field] = d[field]
    new_l.append(new_d)
Judging by what you're writing into d, you want to save a list of objects, so this would work:
[{"symbol": i['symbol'], "token_fundamentals": i['token_fundamentals']} for i in projectlist3]
Result:
[{'symbol': 'btc', 'token_fundamentals': 100}, {'symbol': 'eth', 'token_fundamentals': 97}]
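Both answers above raise a KeyError if one of the records lacks a requested key. A small variation that tolerates missing keys might look like the sketch below; the helper name pick and the shortened sample records are made up for illustration.

```python
def pick(records, fields, default=None):
    """Copy only the given fields from each record, using default for missing keys."""
    return [{field: rec.get(field, default) for field in fields} for rec in records]

projectlist3 = [
    {'coin_name': 'Bitcoin', 'symbol': 'btc', 'product_progress': 93, 'token_fundamentals': 100},
    # 'token_fundamentals' is deliberately missing here to show the fallback
    {'coin_name': 'Ethereum', 'symbol': 'eth', 'product_progress': 87},
]

result = pick(projectlist3, ['symbol', 'token_fundamentals'])
print(result)
```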

Pandas ticker to ohlc

rows is a list of dicts from MySQL.
Example rows:
[{'date': datetime.datetime(2017, 3, 21, 13, 27, 20), 'tid': 648605515L, 'price': Decimal('1080.04000000'), 'type': 1, 'amount': Decimal('10.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 20), 'tid': 648605549L, 'price': Decimal('1081.55000000'), 'type': 1, 'amount': Decimal('16.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 20), 'tid': 648605547L, 'price': Decimal('1081.33000000'), 'type': 1, 'amount': Decimal('20.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 20), 'tid': 648605545L, 'price': Decimal('1081.30000000'), 'type': 1, 'amount': Decimal('16.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 20), 'tid': 648605543L, 'price': Decimal('1081.29000000'), 'type': 1, 'amount': Decimal('20.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 20), 'tid': 648605541L, 'price': Decimal('1080.46000000'), 'type': 1, 'amount': Decimal('26.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 20), 'tid': 648605517L, 'price': Decimal('1080.04000000'), 'type': 1, 'amount': Decimal('8.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 22), 'tid': 648605601L, 'price': Decimal('1079.69000000'), 'type': -1, 'amount': Decimal('70.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 25), 'tid': 648605686L, 'price': Decimal('1079.72000000'), 'type': -1, 'amount': Decimal('4.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 26), 'tid': 648605765L, 'price': Decimal('1079.45000000'), 'type': 1, 'amount': Decimal('6.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 26), 'tid': 648605753L, 'price': Decimal('1079.60000000'), 'type': -1, 'amount': Decimal('106.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 26), 'tid': 648605751L, 'price': Decimal('1079.60000000'), 'type': -1, 'amount': Decimal('80.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 26), 'tid': 648605749L, 'price': Decimal('1079.67000000'), 'type': -1, 'amount': Decimal('430.00000000')}, 
{'date': datetime.datetime(2017, 3, 21, 13, 27, 26), 'tid': 648605747L, 'price': Decimal('1079.70000000'), 'type': -1, 'amount': Decimal('66.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 26), 'tid': 648605745L, 'price': Decimal('1079.74000000'), 'type': -1, 'amount': Decimal('12.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 27), 'tid': 648605785L, 'price': Decimal('1079.45000000'), 'type': 1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 27), 'tid': 648605774L, 'price': Decimal('1079.45000000'), 'type': 1, 'amount': Decimal('6.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 27), 'tid': 648605771L, 'price': Decimal('1079.45000000'), 'type': 1, 'amount': Decimal('14.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 28), 'tid': 648605827L, 'price': Decimal('1079.45000000'), 'type': 1, 'amount': Decimal('42.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 28), 'tid': 648605842L, 'price': Decimal('1079.45000000'), 'type': 1, 'amount': Decimal('10.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 32), 'tid': 648605973L, 'price': Decimal('1079.45000000'), 'type': 1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 37), 'tid': 648606114L, 'price': Decimal('1079.44000000'), 'type': 1, 'amount': Decimal('24.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 37), 'tid': 648606116L, 'price': Decimal('1079.45000000'), 'type': 1, 'amount': Decimal('40.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 42), 'tid': 648606258L, 'price': Decimal('1079.45000000'), 'type': 1, 'amount': Decimal('56.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 45), 'tid': 648606345L, 'price': Decimal('1079.46000000'), 'type': -1, 'amount': Decimal('10.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 46), 'tid': 648606392L, 'price': Decimal('1079.69000000'), 'type': 1, 'amount': Decimal('44.00000000')}, {'date': 
datetime.datetime(2017, 3, 21, 13, 27, 48), 'tid': 648606418L, 'price': Decimal('1079.60000000'), 'type': -1, 'amount': Decimal('40.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 48), 'tid': 648606420L, 'price': Decimal('1079.46000000'), 'type': -1, 'amount': Decimal('36.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 48), 'tid': 648606422L, 'price': Decimal('1079.46000000'), 'type': -1, 'amount': Decimal('94.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 50), 'tid': 648606499L, 'price': Decimal('1079.31000000'), 'type': 1, 'amount': Decimal('80.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 50), 'tid': 648606478L, 'price': Decimal('1079.31000000'), 'type': -1, 'amount': Decimal('6.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 50), 'tid': 648606476L, 'price': Decimal('1079.31000000'), 'type': -1, 'amount': Decimal('34.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 50), 'tid': 648606474L, 'price': Decimal('1079.55000000'), 'type': -1, 'amount': Decimal('8.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 55), 'tid': 648606666L, 'price': Decimal('1079.31000000'), 'type': 1, 'amount': Decimal('44.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 55), 'tid': 648606650L, 'price': Decimal('1079.17000000'), 'type': 1, 'amount': Decimal('8.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 55), 'tid': 648606648L, 'price': Decimal('1079.17000000'), 'type': 1, 'amount': Decimal('8.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 1), 'tid': 648606820L, 'price': Decimal('1079.03000000'), 'type': -1, 'amount': Decimal('28.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 2), 'tid': 648606825L, 'price': Decimal('1079.03000000'), 'type': 1, 'amount': Decimal('30.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 2), 'tid': 648606836L, 'price': Decimal('1079.02000000'), 'type': -1, 'amount': Decimal('22.00000000')}, {'date': 
datetime.datetime(2017, 3, 21, 13, 28, 5), 'tid': 648606945L, 'price': Decimal('1078.58000000'), 'type': -1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 5), 'tid': 648606943L, 'price': Decimal('1078.61000000'), 'type': -1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 5), 'tid': 648606941L, 'price': Decimal('1078.63000000'), 'type': -1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 5), 'tid': 648606939L, 'price': Decimal('1078.88000000'), 'type': -1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 5), 'tid': 648606926L, 'price': Decimal('1078.88000000'), 'type': -1, 'amount': Decimal('428.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 6), 'tid': 648606984L, 'price': Decimal('1078.58000000'), 'type': 1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 6), 'tid': 648606982L, 'price': Decimal('1078.05000000'), 'type': -1, 'amount': Decimal('10.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 6), 'tid': 648606971L, 'price': Decimal('1078.58000000'), 'type': 1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 6), 'tid': 648606957L, 'price': Decimal('1078.05000000'), 'type': -1, 'amount': Decimal('74.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 6), 'tid': 648606955L, 'price': Decimal('1078.15000000'), 'type': -1, 'amount': Decimal('6.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 6), 'tid': 648606953L, 'price': Decimal('1078.15000000'), 'type': -1, 'amount': Decimal('14.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 6), 'tid': 648606951L, 'price': Decimal('1078.42000000'), 'type': -1, 'amount': Decimal('16.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 7), 'tid': 648606992L, 'price': Decimal('1078.05000000'), 'type': -1, 'amount': Decimal('2.00000000')}, {'date': 
datetime.datetime(2017, 3, 21, 13, 28, 7), 'tid': 648606995L, 'price': Decimal('1078.58000000'), 'type': 1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 7), 'tid': 648607023L, 'price': Decimal('1078.06000000'), 'type': -1, 'amount': Decimal('4.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 8), 'tid': 648607047L, 'price': Decimal('1078.86000000'), 'type': 1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 10), 'tid': 648607113L, 'price': Decimal('1078.06000000'), 'type': -1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 10), 'tid': 648607115L, 'price': Decimal('1078.03000000'), 'type': -1, 'amount': Decimal('148.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 12), 'tid': 648607192L, 'price': Decimal('1079.00000000'), 'type': -1, 'amount': Decimal('10.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 13), 'tid': 648607218L, 'price': Decimal('1078.99000000'), 'type': 1, 'amount': Decimal('98.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 13), 'tid': 648607220L, 'price': Decimal('1079.00000000'), 'type': 1, 'amount': Decimal('42.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 13), 'tid': 648607222L, 'price': Decimal('1079.03000000'), 'type': 1, 'amount': Decimal('342.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 13), 'tid': 648607224L, 'price': Decimal('1079.16000000'), 'type': 1, 'amount': Decimal('512.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 14), 'tid': 648607250L, 'price': Decimal('1078.98000000'), 'type': 1, 'amount': Decimal('44.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 14), 'tid': 648607252L, 'price': Decimal('1078.98000000'), 'type': 1, 'amount': Decimal('12.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 14), 'tid': 648607254L, 'price': Decimal('1079.00000000'), 'type': 1, 'amount': Decimal('106.00000000')}, {'date': 
datetime.datetime(2017, 3, 21, 13, 28, 14), 'tid': 648607256L, 'price': Decimal('1079.16000000'), 'type': 1, 'amount': Decimal('40.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 20), 'tid': 648607431L, 'price': Decimal('1079.16000000'), 'type': 1, 'amount': Decimal('28.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 20), 'tid': 648607429L, 'price': Decimal('1079.01000000'), 'type': 1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 20), 'tid': 648607427L, 'price': Decimal('1079.01000000'), 'type': 1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 23), 'tid': 648607518L, 'price': Decimal('1079.16000000'), 'type': 1, 'amount': Decimal('8.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 24), 'tid': 648607544L, 'price': Decimal('1079.16000000'), 'type': 1, 'amount': Decimal('344.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 25), 'tid': 648607593L, 'price': Decimal('1078.79000000'), 'type': -1, 'amount': Decimal('6.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 26), 'tid': 648607631L, 'price': Decimal('1079.16000000'), 'type': 1, 'amount': Decimal('430.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 26), 'tid': 648607623L, 'price': Decimal('1079.16000000'), 'type': 1, 'amount': Decimal('18.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 26), 'tid': 648607621L, 'price': Decimal('1078.79000000'), 'type': 1, 'amount': Decimal('14.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 29), 'tid': 648607695L, 'price': Decimal('1079.16000000'), 'type': 1, 'amount': Decimal('776.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 32), 'tid': 648607803L, 'price': Decimal('1079.16000000'), 'type': 1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 32), 'tid': 648607805L, 'price': Decimal('1079.16000000'), 'type': 1, 'amount': Decimal('10.00000000')}, {'date': 
datetime.datetime(2017, 3, 21, 13, 28, 36), 'tid': 648607905L, 'price': Decimal('1079.16000000'), 'type': 1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 37), 'tid': 648607940L, 'price': Decimal('1079.31000000'), 'type': 1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 42), 'tid': 648608110L, 'price': Decimal('1079.46000000'), 'type': -1, 'amount': Decimal('12.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 46), 'tid': 648608211L, 'price': Decimal('1079.88000000'), 'type': -1, 'amount': Decimal('12.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 46), 'tid': 648608213L, 'price': Decimal('1079.88000000'), 'type': -1, 'amount': Decimal('6.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 57), 'tid': 648608534L, 'price': Decimal('1080.29000000'), 'type': 1, 'amount': Decimal('14.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 57), 'tid': 648608536L, 'price': Decimal('1080.30000000'), 'type': 1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 29, 2), 'tid': 648608683L, 'price': Decimal('1080.59000000'), 'type': 1, 'amount': Decimal('40.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 29, 3), 'tid': 648608733L, 'price': Decimal('1080.59000000'), 'type': 1, 'amount': Decimal('360.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 29, 7), 'tid': 648608838L, 'price': Decimal('1080.90000000'), 'type': 1, 'amount': Decimal('82.00000000')}]
If I don't use set_index, I get a TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'
if rows:
    df = pd.DataFrame(rows)
    print df.head()
    # TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'
    df = df.set_index("date")
    print df.head()
    resample_data = df.resample("1min", how={"price": "ohlc", "amount": "sum"})
    print resample_data
Result :
Connected to pydev debugger (build 162.1967.10)
amount date price tid type
0 2.00000000 2017-03-21 11:15:12 1075.83000000 648370156 -1
1 10.00000000 2017-03-21 11:15:15 1076.00000000 648370241 -1
2 10.00000000 2017-03-21 11:15:17 1075.83000000 648370297 -1
3 10.00000000 2017-03-21 11:15:17 1075.83000000 648370311 1
4 8.00000000 2017-03-21 11:15:19 1076.13000000 648370370 1
amount price tid type
date
2017-03-21 11:15:12 2.00000000 1075.83000000 648370156 -1
2017-03-21 11:15:15 10.00000000 1076.00000000 648370241 -1
2017-03-21 11:15:17 10.00000000 1075.83000000 648370297 -1
2017-03-21 11:15:17 10.00000000 1075.83000000 648370311 1
2017-03-21 11:15:19 8.00000000 1076.13000000 648370370 1
/Users/wyx/bitcoin_workspace/fibo-strategy/ticker.py:45: FutureWarning: how in .resample() is deprecated
the new syntax is .resample(...)..apply(<func>)
resample_data = df.resample("1min", how={"price": "ohlc", "amount": "sum"})
Traceback (most recent call last):
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1580, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 964, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/Users/wyx/bitcoin_workspace/fibo-strategy/ticker.py", line 45, in <module>
resample_data = df.resample("1min", how={"price": "ohlc", "amount": "sum"})
File "/Users/wyx/bitcoin_workspace/fibo-strategy/.env/lib/python2.7/site-packages/pandas/core/generic.py", line 4216, in resample
limit=limit)
File "/Users/wyx/bitcoin_workspace/fibo-strategy/.env/lib/python2.7/site-packages/pandas/tseries/resample.py", line 582, in _maybe_process_deprecations
r = r.aggregate(how)
File "/Users/wyx/bitcoin_workspace/fibo-strategy/.env/lib/python2.7/site-packages/pandas/tseries/resample.py", line 320, in aggregate
result, how = self._aggregate(arg, *args, **kwargs)
File "/Users/wyx/bitcoin_workspace/fibo-strategy/.env/lib/python2.7/site-packages/pandas/core/base.py", line 549, in _aggregate
result = _agg(arg, _agg_1dim)
File "/Users/wyx/bitcoin_workspace/fibo-strategy/.env/lib/python2.7/site-packages/pandas/core/base.py", line 500, in _agg
result[fname] = func(fname, agg_how)
File "/Users/wyx/bitcoin_workspace/fibo-strategy/.env/lib/python2.7/site-packages/pandas/core/base.py", line 483, in _agg_1dim
return colg.aggregate(how, _level=(_level or 0) + 1)
File "/Users/wyx/bitcoin_workspace/fibo-strategy/.env/lib/python2.7/site-packages/pandas/core/groupby.py", line 2652, in aggregate
return getattr(self, func_or_funcs)(*args, **kwargs)
File "/Users/wyx/bitcoin_workspace/fibo-strategy/.env/lib/python2.7/site-packages/pandas/core/groupby.py", line 1128, in ohlc
lambda x: x._cython_agg_general('ohlc'))
File "/Users/wyx/bitcoin_workspace/fibo-strategy/.env/lib/python2.7/site-packages/pandas/core/groupby.py", line 3103, in _apply_to_column_groupbys
return func(self)
File "/Users/wyx/bitcoin_workspace/fibo-strategy/.env/lib/python2.7/site-packages/pandas/core/groupby.py", line 1128, in <lambda>
lambda x: x._cython_agg_general('ohlc'))
File "/Users/wyx/bitcoin_workspace/fibo-strategy/.env/lib/python2.7/site-packages/pandas/core/groupby.py", line 808, in _cython_agg_general
raise DataError('No numeric types to aggregate')
pandas.core.base.DataError: No numeric types to aggregate
Process finished with exit code 1
I am new to pandas.
How do I solve this error?
And if I want to use the last close price to fill the NaN values of the next minute's OHLC, how can I do that?
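You need to set an index using your dates. The price and amount columns also need a numeric dtype: Decimal values coming from MySQL are stored as object dtype, which is what triggers "No numeric types to aggregate".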
Code:
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO(
u"""amount date price tid type
6.00000000 2017-03-21t10:46:32 1059.26000000 648313975 -1
4.00000000 2017-03-21t10:46:37 1059.42000000 648314094 -1
2.00000000 2017-03-21t10:46:37 1059.42000000 648314096 -1
2.00000000 2017-03-21t10:46:41 1059.26000000 648314176 -1
32.00000000 2017-03-21t10:46:41 1059.26000000 648314189 -1
"""), sep=r'\s+', parse_dates=['date'])
print(df)
# The how= argument of resample() is deprecated; call .agg() on the resampler instead
resample_data = df.set_index('date').resample("1min").agg(
    {"price": "ohlc", "amount": "sum"})
print(resample_data)
Results:
amount date price tid type
0 6.0 2017-03-21 10:46:32 1059.26 648313975 -1
1 4.0 2017-03-21 10:46:37 1059.42 648314094 -1
2 2.0 2017-03-21 10:46:37 1059.42 648314096 -1
3 2.0 2017-03-21 10:46:41 1059.26 648314176 -1
4 32.0 2017-03-21 10:46:41 1059.26 648314189 -1
price amount
open high low close amount
date
2017-03-21 10:46:00 1059.26 1059.42 1059.26 1059.26 46.0
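The second part of the question, filling minutes without trades from the last close, is not covered above. The following is a minimal sketch of one way to do it, starting from rows shaped like the MySQL result. The three sample rows are shortened stand-ins, and price and amount are resampled separately here only to keep the column layout flat.

```python
import datetime
from decimal import Decimal

import pandas as pd

rows = [
    {"date": datetime.datetime(2017, 3, 21, 13, 27, 20), "price": Decimal("1080.04"), "amount": Decimal("10")},
    {"date": datetime.datetime(2017, 3, 21, 13, 27, 50), "price": Decimal("1079.31"), "amount": Decimal("6")},
    # No trades during 13:28, so that minute starts out with NaN prices.
    {"date": datetime.datetime(2017, 3, 21, 13, 29, 2), "price": Decimal("1080.59"), "amount": Decimal("40")},
]

df = pd.DataFrame(rows).set_index("date")

# Decimal values end up as object dtype; convert them so they can be aggregated.
df["price"] = df["price"].astype(float)
df["amount"] = df["amount"].astype(float)

ohlc = df["price"].resample("1min").ohlc()
volume = df["amount"].resample("1min").sum()

# Carry the last known close forward, then reuse it for the empty bars.
ohlc["close"] = ohlc["close"].ffill()
for col in ["open", "high", "low"]:
    ohlc[col] = ohlc[col].fillna(ohlc["close"])

bars = ohlc.assign(amount=volume)
print(bars)
```

Converting to float trades the exactness of Decimal for speed and compatibility, which may or may not be acceptable depending on how the prices are used afterwards.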

How to extract raw data from Salesforce using Beatbox python API

I am using the following code to extract data from Salesforce using beatbox python API.
import beatbox
sf_username = "xyz#salesforce.com"
sf_password = "123"
sf_api_token = "ABC"
def extract():
    sf_client = beatbox.PythonClient()
    password = str("%s%s" % (sf_password, sf_api_token))
    sf_client.login(sf_username, password)
    lead_qry = "SELECT CountryIsoCode__c,LastModifiedDate FROM Country limit 10"
    records = sf_client.query(lead_qry)
    output = open('output','w')
    for record in records:
        output.write('\t'.join(record.values()))
    output.close()
if __name__ == '__main__':
    extract()
But this is what I get in the output. How do I get the raw data, just the values I see in the Workbench? I don't want to parse each datatype to get the raw value.
Actual Output:
[{'LastModifiedDate': datetime.datetime(2012, 11, 2, 9, 32, 4),
'CountryIsoCode_c': 'AU', 'type': 'Country_c', 'Id': ''},
{'LastModifiedDate': datetime.datetime(2012, 8, 18, 14, 0, 21),
'CountryIsoCode_c': 'LX', 'type': 'Country_c', 'Id': ''},
{'LastModifiedDate': datetime.datetime(2012, 11, 12, 15, 20, 11),
'CountryIsoCode_c': 'AE', 'type': 'Country_c', 'Id': ''},
{'LastModifiedDate': datetime.datetime(2012, 11, 12, 15, 20, 29),
'CountryIsoCode_c': 'AR', 'type': 'Country_c', 'Id': ''},
{'LastModifiedDate': datetime.datetime(2012, 11, 2, 9, 32, 4),
'CountryIsoCode_c': 'AT', 'type': 'Country_c', 'Id': ''},
{'LastModifiedDate': datetime.datetime(2012, 11, 2, 9, 32, 4),
'CountryIsoCode_c': 'BE', 'type': 'Country_c', 'Id': ''},
{'LastModifiedDate': datetime.datetime(2012, 11, 12, 15, 21, 28),
'CountryIsoCode_c': 'BR', 'type': 'Country_c', 'Id': ''},
{'LastModifiedDate': datetime.datetime(2012, 11, 12, 15, 21, 42),
'CountryIsoCode_c': 'CA', 'type': 'Country_c', 'Id': ''},
{'LastModifiedDate': datetime.datetime(2012, 11, 12, 15, 36, 18),
'CountryIsoCode_c': 'CH', 'type': 'Country_c', 'Id': ''},
{'LastModifiedDate': datetime.datetime(2012, 11, 12, 15, 35, 8),
'CountryIsoCode_c': 'CL', 'type': 'Country_c', 'Id': ''}]
Expected Output:
AU 2012-11-02T09:32:04Z
LX 2012-08-18T14:00:21Z
If you work with tabular data, you should use the pandas library.
Here is an example:
import pandas as pd
from datetime import datetime
import beatbox
service = beatbox.PythonClient()
service.login('login_here', 'creds_here')
query_result = service.query("SELECT Name, Country, CreatedDate FROM Lead limit 5") # CreatedDate is a datetime object
records = query_result['records'] # records is a list of dictionaries
# records is a list of dictionaries, as you mentioned before
df = pd.DataFrame(records)
print (df)
Country CreatedDate Id Name type
0 United States 2011-05-26 23:39:58 qwe qwe Lead
1 France 2011-09-01 08:45:26 qwe qwe Lead
2 France 2011-09-01 08:37:36 qwe qwe Lead
3 France 2011-09-01 08:46:38 qwe qwe Lead
4 France 2011-09-01 08:46:57 qwe qwe Lead
Now you have a table-style DataFrame object. You can index multiple columns and rows:
df['CreatedDate']
0 2011-05-26 23:39:58
1 2011-09-01 08:45:26
2 2011-09-01 08:37:36
3 2011-09-01 08:46:38
4 2011-09-01 08:46:57
Here is more about pandas time series functionality: http://pandas.pydata.org/pandas-docs/stable/timeseries.html
And here is how to install pandas: http://pandas.pydata.org/pandas-docs/stable/install.html
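To get from such a DataFrame to the tab-separated lines in the expected output, DataFrame.to_csv with a date_format may be enough. This is a minimal sketch using two records copied from the question's actual output in place of a live query; treating the timestamps as UTC (the trailing Z) is an assumption.

```python
import datetime

import pandas as pd

# Two records copied from the question, standing in for the query result.
records = [
    {"LastModifiedDate": datetime.datetime(2012, 11, 2, 9, 32, 4),
     "CountryIsoCode_c": "AU", "type": "Country_c", "Id": ""},
    {"LastModifiedDate": datetime.datetime(2012, 8, 18, 14, 0, 21),
     "CountryIsoCode_c": "LX", "type": "Country_c", "Id": ""},
]

df = pd.DataFrame(records)

# With no path argument, to_csv returns the text instead of writing a file.
text = df[["CountryIsoCode_c", "LastModifiedDate"]].to_csv(
    sep="\t", header=False, index=False, date_format="%Y-%m-%dT%H:%M:%SZ")
print(text)
```

Passing a file name as the first argument to to_csv writes the same lines to disk instead of returning them.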
