How to extract raw data from Salesforce using the Beatbox Python API

I am using the following code to extract data from Salesforce using beatbox python API.
import beatbox
sf_username = "xyz#salesforce.com"
sf_password = "123"
sf_api_token = "ABC"
def extract():
    sf_client = beatbox.PythonClient()
    password = str("%s%s" % (sf_password, sf_api_token))
    sf_client.login(sf_username, password)
    lead_qry = "SELECT CountryIsoCode__c,LastModifiedDate FROM Country limit 10"
    records = sf_client.query(lead_qry)
    output = open('output','w')
    for record in records:
        output.write('\t'.join(record.values()))
    output.close()
if __name__ == '__main__':
    extract()
But this is what I get in the output. How can I get the raw data, just the values I see in the Workbench? I don't want to parse each datatype to get the raw value.
Actual Output:
[{'LastModifiedDate': datetime.datetime(2012, 11, 2, 9, 32, 4),
'CountryIsoCode_c': 'AU', 'type': 'Country_c', 'Id': ''},
{'LastModifiedDate': datetime.datetime(2012, 8, 18, 14, 0, 21),
'CountryIsoCode_c': 'LX', 'type': 'Country_c', 'Id': ''},
{'LastModifiedDate': datetime.datetime(2012, 11, 12, 15, 20, 11),
'CountryIsoCode_c': 'AE', 'type': 'Country_c', 'Id': ''},
{'LastModifiedDate': datetime.datetime(2012, 11, 12, 15, 20, 29),
'CountryIsoCode_c': 'AR', 'type': 'Country_c', 'Id': ''},
{'LastModifiedDate': datetime.datetime(2012, 11, 2, 9, 32, 4),
'CountryIsoCode_c': 'AT', 'type': 'Country_c', 'Id': ''},
{'LastModifiedDate': datetime.datetime(2012, 11, 2, 9, 32, 4),
'CountryIsoCode_c': 'BE', 'type': 'Country_c', 'Id': ''},
{'LastModifiedDate': datetime.datetime(2012, 11, 12, 15, 21, 28),
'CountryIsoCode_c': 'BR', 'type': 'Country_c', 'Id': ''},
{'LastModifiedDate': datetime.datetime(2012, 11, 12, 15, 21, 42),
'CountryIsoCode_c': 'CA', 'type': 'Country_c', 'Id': ''},
{'LastModifiedDate': datetime.datetime(2012, 11, 12, 15, 36, 18),
'CountryIsoCode_c': 'CH', 'type': 'Country_c', 'Id': ''},
{'LastModifiedDate': datetime.datetime(2012, 11, 12, 15, 35, 8),
'CountryIsoCode_c': 'CL', 'type': 'Country_c', 'Id': ''}]
Expected Output:
AU 2012-11-02T09:32:04Z
LX 2012-08-18T14:00:21Z

If you work with tabular data, you should use the pandas library.
Here is an example:
import pandas as pd
from datetime import datetime
import beatbox
service = beatbox.PythonClient()
service.login('login_here', 'creds_here')
query_result = service.query("SELECT Name, Country, CreatedDate FROM Lead limit 5") # CreatedDate is a datetime object
records = query_result['records'] # records is a list of dictionaries, as you mentioned
df = pd.DataFrame(records)
print(df)
Country CreatedDate Id Name type
0 United States 2011-05-26 23:39:58 qwe qwe Lead
1 France 2011-09-01 08:45:26 qwe qwe Lead
2 France 2011-09-01 08:37:36 qwe qwe Lead
3 France 2011-09-01 08:46:38 qwe qwe Lead
4 France 2011-09-01 08:46:57 qwe qwe Lead
Now you have a table-style DataFrame object. You can index columns and rows:
df['CreatedDate']
0 2011-05-26 23:39:58
1 2011-09-01 08:45:26
2 2011-09-01 08:37:36
3 2011-09-01 08:46:38
4 2011-09-01 08:46:57
Here is more about pandas time series functionality: http://pandas.pydata.org/pandas-docs/stable/timeseries.html
And here is how to install pandas: http://pandas.pydata.org/pandas-docs/stable/install.html
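For the exact output the question asks for, plain Python may be enough. The sketch below assumes the records look like the dictionaries shown under "Actual Output"; the sample list is hypothetical stand-in data, not a real Beatbox response. It also assumes the timestamps are UTC, which is why a literal Z is appended.

```python
import datetime

# Hypothetical stand-in for the records returned by the query in the question.
records = [
    {'LastModifiedDate': datetime.datetime(2012, 11, 2, 9, 32, 4),
     'CountryIsoCode_c': 'AU', 'type': 'Country_c', 'Id': ''},
    {'LastModifiedDate': datetime.datetime(2012, 8, 18, 14, 0, 21),
     'CountryIsoCode_c': 'LX', 'type': 'Country_c', 'Id': ''},
]

# Columns to export, in the order they should appear on each line.
fields = ['CountryIsoCode_c', 'LastModifiedDate']


def to_text(value):
    """Render datetimes as ISO 8601 with a trailing Z; everything else via str()."""
    if isinstance(value, datetime.datetime):
        # Assumption: the datetime is naive and represents UTC.
        return value.strftime('%Y-%m-%dT%H:%M:%SZ')
    return str(value)


# One tab-separated line per record.
lines = ['\t'.join(to_text(record[name]) for name in fields) for record in records]
print('\n'.join(lines))
```

The same lines can be written to a file instead of printed. Selecting the fields by name also keeps the 'type' and 'Id' entries out of the output.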

Related

Unable to convert JSON to DataFrame

I am unable to convert JSON to a DataFrame; the TypeError below is raised.
The following data is created:
Test_data =
{'archived': False,
'archived_at': None,
'associations': None,
'created_at': datetime.datetime(2020, 10, 30, 8, 3, 54, 190000, tzinfo=tzlocal()),
'id': '12345',
'properties': {'createdate': '[![2020-10-30T08:03:54.190Z][1]][1]',
'email': 'testmail#gmail.com',
'firstname': 'TestFirst',
'lastname': 'TestLast'},
'properties_with_history': None,
'updated_at': datetime.datetime(2022, 11, 10, 6, 44, 14, 5000, tzinfo=tzlocal())}
data = json.loads(test_data)
TypeError: the JSON object must be str, bytes or bytearray, not SimplePublicObjectWithAssociations
The following has been tried:
s1 = json.dumps(test_data)
d2 = json.loads(s1)
TypeError: Object of type SimplePublicObjectWithAssociations is not JSON serializable
Preferred output:
Can you try this:
df=pd.json_normalize(Test_data)
print(df)
'''
archived archived_at associations created_at id properties_with_history updated_at properties.createdate properties.email properties.firstname
0 False None None 2020-10-30T08:03:54.190Z 12345 2022-11-10T06:44:14.500Z [![2020-10-30T08:03:54.190Z][1]][1] testmail#gmail.com TestFirst
'''
If you want specific columns:
df = df[['id','properties.createdate','properties.email','properties.firstname','properties.lastname']]
df.columns = df.columns.str.replace('properties.', '', regex=False)
df
id createdate email firstname lastname
0 12345 [![2020-10-30T08:03:54.190Z][1]][1] testmail#gmail.com TestFirst TestLast
If you want to convert the createdate column to datetime:
import datefinder
df['createdate']=df['createdate'].apply(lambda x: list(datefinder.find_dates(x))[0])
df
id createdate email firstname lastname
0 12345 2020-10-30 08:03:54.190000+00:00 testmail#gmail.com TestFirst TestLast
This is a partial solution. Selecting columns from the resulting DataFrame, or unpivoting it, may be a useful next step.
import pandas as pd
import datetime
import json
import jsonpickle
test_data ={'archived': False,
'archived_at': None,
'associations': None,
'created_at': datetime.datetime(2020, 10, 30, 8, 3, 54, 190000),
'id': '12345',
'properties': {'createdate': '[![2020-10-30T08:03:54.190Z][1]][1]',
'email': 'testmail#gmail.com',
'firstname': 'TestFirst',
'lastname': 'TestLast'},
'properties_with_history': None,
'updated_at': datetime.datetime(2022, 11, 10, 6, 44, 14, 5000)}
data = jsonpickle.encode(test_data, unpicklable=False)
pd.read_json(data)
I have tried melt and unstack, but I didn't reach your preferred output.
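What json_normalize does to the nested properties dictionary can be sketched in plain Python. The flatten helper below is hypothetical (it is not part of pandas or of the HubSpot client) and assumes the input is already a plain dict. If the API returns a model object instead, it would first need converting; generated client models often expose a to_dict() method, but that is an assumption to check against the client's documentation.

```python
import datetime


def flatten(nested, parent_key='', sep='.'):
    """Return a one-level dict whose keys are dotted paths into `nested`."""
    flat = {}
    for key, value in nested.items():
        path = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the path built so far.
            flat.update(flatten(value, path, sep))
        else:
            flat[path] = value
    return flat


# Trimmed copy of the data from the question.
test_data = {
    'archived': False,
    'created_at': datetime.datetime(2020, 10, 30, 8, 3, 54, 190000),
    'id': '12345',
    'properties': {'email': 'testmail#gmail.com',
                   'firstname': 'TestFirst',
                   'lastname': 'TestLast'},
}

row = flatten(test_data)
print(sorted(row))
```

A list of such rows can be passed to pd.DataFrame to get one column per dotted key.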

How to filter list of dictionaries in python?

I have a list of dictionaries as follows:
VehicleList = [
{
'id': '1',
'VehicleType': 'Car',
'CreationDate': datetime.datetime(2021, 12, 10, 16, 9, 44, 872000)
},
{
'id': '2',
'VehicleType': 'Bike',
'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)
},
{
'id': '3',
'VehicleType': 'Truck',
'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)
},
{
'id': '4',
'VehicleType': 'Bike',
'CreationDate': datetime.datetime(2021, 12, 10, 21, 1, 00, 300012)
},
{
'id': '5',
'VehicleType': 'Car',
'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)
}
]
How can I get a list of the latest vehicles for each 'VehicleType' based on their 'CreationDate'?
I expect something like this:
latestVehicles = [
{
'id': '5',
'VehicleType': 'Car',
'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)
},
{
'id': '2',
'VehicleType': 'Bike',
'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)
},
{
'id': '3',
'VehicleType': 'Truck',
'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)
}
]
I tried separating out each dictionary based on their 'VehicleType' into different lists and then picking up the latest one.
I believe there might be a more optimal way to do this.
Use a dictionary mapping each VehicleType value to the dictionary you want in your final list. Compare the date of each item in the input list with the one already in your dict, and keep the later one.
latest_dict = {}
for vehicle in VehicleList:
    t = vehicle['VehicleType']
    if t not in latest_dict or vehicle['CreationDate'] > latest_dict[t]['CreationDate']:
        latest_dict[t] = vehicle
latestVehicles = list(latest_dict.values())
Here is a solution using max and filter:
VehicleLatest = [
    max(
        filter(lambda _: _["VehicleType"] == t, VehicleList),
        key=lambda _: _["CreationDate"]
    ) for t in {_["VehicleType"] for _ in VehicleList}
]
Result
print(VehicleLatest)
# [{'id': '2', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)}, {'id': '3', 'VehicleType': 'Truck', 'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)}, {'id': '5', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)}]
I think you can achieve what you want using the groupby function from itertools.
from itertools import groupby
# entries sorted according to the key we wish to groupby: 'VehicleType'
VehicleList = sorted(VehicleList, key=lambda x: x["VehicleType"])
latestVehicles = []
# Then the elements are grouped.
for k, v in groupby(VehicleList, lambda x: x["VehicleType"]):
    # We then append to latestVehicles the 0th entry of the
    # grouped elements after sorting according to the 'CreationDate'
    latestVehicles.append(sorted(list(v), key=lambda x: x["CreationDate"], reverse=True)[0])
Sort by 'VehicleType' and 'CreationDate', then create a dictionary from 'VehicleType' and vehicle to get the latest vehicle for each type:
VehicleList.sort(key=lambda x: (x.get('VehicleType'), x.get('CreationDate')))
out = list(dict(zip([item.get('VehicleType') for item in VehicleList], VehicleList)).values())
Output:
[{'id': '2',
'VehicleType': 'Bike',
'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)},
{'id': '5',
'VehicleType': 'Car',
'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)},
{'id': '3',
'VehicleType': 'Truck',
'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)}]
This is very straightforward in pandas. First load the list of dicts as a pandas DataFrame, then sort the values by date in descending order, keep the first row for each VehicleType, and export to dict.
import pandas as pd
df = pd.DataFrame(VehicleList)
df.sort_values('CreationDate', ascending=False).drop_duplicates('VehicleType').to_dict(orient='records')
You can use the operator module to sort the list by type and date; the last entry of each type is then the latest one:
import operator
my_sorted_list_by_type_and_date = sorted(VehicleList, key=operator.itemgetter('VehicleType', 'CreationDate'))
A small plea for more readable code:
from operator import itemgetter
from itertools import groupby
vtkey = itemgetter('VehicleType')
cdkey = itemgetter('CreationDate')
latest = [
    # Get latest from each group.
    max(vs, key = cdkey)
    # Sort and group by VehicleType.
    for g, vs in groupby(sorted(VehicleList, key = vtkey), vtkey)
]
A variation on Blckknght's answer using defaultdict to avoid the long if condition:
from collections import defaultdict
import datetime
from operator import itemgetter
latest_dict = defaultdict(lambda: {'CreationDate': datetime.datetime.min})
for vehicle in VehicleList:
    t = vehicle['VehicleType']
    latest_dict[t] = max(vehicle, latest_dict[t], key=itemgetter('CreationDate'))
latestVehicles = list(latest_dict.values())
latestVehicles:
[{'id': '5', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)},
{'id': '2', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)},
{'id': '3', 'VehicleType': 'Truck', 'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)}]

How to change all datetime objects in a list to standard YYYY-MM-DD HH:MM:SS

When I query MySQL with Python and the query has datetime fields then I get this list as a result.
[{'_id': 1, 'name': 'index', '_cdate': datetime.datetime(2020, 10, 27, 9, 4, 34), 'title': 'DataExtract'}, {'_id': 2, 'name': 'topmenu', '_cdate': datetime.datetime(2020, 11, 4, 19, 52, 17), 'title': 'topmenu'}, {'_id': 3, 'name': 'functions_common', '_cdate': datetime.datetime(2020, 11, 4, 19, 52, 50), 'title': 'common functions'}, {'_id': 4, 'name': 'leftmenu', '_cdate': datetime.datetime(2020, 11, 4, 19, 53, 56), 'title': 'Left Menu'}, {'_id': 5, 'name': 'todo', '_cdate': datetime.datetime(2020, 11, 7, 8, 49, 38), 'title': 'Todo'}, {'_id': 6, 'name': 'cron_publish', '_cdate': datetime.datetime(2020, 12, 2, 19, 30, 11), 'title': 'Run Publish reports'}, {'_id': 7, 'name': 'test', '_cdate': datetime.datetime(2020, 12, 2, 22, 32, 54), 'title': 'test'}, {'_id': 8, 'name': 'help', '_cdate': datetime.datetime(2020, 12, 5, 7, 12, 44), 'title': 'Help'}, {'_id': 9, 'name': 'api', '_cdate': datetime.datetime(2020, 12, 5, 21, 22, 13), 'title': 'API'}, {'_id': 10, 'name': 'ben', '_cdate': datetime.datetime(2021, 10, 4, 11, 37, 3), 'title': 'List of Reports'}]
How do I either get the query to return the date fields in YYYY-MM-DD HH:MM:SS format, or convert them in the returned list? When I try to change them by enumerating over the results, Python throws an error that the dictionary has changed.
The datetime.datetime() objects you're getting are the standard representation of these values. If you were expecting strings instead, you could simply convert them with value.strftime('%Y-%m-%d %H:%M:%S'), but keep in mind that the datetime object is a more flexible way of keeping the data around. I'd recommend only formatting the date in a specific way if you're writing it to the screen or a file format that expects a string.
Example:
data = [{'_id': 1, 'name': 'index', '_cdate': datetime.datetime(2020, 10, 27, 9, 4, 34), 'title': 'DataExtract'}, {'_id': 2, 'name': 'topmenu', '_cdate': datetime.datetime(2020, 11, 4, 19, 52, 17), 'title': 'topmenu'}, {'_id': 3, 'name': 'functions_common', '_cdate': datetime.datetime(2020, 11, 4, 19, 52, 50), 'title': 'common functions'}, {'_id': 4, 'name': 'leftmenu', '_cdate': datetime.datetime(2020, 11, 4, 19, 53, 56), 'title': 'Left Menu'}, {'_id': 5, 'name': 'todo', '_cdate': datetime.datetime(2020, 11, 7, 8, 49, 38), 'title': 'Todo'}, {'_id': 6, 'name': 'cron_publish', '_cdate': datetime.datetime(2020, 12, 2, 19, 30, 11), 'title': 'Run Publish reports'}, {'_id': 7, 'name': 'test', '_cdate': datetime.datetime(2020, 12, 2, 22, 32, 54), 'title': 'test'}, {'_id': 8, 'name': 'help', '_cdate': datetime.datetime(2020, 12, 5, 7, 12, 44), 'title': 'Help'}, {'_id': 9, 'name': 'api', '_cdate': datetime.datetime(2020, 12, 5, 21, 22, 13), 'title': 'API'}, {'_id': 10, 'name': 'ben', '_cdate': datetime.datetime(2021, 10, 4, 11, 37, 3), 'title': 'List of Reports'}]
for rec in data:
    rec['date_str'] = rec['_cdate'].strftime('%Y-%m-%d %H:%M:%S')
That would add a 'date_str' field to every record with the format you require. Of course, you could also modify it to overwrite the original value.
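Overwriting the original values could look like the sketch below. It uses a shortened copy of the list from the question and converts every datetime value it finds, whatever the key. Iterating over a snapshot of the items, list(rec.items()), is a cautious choice so the dictionary is never modified while it is being iterated directly.

```python
import datetime

# Shortened copy of the query result from the question.
data = [
    {'_id': 1, 'name': 'index',
     '_cdate': datetime.datetime(2020, 10, 27, 9, 4, 34), 'title': 'DataExtract'},
    {'_id': 2, 'name': 'topmenu',
     '_cdate': datetime.datetime(2020, 11, 4, 19, 52, 17), 'title': 'topmenu'},
]

for rec in data:
    # Iterate over a snapshot so assigning into rec is safe.
    for key, value in list(rec.items()):
        if isinstance(value, datetime.datetime):
            rec[key] = value.strftime('%Y-%m-%d %H:%M:%S')

print(data[0]['_cdate'])
```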

Normalize JSON API data to columns

I'm trying to get data from our HubSpot CRM database and convert it to a dataframe using pandas. I'm still a beginner in Python, and I can't get json_normalize to work.
The output from the database is in JSON format like this:
{'archived': False,
'archived_at': None,
'associations': None,
'created_at': datetime.datetime(2019, 12, 21, 17, 56, 24, 739000, tzinfo=tzutc()),
'id': 'xxx',
'properties': {'createdate': '2019-12-21T17:56:24.739Z',
'email': 'xxxxx#xxxxx.com',
'firstname': 'John',
'hs_object_id': 'xxx',
'lastmodifieddate': '2020-04-22T04:37:40.274Z',
'lastname': 'Hansen'},
'updated_at': datetime.datetime(2020, 4, 22, 4, 37, 40, 274000, tzinfo=tzutc())}, {'archived': False,
'archived_at': None,
'associations': None,
'created_at': datetime.datetime(2019, 12, 21, 17, 52, 38, 485000, tzinfo=tzutc()),
'id': 'bbb',
'properties': {'createdate': '2019-12-21T17:52:38.485Z',
'email': 'bbb#bbb.dk',
'firstname': 'John2',
'hs_object_id': 'bbb',
'lastmodifieddate': '2020-05-19T07:18:28.384Z',
'lastname': 'Hansen2'},
'updated_at': datetime.datetime(2020, 5, 19, 7, 18, 28, 384000, tzinfo=tzutc())}, {'archived': False,
'archived_at': None,
'associations': None,
etc.
Trying to put it into a dataframe using this code:
import hubspot
import pandas as pd
import json
from pandas.io.json import json_normalize
import os
client = hubspot.Client.create(api_key='################')
all_contacts = contacts_client = client.crm.contacts.get_all()
df=pd.io.json.json_normalize(all_contacts,'properties')
df.head
df.to_csv ('All contacts.csv')
But I keep getting an error that I can't resolve.
I have also tried
pd.DataFrame(all_contacts)
and
pd.DataFrame.from_dict(all_contacts)
The all_contacts variable is a list of dictionary-like elements. To create the dataframe, I have used a list comprehension to build a list that contains only the 'properties' of each dictionary-like element.
import datetime
import pandas as pd
from dateutil.tz import tzutc
data = ({'archived': False,
'archived_at': None,
'associations': None,
'created_at': datetime.datetime(2019, 12, 21, 17, 56, 24, 739000, tzinfo=tzutc()),
'id': 'xxx',
'properties': {'createdate': '2019-12-21T17:56:24.739Z',
'email': 'xxxxx#xxxxx.com',
'firstname': 'John',
'hs_object_id': 'xxx',
'lastmodifieddate': '2020-04-22T04:37:40.274Z',
'lastname': 'Hansen'},
'updated_at': datetime.datetime(2020, 4, 22, 4, 37, 40, 274000, tzinfo=tzutc())},
{'archived': False,
'archived_at': None,
'associations': None,
'created_at': datetime.datetime(2019, 12, 21, 17, 52, 38, 485000, tzinfo=tzutc()),
'id': 'bbb',
'properties': {
'createdate': '2019-12-21T17:52:38.485Z',
'email': 'bbb#bbb.dk',
'firstname': 'John2',
'hs_object_id': 'bbb',
'lastmodifieddate': '2020-05-19T07:18:28.384Z',
'lastname': 'Hansen2'},
'updated_at': datetime.datetime(2020, 5, 19, 7, 18, 28, 384000, tzinfo=tzutc())})
df = pd.DataFrame([row['properties'] for row in data])
print(df)
OUTPUT:
createdate email ... lastmodifieddate lastname
0 2019-12-21T17:56:24.739Z xxxxx#xxxxx.com ... 2020-04-22T04:37:40.274Z Hansen
1 2019-12-21T17:52:38.485Z bbb#bbb.dk ... 2020-05-19T07:18:28.384Z Hansen2
[2 rows x 6 columns]

How to write list of multilevel dictionaries into csv

I have a list of dictionaries stored in tweets, and I am trying to write these into a csv file using writerows method.
Sample List looks something like this:
[{'sentiment': 'Unknown', 'date': datetime.datetime(2013, 1, 1, 5, 31, 32), 'body': 'mcd brk b'},
{'sentiment': 'Unknown', 'date': datetime.datetime(2013, 1, 1, 6, 55, 23), 'body': 'co hihq'},
{'sentiment': {'basic': 'Bullish'}, 'date': datetime.datetime(2013, 1, 1, 7, 36, 32), 'body': 'new year bac'}]
Here the sentiment key has either one level or two. I am trying to write these dictionaries to a CSV file such that only the innermost value of that key is written, i.e. either 'Unknown' or 'Bullish' for the sample above.
file = open('BAC.csv','w')
keys=tweets[0].keys()
dict_writer=csv.DictWriter(file,keys)
dict_writer.writerows(tweets)
I get the csv file in the following format
Unknown,2013-01-01 05:31:32,mcd brk b
Unknown,2013-01-01 06:55:23,co hihq
{'basic': 'Bullish'},2013-01-01 07:36:32,mnew year bac
But I need it as
Unknown,2013-01-01 05:31:32,mcd brk b
Unknown,2013-01-01 06:55:23,co hihq
Bullish,2013-01-01 07:36:32,mnew year bac
Is there any easy way to do this? In many instances the levels go up to five, but it is the same deal: I just need the value.
You will need to write a function to flatten these sentiment values.
Something like this could work if you have only one element in each level.
def flatten(row, field):
    if isinstance(row[field], dict):
        # dict views are not indexable in Python 3, so take the first value via an iterator
        row[field] = next(iter(row[field].values()))
        return flatten(row, field)
    return row
Then you would need to call this method on each row before writing it to the csv.
tweets = [{'sentiment': 'Unknown', 'date': datetime.datetime(2013, 1, 1, 5, 31, 32), 'body': 'mcd brk b'},
{'sentiment': 'Unknown', 'date': datetime.datetime(2013, 1, 1, 6, 55, 23), 'body': 'co hihq'},
{'sentiment': {'basic': {'text': 'Bullish' } }, 'date': datetime.datetime(2013, 1, 1, 7, 36, 32), 'body': 'new year bac'}]
print([flatten(row, 'sentiment') for row in tweets])
Output
[{'date': datetime.datetime(2013, 1, 1, 5, 31, 32), 'body': 'mcd brk b', 'sentiment': 'Unknown'},
{'date': datetime.datetime(2013, 1, 1, 6, 55, 23), 'body': 'co hihq', 'sentiment': 'Unknown'},
{'date': datetime.datetime(2013, 1, 1, 7, 36, 32), 'body': 'new year bac', 'sentiment': 'Bullish'}]
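Putting the flattening and the CSV writing together might look like the sketch below. The innermost helper is a hypothetical name and assumes every nested dict is non-empty with a single entry per level. The rows are written to an in-memory buffer here so the result can be inspected; a file opened with open('BAC.csv', 'w', newline='') could be passed to DictWriter instead.

```python
import csv
import datetime
import io


def innermost(value):
    """Follow single-entry dicts down to the first non-dict value."""
    while isinstance(value, dict):
        # Assumption: each level is a non-empty dict with one entry.
        value = next(iter(value.values()))
    return value


tweets = [
    {'sentiment': 'Unknown', 'date': datetime.datetime(2013, 1, 1, 5, 31, 32), 'body': 'mcd brk b'},
    {'sentiment': 'Unknown', 'date': datetime.datetime(2013, 1, 1, 6, 55, 23), 'body': 'co hihq'},
    {'sentiment': {'basic': {'text': 'Bullish'}}, 'date': datetime.datetime(2013, 1, 1, 7, 36, 32), 'body': 'new year bac'},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['sentiment', 'date', 'body'], lineterminator='\n')
for tweet in tweets:
    # Build a new flattened row rather than mutating the original tweet.
    writer.writerow({key: innermost(value) for key, value in tweet.items()})

csv_text = buffer.getvalue()
print(csv_text)
```

Unlike the flatten function above, this builds a new row for each tweet, so the nested sentiment values stay intact in the tweets list.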
