Python list of items (dicts) to group data

I have a list of items (dicts) in a variable, dumped from a MySQL SELECT. Each dict in this list is a row from the MySQL table. Each dict has an id, and there is also a specific id (item_id) which can be duplicated in this list and is not always in order. I am trying to loop through this list and get all the data from the rows with the same item_id. I will then do something with that data, like finding price averages, max/min, etc. This MySQL table is also a temp table: I pull this info, do these calculations, then dump the results into a new MySQL table.
An example of the data in the list would be:
{'id': 1, 'item_id': 27, 'item_price': 1.5, 'item_length': 23, 'item_width': 12, 'item_depth': 16}
{'id': 2, 'item_id': 28, 'item_price': 1.5, 'item_length': 23, 'item_width': 12, 'item_depth': 16}
{'id': 3, 'item_id': 27, 'item_price': 1.5, 'item_length': 23, 'item_width': 12, 'item_depth': 16}
{'id': 4, 'item_id': 29, 'item_price': 1.5, 'item_length': 23, 'item_width': 12, 'item_depth': 16}
{'id': 5, 'item_id': 28, 'item_price': 1.5, 'item_length': 23, 'item_width': 12, 'item_depth': 16}
{'id': 6, 'item_id': 27, 'item_price': 1.5, 'item_length': 23, 'item_width': 12, 'item_depth': 16}
{'id': 7, 'item_id': 29, 'item_price': 1.5, 'item_length': 23, 'item_width': 12, 'item_depth': 16}
I want to go through each one, get all of the lines that have item_id of 27, do something with that data, then get all of the lines with item_id of 28, do something with that data, and so on.
I did try setting a temp ID value, but it gets reset every time item_id changes:
tempID = 0
for item in itemList:
    if item["item_id"] != tempID:
        tempID = item["item_id"]
        # <gather data>
I think I'm on the right track with setting tempIDs, but I'm not sure how to make it go through the entire list until every row with the ID it first saw has been handled and there are no more lines left.

One way to achieve this is to organise your data by "item_id", which lets you fetch the records for each ID cleanly. Here I group your data into a dictionary with "item_id" as the key, using a "one-liner" dictionary comprehension together with sorted(), itertools.groupby() and operator.itemgetter(). Note that groupby() only groups consecutive items with the same key, which is why the list is sorted by "item_id" first:
my_list = [
{'id': 1, 'item_id': 27, 'item_price': 1.5, 'item_length': 23, 'item_width': 12, 'item_depth': 16},
{'id': 2, 'item_id': 28, 'item_price': 1.5, 'item_length': 23, 'item_width': 12, 'item_depth': 16},
{'id': 3, 'item_id': 27, 'item_price': 1.5, 'item_length': 23, 'item_width': 12, 'item_depth': 16},
{'id': 4, 'item_id': 29, 'item_price': 1.5, 'item_length': 23, 'item_width': 12, 'item_depth': 16},
{'id': 5, 'item_id': 28, 'item_price': 1.5, 'item_length': 23, 'item_width': 12, 'item_depth': 16},
{'id': 6, 'item_id': 27, 'item_price': 1.5, 'item_length': 23, 'item_width': 12, 'item_depth': 16},
{'id': 7, 'item_id': 29, 'item_price': 1.5, 'item_length': 23, 'item_width': 12, 'item_depth': 16}
]
from itertools import groupby
from operator import itemgetter
my_dict = {x: list(l) for x, l in groupby(sorted(my_list, key=itemgetter('item_id')), key=itemgetter('item_id'))}
which will return my_dict as:
{
27: [
{'item_price': 1.5, 'id': 1, 'item_id': 27, 'item_depth': 16, 'item_width': 12, 'item_length': 23},
{'item_price': 1.5, 'id': 3, 'item_id': 27, 'item_depth': 16, 'item_width': 12, 'item_length': 23},
{'item_price': 1.5, 'id': 6, 'item_id': 27, 'item_depth': 16, 'item_width': 12, 'item_length': 23}
],
28: [
{'item_price': 1.5, 'id': 2, 'item_id': 28, 'item_depth': 16, 'item_width': 12, 'item_length': 23},
{'item_price': 1.5, 'id': 5, 'item_id': 28, 'item_depth': 16, 'item_width': 12, 'item_length': 23}
],
29: [
{'item_price': 1.5, 'id': 4, 'item_id': 29, 'item_depth': 16, 'item_width': 12, 'item_length': 23},
{'item_price': 1.5, 'id': 7, 'item_id': 29, 'item_depth': 16, 'item_width': 12, 'item_length': 23}
]
}
Now you can iterate over this dict and use the data as you need, for example:
for k, v in my_dict.items():
    # Do whatever you want with the data
    print("Key: {} - data: {}".format(k, str(v)))
# Prints:
# Key: 27 - data: [{'item_price': 1.5, 'id': 1, 'item_id': 27, 'item_depth': 16, 'item_width': 12, 'item_length': 23}, {'item_price': 1.5, 'id': 3, 'item_id': 27, 'item_depth': 16, 'item_width': 12, 'item_length': 23}, {'item_price': 1.5, 'id': 6, 'item_id': 27, 'item_depth': 16, 'item_width': 12, 'item_length': 23}]
# Key: 28 - data: [{'item_price': 1.5, 'id': 2, 'item_id': 28, 'item_depth': 16, 'item_width': 12, 'item_length': 23}, {'item_price': 1.5, 'id': 5, 'item_id': 28, 'item_depth': 16, 'item_width': 12, 'item_length': 23}]
# Key: 29 - data: [{'item_price': 1.5, 'id': 4, 'item_id': 29, 'item_depth': 16, 'item_width': 12, 'item_length': 23}, {'item_price': 1.5, 'id': 7, 'item_id': 29, 'item_depth': 16, 'item_width': 12, 'item_length': 23}]
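If the groups don't need to come out sorted, a minimal alternative sketch uses collections.defaultdict: it groups the rows in a single pass with no prior sort, and then computes the kind of statistics mentioned in the question. The stats dictionary and its key names (count, avg_price, min_price, max_price) are illustrative assumptions, not part of the answer above:

```python
from collections import defaultdict

# The seven sample rows from the question; only id and item_id differ.
item_ids = [27, 28, 27, 29, 28, 27, 29]
my_list = [
    {'id': i, 'item_id': item_id, 'item_price': 1.5,
     'item_length': 23, 'item_width': 12, 'item_depth': 16}
    for i, item_id in enumerate(item_ids, start=1)
]

# Group rows by item_id in one pass; rows keep their original order
# inside each group, and no sorting is required.
groups = defaultdict(list)
for row in my_list:
    groups[row['item_id']].append(row)

# Per-item statistics (hypothetical layout for illustration).
stats = {}
for item_id, rows in groups.items():
    prices = [r['item_price'] for r in rows]
    stats[item_id] = {
        'count': len(rows),
        'avg_price': sum(prices) / len(prices),
        'min_price': min(prices),
        'max_price': max(prices),
    }

for item_id, s in stats.items():
    print(item_id, s)
```

The result of each group could then be written to the new MySQL table row by row.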

If you can/want to use pandas, the pd.DataFrame.from_records can be helpful.
Create the DataFrame with item_df = pd.DataFrame.from_records(itemList), then use pandas groupby to compute the statistics.
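A minimal sketch of that pandas approach, using the sample rows from the question and assuming you want the count, mean, min and max of item_price per item_id:

```python
import pandas as pd

# The seven sample rows from the question; only id and item_id differ.
item_ids = [27, 28, 27, 29, 28, 27, 29]
itemList = [
    {'id': i, 'item_id': item_id, 'item_price': 1.5,
     'item_length': 23, 'item_width': 12, 'item_depth': 16}
    for i, item_id in enumerate(item_ids, start=1)
]

item_df = pd.DataFrame.from_records(itemList)

# One row per item_id, one column per aggregate of item_price.
summary = item_df.groupby('item_id')['item_price'].agg(['count', 'mean', 'min', 'max'])
print(summary)
```

Unlike itertools.groupby, pandas groupby does not require the rows to be sorted beforehand.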

Related

How to filter list of dictionaries in python?

I have a list of dictionaries as follows:
VehicleList = [
{
'id': '1',
'VehicleType': 'Car',
'CreationDate': datetime.datetime(2021, 12, 10, 16, 9, 44, 872000)
},
{
'id': '2',
'VehicleType': 'Bike',
'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)
},
{
'id': '3',
'VehicleType': 'Truck',
'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)
},
{
'id': '4',
'VehicleType': 'Bike',
'CreationDate': datetime.datetime(2021, 12, 10, 21, 1, 00, 300012)
},
{
'id': '5',
'VehicleType': 'Car',
'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)
}
]
How can I get a list of the latest vehicles for each 'VehicleType' based on their 'CreationDate'?
I expect something like this:
latestVehicles = [
{
'id': '5',
'VehicleType': 'Car',
'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)
},
{
'id': '2',
'VehicleType': 'Bike',
'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)
},
{
'id': '3',
'VehicleType': 'Truck',
'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)
}
]
I tried separating out each dictionary based on their 'VehicleType' into different lists and then picking up the latest one.
I believe there might be a more optimal way to do this.
Use a dictionary mapping each VehicleType value to the dictionary you want in your final list. Compare the date of each item in the input list with the one in your dict, and keep the later one.
latest_dict = {}
for vehicle in VehicleList:
    t = vehicle['VehicleType']
    if t not in latest_dict or vehicle['CreationDate'] > latest_dict[t]['CreationDate']:
        latest_dict[t] = vehicle
latestVehicles = list(latest_dict.values())
Here is a solution using max and filter:
VehicleLatest = [
max(
filter(lambda _: _["VehicleType"] == t, VehicleList),
key=lambda _: _["CreationDate"]
) for t in {_["VehicleType"] for _ in VehicleList}
]
Result
print(VehicleLatest)
# [{'id': '2', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)}, {'id': '3', 'VehicleType': 'Truck', 'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)}, {'id': '5', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)}]
I think you can achieve what you want using the groupby function from itertools.
from itertools import groupby
# entries sorted according to the key we wish to groupby: 'VehicleType'
VehicleList = sorted(VehicleList, key=lambda x: x["VehicleType"])
latestVehicles = []
# Then the elements are grouped.
for k, v in groupby(VehicleList, lambda x: x["VehicleType"]):
    # We then append to latestVehicles the 0th entry of the
    # grouped elements after sorting according to the 'CreationDate'
    latestVehicles.append(sorted(list(v), key=lambda x: x["CreationDate"], reverse=True)[0])
Sort by 'VehicleType' and 'CreationDate', then build a dictionary from 'VehicleType' to vehicle. Later entries overwrite earlier ones with the same key, so the last (latest) vehicle of each type is the one that remains:
VehicleList.sort(key=lambda x: (x.get('VehicleType'), x.get('CreationDate')))
out = list(dict(zip([item.get('VehicleType') for item in VehicleList], VehicleList)).values())
Output:
[{'id': '2',
'VehicleType': 'Bike',
'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)},
{'id': '5',
'VehicleType': 'Car',
'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)},
{'id': '3',
'VehicleType': 'Truck',
'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)}]
This is very straightforward in pandas. First load the list of dicts as a pandas dataframe, then sort the values by date (newest first), keep only the first row for each 'VehicleType', and export to dict.
import pandas as pd
df = pd.DataFrame(VehicleList)
df.sort_values('CreationDate', ascending=False).drop_duplicates(subset='VehicleType').to_dict(orient='records')
You can use the operator module to sort the list by type and then by date:
import operator
my_sorted_list_by_type_and_date = sorted(VehicleList, key=operator.itemgetter('VehicleType', 'CreationDate'))
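Sorting on its own only orders the list; picking the latest vehicle per type still needs one more step. A minimal sketch, assuming the same VehicleList as in the question: because the list is sorted by CreationDate within each VehicleType, the last vehicle seen for each type is the latest, so overwriting a dict entry per type keeps exactly that one.

```python
import datetime
import operator

VehicleList = [
    {'id': '1', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 10, 16, 9, 44, 872000)},
    {'id': '2', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)},
    {'id': '3', 'VehicleType': 'Truck', 'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)},
    {'id': '4', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 10, 21, 1, 0, 300012)},
    {'id': '5', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)},
]

my_sorted_list_by_type_and_date = sorted(
    VehicleList, key=operator.itemgetter('VehicleType', 'CreationDate'))

# Within each type the dates ascend, so the last assignment per type wins
# and holds the most recent vehicle.
latest_by_type = {}
for vehicle in my_sorted_list_by_type_and_date:
    latest_by_type[vehicle['VehicleType']] = vehicle

latestVehicles = list(latest_by_type.values())
print(latestVehicles)
```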
A small plea for more readable code:
from operator import itemgetter
from itertools import groupby
vtkey = itemgetter('VehicleType')
cdkey = itemgetter('CreationDate')
latest = [
    # Get latest from each group.
    max(vs, key=cdkey)
    # Sort and group by VehicleType.
    for _, vs in groupby(sorted(VehicleList, key=vtkey), vtkey)
]
A variation on Blckknght's answer using defaultdict to avoid the long if condition:
from collections import defaultdict
import datetime
from operator import itemgetter
latest_dict = defaultdict(lambda: {'CreationDate': datetime.datetime.min})
for vehicle in VehicleList:
    t = vehicle['VehicleType']
    latest_dict[t] = max(vehicle, latest_dict[t], key=itemgetter('CreationDate'))
latestVehicles = list(latest_dict.values())
latestVehicles:
[{'id': '5', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)},
{'id': '2', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)},
{'id': '3', 'VehicleType': 'Truck', 'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)}]

How to change all datetime objects in a list to standard YYYY-MM-DD HH:MM:SS

When I query MySQL with Python and the query has datetime fields then I get this list as a result.
[{'_id': 1, 'name': 'index', '_cdate': datetime.datetime(2020, 10, 27, 9, 4, 34), 'title': 'DataExtract'}, {'_id': 2, 'name': 'topmenu', '_cdate': datetime.datetime(2020, 11, 4, 19, 52, 17), 'title': 'topmenu'}, {'_id': 3, 'name': 'functions_common', '_cdate': datetime.datetime(2020, 11, 4, 19, 52, 50), 'title': 'common functions'}, {'_id': 4, 'name': 'leftmenu', '_cdate': datetime.datetime(2020, 11, 4, 19, 53, 56), 'title': 'Left Menu'}, {'_id': 5, 'name': 'todo', '_cdate': datetime.datetime(2020, 11, 7, 8, 49, 38), 'title': 'Todo'}, {'_id': 6, 'name': 'cron_publish', '_cdate': datetime.datetime(2020, 12, 2, 19, 30, 11), 'title': 'Run Publish reports'}, {'_id': 7, 'name': 'test', '_cdate': datetime.datetime(2020, 12, 2, 22, 32, 54), 'title': 'test'}, {'_id': 8, 'name': 'help', '_cdate': datetime.datetime(2020, 12, 5, 7, 12, 44), 'title': 'Help'}, {'_id': 9, 'name': 'api', '_cdate': datetime.datetime(2020, 12, 5, 21, 22, 13), 'title': 'API'}, {'_id': 10, 'name': 'ben', '_cdate': datetime.datetime(2021, 10, 4, 11, 37, 3), 'title': 'List of Reports'}]
How do I either get the query to return the date fields in YYYY-MM-DD HH:MM:SS format, or convert them in the returned list? When I try to change them while enumerating over the results, Python throws an error that the dictionary has changed.
The datetime.datetime objects you're getting are the standard representation of these values. If you were expecting strings instead, you could simply convert them with value.strftime('%Y-%m-%d %H:%M:%S'), but keep in mind that the datetime object is a more flexible way of keeping the data around. I'd recommend only formatting the date in a specific way if you're writing it to the screen or to a file format that expects a string.
Example:
import datetime
data = [{'_id': 1, 'name': 'index', '_cdate': datetime.datetime(2020, 10, 27, 9, 4, 34), 'title': 'DataExtract'}, {'_id': 2, 'name': 'topmenu', '_cdate': datetime.datetime(2020, 11, 4, 19, 52, 17), 'title': 'topmenu'}, {'_id': 3, 'name': 'functions_common', '_cdate': datetime.datetime(2020, 11, 4, 19, 52, 50), 'title': 'common functions'}, {'_id': 4, 'name': 'leftmenu', '_cdate': datetime.datetime(2020, 11, 4, 19, 53, 56), 'title': 'Left Menu'}, {'_id': 5, 'name': 'todo', '_cdate': datetime.datetime(2020, 11, 7, 8, 49, 38), 'title': 'Todo'}, {'_id': 6, 'name': 'cron_publish', '_cdate': datetime.datetime(2020, 12, 2, 19, 30, 11), 'title': 'Run Publish reports'}, {'_id': 7, 'name': 'test', '_cdate': datetime.datetime(2020, 12, 2, 22, 32, 54), 'title': 'test'}, {'_id': 8, 'name': 'help', '_cdate': datetime.datetime(2020, 12, 5, 7, 12, 44), 'title': 'Help'}, {'_id': 9, 'name': 'api', '_cdate': datetime.datetime(2020, 12, 5, 21, 22, 13), 'title': 'API'}, {'_id': 10, 'name': 'ben', '_cdate': datetime.datetime(2021, 10, 4, 11, 37, 3), 'title': 'List of Reports'}]
for rec in data:
    rec['date_str'] = rec['_cdate'].strftime('%Y-%m-%d %H:%M:%S')
That adds a 'date_str' field to every record, in the format you require. Of course, you could also modify it to overwrite the original value.
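If every datetime field should be converted, not just '_cdate', one option is to build new dicts instead of editing a dict while looping over its own items (the "dictionary changed size during iteration" error comes from adding or removing keys in a dict that is being iterated). A minimal sketch; the helper name format_datetimes is hypothetical and the data is the first two records from the question:

```python
import datetime

data = [
    {'_id': 1, 'name': 'index', '_cdate': datetime.datetime(2020, 10, 27, 9, 4, 34), 'title': 'DataExtract'},
    {'_id': 2, 'name': 'topmenu', '_cdate': datetime.datetime(2020, 11, 4, 19, 52, 17), 'title': 'topmenu'},
]

FMT = '%Y-%m-%d %H:%M:%S'


def format_datetimes(record, fmt=FMT):
    """Return a new dict with every datetime value rendered as a string."""
    return {
        key: value.strftime(fmt) if isinstance(value, datetime.datetime) else value
        for key, value in record.items()
    }


# The original records are left untouched; formatted holds the string versions.
formatted = [format_datetimes(rec) for rec in data]
print(formatted[0]['_cdate'])  # 2020-10-27 09:04:34
```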

Normalize JSON API data to columns

I'm trying to get data from our Hubspot CRM database and convert it to a dataframe using pandas. I'm still a beginner in Python, and I can't get json_normalize to work.
The output from the database is in JSON format, like this:
{'archived': False,
'archived_at': None,
'associations': None,
'created_at': datetime.datetime(2019, 12, 21, 17, 56, 24, 739000, tzinfo=tzutc()),
'id': 'xxx',
'properties': {'createdate': '2019-12-21T17:56:24.739Z',
'email': 'xxxxx#xxxxx.com',
'firstname': 'John',
'hs_object_id': 'xxx',
'lastmodifieddate': '2020-04-22T04:37:40.274Z',
'lastname': 'Hansen'},
'updated_at': datetime.datetime(2020, 4, 22, 4, 37, 40, 274000, tzinfo=tzutc())}, {'archived': False,
'archived_at': None,
'associations': None,
'created_at': datetime.datetime(2019, 12, 21, 17, 52, 38, 485000, tzinfo=tzutc()),
'id': 'bbb',
'properties': {'createdate': '2019-12-21T17:52:38.485Z',
'email': 'bbb#bbb.dk',
'firstname': 'John2',
'hs_object_id': 'bbb',
'lastmodifieddate': '2020-05-19T07:18:28.384Z',
'lastname': 'Hansen2'},
'updated_at': datetime.datetime(2020, 5, 19, 7, 18, 28, 384000, tzinfo=tzutc())}, {'archived': False,
'archived_at': None,
'associations': None,
etc.
Trying to put it into a dataframe using this code:
import hubspot
import pandas as pd
import json
from pandas.io.json import json_normalize
import os
client = hubspot.Client.create(api_key='################')
all_contacts = contacts_client = client.crm.contacts.get_all()
df=pd.io.json.json_normalize(all_contacts,'properties')
print(df.head())
df.to_csv('All contacts.csv')
But I keep getting an error that I can't resolve.
I have also tried
pd.DataFrame(all_contacts)
and
pd.DataFrame.from_dict(all_contacts)
The all_contacts variable is a list of dictionary-like elements, so to create the dataframe I used a list comprehension that extracts only the 'properties' of each element.
import datetime
import pandas as pd
from dateutil.tz import tzutc
data = ({'archived': False,
'archived_at': None,
'associations': None,
'created_at': datetime.datetime(2019, 12, 21, 17, 56, 24, 739000, tzinfo=tzutc()),
'id': 'xxx',
'properties': {'createdate': '2019-12-21T17:56:24.739Z',
'email': 'xxxxx#xxxxx.com',
'firstname': 'John',
'hs_object_id': 'xxx',
'lastmodifieddate': '2020-04-22T04:37:40.274Z',
'lastname': 'Hansen'},
'updated_at': datetime.datetime(2020, 4, 22, 4, 37, 40, 274000, tzinfo=tzutc())},
{'archived': False,
'archived_at': None,
'associations': None,
'created_at': datetime.datetime(2019, 12, 21, 17, 52, 38, 485000, tzinfo=tzutc()),
'id': 'bbb',
'properties': {
'createdate': '2019-12-21T17:52:38.485Z',
'email': 'bbb#bbb.dk',
'firstname': 'John2',
'hs_object_id': 'bbb',
'lastmodifieddate': '2020-05-19T07:18:28.384Z',
'lastname': 'Hansen2'},
'updated_at': datetime.datetime(2020, 5, 19, 7, 18, 28, 384000, tzinfo=tzutc())})
df = pd.DataFrame([row['properties'] for row in data])
print(df)
OUTPUT:
createdate email ... lastmodifieddate lastname
0 2019-12-21T17:56:24.739Z xxxxx#xxxxx.com ... 2020-04-22T04:37:40.274Z Hansen
1 2019-12-21T17:52:38.485Z bbb#bbb.dk ... 2020-05-19T07:18:28.384Z Hansen2
[2 rows x 6 columns]
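Since the question mentions json_normalize: in pandas 1.0 and later it is available as pd.json_normalize (the pandas.io.json import in the question is the older, deprecated location). A minimal sketch, assuming each element is a plain dict (if the client returns model objects, they would first need converting to dicts); it flattens the nested 'properties' into 'properties.<field>' columns next to the top-level fields, then keeps only those columns:

```python
import datetime

import pandas as pd

utc = datetime.timezone.utc

data = [
    {'archived': False, 'archived_at': None, 'associations': None,
     'created_at': datetime.datetime(2019, 12, 21, 17, 56, 24, 739000, tzinfo=utc),
     'id': 'xxx',
     'properties': {'createdate': '2019-12-21T17:56:24.739Z', 'email': 'xxxxx#xxxxx.com',
                    'firstname': 'John', 'hs_object_id': 'xxx',
                    'lastmodifieddate': '2020-04-22T04:37:40.274Z', 'lastname': 'Hansen'},
     'updated_at': datetime.datetime(2020, 4, 22, 4, 37, 40, 274000, tzinfo=utc)},
    {'archived': False, 'archived_at': None, 'associations': None,
     'created_at': datetime.datetime(2019, 12, 21, 17, 52, 38, 485000, tzinfo=utc),
     'id': 'bbb',
     'properties': {'createdate': '2019-12-21T17:52:38.485Z', 'email': 'bbb#bbb.dk',
                    'firstname': 'John2', 'hs_object_id': 'bbb',
                    'lastmodifieddate': '2020-05-19T07:18:28.384Z', 'lastname': 'Hansen2'},
     'updated_at': datetime.datetime(2020, 5, 19, 7, 18, 28, 384000, tzinfo=utc)},
]

# Nested dicts become dotted column names, e.g. 'properties.email'.
df = pd.json_normalize(data)

# Keep only the flattened properties and strip the 'properties.' prefix.
props = df.filter(like='properties.').rename(columns=lambda c: c.split('.', 1)[1])
print(props)
```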

How to calculate the hourly average disk_available per hostname in Python

Result data:
<QuerySet [{'disk_available': 26, 'hostname': '2', 'day': datetime.datetime(2020, 2, 11, 0, 0, tzinfo=<UTC>), 'c': 354}, {'disk_available': 27, 'hostname': '2', 'day': datetime.datetime(2020, 2, 10, 0, 0, tzinfo=<UTC>), 'c': 273}, {'disk_available': 19, 'hostname': '2', 'day': datetime.datetime(2020, 2, 12, 0, 0, tzinfo=<UTC>), 'c': 12}, {'disk_available': 26, 'hostname': '2', 'day': datetime.datetime(2020, 2, 12, 0, 0, tzinfo=<UTC>), 'c': 45}, {'disk_available': 26, 'hostname': 'tes', 'day': datetime.datetime(2020, 2, 11, 0, 0, tzinfo=<UTC>), 'c': 1945}, {'disk_available': 19, 'hostname': 'tes', 'day': datetime.datetime(2020, 2, 12, 0, 0, tzinfo=<UTC>), 'c': 53}, {'disk_available': 1, 'hostname': 'tes', 'day': datetime.datetime(2020, 2, 11, 0, 0, tzinfo=<UTC>), 'c': 1}, {'disk_available': 26, 'hostname': 'tes', 'day': datetime.datetime(2020, 2, 12, 0, 0, tzinfo=<UTC>), 'c': 45}, {'disk_available': 27, 'hostname': 'tes', 'day': datetime.datetime(2020, 2, 10, 0, 0, tzinfo=<UTC>), 'c': 291}]>
SocketClient.objects.annotate(day=TruncDay('create')).values('day').annotate(c=Count('id')).values('day', 'disk_available', 'hostname', 'c').order_by('hostname')
From the results above, I want to display the hourly average of disk_available for each hostname.
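A minimal plain-Python sketch, assuming the QuerySet has been turned into a list of dicts (for example with list(...)) and using some of the rows shown above, with datetime.timezone.utc standing in for the <UTC> tzinfo. Note that the query truncates with TruncDay, so every timestamp is midnight and these "hourly" buckets collapse to daily ones; truncating with Django's TruncHour instead would give real hourly buckets. The helper hourly_average is hypothetical, and the weighted option assumes 'c' is the number of underlying records each row stands for:

```python
import datetime
from collections import defaultdict

UTC = datetime.timezone.utc

# Some rows from the QuerySet above, as plain dicts.
rows = [
    {'disk_available': 26, 'hostname': '2', 'day': datetime.datetime(2020, 2, 11, 0, 0, tzinfo=UTC), 'c': 354},
    {'disk_available': 27, 'hostname': '2', 'day': datetime.datetime(2020, 2, 10, 0, 0, tzinfo=UTC), 'c': 273},
    {'disk_available': 19, 'hostname': '2', 'day': datetime.datetime(2020, 2, 12, 0, 0, tzinfo=UTC), 'c': 12},
    {'disk_available': 26, 'hostname': '2', 'day': datetime.datetime(2020, 2, 12, 0, 0, tzinfo=UTC), 'c': 45},
    {'disk_available': 26, 'hostname': 'tes', 'day': datetime.datetime(2020, 2, 11, 0, 0, tzinfo=UTC), 'c': 1945},
    {'disk_available': 1, 'hostname': 'tes', 'day': datetime.datetime(2020, 2, 11, 0, 0, tzinfo=UTC), 'c': 1},
]


def hourly_average(rows, weighted=False):
    """Average disk_available per (hostname, hour).

    With weighted=True each row counts 'c' times (assumption: 'c' is the
    number of underlying records the row represents).
    """
    totals = defaultdict(float)
    weights = defaultdict(float)
    for row in rows:
        # Truncate the timestamp to the start of its hour.
        hour = row['day'].replace(minute=0, second=0, microsecond=0)
        key = (row['hostname'], hour)
        w = row['c'] if weighted else 1
        totals[key] += row['disk_available'] * w
        weights[key] += w
    return {key: totals[key] / weights[key] for key in totals}


for (host, hour), avg in sorted(hourly_average(rows).items()):
    print(host, hour.isoformat(), round(avg, 2))
```

Doing the averaging in the database with Django's Avg aggregate would avoid loading the rows into Python, at the cost of a different query.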

Pandas ticker to ohlc

rows is a list of dicts from MySQL.
Example rows:
[{'date': datetime.datetime(2017, 3, 21, 13, 27, 20), 'tid': 648605515L, 'price': Decimal('1080.04000000'), 'type': 1, 'amount': Decimal('10.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 20), 'tid': 648605549L, 'price': Decimal('1081.55000000'), 'type': 1, 'amount': Decimal('16.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 20), 'tid': 648605547L, 'price': Decimal('1081.33000000'), 'type': 1, 'amount': Decimal('20.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 20), 'tid': 648605545L, 'price': Decimal('1081.30000000'), 'type': 1, 'amount': Decimal('16.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 20), 'tid': 648605543L, 'price': Decimal('1081.29000000'), 'type': 1, 'amount': Decimal('20.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 20), 'tid': 648605541L, 'price': Decimal('1080.46000000'), 'type': 1, 'amount': Decimal('26.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 20), 'tid': 648605517L, 'price': Decimal('1080.04000000'), 'type': 1, 'amount': Decimal('8.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 22), 'tid': 648605601L, 'price': Decimal('1079.69000000'), 'type': -1, 'amount': Decimal('70.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 25), 'tid': 648605686L, 'price': Decimal('1079.72000000'), 'type': -1, 'amount': Decimal('4.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 26), 'tid': 648605765L, 'price': Decimal('1079.45000000'), 'type': 1, 'amount': Decimal('6.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 26), 'tid': 648605753L, 'price': Decimal('1079.60000000'), 'type': -1, 'amount': Decimal('106.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 26), 'tid': 648605751L, 'price': Decimal('1079.60000000'), 'type': -1, 'amount': Decimal('80.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 26), 'tid': 648605749L, 'price': Decimal('1079.67000000'), 'type': -1, 'amount': Decimal('430.00000000')}, 
{'date': datetime.datetime(2017, 3, 21, 13, 27, 26), 'tid': 648605747L, 'price': Decimal('1079.70000000'), 'type': -1, 'amount': Decimal('66.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 26), 'tid': 648605745L, 'price': Decimal('1079.74000000'), 'type': -1, 'amount': Decimal('12.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 27), 'tid': 648605785L, 'price': Decimal('1079.45000000'), 'type': 1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 27), 'tid': 648605774L, 'price': Decimal('1079.45000000'), 'type': 1, 'amount': Decimal('6.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 27), 'tid': 648605771L, 'price': Decimal('1079.45000000'), 'type': 1, 'amount': Decimal('14.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 28), 'tid': 648605827L, 'price': Decimal('1079.45000000'), 'type': 1, 'amount': Decimal('42.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 28), 'tid': 648605842L, 'price': Decimal('1079.45000000'), 'type': 1, 'amount': Decimal('10.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 32), 'tid': 648605973L, 'price': Decimal('1079.45000000'), 'type': 1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 37), 'tid': 648606114L, 'price': Decimal('1079.44000000'), 'type': 1, 'amount': Decimal('24.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 37), 'tid': 648606116L, 'price': Decimal('1079.45000000'), 'type': 1, 'amount': Decimal('40.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 42), 'tid': 648606258L, 'price': Decimal('1079.45000000'), 'type': 1, 'amount': Decimal('56.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 45), 'tid': 648606345L, 'price': Decimal('1079.46000000'), 'type': -1, 'amount': Decimal('10.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 46), 'tid': 648606392L, 'price': Decimal('1079.69000000'), 'type': 1, 'amount': Decimal('44.00000000')}, {'date': 
datetime.datetime(2017, 3, 21, 13, 27, 48), 'tid': 648606418L, 'price': Decimal('1079.60000000'), 'type': -1, 'amount': Decimal('40.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 48), 'tid': 648606420L, 'price': Decimal('1079.46000000'), 'type': -1, 'amount': Decimal('36.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 48), 'tid': 648606422L, 'price': Decimal('1079.46000000'), 'type': -1, 'amount': Decimal('94.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 50), 'tid': 648606499L, 'price': Decimal('1079.31000000'), 'type': 1, 'amount': Decimal('80.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 50), 'tid': 648606478L, 'price': Decimal('1079.31000000'), 'type': -1, 'amount': Decimal('6.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 50), 'tid': 648606476L, 'price': Decimal('1079.31000000'), 'type': -1, 'amount': Decimal('34.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 50), 'tid': 648606474L, 'price': Decimal('1079.55000000'), 'type': -1, 'amount': Decimal('8.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 55), 'tid': 648606666L, 'price': Decimal('1079.31000000'), 'type': 1, 'amount': Decimal('44.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 55), 'tid': 648606650L, 'price': Decimal('1079.17000000'), 'type': 1, 'amount': Decimal('8.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 27, 55), 'tid': 648606648L, 'price': Decimal('1079.17000000'), 'type': 1, 'amount': Decimal('8.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 1), 'tid': 648606820L, 'price': Decimal('1079.03000000'), 'type': -1, 'amount': Decimal('28.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 2), 'tid': 648606825L, 'price': Decimal('1079.03000000'), 'type': 1, 'amount': Decimal('30.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 2), 'tid': 648606836L, 'price': Decimal('1079.02000000'), 'type': -1, 'amount': Decimal('22.00000000')}, {'date': 
datetime.datetime(2017, 3, 21, 13, 28, 5), 'tid': 648606945L, 'price': Decimal('1078.58000000'), 'type': -1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 5), 'tid': 648606943L, 'price': Decimal('1078.61000000'), 'type': -1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 5), 'tid': 648606941L, 'price': Decimal('1078.63000000'), 'type': -1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 5), 'tid': 648606939L, 'price': Decimal('1078.88000000'), 'type': -1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 5), 'tid': 648606926L, 'price': Decimal('1078.88000000'), 'type': -1, 'amount': Decimal('428.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 6), 'tid': 648606984L, 'price': Decimal('1078.58000000'), 'type': 1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 6), 'tid': 648606982L, 'price': Decimal('1078.05000000'), 'type': -1, 'amount': Decimal('10.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 6), 'tid': 648606971L, 'price': Decimal('1078.58000000'), 'type': 1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 6), 'tid': 648606957L, 'price': Decimal('1078.05000000'), 'type': -1, 'amount': Decimal('74.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 6), 'tid': 648606955L, 'price': Decimal('1078.15000000'), 'type': -1, 'amount': Decimal('6.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 6), 'tid': 648606953L, 'price': Decimal('1078.15000000'), 'type': -1, 'amount': Decimal('14.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 6), 'tid': 648606951L, 'price': Decimal('1078.42000000'), 'type': -1, 'amount': Decimal('16.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 7), 'tid': 648606992L, 'price': Decimal('1078.05000000'), 'type': -1, 'amount': Decimal('2.00000000')}, {'date': 
datetime.datetime(2017, 3, 21, 13, 28, 7), 'tid': 648606995L, 'price': Decimal('1078.58000000'), 'type': 1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 7), 'tid': 648607023L, 'price': Decimal('1078.06000000'), 'type': -1, 'amount': Decimal('4.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 8), 'tid': 648607047L, 'price': Decimal('1078.86000000'), 'type': 1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 10), 'tid': 648607113L, 'price': Decimal('1078.06000000'), 'type': -1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 10), 'tid': 648607115L, 'price': Decimal('1078.03000000'), 'type': -1, 'amount': Decimal('148.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 12), 'tid': 648607192L, 'price': Decimal('1079.00000000'), 'type': -1, 'amount': Decimal('10.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 13), 'tid': 648607218L, 'price': Decimal('1078.99000000'), 'type': 1, 'amount': Decimal('98.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 13), 'tid': 648607220L, 'price': Decimal('1079.00000000'), 'type': 1, 'amount': Decimal('42.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 13), 'tid': 648607222L, 'price': Decimal('1079.03000000'), 'type': 1, 'amount': Decimal('342.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 13), 'tid': 648607224L, 'price': Decimal('1079.16000000'), 'type': 1, 'amount': Decimal('512.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 14), 'tid': 648607250L, 'price': Decimal('1078.98000000'), 'type': 1, 'amount': Decimal('44.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 14), 'tid': 648607252L, 'price': Decimal('1078.98000000'), 'type': 1, 'amount': Decimal('12.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 14), 'tid': 648607254L, 'price': Decimal('1079.00000000'), 'type': 1, 'amount': Decimal('106.00000000')}, {'date': 
datetime.datetime(2017, 3, 21, 13, 28, 14), 'tid': 648607256L, 'price': Decimal('1079.16000000'), 'type': 1, 'amount': Decimal('40.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 20), 'tid': 648607431L, 'price': Decimal('1079.16000000'), 'type': 1, 'amount': Decimal('28.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 20), 'tid': 648607429L, 'price': Decimal('1079.01000000'), 'type': 1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 20), 'tid': 648607427L, 'price': Decimal('1079.01000000'), 'type': 1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 23), 'tid': 648607518L, 'price': Decimal('1079.16000000'), 'type': 1, 'amount': Decimal('8.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 24), 'tid': 648607544L, 'price': Decimal('1079.16000000'), 'type': 1, 'amount': Decimal('344.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 25), 'tid': 648607593L, 'price': Decimal('1078.79000000'), 'type': -1, 'amount': Decimal('6.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 26), 'tid': 648607631L, 'price': Decimal('1079.16000000'), 'type': 1, 'amount': Decimal('430.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 26), 'tid': 648607623L, 'price': Decimal('1079.16000000'), 'type': 1, 'amount': Decimal('18.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 26), 'tid': 648607621L, 'price': Decimal('1078.79000000'), 'type': 1, 'amount': Decimal('14.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 29), 'tid': 648607695L, 'price': Decimal('1079.16000000'), 'type': 1, 'amount': Decimal('776.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 32), 'tid': 648607803L, 'price': Decimal('1079.16000000'), 'type': 1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 32), 'tid': 648607805L, 'price': Decimal('1079.16000000'), 'type': 1, 'amount': Decimal('10.00000000')}, {'date': 
datetime.datetime(2017, 3, 21, 13, 28, 36), 'tid': 648607905L, 'price': Decimal('1079.16000000'), 'type': 1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 37), 'tid': 648607940L, 'price': Decimal('1079.31000000'), 'type': 1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 42), 'tid': 648608110L, 'price': Decimal('1079.46000000'), 'type': -1, 'amount': Decimal('12.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 46), 'tid': 648608211L, 'price': Decimal('1079.88000000'), 'type': -1, 'amount': Decimal('12.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 46), 'tid': 648608213L, 'price': Decimal('1079.88000000'), 'type': -1, 'amount': Decimal('6.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 57), 'tid': 648608534L, 'price': Decimal('1080.29000000'), 'type': 1, 'amount': Decimal('14.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 28, 57), 'tid': 648608536L, 'price': Decimal('1080.30000000'), 'type': 1, 'amount': Decimal('2.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 29, 2), 'tid': 648608683L, 'price': Decimal('1080.59000000'), 'type': 1, 'amount': Decimal('40.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 29, 3), 'tid': 648608733L, 'price': Decimal('1080.59000000'), 'type': 1, 'amount': Decimal('360.00000000')}, {'date': datetime.datetime(2017, 3, 21, 13, 29, 7), 'tid': 648608838L, 'price': Decimal('1080.90000000'), 'type': 1, 'amount': Decimal('82.00000000')}]
If I don't use set_index, I get TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'.
if rows:
    df = pd.DataFrame(rows)
    print(df.head())
    # TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'
    df = df.set_index("date")
    print(df.head())
    resample_data = df.resample("1min", how={"price": "ohlc", "amount": "sum"})
    print(resample_data)
Result:
amount date price tid type
0 2.00000000 2017-03-21 11:15:12 1075.83000000 648370156 -1
1 10.00000000 2017-03-21 11:15:15 1076.00000000 648370241 -1
2 10.00000000 2017-03-21 11:15:17 1075.83000000 648370297 -1
3 10.00000000 2017-03-21 11:15:17 1075.83000000 648370311 1
4 8.00000000 2017-03-21 11:15:19 1076.13000000 648370370 1
amount price tid type
date
2017-03-21 11:15:12 2.00000000 1075.83000000 648370156 -1
2017-03-21 11:15:15 10.00000000 1076.00000000 648370241 -1
2017-03-21 11:15:17 10.00000000 1075.83000000 648370297 -1
2017-03-21 11:15:17 10.00000000 1075.83000000 648370311 1
2017-03-21 11:15:19 8.00000000 1076.13000000 648370370 1
/Users/wyx/bitcoin_workspace/fibo-strategy/ticker.py:45: FutureWarning: how in .resample() is deprecated
the new syntax is .resample(...)..apply(<func>)
resample_data = df.resample("1min", how={"price": "ohlc", "amount": "sum"})
Traceback (most recent call last):
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1580, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 964, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/Users/wyx/bitcoin_workspace/fibo-strategy/ticker.py", line 45, in <module>
resample_data = df.resample("1min", how={"price": "ohlc", "amount": "sum"})
File "/Users/wyx/bitcoin_workspace/fibo-strategy/.env/lib/python2.7/site-packages/pandas/core/generic.py", line 4216, in resample
limit=limit)
File "/Users/wyx/bitcoin_workspace/fibo-strategy/.env/lib/python2.7/site-packages/pandas/tseries/resample.py", line 582, in _maybe_process_deprecations
r = r.aggregate(how)
File "/Users/wyx/bitcoin_workspace/fibo-strategy/.env/lib/python2.7/site-packages/pandas/tseries/resample.py", line 320, in aggregate
result, how = self._aggregate(arg, *args, **kwargs)
File "/Users/wyx/bitcoin_workspace/fibo-strategy/.env/lib/python2.7/site-packages/pandas/core/base.py", line 549, in _aggregate
result = _agg(arg, _agg_1dim)
File "/Users/wyx/bitcoin_workspace/fibo-strategy/.env/lib/python2.7/site-packages/pandas/core/base.py", line 500, in _agg
result[fname] = func(fname, agg_how)
File "/Users/wyx/bitcoin_workspace/fibo-strategy/.env/lib/python2.7/site-packages/pandas/core/base.py", line 483, in _agg_1dim
return colg.aggregate(how, _level=(_level or 0) + 1)
File "/Users/wyx/bitcoin_workspace/fibo-strategy/.env/lib/python2.7/site-packages/pandas/core/groupby.py", line 2652, in aggregate
return getattr(self, func_or_funcs)(*args, **kwargs)
File "/Users/wyx/bitcoin_workspace/fibo-strategy/.env/lib/python2.7/site-packages/pandas/core/groupby.py", line 1128, in ohlc
lambda x: x._cython_agg_general('ohlc'))
File "/Users/wyx/bitcoin_workspace/fibo-strategy/.env/lib/python2.7/site-packages/pandas/core/groupby.py", line 3103, in _apply_to_column_groupbys
return func(self)
File "/Users/wyx/bitcoin_workspace/fibo-strategy/.env/lib/python2.7/site-packages/pandas/core/groupby.py", line 1128, in <lambda>
lambda x: x._cython_agg_general('ohlc'))
File "/Users/wyx/bitcoin_workspace/fibo-strategy/.env/lib/python2.7/site-packages/pandas/core/groupby.py", line 808, in _cython_agg_general
raise DataError('No numeric types to aggregate')
pandas.core.base.DataError: No numeric types to aggregate
Process finished with exit code 1
I am new to pandas.
How do I solve this error?
Also, if I want to use the last close price to fill the NaN values of the next minute's OHLC bar, how can I do that?
You need to set an index using your dates.
Code:
from io import StringIO

import pandas as pd

df = pd.read_csv(StringIO(
u"""amount date price tid type
6.00000000 2017-03-21t10:46:32 1059.26000000 648313975 -1
4.00000000 2017-03-21t10:46:37 1059.42000000 648314094 -1
2.00000000 2017-03-21t10:46:37 1059.42000000 648314096 -1
2.00000000 2017-03-21t10:46:41 1059.26000000 648314176 -1
32.00000000 2017-03-21t10:46:41 1059.26000000 648314189 -1
"""), sep=r'\s+', parse_dates=['date'])
print(df)
# resample() needs a DatetimeIndex, so index by the date column first.
# The how= argument is deprecated; .agg() is its replacement.
resample_data = df.set_index('date').resample("1min").agg(
    {"price": "ohlc", "amount": "sum"})
print(resample_data)
Results:
amount date price tid type
0 6.0 2017-03-21 10:46:32 1059.26 648313975 -1
1 4.0 2017-03-21 10:46:37 1059.42 648314094 -1
2 2.0 2017-03-21 10:46:37 1059.42 648314096 -1
3 2.0 2017-03-21 10:46:41 1059.26 648314176 -1
4 32.0 2017-03-21 10:46:41 1059.26 648314189 -1
price amount
open high low close amount
date
2017-03-21 10:46:00 1059.26 1059.42 1059.26 1059.26 46.0
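The example above works because read_csv parses price and amount as floats. The rows in the question hold Decimal values instead, which pandas stores with object dtype, and ohlc will not aggregate object columns (hence "DataError: No numeric types to aggregate"). A minimal sketch, assuming rows is a list of dicts shaped like the one in the question (a small hypothetical sample is used here), casts those columns to float before resampling:

```python
import datetime
from decimal import Decimal

import pandas as pd

# Hypothetical sample shaped like the rows in the question.
rows = [
    {'date': datetime.datetime(2017, 3, 21, 13, 28, 36), 'tid': 648607905,
     'price': Decimal('1079.16000000'), 'type': 1, 'amount': Decimal('2.00000000')},
    {'date': datetime.datetime(2017, 3, 21, 13, 28, 37), 'tid': 648607940,
     'price': Decimal('1079.31000000'), 'type': 1, 'amount': Decimal('2.00000000')},
    {'date': datetime.datetime(2017, 3, 21, 13, 28, 42), 'tid': 648608110,
     'price': Decimal('1079.46000000'), 'type': -1, 'amount': Decimal('12.00000000')},
    {'date': datetime.datetime(2017, 3, 21, 13, 29, 2), 'tid': 648608683,
     'price': Decimal('1080.59000000'), 'type': 1, 'amount': Decimal('40.00000000')},
]

df = pd.DataFrame(rows)
# Decimal values land in object-dtype columns; cast them to float64.
df[['price', 'amount']] = df[['price', 'amount']].astype(float)

# resample() needs a DatetimeIndex, so index by the trade timestamp.
df = df.set_index('date')

# One bar per minute: OHLC of price plus total traded amount.
bars = df.resample('1min').agg({'price': 'ohlc', 'amount': 'sum'})
print(bars)
```

If precision matters more than speed, the conversion could instead be done once per column when the rows come out of the database; the float cast is simply the smallest change that makes the numeric aggregations work.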

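For the second part of the question (carrying the previous close into minutes that had no trades), one possible approach, sketched here rather than taken from the answer above, is to forward-fill the close column and then use it for open, high and low wherever those are NaN. The trades below are a hypothetical sample with no activity during 10:47; the volume of an empty minute is left at 0, which is what sum() returns for an empty bin.

```python
import pandas as pd

# Hypothetical trades with no activity during the 10:47 minute.
trades = pd.DataFrame(
    {
        'price': [1059.26, 1059.42, 1060.10, 1060.00],
        'amount': [6.0, 4.0, 2.0, 3.0],
    },
    index=pd.to_datetime([
        '2017-03-21 10:46:32',
        '2017-03-21 10:46:37',
        '2017-03-21 10:48:05',
        '2017-03-21 10:48:40',
    ]),
)

ohlc = trades['price'].resample('1min').ohlc()
volume = trades['amount'].resample('1min').sum()

# Carry the last known close forward into minutes without trades...
ohlc['close'] = ohlc['close'].ffill()
# ...and make those empty bars flat at that close.
for col in ['open', 'high', 'low']:
    ohlc[col] = ohlc[col].fillna(ohlc['close'])

bars = ohlc.join(volume)
print(bars)
```

Filling open, high and low from the already forward-filled close keeps each empty bar flat at the previous close, so charts show a horizontal line instead of a gap.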