Select rows in a data frame based on the date range - python

I have a list of dictionaries:
mylist=
[{'Date': '10/2/2021', 'ID': 11773, 'Receiver': 'Mike'},
{'Date': '10/2/2021', 'ID': 15673, 'Receiver': 'Jane'},
{'Date': '10/3/2021', 'ID': 11773, 'Receiver': 'Mike'},
...
{'Date': '12/25/2021', 'ID': 34653, 'Receiver': 'Jack'}]
I want to select the rows within a date range, for example from 10/3/2021 to 11/3/2021. I tried the following steps:
dfmylist = pd.DataFrame(mylist)
dfmylistnew = (dfmylist['Date'] > '10/3/2021') & (dfmylist['Date'] <= '11/3/2021')
I converted my list to a data frame and then selected the date range. However, the resulting dfmylistnew doesn't show the rows I expect. What did I miss?
The output of dfmylistnew is:
0 False
1 False
2 False
3 False
4 False
Name: Date, dtype: bool

You forgot to convert the "Date" column to the datetime type:
import pandas as pd
mylist=[{'Date': '10/2/2021', 'ID': 11773, 'Receiver': 'Mike'},
{'Date': '10/2/2021', 'ID': 15673, 'Receiver': 'Jane'},
{'Date': '10/3/2021', 'ID': 11773, 'Receiver': 'Mike'},
{'Date': '12/25/2021', 'ID': 34653, 'Receiver': 'Jack'}]
dfmylist = pd.DataFrame(mylist)
dfmylist['Date'] = pd.to_datetime(dfmylist['Date']) # you missed this line
dfmylistnew = (dfmylist['Date'] > '10/2/2021') & (dfmylist['Date'] <= '11/3/2021')
dfmylist.loc[dfmylistnew]
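As a side note (not part of the original answer), once the column is datetime the same filter can also be written with Series.between; the inclusive argument (pandas 1.3+) controls which endpoints are kept:
mask = dfmylist['Date'].between('10/2/2021', '11/3/2021', inclusive='right')  # > start and <= end
dfmylist.loc[mask]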

One option is to make the Date column the index of the DataFrame. Once the column is converted to datetime and set as the (sorted) index, you can use df.loc to slice the rows between the specified dates.
df = pd.DataFrame(mylist)
df['Date'] = pd.to_datetime(df['Date'])   # convert first so the index is a DatetimeIndex
df = df.set_index('Date').sort_index()    # set_index returns a new frame, so reassign it
df.loc['10/3/2021':'11/3/2021']

You can also filter with a combined boolean mask. Note that the two conditions must be combined with & (element-wise), not the Python and keyword, and the column should first be converted with pd.to_datetime so the comparison is by date rather than by string:
dfmylistnew = dfmylist[(dfmylist['Date'] > '10/3/2021') & (dfmylist['Date'] <= '11/3/2021')]

Related

Trying to make a pandas dataframe from a dictionary in a list of dictionaries

I have JSON data from a website and I am trying to create a pandas dataframe from it. It seems like I have a list of dictionaries nested in a dictionary, and I am not sure what to do. My goal was to create key/value pairs and then make them into a dataframe.
import requests
import pandas as pd
search_url = 'https://data.europa.eu/api/hub/statistics/data/num-datasets'
response = requests.get(search_url)
root=response.json()
print(root)
I was able to get the data into my notebook, but I am not sure of the best way to get the data out of the dictionaries and into lists to create a dataframe.
I tried to use pd.json_normalize(), but it didn't work.
The output looks like this:
{'name': 'count',
'stats': [{'date': '2019-08-01', 'count': 877625.0},
{'date': '2019-09-01', 'count': 895697.0},
{'date': '2020-10-01', 'count': 1161894.0},
{'date': '2020-11-01', 'count': 1205046.0},
{'date': '2020-12-01', 'count': 1184899.0},
{'date': '2023-01-01', 'count': 1503404.0}]}
My goal is to have two columns in a pd.DataFrame:
Date
Count
d={'name': 'count',
'stats': [{'date': '2019-08-01', 'count': 877625.0},
{'date': '2019-09-01', 'count': 895697.0},
{'date': '2020-10-01', 'count': 1161894.0},
{'date': '2020-11-01', 'count': 1205046.0},
{'date': '2020-12-01', 'count': 1184899.0},
{'date': '2023-01-01', 'count': 1503404.0}]}
pd.DataFrame(d['stats'])
Out[274]:
date count
0 2019-08-01 877625.0
1 2019-09-01 895697.0
2 2020-10-01 1161894.0
3 2020-11-01 1205046.0
4 2020-12-01 1184899.0
5 2023-01-01 1503404.0
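Since the question mentions pd.json_normalize, it is worth noting (my addition, assuming root has the shape printed above) that it can also flatten this structure directly by pointing record_path at the nested list:
import pandas as pd
df = pd.json_normalize(root, record_path='stats')             # root is the dict returned by response.json()
df = df.rename(columns={'date': 'Date', 'count': 'Count'})    # optional, to match the requested column names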

Converting JSON to pandas DataFrame - Python (JSON from yahoo_financials)

Can anyone help me with this JSON format? (updated dataframe)
JSON:
{'PSG.MC': [{'date': 1547452800,'formatted_date': '2019-01-14', 'amount': 0.032025}, {'date': 1554361200, 'formatted_date': '2019-04-04', 'amount': 0.032025}, {'date': 1562310000, 'formatted_date': '2019-07-05', 'amount': 0.032025}, {'date': 1570690800, 'formatted_date': '2019-10-10', 'amount': 0.032025}, {'date': 1578902400, 'formatted_date': '2020-01-13', 'amount': 0.033}, {'date': 1588057200, 'formatted_date': '2020-04-28', 'amount': 0.033}, {'date': 1595228400, 'formatted_date': '2020-07-20', 'amount': 0.033}, {'date': 1601362800, 'formatted_date': '2020-09-29', 'amount': 0.033}, {'date': 1603436400, 'formatted_date': '2020-10-23', 'amount': 0.033}], 'ACX.MC': [{'date': 1559545200, 'formatted_date': '2019-06-03', 'amount': 0.3}, {'date': 1562137200, 'formatted_date': '2019-07-03', 'amount': 0.2}, {'date': 1591254000, 'formatted_date': '2020-06-04', 'amount': 0.4}, {'date': 1594018800, 'formatted_date': '2020-07-06', 'amount': 0.1}, {'date': 1606809600, 'formatted_date': '2020-12-01', 'amount': 0.1}]}
I got it, as an example, from:
yahoo_financials.get_daily_dividend_data('2019-1-1', '2020-12-1')
I tried to convert it to a DataFrame with:
data2 = {"data": {'VIG.VI': [{'date'......................................
s=pd.DataFrame(data2)
pd.concat([s.drop('data',1),pd.DataFrame(s.data.tolist(),index=s.index)],1)
In this case I get a result like: 0 [{'date': 1433314500, 'formatted_date': '2015-... [{'date': 1430290500, 'formatted_date': '2015-...
Everything works perfectly if we use only one date and delete the [].
I also tried the code under this topic: it works fine if the format is the same for every variable in [], but if it is as in the example above, I get the error "arrays must all be same length".
Does anyone have any idea how could I convert this type of JSON to DataFrame?
You can convert each list of dicts to a dict of lists, and then convert the resulting dict to a DataFrame with multi-index columns:
import pandas as pd
from collections import defaultdict
data2 = {"data": {'PSG.MC': [{'date': 1547452800,'formatted_date': '2019-01-14', 'amount': 0.032025}, {'date': 1554361200, 'formatted_date': '2019-04-04', 'amount': 0.032025}, {'date': 1562310000, 'formatted_date': '2019-07-05', 'amount': 0.032025}, {'date': 1570690800, 'formatted_date': '2019-10-10', 'amount': 0.032025}, {'date': 1578902400, 'formatted_date': '2020-01-13', 'amount': 0.033}, {'date': 1588057200, 'formatted_date': '2020-04-28', 'amount': 0.033}, {'date': 1595228400, 'formatted_date': '2020-07-20', 'amount': 0.033}, {'date': 1601362800, 'formatted_date': '2020-09-29', 'amount': 0.033}, {'date': 1603436400, 'formatted_date': '2020-10-23', 'amount': 0.033}], 'ACX.MC': [{'date': 1559545200, 'formatted_date': '2019-06-03', 'amount': 0.3}, {'date': 1562137200, 'formatted_date': '2019-07-03', 'amount': 0.2}, {'date': 1591254000, 'formatted_date': '2020-06-04', 'amount': 0.4}, {'date': 1594018800, 'formatted_date': '2020-07-06', 'amount': 0.1}, {'date': 1606809600, 'formatted_date': '2020-12-01', 'amount': 0.1}]}}
data = {}
for key, values in data2['data'].items():
    # collect the per-record values into one list per field
    res = defaultdict(list)
    for sub in values:
        for k in sub:
            res[k].append(sub[k])
    data[key] = dict(res)

def reform_dict(data):
    # turn {ticker: {field: values}} into {(ticker, field): values} for MultiIndex columns
    reformed_dict = {}
    for outerKey, innerDict in data.items():
        for innerKey, values in innerDict.items():
            reformed_dict[(outerKey, innerKey)] = values
    return reformed_dict

df = pd.concat([pd.DataFrame(reform_dict({key: value})) for key, value in data.items()], axis=1)
print(df)
PSG.MC ACX.MC
date formatted_date amount date formatted_date amount
0 1547452800 2019-01-14 0.032025 1.559545e+09 2019-06-03 0.3
1 1554361200 2019-04-04 0.032025 1.562137e+09 2019-07-03 0.2
2 1562310000 2019-07-05 0.032025 1.591254e+09 2020-06-04 0.4
3 1570690800 2019-10-10 0.032025 1.594019e+09 2020-07-06 0.1
4 1578902400 2020-01-13 0.033000 1.606810e+09 2020-12-01 0.1
5 1588057200 2020-04-28 0.033000 NaN NaN NaN
6 1595228400 2020-07-20 0.033000 NaN NaN NaN
7 1601362800 2020-09-29 0.033000 NaN NaN NaN
8 1603436400 2020-10-23 0.033000 NaN NaN NaN
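For what it's worth (not part of the original answer), a shorter route to the same multi-index column layout is to build one DataFrame per ticker and let pd.concat create the outer column level from the dict keys:
import pandas as pd
df = pd.concat(
    {ticker: pd.DataFrame(records) for ticker, records in data2['data'].items()},
    axis=1,   # the dict keys become the top level of the column MultiIndex
)
print(df)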
Thank you for your code and help.
Here is my code as well; it works nicely and produces a table with the needed data. Maybe it will be helpful for someone:
import pandas as pd
from yahoofinancials import YahooFinancials  # the package behind yahoo_financials in the question

def getDividends(tickers, start_date, end_date):
    yahoo_financials = YahooFinancials(tickers)
    dividends = yahoo_financials.get_daily_dividend_data(start_date, end_date)
    return dividends

def getDividendDataFrame(tickerList):
    dividendList = getDividends(tickerList, '2015-1-1', '2020-12-1')
    dataFrame = pd.DataFrame()
    for ticker in dividendList:
        for dividend in dividendList[ticker]:
            series = pd.Series([ticker, dividend['formatted_date'], dividend['amount']])
            dfItem = pd.DataFrame([series])
            dataFrame = pd.concat([dataFrame, dfItem], ignore_index=True)
    print('\n')
    dataFrame.columns = ['Ticker', 'formatted_date', 'amount']
    return dataFrame
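Calling it then looks something like this (using the tickers from the question):
dividends_df = getDividendDataFrame(['PSG.MC', 'ACX.MC'])
print(dividends_df)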

Using pandas, how can I group/aggregate summing cases where boolean columns are true?

I have a DataFrame constructed from a database query. Each row in the frame has a database id, a date, a job, an issue boolean, and a fixed boolean. For example:
data = [
{'id': 1, 'date': '2020-02-01', 'job': 'ABC', 'issue': True, 'fixed': False},
{'id': 2, 'date': '2020-02-01', 'job': 'ABC', 'issue': False, 'fixed': False},
{'id': 3, 'date': '2020-02-01', 'job': 'ABC', 'issue': True, 'fixed': True},
{'id': 4, 'date': '2020-02-01', 'job': 'DEF', 'issue': True, 'fixed': True}
]
data_df = pd.DataFrame(data)
I want to do a groupby and agg where I am grouping by job and date, and getting the count of 'issues' and 'fixed' that are True. Something like:
result_data = [
{'date': '2020-02-01', 'job': 'ABC', 'issue': 2, 'fixed': 1},
{'date': '2020-02-01', 'job': 'DEF', 'issue': 1, 'fixed': 1}
]
result_df = pd.DataFrame(result_data)
The code would look something like:
result_df = data_df.groupby(['date', 'job']).agg({'issue': 'sum-true', 'fixed': 'sum-true'})
but I am not sure what 'sum-true' should be. Note, I can't just filter the whole DataFrame by one of the columns being True and then sum, because issue might be True while fixed is False.
How about this?
>>> df.groupby(['date', 'job'])[['issue', 'fixed']].sum()
issue fixed
date job
2020-02-01 ABC 2.0 1.0
DEF 1.0 1.0
Simply summing a boolean column counts the True values.
And if you want the data in the exact format you specified above, just reset_index:
>>> df.groupby(['date', 'job'])[['issue', 'fixed']].sum().reset_index()
date job issue fixed
0 2020-02-01 ABC 2.0 1.0
1 2020-02-01 DEF 1.0 1.0
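If you prefer integer counts and flat columns in one step, named aggregation is another option (a sketch based on the data_df from the question):
result_df = (
    data_df.groupby(['date', 'job'], as_index=False)
           .agg(issue=('issue', 'sum'), fixed=('fixed', 'sum'))
)
print(result_df)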

python nested list format, search for values greater than or latest [duplicate]

This question already has answers here:
How to find the min/max value of a common key in a list of dicts?
(5 answers)
Closed 5 years ago.
Below is in which I have data.
a = [{'ID': 319684283, 'ID1': 1025018, 'date': '2018-01-07 17:39:46', 'rate': 9.639e-05, 'amount': 410.84392747, 'total': 0.03960124, 'order': 16532584965, 'type': 'A', 'category': 'website'}, {'ID': 319684282, 'ID1': 1025017, 'date': '2018-01-07 17:39:46', 'amount': 24.84386425, 'total': 0.00239445, 'order': 16532584965, 'type': 'phone', 'category': 'exchange'}, {'ID': 319684281, 'ID1': 1125117, 'date': '2018-01-17 17:39:16', 'amount': 14.8138145, 'total': 0.10239445, 'order': 16512581965, 'type': 'phone', 'category': 'exchange'}]
How do I pull the record which is the latest of these? I tried:
for c in a:
    print(min(c['date']))
It failed, as it does not compare the dates across all the elements.
How do I achieve the output below, with a limited set of values:
d = { 'date': '2018-01-17 17:39:16', 'amount': 14.8138145, 'total': 0.10239445, 'order': 16512581965}
Here d should have only the latest value based on date.
You need to iterate through the list and get the date from each dictionary to find the latest date. Once you have found the latest date, you have your record:
latest = None
for record in a:
    date = record.get("date")
    if latest is None or date > latest.get("date"):
        latest = record
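As a shorter alternative (my addition; it relies on the date strings sorting chronologically, which these ISO-like timestamps do), the built-in max with a key function gives the same record:
latest = max(a, key=lambda record: record['date'])
print(latest)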

The order of the keys in a list of dictionaries

#!/usr/bin/python
# 1.15. Grouping Records Together Based on a Field
# Problem: You have a sequence of dictionaries or instances and you want to iterate over the data
# in groups based on the value of a particular field, such as date.
from operator import itemgetter
from itertools import groupby
# To iterate over the data in chunks grouped by date.
# First, sort by the desired field (in this case, date) and
# then use itertools.groupby():
rows = [
{'address': '5412 N CLARK', 'date': '07/01/2012'},
{'address': '5148 N CLARK', 'date': '07/04/2012'},
{'address': '5800 E 58TH', 'date': '07/02/2012'},
{'address': '2122 N CLARK', 'date': '07/03/2012'},
{'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'},
{'address': '1060 W ADDISON', 'date': '07/02/2012'},
{'address': '4801 N BROADWAY', 'date': '07/01/2012'},
{'address': '1039 W GRANVILLE', 'date': '07/04/2012'},
]
# Sort by the desired field first
rows.sort(key=itemgetter('date'))
print (rows)
for date, items in groupby(rows, key=itemgetter('date')):
    print(date)
    for i in items:
        print(' ', i)
The output of the above code is like:
[{'date': '07/01/2012', 'address': '5412 N CLARK'}, {'date': '07/01/2012', 'address': '4801 N BROADWAY'}, {'date': '07/02/2012', 'address': '5800 E 58TH'}, {'date': '07/02/2012', 'address': '5645 N RAVENSWOOD'}, {'date': '07/02/2012', 'address': '1060 W ADDISON'}, {'date': '07/03/2012', 'address': '2122 N CLARK'}, {'date': '07/04/2012', 'address': '5148 N CLARK'}, {'date': '07/04/2012', 'address': '1039 W GRANVILLE'}]
07/01/2012
{'date': '07/01/2012', 'address': '5412 N CLARK'}
{'date': '07/01/2012', 'address': '4801 N BROADWAY'}
07/02/2012
{'date': '07/02/2012', 'address': '5800 E 58TH'}
{'date': '07/02/2012', 'address': '5645 N RAVENSWOOD'}
{'date': '07/02/2012', 'address': '1060 W ADDISON'}
07/03/2012
{'date': '07/03/2012', 'address': '2122 N CLARK'}
07/04/2012
{'date': '07/04/2012', 'address': '5148 N CLARK'}
{'date': '07/04/2012', 'address': '1039 W GRANVILLE'}
The "date" is in front of the "address".
However, if I change the code by just adding print(rows) at line 24, as follows:
#!/usr/bin/python
# 1.15. Grouping Records Together Based on a Field
# Problem: You have a sequence of dictionaries or instances and you want to iterate over the data
# in groups based on the value of a particular field, such as date.
from operator import itemgetter
from itertools import groupby
# To iterate over the data in chunks grouped by date.
# First, sort by the desired field (in this case, date) and
# then use itertools.groupby():
rows = [
{'address': '5412 N CLARK', 'date': '07/01/2012'},
{'address': '5148 N CLARK', 'date': '07/04/2012'},
{'address': '5800 E 58TH', 'date': '07/02/2012'},
{'address': '2122 N CLARK', 'date': '07/03/2012'},
{'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'},
{'address': '1060 W ADDISON', 'date': '07/02/2012'},
{'address': '4801 N BROADWAY', 'date': '07/01/2012'},
{'address': '1039 W GRANVILLE', 'date': '07/04/2012'},
]
print (rows)
# Sort by the desired field first
rows.sort(key=itemgetter('date'))
print (rows)
for date, items in groupby(rows, key=itemgetter('date')):
    print(date)
    for i in items:
        print(' ', i)
The output of the above code is like:
[{'address': '5412 N CLARK', 'date': '07/01/2012'}, {'address': '4801 N BROADWAY', 'date': '07/01/2012'}, {'address': '5800 E 58TH', 'date': '07/02/2012'}, {'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'}, {'address': '1060 W ADDISON', 'date': '07/02/2012'}, {'address': '2122 N CLARK', 'date': '07/03/2012'}, {'address': '5148 N CLARK', 'date': '07/04/2012'}, {'address': '1039 W GRANVILLE', 'date': '07/04/2012'}]
07/01/2012
{'address': '5412 N CLARK', 'date': '07/01/2012'}
{'address': '4801 N BROADWAY', 'date': '07/01/2012'}
07/02/2012
{'address': '5800 E 58TH', 'date': '07/02/2012'}
{'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'}
{'address': '1060 W ADDISON', 'date': '07/02/2012'}
07/03/2012
{'address': '2122 N CLARK', 'date': '07/03/2012'}
07/04/2012
{'address': '5148 N CLARK', 'date': '07/04/2012'}
{'address': '1039 W GRANVILLE', 'date': '07/04/2012'}
The "address" is in front of the "date".
Why does the order of the keys change?
The order varies not because you've added a line of code, but because of hash randomization. Hash randomization mitigates DoS attacks that use carefully crafted sets of tens of thousands of keys that all hash to the same value, e.g. in an HTTP POST request.
If you want the order to remain constant, you need to use an OrderedDict from collections.
from collections import OrderedDict
row = OrderedDict([('address', '5412 N CLARK'), ('date', '07/01/2012')])
>>> row
OrderedDict([('address', '5412 N CLARK'), ('date', '07/01/2012')])
>>> row.keys()
['address', 'date']
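As a side note, on Python 3.7 and later plain dicts preserve insertion order, so the keys stay in the order the literal was written and OrderedDict is no longer required for this:
row = {'address': '5412 N CLARK', 'date': '07/01/2012'}
print(list(row.keys()))   # ['address', 'date']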
