I have the following output when getting data from an API:
{'Textbook': [{'Type': 'Chapters', 'Case': 'Ch09', 'Rates':
[{'Date': '2021- 04-23T00:00:00', 'Rate': 10.0}, {'Date': '2021-04-26T00:00:00', 'Rate': 10.0},
{'Date': '2021-04-27T00:00:00', 'Rate': 10.5}, {'Date': '2021-04-28T00:00:00', 'Rate': 10.5},
{'Date': '2021-04-29T00:00:00', 'Rate': 10.5}, {'Date': '2021-04-30T00:00:00', 'Rate': 10.0}]}]}
I am trying to get the following output in a dataframe:
Date Rate
2021- 04-23T00:00:00 10.0
2021-04-26T00:00:00 10.0
2021-04-27T00:00:00 10.5
etc
I tried the following code:
l = parsed  # this is the output from the API
df = pd.DataFrame()
for i in l:
    d1 = {}
    reportDate = []
    price = []
    for j in i['Chapters']:
        reportDate.append(j['Date'])
        price.append(j['Rate'])
    d1['Date'] = reportDate
    d1['Rate'] = price
    df = df.append(pd.DataFrame(d1))
df['Date'] = pd.to_datetime(df['Date'])
However, I get the following error on the line for j in i['Chapters']: string indices must be integers
The fix to your code below will solve your issue, although the answer by Andreas is the more Pythonic way!
import ast
import pandas as pd
# Data setup
raw_data="""
{'Textbook': [{'Type': 'Chapters', 'Case': 'Ch09', 'Rates':
[{'Date': '2021- 04-23T00:00:00', 'Rate': 10.0}, {'Date': '2021-04-26T00:00:00', 'Rate': 10.0},
{'Date': '2021-04-27T00:00:00', 'Rate': 10.5}, {'Date': '2021-04-28T00:00:00', 'Rate': 10.5},
{'Date': '2021-04-29T00:00:00', 'Rate': 10.5}, {'Date': '2021-04-30T00:00:00', 'Rate': 10.0}]}]}
"""
val=ast.literal_eval(raw_data) # eval to dictionary
The fix would be (please review the comments in the code):
l = val  # this is the output from the API; val is used in this example
reportDate = []  # moved out of the loop to collect the data
price = []  # moved out of the loop to collect the data
# df = pd.DataFrame()  # build the dataframe once all the data is ready
for i in l:  # l is a dictionary, so i is a key (a string)
    # d1 = {}  # not needed
    for j in l[i][0]['Rates']:
        reportDate.append(j['Date'])
        price.append(j['Rate'])
    # d1['Date'] = reportDate
    # d1['Rate'] = price
    # df = df.append(pd.DataFrame(d1))
    # df['Date'] = pd.to_datetime(df['Date'])
df = pd.DataFrame({'Date': reportDate, 'Rate': price})
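To see where the original error comes from: iterating over a dict yields its keys, so in the question's loop i is the string 'Textbook', and indexing a string with 'Chapters' raises the TypeError. A minimal sketch on a shortened copy of the data (the variable names keys, error_type and rates are only illustrative):

```python
# Shortened copy of the API output from the question.
parsed = {'Textbook': [{'Type': 'Chapters', 'Case': 'Ch09', 'Rates': [
    {'Date': '2021-04-26T00:00:00', 'Rate': 10.0},
    {'Date': '2021-04-27T00:00:00', 'Rate': 10.5},
]}]}

# Iterating over a dict yields its keys, which are strings here.
keys = [key for key in parsed]

# Indexing a string with another string is what raised the error.
try:
    keys[0]['Chapters']
    error_type = None
except TypeError as exc:
    error_type = type(exc).__name__

# Walking the structure explicitly reaches the list of rate records.
rates = parsed['Textbook'][0]['Rates']
```

Once rates is a plain list of {'Date': ..., 'Rate': ...} dicts, it can be handed to pd.DataFrame directly.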
You can try this:
d = {'Textbook': [{'Type': 'Chapters', 'Case': 'Ch09', 'Rates':
[{'Date': '2021- 04-23T00:00:00', 'Rate': 10.0}, {'Date': '2021-04-26T00:00:00', 'Rate': 10.0},
{'Date': '2021-04-27T00:00:00', 'Rate': 10.5}, {'Date': '2021-04-28T00:00:00', 'Rate': 10.5},
{'Date': '2021-04-29T00:00:00', 'Rate': 10.5}, {'Date': '2021-04-30T00:00:00', 'Rate': 10.0}]}]}
pd.DataFrame(d.get('Textbook')[0].get('Rates'))
# Date Rate
# 0 2021- 04-23T00:00:00 10.0
# 1 2021-04-26T00:00:00 10.0
# 2 2021-04-27T00:00:00 10.5
# 3 2021-04-28T00:00:00 10.5
# 4 2021-04-29T00:00:00 10.5
# 5 2021-04-30T00:00:00 10.0
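If the response could contain more than one entry under 'Textbook' (an assumption; the sample has only one), indexing with [0] would silently drop the rest. A sketch that collects the rates from every entry into one flat list of rows; the helper name collect_rates and the second entry 'Ch10' are made up for illustration:

```python
def collect_rates(payload):
    """Flatten every 'Rates' list under 'Textbook' into one list of rows."""
    rows = []
    for entry in payload.get('Textbook', []):
        for rate in entry.get('Rates', []):
            # Keep the 'Case' label so rows from different entries stay distinguishable.
            rows.append({'Case': entry.get('Case'), **rate})
    return rows


# Hypothetical payload with two entries; 'Ch10' is invented for this example.
sample = {'Textbook': [
    {'Type': 'Chapters', 'Case': 'Ch09', 'Rates': [
        {'Date': '2021-04-26T00:00:00', 'Rate': 10.0},
        {'Date': '2021-04-27T00:00:00', 'Rate': 10.5}]},
    {'Type': 'Chapters', 'Case': 'Ch10', 'Rates': [
        {'Date': '2021-04-26T00:00:00', 'Rate': 9.0}]},
]}

rows = collect_rates(sample)
# pd.DataFrame(rows) would then have the columns Case, Date and Rate.
```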
Can anyone help me convert this JSON format to a DataFrame? (updated)
JSON:
{'PSG.MC': [{'date': 1547452800,'formatted_date': '2019-01-14', 'amount': 0.032025}, {'date': 1554361200, 'formatted_date': '2019-04-04', 'amount': 0.032025}, {'date': 1562310000, 'formatted_date': '2019-07-05', 'amount': 0.032025}, {'date': 1570690800, 'formatted_date': '2019-10-10', 'amount': 0.032025}, {'date': 1578902400, 'formatted_date': '2020-01-13', 'amount': 0.033}, {'date': 1588057200, 'formatted_date': '2020-04-28', 'amount': 0.033}, {'date': 1595228400, 'formatted_date': '2020-07-20', 'amount': 0.033}, {'date': 1601362800, 'formatted_date': '2020-09-29', 'amount': 0.033}, {'date': 1603436400, 'formatted_date': '2020-10-23', 'amount': 0.033}], 'ACX.MC': [{'date': 1559545200, 'formatted_date': '2019-06-03', 'amount': 0.3}, {'date': 1562137200, 'formatted_date': '2019-07-03', 'amount': 0.2}, {'date': 1591254000, 'formatted_date': '2020-06-04', 'amount': 0.4}, {'date': 1594018800, 'formatted_date': '2020-07-06', 'amount': 0.1}, {'date': 1606809600, 'formatted_date': '2020-12-01', 'amount': 0.1}]}
I got it from, for example:
yahoo_financials.get_daily_dividend_data('2019-1-1', '2020-12-1')
I tried to convert it to a DataFrame with:
data2 = {"data": {'VIG.VI': [{'date'......................................
s=pd.DataFrame(data2)
pd.concat([s.drop('data',1),pd.DataFrame(s.data.tolist(),index=s.index)],1)
In this case I get a result like:
0 [{'date': 1433314500, 'formatted_date': '2015-... [{'date': 1430290500, 'formatted_date': '2015-...
Everything works if we use only one date and delete the [].
I also tried the code from this topic. It works fine if the format is the same for every variable in [], but if it is as in the example above, I get the error "arrays must all be same length".
Does anyone have any idea how I could convert this type of JSON to a DataFrame?
You can convert each list of dicts to a dict of lists, then convert the final dict to a DataFrame with multi-index columns:
import pandas as pd
from collections import defaultdict
data2 = {"data": {'PSG.MC': [{'date': 1547452800,'formatted_date': '2019-01-14', 'amount': 0.032025}, {'date': 1554361200, 'formatted_date': '2019-04-04', 'amount': 0.032025}, {'date': 1562310000, 'formatted_date': '2019-07-05', 'amount': 0.032025}, {'date': 1570690800, 'formatted_date': '2019-10-10', 'amount': 0.032025}, {'date': 1578902400, 'formatted_date': '2020-01-13', 'amount': 0.033}, {'date': 1588057200, 'formatted_date': '2020-04-28', 'amount': 0.033}, {'date': 1595228400, 'formatted_date': '2020-07-20', 'amount': 0.033}, {'date': 1601362800, 'formatted_date': '2020-09-29', 'amount': 0.033}, {'date': 1603436400, 'formatted_date': '2020-10-23', 'amount': 0.033}], 'ACX.MC': [{'date': 1559545200, 'formatted_date': '2019-06-03', 'amount': 0.3}, {'date': 1562137200, 'formatted_date': '2019-07-03', 'amount': 0.2}, {'date': 1591254000, 'formatted_date': '2020-06-04', 'amount': 0.4}, {'date': 1594018800, 'formatted_date': '2020-07-06', 'amount': 0.1}, {'date': 1606809600, 'formatted_date': '2020-12-01', 'amount': 0.1}]}}
data = {}
for key, values in data2['data'].items():
    res = defaultdict(list)
    # turn the list of dicts into a dict of lists
    for sub in values:
        for k in sub:
            res[k].append(sub[k])
    data[key] = dict(res)
def reform_dict(data):
    # flatten {outer: {inner: values}} into {(outer, inner): values}
    reformed_dict = {}
    for outerKey, innerDict in data.items():
        for innerKey, values in innerDict.items():
            reformed_dict[(outerKey, innerKey)] = values
    return reformed_dict
df = pd.concat([pd.DataFrame(reform_dict({key: value})) for key, value in data.items()], axis=1)
print(df)
PSG.MC ACX.MC
date formatted_date amount date formatted_date amount
0 1547452800 2019-01-14 0.032025 1.559545e+09 2019-06-03 0.3
1 1554361200 2019-04-04 0.032025 1.562137e+09 2019-07-03 0.2
2 1562310000 2019-07-05 0.032025 1.591254e+09 2020-06-04 0.4
3 1570690800 2019-10-10 0.032025 1.594019e+09 2020-07-06 0.1
4 1578902400 2020-01-13 0.033000 1.606810e+09 2020-12-01 0.1
5 1588057200 2020-04-28 0.033000 NaN NaN NaN
6 1595228400 2020-07-20 0.033000 NaN NaN NaN
7 1601362800 2020-09-29 0.033000 NaN NaN NaN
8 1603436400 2020-10-23 0.033000 NaN NaN NaN
Thank you for your code and help.
Here is my code. It works well and outputs a nice table with the needed data; maybe it will be helpful for someone:
import pandas as pd
from yahoofinancials import YahooFinancials  # assumes the yahoofinancials package is installed
# Fetch the dividends for the given tickers
def getDividends(tickers, start_date, end_date):
    yahoo_financials = YahooFinancials(tickers)
    dividends = yahoo_financials.get_daily_dividend_data(start_date, end_date)
    return dividends
# Build one table with a row per ticker and dividend
def getDividendDataFrame(tickerList):
    dividendList = getDividends(tickerList, '2015-1-1', '2020-12-1')
    dataFrame = pd.DataFrame()
    for ticker in dividendList:
        for dividend in dividendList[ticker]:
            series = pd.Series([ticker, dividend['formatted_date'], dividend['amount']])
            dfItem = pd.DataFrame([series])
            dataFrame = pd.concat([dataFrame, dfItem], ignore_index=True)
        print('\n')
    dataFrame.columns = ['Ticker', 'formatted_date', 'amount']
    return dataFrame
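Calling pd.concat inside the loop copies the growing frame on every iteration. A common alternative is to collect plain rows in a list and build the DataFrame once at the end. A minimal sketch on a shortened copy of the sample data; the helper name dividend_rows is hypothetical, and the yahoofinancials call is left out so the snippet runs on its own:

```python
def dividend_rows(dividends):
    """Return one (ticker, formatted_date, amount) tuple per dividend."""
    rows = []
    for ticker, entries in dividends.items():
        # "or []" guards against a ticker mapped to None (assumed possible, not confirmed).
        for entry in entries or []:
            rows.append((ticker, entry['formatted_date'], entry['amount']))
    return rows


# Shortened copy of the sample data from the question.
sample = {
    'PSG.MC': [{'date': 1547452800, 'formatted_date': '2019-01-14', 'amount': 0.032025},
               {'date': 1554361200, 'formatted_date': '2019-04-04', 'amount': 0.032025}],
    'ACX.MC': [{'date': 1559545200, 'formatted_date': '2019-06-03', 'amount': 0.3}],
}

rows = dividend_rows(sample)
# One constructor call then replaces the repeated concat:
# pd.DataFrame(rows, columns=['Ticker', 'formatted_date', 'amount'])
```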
I am new to Python and am trying to write a for loop that iterates over a large text file line by line, extracts specific regex matches, and adds them to a new CSV file. I am following code I found for a similar problem. My issue is that None values are being added to the dictionary despite the "is not None" check. The output CSV contains multiple blank rows because all of the None values are included in the list. Any help would be appreciated. Code below:
import re
import pandas as pd
list = []
fh = open(r"test_data.txt", "r").read()
contents = fh.split()
for item in contents:
    list_dict = {}
    date_field = re.search(r"(\d{1})[/.-](\d{1})[/.-](\d{4})$", item)
    if date_field is not None:
        date = date_field.group()
    else:
        date = None
    list_dict["date"] = date
    list.append(list_dict)
print(list)
df = pd.DataFrame(list)
df.to_csv("test_export_with_testdata.csv", index=False)
Output
[{'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': '2/5/2021'}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': '2/6/2021'}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': '2/7/2021'}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': '2/8/2021'}, {'date': None}, {'date': None}, {'date': None}, {'date': None}, {'date': None}]
Process finished with exit code 0
You are still running your append() if the value is None.
If you do not want to include the lines where the regex found no result, simply move all of the code into the if statement.
for item in contents:
    list_dict = {}
    date_field = re.search(r"(\d{1})[/.-](\d{1})[/.-](\d{4})$", item)
    if date_field:
        date = date_field.group()
        list_dict["date"] = date
        list.append(list_dict)
import re
import pandas as pd
list1 = []
fh = open(r"test_data.txt", "r").read()
contents = fh.split()
for item in contents:
    list_dict = {}
    date_field = re.search(r"(\d{1})[/.-](\d{1})[/.-](\d{4})$", item)
    if date_field is not None:
        date = date_field.group()
        list_dict["date"] = date
        list1.append(list_dict)
    else:
        date = None
    time_field = re.search(r"(\d{1,2})[:](\d{2})[:](\d{2})$", item)
    if time_field is not None:
        time = time_field.group()
        list_dict["time"] = time
        list1.append(list_dict)
print(list1)
df = pd.DataFrame(list1)
df.to_csv("test_export_with_testdata.csv", index=False)
Output:
date time
0 2/5/2021 NaN
1 NaN 10:41:45
2 2/6/2021 NaN
3 NaN 10:42:45
4 2/7/2021 NaN
5 NaN 10:43:45
6 2/8/2021 NaN
7 NaN 10:44:45
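The table above puts each date and its time on separate rows. If every time in the file follows the date it belongs to (an assumption about the input, suggested by the alternating output), the two can be paired into one row. A sketch; extract_rows and the sample text are made up for illustration, and the date pattern is widened to \d{1,2} so that two-digit days and months also match:

```python
import re

# Patterns are matched against whole whitespace-separated tokens.
DATE_RE = re.compile(r"\d{1,2}[/.-]\d{1,2}[/.-]\d{4}")
TIME_RE = re.compile(r"\d{1,2}:\d{2}:\d{2}")


def extract_rows(text):
    """Pair each date token with the next time token that follows it."""
    rows = []
    current_date = None
    for token in text.split():
        if DATE_RE.fullmatch(token):
            current_date = token
        elif TIME_RE.fullmatch(token) and current_date is not None:
            rows.append({"date": current_date, "time": token})
            current_date = None  # wait for the next date before pairing again
    return rows


# Hypothetical file contents, not taken from the question.
sample_text = "log start 2/5/2021 10:41:45 event A 2/6/2021 10:42:45 event B"
rows = extract_rows(sample_text)
```

Tokens that match neither pattern are skipped, so no None rows reach the list; pd.DataFrame(rows) then yields one date and one time column without NaN gaps.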
I want to rearrange a list of dictionaries by a new variable that is defined in terms of the existing ones.
For example,
security_info = [{'date': '19.04.15', 'price': 785000, 'trade': 79620},
{'date': '19.04.16', 'price': 785000, 'trade': 68203},
{'date': '19.04.17', 'price': 754000, 'trade': 165929},
{'date': '19.04.18', 'price': 779000, 'trade': 94462},
{'date': '19.04.19', 'price': 770000, 'trade': 76814},
{'date': '19.04.22', 'price': 774000, 'trade': 58079},
{'date': '19.04.23', 'price': 775000, 'trade': 79128},
{'date': '19.04.24', 'price': 771000, 'trade': 61650},
{'date': '19.04.25', 'price': 757000, 'trade': 111805},
{'date': '19.04.26', 'price': 764000, 'trade': 68237}]
I want to rearrange this list by 'net return', which is defined as 'price today / price yesterday * 100' (of course, the 'net return' of the first date does not exist).
But I don't want to solve this by adding a new key and value.
Thanks.
I should preface this by making a few comments:
This is not a great structure for these data. I would recommend looking into pandas, which can handle dates and would work well in this situation.
Making a new value for the net return would probably be a good idea, since this is what you are using to sort the values by.
Nevertheless, if you want to sort the values based on the net return, and the list is currently sorted by date (as in your example), then something like this could work:
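def net_return(entry):
    # position of this entry in the (date-ordered) list
    i = security_info.index(entry)
    if i > 0:
        # the constant factor of 100 is omitted because it does not change the order
        return security_info[i]['price'] / security_info[i-1]['price']
    else:
        # the first date has no net return, so sort it to the front
        return -1
sorted(security_info, key=net_return)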
This returns the following list:
[{'date': '19.04.15', 'price': 785000, 'trade': 79620},
{'date': '19.04.17', 'price': 754000, 'trade': 165929},
{'date': '19.04.25', 'price': 757000, 'trade': 111805},
{'date': '19.04.19', 'price': 770000, 'trade': 76814},
{'date': '19.04.24', 'price': 771000, 'trade': 61650},
{'date': '19.04.16', 'price': 785000, 'trade': 68203},
{'date': '19.04.23', 'price': 775000, 'trade': 79128},
{'date': '19.04.22', 'price': 774000, 'trade': 58079},
{'date': '19.04.26', 'price': 764000, 'trade': 68237},
{'date': '19.04.18', 'price': 779000, 'trade': 94462}]
Hope this helps.
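Calling security_info.index(entry) inside the key function rescans the list for every element and ties the function to a global. A sketch of an alternative that pairs each entry with the previous day's entry via zip and computes each return once, without adding a key to the dictionaries. The function name sort_by_net_return is hypothetical, the list is assumed to be in date order, and the first entry (which has no return) is kept at the front, as in the answer above:

```python
def sort_by_net_return(entries):
    """Sort date-ordered entries by price today / price yesterday * 100."""
    if not entries:
        return []
    # (net return, entry) pairs for every entry except the first.
    scored = [
        (today['price'] / yesterday['price'] * 100, today)
        for yesterday, today in zip(entries, entries[1:])
    ]
    scored.sort(key=lambda pair: pair[0])
    # The first entry has no net return, so it stays at the front.
    return [entries[0]] + [entry for _, entry in scored]


# First four entries of the list from the question.
sample = [
    {'date': '19.04.15', 'price': 785000, 'trade': 79620},
    {'date': '19.04.16', 'price': 785000, 'trade': 68203},
    {'date': '19.04.17', 'price': 754000, 'trade': 165929},
    {'date': '19.04.18', 'price': 779000, 'trade': 94462},
]

ordered = sort_by_net_return(sample)
```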
I am trying to work with a nested json and I am not reaching the result that I want.
I have a JSON data like this:
{'from_cache': True,
'results': [{'data': [{'date': '2019/06/01', 'value': 0},
{'date': '2019/06/02', 'value': 0},
{'date': '2019/08/09', 'value': 7087},
{'date': '2019/08/10', 'value': 0},
{'date': '2019/08/11', 'value': 15},
{'date': '2019/08/12', 'value': 14177},
{'date': '2019/08/13', 'value': 0}],
'name': 'Clicks'},
{'data': [{'date': '2019/06/01', 'value': 0.0},
{'date': '2019/06/02', 'value': 0.0},
{'date': '2019/06/03', 'value':1.0590561064390611},
{'date': '2019/08/11', 'value':1.8610421836228286},
{'date': '2019/08/12', 'value': 6.191613785151832},
{'date': '2019/08/13', 'value': 0.0}],
'name': 'Rate'}]}
The expected result is a dataframe like this:
date Clicks Rate
2019/06/01 0 0.0
2019/06/02 0 0.0
2019/08/09 7087 1.0590561064390611
As you can see, I want each 'name' as a dataframe column holding the respective 'value' entries.
I have been working with pd.io.json_normalize, but have not managed to get this result. The best I have reached is a dataframe with the columns date, value, and name.
Can someone help me with this?
If I understand correctly, use pd.concat along axis=1:
df = pd.concat([pd.DataFrame(k['data']).rename(columns={'value': k['name']}).set_index('date')
                for k in d['results']],
               sort=False,
               axis=1)
Clicks Rate
2019/06/01 0.0 0.000000
2019/06/02 0.0 0.000000
2019/08/09 7087.0 NaN
2019/08/10 0.0 NaN
2019/08/11 15.0 1.861042
2019/08/12 14177.0 6.191614
2019/08/13 0.0 0.000000
2019/06/03 NaN 1.059056
Another way with pivot_table
df = pd.concat([pd.DataFrame(x['data']).assign(column=x['name']) for x in d['results']])\
.pivot_table(columns='column', index='date', values='value')
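The same reshaping can be sketched without pandas, which may make it clearer what the concat and pivot_table calls are doing: every (date, name) pair becomes one cell, and a date missing from a series simply has no entry for that name. The function name pivot_results is hypothetical and the data is a shortened copy of the sample:

```python
from collections import defaultdict


def pivot_results(payload):
    """Return {date: {name: value}} built from payload['results']."""
    table = defaultdict(dict)
    for series in payload['results']:
        for point in series['data']:
            table[point['date']][series['name']] = point['value']
    # YYYY/MM/DD strings sort chronologically when sorted as text.
    return dict(sorted(table.items()))


# Shortened copy of the JSON from the question.
payload = {'from_cache': True, 'results': [
    {'data': [{'date': '2019/06/01', 'value': 0},
              {'date': '2019/08/09', 'value': 7087}],
     'name': 'Clicks'},
    {'data': [{'date': '2019/06/01', 'value': 0.0},
              {'date': '2019/06/03', 'value': 1.0590561064390611}],
     'name': 'Rate'},
]}

table = pivot_results(payload)
```

Passing the result to pd.DataFrame.from_dict(table, orient='index') would give a frame indexed by date, with NaN in the cells that have no entry.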
Without loops:
import pandas as pd
import matplotlib.pyplot as plt
# pd.json_normalize replaces the deprecated pandas.io.json.json_normalize (pandas >= 1.0)
df = pd.json_normalize(data['results'], record_path=['data'], meta=['name'])
df['date'] = pd.to_datetime(df['date'])
df_clicks = df[df['name'] == 'Clicks'].drop('name', axis=1).rename(columns={'value': 'Clicks'})
df_rate = df[df['name'] == 'Rate'].drop('name', axis=1).rename(columns={'value': 'Rate'})
df_final = df_clicks.merge(df_rate, how='outer', sort=True)
df_final.set_index('date', drop=True, inplace=True)
Unexpected data in the result:
2019-06-03: a rate with no clicks
2019-08-09: clicks, but no rate
Plot it:
df_final.plot(kind='bar', logy=True)
plt.show()
Suggested new JSON format:
data = {'from_cache': True,
'results': [{'date': '2019/06/01', 'Clicks': 0, 'Rate': 0},
{'date': '2019/06/02', 'Clicks': 0, 'Rate': 0},
{'date': '2019/06/03', 'Clicks': 0, 'Rate': 1.0590561064390611},
{'date': '2019/08/09', 'Clicks': 7087, 'Rate': 0},
{'date': '2019/08/10', 'Clicks': 0, 'Rate': 0},
{'date': '2019/08/11', 'Clicks': 15, 'Rate': 1.8610421836228286},
{'date': '2019/08/12', 'Clicks': 14177, 'Rate': 6.191613785151832},
{'date': '2019/08/13', 'Clicks': 0, 'Rate': 0}]}
This question already has answers here:
Remove duplicate dict in list in Python
(16 answers)
Closed 6 years ago.
[
{'date': '08/11/2016', 'duration': 13.0},
{'date': '08/17/2016', 'duration': 5.0},
{'date': '08/01/2016', 'duration': 5.2},
{'date': '08/11/2016', 'duration': 13.0},
{'date': '08/11/2016', 'duration': 13.0},
{'date': '08/11/2016', 'duration': 13.0}
]
If the data looks like the list above, one simple (though not very efficient) solution is:
a = [{'date': '08/11/2016', 'duration': 13.0}, {'date': '08/17/2016', 'duration': 5.0}, {'date': '08/01/2016', 'duration': 5.2}, {'date': '08/11/2016', 'duration': 13.0}, {'date': '08/11/2016', 'duration': 13.0}, {'date': '08/11/2016', 'duration': 13.0}]
b = []
for c in a:
    if c in b:
        continue
    b.append(c)
print(b)
One way is to use FrozenDict, which makes the dicts hashable so that you can perform set-like operations on them. However, it is not part of the Python standard library.
Alternatively, copy the entries from your list into a new list, checking before adding each value. Below is the sample code:
my_list = [{'date': '08/11/2016', 'duration': 13.0}, {'date': '08/17/2016', 'duration': 5.0}, {'date': '08/01/2016', 'duration': 5.2}, {'date': '08/11/2016', 'duration': 13.0}, {'date': '08/11/2016', 'duration': 13.0}, {'date': '08/11/2016', 'duration': 13.0}]
new_list = []
for item in my_list:
    if item not in new_list:
        new_list.append(item)
# new_list = [{'date': '08/11/2016', 'duration': 13.0}, {'date': '08/17/2016', 'duration': 5.0}, {'date': '08/01/2016', 'duration': 5.2}]
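Both loops above compare each item against every item already kept, which is quadratic in the length of the list. A sketch of a linear-time variant that turns each dict into a hashable key and tracks the keys already seen in a set. It assumes all values are hashable and all keys are strings (so they can be sorted); the name dedupe_dicts is made up for illustration:

```python
def dedupe_dicts(items):
    """Drop duplicate dicts while keeping the first occurrence of each."""
    seen = set()
    unique = []
    for item in items:
        # Sorting the items makes the key independent of insertion order.
        key = tuple(sorted(item.items()))
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


my_list = [
    {'date': '08/11/2016', 'duration': 13.0},
    {'date': '08/17/2016', 'duration': 5.0},
    {'date': '08/01/2016', 'duration': 5.2},
    {'date': '08/11/2016', 'duration': 13.0},
    {'date': '08/11/2016', 'duration': 13.0},
]

new_list = dedupe_dicts(my_list)
```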