Pandas Dataframe slice fillna value not being assigned (not working) - python

I tried this
values = {'BsmtQual':'None','BsmtCond':'None', 'BsmtExposure':'None', 'BsmtFinType1':'None', 'BsmtFinType2':'None'}
df_test.loc[:, ('BsmtQual','BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2')].fillna(value=values, inplace=True)
and this
values = {'BsmtQual':'None','BsmtCond':'None', 'BsmtExposure':'None', 'BsmtFinType1':'None', 'BsmtFinType2':'None'}
df_test.loc[:, ['BsmtQual','BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2']].fillna(value=values, inplace=True)
and this
values = {'BsmtQual':'None','BsmtCond':'None', 'BsmtExposure':'None', 'BsmtFinType1':'None', 'BsmtFinType2':'None'}
df_test[['BsmtQual','BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2']].fillna(value=values, inplace=True)
just this
df_test[['BsmtQual','BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2']].fillna('None', inplace=True)
one line with .loc
df_test.loc[:, ['BsmtQual','BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2']].fillna('None', inplace=True)
and nothing worked! Please help me out

You need to assign the result back. Selecting columns with [] or .loc returns a copy here, so fillna(..., inplace=True) fills that temporary copy rather than df_test. For example:
df_test.loc[:, ('BsmtQual','BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2')] = df_test.loc[:, ('BsmtQual','BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2')].fillna(value=values)
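As a minimal alternative sketch: because the values dict maps column names to fill values, fillna can also be called on the whole frame and will only touch those columns (assuming df_test is the frame you want to modify in place):
df_test.fillna(value=values, inplace=True)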

Related

Group duplicate rows with different column values then send to csv

I have this csv file favsites.csv:
Emails               Favorite Site
batman@email.com     something.com
batman@email.com     hamburgers.com
poisonivy@email.com  yonder.com
superman@email.com   cookies.com
catgirl@email.com    cattreats.com
catgirl@email.com    fishcaviar.com
catgirl@email.com    elegantfashion.com
joker@email.com      cards.com
supergirl@email.com  nailart.com
I want to group the duplicates, then merge the columns, and then send to a csv.
So once grouped and merged it should look like this:
Emails               Favorite Site
batman@email.com     something.com
                     hamburgers.com
poisonivy@email.com  yonder.com
superman@email.com   cookies.com
catgirl@email.com    cattreats.com
                     fishcaviar.com
                     elegantfashion.com
joker@email.com      cards.com
supergirl@email.com  nailart.com
How would I send this to a CSV file and have it look like this, with something.com and hamburgers.com in one cell for batman, and cattreats.com, fishcaviar.com, and elegantfashion.com in one cell for catgirl? Or, have them in the same row but in different columns, like this:
Emails               Favorite Site
batman@email.com     something.com  hamburgers.com
poisonivy@email.com  yonder.com
superman@email.com   cookies.com
catgirl@email.com    cattreats.com  fishcaviar.com  elegantfashion.com
joker@email.com      cards.com
supergirl@email.com  nailart.com
Here is my code so far:
import pandas as pd
Dir='favsites.csv'
sendcsv='mergednames.csv'
df = pd.read_csv(Dir)
df = pd.DataFrame(df)
df_sort = df.sort_values('Emails')
grouped = df_sort.groupby(['Emails', 'Favorite Site']).agg('sum')
When I print grouped it shows:
Empty DataFrame
Columns: []
Index: [(batman@email.com, hamburgers.com), (batman@email.com, something.com), (catgirl@email.com, cattreats.com), (catgirl@email.com, elegantfashion.com), (catgirl@email.com, fishcaviar.com), (joker@email.com, cards.com), (poisonivy@email.com, yonder.com), (supergirl@email.com, nailart.com), (superman@email.com, cookies.com)]
You can replace duplicated values with empty strings:
emails = ['batman@email.com', 'poisonivy@email.com', 'superman@email.com', 'batman@email.com']
favs = ['something.com', 'hamburgers.com', 'yonder.com', 'cookies.com']
df = pd.DataFrame({'Emails': emails, 'Favorite Site': favs})
df_sorted = df.sort_values('Emails')
df_sorted.loc[df['Emails'].duplicated(), 'Emails'] = ''
Output:
Emails               Favorite Site
batman@email.com     something.com
                     cookies.com
poisonivy@email.com  hamburgers.com
superman@email.com   yonder.com
IIUC, you can use pandas.Series.str.ljust and pandas.DataFrame.to_csv with a tab (\t) as the separator:
df.loc[df["Emails"].duplicated(), "Emails"] = ""
len_emails = df["Emails"].str.len().max()
len_sites = df["Favorite Site"].str.len().max()
df = df.T.reset_index().T.reset_index(drop=True)
df[0] = df[0].str.ljust(len_emails)
df[1] = df[1].str.ljust(len_sites)
df.to_csv("/tmp/out1.csv", index=False, header=False, sep="\t")
Output (notepad) :
For the second format, you can use pandas.DataFrame.groupby:
df = (
    pd.read_csv("/tmp/input.csv", sep="\s\s+", engine="python")
      .groupby("Emails", as_index=False, sort=False).agg(",".join)
      .T.reset_index().T.reset_index(drop=True)
      .pipe(lambda d: d[[0]].join(d[1].str.split(",", expand=True), rsuffix="_"))
      .pipe(lambda d: pd.concat([d[col].str.ljust(d[col].fillna("").str.len().max().sum())
                                 for col in d.columns], axis=1))
)
df.to_csv("/tmp/out2.csv", index=False, header=False, sep="\t")
Output (notepad) :
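For the asker's first target layout (all of a person's sites in one cell), a simpler sketch, assuming the raw two-column frame from favsites.csv is already loaded as df:
merged = (df.groupby('Emails', as_index=False, sort=False)['Favorite Site']
            .agg(', '.join))
merged.to_csv('mergednames.csv', index=False)  # mergednames.csv is the asker's output file name
Each email then appears once, with its favorite sites joined into a single comma-separated cell.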

Select specific dates from a data frame in python using MODIS data in NETCDF4

I am not well-versed in Python, and I'm sure there is a simple solution to this (although I have looked). I got this code from an LP DAAC tutorial.
My input is a NetCDF4 file downloaded from the MODIS satellite. Printing the metadata of the file returns the variables:
file_in = Dataset(file_list[0], 'r', format = 'NETCDF4')
#print metadata
list(file_in.variables)
Out[19]: ['crs', 'time', 'lat', 'lon', '_1_km_16_days_EVI', '_1_km_16_days_VI_Quality']
I want to convert the time variable to date format, and then only select 1 date from each year. Here is the code to convert to date format:
from netCDF4 import num2date
times = file_in.variables["time"] #import time variables
dates = num2date(times[:], times.units) #get the time info
dates = [date.strftime("%Y-%m-%d") for date in dates] #get the list of datetime
print(dates)
The dates are as follows:
['2000-06-25', '2000-07-11', '2000-07-27', '2000-08-12', '2000-08-28', '2000-09-13', '2000-09-29', '2000-10-15', '2000-10-31', '2000-11-16', '2000-12-02', '2000-12-18', '2001-01-01', '2001-01-17', '2001-02-02', '2001-02-18', '2001-03-06', '2001-03-22', '2001-04-07', '2001-04-23', '2001-05-09', '2001-05-25', '2001-06-10', '2001-06-26', '2001-07-12', '2001-07-28', '2001-08-13', '2001-08-29', '2001-09-14', '2001-09-30', '2001-10-16', '2001-11-01', '2001-11-17', '2001-12-03', '2001-12-19', '2002-01-01', '2002-01-17', '2002-02-02', '2002-02-18', '2002-03-06', '2002-03-22', '2002-04-07', '2002-04-23', '2002-05-09', '2002-05-25', '2002-06-10', '2002-06-26', '2002-07-12', '2002-07-28', '2002-08-13', '2002-08-29', '2002-09-14', '2002-09-30', '2002-10-16', '2002-11-01', '2002-11-17', '2002-12-03', '2002-12-19', '2003-01-01', '2003-01-17', '2003-02-02', '2003-02-18', '2003-03-06', '2003-03-22', '2003-04-07', '2003-04-23', '2003-05-09', '2003-05-25', '2003-06-10', '2003-06-26', '2003-07-12', '2003-07-28', '2003-08-13', '2003-08-29', '2003-09-14', '2003-09-30', '2003-10-16', '2003-11-01', '2003-11-17', '2003-12-03', '2003-12-19', '2004-01-01', '2004-01-17', '2004-02-02', '2004-02-18', '2004-03-05', '2004-03-21', '2004-04-06', '2004-04-22', '2004-05-08', '2004-05-24', '2004-06-09', '2004-06-25', '2004-07-11', '2004-07-27', '2004-08-12', '2004-08-28', '2004-09-13', '2004-09-29', '2004-10-15', '2004-10-31', '2004-11-16', '2004-12-02', '2004-12-18', '2005-01-01', '2005-01-17', '2005-02-02', '2005-02-18', '2005-03-06', '2005-03-22', '2005-04-07', '2005-04-23', '2005-05-09', '2005-05-25', '2005-06-10', '2005-06-26', '2005-07-12', '2005-07-28', '2005-08-13', '2005-08-29', '2005-09-14', '2005-09-30', '2005-10-16', '2005-11-01', '2005-11-17', '2005-12-03', '2005-12-19', '2006-01-01', '2006-01-17', '2006-02-02', '2006-02-18', '2006-03-06', '2006-03-22', '2006-04-07', '2006-04-23', '2006-05-09', '2006-05-25', '2006-06-10', '2006-06-26', '2006-07-12', '2006-07-28', '2006-08-13', '2006-08-29', '2006-09-14', '2006-09-30', '2006-10-16', '2006-11-01', '2006-11-17', '2006-12-03', '2006-12-19', '2007-01-01', '2007-01-17', '2007-02-02', '2007-02-18', '2007-03-06', '2007-03-22', '2007-04-07', '2007-04-23', '2007-05-09', '2007-05-25', '2007-06-10', '2007-06-26', '2007-07-12', '2007-07-28', '2007-08-13', '2007-08-29', '2007-09-14', '2007-09-30', '2007-10-16', '2007-11-01', '2007-11-17', '2007-12-03', '2007-12-19', '2008-01-01', '2008-01-17', '2008-02-02', '2008-02-18', '2008-03-05', '2008-03-21', '2008-04-06', '2008-04-22', '2008-05-08', '2008-05-24', '2008-06-09', '2008-06-25', '2008-07-11', '2008-07-27', '2008-08-12', '2008-08-28', '2008-09-13', '2008-09-29', '2008-10-15', '2008-10-31', '2008-11-16', '2008-12-02', '2008-12-18', '2009-01-01', '2009-01-17', '2009-02-02', '2009-02-18', '2009-03-06', '2009-03-22', '2009-04-07', '2009-04-23', '2009-05-09', '2009-05-25', '2009-06-10', '2009-06-26', '2009-07-12', '2009-07-28', '2009-08-13', '2009-08-29', '2009-09-14', '2009-09-30', '2009-10-16', '2009-11-01', '2009-11-17', '2009-12-03', '2009-12-19', '2010-01-01', '2010-01-17', '2010-02-02', '2010-02-18', '2010-03-06', '2010-03-22', '2010-04-07', '2010-04-23', '2010-05-09', '2010-05-25', '2010-06-10', '2010-06-26', '2010-07-12', '2010-07-28', '2010-08-13', '2010-08-29', '2010-09-14', '2010-09-30', '2010-10-16', '2010-11-01', '2010-11-17', '2010-12-03', '2010-12-19', '2011-01-01', '2011-01-17', '2011-02-02', '2011-02-18', '2011-03-06', '2011-03-22', '2011-04-07', '2011-04-23', '2011-05-09', '2011-05-25', '2011-06-10', 
'2011-06-26', '2011-07-12', '2011-07-28', '2011-08-13', '2011-08-29', '2011-09-14', '2011-09-30', '2011-10-16', '2011-11-01', '2011-11-17', '2011-12-03', '2011-12-19', '2012-01-01', '2012-01-17', '2012-02-02', '2012-02-18', '2012-03-05', '2012-03-21', '2012-04-06', '2012-04-22', '2012-05-08', '2012-05-24', '2012-06-09', '2012-06-25', '2012-07-11', '2012-07-27', '2012-08-12', '2012-08-28', '2012-09-13', '2012-09-29', '2012-10-15', '2012-10-31', '2012-11-16', '2012-12-02', '2012-12-18', '2013-01-01', '2013-01-17', '2013-02-02', '2013-02-18', '2013-03-06', '2013-03-22', '2013-04-07', '2013-04-23', '2013-05-09', '2013-05-25', '2013-06-10', '2013-06-26', '2013-07-12', '2013-07-28', '2013-08-13', '2013-08-29', '2013-09-14', '2013-09-30', '2013-10-16', '2013-11-01', '2013-11-17', '2013-12-03', '2013-12-19', '2014-01-01', '2014-01-17', '2014-02-02', '2014-02-18', '2014-03-06', '2014-03-22', '2014-04-07', '2014-04-23', '2014-05-09', '2014-05-25', '2014-06-10', '2014-06-26', '2014-07-12', '2014-07-28', '2014-08-13', '2014-08-29', '2014-09-14', '2014-09-30', '2014-10-16', '2014-11-01', '2014-11-17', '2014-12-03', '2014-12-19', '2015-01-01', '2015-01-17', '2015-02-02', '2015-02-18', '2015-03-06', '2015-03-22', '2015-04-07', '2015-04-23', '2015-05-09', '2015-05-25', '2015-06-10', '2015-06-26', '2015-07-12', '2015-07-28', '2015-08-13', '2015-08-29', '2015-09-14', '2015-09-30', '2015-10-16', '2015-11-01', '2015-11-17', '2015-12-03', '2015-12-19', '2016-01-01', '2016-01-17', '2016-02-02', '2016-02-18', '2016-03-05', '2016-03-21', '2016-04-06', '2016-04-22', '2016-05-08', '2016-05-24', '2016-06-09', '2016-06-25', '2016-07-11', '2016-07-27', '2016-08-12', '2016-08-28', '2016-09-13', '2016-09-29', '2016-10-15', '2016-10-31', '2016-11-16', '2016-12-02', '2016-12-18', '2017-01-01', '2017-01-17', '2017-02-02', '2017-02-18', '2017-03-06', '2017-03-22', '2017-04-07', '2017-04-23', '2017-05-09', '2017-05-25', '2017-06-10', '2017-06-26', '2017-07-12', '2017-07-28', '2017-08-13', '2017-08-29', '2017-09-14', '2017-09-30', '2017-10-16', '2017-11-01', '2017-11-17', '2017-12-03', '2017-12-19', '2018-01-01', '2018-01-17', '2018-02-02', '2018-02-18', '2018-03-06', '2018-03-22', '2018-04-07', '2018-04-23', '2018-05-09', '2018-05-25', '2018-06-10', '2018-06-26', '2018-07-12', '2018-07-28', '2018-08-13', '2018-08-29', '2018-09-14', '2018-09-30', '2018-10-16', '2018-11-01', '2018-11-17', '2018-12-03', '2018-12-19', '2019-01-01', '2019-01-17', '2019-02-02', '2019-02-18', '2019-03-06', '2019-03-22', '2019-04-07', '2019-04-23', '2019-05-09', '2019-05-25', '2019-06-10', '2019-06-26', '2019-07-12', '2019-07-28', '2019-08-13', '2019-08-29', '2019-09-14', '2019-09-30', '2019-10-16', '2019-11-01', '2019-11-17', '2019-12-03', '2019-12-19', '2020-01-01', '2020-01-17', '2020-02-02', '2020-02-18', '2020-03-05', '2020-03-21', '2020-04-06', '2020-04-22', '2020-05-08', '2020-05-24', '2020-06-09', '2020-06-25', '2020-07-11', '2020-07-27']
And these are the dates I want in the data frame:
dates = ['2000-07-11', '2001-07-12', '2002-07-12', '2003-07-12', '2004-07-11',
'2005-07-12', '2006-07-12', '2007-07-12', '2008-07-11', '2009-07-12',
'2010-07-12', '2011-07-12', '2012-07-11', '2013-07-12', '2014-07-12',
'2015-07-12', '2016-07-11', '2017-07-12', '2018-07-12', '2019-07-12',
'2020-07-11']
I tried just defining a new dates data frame, but I think that caused problems for me later in the code, so I would like to just subset the first dates data frame if there is an easy way to do it.
Thank you for your help
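One simple way to subset, sketched under the assumption that dates is the full list of date strings built above and that time is the first dimension of the EVI variable: collect the positions of the wanted dates and use them to index the arrays.
wanted = ['2000-07-11', '2001-07-12', '2002-07-12']  # ...one mid-July date per year, as listed above
idx = [i for i, d in enumerate(dates) if d in wanted]   # positions of the wanted dates
dates_subset = [dates[i] for i in idx]                  # the matching date strings
evi_subset = file_in.variables['_1_km_16_days_EVI'][idx, :, :]  # assumes dims are (time, lat, lon)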

Transform data to growth rates in Python

I have two variables and I want to express one of them (base_monetaria) as monthly growth.
How can I do that? In R you would first transform the data into a time series; is that also the case in Python?
# Fetching the series we need
inflacion = llamada_api('https://api.estadisticasbcra.com/inflacion_mensual_oficial')
base_monetaria = llamada_api('https://api.estadisticasbcra.com/base')
# Building DataFrames
df = pd.DataFrame(inflacion)
df_bm = pd.DataFrame(base_monetaria)
# Renaming columns
df = df.rename(columns={'d': 'Fecha',
                        'v': 'IPC'})
df_bm = df_bm.rename(columns={'d': 'Fecha',
                              'v': 'base_monetaria'})
# Fixing data types
df['Fecha'] = pd.to_datetime(df['Fecha'])
df_bm['Fecha'] = pd.to_datetime(df_bm['Fecha'])
# Checking that the dates are in date format
df['Fecha'].dtype
df_bm['Fecha'].dtype
# Filtering
df_ipc = df[(df['Fecha'] > '2002-12-31')]
df_bm_filter = df_bm[(df_bm['Fecha'] > '2002-12-31')]
# Plotting
plt.figure(figsize=(14,12))
df_ipc.plot(x='Fecha', y='IPC')
plt.title('IPC-Mensual', fontdict={'fontsize':20})
plt.ylabel('IPC')
plt.xticks(rotation=45)
plt.show()
The data looks like this
Fecha base_monetaria
1748 2003-01-02 29302
1749 2003-01-03 29360
1750 2003-01-06 29524
1751 2003-01-07 29867
1752 2003-01-08 29957
... ...
5966 2020-02-18 1941302
5967 2020-02-19 1941904
5968 2020-02-20 1887975
5969 2020-02-21 1855477
5970 2020-02-26 1807042
The idea is to take the data for the last day of the month and calculate the growth rate with the data for the last day of the previous month.
You can try something like this:
from pandas.tseries.offsets import MonthEnd
import pandas as pd

df = pd.DataFrame({'Fecha': ['2020-01-31', '2020-02-29', '2020-03-31', '2020-05-31', '2020-04-30', '2020-07-31', '2020-06-30', '2020-08-31', '2020-09-30', '2020-10-31', '2020-11-30', '2020-12-31'],
                   'price': ['32132', '54321', '3213121', '432123', '32132', '54321', '32132', '54321', '3213121', '432123', '32132', '54321']})
df['Fecha'] = df['Fecha'].astype('datetime64[ns]')
df['is_month_end'] = df['Fecha'].dt.is_month_end
df = df[df['is_month_end'] == True]
df.sort_values('Fecha', inplace=True)
df.reset_index(drop=True, inplace=True)

def change(x, y):
    try:
        index = df[df['Fecha'] == y].index.item()
        last = df.loc[index-1][1]
        return float(x)/float(last)
    except:
        return 0

df['new_column'] = df.apply(lambda row: change(row['price'], row['Fecha']), axis=1)
df.head(12)
Assuming base_monetaria is a monthly cumulative value, then:
df = pd.DataFrame({'Fecha': ['2020-01-31', '2020-02-29', '2020-03-31', '2020-05-31', '2020-04-30', '2020-07-31', '2020-06-30', '2020-08-31', '2020-09-30', '2020-10-31', '2020-11-30', '2020-12-31'],
                   'price': [32132, 54321, 3213121, 432123, 32132, 54321, 32132, 54321, 3213121, 432123, 32132, 54321]})
df['Fecha'] = pd.to_datetime(df['Fecha'])
df.set_index('Fecha', inplace=True)
new_df = df.groupby(pd.Grouper(freq="M")).tail(1).reset_index()
new_df['rate'] = (new_df['price'] - new_df['price'].shift(1)) / new_df['price'].shift(1)
The new_df['rate'] will give you the growth rate the way you explained in the comment below
The problem can be solved by creating a column with the lagged values of base_monetaria:
df_bm_filter['is_month_end'] = df_bm_filter['Fecha'].dt.is_month_end
df_last_date = df_bm_filter[df_bm_filter['is_month_end'] == True]
df_last_date['base_monetaria_lag'] = df_last_date['base_monetaria'].shift(1)
df_last_date['bm_growth'] = (df_last_date['base_monetaria'] - df_last_date['base_monetaria_lag']) / df_last_date['base_monetaria_lag']
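The same month-over-month rate can also be written with pandas' built-in pct_change, which does the shift-and-divide in one step (a sketch, reusing df_last_date from above):
df_last_date['bm_growth'] = df_last_date['base_monetaria'].pct_change()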

Matplotlib Auto Annotate max doesn't annotate and pushes chart off page

Hello, I'm trying to auto-annotate a matplotlib chart.
I've managed to write it in a way that doesn't give me any errors when I run it.
However, it doesn't plot the annotation, and as I'm plotting in Jupyter notebooks it pushes the plot right off the page.
The result I'm looking for is an automatically placed annotation pointing to the max value in the ppc_rolling_7d series on the chart.
I'm kind of out of ideas as to what has happened here.
Example data:
ppc_data = pd.DataFrame({
'Day':['2018-08-31', '2018-09-01', '2018-09-02', '2018-09-03',
'2018-09-04', '2018-09-05', '2018-09-06', '2018-09-07',
'2018-09-08', '2018-09-09', '2018-09-10', '2018-09-11',
'2018-09-12', '2018-09-13', '2018-09-14', '2018-09-15',
'2018-09-16', '2018-09-17', '2018-09-18', '2018-09-19',
'2018-09-20', '2018-09-21', '2018-09-22', '2018-09-23',
'2018-09-24', '2018-09-25', '2018-09-26', '2018-09-27',
'2018-09-28', '2018-09-29', '2018-09-30', '2018-10-01',
'2018-10-02', '2018-10-03', '2018-10-04', '2018-10-05',
'2018-10-06', '2018-10-07', '2018-10-08', '2018-10-09',
'2018-10-10', '2018-10-11', '2018-10-12', '2018-10-13',
'2018-10-14', '2018-10-15', '2018-10-16', '2018-10-17',
'2018-10-18', '2018-10-19', '2018-10-20', '2018-10-21',
'2018-10-22', '2018-10-23', '2018-10-24', '2018-10-25',
'2018-10-26', '2018-10-27', '2018-10-28', '2018-10-29',
'2018-10-30', '2018-10-31', '2018-11-01', '2018-11-02',
'2018-11-03', '2018-11-04', '2018-11-05', '2018-11-06',
'2018-11-07', '2018-11-08', '2018-11-09', '2018-11-10',
'2018-11-11', '2018-11-12', '2018-11-13', '2018-11-14',
'2018-11-15', '2018-11-16', '2018-11-17', '2018-11-18',
'2018-11-19', '2018-11-20', '2018-11-21', '2018-11-22',
'2018-11-23', '2018-11-24', '2018-11-25', '2018-11-26',
'2018-11-27', '2018-11-28', '2018-11-29', '2018-11-30',
'2018-12-01', '2018-12-02', '2018-12-03', '2018-12-04',
'2018-12-05', '2018-12-06', '2018-12-07', '2018-12-08'],
'Cost' : [1105.8097834013993, 1035.8355715930172, 2335.4700418958632,
655.0721024605979, 1154.3067936459986, 2275.8927050269917,
174.47816810392712,1606.0865381579742,973.1285739075876,
677.3734705782231,2381.149891233519, 1137.840620239881,
673.0575320194132, 1969.3783478235364, 1667.3405411738886,
1365.707089062391, 1686.492803446683, 1613.2530220414621,
2275.475164597224, 1593.9382082221036, 1278.8267306408893,
1342.2964464944962, 863.9840442789089, 289.34425736432837,
15.219941807702485, 1595.2327617943374, 1592.8333476628231,
961.5931139385652, 703.2690737772505, 312.9730830647801,
2105.920303495205, 707.710807657391, 873.7377744639931,
152.51387772605813, 1292.4027169055073, 1142.7323830723421,
2400.462099397225, 2027.5730000421765, 2380.127923249452,
370.97680360266463, 978.7472607817784, 144.50724935561453,
1257.3962926696906, 339.44922335906256, 989.3364341529344,
1274.7020560588671, 1697.9640365081489, 81.00819304765376,
528.9126509191693, 893.839100786781, 1778.7263797734338,
1388.1976452584615, 533.7823940180391, 1390.507110740847,
1582.8069647428326, 2058.124928605663, 1456.0037174730746,
315.93672830017414,488.9620970966599, 2020.6125475658266,
1358.8988386729175,1967.1442608919235,436.40540549351783,
2090.41730824453,2114.3435803364277,2235.719648814769,
1773.3190866160382,2372.165649889117, 1186.850504563462,
864.4092140750176, 772.6148714908818,1749.9856862684244,
802.1475898419487, 1013.3410373277948, 1604.4137362997474,
1880.084707526689, 1823.9691856540412,550.6041906641643,
75.26104973616485, 819.9409527114842, 2272.8529542934198,
1836.7071931445969,1491.3728333359875, 1807.2130424285615,
2378.1185581431337,1434.1809462567153,296.49945129452675,
2025.2054514729998,2346.234514785023, 2438.058561262957,
277.36529451533386, 1212.541281523483,2005.258496330315,
2053.7325650486177,2076.001012737591, 2245.606468047353,
2493.336539619115,1116.075112703116,319.54750552662733,
648.633853658328]}
).set_index('Day')
ppc_data.index = pd.to_datetime(ppc_data.index)
ppc_weekly = ppc_data['Cost'].resample('W').mean()
ppc_rolling_7d = ppc_data['Cost'].rolling(window=7, center=True).mean()
ax = fig.add_subplot(111)
figsize = (15,8)
ppc_data['Cost'].plot(figsize=figsize,
                      alpha=.5,
                      marker='.',
                      linestyle='-',
                      linewidth=0.5,
                      label='Daily')
ppc_weekly.plot(figsize=figsize,
                marker='x',
                markersize=8,
                linestyle='-',
                label='Weekly Mean Resample')
ppc_rolling_7d.plot(figsize=figsize,
                    marker='o',
                    linestyle='-',
                    label='7-d Rolling Mean')
max_value = ppc_rolling_7d.max()
max_value_index = [i for i, j in enumerate(ppc_rolling_7d) if j == max_value]
# Create ax customisations
ax.annotate('Lots of Pageviews but few clicks',
            xy=(max_value_index[0], max_value),
            xytext=(max_value_index[0], max_value),
            arrowprops=dict(facecolor='cyan',  # colour
                            shrink=0.05,       # length of arrow
                            lw=1,              # line width
                            ec='magenta',      # border colour
                            zorder=1))         # layering order of annotation
# Global plot settings
plt.title('COMPARE: Daily, Weekly Mean, 7-d Rolling Mean')  # set chart name
fig.legend()  # set the legend
# Display the charts
plt.show()
Any suggestions as to what could be the problem are welcome.
Thanks to ImportanceOfBeingErnest, who commented with the answer: the x-axis of the plot is in date units, so annotating at an integer position lands far outside the plotted range, which is what stretches the figure and hides the arrow.
Using idxmax() will quickly find the index (the date) of the max value:
x = ppc_rolling_7d.idxmax(); y = ppc_rolling_7d.max()
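A minimal sketch of how the annotation could then be placed. Plotting through matplotlib directly keeps the x-axis in datetime units, so the Timestamp returned by idxmax() can be passed straight to annotate(); the 1.1 factor for the text position is just an assumption to keep the label readable:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(15, 8))
ax.plot(ppc_rolling_7d.index, ppc_rolling_7d.values, marker='o', label='7-d Rolling Mean')
x = ppc_rolling_7d.idxmax()   # Timestamp of the maximum
y = ppc_rolling_7d.max()      # value at the maximum
ax.annotate('Lots of Pageviews but few clicks',
            xy=(x, y),              # arrow tip at the max point
            xytext=(x, y * 1.1),    # text slightly above it
            arrowprops=dict(facecolor='cyan', shrink=0.05, lw=1, ec='magenta'))
ax.legend()
plt.show()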

pandas.concat() does not fill the columns

I am trying to create dummy data as follows:
import numpy as np
import pandas as pd
def dummy_historical(seclist, dates, startvalues):
    dfHist = pd.DataFrame(0, index=[0], columns=seclist)
    for sec in seclist:
        # (works fine)
        svalue = startvalues[sec].max()
        # this creates a random sequence of 84 rows and 1 column (works fine)
        dfRandom = pd.DataFrame(np.random.randint(svalue-10, svalue+10, size=(dates.size, 1)), index=dates, columns=[sec])
        # does not work
        dfHist[sec] = pd.concat([dfHist[sec], dfRandom])
    return dfHist
When I print dfHist, it only shows me the first row (as when it was initialized), so nothing has been filled in.
Here is an example of the data:
seclist = ['AAPL', 'GOOGL']
# use any number for startvalues
dates = DatetimeIndex(['2017-01-05', '2017-01-06', '2017-01-07', '2017-01-08',
'2017-01-09', '2017-01-10', '2017-01-11', '2017-01-12',
'2017-01-13', '2017-01-14', '2017-01-15', '2017-01-16',
'2017-01-17', '2017-01-18', '2017-01-19', '2017-01-20',
'2017-01-21', '2017-01-22', '2017-01-23', '2017-01-24',
'2017-01-25', '2017-01-26', '2017-01-27', '2017-01-28',
'2017-01-29', '2017-01-30', '2017-01-31', '2017-02-01',
'2017-02-02', '2017-02-03', '2017-02-04', '2017-02-05',
'2017-02-06', '2017-02-07', '2017-02-08', '2017-02-09',
'2017-02-10', '2017-02-11', '2017-02-12', '2017-02-13',
'2017-02-14', '2017-02-15', '2017-02-16', '2017-02-17',
'2017-02-18', '2017-02-19', '2017-02-20', '2017-02-21',
'2017-02-22', '2017-02-23', '2017-02-24', '2017-02-25',
'2017-02-26', '2017-02-27', '2017-02-28', '2017-03-01',
'2017-03-02', '2017-03-03', '2017-03-04', '2017-03-05',
'2017-03-06', '2017-03-07', '2017-03-08', '2017-03-09',
'2017-03-10', '2017-03-11', '2017-03-12', '2017-03-13',
'2017-03-14', '2017-03-15', '2017-03-16', '2017-03-17',
'2017-03-18', '2017-03-19', '2017-03-20', '2017-03-21',
'2017-03-22', '2017-03-23', '2017-03-24', '2017-03-25',
'2017-03-26', '2017-03-27', '2017-03-28', '2017-03-29'],
dtype='datetime64[ns]', freq='D')
You need to pass axis=1 to concat if you want to concatenate columns. In addition, you don't need to initialize the data frame with data at the beginning (unless you want to keep the 0 value):
def dummy_historical(seclist, dates, startvalues):
    dfHist = pd.DataFrame()
    for sec in seclist:
        svalue = startvalues[sec].max()
        dfRandom = pd.DataFrame(np.random.randint(svalue-10, svalue+10, size=(dates.size, 1)), index=dates, columns=[sec])
        dfHist = pd.concat([dfHist, dfRandom], axis=1)
    return dfHist
You can even write it more concisely, avoiding concat altogether:
def generate(sec):
    svalue = startvalues[sec].max()
    return np.random.randint(svalue-10, svalue+10, size=dates.size)

dfHist = pd.DataFrame({sec: generate(sec) for sec in seclist}, index=dates)
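A quick usage sketch, assuming startvalues is a DataFrame (or dict of Series) keyed by ticker and seclist/dates are as defined above (the starting values are hypothetical):
startvalues = pd.DataFrame({'AAPL': [150], 'GOOGL': [120]})  # hypothetical starting values
dfHist = dummy_historical(seclist, dates, startvalues)
print(dfHist.head())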
