I am not well-versed in python, and I'm sure there is a simple solution to this (although, I have looked). I got this code from an lpdaac tutorial.
My input is a NETCDF4 file downloaded from MODIS satellite. Printing the metadata of the file returns the variables
file_in = Dataset(file_list[0], 'r', format = 'NETCDF4')
#print metadata
list(file_in.variables)
Out[19]: ['crs', 'time', 'lat', 'lon', '_1_km_16_days_EVI', '_1_km_16_days_VI_Quality']
I want to convert the time variable to date format, and then only select 1 date from each year. Here is the code to convert to date format:
from netCDF4 import num2date
times = file_in.variables["time"] #import time variables
dates = num2date(times[:], times.units) #get the time info
dates = [date.strftime("%Y-%m-%d") for date in dates] #get the list of datetime
print(dates)
The dates are as follows:
['2000-06-25', '2000-07-11', '2000-07-27', '2000-08-12', '2000-08-28', '2000-09-13', '2000-09-29', '2000-10-15', '2000-10-31', '2000-11-16', '2000-12-02', '2000-12-18', '2001-01-01', '2001-01-17', '2001-02-02', '2001-02-18', '2001-03-06', '2001-03-22', '2001-04-07', '2001-04-23', '2001-05-09', '2001-05-25', '2001-06-10', '2001-06-26', '2001-07-12', '2001-07-28', '2001-08-13', '2001-08-29', '2001-09-14', '2001-09-30', '2001-10-16', '2001-11-01', '2001-11-17', '2001-12-03', '2001-12-19', '2002-01-01', '2002-01-17', '2002-02-02', '2002-02-18', '2002-03-06', '2002-03-22', '2002-04-07', '2002-04-23', '2002-05-09', '2002-05-25', '2002-06-10', '2002-06-26', '2002-07-12', '2002-07-28', '2002-08-13', '2002-08-29', '2002-09-14', '2002-09-30', '2002-10-16', '2002-11-01', '2002-11-17', '2002-12-03', '2002-12-19', '2003-01-01', '2003-01-17', '2003-02-02', '2003-02-18', '2003-03-06', '2003-03-22', '2003-04-07', '2003-04-23', '2003-05-09', '2003-05-25', '2003-06-10', '2003-06-26', '2003-07-12', '2003-07-28', '2003-08-13', '2003-08-29', '2003-09-14', '2003-09-30', '2003-10-16', '2003-11-01', '2003-11-17', '2003-12-03', '2003-12-19', '2004-01-01', '2004-01-17', '2004-02-02', '2004-02-18', '2004-03-05', '2004-03-21', '2004-04-06', '2004-04-22', '2004-05-08', '2004-05-24', '2004-06-09', '2004-06-25', '2004-07-11', '2004-07-27', '2004-08-12', '2004-08-28', '2004-09-13', '2004-09-29', '2004-10-15', '2004-10-31', '2004-11-16', '2004-12-02', '2004-12-18', '2005-01-01', '2005-01-17', '2005-02-02', '2005-02-18', '2005-03-06', '2005-03-22', '2005-04-07', '2005-04-23', '2005-05-09', '2005-05-25', '2005-06-10', '2005-06-26', '2005-07-12', '2005-07-28', '2005-08-13', '2005-08-29', '2005-09-14', '2005-09-30', '2005-10-16', '2005-11-01', '2005-11-17', '2005-12-03', '2005-12-19', '2006-01-01', '2006-01-17', '2006-02-02', '2006-02-18', '2006-03-06', '2006-03-22', '2006-04-07', '2006-04-23', '2006-05-09', '2006-05-25', '2006-06-10', '2006-06-26', '2006-07-12', '2006-07-28', '2006-08-13', '2006-08-29', '2006-09-14', '2006-09-30', '2006-10-16', '2006-11-01', '2006-11-17', '2006-12-03', '2006-12-19', '2007-01-01', '2007-01-17', '2007-02-02', '2007-02-18', '2007-03-06', '2007-03-22', '2007-04-07', '2007-04-23', '2007-05-09', '2007-05-25', '2007-06-10', '2007-06-26', '2007-07-12', '2007-07-28', '2007-08-13', '2007-08-29', '2007-09-14', '2007-09-30', '2007-10-16', '2007-11-01', '2007-11-17', '2007-12-03', '2007-12-19', '2008-01-01', '2008-01-17', '2008-02-02', '2008-02-18', '2008-03-05', '2008-03-21', '2008-04-06', '2008-04-22', '2008-05-08', '2008-05-24', '2008-06-09', '2008-06-25', '2008-07-11', '2008-07-27', '2008-08-12', '2008-08-28', '2008-09-13', '2008-09-29', '2008-10-15', '2008-10-31', '2008-11-16', '2008-12-02', '2008-12-18', '2009-01-01', '2009-01-17', '2009-02-02', '2009-02-18', '2009-03-06', '2009-03-22', '2009-04-07', '2009-04-23', '2009-05-09', '2009-05-25', '2009-06-10', '2009-06-26', '2009-07-12', '2009-07-28', '2009-08-13', '2009-08-29', '2009-09-14', '2009-09-30', '2009-10-16', '2009-11-01', '2009-11-17', '2009-12-03', '2009-12-19', '2010-01-01', '2010-01-17', '2010-02-02', '2010-02-18', '2010-03-06', '2010-03-22', '2010-04-07', '2010-04-23', '2010-05-09', '2010-05-25', '2010-06-10', '2010-06-26', '2010-07-12', '2010-07-28', '2010-08-13', '2010-08-29', '2010-09-14', '2010-09-30', '2010-10-16', '2010-11-01', '2010-11-17', '2010-12-03', '2010-12-19', '2011-01-01', '2011-01-17', '2011-02-02', '2011-02-18', '2011-03-06', '2011-03-22', '2011-04-07', '2011-04-23', '2011-05-09', '2011-05-25', '2011-06-10', '2011-06-26', '2011-07-12', '2011-07-28', '2011-08-13', '2011-08-29', '2011-09-14', '2011-09-30', '2011-10-16', '2011-11-01', '2011-11-17', '2011-12-03', '2011-12-19', '2012-01-01', '2012-01-17', '2012-02-02', '2012-02-18', '2012-03-05', '2012-03-21', '2012-04-06', '2012-04-22', '2012-05-08', '2012-05-24', '2012-06-09', '2012-06-25', '2012-07-11', '2012-07-27', '2012-08-12', '2012-08-28', '2012-09-13', '2012-09-29', '2012-10-15', '2012-10-31', '2012-11-16', '2012-12-02', '2012-12-18', '2013-01-01', '2013-01-17', '2013-02-02', '2013-02-18', '2013-03-06', '2013-03-22', '2013-04-07', '2013-04-23', '2013-05-09', '2013-05-25', '2013-06-10', '2013-06-26', '2013-07-12', '2013-07-28', '2013-08-13', '2013-08-29', '2013-09-14', '2013-09-30', '2013-10-16', '2013-11-01', '2013-11-17', '2013-12-03', '2013-12-19', '2014-01-01', '2014-01-17', '2014-02-02', '2014-02-18', '2014-03-06', '2014-03-22', '2014-04-07', '2014-04-23', '2014-05-09', '2014-05-25', '2014-06-10', '2014-06-26', '2014-07-12', '2014-07-28', '2014-08-13', '2014-08-29', '2014-09-14', '2014-09-30', '2014-10-16', '2014-11-01', '2014-11-17', '2014-12-03', '2014-12-19', '2015-01-01', '2015-01-17', '2015-02-02', '2015-02-18', '2015-03-06', '2015-03-22', '2015-04-07', '2015-04-23', '2015-05-09', '2015-05-25', '2015-06-10', '2015-06-26', '2015-07-12', '2015-07-28', '2015-08-13', '2015-08-29', '2015-09-14', '2015-09-30', '2015-10-16', '2015-11-01', '2015-11-17', '2015-12-03', '2015-12-19', '2016-01-01', '2016-01-17', '2016-02-02', '2016-02-18', '2016-03-05', '2016-03-21', '2016-04-06', '2016-04-22', '2016-05-08', '2016-05-24', '2016-06-09', '2016-06-25', '2016-07-11', '2016-07-27', '2016-08-12', '2016-08-28', '2016-09-13', '2016-09-29', '2016-10-15', '2016-10-31', '2016-11-16', '2016-12-02', '2016-12-18', '2017-01-01', '2017-01-17', '2017-02-02', '2017-02-18', '2017-03-06', '2017-03-22', '2017-04-07', '2017-04-23', '2017-05-09', '2017-05-25', '2017-06-10', '2017-06-26', '2017-07-12', '2017-07-28', '2017-08-13', '2017-08-29', '2017-09-14', '2017-09-30', '2017-10-16', '2017-11-01', '2017-11-17', '2017-12-03', '2017-12-19', '2018-01-01', '2018-01-17', '2018-02-02', '2018-02-18', '2018-03-06', '2018-03-22', '2018-04-07', '2018-04-23', '2018-05-09', '2018-05-25', '2018-06-10', '2018-06-26', '2018-07-12', '2018-07-28', '2018-08-13', '2018-08-29', '2018-09-14', '2018-09-30', '2018-10-16', '2018-11-01', '2018-11-17', '2018-12-03', '2018-12-19', '2019-01-01', '2019-01-17', '2019-02-02', '2019-02-18', '2019-03-06', '2019-03-22', '2019-04-07', '2019-04-23', '2019-05-09', '2019-05-25', '2019-06-10', '2019-06-26', '2019-07-12', '2019-07-28', '2019-08-13', '2019-08-29', '2019-09-14', '2019-09-30', '2019-10-16', '2019-11-01', '2019-11-17', '2019-12-03', '2019-12-19', '2020-01-01', '2020-01-17', '2020-02-02', '2020-02-18', '2020-03-05', '2020-03-21', '2020-04-06', '2020-04-22', '2020-05-08', '2020-05-24', '2020-06-09', '2020-06-25', '2020-07-11', '2020-07-27']
And these are the dates I want in the data frame:
dates = ['2000-07-11', '2001-07-12', '2002-07-12', '2003-07-12', '2004-07-11',
'2005-07-12', '2006-07-12', '2007-07-12', '2008-07-11', '2009-07-12',
'2010-07-12', '2011-07-12', '2012-07-11', '2013-07-12', '2014-07-12',
'2015-07-12', '2016-07-11', '2017-07-12', '2018-07-12', '2019-07-12',
'2020-07-11']
I tried just defining a new dates data frame, but I think that caused problems for me later in the code, so I would like to just subset the first dates data frame if there is an easy way to do it.
Thank you for your help
I have two variables and I want to express one of them (monetary_base) in terms of monthly growth.
How can I do that?. In the R language you should first transform the data into time series, in Python is this also the case?
#LLamando a las series que buscamos
inflacion = llamada_api('https://api.estadisticasbcra.com/inflacion_mensual_oficial')
base_monetaria = llamada_api('https://api.estadisticasbcra.com/base')
#Armando DataFrames
df = pd.DataFrame(inflacion)
df_bm = pd.DataFrame(base_monetaria)
#Renombrando columnas
df = df.rename(columns={'d':'Fecha',
'v':'IPC'})
df_bm = df_bm.rename(columns={'d':'Fecha',
'v':'base_monetaria'})
#Arreglando tipo de datos
df['Fecha']=pd.to_datetime(df['Fecha'])
df_bm['Fecha']=pd.to_datetime(df_bm['Fecha'])
#Verificando que las fechas esten en formato date
df['Fecha'].dtype
df_bm['Fecha'].dtype
#Filtrando
df_ipc = df[(df['Fecha'] > '2002-12-31')]
df_bm_filter = df_bm[(df_bm['Fecha'] > '2002-12-31')]
#Graficando
plt.figure(figsize=(14,12))
df_ipc.plot(x = 'Fecha', y = 'IPC')
plt.title('IPC-Mensual', fontdict={'fontsize':20})
plt.ylabel('IPC')
plt.xticks(rotation=45)
plt.show()
The data looks like this
Fecha base_monetaria
1748 2003-01-02 29302
1749 2003-01-03 29360
1750 2003-01-06 29524
1751 2003-01-07 29867
1752 2003-01-08 29957
... ...
5966 2020-02-18 1941302
5967 2020-02-19 1941904
5968 2020-02-20 1887975
5969 2020-02-21 1855477
5970 2020-02-26 1807042
The idea is to take the data for the last day of the month and calculate the growth rate with the data for the last day of the previous month.
You can try something like this
from pandas.tseries.offsets import MonthEnd
import pandas as pd
df = pd.DataFrame({'Fecha': ['2020-01-31', '2020-02-29', '2020-03-31', '2020-05-31', '2020-04-30', '2020-07-31', '2020-06-30', '2020-08-31', '2020-09-30', '2020-10-31', '2020-11-30', '2020-12-31'],
'price': ['32132', '54321', '3213121', '432123', '32132', '54321', '32132', '54321', '3213121', '432123', '32132', '54321']})
df['Fecha'] = df['Fecha'].astype('datetime64[ns]')
df['is_month_end'] = df['Fecha'].dt.is_month_end
df = df[df['is_month_end'] == True]
df.sort_values('Fecha',inplace=True)
df.reset_index(drop=True, inplace = True)
def change(x,y):
try:
index = df[df['Fecha']==y].index.item()
last = df.loc[index-1][1]
return float(x)/float(last)
except:
return 0
df['new_column'] = df.apply(lambda row: change(row['price'],row['Fecha']), axis=1)
df.head(12)
Assuming the base_moetaria is a monthly cumulative value then
df = pd.DataFrame({'Fecha': ['2020-01-31', '2020-02-29', '2020-03-31', '2020-05-31', '2020-04-30', '2020-07-31', '2020-06-30', '2020-08-31', '2020-09-30', '2020-10-31', '2020-11-30', '2020-12-31'],
'price': [32132, 54321, 3213121, 432123, 32132, 54321, 32132, 54321, 3213121, 432123, 32132, 54321]})
df['Fecha'] = pd.to_datetime(df['Fecha'])
df.set_index('Fecha', inplace=True)
new_df = df.groupby(pd.Grouper(freq="M")).tail(1).reset_index()
new_df['rate'] = (new_df['price'] -new_df['price'].shift(1))/new_df['price'].shift(1)
The new_df['rate'] will give you the growth rate the way you explained in the comment below
The problem can be solve creating a column with the lag values of base_monetaria
df_bm_filter['is_month_end'] = df_bm_filter['Fecha'].dt.is_month_end
df_last_date = df_bm_filter[df_bm_filter['is_month_end'] == True]
df_last_date['base_monetaria_lag'] = df_last_date['base_monetaria'].shift(1)
df_last_date['bm_growth'] = (df_last_date['base_monetaria'] - df_last_date['base_monetaria_lag']) / df_last_date['base_monetaria_lag']
Hello I'm trying to auto annotate matplotlib chart.
I've manage to create it in a way that that doesn't give me any errors when I run it.
However, it doesn't plot the annotation and as I'm plotting in jupyter notebooks it pushes the plot right off the page.
The result I'm looking for is an automatically assigning annotation pointing to the max number in the series ppc_rolling_7d on the chart.
I'm kinda out of ideas as to what has happened here.
example data:
ppc_data = pd.DataFrame({
'Day':['2018-08-31', '2018-09-01', '2018-09-02', '2018-09-03',
'2018-09-04', '2018-09-05', '2018-09-06', '2018-09-07',
'2018-09-08', '2018-09-09', '2018-09-10', '2018-09-11',
'2018-09-12', '2018-09-13', '2018-09-14', '2018-09-15',
'2018-09-16', '2018-09-17', '2018-09-18', '2018-09-19',
'2018-09-20', '2018-09-21', '2018-09-22', '2018-09-23',
'2018-09-24', '2018-09-25', '2018-09-26', '2018-09-27',
'2018-09-28', '2018-09-29', '2018-09-30', '2018-10-01',
'2018-10-02', '2018-10-03', '2018-10-04', '2018-10-05',
'2018-10-06', '2018-10-07', '2018-10-08', '2018-10-09',
'2018-10-10', '2018-10-11', '2018-10-12', '2018-10-13',
'2018-10-14', '2018-10-15', '2018-10-16', '2018-10-17',
'2018-10-18', '2018-10-19', '2018-10-20', '2018-10-21',
'2018-10-22', '2018-10-23', '2018-10-24', '2018-10-25',
'2018-10-26', '2018-10-27', '2018-10-28', '2018-10-29',
'2018-10-30', '2018-10-31', '2018-11-01', '2018-11-02',
'2018-11-03', '2018-11-04', '2018-11-05', '2018-11-06',
'2018-11-07', '2018-11-08', '2018-11-09', '2018-11-10',
'2018-11-11', '2018-11-12', '2018-11-13', '2018-11-14',
'2018-11-15', '2018-11-16', '2018-11-17', '2018-11-18',
'2018-11-19', '2018-11-20', '2018-11-21', '2018-11-22',
'2018-11-23', '2018-11-24', '2018-11-25', '2018-11-26',
'2018-11-27', '2018-11-28', '2018-11-29', '2018-11-30',
'2018-12-01', '2018-12-02', '2018-12-03', '2018-12-04',
'2018-12-05', '2018-12-06', '2018-12-07', '2018-12-08'],
'Cost' : [1105.8097834013993, 1035.8355715930172, 2335.4700418958632,
655.0721024605979, 1154.3067936459986, 2275.8927050269917,
174.47816810392712,1606.0865381579742,973.1285739075876,
677.3734705782231,2381.149891233519, 1137.840620239881,
673.0575320194132, 1969.3783478235364, 1667.3405411738886,
1365.707089062391, 1686.492803446683, 1613.2530220414621,
2275.475164597224, 1593.9382082221036, 1278.8267306408893,
1342.2964464944962, 863.9840442789089, 289.34425736432837,
15.219941807702485, 1595.2327617943374, 1592.8333476628231,
961.5931139385652, 703.2690737772505, 312.9730830647801,
2105.920303495205, 707.710807657391, 873.7377744639931,
152.51387772605813, 1292.4027169055073, 1142.7323830723421,
2400.462099397225, 2027.5730000421765, 2380.127923249452,
370.97680360266463, 978.7472607817784, 144.50724935561453,
1257.3962926696906, 339.44922335906256, 989.3364341529344,
1274.7020560588671, 1697.9640365081489, 81.00819304765376,
528.9126509191693, 893.839100786781, 1778.7263797734338,
1388.1976452584615, 533.7823940180391, 1390.507110740847,
1582.8069647428326, 2058.124928605663, 1456.0037174730746,
315.93672830017414,488.9620970966599, 2020.6125475658266,
1358.8988386729175,1967.1442608919235,436.40540549351783,
2090.41730824453,2114.3435803364277,2235.719648814769,
1773.3190866160382,2372.165649889117, 1186.850504563462,
864.4092140750176, 772.6148714908818,1749.9856862684244,
802.1475898419487, 1013.3410373277948, 1604.4137362997474,
1880.084707526689, 1823.9691856540412,550.6041906641643,
75.26104973616485, 819.9409527114842, 2272.8529542934198,
1836.7071931445969,1491.3728333359875, 1807.2130424285615,
2378.1185581431337,1434.1809462567153,296.49945129452675,
2025.2054514729998,2346.234514785023, 2438.058561262957,
277.36529451533386, 1212.541281523483,2005.258496330315,
2053.7325650486177,2076.001012737591, 2245.606468047353,
2493.336539619115,1116.075112703116,319.54750552662733,
648.633853658328]}
).set_index('Day')
ppc_data.index = pd.to_datetime(ppc_data.index)
ppc_weekly = ppc_data['Cost'].resample('W').mean()
ppc_rolling_7d = ppc_data['Cost'].rolling(window=7, center=True).mean()
ax = fig.add_subplot(111)
figsize = (15,8)
ppc_data['Cost'].plot(figsize=figsize,
alpha=.5,
marker='.',
linestyle='-',
linewidth=0.5,
label='Daily'
)
ppc_weekly.plot(figsize=figsize,
marker='x',
markersize=8,
linestyle='-',
label='Weekly Mean Resample'
)
ppc_rolling_7d.plot(figsize=figsize,
marker='o',
linestyle='-',
label='7-d Rolling Mean'
)
max_value = ppc_rolling_7d.max()
max_value_index = [i for i, j in enumerate(ppc_rolling_7) if j == max_value]
#Create ax customatisations
ax.annotate('Lots of Pageviews but few clicks',
xy=(max_value_index[0],max_value),
xytext=(max_value_index[0],max_value),
arrowprops=dict(facecolor='cyan', #colour
shrink=0.05, #length of arrow
lw=1, #line width
ec='magenta', #boarder colour
zorder=1)) #layering order of annotation
#Global Plot settings
plt.title('COMPARE: Daily, Weekly Mean, 7-d Rolling Mean ') # set chart name
fig.legend() # set the legend
#display the charts
plt.show()
Any suggestions to what could be the problem are welcome.
Thanks to ImportanceOfBeingErnest who commented with the answer.
Using idxmax() will quickly find the index of the max value.
x = ppc_rolling_7d.idxmax(); y = ppc_rolling_7d.max()
I am trying to create dummy data as follows:
import numpy as np
import pandas as pd
def dummy_historical(seclist, dates, startvalues):
dfHist = pd.DataFrame(0, index=[0], columns=seclist)
for sec in seclist:
# (works fine)
svalue = startvalues[sec].max()
# this creates a random sequency of 84 rows and 1 column (works fine)
dfRandom = pd.DataFrame(np.random.randint(svalue-10,svalue+10, size=(dates.size, 1 )), index=dates, columns=[sec])
# does not work
dfHist[sec] = pd.concat([ dfHist[sec] , dfRandom ])
return dfHist
When I print dfHist, it only shows me the first row (as when initiated). Thus nothing has been filled.
Here is an example of the data:
seclist = ['AAPL', 'GOOGL']
# use any number for startvalues
dates = DatetimeIndex(['2017-01-05', '2017-01-06', '2017-01-07', '2017-01-08',
'2017-01-09', '2017-01-10', '2017-01-11', '2017-01-12',
'2017-01-13', '2017-01-14', '2017-01-15', '2017-01-16',
'2017-01-17', '2017-01-18', '2017-01-19', '2017-01-20',
'2017-01-21', '2017-01-22', '2017-01-23', '2017-01-24',
'2017-01-25', '2017-01-26', '2017-01-27', '2017-01-28',
'2017-01-29', '2017-01-30', '2017-01-31', '2017-02-01',
'2017-02-02', '2017-02-03', '2017-02-04', '2017-02-05',
'2017-02-06', '2017-02-07', '2017-02-08', '2017-02-09',
'2017-02-10', '2017-02-11', '2017-02-12', '2017-02-13',
'2017-02-14', '2017-02-15', '2017-02-16', '2017-02-17',
'2017-02-18', '2017-02-19', '2017-02-20', '2017-02-21',
'2017-02-22', '2017-02-23', '2017-02-24', '2017-02-25',
'2017-02-26', '2017-02-27', '2017-02-28', '2017-03-01',
'2017-03-02', '2017-03-03', '2017-03-04', '2017-03-05',
'2017-03-06', '2017-03-07', '2017-03-08', '2017-03-09',
'2017-03-10', '2017-03-11', '2017-03-12', '2017-03-13',
'2017-03-14', '2017-03-15', '2017-03-16', '2017-03-17',
'2017-03-18', '2017-03-19', '2017-03-20', '2017-03-21',
'2017-03-22', '2017-03-23', '2017-03-24', '2017-03-25',
'2017-03-26', '2017-03-27', '2017-03-28', '2017-03-29'],
dtype='datetime64[ns]', freq='D')
You need to pass axis=1 to concat if you want to concatenate columns. In addition, you don't need to initialize your data frame with data in the beginning (except you want to have the 0 value):
def dummy_historical(seclist, dates, startvalues):
dfHist = pd.DataFrame()
for sec in seclist:
svalue = startvalues[sec].max()
dfRandom = pd.DataFrame(np.random.randint(svalue-10,svalue+10, size=(dates.size, 1 )), index=dates, columns=[sec])
dfHist = pd.concat([ dfHist , dfRandom ], axis=1)
return dfHist
You can even write in a more concise way avoiding concat like:
def generate(sec):
svalue = startvalues[sec].max()
return np.random.randint(svalue-10,svalue+10, size=dates.size)
dfHist = pd.DataFrame({sec: generate(sec) for sec in seclist}, index=dates)