Good Evening,
I have GPS coordinates for each trip, and I'm trying to draw a line between the points of each trip.
I'm using the code below, but it still doesn't work. If I delete groupby('id') it works, but then I also get lines between points that don't belong to the same trip id.
tooltip = "Click me!"
for i in range(11):
    folium.Marker(df.groupby('id')
                  [df['latitude'][i], df['longitude'][i]], popup=df['id'][i], tooltip=tooltip).add_to(map)
    route = folium.PolyLine(df.groupby('id')
                            [[df['latitude'][i], df['longitude'][i]], [df['latitude'][i+1], df['longitude'][i+1]]],
                            tooltip="trip").add_to(map)
My dataframe looks like this:
    longitude   latitude     id
0    5.184529  52.032471  66168
1    5.184513  52.032047  66168
2    5.184468  52.031559  66168
7    5.183908  52.027328  66168
8    5.175724  52.084732  89751
9    5.175513  52.084743  89751
10   5.174866  52.084713  89751
I suggest separating the addition of the polylines and the markers to the map. Markers can be added individually, the polylines as lists of geolocations. Since the latter need to be clustered by id, it makes sense to add them per group, after the groupby:
import pandas as pd
import folium
import io

data = '''    longitude   latitude     id
0    5.184529  52.032471  66168
1    5.184513  52.032047  66168
2    5.184468  52.031559  66168
7    5.183908  52.027328  66168
8    5.175724  52.084732  89751
9    5.175513  52.084743  89751
10   5.174866  52.084713  89751'''
df = pd.read_csv(io.StringIO(data), sep=r'\s\s+', engine='python')

tooltip = "Click me!"
m = folium.Map(location=[52.031559, 5.184468], zoom_start=15)

# markers: one per row
for index, row in df.iterrows():
    folium.Marker([row['latitude'], row['longitude']],
                  popup=row['id'],
                  tooltip=tooltip).add_to(m)

# polylines: one per trip id, built from the grouped coordinate lists
for index, row in df.groupby('id', as_index=False)[['latitude', 'longitude']].agg(list).iterrows():
    loc = list(zip(row.latitude, row.longitude))
    folium.PolyLine(loc, tooltip="trip").add_to(m)
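As an optional follow-up (these are standard folium calls; the filename is just an example), you can fit the viewport to all points instead of hard-coding the center and zoom, and then write the map to an HTML file:

# south-west and north-east corners of all points
m.fit_bounds([[df['latitude'].min(), df['longitude'].min()],
              [df['latitude'].max(), df['longitude'].max()]])
m.save('trips.html')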
I got my .dat data formatted into arrays I could use in graphs and whatnot.
I got my data from this website and it requires an account if you want to download it yourself. The data will still be provided below, however.
https://daac.ornl.gov/cgi-bin/dsviewer.pl?ds_id=1028
data in python:
import pandas as pd
df = pd.read_csv("ocean_flux_co2_2d.dat", header=None)
print(df.head())
        0     1         2         3
0 -178.75 -77.0  0.000003   32128.7
1 -176.25 -77.0  0.000599   32128.7
2 -173.75 -77.0  0.001649   39113.5
3 -171.25 -77.0  0.003838   58934.0
4 -168.75 -77.0  0.007192  179959.0
I then decided to put this data into arrays that could be put into graphs and other functions.
Like so:
import numpy as np

lat = []
lon = []
sed = []
area = []
with open('/home/srowpie/SrowFinProj/Datas/ocean_flux_tss_2d.dat') as f:
    for line in f:
        parts = line.split(',')
        lat.append(float(parts[0]))
        lon.append(float(parts[1]))
        sed.append(float(parts[2]))
        area.append(float(parts[3]))
lat = np.array(lat)
lon = np.array(lon)
sed = np.array(sed)
area = np.array(area)
My question now is: how can I put this data onto a map with data points? Column 1 is latitude, Column 2 is longitude, Column 3 is sediment flux, and Column 4 is the area covered. Or do I have to bootleg it by making a graph that takes the variables lat, lon, and sed into account?
You don't need to get the data into separate arrays. Just use df.values and you have a numpy array of all the data in the dataframe.
Example -
array([[-1.78750e+02, -7.70000e+01, 3.00000e-06, 3.21287e+04],
[-1.76250e+02, -7.70000e+01, 5.99000e-04, 3.21287e+04],
[-1.73750e+02, -7.70000e+01, 1.64900e-03, 3.91135e+04],
[-1.71250e+02, -7.70000e+01, 3.83800e-03, 5.89340e+04],
[-1.68750e+02, -7.70000e+01, 7.19200e-03, 1.79959e+05]])
I would not recommend storing individual columns as variables. Instead, just set the column names for the dataframe and then use them to extract a pandas Series of the data in each column.
df.columns = ["Latitude", "Longitude", "Sediment Flux", "Area covered"]
This is what the table would look like after this:

   Latitude  Longitude  Sediment Flux  Area covered
0   -178.75      -77.0       3e-06          32128.7
1   -176.25      -77.0       0.000599       32128.7
2   -173.75      -77.0       0.001649       39113.5
3   -171.25      -77.0       0.003838       58934.0
4   -168.75      -77.0       0.007192      179959.0
Simply do df[column_name] to get the data in that column.
For example -> df["Latitude"]
Output -
0 -178.75
1 -176.25
2 -173.75
3 -171.25
4 -168.75
Name: Latitude, dtype: float64
Once you have done all this, you can use folium to plot the rows on real interactive maps.
import folium as fl

map = fl.Map(df.iloc[0, :2], zoom_start=100)
for index in df.index:
    row = df.loc[index, :]
    fl.Marker(row[:2].values, f"{dict(row[2:])}").add_to(map)
map
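Since the question also asks about showing the sediment flux, here is a minimal sketch on top of the renamed columns from above; the radius formula is only an illustrative choice, not part of the original answer:

import folium as fl

m = fl.Map(location=df.iloc[0, :2].tolist(), zoom_start=3)
max_flux = df["Sediment Flux"].max()
for _, row in df.iterrows():
    fl.CircleMarker(
        location=[row["Latitude"], row["Longitude"]],
        radius=3 + 12 * row["Sediment Flux"] / max_flux,  # marker size encodes flux
        popup=f'flux={row["Sediment Flux"]}, area={row["Area covered"]}',
        fill=True,
    ).add_to(m)
m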
So, for a forecasting project, I have a really long Dataframe of multiple time series of the following type (it has a numerical index):
date        time_series_id  value
2015-08-01  0               0
2015-08-02  0               1
2015-08-03  0               2
2015-08-04  0               3
2015-08-01  1               2
2015-08-02  1               3
2015-08-03  1               4
2015-08-04  1               5
My objective is to add 3 new columns to this dataset, computed per individual time series (each id), corresponding to trend, seasonal, and resid.
Owing to the characteristics of the dataset, the series tend to have NaNs at the start and the end of the date range.
What I was trying to do was the following:
from statsmodels.tsa.seasonal import seasonal_decompose
df.assign(trend=lambda x: x.groupby("time_series_id")["value"].transform(
    lambda s: s.mask(~s.isna(), other=seasonal_decompose(
        s[~s.isna()], model='additive', extrapolate_trend='freq').trend)))
The expected output (the trend values shown are not the actual values) should be:
date        time_series_id  value  trend
2015-08-01  0               0      1
2015-08-02  0               1      1
2015-08-03  0               2      1
2015-08-04  0               3      1
2015-08-01  1               2      1
2015-08-02  1               3      1
2015-08-03  1               4      1
2015-08-04  1               5      1
But I get the following error message:
AttributeError: 'Int64Index' object has no attribute 'inferred_freq'
In a previous iteration of my code, this worked for my individual time-series data frames, since I had set the date column as the index of the data frame instead of keeping it as an additional column, so the x that the lambda function receives already has a datetime index appropriate for the seasonal_decompose function.
df.assign(
    trend=lambda x: x["value"].mask(~x["value"].isna(), other=
        seasonal_decompose(x["value"][~x["value"].isna()], model='additive',
                           extrapolate_trend='freq').trend))
My questions are: first, is it possible to achieve this using groupby, or are other approaches possible? Second, is it possible to handle this in a way that doesn't eat much memory? The original dataset I'm working on has approximately 1M rows, so any help is really welcome :).
Did one of the already posted solutions work? If so, or if you found a different solution, please share. I tried each without success, but I'm new to Python, so I'm probably missing something.
Here is what I came up with, using a for loop. For my dataset it took 8 minutes to decompose 20 million rows consisting of 6,000 different subsets. This works, but I wish it were faster.
Date Time            Segment ID  Travel Time(Minutes)
2021-11-09 07:15:00  1           30
2021-11-09 07:30:00  1           18
2021-11-09 07:15:00  2           30
2021-11-09 07:30:00  2           17
import pandas as pd
import statsmodels.api as sm

segments = set(frame['Segment ID'])
data = pd.DataFrame([])
for s in segments:
    df = frame[frame['Segment ID'] == s].set_index('Date Time').resample('H').mean()
    comp = sm.tsa.seasonal_decompose(x=df['Travel Time(Minutes)'], period=24*7, two_sided=False)
    df = df.join(comp.trend).join(comp.seasonal).join(comp.resid)
    # optional columns with some statistics to find outliers and trend changes
    df['resid zscore'] = (df['resid'] - df['resid'].mean()).div(df['resid'].std())
    df['trend pct_change'] = df.trend.pct_change()
    df['trend pct_change zscore'] = (df['trend pct_change'] - df['trend pct_change'].mean()).div(df['trend pct_change'].std())
    data = data.append(df.dropna())  # note: DataFrame.append was removed in pandas 2.0; use pd.concat there
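As an aside (a sketch, untested at that scale): iterating via groupby avoids rebuilding a boolean mask for every segment, and collecting the pieces for a single pd.concat at the end is cheaper than growing the dataframe inside the loop:

frames = []
for s, g in frame.groupby('Segment ID'):
    df = g.set_index('Date Time').resample('H').mean()
    comp = sm.tsa.seasonal_decompose(x=df['Travel Time(Minutes)'], period=24*7, two_sided=False)
    frames.append(df.join(comp.trend).join(comp.seasonal).join(comp.resid).dropna())
data = pd.concat(frames)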
Where you have lambda x: x.groupby(..., you don't have anything to group; you are telling it to group a single row (I believe). You can try a setup like this, perhaps.
Here you define a function that acts on each group you send in via the apply() method; then you should be able to use your original code (note that the result of assign has to be captured).
I have not tested this, but I use this setup quite often to work on groups.
def trend_function(x):
    # do your lambda function here, as each group is sent in separately
    x = x.assign(
        trend=lambda x: x["value"].mask(~x["value"].isna(), other=
            seasonal_decompose(x["value"][~x["value"].isna()], model='additive',
                               extrapolate_trend='freq').trend))
    return x
dfnew = df.groupby('time_series_id').apply(trend_function)
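For what it's worth, the AttributeError in the question comes from seasonal_decompose needing a frequency that it cannot infer from the default integer index; passing period= explicitly (or setting the date as the index inside the function) sidesteps that. A sketch under those assumptions; period=2 is chosen only so the 4-point toy series satisfies the two-full-cycles requirement:

from statsmodels.tsa.seasonal import seasonal_decompose

def trend_function(g):
    s = g.set_index('date')['value']
    decomp = seasonal_decompose(s, model='additive', extrapolate_trend='freq', period=2)
    return g.assign(trend=decomp.trend.values)  # assumes no NaNs inside the group

dfnew = df.groupby('time_series_id', group_keys=False).apply(trend_function)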
Use extrapolate_trend='freq' as a parameter. seasonal_decompose then returns the trend, seasonal, and residual components, which you can plot:
from statsmodels.graphics import tsaplots
import statsmodels.api as sm
import pandas as pd
import matplotlib.pyplot as plt
date=['2015-08-01','2015-08-02','2015-08-03','2015-08-04','2015-08-01','2015-08-02','2015-08-03','2015-08-04']
time_series_id=[0,0,0,0,1,1,1,1]
value=[0,1,2,3,2,3,4,5]
df=pd.DataFrame({'date':date,'time_series_id':time_series_id,'value':value})
df['date']=pd.to_datetime(df['date'])
df=df.set_index('date')
print(df)
index_day = df.index.day
value_by_day = df.groupby(index_day)['value'].mean()
fig,ax = plt.subplots(figsize=(12,4))
value_by_day.plot(ax=ax)
plt.title('value by day')
plt.show()
df[['value']].boxplot()
plt.show()
fig,ax = plt.subplots(figsize=(12,4))
df[['value']].hist(ax=ax, bins=5)
plt.show()
fig,ax = plt.subplots(figsize=(12,4))
df[['value']].plot(kind='density', ax=ax)
plt.show()
plt.clf()
fig,ax = plt.subplots(figsize=(12,4))
plt.style.use('seaborn-pastel')
fig = tsaplots.plot_acf(df['value'], lags=4,ax=ax)
plt.show()
decomposition=sm.tsa.seasonal_decompose(x=df['value'],model='additive', extrapolate_trend='freq', period=1)
decomposition.plot()
plt.show()
decomposition_trend=decomposition.trend
ax= decomposition_trend.plot(figsize=(14,2))
ax.set_xlabel('Date')
ax.set_ylabel('Trend of time series')
ax.set_title('Trend values of the time series')
plt.show()
I changed the first piece of code according to my scenario.
Here's my code and the resulting output:
data = pd.DataFrame([])
segments = set(subset['Planning_Material'])
for s in segments:
    df = subset[subset['Planning_Material'] == s].set_index('Cal_year_month').resample('M').sum()
    comp = sm.tsa.seasonal_decompose(df)
    df = df.join(comp.trend).join(comp.seasonal).join(comp.resid)
    df['Planning_Material'] = s
    data = pd.concat([data, df])
data = data.reset_index()
data = data[['Planning_Material', 'Cal_year_month', 'Historical_demand', 'trend', 'seasonal', 'resid']]
data
The context of the problem I am dealing with is converting the results of a time-series forecast, plotted with matplotlib, back into a dataframe, so that I can use the cufflinks library to get a more interactive chart where I can hover over data points for a more detailed look at the forecast.
So after training and creating a simulation, the code goes:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import timedelta

date_ori = pd.to_datetime(df.iloc[:, 0]).tolist()
for i in range(test_size):
    date_ori.append(date_ori[-1] + timedelta(days=1))
date_ori = pd.Series(date_ori).dt.strftime(date_format='%Y-%m-%d').tolist()
date_ori[-5:]

accepted_results = []
for r in results:
    if (np.array(r[-test_size:]) < np.min(df['Close'])).sum() == 0 and \
       (np.array(r[-test_size:]) > np.max(df['Close']) * 2).sum() == 0:
        accepted_results.append(r)
len(accepted_results)

accuracies = [calculate_accuracy(df['Close'].values, r[:-test_size]) for r in accepted_results]

plt.figure(figsize=(15, 5))
for no, r in enumerate(accepted_results):
    plt.plot(r, label='forecast %d' % (no + 1))
plt.plot(df['Close'], label='true trend', c='black')
plt.legend()
plt.title('average accuracy: %.4f' % np.mean(accuracies))
x_range_future = np.arange(len(results[0]))
plt.xticks(x_range_future[::30], date_ori[::30])
plt.show()
I have started to dissect the last plotting section to try to convert the data into a dataframe in order to plot with cufflinks, since the input format for cufflinks is like this:
import cufflinks as cf
# data from FXCM Forex Capital Markets Ltd.
raw = pd.read_csv('http://hilpisch.com/fxcm_eur_usd_eod_data.csv',
index_col=0, parse_dates=True)
quotes = raw[['AskOpen', 'AskHigh', 'AskLow', 'AskClose']]
quotes = quotes.iloc[-60:]
quotes.tail()
AskOpen AskHigh AskLow AskClose
2017-12-25 22:00:00 1.18667 1.18791 1.18467 1.18587
2017-12-26 22:00:00 1.18587 1.19104 1.18552 1.18885
2017-12-27 22:00:00 1.18885 1.19592 1.18885 1.19426
2017-12-28 22:00:00 1.19426 1.20256 1.19369 1.20092
2017-12-31 22:00:00 1.20092 1.20144 1.19994 1.20147
qf = cf.QuantFig(
quotes,
title='EUR/USD Exchange Rate',
legend='top',
name='EUR/USD'
)
qf.iplot()
How far I have gotten is dissecting the plotted data into a dataframe as follows; these are the forecasted results:
df = accepted_results
rd = pd.DataFrame(df)
rd.T
0 1 2 3 4 5 6 7
0 768.699985 768.699985 768.699985 768.699985 768.699985 768.699985 768.699985 768.699985
1 775.319656 775.891012 772.283885 737.763376 773.811344 785.021571 770.438252 770.464180
2 772.387081 787.562968 764.858772 737.837558 775.712162 770.660990 768.103724 770.786379
3 786.316425 779.248516 765.839603 760.195678 783.410054 789.610540 765.924561 773.466415
4 796.039144 803.113903 790.219174 770.508252 795.110376 793.371152 774.331197 786.772606
... ... ... ... ... ... ... ... ...
277 1042.788063 977.462670 1057.189696 1262.203613 1057.900621 1042.329811 1053.378352 1171.416597
278 1026.857102 975.473725 1061.585063 1307.540754 1061.490772 1049.696547 1054.122795 1117.779434
279 1029.388746 977.097765 1069.265953 1192.250498 1064.540056 1049.169295 1045.126807 1242.474584
280 1030.373147 983.650686 1070.628785 1103.139889 1053.571269 1030.669091 1047.641127 1168.965372
281 1023.118504 984.660763 1071.661590 1068.445156 1080.461617 1035.736879 1035.599867 1231.714340
then converting the x axis from
plt.xticks(x_range_future[::30], date_ori[::30])
to
df1 = pd.DataFrame((x_range_future[::30], date_ori[::30]))
df1.T
0 1
0 0 2016-11-02
1 30 2016-12-15
2 60 2017-01-31
3 90 2017-03-15
4 120 2017-04-27
5 150 2017-06-09
6 180 2017-07-24
7 210 2017-09-05
8 240 2017-10-17
9 270 2017-11-20
lastly I have the close column and this is what I've been able to come up with for it so far
len(df['Close'].values)
252
when i use
df['Close'].values
I get an array. I'm having problems putting this all together; the cufflinks iplot graphs are just way better, and it would be amazing if I could somehow gain the intuition to do this. I apologize in advance if I didn't try hard enough, but I'm doing my best; I can't seem to find the answer no matter how many times I've searched Google, so I thought I would ask here.
This is what I did. I went through and printed independent strings like print(date_ori), and also simplified it with print(len(date_ori)), which in turn had all of the dates for the forecast. Then I made it into a dataframe with df['date'] = pd.DataFrame(date_ori). As for the results, I had to transpose them with df.T so they would be in a long column format rather than a long row, so first
df = pd.DataFrame(results)
df = df.T
then
df['date'] = pd.DataFrame(date_ori)
I had trouble renaming the column named 0, which contained all of the predicted results, so I just saved the file with
df.to_csv('yo')
then I edited the column named 0 to results and added .csv to the end, then pulled the data back into memory (a direct rename is sketched below).
Then I formatted the date:
format = '%Y-%m-%d'
df['Datetime'] = pd.to_datetime(df['date'], format=format)
df = df.set_index(pd.DatetimeIndex(df['Datetime']))
and dropped the unneeded columns. I guess I could now add the close data that I started with so they plot together, but I got the results into the dataframe, so now I can use these awesome charts! Can't believe I figured it out within 18 hours; I was so lost, lol.
Also, I dropped the experiment to just one simulation, so there was only one row of results to deal with, which made it easier to figure out.
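For what it's worth, the CSV round-trip isn't needed just to rename that column; pandas can rename it in place (a sketch based on the single-simulation dataframe above):

df = df.rename(columns={0: 'results'})  # the transpose leaves an integer column label
df['Datetime'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df = df.set_index('Datetime').drop(columns='date')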
I have a pandas data frame that looks something like this
id  name  latlon
0   sat   -28,14 | -23, 12 | -21, 13...
the latlon column entry contains multiple latitude/longitude pairs, separated by the | symbol. I need to split them into lists as follows: lat = [-28, -23, -21] and lon = [14, 12, 13].
running the following command will create a list of all the values
sat_df["latlon"]= sat_df["latlon"].str.split("|", expand=False)
Example of one resulting entry: [-58.562242560404705,52.82662430990185, -61.300361184039964,64.0645716165538, -62.8683906074927,76.96557954998904, -63.078154849236505,90.49660509514713, -61.95530287454162,103.39930010176977, -59.727998547544765,114.629246065411, -56.63116878989326,124.07501384844198, -52.9408690779807,131.75498199669985, -48.85803704806645,137.9821558270659, -44.56621244973711,143.03546934613863, -40.08092215592037,147.27807367743728, -35.5075351924213,150.86679792543603,]
How can I continue to split the data so that every other entry is assigned to the lat or lon list, respectively, for the entire dataframe? Alternatively, is there some way to create two columns (lat/lon) that both hold a list object with all the values?
EDIT:
import pandas as pd
sat_df = pd.DataFrame({'卫星编号': {0: 38858, 1: 5, 2: 16}, 'path': {0: '-2023240,1636954,-1409847|-2120945,1594435,-1311586|-2213791,1547970,-1209918|', 1: '8847,-974294,-168045|69303,-972089,-207786|129332,-963859,-246237|189050,-949637,-283483|', 2: '283880,751564,538726|214030,782804,550729|142133,808810,558964|69271,829348,563411|'}, 'latlon': {0: '-28.566504816706743,-58.42623323318429|-26.424915546197877,-58.03051668423269|-24.24957760771616,-57.709052434729294|-22.049419348341488,-57.45429550739338|-19.82765114196696,-57.258197633964414|-17.58719794818057,-57.113255687570714|-15.33074070109176,-57.01245109909582|-13.060755383916138,-56.949188922655416|-10.779548173615462,-56.91723753411087|-8.48928513939462,-56.910669632641685|-6.192021225701933,-56.92380598464241|-3.8897270110140494,-56.951159278680606|-1.5843114029280712,-56.987381318629815|0.7223533959819478,-57.02721062232328|3.028411197431552,-57.06542107180802|5.331999106238248,-57.09677071391785|7.631224662503422,-57.115951252231326|9.924144733525859,-57.11753523668981|12.20873984934678,-57.09592379302077|14.482890506579363,-57.045292032888945|16.744349099342163,-56.95953284633186|18.99070929829218,-56.83219872719919|', 1: '-9.826016080133869,71.12640824438319|-12.077961267269185,74.17040194928683|-14.251942328865088,77.22102880126546|-16.362232784638383,80.31943171515469|-18.372371674164317,83.43158582640798|-20.311489634835258,86.62273098947678|-22.14461262803909,89.85609377674561|-23.896490600856566,93.19765633031801|-25.53339979617313,96.60696767976263|-27.063070616439813,100.12254137641649|-28.488648081761962,103.78528610926675|-29.778331008010497,107.54645547637602|-30.942622037767002,111.47495996053523|-31.95152016226762,115.51397654947516|-32.80866797590735,119.73211812295206|-33.486858278098815,124.06227007574186|-33.98257678066123,128.57116785317814|-34.27304876808886,133.17990028392123|-34.34804732039687,137.91355482600457|-34.19053759979979,142.79776551711302|-33.788689805715364,147.73758823197466|-33.12248489727676,152.7937677542324|', 2: '34.00069374375586,-130.03583418452314|34.3070000099521,-125.16691893340256|34.37547230320849,-120.37930544344802|34.219644836708575,-115.72548686095767|33.8599777210809,-111.25048787484094|33.307236654159695,-106.89130089454063|32.579218893589676,-102.68672977394559|31.69071108398145,-98.63657044455137|30.663892680279847,-94.76720076317056|29.49498481622457,-91.01231662520239|28.20247456939903,-87.39472628213446|26.796048279088225,-83.90476041381801|25.29620394685256,-80.5572008057606|23.686627724590036,-77.28791855670698|21.984668849769005,-74.1108962902788|20.209508481020038,-71.0367205896831|18.337433788359615,-68.00383542959851|16.385207987194672,-65.02251732177939|14.355346635752394,-62.078279068092414|12.266387624465171,-59.17870114389838|10.087160866120724,-56.262880710180255|7.8348695447113235,-53.336971029542006|'}})
#splits latlon data into a list
sat_df.dropna(inplace=True)
sat_df["latlon"]= sat_df["latlon"].str.split("|", expand=False)
sat_df
#need to write each entries latlon list as two lists (alternating lat and lon)
lat = []
lon = []
#for sat_df["latlon"]:
Let's go a step back from your str.split and make use of explode, which was added in pandas 0.25,
then merge it back based on the index.
df = sat_df['latlon'].str.split('|').explode().str.split(',', expand=True)
new_df = pd.merge(sat_df.drop('latlon', axis=1), df,
                  left_index=True, right_index=True).rename(columns={0: 'Lat', 1: 'Lon'})
print(new_df.drop('path', axis=1))
卫星编号 Lat Lon
0 38858 -28.566504816706743 -58.42623323318429
0 38858 -26.424915546197877 -58.03051668423269
0 38858 -24.24957760771616 -57.709052434729294
0 38858 -22.049419348341488 -57.45429550739338
0 38858 -19.82765114196696 -57.258197633964414
.. ... ... ...
2 16 14.355346635752394 -62.078279068092414
2 16 12.266387624465171 -59.17870114389838
2 16 10.087160866120724 -56.262880710180255
2 16 7.8348695447113235 -53.336971029542006
2 16 None
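The trailing None row comes from the trailing | in each latlon string, which split turns into an empty element; stripping the delimiter first avoids it (a small tweak to the same pipeline):

df = sat_df['latlon'].str.rstrip('|').str.split('|').explode().str.split(',', expand=True)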
For this purpose we are using the pandas library.
Initially, I have created a dataframe as you have mentioned.
Code:
import pandas as pd
latlon = [-58.562242560404705,52.82662430990185, -61.300361184039964,64.0645716165538, -62.8683906074927,76.96557954998904, -63.078154849236505,90.49660509514713, -61.95530287454162,103.39930010176977, -59.727998547544765,114.629246065411, -56.63116878989326,124.07501384844198, -52.9408690779807,131.75498199669985, -48.85803704806645,137.9821558270659, -44.56621244973711,143.03546934613863, -40.08092215592037,147.27807367743728, -35.5075351924213,150.86679792543603,]
# print(latlon)
data = pd.DataFrame({'id':[0],'name':['sat'],'latlon':[latlon]})
print(data)
Output:
id name latlon
0 0 sat [-58.562242560404705, 52.82662430990185, -61.3...
Now I've converted the latlon values to strings in order to iterate over them, because iterating a float value may raise an error. Then we pass the latitude and longitude values to the corresponding columns of the dataframe.
This code will work for any number of records or rows in your dataframe.
Code:
# splitting latlon and adding the values to the lat and lon columns
lats = []
lons = []
for i in range(len(data)):
    lat_lon = [str(x) for x in (data['latlon'].tolist()[i])]
    lat = []
    lon = []
    for j in range(len(lat_lon)):  # j, so the outer loop variable i is not shadowed
        if j % 2 == 0:
            lat.append(float(lat_lon[j]))
        else:
            lon.append(float(lat_lon[j]))
    lats.append(lat)
    lons.append(lon)
data = data.drop('latlon', axis=1)  # dropping latlon column
data.insert(2, 'lat', lats)  # adding lat column
data.insert(3, 'lon', lons)  # adding lon column
# print(data)
data  # displaying dataframe
Output:
id name lat lon
0 0 sat [-58.562242560404705, -61.300361184039964, -62... [52.82662430990185, 64.0645716165538, 76.96557...
I hope it would be helpful.
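As a side note, since the flattened list strictly alternates lat, lon, Python's extended slicing can replace the inner loop entirely (same assumptions as the code above, i.e. data['latlon'] holds lists):

lats = [row[::2] for row in data['latlon']]   # every even position is a latitude
lons = [row[1::2] for row in data['latlon']]  # every odd position is a longitude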
I currently have the following code which goes through each row of a dataframe and assigns the prior row value for a certain cell to the current row of a different cell.
Basically what I'm doing is finding out what 'yesterday's' value for a certain metric is compared to today's. As you would expect, this is quite slow (especially since I am working with dataframes that have hundreds of thousands of lines).
for index, row in symbol_df.iterrows():
    if index != 0:
        symbol_df.loc[index, 'yesterday_sma_20'] = symbol_df.loc[index-1]['sma_20']
        symbol_df.loc[index, 'yesterday_roc_20'] = symbol_df.loc[index-1]['roc_20']
        symbol_df.loc[index, 'yesterday_roc_100'] = symbol_df.loc[index-1]['roc_100']
        symbol_df.loc[index, 'yesterday_atr_10'] = symbol_df.loc[index-1]['atr_10']
        symbol_df.loc[index, 'yesterday_vsma_20'] = symbol_df.loc[index-1]['vsma_20']
Is there a way to turn this into a vectorized operation? Or really just any way to speed it up instead of having to iterate through each row individually?
I might be overlooking something, but I think using .shift() should do it.
import pandas as pd
df = pd.read_csv('test.csv')
print(df)
# Date SMA_20 ROC_20
# 0 7/22/2015 0.754889 0.807870
# 1 7/23/2015 0.376448 0.791365
# 2 7/22/2015 0.527232 0.407420
# 3 7/24/2015 0.616281 0.027188
# 4 7/22/2015 0.126556 0.274681
# 5 7/25/2015 0.570008 0.864057
# 6 7/22/2015 0.632057 0.746988
# 7 7/26/2015 0.373405 0.883944
# 8 7/22/2015 0.775591 0.453368
# 9 7/27/2015 0.678638 0.313374
df['y_SMA_20'] = df['SMA_20'].shift()
df['y_ROC_20'] = df['ROC_20'].shift()
print(df)
# Date SMA_20 ROC_20 y_SMA_20 y_ROC_20
# 0 7/22/2015 0.754889 0.807870 NaN NaN
# 1 7/23/2015 0.376448 0.791365 0.754889 0.807870
# 2 7/22/2015 0.527232 0.407420 0.376448 0.791365
# 3 7/24/2015 0.616281 0.027188 0.527232 0.407420
# 4 7/22/2015 0.126556 0.274681 0.616281 0.027188
# 5 7/25/2015 0.570008 0.864057 0.126556 0.274681
# 6 7/22/2015 0.632057 0.746988 0.570008 0.864057
# 7 7/26/2015 0.373405 0.883944 0.632057 0.746988
# 8 7/22/2015 0.775591 0.453368 0.373405 0.883944
# 9 7/27/2015 0.678638 0.313374 0.775591 0.453368
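Since the question shifts five columns, a compact variant of the same idea (column names taken from the question) shifts them all in one call and joins the result back:

cols = ['sma_20', 'roc_20', 'roc_100', 'atr_10', 'vsma_20']
symbol_df = symbol_df.join(symbol_df[cols].shift().add_prefix('yesterday_'))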