Month, Year with Value Plot, Pandas and MatPlotLib - python

I am trying to plot a time graph with month and year combined on the x axis and values on the y axis. pandas reads my Excel data with decimal points, so the combined string cannot be converted with the %m-%Y format. Any ideas?
MY EXCEL DATA
How python reads my data
0 3.0-2015.0
1 5.0-2015.0
3 6.0-2017.0
...
68 nan-nan
69 nan-nan
70 nan-nan
71 nan-nan
# Code
import plotly
import plotly.graph_objects as go
import matplotlib.pyplot as plt
import pandas as pd
import math
# Set Directory
workbook1 = 'GAP Insurance - 1.xlsx'
workbook2 = 'GAP Insurance - 2.xlsx'
workbook3 = 'GAP Insurance - 3.xlsx'
df = pd.read_excel(workbook1, 'Sheet1',)
# Set x axis
df['Time'] = (df['Month']).astype(str)+ '-' + (df['Year']).astype(str)
df['Time'] = pd.to_datetime(df['Time'], format='%m-%Y').dt.strftime('%m-%Y')

You could try converting to "int" before converting to "str" in this line:
df['Time'] = (df['Month']).astype(str)+ '-' + (df['Year']).astype(str)
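This should ensure that what gets stored does not include decimal points. Note that the rows showing nan have to be dropped or filled first, since NaN cannot be cast to int.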
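As a minimal sketch of that suggestion, assuming the `Month` and `Year` columns arrive as floats with some empty rows (the sample frame below is invented for illustration and is not the asker's workbook): drop the rows with missing values, cast to `int`, and only then build the string.

```python
import pandas as pd

# Hypothetical sample mimicking the question's columns, including an empty row
df = pd.DataFrame({"Month": [3.0, 5.0, 6.0, None],
                   "Year": [2015.0, 2015.0, 2017.0, None]})

# Drop rows with a missing month or year; NaN cannot be cast to int
clean = df.dropna(subset=["Month", "Year"]).copy()

# Cast to int first so 3.0 becomes "3" rather than "3.0"
clean["Time"] = (clean["Month"].astype(int).astype(str)
                 + "-" + clean["Year"].astype(int).astype(str))

# Parse into real datetimes; format them only when displaying
clean["Time"] = pd.to_datetime(clean["Time"], format="%m-%Y")
print(clean["Time"].dt.strftime("%m-%Y").tolist())
```

Keeping `Time` as a datetime column, and formatting it only for display, lets the x axis sort chronologically rather than alphabetically.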


How is the X axis of a LinearRegression formatted and processed?

I am trying to build a regression line based on date and closure price of a stock.
I know the regression line can't be calculated on dates directly, so I transform the date into a numerical value.
I have been able to format the data as it requires.
Here is my sample code :
import datetime as dt
import csv
import pandas as pd
from sklearn.linear_model import LinearRegression
import numpy as np
source = 'C:\\path'
#gets file
df = pd.read_csv(source+'\\ABBN.SW.csv')
#change string to datetime
df['Date'] = pd.to_datetime(df['Date'])
#change datetime to numerical value
df['Date'] = df['Date'].map(dt.datetime.toordinal)
#build X and Y axis
x = np.array(df['Date']).reshape(-1, 1)
y = np.array(df['Close'])
model = LinearRegression()
model.fit(x,y)
print(model.intercept_)
print(model.coef_)
print(x)
[[734623]
[734625]
[734626]
...
[738272]
[738273]
[738274]]
print(y)
[16.54000092 16.61000061 16.5 28.82999992 28.88999939 ... 29.60000038]
intercept : -1824.9528261991056 #completely off the charts, it should be around 18-20
coef : [0.00250826]
The question here is: what am I missing on the X axis (date) to produce a correct intercept?
The coef looks right, though.
See the example on excel (old data)
References used :
https://realpython.com/linear-regression-in-python/
https://medium.com/python-data-analysis/linear-regression-on-time-series-data-like-stock-price-514a42d5ac8a
https://www.alpharithms.com/predicting-stock-prices-with-linear-regression-214618/
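I would suggest applying min-max normalisation to your dates. The intercept is the fitted value at x = 0, and ordinal dates are in the hundreds of thousands, so an intercept far outside the price range is expected and the fit itself is not wrong. Rescaling the dates so that they start at 0 gives the desired "small" intercept from the linear regression.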
import datetime as dt
import csv
import pandas as pd
from sklearn.linear_model import LinearRegression
import numpy as np
df = pd.read_csv("data.csv")
df['Date'] = pd.to_datetime(df["Date"])
df['Date_ordinal'] = df["Date"].map(dt.datetime.toordinal)
df["Date_normalized"] = df["Date"].apply(lambda x: len(df["Date"]) * (x - df["Date"].min()) / (df["Date"].max() - df["Date"].min()))
print(df)
def apply_linear(df,label_dates):
    x = np.array(df[label_dates]).reshape(-1, 1)
    y = np.array(df['Close'])
    model = LinearRegression()
    model.fit(x,y)
    print("intercep = ",model.intercept_)
    print("coef = ",model.coef_[0])
print("Without normalization")
apply_linear(df,"Date_ordinal")
print("With normalization")
apply_linear(df,"Date_normalized")
Here are the results of running it on an invented data set that is representative for your purpose:
PS C:\Users\ruben\PycharmProjects\stackOverFlowQnA> python .\main.py
Date Close Date_ordinal Date_normalized
0 2022-04-01 111 738246 0.000000
1 2022-04-02 112 738247 0.818182
2 2022-04-03 120 738248 1.636364
3 2022-04-04 115 738249 2.454545
4 2022-04-05 105 738250 3.272727
5 2022-04-09 95 738254 6.545455
6 2022-04-10 100 738255 7.363636
7 2022-04-11 105 738256 8.181818
8 2022-04-12 112 738257 9.000000
Without normalization
intercep = 743632.8904761908
coef = -1.0071428571428576
With normalization
intercep = 113.70476190476191
coef = -1.2309523809523817
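The reason the intercept looks wrong can be checked directly: it is the fitted value at x = 0, and ordinal 0 lies roughly two thousand years before the data. With the asker's numbers, evaluating the line at the first date gives about -1824.95 + 0.00250826 * 734623 ≈ 17.7, close to the expected range. The sketch below reuses the invented data set from the table above and, as a swapped-in simplification, fits with `np.polyfit` instead of `LinearRegression`. It measures dates as days since the first observation, a plain shift rather than the rescaling used above, so the slope stays in price units per day.

```python
import datetime as dt
import numpy as np

# Same invented data as in the table above
dates = [dt.date(2022, 4, d) for d in (1, 2, 3, 4, 5, 9, 10, 11, 12)]
close = np.array([111, 112, 120, 115, 105, 95, 100, 105, 112], dtype=float)

ordinals = np.array([d.toordinal() for d in dates], dtype=float)
days_since_start = ordinals - ordinals[0]  # first date becomes x = 0

# Degree-1 fit returns (slope, intercept)
slope_raw, intercept_raw = np.polyfit(ordinals, close, 1)
slope_shift, intercept_shift = np.polyfit(days_since_start, close, 1)

print("raw:    ", slope_raw, intercept_raw)
print("shifted:", slope_shift, intercept_shift)
```

Shifting x leaves the slope unchanged and turns the intercept into the fitted price on the first date, which is usually the number people expect to see.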

pandas DataFrame plot - impossible to set xtick intervals for timedelta values

I am trying to specify the x-axis interval when plotting DataFrames. I have several data files like,
0:0:0 29
0:5:0 85
0:10:0 141
0:15:0 198
0:20:0 251
0:25:0 308
0:30:0 363
0:35:0 413
where the first column is time in %H:%M:%S format, but the hours go beyond 24 (up to 48 hours).
When I read the file as below and plot it looks fine but I want to set the xticks interval to 8 hours.
df0 = pd.read_csv(fil, names=['Time', 'Count'], delim_whitespace=True, parse_dates=['Time'])
df0 = df0.set_index('Time')
ax = matplotlib.pyplot.gca()
mkfunc = lambda x, pos: '%1.1fM' % (x * 1e-6) if x >= 1e6 else '%1.1fK' % (x * 1e-3) if x >= 1e3 else '%1.1f' % x
mkformatter = matplotlib.ticker.FuncFormatter(mkfunc)
ax.yaxis.set_major_formatter(mkformatter)
ax.xaxis.set_major_locator(mdates.HourLocator(interval=8))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H'))
df0.plot(ax=ax, x_compat=True, color='blue')
plt.grid()
plt.savefig('figure2.pdf',dpi=300, bbox_inches = "tight")
I tried the above method as specified by many answers here but that resulted in the following warning,
Locator attempting to generate 1874 ticks ([-28.208333333333332, ..., 596.125]), which exceeds Locator.MAXTICKS (1000).
The figure also displayed many vertical lines.
I tried converting my time column specifically to timedelta and it still did not help.
I converted to timedelta as below.
custom_date_parser = lambda x: pd.to_timedelta(x.split('.')[0])
df0 = pd.read_csv(fil, names=['Time', 'Count'], delim_whitespace=True, parse_dates=['Time'], date_parser=custom_date_parser)
Could you please help me to identify the issue and set the xticks interval correctly?
The problem here is that a) matplotlib/pandas don't have much support for timedelta objects and b) you cannot use the HourLocator with your data because after conversion to a datetime object, your axis would be labelled 0, 8, 16, 0, 8, 16...
Instead, we can convert the timedelta imported by your converter into hours and plot the numerical values:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter, MultipleLocator
import numpy as np
custom_date_parser = lambda x: pd.to_timedelta(x.split('.')[0])
df0 = pd.read_csv("test.txt", names=['Time', 'Count'], delim_whitespace=True, parse_dates=['Time'], date_parser=custom_date_parser)
#conversion into numerical hour value
df0["Time"] /= np.timedelta64(1, "h")
df0 = df0.set_index('Time')
ax = plt.gca()
df0.plot(ax=ax, x_compat=True, color='blue')
#format large counts on the y axis as K or M
mkfunc = lambda x, pos: '%1.1fM' % (x * 1e-6) if x >= 1e6 else '%1.1fK' % (x * 1e-3) if x >= 1e3 else '%1.1f' % x
mkformatter = FuncFormatter(mkfunc)
ax.yaxis.set_major_formatter(mkformatter)
#set locator at regular hour intervals
ax.xaxis.set_major_locator(MultipleLocator(8))
ax.set_xlabel("Time (in h)")
plt.grid()
plt.show()
Sample output:
If for reasons unknown you actually need datetime objects, you can convert your timedelta values using an arbitrary offset, as you intend to ignore the day value:
df0["Time"] += pd.to_datetime("2000-01-01 00:00:00 UTC")
But I doubt this will be of advantage in your case.
As an aside - for debugging, it is useful not to use regularly spaced test data. In your example, you probably did not notice that the graph was plotted against the index (0, 1, 2...) and then relabeled with strings, imitating regularly spaced datetime objects. The following test data immediately reveal the problem.
0:0:0 29
0:5:0 85
0:10:0 141
3:15:0 98
5:20:0 251
17:25:0 308
27:30:0 63
35:35:0 413
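If parsing these strings through pandas is a concern, the conversion to numerical hours can also be sketched in plain Python. The helper name `to_hours` is hypothetical, and the sample values are taken from the test data above.

```python
def to_hours(text):
    """Convert an 'H:M:S' string, where H may exceed 24, to fractional hours."""
    hours, minutes, seconds = (float(part) for part in text.split(":"))
    return hours + minutes / 60 + seconds / 3600

# Irregularly spaced times from the test data above
samples = ["0:0:0", "0:5:0", "3:15:0", "27:30:0", "35:35:0"]
hours = [to_hours(s) for s in samples]
print(hours)
```

The resulting floats can be used as the x values directly, and a `MultipleLocator(8)` then places a tick every 8 hours regardless of how far past 24 the data runs.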

how to create a stacked bar chart indicating time spent on nest per day

I have some data of an owl being present in the nest box. In a previous question you helped me visualize when the owl is in the box:
In addition I created a plot of the hours per day spent in the box with the code below (probably this can be done more efficiently):
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# raw data indicating time spent in box (each row represents start and end time)
time = pd.DatetimeIndex(["2021-12-01 18:08","2021-12-01 18:11",
"2021-12-02 05:27","2021-12-02 05:29",
"2021-12-02 22:40","2021-12-02 22:43",
"2021-12-03 19:24","2021-12-03 19:27",
"2021-12-06 18:04","2021-12-06 18:06",
"2021-12-07 05:28","2021-12-07 05:30",
"2021-12-10 03:05","2021-12-10 03:10",
"2021-12-10 07:11","2021-12-10 07:13",
"2021-12-10 20:40","2021-12-10 20:41",
"2021-12-12 19:42","2021-12-12 19:45",
"2021-12-13 04:13","2021-12-13 04:17",
"2021-12-15 04:28","2021-12-15 04:30",
"2021-12-15 05:21","2021-12-15 05:25",
"2021-12-15 17:40","2021-12-15 17:44",
"2021-12-15 22:31","2021-12-15 22:37",
"2021-12-16 04:24","2021-12-16 04:28",
"2021-12-16 19:58","2021-12-16 20:09",
"2021-12-17 17:42","2021-12-17 18:04",
"2021-12-17 22:19","2021-12-17 22:26",
"2021-12-18 05:41","2021-12-18 05:44",
"2021-12-19 07:40","2021-12-19 16:55",
"2021-12-19 20:39","2021-12-19 20:52",
"2021-12-19 21:56","2021-12-19 23:17",
"2021-12-21 04:53","2021-12-21 04:59",
"2021-12-21 05:37","2021-12-21 05:39",
"2021-12-22 08:06","2021-12-22 17:22",
"2021-12-22 20:04","2021-12-22 21:24",
"2021-12-22 21:44","2021-12-22 22:47",
"2021-12-23 02:20","2021-12-23 06:17",
"2021-12-23 08:07","2021-12-23 16:54",
"2021-12-23 19:36","2021-12-23 23:59:59",
"2021-12-24 00:00","2021-12-24 00:28",
"2021-12-24 07:53","2021-12-24 17:00",
])
# create dataframe with column indicating presence (1) or absence (0)
time_df = pd.DataFrame(data={'present':[1,0]*int(len(time)/2)}, index=time)
# calculate interval length and add to time_df
time_df['interval'] = time_df.index.to_series().diff().astype('timedelta64[m]')
# add column with day to time_df
time_df['day'] = time.day
#select only intervals where owl is present
timeinbox = time_df.iloc[1::2, :]
interval = timeinbox.interval
day = timeinbox.day
# sum multiple intervals per day
interval_tot = [interval.iloc[0]]
day_tot = [day.iloc[0]]
for i in range(1, len(day)):
    if day.iloc[i] == day.iloc[i-1]:
        interval_tot[-1] += interval.iloc[i]
    else:
        day_tot.append(day.iloc[i])
        interval_tot.append(interval.iloc[i])
# recalculate to hours
for i in range(len(interval_tot)):
    interval_tot[i] = interval_tot[i]/(60)
plt.figure(figsize=(15, 5))
plt.grid(zorder=0)
plt.bar(day_tot, interval_tot, color='g', zorder=3)
plt.xlim([1,31])
plt.xlabel('day in December')
plt.ylabel('hours per day in nest box')
plt.xticks(np.arange(1,31,1))
plt.ylim([0, 24])
Now I would like to combine all data in one plot by making a stacked bar chart, where each day is represented by a bar and each bar indicating for each of the 24*60 minutes whether the owl is present or not. Is this possible from the current data structure?
The data seems to have been entered by hand, so I changed its format to one start/end pair per row. The approach: for each stay, build a 1-minute index from the start time up to the end time and flag those minutes with 1. To get the minutes when the owl is absent, build a 1-minute index running from the first date to the day after the last date, and reindex the data frame onto it, filling the gaps with 0. That is the data for the graph. For plotting, take the rows for each day, build a color list with red for present and green for absent, and stack one bar of height 1 per minute. Since this draws one bar per minute, it may be necessary to consider grouping the data into hourly units.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import timedelta
import io
data = '''
start_time,end_time
"2021-12-01 18:08","2021-12-01 18:11"
"2021-12-02 05:27","2021-12-02 05:29"
"2021-12-02 22:40","2021-12-02 22:43"
"2021-12-03 19:24","2021-12-03 19:27"
"2021-12-06 18:04","2021-12-06 18:06"
"2021-12-07 05:28","2021-12-07 05:30"
"2021-12-10 03:05","2021-12-10 03:10"
"2021-12-10 07:11","2021-12-10 07:13"
"2021-12-10 20:40","2021-12-10 20:41"
"2021-12-12 19:42","2021-12-12 19:45"
"2021-12-13 04:13","2021-12-13 04:17"
"2021-12-15 04:28","2021-12-15 04:30"
"2021-12-15 05:21","2021-12-15 05:25"
"2021-12-15 17:40","2021-12-15 17:44"
"2021-12-15 22:31","2021-12-15 22:37"
"2021-12-16 04:24","2021-12-16 04:28"
"2021-12-16 19:58","2021-12-16 20:09"
"2021-12-17 17:42","2021-12-17 18:04"
"2021-12-17 22:19","2021-12-17 22:26"
"2021-12-18 05:41","2021-12-18 05:44"
"2021-12-19 07:40","2021-12-19 16:55"
"2021-12-19 20:39","2021-12-19 20:52"
"2021-12-19 21:56","2021-12-19 23:17"
"2021-12-21 04:53","2021-12-21 04:59"
"2021-12-21 05:37","2021-12-21 05:39"
"2021-12-22 08:06","2021-12-22 17:22"
"2021-12-22 20:04","2021-12-22 21:24"
"2021-12-22 21:44","2021-12-22 22:47"
"2021-12-23 02:20","2021-12-23 06:17"
"2021-12-23 08:07","2021-12-23 16:54"
"2021-12-23 19:36","2021-12-24 00:00"
"2021-12-24 00:00","2021-12-24 00:28"
"2021-12-24 07:53","2021-12-24 17:00"
'''
df = pd.read_csv(io.StringIO(data), sep=',')
df['start_time'] = pd.to_datetime(df['start_time'])
df['end_time'] = pd.to_datetime(df['end_time'])
# build a per-minute frame flagged 1 while the owl is in the box
frames = []
for idx, row in df.iterrows():
    rng = pd.date_range(row['start_time'], row['end_time']-timedelta(minutes=1), freq='1min')
    frames.append(pd.DataFrame({'present':[1]*len(rng)}, index=rng))
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
time_df = pd.concat(frames)
# fill every remaining minute between the first and last day with 0 (absent)
date_add = pd.date_range(time_df.index[0].date(), time_df.index[-1].date()+timedelta(days=1), freq='1min')
time_df = time_df.reindex(date_add, fill_value=0)
time_df['day'] = time_df.index.day
fig, ax = plt.subplots(figsize=(8,15))
ax.set_yticks(np.arange(0,1500,60))
ax.set_ylim(0,1440)
ax.set_xticks(np.arange(1,25,1))
days = time_df['day'].unique()
for d in days:
    day_df = time_df.query('day == @d')
    colors = ['r' if p == 1 else 'g' for p in day_df['present']]
    # one unit-height bar per minute, stacked from the bottom up
    for i in range(len(day_df)):
        ax.bar(d, height=1, width=0.5, bottom=i+1, color=colors[i])
plt.show()
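One way to reduce the number of bars, offered as a sketch rather than as part of the answer above, is to draw one bar per stay instead of one per minute. The hypothetical helper `split_by_day` cuts each stay at midnight and returns the start minute of the day and the length in minutes for each piece.

```python
from datetime import datetime, timedelta

def split_by_day(start, end):
    """Yield (date, start_minute, length_minutes) pieces of [start, end), cut at midnight."""
    while start < end:
        next_midnight = datetime.combine(start.date() + timedelta(days=1),
                                         datetime.min.time())
        piece_end = min(end, next_midnight)
        start_minute = start.hour * 60 + start.minute
        length = (piece_end - start).total_seconds() / 60
        yield start.date(), start_minute, length
        start = piece_end

# A stay that runs past midnight, as on 23 to 24 December in the data
pieces = list(split_by_day(datetime(2021, 12, 23, 19, 36),
                           datetime(2021, 12, 24, 0, 28)))
print(pieces)
```

Each piece could then be drawn with a single call such as `ax.bar(day.day, height=length, bottom=start_minute, width=0.5, color='r')` on top of one full-height green bar per day, which keeps the picture the same while issuing far fewer drawing calls.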

Find intersection points for two stock timeseries

Background
I am trying to find the intersection points of two series. In this stock example, I would like to find the intersection points of SMA20 and SMA50. The Simple Moving Average (SMA) is a commonly used stock indicator; combined with intersections and other strategies, it can help with trading decisions. Below is the code example.
Code
You can run the following with jupyter.
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
datafile = 'output_XAG_D1_20200101_to_20200601.csv'
#This creates a dataframe from the CSV file:
data = pd.read_csv(datafile, index_col = 'Date')
#This selects the 'BidClose' column
close = data['BidClose']
#This converts the date strings in the index into pandas datetime format:
close.index = pd.to_datetime(close.index)
close
sma20 = close.rolling(window=20).mean()
sma50 = close.rolling(window=50).mean()
priceSma_df = pd.DataFrame({
'BidClose' : close,
'SMA 20' : sma20,
'SMA 50' : sma50
})
priceSma_df.plot()
plt.show()
Sample Data
This is the data file used in example output_XAG_D1_20200101_to_20200601.csv
Date,BidOpen,BidHigh,BidLow,BidClose,AskOpen,AskHigh,AskLow,AskClose,Volume
01.01.2020 22:00:00,1520.15,1531.26,1518.35,1527.78,1520.65,1531.75,1518.73,1531.73,205667
01.02.2020 22:00:00,1527.78,1553.43,1526.72,1551.06,1531.73,1553.77,1528.17,1551.53,457713
01.05.2020 22:00:00,1551.06,1588.16,1551.06,1564.4,1551.53,1590.51,1551.53,1568.32,540496
01.06.2020 22:00:00,1564.4,1577.18,1555.2,1571.62,1568.32,1577.59,1555.54,1575.56,466430
01.07.2020 22:00:00,1571.62,1611.27,1552.13,1554.79,1575.56,1611.74,1552.48,1558.72,987671
01.08.2020 22:00:00,1554.79,1561.24,1540.08,1549.78,1558.72,1561.58,1540.5,1553.73,473799
01.09.2020 22:00:00,1549.78,1563.0,1545.62,1562.44,1553.73,1563.41,1545.96,1562.95,362002
01.12.2020 22:00:00,1562.44,1562.44,1545.38,1545.46,1562.95,1563.06,1546.71,1549.25,280809
01.13.2020 22:00:00,1545.46,1548.77,1535.78,1545.1,1549.25,1549.25,1536.19,1548.87,378200
01.14.2020 22:00:00,1545.1,1558.04,1543.79,1554.89,1548.87,1558.83,1546.31,1558.75,309719
01.15.2020 22:00:00,1554.89,1557.98,1547.91,1551.18,1558.75,1558.75,1548.24,1554.91,253944
01.16.2020 22:00:00,1551.18,1561.12,1549.28,1556.68,1554.91,1561.55,1549.59,1557.15,239186
01.19.2020 22:00:00,1556.68,1562.69,1556.25,1560.77,1557.15,1562.97,1556.61,1561.17,92020
01.20.2020 22:00:00,1560.77,1568.49,1546.21,1556.8,1561.17,1568.87,1546.56,1558.5,364753
01.21.2020 22:00:00,1556.8,1559.18,1550.07,1558.59,1558.5,1559.47,1550.42,1559.31,238468
01.22.2020 22:00:00,1558.59,1567.83,1551.8,1562.45,1559.31,1568.16,1552.11,1564.17,365518
01.23.2020 22:00:00,1562.45,1575.77,1556.44,1570.39,1564.17,1576.12,1556.76,1570.87,368529
01.26.2020 22:00:00,1570.39,1588.41,1570.39,1580.51,1570.87,1588.97,1570.87,1582.33,510524
01.27.2020 22:00:00,1580.51,1582.93,1565.31,1567.15,1582.33,1583.3,1565.79,1570.62,384205
01.28.2020 22:00:00,1567.15,1577.93,1563.27,1576.7,1570.62,1578.22,1563.61,1577.25,328766
01.29.2020 22:00:00,1576.7,1585.87,1572.19,1573.23,1577.25,1586.18,1572.44,1575.33,522371
01.30.2020 22:00:00,1573.23,1589.98,1570.82,1589.75,1575.33,1590.37,1571.14,1590.31,482710
02.02.2020 22:00:00,1589.75,1593.09,1568.65,1575.62,1590.31,1595.82,1569.85,1578.35,488585
02.03.2020 22:00:00,1575.62,1579.56,1548.95,1552.55,1578.35,1579.87,1549.31,1556.4,393037
02.04.2020 22:00:00,1552.55,1562.3,1547.34,1554.62,1556.4,1562.64,1547.72,1556.42,473172
02.05.2020 22:00:00,1554.62,1568.14,1552.39,1565.08,1556.42,1568.51,1552.73,1567.0,365580
02.06.2020 22:00:00,1565.08,1574.02,1559.82,1570.11,1567.0,1574.33,1560.7,1570.55,424269
02.09.2020 22:00:00,1570.11,1576.9,1567.9,1571.05,1570.55,1577.25,1568.21,1573.34,326606
02.10.2020 22:00:00,1571.05,1573.92,1561.92,1566.12,1573.34,1574.27,1562.24,1568.12,310037
02.11.2020 22:00:00,1566.12,1570.39,1561.45,1564.26,1568.12,1570.71,1561.91,1567.02,269032
02.12.2020 22:00:00,1564.26,1578.24,1564.26,1574.5,1567.02,1578.52,1565.81,1576.63,368438
02.13.2020 22:00:00,1574.5,1584.87,1572.44,1584.49,1576.63,1585.29,1573.28,1584.91,250788
02.16.2020 22:00:00,1584.49,1584.49,1578.7,1580.79,1584.91,1584.91,1579.06,1581.31,101499
02.17.2020 22:00:00,1580.79,1604.97,1580.79,1601.06,1581.31,1605.33,1581.31,1603.08,321542
02.18.2020 22:00:00,1601.06,1612.83,1599.41,1611.27,1603.08,1613.4,1599.77,1613.34,357488
02.19.2020 22:00:00,1611.27,1623.62,1603.74,1618.48,1613.34,1623.98,1604.12,1621.27,535148
02.20.2020 22:00:00,1618.48,1649.26,1618.48,1643.42,1621.27,1649.52,1619.19,1643.87,590262
02.23.2020 22:00:00,1643.42,1689.22,1643.42,1658.62,1643.87,1689.55,1643.87,1659.07,1016570
02.24.2020 22:00:00,1658.62,1660.76,1624.9,1633.19,1659.07,1661.52,1625.5,1636.23,1222774
02.25.2020 22:00:00,1633.19,1654.88,1624.74,1640.4,1636.23,1655.23,1625.11,1642.59,1004692
02.26.2020 22:00:00,1640.4,1660.3,1635.15,1643.99,1642.59,1660.6,1635.6,1646.42,1084115
02.27.2020 22:00:00,1643.99,1649.39,1562.74,1584.95,1646.42,1649.84,1563.22,1585.58,1174015
03.01.2020 22:00:00,1584.95,1610.94,1575.29,1586.55,1585.58,1611.26,1575.88,1590.33,1115889
03.02.2020 22:00:00,1586.55,1649.16,1586.55,1640.19,1590.33,1649.6,1589.43,1644.16,889364
03.03.2020 22:00:00,1640.19,1652.81,1631.73,1635.95,1644.16,1653.51,1632.1,1639.05,589438
03.04.2020 22:00:00,1635.95,1674.51,1634.91,1669.36,1639.05,1674.9,1635.3,1672.83,643444
03.05.2020 22:00:00,1669.36,1692.1,1641.61,1673.89,1672.83,1692.65,1642.75,1674.46,1005737
03.08.2020 21:00:00,1673.89,1703.19,1656.98,1678.31,1674.46,1703.52,1657.88,1679.2,910166
03.09.2020 21:00:00,1678.31,1680.43,1641.37,1648.71,1679.2,1681.18,1641.94,1649.75,943377
03.10.2020 21:00:00,1648.71,1671.15,1632.9,1634.42,1649.75,1671.56,1633.31,1637.07,793816
03.11.2020 21:00:00,1634.42,1650.28,1560.5,1578.29,1637.07,1650.8,1560.92,1580.01,1009172
03.12.2020 21:00:00,1578.29,1597.85,1504.34,1528.99,1580.01,1598.36,1505.14,1530.09,1052940
03.15.2020 21:00:00,1528.99,1575.2,1451.08,1509.12,1530.09,1576.05,1451.49,1512.94,1196812
03.16.2020 21:00:00,1509.12,1553.91,1465.4,1528.57,1512.94,1554.21,1466.1,1529.43,1079729
03.17.2020 21:00:00,1528.57,1545.93,1472.49,1485.85,1529.43,1546.74,1472.99,1486.75,976857
03.18.2020 21:00:00,1485.85,1500.68,1463.49,1471.89,1486.75,1501.6,1464.64,1474.16,833803
03.19.2020 21:00:00,1471.89,1516.07,1454.46,1497.01,1474.16,1516.57,1455.93,1497.82,721471
03.22.2020 21:00:00,1497.01,1560.86,1482.21,1551.45,1497.82,1561.65,1483.22,1553.09,707830
03.23.2020 21:00:00,1551.45,1631.23,1551.45,1621.05,1553.09,1638.75,1553.09,1631.35,164862
03.24.2020 21:00:00,1621.05,1636.23,1588.82,1615.77,1631.35,1650.03,1601.29,1618.47,205272
03.25.2020 21:00:00,1615.77,1642.96,1587.7,1628.31,1618.47,1649.81,1599.87,1633.29,152804
03.26.2020 21:00:00,1628.31,1630.48,1606.76,1617.5,1633.29,1638.48,1616.9,1622.8,307278
03.29.2020 21:00:00,1617.5,1631.48,1602.51,1620.91,1622.8,1643.86,1612.55,1623.77,291653
03.30.2020 21:00:00,1620.91,1626.55,1573.37,1574.9,1623.77,1627.31,1575.24,1579.1,371507
03.31.2020 21:00:00,1574.9,1600.41,1560.13,1590.13,1579.1,1603.42,1570.75,1592.43,412780
04.01.2020 21:00:00,1590.13,1619.76,1582.42,1612.07,1592.43,1621.1,1583.37,1614.49,704652
04.02.2020 21:00:00,1612.07,1625.21,1605.39,1618.63,1614.49,1626.83,1607.69,1621.37,409490
04.05.2020 21:00:00,1618.63,1668.35,1608.59,1657.77,1621.37,1670.98,1609.7,1663.43,381690
04.06.2020 21:00:00,1657.77,1671.95,1641.84,1644.84,1663.43,1677.53,1643.4,1650.46,286313
04.07.2020 21:00:00,1644.84,1656.39,1640.1,1644.06,1650.46,1657.43,1643.46,1646.66,219464
04.08.2020 21:00:00,1644.06,1689.66,1643.05,1682.16,1646.66,1691.13,1644.83,1686.74,300111
04.12.2020 21:00:00,1682.16,1722.25,1677.35,1709.16,1686.74,1725.48,1680.49,1718.28,280905
04.13.2020 21:00:00,1709.16,1747.04,1708.56,1726.18,1718.28,1748.88,1709.36,1729.72,435098
04.14.2020 21:00:00,1726.18,1730.53,1706.67,1714.35,1729.72,1732.97,1708.95,1717.25,419065
04.15.2020 21:00:00,1714.35,1738.65,1707.83,1715.99,1717.25,1740.35,1708.93,1720.09,615105
04.16.2020 21:00:00,1715.99,1718.46,1677.16,1683.2,1720.09,1720.09,1680.55,1684.97,587875
04.19.2020 21:00:00,1683.2,1702.49,1671.1,1694.71,1684.97,1703.46,1672.02,1697.29,412116
04.20.2020 21:00:00,1694.71,1697.66,1659.42,1683.4,1697.29,1698.44,1662.3,1686.58,502893
04.21.2020 21:00:00,1683.4,1718.21,1679.61,1713.67,1686.58,1719.19,1680.71,1716.91,647622
04.22.2020 21:00:00,1713.67,1738.59,1706.93,1729.89,1716.91,1739.47,1707.72,1731.83,751833
04.23.2020 21:00:00,1729.89,1736.31,1710.56,1726.74,1731.83,1736.98,1711.03,1727.71,608827
04.26.2020 21:00:00,1726.74,1727.55,1705.99,1713.36,1727.71,1728.55,1706.72,1715.29,698217
04.27.2020 21:00:00,1713.36,1716.52,1691.41,1707.66,1715.29,1718.02,1692.51,1710.22,749906
04.28.2020 21:00:00,1707.66,1717.42,1697.65,1711.58,1710.22,1718.57,1698.4,1715.42,630720
04.29.2020 21:00:00,1711.58,1721.94,1681.36,1684.97,1715.42,1722.79,1681.91,1687.92,631609
04.30.2020 21:00:00,1684.97,1705.87,1669.62,1699.92,1687.92,1706.33,1670.81,1701.66,764742
05.03.2020 21:00:00,1699.92,1714.75,1691.46,1700.42,1701.66,1715.83,1692.96,1702.17,355859
05.04.2020 21:00:00,1700.42,1711.64,1688.55,1703.04,1702.17,1712.55,1690.42,1706.71,415576
05.05.2020 21:00:00,1703.04,1708.1,1681.6,1685.18,1706.71,1708.71,1682.33,1688.33,346814
05.06.2020 21:00:00,1685.18,1721.95,1683.59,1715.17,1688.33,1722.53,1684.8,1716.91,379103
05.07.2020 21:00:00,1715.17,1723.54,1701.49,1704.06,1716.91,1724.42,1702.1,1705.25,409225
05.10.2020 21:00:00,1704.06,1712.02,1691.75,1696.68,1705.25,1713.03,1692.45,1697.58,438010
05.11.2020 21:00:00,1696.68,1710.94,1693.56,1701.46,1697.58,1711.31,1693.92,1703.32,369988
05.12.2020 21:00:00,1701.46,1718.11,1698.86,1716.09,1703.32,1718.69,1699.4,1718.63,518107
05.13.2020 21:00:00,1716.09,1736.16,1710.79,1727.71,1718.63,1736.55,1711.33,1731.38,447401
05.14.2020 21:00:00,1727.71,1751.56,1727.71,1743.94,1731.38,1752.1,1728.89,1744.96,561909
05.17.2020 21:00:00,1743.94,1765.3,1727.4,1731.73,1744.96,1765.92,1728.08,1732.99,495628
05.18.2020 21:00:00,1731.73,1747.76,1725.05,1743.52,1732.99,1748.24,1726.29,1746.9,596250
05.19.2020 21:00:00,1743.52,1753.8,1742.04,1747.22,1746.9,1754.28,1742.62,1748.48,497960
05.20.2020 21:00:00,1747.22,1748.7,1717.14,1726.56,1748.48,1751.18,1717.39,1727.82,557122
05.21.2020 21:00:00,1726.56,1740.06,1723.33,1735.67,1727.82,1740.7,1724.41,1736.73,336867
05.24.2020 21:00:00,1735.67,1735.67,1721.61,1727.88,1736.73,1736.73,1721.83,1730.25,164650
05.25.2020 21:00:00,1727.88,1735.39,1708.48,1710.1,1730.25,1735.99,1709.34,1712.21,404914
05.26.2020 21:00:00,1710.1,1715.93,1693.57,1708.36,1712.21,1716.3,1694.04,1709.85,436519
05.27.2020 21:00:00,1708.36,1727.42,1703.41,1717.28,1709.85,1727.93,1705.85,1721.0,416306
05.28.2020 21:00:00,1717.28,1737.58,1712.55,1731.2,1721.0,1738.26,1713.24,1732.07,399698
05.31.2020 21:00:00,1731.2,1744.51,1726.98,1738.73,1732.07,1745.11,1727.93,1742.56,365219
Problem
This is the result of this code. I'm looking for a way to find the intersections of the SMA20 (yellow) and SMA50 (green) lines, so that I can get alerts whenever these lines cross.
Solution
Locate the intersections by finding where the sign of the difference between the two series changes, then mark them on the plot.
import numpy as np
g20=sma20.values
g50=sma50.values
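# np.sign(...) returns -1, 0 or 1 (and nan for nan input)
# np.diff(...) returns the difference a[n+1] - a[n], which is non-zero where the sign changes
# np.argwhere(...) returns the indices of the non-zero entries, i.e. the crossing points
# note: nan entries (the rolling-window warm-up rows) also count as non-zero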
idx20 = np.argwhere(np.diff(np.sign(g20 - g50))).flatten()
priceSma_df.plot()
plt.scatter(close.index[idx20], sma50.iloc[idx20], color='red')
plt.show()
import numpy as np
f=close.values
g20=sma20.values
g50=sma50.values
idx20 = np.argwhere(np.diff(np.sign(f - g20))).flatten()
idx50 = np.argwhere(np.diff(np.sign(f - g50))).flatten()
priceSma_df = pd.DataFrame({
'BidClose' : close,
'SMA 20' : sma20,
'SMA 50' : sma50
})
priceSma_df.plot()
plt.scatter(close.index[idx20], sma20.iloc[idx20], color='orange')
plt.scatter(close.index[idx50], sma50.iloc[idx50], color='green')
plt.show()
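To also know the direction of each crossing, which is what an alert would need, the sign of the step can be kept instead of discarded. The sketch below is a variation on the approach above, not the answer's own code. The function name `find_crossings` and the toy arrays are invented for illustration, and comparisons against NaN evaluate to False, so the warm-up rows of the rolling means are ignored.

```python
import numpy as np

def find_crossings(a, b):
    """Return (up, down): positions i where a crosses b between i and i + 1."""
    sign = np.sign(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    step = np.diff(sign)              # +2 for an upward cross, -2 for a downward one
    up = np.flatnonzero(step > 0)     # a moves from below b to above b
    down = np.flatnonzero(step < 0)   # a moves from above b to below b
    return up, down

# Toy series with a leading NaN, like the start of a rolling mean
a = [np.nan, 1.0, 2.0, 3.0, 2.0, 1.0]
b = [np.nan, 2.0, 2.5, 2.5, 2.5, 2.5]
up, down = find_crossings(a, b)
print("crosses above after index:", up)
print("crosses below after index:", down)
```

With the series from the question this would be called as `find_crossings(sma20.values, sma50.values)`. One limitation: if the two series are exactly equal at some point, the sign passes through 0 and the same crossing is reported at two consecutive positions.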

ggplot multiple plots in one object

I've created a script to create multiple plots in one object. The result I am looking for is two plots, one above the other, such that each plot has a different y axis scale but the x axis (dates) is shared. However, only one of the plots (the top one) is properly created; the bottom plot is visible but empty, i.e. the geom_line is not visible. Furthermore, the y axis of the second plot does not match the range of values (min to max). I also tried using facet_grid (scales="free"), but the y axis did not change. The y axis for the second graph has a range of 0 to 0.05.
I've limited the date range to the past few weeks. This is the code I am using:
df = df.set_index('date')
weekly = df.resample('w-mon',label='left',closed='left').sum()
data = weekly[-4:].reset_index()
data= pd.melt(data, id_vars=['date'])
pplot = ggplot(aes(x="date", y="value", color="variable", group="variable"), data)
#geom_line()
scale_x_date(labels = date_format('%d.%m'),
limits=(data.date.min() - dt.timedelta(2),
data.date.max() + dt.timedelta(2)))
#facet_grid("variable", scales="free_y")
theme_bw()
The dataframe sample (df): it is a daily dataset containing values for each variable x and a; in this case 'date' is the index:
date x a
2016-08-01 100 20
2016-08-02 50 0
2016-08-03 24 18
2016-08-04 0 10
The dataframe sample (to_plot) - weekly overview:
date variable value
0 2016-08-01 x 200
1 2016-08-08 x 211
2 2016-08-15 x 104
3 2016-08-22 x 332
4 2016-08-01 a 8
5 2016-08-08 a 15
6 2016-08-15 a 22
7 2016-08-22 a 6
Sorry for not adding the df dataframe before.
Your calls to the plot directives geom_line(), scale_x_date(), etc. are standing on their own in your script; you do not connect them to your plot object. Thus, they do not have any effect on your plot.
In order to apply a plot directive to an existing plot object, use the graphics language and "add" them to your plot object by connecting them with a + operator.
The result (as intended):
The full script:
from __future__ import print_function
import sys
import pandas as pd
import datetime as dt
from ggplot import *
if __name__ == '__main__':
    df = pd.DataFrame({
        'date': ['2016-08-01', '2016-08-08', '2016-08-15', '2016-08-22'],
        'x': [100, 50, 24, 0],
        'a': [20, 0, 18, 10]
    })
    df['date'] = pd.to_datetime(df['date'])
    data = pd.melt(df, id_vars=['date'])
    plt = ggplot(data, aes(x='date', y='value', color='variable', group='variable')) +\
        scale_x_date(
            labels=date_format('%y-%m-%d'),
            limits=(data.date.min() - dt.timedelta(2), data.date.max() + dt.timedelta(2))
        ) +\
        geom_line() +\
        facet_grid('variable', scales='free_y')
    plt.show()
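Why the standalone calls have no effect can be illustrated without any plotting library. The toy `Plot` class and the two layer functions below are hypothetical stand-ins, not the ggplot API; they only mimic the idea that `+` returns a new plot object carrying the added directive.

```python
class Plot:
    """Toy stand-in for a plot object; not the real ggplot class."""

    def __init__(self, layers=()):
        self.layers = tuple(layers)

    def __add__(self, layer):
        # Return a new plot carrying the extra layer, leaving the original untouched
        return Plot(self.layers + (layer,))

def geom_line():
    return "geom_line"

def theme_bw():
    return "theme_bw"

standalone = Plot()
geom_line()   # evaluated, then discarded: the plot never sees it
theme_bw()

chained = Plot() + geom_line() + theme_bw()
print(standalone.layers)
print(chained.layers)
```

The first plot ends up with no layers, because a bare call on its own line produces a value that nothing uses; only the chained version accumulates them.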
