I've created a script to create multiple plots in one object. The results I am looking for are two plots one over the other such that each plot has different y axis scale but x axis is fixed - dates. However, only one of the plots (the top) is properly created, the bottom plot is visible but empty i.e the geom_line is not visible. Furthermore, the y-axis of the second plot does not match the range of values - min to max. I also tried using facet_grid (scales="free") but no change in the y-axis. The y-axis for the second graph has a range of 0 to 0.05.
I've limited the date range to the past few weeks. This is the code I am using:
df = df.set_index('date')
weekly = df.resample('w-mon',label='left',closed='left').sum()
data = weekly[-4:].reset_index()
data= pd.melt(data, id_vars=['date'])
pplot = ggplot(aes(x="date", y="value", color="variable", group="variable"), data)
#geom_line()
scale_x_date(labels = date_format('%d.%m'),
limits=(data.date.min() - dt.timedelta(2),
data.date.max() + dt.timedelta(2)))
#facet_grid("variable", scales="free_y")
theme_bw()
The dataframe sample (df), its a daily dataset containing values for each variable x and a, in this case 'date' is the index:
date x a
2016-08-01 100 20
2016-08-02 50 0
2016-08-03 24 18
2016-08-04 0 10
The dataframe sample (to_plot) - weekly overview:
date variable value
0 2016-08-01 x 200
1 2016-08-08 x 211
2 2016-08-15 x 104
3 2016-08-22 x 332
4 2016-08-01 a 8
5 2016-08-08 a 15
6 2016-08-15 a 22
7 2016-08-22 a 6
Sorry for not adding the df dataframe before.
Your calls to the plot directives geom_line(), scale_x_date(), etc. are standing on their own in your script; you do not connect them to your plot object. Thus, they do not have any effect on your plot.
In order to apply a plot directive to an existing plot object, use the graphics language and "add" them to your plot object by connecting them with a + operator.
The result (as intended):
The full script:
from __future__ import print_function
import sys
import pandas as pd
import datetime as dt
from ggplot import *
if __name__ == '__main__':
df = pd.DataFrame({
'date': ['2016-08-01', '2016-08-08', '2016-08-15', '2016-08-22'],
'x': [100, 50, 24, 0],
'a': [20, 0, 18, 10]
})
df['date'] = pd.to_datetime(df['date'])
data = pd.melt(df, id_vars=['date'])
plt = ggplot(data, aes(x='date', y='value', color='variable', group='variable')) +\
scale_x_date(
labels=date_format('%y-%m-%d'),
limits=(data.date.min() - dt.timedelta(2), data.date.max() + dt.timedelta(2))
) +\
geom_line() +\
facet_grid('variable', scales='free_y')
plt.show()
Related
I am trying to specify the x-axis interval when plotting DataFrames. I have several data files like,
0:0:0 29
0:5:0 85
0:10:0 141
0:15:0 198
0:20:0 251
0:25:0 308
0:30:0 363
0:35:0 413
Where first column is time in %H:%M:%S format but hours goes beyond 24 hours (till 48 hours).
When I read the file as below and plot it looks fine but I want to set the xticks interval to 8 hours.
df0 = pd.read_csv(fil, names=['Time', 'Count'], delim_whitespace=True, parse_dates=['Time'])
df0 = df0.set_index('Time')
ax = matplotlib.pyplot.gca()
mkfunc = lambda x, pos: '%1.1fM' % (x * 1e-6) if x >= 1e6 else '%1.1fK' % (x * 1e-3) if x >= 1e3 else '%1.1f' % x
mkformatter = matplotlib.ticker.FuncFormatter(mkfunc)
ax.yaxis.set_major_formatter(mkformatter)
ax.xaxis.set_major_locator(mdates.HourLocator(interval=8))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H'))
df0.plot(ax=ax, x_compat=True, color='blue')
plt.grid()
plt.savefig('figure2.pdf',dpi=300, bbox_inches = "tight")
I tried the above method as specified by many answers here but that resulted in the following warning,
Locator attempting to generate 1874 ticks ([-28.208333333333332, ..., 596.125]), which exceeds Locator.MAXTICKS (1000).
The figure also displayed many vertical lines.
I tried converting my time column specifically to timedelta and it still did not help.
I converted to timedelta as below.
custom_date_parser = lambda x: pd.to_timedelta(x.split('.')[0])
df0 = pd.read_csv(fil, names=['Time', 'Count'], delim_whitespace=True, parse_dates=['Time']), date_parser=custom_date_parser)
Could you please help me to identify the issue and set the xticks interval correctly?
The problem here is that a) matplotlib/pandas don't have much support for timedelta objects and b) you cannot use the HourLocator with your data because after conversion to a datetime object, your axis would be labelled 0, 8, 16, 0, 8, 16...
Instead, we can convert the timedelta imported by your converter into hours and plot the numerical values:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
import numpy as np
custom_date_parser = lambda x: pd.to_timedelta(x.split('.')[0])
df0 = pd.read_csv("test.txt", names=['Time', 'Count'], delim_whitespace=True, parse_dates=['Time'], date_parser=custom_date_parser)
#conversion into numerical hour value
df0["Time"] /= np.timedelta64(1, "h")
df0 = df0.set_index('Time')
ax = matplotlib.pyplot.gca()
df0.plot(ax=ax, x_compat=True, color='blue')
mkfunc = lambda x, pos: '%1.1fM' % (x * 1e-6) if x >= 1e6 else '%1.1fK' % (x * 1e-3) if x >= 1e3 else '%1.1f' % x
mkformatter = matplotlib.ticker.FuncFormatter(mkfunc)
ax.yaxis.set_major_formatter(mkformatter)
#set locator at regular hour intervals
ax.xaxis.set_major_locator(MultipleLocator(8))
ax.set_xlabel("Time (in h)")
plt.grid()
plt.show()
Sample output:
If for reasons unknown you actually need datetime objects, you can convert your timedelta values using an arbitrary offset, as you intend to ignore the day value:
df0["Time"] += pd.to_datetime("2000-01-01 00:00:00 UTC")
But I doubt this will be of advantage in your case.
As an aside - for debugging, it is useful not to use regularly spaced test data. In your example, you probably did not notice that the graph was plotted against the index (0, 1, 2...) and then relabeled with strings, imitating regularly spaced datetime objects. The following test data immediately reveal the problem.
0:0:0 29
0:5:0 85
0:10:0 141
3:15:0 98
5:20:0 251
17:25:0 308
27:30:0 63
35:35:0 413
I'd like to ask for suggestions how to calculate lenght of gap between two datasets in matplotlib made of pandas dataframe. Ideally, I would like to have these gap values written in the plot and also, if it is possible, include them into the dataframe.
Here is my simplified example of dataframe:
import pandas as pd
d = {'Mean-1': [0.195842, 0.295069, 0.321345, 0.773725], 'SEM-1': [0.001216, 0.002687, 0.005267, 0.029974], 'Mean-2': [0.143103, 0.250505, 0.305767, 0.960804],'SEM-2': [0.000959, 0.001368, 0.003722, 0.150025], 'Atom Number': [1, 3, 5, 7]}
df=pd.DataFrame(d)
df
Mean-1 SEM-1 Mean-2 SEM-2 Atom Number
0 0.195842 0.001216 0.143103 0.000959 1
1 0.295069 0.002687 0.250505 0.001368 3
2 0.321345 0.005267 0.305767 0.003722 5
3 0.773725 0.029974 0.960804 0.150025 7
Then I made plot, where we can see two lines representing Mean-1 and Mean-2, and then shaded area around each line representing standard error of the mean. This is done for the selected atom numbers.
import matplotlib.pyplot as plt
ax = df.plot(x='Atom Number', y=['Mean-1','Mean-2'])
y_1 = df['Mean-1']
y_2 = df['Mean-2']
x = df['Atom Number']
error_1 = df['SEM-1']
error_2 = df['SEM-1']
ax.fill_between(df['Atom Number'], y_1-error_1, y_1+error_1, alpha=0.2, edgecolor='#CC4F1B', facecolor='#FF9848')
ax.fill_between(df['Atom Number'], y_2-error_2, y_2+error_2, alpha=0.2, edgecolor='#3F7F4C', facecolor='#7EFF99')
plt.xticks(x)
What I would like to do further is to calculate the gap for each residue. The gap is the white space only, thus space where the lines as well as the shaded areas (SEMs) don't overlap.
And also would like to know if I can somehow print the gap values from the plot? And save them into column. Thank You for suggestions.
It's not a compact solution but you could try something like this (Check the order of things). Calculate all the position (y_i and upper and lower limits).
import numpy as np
df['y1_upper'] = y_1+error_1
df['y1_lower'] = y_1-error_1
df['y2_upper'] = y_2+error_2
df['y2_lower'] = y_2-error_2
which gives
Mean-1 SEM-1 Mean-2 SEM-2 Atom Number y1_upper y1_lower \
0 0.195842 0.001216 0.143103 0.000959 1 0.197058 0.194626
1 0.295069 0.002687 0.250505 0.001368 3 0.297756 0.292382
2 0.321345 0.005267 0.305767 0.003722 5 0.326612 0.316078
3 0.773725 0.029974 0.960804 0.150025 7 0.803699 0.743751
y2_upper y2_lower
0 0.144319 0.141887
1 0.253192 0.247818
2 0.311034 0.300500
3 0.990778 0.930830
The distances (gaps) are calculated differently depending on if y_1 is over y_2and vice versa. So use conditions on the upper and lower limits and use linalg.norm to compute the distance.
conditions = [
(df['y1_lower'] >= df['y2_upper']),
(df['y1_lower'] < df['y2_upper'])]
choices = [np.linalg.norm(df['y1_lower']-df['y2_upper']), np.linalg.norm(df['y2_lower']-df['y1_upper'])]
df['dist'] = np.select(conditions, choices)
This gives
Mean-1 SEM-1 Mean-2 SEM-2 Atom Number y1_upper y1_lower \
0 0.195842 0.001216 0.143103 0.000959 1 0.197058 0.194626
1 0.295069 0.002687 0.250505 0.001368 3 0.297756 0.292382
2 0.321345 0.005267 0.305767 0.003722 5 0.326612 0.316078
3 0.773725 0.029974 0.960804 0.150025 7 0.803699 0.743751
y2_upper y2_lower dist
0 0.144319 0.141887 0.255175
1 0.253192 0.247818 0.255175
2 0.311034 0.300500 0.255175
3 0.990778 0.930830 0.149605
As I said, check the order, but this is a possible solution.
IIUC, do you want something like this:
import matplotlib.pyplot as plt
ax = df.plot(x='Atom Number', y=['Mean-1','Mean-2'], figsize=(15,8))
y_1 = df['Mean-1']
y_2 = df['Mean-2']
x = df['Atom Number']
error_1 = df['SEM-1']
error_2 = df['SEM-1']
ax.fill_between(df['Atom Number'], y_1-error_1, y_1+error_1, alpha=0.2, edgecolor='#CC4F1B', facecolor='#FF9848')
ax.fill_between(df['Atom Number'], y_2-error_2, y_2+error_2, alpha=0.2, edgecolor='#3F7F4C', facecolor='#7EFF99')
ax.fill_between(df['Atom Number'], y_1+error_1, y_2-error_2, alpha=.2, edgecolor='k', facecolor='blue')
for i in range(len(x)):
gap = y_1[i]+error_1[i] - y_2[i]-error_2[i]
ylabel = min(y_1[i], y_2[i]) + abs(gap) / 2
_ = ax.annotate(f'{gap:0.4f}', xy=(x[i],ylabel), xytext=(x[i]-.14,y_1[i]+gap/abs(gap)*.2), arrowprops=dict(arrowstyle="-"))
plt.xticks(x);
Output:
I would like to plot a line with a range around it, like on this photo:
I posted an original question, but didn't specify the index being a datetime index. I thought it wouldn't be important, but I was wrong.
There is an answer that covers it with a numerical index:
Plotly: How to make a figure with multiple lines and shaded area for standard deviations?
and documentation here:
https://plotly.com/python/continuous-error-bars/
but the issue of datetime index is not covered.
Here is some test data:
timestamp price min mean max
1596267946298 100.0 100 100.5 101
1596267946299 101.0 100 100.5 101
1596267946300 102.0 98 99.5 102
1596267948301 99.0 98 99.5 102
1596267948302 98.0 98 99.5 102
1596267949303 99.0 98 995. 102
where I'd like the band to cover from min to max and the mean to be drawn in the center.
another option is to take the code from the first answer of the question posted above (Plotly: How to make a figure with multiple lines and shaded area for standard deviations?) and change the data generation to:
index = pd.date_range('1/1/2000', periods=25, freq='T')
df = pd.DataFrame(dict(A=np.random.uniform(low=-1, high=2, size=25).tolist(),
B=np.random.uniform(low=-4, high=3, size=25).tolist(),
C=np.random.uniform(low=-1, high=3, size=25).tolist()),
index=index)
this will work the same way but create a datetime index.
Compared to the setup in the linked question, what causes trouble is the fact that x+x[::-1] doesn't work very well with a datetime index. But if you set x=df.index in:
# add line and shaded area for each series and standards deviation
for i, col in enumerate(df):
new_col = next(line_color)
# x = list(df.index.values+1)
x = df.index
And then replace x+x[::-1] with x=x.append(x[::-1]):
# standard deviation area
fig.add_traces(go.Scatter(
#x+x[::-1],
x=x.append(x[::-1]),
Then things should work out perfectly well.
Plot:
Complete code:
# imports
import plotly.graph_objs as go
import plotly.express as px
import pandas as pd
import numpy as np
# sample data in a pandas dataframe
np.random.seed(1)
df=pd.DataFrame(dict(A=np.random.uniform(low=-1, high=2, size=25).tolist(),
B=np.random.uniform(low=-4, high=3, size=25).tolist(),
C=np.random.uniform(low=-1, high=3, size=25).tolist(),
))
df = df.cumsum()
# set daterange as index
df['dates'] = pd.date_range('2020', freq='D', periods=len(df))
df.set_index('dates', inplace=True)
# ---
# define colors as a list
colors = px.colors.qualitative.Plotly
# convert plotly hex colors to rgba to enable transparency adjustments
def hex_rgba(hex, transparency):
col_hex = hex.lstrip('#')
col_rgb = list(int(col_hex[i:i+2], 16) for i in (0, 2, 4))
col_rgb.extend([transparency])
areacol = tuple(col_rgb)
return areacol
rgba = [hex_rgba(c, transparency=0.2) for c in colors]
colCycle = ['rgba'+str(elem) for elem in rgba]
# Make sure the colors run in cycles if there are more lines than colors
def next_col(cols):
while True:
for col in cols:
yield col
line_color=next_col(cols=colCycle)
# plotly figure
fig = go.Figure()
# add line and shaded area for each series and standards deviation
for i, col in enumerate(df):
new_col = next(line_color)
x = df.index
y1 = df[col]
y1_upper = [(y + np.std(df[col])) for y in df[col]]
y1_lower = [(y - np.std(df[col])) for y in df[col]]
y1_lower = y1_lower[::-1]
# standard deviation area
fig.add_traces(go.Scatter(
#x+x[::-1],
x=x.append(x[::-1]),
y=y1_upper+y1_lower,
fill='tozerox',
fillcolor=new_col,
line=dict(color='rgba(255,255,255,0)'),
showlegend=False,
name=col))
# line trace
fig.add_traces(go.Scatter(x=df.index,
y=y1,
line=dict(color=new_col, width=2.5),
mode='lines',
name=col)
)
fig.update_layout(xaxis=dict(range=[df.index[1],df.index[-1]]))
fig.show()
I am doing a matplotlib.axes.Axes.stem graph where the x-axis is a dateline that shows days. Some of my data appear on certain days. While on other days, it has no data (because such info do not exist in my data).
Question 1: How do I make a timeline stem graph that will show my data, including days with no data? Is this possible? Is there some way to auto-scale the appearance of the data x-axis to handle such a situation?
Below is a sample data file called test.txt and my python script to read in its data to show a timeline stem plot for your consideration. output from this script is also given below.
Question2. Presentation question. How do I show a "-" symbol at each annotation? Also, how do I rotate the annotation by 30 degrees?
test.txt
No. Date
1 23/01/2020
2 24/01/2020
3 24/01/2020
4 26/01/2020
5 27/01/2020
6 28/01/2020
7 29/01/2020
8 29/01/2020
9 30/01/2020
10 30/01/2020
11 31/01/2020
12 31/01/2020
13 01/02/2020
14 01/02/2020
15 04/02/2020
16 04/02/2020
17 04/02/2020
18 05/02/2020
19 05/02/2020
20 05/02/2020
21 06/02/2020
22 07/02/2020
23 07/02/2020
24 07/02/2020
25 08/02/2020
26 08/02/2020
27 08/02/2020
28 08/02/2020
29 08/02/2020
30 09/02/2020
31 10/02/2020
32 10/02/2020
33 11/02/2020
34 11/02/2020
38 13/02/2020
39 13/02/2020
40 13/02/2020
41 13/02/2020
42 13/02/2020
43 13/02/2020
44 14/02/2020
45 14/02/2020
46 14/02/2020
47 14/02/2020
48 14/02/2020
49 14/02/2020
50 15/02/2020
51 15/02/2020
52 15/02/2020
53 15/02/2020
54 15/02/2020
57 18/02/2020
58 18/02/2020
59 18/02/2020
60 19/02/2020
61 21/02/2020
stem_plot.py
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.dates as mdates
from datetime import datetime
from pathlib import Path
#########################
#### DATA EXTRACTION ####
#########################
source = Path('./test.txt')
with source.open() as f:
lines = f.readlines()
#print( lines )
# Store source data in dictionary with date shown as mm-dd.
data={}
for line in lines[1:]:
case, cdate = line.strip().split()
cdate = datetime.strptime(cdate, "%d/%m/%Y").strftime('%m-%d')
data[case] = cdate
print( f'\ndata = {data}' )
# Collate data's y-axis for each date, i.e. history
history2={}
cdates = list(data.values())
sorted_dates = sorted( set( cdates ) )
for i in sorted_dates:
cases=[]
for case, date in data.items():
if i == date:
cases.append(case)
#print( i, cases)
history2[i] = cases
print( f'\nhistory2 = {history2}')
###########################
#### DATA PRESENTATION ####
###########################
# Create figure and plot a stem plot with the date
fig, ax = plt.subplots(figsize=(8.8, 5), constrained_layout=True)
ax.set(title="Test")
labels=list( history2.values() ) # For annotation
yy = [ len(i) for i in labels ] # y-axis
xx = list(history2.keys()) # x-axis
markerline, stemline, baseline = ax.stem(
xx, yy, linefmt="C1:", basefmt="k-", use_line_collection=True)
plt.setp(markerline, marker="None" )
# annotate stem lines
for ann_x, label in list(history2.items()):
print(ann_x, label)
each_count=1
for each in label:
ax.annotate( each, xy=(ann_x, each_count), xycoords='data')
each_count += 1
#print(f'each_count = {each_count}' )
# format xaxis
plt.setp( ax.get_xticklabels(), rotation=30 )
# remove top and right spines
for spine in ["top", "right"]:
ax.spines[spine].set_visible(False)
# show axis name
ax.get_yaxis().set_label_text(label='Y-axis')
ax.get_xaxis().set_label_text(label='X-axis')
plt.show()
Current output:
About your first question. Basically, you make a list of all days between the days you are using and use that. So add this to the beginning of your code:
import pandas as pd
alldays = pd.date_range(start="20200123",
end="20200221",
normalize=True)
dates = []
for i in alldays:
dates.append(f"{i.month:02}-{i.day:02}")
What this does is it gets a pandas data range between two dates and converts this range into a list of month-day strings.
Then modify this part of your code like this:
# Collate data's y-axis for each date, i.e. history
history2={}
cdates = list(data.values())
sorted_dates = sorted( set( cdates ) )
for i in dates: # This is the only change!
cases=[]
for case, date in data.items():
if i == date:
cases.append(case)
#print( i, cases)
history2[i] = cases
And this change would give you this:
About your second question, change your code to this:
# annotate stem lines
for ann_x, label in list(history2.items()):
print(ann_x, label)
each_count=1
for each in label:
ax.annotate(f"--{each}", xy=(ann_x, each_count), xycoords='data', rotation=30)
each_count += 1
I just changed the ax.annotate line. The two changes are:
added a "--" to each of your annotation labels,
added a rotation parameter. The rotation parameter does not appear directly in the documentation, but the documentation says you can use any of the methods for Text as kwargs, and they are here.
This would hopefully give you what you have asked for:
Adding to #SinanKurmus answer to my 1st Question:
Solution1:
A time-axis with a daily interval for the entire history of the given data can be obtained using matplotlib's methods, namely drange and num2date, and python. The use of pandas can be avoided here.
First, express the start and end date of the time axis as a python datetime object. Note, you need to add 1 more day to the end date else data from the last date would not be included. Next, use 1 day as your time interval using python's datetime.timedelta object. Next supply them to matplotlib.date.drange method that will return a NumPy array. Matplotlib's num2date method in turns converts that back to a python datetime object.
def get_time_axis( data ):
start = datetime.strptime(min(data.values()), "%Y-%m-%d")
end = datetime.strptime(max(data.values()), "%Y-%m-%d") + timedelta(days=1)
delta = timedelta(days=1)
time_axis_md = mdates.drange( start, end, delta )
time_axis_py = mdates.num2date( time_axis_md, tz=None ) # Add tz when required
return time_axis_py
Solution 2:
Apparently, Matplotlib also has a FAQ on how to skip dates where there is no data. I have included their sample code example below.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab
import matplotlib.ticker as ticker
r = mlab.csv2rec('../data/aapl.csv')
r.sort()
r = r[-30:] # get the last 30 days
N = len(r)
ind = np.arange(N) # the evenly spaced plot indices
def format_date(x, pos=None):
thisind = np.clip(int(x+0.5), 0, N-1)
return r.date[thisind].strftime('%Y-%m-%d')
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(ind, r.adj_close, 'o-')
ax.xaxis.set_major_formatter(ticker.FuncFormatter(format_date))
fig.autofmt_xdate()
plt.show()
I wrote this function and I would like it to accept more than one DF so that the final plot has multiple plotted lines for the predictions and the coef_DF gets completed with the rest of the coefficients.
The function extracts the needed features and target from a much larger dataset to make predictions using a linear regression func, it then makes the model, plots the line over the dataset and returns a df with all the coeficients.
(This is just an exercise.)
def prep_model_and_predict(feature, target, dataset, degree):
# part 1: make a df with relevant format and features
# degree >=1
poly_df=pd.DataFrame()
poly_df[str(target)] = dataset[str(target)]
poly_df['power_1'] = dataset[str(feature)]
#cehck if degree >1
if degree > 1:
for power in range(2, degree+1): #loop over reaming deg
name = 'power_'+str(power)
poly_df[name]=poly_df['power_1'].apply(lambda x: x**power)
#part 2: make model and predictions
features=list(poly_df.columns[1:])
X=poly_df[features]
y=poly_df[str(target)]
model=LinearRegression().fit(X,y)
predictions=model.predict(X)
#part 3: put weghts in a nice df
coef_df=pd.DataFrame()
coef_df=coef_df.append({"Name":'Intercept', 'Value':model.intercept_}, ignore_index=True)
coef_df=coef_df.append({'Name':'Power_1', 'Value':model.coef_[0]}, ignore_index=True)
if degree > 1:
for degree in range(2, degree+1):
name = 'Power_' + str(degree)
coef_df = coef_df.append({"Name":name,
'Value':'{:.3e}'.format(model.coef_[degree-1])}, ignore_index=True)
#prt 4: plot it
fig, ax = plt.subplots()
ax.plot(poly_df['power_1'], poly_df[str(target)], '.',
poly_df['power_1'], predictions, '-')
ax.set_xlabel('Square footage, living area')
ax.set_ylabel('Price per Sqft')
ax.ticklabel_format(axis='y', style='sci', scilimits=(-2,2))
return coef_df, ax
and this is the result:
Name Value
0 Intercept 506738
1 Power_1 2.71336e-77
2 Power_2 7.335e-39
3 Power_3 -1.850e-44
4 Power_4 8.437e-50
5 Power_5 0.000e+00
6 Power_6 0.000e+00
7 Power_7 3.645e-55
8 Power_8 1.504e-51
9 Power_9 5.760e-48
10 Power_10 1.958e-44
11 Power_11 5.394e-41
12 Power_12 9.404e-38
13 Power_13 -3.635e-41
14 Power_14 4.655e-45
15 Power_15 -1.972e-49
much appreciated!
I am not sure what exactly you are asking for. But I would suggest, next time try to ask a question that is easily produce-able and runnable by other people here in SO.
I have tried to answer your questions. Correct me if I misunderstand your question.
Pass arbitrary number of DataFrame to your function and plot it:
I have created three random dataframes for use:
df1 = pd.DataFrame(np.random.randint(0,10,size=(10, 2)), columns=list('AB'))
df2 = pd.DataFrame(np.random.randint(0,10,size=(10, 2)), columns=list('AB'))
df3 = pd.DataFrame(np.random.randint(0,10,size=(10, 2)), columns=list('AB'))
The functions that plots them:
def plot_me(*kwargs):
plt.figure(figsize=(13,9))
lab_ind = 0
for i in kwargs:
plt.plot(i['A'], i['B'], label = lab_ind)
lab_ind += 1
plt.legend()
plt.show()
The result plot you get:
Put the results of your model into a DataFrame
Regarding your second question, I am not going to concentrate too much on your exact details - for example the name of the columns of your dataframe, etc.
For this particular example I have generated two random arrays:
X = np.random.randint(0,50 ,size=(50, 2))
y = np.random.randint(0,2 ,size=(50, 1))
Then fit a LinearRegression model on this data.
model=LinearRegression().fit(X,y)
predictions=model.predict(X)
And then add it to a DataFrame:
res_df = pd.DataFrame(predictions,columns = ['Value'])
And if you print res_df
Value
0 0.420395
1 0.459389
2 0.369648
3 0.416058
4 0.644088
5 0.362072
6 0.363157
7 0.468943
. .
. .