I am trying to plot the following sample table:
Time Period  HR    Legal  Fin   Leadership  Market  UX    CX
01/04/2021   6.39  5.81   7.53  7.16  6.78  7.25  7.40  7.47  6.20
01/07/2021   6.95  6.25   7.46  7.16  7.05  7.51  7.70  7.83  6.69
01/10/2021   7.41  6.43   7.65  7.50  7.25  7.74  8.00  8.04  6.90
01/01/2022   7.51  6.51   7.74  7.52  8.00  7.84  8.10  8.04  7.05
01/04/2022   7.70  6.91   7.86  7.59  7.69  7.81  8.13  8.47  7.30
01/07/2022   7.80  6.60   7.50  7.50  7.80  7.50  7.70  7.90  7.15
(Note: the full table has 11 columns, but not all of them could be included here.)
This is the code I am using:
import pandas as pd
from datetime import date, timedelta
import datetime
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_pacf
from statsmodels.tsa.arima_model import ARIMA
import statsmodels.api as sm
import warnings
from plotly.offline import download_plotlyjs, init_notebook_mode, plot
from plotly.graph_objs import *
init_notebook_mode()
# set filepath
data = pd.read_csv(inputfilepath, parse_dates=["Time Period"], index_col="Time Period")
print(data.shape)
# convert column from object to datetime
#data["Time Period"] = pd.to_datetime(data['Time Period'], format="%d/%m/%Y")
data
fig, axes = plt.subplots(nrows=3, ncols=4, dpi=110, figsize=(10,6))
#axes.axis("off")
for i, ax in enumerate(axes.flatten()):
    macro_data = data[data.columns[i]]
    ax.plot(data, color='red', linewidth=1)
    # Decorations
    ax.set_title(data.columns[i], fontsize=10)
    ax.xaxis.set_ticks_position('none')
    ax.yaxis.set_ticks_position('none')
    ax.spines["top"].set_alpha(0)
    ax.tick_params(labelsize=6)
plt.tight_layout();
I end up with 11 duplicated subplots, each with 11 lines across it. I want one line per subplot so I can compare each series individually. The plot is also clearly wrong, since the lines do not change gradient between time periods. Here is what I see:
And here is what I would ideally want:
For reference, I am using the guide found here:
https://github.com/nachi-hebbar/Multivariate-Time-Series-Forecasting/blob/main/VAR_Model%20(1).ipynb
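A minimal sketch of the one-series-per-subplot version. Because the sample table has more values per row than column names, this assumes the first two value columns are HR and Legal and reconstructs only those two. Two things seem to matter: plotting only `data[column]` in each subplot instead of the whole `data` frame, and parsing the dates day-first (as the commented-out `format="%d/%m/%Y"` line does), since a month-first parse would read 01/04/2021 as 4 January 2021 and place every sample date in January. Subplots without a column to show are hidden.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Assumption: the first two value columns of the sample table are HR and Legal
csv_text = """Time Period,HR,Legal
01/04/2021,6.39,5.81
01/07/2021,6.95,6.25
01/10/2021,7.41,6.43
01/01/2022,7.51,6.51
01/04/2022,7.70,6.91
01/07/2022,7.80,6.60
"""

data = pd.read_csv(io.StringIO(csv_text), index_col="Time Period")
# Parse day-first: "01/04/2021" is 1 April 2021, not 4 January 2021
data.index = pd.to_datetime(data.index, format="%d/%m/%Y")

fig, axes = plt.subplots(nrows=3, ncols=4, dpi=110, figsize=(10, 6))
flat_axes = axes.flatten()

for ax, column in zip(flat_axes, data.columns):
    # Plot only this column's series, not the whole DataFrame
    ax.plot(data.index, data[column], color='red', linewidth=1)
    ax.set_title(column, fontsize=10)
    ax.tick_params(labelsize=6)

# Hide any subplots left over once every column has been drawn
for ax in flat_axes[len(data.columns):]:
    ax.set_visible(False)

plt.tight_layout()
```

Using `zip` instead of indexing `data.columns[i]` also avoids an IndexError when the grid has more axes (12) than there are columns (11).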
I'm learning Plotly line animation and came across this question.
My df:
   Date        1Mo   2Mo   3Mo   6Mo   1Yr   2Yr
0  2023-02-12  4.66  4.77  4.79  4.89  4.50  4.19
1  2023-02-11  4.66  4.77  4.77  4.90  4.88  4.49
2  2023-02-10  4.64  4.69  4.72  4.88  4.88  4.79
3  2023-02-09  4.62  4.68  4.71  4.82  4.88  4.89
4  2023-02-08  4.60  4.61  4.72  4.83  4.89  4.89
How do I animate this dataframe so that each frame has
x = [1Mo, 2Mo, 3Mo, 6Mo, 1Yr, 2Yr],
y = the actual values on a given date, e.g. y=df[df['Date']=="2023-02-08"], and animation_frame = df['Date']?
I tried
plot = px.line(df, x=df.columns[1:], y=df['Date'], title="Treasury Yields", animation_frame=df_treasuries_yield['Date'])
No joy :(
I think the problem is that you cannot pass multiple columns to the animation_frame parameter. We can get around this by converting your df from wide to long format using pd.melt: for your data, we want to take all of the values from [1Mo, 2Mo, 3Mo, 6Mo, 1Yr, 2Yr] and put them into a new column called "value", with a column called "variable" that tells us which column each value came from.
df_long = pd.melt(df, id_vars=['Date'], value_vars=['1Mo', '2Mo', '3Mo', '6Mo', '1Yr', '2Yr'])
This will look like the following:
Date variable value
0 2023-02-12 1Mo 4.66
1 2023-02-11 1Mo 4.66
2 2023-02-10 1Mo 4.64
3 2023-02-09 1Mo 4.62
4 2023-02-08 1Mo 4.60
...
28 2023-02-09 2Yr 4.89
29 2023-02-08 2Yr 4.89
Now we can pass the argument animation_frame='Date' to px.line:
fig = px.line(df_long, x="variable", y="value", animation_frame="Date", title="Yields")
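One detail worth checking: the dates in the question are in descending order, and Plotly Express generally orders animation frames by the order in which values first appear, so the animation would likely play from newest to oldest. A minimal pandas-only sketch that sorts by date before melting (ISO date strings sort chronologically as plain strings):

```python
import pandas as pd

# The yield table from the question
df = pd.DataFrame({
    'Date': ['2023-02-12', '2023-02-11', '2023-02-10', '2023-02-09', '2023-02-08'],
    '1Mo': [4.66, 4.66, 4.64, 4.62, 4.60],
    '2Mo': [4.77, 4.77, 4.69, 4.68, 4.61],
    '3Mo': [4.79, 4.77, 4.72, 4.71, 4.72],
    '6Mo': [4.89, 4.90, 4.88, 4.82, 4.83],
    '1Yr': [4.50, 4.88, 4.88, 4.88, 4.89],
    '2Yr': [4.19, 4.49, 4.79, 4.89, 4.89],
})

# Sort oldest first so the frames appear in chronological order
df = df.sort_values('Date')

# Wide -> long: one row per (Date, maturity) pair
df_long = pd.melt(df, id_vars=['Date'],
                  value_vars=['1Mo', '2Mo', '3Mo', '6Mo', '1Yr', '2Yr'])
print(df_long.head(6))
```

The sorted df_long can then be passed to the same px.line(df_long, x="variable", y="value", animation_frame="Date", title="Yields") call. melt keeps the value_vars order, so the maturities should appear on the x-axis from 1Mo to 2Yr.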
I want to plot the data shown below and compare it to a function that gives the theoretical curve. I am able to plot the data with its uncertainty, but I am struggling to plot the mathematical function that gives the theoretical curve.
amplitude uncertainty position
5.2 0.429343685 0
12.2 1.836833144 1
21.4 0.672431409 2
30.2 0.927812481 3
38.2 1.163321108 4
44.2 1.340998136 5
48.4 1.506088975 6
51 1.543016526 7
51.2 1.587229032 8
49.8 1.507327436 9
46.2 1.400355669 10
40.6 1.254401849 11
32.5 0.995301462 12
24.2 0.753044487 13
14 0.58 14
7 0.29 15
Here is my code so far:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
data = pd.read_excel("Verdier_6.xlsx")
verdier = data.values
frekvens = verdier [:,3]
effektresonans = verdier [:,0]
usikkerhet = verdier [:,1]
x = np.arange(0,15,0.1)
p= 28.2
r=0.8156
v= 343.8
f= 1117
y=p*np.sqrt(1+r**2+2*r*np.cos(((2*np.pi)/(v/f))*x))
plt.plot(x,y)
plt.plot(frekvens, effektresonans)
plt.errorbar(frekvens, effektresonans, usikkerhet, fmt = "o")
plt.title("")
plt.xlabel("Posisjon, X [cm]")
plt.ylabel("Amplitude, U [mV] ")
plt.grid()
plt.show()
Here is an image of the plot with only the experimental data shown above:
Here is how my experimental and theoretical plots look together:
And here is how the experimental and theoretical plots should look:
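One likely cause, offered as an assumption since the images are not available: `v/f` is a wavelength in metres (about 0.308 m, if `v` is the speed of sound in m/s and `f` the frequency in Hz), while `x` is a position in cm. The cosine would then complete a full cycle roughly every 0.31 units on the cm axis, which with a 0.1 step draws a jagged oscillating band rather than a smooth curve. This sketch converts the wavelength to cm, uses the measured values from the table directly, and evaluates the theory on a fine grid:

```python
import matplotlib.pyplot as plt
import numpy as np

# Measured data from the table in the question
position = np.arange(16)  # cm
amplitude = np.array([5.2, 12.2, 21.4, 30.2, 38.2, 44.2, 48.4, 51,
                      51.2, 49.8, 46.2, 40.6, 32.5, 24.2, 14, 7])  # mV
uncertainty = np.array([0.429343685, 1.836833144, 0.672431409, 0.927812481,
                        1.163321108, 1.340998136, 1.506088975, 1.543016526,
                        1.587229032, 1.507327436, 1.400355669, 1.254401849,
                        0.995301462, 0.753044487, 0.58, 0.29])

p = 28.2
r = 0.8156
v = 343.8  # assumed: speed of sound in m/s
f = 1117   # assumed: frequency in Hz

# v / f is a wavelength in metres; convert to cm to match the x axis
wavelength_cm = 100 * v / f

x = np.linspace(0, 15, 300)  # positions in cm
y = p * np.sqrt(1 + r**2 + 2 * r * np.cos(2 * np.pi / wavelength_cm * x))

plt.errorbar(position, amplitude, uncertainty, fmt="o", label="Experiment")
plt.plot(x, y, label="Theory")
plt.xlabel("Posisjon, X [cm]")
plt.ylabel("Amplitude, U [mV]")
plt.legend()
plt.grid()
```

With the units matched, this curve starts at its maximum p(1 + r) ≈ 51.2 at x = 0 and falls toward p(1 - r) ≈ 5.2 near x ≈ 15.4 cm, so it spans the same range as the measurements. If it still looks shifted relative to the data, the phase inside the cosine is the next thing to check.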
I have a set of data with different variables and I want to plot "Concentration at 10℃" against "Mean pH at 10", "concentration at 17.5C" against "Mean pH at 17.5" and "concentration at 25C" against "Mean pH at 25", all in the same graph.
import pandas as pd
data = {'sample': ['24h', '1W', '2W', '3W', '5W'],
        'Concentration at 10°C': [4.36, 4.84, 7.20, 4.14, 1.09], 'Mean pH 10': [8.2, 7.9, 8.1, 8.3, 8.2],
        'concentration at 17.5°C': [4.4, 5.85, 5.35, 3.98, 1], 'Mean pH 17.5': [8.15, 8.2, 8.35, 8.4, 8.45],
        'concentration at 25°C': [3.27, 4.31, 5.74, 4.18, 2.4], 'Mean pH 25': [8.4, 8.25, 8.2, 8.15, 8.35]}
df = pd.DataFrame(data)
sample Concentration at 10°C Mean pH 10 concentration at 17.5°C Mean pH 17.5 concentration at 25°C Mean pH 25
0 24h 4.36 8.2 4.40 8.15 3.27 8.40
1 1W 4.84 7.9 5.85 8.20 4.31 8.25
2 2W 7.20 8.1 5.35 8.35 5.74 8.20
3 3W 4.14 8.3 3.98 8.40 4.18 8.15
4 5W 1.09 8.2 1.00 8.45 2.40 8.35
I have 15 samples, each with two features (a time point, e.g. 24h, and a temperature, e.g. 10C). I want to plot all my data together so that each time point has its own marker (e.g. all 24h samples shown as squares) and each temperature has its own color (e.g. 10C in blue). That way, when a reader sees a blue square in my figure, they know which sample it is.
I've managed to plot all my data in one plot, but I am lost on how to change the marker and color for different samples.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# Load Data
df = pd.read_excel(r'C:\Users\fatemeha\Desktop\MiCoDy\WP1-spring-LR-April2021.xlsx', sheet_name=7)
print(df)
df.plot.scatter(x=['Mean concentration at 10℃','Mean concentration at 17.5C', 'Mean concentration at 25C'], y=["Mean pH at 10", "Mean pH at 17.5", "Mean pH at 25"],
                rot=0, color=["blue"], xlabel='.', ylabel=".", marker=4,
                title="pH change in Lake Rot Mesocosms in Spring 2021")
The easiest way is to reshape and clean the dataframe into a tidy (long) form.
Unfortunately, some data (temperature) has been encoded into the column names. This is bad practice; avoid it.
pandas.wide_to_long can be used to convert the dataframe to a tidy format, which will also extract the temperatures encoded in the column names.
suffix='.+' takes all characters after the specified sep character, so '°C' is stripped from the column names first; a stricter pattern such as suffix=r'\d+(?:\.\d+)?' would match only the int and float values.
The column-name suffixes must be in a standard form, because they are extracted with the regular expression. If the column-name formats differ, fix them before using .wide_to_long, e.g. with .rename or .str.replace.
All stubnames must also be standardized.
Once the dataframe is in tidy format, it can easily be plotted with seaborn.scatterplot, a high-level API for matplotlib.
This can also be done with seaborn.relplot using kind='scatter'.
Tested with pandas 1.3.1, seaborn 0.11.1, and matplotlib 3.4.2.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# using the dataframe from the OP
# remove °C so column suffixes will have a standard form
df.columns = df.columns.str.replace('°C', '').str.replace('concen', 'Concen')
# convert the dataframe to a tidy format
dfl = pd.wide_to_long(df, stubnames=['Concentration at', 'Mean pH'], j='°C', sep=' ', i='sample', suffix='.+').reset_index()
# rename a column
dfl.rename({'Concentration at': 'Mean Concentration'}, axis=1, inplace=True)
# display(dfl.head())
sample °C Mean Concentration Mean pH
0 24h 10.0 4.36 8.2
1 1W 10.0 4.84 7.9
2 2W 10.0 7.20 8.1
3 3W 10.0 4.14 8.3
4 5W 10.0 1.09 8.2
# plot
fig, ax = plt.subplots(figsize=(8, 7))
p = sns.scatterplot(data=dfl, x='Mean Concentration', y='Mean pH', hue='°C', style='sample', palette='tab10', ax=ax, s=50)
p.set(title='Measurement Data')
_ = p.legend(bbox_to_anchor=(1, 1.02), loc='upper left')
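If seaborn is not available, the same tidy dataframe can be plotted with plain matplotlib by choosing the markers and colors explicitly. A minimal sketch; the `colors` and `markers` dictionaries are hypothetical style choices (one color per temperature, one marker per time point, as the question asks):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Data from the question
data = {'sample': ['24h', '1W', '2W', '3W', '5W'],
        'Concentration at 10°C': [4.36, 4.84, 7.20, 4.14, 1.09], 'Mean pH 10': [8.2, 7.9, 8.1, 8.3, 8.2],
        'concentration at 17.5°C': [4.4, 5.85, 5.35, 3.98, 1], 'Mean pH 17.5': [8.15, 8.2, 8.35, 8.4, 8.45],
        'concentration at 25°C': [3.27, 4.31, 5.74, 4.18, 2.4], 'Mean pH 25': [8.4, 8.25, 8.2, 8.15, 8.35]}
df = pd.DataFrame(data)

# Standardize the column names, then reshape to long form
df.columns = df.columns.str.replace('°C', '', regex=False).str.replace('concen', 'Concen', regex=False)
dfl = pd.wide_to_long(df, stubnames=['Concentration at', 'Mean pH'], j='°C',
                      sep=' ', i='sample', suffix='.+').reset_index()
dfl = dfl.rename(columns={'Concentration at': 'Mean Concentration'})

# Hypothetical style choices: one color per temperature, one marker per time point
colors = {10.0: 'tab:blue', 17.5: 'tab:orange', 25.0: 'tab:green'}
markers = {'24h': 's', '1W': 'o', '2W': '^', '3W': 'D', '5W': 'v'}

fig, ax = plt.subplots(figsize=(8, 7))
# One scatter call per (temperature, time point) pair
for (temp, sample), group in dfl.groupby(['°C', 'sample']):
    ax.scatter(group['Mean Concentration'], group['Mean pH'],
               color=colors[temp], marker=markers[sample], s=50,
               label=f'{sample}, {temp:g}°C')
ax.set(xlabel='Mean Concentration', ylabel='Mean pH', title='Measurement Data')
ax.legend(bbox_to_anchor=(1, 1.02), loc='upper left')
```

This gives one legend entry per sample; seaborn's `hue`/`style` approach above groups the legend by temperature and time point instead, which is usually easier to read.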
Hello everyone,
I am trying to show the track of a tropical cyclone as a scatter plot over a spatial plot of SST. However, the scatter does not show all the points; mostly, the points over land are not plotted. I cannot work out what I am doing wrong in my code, so any help would be much appreciated. I am posting part of my code along with the lat/lon values and the plot I generated. Here is the lat/lon data as text:
Lat Lon grade
10.4 87 D
10.9 86.3 D
10.9 86.3 CS
11.1 86.1 CS
11.4 86 CS
11.5 86 SCS
12 86 VSCS
12.5 86.1 VSCS
13.2 86.3 ESCS
13.4 86.2 SuCS
14 86.3 SuCS
14.9 86.5 SuCS
15.6 86.7 SuCS
16.5 86.9 ESCS
17.4 87 ESCS
18.4 87.2 ESCS
19.1 87.5 ESCS
20.6 88 ESCS
21.9 88.4 VSCS
23.3 89 SCS
24.2 89.3 CS
25 89.6 DD
25.4 89.6 D
cs=map.contourf(x,y,plt_data,clevels,cmap=plt.cm.jet)#,clevels,cmap=plt.cm.jet)
map.colorbar(cs)
df = pd.read_excel('E:/bst_trc.xls',sheet_name='Sheet6')
colors = {'SuCS': 'red', 'ESCS': 'blue', 'SCS': 'green', 'D': 'black', 'VSCS': 'orange', 'DD': 'cyan',
          'CS': 'magenta'}
df['x'], df['y'] = map(list(df['Lon']), list(df['Lat']))
for grade in list(df['grade'].unique()):
    ax.scatter(df[df['grade'] == grade]['x'],
               df[df['grade'] == grade]['y'],
               s=50,
               label=grade,
               facecolors=colors[grade])
plt.plot(df['x'], df['y'], 'k-', lw=1.5)
ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.05),
          fancybox=True, shadow=True, ncol=7)
#plt.savefig('E:/Super_cyclone/Amphan/sst_bfr.tif', bbox_inches='tight', dpi=300)
plt.show()
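import matplotlib.pyplot as plt
# df is the track table from the question (Lat, Lon and grade columns)
grade_unique = df['grade'].unique()
grade_unique
# map each grade to a number between 0 and 1 so a colormap can color it
colormap = {a: b/len(grade_unique) for b, a in enumerate(grade_unique)}
df['color'] = df['grade'].replace(colormap)
# black track line, with the points colored by grade
plt.plot(df['Lon'], df['Lat'], c='k')
plt.scatter(df['Lon'], df['Lat'], c=df['color'], cmap='magma')
# dashed line at latitude 25, the upper y-limit of the original plot
plt.hlines(y=25, xmin=0, xmax=100, linestyles='dashed')
plt.ylim(5, 30)
plt.xlim(75, 95)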
Plotting the data you supplied gives this.
I rescaled the axes to get all the data points onto the plot and drew a line at your upper y-limit. You are missing one data point because of the axis scaling.
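To check which track points fall outside the map before plotting, the coordinates can be compared against the map extent. A minimal sketch using the lat/lon values from the question; the limits are assumptions taken from the axis limits and dashed line above, so replace them with the llcrnrlat/urcrnrlat/llcrnrlon/urcrnrlon values of your own Basemap:

```python
import pandas as pd

# Track coordinates from the question
track = pd.DataFrame({
    'Lat': [10.4, 10.9, 10.9, 11.1, 11.4, 11.5, 12, 12.5, 13.2, 13.4, 14, 14.9,
            15.6, 16.5, 17.4, 18.4, 19.1, 20.6, 21.9, 23.3, 24.2, 25, 25.4],
    'Lon': [87, 86.3, 86.3, 86.1, 86, 86, 86, 86.1, 86.3, 86.2, 86.3, 86.5,
            86.7, 86.9, 87, 87.2, 87.5, 88, 88.4, 89, 89.3, 89.6, 89.6],
})

# Assumed (hypothetical) map extent: replace with your own Basemap corners
lat_min, lat_max = 5, 25
lon_min, lon_max = 75, 95

# between() includes both endpoints by default
inside = track['Lat'].between(lat_min, lat_max) & track['Lon'].between(lon_min, lon_max)
print(track[~inside])
```

With an upper latitude of 25, only the last point (25.4, 89.6) falls outside, which matches the single missing point described above.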
I have a DataFrame containing columns of numerical and non-numerical data. Here's a slice of it:
ATG12 Norm ATG5 Norm ATG7 Norm Cancer Stage
5.55 4.99 8.99 IIA
4.87 5.77 8.88 IIA
5.98 7.88 8.34 IIC
I want to group data by Cancer Stage, take the mean of every numerical data column and produce a table which lists means for each Cancer Stage; like this:
Cancer Stage ATG12 Mean ATG5 Mean ATG7 Mean
IIA 5.03 6.20 8.34
IIB 7.45 4.22 7.99
IIIA 5.32 3.85 6.68
I've figured out the groupby and mean() functions and can compute the means for one column at a time with:
AVG = data.groupby("Cancer Stage")['ATG12 Norm'].mean()
But that only gives me:
Cancer Stage
IIA 5.03
IIB 7.45
IIIA 5.32
Name: ATG12 Norm, dtype: float64
How can I apply this process to all the columns I want at once and produce a dataframe of it all? Sorry if this is a repeat; the pandas questions I've found that seem to be about related topics are all over my head.
Did you try
df.groupby('Cancer Stage').mean(numeric_only=True)
or
df.groupby('Cancer Stage')[['ATG12 Norm', 'ATG5 Norm']].mean()
Example data with an extra text column:
import pandas as pd
from io import StringIO
data = '''ATG12 Norm  ATG5 Norm  ATG7 Norm  Cancer Stage  Text
5.55        4.99       8.99       IIA           ABC
4.87        5.77       8.88       IIA           ABC
5.98        7.88       8.34       IIC           ABC'''
# columns are separated by two or more spaces; a regex sep needs the python engine
df = pd.read_csv(StringIO(data), sep=r'\s{2,}', engine='python')
print(df)
print(df.groupby('Cancer Stage')[['ATG12 Norm', 'ATG5 Norm']].mean())
print(df.groupby('Cancer Stage').mean(numeric_only=True))
result:
ATG12 Norm ATG5 Norm ATG7 Norm Cancer Stage Text
0 5.55 4.99 8.99 IIA ABC
1 4.87 5.77 8.88 IIA ABC
2 5.98 7.88 8.34 IIC ABC
ATG12 Norm ATG5 Norm
Cancer Stage
IIA 5.21 5.38
IIC 5.98 7.88
ATG12 Norm ATG5 Norm ATG7 Norm
Cancer Stage
IIA 5.21 5.38 8.935
IIC 5.98 7.88 8.340
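To get the exact layout the question asks for (Cancer Stage as a regular column and "Mean" in the headers), a minimal sketch on the example data above: `as_index=False` keeps the group key as a column, `numeric_only=True` skips the text column, and the rename is a plain string replacement on the column names.

```python
import pandas as pd

# Example data from the answer above
df = pd.DataFrame({
    'ATG12 Norm': [5.55, 4.87, 5.98],
    'ATG5 Norm': [4.99, 5.77, 7.88],
    'ATG7 Norm': [8.99, 8.88, 8.34],
    'Cancer Stage': ['IIA', 'IIA', 'IIC'],
    'Text': ['ABC', 'ABC', 'ABC'],
})

# as_index=False keeps 'Cancer Stage' as an ordinary column;
# numeric_only=True leaves out non-numeric columns such as 'Text'
means = df.groupby('Cancer Stage', as_index=False).mean(numeric_only=True)

# 'ATG12 Norm' -> 'ATG12 Mean', etc., to match the desired table
means.columns = means.columns.str.replace(' Norm', ' Mean', regex=False)
print(means)
```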