I have a pandas dataframe, which looks like this:
import seaborn as sns
import pandas as pd
d = {'a': [100, 125, 300, 520], 'b': [250, 270, 278, 248]}
df = pd.DataFrame(data=d, index=[25, 26, 26, 30])
a b
25 100 250
26 125 270
26 300 278
30 520 248
When I try to plot this dataframe with
ax = sns.lineplot(data=df, dashes=False)
the values for 26 are averaged and an error bar shows up. However, I want the values for 26 plotted separately.
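That's what the estimator parameter controls: by default, lineplot aggregates all y values that share an x value (their mean, plus a confidence-interval band). Passing estimator=None plots every observation instead. See the docs: https://seaborn.pydata.org/generated/seaborn.lineplot.html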
sns.lineplot(data=df, dashes=False, estimator=None)
I am trying to produce multiple plots from a for loop.
My dataframe is multi-indexed as below:
temperature depth
ID Month
33 2 150 95
3 148 79
4 148 54
5 155 77
55 2 168 37
3 172 33
4 107 32
5 155 77
61 2 168 37
3 172 33
4 107 32
5 155 77
I want to loop through each ID and plot:
Temperature as a line against Month (x-axis)
Depth as a bar against Month (x-axis)
I want these to be on the same plot.
This is what I have so far:
# group the dataframe
grp = df.groupby([df.index.get_level_values(0), df.index.get_level_values(1)])
# create empty plots
fig, ax = plt.subplots()
# create an empty plot for combining with ax
ax2 = ax.twinx()
# for loop
for ID, group in grp:
    ax.bar(df.index.get_level_values(1), group["temperature"], color='blue', label='Release')
    ax2.plot(df.index.get_level_values(1), group["depth"], color='green', label='Hold')
    ax.set_xticklabels(df.index.get_level_values(1))
    plt.savefig("value{y}.png".format(y=ID))
    next
Reproducible example of the dataframe:
import pandas as pd
index = pd.MultiIndex.from_product([[33, 55, 61],['2','3','4', '5']], names=['ID','Month'])
df = pd.DataFrame([[150, 95],
[148, 79],
[148, 54],
[155, 77],
[168, 37],
[172, 33],
[107, 32],
[155, 77],
[168, 37],
[172, 33],
[107, 32],
[155, 77]],
columns=['temperature', 'depth'], index=index)
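One way to get one combined figure per ID, sketched under the assumption that you want one PNG per ID named value{ID}.png as in the attempt above. Compared with that attempt, it groups on the ID level only (grouping on both levels gives one row per group), takes the Month values from each group rather than from the whole df, and creates and closes a fresh figure for every ID so the plots don't pile up on one set of axes. Following the description, depth is drawn as bars and temperature as a line on a twin y axis; the months are converted to integers so both share a numeric x axis.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to use your default one
import matplotlib.pyplot as plt
import pandas as pd

index = pd.MultiIndex.from_product([[33, 55, 61], ['2', '3', '4', '5']],
                                   names=['ID', 'Month'])
df = pd.DataFrame([[150, 95], [148, 79], [148, 54], [155, 77],
                   [168, 37], [172, 33], [107, 32], [155, 77],
                   [168, 37], [172, 33], [107, 32], [155, 77]],
                  columns=['temperature', 'depth'], index=index)

saved = []
# Group on the ID level only, so each group holds all months of one ID
for ID, group in df.groupby(level='ID'):
    # This group's own months, as numbers so bar and line share the x axis
    months = group.index.get_level_values('Month').astype(int)

    fig, ax = plt.subplots()  # a fresh figure for every ID
    ax2 = ax.twinx()          # second y axis on the same x axis
    ax.bar(months, group['depth'], color='blue', label='depth')
    ax2.plot(months, group['temperature'], color='green', marker='o',
             label='temperature')

    ax.set_xticks(months)
    ax.set_xlabel('Month')
    ax.set_ylabel('depth')
    ax2.set_ylabel('temperature')
    ax.set_title(f'ID {ID}')

    fname = f'value{ID}.png'
    fig.savefig(fname)
    plt.close(fig)            # release the figure before the next ID
    saved.append(fname)
```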
Here's the table from the dataframe:
   Points_groups  Qty Contracts  Qty Gones
1  350+           108            275
2  300-350        725            1718
3  250-300        885            3170
4  200-250        2121           10890
5  150-200        3120           7925
6  100-150        653            1318
7  50-100         101            247
8  0-50           45             137
I'd like to get a bar chart out of it, but with the bars (columns) placed along the x axis, which is built from the 'Points_groups' column.
I have already tried a bunch of options, but I couldn't get it to work.
For example:
df.plot(kind ='hist')
plt.xlabel('Points_groups')
plt.ylabel("Number Of Students");
or
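# Russian column names: 'Кол-во Ушедшие' = Qty Gones, 'Кол-во Контракт' = Qty Contracts, 'Баллы_groups' = Points_groups
sns.distplot(df['Кол-во Ушедшие'])
sns.distplot(df['Кол-во Контракт'])
plt.show()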
or
df.hist(column='Баллы_groups', by= ['Кол-во Контракт', 'Кол-во Ушедшие'], bins=2, grid=False, rwidth=0.9,color='purple', sharex=True);
Since you already have the distribution in your pandas dataframe, the plot you need can be achieved with the following code:
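import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Mock data: one row per category, one column per series to compare
df = pd.DataFrame({'key': ['red', 'green', 'blue'], 'A': [1, 2, 1], 'B': [2, 4, 3]})
x_axis = np.arange(len(df['key']))  # one x position per category
# Shift each series by half a bar width (bars are 0.4 wide) so each pair sits side by side
plt.bar(x_axis - 0.2, df['A'], 0.4, label='A')
plt.bar(x_axis + 0.2, df['B'], 0.4, label='B')
# Replace the numeric tick positions with the category names
plt.xticks(x_axis, df['key'])
plt.legend()
plt.show()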
Since I don't have access to your data, I made a mock dataframe. This results in the following figure:
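The same approach applied to the table from the question, as a sketch: the column names are taken from the translated table (Points_groups, Qty Contracts, Qty Gones), so with the original dataframe you would substitute the Russian names. The rows are reversed so the score groups run from 0-50 up to 350+ along the x axis; drop the df.iloc[::-1] line to keep the table's order.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to use your default one
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Points_groups': ['350+', '300-350', '250-300', '200-250',
                      '150-200', '100-150', '50-100', '0-50'],
    'Qty Contracts': [108, 725, 885, 2121, 3120, 653, 101, 45],
    'Qty Gones': [275, 1718, 3170, 10890, 7925, 1318, 247, 137],
}, index=range(1, 9))

# Reverse the rows so the x axis runs from the lowest to the highest score group
df = df.iloc[::-1]

x = np.arange(len(df))  # one slot per score group
width = 0.4
fig, ax = plt.subplots()
# Offset the two series by half a bar width so they sit side by side
ax.bar(x - width / 2, df['Qty Contracts'], width, label='Qty Contracts')
ax.bar(x + width / 2, df['Qty Gones'], width, label='Qty Gones')
ax.set_xticks(x)
ax.set_xticklabels(df['Points_groups'])
ax.set_xlabel('Points_groups')
ax.set_ylabel('Number Of Students')
ax.legend()
```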
I was wondering if it's possible to annotate every line in this example automatically, using the column headers as labels.
import seaborn as sns
import pandas as pd
d = {'a': [100, 125, 300, 520],..., 'z': [250, 270, 278, 248]}
df = pd.DataFrame(data=d, index=[25, 26, 26, 30])
a ... z
25 100 ... 250
26 125 ... 270
26 300 ... 278
30 520 ... 248
When I use this code, I only get the column headers as a legend. However, I want the labels to sit directly beside or above the lines.
sns.lineplot(data=df, dashes=False, estimator=None)
Is this what you are looking for?
ax = sns.lineplot(data=df, dashes=False, estimator=None, legend=False)
# Label each line at its first point; Series.iteritems() was removed in pandas 2.0, so use .items()
for label, pos in df.iloc[0].items():
    ax.annotate(label, (df.index[0], pos*1.05), ha='left', va='bottom')
output:
Something like:
ax = sns.lineplot(data=df, dashes=False, estimator=None)
for c, l in zip(df.columns, ax.lines):
    y = l.get_ydata()
    ax.annotate(f'{c}', xy=(1.01,y[-1]), xycoords=('axes fraction', 'data'),
                ha='left', va='center', color=l.get_color())
Source: https://stackoverflow.com/a/62703420/15239951
I have the following dataset:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
# Make the data
df = pd.DataFrame({'weight': [200, 170, 160, 150, 145],
'days': [0, 91, 174, 205, 279]})
# Display the data
df
weight days
0 200 0
1 170 91
2 160 174
3 150 205
4 145 279
I want to make a lineplot with an axvspan, using the following code:
# Plot
sns.lineplot(
x=df['days'],
y=df['weight'],
marker="o",
alpha=0.5)
plt.axvspan(200, max(df['days']), facecolor='g', alpha=0.4, label='Intermittent Fasting Starts')
plt.xlabel('Days Passed')
plt.ylabel('Total Weight (Lbs)')
plt.legend();
However, the span doesn't extend all the way to the right edge of the plot.
How can I make the span reach the edge of the plot?
Any suggestions would be appreciated. Thanks!
I am trying to switch from a pandas Panel to an xarray.Dataset.
I have a dataset created from a dictionary of dataframes. Each dataframe contains data for one stock: the rows are trading dates, and the columns are prices and indicators. Sample code:
import pandas as pd
import xarray as xr
panel_dict = {}
panel_dict['AAPL'] = pd.DataFrame({'Open': [100, 105], 'Close': [104, 108],
'SMA200':[102, 110], 'RSI2': [11 , 14]},
index=['2017-09-01', '2017-09-02'])
panel_dict['AMZN'] = pd.DataFrame({'Open': [200, 180], 'Close': [190, 170],
'SMA200':[190, 190], 'RSI2': [11 , 15]},
index=['2017-09-01', '2017-09-02'])
panel_dict['AGN'] = pd.DataFrame({'Open': [300, 310], 'Close': [300, 310],
'SMA200':[250, 250], 'RSI2': [5 , 15]},
index=['2017-09-01', '2017-09-02'])
ds_full = xr.Dataset(panel_dict)
print(ds_full)
# selecting one day works
ds = ds_full.sel(dim_0 = '2017-09-02')
print(ds)
# filtering does not work
c = ds[ds['Close']>ds['SMA200']]
c = c[c['RSI2'] < 12.0 ]
c = c.sort_values(by = 'RSI2', ascending=True)
The dataset ds_full looks like:
<xarray.Dataset>
Dimensions: (dim_0: 2, dim_1: 4)
Coordinates:
* dim_0 (dim_0) object '2017-09-01' '2017-09-02'
* dim_1 (dim_1) object 'Close' 'Open' 'RSI2' 'SMA200'
Data variables:
AAPL (dim_0, dim_1) int64 104 100 11 102 108 105 14 110
AMZN (dim_0, dim_1) int64 190 200 11 190 170 180 15 190
AGN (dim_0, dim_1) int64 300 300 5 250 310 310 15 250
Selecting one day of data with ds = ds_full.sel(dim_0 = '2017-09-02') works nicely:
<xarray.Dataset>
Dimensions: (dim_1: 4)
Coordinates:
dim_0 <U10 '2017-09-02'
* dim_1 (dim_1) object 'Close' 'Open' 'RSI2' 'SMA200'
Data variables:
AAPL (dim_1) int64 108 105 14 110
AMZN (dim_1) int64 170 180 15 190
AGN (dim_1) int64 310 310 15 250
But how can I filter on additional conditions, such as 'Close' > 'SMA200' or 'RSI2' < 12? And how can I sort the result by the 'RSI2' column?
In the original code using pandas.Panel, it was like this:
c = ds[ds['Close']>ds['SMA200']]
c = c[c['RSI2'] < 12.0 ]
c = c.sort_values(by = 'RSI2', ascending=True)