I have a dataframe that looks like this.
import pandas as pd
# initialise data of lists.
data = {'ID':[101762, 101762, 101762, 101762, 102842, 102842, 102842, 102842, 108615, 108615, 108615, 108615, 108615, 108615],
'Year':[2019, 2019, 2019, 2019, 2020, 2020, 2020, 2020, 2021, 2021, 2021, 2021, 2021, 2021],
'Quantity':[60, 80, 88, 75, 50, 55, 62, 58, 100, 105, 112, 110, 98, 95],
'Price':[2000, 3000, 3330, 4000, 850, 900, 915, 980, 1000, 1250, 1400, 1550, 1600, 1850]}
# Create DataFrame
df = pd.DataFrame(data)
# Print the output.
df
Here are some plots of the data.
import matplotlib.pyplot as plt
import seaborn as sns
uniques = df['ID'].unique()
for i in uniques:
    fig, ax = plt.subplots()
    fig.set_size_inches(4,3)
    df_single = df[df['ID']==i]
    sns.lineplot(data=df_single, x='Price', y='Quantity')
    ax.set(xlabel='Price', ylabel='Quantity')
    plt.xticks(rotation=45)
    plt.show()
Now I am trying to find the optimal price at which to sell each item, before the quantity sold starts to decline. I think the code below is pretty close, but when I run it I get 33272.53, which doesn't make any sense. I am trying to get the optimal price point per ID. How can I do that?
df["% Change in Quantity"] = df["Quantity"].pct_change()
df["% Change in Price"] = df["Price"].pct_change()
df["Price Elasticity"] = df["% Change in Quantity"] / df["% Change in Price"]
df.columns
import pandas as pd
from sklearn.linear_model import LinearRegression
x = df[["Price"]]
y = df["Quantity"]
# Fit a linear regression model to the data
reg = LinearRegression().fit(x, y)
# Find the optimal price that maximizes the quantity sold
optimal_price = reg.intercept_/reg.coef_[0]
optimal_price
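A minimal sketch of one way to get a price per ID, assuming "optimal" means the price at which quantity peaks before it declines. Two things explain the odd number: the regression above is fit on all three IDs pooled together, and reg.intercept_/reg.coef_[0] is not the optimum of anything, because a straight line has no peak, so a linear fit alone cannot locate a turning point. The pct_change calls have a similar problem: without grouping, the first row of one ID is compared with the last row of the previous ID. Option 2 below (a quadratic fit per ID) is my own assumption, not a standard elasticity method, and with only 4 to 6 points per ID it is very rough.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': [101762] * 4 + [102842] * 4 + [108615] * 6,
    'Quantity': [60, 80, 88, 75, 50, 55, 62, 58, 100, 105, 112, 110, 98, 95],
    'Price': [2000, 3000, 3330, 4000, 850, 900, 915, 980, 1000, 1250, 1400, 1550, 1600, 1850],
})

# % changes computed within each ID, so the first row of one ID is not
# compared with the last row of the previous ID (those rows become NaN)
df['% Change in Quantity'] = df.groupby('ID')['Quantity'].pct_change()
df['% Change in Price'] = df.groupby('ID')['Price'].pct_change()
df['Price Elasticity'] = df['% Change in Quantity'] / df['% Change in Price']

# Option 1: the observed price at which quantity peaks, per ID
peak_rows = df.loc[df.groupby('ID')['Quantity'].idxmax(), ['ID', 'Price', 'Quantity']]
print(peak_rows)

# Option 2 (assumption): fit Quantity = a*Price**2 + b*Price + c per ID
# and take the vertex -b / (2a), which is a maximum only when a < 0
def vertex_price(group):
    a, b, c = np.polyfit(group['Price'], group['Quantity'], deg=2)
    return -b / (2 * a) if a < 0 else np.nan

fitted = df.groupby('ID')[['Price', 'Quantity']].apply(vertex_price)
print(fitted)
```

For this data, Option 1 gives 3330 for ID 101762, 915 for 102842 and 1400 for 108615. Option 2 smooths over the noise between observed prices but can return a price you never actually tested.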
Credit goes to the original question for Plotly-R. The following focuses on Python.
Is it possible to create a Plotly bar chart, e.g. any chart from plotly.com/r/bar-charts/, but with a gapped (broken) y-axis, like in the attached example (made with ggplot2, I believe)?
To my knowledge, Plotly has no built-in functionality for this. But it's still possible to make a figure that matches your image using subplots if you:
use make_subplots(rows=2, cols=1, vertical_spacing = <low>),
add the same traces to figure positions [1, 1] and [2, 1],
remove x-axis labels for [1, 1], and
adjust the y-axis ranges so that [1, 1] starts at the upper cutoff value and [2, 1] ends at the lower cutoff value of your chosen interval.
Complete code:
# imports
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd
from plotly.subplots import make_subplots
# data
df = pd.DataFrame({'years': [1995, 1996, 1997, 1998, 1999, 2000,
2001, 2002, 2003, 2004, 2005, 2006,
2007, 2008, 2009, 2010, 2011, 2012],
'China': [219, 146, 112, 127, 124, 180, 236,
207, 236, 263,350, 430, 474, 1526,
488, 537, 500, 439],
'Rest of world': [16, 13, 10, 11, 28, 37,
43, 55, 56, 88, 105, 156, 270,
299, 340, 403, 549, 1499]})
df.set_index('years', inplace = True)
# colors and cut-offs
colors = px.colors.qualitative.Plotly
cut_interval = [600, 1400]
# subplot setup
fig = make_subplots(rows=2, cols=1, vertical_spacing = 0.04)
fig.update_layout(title = "USA plastic scrap exports (...with some made-up values)")
# Traces for [2, 1]
# marker_color=colors[i] ensures that categories follow the same color cycle
for i, col in enumerate(df.columns):
    fig.add_trace(go.Bar(x=df.index,
                         y=df[col],
                         name=col,
                         marker_color=colors[i],
                         legendgroup = col,
                         ), row=2, col=1)
# Traces for [1, 1]
# Notice that showlegend = False.
# Since legendgroup = col the interactivity is
# taken care of in the previous for-loop.
for i, col in enumerate(df.columns):
    fig.add_trace(go.Bar(x=df.index,
                         y=df[col],
                         name=col,
                         marker_color=colors[i],
                         legendgroup = col,
                         showlegend = False,
                         ), row=1, col=1)
# Some aesthetical adjustments to layout
fig.update_yaxes(range=[cut_interval[1], max(df.max()*1.1)], row=1, col=1)
fig.update_xaxes(visible=False, row=1, col=1)
fig.update_yaxes(range=[0, cut_interval[0]], row=2, col=1)
fig.show()
I have a line chart that shows random sales data from 2010 to 2020. I want to add a vertical line, or some other visual cue, to indicate something important that happened in, for example, 2014. How can I do that in Python? Any library would do!
Try plt.axvline() from matplotlib:
import matplotlib.pyplot as plt
x = [2015, 2016, 2017, 2018, 2019, 2020]
y = [1000, 1200, 2500, 1000, 1100, 250]
plt.plot(x, y)
plt.title("Sales line graph")
plt.xlabel("Year")
plt.ylabel("Sales")
# draw a vertical line at x = 2019
plt.axvline(x=2019, label='line at x = {}'.format(2019), c='red')
plt.legend()  # needed for the line's label to appear
plt.show()
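To mark 2014 specifically on a 2010 to 2020 chart, axvline can be combined with axvspan for a shaded band and a text label. A sketch using made-up sales figures (the numbers are placeholders, not from the question):

```python
import matplotlib.pyplot as plt

# Made-up yearly sales for 2010-2020 (illustrative only)
years = list(range(2010, 2021))
sales = [900, 950, 1000, 1100, 1300, 1250, 1200, 2500, 1000, 1100, 250]

fig, ax = plt.subplots()
ax.plot(years, sales, marker='o')

# Dashed vertical line marking the event year
event_line = ax.axvline(x=2014, color='red', linestyle='--', label='Event in 2014')

# Light shaded band around the event year
ax.axvspan(2013.5, 2014.5, color='red', alpha=0.1)

# Text label: x in data units, y as a fraction of the axes height
ax.text(2014, 0.95, ' event', color='red', va='top',
        transform=ax.get_xaxis_transform())

ax.set_xlabel('Year')
ax.set_ylabel('Sales')
ax.legend()
plt.show()
```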
I would like a plot of the NASDAQ market index with the years since 1971 on the x-axis and the index values on the y-axis.
dataframe = pd.read_csv('nasdaq-historical-chart.csv', usecols=[1], engine='python')
dataset = dataframe.values
df = pd.read_csv('nasdaq-historical-chart.csv',parse_dates=True)
df['date'] = pd.to_datetime(df['date'])
df['year'] = df['date'].dt.year
plt.plot(dataset)
plt.figure(figsize=(25,10))
plt.plot(df['year'], dataset)
plt.title('NASDAQ historical chart',fontsize=24)
plt.xlabel('Time',fontsize=14)
plt.ylabel('Value',fontsize=14)
plt.tick_params(axis='both',labelsize=14)
plt.show()
This way I get the right plot, but without years on the x-axis.
If I use:
plt.plot(df['year'], dataset)
I get a differently shaped plot.
Why did the plot change, and how can I fix it?
I created some example data similar to yours; you can do the same with your own data:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'date':['1971-1-1','1971-1-2','1971-1-3','1971-1-4','1971-1-5','1971-1-6','1971-1-7','1971-1-8','1971-1-9',
'1971-1-10', '1971-1-11', '1971-1-12', '1972-1-1','1972-1-2','1972-1-3','1972-1-4',
'1972-1-5','1972-1-6','1972-1-7','1972-1-8','1972-1-9','1972-1-10', '1972-1-11', '1972-1-12'],
'values':[150, 130, 100, 95, 100, 105, 200, 180, 170, 160, 150, 155, 155, 170, 190, 192, 195, 200, 220,
230, 220, 210, 230, 235]})
print(df)
df['date'] = pd.to_datetime(df['date'], format='%Y-%d-%m')
plt.figure(figsize=(25,10))
plt.plot(df['date'], df['values'])
plt.title('NASDAQ historical chart',fontsize=24)
plt.xlabel('Time',fontsize=14)
plt.ylabel('Value',fontsize=14)
plt.tick_params(axis='both',labelsize=14)
plt.show()
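As for why the plot changed: df['year'] gives every row from the same year the same x value, so the line jumps up and down vertically within each year instead of moving forward in time. Plotting the full dates keeps the shape, and matplotlib's date locators can then put one tick per year. A sketch with made-up monthly values; with the real file you could read it with pd.read_csv('nasdaq-historical-chart.csv', parse_dates=['date']), assuming the column is called date as in the question.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up monthly values from January 1971 to December 1975
df = pd.DataFrame({'date': pd.date_range('1971-01-01', periods=60, freq='MS')})
df['value'] = 100 + np.arange(60) * 2.5

fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(df['date'], df['value'])

# One major tick per year, labelled with the four-digit year
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))

ax.set_title('NASDAQ historical chart', fontsize=24)
ax.set_xlabel('Year', fontsize=14)
ax.set_ylabel('Value', fontsize=14)
plt.show()
```

For data spanning 1971 to today, YearLocator(5) would put a tick every five years instead, which keeps the labels readable.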
I have two dataframes with the same index and columns like:
import pandas as pd
dfGDPgrowth = pd.DataFrame({'France':[2, 1.8, 3], 'Germany':[3, 2, 2.5]}, index = [2007, 2006, 2005])  # growth in %
dfpopulation = pd.DataFrame({'France':[100, 105, 112], 'Germany':[70, 73, 77]}, index = [2007, 2006, 2005])
Is there a straightforward matplotlib way to create a scatter plot with % growth on the x-axis and population on the y-axis?
Edit: My DataFrame has 64 columns, so I wonder whether it could be done with a loop so I don't have to enter them all manually.
Are you looking for something like this?
import pandas as pd
import matplotlib.pyplot as plt
dfGDPgrowth = pd.DataFrame({'France':[2, 1.8, 3], 'Germany':[3, 2, 2.5]}, index = [2007, 2006, 2005])
dfpopulation = pd.DataFrame({'France':[100, 105, 112], 'Germany':[70, 73, 77]}, index = [2007, 2006, 2005])
for col in dfGDPgrowth.columns:
    plt.scatter(dfGDPgrowth[col], dfpopulation[col], label=col)
plt.legend(loc='best', fontsize=16)
plt.xlabel('Growth %')
plt.ylabel('Population')
plt.show()
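With 64 columns, a long-format table can be easier to work with: one row per (year, country) with both values side by side. A sketch using melt and merge plus seaborn's scatterplot; the column names year, country, growth and population are my own choice. With 64 countries the legend gets crowded, and passing legend=False to scatterplot hides it.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

dfGDPgrowth = pd.DataFrame({'France': [2, 1.8, 3], 'Germany': [3, 2, 2.5]}, index=[2007, 2006, 2005])
dfpopulation = pd.DataFrame({'France': [100, 105, 112], 'Germany': [70, 73, 77]}, index=[2007, 2006, 2005])

# Wide -> long: one row per (year, country) for each table
growth = (dfGDPgrowth.rename_axis('year').reset_index()
          .melt(id_vars='year', var_name='country', value_name='growth'))
population = (dfpopulation.rename_axis('year').reset_index()
              .melt(id_vars='year', var_name='country', value_name='population'))

# Put both values side by side, matched on year and country
long_df = growth.merge(population, on=['year', 'country'])

ax = sns.scatterplot(data=long_df, x='growth', y='population', hue='country')
ax.set_xlabel('Growth %')
ax.set_ylabel('Population')
plt.show()
```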
I have a Pandas series with values for which I'd like to plot counts. This creates roughly what I want:
dy = sns.countplot(rated.year, color="#53A2BE")
axes = dy.axes
dy.set(xlabel='Release Year', ylabel = "Count")
dy.spines['top'].set_color('none')
dy.spines['right'].set_color('none')
plt.show()
The problem comes with missing data. There are 31 years with ratings, but over a timespan of 42 years. That means there should be some empty bins, which are not being displayed. Is there a way to configure this in Seaborn/Matplotlib? Should I use another type of graph, or is there another fix for this?
I thought about looking into whether it is possible to configure it as a time series, but I have the same problem with rating scales. So, on a 1-10 scale the count for e.g. 4 might be zero, and therefore '4' is not in the Pandas data series, which means it also does not show up in the graph.
The result I'd like is the full scale on the x-axis, with counts (for steps of one) on the y-axis, and showing zero/empty bins for missing instances of the scale, instead of simply showing the next bin for which data is available.
EDIT:
The data (rated.year) looks something like this:
import pandas as pd
rated = pd.DataFrame(data = [2016, 2004, 2007, 2010, 2015, 2016, 2016, 2015,
2011, 2010, 2016, 1975, 2011, 2016, 2015, 2016,
1993, 2011, 2013, 2011], columns = ["year"])
It has more values, but the format is the same. As you can see from
rated.year.value_counts()
there are quite a few x values whose count would have to be zero in the graph, but the current plot simply leaves them out.
I solved the problem using the solution suggested by @mwaskom in the comments to my question, i.e. passing an 'order' to the countplot that lists every valid year, including those with a count of zero. This is the code that produces the graph:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
rated = pd.DataFrame(data = [2016, 2004, 2007, 2010, 2015, 2016, 2016, 2015,
2011, 2010, 2016, 1975, 2011, 2016, 2015, 2016,
1993, 2011, 2013, 2011], columns = ["year"])
# pass the column via x= and data= (in seaborn 0.12+ the first positional argument is data, not x)
dy = sns.countplot(x='year', data=rated, color="#53A2BE", order=list(range(rated.year.min(), rated.year.max()+1)))
dy.set(xlabel='Release Year', ylabel = "Count")
dy.spines['top'].set_color('none')
dy.spines['right'].set_color('none')
plt.show()
Alternatively, consider a seaborn barplot built from a reindexed value_counts series converted to a DataFrame:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
# REINDEXED DATAFRAME: count per year, filling missing years with 0
# (rename_axis/reset_index(name=...) give the same column names in pandas 1.x and 2.x)
rated_ser = (rated['year'].value_counts()
             .reindex(range(rated.year.min(), rated.year.max()+1), fill_value=0)
             .rename_axis('year')
             .reset_index(name='count'))
# SNS BAR PLOT
dy = sns.barplot(x='year', y='count', data=rated_ser, color="#53A2BE")
dy.tick_params(axis='x', labelrotation=90)  # ROTATE LABELS, 90 DEG.
dy.set(xlabel='Release Year', ylabel = "Count")
dy.spines['top'].set_color('none')
dy.spines['right'].set_color('none')
plt.show()
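The same idea covers the 1-10 rating scale mentioned in the question: if the values are stored as a pandas Categorical with the full scale as its categories, value_counts() reports zero for the missing levels. A sketch with made-up ratings (nobody gave a 2 or a 4):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up ratings on a 1-10 scale; no 2s or 4s
ratings = pd.Series([7, 8, 8, 5, 10, 6, 7, 9, 3, 8, 7, 1])

# Declaring the full scale as categories makes value_counts include zero counts,
# and sort=False keeps the counts in scale order
scale = list(range(1, 11))
counts = pd.Series(pd.Categorical(ratings, categories=scale)).value_counts(sort=False)

ax = counts.plot.bar(color="#53A2BE", rot=0)
ax.set(xlabel='Rating', ylabel='Count')
plt.show()
```

For release years, the same approach should work with categories=range(rated.year.min(), rated.year.max() + 1).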