I have the following DataFrame coming from an Excel file:
df = pd.read_excel('base.xlsx')
My Excel file contains the following columns:
data - datetime64[ns]
stock - float64
demand - float64
origem - object
I need to plot a bar chart where the x-axis will be the date and the bars the stock and demand. Blue would be the demand and orange the stock:
This can be done with the pandas bar plot function. Note that if there are dates that are not recorded in your dataset (e.g. weekends or national holidays) they will not be automatically displayed with a gap in the bar plot. This is because bar plots in pandas (and other packages) are made primarily for categorical data, as mentioned here and here.
import numpy as np  # v 1.19.2
import pandas as pd  # v 1.1.3
import matplotlib.pyplot as plt  # v 3.3.2
# Create a random time series with the date as index.
# In your case, where you import the dataset from Excel, you
# would make the date column the index like this:
# df = pd.read_excel('base.xlsx', index_col='data')
rng = np.random.default_rng(123)
days = 7
df = pd.DataFrame(dict(demand=rng.uniform(0, 100, size=days),
                       stock=rng.uniform(0, 100, size=days),
                       origin=rng.choice(list('ABCD'), days)),
                  index=pd.date_range(start='2020-12-14', freq='D', periods=days))
# Create pandas bar plot of the numeric columns: demand in blue, stock in orange
fig, ax = plt.subplots(figsize=(10, 5))
df[['demand', 'stock']].plot.bar(ax=ax, color=['tab:blue', 'tab:orange'])
# Assign ticks with custom tick labels
# Date format codes for xticklabels:
# https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes
plt.xticks(ax.get_xticks(), [ts.strftime('%A') for ts in df.index], rotation=0)
plt.legend(frameon=False)
plt.show()
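If you do want dates that are missing from the data to show up as empty slots, one option is to reindex onto a complete daily range before plotting: the missing days become NaN rows, and pandas draws NaN bars with zero height. A sketch with made-up dates in which one weekend (2020-12-19 and 2020-12-20) is absent:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(123)

# Hypothetical data: the weekend of 2020-12-19/20 is not recorded
dates = pd.to_datetime(["2020-12-14", "2020-12-15", "2020-12-16", "2020-12-17",
                        "2020-12-18", "2020-12-21", "2020-12-22"])
df = pd.DataFrame({"demand": rng.uniform(0, 100, size=len(dates)),
                   "stock": rng.uniform(0, 100, size=len(dates))},
                  index=dates)

# Reindex onto every calendar day; absent days become NaN rows,
# which the bar plot draws with zero height instead of skipping them.
full_range = pd.date_range(df.index.min(), df.index.max(), freq="D")
df_full = df.reindex(full_range)

fig, ax = plt.subplots(figsize=(10, 5))
df_full[["demand", "stock"]].plot.bar(ax=ax, color=["tab:blue", "tab:orange"])
ax.set_xticklabels([ts.strftime("%a %d") for ts in df_full.index], rotation=0)
ax.legend(frameon=False)
```

The trade-off is that the x-axis now has one slot per calendar day, so long ranges with many gaps produce many empty positions.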
Related
I am having trouble eliminating datetime gaps in a very simple line chart I'm creating with Plotly Express: straight lines on the graph connect the data points across gaps in the data (weekends).
The DataFrame simply has a datetime index (hourly resolution) called sale_date and columns called NAME and COST, with approximately 30 days' worth of data.
df['sale_date'] = pd.to_datetime(df['sale_date'])
df = df.set_index('sale_date')
px.line(df, x=df.index, y='COST', color='NAME')
I've seen a few posts regarding this issue and one recommended setting datetime as the index, but it still yields the gap lines.
The data in this example may not match yours, but the point is that you can either pass string data instead of date/time data for the x-axis, or change the x-axis type to category and set the tick values and tick text.
import pandas as pd
import plotly.express as px
import numpy as np
np.random.seed(2021)
date_rng = pd.date_range('2021-08-01','2021-08-31', freq='B')
name = ['apple']
df = pd.DataFrame({'sale_date': pd.to_datetime(date_rng),
                   'COST': np.random.randint(100, 3000, (len(date_rng),)),
                   'NAME': np.random.choice(name, size=len(date_rng))})
df = df.set_index('sale_date')
fig= px.line(df, x=[d.strftime('%m/%d') for d in df.index], y='COST', color='NAME')
fig.show()
Alternatively, keep the dates and update the x-axis type:
fig = px.line(df, x=df.index, y='COST', color='NAME')
fig.update_xaxes(type='category',
                 tickvals=np.arange(0, len(df)),
                 ticktext=[d.strftime('%m/%d') for d in df.index])
fig.show()
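On a category axis Plotly places the rows at positions 0, 1, 2, ..., which is why the tick values above are np.arange(0, len(df)). With a month of data, a label for every row can crowd the axis, so here is a hedged sketch of a small helper (category_ticks is a hypothetical name, not a Plotly function) that keeps every step-th label; it only needs pandas:

```python
import pandas as pd


def category_ticks(index, step=5, fmt="%m/%d"):
    """Return (tickvals, ticktext) labelling every `step`-th entry of a DatetimeIndex.

    Hypothetical helper: on a category axis the rows sit at integer positions
    0, 1, 2, ..., so the tick values are positions, not dates.
    """
    positions = list(range(0, len(index), step))
    labels = [index[i].strftime(fmt) for i in positions]
    return positions, labels


# Business days only, as in the example above, so weekends are absent
idx = pd.date_range("2021-08-01", "2021-08-31", freq="B")
tickvals, ticktext = category_ticks(idx, step=5)
print(tickvals)  # [0, 5, 10, 15, 20]
print(ticktext)  # ['08/02', '08/09', '08/16', '08/23', '08/30']
```

The two lists would then be passed as fig.update_xaxes(type='category', tickvals=tickvals, ticktext=ticktext), the same way as the full lists above.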
I have an Excel file that looks like this:
I would like to plot all 3 individuals' weight on Jan 1, 2020 on a bar chart to compare visually in Python using Matplotlib. How can I do this?
This is probably easiest with pandas:
import pandas as pd
df = pd.read_excel('your_file_location', sheet_name='sheet_name', parse_dates=['Date'])
# Compare against a Timestamp so both sides are datetime64 values
df = df.loc[df['Date'] == pd.Timestamp(year=2020, month=1, day=1)]
ax = df.plot.bar(x='Name', y='Weight')
Here we first load the data from a specific sheet of your Excel file (you can omit the sheet_name argument if the file has only a single sheet), then filter the rows to a specific date, and finally plot with names on the x-axis and weight on the y-axis.
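Comparing the parsed Date column against a pd.Timestamp keeps both sides as datetime64 values; a plain datetime.date may not match that column in recent pandas versions. Since the Excel file itself is not available, here is a sketch with a few hypothetical rows in place of read_excel (the names and weights are invented for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows standing in for the Excel sheet (Name, Date, Weight)
df = pd.DataFrame({
    "Name": ["Person A", "Person B", "Person C", "Person A", "Person B", "Person C"],
    "Date": pd.to_datetime(["2020-01-01"] * 3 + ["2020-02-01"] * 3),
    "Weight": [150, 180, 165, 148, 182, 163],
})

# Keep only the rows for 2020-01-01
day = df.loc[df["Date"] == pd.Timestamp("2020-01-01")]

# One bar per person; rot=0 keeps the names horizontal
ax = day.plot.bar(x="Name", y="Weight", rot=0, legend=False)
ax.set_ylabel("Weight")
ax.set_title("Weight on 2020-01-01")
```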
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('C:\\Desktop\\file.csv', index_col='Date',
                 parse_dates=True)  # import the data and make the Date column the index
df['year'] = df.index.year  # extract the year from the index
data_20 = df[df['year'] == 2020]  # keep only the 2020 rows
ax = data_20.plot(kind='bar', x='Name', y='Weight')  # plot by name for 2020
To plot only for 2 people:
ax = (data_20[data_20['Name'] != 'John Smith']
      .plot(kind='bar', x='Name', y='Weight'))  # plot by name for 2020, excluding John Smith
To make it prettier, just add labels:
ax.set_ylabel('Weights in lbs') #Labeling y-axis
ax.set_xlabel('Names') #Labeling x-axis
ax.set_title('Weights for 2020'); # Adding the title
I have been trying to get a boxplot with each box representing an emotion over a period of time.
The data frame used for this plot contains a timestamp and an emotion name. I tried converting the timestamp first to a string, then to datetime, and finally to int64. This resulted in the gaps between the x labels seen in the plot. I tried the same without converting to int64, but matplotlib doesn't seem to accept the dates in the plot.
I'm attaching the code I have used here:
import matplotlib as mpl
import matplotlib.pyplot as plt
plt.style.use('classic')
%matplotlib qt
import pandas as pd
import numpy as np
from datetime import datetime
import seaborn as sns
data = pd.read_csv("TX-governor-sentiment.csv")
## check data types
data.dtypes
# drop rows with all missing values
data = data.dropna(how='all')
## transforming the timestamp column
#convert from obj type to string then to date type
data['timestamp2'] = data['timestamp']
data['timestamp2'] = pd.to_datetime(data['timestamp2'].astype(str), format='%m/%d/%Y %H:%M')
# convert to number format with the following logic:
# yyyymmddhourmin --> this allows us to treat dates as a continuous variable
data['timestamp2'] = data['timestamp2'].dt.strftime('%Y%m%d%H%M')
data['timestamp2'] = data['timestamp2'].astype('int64')
print (data[['timestamp','timestamp2']])
#data transformation for data from Orange
df = pd.DataFrame(columns=('timestamp', 'emotion'))
for index, row in data.iterrows():
    if row['sentiment'] == 0:
        df.loc[index] = [row['timestamp2'], 'Neutral']
    else:
        df.loc[index] = [row['timestamp2'], row['Emotion']]
# Plot using Seaborn & Matplotlib
#convert timestamp in case it's not in number format
df['timestamp'] = df['timestamp'].astype('int64')
fig = plt.figure(figsize=(10,10))
#colors = {"Neutral": "grey", "Joy": "pink", "Surprise":"blue"}
#visualize as boxplot
plot_ = sns.boxplot(x="timestamp", y="emotion", data=df, width=0.5,whis=np.inf);
#add data point on top
plot_ = sns.stripplot(x="timestamp", y="emotion", data=df, alpha=0.8, color="black");
fig.canvas.draw()
#modify ticks and labels
plt.xlim([202003010000,202004120000])
plt.xticks([202003010000, 202003150000, 202003290000, 202004120000], ['2020/03/01', '2020/03/15', '2020/03/29', '2020/04/12'])
#add colors
for patch in plot_.artists:
    r, g, b, a = patch.get_facecolor()
    patch.set_facecolor((r, g, b, .3))
Please let me know how I can overcome this problem of gaps in the boxplot. Thank you!
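One likely cause of the gaps, offered as a hedged suggestion rather than a confirmed diagnosis: yyyymmddHHMM integers are not a continuous time scale, because the value jumps by more than one unit at every hour, day and month boundary. A sketch of an alternative uses matplotlib date numbers (float days) for the time axis and formats the ticks as dates. It swaps seaborn for matplotlib's own boxplot so that only matplotlib and pandas are needed, and the six rows are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows standing in for TX-governor-sentiment.csv
df = pd.DataFrame({
    "timestamp": ["03/01/2020 10:00", "03/05/2020 12:30", "03/20/2020 08:15",
                  "03/31/2020 23:59", "04/01/2020 00:00", "04/10/2020 18:45"],
    "emotion": ["Neutral", "Joy", "Neutral", "Surprise", "Joy", "Neutral"],
})
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%m/%d/%Y %H:%M")

# yyyymmddHHMM integers jump at boundaries: 23:59 on Mar 31 and 00:00 on
# Apr 1 are one minute apart but differ by 697641 as integers.
as_int = df["timestamp"].dt.strftime("%Y%m%d%H%M").astype("int64")

# matplotlib date numbers are float days, so equal time steps give equal distances
df["t"] = mdates.date2num(df["timestamp"].to_numpy())

emotions = sorted(df["emotion"].unique())
groups = [df.loc[df["emotion"] == e, "t"].to_numpy() for e in emotions]

fig, ax = plt.subplots(figsize=(10, 4))
ax.boxplot(groups, vert=False, whis=(0, 100))  # horizontal boxes, whiskers from min to max
ax.set_yticks(range(1, len(emotions) + 1))
ax.set_yticklabels(emotions)
ax.xaxis.set_major_locator(mdates.AutoDateLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y/%m/%d"))
# Raw data points on top (no jitter), similar in spirit to the stripplot
ax.plot(df["t"], [emotions.index(e) + 1 for e in df["emotion"]], "k.", alpha=0.8)
```

Because the axis values are real dates, plt.xlim can also be given datetime objects instead of hand-built integers.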
I would like to consolidate tick data stored in a pandas DataFrame into open-high-low-close (OHLC) format, aggregated not by time but for every 100 ticks. After that I would like to display the result in a candlestick chart using matplotlib.
I already solved this for a time-based aggregation using a pandas DataFrame with two columns: TIMESTAMP and PRICE. TIMESTAMP is already in pandas datetime format, so I work with that:
df["TIMESTAMP"]= pd.to_datetime(df["TIMESTAMP"])
df = df.set_index(['TIMESTAMP'])
data_ohlc = df['PRICE'].resample('15Min').ohlc()
Is there a function that resamples a dataset into OHLC format using a count of ticks instead of a time frame?
Then comes the visualization: for plotting I have to convert the dates to matplotlib date numbers (mdates), because the candlestick_ohlc function requires them:
import matplotlib.dates as mdates
from mpl_finance import candlestick_ohlc
data_ohlc = data_ohlc.reset_index()  # after resample(), TIMESTAMP is the index, not a column
data_ohlc["TIMESTAMP"] = data_ohlc["TIMESTAMP"].apply(mdates.date2num)
candlestick_ohlc(ax1, data_ohlc.values, width=0.005, colorup='g', colordown='r', alpha=0.75)  # ax1: an existing Axes
So is there a function to display a candlestick chart without mdates? Once the tick data is aggregated by count, there is no time relation any more.
As there seems to be no built-in function for this problem, I wrote one myself. The given DataFrame needs to have the actual values in the column "PRICE":
def get_pd_ohlc(mydf, interval):
    ## Use a copy, so that the new column doesn't affect the original dataset
    mydf = mydf.copy()
    ## Add a new column numbering the tick interval of each row (1, ..., 1, 2, ..., 2, ...)
    mydf["interval"] = [1 + i // interval for i in range(len(mydf))]
    ## Step 1: Group
    grouped = mydf.groupby('interval')
    ## Step 2: Calculate the different aggregations
    myopen = grouped['PRICE'].first()
    myhigh = grouped['PRICE'].max()
    mylow = grouped['PRICE'].min()
    myclose = grouped['PRICE'].last()
    ## Step 3: Generate the DataFrame
    pd_ohlc = pd.DataFrame({'OPEN': myopen, 'HIGH': myhigh, 'LOW': mylow, 'CLOSE': myclose})
    return pd_ohlc
pd_100 = get_pd_ohlc(df,100)
print (pd_100.head())
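The helper column is not strictly needed. A sketch of the same aggregation (ohlc_by_ticks is a hypothetical name) groups by the row position integer-divided by the tick count and requests all four statistics in one call; groups are numbered from 0 here instead of 1, and the sample prices are made up:

```python
import numpy as np
import pandas as pd


def ohlc_by_ticks(df, n, price_col="PRICE"):
    """Aggregate every `n` consecutive rows of `price_col` into one OHLC bar."""
    # Row position // n yields 0,...,0,1,...,1,...: one group label per n rows
    group = np.arange(len(df)) // n
    ohlc = df[price_col].groupby(group).agg(["first", "max", "min", "last"])
    ohlc.columns = ["OPEN", "HIGH", "LOW", "CLOSE"]
    return ohlc


ticks = pd.DataFrame({"PRICE": [1.0, 3.0, 2.0, 4.0, 5.0, 7.0, 6.0, 8.0, 9.0, 10.0]})
bars = ohlc_by_ticks(ticks, 4)  # last bar holds only the 2 leftover ticks
print(bars)
```

As with the loop-based version, the final bar is partial whenever the number of ticks is not a multiple of n.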
I also found a solution to display it. The mpl_finance module has a function, candlestick2_ohlc, that does not need any datetime information. Here is the code:
# Making plot
import matplotlib.pyplot as plt
from mpl_finance import candlestick2_ohlc
fig, ax1 = plt.subplots(figsize=(16, 8))
# Making candlestick plot from the 100-tick bars computed above
candlestick2_ohlc(ax1, pd_100['OPEN'], pd_100['HIGH'],
                  pd_100['LOW'], pd_100['CLOSE'], width=0.5,
                  colorup='#008000', colordown='#FF0000', alpha=1)
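mpl_finance has since been deprecated in favour of the mplfinance package. If you would rather avoid the dependency, here is a sketch that draws the same kind of date-free candlesticks with plain matplotlib (plot_candles is a hypothetical helper, not part of any library): each OHLC row is placed at x = 0, 1, 2, ..., a vertical line spans LOW to HIGH, and a bar spans OPEN to CLOSE. The three sample bars are made up:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_candles(ax, ohlc, width=0.6, up="#008000", down="#FF0000"):
    """Draw OHLC rows as candlesticks at x = 0, 1, 2, ... (no dates needed)."""
    x = np.arange(len(ohlc))
    colors = [up if is_up else down for is_up in ohlc["CLOSE"] >= ohlc["OPEN"]]
    # Wick: vertical line from LOW to HIGH
    ax.vlines(x, ohlc["LOW"], ohlc["HIGH"], colors=colors, linewidth=1)
    # Body: bar from OPEN to CLOSE (negative height for falling candles)
    return ax.bar(x, ohlc["CLOSE"] - ohlc["OPEN"], width=width,
                  bottom=ohlc["OPEN"], color=colors)


ohlc = pd.DataFrame({"OPEN": [1.0, 5.0, 9.0], "HIGH": [4.0, 8.0, 10.0],
                     "LOW": [1.0, 5.0, 8.5], "CLOSE": [4.0, 6.0, 8.7]})
fig, ax = plt.subplots(figsize=(16, 8))
bodies = plot_candles(ax, ohlc)
ax.set_xticks(np.arange(len(ohlc)))
ax.set_xlabel("Bar number")
```

With the tick-count bars from above, pd_100 could be passed in place of the sample frame, since it uses the same OPEN/HIGH/LOW/CLOSE column names.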
I'm trying to plot a diagram of monthly data I received. When plotting the data, the plot only shows the year on the x axis, but not the month. How can I make it also show the month on x tick labels?
import pandas as pd
import numpy as np
new_index = pd.date_range(start="2012-07-01", end="2017-07-01", freq="MS")
# One random value per month (assigning to rows from iterrows() does not reliably write back)
df = pd.DataFrame({'0': np.random.randint(0, 100, len(new_index))}, index=new_index, dtype=float)
%matplotlib inline
df.loc['2015-09-01'] = np.nan
df.plot(kind="line", title="Data per month", figsize=(40, 10), grid=True, fontsize=20)
You can use FixedFormatter from matplotlib.ticker to define your own formatter for custom ticks, as follows:
...
import matplotlib.ticker as ticker
ticklabels = [item.strftime('%b %Y') for item in df.index[::6]]  # tick format: month name and year for every 6th element of the index
plt.gca().xaxis.set_ticks(df.index[::6])  # set new ticks for the current x axis
plt.gca().xaxis.set_major_formatter(ticker.FixedFormatter(ticklabels))  # apply the new tick format
...
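The snippet above depends on the question's elided plotting code. As a self-contained sketch under one assumption, namely that plotting with matplotlib's ax.plot instead of df.plot is acceptable (pandas time-series plots use their own period-based x-axis units), the same FixedFormatter idea looks like this; the figure size and random seed are arbitrary:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np
import pandas as pd

new_index = pd.date_range(start="2012-07-01", end="2017-07-01", freq="MS")
rng = np.random.default_rng(0)
df = pd.DataFrame({"0": rng.integers(0, 100, len(new_index)).astype(float)},
                  index=new_index)
df.loc[pd.Timestamp("2015-09-01"), "0"] = np.nan

# Plot with matplotlib directly so the x-axis uses ordinary date units
fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(df.index, df["0"])
ax.set_title("Data per month")
ax.grid(True)

# Put a tick on every 6th month and label it with the month name and year
tick_dates = df.index[::6]
ticklabels = [d.strftime("%b %Y") for d in tick_dates]
ax.set_xticks(tick_dates)
ax.xaxis.set_major_formatter(ticker.FixedFormatter(ticklabels))
```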
Or put the dates on two lines by using the %b\n%Y format: