Bar graph values missing matplotlib - python

My code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.figure()
cols = ['hops','frequency']
data = [[-13,1],[-8,1],[-5,1],[0,2],[2,1],[4,1],[7,1]]
data = np.asarray(data)
indices = np.arange(0,len(data))
plot_data = pd.DataFrame(data, index=indices, columns=cols)
plt.bar(plot_data['hops'].tolist(),plot_data['frequency'].tolist(),width=0.8)
plt.xlim([-20,20])
plt.ylim([0,20])
plt.ylabel('Frequency')
plt.xlabel('Hops')
Output:
My requirements:
I want the x-axis to span [-20, 20] and the y-axis to span [0, 18], and the bars should be labelled; in this case, for example, the first bar should be numbered 1, and so on.

From your comment above, I am assuming this is what you want. You just need to specify the positions at which you want the x-tick labels.
xtcks = [-20, 20]
plt.xticks(np.insert(xtcks, 1, data[:, 0]))
plt.yticks([0, 18])
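For reference, here is a self-contained sketch that combines the question's data with the tick approach above. It assumes that "labelled" may also mean printing each bar's height above it, which the question leaves ambiguous; the variable names are only illustrative.

```python
import matplotlib.pyplot as plt

hops = [-13, -8, -5, 0, 2, 4, 7]
frequency = [1, 1, 1, 2, 1, 1, 1]

fig, ax = plt.subplots()
ax.bar(hops, frequency, width=0.8)
ax.set_xlim(-20, 20)
ax.set_ylim(0, 18)
ax.set_xlabel('Hops')
ax.set_ylabel('Frequency')

# One tick per bar plus the two axis limits, sorted and de-duplicated.
tick_positions = sorted({-20, 20, *hops})
ax.set_xticks(tick_positions)

# Assumption: write each bar's height just above the bar.
for x, height in zip(hops, frequency):
    ax.text(x, height, str(height), ha='center', va='bottom')
```

Call plt.show() afterwards to display the figure.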


Stripplot color points based on date

I have data with values of A, B, C and D on different dates. I want to make a stripplot of these points such that points from recent dates are shaded darker (or have a higher alpha value) than points from earlier dates.
This is what I have right now. All I need is to shade the points by date within each bucket, but I have not been able to figure out how.
import pandas as pd
import numpy as np  # needed for np.random below
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib as mlp
plt.style.use("ggplot")
data = pd.DataFrame({"Date":pd.date_range(start="2020-01-06", end="2020-08-10", freq="W-MON"),
"A":[np.random.randint(-5, 50) for _ in range(len(pd.date_range(start="2020-01-06", end="2020-08-10", freq="W-MON")))],
"B":[np.random.randint(-5, 50) for _ in range(len(pd.date_range(start="2020-01-06", end="2020-08-10", freq="W-MON")))],
"C":[np.random.randint(-10, 50) for _ in range(len(pd.date_range(start="2020-01-06", end="2020-08-10", freq="W-MON")))],
"D":[np.random.randint(9, 50) for _ in range(len(pd.date_range(start="2020-01-06", end="2020-08-10", freq="W-MON")))]})
data.set_index("Date", inplace=True)
data.head()
sns.catplot(data=data, aspect=15/6, height=6)
This is the result of the above code
A scatter plot with randomized x-displacements can be used to apply one colormap per column.
To illustrate the effect, the example below uses random data with the most recent values being the largest.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
plt.style.use("ggplot")
dates = pd.date_range(start="2020-01-06", end="2020-08-10", freq="W-MON")
N = len(dates)
data = pd.DataFrame({"Date": dates,
"A": 30 + np.random.uniform(-5, 8, N).cumsum(),
"B": 20 + np.random.uniform(-4, 9, N).cumsum(),
"C": 25 + np.random.uniform(-4, 7, N).cumsum(),
"D": 40 + np.random.uniform(-2, 8, N).cumsum()})
data.set_index("Date", inplace=True)
columns = data.columns
for col_id, (column, cmap) in enumerate(zip(columns, ['Reds', 'Blues', 'Greens', 'Purples'])):
    # Jitter the x-position around the column id; color by row order (later dates are darker)
    plt.scatter(col_id + np.random.uniform(-0.2, 0.2, N), data[column], c=range(N), cmap=cmap)
plt.xticks(range(len(columns)), columns)
plt.show()
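If a higher alpha for recent dates is preferred over a colormap, one option is to build an RGBA color per point. The sketch below is only one way to do it: recency_alpha is a hypothetical helper, the minimum alpha of 0.2 is an arbitrary choice, and the data is random.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba


def recency_alpha(dates, min_alpha=0.2):
    """Map dates linearly to [min_alpha, 1]; the latest date gets alpha 1."""
    span = dates.max() - dates.min()
    if span == pd.Timedelta(0):
        return np.ones(len(dates))
    fraction = np.asarray((dates - dates.min()) / span, dtype=float)
    return min_alpha + (1 - min_alpha) * fraction


rng = np.random.default_rng(0)
dates = pd.date_range(start="2020-01-06", end="2020-08-10", freq="W-MON")
N = len(dates)
data = pd.DataFrame({name: rng.integers(-5, 50, N) for name in "ABCD"}, index=dates)

alphas = recency_alpha(data.index)
fig, ax = plt.subplots()
for col_id, (column, color) in enumerate(zip(data.columns, ["C0", "C1", "C2", "C3"])):
    # One RGBA row per point: same base color, alpha taken from the date.
    rgba = np.tile(to_rgba(color), (N, 1))
    rgba[:, 3] = alphas
    ax.scatter(col_id + rng.uniform(-0.2, 0.2, N), data[column], c=rgba)
ax.set_xticks(range(len(data.columns)))
ax.set_xticklabels(data.columns)
```

Call plt.show() afterwards to display the figure.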

How to use matplotlib to plot complex bar graphs–multiple subplots with multiple rows of bars?

I have a sample dataframe like the following:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(
    0, 10, size=(1000, 11)), columns=list('ABCDEFGHIJK'))
The desired but unpolished output looks like this:
The data of each column in the dataframe is plotted as a subplot with five rows of bars.
I prefer to use matplotlib because it makes it relatively easy to get good-looking graphs, but its performance seems pretty slow.
You can use the bottom parameter of bar to offset the individual rows.
The following unoptimized example demonstrates this approach:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.DataFrame(np.random.randint(0, 10, size=(1000, 11)), columns=list('ABCDEFGHIJK'))
fig = plt.figure()
for i, c in enumerate(df.columns):
    ax = fig.add_subplot(3, 4, i+1)
    ax.set_title(c)
    # Split positions, heights and baselines into rows of 200 bars; each row is offset by 10
    for x, h, b in zip((df.index.to_numpy() % 200).reshape(-1, 200), df[c].to_numpy().reshape(-1, 200), (df.index.to_numpy() // 200 * 10).reshape(-1, 200)):
        ax.bar(x, h, bottom=b, color='k')
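Since bar accepts an array for bottom, the inner loop can be replaced by a single call per subplot. This is a sketch under the same layout assumptions (200 bars per row, rows offset by 10); bars_per_row and row_height are illustrative names. It still draws one rectangle per bar, so it likely does not make rendering itself faster.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10, size=(1000, 11)), columns=list('ABCDEFGHIJK'))

bars_per_row = 200   # bars in each horizontal row
row_height = 10      # vertical offset between rows (all values are below 10)

positions = np.arange(len(df))
x = positions % bars_per_row                     # horizontal slot of each bar in its row
bottom = positions // bars_per_row * row_height  # baseline of the row each bar sits on

fig, axes = plt.subplots(3, 4, figsize=(12, 8))
for ax, column in zip(axes.flat, df.columns):
    ax.bar(x, df[column].to_numpy(), bottom=bottom, width=1.0, color='k')
    ax.set_title(column)

# Hide the unused twelfth panel.
for ax in axes.ravel()[len(df.columns):]:
    ax.axis('off')
```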

sharey='all' argument in plt.subplots() not passed to df.plot()?

I have a pandas dataframe which I would like to slice, and plot each slice in a separate subplot. I would like to use the sharey='all' and have matplotlib decide on some reasonable y-axis limits, rather than having to search the dataframe for the min and max and add offsets.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.arange(50).reshape((5,10))).transpose()
fig, axes = plt.subplots(nrows=0,ncols=0, sharey='all', tight_layout=True)
for i in range(1, len(df.columns) + 1):
    ax = fig.add_subplot(2,3,i)
    iC = df.iloc[:, i-1]
    iC.plot(ax=ax)
Which gives the following plot:
In fact, it gives that irrespective of what I specify sharey to be ('all', 'col', 'row', True, or False). What I was after with sharey='all' is something like:
Can somebody explain to me what I'm doing wrong here?
The following version would only add those axes you need for your df-columns and share their y-scales:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.arange(50).reshape((5,10))).transpose()
fig = plt.figure(tight_layout=True)
ref_ax = None
for i in range(len(df.columns)):
    # Share the y-axis with the previously created subplot (None for the first one)
    ax = fig.add_subplot(2, 3, i+1, sharey=ref_ax)
    ref_ax=ax
    iC = df.iloc[:, i]
    iC.plot(ax=ax)
plt.show()
The grid-layout parameters, given explicitly as ...add_subplot(2, 3, ... here, can of course be calculated from len(df.columns).
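One possible way to compute the grid is to fix the number of columns and round the row count up. grid_shape is a hypothetical helper, and the default of three columns is just a choice; it assumes at least one plot.

```python
import math


def grid_shape(n_plots, ncols=3):
    """Return (nrows, ncols) for a grid with enough cells for n_plots panels."""
    ncols = max(1, min(ncols, n_plots))   # never more columns than plots
    nrows = math.ceil(n_plots / ncols)    # round up so every plot gets a cell
    return nrows, ncols


print(grid_shape(5))
```

The result can then be passed on, for example as fig.add_subplot(nrows, ncols, i+1, sharey=ref_ax).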
Your plots are not shared. You create a subplot grid with 0 rows and 0 columns, i.e. no subplots at all, but those nonexisting subplots have their y axes shared. Then you create some other (existing) subplots, which are not shared. Those are the ones that are plotted to.
Instead, you need to set nrows and ncols to useful values and plot to the axes created that way.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.arange(50).reshape((5,10))).transpose()
fig, axes = plt.subplots(nrows=2,ncols=3, sharey='all', tight_layout=True)
for i, ax in zip(range(len(df.columns)), axes.flat):
    iC = df.iloc[:, i]
    iC.plot(ax=ax)
# Hide the grid cells that have no column to show
for j in range(len(df.columns),len(axes.flat)):
    axes.flatten()[j].axis("off")
plt.show()
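A quick way to confirm that the sharing took effect is to compare the y-limits of all panels; with sharey='all' they should be identical. This is only a sanity-check sketch on the same toy dataframe.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.arange(50).reshape((5, 10))).transpose()
fig, axes = plt.subplots(nrows=2, ncols=3, sharey='all', tight_layout=True)
for column, ax in zip(df.columns, axes.flat):
    df[column].plot(ax=ax)

# Collect the distinct y-limits; shared axes report one common pair.
ylims = {ax.get_ylim() for ax in axes.flat}
print(ylims)
```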

datetime x-axis matplotlib labels causing uncontrolled overlap

I'm trying to plot a pandas series with a 'pandas.tseries.index.DatetimeIndex'. The x-axis labels stubbornly overlap, and I cannot make them presentable, even with several suggested solutions.
I tried a Stack Overflow solution suggesting autofmt_xdate, but it doesn't help.
I also tried the suggestion to use plt.tight_layout(), which has no effect.
ax = test_df[(test_df.index.year ==2017) ]['error'].plot(kind="bar")
ax.figure.autofmt_xdate()
#plt.tight_layout()
print(type(test_df[(test_df.index.year ==2017) ]['error'].index))
UPDATE: That I'm using a bar chart is an issue. A regular time-series plot shows nicely-managed labels.
A pandas bar plot is a categorical plot. It shows one bar for each index entry at integer positions on the scale. Hence the first bar is at position 0, the next at 1, etc. The labels correspond to the dataframe's index. If you have 100 bars, you'll end up with 100 labels. This makes sense because pandas cannot know whether those should be treated as categories or as ordinal/numeric data.
If instead you use a normal matplotlib bar plot, it will treat the dataframe index numerically. This means the bars have their position according to the actual dates and labels are placed according to the automatic ticker.
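The categorical behaviour can be checked directly by inspecting the ticks that a pandas bar plot creates. This is a small sketch with made-up data; the series name and values are arbitrary.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

index = pd.date_range("2017-01-01", periods=42)
series = pd.Series(np.arange(42.0), index=index, name="error")

ax = series.plot(kind="bar")
# pandas places one tick per bar at integer positions, one label per index entry.
tick_positions = ax.get_xticks()
tick_labels = [label.get_text() for label in ax.get_xticklabels()]
print(tick_positions[:3], len(tick_labels))
```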
import pandas as pd
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
# pd.datetime was removed in pandas 2.0; date_range accepts a date string directly
datelist = pd.date_range("2017-01-01", periods=42).tolist()
df = pd.DataFrame(np.cumsum(np.random.randn(42)),
                  columns=['error'], index=pd.to_datetime(datelist))
plt.bar(df.index, df["error"].values)
plt.gcf().autofmt_xdate()
plt.show()
An additional advantage is that matplotlib.dates locators and formatters can be used, e.g. to label the first and fifteenth of each month with a custom format:
import pandas as pd
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# pd.datetime was removed in pandas 2.0; date_range accepts a date string directly
datelist = pd.date_range("2017-01-01", periods=93).tolist()
df = pd.DataFrame(np.cumsum(np.random.randn(93)),
                  columns=['error'], index=pd.to_datetime(datelist))
plt.bar(df.index, df["error"].values)
plt.gca().xaxis.set_major_locator(mdates.DayLocator((1,15)))
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter("%d %b %Y"))
plt.gcf().autofmt_xdate()
plt.show()
In your situation, the easiest would be to manually create labels and spacing, and apply that using ax.xaxis.set_major_formatter.
Here's a possible solution:
Since no sample data was provided, I tried to mimic the structure of your dataset in a dataframe with some random numbers.
The setup:
# imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.ticker as ticker
# A dataframe with random numbers to run tests on
np.random.seed(123456)
rows = 100
df = pd.DataFrame(np.random.randint(-10,10,size=(rows, 1)), columns=['error'])
# pd.datetime was removed in pandas 2.0; date_range accepts a date string directly
datelist = pd.date_range("2017-01-01", periods=rows).tolist()
df['dates'] = datelist
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
test_df = df.copy(deep = True)
# Plot of data that mimics the structure of your dataset
# Pass figsize to plot(); calling plt.figure() afterwards would open a second, empty figure
ax = test_df[(test_df.index.year == 2017)]['error'].plot(kind="bar", figsize=(15, 8))
ax.figure.autofmt_xdate()
A possible solution:
test_df = df.copy(deep = True)
# Pass figsize to plot(); calling plt.figure() afterwards would open a second, empty figure
ax = test_df[(test_df.index.year == 2017)]['error'].plot(kind="bar", figsize=(15, 8))
# Make a list of empty labels, one per bar
myLabels = ['']*len(test_df.index)
# Set labels on every 20th element in myLabels
myLabels[::20] = [item.strftime('%Y - %m') for item in test_df.index[::20]]
ax.xaxis.set_major_formatter(ticker.FixedFormatter(myLabels))
ax.figure.autofmt_xdate()
# Tilt the labels
plt.setp(ax.get_xticklabels(), rotation=30, fontsize=10)
plt.show()
You can easily change the formatting of labels by checking strftime.org
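To get a feel for the format codes, a few of them can be tried on a single date. The date and the list of formats below are arbitrary examples; month abbreviations produced by %b depend on the active locale.

```python
from datetime import datetime

day = datetime(2017, 3, 15)
# Print the same date under several strftime format strings.
for fmt in ("%Y - %m", "%d %b %Y", "%b %d", "%Y-%m-%d"):
    print(f"{fmt!r:12} -> {day.strftime(fmt)}")
```

Any of these format strings can be used in place of '%Y - %m' when building myLabels.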

Plotting through a subset of data frame in Pandas using Matplotlib

I have a DataFrame that I slice into three subsets, each with 3 to 4 rows of data. After slicing, I plot the subsets using Matplotlib.
The problem is that I am not able to create a figure in which each subplot shows one row of the sliced DataFrame. For example, in a group of three rows, only the last subplot is drawn and the first two subplots stay empty. It looks like the 'r' value is not passed to 'r.plot' for all three subplots.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
df['key1'] = 0
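# Assign through df.loc; chained assignment (df.key1.iloc[...] = ...) is unreliable in recent pandas
df.loc[0:2, 'key1'] = 1   # .loc slices include the end label
df.loc[3:6, 'key1'] = 2
df.loc[7:, 'key1'] = 3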
df_grouped = df.groupby('key1')
for group_name, group_value in df_grouped:
    rows, columns = group_value.shape
    fig, axes = plt.subplots(rows, 1, sharex=True, sharey=True, figsize=(15,20))
    for i,r in group_value.iterrows():
        r = r[0:columns-1]
        r.plot(kind='bar', fill=False, log=False)
I think you might want what I call df_subset to be summarized in some way, but here's a way to plot each group in its own panel.
# Your Code Setting Up the Dataset
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
df['key1'] = 0
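# Assign through df.loc; chained assignment (df.key1.iloc[...] = ...) is unreliable in recent pandas
df.loc[0:2, 'key1'] = 1   # .loc slices include the end label
df.loc[3:6, 'key1'] = 2
df.loc[7:, 'key1'] = 3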
# My Code to Plot in Three Panels
distinct_keys = df['key1'].unique()
fig, axes = plt.subplots(len(distinct_keys), 1, sharex=True, figsize=(3,5))
for i, key in enumerate(distinct_keys):
    df_subset = df[df.key1==key]
    # {maybe insert a line here to summarize df_subset somehow interesting?}
    # plot onto this group's panel; without ax=, pandas opens a new figure per call
    df_subset.plot(kind='bar', fill=False, log=False, ax=axes[i])
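The same idea can be written with groupby, pairing each group with its own axes. This is a sketch rather than a definitive version: it assumes the key1 column should be left out of the bars, uses random data, and does not share the x-axis because the groups have different numbers of rows.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 4)), columns=list('ABCD'))
df['key1'] = [1] * 3 + [2] * 4 + [3] * 3

groups = df.groupby('key1')
fig, axes = plt.subplots(len(groups), 1, figsize=(6, 8))
for ax, (key, group) in zip(axes, groups):
    # Passing ax= makes pandas draw into the existing panel instead of a new figure.
    group.drop(columns='key1').plot(kind='bar', fill=False, ax=ax, legend=False)
    ax.set_title(f'key1 = {key}')
fig.tight_layout()
```

Call plt.show() afterwards to display the figure.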
