Matplotlib bar plot with two different colors - python

In a Dataframe column with 10 values, I am trying to create a bar plot with the first 5 being blue, and the remaining 5 red.
In [1]: import matplotlib.pyplot as plt
In [2]: import pandas as pd
In [3]: df = pd.DataFrame({"x": range(10)})
In [4]: ax = df.iloc[:5].plot(kind="bar", color="blue")
In [5]: plt.show()
In [6]: df.iloc[5:].plot(kind="bar", color="grey", ax=ax)
Out[6]: <Axes: >
In [7]: ax.bar(df.index[5:].values, df.iloc[5:, 0].values, color="grey")
Out[7]: <BarContainer object of 5 artists>
[5] Shows the plot of the first 5 blue elements. I tried two ways of adding the second part, [6] and [7], but that was not successful. I tried using plt.show() which did not show anything. A final ax.figure.savefig("fig.png") shows a barplot of the last 5 elements in grey.
How do I get them into one figure?
I tried using the above code to generate the desired figure. Tried putting the same commands into a jupyter notebook yielding the same results.

It is unclear what output you want. Do you want something like this:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'x': range(1, 11)})
ax = df['x'].plot(kind='bar', color=['blue']*5+['red']*5)
plt.show()
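One likely reason the second pandas call in the question did not extend the plot is that pandas draws bar plots at positions 0..n-1 regardless of the index values, so the last five bars land on top of the first five. A minimal sketch that avoids a second call altogether by building one color per bar with plain matplotlib; the `split` cut-off and the Agg backend line are assumptions for illustration, not part of the original answer:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"x": range(1, 11)})
split = 5  # hypothetical cut-off: bars before this position are blue

# One color per bar, chosen by the bar's position in the column.
colors = ["blue" if i < split else "red" for i in range(len(df))]

fig, ax = plt.subplots()
bars = ax.bar(df.index, df["x"], color=colors)
ax.set_xticks(df.index)
```

Because all ten bars are drawn in a single call, they share one axes and one x-range, and `fig.savefig(...)` or `plt.show()` can follow as usual.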

Related

python plotting multiple bars

I've been trying to do this for several hours and I get it wrong every time. I want to create 3 bar plots in one graph. The y-axis should run from 0 to 1000.
The end result should be this
That's my code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.read_csv('razemKM.csv')
dfn = pd.read_csv('razemNPM.csv')
print(df)
y=[0,1000]
a=(df["srednia"]-df["odchStand"])
a1=df["srednia"]
a2=(df["srednia"]+df["odchStand"])
plt.bar(y,a,width=0.1,color='r')
plt.bar(y,a1,width=0.1,color='g')
plt.bar(y,a2,width=0.1,color='y')
plt.show()
You can use pandas plot function:
df['Sum'] = df["srednia"]+df["odchStand"]
df['Diff'] = df["srednia"]-df["odchStand"]
df.plot.bar(y=['Diff','srednia', 'Sum'],width=0.1)
plt.show()
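Since the CSV files from the question are not available, here is a sketch of the same idea on made-up numbers. Only the column names `srednia` and `odchStand` come from the question; the values, the colors passed to pandas, and the Agg backend line are assumptions:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for razemKM.csv; only the column names come from the question.
df = pd.DataFrame({"srednia": [400.0, 550.0, 620.0],
                   "odchStand": [50.0, 80.0, 30.0]})

df["Diff"] = df["srednia"] - df["odchStand"]
df["Sum"] = df["srednia"] + df["odchStand"]

# One group of three bars per row, using the colors from the question.
ax = df.plot.bar(y=["Diff", "srednia", "Sum"], color=["r", "g", "y"])
ax.set_ylim(0, 1000)  # the y-axis range asked for in the question
```

pandas places the three bars of each row side by side, which is what the repeated `plt.bar(y, ...)` calls in the question could not do because they all used the same x positions.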

Plot all pandas dataframe columns separately

I have a pandas dataframe that has only numeric columns, and I am trying to create a separate histogram for each feature
ind group people value value_50
1 1 5 100 1
1 2 2 90 1
2 1 10 80 1
2 2 20 40 0
3 1 7 10 0
3 2 23 30 0
But my real data has 50+ columns. How can I create a separate plot for each of them?
I have tried
df.plot.hist( subplots = True, grid = True)
It gave me an overlapping, unclear plot.
How can I arrange them using pandas subplots=True? The example below gets me graphs in a (2,2) grid for four columns, but it is a long-winded method for all 50 columns.
fig, [(ax1,ax2),(ax3,ax4)] = plt.subplots(2,2, figsize = (20,10))
Pandas subplots=True will arrange the axes in a single column.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame(np.random.rand(7,20))
df.plot(subplots=True)
plt.tight_layout()
plt.show()
Here, tight_layout isn't applied, because the figure is too small to arrange the axes nicely. One can use a bigger figure (figsize=(...)) though.
In order to have the axes on a grid, one can use the layout parameter, e.g.
df.plot(subplots=True, layout=(4,5))
The same can be achieved if creating the axes via plt.subplots()
fig, axes = plt.subplots(nrows=4, ncols=5)
df.plot(subplots=True, ax=axes)
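For the 50+ columns mentioned in the question, the grid shape does not have to be typed by hand. A minimal sketch that derives `layout` from the number of columns; the column count of 52, the choice of `ncols`, the figure size, and the Agg backend line are all assumptions for illustration:

```python
import math

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(7, 52))  # random stand-in with 52 numeric columns

ncols = 6  # hypothetical choice of grid width
nrows = math.ceil(len(df.columns) / ncols)  # enough rows to fit every column

# One histogram per column, arranged on an nrows x ncols grid.
axes = df.plot(kind="hist", subplots=True, layout=(nrows, ncols),
               figsize=(3 * ncols, 2 * nrows), legend=False)
plt.tight_layout()
```

The grid only needs at least as many cells as there are columns, so rounding the row count up is enough.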
If you want to plot them separately (which is why I ended up here), you can use
for i in df.columns:
    plt.figure()
    plt.hist(df[i])
An alternative for this task is the "hist" method with its "layout" parameter. Example using part of the code provided by @ImportanceOfBeingErnest:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame(np.random.rand(7,20))
df.hist(layout=(5,4), figsize=(15,10))
plt.show()
With a pandas.DataFrame, I would suggest using pandas.DataFrame.apply. With a custom function (plot() in this example), you can show and save each figure separately.
def plot(col):
    fig, ax = plt.subplots()
    ax.plot(col)
    plt.show()
df.apply(plot)
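The function above only shows each figure. A sketch of the saving part, assuming one PNG per column named after the column; the temporary output directory, the three-column frame, and the Agg backend line are made up for illustration:

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(7, 3), columns=["a", "b", "c"])
out_dir = Path(tempfile.mkdtemp())  # hypothetical output location


def plot(col):
    """Draw one column on its own figure and save it as <column name>.png."""
    fig, ax = plt.subplots()
    ax.plot(col)
    ax.set_title(col.name)
    fig.savefig(out_dir / f"{col.name}.png")
    plt.close(fig)  # release the figure once it is on disk


df.apply(plot)
```

Closing each figure after saving matters with 50+ columns, since matplotlib otherwise keeps every figure open in memory.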
While not asked for in the question I thought I'd add that using the x parameter to plot would allow you to specify a column for the x axis data.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame(np.random.rand(7,20),columns=list('abcdefghijklmnopqrst'))
df.plot(x='a',subplots=True, layout=(4,5))
plt.tight_layout()
plt.show()
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html

Bar graph values missing matplotlib

My code-
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.figure()
cols = ['hops','frequency']
data = [[-13,1],[-8,1],[-5,1],[0,2],[2,1],[4,1],[7,1]]
data = np.asarray(data)
indices = np.arange(0,len(data))
plot_data = pd.DataFrame(data, index=indices, columns=cols)
plt.bar(plot_data['hops'].tolist(),plot_data['frequency'].tolist(),width=0.8)
plt.xlim([-20,20])
plt.ylim([0,20])
plt.ylabel('Frequency')
plt.xlabel('Hops')
Output-
My requirements-
I want the graph to have an x-axis range of [-20, 20] and a y-axis range of [0, 18], and the bars should be labelled: in this case, for example, the 1st bar should be numbered 1, and so on.
From your comment above, I am assuming this is what you want. You just need to specify the positions at which you want the x-tick labels.
xtcks = [-20, 20]
plt.xticks(np.insert(xtcks, 1, data[:, 0]))
plt.yticks([0, 18])
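If "labelled" means writing each bar's frequency above the bar, `Axes.bar_label` can do that. This is a sketch under that reading of the question, and it assumes matplotlib 3.4 or newer, where `bar_label` was added; the Agg backend line is also an assumption:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

data = np.asarray([[-13, 1], [-8, 1], [-5, 1], [0, 2], [2, 1], [4, 1], [7, 1]])
hops, freq = data[:, 0], data[:, 1]

fig, ax = plt.subplots()
bars = ax.bar(hops, freq, width=0.8)
ax.set_xlim(-20, 20)
ax.set_ylim(0, 18)
ax.set_xlabel("Hops")
ax.set_ylabel("Frequency")

# Ticks at both ends of the axis and under every bar, as in the answer.
ax.set_xticks(np.concatenate(([-20], hops, [20])))

# Write each bar's height above it (requires matplotlib >= 3.4).
labels = ax.bar_label(bars)
```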

pandas: plot multiple columns of a timeseries with labels (example from pandas documentation)

I'm trying to recreate the example of plot multiple columns of a timeseries with labels as shown in the pandas documentation here: http://pandas.pydata.org/pandas-docs/dev/visualization.html#visualization-basic (second graph)
Here's my code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
ts = pd.DataFrame(np.random.randn(1000, 4), index=pd.date_range('1/1/2000', periods=1000), columns=list('ABCD'))
ts = ts.cumsum()
fig = plt.figure()
print(ts.head())
ts.plot()
fig.savefig("x.png")
the text output seems ok:
A B C D
2000-01-01 1.547838 -0.571000 -1.780852 0.559283
2000-01-02 1.165659 -1.859979 -0.490980 0.796502
2000-01-03 0.786416 -2.543543 -0.903669 1.117328
2000-01-04 1.640174 -3.756809 -1.862188 0.466236
2000-01-05 2.119575 -4.590741 -1.055563 1.004607
but x.png is always empty.
If I plot just one column:
ts['A'].plot()
I do get a result.
Is there a way to debug this, to find out what's going wrong here?
The reason you don't get a result is because you are not saving the 'correct' figure: you are making a figure with plt.figure(), but pandas does not plot on the current figure, and will create a new one.
If you do:
ts.plot()
fig = plt.gcf() # get current figure
fig.savefig("x.png")
I get the correct output. When plotting a Series, it does use the current axis if no axis is passed.
But it seems that the pandas docs are not fully correct on that account (as they use the plt.figure()), I reported an issue for that: https://github.com/pydata/pandas/issues/8776
Another option is to provide an axes object using the ax argument:
fig = plt.figure()
ts.plot(ax=plt.gca()) # 'get current axis'
fig.savefig("x.png")
or slightly cleaner (IMO):
fig, ax = plt.subplots()
ts.plot(ax=ax)
fig.savefig("x.png")
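A small sketch to confirm that, with `ax` passed explicitly, the lines end up on the figure being saved. It reuses the data from the question; the name `returned_ax` and the Agg backend line are assumptions for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

ts = pd.DataFrame(np.random.randn(1000, 4),
                  index=pd.date_range("1/1/2000", periods=1000),
                  columns=list("ABCD")).cumsum()

fig, ax = plt.subplots()
returned_ax = ts.plot(ax=ax)

# The axes pandas drew on belongs to the figure we hold a reference to,
# and it carries one line per column.
print(returned_ax.figure is fig)
print(len(ax.lines))
```

Checking `returned_ax.figure is fig` is a quick way to debug an empty image: if it is false, the plot went to a different figure than the one being saved.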

python pandas plot with uneven timeseries index (with count evenly distributed)

My dataframe has an unevenly spaced time index.
How can I plot the data and locate the index automatically? I searched here, and I know I can plot something like
e.plot()
but the time index (x-axis) will then have even intervals, for example every 5 minutes.
If I have 100 data points in the first 5 minutes and 6 in the second 5 minutes, how do I plot them evenly spaced by count, with the right timestamps located on the x-axis?
Here is the even-count version, but I don't know how to add the time index:
plot(e['Bid'].values)
example of data format as requested
Time,Bid
2014-03-05 21:56:05:924300,1.37275
2014-03-05 21:56:05:924351,1.37272
2014-03-05 21:56:06:421906,1.37275
2014-03-05 21:56:06:421950,1.37272
2014-03-05 21:56:06:920539,1.37275
2014-03-05 21:56:06:920580,1.37272
2014-03-05 21:56:09:071981,1.37275
2014-03-05 21:56:09:072019,1.37272
and here's the link
http://code.google.com/p/eu-ats/source/browse/trunk/data/new/eur-fix.csv
Here's the code I used to plot:
import numpy as np
import pandas as pd
import datetime as dt
e = pd.read_csv("data/ecb/eur.csv", dtype={'Time':object})
e.Time = pd.to_datetime(e.Time, format='%Y-%m-%d %H:%M:%S:%f')
e.plot()
f = e.copy()
f.index = f.Time
x = [str(s)[:-7] for s in f.index]
ff = f.set_index(pd.Series(x))
ff.index.name = 'Time'
ff.plot()
Update:
I added two new plots for comparison to clarify the issue. I also tried the brute-force approach of converting the timestamp index back to strings and plotting the strings as the x-axis. The format easily gets messed up, and it seems hard to customize the location of the x labels.
Ok, it seems like what you're after is that you want to move around the x-tick locations so that there are an equal number of points between each tick. And you'd like to have the grid drawn on these appropriately-located ticks. Do I have that right?
If so:
import pandas as pd
import urllib.request
import matplotlib.pyplot as plt
import seaborn as sbn
content = urllib.request.urlopen('https://eu-ats.googlecode.com/svn/trunk/data/new/eur-fix.csv')
df = pd.read_csv(content, header=0)
df['Time'] = pd.to_datetime(df['Time'], format='%Y-%m-%d %H:%M:%S:%f')
every30 = df.loc[df.index % 30 == 0, 'Time'].values
fig, ax = plt.subplots(1, 1, figsize=(9, 5))
df.plot(x='Time', y='Bid', ax=ax)
ax.set_xticks(every30)
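If the goal is instead to space the points evenly by count, as `plot(e['Bid'].values)` does, and still show timestamps, one option is to plot against the observation number and relabel the ticks. This is a different technique from the snippet above, sketched on the sample rows from the question; `step`, `pos`, the label format, and the Agg backend line are assumptions:

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# The sample rows from the question stand in for the full file.
csv_text = """Time,Bid
2014-03-05 21:56:05:924300,1.37275
2014-03-05 21:56:05:924351,1.37272
2014-03-05 21:56:06:421906,1.37275
2014-03-05 21:56:06:421950,1.37272
2014-03-05 21:56:06:920539,1.37275
2014-03-05 21:56:06:920580,1.37272
2014-03-05 21:56:09:071981,1.37275
2014-03-05 21:56:09:072019,1.37272
"""
df = pd.read_csv(io.StringIO(csv_text))
df["Time"] = pd.to_datetime(df["Time"], format="%Y-%m-%d %H:%M:%S:%f")

step = 2  # hypothetical: label every 2nd observation
pos = np.arange(len(df))

fig, ax = plt.subplots(figsize=(9, 5))
# x is the observation number, so the points are evenly spaced by count.
ax.plot(pos, df["Bid"].to_numpy())

# Put a tick on every step-th observation and label it with its timestamp.
ax.set_xticks(pos[::step])
tick_text = df["Time"].dt.strftime("%H:%M:%S.%f").iloc[::step].tolist()
ax.set_xticklabels(tick_text, rotation=45, ha="right")
ax.grid(True)
```

The trade-off is that the x-axis is no longer linear in time: equal distances mean equal numbers of observations, not equal durations.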
I have tried to reproduce your issue, but I can't seem to. Can you have a look at this example and see how your situation differs?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sbn
np.random.seed(0)
idx = pd.date_range('11:00', '21:30', freq='1min')
ser = pd.Series(data=np.random.randn(len(idx)), index=idx)
ser = ser.cumsum()
for i in range(20):
    for j in range(8):
        ser.iloc[10*i +j] = np.nan
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
ser.plot(ax=axes[0])
ser.dropna().plot(ax=axes[1])
gives the following two plots:
There are a couple differences between the graphs. The one on the left doesn't connect the non-continuous bits of data. And it lacks vertical gridlines. But both seem to respect the actual index of the data. Can you show an example of your e series? What is the exact format of its index? Is it a datetime_index or is it just text?
Edit:
Playing with this, my guess is that your index is actually just text. If I continue from above with:
idx_str = [str(x) for x in idx]
newser = ser
newser.index = idx_str
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
newser.plot(ax=axes[0])
newser.dropna().plot(ax=axes[1])
then I get something like your problem:
More edit:
If this is in fact your issue (the index is a bunch of strings, not really a bunch of timestamps) then you can convert them and all will be well:
idx_fixed = pd.to_datetime(idx_str)
fixedser = newser
fixedser.index = idx_fixed
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
fixedser.plot(ax=axes[0])
fixedser.dropna().plot(ax=axes[1])
produces output identical to the first code sample above.
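To tell which case applies before plotting, the index type can be inspected directly. A small sketch; the names `as_dates`, `as_text`, and `fixed` are made up for illustration:

```python
import pandas as pd

idx = pd.date_range("11:00", "21:30", freq="1min")
as_dates = pd.Series(range(len(idx)), index=idx)
as_text = pd.Series(range(len(idx)), index=[str(x) for x in idx])

# A real datetime index is a DatetimeIndex; a text index is not.
for name, s in [("dates", as_dates), ("text", as_text)]:
    print(name, type(s.index).__name__, isinstance(s.index, pd.DatetimeIndex))

# Converting the text index restores a proper DatetimeIndex.
fixed = as_text.copy()
fixed.index = pd.to_datetime(fixed.index)
print("fixed", isinstance(fixed.index, pd.DatetimeIndex))
```

Working on a copy keeps the original series untouched, whereas assigning to `.index` on an alias such as `newser = ser` also changes `ser`.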
Editing again:
To see the uneven spacing of the data, you can do this:
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
fixedser.plot(ax=axes[0], marker='.', linewidth=0)
fixedser.dropna().plot(ax=axes[1], marker='.', linewidth=0)
Let me try this one from scratch. Does this solve your issue?
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sbn
import urllib.request
content = urllib.request.urlopen('https://eu-ats.googlecode.com/svn/trunk/data/new/eur-fix.csv')
df = pd.read_csv(content, header=0, index_col='Time')
df.index = pd.to_datetime(df.index, format='%Y-%m-%d %H:%M:%S:%f')
df.plot()
The thing is, you want to plot bid vs time. If you've put the times into your index then they become your x-axis for "free". If the time data is just another column, then you need to specify that you want to plot bid as the y-axis variable and time as the x-axis variable. So in your code above, even when you convert the time data to be datetime type, you were never instructing pandas/matplotlib to use those datetimes as the x-axis.
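The two routes described above can be put side by side. A sketch on a few of the sample rows from the question; the axes names `ax_col` and `ax_idx` and the Agg backend line are assumptions for illustration:

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# A few sample rows from the question stand in for the full file.
csv_text = """Time,Bid
2014-03-05 21:56:05:924300,1.37275
2014-03-05 21:56:06:421906,1.37272
2014-03-05 21:56:06:920539,1.37275
2014-03-05 21:56:09:071981,1.37272
"""
df = pd.read_csv(io.StringIO(csv_text))
df["Time"] = pd.to_datetime(df["Time"], format="%Y-%m-%d %H:%M:%S:%f")

fig, (ax_col, ax_idx) = plt.subplots(1, 2, figsize=(10, 4))

# Option 1: Time stays an ordinary column, so it must be named as x.
df.plot(x="Time", y="Bid", ax=ax_col)

# Option 2: Time becomes the index and is used as the x-axis automatically.
df.set_index("Time")["Bid"].plot(ax=ax_idx)
```

Both panels should show the same curve; the only difference is whether the x-axis is picked up from the index or named through the `x` argument.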
