Matplotlib bar plot with two different colors

I have a DataFrame column with 10 values, and I am trying to create a bar plot in which the first 5 bars are blue and the remaining 5 are red.
In [1]: import matplotlib.pyplot as plt
In [2]: import pandas as pd
In [3]: df = pd.DataFrame({"x": range(10)})
In [4]: ax = df.iloc[:5].plot(kind="bar", color="blue")
In [5]: plt.show()
In [6]: df.iloc[5:].plot(kind="bar", color="grey", ax=ax)
Out[6]: <Axes: >
In [7]: ax.bar(df.index[5:].values, df.iloc[5:, 0].values, color="grey")
Out[7]: <BarContainer object of 5 artists>
[5] shows the plot of the first 5 elements in blue. I tried two ways of adding the second part, [6] and [7], but neither was successful. Calling plt.show() again did not show anything. A final ax.figure.savefig("fig.png") shows a bar plot of the last 5 elements in grey.
How do I get them into one figure?
I tried the above code to generate the desired figure. Putting the same commands into a Jupyter notebook yielded the same results.

It is unclear what output you want. Do you want something like this:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'x': range(1, 11)})
ax = df['x'].plot(kind='bar', color=['blue']*5+['red']*5)
plt.show()
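If you would rather keep two separate plotting calls, as in [7] of the question, here is a minimal sketch that draws both halves with matplotlib's own ax.bar on one fresh Axes. The pandas bar plot places bars at positions 0, 1, 2, ... regardless of the index values, whereas ax.bar uses the x values you pass. The Agg backend and the in-memory buffer are my assumptions, chosen only so the snippet runs without a display:

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, not part of the question
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"x": range(10)})

fig, ax = plt.subplots()
# Draw each half at its real index position, on the same Axes.
first = ax.bar(df.index[:5].to_numpy(), df["x"].iloc[:5].to_numpy(), color="blue")
second = ax.bar(df.index[5:].to_numpy(), df["x"].iloc[5:].to_numpy(), color="red")
ax.set_xticks(df.index.to_numpy())

# Save once, after everything has been drawn (here into memory, not a file).
buf = io.BytesIO()
fig.savefig(buf, format="png")
```

The point is to finish all drawing on one Axes before showing or saving the figure.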

Related

python plotting multiple bars

I've been trying to do this for several hours and I make a mistake every time. I want to create 3 bar plots in one graph. The y-axis should range from 0 to 1000.
The end result should be this
That's my code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.read_csv('razemKM.csv')
dfn = pd.read_csv('razemNPM.csv')
print(df)
y=[0,1000]
a=(df["srednia"]-df["odchStand"])
a1=df["srednia"]
a2=(df["srednia"]+df["odchStand"])
plt.bar(y,a,width=0.1,color='r')
plt.bar(y,a1,width=0.1,color='g')
plt.bar(y,a2,width=0.1,color='y')
plt.show()

You can use the pandas plot function:
df['Sum'] = df["srednia"] + df["odchStand"]
df['Diff'] = df["srednia"] - df["odchStand"]
df.plot.bar(y=['Diff', 'srednia', 'Sum'], width=0.1)
plt.show()
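Since razemKM.csv is not available here, the following is only a sketch with made-up numbers standing in for the "srednia" and "odchStand" columns. It also passes ylim=(0, 1000), which is my reading of the requested y-axis range. The Agg backend is an assumption so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the contents of razemKM.csv
df = pd.DataFrame({
    "srednia": [400.0, 550.0, 700.0],
    "odchStand": [50.0, 80.0, 120.0],
})
df["Diff"] = df["srednia"] - df["odchStand"]
df["Sum"] = df["srednia"] + df["odchStand"]

# One group of three bars per row; y-axis fixed to 0..1000
ax = df.plot.bar(y=["Diff", "srednia", "Sum"], ylim=(0, 1000))
```

Each column in y becomes its own bar series, so every row of the frame gets three bars side by side.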

Plot all pandas dataframe columns separately

I have a pandas dataframe that has only numeric columns, and I am trying to create a separate histogram for each of the features:
ind  group  people  value  value_50
1    1      5       100    1
1    2      2       90     1
2    1      10      80     1
2    2      20      40     0
3    1      7       10     0
3    2      23      30     0
But my real-life data has 50+ columns. How can I create a separate plot for each of them?
I have tried
df.plot.hist( subplots = True, grid = True)
It gave me an unclear plot with everything overlapping.
How can I arrange them using pandas subplots = True? The example below would give me graphs in a (2,2) grid for four columns, but it is a long-winded method for all 50 columns:
fig, [(ax1,ax2),(ax3,ax4)] = plt.subplots(2,2, figsize = (20,10))

Pandas subplots=True will arrange the axes in a single column.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame(np.random.rand(7,20))
df.plot(subplots=True)
plt.tight_layout()
plt.show()
Here, tight_layout isn't applied, because the figure is too small to arrange the axes nicely. One can use a bigger figure (figsize=(...)) though.
In order to have the axes on a grid, one can use the layout parameter, e.g.
df.plot(subplots=True, layout=(4,5))
The same can be achieved by creating the axes via plt.subplots() and passing them in:
fig, axes = plt.subplots(nrows=4, ncols=5)
df.plot(subplots=True, ax=axes)
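With 50+ columns you may not want to work out the number of rows by hand. As a sketch, one dimension of layout can be given as -1 so that pandas infers it from the number of columns. The 12-column random frame, the figure size and the Agg backend are only assumptions for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(7, 12))

# Fix 4 columns in the grid and let pandas infer the number of rows.
axes = df.plot(subplots=True, layout=(-1, 4), figsize=(12, 8))
plt.tight_layout()
```

Here axes comes back as a 2D array of Axes, so individual panels can still be adjusted afterwards.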

If you want to plot them separately (which is why I ended up here), you can use
for i in df.columns:
    plt.figure()
    plt.hist(df[i])
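A self-contained variant of that loop, as a sketch: it uses the sample table from the question and keeps a handle on each figure so it can be titled or saved later. The figures dictionary and the Agg backend are my own additions, not part of the original answer:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ind": [1, 1, 2, 2, 3, 3],
    "group": [1, 2, 1, 2, 1, 2],
    "people": [5, 2, 10, 20, 7, 23],
    "value": [100, 90, 80, 40, 10, 30],
    "value_50": [1, 1, 1, 0, 0, 0],
})

figures = {}  # hypothetical helper: column name -> its own Figure
for col in df.columns:
    fig, ax = plt.subplots()  # one new figure per column
    ax.hist(df[col])
    ax.set_title(col)
    figures[col] = fig
```

With many columns it is worth closing figures you no longer need (plt.close(fig)) to keep memory use down.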

An alternative for this task is the "hist" method with its "layout" parameter. Example using part of the code provided by @ImportanceOfBeingErnest:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame(np.random.rand(7,20))
df.hist(layout=(5,4), figsize=(15,10))
plt.show()

With a pandas.DataFrame I would suggest using pandas.DataFrame.apply. With a custom function, in this example plot(), you can show and save each figure separately.
def plot(col):
    fig, ax = plt.subplots()
    ax.plot(col)
    plt.show()

df.apply(plot)

While not asked for in the question, I thought I'd add that passing the x parameter to plot lets you specify a column for the x-axis data.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame(np.random.rand(7,20),columns=list('abcdefghijklmnopqrst'))
df.plot(x='a',subplots=True, layout=(4,5))
plt.tight_layout()
plt.show()
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html

Bar graph values missing matplotlib

My code-
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.figure()
cols = ['hops','frequency']
data = [[-13,1],[-8,1],[-5,1],[0,2],[2,1],[4,1],[7,1]]
data = np.asarray(data)
indices = np.arange(0,len(data))
plot_data = pd.DataFrame(data, index=indices, columns=cols)
plt.bar(plot_data['hops'].tolist(),plot_data['frequency'].tolist(),width=0.8)
plt.xlim([-20,20])
plt.ylim([0,20])
plt.ylabel('Frequency')
plt.xlabel('Hops')
Output-
My requirements-
I want the graph to have the scale [-20,20] on the X axis and [0,18] on the Y axis, and the bars should be labelled, e.g. in this case the 1st bar should be numbered 1, and so on.

From your comment above, I am assuming this is what you want. You just need to specify the positions at which you want the x-tick labels.
xtcks = [-20, 20]
plt.xticks(np.insert(xtcks, 1, data[:, 0]))
plt.yticks([0, 18])
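Put together with the data from the question, a runnable sketch of the same idea looks like this. The object-oriented ax calls and the Agg backend are my choice, not the original answer's:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np

data = np.asarray([[-13, 1], [-8, 1], [-5, 1], [0, 2], [2, 1], [4, 1], [7, 1]])

fig, ax = plt.subplots()
ax.bar(data[:, 0], data[:, 1], width=0.8)
ax.set_xlim(-20, 20)
ax.set_ylim(0, 18)

# Axis limits plus every bar position become x-tick locations.
ticks = np.insert([-20, 20], 1, data[:, 0])
ax.set_xticks(ticks)
ax.set_yticks([0, 18])
ax.set_xlabel("Hops")
ax.set_ylabel("Frequency")
```

np.insert places all the hop values between the two limits, so the ticks stay in ascending order.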

pandas: plot multiple columns of a timeseries with labels (example from pandas documentation)

I'm trying to recreate the example of plot multiple columns of a timeseries with labels as shown in the pandas documentation here: http://pandas.pydata.org/pandas-docs/dev/visualization.html#visualization-basic (second graph)
Here's my code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
ts = pd.DataFrame(np.random.randn(1000, 4), index=pd.date_range('1/1/2000', periods=1000), columns=list('ABCD'))
ts = ts.cumsum()
fig = plt.figure()
print(ts.head())
ts.plot()
fig.savefig("x.png")
the text output seems ok:
                   A         B         C         D
2000-01-01  1.547838 -0.571000 -1.780852  0.559283
2000-01-02  1.165659 -1.859979 -0.490980  0.796502
2000-01-03  0.786416 -2.543543 -0.903669  1.117328
2000-01-04  1.640174 -3.756809 -1.862188  0.466236
2000-01-05  2.119575 -4.590741 -1.055563  1.004607
but x.png is always empty.
If I plot just one column:
ts['A'].plot()
I do get a result.
Is there a way to debug this, to find out what's going wrong here?

You don't get a result because you are not saving the 'correct' figure: you are making a figure with plt.figure(), but pandas does not plot a DataFrame on the current figure and creates a new one instead.
If you do:
ts.plot()
fig = plt.gcf() # get current figure
fig.savefig("x.png")
I get the correct output. When plotting a Series, pandas does use the current axis if no axis is passed.
It seems that the pandas docs are not fully correct on that account (as they use plt.figure()), so I reported an issue for that: https://github.com/pydata/pandas/issues/8776
Another option is to provide an axes object using the ax argument:
fig = plt.figure()
ts.plot(ax=plt.gca()) # 'get current axis'
fig.savefig("x.png")
or slightly cleaner (IMO):
fig, ax = plt.subplots()
ts.plot(ax=ax)
fig.savefig("x.png")
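To convince yourself that the lines really land on the figure you save, here is a small sketch of the last variant with a couple of checks. Saving into an in-memory buffer instead of "x.png", and the Agg backend, are my assumptions so the snippet leaves no file behind and needs no display:

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

ts = pd.DataFrame(
    np.random.randn(1000, 4),
    index=pd.date_range("1/1/2000", periods=1000),
    columns=list("ABCD"),
).cumsum()

fig, ax = plt.subplots()
returned = ts.plot(ax=ax)  # pandas draws on the Axes we hand it

print(returned is ax)             # the same Axes comes back
print(ax.get_figure() is fig)     # and it belongs to the figure we save
print(len(ax.get_lines()))        # one line per column

buf = io.BytesIO()
fig.savefig(buf, format="png")
```

If the second check were False, you would be in the situation from the question: saving one figure while pandas drew on another.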

python pandas plot with uneven timeseries index (with count evenly distributed)

My dataframe has an unevenly spaced time index.
How can I plot the data and have the index located automatically? I searched here, and I know I can plot something like
e.plot()
but the time index (x axis) will then have even intervals, for example every 5 minutes.
If I have 100 data points in the first 5 minutes and 6 data points in the second 5 minutes, how do I plot them evenly spaced by count, with the right timestamps located on the x axis?
Here's a plot that is even by count, but I don't know how to add the time index:
plot(e['Bid'].values)
Example of the data format, as requested:
Time,Bid
2014-03-05 21:56:05:924300,1.37275
2014-03-05 21:56:05:924351,1.37272
2014-03-05 21:56:06:421906,1.37275
2014-03-05 21:56:06:421950,1.37272
2014-03-05 21:56:06:920539,1.37275
2014-03-05 21:56:06:920580,1.37272
2014-03-05 21:56:09:071981,1.37275
2014-03-05 21:56:09:072019,1.37272
Here is the link to the data:
http://code.google.com/p/eu-ats/source/browse/trunk/data/new/eur-fix.csv
Here's the code I used to plot:
import numpy as np
import pandas as pd
import datetime as dt
e = pd.read_csv("data/ecb/eur.csv", dtype={'Time':object})
e.Time = pd.to_datetime(e.Time, format='%Y-%m-%d %H:%M:%S:%f')
e.plot()
f = e.copy()
f.index = f.Time
x = [str(s)[:-7] for s in f.index]
ff = f.set_index(pd.Series(x))
ff.index.name = 'Time'
ff.plot()
Update:
I added two new plots for comparison to clarify the issue. I tried brute force: converting the timestamp index back to strings and plotting the strings as the x axis. The formatting easily gets messed up, and it seems hard to customize the location of the x labels.

Ok, it seems like what you're after is that you want to move around the x-tick locations so that there are an equal number of points between each tick. And you'd like to have the grid drawn on these appropriately-located ticks. Do I have that right?
If so:
import pandas as pd
import urllib.request
import matplotlib.pyplot as plt
import seaborn as sbn
content = urllib.request.urlopen('https://eu-ats.googlecode.com/svn/trunk/data/new/eur-fix.csv')
df = pd.read_csv(content, header=0)
df['Time'] = pd.to_datetime(df['Time'], format='%Y-%m-%d %H:%M:%S:%f')
every30 = df.loc[df.index % 30 == 0, 'Time'].values
fig, ax = plt.subplots(1, 1, figsize=(9, 5))
df.plot(x='Time', y='Bid', ax=ax)
ax.set_xticks(every30)
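If what you are after is literally "even by count", a different approach (not the one above) is to plot against the row position and only use the timestamps as tick labels. This is a sketch on the eight sample rows from the question. The step of 2 rows per tick, the label format and the Agg backend are arbitrary assumptions:

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question
raw = """Time,Bid
2014-03-05 21:56:05:924300,1.37275
2014-03-05 21:56:05:924351,1.37272
2014-03-05 21:56:06:421906,1.37275
2014-03-05 21:56:06:421950,1.37272
2014-03-05 21:56:06:920539,1.37275
2014-03-05 21:56:06:920580,1.37272
2014-03-05 21:56:09:071981,1.37275
2014-03-05 21:56:09:072019,1.37272
"""
df = pd.read_csv(io.StringIO(raw))
df["Time"] = pd.to_datetime(df["Time"], format="%Y-%m-%d %H:%M:%S:%f")

fig, ax = plt.subplots(figsize=(9, 5))
# x is the row number, so every observation gets the same horizontal space.
ax.plot(range(len(df)), df["Bid"].to_numpy())

step = 2  # hypothetical choice: one labelled tick every 2 rows
positions = list(range(0, len(df), step))
labels = df["Time"].iloc[positions].dt.strftime("%H:%M:%S.%f").tolist()
ax.set_xticks(positions)
ax.set_xticklabels(labels, rotation=45, ha="right")
```

The trade-off is that distances on the x axis no longer represent elapsed time, only the order of the observations.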

I have tried to reproduce your issue, but I can't seem to. Can you have a look at this example and see how your situation differs?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sbn
np.random.seed(0)
idx = pd.date_range('11:00', '21:30', freq='1min')
ser = pd.Series(data=np.random.randn(len(idx)), index=idx)
ser = ser.cumsum()
for i in range(20):
    for j in range(8):
        ser.iloc[10*i + j] = np.nan
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
ser.plot(ax=axes[0])
ser.dropna().plot(ax=axes[1])
gives the following two plots:
There are a couple of differences between the graphs. The one on the left doesn't connect the non-contiguous bits of data, and it lacks vertical gridlines. But both seem to respect the actual index of the data. Can you show an example of your e series? What is the exact format of its index? Is it a DatetimeIndex or is it just text?
Edit:
Playing with this, my guess is that your index is actually just text. If I continue from above with:
idx_str = [str(x) for x in idx]
newser = ser
newser.index = idx_str
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
newser.plot(ax=axes[0])
newser.dropna().plot(ax=axes[1])
then I get something like your problem:
More edit:
If this is in fact your issue (the index is a bunch of strings, not really a bunch of timestamps) then you can convert them and all will be well:
idx_fixed = pd.to_datetime(idx_str)
fixedser = newser
fixedser.index = idx_fixed
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
fixedser.plot(ax=axes[0])
fixedser.dropna().plot(ax=axes[1])
produces output identical to the first code sample above.
Editing again:
To see the uneven spacing of the data, you can do this:
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
fixedser.plot(ax=axes[0], marker='.', linewidth=0)
fixedser.dropna().plot(ax=axes[1], marker='.', linewidth=0)

Let me try this one from scratch. Does this solve your issue?
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sbn
import urllib.request
content = urllib.request.urlopen('https://eu-ats.googlecode.com/svn/trunk/data/new/eur-fix.csv')
df = pd.read_csv(content, header=0, index_col='Time')
df.index = pd.to_datetime(df.index, format='%Y-%m-%d %H:%M:%S:%f')
df.plot()
The thing is, you want to plot bid vs time. If you've put the times into your index then they become your x-axis for "free". If the time data is just another column, then you need to specify that you want to plot bid as the y-axis variable and time as the x-axis variable. So in your code above, even when you convert the time data to be datetime type, you were never instructing pandas/matplotlib to use those datetimes as the x-axis.
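Because the googlecode URL may no longer resolve, here is a sketch of the same two options on a few sample rows from the question, read from an in-memory string. The StringIO input and the Agg backend are my assumptions:

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import pandas as pd

raw = """Time,Bid
2014-03-05 21:56:05:924300,1.37275
2014-03-05 21:56:05:924351,1.37272
2014-03-05 21:56:06:421906,1.37275
2014-03-05 21:56:09:071981,1.37275
"""
fmt = "%Y-%m-%d %H:%M:%S:%f"

# Option 1: times in the index, so they become the x axis "for free".
by_index = pd.read_csv(io.StringIO(raw), index_col="Time")
by_index.index = pd.to_datetime(by_index.index, format=fmt)
ax1 = by_index.plot()

# Option 2: times in a regular column, so x and y must be named explicitly.
by_column = pd.read_csv(io.StringIO(raw))
by_column["Time"] = pd.to_datetime(by_column["Time"], format=fmt)
ax2 = by_column.plot(x="Time", y="Bid")
```

Both calls should give the same picture; the only difference is where the timestamps live in the frame.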

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.
