How to plot two barh in one axis in pyqtgraph? - python

I need something like this:
demo data:
bottom10
Out[12]:
0 -9.823127e+08
1 -8.069270e+08
2 -6.030317e+08
3 -5.709379e+08
4 -5.224355e+08
5 -4.755464e+08
6 -4.095561e+08
7 -3.989287e+08
8 -3.885740e+08
9 -3.691114e+08
Name: amount, dtype: float64
top10
Out[13]:
0 9.360520e+08
1 9.078776e+08
2 6.603838e+08
3 4.967611e+08
4 4.409362e+08
5 3.914972e+08
6 3.547471e+08
7 3.538894e+08
8 3.368558e+08
9 3.189895e+08
Name: amount, dtype: float64
The same question for matplotlib is here: how to plot two barh in one axis
But pyqtgraph has no equivalent of ax.twiny(). Is there another way?

I found an item, "BarGraphItem", that is not listed in the official documentation (PyQtGraph’s Widgets List). It can be turned with "rotate()" to produce horizontal bars like matplotlib's barh. It's not perfect, but it works!
import pyqtgraph as pg
import pandas as pd
import numpy as np
bottom10 = pd.DataFrame({'amount':-np.sort(np.random.rand(10))})
top10 = pd.DataFrame({'amount':np.sort(np.random.rand(10))[::-1]})
maxtick=max(top10.amount.max(),-bottom10.amount.min())*1.3
win1 = pg.plot()
axtop=pg.BarGraphItem(x=range(len(top10)),height=top10.amount,width=0.6,brush='r')
axtop.rotate(-90)
win1.addItem(axtop)
axbt=pg.BarGraphItem(x=range(len(top10)),height=-bottom10.amount,y0=maxtick+bottom10.amount,width=0.6,brush='g')
axbt.rotate(-90)
win1.addItem(axbt)
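The offset arithmetic behind the second BarGraphItem can be checked without opening a window. The following is a NumPy-only sketch, not part of pyqtgraph: the helper name mirrored_bar_spans and its pad parameter are made up for illustration. It reproduces maxtick and the y0/height values used above, and flags rows where the top bar and the bottom bar would overlap.

```python
import numpy as np


def mirrored_bar_spans(top, bottom, pad=1.3):
    """Return maxtick, (start, end) spans for both bar sets, and an overlap mask.

    Hypothetical helper: `top` holds positive values, `bottom` negative ones,
    mirroring top10.amount and bottom10.amount in the answer above.
    """
    top = np.asarray(top, dtype=float)
    bottom = np.asarray(bottom, dtype=float)
    # Same formula as in the answer: the far edge that the bottom bars hang from.
    maxtick = max(top.max(), -bottom.min()) * pad
    # Top bars start at 0 and extend by their value.
    top_spans = np.column_stack([np.zeros_like(top), top])
    # Bottom bars start at y0 = maxtick + value (value < 0) and end at maxtick.
    bottom_spans = np.column_stack([maxtick + bottom, np.full_like(bottom, maxtick)])
    # Two bars in the same row collide when the top bar reaches past the bottom bar's start.
    overlaps = top > maxtick + bottom
    return maxtick, top_spans, bottom_spans, overlaps


maxtick, top_spans, bottom_spans, overlaps = mirrored_bar_spans(
    top=[0.9, 0.5], bottom=[-0.2, -0.8]
)
print(maxtick)
print(overlaps)
```

Sorting the two series in opposite directions, as the demo data does, makes overlaps less likely but does not rule them out; a larger pad is one way to leave more room.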

Related

Plot with Histogram an attribute from a dataframe

I have a dataframe with the weight and the number of measures of each user. The df looks like:
id_user  weight  number_of_measures
1        92.16   4
2        80.34   5
3        71.89   11
4        81.11   7
5        77.23   8
6        92.37   2
7        88.18   3
I would like to see a histogram with an attribute of the table on the x-axis (weight, but I want to do the same for both columns) and the frequency on the y-axis.
Does anyone know how to do it with matplotlib?
Ok, it seems to be quite easy:
import pandas as pd
import matplotlib.pyplot as plt
hist = df.hist(bins=50)
plt.show()
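For reference, a minimal self-contained sketch of the same idea, built from the seven rows shown in the question. The choice of bins=5 and the restriction to the two measured columns are assumptions made here for a small sample; the original answer histograms every numeric column with bins=50.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# The sample rows from the question.
df = pd.DataFrame({
    "id_user": [1, 2, 3, 4, 5, 6, 7],
    "weight": [92.16, 80.34, 71.89, 81.11, 77.23, 92.37, 88.18],
    "number_of_measures": [4, 5, 11, 7, 8, 2, 3],
})

# One histogram per selected column; id_user is left out because it is only a key.
axes = df.hist(column=["weight", "number_of_measures"], bins=5, figsize=(8, 3))
for ax in axes.ravel():
    ax.set_ylabel("frequency")
plt.tight_layout()
```

Remove the matplotlib.use("Agg") line and call plt.show() to see the figure interactively.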

panda DataFrame.value_counts().plot().bar() and DataFrame.value_counts().cumsum().plot() not using the same axis

I am trying to draw a frequency bar plot and a cumulative "ogive" in the same plot. If I draw them separately both are shown OK, but when shown in the same figure, the cumulative graphic is shown shifted. Below the code used.
df = pd.DataFrame({'Correctas': [4,6,5,4,7,2,8,3,5,6,9,6,6,7,5,5,8,10,4,8,3,6,9,5,11,5,12,7,7,5,4,6]});
df['Correctas'].value_counts(sort = False).plot.bar();
df['Correctas'].value_counts(sort = False).cumsum().plot();
plt.show()
The cumulative frequency data is
2 1
3 3
4 7
5 14
6 20
7 24
8 27
9 29
10 30
11 31
12 32
So the cumulative line should start at 2 on the x-axis, but it starts at 4.
image showing the error
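This happens because the bar chart uses a categorical x-axis: the bars sit at positions 0, 1, 2, ... and are only labelled with the index values, while the line is drawn at the numeric index values themselves. Here is a quick fix that converts the index to strings so both plots use the same positions: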
df = pd.DataFrame({'Correctas': [4,6,5,4,7,2,8,3,5,6,9,6,6,7,5,5,8,10,4,8,3,6,9,5,11,5,12,7,7,5,4,6]});
df_counts = df['Correctas'].value_counts(sort = False)
df_counts.index = df_counts.index.astype('str')
df_counts.plot.bar(alpha=.8);
df_counts.cumsum().plot(color='k', kind='line');
plt.show();
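A variant of the fix above, as a sketch: it additionally sorts the counts by value with sort_index() so that the bars run from 2 to 12 and the cumulative line rises monotonically. The sort_index() call, the marker, and the explicit x-limits are additions made here, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'Correctas': [4, 6, 5, 4, 7, 2, 8, 3, 5, 6, 9, 6, 6, 7, 5, 5,
                                 8, 10, 4, 8, 3, 6, 9, 5, 11, 5, 12, 7, 7, 5, 4, 6]})

# Frequency of each score, ordered by the score itself.
counts = df['Correctas'].value_counts().sort_index()
# String labels make both plots categorical, so they share positions 0..n-1.
counts.index = counts.index.astype(str)

ax = counts.plot.bar(alpha=0.8)
counts.cumsum().plot(ax=ax, color='k', marker='o')
# Keep the outermost bars fully visible after the line plot adjusts the limits.
ax.set_xlim(-0.5, len(counts) - 0.5)
```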

Plotting three dimensions of categorical data in Python

My data has three categorical variables I'm trying to visualize:
City (one of five)
Occupation (one of four)
Blood type (one of four)
So far, I've succeeded in grouping the data in a way that I think will be easy to work with:
import numpy as np, pandas as pd
# Make data
cities = ['Tijuana','Las Vegas','Los Angeles','Anaheim','Atlantis']
occupations = ['Doctor','Lawyer','Engineer','Drone security officer']
bloodtypes = ['A','B','AB','O']
df = pd.DataFrame({'City': np.random.choice(cities,500),
                   'Occupation': np.random.choice(occupations,500),
                   'Blood Type':np.random.choice(bloodtypes,500)})
# You need to make a dummy column, otherwise the groupby returns an empty df
df['Dummy'] = np.ones(500)
# This is now what I'd like to plot
df.groupby(by=['City','Occupation','Blood Type']).count().unstack(level=1)
Returns:
                         Dummy
Occupation              Doctor  Drone security officer  Engineer  Lawyer
City        Blood Type
Anaheim     A                7                       7         7       7
            AB               6                      10         8       5
            B                2                      10         4       2
            O                4                       3         3       6
Atlantis    A                6                       5         5       7
            AB              12                       7         7      10
            B                7                       4         7       3
            O                7                       4         6       4
Las Vegas   A                8                       4         8       5
            AB               5                       6         8       9
            B                6                      10         6       6
            O                6                       9         5       9
Los Angeles A                7                       4         8       8
            AB               9                       8         8       8
            B                3                       6         4       1
            O                9                      11        11       9
Tijuana     A                3                       4         5       3
            AB               9                       5         5       7
            B                3                       6         4       9
            O                3                       5         5       8
My goal is to create something like the Seaborn swarmplot shown below, which comes from the Seaborn documentation. Seaborn applies jitter to the quantitative data so that you can see the individual data points and their hues:
With my data, I'd like to plot City on the x-axis and Occupation on the y-axis, applying jitter to each, and then hue by Blood type. However, sns.swarmplot requires one of the axes to be quantitative:
sns.swarmplot(data=df,x='City',y='Occupation',hue='Blood Type')
returns an error.
An acceptable alternative might be to create 20 categorical bar plots, one for each intersection of City and Occupation, which I would do by running a for loop over each category, but I can't imagine how I'd feed that to matplotlib subplots to get them in a 4x5 grid.
The most similar question I could find was in R, and the asker only wanted to indicate the most common value for the third variable, so I didn't get any good ideas from there.
Thanks for any help you can provide.
Alright, I got to work on the "acceptable alternative" today and I have found a solution using basically pure matplotlib (but I stuck the Seaborn styling on top of it, just because).
import numpy as np, pandas as pd
import matplotlib.pyplot as plt
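from matplotlib.pyplot import get_cmap  # pyplot.get_cmap also works on Matplotlib versions where matplotlib.cm.get_cmap is no longer available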
from matplotlib.patches import Patch
import seaborn as sns
# Make data
cities = ['Tijuana','Las Vegas','Los Angeles','Anaheim','Atlantis']
occupations = ['Doctor','Lawyer','Engineer','Drone security officer']
bloodtypes = ['A','B','AB','O']
df = pd.DataFrame({'City': np.random.choice(cities,500),
                   'Occupation': np.random.choice(occupations,500),
                   'Blood Type':np.random.choice(bloodtypes,500)})
# Make a dummy column, otherwise the groupby returns an empty df
df['Dummy'] = np.ones(500)
# This is now what I'd like to plot
grouped = df.groupby(by=['City','Occupation','Blood Type']).count().unstack()
# List of blood types, to use later as categories in subplots
kinds = grouped.columns.levels[1]
# colors for bar graph
colors = [get_cmap('viridis')(v) for v in np.linspace(0,1,len(kinds))]
sns.set(context="talk")
nxplots = len(grouped.index.levels[0])
nyplots = len(grouped.index.levels[1])
fig, axes = plt.subplots(nxplots,
                         nyplots,
                         sharey=True,
                         sharex=True,
                         figsize=(10,12))
fig.suptitle('City, occupation, and blood type')
# plot the data
for a, b in enumerate(grouped.index.levels[0]):
    for i, j in enumerate(grouped.index.levels[1]):
        axes[a,i].bar(kinds,grouped.loc[b,j],color=colors)
        axes[a,i].xaxis.set_ticks([])
axeslabels = fig.add_subplot(111, frameon=False)
plt.tick_params(labelcolor='none', top=False, bottom=False, left=False, right=False)
plt.grid(False)
axeslabels.set_ylabel('City',rotation='horizontal',y=1,weight="bold")
axeslabels.set_xlabel('Occupation',weight="bold")
# x- and y-axis labels
for i, j in enumerate(grouped.index.levels[1]):
    axes[-1,i].set_xlabel(j)  # bottom row; axes has shape (nxplots, nyplots)
for i, j in enumerate(grouped.index.levels[0]):
    axes[i,0].set_ylabel(j)
# Tune this manually to make room for the legend
fig.subplots_adjust(right=0.82)
fig.legend([Patch(facecolor = i) for i in colors],
           kinds,
           title="Blood type",
           loc="center right")
Returns this:
I'd appreciate any feedback, and I'd still love it if someone could provide the preferred solution.
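As a possible simplification of the approach above (a sketch, not the poster's code): counting with groupby(...).size() avoids the dummy column, and unstacking the blood type gives one row per City/Occupation pair that can be fed straight into a subplot grid. The fixed random seed and the variable names counts, city_levels and occ_levels are choices made here for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
cities = ['Tijuana', 'Las Vegas', 'Los Angeles', 'Anaheim', 'Atlantis']
occupations = ['Doctor', 'Lawyer', 'Engineer', 'Drone security officer']
bloodtypes = ['A', 'B', 'AB', 'O']
df = pd.DataFrame({'City': rng.choice(cities, 500),
                   'Occupation': rng.choice(occupations, 500),
                   'Blood Type': rng.choice(bloodtypes, 500)})

# size() counts rows per group, so no dummy column is needed.
# Rows: (City, Occupation); columns: Blood Type.
counts = (df.groupby(['City', 'Occupation', 'Blood Type'])
            .size()
            .unstack(fill_value=0))

city_levels = counts.index.levels[0]
occ_levels = counts.index.levels[1]
fig, axes = plt.subplots(len(city_levels), len(occ_levels),
                         sharex=True, sharey=True, figsize=(10, 12))
for r, city in enumerate(city_levels):
    for c, occ in enumerate(occ_levels):
        # One small bar chart per City/Occupation cell, one bar per blood type.
        axes[r, c].bar(counts.columns, counts.loc[(city, occ)])
        if r == 0:
            axes[r, c].set_title(occ)
    axes[r, 0].set_ylabel(city)
```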

How can I create a grouped bar chart with Matplotlib or Seaborn from a multi-indexed data frame?

I have a problem regarding how to plot multi-indexed data in a single bar chart. I started with a DataFrame with three columns (artist, genre and miscl_count) and 195 rows. I then grouped the data by two of the columns, which resulted in the table below. My question is: how can I create a bar plot from this, so that each group in "miscl_count" is shown as a separate cluster of bars across all five genres (i.e. 3x5 bars in total)? I would also like the genre to determine which color a bar is assigned.
I know that there is unstacking, but I don't understand how I can get this to work with Matplotlib or Seaborn.
The head of the DataFrame, that I perform the groupby method on looks like this:
print(miscl_df.head())
artist miscl_count genre
0 band1 5 a
1 band2 6 b
2 band3 5 b
3 band4 4 b
4 band5 5 b
5 band6 5 c
miscl_df_group = miscl_df.groupby(['genre', 'miscl_count']).count()
print(miscl_df_group)
After group by, the output looks like this:
artist
miscl_count 4 5 6
genre
a 11 9 9
b 19 13 16
c 13 14 16
d 10 9 12
e 21 14 10
Just to make sure I made myself clear, the output should be shown as a single chart (and not as subplots)!
Working solution to be used on the grouped data:
miscl_df_group.unstack(level='genre').plot(kind='bar')
Alternatively, it can also be used this way:
miscl_df_group.unstack(level='miscl_count').plot(kind='bar')
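A self-contained sketch of the unstack approach, for anyone who wants to try it without the original data. The DataFrame below is randomly generated stand-in data with the same three columns and 195 rows (the seed, the band names and the value ranges are assumptions); only the groupby/unstack/plot chain comes from the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # stand-in data, seeded for reproducibility
miscl_df = pd.DataFrame({
    "artist": [f"band{i}" for i in range(1, 196)],
    "miscl_count": rng.choice([4, 5, 6], 195),
    "genre": rng.choice(list("abcde"), 195),
})

# Count artists per (genre, miscl_count), then move genre into the columns:
# each row (miscl_count) becomes a cluster, each column (genre) a colored bar.
grouped = (miscl_df.groupby(["genre", "miscl_count"])["artist"]
                   .count()
                   .unstack(level="genre", fill_value=0))

ax = grouped.plot(kind="bar")
ax.set_ylabel("number of artists")
```

Unstacking level="miscl_count" instead swaps the roles: genres form the clusters on the x-axis and miscl_count determines the color.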
With seaborn there is no need to group the data first; barplot aggregates it under the hood (by default it plots the mean of y for each x/hue combination):
import seaborn as sns
sns.barplot(x="artist", y="miscl_count", hue="genre", data=miscl_df)
(change the column names at will, depending on what you want)
# full working example
import numpy as np
import pandas as pd
import seaborn as sns
df = pd.DataFrame()
df["artist"] = list(map(lambda i: f"band{i}", np.random.randint(1,4,size=(100,))))
df["genre"] = list(map(lambda i: f"genre{i}", np.random.randint(1,6,size=(100,))))
df["count"] = np.random.randint(50,100,size=(100,))
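# df (illustrative output from an earlier run; with randint(1,6) above, genres range from genre1 to genre5)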
# count genre artist
# 0 97 genre9 band1
# 1 95 genre7 band1
# 2 65 genre3 band2
# 3 81 genre1 band1
# 4 58 genre10 band1
# .. ... ... ...
# 95 61 genre1 band2
# 96 53 genre9 band2
# 97 55 genre9 band1
# 98 94 genre1 band2
# 99 85 genre8 band1
# [100 rows x 3 columns]
sns.barplot(x="artist", y="count", hue="genre", data=df)

Using pandas series date as xtick label

I have this dataframe called 'dfArrivalDate' (with the first 11 rows shown)
arrival_date count
0 2013-06-08 9
1 2013-06-27 8
2 2013-03-06 8
3 2013-06-01 8
4 2013-06-28 6
5 2012-11-28 6
6 2013-06-11 5
7 2013-06-29 5
8 2013-06-09 4
9 2013-06-03 3
10 2013-05-31 3
sortedArrivalDate = transform.sort('arrival_date')
I wanted to plot them in a bar chart to see the count by arrival date. I called
sortedArrivalDate.plot(kind = 'bar')
but I'm getting the index as the x-axis ticks of my bar chart. I figured I need to use 'xticks'.
sortedArrivalDate.plot(kind = 'bar', xticks = sortedArrivalDate.arrival_date)
but I run into the error: TypeError: Cannot compare type 'Timestamp' with type 'float'
I tried a different approach.
fig, ax = plt.subplots()
ax.plot(sortedArrivalDate.arrival_date, sortedArrivalDate.count)
This time the error is ValueError: x and y must have same first dimension
I'm thinking this might just be an easy fix and since I don't have much experience coding in pandas and matplotlib, I might be missing a very simple thing here. Care to guide me in the right direction? thanks.
IIUC, sort by date and pass the column names to plot:
df = df.sort_values(by='arrival_date')
df.plot(x='arrival_date', y='count', kind='bar')
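A runnable sketch of this answer using the eleven rows from the question. It also shows why the ax.plot attempt failed: a DataFrame attribute named count resolves to the DataFrame.count method rather than the column, so the column has to be accessed as df['count']. The date-only tick formatting at the end is an optional addition made here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "arrival_date": pd.to_datetime([
        "2013-06-08", "2013-06-27", "2013-03-06", "2013-06-01", "2013-06-28",
        "2012-11-28", "2013-06-11", "2013-06-29", "2013-06-09", "2013-06-03",
        "2013-05-31",
    ]),
    "count": [9, 8, 8, 8, 6, 6, 5, 5, 4, 3, 3],
})

# df.count is the DataFrame method, not the 'count' column.
is_method = callable(df.count)

df = df.sort_values(by="arrival_date")
ax = df.plot(x="arrival_date", y="count", kind="bar", legend=False)

# Optional: show only the date part on the tick labels.
labels = df["arrival_date"].dt.strftime("%Y-%m-%d").tolist()
ax.set_xticklabels(labels, rotation=45, ha="right")
plt.tight_layout()
```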
