I have a question about DataFrame.plot(). I want a single line graph with the years on the x-axis and the shipments on the y-axis, with one line per quarter (taken from the Quarter column). Please help, I'm new to this.
import pandas as pd
import matplotlib.pyplot as plt
df_ship = pd.read_csv('dmba/ApplianceShipments.csv') #importing the csv for the exercise
#Splitting the quarters and the years into two different columns
df_ship[['Quarter','Year']]= df_ship.Quarter.str.split('-',expand=True)
#Grouping the dataframe by Quarter
df_filt = df_ship.groupby(by=['Quarter'],axis=0)
#Creating the plot figure
fig1 = df_filt.plot(x='Year',y='Shipments')
As output I get 4 separate line graphs instead of one.
It would be helpful to have an extract of your dataset but does the code below do what you want?
import seaborn as sns
sns.lineplot(x='Year',y='Shipments', hue='Quarter', data=df_ship)
From Seaborn documentation:
hue : vector or key in data
Grouping variable that will produce lines with different colors. Can be either categorical or numeric, although color mapping will behave differently in latter case.
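If you would rather stay with DataFrame.plot(), one option is to pivot the data so that each quarter becomes its own column; pandas then draws one line per column on a single set of axes. This is only a sketch: the shipment numbers below are made up for illustration, and it assumes the Quarter column looks like 'Q1-1985'.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import pandas as pd

# Made-up stand-in for ApplianceShipments.csv (values are NOT the real data)
df_ship = pd.DataFrame({
    "Quarter": ["Q1-1985", "Q2-1985", "Q3-1985", "Q4-1985",
                "Q1-1986", "Q2-1986", "Q3-1986", "Q4-1986"],
    "Shipments": [100, 110, 120, 130, 105, 115, 125, 135],
})

# Split 'Q1-1985' into a quarter part and a year part
df_ship[["Quarter", "Year"]] = df_ship["Quarter"].str.split("-", expand=True)
df_ship["Year"] = df_ship["Year"].astype(int)

# One row per year, one column per quarter
wide = df_ship.pivot(index="Year", columns="Quarter", values="Shipments")

# DataFrame.plot() draws one line per column on the same axes
ax = wide.plot()
ax.set_xlabel("Year")
ax.set_ylabel("Shipments")
```

The groupby in the question plots each group separately, which is why it produces four figures; after pivoting there is only one plot call.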
This question already has answers here:
Modify the legend of pandas bar plot
(3 answers)
Closed 1 year ago.
I'm using this dataframe: https://www.kaggle.com/fivethirtyeight/fivethirtyeight-fandango-dataset. It has several columns that I want to plot; they are ['RT_norm', 'RT_user_norm', 'Metacritic_norm', 'Metacritic_user_nom', 'IMDB_norm', 'Fandango_Stars']
When I do any kind of plot with Pandas, the labels are the column labels (Duh!)
df.head().plot.bar(x='FILM', y=marc, figsize=(10,8), grid=True)
plt.title('Calificaciones de Películas por Sitio')
plt.ylabel('Calificación')
plt.xlabel('Película')
Is there any way to change the labels to something else? I dunno... instead of RT_norm I'd want Rotten Tomatoes Normalized. Or is the only correct answer to change the column names in the dataframe? I tried using the yticks and ylabel parameters, but they don't do what I want.
I think you want to change the legend labels using plt.legend(labels=...):
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({'FILM':range(100),
'y1':np.random.uniform(0,1,100),
'y2':np.random.uniform(0,1,100)})
df.head().plot.bar(x='FILM', y=['y1','y2'], figsize=(10,8), grid=True)
plt.legend(labels=['bar1','bar2'])
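If you have several columns, it may be easier to keep a dict from column name to display name and build the legend labels from it, so the order cannot drift. A minimal sketch with random stand-in data; the dict name pretty and the display strings are my own choices, not anything from the dataset:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "FILM": [f"Film {i}" for i in range(5)],
    "RT_norm": rng.uniform(0, 5, 5),
    "IMDB_norm": rng.uniform(0, 5, 5),
})

# Hypothetical mapping from column name to the label shown in the legend
pretty = {
    "RT_norm": "Rotten Tomatoes Normalized",
    "IMDB_norm": "IMDB Normalized",
}
cols = list(pretty)

ax = df.plot.bar(x="FILM", y=cols, figsize=(10, 8), grid=True)

# Reuse the handles pandas created and swap in the display names
handles, _ = ax.get_legend_handles_labels()
ax.legend(handles, [pretty[c] for c in cols])
```

The dataframe columns stay untouched; only the legend text changes.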
Python newbie here. I'm looking at some daily weather data for a couple of cities over the course of a year. Each city has its own csv file. I'm interested in comparing the count of daily average temperatures between two cities in a bar graph, so I can see (for example) how often the average temperature in Seattle was 75 degrees (or 30 or 100) compared to Phoenix.
I'd like a bar graph with side-by-side bars, with temperature on the x-axis and count on the y-axis. I've been able to get a bar graph of each city separately with this data, but don't know how to get both cities on the same bar chart with a different color for each city. Seems like it should be pretty simple, but my hours of searching haven't gotten me a good answer yet.
Suggestions please, oh wise stackoverflow mentors?
Here's what I've got so far:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv("KSEA.csv")
df2 = pd.read_csv("KPHX.csv")
df["actual_mean_temp"].value_counts(sort=False).plot(kind ="bar")
df2["actual_mean_temp"].value_counts(sort = False).plot(kind = 'bar')
You can concat DataFrames, assigning city as a column, and then use histplot in seaborn:
import seaborn as sns
z = pd.concat([
df[['actual_mean_temp']].assign(city='KSEA'),
df2[['actual_mean_temp']].assign(city='KPHX'),
])
ax = sns.histplot(data=z, x='actual_mean_temp', hue='city',
multiple='dodge', binwidth=1)
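A pandas-only variant should also work if you prefer to keep the value_counts approach: count each city separately, line the counts up in one DataFrame, and let plot.bar draw the columns side by side. This is a sketch with a few made-up temperatures standing in for KSEA.csv and KPHX.csv:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import pandas as pd

# Made-up stand-ins for the two CSV files
df = pd.DataFrame({"actual_mean_temp": [50, 50, 52, 60]})   # Seattle
df2 = pd.DataFrame({"actual_mean_temp": [60, 75, 75, 80]})  # Phoenix

# One column of counts per city, aligned on temperature
counts = pd.concat(
    {
        "KSEA": df["actual_mean_temp"].value_counts(),
        "KPHX": df2["actual_mean_temp"].value_counts(),
    },
    axis=1,
)
# Temperatures seen in only one city become 0 instead of NaN
counts = counts.fillna(0).astype(int).sort_index()

# Each column becomes one color, bars are grouped per temperature
ax = counts.plot.bar(width=0.8)
ax.set_xlabel("actual_mean_temp")
ax.set_ylabel("count")
```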
This question already has an answer here:
seaborn two corner pairplot
(1 answer)
Closed 1 year ago.
I want to make a pairplot from two different dataframes, data_up and data_low, drawn on the upper and lower parts of the pair grid respectively. Both dataframes have 4 columns, which correspond to the variables.
Looking at PairGrid, I did not find a way to give different data to each triangle.
E.g.:
import numpy as np
import seaborn as sns
import pandas as pd
# Dummy data :
data_up = np.random.uniform(size=(100,4))
data_low = np.random.uniform(size=(100,4))
# The two pairplots I currently use and want to mix:
sns.pairplot(pd.DataFrame(data_up))
sns.pairplot(pd.DataFrame(data_low))
How can I plot only the upper triangle of the first one together with the lower triangle of the second one? I don't really care what's plotted on the diagonal. Maybe a qqplot between the two corresponding marginals would be nice, but I'll see about that later.
You could try to put all columns together in one dataframe, and then use x_vars=... to tell pairplot which columns to use for the x-direction, and similarly y_vars=... for the y-direction.
import numpy as np
import seaborn as sns
import pandas as pd
# Dummy data :
data_up_down = np.random.uniform(size=(100,8))
df = pd.DataFrame(data_up_down)
# use columns 0..3 for the x and 4..7 for the y
sns.pairplot(df, x_vars=(0,1,2,3), y_vars=(4,5,6,7))
import matplotlib.pyplot as plt
plt.show()
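If you specifically want the two triangles of one square grid, another option is to build a PairGrid from the first dataframe, fill the upper triangle with map_upper, and then draw the second dataframe onto the lower-triangle axes by hand. This is a sketch rather than a built-in seaborn feature; it assumes both dataframes share the same column names (the names a..d are made up), and since the axes are shared, both datasets should cover similar ranges.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
cols = ["a", "b", "c", "d"]  # hypothetical variable names
data_up = pd.DataFrame(rng.uniform(size=(100, 4)), columns=cols)
data_low = pd.DataFrame(rng.uniform(size=(100, 4)), columns=cols)

# Grid built from the first dataframe; only the upper triangle is filled
g = sns.PairGrid(data_up)
g.map_upper(sns.scatterplot)

# Lower triangle: row i is the y variable, column j is the x variable
for i, j in zip(*np.tril_indices_from(g.axes, k=-1)):
    sns.scatterplot(data=data_low, x=cols[j], y=cols[i],
                    ax=g.axes[i, j], color="C1")
```

The diagonal is left empty here; anything drawn there would have to be added on g.axes[k, k] separately.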
I have the following code using countplot to plot the count of anomalies grouped by the factory:
import seaborn as sns
sns.countplot(x='factory', hue='anomaly', data=train_df)
This is working (with a very small image width however), but I need to plot a chart that shows the count of products grouped by factory and anomaly.
How can I do this?
The chart can be very large as there are dozens of anomalies and components, so probably I'll have to generate a larger image. What do you suggest?
Here's a small sample of the data:
product_id,factory,anomaly,component
1,1,AC1,W2
2,3,AB1,J1
3,2,AC3,L3
4,4,BA2,T2
5,3,BA2,T2
6,1,AA1,X2
7,4,AC2,J1
8,2,CA1,N1
9,2,AB3,J1
10,4,BB3,W1
11,2,AC3,C3
12,4,CA1,M1
13,3,BC3,Q1
14,2,AC2,O3
And here's the url to the complete: CSV
How the plot should look like:
I guess you would want to create a countplot like
import seaborn as sns
import matplotlib.pyplot as plt
# keep the returned Axes so the tick labels can be rotated
ax = sns.countplot(x='anomaly', hue='factory', data=df)
plt.setp(ax.get_xticklabels(), rotation=90)
You could also create a pivot table of the factories and anomalies with the number of different components as values.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("data/component_factory.txt")
piv = df.pivot_table(values='component', index='anomaly', columns='factory',
aggfunc=lambda x: len(x.unique()))
piv.plot.bar(width=0.8)
plt.show()
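If the goal is the number of products (rows) per factory and anomaly and the narrow image is the main problem, pd.crosstab plus a figsize argument might be enough. A minimal sketch using the sample rows from the question; the figsize value is an arbitrary choice and would need tuning for dozens of anomalies:

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import pandas as pd

sample = """product_id,factory,anomaly,component
1,1,AC1,W2
2,3,AB1,J1
3,2,AC3,L3
4,4,BA2,T2
5,3,BA2,T2
6,1,AA1,X2
7,4,AC2,J1
8,2,CA1,N1
9,2,AB3,J1
10,4,BB3,W1
11,2,AC3,C3
12,4,CA1,M1
13,3,BC3,Q1
14,2,AC2,O3
"""
df = pd.read_csv(io.StringIO(sample))

# Rows: anomaly, columns: factory, values: number of products (rows)
counts = pd.crosstab(df["anomaly"], df["factory"])

# figsize is in inches (width, height); widen it as the data grows
ax = counts.plot.bar(figsize=(16, 6), width=0.8)
ax.set_ylabel("number of products")
ax.figure.tight_layout()
```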
I am trying to generate a grid of subplots based off of a Pandas groupby object. I would like each plot to be based off of two columns of data for one group of the groupby object. Fake data set:
C1,C2,C3,C4
1,12,125,25
2,13,25,25
3,15,98,25
4,12,77,25
5,15,889,25
6,13,56,25
7,12,256,25
8,12,158,25
9,13,158,25
10,15,1366,25
I have tried the following code:
import pandas as pd
import csv
import matplotlib as mpl
import matplotlib.pyplot as plt
import math
#Path to CSV File
path = "..\\fake_data.csv"
#Read CSV into pandas DataFrame
df = pd.read_csv(path)
#GroupBy C2
grouped = df.groupby('C2')
#Figure out number of rows needed for 2 column grid plot
#Also accounts for odd number of plots
nrows = int(math.ceil(len(grouped)/2.))
#Setup Subplots
fig, axs = plt.subplots(nrows,2)
for ax in axs.flatten():
    for i,j in grouped:
        j.plot(x='C1',y='C3', ax=ax)
plt.savefig("plot.png")
But it generates 4 identical subplots with all of the data plotted on each (see example output below):
I would like to do something like the following to fix this:
for i,j in grouped:
    j.plot(x='C1',y='C3',ax=axs)
    next(axs)
but I get this error
AttributeError: 'numpy.ndarray' object has no attribute 'get_figure'
I will have a dynamic number of groups in the groupby object I want to plot, and many more elements than the fake data I have provided. This is why I need an elegant, dynamic solution and each group data set plotted on a separate subplot.
Sounds like you want to iterate over the groups and the axes in parallel, so rather than having nested for loops (which iterate over all groups for each axis), you want something like this:
for (name, df), ax in zip(grouped, axs.flat):
    df.plot(x='C1',y='C3', ax=ax)
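You have the right idea in your second code snippet, but you're getting an error because axs is an array of axes, while plot expects just a single axis. A NumPy array is also not an iterator, so next(axs) cannot step through it. It should work to create an iterator once before the loop with axes_iter = iter(axs.flat), replace next(axs) with ax = next(axes_iter) at the top of the loop body, and change the argument of plot to ax=ax.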
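For completeness, here is a self-contained sketch of the zip approach using the fake data from the question. The squeeze=False argument and the step that hides the leftover axis are my additions for the odd-number-of-groups case, not part of the original snippet.

```python
import io
import math
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

csv_text = """C1,C2,C3,C4
1,12,125,25
2,13,25,25
3,15,98,25
4,12,77,25
5,15,889,25
6,13,56,25
7,12,256,25
8,12,158,25
9,13,158,25
10,15,1366,25
"""
df = pd.read_csv(io.StringIO(csv_text))
grouped = df.groupby("C2")

# Two columns; round the number of rows up for an odd number of groups
nrows = math.ceil(grouped.ngroups / 2)
# squeeze=False keeps axs two-dimensional even when nrows == 1
fig, axs = plt.subplots(nrows, 2, squeeze=False)
axes = list(axs.flat)

# Pair each group with exactly one axis
for (name, group), ax in zip(grouped, axes):
    group.plot(x="C1", y="C3", ax=ax, title=f"C2 = {name}")

# Hide any axis that did not receive a group
for ax in axes[grouped.ngroups:]:
    ax.set_visible(False)

fig.tight_layout()
```

With the fake data there are three groups (12, 13, 15), so the fourth axis is hidden.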