This question already has an answer here:
seaborn two corner pairplot
(1 answer)
Closed 1 year ago.
I wanted to make a pairplot with two different dataframes, data_up in the upper triangle and data_low in the lower triangle of the PairGrid. Both dataframes have 4 columns, which correspond to the variables.
Looking at PairGrid, I did not find a way to give different data to each triangle.
For example:
import numpy as np
import seaborn as sns
import pandas as pd
# Dummy data :
data_up = np.random.uniform(size=(100,4))
data_low = np.random.uniform(size=(100,4))
# The two pairplots I currently use and want to combine:
sns.pairplot(pd.DataFrame(data_up))
sns.pairplot(pd.DataFrame(data_low))
How can I plot only the upper triangle of the first one together with the lower triangle of the second one? I don't really care what is plotted on the diagonal. Maybe a Q-Q plot between the two corresponding marginals would be nice, but I'll see later.
You could try putting all columns together in one dataframe and then use x_vars=... to tell pairplot which columns to use for the x-direction, and y_vars=... similarly for the y-direction.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Dummy data: columns 0..3 play the role of data_up, columns 4..7 of data_low
data_up_down = np.random.uniform(size=(100, 8))
df = pd.DataFrame(data_up_down)
# use columns 0..3 for the x-axis and 4..7 for the y-axis
sns.pairplot(df, x_vars=[0, 1, 2, 3], y_vars=[4, 5, 6, 7])
plt.show()
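Note that the x_vars/y_vars grid above plots every column of one block against every column of the other, i.e. a 4 x 4 rectangle of cross-plots rather than two triangles of the same variables. If you literally want data_up above the diagonal and data_low below it, one option is to skip seaborn's grid and lay out the cells by hand with plain matplotlib. This is a minimal sketch under that assumption: the helper two_triangle_pairplot is a hypothetical name, and the overlaid histograms on the diagonal are just one choice, since the question leaves the diagonal open.

```python
import numpy as np
import matplotlib.pyplot as plt

# Dummy data, same shapes as in the question: 100 samples x 4 variables each
rng = np.random.default_rng(0)
data_up = rng.uniform(size=(100, 4))
data_low = rng.uniform(size=(100, 4))


def two_triangle_pairplot(upper, lower, names=None, bins=15):
    """Hypothetical helper: scatter `upper` above the diagonal, `lower` below it.

    Both arrays must have the same number of columns (variables).
    Cell (i, j) shows variable j on the x-axis and variable i on the y-axis.
    """
    n = upper.shape[1]
    if names is None:
        names = [str(k) for k in range(n)]
    fig, axes = plt.subplots(n, n, figsize=(2.5 * n, 2.5 * n))
    for i in range(n):          # grid row -> y variable
        for j in range(n):      # grid column -> x variable
            ax = axes[i, j]
            if i < j:           # upper triangle: first dataset
                ax.scatter(upper[:, j], upper[:, i], s=5, color="C0")
            elif i > j:         # lower triangle: second dataset
                ax.scatter(lower[:, j], lower[:, i], s=5, color="C1")
            else:               # diagonal: overlaid histograms of both marginals
                ax.hist(upper[:, i], bins=bins, alpha=0.5, color="C0")
                ax.hist(lower[:, i], bins=bins, alpha=0.5, color="C1")
            if i == n - 1:
                ax.set_xlabel(names[j])
            if j == 0:
                ax.set_ylabel(names[i])
    fig.tight_layout()
    return fig, axes


fig, axes = two_triangle_pairplot(data_up, data_low)
plt.show()
```

The diagonal cell is a natural place to swap in the Q-Q plot mentioned in the question later, since it already receives both marginals.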
Related
I have numerous sets of seasonal data that I want to show as heatmaps. I am not worried about the magnitude of the values, but rather the overall direction and any patterns that I can look at in more detail later. To do this I want to create a heatmap that shows only two colours (red for below zero and green for zero and above).
I can create a normal heatmap with seaborn, but the standard colour maps have more than two colours and I am not able to create one myself. Even if I could, I don't know how to set the parameters so that below zero = red and zero and above = green.
I managed to get this look simply by styling the dataframe, but I was unable to export it as a .png because the table_conversion='matplotlib' option removes the formatting.
Below is an example of what I would like to create, made from random data. Could someone help or point me to a helpful Stack Overflow answer?
I have also included the code I used to style and export the dataframe.
Desired output (image not included): created with random data in an Excel spreadsheet.
# Code to create a regular heatmap - can this be easily amended?
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df_hm = pd.read_csv(filename+h)
pivot = df_hm.pivot_table(index='Year', columns='Month', values='delta', aggfunc='sum')
fig, ax = plt.subplots(figsize=(10,5))
ax.set_title('M1 '+h[:-7])
sns.heatmap(pivot, annot=True, fmt='.2f', cmap='RdYlGn')
plt.savefig(chartpath+h[:-7]+" M1.png", bbox_inches='tight')
plt.close()
# Code used to export the dataframe; the formatting is lost in the .png
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import dataframe_image as dfi
# pivot is the DataFrame name
pivot = pd.DataFrame(np.random.randint(-100, 100, size=(5, 12)), columns=list('ABCDEFGHIJKL'))
styles = [dict(selector="caption", props=[("font-size", "120%"),("font-weight", "bold")])]
pivot = (pivot.style.format(precision=2)
         .highlight_between(left=-100000, right=-0.01, props='color:white;background-color:red')
         .highlight_between(left=0, right=100000, props='color:white;background-color:green')
         .set_caption(title)
         .set_table_styles(styles))
dfi.export(pivot, root+'testhm.png', table_conversion='matplotlib',chrome_path=None)
You can set the cmap argument to a list of colours. With center=0, zero becomes the dividing point between the two colours, and if you annotate, the cells show the original values, since the data itself is not converted to -1 or 1.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
arr = np.random.randn(10, 10)
# center=0 makes zero the dividing point between the two colours
sns.heatmap(arr, cmap=["grey", "green"], annot=True, center=0)
plt.show()
PS: if you don't want a colour bar, pass cbar=False to sns.heatmap.
Welcome to SO!
To achieve what you need, you just need to pass delta through the sign function. Here's some example code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
arr = np.random.randn(25,25)
sns.heatmap(np.sign(arr))
This results in an effectively binary heatmap (np.sign returns -1 or 1 for nonzero values; an exact zero would map to a third value, 0), albeit one with a rather ugly colormap. You can still fiddle with seaborn's colormaps to make it look like the Excel version.
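The rule in the question (red below zero, green at zero and above) can also be encoded explicitly, which keeps zero in the green class instead of giving it its own value as np.sign does. A minimal sketch: plot a 0/1 mask (value >= 0) with a two-colour list as cmap, pin vmin/vmax so data of a single sign still gets the right colour, and pass the original numbers to annot so the labels show the deltas. The random pivot, the month labels and the output file name are placeholders, not the asker's data.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Random stand-in for the Year x Month pivot table from the question
rng = np.random.default_rng(0)
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
pivot = pd.DataFrame(rng.normal(0, 50, size=(5, 12)),
                     index=range(2018, 2023), columns=months)

# 0 where the value is below zero, 1 where it is zero or above
above_zero = (pivot >= 0).astype(int)

fig, ax = plt.subplots(figsize=(10, 5))
sns.heatmap(
    above_zero,
    cmap=["red", "green"],  # a list of colours becomes a 2-entry colormap
    vmin=0, vmax=1,         # pin the range: 0 -> red, 1 -> green, whatever the data
    annot=pivot,            # annotate with the original values, not the 0/1 mask
    fmt=".2f",
    cbar=False,
    ax=ax,
)
ax.set_title("Sign of delta")
fig.savefig("sign_heatmap.png", bbox_inches="tight")  # placeholder file name
plt.show()
```

Because the result is an ordinary matplotlib figure, savefig keeps the colours, which sidesteps the dataframe_image export problem described in the question.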
I have a question about DataFrame.plot(). I want a line graph with the years on the x-axis and shipments on the y-axis, with one line per value of the Quarter column, all in the same graph. Please help, I'm new.
import pandas as pd
import matplotlib.pyplot as plt
df_ship = pd.read_csv('dmba/ApplianceShipments.csv') #importing the csv for the exercise
# Splitting the quarters and the years into two different columns
df_ship[['Quarter','Year']] = df_ship.Quarter.str.split('-', expand=True)
# Grouping the dataframe by Quarter
df_filt = df_ship.groupby(by=['Quarter'])
#Creating the plot figure
fig1 = df_filt.plot(x='Year',y='Shipments')
As output I get 4 separate line graphs, one per quarter.
It would be helpful to have an extract of your dataset but does the code below do what you want?
import seaborn as sns
sns.lineplot(x='Year',y='Shipments', hue='Quarter', data=df_ship)
From Seaborn documentation:
hue : vector or key in data
Grouping variable that will produce lines with different colors. Can be either categorical or numeric, although color mapping will behave differently in latter case.
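If you would rather stay with DataFrame.plot, the usual pandas route is to reshape to wide format first. GroupBy.plot calls plot once per group, which is why the original code produced four separate graphs; after a pivot, each quarter becomes its own column and DataFrame.plot draws one line per column on a single Axes. The rows below are made-up values in the "Qn-YYYY" format implied by the question's split('-'); the real CSV layout is an assumption.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for ApplianceShipments.csv: "Qn-YYYY" labels and a Shipments column
df_ship = pd.DataFrame({
    "Quarter": ["Q1-1985", "Q2-1985", "Q3-1985", "Q4-1985",
                "Q1-1986", "Q2-1986", "Q3-1986", "Q4-1986"],
    "Shipments": [10, 12, 11, 9, 13, 15, 14, 12],
})

# Split "Q1-1985" into Quarter="Q1" and Year="1985", as in the question
df_ship[["Quarter", "Year"]] = df_ship["Quarter"].str.split("-", expand=True)
df_ship["Year"] = df_ship["Year"].astype(int)

# Long -> wide: one row per Year, one column per Quarter
wide = df_ship.pivot(index="Year", columns="Quarter", values="Shipments")

# DataFrame.plot draws one line per column, all on the same Axes
ax = wide.plot(marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("Shipments")
plt.show()
```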
This question already has answers here:
Modify the legend of pandas bar plot
(3 answers)
Closed 1 year ago.
I'm using this dataset: https://www.kaggle.com/fivethirtyeight/fivethirtyeight-fandango-dataset. It has several columns that I want to plot: ['RT_norm', 'RT_user_norm', 'Metacritic_norm', 'Metacritic_user_nom', 'IMDB_norm', 'Fandango_Stars']
When I do any kind of plot with Pandas, the labels are the column labels (Duh!)
df.head().plot.bar(x='FILM', y=marc, figsize=(10,8), grid=True)
plt.title('Calificaciones de Películas por Sitio')
plt.ylabel('Calificación')
plt.xlabel('Película')
Is there any way I can change the labels to something else? For example, instead of RT_norm I'd want Rotten Tomatoes Normalized. Or is the only correct answer to change the column names in the dataframe? I tried using the yticks and ylabel parameters, but they don't do what I want.
I think you want to change the legend labels using plt.legend(labels=..) :
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({'FILM':range(100),
'y1':np.random.uniform(0,1,100),
'y2':np.random.uniform(0,1,100)})
df.head().plot.bar(x='FILM', y=['y1','y2'], figsize=(10,8), grid=True)
plt.legend(labels=['bar1','bar2'])
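The other option mentioned in the question also works without touching the original data: rename the columns on a copy just for plotting, so the legend picks up readable names automatically. A minimal sketch; the mapping covers two of the dataset's columns, and the display names and random values are placeholders.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Small random stand-in using two of the column names from the Fandango dataset
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "FILM": [f"Film {k}" for k in range(5)],
    "RT_norm": rng.uniform(0, 5, 5),
    "IMDB_norm": rng.uniform(0, 5, 5),
})

# Column name -> label to show in the legend (display text chosen for illustration)
pretty = {"RT_norm": "Rotten Tomatoes Normalized", "IMDB_norm": "IMDb Normalized"}

# rename() returns a renamed copy, so df keeps its original column names
ax = df.rename(columns=pretty).plot.bar(
    x="FILM", y=list(pretty.values()), figsize=(10, 8), grid=True
)
plt.show()
```

Compared with plt.legend(labels=...), the mapping ties each label to its column by name, so reordering y cannot mislabel the bars.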
Here is my problem
This is a sample of my two DataFrames (I have 30 columns in reality)
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.DataFrame({"Marc":[6,0,8,-30,-15,0,-3],
"Elisa":[0,1,0,-1,0,-2,-4],
"John":[10,12,24,-20,7,-10,-30]})
df1 = pd.DataFrame({"Marc":[8,2,15,-12,-8,0,-35],
"Elisa":[4,5,7,0,0,1,-2],
"John":[20,32,44,-30,15,-10,-50]})
I would like to create a scatter plot with two different colors: one color where the df1 score is negative and another where it is positive, but I don't really know how to do it.
I already made a plain scatter plot with matplotlib:
plt.scatter(df,df1);
I also checked this link, but the problem is that I have two pandas DataFrames and not a NumPy array as in that link, so I can't use the c=np.sign(df.y) method.
I would like to keep pandas DataFrames, as I have many columns, but I'm really stuck on this.
If anyone has a solution, you are welcome!
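You can pass the colour array via c=, but it seems to work with a 1-D array only, so flatten both frames first:
import numpy as np
# 'C0' where df1 is negative, 'C1' where it is zero or positive
colors = np.where(df1 < 0, 'C0', 'C1')
# stack and ravel to turn everything into 1-D; both follow the same row-by-row order
plt.scatter(df.stack(), df1.stack(), c=colors.ravel())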
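If you also want a legend for the two groups, an alternative to the single colour array is to split the points with a boolean mask and call scatter twice, once per group. This sketch reuses the sample frames from the question and flattens them with to_numpy().ravel(), which assumes both frames have the same columns in the same order (true here); zero is counted as non-negative.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Sample frames from the question
df = pd.DataFrame({"Marc": [6, 0, 8, -30, -15, 0, -3],
                   "Elisa": [0, 1, 0, -1, 0, -2, -4],
                   "John": [10, 12, 24, -20, 7, -10, -30]})
df1 = pd.DataFrame({"Marc": [8, 2, 15, -12, -8, 0, -35],
                    "Elisa": [4, 5, 7, 0, 0, 1, -2],
                    "John": [20, 32, 44, -30, 15, -10, -50]})

# Flatten both frames in the same (row-major) order
x = df.to_numpy().ravel()
y = df1.to_numpy().ravel()
neg = y < 0  # True where the df1 score is negative

fig, ax = plt.subplots()
ax.scatter(x[neg], y[neg], color="C0", label="df1 < 0")
ax.scatter(x[~neg], y[~neg], color="C1", label="df1 >= 0")
ax.set_xlabel("df score")
ax.set_ylabel("df1 score")
ax.legend()
plt.show()
```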
This question already has answers here:
Inconsistency when setting figure size using pandas plot method
(2 answers)
Closed 4 years ago.
In the two snippets below, the only apparent difference is the data source type (pd.Series vs pd.DataFrame). Why does plt.figure(num=None, figsize=(12, 3), dpi=80) have an effect in one case but not in the other when using pd.DataFrame.plot?
Snippet 1 - Adjusting plot size when data is a pandas Series
# Imports
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# data
np.random.seed(123)
df = pd.Series(np.random.randn(10000),index=pd.date_range('1/1/2000', periods=10000)).cumsum()
print(type(df))
# plot
plt.figure(num=None, figsize=(12, 3), dpi=80)
ax = df.plot()
plt.show()
Snippet 2 - Now the data source is a pandas DataFrame
# imports
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# data
np.random.seed(123)
dfx = pd.Series(np.random.randn(100),index=pd.date_range('1/1/2000', periods=100)).cumsum()
dfy = pd.Series(np.random.randn(100),index=pd.date_range('1/1/2000', periods=100)).cumsum()
df = pd.concat([dfx, dfy], axis = 1)
print(type(df))
# plot
plt.figure(num=None, figsize=(12, 3), dpi=80)
ax = df.plot()
plt.show()
The only difference here seems to be the type of the data source. Why would that matter for the matplotlib output?
It seems that pd.DataFrame.plot() works a bit differently from pd.Series.plot(). Since a dataframe might have any number of columns, which might require subplots, different axes, etc., pandas defaults to creating a new figure. The way around this is to pass the arguments directly to the plot call, i.e., df.plot(figsize=(12, 3)) (dpi isn't accepted as a keyword argument, unfortunately). You can read more about it in this great answer:
In the first case, you create a matplotlib figure via fig = plt.figure(figsize=(10,4)) and then plot a single column DataFrame. Now the internal logic of the pandas plot function is to check if there is already a figure present in the matplotlib state machine, and if so, use its current axes to plot the column's values to it. This works as expected.
However in the second case, the data consists of two columns. There are several options how to handle such a plot, including using different subplots with shared or non-shared axes etc. In order for pandas to be able to apply any of those possible requirements, it will by default create a new figure to which it can add the axes to plot to. The new figure will not know about the already existing figure and its size, but rather have the default size, unless you specify the figsize argument.
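A common alternative to passing figsize is to create the figure yourself and hand its Axes to pandas with ax=..., so pandas draws into that Axes instead of opening a new figure. This also lets you set dpi, which df.plot itself does not take. The sketch below rebuilds the two-column frame from Snippet 2.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Same construction as Snippet 2: two random walks sharing a date index
np.random.seed(123)
idx = pd.date_range("1/1/2000", periods=100)
dfx = pd.Series(np.random.randn(100), index=idx).cumsum()
dfy = pd.Series(np.random.randn(100), index=idx).cumsum()
df = pd.concat([dfx, dfy], axis=1)

# Create the figure (size and dpi) first ...
fig, ax = plt.subplots(figsize=(12, 3), dpi=80)
# ... then tell pandas to draw into that Axes instead of making a new figure
df.plot(ax=ax)
plt.show()
```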