How do I print scatter plots in a pandas column? - python

I created a pandas dataframe using the code below. (The extra plt.show() call is there to create a new plot each time; otherwise all the scatter plots end up in a single plot.)
%matplotlib inline
pd.DataFrame(
    np.array([[
        col,
        plt.scatter(data[col], data['SalePrice']) and plt.show()]
        for col in data.columns]),
    columns=['Feature', 'Scatter Plot']
)
But what I get is this
And at the end of the dataframe, I get all the scatter plots separately.
What I want is for those graphs to be rendered inline, inside the column cells, just like the other values.
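One possible approach, sketched here under assumptions: a dataframe cell can only hold a value, not a rendered figure, so each plot can be saved as a PNG, base64-encoded, and stored as an HTML <img> tag that the notebook then renders. The helper name scatter_as_img_tag and the sample columns LotArea and YearBuilt are hypothetical; only SalePrice comes from the question.

```python
import base64
import io

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def scatter_as_img_tag(x, y, size=(2, 1.5)):
    """Draw a small scatter plot and return it as an HTML <img> tag (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=size)
    ax.scatter(x, y, s=4)
    buf = io.BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight")
    plt.close(fig)  # close it so the figure is not also drawn below the table
    encoded = base64.b64encode(buf.getvalue()).decode("ascii")
    return f'<img src="data:image/png;base64,{encoded}"/>'


# Hypothetical stand-in for the question's `data` dataframe
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "LotArea": rng.normal(10_000, 2_000, 50),
    "YearBuilt": rng.integers(1950, 2010, 50),
    "SalePrice": rng.normal(180_000, 40_000, 50),
})

table = pd.DataFrame(
    [[col, scatter_as_img_tag(data[col], data["SalePrice"])] for col in data.columns],
    columns=["Feature", "Scatter Plot"],
)

# max_colwidth=None stops pandas from truncating the long <img> strings;
# escape=False keeps the tags as HTML instead of escaping them to text
with pd.option_context("display.max_colwidth", None):
    html = table.to_html(escape=False)
```

In a Jupyter notebook, the table could then be shown with `from IPython.display import HTML` followed by `HTML(html)` as the last expression of a cell.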

Related

How to put two Pandas box plots next to each other? Or group them by variable?

I have two data frames (df1 and df2). Each has the same 10 variables with different values.
I created box plots of the variables in the data frames like so:
df1.boxplot()
df2.boxplot()
Each call produces a graph with 10 box plots next to each other, one per variable. Only the second graph shows up in the output, though, since Python simply runs the calls in order.
Instead, I would like either for these two graphs to appear side by side OR, ideally, to have 10 graphs (one per variable), each comparing that variable across the two data frames (e.g. one graph for the first variable with two box plots in it, one for each data frame). Is that possible with standard Python alone, or do I have to use Matplotlib?
Thanks!
To get graphs, standard Python isn't enough. You'd need a graphical library such as matplotlib. Seaborn extends matplotlib to ease the creation of complex statistical plots. To work with Seaborn, the dataframes should be converted to long form (e.g. via pandas' melt) and then combined into one large dataframe.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# suppose df1 and df2 are dataframes, each with the same 10 columns
df1 = pd.DataFrame({i: np.random.randn(100).cumsum() for i in 'abcdefghij'})
df2 = pd.DataFrame({i: np.random.randn(150).cumsum() for i in 'abcdefghij'})
# pd.melt converts the dataframe to long form, pd.concat combines them
df = pd.concat({'df1': df1.melt(), 'df2': df2.melt()}, names=['source', 'old_index'])
# convert the source index to a column, and reset the old index
df = df.reset_index(level=0).reset_index(drop=True)
sns.boxplot(data=df, x='variable', y='value', hue='source', palette='turbo')
This creates boxes for each of the original columns, comparing the two dataframes:
Optionally, you could create multiple subplots with that same information:
sns.catplot(data=df, kind='box', col='variable', y='value', x='source',
            palette='turbo', height=3, aspect=0.5, col_wrap=5)
By default, the y-axes are shared. You can disable the sharing via sharey=False. Here is an example, which also removes the repeated x axes and creates a common legend:
g = sns.catplot(data=df, kind='box', col='variable', y='value', x='source', hue='source', dodge=False,
                palette='Reds', height=3, aspect=0.5, col_wrap=5, sharey=False)
g.set(xlabel='', xticks=[]) # remove x labels and ticks
g.add_legend()
PS: If you simply want to put two pandas boxplots next to each other, you can create a figure with two subplots, and pass the axes to pandas. (Note that pandas plotting is just an interface towards matplotlib.)
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(15, 5))
df1.boxplot(ax=ax1)
ax1.set_title('df1')
df2.boxplot(ax=ax2)
ax2.set_title('df2')
plt.tight_layout()
plt.show()
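For completeness, the "one graph per variable" layout can also be sketched with matplotlib alone, without reshaping the data. This is a minimal sketch, assuming both dataframes share the same column names; the sample data mirrors the random-walk data used earlier in this answer.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Sample data: two dataframes with the same 10 columns
rng = np.random.default_rng(0)
cols = list("abcdefghij")
df1 = pd.DataFrame({c: rng.standard_normal(100).cumsum() for c in cols})
df2 = pd.DataFrame({c: rng.standard_normal(150).cumsum() for c in cols})

# One subplot per variable, each holding one box per dataframe
fig, axes = plt.subplots(nrows=2, ncols=5, figsize=(15, 6))
for ax, col in zip(axes.flat, cols):
    ax.boxplot([df1[col].to_numpy(), df2[col].to_numpy()])
    ax.set_xticks([1, 2])  # boxplot places the boxes at positions 1 and 2
    ax.set_xticklabels(["df1", "df2"])
    ax.set_title(col)
fig.tight_layout()
```

When running this as a script rather than in a notebook, add plt.show() at the end.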

Creating multiple plots with for loop?

I have a dictionary of dataframes where the key is the name of each dataframe and the value is the dataframe itself.
I am looking to iterate through the dictionary and quickly plot the top 10 rows in each dataframe. Each dataframe would have its own plot. I've attempted this with the following:
for df in dfs:
    data = dfs[df].head(n=10)
    sns.barplot(data=data, x='x_col', y='y_col', color='indigo').set_title(df)
This works, but only returns a plot for the last dataframe in the iteration. Is there a way I can modify this so that I am also able to return the subsequent plots?
By default, seaborn.barplot() draws on the current Axes. If you don't specify an Axes to plot on, each call draws over the previous one. To avoid this, you can either create a new figure in each iteration or plot on a different Axes by passing the ax argument.
import matplotlib.pyplot as plt
import seaborn as sns
for df in dfs:
    data = dfs[df].head(n=10)
    plt.figure()  # Create a new figure; it also becomes the current one.
    sns.barplot(data=data, x='x_col', y='y_col', color='indigo').set_title(df)
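The second option, passing the ax argument, could look like the sketch below. It places all bar plots side by side in one figure. The contents of dfs here are hypothetical sample data; only the column names x_col and y_col come from the question.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the question's dictionary of dataframes
dfs = {
    "first": pd.DataFrame({"x_col": list("abc"), "y_col": [3, 1, 2]}),
    "second": pd.DataFrame({"x_col": list("abc"), "y_col": [5, 4, 6]}),
}

# squeeze=False always returns a 2D array of Axes, even for a single dataframe
fig, axes = plt.subplots(ncols=len(dfs), figsize=(5 * len(dfs), 4), squeeze=False)
for ax, (name, frame) in zip(axes.flat, dfs.items()):
    # ax=ax tells seaborn which Axes to draw on instead of the current one
    sns.barplot(data=frame.head(10), x="x_col", y="y_col", color="indigo", ax=ax)
    ax.set_title(name)
fig.tight_layout()
```

Separate figures (the plt.figure() variant) are probably easier when there are many dataframes; a shared figure makes a handful of them easier to compare.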

Organizing Plots in Seaborn Pairplot

I've got a pandas dataframe with a bunch of values in it, and I want to plot every column against every other column. Furthermore, the values on my y axes are so condensed that they are unreadable. I've tried changing the height but have no clue how to "clean up" these axes.
Here is my plotting code:
import seaborn as sns
grid = sns.pairplot(df_merge, dropna = True, height=1.5)
Then here is the graph that has been plotted.

How to Plot Multiple Plots Using for Loop for Each Column in My Data-Frame

I want to know how I can iterate through each column and plot a separate box plot for the values of each column using a for loop in Python.
Provided that you have a list of your columns (say, a list of lists), you can use this:
import matplotlib.pyplot as plt
data = ...
#data = [column1, column2, column3]
for elem in data:
    plt.plot(elem)
    plt.show()
This code creates one plot per column and opens the next one when you close the current window.
I guess you don't want to plot everything on the same graph, but you could do so by unindenting the last line of my example, plt.show(), by one level.
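The example above draws line plots. For the box plots the question asks about, a variant working directly on dataframe columns might look like the sketch below. The dataframe df and its column names are hypothetical sample data.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical dataframe; any numeric columns would do
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 3)), columns=["col1", "col2", "col3"])

figures = []
for col in df.columns:
    fig, ax = plt.subplots(figsize=(3, 4))  # a new figure for every column
    ax.boxplot(df[col].dropna().to_numpy())  # drop NaNs, which boxplot cannot handle
    ax.set_title(col)
    figures.append(fig)
```

In a script, a single plt.show() after the loop displays all figures at once.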

How to plot certain row and column using panda dataframe?

I have a very simple data frame but I could not plot a line using a row and a column. Here is an image, I would like to plot a "line" that connects them.
I tried to plot it but x-axis disappeared. And I would like to swap those axes. I could not find an easy way to plot this simple thing.
Try:
import matplotlib.pyplot as plt
# Categories will be on the x axis, Seconds on the y axis
plt.plot(data["Categories"], data["Seconds"])
plt.show()
Matplotlib generates the axis dynamically, so if you want the labels of the x-axis to appear you'll have to increase the size of your plot.
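A slightly fuller sketch of that advice, with a larger figure and rotated tick labels so the x-axis labels stay visible. The values below are made up, since the question's dataframe was only shown as an image; only the column names Categories and Seconds come from the answer.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data standing in for the question's dataframe
data = pd.DataFrame({
    "Categories": ["alpha", "beta", "gamma", "delta"],
    "Seconds": [12.5, 7.0, 9.3, 15.1],
})

fig, ax = plt.subplots(figsize=(8, 4))  # a larger figure leaves room for labels
ax.plot(data["Categories"], data["Seconds"], marker="o")
ax.set_xlabel("Categories")
ax.set_ylabel("Seconds")
ax.tick_params(axis="x", labelrotation=45)  # rotate labels so they don't overlap
fig.tight_layout()
```

To swap the axes, pass the columns in the opposite order, ax.plot(data["Seconds"], data["Categories"]), and swap the two axis labels as well.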
