Scatter matrix with few variables with target in Y axis - python

I am trying to plot each variable against SalePrice. I tried pd.scatter_matrix, but I get a number of unnecessary plots for all the pairwise combinations. What I am looking for is SalePrice on the Y axis and one scatter plot for each variable in the data set. Here is the code I tried:
data_prep_num['Sales_test_data']=data_sales_price_old
att=['Sales_test_data','YearBuilt','LotArea','MSSubClass','BsmtFinSF1','TotalBsmtSF','1stFlrSF','2ndFlrSF','GrLivArea','GarageArea']
pd.scatter_matrix(data_prep_num[att],alpha=.4,figsize=(30,30))

If you want to use pd.plotting.scatter_matrix but only want one of the rows (i.e. the Sales_test_data column), you can iterate over the plotting axes, and hide the combinations you don't want.
Assuming the SalePrice is the very first column (index 0):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
axes = pd.plotting.scatter_matrix(data_prep_num[att], alpha=0.4, figsize=(30,30))
# Hide every row of the matrix except row 0 (the Sales_test_data row)
for i in range(np.shape(axes)[0]):
    if i != 0:
        for j in range(np.shape(axes)[1]):
            axes[i,j].set_visible(False)
Note: This is obviously not super efficient when you start having lots of columns though.
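If the full matrix is not needed at all, one alternative is to skip scatter_matrix and draw only the panels of interest. The following is a minimal sketch, not the original answer's method: the helper name scatter_against_target, its ncols parameter, and the synthetic demo frame are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def scatter_against_target(df, target, features, ncols=3):
    """Draw one scatter plot per feature, always with `target` on the y axis."""
    nrows = -(-len(features) // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 3 * nrows),
                             squeeze=False)
    flat = axes.ravel()
    for ax, col in zip(flat, features):
        ax.scatter(df[col], df[target], alpha=0.4)
        ax.set_xlabel(col)
        ax.set_ylabel(target)
    # Hide grid cells that have no feature assigned to them.
    for ax in flat[len(features):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig, axes


# Synthetic stand-in for data_prep_num (all values are made up).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "YearBuilt": rng.integers(1900, 2010, 200),
    "LotArea": rng.uniform(2000, 20000, 200),
    "GrLivArea": rng.uniform(500, 4000, 200),
    "GarageArea": rng.uniform(0, 1000, 200),
})
demo["Sales_test_data"] = 50 * demo["GrLivArea"] + rng.normal(0, 20000, 200)

fig, axes = scatter_against_target(
    demo, "Sales_test_data",
    ["YearBuilt", "LotArea", "GrLivArea", "GarageArea"])
```

With the real data this would be called as scatter_against_target(data_prep_num, 'Sales_test_data', att[1:]), followed by plt.show(). Only as many axes as there are features are drawn, so the cost grows linearly with the number of columns rather than quadratically.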

Related

How to dynamically plot multiple subplots in Python?

I need to plot a variable number of plots (at least 1, but the maximum is not known in advance), and I couldn't come up with a way to dynamically create subplots and assign them to the given graphs.
The code looks like this:
check = False
if "node_x_9" in names:
    if "node_x_11" in names:
        plt.plot(df["node_x_9"], df["node_x_11"])
        check = True
elif "node_x_10" in names:
    if "node_x_12" in names:
        plt.plot(df["node_x_10"], df["node_x_12"])
        check = True
if check:
    plt.show()
I thought about presetting a number of subplots (e.g. plt.subplots(3, 3)), but I still could not come up with a way to assign the plots without binding each one to a fixed subplot position.
My idea would be to create a 1x1 layout if I have one plot, 2x1 if I have two, 3x1 if I have three and so on, leaving no subplot space empty.
I've come across cases like this: you want to generate one plot per case, but you don't know how many cases exist until you query the data on the day.
I assumed a square layout (alter the code below if you need a different aspect ratio). Count how many cases you have and take the integer part of the square root; adding one gives the side length of a square grid that is guaranteed to hold every plot.
Now you can create a matplotlib GridSpec object with that width and height, and index into it to place the individual plots.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import gridspec
import random
# Create some random data with size=`random` number between 5 and 100
size = random.randint(5,100)
data_rows = pd.DataFrame([np.random.normal(1,5,25) for s in range(0,size)])
# Find the length of a (near) square based on the number of the data samples
side_length = int(len(data_rows)**(1/2))+1
print(side_length)
#Create a gridspec object based on the side_length in both x and y dimensions
gs=gridspec.GridSpec(side_length, side_length)
fig = plt.figure(figsize=(10,10))
# Using the index i, populate the gridspec object with
# one plot per cell.
for i,row in data_rows.iterrows():
    ax=fig.add_subplot(gs[i])
    ax.bar(x=range(0,25),height=row)
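For the n x 1 layout the question asks for, another option is to first collect the pairs that actually exist and only then create the figure, so no cell is left empty. This is a rough sketch under assumptions: the helper plot_available_pairs and the demo DataFrame are invented for illustration; only the column names come from the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_available_pairs(df, candidate_pairs):
    """Create exactly one stacked subplot per (x, y) pair present in df."""
    pairs = [(x, y) for x, y in candidate_pairs
             if x in df.columns and y in df.columns]
    if not pairs:
        return None, []
    # squeeze=False keeps `axes` two-dimensional even for a single plot.
    fig, axes = plt.subplots(len(pairs), 1, squeeze=False,
                             figsize=(6, 3 * len(pairs)))
    axes = list(axes[:, 0])
    for ax, (x, y) in zip(axes, pairs):
        ax.plot(df[x], df[y])
        ax.set_xlabel(x)
        ax.set_ylabel(y)
    fig.tight_layout()
    return fig, axes


# Demo frame (made-up values): node_x_12 is missing on purpose.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(20, 3)),
                    columns=["node_x_9", "node_x_11", "node_x_10"])
fig, axes = plot_available_pairs(
    demo, [("node_x_9", "node_x_11"), ("node_x_10", "node_x_12")])
```

Here only the first pair is complete, so the figure ends up as a 1x1 layout; with two complete pairs it would be 2x1, and so on. Call plt.show() afterwards if fig is not None.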

How to plot multiple rows of a matrix along third dimension in a plot?

I have a matrix of dimension (3,25000), where each row is a speech signal of dimension (1,25000). I want to plot the rows of the matrix along the third dimension in a 3D plot, something similar to this:
[1] https://ieeexplore.ieee.org/mediastore_new/IEEE/content/media/6221036/8642545/8249740/deb3-2787717-large.gif
Please help.
You can use mplot3d.
Think of the first row as a categorical axis denoting the "moods" (as in your link), the second row as the y-axis and the third row as the z-axis.
Note: It isn't clear from your question how the category is associated with your numpy array. However, you can't use a character array of the first row of moods in matplotlib to plot data so you have to retrieve the indices at those categories.
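import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older matplotlib
x,_=pd.factorize(<list of categories aligned with numpy array>[0]) #retrieving indices at categories
y=<your_array>[1]
z=<your_array>[2]
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # fig.gca(projection='3d') is not accepted by recent matplotlib
ax.plot(x, y, z)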
.
.
.
Lastly, you can set your x-axis labels as:
ax.axes.set_xticks(x)
ax.axes.set_xticklabels(<insert list of your labels>)
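If instead each row of the (3, 25000) matrix should appear as its own trace, offset along one axis as in the linked figure, a sketch along these lines may be closer to what the question asks. It is an assumption-laden illustration: the random signals array stands in for the real speech data and the labels are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in for the real (3, 25000) matrix of speech signals (random data).
rng = np.random.default_rng(0)
signals = rng.normal(size=(3, 25000))
labels = ["signal 0", "signal 1", "signal 2"]  # hypothetical row labels

fig = plt.figure(figsize=(10, 6))
ax = fig.add_subplot(projection="3d")

t = np.arange(signals.shape[1])  # sample index along the x axis
for k, row in enumerate(signals):
    # Each row is drawn at a constant y position k, amplitude on the z axis.
    ax.plot(t, np.full_like(t, k), row, linewidth=0.5)

ax.set_yticks(range(len(labels)))
ax.set_yticklabels(labels)
ax.set_xlabel("sample")
ax.set_zlabel("amplitude")
```

The y axis acts as the categorical axis here, one tick per row, and plt.show() displays the result.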

Iteratively generating subplots in matplotlib by row and column - only final axes plotting

I am writing code to plot cross correlations of every time-series in my data against all others, with two for-loops to index the row and column position respectively (column loop nested within the row loop).
Currently only the final axes (i.e. bottom right corner) of the figure is displaying any data, and each iteration of the loop appears to be plotting on this axes. I am wondering if I have made any obvious mistakes with the order of commands in the nested for loops, or if I am misinterpreting the input arguments to matplotlib functions like subplots....
The code is as below:
fig, axes = plt.subplots(nrows=data_num, ncols=data_num, sharex=True, sharey=True)
for n in range(data_num): #row index
    for p in range(data_num): # column index
        x = data_df.iloc[:,n] #get data for ROI according to row index
        print(x.head())
        x = x.values
        y = data_df.iloc[:,p] #get data for ROI according to column index
        print(y.head())
        y = y.values
        axes[n,p] = plt.xcorr(x,y,normed=True) #axes [row,column] = cross correlation plot of above data
        print(f'plotting at index [ {n} , {p}]')
You can use the flatten() command, e.g.:
fig, ax = plt.subplots(3, 4, figsize = (15, 20))
for m in np.arange(0, 12):
    ax.flatten()[m].plot(stuff)
It's a, perhaps unfortunate, way that pyplot and matplotlib work: you have to create the plots on the respective axes, not assign the result of a pyplot.xcorr call to an axes. Thus: axes[n,p].xcorr(...). So the interface is suddenly somewhat more object-oriented than the usual direct pyplot calls.
All the plots end up in just the last axes, because you are calling
plt.xcorr(x,y,normed=True)
It doesn't matter if you then assign the return value to the axes array elements, which you shouldn't, as that destroys the original axes array.
plt.xcorr will then plot all the data in the same plot on top of each other, because pyplot generally acts on the currently active axes, which is the last one created via plt.subplots().
That's the explanation. Here's an example solution (with random data and a simple scatter plot):
import numpy as np
import matplotlib.pyplot as plt
data_num = 3
x = np.random.uniform(1, 10, size=(data_num, data_num, 20))
y = np.random.uniform(5, 20, size=(data_num, data_num, 20))
fig, axes = plt.subplots(nrows=data_num, ncols=data_num, sharex=True, sharey=True)
for n in range(data_num): #row index
    for p in range(data_num): # column index
        # Call `scatter` or any plot function on the
        # respective `axes` object itself
        axes[n,p].scatter(x[n,p], y[n,p])
        print(f'plotting at index [ {n} , {p}]')
plt.savefig('figure.png')
and figure.png shows a 3x3 grid of panels (sorry, no colour or symbol variation, just bare-bones scatter plots).
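Applied to the cross-correlation case from the question, the same pattern might look like the sketch below. The random data_df and its column names are stand-ins invented for the example, and maxlags=20 is an arbitrary choice; the point is only that xcorr is called on each Axes object rather than through pyplot.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Random stand-in for the question's data_df: one time series per column.
rng = np.random.default_rng(0)
data_df = pd.DataFrame(rng.normal(size=(200, 3)),
                       columns=["roi_a", "roi_b", "roi_c"])
data_num = data_df.shape[1]
maxlags = 20  # arbitrary choice; must be smaller than the series length

fig, axes = plt.subplots(nrows=data_num, ncols=data_num,
                         sharex=True, sharey=True, squeeze=False)
for n in range(data_num):          # row index
    x = data_df.iloc[:, n].to_numpy()
    for p in range(data_num):      # column index
        y = data_df.iloc[:, p].to_numpy()
        # Call xcorr on the target Axes; do not reassign axes[n, p].
        lags, c, _, _ = axes[n, p].xcorr(x, y, normed=True, maxlags=maxlags)
        axes[n, p].set_title(f"{data_df.columns[n]} vs {data_df.columns[p]}",
                             fontsize=8)
fig.tight_layout()
```

The diagonal panels are autocorrelations, so with normed=True their value at lag 0 is 1.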

Matplotlib duplicated y axis

I am trying to plot two different lines from the same vector in Python using Matplotlib. For this, I use an additional vector whose values at certain indices select which elements of the general array go into which line. The code is:
import numpy as np
import matplotlib.pyplot as plt
def visualize_method(general, changes):
    '''Correctly plots the data to visualize reversals and trials'''
    x = np.array([i for i in range(len(general))])
    plt.plot(x[changes==0], general[changes==0],
             x[changes==1], general[changes==1],
             linestyle='--', marker='o')
    plt.show()
When plotting the data, the result is:
As can be observed, the y axis is "duplicated". How can I use the same y and x axis for this filtered plot?

Summing lines in pyplot

I have a pyplot figure with a few lines on it. I would like to be able to draw an extra line, which would be a sum of all others' values. The lines are not plotted against the same x values (they are visually shorter in the plot - see the image). The resulting line would be somewhat above all others.
One idea I have for it requires obtaining a line's y value in a specific x point. Is there such a function? Or does pyplot/matplotlib support summing lines' values?
Superposition is the short answer to your question: read this for more.
Example:
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(100) #range of x axis
y1 = np.random.rand(100,1) #some random numbers
y2 = np.random.rand(100,1)
#this will only plot y1 value
plt.plot(x,y1)
plt.show()
#this will plot summation of two elements
plt.plot(x,y1+y2)
plt.show()
I took a second look at your question. What I saw is that your y values have different lengths, so adding them directly, as in the example above, would not work. What you can do is create four equal-sized lists, where values that do not exist in a given list are set to zero; then you can apply superposition to these (simply add all of them and then plot).
For the future generations: numpy.interp() was my solution to this problem.
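A minimal sketch of how numpy.interp can be used for this, assuming each line is stored as a pair of x and y arrays with increasing x (the two example lines below are made up). Every line is resampled onto a shared x grid, counted as zero outside its own range, and the results are added.

```python
import numpy as np
import matplotlib.pyplot as plt

# Two made-up lines defined on different x values.
lines = [
    (np.array([0.0, 1.0, 2.0, 3.0]), np.array([1.0, 2.0, 1.5, 1.0])),
    (np.array([1.0, 2.5, 4.0]), np.array([0.5, 1.0, 0.5])),
]

# Shared grid: every x value that occurs in any line, sorted and unique.
grid = np.unique(np.concatenate([xs for xs, _ in lines]))

total = np.zeros_like(grid)
for xs, ys in lines:
    # left/right=0.0 makes a line contribute nothing outside its x range.
    total += np.interp(grid, xs, ys, left=0.0, right=0.0)

fig, ax = plt.subplots()
for xs, ys in lines:
    ax.plot(xs, ys, marker="o")
ax.plot(grid, total, linestyle="--", label="sum")
ax.legend()
```

Treating a line as zero outside its range is a design choice; dropping the left and right arguments would instead hold each line at its end values.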
