Plot one seaborn plot from two dataframes - python

I am trying to plot two dataframes with seaborn in one figure.
Given this test data:
import numpy as np
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df['Name'] = 'Adam'
df.iloc[::5, 4] = 'Berta'
df.head(10)
A B C D Name
0 40 75 45 6 Berta
1 52 98 55 44 Adam
2 57 61 70 17 Adam
3 52 5 20 28 Adam
4 63 53 74 49 Adam
5 53 28 97 26 Berta
6 64 38 73 56 Adam
7 25 65 34 64 Adam
8 95 91 92 60 Adam
9 6 54 5 58 Adam
and
df1 = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df1['Location'] = 'New York'
df1.iloc[::5, 4] = 'Tokyo'
df1.head(10)
A B C D Location
0 89 16 23 15 Tokyo
1 7 35 26 21 New York
2 64 94 51 61 New York
3 84 16 15 36 New York
4 55 62 0 2 New York
5 73 93 4 1 Tokyo
6 93 11 27 69 New York
7 14 52 50 45 New York
8 26 77 86 32 New York
9 21 10 68 11 New York
A) For the first plot, I would like a relplot or scatterplot in which both dataframes share the same x and y axes but use a different "hue". If I try:
sb.relplot(data=df, x='Name', y='C', hue="Name", height=8.27, aspect=11.7/8.27)
sb.relplot(data=df1, x='Location', y='C', hue="Location", height=8.27, aspect=11.7/8.27)
plt.show()
The second plot either overwrites the first or is drawn in a new figure. Any ideas?
B) Now we have the same y-axes (let's say "amount"), but with different x-axes (strings).
I found this here: How to overlay two seaborn relplots? and it looks pretty good, but if I try:
fig, ax = plt.subplots()
sb.scatterplot(x="Name", y='A', data=df, hue="Name", ax=ax)
ax2 = ax.twinx()
sb.scatterplot(data=df1, x='Location', y='A', hue="Location", ax =ax2)
plt.show()
then the second scatterplot is drawn on top of the first one and overwrites the x tick labels (the names). I would like to add the second scatterplot to the right of the first instead. Is this possible?
In my opinion it doesn't make sense to concatenate the two dataframes.
Thanks very much!

Having gathered all the questions you asked, I assume you want either to plot two subplots in one row, one per DataFrame, or to plot both sets of data on a single plot.
As for the 'A' plot:
fig, ax = plt.subplots(1, 2, figsize=(8, 4), sharey=True)
sb.scatterplot(data=df, x='Name', y='A', hue='Name',
               ax=ax[0])
sb.scatterplot(data=df1, x='Location', y='A', hue='Location',
               ax=ax[1])
plt.show()
Here I created both fig and ax with plt.subplots() so that each scatter plot goes on its own subplot, specifying the number of rows (1), the number of columns (2), and a shared Y-axis. I did not adjust the legend location or other decorations.
As for the 'B' plot, if you want everything on one plot, you can try:
fig, ax = plt.subplots(1, 1, figsize=(8, 4))
sb.scatterplot(data=df, x='Name', y='A', hue='Name', palette=['blue', 'orange'],
               ax=ax)
sb.scatterplot(data=df1, x='Location', y='A', hue='Location', palette=['red', 'green'],
               ax=ax)
ax.set_xlabel('Name/Location')
plt.show()
Here I made a single subplot and assigned both scatter plots to it. You may need to adjust the color mapping and rename the X-axis, as done above with palette and set_xlabel.
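The single-plot version works because matplotlib treats string x-values as categories and places categories it has not seen yet to the right of the existing ones. The sketch below illustrates that idea with plain matplotlib rather than seaborn (a swapped-in technique, not the code from the answer). The random test data and the color choices are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Assumed test data, shaped like the two frames in the question
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(100, 4)), columns=list("ABCD"))
df["Name"] = "Adam"
df.iloc[::5, 4] = "Berta"
df1 = pd.DataFrame(rng.integers(0, 100, size=(100, 4)), columns=list("ABCD"))
df1["Location"] = "New York"
df1.iloc[::5, 4] = "Tokyo"

# Hypothetical color mapping, one color per category
colors = {"Adam": "tab:blue", "Berta": "tab:orange",
          "New York": "tab:red", "Tokyo": "tab:green"}

fig, ax = plt.subplots(figsize=(8, 4))

# One scatter call per group; each new string becomes a new x category
for frame, key in ((df, "Name"), (df1, "Location")):
    for label, group in frame.groupby(key):
        ax.scatter(group[key].to_numpy(), group["A"].to_numpy(),
                   color=colors[label], label=label)

ax.set_xlabel("Name/Location")
ax.set_ylabel("A")
ax.legend()
```

In a script, finish with plt.show() or fig.savefig(...) to see the result; the four categories appear side by side on one x-axis.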

Related

Graph - How to change the color of the Y line from a certain day?

I have one question.
How can I change the color of the line (the Y values) from a certain day onward?
For example, I want the price to be drawn in red from the 100th day:
I have this code:
df = pd.DataFrame.from_dict(cryptocurrency.prices)
df["time"] = pd.to_datetime(df["time"], unit="s")
df = df.set_index("time")  # set_index returns a new frame, so keep the result
df["close"].plot(figsize=(12, 8), title=cryptocurrency.name, label="Price")
plt.legend()
plt.grid()
plt.show()
Thank you for your help!
Suppose I have a data frame called df, as follows:
median_house_value
1 176500.0
2 270500.0
3 330000.0
4 81700.0
5 67000.0
.. ...
95 96000.0
96 90600.0
97 121900.0
98 209400.0
99 200000.0
With the following code, we get a plot of the whole column:
df['median_house_value'].plot(figsize=(12, 8))
plt.legend()
plt.grid()
plt.show()
Output:
Then I select the part of the dataframe that should be recolored:
df1 = df[50:100]
df1:
median_house_value
51 47500.0
52 156500.0
53 187500.0
54 191800.0
.. ...
98 209400.0
99 200000.0
Plotting that slice in red on top of the full line gives the new plot:
df["median_house_value"].plot(figsize=(12, 8))
plt.plot(df1["median_house_value"], 'r')
plt.legend()
plt.grid()
plt.show()
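Applied to a time-indexed price series like the one in the question, the same slicing idea can be sketched as follows. The synthetic `close` series, its dates, and the split position are assumptions for illustration; the real data would come from the asker's own DataFrame.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Assumed stand-in for the price data: 200 daily closing prices
rng = np.random.default_rng(0)
dates = pd.date_range("2021-01-01", periods=200, freq="D")
close = pd.Series(rng.normal(0, 1, 200).cumsum() + 100, index=dates, name="close")

split = 99  # zero-based position of the 100th day

fig, ax = plt.subplots(figsize=(12, 8))
# The first segment includes the split point so the two lines join without a gap
ax.plot(close.index[:split + 1], close.iloc[:split + 1],
        color="tab:blue", label="Price")
# The second segment starts at the split point and is drawn in red
ax.plot(close.index[split:], close.iloc[split:],
        color="red", label="Price from day 100")
ax.legend()
ax.grid(True)
```

Both segments share the point at the split position, which avoids a visible break where the color changes.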

how to move the legends in seaborn plots to the axis ticks

I have a pandas df as below and I have the following codes to plot in seaborn:
Group IP IB FP CP CS PB
0 1 100 20 50 120 40 90
1 1 100 20 50 120 40 80
2 1 100 20 50 120 40 78
3 1 100 20 50 120 40 70
4 1 100 20 50 120 40 62
... ... ... ... ... ... ... ...
95 18 150 40 50 150 60 50
96 19 200 20 70 150 40 72
97 19 200 20 70 150 40 64
98 19 200 20 70 150 40 74
99 19 200 20 70 150 40 76
df_m = pd.melt(df, id_vars='Group', value_vars=['PB'])
fig, ax = plt.subplots(figsize=(15,10))
sns.stripplot(x='variable', y='value', data=df_m, hue='Group', dodge=True, ax=ax, linewidth=1)
sns.boxplot(x='variable', y='value', data=df_m, hue='Group')
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles[:19], labels[:19], title='Group')
I can get the graph like this:
What I want is to have the legend entries (groups 1-19) as the x-axis ticks. I tried this, but the graph becomes squeezed:
ax.set_xticks(range(19))
ax.set_xticklabels(['G1', 'G2', 'G3', 'G4', 'G5', 'G6', 'G7', 'G8', 'G9', 'G10',
'G11', 'G12', 'G13', 'G14', 'G15', 'G16', 'G17', 'G18', 'G19']);
Because you are using df_m.variable as the x-axis and its value is PB for every row, the x-axis has just one entry. Adding 19 ticks and labels therefore compresses all your boxplots around the first entry. The code below uses x='Group' instead of relying on hue, which gives 19 positions/ticks for the 19 boxes. With 19 ticks, you can then easily change the tick labels. Although you did not ask for it, I have also updated the legend from 1, 2, 3... to G1, G2, G3...; if that is not required, you can remove that piece of code. Note that my data is based on what you provided plus some random numbers, so your plot might look slightly different. Hope this is what you are looking for.
df_m = pd.melt(df, id_vars='Group', value_vars=['PB'])
fig, ax = plt.subplots(figsize=(15,10))
sns.boxplot(x='Group', y='value', data=df_m, hue='Group', dodge=False, ax=ax)
sns.stripplot(x='Group', y='value', data=df_m, linewidth=1, ax=ax)
myLabels = ['G1', 'G2', 'G3', 'G4', 'G5', 'G6', 'G7', 'G8', 'G9', 'G10',
'G11', 'G12', 'G13', 'G14', 'G15', 'G16', 'G17', 'G18', 'G19']
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles[:19], myLabels, title='Group')
ax.set_xticks(range(19))
ax.set_xticklabels(myLabels)
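The hand-typed label list can also be derived from the data, so it stays correct if the number of groups changes. The sketch below shows this with plain matplotlib boxes placed at zero-based positions, which is where seaborn puts categorical boxes; it is a swapped-in illustration, not the seaborn code above, and the random data is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Assumed data: 19 groups with 5 PB values each
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "Group": np.repeat(np.arange(1, 20), 5),
    "PB": rng.integers(40, 100, size=19 * 5),
})

# Build the tick labels from the group ids instead of typing them out
groups = sorted(df["Group"].unique())
tick_labels = [f"G{g}" for g in groups]

# One array of PB values per group, drawn at positions 0..18
data = [df.loc[df["Group"] == g, "PB"].to_numpy() for g in groups]
positions = list(range(len(groups)))

fig, ax = plt.subplots(figsize=(15, 10))
ax.boxplot(data, positions=positions)
ax.set_xticks(positions)
ax.set_xticklabels(tick_labels)
ax.set_xlabel("Group")
ax.set_ylabel("PB")
```

The same `tick_labels` list could be passed to ax.set_xticklabels and ax.legend in the seaborn version.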

Is there a way to use a for-loop to quickly create subplots in matplotlib and pandas?

I'm saving the daily stock price for several stocks in a Pandas DataFrame. I'm using Python and Jupyter Notebook.
Once saved, I use matplotlib to graph the prices to check the data.
The idea is to graph 9 stocks at a time in a 3 x 3 subplot grid.
When I want to check other stock tickers, I have to manually change each ticker in each subplot, which takes a long time and seems inefficient.
Is there a way to do this with some sort of list and a for loop?
Here is my current code. It works, but it seems too long and hard to update. (The stock tickers are only examples from a Vanguard model portfolio.)
x = price_df.index
a = price_df["P_VOO"]
b = price_df["P_VGK"]
c = price_df["P_VPL"]
d = price_df["P_IEMG"]
e = price_df["P_MCHI"]
f = price_df["P_VNQ"]
g = price_df["P_GDX"]
h = price_df["P_BND"]
i = price_df["P_BNDX"]
# Plot a figure with various axes scales
fig = plt.figure(figsize=(15,10))
# Subplot 1
plt.subplot(331)
plt.plot(x, a)
plt.title("VOO")
plt.ylim([0,550])
plt.grid(True)
plt.subplot(332)
plt.plot(x, b)
plt.title("VGK")
plt.ylim([0,400])
plt.grid(True)
plt.subplot(333)
plt.plot(x, c)
plt.title('VPL')
plt.ylim([0,110])
plt.grid(True)
plt.subplot(334)
plt.plot(x, d)
plt.title('IEMG')
plt.ylim([0,250])
plt.grid(True)
plt.subplot(335)
plt.plot(x, e)
plt.title('MCHI')
plt.ylim([0,75])
plt.grid(True)
plt.subplot(336)
plt.plot(x, f)
plt.title('P_VNQ')
plt.ylim([0,55])
plt.grid(True)
plt.subplot(337)
plt.plot(x, g)
plt.title('P_GDX')
plt.ylim([0,8])
plt.grid(True)
plt.subplot(338)
plt.plot(x, h)
plt.title('P_BND')
plt.ylim([0,200])
plt.grid(True)
plt.subplot(339)
plt.plot(x, i)
plt.title('P_BNDX')
plt.ylim([0,350])
plt.grid(True)
plt.tight_layout()
Try with DataFrame.plot and enable subplots, set the layout and figsize:
axes = df.plot(subplots=True, title=df.columns.tolist(),
grid=True, layout=(3, 3), figsize=(15, 10))
plt.tight_layout()
plt.show()
Or use plt.subplots to set the layout then plot on those axes with DataFrame.plot:
# setup subplots
fig, axes = plt.subplots(nrows=3, ncols=3, figsize=(15, 10))
# Plot DataFrame on axes
df.plot(subplots=True, ax=axes, title=df.columns.tolist(), grid=True)
plt.tight_layout()
plt.show()
Sample Data and imports:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
np.random.seed(5)
df = pd.DataFrame(np.random.randint(10, 100, (10, 9)),
columns=list("ABCDEFGHI"))
df:
A B C D E F G H I
0 88 71 26 83 18 72 37 40 90
1 17 86 25 63 90 37 54 87 85
2 75 57 40 94 96 28 19 51 72
3 11 92 26 88 15 68 10 90 14
4 46 61 37 41 12 78 48 93 29
5 28 17 40 72 21 77 75 65 13
6 88 37 39 43 99 95 17 26 24
7 41 19 48 57 26 15 44 55 69
8 34 23 41 42 86 54 15 24 57
9 92 10 17 96 26 74 18 54 47
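The original code sets a different y-range for each ticker, which DataFrame.plot does not do by itself. A minimal sketch of one way to add that: DataFrame.plot with a layout returns a 2-D array of Axes, so the limits can be applied in a loop afterwards. The `ylims` values here are hypothetical placeholders, not the asker's real limits.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same sample data as above
np.random.seed(5)
df = pd.DataFrame(np.random.randint(10, 100, (10, 9)),
                  columns=list("ABCDEFGHI"))

# Hypothetical per-column y-limits; replace with real values
ylims = {col: (0, 120) for col in df.columns}
ylims["A"] = (0, 200)

axes = df.plot(subplots=True, layout=(3, 3), figsize=(15, 10),
               title=df.columns.tolist(), grid=True, legend=False)

# axes is a 3x3 array; ravel() walks it row by row, matching the column order
for ax, col in zip(axes.ravel(), df.columns):
    ax.set_ylim(*ylims[col])

plt.tight_layout()
```

Keeping the limits in a dict keyed by column name means a changed ticker list only requires editing the data, not the plotting code.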
Does this implementation not work out in your case?
x = price_df.index
cols = ["P_VOO","P_VGK",...] #Populate before running
ylims = [[0,550],...] #Populate before running
# Plot a figure with one subplot per column
fig = plt.figure(figsize=(15,10))
for i, (col, ylim) in enumerate(zip(cols, ylims)):
    plt.subplot(3, 3, i + 1)  # 3x3 grid; subplot numbers start at 1
    plt.plot(x, price_df[col])
    plt.title(col.split('_')[1])  # "P_VOO" -> "VOO"
    plt.ylim(ylim)
    plt.grid(True)
I haven't run this code locally, so it may contain minor bugs, but you get the general idea.
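A variant of the same loop using plt.subplots and the Axes objects directly, which also copes with fewer than nine tickers by hiding the unused panels. This is a sketch under assumptions: `price_df` is synthetic, and the ticker list is a shortened, hypothetical version of the one in the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical ticker list; only 7 entries to show the unused-panel handling
tickers = ["VOO", "VGK", "VPL", "IEMG", "MCHI", "VNQ", "GDX"]

# Assumed stand-in for price_df: random-walk prices in columns named "P_<ticker>"
rng = np.random.default_rng(0)
dates = pd.date_range("2021-01-01", periods=60, freq="D")
price_df = pd.DataFrame(
    {f"P_{t}": 100 + rng.normal(0, 1, len(dates)).cumsum() for t in tickers},
    index=dates,
)

fig, axes = plt.subplots(nrows=3, ncols=3, figsize=(15, 10), sharex=True)

# Pair each Axes with one ticker; zip stops when the tickers run out
for ax, ticker in zip(axes.ravel(), tickers):
    ax.plot(price_df.index, price_df[f"P_{ticker}"])
    ax.set_title(ticker)
    ax.grid(True)

# Hide the panels that received no ticker
for ax in axes.ravel()[len(tickers):]:
    ax.set_visible(False)

fig.tight_layout()
```

Checking a different set of stocks then only means editing the `tickers` list.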

How can I iterate through a CSV file and plot it to a boxplot by each column representing a second in Python?

Say I have a csv file like so:
20 30 33 54 12 56
90 54 66 12 88 11
33 22 63 86 12 65
11 44 65 34 23 26
I want to create a boxplot where each column represents one second, with the seconds on the x-axis and the actual data on the y-axis. So 20, 90, 33, 11 form the box at 1 second, 30, 54, 22, 44 the box at 2 seconds, and so on. The real csv file has more data than this, and I don't know how many columns it contains, so I can't hard-code anything.
This is what I have so far:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('/user/Desktop/test.csv', header = None)
fig = plt.figure()
ax = fig.add_subplot()
plt.xlabel('Time (s)')
plt.ylabel('ms')
df.boxplot()
plt.show()
Try this:
axes = df.groupby(df.columns//10, axis=1).boxplot(subplots=True,
figsize=(12,18))
plt.xlabel('Time (s)')
plt.ylabel('ms')
plt.show()
Output:
If you want to set y limits of the subplots:
for ax in axes.flatten():
    ax.set_ylim(0,100)
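If a single plot with one box per second is enough, a minimal sketch is to relabel the columns as 1, 2, 3, ... and call DataFrame.boxplot once, which works for any number of columns. It assumes the file is whitespace-delimited like the sample shown in the question (for a comma-separated file, drop the `sep` argument) and reads the sample from a string instead of the asker's path.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# The sample rows from the question, standing in for the real file
raw = """20 30 33 54 12 56
90 54 66 12 88 11
33 22 63 86 12 65
11 44 65 34 23 26
"""

# header=None because the file has no header row
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)

# Relabel columns 0..n-1 as seconds 1..n, whatever n turns out to be
df.columns = range(1, df.shape[1] + 1)

fig, ax = plt.subplots()
df.boxplot(ax=ax)  # one box per column, i.e. per second
ax.set_xlabel("Time (s)")
ax.set_ylabel("ms")
```

Nothing here depends on the number of columns, so the same code applies to a wider file.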

IPython notebook stops evaluating cells after plt.show()

I am using IPython to do some coding. When I open the notebook and run some code with SHIFT+ENTER, it runs. But after one or two runs it stops giving any output. Why is that? I have to shut down the notebook and open it again; then it runs a few times and the same problem occurs again.
Here is the code I have used.
Question 1: Rotational Invariance of PCA
I(1): Importing the data sets and plotting a scatter plot of the two.
In [1]:
# Changing the working directory
import os
os.getcwd()
path="/Users/file/"
os.chdir(path)
pwd=os.getcwd()
print(pwd)
# Importing the libraries
import pandas as pd
import numpy as np
import scipy as sp
# Mentioning the files to be imported
file=["2d-gaussian.csv","2d-gaussian-rotated.csv"]
# Importing the two csv files in pandas dataframes
XI=pd.read_csv(file[0],header=None)
XII=pd.read_csv(file[1],header=None)
#XI
XII
Out[5]:
0 1
0 1.372310 -2.111748
1 -0.397896 1.968246
2 0.336945 1.338646
3 1.983127 -2.462349
4 -0.846672 0.606716
5 0.582438 -0.645748
6 4.346416 -4.645564
7 0.830186 -0.599138
8 -2.460311 2.096945
9 -1.594642 2.828128
10 3.767641 -3.401645
11 0.455917 -0.224665
12 2.878315 -2.243932
13 -1.062223 0.142675
14 -0.698950 1.113589
15 -4.681619 4.289080
16 0.411498 -0.041293
17 0.276973 0.187699
18 1.500835 -0.284463
19 -0.387535 -0.265205
20 3.594708 -2.581400
21 2.263455 -2.660592
22 -1.686090 1.566998
23 1.381510 -0.944383
24 -0.085535 -1.697205
25 1.030609 -1.448967
26 3.647413 -3.322129
27 -3.474906 2.977695
28 -7.930797 8.506523
29 -0.931702 1.440784
... ... ...
70 4.433750 -2.515612
71 1.495646 -0.058674
72 -0.928938 0.605706
73 -0.890883 -0.005911
74 -2.245630 1.333171
75 -0.707405 0.121334
76 0.675536 -0.822801
77 1.975917 -1.757632
78 -1.239322 2.053495
79 -2.360047 1.842387
80 2.436710 -1.445505
81 0.348497 -0.635207
82 -1.423243 -0.017132
83 0.881054 -1.823523
84 0.052809 1.505141
85 -2.466735 2.406453
86 -0.499472 0.970673
87 4.489547 -4.443907
88 -2.000164 4.125330
89 1.833832 -1.611077
90 -0.944030 0.771001
91 -1.677884 1.920365
92 0.372318 -0.474329
93 -2.073669 2.020200
94 -0.131636 -0.844568
95 -1.011576 1.718216
96 -1.017175 -0.005438
97 5.677248 -4.572855
98 2.179323 -1.704361
99 1.029635 -0.420458
100 rows × 2 columns
The two raw csv files have been imported as data frames. Next we will concatenate both the dataframes into one dataframe to plot a combined scatter plot
In [6]:
# Joining two dataframes into one.
df_combined=pd.concat([XI,XII],axis=1,ignore_index=True)
df_combined
Out[6]:
0 1 2 3
0 2.463601 -0.522861 1.372310 -2.111748
1 -1.673115 1.110405 -0.397896 1.968246
2 -0.708310 1.184822 0.336945 1.338646
3 3.143426 -0.338861 1.983127 -2.462349
4 -1.027700 -0.169674 -0.846672 0.606716
5 0.868458 -0.044767 0.582438 -0.645748
6 6.358290 -0.211529 4.346416 -4.645564
7 1.010685 0.163375 0.830186 -0.599138
8 -3.222466 -0.256939 -2.460311 2.096945
9 -3.127371 0.872207 -1.594642 2.828128
10 5.069451 0.258798 3.767641 -3.401645
11 0.481244 0.163520 0.455917 -0.224665
12 3.621976 0.448577 2.878315 -2.243932
13 -0.851991 -0.650218 -1.062223 0.142675
14 -1.281659 0.293194 -0.698950 1.113589
15 -6.343242 -0.277567 -4.681619 4.289080
16 0.320172 0.261774 0.411498 -0.041293
17 0.063126 0.328573 0.276973 0.187699
18 1.262396 0.860105 1.500835 -0.284463
19 -0.086500 -0.461557 -0.387535 -0.265205
20 4.367168 0.716517 3.594708 -2.581400
21 3.481827 -0.280818 2.263455 -2.660592
22 -2.300280 -0.084211 -1.686090 1.566998
23 1.644655 0.309095 1.381510 -0.944383
24 1.139623 -1.260587 -0.085535 -1.697205
25 1.753325 -0.295824 1.030609 -1.448967
26 4.928210 0.230011 3.647413 -3.322129
27 -4.562678 -0.351581 -3.474906 2.977695
28 -11.622940 0.407100 -7.930797 8.506523
29 -1.677601 0.359976 -0.931702 1.440784
... ... ... ... ...
70 4.913941 1.356329 4.433750 -2.515612
71 1.099070 1.016093 1.495646 -0.058674
72 -1.085156 -0.228560 -0.928938 0.605706
73 -0.625769 -0.634129 -0.890883 -0.005911
74 -2.530594 -0.645206 -2.245630 1.333171
75 -0.586007 -0.414415 -0.707405 0.121334
76 1.059484 -0.104132 0.675536 -0.822801
77 2.640018 0.154351 1.975917 -1.757632
78 -2.328373 0.575707 -1.239322 2.053495
79 -2.971570 -0.366041 -2.360047 1.842387
80 2.745141 0.700888 2.436710 -1.445505
81 0.695584 -0.202735 0.348497 -0.635207
82 -0.994271 -1.018499 -1.423243 -0.017132
83 1.912425 -0.666426 0.881054 -1.823523
84 -1.026954 1.101637 0.052809 1.505141
85 -3.445865 -0.042626 -2.466735 2.406453
86 -1.039549 0.333189 -0.499472 0.970673
87 6.316906 0.032272 4.489547 -4.443907
88 -4.331379 1.502719 -2.000164 4.125330
89 2.435918 0.157511 1.833832 -1.611077
90 -1.212710 -0.122350 -0.944030 0.771001
91 -2.544347 0.171460 -1.677884 1.920365
92 0.598670 -0.072133 0.372318 -0.474329
93 -2.894802 -0.037809 -2.073669 2.020200
94 0.504119 -0.690281 -0.131636 -0.844568
95 -1.930254 0.499670 -1.011576 1.718216
96 -0.715406 -0.723096 -1.017175 -0.005438
97 7.247917 0.780923 5.677248 -4.572855
98 2.746180 0.335849 2.179323 -1.704361
99 1.025371 0.430754 1.029635 -0.420458
100 rows × 4 columns
Plotting two separate scatter plots of all four columns on one scatter diagram
In [ ]:
import matplotlib.pyplot as plt
# Function for scatter plot
def scatter_plot():
    # plots scatter for first two columns (unrotated Gaussian data)
    # (.iloc is used because the .ix indexer was removed from pandas)
    plt.scatter(df_combined.iloc[:,0], df_combined.iloc[:,1],color='red',marker='+')
    # plots scatter for rotated Gaussian data
    plt.scatter(df_combined.iloc[:,2], df_combined.iloc[:,3] ,color='green', marker='x')
    legend = plt.legend(loc='upper right')
    # set ranges of x and y axes
    plt.xlim([-12,12])
    plt.ylim([-12,12])
    plt.show()
# Function call
scatter_plot()
In [ ]:
def plot_me1():
    # create figure and axes
    fig = plt.figure()
    # split the page into a 1x1 array of subplots and put me in the first one (111)
    # (as a matter of fact, the only one)
    ax = fig.add_subplot(111)
    # plots scatter for x, y1
    ax.scatter(df_combined.iloc[:,0], df_combined.iloc[:,1], color='red', marker='+', s=100)
    # plots scatter for x, y2
    ax.scatter(df_combined.iloc[:,2], df_combined.iloc[:,3], color='green', marker='x', s=100)
    plt.xlim([-12,12])
    plt.ylim([-12,12])
    plt.show()
plot_me1()
In [ ]:
You should not use plt.show() in the notebook. This will open an external window that blocks the evaluation of your cell.
Instead begin your notebooks with %matplotlib inline or the cool new %matplotlib notebook (the latter is only possible with matplotlib >= 1.4.3 and ipython >= 3.0)
After the evaluation of each cell, the (still open) figure object is automatically shown in your notebook.
This minimal code example works in the notebook. Note that it does not call plt.show().
%matplotlib inline
import matplotlib.pyplot as plt
x = [1,2,3]
y = [3,2,1]
_ = plt.plot(x,y)
%matplotlib inline simply displays the image.
%matplotlib notebook was added recently and offers many of the cool features (zooming, measuring, ...) of the interactive backends.
