IPython notebook stops evaluating cells after plt.show() - python

I am using IPython to do some coding. When I open the notebook and run cells with SHIFT+ENTER, they run. But after one or two runs, the notebook stops giving any output. Why is that? I have to shut down the notebook and open it again; then it runs a few times and the same problem comes back.
Here is the code I have used.
Question 1: Rotational Invariance of PCA
I(1): Importing the data sets and plotting a scatter plot of the two.
In [1]:
# Changing the working directory
import os
os.getcwd()
path="/Users/file/"
os.chdir(path)
pwd=os.getcwd()
print(pwd)
# Importing the libraries
import pandas as pd
import numpy as np
import scipy as sp
# Mentioning the files to be imported
file=["2d-gaussian.csv","2d-gaussian-rotated.csv"]
# Importing the two csv files in pandas dataframes
XI=pd.read_csv(file[0],header=None)
XII=pd.read_csv(file[1],header=None)
#XI
XII
Out[5]:
0 1
0 1.372310 -2.111748
1 -0.397896 1.968246
2 0.336945 1.338646
3 1.983127 -2.462349
4 -0.846672 0.606716
5 0.582438 -0.645748
6 4.346416 -4.645564
7 0.830186 -0.599138
8 -2.460311 2.096945
9 -1.594642 2.828128
10 3.767641 -3.401645
11 0.455917 -0.224665
12 2.878315 -2.243932
13 -1.062223 0.142675
14 -0.698950 1.113589
15 -4.681619 4.289080
16 0.411498 -0.041293
17 0.276973 0.187699
18 1.500835 -0.284463
19 -0.387535 -0.265205
20 3.594708 -2.581400
21 2.263455 -2.660592
22 -1.686090 1.566998
23 1.381510 -0.944383
24 -0.085535 -1.697205
25 1.030609 -1.448967
26 3.647413 -3.322129
27 -3.474906 2.977695
28 -7.930797 8.506523
29 -0.931702 1.440784
... ... ...
70 4.433750 -2.515612
71 1.495646 -0.058674
72 -0.928938 0.605706
73 -0.890883 -0.005911
74 -2.245630 1.333171
75 -0.707405 0.121334
76 0.675536 -0.822801
77 1.975917 -1.757632
78 -1.239322 2.053495
79 -2.360047 1.842387
80 2.436710 -1.445505
81 0.348497 -0.635207
82 -1.423243 -0.017132
83 0.881054 -1.823523
84 0.052809 1.505141
85 -2.466735 2.406453
86 -0.499472 0.970673
87 4.489547 -4.443907
88 -2.000164 4.125330
89 1.833832 -1.611077
90 -0.944030 0.771001
91 -1.677884 1.920365
92 0.372318 -0.474329
93 -2.073669 2.020200
94 -0.131636 -0.844568
95 -1.011576 1.718216
96 -1.017175 -0.005438
97 5.677248 -4.572855
98 2.179323 -1.704361
99 1.029635 -0.420458
100 rows × 2 columns
The two raw csv files have been imported as data frames. Next we will concatenate both the dataframes into one dataframe to plot a combined scatter plot
In [6]:
# Joining two dataframes into one.
df_combined=pd.concat([XI,XII],axis=1,ignore_index=True)
df_combined
Out[6]:
0 1 2 3
0 2.463601 -0.522861 1.372310 -2.111748
1 -1.673115 1.110405 -0.397896 1.968246
2 -0.708310 1.184822 0.336945 1.338646
3 3.143426 -0.338861 1.983127 -2.462349
4 -1.027700 -0.169674 -0.846672 0.606716
5 0.868458 -0.044767 0.582438 -0.645748
6 6.358290 -0.211529 4.346416 -4.645564
7 1.010685 0.163375 0.830186 -0.599138
8 -3.222466 -0.256939 -2.460311 2.096945
9 -3.127371 0.872207 -1.594642 2.828128
10 5.069451 0.258798 3.767641 -3.401645
11 0.481244 0.163520 0.455917 -0.224665
12 3.621976 0.448577 2.878315 -2.243932
13 -0.851991 -0.650218 -1.062223 0.142675
14 -1.281659 0.293194 -0.698950 1.113589
15 -6.343242 -0.277567 -4.681619 4.289080
16 0.320172 0.261774 0.411498 -0.041293
17 0.063126 0.328573 0.276973 0.187699
18 1.262396 0.860105 1.500835 -0.284463
19 -0.086500 -0.461557 -0.387535 -0.265205
20 4.367168 0.716517 3.594708 -2.581400
21 3.481827 -0.280818 2.263455 -2.660592
22 -2.300280 -0.084211 -1.686090 1.566998
23 1.644655 0.309095 1.381510 -0.944383
24 1.139623 -1.260587 -0.085535 -1.697205
25 1.753325 -0.295824 1.030609 -1.448967
26 4.928210 0.230011 3.647413 -3.322129
27 -4.562678 -0.351581 -3.474906 2.977695
28 -11.622940 0.407100 -7.930797 8.506523
29 -1.677601 0.359976 -0.931702 1.440784
... ... ... ... ...
70 4.913941 1.356329 4.433750 -2.515612
71 1.099070 1.016093 1.495646 -0.058674
72 -1.085156 -0.228560 -0.928938 0.605706
73 -0.625769 -0.634129 -0.890883 -0.005911
74 -2.530594 -0.645206 -2.245630 1.333171
75 -0.586007 -0.414415 -0.707405 0.121334
76 1.059484 -0.104132 0.675536 -0.822801
77 2.640018 0.154351 1.975917 -1.757632
78 -2.328373 0.575707 -1.239322 2.053495
79 -2.971570 -0.366041 -2.360047 1.842387
80 2.745141 0.700888 2.436710 -1.445505
81 0.695584 -0.202735 0.348497 -0.635207
82 -0.994271 -1.018499 -1.423243 -0.017132
83 1.912425 -0.666426 0.881054 -1.823523
84 -1.026954 1.101637 0.052809 1.505141
85 -3.445865 -0.042626 -2.466735 2.406453
86 -1.039549 0.333189 -0.499472 0.970673
87 6.316906 0.032272 4.489547 -4.443907
88 -4.331379 1.502719 -2.000164 4.125330
89 2.435918 0.157511 1.833832 -1.611077
90 -1.212710 -0.122350 -0.944030 0.771001
91 -2.544347 0.171460 -1.677884 1.920365
92 0.598670 -0.072133 0.372318 -0.474329
93 -2.894802 -0.037809 -2.073669 2.020200
94 0.504119 -0.690281 -0.131636 -0.844568
95 -1.930254 0.499670 -1.011576 1.718216
96 -0.715406 -0.723096 -1.017175 -0.005438
97 7.247917 0.780923 5.677248 -4.572855
98 2.746180 0.335849 2.179323 -1.704361
99 1.025371 0.430754 1.029635 -0.420458
100 rows × 4 columns
Plotting both pairs of columns (unrotated and rotated data) as two scatter series on one diagram
In [ ]:
import matplotlib.pyplot as plt
# Function for scatter plot
def scatter_plot():
    # plots scatter for first two columns (unrotated Gaussian data)
    plt.scatter(df_combined.iloc[:,0], df_combined.iloc[:,1], color='red', marker='+')
    # plots scatter for rotated Gaussian data
    plt.scatter(df_combined.iloc[:,2], df_combined.iloc[:,3], color='green', marker='x')
    legend = plt.legend(loc='upper right')
    # set ranges of x and y axes
    plt.xlim([-12,12])
    plt.ylim([-12,12])
    plt.show()
# Function call
scatter_plot()
In [ ]:
def plot_me1():
    # create figure and axes
    fig = plt.figure()
    # split the page into a 1x1 array of subplots and put me in the first one (111)
    # (as a matter of fact, the only one)
    ax = fig.add_subplot(111)
    # plots scatter for x, y1
    ax.scatter(df_combined.iloc[:,0], df_combined.iloc[:,1], color='red', marker='+', s=100)
    # plots scatter for x, y2
    ax.scatter(df_combined.iloc[:,2], df_combined.iloc[:,3], color='green', marker='x', s=100)
    plt.xlim([-12,12])
    plt.ylim([-12,12])
    plt.show()
plot_me1()
In [ ]:

You should not use plt.show() in the notebook. With a GUI backend, this opens an external window and blocks the evaluation of further cells while that window is open.
Instead, begin your notebook with %matplotlib inline or the newer %matplotlib notebook (the latter requires matplotlib >= 1.4.3 and ipython >= 3.0).
After each cell is evaluated, the (still open) figure object is automatically shown in your notebook.
This minimal code example works in the notebook. Note that it does not call plt.show():
%matplotlib inline
import matplotlib.pyplot as plt
x = [1,2,3]
y = [3,2,1]
_ = plt.plot(x,y)
%matplotlib inline simply displays the image.
%matplotlib notebook was added recently and offers many of the cool features (zooming, measuring, ...) of the interactive backends.
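As a rough sketch of how the question's scatter plot might look under this advice, the function below builds the figure with the object-oriented API and returns it instead of calling plt.show(). The random data is only a stand-in for the two CSV files, the name scatter_both is hypothetical, and the Agg backend is selected only so the snippet also runs outside a notebook; in a notebook, %matplotlib inline would take the place of that line.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; in a notebook use %matplotlib inline instead
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for df_combined (columns 0-1: unrotated, 2-3: rotated)
rng = np.random.default_rng(0)
df_combined = pd.DataFrame(rng.normal(scale=3.0, size=(100, 4)))

def scatter_both(df):
    """Draw both point clouds on one axes and return the figure without showing it."""
    fig, ax = plt.subplots()
    ax.scatter(df.iloc[:, 0], df.iloc[:, 1], color="red", marker="+", label="original")
    ax.scatter(df.iloc[:, 2], df.iloc[:, 3], color="green", marker="x", label="rotated")
    ax.set_xlim(-12, 12)
    ax.set_ylim(-12, 12)
    ax.legend(loc="upper right")
    return fig

# With the inline backend, the figure is rendered when the cell finishes
fig = scatter_both(df_combined)
```

Passing label= to each scatter call also gives plt.legend something to display, which the original code did not provide.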

Related

Plot two dataframes in one seaborn plot

I am trying to plot two dataframes with seaborn in one figure.
Given these test data:
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df['Name'] = 'Adam'
df.iloc[::5, 4] = 'Berta'
df.head(10)
A B C D Name
0 40 75 45 6 Berta
1 52 98 55 44 Adam
2 57 61 70 17 Adam
3 52 5 20 28 Adam
4 63 53 74 49 Adam
5 53 28 97 26 Berta
6 64 38 73 56 Adam
7 25 65 34 64 Adam
8 95 91 92 60 Adam
9 6 54 5 58 Adam
and
df1 = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df1['Location'] = 'New York'
df1.iloc[::5, 4] = 'Tokyo'
df1.head(10)
A B C D Location
0 89 16 23 15 Tokyo
1 7 35 26 21 New York
2 64 94 51 61 New York
3 84 16 15 36 New York
4 55 62 0 2 New York
5 73 93 4 1 Tokyo
6 93 11 27 69 New York
7 14 52 50 45 New York
8 26 77 86 32 New York
9 21 10 68 11 New York
A) For the first plot, I would like a relplot or scatterplot where both dataframes share the same x and y axes but have a different "hue". If I try:
sb.relplot(data=df, x='Name', y='C', hue="Name", height=8.27, aspect=11.7/8.27)
sb.relplot(data=df1, x='Location', y='C', hue="Location", height=8.27, aspect=11.7/8.27)
plt.show()
The latter plot either overwrites the first or creates a new figure. Any ideas?
B) Now we have the same y-axis (let's say "amount"), but different x-axes (strings).
I found this question: "How to overlay two seaborn relplots?" It looks pretty good, but if I try:
fig, ax = plt.subplots()
sb.scatterplot(x="Name", y='A', data=df, hue="Name", ax=ax)
ax2 = ax.twinx()
sb.scatterplot(data=df1, x='Location', y='A', hue="Location", ax =ax2)
plt.show()
then the second scatterplot is drawn over the values of the first one, overwriting the names on the x-axis. I would like the second scatterplot to be added on the right instead. Is this possible?
In my opinion, it doesn't make sense to concatenate the two dataframes.
Thanks very much!
Putting your questions together, I assume you want either two subplots in one row (one per DataFrame) or both sets of data on a single plot.
As for the 'A' plot:
fig, ax = plt.subplots(1, 2, figsize=(8, 4), sharey=True)
sb.scatterplot(data=df, x='Name', y='A', hue='Name',
               ax=ax[0])
sb.scatterplot(data=df1, x='Location', y='A', hue='Location',
               ax=ax[1])
plt.show()
Here I created both fig and ax with plt.subplots(), passing the number of rows (1), the number of columns (2) and a shared y-axis, so that each scatter plot lands on its own subplot. (I did not bother with legend location and other decorations.)
As for the 'B' plot, if you want everything on one plot, you may try:
fig, ax = plt.subplots(1, 1, figsize=(8, 4))
sb.scatterplot(data=df, x='Name', y='A', hue='Name', palette=['blue', 'orange'],
               ax=ax)
sb.scatterplot(data=df1, x='Location', y='A', hue='Location', palette=['red', 'green'],
               ax=ax)
ax.set_xlabel('Name/Location')
plt.show()
Here I made a single subplot and assigned both scatter plots to it. It may need some color mapping and a renamed x-axis, as done above with palette and set_xlabel.
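Part B of the question asks for the second set of categories to appear to the right of the first. The following matplotlib-only sketch illustrates why a shared axes does that: string x-values are mapped to positions in order of first appearance, so categories from a second call are appended after the existing ones. The tiny data set is made up for illustration, and the Agg backend is assumed only so the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run outside a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 4))

# First call registers the categories 'Adam' and 'Berta' on the x-axis
ax.scatter(["Adam", "Berta", "Adam"], [40, 52, 57], color="blue", label="Name")
# Second call on the same axes appends 'New York' and 'Tokyo' to the right
ax.scatter(["New York", "Tokyo", "New York"], [89, 7, 64], color="red", label="Location")

ax.set_xlabel("Name/Location")
ax.legend()

fig.canvas.draw()  # force the tick labels to be computed
tick_labels = [t.get_text() for t in ax.get_xticklabels()]
print(tick_labels)
```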

Is there a way to use a for-loop to quickly create subplots in matplotlib and pandas?

I'm saving the daily stock price for several stocks in a pandas DataFrame. I'm using Python and Jupyter notebook.
Once saved, I'm using matplotlib to graph the prices to check the data.
The idea is to graph 9 stocks at a time in a 3 x 3 subplot.
When I want to check other stock tickers I have to manually change each ticker in each subplot, which takes a long time and seems inefficient.
Is there a way to do this with some sort of list and a for loop?
Here is my current code. It works, but it seems too long and hard to update. (Stock tickers are only examples from a Vanguard model portfolio.)
x = price_df.index
a = price_df["P_VOO"]
b = price_df["P_VGK"]
c = price_df["P_VPL"]
d = price_df["P_IEMG"]
e = price_df["P_MCHI"]
f = price_df["P_VNQ"]
g = price_df["P_GDX"]
h = price_df["P_BND"]
i = price_df["P_BNDX"]
# Plot a figure with various axes scales
fig = plt.figure(figsize=(15,10))
# Subplot 1
plt.subplot(331)
plt.plot(x, a)
plt.title("VOO")
plt.ylim([0,550])
plt.grid(True)
plt.subplot(332)
plt.plot(x, b)
plt.title("VGK")
plt.ylim([0,400])
plt.grid(True)
plt.subplot(333)
plt.plot(x, c)
plt.title('VPL')
plt.ylim([0,110])
plt.grid(True)
plt.subplot(334)
plt.plot(x, d)
plt.title('IEMG')
plt.ylim([0,250])
plt.grid(True)
plt.subplot(335)
plt.plot(x, e)
plt.title('MCHI')
plt.ylim([0,75])
plt.grid(True)
plt.subplot(336)
plt.plot(x, f)
plt.title('P_VNQ')
plt.ylim([0,55])
plt.grid(True)
plt.subplot(337)
plt.plot(x, g)
plt.title('P_GDX')
plt.ylim([0,8])
plt.grid(True)
plt.subplot(338)
plt.plot(x, h)
plt.title('P_BND')
plt.ylim([0,200])
plt.grid(True)
plt.subplot(339)
plt.plot(x, i)
plt.title('P_BNDX')
plt.ylim([0,350])
plt.grid(True)
plt.tight_layout()
Try with DataFrame.plot and enable subplots, set the layout and figsize:
axes = df.plot(subplots=True, title=df.columns.tolist(),
               grid=True, layout=(3, 3), figsize=(15, 10))
plt.tight_layout()
plt.show()
Or use plt.subplots to set the layout then plot on those axes with DataFrame.plot:
# setup subplots
fig, axes = plt.subplots(nrows=3, ncols=3, figsize=(15, 10))
# Plot DataFrame on axes
df.plot(subplots=True, ax=axes, title=df.columns.tolist(), grid=True)
plt.tight_layout()
plt.show()
Sample Data and imports:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
np.random.seed(5)
df = pd.DataFrame(np.random.randint(10, 100, (10, 9)),
                  columns=list("ABCDEFGHI"))
df:
A B C D E F G H I
0 88 71 26 83 18 72 37 40 90
1 17 86 25 63 90 37 54 87 85
2 75 57 40 94 96 28 19 51 72
3 11 92 26 88 15 68 10 90 14
4 46 61 37 41 12 78 48 93 29
5 28 17 40 72 21 77 75 65 13
6 88 37 39 43 99 95 17 26 24
7 41 19 48 57 26 15 44 55 69
8 34 23 41 42 86 54 15 24 57
9 92 10 17 96 26 74 18 54 47
Does this implementation not work out in your case?
x = price_df.index
cols = ["P_VOO","P_VGK",...] #Populate before running
ylims = [[0,550],...] #Populate before running
# Plot a figure with various axes scales
fig = plt.figure(figsize=(15,10))
# One subplot per (column, y-limit) pair
for i, (col, ylim) in enumerate(zip(cols, ylims)):
    plt.subplot(331+i)
    plt.plot(x, price_df[col])
    plt.title(col.split('_')[1])
    plt.ylim(ylim)
    plt.grid(True)
I haven't run this code locally, so it could have some minor bugs, but you get the general idea.
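For reference, here is a self-contained variant of the same loop, written as a sketch rather than a tested drop-in. It uses plt.subplots and iterates over axes.flat instead of computing subplot numbers. The tickers and y-limits are copied from the question; the random prices and the date range are invented stand-ins for price_df, and the Agg backend is assumed only for running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Column name -> upper y-limit, taken from the question's code
ylims = {"P_VOO": 550, "P_VGK": 400, "P_VPL": 110, "P_IEMG": 250, "P_MCHI": 75,
         "P_VNQ": 55, "P_GDX": 8, "P_BND": 200, "P_BNDX": 350}

# Synthetic stand-in for price_df (random values, hypothetical dates)
rng = np.random.default_rng(0)
dates = pd.date_range("2021-01-01", periods=30, freq="D")
price_df = pd.DataFrame(
    {col: rng.uniform(0, top, size=len(dates)) for col, top in ylims.items()},
    index=dates,
)

fig, axes = plt.subplots(nrows=3, ncols=3, figsize=(15, 10))
# axes.flat walks the 3x3 grid row by row, pairing each axes with one ticker
for ax, (col, top) in zip(axes.flat, ylims.items()):
    ax.plot(price_df.index, price_df[col])
    ax.set_title(col.split("_")[1])  # 'P_VOO' -> 'VOO'
    ax.set_ylim(0, top)
    ax.grid(True)
fig.tight_layout()
```

Checking a different set of tickers then only means editing the ylims dictionary.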

Plotting of dot points based on np.where condition

I have a lot of data points (in CSV form) that I am trying to visualize. I would like to read the CSV and check the "result" column: if the value there is positive (I was trying to use an np.where condition), I would like to plot the corresponding A B C D E F G parameters so that the y-axis is the value of the parameter and the x-axis is the name of the parameter (something like a dot/scatter plot). I would like to plot all the values in the same graph. Furthermore, if there are more than 20 such points, I would like to use only the first 20 for the plot.
An example of the type of dataset is below. (Mine contains around 12000 rows)
A B C D E F G result
23 -54 36 27 98 39 80 -0.86
14 44 -16 47 28 29 26 1.65
67 84 26 67 -88 29 10 0.5
-45 14 76 37 68 59 90 0
24 34 56 27 38 79 48 -1.65
Any help or guidance would be appreciated!
From your question I assume that your data is a pandas dataframe. In this case you can do the selection with pandas and use its built-in plotting function:
df.loc[df.result>0, df.columns[:-1]].T.plot(ls='', marker='o')
If you want to plot the first 20 rows only, just append [:20] (or better .iloc[:20]) to the df.loc[...] selection.
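Put together, the selection and the 20-row limit might look like the sketch below. It reads the question's five sample rows from an in-memory string (comma-separated here, which is an assumption about the real file) and uses the Agg backend only so it runs outside a notebook.

```python
import io
import matplotlib
matplotlib.use("Agg")  # assumption: headless run outside a notebook
import pandas as pd

# Sample rows from the question, written as comma-separated text
csv_text = """A,B,C,D,E,F,G,result
23,-54,36,27,98,39,80,-0.86
14,44,-16,47,28,29,26,1.65
67,84,26,67,-88,29,10,0.5
-45,14,76,37,68,59,90,0
24,34,56,27,38,79,48,-1.65
"""
df = pd.read_csv(io.StringIO(csv_text))

# Keep rows with a positive result, drop the result column, cap at 20 rows
params = df.columns.drop("result")
selected = df.loc[df["result"] > 0, params].iloc[:20]

# Transpose so parameter names run along the x-axis; markers only, no lines
ax = selected.T.plot(linestyle="", marker="o")
ax.set_xlabel("parameter")
ax.set_ylabel("value")
```

Each selected row becomes one series of dots, with the original row index as its legend entry.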

How can I iterate through a CSV file and plot it to a boxplot by each column representing a second in Python?

Say I have a csv file like so:
20 30 33 54 12 56
90 54 66 12 88 11
33 22 63 86 12 65
11 44 65 34 23 26
I want to create a boxplot where each column is one second, with the seconds on the x-axis and the actual data on the y-axis. So 20, 90, 33, 11 go into one box at 1 second, 30, 54, 22, 44 into the box at 2 seconds, and so on. The real csv file has more data than this, and I don't know how many columns there are, so I can't hard-code anything.
This is what I have so far:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('/user/Desktop/test.csv', header = None)
fig = plt.figure()
ax = fig.add_subplot()
plt.xlabel('Time (s)')
plt.ylabel('ms')
df.boxplot()
plt.show()
Try this:
axes = df.groupby(df.columns//10, axis=1).boxplot(subplots=True,
                                                  figsize=(12,18))
plt.xlabel('Time (s)')
plt.ylabel('ms')
plt.show()
If you want to set y limits of the subplots:
for ax in axes.flatten():
    ax.set_ylim(0,100)
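If a single figure with one box per second is enough, a simpler route (a sketch, not the method of the answer above) is to relabel the columns as 1, 2, 3, ... and call df.boxplot once. Nothing is hard-coded, because the number of columns is read from the data. The sample rows are taken from the question and written comma-separated, which is an assumption about the real file; the Agg backend is only there for running outside a notebook.

```python
import io
import matplotlib
matplotlib.use("Agg")  # assumption: headless run outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows from the question; a real file path would replace the StringIO object
csv_text = """20,30,33,54,12,56
90,54,66,12,88,11
33,22,63,86,12,65
11,44,65,34,23,26
"""
df = pd.read_csv(io.StringIO(csv_text), header=None)

# Relabel columns 0..n-1 as seconds 1..n, however many columns the file has
df.columns = range(1, df.shape[1] + 1)

fig, ax = plt.subplots()
df.boxplot(ax=ax)  # one box per column, i.e. per second
ax.set_xlabel("Time (s)")
ax.set_ylabel("ms")
```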

get rid of tiny fraction in bar plot scale (Pandas/Python)

When I plot a bar plot (of histogram counts) using pd.cut, I get a funny (and very annoying!) 0.001 added to the leftmost axis label, making it start from -1.001 instead of -1. How do I get rid of this? (Please see the figure.)
My code is:
out_i = pd.cut(df, bins=np.arange(-1,1.2,0.2), include_lowest=True)
out_i.value_counts(sort=False).plot.bar(rot=45, figsize=(6,6))
plt.tight_layout()
with df:
a
0 -0.402203
1 -0.019031
2 -0.979292
3 -0.701221
4 -0.267261
5 -0.563602
7 -0.454961
8 0.632456
9 -0.843081
10 -0.629253
11 -0.946188
12 -0.628178
13 -0.776933
14 -0.717091
15 -0.392144
16 -0.799408
17 -0.897951
18 0.255321
19 -0.641854
20 -0.356393
21 -0.507321
22 -0.698238
23 -0.985097
25 -0.661444
26 -0.751593
27 -0.437505
28 -0.413451
29 -0.798745
30 -0.736440
31 -0.672727
32 -0.807688
33 -0.087085
34 -0.393203
35 -0.979730
36 -0.902951
37 -0.454231
38 -0.561951
39 -0.388580
40 -0.706501
41 -0.408248
42 -0.377235
43 -0.283110
44 -0.517428
45 -0.949603
46 -0.268667
47 -0.376199
48 -0.472293
49 -0.211781
50 -0.921520
51 -0.345870
53 -0.542487
55 -0.597996
In case it is acceptable to chop off the decimal points of the intervals, generate a custom list of interval labels and set this as the xticklabels of the plot:
out_i = pd.cut(df['a'], bins=np.arange(-1,1.2,0.2), include_lowest=True)
intervals = out_i.cat.categories
labels = ['(%.1f, %.1f]' % (int(interval.left*100)/100, interval.right) for interval in intervals]
ax = out_i.value_counts(sort=False).plot.bar(rot=45, figsize=(6,6))
ax.set_xticklabels(labels)
plt.tight_layout()
The resulting plot has a first tick label that starts at -1.0 instead of -1.001.
Note: this will always output a half-closed interval (a, b]. It can be improved by making the brackets dynamic, according to the parameters passed to pd.cut.
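One possible way to make the brackets dynamic is to build the labels from the bin edges and hand them to pd.cut through its labels parameter. The sketch below does that; the helper name interval_labels is hypothetical, the five values are a small excerpt of the question's data, and np.linspace is swapped in for np.arange to keep the edges free of floating-point drift.

```python
import numpy as np
import pandas as pd

# A few values from the question's column 'a'
values = pd.Series([-0.402203, -0.019031, -0.979292, 0.632456, 0.255321], name="a")

# Rounded linspace gives clean edges; adding 0.0 turns a possible -0.0 into 0.0
edges = np.round(np.linspace(-1, 1, 11), 1) + 0.0

def interval_labels(edges, right=True, include_lowest=True):
    """Build one label per bin, choosing brackets to match pd.cut's closed side."""
    labels = []
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        if right:
            # Only the first bin is closed on the left, and only with include_lowest
            left_br = "[" if (include_lowest and i == 0) else "("
            right_br = "]"
        else:
            left_br, right_br = "[", ")"
        labels.append(f"{left_br}{lo:.1f}, {hi:.1f}{right_br}")
    return labels

labels = interval_labels(edges)
out = pd.cut(values, bins=edges, include_lowest=True, labels=labels)
counts = out.value_counts(sort=False)  # keeps the bins in their natural order
print(counts)
```

Calling counts.plot.bar(rot=45, figsize=(6,6)) on the result should then give the bar plot with these labels on the x-axis, with no call to set_xticklabels needed.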
