How to adjust months on the x-axis without overlap using matplotlib? - python

I'm trying to plot the following data with matplotlib.
Month A B C
0 2014/06 41 17 3
1 2014/07 48 11 7
2 2014/08 58 20 4
3 2014/09 43 16 6
4 2014/10 73 13 7
5 2014/11 69 22 16
6 2014/12 65 34 9
7 2015/01 69 27 12
I'm having the following code:
x = np.arange(len(df["Month"].values))
y1=df["A"].values.astype(int)
y2=df["B"].values.astype(int)
y3=df["C"].values.astype(int)
my_xticks = df["Month"].values
plt.xticks(x, my_xticks)
plt.plot(x,y1)
plt.plot(x,y2)
plt.plot(x,y3)
plt.show()
The problem is that the month labels overlap each other on the x-axis. Can Python adjust this automatically? I need not only to rotate the labels but also to skip some months automatically; otherwise the axis is too crowded.

Matplotlib has a figure method, autofmt_xdate, that formats the x-axis labels when they are dates: it rotates the tick labels and adjusts their alignment. The defaults can be changed by passing arguments to this method. The labels can also, of course, be changed manually, but this requires (slightly) more effort.
You can easily reduce the number of dates shown by sampling every 2nd element of the list, using the slice notation [::2]:
# Code here that creates a list of dates called list_of_dates...
print (list_of_dates)
# ['2016-08', '2016-09', '2016-10', '2016-11', '2016-12', '2017-01',
# '2017-02', '2017-03', '2017-04', '2017-05', '2017-06', '2017-07',
# '2017-08', '2017-09', '2017-10', '2017-11', '2017-12', '2018-01']
x = np.arange(0, len(list_of_dates), 1)
plt.xticks(x[::2], list_of_dates[::2])
plt.plot(x, np.random.randn(len(list_of_dates)))
# plt.gcf() means "get current figure"
plt.gcf().autofmt_xdate(ha="center")
plt.show()
Which gives:
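For the data in the question, a date-aware variant may be more convenient. The sketch below is not part of the original answer: it assumes the Month column is converted to real datetimes with pd.to_datetime, and it uses matplotlib.dates.MonthLocator with interval=2 (my choice) so that only every second month gets a tick. The rendering call is left as a comment so the snippet also runs without a display.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "Month": ["2014/06", "2014/07", "2014/08", "2014/09",
              "2014/10", "2014/11", "2014/12", "2015/01"],
    "A": [41, 48, 58, 43, 73, 69, 65, 69],
    "B": [17, 11, 20, 16, 13, 22, 34, 27],
    "C": [3, 7, 4, 6, 7, 16, 9, 12],
})

# Assumption: treat the strings as real dates so matplotlib can place ticks itself.
df["Month"] = pd.to_datetime(df["Month"], format="%Y/%m")

fig, ax = plt.subplots()
for col in ["A", "B", "C"]:
    ax.plot(df["Month"], df[col], label=col)

# Put a major tick on every second month and keep the question's label format.
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=2))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y/%m"))

# Rotate and align the date labels.
fig.autofmt_xdate()
ax.legend()

# Call plt.show() (or fig.savefig(...)) here to render the figure.
```

Changing interval trades label density against detail; for longer series, mdates.AutoDateLocator is an alternative that picks the spacing itself.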

Related

Plot one seaborn figure from two dataframes

I am trying to plot two dataframes with seaborn in one figure.
Given this test data:
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df['Name'] = 'Adam'
df.iloc[::5, 4] = 'Berta'
df.head(10)
A B C D Name
0 40 75 45 6 Berta
1 52 98 55 44 Adam
2 57 61 70 17 Adam
3 52 5 20 28 Adam
4 63 53 74 49 Adam
5 53 28 97 26 Berta
6 64 38 73 56 Adam
7 25 65 34 64 Adam
8 95 91 92 60 Adam
9 6 54 5 58 Adam
and
df1 = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df1['Location'] = 'New York'
df1.iloc[::5, 4] = 'Tokyo'
df1.head(10)
A B C D Location
0 89 16 23 15 Tokyo
1 7 35 26 21 New York
2 64 94 51 61 New York
3 84 16 15 36 New York
4 55 62 0 2 New York
5 73 93 4 1 Tokyo
6 93 11 27 69 New York
7 14 52 50 45 New York
8 26 77 86 32 New York
9 21 10 68 11 New York
A) For the first plot I would like a relplot or scatterplot where both dataframes share the same x and y axes but have a different "hue". If I try:
sb.relplot(data=df, x='Name', y='C', hue="Name", height=8.27, aspect=11.7/8.27)
sb.relplot(data=df1, x='Location', y='C', hue="Location", height=8.27, aspect=11.7/8.27)
plt.show()
The second plot either overwrites the first or creates a new figure. Any ideas?
B) Now we have the same y-axes (let's say "amount"), but with different x-axes (strings).
I found this here: How to overlay two seaborn relplots? and it looks pretty good, but if I try:
fig, ax = plt.subplots()
sb.scatterplot(x="Name", y='A', data=df, hue="Name", ax=ax)
ax2 = ax.twinx()
sb.scatterplot(data=df1, x='Location', y='A', hue="Location", ax =ax2)
plt.show()
then the second scatterplot draws its values over those of the first one and overwrites the names on the x-axis. But I would like to add the second scatterplot on the right. Is this possible?
In my opinion it doesn't make sense to concatenate the two dataframes.
Thanks very much!
Putting your questions together, I assume you want either two subplots in one row (one per DataFrame) or both sets of data on a single plot.
As for the 'A' plot:
fig, ax = plt.subplots(1, 2, figsize=(8, 4), sharey=True)
sb.scatterplot(data=df, x='Name', y='A', hue='Name',
ax=ax[0])
sb.scatterplot(data=df1, x='Location', y='A', hue='Location',
ax=ax[1])
plt.show()
Here I created both fig and ax with plt.subplots(), specifying one row, two columns and a shared y-axis, so that each scatter plot goes on its own subplot. Here's what I got (sorry for not bothering with legend location and other decorations):
As for the 'B' plot, if you would want everything on one plot, then you may try:
fig, ax = plt.subplots(1, 1, figsize=(8, 4))
sb.scatterplot(data=df, x='Name', y='A', hue='Name', palette=['blue', 'orange'],
ax=ax)
sb.scatterplot(data=df1, x='Location', y='A', hue='Location', palette=['red', 'green'],
ax=ax)
ax.set_xlabel('Name/Location')
plt.show()
Here I made a single subplot and assigned both scatter plots to it. It might require adjusting the color mapping and renaming the x-axis:

Python bar plot with irregular spacing

I am using a bar chart to plot query frequencies, but I consistently see uneven spacing between the bars. The gaps look like they should be related to the ticks, but they are in different positions.
This shows up in larger plots
And smaller ones
def TestPlotByFrequency(df, f_field, freq, description):
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    ax.bar(df[f_field][0:freq].index,
           df[f_field][0:freq].values)
    plt.show()
This is not related to the data either; none of the top entries have the same frequency count:
count
0 8266
1 6603
2 5829
3 4559
4 4295
5 4244
6 3889
7 3827
8 3769
9 3673
10 3606
11 3479
12 3086
13 2995
14 2945
15 2880
16 2847
17 2825
18 2719
19 2631
20 2620
21 2612
22 2590
23 2583
24 2569
25 2503
26 2430
27 2287
28 2280
29 2234
30 2138
Is there any way to make these consistent?
The problem has to do with aliasing as the bars are too thin to really be separated. Depending on the subpixel value where a bar starts, the white space will be visible or not. The dpi of the plot can either be set for the displayed figure or when saving the image. However, if you have too many bars increasing the dpi will only help a little.
As suggested in this post, you can also save the image as svg to get a vector format. Depending where you want to use it, it can be perfectly rendered.
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
matplotlib.rcParams['figure.dpi'] = 300
t = np.linspace(0.0, 2.0, 50)
s = 1 + np.sin(2 * np.pi * t)
df = pd.DataFrame({'time': t, 'voltage': s})
fig, ax = plt.subplots()
ax.bar(df['time'], df['voltage'], width = t[1]*.95)
plt.savefig("test.png", dpi=300)
plt.show()
Image with 100 dpi:
Image with 300 dpi:
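The SVG route mentioned above can be sketched as follows. This is a minimal illustration rather than the answerer's code: it reuses the same synthetic data, and it writes the SVG to an in-memory buffer (an assumption made here so the snippet leaves no file behind; passing a filename such as "test.svg" to savefig works the same way).

```python
import io

import matplotlib.pyplot as plt
import numpy as np

# Same synthetic data as in the answer above.
t = np.linspace(0.0, 2.0, 50)
s = 1 + np.sin(2 * np.pi * t)

fig, ax = plt.subplots()
# Bars slightly narrower than the sample spacing, leaving a small gap.
ax.bar(t, s, width=(t[1] - t[0]) * 0.95)

# Save as vector graphics: bar edges are stored as coordinates, not pixels,
# so the gaps are not subject to the figure's dpi.
buf = io.BytesIO()
fig.savefig(buf, format="svg")
svg_text = buf.getvalue().decode("utf-8")
```

How evenly the gaps finally appear still depends on the program that renders the SVG at a given zoom level.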

Add hue category labels in seaborn stripplot

I have two DataFrames that I am plotting as a stripplot. I am able to plot them pretty much as I wish, but I would like to know if it is possible to add the category labels for the "hue".
The plot currently looks like this:
However, I would like to add the labels of the categories (there are only two of them) to each "column" for each letter. So that it looks something like this:
The DataFrames look like this (although these are just edited snippets):
Case Letter Size Weight
0 upper A 20 bold
1 upper A 23 bold
2 lower A 61 bold
3 lower A 62 bold
4 upper A 78 bold
5 upper A 95 bold
6 upper B 23 bold
7 upper B 40 bold
8 lower B 47 bold
9 upper B 59 bold
10 upper B 61 bold
11 upper B 99 bold
12 lower C 23 bold
13 upper D 23 bold
14 upper D 66 bold
15 lower D 99 bold
16 upper E 5 bold
17 upper E 20 bold
18 upper E 21 bold
19 upper E 22 bold
...and...
Case Letter Size Weight
0 upper A 4 normal
1 upper A 6 normal
2 upper A 7 normal
3 upper A 8 normal
4 upper A 9 normal
5 upper A 12 normal
6 upper A 25 normal
7 upper A 26 normal
8 upper A 38 normal
9 upper A 42 normal
10 lower A 43 normal
11 lower A 57 normal
12 lower A 90 normal
13 upper B 4 normal
14 lower B 6 normal
15 upper B 8 normal
16 upper B 9 normal
17 upper B 12 normal
18 upper B 21 normal
19 lower B 25 normal
The relevant code I have is:
fig, ax = plt.subplots(figsize=(10, 7.5))
plt.tight_layout()
sns.stripplot(x=new_df_normal['Letter'], y=new_df_normal['Size'],
hue=new_df_normal['Case'], jitter=False, dodge=True,
size=8, ax=ax, marker='D',
palette={'upper': 'red', 'lower': 'red'})
plt.setp(ax.get_legend().get_texts(), fontsize='16') # for legend text
plt.setp(ax.get_legend().get_title(), fontsize='18') # for legend title
ax.set_xlabel("Letter", fontsize=20)
ax.set_ylabel("Size", fontsize=20)
ax.set_ylim(0, 105)
ax.tick_params(labelsize=20)
ax2 = ax.twinx()
sns.stripplot(x=new_df_bold['Letter'], y=new_df_bold['Size'],
hue=new_df_bold['Case'], jitter=False, dodge=True,
size=8, ax=ax2, marker='D',
palette={'upper': 'green', 'lower': 'green'})
ax.legend_.remove()
ax2.legend_.remove()
ax2.set_xlabel("", fontsize=20)
ax2.set_ylabel("", fontsize=20)
ax2.set_ylim(0, 105)
ax2.tick_params(labelsize=20)
Is it possible to add those category labels ("bold" and "normal") for each column?
With seaborn's scatterplot you could use the style (or even size) parameter, but you might not end up with your intended layout. See the scatterplot documentation.
Or you could use catplot and play with rows and columns. See the seaborn documentation for catplot.
Unfortunately, seaborn does not natively provide what you are looking for: another level of nesting beyond the hue parameter in stripplot (see the stripplot documentation). Some open seaborn tickets might be related (e.g. this ticket), but I have come across similar feature requests that were refused (see this ticket).
One last possibility is to dive into the matplotlib primitives to manipulate your seaborn diagram (since seaborn is built on top of matplotlib). Needless to say, it would require a lot of effort and might end up defeating the purpose of using seaborn in the first place ;)
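The catplot suggestion above can be sketched roughly like this. It is an illustration under assumptions, not a drop-in solution: it uses a handful of rows copied from the question's two tables, concatenates them into one frame (the name combined is mine), and lets col="Weight" produce one panel per weight, so "bold" and "normal" appear as panel titles rather than as labels under each letter.

```python
import pandas as pd
import seaborn as sns

# A few rows copied from the question's two DataFrames.
bold = pd.DataFrame({
    "Case": ["upper", "upper", "lower", "upper", "upper", "lower"],
    "Letter": ["A", "A", "A", "B", "B", "B"],
    "Size": [20, 23, 61, 23, 40, 47],
    "Weight": "bold",
})
normal = pd.DataFrame({
    "Case": ["upper", "upper", "lower", "upper", "lower", "upper"],
    "Letter": ["A", "A", "A", "B", "B", "B"],
    "Size": [4, 6, 43, 4, 6, 8],
    "Weight": "normal",
})

# One long frame; the Weight column tells the two sources apart.
combined = pd.concat([bold, normal], ignore_index=True)

# One strip-plot panel per Weight value, with Case dodged inside each letter.
g = sns.catplot(data=combined, x="Letter", y="Size", hue="Case",
                col="Weight", kind="strip", dodge=True, jitter=False)
g.set_titles("{col_name}")  # panel titles become "bold" and "normal"
```

This gives side-by-side panels instead of the single-axes layout in the question, which is the trade-off the answer hints at.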
Setting dodge=True enables this:
import seaborn as sns
tips = sns.load_dataset("tips")
sns.violinplot(x="day", y="total_bill", hue="smoker",
data=tips, palette="muted")
sns.stripplot(x="day", y="total_bill", hue="smoker",
data=tips, palette="muted", dodge=True)
EDIT:
And with the df provided by the OP:
df = pd.read_csv('./ongenz.tsv', sep='\t')
sns.stripplot(x=df['Letter'], y=df['Size'], data=df, hue=df['Case'], dodge=True)

get rid of tiny fraction in bar plot scale (Pandas/Python)

When I plot a bar plot (a histogram) of bins created with pd.cut, I get a funny (and very annoying!) 0.001 added to the leftmost edge of the axis, so that it starts from -1.001 instead of -1. How can I get rid of this? (Please see the figure.)
My code is:
out_i = pd.cut(df, bins=np.arange(-1,1.2,0.2), include_lowest=True)
out_i.value_counts(sort=False).plot.bar(rot=45, figsize=(6,6))
plt.tight_layout()
with df:
a
0 -0.402203
1 -0.019031
2 -0.979292
3 -0.701221
4 -0.267261
5 -0.563602
7 -0.454961
8 0.632456
9 -0.843081
10 -0.629253
11 -0.946188
12 -0.628178
13 -0.776933
14 -0.717091
15 -0.392144
16 -0.799408
17 -0.897951
18 0.255321
19 -0.641854
20 -0.356393
21 -0.507321
22 -0.698238
23 -0.985097
25 -0.661444
26 -0.751593
27 -0.437505
28 -0.413451
29 -0.798745
30 -0.736440
31 -0.672727
32 -0.807688
33 -0.087085
34 -0.393203
35 -0.979730
36 -0.902951
37 -0.454231
38 -0.561951
39 -0.388580
40 -0.706501
41 -0.408248
42 -0.377235
43 -0.283110
44 -0.517428
45 -0.949603
46 -0.268667
47 -0.376199
48 -0.472293
49 -0.211781
50 -0.921520
51 -0.345870
53 -0.542487
55 -0.597996
In case it is acceptable to chop off the decimal points of the intervals, generate a custom list of interval labels and set this as the xticklabels of the plot:
out_i = pd.cut(df['a'], bins=np.arange(-1,1.2,0.2), include_lowest=True)
intervals = out_i.cat.categories
labels = ['(%.1f, %.1f]' % (int(interval.left*100)/100, interval.right) for interval in intervals]
ax = out_i.value_counts(sort=False).plot.bar(rot=45, figsize=(6,6))
ax.set_xticklabels(labels)
plt.tight_layout()
Which results in the following plot:
Note: this will always output a half-closed interval (a,b]. It can be improved by making the brackets dynamic as per the parameters of pd.cut.
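One way to make the brackets match what pd.cut actually does is to build the labels up front and pass them through the labels parameter. The sketch below is an assumption-laden variant, not the answer's code: it uses np.linspace instead of np.arange for the edges (to avoid floating-point surprises at the end point), a short sample of values from the question plus -1.0, and the hypothetical names edges, labels and out. With include_lowest=True the first bin is closed on both sides, so it is labelled with square brackets.

```python
import numpy as np
import pandas as pd

# Bin edges -1.0, -0.8, ..., 1.0; adding 0.0 turns a possible -0.0 into 0.0.
edges = np.linspace(-1, 1, 11).round(1) + 0.0

# First bin is closed on both sides because of include_lowest=True,
# all remaining bins are half-open (a, b].
labels = [f"[{edges[0]:.1f}, {edges[1]:.1f}]"] + [
    f"({lo:.1f}, {hi:.1f}]" for lo, hi in zip(edges[1:-1], edges[2:])
]

# A few values from the question, plus -1.0 to exercise the lowest edge.
s = pd.Series([-1.0, -0.979292, -0.402203, 0.632456, 0.255321])

out = pd.cut(s, bins=edges, include_lowest=True, labels=labels)
counts = out.value_counts(sort=False)
print(counts)
```

Calling counts.plot.bar(rot=45) then shows these labels directly, with no need to overwrite the tick labels afterwards.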

IPython notebook stops evaluating cells after plt.show()

I am using IPython to do some coding. When I open the notebook and run some code with SHIFT+ENTER, it runs. But after one or two runs it stops giving any output. Why is that? I have to shut down the notebook and open it again; then it runs a few times and the same problem occurs again.
Here is the code I have used.
Question 1: Rotational Invariance of PCA
I(1): Importing the data sets and plotting a scatter plot of the two.
In [1]:
# Changing the working directory
import os
os.getcwd()
path="/Users/file/"
os.chdir(path)
pwd=os.getcwd()
print(pwd)
# Importing the libraries
import pandas as pd
import numpy as np
import scipy as sp
# Mentioning the files to be imported
file=["2d-gaussian.csv","2d-gaussian-rotated.csv"]
# Importing the two csv files in pandas dataframes
XI=pd.read_csv(file[0],header=None)
XII=pd.read_csv(file[1],header=None)
#XI
XII
Out[5]:
0 1
0 1.372310 -2.111748
1 -0.397896 1.968246
2 0.336945 1.338646
3 1.983127 -2.462349
4 -0.846672 0.606716
5 0.582438 -0.645748
6 4.346416 -4.645564
7 0.830186 -0.599138
8 -2.460311 2.096945
9 -1.594642 2.828128
10 3.767641 -3.401645
11 0.455917 -0.224665
12 2.878315 -2.243932
13 -1.062223 0.142675
14 -0.698950 1.113589
15 -4.681619 4.289080
16 0.411498 -0.041293
17 0.276973 0.187699
18 1.500835 -0.284463
19 -0.387535 -0.265205
20 3.594708 -2.581400
21 2.263455 -2.660592
22 -1.686090 1.566998
23 1.381510 -0.944383
24 -0.085535 -1.697205
25 1.030609 -1.448967
26 3.647413 -3.322129
27 -3.474906 2.977695
28 -7.930797 8.506523
29 -0.931702 1.440784
... ... ...
70 4.433750 -2.515612
71 1.495646 -0.058674
72 -0.928938 0.605706
73 -0.890883 -0.005911
74 -2.245630 1.333171
75 -0.707405 0.121334
76 0.675536 -0.822801
77 1.975917 -1.757632
78 -1.239322 2.053495
79 -2.360047 1.842387
80 2.436710 -1.445505
81 0.348497 -0.635207
82 -1.423243 -0.017132
83 0.881054 -1.823523
84 0.052809 1.505141
85 -2.466735 2.406453
86 -0.499472 0.970673
87 4.489547 -4.443907
88 -2.000164 4.125330
89 1.833832 -1.611077
90 -0.944030 0.771001
91 -1.677884 1.920365
92 0.372318 -0.474329
93 -2.073669 2.020200
94 -0.131636 -0.844568
95 -1.011576 1.718216
96 -1.017175 -0.005438
97 5.677248 -4.572855
98 2.179323 -1.704361
99 1.029635 -0.420458
100 rows × 2 columns
The two raw csv files have been imported as data frames. Next we will concatenate both the dataframes into one dataframe to plot a combined scatter plot
In [6]:
# Joining two dataframes into one.
df_combined=pd.concat([XI,XII],axis=1,ignore_index=True)
df_combined
Out[6]:
0 1 2 3
0 2.463601 -0.522861 1.372310 -2.111748
1 -1.673115 1.110405 -0.397896 1.968246
2 -0.708310 1.184822 0.336945 1.338646
3 3.143426 -0.338861 1.983127 -2.462349
4 -1.027700 -0.169674 -0.846672 0.606716
5 0.868458 -0.044767 0.582438 -0.645748
6 6.358290 -0.211529 4.346416 -4.645564
7 1.010685 0.163375 0.830186 -0.599138
8 -3.222466 -0.256939 -2.460311 2.096945
9 -3.127371 0.872207 -1.594642 2.828128
10 5.069451 0.258798 3.767641 -3.401645
11 0.481244 0.163520 0.455917 -0.224665
12 3.621976 0.448577 2.878315 -2.243932
13 -0.851991 -0.650218 -1.062223 0.142675
14 -1.281659 0.293194 -0.698950 1.113589
15 -6.343242 -0.277567 -4.681619 4.289080
16 0.320172 0.261774 0.411498 -0.041293
17 0.063126 0.328573 0.276973 0.187699
18 1.262396 0.860105 1.500835 -0.284463
19 -0.086500 -0.461557 -0.387535 -0.265205
20 4.367168 0.716517 3.594708 -2.581400
21 3.481827 -0.280818 2.263455 -2.660592
22 -2.300280 -0.084211 -1.686090 1.566998
23 1.644655 0.309095 1.381510 -0.944383
24 1.139623 -1.260587 -0.085535 -1.697205
25 1.753325 -0.295824 1.030609 -1.448967
26 4.928210 0.230011 3.647413 -3.322129
27 -4.562678 -0.351581 -3.474906 2.977695
28 -11.622940 0.407100 -7.930797 8.506523
29 -1.677601 0.359976 -0.931702 1.440784
... ... ... ... ...
70 4.913941 1.356329 4.433750 -2.515612
71 1.099070 1.016093 1.495646 -0.058674
72 -1.085156 -0.228560 -0.928938 0.605706
73 -0.625769 -0.634129 -0.890883 -0.005911
74 -2.530594 -0.645206 -2.245630 1.333171
75 -0.586007 -0.414415 -0.707405 0.121334
76 1.059484 -0.104132 0.675536 -0.822801
77 2.640018 0.154351 1.975917 -1.757632
78 -2.328373 0.575707 -1.239322 2.053495
79 -2.971570 -0.366041 -2.360047 1.842387
80 2.745141 0.700888 2.436710 -1.445505
81 0.695584 -0.202735 0.348497 -0.635207
82 -0.994271 -1.018499 -1.423243 -0.017132
83 1.912425 -0.666426 0.881054 -1.823523
84 -1.026954 1.101637 0.052809 1.505141
85 -3.445865 -0.042626 -2.466735 2.406453
86 -1.039549 0.333189 -0.499472 0.970673
87 6.316906 0.032272 4.489547 -4.443907
88 -4.331379 1.502719 -2.000164 4.125330
89 2.435918 0.157511 1.833832 -1.611077
90 -1.212710 -0.122350 -0.944030 0.771001
91 -2.544347 0.171460 -1.677884 1.920365
92 0.598670 -0.072133 0.372318 -0.474329
93 -2.894802 -0.037809 -2.073669 2.020200
94 0.504119 -0.690281 -0.131636 -0.844568
95 -1.930254 0.499670 -1.011576 1.718216
96 -0.715406 -0.723096 -1.017175 -0.005438
97 7.247917 0.780923 5.677248 -4.572855
98 2.746180 0.335849 2.179323 -1.704361
99 1.025371 0.430754 1.029635 -0.420458
100 rows × 4 columns
Plotting two separate scatter plot of all the four columns onto one scatter diagram
In [ ]:
import matplotlib.pyplot as plt
# Function for scatter plot
def scatter_plot():
    # plots scatter for first two columns (unrotated Gaussian data)
    plt.scatter(df_combined.iloc[:, 0], df_combined.iloc[:, 1], color='red', marker='+')
    # plots scatter for rotated Gaussian data
    plt.scatter(df_combined.iloc[:, 2], df_combined.iloc[:, 3], color='green', marker='x')
    legend = plt.legend(loc='upper right')
    # set ranges of x and y axes
    plt.xlim([-12, 12])
    plt.ylim([-12, 12])
    plt.show()

# Function call
scatter_plot()
In [ ]:
def plot_me1():
    # create figure and axes
    fig = plt.figure()
    # split the page into a 1x1 array of subplots and put me in the first one (111)
    # (as a matter of fact, the only one)
    ax = fig.add_subplot(111)
    # plots scatter for x, y1
    ax.scatter(df_combined.iloc[:, 0], df_combined.iloc[:, 1], color='red', marker='+', s=100)
    # plots scatter for x, y2
    ax.scatter(df_combined.iloc[:, 2], df_combined.iloc[:, 3], color='green', marker='x', s=100)
    plt.xlim([-12, 12])
    plt.ylim([-12, 12])
    plt.show()

plot_me1()
In [ ]:
You should not use plt.show() in the notebook. This will open an external window that blocks the evaluation of your cell.
Instead begin your notebooks with %matplotlib inline or the cool new %matplotlib notebook (the latter is only possible with matplotlib >= 1.4.3 and ipython >= 3.0)
After the evaluation of each cell, the (still open) figure object is automatically shown in your notebook.
This minimal code example works in notebook. Note that it does not call plt.show()
%matplotlib inline
import matplotlib.pyplot as plt
x = [1,2,3]
y = [3,2,1]
_ = plt.plot(x,y)
%matplotlib inline simply displays the image.
%matplotlib notebook was added recently and offers many of the cool features (zooming, measuring,...) of the interactive backends:
