Plotting rows corresponding to a given y-value separately - python

Suppose I have a table of data:
No. 200 400 600 800
1 13 14 17 18
2 16 18 20 21
3 20 15 18 19
and so on...
where each column represents a y-value for a given x-value. The first line is the x-value and the first column is the number of each dataset.
How can I read in and plot each row separately?
For an idea of how I would like my results to be for the table I have quoted above see the following images. I have plotted each plot individually.
http://postimg.org/image/yw46zw7er/92d01c08/
http://postimg.org/image/c1kf2nqwp/29a8b1c8/

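Matplotlib plots a 2D array column by column, so here you just need to transpose your data. Assuming the data is in a whitespace-delimited text file called data.csv:
import numpy as np
import matplotlib.pyplot as plt
# Skip the header row ("No. 200 400 ..."), which cannot be parsed as numbers
data = np.loadtxt('data.csv', skiprows=1)
x = [200, 400, 600, 800]
# Drop the first column (dataset number) and transpose so each row becomes one line
plt.plot(x, data[:, 1:].T)
# Label each line with its dataset number
plt.legend([str(int(n)) for n in data[:, 0]])
plt.show()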
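If each row should end up in its own plot rather than sharing one set of axes, a minimal sketch along the same lines is shown below. It assumes the table from the question, pasted inline so the example is self-contained, and takes the x-values from the header row. The subplot layout and the titles are illustrative choices, not part of the original answer.

```python
import io

import numpy as np
import matplotlib.pyplot as plt

# Inline copy of the table from the question; with a real file,
# read its text instead of using this string.
text = """No. 200 400 600 800
1 13 14 17 18
2 16 18 20 21
3 20 15 18 19
"""

# The x-values are the header entries after the "No." label.
x = np.array(text.splitlines()[0].split()[1:], dtype=float)

# Remaining rows: the first column is the dataset number, the rest are y-values.
table = np.loadtxt(io.StringIO(text), skiprows=1)
ids = table[:, 0].astype(int)
ys = table[:, 1:]

# One subplot per row, sharing the x-axis.
fig, axes = plt.subplots(len(ids), 1, sharex=True, figsize=(6, 2.5 * len(ids)))
axes = np.atleast_1d(axes)
for ax, n, y in zip(axes, ids, ys):
    ax.plot(x, y, marker='o')
    ax.set_title(f"Dataset {n}")
    ax.set_ylabel("y")
axes[-1].set_xlabel("x")
fig.tight_layout()
```

Calling plt.show() afterwards displays the figure. To get fully separate figures instead, call plt.figure() inside the loop and plot there.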

Related

Custom Pandas box plot

Random Data:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = pd.DataFrame(np.random.normal(size=(20,4)))
#Rename columns
data = data.set_axis(['Column A', 'Column B', 'Column C', 'Column D'], axis=1)
data
Column A Column B Column C Column D
0 -2.421964 1.053834 -0.522569 -0.820083
1 -0.334253 1.275719 0.590327 -1.100766
2 -3.410730 -0.738087 -1.619469 1.300860
3 1.808026 -0.490364 -0.433812 0.574514
4 -0.628401 -1.098690 -0.537222 0.601859
5 -0.953888 1.118034 -2.304954 -1.723802
6 0.856820 0.204850 -0.464042 -2.653982
7 -1.328367 1.057808 1.083722 0.120027
8 -1.150053 1.457841 -1.592256 -0.362547
9 -0.449330 0.714787 -0.134940 0.098445
10 -1.793606 0.858645 0.800018 0.191496
11 -0.967488 1.504187 -1.376536 0.251128
12 1.231656 0.984247 -0.975960 0.619155
13 -0.930387 -1.144732 -1.761642 1.983434
14 0.857593 0.580386 -0.119221 -0.513108
15 0.985186 -0.992795 2.154594 0.458575
16 -0.937518 0.548841 -0.536350 -0.925943
17 0.626230 -0.339900 0.027100 -0.209365
18 -0.538961 1.036132 -0.451085 -0.865581
19 0.078272 -0.773970 0.010077 -1.766130
data.boxplot(vert=False, figsize=(15,10), patch_artist=True)
I want to implement the following additions to my data box plot (see example picture below):
To the right of the box plot, we have particular values (4.10% and 3.52%) from the corresponding columns. For example, the last non-NaN values in each column.
To the left of the box plot, we append the percentile (within their columns) of the aforementioned values. For example, within the first column ("12M EU HY Corp") 4.10% is the 86th percentile.
How can I recreate something like this for my plot? I'm more so lost about how to insert the values to the right of the box plot. For the percentiles, I figured I could just concatenate a string representation of the percentile to the name of each column.
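One possible approach, sketched below under a few assumptions: the "particular value" is taken to be the last non-NaN entry of each column, its percentile is computed as the share of the column's values that are less than or equal to it, and the value is written just outside the right edge of the axes with ax.text, using a transform that mixes axes coordinates (x) with data coordinates (y). The random seed, the label format and the 1.01 offset are illustrative.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Random data shaped like the example in the question (seed chosen arbitrarily).
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(20, 4)),
                    columns=['Column A', 'Column B', 'Column C', 'Column D'])

# Last non-NaN value of each column.
last = data.apply(lambda s: s.dropna().iloc[-1])

# Percentile of that value within its own column:
# the percentage of the column's values that are <= it.
pct = pd.Series({c: (data[c].dropna() <= last[c]).mean() * 100
                 for c in data.columns})

fig, ax = plt.subplots(figsize=(15, 10))
data.boxplot(vert=False, patch_artist=True, ax=ax)

# Boxes are drawn at positions 1..n; put the percentile into the left-hand labels.
positions = list(range(1, len(data.columns) + 1))
ax.set_yticks(positions)
ax.set_yticklabels([f"{c} (percentile: {pct[c]:.0f})" for c in data.columns])

# Write each value to the right of the plot:
# x is in axes coordinates (1.0 = right edge), y is in data coordinates.
for pos, c in zip(positions, data.columns):
    ax.text(1.01, pos, f"{last[c]:.2f}",
            transform=ax.get_yaxis_transform(), va='center', ha='left')
```

Because the text sits outside the axes area, the figure may need extra right margin (for example via fig.subplots_adjust(right=0.9)) so the values are not clipped when saving.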

Plot with Histogram an attribute from a dataframe

I have a dataframe with the weight and the number of measures of each user. The df looks like:
id_user weight number_of_measures
1 92.16 4
2 80.34 5
3 71.89 11
4 81.11 7
5 77.23 8
6 92.37 2
7 88.18 3
I would like to see a histogram with an attribute of the table on the x-axis (weight, but I want to do it for both attributes) and the frequency on the y-axis.
Does anyone know how to do it with matplotlib?
Ok, it seems to be quite easy:
import pandas as pd
import matplotlib.pyplot as plt
hist = df.hist(bins=50)
plt.show()
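Note that df.hist() draws a histogram for every numeric column, including id_user. A minimal sketch that restricts the plot to the two attributes of interest is shown below. It rebuilds the example table by hand, and the bin count of 5 is an arbitrary choice for such a small sample.

```python
import pandas as pd
import matplotlib.pyplot as plt

# The example table from the question, rebuilt by hand.
df = pd.DataFrame({
    "id_user": [1, 2, 3, 4, 5, 6, 7],
    "weight": [92.16, 80.34, 71.89, 81.11, 77.23, 92.37, 88.18],
    "number_of_measures": [4, 5, 11, 7, 8, 2, 3],
})

# One histogram per attribute: the attribute on the x-axis, frequency on the y-axis.
columns = ["weight", "number_of_measures"]
fig, axes = plt.subplots(1, len(columns), figsize=(10, 4))
counts = {}
for ax, col in zip(axes, columns):
    counts[col], edges, _ = ax.hist(df[col], bins=5, edgecolor="black")
    ax.set_xlabel(col)
    ax.set_ylabel("Frequency")
fig.tight_layout()
```

The pandas shortcut df.hist(column=columns, bins=5) gives a similar result with less control over the labels.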

Python- compress lower end of y-axis in contourf plot

The issue
I have a contourf plot I made with a pandas dataframe that plots some 2-dimensional value with time on the x-axis and vertical pressure level on the y-axis. The field, time, and pressure data I'm pulling is all from a netCDF file. I can plot it fine, but I'd like to scale the y-axis to better represent the real atmosphere. (The default scaling is linear, but the pressure levels in the file imply a different kind of scaling.) Basically, it should look something like the plot below on the y-axis. It's like a log scale, but compressing the bottom part of the axis instead of the top. (I don't know the term for this... like a log scale but inverted?) It doesn't need to be exact.
Working example (written in Jupyter notebook)
#modules
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import ticker, colors
#data
time = np.arange(0,10)
lev = np.array([900,800,650,400,100])
df = pd.DataFrame(np.arange(50).reshape(5,10),index=lev,columns=time)
df.index.name = 'Level'
print(df)
0 1 2 3 4 5 6 7 8 9
Level
900 0 1 2 3 4 5 6 7 8 9
800 10 11 12 13 14 15 16 17 18 19
650 20 21 22 23 24 25 26 27 28 29
400 30 31 32 33 34 35 36 37 38 39
100 40 41 42 43 44 45 46 47 48 49
#lists for plotting
levtick = np.arange(len(lev))
clevels = np.arange(0,55,5)
#Main plot
fig, ax = plt.subplots(figsize=(10, 5))
im = ax.contourf(df,levels=clevels,cmap='RdBu_r')
#x-axis customization
plt.xticks(time)
ax.set_xticklabels(time)
ax.set_xlabel('Time')
#y-axis customization
plt.yticks(levtick)
ax.set_yticklabels(lev)
ax.set_ylabel('Pressure')
#title and colorbar
ax.set_title('Some mean time series')
cbar = plt.colorbar(im,values=clevels,pad=0.01)
tick_locator = ticker.MaxNLocator(nbins=11)
cbar.locator = tick_locator
cbar.update_ticks()
The Question
How can I scale the y-axis such that values near the bottom (900, 800) are compressed while values near the top (200) are expanded and given more plot space, like in the sample above my code? I tried using ax.set_yscale('function', functions=(forward, inverse)) but didn't understand how it works. I also tried simply ax.set_yscale('log'), but log isn't what I need.
You can use a custom scale transformation with ax.set_yscale('function', functions=(forward, inverse)) as you suggested. From the documentation:
forward and inverse are callables that return the scale transform
and its inverse.
In this case, define in forward() the function you want, such as the inverse of the log function, or a more custom one for your need. Call this function before your y-axis customization.
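# Forward transform: exponential, so the lower part of the axis is compressed
def forward(x):
    return 2**x
# Inverse transform: must undo forward()
def inverse(x):
    return np.log2(x)
ax.set_yscale('function', functions=(forward,inverse))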
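For reference, here is a self-contained sketch of the scale applied to the example data from the question. One assumption worth stating: because contourf(df) plots against row positions 0..4 rather than against the pressure values themselves, the 2**y transform acts on those positions, and the pressure values only appear as tick labels. The base of 2 is just one choice, and a larger base would compress the bottom more strongly.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Example data from the question.
time = np.arange(0, 10)
lev = np.array([900, 800, 650, 400, 100])
df = pd.DataFrame(np.arange(50).reshape(5, 10), index=lev, columns=time)

def forward(y):
    # Exponential stretch: equal steps in y get more space the higher they are.
    return 2.0 ** y

def inverse(y):
    # Must be the exact inverse of forward().
    return np.log2(y)

fig, ax = plt.subplots(figsize=(10, 5))
im = ax.contourf(df.to_numpy(), levels=np.arange(0, 55, 5), cmap='RdBu_r')

# Apply the custom scale first, then place the ticks at the row positions.
ax.set_yscale('function', functions=(forward, inverse))
ax.set_yticks(np.arange(len(lev)))
ax.set_yticklabels(lev)
ax.set_xlabel('Time')
ax.set_ylabel('Pressure')
fig.colorbar(im, pad=0.01)
```

With this scale the gap between the 900 and 800 rows becomes much smaller than the gap between the 400 and 100 rows.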

Seaborn distplot only whole numbers

How can I make a distplot with seaborn to only have whole numbers?
My data is an array of numbers between 0 and ~18. I would like to plot the distribution of the numbers.
Impressions
0 210
1 1084
2 2559
3 4378
4 5500
5 5436
6 4525
7 3329
8 2078
9 1166
10 586
11 244
12 105
13 51
14 18
15 5
16 3
dtype: int64
Code I'm using:
sns.distplot(Impressions,
             # bins=np.arange(Impressions.min(), Impressions.max() + 1),
             # kde=False,
             axlabel=False,
             hist_kws={'edgecolor':'black', 'rwidth': 1})
plt.xticks = range(current.Impressions.min(), current.Impressions.max() + 1, 1)
Plot looks like this:
What I'm expecting:
The xlabels should be whole numbers
Bars should touch each other
The kde line should simply connect the top of the bars. By the looks of it, the current one assumes to have 0s between (x, x + 1), hence why the downward spike (This isn't required, I can turn off kde)
Am I using the correct tool for the job, or should distplot not be used for whole numbers?
Your problem can be solved with the code below:
import seaborn as sns # for data visualization
import numpy as np # for numeric computing
import matplotlib.pyplot as plt # for data visualization
arr = np.array([1,2,3,4,5,6,7,8,9])
sns.distplot(arr, bins = arr, kde = False)
plt.xticks(arr)
plt.show()
In this way, you can plot a histogram using the seaborn sns.distplot() function.
Note: Whatever data you pass to bins and plt.xticks() should be in ascending order.
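One caveat with passing the integers themselves as bin edges: the last bin is closed on both sides, so the two largest values share a bar. A common workaround, sketched here with plain matplotlib and made-up sample values, is to put the bin edges at half-integers so every whole number gets its own bar centered on its tick.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up integer sample, standing in for the real data.
values = np.array([0, 1, 1, 2, 2, 2, 3, 3, 4])

# Edges at -0.5, 0.5, 1.5, ... so each integer falls in the middle of its own bin.
edges = np.arange(values.min() - 0.5, values.max() + 1.5, 1.0)

fig, ax = plt.subplots()
counts, _, _ = ax.hist(values, bins=edges, edgecolor='black')

# Whole-number tick labels, one per bar.
ax.set_xticks(np.arange(values.min(), values.max() + 1))
```

Since the bins are contiguous and no rwidth is set, the bars touch each other. The same edges array can also be passed as bins to the seaborn call.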

Multiple plots in python

I want to plot a curve on an image. I would like to see the curve only in a certain range. So:
plt.figure()
plt.imshow(img)
plt.plot(x, my_curve)
plt.axis([0, X, Y, 0])
But this way the image is also shown only in that range, which I don't want. I would like to see the whole image with a portion of the curve. How can I apply the axis limits to the second plot only?
Note that I can't use a slice of the arrays. I am in this situation:
x = [0, 0, 0, 10, 10, 10, 30, 30, 30, 40, 40, 40]
my_curve = [0, 0, 0, 10, 10, 10, 30, 30, 30, 40, 40, 40]
Well I need to see the straight line on the image, but only between pixels 25 and 35. If I delete each element out of such range, I obtain only the point (30,30) and I can not represent the straight line.
If your data is sparse, you can interpolate it:
x2=np.linspace(x[0],x[-1],1000)[0:X]
my_curve2=np.interp(x2,x,my_curve)
plt.plot(x2, my_curve2)
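A variation on the same idea, sketched below, samples the interpolated curve directly over the wanted range (pixels 25 to 35 in the question) instead of slicing by index. The blank 50x50 image is a placeholder, and removing the repeated x-values with np.unique is an added precaution because np.interp expects increasing sample points.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder image; replace with the real one.
img = np.zeros((50, 50))

# Sparse curve from the question.
x = np.array([0, 0, 0, 10, 10, 10, 30, 30, 30, 40, 40, 40], dtype=float)
my_curve = x.copy()

# Keep one sample per distinct x-value (np.unique also sorts them).
xu, idx = np.unique(x, return_index=True)
yu = my_curve[idx]

# Sample the curve only where it should be visible.
lo, hi = 25, 35
x2 = np.linspace(lo, hi, 200)
my_curve2 = np.interp(x2, xu, yu)

fig, ax = plt.subplots()
ax.imshow(img)                         # the image keeps its full extent
line, = ax.plot(x2, my_curve2, color='red')
```

No call to plt.axis() is needed here: as long as the curve segment lies inside the image, the limits set by imshow should stay as they are, so the whole image remains visible.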
