Seaborn custom axis scale: matplotlib.scale.FuncScale - python

I'm trying to figure out how to get a custom scale for my axis. My x-axis goes from 0 to 1,000,000 in steps of 100,000, but I want to scale each of these numbers by 1/1000, so that they go from 0 to 1,000 in steps of 100. I found matplotlib.scale.FuncScale, but I'm having trouble getting it to work.
Here's what the plot currently looks like:
My code looks like this:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
dataPlot = pd.DataFrame({"plot1" : [1, 2, 3], "plot2" : [4, 5, 6], "plot3" : [7, 8, 9]})
ax = sns.lineplot(data = dataPlot, dashes = False, palette = ["blue", "red", "green"])
ax.set_xlim(1, numRows)
ax.set_xticks(range(0, numRows, 100000))
plt.ticklabel_format(style='plain')
plt.scale.FuncScale("xaxis", ((lambda x : x / 1000), (lambda y : y * 1000)))
When I run this code specifically, I get AttributeError: module 'matplotlib.pyplot' has no attribute 'scale', so I tried adding import matplotlib as mpl to the top of the code and then changing the last line to be mpl.scale.FuncScale("xaxis", ((lambda x : x / 1000), (lambda y : y * 1000))) and that actually ran without error, but it didn't change anything.
How can I get this to properly scale the axis?

Based on the clarification in the question comments, a straightforward solution is to scale the x-axis data in the dataframe (in the question's case the x-data is the df index) and then plot.
I'm using example data, since the code from the question doesn't run on its own.
The x range starts as 0 to 100 and is scaled to 0 to 10, but the same approach works for any other starting range and scaling factor.
1st, the default df.plot (just as a reference):
import pandas as pd
import numpy as np
arr = np.arange(0, 101, 1) * 1.5
df = pd.DataFrame(arr, columns=['y_data'])
print(df)
y_data
0 0.0
1 1.5
2 3.0
3 4.5
4 6.0
.. ...
96 144.0
97 145.5
98 147.0
99 148.5
100 150.0
df.plot()
Note that by default df.plot uses the index as the x-axis.
2nd, scaling the x-data in the dataframe:
The intermediate dfs are displayed only so you can follow along.
Preparation
df.reset_index(inplace=True)
Getting the original index data as a column to further work with (see scaling below).
index y_data
0 0 0.0
1 1 1.5
2 2 3.0
3 3 4.5
4 4 6.0
.. ... ...
96 96 144.0
97 97 145.5
98 98 147.0
99 99 148.5
100 100 150.0
df = df.rename(columns = {'index':'x_data'}) # just to be more explicit
x_data y_data
0 0 0.0
1 1 1.5
2 2 3.0
3 3 4.5
4 4 6.0
.. ... ...
96 96 144.0
97 97 145.5
98 98 147.0
99 99 148.5
100 100 150.0
Scaling
df['x_data'] = df['x_data'].apply(lambda x: x/10)
x_data y_data
0 0.0 0.0
1 0.1 1.5
2 0.2 3.0
3 0.3 4.5
4 0.4 6.0
.. ... ...
96 9.6 144.0
97 9.7 145.5
98 9.8 147.0
99 9.9 148.5
100 10.0 150.0
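The preparation and scaling steps above can also be written more compactly. This is only a sketch of the same idea, assuming the same example data; the name scaled is made up for illustration, and plain division is used instead of apply with a lambda, which gives the same values.

```python
import numpy as np
import pandas as pd

# Same example data as above: 101 rows, the index is the x-data.
df = pd.DataFrame({"y_data": np.arange(0, 101) * 1.5})

# Name the index, move it into a regular column, then scale it.
# "scaled" is a hypothetical name used only in this sketch.
scaled = df.rename_axis("x_data").reset_index()
scaled["x_data"] = scaled["x_data"] / 10  # vectorized, same result as apply(lambda x: x/10)

print(scaled.tail())
```

The division works on the whole column at once, so no per-row lambda is needed.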
3rd df.plot with specific columns:
df.plot(x='x_data', y = 'y_data')
Passing x= makes df.plot use that column as the x-axis instead of the default (the index).
Note that the y data hasn't changed but the x-axis is now scaled compared to the "1st the default df.plot" above.
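As an alternative that is not part of the answer above: if only the tick labels should change and the data should stay untouched, a tick formatter may be enough. Calling mpl.scale.FuncScale(...) on its own only creates a scale object and never attaches it to the axis, which is probably why nothing changed in the question. Even when a function scale is attached with ax.set_xscale("function", functions=(forward, inverse)), it changes where values are placed along the axis, not the numbers printed on the ticks. The sketch below uses matplotlib.ticker.FuncFormatter; the helper names scaled_label and plot_with_scaled_labels and the example data are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter


def scaled_label(value, _pos):
    """Show a tick at data value `value` as value / 1000 (hypothetical helper)."""
    return f"{value / 1000:g}"


def plot_with_scaled_labels():
    # Made-up data spanning 0 to 1,000,000, as in the question.
    x = np.arange(0, 1_000_001, 100_000)
    y = x * 1.5
    fig, ax = plt.subplots()
    ax.plot(x, y)
    ax.set_xticks(x)
    # Only the labels are rescaled; the plotted data is unchanged.
    ax.xaxis.set_major_formatter(FuncFormatter(scaled_label))
    return ax


# ax = plot_with_scaled_labels(); plt.show()
```

With a seaborn plot the same formatter can be set on the Axes object that sns.lineplot returns.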

Related

Finding peaks in pandas series with non integer index

I have the following series and am trying to find the index of the peaks, which should be [1, 8.5], or the peak values, which should be [279, 139]. The threshold used is 100. I tried many approaches, but the result always ignores the series index and returns [1, 16].
0.5 0
1.0 279
1.5 256
2.0 84
2.5 23
3.0 11
3.5 3
4.0 2
4.5 7
5.0 5
5.5 4
6.0 4
6.5 10
7.0 30
7.5 88
8.0 133
8.5 139
9.0 84
9.5 55
10.0 26
10.5 10
11.0 8
11.5 4
12.0 4
12.5 1
13.0 0
13.5 0
14.0 1
14.5 0
I tried this code
thresh = 100
peak_idx, _ = find_peaks(out.value_counts(sort=False), height=thresh)
plt.plot(out.value_counts(sort=False).index[peak_idx], out.value_counts(sort=False)[peak_idx], 'r.')
out.value_counts(sort=False).plot.bar()
plt.show()
peak_idx
here is the output
array([ 1, 16], dtype=int64)
You are doing it right; the only thing you misunderstood is that find_peaks returns the positions (integer indices) of the peaks in the input array, not the peak values themselves and not the labels of your series index.
Here is the documentation that clearly states that:
Returns
peaks : ndarray
Indices of peaks in x that satisfy all given conditions.
Reference: https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks.html
Try this code here:
from scipy.signal import find_peaks
thresh = 100
y = [0,279,256, 84, 23, 11, 3, 2, 7, 5, 4, 4, 10, 30, 88,133,139, 84, 55, 26, 10, 8, 4, 4, 1, 0, 0, 1, 0]
x = [0.5 ,1.0 ,1.5 ,2.0 ,2.5 ,3.0 ,3.5 ,4.0 ,4.5 ,5.0 ,5.5 ,6.0 ,6.5 ,7.0 ,7.5 ,8.0 ,8.5 ,9.0 ,9.5 ,10.0,10.5,11.0,11.5,12.0,12.5,13.0,13.5,14.0,14.5]
# run find_peaks on the values (y); it returns positions into y
peak_idx, _ = find_peaks(y, height=thresh)
# map the positions back to the x values (the series index)
out_values = [x[peak] for peak in peak_idx]
Here out_values will contain what you want: [1.0, 8.5].
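The same mapping can be done directly on a pandas Series with a non-integer index. This is a minimal sketch, assuming the counts from the question are held in a Series; the names counts, peak_pos, peak_index and peak_values are made up for illustration.

```python
import pandas as pd
from scipy.signal import find_peaks

values = [0, 279, 256, 84, 23, 11, 3, 2, 7, 5, 4, 4, 10, 30, 88, 133, 139,
          84, 55, 26, 10, 8, 4, 4, 1, 0, 0, 1, 0]
# Index 0.5, 1.0, ..., 14.5, as in the question.
counts = pd.Series(values, index=[0.5 * (i + 1) for i in range(len(values))])

# find_peaks only sees the raw values and returns integer positions.
peak_pos, _ = find_peaks(counts.to_numpy(), height=100)

peak_index = counts.index[peak_pos]   # index labels of the peaks
peak_values = counts.iloc[peak_pos]   # peak heights

print(list(peak_pos), list(peak_index), list(peak_values))
```

Using counts.index[...] and counts.iloc[...] keeps the positional result of find_peaks separate from the label-based index of the Series.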

Getting several cylinders from 3D scatter plot

I have a 3D scatter plot that looks like "tubes", which is in fact what it should display. Currently every "tube" consists of 40 markers. What I am trying to achieve is that these 40 markers together form a cylinder that looks like a tube, with the position taken from X, Y and Z and the coloration from C.
X = df['Tube']
Y = df['Window']
C = df['Value']
Z = df['Depth']
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter3D(X,Y,Z, marker='o',s=50, c=C, cmap = 'Reds',edgecolors= "black")
df
Tube Window Value Depth
0 1 1 0.000383 -0.1
1 1 2 0.023253 -0.1
2 1 3 0.022623 -0.1
3 1 4 0.003599 -0.1
4 1 5 0.001281 -0.1
... ... ... ... ...
2155 54 36 0.020977 -1.2
2156 54 37 0.000000 -1.2
2157 54 38 0.007104 -1.2
2158 54 39 0.015233 -1.2
2159 54 40 0.000000 -1.2
Does anybody have any idea how this might be possible?
It seems to work with mayavi.mlab.
from mayavi.mlab import *
from mayavi import mlab
from PyQt5 import QtWidgets
X = df['Tube']
Y = df['Window']
C = df['Value']
Z = df['Depth']
plot3d(X, Y, Z, C, tube_radius=0.25, colormap='Reds')
mlab.show()

for loop to plot multiple graph in one diagram

I am trying to plot multiple graphs in one diagram. I am planning to do it with a for loop.
x = df1['mrwSmpVWi']
c = df['c']
a = df['a']
b = df['b']
y = (c / (1 + (a) * np.exp(-b*(x))))
for number in df.Seriennummer:
    plt.plot(x,y, linewidth = 4)
    plt.title("TEST")
    plt.xlabel('Wind in m/s')
    plt.ylabel('Leistung in kWh')
    plt.xlim(0,25)
    plt.ylim(0,1900)
    plt.show()
The calculation doesn't work: I just get dots in the diagram, and I get 3 different diagrams.
This is the df:
Seriennummer c a b
0 701085 1526 256 0.597
1 701086 1193 271 0.659
2 701087 1266 217 0.607
Does someone know what I did wrong?
df1 has about 500,000 rows. This is a part of df1:
Seriennummer mrwSmpVWi mrwSmpP
422 701087.0 2.9 25.0
423 701090.0 3.9 56.0
424 701088.0 3.2 22.0
425 701086.0 4.0 49.0
426 701092.0 3.7 46.0
427 701089.0 3.3 0.0
428 701085.0 2.4 4.0
429 701091.0 3.6 40.0
430 701087.0 2.7 11.0
431 701090.0 3.1 23.0
432 701086.0 3.6 35.0
The expected output should be a diagram with multiple logistic graphs.
EDIT:
I guess you are using matplotlib. You can use something like
import matplotlib.pyplot as plt
# some calculations for x and y ...
fig, ax = plt.subplots(ncols=1,nrows=1)
for i in range(10):
    ax.plot(x[i],y[i])
plt.show()
Further information can be found in the matplotlib subplots documentation:
https://matplotlib.org/api/_as_gen/matplotlib.pyplot.subplots.html
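Because your problem is related to the pandas data frames, try something like
for number in df.Seriennummer:
    # parameters c, a, b for this serial number (one row of df)
    row = df.loc[df['Seriennummer'] == number].iloc[0]
    # wind speeds measured for this serial number, sorted so the line is drawn left to right
    x = df1.loc[df1['Seriennummer'] == number, 'mrwSmpVWi'].sort_values()
    y = row['c'] / (1 + row['a'] * np.exp(-row['b'] * x))
    plt.plot(x, y, linewidth = 4)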
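Here is a self-contained sketch of the same idea that draws all curves into one Axes. The dots in the question are probably caused by pandas aligning x (from df1) with c, a and b (from df) on their index labels, so only the few rows whose labels exist in both frames produce a value; the separate diagrams would come from calling plt.show() inside the loop. The sketch assumes the three parameter rows from the question and, as a simplification, evaluates each curve on an evenly spaced wind grid instead of the measured values in df1. The names params, logistic and plot_power_curves are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Parameter table from the question.
params = pd.DataFrame({
    "Seriennummer": [701085, 701086, 701087],
    "c": [1526, 1193, 1266],
    "a": [256, 271, 217],
    "b": [0.597, 0.659, 0.607],
})


def logistic(x, c, a, b):
    """Logistic curve c / (1 + a * exp(-b * x))."""
    return c / (1 + a * np.exp(-b * x))


def plot_power_curves(params):
    wind = np.linspace(0, 25, 200)  # assumed grid, not the measured data
    fig, ax = plt.subplots()
    for row in params.itertuples(index=False):
        # One line per serial number, all on the same Axes.
        ax.plot(wind, logistic(wind, row.c, row.a, row.b),
                linewidth=2, label=str(row.Seriennummer))
    ax.set_xlabel("Wind in m/s")
    ax.set_ylabel("Leistung in kWh")
    ax.set_xlim(0, 25)
    ax.set_ylim(0, 1900)
    ax.legend(title="Seriennummer")
    return ax


# ax = plot_power_curves(params); plt.show()  # call show once, after the loop
```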

How to make a scatter plot with varying scatter size and color corresponding to a range of values from a dataframe?

I have a Dataframe
df =
Magnitude,Lon,Lat,Depth
3.5 33.3 76.2 22
3.5 33.1 75.9 34
2.5 30.5 79.6 25
5.5 30.4 79.5 40
5.1 32 78.8 58
4.5 31.5 74 NaN
2.1 33.9 74.7 64
5.1 30.8 79.1 33
1.1 32.6 78.2 78
NaN 33.3 76 36
5.2 32.7 79.5 36
NaN 33.6 78.6 NaN
I want to make a scatter plot with Lon on the x-axis and Lat on the y-axis, with scatter points of different sizes according to the range of values in Magnitude:
size = 1: Magnitude < 2, size = 1.5: 2 < Magnitude < 3, size = 2: 3 < Magnitude < 4, size = 2.5: Magnitude > 4,
and with different colours according to the range of values in Depth:
color = red: Depth < 30, color = blue: 30 < Depth < 40, color = black: 40 < Depth < 60, color = yellow: Depth > 60.
I am thinking of solving this problem by defining a dictionary for the size and color (just giving the idea; I need the correct syntax).
More like
def magnitude_size(df.Magnitude):
    if df.Magnitude < 2 :
        return 1
    if df.Magnitude > 2 and df.Magnitude < 3 :
        return 1.5
    if df.Magnitude > 3 and df.Magnitude < 4 :
        return 2
    if df.Magnitude > 4 :
        return 2.5
def depth_color(df.Depth):
    if df.Depth < 30 :
        return 'red'
    if df.Depth > 30 and df.Depth < 40 :
        return 'blue'
    if df.Depth > 40 and df.Depth < 60 :
        return 'black'
    if df.Depth > 60 :
        return 'yellow'
di = {
    'size': magnitude_size(df.Magnitude),
    'color' : depth_color(df.Depth)
}
plt.scatter(df.Lon,df.Lat,c=di['color'],s=di['size'])
plt.show()
If there are any NaN values in Magnitude, give the scatter point a different symbol, and if there are any NaN values in Depth, give it a different color (green).
NEED HELP
You could use pandas.cut to create a couple of helper columns in df based on your color and size mappings. This should make it easier to pass these arguments to pyplot.scatter.
N.B. It's worth noting that the values you've chosen for size may not distinguish the markers very well in the plot - it'd be worth experimenting with different sizes until you get the desired results
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df['color'] = pd.cut(df['Depth'], bins=[-np.inf, 30, 40, 60, np.inf], labels=['red', 'blue', 'black', 'yellow'])
df['size'] = pd.cut(df['Magnitude'], bins=[-np.inf, 2, 3, 4, np.inf], labels=[1, 1.5, 2, 2.5])
plt.scatter(df['Lon'], df['Lat'], c=df['color'], s=df['size'])
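The pd.cut result is categorical and contains NaN wherever Depth or Magnitude is missing, which may need some handling before it is passed to scatter. The following is a sketch under a few assumptions: rows with NaN Depth are coloured green as the question asks, rows with NaN Magnitude get a default size rather than a different symbol, and the sizes are multiplied by an arbitrary factor of 40 so the markers are easier to tell apart. The names marker_styles, plot_quakes, nan_size and nan_color are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def marker_styles(df, nan_size=1.0, nan_color="green"):
    """Return (sizes, colors) lists for scatter; NaN rows get the defaults."""
    size = pd.cut(df["Magnitude"], bins=[-np.inf, 2, 3, 4, np.inf],
                  labels=[1, 1.5, 2, 2.5])
    color = pd.cut(df["Depth"], bins=[-np.inf, 30, 40, 60, np.inf],
                   labels=["red", "blue", "black", "yellow"])
    # Categorical -> plain floats, missing magnitudes get nan_size.
    sizes = size.astype(float).fillna(nan_size).tolist()
    # Add the fallback colour as a category before filling the gaps.
    colors = (color.cat.add_categories([nan_color])
                   .fillna(nan_color).astype(str).tolist())
    return sizes, colors


def plot_quakes(df, scale=40):
    sizes, colors = marker_styles(df)
    fig, ax = plt.subplots()
    ax.scatter(df["Lon"], df["Lat"], s=[scale * s for s in sizes], c=colors)
    ax.set_xlabel("Lon")
    ax.set_ylabel("Lat")
    return ax


# A few rows in the shape of the question's data.
quakes = pd.DataFrame({
    "Magnitude": [3.5, 1.1, np.nan, 5.2],
    "Lon": [33.3, 32.6, 33.3, 32.7],
    "Lat": [76.2, 78.2, 76.0, 79.5],
    "Depth": [22, 78, 36, np.nan],
})
# ax = plot_quakes(quakes); plt.show()
```

Note that pd.cut uses right-closed bins by default, so a Depth of exactly 30 falls into the red bin here.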
Update
It's not what I would recommend, but if you insist on using dict and functions then use:
def magnitude_size(magnitude):
    if magnitude < 2 :
        return 1
    if magnitude >= 2 and magnitude < 3 :
        return 1.5
    if magnitude >= 3 and magnitude < 4 :
        return 2
    if magnitude >= 4 :
        return 2.5
def depth_color(depth):
    if depth < 30 :
        return 'red'
    if depth >= 30 and depth < 40 :
        return 'blue'
    if depth >= 40 and depth < 60 :
        return 'black'
    if depth >= 60 :
        return 'yellow'
    if np.isnan(depth):
        return 'green'
di = {
    'size': df.Magnitude.apply(magnitude_size),
    'color' : df.Depth.apply(depth_color)
}
plt.scatter(df.Lon,df.Lat,c=di['color'],s=di['size'])

Python: Finding multiple linear trend lines in a scatter plot

I have the following pandas dataframe -
Atomic Number R C
0 2.0 49.0 0.040306
1 3.0 205.0 0.209556
2 4.0 140.0 0.107296
3 5.0 117.0 0.124688
4 6.0 92.0 0.100020
5 7.0 75.0 0.068493
6 8.0 66.0 0.082244
7 9.0 57.0 0.071332
8 10.0 51.0 0.045725
9 11.0 223.0 0.217770
10 12.0 172.0 0.130719
11 13.0 182.0 0.179953
12 14.0 148.0 0.147929
13 15.0 123.0 0.102669
14 16.0 110.0 0.120729
15 17.0 98.0 0.106872
16 18.0 88.0 0.061996
17 19.0 277.0 0.260485
18 20.0 223.0 0.164312
19 33.0 133.0 0.111359
20 36.0 103.0 0.069348
21 37.0 298.0 0.270709
22 38.0 245.0 0.177368
23 54.0 124.0 0.079491
The trend between R and C is generally a linear one. What I would like to do, if possible, is find an exhaustive list of all the possible combinations of 3 or more points and what their trends are with scipy.stats.linregress, so that I can find the groups of points that fit a line best.
Which would ideally look something like this for the data, (Source) but I am looking for all the other possible trends too.
So the question is: how do I feed all the 16776915 possible combinations (sum_(i=3)^24 binomial(24, i)) of 3 or more points into linregress, and is it even doable without a ton of code?
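For reference, the count quoted above can be checked with math.comb, and itertools.combinations can generate the subsets lazily. The sketch below restricts the brute-force search to triples from the first six rows of the table, since fitting all 16776915 subsets is likely to be slow; the function name best_triples and its top parameter are made up for illustration.

```python
from itertools import combinations
from math import comb

from scipy.stats import linregress

# Number of subsets of 24 points that contain 3 or more points.
total = sum(comb(24, k) for k in range(3, 25))
print(total)

# First six (R, C) rows from the table above.
R = [49.0, 205.0, 140.0, 117.0, 92.0, 75.0]
C = [0.040306, 0.209556, 0.107296, 0.124688, 0.100020, 0.068493]


def best_triples(xs, ys, top=3):
    """Fit every 3-point subset and return the `top` fits ranked by r^2."""
    scored = []
    for idx in combinations(range(len(xs)), 3):
        fit = linregress([xs[i] for i in idx], [ys[i] for i in idx])
        scored.append((fit.rvalue ** 2, idx))
    return sorted(scored, reverse=True)[:top]


for r2, idx in best_triples(R, C):
    print(idx, round(r2, 4))
```

Small subsets fit a line well almost by construction, so ranking by r^2 alone tends to favour triples; that is one reason a robust method may be the more practical route.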
My solution proposal is based on the RANSAC algorithm, a method for fitting a mathematical model (e.g. a line) to data that contains many outliers.
RANSAC is one specific method from the field of robust regression.
My solution below first fits a line with RANSAC. You then remove the data points close to this line from your data set (which is the same as keeping the outliers), fit RANSAC again, remove data again, and so on until only very few points are left.
Such approaches always have parameters that depend on the data (e.g. noise level or proximity of the lines). In the following solution, MIN_SAMPLES and residual_threshold are parameters that might require some adaptation to the structure of your data:
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model
MIN_SAMPLES = 3
x = np.linspace(0, 2, 100)
xs, ys = [], []
# generate points for three lines described by a and b,
# we also add some noise:
for a, b in [(1.0, 2), (0.5, 1), (1.2, -1)]:
    xs.extend(x)
    ys.extend(a * x + b + .1 * np.random.randn(len(x)))
xs = np.array(xs)
ys = np.array(ys)
plt.plot(xs, ys, "r.")
colors = "rgbky"
idx = 0
while len(xs) > MIN_SAMPLES:
    # build design matrix for linear regressor
    X = np.ones((len(xs), 2))
    X[:, 1] = xs
    ransac = linear_model.RANSACRegressor(
        residual_threshold=.3, min_samples=MIN_SAMPLES
    )
    res = ransac.fit(X, ys)
    # vector of boolean values, describes which points belong
    # to the fitted line:
    inlier_mask = ransac.inlier_mask_
    # plot point cloud:
    xinlier = xs[inlier_mask]
    yinlier = ys[inlier_mask]
    # cycle through colors:
    color = colors[idx % len(colors)]
    idx += 1
    plt.plot(xinlier, yinlier, color + "*")
    # only keep the outliers:
    xs = xs[~inlier_mask]
    ys = ys[~inlier_mask]
plt.show()
In the following plot points shown as stars belong to the clusters detected by my code. You also see a few points depicted as circles which are the points remaining after the iterations. The few black stars form a cluster which you could get rid of by increasing MIN_SAMPLES and / or residual_threshold.
