It might sound trivial, but I am surprised by the output. Basically, I am calculating y = a*x + b for given a, b & x. With the code below I am able to get the desired result for y, which holds 20 values.
But when I check the length of the list, I get 1 in return, and the range is (0, 1), which is weird, as I was expecting it to be 20.
Am I making a mistake here?
a = 10
b = 0
x = df['x']
print(x)
0 0.000000
1 0.052632
2 0.105263
3 0.157895
4 0.210526
5 0.263158
6 0.315789
7 0.368421
8 0.421053
9 0.473684
10 0.526316
11 0.578947
12 0.631579
13 0.684211
14 0.736842
15 0.789474
16 0.842105
17 0.894737
18 0.947368
19 1.000000
y_new = []
for i in x:
    y = a*x + b
y_new.append(y)
len(y_new)
Output: 1
print(y_new)
[0 0.000000
1 0.526316
2 1.052632
3 1.578947
4 2.105263
5 2.631579
6 3.157895
7 3.684211
8 4.210526
9 4.736842
10 5.263158
11 5.789474
12 6.315789
13 6.842105
14 7.368421
15 7.894737
16 8.421053
17 8.947368
18 9.473684
19 10.000000
Name: x, dtype: float64]
I would propose two solutions.
The first solution: convert your column df['x'] into a list with df['x'].tolist(), re-run your code, and replace a*x + b with a*i + b so that each loop iteration works on the scalar i rather than the whole column.
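A minimal sketch of that first solution (assuming the same a, b, and df as in the question):

y_new = []
for i in df['x'].tolist():
    y = a*i + b  # scalar arithmetic on each list element
    y_new.append(y)
print(len(y_new))  # 20: one y value per x value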
The second solution (which is the one I would use): convert df['x'] into an array with x = np.array(df['x']). That way you can rely on array broadcasting.
So your code will simply be:
import numpy as np

x = np.array(df['x'])
y = a*x + b
This should give you the desired output.
I hope this is helpful.
With the code below, I get a length of 20 for the list y_new. Are you sure you printed the right value? According to this post, df['x'] returns a pandas Series, so df['x'] is equivalent to pd.Series(...).
df['x'] — indexes the column named 'x'; returns a pd.Series.
import pandas as pd
a = 10
b = 0
x = pd.Series(data=[0.000000, 0.052632, 0.105263, 0.157895, 0.210526, 0.263158, 0.315789, 0.368421, 0.421053, 0.473684,
                    0.526316, 0.578947, 0.631579, 0.684211, 0.736842, 0.789474, 0.842105, 0.894737, 0.947368, 1.000000])
y_new = []
for i in x:
    y = a*x + b
    y_new.append(y)
print("y_new length: " + str(len(y_new)))
Output:
y_new length: 20
Let's say we have the code given below. Currently, we have two parameters whose values are initialized by user input. The output here is a dataframe.
What we want:
Use a function to create a dataframe with all combinations of X and Y. Let's say X and Y have 4 input values each. Then
join the output dataframe df for each combination to get the desired output dataframe.
X = float(input("Enter the value of X: "))
Y = float(input("Enter the value of Y: "))
A = X*Y
B = X*(Y**2)  # ** is exponentiation; ^ is bitwise XOR and fails on floats
df = pd.DataFrame({"X": [X], "Y": [Y], "A": [A], "B": [B]})  # scalars wrapped in lists to build a one-row frame
Desired output
X Y A B
1 2 2 4
1 4 4 16
1 6 6 36
1 8 8 64
2 2 4 8
2 4 8 32
2 6 12 72
2 8 16 128
3 2 6 12
3 4 12 48
3 6 18 108
3 8 24 192
4 2 8 16
4 4 16 64
4 6 24 144
4 8 32 256
Is this what you were looking for?
import pandas as pd

def so_help():
    x = input('Please enter all X values separated by a comma(,)')
    y = input('Please enter all Y values separated by a comma(,)')
    # In case anyone gets comma happy
    x = x.strip(',')
    y = y.strip(',')
    x_list = x.split(',')
    y_list = y.split(',')
    df_x = pd.DataFrame({'X': x_list})
    df_y = pd.DataFrame({'Y': y_list})
    df_cross = pd.merge(df_x, df_y, how='cross')  # cross join needs pandas >= 1.2
    df_cross['X'] = df_cross['X'].astype(int)
    df_cross['Y'] = df_cross['Y'].astype(int)
    df_cross['A'] = df_cross['X'].mul(df_cross['Y'])
    df_cross['B'] = df_cross['X'].mul(df_cross['Y'].pow(2))
    return df_cross

so_help()
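If the value lists are already known rather than typed in, a non-interactive sketch of the same idea (the lists below are just the example values from the question) could use itertools.product:

import itertools
import pandas as pd

x_vals = [1, 2, 3, 4]  # example X values from the question
y_vals = [2, 4, 6, 8]  # example Y values from the question

# cross product of the two lists, one row per (X, Y) pair
df = pd.DataFrame(list(itertools.product(x_vals, y_vals)), columns=['X', 'Y'])
df['A'] = df['X'] * df['Y']
df['B'] = df['X'] * df['Y'] ** 2  # ** binds tighter than *, so this is X * Y^2
print(df)  # reproduces the 16-row desired output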
I have the following table:
df = pd.DataFrame({"A":['CH','CH','NU','NU','J'],
"B":['US','AU','Q','US','Q'],
"TOTAL":[10,13,3,1,18]})
And I wish to get the ratio of each row's TOTAL with respect to its group total for A, as a percentage (the ratio column in the output shown below). What I do is:
df['sum'] = df.groupby(['A'])['TOTAL'].transform(np.sum)
df['ratio'] = df['TOTAL']/df['sum']*100
Question: how can one achieve this with a lambda (or is there a better way)?
If you want to use a lambda you can do the division inside transform:
df['ratio'] = df.groupby('A')['TOTAL'].transform(lambda x: x / x.sum() * 100)
Output:
A B TOTAL sum ratio
0 CH US 10 23 43.478261
1 CH AU 13 23 56.521739
2 NU Q 3 4 75.000000
3 NU US 1 4 25.000000
4 J Q 18 18 100.000000
But this is slower (because we go group-by-group). If I were you, I'd choose your code over this one.
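If you want to check the speed difference yourself, a rough comparison in IPython/Jupyter (with made-up random data; exact timings will vary with data size and group count) could look like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.random.randint(0, 100, 100_000),
                   'TOTAL': np.random.randint(1, 50, 100_000)})

# built-in 'sum' (equivalent to transform(np.sum) above) runs vectorized
%timeit df['TOTAL'] / df.groupby('A')['TOTAL'].transform('sum') * 100
# the lambda is called once per group, so it scales with the group count
%timeit df.groupby('A')['TOTAL'].transform(lambda x: x / x.sum() * 100)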
I have a dataframe with points on a 2-dimensional plane:
index x y
0 0 -0.032836 49.268820
1 0 4.160005 49.268820
2 0 4.105928 68.330440
3 0 -0.062953 68.342125
4 1 4.166139 49.269398
5 1 8.497650 49.278310
6 1 8.592334 68.336560
7 1 4.041361 68.336560
8 2 8.426349 49.278890
9 2 13.480260 49.278890
10 2 13.446286 68.336560
11 2 8.467557 68.336560
12 3 13.438516 49.278374
13 3 17.356792 49.287285
14 3 17.378400 68.338240
15 3 13.382163 68.333786
16 4 17.295988 49.289800
17 4 21.418156 49.289800
18 4 21.336264 67.359630
19 4 17.313816 67.359630
and I've been trying to find a way to draw lines between the (x,y) coordinates for each index. The resulting plot should be closed rectangles.
Now, I've tried to approach this by defining series:
x = df['x']
y = df['y']
and then
index_l = df.index.tolist()
for i in index_l:
    plt.plot([df.x[i], df.y[i]])
This doesn't work at all. Any idea how to proceed? A note: ideally I would like rectangles, but if connecting the corners diagonally is easier, I can live with it.
Thankful for any hints or solutions.
You can group by the index column and then, for the x, y values of each group, append the first row to the end so that plt.plot draws a closed rectangle:
for idx, points in df.groupby("index")[["x", "y"]]:
    points_to_plot = pd.concat([points, points.iloc[[0]]])  # repeat the first corner (DataFrame.append was removed in pandas 2.0)
    plt.plot(points_to_plot.x, points_to_plot.y)
to get a plot of five closed rectangles.
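For a self-contained test of this, a hypothetical minimal script (data abbreviated to the first two rectangles from the question's table) might be:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'index': [0, 0, 0, 0, 1, 1, 1, 1],
    'x': [-0.032836, 4.160005, 4.105928, -0.062953,
          4.166139, 8.497650, 8.592334, 4.041361],
    'y': [49.268820, 49.268820, 68.330440, 68.342125,
          49.269398, 49.278310, 68.336560, 68.336560],
})

for idx, points in df.groupby('index')[['x', 'y']]:
    closed = pd.concat([points, points.iloc[[0]]])  # close the shape
    plt.plot(closed.x, closed.y)
plt.show()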
I have a large set (thousands) of smooth lines (series of x,y pairs) with different sampling of x and y and different length for each line, i.e.
x_0 = {x_00, x_01, ..., } # length n_0
x_1 = {x_10, x_11, ..., } # length n_1
...
x_m = {x_m0, x_m1, ..., } # length n_m
y_0 = {y_00, y_01, ..., } # length n_0
y_1 = {y_10, y_11, ..., } # length n_1
...
y_m = {y_m0, y_m1, ..., } # length n_m
I want to find cumulative properties of each line interpolated to a regular set of x points, i.e. x = {x_0, x_1 ..., x_n-1}
Currently I'm for-looping over each line, creating an interpolant, resampling, and then taking the sum/median/whatever of that result. It works, but it's really slow. Is there any way to vectorize/matricize this operation?
I was thinking that, since linear interpolation can be a matrix operation, perhaps it's possible. At the same time, since each row can have a different length... it might be complicated. Edit: but zero-padding the shorter arrays would be easy...
What I'm doing now looks something like this:
import numpy as np
import scipy as sp
import scipy.interpolate
...
# `xx` and `yy` are lists of lists with the x and y points respectively
# `xref` are the reference x values at which I want interpolants
yref = np.zeros([len(xx), len(xref)])
for ii, (xi, yi) in enumerate(zip(xx, yy)):
    yref[ii] = np.interp(xref, xi, yi)  # np.interp; the old sp.interp alias was removed from scipy
y_med = np.median(yref, axis=-1)
y_sum = np.sum(yref, axis=-1)
...
Hopefully, you can adjust the following for your purposes.
I included pandas because it has an interpolation feature to fill in missing values.
Setup
import pandas as pd
import numpy as np
x = np.arange(19)
x_0 = x[::2]
x_1 = x[::3]
np.random.seed([3,1415])
y_0 = x_0 + np.random.randn(len(x_0)) * 2
y_1 = x_1 + np.random.randn(len(x_1)) * 2
xy_0 = pd.DataFrame(y_0, index=x_0)
xy_1 = pd.DataFrame(y_1, index=x_1)
Note:
x is length 19
x_0 is length 10
x_1 is length 7
xy_0 looks like:
0
0 -4.259448
2 -0.536932
4 0.059001
6 1.481890
8 7.301427
10 9.946090
12 12.632472
14 14.697564
16 17.430729
18 19.541526
xy_0 can be aligned with x via reindex
xy_0.reindex(x)
0
0 -4.259448
1 NaN
2 -0.536932
3 NaN
4 0.059001
5 NaN
6 1.481890
7 NaN
8 7.301427
9 NaN
10 9.946090
11 NaN
12 12.632472
13 NaN
14 14.697564
15 NaN
16 17.430729
17 NaN
18 19.541526
We can then fill in the missing values with interpolate:
xy_0.reindex(x).interpolate()
0
0 -4.259448
1 -2.398190
2 -0.536932
3 -0.238966
4 0.059001
5 0.770445
6 1.481890
7 4.391659
8 7.301427
9 8.623759
10 9.946090
11 11.289281
12 12.632472
13 13.665018
14 14.697564
15 16.064147
16 17.430729
17 18.486128
18 19.541526
What about xy_1?
xy_1.reindex(x)
0
0 -1.216416
1 NaN
2 NaN
3 3.704781
4 NaN
5 NaN
6 5.294958
7 NaN
8 NaN
9 8.168262
10 NaN
11 NaN
12 10.176849
13 NaN
14 NaN
15 14.714924
16 NaN
17 NaN
18 19.493678
Interpolated
xy_1.reindex(x).interpolate()
0
0 -1.216416
1 0.423983
2 2.064382
3 3.704781
4 4.234840
5 4.764899
6 5.294958
7 6.252726
8 7.210494
9 8.168262
10 8.837791
11 9.507320
12 10.176849
13 11.689541
14 13.202233
15 14.714924
16 16.307842
17 17.900760
18 19.493678
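To push this toward the vectorized version the question asks about, one possible sketch (my own extension, assuming each line's x values are sorted, and using method='index' so interpolation is linear in x rather than in row position) is to make each line a column of one wide frame and interpolate all columns in a single call:

import numpy as np
import pandas as pd

# toy data: three lines with different lengths and samplings
rng = np.random.default_rng(0)
xs = [np.sort(rng.random(n)) for n in (10, 7, 13)]
ys = [rng.standard_normal(n) for n in (10, 7, 13)]
xref = np.linspace(0.05, 0.95, 20)  # common reference grid

# one column per line, indexed by that line's own x values
wide = pd.concat([pd.Series(y, index=x) for x, y in zip(xs, ys)], axis=1)
# add the reference points to the index and fill gaps linearly in x
wide = wide.reindex(wide.index.union(xref)).interpolate(method='index')
resampled = wide.loc[xref]  # rows: reference grid, columns: lines

# per-line statistics over the resampled grid
y_med = resampled.median(axis=0)
y_sum = resampled.sum(axis=0)
# caveat: reference points below a line's first sample stay NaN, and points
# above its last sample are padded with the last value, unlike np.interp,
# which clamps at both ends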
I have a dataframe that looks like this:
bucket type v
0 1 X 14
1 1 X 10
2 1 Y 11
3 1 X 15
4 2 X 16
5 2 Y 9
6 2 Y 10
7 3 Y 20
8 3 X 18
9 3 Y 15
10 3 X 14
The desired output looks like this:
bucket type v v_paired
0 1 X 14 nan (no Y coming before it)
1 1 X 10 nan (no Y coming before it)
2 1 Y 11 14 (highest X in bucket 1 before this row)
3 1 X 15 11 (lowest Y in bucket 1 before this row)
4 2 X 16 nan (no Y coming before it in the same bucket)
5 2 Y 9 16 (highest X in same bucket coming before)
6 2 Y 10 16 (highest X in same bucket coming before)
7 3 Y 20 nan (no X coming before it in the same bucket)
8 3 X 18 20 (single Y coming before it in same bucket)
9 3 Y 15 18 (single X coming before it in same bucket)
10 3 X 14 15 (smallest Y coming before it in same bucket)
The goal is to construct the v_paired column. The rules are as follows:
Look for rows in the same bucket, coming before this one, that have the opposite type (X vs. Y); call these 'pair candidates'.
If the current row is X, choose the min v out of the pair candidates to become v_paired for the current row; if the current row is Y, choose the max v out of the pair candidates to be the v_paired for the current row.
Thanks in advance.
I believe this should be done in a sequential manner...
First, group by bucket:
groups = df.groupby('bucket', group_keys=False)
This function will be applied to each bucket group:
def func(group):
    y_value = None  # running min of Y values seen so far in this bucket
    x_value = None  # running max of X values seen so far in this bucket
    result = []
    for _, (_, value_type, value) in group.iterrows():
        if value_type == 'X':
            x_value = max(filter(None, (x_value, value)))
            result.append(y_value)  # an X row pairs with the min Y seen before it
        elif value_type == 'Y':
            y_value = min(filter(None, (y_value, value)))
            result.append(x_value)  # a Y row pairs with the max X seen before it
    # note: filter(None, ...) drops None (and would also drop zeros)
    # return a Series aligned on the group's index so the assignment
    # below matches rows correctly
    return pd.Series(result, index=group.index)
df['v_paired'] = groups.apply(func)
Hopefully this will do the job.
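A quick hypothetical sanity check of func against the question's example (the frame below just re-types the question's data; the expected v_paired values come from the desired output above):

import pandas as pd

df = pd.DataFrame({'bucket': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
                   'type': ['X', 'X', 'Y', 'X', 'X', 'Y', 'Y', 'Y', 'X', 'Y', 'X'],
                   'v': [14, 10, 11, 15, 16, 9, 10, 20, 18, 15, 14]})
groups = df.groupby('bucket', group_keys=False)
df['v_paired'] = groups.apply(func)
print(df['v_paired'].tolist())
# expected: [None, None, 14, 11, None, 16, 16, None, 20, 18, 15]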