python resampling from two dataframes

python resampling from two dataframes - python

have two data frames
import pandas as pd
df = pd.DataFrame({'x': [10, 47, 58, 68, 75, 80],
'y': [10, 9, 8, 7, 6, 5]})
df2 = pd.DataFrame({'x': [45, 55, 66, 69, 79, 82], 'y': [10, 9, 8, 7, 6, 5]})
df1
x y
10 10
47 9
58 8
68 7
75 6
80 5
df2
x y
45 10
55 9
66 8
69 7
79 6
82 5
I want to interpolate between them and generate a new data frame with a sampling rate of N. Assume N=3 for this example.
The desired output is
x y
10 10
27.5 10
45 10
...
75 6
77 6
79 6
80 5
81 5
82 5
How can I use my data frames to create the desired output?

If you don't mind using numpy, this solution will give you your desired output:
import pandas as pd
import numpy as np
N = 3
df = pd.DataFrame({'x': [10, 47, 58, 68, 75, 80],
'y': [10, 9, 8, 7, 6, 5]})
df2 = pd.DataFrame({'x': [45, 55, 66, 69, 79, 82], 'y': [10, 9, 8, 7, 6, 5]})
new_x = np.array([np.linspace(i, j, N) for i, j in zip(df['x'], df2['x'])]).flatten()
new_y = df['y'].loc[np.repeat(df.index.values, N)]
final_df = pd.DataFrame({'x': new_x, 'y': new_y})
print(final_df)
Output
x y
0 10.0 10
1 27.5 10
2 45.0 10
3 47.0 9
...
15 80.0 5
16 81.0 5
17 82.0 5

Related

Filling 0 with previous value at index

I have a df:
1 2 3 4 5 6 7 8 9 10
A 10 0 0 15 0 21 45 0 0 7
I am trying fill index A values with the current value if the next value is 0 so that the df would look like this:
1 2 3 4 5 6 7 8 9 10
A 10 10 10 15 15 21 45 45 45 7
I tried:
df.loc[['A']].replace(to_replace=0, method='ffill').values
But this does not work, where is my mistake?

If you want to use your method, you need to work with Series on both sides:
df.loc['A'] = df.loc['A'].replace(to_replace=0, method='ffill')
Alternatively, you can mask the 0 with NaNs, and ffill the data on axis=1:
df.mask(df.eq(0)).ffill(axis=1)
output:
1 2 3 4 5 6 7 8 9 10
A 10.0 10.0 10.0 15.0 15.0 21.0 45.0 45.0 45.0 7.0

Well you should change your code a little bit and work with series:
import pandas as pd
df = pd.DataFrame({'1': [10], '2': [0], '3': [0], '4': [15], '5': [0],
'6': [21], '7': [45], '8': [0], '9': [0], '10': [7]},
index=['A'])
print(df.apply(lambda x: pd.Series(x.values).replace(to_replace=0, method='ffill').values, axis=1))
Output:
A [10, 10, 10, 15, 15, 21, 45, 45, 45, 7]
dtype: object
This way, if you have multiple indices, the code still works:
import pandas as pd
df = pd.DataFrame({'1': [10, 11], '2': [0, 12], '3': [0, 0], '4': [15, 0], '5': [0, 3],
'6': [21, 3], '7': [45, 0], '8': [0, 4], '9': [0, 5], '10': [7, 0]},
index=['A', 'B'])
print(df.apply(lambda x: pd.Series(x.values).replace(to_replace=0, method='ffill').values, axis=1))
Output:
A [10, 10, 10, 15, 15, 21, 45, 45, 45, 7]
B [11, 12, 12, 12, 3, 3, 3, 4, 5, 5]
dtype: object

df.applymap(lambda x:pd.NA if x==0 else x).fillna(method='ffill',axis=1)
1 2 3 4 5 6 7 8 9 10
A 10 10 10 15 15 21 45 45 45 7

How to style pandas dataframe using a list of lists?

I have a pandas dataframe like:
I need to style it using a list of lists like:
[[3, 7, 4, 5],
[6, 17, 5, 10, 13, 16],
[7, 22, 6, 17, 19, 12],
[12, 26, 24, 25, 23, 18, 20],
[21, 20, 18, 27, 25]]
If R1 values are in first list color blue, if R2 values are in second list color blue and so on.
In other words color numbers of each column if value is in the correspondent list.
I have tried:
def posclass(val):
color = 'black'
for i in range(5):
if (val in list[i]):
color = 'blue'
return 'color: %s' % color
df.style.applymap(posclass, subset=['R1','R2','R3','R4','R5'])
But this is not working properly applying each list to each column.
The desired result is a dataframe with colored numbers (those that matches in each column with each list).

Try something like this:
df = pd.DataFrame(np.arange(40).reshape(-1,4), columns=[f'R{i}' for i in range(1,5)])
Input df:
R1 R2 R3 R4
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
3 12 13 14 15
4 16 17 18 19
5 20 21 22 23
6 24 25 26 27
7 28 29 30 31
8 32 33 34 35
9 36 37 38 39
and
list_l = [[3, 7, 4, 5],
[6, 17, 5, 10, 13, 16],
[7, 22, 6, 17, 19, 12],
[12, 26, 24, 25, 23, 18, 20],
[21, 20, 18, 27, 25]]
Then:
def f(x):
colpos = df.columns.get_loc(x.name)
return ['color: blue' if n in list_l[colpos] else '' for n in x]
df.style.apply(f)
Output:

Which API can implement tensor expansion in tensorflow ？

If I have a tensor of (30,40,50), and I want to expand it out to the first order, then I get a second order tensor of (30,2000), and I don't know if tensorflow has an API that implements it.
import tensorflow as tf
import numpy as np
data1=tf.constant([
[[2,5,7,8],[6,4,9,10],[14,16,86,54]],
[[16,43,65,76],[43,65,7,24],[15,75,23,75]]])
data5=tf.reshape(data1,[3,8])
data2,data3,data4=tf.split(data1,3,1)
data6=tf.reshape(data2,[1,8])
data7=tf.reshape(data3,[1,8])
data8=tf.reshape(data4,[1,8])
data9=tf.concat([data6,data7,data8],0)
with tf.Session() as sess:
print(sess.run(data5))
print(sess.run(data))
This gives:
data5
[[ 2 5 7 8 6 4 9 10]
[14 16 86 54 16 43 65 76]
[43 65 7 24 15 75 23 75]]
data9
[[ 2 5 7 8 16 43 65 76]
[ 6 4 9 10 43 65 7 24]
[14 16 86 54 15 75 23 75]]
How do I get data9 directly?

Looks like you're trying to take the sub-tensors ranging across axis 0 (data1[0], data1[1], ...) and concatenate them along axis 2.
Transposing before reshaping should do the trick:
tf.reshape(tf.transpose(data1, [1,0,2]), [data1.shape[1], data1.shape[0] * data1.shape[2]])

You can try:
data9 = tf.layers.flatten(tf.transpose(data1, perm=[1, 0, 2]))
Output:
array([[ 2, 5, 7, 8, 16, 43, 65, 76],
[ 6, 4, 9, 10, 43, 65, 7, 24],
[14, 16, 86, 54, 15, 75, 23, 75]], dtype=int32)

Joining values in 3D pandas array on ", ", making it a 2D array

I have a 3D dataframe, and I want to get all values of one x,y index across the z axis, where the z axis here moves between the original 2D dataframes. The way I am able to imagine it although forgive me if I'm mistaken because it's a little weird to visualize, if I got a vector of the x,y of x=0, y=0 it would be [1, 5, 3].
So my result would be a dataframe, where the df_2d[0][0] would be a string "1, 5, 3", and so on, taking all the values in the 3D dataframe.
Is there any way I can achieve this without looping through each cell index and accessing the values explicitly?
The data frame is defined as:
import pandas as pd
columns = ['A', 'B']
index = [1, 2, 3]
df_1 = pd.DataFrame(data=[[1, 2], [99, 57], [57, 20]], index=index, columns=columns)
df_2 = pd.DataFrame(data=[[5, 6], [78, 47], [21, 11]], index=index, columns=columns)
df_3 = pd.DataFrame(data=[[3, 4], [66, 37], [33, 17]], index=index, columns=columns)
df_3d = pd.concat([df_1, df_2, df_3], keys=['1', '2', '3'])
And then to get the original data out I do:
print(df_3d.xs('1'))
print(df_3d.xs('2'))
print(df_3d.xs('3'))
A B
1 1 2
2 99 57
3 57 20
A B
1 5 6
2 78 47
3 21 11
A B
1 3 4
2 66 37
3 33 17
Again, to clarify, if looking at this print I would like to have a combined dataframe looking like:
A B
1 '1, 5, 3' '2, 6, 4'
2 '99, 78, 66' '57, 47, 37'
3 '57, 21, 33' '20, 11, 17'

Use .xs to get each level dataframe and reduce to combine all dataframe together.
from functools import reduce
# Get each level values
dfs = [df_3d.xs(i) for i in df_3d.index.levels[0]]
df = reduce(lambda left,right: left.astype(str) + ", " + right.astype(str), dfs)
df
A B
1 1, 5, 3 2, 6, 4
2 99, 78, 66 57, 47, 37
3 57, 21, 33 20, 11, 17
And if you want ' you can use applymap to apply the function on every element.
df.applymap(lambda x: "'" + x + "'")
A B
1 '1, 5, 3' '2, 6, 4'
2 '99, 78, 66' '57, 47, 37'
3 '57, 21, 33' '20, 11, 17'
Or df = "'" + df + "'"
df
A B
1 '1, 5, 3' '2, 6, 4'
2 '99, 78, 66' '57, 47, 37'
3 '57, 21, 33' '20, 11, 17'

Selecting a range of columns in a dataframe

I have a dataset that consists of columns 0 to 10, and I would like to extract the information that is only in columns 1 to 5, not 6, and 7 to 9 (it means not the last column). So far, I have done the following:
A = B[:, [[1:5], [7:-1]]]
but I got a syntax error, how can I retrieve that data?

Advanced indexing doesn't take a list of lists of slices. Instead, you can use numpy.r_. This function doesn't take negative indices, but you can get round this by using np.ndarray.shape:
A = B[:, np.r_[1:6, 7:B.shape[1]-1]]
Remember to add 1 to the second part, since a: b does not include b, in the same way slice(a, b) does not include b. Also note that indexing begins at 0.
Here's a demo:
import numpy as np
B = np.random.randint(0, 10, (3, 11))
print(B)
[[5 8 8 8 3 0 7 2 1 6 7]
[4 3 8 7 3 7 5 6 0 5 7]
[1 0 4 0 2 2 5 1 4 2 3]]
A = B[:,np.r_[1:6, 7:B.shape[1]-1]]
print(A)
[[8 8 8 3 0 2 1 6]
[3 8 7 3 7 6 0 5]
[0 4 0 2 2 1 4 2]]

Another way would be to get your slices independently, and then concatenate:
A = np.concatenate([B[:, 1:6], B[:, 7:-1]], axis=1)
Using similar example data as #jpp:
B = np.random.randint(0, 10, (3, 10))
>>> B
array([[0, 5, 0, 6, 8, 5, 9, 3, 2, 0],
[8, 8, 1, 7, 3, 5, 7, 7, 4, 8],
[5, 5, 5, 2, 3, 1, 6, 4, 9, 6]])
A = np.concatenate([B[:, 1:6], B[:, 7:-1]], axis=1)
>>> A
array([[5, 0, 6, 8, 5, 3, 2],
[8, 1, 7, 3, 5, 7, 4],
[5, 5, 2, 3, 1, 4, 9]])

how about union the range?
B[:, np.union1d(range(1,6), range(7,10))]

Just to add some of my thoughts. There are two approaches one can take using either numpy or pandas. So I will demonstrate with some data, and assume that the data is the grades for a student in different courses he/she is enrolled in.
import pandas as pd
import numpy as np
data = {'Course A': [84, 82, 81, 89, 73, 94, 92, 70, 88, 95],
'Course B': [85, 82, 72, 77, 75, 89, 95, 84, 77, 94],
'Course C': [97, 94, 93, 95, 88, 82, 78, 84, 69, 78],
'Course D': [84, 82, 81, 89, 73, 94, 92, 70, 88, 95],
'Course E': [85, 82, 72, 77, 75, 89, 95, 84, 77, 94],
'Course F': [97, 94, 93, 95, 88, 82, 78, 84, 69, 78]
}
df = pd.DataFrame(data=data)
df.head()
CA CB CC CD CE CF
0 84 85 97 84 85 97
1 82 82 94 82 82 94
2 81 72 93 81 72 93
3 89 77 95 89 77 95
4 73 75 88 73 75 88
NOTE: CA through CF represent Course A through Course F.
To help us remember column names and their associated indexes, we can build a list of columns and their indexes via list comprehension.
map_cols = [f"{c[0]}:{c[1]}" for c in enumerate(df.columns)]
['0:Course A',
'1:Course B',
'2:Course C',
'3:Course D',
'4:Course E',
'5:Course F']
Now, to select say Course A, and Course D through Course F using indexing in numpy, you can do the following:
df.iloc[:, np.r_[0, 3:df.shape[1]]]
CA CD CE CF
0 84 84 85 97
1 82 82 82 94
2 81 81 72 93
3 89 89 77 95
4 73 73 75 88
You can also use pandas to the same effect.
df[[df.columns[0], *df.columns[3:]]]
CA CD CE CF
0 84 84 85 97
1 82 82 82 94
2 81 81 72 93
3 89 89 77 95
4 73 73 75 88

One can solve that with the sum of range
[In]: columns = list(range(1,6)) + list(range(7,10))
[Out]:
[1, 2, 3, 4, 5, 7, 8, 9]
Then, considering that your df is called df, using iloc to select the DF columns
newdf = df.iloc[:, columns]

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

python resampling from two dataframes - python

Related

Filling 0 with previous value at index

How to style pandas dataframe using a list of lists?

Which API can implement tensor expansion in tensorflow ？

Joining values in 3D pandas array on ", ", making it a 2D array

Selecting a range of columns in a dataframe

Categories

Resources