How to keep the first window constant in sliding window?


I am using the following code to apply a sliding window on time-series data. I want to set up my first window as fixed and then apply the sliding window as shown below in the desired output.
import pandas as pd
df = pd.DataFrame({'B': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})
def sliding_window(data, size):
    # one window of `size` rows starting at each valid offset
    return [data[x:x + size] for x in range(len(data) - size + 1)]
sliding_window(df, 7)
output
[ B
0 0
1 1
2 2
3 3
4 4
5 5
6 6,
B
1 1
2 2
3 3
4 4
5 5
6 6
7 7,
B
2 2
3 3
4 4
5 5
6 6
7 7
8 8,
B
3 3
4 4
5 5
6 6
7 7
8 8
9 9,
B
4 4
5 5
6 6
7 7
8 8
9 9
10 10]
Desired output
Example:
Here I am using a fixed window of size 5, and it should always be the first window. Every later window should start at the same first row and grow by one row at a time, as in the left-hand figure.
[ B
0 0
1 1
2 2
3 3
4 4,
B
0 0
1 1
2 2
3 3
4 4
5 5,
B
0 0
1 1
2 2
3 3
4 4
5 5
6 6,
B
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7,
B
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8,
B
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9,
B
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10]

Try this:
def rolling_window_maybe(data, initial_size: int):
    # every window starts at the first row; the "+ 1" includes the window that covers all of data
    return [data[:initial_size + x] for x in range(len(data) - initial_size + 1)]
For example:
data = [1, 2, 3, 4]
size = 2
data[:size + 0] == [1, 2]
data[:size + 1] == [1, 2, 3]
data[:size + 2] == [1, 2, 3, 4]
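Here is a minimal sketch of the same expanding-window idea applied to the DataFrame from the question. The function name expanding_windows is hypothetical (it is not a pandas function), and the range uses "+ 1" so that the last window covers every row.

```python
import pandas as pd

def expanding_windows(data, initial_size):
    """Return windows that all start at row 0 and grow by one row per step."""
    # "+ 1" so the final window contains the whole input
    return [data[:initial_size + x] for x in range(len(data) - initial_size + 1)]

df = pd.DataFrame({'B': list(range(11))})
windows = expanding_windows(df, 5)

print(len(windows))               # number of windows
print(windows[0]['B'].tolist())   # the fixed first window
print(windows[-1]['B'].tolist())  # the last window, all rows
```

With 11 rows and an initial size of 5, this should give 7 windows, matching the desired output in the question.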

Related

Iteratively pop and append to generate new lists using pandas

I have a list of elements mylist = [1, 2, 3, 4, 5, 6, 7, 8] and would like to iteratively:
copy the list
pop the first element of the copied list
and append it to the end of the copied list
repeat this for the next row, etc.
Desired output:
index A B C D E F G H
0 1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8 1
2 3 4 5 6 7 8 1 2
3 4 5 6 7 8 1 2 3
4 5 6 7 8 1 2 3 4
5 6 7 8 1 2 3 4 5
6 7 8 1 2 3 4 5 6
7 8 1 2 3 4 5 6 7
I suspect a for loop is needed but am having trouble iteratively generating rows based on the prior row.

I think slicing (see "Understanding slicing") is what you are looking for:
next_iteration = my_list[1:] + [my_list[0]]
and the full loop:
my_list = [1, 2, 3, 4, 5, 6, 7, 8]
output = []
for i in range(len(my_list)):
    # rotate left by i: the tail first, then the first i elements
    output.append(my_list[i:] + my_list[:i])
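To get from the rotated lists to the table in the question, the rows can be passed straight to a DataFrame. This is only a sketch of that last step; the column letters are taken from the desired output, and the variable name rows is my own.

```python
import pandas as pd

my_list = [1, 2, 3, 4, 5, 6, 7, 8]

# Row i is my_list rotated left by i positions.
rows = [my_list[i:] + my_list[:i] for i in range(len(my_list))]

df = pd.DataFrame(rows, columns=list('ABCDEFGH'))
print(df)
```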

Use this NumPy solution, with the roll offsets created by np.arange:
import numpy as np
import pandas as pd
mylist = [1, 2, 3, 4, 5, 6, 7, 8]
a = np.array(mylist)
rolls = np.arange(0, -8, -1)
print (rolls)
[ 0 -1 -2 -3 -4 -5 -6 -7]
# entry (i, j) of the index array is (i + j) % len(a), so row i is a rotated left by i
df = pd.DataFrame(a[(np.arange(len(a))[:,None]-rolls) % len(a)],
                  columns=list('ABCDEFGH'))
print (df)
A B C D E F G H
0 1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8 1
2 3 4 5 6 7 8 1 2
3 4 5 6 7 8 1 2 3
4 5 6 7 8 1 2 3 4
5 6 7 8 1 2 3 4 5
6 7 8 1 2 3 4 5 6
7 8 1 2 3 4 5 6 7
If you need a loop solution (slower), you can use numpy.roll:
mylist = [1, 2, 3, 4, 5, 6, 7, 8]
rolls = np.arange(0, -8, -1)
df = pd.DataFrame([np.roll(mylist, i) for i in rolls],
                  columns=list('ABCDEFGH'))
print (df)
A B C D E F G H
0 1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8 1
2 3 4 5 6 7 8 1 2
3 4 5 6 7 8 1 2 3
4 5 6 7 8 1 2 3 4
5 6 7 8 1 2 3 4 5
6 7 8 1 2 3 4 5 6
7 8 1 2 3 4 5 6 7

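Try this (the := assignment expression requires Python 3.8 or later):
mylist = [1, 2, 3, 4, 5, 6, 7, 8]
# pre-rotate right by one so that the first left-rotation restores the original order
ar = np.roll(np.array(mylist), 1)
data = [ar := np.roll(ar, -1) for _ in range(ar.size)]
df = pd.DataFrame(data, columns=[*'ABCDEFGH'])
print(df)
Output: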
A B C D E F G H
0 1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8 1
2 3 4 5 6 7 8 1 2
3 4 5 6 7 8 1 2 3
4 5 6 7 8 1 2 3 4
5 6 7 8 1 2 3 4 5
6 7 8 1 2 3 4 5 6
7 8 1 2 3 4 5 6 7

Filter and keep only rows with the same index

I have two DataFrames, X_oos_top_10 and y_oos_top_10. I need to filter both of them by X_oos_top_10["comm"] == 1.
For the first one this works:
X_oos_top_10_comm1 = X_oos_top_10[X_oos_top_10["comm"] == 1]
But for the other one I get an error: IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
y_oos_top_10_comm1 = y_oos_top_10[X_oos_top_10["comm"] == 1]
I have no idea how to do this.

Assuming X and y have the same length, you can use indexing.
Set up a minimal reproducible example:
import numpy as np
import pandas as pd
X_oos_top_10 = pd.DataFrame({'comm': np.random.randint(1, 10, 10)})
y_oos_top_10 = pd.DataFrame(np.random.randint(1, 10, (10, 4)), columns=list('ABCD'))
print(X_oos_top_10)
# Output:
comm
0 5
1 6
2 2
3 6
4 1
5 6
6 1
7 4
8 5
9 8
print(y_oos_top_10)
# Output:
A B C D
0 2 9 1 6
1 9 8 5 4
2 1 6 7 6
3 6 3 6 5
4 2 6 8 3
5 2 6 6 5
6 4 4 3 5
7 6 3 7 5
8 2 8 8 7
9 4 9 1 4
1st method
idx = X_oos_top_10[X_oos_top_10["comm"] == 1].index
out = y_oos_top_10.loc[idx]
print(out)
# Output:
A B C D
4 2 6 8 3
6 4 4 3 5
2nd method
Xy_oos_top_10 = X_oos_top_10.join(y_oos_top_10)
out = Xy_oos_top_10[Xy_oos_top_10['comm'] == 1]
print(out)
# Output:
comm A B C D
4 1 2 6 8 3
6 1 4 4 3 5
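If the two frames have the same number of rows in the same order but different index labels (which is what the IndexingError suggests), another option is to filter by position rather than by label. This is a sketch under that assumption; the small frames X and y below are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical frames: same length and row order, but different index labels.
X = pd.DataFrame({'comm': [5, 1, 2, 1]}, index=[10, 11, 12, 13])
y = pd.DataFrame({'A': [7, 8, 9, 6]})  # default index 0..3

# Converting the mask to a plain NumPy array drops its index,
# so pandas applies it by position instead of trying to align labels.
mask = (X['comm'] == 1).to_numpy()
out = y[mask]
print(out)
```

This only makes sense when row k of X really corresponds to row k of y; if the rows are matched by label instead, the index-based methods above are the safer choice.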

Column selection with iloc, with both individual indices and ranges

I wonder why this line returns "invalid syntax", and what's the correct syntax to use for selecting both isolated columns and ranges in one go:
X = f1.iloc[:, [2,5,[10:19]]].values
Btw the same happens with:
X = f1.iloc[:, [2,5,10:19]].values
Thanks.

The second one is close; you only need numpy.r_ to concatenate the indices:
np.random.seed(2019)
f1 = pd.DataFrame(np.random.randint(10, size=(5, 25))).add_prefix('a')
print(f1)
a0 a1 a2 a3 a4 a5 ... a19 a20 a21 a22 a23 a24
0 8 2 5 8 6 8 ... 0 1 6 0 2 6
1 6 3 1 3 5 0 ... 4 8 1 0 6 1
2 8 2 3 0 9 2 ... 7 1 0 7 4 4
3 7 0 8 9 0 7 ... 3 0 8 6 0 2
4 7 3 2 4 9 9 ... 0 8 8 1 4 9
X = f1.iloc[:, np.r_[2,5,10:19]].values
print(X)
[[5 8 5 3 0 2 5 7 8 5 4]
[1 0 2 9 8 3 7 7 7 0 3]
[3 2 6 2 1 1 1 1 8 6 2]
[8 7 7 8 0 5 7 4 1 1 4]
[2 9 7 2 9 3 8 5 2 5 5]]
It is also possible to convert the values to a NumPy array first, in which case iloc is not necessary:
X = f1.values[:, np.r_[2,5,10:19]]
print(X)
[[5 8 5 3 0 2 5 7 8 5 4]
[1 0 2 9 8 3 7 7 7 0 3]
[3 2 6 2 1 1 1 1 8 6 2]
[8 7 7 8 0 5 7 4 1 1 4]
[2 9 7 2 9 3 8 5 2 5 5]]
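To see what np.r_ actually builds, it can help to print the index array on its own. The sketch below uses a small deterministic frame (my own, not the random one above) so the selected columns are easy to check.

```python
import numpy as np
import pandas as pd

# np.r_ turns scalars and slices into one flat integer array.
cols = np.r_[2, 5, 10:19]
print(cols.tolist())

# Deterministic 2 x 25 frame: the value in row 0 equals the column position.
f1 = pd.DataFrame(np.arange(50).reshape(2, 25)).add_prefix('a')
X = f1.iloc[:, cols].to_numpy()
print(X.shape)
print(X[0].tolist())
```

Note that the slice 10:19 excludes position 19, as with ordinary Python slicing.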

Python how to subset a data frame with just the column indices?

I have a huge dataframe with 282 columns and 500K rows. I wish to remove a list of columns from the dataframe using the column indices. The below code works for sequential columns.
df1 = df.ix[:,[0:2]]
The problem is that my column indices are not sequential.
For example, I want to remove columns 0,1,2 and 5 from df. I tried the following code:
df1 = df.ix[:,[0:2,5]]
I am getting the following error:
SyntaxError: invalid syntax
Any suggestions?

Select the columns other than 0, 1, 2 and 5 by position. The .ix indexer was removed in pandas 1.0, so use .iloc:
df.iloc[:, [3, 4] + list(range(6, 282))]
Or, a little more dynamic:
df.iloc[:, [3, 4] + list(range(6, df.shape[1]))]

Is it a numpy array you've got? Try
df1 = df.ix[:, (0,1,2,5)]
or
df1 = df.ix[:, [0,1,2,5]]
or
data[:, [i for i in range(3)]+[5]]

Use np.r_[...] for concatenating slices along the first axis
DF:
In [98]: df = pd.DataFrame(np.random.randint(10, size=(5, 12)))
In [99]: df
Out[99]:
0 1 2 3 4 5 6 7 8 9 10 11
0 0 7 2 9 9 0 7 3 5 8 8 1
1 4 9 0 4 0 2 4 8 8 7 1 9
2 2 1 1 2 7 4 4 6 1 2 9 8
3 1 5 0 8 2 2 4 1 1 4 8 4
4 4 6 3 2 2 4 1 6 2 6 9 0
Solution:
In [107]: df.iloc[:, np.r_[3:5, 6:df.shape[1]]]
Out[107]:
3 4 6 7 8 9 10 11
0 9 9 7 3 5 8 8 1
1 4 0 4 8 8 7 1 9
2 2 7 4 6 1 2 9 8
3 8 2 4 1 1 4 8 4
4 2 2 1 6 2 6 9 0
In [108]: np.r_[3:5, 6:df.shape[1]]
Out[108]: array([ 3, 4, 6, 7, 8, 9, 10, 11])
or
In [110]: df.columns.difference([0,1,2,5])
Out[110]: Int64Index([3, 4, 6, 7, 8, 9, 10, 11], dtype='int64')
In [111]: df[df.columns.difference([0,1,2,5])]
Out[111]:
3 4 6 7 8 9 10 11
0 9 9 7 3 5 8 8 1
1 4 0 4 8 8 7 1 9
2 2 7 4 6 1 2 9 8
3 8 2 4 1 1 4 8 4
4 2 2 1 6 2 6 9 0
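Since the question is about removing columns by position, another way to phrase it is to look up the labels at those positions and drop them. This is a sketch on a small made-up frame; the names to_drop and df1 are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape(2, 12))

# Positions of the columns to remove.
to_drop = [0, 1, 2, 5]

# df.columns[to_drop] gives the labels at those positions; drop removes them.
df1 = df.drop(columns=df.columns[to_drop])
print(df1.columns.tolist())
```

One caveat: drop works on labels, so if the frame has duplicate column names, every column sharing a dropped label is removed. In that case the np.r_ selection with iloc shown above is the safer route.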

Pandas: How to get max and min values and write for every row?

I have data like this:
>> df
A B C
0 1 5 1
1 1 7 1
2 1 6 1
3 1 7 1
4 2 5 1
5 2 8 1
6 2 6 1
7 3 7 1
8 3 9 1
9 4 6 1
10 4 7 1
11 4 1 1
For each value in column A, I want to find the minimum and maximum of column B and write the results back to the original table. My code is:
df1 = df.groupby(['A']).B.transform(max)
df1 = df1.rename(columns={'B':'B_max'})
df2 = df.groupby.(['A']).B.transform(min)
df1 = df1.rename(columns={'B':'B_min'})
df3 = df.join(df1['B_max']).join(df2['B_min'])
This is the result.
A B C B_max B_min
0 1 5 1
1 1 7 1 7
2 1 6 1
3 1 4 1 4
4 2 5 1
5 2 8 1 8
6 2 6 1 6
7 3 7 1 7
8 3 9 1 9
9 4 6 1
10 4 7 1 7
11 4 1 1 1
But I want the table to look like this:
A B C B_max B_min
0 1 5 1 7 4
1 1 7 1 7 4
2 1 6 1 7 4
3 1 4 1 7 4
4 2 5 1 8 6
5 2 8 1 8 6
6 2 6 1 8 6
7 3 7 1 9 7
8 3 9 1 9 7
9 4 6 1 7 1
10 4 7 1 7 1
11 4 1 1 7 1
How should I change the code so that the result looks like the table above?

I think you only need to assign the values to new columns, because transform returns a Series with the same length as df:
import pandas as pd
df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
    'B': [5, 7, 6, 7, 5, 8, 6, 7, 9, 6, 7, 1],
    'C': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]})
print (df)
A B C
0 1 5 1
1 1 7 1
2 1 6 1
3 1 7 1
4 2 5 1
5 2 8 1
6 2 6 1
7 3 7 1
8 3 9 1
9 4 6 1
10 4 7 1
11 4 1 1
df['B_max'] = df.groupby(['A']).B.transform(max)
df['B_min'] = df.groupby(['A']).B.transform(min)
print (df)
A B C B_max B_min
0 1 5 1 7 5
1 1 7 1 7 5
2 1 6 1 7 5
3 1 7 1 7 5
4 2 5 1 8 5
5 2 8 1 8 5
6 2 6 1 8 5
7 3 7 1 9 7
8 3 9 1 9 7
9 4 6 1 7 1
10 4 7 1 7 1
11 4 1 1 7 1
Or create the groupby object once and reuse it:
g = df.groupby('A').B
df['B_max'] = g.transform(max)
df['B_min'] = g.transform(min)
print (df)
A B C B_max B_min
0 1 5 1 7 5
1 1 7 1 7 5
2 1 6 1 7 5
3 1 7 1 7 5
4 2 5 1 8 5
5 2 8 1 8 5
6 2 6 1 8 5
7 3 7 1 9 7
8 3 9 1 9 7
9 4 6 1 7 1
10 4 7 1 7 1
11 4 1 1 7 1
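The same idea also works with the aggregation names passed as strings, which some newer pandas versions prefer over the builtin max and min functions (they may emit a warning for the builtins). A short sketch on a smaller, made-up frame:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2, 2],
                   'B': [5, 7, 5, 8, 6]})

g = df.groupby('A')['B']

# transform broadcasts each group's result back to every row of that group,
# so the result lines up with df and can be assigned directly.
df['B_max'] = g.transform('max')
df['B_min'] = g.transform('min')
print(df)
```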
