Iteratively pop and append to generate new lists using pandas - python

I have a list of elements mylist = [1, 2, 3, 4, 5, 6, 7, 8] and would like to iteratively:
copy the list
pop the first element of the copied list
and append it to the end of the copied list
repeat this for the next row, etc.
Desired output:
index A B C D E F G H
0 1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8 1
2 3 4 5 6 7 8 1 2
3 4 5 6 7 8 1 2 3
4 5 6 7 8 1 2 3 4
5 6 7 8 1 2 3 4 5
6 7 8 1 2 3 4 5 6
7 8 1 2 3 4 5 6 7
I suspect a for loop is needed but am having trouble iteratively generating rows based on the prior row.

I think slicing (see "Understanding slicing") is what you are looking for:
next_iteration = mylist[1:] + [mylist[0]]
and the full loop:
output = []
for i in range(len(mylist)):
    output.append(mylist[i:] + mylist[:i])
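To get from the list of rotated rows to the table in the question, the rows can be passed straight to a DataFrame. This is a minimal sketch, assuming pandas is installed; the column letters are taken from the desired output, and the name `rows` is only illustrative.

```python
import pandas as pd

mylist = [1, 2, 3, 4, 5, 6, 7, 8]

# Row i is the original list rotated left by i positions.
rows = [mylist[i:] + mylist[:i] for i in range(len(mylist))]

# One column per list position, labelled A..H as in the question.
df = pd.DataFrame(rows, columns=list("ABCDEFGH"))
print(df)
```

Because every row is built from the original list rather than from the previous row, no state has to be carried between iterations.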

Use this vectorized numpy solution, with the shifts created by np.arange:
import numpy as np
import pandas as pd
mylist = [1, 2, 3, 4, 5, 6, 7, 8]
a = np.array(mylist)
rolls = np.arange(0, -8, -1)
print (rolls)
[ 0 -1 -2 -3 -4 -5 -6 -7]
df = pd.DataFrame(a[(np.arange(len(a))[:,None]-rolls) % len(a)],
                  columns=list('ABCDEFGH'))
print (df)
A B C D E F G H
0 1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8 1
2 3 4 5 6 7 8 1 2
3 4 5 6 7 8 1 2 3
4 5 6 7 8 1 2 3 4
5 6 7 8 1 2 3 4 5
6 7 8 1 2 3 4 5 6
7 8 1 2 3 4 5 6 7
If you need a loop solution (slower), it is possible to use numpy.roll:
mylist = [1, 2, 3, 4, 5, 6, 7, 8]
rolls = np.arange(0, -8, -1)
df = pd.DataFrame([np.roll(mylist, i) for i in rolls],
                  columns=list('ABCDEFGH'))
print (df)
A B C D E F G H
0 1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8 1
2 3 4 5 6 7 8 1 2
3 4 5 6 7 8 1 2 3
4 5 6 7 8 1 2 3 4
5 6 7 8 1 2 3 4 5
6 7 8 1 2 3 4 5 6
7 8 1 2 3 4 5 6 7

Try this (the := operator requires Python 3.8 or later):
mylist = [1, 2, 3, 4, 5, 6, 7, 8]
ar = np.roll(np.array(mylist), 1)
data = [ar := np.roll(ar, -1) for _ in range(ar.size)]
df = pd.DataFrame(data, columns=[*'ABCDEFGH'])
print(df)
>>>
A B C D E F G H
0 1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8 1
2 3 4 5 6 7 8 1 2
3 4 5 6 7 8 1 2 3
4 5 6 7 8 1 2 3 4
5 6 7 8 1 2 3 4 5
6 7 8 1 2 3 4 5 6
7 8 1 2 3 4 5 6 7
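For a version that follows the "pop the first element and append it to the end" wording of the question literally, the standard library's collections.deque can do the rotation in place. The following is a sketch only; the helper name `rotations_with_deque` is made up for illustration.

```python
from collections import deque


def rotations_with_deque(items):
    """Return all left rotations of items, starting with the unrotated list."""
    d = deque(items)
    rows = []
    for _ in range(len(d)):
        rows.append(list(d))  # snapshot of the current state
        d.rotate(-1)          # move the first element to the end
    return rows


rows = rotations_with_deque([1, 2, 3, 4, 5, 6, 7, 8])
for row in rows:
    print(row)
```

The resulting list of rows can be passed to pd.DataFrame in the same way as in the answers above.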

Related

How to keep the first window constant in sliding window?

I am using the following code to apply a sliding window on time-series data. I want to set up my first window as fixed and then apply the sliding window as shown below in the desired output.
df = pd.DataFrame({'B': [0, 1, 2, 3, 4, 5, 6,7,8,9,10]})
def sliding_window(data, size):
    return [ data[x:x+size] for x in range( len(data) - size + 1 ) ]
sliding_window(df, 7)
output
[ B
0 0
1 1
2 2
3 3
4 4
5 5
6 6,
B
1 1
2 2
3 3
4 4
5 5
6 6
7 7,
B
2 2
3 3
4 4
5 5
6 6
7 7
8 8,
B
3 3
4 4
5 5
6 6
7 7
8 8
9 9,
B
4 4
5 5
6 6
7 7
8 8
9 9
10 10]
Desired output
Example:
I am using a fixed window of size 5 here, and it should always be the first window. The sliding window works as before, except that it starts from the first window, like the left figure in the images.
[ B
0 0
1 1
2 2
3 3
4 4,
B
0 0
1 1
2 2
3 3
4 4
5 5,
B
0 0
1 1
2 2
3 3
4 4
5 5
6 6,
B
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7,
B
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8,
B
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9,
B
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10]
Try this:
def rolling_window_maybe(data, initial_size: int):
    return [ data[:initial_size + x] for x in range( len(data) - initial_size + 1 ) ]
For example:
data = [1,2,3,4]
size = 2
data[:size + 0] == [1,2]
data[:size + 1] == [1,2,3]
data[:size + 2] == [1,2,3,4]
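The idea above can be checked against the numbers in the question. This is a small sketch, assuming the last window should cover the whole input as in the desired output; the function name `expanding_windows` is hypothetical. It relies only on len() and slicing, so it should behave the same on a list and on a DataFrame.

```python
def expanding_windows(data, initial_size):
    """Return windows that all start at position 0 and grow by one element.

    The first window has initial_size elements and the last one is the
    whole input.
    """
    return [data[:end] for end in range(initial_size, len(data) + 1)]


values = list(range(11))           # same values as column B in the question
windows = expanding_windows(values, 5)

for window in windows:
    print(window)
```

With 11 values and an initial size of 5 this yields 7 windows, matching the 7 frames in the desired output.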

Column selection with iloc, with both individual indices and ranges

I wonder why this line returns "invalid syntax", and what's the correct syntax to use for selecting both isolated columns and ranges in one go:
X = f1.iloc[:, [2,5,[10:19]]].values
Btw the same happens with:
X = f1.iloc[:, [2,5,10:19]].values
Thanks.
The second form is nearly right; you only need numpy.r_ to concatenate the indices:
np.random.seed(2019)
f1 = pd.DataFrame(np.random.randint(10, size=(5, 25))).add_prefix('a')
print(f1)
a0 a1 a2 a3 a4 a5 ... a19 a20 a21 a22 a23 a24
0 8 2 5 8 6 8 ... 0 1 6 0 2 6
1 6 3 1 3 5 0 ... 4 8 1 0 6 1
2 8 2 3 0 9 2 ... 7 1 0 7 4 4
3 7 0 8 9 0 7 ... 3 0 8 6 0 2
4 7 3 2 4 9 9 ... 0 8 8 1 4 9
X = f1.iloc[:, np.r_[2,5,10:19]].values
print(X)
[[5 8 5 3 0 2 5 7 8 5 4]
[1 0 2 9 8 3 7 7 7 0 3]
[3 2 6 2 1 1 1 1 8 6 2]
[8 7 7 8 0 5 7 4 1 1 4]
[2 9 7 2 9 3 8 5 2 5 5]]
It is also possible to convert the values to a numpy array first, in which case iloc is not necessary:
X = f1.values[:, np.r_[2,5,10:19]]
print(X)
[[5 8 5 3 0 2 5 7 8 5 4]
[1 0 2 9 8 3 7 7 7 0 3]
[3 2 6 2 1 1 1 1 8 6 2]
[8 7 7 8 0 5 7 4 1 1 4]
[2 9 7 2 9 3 8 5 2 5 5]]
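To see why this works, it can help to look at what np.r_ produces on its own: slice notation inside its square brackets is expanded and concatenated with the single indices into one integer array. A short sketch, using a made-up frame filled with consecutive numbers so the selected columns are easy to recognise:

```python
import numpy as np
import pandas as pd

# np.r_ turns a mix of scalars and slices into one flat index array.
cols = np.r_[2, 5, 10:19]
print(cols.tolist())  # [2, 5, 10, 11, 12, 13, 14, 15, 16, 17, 18]

# Illustrative frame: 5 rows, 25 columns named a0..a24.
f1 = pd.DataFrame(np.arange(5 * 25).reshape(5, 25)).add_prefix("a")

# The index array is then an ordinary list-like column selector for iloc.
X = f1.iloc[:, cols].to_numpy()
print(X.shape)  # (5, 11)
```

As with ordinary slices, the end of 10:19 is exclusive, so column 19 is not selected.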

Python how to subset a data frame with just the column indices?

I have a huge dataframe with 282 columns and 500K rows. I wish to remove a list of columns from the dataframe using the column indices. The below code works for sequential columns.
df1 = df.ix[:,[0:2]]
The problem is that my column indices are not sequential.
For example, I want to remove columns 0,1,2 and 5 from df. I tried the following code:
df1 = df.ix[:,[0:2,5]]
I am getting the following error:
SyntaxError: invalid syntax
Any suggestions?
Select columns other than 0,1,2,5 with .iloc (.ix was removed in pandas 1.0):
df.iloc[:, [3,4]+list(range(6,282))]
Or a little more dynamic:
df.iloc[:, [3,4]+list(range(6,df.shape[1]))]
Is it a numpy array you've got? Try
df1 = df.ix[:, (0,1,2,5)]
or
df1 = df.ix[:, [0,1,2,5]]
or
data[:, [i for i in range(3)]+[5]]
Use np.r_[...] for concatenating slices along the first axis
DF:
In [98]: df = pd.DataFrame(np.random.randint(10, size=(5, 12)))
In [99]: df
Out[99]:
0 1 2 3 4 5 6 7 8 9 10 11
0 0 7 2 9 9 0 7 3 5 8 8 1
1 4 9 0 4 0 2 4 8 8 7 1 9
2 2 1 1 2 7 4 4 6 1 2 9 8
3 1 5 0 8 2 2 4 1 1 4 8 4
4 4 6 3 2 2 4 1 6 2 6 9 0
Solution:
In [107]: df.iloc[:, np.r_[3:5, 6:df.shape[1]]]
Out[107]:
3 4 6 7 8 9 10 11
0 9 9 7 3 5 8 8 1
1 4 0 4 8 8 7 1 9
2 2 7 4 6 1 2 9 8
3 8 2 4 1 1 4 8 4
4 2 2 1 6 2 6 9 0
In [108]: np.r_[3:5, 6:df.shape[1]]
Out[108]: array([ 3, 4, 6, 7, 8, 9, 10, 11])
or
In [110]: df.columns.difference([0,1,2,5])
Out[110]: Int64Index([3, 4, 6, 7, 8, 9, 10, 11], dtype='int64')
In [111]: df[df.columns.difference([0,1,2,5])]
Out[111]:
3 4 6 7 8 9 10 11
0 9 9 7 3 5 8 8 1
1 4 0 4 8 8 7 1 9
2 2 7 4 6 1 2 9 8
3 8 2 4 1 1 4 8 4
4 2 2 1 6 2 6 9 0
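Since the goal is to remove columns by position, another option is to state the positions to drop rather than the ones to keep. The sketch below shows two ways of doing that on a small made-up frame; the names `drop_positions` and `keep` are illustrative only.

```python
import numpy as np
import pandas as pd

# Illustrative frame with 12 integer-labelled columns.
df = pd.DataFrame(np.arange(5 * 12).reshape(5, 12))
drop_positions = [0, 1, 2, 5]

# Option 1: translate positions to labels, then drop those labels.
df1 = df.drop(columns=df.columns[drop_positions])
print(df1.columns.tolist())  # [3, 4, 6, 7, 8, 9, 10, 11]

# Option 2: build a boolean mask over column positions and keep the rest.
keep = np.ones(df.shape[1], dtype=bool)
keep[drop_positions] = False
df2 = df.iloc[:, keep]
```

Option 1 assumes the column labels are unique; with duplicate labels, dropping by label would remove every column carrying that label, so the mask in option 2 is the safer choice there.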

Pandas: How to get max and min values and write for every row?

I have data like this:
>> df
A B C
0 1 5 1
1 1 7 1
2 1 6 1
3 1 7 1
4 2 5 1
5 2 8 1
6 2 6 1
7 3 7 1
8 3 9 1
9 4 6 1
10 4 7 1
11 4 1 1
I want to take the maximum and minimum values of column B for each group of equal values in column A, and write the results back to the original table. My code is:
df1 = df.groupby(['A']).B.transform(max)
df1 = df1.rename(columns={'B':'B_max'})
df2 = df.groupby.(['A']).B.transform(min)
df1 = df1.rename(columns={'B':'B_min'})
df3 = df.join(df1['B_max']).join(df2['B_min'])
This is the result.
A B C B_max B_min
0 1 5 1
1 1 7 1 7
2 1 6 1
3 1 4 1 4
4 2 5 1
5 2 8 1 8
6 2 6 1 6
7 3 7 1 7
8 3 9 1 9
9 4 6 1
10 4 7 1 7
11 4 1 1 1
But I want the table to look like this:
A B C B_max B_min
0 1 5 1 7 4
1 1 7 1 7 4
2 1 6 1 7 4
3 1 4 1 7 4
4 2 5 1 8 6
5 2 8 1 8 6
6 2 6 1 8 6
7 3 7 1 9 7
8 3 9 1 9 7
9 4 6 1 7 1
10 4 7 1 7 1
11 4 1 1 7 1
How should I change the code so that the result looks like this?
I think you only need to assign the values to new columns, because transform returns a Series with the same length as df:
df = pd.DataFrame({
'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
'B': [5, 7, 6, 7, 5, 8, 6, 7, 9, 6, 7, 1],
'C': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]})
print (df)
A B C
0 1 5 1
1 1 7 1
2 1 6 1
3 1 7 1
4 2 5 1
5 2 8 1
6 2 6 1
7 3 7 1
8 3 9 1
9 4 6 1
10 4 7 1
11 4 1 1
df['B_max'] = df.groupby(['A']).B.transform(max)
df['B_min'] = df.groupby(['A']).B.transform(min)
print (df)
A B C B_max B_min
0 1 5 1 7 5
1 1 7 1 7 5
2 1 6 1 7 5
3 1 7 1 7 5
4 2 5 1 8 5
5 2 8 1 8 5
6 2 6 1 8 5
7 3 7 1 9 7
8 3 9 1 9 7
9 4 6 1 7 1
10 4 7 1 7 1
11 4 1 1 7 1
Or create the groupby object once and reuse it:
g = df.groupby('A').B
df['B_max'] = g.transform(max)
df['B_min'] = g.transform(min)
print (df)
A B C B_max B_min
0 1 5 1 7 5
1 1 7 1 7 5
2 1 6 1 7 5
3 1 7 1 7 5
4 2 5 1 8 5
5 2 8 1 8 5
6 2 6 1 8 5
7 3 7 1 9 7
8 3 9 1 9 7
9 4 6 1 7 1
10 4 7 1 7 1
11 4 1 1 7 1
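transform also accepts the aggregation name as a string, which avoids passing the Python builtins max and min (recent pandas versions may emit a warning for those). The following is a sketch on a shortened, made-up version of the data, not the exact frame from the question:

```python
import pandas as pd

# Shortened illustrative data: three groups in column A.
df = pd.DataFrame({"A": [1, 1, 2, 2, 3],
                   "B": [5, 7, 8, 6, 9]})

g = df.groupby("A")["B"]

# transform broadcasts each group's result back to every row of that
# group, so the output lines up with the original index.
df["B_max"] = g.transform("max")
df["B_min"] = g.transform("min")
print(df)
```

Because the result already has one value per original row, no rename or join is needed.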

Python Reduce conditional expression

I have 9 variables a,b,c,d,e,f,g,h,i and I loop over them in 9 nested for loops, each from 0 to 9, although the ranges may vary.
I want all sequences abcdefghi in which no number is repeated.
Right now I have this, below:
for a in range(0, 9):
    for b in range(0,9): #it doesn't have to start from 0
        ....
            for i in range(0, 9):
                if a != b and a != c ... a != i
                   b != c and b != d ... b != i
                   c != d and c != e ... c != i
                   ... h != i:
                    print (a,b,c,d,e,f,g,h,i)
There are 9! = 362880 of them,
But how can I reduce the conditional expression? And what if the ranges for the for loops are different?
Thanks in advance!
You can simply do this with the itertools module:
from itertools import permutations
for arrangement in permutations('abcdefghi', 9):
    print(''.join(arrangement))
from itertools import permutations
for perm in permutations(range(1, 10), 9):
    print(" ".join(str(i) for i in perm))
which gives
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 9 8
1 2 3 4 5 6 8 7 9
1 2 3 4 5 6 8 9 7
1 2 3 4 5 6 9 7 8
1 2 3 4 5 6 9 8 7
# ... etc - 9! = 362880 permutations
What if I want sequences abcdefghi such that a,b,c,e,g take values from 0 to 9, and d,f,h,i are in the range 1 to 5?
This is a bit more complicated, but still achievable. It is easier to pick the values in d..i first:
from itertools import permutations
for d,f,h,i,unused in permutations([1,2,3,4,5], 5):
    for a,b,c,e,g in permutations([unused,6,7,8,9], 5):
        print(a,b,c,d,e,f,g,h,i)
which gives
5 6 7 1 8 2 9 3 4
5 6 7 1 9 2 8 3 4
5 6 8 1 7 2 9 3 4
5 6 8 1 9 2 7 3 4
5 6 9 1 7 2 8 3 4
5 6 9 1 8 2 7 3 4
5 7 6 1 8 2 9 3 4
5 7 6 1 9 2 8 3 4
5 7 8 1 6 2 9 3 4
5 7 8 1 9 2 6 3 4
# ... etc - 5! * 5! = 14400 permutations
For the general case (e.g. Sudoku) you need a more general solution: a constraint solver like python-constraint (for an intro see the python-constraint home page).
Then your solution starts to look like
from constraint import Problem, AllDifferentConstraint
p = Problem()
p.addVariables("abceg", list(range(1,10)))
p.addVariables("dfhi", list(range(1, 6)))
p.addConstraint(AllDifferentConstraint())
for sol in p.getSolutionIter():
    print("{a} {b} {c} {d} {e} {f} {g} {h} {i}".format(**sol))
which gives
9 8 7 4 6 3 5 2 1
9 8 7 4 5 3 6 2 1
9 8 6 4 7 3 5 2 1
9 8 6 4 5 3 7 2 1
9 8 5 4 6 3 7 2 1
9 8 5 4 7 3 6 2 1
9 7 8 4 5 3 6 2 1
9 7 8 4 6 3 5 2 1
9 7 6 4 8 3 5 2 1
9 7 6 4 5 3 8 2 1
9 7 5 4 6 3 8 2 1
# ... etc - 14400 solutions
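If installing a constraint solver is not an option, the same "all different, with a separate range per variable" problem can be sketched in plain Python with a small backtracking generator. This is only an illustration, not what python-constraint does internally; the function name `distinct_assignments` is made up, and the ranges 1 to 9 and 1 to 5 are assumed to match the solver example above.

```python
def distinct_assignments(domains):
    """Yield tuples with one value per domain such that no value repeats."""
    def extend(prefix, used):
        if len(prefix) == len(domains):
            yield tuple(prefix)
            return
        for value in domains[len(prefix)]:
            if value not in used:        # prune as soon as a value repeats
                used.add(value)
                prefix.append(value)
                yield from extend(prefix, used)
                prefix.pop()
                used.remove(value)

    yield from extend([], set())


wide = range(1, 10)    # a, b, c, e, g
narrow = range(1, 6)   # d, f, h, i

# One domain per variable, in the order a b c d e f g h i.
domains = [wide, wide, wide, narrow, wide, narrow, wide, narrow, narrow]

count = sum(1 for _ in distinct_assignments(domains))
print(count)  # 14400
```

Checking for repeats while the tuple is being built replaces the long chain of != comparisons in the question, and changing a variable's range only means changing its entry in `domains`.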
