Selecting/Manipulating cells based on their location in the dataframe - python

I have a dataframe as below
A B C D E F G H I
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
I want to multiply every 3rd column, starting from the 2nd column, in the last 2 rows by 5 to get the output below.
How can I accomplish this?
A B C D E F G H I
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 10 3 4 25 6 7 40 9
1 10 3 4 25 6 7 40 9
I am able to select the cells I need with df.iloc[-2:,1::3]
which results in the df below, but I am not able to proceed further.
B E H
2 5 8
2 5 8
I know that I could select the same cells with loc instead of iloc, after which the calculation would be straightforward, but I am not able to figure out how.
The column names and cell values CANNOT be used, since these change (the df here is just dummy data).

You can assign back to the same selection of rows/columns:
df.iloc[-2:,1::3] = df.iloc[-2:,1::3].mul(5)
#alternative
#df.iloc[-2:,1::3] = df.iloc[-2:,1::3] * 5
print (df)
   A   B  C  D   E  F  G   H  I
0  1   2  3  4   5  6  7   8  9
1  1   2  3  4   5  6  7   8  9
2  1   2  3  4   5  6  7   8  9
3  1   2  3  4   5  6  7   8  9
4  1   2  3  4   5  6  7   8  9
5  1  10  3  4  25  6  7  40  9
6  1  10  3  4  25  6  7  40  9
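The question also asks how to do the same with loc. One possible sketch, assuming the dummy frame from the question: translate the positions into labels with df.index and df.columns, then pass those labels to loc. The names row_labels and col_labels are made up for this illustration.

```python
import pandas as pd

# Dummy frame from the question: 7 identical rows, columns A..I
df = pd.DataFrame([list(range(1, 10))] * 7, columns=list("ABCDEFGHI"))

# Turn positions into labels, so no column name or value is hard-coded
row_labels = df.index[-2:]      # labels of the last two rows
col_labels = df.columns[1::3]   # every 3rd column, starting at position 1

df.loc[row_labels, col_labels] = df.loc[row_labels, col_labels] * 5

print(df.tail(3))
```

Because the labels are derived from positions, this keeps working when the column names or the values change.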

Related

Update values in dataframe based on dictionary and condition

I have a dataframe and a dictionary that contains some of the columns of the dataframe and some values. I want to update the dataframe based on the dictionary values, and pick the higher value.
>>> df1
   a  b  c  d  e  f
0  4  2  6  2  8  1
1  3  6  7  7  8  5
2  2  1  1  6  8  7
3  1  2  7  3  3  1
4  1  7  2  6  7  6
5  4  8  8  2  2  1
and the dictionary is
compare = {'a':4, 'c':7, 'e':3}
So I want to check the values in columns ['a','c','e'] and replace with the value in the dictionary, if it is higher.
What I have tried is this:
comp = pd.DataFrame(pd.Series(compare).reindex(df1.columns).fillna(0)).T
df1[df1.columns] = df1.apply(lambda x: np.where(x>comp, x, comp)[0] ,axis=1)
Expected Output:
>>> df1
   a  b  c  d  e  f
0  4  2  7  2  8  1
1  4  6  7  7  8  5
2  4  1  7  6  8  7
3  4  2  7  3  3  1
4  4  7  7  6  7  6
5  4  8  8  2  3  1
Another possible solution, based on numpy:
cols = list(compare.keys())
df[cols] = np.maximum(df[cols].values, np.array(list(compare.values())))
Output:
   a  b  c  d  e  f
0  4  2  7  2  8  1
1  4  6  7  7  8  5
2  4  1  7  6  8  7
3  4  2  7  3  3  1
4  4  7  7  6  7  6
5  4  8  8  2  3  1
limits = df.columns.map(compare).to_series(index=df.columns)
new = df.mask(df < limits, limits, axis=1)
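The first line obtains a Series whose index is the columns of df and whose values come from the dictionary (NaN for columns the dictionary does not contain).
The second line checks whether the frame's values are less than these limits; where they are, the limit is used; otherwise the value is kept as is,
to get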
>>> new
   a  b  c  d  e  f
0  4  2  7  2  8  1
1  4  6  7  7  8  5
2  4  1  7  6  8  7
3  4  2  7  3  3  1
4  4  7  7  6  7  6
5  4  8  8  2  3  1
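For anyone who wants to check the NumPy approach end to end, here is a minimal self-contained sketch using the sample data from the question. The name floors is introduced only for this example, and the thresholds are read in the same order as cols so that each column is compared with its own value.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "a": [4, 3, 2, 1, 1, 4],
    "b": [2, 6, 1, 2, 7, 8],
    "c": [6, 7, 1, 7, 2, 8],
    "d": [2, 7, 6, 3, 6, 2],
    "e": [8, 8, 8, 3, 7, 2],
    "f": [1, 5, 7, 1, 6, 1],
})
compare = {"a": 4, "c": 7, "e": 3}

cols = list(compare)                           # columns to update
floors = np.array([compare[c] for c in cols])  # one threshold per column

# Element-wise maximum; floors is broadcast across the rows
df1[cols] = np.maximum(df1[cols].to_numpy(), floors)

print(df1)
```

Columns that are not in the dictionary (b, d, f here) are never touched, so no fill value is needed for them.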

Assign 1 value to random sample of group where the sample size is equal to the value of another column

I want to randomly assign the value 1 in the IsShade column (the output) so that, within each group, 1 is assigned exactly D times (the Shading column, e.g. 2, 5 or 3 times) across the group's E rows (the Total column, e.g. 6, 8 or 5 rows).
The dataset has 1 million rows; a sample of the input is shown below.
Input:
In[1]:
    Sr  Series  Parallel  Shading  Total  Cell
 0   0       3         2        2      6     1
 1   1       3         2        2      6     2
 2   2       3         2        2      6     3
 3   3       3         2        2      6     4
 4   4       3         2        2      6     5
 5   5       3         2        2      6     6
 6   6       4         2        5      8     1
 7   7       4         2        5      8     2
 8   8       4         2        5      8     3
 9   9       4         2        5      8     4
10  10       4         2        5      8     5
11  11       4         2        5      8     6
12  12       4         2        5      8     7
13  13       4         2        5      8     8
14  14       5         1        3      5     1
15  15       5         1        3      5     2
16  16       5         1        3      5     3
17  17       5         1        3      5     4
18  18       5         1        3      5     5
Any help with how to achieve this, or Python code that does it, would be appreciated. Thank you.
Example Expected Output:
Out[1]:
    Sr  Series  Parallel  Shading  Total  Cell  IsShade
 0   0       3         2        2      6     1        0
 1   1       3         2        2      6     2        0
 2   2       3         2        2      6     3        1
 3   3       3         2        2      6     4        0
 4   4       3         2        2      6     5        0
 5   5       3         2        2      6     6        1
 6   6       4         2        5      8     1        1
 7   7       4         2        5      8     2        0
 8   8       4         2        5      8     3        1
 9   9       4         2        5      8     4        1
10  10       4         2        5      8     5        0
11  11       4         2        5      8     6        0
12  12       4         2        5      8     7        1
13  13       4         2        5      8     8        1
14  14       5         1        3      5     1        0
15  15       5         1        3      5     2        1
16  16       5         1        3      5     3        0
17  17       5         1        3      5     4        1
18  18       5         1        3      5     5        1
You can create a new column by combining .groupby with .sample, which randomly selects x rows from each group, where x is the integer in the Shading column. From there, I returned True or False and converted to an integer (True becomes 1 and False becomes 0 with .astype(int)):
s = df['Series'].ne(df['Series'].shift()).cumsum() #s is a unique identifier group
df['IsShade'] = (df.groupby(s, group_keys=False)
                   .apply(lambda x: x['Shading'].sample(x['Shading'].iloc[0])) > 0)
df['IsShade'] = df['IsShade'].fillna(False).astype(int)
df
Out[1]:
    Sr  Series  Parallel  Shading  Total  Cell  IsShade
 0   0       3         2        2      6     1        0
 1   1       3         2        2      6     2        0
 2   2       3         2        2      6     3        0
 3   3       3         2        2      6     4        0
 4   4       3         2        2      6     5        1
 5   5       3         2        2      6     6        1
 6   6       4         2        5      8     1        1
 7   7       4         2        5      8     2        1
 8   8       4         2        5      8     3        0
 9   9       4         2        5      8     4        0
10  10       4         2        5      8     5        1
11  11       4         2        5      8     6        1
12  12       4         2        5      8     7        1
13  13       4         2        5      8     8        0
14  14       5         1        3      5     1        1
15  15       5         1        3      5     2        0
16  16       5         1        3      5     3        0
17  17       5         1        3      5     4        1
18  18       5         1        3      5     5        1
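Since the real dataset has 1 million rows, a variant without a Python-level apply may be worth trying. The following is only a sketch of a different technique than the answer above (a random ranking within each group, not .sample); it assumes Shading never exceeds the group size, and the names rng, group_id, noise and rank_in_group are made up for the example.

```python
import numpy as np
import pandas as pd

# Reduced version of the sample input (only the columns that matter here)
df = pd.DataFrame({
    "Series":  [3] * 6 + [4] * 8 + [5] * 5,
    "Shading": [2] * 6 + [5] * 8 + [3] * 5,
    "Total":   [6] * 6 + [8] * 8 + [5] * 5,
})

rng = np.random.default_rng(0)  # seed chosen arbitrarily, for repeatability

# Consecutive runs of equal Series values form one group
group_id = df["Series"].ne(df["Series"].shift()).cumsum()

# Give every row a random number, rank the rows within each group,
# and shade the rows whose rank is at most the group's Shading value
noise = pd.Series(rng.random(len(df)), index=df.index)
rank_in_group = noise.groupby(group_id).rank(method="first")
df["IsShade"] = (rank_in_group <= df["Shading"]).astype(int)

# Number of shaded rows per group; should equal Shading for each group
print(df.groupby(group_id)["IsShade"].sum())
```

Each group receives exactly Shading ones because the ranks within a group are a random ordering of 1..n.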

How to ensure consistent spacing in creating number pyramid (Python)

I'm trying to create a number pyramid in python, and none of the solutions I've found on Stack Overflow are quite what I'm looking for. Here is the code I have so far:
for i in range(1, height+1):
    for j in range(1, height-i+1):
        if j > 9:
            print(len(str(j)) * " ", end=" ")
        else:
            print(" ", end=" ")
    for j in range(i, 0, -1):
        print(j, end=" ")
    for j in range(2, i + 1):
        print(j, end=" ")
    print()
And here is the output:
1
2 1 2
3 2 1 2 3
4 3 2 1 2 3 4
5 4 3 2 1 2 3 4 5
6 5 4 3 2 1 2 3 4 5 6
7 6 5 4 3 2 1 2 3 4 5 6 7
8 7 6 5 4 3 2 1 2 3 4 5 6 7 8
9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9
10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10
11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11
12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12
13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13
14 13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
From what I can see, the code works fine with heights <= 9, but once double digits come in, the alignment fails. I also need to ensure that the spacing between each number is consistent (ONE space in between each number), but the workarounds that I've looked at involve adding more than one space.
Please let me know if there is anything I should clarify, and thank you in advance for your time!
You can use string formatting to define a fixed width for a field, padded by either whitespace or zeroes.
field_len = len(str(height))
for i in range(1, height+1):
    for j in range(1, height-i+1):
        print(" " * field_len, end=" ")
    for j in range(i, 0, -1):
        print(f"{j:{field_len}}", end=" ")
    for j in range(2, i + 1):
        print(f"{j:{field_len}}", end=" ")
    print()
which produces
1
2 1 2
3 2 1 2 3
4 3 2 1 2 3 4
5 4 3 2 1 2 3 4 5
6 5 4 3 2 1 2 3 4 5 6
7 6 5 4 3 2 1 2 3 4 5 6 7
8 7 6 5 4 3 2 1 2 3 4 5 6 7 8
9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9
10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10
11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11
12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12
13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13
14 13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
and which auto-adjusts the spacing when the number of digits changes.
This keeps the slope of the pyramid the same, though the interior numbers appear more sparsely spaced, as each one is padded to two characters.
A solution to that is just to use the width of the current number as the number of spaces - which we can do by changing the arguments to range() where it prints the spaces, to actually count down from the height.
for i in range(1, height+1):
    for j in range(i, height):
        print(" " * len(str(j + 1)), end=" ")
    for j in range(i, 0, -1):
        print(j, end=" ")
    for j in range(2, i + 1):
        print(j, end=" ")
    print()
This produces a pyramid with uneven slopes but even spacing.
1
2 1 2
3 2 1 2 3
4 3 2 1 2 3 4
5 4 3 2 1 2 3 4 5
6 5 4 3 2 1 2 3 4 5 6
7 6 5 4 3 2 1 2 3 4 5 6 7
8 7 6 5 4 3 2 1 2 3 4 5 6 7 8
9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9
10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10
11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11
12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12
13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13
14 13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Just for completeness, I will provide another approach to this problem.
The main idea is to keep track of the length of the current line and use rjust to pad with whatever delimiter you wish (I chose the default whitespace).
height = 16
max_line_len = len(' '.join([str(i) for i in range(height,0,-1)] + [str(i) for i in range(2,height+1)]))
half_max_line_len = int((max_line_len+1)/2)
list_of_nums = [str(1)]
print('creating pyramid...')
for num in range(1, height+1):
    print(' '.join(list_of_nums).rjust(half_max_line_len))
    list_of_nums = [str(num+1)] + list_of_nums + [str(num+1)]
    half_max_line_len += len(str(num+1))+1
output:
creating pyramid...
1
2 1 2
3 2 1 2 3
4 3 2 1 2 3 4
5 4 3 2 1 2 3 4 5
6 5 4 3 2 1 2 3 4 5 6
7 6 5 4 3 2 1 2 3 4 5 6 7
8 7 6 5 4 3 2 1 2 3 4 5 6 7 8
9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9
10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10
11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11
12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12
13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13
14 13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
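Yet another variation, offered only as a sketch: build every row as a single-space-joined string and centre it on the width of the last row with str.center. The function name pyramid_rows is made up for this example, and it assumes height is at least 1. Like the rjust approach, this keeps exactly one space between numbers, so the slopes become uneven once two-digit numbers appear.

```python
def pyramid_rows(height):
    """Return the pyramid as a list of strings, one per row."""
    rows = []
    for i in range(1, height + 1):
        numbers = list(range(i, 0, -1)) + list(range(2, i + 1))
        rows.append(" ".join(str(n) for n in numbers))
    width = len(rows[-1])  # the last row is the widest
    # Every row has an odd length, so center() pads both sides equally
    # and the 1s end up in the same column; trailing spaces are dropped.
    return [row.center(width).rstrip() for row in rows]


for line in pyramid_rows(12):
    print(line)
```

Returning the rows as a list rather than printing inside the function makes the spacing easy to check programmatically.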

Slicing Pandas data frame into two parts

Actually I thought this should be very easy. I have a pandas data frame with, let's say, 100 columns and I want a subset containing columns 0:30 and 77:99.
What I've done so far is:
df_1 = df.iloc[:,0:30]
df_2 = df.iloc[:,77:99]
df2 = pd.concat([df_1 , df_2], axis=1, join_axes=[df_1 .index])
Is there an easier way?
Use numpy.r_ to concatenate the indices:
df2 = df.iloc[:, np.r_[0:30, 77:99]]
Sample:
df = pd.DataFrame(np.random.randint(10, size=(5,15)))
print (df)
   0  1  2  3  4  5  6  7  8  9  10  11  12  13  14
0  6  2  9  5  4  6  9  9  7  9   6   6   1   0   6
1  5  6  7  0  7  8  7  9  4  8   1   2   0   8   5
2  5  6  1  6  7  6  1  5  5  4   6   3   2   3   0
3  4  3  1  3  3  8  3  6  7  1   8   6   2   1   8
4  3  8  2  3  7  3  6  4  4  6   2   6   9   4   9
df2 = df.iloc[:, np.r_[0:3, 7:9]]
print (df2)
   0  1  2  7  8
0  6  2  9  9  7
1  5  6  7  9  4
2  5  6  1  5  5
3  4  3  1  6  7
4  3  8  2  4  4
df_1 = df.iloc[:,0:3]
df_2 = df.iloc[:,7:9]
# join_axes was removed in pandas 1.0; both pieces already share df's index
df2 = pd.concat([df_1, df_2], axis=1)
print (df2)
   0  1  2  7  8
0  6  2  9  9  7
1  5  6  7  9  4
2  5  6  1  5  5
3  4  3  1  6  7
4  3  8  2  4  4
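To see what np.r_ actually hands to iloc, here is a small sketch on a frame with 100 columns, as in the question. The frame contents and the name positions are placeholders for illustration.

```python
import numpy as np
import pandas as pd

# Placeholder frame: 2 rows, 100 columns labelled 0..99
df = pd.DataFrame(np.arange(200).reshape(2, 100))

# np.r_ turns the slices into one flat integer array:
# 0..29 followed by 77..98 (the stop value is excluded, as in any slice)
positions = np.r_[0:30, 77:99]
df2 = df.iloc[:, positions]

print(positions[:3], positions[-3:])
print(df2.shape)
```

Note that 77:99 stops at column 98; if the last column (position 99) should be included, the second slice would need to be 77:100.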

Python Reduce conditional expression

I have 9 variables a,b,c,d,e,f,g,h,i and I loop over them in 9 nested for loops from 0 to 9, but the ranges may vary.
I want all sequences abcdefghi in which no number is repeated.
Right now I have this, below:
for a in range(0, 9):
    for b in range(0,9): #it doesn't have to start from 0
        ....
            for i in range(0, 9):
                if a != b and a != c ... a != i
                   b != c and b != d ... b != i
                   c != d and c != e ... c != i
                   ... h != i:
                    print (a,b,c,d,e,f,g,h,i)
There are 9! = 362880 of them,
But how can I reduce the conditional expression? And what if the ranges for the for loops are different?
Thanks in advance!
You can simply do this with the itertools module:
from itertools import permutations
for arrangement in permutations('abcdefghi', 9):
    print(''.join(arrangement))
from itertools import permutations
for perm in permutations(range(1, 10), 9):
    print(" ".join(str(i) for i in perm))
which gives
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 9 8
1 2 3 4 5 6 8 7 9
1 2 3 4 5 6 8 9 7
1 2 3 4 5 6 9 7 8
1 2 3 4 5 6 9 8 7
# ... etc - 9! = 362880 permutations
What if I want sequences abcdefghi such that a, b, c, e, g take values from 0 to 9, and d, f, h, i are in the range 1 to 5?
This is a bit more complicated, but still achievable. It is easier to pick the values for d, f, h and i first:
from itertools import permutations
for d,f,h,i,unused in permutations([1,2,3,4,5], 5):
    for a,b,c,e,g in permutations([unused,6,7,8,9], 5):
        print(a,b,c,d,e,f,g,h,i)
which gives
5 6 7 1 8 2 9 3 4
5 6 7 1 9 2 8 3 4
5 6 8 1 7 2 9 3 4
5 6 8 1 9 2 7 3 4
5 6 9 1 7 2 8 3 4
5 6 9 1 8 2 7 3 4
5 7 6 1 8 2 9 3 4
5 7 6 1 9 2 8 3 4
5 7 8 1 6 2 9 3 4
5 7 8 1 9 2 6 3 4
# ... etc - 5! * 5! = 14400 permutations
For the general case (e.g. Sudoku) you need a more general solution: a constraint solver like python-constraint (for an intro, see the python-constraint home page).
Then your solution starts to look like
from constraint import Problem, AllDifferentConstraint
p = Problem()
p.addVariables("abceg", list(range(1,10)))
p.addVariables("dfhi", list(range(1, 6)))
p.addConstraint(AllDifferentConstraint())
for sol in p.getSolutionIter():
    print("{a} {b} {c} {d} {e} {f} {g} {h} {i}".format(**sol))
which gives
9 8 7 4 6 3 5 2 1
9 8 7 4 5 3 6 2 1
9 8 6 4 7 3 5 2 1
9 8 6 4 5 3 7 2 1
9 8 5 4 6 3 7 2 1
9 8 5 4 7 3 6 2 1
9 7 8 4 5 3 6 2 1
9 7 8 4 6 3 5 2 1
9 7 6 4 8 3 5 2 1
9 7 6 4 5 3 8 2 1
9 7 5 4 6 3 8 2 1
# ... etc - 14400 solutions
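If installing a constraint solver is not an option, the "different ranges per variable" case can also be handled with a small backtracking generator in plain Python. This is only a sketch; the function name distinct_assignments and the domains list are made up for the example, with the ranges taken from the python-constraint snippet above.

```python
def distinct_assignments(domains):
    """Yield tuples that take one value from each domain, with no repeats."""
    def extend(prefix, used):
        if len(prefix) == len(domains):
            yield tuple(prefix)
            return
        for value in domains[len(prefix)]:
            if value not in used:          # skip values already taken
                used.add(value)
                prefix.append(value)
                yield from extend(prefix, used)
                prefix.pop()               # undo the choice and try the next
                used.remove(value)
    yield from extend([], set())


wide, narrow = range(1, 10), range(1, 6)
# Order a..i: a, b, c, e, g use the wide range; d, f, h, i the narrow one
domains = [wide, wide, wide, narrow, wide, narrow, wide, narrow, narrow]

count = sum(1 for _ in distinct_assignments(domains))
print(count)
```

The membership test on used replaces the long chain of != comparisons, and each variable simply gets its own range.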
