Randomly select cells in df pandas


From this pandas df
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
samples_indices = df.sample(frac=0.5, replace=False).index
df.loc[samples_indices] = 'X'
will assign 'X' to all columns in randomly selected rows corresponding to 50% of df, like so:
X X X X
1 1 1 1
X X X X
1 1 1 1
But how do I assign 'X' to 50% randomly selected cells in the df?
For example like this:
X X X 1
1 X 1 1
X X X 1
1 1 1 X

Use numpy and boolean indexing, for an efficient solution:
import numpy as np
df[np.random.choice([True, False], size=df.shape)] = 'X'
# with a custom probability:
N = 0.5
df[np.random.choice([True, False], size=df.shape, p=[N, 1-N])] = 'X'
Example output:
0 1 2 3
0 X 1 X X
1 X X 1 X
2 X X X 1
3 X X 1 X
If you need an exact proportion, you can use:
frac = 0.5
# the permutation holds each of 0..df.size-1 exactly once, so "<" marks a share of frac
df[np.random.permutation(df.size).reshape(df.shape) < df.size*frac] = 'X'
Example:
0 1 2 3
0 X 1 X 1
1 X 1 X 1
2 1 1 X 1
3 X X 1 X
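
The exact-proportion idea above can also be sketched by shuffling a boolean array that already contains the right number of True values. This is only an illustration under a few assumptions: the 4x4 frame of ones mirrors the question, the seed and the names `n_marked`, `flat_mask` and `marked` are made up for the example, and the frame is cast to object dtype first so that writing the string 'X' into integer columns does not depend on how a given pandas version handles mixed dtypes.

```python
import numpy as np
import pandas as pd

# Example frame mirroring the question: a 4x4 grid of ones.
df = pd.DataFrame(np.ones((4, 4), dtype=int))
frac = 0.5
rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable

# Build a flat mask with exactly round(size * frac) True values, then shuffle it.
n_marked = int(round(df.size * frac))
flat_mask = np.zeros(df.size, dtype=bool)
flat_mask[:n_marked] = True
mask = rng.permutation(flat_mask).reshape(df.shape)

# Cast to object so the columns can hold both numbers and the string 'X'.
marked = df.astype(object).mask(mask, "X")
print(marked)
```

Because the number of True values is fixed before shuffling, the count of marked cells does not vary from run to run; only their positions do.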

@mozway's answer sets each cell to 'X' with a given probability. If you instead want exactly 50% of the cells marked as 'X', you can do it like this (it assumes df.size is even):
import numpy as np
df[np.random.permutation(np.hstack([np.ones(df.size // 2), np.zeros(df.size // 2)])).astype(bool).reshape(df.shape)] = 'X'
Example output:
X X X 1
1 X 1 1
X X X 1
1 1 1 X

Create a MultiIndex Series with DataFrame.stack, keep a random share of it with Series.sample, and finally fill the removed values with X in Series.unstack and DataFrame.reindex:
N = 0.5
df = (df.stack().sample(frac=1-N).unstack(fill_value='X')
        .reindex(index=df.index, columns=df.columns, fill_value='X'))
print (df)
0 1 2 3
0 X X 1 1
1 X 1 X 1
2 1 X X X
3 1 1 1 X

Related

Splitting columns containing comma separated string to new row values

I have a data frame of the below format
variable val
0 'a','x','y' 10
I would like to unnest (explode) the data into the format below.
variable1 variable2 value
0 a x 10
1 a y 10
2 x y 10
I have tried using df.explode, but it does not give me the relation between x and y. My code is below. Can anyone guide me on how to proceed to get the x and y pair as well? Thanks in advance.
import pandas as pd
from ast import literal_eval
data = {'name':["'a','x','y'"], 'val' : [10]}
df = pd.DataFrame(data)
df2 = (df['name'].str.split(',',expand = True, n = 1)
       .rename(columns = {0 : 'variable 1', 1 : 'variable 2'})
       .join(df.drop(columns = 'name')))
df2['variable 2']=df2['variable 2'].map(literal_eval)
df2=df2.explode('variable 2',ignore_index=True)
print(df2)
OUTPUT:
variable 1 variable 2 val
0 'a' x 10
1 'a' y 10

If you need every pairwise combination of the comma-separated values (a single value is paired with itself), use:
print (df)
variable val
0 'a','x','y' 10
1 'a','x','y','f' 80
2 's' 4
from itertools import combinations
df['variable'] = df['variable'].str.replace("'", "", regex=True)
s = [x.split(',') if ',' in x else (x,x) for x in df['variable']]
L = [(*y, z) for x, z in zip(s, df['val']) for y in combinations(x, 2)]
df = pd.DataFrame(L, columns=['variable 1','variable 2','val'])
print (df)
variable 1 variable 2 val
0 a x 10
1 a y 10
2 x y 10
3 a x 80
4 a y 80
5 a f 80
6 x y 80
7 x f 80
8 y f 80
9 s s 4
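
The core of the approach above, pairing up the split values with itertools.combinations, can be sketched without pandas. The `rows` list and the names `pairs`, `left` and `right` are illustrative assumptions, not part of the original answer.

```python
from itertools import combinations

# Hypothetical input: (comma-separated quoted values, val) per row.
rows = [("'a','x','y'", 10), ("'s'", 4)]

pairs = []
for raw, val in rows:
    # Drop the quotes and split into individual values.
    items = raw.replace("'", "").split(",")
    # A single value has no partner, so pair it with itself.
    if len(items) == 1:
        items = items * 2
    pairs.extend((left, right, val) for left, right in combinations(items, 2))

print(pairs)
# [('a', 'x', 10), ('a', 'y', 10), ('x', 'y', 10), ('s', 's', 4)]
```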

Printing Fibonacci Series having 2 similar codes but different outputs. Why?

How can I fix CODE 2 so that it prints the same output as CODE 1, and why do the two produce different outputs?
Fibonacci Series
CODE 1
x = 0
y = 1
while x < 10:
    print(x)
    x, y = y, x + y
output
0
1
1
2
3
5
8
CODE 2
x = 0
y = 1
while x < 10:
    print(x)
    x = y
    y = x + y
Output
0
1
2
4
8

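The two are simply not equivalent.
In the first code block, the right-hand side (y, x + y) is evaluated with the old values before anything is assigned, so y becomes x+y. In the second code block, x has already been overwritten with y by the time y = x + y runs, so y becomes 2*y.
A quick note: the output of the second code block is 0 1 2 4 8, not what you originally wrote (this has since been fixed).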
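
One way to make the two-statement version behave like the tuple assignment is to remember the old x before overwriting it. The sketch below is only an illustration; the function names and the `previous_x` variable are made up here, and the values are collected in lists instead of printed so the two variants can be compared.

```python
def fib_tuple(limit):
    """CODE 1 style: both right-hand values are read before assignment."""
    values = []
    x, y = 0, 1
    while x < limit:
        values.append(x)
        x, y = y, x + y
    return values


def fib_sequential(limit):
    """CODE 2 style, fixed by keeping the old x in a temporary variable."""
    values = []
    x, y = 0, 1
    while x < limit:
        values.append(x)
        previous_x = x       # remember x before it is overwritten
        x = y
        y = previous_x + y   # uses the old x, not the new one
    return values


print(fib_tuple(10))       # [0, 1, 1, 2, 3, 5, 8]
print(fib_sequential(10))  # [0, 1, 1, 2, 3, 5, 8]
```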

Creating matrix with for loop in python

I have a list with 4 elements. Each element is a correct score that I am pulling from a form. For example:
scoreFixed_1 = 1
scoreFixed_2 = 2
scoreFixed_3 = 3
scoreFixed_4 = 4
scoreFixed = [scoreFixed_1, scoreFixed_2, scoreFixed_3, scoreFixed_4]
Then, I need to add:
scoreFixed_1 to fixture[0][0]
scoreFixed_2 to fixture[0][1]
scoreFixed_3 to fixture[1][0]
scoreFixed_4 to fixture[1][1]
Hence, I need to create a triple for loop that outputs the following sequence so I can index to achieve the result above:
0 0 0
1 0 1
2 1 0
3 1 1
I have tried to use this to create this matrix, however I am only able to get the first column correct. Can anyone help?
for x in range(1):
    for y in range(1):
        for z in range(4):
            print(z, x, y)
which outputs:
0 0 0
1 0 0
2 0 0
3 0 0

Your logic does not generate the table; you want something like:
rownum = 0
for x in range(2):
    for y in range(2):
        print (rownum, x, y)
        rownum += 1
(Edit: the question has since been changed. To accomplish the new goal, you want something like this:)
scoreIndex = 0
for x in range(2):
    for y in range(2):
        fixture[x][y] += scoreFixed[scoreIndex]
        scoreIndex += 1
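
The same index mapping can be sketched with a single loop, using divmod to turn the flat position into a row and a column. The 2x2 zero-filled `fixture` and the snake_case names are assumptions for the example; the question does not show how fixture is actually built.

```python
score_fixed = [1, 2, 3, 4]
n_cols = 2

# Assumed starting point: a 2x2 fixture filled with zeros.
fixture = [[0] * n_cols for _ in range(2)]

for index, score in enumerate(score_fixed):
    # divmod(index, n_cols) yields (0, 0), (0, 1), (1, 0), (1, 1) for index 0..3
    row, col = divmod(index, n_cols)
    fixture[row][col] += score

print(fixture)  # [[1, 2], [3, 4]]
```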

After your edit, it seems like we can split the 'sequence' into:
First column: a regular ascending variable (n += 1)
Second and third columns: a binary counter (00, 01, 10, 11)
0   0 0
1   0 1
2   1 0
3   1 1
^   ^------- These seem like a binary counter
             (00, 01, 10, 11)
^------ A regular ascending variable
        (n += 1)
Using that 'logic' we can write code that looks like:
import itertools
scoreFixed = 0
for i in itertools.product([0,1],repeat=2):
    print(scoreFixed, ' '.join(map(str,i)))
    scoreFixed += 1
And will output:
0 0 0
1 0 1
2 1 0
3 1 1

for x in range(4):
    z = int(bin(x)[-1])
    y = bin(x)[-2]
    y = int(y) if y.isdigit() else 0
    print(x, y, z)

Problem when creating Minesweeper in Python

I have been tasked with creating Minesweeper in the terminal as a project. I am relatively new to Python, so this is quite a big task for me. For some reason I cannot get the numbers that surround the bombs to add up correctly. I have pasted the code and some outputs below. I get no errors when running the code. (I have noticed that it may have something to do with the code for the top-right block above the bomb, but I've looked through everything and can't seem to find the issue. There are also sometimes fewer bombs than there should be.)
import random
def minesweeper(dim_size, num_bombs):
    print_list = []
    for i in range(dim_size):
        print_list.append([])
        for j in range(dim_size):
            print_list[i].append(0)
    for i in range(num_bombs):
        random_row = random.randrange(0,dim_size-1)
        random_column = random.randrange(0,dim_size-1)
        print_list[random_row][random_column] = 'X'
        # centre-top
        if random_row >= 1:
            if print_list[random_row - 1][random_column] != 'X':
                print_list[random_row - 1][random_column] += 1
        # right-top
        if random_row >= 1 and random_column > dim_size:
            if print_list[random_row - 1][random_column + 1] != 'X':
                print_list[random_row - 1][random_column + 1] += 1
        # right
        if random_column < dim_size:
            if print_list[random_row][random_column + 1] != 'X':
                print_list[random_row][random_column + 1] += 1
        # bottom-right
        if random_row < dim_size and random_column < dim_size:
            if print_list[random_row + 1][random_column + 1] != 'X':
                print_list[random_row + 1][random_column + 1] += 1
        # bottom
        if random_row < dim_size:
            if print_list[random_row + 1][random_column] != 'X':
                print_list[random_row + 1][random_column] += 1
        # bottom-left
        if random_row < dim_size and random_column >= 1:
            if print_list[random_row + 1][random_column - 1] != 'X':
                print_list[random_row + 1][random_column - 1] += 1
        # left
        if random_column >= 1:
            if print_list[random_row][random_column - 1] != 'X':
                print_list[random_row][random_column - 1] += 1
        # top-left
        if random_row >= 1 and random_column >= 1:
            if print_list[random_row - 1][random_column - 1] != 'X':
                print_list[random_row - 1][random_column - 1] += 1
    for row in range(dim_size):
        for column in range(dim_size):
            print(print_list[row][column], end=' ')
        print()
if __name__ == '__main__':
    minesweeper(5,5)
Outputs (four separate runs):
1 X 1 0 0
2 3 3 1 0
2 X X X 1
X 3 3 2 1
1 1 0 0 0

2 X 1 0 0
4 X 2 0 0
X X 3 0 0
2 3 X 1 0
0 1 1 1 0

X 2 X X 1
X 3 2 2 1
1 2 1 0 0
0 1 X 1 0
0 1 1 1 0

X 3 X 2 0
1 4 3 2 0
1 1 X 1 0
X 2 1 1 0
1 1 0 0 0

A couple things stand out:
random.randrange doesn't include the endpoint, so if your endpoint is dim_size-1, that means you'll only ever generate numbers between zero and three inclusive. This means mines will never appear anywhere in the bottom row, or right-most column.
The second issue, which you've already pointed out, has to do with the way you're placing mines. You generate a random xy-coordinate, and then place a mine there. What if you happen to generate the same coordinate more than once? You simply place another mine in the same field which is already occupied by a mine.
Instead of using random.randrange, or even random.randint to generate random coordinates, I would first generate a collection of all possible coordinates, and then use random.sample to pull five unique coordinates from that collection. In the same way lottery numbers are drawn, the same numbers (coordinates in our case) can never be drawn more than once:
import random
import itertools
dim_size = 5
num_mines = 5
for x, y in random.sample(list(itertools.product(range(dim_size), repeat=2)), k=num_mines):
    print("Put a mine at {}, {}".format(x, y))
Output:
Put a mine at 4, 4
Put a mine at 4, 3
Put a mine at 3, 1
Put a mine at 1, 0
Put a mine at 3, 0
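
Putting the pieces together, a board builder might look like the sketch below. It is an illustration, not the asker's code: `build_board` and its `rng` parameter are hypothetical names, mines are drawn with random.sample as suggested above, and all eight neighbours are handled by one bounds check instead of eight hand-written conditions. That also sidesteps the right-top condition in the question, `random_column > dim_size`, which can never be true and so never increments that neighbour.

```python
import itertools
import random


def build_board(dim_size, num_mines, rng=random):
    """Return a dim_size x dim_size grid: 'X' marks a mine, numbers count adjacent mines."""
    board = [[0] * dim_size for _ in range(dim_size)]
    all_cells = list(itertools.product(range(dim_size), repeat=2))

    # Draw unique coordinates, so exactly num_mines mines are placed.
    mines = rng.sample(all_cells, k=num_mines)
    for row, col in mines:
        board[row][col] = "X"

    # For each mine, bump every in-bounds neighbour that is not itself a mine.
    for row, col in mines:
        for d_row, d_col in itertools.product((-1, 0, 1), repeat=2):
            n_row, n_col = row + d_row, col + d_col
            in_bounds = 0 <= n_row < dim_size and 0 <= n_col < dim_size
            if in_bounds and board[n_row][n_col] != "X":
                board[n_row][n_col] += 1
    return board


if __name__ == "__main__":
    for board_row in build_board(5, 5):
        print(*board_row)
```

Placing all mines before counting means a cell is never incremented and later overwritten by a mine, and the (0, 0) offset is skipped automatically because that cell already holds 'X'.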

Regarding the board sometimes having fewer bombs than it should: use a while loop that tracks how many bombs have actually been planted in the grid. A for loop runs exactly range(num_bombs) times and does not care whether each iteration really planted a new bomb. A while loop checks on every pass whether the required num_bombs have been planted; it stops once they have and keeps running until then.
Also, before planting a bomb, check whether that row/column already holds one. If it does, don't plant; if it doesn't, plant the bomb.
Here is the code:
planted_num_bomb = 0 #number of bombs planted
while planted_num_bomb < num_bombs:
    # for i in range(num_bombs):
    random_row = random.randrange(0,dim_size-1)
    random_column = random.randrange(0, dim_size - 1)
    # check if the row_colm has a Bomb
    if print_list[random_row][random_column] == 'X': #contains the bomb
        continue #pass it
    else:
        print_list[random_row][random_column] = 'X'
        planted_num_bomb += 1
Here are 5 results:
0 0 2 X 1
1 1 3 X 2
2 X 3 X 2
2 X 3 1 1
1 1 1 0 0

1 2 1 0 0
1 X X 1 0
2 4 X 3 0
1 X 3 X 1
1 1 2 1 1
5 #number of bombs planted

1 0 2 X 1
X 1 2 X 2
3 2 1 1 1
X X 1 0 0
2 2 1 0 0
5 #number of bombs planted

1 0 1 1 0
X 2 2 X 1
3 3 X 2 1
X X 2 1 0
2 2 1 0 0
5 #number of bombs planted

0 0 0 0 0
2 2 1 0 0
X X X 2 0
X 4 3 X 1
1 1 1 1 1
5 #number of bombs planted
Now, if you take a look, the code still can't place a mine in column number 5. Why? The answer is at the top (first answer): random.randrange excludes its endpoint. Even once you fix that, keep the while loop, because a for loop won't always plant the required number of bombs: when we skip a cell that already has a bomb (the continue part in the code), the for loop still counts that as an iteration.
Anyway, good luck on your WTC bootcamp.

How to combine rows into separate dataframe python pandas

I have the following dataset:
A B C D E F
154.6175111 148.0112337 155.7859835 1 1 x
255 253.960131 242.5382584 1 1 x
251.9665958 235.1105659 185.9121703 1 1 x
137.9974994 225.3985177 254.4420772 1 1 x
85.74722877 116.7060415 158.4608395 1 1 x
123.6969939 140.0524405 132.6798037 1 1 x
133.3251695 80.08976196 38.81201612 1 1 y
118.0718812 243.5927927 255 1 1 y
189.5557302 139.9046713 91.90519519 1 1 y
172.3117291 188.000268 129.8155501 1 1 y
48.07634611 21.9183119 25.99669279 1 1 y
23.40525987 8.395857933 25.62371342 1 1 y
228.753009 164.0697727 172.6624107 1 1 z
203.3405006 173.9368303 189.8103708 1 1 z
184.9801932 117.1591341 87.94739034 1 1 z
29.55251224 46.03945452 70.7433477 1 1 z
143.6159623 120.6170926 155.0736604 1 1 z
142.5421179 128.8916843 169.6013111 1 1 z
I want to combine the x, y and z rows into another dataframe like this:
A B C D E F
154.6175111 148.0112337 155.7859835 1 1 x ->first x value
133.3251695 80.08976196 38.81201612 1 1 y ->first y value
228.753009 164.0697727 172.6624107 1 1 z ->first z value
I want one such dataframe for each position within x, y and z: the first rows, the second rows, the third rows, and so on.
How can I select and combine them?
Desired output:
A B C D E F
154.6175111 148.0112337 155.7859835 1 1 x
133.3251695 80.08976196 38.81201612 1 1 y
228.753009 164.0697727 172.6624107 1 1 z
A B C D E F
255 253.960131 242.5382584 1 1 x
118.0718812 243.5927927 255 1 1 y
203.3405006 173.9368303 189.8103708 1 1 z
A B C D E F
251.9665958 235.1105659 185.9121703 1 1 x
189.5557302 139.9046713 91.90519519 1 1 y
184.9801932 117.1591341 87.94739034 1 1 z
A B C D E F
137.9974994 225.3985177 254.4420772 1 1 x
172.3117291 188.000268 129.8155501 1 1 y
29.55251224 46.03945452 70.7433477 1 1 z
A B C D E F
85.74722877 116.7060415 158.4608395 1 1 x
48.07634611 21.9183119 25.99669279 1 1 y
143.6159623 120.6170926 155.0736604 1 1 z
A B C D E F
123.6969939 140.0524405 132.6798037 1 1 x
23.40525987 8.395857933 25.62371342 1 1 y
142.5421179 128.8916843 169.6013111 1 1 z

Use GroupBy.cumcount to number the rows within each F group, then loop over a second groupby on that counter:
counter = df.groupby('F').cumcount()
for i, group in df.groupby(counter):
    print (group)
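
If the separate dataframes are needed later instead of only printed, the same grouping can be collected into a dict keyed by position. This is a minimal sketch on a shortened, made-up frame (only columns A and F, two rows per letter); the names `position` and `frames` are illustrative.

```python
import pandas as pd

# Shortened stand-in for the question's data: two rows each for x, y and z.
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "F": ["x", "x", "y", "y", "z", "z"],
})

# 0 for the first row of each F group, 1 for the second, and so on.
position = df.groupby("F").cumcount()

# One dataframe per position: frames[0] holds the first x, y and z rows.
frames = {i: part.reset_index(drop=True) for i, part in df.groupby(position)}

print(frames[0])
print(frames[1])
```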
