I want to run a for loop in pandas: for each row i, I want to take the value in column x1 and test it with if/else statements.
In R I would do it like this:
df <- data.frame(x1 = rnorm(10), x2 = rexp(10))
for (i in 1:length(df$x1)) {
  if (df[i, 'x1'] > 0) {
    print('+')
  } else {
    print('-')
  }
}
How can I do this with a pandas DataFrame?
P.S. I need to perform a loop like this, but if you have better ideas, I would appreciate them.
EDIT:
In case of multiple comparisons:
Thank you for the answer!
Could you also give me some advice on how to do the iteration if I have multiple if/else statements? For example:
if x>0:
    if x%2 == 0:
        #do stuff 1
    else:
        #do other stuff 2
elif x<0:
    if x%2 == 0:
        #do stuff 3
    else:
        #do other stuff 4
If you need a new column, use numpy.where:
import numpy as np
import pandas as pd

np.random.seed(54)
df = pd.DataFrame({'x1': np.random.randint(10, size=10)}) - 5
df['new'] = np.where(df['x1'] > 0, '+', '-')
print(df)
x1 new
0 0 -
1 -3 -
2 2 +
3 -4 -
4 -5 -
5 3 +
6 2 +
7 -4 -
8 4 +
9 1 +
But if you need a loop (best avoided, because it is slow), you can use Series.items() (iteritems() did the same job in older pandas but was removed in pandas 2.0):
for i, x in df['x1'].items():
    if x > 0:
        print('+')
    else:
        print('-')
EDIT:
df['new'] = np.where(df['x1'] > 0, 'a',
np.where(df['x1'] & 2, 'b', 'c'))
print (df)
x1 new
0 0 c
1 -3 c
2 2 a
3 -4 c
4 -5 b
5 3 a
6 2 a
7 -4 c
8 4 a
9 1 a
But if you have many conditions (4 or more), use apply with a custom function:
def f(x):
    # default value, returned when x == 0
    y = 5
    if x>0:
        if x%2 == 0:
            y = 0
            #do stuff 1
        else:
            y = 1
            #do other stuff 2
    elif x<0:
        if x%2 == 0:
            y = 2
            #do stuff 3
        else:
            y = 3
            #do other stuff 4
    return y

df['new'] = df['x1'].apply(f)
print(df)
x1 new
0 0 5
1 -3 3
2 2 0
3 -4 2
4 -5 3
5 3 1
6 2 0
7 -4 2
8 4 0
9 1 1
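The same four-branch logic can also be written without apply. The following is a minimal sketch using np.select, assuming the same x1 values as the sample output above; the names conditions and choices are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same x1 values as in the printed example above
df = pd.DataFrame({'x1': [0, -3, 2, -4, -5, 3, 2, -4, 4, 1]})

x = df['x1']
even = x % 2 == 0

# One boolean mask per branch of the nested if/else
conditions = [
    (x > 0) & even,    # stuff 1
    (x > 0) & ~even,   # other stuff 2
    (x < 0) & even,    # stuff 3
    (x < 0) & ~even,   # other stuff 4
]
choices = [0, 1, 2, 3]

# Rows matching no condition (x == 0) get the default value 5
df['new'] = np.select(conditions, choices, default=5)
result = df['new'].tolist()
print(df)
```

Each row takes the choice of the first condition that is true for it, so the order of the lists matters whenever conditions overlap.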
You can use this code to print the correct symbol for each row:
print(df['x1'].map(lambda x: '+' if x > 0 else '-').to_string(index=False))
The code above uses map to build a new Series in which each value x becomes '+' if x > 0 and '-' if x <= 0. The Series is then converted to a string and printed without the index.
But if you absolutely need to loop through each row, you can use the following code, which is what you have but condensed into 2 lines:
for i in df['x1']:
    print('+' if i > 0 else '-')
What would the for-loop code below look like if it were converted to use while loops?
This is the code:
def number(a):
    b = 1
    for c in range(1, a+1):
        b*=c
    return(b)

a = 6
for d in range(a, 0, -1):
    print(number(d),end=' ')
    for e in range(d, 0, -1):
        print(e, end = ' ')
    print('')
This is the output:
720 6 5 4 3 2 1
120 5 4 3 2 1
24 4 3 2 1
6 3 2 1
2 2 1
1 1
a = 6
for d in range(a, 0, -1):
Here a = 6, and the for loop counts d down from 6 to 1 (the stop value 0 is excluded), stepping by -1 each time. To turn it into a while loop you can do:
d = 6
while d > 0:
    # loop body that uses d goes here
    d -= 1
Assuming your question is how to turn the for loop into a while loop, that is how I would do it, but I don't see the point unless there is something very specific you need to do.
What about the inner loop?
After this for line:
a = 6
for d in range(a, 0, -1):
I still don't understand how to convert the remaining for loop in the code below, because the commented-out lines depend on d from that for d line:
def number(a):
    b = 1
    c = 1
    while c <= a:
        c += 1
        b*=c
    return(b)

a = 6
while a > 0:
    a -= 1
    print(number(a))
    #for e in range(d, 0, -1):
    #    print(e, end = ' ')
    #print('')
The code that I have commented out is still my problem. What is the solution for those lines?
The output of the program so far is shown below:
720
120
24
6
2
1
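For reference, here is one possible full conversion of both loops, including the commented-out inner loop. It is a sketch rather than the only way to do it: the helper name countdown_lines is made up for illustration, and the lines are collected in a list (without the trailing space that print(..., end=' ') leaves) so the result is easy to check.

```python
def number(a):
    """Factorial of a, computed with a while loop."""
    b = 1
    c = 1
    while c <= a:
        b *= c      # multiply first, then advance the counter
        c += 1
    return b


def countdown_lines(a):
    """Build one line per d = a, a-1, ..., 1 (hypothetical helper)."""
    lines = []
    d = a
    while d > 0:                 # replaces: for d in range(a, 0, -1)
        parts = [str(number(d))]
        e = d
        while e > 0:             # replaces: for e in range(d, 0, -1)
            parts.append(str(e))
            e -= 1
        lines.append(' '.join(parts))
        d -= 1                   # decrement after the body has used d
    return lines


for line in countdown_lines(6):
    print(line)
```

Each for loop becomes an explicit counter: initialise it before the loop, test it in the while condition, and decrement it at the end of the body.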
I have been tasked with creating minesweeper in the terminal as a project. I am relatively new to Python, so this is quite a big task for me. For some reason I cannot get the numbers that surround the bombs to add up correctly. I have pasted the code and some outputs below. I get no errors when running the code. (I have noticed that it may have something to do with the code for the top-right block above the bomb, but I've looked through everything and can't seem to find the issue. There are also sometimes fewer bombs than there should be.)
import random

def minesweeper(dim_size, num_bombs):
    print_list = []
    for i in range(dim_size):
        print_list.append([])
        for j in range(dim_size):
            print_list[i].append(0)

    for i in range(num_bombs):
        random_row = random.randrange(0,dim_size-1)
        random_column = random.randrange(0,dim_size-1)
        print_list[random_row][random_column] = 'X'
        # centre-top
        if random_row >= 1:
            if print_list[random_row - 1][random_column] != 'X':
                print_list[random_row - 1][random_column] += 1
        # right-top
        if random_row >= 1 and random_column > dim_size:
            if print_list[random_row - 1][random_column + 1] != 'X':
                print_list[random_row - 1][random_column + 1] += 1
        # right
        if random_column < dim_size:
            if print_list[random_row][random_column + 1] != 'X':
                print_list[random_row][random_column + 1] += 1
        # bottom-right
        if random_row < dim_size and random_column < dim_size:
            if print_list[random_row + 1][random_column + 1] != 'X':
                print_list[random_row + 1][random_column + 1] += 1
        # bottom
        if random_row < dim_size:
            if print_list[random_row + 1][random_column] != 'X':
                print_list[random_row + 1][random_column] += 1
        # bottom-left
        if random_row < dim_size and random_column >= 1:
            if print_list[random_row + 1][random_column - 1] != 'X':
                print_list[random_row + 1][random_column - 1] += 1
        # left
        if random_column >= 1:
            if print_list[random_row][random_column - 1] != 'X':
                print_list[random_row][random_column - 1] += 1
        # top-left
        if random_row >= 1 and random_column >= 1:
            if print_list[random_row - 1][random_column - 1] != 'X':
                print_list[random_row - 1][random_column - 1] += 1

    for row in range(dim_size):
        for column in range(dim_size):
            print(print_list[row][column], end=' ')
        print()

if __name__ == '__main__':
    minesweeper(5,5)
Outputs:
1 X 1 0 0
2 3 3 1 0
2 X X X 1
X 3 3 2 1
1 1 0 0 0
2 X 1 0 0
4 X 2 0 0
X X 3 0 0
2 3 X 1 0
0 1 1 1 0
X 2 X X 1
X 3 2 2 1
1 2 1 0 0
0 1 X 1 0
0 1 1 1 0
X 3 X 2 0
1 4 3 2 0
1 1 X 1 0
X 2 1 1 0
1 1 0 0 0
A couple things stand out:
random.randrange doesn't include the endpoint, so if your endpoint is dim_size-1, that means you'll only ever generate numbers between zero and three inclusive. This means mines will never appear anywhere in the bottom row, or right-most column.
The second issue, which you've already pointed out, has to do with the way you're placing mines. You generate a random xy-coordinate, and then place a mine there. What if you happen to generate the same coordinate more than once? You simply place another mine in the same field which is already occupied by a mine.
Instead of using random.randrange, or even random.randint to generate random coordinates, I would first generate a collection of all possible coordinates, and then use random.sample to pull five unique coordinates from that collection. In the same way lottery numbers are drawn, the same numbers (coordinates in our case) can never be drawn more than once:
import random
import itertools
dim_size = 5
num_mines = 5
for x, y in random.sample(list(itertools.product(range(dim_size), repeat=2)), k=num_mines):
print("Put a mine at {}, {}".format(x, y))
Output:
Put a mine at 4, 4
Put a mine at 4, 3
Put a mine at 3, 1
Put a mine at 1, 0
Put a mine at 3, 0
As for the board sometimes having fewer bombs than it should: you need a while loop that tracks how many bombs have actually been planted in the grid. A for loop runs exactly num_bombs times and does not care whether each iteration planted a new bomb. A while loop first checks whether the required num_bombs have been planted; if so it stops, and if not it keeps running.
Also, before planting a bomb, check whether that cell already holds one. If it does, don't plant; if it doesn't, plant the bomb.
Here is the code:
planted_num_bomb = 0 #number of bombs planted
while planted_num_bomb < num_bombs:
    # for i in range(num_bombs):
    random_row = random.randrange(0,dim_size-1)
    random_column = random.randrange(0, dim_size - 1)
    # check if the row_colm has a Bomb
    if print_list[random_row][random_column] == 'X': #contains the bomb
        continue #pass it
    else:
        print_list[random_row][random_column] = 'X'
        planted_num_bomb += 1
Here are 5 results:
0 0 2 X 1
1 1 3 X 2
2 X 3 X 2
2 X 3 1 1
1 1 1 0 0
1 2 1 0 0
1 X X 1 0
2 4 X 3 0
1 X 3 X 1
1 1 2 1 1
5 #number of bombs planted
1 0 2 X 1
X 1 2 X 2
3 2 1 1 1
X X 1 0 0
2 2 1 0 0
5 #number of bombs planted
1 0 1 1 0
X 2 2 X 1
3 3 X 2 1
X X 2 1 0
2 2 1 0 0
5 #number of bombs planted
0 0 0 0 0
2 2 1 0 0
X X X 2 0
X 4 3 X 1
1 1 1 1 1
5 #number of bombs planted
If you look closely, the code still can't place a mine in column 5. Why? The explanation is in the first answer above. Once you have fixed that, keep the while loop, because a for loop won't always plant the required number of bombs: when we skip a cell that already holds a bomb (the continue in the code), the for loop still counts that as an iteration.
Anyway, good luck on your WTC bootcamp.
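Putting the points from both answers together, a board builder might look like the sketch below. It is only one possible arrangement, not the asker's or either answerer's actual code: the function name build_board is made up, mines are drawn with random.sample so they are always unique, and the neighbour counts are computed in a second pass with explicit bounds checks.

```python
import itertools
import random


def build_board(dim_size, num_bombs):
    """Return a dim_size x dim_size grid with 'X' for bombs and counts elsewhere."""
    board = [[0] * dim_size for _ in range(dim_size)]
    cells = list(itertools.product(range(dim_size), repeat=2))

    # random.sample never repeats a cell, so exactly num_bombs bombs are placed
    for row, col in random.sample(cells, k=num_bombs):
        board[row][col] = 'X'

    # Second pass: count bombs among the (up to) 8 neighbours of each empty cell
    for row, col in cells:
        if board[row][col] == 'X':
            continue
        board[row][col] = sum(
            1
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr or dc)                       # skip the cell itself
            and 0 <= row + dr < dim_size        # stay inside the grid
            and 0 <= col + dc < dim_size
            and board[row + dr][col + dc] == 'X'
        )
    return board


board = build_board(5, 5)
for line in board:
    print(' '.join(str(cell) for cell in line))
```

Counting in a separate pass avoids the eight hand-written neighbour branches, which is where the boundary conditions in the question went wrong.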
I'm looking to find the max run of consecutive zeros in a DataFrame with the result grouped by user. I'm interested in running the RLE on usage.
Sample input:
user  day  usage
A     1    0
A     2    0
A     3    1
B     1    0
B     2    1
B     3    0
Desired output:
user  longest_run
A     2
B     1
mydata <- mydata[order(mydata$user, mydata$day),]
user <- unique(mydata$user)
d2 <- data.frame(matrix(NA, ncol = 2, nrow = length(user)))
names(d2) <- c("user", "longest_no_usage")
d2$user <- user
for (i in user) {
  if (0 %in% mydata$usage[mydata$user == i]) {
    run <- rle(mydata$usage[mydata$user == i]) #Run Length Encoding
    d2$longest_no_usage[d2$user == i] <- max(run$length[run$values == 0])
  } else {
    d2$longest_no_usage[d2$user == i] <- 0 #some users did not have no-usage days
  }
}
d2 <- d2[order(-d2$longest_no_usage),]
This works in R, but I want to do the same thing in Python and I'm totally stumped.
First use groupby with size, grouping by the user column, the usage column, and a helper Series that labels each run of consecutive values:
print (df)
user day usage
0 A 1 0
1 A 2 0
2 A 3 1
3 B 1 0
4 B 2 1
5 B 3 0
6 C 1 1
df1 = (df.groupby([df['user'],
df['usage'].rename('val'),
df['usage'].ne(df['usage'].shift()).cumsum()])
.size()
.to_frame(name='longest_run'))
print (df1)
                longest_run
user val usage
A    0   1                2
     1   2                1
B    0   3                1
         5                1
     1   4                1
C    1   6                1
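Then keep only the zero rows, take the max per user, and reindex to add back users that have no zero runs (max(level=0) was removed in pandas 2.0, so group on the index level instead):
df2 = (df1.query('val == 0')
          .groupby(level=0)
          .max()
          .reindex(df['user'].unique(), fill_value=0)
          .reset_index())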
print (df2)
user longest_run
0 A 2
1 B 1
2 C 0
Detail:
print (df['usage'].ne(df['usage'].shift()).cumsum())
0 1
1 1
2 2
3 3
4 4
5 5
6 6
Name: usage, dtype: int32
Get the max number of consecutive zeros in a Series:
def max0(sr):
    return (sr != 0).cumsum().value_counts().max() - (0 if (sr != 0).cumsum().value_counts().idxmax()==0 else 1)
max0(pd.Series([1,0,0,0,0,2,3]))
4
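To get the per-user result the question asks for, the same block-labelling idea can be applied inside a groupby. This is a sketch under the assumption that the data looks like the sample frame used earlier in the thread; max_zero_run is a hypothetical helper that counts the zeros in each block directly instead of going through value_counts.

```python
import pandas as pd


def max_zero_run(sr):
    """Length of the longest run of consecutive zeros in sr (0 if there are none)."""
    is_zero = sr.eq(0)
    # Every nonzero value starts a new block, so the zeros of one run share a block id
    block = (~is_zero).cumsum()
    # Summing the boolean mask per block counts only the zeros in that block
    runs = is_zero.groupby(block).sum()
    return int(runs.max()) if len(runs) else 0


df = pd.DataFrame({
    'user':  ['A', 'A', 'A', 'B', 'B', 'B', 'C'],
    'day':   [1, 2, 3, 1, 2, 3, 1],
    'usage': [0, 0, 1, 0, 1, 0, 1],
})

result = df.groupby('user')['usage'].apply(max_zero_run)
print(result)
```

As in the R version, the frame should be sorted by user and day first if the rows are not already in order.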
I think the following does what you are looking for, where the consecutive_zero function is an adaptation of the top answer here.
Hope this helps!
import pandas as pd
from itertools import groupby
df = pd.DataFrame([['A', 1], ['A', 0], ['A', 0], ['B', 0],['B',1],['C',2]],
columns=["user", "usage"])
def len_iter(items):
    return sum(1 for _ in items)

def consecutive_zero(data):
    x = list((len_iter(run) for val, run in groupby(data) if val==0))
    if len(x)==0: return 0
    else: return max(x)

df.groupby('user').apply(lambda x: consecutive_zero(x['usage']))
Output:
user
A 2
B 1
C 0
dtype: int64
If you have a large dataset and speed is essential, you might want to try the high-performance pyrle library.
Setup:
# pip install pyrle
# or
# conda install -c bioconda pyrle
import numpy as np
np.random.seed(0)
import pandas as pd
from pyrle import Rle
size = int(1e7)
number = np.random.randint(2, size=size)
user = np.random.randint(5, size=size)
df = pd.DataFrame({"User": np.sort(user), "Number": number})
df
# User Number
# 0 0 0
# 1 0 1
# 2 0 1
# 3 0 0
# 4 0 1
# ... ... ...
# 9999995 4 1
# 9999996 4 1
# 9999997 4 0
# 9999998 4 0
# 9999999 4 1
#
# [10000000 rows x 2 columns]
Execution:
for u, udf in df.groupby("User"):
    r = Rle(udf.Number)
    is_0 = r.values == 0
    print("User", u, "Max", np.max(r.runs[is_0]))
# (Wall time: 1.41 s)
# User 0 Max 20
# User 1 Max 23
# User 2 Max 20
# User 3 Max 22
# User 4 Max 23
I used the code below to map the value 2 in the S column to 0, but it didn't work. Any suggestions on how to solve this?
N.B.: I want to use an external function inside map.
df = pd.DataFrame({
    'Age': [30,40,50,60,70,80],
    'Sex': ['F','M','M','F','M','F'],
    'S' : [1,1,2,2,1,2]
})

def app(value):
    for n in df['S']:
        if n == 1:
            return 1
        if n == 2:
            return 0

df["S"] = df.S.map(app)
Use eq to create a boolean Series and convert that boolean Series to int with astype:
df['S'] = df['S'].eq(1).astype(int)
OR
df['S'] = (df['S'] == 1).astype(int)
Output:
Age Sex S
0 30 F 1
1 40 M 1
2 50 M 0
3 60 F 0
4 70 M 1
5 80 F 0
Don't use apply, simply use loc to assign the values:
df.loc[df.S.eq(2), 'S'] = 0
Age Sex S
0 30 F 1
1 40 M 1
2 50 M 0
3 60 F 0
4 70 M 1
5 80 F 0
If you need a more performant option, use np.select. This is also more scalable, as you can always add more conditions:
df['S'] = np.select([df.S.eq(2)], [0], 1)
You're close, but you need a few corrections. Since you want to use a function, remove the for loop and replace n with value: the function is called once per element, and value is that element. You can then pass it to either map or apply; on a Series, both call the function on each value in turn. See this answer for how to properly use apply vs applymap vs map
def app(value):
    if value == 1:
        return 1
    elif value == 2:
        return 0

df['S'] = df.S.apply(app)
Age Sex S
0 30 F 1
1 40 M 1
2 50 M 0
3 60 F 0
4 70 M 1
5 80 F 0
If you only wish to change values equal to 2, you can use pd.DataFrame.loc:
df.loc[df['S'] == 2, 'S'] = 0
pd.Series.apply is not recommended, as it is just a thinly veiled, inefficient loop.
You could use .replace as follows:
df["S"] = df["S"].replace([2], 0)
This replaces every 2 with 0 in one line.
Go with a vectorized NumPy operation:
df['S'] = np.abs(df['S'] - 2)
and stand out from the competition in interviews and SO answers :)
>>> df = pd.DataFrame({'Age': [30,40,50,60,70,80],
...                    'Sex': ['F','M','M','F','M','F'],
...                    'S': [1,1,2,2,1,2]})
>>> def app(value):
...     return 1 if value == 1 else 0
...
>>> # or app = lambda value : 1 if value == 1 else 0
>>> df["S"] = df["S"].map(app)
>>> df
Age S Sex
0 30 1 F
1 40 1 M
2 50 0 M
3 60 0 F
4 70 1 M
5 80 0 F
You can do:
import numpy as np
df['S'] = np.where(df['S'] == 2, 0, df['S'])
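Since the question asks specifically about map, it may be worth noting that Series.map also accepts a dictionary, which avoids writing a function at all. A minimal sketch using the question's data (the name mapping is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [30, 40, 50, 60, 70, 80],
    'Sex': ['F', 'M', 'M', 'F', 'M', 'F'],
    'S': [1, 1, 2, 2, 1, 2],
})

# Keys are the existing values, values are their replacements
mapping = {1: 1, 2: 0}
df['S'] = df['S'].map(mapping)
print(df)
```

Values that are missing from the dictionary become NaN, so every value that occurs in the column should appear as a key.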
I'm working on a project for CS1 that prints out a grid made of 0s and adds shapes of certain numbered sizes to it. Before it adds a shape, it needs to check A) whether the shape will fit on the grid and B) whether something else is already there. The issue I am having is that, when run, the function that checks whether a placement is valid always handles the first and second shapes correctly, but any shape added after that only "sees" the first shape when looking for a collision. I checked whether it wasn't receiving the right list after the first time, but that doesn't seem to be it. Example of the issue:
Shape Sizes = 4, 3, 2, 1
Python Outputs:
4 4 4 4 1 2 3 0
4 4 4 4 2 2 3 0
4 4 4 4 3 3 3 0
4 4 4 4 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
It Should Output:
4 4 4 4 3 3 3 1
4 4 4 4 3 3 3 0
4 4 4 4 3 3 3 0
4 4 4 4 2 2 0 0
0 0 0 0 2 2 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
What's going on here? Full Code is below...
def binCreate(size):
    binlist = [[0 for col in range(size)] for row in range(size)]
    return binlist

def binPrint(lst):
    for row in range(len(lst)):
        for col in range(len(lst[row])):
            print(lst[row][col], end = " ")
        print()

def itemCreate(fileName):
    lst = []
    for i in open(fileName):
        i = i.split()
        lst = i
    lst = [int(i) for i in lst]
    return lst

def main():
    size = int(input("Bin Size: "))
    fileName = str(input("Item Size File: "))
    binList = binCreate(size)
    blockList = itemCreate(fileName)
    blockList.sort(reverse = True)
    binList = checker(binList, len(binList), blockList)
    binPrint(binList)

def isSpaceFree(binList, r, c, size):
    if r + size > len(binList[0]):
        return False
    elif c + size > len(binList[0]):
        return False
    for row in range(r, r + size):
        for col in range(c, c + size):
            if binList[r][c] != 0:
                return False
            elif binList[r][c] == size:
                return False
    return True

def checker(binList, gSize, blockList):
    for i in blockList:
        r = 0
        c = 0
        comp = False
        while comp != True:
            check = isSpaceFree(binList, r, c, i)
            if check == True:
                for x in range(c, c+ i):
                    for y in range(r, r+ i):
                        binList[x][y] = i
                comp = True
            else:
                print(c)
                print(r)
                r += 1
                if r > gSize:
                    r = 0
                    c += 1
                    if c > gSize:
                        print("Imcompadible")
                        comp = True
        print(i)
        binPrint(binList)
        input()
    return binList
Your code to test for open spaces looks in binList[r][c] (where r is a row value and c is a column value). However, the code that sets the values once an open space has been found sets binList[x][y] (where x is a column value and y is a row value).
The latter is wrong. You want to set binList[y][x] instead (indexing by row, then column).
That will get you a working solution, but it will still not be exactly what you say you expect (you'll get a reflection across the diagonal). This is because your code updates r first, then c only when r has exceeded the bin size. If you want to place items to the right first, then below, you need to swap them.
I'd suggest using two for loops for r and c rather than a while loop, but to make that work elegantly you'd probably need to factor out the "find one item's place" code so you can return from the inner loop (rather than needing some complicated code to break out of both nested loops).
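A sketch of that refactoring is shown below. It is not the asker's code: the function names is_space_free, find_place and place_items are made up for illustration, the free-space check indexes the grid with the loop variables row and col so that every cell of the candidate square is inspected, and the scan moves across each row before going down.

```python
def is_space_free(grid, r, c, size):
    """True if the size x size square with top-left corner (r, c) fits and is empty."""
    n = len(grid)
    if r + size > n or c + size > n:
        return False
    return all(
        grid[row][col] == 0
        for row in range(r, r + size)
        for col in range(c, c + size)
    )


def find_place(grid, size):
    """Return the first free (row, col), scanning left to right, then top to bottom."""
    n = len(grid)
    for r in range(n):
        for c in range(n):
            if is_space_free(grid, r, c, size):
                return r, c      # returning exits both loops at once
    return None


def place_items(grid, sizes):
    """Place each square, largest first, marking its cells with its size."""
    for size in sorted(sizes, reverse=True):
        spot = find_place(grid, size)
        if spot is None:
            print("No room for item of size", size)
            continue
        r, c = spot
        for row in range(r, r + size):
            for col in range(c, c + size):
                grid[row][col] = size    # index by row first, then column
    return grid


grid = place_items([[0] * 8 for _ in range(8)], [4, 3, 2, 1])
for line in grid:
    print(' '.join(str(cell) for cell in line))
```

With shape sizes 4, 3, 2, 1 on an 8x8 grid, this reproduces the layout given in the question under "It Should Output".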