value range in Pandas - python

I have some simple code for the Titanic data:
import pandas as pd
def pClassSurvivorDetails(df, pClass):
    print('\nResults for Pclass =', pClass, '\n -------------------- ')
    print("The following did not survive")
    notSurvive = df['Sex'][df['Survived']==0][df['Pclass']==pClass]
    print(notSurvive.value_counts())
    print("The following did survive")
    survive = df['Sex'][df['Survived']==1][df['Pclass']==pClass]
    print(survive.value_counts())
def main():
    df = pd.read_csv("titanic.csv")
    for value in [1, 2, 3]:
        pClassSurvivorDetails(df, value)
main()
Now I need the same result, but instead of for value in [1, 2, 3] I need the first number to be x, the last number to be y, and every value in between included, something like [1:3] (but it doesn't work this way). Any ideas, please?

To cycle through all values between two variables in Python, you can use:
for i in range(x, y):
Or, since it is up to and not including y, you could include y with:
for i in range(x, y + 1):
To get all values in this range, and then access only one, the simplest way is to store it as a list.
my_values = list(range(x, y))
And then you can access with indexing, e.g.:
my_values[2]
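Applied to the question, a minimal sketch of the inclusive loop could look like this. The helper name inclusive_range is made up for illustration; in the original main() the loop header would simply become for value in range(x, y + 1):

```python
def inclusive_range(first, last):
    """Return every integer from first to last, both ends included."""
    # range() stops before its second argument, hence the + 1
    return range(first, last + 1)

x, y = 1, 3
for value in inclusive_range(x, y):
    print(value)  # prints 1, 2 and 3 on separate lines
```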

Related

Python: How can I use a String for a If-Statement?

In Python I have to build a (long) if statement dynamically.
How can I do this?
I tried the following test code, which uses the function "buildFilterCondition" to store the required if condition in a string.
But this doesn't work...
Any ideas? What is going wrong?
Thank you very much.
Input = [1,2,3,4,5,6,7]
Filter = [4,7]
FilterCondition = ""
def buildFilterCondition():
    global FilterCondition
    for f in Filter:
        FilterCondition = FilterCondition + "(x==" + str(f) +") | "
    #remove the last "| " sign
    FilterCondition = FilterCondition[:-2]
    print("Current Filter: " + FilterCondition)
buildFilterCondition()
for x in Input:
    if( FilterCondition ):
        print(x)
With my function buildFilterCondition() I want to arrive at the following, because the function generates the string "(x==4) | (x==7)", but this doesn't work:
for x in Input:
    if( (x==4) | (x==7) ):
        print(x)
The output should be 4 and 7 (i.e. filtered).
The background of my question actually had a different intention than to replace an if-statement.
I need a longer multiple condition to select specific columns of a pandas dataframe.
For example:
df2=df.loc[(df['Discount1'] == 1000) & (df['Discount2'] == 2000)]
I wanted to keep the column names and the values (1000, 2000) in 2 separate lists (or dictionary) to make my code a little more "generic".
colmnHeader = ["Discount1", "Discount2"]
filterValue = [1000, 2000]
To "filter" the data frame, I then only need to adjust the lists.
How do I now rewrite the call to the .loc method so that it works for iterating over the lists?
df2=df.loc[(df[colmnHeader[0]] == filterValue[0]) & (df[colmnHeader[1]] == filterValue[1])]
Unfortunately, my current attempt with the following code does not work, because the conditions must not be applied to the pandas .loc indexer one after another, but all at once.
So I need ALL the conditions from the lists directly in a single .loc call.
#FILTER
colmn = ["colmn1", "colmn2", "colmn3"]
cellContent = ["1000", "2000", "3000"]
# first make sure the lists have the same size
if len(colmn) == len(cellContent):
    curIdx = 0
    for curColmnName in colmn:
        df_columns = df_columns.loc[df_columns[curColmnName] == cellContent[curIdx]]
        curIdx += 1
Thank you again!
Use the in operator
Because simple is better than complex.
inputs = [1, 2, 3, 4, 5, 6, 7]
value_filter = [4, 7]
for x in inputs:
    if x in value_filter:
        print(x, end=' ')
# 4 7
Use the operator module
With the operator module, you can build a condition at runtime from a list of (operator, value) pairs that are tested against the current value.
import operator
inputs = [1, 2, 3, 4, 5, 6, 7]
# This list can be changed dynamically if you need to
conditions = [
    (operator.ge, 4),  # value needs to be greater than or equal to 4
    (operator.lt, 7),  # value needs to be lower than 7
]
for x in inputs:
    # all() combines the conditions with "and"; use any() for "or"
    if all(condition(x, value) for condition, value in conditions):
        print(x, end=' ')
# 4 5 6
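As for why the original attempt prints every value: if( FilterCondition ) tests the string object itself, and any non-empty string is truthy, so the text inside it is never evaluated. For the follow-up about keeping column names and values in two separate lists, the sketch below pairs them with zip and requires every comparison to hold. It uses plain dictionaries as stand-in rows rather than a real DataFrame; the names rows and matches and the sample data are hypothetical.

```python
column_headers = ["Discount1", "Discount2"]
filter_values = [1000, 2000]

# Hypothetical stand-in rows; a real DataFrame is not used here.
rows = [
    {"Discount1": 1000, "Discount2": 2000},
    {"Discount1": 1000, "Discount2": 500},
    {"Discount1": 300, "Discount2": 2000},
]

def matches(row, headers, values):
    """True when every listed column equals its paired value."""
    return all(row[h] == v for h, v in zip(headers, values))

# Keep only the rows for which all conditions hold at once.
selected = [row for row in rows if matches(row, column_headers, filter_values)]
print(selected)

# A non-empty string is always truthy, whatever text it holds.
print(bool("(x==4) | (x==7)"))
```

To change the filter, only the two lists need to be edited. With pandas, the same pairing can likely be turned into a single boolean mask by combining the per-column comparisons with &, but that variant is not shown here.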

Assign core.mul.mul to an entry of sympy.Matrix

I have a question below:
When the variable "temp" is assigned to a row of the MutableDenseMatrix, an error pops up:
ValueError: unexpected value: x1*x3**2
But the product is what I need. Is there a type conversion issue? Tons of thanks.
import numpy as np
import sympy as sp
Y = sp.MutableDenseMatrix(sp.zeros(3,1))
temp = 1
x = sp.symbols('x:'+str(4))
X = sp.Matrix(x)
C = np.array([0, 1, 0, 2])
for i in range(X.shape[0]):
    temp = temp * sp.Pow(X[i], C[i])
Y[1,:] = temp
When you use slice notation on a symbolic matrix, the value you are going to assign needs to be an iterable, even though you are going to assign a single element. For example, this works:
Y[1,:] = [temp]
Alternatively, since Y is a single column matrix and with Y[1,:] you are targeting a single value, you can do:
Y[1] = temp

Python: Function to get the character after a given position in the strings of a dataframe

I want to get the character that follows a given position in a string stored in a dataframe. Once every character of a string has been visited, I should move on to the next row.
To do so I have written the function below.
def get_char(df, y, z):
    if z < len(df[0][y])-1:
        return df[0][y][z+1]
    elif z == len(df[0][y])-1:
        if y < len(df[0])-1:
            return df[0][y+1][0]
So for the dataframe:
ar = np.array(["aba", "bcb", "zab"])
df = pd.DataFrame(ar)
calling
print(get_char(df, 1, 2))
gives me z
and
print(get_char(df, 2, 2))
should return nothing; in my function it returns None.
I am pretty sure that I can do this in a much easier way.
My dataframe will have only one column.
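import numpy as np
import pandas as pd
ar = np.array(["aba", "bcb", "zab"])
df = pd.DataFrame(ar)
def get_char(df, y, z):
    # Join row y with the following row (if any), so that index z+1 can run over into the next string
    a = ''.join(df.iloc[y:y+2,0])
    try:
        return a[z+1]
    except IndexError:
        # No character is left after the last position of the last row
        return None
print(get_char(df, 1, 2))
# z
print(get_char(df, 2, 2))
# None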

Merging and sorting tuples in Python

The program should be like this:
together((0,39,100,210),(4,20))
printing the following:
(0,4,20,39,100,210)
The code:
def together(s,t):
    y = s + t
    z = 0
    if sorted(y) == y:
        print(y)
    else:
        for i in range(len(y)-1):
            if y[z] > y[z+1]:
                y[z+1] = y[z]
        return (y)
    print(y)
If the variables are set like this:
s=1,23,40
and
t=9,90
I'm getting this:
(1, 23, 40, 9, 90)
which is out of order; as you can see, it should be:
(1,9,23,40,90)
Is there another way to do this, using comparisons to check whether the numbers in 'y' are already sorted and, if not, to sort the whole thing?
For example, with code like this, which finds the biggest number:
def bigger(t):
    bigger = 0
    for i in range(len(t)-1):
        if t[i] > bigger:
            bigger = t[i]
    return bigger
Is it possible to get the same result this way as with the solution above?
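from functools import reduce
T = ((0,39,100,210),(4,20))
print(tuple(sorted(reduce(tuple.__add__, T))))
T holds the tuples you want to put together. This uses reduce and the built-in __add__ method of tuples to concatenate them, then sorted to put the result in order. (In Python 3, reduce has to be imported from functools.)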
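The question also asks whether this can be done with explicit comparisons instead of sorted. A minimal sketch of one way, assuming both input tuples are already sorted on their own (as they are in the examples above), is a hand-written merge. It is an illustration, not the method from the answer above:

```python
def together(s, t):
    """Merge two already-sorted tuples into one sorted tuple."""
    merged = []
    i = j = 0
    # Repeatedly take the smaller of the two current front elements.
    while i < len(s) and j < len(t):
        if s[i] <= t[j]:
            merged.append(s[i])
            i += 1
        else:
            merged.append(t[j])
            j += 1
    # One input is exhausted; the rest of the other is already in order.
    merged.extend(s[i:])
    merged.extend(t[j:])
    return tuple(merged)

print(together((0, 39, 100, 210), (4, 20)))
print(together((1, 23, 40), (9, 90)))
```

If the inputs are not guaranteed to be sorted, tuple(sorted(s + t)) is the simpler option.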

For cycle gets stuck in Python

My code below is getting stuck on a random point:
import functions
from itertools import product
from random import randrange
values = {}
tables = {}
letters = "abcdefghi"
nums = "123456789"
for x in product(letters, nums): #unnecessary
    values[x[0] + x[1]] = 0
for x in product(nums, letters): #unnecessary
    tables[x[0] + x[1]] = 0
for line_cnt in range(1,10):
    for column_cnt in range(1,10):
        num = randrange(1,10)
        table_cnt = functions.which_table(line_cnt, column_cnt) #Returns a number identifying the table considered
        #gets the values already in the line and column and table considered
        line = [y for x,y in values.items() if x.startswith(letters[line_cnt-1])]
        column = [y for x,y in values.items() if x.endswith(nums[column_cnt-1])]
        table = [x for x,y in tables.items() if x.startswith(str(table_cnt))]
        #if num is not contained in any of these then it's acceptable, otherwise find another number
        while num in line or num in column or num in table:
            num = randrange(1,10)
        values[letters[line_cnt-1] + nums[column_cnt-1]] = num #Assign the number to the values dictionary
    print(line_cnt) #debug
print(sorted(values)) #debug
As you can see it's a program that generates random sudoku schemes using 2 dictionaries : values that contains the complete scheme and tables that contains the values for each table.
Example :
5th square on the first line = 3
|
v
values["a5"] = 3
tables["2b"] = 3
So what is the problem? Am I missing something?
import functions
...
table_cnt = functions.which_table(line_cnt, column_cnt) #Returns a number identifying the table considered
It's nice when we can run the code right away on our own computer to test it. In other words, it would have been better to replace "table_cnt" with a fixed value for the example (here, a simple string would have sufficed).
for x in product(letters, nums):
    values[x[0] + x[1]] = 0
Not that important, but this is more elegant:
values = {x+y: 0 for x, y in product(letters, nums)}
And now, the core of the problem:
while num in line or num in column or num in table:
    num = randrange(1,10)
This is where you loop forever. So, you are trying to generate a random sudoku. From your code, this is how you would generate a random list:
nums = []
for _ in range(9):
    num = randrange(1, 10)
    while num in nums:
        num = randrange(1, 10)
    nums.append(num)
The problem with this approach is that you have no idea how long the program will take to finish. It could take one second, or one year (although that is unlikely). This is because there is no guarantee that the program will not keep picking a number that is already taken, over and over.
In practice it should still finish in a relatively short time (this approach is not efficient, but the list is very short). However, in the case of the sudoku, you can end up in an impossible situation. For example:
line = [6, 9, 1, 2, 3, 4, 5, 8, 0]
column = [0, 0, 0, 0, 7, 0, 0, 0, 0]
Here these are the first line (or any line, actually) and the last column. When the algorithm tries to find a value for line[8], it will always fail, since 7 is the only number missing from the line and it is blocked by the column.
If you want to keep it this way (i.e. brute force), you should detect such a situation and start over. Again, this is very inefficient, and you should look at how to generate sudokus properly (my naive approach would be to start with a solved one and swap lines and columns randomly, but I know this is not a good way).
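The "start with a solved one and swap lines and columns" idea can be sketched as follows. This is only an illustration under assumptions: the helper names random_sudoku and is_valid and the base pattern used for the solved grid are not from the original post, and the approach reaches only a subset of all valid grids. Since nothing is ever retried, the running time is bounded.

```python
import random

def random_sudoku(rng=random):
    """Build a valid 9x9 grid by shuffling a fixed solved pattern."""
    base, side = 3, 9

    def pattern(r, c):
        # A known valid baseline: each row is a shifted copy of 0..8.
        return (base * (r % base) + r // base + c) % side

    def shuffled(seq):
        return rng.sample(list(seq), len(seq))

    # Shuffle the three bands, and the rows inside each band; same for columns.
    rows = [g * base + r for g in shuffled(range(base)) for r in shuffled(range(base))]
    cols = [g * base + c for g in shuffled(range(base)) for c in shuffled(range(base))]
    # Relabel the digits with a random permutation of 1..9.
    digits = shuffled(range(1, side + 1))
    return [[digits[pattern(r, c)] for c in cols] for r in rows]

def is_valid(grid):
    """Check that every row, column and 3x3 box holds 1..9 exactly once."""
    want = set(range(1, 10))
    rows_ok = all(set(row) == want for row in grid)
    cols_ok = all(set(col) == want for col in zip(*grid))
    boxes_ok = all(
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)} == want
        for br in (0, 3, 6) for bc in (0, 3, 6)
    )
    return rows_ok and cols_ok and boxes_ok

grid = random_sudoku()
for row in grid:
    print(row)
print(is_valid(grid))
```

Rows are only swapped within their band and columns within their stack, which keeps every 3x3 box intact; swapping arbitrary rows or columns would break the boxes.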
