Can you use if statements to create variables? - python

I am trying to switch from Stata to Python for data analysis and I'm running into some hiccups that I'd like help with. I am attempting to create a secondary variable based on the values in an original variable. I want to create a binary variable that identifies fall accidents (E-codes E880.xx-E888.xx) with a value of 1 and all other E-codes with a value of 0. The list of ICD-9 codes has over 10,000 rows, so doing this by hand isn't possible.
In Stata the code would look something like this:
newvar= 0
replace newvar = 1 if ecode_variable == "E880"
replace newvar = 1 if ecode_variable == "E881"
etc
I tried a similar statement in Python, but it's not working:
data['ecode_fall'] = 1 if data['ecode'] == 'E880'
Is this type of work possible in Python? Is there a function in the numpy or pandas packages that could help with this?
I've also tried creating a dictionary that maps the fall injury codes to 1 and applying it to the variable, to no avail.

Put the if first.
if data['ecode'] == 'E880': data['ecode_fall'] = 1

You can break it out into two lines like this:
if data['ecode'] == 'E880':
    data['ecode_fall'] = 1
Or, if you include an else clause, you can have it on one line, with syntax similar to your Stata code:
data['ecode_fall'] = 1 if data['ecode'] == 'E880' else None

Following from the other answers, you can also check multiple values at once like so:
if data['ecode'] in ('E880', 'E881', ...):
    data['ecode_fall'] = 1
This leaves you having to only do one if statement per unique value of data['ecode_fall'].
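One caveat if data is a pandas DataFrame, as the question suggests: data['ecode'] == 'E880' compares the whole column at once and produces a boolean Series, and putting a Series with more than one row in an if statement raises a ValueError about an ambiguous truth value. A minimal sketch of an elementwise alternative follows. The sample rows are made up for illustration, and it assumes the fall codes can be recognised from the first four characters of each code.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real ICD-9 column.
data = pd.DataFrame({"ecode": ["E880.1", "E885", "E812.0", "E888.9", None]})

# Take the first four characters (e.g. "E880") and test membership
# against the fall codes E880 through E888.
fall_prefixes = [f"E{n}" for n in range(880, 889)]
is_fall = data["ecode"].str[:4].isin(fall_prefixes)

# np.where picks 1 where the mask is True and 0 elsewhere.
data["ecode_fall"] = np.where(is_fall, 1, 0)
print(data)
```

This is the rough equivalent of the Stata pattern of setting newvar to 0 and then replacing it with 1 where the condition holds. Missing codes end up as 0 here, which may or may not be what you want.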

Related

How to create a dataframe with a dynamic text parameter passed to a function in Python

(I'm pretty new to Python, and even to coding, so forgive me for my stupidity.)
I'm trying to pass a text value and a list as parameters to a function. Here's an example:
Names = File['Student_Name']
Scores = File['Marks']
for a in range(0,100):
    Student_Name = [Names[a]]
    Marks = []
    NewDf = pd.DataFrame(PreCovid(Student_Name,Marks))
    Master_Sheet_PreCovid = NewDf
Master_Sheet_PreCovid
What I wish to achieve is passing Name of a Student, as a string, one at a time, to the function. In this code, I'm vaguely creating a df with each loop iteration, which obviously will only return me the last value, however, I wish to get the output for complete list of Students. What modifications/additions do I make in this code to make it work.
I followed this thread, Why the function is only returning the last value? , which was similar to my query, however might not work with my requirements.
Edited: I actually have 2 sheets that I'm fetching my data from. One is a Main Sheet that has all the data, with redundancy; the other is a Rule Book with unique values and the rules for calculation. In this code I'm only fetching values from the Rule Book, then going to the function, fetching data from the Main Sheet based on these values, performing my calculations, creating a new dataframe, inserting the values I get into that dataframe, and returning the final dataframe. Right now, the calculation tested based only on Student_Name has worked, but now I have a bigger problem of also calculating based on Marks.
At the risk of sounding arrogant, I only wish to pass the name as string, not as list.
Again, I'm sorry about the stupidity of my query.
Give it a try:
Names = File['Student_Name']
Scores = File['Marks']
Master_Sheet_PreCovid = []
for a in range(0,100):
    Student_Name = [Names[a]]
    Marks = []
    NewDf = pd.DataFrame(PreCovid(Student_Name,Marks))
    Master_Sheet_PreCovid.append(NewDf)
Master_Sheet_PreCovid = pd.concat(Master_Sheet_PreCovid)
print(Master_Sheet_PreCovid)
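Since the question asks for the name to be passed as a plain string, here is a self-contained sketch of the same collect-then-concat pattern. The File contents and the body of PreCovid are placeholders invented for illustration (the real ones are not shown in the question), so treat this as a shape to adapt and not as the actual calculation.

```python
import pandas as pd

# Hypothetical stand-ins: the real File and PreCovid come from the question
# and are not shown there.
File = pd.DataFrame({"Student_Name": ["Ann", "Ben", "Cy"], "Marks": [71, 64, 88]})

def PreCovid(student_name, marks):
    # Placeholder calculation that returns one row per student.
    return {"Student_Name": [student_name], "Adjusted": [marks + 5]}

frames = []
# zip yields each name as a plain string together with its mark.
for name, mark in zip(File["Student_Name"], File["Marks"]):
    frames.append(pd.DataFrame(PreCovid(name, mark)))

# Concatenate once after the loop; ignore_index renumbers the rows 0..n-1.
Master_Sheet_PreCovid = pd.concat(frames, ignore_index=True)
print(Master_Sheet_PreCovid)
```

Iterating over the columns directly also avoids the hard-coded range(0,100), which would fail if the sheet has fewer than 100 rows.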

Is it possible to use variables values in if statements in Python?

I have a database table about people. They have variables like age, height, weight etc..
I also have another database table about characteristics of the people. This has three fields:
Id: Just a running number
Condition: For example "Person is teenager" or "Person is overweight"
Formula: For example, for "Person is teenager" the formula is "age > 12 and age < 20", and for overweight it is "weight / (height * height) > 30"
There are more than 50 conditions like these. When I want to determine the characteristics of a person I would need to write an if statement for each of these conditions, which makes the code quite messy and hard to maintain (whenever I add a new condition to the database I also need to add a new if statement to the code).
If I store the formulas directly in the database, is it possible to use them as if statements directly? As in if(print(characteristic['formula'])) etc.
What I am looking is something like this, I am using Python.
In this code
Person is one person already fetched from database as a dict
Characteristics are all the characteristics fetched from the database as a list of dictionaries
def getPersonCharacteristics(person, characteristics):
    age = person['age']
    weight = person['weight']  # etc...
    personCharacteristics = []
    for x in characteristics:
        if(x['formula']):
            personCharacteristics.append(x['condition'])
    return personCharacteristics
Now in this part, if(x['formula']), instead of checking whether the variable is true, it should "print" the variable's value and run the if statement against that, e.g. if(age > 12 and age < 20):
Is this possible in some way? Again the whole point of this is that when I come up with new conditions I could just add a new row to the database without altering any code and adding yet another if statement.
Do you mean like this?
#
#Example file for working with conditional statement
#
def main():
    x,y =2,8
    if(x < y):
        st= "x is less than y"
    print(st)
if __name__ == "__main__":
    main()
This is possible using Python's eval function:
if eval(x['formula']):
    ...
However, this is usually discouraged as it can make it complicated to understand your program, and can give security problems if you're not very careful about how your database is accessed and what can end up in there.
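To make that concrete, here is a minimal sketch of the eval approach applied to the function from the question. The sample person and rows are invented for illustration, and the function name is my own. Passing an empty __builtins__ and only the person's fields narrows what a formula can reference, but it should not be mistaken for a real sandbox, so this only makes sense if you fully control what gets written to the formula column.

```python
def get_person_characteristics(person, characteristics):
    """Return the conditions whose formula evaluates to True for this person."""
    matched = []
    for row in characteristics:
        # Expose only the person's fields to the formula and hide builtins.
        # This narrows what a formula can touch but is not a real sandbox.
        if eval(row["formula"], {"__builtins__": {}}, dict(person)):
            matched.append(row["condition"])
    return matched

# Hypothetical rows, shaped like the tables described in the question.
person = {"age": 15, "height": 1.70, "weight": 60}
characteristics = [
    {"id": 1, "condition": "Person is teenager", "formula": "age > 12 and age < 20"},
    {"id": 2, "condition": "Person is overweight", "formula": "weight / (height * height) > 30"},
]
print(get_person_characteristics(person, characteristics))
```

Because the person dict is used as the local namespace, a new row in the database can refer to any existing field by name without any code change, which is the behaviour the question asks for.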

String matching in Python and IF command

I have a list of different names. I want to take one name at a time and match it with values in a particular column in a data frame. If the conditions are met, the following calculation will be performed:
orderno == orderno + 1
However, unfortunately, the code does not seem to work. Is there anything that I can do to make sure it works?
DfCustomers['orderno'] = 0
for i in uniquecustomer:
    if i == "DfCustomers['EntityName']":
        orderno == orderno + 1
Remove the quotes (""). By writing
if i == "DfCustomers['EntityName']":
you compare the variable i with the actual string "DfCustomers['EntityName']" instead of the variable DfCustomers['EntityName']. Try to remove the quotes and print out the variable to get a feeling for it, e.g.
print("DfCustomers['EntityName']")
vs
print(DfCustomers['EntityName'])
Try first removing the quotes around the "DfCustomers['EntityName']" so as to not just compare directly to that string. Then, within your logic the orderno variable should be incremented by 1, not compared to its value + 1. The new code could look something like this:
DfCustomers['orderno'] = 0
for i in uniquecustomer:
    if i == DfCustomers['EntityName']:
        orderno = orderno + 1
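One more thing to watch for: if DfCustomers['EntityName'] is a pandas column, comparing a single name to it gives a boolean Series (one True/False per row), and using that directly in an if raises a ValueError when the column has more than one row. A small sketch of one way around it, assuming the goal is to count how many rows match each name. That goal is my guess, and the sample data below is made up.

```python
import pandas as pd

# Hypothetical data; the real DfCustomers and uniquecustomer are not shown.
DfCustomers = pd.DataFrame({"EntityName": ["Acme", "Bolt", "Acme", "Core", "Acme"]})
uniquecustomer = ["Acme", "Bolt", "Zed"]

order_counts = {}
for name in uniquecustomer:
    # The comparison yields a boolean Series; sum() counts the True entries.
    order_counts[name] = int((DfCustomers["EntityName"] == name).sum())

print(order_counts)
```

If the counts are needed for every name in the column, DfCustomers['EntityName'].value_counts() gives them without a loop.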

Python: replace for loop with function

Can anyone help me understand how I would create a function with def whatever() instead of using a for loop? I'm trying to do things more Pythonically but don't really understand how to apply a function well instead of a loop. For instance, I have a loop below that works well and gives the output I would like. Is there a way to do this with a function?
seasons = leaguesFinal['season'].unique()
teams = teamsDF['team_long_name'].unique()
df = []
for i in seasons:
    season = leaguesFinal['season'] == i
    season = leaguesFinal[season]
    for j in teams:
        team_season_wins = season['win'] == j
        team_season_win_record = team_season_wins[team_season_wins].count()
        team_season_loss = season['loss'] == j
        team_season_loss_record = team_season_loss[team_season_loss].count()
        df.append((j, i, team_season_win_record, team_season_loss_record))
df = pd.DataFrame(df, columns=('Team', 'Seasons', 'Wins', 'Losses'))
The output looks as follows:
                Team    Seasons  Wins  Losses
0           KRC Genk  2008/2009    15      14
1       Beerschot AC  2008/2009    11      14
2   SV Zulte-Waregem  2008/2009    16      11
3   Sporting Lokeren  2008/2009    13       9
4  KSV Cercle Brugge  2008/2009    14      15
Solution
def some_loop(something, something_else):
    for i in something:
        season = leaguesFinal['season'] == i
        season = leaguesFinal[season]
        for j in something_else:
            team_season_wins = season['win'] == j
            team_season_win_record = team_season_wins[team_season_wins].count()
            team_season_loss = season['loss'] == j
            team_season_loss_record = team_season_loss[team_season_loss].count()
            df.append((j, i, team_season_win_record, team_season_loss_record))
some_loop(seasons, teams)
Comments
This is what you mentioned: creating a function out of the for loop. Although you still have a for loop, it's in a function that you can use in different areas of your code without repeating the entire loop.
All there is to do is define a function that accepts two variables for this particular loop, which would be def some_loop(something, something_else). I used basic naming so you could see more clearly what's taking place.
Then you would replace all the instances of seasons and teams with those variables.
Now when you call your function, it will replace all occurrences of something and something_else with whatever inputs you send to it.
Also, I am not completely sure about the statements that involve x = y = i and what this accomplishes, or if it's even a valid statement?
Actually, you're mixing things up: functions just group lines of code and thus make them reusable without writing everything again, whereas for loops are for iteration.
In your example above, a function would just contain the for loop and return the resulting dataframe, which you could then use. But it will not change anything or make your code smarter.
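If the aim is to drop the explicit loops altogether and not just wrap them, pandas groupby is one option. The following is only a sketch under assumptions: it supposes leaguesFinal has one row per match with the winner's name in win and the loser's name in loss, as the loop in the question implies. The function name season_records and the sample rows are made up.

```python
import pandas as pd

def season_records(matches):
    """Count wins and losses per (season, team) without explicit loops."""
    # size() counts rows per group; rename the team level so both Series align.
    wins = matches.groupby(["season", "win"]).size().rename_axis(["Seasons", "Team"])
    losses = matches.groupby(["season", "loss"]).size().rename_axis(["Seasons", "Team"])
    # concat aligns on the shared index; teams missing from one side get 0.
    table = pd.concat([wins.rename("Wins"), losses.rename("Losses")], axis=1)
    return table.fillna(0).astype(int).reset_index()

# Hypothetical match results: one row per game, naming the winner and loser.
leaguesFinal = pd.DataFrame({
    "season": ["2008/2009", "2008/2009", "2008/2009", "2009/2010"],
    "win": ["KRC Genk", "KRC Genk", "Beerschot AC", "Beerschot AC"],
    "loss": ["Beerschot AC", "Sporting Lokeren", "KRC Genk", "KRC Genk"],
})
print(season_records(leaguesFinal))
```

Unlike the nested loops, this only lists teams that actually won or lost a match in a given season, so team and season pairs with no games do not appear as rows of zeros.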

Finding a variable count given another variable in Python

Using Python 3
I am trying to pull the total count for each group:
1 - control and convert
2 - treatment and convert
control = df2[df2.group == 'control']
treatment = df2[df2.group == 'treatment']
old = df2[df2.landing_page == 'old']
new = df2[df2.landing_page == 'convert']
I've tried a couple different things:
control.user_id.count() + convert.user_id.count()
But this just adds both groups up.
I also tried a groupby but I can't get the syntax to work.
df2.groupby(df2[df2.group =='control',
'old']).landing_page().reset_index(name='Count')
What is the best way to pull a group given the presence in another group?
Are you looking for something like this?
Two arrays:
a = [(1,2.1232),(3,5)]
b = [(1,2.1232),(5,5)]
List comprehension for finding how many of a are in b:
sum([x in a for x in b])
Returns: 1
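Staying in pandas, a row can be required to belong to two groups at once by combining boolean masks with &, and groupby on both columns gives the count for every combination. This is a sketch on invented rows that uses only the group and landing_page columns named in the question; swap in whichever column marks a conversion in your data.

```python
import pandas as pd

# Hypothetical rows using the two columns named in the question.
df2 = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5, 6],
    "group": ["control", "control", "treatment", "treatment", "control", "treatment"],
    "landing_page": ["old", "old", "new", "old", "new", "new"],
})

# Combine two boolean masks with & to keep rows that satisfy both conditions.
control_old = df2[(df2["group"] == "control") & (df2["landing_page"] == "old")]
print(len(control_old))

# groupby on both columns counts every combination in one pass.
counts = df2.groupby(["group", "landing_page"]).size().reset_index(name="Count")
print(counts)
```

Each comparison needs its own parentheses, because & binds more tightly than == in Python.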
