Python: Conditionals for new variable with three classes

I want to create a new variable that covers all drugs DS59 - DS71 (values currently coded: 1 = never used, 2 = rarely use, 3 = occasionally use, and 4 = regularly use). I want one of three classes to be assigned to each subject, as laid out below:
no user: no use of any of the drugs (all 1's)
experimenter/light user: low overall score on drug use across all classes (total summed score less than 20) and no "regularly use (4)" answers to any drug class
regular user: high overall score on drug use across all classes (score above 20) and at least one "occasionally use (3)" or "regularly use (4)" answer to any drug class
This is my current code - I am unsure how to most appropriately write the conditionals.
druglist=[(df['DS59']),(df['DS60']),(df['DS61']),(df['DS62']),(df['DS63']),
(df['DS64']),(df['DS65']),(df['DS66']),(df['DS67']),(df['DS68']),
(df['DS69']),(df['DS70']),(df['DS71'])]
conditions=[
(druglist== ),
(druglist==),
(druglist== ),
]
values=['no user','experimenter/light user','regular user']
df['drugs']=np.select(conditions,values)
Thank you so much for any help/advice.

If I understood correctly, this should be what you're looking for. Let me know if not:
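drugs = pd.concat(druglist, axis=1) # one column per drug, one row per subject (needs import pandas as pd)
drug_sum = drugs.sum(axis=1) # row-wise total score
conditions = [
    (drug_sum == drugs.shape[1]), # If it equals the number of drugs, that means every item is 1
    (drug_sum <= 20) & ~(drugs == 4).any(axis=1), # element-wise & and ~, since "and"/"not"/"in" do not work row-wise on Series
    (drug_sum > 20) & (drugs >= 3).any(axis=1),
]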
Though I'm not sure, do these conditions not leave some cases not fitting into any of the options? For example if a person is 1 on everything but one drug, on which they are 4.
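To make the point about uncovered cases concrete, here is a minimal sketch on made-up data. The four example rows and the "unclassified" label are assumptions for illustration and do not come from the question. The thresholds follow the answer above, where a sum of 20 or less counts as light use. Passing an explicit string as default to np.select gives rows that match no condition a readable label.

```python
import numpy as np
import pandas as pd

cols = [f"DS{i}" for i in range(59, 72)]  # DS59 ... DS71, 13 drugs

# Made-up subjects (assumption, for illustration only)
rows = [
    [1] * 13,                      # never used anything, sum 13
    [2, 2, 3] + [1] * 10,          # sum 17, no 4s
    [4, 4, 3, 3, 2, 2] + [1] * 7,  # sum 25, has 3s and 4s
    [4] + [1] * 12,                # sum 16, but one 4
]
df = pd.DataFrame(rows, columns=cols)

drugs = df[cols]
drug_sum = drugs.sum(axis=1)  # total score per subject

# np.select takes the first condition that is True for each row
conditions = [
    drug_sum == len(cols),
    (drug_sum <= 20) & ~(drugs == 4).any(axis=1),
    (drug_sum > 20) & (drugs >= 3).any(axis=1),
]
values = ["no user", "experimenter/light user", "regular user"]

# Rows matching no condition get the (hypothetical) "unclassified" label
df["drugs"] = np.select(conditions, values, default="unclassified")
print(df["drugs"].tolist())
```

The last subject has a low total but one "regularly use" answer, so it fits none of the three classes and ends up as "unclassified".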

Related

Is it possible to use variables values in if statements in Python?

I have a database table about people. They have variables like age, height, weight etc..
I also have another database table about characteristics of the people. This has three fields:
Id: Just a running number
Condition: For example "Person is teenager" or "Person is overweight"
Formula: For example for the "Person is teenager" the formula is "age > 12 and age < 20" or for the overweight "weight / height * height > 30"
There are more than 50 conditions like these. When I want to determine the characteristics of a person, I would need to write an if statement for each of these conditions, which makes the code quite messy and hard to maintain (whenever I add a new condition to the database I also need to add a new if statement in the code).
If I type the formulas directly into the database, is it possible to use them as if statements directly? As in if(print(characteristic['formula']) etc..
What I am looking is something like this, I am using Python.
In this code
Person is one person already fetched from database as a dict
Characteristics are all the characteristics fetched from the database as a list of dictionaries
def getPersonCharacteristics(person, characteristics):
    age = person['age']
    weight = person['weight'] # etc...
    personCharacteristics = []
    for x in characteristics:
        if(x['formula']):
            personCharacteristics.append(x['condition'])
    return personCharacteristics
Now in this part, if(x['formula']), instead of checking whether the variable is truthy, it should "print" the variable's value and run the if statement against that, e.g. if(age > 12 and age < 20):
Is this possible in some way? Again the whole point of this is that when I come up with new conditions I could just add a new row to the database without altering any code and adding yet another if statement.
Do you mean like this?
#
# Example file for working with conditional statement
#
def main():
    x,y =2,8
    if(x < y):
        st= "x is less than y"
    print(st)
if __name__ == "__main__":
    main()
This is possible using python's eval function:
if eval(x['formula']):
    ...
However, this is usually discouraged as it can make it complicated to understand your program, and can give security problems if you're not very careful about how your database is accessed and what can end up in there.
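One way to keep formulas in the database without calling eval on them is to parse each formula with the standard ast module and evaluate only a whitelist of node types. The following is a minimal sketch under stated assumptions, not a hardened implementation: the evaluate and get_person_characteristics helpers are hypothetical names, only numbers, variable names, + - * /, comparisons and and/or are supported, and the overweight formula is written with explicit parentheses, which differs from the formula quoted in the question.

```python
import ast
import operator

# Whitelisted operators; anything else in a formula raises ValueError
_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv}
_CMP_OPS = {ast.Lt: operator.lt, ast.LtE: operator.le, ast.Gt: operator.gt,
            ast.GtE: operator.ge, ast.Eq: operator.eq, ast.NotEq: operator.ne}


def evaluate(formula, variables):
    """Evaluate a formula string against a dict of variable values."""
    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return variables[node.id]  # KeyError for unknown variables
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.BoolOp):
            results = [visit(v) for v in node.values]
            return all(results) if isinstance(node.op, ast.And) else any(results)
        if isinstance(node, ast.Compare):
            left = visit(node.left)
            for op, comparator in zip(node.ops, node.comparators):
                if type(op) not in _CMP_OPS:
                    raise ValueError("unsupported comparison")
                right = visit(comparator)
                if not _CMP_OPS[type(op)](left, right):
                    return False
                left = right  # supports chains such as 12 < age < 20
            return True
        raise ValueError("unsupported expression: " + ast.dump(node))

    return visit(ast.parse(formula, mode="eval"))


def get_person_characteristics(person, characteristics):
    """Return the conditions whose formula is true for this person."""
    return [c["condition"] for c in characteristics
            if evaluate(c["formula"], person)]


person = {"age": 15, "weight": 60, "height": 1.7}
characteristics = [
    {"id": 1, "condition": "Person is teenager",
     "formula": "age > 12 and age < 20"},
    {"id": 2, "condition": "Person is overweight",
     "formula": "weight / (height * height) > 30"},
]
print(get_person_characteristics(person, characteristics))
```

Because function calls and attribute access are not in the whitelist, a row such as `__import__('os')` is rejected with ValueError instead of being executed.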

Efficiently writing string comparison functions in Pandas

Let's say I work for a company that hands out different types of loans. We get our loan information from a big data mart, from which I need to calculate some additional things, such as whether someone is in arrears or not. Right now, for clarity's sake, I have done this with a rather dumb function that iterates over all rows (where all information about a loan is stored) by using pd.DataFrame.apply(myFunc, axis=1), which is horribly slow of course.
Now that we are growing and that I get more and more data to process, I am starting to get concerned over performance. Below is an example of a function that I call a lot, and would like to optimize (some ideas that I have below). These functions are applied to a DataFrame which has (a.o.) the following fields:
Loan_Type : a field containing a string that determines the type of loan. We have many different names, but it comes down to 4 types (for this example): Type 1 and Type 2, and whether the loan is held by staff or not.
Activity_Date : The date the activity on the loan was logged (it's a daily loan activity table, if that tells you anything)
Product_Account_Status : The status given by the table to these loans (are they active, or some other status?) on the Activity_Date, this needs to be recalculated because it is not always calculated in the table (don't ask why it is like this, complete headache).
Activation_Date : The date the loan was activated
Sum_Paid_To_Date : The amount of money paid into the loan at the Activity_Date
Deposit_Amount : The deposit amount for the loan
Last_Paid_Date : The last date a payment was made into the loan.
So two example functions:
def productType(x):
    # Determines the type of the product, for later aggregation purposes, and to determine the amount to be payable per day
    if ('Loan Type 1' in x['Loan_Type']) & (not ('Staff' in x['Loan_Type'])):
        return 'Loan1'
    elif ('Loan Type 2' in x['Loan_Type']) & (not ('Staff' in x['Loan_Type'])):
        return 'Loan2'
    elif ('Loan Type 1' in x['Loan_Type']) & ('Staff' in x['Loan_Type']):
        return 'Loan1Staff'
    elif ('Loan Type 2' in x['Loan_Type']) & ('Staff' in x['Loan_Type']):
        return 'Loan2Staff'
    elif ('Mobile' in x['Loan_Type']) | ('MM' in x['Loan_Type']):
        return 'Other'
    else:
        raise ValueError(
            'A payment plan is not captured in the code, please check it!')
This function is then applied to the DataFrame AllLoans which contains all loans I want to analyze at that moment, by using:
AllLoans['productType'] = AllLoans.apply(lambda x: productType(x), axis = 1)
Then I want to apply some other functions, one example of such a function is given below. This function determines whether the loan is blocked or not, depending on how long someone hasn't paid, and some other statuses that are important, but are currently stored in strings in the loan table. Examples of this are whether people are cancelled (for being blocked for too long), or some other statuses, we treat customers differently based on these tags.
def customerStatus(x):
    # Sets the customer status based on the column Product_Account_Status or
    # the days of inactivity
    if x['productType'] == 'Loan1':
        dailyAmount = 2
    elif x['productType'] == 'Loan2':
        dailyAmount = 2.5
    elif x['productType'] == 'Loan1Staff':
        dailyAmount = 1
    elif x['productType'] == 'Loan2Staff':
        dailyAmount = 1.5
    else:
        raise ValueError(
            'Daily amount to be paid could not be calculated, check if productType is defined.')
    if x['Product_Account_Status'] == 'Cancelled':
        return 'Cancelled'
    elif x['Product_Account_Status'] == 'Suspended':
        return 'Suspended'
    elif x['Product_Account_Status'] == 'Pending Deposit':
        return 'Pending Deposit'
    elif x['Product_Account_Status'] == 'Pending Allocation':
        return 'Pending Allocation'
    elif x['Outstanding_Balance'] == 0:
        return 'Finished Payment'
    # If this check returns True it means that Last_Paid_Date is zero/null, as
    # far as I can see this means that the customer has only paid the deposit
    # and is thus an FPD
    elif type(x['Date_Of_Activity'] - x['Last_Paid_Date']) != (pd.tslib.NaTType):
        if (((x['Date_Of_Activity'] - x['Last_Paid_Date']).days + 1) > 30) | ((((x['Date_Of_Activity'] - x['Last_Paid_Date']).days + 1) > 14) & ((x['Sum_Paid_To_Date'] - x['Deposit_Amount']) <= dailyAmount)):
            return 'Blocked'
        elif ((x['Date_Of_Activity'] - x['Last_Paid_Date']).days + 1) <= 30:
            return 'Active'
    # If this is True, the customer has not paid more than the deposit, so it
    # will fall on the age of the customer whether they are blocked or not
    elif type(x['Date_Of_Activity'] - x['Last_Paid_Date']) == (pd.tslib.NaTType):
        # The date is changed here to 14 because of FPD definition
        if ((x['Date_Of_Activity'] - x['Activation_Date']).days + 1) <= 14:
            return 'Active'
        elif ((x['Date_Of_Activity'] - x['Activation_Date']).days + 1) > 14:
            return 'Blocked'
    # If we have reached the end and still haven't found the status, it will
    # get the following status
    return 'Other Status'
This is again applied by using AllLoans['customerStatus'] = AllLoans.apply(lambda x: customerStatus(x), axis = 1). As you can see there are many string comparisons and date comparisons, which are a bit confusing for me on how I can 'properly' vectorize these functions.
Apologies if this is Optimization 101, but have tried to search for answers and strategies on how to do this, but couldn't find really comprehensive answers. I was hoping to get some tips here, thanks in advance for your time.
Some thoughts on making this faster/getting towards a more vectorized approach:
Make the customerStatus function slightly more modular by making a function that determines the daily amounts, and stores this in the dataframe for quicker access (I need to access them later anyway, and determine this variable in multiple functions).
Make the input column for the productType function into integers by using some sort of dict, so that fewer string functions need to be called (but I feel like this won't be my biggest speed-up).
Some things that I would like to do but don't really know where to start on;
How to properly vectorize these functions that contain many if statements based on string/date comparisons (business rules can be a bit complex here) across different columns in the dataframe. The code might become a bit more complex, but I need to apply these functions multiple times to slightly different (but importantly different) dataframes, and these are growing larger and larger, so these functions need to be in some sort of library for ease of access, and the code needs to be sped up because it simply takes too much time.
Have tried to search for some solutions like Numba or Cython but I don't understand enough of the inner workings of C to properly use this (or just yet, would like to learn). Any suggestions on how to improve performance would be greatly appreciated.
Kind regards,
Tim.
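As a starting point for the vectorization asked about above, the string tests in productType can be expressed as boolean columns with Series.str.contains and combined with np.select, which picks the first matching condition per row just like the if/elif chain. This is a minimal sketch, not a drop-in replacement: the function names product_type and daily_amount and the sample strings are assumptions, and it assumes Loan_Type has no missing values.

```python
import numpy as np
import pandas as pd


def product_type(loan_type: pd.Series) -> pd.Series:
    """Vectorized version of the row-wise productType function."""
    is_type1 = loan_type.str.contains("Loan Type 1", regex=False)
    is_type2 = loan_type.str.contains("Loan Type 2", regex=False)
    is_staff = loan_type.str.contains("Staff", regex=False)
    is_other = (loan_type.str.contains("Mobile", regex=False)
                | loan_type.str.contains("MM", regex=False))

    # Same order as the if/elif chain; the first True condition wins
    conditions = [is_type1 & ~is_staff, is_type2 & ~is_staff,
                  is_type1 & is_staff, is_type2 & is_staff, is_other]
    choices = ["Loan1", "Loan2", "Loan1Staff", "Loan2Staff", "Other"]
    result = pd.Series(np.select(conditions, choices, default=""),
                       index=loan_type.index)
    if (result == "").any():
        raise ValueError(
            "A payment plan is not captured in the code, please check it!")
    return result


def daily_amount(product: pd.Series) -> pd.Series:
    """Look up the daily amount per product type; unknown types become NaN."""
    amounts = {"Loan1": 2, "Loan2": 2.5, "Loan1Staff": 1, "Loan2Staff": 1.5}
    return product.map(amounts)


# Made-up sample data (assumption, for illustration only)
AllLoans = pd.DataFrame({"Loan_Type": [
    "Loan Type 1 Basic", "Loan Type 2 Staff", "Mobile Money", "Staff Loan Type 1",
]})
AllLoans["productType"] = product_type(AllLoans["Loan_Type"])
AllLoans["dailyAmount"] = daily_amount(AllLoans["productType"])
print(AllLoans)
```

The same pattern (build one boolean Series per business rule, then combine them in priority order with np.select) should carry over to customerStatus, with the date differences computed once as columns instead of per row.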

Testing Equality of boto Price object

I am using the python package boto to connect python to MTurk. I am needing to award bonus payments, which are of the Price type. I want to test if one Price object equals a certain value. Specifically, when I want to award bonus payments, I need to check that their bonus payment is not 0 (because when you try to award a bonus payment in MTurk, it needs to be positive). But when I go to check values, I can't do this. For example,
from boto.mturk.connection import MTurkConnection
from boto.mturk.price import Price
a = Price(0)
a == 0
a == Price(0)
a == Price(0.0)
a > Price(0)
a < Price(0)
c = Price(.05)
c < Price(0)
c < Price(0.0)
These yield unexpected answers.
I am not sure of how to test if a has a Price equal to 0. Any suggestions?
I think you'll want to compare the Price.amount attribute of these objects. Otherwise, I think it compares the objects themselves or some other goofiness. It'd be smart for the library to override the standard equality test to make this more developer-friendly.
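The behaviour described here can be reproduced without boto. The sketch below uses a hypothetical stand-in class, not boto's real Price, and assumes only that the value is stored in an amount attribute and that the class defines no __eq__ method. Under that assumption, == falls back to object identity, so two separately created prices never compare equal, while comparing the amounts behaves as expected.

```python
class Price:
    """Hypothetical stand-in for boto.mturk.price.Price (not the real class)."""

    def __init__(self, amount=0.0):
        self.amount = amount


a = Price(0)

# Without __eq__, == compares identity: only the very same object is equal
same_object = (a == a)
equal_by_default = (a == Price(0))

# Comparing the stored numbers gives the intended answer
equal_by_amount = (a.amount == Price(0).amount)


def is_payable(bonus):
    """A bonus can only be awarded if its amount is strictly positive."""
    return bonus.amount > 0


print(same_object, equal_by_default, equal_by_amount)
print(is_payable(Price(0)), is_payable(Price(.05)))
```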

Create random, unique variable names for objects

I'm playing around with Python, and with any programming language, for the first time, so please bear with me. I started an online class two weeks ago, but I'm trying to develop a small game on the side to learn faster (and have fun). It's a text adventure, but it shall have random encounters and fights with enemies that have random equipment that the players can then loot.
This is my problem: If I create random objects/weapons for my random encounter, I need to make sure that the object has a unique name. The way the level is designed there could in theory be an infinite number of objects (it can be open ended, with encounters just popping up).
This is my approach so far
class Item_NPCs: #create objects for NPCs
    def __init__(self, item_type, item_number):
        # e.g. item_type 1 = weapons, item_type2 = potions, etc.
        if item_type == 1 and item_number == 1:
            self.property1 = 5
            self.property2 = 4
        if item_type == 1 and item_number == 2:
            pass # etc. :)
def prepare_encounter():
    inventory_NPC = [] # empty list for stuff the NPC carries around
    XXX = Class_Item(item_type, item_number) # I might randomize the arguments.
What is important is that I want "XXX" to be unique and random, so that no object exists more than once and can later be put into the player's inventory.
How to do that?
Joe
Why do you need it to be random? You could simply use a list, and append every new object to the list, with its index being its unique identifier:
items = []
items.append( Class_Item(item_type, item_number) )
But if you really need a random identifier, maybe you can use a dictionary:
items = dict()
items[random_id] = Class_Item(item_type, item_number)
This requires random_id to be hashable (but it should be if it is a number or a string).
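A minimal sketch of the dictionary approach, combined with a counter so that identifiers are guaranteed not to repeat. The Item class, the spawn_random_item helper and the value ranges are hypothetical, chosen only for illustration. Inventories hold identifiers, so looting is just moving an id from one list to another.

```python
import itertools
import random


class Item:
    """Hypothetical item class standing in for the question's Class_Item."""

    def __init__(self, item_type, item_number):
        self.item_type = item_type
        self.item_number = item_number


_next_id = itertools.count()  # yields 0, 1, 2, ... and never repeats
items = {}                    # item id -> Item object


def spawn_random_item():
    """Create an item with random arguments and return its unique id."""
    item_id = next(_next_id)
    items[item_id] = Item(random.randint(1, 2), random.randint(1, 3))
    return item_id


# An NPC carrying three random items
npc_inventory = [spawn_random_item() for _ in range(3)]
player_inventory = []

# Looting: move the id from the NPC's inventory to the player's
player_inventory.append(npc_inventory.pop())
print(npc_inventory, player_inventory)
```

No variable names have to be generated at run time: the dictionary key plays the role of the unique name, and items[item_id] retrieves the object.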
I don't know why others haven't thought of this:
from random import randint
yourVariableName = randint(0,1000)
exec("var_%s = 'value'" % yourVariableName)
I just thought of it myself. I hope it helps you.
A downside is you can't just do this:
print(var_yourVariableName)
you can only do this:
exec("print(var_%s)" % yourVariableName)
But you can probably circumvent this one way or another. Leave a comment if you manage to figure it out.
One more downside — if used in certain ways, it could be very insecure (as we are using exec), so make sure to cover any holes!

Create and Instantiate objects in python when the program in running

I'm making a program to calculate the electrical consumption of a building from the characteristics of the building, like the number of apartments and their types (they don't all need to be the same size, for example), so I created a class called apartamento, something like this:
class apartamento:
    def __init__(self):
        self.area = 0
        self.cantidad = 0
        self.altura = 0
        self.personas = 0
        self.dotacion = 0
        self.cto_alum = 0
        self.cto_tug = 0
        self.cto_e = 0
So I could have, let's say, thirty apartments, fifteen of 100m^2 and another fifteen of 80m^2; then I want:
type1 = apartamento()
type2 = apartamento()
type1.area = 100
type2.area = 80
But since I don't know how many types of apartments there are, I need to create them while the program is running, in a loop for example.
When I say I don't know how many types of apartments there are, I mean that this could be used by someone else, for different sets of buildings, so one building could have only one type of apartment and another could have ten, and this has to be transparent to the user. Right now I have a spin box to enter how many types of apartments there are, and then I create a table to enter all their data: size, number of people, number of circuits. The problem is that I then have to make some calculations on this data, so I want to instantiate every type of apartment as an object with the attributes entered in the table, and I don't know how many types there will be; it depends on the building.
Most complex programs solve this problem using a reload feature that they incorporate into some kind of daemon. For instance nginx
service nginx reload
Will update sites with new configuration parameters without having to actually restart the server.
In your case, since this sounds like a simpler program, you can always just have a Python config file and read from it at a set interval.
import imp
config_module = imp.load_source("config", "/path/to/config.py")
And your configuration could look like
CONFIG = {
    "type1": {
        "altura": 10,
        "personas": 2,
        "cantidad": 5,
    },
    "type2": {
        ...
    }
}
At a certain interval you can just ask your program to look up the source of the configuration module and then you won't have to worry about your configuration being up to date.
Either that or you can just stop your program and then run it again in order to pick up the latest changes without having to wait for your config to reload.
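The imp module has been deprecated since Python 3.4 and was removed in Python 3.12, so on a current interpreter the same idea can be sketched with importlib instead. The load_config helper and the temporary file below are assumptions for illustration; real code would pass the path to its own config file. Each call executes the file again, so calling it at an interval picks up edits.

```python
import importlib.util
import os
import tempfile


def load_config(path):
    """Execute a Python file as a module and return its CONFIG dict."""
    spec = importlib.util.spec_from_file_location("config", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.CONFIG


# Demo with a throwaway config file (hypothetical contents)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.py")
    with open(path, "w") as f:
        f.write('CONFIG = {"type1": {"altura": 10, "personas": 2, "cantidad": 5}}\n')
    config = load_config(path)

print(config["type1"]["personas"])
```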
After the configuration is updated you can create new apartment objects as you desire
apartmentos = {}
for type, values in config_module.CONFIG.items():
    apartmentos[type] = Apartmento(values)
Where Apartmento now accepts a new dictionary of configuration
class Apartmento(object):
    def __init__(self, config):
        self.personas = config["personas"] # store on the instance, not in a local variable
        self.altura = config["altura"]
        ...
Without knowing what exactly is it that you need, this is my take on it.
It's most likely the dumbest approach to it. It looks to me like all you want to create is a building with some fixed number of apartments, some of which are of type 1 some of which are type 2.
Types of apartments don't sound to me like something that is preset, but more like something that is specific to each building and its plans. I.E. some buildings have apartments split in 2 categories, 100m^2 and 80m^2 and some have 90 and 50?
class apartment:
    def __init__(self, area):
        self.area = area
class building:
    def __init__(self, ntypesOfapartments):
        self.ntypesOfapartments = ntypesOfapartments
        self.apartments = list()
        for i in range(ntypesOfapartments):
            area = input("Area of {0}. type of apartment: ".format(i))
            n = input("Number of apartments of area {0}: ".format(area))
            for j in range(int(n)):
                self.apartments.append(apartment(area))
    def __str__(self):
        desc = "Building has {0} types of apartments \n".format(self.ntypesOfapartments)
        desc += "Total number of rooms is {0}.".format(len(self.apartments))
        return desc
So you can instantiate a building with n different apartment types, and then define your apartment types interactively in the interpreter.
>>> test = building(2)
Area of 0. type of apartment: 100
Number of apartments of area 100: 5
Area of 1. type of apartment: 80
Number of apartments of area 80: 10
>>> print(test)
Building has 2 types of apartments
Total number of rooms is 15.
>>>
Of course it would be a lot better to use a more sensible data structure than a list to hold the apartments, like a dictionary. That way you could look up apartment type 1 in the dictionary and get a list of all apartments that match your defined type 1.
This also depends heavily on the type of calculations you want to do with them. What are all the important data that you have to use in those calculations? In this example I only set the apartment area; if it turns out that the number of people living in it is important to your calculations too, then it doesn't make sense to enter all of that by hand. Also, what do you mean when you say you don't know how many apartment types there are? What do you consider a "running Python program"? Do you mean the interpreter, or are you writing something else?
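If the table's rows are available as dictionaries, one object per apartment type can be created in a loop and stored in a dictionary, however many rows there are. This is a minimal sketch under assumptions: the table_rows list stands in for whatever the table widget returns, the type1/type2 keys are made up, and only three of the question's attributes are included.

```python
from dataclasses import dataclass


@dataclass
class Apartamento:
    area: float = 0     # square metres
    cantidad: int = 0   # number of apartments of this type
    personas: int = 0   # people per apartment


# Rows as they might come out of the table (hypothetical data)
table_rows = [
    {"area": 100, "cantidad": 15, "personas": 4},
    {"area": 80, "cantidad": 15, "personas": 3},
]

# One object per row, keyed by a generated type name
tipos = {"type{}".format(i): Apartamento(**row)
         for i, row in enumerate(table_rows, start=1)}

# Calculations then loop over the objects, whatever their number
total_area = sum(a.area * a.cantidad for a in tipos.values())
print(total_area)  # 2700
```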
