Python: replace for loop with function

Can anyone help me understand how I would create a function with def whatever() instead of using a for loop? I'm trying to do things more Pythonically, but I don't really understand how to apply a function in place of a loop. For instance, the loop below works well and gives the output I want. Is there a way to do this with a function?
seasons = leaguesFinal['season'].unique()
teams = teamsDF['team_long_name'].unique()
df = []
for i in seasons:
    season = leaguesFinal['season'] == i
    season = leaguesFinal[season]
    for j in teams:
        team_season_wins = season['win'] == j
        team_season_win_record = team_season_wins[team_season_wins].count()
        team_season_loss = season['loss'] == j
        team_season_loss_record = team_season_loss[team_season_loss].count()
        df.append((j, i, team_season_win_record, team_season_loss_record))
df = pd.DataFrame(df, columns=('Team', 'Seasons', 'Wins', 'Losses'))
The output looks as follows:
   Team               Seasons    Wins  Losses
0  KRC Genk           2008/2009  15    14
1  Beerschot AC       2008/2009  11    14
2  SV Zulte-Waregem   2008/2009  16    11
3  Sporting Lokeren   2008/2009  13    9
4  KSV Cercle Brugge  2008/2009  14    15

Solution
def some_loop(something, something_else):
    # Appends to df, the list created before the loop in the question
    for i in something:
        season = leaguesFinal['season'] == i
        season = leaguesFinal[season]
        for j in something_else:
            team_season_wins = season['win'] == j
            team_season_win_record = team_season_wins[team_season_wins].count()
            team_season_loss = season['loss'] == j
            team_season_loss_record = team_season_loss[team_season_loss].count()
            df.append((j, i, team_season_win_record, team_season_loss_record))

some_loop(seasons, teams)
Comments
This is what you are describing: creating a function out of the for loop. You still have a for loop, but it now lives in a function that you can call from different parts of your code without repeating the whole loop.
All there is to do is define a function that accepts two variables. For this particular loop that would be def some_loop(something, something_else); I used generic names so you can see more clearly what is taking place.
Then you replace all the instances of seasons and teams with those variables.
Now, when you call the function, every occurrence of something and something_else takes on whatever inputs you pass to it.
Also, I am not completely sure what the statements of the form x = y == i accomplish, or whether they are even valid.
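On the last point, a small sketch may help. The statement in the question has the form x = y == i, which is valid Python: the comparison y == i is evaluated first and its result is assigned to x. That differs from the chained assignment x = y = i. The variable names below are made up for illustration.

```python
# Chained assignment: both names are bound to the same value.
a = b = 5

# Comparison, then assignment: c holds the result of b == i.
i = 5
c = b == i          # same as c = (b == i)

# A list-based stand-in for leaguesFinal['season'] == i, which in pandas
# yields a boolean Series that can be used as a row mask.
seasons = ["2008/2009", "2009/2010", "2008/2009"]
mask = [s == "2008/2009" for s in seasons]
selected = [s for s, keep in zip(seasons, mask) if keep]
```

Here c ends up as a boolean, and selected keeps only the entries where the mask is true, which is the role the two season = ... lines play in the loop.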

Actually, you're mixing things up: functions just bundle lines of code so they can be reused without writing everything again, whereas for loops are for iteration.
In your example above, a function would just contain the for loop and return the resulting dataframe, which you could then use. But it will not change anything or make your code smarter.
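To illustrate the point that a function only packages the loop and returns its result, here is a minimal sketch. It uses plain dictionaries instead of the DataFrames from the question so that it runs on its own; the matches data and the name win_loss_table are invented for the example. Unlike the original loop, it only lists teams that appear in at least one match of a season.

```python
from collections import Counter


def win_loss_table(matches):
    """Return (team, season, wins, losses) rows for every team seen in a season."""
    wins = Counter((m["season"], m["win"]) for m in matches)
    losses = Counter((m["season"], m["loss"]) for m in matches)
    keys = sorted(set(wins) | set(losses))
    # A Counter returns 0 for keys it has never seen.
    return [(team, season, wins[(season, team)], losses[(season, team)])
            for season, team in keys]


# Hypothetical sample data: one dict per match.
matches = [
    {"season": "2008/2009", "win": "KRC Genk", "loss": "Beerschot AC"},
    {"season": "2008/2009", "win": "KRC Genk", "loss": "Sporting Lokeren"},
    {"season": "2008/2009", "win": "Beerschot AC", "loss": "KRC Genk"},
]
table = win_loss_table(matches)
```

The caller gets the rows back as a return value instead of relying on a global list, so the same function can be reused on any other list of matches.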

Related

How to optimize my code for the Kattis Accounting Question?

I am doing this Kattis accounting question but at test case 10, it has the error Time limit exceeded.
How can I optimize my code to make it run faster?
Here's the question!
Erika the economist studies economic inequality. Her model starts in a
situation where everybody has the same amount of money. After that,
people’s wealth changes in various complicated ways.
Erika needs to run a simulation a large number of times to check if
her model works. The simulation consists of N people, each of whom
begins with 0 kroners. Then Q events happen, of three different types:
An event of type “SET i x” means that the i-th person’s wealth is set to x.
An event of type “RESTART x” means that the simulation is restarted,
and everybody’s wealth is set to x.
An event of type “PRINT i” reports the current wealth of the i-th person.
Unfortunately, Erika’s current implementation is very slow; it takes
far too much time to keep track of how much money everybody has. She
decides to use her algorithmic insights to speed up the simulation.
Input
The first line includes two integers N and Q. The
following Q lines each start with a string that is either “SET”,
“RESTART”, or “PRINT”. There is guaranteed to be at least one event of
type “PRINT”.
If the string is “SET” then it is followed by two integers i and x.
If the string is “RESTART” then it is followed by an integer x.
If the string is “PRINT” then it is followed by an integer i.
Output
For each event of type “PRINT”, write the i-th person’s capital.
Sample Input 1:
3 5
SET 1 7
PRINT 1
PRINT 2
RESTART 33
PRINT 1
Sample Output 1:
7
0
33
Sample Input 2:
5 7
RESTART 5
SET 3 7
PRINT 1
PRINT 2
PRINT 3
PRINT 4
PRINT 5
Sample Output 2:
5
5
7
5
5
# print("Enter 2 numbers")
n, q = map(int, input().split())
# print(n , q)
people = {}

def createPeople(n):
    for i in range(n):
        number = i+1
        people[number] = 0
    return people

def restart(n,new):
    for i in range(n):
        number = i+1
        people[number] = new
    return people

def setPeople(d ,id , number):
    d[id] = number
    return d
    # return d.update({id: number})

def logic(n,dict,q):
    for i in range(q):
        # print("enter Command")
        r = input()
        r = r.split()
        # print("r" ,r)
        if r[0] == "SET":
            # print(people , "People list")
            abc = setPeople(dict, int(r[1]), int(r[2]))
            # print(list)
        elif r[0] == "RESTART":
            abc = restart(n, int(r[1]))
        elif r[0] == "PRINT":
            print(dict[int(r[1])])
    # return abc

people = createPeople(n)
# print(people)
test = logic(n,people,q)
The input is too big to be doing anything linear per event, like looping over all of the people and setting their values by hand. If we have 10^5 queries and 10^6 people, the worst-case scenario is resetting over and over again, roughly 10^11 operations.
It is easier to keep a variable that tracks the baseline value after resets. Whenever a reset occurs, clear all entries in the dictionary and set the baseline to the specified value. Any lookup for a person who isn't in the dictionary returns the most recent baseline value. Now every operation is O(1) (amortized, since clear() only removes entries that earlier SET events added) and we can handle 10^5 queries in linear time.
people = {}
baseline = 0
n, q = map(int, input().split())
for _ in range(q):
    command, *args = input().split()
    if command == "SET":
        people[int(args[0])] = int(args[1])
    elif command == "RESTART":
        people.clear()
        baseline = int(args[0])
    elif command == "PRINT":
        print(people.get(int(args[0]), baseline))
As an aside, writing abstractions is great in a real program, but for these tiny code challenges I'd just focus on directly solving the problem. This reduces the potential for confusion with return values like abc that seem to have no clear purpose.
Per PEP-8, use snake_case rather than camelCase in Python.
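For checking the baseline idea against the sample cases, the same logic can be sketched as a function that takes the event lines and returns the printed values. The names run_events and main are made up for this sketch. Reading all input at once through sys.stdin is a common habit for large inputs and is often faster than repeated input() calls, though that is an assumption about the judge's environment rather than something the problem states.

```python
import sys


def run_events(lines):
    """Process event lines and return the values reported by PRINT events."""
    overrides = {}  # people whose wealth was SET since the last RESTART
    baseline = 0    # wealth of everyone who is not in overrides
    reported = []
    for line in lines:
        command, *args = line.split()
        if command == "SET":
            overrides[int(args[0])] = int(args[1])
        elif command == "RESTART":
            overrides.clear()
            baseline = int(args[0])
        elif command == "PRINT":
            reported.append(overrides.get(int(args[0]), baseline))
    return reported


def main():
    """Entry point for a judge submission: read stdin, write all answers at once."""
    data = sys.stdin.read().splitlines()
    n, q = map(int, data[0].split())
    results = run_events(data[1:1 + q])
    sys.stdout.write("\n".join(map(str, results)) + "\n")


sample_1 = run_events(["SET 1 7", "PRINT 1", "PRINT 2", "RESTART 33", "PRINT 1"])
```

When submitting, main() would be called at the bottom of the file; it is left uncalled here so the sketch can be imported and tested without standard input.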

itertools.product for the full range of columns

As part of my code, I'm trying to build a full factorial matrix. This is not a problem in itself, since I already have working code for it. However, I would like to generalize it so that the number of inputs doesn't matter. This would require modifying the line:
for combination in itertools.product(X[0,:],X[1,:],X[2,:],X[3,:],X[4,:],X[5,:],X[6,:]):
input_list = dfraw.columns[0:n_inputs]
output_list = dfraw.columns[n_inputs:len(dfraw.columns)]
fflvls = 4
lhspoints = 60000
X = np.zeros((n_inputs, fflvls), float)
ii = 0
for entrada in input_list:
    X[ii] = np.linspace(min(dfraw[entrada]), max(dfraw[entrada]), fflvls)
    ii += 1
number = 1
i = 0
X_fact = np.zeros((int(fflvls**n_inputs), n_inputs), float)
for combination in itertools.product(X[0,:],X[1,:],X[2,:],X[3,:],X[4,:],X[5,:],X[6,:]):
    X_fact[i,:] = (combination)
    i += 1
    number += 1
I thought of building the arguments to itertools.product as a string in a loop and then evaluating it, but that doesn't work, and I've also seen that it is regarded as bad practice:
prodstring = ['X[0,:]']
for ii in range(n_inputs):
    prodstring.append(',X[%d,:]'%(ii))
in_products = ''.join(prodstring)
for combination in itertools.product(eval(in_products)):
    X_fact[i,:] = (combination)
    i += 1
    number += 1
What other way is there of passing the full range of columns to this function (or similar ones)?
Who said working harder is working better? I'm back from lunch, and I delved into *args and **kwargs as a form of procrastination because I've sometimes seen them mentioned and I was curious. It seems it was just the tool I needed. In case this can help other code rookies like me in the future:
args = ()
for ii in range(n_inputs):
    b = (X[ii,:])
    args += (b,)
for combination in itertools.product(*args):
    X_fact[i,:] = (combination)
    i += 1
    number += 1
Seems to work properly. I solved in an hour of "not working" what I hadn't solved in approximately 4 hours of "working".
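As a side note, the * operator unpacks any iterable, so the tuple-building loop is not strictly needed. The sketch below uses nested lists so it runs without NumPy; the levels data is invented. Iterating over a 2-D NumPy array yields its rows, so itertools.product(*X) should behave the same way for the X in the question, though that has not been run against the original data.

```python
import itertools

# One inner list of levels per input variable (hypothetical values).
levels = [
    [0.0, 1.0],
    [10.0, 20.0],
    [100.0, 200.0],
]

# Unpack the rows as separate arguments: product(levels[0], levels[1], ...).
full_factorial = [list(combo) for combo in itertools.product(*levels)]

print(len(full_factorial))   # 2 * 2 * 2 = 8 combinations
print(full_factorial[0])     # [0.0, 10.0, 100.0]
print(full_factorial[-1])    # [1.0, 20.0, 200.0]
```

The number of inputs is now determined by the length of levels, so adding or removing a variable needs no change to the product call.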

Can you use if statements to create variables?

I am trying to make the switch from STATA to Python for data analysis and I'm running into some hiccups that I'd like some help with. I am attempting to create a secondary variable based on some values in an original variable. I want to create a binary variable that identifies fall accidents (E-codes E880.xx-E888.xx) with a value of 1 and all other E-codes with a value of 0. The list of ICD-9 codes has over 10,000 rows, so entering the values manually isn't possible.
In STATA the code would look something like this:
newvar= 0
replace newvar = 1 if ecode_variable == "E880"
replace newvar = 1 if ecode_variable == "E881"
etc
I tried a similar statement in Python, but it's not working:
data['ecode_fall'] = 1 if data['ecode'] == 'E880'
Is this type of work possible in Python? Is there a function in the numpy or pandas packages that could help with this?
I've also tried creating a dictionary that maps the fall injury codes to 1 and applying it to the variable, to no avail.
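Put the if first. Note that this works when data['ecode'] is a single value, such as one row's code; if it is a whole pandas column, the comparison produces a Series and the if statement raises a ValueError.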
if data['ecode'] == 'E880': data['ecode_fall'] = 1
You can break it out into two lines like this:
if data['ecode'] == 'E880':
    data['ecode_fall'] = 1
Or, if you include an else clause, you can have it on one line, with syntax similar to your STATA code:
data['ecode_fall'] = 1 if data['ecode'] == 'E880' else 0
Following on from the other answers, you can also check multiple values at once like so:
if data['ecode'] in ('E880', 'E881', ...):
    data['ecode_fall'] = 1
This leaves you needing only one if statement per unique value of data['ecode_fall'].
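The membership test in the answers above can be extended to cover the whole E880.xx-E888.xx range with a prefix check. This is a minimal sketch on a plain list; the helper name is_fall_code and the sample codes are invented. On a pandas column, one option would be data['ecode'].map(is_fall_code), assuming the column holds the codes as strings.

```python
# Prefixes "E880" through "E888" cover the fall-accident E-codes.
FALL_PREFIXES = tuple(f"E{n}" for n in range(880, 889))


def is_fall_code(ecode):
    """Return 1 for fall-accident E-codes (E880.xx to E888.xx), else 0."""
    # str.startswith accepts a tuple and matches any of its entries;
    # non-string values such as missing data count as 0.
    return int(isinstance(ecode, str) and ecode.startswith(FALL_PREFIXES))


codes = ["E880.1", "E885.9", "E812.0", "E888", "E950.0", None]
flags = [is_fall_code(code) for code in codes]
print(flags)  # [1, 1, 0, 1, 0, 0]
```

Keeping the rule in one small function means the range of codes is written once instead of as one comparison per code.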

Variable loop inside variable loop

New Python learner here. I am trying to make a program that searches for patterns of words in a string and extracts them into variables. I’m doing this by looping through lists to find particular substrings.
I have encountered a problem that has me a bit stuck and I was wondering if somebody here could help me:
I want to loop through a list of strings inside another loop of strings but I can't seem to work out where to loop through the monthcount variable. My code below:
months = ["Easter '", "December"]
monthcount = 0
datecheck = [['dated ',' and inscribed '],['dated ?','verso'],['dated ','lower right'],["dated "+months[monthcount],'in']]
datedcount = 0
while datedcount < (len(datecheck)):
    if (datecheck[datedcount][0]) in inscription:
        dated = (after(inscription,(datecheck[datedcount][0])))
        if dated.isdigit() == False:
            dated = (before(dated,(datecheck[datedcount][1])))
            dated = dated.strip()
        if dated.isdigit() == True:
            dated_list[lister] = dated
    datedcount = datedcount + 1
Perhaps this is what you are looking for?
datecheck = [['dated ',' and inscribed '],['dated ?','verso'],['dated ','lower right']]
for month in ["Easter '","December"]:
    datecheck.append(['dated {0}'.format(month), 'in'])
datedcount = 0
while datedcount < len(datecheck): ...
In other words, we initialize datecheck with the static members on your list, then append a couple of dynamically-generated ones. Then you can loop over the final list just like before.
There is no nesting of loops here, just two sequential loops, where the first loops over the expressions we want to add.
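Put together, the approach described above might look like the sketch below. The before and after helpers from the question are not shown in the post, so str.split stands in for them here, and find_year is a made-up name. It returns the first all-digit match instead of writing into dated_list, which is a simplification of the original loop.

```python
months = ["Easter '", "December"]

# Static (start, end) marker pairs first, then one generated pair per month.
datecheck = [["dated ", " and inscribed "], ["dated ?", "verso"], ["dated ", "lower right"]]
datecheck.extend([f"dated {month}", "in"] for month in months)


def find_year(inscription, pairs):
    """Return the first all-digit text found after a start marker, or None."""
    for start, end in pairs:
        if start in inscription:
            candidate = inscription.split(start, 1)[1]      # text after the start marker
            if not candidate.isdigit():
                candidate = candidate.split(end, 1)[0].strip()  # text before the end marker
            if candidate.isdigit():
                return candidate
    return None


print(find_year("signed and dated 1932 and inscribed on verso", datecheck))  # 1932
print(find_year("dated December 1920 in pencil", datecheck))                 # 1920
```

Iterating with for start, end in pairs also removes the need for the datedcount counter.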

Add to dictionary in if loop

I have an if loop in which I am trying to:
(1) Create a dataframe from a filepath.
(2) Format this dataframe
(3) Add that dataframe to a dictionary that is a property of an instance of a class.
Here is my code defining the class and the method:
class myClass:
    def __init__(self, name, filepathlist):
        self.name = name
        self.filepathlist = filepathlist

    def formatData(self):
        i = 0
        self.dataframeDict = {}
        if i < (len(self.filepathlist) - 1):
            DFRAW = pd.read_csv(self.filepathlist[i], header = 9) #Row 9 is the row that is not blank (all blank auto-skipped)
            DFRAW['DateTime'], DFRAW['dummycol1'] = DFRAW[' ;W;W;W;W'].str.split(';', 1).str
            DFRAW['Col1'], DFRAW['dummycol2'] = DFRAW['dummycol1'].str.split(';', 1).str
            DFRAW['Col2'], DFRAW['dummycol3'] = DFRAW['dummycol2'].str.split(';', 1).str
            DFRAW['Col3'], DFRAW['Col4'] = DFRAW['dummycol3'].str.split(';', 1).str
            DFRAW = DFRAW.drop([' ;W;W;W;W', 'dummycol1', 'dummycol2', 'dummycol3'], axis = 1)
            dictIndex = self.filepathlist[i][39:44]
            self.dataframeDict.update({dictIndex: DFRAW})
            i = i + 1
Then I create an instance of the class and run the method:
filepathlist = ['filepath1','filepath2']
myINST = myClass('Mydataname', filepathlist)
myINST.formatData()
I then expect myINST.dataframeDict to have two dataframes as per the 2 input filepaths and thus 2 iterations of the if loop. However only 1 is present.
What is the error in my code or my approach?
It is hard to tell whether this will completely solve your problem, because no dummy data is provided. You will, however, get one step closer to your solution if you replace if i < (len(self.filepathlist) - 1): with while i < (len(self.filepathlist) - 1):.
You are currently just checking whether i = 0 is smaller than len(self.filepathlist) - 1. If so, the if block is executed exactly once. What you are actually looking for is a loop that keeps iterating as long as the condition holds, which is what a while loop does. Note that with the - 1 left in the condition, the last file path is still skipped, so the two-file example would again produce only one dataframe.
You need to change your condition to for i in range(len(self.filepathlist)):
(Also, remove the assignment of i as the for loop does it automatically. For the same reason, you should also remove the line which increments i).
If you want to use a while loop, change the if line to while i < len(self.filepathlist):.
Notice that there's no -1. This is because you're using < instead of <=. If you want to use -1, then you also need the <= as this will ensure the loop runs the correct number of times.
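The difference between the if and the loop can be seen in a small sketch that needs no CSV files. Here len(path) is only a stand-in for reading and formatting a file, and the function names are made up for illustration.

```python
def load_with_if(filepathlist):
    """Mirror of the original method: the if condition is evaluated only once."""
    frames = {}
    i = 0
    if i < len(filepathlist) - 1:
        frames[filepathlist[i]] = len(filepathlist[i])
        i = i + 1
    return frames


def load_with_for(filepathlist):
    """Visit every path exactly once; no counter or manual increment needed."""
    frames = {}
    for path in filepathlist:
        frames[path] = len(path)  # stand-in for pd.read_csv plus formatting
    return frames


paths = ["filepath1", "filepath2"]
print(len(load_with_if(paths)))   # 1
print(len(load_with_for(paths)))  # 2
```

Iterating over the list directly also avoids the off-by-one question about < versus <= altogether.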
