How to optimize my code for the Kattis Accounting Question? - python

I am working on the Kattis Accounting problem, but test case 10 fails with "Time Limit Exceeded".
How can I optimize my code to make it run faster?
Here is the problem statement:
Erika the economist studies economic inequality. Her model starts in a
situation where everybody has the same amount of money. After that,
people’s wealth changes in various complicated ways.
Erika needs to run a simulation a large number of times to check if
her model works. The simulation consists of people, each of whom
begins with kroners. Then events happen, of three different types:
An event of type “SET ” means that the th person’s wealth is set to .
An event of type “RESTART ” means that the simulation is restarted,
and everybody’s wealth is set to .
An event of type “PRINT ” reports the current wealth of the th person.
Unfortunately, Erika’s current implementation is very slow; it takes
far too much time to keep track of how much money everybody has. She
decides to use her algorithmic insights to speed up the simulation.
Input The first line includes two integers and , where and . The
following lines each start with a string that is either “SET”,
“RESTART”, or “PRINT”. There is guaranteed to be at least one event of
type “PRINT”.
If the string is “SET” then it is followed by two integers and with
and . If the string is “RESTART” then it is followed by an integer
with . If the string is “PRINT” then it is followed by an integer
with .
Output For each event of type “PRINT”, write the th person’s capital.
Sample Input 1:
3 5
SET 1 7
PRINT 1
PRINT 2
RESTART 33
PRINT 1
Sample Output 1:
7
0
33
Sample Input 2:
5 7
RESTART 5
SET 3 7
PRINT 1
PRINT 2
PRINT 3
PRINT 4
PRINT 5
Sample Output 2:
5
5
7
5
5
# print("Enter 2 numbers")
n, q = map(int, input().split())
# print(n , q)
people = {}
def createPeople(n):
    for i in range(n):
        number = i+1
        people[number] = 0
    return people
def restart(n,new):
    for i in range(n):
        number = i+1
        people[number] = new
    return people
def setPeople(d ,id , number):
    d[id] = number
    return d
    # return d.update({id: number})
def logic(n,dict,q):
    for i in range(q):
        # print("enter Command")
        r = input()
        r = r.split()
        # print("r" ,r)
        if r[0] == "SET":
            # print(people , "People list")
            abc = setPeople(dict, int(r[1]), int(r[2]))
            # print(list)
        elif r[0] == "RESTART":
            abc = restart(n, int(r[1]))
        elif r[0] == "PRINT":
            print(dict[int(r[1])])
    # return abc
people = createPeople(n)
# print(people)
test = logic(n,people,q)

The input is too big to do anything linear per query, such as looping over all of the people and setting their values by hand. With 10^5 queries and 10^6 people, the worst case is resetting over and over again: 10^5 * 10^6 = 10^11 operations.
It is easier to keep a variable that tracks the baseline value after resets. Whenever a reset occurs, dump all entries in the dictionary and set the baseline to the specified value. Any later lookup for a person who isn't in the dictionary returns the most recent baseline value. Now every operation is O(1) (amortized for RESTART, since clearing only removes entries that earlier SET events added), and we can handle 10^5 queries in linear time.
people = {}
baseline = 0
n, q = map(int, input().split())
for _ in range(q):
    command, *args = input().split()
    if command == "SET":
        people[int(args[0])] = int(args[1])
    elif command == "RESTART":
        people.clear()
        baseline = int(args[0])
    elif command == "PRINT":
        print(people.get(int(args[0]), baseline))
As an aside, writing abstractions is great in a real program, but for these tiny code challenges I'd just focus on directly solving the problem. This reduces the potential for confusion with return values like abc that seem to have no clear purpose.
Per PEP 8, use snake_case rather than camelCase for function and variable names in Python.
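If the baseline approach above is still close to the time limit, input and output handling is often the next thing to look at in Python. The following is a minimal sketch, not a verified Kattis submission: it assumes the whole input fits in memory and follows the format in the problem statement, and the function name solve is made up for illustration. It reads all tokens at once and joins the output into one string instead of calling input() and print() per event.

```python
def solve(text):
    """Process the whole input at once and return the output as one string."""
    tokens = text.split()
    q = int(tokens[1])  # tokens[0] is n, which the baseline approach never needs
    people = {}         # only people changed by SET since the last RESTART
    baseline = 0        # wealth of everyone who is not in the dictionary
    out = []
    pos = 2
    for _ in range(q):
        command = tokens[pos]
        if command == "SET":
            people[int(tokens[pos + 1])] = int(tokens[pos + 2])
            pos += 3
        elif command == "RESTART":
            people.clear()
            baseline = int(tokens[pos + 1])
            pos += 2
        else:  # "PRINT"
            out.append(str(people.get(int(tokens[pos + 1]), baseline)))
            pos += 2
    return "\n".join(out)
```

In a submission this would be driven by something like sys.stdout.write(solve(sys.stdin.read())) after importing sys.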

Related

Problem with variables in if-statements. Python

I have an exchange data stream that sets symbol to a random letter of the alphabet roughly every 10 ms, inside an infinite while loop that calls func(pair, time). Here, symbol stands in for the trading pair to keep things simple, and I use A and Z to show the range.
Using the method below, I have to create a lot of if-statements when I want to count i for each letter, i.e. I have to create iA, iB, iC, ..., iZ. In reality, there are about 20 lines of code to execute instead of the simple counter shown here. This is very messy.
I am a beginner and am stuck finding a more elegant, and perhaps computationally faster, way to do this.
def func(symbol, cur_time):
    if future_timeA > cur_timeA and symbol == "A":
        iA += iA
        return -1
    if future_timeA < cur_timeA and symbol == "A":
        future_timeA = cur_timeA + 1
        valueA = iA
        return valueA
    # ... the same pair of branches is repeated for every letter in between ...
    if future_timeZ > cur_timeZ and symbol == "Z":
        iZ += iZ
        return -1
    if future_timeZ < cur_timeZ and symbol == "Z":
        future_timeZ = cur_timeZ + 1
        valueZ = iZ
        return valueZ
Since you need to handle each of the 26 letters, you will need at least one branch per letter. The match statement, available from Python 3.10, keeps that readable:
match symbol:
    case "A":
        pass  # do whatever for A
    case "B":
        pass  # do whatever for B
    # ... one case per letter ...
    case "Z":
        pass  # do whatever for Z
Instead of symbol, use a number between 0 and 25 to represent each possible letter. That way you can keep an array with 26 entries holding the count for every letter, and handle all letters with one piece of code (or a loop) instead of 26 copies.
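A rough sketch of that indexed approach, under a few assumptions: the symbols are single uppercase letters A to Z, the counter in the question was meant to grow by one per call (so iA += iA is read here as iA += 1), and every future time starts at 0. The names counts, future_time and symbol_index are invented for this illustration.

```python
NUM_SYMBOLS = 26

# One slot per letter replaces the separate iA..iZ and future_timeA..future_timeZ variables.
counts = [0] * NUM_SYMBOLS
future_time = [0.0] * NUM_SYMBOLS


def symbol_index(symbol):
    """Map "A".."Z" to 0..25."""
    return ord(symbol) - ord("A")


def func(symbol, cur_time):
    k = symbol_index(symbol)
    if future_time[k] > cur_time:
        # Still inside the current window: count the event and signal "no value yet".
        counts[k] += 1
        return -1
    if future_time[k] < cur_time:
        # Window has passed: open a new one and report the count so far.
        future_time[k] = cur_time + 1
        return counts[k]
    # Equal times fall through, as in the question's code.
    return None
```

If the real symbols are trading pairs rather than letters, a dictionary keyed by the pair name would play the same role as the two lists.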
Hope that helps you further!
Python 3.10 introduced the match statement for this kind of problem; see this answer.

What is the def paradox_stats() function in my code?

I am trying to learn how to write a function that tests the probability that two people in a room share a birthday.
The birthday paradox says that the probability that two people in a room will have the same birthday is more than half, provided n, the number of people in the room, is more than 23. This property is not really a paradox, but many people find it surprising. Design a Python program that can test this paradox by a series of experiments on randomly generated birthdays, which test this paradox for n = 5,10,15,20,... ,100.
Here is the code shown in my book.
import random
def test_birthday_paradox(num_people):
    birthdays = [random.randrange(0,365) for _ in range(num_people)]
    birthday_set = set()
    for bday in birthdays:
        if bday in birthday_set: return True
        else: birthday_set.add(bday)
    return False
def paradox_stats(num_people = 23, num_trials = 100):
    num_successes = 0
    for _ in range(num_trials):
        if test_birthday_paradox(num_people): num_successes += 1
    return num_successes/num_trials
paradox_stats(31)
0.77
I can't understand the code from def paradox_stats to the end.
Can someone help me, please?
Assuming that paradox_state(31) is a typo and you meant paradox_stats(31):
def paradox_stats(num_people = 23, num_trials = 100): defines the function. It takes two parameters, both optional because they have default values.
num_successes = 0 initializes the variable num_successes to zero.
for _ in range(num_trials):
    if test_birthday_paradox(num_people): num_successes += 1
return num_successes/num_trials
Here the code loops num_trials times; the caller can set that number when calling the function (remember that it is an optional parameter).
In each iteration the code calls the earlier function test_birthday_paradox (which I assume you understand, given your question) to check whether any two people in the room share a birthday. If it returns True, num_successes is increased by one (that is what += does: num_successes += 1 is equivalent to num_successes = num_successes + 1).
Once the loop is finished, paradox_stats returns the estimated probability for the random sample: the number of successes divided by the number of trials.
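To see how the estimate described above behaves across the room sizes in the exercise (n = 5, 10, ..., 100), here is a small sketch that runs the same kind of experiment with a seeded random generator and puts the exact probability next to it. The helper names (has_shared_birthday, estimate, exact_probability, run_experiment) are not from the book, and the model assumes 365 equally likely birthdays.

```python
import random


def has_shared_birthday(num_people, rng):
    """Draw birthdays one by one and stop at the first repeat."""
    seen = set()
    for _ in range(num_people):
        bday = rng.randrange(365)
        if bday in seen:
            return True
        seen.add(bday)
    return False


def estimate(num_people, num_trials, rng):
    """Fraction of trials in which at least two people share a birthday."""
    hits = sum(has_shared_birthday(num_people, rng) for _ in range(num_trials))
    return hits / num_trials


def exact_probability(num_people):
    """1 minus the probability that all birthdays are distinct."""
    p_all_distinct = 1.0
    for k in range(num_people):
        p_all_distinct *= (365 - k) / 365
    return 1.0 - p_all_distinct


def run_experiment(num_trials=1000, seed=0):
    """Return (n, simulated, exact) for n = 5, 10, ..., 100."""
    rng = random.Random(seed)
    return [(n, estimate(n, num_trials, rng), exact_probability(n))
            for n in range(5, 101, 5)]
```

Printing the rows of run_experiment() should show the simulated values tracking the exact ones more closely as num_trials grows, which is why 100 trials can give a noisy figure.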
Hope my answer can help you.

Python: replace for loop with function

Can anyone help me understand how I would create a function with def whatever() instead of using a for loop? I'm trying to do things more Pythonically but don't really understand how to apply a function well instead of a loop. For instance, I have a loop below that works well and gives the output I would like. Is there a way to do this with a function?
seasons = leaguesFinal['season'].unique()
teams = teamsDF['team_long_name'].unique()
df = []
for i in seasons:
    season = leaguesFinal['season'] == i
    season = leaguesFinal[season]
    for j in teams:
        team_season_wins = season['win'] == j
        team_season_win_record = team_season_wins[team_season_wins].count()
        team_season_loss = season['loss'] == j
        team_season_loss_record = team_season_loss[team_season_loss].count()
        df.append((j, i, team_season_win_record, team_season_loss_record))
df = pd.DataFrame(df, columns=('Team', 'Seasons', 'Wins', 'Losses'))
The output looks as follows:
   Team               Seasons    Wins  Losses
0  KRC Genk           2008/2009  15    14
1  Beerschot AC       2008/2009  11    14
2  SV Zulte-Waregem   2008/2009  16    11
3  Sporting Lokeren   2008/2009  13    9
4  KSV Cercle Brugge  2008/2009  14    15
Solution
def some_loop(something, something_else):
    for i in something:
        season = leaguesFinal['season'] == i
        season = leaguesFinal[season]
        for j in something_else:
            team_season_wins = season['win'] == j
            team_season_win_record = team_season_wins[team_season_wins].count()
            team_season_loss = season['loss'] == j
            team_season_loss_record = team_season_loss[team_season_loss].count()
            df.append((j, i, team_season_win_record, team_season_loss_record))
some_loop(seasons, teams)
Comments
This is what you are describing: creating a function out of the for loop. You still have a for loop, but it now lives in a function that you can call from different parts of your code without repeating the whole loop.
All there is to do is define a function that accepts two parameters for this particular loop, which would be def some_loop(something, something_else). I used generic names so you can see more clearly what is taking place.
Then you replace all the instances of seasons and teams inside the loop with those parameters.
Now, when you call the function, every occurrence of something and something_else is replaced with whatever arguments you pass in.
Also, I am not completely sure what the statements of the form x = y == i accomplish, or whether they are even valid.
Actually, you're mixing things up: functions just group lines of code so they can be reused without writing everything again, whereas for loops are for iteration.
In your example, a function would just contain the for loop and return the resulting DataFrame, which you could then use. But it will not change anything or make your code smarter.

SQLAlchemy performance when iterating queries millions of times

I'm writing a disease simulation in Python, using SQLAlchemy, but I'm hitting some performance issues when running queries on a SQLite file I create earlier in the simulation.
The code is below. There are more queries in the outer for loop, but what I've posted is what slowed it down to a crawl. There are 365 days, about 76,200 mosquitos, and each mosquito makes 5 contacts per day, bringing it to about 381,000 queries per simulated day, and 139,065,000 through the entire simulation (and that's just for the mosquitos). It goes along at about 2 days / hour which, if I'm calculating correctly, is about 212 queries per second.
Do you see any issues that could be fixed that could speed things up? I've experimented with indexing the fields which are used in selection but that didn't seem to change anything. If you need to see the full code, it's available here on GitHub. The function begins on line 399.
Thanks so much, in advance.
# Run mosquito-human interactions
for d in range(days_to_run):
    # ... much more code before this, but it ran reasonably fast
    vectors = session.query(Vectors).yield_per(1000) #grab each vector..
    for m in vectors:
        i = 0
        while i < biting_rate:
            pid = random.randint(1, number_humans) # Pick a human to bite
            contact = session.query(Humans).filter(Humans.id == pid).first() #Select the randomly-chosen human from SQLite table
            if contact: # If the random id equals an ID in the table
                if contact.susceptible == 'True' and m.infected == 'True' and random.uniform(0, 1) < beta: # if the human is susceptible and mosquito is infected, infect the human
                    contact.susceptible = 'False'
                    contact.exposed = 'True'
                elif contact.infected == 'True' and m.susceptible == 'True': # otherwise, if the mosquito is susceptible and the human is infected, infect the mosquito
                    m.susceptible = 'False'
                    m.infected = 'True'
                    nInfectedVectors += 1
                    nSuscVectors += 1
            i += 1
    session.commit()

How to speed up Python string matching code

I have this code which computes the Longest Common Subsequence between random strings to see how accurately one can reconstruct an unknown region of the input. To get good statistics I need to iterate it many times but my current python implementation is far too slow. Even using pypy it currently takes 21 seconds to run once and I would ideally like to run it 100s of times.
#!/usr/bin/python
import random
import itertools
#test to see how many different unknowns are compatible with a set of LCS answers.
def lcs(x, y):
    n = len(x)
    m = len(y)
    # table is the dynamic programming table: (n+1) rows by (m+1) columns
    table = [list(itertools.repeat(0, m+1)) for _ in xrange(n+1)]
    for i in range(n+1): # i=0,1,...,n
        for j in range(m+1): # j=0,1,...,m
            if i == 0 or j == 0:
                table[i][j] = 0
            elif x[i-1] == y[j-1]:
                table[i][j] = table[i-1][j-1] + 1
            else:
                table[i][j] = max(table[i-1][j], table[i][j-1])
    # Now, table[n][m] is the length of LCS of x and y.
    return table[n][m]
def lcses(pattern, text):
    return [lcs(pattern, text[i:i+2*l]) for i in xrange(0,l)]
l = 15
#Create the pattern
pattern = [random.choice('01') for i in xrange(2*l)]
#create text start and end and unknown.
start = [random.choice('01') for i in xrange(l)]
end = [random.choice('01') for i in xrange(l)]
unknown = [random.choice('01') for i in xrange(l)]
lcslist= lcses(pattern, start+unknown+end)
count = 0
for test in itertools.product('01',repeat = l):
    test=list(test)
    testlist = lcses(pattern, start+test+end)
    if (testlist == lcslist):
        count += 1
print count
I tried converting it to numpy but I must have done it badly as it actually ran more slowly. Can this code be sped up a lot somehow?
Update. Following a comment below, it would be better if lcses used a recurrence directly which gave the LCS between pattern and all sublists of text of the same length. Is it possible to modify the classic dynamic programming LCS algorithm somehow to do this?
The recurrence table table is being recomputed 15 times on every call to lcses() when it is only dependent upon m and n where m has a maximum value of 2*l and n is at most 3*l.
If your program computed table only once, it would be dynamic programming, which it currently is not. A Python idiom for this would be:
table = None
def use_lcs_table(m, n, l):
    global table
    if table is None:
        # Build the table once; this assumes a variant of lcs() that returns the whole table
        table = lcs(2*l, 3*l)
    return table[m][n]
Using a class instance would be cleaner and more extensible than a global table declaration, but this gives you an idea of why it is taking so much time.
Added in reply to comment:
Dynamic Programming is an optimization that requires a trade-off of extra space for less time. In your example you appear to be doing a table pre-computation in lcs() but you build the whole list on every single call and then throw it away. I don't claim to understand the algorithm you are trying to implement, but the way you have it coded, it either:
Has no recurrence relation, thus no grounds for DP optimization, or
Has a recurrence relation, the implementation of which you bungled.
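Separately from the caching idea above, one low-risk way to cut the cost of each lcs() call is to keep only the previous row of the table, since each cell depends only on the current and previous rows. This is a generic sketch in Python 3, not the question's code; the name lcs_length is made up here, and it does nothing about the much larger cost of enumerating all 2^15 candidate strings.

```python
def lcs_length(x, y):
    """Length of the longest common subsequence, using two rows instead of a full table."""
    previous = [0] * (len(y) + 1)  # row for the prefix of x processed so far
    for xi in x:
        current = [0]  # column 0: LCS with the empty prefix of y is 0
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]
```

It accepts strings or lists, so it could be tried as a drop-in replacement for lcs(pattern, text[i:i+2*l]) in lcses().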
