How should I break down this huge function into smaller parts - python

I am trying to understand good design patterns in Python and I cannot think of a way to break this huge function into smaller parts without making the code cluttered, overly complex or plain ugly.
I didn't want to clutter my question by posting the whole file; this function by itself is already very large. The class has only two methods: parse_midi() and generate_midi(file_name, file_length).
pitches, velocities, deltas, durations, and intervals are all MarkovChain objects. MarkovChain is a simple class with the methods add_event(event), generate_markov_dictionary(), and get_next_event(previous_event). MarkovChain.src_events is a list of events to generate the Markov chain from. It is a simple implementation of first-order Markov chains.
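(For reference, a minimal sketch of a class with that interface might look like the following. This is only an illustration built from the description above, not the actual MarkovChain class from the project.)

import random
from collections import defaultdict

class MarkovChain:
    """Minimal first-order Markov chain over a sequence of events (sketch)."""
    def __init__(self):
        self.src_events = []    # raw event sequence the chain is built from
        self.transitions = {}   # event -> list of events observed right after it

    def add_event(self, event):
        self.src_events.append(event)

    def generate_markov_dictionary(self):
        self.transitions = defaultdict(list)
        for current, following in zip(self.src_events, self.src_events[1:]):
            self.transitions[current].append(following)

    def get_next_event(self, previous_event):
        followers = self.transitions.get(previous_event)
        if not followers:  # unseen state: fall back to a random known event
            return random.choice(self.src_events)
        return random.choice(followers)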
def parse_midi(self):
    # the on_notes dictionary holds note_on events until the corresponding note_off event is encountered
    on_notes = {}
    time = 0
    previous_pitch = -1
    tempos = []
    delta = 0
    for message in self.track_in:
        time += message.time
        delta += message.time
        # There are also MetaMessages in a midi file, such as comments, track names, etc.
        # We just ignore them
        if isinstance(message, mido.Message) and message.type in ["note_on", "note_off"]:
            # some midi files use note_on events with 0 velocity instead of note_off events,
            # so we check if velocity > 0
            if message.velocity > 0 and message.type == "note_on":
                on_notes[message.note] = time
                self.pitches.add_event(message.note)
                self.velocities.add_event(message.velocity)
                self.deltas.add_event(delta)
                delta = 0
                if previous_pitch == -1:
                    self.intervals.add_event(0)
                else:
                    self.intervals.add_event(message.note - previous_pitch)
            else:
                # KeyError means a note_off came without a prior associated note_on event.
                # Just ignore them
                with ignored(KeyError):
                    self.durations.add_event(time - on_notes[message.note])
                    del on_notes[message.note]
            previous_pitch = message.note
        # There might be many tempo changes in a midi file, so we store them all
        # and later calculate an average tempo
        elif message.type == "set_tempo":
            tempos.append(message.tempo)
        elif message.type == "time_signature":
            self.time_signature = self.TimeSignature(message.numerator, message.denominator,
                                                     message.clocks_per_click,
                                                     message.notated_32nd_notes_per_beat)
    # some tracks in a midi file might be empty. For example, they might only contain
    # meta events such as the track name, and no note events
    if len(self.pitches.src_events) == 0:
        print("There are no note events in track {}!\n"
              "The file has {} tracks. Please try another one.".format(self.selected_track, self.num_tracks))
        exit(1)
    # a midi file might not contain tempo information at all. If it does, we calculate the average;
    # else we just assign a default tempo of 120 bpm
    try:
        self.average_tempo = int(sum(tempos) / len(tempos))
    except ZeroDivisionError:
        self.average_tempo = mido.bpm2tempo(120)

It turns out there is not much to refactor in this method; however, the best attempt at answering this question can be found here
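That said, one possible decomposition, sketched here purely as an illustration (it assumes the attributes named in the question, omits the empty-track check, and is not the linked answer), is to keep the loop as a dispatcher and pull each kind of event handling into a small private method:

def parse_midi(self):
    on_notes = {}
    tempos = []
    time = delta = 0
    previous_pitch = -1
    for message in self.track_in:
        time += message.time
        delta += message.time
        if isinstance(message, mido.Message) and message.type in ("note_on", "note_off"):
            if message.type == "note_on" and message.velocity > 0:
                on_notes[message.note] = time
                self._record_note_on(message, delta, previous_pitch)
                delta = 0
            else:
                self._record_note_off(message, time, on_notes)
            previous_pitch = message.note
        elif message.type == "set_tempo":
            tempos.append(message.tempo)
        elif message.type == "time_signature":
            self.time_signature = self.TimeSignature(message.numerator, message.denominator,
                                                     message.clocks_per_click,
                                                     message.notated_32nd_notes_per_beat)
    self._set_average_tempo(tempos)

def _record_note_on(self, message, delta, previous_pitch):
    # one Markov chain per musical dimension, as in the question
    self.pitches.add_event(message.note)
    self.velocities.add_event(message.velocity)
    self.deltas.add_event(delta)
    self.intervals.add_event(0 if previous_pitch == -1 else message.note - previous_pitch)

def _record_note_off(self, message, time, on_notes):
    # a note_off without a matching note_on is simply ignored
    with ignored(KeyError):
        self.durations.add_event(time - on_notes[message.note])
        del on_notes[message.note]

def _set_average_tempo(self, tempos):
    try:
        self.average_tempo = int(sum(tempos) / len(tempos))
    except ZeroDivisionError:
        self.average_tempo = mido.bpm2tempo(120)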

Related

Computer decision mechanism

I have 5 players who throw dice. We cannot use any external input such as an onclick action or similar.
How do I make the computer decide whether it is good to stop throwing? The stopping criterion is failing to throw a 1, a 5, a straight, triples or higher. Everything counts as points, and if you hit something, let's say triple sixes, you can now decide whether to throw again, but without those three dice. If you fail to hit anything on the next throw, you lose every point you've got in that section. Or you can keep the sixes, which gives you 600 points.
from random import randint

def game(length, output=True):
    round_no = 0
    avg_pts = 0
    player_points = [0, 0, 0, 0, 0]
    while output:
        round_no += 1
        for i in range(5):
            number_of_throws = 5   # each player starts their turn with five dice
            lock = True            # stays True until the player decides (or is forced) to stop
            while lock:
                dice_throw(number_of_throws)

def dice_throw(number_of_throws):
    throw_values = []
    for i in range(number_of_throws):
        throw_values.append(randint(1, 6))
    throw_values.sort()
    for value in throw_values:
        pass  # scoring logic goes here
How can I make a mechanism that decides whether it's good to continue throwing or not?
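One simple way to make this decision, sketched here under a simplified scoring assumption (only single 1s and 5s score, worth 100 and 50 points; the names and thresholds are illustrative, not from the question), is to compare the points currently at risk with the expected gain of another throw:

from random import randint

SINGLE_SCORES = {1: 100, 5: 50}  # simplified scoring assumption for this sketch

def expected_gain(dice_left, trials=10000):
    """Estimate by simulation the average score of one throw with dice_left dice,
    and the probability that the throw scores nothing (a bust)."""
    total, busts = 0, 0
    for _ in range(trials):
        throw = [randint(1, 6) for _ in range(dice_left)]
        score = sum(SINGLE_SCORES.get(v, 0) for v in throw)
        if score == 0:
            busts += 1
        total += score
    return total / trials, busts / trials

def should_keep_throwing(points_at_risk, dice_left):
    """Keep throwing only if the expected change in points is positive."""
    gain, bust_probability = expected_gain(dice_left)
    return gain - bust_probability * points_at_risk > 0

# Example: with 300 points at risk and only 2 dice left, it is usually better to stop.
print(should_keep_throwing(300, 2))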

Stuff isn't appending to my list

I'm trying to create a simulation where there are two printers and I find the average wait time for each. I'm using a class for the printer and task in my program. Basically, I'm adding the wait time from each simulation to a list and calculating the average time. My issue is that I'm getting a division by 0 error, so nothing is being appended. When I try it with 1 printer (which is essentially the same thing) I have no issues. Here is the code I have for the second printer. I'm using a queue for this.
if printers == 2:
    for currentSecond in range(numSeconds):
        if newPrintTask():
            task = Task(currentSecond, minSize, maxSize)
            printQueue.enqueue(task)
        if (not labPrinter1.busy()) and (not labPrinter2.busy()) and \
                (not printQueue.is_empty()):
            nexttask = printQueue.dequeue()
            waitingtimes.append(nexttask.waitTime(currentSecond))
            labPrinter1.startNext(nexttask)
        elif (not labPrinter1.busy()) and (labPrinter2.busy()) and \
                (not printQueue.is_empty()):
            nexttask = printQueue.dequeue()
            waitingtimes.append(nexttask.waitTime(currentSecond))
            labPrinter1.startNext(nexttask)
        elif (not labPrinter2.busy()) and (labPrinter1.busy()) and \
                (not printQueue.is_empty()):
            nexttask = printQueue.dequeue()
            waitingtimes.append(nexttask.waitTime(currentSecond))
            labPrinter2.startNext(nexttask)
        labPrinter1.tick()
        labPrinter2.tick()
        averageWait = sum(waitingtimes) / len(waitingtimes)
        outfile.write("Average Wait %6.2f secs %3d tasks remaining."
                      % (averageWait, printQueue.size()))
Any assistance would be great!
Edit: I should mention that this happens no matter the values. I could have a page range of 99-100 and a PPM of 1, yet I still get division by 0.
I think your problem stems from an empty waitingtimes on the first iteration or so. If there is no print job in the queue, and there has never been a waiting time inserted, you are going to reach the bottom of the loop with waitingtimes==[] (empty), and then do:
sum(waitingtimes) / len(waitingtimes)
Which will be
sum([]) / len([])
Which is
0 / 0
The easiest way to deal with this would just be to check for it, or catch it:
if not waitingtimes:
    averageWait = 0
else:
    averageWait = sum(waitingtimes) / len(waitingtimes)
Or:
try:
    averageWait = sum(waitingtimes) / len(waitingtimes)
except ZeroDivisionError:
    averageWait = 0

SQLAlchemy performance when iterating queries millions of times

I'm writing a disease simulation in Python, using SQLAlchemy, but I'm hitting some performance issues when running queries on a SQLite file I create earlier in the simulation.
The code is below. There are more queries in the outer for loop, but what I've posted is what slowed it down to a crawl. There are 365 days, about 76,200 mosquitos, and each mosquito makes 5 contacts per day, bringing it to about 381,000 queries per simulated day, and 27,813,000 through the entire simulation (and that's just for the mosquitos). It goes along at about 2 days / hour which, if I'm calculating correctly, is about 212 queries per second.
Do you see any issues that could be fixed that could speed things up? I've experimented with indexing the fields which are used in selection but that didn't seem to change anything. If you need to see the full code, it's available here on GitHub. The function begins on line 399.
Thanks so much, in advance.
# Run mosquito-human interactions
for d in range(days_to_run):
    # ... much more code before this, but it ran reasonably fast
    vectors = session.query(Vectors).yield_per(1000)  # grab each vector
    for m in vectors:
        i = 0
        while i < biting_rate:
            pid = random.randint(1, number_humans)  # pick a human to bite
            # select the randomly-chosen human from the SQLite table
            contact = session.query(Humans).filter(Humans.id == pid).first()
            if contact:  # if the random id equals an ID in the table
                # if the human is susceptible and the mosquito is infected, infect the human
                if contact.susceptible == 'True' and m.infected == 'True' and random.uniform(0, 1) < beta:
                    contact.susceptible = 'False'
                    contact.exposed = 'True'
                # otherwise, if the mosquito is susceptible and the human is infected, infect the mosquito
                elif contact.infected == 'True' and m.susceptible == 'True':
                    m.susceptible = 'False'
                    m.infected = 'True'
                    nInfectedVectors += 1
                    nSuscVectors += 1
            i += 1
    session.commit()
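One common way to attack this, sketched here as an illustration only (it assumes the Humans table fits in memory, that ids run from 1 to number_humans, and it omits the counter updates), is to stop issuing one SELECT per bite and instead load the humans once per day, picking random victims from a local dict. The ORM objects stay attached to the session, so attribute changes are still flushed on commit:

# Sketch: cache Humans rows in a dict keyed by id, so each bite is a dict lookup
# instead of a round trip to SQLite.
for d in range(days_to_run):
    humans_by_id = {h.id: h for h in session.query(Humans).all()}
    vectors = session.query(Vectors).yield_per(1000)
    for m in vectors:
        for _ in range(biting_rate):
            contact = humans_by_id.get(random.randint(1, number_humans))
            if contact is None:
                continue
            if contact.susceptible == 'True' and m.infected == 'True' and random.uniform(0, 1) < beta:
                contact.susceptible = 'False'
                contact.exposed = 'True'
            elif contact.infected == 'True' and m.susceptible == 'True':
                m.susceptible = 'False'
                m.infected = 'True'
    session.commit()

Storing the susceptible/infected flags as real booleans rather than the strings 'True'/'False' would also make the comparisons and any indexes on those columns cheaper.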

Tracking how many elements processed in generator

I have a problem in which I process documents from files using Python generators. The number of files I need to process is not known in advance. Each file contains records which consume a considerable amount of memory. Because of that, generators are used to process the records. Here is a summary of the code I am working on:
def process_all_records(files):
    for f in files:
        fd = open(f, 'r')
        recs = read_records(fd)
        recs_p = (process_records(r) for r in recs)
        write_records(recs_p)
My process_records function checks the content of each record and only returns the records which have a specific sender. My problem is the following: I want to have a count of the number of elements returned by read_records. I have been keeping track of the number of records in the process_records function using a list:
def process_records(r):
    if r.sender('sender_of_interest'):
        records_list.append(1)
    else:
        records_list.append(0)
    ...
The problem with this approach is that records_list could grow without bounds depending on the input. I want to be able to consume the contents of records_list once it grows to a certain point and then restart the process. For example, after 20 records have been processed, I want to find out how many records are from 'sender_of_interest' and how many are from other sources, and then empty the list. Can I do this without using a lock?
You could make your generator a class with an attribute that contains a count of the number of records it has processed. Something like this:
class RecordProcessor(object):
    def __init__(self, recs):
        self.recs = recs
        self.processed_rec_count = 0

    def __iter__(self):
        for r in self.recs:
            if r.sender('sender_of_interest'):
                self.processed_rec_count += 1
                # process record r...
                yield r  # processed record

def process_all_records(files):
    for f in files:
        fd = open(f, 'r')
        recs_p = RecordProcessor(read_records(fd))
        write_records(recs_p)
        print('records processed:', recs_p.processed_rec_count)
Here's the straightforward approach. Is there some reason why something this simple won't work for you?
seen = 0
matched = 0

def process_records(r):
    global seen, matched
    seen = seen + 1
    if r.sender('sender_of_interest'):
        matched = matched + 1
        records_list.append(1)
    else:
        records_list.append(0)
    if seen > 1000 or someOtherTimeBasedCriteria:
        print("%d of %d total records had the sender of interest" % (matched, seen))
        seen = 0
        matched = 0
If you have the ability to close your stream of messages and re-open them, you might want one more total seen variable, so that if you had to close that stream and re-open it later, you could go to the last record you processed and pick up there.
In this code "someOtherTimeBasedCriteria" might be a timestamp. You can record the current time when you begin processing, and then, if the current time is more than 20,000 ms (20 seconds) later, reset the seen/matched counters.
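A minimal sketch of that time-based variant (using time.monotonic() and a 20-second window; the names are illustrative, not from the original code):

import time

WINDOW_SECONDS = 20
window_start = time.monotonic()
seen = 0
matched = 0

def process_records(r):
    global window_start, seen, matched
    seen += 1
    if r.sender('sender_of_interest'):
        matched += 1
    # report and reset after 1000 records or after 20 seconds, whichever comes first
    if seen > 1000 or time.monotonic() - window_start > WINDOW_SECONDS:
        print("%d of %d records had the sender of interest" % (matched, seen))
        seen = 0
        matched = 0
        window_start = time.monotonic()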

Implementing an Anti Spamming thing?

I have an IRC bot that I made for automating stuff.
Here's a snippet of it:
def analyseIRCText(connection, event):
    global adminList, userList, commandPat, flood
    userName = extractUserName(event.source())
    userCommand = event.arguments()[0]
    escapedChannel = cleanUserCommand(config.channel).replace('\\.', '\\\\.')
    escapedUserCommand = cleanUserCommand(event.arguments()[0])
    #print userName, userCommand, escapedChannel, escapedUserCommand
    if flood.has_key(userName):
        flood[userName] += 1
    else:
        flood[userName] = 1
    ... (if flood[userName] > certain number do...)
So the idea is that flood is a dictionary that keeps track of the users who have sent the bot a command within some recent time window, and how many times they've done so within that period.
Here's where I run into trouble. There has to be SOMETHING that resets this dictionary so that the users can say stuff again every once in a while, no? I think that a little thing like this would do the trick.
def floodClear():
    global flood
    while 1:
        flood = {}  # Clear the list
        time.sleep(4)
But what would be the best way to do this?
At the end of the program, I have a little line called:
thread.start_new_thread(floodClear,())
so that this thing runs in its own thread and doesn't get stuck in an infinite loop that halts everything else. Would this be a good solution, or is there something better that I could do?
Your logic should be enough. If you have, say:
if flood.has_key(userName):
    flood[userName] += 1
else:
    flood[userName] = 1
if flood[userName] > 8:  # or whatever threshold you pick
    return 0
That should make your bot ignore the user if he has spammed too many times within your given time period. What you have there should also work to clear up your flood dictionary.
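If you want to avoid the separate clearing thread entirely, another option (a sketch only, assuming the same 4-second window; not taken from the original bot) is to store a timestamp per user and reset that user's count lazily the next time they speak:

import time

FLOOD_WINDOW = 4   # seconds, as in the question's floodClear
FLOOD_LIMIT = 8    # max commands allowed per window

flood = {}  # userName -> (count, window_start_time)

def register_command(userName):
    """Return True if the user is flooding and should be ignored."""
    now = time.monotonic()
    count, started = flood.get(userName, (0, now))
    if now - started > FLOOD_WINDOW:
        count, started = 0, now  # window expired: start a fresh one
    count += 1
    flood[userName] = (count, started)
    return count > FLOOD_LIMIT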
