Schedule times overwritten with Schedule module when called via array in Python

I'm attempting to build an alarm application but I'm struggling to get the 'schedule' module to function how I'd like it to. The problem is that I can't seem to schedule multiple alarms for one day while calling the day attribute via an array.
Example of how you'd normally schedule multiple times for one day:
schedule.every().sunday.at('17:25').do(job)
schedule.every().sunday.at('17:30').do(job)
schedule.every().sunday.at('17:35').do(job)
This works fine, but I really want to load times with a for loop so I don't have a giant if statement, and so that I can load times dynamically:
dayArray = [
    schedule.every().sunday,
    schedule.every().monday,
    schedule.every().tuesday,
    schedule.every().wednesday,
    schedule.every().thursday,
    schedule.every().friday,
    schedule.every().saturday
]
for i in range(1, xlsxAlarmSheet.ncols):
    for j in range(1, 8):
        if(str(xlsxAlarmSheet.cell_value(j, i)) != '0'):
            dayArray[j - 1].at(str(xlsxAlarmSheet.cell_value(j, i))[:2] + ':' + str(xlsxAlarmSheet.cell_value(j, i))[2:]).do(job)
The days are being loaded from an array and the times from an xlsx file via the XLRD module. The only problem is the alarms are overwriting each other somehow when I schedule multiple times for one day. If I schedule 3 times for Sunday with this method for example, only the third scheduled time fires off. I thought it must be because when I load the days into an array they are no longer unique somehow, so I tried doing a 2-dimensional array:
dayArray = [[
    schedule.every().sunday,
    schedule.every().monday,
    schedule.every().tuesday,
    schedule.every().wednesday,
    schedule.every().thursday,
    schedule.every().friday,
    schedule.every().saturday
]] * (xlsxAlarmSheet.ncols - 1)
for i in range(1, xlsxAlarmSheet.ncols):
    for j in range(1, 8):
        if(str(xlsxAlarmSheet.cell_value(j, i)) != '0'):
            dayArray[i - 1][j - 1].at(str(xlsxAlarmSheet.cell_value(j, i))[:2] + ':' + str(xlsxAlarmSheet.cell_value(j, i))[2:]).do(job)
With no luck... the times are still overwriting each other, any ideas?

Disclaimer: 0 Python experience, only JavaScript.... But...
Try to not call the function within the array of objects like that:
dayArray = [[
schedule.every().sunday,
...
Instead just have the name of the day (the only part which is varying)
dayArray = [[
'sunday', 'monday', ...
Then in the for each use that string name when you build the function
for each .... { schedule.every()[dayArray[i]].at(...).do(...) }
My random guess is that it's somehow getting called incorrectly when stored that way, just store the part that is different (the day name), since you can just call the rest of that function in the loop (since it's the same for all).
Hopefully that makes sense. No idea if it will work, just something to try. Good luck.
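In Python, the bracket lookup suggested above would be spelled with getattr. The sketch below assumes the overwriting happens because every stored schedule.every().sunday entry is one job object that gets reconfigured on each .at() call, so it calls every() afresh for each alarm. FakeJob is a hypothetical stand-in that only mimics the fluent chain so the example runs without the schedule package installed, and register_alarms is an illustrative helper name, not part of any library.

```python
# Sketch: build a fresh job per alarm instead of reusing one stored job object.
# FakeJob is a hypothetical stand-in for the object returned by schedule.every();
# with the real library you would pass schedule.every as the `every` argument.

DAY_NAMES = ['sunday', 'monday', 'tuesday', 'wednesday',
             'thursday', 'friday', 'saturday']


class FakeJob:
    """Mimics the every().<day>.at(time).do(func) chain and records the result."""
    registered = []  # (day, time) pairs recorded by do()

    def __init__(self):
        self.day = None
        self.time = None

    def __getattr__(self, name):
        # Only reached for attributes that are not found normally (the day names).
        if name in DAY_NAMES:
            self.day = name
            return self
        raise AttributeError(name)

    def at(self, time_str):
        self.time = time_str
        return self

    def do(self, func):
        FakeJob.registered.append((self.day, self.time))
        return self


def register_alarms(every, alarms, job):
    """Call every() once per alarm so no two alarms share a job object."""
    for day_name, time_str in alarms:
        getattr(every(), day_name).at(time_str).do(job)


alarms = [('sunday', '17:25'), ('sunday', '17:30'), ('sunday', '17:35')]
register_alarms(FakeJob, alarms, job=lambda: None)
print(FakeJob.registered)
# [('sunday', '17:25'), ('sunday', '17:30'), ('sunday', '17:35')]
```

All three Sunday times survive because each one is attached to its own job object. This also avoids building source strings for eval().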

I think you may need to use an index to store your values. This link might help.
https://treyhunner.com/2016/04/how-to-loop-with-indexes-in-python/#What_if_we_need_indexes?

In another question I posted, I was originally trying to substitute the day attribute with a string, like Vig is suggesting with his answer to this post. This wasn't working for me, so I ended up storing objects in an array, as Prune suggested on my original question.
However, Prune also posted a link to an example (example 7) in which the entire call to schedule a time was stored in a string and then called via eval(), which seems to work.
So this is what I ended up doing:
dayArray = ['sunday', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday']
dayTimeArray = []
for i in range(1, xlsxAlarmSheet.ncols):
    for j in range(1, 8):
        if(str(xlsxAlarmSheet.cell_value(j, i)) != '0'):
            dayTimeArray.append(
                "schedule.every().{}.at('{}').do(StartSubProcess)".format(
                    dayArray[j - 1],
                    str(xlsxAlarmSheet.cell_value(j, i))[:2] + ':' + str(xlsxAlarmSheet.cell_value(j, i))[2:]
                )
            )
for i in range(0, len(dayTimeArray)):
    eval(dayTimeArray[i])

Related

How to traverse dictionary keys in sorted order

I am reading a cfg file, and receive a dictionary for each section. So, for example:
Config-File:
[General]
parameter1="Param1"
parameter2="Param2"
[FileList]
file001="file1.txt"
file002="file2.txt" ......
I have the FileList section stored in a dictionary called section. In this example, I can access "file1.txt" as test = section["file001"], so test == "file1.txt". To access every file of FileList one after the other, I could try the following:
for i in range(1, (number_of_files + 1)):
    access_key = str("file00" + str(i))
    print(section[access_key])
This is my current solution, but I don't like it at all. First of all, it looks kind of messy in python, but I will also face problems when more than 9 files are listed in the config.
I could also do it like:
for i in range(1, (number_of_files + 1)):
    if (i <= 9):
        access_key = str("file00" + str(i))
    elif (i > 9 and i < 100):
        access_key = str("file0" + str(i))
    print(section[access_key])
But I don't want to start with that because it becomes even worse. So my question is: What would be a proper and relatively clean way to go through all the file names in order? I definitely need the loop because I need to perform some actions with every file.
Use zero padding to generate the file number (see, for example, this SO answer: https://stackoverflow.com/a/339013/3775361). That way you don't have to write the digit-rollover logic yourself; Python's built-in formatting does it for you. If you're using Python 3.6 or later, I'd also recommend trying f-strings (one of the suggested solutions at the link above). They're awesome!
If we can assume the file number has three digits, any of the following will zero-pad it. Each expression below returns "015".
i = 15
str(i).zfill(3)
# or
"%03d" % i
# or
"{:0>3}".format(i)
# or
f"{i:0>3}"
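Applied to the loop from the question, zero padding might look like the sketch below. The section dictionary is made-up sample data, the key format (file followed by three digits) is assumed from the config excerpt, and file_key is an illustrative helper name.

```python
# Sample data standing in for the parsed [FileList] section (assumed shape).
section = {"file001": "file1.txt", "file002": "file2.txt", "file010": "file10.txt"}


def file_key(index, width=3):
    """Build a key such as 'file007' by zero-padding the index to `width` digits."""
    return f"file{index:0{width}d}"


# Visit the entries that exist, in numeric order.
found = []
for i in range(1, 11):
    key = file_key(i)
    if key in section:
        found.append(section[key])
        print(key, section[key])
# file001 file1.txt
# file002 file2.txt
# file010 file10.txt
```

The membership check skips numbers that have no entry, so gaps in the numbering do not raise a KeyError.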
Start by looking at the keys you actually have instead of guessing what they might be. You need to filter out the ones that match your pattern, and sort according to the numerical portion.
keys = [key for key in section.keys() if key.startswith('file') and key[4:].isdigit()]
You can add additional conditions, like len(key) > 4, or drop the conditions entirely. You might also consider learning regular expressions to make the checking more elegant.
To sort the names without having to account for padding, you can do something like
keys = sorted(keys, key=lambda s: int(s[4:]))
You can also try a library like natsort, which will handle the custom sort key much more generally.
Now you can iterate over the keys and do whatever you want:
for key in sorted((k for k in section if k.startswith('file') and k[4:].isdigit()), key=lambda s: int(s[4:])):
    print(section[key])
Here is what a solution equipped with re and natsort might look like:
import re
from natsort import natsorted
pattern = re.compile(r'file\d+')
for key in natsorted(k for k in section if pattern.fullmatch(k)):
    print(section[key])

How to insert all occurrences in list[] into CompositeVideoClip

I have a video where I want to insert a dynamic amount of TextClip(s). I have a while loop that handles the logic for actually creating the different TextClips and giving them individual durations & start_times (this works). I do however have a problem with actually "compiling" the video itself with inserting these texts.
Code for creating a TextClip (that works).
text = mpy.TextClip(str(contents),
color='white', size=[1700, 395], method='caption').set_duration(
int(list[i - 1])).set_start(currentTime).set_position(("center", 85))
print(str(i) + " written")
textList.append(text)
Code to "compile" the video. (that doesn't work)
final_clip = CompositeVideoClip([clip, len(textList)])
final_clip.write_videofile("files/final/TEST.mp4")
I tried several approaches but now I'm stuck and can't figure out a way to continue. Before I get a lot of "answers" telling me to do a while loop on the compiling, let me just say that the actual compiling takes about 5 minutes and I have 100-500 different texts I need implemented in the final video which would take days. Instead I want to add them one by one and then do 1 big final compile which I know it will take slightly longer than 5 minutes, but still a lot quicker than 2-3 days.
For those of you that may not have used moviepy before I will post a snippet of "my code" that actually works, not in the way I need it to though.
final_clip = CompositeVideoClip([clip, textList[0], textList[1], textList[2]])
final_clip.write_videofile("files/final/TEST.mp4")
This works exactly as intended (adding 3 texts). However, I can't know beforehand how many texts there will be in each video, so I need to somehow pass a dynamic number of textList[] entries into the function.
Kind regards,
I'm not sure what the arguments after clip do (you could clarify), but if the problem is solved by inserting a variable number of textList[i] args, the solution is simple:
CompositeVideoClip([clip, *textList])
The star unpacks the list (or any iterable). For example, with arg = 4, 5 and def f(a, b): return a + b, the call f(*arg) returns 9. If you have many textLists, you can manage them via a nested list or a dictionary:
textListDict = {'1':textList1, '2':textList2, ...}
textListNest = [textList1, textList2, ...] # probably preferable - using for below
# now iterate:
for textList in textListNest:
    final_clip = CompositeVideoClip([clip, *textList])
Unpacking demo:
def show_then_join(a, b, c):
    print("a={}, b={}, c={}".format(a,b,c))
    print(''.join("{}{}{}".format(a,b,c)))
some_list = [1, 2, 'dog']
show_then_join(*some_list) # only one arg is passed in, but is unpacked into three
# >> a=1, b=2, c=dog
# >> 12dog
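The demo above unpacks into function arguments, while the suggested fix unpacks inside a list literal. A minimal sketch of that second form follows; plain strings stand in for the real moviepy clip objects, so nothing here touches moviepy itself.

```python
# Plain strings are hypothetical stand-ins for the video clip and the TextClips.
clip = "background"
text_list = ["text0", "text1", "text2"]

layers = [clip, *text_list]        # star-unpacking inside a list literal
same_layers = [clip] + text_list   # equivalent list concatenation

print(layers)
# ['background', 'text0', 'text1', 'text2']
print(layers == same_layers)
# True
```

Either spelling produces one flat list regardless of how many texts were collected, which is the shape the three-text version in the question passes to CompositeVideoClip.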

Updating table fields

There is an odoo system with a timesheet module (self-made) in it.
How it works: a worker arrives, the arrival time is written to the timesheet, and everything's good.
But there is a problem: the employees responsible for making these records use different time formats. Some use the standard HH:MM (e.g. 10:30) and some use decimal hours (e.g. 10.5, which means 10:30, or even 10.125, which means 10:08), so I had to write a conversion function.
The job's done and it works, but I bet there is a way to optimize it, at least the last part.
@api.one
def time_button (self):
    def ftohhmm(a):
        if a:
            a = re.sub(',' , '.' , a)
            if (re.search ('^\-?\d+((,|\.)\d+)?$',a) >= 0):
                if float(a) <24:
                    a = float(a) * 60
                    minutes = a%60
                    hours = a/60
                    if int(round(minutes)) < 10:
                        return str(int(hours))+":0"+str(int(round(minutes)))
                    else:
                        return str(int(hours))+":"+str(int(round(minutes)))
        return a
    if self.format:
        for i in self.ids_string:
            i.hours1=ftohhmm(i.hours1)
            i.hours2=ftohhmm(i.hours2)
            i.hours3=ftohhmm(i.hours3)
            i.hours4=ftohhmm(i.hours4)
            i.hours5=ftohhmm(i.hours5)
            i.hours6=ftohhmm(i.hours6)
            i.hours7=ftohhmm(i.hours7)
            i.hours8=ftohhmm(i.hours8)
            i.hours9=ftohhmm(i.hours9)
            i.hours10=ftohhmm(i.hours10)
            i.hours11=ftohhmm(i.hours11)
            i.hours12=ftohhmm(i.hours12)
            i.hours13=ftohhmm(i.hours13)
            i.hours14=ftohhmm(i.hours14)
            i.hours15=ftohhmm(i.hours15)
            i.hours16=ftohhmm(i.hours16)
            i.hours17=ftohhmm(i.hours17)
            i.hours18=ftohhmm(i.hours18)
            i.hours19=ftohhmm(i.hours19)
            i.hours20=ftohhmm(i.hours20)
            i.hours21=ftohhmm(i.hours21)
            i.hours22=ftohhmm(i.hours22)
            i.hours23=ftohhmm(i.hours23)
            i.hours24=ftohhmm(i.hours24)
            i.hours25=ftohhmm(i.hours25)
            i.hours26=ftohhmm(i.hours26)
            i.hours27=ftohhmm(i.hours27)
            i.hours28=ftohhmm(i.hours28)
            i.hours29=ftohhmm(i.hours29)
            i.hours30=ftohhmm(i.hours30)
            i.hours31=ftohhmm(i.hours31)
Hours1-31 are the columns for every day. Rows are for workers. Cells at the intersections contain exact time when worker came.
Any advice on how to optimize it would be great. Thanks!
for i in self.ids_string:
    for j in range(1, 32):
        if hasattr(i, "hours%s" % j):
            a = getattr(i, "hours%s" % j)
            setattr(i, "hours%s" % j, ftohhmm(a))
Maybe this is what you need.
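For anyone who wants to try the getattr/setattr loop outside Odoo, here is a self-contained sketch. SimpleNamespace is a stand-in for a timesheet row, and to_hhmm is a simplified rewrite of the question's converter rather than the original: it assumes non-negative values and rounds the total minutes once, so a value like 10.999 cannot come out as "10:60".

```python
import re
from types import SimpleNamespace

# Matches decimal hours such as "10.5" or "10,125" (non-negative only, by assumption).
_DECIMAL_HOURS = re.compile(r'^\d+([.,]\d+)?$')


def to_hhmm(value):
    """Convert decimal hours like '10.5' to 'H:MM'; return any other input unchanged."""
    if not value or not _DECIMAL_HOURS.match(value):
        return value
    hours_float = float(value.replace(',', '.'))
    if hours_float >= 24:
        return value
    total_minutes = int(round(hours_float * 60))
    return "{}:{:02d}".format(total_minutes // 60, total_minutes % 60)


# Hypothetical row with only three of the 31 day columns filled in.
row = SimpleNamespace(hours1='10.5', hours2='10:30', hours3='10,125')

for day in range(1, 32):
    field = "hours%d" % day
    if hasattr(row, field):  # skip day columns this row does not have
        setattr(row, field, to_hhmm(getattr(row, field)))

print(row.hours1, row.hours2, row.hours3)
# 10:30 10:30 10:08
```

Values already in HH:MM form fail the pattern and pass through untouched, which matches the behaviour described in the question.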

using enumerate to iterate over a dictionary of lists to extract information

I got some help earlier today about how to obtain positional information from a dictionary using enumerate(). I will provide the code shortly. However, now that I've found this cool tool, I want to implement it in a different manner to obtain some more information from my dictionary.
I have a dictionary:
length = {'A': [(0,21), (30,41), (70,80), (95,200)], 'B': [(0,42), (70,80)]..etc}
and a file:
A 73
B 15
etc
What I want to do now is compute the gap between the max of one pair in my list and the min of the next pair, for example the difference between 21 and 30. Then I want to add up these gaps until I reach the pair (range) that contains the number from my file (if that makes sense).
Here is the code that I've been working on:
import csv
with open('Exome_agg_cons_snps_pct_RefSeq_HGMD_reinitialized.txt') as f:
    reader = csv.DictReader(f,delimiter="\t")
    for row in reader:
        snppos = row['snp_rein']
        name = row['isoform']
        snpos = int(snppos)
        if name in exons:
            y = exons[name]
            for sd, i in enumerate(exons[name]):
                while not snpos<=max(i):
                    intron = min(i+1) - max(i) #this doesn't work unfortunately. It says I can't add 1 to i
                    totalintron = 0 + intron
                if snpos<=max(i):
                    exonmin = min(i)
                    exonnumber = sd+1
                    print exonnumber,name,totalintron
                    break
I think it's the sd (the index) that is confusing me. I don't know how to use it in this context. The commented-out portions are other avenues I've tried without success. Any help? I know this is a confusing question and my code might be a little mixed up, but that's because I can't even get an output to correct my other mistakes yet.
I want my output to look like this based on the file provided:
exon name introntotal
3 A 38
1 B 0
To try to provide some help for this question: a critical part of the problem is that I don't think enumerate does what you think it does. Enumerate just numbers the things you are iterating over. So when you go through your for loop, sd will first be 0, then it will be 1... And that's all. In your case, you want to look at adjacent list entries (it seems?), so the more idiomatic ways of looping in python aren't nearly as clean. So you could do something like:
...
y = exons[name]
for index in range(len(y) - 1): # the - 1 is to prevent going out of bounds
    first_max = max(y[index])
    second_min = min(y[index+1])
    ... # do more stuff, I didn't completely follow what you're trying to do
I will add for the hardcore pythonistas, you can of course do some clever stuff to write this more idiomatically and avoid the C style loop that I wrote, but I think that getting into zip and so on might be a bit confusing for somebody new to python.
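For reference, the zip-based version alluded to above might look like the following sketch. It assumes each list is sorted and each tuple is (min, max); the exons dictionary and the snps list are sample data taken from the question, and locate is an illustrative helper name.

```python
# Sample data from the question: ranges per name, and (name, position) pairs.
exons = {'A': [(0, 21), (30, 41), (70, 80), (95, 200)],
         'B': [(0, 42), (70, 80)]}
snps = [('A', 73), ('B', 15)]


def locate(ranges, position):
    """Return (exon_number, intron_total) for the first range ending at or after position."""
    # Pair every range with its successor to get the gap between them.
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(ranges, ranges[1:])]
    intron_total = 0
    for index, (low, high) in enumerate(ranges):
        if position <= high:
            return index + 1, intron_total
        if index < len(gaps):
            intron_total += gaps[index]
    return None  # position lies beyond the last range


results = []
for name, position in snps:
    if name in exons:
        found = locate(exons[name], position)
        if found is not None:
            exon_number, intron_total = found
            results.append((exon_number, name, intron_total))
            print(exon_number, name, intron_total)
# 3 A 38
# 1 B 0
```

One caveat of this sketch: a position that falls inside a gap is attributed to the next range, because only the upper bound of each range is checked.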
The issue is that you're using the output of enumerate() incorrectly.
enumerate() returns the index (position) first then the item
Ex:
x = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
for i, item in enumerate(x):
    print(i, item)
# prints
#(0, 10)
#(1, 11)
#(2, 12)
#(3, 13)
#(4, 14)
#(5, 15)
#(6, 16)
#(7, 17)
#(8, 18)
#(9, 19)
So in your case, you should switch i and sd:
for i, sd in enumerate(exons[name]):
    # do something
Like other commenters suggested, reading the python documentation is usually a good place to start resolving issues, especially if you're not sure how a function does what it does :)

Numpy/Python performing terribly vs. Matlab

Novice programmer here. I'm writing a program that analyzes the relative spatial locations of points (cells). The program gets boundaries and cell type off an array with the x coordinate in column 1, y coordinate in column 2, and cell type in column 3. It then checks each cell for cell type and appropriate distance from the bounds. If it passes, it then calculates its distance from each other cell in the array and if the distance is within a specified analysis range it adds it to an output array at that distance.
My cell marking program is in wxpython so I was hoping to develop this program in python as well and eventually stick it into the GUI. Unfortunately right now python takes ~20 seconds to run the core loop on my machine while MATLAB can do ~15 loops/second. Since I'm planning on doing 1000 loops (with a randomized comparison condition) on ~30 cases times several exploratory analysis types this is not a trivial difference.
I tried running a profiler and array calls are 1/4 of the time, almost all of the rest is unspecified loop time.
Here is the python code for the main loop:
for basecell in range (0, cellnumber-1):
    if firstcelltype == np.array((cellrecord[basecell,2])):
        xloc=np.array((cellrecord[basecell,0]))
        yloc=np.array((cellrecord[basecell,1]))
        xedgedist=(xbound-xloc)
        yedgedist=(ybound-yloc)
        if xloc>excludedist and xedgedist>excludedist and yloc>excludedist and yedgedist>excludedist:
            for comparecell in range (0, cellnumber-1):
                if secondcelltype==np.array((cellrecord[comparecell,2])):
                    xcomploc=np.array((cellrecord[comparecell,0]))
                    ycomploc=np.array((cellrecord[comparecell,1]))
                    dist=math.sqrt((xcomploc-xloc)**2+(ycomploc-yloc)**2)
                    dist=round(dist)
                    if dist>=1 and dist<=analysisdist:
                        arraytarget=round(dist*analysisdist/intervalnumber)
                        addone=np.array((spatialraw[arraytarget-1]))
                        addone=addone+1
                        targetcell=arraytarget-1
                        np.put(spatialraw,[targetcell,targetcell],addone)
Here is the matlab code for the main loop:
for basecell = 1:cellnumber;
    if firstcelltype==cellrecord(basecell,3);
        xloc=cellrecord(basecell,1);
        yloc=cellrecord(basecell,2);
        xedgedist=(xbound-xloc);
        yedgedist=(ybound-yloc);
        if (xloc>excludedist) && (yloc>excludedist) && (xedgedist>excludedist) && (yedgedist>excludedist);
            for comparecell = 1:cellnumber;
                if secondcelltype==cellrecord(comparecell,3);
                    xcomploc=cellrecord(comparecell,1);
                    ycomploc=cellrecord(comparecell,2);
                    dist=sqrt((xcomploc-xloc)^2+(ycomploc-yloc)^2);
                    if (dist>=1) && (dist<=100.4999);
                        arraytarget=round(dist*analysisdist/intervalnumber);
                        spatialsum(1,arraytarget)=spatialsum(1,arraytarget)+1;
                    end
                end
            end
        end
    end
end
Thanks!
Here are some ways to speed up your python code.
First: Don't make np arrays when you are only storing one value. You do this many times over in your code. For instance,
if firstcelltype == np.array((cellrecord[basecell,2])):
can just be
if firstcelltype == cellrecord[basecell,2]:
I'll show you why with some timeit statements:
>>> timeit.Timer('x = 111.1').timeit()
0.045882196294822819
>>> t=timeit.Timer('x = np.array(111.1)','import numpy as np').timeit()
0.55774970267830071
That's an order of magnitude in difference between those calls.
Second: The following code:
arraytarget=round(dist*analysisdist/intervalnumber)
addone=np.array((spatialraw[arraytarget-1]))
addone=addone+1
targetcell=arraytarget-1
np.put(spatialraw,[targetcell,targetcell],addone)
can be replaced with
arraytarget=round(dist*analysisdist/intervalnumber)-1
spatialraw[arraytarget] += 1
Third: You can get rid of the sqrt as Philip mentioned by squaring analysisdist beforehand. However, since you use analysisdist to get arraytarget, you might want to create a separate variable, analysisdist2 that is the square of analysisdist and use that for your comparison.
Fourth: You are looking for cells that match secondcelltype every time you get to that point rather than finding those one time and using the list over and over again. You could define an array:
comparecells = np.where(cellrecord[:,2]==secondcelltype)[0]
and then replace
for comparecell in range (0, cellnumber-1):
    if secondcelltype==np.array((cellrecord[comparecell,2])):
with
for comparecell in comparecells:
Fifth: Use psyco. It is a JIT compiler. Matlab has a built-in JIT compiler if you're using a somewhat recent version. This should speed-up your code a bit.
Sixth: If the code still isn't fast enough after all previous steps, then you should try vectorizing your code. It shouldn't be too difficult. Basically, the more stuff you can have in numpy arrays the better. Here's my try at vectorizing:
basecells = np.where(cellrecord[:,2]==firstcelltype)[0]
xlocs = cellrecord[basecells, 0]
ylocs = cellrecord[basecells, 1]
xedgedists = xbound - xlocs
yedgedists = ybound - ylocs
whichcells = np.where((xlocs>excludedist) & (xedgedists>excludedist) & (ylocs>excludedist) & (yedgedists>excludedist))[0]
selectedcells = basecells[whichcells]
comparecells = np.where(cellrecord[:,2]==secondcelltype)[0]
xcomplocs = cellrecord[comparecells,0]
ycomplocs = cellrecord[comparecells,1]
analysisdist2 = analysisdist**2
for basecell in selectedcells:
    # basecell is a row index into cellrecord, so read its coordinates from there
    dists2 = (xcomplocs-cellrecord[basecell,0])**2 + (ycomplocs-cellrecord[basecell,1])**2
    whichcells = np.where((dists2 >= 1) & (dists2 <= analysisdist2))[0]
    # take the square root only for the distances that passed the filter
    arraytargets = np.round(np.sqrt(dists2[whichcells])*analysisdist/intervalnumber).astype(int) - 1
    for target in arraytargets:
        spatialraw[target] += 1
You can probably take out that inner for loop, but you have to be careful because some of the elements of arraytargets could be the same. Also, I didn't actually try out all of the code, so there could be a bug or typo in there. Hopefully, it gives you a good idea of how to do this. Oh, one more thing: you could make analysisdist/intervalnumber a separate variable to avoid doing that division over and over again.
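One way to drop the inner loop while still counting repeated targets correctly is np.bincount, which tallies duplicate indices. The sketch below goes a step further and removes the outer loop as well by broadcasting all base/compare pairs at once. The function name pair_histogram, the nbins parameter, and the sample data are illustrative assumptions. It also uses np.rint, which rounds exact halves to the nearest even number, so results could differ from Python 2's round() on exact .5 values.

```python
import numpy as np


def pair_histogram(cellrecord, firstcelltype, secondcelltype, xbound, ybound,
                   excludedist, analysisdist, intervalnumber, nbins):
    """Count base/compare cell pairs per distance bin without Python-level loops."""
    x, y, kind = cellrecord[:, 0], cellrecord[:, 1], cellrecord[:, 2]
    # Base cells: right type and far enough from every edge.
    base = ((kind == firstcelltype)
            & (x > excludedist) & (xbound - x > excludedist)
            & (y > excludedist) & (ybound - y > excludedist))
    comp = kind == secondcelltype
    # Broadcasting builds an (n_base, n_comp) matrix of pairwise offsets.
    dx = x[base][:, None] - x[comp][None, :]
    dy = y[base][:, None] - y[comp][None, :]
    dist = np.rint(np.hypot(dx, dy)).astype(int)
    dist = dist[(dist >= 1) & (dist <= analysisdist)]  # flattens to 1-D
    targets = np.rint(dist * analysisdist / intervalnumber).astype(int) - 1
    # bincount adds one per occurrence, so repeated targets are all counted.
    return np.bincount(targets, minlength=nbins)


# Made-up sample: columns are x, y, cell type.
cells = np.array([[5, 5, 1],     # base cell
                  [8, 9, 2],     # distance 5 from the base cell
                  [5, 8, 2],     # distance 3 from the base cell
                  [1, 1, 1],     # too close to the edge, excluded
                  [50, 50, 2]],  # farther than analysisdist, ignored
                 dtype=float)
hist = pair_histogram(cells, 1, 2, xbound=100, ybound=100, excludedist=2,
                      analysisdist=10, intervalnumber=10, nbins=10)
print(hist.tolist())
# [0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
```

The pairwise matrix needs memory proportional to the number of base cells times the number of compare cells, so for very large inputs it may be preferable to keep the outer loop and apply bincount per base cell.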
I'm not too sure about the slowness of Python, but your Matlab code can be HIGHLY optimized. Nested for-loops tend to have horrible performance issues. You can replace the inner loop with vectorized operations, as below:
for basecell = 1:cellnumber;
    if firstcelltype==cellrecord(basecell,3);
        xloc=cellrecord(basecell,1);
        yloc=cellrecord(basecell,2);
        xedgedist=(xbound-xloc);
        yedgedist=(ybound-yloc);
        if (xloc>excludedist) && (yloc>excludedist) && (xedgedist>excludedist) && (yedgedist>excludedist);
            % for comparecell = 1:cellnumber;
            %     if secondcelltype==cellrecord(comparecell,3);
            %         xcomploc=cellrecord(comparecell,1);
            %         ycomploc=cellrecord(comparecell,2);
            %         dist=sqrt((xcomploc-xloc)^2+(ycomploc-yloc)^2);
            %         if (dist>=1) && (dist<=100.4999);
            %             arraytarget=round(dist*analysisdist/intervalnumber);
            %             spatialsum(1,arraytarget)=spatialsum(1,arraytarget)+1;
            %         end
            %     end
            % end
            %replace with:
            secondcelltype_mask = secondcelltype == cellrecord(:,3);
            xcomploc_vec = cellrecord(secondcelltype_mask ,1);
            ycomploc_vec = cellrecord(secondcelltype_mask ,2);
            % use element-wise .^ because these are vectors
            dist_vec = sqrt((xcomploc_vec-xloc).^2+(ycomploc_vec-yloc).^2);
            dist_mask = dist_vec>=1 & dist_vec<=100.4999;
            arraytarget_vec = round(dist_vec(dist_mask)*analysisdist/intervalnumber);
            % spatialsum is a row vector, so count per column and transpose
            count = accumarray(arraytarget_vec,1, [size(spatialsum,2),1]);
            spatialsum(1,:) = spatialsum(1,:)+count';
        end
    end
end
There may be some small errors in there since I don't have any data to test the code with but it should get ~10X speed up on the Matlab code.
From my experience with numpy, I've noticed that swapping out for-loops for vectorized/matrix-based arithmetic gives noticeable speed-ups as well. However, without the shapes of all of your variables it's hard to vectorize things.
You can avoid some of the math.sqrt calls by replacing the lines
dist=math.sqrt((xcomploc-xloc)**2+(ycomploc-yloc)**2)
dist=round(dist)
if dist>=1 and dist<=analysisdist:
    arraytarget=round(dist*analysisdist/intervalnumber)
with
dist=(xcomploc-xloc)**2+(ycomploc-yloc)**2
dist=round(dist)
if dist>=1 and dist<=analysisdist_squared:
    arraytarget=round(math.sqrt(dist)*analysisdist/intervalnumber)
where you have the line
analysisdist_squared = analysisdist * analysisdist
outside of the main loop of your function.
Since math.sqrt is called in the innermost loop, you should have from math import sqrt at the top of the module and just call the function as sqrt.
I would also try replacing
dist=(xcomploc-xloc)**2+(ycomploc-yloc)**2
with
dist=(xcomploc-xloc)*(xcomploc-xloc)+(ycomploc-yloc)*(ycomploc-yloc)
There's a chance it will produce faster byte code to do multiplication rather than exponentiation.
I doubt these will get you all the way to MATLAB's performance, but they should help reduce some overhead.
If you have a multicore, you could maybe give the multiprocessing module a try and use multiple processes to make use of all the cores.
Instead of sqrt you could use x**0.5, which is, if I remember correctly, slightly faster.
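Claims like these are easy to check on your own machine with timeit. The sketch below times the scalar alternatives mentioned in this thread. The test value and repetition counts are arbitrary choices, and the ranking may differ between Python versions and hardware, so no expected output is shown.

```python
import timeit

# Arbitrary test value; both spellings of sqrt are imported for comparison.
setup = "import math; from math import sqrt; x = 12345.678"

candidates = {
    "math.sqrt(x)": "math.sqrt(x)",
    "sqrt(x)": "sqrt(x)",
    "x ** 0.5": "x ** 0.5",
    "x ** 2": "x ** 2",
    "x * x": "x * x",
}

# Take the best of three runs per statement to reduce noise.
timings = {
    label: min(timeit.repeat(stmt, setup=setup, number=200000, repeat=3))
    for label, stmt in candidates.items()
}

for label, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print("{:<14} {:.4f} s".format(label, seconds))
```

Differences at this level are usually small next to the cost of the Python-level loop itself, which is why the vectorizing suggestions earlier in the thread tend to matter more.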
