I have a long-running process that is mostly IO-bound. It is basically just a loop uploading items somewhere; some items take more time than others, and some days the whole process is slower, so the time can't be hardcoded.
Is there a module that given the progress through the loop in terms of (current position, final position) could evaluate the first few iterations then give an estimate of the remaining time, but also update on every iteration?
I'm thinking something like the progress output you get from tools like wget and apt-get.
I guess I could write it myself but I wondered if something like this exists already.
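For reference, the naive rate-based estimate I could hand-roll would look something like this (upload() and items are placeholders for my actual loop):

import time

start = time.time()
total = len(items)                 # items: the things being uploaded (placeholder)
for position, item in enumerate(items, 1):
    upload(item)                   # upload(): placeholder for the real work
    elapsed = time.time() - start
    eta = elapsed / position * (total - position)   # average rate so far, extrapolated
    print("%d/%d done, about %.0f s remaining" % (position, total, eta))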
I'm totally new to PsychoPy and I'm working with Builder. I'm not familiar with Python coding at all.
I have audio stimuli that have variable durations. In each trial, I want the second stimulus to start 500ms or 1500ms after the end of the first stimulus. Is there a way to do this in Builder? If I have to do it on Coder, what should I do?
Thank you very much!
Absolutely. Think of 500 ms and 1500 ms as two additional conditions that you loop over. These two conditions are crossed with the different durations.
In your conditions file, where you have the different durations (or you could of course generate those with a random function), add two rows for every duration, with a column "soa" (or whatever you want to call it) holding the two values 500 ms and 1500 ms. In the Builder interface you can choose whether the order of presentation should be sequential, randomized within block, or fully randomized across all trials (not just within block). Also, if you don't want it balanced (e.g. 20% 1500 ms and 80% 500 ms), you can just add the appropriate number of rows to achieve that balance (1 out of 5 rows is 1500 ms).
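For example, crossing two hypothetical sound files with the two SOAs would give a conditions file like this (the column names are just examples; the values are in seconds, which is what the expressions further down assume):

stimFile,soa
stim1.wav,0.5
stim1.wav,1.5
stim2.wav,0.5
stim2.wav,1.5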
Nearly all demos handle trials in this way, so take a look in Builder --> Demos, click on the loop and see how it's done there. Also read the relevant section of the online documentation; there is a video tutorial that covers it as well.
In concrete terms, when you add a Sound component in Builder, you just need to add an expression in the "Start (time)" field that takes account of the duration of the first sound stimulus and the ISI for this trial.
So if you have a column for the ISI in the conditions file as Jonas suggests (let's say it is called "ISI") and a Sound component for the first auditory stimulus (called, say, "sound1"), then you could put this in the Start field of the second sound stimulus:
$sound1.getDuration() + ISI
The $ symbol indicates that this line is to be interpreted as a Python code expression and not as a literal duration.
This assumes that sound1 starts at the very beginning of a trial. If it starts, say, 1 second into the trial, then just add a constant to the expression:
$1.0 + sound1.getDuration() + ISI
Your ISI column should contain values in seconds. If you prefer milliseconds, then do this:
$sound1.getDuration() + ISI/1000.0
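And if you do end up in Coder, the same logic is only a few lines; a minimal sketch (the filenames and the literal isi value are placeholders, the latter would normally come from your conditions file):

from psychopy import sound, core

snd1 = sound.Sound('first.wav')    # placeholder filename
snd2 = sound.Sound('second.wav')   # placeholder filename
isi = 0.5                          # 0.5 or 1.5, normally read from the conditions file

snd1.play()
core.wait(snd1.getDuration())      # wait until the first stimulus has finished
core.wait(isi)                     # then the 500 ms or 1500 ms gap
snd2.play()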
I need to apply two running filters on a large amount of data. I have read that creating variables on the fly is not a good idea, but I wonder if it still might be the best solution for me.
My question:
Can I create arrays in a loop with the help of a counter (array1, array2…) and then call them with the counter (something like 'array' + str(counter) or 'array' + str(counter - 1))?
Why I want to do it:
The data are 400x700 arrays for 15min time steps over a year (so I have 35000 400x700 arrays). Each time step is read into Python individually. Now I need to apply one running filter that checks if the last four time steps are equal (element-wise) and if they are, then all four values are set to zero. The next filter uses the data after the first filter has run and checks if the sum of the last twelve time steps exceeds a certain value. When both filters are done I want to sum up the values, so that at the end of the year I have one 400x700 array with the filtered accumulated values.
I do not have enough memory to read in all the data at once. So I thought I could create a loop where for each time step a new variable for the 400x700 array is created and the two filters run. The older arrays that have been filtered I could then add to the yearly sum and delete, so that I never have more than 16 (4 + 12) time steps (arrays) in memory at any time.
I don't know if it's correct of me to ask such a question without any code to show, but I would really appreciate the help.
If your question is about the best data structure for keeping a certain number of arrays in memory, then I would suggest a three-dimensional array. Its shape would be (400, 700, 12), since twelve is how many arrays you need to look back at. The advantage of this is that your memory use is constant, since you load each new array into the larger one. The disadvantage is that you need to shift all the arrays manually.
If you don't want to deal with the shifting yourself I'd suggest using a deque with a maxlen of 12.
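A rough sketch of the deque version (load_time_steps() and THRESHOLD are placeholders, and what the second filter should actually do to the flagged elements is left open, since the question doesn't say):

from collections import deque
import numpy as np

WINDOW = 12           # the second filter looks back twelve steps
THRESHOLD = 100.0     # placeholder value for the second filter

window = deque(maxlen=WINDOW)
yearly_sum = np.zeros((400, 700))

for step in load_time_steps():    # placeholder generator yielding one (400, 700) array per step
    if len(window) == WINDOW:
        yearly_sum += window[0]   # the oldest step can no longer change; fold it in before it is evicted
    window.append(step)
    # first filter: zero elements where the last four steps are identical
    if len(window) >= 4:
        last4 = np.stack(list(window)[-4:])
        equal = np.all(last4 == last4[0], axis=0)
        for arr in list(window)[-4:]:
            arr[equal] = 0.0
    # second filter: flag elements whose sum over the last twelve steps is too large
    if len(window) == WINDOW:
        exceeded = np.stack(window).sum(axis=0) > THRESHOLD
        # ... apply whatever correction you need to the flagged elements

for arr in window:                # flush the remaining steps at the end of the year
    yearly_sum += arr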
"Can I create arrays in a loop with the help of a counter (array1, array2…) and then call them with the counter (something like: ‘array’+str(counter) or ‘array’+str(counter-1)?"
This is a very common question that I think a lot of programmers will face eventually. Two examples for Python on Stack Overflow:
generating variable names on fly in python
How do you create different variable names while in a loop? (Python)
The lesson to learn from this is to not use dynamic variable names, but instead put the pieces of data you want to work with in an encompassing data structure.
The data structure could be, for example, a list, a dict or a NumPy array. The collections.deque proposed by @Midnighter also seems to be a good candidate for such a running filter.
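In its simplest form that just means something like this (data_source is a placeholder for however the time steps are read in):

arrays = []                  # or a dict, if the keys aren't consecutive integers
for step in data_source:     # placeholder for the per-time-step reader
    arrays.append(step)      # "arrayN" simply becomes arrays[N]
previous, current = arrays[-2], arrays[-1]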
Given a time (e.g. currently 4:24pm on Tuesday), I'd like to be able to select all businesses that are currently open out of a set of businesses.
I have the open and close times for every business for every day of the week.
Let's assume a business can open/close only on the 00, 15, 30, 45 minute marks of each hour.
I'm assuming the same schedule each week.
I am most interested in being able to quickly look up the set of businesses that are open at a certain time, not the space requirements of the data.
Mind you, some may open at 11pm one day and close at 1am the next day.
Holidays don't matter; I will handle those separately.
What's the most efficient way to store these open/close times such that with a single time/day-of-week tuple I can speedily figure out which businesses are open?
I am using Python, SOLR and MySQL. I'd like to be able to do the querying in SOLR. But frankly, I'm open to any suggestions and alternatives.
If you are willing to just look at a single week at a time, you can canonicalize all opening/closing times to a number of minutes since the start of the week, say Sunday 0 hrs. For each store, you create a number of tuples of the form [startTime, endTime, storeId]. (For hours that span Sunday midnight, you'd have to create two tuples: one running to the end of the week, one starting at the beginning of the week.) This set of tuples would be indexed (say, with a tree you would pre-process) on both startTime and endTime. The tuples shouldn't be that large: there are only ~10k minutes in a week, which fits in 2 bytes. This structure would sit gracefully inside a MySQL table with appropriate indexes, and would be very resilient to constant insertions and deletions of records as information changed. Your query would simply be "select storeId where startTime <= time and endTime >= time", where time is the canonicalized minutes since midnight on Sunday.
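For illustration, the canonicalization and the query might look like this in Python (the table and column names, and the cursor, are made up):

import datetime

def week_minutes(dt):
    # minutes since Sunday 00:00; Python's weekday() has Monday=0, so shift so that Sunday=0
    days_since_sunday = (dt.weekday() + 1) % 7
    return (days_since_sunday * 24 + dt.hour) * 60 + dt.minute

now = week_minutes(datetime.datetime.now())
cursor.execute(    # cursor: a placeholder MySQL cursor
    "SELECT storeId FROM hours WHERE startTime <= %s AND endTime >= %s",
    (now, now),
)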
If the information doesn't change very often and you want lookups to be very fast, you could solve every possible query up front and cache the results. For instance, there are only 672 quarter-hour periods in a week. With a list of businesses, each of which has a list of opening and closing times like Brandon Rhodes's solution, you could simply iterate through every 15-minute period in a week, figure out who's open, then store the answer in a lookup table or in-memory list.
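A sketch of that precomputation (schedules is a placeholder mapping each business to its canonicalized open/close slot indices):

SLOTS = 7 * 24 * 4                       # 672 quarter-hour periods per week
open_at = [set() for _ in range(SLOTS)]  # slot index -> ids of businesses open then

for business_id, sessions in schedules.items():   # schedules: placeholder dict
    for start, end in sessions:                   # half-open [start, end) slot indices
        for slot in range(start, end):
            open_at[slot].add(business_id)

# a lookup is then a single list access:
currently_open = open_at[current_slot]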
The bitmap field mentioned by another respondent would be incredibly efficient, but gets messy if you want to handle half-hour or quarter-hour times, since you have to arithmetically increase the number of bits and redesign the field each time you encounter a new resolution that you have to match.
I would instead try storing the values as datetimes inside a list:
openclosings = [ open1, close1, open2, close2, ... ]
Then, I would use Python's "bisect_right()" function in its built-in "bisect" module to find, in fast O(log n) time, where in that list your query time "fits". Then, look at the index that is returned. If it is an even number (0, 2, 4...) then the time lies between one of the "closed" times and the next "open" time, so the shop is closed then. If, instead, the bisection index is an odd number (1, 3, 5...) then the time has landed between an opening and a closing time, and the shop is open.
Not as fast as bitmaps, but you don't have to worry about resolution, and I can't think of another O(log n) solution that's as elegant.
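In code, the whole check is only a couple of lines:

from bisect import bisect_right

# openclosings: the sorted [open1, close1, open2, close2, ...] list from above
index = bisect_right(openclosings, query_time)
is_open = index % 2 == 1   # odd index: query_time landed between an open and its close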
You say you're using SOLR, don't care about storage, and want the lookups to be fast. Then instead of storing open/close tuples, index an entry for every open block of time at the level of granularity you need (15 minutes). For the encoding itself, you could just use cumulative hours:minutes.
For example, a store open from 4-5 pm on Monday, would have indexed values added for [40:00, 40:15, 40:30, 40:45]. A query at 4:24 pm on Monday would be normalized to 40:15, and therefore match that store document.
This may seem inefficient at first glance, but it's a relatively small constant penalty for indexing speed and space. And makes the searches as fast as possible.
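The normalization itself is trivial; something like this, assuming Sunday is day 0:

def hhmm_slot(day_index, hour, minute):
    # cumulative hours:minutes since the start of the week, floored to 15 minutes
    return "%d:%02d" % (day_index * 24 + hour, minute - minute % 15)

hhmm_slot(1, 16, 24)   # Monday 4:24 pm -> '40:15'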
Sorry I don't have an easy answer, but I can tell you that as the manager of a development team at a company in the late '90s I was tasked with solving this very problem, and it was HARD.
It's not the weekly hours that are tough; those can be done with a relatively small bitmask (168 bits = 1 per hour of the week). The tricky part is the businesses that are closed every other Tuesday.
Starting with a bitmask then moving on to an exceptions field is the best solution I've ever seen.
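The weekly part really is that simple; for example, as a 168-bit integer in Python:

mask = 0
for hour in open_hours:       # open_hours: the hour-of-week indices (0..167) a business is open
    mask |= 1 << hour

def is_open(mask, hour_of_week):
    return bool(mask >> hour_of_week & 1)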
In your Solr index, instead of indexing each business as one document with hours, index every "retail session" for every business during the course of a week.
For example if Joe's coffee is open Mon-Sat 6am-9pm and closed on Sunday, you would index six distinct documents, each with two indexed fields, "open" and "close". If your units are 15 minute intervals, then the values can range from 0 to 7*24*4. Assuming you have a unique ID for each business, store this in each document so you can map the sessions to businesses.
Then you can simply do a range search in Solr:
open:[* TO N] AND close:[N+1 TO *]
where N is the index of the 15-minute interval that the current time falls into. For example, if it's 10:10AM on Wednesday, and intervals are numbered from 0 starting at Sunday midnight, that is interval (3*24 + 10)*4 = 328, so your query would be:
open:[* TO 328] AND close:[329 TO *]
aka "find a session that starts at or before 10:00am Wed and ends at or after 10:15am Wed"
If you want to include other criteria in your search, such as location or products, you will need to index this with each session document as well. This is a bit redundant, but if your index is not huge, it shouldn't be a problem.
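Computing N is the same canonicalization as elsewhere in this thread; a sketch, again assuming the week starts Sunday at midnight with intervals numbered from 0:

import datetime

def interval_index(dt):
    days_since_sunday = (dt.weekday() + 1) % 7
    return days_since_sunday * 96 + dt.hour * 4 + dt.minute // 15

n = interval_index(datetime.datetime.now())
query = "open:[* TO %d] AND close:[%d TO *]" % (n, n + 1)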
If you can control your data well, I see a simple solution, similar to @Sebastian's. Follow the advice of creating the tuples, except create them in the form [time=startTime, storeId] and [time=endTime, storeId], then sort these into a list. To find out whether a store is open, simply do a query like:
select storeId
from table
where time <= '#1'
group by storeId
having count(storeId) % 2 = 1
To optimize this, you could build a lookup table that, for each time t, stores which stores are open at t plus the openings/closings between t and t+1 (for whatever granularity of t you choose).
However, this has the drawback of being harder to maintain (overlapping openings/closings need to be merged into a longer open-close period).
Have you looked at how many unique open/close time combinations there are? If there are not that many, make a reference table of the unique combinations and store the index of the appropriate entry against each business. Then you only have to search the reference table and then find the business with those indices.
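Building that reference table is only a few lines (businesses and its .hours attribute are placeholders):

schedule_ids = {}    # unique open/close combination -> index in the reference table
reference = []

for biz in businesses:         # placeholder business records
    key = tuple(biz.hours)     # the full weekly open/close combination
    if key not in schedule_ids:
        schedule_ids[key] = len(reference)
        reference.append(key)
    biz.schedule_id = schedule_ids[key]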
This seems like such a trivial problem, but I can't seem to pin down how I want to do it. Basically, I want a socket server to be able to report, at any time, the number of packets received in the last minute. How would I do that?
I was thinking of maybe summing a dictionary that uses the current second as a key: when a packet is received, increment that second's value by one and reset the key for second + 1 to 0. But this just seems sloppy. Any ideas?
A common pattern for solving this in other languages is to let the thing being measured simply increment an integer. Then you leave it to the listening client to determine intervals and frequencies.
So you basically do not let the socket server know about stuff like "minutes", because that's a feature the observer calculates. Then you can also support multiple listeners with different interval resolution.
I suppose you want some kind of ring-buffer structure to do the rolling logging.
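A minimal sketch of that split (the server side only ever increments; an observer differences two samples at whatever resolution it likes):

import time

packets_received = 0       # the only state the server keeps

def on_packet(data):       # call this from your socket handler
    global packets_received
    packets_received += 1

# observer side:
before = packets_received
time.sleep(60)
print("packets in the last minute:", packets_received - before)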
When you say the last minute, do you mean the exact last 60 seconds or the last full minute from x:00 to x:59? The latter is easier to implement and would probably give accurate enough results. You have one prev variable holding the number of hits in the previous minute. Then you have a current value that is incremented every time there is a new hit. You return the value of prev to the users. At the change of the minute you copy current into prev and reset current.
If you want finer-grained figures you could split the minute into 2 to 6 slices. You need a variable or list entry for every slice. Let's say you have 6 slices of 10 seconds. You also have an index variable pointing to the current slice (0..5). For every hit you increment a temp variable. When the slice is over, you replace the value of the indexed slice with the value of temp, reset temp and move the index forward. You return the sum of the slice values to the users.
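As a sketch, with six 10-second slices (the 10-second tick would come from a timer or your event loop):

SLICES = 6                 # six 10-second slices cover one minute
counts = [0] * SLICES      # completed slices
index = 0                  # the slice currently being filled
temp = 0                   # hits in the current slice so far

def on_hit():
    global temp
    temp += 1

def on_slice_boundary():   # call every 10 seconds
    global index, temp
    counts[index] = temp
    temp = 0
    index = (index + 1) % SLICES

def hits_last_minute():
    return sum(counts)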
For what it's worth, your proposed implementation won't work if you don't receive a packet every second, since the next second's entry won't necessarily have been reset to 0.
Either way, AFAIK the "correct" way to do this, à la log analysis, is to keep a limited record of all the queries you receive. So just chuck each query, its arrival time, etc. into a database, and then simple database queries will give you the usage over the last minute, or any minute in the past. Not sure whether this is too heavyweight for you, though.