This question is academic in nature and typically if I had these issues in the real world I'd probably do things differently. However I was curious about whether the speed at which python objects are created could be improved.
#!/usr/bin/env python
import time
# boring class
class boring:
def __init__(self, a, b, c):
self.a = a
self.b = b
self.c = c
def timefunc(f):
def f_timer(*args, **kwargs):
start = time.time()
result = f(*args, **kwargs)
end = time.time()
print f.__name__, 'took', end - start, 'time'
return result
return f_timer
#timefunc
def standard_boring():
for x in range(10000000):
b = boring(1,2,3)
del(b)
if __name__ == "__main__":
standard_boring()
On my machine this takes about 4 seconds. So curious what Python is doing I dug deeper using strace.
What I see is many tens of thousands of calls to brk e.g. -
brk(0xe35000) = 0xe35000
brk(0xe34000) = 0xe34000
brk(0xe33000) = 0xe33000
brk(0xe32000) = 0xe32000
brk(0xe31000) = 0xe31000
Man describes brk as
brk() and sbrk() change the location of the program break, which
defines the end of the process's data segment (i.e., the program break
is the first location after the end of the uninitialized data segment).
Increasing the program break has the effect of allocating memory to the
process; decreasing the break deallocates memory.
Each call to this presumably some number of cycles, as such the very fact there are loads of calls to this makes me think this is what takes the time.
So is there a way to tell python "Yo, I'm going to be using lots of memory so allocate loads up front". Can I do anything like that? I know why you wouldn't want to, like I said this is academic.
I've experimented with slots, hypothesizing that if I use less memory then this should improve my speed by the very nature of reducing the reoccurring memory allocation requests. I found the results from this quite "lumpy".
Any ideas?
Related
I wrote simplePrint.py
simplePrint.py
print(1)
print(2)
print(3)
And, I wrote functionPrint.py.
functionPrint.py
def printTestFunction(one,two,three):
print(one)
print(two)
print(three)
printTestFunction(1,2,3)
It may be natural, but functionPrint.py is slower.
Is there a way to speed up processing while using functions?
The speed comparison method is as follows
import timeit
class checkTime():
def __init__(self):
self.a = 0
self.b = 0
self.c = 0
def local(self):
print(1)
print(2)
print(3)
def self(self):
def printTestFunction(one,two,three):
print(one)
print(two)
print(three)
printTestFunction(1,2,3)
def getTime(self):
def test1():
self.self()
self_elapsed_time = timeit.Timer(stmt=test1).repeat(number=10000)
def test2():
self.local()
local_elapsed_time = timeit.Timer(stmt=test2).repeat(number=10000)
print ("local_time:{0}".format(local_elapsed_time) + "[sec]")
print ("self_time:{0}".format(self_elapsed_time) + "[sec]")
checkTime = checkTime()
checkTime.getTime()
result
local_time:[0.04716750000000003, 0.09638709999999995, 0.07357000000000002, 0.04696279999999997, 0.04360750000000002][sec]
self_time:[0.09702539999999998, 0.111617, 0.07951390000000003, 0.08777400000000002, 0.099128][sec]
There are plenty of ways to optimize your Python, but for something this simple, I wouldn't worry about it. Function calls are damn near instantaneous in human time.
A function call in most languages has to create new variables for the arguments, create a local scope, perform all the actions. So:
def printTestFunction(one,two,three):
print(one)
print(two)
print(three)
printTestFunction(1,2,3)
runs something like this:
define function printTestFunction
call function with args (1, 2, 3)
create local scope
one=arg[0]
two=arg[1]
three=arg[2]
print(one)
print(two)
print(three)
return None
destroy local scope
garbage collect
That's my guess anyway. You can see that there's a lot more going on here and that takes time. (in particular, creating a local scope is a lot of instructions).
(You should definitely still use functions as programming anything complex gets out of control VERY quickly without them. The speed bump is negligible.)
Doing a simple Google search yields:
Here are 5 important things to keep in mind in order to write efficient Python cde... Know the basic data structures. ... Reduce memory footprint. ... Use builtin functions and libraries. ... Move calculations outside the loop. ... Keep your code base small. If you want to test your scripts and see which runs faster you can use this (taken from this post):
import time
start_time = time.time()
// Your main code here
print("--- %s seconds ---" % (time.time() - start_time))
I am fairly new to python, kindly excuse me for insufficient information if any. As a part of the curriculum , I got introduced to python for quants/finance, I am studying multiprocessing and trying to understand this better. I tried modifying the problem given and now I am stuck mentally with the problem.
Problem:
I have a function which gives me ticks, in ohlc format.
{'scrip_name':'ABC','timestamp':1504836192,'open':301.05,'high':303.80,'low':299.00,'close':301.10,'volume':100000}
every minute. I wish to do the following calculation concurrently and preferably append/insert in the samelist
Find the Moving Average of the last 5 close data
Find the Median of the last 5 open data
Save the tick data to a database.
so expected data is likely to be
['scrip_name':'ABC','timestamp':1504836192,'open':301.05,'high':303.80,'low':299.00,'close':301.10,'volume':100000,'MA_5_open':300.25,'Median_5_close':300.50]
Assuming that the data is going to a db, its fairly easy to write a simple dbinsert routine to the database, I don't see that as a great challenge, I can spawn a to execute a insert statement for every minute.
How do I sync 3 different functions/process( a function to insert into db, a function to calculate the average, a function to calculate the median), while holding in memory 5 ticks to calculate the 5 period, simple average Moving Average and push them back to the dict/list.
The following assumption, challenges me in writing the multiprocessing routine. can someone guide me. I don't want to use pandas dataframe.
====REVISION/UPDATE===
The reason, why I don't want any solution on pandas/numpy is that, my objective is to understand the basics, and not the nuances of a new library. Please don't mistake my need for understanding to be arrogance or not wanting to be open to suggestions.
The advantage of having
p1=Process(target=Median,arg(sourcelist))
p2=Process(target=Average,arg(sourcelist))
p3=process(target=insertdb,arg(updatedlist))
would help me understand the possibility of scaling processes based on no of functions /algo components.. But how should I make sure p1&p2 are in sync while p3 should execute post p1&p2
Here is an example of how to use multiprocessing:
from multiprocessing import Pool, cpu_count
def db_func(ma, med):
db.save(something)
def backtest_strat(d, db_func):
a = d.get('avg')
s = map(sum, a)
db_func(s/len(a), median(a))
with Pool(cpu_count()) as p:
from functools import partial
bs = partial(backtest_strat, db_func=db_func)
print(p.map(bs, [{'avg': [1,2,3,4,5], 'median': [1,2,3,4,5]}]))
also see :
https://stackoverflow.com/a/24101655/2026508
note that this will not speed up anything unless there are a lot of slices.
so for the speed up part:
def get_slices(data)
for slice in data:
yield {'avg': [1,2,3,4,5], 'median': [1,2,3,4,5]}
p.map(bs, get_slices)
from what i understand multiprocessing works by message passing via pickles, so the pool.map when called should have access to all three things, the two arrays, and the db_save function. There are of course other ways to go about it, but hopefully this shows one way to go about it.
Question: how should I make sure p1&p2 are in sync while p3 should execute post p1&p2
If you sync all Processes, computing one Task (p1,p2,p3) couldn't be faster as the slowes Process are be.
In the meantime the other Processes running idle.
It's called "Producer - Consumer Problem".
Solution using Queue all Data serialize, no synchronize required.
# Process-1
def Producer()
task_queue.put(data)
# Process-2
def Consumer(task_queue)
data = task_queue.get()
# process data
You want multiple Consumer Processes and one Consumer Process gather all Results.
You don't want to use Queue, but Sync Primitives.
This Example let all Processes run independent.
Only the Process Result waits until notified.
This Example uses a unlimited Task Buffer tasks = mp.Manager().list().
The Size could be minimized if List Entrys for done Tasks are reused.
If you have some very fast algos combine some to one Process.
import multiprocessing as mp
# Base class for all WORKERS
class Worker(mp.Process):
tasks = mp.Manager().list()
task_ready = mp.Condition()
parties = mp.Manager().Value(int, 0)
#classmethod
def join(self):
# Wait until all Data processed
def get_task(self):
for i, task in enumerate(Worker.tasks):
if task is None: continue
if not self.__class__.__name__ in task['result']:
return (i, task['range'])
return (None, None)
# Main Process Loop
def run(self):
while True:
# Get a Task for this WORKER
idx, _range = self.get_task()
if idx is None:
break
# Compute with self Method this _range
result = self.compute(_range)
# Update Worker.tasks
with Worker.lock:
task = Worker.tasks[idx]
task['result'][name] = result
parties = len(task['result'])
Worker.tasks[idx] = task
# If Last, notify Process Result
if parties == Worker.parties.value:
with Worker.task_ready:
Worker.task_ready.notify()
class Result(Worker):
# Main Process Loop
def run(self):
while True:
with Worker.task_ready:
Worker.task_ready.wait()
# Get (idx, _range) from tasks List
idx, _range = self.get_task()
if idx is None:
break
# process Task Results
# Mark this tasks List Entry as done for reuse
Worker.tasks[idx] = None
class Average(Worker):
def compute(self, _range):
return average of DATA[_range]
class Median(Worker):
def compute(self, _range):
return median of DATA[_range]
if __name__ == '__main__':
DATA = mp.Manager().list()
WORKERS = [Result(), Average(), Median()]
Worker.start(WORKERS)
# Example creates a Task every 5 Records
for i in range(1, 16):
DATA.append({'id': i, 'open': 300 + randrange(0, 5), 'close': 300 + randrange(-5, 5)})
if i % 5 == 0:
Worker.tasks.append({'range':(i-5, i), 'result': {}})
Worker.join()
Tested with Python: 3.4.2
I'm making some API requests which are limited at 20 per second. As to get the answer the waiting time is about 0.5 secs I thought to use multiprocessing.Pool.map and using this decorator
rate-limiting
So my code looks like
def fun(vec):
#do stuff
def RateLimited(maxPerSecond):
minInterval = 1.0 / float(maxPerSecond)
def decorate(func):
lastTimeCalled = [0.0]
def rateLimitedFunction(*args,**kargs):
elapsed = time.clock() - lastTimeCalled[0]
leftToWait = minInterval - elapsed
if leftToWait>0:
time.sleep(leftToWait)
ret = func(*args,**kargs)
lastTimeCalled[0] = time.clock()
return ret
return rateLimitedFunction
return decorate
#RateLimited(20)
def multi(vec):
p = Pool(5)
return p.map(f, vec)
I have 4 cores and this program works fine and there is an improvement in time compared to the loop version. Furthermore, when the Pool argument is 4,5,6 it works and the time is smaller for Pool(6) but when I use 7+ I got errors (Too many connections per second I guess).
Then if my function is more complicated and can do 1-5 requests the decorator doesn't work as expected.
What else I can use in this case?
UPDATE
For anyone looking for use Pool remembers to close it otherwise you are going to use all the RAM
def multi(vec):
p = Pool(5)
res=p.map(f, vec)
p.close()
return res
UPDATE 2
I found that something like this WebRequestManager can do the trick. The problem is that doesn't work with multiprocessing. Pool with 19-20 processes because the time is stored in the class you need to call when you run the request.
Your indents are inconsistent up above which makes it harder to answer this, but I'll take a stab.
It looks like you're rate limiting the wrong thing; if f is supposed be limited, you need to limit the calls to f, not the calls to multi. Doing this in something that's getting dispatched to the Pool won't work, because the forked workers would each be limiting independently (forked processes will have independent tracking of the time since last call).
The easiest way to do this would be to limit how quickly the iterator that the Pool pulls from produces results. For example:
import collections
import time
def rate_limited_iterator(iterable, limit_per_second):
# Initially, we can run immediately limit times
runats = collections.deque([time.time()] * limit_per_second)
for x in iterable:
runat, now = runats.popleft(), time.time()
if now < runat:
time.sleep(runat - now)
runats.append(time.time() + 1)
yield x
def multi(vec):
p = Pool(5)
return p.map(f, rate_limited_iterator(vec, 20))
I have to time the implementation I did of an algorithm in one of my classes, and I am using the time.time() function to do so. After implementing it, I have to run that algorithm on a number of data files which contains small and bigger data sets in order to formally analyse its complexity.
Unfortunately, on the small data sets, I get a runtime of 0 seconds even if I get a precision of 0.000000000000000001 with that function when looking at the runtimes of the bigger data sets and I cannot believe that it really takes less than that on the smaller data sets.
My question is: Is there a problem using this function (and if so, is there another function I can use that has a better precision)? Or am I doing something wrong?
Here is my code if ever you need it:
import sys, time
import random
from utility import parseSystemArguments, printResults
...
def main(ville):
start = time.time()
solution = dynamique(ville) # Algorithm implementation
end = time.time()
return (end - start, solution)
if __name__ == "__main__":
sys.argv.insert(1, "-a")
sys.argv.insert(2, "3")
(algoNumber, ville, printList) = parseSystemArguments()
(algoTime, solution) = main(ville)
printResults(algoTime, solution, printList)
The printResults function:
def printResults(time, solution, printList=True):
print ("Temps d'execution = " + str(time) + "s")
if printList:
print (solution)
The solution to my problem was to use the timeit module instead of the time module.
import timeit
...
def main(ville):
start = timeit.default_timer()
solution = dynamique(ville)
end = timeit.default_timer()
return (end - start, solution)
Don't confuse the resolution of the system time with the resolution of a floating point number. The time resolution on a computer is only as frequent as the system clock is updated. How often the system clock is updated varies from machine to machine, so to ensure that you will see a difference with time, you will need to make sure it executes for a millisecond or more. Try putting it into a loop like this:
start = time.time()
k = 100000
for i in range(k)
solution = dynamique(ville)
end = time.time()
return ((end - start)/k, solution)
In the final tally, you then need to divide by the number of loop iterations to know how long your code actually runs once through. You may need to increase k to get a good measure of the execution time, or you may need to decrease it if your computer is running in the loop for a very long time.
I'm developing an app in Google App Engine. One of my methods is taking never completing, which makes me think it's caught in an infinite loop. I've stared at it, but can't figure it out.
Disclaimer: I'm using http://code.google.com/p/gaeunitlink text to run my tests. Perhaps it's acting oddly?
This is the problematic function:
def _traverseForwards(course, c_levels):
''' Looks forwards in the dependency graph '''
result = {'nodes': [], 'arcs': []}
if c_levels == 0:
return result
model_arc_tails_with_course = set(_getListArcTailsWithCourse(course))
q_arc_heads = DependencyArcHead.all()
for model_arc_head in q_arc_heads:
for model_arc_tail in model_arc_tails_with_course:
if model_arc_tail.key() in model_arc_head.tails:
result['nodes'].append(model_arc_head.sink)
result['arcs'].append(_makeArc(course, model_arc_head.sink))
# rec_result = _traverseForwards(model_arc_head.sink, c_levels - 1)
# _extendResult(result, rec_result)
return result
Originally, I thought it might be a recursion error, but I commented out the recursion and the problem persists. If this function is called with c_levels = 0, it runs fine.
The models it references:
class Course(db.Model):
dept_code = db.StringProperty()
number = db.IntegerProperty()
title = db.StringProperty()
raw_pre_reqs = db.StringProperty(multiline=True)
original_description = db.StringProperty()
def getPreReqs(self):
return pickle.loads(str(self.raw_pre_reqs))
def __repr__(self):
return "%s %s: %s" % (self.dept_code, self.number, self.title)
class DependencyArcTail(db.Model):
''' A list of courses that is a pre-req for something else '''
courses = db.ListProperty(db.Key)
def equals(self, arcTail):
for this_course in self.courses:
if not (this_course in arcTail.courses):
return False
for other_course in arcTail.courses:
if not (other_course in self.courses):
return False
return True
class DependencyArcHead(db.Model):
''' Maintains a course, and a list of tails with that course as their sink '''
sink = db.ReferenceProperty()
tails = db.ListProperty(db.Key)
Utility functions it references:
def _makeArc(source, sink):
return {'source': source, 'sink': sink}
def _getListArcTailsWithCourse(course):
''' returns a LIST, not SET
there may be duplicate entries
'''
q_arc_heads = DependencyArcHead.all()
result = []
for arc_head in q_arc_heads:
for key_arc_tail in arc_head.tails:
model_arc_tail = db.get(key_arc_tail)
if course.key() in model_arc_tail.courses:
result.append(model_arc_tail)
return result
Am I missing something pretty obvious here, or is GAEUnit acting up?
Also - the test that is making this run slow has no more than 5 models of any kind in the datastore. I know this is potentially slow, but my app only does this once then subsequently caches it.
Ignoring the commented out recursion, I don't think this should be an infinite loop - you are just doing some for-loops over finite results sets.
However, it does seem like this would be really slow. You're looping over entire tables and then doing more datastore queries in every nested loop. It seems unlikely that this sort of request would complete in a timely manner on GAE unless your tables are really, really small.
Some rough numbers:
If H = # of entities in DepedencyArcHead and T = average # of tails in each DependencyArcHead then:
_getListArcTailsWithCourse is doing about H*T queries (understimate). In the "worst" case, the result returned from this function will have H*T elements.
_traverseForwards loops over all these results H times, and thus does another H*(H*T) queries.
Even if H and T are only on the order of 10s, you could be doing thousands of queries. If they're bigger, then ... (and this ignores any additional queries you'd do if you uncommented the recursive call).
In short, I think you may want to try to organize your data a little differently if possible. I'd make a specific suggestion, but what exactly you're trying to do isn't clear to me.