I am building a 3D scatter plot using primitive UV spheres and am running into memory issues when attempting to create more than a couple hundred points at one time. I am limited by my laptop's 2.1 GHz processor, but wanted to know if there is a better way to write this:
import bpy
import random

count = 0  # the loop counter must exist before the loop starts
while count < 5:
    bpy.ops.mesh.primitive_uv_sphere_add(
        size=.3,
        location=(random.randint(-9, 9), random.randint(-9, 9),
                  random.randint(-9, 9)),
        rotation=(0, 0, 0))
    count += 1
I realize that with such a simple script any performance increase is likely negligible but wanted to give it a shot anyway.
Some possible suggestions
I would pre-calculate the x, y, z values, store each set in a mathutils Vector, and add it to a dict to be iterated over.
Duplication should provide a smaller memory footprint than instantiating new objects:
bpy.ops.object.duplicate_move(OBJECT_OT_duplicate={"linked": False}, TRANSFORM_OT_translate={"value": (x, y, z)})
Edit:
After further research, it appears that each time a bpy.ops.* operator is called, the redraw function is triggered. One user documented an exponential increase in the time taken to generate UV spheres.
CoDEmanX provided the following code snippet to another user.
import bpy

bpy.ops.object.select_all(action='DESELECT')
bpy.ops.mesh.primitive_uv_sphere_add()
sphere = bpy.context.object

for i in range(-1000, 1000, 2):
    ob = sphere.copy()
    ob.location.y = i
    #ob.data = sphere.data.copy() # uncomment this, if you want full copies and no linked duplicates
    bpy.context.scene.objects.link(ob)
bpy.context.scene.update()
Then it is just a case of adapting the code to set the object locations
ob.location = location_dict[i]
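As a rough sketch of the pre-calculation step, the locations can be generated up front in plain Python and looked up by index when each copy is positioned. The helper name make_locations and its parameters are illustrative assumptions, not part of Blender's API. Plain tuples are used so the snippet also runs outside Blender; a mathutils Vector could be substituted inside it.

```python
import random


def make_locations(n, low=-9, high=9, seed=None):
    """Return a dict mapping a point index to a random integer (x, y, z) tuple.

    Hypothetical helper: the name, arguments and integer grid are assumptions
    made for illustration only.
    """
    rng = random.Random(seed)  # private generator, so results can be reproduced
    return {
        i: (rng.randint(low, high), rng.randint(low, high), rng.randint(low, high))
        for i in range(n)
    }


# Pre-compute every location once, before touching the scene.
location_dict = make_locations(200, seed=1)
```

Inside Blender, each copied object would then be positioned with the location_dict[i] lookup shown above, so no random numbers are drawn inside the loop that creates the objects.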
I'm running python code that's similar to:
import numpy

def get_user_group(user, groups):
    if not user.group_id:
        user.group_id = assign(groups)
    return user.group_id

def assign(groups):
    # start from empty lists on every call so earlier groups cannot leak in
    ids = []
    percentages = []
    for group in groups:
        ids.append(group.id)
        percentages.append(group.percentage) # e.g. .33
    assignment = numpy.random.choice(ids, p=percentages)
    return assignment
We are running this in the wild against tens of thousands of users. I've noticed that the assignments do not respect the actual group percentages. For example, if our percentages are [.9, .1], we see a consistent hour-over-hour split of 80% and 20%. We've confirmed that the inputs to the choice function are correct, yet the observed behavior does not match them.
Does anyone have a clue why this could be happening? Is it because we are using the global numpy? Some groups will be split between [.9, .1] while others are [.33,.34,.33] etc. Is it possible that different sets of groups are interfering with each other?
We are running this code in a python flask web application on a number of nodes.
Any recommendations on how to get reliable "random" weighted choice?
This exceeded the length limit for a comment, so I am posting it here.
The fact that your team was not able to reproduce the problem but got proper results is a sign that most probably NumPy can suit your needs. You can benefit from NumPy later, when you need efficiency, and it can be seen that efficiency is not your concern now.
A more complete code and infrastructure setup on your nodes would be helpful though. How often do you restart your Flask server? Where do you initialize the NumPy random generator? Consider the following code that creates a page /random which can be customized with size, e.g: localhost:5000/random?size=20:
from flask import Flask, request
import numpy
import pandas

... # your webapp

numpy.random.seed(0)

@app.route('/random', methods=['GET'])
def random():
    """Gives the desired number of random numbers
    with the state of the random number generator.
    """
    # DON'T PUT numpy.random.seed(0) HERE
    size = request.args.get('size')
    if size is not None:
        size = int(size)
    else:
        size = 1
    state = numpy.random.get_state()
    data = numpy.random.random(size=size)
    table = pandas.DataFrame(data=data)
    return table.to_html() + repr(state)
In this example, the state is initialized once after the Flask app is started. Whenever the /random page is requested, good random numbers are generated.
If you put the state initialization inside the function, it would surely cause unexpected distributions, because you would get the same random numbers (and the same choices) on every request.
If you use multiple nodes and initialize with the same seed, your different nodes will produce the same choice again. In this case, use the unique node ids as seed values. If you restart the servers often, concatenate the restart ID or timestamp to the unique node ID. It is also a good idea to ensure that the timestamp is logged.
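A minimal sketch of the per-node seeding idea, assuming each node knows an integer id and a restart timestamp (both hypothetical inputs here). It uses NumPy's newer Generator API (numpy.random.default_rng) instead of the global numpy.random state; that is a substitution on my part, chosen because each generator then carries its own independent state.

```python
import time

import numpy as np


def make_node_rng(node_id, restart_stamp=None):
    """Build a generator seeded from a node id and a restart timestamp.

    Both arguments are assumptions for illustration; use whatever uniquely
    identifies a node and one of its restarts in your deployment.
    """
    if restart_stamp is None:
        restart_stamp = int(time.time())
    # Log the seed material so a run can be reproduced later.
    print("seeding node %d with restart stamp %d" % (node_id, restart_stamp))
    return np.random.default_rng([node_id, restart_stamp])


def assign(rng, ids, percentages):
    """Pick one id according to the given weights."""
    return rng.choice(ids, p=percentages)


rng_a = make_node_rng(1, restart_stamp=1700000000)
rng_b = make_node_rng(2, restart_stamp=1700000000)

# Many draws from one node should land close to the requested 90/10 split.
draws_a = rng_a.choice(["A", "B"], size=100000, p=[0.9, 0.1])
share_a = float(np.mean(draws_a == "A"))

# Two nodes with different ids produce different streams.
stream_a = rng_a.random(5)
stream_b = rng_b.random(5)
```

Create the generator once at application start-up, as with numpy.random.seed(0) in the example above, and not inside the request handler.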
I have a piece of code which calculates the positions of some satellites and planets using Skyfield. For clarity, I use a Pandas DataFrame as a container for the positions and the corresponding times. I want to make the calculation parallel, but I always get the same error: TypeError: can't pickle Satrec objects. I tested different parallelizers: Dask, pandarallel, swifter and Pool.map().
Example of piece of code to be parallelized:
def get_sun_position(self, row):
    t = self.ts.utc(row["Date"]) # from skyfield
    pos = self.earth.at(t).observe(self.sun).apparent().position.m # from skyfield, error is here
    return pos

def get_sat_position(self, row):
    t = self.ts.utc(row["Date"]) # from skyfield
    pos = self.sat.at(t).position.m # from skyfield, error is here
    return pos

def get_positions(self):
    self.df["sat_pos"] = self.df.swifter.apply(self.get_sat_position, axis=1) # all the parallelization goes here
    self.df["sun_pos"] = self.df.swifter.apply(self.get_sun_position, axis=1) # and here
    # the same implementation but using dask
    # self.df["sat_pos"] = dd.from_pandas(self.df, npartitions=4*cpu_count())\
    #     .map_partitions(lambda df : df.apply(lambda row : self.get_sat_position(row),axis=1))\
    #     .compute(scheduler='processes')
    # self.df["sun_pos"] = dd.from_pandas(self.df, npartitions=4*cpu_count())\
    #     .map_partitions(lambda df : df.apply(lambda row : self.get_sun_position(row),axis=1))\
    #     .compute(scheduler='processes')
To make Dask avoid pickle, I tried setting the serialization manually with serializers=['dask', 'pickle'], but it didn't help.
As I understand it, Skyfield uses sgp4, which contains the Satrec class.
I am wondering whether there is some way to parallelize this .apply(). Or maybe I should not try to use Skyfield functions for parallel processing at all?
Alas, all of the mechanisms you are using to make the computation parallel do so by creating another process and then sending copies of all of the objects involved in the computation over to the other process. The Satrec object is written in C++, not Python, to make it faster, and C++ objects have no native way to "serialize" themselves into bytes for transmission to another process. (Python objects have that ability built-in.)
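The failure can be reproduced without Skyfield. The sketch below uses a hypothetical Handle class that wraps a threading.Lock as a stand-in for an extension object that refuses to be pickled; it is not Skyfield or sgp4 code. It also shows one common workaround, offered here as an assumption and not as something Skyfield prescribes: send the plain, picklable data across the process boundary and rebuild the object on the other side.

```python
import pickle
import threading


class Handle:
    """Stand-in for an object that cannot be pickled (hypothetical)."""

    def __init__(self, text):
        self.text = text
        self._lock = threading.Lock()  # lock objects refuse to be pickled


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False if it raises TypeError."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


handle_ok = can_pickle(Handle("orbit data"))  # the wrapper object itself
text_ok = can_pickle("orbit data")            # the plain string it was built from

# Workaround sketch: ship the picklable description, rebuild on arrival.
payload = pickle.dumps("orbit data")
rebuilt = Handle(pickle.loads(payload))
```

With a real satellite the same idea would mean passing the source data the object was constructed from to each worker and constructing the object inside the worker.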
Have you profiled your code to see what the most expensive steps are? My guess is that most of your expense is in the Sun computation, because to achieve its high precision Skyfield needs to compute the Earth's orientation to very high accuracy to give the Sun's position in the sky to high enough precision for even radio astronomers.
But if you yourself don't need that high an accuracy, you could switch to lower-precision sky coordinates for the Sun. Before using t in get_sun_position(), try doing this to it:
from skyfield.nutationlib import iau2000b
t._nutation_angles = iau2000b(t.tt)
That will use a lower precision estimate of the Earth's nutation (print out the values before and after this change to see how big the difference is, and compare that to how much inaccuracy your application can stand), but also hopefully run faster.
The Maya python code below gives a nurbs boolean surface by first taking the difference of two nurbs spheres, nurbsSphere1 and nurbsSphere2, to give the nurbs surface nurbsBooleanSurface1. It then takes the difference of this surface and a third sphere, nurbsSphere3. The result, as seen in the outliner, is the three nurbs spheres plus a surfaceVarGroup, nurbsBooleanSurface1, which 'parents' three transform nodes nurbsBooleanSurface1_1, nurbsBooleanSurface1_2 and nurbsBooleanSurface1_3.
import maya.cmds as cmds
cmds.sphere(nsp=10, r=50)
cmds.sphere(nsp=4, r=5)
cmds.setAttr("nurbsSphere2.translateX",-12.583733)
cmds.setAttr("nurbsSphere2.translateY",-2.2691557)
cmds.setAttr("nurbsSphere2.translateZ",48.33736)
cmds.nurbsBoolean("nurbsSphere1", "nurbsSphere2", nsf=1, op=1)
cmds.sphere(nsp=4, r=5)
cmds.setAttr("nurbsSphere3.translateX",-6.7379503)
cmds.setAttr("nurbsSphere3.translateY",3.6949043)
cmds.setAttr("nurbsSphere3.translateZ",49.40595)
cmds.nurbsBoolean("nurbsBooleanSurface1", "nurbsSphere3", nsf=1, op=1)
print(cmds.ls("nurbsBooleanSurface1_*", type="transform"))
Strangely (to me), the list command, cmds.ls("nurbsBooleanSurface1_*", type="transform") only yields [u'nurbsBooleanSurface1_1', u'nurbsBooleanSurface1_2']; nurbsBooleanSurface1_3 is missing.
But when, after having executed the above code, the print command
print(cmds.ls("nurbsBooleanSurface1_*", type="transform"))
is re-executed, the result is [u'nurbsBooleanSurface1_1', u'nurbsBooleanSurface1_2', u'nurbsBooleanSurface1_3'].
I've tried delaying the execution of the final print command using time.sleep(n) to no avail. I've played with the idea that the missing node might have spun off into another namespace and then re-appeared at the completion of the execution block (desperate, I know!). I've experimented with renaming the spheres and surfaces, using functions and threads (the latter only superficially). The cause of the unlisted nurbsBooleanSurface1_3 on the first execution of
print(cmds.ls("nurbsBooleanSurface1_*", type="transform"))
remains a mystery. Any help would be much appreciated.
A dirty way (but the only way I could find) is to call cmds.refresh() during the script.
I have rewritten your script here. Notice that I store each sphere in a variable; this is good practice because the script will still work even if an existing object is already called nurbsSphere3, for example.
import maya.cmds as cmds
sphere1 = cmds.sphere(nsp=10, r=50)
sphere2 = cmds.sphere(nsp=4, r=5)
cmds.setAttr(sphere2[0] + ".translateX",-12.583733)
cmds.setAttr(sphere2[0] + ".translateY",-2.2691557)
cmds.setAttr(sphere2[0] + ".translateZ",48.33736)
nurbsBool1 = cmds.nurbsBoolean(sphere1[0], sphere2[0], nsf=1, op=1)
sphere3 = cmds.sphere(nsp=4, r=5)
cmds.setAttr(sphere3[0] + ".translateX",-6.7379503)
cmds.setAttr(sphere3[0] + ".translateY",3.6949043)
cmds.setAttr(sphere3[0] + ".translateZ",49.40595)
nurbsBool2 = cmds.nurbsBoolean(nurbsBool1[0], sphere3[0], nsf=1, op=1)
cmds.refresh(currentView=True) # Force evaluation, of current view only
print(cmds.listRelatives(nurbsBool2[0], children=True, type="transform"))
When you create an object using cmds.sphere() it returns a list of the object name and more. To access this, you can use
mySphere = cmds.sphere()
print(mySphere)
# Result: [u'nurbsSphere1', u'makeNurbSphere1']
print(mySphere[0]) # the first element in the list is the object name
# Result: nurbsSphere1
The same is true for the boolean operation. Look in the documentation for the command under Return value http://help.autodesk.com/cloudhelp/2016/ENU/Maya-Tech-Docs/CommandsPython/index.html
Given 2 large arrays of 3D points (I'll call the first "source" and the second "destination"), I needed a function that would return, for each element of "source", the index of its closest element in "destination", with this limitation: I can only use numpy. So no scipy, pandas, numexpr, cython...
To do this I wrote a function based on the "brute force" answer to this question. I iterate over the elements of source, find the closest element from destination and return its index. Due to performance concerns, and again because I can only use numpy, I tried multithreading to speed it up. Here are both the threaded and unthreaded functions and how they compare in speed on an 8 core machine.
import timeit
import numpy as np
from numpy.core.umath_tests import inner1d
from multiprocessing.pool import ThreadPool

def threaded(sources, destinations):
    # Define worker function
    def worker(point):
        dlt = (destinations-point) # delta between destinations and given point
        d = inner1d(dlt,dlt) # get distances
        return np.argmin(d) # return closest index
    # Multithread!
    p = ThreadPool()
    return p.map(worker, sources)

def unthreaded(sources, destinations):
    results = []
    #for p in sources:
    for i in range(len(sources)):
        dlt = (destinations-sources[i]) # difference between destinations and given point
        d = inner1d(dlt,dlt) # get distances
        results.append(np.argmin(d)) # append closest index
    return results

# Setup the data
n_destinations = 10000 # 10k random destinations
n_sources = 10000 # 10k random sources
destinations= np.random.rand(n_destinations,3) * 100
sources = np.random.rand(n_sources,3) * 100

#Compare!
print 'threaded: %s'%timeit.Timer(lambda: threaded(sources,destinations)).repeat(1,1)[0]
print 'unthreaded: %s'%timeit.Timer(lambda: unthreaded(sources,destinations)).repeat(1,1)[0]
Results:
threaded: 0.894030461056
unthreaded: 1.97295164054
Multithreading seems beneficial, but I was hoping for more than a 2X increase, given that the real-life datasets I deal with are much larger.
All recommendations to improve performance (within the limitations described above) will be greatly appreciated!
Ok, I've been reading Maya documentation on python and I came to these conclusions/guesses:
They're probably using CPython inside (several references to that documentation and not any other).
They're not fond of threads (lots of non-thread safe methods)
Given the above, I'd say it's better to avoid threads. The GIL makes this a common problem, and there are several ways to work around it:
Try to build a C/C++ extension. Once that is done, use threads in C/C++. Personally, I'd only try to get SIP to work, and then move on.
Use multiprocessing. Even if your custom python distribution doesn't include it, you can get to a working version since it's all pure python code. multiprocessing is not affected by the GIL since it spawns separate processes.
The above should've worked out for you. If not, try another parallel tool (after some serious praying).
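A minimal sketch of the multiprocessing route, using only numpy and the standard library. The function names, the chunk size and the process count are assumptions chosen for illustration, and whether process pools start cleanly inside Maya's embedded interpreter is something you would need to check. Each worker handles a chunk of source points with one broadcasted distance computation, so the Python-level loop runs per chunk and not per point.

```python
from functools import partial
from multiprocessing import Pool

import numpy as np


def closest_chunk(chunk, destinations):
    """For each point in chunk, return the index of its nearest destination."""
    # (m, 1, 3) - (1, n, 3) broadcasts to (m, n, 3)
    delta = chunk[:, None, :] - destinations[None, :, :]
    # squared Euclidean distances, shape (m, n)
    dist2 = np.einsum('ijk,ijk->ij', delta, delta)
    return dist2.argmin(axis=1)


def closest_parallel(sources, destinations, processes=4, chunk_size=256):
    """Split sources into chunks and process them in separate processes."""
    chunks = [sources[i:i + chunk_size]
              for i in range(0, len(sources), chunk_size)]
    with Pool(processes) as pool:
        parts = pool.map(partial(closest_chunk, destinations=destinations),
                         chunks)
    return np.concatenate(parts)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    destinations = rng.random((2000, 3)) * 100
    sources = rng.random((1000, 3)) * 100
    print(closest_parallel(sources, destinations)[:10])
```

The chunk size bounds the temporary (m, n, 3) array, so it trades memory for fewer Python-level iterations. Even without the pool, calling closest_chunk on batches may already help compared with a per-point loop.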
On a side note, if you're using outside modules, take care to match Maya's version. This may have been the reason you couldn't build scipy. Of course, scipy has a huge codebase, and Windows is not the easiest platform to build things on.
I am developing an application for color blind people to enable them to surf the Internet smoothly. I have a set of colors, let's say A, which consists of all the colors seen by a color blind person. Set A is calculated using a big calculation involving millions of colors. Set A is independent of the inputs taken in my application, i.e. set A is like a 'constant' to me (just like 'pi' in mathematics). Now I want to store set A so that whenever I run my application, it is available without any added computational cost, i.e. I don't have to calculate A every time I run my application.
My Try:
I think this can be done by building a class having one constant but can it be done without creating any special class for just a constant?
I am using Python!
No need for a class. You want to store the calculated values on disk and load them back again on startup: for that you will want to look into the shelve or pickle libraries.
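A minimal sketch of the shelve variant, assuming a stand-in function for the real calculation. The names expensive_color_set and load_color_set, the key "A" and the cache path are all illustrative. The first call computes and stores the set; later calls, including those in later runs of the application, read it back from disk.

```python
import os
import shelve
import tempfile


def expensive_color_set():
    """Hypothetical stand-in for the long calculation that produces set A."""
    return frozenset((r, g, 0)
                     for r in range(0, 256, 51)
                     for g in range(0, 256, 51))


def load_color_set(path):
    """Return set A, computing it only if it is not already stored on disk."""
    with shelve.open(path) as db:
        if "A" not in db:
            db["A"] = expensive_color_set()  # computed once, then cached
        return db["A"]


# A temporary directory keeps the sketch self-contained; a real application
# would use a fixed location next to its other data files.
cache_path = os.path.join(tempfile.mkdtemp(), "color_cache")
first = load_color_set(cache_path)   # computes and stores
second = load_color_set(cache_path)  # loads from disk
```

shelve stores values with pickle under the hood, so the same caveats about what can be pickled apply.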
Yes, you can certainly do this with Python.
If your constant was just a number -- say, you had just discovered tau -- then you would just declare it in a module, and import that module in all of your other source files:
constants.py:
# Define my new super-useful number
TAU = 6.28318530718
everywhere else:
from constants import TAU # Look, no calculations!
Expanding a bit, if you had a more complicated structure, like a dictionary, that took you a long time to compute, then you could just declare that in your module instead:
constants.py:
# Verified results of the national survey
PEPSI_CHALLENGE = {
    'Pepsi': 0.57,
    'Coke': 0.43,
}
And you can do this for more and more complicated data. The problem, eventually, is that just writing your constants module gets harder and harder, the more complex your data is, and it can be especially hard to update if you occasionally recompute the value you want to cache. In that case, you want to look at pickling the data, possibly as the final step of a python script which calculates it, and then load that data in a module that you import.
To do that, import pickle, and dump a single object out to a disk file:
recalculate.py:
# Here is the script that computes a small value from the hugely complicated domain:
import random
from itertools import groupby
import pickle

# Collect all of the random numbers
random_numbers = [random.randint(0,10) for r in range(1000000)]

# TODO: Check this -- this should definitely be 7
# groupby yields (value, group) pairs; pick the value with the largest group
most_popular = max(groupby(sorted(random_numbers)),
                   key=lambda kv: len(list(kv[1])))[0]

# Now save the most common random number to disk, using pickle
# Almost any object is picklable like this, but check the docs for the exact details
with open('data_cache', 'wb') as f:
    pickle.dump(most_popular, f)
Now, in your constants file, you can simply read the pickled data from the file on disk, and have it available without recalculating it:
constants.py:
import pickle
with open('data_cache', 'rb') as f:
    most_popular = pickle.load(f)
everywhere else:
from constants import most_popular