This might be a dumb question, but say I want to build a program from the bottom up like so:
class Atom(object):
    def __init__(self):
        '''
        Constructor
        '''
    def atom(self, foo, bar):
        #...with foo and bar being arrays of atom Params of lengths m & n
        "Do what atoms do"
        return atom_out
...I can put my instances in a dictionary:
class Molecule(Atom):
    def __init__(self):
        pass
    def structure(self, a, b):
        #a = 2D array of size (num_of_atoms, m); 'foo' Params for each atom
        #b = 2D array of size (num_of_atoms, n); 'bar' Params for each atom
        unit = self.atom
        fake_array = {"atom1": unit(a[0], b[0]),
                      "atom2": unit(a[1], b[1]),
                      : : :
                      : : :}
    def chemicalBonds(self, this, that, theother):
        : : :
        : : :
My question is: is there a way to do this with numpy arrays so that each element in "real_array" would be an instance of atom, i.e., the output of the individual computations of the atom function? I can extend this to class Water(Molecule): which would perform fast numpy operations on the large structure and chemicalBonds outputs, hence the need for arrays... Or is it the case that I'm going about this the wrong way?
Also, if I am on the right track, I'd appreciate any tips on how to structure a "hierarchical program" like this, as I'm not sure I'm doing the above correctly and recently discovered that I don't know what I'm doing.
Thanks in advance.
The path to hell is paved with premature optimization... As a beginner in Python, focus on your program and what it is supposed to do; once it does that too slowly, you can ask focused questions about how to make it faster. I would stick with learning Python's intrinsic data structures for managing your objects. You can implement your algorithms using numpy arrays with standard data types if you are doing large array operations. Once you have some working code, you can do performance testing to determine where you need optimization.
Numpy does allow you to create arrays of objects, and I will give you enough rope to hang yourself with below, but creating an ecosystem of tools to operate on those arrays of objects is not a trivial undertaking. You should first work with Python data structures (buy Beazley's Python Essential Reference), then with numpy's built-in types, then with your own compound numpy types. As a last resort, use the object type from the example below.
Good luck!
David
import numpy

class Atom(object):
    def atoms_method(self, foo, bar):
        #...with foo and bar being arrays of Params of length m & n
        atom_out = foo + bar
        return atom_out

# dtype=object lets each element hold an arbitrary Python object
array = numpy.empty((10,), dtype=object)
for i in range(10):
    array[i] = Atom()
for i in range(10):
    print(array[i].atoms_method(i, 5))
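The "compound numpy types" step mentioned above might look roughly like the following. This is only a sketch: the field names foo and bar come from the question, but the sizes (m = 3, n = 2, four atoms) and the sum used as the "atom" computation are made-up placeholders, not anything the question specifies.

```python
import numpy as np

# Assumed sizes for illustration only: m = 3 'foo' params, n = 2 'bar' params.
atom_dtype = np.dtype([("foo", np.float64, (3,)),
                       ("bar", np.float64, (2,))])

# One record per atom; all parameters live in a single contiguous array.
atoms = np.zeros(4, dtype=atom_dtype)
atoms["foo"] = np.arange(12).reshape(4, 3)
atoms["bar"] = 1.0

# Placeholder "atom" computation, applied to every atom at once
# instead of looping over Python objects.
atom_out = atoms["foo"].sum(axis=1) + atoms["bar"].sum(axis=1)
print(atom_out)
```

Compared with an object array, the fields here stay plain numeric data, so ordinary vectorised numpy operations keep working on them.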
So far I have mostly used Python for data analysis, but for some time now I have been trying to implement things with it. Right now I'm trying to implement a toxicokinetic-toxicodynamic model for a fish to analyse the effect of chemicals on them.
So given the following code:
import numpy as np
class fish():
    def __init__(self):
        self.resistance_threshold = np.random.normal(0,1)
My question is now: say I would like to initialize multiple instances of the fish class (say 1000 fish), each with a different resistance to a chemical, in order to model an agent-based population. How could one achieve this automatically?
I was wondering if there is something like using an index as part of the variable name, e.g.:
for fishid in range(0,1000):
    fishfishid = fish() # use here the value of fishid to become the variable's name. E.g. fish1, fish2, fish3, ..., fish999
Now even if there is a possibility to do this in Python, I always have the feeling that creating those 1000 instances this way is kind of bad practice. I was wondering if there is an OOP-style Python approach, such as setting up a class "population" which initializes the fish within its own __init__ function, but how would I assign the fish without initializing them first?
Any tips, pointers or links would be greatly appreciated.
You can create a class FishPopulation and store in it as many Fish as you need, based on the size argument. For example, something like this would work:
import numpy as np

class Fish:
    def __init__(self):
        self.resistance_threshold = np.random.normal(0, 1)

class FishPopulation:
    def __init__(self, size=1000):
        self.size = size
        self.fishes = [Fish() for _ in range(size)]
You can iterate over it like this:
fish_population = FishPopulation(size=10)
for fish in fish_population.fishes:
    print(fish.resistance_threshold)
>>>
-0.9658927669391391
-0.5934917229482478
0.8827336199040103
-1.5729644992077412
-0.7682070400307331
1.464407499255235
0.7724449293785645
-0.7296586180041732
-1.1989783570280217
0.15716170041128566
And you can access an individual fish by its index like this:
print(fish_population.fishes[0].resistance_threshold)
>>> -0.9658927669391391
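If the fish only carry numeric attributes, another option is to keep one numpy array per attribute instead of one object per fish. The sketch below is an assumption-laden variant of the class above: the seed parameter and the survivors method (including its "threshold greater than dose" rule) are hypothetical and only there to show a vectorised operation over the whole population.

```python
import numpy as np

class FishPopulation:
    def __init__(self, size=1000, seed=None):
        # seed is an added, optional parameter for reproducible draws
        rng = np.random.default_rng(seed)
        self.size = size
        # One entry per fish, drawn in a single call instead of a loop
        self.resistance_threshold = rng.normal(0, 1, size=size)

    def survivors(self, dose):
        # Hypothetical rule for illustration: a fish survives if its
        # threshold exceeds the dose. Returns a boolean mask.
        return self.resistance_threshold > dose

population = FishPopulation(size=10, seed=0)
mask = population.survivors(0.0)
print(mask.sum(), "of", population.size, "fish above the dose")
```

The list-of-objects version is easier to extend with per-fish behaviour; the array version is usually faster when the same calculation is applied to every fish.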
I'm trying to implement a new data type "Fractions" in Python to represent fractions, where the numerator and denominator are both integers. Moreover, I have to implement the four basic arithmetic operations. The trick is, I can't use classes in this task.
I thought maybe tuples could be a good idea, but I really don't know how to approach this.
Is there an easy way to solve such a problem? Any hint would really help me.
You have two problems. 1) How to encapsulate the data, and 2) How to operate on the data.
First, let's solve encapsulation. Just put everything you need in a tuple:
half = (1,2)
whole = (1,1)
answer = (42,1)
See? The first item is the numerator, the second is the denominator.
Now you need a way to operate on the data. Since we can't use methods, we'll just use regular functions:
def mul(a,b):
    'Multiply two fractions'
    return (a[0]*b[0], a[1]*b[1])
Similarly, implement add(a,b), negate(a), sub(a,b), etc. You might need a simplify(), so you don't end up with 10240000/20480000 after a while.
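One possible way to fill in those remaining functions is sketched below. The function names follow the ones suggested above; reducing with math.gcd and keeping the sign on the numerator are choices made here, not requirements of the task.

```python
from math import gcd

def simplify(a):
    'Reduce a fraction to lowest terms, keeping the denominator positive'
    num, denom = a
    if denom == 0:
        raise ZeroDivisionError('fraction with zero denominator')
    g = gcd(num, denom)
    if denom < 0:
        g = -g  # dividing by a negative g moves the sign to the numerator
    return (num // g, denom // g)

def add(a, b):
    'Add two fractions'
    return simplify((a[0]*b[1] + b[0]*a[1], a[1]*b[1]))

def negate(a):
    'Return the additive inverse of a fraction'
    return (-a[0], a[1])

def sub(a, b):
    'Subtract fraction b from fraction a'
    return add(a, negate(b))

print(add((1, 2), (1, 4)))             # (3, 4)
print(simplify((10240000, 20480000)))  # (1, 2)
```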
To make our object-oriented-without-classes suite complete, we need a constructor:
def make_frac(num, denom):
    'Create a fraction with the indicated numerator and denominator'
    return (num, denom)
Finally, place all of these functions in a module, and your task is complete. The user of your library will write something like this:
import your_fraction_lib
half = your_fraction_lib.make_frac(1,2)
quarter = your_fraction_lib.mul(half, half)
three_quarters = your_fraction_lib.add(half, quarter)
If you want to troll your teacher, you could do something along the lines of:
def construct(values):
    def mul(other_fraction):
        new_numerator = values['numerator']*other_fraction['values']['numerator']
        new_denominator = values['denominator']*other_fraction['values']['denominator']
        new_values = {'numerator':new_numerator,'denominator':new_denominator}
        return(construct(new_values))
    return({'values':{'numerator':values['numerator'],'denominator':values['denominator']},'mul':mul})
This allows you to construct objects that contain a mul function that acts much like a class method:
x = construct({'numerator':1,'denominator':2})
y = construct({'numerator':3,'denominator':5})
product = x['mul'](y)
print(product['values']['numerator'],product['values']['denominator'])
>>> 3 10
I am building a data simulation framework with a numpy ORM, where it is much more convenient to work with classes and objects instead of numpy arrays directly. Nevertheless, the output of the simulation should be a numpy array. Also blockz is quite interesting as a backend here.
I would like to map all object attributes to numpy arrays, so that the numpy arrays work like column-oriented "persistent" storage for my classes. I also need to attach "new" attributes to objects, which I can calculate using the numpy (pandas) framework and then link to the objects through the same back-end.
Is there any solution for such an approach? Would you recommend a way to build it in an HPC way?
I have found only django-pandas. PyTables is quite slow at adding new column attributes.
Something like (working on pointers to np_array):
class Instance():
    def __init__(self, np_array, np_position):
        self.np_array = np_array
        self.np_position = np_position
    def get_test_property(self):
        return self.np_array[self.np_position]
    def set_test_property(self, value):
        self.np_array[self.np_position] = value
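In fact, there is a way to change NumPy or bcolz arrays by reference: a slice of an array is a view onto the same memory, so writing through the slice updates the original array.
A simple example can be found in the following code.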
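import numpy as np

a = np.arange(10)

class Case():
    def __init__(self, gcv_pointer):
        # gcv_pointer is a slice (a view) of the backing array, not a copy
        self.gcv = gcv_pointer
    def GetGCV(self):
        return self.gcv
    def SetGCV(self, value):
        # Write through the view so the backing array is updated in place
        self.gcv[:] = value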
#===============================================================================
# NumPy
#===============================================================================
caseList = []
for i in range(1, 10):
    case = Case(a[i-1:i])
    caseList.append(case)
gcvs = [case.GetGCV() for case in caseList]
caseList[1].SetGCV(5)
caseList[1].SetGCV(13)
caseList[1].gcv[:] = 6
setattr(caseList[1], 'dpd', a[5:6])
caseList[1].dpd
caseList[1].dpd[:] = 888
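The same idea can be wrapped in Python properties so that attribute access reads from and writes to the column arrays directly. The following is a minimal sketch, not an existing library: the Record class, the columns dictionary, and the choice of gcv as the only column are all assumptions made for illustration.

```python
import numpy as np

class Record:
    """Hypothetical row proxy: stores no data itself, only a reference
    to the shared column arrays and its own row index."""
    def __init__(self, columns, index):
        self._columns = columns
        self._index = index

    @property
    def gcv(self):
        # Read straight from the backing column
        return self._columns["gcv"][self._index]

    @gcv.setter
    def gcv(self, value):
        # Write straight into the backing column
        self._columns["gcv"][self._index] = value

# Column-oriented storage: one array per attribute
columns = {"gcv": np.arange(10, dtype=float)}
records = [Record(columns, i) for i in range(10)]

records[1].gcv = 6.0        # the object writes into the array
columns["gcv"][2] = 42.0    # array changes are visible through the object
print(columns["gcv"][1], records[2].gcv)
```

Adding a new attribute then amounts to adding a new array to columns, and the simulation output is already in array form.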
I want to use support_code to define functions that interact with n-dimensional numpy arrays. Inside the code argument, the FOO3(i, j, k) notation works, but only there, not in support_code. Something like this:
import scipy
import scipy.weave
code = '''return_val = f(1);'''
support_code = '''int f(int i) {
    return FOO3(i, i, i);
}'''
foo = scipy.arange(3**3).reshape(3,3,3)
print(scipy.weave.inline(code, ['foo'], support_code=support_code))
The concept of support code is mainly to do some includes. In your case, I guess the function should look something like this:
import scipy
import scipy.weave
def foofunc(i):
    foo = scipy.arange(3**3).reshape(3,3,3)
    code = '''/* do something lengthy with foo and maybe i */'''
    scipy.weave.inline(code, ['foo', 'i'])
    return foo[i,i,i]
You don't need support code at all for what you're trying to do. You also gain no speed by doing the function return in C instead of in Python, and the cost of the array access is negligible compared to the cost of the function call. To get a better idea of when and how weave can help you speed up your code, have a look here.
A few weeks ago I asked a question on increasing the speed of a function written in Python. At that time, TryPyPy brought to my attention the possibility of using Cython for doing so. He also kindly gave an example of how I could Cythonize that code snippet. I want to do the same with the code below to see how fast I can make it by declaring variable types. I have seen the tutorial on cython.org, but I still have a couple of closely related questions:
I don't know any C. What parts do I need to learn, to use Cython to declare variable types?
What is the C type corresponding to Python lists and tuples? For example, I can use double in Cython for float in Python. What do I do for lists? In general, where do I find the corresponding C type for a given Python type?
Any example of how I could Cythonize the code below would be really helpful. I have inserted comments in the code that give information about the variable type.
import math
import random

class Some_class(object):
    # ... other attributes and functions ...
    def update_awareness_status(self, this_var, timePd):
        '''Inputs: this_var (type: float)
                   timePd (type: int)
           Output: None'''
        max_number = len(self.possibilities)
        # self.possibilities is a list of tuples.
        # Each tuple is a pair of person objects.
        k = int(math.ceil(0.3 * max_number))
        actual_number = random.choice(range(k))
        chosen_possibilities = random.sample(self.possibilities,
                                             actual_number)
        if len(chosen_possibilities) > 0:
            # chosen_possibilities is a list of tuples, each tuple is a pair
            # of person objects. I have included the code for the Person class
            # below.
            for p1,p2 in chosen_possibilities:
                # awareness_status is a tuple (float, int)
                if p1.awareness_status[1] < p2.awareness_status[1]:
                    if p1.value > p2.awareness_status[0]:
                        p1.awareness_status = (this_var, timePd)
                    else:
                        p1.awareness_status = p2.awareness_status
                elif p1.awareness_status[1] > p2.awareness_status[1]:
                    if p2.value > p1.awareness_status[0]:
                        p2.awareness_status = (price, timePd)
                    else:
                        p2.awareness_status = p1.awareness_status
                else:
                    pass

class Person(object):
    def __init__(self,id, value):
        self.value = value
        self.id = id
        self.max_val = 50000
        ## Initial awareness status.
        self.awareness_status = (self.max_val, -1)
As a general note, you can see exactly what C code Cython generates for every source line by running the cython command with the -a "annotate" option. See the Cython documentation for examples. This is extremely helpful when trying to find bottlenecks in a function's body.
Also, there's the concept of "early binding for speed" when Cython-ing your code. Python objects (like instances of your Person class) use general Python code for attribute access, which is slow in an inner loop. I suspect that if you change the Person class to a cdef class, you will see some speedup. You also need to type the p1 and p2 objects in the inner loop.
Since your code has lots of Python calls (random.sample for example), you likely won't get huge speedups unless you find a way to put those lines into C, which takes a good amount of effort.
You can type things as a tuple or a list, but it doesn't often mean much of a speedup. Better to use C arrays when possible; something you'll have to look up.
I get a factor of 1.6 speedup with the trivial modifications below. Note that I had to change some things here and there to get it to compile.
ctypedef int ITYPE_t

cdef class CyPerson:
    # These attributes are placed in the extension type's C-struct, so C-level
    # access is _much_ faster.
    cdef ITYPE_t value, id, max_val
    cdef tuple awareness_status

    def __init__(self, ITYPE_t id, ITYPE_t value):
        # The __init__ function is much the same as before.
        self.value = value
        self.id = id
        self.max_val = 50000
        ## Initial awareness status.
        self.awareness_status = (self.max_val, -1)

NPERSONS = 10000

import math
import random

class Some_class(object):
    def __init__(self):
        ri = lambda: random.randint(0, 10)
        self.possibilities = [(CyPerson(ri(), ri()), CyPerson(ri(), ri())) for i in range(NPERSONS)]

    def update_awareness_status(self, this_var, timePd):
        '''Inputs: this_var (type: float)
                   timePd (type: int)
           Output: None'''
        cdef CyPerson p1, p2
        price = 10
        max_number = len(self.possibilities)
        # self.possibilities is a list of tuples.
        # Each tuple is a pair of person objects.
        k = int(math.ceil(0.3 * max_number))
        actual_number = random.choice(range(k))
        chosen_possibilities = random.sample(self.possibilities,
                                             actual_number)
        if len(chosen_possibilities) > 0:
            # chosen_possibilities is a list of tuples, each tuple is a pair
            # of CyPerson objects (defined above).
            for persons in chosen_possibilities:
                p1, p2 = persons
                # awareness_status is a tuple (float, int)
                if p1.awareness_status[1] < p2.awareness_status[1]:
                    if p1.value > p2.awareness_status[0]:
                        p1.awareness_status = (this_var, timePd)
                    else:
                        p1.awareness_status = p2.awareness_status
                elif p1.awareness_status[1] > p2.awareness_status[1]:
                    if p2.value > p1.awareness_status[0]:
                        p2.awareness_status = (price, timePd)
                    else:
                        p2.awareness_status = p1.awareness_status
C does not directly know the concept of lists.
The basic data types are int (char, short, long), float/double (all of which have pretty straightforward mappings to Python) and pointers.
If the concept of pointers is new to you, have a look at: Wikipedia:Pointers
Pointers can then be used as tuple/array replacements in some cases. Pointers to chars are the basis for all strings.
Say you have an array of integers. You would store it as a contiguous chunk of memory with a start address, and declare both the type (int) and that it is a pointer (*):
cdef int * array;
Now you can access each element of the array like this:
array[0] = 1
However, the memory has to be allocated (e.g. using malloc), and advanced indexing will not work (e.g. array[-1] will be random data in memory; this also holds for indexes exceeding the width of the reserved space).
More complex types don't directly map to C, but often there is a C way to do something that might not require the Python types (e.g. a for loop does not need a range array/iterator).
As you noticed yourself, writing good Cython code requires more detailed knowledge of C, so working through a C tutorial is probably the best next step.