I have some particle track data from an OpenFOAM simulation.
The data looks like this:
0.005 0.00223546 1.52096e-09 0.00503396
0.01 0.00220894 3.92829e-09 0.0101636
0.015 0.00218103 5.37107e-09 0.0154245
.....
The first column is time, followed by the x, y, z coordinates.
In my folder, I have a file for every tracked particle.
I would like to calculate the velocity and the displacement for each particle in each timestep.
It would be nice to access the position data in a way like particle[1].time[0.01].
Is there already a Python tool for this kind of problem?
Thanks a lot
If you have regular time steps, you can use a pandas DataFrame to find the differences:

import pandas as pd

dt = .005  # or whatever time difference you have
# assumes whitespace-separated columns t, x, y, z, as in your sample
df = pd.read_csv(filename, sep=r'\s+', header=None, names=['t', 'x', 'y', 'z'])
df['v_x'] = df['x'].diff()  # difference between consecutive x positions
df['v_x'] = df['v_x']/dt
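The displacement per timestep works the same way; a minimal sketch along the same lines (the column names t, x, y, z are an assumption carried over from the read_csv call above):

import numpy as np

# per-step displacement from the coordinate differences
df['displacement'] = np.sqrt(df['x'].diff()**2 + df['y'].diff()**2 + df['z'].diff()**2)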
You "almost" don't need numpy for that. I created a simple class hierarchy with some initial methods. You could improve from that if you like the approach. Note that I am creating from a string, you should use for line in file instead of the string.split way.
import numpy

class Track(object):
    def __init__(self):
        self.trackpoints = []

    def AddTrackpoint(self, line):
        tpt = Trackpoint(line)  # Trackpoint is a module-level class
        if self.trackpoints and tpt.t < self.trackpoints[-1].t:
            raise ValueError("timestamps should be in ascending order")
        self.trackpoints.append(tpt)
        return tpt

    def length(self):
        pairs = zip(self.trackpoints[:-1], self.trackpoints[1:])
        dists = map(self.distance, pairs)
        return sum(dists)

    def distance(self, points):
        p1, p2 = points
        return numpy.sqrt(sum((p2.pos - p1.pos)**2))  # only convenient use of numpy so far

class Trackpoint(object):
    def __init__(self, line):
        t, x, y, z = line.split(' ')
        self.t = float(t)  # convert so timestamp comparisons are numeric
        self.pos = numpy.array((x, y, z), dtype=float)
entries = """
0.005 0.00223546 1.52096e-09 0.00503396
0.01 0.00220894 3.92829e-09 0.0101636
0.015 0.00218103 5.37107e-09 0.0154245
""".strip()
lines = entries.split("\n")
track = Track()
for line in lines:
track.AddTrackpoint(line)
print track.length()
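If you also want the per-timestep velocities the question asks about, here is a sketch of a velocities method one could add to Track (the method name is mine; it relies on the ascending timestamps that AddTrackpoint enforces):

    def velocities(self):
        # (position difference) / (time difference) for consecutive trackpoints
        pairs = zip(self.trackpoints[:-1], self.trackpoints[1:])
        return [(p2.pos - p1.pos) / (p2.t - p1.t) for p1, p2 in pairs]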
Individual files would be easily loaded with something like:
import numpy as np
t, x, y, z = np.loadtxt(filename, delimiter=' ', unpack=True)
Now there is an issue, as you would like to index particle positions by time, whereas NumPy will only accept integers as indices.
edit: in Python you could make "position" a dictionary so that you can index it with a float, or anything else. But it comes down to the amount of data you have and what you want to do with it, because dictionaries will be less efficient than NumPy arrays for anything a little more 'advanced' than just picking the position at time t.
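For example, a minimal sketch of that dictionary approach, building on the loadtxt call above (float keys only work for timestamps that match exactly):

import numpy as np

t, x, y, z = np.loadtxt(filename, unpack=True)
# map each timestamp to its position vector, so position[0.01] works
position = {ti: np.array([xi, yi, zi]) for ti, xi, yi, zi in zip(t, x, y, z)}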
I am trying to iterate through a CSV file and create a numpy array for each row in the file, where the first column represents the x-coordinates and the second column represents the y-coordinates. I am then trying to append each array to a master array and return it.
import numpy as np

thedoc = open("data.csv")
headers = thedoc.readline()

def generatingArray(thedoc):
    masterArray = np.array([])
    for numbers in thedoc:
        editDocument = numbers.strip().split(",")
        x = editDocument[0]
        y = editDocument[1]
        createdArray = np.array((x, y))
        masterArray = np.append([createdArray])
    return masterArray

print(generatingArray(thedoc))
I am hoping to see an array with all the CSV info in it. Instead, I receive an error: "append() missing 1 required positional argument: 'values'".
Any help on where my error is and how to fix it is greatly appreciated!
Numpy arrays don't magically grow in the same way that Python lists do. You need to allocate the space for the array (your "masterArray = np.array([])" call) before you add everything to it.
The best answer is to import directly into a numpy array using something like genfromtxt (https://docs.scipy.org/doc/numpy-1.10.1/user/basics.io.genfromtxt.html) but...
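For instance, a minimal sketch of that direct import (assuming "data.csv" has the one header row your code skips):

import numpy as np

# read both columns straight into an (N, 2) float array
masterArray = np.genfromtxt("data.csv", delimiter=",", skip_header=1)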
If you know the number of lines you're reading in (or can get it using something like this):
file_length = len(open("data.csv").readlines())
Then you can preallocate the numpy array and do something like this:

masterArray = np.empty((file_length, 2))
for i, numbers in enumerate(thedoc):
    editDocument = numbers.strip().split(",")
    x = editDocument[0]
    y = editDocument[1]
    masterArray[i] = [x, y]
I would recommend the first method, but if you're lazy then you can always just build a Python list and then make a numpy array:
masterArray = []
for numbers in thedoc:
    editDocument = numbers.strip().split(",")
    x = editDocument[0]
    y = editDocument[1]
    createdArray = [x, y]
    masterArray.append(createdArray)
masterArray = np.array(masterArray)  # convert once at the end; inside generatingArray you would return this
In a Jupyter notebook I OO-modeled a resource, but in the control loop I need to aggregate data over multiple objects, which is inefficient compared to ufuncs and similar vectorized operations. To package functionality I chose OO, but for efficient and concise code I probably have to pull the data out into a storage class (maybe) and push all the ri.log[0] rows into one 2D array, in this case of shape (2, K).
The class does not need the full log, only the last entries.
import numpy as np

K = 100

class Resource:
    def __init__(self):
        self.log = np.random.random((5, K))
        # log gets filled during simulation

r0 = Resource()
r1 = Resource()

# while control loop:
# aggregate control data
for k in range(K):
    total_row_0 = r0.log[0][k] + r1.log[0][k]
    # do sth with the totals and loop again
This would greatly improve performance, but I have difficulty linking the data back to the class if it is stored separately. How would you approach this: pandas DataFrames, a NumPy view, or a shallow copy?

[[...]   # r0
 [...]]  # r1 -- the same data in one array: efficient, but mapping it back to the class is difficult
Here is my take on it:
import numpy as np

K = 3

class Res:
    logs = 2
    def __init__(self):
        self.log = None
    def set_log(self, view):
        self.log = view

batteries = [Res(), Res()]
d = {'Res': np.random.random((Res.logs * len(batteries), K))}

for i in range(len(batteries)):
    view = d['Res'].view()[i::len(batteries)][:]
    batteries[i].set_log(view)

print(d)
batteries[1].log[1][2] = 1  # test: modifies the view, i.e. the last entry of the second log of the second Res
print(d)
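The point of the view here is that basic slicing does not copy: batteries[i].log shares its memory with d['Res'], so writes through either name land in the same buffer, and aggregation over all resources can run as one vectorized operation on d['Res'] without any mapping step.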
I am new to multicore programming with Python.
So first, here is some example code that shows my problem:
import numpy as np

numpyarray = np.zeros((100, 100))

def myMainFunction(x, y, t, data):
    global numpyarray
    x += t
    numpyarray[x][y] += data

for line in myDict:
    x = myDict[line]['pos']['x']
    y = myDict[line]['pos']['y']
    digit = 0
    for data in FD[line]['data']:
        digit += 1
        t = otherFunction(digit)
        myMainFunction(x, y, t, data)
So my two main problems are that I have a global variable (the numpy array) that I want to fill in parallel, and that myMainFunction, which I want to run in parallel, takes more than one variable.
Does anyone have an idea how to run this in parallel? The code takes a lot of time.
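One possible direction, as a minimal sketch only: compute the updates in worker processes and apply them to the array in the parent, so the global array never has to be shared (myDict, FD and otherFunction are the names from the code above and are assumed to be defined and importable in the workers):

from multiprocessing import Pool
import numpy as np

def compute_updates(args):
    # pure function: returns the (x, y, data) updates for one line
    line, x, y = args
    updates = []
    digit = 0
    for data in FD[line]['data']:
        digit += 1
        t = otherFunction(digit)
        updates.append((x + t, y, data))
    return updates

if __name__ == '__main__':
    numpyarray = np.zeros((100, 100))
    tasks = [(line, myDict[line]['pos']['x'], myDict[line]['pos']['y'])
             for line in myDict]
    with Pool() as pool:
        for updates in pool.map(compute_updates, tasks):
            for x, y, data in updates:
                numpyarray[x][y] += data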
I'm writing a script that reads in one file containing a list of files and performs Gaussian fits on each of those files. Each of these files is made up of two columns (wv and flux in the script below). My small issue here is: how do I limit the fit range based on the "wv" values? I tried using a "for" loop for this, but I get errors related to the fit (which I don't get if I don't limit the "wv" range).
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

fits = []
wvi_b = []
wvi_r = []

p = open("file_input.txt", "r")
for line in p:
    fits.append(str(line.split()[0]))
    wvi_b.append(float(line.split()[1]))
    wvi_r.append(float(line.split()[2]))
p.close()

for j in range(len(fits)):
    wv = []
    flux = []
    f = open("%s" % (fits[j]), "r")
    for line in f:
        wv.append(float(line.split()[0]))
        flux.append(float(line.split()[1]))
    f.close()

    def gauss(x, a, b, c, a1, b1, c1, d):
        func = a*np.exp(-((x-b)**2)/(2.0*(c)**2)) + a1*np.exp(-((x-b1)**2)/(2.0*(c1)**2)) + d
        return func

    for wv in range(6450, 6575):
        guess = (0.8, wvi_b[j], 3.0, 1.0, wvi_r[j], 3.0, 1.0)
        popt, pconv = curve_fit(gauss, wv, flux, guess)
        print(popt[1], popt[4])
        ymod = gauss(wv, *popt)
        plt.plot(wv, ymod)
        plt.plot(wv, flux, marker='.')
        plt.show()
When you call for wv in range(6450, 6575), wv is just an integer in that range, not your list of wavelengths: the loop rebinds the name wv, so the list you built from the file is no longer reachable under that name. I'd take a look at how you're using that variable; if you want to access data from the list, you would have to index it with a separate loop variable (it might be best to rename the variable in your for loop to something else).
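For the range restriction itself, a minimal sketch of the usual boolean-mask approach, reusing gauss and guess from the script above (assumes wv and flux are converted to numpy arrays first):

wv = np.asarray(wv)
flux = np.asarray(flux)
mask = (wv > 6450) & (wv < 6575)  # keep only the points inside the window
popt, pconv = curve_fit(gauss, wv[mask], flux[mask], guess)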
I have a project which I decided to do in Python. In brief: I have a list of lists. Each of them also contains lists, sometimes one-element, sometimes more. It looks like this:
rules = [
    [[1], [2], [3,4,5], [4], [5], [7]],
    [[1], [8], [3,7,8], [3], [45], [12]],
    [[31], [12], [43,24,57], [47], [2], [43]],
]
The point is to compare values from a numpy array to values from these rules (the elements of the rules table). We compare some [x][y] point to the first element (e.g. 1 in the first rule); then, if it matches, the value at [x-1][y] in the array to the second element from the list, and so on. All five comparisons must be true to change the value of the [x][y] point. I wrote something like this (the main function is SimulateLoop; the order is switched because the simulate2 function was written after the other one):
def simulate2(self, i, j, w, rule):
    data = Data(rule)
    if w.world[i][j] in data.c:
        if w.world[i-1][j] in data.n:
            if w.world[i][j+1] in data.e:
                if w.world[i+1][j] in data.s:
                    if w.world[i][j-1] in data.w:
                        w.world[i][j] = data.cc[0]
                    else: return
                else: return
            else: return
        else: return
    else: return

def SimulateLoop(self, w):
    for z in range(w.steps):
        for i in range(2, w.x-1):
            for j in range(2, w.y-1):
                for rule in w.rules:
                    self.simulate2(i, j, w, rule)
Data class:
class Data:
    def __init__(self, rule):
        self.c = rule[0]
        self.n = rule[1]
        self.e = rule[2]
        self.s = rule[3]
        self.w = rule[4]
        self.cc = rule[5]
The NumPy array is an object from the World class. Rules is the list described above, parsed by a function obtained from another program (GPL license), so the rules themselves should be correct.
To be honest, it looks like it should work, but it does not. I tried other possibilities, without luck. It runs and the interpreter doesn't return any errors, but somehow the values in the array change incorrectly.
Maybe it will be helpful: it is Perrier's loop, a modified Langton's loop (artificial life).
I will be very thankful for any help!
I am not familiar with Perrier's loop, but if you are coding something like the famous Game of Life, you may have made a simple mistake: storing the next generation in the same array, thus corrupting it.
Normally you store the next generation in a temporary array and copy/swap after the sweep, like in this sketch:
import numpy as np

def do_step_in_game_life(world):
    next_gen = np.zeros(world.shape)  # <<< tmp array here
    Nx, Ny = world.shape
    for i in range(1, Nx-1):
        for j in range(1, Ny-1):
            neighbours = world[i-1:i+2, j-1:j+2].sum() - world[i,j]
            if neighbours < 3:
                next_gen[i,j] = 0
            elif ...:  # further rules of the game go here
                ...
    world[:,:] = next_gen[:,:]  # <<< saving computed next generation