I am running the following: output.to_csv("hi.csv"), where output is a pandas DataFrame.
My variables all have values, but when I run this in IPython, no file is created. What should I do?
Better give the complete path for your output CSV file; it may be that you are checking in the wrong folder.
Also make sure that output really is a pandas DataFrame, so that its to_csv method actually writes the file.
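For example, a minimal sanity check you could run (hi.csv is the name from the question): to_csv resolves relative paths against the current working directory, which in IPython may not be where you expect.

import os

output.to_csv("hi.csv")
# the file lands in the current working directory
print os.path.join(os.getcwd(), "hi.csv")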
And there is a library for CSV manipulation in Python, so you don't need to handle all the work yourself:
https://docs.python.org/2/library/csv.html
I'm not sure if this will be useful to you, but I write to CSV files frequently in Python. Here is a working example that generates random vector (X, Y, Z) values and writes them to a CSV file using the csv module. (The paths are for OS X, but you should get the idea even on a different OS.)
import os, csv, random

# Generates random vectors and writes them to a CSV file
WriteFile = True  # Write CSV file if true - useful for testing
CSVFileName = "DataOutput.csv"
CSVfile = open(os.path.join('/Users/Si/Desktop/', CSVFileName), 'w')

def genlist():
    # Generates a list of random vectors
    global v, ListLength
    ListLength = 25  # Amount of vectors to be produced
    Max = 100  # Maximum range value
    x = []  # Empty x vector list
    y = []  # Empty y vector list
    z = []  # Empty z vector list
    v = []  # Empty xyz vector list
    for i in xrange(ListLength):
        rnd = random.randrange(0, Max)  # Generate random number
        x.append(rnd)  # Add it to x list
    for i in xrange(ListLength):
        rnd = random.randrange(0, Max)
        y.append(rnd)  # Add it to y list
    for i in xrange(ListLength):
        rnd = random.randrange(0, Max)  # Generate random number
        z.append(rnd)  # Add it to z list
    for i in xrange(ListLength):
        merge = x[i], y[i], z[i]  # Merge x[i], y[i], z[i] into a tuple
        v.append(merge)  # Add merged tuple to v list

def writeCSV():
    # Write vectors to CSV file
    wr = csv.writer(CSVfile, quoting=csv.QUOTE_MINIMAL, dialect='excel')
    wr.writerow(('Point Number', 'X Vector', 'Y Vector', 'Z Vector'))
    for i in xrange(ListLength):
        wr.writerow((i + 1, v[i][0], v[i][1], v[i][2]))
    print "Data written to", CSVfile.name

genlist()
if WriteFile is True:
    writeCSV()
CSVfile.close()  # flush the buffer so the file actually appears on disk
Hopefully there is something useful in here for you!
I was trying to read multiple files in a folder, get one matrix from each of them, and sum all of the matrices.
My script for reading only one file works well (md.MCERunfile and Item2d are existing modules for data reading):
outfile = md.MCERunfile('/somepath/filename')
rn_matrix = outfile.Item2d('IV', 'Rn_C%i')
Shape = np.shape(rn_matrix)
rn_matrix_float = np.array([]).reshape(0, 55)
for x in range(Shape[0]):
    row = map(float, rn_matrix[x])
    rn_matrix_float = np.vstack([rn_matrix_float, row])
The final output rn_matrix_float is a 32 by 64 numpy array.
Now I tried:
import glob

path = '/somepath/*.xxx'
files = glob.glob(path)
final_matrix = np.zeros((32, 64))
for j in range(0, len(files)):
    outfile = md.MCERunfile(files[j])
    rn_matrix = outfile.Item2d('IV', 'cut_rec_C%i')
    Shape = np.shape(rn_matrix)
    for x in range(Shape[0]):
        rn_matrix_float = np.array([]).reshape(0, 64)
        row = map(float, rn_matrix[x])
        rn_matrix_float = np.vstack([rn_matrix_float, row])
    final_matrix = final_matrix + rn_matrix_float
I think my mistake is in how I have defined outfile and rn_matrix in the loop, which makes every rn_matrix_float exactly the same instead of holding data from a different file, so final_matrix is a sum of identical arrays. But I don't know how to fix it.
An iteration like this
rn_matrix_float = np.array([]).reshape(0, 55)
for x in range(Shape[0]):
    row = map(float, rn_matrix[x])
    rn_matrix_float = np.vstack([rn_matrix_float, row])
should be written as a list append
alist = []
for x in range(Shape[0]):
    row = map(float, rn_matrix[x])
    alist.append(row)
rn_matrix_float = np.vstack(alist)
Appending to a list is faster and easier than repeatedly concatenating arrays. Actually, it could probably be written as a list comprehension, or even a one-line array operation:
rn_matrix_float = rn_matrix.astype(float)
(but that's more of a guess since I haven't tried to recreate your data.)
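For reference, a minimal sketch of the list-comprehension version mentioned above, assuming rn_matrix is a list of rows of numeric strings (which is what the map(float, ...) calls suggest):

# build the float array in one expression instead of an explicit loop
rn_matrix_float = np.array([[float(v) for v in row] for row in rn_matrix])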
Similarly, I'd be inclined to collect the per-file arrays in the multi-file case and do one sum at the end:
alist2 = []
for j in range(0, len(files)):
    outfile = md.MCERunfile(files[j])
    rn_matrix = outfile.Item2d('IV', 'cut_rec_C%i')
    alist2.append(rn_matrix.astype(float))
final_matrix = np.array(alist2)
print(final_matrix.shape)  # check shape
final_matrix = final_matrix.sum(axis=0)
If the intermediate array gets too big we might want to add incrementally. But for a start I think you should become comfortable with accumulating multidimensional arrays, and then 'reducing' them with actions like sum.
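In that case, a hedged sketch of the incremental version, reusing the names from the question:

# accumulate file by file instead of stacking everything first
final_matrix = np.zeros((32, 64))
for j in range(len(files)):
    outfile = md.MCERunfile(files[j])
    rn_matrix = outfile.Item2d('IV', 'cut_rec_C%i')
    final_matrix += np.array(rn_matrix, dtype=float)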
I'm writing a script that reads in one file containing a list of files and performs Gaussian fits on each of those files. Each of these files is made up of two columns (wv and flux in the script below). My small issue is how to limit the range based on the "wv" values. I tried using a "for" loop for this, but I get errors related to the fit (which I don't get if I don't limit the "wv" range).
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

fits = []
wvi_b = []
wvi_r = []

p = open("file_input.txt", "r")
for line in p:
    fits.append(str(line.split()[0]))
    wvi_b.append(float(line.split()[1]))
    wvi_r.append(float(line.split()[2]))
p.close()

for j in range(len(fits)):
    wv = []
    flux = []
    f = open("%s" % (fits[j]), "r")
    for line in f:
        wv.append(float(line.split()[0]))
        flux.append(float(line.split()[1]))
    f.close()

    def gauss(x, a, b, c, a1, b1, c1, d):
        func = a*np.exp(-((x-b)**2)/(2.0*(c)**2)) + a1*np.exp(-((x-b1)**2)/(2.0*(c1)**2)) + d
        return func

    for wv in range(6450, 6575):
        guess = (0.8, wvi_b[j], 3.0, 1.0, wvi_r[j], 3.0, 1.0)
        popt, pconv = curve_fit(gauss, wv, flux, guess)

    print popt[1], popt[4]
    ymod = gauss(wv, *popt)
    plt.plot(wv, ymod)
    plt.plot(wv, flux, marker='.')
    plt.show()
When you call for wv in range(6450, 6575), wv is just an integer in that range, not a member of the list; the loop also shadows the wv list you built earlier. I'd take a look at how you're using that variable. If you want to access data from the list wv, you would have to update the syntax to wv[wv] (which is a little confusing, so it might be best to change the variable in your for loop to something else).
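As a hedged sketch of one way to restrict the fit to the 6450-6575 window, masking the data arrays instead of looping (this assumes wv and flux are the lists read from the file, and gauss and guess are as defined in the question):

wv = np.asarray(wv, dtype=float)
flux = np.asarray(flux, dtype=float)
window = (wv >= 6450) & (wv <= 6575)  # keep only points in the wavelength range
popt, pconv = curve_fit(gauss, wv[window], flux[window], guess)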
I'm trying to load a large number of files saved in the EnSight Gold format into a numpy array. In order to conduct this read, I've written my own class, libvec, which reads the geometry file and then preallocates the arrays that Python will use to save the data, as shown in the code below.
N = len(file_list)
# Create the class object and read the geometry file
gvec = vec.libvec(os.path.join(current_dir, casefile))
x, y, z = gvec.xyz()
# Preallocate arrays
U_temp = np.zeros((len(y), len(x), N), dtype=np.dtype('f4'))
V_temp = np.zeros((len(y), len(x), N), dtype=np.dtype('f4'))
u_temp = np.zeros((len(x), len(x), N), dtype=np.dtype('f4'))
v_temp = np.zeros((len(x), len(y), N), dtype=np.dtype('f4'))
# Read the individual files into the previously allocated arrays
for idx, current_file in enumerate(file_list):
    U, V = gvec.readvec(os.path.join(current_dir, current_file))
    U_temp[:, :, idx] = U
    V_temp[:, :, idx] = V
    del U, V
However, this takes seemingly forever, so I was wondering if you have any idea how to speed up this process? The code reading the individual files into the array structure can be seen below:
def readvec(self, filename):
    # We are supposing for the moment that the naming scheme PIV__vxy.case /
    # PIV__vxy.geo does not change; should that not be the case, appropriate
    # changes have to be made to the corresponding file.
    data_temp = np.loadtxt(filename, dtype=np.dtype('f4'), delimiter=None,
                           converters=None, skiprows=4)
    # U values
    for i in range(len(self.__y)):  # y value counter
        for j in range(len(self.__x)):  # x value counter
            self.__U[i, j] = data_temp[i*len(self.__x) + j]
    # V values
    for i in range(len(self.__y)):  # y value counter
        for j in range(len(self.__x)):  # x value counter
            self.__V[i, j] = data_temp[len(self.__x)*len(self.__y) + i*len(self.__x) + j]
    # W values (only present for 3D data)
    if len(self.__z) > 1:
        for i in range(len(self.__y)):  # y value counter
            for j in range(len(self.__xd)):  # x value counter
                self.__W[i, j] = data_temp[2*len(self.__x)*len(self.__y) + i*len(self.__x) + j]
        return self.__U, self.__V, self.__W
    else:
        return self.__U, self.__V
Thanks a lot in advance and best regards,
J
It's a bit hard to say without any test input/output to compare against, but I think this would give you the same U/V arrays as your nested for loops in readvec. This method should be considerably faster than the for loops.
U = data[:size_y*size_x].reshape(size_y, size_x)  # rows indexed by y, columns by x, matching the original loops
V = data[size_y*size_x:2*size_y*size_x].reshape(size_y, size_x)
Returning these directly into U_temp and V_temp should also help. Right now you're making 3(?) copies of your data to get it into U_temp and V_temp:
1. from the file into data_temp
2. from data_temp into self.__U and self.__V
3. from U and V into U_temp and V_temp
Although my guess is that the two nested for loops, accessing one element at a time, are what is causing the slowness.
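Putting that together, a hedged sketch of what a vectorized readvec could look like for the 2D case (names are taken from the question's code; untested without sample data):

def readvec(self, filename):
    # load the whole file once, exactly as the original code does
    data = np.loadtxt(filename, dtype=np.dtype('f4'), skiprows=4)
    ny, nx = len(self.__y), len(self.__x)
    U = data[:nx*ny].reshape(ny, nx)         # rows indexed by y, columns by x
    V = data[nx*ny:2*nx*ny].reshape(ny, nx)
    return U, V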
Python/numpy problem, from a final-year physics undergrad. I have a small piece of code that creates an array (essentially an n×n matrix) from a formula. I reshape the array to a single column of values, create a string from that, format it to remove extraneous brackets etc., then output the result to a text file saved in the user's Documents directory, which is then used by another piece of software. The trouble is that above a certain value of "n", the output gives me only the first and last three values, with "...," in between. I think that Python is automatically abridging the result to save time and resources, but I need all those values in the final text file, regardless of how long it takes to process, and I can't for the life of me find how to stop it doing that. The relevant code is copied below.
import numpy as np; import os.path ; import os
'''
Create a single column matrix in text format from Gaussian Eqn.
'''
save_path = os.path.join(os.path.expandvars("%userprofile%"), "Documents")
name_of_file = 'outputfile'  # <---- change this as required.
completeName = os.path.join(save_path, name_of_file + ".txt")

matsize = 32

def gaussf(x, y):  # defining gaussian but can be any f(x,y)
    pisig = 1/(np.sqrt(2*np.pi) * matsize)  # first term
    sumxy = -(x**2 + y**2)  # sum of squares term
    expden = 2 * (matsize/1.0)**2  # 2 sigma squared
    expn = pisig * np.exp(sumxy/expden)  # and put it all together
    return expn

matrix = [[gaussf(x, y)]
          for x in range(-matsize/2, matsize/2)
          for y in range(-matsize/2, matsize/2)]

zmatrix = np.reshape(matrix, (matsize*matsize, 1))  # single column
string2 = str(zmatrix).replace('[', '').replace(']', '').replace(' ', '')
zbfile = open(completeName, "w")
zbfile.write(string2)
zbfile.close()
print completeName

num_lines = sum(1 for line in open(completeName))
print num_lines
Any help would be greatly appreciated!
Generally you should iterate over the array/list if you just want to write the contents.
zmatrix = np.reshape(matrix, (matsize*matsize, 1))

with open(completeName, "w") as zbfile:  # with closes your files automatically
    for row in zmatrix:
        zbfile.writelines(map(str, row))
        zbfile.write("\n")
Output:
0.00970926751178
0.00985735189176
0.00999792646484
0.0101306077521
0.0102550302672
0.0103708481917
0.010477736974
0.010575394844
0.0106635442315
.........................
But using numpy we simply need to use tofile:
zmatrix = np.reshape(matrix, (matsize*matsize, 1))
# pass sep or you will get binary output
zmatrix.tofile(completeName, sep="\n")
Output is in the same format as above.
Calling str on the matrix gives you output formatted similarly to what you see when you print it, so what you were writing to the file was the formatted, truncated output.
Considering you are using Python 2, xrange would be more efficient than range, which creates a whole list. Also, having multiple imports separated by semicolons is not recommended; you can simply:
import numpy as np, os.path, os
Variable and function names should also use underscores: z_matrix, zb_file, complete_name, etc.
You shouldn't need to fiddle with the string representations of numpy arrays. One way is to use tofile:
zmatrix.tofile('output.txt', sep='\n')
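If you also want control over the numeric format, np.savetxt is another standard option; a minimal sketch (the fmt string here is just an illustrative choice):

np.savetxt('output.txt', zmatrix, fmt='%.12g')  # writes one value per line with the given format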
I wrote some code to read a *.las file in Python. *.las files are special ASCII files where each line holds the x, y, z values of a point.
My function reads N points at a time and checks whether they are inside a polygon with points_inside_poly.
I have the following questions:
1. When I arrive at the end of the file I get this message: LASException: LASError in "LASReader_GetPointAt": point subscript out of range, because the number of remaining points is smaller than the chunk size. I cannot figure out how to resolve this problem.
2. In a = [file_out.write(c[m]) for m in xrange(len(c))] I use a = in order to avoid video print. Is it correct?
3. In c = [chunk[l] for l in index] I create a new list c because I am not sure that replacing the chunk is the smart solution (e.g. chunk = [chunk[l] for l in index]).
4. In an if...else statement I use pass. Is this the right choice?
Thanks a lot for the help. It's important for me to improve by listening to suggestions from experts!
import shapefile
import numpy as np
from numpy import nonzero
from liblas import file as lasfile
from shapely.geometry import Polygon
from matplotlib.nxutils import points_inside_poly

# open shapefile (polygon)
sf = shapefile.Reader(poly)
shapes = sf.shapes()
# extract vertices
verts = np.array(shapes[0].points, float)

# open las file
f = lasfile.File(inFile, None, 'r')  # open LAS
# read "header"
h = f.header
# create a file where the points are stored
file_out = lasfile.File(outFile, mode='w', header=h)

chunkSize = 100000
for i in xrange(0, len(f), chunkSize):
    chunk = f[i:i+chunkSize]
    x, y = [], []
    # extract the x and y values of each point
    for p in xrange(len(chunk)):
        x.append(chunk[p].x)
        y.append(chunk[p].y)
    # zip all points
    points = np.array(zip(x, y))
    # create an index of the points inside the polygon
    index = nonzero(points_inside_poly(points, verts))[0]
    # if index is not empty do this, otherwise "pass"
    if len(index) != 0:
        c = [chunk[l] for l in index]  # Is it correct to create a new list, or can I replace chunk?
        # save the points
        a = [file_out.write(c[m]) for m in xrange(len(c))]  # use a = in order to avoid video print. Is it correct?
    else:
        pass  # Is it correct to use pass?
f.close()
file_out.close()
Code proposed by @Roland Smith and changed by Gianni:
f = lasfile.File(inFile, None, 'r')  # open LAS
h = f.header
# change the software id to libLAS
h.software_id = "Gianni"
file_out = lasfile.File(outFile, mode='w', header=h)
f.close()

sf = shapefile.Reader(poly)  # open shapefile
shapes = sf.shapes()
for i in xrange(len(shapes)):
    verts = np.array(shapes[i].points, float)
    inside_points = [p for p in lasfile.File(inFile, None, 'r') if pnpoly(p.x, p.y, verts)]
    for p in inside_points:
        file_out.write(p)
file_out.close()
I used these solutions:
1) reading f = lasfile.File(inFile,None,'r') and then reading the header, because I need it for the *.las output file;
2) closing the file;
3) using inside_points = [p for p in lasfile.File(inFile,None,'r') if pnpoly(p.x, p.y, verts)] instead of

with lasfile.File(inFile, None, 'r') as f:
    inside_points = [p for p in f if pnpoly(p.x, p.y, verts)]

because I always get this error message:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: __exit__
Regarding (1):
First, why are you using chunks? Just use the lasfile as an iterator (as shown in the tutorial) and process the points one at a time. The following should write all the points inside the polygon to the output file, using the pnpoly function in a generator expression instead of points_inside_poly.
from liblas import file as lasfile
import numpy as np
from matplotlib.nxutils import pnpoly

with lasfile.File(inFile, None, 'r') as f:
    inside_points = (p for p in f if pnpoly(p.x, p.y, verts))
    with lasfile.File(outFile, mode='w', header=h) as file_out:
        for p in inside_points:
            file_out.write(p)
The five lines directly above should replace the whole big for-loop. Let's go over them one-by-one:
with lasfile.File(inFile...: Using this construction means that the file will be closed automatically when the with block finishes.
Now comes the good part: the generator expression that does all the work (the part between parentheses). It iterates over the input file (for p in f). Every point that is inside the polygon (if pnpoly(p.x, p.y, verts)) is yielded by the generator.
We use another with block for the output file,
and all the points (for p in inside_points; this is where the generator is consumed)
are written to the output file (file_out.write(p)).
Because this method only yields the points that are inside the polygon, you don't waste memory on points that you don't need!
You should only use chunks if the method shown above doesn't work.
When using chunks you should handle the exception properly, e.g.:

from liblas import LASException

chunkSize = 100000
for i in xrange(0, len(f), chunkSize):
    try:
        chunk = f[i:i+chunkSize]
    except LASException:
        rem = len(f) - i
        chunk = f[i:i+rem]
Regarding (2): Sorry, but I fail to understand what you are trying to accomplish here. What do you mean by "video print"?
Regarding (3): since you are not using the original chunk anymore, you can re-use the name. Realize that in Python a "variable" is just a name tag.
Regarding (4): you aren't using the else, so leave it out completely.
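For what it's worth, a small sketch of how (2) and (4) combine inside the chunk loop, reusing the names from the question: a plain for loop writes the points without echoing anything, and the else branch simply disappears:

if len(index) != 0:
    c = [chunk[l] for l in index]
    for pt in c:  # an ordinary loop writes without printing and without building a throwaway list
        file_out.write(pt)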