I would like to save an array with shape (5, 2), named sorted_cube_station_list.
When I print it, it looks fine, but when I save it with numpy.tofile and later read it back with numpy.fromfile, it becomes a 1D array.
Can you help me with that?
import numpy as num

nx = 5
ny = 5
nz = 5
stations = ['L001', 'L002', 'L003', 'L004', 'L005']

for x in range(nx):
    for y in range(ny):
        for z in range(nz):
            cube_station_list = []
            i = -1
            for sta in stations:
                i = i + 1
                cube = [int(i), num.random.randint(2500, size=1)[0]]
                cube_station_list.append(cube)
            cub_station_list_arr = num.asarray(cube_station_list)
            sorted_cube_station_list_arr = cub_station_list_arr[cub_station_list_arr[:, 1].argsort()]
            print x, y, z, sorted_cube_station_list_arr
            num.ndarray.tofile(sorted_cube_station_list_arr, str(x) + '_' + str(y) + '_' + str(z))
I suggest you use np.save
a = np.ones(16).reshape([8, 2])
np.save("fileName.npy", a)
See the docs: the first parameter is not the array you want to save, but the path of the file you want to save it to. Hence the error you got when calling np.save(yourArray).
You can load the saved array using np.load(pathToArray)
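For completeness, here is a minimal save/load round trip (the file name is just an example):

import numpy as np

a = np.random.randint(2500, size=(5, 2))   # same shape as the sorted array above
np.save("0_0_0.npy", a)                    # .npy files keep shape and dtype
b = np.load("0_0_0.npy")
print(b.shape)                             # (5, 2)

If you do stick with tofile/fromfile, you have to reshape after reading, e.g. num.fromfile(name, dtype=sorted_cube_station_list_arr.dtype).reshape(5, 2), because the raw binary format stores no shape information.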
I am trying to iterate through a CSV file and create a numpy array for each row in the file, where the first column represents the x-coordinates and the second column represents the y-coordinates. I am then trying to append each array to a master array and return it.
import numpy as np

thedoc = open("data.csv")
headers = thedoc.readline()

def generatingArray(thedoc):
    masterArray = np.array([])
    for numbers in thedoc:
        editDocument = numbers.strip().split(",")
        x = editDocument[0]
        y = editDocument[1]
        createdArray = np.array((x, y))
        masterArray = np.append([createdArray])
    return masterArray

print(generatingArray(thedoc))
I am hoping to see an array with all the CSV info in it. Instead, I receive an error: "append() missing 1 required positional argument: 'values'".
Any help on where my error is and how to fix it is greatly appreciated!
Numpy arrays don't magically grow in the same way that python lists do. You need to allocate the space for the array in your "masterArray = np.array([])" function call before you add everything to it.
The best answer is to import directly to a numpy array using something like genfromtxt (https://docs.scipy.org/doc/numpy-1.10.1/user/basics.io.genfromtxt.html) but...
If you know the number of lines you're reading in, great; otherwise you can get it using something like this:
file_length = len(open("data.csv").readlines())
Then you can preallocate the numpy array and do something like this:
masterArray = np.empty((file_length - 1, 2))  # minus 1 for the header line already read
for i, numbers in enumerate(thedoc):
    editDocument = numbers.strip().split(",")
    x = editDocument[0]
    y = editDocument[1]
    masterArray[i] = [x, y]
I would recommend the first method but if you're lazy then you can always just build a python list and then make a numpy array.
masterArray = []
for numbers in thedoc:
    editDocument = numbers.strip().split(",")
    x = editDocument[0]
    y = editDocument[1]
    createdArray = [x, y]
    masterArray.append(createdArray)
return np.array(masterArray)
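For reference, the genfromtxt route mentioned at the top could look roughly like this (assuming data.csv has one header row and two numeric columns; the names here are only illustrative):

import numpy as np

# skip_header drops the header line, delimiter splits each row on commas
masterArray = np.genfromtxt("data.csv", delimiter=",", skip_header=1)
print(masterArray.shape)  # (number_of_rows, 2)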
I am new to Python coding and I would like to get an XML file from a server, parse it, and save it to a CSV file.
The first two parts are OK, I am able to get the file and parse it, but there is an issue with saving it as a CSV.
The code:
import requests
import numpy as np
hu = requests.get('https://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml', stream=True)
from xml.etree import ElementTree as ET
tree = ET.parse(hu.raw)
root = tree.getroot()
namespaces = {'ex': 'http://www.ecb.int/vocabulary/2002-08-01/eurofxref'}
for cube in root.findall('.//ex:Cube[@currency]', namespaces=namespaces):
    np.savetxt('data.csv', (cube.attrib['currency'], cube.attrib['rate']), delimiter=',')
Error I get is: mismatch between array dtype and format specifier.
It probably means I am passing strings where savetxt expects arrays, which causes the mismatch.
But I am not sure how to fix the problem so there is no mismatch.
Thank you
From the docs, the second argument to np.savetxt should be an array or a tuple of equal-sized arrays. What you are providing are strings:
>>> x = y = z = np.arange(0.0,5.0,1.0)
>>> np.savetxt('test.out', x, delimiter=',') # X is an array
>>> np.savetxt('test.out', (x,y,z)) # x,y,z equal sized 1D arrays
>>> np.savetxt('test.out', x, fmt='%1.4e') # use exponential notation
You'll need to gather all of the currency and rate values into lists, then save them as CSV:
currency, rate = [], []
for cube in root.findall('.//ex:Cube[@currency]', namespaces=namespaces):
    currency.append(cube.attrib['currency'])
    rate.append(cube.attrib['rate'])
# column_stack pairs each currency with its rate; fmt='%s' is needed because the values are strings
np.savetxt('file.csv', np.column_stack((currency, rate)), delimiter=',', fmt='%s')
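If you prefer to avoid numpy for this step, the standard-library csv module handles strings naturally; a minimal sketch, assuming the same root and namespaces as above:

import csv

# newline='' is the Python 3 convention; on Python 2 use open('file.csv', 'wb')
with open('file.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for cube in root.findall('.//ex:Cube[@currency]', namespaces=namespaces):
        writer.writerow([cube.attrib['currency'], cube.attrib['rate']])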
I am having a bit of trouble with some data stored in a text file that I have on hand for regression analysis using Python.
The data are stored in a format that looks like this:
2104,3,399900 1600,3,329900 2400,3,369000 ....
I need to do some analysis, like finding the mean:
(2104+1600+...)/number of data
I think the appropriate step is to store the data in arrays, but I have no idea how to do it. I can think of two ways. The first is to set up three arrays that store the values like
a=[2104 1600 2400 ...] b=[3 3 3 ...] c=[399900 329900 36000 ...]
The second way is to store them as
a=[2104 3 399900], b=[1600 3 329900] and so on.
Which one is better?
Also, how do I write code that reads the data into an array? I was thinking of something like this:
with open("file.txt", "r") as ins:
array = []
elt.strip(',."\'?!*:') for line in ins:
array.append(line)
Is that correct?
You could use:
with open('data.txt') as data:
    substrings = data.read().split()
values = [map(int, substring.split(',')) for substring in substrings]
average = sum([a for a, b, c in values]) / float(len(values))
print average
With this data.txt:
2104,3,399900 1600,3,329900 2400,3,369000
2105,3,399900 1601,3,329900 2401,3,369000
It outputs:
2035.16666667
Using pandas and numpy you can get the data into an array as follows:
In [37]: data = "2104,3,399900 1600,3,329900 2400,3,369000"
In [38]: d = pd.read_csv(StringIO.StringIO(data), sep=',| ', header=None, index_col=None, engine="python")
In [39]: d.values.reshape(3, d.shape[1]/3)
Out[39]:
array([[ 2104, 3, 399900],
[ 1600, 3, 329900],
[ 2400, 3, 369000]])
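From there, the mean the question asks about is just a column mean (continuing the same session; the variable names follow the snippet above):

In [40]: arr = d.values.reshape(3, d.shape[1] / 3)

In [41]: arr[:, 0].mean()
Out[41]: 2034.6666666666667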
Instead of having multiple arrays a, b, c... you could store your data as an array of arrays (a 2 dimensional array). For example:
[[2104,3,399900],
[1600,3,329900],
[2400,3,369000]...]
This way you don't have to deal with dynamically naming your arrays. How you store your data, i.e. 3 arrays of length n or n arrays of length 3, is up to you. I would prefer the second way. To read the data into your array you should then use the split() function, which will split your input into a list. So in your case:
with open("file.txt", "r") as ins:
tmp = ins.read().split(" ")
array = [i.split(",") for i in tmp]
>>> array
[['2104', '3', '399900'], ['1600', '3', '329900'], ['2400', '3', '369000']]
Edit:
To find the mean e.g. for the first element in each list you could do the following:
arraymean = sum([int(i[0]) for i in array]) / len(array)
Where the 0 in i[0] specifies the first element in each list. Note that this code uses list comprehension, which you can learn more about in this post if you want to.
Also this code stores the values in the array as strings, hence the cast to int in the part to get the mean. If you want to store the data as int directly just edit the part in the file reading section:
array = [[int(j) for j in i.split(",")] for i in tmp]
This is a quick solution without error checking (using a list comprehension, PEP 202). But if your file has a consistent format you can do the following:
import numpy as np
a = np.array([np.array(i.split(",")).astype("float") for i in open("example.txt").read().split(" ")])
Should you print it:
print(a)
print("Mean of column 0: ", np.mean(a[:, 0]))
You'll obtain the following:
[[ 2.10400000e+03 3.00000000e+00 3.99900000e+05]
[ 1.60000000e+03 3.00000000e+00 3.29900000e+05]
[ 2.40000000e+03 3.00000000e+00 3.69000000e+05]]
Mean of column 0: 2034.66666667
Notice how, in the code snippet, we specified "," as the separator inside each triplet, and the space " " as the separator between triplets. This is the exact contents of the file I used as an example:
2104,3,399900 1600,3,329900 2400,3,369000
I'm trying to load a large number of files saved in the Ensight gold format into a numpy array. In order to conduct this read I've written my own class libvec which reads the geometry file and then preallocates the arrays which python will use to save the data as shown in the code below.
N = len(file_list)
# Create the class object and read geometry file
gvec = vec.libvec(os.path.join(current_dir,casefile))
x,y,z = gvec.xyz()
# Preallocate arrays
U_temp = np.zeros((len(y),len(x),N),dtype=np.dtype('f4'))
V_temp = np.zeros((len(y),len(x),N),dtype=np.dtype('f4'))
u_temp = np.zeros((len(x),len(x),N),dtype=np.dtype('f4'))
v_temp = np.zeros((len(x),len(y),N),dtype=np.dtype('f4'))
# Read the individual files into the previously allocated arrays
for idx, current_file in enumerate(file_list):
    U, V = gvec.readvec(os.path.join(current_dir, current_file))
    U_temp[:, :, idx] = U
    V_temp[:, :, idx] = V
    del U, V
However this takes seemingly forever so I was wondering if you have any idea how to speed up this process? The code reading the individual files into the array structure can be seen below:
def readvec(self, filename):
    # we are supposing for the moment that the naming scheme PIV__vxy.case / PIV__vxy.geo does not change;
    # should that not be the case, appropriate changes have to be made to the corresponding files
    data_temp = np.loadtxt(filename, dtype=np.dtype('f4'), delimiter=None, converters=None, skiprows=4)
    # U values
    for i in range(len(self.__y)):
        # y value counter
        for j in range(len(self.__x)):
            # x value counter
            self.__U[i, j] = data_temp[i*len(self.__x) + j]
    # V values
    for i in range(len(self.__y)):
        # y value counter
        for j in range(len(self.__x)):
            # x value counter
            self.__V[i, j] = data_temp[len(self.__x)*len(self.__y) + i*len(self.__x) + j]
    # W values
    if len(self.__z) > 1:
        for i in range(len(self.__y)):
            # y value counter
            for j in range(len(self.__xd)):
                # x value counter
                self.__W[i, j] = data_temp[2*len(self.__x)*len(self.__y) + i*len(self.__x) + j]
        return self.__U, self.__V, self.__W
    else:
        return self.__U, self.__V
Thanks a lot in advance and best regards,
J
It's a bit hard to say without any test input/output to compare against, but I think this would give you the same U/V arrays as your nested for loops in readvec. This method should be considerably faster than the for loops.
U = data_temp[:size_x*size_y].reshape(size_y, size_x)
V = data_temp[size_x*size_y:2*size_x*size_y].reshape(size_y, size_x)
Returning these directly into U_temp and V_temp should also help. Right now you're making three copies of your data to get it into U_temp and V_temp:
From the file to data_temp
From data_temp to self.__U and self.__V
From self.__U and self.__V into U_temp and V_temp
Although my guess is that the nested for loops, accessing one element at a time, are what is causing the slowness.
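Putting that together, a vectorized readvec could look roughly like this (a sketch only; it assumes the same self.__x, self.__y, self.__z attributes and file layout as your original method, and that numpy is imported as np):

def readvec(self, filename):
    # Load the whole file once; the first 4 rows are header lines
    data_temp = np.loadtxt(filename, dtype=np.dtype('f4'), skiprows=4)
    size_x, size_y = len(self.__x), len(self.__y)
    n = size_x * size_y
    # Each block of n values is one component, stored row by row (y outer, x inner)
    U = data_temp[:n].reshape(size_y, size_x)
    V = data_temp[n:2*n].reshape(size_y, size_x)
    if len(self.__z) > 1:
        W = data_temp[2*n:3*n].reshape(size_y, size_x)
        return U, V, W
    return U, V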
Python/Numpy Problem. Final year Physics undergrad... I have a small piece of code that creates an array (essentially an n×n matrix) from a formula. I reshape the array to a single column of values, create a string from that, format it to remove extraneous brackets etc, then output the result to a text file saved in the user's Documents directory, which is then used by another piece of software. The trouble is above a certain value for "n" the output gives me only the first and last three values, with "...," in between. I think that Python is automatically abridging the final result to save time and resources, but I need all those values in the final text file, regardless of how long it takes to process, and I can't for the life of me find how to stop it doing it. Relevant code copied beneath...
import numpy as np; import os.path ; import os
'''
Create a single column matrix in text format from Gaussian Eqn.
'''
save_path = os.path.join(os.path.expandvars("%userprofile%"),"Documents")
name_of_file = 'outputfile' #<---- change this as required.
completeName = os.path.join(save_path, name_of_file+".txt")
matsize = 32
def gaussf(x,y): #defining gaussian but can be any f(x,y)
    pisig = 1/(np.sqrt(2*np.pi) * matsize) #first term
    sumxy = (-(x**2 + y**2)) #sum of squares term
    expden = (2 * (matsize/1.0)**2) # 2 sigma squared
    expn = pisig * np.exp(sumxy/expden) # and put it all together
    return expn
matrix = [[ gaussf(x,y) ]\
for x in range(-matsize/2, matsize/2)\
for y in range(-matsize/2, matsize/2)]
zmatrix = np.reshape(matrix, (matsize*matsize, 1)) # reshape to a single column
string2 = (str(zmatrix).replace('[','').replace(']','').replace(' ', ''))
zbfile = open(completeName, "w")
zbfile.write(string2)
zbfile.close()
print completeName
num_lines = sum(1 for line in open(completeName))
print num_lines
Any help would be greatly appreciated!
Generally you should iterate over the array/list if you just want to write the contents.
zmatrix = np.reshape(matrix, (matsize*matsize, 1))
with open(completeName, "w") as zbfile: # with closes your files automatically
    for row in zmatrix:
        zbfile.writelines(map(str, row))
        zbfile.write("\n")
Output:
0.00970926751178
0.00985735189176
0.00999792646484
0.0101306077521
0.0102550302672
0.0103708481917
0.010477736974
0.010575394844
0.0106635442315
.........................
But using numpy we simply need to use tofile:
zmatrix = np.reshape(matrix, (matsize*matsize, 1))
# pass sep or you will get binary output
zmatrix.tofile(completeName,sep="\n")
Output is in the same format as above.
Calling str on the matrix gives you output formatted the same way as when you print it, so what you are writing to the file is that formatted, truncated output.
Since you are using Python 2, xrange would be more efficient than range, which creates a list. Also, having multiple imports separated by semicolons is not recommended; you can simply:
import numpy as np, os.path, os
Also, variable and function names should use underscores: z_matrix, zb_file, complete_name, etc.
You shouldn't need to fiddle with the string representations of numpy arrays. One way is to use tofile:
zmatrix.tofile('output.txt', sep='\n')
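If you want control over the numeric formatting of each value, np.savetxt works too (a sketch; the format string here is just an example):

import numpy as np

# one value per line, written in scientific notation
np.savetxt(completeName, zmatrix, fmt='%.8e')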