import an array in python - python

How can I import an array to python (numpy.arry) from a file and that way the file must be written if it doesn't already exist.
For example, save out a matrix to a file then load it back.

Checkout the entry on the numpy example list. Here is the entry on .loadtxt()
>>> from numpy import *
>>>
>>> data = loadtxt("myfile.txt") # myfile.txt contains 4 columns of numbers
>>> t,z = data[:,0], data[:,3] # data is 2D numpy array
>>>
>>> t,x,y,z = loadtxt("myfile.txt", unpack=True) # to unpack all columns
>>> t,z = loadtxt("myfile.txt", usecols = (0,3), unpack=True) # to select just a few columns
>>> data = loadtxt("myfile.txt", skiprows = 7) # to skip 7 rows from top of file
>>> data = loadtxt("myfile.txt", comments = '!') # use '!' as comment char instead of '#'
>>> data = loadtxt("myfile.txt", delimiter=';') # use ';' as column separator instead of whitespace
>>> data = loadtxt("myfile.txt", dtype = int) # file contains integers instead of floats

Another option is numpy.genfromtxt, e.g:
import numpy as np
data = np.genfromtxt("myfile.dat",delimiter=",")
This will make data a numpy array with as many rows and columns as are in your file

(I know the question is old, but I think this might be good as a reference for people with similar questions)
If you want to load data from an ASCII/text file (which has the benefit or being more or less human-readable and easy to parse in other software), numpy.loadtxt is probably what you want:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
If you just want to quickly save and load numpy arrays/matrices to and from a file, take a look at numpy.save and numpy.load:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.save.html
http://docs.scipy.org/doc/numpy/reference/generated/numpy.load.html

In Python, Storing a bare python list as a numpy.array and then saving it out to file, then loading it back, and converting it back to a list takes some conversion tricks. The confusion is because python lists are not at all the same thing as numpy.arrays:
import numpy as np
foods = ['grape', 'cherry', 'mango']
filename = "./outfile.dat.npy"
np.save(filename, np.array(foods))
z = np.load(filename).tolist()
print("z is: " + str(z))
This prints:
z is: ['grape', 'cherry', 'mango']
Which is stored on disk as the filename: outfile.dat.npy
The important methods here are the tolist() and np.array(...) conversion functions.

Have a look at SciPy cookbook. It should give you an idea of some basic methods to import /export data.
If you save/load the files from your own Python programs, you may also want to consider the Pickle module, or cPickle.

Related

How to turn items from extracted data to numbers for plotting in python?

So i have a text document with a lot of values from calculations. I have extracted all the data and stored it in an array, but they are not numbers that I can use for anything. I want to use the number to plot them in a graph, but the elements in the array are text-strings, how would i turn them into numbers and remove unneccesary signs like commas and n= for instance?
Here is code, and under is my print statement.
import numpy as np
['n=1', 'n=2', 'n=3', 'n=4', 'n=5', 'n=6', 'n=7', 'n=8', 'n=9', 'n=10', 'n=11', 'n=12', 'n=13', 'n=14', 'n=15', 'n=16', 'n=17', 'n=18', 'n=19'])
I'd use the conversion method presented in this post within the extract function, so e.g.
...
delta_x.append(strtofloat(words[1]))
...
where you might as well do the conversion inline (my strtofloat is a function you'd have to write based on mentioned post) and within a try/except block, so failed conversions are just ignored from your list.
To make it more consistent, any conversion error should discard the whole line affected, so you might want to use intermediate variables and a check for each field.
Btw. I noticed the argument to the extract function, it would seem logical to make the argument a string containing the file name from which to extract the data?
EDIT: as a side note, you might want to look into pandas, which is a library specialised in numerical data handling. Depending on the format of your data file there are probably standard functions to read your whole file into a DataFrame (which is a kind of super-charged array class which can handle a lot of data processing as well) in a single command.
I would consider using regular expression:
import re
match_number = re.compile('-?[0-9]+\.?[0-9]*(?:[Ee]-?[0-9]+)?')
for line in infile:
words = line.split()
new_delta_x = float(re.search(match_number, words[1]).group())
new_abs_error = float(re.search(match_number, words[7]).group())
new_n = int(re.search(match_number, words[10]).group())
delta_x.append(new_delta_x)
abs_error.append(new_abs_error)
n.append(new_n)
But it seems like your data is already in csv format. So try using pandas.
Then read data into dataframe without header (column names will be integers).
import numpy as np
import pandas as pd
df = pd.read_csv('approx_derivative_sine.txt', header=None)
delta_x = df[1].to_numpy()
abs_error = df[7].to_numpy()
# if n is always number of the row
n = df.index.to_numpy(dtype=int)
# if n is always in the form 'n=<integer>'
n = df[10].apply(lambda x: x.strip()[2:]).to_numpy(dtype=int)
If you could post a few rows of your approx_derivative_sine.txt file, that would be useful.
From the given array in the question, If you would like to remove the 'n=' and convert each element to an integer, you may try the following.
import numpy as np
array = np.array(['n=1', 'n=2', 'n=3', 'n=4', 'n=5', 'n=6', 'n=7', 'n=8', 'n=9',
'n=10', 'n=11', 'n=12', 'n=13', 'n=14', 'n=15', 'n=16', 'n=17', 'n=18', 'n=19'])
array = [int(i.replace('n=', '')) for i in array]
print(array)

Request XML file from URL and save as CSV using python

I am new in python coding and I would like to get XML file from a server, parse it and save to csv file.
2 parts are ok, I am able to get the file and parse it, but there is an issue with saving as a csv.
The code:
import requests
import numpy as np
hu = requests.get('https://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml', stream=True)
from xml.etree import ElementTree as ET
tree = ET.parse(hu.raw)
root = tree.getroot()
namespaces = {'ex': 'http://www.ecb.int/vocabulary/2002-08-01/eurofxref'}
for cube in root.findall('.//ex:Cube[#currency]', namespaces=namespaces):
np.savetxt('data.csv', (cube.attrib['currency'], cube.attrib['rate']), delimiter=',')
Error I get is: mismatch between array dtype and format specifier.
It probably means I get data and try to save it as array, and there appears a mismatch.
But i am not sure how to fix the problem and to not have a mismatch.
Thank you
from the docs, your second argument in np.savetext should be a tuple of equal sized arrays. What you are providing are strings:
>>> x = y = z = np.arange(0.0,5.0,1.0)
>>> np.savetxt('test.out', x, delimiter=',') # X is an array
>>> np.savetxt('test.out', (x,y,z)) # x,y,z equal sized 1D arrays
>>> np.savetxt('test.out', x, fmt='%1.4e') # use exponential notation
You'll need to gather all of the concurrency and rate values into arrays, then save as csv:
concurrency, rate = [], []
for cube in root.findall('.//ex:Cube[#currency]', namespaces=namespaces):
concurrency.append(cube.attrib['concurrency'])
rate.append(cube.attrib['rate'])
np.savetext('file.csv', (concurrency, rate), delimeter='c')

Numpy put .txt integervalues into an array [duplicate]

How can I import an array to python (numpy.arry) from a file and that way the file must be written if it doesn't already exist.
For example, save out a matrix to a file then load it back.
Checkout the entry on the numpy example list. Here is the entry on .loadtxt()
>>> from numpy import *
>>>
>>> data = loadtxt("myfile.txt") # myfile.txt contains 4 columns of numbers
>>> t,z = data[:,0], data[:,3] # data is 2D numpy array
>>>
>>> t,x,y,z = loadtxt("myfile.txt", unpack=True) # to unpack all columns
>>> t,z = loadtxt("myfile.txt", usecols = (0,3), unpack=True) # to select just a few columns
>>> data = loadtxt("myfile.txt", skiprows = 7) # to skip 7 rows from top of file
>>> data = loadtxt("myfile.txt", comments = '!') # use '!' as comment char instead of '#'
>>> data = loadtxt("myfile.txt", delimiter=';') # use ';' as column separator instead of whitespace
>>> data = loadtxt("myfile.txt", dtype = int) # file contains integers instead of floats
Another option is numpy.genfromtxt, e.g:
import numpy as np
data = np.genfromtxt("myfile.dat",delimiter=",")
This will make data a numpy array with as many rows and columns as are in your file
(I know the question is old, but I think this might be good as a reference for people with similar questions)
If you want to load data from an ASCII/text file (which has the benefit or being more or less human-readable and easy to parse in other software), numpy.loadtxt is probably what you want:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
If you just want to quickly save and load numpy arrays/matrices to and from a file, take a look at numpy.save and numpy.load:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.save.html
http://docs.scipy.org/doc/numpy/reference/generated/numpy.load.html
In Python, Storing a bare python list as a numpy.array and then saving it out to file, then loading it back, and converting it back to a list takes some conversion tricks. The confusion is because python lists are not at all the same thing as numpy.arrays:
import numpy as np
foods = ['grape', 'cherry', 'mango']
filename = "./outfile.dat.npy"
np.save(filename, np.array(foods))
z = np.load(filename).tolist()
print("z is: " + str(z))
This prints:
z is: ['grape', 'cherry', 'mango']
Which is stored on disk as the filename: outfile.dat.npy
The important methods here are the tolist() and np.array(...) conversion functions.
Have a look at SciPy cookbook. It should give you an idea of some basic methods to import /export data.
If you save/load the files from your own Python programs, you may also want to consider the Pickle module, or cPickle.

Load many files into one array - Python

So, I have to load many .mat files with some features to plot it.
Each array to be plotted is loaded into a dictionary:
import numpy as np
import scipy.io as io
dict1 = io.loadmat('file1.MAT')
dict2 = io.loadmat('file2.MAT') # type = dict
dict3 = io.loadmat('file3.MAT')
...
so I have to take the dictionarie's element I need, to plot after:
array1 = dict1['data']
array2 = dict2['data']
array3 = dict3['data']
...
After this, I can plot the data. It works, but looks dumb to me (If I have 100 vectors, it will take some time...). Is there a better way to make this task?
Given that you are talking about dealing with many matrices, you should manage them as a collection. First, let's define your set of files. It could be a tuple, or a list:
Matrix_files = [ 'fileA.MAT', 'file1.MAT', 'no pattern to these names.MAT' ]
If they happen to have a pattern, you might try generating the names:
Matrix_files = [ 'file{}.MAT'.format(num) for num in range(1,4) ]
If they share a common location, you might consider using one of the various directory scanning approaches (opendir or glob, to name two).
Once you have a list of filenames, you can read the dictionaries in:
def read_matrix(filespec):
from scipy.io import loadmat
md = loadmat(filespec)
# process md
return md
With that, you can either get all the data, or get some of the data:
All_data = [read_matrix(f) for f in Matrix_files]
Some_data = [read_matrix(f)['data'] for f in Matrix_files]
If you only care about the data, you could skip the function definition:
from scipy.io import loadmat
Just_data = [loadmat(f)['data'] for f in Matrix_files]

How do I get a text output from a string created from an array to remain unshortened?

Python/Numpy Problem. Final year Physics undergrad... I have a small piece of code that creates an array (essentially an n×n matrix) from a formula. I reshape the array to a single column of values, create a string from that, format it to remove extraneous brackets etc, then output the result to a text file saved in the user's Documents directory, which is then used by another piece of software. The trouble is above a certain value for "n" the output gives me only the first and last three values, with "...," in between. I think that Python is automatically abridging the final result to save time and resources, but I need all those values in the final text file, regardless of how long it takes to process, and I can't for the life of me find how to stop it doing it. Relevant code copied beneath...
import numpy as np; import os.path ; import os
'''
Create a single column matrix in text format from Gaussian Eqn.
'''
save_path = os.path.join(os.path.expandvars("%userprofile%"),"Documents")
name_of_file = 'outputfile' #<---- change this as required.
completeName = os.path.join(save_path, name_of_file+".txt")
matsize = 32
def gaussf(x,y): #defining gaussian but can be any f(x,y)
pisig = 1/(np.sqrt(2*np.pi) * matsize) #first term
sumxy = (-(x**2 + y**2)) #sum of squares term
expden = (2 * (matsize/1.0)**2) # 2 sigma squared
expn = pisig * np.exp(sumxy/expden) # and put it all together
return expn
matrix = [[ gaussf(x,y) ]\
for x in range(-matsize/2, matsize/2)\
for y in range(-matsize/2, matsize/2)]
zmatrix = np.reshape(matrix, (matsize*matsize, 1))column
string2 = (str(zmatrix).replace('[','').replace(']','').replace(' ', ''))
zbfile = open(completeName, "w")
zbfile.write(string2)
zbfile.close()
print completeName
num_lines = sum(1 for line in open(completeName))
print num_lines
Any help would be greatly appreciated!
Generally you should iterate over the array/list if you just want to write the contents.
zmatrix = np.reshape(matrix, (matsize*matsize, 1))
with open(completeName, "w") as zbfile: # with closes your files automatically
for row in zmatrix:
zbfile.writelines(map(str, row))
zbfile.write("\n")
Output:
0.00970926751178
0.00985735189176
0.00999792646484
0.0101306077521
0.0102550302672
0.0103708481917
0.010477736974
0.010575394844
0.0106635442315
.........................
But using numpy we simply need to use tofile:
zmatrix = np.reshape(matrix, (matsize*matsize, 1))
# pass sep or you will get binary output
zmatrix.tofile(completeName,sep="\n")
Output is in the same format as above.
Calling str on the matrix will give you similarly formatted output to what you get when you try to print so that is what you are writing to the file the formatted truncated output.
Considering you are using python2, using xrange would be more efficient that using rane which creates a list, also having multiple imports separated by colons is not recommended, you can simply:
import numpy as np, os.path, os
Also variables and function names should use underscores z_matrix,zb_file,complete_name etc..
You shouldn't need to fiddle with the string representations of numpy arrays. One way is to use tofile:
zmatrix.tofile('output.txt', sep='\n')

Categories