I have Python code that converts (u, v) to (s, d):
import math

def d2r(d):
    r = d * math.pi / 180.0
    return r

def r2d(r):
    d = r * 180.0 / math.pi
    return d

def sd2uv(s, d):
    r = d2r(d)
    u = s * math.sin(r)
    v = s * math.cos(r)
    return (u, v)

def uv2sd(u, v):
    s = math.sqrt((u*u) + (v*v))
    r = math.atan2(u, v)
    d = r2d(r)
    if d < 0:
        d = 360 + d
    return (s, d)
The u data are stored in u.txt, one number per line; the v data are stored in v.txt, also one number per line. My question is: how do I read the data from these two files and then use them in the Python code above to print (s, d)? Thanks!
I think this should do it:
with open('u.txt') as uf, open('v.txt') as vf:
    for u, v in zip(uf, vf):
        print uv2sd(float(u), float(v))
from itertools import izip, imap

with open('u.txt') as u_data, open('v.txt') as v_data:
    for u, v in izip(imap(float, u_data), imap(float, v_data)):
        print uv2sd(u, v)
I can imagine two ways of doing this:
Read all of the data from each file into two separate lists. Iterate through both lists and compute each value until you reach the end of one of the lists.
Read one line from each file at a time. Compute the value you are looking for. Repeat until you have exhausted both files.
The first approach has the advantage of keeping the data around for subsequent use (if needed) without having to open and read the files again. It may not work well at all if you have a very large data set in the files.
The second approach has the advantage of saving some memory if you only need to use the data once in the program. It could be slower if you need to use the data over and over again.
The first way may look like this:
with open('u.txt') as u_file, open('v.txt') as v_file:
    u_values = u_file.readlines()
    v_values = v_file.readlines()
    for u, v in zip(u_values, v_values):
        print uv2sd(float(u), float(v))
# We can use u_values and v_values again if we need to now
The second way is what Akavall and gnibbler came up with.
Background: I have two catalogues containing the positions of spatial objects. My aim is to find matching objects in the two catalogues whose angular distance differs by at most a given value. One of the catalogues is called bss and the other is called super.
Here is the full code I wrote
import numpy as np

def crossmatch(bss_cat, super_cat, max_dist):
    matches = []
    no_matches = []

    def find_closest(bss_cat, super_cat):
        dist_list = []

        def angular_dist(ra1, dec1, ra2, dec2):
            r1 = np.radians(ra1)
            d1 = np.radians(dec1)
            r2 = np.radians(ra2)
            d2 = np.radians(dec2)
            a = np.sin(np.abs(d1 - d2)/2)**2
            b = np.cos(d1)*np.cos(d2)*np.sin(np.abs(r1 - r2)/2)**2
            rad = 2*np.arcsin(np.sqrt(a + b))
            d = np.degrees(rad)
            return d

        for i in range(len(bss_cat)):  # The problem arises here
            for j in range(len(super_cat)):
                distance = angular_dist(bss_cat[i][1], bss_cat[i][2], super_cat[j][1], super_cat[j][2])  # While this is supposed to produce single floating point values, it produces numpy.ndarray consisting of three entries
                dist_list.append(distance)  # This list now contains numpy.ndarrays instead of numpy.float values
            for k in range(len(dist_list)):
                if dist_list[k] < max_dist:
                    element = (bss_cat[i], super_cat[j], dist_list[k])
                    matches.append(element)
                else:
                    element = bss_cat[i]
                    no_matches.append(element)

    return (matches, no_matches)
When used separately, the function angular_dist(ra1, dec1, ra2, dec2) produces a single numpy.float value as expected. But when used inside the for loop in this crossmatch(bss_cat, super_cat, max_dist) function, it produces numpy.ndarrays instead of numpy.float values. I've noted this inside the code as well. I don't know where the code goes wrong. Please help.
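Purely as an illustration of the symptom described above (not necessarily the cause in the actual catalogues, whose format is not shown here): if a catalogue row stores its RA/Dec as three-element arrays rather than single floats, every NumPy call in angular_dist broadcasts over them and the result is a three-entry ndarray.

import numpy as np

# Hypothetical row whose coordinates are 3-element arrays rather than scalars
row = (0, np.array([14.0, 21.0, 36.0]), np.array([-62.0, 11.0, 25.0]))

print(np.radians(row[1]))           # -> a 3-entry ndarray, not a single float
print(np.sin(np.radians(row[1])))   # every later step in angular_dist stays array-shaped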
I've been trying to use the Python module Music21 to try and get the key from a set of chords, but no matter what I put in it always seems to return c minor. Any ideas what I'm doing wrong?
I've tried a variety of input strings, the print statement spits out all the right chord names but the resulting key is always c minor!
I'm using Python 3.7.4 on Windows with VSCode.
from music21 import stream, harmony

string = 'D, Em, F#m, G, A, Bm'
s = stream.Stream()
for c in string.split(','):
    print(harmony.ChordSymbol(c).pitchedCommonName)
    s.append(harmony.ChordSymbol(c))
key = s.analyze('key')
print(key)
It works if you give the ChordSymbol some length of time. The analysis weights the components by time, so a time of zero (default for ChordSymbols) will give you nonsense.
from music21 import stream, harmony

d = harmony.ChordSymbol('D')
d.quarterLength = 2  # give the chord a nonzero duration so the analysis can weight it
s = stream.Stream([d])
s.analyze('key')
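Applied to the chord string from the question, that might look like the sketch below (the .strip() and the choice of two quarter notes per chord are my own additions):

from music21 import stream, harmony

string = 'D, Em, F#m, G, A, Bm'
s = stream.Stream()
for c in string.split(','):
    cs = harmony.ChordSymbol(c.strip())
    cs.quarterLength = 2   # nonzero duration so the key analysis can weight the chord
    s.append(cs)
print(s.analyze('key'))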
It looks like music21's analyze() does not work correctly with ChordSymbol objects.
As an alternative, you can manually add each chord's notes to the stream and analyze that. The code:
from music21 import stream, harmony, note

string = 'D, Em, F#m, G, A, Bm'
s = stream.Stream()
for d in string.split(','):
    print(harmony.ChordSymbol(d).pitchedCommonName)
    for p in harmony.ChordSymbol(d).pitches:
        n = note.Note()
        n.pitch = p
        s.append(n)
key = s.analyze('key')
print(key)
returns a D major key, as expected.
I need to store the solution of an expensive FEM calculation to use it in further analysis. Browsing through tutorials, I have so far discovered that I may store my results like this:
from fenics import *
mesh = Mesh('mesh/UnitSquare8x8.xml')
V = FunctionSpace(mesh, 'P', 1)
u = TrialFunction(V)
v = TestFunction(V)
f = Constant(-6.0)
a = dot(grad(u), grad(v))*dx
L = f*v*dx
u_D = Expression('1 + x[0]*x[0] + 2*x[1]*x[1]', degree=2)
def boundary(x, on_boundary):
    return on_boundary
bc = DirichletBC(V, u_D, boundary)
A = assemble(a)
b = assemble(L)
bc.apply(A, b)
u = Function(V)
solver = KrylovSolver("cg", "ilu")
solver.solve(A, u.vector(), b)
File('solution.xml') << u.vector()
and later load them like this:
from fenics import *
mesh = Mesh('mesh/UnitSquare8x8.xml')
V = FunctionSpace(mesh, 'P', 1)
u = Function(V)
File('solution.xml') >> u.vector()
Unfortunately, I hardly know what exactly I am doing here. Is that a proper way of storing and loading calculated results? Is the order of elements in u.vector() (for the same mesh file) fixed within/between different FEniCS versions, or is it just an implementation detail which may change at any time? If it is unsafe, then what is the proper way of doing so?
I have found another (possibly even more dangerous) solution. I may use the VALUES = u.vector().get_local() and u.vector().set_local(VALUES) methods, since VALUES is a NumPy array which I can easily store and load.
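For completeness, a minimal sketch of that idea, assuming the u and V from the code above (the file name is my own; note the caveat about element ordering in the answer below):

import numpy as np

# Store: dump the degrees of freedom to a NumPy file
np.save('solution_dofs.npy', u.vector().get_local())

# Load: into a Function built on the same FunctionSpace V
u_loaded = Function(V)
u_loaded.vector().set_local(np.load('solution_dofs.npy'))
u_loaded.vector().apply('insert')  # finalize the assignment in DOLFIN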
No. According to the answer to "Is the order of Vector elements preserved between runs?", the order of Vector elements is not guaranteed to be preserved.
It is recommended to use XDMFFile.write_checkpoint() and XDMFFile.read_checkpoint() methods instead.
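A minimal sketch of the checkpoint approach, assuming the mesh and function space from the question (the file and field names here are my own):

from fenics import *

# ... assemble and solve for u on V exactly as in the question ...

# Store: write a checkpoint that can later be read back into a Function
outfile = XDMFFile('solution.xdmf')
outfile.write_checkpoint(u, 'u', 0)
outfile.close()

# Load: rebuild the same mesh and function space, then read the checkpoint back
mesh = Mesh('mesh/UnitSquare8x8.xml')
V = FunctionSpace(mesh, 'P', 1)
u_loaded = Function(V)
infile = XDMFFile('solution.xdmf')
infile.read_checkpoint(u_loaded, 'u', 0)
infile.close()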
I'm trying to load a large number of files saved in the Ensight gold format into a NumPy array. To carry out this read, I've written my own class libvec, which reads the geometry file and then preallocates the arrays which Python will use to store the data, as shown in the code below.
import os
import numpy as np
# 'vec' is the module providing the libvec reader class described above;
# file_list, current_dir and casefile are assumed to be defined elsewhere

N = len(file_list)

# Create the class object and read geometry file
gvec = vec.libvec(os.path.join(current_dir, casefile))
x, y, z = gvec.xyz()

# Preallocate arrays
U_temp = np.zeros((len(y), len(x), N), dtype=np.dtype('f4'))
V_temp = np.zeros((len(y), len(x), N), dtype=np.dtype('f4'))
u_temp = np.zeros((len(x), len(x), N), dtype=np.dtype('f4'))
v_temp = np.zeros((len(x), len(y), N), dtype=np.dtype('f4'))

# Read the individual files into the previously allocated arrays
for idx, current_file in enumerate(file_list):
    U, V = gvec.readvec(os.path.join(current_dir, current_file))
    U_temp[:, :, idx] = U
    V_temp[:, :, idx] = V
    del U, V
However, this takes seemingly forever, so I was wondering if you have any idea how to speed up this process. The code reading the individual files into the array structure can be seen below:
def readvec(self, filename):
    # we are supposing for the moment that the naming scheme PIV__vxy.case / PIV__vxy.geo does not change;
    # should that not be the case, appropriate changes have to be made to the corresponding file
    data_temp = np.loadtxt(filename, dtype=np.dtype('f4'), delimiter=None, converters=None, skiprows=4)
    # U value
    for i in range(len(self.__y)):
        # x value counter
        for j in range(len(self.__x)):
            # y value counter
            self.__U[i, j] = data_temp[i*len(self.__x) + j]
    # V value
    for i in range(len(self.__y)):
        # x value counter
        for j in range(len(self.__x)):
            # y value counter
            self.__V[i, j] = data_temp[len(self.__x)*len(self.__y) + i*len(self.__x) + j]
    # W value
    if len(self.__z) > 1:
        for i in range(len(self.__y)):
            # x value counter
            for j in range(len(self.__xd)):
                # y value counter
                self.__W[i, j] = data_temp[2*len(self.__x)*len(self.__y) + i*len(self.__x) + j]
        return self.__U, self.__V, self.__W
    else:
        return self.__U, self.__V
Thanks a lot in advance and best regards,
J
It's a bit hard to say without any test input/output to compare against, but I think this would give you the same U/V arrays as your nested for loops in readvec. This method should be considerably faster than the for loops.
size_x, size_y = len(self.__x), len(self.__y)
# reshape to (size_y, size_x) so that row i corresponds to y[i] and column j to x[j],
# matching the ordering of your nested loops
U = data_temp[:size_x*size_y].reshape(size_y, size_x)
V = data_temp[size_x*size_y:2*size_x*size_y].reshape(size_y, size_x)
Returning these directly into U_temp and V_temp should also help. Right now you're doing 3(?) copies of your data to get them into U_temp and V_temp:
1. From the file into data_temp
2. From data_temp into self.__U/self.__V
3. From U/V into U_temp/V_temp
Although my guess is that the two nested for loops, accessing one element at a time, are what is causing the slowness.
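To avoid the element-by-element copies entirely, a vectorized readvec might look roughly like the following. This is only a sketch, meant to replace the method inside the libvec class, and it assumes the same 4-line header and the self.__x / self.__y / self.__z attributes used in the question:

def readvec(self, filename):
    nx, ny = len(self.__x), len(self.__y)
    # Read the whole file at once, skipping the header lines
    data = np.loadtxt(filename, dtype=np.dtype('f4'), skiprows=4)
    # Each block is laid out row-major: flat index i*nx + j maps to (y[i], x[j])
    U = data[:nx*ny].reshape(ny, nx)
    V = data[nx*ny:2*nx*ny].reshape(ny, nx)
    if len(self.__z) > 1:
        W = data[2*nx*ny:3*nx*ny].reshape(ny, nx)
        return U, V, W
    return U, V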
I have a file whose contents are of the form:
.2323 1
.2327 1
.3432 1
.4543 1
and so on, for some 10,000 lines in each file.
I have a variable whose value is, say, a = .3344.
From the file I want to get the row number of the row whose first column is closest to this variable. For example, it should give row_num = '3', as .3432 is closest to it.
I have tried loading the first column's elements into a list and then comparing the variable against each element to get the index number.
Doing it this way is very time consuming and slows down my model. I want a very quick method, as this needs to be called some 1000 times at minimum.
I want a method with the least overhead that is very quick; can anyone please tell me how this can be done very fast?
As the file size is at most 100 KB, can this be done directly without loading it into a list or anything? If yes, how can it be done?
Any method quicker than the one mentioned above is welcome, but I am desperate to improve the speed. Please help.
def get_list(file, cmp, fout):
    ind, _ = min(enumerate(file), key=lambda x: abs(x[1] - cmp))
    return fout[ind].rstrip('\n').split(' ')
#root = r'c:\begpython\wavnk'
header = 6

for lst in lists:
    save = database_index[lst]
    #print save
    index, base, abs2, _, abs1 = save
    using_data[index] = save
    base = 'C:/begpython/wavnk/' + base.replace('phone', 'text')
    fin, fout = base + '.pm', base + '.mcep'
    file = open(fin)
    fout = open(fout).readlines()
    [next(file) for _ in range(header)]
    file = [float(line.partition(' ')[0]) for line in file]
    join_cost_index_end[index] = get_list(file, float(abs1), fout)
    join_cost_index_strt[index] = get_list(file, float(abs2), fout)
This is the code I was using (copying the file into a list). Please suggest better alternatives to this.
Building on John Kugelman's answer, here's a way you might be able to do a binary search on a file with fixed-length lines:
class SubscriptableFile(object):
    def __init__(self, file):
        self._file = file
        file.seek(0, 0)
        self._line_length = len(file.readline())
        file.seek(0, 2)
        self._len = file.tell() / self._line_length

    def __len__(self):
        return self._len

    def __getitem__(self, key):
        self._file.seek(key * self._line_length)
        s = self._file.readline()
        if s:
            return float(s.split()[0])
        else:
            raise KeyError('Line number too large')
This class wraps a file in a list-like structure, so that now you can use the functions of the bisect module on it:
import bisect

def find_row(file, target):
    fw = SubscriptableFile(file)
    i = bisect.bisect_left(fw, target)
    if fw[i + 1] - target < target - fw[i]:
        return i + 1
    else:
        return i
Here file is an open file object and target is the number you want to find. The function returns the number of the line with the closest value.
I will note, however, that the bisect module will try to use a C implementation of its binary search when it is available, and I'm not sure if the C implementation supports this kind of behavior. It might require a true list, rather than a "fake list" (like my SubscriptableFile).
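Hypothetical usage, assuming numbers.txt is sorted and has fixed-length lines:

with open('numbers.txt') as f:
    print(find_row(f, .3344))   # index of the line whose first column is closest to .3344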
Is the data in the file sorted in numerical order? Are all the lines of the same length? If not, the simplest approach is best. Namely, reading through the file line by line. There's no need to store more than one line in memory at a time.
Code:
def closest(num):
    closest_row = None
    closest_row_num = None
    closest_value = None
    for row_num, row in enumerate(file('numbers.txt')):
        value = float(row.split()[0])
        if closest_value is None or abs(value - num) < abs(closest_value - num):
            closest_row = row
            closest_row_num = row_num
            closest_value = value
    return (closest_row_num, closest_row)

print closest(.3344)
Output for sample data:
(2, '.3432 1\n')
If the lines are all the same length and the data is sorted, then there are some optimizations that will make this a very fast process. All the lines being the same length lets you seek directly to particular lines (you can't do this in a normal text file with lines of different lengths), which in turn enables you to do a binary search.
A binary search would be massively faster than a linear search. A linear search will on average have to read 5,000 lines of a 10,000 line file each time, whereas a binary search would on average only read log2 10,000 ≈ 13 lines.
Load it into a list then use bisect.
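A minimal sketch of that idea, assuming the file is sorted by its first column (file name reused from the earlier answer):

import bisect

# Load the first column into a sorted list once
with open('numbers.txt') as f:
    values = [float(line.split()[0]) for line in f]

def closest_row_num(target):
    i = bisect.bisect_left(values, target)
    if i == 0:
        return 0
    if i == len(values):
        return len(values) - 1
    # Pick whichever neighbour is nearer to the target
    return i if values[i] - target < target - values[i - 1] else i - 1

print(closest_row_num(.3344))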