Reading multiple files and arrays - python

I need to read the values from text files into an array, Z. This works fine with a single file, ChiTableSingle, but fails when I try to use multiple files. It seems to read the lines correctly and produces Z, but Z[0] comes out as just [], and then I get the error "setting an array element with a sequence".
This is my current code:
rootdir='C:\users\documents\ChiGrid'
fileNameTemplate = r'C:\users\documents\ContourPlots\Plot{0:02d}.png'
for subdir,dirs,files in os.walk(rootdir):
    for count, file in enumerate(files):
        fh=open(os.path.join(subdir,file),'r')
        #fh = open( "ChiTableSingle.txt" );
        print 'file is '+ str(file)
        Z = []
        for line in fh.readlines():
            y = [value for value in line.split()]
            Z.append( y )
        print Z[0][0]
        fh.close()
        plt.figure() # Create a new figure window
        Temp=open('TempValues.txt','r')
        lineTemp=Temp.readlines()
        for i in range(0, len(lineTemp)):
            lineTemp[i]=[float(lineTemp[i])]
        Grav=open('GravValues2.txt','r')
        lineGrav=Grav.readlines()
        for i in range(0, len(lineGrav)):
            lineGrav[i]=[float(lineGrav[i])]
        X,Y = np.meshgrid(lineTemp, lineGrav) # Create 2-D grid xlist,ylist values
        plt.contour(X, Y, Z,[1,2,3], colors = 'k', linestyles = 'solid')
        plt.savefig(fileNameTemplate.format(count), format='png')
        plt.clf()

The first thing I noticed is that your list comprehension y = [value for ...] is only going to return a list of strings (from the split() function), so you will want to convert them to a numeric format at some point before trying to plot it.
In addition, if the files you are reading in are simply white-space delimited tables of numbers, you should consider using numpy.loadtxt(fh) since it takes care of splitting and type conversion and returns a 2-d numpy.array. You can also add comment text that it will ignore if the line starts with the regular python comment character (e.g. # this line is a comment and will be ignored).
Just another thought: I would be careful about using variable names that are the same as a Python built-in (e.g. the word file in this case, which is a built-in type in Python 2). Once you rebind the name to something else, the built-in is no longer reachable under that name in that scope.
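As a rough illustration of the numpy.loadtxt suggestion, here is a minimal sketch. The table contents are made up, and an in-memory io.StringIO stands in for the real file handle, so treat the names and values as assumptions rather than the asker's data.

```python
import io
import numpy as np

# Hypothetical file contents: a whitespace-delimited table with a comment line.
sample = io.StringIO(
    "# chi-squared grid (this line is ignored)\n"
    "1.0 2.0 3.0\n"
    "4.0 5.0 6.0\n"
)

# loadtxt splits on whitespace, converts to float, and skips '#' comments.
Z = np.loadtxt(sample)

print(Z.shape)   # (2, 3)
print(Z.dtype)   # float64
```

Because the result is already a 2-d float array, it can be handed to plt.contour without the manual split and conversion step.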


How to split chunks of xy data into lists between isalpha() and newline \n

So I've got a cleaned up datafile of number strings, representing coordinates for polygons. I've had experience assigning one polygon's data in a datafile into a column and plotting it in numpy/matplotlib, but for this I have to plot multiple polygons from one datafile separated by headers. The data isn't evenly sized either; every header has several lines of data in two columns, but not the same amount of lines.
i.e. I've used .readlines() to go from:
# Title of the polygons
# a comment on the datasource
# A comment on polygon projection
Poly Two/a bit
(331222.6210000003, 672917.1531000007)
(331336.0946000004, 672911.7816000003)
(331488.4949000003, 672932.4191999994)
##etc
Poly One
[(331393.15660000034, 671982.1392999999), (331477.28839999996, 671959.8816), (331602.10170000046, 671926.8432999998), (331767.28160000034, 671894.7273999993), (331767.28529999964, 671894.7267000005), (##etc)]
to:
PolyOneandTwo
319547.04899999965,673790.8118999992
319553.2614000002,673762.4122000001
319583.4143000003,673608.7760000005
319623.6182000004,673600.1608000007
319685.3598999996,673600.1608000007
##etc
PolyTwoandabit
319135.9966000002,673961.9215999991
319139.7357999999,673918.9201999996
319223.0153000001,673611.6477000006
319254.6040000003,673478.1133999992
##etc etc
PolyOneHundredFifty
##etc
My code so far involves cleaning the original dataset up to make it like you see above;
data_easting=[]
data_northing=[]
County = open('counties.dat','r')
for line in County.readlines():
    if line.lstrip().startswith('#'):
        print ('Comment line ignored and leading whitespace removed')
        continue
    line = line.replace('/','and').replace(' ','').replace('[','').replace(']','').replace('),(','\n')
    line = line.strip('()\n')
    print (line)
    if line.isalpha():
        print ('Skipped header: '+ line)
        continue
I've been using isalpha(): to ignore the headers for each polygon so far, and I was planning on using if line == '\n': continue and line.split(',') to ignore the newlines between data and begin splitting the Easting and Northing lists. I've already got the numpy and matplotlib section of the code (not shown) sorted to make 1 polygon, but I don't know how to implement it to plot multiple arrays/multiple polygons.
I realised early on though that if I tried to assign all the data to the 2 x and y lists, that would just make one large unending polygon that will make a spaghetti mess of my plot as imaginary lines will be drawn to connect them up.
I want to use the isalpha() section to instead identify and make a Dictionary or List of the polygon names, and attach an array for each polygon datablock to that, but I'm not sure of how to implement it (or if you even can). I'm also not certain how to make it stop loading data into a list at the end of a polygon datablock (maybe if line == '\n': break? but how to make it start and stop again 149 more times for each other chunk?).
To make it a bit more difficult, there are 150 polygons with x and y data in this file, so creating separate x and y lists for each individual polygon and writing specific code for each wouldn't be very efficient.
So, how do I basically do:
if line.isalpha():
    #(assign to a Counties dictionary or a list as PolyOne, PolyTwo,...PolyOneHundredFifty)
    #(a way of getting the data between the header and newline into a separate list)
    #(a way to relate that PolyOne Data list of x and y to the dictionary "PolyOne")
if line == '\n':
    #(break? continue?)
    #(then restart and repeat for PolyTwo,...PolyOneHundredFifty)
line.split (',')
data_easting.append(x) #x1,x2,...x150?
data_northing.append(y) #y1,y2,y150?)
Is there a way of doing what I intend? How would I go about that without pandas?
Thanks for your time.
Parsing the raw data/file:
When you encounter a line/block like the second in your example,
>>> s = '''[(331393.15660000034, 671982.1392999999), (331477.28839999996, 671959.8816), (331602.10170000046, 671926.8432999998), (331767.28160000034, 671894.7273999993), (331767.28529999964, 671894.7267000005)]'''
it can be converted directly to a 2d numpy array as follows using ast.literal_eval which is a safe way to convert text to a python object - in this case a list of tuples.
>>> import numpy as np
>>> import ast
>>> if s.startswith('['):
...     #print(ast.literal_eval(s))
...     array = np.array(ast.literal_eval(s))
...
>>> array
array([[331393.1566, 671982.1393],
[331477.2884, 671959.8816],
[331602.1017, 671926.8433],
[331767.2816, 671894.7274],
[331767.2853, 671894.7267]])
>>> array.shape
(5, 2)
For the blocks that resemble the first in your (raw) example, accumulate each line as a tuple of floats in a list; when you reach the next block, make an array of that list and reset it. I put this all in a generator function which yields blocks as 2-d arrays.
import ast
import numpy as np
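def parse(lines_read):
    data = []
    for line in lines_read:
        line = line.strip()
        if line.startswith('#'):
            continue
        elif line.startswith('('):
            # one "(x, y)" pair per line
            n,e = line.strip('()').split(',')
            data.append((float(n),float(e)))
        elif line.startswith('['):
            # a whole polygon on a single line
            yield np.array(ast.literal_eval(line))
        elif data:
            # a header (or blank) line ends the block collected so far
            yield np.array(data)
            data = []
    if data:
        # flush the last block if the file ends without another header
        yield np.array(data)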
Used like this
>>> for block in parse(f.readlines()):
... print(block)
... print('*******************')
[[331222.621 672917.1531]
[331336.0946 672911.7816]
[331488.4949 672932.4192]]
*******************
[[331393.1566 671982.1393]
[331477.2884 671959.8816]
[331602.1017 671926.8433]
[331767.2816 671894.7274]
[331767.2853 671894.7267]]
*******************
>>>
If you need to select the northing or easting columns separately, consult the Numpy docs.
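For reference, column selection on one of these blocks is plain NumPy slicing. A short sketch, using the first block from the output above as a hard-coded stand-in (the variable names are only illustrative):

```python
import numpy as np

# One block as yielded by a parser like the one above (values from the example).
block = np.array([
    [331222.621, 672917.1531],
    [331336.0946, 672911.7816],
    [331488.4949, 672932.4192],
])

first_col = block[:, 0]    # every row, first column
second_col = block[:, 1]   # every row, second column

# Transposing and unpacking gives both columns in one step.
xs, ys = block.T
```

Either form gives 1-d arrays that can be passed directly to a plotting call, one call per block, so separate polygons are never joined by stray lines.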
Parsing with two regular expressions. This operates on the whole file read as a string: s = fileobject.read(). It needs to go over the file twice and does not preserve the block order.
import re, ast
import numpy as np
pattern1 = re.compile(r'(\n\([^)]+\))+')
pattern2 = re.compile(r'^\[[^]]+\]',flags=re.MULTILINE)
for m in pattern1.finditer(s):
    block = m.group().strip().split('\n')
    data = []
    for line in block:
        line = line[1:-1]
        n,e = map(float,line.split(','))
        data.append((n,e))
    print(np.array(data))
    print('****')
for m in pattern2.finditer(s):
    print(np.array(ast.literal_eval(m.group())))
    print('****')
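Neither approach above keeps the polygon names, which the question asked for. A possible sketch, assuming the cleaned-up format from the question (a purely alphabetic header line followed by "x,y" lines), is to key a dictionary on each header. The function name read_polygons and the sample data are assumptions for illustration only.

```python
def read_polygons(lines):
    """Group 'x,y' lines under the alphabetic header line that precedes them."""
    polygons = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith('#'):
            continue                      # skip blank lines and comments
        if line.isalpha():
            current = line                # a new header starts a new polygon
            polygons[current] = []
        elif current is not None:
            x, y = line.split(',')
            polygons[current].append((float(x), float(y)))
    return polygons


# Hypothetical sample in the cleaned-up format shown in the question.
sample = """PolyOneandTwo
319547.04899999965,673790.8118999992
319553.2614000002,673762.4122000001
PolyTwoandabit
319135.9966000002,673961.9215999991
""".splitlines()

polygons = read_polygons(sample)
for name, points in polygons.items():
    xs, ys = zip(*points)   # separate coordinate sequences for plotting
    print(name, len(points))
```

No explicit start/stop logic is needed: the header line itself switches which list receives the following coordinates, so the same loop handles 2 or 150 polygons.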

pass in a list of labels for generation of figures in for loop

Currently trying to save figures each with a name coming from a list in a for loop.
Input:
plot_l=['test1','test2',.........]
for i in range(ydata.shape[1]):
    plt.figure()
    fig, ax = plt.subplots(constrained_layout=True)
    ax.plot(dsfg,ydata[i])
    ax.set_xlabel('dsfg')
    ax.set_ylabel('tzui')
    ax.set_title('misc ')
    secax.set_xlabel('len')
    plt.savefig('plot_l{0}.jpg'.format(i))
    plt.close()
Output: The figures are generated but with incorrect figure name, ie,
plot_l1.png
plot_l2.png
plot_l3.png
Desired Output:
test1.png
test2.png
test3.png
I have also tried plt.savefig(plot_l[i]+'.png') in place of plt.savefig('plot_l{0}.jpg'.format(i)) Suggestions are welcome....thanks
You are iterating on integers generated by range:
for i in range(ydata.shape[1]):
And you are naming the files with this integer i; the plot_l inside the quotes is just literal text, not a reference to your list:
plt.savefig('plot_l{0}.jpg'.format(i))
Assuming there is no error and the list of names contains as many names as there are iterations on i (i.e. ydata.shape[1] == len(plot_l)), then you can replace the savefig with:
plt.savefig(f'{plot_l[i]}.jpg')
The notation f followed by a string (an f-string, available from Python 3.6) is equivalent to str.format(), but is more explicit because the expression being formatted sits inside the {}.
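An alternative that avoids indexing altogether is to iterate over the labels and the data together with zip. The sketch below uses made-up stand-ins for plot_l and ydata and only builds the file names; the plotting and savefig calls are left as a comment so the snippet runs without matplotlib.

```python
# Hypothetical stand-ins for the question's plot_l and ydata.
plot_l = ['test1', 'test2', 'test3']
ydata = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

filenames = []
for label, row in zip(plot_l, ydata):
    filename = f'{label}.png'
    # In the real loop, plot `row` here and call plt.savefig(filename).
    filenames.append(filename)

print(filenames)   # ['test1.png', 'test2.png', 'test3.png']
```

zip stops at the shorter of the two sequences, so a mismatch between the number of labels and the number of data rows silently drops the extras rather than raising an IndexError.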

How do I get a text output from a string created from an array to remain unshortened?

Python/Numpy Problem. Final year Physics undergrad... I have a small piece of code that creates an array (essentially an n×n matrix) from a formula. I reshape the array to a single column of values, create a string from that, format it to remove extraneous brackets etc, then output the result to a text file saved in the user's Documents directory, which is then used by another piece of software. The trouble is above a certain value for "n" the output gives me only the first and last three values, with "...," in between. I think that Python is automatically abridging the final result to save time and resources, but I need all those values in the final text file, regardless of how long it takes to process, and I can't for the life of me find how to stop it doing it. Relevant code copied beneath...
import numpy as np; import os.path ; import os
'''
Create a single column matrix in text format from Gaussian Eqn.
'''
save_path = os.path.join(os.path.expandvars("%userprofile%"),"Documents")
name_of_file = 'outputfile' #<---- change this as required.
completeName = os.path.join(save_path, name_of_file+".txt")
matsize = 32
def gaussf(x,y): #defining gaussian but can be any f(x,y)
    pisig = 1/(np.sqrt(2*np.pi) * matsize) #first term
    sumxy = (-(x**2 + y**2)) #sum of squares term
    expden = (2 * (matsize/1.0)**2) # 2 sigma squared
    expn = pisig * np.exp(sumxy/expden) # and put it all together
    return expn
matrix = [[ gaussf(x,y) ]\
          for x in range(-matsize/2, matsize/2)\
          for y in range(-matsize/2, matsize/2)]
zmatrix = np.reshape(matrix, (matsize*matsize, 1)) # column
string2 = (str(zmatrix).replace('[','').replace(']','').replace(' ', ''))
zbfile = open(completeName, "w")
zbfile.write(string2)
zbfile.close()
print completeName
num_lines = sum(1 for line in open(completeName))
print num_lines
Any help would be greatly appreciated!
Generally you should iterate over the array/list if you just want to write the contents.
zmatrix = np.reshape(matrix, (matsize*matsize, 1))
with open(completeName, "w") as zbfile: # with closes your files automatically
    for row in zmatrix:
        zbfile.writelines(map(str, row))
        zbfile.write("\n")
Output:
0.00970926751178
0.00985735189176
0.00999792646484
0.0101306077521
0.0102550302672
0.0103708481917
0.010477736974
0.010575394844
0.0106635442315
.........................
But using numpy we simply need to use tofile:
zmatrix = np.reshape(matrix, (matsize*matsize, 1))
# pass sep or you will get binary output
zmatrix.tofile(completeName,sep="\n")
Output is in the same format as above.
Calling str on the matrix gives you the same formatted output you get when you print it, so what you are writing to the file is that formatted, truncated output.
Since you are using Python 2, xrange would be more efficient than range, which creates a list. Having multiple imports separated by semicolons is also not recommended; you can simply write:
import numpy as np, os.path, os
Also, variable and function names should use underscores: z_matrix, zb_file, complete_name, etc.
You shouldn't need to fiddle with the string representations of numpy arrays. One way is to use tofile:
zmatrix.tofile('output.txt', sep='\n')
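To see the abbreviation happen, and one more way around it, here is a small sketch. The array is a made-up stand-in for zmatrix and the output goes to a temporary directory; np.savetxt is used as an alternative to tofile, not as something the answers above proposed. By default NumPy summarises arrays of more than 1000 elements when converting them to a string (the threshold option of np.set_printoptions controls this), which matches the 32×32 = 1024 case in the question.

```python
import os
import tempfile
import numpy as np

# Hypothetical stand-in for zmatrix: a single column of 2000 values.
column = np.arange(2000, dtype=float).reshape(-1, 1)

# str() summarises arrays with more than 1000 elements by default.
print('...' in str(column))          # True

# savetxt writes every row regardless of the print settings.
path = os.path.join(tempfile.mkdtemp(), 'outputfile.txt')
np.savetxt(path, column, fmt='%.12g')

with open(path) as fh:
    num_lines = sum(1 for _ in fh)
print(num_lines)                     # 2000
```

The fmt argument is a choice made here for compact output; adjust it to whatever precision the downstream software expects.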

Python script for transforming and sorting columns in ascending order, decimal places

I wrote a script in Python removing tabs/blank spaces between two columns of strings (x,y coordinates) plus separating the columns by a comma and listing the maximum and minimum values of each column (2 values for each the x and y coordinates). E.g.:
100000.00 60000.00
200000.00 63000.00
300000.00 62000.00
400000.00 61000.00
500000.00 64000.00
became:
100000.00,60000.00
200000.00,63000.00
300000.00,62000.00
400000.00,61000.00
500000.00,64000.00
10000000 50000000 60000000 640000000
This is the code I used:
import string
input = open(r'C:\coordinates.txt', 'r')
output = open(r'C:\coordinates_new.txt', 'wb')
s = input.readline()
while s <> '':
    s = input.readline()
    liste = s.split()
    x = liste[0]
    y = liste[1]
    output.write(str(x) + ',' + str(y))
    output.write('\n')
    s = input.readline()
input.close()
output.close()
I need to change the above code to also transform the coordinates from two decimal to one decimal values and each of the two new columns to be sorted in ascending order based on the values of the x coordinate (left column).
I started by writing the following but not only is it not sorting the values, it is placing the y coordinates on the left and the x on the right. In addition I don't know how to transform the decimals since the values are strings and the only function I know is using %f and that needs floats. Any suggestions to improve the code below?
import string
input = open(r'C:\coordinates.txt', 'r')
output = open(r'C:\coordinates_sorted.txt', 'wb')
s = input.readline()
while s <> '':
    s = input.readline()
    liste = string.split(s)
    x = liste[0]
    y = liste[1]
    output.write(str(x) + ',' + str(y))
    output.write('\n')
    sorted(s, key=lambda x: x[o])
    s = input.readline()
input.close()
output.close()
thanks!
First, try to format your code according to PEP8—it'll be easier to read. (I've done the cleanup in your post already).
Second, Tim is right in that you should try to learn how to write your code as (idiomatic) Python not just as if translated directly from its C equivalent.
As a starting point, I'll post your 2nd snippet here, refactored as idiomatic Python:
# there is no need to import the `string` module; `.strip()` is a built-in
# method of strings (i.e. objects of type `str`).
# read in the data as a list of pairs of raw (i.e. unparsed) coordinates in
# string form:
with open(r'C:\coordinates.txt') as in_file:
    coords_raw = [line.strip().split() for line in in_file.readlines()]
# convert the raw list into a list of pairs (2-tuples) containing the parsed
# (i.e. float not string) data:
coord_pairs = [(float(x_raw), float(y_raw)) for x_raw, y_raw in coords_raw]
coord_pairs.sort() # you want to sort the entire data set, not just values on
# individual lines as in your original snippet
# build a list of all x and y values we have (this could be done in one line
# using some `zip()` hackery, but I'd like to keep it readable (for you at
# least)):
all_xs = [x for x, y in coord_pairs]
all_ys = [y for x, y in coord_pairs]
# compute min and max:
x_min, x_max = min(all_xs), max(all_xs)
y_min, y_max = min(all_ys), max(all_ys)
# NOTE: the above section performs well for small data sets; for large ones, you
# should combine the 4 lines in a single for loop so as to NOT have to read
# everything to memory and iterate over the data 6 times.
# write everything out
with open(r'C:\coordinates_sorted.txt', 'w') as out_file:
    # here, we're doing 3 things in one line:
    #   * iterate over all coordinate pairs and convert the pairs to the string
    #     form (one decimal place, as requested)
    #   * join the string forms with a newline character
    #   * write the result of the join+iterate expression to the file
    out_file.write('\n'.join('%.1f,%.1f' % (x, y) for x, y in coord_pairs))
    out_file.write('\n\n')
    out_file.write('%.1f %.1f %.1f %.1f' % (x_min, x_max, y_min, y_max))
with open(...) as <var_name> gives you guaranteed closing of the file handle as with try-finally; also, it's shorter than open(...) and .close() on separate lines. Also, with can be used for other purposes, but is commonly used for dealing with files. I suggest you look up how to use try-finally as well as with/context managers in Python, in addition to everything else you might have learned here.
Your code looks more like C than like Python; it is quite unidiomatic. I suggest you read the Python tutorial to find some inspiration. For example, iterating using a while loop is usually the wrong approach. The string module is deprecated for the most part, <> should be !=, you don't need to call str() on an object that's already a string...
Then, there are some errors. For example, sorted() returns a sorted version of the iterable you're passing - you need to assign that to something, or the result will be discarded. But you're calling it on a string, anyway, which won't give you the desired result. You also wrote x[o] where you clearly meant x[0].
You should be using something like this (assuming Python 2):
with open(r'C:\coordinates.txt') as infile:
    values = []
    for line in infile:
        # list() keeps this working on Python 3, where map() returns an iterator
        values.append(list(map(float, line.split())))
values.sort()
with open(r'C:\coordinates_sorted.txt', 'w') as outfile:
    for value in values:
        outfile.write("{:.1f},{:.1f}\n".format(*value))
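To try the whole idea without touching the file system, the sketch below runs the same steps on an in-memory list of lines taken from the question's example (shuffled on purpose). The function name process is an assumption made for illustration.

```python
def process(lines):
    """Parse 'x y' lines, sort them, and return formatted rows plus extremes."""
    # Tuples sort by their first element (x), with y breaking any ties.
    pairs = sorted(
        tuple(float(v) for v in line.split())
        for line in lines
        if line.strip()                      # ignore blank lines
    )
    rows = ['{:.1f},{:.1f}'.format(x, y) for x, y in pairs]
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    return rows, (min(xs), max(xs), min(ys), max(ys))


sample = [
    "300000.00 62000.00",
    "100000.00 60000.00",
    "500000.00 64000.00",
    "200000.00 63000.00",
    "400000.00 61000.00",
]

rows, extremes = process(sample)
print('\n'.join(rows))
print(extremes)   # (100000.0, 500000.0, 60000.0, 64000.0)
```

Converting to float first is what makes both the numeric sort and the one-decimal formatting possible; sorting the raw strings would order them lexicographically instead.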

Writing to a multi dimensional array with split

I am trying to use Python to parse a text file (stored in the variable trackList) containing times and titles. It looks like this:
00:04:45 example text
00:08:53 more example text
12:59:59 the last bit of example text
My regular expression (rem) works, I am also able to split the string (i) into two parts correctly (as in I separate times and text) but I am unable to then add the arrays (using .extend) that the split returns to a large array I created earlier (sLines).
f=open(trackList)
count=0
sLines=[[0 for x in range(0)] for y in range(34)]
line=[]
for i in f:
    count+=1
    line.append(i)
    rem=re.match("\A\d\d\:\d\d\:\d\d\W",line[count-1])
    if rem:
        sLines[count-1].extend(line[count-1].split(' ',1))
    else:
        print("error on line: "+count)
That code should go through each line in the file trackList, test to see if the line is as expected, if so separate the time from the text and save the result of that as an array inside an array at the index of one less than the current line number, if not print an error pointing me to the line
I use array[count-1] as python arrays are zero indexed and file lines are not.
I use .extend() as I want both elements of the smaller array added to the larger array in the same iteration of the parent for loop.
So, you have some pretty confusing code there.
For instance doing:
[0 for x in range(0)]
Is a really fancy way of initializing an empty list:
>>> [] == [0 for x in range(0)]
True
Also, how do you know the matrix will be 34 lines long? You're also confusing yourself by calling your line 'i' in the for loop; that name is usually reserved as shorthand for an index, which you'd expect to be a numerical value. Appending i to line and then re-referencing it as line[count-1] is redundant when you already have the line itself (i).
Your overall code can be simplified to something like this:
import re

# load the file and extract the lines
with open(trackList) as f:
    lines = f.readlines()
# compile the expression once, outside the loop
expr = re.compile(r'^(\d\d:\d\d:\d\d)\s*(.*)$')
sLines = []
# loop over the lines, collecting both the index (i) and the line (line)
for i, line in enumerate(lines):
    result = expr.match(line)
    # validate the line
    if not result:
        print("error on line: " + str(i+1))
        # add an invalid list to the matrix
        sLines.append([]) # or whatever you want as your invalid line
        continue
    # add the (time, text) pair to the matrix
    sLines.append(result.groups())
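To check the pattern against the sample lines from the question, here is a small self-contained run. The list sample (including one deliberately malformed line) is made up for the demonstration and stands in for the file contents.

```python
import re

expr = re.compile(r'^(\d\d:\d\d:\d\d)\s*(.*)$')

# Sample lines from the question, plus one malformed line for the error path.
sample = [
    "00:04:45 example text\n",
    "00:08:53 more example text\n",
    "not a track line\n",
    "12:59:59 the last bit of example text\n",
]

parsed = []
for number, line in enumerate(sample, start=1):
    match = expr.match(line)
    if match:
        parsed.append(match.groups())   # (time, text) tuple
    else:
        print("error on line: {}".format(number))

print(parsed[0])   # ('00:04:45', 'example text')
```

The two capture groups do the job of split(' ', 1), and since '.' does not match a newline, the trailing line ending is dropped from the text automatically.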
