Read Data into Python Line by Line as a List - python

I would like to read in a series of coordinates with their accuracy into a triangulation function to provide the triangulated coordinates. I've been able to use python to create a .txt document that contains the list of coordinates for each triangulation i.e.
[(-1.2354798, 36.8959406, -22.0), (-1.245124, 36.9027361, -31.0), (-1.2387697, 36.897921, -12.0), (-1.3019762, 36.8923956, -4.0)]
[(-1.3103075, 36.8932163, -70.0), (-1.3017684, 36.8899228, -12.0)]
[(-1.3014139, 36.8899931, -34.0), (-1.2028006, 36.9180461, -54.0), (-1.1996497, 36.9286186, -67.0), (-1.2081047, 36.9239936, -22.0), (-1.2013893, 36.9066869, -11.0)]
Each of those would be one group of coordinates and accuracy to feed into the triangulation function. The text documents separate them by line.
This is the triangulation function I am trying to read the text file into:
def triangulate(points):
"""
Given points in (x,y, signal) format, approximate the position (x,y).
Reading:
* http://stackoverflow.com/questions/10329877/how-to-properly-triangulate-gsm-cell-towers-to-get-a-location
* http://www.neilson.co.za/?p=364
* http://gis.stackexchange.com/questions/40660/trilateration-algorithm-for-n-amount-of-points
* http://gis.stackexchange.com/questions/2850/what-algorithm-should-i-use-for-wifi-geolocation
"""
# Weighted signal strength
ws = sum(p[2] for p in points)
points = tuple( (x,y,signal/ws) for (x,y,signal) in points )
# Approximate
return (
sum(p[0]*p[2] for p in points), # x
sum(p[1]*p[2] for p in points) # y
)
print(triangulate([
(14.2565389, 48.2248439, 80),
(14.2637736, 48.2331576, 55),
(14.2488966, 48.232513, 55),
(14.2488163, 48.2277972, 55),
(14.2647612, 48.2299558, 21),
]))
When I test the function with the above print statement it works. But when I try to load the data from the text file into the function as follows"
with open(filename, 'r') as file:
for points in file:
triangulation(points)
I get the error: IndexError: string index out of range. I understand that this is because it is not being read in as a list but as a string, but when I try to convert it to a list object points = list(points) it is also not recognized as a list of different coordinates. My question is how should I read the file into python in order for it to be translated to working within the triangulate function.

What you get from the file is a string, but Python doesn't know anything about how that string should be interpreted. It could be a printed representation of a list of tuples, as in your case, but it could just as well be a part of a book, or it could be some compressed data, or so on. It's not the language's job to guess how to treat the string that gets read from the file. That's your job; you have to write some code to take those strings and parse them - that is, convert them into the data your program needs, using the reverse of the rules that were used to convert that data into strings in the first place.
Now, this is certainly a thing you could do, but it's probably better to just use something other than print(). That is, use a different set of rules for converting your data into strings, one where people have already written the code to reverse the process. A common format you could use is JSON, for which Python includes a library to do the conversions. Other formats that can work with numerical data include CSV (here's the Python module) and HDF5 (supported by an external library, probably overkill for your case). The point is, you need to choose some set of rules for converting between data and strings and use the corresponding code in both directions. In your original example, you were only using the rule for going from data to strings and expecting Python to guess the rule for going back.
If you want to read more about this, the process of converting data to strings (or, really, to something that can be put in a file) is called formatting or serialization, depending on context, and the reverse process of converting the strings back to the original data is called parsing or deserialization.

As suggested by #FMCorz you should use JSON or some other machine-readable format.
Doing so is simple and just a matter of dumping your list of points to the text file in any machine-readable format and then later reading it back in.
Here is a minimal example (using JSON):
import json
def triangulate(points):
""" Given points in (x,y, signal) format, approximate the position (x,y).
Reading:
* http://stackoverflow.com/questions/10329877/how-to-properly-triangulate-gsm-cell-towers-to-get-a-location
* http://www.neilson.co.za/?p=364
* http://gis.stackexchange.com/questions/40660/trilateration-algorithm-for-n-amount-of-points
* http://gis.stackexchange.com/questions/2850/what-algorithm-should-i-use-for-wifi-geolocation
"""
# Weighted signal strength
ws = sum(p[2] for p in points)
points = tuple( (x,y,signal/ws) for (x,y,signal) in points )
# Approximate
return (
sum(p[0]*p[2] for p in points), # x
sum(p[1]*p[2] for p in points) # y
)
points = [(14.2565389, 48.2248439, 80),
(14.2637736, 48.2331576, 55),
(14.2488966, 48.232513, 55),
(14.2488163, 48.2277972, 55),
(14.2647612, 48.2299558, 21)]
with open("points.txt", 'w') as file:
file.write(json.dumps(points))
with open("points.txt", 'r') as file:
for line in file:
points = json.loads(line)
print(triangulate(points))
If you wanted to use a list of lists (a list containing lists of points), you could do something like this:
import json
def triangulate(points):
""" Given points in (x,y, signal) format, approximate the position (x,y).
Reading:
* http://stackoverflow.com/questions/10329877/how-to-properly-triangulate-gsm-cell-towers-to-get-a-location
* http://www.neilson.co.za/?p=364
* http://gis.stackexchange.com/questions/40660/trilateration-algorithm-for-n-amount-of-points
* http://gis.stackexchange.com/questions/2850/what-algorithm-should-i-use-for-wifi-geolocation
"""
# Weighted signal strength
ws = sum(p[2] for p in points)
points = tuple( (x,y,signal/ws) for (x,y,signal) in points )
# Approximate
return (
sum(p[0]*p[2] for p in points), # x
sum(p[1]*p[2] for p in points) # y
)
points_list = [[(-1.2354798, 36.8959406, -22.0), (-1.245124, 36.9027361, -31.0), (-1.2387697, 36.897921, -12.0), (-1.3019762, 36.8923956, -4.0)],
[(-1.3103075, 36.8932163, -70.0), (-1.3017684, 36.8899228, -12.0)],
[(-1.3014139, 36.8899931, -34.0), (-1.2028006, 36.9180461, -54.0), (-1.1996497, 36.9286186, -67.0), (-1.2081047, 36.9239936, -22.0), (-1.2013893, 36.9066869, -11.0)]]
with open("points.txt", 'w') as file:
file.write(json.dumps(points_list))
with open("points.txt", 'r') as file:
for line in file:
points_list = json.loads(line)
for points in points_list:
print(triangulate(points))

Related

How to to remove a double/nested for-loop? String to float transformation in python

I have the following polygon of a geographic area that I fetch via a request in CAP/XML format from an API
The raw data looks like this:
<polygon>22.3243,113.8659 22.3333,113.8691 22.4288,113.8691 22.4316,113.8742 22.4724,113.9478 22.5101,113.9951 22.5099,113.9985 22.508,114.0017 22.5046,114.0051 22.5018,114.0085 22.5007,114.0112 22.5007,114.0125 22.502,114.0166 22.5038,114.0204 22.5066,114.0245 22.5067,114.0281 22.5057,114.0371 22.5051,114.0409 22.5041,114.0453 22.5025,114.0494 22.5023,114.0511 22.5035,114.0549 22.5047,114.0564 22.5059,114.057 22.5104,114.0576 22.512,114.0584 22.5144,114.0608 22.5163,114.0637 22.517,114.0657 22.5172,114.0683 22.5181,114.0717 22.5173,114.0739</polygon>
I store the requested items in a dictionary and then work through them to transform to a GeoJSON list object that is suitable for ingestion into Elasticsearch according to the schema I'm working with. I've removed irrelevant code here for ease of reading.
# fetches and store data in a dictionary
r = requests.get("https://alerts.weather.gov/cap/ny.php?x=0")
xpars = xmltodict.parse(r.text)
json_entry = json.dumps(xpars['feed']['entry'])
dict_entry = json.loads(json_entry)
# transform items if necessary
for entry in dict_entry:
if entry['cap:polygon']:
polygon = entry['cap:polygon']
polygon = polygon.split(" ")
coordinates = []
# take the split list items swap their positions and enclose them in their own arrays
for p in polygon:
p = p.split(",")
p[0], p[1] = float(p[1]), float(p[0]) # swap lon/lat
coordinates += [p]
# more code adding fields to new dict object, not relevant to the question
The output of the p in polygon loop looks like:
[ [113.8659, 22.3243], [113.8691, 22.3333], [113.8691, 22.4288], [113.8742, 22.4316], [113.9478, 22.4724], [113.9951, 22.5101], [113.9985, 22.5099], [114.0017, 22.508], [114.0051, 22.5046], [114.0085, 22.5018], [114.0112, 22.5007], [114.0125, 22.5007], [114.0166, 22.502], [114.0204, 22.5038], [114.0245, 22.5066], [114.0281, 22.5067], [114.0371, 22.5057], [114.0409, 22.5051], [114.0453, 22.5041], [114.0494, 22.5025], [114.0511, 22.5023], [114.0549, 22.5035], [114.0564, 22.5047], [114.057, 22.5059], [114.0576, 22.5104], [114.0584, 22.512], [114.0608, 22.5144], [114.0637, 22.5163], [114.0657, 22.517], [114.0683, 22.5172], [114.0717, 22.5181], [114.0739, 22.5173] ]
Is there a way to do this that is better than O(N^2)? Thank you for taking the time to read.
O(KxNxM)
This process involves three obvious loops. These are:
Checking each entry (K)
Splitting valid entries into points (MxN) and iterating through those points (N)
Splitting those points into respective coordinates (M)
The amount of letters in a polygon string is ~MxN because there are N points each roughly M letters long, so it iterates through MxN characters.
Now that we know all of this, let's pinpoint where each occurs.
ENTRIES (K):
IF:
SPLIT (MxN)
POINTS (N):
COORDS(M)
So, we can finally conclude that this is O(K(MxN + MxN)) which is just O(KxNxM).

produce vector output from a dask array

I have a large dask array (labeled_arr) that is actually a labeled raster image (dtype is int64). I want to use rasterio to turn the labeled regions into polygons and combine them into a single list of polygons (or geoseries with just a geometry column). This is a straightforward task on a single array, but I'm having trouble figuring out how to tell dask that I want it to do this operation on each chunk and return something that is not an array.
function to apply to each chunk:
def get_polys(labeled_blocks):
polys = list(poly[0]['coordinates'][0] for poly in rasterio.features.shapes(
labeled_blocks.astype('int32'), transform=trans))[:-1]
# Note: rasterio.features.shapes returns an iterator, hence the conversion to a list here
return polys
line of code trying to get dask to do this:
test_polygons = da.blockwise(get_polys, '', labeled_arr, 'ij')
test_polygons.compute()
where labeled_arr is the input chunked dask array.
Running as is returns an error saying I have to specify a dtype for da.blockwise. Specifying a dtype returns an AttributeError since the output list type does not have a dtype attribute. I discovered the meta keyword, but still have been unable to get the right syntax to turn my output into a Series or list.
I'm not attached to the above approach, but my overarching goal is: take a labeled, chunked dask dataarray (which does not all fit in memory), extract a list based on computations for each chunk, and generate a concatenated list (or pandas data object) with the outputs from all the chunks in my original chunked array.
This might work:
import dask
import dask.array as da
# we expect to see 4 blocks here
test_array = da.random.random((4, 4), chunks=(2, 2))
#dask.delayed
def my_func(block):
# do something fancy
return list(block)
results = dask.compute([my_func(x) for x in test_array.to_delayed().ravel()])
As you noted, the problem is that list has no dtype. A way around this would be to convert the list into a np.array, but I'm not sure if this will work with all geometry objects (it should be OK for Points, but polygons might be problematic due to varying length). Since you are not interested in forcing these geometries into an array, it's best to treat individual blocks as delayed objects feeding them into your function one at a time (but scaled across workers/processes).
Here's the solution I ended up with initially, though it still requires a lot of RAM given the concatenate=True kwarg.
poss_list = []
def get_polys(labeled_blocks):
polys = list(poly[0]['coordinates'][0] for poly in rasterio.features.shapes(
labeled_blocks.astype('int32'), transform=trans))[:-1]
poss_list.append(polys)
da.blockwise(get_bergs, '', labeled_arr, 'ij',
meta=pd.DataFrame({'c':[]}), concatenate=True).compute()
If I'm interpreting correctly, this doesn't feed the chunks into my function across workers/processes though (which it seems I can get away with for now).
Update - improved answer using dask.delayed, building on the accepted answer by #SultanOrazbayev
import dask
# onedem = original_xarray_dataarray
poss_list = []
#dask.delayed
def get_bergs(labeled_blocks, pointer, chunk0, chunk1):
# Note: I'm using this in a CRS (polar stereo) with negative y coordinates - it hasn't been tested for other CRSs
def getpx(chunkid, chunksz):
amin = chunkid[0] * chunksz[0][0]
amax = amin + chunksz[0][0]
bmin = chunkid[1] * chunksz[1][0]
bmax = bmin + chunksz[1][0]
return (amin, amax, bmin, bmax)
# order of all inputs (and outputs) should be y, x when axis order is used
chunksz = (onedem.chunks['y'], onedem.chunks['x'])
ymini, ymaxi, xmini, xmaxi = getpx((chunk0, chunk1), chunksz)
# use rasterio Windows and rioxarray to construct transform
# https://rasterio.readthedocs.io/en/latest/topics/windowed-rw.html#window-transforms
chwindow = rasterio.windows.Window(xmini, ymini, xmaxi-xmini, ymaxi-ymini) #.from_slices[ymini, ymaxi],[xmini, xmaxi])
trans = onedem.rio.isel_window(chwindow).rio.transform(recalc=True)
return list(poly[0]['coordinates'][0] for poly in rasterio.features.shapes(labeled_blocks.astype('int32'), transform=trans))[:-1]
for __, obj in enumerate(labeled_arr.to_delayed()):
for bl in obj:
piece = dask.delayed(get_bergs)(bl, *bl.key)
poss_list.append(piece)
poss_list = dask.compute(*poss_list)
# unnest the list of polygons returned by using dask to polygonize
concat_list = [item for sublist in poss_list for item in sublist if len(item)!=0]

How to split chunks of xy data into lists between isalpha() and newline \n

So I've got a cleaned up datafile of number strings, representing coordinates for polygons. I've had experience assigning one polygon's data in a datafile into a column and plotting it in numpy/matplotlib, but for this I have to plot multiple polygons from one datafile separated by headers. The data isn't evenly sized either; every header has several lines of data in two columns, but not the same amount of lines.
i.e. I've used .readlines() to go from:
# Title of the polygons
# a comment on the datasource
# A comment on polygon projection
Poly Two/a bit
(331222.6210000003, 672917.1531000007)
(331336.0946000004, 672911.7816000003)
(331488.4949000003, 672932.4191999994)
##etc
Poly One
[(331393.15660000034, 671982.1392999999), (331477.28839999996, 671959.8816), (331602.10170000046, 671926.8432999998), (331767.28160000034, 671894.7273999993), (331767.28529999964, 671894.7267000005), (##etc)]
to:
PolyOneandTwo
319547.04899999965,673790.8118999992
319553.2614000002,673762.4122000001
319583.4143000003,673608.7760000005
319623.6182000004,673600.1608000007
319685.3598999996,673600.1608000007
##etc
PolyTwoandabit
319135.9966000002,673961.9215999991
319139.7357999999,673918.9201999996
319223.0153000001,673611.6477000006
319254.6040000003,673478.1133999992
##etc etc
PolyOneHundredFifty
##etc
My code so far involves cleaning the original dataset up to make it like you see above;
data_easting=[]
data_northing=[]
County = open('counties.dat','r')
for line in County.readlines():
if line.lstrip().startswith('#'):
print ('Comment line ignored and leading whitespace removed')
continue
line = line.replace('/','and').replace(' ','').replace('[','').replace(']','').replace('),(','\n')
line = line.strip('()\n')
print (line)
if line.isalpha():
print ('Skipped header: '+ line)
continue
I've been using isalpha(): to ignore the headers for each polygon so far, and I was planning on using if line == '\n': continue and line.split(',') to ignore the newlines between data and begin splitting the Easting and Northing lists. I've already got the numpy and matplotlib section of the code (not shown) sorted to make 1 polygon, but I don't know how to implement it to plot multiple arrays/multiple polygons.
I realised early on though that if I tried to assign all the data to the 2 x and y lists, that would just make one large unending polygon that will make a spaghetti mess of my plot as imaginary lines will be drawn to connect them up.
I want to use the isalpha() section to instead identify and make a Dictionary or List of the polygon names, and attach an array for each polygon datablock to that, but I'm not sure of how to implement it (or if you even can). I'm also not certain how to make it stop loading data into a list at the end of a polygon datablock (maybe if line == '\n': break? but how to make it start and stop again 149 more times for each other chunk?).
To make it a bit more difficult, there is 150 polygons with x and y data in this file, so making 150 x and y lists for each individual polygon and writing specific code for each wouldn't be very efficient.
So, how do I basically do:
if line.isalpha():
#(assign to a Counties dictionary or a list as PolyOne, PolyTwo,...PolyOneHundredFifty)
#(a way of getting the data between the header and newline into a separate list)
#(a way to relate that PolyOne Data list of x and y to the dictionary "PolyOne")
if line == '\n':
#(break? continue?)
#(then restart and repeat for PolyTwo,...PolyOneHundredFifty)
line.split (',')
data_easting.append(x) #x1,x2,...x150?
data_northing.append(y) #y1,y2,y150?)
Is there a way of doing what I intend? How would I go about that without pandas?
Thanks for your time.
Parsing the raw data/file:
When you encounter a line/block like the second in your example,
>>> s = '''[(331393.15660000034, 671982.1392999999), (331477.28839999996, 671959.8816), (331602.10170000046, 671926.8432999998), (331767.28160000034, 671894.7273999993), (331767.28529999964, 671894.7267000005)]'''
it can be converted directly to a 2d numpy array as follows using ast.literal_eval which is a safe way to convert text to a python object - in this case a list of tuples.
>>> import numpy as np
>>> import ast
>>> if s.startswith('['):
#print(ast.literal_eval(s))
array = np.array(ast.literal_eval(s))
>>> array
array([[331393.1566, 671982.1393],
[331477.2884, 671959.8816],
[331602.1017, 671926.8433],
[331767.2816, 671894.7274],
[331767.2853, 671894.7267]])
>>> array.shape
(5, 2)
For the blocks that resemble the first in your (raw) example accumulate each line as a tuple of floats in a list, when you reach the next block make an array of that list and reset it. I put this all in a generator function which yields blocks as 2-d arrays.
import ast
import numpy as np
def parse(lines_read):
data = []
for line in lines_read:
if line.startswith('#'):
continue
elif line.startswith('('):
n,e = line[1:-2].split(',')
data.append((float(n),float(e)))
elif line.startswith('['):
array = np.array(ast.literal_eval(line))
yield array
else:
if data:
array = np.array(data)
data = []
yield array
Used like this
>>> for block in parse(f.readlines()):
... print(block)
... print('*******************')
[[331222.621 672917.1531]
[331336.0946 672911.7816]
[331488.4949 672932.4192]]
*******************
[[331393.1566 671982.1393]
[331477.2884 671959.8816]
[331602.1017 671926.8433]
[331767.2816 671894.7274]
[331767.2853 671894.7267]]
*******************
>>>
If you need to select the northing or easting columns separately, consult the Numpy docs.
Parsing with two regular expressions. This operates on the whole file read as a string: s = fileobject.read(). It needs to go over the file twice and does not preserve the block order.
import re, ast
import numpy as np
pattern1 = re.compile(r'(\n\([^)]+\))+')
pattern2 = re.compile(r'^\[[^]]+\]',flags=re.MULTILINE)
for m in pattern1.finditer(s):
block = m.group().strip().split('\n')
data = []
for line in block:
line = line[1:-1]
n,e = map(float,line.split(','))
data.append((n,e))
print(np.array(data))
print('****')
for m in pattern2.finditer(s):
print(np.array(ast.literal_eval(m.group())))
print('****')

How do I get a text output from a string created from an array to remain unshortened?

Python/Numpy Problem. Final year Physics undergrad... I have a small piece of code that creates an array (essentially an n×n matrix) from a formula. I reshape the array to a single column of values, create a string from that, format it to remove extraneous brackets etc, then output the result to a text file saved in the user's Documents directory, which is then used by another piece of software. The trouble is above a certain value for "n" the output gives me only the first and last three values, with "...," in between. I think that Python is automatically abridging the final result to save time and resources, but I need all those values in the final text file, regardless of how long it takes to process, and I can't for the life of me find how to stop it doing it. Relevant code copied beneath...
import numpy as np; import os.path ; import os
'''
Create a single column matrix in text format from Gaussian Eqn.
'''
save_path = os.path.join(os.path.expandvars("%userprofile%"),"Documents")
name_of_file = 'outputfile' #<---- change this as required.
completeName = os.path.join(save_path, name_of_file+".txt")
matsize = 32
def gaussf(x,y): #defining gaussian but can be any f(x,y)
pisig = 1/(np.sqrt(2*np.pi) * matsize) #first term
sumxy = (-(x**2 + y**2)) #sum of squares term
expden = (2 * (matsize/1.0)**2) # 2 sigma squared
expn = pisig * np.exp(sumxy/expden) # and put it all together
return expn
matrix = [[ gaussf(x,y) ]\
for x in range(-matsize/2, matsize/2)\
for y in range(-matsize/2, matsize/2)]
zmatrix = np.reshape(matrix, (matsize*matsize, 1))column
string2 = (str(zmatrix).replace('[','').replace(']','').replace(' ', ''))
zbfile = open(completeName, "w")
zbfile.write(string2)
zbfile.close()
print completeName
num_lines = sum(1 for line in open(completeName))
print num_lines
Any help would be greatly appreciated!
Generally you should iterate over the array/list if you just want to write the contents.
zmatrix = np.reshape(matrix, (matsize*matsize, 1))
with open(completeName, "w") as zbfile: # with closes your files automatically
for row in zmatrix:
zbfile.writelines(map(str, row))
zbfile.write("\n")
Output:
0.00970926751178
0.00985735189176
0.00999792646484
0.0101306077521
0.0102550302672
0.0103708481917
0.010477736974
0.010575394844
0.0106635442315
.........................
But using numpy we simply need to use tofile:
zmatrix = np.reshape(matrix, (matsize*matsize, 1))
# pass sep or you will get binary output
zmatrix.tofile(completeName,sep="\n")
Output is in the same format as above.
Calling str on the matrix will give you similarly formatted output to what you get when you try to print so that is what you are writing to the file the formatted truncated output.
Considering you are using python2, using xrange would be more efficient that using rane which creates a list, also having multiple imports separated by colons is not recommended, you can simply:
import numpy as np, os.path, os
Also variables and function names should use underscores z_matrix,zb_file,complete_name etc..
You shouldn't need to fiddle with the string representations of numpy arrays. One way is to use tofile:
zmatrix.tofile('output.txt', sep='\n')

Python script for trasnforming ans sorting columns in ascending order, decimal cases

I wrote a script in Python removing tabs/blank spaces between two columns of strings (x,y coordinates) plus separating the columns by a comma and listing the maximum and minimum values of each column (2 values for each the x and y coordinates). E.g.:
100000.00 60000.00
200000.00 63000.00
300000.00 62000.00
400000.00 61000.00
500000.00 64000.00
became:
100000.00,60000.00
200000.00,63000.00
300000.00,62000.00
400000.00,61000.00
500000.00,64000.00
10000000 50000000 60000000 640000000
This is the code I used:
import string
input = open(r'C:\coordinates.txt', 'r')
output = open(r'C:\coordinates_new.txt', 'wb')
s = input.readline()
while s <> '':
s = input.readline()
liste = s.split()
x = liste[0]
y = liste[1]
output.write(str(x) + ',' + str(y))
output.write('\n')
s = input.readline()
input.close()
output.close()
I need to change the above code to also transform the coordinates from two decimal to one decimal values and each of the two new columns to be sorted in ascending order based on the values of the x coordinate (left column).
I started by writing the following but not only is it not sorting the values, it is placing the y coordinates on the left and the x on the right. In addition I don't know how to transform the decimals since the values are strings and the only function I know is using %f and that needs floats. Any suggestions to improve the code below?
import string
input = open(r'C:\coordinates.txt', 'r')
output = open(r'C:\coordinates_sorted.txt', 'wb')
s = input.readline()
while s <> '':
s = input.readline()
liste = string.split(s)
x = liste[0]
y = liste[1]
output.write(str(x) + ',' + str(y))
output.write('\n')
sorted(s, key=lambda x: x[o])
s = input.readline()
input.close()
output.close()
thanks!
First, try to format your code according to PEP8—it'll be easier to read. (I've done the cleanup in your post already).
Second, Tim is right in that you should try to learn how to write your code as (idiomatic) Python not just as if translated directly from its C equivalent.
As a starting point, I'll post your 2nd snippet here, refactored as idiomatic Python:
# there is no need to import the `string` module; `.strip()` is a built-in
# method of strings (i.e. objects of type `str`).
# read in the data as a list of pairs of raw (i.e. unparsed) coordinates in
# string form:
with open(r'C:\coordinates.txt') as in_file:
coords_raw = [line.strip().split() for line in in_file.readlines()]
# convert the raw list into a list of pairs (2-tuples) containing the parsed
# (i.e. float not string) data:
coord_pairs = [(float(x_raw), float(y_raw)) for x_raw, y_raw in coords_raw]
coord_pairs.sort() # you want to sort the entire data set, not just values on
# individual lines as in your original snippet
# build a list of all x and y values we have (this could be done in one line
# using some `zip()` hackery, but I'd like to keep it readable (for you at
# least)):
all_xs = [x for x, y in coord_pairs]
all_ys = [y for x, y in coord_pairs]
# compute min and max:
x_min, x_max = min(all_xs), max(all_xs)
y_min, y_max = min(all_ys), max(all_ys)
# NOTE: the above section performs well for small data sets; for large ones, you
# should combine the 4 lines in a single for loop so as to NOT have to read
# everything to memory and iterate over the data 6 times.
# write everything out
with open(r'C:\coordinates_sorted.txt', 'wb') as out_file:
# here, we're doing 3 things in one line:
# * iterate over all coordinate pairs and convert the pairs to the string
# form
# * join the string forms with a newline character
# * write the result of the join+iterate expression to the file
out_file.write('\n'.join('%f,%f' % (x, y) for x, y in coord_pairs))
out_file.write('\n\n')
out_file.write('%f %f %f %f' % (x_min, x_max, y_min, y_max))
with open(...) as <var_name> gives you guaranteed closing of the file handle as with try-finally; also, it's shorter than open(...) and .close() on separate lines. Also, with can be used for other purposes, but is commonly used for dealing with files. I suggest you look up how to use try-finally as well as with/context managers in Python, in addition to everything else you might have learned here.
Your code looks more like C than like Python; it is quite unidiomatic. I suggest you read the Python tutorial to find some inspiration. For example, iterating using a while loop is usually the wrong approach. The string module is deprecated for the most part, <> should be !=, you don't need to call str() on an object that's already a string...
Then, there are some errors. For example, sorted() returns a sorted version of the iterable you're passing - you need to assign that to something, or the result will be discarded. But you're calling it on a string, anyway, which won't give you the desired result. You also wrote x[o] where you clearly meant x[0].
You should be using something like this (assuming Python 2):
with open(r'C:\coordinates.txt') as infile:
values = []
for line in infile:
values.append(map(float, line.split()))
values.sort()
with open(r'C:\coordinates_sorted.txt', 'w') as outfile:
for value in values:
outfile.write("{:.1f},{:.1f}\n".format(*value))

Categories