Identifying coordinate matches from two files using Python

I've got two sets of data describing atomic positions. They're in separate files that I would like to compare, the aim being to identify matching atoms by their coordinates. The data looks like the following in both cases, and there will be up to 1,000 or so entries. The files are of different lengths since they describe different sized systems, and have the following format:
1 , 0.000000000000E+00 0.000000000000E+00
2 , 0.000000000000E+00 2.468958660000E+00
3 , 0.000000000000E+00 -2.468958660000E+00
4 , 2.138180920454E+00 -1.234479330000E+00
5 , 2.138180920454E+00 1.234479330000E+00
The first column is the entry ID; the second is the x and y coordinates.
What I'd like to do is compare the coordinates in both sets of data, identify matches, and report the corresponding IDs, e.g. "Entry 3 in file 1 corresponds to Entry 6 in file 2." I'll be using this information to alter the coordinate values within file 2.
I've read the files line by line, split each line into its two fields, and put them into a list, but I'm a bit stumped as to how to specify the comparison - particularly telling it to compare the second entries only, while still being able to call the first entry. I'd imagine it would require looping?
Code looks like this so far:
open1 = open('./3x3supercell_coord_clean', 'r')
openA = open('./6x6supercell_coord_clean', 'r')

small_list = []
for line in open1:
    stripped_small_line = line.strip()
    column_small = stripped_small_line.split(",")
    small_list.append(column_small)

big_list = []
for line in openA:
    stripped_big_line = line.strip()
    column_big = stripped_big_line.split(",")
    big_list.append(column_big)

print small_list[2][1]  # prints out coords only

Use a dictionary with coordinates as keys.
data1 = """1 , 0.000000000000E+00 0.000000000000E+00
2 , 0.000000000000E+00 2.468958660000E+00
3 , 0.000000000000E+00 -2.468958660000E+00
4 , 2.138180920454E+00 -1.234479330000E+00
5 , 2.138180920454E+00 1.234479330000E+00"""
# Read data1 into a list of tuples (id, x, y)
coords1 = [(int(fields[0]), float(fields[2]), float(fields[3]))
           for fields in (line.split() for line in data1.split("\n"))]

# This dictionary will map (x, y) -> id
coordsToIds = {}

# Add coords1 to this dictionary.
for id, x, y in coords1:
    coordsToIds[(x, y)] = id

# Read coords2 the same way.
# Left as an exercise to the reader.

# Look up each of coords2 in the dictionary.
for id, x, y in coords2:
    if (x, y) in coordsToIds:
        print(coordsToIds[(x, y)])  # the ID in coords1
Beware that comparing floats for exact equality is always a problem: two values that are "the same" physically may differ in the last few digits. Consider rounding the coordinates before using them as keys, or comparing with a tolerance.
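One common workaround, sketched below, is to round each coordinate to a fixed number of decimals before using it as a dictionary key, so values that differ only by floating-point noise still collide. The helper name, the inline sample data, and the 6-decimal precision are illustrative assumptions, not from the question:

```python
def coords_to_ids(lines, decimals=6):
    """Map rounded (x, y) tuples to entry IDs."""
    table = {}
    for line in lines:
        # "1 , 0.0 0.0" -> ['1', '0.0', '0.0']
        fields = line.replace(",", " ").split()
        entry_id = int(fields[0])
        x, y = float(fields[1]), float(fields[2])
        table[(round(x, decimals), round(y, decimals))] = entry_id
    return table

file1 = ["1 , 0.0 0.0", "2 , 0.0 2.46895866"]
file2 = ["5 , 0.0 2.468958660000001", "6 , 9.9 9.9"]  # tiny float noise

ids1 = coords_to_ids(file1)
for line in file2:
    fields = line.replace(",", " ").split()
    key = (round(float(fields[1]), 6), round(float(fields[2]), 6))
    if key in ids1:
        print("Entry %s in file 1 corresponds to Entry %s in file 2"
              % (ids1[key], fields[0]))
```

For genuinely noisy data you would want a true tolerance comparison (e.g. `math.isclose`), since rounding can still split two values that straddle a rounding boundary.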

If all you are doing is comparing the second element of each entry in the two lists, you can check every coordinate in one file against every coordinate in the other. This is definitely not the fastest way to go about it, but it should get you the results you need. It scans through small_list and checks every small_entry[1] (the coordinate) against the coordinate of every entry in big_list:
for small_entry in small_list:
    for big_entry in big_list:
        if small_entry[1] == big_entry[1]:
            print(small_entry[0] + " matches " + big_entry[0])
something like this?

Build two dictionaries the following way:
# do your splitting to populate two dictionaries of this format:
#   mydata1[coordinate] = ID
# i.e.
mydata1 = {}
for line in data1.splitlines():
    fields = line.split()
    coord = fields[2] + ' ' + fields[3]
    id = fields[0]
    mydata1[coord] = id

mydata2 = {}
for line in data2.splitlines():
    fields = line.split()
    coord = fields[2] + ' ' + fields[3]
    id = fields[0]
    mydata2[coord] = id

# then we can use set intersection to find all coordinates in both key sets
set1 = set(mydata1.keys())
set2 = set(mydata2.keys())
intersect = set1.intersection(set2)
for coordinate in intersect:
    print ' '.join(["Coordinate", coordinate, "found in set1 id",
                    mydata1[coordinate], "and set2 id", mydata2[coordinate]])

Here's an approach that uses dictionaries:
coords = {}
with open('first.txt', 'r') as first_list:
    for i in first_list:
        pair = [j for j in i.strip().split(' ') if j]
        coords[','.join(pair[2:4])] = pair[0]
        # reformatted coords used as key, e.g. "2.138180920454E+00,-1.234479330000E+00"

with open('second.txt', 'r') as second_list:
    for i in second_list:
        pair = [j for j in i.strip().split(' ') if j]
        if ','.join(pair[2:4]) in coords:
            # reformatted coords from second list checked for presence in dictionary keys
            print coords[','.join(pair[2:4])], pair[0]
What's going on here is that each coordinate pair from file A (which you have stated will be distinct) is stored in a dictionary as a key. Then the first file is closed and the second file is opened. The second list's coordinates are read, reformatted to match how the dictionary keys are stored, and checked for membership. If the coordinate string from list B is in the dictionary coords, the pair exists in both lists, and the IDs from the first and second lists are printed for that match.
Dictionary lookups are fast (average O(1)), unlike scanning a list. This approach also has the advantage of not needing all the data in memory to do the check (just one list's worth), and of not worrying about type casting, e.g. float/int conversions.

Related

Python: Fill data in a list in a tuple

I need to create a function that reads the data given and creates a list that contains tuples each of which has as its first element the name of the airport and as its second and third its geographical coordinates as float numbers.
airport_data = """
Alexandroupoli 40.855869°N 25.956264°E
Athens 37.936389°N 23.947222°E
Chania 35.531667°N 24.149722°E
Chios 38.343056°N 26.140556°E
Corfu 39.601944°N 19.911667°E
Heraklion 35.339722°N 25.180278°E"""
airports = []
import re
airport_data1 = re.sub("[°N#°E]", "", airport_data)

def process_airports(string):
    airports_temp = list(string.split())
    airports = [tuple(airports_temp[x:x+3]) for x in range(0, len(airports_temp), 3)]
    return airports

print(process_airports(airport_data1))
This is my code so far but I'm new to Python, so I'm struggling to debug my code.
If you want the second and third element of the tuple to be a float, you have to convert them using the float() function.
One way to do this is creating a tuple with round brackets in your list comprehension and convert the values there:
def process_airports(string):
    airports_temp = string.split()
    airports = [(airports_temp[x], float(airports_temp[x+1]), float(airports_temp[x+2]))
                for x in range(0, len(airports_temp), 3)]
    return airports
This yields a pretty unwieldy expression, so the problem could perhaps be solved more readably with a classic for loop.
Also note that split() already returns a list.
Further remark: if you just cut off the letters from the coordinates, this might come back to bite you when your airports are in different quadrants.
You need to take into account N/S for latitude and E/W for longitude.
Maybe:
def process_airports(string):
    airports = []
    for line in string.split('\n'):
        if not line:
            continue
        name, lat, lon = line.split()
        airports.append((name,
                         float(lat[:-2]) * (1 if lat[-1] == "N" else -1),
                         float(lon[:-2]) * (1 if lon[-1] == "E" else -1)))
    return airports
>>> process_airports(airport_data)
[('Alexandroupoli', 40.855869, 25.956264), ('Athens', 37.936389, 23.947222), ('Chania', 35.531667, 24.149722), ('Chios', 38.343056, 26.140556), ('Corfu', 39.601944, 19.911667), ('Heraklion', 35.339722, 25.180278)]
I preferred the explicit double split to make the distinction between lines and tuple elements clear.

Count occurrences of a specific string within multi-valued elements in a set

I have generated a list of genes
genes = ['geneName1', 'geneName2', ...]
and a set of their interactions:
geneInt = {('geneName1', 'geneName2'), ('geneName1', 'geneName3'),...}
I want to find out how many interactions each gene has and put that in a vector (or dictionary) but I struggle to count them. I tried the usual approach:
interactionList = []
for gene in genes:
    interactions = geneInt.count(gene)
    interactionList.append(interactions)
but of course the code fails because my set contains elements that are made out of two values while I need to iterate over the single values within.
I would argue that you are using the wrong data structure to hold interactions. You can represent interactions as a dictionary keyed by gene name, whose values are a set of all the genes it interacts with.
Let's say you currently have a process that does something like this at some point:
geneInt = set()
...
geneInt.add((gene1, gene2))
Change it to
import collections
geneInt = collections.defaultdict(set)
...
geneInt[gene1].add(gene2)
If the interactions are symmetrical, add a line
geneInt[gene2].add(gene1)
Now, to count the number of interactions, you can do something like
intCounts = {gene: len(ints) for gene, ints in geneInt.items()}
Counting your original list is simple if the interactions are one-way as well:
intCounts = dict.fromkeys(genes, 0)
for gene, _ in geneInt:
    intCounts[gene] += 1
If each interaction is two-way, there are three possibilities:
Both interactions are represented in the set: the above loop will work.
Only one interaction of a pair is represented: change the loop to
for gene1, gene2 in geneInt:
    intCounts[gene1] += 1
    if gene1 != gene2:
        intCounts[gene2] += 1
Some reverse interactions are represented, some are not. In this case, transform geneInt into a dictionary of sets as shown in the beginning.
Try something like this,
interactions = {}
for gene in genes:
    interactions_count = 0
    for tup in geneInt:
        interactions_count += tup.count(gene)
    interactions[gene] = interactions_count
Use a dictionary, and keep incrementing the value for every gene you see in each tuple in the set geneInt.
interactions_counter = dict()
for interaction in geneInt:
    for gene in interaction:
        interactions_counter[gene] = interactions_counter.get(gene, 0) + 1
The dict.get(key, default) method returns the value at the given key, or the specified default if the key doesn't exist.
For the set geneInt = {('geneName1', 'geneName2'), ('geneName1', 'geneName3')}, we get:
interactions_counter = {'geneName1': 2, 'geneName2': 1, 'geneName3': 1}
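For completeness, `collections.Counter` can do the same counting in a couple of lines; a small sketch using the set from the question:

```python
from collections import Counter
from itertools import chain

geneInt = {('geneName1', 'geneName2'), ('geneName1', 'geneName3')}

# Flatten the set of pairs, then count each gene's appearances.
counts = Counter(chain.from_iterable(geneInt))

print(counts['geneName1'])  # 2
print(counts['geneName2'])  # 1
```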

Convert unstructured blocks of data in a columnwise manner (DataFrame)

Description of the problem:
I have an external *.xls file that I have converted to a *.csv file containing block of data such as:
"Legend number one";;;;Number of items;6
X;-358.6806792;-358.6716338;;;
Y;0.8767189;0.8966855;Avg;;50.1206378
Z;-0.7694626;-0.7520983;Std;;-0.0010354
D;8.0153902;8;Err;;1.010385
;;;;;
There are many, many blocks.
Each block may contain some additional lines of data:
"Legend number six";;;;Number of items;19
X;-358.6806792;-358.6716338;;;
Y;0.8767189;0.8966855;Avg;;50.1206378
Z;-0.7654644;-0.75283;Std;;-0.0010354
D;8.0153902;8;Err;;1.010385
A;0;1;Value;;0
B;1;0;;;
;;;;;
The structure is such that a new empty line separates each block, which is the ';;;;;' line in my samples.
The first line after this begins with a unique identifier of the block.
It appears that each line contains 6 elements such as key1;elem1;elem2;key2;elem3;elem4 which would be nice to represent as two 3-elements vector key1;elem1;elem2 and key2;elem3;elem4 on two separate lines. Example for the second sample:
"Legend number six";;
;;Number of items;19
X;-358.6806792;-358.6716338;
;;
Y;0.8767189;0.8966855;
Avg;;50.1206378
Z;-0.7654644;-0.75283;
Std;;-0.0010354
D;8.0153902;8;
Err;;1.010385
A;0;1;
Value;;0
B;1;0;
;;
;;;;;
Some are empty but I do not want to discard them for the moment.
But I would like to end up a DataFrame containing columnwise elements for each block of data.
The cleanest "pre solution" I have so far:
With this Python code I ended up in a more organized "List of dictionaries":
import os, sys, re, glob
import pandas as pd
csvFile = os.path.join(workingDir,'file.csv')
h = 0 # Number of lines to skip in head
s = 2 # number of values per key
s += 1
str1 = 'Number of items'
# Reading file in a global list and storing each line in a sublist:
A = [line.split(';') for line in open(csvFile).read().split('\n')]
# This code splits each 6-elements sublist in one new sublist
# containing two-elements; each element with 3 values:
B = [(';'.join(el[:s])+'\n'+';'.join(el[s:])).split('\n') for el in A]
# Init empty structures:
names = [] # to store block unique identifier (the name in the legend)
L = [] # future list of dictionnaries
for el in B:
    for idx, elj in enumerate(el):
        vi = elj.split(';')[1:]
        # Here we grep the name only when the 2nd element of
        # the first line contains the string "Number of items",
        # which is constant all over the file:
        if len(vi) > 1 and vi[0] == str1:
            name = el[idx-1].split(';')[0]
            names.append(name)
            #print(name)
# We loop again over B to append in a new list one dictionary
# per vector of 3 elements because each vector of 3 elements
# is structured like key;elem1;elem2
for el in B:
    for elj in el:
        k = elj.split(';')[0]
        v = elj.split(';')[1:]
        # Little tweak because the key2;elem3;elem4 of the
        # first line (the one containing the name) has the
        # key in the second place, like "elem3;key2;elem4":
        if len(v) > 1 and v[0] == str1:
            kp = v[0]
            v = [v[1], k]
            k = kp
        if k != '':
            dct = {k: v}
            L.append(dct)
So far I have been unable to extract the name as a global identifier together with all the block's values as variables. I can't use a modulo-based technique because of the variable number of lines in each block, even though all blocks contain at least some common keys.
I also tried a while condition within a for loop over each dictionary, but it's a mess now.
zip could potentially be a nice option, but I don't really know how to use it properly.
Target DataFrame:
What I'd like to end up should ideally look something similar to a DataFrame containing;
index 'Number of items' 'X' '' 'Y' 'Avg' 'Z' 'Std' ...
"Legend number one" 6 ...
"Legend number six" 19 ...
"Legend number 11" 6 ...
"Legend number 15" 18 ...
The columns names are the keys and the table is containing the values for each block of data on a separate line.
If there is a numbered index and a new column with "Legend name"; it's OK as well.
CSV sample to play with:
"Legend number one";;;;Number of items;6
X;8.6806792;8.6716338;;;
Y;0.1557;0.1556;Avg;;50.1206378
Z;-0.7859;-0.7860;Std;;-0.0010354
D;8.0153902;8;Err;;1.010385
;;;;;
"Legend number six";;;;Number of items;19
X;56.6806792;56.6716338;;;
Y;0.1324;0.1322;Avg;;50.1206378
Z;-0.7654644;-0.75283;Std;;-0.0010354
D;8.0153902;8;Err;;1.010385
A;0;1;Value;;0
B;1;0;;;
;;;;;
"Legend number 11";;;;Number of items;6
X;358.6806792;358.6716338;;;
Y;0.1324;0.1322;Avg;;50.1206378
Z;-0.7777;-0.7778;Std;;-0.0010354
D;8.0153902;8;Err;;1.010385
;;;;;
"Legend number 15";;;;Number of items;18
X;58.6806792;58.6716338;;;
Y;0.1324;0.1322;Avg;;50.1206378
Z;0.5555;0.5554;Std;;-0.0010354
D;8.0153902;8;Err;;1.010385
A;0;1;Value;;0
B;1;0;;;
C;0;0;k;1;0
;;;;;
I'm using Ubuntu and Python 3.6 but the script must work on a Windows computer as well.
Appending this to the previous code should work pretty well:
for elem in L:
for key,val in elem.items():
if key in names:
name = key
Dict2 = {}
else:
Dict2[key] = val
Dict1[name] = Dict2
df1 = pd.DataFrame.from_dict(Dict1, orient='index')
df2 = pd.DataFrame(index=df1.index)
for col in df1.columns:
colS = df1[col].apply(pd.Series)
colS = colS.rename(columns = lambda x : col+'_'+ str(x))
df2 = pd.concat([df2[:], colS[:]], axis=1)
df2.to_csv('output.csv', sep=',', index=True, header=True)
There are probably many other ways to go...
This link was helpful:
https://chrisalbon.com/python/data_wrangling/pandas_expand_cells_containing_lists/
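As a point of comparison, here is a minimal, pandas-free sketch of just the block-splitting step: it parses the text into one dict per block, keyed by the legend name. The function name is made up, and the assumed field positions follow the CSV sample above:

```python
def parse_blocks(text):
    """Split ';'-separated text into one {key: values} dict per block.

    Assumes each block starts with a header line whose first field is the
    quoted legend name, and ends at an all-empty ';;;;;' separator line,
    as in the sample above.
    """
    blocks = {}
    current = None
    for line in text.strip().splitlines():
        fields = line.split(';')
        if all(f == '' for f in fields):      # ';;;;;' separator line
            current = None
            continue
        if current is None:                   # header line of a new block
            current = {}
            blocks[fields[0].strip('"')] = current
            current[fields[4]] = fields[5]    # e.g. 'Number of items' -> '6'
            continue
        # ordinary line: two 3-field vectors, key1;e1;e2 and key2;e3;e4
        for part in (fields[0:3], fields[3:6]):
            if len(part) == 3 and part[0]:
                key, e1, e2 = part
                current[key] = [e1, e2]
    return blocks

sample = """\
"Legend number one";;;;Number of items;6
X;-358.6806792;-358.6716338;;;
Y;0.8767189;0.8966855;Avg;;50.1206378
;;;;;
"Legend number six";;;;Number of items;19
X;-358.6806792;-358.6716338;;;
A;0;1;Value;;0
;;;;;
"""

blocks = parse_blocks(sample)
print(blocks["Legend number one"]["Number of items"])   # '6'
print(blocks["Legend number six"]["Value"])             # ['', '0']
```

The resulting dict of dicts can then be handed to pd.DataFrame.from_dict(blocks, orient='index'), the same way as in the answer above.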

"'bool' object has no attribute 'index'" error

I'm trying to write code that solves a facility location problem. I have created a data structure in the variable data. data is a list of 4 lists. data[0] is a list of city names with a length of 128. The other three are irrelevant for now. There is also a function called nearbyCities(cityname, radius, data), which takes a city name, a radius, and the data, and outputs a list of cities within the radius. Assuming that all the code mentioned is correct, why is the error:
File "/Applications/Wing101.app/Contents/MacOS/src/debug/tserver/_sandbox.py", line 232, in locateFacilities
File "/Applications/Wing101.app/Contents/MacOS/src/debug/tserver/_sandbox.py", line 162, in served
File "/Applications/Wing101.app/Contents/MacOS/src/debug/tserver/_sandbox.py", line 131, in nearbyCities
AttributeError: 'bool' object has no attribute 'index'
popping up?
Here are the three functions in question. r is the radius of the cities I am trying to serve. The first two are just helpers for the third, which I am trying to call. The error is in the while loop, I think.
def served(city, r, data, FalseList):  # Helper Function 1
    nearbycity = nearbyCities(city, r, data)
    for everycity in nearbycity:
        dex1 = data[0].index(everycity)
        FalseList[dex1] = True
    return FalseList

def CountHowManyCitiesAreInRThatAreNotServed(city, FalseList, r, data):  # Helper Function 2
    NBC = nearbyCities(city, r, data)
    notserved = 0
    for element in NBC:
        if FalseList[data[0].index(element)] == False:
            notserved = notserved + 1
    return notserved

def locateFacilities(data, r):
    FalseList = [False]*128
    Cities = data[0]
    Radius = []
    output = []
    for everycity in Cities:
        Radius.append(len(nearbyCities(everycity, r, data)))
    maxito = max(Radius)  # Take Radius and find the city that has the most cities within radius r of it.
    dex = Radius.index(maxito)
    firstserver = Cities[dex]
    output.append(firstserver)
    FalseList = served(firstserver, r, data, FalseList)
    while FalseList.count(False) > 0:
        WorkingCityList = []
        Radius2 = []
        temp = []
        for everycity in Cities:
            if FalseList[Cities.index(everycity)] == False:
                Radius2.append(CountHowManyCitiesAreInRThatAreNotServed(everycity, FalseList, r, data))
                temp.append(everycity)
        maxito = max(Radius2)
        dex = Radius2.index(maxito)
        serverC = temp[dex]
        output.append(serverC)
        FalseList = served(serverC, r, FalseList, data)
    output.sort()
    return output
This is how the rest of the code starts
import re #Import Regular Expressions
def createDataStructure():
    f = open('miles.dat')  # Opens file
    # Regular expression whose pattern captures 4 groups: a name up to the
    # opening bracket, two comma-separated integers inside the brackets,
    # and a trailing integer.
    CITY_REG = re.compile(r"([^[]+)\[(\d+),(\d+)\](\d+)")
    CITY_TYPES = (str, int, int, int)  # A conversion table to change the raw strings to the desired types.
    # Initialized lists
    Cities = []
    Coordinates = []
    Populations = []
    TempD = []
    FileDistances = []
    # Loop that reads the file line by line
    line = f.readline()
    while line:
        match = CITY_REG.match(line)  # Matches the compiled pattern; returns None if not matched.
        if match:
            temp = [type(dat) for dat, type in zip(match.groups(), CITY_TYPES)]  # Converts the matched strings into the desired types.
            # Moves the matched fields into individual lists
            Cities.append(temp[0])
            Coordinates.append([temp[1], temp[2]])
            Populations.append(temp[3])
            if TempD:  # Once the distance line(s) are over and a city line is matched, this appends the distances to a distance list.
                FileDistances.append(TempD)
                TempD = []
        elif not line.startswith('*'):  # Runs if the line isn't commented out with a "*" or a matched line (line that starts with a city).
            g = line.split()  # This chunk takes a str of numbers and converts it into a list of integers.
            i = 0
            intline = []
            while i != len(g):
                intline.append(int(g[i]))
                i += 1
            TempD.extend(intline)
        line = f.readline()
    f.close()  # End parsing file
    FileDistances.append(TempD)  # Appends the last distance line
    FileDistances.insert(0, [])  # For first list
    i = 0
    j = 1
    while i != 128:  # Loop takes lists of distances and makes them len(128) with corresponding distances
        FileDistances[i].reverse()  # Reverses the current distance list to correspond with distance from the city listed before.
        FileDistances[i].append(0)  # Appends 0 because at this point the city distance is to itself.
        counter = i + 1
        while len(FileDistances[i]) != 128:  # Loop that appends the other distances.
            FileDistances[i].append(FileDistances[counter][-j])
            counter = counter + 1
        j += 1
        i += 1
    cities = []
    for i in Cities:  # Removes the commas. I don't know why we need to get rid of the commas...
        new = i.replace(',', '')
        cities.append(new)
    # Final product <3
    MasterList = [cities, Coordinates, Populations, FileDistances]
    return MasterList
getCoordinates
def getCoordinates(cityname, data):  # Basic search function
    INDEX = data[0].index(cityname)
    return data[1][INDEX]
getPopulation
def getPopulation(cityname, data):  # Basic search function
    INDEX = data[0].index(cityname)
    return data[2][INDEX]
getDistance
def getDistance(cityname1, cityname2, data):  # Basic search function
    INDEX = data[0].index(cityname1)
    INDEX2 = data[0].index(cityname2)
    return data[3][INDEX][INDEX2]
nearbyCities
def nearbyCities(cityname, radius, data):
    Cities = data[0]
    INDEX = Cities.index(cityname)
    workinglist = data[3][INDEX]  # data[3] is the distance list
    IndexList = []
    index = 0
    while index < len(workinglist):  # Goes through the list and records the indexes of cities within the radius
        if workinglist[index] <= radius:
            IndexList.append(index)
        index += 1
    output = []
    for i in IndexList:  # Looks up the indexes and appends the city names to an output list
        output.append(Cities[i])
    output.sort()
    return output
The file miles.dat can be found at http://mirror.unl.edu/ctan/support/graphbase/miles.dat
Well, it appears that data[0] contains a boolean, not a list of strings; calling .index on a bool raises exactly this exception, and I was able to reproduce it in an empty interpreter.
Look at the last line of your while loop: FalseList = served(serverC, r, FalseList, data). The function is defined as served(city, r, data, FalseList), so here FalseList is being passed where data is expected; inside served (and then nearbyCities), data[0] is the boolean False, which has no .index attribute. Swap the last two arguments so the call matches the earlier FalseList = served(firstserver, r, data, FalseList).
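For illustration, a minimal reproduction of this failure mode (with made-up names): if a list of booleans is passed where the data structure is expected, data[0] is a bool, and bools have no .index method.

```python
def nearby(cityname, data):
    # expects data[0] to be a list of city names
    return data[0].index(cityname)

data = [["NYC", "LA"], [], [], []]
FalseList = [False, False]

print(nearby("LA", data))            # 1 -- correct argument

try:
    nearby("LA", FalseList)          # wrong argument passed
except AttributeError as e:
    print(e)                         # 'bool' object has no attribute 'index'
```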

Python script for transforming and sorting columns in ascending order, decimal cases

I wrote a script in Python that removes tabs/blank spaces between two columns of strings (x, y coordinates), separates the columns by a comma, and lists the maximum and minimum values of each column (two values each for the x and y coordinates). E.g.:
100000.00 60000.00
200000.00 63000.00
300000.00 62000.00
400000.00 61000.00
500000.00 64000.00
became:
100000.00,60000.00
200000.00,63000.00
300000.00,62000.00
400000.00,61000.00
500000.00,64000.00
10000000 50000000 60000000 640000000
This is the code I used:
import string
input = open(r'C:\coordinates.txt', 'r')
output = open(r'C:\coordinates_new.txt', 'wb')
s = input.readline()
while s <> '':
    s = input.readline()
    liste = s.split()
    x = liste[0]
    y = liste[1]
    output.write(str(x) + ',' + str(y))
    output.write('\n')
    s = input.readline()
input.close()
output.close()
I need to change the above code to also convert the coordinates from two decimal places to one, and to sort both new columns in ascending order based on the values of the x coordinate (left column).
I started by writing the following, but not only is it not sorting the values, it is placing the y coordinates on the left and the x on the right. In addition, I don't know how to transform the decimals, since the values are strings and the only formatting I know uses %f, which needs floats. Any suggestions to improve the code below?
import string
input = open(r'C:\coordinates.txt', 'r')
output = open(r'C:\coordinates_sorted.txt', 'wb')
s = input.readline()
while s <> '':
    s = input.readline()
    liste = string.split(s)
    x = liste[0]
    y = liste[1]
    output.write(str(x) + ',' + str(y))
    output.write('\n')
    sorted(s, key=lambda x: x[o])
    s = input.readline()
input.close()
output.close()
thanks!
First, try to format your code according to PEP 8; it'll be easier to read. (I've done the cleanup in your post already.)
Second, Tim is right in that you should try to learn how to write your code as (idiomatic) Python not just as if translated directly from its C equivalent.
As a starting point, I'll post your 2nd snippet here, refactored as idiomatic Python:
# there is no need to import the `string` module; `.strip()` is a built-in
# method of strings (i.e. objects of type `str`).

# read in the data as a list of pairs of raw (i.e. unparsed) coordinates in
# string form:
with open(r'C:\coordinates.txt') as in_file:
    coords_raw = [line.strip().split() for line in in_file.readlines()]

# convert the raw list into a list of pairs (2-tuples) containing the parsed
# (i.e. float not string) data:
coord_pairs = [(float(x_raw), float(y_raw)) for x_raw, y_raw in coords_raw]

coord_pairs.sort()  # you want to sort the entire data set, not just values on
                    # individual lines as in your original snippet

# build a list of all x and y values we have (this could be done in one line
# using some `zip()` hackery, but I'd like to keep it readable (for you at
# least)):
all_xs = [x for x, y in coord_pairs]
all_ys = [y for x, y in coord_pairs]

# compute min and max:
x_min, x_max = min(all_xs), max(all_xs)
y_min, y_max = min(all_ys), max(all_ys)

# NOTE: the above section performs well for small data sets; for large ones, you
# should combine the 4 lines in a single for loop so as to NOT have to read
# everything into memory and iterate over the data 6 times.

# write everything out
with open(r'C:\coordinates_sorted.txt', 'wb') as out_file:
    # here, we're doing 3 things in one line:
    #  * iterate over all coordinate pairs and convert the pairs to string form
    #  * join the string forms with a newline character
    #  * write the result of the join+iterate expression to the file
    out_file.write('\n'.join('%f,%f' % (x, y) for x, y in coord_pairs))
    out_file.write('\n\n')
    out_file.write('%f %f %f %f' % (x_min, x_max, y_min, y_max))
with open(...) as <var_name> gives you guaranteed closing of the file handle as with try-finally; also, it's shorter than open(...) and .close() on separate lines. Also, with can be used for other purposes, but is commonly used for dealing with files. I suggest you look up how to use try-finally as well as with/context managers in Python, in addition to everything else you might have learned here.
Your code looks more like C than like Python; it is quite unidiomatic. I suggest you read the Python tutorial to find some inspiration. For example, iterating using a while loop is usually the wrong approach. The string module is deprecated for the most part, <> should be !=, you don't need to call str() on an object that's already a string...
Then, there are some errors. For example, sorted() returns a sorted version of the iterable you're passing - you need to assign that to something, or the result will be discarded. But you're calling it on a string, anyway, which won't give you the desired result. You also wrote x[o] where you clearly meant x[0].
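To illustrate the sorted() point with some throwaway data:

```python
pairs = [(300.0, 62.0), (100.0, 60.0), (200.0, 63.0)]

sorted(pairs)                 # result discarded -- `pairs` is unchanged
print(pairs[0])               # (300.0, 62.0)

pairs = sorted(pairs)         # assign the result to keep it
print(pairs[0])               # (100.0, 60.0)

print(sorted("bca"))          # ['a', 'b', 'c'] -- a string sorts per character
```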
You should be using something like this (assuming Python 2):
with open(r'C:\coordinates.txt') as infile:
    values = []
    for line in infile:
        values.append(map(float, line.split()))
values.sort()
with open(r'C:\coordinates_sorted.txt', 'w') as outfile:
    for value in values:
        outfile.write("{:.1f},{:.1f}\n".format(*value))
