Highest to Lowest from a textfile? - python

I'm pretty new to Python and I have been having trouble printing out a score list from highest to lowest. The score list is saved in a text file and is laid out like this:
Jax:6
Adam:10
Rav:2
I have looked in books but haven't gotten anywhere. Does anyone know how I could read the scores from a text file and list them from highest to lowest? Thank you.
I am using Python 3.3.2.

Try it like this:
with open("your_file") as f:
    my_dict = {}
    for x in f:
        x = x.strip().split(":")
        # convert the score to int so that 10 sorts above 6
        my_dict[x[0]] = int(x[1])
print(sorted(my_dict.items(), key=lambda x: x[1], reverse=True))

First, you need to load the file (let's say its name is file.txt), then read the values, sort them, and print them. It's not as difficult as it seems.
This first version works only when the scores are unique:
# init a dictionary where you store the results
results = {}
# open the file with results in "read" mode
with open("file.txt", "r") as fileinput:
    # for each line in the file with results, do the following
    for line in fileinput:
        # remove whitespace at the end of the line and split the line on ":"
        items = line.strip().split(":")
        # store the result; the score will be the key
        results[int(items[1])] = items[0]
# sort the scores (keys of the results dictionary) in descending order
sorted_results = sorted(results.keys(), reverse=True)
# for each score in sorted_results do the following
for i in sorted_results:
    # print the result in the same format as the scores in your file
    print("{}:{}".format(results[i], i))
The steps are explained in the comments of the example code.
Links to the relevant documentation and examples:
Sorting
Printing (string.format())
Dictionary data structure (dict)
Reading and writing files
EDIT:
This version works even when several entries have the same score.
(Thanks to #otorrillas for pointing out the problem)
# init a list where you store the results
results = []
# open the file with results in "read" mode
with open("file.txt", "r") as fileinput:
    # for each line in the file with results, do the following
    for line in fileinput:
        # remove whitespace at the end of the line and split the line on ":"
        items = line.strip().split(":")
        # store the result as a (name, score) tuple
        results.append(tuple(items))
# sort all the tuples in the `results` list by the second item (score),
# compared as an integer so that "10" ranks above "6"
for result_item in sorted(results, key=lambda x: int(x[1]), reverse=True):
    # print the result in the same format as the scores in your file
    print("{}:{}".format(result_item[0], result_item[1]))
The comments describe the code. The main difference is that this version no longer uses a dict; it stores the results as tuples in a list and sorts them with a key function.
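A minimal sketch of the same list-of-tuples idea wrapped in a reusable function. The name read_scores is hypothetical, not from the answer above; it assumes one name:score pair per line, skips blank lines, and converts each score to int:

```python
import io


def read_scores(lines):
    """Return (name, score) tuples sorted from highest to lowest score.

    `lines` can be any iterable of strings, such as an open file object.
    """
    results = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines, e.g. an empty last line
            continue
        # split on the last ":" only, in case a name itself contains ":"
        name, score = line.rsplit(":", 1)
        results.append((name, int(score)))
    # sorted() is stable (also with reverse=True), so tied scores keep file order
    return sorted(results, key=lambda item: item[1], reverse=True)


# io.StringIO stands in for open("file.txt") so the example runs on its own
sample = io.StringIO("Jax:6\nAdam:10\nRav:2\nBen:6\n")
for name, score in read_scores(sample):
    print("{}:{}".format(name, score))
```

Because the sort is stable, Jax and Ben (both 6) come out in the order they appear in the file.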

Just for fun: if all you need to do is sort the data in a file, you can use the UNIX sort command:
sort -k 2 -t : -n -r $your_file
(the arguments are: sort on the second field, split fields on ':', numeric sort, reverse order).
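The -n flag is what makes sort compare the scores as numbers, and the Python answers face the same issue: the scores come out of the file as strings, which compare character by character, so "10" sorts below "6". A small demonstration with the sample scores hard-coded as strings:

```python
# The sample scores, as strings, exactly as they are read from the file
scores = ["6", "10", "2"]

# String comparison goes character by character: "6" > "2" > "10"
as_strings = sorted(scores, reverse=True)

# Numeric comparison: int() turns each string into a number before comparing
as_numbers = sorted(scores, key=int, reverse=True)

print(as_strings)  # ['6', '2', '10']
print(as_numbers)  # ['10', '6', '2']
```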

tldr
sorted([l.rstrip().split(':') for l in open('d.d')], key=lambda i:int(i[1]))
You need to operate on the lines of the file, which you can get simply with
[l for l in open('FILE')]
but preferably without the trailing newlines
[l.rstrip() for l in open('FILE')]
and finally split on the colon character
[l.rstrip().split(':') for l in open('FILE')]
so that you have obtained a list of lists
>>> print([l.rstrip().split(':') for l in open('FILE')])
[['Jax', '6'], ['Adam', '10'], ['Rav', '2']]
That is what you want to sort; specifically, you want to sort it according to the numerical value of the 2nd field
>>> print([int(r[1]) for r in [l.rstrip().split(':') for l in open('FILE')]])
[6, 10, 2]
The sorted builtin accepts the optional argument key, a function that extracts the part to compare from each element of the iterable being sorted:
>>> sd = sorted([l.rstrip().split(':') for l in open('FILE')], key=lambda r: int(r[1]))
>>> print(sd)
[['Rav', '2'], ['Jax', '6'], ['Adam', '10']]
Add reverse=True to the sorted() call to get the scores from highest to lowest, as the question asks, and that's all, folks.
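A sketch of the same pipeline, sorted from highest to lowest and closing the file when done. Here io.StringIO stands in for open('FILE') so that it runs on its own:

```python
import io

# io.StringIO stands in for open('FILE'); it behaves like a text file object
with io.StringIO("Jax:6\nAdam:10\nRav:2\n") as f:
    # strip the newline and split each line into [name, score]
    rows = [line.rstrip().split(":") for line in f]

# compare on the integer value of the 2nd field, largest first
ranked = sorted(rows, key=lambda r: int(r[1]), reverse=True)
print(ranked)  # [['Adam', '10'], ['Jax', '6'], ['Rav', '2']]
```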

Related

Sort value in txt file contain {string, number} using Python

I have a .txt file like this:
testa, 10
testb, 50
testc, 20
I want to sort it in reverse order based on the number to the right of the comma on each line, so that the result looks like this:
testb, 50
testc, 20
testa, 10
I have tried appending each line to a list and using sort(), but it failed.
Is there any way to do that in Python? Note that my file is a .txt file.
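This writes the lines to out.txt sorted by their number, from highest to lowest:
with open('test.txt') as f:
    lines = f.readlines()
with open('out.txt', 'w') as f:
    # int() makes the comparison numeric, so e.g. 100 ranks above 50
    for line in sorted(lines, key=lambda x: int(x.split()[1]), reverse=True):
        f.write(line.strip() + '\n')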
Edited to write the out.txt file
You can pass a key=func argument to the built-in sorted() function; func returns the value used to compare items. The code to do it would be:
lines = open('input.txt').read().splitlines()
splitted = (line.split(', ') for line in lines)
items = ((i[0], int(i[1])) for i in splitted)
# Reversed sort
reversed_items = sorted(items, key=lambda i: -i[1])
open('output.txt', 'w+').write('\n'.join((f"{i[0]}, {i[1]}" for i in reversed_items)))
Documentation: https://docs.python.org/3/howto/sorting.html#key-functions
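Both answers boil down to the same steps: read the lines, sort them with a key that extracts the number after the comma as an int, and write them out. A minimal sketch of those steps as a function; the name sort_file_by_number is hypothetical, and it assumes every non-blank line has the form name, number:

```python
import os
import tempfile


def sort_file_by_number(src, dst):
    """Write the lines of src to dst, sorted by the number after the comma, largest first."""
    with open(src) as f:
        # drop trailing newlines and skip blank lines
        lines = [line.strip() for line in f if line.strip()]

    # rsplit on the last comma and compare numerically, so e.g. 10 ranks above 9
    lines.sort(key=lambda line: int(line.rsplit(",", 1)[1]), reverse=True)

    with open(dst, "w") as f:
        for line in lines:
            f.write(line + "\n")


# usage with the question's sample data, in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "test.txt")
    dst = os.path.join(tmp, "out.txt")
    with open(src, "w") as f:
        f.write("testa, 10\ntestb, 50\ntestc, 20\n")
    sort_file_by_number(src, dst)
    with open(dst) as f:
        result = f.read()

print(result, end="")
```

int() tolerates the space left after the comma, so no extra strip() is needed in the key.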

How do I sort a text file after the last instance of a character?

Goal: Sort the text file alphabetically based on the characters that appear AFTER the final slash. Note that there are random numbers right before the final slash.
Contents of the text file:
https://www.website.com/1939332/delta.html
https://www.website.com/2237243/alpha.html
https://www.website.com/1242174/zeta.html
https://www.website.com/1839352/charlie.html
Desired output:
https://www.website.com/2237243/alpha.html
https://www.website.com/1839352/charlie.html
https://www.website.com/1939332/delta.html
https://www.website.com/1242174/zeta.html
Code Attempt:
i = 0
for line in open("test.txt").readlines(): #reading text file
List = line.rsplit('/', 1) #splits by final slash and gives me 4 lists
dct = {list[i]:list[i+1]} #tried to use a dictionary
sorted_dict=sorted(dct.items()) #sort the dictionary
textfile = open("test.txt", "w")
for element in sorted_dict:
textfile.write(element + "\n")
textfile.close()
Code does not work.
I would pass a different key function to the sorted function. For example:
with open('test.txt', 'r') as f:
    lines = f.readlines()
lines = sorted(lines, key=lambda line: line.split('/')[-1])
with open('test.txt', 'w') as f:
    f.writelines(lines)
See here for a more detailed explanation of key functions.
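Before you run this, make sure test.txt ends with a newline. Otherwise the last line, which has no trailing newline, gets joined to whichever line follows it after sorting; that is what causes the "combining the second and third lines" problem.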
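The missing-newline problem can also be handled in the code itself: strip each line before sorting and add exactly one newline back afterwards. A minimal sketch under that approach; the helper name sort_by_last_segment is hypothetical, and rsplit('/', 1)[-1] picks the text after the final slash, just like split('/')[-1]:

```python
def sort_by_last_segment(lines):
    """Sort URL lines by the text after their final '/', returning newline-terminated lines."""
    # strip newlines so a last line without "\n" is treated like every other line
    stripped = [line.strip() for line in lines if line.strip()]
    stripped.sort(key=lambda line: line.rsplit("/", 1)[-1])
    # add exactly one newline back to every line, including the last one
    return [line + "\n" for line in stripped]


urls = [
    "https://www.website.com/1939332/delta.html\n",
    "https://www.website.com/2237243/alpha.html\n",
    "https://www.website.com/1242174/zeta.html\n",
    "https://www.website.com/1839352/charlie.html",  # no trailing newline
]
print("".join(sort_by_last_segment(urls)), end="")
```

To update test.txt in place, pass in the result of f.readlines() and write the returned list with f.writelines().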
If you really want to use a dictionary:
dct = {}
i = 0
with open("test.txt") as textfile:
    for line in textfile.readlines():
        mylist = line.rsplit('/', 1)
        dct[mylist[i]] = mylist[i+1]
sorted_dict = sorted(dct.items(), key=lambda item: item[1])
with open("test.txt", "w") as textfile:
    for element in sorted_dict:
        textfile.write(element[i] + '/' + element[i+1])
What you did wrong
In the first line, you name your variable List, and in the second you access it using list.
List = line.rsplit('/', 1)
dct = {list[i]:list[i+1]}
Variable names are case-sensitive, so you need to use the same capitalisation each time. Furthermore, Python already has a built-in list class. It can be shadowed, but I would not recommend naming your variables list, dict, etc.
(list[i] actually just produces a types.GenericAlias object, which is a type hint: something completely different from a list, and not what you want at all.)
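A quick way to see this for yourself, assuming Python 3.9 or later (the version where built-in classes such as list became subscriptable for type hints). Note that no error is raised, which is why the typo is easy to miss:

```python
import types

i = 0
alias = list[i]  # subscripts the built-in list *class*, not a list object

print(type(alias))
print(isinstance(alias, types.GenericAlias))  # True
```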
You also wrote
dct = {list[i]:list[i+1]}
which repeatedly creates a new dictionary in each loop iteration, overwriting whatever was stored in dct previously. You should instead create an empty dictionary before the loop, and assign values to its keys every time you want to update it, as I have done.
You're calling sorted() in each iteration of the loop; you should call it only once, after the loop is done. After all, you only want to sort your dictionary once.
You also open the file twice, and although you close it at the end, I would suggest using a context manager and the with statement as I have done, so that file closing is automatically handled.
My code
sorted(dct.items(), key=lambda item: item[1])
means that the sorted() function uses the second element in the item tuple (the dictionary item) as the 'metric' by which to sort.
`textfile.write(element[i] + '/' +element[i+1])`
is necessary, since, when you did rsplit('/',1), you removed the /s in your data; you need to add them back and reconstruct the string from the element tuple before you write it.
You don't need + \n in textfile.write since readlines() preserves the \n. That's why you should end text files with a newline: so that you don't have to treat the last line differently.
def sortFiles(item):
    return item.split("/")[-1]
FILENAME = "test.txt"
contents = [line for line in open(FILENAME, "r").readlines() if line.strip()]
contents.sort(key=sortFiles)
with open(FILENAME, "w") as outfile:
outfile.writelines(contents)

Make a list in python from a FASTA text file

I have a text file like this small example:
>ENST00000491024.1|ENSG00000187583.6|OTTHUMG00000040756.4|OTTHUMT00000097942.2|PLEKHN1-003|PLEKHN1|176
SLESSPDAPDHTSETSHSPLYADPYTPPATSHRRVTDVRGLEEFLSAMQSARGPTPSSPLPSVPVSVPASDPRSCSSGPAGPYLLSKKGALQSRAAQRHRGSAKDGGPQPPDAPQLVSSAREGSPEPWLPLTDGRSPRRSRDPGYDHLWDETLSSSHQKCPQLGGPEASGGLVQWI
>ENST00000433179.2|ENSG00000187642.5|OTTHUMG00000040757.3|-|C1orf170-201|C1orf170|696
MPTQDGQLRRPARPPGPRAWMEPRGGGSSQFSSCPGPASSGDQMQRLLQGPAPRPPGEPPGSPKSPGHSTGSQRPPDSPGAPPRSPSRKKRRAVGAKGGGHTGASASAQTGSPLLPAASPETAKLMAKAGQEELGPGPAGAPEPGPRSPVQEDRPGPGLGLSTPVPVTEQGTDQIRTPRRAKLHTVSTTVWEALPDVSRAKSDMAVSTPASEPQPDRDMAVSTPASEPQSDRDMAVSTPASEPQPDTDMAVSTPASEPQPDRDMAVSIPASKPQSDTAVSTPASEPQSSVALSTPISKPQLDTDVAVSTPASKHGLDVALPTAGPVAKLEVASSPPVSEAVPRMTESSGLVSTPVPRADAAGLAWPPTRRAGPDVVEMEAVVSEPSAGAPGCCSGAPALGLTQVPRKKKVRFSVAGPSPNKPGSGQASARPSAPQTATGAHGGPGAWEAVAVGPRPHQPRILKHLPRPPPSAVTRVGPGSSFAVTLPEAYEFFFCDTIEENEEAEAAAAGQDPAGVQWPDMCEFFFPDVGAQRSRRRGSPEPLPRADPVPAPIPGDPVPISIPEVYEHFFFGEDRLEGVLGPAVPLPLQALEPPRSASEGAGPGTPLKPAVVERLHLALRRAGELRGPVPSFAFSQNDMCLVFVAFATWAVRTSDPHTPDAWKTALLANVGTISAIRYFRRQVGQGRRSHSPSPSS
>ENST00000341290.2|ENSG00000187642.5|OTTHUMG00000040757.3|OTTHUMT00000097943.2|C1orf170-001|C1orf170|676
MEPRGGGSSQFSSCPGPASSGDQMQRLLQGPAPRPPGEPPGSPKSPGHSTGSQRPPDSPGAPPRSPSRKKRRAVGAKGGGHTGASASAQTGSPLLPAASPETAKLMAKAGQEELGPGPAGAPEPGPRSPVQEDRPGPGLGLSTPVPVTEQGTDQIRTPRRAKLHTVSTTVWEALPDVSRAKSDMAVSTPASEPQPDRDMAVSTPASEPQSDRDMAVSTPASEPQPDTDMAVSTPASEPQPDRDMAVSIPASKPQSDTAVSTPASEPQSSVALSTPISKPQLDTDVAVSTPASKHGLDVALPTAGPVAKLEVASSPPVSEAVPRMTESSGLVSTPVPRADAAGLAWPPTRRAGPDVVEMEAVVSEPSAGAPGCCSGAPALGLTQVPRKKKVRFSVAGPSPNKPGSGQASARPSAPQTATGAHGGPGAWEAVAVGPRPHQPRILKHLPRPPPSAVTRVGPGSSFAVTLPEAYEFFFCDTIEENEEAEAAAAGQDPAGVQWPDMCEFFFPDVGAQRSRRRGSPEPLPRADPVPAPIPGDPVPISIPEVYEHFFFGEDRLEGVLGPAVPLPLQALEPPRSASEGAGPGTPLKPAVVERLHLALRRAGELRGPVPSFAFSQNDMCLVFVAFATWAVRTSDPHTPDAWKTALLANVGTISAIRYFRRQVGQGRRSHSPSPSS
>ENST00000428771.2|ENSG00000188290.6|OTTHUMG00000040758.2|OTTHUMT00000097945.2|HES4-002|HES4|247
MAADTPGKPSASPMAGAPASASRTPDKPRSAAEHRKVGSRPGVRGATGGREGRGTQPVPDPQSSKPVMEKRRRARINESLAQLKTLILDALRKESSRHSKLEKADILEMTVRHLRSLRRVQVTAALSADPAVLGKYRAGFHECLAEVNRFLAGCEGVPADVRSRLLGHLAACLRQLGPSRRPASLSPAAPAEAPAPEVYAGRPLLPSLGGPFPLLAPPLLPGLTRALPAAPRAGPQGPGGPWRPWLR
The file is split into groups, and each group has 2 parts. The 1st part starts with ">" and its elements are separated by "|"; the line after it is the 2nd part. I am trying to make a list in Python containing the 6th element of the ID part of each group. Here is the expected output for the small example:
list = ["PLEKHN1", "C1orf170", "C1orf170", "HES4"]
I tried to first load the file into a dictionary and then build a list like the expected output using:
from itertools import groupby
with open('infile.txt') as f:
    groups = groupby(f, key=lambda x: not x.startswith(">"))
    d = {}
    for k,v in groups:
        if not k:
            key, val = list(v)[0].rstrip(), "".join(map(str.rstrip,next(groups)[1],""))
            d[key] = val
k = d.keys()
res = [el[5:] for s in k for el in s.split('|')]
But it does not return what I am looking for. Do you know how to fix it?
Since these are clearly protein sequences in FASTA format, I suggest you use Biopython, it will save you time and be more robust than building your own parser:
from Bio import SeqIO
lst = [record.description.split('|')[5] for record in SeqIO.parse('in_file.fasta', 'fasta')]
print(lst)
# ['PLEKHN1', 'C1orf170', 'C1orf170', 'HES4']
Try this:
res = [s[5] for s in [el.split('|') for el in k]]
Output (the order may differ: dictionaries preserve insertion order only from Python 3.7 on):
['HES4', 'C1orf170', 'PLEKHN1', 'C1orf170']
You can get the tokens you want by reading every line in your file and keeping only the lines that start with '>'. Then you split each of those lines on the '|' character and take the 6th element. This code does that in one line:
with open('infile.txt') as f:
    tokens = [line.split('|')[5] for line in f.readlines() if line[0] == '>']
print(tokens)
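The one-liner only looks at header lines, which is all this task needs. If you also want each header paired with its sequence without installing Biopython, a minimal sketch of a hand-written FASTA reader might look like this. The function name read_fasta is hypothetical; it assumes headers start with '>' and that a sequence may span several lines:

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []  # drop the leading ">"
        else:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        yield header, "".join(chunks)


# Headers from the question; sequences truncated, and the HES4 sequence split
# over two lines to show multi-line handling. io.StringIO stands in for open().
sample = io.StringIO(
    ">ENST00000491024.1|ENSG00000187583.6|OTTHUMG00000040756.4|OTTHUMT00000097942.2|PLEKHN1-003|PLEKHN1|176\n"
    "SLESSPDAPDHTSETSHSPL\n"
    ">ENST00000428771.2|ENSG00000188290.6|OTTHUMG00000040758.2|OTTHUMT00000097945.2|HES4-002|HES4|247\n"
    "MAADTPGKPSASPMAGAPAS\n"
    "ASRTPDKPRS\n"
)
genes = [header.split("|")[5] for header, seq in read_fasta(sample)]
print(genes)  # ['PLEKHN1', 'HES4']
```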

How can I sort multiple dictionaries in one variable?

There are several dictionaries in the variable highscores. I need to sort it by its key values, and sorted() isn't working.
global highscores
f = open('RPS.txt', 'r')
highscores = [line.strip() for line in f]
sorted(highscores)
highscores = reverse=True[:5]
for line in f:
    x = line.strip()
    print(x)
f.close()
this is the error:
TypeError: 'bool' object is not subscriptable
sorted(v) returns a new list containing the elements of v in order; it does not sort v in place, so calling sorted(highscores) on a line by itself throws the result away. You can use the result in a for loop to process the elements one at a time:
for k in sorted(elements): ...
You can transform each element and store the result in a list:
v = [f(k) for k in sorted(elements)]
Or you can simply keep the sorted list:
v = sorted(elements)
Note that in your code, the elements are strings read from a file, not dictionaries.
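The difference between sorted() and list.sort() is easy to check. A short demonstration with made-up score strings (the values are only illustrative):

```python
scores = ["3", "9", "5"]

result = sorted(scores, reverse=True)  # returns a NEW list
print(result)    # ['9', '5', '3']
print(scores)    # ['3', '9', '5'] (the original is unchanged)

returned = scores.sort()  # sorts the list in place...
print(returned)  # None (...and returns nothing useful)
print(scores)    # ['3', '5', '9']
```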
The following should do what (I think) you want:
with open('RPS.txt', 'r') as f:  # will automatically close f
    highscores = [line.strip() for line in f]
highscores = sorted(highscores, reverse=True)[:5]
for line in highscores:
    print(line)
The primary problem was the way you were using sorted(): it returns a new sorted list instead of changing highscores, so its result has to be assigned. The TypeError comes from the next line: highscores = reverse=True[:5] is parsed as a chained assignment of True[:5], and a bool can't be sliced; reverse=True is meant to be an argument to sorted(). Also, at the end, rather than iterating through the lines of the file again (which yields nothing, because the first pass has already read the file to the end), the code above sorts the lines read from the file, keeps the first 5 of them in highscores, and prints them. There's no need to strip the lines again; that was done when the file was first read.
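The point about iterating over the file a second time can be seen with io.StringIO, which behaves like an open text file here: once the first pass has read to the end, a second pass yields nothing unless you seek back to the start. The sample lines are made up for illustration:

```python
import io

f = io.StringIO("alice 10\nbob 7\n")

first_pass = [line.strip() for line in f]
second_pass = [line.strip() for line in f]  # already at the end of the "file"
print(first_pass)   # ['alice 10', 'bob 7']
print(second_pass)  # []

f.seek(0)  # rewind to the beginning
third_pass = [line.strip() for line in f]
print(third_pass)   # ['alice 10', 'bob 7']
```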

Python: How do i sort integers in a text file into ascending and/or descending order?

I have a text file (data.txt) containing one name and one score per line, i.e.:
Mike = 1
John = 2
Cam = 3
I want to sort the integers along with the corresponding name in ascending and descending order.
with open(filepath, 'r') as file:
    list = []
    for line in file:
        list.append(line[1:-1].split(","))
    list.sort(key=lambda x: int(x[4]))
Yes, I have done some research, but it doesn't work; I was hoping one of you could help me fix the code above. I know I must read the data from the text file into a list, sort the list, and then write it back to the text file, but I am not sure how.
Source: How do I sort a text file numerically highest to lowest?
Here's an example, doing it in memory using the sorted() function.
with open(filepath, 'r') as file:
    sorted_data = sorted(file.readlines(),
                         key=lambda item: int(item.rsplit('=', 1)[-1].strip()))
sorted_data will then contain a list of the sorted rows.
Here is how it works, step by step.
Open the file:
with open(filepath, 'r') as file:
Get all the lines of the file (as a list):
file.readlines()
sorted() orders those lines numerically, based on the value returned when each line is passed to the key function.
The key function takes a line, splits it on the last "=" sign, takes the part after it, strips any leading or trailing whitespace (.strip()), and returns the value converted to an integer (int).
sorted() then returns the lines ordered by those integers, from smallest to largest; pass reverse=True to get them from largest to smallest.
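The question also asks for both orders and for writing the result back to the file. A minimal sketch of that, reusing the same key function; the function name sort_scores_file and its descending parameter are hypothetical, and it assumes every non-blank line looks like name = number:

```python
import os
import tempfile


def sort_scores_file(path, descending=False):
    """Sort the 'name = number' lines of path by number and write them back to path."""
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]

    # same key as above: the integer after the last "="
    lines.sort(key=lambda item: int(item.rsplit("=", 1)[-1].strip()),
               reverse=descending)

    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")


# usage in a temporary directory so no real file is overwritten
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")
    with open(path, "w") as f:
        f.write("John = 2\nCam = 3\nMike = 1\n")

    sort_scores_file(path)  # ascending
    with open(path) as f:
        ascending = f.read()

    sort_scores_file(path, descending=True)
    with open(path) as f:
        descending = f.read()

print(ascending)
print(descending)
```

Sorting in memory and rewriting the whole file is fine for a small score list; for very large files a different approach would be needed.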
