How can I sort file elements in Python?

I have to sort some elements in a text file that contains the names with the schedules of some teachers. Searching on google, I found this program:
def sorting(filename):
    infile = open(filename)
    words = []
    for line in infile:
        temp = line.split()
        for i in temp:
            words.append(i)
    infile.close()
    words.sort()
    outfile = open("result.txt", "w")
    for i in words:
        outfile.writelines(i)
        outfile.writelines(" ")
    outfile.close()
sorting("file.txt")
The code works, but it writes all the sorted elements on a single line, whereas I want one element per line as in the original file, in alphabetical and numerical order. To illustrate, let's say I have this file:
c
a
b
What I want is to sort it like this:
a
b
c
but it sorts it like this:
a b c
I use python 3.10. Can anyone please help me? Thanks.

def sorting(filename):
    infile = open(filename)
    words = []
    for line in infile:
        temp = line.split()
        for i in temp:
            words.append(i)
    infile.close()
    words.sort()
    outfile = open("result.txt", "w")
    for i in words:
        outfile.writelines(i)
        outfile.writelines("\n")  # edited: instead of ' ' write '\n'
    outfile.close()
sorting("test.txt")
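If every entry in the file sits on its own line (for example a teacher's name followed by a schedule), it may be simpler to sort whole lines instead of individual words. The following is only a sketch under that assumption; the function name `sort_lines` and the `dst` parameter are made up for illustration. Note that plain string sorting is lexicographic, so "10" sorts before "9".

```python
def sort_lines(src, dst="result.txt"):
    """Write the non-empty lines of src to dst in sorted order, one per line."""
    with open(src, encoding="utf-8") as infile:
        # Strip the trailing newline and skip blank lines.
        entries = [line.strip() for line in infile if line.strip()]
    entries.sort()
    with open(dst, "w", encoding="utf-8") as outfile:
        outfile.write("\n".join(entries) + "\n")
    return entries
```

The `with` blocks close both files automatically, even if an error occurs while reading or writing.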

Related

Counting word frequency by python list

Today I was trying to write code that returns the number of times each word is repeated in a text (the contents of a txt file). Before using a dictionary, I wanted to check that the words are actually appended to a list, so I wrote this code:
def word_frequency(file):
    """Returns the frequency of all the words in a txt file"""
    with open(file) as f:
        arg = f.readlines()
    l = []
    for line in arg:
        l = line.split(' ')
        return l
After I gave it the file path and pressed F5, this happened:
In[18]: word_frequency("C:/Users/ASUS/Desktop/Workspace/New folder/tesst.txt")
Out[18]: ['Hello', 'Hello', 'Hello\n']
At first glance nothing looks wrong with this output, but the text in the txt file is:
As you can see, only the words of the first line end up in the list, but I want all the words in the txt file to be appended to it.
Does anyone know what I have to do? What is the problem here?
You should save the words in the main list before returning the list.
def word_frequency(file):
    with open(file) as f:
        lines = f.readlines()
    words = []
    for line in lines:
        line_words = line.split()
        words += line_words
    return words
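In your code, the return statement sits inside the loop and the list is overwritten on every iteration, so the function returns after processing only the first line. return terminates the execution of the function and returns a value, which in your case is just the words of the first line of the file.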
One answer is from https://www.pythonforbeginners.com/lists/count-the-frequency-of-elements-in-a-list#:~:text=Count%20frequency%20of%20elements%20in%20a%20list%20using,the%20frequency%20of%20the%20element%20in%20the%20list.
import collections
# file is the path to the text file
with open(file) as f:
    lines = f.readlines()
words = []
for line in lines:
    # extend, not append: Counter needs hashable strings, not nested lists
    words.extend(line.split())
frequencyDict = collections.Counter(words)
print("Input list is:", words)
print("Frequency of elements is:")
print(frequencyDict)
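To see the counting step in isolation, here is a small sketch that works on an in-memory string instead of a file. The helper name `count_words` and the sample text are assumptions made for the example.

```python
from collections import Counter

def count_words(text):
    """Return a Counter mapping each whitespace-separated word to its frequency."""
    # split() with no argument splits on spaces and newlines alike,
    # so no word keeps a trailing '\n'.
    return Counter(text.split())

freq = count_words("Hello Hello Hello\nHello world\n")
print(freq.most_common(1))  # [('Hello', 4)]
```

Counter.most_common(n) returns the n most frequent items as (word, count) pairs, which is often all that is needed from a frequency table.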

Special characters don't display correctly when splitting

When I'm reading a line in a text file, like this one below :
présenté alloué ééé ààà tué
And try to print it in the terminal, it displays correctly. But when I apply a split with a space as separator, it displays this :
['pr\xc3\xa9sent\xc3\xa9', 'allou\xc3\xa9', '\xc3\xa9\xc3\xa9\xc3\xa9', '\xc3\xa0\xc3\xa0\xc3\xa0', 'tu\xc3\xa9\n']
I just use this to read the text file :
f = open("test.txt")
l = f.readline()
f.close()
print l.split(" ")
Can someone help me ?
Printing the list is not the same as printing its elements:
s = "présenté alloué ééé ààà tué"
print s.split(" ")
for x in s.split(" "):
    print x
Output:
['pr\xc3\xa9sent\xc3\xa9', 'allou\xc3\xa9', '\xc3\xa9\xc3\xa9\xc3\xa9', '\xc3\xa0\xc3\xa0\xc3\xa0', 'tu\xc3\xa9']
présenté
alloué
ééé
ààà
tué
Python 3.* solution:
All you have to do is specify the encoding you wish to use:
with open("test.txt", encoding='utf-8') as f:
    l = f.readline()
print(l.split())
And you'll get:
['présenté', 'alloué', 'ééé', 'ààà', 'tué']
Python 2.* solution:
import codecs
f = codecs.open(r"D:\Source Code\voc-git\test.txt", mode='r', encoding='utf-8')
l = f.read()
f.close()
for word in l.split(" "):
    print(word)
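The \xc3\xa9 sequences in the question are simply the UTF-8 bytes of "é", shown because printing a list displays the repr of each element. A short Python 3 sketch (the variable names are only for illustration) makes the relationship between the text and its bytes visible:

```python
word = "présenté"

# Encoding turns the text into the raw UTF-8 bytes that are stored in the file.
encoded = word.encode("utf-8")
print(encoded)  # b'pr\xc3\xa9sent\xc3\xa9'

# Decoding with the same encoding recovers the original text.
decoded = encoded.decode("utf-8")
print(decoded)  # présenté

# In Python 3, a list of str keeps printable non-ASCII characters as they are.
print([decoded, "tué"])  # ['présenté', 'tué']
```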

Split string within list into words in Python

I'm a newbie in Python, and I need to write code that reads a text file, splits it into words, sorts them, and prints them out.
Here is the code I wrote:
fname = raw_input("Enter file name: ")
fh = open(fname)
lst = list()
words = list()
for line in fh:
    line = line.strip()
    line.split()
    lst.append(line)
lst.sort()
print lst
That's my output:
['Arise fair sun and kill the envious moon', 'But soft what light through yonder window breaks', 'It is the east and Juliet is the sun', 'Who is already sick and pale with grief']
What I expected instead is a sorted list of the individual words, ending with 'with', 'yonder'].
However, when I try lst.split(), it says:
List object has no attribute split
Please help!
You should extend the new list with the split line, rather than attempt to split the strings after appending:
for line in fh:
    line = line.strip()
    lst.extend(line.split())
The issue is that split() does not magically mutate the string into a list. You have to do something with the return value.
for line in fh:
    # line.split()  # expression has no effect
    line = line.split()  # statement does
    # lst += line  # shortcut for the loop underneath
    for token in line:
        lst = lst + [token]
        # or, equivalently: lst += [token]
The above is a solution that uses a nested loop and avoids append and extend. The whole line by line splitting and sorting can be done very concisely, however, with a nested generator expression:
print sorted(word for line in fh for word in line.strip().split())
You can do:
fname = raw_input("Enter file name: ")
fh = open(fname, "r")
lines = list()
words = list()
for line in fh:
    # get an array of words for this line
    words = line.split()
    for w in words:
        lines.append(w)
lines.sort()
print lines
To avoid duplicates:
no_dups_list = list()
for w in lines:
    if w not in no_dups_list:
        no_dups_list.append(w)
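For readers on Python 3, splitting, removing duplicates, and sorting can be combined in a few lines. This is only a sketch; the function name `sorted_unique_words` is hypothetical, and it accepts any iterable of lines (an open file works too).

```python
def sorted_unique_words(lines):
    """Return the distinct whitespace-separated words from lines, sorted."""
    # The generator walks every line and yields each word in it.
    words = (word for line in lines for word in line.split())
    # set() drops duplicates; sorted() returns a new ordered list.
    return sorted(set(words))

sample = ["But soft what light", "It is the east and Juliet is the sun"]
print(sorted_unique_words(sample))
```

Uppercase letters sort before lowercase ones with the default ordering; passing key=str.lower to sorted would give a case-insensitive order instead.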

Python Count not resetting?

I'm trying to insert an incrementing number after each occurrence of ~||~ in my .txt file. I have this working; however, I want the numbering to restart at 1 after each semicolon.
So far I have the following, which does everything except restart at semicolons.
inputfile = "output2.txt"
outputfile = "/output3.txt"
f = open(inputfile, "r")
words = f.read().split('~||~')
f.close()
count = 1
for i in range(len(words)):
    if ';' in words [i]:
        count = 1
    words[i] += "~||~" + str(count)
    count = count + 1
f2 = open(outputfile, "w")
f2.write("".join(words))
Why not first split the file on the semicolons, then count the occurrences of '~||~' in each segment?
with open(inputfile) as f:
    semicolon_separated_chunks = f.read().split(';')
# one count per chunk; str.count avoids having to escape '|' in a regex
counts = [chunk.count('~||~') for chunk in semicolon_separated_chunks]
# if file text is 'hello there ~||~ what is that; what ~||~ do you ~|| mean; nevermind ~||~'
# then counts = [1, 1, 1]
Instead of resetting the counter the way you are now, you could do the initial split on ;, and then split the substrings on ~||~. You'd have to store your words another way, since you're no longer doing words = f.read().split('~||~'), but it's safer to make an entirely new list anyway.
inputfile = "output2.txt"
outputfile = "/output3.txt"
all_words = []
f = open(inputfile, "r")
lines = f.read().split(';')
f.close()
for line in lines:
    count = 1
    words = line.split('~||~')
    for word in words:
        all_words.append(word + "~||~" + str(count))
        count += 1
f2 = open(outputfile, "w")
f2.write("".join(all_words))
See if this works for you. You also may want to put some strategically-placed newlines in there, to make the output more readable.
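Here is one possible way to do the renumbering on a plain string, as a sketch rather than a drop-in replacement. It assumes the number should go directly after each marker and that the semicolons should be kept in the output; the function name `number_markers` and its parameters are made up for the example.

```python
def number_markers(text, marker="~||~", sep=";"):
    """Append a running number after each marker, restarting at 1 after every sep."""
    numbered_segments = []
    for segment in text.split(sep):
        pieces = segment.split(marker)
        # pieces[0] precedes the first marker; every later piece follows one.
        rebuilt = pieces[0]
        for count, piece in enumerate(pieces[1:], start=1):
            rebuilt += marker + str(count) + piece
        numbered_segments.append(rebuilt)
    # Re-joining with sep keeps the semicolons in the output.
    return sep.join(numbered_segments)

print(number_markers("a~||~b~||~c;d~||~e"))  # a~||~1b~||~2c;d~||~1e
```

Because the counter comes from enumerate inside the per-segment loop, it restarts automatically for every segment and no manual reset is needed.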

Improving a word combination script

Any way to make this better or more simple? I know it generates a whole lot of words and when you try to combine more than 4 lines on one sentence it doesn't look the way it should.
infile = open('Wordlist.txt.txt','r')
wordlist = []
for line in infile:
    wordlist.append(line.strip())
infile.close()
outfile = open('output.txt','w')
for word1 in wordlist:
    for word2 in wordlist:
        out = '%s %s' %(word1,word2)
        #feel free to #comment one of these two lines to not output to file or screen
        print out
        outfile.write(out + '\n')
outfile.close()
Use itertools.product:
import itertools
with open('Wordlist.txt.txt') as infile:
    words = [line.strip() for line in infile]
with open('output.txt', 'w') as outfile:
    for word1, word2 in itertools.product(words, repeat=2):
        outfile.write("%s %s\n" %(word1, word2))
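If each line in your infile contains exactly 2 words, you may consider building sentences that take one word from each line:
from itertools import product
with open('Wordlist.txt.txt','r') as infile:
    wordlist = infile.readlines()
pairs = [line.strip().split() for line in wordlist]
with open('output','w') as ofile:
    # product(*pairs) picks one word per line; each combination becomes one output line
    ofile.write('\n'.join(' '.join(combo) for combo in product(*pairs)))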
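To check what itertools.product generates before writing anything to disk, it can help to try it on a small in-memory list first. A minimal Python 3 sketch, with a hypothetical helper name `word_pairs`:

```python
from itertools import product

def word_pairs(words):
    """Return every ordered pair of words (including a word with itself) as 'w1 w2'."""
    # product(words, repeat=2) is equivalent to two nested loops over words.
    return [f"{first} {second}" for first, second in product(words, repeat=2)]

print(word_pairs(["a", "b"]))  # ['a a', 'a b', 'b a', 'b b']
```

The number of results is len(words) ** 2, which is why the output grows so quickly with longer word lists or a higher repeat value.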
