Special characters don't display correctly when splitting - python

When I read a line from a text file, like this one:
présenté alloué ééé ààà tué
and print it in the terminal, it displays correctly. But when I split it with a space as the separator and print the result, it displays this:
['pr\xc3\xa9sent\xc3\xa9', 'allou\xc3\xa9', '\xc3\xa9\xc3\xa9\xc3\xa9', '\xc3\xa0\xc3\xa0\xc3\xa0', 'tu\xc3\xa9\n']
This is all I use to read the text file:
f = open("test.txt")
l = f.readline()
f.close()
print l.split(" ")
Can someone help me?

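Printing the list is not the same as printing its elements: the list shows the repr() of each string (hence the escaped bytes), while printing an element shows the string itself.
s = "présenté alloué ééé ààà tué"
print s.split(" ")
for x in s.split(" "):
    print x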
Output:
['pr\xc3\xa9sent\xc3\xa9', 'allou\xc3\xa9', '\xc3\xa9\xc3\xa9\xc3\xa9', '\xc3\xa0\xc3\xa0\xc3\xa0', 'tu\xc3\xa9']
présenté
alloué
ééé
ààà
tué
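The same effect can be reproduced in Python 3. This is a minimal sketch, assuming the text is held as UTF-8 bytes (which is what a Python 2 str read from the file contains); the variable names are illustrative and not from the original post.

```python
# Python 3 sketch: bytes objects play the role of Python 2 byte strings here.
text = "présenté alloué"
raw_words = text.encode("utf-8").split(b" ")

# Converting the whole list to text applies repr() to every element,
# so the non-ASCII bytes show up as \x.. escapes.
as_list = str(raw_words)

# Decoding each element gives back readable text.
decoded = [w.decode("utf-8") for w in raw_words]

print(as_list)
print(decoded == text.split(" "))
```

The data itself is never damaged by split(); only the way the list is displayed differs.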

Python 3.* solution:
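All you have to do is specify the encoding you wish to use when opening the file:
f = open("test.txt", encoding='utf-8')
l = f.readline()
f.close()
print(l.split())
And you'll get:
['présenté', 'alloué', 'ééé', 'ààà', 'tué']
Calling split() with no argument splits on any whitespace, so the trailing newline that split(" ") would leave on the last word is dropped.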
Python 2.* solution:
import codecs
f = codecs.open("""D:\Source Code\\voc-git\\test.txt""", mode='r', encoding='utf-8')
l = f.read()
f.close()
for word in l.split(" "):
    print(word)
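For Python 3, the whole round trip can be sketched in a self-contained way. The temporary file below is a hypothetical stand-in for test.txt, created only so the example can run anywhere.

```python
import os
import tempfile

# Hypothetical sample file, written here only to make the sketch runnable.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("présenté alloué ééé ààà tué\n")

# Opening with an explicit encoding yields decoded str objects, not raw bytes.
with open(path, encoding="utf-8") as f:
    words = f.readline().split()  # no argument: also drops the trailing "\n"

print(len(words))
```

Using a with block closes the file even if reading fails, so no explicit close() call is needed.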

Related

how can i sort file elements in python?

I have to sort some elements in a text file that contains the names with the schedules of some teachers. Searching on google, I found this program:
def sorting(filename):
    infile = open(filename)
    words = []
    for line in infile:
        temp = line.split()
        for i in temp:
            words.append(i)
    infile.close()
    words.sort()
    outfile = open("result.txt", "w")
    for i in words:
        outfile.writelines(i)
        outfile.writelines(" ")
    outfile.close()
sorting("file.txt")
The code works, but it writes the sorted elements on a single line, while I want them written one per line as in the original file, in alphabetical and numerical order. To illustrate, let's say I have this file:
c
a
b
What I want is to sort it like this:
a
b
c
but it sorts it like this:
a b c
I use python 3.10. Can anyone please help me? Thanks.
def sorting(filename):
    infile = open(filename)
    words = []
    for line in infile:
        temp = line.split()
        for i in temp:
            words.append(i)
    infile.close()
    words.sort()
    outfile = open("result.txt", "w")
    for i in words:
        outfile.write(i)
        outfile.write("\n")  # edited: write '\n' instead of ' '
    outfile.close()
sorting("test.txt")
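A more compact variant of the same idea is sketched below. The function name sort_words and the temporary file paths are assumptions made for illustration; they are not part of the original question.

```python
import os
import tempfile

def sort_words(in_path, out_path):
    """Write every whitespace-separated word of in_path to out_path, sorted, one per line."""
    with open(in_path) as infile:
        words = sorted(infile.read().split())
    with open(out_path, "w") as outfile:
        outfile.writelines(word + "\n" for word in words)
    return words

# Demo on throwaway files (hypothetical names).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "file.txt")
dst = os.path.join(workdir, "result.txt")
with open(src, "w") as f:
    f.write("c\na\nb\n")

sort_words(src, dst)
with open(dst) as f:
    result = f.read()
print(result, end="")
```

Note that sorted() compares the words as strings, so "10" sorts before "9". If the file mixes names and numbers, a custom key function would be needed for true numerical order.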

how do I output a list of strings that fall alphabetically between two input values

I'm given a text file called input1.txt. This file contains the following:
aspiration
classified
federation
graduation
millennium
philosophy
quadratics
transcript
wilderness
zoologists
Write a program that first reads in the name of an input file, followed by two strings representing the lower and upper bounds of a search range. The file should be read using the file.readlines() method. The input file contains a list of alphabetical, ten-letter strings, each on a separate line. Your program should output all strings from the list that are within that range (inclusive of the bounds).
EX:
Enter the path and name of the input file: input1.txt
Enter the first word: ammunition
Enter the second word (it must come alphabetically after the first word): millennium
The words between ammunition and millennium are:
aspiration
classified
federation
graduation
millennium
file_to_open = input()
bound1 = input()
bound2 = input()
with open(file_to_open) as file_handle:
    list1 = [line.strip() for line in file_handle]
out = [x for x in list1 if x >= bound1 and x <= bound2]
out.sort()
print('\n'.join(map(str, out)))
Use a list comprehension with inequalities to check the string range:
out = [x for x in your_list if x >= 'ammunition' and x <= 'millennium']
This assumes that your range is inclusive on both ends, that is, ammunition and millennium themselves are included if they appear in the list.
To further sort the out list and then write to a file, use:
out.sort()
f = open('output.txt', 'w')
text = '\n'.join(out)
f.write(text)
f.close()
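Putting the pieces together with readlines(), as the assignment asks, might look like the sketch below. The helper name words_in_range and the temporary sample file are assumptions for illustration only.

```python
import os
import tempfile

def words_in_range(path, lower, upper):
    """Return the words in the file at path with lower <= word <= upper."""
    with open(path) as fh:
        lines = fh.readlines()
    words = [line.strip() for line in lines]
    # Chained comparison: inclusive on both ends.
    return [w for w in words if lower <= w <= upper]

# Hypothetical copy of input1.txt, written here so the sketch is runnable.
sample = ["aspiration", "classified", "federation", "graduation", "millennium",
          "philosophy", "quadratics", "transcript", "wilderness", "zoologists"]
path = os.path.join(tempfile.mkdtemp(), "input1.txt")
with open(path, "w") as f:
    f.write("\n".join(sample) + "\n")

found = words_in_range(path, "ammunition", "millennium")
print("\n".join(found))
```

String comparison in Python is lexicographic and case-sensitive, so this relies on the file containing lowercase words, as in the example.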
If you have to use readline(), try this. It collects the lines that lie strictly between a start line and an end line:
filepath = 'Iliad.txt'
start = 'sometxtstart'
end = 'sometxtend'
appending = False
out = ""
with open(filepath) as fp:
    line = fp.readline()
    while line:
        txt = line.strip()
        if txt == end:
            appending = False
        if appending:
            out += txt + '\n'
        if txt == start:
            appending = True
        line = fp.readline()
print(out)
This worked for me:
file = input()
first = input()
second = input()
with open(file) as f:
    lines = f.readlines()
    for line in lines:
        l = line.strip('\n')
        if (l >= first) and (l <= second):
            print(line.strip())
        else:
            pass

Python programming with file

I need help with a problem concerning the code below.
with open ("Premier_League.txt", "r+") as f:
    i= int(input("Please put in your result! \n"))
    data = f.readlines()
    print(data)
    data[i] = int(data[i])+1
    f.seek(0) # <-- rewind to the beginning
    f.writelines(str(data))
    f.truncate() # <-- cut any leftovers from the old version
    print(data)
    data[i] = str(data)
For example, if the file Premier_League.txt contains:
1
2
3
and I run the program and choose i as 0, that gives me:
[2, '2\n', '3']
and saves it to the already existing file (and deletes the old content)
But after that I cannot run the program again and it gives me this:
ValueError: invalid literal for int() with base 10: "[2, '2\\n', '3']"
My question is: How do I make the new file content suitable to go into the program again?
I recommend this approach:
with open('Premier_League.txt', 'r+') as f:
    data = [int(line.strip()) for line in f.readlines()] # [1, 2, 3]
    f.seek(0)
    i = int(input("Please put in your result! \n"))
    data[i] += 1 # e.g. if i = 1, data now [1, 3, 3]
    for line in data:
        f.write(str(line) + "\n")
    f.truncate()
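The same approach can be wrapped in a function so that it is easy to check that the file stays readable across repeated runs. This is a sketch only: increment_line and the temporary file are hypothetical names, and it assumes every line of the file holds one integer.

```python
import os
import tempfile

def increment_line(path, index):
    """Add 1 to the integer stored on line `index` of the file and rewrite the file."""
    with open(path, "r+") as f:
        data = [int(line) for line in f]      # int() tolerates the trailing "\n"
        data[index] += 1
        f.seek(0)                             # rewind to the beginning
        f.writelines(f"{n}\n" for n in data)  # one number per line, as before
        f.truncate()                          # drop leftovers if the new content is shorter
    return data

# Hypothetical stand-in for Premier_League.txt.
path = os.path.join(tempfile.mkdtemp(), "Premier_League.txt")
with open(path, "w") as f:
    f.write("1\n2\n3")

first_run = increment_line(path, 0)
second_run = increment_line(path, 0)   # works again: the file still holds plain numbers
with open(path) as f:
    content = f.read()
print(first_run, second_run)
```

The key point is that the file is always written back in the same format it is read in, one number per line, instead of the string form of a Python list.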
f.readlines() returns the file's contents as a list of strings, so each value has to be converted to int before it is incremented and back to str before it is written:
data = []
with open("Premier_League.txt", "r") as f:
    i = int(input("Please put in your result! \n"))
    data = f.readlines()
with open("Premier_League.txt", "w") as f:
    for n, j in enumerate(data):
        value = int(j)
        if n == i:
            value += 1  # only the chosen line is incremented
        f.write(str(value) + "\n")
print(data)
Note that the first with block closes the file before it is reopened for writing; written content can be lost if a file is never closed. Whatever you want to do with data has to happen before or inside the second with block. Also notice the change from r to w in open("Premier_League.txt", "w"): opening in w mode empties the file first, so no truncate() is needed.
Here is my solution:
with open("Premier_League.txt", "r+") as f:
    i = int(input("Please put in your result! \n"))
    data = f.readlines()
    print(data)
    # strip the newline before converting, then store the result as a string again
    data[i] = str(int(data[i].strip()) + 1) + '\n'
    f.seek(0)  # <-- rewind to the beginning
    # str(data) was wrong: data is a list, and writelines() expects a list of strings
    f.writelines(data)
    # harmless here, and needed whenever the new content is shorter than the old
    f.truncate()  # <-- cut any leftovers from the old version
    print(data)
    # data[i] = str(data) is not needed

Split string within list into words in Python

I'm a newbie in Python, and I need to write code that reads a text file, splits each line into words, sorts them and prints them out.
Here is the code I wrote:
fname = raw_input("Enter file name: ")
fh = open(fname)
lst = list()
words = list()
for line in fh:
    line = line.strip()
    line.split()
    lst.append(line)
lst.sort()
print lst
That's my output -
['Arise fair sun and kill the envious moon', 'But soft what light through yonder window breaks', 'It is the east and Juliet is the sun', 'Who is already sick and pale with grief']
However, when I try lst.split(), it says:
List object has no attribute split
Please help!
You should extend the new list with the split line, rather than attempt to split the strings after appending:
for line in fh:
    line = line.strip()
    lst.extend(line.split())
The issue is that split() does not magically mutate the string into a list. It returns a new list, and you have to do something with that return value.
for line in fh:
    # line.split()  # expression has no effect
    line = line.split()  # assignment does
    # lst += line  # shortcut for the loop underneath
    for token in line:
        lst = lst + [token]
        # lst += [token]  # equivalent in-place form; use one or the other
The above is a solution that uses a nested loop and avoids append and extend. The whole line by line splitting and sorting can be done very concisely, however, with a nested generator expression:
print sorted(word for line in fh for word in line.strip().split())
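In Python 3 the same generator expression works unchanged apart from print being a function. A minimal sketch, using io.StringIO as a stand-in for the open file (an assumption so the example runs without a file on disk); sorted_words is a hypothetical helper name.

```python
import io

def sorted_words(lines):
    """Return every whitespace-separated word from an iterable of lines, sorted."""
    # split() with no argument already ignores leading/trailing whitespace,
    # so a separate strip() is optional.
    return sorted(word for line in lines for word in line.split())

# Stand-in for a file handle.
sample = io.StringIO("But soft what light\nIt is the east\n")
words = sorted_words(sample)
print(words)
```

Uppercase letters sort before lowercase ones in the default string ordering; pass key=str.lower to sorted() for a case-insensitive sort.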
You can do:
fname = raw_input("Enter file name: ")
fh = open(fname, "r")
lines = list()
words = list()
for line in fh:
    # get an array of words for this line
    words = line.split()
    for w in words:
        lines.append(w)
lines.sort()
print lines
To avoid dups:
no_dups_list = list()
for w in lines:
    if w not in no_dups_list:
        no_dups_list.append(w)
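The loop above scans no_dups_list once per word, which gets slow for long inputs. Two shorter alternatives are sketched here in Python 3; the sample list is made up for illustration.

```python
# Hypothetical, already sorted word list with duplicates.
lines = ["and", "and", "is", "is", "sun", "the", "the"]

# dict keys are unique and keep insertion order (guaranteed from Python 3.7),
# so this removes duplicates while preserving the order of first occurrence.
no_dups_list = list(dict.fromkeys(lines))

# If the final result should be sorted anyway, a set followed by sorted() also works.
no_dups_sorted = sorted(set(lines))

print(no_dups_list)
```

Both give the same result here because the input was already sorted; only the dict.fromkeys form preserves the original order for unsorted input.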

python file reading

I have file /tmp/gs.pid with content
client01: 25778
I would like to retrieve the second word from it, i.e. 25778.
I have tried below code but it didn't work.
>>> f=open ("/tmp/gs.pid","r")
>>> for line in f:
...     word=line.strip().lower()
...     print "\n -->" , word
Try this:
>>> f = open("/tmp/gs.pid", "r")
>>> for line in f:
...     word = line.strip().split()[1].lower()
...     print " -->", word
>>> f.close()
It will print the second word of every line in lowercase. split() will take your line and split it on any whitespace and return a list, then indexing with [1] will take the second element of the list and lower() will convert the result to lowercase. Note that it would make sense to check whether there are at least 2 words on the line, for example:
>>> f = open("/tmp/gs.pid", "r")
>>> for line in f:
...     words = line.strip().split()
...     if len(words) >= 2:
...         print " -->", words[1].lower()
...     else:
...         print 'Line contains fewer than 2 words.'
>>> f.close()
word = "client01: 25778"
pid = word.split(": ")[1]  # or word.split()[1] to split on whitespace instead
If all lines are of the form abc: def, you can extract the 2nd part with
second_part = line[line.find(": ")+2:]
If not you need to verify line.find(": ") really returns a nonnegative number first.
with open("/tmp/gs.pid") as f:
    for line in f:
        p = line.find(": ")
        if p != -1:
            second_part = line[p+2:].lower()
            print "\n -->", second_part
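An alternative to find() plus slicing is str.partition(), which reports whether the separator was found without any index arithmetic. A small Python 3 sketch; the function name second_part and its sep parameter are assumptions for illustration.

```python
def second_part(line, sep=": "):
    """Return the text after the first `sep`, stripped, or None if `sep` is absent."""
    head, found, tail = line.partition(sep)
    # partition() returns an empty middle element when the separator is missing.
    return tail.strip() if found else None

print(second_part("client01: 25778\n"))
print(second_part("no separator here"))
```

Returning None for lines without the separator makes the missing case explicit, in the same spirit as checking that find() did not return -1.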
>>> open("/tmp/gs.pid").read().split()[1]
'25778'
