Python 3; Writing to text files, unintentional newline (beginner) - python

The following function is part of a library program I am writing. It is supposed to fetch books from a text file and, if the user desires, "loan" them by adding the string "(LOANED)" after the author of the book. The books are listed by title first, followed by a comma and a space (, ) and then the author of the book. All I want to do is add a "(LOANED)" string after the author of the book in the text file. However, when I try this, the (LOANED) string ends up on a different line (one line below where I want it to be), and it's driving me nuts.
def lend_book(title):
    f = open("booksbytitle.txt", "r+")
    d={}
    #splits the registry up into a dictionary with the book title as key and the book author as value to that key
    for i in f:
        x=i.split(",")
        a=x[0]
        b=x[1]
        d[a]=b[0:(len(b))]
    #checks if the title is in the dictionary, else skips down and quits the function
    if title in d:
        print("\n", title, "is available in the library and is written by"+d[title])
        saved_author = d[title][1:]
        while True:
            alternative=input("Do you want to lend this book? [y/n] ")
            if alternative.lower() == "y":
                print("The book is now loaned ")
                break
            elif alternative.lower() == "n":
                print("Okay, going back to main menu.. ")
                break
            else:
                print("That is not a valid choice. Type 'y' or 'n'")
        f.close()
        loaned="(LOANED)"
        f=open("booksbytitle.txt", "r+")
        z=f.readlines()
        f.seek(0)
        for i in z:
            if title not in i:
                f.write(i)
        f.write("\n" + title + ", " + saved_author + loaned)
        f.truncate()
        f.close()
        #clears the program of unintented blank lines
        fh = open("booksbytitle.txt", "r")
        lines = fh.readlines()
        fh.close()
        keep = []
        for line in lines:
            if not line.isspace():
                keep.append(line)
        fh = open("booksbytitle.txt", "w")
        fh.write("".join(keep))
        fh.close()
    else:
        print("There wasnt a book by that name found in the registry")

It's hard to tell with the screwed-up formatting and the meaningless one-letter variable names, but I suspect the problem is this:
When you iterate the lines of a file, like for i in f:, each one ends with a newline character ('\n').
When you split(",") each one, the last split-off string still contains that newline.
So ultimately, when you try to stick that string in the middle of a string, it's got a newline at the end, which means you end up with a newline in the middle of the string.
To fix this, use rstrip on each line as you read them in:
for i in f:
    x = i.rstrip().split(",")
This may mean that you're now missing newlines in your output to the file. You were expecting to get them for free, but now you don't. So you may have to do something like this:
f.write("\n" + title + ", " + saved_author + loaned + "\n")
However, maybe not. I notice that for some reason you're putting a "\n" at the start of every line, so this may just mean you end up with extra blank lines between each line (along with the extra blank line at the start of your file, which is inherent in using "\n" +).
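The newline handling described above can be sketched as a small helper that works on a list of lines rather than on the file itself. This is only an illustration under assumptions: the name `mark_loaned`, the sample titles, and the choice to skip blank lines are not from the original post.

```python
LOANED = "(LOANED)"

def mark_loaned(lines, title):
    """Return the registry lines with LOANED appended to the matching title."""
    updated = []
    for raw in lines:
        line = raw.rstrip("\n")      # drop the newline that file iteration keeps
        if not line.strip():
            continue                 # skip blank lines instead of cleaning up later
        book, _, author = line.partition(", ")
        if book == title and not author.endswith(LOANED):
            line = book + ", " + author + LOANED
        updated.append(line + "\n")  # put exactly one newline back
    return updated

# Hypothetical registry contents standing in for booksbytitle.txt
sample = ["Dune, Frank Herbert\n", "Emma, Jane Austen\n"]
print("".join(mark_loaned(sample, "Emma")), end="")
```

Because every line is stripped on the way in and given one newline on the way out, the marker cannot land on a separate line, and no separate pass to remove blank lines is needed.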

You could use rstrip() on the strings to remove trailing whitespace (including newlines),
and then join over "\n" instead of the empty string.
PS: You can write a bit of this code much simpler, by the way. For instance, you can just get the lines in the file all at once, filter out the empty ones and rstrip all at the same time, like this:
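with open(filename, "r") as handler:
    # line.strip() is empty for blank lines; a bare "if line" would never filter
    # anything, because every line read from a file still contains its "\n"
    keep = [line.rstrip() for line in handler if line.strip()]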
(The 'with' takes care of automatically closing the file after the indented block, then there's a list comprehension, and the open file object "handler" gives you the lines when iterating over it.)
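As a rough end-to-end sketch of that idea, the snippet below reads a file, drops blank lines, strips trailing whitespace, and writes the result back joined on "\n". The helper name `clean_lines` and the sample contents are assumptions made for the demo; it filters with `line.strip()` so that whitespace-only lines are dropped too.

```python
import os
import tempfile

def clean_lines(path):
    """Read a text file and return its non-blank lines without trailing whitespace."""
    with open(path, "r") as handler:
        return [line.rstrip() for line in handler if line.strip()]

# Demo on a throwaway file; the contents are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "books.txt")
    with open(path, "w") as fh:
        fh.write("Dune, Frank Herbert\n\n   \nEmma, Jane Austen\n")
    keep = clean_lines(path)
    # Joining on "\n" restores exactly one newline between entries.
    with open(path, "w") as fh:
        fh.write("\n".join(keep) + "\n")
    print(keep)
```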

Related

I'm trying to solve this Python exercise but I have no idea of how to do it: get first character of a line from a file + length of the line

I am learning Python on an app called SoloLearn, got to solve this exercise and I cannot see the solution or see the comments, I don't need to solve it to continue but I'd like to know how to do it.
Book Titles: You have been asked to make a special book categorization program, which assigns each book a special code based on its title.
The code is equal to the first letter of the book, followed by the number of characters in the title.
For example, for the book "Harry Potter", the code would be: H12, as it contains 12 characters (including the space).
You are provided a books.txt file, which includes the book titles, each one written on a separate line.
Read the title one by one and output the code for each book on a separate line.
For example, if the books.txt file contains:
Some book
Another book
Your program should output:
S9
A12
Recall the readlines() method, which returns a list containing the lines of the file.
Also, remember that all lines, except the last one, contain a \n at the end, which should not be included in the character count.
I tried:
file = open("books.txt","r")
for line in file:
for i in range(len(file.readlines())):
title = line[0]+str(len(line)-1)
print(titulo)
title = line[0]+str(len(line)-1)
print(title)
file.close
I also tried with range() and readlines() but I don't know how to solve it
This uses readlines():
with open('books.txt') as f: # Open file
    for line in f.readlines(): # Iterate through lines
        if line[-1] == '\n': # Check if there is '\n' at end of line
            line = line[:-1] # If there is, ignore it
        print(line[0], len(line), sep='') # Output first character and length
But I think splitlines() is easier, as it doesn't have the trailing '\n':
with open('books.txt') as f: # Open file
    for line in f.read().splitlines(): # Iterate through lines
        # No need to check for trailing '\n'
        print(line[0], len(line), sep='') # Output first character and length
You can use "with" to handle file opening and closing.
Use rstrip to get rid of '\n'.
with open('books.txt') as f:
    lines = f.readlines()  # read from the handle bound by "with"
for line in lines:
    print(line[0] + str(len(line.rstrip())))
This is the same:
file = open('books.txt')
lines = file.readlines()
for line in lines:
    print(line[0] + str(len(line.rstrip())))
file.close()
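The same logic can be wrapped in a function so it is easy to check without a books.txt file. The name `book_codes` and the decision to skip empty lines are assumptions for this sketch, not part of the exercise text.

```python
def book_codes(lines):
    """Return the code (first letter + title length) for each non-empty title."""
    codes = []
    for line in lines:
        title = line.rstrip("\n")   # the newline must not count toward the length
        if title:                   # skip empty lines to avoid an IndexError
            codes.append(title[0] + str(len(title)))
    return codes

# The last line of a file often has no trailing "\n", as in the exercise hint.
print("\n".join(book_codes(["Some book\n", "Another book"])))
```

With the real file, the call would look like `book_codes(open("books.txt").readlines())`, ideally inside a `with` block.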

How do I combine lines in a text file in a specific order?

I'm trying to transform the text in a file according the following rule: for each line, if the line does not begin with "https", add that word to the beginning of subsequent lines until you hit another line with a non-https word.
For example, given this file:
Fruit
https://www.apple.com//
https://www.banana.com//
Vegetable
https://www.cucumber.com//
https://www.lettuce.com//
I want
Fruit-https://www.apple.com//
Fruit-https://www.banana.com//
Vegetable-https://www.cucumber.com//
Vegetable-https://www.lettuce.com//
Here is my attempt:
one = open("links.txt", "r")
for two in one.readlines():
    if "https" not in two:
        sitex = two
    else:
        print (sitex + "-" +two)
Here is the output of that program, using the above sample input file:
Fruit
-https://www.apple.com//
Fruit
-https://www.banana.com//
Vegetable
-https://www.cucumber.com//
Vegetable
-https://www.lettuce.com//
What is wrong with my code?
First, call rstrip() on sitex to remove the newline character at the end of the string (credit to BrokenBenchmark).
Second, print adds a newline by default every time it is called, so we must pass the end="" parameter to suppress it.
So your code should look like this:
one = open("links.txt", "r")
for two in one.readlines():
    if "https" not in two:
        sitex = two.rstrip()
    else:
        print (sitex + "-" +two,end="")
one.close()
Also always close the file when you are done.
Lines in your file end on "\n" - the newline character.
You can remove whitespaces (includes "\n") from a string using strip() (both ends) or rstrip() / lstrip() (remove at one end).
print() adds a "\n" at its end by default, you can omit this using
print("something", end=" ")
print("more") # ==> 'something more' in one line
Fix for your code:
# use a context handler for better file handling
with open("data.txt","w") as f:
    f.write("""Fruit
https://www.apple.com//
https://www.banana.com//
Vegetable
https://www.cucumber.com//
https://www.lettuce.com//
""")
with open("data.txt") as f:
    what = ""
    # iterate file line by line instead of reading all at once
    for line in f:
        # remove whitespace from current line, including \n
        # front AND back - you could use rstrip here as well
        line = line.strip()
        # only do something for non-empty lines (your file does not
        # contain empty lines, but the last line may be empty)
        if line:
            # easier to understand condition without negation
            if line.startswith("http"):
                # printing adds a \n at the end
                print(f"{what}-{line}") # line & what are stripped
            else:
                what = line
Output:
Fruit-https://www.apple.com//
Fruit-https://www.banana.com//
Vegetable-https://www.cucumber.com//
Vegetable-https://www.lettuce.com//
See:
str.lstrip([chars])
str.rstrip([chars])
str.strip([chars])
[chars] are optional - if not given, whitespaces are removed.
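A quick way to see the difference between the three methods is to print the `repr` of each result. The sample strings are made up for the demonstration.

```python
line = "  Fruit\n"

# With no argument, whitespace (spaces, tabs, newlines) is removed.
print(repr(line.strip()))    # both ends
print(repr(line.rstrip()))   # right end only, leading spaces stay
print(repr(line.lstrip()))   # left end only, trailing newline stays

# With a chars argument, any of those characters are removed instead.
url = "https://www.apple.com//"
print(repr(url.rstrip("/")))
```

Note that the argument is a set of characters, not a suffix: `rstrip("/")` removes every trailing slash, however many there are.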
You need to strip the trailing newline from the line if it doesn't contain 'https':
sitex = two
should be
sitex = two.rstrip()
You need to do something similar for the else block as well, as ShadowRanger points out:
print (sitex + "-" +two)
should be
print (sitex + "-" + two.rstrip())

Minor bug in code written to format a text file (incorrect spacing) (Python 3)

New to coding so sorry if this is a silly question.
I have some text that I'm attempting to format to make it more pleasant to read, so I tried my hand at writing a short program in Python to do it for me. I initially removed extra paragraph breaks in MS-Word using the find-and-replace option. The input text looks something like this:
This is a sentence. So is this one. And this.
(empty line)
This is the next line
(empty line)
and some lines are like this.
I want to eliminate all empty lines, so that there is no spacing between lines, and ensure no sentences are left hanging mid-way like in the bit above. All new lines should begin with 2 (two) empty spaces, represented by the $ symbol below. So after formatting it should look something like this:
$$This is a sentence. So is this one. And this.
$$This is the next line and some lines are like this.
I wrote the following script:
import os
directory = "C:/Users/DELL/Desktop/"
filename = "test.txt"
path = os.path.join(directory, filename)
with open(path,"r") as f_in, open(directory+"output.txt","w+") as f_out:
    temp = "  "
    for line in f_in:
        curr_line = line.strip()
        temp += curr_line
        #print("Current line:\n%s\n\ntemp line: %s" % (curr_line, temp))
        if curr_line:
            if temp[-1]==".": #check if sentence is complete
                f_out.write(temp)
                temp = "\n  " #two blank spaces here
It eliminates all blank lines, indents new lines by two spaces, and conjoins hanging sentences, but doesn't insert the necessary blank space - so the output currently looks like (missing space between the words line and and).
$$This is a sentence. So is this one. And this.
$$This is the next lineand some lines are like this.
I tried to fix this by changing the following lines of code to read as follows:
temp += " " + curr_line
temp = "\n " #one space instead of two
and that doesn't work, and I'm not entirely sure why. It might be an issue with the text itself but I'll check on that.
Any advice would be appreciated, and if there is a better way to do what I want than this convoluted mess that I wrote, then I would like to know that as well.
EDIT: I seem to have fixed it. In my text (very long so I didn't notice it at first) there were two lines separated by 2 (two) empty lines, and so my attempt at fixing it didn't work. I moved one line a bit further below to give the following loop, which seems to have fixed it:
for line in f_in:
    curr_line = line.strip()
    #print("Current line:\n%s\n\ntemp line: %s" % (curr_line, temp))
    if curr_line:
        temp += " " + curr_line
        if temp[-1]==".": #check if sentence is complete
            f_out.write(temp)
            temp = "\n "
I also saw that an answer below initially had a bit of Regex in it, I'll have to learn that at some point in the future I suppose.
Thanks for the help everyone.
This should work. It's effectively the same as yours but a bit more efficient. It avoids repeated string concatenation with + and += (which is slow for long inputs) and instead collects the pieces of an incomplete sentence in a list. When a line ends with a period, it writes 2 spaces, then the collected pieces joined by single spaces, then a newline. Writing only when a sentence is complete keeps the logic simple.
temp = []
with open(path_in, "r") as f_in, open(path_out, "w") as f_out:
    for line in f_in:
        curr_line = line.strip()
        if curr_line:
            temp.append(curr_line)
            if curr_line.endswith('.'): # write our line
                f_out.write('  ')
                f_out.write(' '.join(temp))
                f_out.write('\n')
                temp.clear() # reset temp
Output:
  This is a sentence. So is this one. And this.
  This is the next line and some lines are like this.
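One case the loop above does not cover is a final fragment that never reaches a period: it stays in the list and is never written. The sketch below is a function version that also flushes such a leftover. The name `join_sentences`, the `indent` parameter, and the decision to keep the trailing fragment are assumptions for illustration.

```python
def join_sentences(lines, indent="  "):
    """Merge lines until one ends with '.', skipping blanks; indent each result."""
    out, pending = [], []
    for raw in lines:
        text = raw.strip()
        if not text:
            continue                      # ignore any number of empty lines
        pending.append(text)
        if text.endswith("."):
            out.append(indent + " ".join(pending))
            pending.clear()
    if pending:                           # assumption: keep an unfinished last sentence
        out.append(indent + " ".join(pending))
    return out

sample = [
    "This is a sentence. So is this one. And this.\n",
    "\n",
    "This is the next line\n",
    "\n",
    "\n",
    "and some lines are like this.\n",
]
print("\n".join(join_sentences(sample)))
```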

Read lines from .txt and if the first and last char equals to X and Y, add some text after that string

i'm trying to solve a problem that consists in:
open a .txt input file
readlines of that .txt
store the lines values in a list
check if lin.startswith("2") and lin.endswith("|")
if it's true, then lin2 = lin + "ISENTO"
write the edited lines to an output .txt file
Here's what i got until now...
def editTxt():
    #open .txt file and read the lines
    filename = askopenfilename(filetypes=[("Text files","*.txt"), ("Text files","*.TXT")])
    infile = open(filename, 'r')
    infile.read()
    #save each line in a list called "linhas" outside the editTxt function
    with open(filename, 'r') as f:
        linhas = f.readlines()
    #create the output file
    outfile = open(filename + "_edit.txt", 'w')
    #checking the condition and writing the edited lines
    for linha in linhas:
        if linha.startswith("2") and linha.endswith("|"):
            linha = linha + "ISENTO"
        outfile.write(linha)
    #close files
    outfile.close()
    infile.close()
The problem is that my output file is exactly equal to my input file.
I've already tried using if linha[0] == "2" and linha[len(linha)-1] == "|",
but then I figured out that f.readlines() keeps the \n at the end of each string,
so I tried if linha[0] == "2" and linha[len(linha)-3] == "|",
but that didn't work either.
Some people told me that I should use the replace function, but I couldn't figure out how.
The real file example:
lin1: 10|1,00|55591283000185|02/03/2015|31/03/2015
lin2: 20|I||VENDA|0|9977|02/03/2015 00:00:00|02/03/2015 11:48:00|1|5102|||07786273000152|OBSERVE SEGURANCA LTDA|RUA
MARINGA,|2150||BOA VISTA|RIBEIRAO PRETO|SP|14025560||39121530|
lin3: 30|1103|DAT 05MM - 5.102||PC|1,0000|19,9000|19,90|090|0,00|0,00|0,00
I just need to change the lin2, because it starts with "2" and ends with "|"
what i need after running the editTxt function:
lin2: 20|I||VENDA|0|9977|02/03/2015 00:00:00|02/03/2015 11:48:00|1|5102|||07786273000152|OBSERVE SEGURANCA LTDA|RUA
MARINGA,|2150||BOA VISTA|RIBEIRAO PRETO|SP|14025560||39121530|ISENTO
please python experts, show me an easier way to do this with another code or preferably explaining to me whats wrong with mine..
thx!
You were very close with your last attempt
The '\n' line terminator is not literally the characters '\' and 'n'. It's a single special character that is written as '\n' for convenience. So it occupies only one position in your string, not two, which means the "|" sits at index len(linha)-2, not len(linha)-3.
Hopefully, that should give you enough of a hint to figure out how to change your code :)
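One way to act on that hint, sketched under assumptions: strip the single newline character before testing the line, then put it back when writing. The function name `add_isento`, the `suffix` parameter, and the shortened sample records are illustrative, not the asker's actual code or data.

```python
def add_isento(lines, suffix="ISENTO"):
    """Append suffix to every line that starts with '2' and ends with '|'."""
    edited = []
    for raw in lines:
        body = raw.rstrip("\n")        # "\n" is one character, removed here
        newline = raw[len(body):]      # keep whatever terminator the line had
        if body.startswith("2") and body.endswith("|"):
            body += suffix
        edited.append(body + newline)
    return edited

# Shortened, made-up records in the same shape as the example file
sample = ["10|1,00|55591283000185\n", "20|I||VENDA|39121530|\n", "30|1103|0,00"]
print("".join(add_isento(sample)))
```

The result can be written with `outfile.writelines(add_isento(linhas))`, since each line still carries its own terminator.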

How do I search for text within a tab delimited file and print this information?

I need to search for something in a tab delimited text file. The user is supposed to input both the file and the thing that needs searching for. The programme is then supposed to return the whole line that the user inputted word is in. I have got two models so far because I've been coming at this problem from different angles. The first programme goes as follows:
import csv
searchfile = raw_input ('Which file do you want to search? ')
try:
    input_file = open (searchfile, 'rU')
except:
    print "Invalid file. Please enter a correct file"
csv_file_object = csv.reader(open(searchfile, 'rb'))
header = csv_file_object.next()
data=[]
for row in csv_file_object:
    data.append(row)
searchA = raw_input ('which author?')
author_search = data[0::,0] == searchA
if author_search in searchfile:
    print author_search
The problem with the first programme is that this error pops up:
TypeError: list indices must be integers, not tuple
I therefore attempted this method:
import csv
searchfile = raw_input ('Which file do you want to search? ')
try:
    input_file = open (searchfile, 'rU')
except:
    print "Invalid file. Please enter a correct file"
with open(searchfile) as f:
    reader = csv.reader(f, delimiter="\t")
    d = list(reader)
searchtype = raw_input ('Search on author or journal/conference or [Q = quit]')
if searchtype == 'author':
    searchdataA = raw_input ("Input author name")
    if searchdataA in input_file:
        print line
elif searchtype == 'journal' or 'conference' or 'journal/conference':
    searchdataJ = raw_input ("input journal/conference name")
    if searchdataJ in d:
        print line
elif searchtype == 'Q':
    print "Program left"
else:
    print "please choose either author or journal/conference"
This is unable to get beyond inputting the search parameters.
Any help on where to go with either programme would be much appreciated, or if I'm completely on the wrong track then links to useful material would be great.
I think you're making this a bit more complicated than it needs to be. Since you want to print the whole line that the target word appeared on, you don't really need the CSV module. You're not doing any of the sophisticated parsing it is capable of.
searchfile = raw_input ('Which file do you want to search? ')
searchA = raw_input ('which author?')
with open(searchfile) as infile:
    for line in infile:
        if searchA in line:
            print('  '.join(line.split()))
            break # remove this if you want to print all matches instead of
                  # just the first one
Notice that when printing the line, I first split it (which splits on whitespace by default), then rejoin the fields with two spaces between them. I think doing something like this would be a good way to go for you since you're printing tab-separated fields on the console. Reducing that extra space will make your prints a bit easier to read, but using two spaces still makes it easy to distinguish the columns from each other.
You can generalize it by prompting your user for any search term, instead of specifying "author". This may be the way to go, since your second code snippet suggests that you may want to search for other fields, like "journal" or "conference":
target_term = raw_input("Which term or phrase would you like to find?")
Since this method searches in and prints the entire line, there's no need to deal with the separate columns and different kinds of search terms. It just looks at the whole row at once and prints a matching line.
You are not really using a different search method depending on whether you are searching for author, journal, conference or journal/conference, so you could simply do a full-text search on each line. It is therefore wise to collect all the data you need from the user BEFORE processing the file, so you can output just the matching lines as you read them. If the user passes a rather large CSV file, loading all of it into a list first, as your code does, would take up far too much memory.
with open(searchfile, 'r') as f:
    for line in f:
        if line.find(searchA) > -1:
            print line
This way you loop through the file as fast as possible and print out all matching lines.
The .find() method returns the index in the string where it found the match, or -1 if the string was not found. From that value you could "estimate" where the match was made, but if you really want to differentiate between author, journal, etc., then you will have to split the line. In my sample I will assume the author field to be the sixth field in the CSV line:
with open(searchfile, 'r') as f:
    for line in f:
        fields = line.split("\t")
        if len(fields) > 5: # check length of fields array
            if fields[5].find(searchA) > -1: # search straight in author field
                print line # return full line
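For readers on Python 3 (the answers here use Python 2 syntax), a comparable sketch with the csv module might look like the following. The function name `find_rows`, the `column` parameter, and the sample data with the author in the first column are all assumptions made for the example.

```python
import csv
import io

def find_rows(handle, term, column=None):
    """Yield rows of a tab-delimited file that contain term.

    column=None searches every field; otherwise only that zero-based column.
    """
    for row in csv.reader(handle, delimiter="\t"):
        # Slicing instead of indexing avoids an IndexError on short rows.
        fields = row if column is None else row[column:column + 1]
        if any(term in field for field in fields):
            yield row

# Made-up sample data; the column layout is an assumption for illustration.
sample = "author\tjournal\tyear\nSmith\tNature\t2010\nJones\tScience\t2012\n"
for row in find_rows(io.StringIO(sample), "Smith", column=0):
    print("  ".join(row))
```

With a real file, the handle would come from `open(searchfile, newline="")`, which is the form the csv documentation recommends.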
Why not simply:
fname = raw_input("Enter Filename")
author = raw_input("Enter Author Name:")
if author in open(fname,"rb").read():
    print "match found"
If you want to see the matching lines, you could do:
import re
print re.findall(".*%s.*"%(author),open(fname,"rb").read())
As people point out, it is better form to do:
with open(fname,"rb") as f:
    print re.findall(".*%s.*"%(author),f.read())
although in CPython the file object will be garbage collected (and closed) immediately, so it's not really a problem.
The first thing that came to my mind is simply:
def check_file(file_name, author_name):
    with open(file_name) as f:
        content = f.readlines()
    for line in content:
        if author_name in line:
            print "Found: ", line
Hope it can be useful.
