Remove certain lines in an external textfile - python

I'm working on a program that should be able to handle basic library tasks. I have a problem with a class method that is supposed to let the user remove a certain book from the library. The list of books is stored in an external text file with the following format (author, title):
Vibeke Olsson, Molnfri bombnatt
Axel Munthe, Boken om San Michele
The method I'm using is shown below:
def removeBook(self):
    removal_of_book = input("What's the book's title, author you'd like to remove?: ")
    with open("books1.txt" , "r+") as li:
        new_li = li.readlines()
        li.seek(0)
        for line in new_li:
            if removal_of_book not in line:
                li.write(line)
        li.truncate()
    print(removal_of_book + " is removed from the system!")
The problem with this method is that every row containing removal_of_book gets removed (that is, not rewritten to the file). I know the method is far from optimal and should probably be scrapped, but I'm completely lost in finding an alternative.
Does anyone have a better solution to this problem?

You can build the lines to keep with a list comprehension and then write them back afterwards, reopening the same file name in write mode to overwrite the original file:
def removeBook(self):
    to_remove = input("What's the book's title, author you'd like to remove?: ")
    # Read first and let the with block close the file before reopening it
    with open("books1.txt", "r") as li:
        new_li = [line for line in li.readlines() if to_remove not in line]
    # writelines() accepts a list of strings; write() would raise a TypeError here
    with open("books1.txt", "w") as new_file:
        new_file.writelines(new_li)
    print(to_remove + " is removed from the system!")
Note that string membership checking is case sensitive, so you are expecting your user to match your case in the original file exactly. You might think about converting the strings to lower-case prior to performing your check using lower().
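If the goal is to remove only one specific book rather than every row that contains the typed text, one option is to compare whole lines, ignoring case, and stop after the first match. The sketch below is only an illustration: the helper name remove_first_match is hypothetical, and it assumes the user types the full "author, title" entry.

```python
def remove_first_match(lines, entry):
    """Return (new_lines, removed) with only the first exact match dropped."""
    target = entry.strip().lower()
    kept = []
    removed = False
    for line in lines:
        # Compare the whole line, ignoring case and surrounding whitespace
        if not removed and line.strip().lower() == target:
            removed = True
            continue
        kept.append(line)
    return kept, removed


books = [
    "Vibeke Olsson, Molnfri bombnatt\n",
    "Axel Munthe, Boken om San Michele\n",
]
kept, removed = remove_first_match(books, "axel munthe, boken om san michele")
print(removed, kept)
```

The returned list can be passed to writelines() as in the answer above, and the removed flag lets the method report when nothing matched instead of always printing the success message.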

Related

delete only 1 instance of a string from a file

I have a file that looks like this:
1234:AnneShirly:anneshirley#seneca.ca:4:5[SRT111,OPS105,OPS110,SPR100,ENG100]
3217:Illyas:illay#seneca.ca:2:4[SRT211,OPS225,SPR200,ENG200]
1127:john Marcus:johnmarcus#seneca.ca:1:4[SRT111,OPS105,SPR100,ENG100]
0001:Amin Malik:amin_malik#seneca.ca:1:3[OPS105,SPR100,ENG100]
I want to ask the user for a student number (the number at the beginning of each line) and then ask which course they want to delete (the course codes are in the list at the end of the line). The program should delete the course from that student's list only, without deleting other instances of the course, because other students take the same courses.
studentid = input("enter studentid")
course = input("enter the course to delete")
with open("studentDatabase.dat") as file:
    lines = file.readlines()
with open("studentDatabase.dat","w") as file:
    for line in lines:
        if line.find(course) == -1:
            file.write(line)
This just deletes the whole line, but I only want to delete the course.
Welcome to the site. You have a little ways to go to make this work. It would be good if you put some additional effort in to this before asking somebody to code this up. Let me suggest a structure for you that perhaps you can work on/augment and then you can re-post if you get stuck by editing your question above and/or commenting back on this answer. Here is a framework that I suggest:
Make a section of code to read your whole .dat file into memory. I would suggest putting the data into a dictionary that looks like this:
data = {1001: (name, email, <whatever the digits stand for>, [SRT111, OPS333, ...]),
        1044: ( ... )}
Basically, a dictionary with the ID as the key and the rest in a tuple or list. Test that, and make sure it works OK by inspecting a few values.
Make a little "control loop" that uses your input statements, and see if you can locate the "record" from your dictionary. Add some "if" logic to do "something" if the ID is not found or if the user enters something like "quit" to exit/break the loop. Test it to make sure it can find the ID's and then test it again to see that it can find the course in the list inside the tuple/list with the data. You probably need another "if" statement in there to "do something" if the course is not in the data element. Test it.
Make a little "helper function" that can re-write a data element with the course removed. A suggested signature would be:
def remove_course(data_element, course):
    # make the new data element (name, ... , [reduced course list])
    return new_data_element
Test it, make sure it works.
Put those pieces together and you should have the ingredients to change the dictionary by using the loop and function to put the new data element into the dictionary, over-writing the old one.
Write a widget to write the new .dat file from the dictionary in its entirety.
EDIT:
You can make the dictionary from a data file with something like this:
filename = 'student_data.dat'
data = {}  # an empty dictionary to stuff the results in
# use a context manager to handle opening/closing the file...
with open(filename, 'r') as src:
    # loop through the lines
    for line in src:
        # strip any whitespace from the end and tokenize the line by ":"
        tokens = line.strip().split(':')
        # check it... (remove later)
        print(tokens)
        # gather the pieces, make conversions as necessary...
        stu_id = int(tokens[0])
        name = tokens[1]
        email = tokens[2]
        some_number = int(tokens[3])
        # splitting the number from the list of courses is a little complicated
        # you *could* do this more elegantly with regex, but for your level,
        # here is a simple way to find the "chop points" and split this up...
        last_blobs = tokens[4].split('[')
        course_count = int(last_blobs[0])
        course_list = last_blobs[1][:-1]  # everything except the last bracket
        # split up the courses by comma
        courses = course_list.split(',')
        # now stuff that into the dictionary...
        # a little sanity check:
        if data.get(stu_id):
            print(f'duplicate ID found: {stu_id}. OVERWRITING')
        data[stu_id] = (name,
                        email,
                        some_number,
                        course_count,
                        courses)
for key, value in data.items():
    print(key, value)
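The remaining steps of the framework (the helper function and the writer) could look roughly like the sketch below. It assumes the tuple layout built by the parsing code above. Two things are guesses rather than facts from the question: that the number before the bracket is the course count, and that IDs are four digits wide (int() drops the leading zeros of an ID such as 0001). format_record is a hypothetical name.

```python
def remove_course(data_element, course):
    """Return a new tuple with `course` removed from the course list."""
    name, email, some_number, course_count, courses = data_element
    if course not in courses:
        return data_element  # nothing to remove
    reduced = [c for c in courses if c != course]
    # Assumption: the number before the bracket is the course count
    return (name, email, some_number, len(reduced), reduced)


def format_record(stu_id, data_element):
    """Rebuild one line of the .dat file from a dictionary entry."""
    name, email, some_number, course_count, courses = data_element
    # Assumption: IDs are 4 digits wide; int() dropped the leading zeros
    return f"{stu_id:04d}:{name}:{email}:{some_number}:{course_count}[{','.join(courses)}]"


data = {
    1: ("Amin Malik", "amin_malik#seneca.ca", 1, 3, ["OPS105", "SPR100", "ENG100"]),
    3217: ("Illyas", "illay#seneca.ca", 2, 4, ["SRT211", "OPS225", "SPR200", "ENG200"]),
}
data[1] = remove_course(data[1], "SPR100")
for stu_id, element in data.items():
    print(format_record(stu_id, element))
```

Writing the file back is then a matter of writing each formatted record plus a newline. Keeping the ID as a string instead of converting it with int() would avoid the padding assumption altogether.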
I've got something for you. What you want to do is find the student first and then delete the course, like this:
studentid = input("enter studentid")
course = input("enter the course to delete")
with open("studentDatabase.dat") as file:
    lines = file.readlines()
with open("studentDatabase.dat","w") as file:
    for line in lines:
        if studentid in line:  # Check if it's the right student
            line = line.replace(course, "")  # replace course with nothing
        file.write(line)
You check whether you are looking at the correct student, then write the line back without the course code. Hope you find it useful.
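Two caveats with the substring approach: studentid in line also matches digits that appear elsewhere in the line, and replacing the course with an empty string leaves a stray comma in the list. A slightly stricter variant, offered only as a sketch, matches the ID field at the start of the line and rebuilds the bracketed list. The name drop_course is hypothetical, and the count in front of the bracket is left unchanged here because the question does not say what it means.

```python
def drop_course(line, studentid, course):
    """Remove `course` from one record line if it belongs to `studentid`."""
    # Match the ID field only, not any substring of the line
    if not line.startswith(studentid + ":"):
        return line
    newline = "\n" if line.endswith("\n") else ""
    head, sep, tail = line.rstrip("\n").partition("[")
    if not sep:
        return line  # no course list on this line
    courses = [c for c in tail.rstrip("]").split(",") if c and c != course]
    return head + "[" + ",".join(courses) + "]" + newline


record = "0001:Amin Malik:amin_malik#seneca.ca:1:3[OPS105,SPR100,ENG100]\n"
other = "3217:Illyas:illay#seneca.ca:2:4[SRT211,OPS225,SPR200,ENG200]\n"
print(drop_course(record, "0001", "SPR100"), end="")
print(drop_course(other, "0001", "SPR200"), end="")
```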

Reading list from text file and converting it to string

Hello, I recently started learning Python, so this is the best explanation I can give; my English is not perfect.
I made a script that reads a list from a text file. My problem is converting that list to a string so I can display it with the print function. After that, the user types a nickname, and the script should check it against the list it has already read from the text file. I also don't know whether I used split(',') correctly; it should separate the words in the text file at the commas so they can be used as a list. Here are some pictures of my code.
https://gyazo.com/db797ca0998286248bf846ac70c94067 (Main code)
https://gyazo.com/918aaba9b749116d842fccb78f6204a8 (Text file - list of usernames which are "BANNED")
The text file is named Listas_BAN.txt.
I've tried to do all of this myself, and I did some research before posting, but many of the methods I found are outdated.
# Name
name = input("~ Please enter Your name below\n")
print("Welcome " + str(name))
def clear(): return os.system('cls')
clear() # Clearina viska.
# define empty list
Ban_Listo_Read = open('Listas_BAN.txt', mode='r')
Ban_Listo_Read = Ban_Listo_Read.readlines()
Ban = Ban_Listo_Read.list(ban)
# Print the function (LIST) in string .
print("Your'e Banned. You'r nickname is - ", + Ban_Listo_Read).Select (' %s ', %s str(name)) # Select the User nickname from
# The input which he typed. (Check for BAN, In the List.)
# Text file is the List Location . - Listas_BAN.txt
I'm getting a syntax error.
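ll = open('untitled.txt', mode='r').readlines()
print("".join(ll).replace('\n', '.'))
name = input("~ Please enter Your name below\n")
# readlines() keeps the trailing '\n', so strip each line before comparing
if name in [line.strip() for line in ll]:
    print('your name {n} is in the list'.format(n=name))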
EDIT:
Plus, you should consider using string formatting:
var1 = ...
var2 = ...
print("{x}...{y}".format(x=var1, y=var2))
or, in Python 3.6 and later, an f-string:
print(f"{var1}...{var2}")
EDIT:
f.readlines()
https://docs.python.org/3.7/tutorial/inputoutput.html
If you want to read all the lines of a file in a list you can also use
list(f) or f.readlines().
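Since the question also mentions split(','), here is a small sketch of a ban check that accepts names separated by commas, by newlines, or both, and ignores case. The sample contents are made up for illustration; in the real script the text would come from reading Listas_BAN.txt. The helper names parse_names and is_banned are hypothetical.

```python
def parse_names(text):
    """Split file contents on commas and newlines into a set of names."""
    names = set()
    for line in text.splitlines():
        for part in line.split(","):
            part = part.strip()
            if part:
                names.add(part.lower())
    return names


def is_banned(name, banned):
    """Case-insensitive membership test against the parsed names."""
    return name.strip().lower() in banned


# Hypothetical file contents; in the script this would come from
# open('Listas_BAN.txt').read()
sample = "Tomas, Jonas\nPetras\n"
banned = parse_names(sample)
print(is_banned("jonas", banned))
```

A set makes the membership test cheap and removes duplicates. Stripping each piece avoids the trailing newline problem that readlines() introduces.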

Storing multiple lines from a file to a variable using a delimiter

I am using Python to make a filter to search through thousands of text files for specific queries. These text files consist of several sections, and they do not all have consistent formatting. I want each of these sections to be checked for specific criteria, so in the section of the text file called "DESCRIPTION OF RECORD", I was doing something like this to store the string to a variable:
with open(some_file, 'r') as r:
    for line in r:
        if "DESCRIPTION OF RECORD" in line:
            record = line
Now this works pretty well for most files, but some files have a line break in the section, so it does not store the whole section in the variable. I was wondering how I could use a delimiter to control how many lines are stored in the variable. I would probably use the title of the next section, called "CORRELATION", as the delimiter. Any ideas?
An example structure of the file could look like:
CLINICAL HISTORY: Some information.
MEDICATIONS: Other information
INTRODUCTION: Some more information.
DESCRIPTION OF THE RECORD: Some information here....
another line of information
IMPRESSION: More info
CLINICAL CORRELATION: The last bit of information
You could use the built-in re module like this:
import re
# I assume you have a list of all possible sections
sections = [
    'CLINICAL HISTORY',
    'MEDICATIONS',
    'INTRODUCTION',
    'DESCRIPTION OF THE RECORD',
    'IMPRESSION',
    'CLINICAL CORRELATION'
]
# Build a regexp that will match any of the section names
exp = '|'.join(sections)
with open(some_file, 'r') as r:
    contents_of_file = r.read()
infos = list(re.split(exp, contents_of_file))  # infos is a list of what's between the section names
infos = [info.strip('\n :') for info in infos]  # let's get rid of colons and whitespace in our infos
print(infos)  # you don't have to print it :)
If I use your example text instead of a file, it prints something like this:
['', 'Some information.', 'Other information', 'Some more information.', 'Some information here....\nanother line of information', 'More info', 'The last bit of information']
The first element is empty, but you can drop it with a slice:
infos = infos[1:]
By the way, merging the lines that deal with infos into one would probably be cleaner and avoids building an intermediate list (though it may be a little less readable):
infos = [info.strip('\n :') for info in re.split(exp, contents_of_file)][1:]
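To look up one section by name rather than by position, a variation on the same idea is to put the alternation in a capturing group, so that re.split keeps the section names in its result. This is only a sketch built on the sections list from this answer and the sample text from the question. It assumes a section name never appears inside the body of another section.

```python
import re

sections = [
    'CLINICAL HISTORY',
    'MEDICATIONS',
    'INTRODUCTION',
    'DESCRIPTION OF THE RECORD',
    'IMPRESSION',
    'CLINICAL CORRELATION',
]
text = """CLINICAL HISTORY: Some information.
MEDICATIONS: Other information
INTRODUCTION: Some more information.
DESCRIPTION OF THE RECORD: Some information here....
another line of information
IMPRESSION: More info
CLINICAL CORRELATION: The last bit of information
"""
# A capturing group makes re.split keep the matched section names
pattern = "(" + "|".join(re.escape(name) for name in sections) + ")"
parts = re.split(pattern, text)
# parts[0] is the text before the first section; after it,
# section names and section bodies alternate
by_section = {
    name: body.strip("\n :")
    for name, body in zip(parts[1::2], parts[2::2])
}
print(by_section["DESCRIPTION OF THE RECORD"])
```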
If you do not know the sections you'll find, here's a version which seems to work, as long as the text is formatted as in your example:
import itertools
text = """
CLINICAL HISTORY: Some information.
MEDICATIONS: Other information
INTRODUCTION: Some more information.
DESCRIPTION OF THE RECORD: Some information here....
another line of information
IMPRESSION: More info
CLINICAL CORRELATION: The last bit of information
"""
def method_tuple(s):
    # sp holds strings which finish with the section names.
    sp = s.split(":")
    # This line removes spurious "\n" at both ends of the strings in sp.
    # It then splits them once at "\n" starting from their end, effectively
    # separating the sections and the descriptions.
    # It builds a list of strings alternating section names and information.
    fragments = list(itertools.chain.from_iterable( p.strip("\n").rsplit("\n", 1) for p in sp ))
    # You can now build a list of 2-tuples.
    pairs = [ (fragments[i*2],fragments[i*2+1]) for i in range(len(fragments)//2)]
    # Or you could build a dict
    # pairs = { fragments[i*2]:fragments[i*2+1] for i in range(len(fragments)//2)}
    return pairs
print(method_tuple(text))
The timings compared to Ilya's regex version are roughly equivalent, although building a dictionary seems to start winning over building a list of tuples or using the regexp, on the sample text at 1 billion loops...
I found another possible solution using the index of the line. I first opened the check file and stored its f.read() contents in a variable called info. I then did this:
with open(check_file, 'r') as r:
    for line in r:
        if "DESCRIPTION" in line:
            record_Index = info.index(line)
            record = info[info.index(line):]
            if "IMPRESSION" in record:
                impression_Index = info.index("IMPRESSION")
                record = info[record_Index:impression_Index]
This method worked as well, although I don't know how efficient it is in terms of memory and speed. Instead of using with open(...) multiple times, it might be better to store everything in the variable called info and do everything with that.
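The idea of doing everything with info can be sketched with str.find alone, without looping over the file a second time. The helper name extract_section is hypothetical, and the sample text is the one from the question. The end marker is searched for only after the start marker, so an earlier occurrence of the end marker cannot cut the section short.

```python
def extract_section(info, start_marker, end_marker):
    """Return the text from start_marker up to (not including) end_marker."""
    start = info.find(start_marker)
    if start == -1:
        return None  # section not present
    # Search for the end marker only after the start of the section
    end = info.find(end_marker, start + len(start_marker))
    if end == -1:
        end = len(info)  # no end marker: run to the end of the text
    return info[start:end].strip()


info = """CLINICAL HISTORY: Some information.
MEDICATIONS: Other information
INTRODUCTION: Some more information.
DESCRIPTION OF THE RECORD: Some information here....
another line of information
IMPRESSION: More info
CLINICAL CORRELATION: The last bit of information
"""
print(extract_section(info, "DESCRIPTION", "IMPRESSION"))
```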

Create complex file name from user input

I’m trying to adapt a script that currently contains the following segment:
# Initialize the output files
working_dir = os.getcwd()
output_path = "{}/{}".format(working_dir, "output_Prelim")
if not os.path.exists(output_path):
    os.mkdir(output_path)
data_file = "{0}/RWA_2010_BUFFER_by_1.csv".format(output_path)
error_file = "{0}/failed_queries.txt".format(output_path)
In the statement that begins “data_file,” the parts of the file name “RWA” and “2010” refer to the country and year in which a particular survey was conducted.
I’m trying to adapt that segment so that the file name preserves the same general format, but allows the user to enter a different country code and year.
I can generate a string called “file_name” that looks right, using the following code:
print('Enter the country code')
cCode =input()
print('The country code is '+cCode)
print('Enter the survey year')
srvyYear =input()
print('The survey year is '+srvyYear)
file_name = r'"{0}/'+cCode+'_'+srvyYear+'_'+'BUFFER300_by_1.csv"'\
When I print “file_name,” I get
"{0}/BDI_2009_BUFFER300_by_1.csv"
That looks right, but am not sure what to do with it - in particular, how to get it understood as a file name rather than as a string. When I try to concatenate that string with the remainder of the statement that begins “data_file,” I get a syntax error.
Obviously I need to do a tutorial, but am not sure what to look for.
Many thanks, and apologies for the newbie question.
Not sure what your problem is exactly, but if you want to replace the {0} part with something else (e.g. the value in data_file), you can just do file_name.format(data_file).
Why not use the join() method for the string and os.path.join() for the path? For example:
file_name = os.path.join(output_path, '_'.join([cCode, srvyYear, 'BUFFER300_by_1.csv']))
See the documentation for os.path.join(); I believe that is what you need. It is better not to concatenate strings yourself to construct a path or file name. Also, os.path.join() builds the path with the separator of the current platform, which makes it a smart choice (especially on Windows).
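Put together with the original segment, the file name can be built directly from the two inputs, so no "{0}" placeholder or embedded quote characters are needed. This is a minimal sketch: build_data_file is a hypothetical helper, the BUFFER300_by_1.csv suffix is taken from the attempt in the question, and the country code and year are hard-coded where the script would call input().

```python
import os


def build_data_file(output_path, country_code, survey_year):
    """Return the path of the CSV file for one country and survey year."""
    file_name = f"{country_code}_{survey_year}_BUFFER300_by_1.csv"
    return os.path.join(output_path, file_name)


output_path = os.path.join(os.getcwd(), "output_Prelim")
# In the script these two values would come from input()
data_file = build_data_file(output_path, "BDI", "2009")
print(data_file)
```

The result is an ordinary string; functions such as open() accept it as a file name as it is.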

Python program to search for specific strings in hash values (coding help)

I'm trying to write code that searches hash values for specific strings (input by the user) and returns the hash if the search query is present in that line.
I'm doing this mostly to learn Python a bit more, but it could be a real-world application used by an HR department to search a .csv resume database for specific words in each resume.
I'd like this program to look through a .csv file that has three entries per line (id#;applicant name;resume text).
I set it up so that it creates a hash, then creates a string for the resume text hash entry, and I'm trying to use the .find() function to return the entire hash for each instance.
What I'd like is this: if the word "gpa" is used as a search query and it is found in s['resumetext'] for three applicants (rows in the .csv file), the program prints the id, name, and resume for every row that has it (all three applicants).
As it is right now, my program prints the first row in the .csv file (print resume['id'], resume['name'], resume['resumetext']) no matter what the search query is, whether it's in the resume text or not.
Lastly, are there better ways of doing this, such as searching Word documents, PDFs and .txt files in a folder for specific words using Python? (I've just started reading about the re module and am wondering if this may be the route, rather than putting everything in a .csv file.)
def find_details(id2find):
    resumes_f=open("resume_data.csv")
    for each_line in resumes_f:
        s={}
        (s['id'], s['name'], s['resumetext']) = each_line.split(";")
        resumetext = str(s['resumetext'])
        if resumetext.find(id2find):
            return(s)
        else:
            print "No data matches your search query. Please try again"
searchquery = raw_input("please enter your search term")
resume = find_details(searchquery)
if resume:
    print resume['id'], resume['name'], resume['resumetext']
The line
resumetext = str(s['resumetext'])
is redundant, because s['resumetext'] is already a string (since it comes as one of the results from a .split call). So, you can merge this line and the next into
if id2find in s['resumetext']: ...
Your following else is misaligned -- with it placed like that, you'll print the message over and over again. You want to place it after the for loop (and the else isn't needed, though it would work), so I'd suggest:
for each_line in resumes_f:
    s = dict(zip('id name resumetext'.split(), each_line.split(";")))
    if id2find in s['resumetext']:
        return(s)
print "No data matches your search query. Please try again"
I've also shown an alternative way to build dict s, although yours is fine too.
What @Justin Peel said. Also, to be more Pythonic, I would change
if resumetext.find(id2find) != -1: to if id2find in resumetext:
A few more changes: you might want to lower case the comparison and user input so it matches GPA, gpa, Gpa, etc. You can do this by doing searchquery = raw_input("please enter your search term").lower() and resumetext = s['resumetext'].lower(). You'll note I removed the explicit cast around s['resumetext'] as it's not needed.
One change that I recommend for your code is changing
if resumetext.find(id2find):
to
if resumetext.find(id2find) != -1:
because find() returns -1 if id2find wasn't in resumetext. Otherwise, it returns the index where id2find is first found in resumetext, which could be 0. As @Personman commented, this would give you the false positive because -1 is interpreted as True in Python.
I think the problem also has something to do with the fact that find_details() only returns the first entry for which the search string is found in resumetext. It might be good to make find_details() into a generator instead; then you could iterate over it and print the found records one by one.
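A generator version might look like the sketch below. It is written for Python 3 (print() function, no raw_input), unlike the Python 2 code in the question. It also swaps in csv.DictReader for the manual split and reads from an in-memory sample with made-up rows so that it runs on its own. With a real file, the open file object would be passed in place of sample.

```python
import csv
import io


def find_details(lines, query):
    """Yield every row whose resume text contains the query."""
    query = query.lower()
    reader = csv.DictReader(
        lines, fieldnames=["id", "name", "resumetext"], delimiter=";"
    )
    for row in reader:
        # Case-insensitive match, so "GPA", "gpa" and "Gpa" all count
        if query in row["resumetext"].lower():
            yield row


# Hypothetical rows standing in for resume_data.csv
sample = io.StringIO(
    "1;Ann Lee;Graduated with a GPA of 3.8\n"
    "2;Bob Ray;Ten years of welding experience\n"
    "3;Cy Dunn;gpa 3.2, fluent in Spanish\n"
)
matches = list(find_details(sample, "gpa"))
for row in matches:
    print(row["id"], row["name"], row["resumetext"])
if not matches:
    print("No data matches your search query. Please try again")
```

Because the function yields instead of returning, every matching row is reported, and the "no data" message is printed once, after the search has finished.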
