I have a CSV file filled with ticket information. I created a small script to input ticket numbers separated by spaces into a list which searches the CSV file for each ticket in the list, and if it finds the ticket, it outputs info on that row.
My problem is that if I search for a ticket that isn't in the CSV file, the script just skips past it and moves on, but I would like it to tell me that the ticket is not in the file. From what I can tell, it's searching the CSV file by row or line. If I try an else statement, it starts printing out every line of the CSV file in which the ticket doesn't appear.
I need to be able to input multiple ticket numbers and have Python search for each one individually. If it finds the ticket in column 1, it should print information from that row; if it doesn't find the ticket in any row's column 1, it should print that ticket followed by "is not in the file".
import csv

file = csv.reader(open('Path To CSV file', "rb"), delimiter=",")
newticket = raw_input('Enter Ticket Numbers:').split()
for line in file:
    for tickets in newticket:
        if tickets == line[1]:
            print(tickets, line[36], line[37])
If your CSV file isn't enormous, I suggest reading the whole thing, extracting the data you want into a dictionary, with the ticket number as the key. Searching a dict is very fast.
I've modified your file-opening code to use the with statement, which closes the file cleanly even if there's a problem while reading it. BTW, there's no need to specify the comma as the CSV delimiter, since that's the default.
import csv

with open('Path To CSV file', "rb") as f:
    data = {line[1]: (line[36], line[37]) for line in csv.reader(f)}

newtickets = raw_input('Enter Ticket Numbers:').split()
for ticket in newtickets:
    line = data.get(ticket)
    if line:
        print(ticket, line)
    else:
        print(ticket, "not found")
The dict.get method returns None if the key isn't in the dict, although you can specify another default return value if you want. So we could re-write that part like this:
for ticket in newtickets:
    line = data.get(ticket, "not found")
    print(ticket, line)
Alternatively, we could use the in operator:
for ticket in newtickets:
    if ticket in data:
        print(ticket, data[ticket])
    else:
        print(ticket, "not found")
All three versions have roughly the same efficiency; choose whichever you feel is the most readable.
Related
Search through the first column of a pipe ('|') delimited .txt file containing 10 million rows using Python. The first column contains phone numbers, and I would like to output the entire row for a given phone number.
The file is a 5 GB .txt file, so I am unable to open it in either MS Excel or MS Access. I want to write Python code that can search through the file and print out the entire row that matches a particular phone number. The phone number is in the first column. I wrote some code, but it searches the entire file and is very slow. I just want to search the first column; my search item is the phone number.
f = open("F:/.../master.txt", "rt")  # open file master.txt
for line in f:                       # check each line in the file handle f
    if '999995555' in line:          # if a particular phone number is found
        print(line)                  # print the entire row
f.close()                            # close file
I expect the entire row to be printed on screen where the first column contains the phone number I am searching for, but it is taking a lot of time because I don't know how to restrict the search to the first column.
Well, you are on the right track there. Since it is a 5 GB file, you probably want to avoid allocating 5 GB of RAM for this. You are probably better off using .readline(), since it reads one line at a time, which suits your scenario (a big file).
Something like the following should do the trick. Note that .readline() returns '' at the end of the file and '\n' for empty lines. The .strip() call merely removes the '\n' that .readline() returns at the end of each line actually in the file.
def search_file_line_prefix(path, search_prefix):
    with open(path, 'r') as file_handle:
        while True:
            line = file_handle.readline()
            if line == '':
                break
            if line.startswith(search_prefix):
                yield line.strip()

for result in search_file_line_prefix('file_path', 'phone number'):
    print(result)
I've run into a problem: my code creates a file with headers and writes data to it. When I run it a second time, it overwrites the data; it should instead start a new line. Also, what does delimiter mean?
#Intro
import csv

headers = ["Name", "Age", "Year Group"]
with open("Details.csv", "a" and "w") as i:
    w = csv.writer(i, delimiter=",")
    w.writerow(headers)
    print("Welcome User, to my Topics Quiz!\n---------------------------------------\nYou can choose from 3 different topics:\n • History\n • Music\n • Computer Science\n---------------------------------------")
    #Storing: User's name, age and year group
    print("Before we start, we need to register an account.")
    User = input("Enter your name:\n")
    Age = input("Enter your age:\n")
    Year = input("Enter your year group:\n")
    details = [User, Age, Year]
    w.writerow(details)

with open("UserPass.csv", "a" and "w") as Userpass:
    w = csv.writer(Userpass, delimiter=",")
    headers = ["Username", "Password"]
    w.writerow(headers)
    NewUser = (User[:3] + Age)
    print("Great! Your username is set to: {}".format(NewUser))
    Pass = input("Enter a password for your account:\n")
    userpass = [NewUser, Pass]
    w.writerow(userpass)
So the code overwrites the data when I want it to add to what's already there.
Thanks in advance.
Your mode is only "w", because "a" and "w" evaluates to "w":
>>> ("a" and "w") == "w"
True
Instead use only "a".
"a" is enough for the open mode.
Append ("a") already implies writing: when you say you want to append, it means you want to write at the end of the file. So don't use them together.
delimiter means separator, which most of the time refers to a character like a comma, space, dot, etc.
If you wish to append to an existing .csv file you need to either:
skip writing the header if the file already exists (check with os.path.exists() before opening it), or
open the file in read mode and read it into memory first (as a list of lists), close it, add the new row to the data, and then overwrite the whole thing, headers included, over the original (this does let you do things like sorting the rows).
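The first option can be sketched like this (Python 3 syntax; the filename and the sample row are placeholders of mine):

```python
import csv
import os

filename = "Details.csv"  # placeholder filename
headers = ["Name", "Age", "Year Group"]

# Decide before opening: only write the header row if the file
# doesn't exist yet (i.e. this is the first run)
write_header = not os.path.exists(filename)

with open(filename, "a", newline="") as f:
    w = csv.writer(f)
    if write_header:
        w.writerow(headers)
    w.writerow(["Alice", "12", "8"])  # example data row
```

Run it twice and the header appears only once, while each run appends one data row.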
As others have said your file open mode should be one of:
"r" = Read Text
"rb" = Read Binary
"w" = *Over***W**rite Text
"wb" = *Over***Write **Binary
"a" = Append Text
"ab" = Append Binary
The delimiter specifies what separates the fields in the .csv file.
As an aside you should never store passwords - instead you should store a hash of the password and when the user next enters a password calculate the same hash and compare it with the stored hash. Something like:
import hashlib
import getpass
pswd = getpass.getpass()
userpass = hashlib.sha256(pswd.encode('ascii')).hexdigest()
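At the next login you repeat the same hash and compare it with the stored one. A sketch (the helper name and the sample passwords are mine; getpass is replaced with literals so the example is self-contained):

```python
import hashlib

def hash_password(pswd):
    # the same sha256 hash used when the account was created
    return hashlib.sha256(pswd.encode('ascii')).hexdigest()

stored = hash_password("hunter2")   # saved at registration time
attempt = "hunter2"                 # what the user types at login

# Equal hashes mean the password matches; the plain text is never stored
assert hash_password(attempt) == stored
```

For real applications a salted, slow hash (e.g. hashlib.pbkdf2_hmac) is preferable to plain sha256, but the compare-the-hashes idea is the same.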
As for the term delimiter, remember that CSV stands for Comma Separated Values. Delimiter means separator; CSV uses a comma (",") as its separator or delimiter, but the Python csv module gives you the option to specify a different delimiter, such as a tab character ("\t"), etc.
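To make that concrete, here is a small round-trip with a tab delimiter (Python 3; the data is made up, and io.StringIO stands in for a real file):

```python
import csv
import io

rows = [["Name", "Age"], ["Alice", "12"]]

# Write tab-separated instead of the default comma-separated
buf = io.StringIO()
w = csv.writer(buf, delimiter="\t")
w.writerows(rows)

# Read it back, telling the reader about the same delimiter
buf.seek(0)
r = csv.reader(buf, delimiter="\t")
assert list(r) == rows
```

If the reader were not given delimiter="\t", each line would come back as a single uncut field.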
As for the file operations, you could check whether the file is blank. If it is, write the headers. Then (always) append the data. Something like this:
# open the file for appending (the "a"); the "+" additionally allows
# reading, and the file is created if it doesn't already exist
with open(filename, "a+") as f:
    w = csv.writer(f, delimiter=",")
    # "a+" starts positioned at the end, so seek to the start before reading
    f.seek(0)
    # only write the headers if the file is blank (i.e. the first time the program runs)
    if f.read() == "":
        w.writerow(headers)
    # get `details` from the user.
    # ...
    # the data always gets appended
    w.writerow(details)
Hope this helps!
I am trying to use Python to search through an Excel file and print the data that corresponds to the value of a cell the user searched for.
I have an Excel File with a list of every Zip Code in the USA in one column and the next four columns are information related to that Zip Code such as the state it is located in, the price to ship an object there, and so on. I would like the user to be able to search for a specific zip code and have the program print out the information in the corresponding cells.
Here is what I have so far:
from xlrd import open_workbook

book = open_workbook('zip_code_database edited.xls', on_demand=True)
prompt = '>'
print "Please enter a Zip Code."
item = raw_input(prompt)
sheet = book.sheet_by_index(0)
for cell in sheet.col(1):
    if sheet.cell_value == item:
        print "Data: ", sheet.row
Any and all help is greatly appreciated!
sheet.cell_value is a method, so it will never be equal to item. You should access the value using cell.value.
Example -
for cell in sheet.col(1):
    if cell.value == item:
        print "Data: ", cell
I haven't worked with the xlrd module that you are using, but it seems like you could make this easier on yourself by using a regular Python dictionary for this job and creating a small module containing the loaded dictionary. I'm assuming you are familiar with Python dictionaries for the following solution.
You will use the zip codes as keys and the other 4 data fields as the values of the dictionary (I'm assuming a .csv file below, but you could also use tab-delimited or another single-character delimiter). Call the following file make_zip_dict.py:
zipcode_dict = {}
myzipcode = 'zip_code_database edited.xls'
with open(myzipcode, 'r') as f:
    for line in f:
        line = line.split(',')     # break the line into the 5 fields
        zip_code = line[0]         # assuming zips are the first column
        info = ' '.join(line[1:])  # the other fields joined into one string
        # now for each zip, enter the info in the dictionary:
        zipcode_dict[zip_code] = info
Save this file to the same directory as 'zip_code_database edited.xls'. Now, to use it interactively, navigate to the directory and start a python interactive session:
>>> import make_zip_dict as mzd # this loads the module and the dict
>>> my_zip = '10001' # pick some zip to try out
>>> mzd.zipcode_dict[my_zip] # enter it into the dict
'New York City NY $9.99 info4' # it should return your results
You can just work with this interactively on the command line by entering your desired zip code. There are some fancy bells and whistles you could add too, but this will very quickly spit out the info and it should be pretty lightweight and quick.
I have a file where each line starts with a number. The user can delete a row by typing in the number of the row the user would like to delete.
The issue I'm having is setting the mode for opening it. When I use a+, the original content is still there. However, tacked onto the end of the file are the lines that I want to keep. On the other hand, when I use w+, the entire file is deleted. I'm sure there is a better way than opening it with w+ mode, deleting everything, and then re-opening it and appending the lines.
def DeleteToDo(self):
    print "Which Item Do You Want To Delete?"
    DeleteItem = raw_input(">")  # select a line number to delete
    print "Are You Sure You Want To Delete Number " + DeleteItem + " (y/n)"
    VerifyDelete = str.lower(raw_input(">"))
    if VerifyDelete == "y":
        FILE = open(ToDo.filename, "a+")  # open the file (tried w+ as well, entire file is deleted)
        FileLines = FILE.readlines()      # read and display the lines
        for line in FileLines:
            FILE.truncate()
            if line[0:1] != DeleteItem:  # if the number (first character) of the current line doesn't equal the number to be deleted, re-write that line
                FILE.write(line)
    else:
        print "Nothing Deleted"
This is what a typical file may look like
1. info here
2. more stuff here
3. even more stuff here
When you open a file for writing, you clobber the file (delete its current contents and start a new file). You can find this out by reading documentation for the open() command.
When you open a file for appending, you do not clobber the file. But how can you delete just one line? A file is a sequence of bytes stored on a storage device; there is no way for you to delete one line and have all the other lines automatically "slide down" into new positions on the storage device.
(If your data was stored in a database, you could actually delete just one "row" from the database; but a file is not a database.)
So, the traditional way to solve this: you read from the original file, and you copy it to a new output file. As you copy, you perform any desired edits; for example, you can delete a line simply by not copying that one line; or you can insert a line by writing it in the new file.
Then, once you have successfully written the new file, and successfully closed it, if there is no error, you go ahead and rename the new file back to the same name as the old file (which clobbers the old file).
In Python, your code should be something like this:
import os

# "num_to_delete" was specified by the user earlier.
# I'm assuming that the number to delete is set off from
# the rest of the line with a space.
s_to_delete = str(num_to_delete) + ' '

def want_input_line(line):
    return not line.startswith(s_to_delete)

in_fname = "original_input_filename.txt"
out_fname = "temporary_filename.txt"

with open(in_fname) as in_f, open(out_fname, "w") as out_f:
    for line in in_f:
        if want_input_line(line):
            out_f.write(line)

os.rename(out_fname, in_fname)
Note that if you happen to have a file called temporary_filename.txt it will be clobbered by this code. Really we don't care what the filename is, and we can ask Python to make up some unique filename for us, using the tempfile module.
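A sketch of the same copy-and-rename using the tempfile module, so no fixed temporary name can be clobbered (the helper name delete_lines and the keep predicate are mine):

```python
import os
import tempfile

def delete_lines(in_fname, keep):
    """Copy in_fname to a unique temp file, keeping only lines for
    which keep(line) is true, then replace the original file."""
    # Create the temp file in the same directory so the final
    # rename stays on the same filesystem (and is atomic on POSIX).
    dir_name = os.path.dirname(os.path.abspath(in_fname))
    fd, tmp_name = tempfile.mkstemp(dir=dir_name, suffix=".txt")
    with os.fdopen(fd, "w") as out_f, open(in_fname) as in_f:
        for line in in_f:
            if keep(line):
                out_f.write(line)
    os.replace(tmp_name, in_fname)  # clobbers the original
```

Usage would look like delete_lines("todo.txt", lambda line: not line.startswith("2 ")).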
Any recent version of Python will let you use multiple statements in a single with statement, but if you happen to be using Python 2.6 or something you can nest two with statements to get the same effect:
with open(in_fname) as in_f:
    with open(out_fname, "w") as out_f:
        for line in in_f:
            ...  # do the rest of the code
Also, note that I did not use the .readlines() method to get the input lines, because .readlines() reads the entire contents of the file into memory, all at once, and if the file is very large this will be slow or might not even work. You can simply write a for loop using the "file object" you get back from open(); this will give you one line at a time, and your program will work with even really large files.
EDIT: Note that my answer is assuming that you just want to do one editing step. As #jdi noted in comments for another answer, if you want to allow for "interactive" editing where the user can delete multiple lines, or insert lines, or whatever, then the easiest way is in fact to read all the lines into memory using .readlines(), insert/delete/update/whatever on the resulting list, and then only write out the list to a file a single time when editing is all done.
def DeleteToDo():
    print ("Which Item Do You Want To Delete?")
    DeleteItem = raw_input(">")  # select a line number to delete
    print ("Are You Sure You Want To Delete Number " + DeleteItem + " (y/n)")
    DeleteItem = int(DeleteItem)
    VerifyDelete = str.lower(raw_input(">"))
    if VerifyDelete == "y":
        FILE = open('data.txt', "r")
        # read all the lines except the one whose leading number matches the number to be deleted
        lines = [x.strip() for x in FILE if int(x[:x.index('.')]) != DeleteItem]
        FILE.close()
        FILE = open('data.txt', "w")  # open the file again
        for x in lines:
            FILE.write(x + '\n')      # write the data back to the file
        FILE.close()
    else:
        print ("Nothing Deleted")

DeleteToDo()
Instead of writing out all lines one by one to the file, delete the line from memory (to which you read the file using readlines()) and then write the memory back to disk in one shot. That way you will get the result you want, and you won't have to clog the I/O.
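A minimal sketch of that approach (Python 3; the function name is mine, and line_number is zero-based):

```python
def delete_line(filename, line_number):
    # Read the whole file into memory as a list of lines
    with open(filename) as f:
        lines = f.readlines()
    # Delete the unwanted line from the in-memory list
    del lines[line_number]
    # Write the remaining lines back to disk in one shot
    with open(filename, "w") as f:
        f.writelines(lines)
```

Insertions and edits work the same way: mutate the list, then write it out once at the end.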
You could mmap the file... after having read the suitable documentation...
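For the record, a sketch of what that could look like (Python 3; the demo file content is mine): mmap lets you find and shift bytes in place, but you still have to move the tail up over the deleted line and truncate, because a file has no notion of "removing" a middle line.

```python
import mmap

# Create a small demo file (placeholder content)
with open("todo_mm.txt", "w") as f:
    f.write("1. info here\n2. more stuff here\n3. even more\n")

with open("todo_mm.txt", "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)       # map the whole file
    start = mm.find(b"2.")              # offset where the doomed line begins
    end = mm.find(b"\n", start) + 1     # one past its trailing newline
    mm.move(start, end, len(mm) - end)  # shift the tail left over the line
    new_size = len(mm) - (end - start)
    mm.close()
    f.truncate(new_size)                # drop the now-duplicated tail bytes
```

For a small to-do file this is overkill compared to read/modify/rewrite, but it avoids reading the whole file into a Python list.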
You don't need to check the line numbers stored in your file; you can just count lines as you read them:
def DeleteToDo(self):
    print "Which Item Do You Want To Delete?"
    DeleteItem = int(raw_input(">")) - 1  # zero-based index of the line
    print "Are You Sure You Want To Delete Number " + str(DeleteItem + 1) + " (y/n)"
    VerifyDelete = str.lower(raw_input(">"))
    if VerifyDelete == "y":
        with open(ToDo.filename, "r") as f:
            lines = ''.join([a for i, a in enumerate(f) if i != DeleteItem])
        with open(ToDo.filename, "w") as f:
            f.write(lines)
    else:
        print "Nothing Deleted"
I'm reading a 6 million entry .csv file with Python, and I want to be able to search through this file for a particular entry.
Are there any tricks to search the entire file? Should you read the whole thing into a dictionary or should you perform a search every time? I tried loading it into a dictionary but that took ages so I'm currently searching through the whole file every time which seems wasteful.
Could I possibly utilize that the list is alphabetically ordered? (e.g. if the search word starts with "b" I only search from the line that includes the first word beginning with "b" to the line that includes the last word beginning with "b")
I'm using import csv.
(a side question: it is possible to make csv go to a specific line in the file? I want to make the program start at a random line)
Edit: I already have a copy of the list as an .sql file as well, how could I implement that into Python?
If the csv file isn't changing, load it into a database, where searching is fast and easy. If you're not familiar with SQL, you'll need to brush up on it, though.
Here is a rough example of inserting from a csv into a sqlite table. The example csv is ';'-delimited and has 2 columns.
import csv
import sqlite3
con = sqlite3.Connection('newdb.sqlite')
cur = con.cursor()
cur.execute('CREATE TABLE "stuff" ("one" varchar(12), "two" varchar(12));')
f = open('stuff.csv')
csv_reader = csv.reader(f, delimiter=';')
cur.executemany('INSERT INTO stuff VALUES (?, ?)', csv_reader)
cur.close()
con.commit()
con.close()
f.close()
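Once the data is in the table, a search is a single query, and an index makes it fast even with millions of rows. A self-contained sketch (the demo row and the index name idx_one are mine):

```python
import sqlite3

con = sqlite3.connect('newdb.sqlite')  # the database created above
cur = con.cursor()
# Guarded with IF NOT EXISTS so the sketch runs standalone
cur.execute('CREATE TABLE IF NOT EXISTS stuff ("one" varchar(12), "two" varchar(12));')
cur.execute('INSERT INTO stuff VALUES (?, ?)', ('hello', 'world'))  # demo row
# An index on the searched column turns full scans into fast lookups
cur.execute('CREATE INDEX IF NOT EXISTS idx_one ON stuff ("one");')
cur.execute('SELECT * FROM stuff WHERE one = ?', ('hello',))
rows = cur.fetchall()
con.close()
```

The ? placeholders also protect against SQL injection if the search term comes from user input.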
You can use memory mapping for really big files:
import mmap, os, re

reportFile = open("big_file")
length = os.fstat(reportFile.fileno()).st_size
try:
    mapping = mmap.mmap(reportFile.fileno(), length, mmap.MAP_PRIVATE, mmap.PROT_READ)
except AttributeError:
    # Windows: mmap takes different arguments
    mapping = mmap.mmap(reportFile.fileno(), 0, None, mmap.ACCESS_READ)
data = mapping.read(length)
pat = re.compile("b.+", re.M | re.DOTALL)  # compile your pattern here.
print pat.findall(data)
Well, if your word list isn't too big (meaning it will fit in memory), then here is a simple way to do this (I'm assuming that they are all words).
from bisect import bisect_left

f = open('myfile.csv')
words = []
for line in f:
    words.extend(line.strip().split(','))
f.close()

wordtofind = 'bacon'
ind = bisect_left(words, wordtofind)
if ind < len(words) and words[ind] == wordtofind:
    print '%s was found!' % wordtofind
It might take a minute to load in all of the values from the file. This uses binary search to find your words, which requires the list to be sorted (your file is alphabetically ordered, so that holds). In this case I was looking for bacon (who wouldn't look for bacon?). If there are repeated values, you might also want to use bisect_right to find the index one beyond the rightmost element that equals the value you are searching for. You can still use this if you have key:value pairs; you'll just have to make each object in your words list a list of [key, value].
Side Note
I don't think that you can really go from line to line in a csv file very easily. You see, these files are basically just long strings with \n characters that indicate new lines.
You can't go directly to a specific line in the file because lines are variable-length, so the only way to know when line #n starts is to search for the first n newlines. And it's not enough to just look for '\n' characters because CSV allows newlines in table cells, so you really do have to parse the file anyway.
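You can see that last point with a quick check: the csv parser treats a quoted newline as part of a cell, not a row boundary, so counting '\n' characters miscounts the rows (Python 3 sketch with made-up data):

```python
import csv
import io

# One quoted cell contains an embedded newline
raw = 'a,"line one\nline two",c\nd,e,f\n'

# Naive line splitting sees 3 "lines"...
assert len(raw.splitlines()) == 3

# ...but the csv parser correctly finds only 2 rows
rows = list(csv.reader(io.StringIO(raw)))
assert len(rows) == 2
assert rows[0][1] == "line one\nline two"
```

So jumping to "row n" really does require parsing everything before it.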
My idea is to use the Python ZODB module to store the dictionary-type data, then create the new csv file from that data structure and do all your operations at that time.
There is a fairly simple way to do this. Depending on how many columns you want Python to print, you may need to add or remove some of the print lines.
import csv

search = input('Enter string to search: ')
stock = open('FileName.csv', 'r')
reader = csv.reader(stock)
for row in reader:
    for field in row:
        if field == search:
            print('Record found! \n')
            print(row[0])
            print(row[1])
            print(row[2])
stock.close()
I hope this managed to help.