Searching through an Excel Sheet and Printing Data in Python

I am trying to use Python to search through an Excel file and print the data that corresponds to the value of a cell the user searched for.
I have an Excel File with a list of every Zip Code in the USA in one column and the next four columns are information related to that Zip Code such as the state it is located in, the price to ship an object there, and so on. I would like the user to be able to search for a specific zip code and have the program print out the information in the corresponding cells.
Here is what I have so far:
from xlrd import open_workbook

book = open_workbook('zip_code_database edited.xls', on_demand=True)

prompt = '>'
print "Please enter a Zip Code."
item = raw_input(prompt)

sheet = book.sheet_by_index(0)
for cell in sheet.col(1):
    if sheet.cell_value == item:
        print "Data: ", sheet.row
Any and all help is greatly appreciated!

sheet.cell_value is a method, so it will never be equal to item. You should access the cell's value through cell.value instead.
Example -
for cell in sheet.col(1):
    if cell.value == item:
        print "Data: ", cell
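If you also want the rest of the row (state, shipping price, and so on) rather than just the matching cell, a minimal sketch along these lines should work; it assumes the zip codes are stored as text in the sheet and reuses the file name from the question:
from xlrd import open_workbook

book = open_workbook('zip_code_database edited.xls', on_demand=True)
sheet = book.sheet_by_index(0)
item = raw_input('> ')

for row_index in range(sheet.nrows):
    # column 1 holds the zip codes in the question's layout
    if sheet.cell_value(row_index, 1) == item:
        print "Data: ", sheet.row_values(row_index)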

I haven't worked with the xlrd module that you are using, but it seems like you could make this easier on yourself by using a regular Python dictionary for this job and creating a small module containing the loaded dictionary. I'm assuming you are familiar with the Python dictionary for the following solution.
You will use the zip codes as keys and the other 4 data fields as values for the dictionary (I'm assuming a .csv file below, but you could also use tab-delimited or another plain-text format). Call the following file make_zip_dict.py:
zipcode_dict = {}
myzipcode = 'zip_code_database edited.xls'  # assumed to have been exported as CSV text

with open(myzipcode, 'r') as f:
    for line in f:
        line = line.split(',')  # break the line into the 5 fields
        zip_code = line[0]  # assuming zips are the first column
        info = ' '.join(line[1:])  # the other fields turned into a string with single spaces
        # now for each zip, enter the info in the dictionary:
        zipcode_dict[zip_code] = info
Save this file to the same directory as 'zip_code_database edited.xls'. Now, to use it interactively, navigate to the directory and start a python interactive session:
>>> import make_zip_dict as mzd # this loads the module and the dict
>>> my_zip = '10001' # pick some zip to try out
>>> mzd.zipcode_dict[my_zip] # look it up in the dict
'New York City NY $9.99 info4' # it should return your results
You can just work with this interactively on the command line by entering your desired zip code. There are some fancy bells and whistles you could add too, but this will very quickly spit out the info and it should be pretty lightweight and quick.
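One caveat with the plain lookup above: it raises a KeyError for a zip code that isn't in the file. A variant using dict.get returns a fallback message instead (the '00000' zip here is just a made-up example):
>>> mzd.zipcode_dict.get('00000', 'zip code not found')
'zip code not found'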

Related

How to go to a changed URL in Selenium Python

I am a complete beginner in programming and I am trying to write my first Python script. I have a URL of this form: https://stackoverflow.com/questions/ID
where ID changes every time in a loop, and a list of IDs is given in a text file.
Now I tried to do it this way:
if id_str != "":
    y = f'https://stackoverflow.com/questions/{id_str}'
    browser.get(y)
but it opens only the first ID in the text file, so I need to know how to make it get a different ID from the text file every time.
Thanks in advance
Generally it can be something like this:
with open(filename) as file:
    lines = file.readlines()

for line in lines:
    id_str = line.strip()  # drop the trailing newline
    if id_str != "":
        y = f'https://stackoverflow.com/questions/{id_str}'
        browser.get(y)
where filename is a text file containing the question ids, each line holding a single id string.
It can be more complicated, depending on your needs / implementation.
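If it helps to see the whole loop in one place, here is a minimal sketch; it assumes a Chrome driver available on the PATH and a hypothetical ids.txt file with one question id per line:
from selenium import webdriver

browser = webdriver.Chrome()  # assumes chromedriver is on the PATH

with open('ids.txt') as file:  # hypothetical file of question ids, one per line
    for line in file:
        id_str = line.strip()
        if id_str != "":
            browser.get(f'https://stackoverflow.com/questions/{id_str}')

browser.quit()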

How to strip/replace text with Selenium in Python?

I'm learning web scraping and I managed to pull data out of a webpage into an Excel file. But the item names contain ",", which splits them across multiple columns in the Excel file.
I have tried using strip and replace on the elements in the list, but it returns an error: AttributeError: 'WebElement' object has no attribute 'replace'.
item = driver.find_elements_by_xpath('//h2[@class="list_title"]')
item = [i.replace(",", "") for i in item]
price = driver.find_elements_by_xpath('//div[@class="ads_price"]')
price = [p.replace("rm", "") for p in price]
(The question included screenshots of the expected and actual results in the Excel file.)
The function find_elements_by_xpath returns a list of WebElement objects; you need each element's text (a string) before you can use the replace function.
Depending on your use case you may want to reconsider using Excel as your storage medium, unless this is the final step of your process.
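For example, a brief sketch of that conversion, reusing the XPath expressions from the question and relying on each element's .text attribute to get a plain string (the new variable names are just for illustration):
items = driver.find_elements_by_xpath('//h2[@class="list_title"]')
item_names = [i.text.replace(",", "") for i in items]

prices = driver.find_elements_by_xpath('//div[@class="ads_price"]')
price_values = [p.text.replace("rm", "") for p in prices]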
The portion of your code that you've included in your question isn't the portion that's relevant to the issue you're experiencing.
As CMMCD mentioned, I would also recommend skipping the binary Excel format for the sake of simplicity and using the built-in csv library instead. This will prevent unintended separators from splitting your cells.
from csv import writer

# your data should be a list of lists
data = [['product1', 8.0], ['product2', 12.25]]  # etc, as an example

# newline='' keeps the csv module from writing blank rows on Windows
with open('your_output_file.csv', 'w', newline='') as file:
    mywriter = writer(file)
    for line in data:
        mywriter.writerow(line)
The docs: https://docs.python.org/3/library/csv.html

Searching a CSV file and reporting tickets that aren't in the file

I have a CSV file filled with ticket information. I created a small script to input ticket numbers separated by spaces into a list which searches the CSV file for each ticket in the list, and if it finds the ticket, it outputs info on that row.
My problem is if I search for a ticket not in the CSV file, it just skips past it and moves on, but I would like it to tell me that that ticket is not in the file. From what I can tell, it's searching the CSV file by row or line. If I try an else statement, it will start printing out every line in the CSV file, if that ticket's not in the row.
I need to be able to input multiple ticket numbers and have python search each one individually. If it finds the ticket in column 1, then print information from that row and if it doesn't find the ticket in any rows column 1, then print out that ticket followed by "is not in the file".
import csv

file = csv.reader(open('Path To CSV file', "rb"), delimiter=",")
newticket = raw_input('Enter Ticket Numbers:').split()

for line in file:
    for tickets in newticket:
        if tickets == line[1]:
            print(tickets, line[36], line[37])
If your CSV file isn't enormous, I suggest reading the whole thing, extracting the data you want into a dictionary, with the ticket number as the key. Searching a dict is very fast.
I've modified your file-opening code to use the with statement, which ensures the file is properly closed even if there's a problem reading it. BTW, there's no need to specify the comma as the CSV delimiter, since that's the default.
import csv

with open('Path To CSV file', "rb") as f:
    data = {line[1]: (line[36], line[37]) for line in csv.reader(f)}
newtickets = raw_input('Enter Ticket Numbers:').split()

for ticket in newtickets:
    line = data.get(ticket)
    if line:
        print(ticket, line)
    else:
        print(ticket, "not found")
The dict.get method returns None if the key isn't in the dict, although you can specify another default return value if you want. So we could re-write that part like this:
for ticket in newtickets:
    line = data.get(ticket, "not found")
    print(ticket, line)
Alternatively, we could use the in operator:
for ticket in newtickets:
    if ticket in data:
        print(ticket, data[ticket])
    else:
        print(ticket, "not found")
All 3 versions have roughly the same efficiency, choose whichever you feel is the most readable.

find headings from multiple text files in one location and add to xlsx document with same headings

I need to create a Python program that will read through multiple .txt files in a set directory, look for specific headings within each text file, and store the data found under those headings in an .xlsx document.
An example of a .txt file
person: Vyacheslav Danik
address: Ukraine, Kharkov
phone: +380675746805
address: Ukraine, Kharkiv
address: Pavlova st., 319
I need 5 headers in the Excel spreadsheet: number, organization, role, name, and address. The Python program should put the information from each scanned file under these headings in the spreadsheet.
Any help would be appreciated, as I'm struggling a bit with this. Thanks
I'm still a beginner myself, but I thought this seemed easy enough. It's more of a starting point for you to build on and customize. I only chose to do one column (Person), but I'm pretty sure everything you need to do what you want is in this example. You'll have to install the two Python libraries needed to access spreadsheets by running the next two commands (assuming you're using some type of Linux; you didn't provide enough info):
pip install xlrd
pip install xlutils
Here's the example, the comments roughly explain what each line does.
#!/usr/bin/env python
''' Required to install these libraries to access spreadsheets
pip install xlrd
pip install xlutils
'''
import os, re, string
from xlutils.copy import copy
from xlrd import open_workbook

book_ro = open_workbook("spreadsheet.xls")
# creates a writeable copy
book = copy(book_ro)
# Select first sheet
sheet1 = book.get_sheet(0)
# Create list to hold people, otherwise we have to figure out the next empty column in the spreadsheet
peopleList = []

# Get list of files in current folder and filter only the txt files
for root, dirs, docs in os.walk('.', topdown=True):
    for filename in docs:
        if filename.endswith(".txt") or filename.endswith(".TXT"):
            filepath = os.path.join(root, filename)
            # Open file read only
            TxtFile = open(filepath, "r")
            # Read all the lines at once
            lines = TxtFile.readlines()
            # Finished reading, close file
            TxtFile.close()
            # Convert file to one big string so it can be searched with re.findall
            lines = '\n'.join(lines)
            # Find all occurrences of "person:" and capture the rest of the line
            people = re.findall(r'person: (.*)', lines)
            # Strip whitespace/special characters around each name
            people = map(lambda x: x.strip(), people)
            # If file has more than 1 person, add each one individually
            for person in people:
                peopleList.append(person)

row = 0
column = 0
# Remove duplicates and sort (sorted(set(...))), then step thru the list and write to the spreadsheet
for person in sorted(set(peopleList)):
    sheet1.write(row, column, person)
    row += 1

# This will overwrite the original spreadsheet if one existed
book.save("spreadsheet.xls")
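If it helps, here is a sketch of how the write-out step could be extended to cover all five requested headers; it reuses the sheet1 and book objects from the script above and assumes a hypothetical records list of dicts collected while scanning the files:
# Hypothetical: 'records' is a list of dicts built while scanning the files, e.g.
# [{'number': '1', 'organization': '...', 'role': '...', 'name': '...', 'address': '...'}]
headers = ['number', 'organization', 'role', 'name', 'address']

# Write the header row first
for col, header in enumerate(headers):
    sheet1.write(0, col, header)

# Then write one row per record, leaving blanks for missing fields
for row, record in enumerate(records, start=1):
    for col, header in enumerate(headers):
        sheet1.write(row, col, record.get(header, ''))

book.save("spreadsheet.xls")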

Storing data from a dict in JSON

I have a simple program that takes input from the user and puts it into a dict. Then I want to store that data in a JSON file (I searched, and JSON was the only format I found useful).
for example
import json

mydict = {}
while True:
    user = input("Enter key: ")
    if user in mydict.keys():  # if the key already exists, only print
        print("{} ---> {}".format(user, mydict[user]))
        continue
    mn = input("Enter value: ")
    app = input("Apply?: ")
    if app == "y":
        mydict[user] = mn
        with open("mydict.json", "a+") as f:
            json.dump(mydict, f)
        with open("mydict.json") as t:
            mydict = json.load(t)
Every time user enter a key and value, I want to add them into dict, after then store that dict in json file. And every time I want to read that json file so I can refresh the dict in program.
The code above raises ValueError: Extra data. I understand the error occurs because I'm appending the dict to the JSON file every time, so there is more than one dict in the file. But how can I store the whole dict at once? I didn't want to use the w mode because I don't want to overwrite the file, and I'm new to JSON.
The program must run indefinitely and I have to refresh the dict every time; that's why I couldn't find any solution or try anything, since I'm new to JSON.
If you want to use JSON, then you will have to use the 'w' option when opening the file for writing. The 'a+' option will append your full dict to the file right after its previously saved version.
Why not use csv instead? With the 'a+' option, any newly entered user info will be appended to the end of the file, and transforming its content into a dict at reading time is quite easy and should look something like:
import csv

with open('your_dict.json', 'r') as fp:
    yourDict = {key: value for key, value in csv.reader(fp, delimiter='\t')}
while the saving counterpart would look like:
yourDictWriter = csv.writer(open('your_dict.json', 'a+'), delimiter='\t')
# ...
yourDictWriter.writerow([key, value])
Another approach would be to use MongoDB, a database designed for storing json documents. Then you won't have to worry about overwriting files, encoding json, and so on, since the database and driver will manage this for you. (Also note that it makes your code more concise.) Assuming you have MongoDB installed and running, you could use it like this:
from pymongo import MongoClient

client = MongoClient()
db = client.test_database.test_collection

while True:
    user = input("Enter key: ")
    if db.find_one({'user': user}):  # if the key already exists, only print
        print("{} ---> {}".format(user, db.find_one({'user': user})['value']))
        continue
    mn = input("Enter value: ")
    app = input("Apply?: ")
    if app == "y":
        db.insert({'user': user, 'value': mn})
With your code as it is right now, you have no reason to append to the file. You're converting the entire dict to JSON and writing it all to file anyway, so it doesn't matter if you lose the previous data. a is no more efficient than w here. In fact it's worse because the file will take much more space on disk.
Using the CSV module as Schmouk said is a good approach, as long as your data has a simple structure. In this case you just have a table with two columns and many rows, so a CSV is appropriate, and also more efficient.
If each row has a more complex structure, such as nested dicts and or lists, or different fields for each row, then JSON will be more useful. You can still append one row at a time. Just write each row into a single line, and have each row be an entire JSON object on its own. Reading files one line at a time is normal and easy in Python.
By the way, there is no need for the last two lines. You only need to read in the JSON when the program starts (as Moses has shown you) so you have access to data from previous runs. After that you can just use the variable mydict and it will remember all the keys you've added to it.
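A minimal sketch of that one-JSON-object-per-line idea (the file name and field names here are just placeholders):
import json

# Append one record as a single JSON object on its own line
def append_record(path, key, value):
    with open(path, "a") as f:
        f.write(json.dumps({"key": key, "value": value}) + "\n")

# Rebuild the dict at program start by reading one JSON object per line
def load_records(path):
    data = {}
    try:
        with open(path) as f:
            for line in f:
                record = json.loads(line)
                data[record["key"]] = record["value"]
    except FileNotFoundError:
        pass  # first run, nothing saved yet
    return data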
