Read .xls with xlrd in Python

I'm trying to get a list of backgrounds for each shot in an .xls document, but I have no idea how to tell the code to stop reading rows and columns when it reaches a different shot...
The .xls I'm reading is like this:
and my test code is there:
import xlrd
planNameToFind = '002'
backgroundsList = []
openFolderPath = 'I:\\manue\\REFS\\516\\ALCM_516_SceneAssetList.xls'.format().replace('/','\\')
wb = xlrd.open_workbook(openFolderPath)
sheets = wb.sheet_by_index(0)
nrows = sheets.nrows
allcols = sheets.ncols
episode_index = 0
shot_index = 2
background_index = 9
rst = 1
seqrow = rst + 1
for rowx in xrange(rst + 2, nrows + 1):
    planName = str(sheets.cell_value(seqrow, shot_index)).replace('.0', '')
    if planName == planNameToFind:
        print 'planName',planName
        background = str(sheets.cell_value(seqrow, background_index)).replace('.0', '')
        print 'background: ',background
        backgroundsList.append(background)
    if planName == '':
        background = str(sheets.cell_value(seqrow, background_index)).replace('.0', '')
        if background != '':
            print 'background2: ',background
            backgroundsList.append(background)
            #shotToDo = ' shotToDo: {0} BG: {1}'.format(planName,background)
    seqrow += 1
print 'backgroundsList: ',backgroundsList
My result logs all the backgrounds in the .xls, but I only need the backgrounds of shot '002' (here only 3 backgrounds). Does someone know how to read the backgrounds for a single shot only?

Your best bet is to work with pandas DataFrames. They make it very easy to clean the data and work with it.
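Whichever library loads the sheet, the core of the problem seems to be that the shot cell is filled in only on the first row of each shot, so the current shot has to be remembered while walking down the rows. Below is a minimal sketch of that idea in plain Python. It assumes each row has already been read into a list (for example with xlrd's sheet.row_values) and that the column positions match shot_index and background_index from the question. The function name, the make_row helper, and the sample rows are made up for illustration.

```python
def backgrounds_for_shot(rows, shot_name, shot_index=2, background_index=9):
    """Collect the backgrounds listed under one shot.

    Assumption: the shot cell is filled only on the first row of each
    shot and left empty on the continuation rows below it.
    """
    def as_text(value):
        # xlrd returns numeric cells as floats (2.0); normalise them to '2'.
        if isinstance(value, float) and value.is_integer():
            return str(int(value))
        return str(value).strip()

    backgrounds = []
    current_shot = None
    for row in rows:
        shot = as_text(row[shot_index])
        if shot:
            current_shot = shot  # a new shot starts on this row
        background = as_text(row[background_index])
        if current_shot == shot_name and background:
            backgrounds.append(background)
    return backgrounds


def make_row(shot, background):
    # Hypothetical 10-column row with only the two relevant cells filled.
    row = [''] * 10
    row[2] = shot
    row[9] = background
    return row


sample_rows = [
    make_row('001', 'BG_A'),
    make_row('002', 'BG_B'),
    make_row('', 'BG_C'),
    make_row('', 'BG_D'),
    make_row('003', 'BG_E'),
]
print(backgrounds_for_shot(sample_rows, '002'))  # ['BG_B', 'BG_C', 'BG_D']
```

With pandas, the same carry-forward step could be done by forward-filling the shot column before filtering.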

Related

Color a Pandas Cell Based on Search Info in Other Dataframes in Python

I may have painted myself into a corner with this. I have two excel documents I need to compare, and I'm using Pandas in Python.
I want to be able to select which file in the current folder is the main one to compare, a column on that sheet, and color each cell in that column based on how many instances of that line were found in other sheets in the same folder.
The two dataframes come from different Excel files with different shapes. I want to go line by line through one dataframe and compare one cell with every cell in the other dataframe (the data is in different places in each list). That's far beyond my knowledge level, so I've been trying to compare just two columns and count how many times each item appears in the chosen column of the other dataframe. Then I want to color the text based on how many instances were found in the second dataframe.
I've tried this a number of ways, but the code below is the closest I've been able to get. My most recent error with the code below is
"ValueError: Can only compare identically-labeled DataFrame objects"
The relevant code is below:
from pathlib import Path
import sys
import shutil
import os
from nltk import PorterStemmer
import numpy as np
from openpyxl import Workbook
from openpyxl.styles import PatternFill
def show_menu(input_list):
    if isinstance(input_list[0], str):
        i = 1
        for f in input_list:
            print(i.__str__() + '. ' + f)
            i = i + 1
    elif isinstance(input_list[0], pd.DataFrame):
        i = 1
        for f in input_list:
            # print(i.__str__() + '. ' + f.columns)
            print(i.__str__())
            print(f.head)
            i = i + 1
    else:
        print("Shit's Wack, yo!")
files = [f for f in os.listdir('.') if os.path.isfile(f)]
while True:
    try:
        # show menu to select the main file
        print('Select the main file.')
        show_menu(files)
        selection = int(input('Enter Number: ')) - 1
        original_file_name = files.pop(selection)
        # report_file_name = "ReportFor" + original_file_name
        report_file_name = original_file_name
        print(files)
        print("Main File Selected: " + original_file_name)
        # make a copy of the main file to use as a report file
        # shutil.copyfile(original_file_name, report_file_name)
        # print("Report File Created: " + report_file_name)
        show_menu(files)
        selection = int(input('Enter Number: ')) - 1
        comparison_file_name = files.pop(selection)
        break
    except IndexError:
        print("Select a number from the list.")
compare_sheet_df = pd.read_excel(comparison_file_name)
report_sheet_df = pd.read_excel(report_file_name)
show_menu(report_sheet_df.columns)
primary_column_selection = int(input('Select column to compare: ')) - 1
show_menu(compare_sheet_df.columns)
compare_column_selection = int(input('Select column to compare: ')) - 1
wb = Workbook()
sheet = wb["Sheet"]
for r in report_sheet_df.iterrows():
    counter = 0
    for i in compare_sheet_df.iterrows():
        for c in compare_sheet_df.iterrows():
            if report_sheet_df.iloc[[r[0], primary_column_selection]] == compare_sheet_df.iloc[[i[0], c[0]]]:
                # if there's a match, add one to the counter
                counter += 1
    if counter > 1:
        my_cell = sheet.cell(row=r, column=1)
        # write the title to the cell
        my_cell.value = report_sheet_df.iloc[r]
        # color the cell red in r
        my_cell.fill = PatternFill("solid", start_color="#FF0000")
    elif counter < 1:
        my_cell = sheet.cell.iloc(row=r, column=1)
        # write the title to the cell
        my_cell.value = report_sheet_df.iloc[r]
        # color the cell red in r
        my_cell.fill = PatternFill("solid", start_color="#FFA500")
    else:
        my_cell = sheet.cell(row=r, column=1)
        # write the title to the cell
        my_cell.value = report_sheet_df[r]
        # color the cell red in r
        my_cell.fill = PatternFill("solid", start_color="#00FF00")
wb.save("output.xlsx")
Any help is greatly appreciated!
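One way to sidestep the element-wise DataFrame comparison that raises the ValueError might be to flatten the comparison column into a plain list, count the occurrences once, and then look up each value of the main column in that count. The sketch below is plain Python and assumes both columns have already been pulled out as lists (for instance with .tolist() on a pandas column). The function names are hypothetical, and the thresholds mirror the counter checks in the code above. The colors are written without a leading '#', on the assumption that openpyxl expects bare hex strings such as 'FF0000'.

```python
from collections import Counter

# Assumed color format: hex strings without '#'.
RED, ORANGE, GREEN = "FF0000", "FFA500", "00FF00"


def color_for_count(count):
    """Mirror the question's thresholds: more than one match is red,
    no match is orange, exactly one match is green."""
    if count > 1:
        return RED
    if count < 1:
        return ORANGE
    return GREEN


def colors_for_column(main_values, comparison_values):
    """Return a (value, count, color) tuple for each entry of the main column."""
    counts = Counter(comparison_values)  # a missing key counts as 0
    return [(v, counts[v], color_for_count(counts[v])) for v in main_values]


result = colors_for_column(["a", "b", "c"], ["a", "a", "b", "x"])
for value, count, color in result:
    print(value, count, color)
```

Each tuple could then be written to a cell and used to build the fill for that cell, which keeps the counting logic separate from the spreadsheet styling.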

AttributeError: 'pywintypes.datetime' object has no attribute 'nanosecond'

I have some code to open an excel file and save it as a pandas dataframe, it was originally used in Python 2.7 and I am currently trying to make it work under Python 3.
Originally, I used the code in #myidealab from this other post: From password-protected Excel file to pandas DataFrame.
It currently looks like this:
data_file = <path_for_file>
# Load excel file
xlApp = win32com.client.Dispatch("Excel.Application")
xlApp.Visible = False
pswd = getpass.getpass('password: ')
xldatabase = xlApp.Workbooks.Open(data_file, False, True, None, pswd)
dfdatabase = []
for sh in xldatabase.Sheets:
    xlsheet = xldatabase.Worksheets(sh.Name)
    # Get last_row
    row_num = 0
    cell_val = ''
    while cell_val != None:
        row_num += 1
        cell_val = xlsheet.Cells(row_num, 1).Value
    last_row = row_num - 1
    # Get last_column
    col_num = 0
    cell_val = ''
    while cell_val != None:
        col_num += 1
        cell_val = xlsheet.Cells(1, col_num).Value
    last_col = col_num - 1
    # Get content
    content = xlsheet.Range(xlsheet.Cells(1, 1), xlsheet.Cells(last_row, last_col)).Value
    # Load each sheet as a dataframe
    dfdatabase.append(pd.DataFrame(list(content[1:]), columns=content[0]))
Now, I am getting the following error:
AttributeError: 'pywintypes.datetime' object has no attribute 'nanosecond'
The problem seems to boil down to the lines below:
# Get content
content = xlsheet.Range(xlsheet.Cells(1, 1), xlsheet.Cells(last_row, last_col)).Value
# Load each sheet as a dataframe
dfdatabase.append(pd.DataFrame(list(content[1:]), columns=content[0]))
xlsheet.Range().Value reads the data and attaches pywintypes types to it, which pd.DataFrame() fails to interpret.
Has anyone run into this issue before? Is there a way to tell xlsheet.Range().Value to read the values in a form that pandas can interpret?
Any help will be welcome!
Thank you.

This solves the issue, assuming you know beforehand the size/formatting of your dates/times in the excel sheet.
There may well be more elegant ways to solve it.
Note: content is initially a tuple. Position [0] is the array containing the headers and the remaining positions contain the data.
import datetime
import pywintypes
...
content = xlsheet.Range(xlsheet.Cells(1, 1), xlsheet.Cells(last_row, last_col)).Value
head = content[0]
data = list(content[1:])
for x in range(0,len(data)):
    data[x] = list(data[x])
    for y in range(0,len(data[x])):
        if isinstance(data[x][y], pywintypes.TimeType):
            # Note: rstrip() removes a set of characters ('+', '0', ':'), not a suffix,
            # so it also eats the ":00" seconds and any trailing zeros of the minutes.
            temp = str(data[x][y]).rstrip("+00:00").strip()
            if len(temp)>10:
                data[x][y] = datetime.datetime.strptime(temp, "%Y-%m-%d %H:%M")
            elif len(temp)>5 and len(temp)<=10:
                data[x][y] = datetime.datetime.strptime(temp, "%Y-%m-%d")
            elif len(temp)<=5:
                data[x][y] = datetime.datetime.strptime(temp, "%H:%M")
            print(data[x][y])
# Load each sheet as a dataframe
dfdatabase.append(pd.DataFrame(data, columns=head))
I used this as a reference:
python-convert-pywintyptes-datetime-to-datetime-datetime
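An alternative that avoids parsing strings, sketched under the assumption that on Python 3 the date values returned by pywin32 are subclasses of datetime.datetime (worth confirming with isinstance on your own installation): rebuild each value as a plain datetime.datetime from its fields. The helper names are hypothetical, and the stand-in subclass exists only so the sketch runs without Excel.

```python
import datetime


def to_plain_datetime(value):
    """Rebuild any datetime subclass as a naive datetime.datetime.

    The timezone is dropped, not converted. Other values pass through unchanged.
    """
    if isinstance(value, datetime.datetime):
        return datetime.datetime(value.year, value.month, value.day,
                                 value.hour, value.minute, value.second,
                                 value.microsecond)
    return value


def clean_rows(rows):
    """Apply to_plain_datetime to every cell of a tuple-of-tuples range."""
    return [[to_plain_datetime(cell) for cell in row] for row in rows]


class FakeComTime(datetime.datetime):
    """Stand-in for a COM date value (assumption: it subclasses datetime)."""


rows = ((FakeComTime(2020, 1, 10, 10, 30), "abc", 1.5),)
cleaned = clean_rows(rows)
print(type(cleaned[0][0]).__name__, cleaned[0][0])
```

The cleaned list of lists could then be passed to pd.DataFrame(cleaned, columns=head) in place of data.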

Openpyxl won't save file

For some reason openpyxl won't save the xlsx file at the end of the program.
I am trying to read measurements from a file, where each line is a different measurement. I want to write them to Excel to make the data easier to use later on. Everything seems to work, but in the end the data isn't saved; if I point the save at a new file, that file is never created.
from openpyxl import load_workbook
from openpyxl import Workbook
wb = load_workbook(filename='Data_Base.xlsx')
sheet = wb.worksheets[0]
BS = []
Signal = []
with open('WifiData2.txt') as f:
    for line in f:
        y = int(line.split('|')[0].split(';')[3])
        x = int(line.split('|')[0].split(';')[2])
        floor = int(x = line.split('|')[0].split(';')[1])
        data = line.split("|")[1].strip()
        measurements = data.split(";")
        for l in measurements:
            raw = l.split(" ")
            BSSID = raw[0]
            signal_strength = raw[1]
            print(signal_strength)
            BS.append(BSSID)
            Signal.append(signal_strength)
        for row_num in range(sheet.max_row):
            num = row_num
            if row_num > 1:
                test_X = int(sheet.cell(row=row_num, column=4).value)
                test_Y = int(sheet.cell(row=row_num, column=3).value)
                test_floor = int(sheet.cell(row=row_num, column=2).value)
                if (test_X == x and test_Y == y and test_floor == floor):
                    nr = nr + 1
                    if (nr > 3):
                        q = 1
        if (q == 0):
            sheet.cell(row=sheet.max_row+1, column = 2, value = floor)
            sheet.cell(row=sheet.max_row + 1, column=3, value=x)
            sheet.cell(row=sheet.max_row + 1, column=4, value=y)
            sheet.cell(row=sheet.max_row + 1, column=2, value=sheet.max_row)
        for element in BS:
            nr = 0
            for col in sheet.max_column:
                if BS[element] == sheet.cell(row=1, column=col).value:
                    sheet.cell(row=sheet.max_row + 1, column=col, value=Signal[element])
                    nr = 1
            if (nr == 0):
                sheet.cell(row=1, column=sheet.max_column+1, value=BS[element])
                sheet.cell(row=sheet.max_row+1, column=sheet.max_column + 1, value=BS[element])
        Signal.clear()
        BS.clear()
wb.save('Data_Base1.xlsx')
What is weird is that if I save the workbook earlier, the file is created. Of course that doesn't really work for me, since the changes I want would not be included. I had a similar issue when I tried the xlrd/xlwt/xlutils combo. Does anyone know where the problem is?

Using an absolute path instead of a relative path will do the trick!
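A small sketch of that suggestion, assuming the output should land in the directory the script is started from. output_path is a hypothetical helper, and printing the resolved path makes it easy to see where the file is expected to appear.

```python
from pathlib import Path


def output_path(filename, base_dir=None):
    """Return an absolute path for filename.

    base_dir defaults to the current working directory.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    return (base / filename).resolve()


target = output_path("Data_Base1.xlsx")
print(target)
# wb.save(str(target))  # wb would be the openpyxl workbook from the question
```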

Add
wb.template = False
before
wb.save('Filename.xlsx')

List index out of range error in breaking while loop in python

Hi, I am new to Python and struggling to find my way. I am currently working on a task that appends Excel files, and here is my sample code. I get a list index out of range error because, as far as I can tell, the while loop does not break at the end of each Excel file. Any help would be appreciated. Thanks:
import xlrd
import glob
import os
import openpyxl
import csv
from xlrd import open_workbook
from os import listdir
row = {}
basedir = '../files/'
files = listdir('../files')
sheets = [filename for filename in files if filename.endswith("xlsx")]
header_is_written = False
for filename in sheets:
    print('Parsing {0}{1}\r'.format(basedir,filename))
    worksheet = open_workbook(basedir+filename).sheet_by_index(0)
    print (worksheet.cell_value(5,6))
    counter = 0
    while True:
        row['plan name'] = worksheet.cell_value(1+counter,1).strip()
        row_values = worksheet.row_slice(counter+1,start_colx=0, end_colx=30)
        row['Dealer'] = int(row_values[0].value)
        row['Name'] = str(row_values[1].value)
        row['City'] = str(row_values[2].value)
        row['State'] = str(row_values[3].value)
        row['Zip Code'] = int(row_values[4].value)
        row['Region'] = str(row_values[5].value)
        row['AOM'] = str(row_values[6].value)
        row['FTS Short Name'] = str(row_values[7].value)
        row['Overall Score'] = float(row_values[8].value)
        row['Overall Rank'] = int(row_values[9].value)
        row['Count of Ros'] = int(row_values[10].value)
        row['Count of PTSS Cases'] = int(row_values[11].value)
        row['% of PTSS cases'] = float(row_values[12].value)
        row['Rank of Cases'] = int(row_values[13].value)
        row['% of Not Prepared'] = float(row_values[14].value)
        row['Rank of Not Prepared'] = int(row_values[15].value)
        row['FFVt Pre Qrt'] = float(row_values[16].value)
        row['Rank of FFVt'] = int(row_values[17].value)
        row['CSI Pre Qrt'] = int(row_values[18].value)
        row['Rank of CSI'] = int(row_values[19].value)
        row['FFVC Pre Qrt'] = float(row_values[20].value)
        row['Rank of FFVc'] = int(row_values[21].value)
        row['OnSite'] = str(row_values[22].value)
        row['% of Onsite'] = str(row_values[23].value)
        row['Not Prepared'] = int(row_values[24].value)
        row['Open'] = str(row_values[25].value)
        row['Cost per Vin Pre Qrt'] = float(row_values[26].value)
        row['Damages per Visit Pre Qrt'] = float(row_values[27].value)
        row['Claim Sub time pre Qrt'] = str(row_values[28].value)
        row['Warranty Index Pre Qrt'] = str(row_values[29].value)
        counter += 1
        if row['plan name'] is None:
            break
        with open('table.csv', 'a',newline='') as f:
            w=csv.DictWriter(f, row.keys())
            if header_is_written is False:
                w.writeheader()
                header_is_written = True
            w.writerow(row)

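In place of while True, use a for loop over the rows of the sheet:
for rowx in range(1, worksheet.nrows):
    row['plan name'] = worksheet.cell_value(rowx, 1).strip()
    row_values = worksheet.row_slice(rowx, start_colx=0, end_colx=30)
    row['Dealer'] = int(row_values[0].value)
    row['Name'] = str(row_values[1].value)
    ....
The reason is that while True runs the loop indefinitely, until a break statement inside the loop is reached. Here the check row['plan name'] is None never succeeds, because .strip() always returns a string, so the loop keeps going past the last row.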
Read more about while loop

A while True loop basically means: execute the following code block forever, unless a break or sys.exit statement gets you out.
So in your case, you need to terminate once the rows of the Excel file are exhausted. One option is to check whether there are more rows to append and break if there are none.
A more suitable approach when writing a file is a for loop, which terminates by itself once it is exhausted.
Also, consider reading the whole content of the Excel file in one operation and saving it to a variable. Once you have it, iterate over it and append each row to the CSV.
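The last two suggestions can be sketched together as follows, assuming the sheet has already been read into a list of rows with the header row first (with xlrd this could be built from sheet.row_values(i) for i in range(sheet.nrows)). The column names, sample data, and the rows_to_csv helper are illustrative only.

```python
import csv
import io


def rows_to_csv(rows, out):
    """Write rows (first row is the header) to the file-like object out.

    Returns the number of data rows written.
    """
    header, data = rows[0], rows[1:]
    writer = csv.DictWriter(out, fieldnames=header)
    writer.writeheader()  # written exactly once, before any data row
    written = 0
    for values in data:  # the for loop ends by itself after the last row
        if not str(values[0]).strip():
            continue  # skip blank rows instead of relying on a break
        writer.writerow(dict(zip(header, values)))
        written += 1
    return written


sample = [
    ["Dealer", "Name", "City"],
    [101, "Acme Motors", "Austin"],
    ["", "", ""],
    [102, "Best Cars", "Boston"],
]
buffer = io.StringIO()
count = rows_to_csv(sample, buffer)
print(count)
print(buffer.getvalue())
```

For a real file, out would be the handle from open('table.csv', 'w', newline=''), opened once outside the loop over workbooks.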

Recalling sheet names for a while loop

I have imported xlrd etc. The main part of my code is then as follows:
for serie_diam in range(0,9):
    namesheet = "Diamètre " + str(serie_diam)
    #select(namesheet)
    numLine = sh.row_values(3)
    OK = 1
    while OK == 1:
        d = sh1(numLine, 1)
        D = sh1(numLine, 2)
        rs = sh1(numLine, 7)
        for i in range(4):
            BB = sh1(numLine, 2 + i)
            if BB != 0:
                print repr(d).rjust(2), repr(D).rjust(3), repr(B).rjust(4), repr(rs).rjust(5)
I have 7 sheets in my xls file overall and I would like to know how I can loop through these in the same while loop as OK == 1 where for the moment I have written just 'sh1'.
I'm sorry if this question is too easy!

import xlrd
book = xlrd.open_workbook('xlrd_test.xls')
for sheet in book.sheets():
    print sheet.row(0) # do stuff here - I'm printing the first line as example
# or if you need the sheet index for some purpose:
for shidx in xrange(0, book.nsheets):
    sheet = book.sheet_by_index(shidx)
    # would print 'Page N, first line: ....'
    print 'Page %d, first line: %s' % (shidx, sheet.row(0))
