Python 2.7 - xlrd - Matching a String to a Cell Value

Using Python 2.7 on Mac OS X Lion with xlrd
My problem is relatively simple and straightforward. I'm trying to match a string to an Excel cell value, in order to ensure that the other data in the matching row is the correct value.
So, say for instance that player = 'Andrea Bargnani' and I want to match a row that looks like this:
|Draft|Player         |Team           |
|-----|---------------|---------------|
|1    |Andrea Bargnani|Toronto Raptors|
I do:
num_rows = draftSheet.nrows - 1
cur_row = -1
while cur_row < num_rows:
    cur_row += 1
    row = draftSheet.row(cur_row)
    if row[1] == player:
        ranking = row[0]
The problem is that the value of row[1] is text:u'Andrea Bargnani', as opposed to just Andrea Bargnani.
I know that Excel, from Excel 97 onward, is all Unicode. But even if I do player = u'Andrea Bargnani', there is still the preceding text:. So I tried player = 'text:'u'Andrea Bargnani', but the variable ends up as u'text:Andrea Bargnani' and still does not produce a match.
I would like to just strip the text:u' off of the returned row[1] value in order to get an appropriate match.

You need to get a value from the cell.
I've created a sample Excel file with the text "Andrea Bargnani" in cell A1. Here is code showing the difference between printing the cell and printing its value:
import xlrd
book = xlrd.open_workbook("input.xls")
sheet = book.sheet_by_index(0)
print sheet.cell(0, 0) # prints text:u'Andrea Bargnani'
print sheet.cell(0, 0).value # prints Andrea Bargnani
Hope that helps.
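Applied to the loop in the question, the fix is to compare cell values rather than Cell objects. Below is a minimal sketch, assuming player names are in the second column and the draft number in the first. `find_ranking` and `FakeSheet` are hypothetical names; `FakeSheet` only mimics the two xlrd `Sheet` members the function uses (`nrows` and `cell_value`) so the example runs without a workbook file.

```python
def find_ranking(sheet, player, name_col=1, rank_col=0):
    """Return the rank_col value of the first row whose name_col cell equals player.

    Returns None when no row matches. Works with any object exposing
    xlrd-style ``nrows`` and ``cell_value(rowx, colx)``.
    """
    for rowx in range(sheet.nrows):
        # cell_value() gives the plain value (u'Andrea Bargnani'),
        # not the Cell object whose repr is text:u'Andrea Bargnani'
        if sheet.cell_value(rowx, name_col) == player:
            return sheet.cell_value(rowx, rank_col)
    return None


class FakeSheet(object):
    """Hypothetical in-memory stand-in for an xlrd Sheet, for demonstration only."""

    def __init__(self, rows):
        self.rows = rows
        self.nrows = len(rows)

    def cell_value(self, rowx, colx):
        return self.rows[rowx][colx]


sheet = FakeSheet([
    [u"Draft", u"Player", u"Team"],
    [1.0, u"Andrea Bargnani", u"Toronto Raptors"],  # xlrd returns numbers as floats
])
print(find_ranking(sheet, u"Andrea Bargnani"))  # 1.0
```

With a real file, pass `xlrd.open_workbook("input.xls").sheet_by_index(0)` instead of the stand-in. Because xlrd returns numeric cells as floats, the draft number comes back as `1.0`; wrap it in `int()` if an integer is needed.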

Related

In openpyxl, is there a way to see what conditional formatting rule(s) are applied to a cell?

I'm using openpyxl 2.5.6 and Python 3.7.0. My goal is to read an Excel workbook and print both the contents and the formatting of each cell into a CSV. For instance, if a cell is blue with text "Data", I would prepend a tag of "[blu]" to the cell value, printing it to the CSV as "[blu]Data", and do the same for bold cells, other fill colors, etc.
I can do this perfectly fine for cells with static formatting, but not with conditional formatting. My issue is that I don't know how to tell whether a conditional formatting rule is applied. I found the conditional_formatting._cf_rules dict, but I'm only seeing attributes like formula, priority, dxfId, and the dxf rule itself. I want to believe that whether a cf rule is applied or not is stored somewhere, but I cannot find where.
My code thus far looks something like this.
from openpyxl import load_workbook
wb = load_workbook('Workbook_Name.xlsx', data_only=True)
ws = wb['Worksheet1']
# Code that shows me each cf rule's formula, fill type, priority, etc
cellrangeslist = list(ws.conditional_formatting._cf_rules)
for cellrange in cellrangeslist:
    print('{:30s}{:^10s}{:30s}'.format('----------------------------', str(cellrange.sqref), '----------------------------'))
    for i in cellrange.cfRule:
        print('{:10s}{:8s}{:40s}{:10s}{:10s}'.format(str(i.dxf.fill.bgColor.index), str(i.dxf.fill.bgColor.type), str(i.formula), str(i.stopIfTrue), str(i.priority)))
# This is where I want to be able to identify which cf rule is applied to a given cell
#
#
#
# Code that interprets cell styling into appropriate tags, e.g.
for r in ws.iter_rows(min_row=ws.min_row, max_row=ws.max_row, min_col=ws.min_column, max_col=ws.max_column):
    for cell in r:
        if cell.font.b:
            # str() so numeric cells (e.g. A1 == 1234) don't raise a TypeError
            cell.value = "[bold]" + str(cell.value)
# Code to write each cell as a string literal to a CSV file
#
#
#
My Excel file looks like this:
A1 == 1234
B1 == 1235
C1 == '=A1-B1'
And my cf rules look like this:
Formula: =$A1 - $B1 < 0, Format: [red fill], Applies to: =$C$1
Formula: =$A1 - $B1 > 0, Format: [green fill], Applies to: =$C$1
The console output I receive from the above code is
---------------------------- C1 ----------------------------
FF92D050 rgb ['$A1-$B1>0'] None 2
FFFF0000 rgb ['$A1-$B1<0'] None 1
The output shows the rules are properly there, but I'm wanting to know if there's a way to tell which of these rules, if any, are actually applied to the cell.
I have a growing suspicion that it's something Excel calculates at runtime, so my alternative is to write an Excel formula interpreter, but I'd really like to avoid that by just about any means, as I'm not sure I have the skill to do it.
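For rules as simple as the two in this question, one stopgap short of a full formula interpreter is to translate each rule into a Python predicate by hand and check them in priority order. This is a minimal sketch under that assumption: `applied_fill` and `RULES` are hypothetical names, the input values could be read from the worksheet (for example `ws['A1'].value`), and it deliberately simplifies Excel's behaviour by returning only the first true rule's fill, ignoring stopIfTrue and non-conflicting formats from several true rules.

```python
def applied_fill(rules, values):
    """Return the fill of the first matching rule in priority order, or None.

    rules:  list of (priority, predicate, fill) tuples; a lower priority
            number is evaluated first, as in Excel's rule ordering.
    values: dict of already-computed cell values, e.g. {"A1": 1234, "B1": 1235}
    """
    for priority, predicate, fill in sorted(rules, key=lambda rule: rule[0]):
        if predicate(values):
            return fill
    return None


# Hand-translated versions of the two rules from the question
# (hypothetical: written by hand, not parsed from the workbook)
RULES = [
    (1, lambda v: v["A1"] - v["B1"] < 0, "FFFF0000"),  # red fill
    (2, lambda v: v["A1"] - v["B1"] > 0, "FF92D050"),  # green fill
]

print(applied_fill(RULES, {"A1": 1234, "B1": 1235}))  # FFFF0000
print(applied_fill(RULES, {"A1": 1236, "B1": 1235}))  # FF92D050
```

This only scales to a handful of simple rules; anything relying on Excel functions or relative references across many cells quickly turns back into the interpreter problem.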
If you don't find a better option, following on from my comment, this is an example of what you could do with xlwings.
For the example output shown, A1 is a higher number than B1, so cell C1 is green: A1 = 1236, B1 = 1235.
If A1 is changed back to 1234, C1 turns red again, and if the same code is run after the workbook is saved, 'Colour applied to conditional format cell:' will match 'Conditional Format 1', i.e. red.
import xlwings as xw
from xlwings.constants import RgbColor

def colour_lookup(cfc):
    # Return the name of the RgbColor constant whose value equals cfc, or None
    return next((key for key, value in colour_dict.items() if value == cfc), None)

# Map each RgbColor constant name (e.g. 'rgbRed') to its numeric value
colour_dict = {key: getattr(RgbColor, key) for key in dir(RgbColor) if not key.startswith('_')}
wb = xw.Book('test.xlsx')
ws = wb.sheets('Sheet1')
cf = ws['C1'].api.FormatConditions
print("Number of conditional formatting rules: " + str(cf._inner.Count))
# DisplayFormat reflects what Excel is currently showing, including conditional formats
print("Colour applied to conditional format cell:\n\tEnumerated: " +
      str(cf._inner.Parent.DisplayFormat.Interior.Color))
print("\tRGBColor: " + str(colour_lookup(cf._inner.Parent.DisplayFormat.Interior.Color)))
print("------------------------------------------------")
for idx, cf_detail in enumerate(cf, start=1):
    print("Conditional Format " + str(idx))
    print(cf_detail._inner.Formula1)
    print(cf_detail._inner.Interior.Color)
    print("\tRGBColor: " + str(colour_lookup(cf_detail._inner.Interior.Color)))
    print("")
Output
Number of conditional formatting rules: 2
Colour applied to conditional format cell:
    Enumerated: 32768.0
    RGBColor: rgbGreen
------------------------------------------------
Conditional Format 1
=$A1-$B1<0
255.0
    RGBColor: rgbRed
Conditional Format 2
=$A1-$B1>0
32768.0
    RGBColor: rgbGreen

Openpyxl and Binary Search

The problem: I have two spreadsheets. Spreadsheet 1 has about 20,000 rows. Spreadsheet 2 has near 1 million rows. When a value from a row in spreadsheet 1 matches a value from a row in spreadsheet 2, the entire row from spreadsheet 2 is written to excel. The problem isn't too difficult, but with such a large number of rows, the run time is incredibly long.
Book 1 Example:
|Key |Value |
|------|------------------|
|397241|587727227839578000|
An example of book 2:
|ID                |a  |b |c   |
|------------------|---|--|----|
|587727227839578000|393|24|0.43|
My current solution is:
import openpyxl

g1 = openpyxl.load_workbook('path/to/sheet/sheet1.xlsx', read_only=True)
grid1 = g1.active
grid1_rows = list(grid1.rows)
g2 = openpyxl.load_workbook('path/to/sheet2/sheet2.xlsx', read_only=True)
grid2 = g2.active
grid2_rows = list(grid2.rows)
output_file = open('output.csv', 'w')  # hypothetical output path; not shown in the original
for row in grid1_rows:
    value1 = int(row[1].value)
    print(value1)
    for row2 in grid2_rows:
        value2 = int(row2[0].value)
        if value1 == value2:
            new_Name = int(row[0].value)
            print("match")
            output_file.write(str(new_Name))
            output_file.write(",")
            output_file.write(",".join(str(c.value) for c in row2[1:]))
            output_file.write("\n")
output_file.close()
This solution works, but again, the runtime is absurd. Ideally I'd like to take value1 (which comes from the first sheet), perform a binary search for that value on the other sheet, and then, just like my current solution, copy the entire row to a new file if it matches.
If there's an even faster method to do this, I'm all ears. I'm not the greatest at Python, so any help is appreciated.
Thanks.
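For reference, the binary-search idea from the question can be sketched with the standard-library `bisect` module: sort sheet 2's rows by ID once, then look up each sheet 1 value in O(log m) time instead of scanning every row. This is a minimal sketch, assuming the rows have already been read into tuples of plain values (ID first) and that IDs are unique; `binary_search_join` is a hypothetical name.

```python
import bisect


def binary_search_join(search_values, rows):
    """Return (value, row) pairs for each search value found as a row's first element.

    search_values: iterable of ints (e.g. the Value column of sheet 1)
    rows:          tuples whose first element is the ID (e.g. rows of sheet 2)
    Assumes IDs are unique; with duplicates only one matching row is returned.
    """
    # Sort once by ID: O(m log m); each lookup afterwards is O(log m)
    sorted_rows = sorted(rows, key=lambda row: row[0])
    ids = [row[0] for row in sorted_rows]
    matches = []
    for value in search_values:
        pos = bisect.bisect_left(ids, value)
        # bisect_left gives the insertion point; check that it is an exact hit
        if pos < len(ids) and ids[pos] == value:
            matches.append((value, sorted_rows[pos]))
    return matches


sheet2 = [(587727227839578000, 393, 24, 0.43), (12, 1, 2, 3.0)]
print(binary_search_join([587727227839578000, 999], sheet2))
# [(587727227839578000, (587727227839578000, 393, 24, 0.43))]
```

The sort costs about as much as building a dictionary, and a dictionary lookup is O(1) on average, so binary search mainly pays off when the data is already sorted.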
You are getting your butt kicked here because you are using an inappropriate data structure, which requires you to use the nested loop.
The below example uses sets to match indices from first sheet to those in the second sheet. This assumes there are no duplicates on either sheet, which would seem weird given your problem description. Once we make sets of the indices from both sheets, all we need to do is intersect the 2 sets to find the ones that are on sheet 2.
Then we have the matches, but we can do better. If we put the second sheet row data into dictionary with the indices as the keys, then we can hold onto the row data while we do the match, rather than have to go hunting for the matching indices after intersecting the sets.
I've also put in an enumeration, which may or may not be needed to identify which rows in the spreadsheet are the ones of interest. Probably not needed.
This should execute in the blink of an eye after things are loaded. If you start to have memory issues, you may want to just construct the dictionary at the start rather than the list and the dictionary.
Code:
import openpyxl
g1 = openpyxl.load_workbook('Book1.xlsx', read_only=True)
grid1 = g1.active
grid1_rows = list(grid1.rows)[1:]  # exclude the header
g2 = openpyxl.load_workbook('Book2.xlsx', read_only=True)
grid2 = g2.active
grid2_rows = list(grid2.rows)[1:]  # exclude the header
# make a set of the values in Book 1 that we want to search for...
search_items = {int(t[0].value) for t in grid1_rows}
#print(search_items)
# make a dictionary (key-value pairing) for the items in the 2nd book, and
# include an enumeration so we can capture the row number
lookup_dict = {int(t[0].value): (idx, t) for idx, t in enumerate(grid2_rows, start=1)}
#print(lookup_dict)
# now let's intersect the set of search items and key values to get the keys of the matches...
keys = search_items & lookup_dict.keys()
#print(keys)
for key in keys:
    idx, row_data = lookup_dict[key]  # the row index (if needed) and the row data
    print(f'row {idx} matched value {key} and has data:')
    print(f' name: {row_data[1].value:10s} \t qty: {int(row_data[2].value)}')
Output:
row 3 matched value 202 and has data:
name: steak qty: 3
row 1 matched value 455 and has data:
name: dogfood qty: 10
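If memory becomes a problem at this size, the same dictionary idea can be applied without keeping both full row lists: build the lookup from sheet 2 once, then stream sheet 1 and write matches straight to a CSV, as the question's code does. A minimal sketch, assuming rows are tuples of plain values laid out as in the question's examples (Book 1: Key, Value; Book 2: ID, a, b, c); `write_matches` is a hypothetical name. With openpyxl, such tuples can come from `ws.iter_rows(min_row=2, values_only=True)`, although `values_only` only exists in newer openpyxl releases.

```python
import csv
import io


def write_matches(book1_rows, book2_rows, out):
    """Write "key,a,b,c" for every Book 1 row whose Value matches a Book 2 ID.

    book1_rows: iterable of (key, value) tuples
    book2_rows: iterable of (id, a, b, c, ...) tuples
    out:        a text file-like object
    Returns the number of rows written.
    """
    # One pass over Book 2: O(m) time; each later lookup is O(1) on average
    lookup = {int(row[0]): row[1:] for row in book2_rows}
    writer = csv.writer(out, lineterminator="\n")
    written = 0
    for key, value in book1_rows:
        rest = lookup.get(int(value))
        if rest is not None:
            writer.writerow([int(key)] + list(rest))
            written += 1
    return written


buf = io.StringIO()
count = write_matches(
    [(397241, 587727227839578000), (1, 2)],
    [(587727227839578000, 393, 24, 0.43)],
    buf,
)
print(count)           # 1
print(buf.getvalue())  # 397241,393,24,0.43
```

If Book 2 does contain duplicate IDs, the dictionary keeps only the last row for each ID, which matches the no-duplicates assumption in the answer.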

How to add/append to python object?

I have a Python object that was created and defined by Swagger Editor. I have written a program to populate it. However, instead of appending to it, it keeps replacing previous entries. Below is my code:
from xlrd import open_workbook
from abc_model import BES as bes
# assumes wb (from open_workbook) and model already exist; they are not shown in the question
for sheet in wb.sheets():
    if sheet.name == "ABC":
        number_of_columns = sheet.ncols
        for i in range(2, number_of_columns):
            xyz = bes(name=sheet.cell(19, i).value)
            model.abc_model = xyz
print(model)
This only prints and assigns the content of column 4 (assuming there are 4 columns in total). However, it should contain the contents of columns 3 and 4.
Any idea what I am doing wrong?
Instead of model.abc_model = xyz try model.abc_model += xyz
You're setting your column to the last value obtained, rather than adding them together with your iteration. This is why you're only getting the second value, and not the first or both.
As #Dylan Smite mentioned, use model.abc_model += xyz. This line of code basically means model.abc_model = model.abc_model + xyz: you add xyz to model.abc_model and then assign the result back to model.abc_model.
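Whether `+=` works depends on what `abc_model` holds, and the swagger-generated class isn't shown. If the attribute is meant to be a list, `+=` needs a list on the right (`model.abc_model += [xyz]`); with a bare, non-iterable object Python raises a `TypeError`. `append()` is the more direct choice. A minimal sketch, with `BES` and `Model` as hypothetical stand-ins for the generated classes:

```python
class BES(object):
    """Hypothetical stand-in for the swagger-generated BES model."""

    def __init__(self, name):
        self.name = name


class Model(object):
    """Hypothetical container whose abc_model attribute holds a list of BES objects."""

    def __init__(self):
        self.abc_model = []


def collect(names):
    """Build a Model holding one BES per name, keeping every entry."""
    model = Model()
    for name in names:
        # append() adds to the list; `model.abc_model = ...` would overwrite it
        model.abc_model.append(BES(name=name))
    return model


model = collect(["column 3 value", "column 4 value"])
print([b.name for b in model.abc_model])  # ['column 3 value', 'column 4 value']
```

In the question's loop, the equivalent change would be `model.abc_model.append(xyz)`, provided the generated model initialises `abc_model` as a list.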

How to count the row's values in tables in docx file by using python

I have a problem counting row values in tables. Here are the details: I have a Word .docx file which consists of many tables. I need to count the total number of "Test Type" rows whose value is "black box testing". I already asked this question but still can't get it to work. The docx contains many kinds of tables and paragraphs, so I'm confused about how to retrieve specific tables and count the row values. The table looks like this:
I'm using Python 3.6 and I have installed the docx module as well.
Code I have tried:
from docx import Document

def table_test_automation(table):
    for row in table.rows:
        row_heading = row.cells[9].text
        if row_heading != 'Test Type':
            continue
        Black_box = row.cells[1].text
        return 1 if Black_box == 'Black Box' else 0
    return 0

document = Document('VRRPv3-PEGASUS.docx')
yes_count = 0
for table in document.tables:
    yes_count += table_test_automation(table)
print("Total No Of Black_box:", yes_count)
First I need to retrieve the Test Case Detail tables, and then I need to count the total number of black box tests. My docx contains nearly 300 of these tables, so it is hard to count the row values. Please help me.
Thanks in advance!
Try this:
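from docx import Document

doc = Document("sample.docx")
i = 0
for t in doc.tables:
    for ro in t.rows:
        # guard against rows with fewer than three cells in other kinds of tables
        if len(ro.cells) > 2 and ro.cells[0].text == "Test Type" and ro.cells[2].text == "Black Box":
            i = i + 1
print("Total Black Box Tests are: ", i)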
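With around 300 tables of different shapes, a fixed column index such as `cells[2]` can be fragile; python-docx may also return a horizontally merged cell once per grid column it spans, which could explain why the value sits at index 2 here. A hedged alternative is to check whether any cell after the label holds the wanted text. In this sketch the counting is kept separate from the docx parsing so it runs on plain lists of strings; `count_label_rows` is a hypothetical helper.

```python
def count_label_rows(tables, label="Test Type", wanted="Black Box"):
    """Count rows whose first cell is `label` and whose remaining cells include `wanted`.

    tables: iterable of tables, each an iterable of rows, each a list of cell strings
    """
    count = 0
    for table in tables:
        for row in table:
            cells = [text.strip() for text in row]
            # checking every later cell avoids depending on a fixed column index
            if cells and cells[0] == label and wanted in cells[1:]:
                count += 1
    return count


tables = [
    [["Test Case ID", "TC-1"], ["Test Type", "Black Box", "Black Box"]],
    [["Test Type", "White Box"]],
    [["Test Type", " Black Box "]],
]
print(count_label_rows(tables))  # 2
```

With python-docx, the input can be built as `[[cell.text for cell in row.cells] for row in table.rows]` for each table in `Document(path).tables`.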

Dynamic rows in a pandas dataframe

I have an Excel spreadsheet that I read using Python. I was looking for a way to query the first column of the spreadsheet and assign every cell from that column to a variable. The number of cells in the column that have data can change from day to day.
Excel Spreadsheet:
Names
Mike
Adam
Mitchell
Desired output: Name1=Mike; Name2=Adam; Name3=Mitchell. If tomorrow there is no Mitchell in the list, or if there is an additional name, I would have 2 or 4 Name variables, respectively.
My try so far was:
for i in db.index:
    if i == 1:
        Name1 = db.ix[0]['Names']
    elif i == 2:
        Name2 = db.ix[1]['Names']
    elif i == 3:
        Name3 = db.ix[2]['Names']
    else:
        Name4 = db.ix[3]['Names']
Thanks, and apologies for any mistakes.
I managed to fix this, in case anyone else has the same issue. I build two lists and zip them into a dictionary.
names = db['Names'].tolist()
lst = []
for i in range(1, len(names) + 1):  # start at 1 so the keys read Name1, Name2, ...
    lst.append(i)
lst = ['Name' + str(x) for x in lst]
dictionary = dict(zip(lst, names))
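A dictionary is the right structure here, since one variable per name can't adapt to a changing number of rows. A slightly more compact sketch numbers the names with `enumerate` and skips empty cells, which pandas reads as NaN; `names_to_dict` is a hypothetical helper that accepts any iterable, so it could be called as `names_to_dict(db['Names'])`.

```python
import math


def names_to_dict(names, prefix="Name"):
    """Return {"Name1": first, "Name2": second, ...} for every non-empty entry."""
    cleaned = []
    for name in names:
        # skip missing cells: None, NaN floats (how pandas stores blanks) and blank strings
        if name is None or (isinstance(name, float) and math.isnan(name)):
            continue
        text = str(name).strip()
        if text:
            cleaned.append(text)
    return {prefix + str(i): text for i, text in enumerate(cleaned, start=1)}


print(names_to_dict(["Mike", "Adam", "Mitchell"]))
# {'Name1': 'Mike', 'Name2': 'Adam', 'Name3': 'Mitchell'}
print(names_to_dict(["Mike", float("nan"), "Adam"]))
# {'Name1': 'Mike', 'Name2': 'Adam'}
```

Individual names are then looked up as `result["Name2"]`, and `len(result)` tells you how many there were that day.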
