Erasing only part of an Excel sheet with win32com and Python

I have 3 parameters:
startLine, startColumn and width (here 2, 8, 3).
How can I erase the selected area without writing blanks into each cell?
(Here there are only 30 lines, but there could potentially be 10,000 lines.)
Right now I'm successfully counting the number of lines, but I can't find out how to select and delete an area.
self.startLine = 2
self.startColumn = 8
self.width = 3
self.xl = client.Dispatch("Excel.Application")
self.xl.Visible = 1
self.xl.ScreenUpdating = False
# raw string, so that "\t" is not read as a tab character
self.worksheet = self.xl.Workbooks.Open(r"c:\test.xls")
sheet = self.xl.Sheets("data")
# Count the number of lines of the record
nb = 0
while sheet.Cells(self.startLine + nb, self.startColumn).Value is not None:
    nb += 1
# must select from (startLine, startColumn) to (startLine + nb - 1, startColumn + width - 1)
# and then erase
self.worksheet.Save()
PS: the code works; I may have left out some parts due to a copy/paste error. In reality the handling of the Excel file is managed by several classes inheriting from each other.
Thanks.

What I usually do is record a macro in Excel and then try to rewrite the VB in Python. For deleting content I got something like this, which should not be hard to convert to Python:
Range("H5:J26").Select
Selection.ClearContents
In Python it should be something like:
self.xl.Range("H5:J26").Select()
self.xl.Selection.ClearContents()
Working example:
from win32com.client.gencache import EnsureDispatch
exc = EnsureDispatch("Excel.Application")
exc.Visible = 1
exc.Workbooks.Open(r"f:\Python\Examples\test.xls")
exc.Sheets("data").Select()
exc.Range("H5:J26").Select()
exc.Selection.ClearContents()

This worked for me (file is the path to the workbook):
from win32com.client.gencache import EnsureDispatch
xl = EnsureDispatch('Excel.Application')
wb2 = xl.Workbooks.Open(file)
ws=wb2.Worksheets("data")
ws.Range("A12:B20").ClearContents()
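Both answers hard-code the address of the area. A minimal sketch of how the address could instead be built from the three parameters in the question plus the counted number of lines is shown here; the helper names `column_letter` and `area_address` are hypothetical, not part of win32com.

```python
def column_letter(index):
    """Convert a 1-based column number to Excel letters (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def area_address(start_line, start_column, width, nb_lines):
    """Build an A1-style address for a block of nb_lines rows and width columns."""
    first_column = column_letter(start_column)
    last_column = column_letter(start_column + width - 1)
    last_line = start_line + nb_lines - 1
    return f"{first_column}{start_line}:{last_column}{last_line}"
```

With the question's variables, something like `sheet.Range(area_address(self.startLine, self.startColumn, self.width, nb)).ClearContents()` should then clear the whole block in one call, assuming `nb` is at least 1.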

Related

Reading and writing column data in Python with Pandas

This endeavour is a variation on the wonderful Mac Model Shelf. So far I have managed to write code that reads a single Mac serial number at the command line and gives back the corresponding model type, based on the last 3 or 4 characters of the serial.
Right now I am trying to write a script that reads the column data in an Excel file and returns the result for each cell in the neighbouring column.
The output Excel file would hopefully look something like this (with headers)...
Serial Model
C12PT70EG8WP Macbook Pro 2015 15" 2.5 Ghz i7
K12PT7EG0PW iMac 2010 Intel Core Duo 1.6 Ghz
This is all based on an Excel file that supplies its data to a Python shelve. Here is a small example of how it reads... I've called it 'pgList.xlsx' in the main code. In reality it will be hundreds of lines long.
G8WP Macbook Pro 2015 15" 2.5 Ghz i7
0PW iMac 2010 Intel Core Duo 1.6 Ghz
3RT iPad Pro 2017
Main python3 code...
import shelve
import pandas as pd
#getting the shelve/database ready from the library excel file
DBPATH = "/Users/me/PycharmProjects/shelve/macmodelshelfNEW"
databaseOfMacs = shelve.open(DBPATH)
excelDict = pd.read_excel('pgList.xlsx', header=None, index_col=0,squeeze=True).to_dict()
databaseOfMacs.update(excelDict)
#loading up the excel file and serial numbers I want to examine...
df = pd.read_excel('testSerials.xlsx', sheet_name='Sheet1')
listSerials = df['Serial']
listModels = df['Model']
for i in listSerials:
    inputSerial = i
    inputSerial = inputSerial.upper()
    modelCodeIsolatedFromSerial = ""
    if len(inputSerial) == 12:
        modelCodeIsolatedFromSerial = inputSerial[-4:]
    elif len(inputSerial) == 11:
        modelCodeIsolatedFromSerial = inputSerial[-3:]
    try:
        model = databaseOfMacs[modelCodeIsolatedFromSerial]
        #printing to console to check code works
        print(model)
    except:
        print("Result not found")
databaseOfMacs.clear()
databaseOfMacs.close()
Could you help me out with writing the results back to the same Excel file? For example, if the serial number is in cell A2, the result (the model type) would be written to B2.
I have tried including this line of code before the main 'for' loop, but it only ever serves to wipe the Excel file empty after running the script! I have commented it out for the moment.
writer = pd.ExcelWriter('testSerials.xlsx', engine='xlsxwriter')
Could you also help me handle any potential blank cells in the serials column?
A blank cell throws this error:
AttributeError: 'float' object has no attribute 'upper'
Thanks again for looking after me!
WL
UPDATE
The comments I have had up to now have really helped. I think the part where I am getting stuck is getting the output of the 'for' loop ('model' in this case) into the 'Model' column. The variable 'listModels' doesn't seem to behave like other lists in Python 3, i.e. I cannot append anything to it.
UPDATE 2
Some more tinkering, trying to get the result of the serial-number lookup of the values in "Serial" column into the "Model" column.
I have tried (without any real success)
try:
    model = databaseOfMacs[modelCodeIsolatedFromSerial]
    print(model)
    listModels.replace(['nan'], [model], inplace=True)
This doesn't give me an error message but still nothing appears in the outputted excel file.
When I run a for loop to print the contents of 'listModels' I just get back a list of "NaN"s, suggesting nothing at all has been changed... bummer!
I've also tried
try:
    model = databaseOfMacs[modelCodeIsolatedFromSerial]
    print(model)
    listModels[i] = model
This will spit back a console error about
A value is trying to be set on a copy of a slice from a DataFrame
but at least I can see the modelname relating to a serial number in the console when I iterate through 'listModels', still nothing in the output Excel file though (along with a 'nan' for every serial number that is examined?)
I am sure it's something small that I am missing in the code to fix this problem. Thanks again to anybody who can help me out.
UPDATE 3
I've solved it on my own. Just had to use a while loop instead.
sizeOfSerialsList = len(listSerials)
count = 0
while (count < sizeOfSerialsList):
    inputSerial = listSerials.iloc[count]
    inputSerial = str(inputSerial).upper()
    modelCodeIsolatedFromSerial = ""
    model = ""
    if len(inputSerial) == 12:
        modelCodeIsolatedFromSerial = inputSerial[-4:]
    elif len(inputSerial) == 11:
        modelCodeIsolatedFromSerial = inputSerial[-3:]
    try:
        model = databaseOfMacs[modelCodeIsolatedFromSerial]
        listModels.iloc[count] = model
    except:
        listModels.iloc[count] = "Not found"
    count = count + 1
From the XlsxWriter docs, you'll need to call df.to_excel(writer) followed by writer.save() (writer.close() in pandas 2.0 and later, where save() was removed).
To avoid that AttributeError, one fix (maybe not the most Python-3-esque?) is to change inputSerial = inputSerial.upper() to inputSerial = str(inputSerial).upper().
See Update 3 for code that solved the issue
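The lookup and the blank-cell handling discussed above can also be pulled into one small function. This is only a sketch: the name `lookup_model` and the `EXAMPLE_MODELS` dictionary (filled from the sample rows in the question) are assumptions standing in for the shelve, and any mapping with a `get` method would do.

```python
# Hypothetical stand-in for the shelve, using the sample rows from the question.
EXAMPLE_MODELS = {
    "G8WP": 'Macbook Pro 2015 15" 2.5 Ghz i7',
    "0PW": "iMac 2010 Intel Core Duo 1.6 Ghz",
    "3RT": "iPad Pro 2017",
}


def lookup_model(serial, models, default="Not found"):
    """Return the model for a serial number, or default if it cannot be resolved."""
    # Blank Excel cells arrive as float NaN, so anything that is not a string is skipped.
    if not isinstance(serial, str):
        return default
    serial = serial.strip().upper()
    if len(serial) == 12:
        code = serial[-4:]
    elif len(serial) == 11:
        code = serial[-3:]
    else:
        return default
    return models.get(code, default)


serials = ["C12PT70EG8WP", "K12PT7EG0PW", float("nan")]
results = [lookup_model(serial, EXAMPLE_MODELS) for serial in serials]
```

With the DataFrame from the question, assigning the whole column at once, for example `df['Model'] = df['Serial'].apply(lambda s: lookup_model(s, databaseOfMacs))`, and then calling `df.to_excel('testSerials.xlsx', index=False)` should avoid both the per-element loop and the "copy of a slice" warning.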

How to use different fonts for two lines within the same cell in Excel?

I have an Excel file with a table in A6:E233. I had to concatenate columns A and B so that the values from B are displayed on a new line. I achieved that with Excel's built-in CONCATENATE function (and CHAR(10) for the new line).
After concatenation the spreadsheet looks like this:
EXAMPLE1
Now I also need different formatting for each line inside the cell, namely size 12, bold for the first line and size 8 for the second line:
EXAMPLE2
How do I achieve this? If it were a short table, I would do it manually, but since I have a few files, totalling well over 5000 rows, an automated way would be better.
I have found answers that touch upon this problem, but since I don't know how to use VBA, I am more or less lost. I also use Python a lot and have looked through openpyxl and csv, but have not found a way to achieve this.
Thank you for your help!
With Excel VBA, you need to use the Characters property of the Range object. For example:
Sub Test()
    Dim rngCell As Range
    Dim lngPos As Long
    'get cell
    Set rngCell = Sheet1.Range("A1")
    'find linebreak
    lngPos = InStr(1, rngCell.Value, vbLf, vbBinaryCompare)
    'format either side
    rngCell.Characters(1, lngPos).Font.Bold = True
    rngCell.Characters(lngPos + 1, Len(rngCell.Value) - lngPos).Font.Color = 1234
End Sub
Which will format like this:
Here, try this code. I built this according to your screenshot.
Sub partialFormatting()
    Dim tws As Worksheet
    Dim fr As Long, lr As Long
    Dim pos As Long
    Dim r As Long
    Set tws = ThisWorkbook.Worksheets("Sheet1")
    fr = 7
    lr = tws.Range("A1000000").End(xlUp).Row
    For r = fr To lr
        With tws.Range("A" & r)
            pos = InStr(.Value, vbLf)
            'skip cells that contain no line break
            If pos > 0 Then
                With .Characters(Start:=1, Length:=pos - 1).Font
                    .FontStyle = "Bold"
                    .Size = 12
                End With
                With .Characters(Start:=pos + 1, Length:=Len(.Value) - pos).Font
                    .FontStyle = "Normal"
                    .Size = 8
                End With
            End If
        End With
    Next r
End Sub
Please let me know if you have any questions on how the code works!
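Since the question also mentions Python, the Start and Length arithmetic used with Characters in the VBA answers can be checked on its own. This is a rough sketch in which the function name `line_spans` is made up for illustration; it only computes the 1-based offsets and does not touch Excel.

```python
def line_spans(value):
    """Return ((start, length), (start, length)) for the two lines of a cell.

    Offsets are 1-based, matching what VBA's InStr and Characters expect.
    Returns None when the text contains no line feed.
    """
    index = value.find("\n")
    if index == -1:
        return None
    pos = index + 1  # 1-based position of the line feed, like InStr(value, vbLf)
    first = (1, pos - 1)
    second = (pos + 1, len(value) - pos)
    return first, second
```

For a cell holding "Title" and "sub" separated by a line feed, this gives a first span starting at 1 with length 5 and a second span starting at 7 with length 3, the same values the second macro would pass to Characters.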

dropping an image with win32com (MS WORD)

I am trying to create a series of name badges, and rather than doing it all by hand I'm trying to do it via Python. I am using the win32com.client approach to create a table in MS Word to hold each name badge. However, the images I am inserting into each cell are pushed up against the top of the cell, whereas I want them moved down a bit (the image is oversized, I know, but that can be dealt with later).
As you can see, the image is right up against the top border; I want it pushed down. I have tried adding newlines before it (demonstrated below), but this seems to have had no effect. This is my loop for generating the badges.
for i in range(10):
    cell_col = i % cols + 1
    # integer division, so the row index stays an int on Python 3 as well
    cell_row = i // cols + 1
    cell_range = table.Cell(cell_row, cell_col).Range
    cell_range.ParagraphFormat.SpaceBefore = 0
    cell_range.ParagraphFormat.SpaceAfter = 3
    table.Cell(cell_row, cell_col).Range.InsertBefore('\n')
    cell_range.InlineShapes.AddPicture(os.path.join(os.path.abspath("."), filename))
    table.Cell(cell_row, cell_col).Range.InsertAfter('\n'+hold[i])
    table.Cell(cell_row, cell_col).Height = 150
    table.Cell(cell_row, cell_col).Width = 250
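The row and column arithmetic in that loop can be checked separately from Word. A small sketch, assuming a hypothetical helper name `badge_cell`, that maps a 0-based badge index to the 1-based (row, column) pair that `table.Cell` expects:

```python
def badge_cell(index, cols):
    """Map a 0-based badge index to a 1-based (row, column) table position."""
    row, col = divmod(index, cols)  # integer quotient and remainder
    return row + 1, col + 1


# Positions of ten badges laid out in a table with two columns.
positions = [badge_cell(i, 2) for i in range(10)]
```

Using `divmod` keeps both values as integers regardless of the Python version.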

python win32com excel border formatting

I have a piece of code here that works for formatting borders in Excel using Python and win32com. My concern is the time it takes to format the borders. I tried recording a macro in Excel to find out what I needed to transpose into my script, but it didn't work.
So the best I can do is run a for loop that always starts at row 3 and goes up to a row counter called shn[1] in increments of 1, and from column 1 to 10 in increments of 1. From there I use "BorderAround()", which works fine but is too slow. Here is my piece of code:
for shn in [("Beam-Beam", bb_row, bb_col), ("Beam-Col", bc_row, bc_col)]:
    sheet = book.Worksheets(shn[0])
    sheet.Range( "J3:DW3" ).Copy()
    if shn[0] == "Beam-Col":
        sheet.Range( "J3:AA3" ).Copy()
    sheet.Range( sheet.Cells( 4, 10 ), sheet.Cells( shn[1]-1, 10 ) ).PasteSpecial()
    for mrow in range(3,shn[1],1):
        for mcol in range(1,10,1):
            sheet.Cells(mrow, mcol).BorderAround()#.Border(1)
Is there something I can do to format the borders with a range like ==> sheet.Range( sheet.Cells(3,1), sheet.Cells(shn[1],10) )? I tried ".Borders(11)" and ".Borders(12)" plus ".BorderAround()", but only ".BorderAround()" has worked.
Thanks in advance.
Hmm, which version of Excel are you using?
This should work:
for shn in [("Beam-Beam", bb_row, bb_col), ("Beam-Col", bc_row, bc_col)]:
    sheet = book.Worksheets(shn[0])
    sheet.Range( "J3:DW3" ).Copy()
    if shn[0] == "Beam-Col":
        sheet.Range( "J3:AA3" ).Copy()
    ## Set a variable named rng to the range
    rng = sheet.Range( sheet.Cells( 4, 10 ), sheet.Cells( shn[1]-1, 10 ) )
    rng.PasteSpecial()
    ## Using this range, we can now set its borders' line style and weight
    ## -> where 7 through 12 correspond to borders for xlEdgeLeft, xlEdgeTop,
    ##    xlEdgeBottom, xlEdgeRight, xlInsideVertical, and xlInsideHorizontal
    ## -> LineStyle of 1 = xlContinuous
    ## -> Weight of 2 = xlThin
    for border_id in range(7,13):
        rng.Borders(border_id).LineStyle=1
        rng.Borders(border_id).Weight=2
## And to finish just call
book.Close(True) # To close book and save
excel_app.Quit() # or some variable name established for the com instance
Let me know how this works for you.
It may also be faster if you set the Excel application's Visible property to False or turn off ScreenUpdating:
excel_app.Visible = False # This will not physically open the book
excel_app.ScreenUpdating = False # This will not update the screen on an open book
##
# Do Stuff...
##
# Just make sure when using the ScreenUpdating feature that you reenable it before closing
excel_app.ScreenUpdating = True
This way excel isn't updating the screen for every call.
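The bare numbers in the answer are easier to follow with names attached. The sketch below uses the numeric values of Excel's XlBordersIndex, XlLineStyle and XlBorderWeight constants; the helper name `apply_grid_borders` is hypothetical, and `rng` is assumed to be any object exposing `Borders(index)` the way a win32com Range does.

```python
# Numeric values of Excel's XlBordersIndex constants for the outer edges
# and the inner lines of a range.
XL_BORDER_INDEX = {
    "xlEdgeLeft": 7,
    "xlEdgeTop": 8,
    "xlEdgeBottom": 9,
    "xlEdgeRight": 10,
    "xlInsideVertical": 11,
    "xlInsideHorizontal": 12,
}
XL_CONTINUOUS = 1  # XlLineStyle.xlContinuous
XL_THIN = 2        # XlBorderWeight.xlThin


def apply_grid_borders(rng, line_style=XL_CONTINUOUS, weight=XL_THIN):
    """Set outer and inner borders on a whole range (hypothetical helper)."""
    for index in XL_BORDER_INDEX.values():
        border = rng.Borders(index)
        border.LineStyle = line_style
        border.Weight = weight
```

Called once as `apply_grid_borders(sheet.Range(sheet.Cells(3, 1), sheet.Cells(shn[1], 10)))`, this makes twelve COM property assignments per sheet instead of one `BorderAround()` call per cell, which is where the speed-up should come from.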

Quickly find differences between two large text files

I have two 3GB text files, each file has around 80 million lines. And they share 99.9% identical lines (file A has 60,000 unique lines, file B has 80,000 unique lines).
How can I quickly find those unique lines in the two files? Are there any ready-to-use command-line tools for this? I'm using Python, but I guess an efficient Pythonic way to load and compare the files is unlikely.
Any suggestions are appreciated.
If order matters, try the comm utility. If order doesn't matter, sort file1 file2 | uniq -u.
I think this is the fastest method (whether it's in Python or another language shouldn't matter too much IMO).
Notes:
1. I only store each line's hash to save space (and time if paging might occur).
2. Because of the above, I only print out line numbers; if you need actual lines, you'd just need to read the files in again.
3. I assume that the hash function results in no conflicts. This is nearly, but not perfectly, certain.
4. I import hashlib because the built-in hash() function is too short to avoid conflicts.
import sys
import hashlib
file = []
lines = []
for i in range(2):
    # open the files named in the command line
    file.append(open(sys.argv[1+i], 'r'))
    # stores the hash value and the line number for each line in file i
    lines.append({})
    # assuming you like counting lines starting with 1
    counter = 1
    while 1:
        # assuming default encoding is sufficient to handle the input file
        line = file[i].readline().encode()
        if not line: break
        hashcode = hashlib.sha512(line).hexdigest()
        lines[i][hashcode] = sys.argv[1+i]+': '+str(counter)
        counter += 1
unique0 = lines[0].keys() - lines[1].keys()
unique1 = lines[1].keys() - lines[0].keys()
result = [lines[0][x] for x in unique0] + [lines[1][x] for x in unique1]
# print "filename: line number" for every line found in only one file
print('\n'.join(result))
With 60,000 or 80,000 unique lines you could just create a dictionary for each unique line, mapping it to a number. mydict["hello world"] => 1, etc. If your average line is around 40-80 characters this will be in the neighborhood of 10 MB of memory.
Then read each file, converting it to an array of numbers via the dictionary. Those will fit easily in memory (2 files of 8 bytes * 3GB / 60k lines is less than 1 MB of memory). Then diff the lists. You could invert the dictionary and use it to print out the text of the lines that differ.
EDIT:
In response to your comment, here's a sample script that assigns numbers to unique lines as it reads from a file.
#!/usr/bin/python
class Reader:
    def __init__(self, file):
        self.count = 0
        self.dict = {}
        self.file = file
    def readline(self):
        line = self.file.readline()
        if not line:
            return None
        if line in self.dict:
            return self.dict[line]
        else:
            self.count = self.count + 1
            self.dict[line] = self.count
            return self.count
if __name__ == '__main__':
    print("Type Ctrl-D to quit.")
    import sys
    r = Reader(sys.stdin)
    result = 'ignore'
    while result:
        result = r.readline()
        print(result)
If I understand correctly, you want the lines of these files without duplicates. This does the job:
uniqA = set(open('fileA', 'r'))
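That line only builds the set for one file. A minimal sketch of how the idea might be completed with a set difference in both directions follows; the function name `unique_lines` is an assumption, and note that holding every line of two 3 GB files in sets could need far more memory than the files take on disk.

```python
def unique_lines(path_a, path_b):
    """Return (lines only in A, lines only in B) as two sets."""
    with open(path_a, "r") as handle_a:
        lines_a = set(handle_a)  # iterating a file yields lines, newline included
    with open(path_b, "r") as handle_b:
        lines_b = set(handle_b)
    return lines_a - lines_b, lines_b - lines_a
```

Order and duplicate counts within a file are lost with this approach, and a final line without a trailing newline will not compare equal to the same text followed by one.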
Python has difflib, which claims to be quite competitive with other diff utilities; see:
http://docs.python.org/library/difflib.html
