As the title suggests, I am trying to get the last row so I can determine the end of the range I need to copy for pasting into Outlook. I have to use win32com because it lets me pull data from an open Excel workbook, which pandas and openpyxl don't allow. Below is a snippet of what I tried.
I also tried sheet.max_row and even len(sheet['G']) to get the last row, but apparently the object doesn't allow enumeration.
import os
import win32com.client as client
from PIL import ImageGrab
from datetime import date
from openpyxl import load_workbook
(...)
excel = client.GetActiveObject('Excel.Application')
wb = excel.Workbooks('Test Work')
sheet = wb.Sheets['PIVOT TABLE']
loc = str(sheet.Range('B5'))
locEmail = loc_EmailName(loc.split(',')[0])
copyrange = sheet.Range('A4:G11')
copyrange.CopyPicture(Appearance=1, Format=2)
(...)
def get_column_after(self, column, offset):
    global counter
    for item in self.ws.Range("{0}{1}:{0}{2}".format(column, offset, self.get_last_row_from_column(column))).Value:
        print(item[0])
        counter = counter + 1
I simply added a counter so that I could get the final row number, and used that to adjust the range I copy. Thank you to @BigBen for the pointer to the answer in "how to find if last row has value for a specific column in Excel using Python Win32com".
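As an alternative to counting cells in Python, Excel itself can report the last used row of a column, the same way Ctrl+Up would from the bottom of the sheet. The sketch below assumes `sheet` is a win32com worksheet object like the one in the question; the helper names `last_used_row` and `copy_address` are hypothetical, and -4162 is the numeric value of Excel's xlUp constant.

```python
XL_UP = -4162  # numeric value of Excel's xlUp constant


def last_used_row(sheet, column):
    """Return the row number of the last non-empty cell in `column`.

    `sheet` is expected to be an Excel worksheet COM object and `column`
    a 1-based column index (7 for column G).
    """
    bottom_cell = sheet.Cells(sheet.Rows.Count, column)
    return bottom_cell.End(XL_UP).Row


def copy_address(first_cell, last_column, last_row):
    """Build an A1-style address such as 'A4:G11'."""
    return "{}:{}{}".format(first_cell, last_column, last_row)


# Intended use with the objects from the question (needs a running Excel):
# last_row = last_used_row(sheet, 7)
# copyrange = sheet.Range(copy_address("A4", "G", last_row))
```

If the column is completely empty, this approach lands on row 1, so the result may need a check before it is used as the end of a range.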
Related
I'm a beginner at Python, and I have been trying my hand at some projects. I have an Excel spreadsheet that contains a column of URLs. I want to open each URL, pull some data from it, output that data to a different column in my spreadsheet, and then move down to the next URL and repeat.
I was able to write code that completes almost the entire process if I enter a single URL, but I suck at creating loops.
My list is only 10 cells long.
My question is: what code can I use to loop through a column until it hits a stopping point?
import urllib.request, csv, pandas as pd
from openpyxl import load_workbook
xl = pd.ExcelFile("filename.xlsx")
ws = xl.parse("Sheet1")
i = 0 # This is where I insert the row number for a specific URL
urlpage = str(ws['URLPage'][i]) # 'URLPage' is the name of the column in Excel
p = urlpage.replace(" ", "") # This line is for deleting whitespace in my URL
response = urllib.request.urlopen(p)
Also, as stated, I'm new to Python, so if you see where I can improve the code I already have, please let me know.
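One way to loop down a column until a stopping point is to treat the first empty cell as the stop signal. The sketch below is only an illustration: `values_until_blank` is a hypothetical helper, the example.com URLs are placeholders, and a plain list stands in for `ws['URLPage']` so the loop can be tried without a spreadsheet.

```python
def values_until_blank(cells):
    """Yield cleaned cell values until the first empty cell is reached."""
    for cell in cells:
        # pandas reads empty cells as NaN, a float that is not equal to itself
        if cell is None or cell != cell:
            break
        text = str(cell).replace(" ", "")  # same whitespace cleanup as above
        if not text:
            break
        yield text


# Stand-in for ws['URLPage']; a pandas Series can be passed the same way.
column = ["http://example.com/a", " http://example.com/b ", None, "http://example.com/c"]

for url in values_until_blank(column):
    print(url)  # in the real script: urllib.request.urlopen(url)
```

Because the helper is a generator, the row counter `i` is no longer needed; each URL is handed to the loop body one at a time.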
I have used openpyxl for outputting values to Excel from my Python code. However, I now find myself in a situation where the cell locations in the Excel file may change depending on the user. To avoid problems with the program, I want to name the cells that the code saves its output to. Is there any way to have Python interact with named ranges in Excel?
For a workbook level defined name
import openpyxl
wb = openpyxl.load_workbook("c:/tmp/SO/namerange.xlsx")
ws = wb["Sheet1"]
mycell = wb.defined_names['mycell']
for title, coord in mycell.destinations:
    ws = wb[title]
    ws[coord] = "Update"
wb.save('updated.xlsx')
print("{} {} updated".format(ws,coord))
I was able to find the parameters of the named range using defined_names. After that I just worked like it was a normal Excel cell.
from openpyxl import load_workbook
openWB=load_workbook('test.xlsx')
rangeDestination = openWB.defined_names['testCell']
print(rangeDestination)
sheetName=str(rangeDestination.attr_text).split('!')[0]
cellName = str(rangeDestination.attr_text).split('!')[1]
sheetToWrite=openWB[sheetName]
cellToWrite=sheetToWrite[cellName]
sheetToWrite[cellName]='TEST-A3'
print(sheetName)
print(cellName)
openWB.save('test.xlsx')
openWB.close()
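The two `split('!')` calls above can be folded into one small helper. This is a sketch under assumptions: `split_destination` is a hypothetical name, and the handling of quoted sheet names reflects how Excel writes references to sheets whose names contain spaces. Stripping the `$` markers is optional and only done here to get a plain cell name.

```python
def split_destination(attr_text):
    """Split a defined-name target like "Sheet1!$A$3" into (sheet, cell)."""
    # rpartition splits on the last '!', which separates sheet from cell
    sheet_name, _, cell_ref = attr_text.rpartition("!")
    # Excel quotes sheet names that contain spaces: 'My Sheet'!$A$3
    if sheet_name.startswith("'") and sheet_name.endswith("'"):
        sheet_name = sheet_name[1:-1].replace("''", "'")
    # Drop the absolute-reference markers: "$A$3" -> "A3"
    return sheet_name, cell_ref.replace("$", "")


print(split_destination("Sheet1!$A$3"))
print(split_destination("'My Sheet'!$B$2"))
```

With the workbook from the snippet above, the result of `split_destination(rangeDestination.attr_text)` would take the place of `sheetName` and `cellName`.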
Is it possible to search or parse through two columns in Excel (let's say columns C and D) and find only the fields with underscores, using Python?
Maybe code like this? Not too sure:
Import xl.range
Columns = workbook.get("C:D"))
Extract = re.findall(r'\(._?)\', str(Columns)
Please let me know if my code can be further improved on! :)
For those who need an answer, I solved it using this code:
import os
import re
from os.path import join
import openpyxl
from openpyxl.reader.excel import load_workbook
dict_folder = "C:/...../abc"
fieldset = set()  # unique underscore-containing fields found so far
for file in os.listdir(dict_folder):
    if file.endswith(".xlsx"):
        wb1 = load_workbook(join(dict_folder, file), data_only = True)
        ws = wb1.active
        # ws["C" : "D"] yields one tuple of cells per column
        for rowofcellobj in ws["C" : "D"]:
            for cellobj in rowofcellobj:
                data = re.findall(r"\w+_.*?\w+", str(cellobj.value))
                if data != []:
                    fields = data[0]
                    fieldset.add(fields)
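The regular-expression step in the loop above can be tried on its own, without a workbook. The sketch below is an illustration rather than the code from the answer: it uses a simpler pattern, `\w*_\w*`, collects every match in a cell instead of only the first, and `underscored_fields` is a hypothetical helper name.

```python
import re

UNDERSCORED = re.compile(r"\w*_\w*")  # a run of word characters containing "_"


def underscored_fields(values):
    """Collect every underscore-containing token found in the given cell values."""
    found = set()
    for value in values:
        if value is None:  # openpyxl reads empty cells as None
            continue
        found.update(UNDERSCORED.findall(str(value)))
    return found


cells = ["user_id", "plain text", None, "first_name and last_name", 42]
print(sorted(underscored_fields(cells)))
```

In the loop above, `cellobj.value` for each cell in columns C and D would be the values passed in.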
Yes, it is indeed possible. The main library for this is pandas. With Python and pandas installed, you could do something along the lines of
import pandas as pd
# Reading the Excel worksheet into a pandas.DataFrame type object
sheet_path = 'C:\\Path\\to\\excel\\sheet.xlsx'
df = pd.read_excel(sheet_path)
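# Keep rows where column 'C' or 'D' contains an underscore.
# Assumes the header row names these columns 'C' and 'D'; na=False treats empty cells as non-matches.
underscored = df[df['C'].str.contains('_', na=False) | df['D'].str.contains('_', na=False)]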
And that would do it for columns C and D within your worksheet.
The pandas documentation is extensive, but for this purpose the read_excel function documentation (which has examples) should suffice, along with some general Python material if needed.
I have to write some data into an existing xls file. (I should say that I'm working on Unix and can't use Windows.)
I prefer to work with Python and have tried some libraries like xlwt, openpyxl, and xlutils.
It's not working, because there is a filter in my xls file. After rewriting the file, the filter disappears, but I still need it.
Could someone tell me what options I have?
Help, please!
Example:
from xlutils.copy import copy
from xlrd import open_workbook
from xlwt import easyxf
start_row=0
rb=open_workbook('file.xls')
r_sheet=rb.sheet_by_index(1)
wb=copy(rb)
w_sheet=wb.get_sheet(1)
for row_index in range(start_row, r_sheet.nrows):
    row=r_sheet.row_values(row_index)
    call_index=0
    for c_el in row:
        value=r_sheet.cell(row_index, call_index).value
        w_sheet.write(row_index, call_index, value)
        call_index+=1
wb.save('file.out.xls')
I also tried:
import xlrd
from openpyxl import Workbook
import unicodedata
rb=xlrd.open_workbook('file.xls')
sheet=rb.sheet_by_index(0)
wb=Workbook()
ws1=wb.create_sheet("Results", 0)
for rownum in range(sheet.nrows):
    row=sheet.row_values(rownum)
    arr=[]
    for c_el in row:
        arr.append(c_el)
    ws1.append(arr)
ws2=wb.create_sheet("Common", 1)
sheet=rb.sheet_by_index(1)
for rownum in range(sheet.nrows):
    row=sheet.row_values(rownum)
    arr=[]
    for c_el in row:
        arr.append(c_el)
    ws2.append(arr)
ws2.auto_filter.ref=["A1:A15", "B1:B15"]
#ws['A1']=42
#ws.append([1,2,3])
wb.save('sample.xls')
The problem still exists. OK, I'll try to find a machine running Windows, but I have to admit something else:
There are some rows like this:
(image not included)
I've understood what I was doing wrong, but I still need help.
First of all, I have one sheet that contains some values.
The second sheet contains a summary table.
If I try to copy this worksheet, it comes out wrong.
So, the question is: how could I make a summary table from the first sheet?
Suppose your existing Excel file has two columns (date and number).
This is how you can append an additional row using openpyxl.
import openpyxl
import datetime
wb = openpyxl.load_workbook('existing_data_file.xlsx')
sheet = wb['Sheet1']
# max_row is the last used row; write to the row after it (rows and columns are 1-based)
a = sheet.max_row + 1
sheet.cell(row=a, column=1).value = datetime.date.today()
sheet.cell(row=a, column=2).value = 30378
wb.save('existing_data_file.xlsx')
If you are on Windows, I would suggest you take a look at using the win32com.client approach. This allows you to interact with your spreadsheet using Excel itself. This will ensure that any existing filters, images, tables, macros etc should be preserved.
The following example opens an XLS file, adds one entry, and saves the whole workbook as a different XLS-formatted file:
import win32com.client as win32
import os
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Open(r'input.xls')
ws = wb.Worksheets(1)
# Write a value at A1
ws.Range("A1").Value = "Hello World"
excel.DisplayAlerts = False # Allow file overwrite
wb.SaveAs(r'sample.xls', FileFormat=56)
excel.Application.Quit()
Note, make sure you add full paths to your input and output files.
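A short sketch of one way to build those full paths follows. It assumes the files live in the script's current working directory; `full_path` is a hypothetical helper, and the file names are the ones from the example above. The reason this tends to matter is that Excel resolves relative paths against its own working directory, which is usually not the directory the script runs in.

```python
from pathlib import Path


def full_path(name, base_dir=None):
    """Return an absolute path string for `name`, relative to `base_dir` or the current directory."""
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    return str((base / name).resolve())


input_path = full_path("input.xls")
output_path = full_path("sample.xls")
print(input_path)
# These strings would be passed to excel.Workbooks.Open(...) and wb.SaveAs(...)
```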
I have been able to get the values of the column output as a separated list. However, I need to retain these values and use them one by one to perform an Amazon lookup. The Amazon lookup is not the problem; getting xlrd to give one value at a time has been. Is there also an efficient way of setting a timer in Python? The only answer I have found to the timer issue is recording the time the process started and counting from there. I would prefer just a timer. This question is somewhat two parts; here is what I have done so far.
I load the spreadsheet with xlrd using argv[1] and copy it to a new spreadsheet name using argv[2]. I need argv[3] to be the timer entity, but I am not that far yet.
I have tried:
import sys
import datetime
import os
import xlrd
from xlrd.book import colname
from xlrd.book import row
import xlwt
import xlutils
import shutil
import bottlenose
AMAZON_ACCESS_KEY_ID = "######"
AMAZON_SECRET_KEY = "####"
print "Executing ISBN Amazon Lookup Script -- Please be sure to execute it python amazon.py input.xls output.xls 60(seconds between database queries)"
print "Copying original XLS spreadsheet to new spreadsheet file specified as the second argument on the command line."
print "Loading Amazon Account information . . "
amazon = bottlenose.Amazon(AMAZON_ACCESS_KEY_ID, AMAZON_SECRET_KEY)
response = amazon.ItemLookup(ItemId="row", ResponseGroup="Offer Summaries", SearchIndex="Books", IdType="ISBN")
shutil.copy2(sys.argv[1], sys.argv[2])
print "Opening copied spreadsheet and beginning ISBN extraction. . ."
wb = xlrd.open_workbook(sys.argv[2])
print "Beginning Amazon lookup for the first ISBN number."
for row in colname(colx=2):
    print amazon.ItemLookup(ItemId="row", ResponseGroup="Offer Summaries", SearchIndex="Books", IdType="ISBN")
I know this is a little vague. Should I perhaps try something like column = colname(colx=2), so that I could then do for row in column:? Any help or direction is greatly appreciated.
The use of colname() in your code is simply going to return the name of the column (e.g. 'C' by default in your case unless you've overridden the name). Also, the use of colname is outside the context of the contents of your workbook. I would think you would want to work with a specific sheet from the workbook you are loading, and from within that sheet you would want to reference the values of a column (2 in the case of your example), does this sound somewhat correct?
wb = xlrd.open_workbook(sys.argv[2])
sheet = wb.sheet_by_index(0)
for row in sheet.col(2):
    print amazon.ItemLookup(ItemId="row", ResponseGroup="Offer Summaries", SearchIndex="Books", IdType="ISBN")
Although I think looking at the call to amazon.ItemLookup() you probably want to refer to row and not to "row" as the latter is simply a string and the former is the actual contents of the variable named row from your for loop.
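Putting both points together, and covering the timer part of the question, a loop might pass each cell value to the lookup and pause between queries with `time.sleep`. The sketch below is written for Python 3 and makes several assumptions: `lookup_each` is a hypothetical helper, the lookup is passed in as a function so the loop can be tried without Amazon credentials, and the delay is taken to be a number of seconds as in the usage message from the question.

```python
import time


def lookup_each(values, lookup, delay_seconds, sleep=time.sleep):
    """Call `lookup` once per non-empty value, pausing between calls."""
    results = []
    for value in values:
        if value in ("", None):  # skip empty cells
            continue
        if results:
            sleep(delay_seconds)  # wait between queries, not before the first
        results.append(lookup(value))
    return results


# With the objects from the question this might be called as:
# isbns = sheet.col_values(2)
# lookup_each(
#     isbns,
#     lambda isbn: amazon.ItemLookup(ItemId=isbn, ResponseGroup="Offer Summaries",
#                                    SearchIndex="Books", IdType="ISBN"),
#     float(sys.argv[3]),
# )
```

Note that `sheet.col(2)` returns cell objects whose contents are in `.value`, while `sheet.col_values(2)` returns the values directly. xlrd reads numeric cells as floats, so ISBNs stored as numbers may need converting to a string before the lookup.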