Python openpyxl error - python

I want to open a file using Python and paste query results from Oracle into a specific sheet. I found a way to do this with xlsxwriter, but that is not the right tool for the job.
I can get my query to execute and append the results to a list. The result contains both strings and integers. I cannot get this to transfer to an Excel file. Any help would be great.
The error I'm getting is:
line 201, in _bind_value
raise ValueError("Cannot convert {0} to Excel".format(value))
ValueError: Cannot convert ('20 GA GV-CS-CT-DRY-G90 60xC COIL', 2, 848817, 982875, 1.15793510261929) to Excel
Code:
import cx_Oracle
import openpyxl
con = cx_Oracle.connect('example', 'example', "example")
cur = con.cursor()
heatmap_data = []
statement = """ select * from example"""
cur.arraysize = 2000
cur.execute(statement)
for result in cur:
    heatmap_data.append(result)
con.close()
file = "path/Test.xlsx"
wb = openpyxl.load_workbook(filename=file)
ws = wb.get_sheet_by_name("Sheet1")
row = 1
col = 1
for rowNum in range(2, len(heatmap_data)):
    ws.cell(row=row, column=col).value = heatmap_data[rowNum]
    row =+ 1
wb.save(file)

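openpyxl cannot write an iterable to a single cell. Each element of heatmap_data is a whole row tuple, and that tuple is what gets assigned to ws.cell(...).value, which is why the error message shows the full row.
If you want the whole row in one cell, convert it to a string first:
for rowNum, data in enumerate(heatmap_data):
    # str() is needed because each row mixes strings and numbers
    ws.cell(row=rowNum + 2, column=col).value = ", ".join(str(v) for v in data)
Also note that range(2, len(heatmap_data)) skips the first two rows of your data, that the column is never incremented, and that row =+ 1 assigns +1 to row instead of incrementing it.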

Resolved the issue with the following code:
row = 1
for i in heatmap_data:
    print(i[0], i[1], i[2], i[3], i[4])
    ws.cell(row=row, column=1).value = i[0]
    ws.cell(row=row, column=2).value = i[1]
    ws.cell(row=row, column=3).value = i[2]
    ws.cell(row=row, column=4).value = i[3]
    ws.cell(row=row, column=5).value = i[4]
    row += 1
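The same fix can be written without hard-coding five columns. This is a minimal sketch, assuming each element of heatmap_data is a tuple of values that each fit in one cell. iter_cells is a hypothetical helper, not part of openpyxl; it only computes the 1-based (row, column, value) triples that ws.cell expects:

```python
def iter_cells(rows, start_row=1, start_col=1):
    """Yield (row, column, value) triples with 1-based indices.

    Hypothetical helper: it flattens a list of row tuples into the
    individual values that a single worksheet cell can hold.
    """
    for r, record in enumerate(rows, start=start_row):
        for c, value in enumerate(record, start=start_col):
            yield r, c, value


# Sample row copied from the error message in the question.
heatmap_data = [
    ('20 GA GV-CS-CT-DRY-G90 60xC COIL', 2, 848817, 982875, 1.15793510261929),
]

for r, c, value in iter_cells(heatmap_data):
    # With a worksheet at hand this would be:
    #     ws.cell(row=r, column=c).value = value
    print(r, c, value)
```

openpyxl worksheets also provide ws.append(row), which writes one iterable as a new row below the last used row; that may be simpler when the target sheet starts out empty.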


How to iterate through multiple excel sheets using openpyxl library in Python?

I am using the openpyxl library to read an xlsx file, extract some of its contents, and add more strings to my txt output file. The Excel file I am currently using contains sheets named Summary and Employee. The code below works fine for my current Excel file. The issue is that I would like to use the same code to read another Excel file containing more sheets whose names I do not know in advance, so the sheet name in the line ws = wb['Employee'] will change every time. One thing I am sure about is that I don't want to read any data from the first sheet: all data extraction should happen from the second sheet onwards in all the xlsx files. I am not sure how to proceed from here, so any help will be appreciated.
Thanks in advance for your time and efforts!
Code:
from openpyxl import load_workbook
data_file='\\test.xlsx'
# Load the entire workbook.
wb = load_workbook(data_file)
ws = wb['Employee'] #Manually adding sheet name here
mylines={"Column_name":[],"Column_Type":[]} #Getting 2 columns data from row 6
type_strs = {
    'String': 'VARCHAR(256)',
    'Numeric': 'NUMBER',
    'Date': 'NUMBER(4,0)',
    'Int': 'NUMBER'
}
for index, value in enumerate(mylines["Column_Type"]):
    mylines["Column_Type"][index] = type_strs.get(value, value)
for i in range(6, ws.max_row+1):
    name = ws.cell(row=i, column=1).value
    name1=ws.cell(row=i, column=2).value
    mylines["Column_name"].append(name) #Appending dictionary key "Column_name"
    mylines["Column_Type"].append(name1) #Appending dictionary key "Column_Type"
for index, value in enumerate(mylines["Column_Type"]):
    mylines["Column_Type"][index] = type_strs.get(value, value)
theString = " "
for i in range(len(mylines['Column_name'])):
    theString += mylines['Column_name'][i] + " " + mylines['Column_Type'][i]
    if i < len(mylines['Column_name'])-1:
        theString += ", "
outputFile = open('/output.txt', 'w') # Text file Output
outputFile.write("CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE({});".format(theString) + "\n")
outputFile.close() #Closing file
Updated Code based on SO User comment:
from openpyxl import load_workbook
data_file='\\test.xlsx'
# Load the entire workbook.
wb = load_workbook(data_file)
#ws = wb['Employee'] #Manually adding sheet name here
mylines={"Column_name":[],"Column_Type":[]} #Getting 2 columns data from row 6
type_strs = {
    'String': 'VARCHAR(256)',
    'Numeric': 'NUMBER',
    'Date': 'NUMBER(4,0)',
    'Int': 'NUMBER'
}
for index, value in enumerate(mylines["Column_Type"]):
    mylines["Column_Type"][index] = type_strs.get(value, value)
skip = True
for ws in wb.worksheets:
    if skip == True:
        skip = False
    else:
        for i in range(6, ws.max_row+1):
            name = ws.cell(row=i, column=1).value
            name1=ws.cell(row=i, column=2).value
            mylines["Column_name"].append(name) #Appending dictionary key "Column_name"
            mylines["Column_Type"].append(name1) #Appending dictionary key "Column_Type"
for index, value in enumerate(mylines["Column_Type"]):
    mylines["Column_Type"][index] = type_strs.get(value, value)
theString = " "
for i in range(len(mylines['Column_name'])):
    theString += mylines['Column_name'][i] + " " + mylines['Column_Type'][i]
    if i < len(mylines['Column_name'])-1:
        theString += ", "
outputFile = open('/output.txt', 'w') # Text file Output
outputFile.write("CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE({});".format(theString) + "\n")
outputFile.close() #Closing file
Excel data
Sheet 1 (name: Summary): empty
Sheet 2 (name: Employee):
File Name: Employee
Sheet Name: Employee
File Type: csv
Field Name          Type
Name                String
Salary              Numeric
Date                Date
Phone               Int
Sheet 3 (name: Employee1):
File Name: Employee
Sheet Name: Employee1
File Type: csv
Field Name          Type
Employee Name       Date
Employee Salary     Int
Employment Date     Int
Office Phone        Int
To iterate through all worksheets in a workbook and read the data in each of them except the first, remove the line ws = wb['Employee'].
Then wrap the row loop in a for loop over the worksheets (insert it before the for i in range(6, ... line), like this:
skip = True
for ws in wb.worksheets:
    if skip == True:
        skip = False
    else:
        for i in range(6, ws.max_row+1):
            name = ws.cell(row=i, column=1).value
            ....
This reads every sheet except the first and appends its data to mylines.
Second Update
As you mentioned in the comment below, to write each sheet's SQL query on its own line, please make these additional changes:
Append a marker entry ('NextLine') to both dictionary lists to mark the end of a sheet. Take care that these lines run only after all rows of a particular sheet have been read.
Change the string building so that whenever a 'NextLine' marker is seen, the accumulated string is written to the output file. Note that while the NewFile flag is True the file is opened in 'w' mode, which overwrites any existing output.txt; every later statement is appended to it.
skip = True
for ws in wb.worksheets:
    if skip == True:
        skip = False
    else:
        for i in range(6, ws.max_row+1):
            name = ws.cell(row=i, column=1).value
            print(i, name)
            name1=ws.cell(row=i, column=2).value
            print(name1)
            mylines["Column_name"].append(name) #Appending dictionary key "Column_name"
            mylines["Column_Type"].append(name1) #Appending dictionary key "Column_Type"
        for index, value in enumerate(mylines["Column_Type"]):
            mylines["Column_Type"][index] = type_strs.get(value, value)
        mylines["Column_name"].append('NextLine')
        mylines["Column_Type"].append('NextLine')
theString = " "
NewFile = True
sheetList = wb.sheetnames
tabIndex = 1
for i in range(len(mylines['Column_name'])):
    if(mylines['Column_name'][i] != 'NextLine'):
        theString += mylines['Column_name'][i] + " " + mylines['Column_Type'][i]
        theString += ", "
    else:
        theString = theString[:-2]
        if NewFile:
            NewFile = False
            outputFile = open('output.txt', 'w') # Text file Output
            print("New file ", theString)
        else:
            outputFile = open('output.txt', 'a')
            print("Not new file ", theString)
        outputFile.write("CREATE TABLE TRANSIENT TABLE STG_" + sheetList[tabIndex] +"({});".format(theString) + "\n")
        outputFile.close()
        tabIndex += 1
        theString = " "
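The marker-based bookkeeping above can be avoided by building one statement per sheet. This is a rough sketch, assuming the rows of each sheet (from row 6 down) have already been read into (field name, type) pairs. build_create_statements and the sheets dictionary are hypothetical names, the sample data is abridged from the question, and the statement prefix is copied verbatim from the question:

```python
TYPE_STRS = {
    'String': 'VARCHAR(256)',
    'Numeric': 'NUMBER',
    'Date': 'NUMBER(4,0)',
    'Int': 'NUMBER',
}


def build_create_statements(sheets, prefix="CREATE TABLE TRANSIENT TABLE STG_"):
    """Return one statement per sheet, skipping the first sheet.

    `sheets` maps sheet name -> list of (field name, type) pairs, in
    workbook order. Hypothetical helper, not part of openpyxl.
    """
    statements = []
    for sheet_name, fields in list(sheets.items())[1:]:  # skip first sheet
        columns = ", ".join(
            "{} {}".format(name, TYPE_STRS.get(kind, kind))
            for name, kind in fields
        )
        statements.append("{}{}({});".format(prefix, sheet_name, columns))
    return statements


# Sample data abridged from the "Excel data" listing in the question.
sheets = {
    'Summary': [],
    'Employee': [('Name', 'String'), ('Salary', 'Numeric'),
                 ('Date', 'Date'), ('Phone', 'Int')],
    'Employee1': [('Employee Name', 'Date'), ('Employee Salary', 'Int')],
}

for statement in build_create_statements(sheets):
    print(statement)
```

With openpyxl, the dictionary could be filled by looping over wb.worksheets[1:] and using ws.title as the key, which also removes the need for the skip flag.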

Delete timestamp in my output file in excel using python

I need to remove the timestamps in my file, so that the first line of each entry contains only the name. My source text file looks like this:
7:52:01 AM sherr
hello GOOD morning .
おはようございます。
7:52:09 AM sherr
Who ?
誰?
7:52:16 AM sherr
OK .
わかりました
and my code looks like this:
from openpyxl import Workbook
import copy
wb = Workbook()
with open('chat_20220207131707.txt', encoding='utf-8') as sherr:
    row = 1
    column = 1
    ws = wb.active
    for line in sherr:
        ws.cell(row=row, column=column, value=line.strip())
        if (column := column + 1) > 3:
            row += 1
            column = 1
for row in ws.iter_rows():
    for cell in row:
        alignment = copy.copy(cell.alignment)
        alignment.wrapText=True
        cell.alignment = alignment
wb.save('sherrplease.xlsx')
If the file always has the same structure, which it certainly looks like, this can be done by simply splitting the string in question. The first line of each group has the form "time AM/PM name", so dropping the first two space-separated tokens leaves only the name.
from openpyxl import Workbook
import copy
wb = Workbook()
with open('chat_20220207131707.txt', encoding='utf-8') as sherr:
    row = 1
    column = 1
    ws = wb.active
    for line in sherr:
        if column == 1:
            ## split the line, drop the time and AM/PM tokens, and rejoin
            value = " ".join(line.strip().split(' ')[2:])
        else:
            value = line.strip()
        ws.cell(row=row, column=column, value=value)
        if (column := column + 1) > 3:
            row += 1
            column = 1
for row in ws.iter_rows():
    for cell in row:
        alignment = copy.copy(cell.alignment)
        alignment.wrapText=True
        cell.alignment = alignment
wb.save('sherrplease.xlsx')
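If the timestamp format might vary slightly, a regular expression is a somewhat more defensive alternative to splitting on spaces. This is a minimal sketch, assuming every timestamp looks like 7:52:01 AM (hours, minutes, seconds, then AM or PM); strip_timestamp is a hypothetical helper name:

```python
import re

# Matches a leading "H:MM:SS AM" or "HH:MM:SS PM" plus the whitespace after it.
TIMESTAMP = re.compile(r'^\d{1,2}:\d{2}:\d{2}\s+(?:AM|PM)\s+')


def strip_timestamp(line):
    """Remove a leading timestamp; lines without one are only stripped."""
    return TIMESTAMP.sub('', line.strip())


sample = ['7:52:01 AM sherr', 'hello GOOD morning .']
for line in sample:
    print(strip_timestamp(line))
```

Because lines without a timestamp pass through unchanged, the helper could be applied to every line, not only to the ones that land in the first column.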

AttributeError: 'pywintypes.datetime' object has no attribute 'nanosecond'

I have some code that opens an Excel file and saves it as a pandas DataFrame. It was originally used in Python 2.7, and I am currently trying to make it work under Python 3.
Originally, I used the code by @myidealab from this other post: From password-protected Excel file to pandas DataFrame.
It currently looks like this:
data_file = <path_for_file>
# Load excel file
xlApp = win32com.client.Dispatch("Excel.Application")
xlApp.Visible = False
pswd = getpass.getpass('password: ')
xldatabase = xlApp.Workbooks.Open(data_file, False, True, None, pswd)
dfdatabase = []
for sh in xldatabase.Sheets:
    xlsheet = xldatabase.Worksheets(sh.Name)
    # Get last_row
    row_num = 0
    cell_val = ''
    while cell_val != None:
        row_num += 1
        cell_val = xlsheet.Cells(row_num, 1).Value
    last_row = row_num - 1
    # Get last_column
    col_num = 0
    cell_val = ''
    while cell_val != None:
        col_num += 1
        cell_val = xlsheet.Cells(1, col_num).Value
    last_col = col_num - 1
    # Get content
    content = xlsheet.Range(xlsheet.Cells(1, 1), xlsheet.Cells(last_row, last_col)).Value
    # Load each sheet as a dataframe
    dfdatabase.append(pd.DataFrame(list(content[1:]), columns=content[0]))
Now, I am getting the following error:
AttributeError: 'pywintypes.datetime' object has no attribute 'nanosecond'
The problem seems to boil down to the lines below:
# Get content
content = xlsheet.Range(xlsheet.Cells(1, 1), xlsheet.Cells(last_row, last_col)).Value
# Load each sheet as a dataframe
dfdatabase.append(pd.DataFrame(list(content[1:]), columns=content[0]))
xlsheet.Range().Value reads the data and returns date cells as pywintypes datetime objects, which pd.DataFrame() fails to interpret.
Has anyone run into this issue before? Is there a way to tell xlsheet.Range().Value to read the values in a form that pandas can interpret?
Any help will be welcome!
Thank you.
This solves the issue, assuming you know beforehand the length and format of the dates and times in the Excel sheet.
There may well be more elegant ways to solve it.
Note: content is initially a tuple of rows. Position [0] holds the headers and the remaining positions hold the data.
import datetime
import pywintypes
...
content = xlsheet.Range(xlsheet.Cells(1, 1), xlsheet.Cells(last_row, last_col)).Value
head = content[0]
data = list(content[1:])
for x in range(0,len(data)):
    data[x] = list(data[x])
    for y in range(0,len(data[x])):
        if isinstance(data[x][y], pywintypes.TimeType):
            # Caution: rstrip("+00:00") strips any trailing '+', '0' and ':'
            # characters, not only the literal "+00:00" suffix.
            temp = str(data[x][y]).rstrip("+00:00").strip()
            if len(temp)>10:
                data[x][y] = datetime.datetime.strptime(temp, "%Y-%m-%d%H:%M")
            elif len(temp)>5 and len(temp)<=10:
                data[x][y] = datetime.datetime.strptime(temp, "%Y-%m-%d")
            elif len(temp)<=5:
                data[x][y] = datetime.datetime.strptime(temp, "%H:%M")
            print(data[x][y])
# Load each sheet as a dataframe
dfdatabase.append(pd.DataFrame(data, columns=head))
I used this as a reference:
python-convert-pywintyptes-datetime-to-datetime-datetime
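Another way to sidestep the string parsing is to rebuild each value from its date and time fields. This is only a sketch: it assumes the values returned by Range().Value behave like datetime.datetime subclasses (which is how pywin32 on Python 3 appears to expose them), and it uses a stand-in class so the example runs without Excel. to_plain_datetime and FakeComTime are hypothetical names:

```python
import datetime


def to_plain_datetime(value):
    """Copy a datetime-like value into a plain, naive datetime.datetime.

    Non-datetime values are returned unchanged, so the helper can be
    applied to every cell of a row. The time zone is dropped, not converted.
    """
    if isinstance(value, datetime.datetime):
        return datetime.datetime(value.year, value.month, value.day,
                                 value.hour, value.minute, value.second,
                                 value.microsecond)
    return value


class FakeComTime(datetime.datetime):
    """Stand-in for the COM date type; an assumption for this demo only."""


row = ('widget',
       FakeComTime(2020, 1, 31, 8, 30, tzinfo=datetime.timezone.utc),
       3.5)
clean = [to_plain_datetime(v) for v in row]
print(clean)
```

Applied to the answer's data list, this would replace the length checks and strptime calls with a single call per cell, provided the assumption about the value type holds in your environment.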

Trying to write to next line (python/win32com)

I am able to find the last row and I have my variable set; however, I'm getting this error: This object does not support enumeration
## Export Severity to Qualy Data ##
## access excel application and open active worksheet ##
Excel = win32.gencache.EnsureDispatch('Excel.application')
Excel.Visible = False
wb = Excel.Workbooks.Open('Qualys Data.xlsx')
sh = wb.ActiveSheet
## Find last row ##
lastRow = sh.UsedRange.Rows.Count
next_Row = lastRow+1
print("Next row to print: ", next_Row)
## Can i loop through the document, if on last_Row / print to last row?
for line in sh:
    if desired_row == next_Row:
        sh.Range("A1:A1").Value = reportDate
print("Exporting to data to new spreadsheet...")
time.sleep(1)
Excel.DisplayAlerts = 0
wb.SaveAs('Qualys Data.xlsx')
Excel.DisplayAlerts = True
Excel.Quit()
Basically, I want to write to the row after the last used row; next_Row holds the row number to write to.
It is not possible to iterate over wb.ActiveSheet, which is why you are getting the error message.
If you are just trying to append a single cell to the bottom of the first column, you can write directly to that cell; no enumeration is needed:
import win32com.client as win32
from datetime import datetime
import time
reportDate = datetime.now()
Excel = win32.gencache.EnsureDispatch('Excel.application')
Excel.Visible = False
wb = Excel.Workbooks.Open(r'e:\python temp\Qualys Data.xlsx')
ws = wb.ActiveSheet
## Find last row ##
lastRow = ws.UsedRange.Rows.Count
next_Row = lastRow + 1
print("Next row to print: ", next_Row)
ws.Cells(next_Row, 1).Value = reportDate
print("Exporting to data to new spreadsheet...")
time.sleep(1)
Excel.DisplayAlerts = 0
wb.SaveAs('Qualys Data.xlsx')
Excel.DisplayAlerts = True
Excel.Quit()
This is done using ws.Cells(next_Row, 1).Value = reportDate
In this example, it just appends the current date and time.
To write a list to the last row use the following:
data = ["cell 1", "cell 2", "cell 3", "cell 4"]
ws.Range(ws.Cells(next_Row, 1), ws.Cells(next_Row, len(data))).Value = data
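Range also accepts an A1-style address string (the question itself uses "A1:A1"), which can be handy for logging which cells are about to be written. A small sketch; column_letter and row_address are hypothetical helpers that do not touch Excel at all:

```python
def column_letter(index):
    """Convert a 1-based column index to Excel letters (1 -> A, 27 -> AA)."""
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def row_address(row, count, first_col=1):
    """Address of `count` cells in one row, e.g. row 5 with 4 cells -> 'A5:D5'."""
    last_col = first_col + count - 1
    return "{}{}:{}{}".format(column_letter(first_col), row,
                              column_letter(last_col), row)


data = ["cell 1", "cell 2", "cell 3", "cell 4"]
next_row = 5  # assumed value; in the answer it comes from UsedRange
print(row_address(next_row, len(data)))
```

The resulting string could then be used as ws.Range(address).Value = data, under the same assumptions as the answer above.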

read Chinese character from excel file python3

I have an Excel file that contains two columns: the first is in Chinese and the second is just a link.
I tried two methods I found here, but they didn't work and I can't print the values in the console. I changed the encoding setting in PyCharm to UTF-8, and it still doesn't work.
I tried both the pandas and xlrd libraries; neither worked for me, although they worked for others who posted.
This is my current code:
from xlrd import open_workbook
class Arm(object):
    def __init__(self, id, dsp_name):
        self.id = id
        self.dsp_name = dsp_name
    def __str__(self):
        return("Arm object:\n"
               " Arm_id = {0}\n"
               " DSPName = {1}\n"
               .format(self.id, self.dsp_name))
if __name__ == '__main__':
    wb = open_workbook('test.xls')
    for sheet in wb.sheets():
        print(sheet)
        number_of_rows = sheet.nrows
        number_of_columns = sheet.ncols
        items = []
        rows = []
        for row in range(1, number_of_rows):
            values = []
            for col in range(number_of_columns):
                value = str(sheet.cell(row, col).value)
                for a in value:
                    print('\n'.join([a]))
                values.append(value)
            print(value)
        for item in items:
            print (item)
            print("Accessing one single value (eg. DSPName): {0}".format(item.dsp_name))
            print
Obviously it's not working; I was just messing around with it after giving up.
File : http://www59.zippyshare.com/v/UxITFjis/file.html
It's not about encoding; you are not accessing the right rows.
On line 24:
for row in range(1, number_of_rows):
why do you start at 1 instead of 0?
Try:
for row in range(number_of_rows):
Well, the problem I had wasn't actually in reading the Chinese characters! My problem was in printing to the console.
I thought the console encoding was fine and that I was failing to read the characters, but this code works fine:
from xlrd import open_workbook
wb = open_workbook('test.xls')
messages = []
links = []
for sheet in wb.sheets():
    number_of_rows = sheet.nrows
    number_of_columns = sheet.ncols
    for row in range(1, number_of_rows):
        i = 0
        for col in range(number_of_columns):
            value = (sheet.cell(row,col).value).encode('gbk')
            if i ==0:
                messages.append(value)
            else:
                links.append(value)
            i+=1
print(links)
To check it, I pasted the first result into a Selenium driver (since I was going to use it anyway):
element = driver.find_element_by_class_name('email').send_keys(str(messages[0],'gbk'))
and it works like a charm!
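For context, in Python 3 the cell text is already a str, so encode('gbk') produces bytes and str(..., 'gbk') turns them back into the same text. If the real obstacle is a console that cannot display the characters, one option is to escape them instead of failing. A sketch under that assumption; console_safe is a hypothetical helper:

```python
def console_safe(text, encoding):
    """Return `text` with characters the encoding lacks shown as \\uXXXX."""
    raw = text.encode(encoding, errors="backslashreplace")
    return raw.decode(encoding)


message = "你好"

# Round trip through GBK, as in the answer: bytes out, same text back.
encoded = message.encode("gbk")
restored = str(encoded, "gbk")
print(restored == message)

# ASCII cannot represent the text, so it is escaped rather than raising.
print(console_safe(message, "ascii"))
```

The encoding passed to console_safe would normally be whatever the console reports, for example sys.stdout.encoding.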
