I am working on a project where I am converting some VBA code to Python so that Python can interact with Excel in much the same way VBA would. In this particular case, I am using the win32com library to have Python extract data from an Oracle database via an ADODB connection and write the resulting recordset directly to a pivot cache, i.e. creating a pivot table with data from an external source.
import win32com.client
Excel = win32com.client.gencache.EnsureDispatch('Excel.Application')
win32c = win32com.client.constants
# Create and Open Connection
conn = win32com.client.Dispatch(r'ADODB.Connection')
DSN = 'Provider=OraOLEDB.Oracle; Data Source=localhost:1521/XEPDB1; User Id=system; Password=password;'
conn.Open(DSN)
# Create Excel File
wb = Excel.Workbooks.Add()
Sheet1 = wb.Worksheets("Sheet1")
# Create Recordset
RS = win32com.client.Dispatch(r'ADODB.Recordset')
RS.Open('SELECT * FROM employees', conn, 1, 3)
# Create Pivot Cache
PivotCache = wb.PivotCaches().Create(SourceType=win32c.xlExternal, Version=win32c.xlPivotTableVersion15)
# Write Recordset to Pivot Cache
PivotCache.Recordset = RS # <~~ This is where it breaks!
# Create Pivot Table
Pivot = PivotCache.CreatePivotTable(TableDestination=Sheet1.Cells(2, 2), TableName='Python Test Pivot', DefaultVersion=win32c.xlPivotTableVersion15)
# Close Connection
RS.Close()
conn.Close()
# View Excel
Excel.Visible = 1
I am successful in extracting the data via ADODB and creating an Excel file, but when I try to write the resulting recordset to the pivot cache by setting PivotCache.Recordset = RS, I get the following error.
[Running] venv\Scripts\python.exe "c:\Project\Test\debug_file_test.py"
Traceback (most recent call last):
File "c:\Project\Test\debug_file_test.py", line 29, in <module>
PivotCache.Recordset = RS # <~~ This is where it breaks!
File "c:\Project\venv\lib\site-packages\win32com\client\__init__.py", line 482, in __setattr__
self._oleobj_.Invoke(*(args + (value,) + defArgs))
pywintypes.com_error: (-2147352567, 'Exception occurred.', (0, None, 'No such interface supported\r\n', None, 0, -2146827284), None)
[Done] exited with code=1 in 0.674 seconds
Can anybody shed some light on what I am doing wrong?
I ended up finding a solution to the issue, and want to post an answer for anyone who may come across this question at some point.
Instead of creating the recordset with Recordset.Open(), I tried using the command object and creating the recordset with cmd.Execute(). As it turns out, Execute() returns a tuple, so I had to pass cmd.Execute()[0] to the recordset to make it work.
This doesn't answer why my initial code doesn't work, but it does provide an answer for how to write an ADODB recordset to a PivotCache with Python.
import win32com.client
#Initiate Excel Application
Excel = win32com.client.gencache.EnsureDispatch('Excel.Application')
win32c = win32com.client.constants
# Create and Open Connection
conn = win32com.client.Dispatch('ADODB.Connection')
cmd = win32com.client.Dispatch('ADODB.Command')
DSN = 'Provider=OraOLEDB.Oracle; Data Source=localhost:1521/XEPDB1; User Id=system; Password=password;'
conn.Open(DSN)
# Define Command Properties
cmd.ActiveConnection = conn
cmd.ActiveConnection.CursorLocation = win32c.adUseClient
cmd.CommandType = win32c.adCmdText
cmd.CommandText = 'SELECT * FROM employees'
# Create Excel File
wb = Excel.Workbooks.Add()
Sheet1 = wb.Worksheets("Sheet1")
# Create Recordset
RS = win32com.client.Dispatch('ADODB.Recordset')
RS = cmd.Execute()[0]
# Create Pivot Cache
PivotCache = wb.PivotCaches().Create(SourceType=win32c.xlExternal, Version=win32c.xlPivotTableVersion15)
PivotCache.Recordset = RS
# Create Pivot Table
Pivot = PivotCache.CreatePivotTable(TableDestination=Sheet1.Cells(2, 2), TableName='Python Test Pivot', DefaultVersion=win32c.xlPivotTableVersion15)
# Close Connection
RS.Close()
conn.Close()
# View Excel
Excel.Visible = 1
Update
As hinted by @Parfait, the code above also works if RS = cmd.Execute()[0] is replaced by
RS.Open(cmd)
which I actually prefer, because it keeps the Python syntax aligned with the VBA syntax.
Related
I am trying to fetch data from a specific Cassandra table and insert it, after making some changes, into another table. Both tables are located in the keyspace "test". Fetching the data from the first table works fine. However, in the future handler that processes the output of the first query, the insert into the other table under the same Cassandra instance fails. The application raises the error cassandra.cluster.NoHostAvailable: ("Unable to connect to any servers using keyspace 'test'", ['127.0.0.1']). I am not sure where I am going wrong.
import threading
from threading import Event
from cassandra.query import SimpleStatement
from cassandra.cluster import Cluster

hosts = ['127.0.0.1']
keyspace = "test"
thread_local = threading.local()
cluster_ = Cluster(hosts)

def get_session():
    if hasattr(thread_local, "cassandra_session"):
        print("got session from threadlocal")
        return thread_local.cassandra_session
    print(" Connecting to Cassandra Host " + str(hosts))
    session_ = cluster_.connect(keyspace)
    print(" Connecting and creating session to Cassandra KeySpace " + keyspace)
    thread_local.cassandra_session = session_
    return session_

class PagedResultHandler(object):
    def __init__(self, future):
        self.error = None
        self.finished_event = Event()
        self.future = future
        self.future.add_callbacks(
            callback=self.handle_page,
            errback=self.handle_error)

    def handle_page(self, rows):
        for row in rows:
            process_row(row)
        if self.future.has_more_pages:
            self.future.start_fetching_next_page()
        else:
            self.finished_event.set()

    def handle_error(self, exc):
        self.error = exc
        self.finished_event.set()

def process_row(row):
    print(row)
    session_ = get_session()
    stmt = session_.prepare(
        "INSERT INTO test.data(customer,snr,rttt, event_time) VALUES (?,?,?,?)")
    results = session_.execute(stmt,
        [row.customer, row.snr, row.rttt, row.created_time])
    print("Done")

session = get_session()
query = "select * from test.data_log"
statement = SimpleStatement(query, fetch_size=1000)
future = session.execute_async(statement)
handler = PagedResultHandler(future)
handler.finished_event.wait()
if handler.error:
    raise handler.error
cluster_.shutdown()
However, when I execute the Python file, the application throws "cassandra.cluster.NoHostAvailable: ("Unable to connect to any servers using keyspace 'test'", ['127.0.0.1'])" from the get_session() call inside the process_row method. Clearly, the first call to Cassandra succeeds without any issues. There is no connectivity issue, and the Cassandra instance is running fine locally; I am able to query the data using cqlsh. If I call process_row outside the future handler, everything works fine, so I am not sure what needs to be done to make it work from the future handler.
Connecting to Cassandra Host ['127.0.0.1']
Connecting and creating session to Cassandra KeySpace test
Row(customer='abcd', snr=100, rttt=121, created_time=datetime.datetime(2020, 8, 8, 2, 26, 51))
Connecting to Cassandra Host ['127.0.0.1']
Traceback (most recent call last):
File "test/check.py", in <module>
raise handler.error
File "cassandra/cluster.py", line 4579, in cassandra.cluster.ResponseFuture._set_result
File "cassandra/cluster.py", line 4777, in cassandra.cluster.ResponseFuture._set_final_result
File "test/check.py", in handle_page
process_row(row)
File "test/check.py", in process_row
session_ = get_session()
File "/test/check.py", in get_session
session_ = cluster_.connect(keyspace)
File "cassandra/cluster.py", line 1715, in cassandra.cluster.Cluster.connect
File "cassandra/cluster.py", line 1772, in cassandra.cluster.Cluster._new_session
File "cassandra/cluster.py", line 2553, in cassandra.cluster.Session.__init__
cassandra.cluster.NoHostAvailable: ("Unable to connect to any servers using keyspace 'test'", ['127.0.0.1'])
Process finished with exit code 1
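For context on the answer below: threading.local gives each thread an independent attribute namespace, so a session cached by the main thread is invisible inside the driver's callback thread, and get_session() attempts a fresh connect from there. A quick plain-Python illustration of this behavior (no Cassandra involved; all names here are made up for the demo):

```python
import threading

thread_local = threading.local()
seen = []

def get_value():
    # Stand-in for get_session(): cache a per-thread value on first access.
    if not hasattr(thread_local, "value"):
        thread_local.value = threading.current_thread().name
    return thread_local.value

seen.append(get_value())  # main thread populates its own slot
t = threading.Thread(target=lambda: seen.append(get_value()), name="worker")
t.start()
t.join()

print(seen)  # the two threads saw different cached values
```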
Ok so Cassandra recommends the following:
Use at most one Session per keyspace, or use a single Session and explicitly specify the keyspace in your queries
https://www.datastax.com/blog/4-simple-rules-when-using-datastax-drivers-cassandra
In your code you try to create a session every time the read query has retrieved some rows.
To force the code to use at most one session, we can create a queue through which the child thread sends each row to the main thread, and the main thread then executes the insert query. We do the inserts in the main thread because I've run into issues executing queries from a child thread.
from queue import Queue, Empty
from cassandra.query import dict_factory

callback_queue = Queue()
session = cluster_.connect(keyspace)
session.row_factory = dict_factory  # because the queue doesn't accept a Row instance

class PagedResultHandler(object):
    ...
    def handle_page(self, rows):
        for row in rows:
            callback_queue.put(row)  # here we pass the row as a dict to the queue
    ...

def process_rows():
    while True:
        try:
            row = callback_queue.get()  # here we retrieve the row as a dict from the child thread
            stmt = session.prepare(
                "INSERT INTO test.data(customer,snr,rttt, event_time) VALUES (?,?,?,?)")
            results = session.execute(stmt,
                [row['customer'], row['snr'], row['rttt'], row['created_time']])
            print("Done")
        except Empty:
            pass
query = "select * from test.data_log"
statement = SimpleStatement(query, fetch_size=1000)
future = session.execute_async(statement)
handler = PagedResultHandler(future)
process_rows() # for now the code will hang here because we have an infinite loop in this function
handler.finished_event.wait()
if handler.error:
raise handler.error
cluster_.shutdown()
This will get it to work, but I would replace the while True, or else you will end up stuck in an infinite loop.
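One way to avoid that infinite loop is a sentinel object: the callback thread pushes rows onto the queue and pushes the sentinel when it is done, and the main loop exits as soon as it sees it. A minimal sketch with plain Python stand-ins for the Cassandra calls (the row dicts and function names are placeholders):

```python
import threading
from queue import Queue

callback_queue = Queue()
SENTINEL = object()  # signals "no more rows"

def producer(rows):
    # stands in for handle_page() running on the driver's callback thread
    for row in rows:
        callback_queue.put(row)
    callback_queue.put(SENTINEL)  # stands in for finished_event.set()

def process_rows():
    inserted = []
    while True:
        row = callback_queue.get()
        if row is SENTINEL:
            break  # all pages consumed; no infinite loop
        inserted.append(row)  # stands in for session.execute(stmt, ...)
    return inserted

t = threading.Thread(target=producer, args=([{'customer': 'abcd', 'snr': 100}],))
t.start()
result = process_rows()
t.join()
```

Because Queue.get() blocks until an item arrives, no Empty handling is needed with this pattern.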
Ok so in that case we do two things: multithreading and batch inserting. I think that with batch inserting, parallelism is not really required, since batching alone speeds things up enough from the client side; multithreading wouldn't add much, as this is not a CPU-intensive task.
from cassandra import ConsistencyLevel
from cassandra.query import BatchStatement, SimpleStatement, dict_factory

session = cluster_.connect(keyspace)
session.row_factory = dict_factory

class Fetcher:
    def __init__(self, session):
        self.session = session
        query = "select * from test.data_log"
        self.statement = SimpleStatement(query, fetch_size=1000)

    def run(self):
        rows = self.session.execute(self.statement)
        temp_rows = []
        for row in rows:
            temp_rows.append(row)
            if len(temp_rows) == 1000:
                handler = PagedResultHandler(self.session, temp_rows)
                handler.start()
                temp_rows = []
        handler = PagedResultHandler(self.session, temp_rows)
        handler.start()

    def handle_error(self, err=None):
        print(err)

class PagedResultHandler(threading.Thread):
    def __init__(self, session, rows):
        super().__init__()
        self.session = session
        self.error = None
        self.rows = rows
        self.finished_event = Event()

    def run(self):
        batch = BatchStatement(consistency_level=ConsistencyLevel.QUORUM)
        stmt = self.session.prepare("INSERT INTO test.data(id, customer,snr,rttt, event_time) VALUES (?,?,?,?,?)")
        for row in self.rows:
            batch.add(stmt, [1, row['customer'], row['snr'], row['rttt'], row['created_time']])
        results = self.session.execute(batch)
        print(results)
Fetcher(session).run()
This script does both batch inserting and multithreading, but again, the multithreading seems unnecessary.
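The 1000-row batching in Fetcher.run can be factored into a small generator; a sketch with plain lists (the batch size and data here are placeholders):

```python
def chunk(rows, size=1000):
    """Yield consecutive batches of at most `size` rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # trailing partial batch, like the final handler.start() above
        yield batch

batches = list(chunk(range(2500), size=1000))
print([len(b) for b in batches])  # -> [1000, 1000, 500]
```

Unlike the code above, the generator skips the final batch entirely when there are no leftover rows.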
I'm having a hard time processing GIS data in Python with the ArcPy library.
I've been trying to generate independent features from a feature class, based on an attribute-table field holding a unique code that represents productive forest units, but I can't get it done.
I've already done this in other situations, but this time I don't know what I am missing.
Here is the code and the error I get:
# coding: utf-8
import arcpy
arcpy.env.overwriteOutput = True
ws = r'D:\Projeto_VANT\SIG\proc_parc.gdb'
arcpy.env.workspace = ws
talhoes = r'copy_talhoes'
estados = ('SP', 'MG')
florestas = ('PROPRIA', 'PARCERIA')
arcpy.MakeFeatureLayer_management(talhoes,
                                  'talhoes_layer',
                                  """ "ESTADO" IN {} AND "FLORESTA" IN {} """.format(estados, florestas),
                                  ws)
arcpy.FeatureClassToFeatureClass_conversion(in_features = 'talhoes_layer',
                                            out_path = ws,
                                            out_name = 'talhoes1')
talhoes1 = r'talhoes1'
arcpy.AddField_management(talhoes1, 'CONCAT_T', 'TEXT')
arcpy.CalculateField_management(talhoes1, 'CONCAT_T', """ [ESTADO] & "_" & [CODIGO] & "_" & [TALHAO] """, 'VB')
with arcpy.da.SearchCursor(talhoes1, ['CONCAT_T', 'AREA']) as tal_cursor:
    for x in tal_cursor:
        print(x[0] + " " + str(x[1]))  # This print is just to check if the cursor works and it does!
        arcpy.MakeFeatureLayer_management(x,
                                          'teste',
                                          """ CONCAT_T = '{}' """.format(str(x[0])))  # Apparently the problem is here!
        arcpy.CopyFeatures_management('teste',
                                      'Layer{}'.format(x[0]))
Here is the error:
Traceback (most recent call last):
File "D:/ArcPy_Classes/Scripts/sampling_sig.py", line 32, in <module>
""" CONCAT_T = '{}' """.format(str(x[0])))
File "C:\Program Files (x86)\ArcGIS\Desktop10.5\ArcPy\arcpy\management.py", line 6965, in MakeFeatureLayer
raise e
RuntimeError: Object: Error in executing tool
I think the issue is with your input feature: you want in_features to be talhoes1, since x is the row tuple returned by the cursor, not a feature.
arcpy.MakeFeatureLayer_management(talhoes1, 'teste', """ CONCAT_T = '{}' """.format(str(x[0])))
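As an aside, the "ESTADO" IN {} formatting in the question only works because the repr of a Python tuple happens to match SQL's IN syntax; a quick check with plain string formatting (no arcpy needed):

```python
estados = ('SP', 'MG')
florestas = ('PROPRIA', 'PARCERIA')
# str.format() renders each tuple via str(), producing SQL-compatible lists
where = """ "ESTADO" IN {} AND "FLORESTA" IN {} """.format(estados, florestas)
print(where)  # -> "ESTADO" IN ('SP', 'MG') AND "FLORESTA" IN ('PROPRIA', 'PARCERIA')
```

Beware that a one-element tuple would render as ('SP',), whose trailing comma is not valid SQL.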
I am trying to loop through Excel files on a Mac 10.10.5 with xlwings in python in order to save the values only (i.e. not the formulae that create them). Many of the files have been output by automated software, so the typical pandas.read_excel function produces null values for any cells that start with =.
Anyway, if I walk through what's inside the for loop one step at a time in ipython, the code works fine. However, when the script runs with the loop, it produces an error whenever it gets to a workbook that hasn't been walked through manually. Here's the script (reduced version in the update below):
wkbks = [fl for fl in os.listdir(originals_folder) if any([fl.endswith(ext) for ext in ['.xls', 'xlsx']])]
for w_idx, wkbk in enumerate(wkbks):
    if w_idx % 10 == 0:
        print "\t{}/{}".format(w_idx, len(wkbks))
    wb = xw.Book( os.path.join(originals_folder, wkbk) )
    new_wb = xw.Book()
    for sht_idx, sht in enumerate(wb.sheets):
        # look for a pretty wide range (disclaimer: not super robust for really large sheets)
        values = sht.range('A1:HZ1000').value
        df = pd.DataFrame(values)
        rows_to_keep = df.isnull().sum(axis=1) < df.shape[1]
        cols_to_keep = df.isnull().sum(axis=0) < df.shape[0]
        df = df.ix[rows_to_keep, cols_to_keep]
        # add values to new sheet
        if not any([sht.name == sh.name for sh in new_wb.sheets]):
            if sht_idx == 0:
                new_wb.sheets.add(sht.name)
            else:
                new_wb.sheets.add(sht.name, after=wb.sheets[sht_idx-1].name)
        new_wb.sheets[sht.name].range('A1').value = df.values
    new_name = new_wb.name
    new_wb.save() # save to generic name in current location, as filepaths produced an error
    # close newly made workbook
    close_workbook_by_name(xw, new_name)
    # copy newly made workbook to base_folder
    new_fname = [f for f in os.listdir('./') if f.startswith(new_name)][0]
    shutil.copy2(os.path.join('./', new_fname), os.path.join(base_folder, wkbk))
    # remove newly made file from current directory
    os.remove( os.path.join('./', new_fname) )
    # close original workbook
    close_workbook_by_name(xw, wkbk)
    del sht, new_wb, wb
And here's the error (again, this only happens when it loops to a wkbk that hasn't yet been manually 'walked through' in ipython):
---------------------------------------------------------------------------
CommandError Traceback (most recent call last)
<ipython-input-319-2240e28ae169> in <module>()
14 shts = wb.sheets
15 sht_idx = 0
---> 16 for sht in shts:
17 sht = wb.sheets[sht.name]
18
...
CommandError: Command failed:
OSERROR: -1728
MESSAGE: The object you are trying to access does not exist
COMMAND: app(pid=8732).workbooks['cwa balance sheet 9.30.2017-5688630b3.xlsx'].count(each=k.worksheet)
Any suggestions on the specifics or general strategy here would be greatly appreciated. Thanks!
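As an aside, the all-null trimming step inside the loop can be exercised with plain pandas; note that the .ix indexer used above is deprecated, and .loc behaves the same for this boolean indexing (the sample values below are made up):

```python
import pandas as pd

# A worksheet read as a fixed range often contains all-blank rows/columns.
values = [["a", 1.0, None],
          [None, None, None],   # blank row
          ["b", 2.0, None]]     # third column entirely blank
df = pd.DataFrame(values)

rows_to_keep = df.isnull().sum(axis=1) < df.shape[1]  # keep rows with any data
cols_to_keep = df.isnull().sum(axis=0) < df.shape[0]  # keep cols with any data
trimmed = df.loc[rows_to_keep, cols_to_keep]          # .loc replaces deprecated .ix
```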
UPDATE
Reduced script
wkbks = [fl for fl in os.listdir(originals_folder) if any([fl.endswith(ext) for ext in ['.xls', 'xlsx']])]
for w_idx, wkbk in enumerate(wkbks):
    wb = xw.Book( os.path.join(originals_folder, wkbk) )
    new_wb = xw.Book()
    for sht_idx, sht in enumerate(wb.sheets):
        pass
    new_name = new_wb.name
    new_wb.save() # save to generic name in current location, as filepaths produced an error
    # close newly made workbook
    book_idx_to_close = [b_idx for b_idx, b in enumerate(xw.books) if b.name.startswith(new_name)]
    if len(book_idx_to_close) > 0:
        book_to_close = xw.books[book_idx_to_close[0]]
        book_to_close.close()
        print "Closed", new_name
    # close original workbook
    book_idx_to_close = [b_idx for b_idx, b in enumerate(xw.books) if b.name.startswith(wkbk)]
    if len(book_idx_to_close) > 0:
        book_to_close = xw.books[book_idx_to_close[0]]
        book_to_close.close()
        print "Closed", wkbk
New error:
<ipython-input-14-68ed128e9078> in <module>()
34 if len(book_idx_to_close) > 0:
35 book_to_close = xw.books[book_idx_to_close[0]]
---> 36 book_to_close.close()
37 print "Closed", new_name
38
...
CommandError: Command failed:
OSERROR: -50
MESSAGE: Parameter error.
COMMAND: app(pid=10100).workbooks[2].close(saving=k.no)
The error occurs when a file opens requiring 'Read Only' access because some of its features are not compatible with the current version of Excel. However, after receiving the error, when I retype book_to_close.close(), the file closes with no error. There is also no error when opening/saving/closing a file with the same read-only access if I had previously 'walked through' it manually.
I realize this is a different error than the one above, but suspect they may be related (hence why leaving the original post as is above).
I am trying to parse an SQL script file based on the ";" delimiter and then call cx_Oracle to connect and execute the statements on the DB server. I have run into an issue with a cursor-related block of code. My call structure is this:
ScriptHandle = open(filepath)
SqlScript = ScriptHandle.read()
SqlCommands = SqlScript.split(';')
for sqlcommand in SqlCommands:
    print sqlcommand, '\n'*3
    if sqlcommand:
        ODBCCon.ExecuteWithCx_Oracle(cursor, sqlcommand)
The problem I have comes with the following sql block:
DECLARE CURSOR date_cur IS (select calendar_date
from cg_calendar dates
where dates.calendar_date between '30-Jun-2014' and '31-Jul-2015'
and global_business_or_holiday = 'B');
BEGIN
FOR date_rec in date_cur LOOP
insert into fc_pos
SELECT PP.acctid,
PP.mgrid,
PP.activitydt,
PP.secid,
PP.shrparamt,
PP.lclmktval,
PP.usdmktval,
SM.asset_name_1,
SM.asset_name_2,
SM.cg_sym,
SM.fc_local_crncy_cd,
SM.fc_local_crncy_id,
SM.fc_trade_cd,
substr(SM.asset_name_1, 17,3) as against_crncy_cd
FROM FC_acct_mgr AM, asset SM, ma_mktval PP
WHERE PP.dw_asset_id = SM.dw_asset_id
AND PP.secid = SM.asset_id
AND PP.activitydt = date_rec.calendar_date
AND AM.acctid = PP.acctid
AND AM.mgrid = PP.mgrid
AND SM.asset_categ_cd = 'FC';
END LOOP;
END;
The parse step above splits this code at every ";", whereas I need to treat it as a single block, starting at the DECLARE and ending at the final END;.
How can I accomplish this from the Python end? I have been unable to make any headway on this, and it is a legacy process flow that I am automating.
Thanks in advance.
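For what it's worth, one way to approach this is a small state machine: split on ";" as before, but once a line opens a PL/SQL block with DECLARE or BEGIN, keep accumulating lines until the closing END; line. This is a minimal sketch only; it assumes the outer block terminates on a line containing just END;, and it ignores nested blocks and ";" inside string literals:

```python
def split_sql_script(script):
    """Split a script on ';', keeping DECLARE/BEGIN ... END; blocks whole.

    Plain statements are returned without their trailing ';' (as cx_Oracle
    expects), while PL/SQL blocks keep their final 'END;'.
    """
    statements = []
    buf = []
    in_block = False
    for line in script.splitlines():
        stripped = line.strip()
        if not stripped and not buf:
            continue  # skip blank lines between statements
        keyword = stripped.split(None, 1)[0].upper() if stripped else ''
        if not buf and keyword in ('DECLARE', 'BEGIN'):
            in_block = True  # a new statement opening a PL/SQL block
        buf.append(line)
        if in_block:
            if stripped.upper() == 'END;':  # closes the outer block
                statements.append('\n'.join(buf).strip())
                buf, in_block = [], False
        elif stripped.endswith(';'):
            statements.append('\n'.join(buf).strip().rstrip(';'))
            buf = []
    return statements

demo = "select 1 from dual;\nDECLARE\n  x number;\nBEGIN\n  x := 1;\nEND;\nselect 2 from dual;"
parts = split_sql_script(demo)
print(len(parts))  # -> 3
```

Each element of the returned list can then be fed to the existing ExecuteWithCx_Oracle call unchanged.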
While running the following sample code, I get the error "save is not defined."
That is why I added from xlsxcessive.xlsx import save.
But it is still not able to save on the local machine.
Is this an error in the Python API, or am I making a mistake in my code?
"""Just a simple example of XlsXcessive API usage."""
from xlsxcessive.xlsx import Workbook
from xlsxcessive.worksheet import Cell
from xlsxcessive.xlsx import save
import decimal
wb = Workbook()
sheet = wb.new_sheet('Test Sheet')
# a shared format
bigguy = wb.stylesheet.new_format()
bigguy.font(size=24)
bigguy.align('center')
# add a border
bigguy.border(top="medium", bottom="medium")
# set a builtin number format
bigguy.number_format('0.00')
# another shared format
boldfont = wb.stylesheet.new_format()
boldfont.font(bold=True)
# and another
highprec = wb.stylesheet.new_format()
# set a custom number format on the shared format
highprec.number_format("0.000")
# the API supports adding rows
row1 = sheet.row(1)
# rows support adding cells - cells can currently store strings, numbers
# and formulas.
a1 = row1.cell("A1", "Hello, World!", format=boldfont)
row1.cell("C1", 42.0, format=bigguy)
# cells can be merged with other cells - there is no checking on invalid merges
# though. merge at your own risk!
a1.merge(Cell('B1'))
# adding rows is easy
row2 = sheet.row(2)
row2.cell("B2", "Foo")
row2.cell("C2", 1, format=bigguy)
# formulas are written as strings and can have default values
shared_formula = sheet.formula("SUM(C1, C2)", 43.0, shared=True)
row3 = sheet.row(3)
row3.cell("C3", shared_formula, format=bigguy)
# you can work with cells directly on the sheet
sheet.cell('D1', 12.0005, format=highprec)
sheet.cell('D2', 11.9995, format=highprec)
sheet.cell('D3', shared_formula, format=highprec)
# and directly via row and column indicies
sheet.cell(coords=(0, 4), value=40)
sheet.cell(coords=(1, 4), value=2)
sheet.cell(coords=(2, 4), value=shared_formula)
# you can share a formula in a non-contiguous range of cells
times_two = sheet.formula('PRODUCT(A4, 2)', shared=True)
sheet.cell('A4', 12)
sheet.cell('B4', times_two)
sheet.cell('C4', 50)
sheet.cell('D4', times_two)
# iteratively adding data is easy now
for rowidx in xrange(5, 10):
    for colidx in xrange(5, 11, 2):
        sheet.cell(coords=(rowidx, colidx), value=rowidx*colidx)
# set column widths
sheet.col(2, width=5)
# write unicode value
sheet.cell('G2', value=u"43\u00b0")
if __name__ == '__main__':
    import os
    import sys
    from xlsxcessive.xlsx import save

    if len(sys.argv) == 1:
        print "USAGE: python sample.py NEWFILEPATH"
        print "Writes a sample .xlsx file to NEWFILEPATH"
        raise SystemExit(1)
    if os.path.exists(sys.argv[1]):
        print "Aborted. File %s already exists." % sys.argv[1]
        raise SystemExit(1)
    stream = None
    if sys.argv[1] == '-':
        stream = sys.stdout
    # wb is the Workbook created above
    save(wb, sys.argv[1], stream)
Note: I also tried the following:
# local file
save(workbook, 'financials.xlsx')
# stream
save(workbook, 'financials.xlsx', stream=sys.stdout)