I want to connect to an Oracle DB via Python, take the query result data, and create Excel or CSV reports from it. I have never tried this before and have not seen anyone around me do something like it. Do you have any recommendations or ideas for this case?
Regards
You can connect to an Oracle DB from Python with the cx_Oracle library, using the connection-string syntax below. Be aware that, to start with, your connection_oracle_textfile.txt file and the .py file containing your Python code must be in the same folder.
connection_oracle_textfile.txt -> username/password@HOST:PORT/SERVICE_NAME (you can find all of these except username and password in the tnsnames.ora file)
import cx_Oracle
import pandas as pd

def get_oracle_table_from_dbm(sql_text):
    if 'connection_oracle' not in globals():
        print('connection does not exist. Trying to connect...')
        #read the connection string from the text file next to this script
        with open('connection_oracle_textfile.txt', 'r') as f:
            fx = f.read()
        #keep the connection in a global so later calls reuse it
        global connection_oracle
        connection_oracle = cx_Oracle.connect(fx)
        print('connection established!!')
    else:
        print('Already have connection. Just fetch data!!')
    return pd.read_sql(sql_text, con=connection_oracle)

df = get_oracle_table_from_dbm('select * from dual')
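From there, producing the report itself is just a matter of writing the DataFrame out. A minimal sketch (the file names and sheet name are placeholders; to_excel additionally needs an Excel writer engine such as openpyxl or xlsxwriter installed):
#write the query results out as CSV and Excel report files
df.to_csv('report.csv', index=False)
df.to_excel('report.xlsx', sheet_name='report', index=False)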
There are other stackoverflow answers to this, e.g. How to export a table to csv or excel format. Remember to tune cursor.arraysize.
You don't strictly need the pandas library to create CSV files, though you may want it for future data analysis.
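For example, here is a rough sketch that skips pandas and uses only the csv module and a plain cursor, with arraysize raised to cut down round trips (the connection string, query, and file name are placeholders):
import csv
import cx_Oracle

connection = cx_Oracle.connect('username/password@HOST:PORT/SERVICE_NAME')
cursor = connection.cursor()
cursor.arraysize = 5000  #fetch more rows per round trip

cursor.execute('select * from some_table')  #placeholder query
with open('report.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow([col[0] for col in cursor.description])  #header row
    writer.writerows(cursor)  #the cursor yields row tuples

cursor.close()
connection.close()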
The cx_Oracle documentation discusses installation, connection, and querying, amongst other topics.
If you want to read from a CSV file, see Loading CSV Files into Oracle Database.
I have a scenario to migrate SQL Server tables (30-40 tables) to Oracle. I cannot depend on SSIS, as the number of tables to be migrated to Oracle changes regularly and I cannot always create or update a DFT whenever there is a change in schema.
Is there any other way the movement of data can be handled dynamically and work effectively, like using Python or another programming language?
C# approach - SchemaMapper library
Since you are open to a solution using a programming language, I think you can benefit from the SchemaMapper class library, which is an open-source project published on GitHub. A full description can be found in the Readme file at the link above.
Important note: yesterday I added support for reading data from databases (SQL Server, Oracle, ...) and the ability to export data to Oracle.
In this answer I will show how to import SQL Server tables, create the appropriate SchemaMapper class for each one (since they have different schemas and you need to import them into different schemas), and export the data to Oracle.
//First of all, list the names of the tables to import
string[] TableNameFilter = new[] { "Table1", "Table2" };

//Create an instance of the Oracle export class
SchemaMapper.Exporters.OracleExport expOracle = new SchemaMapper.Exporters.OracleExport(oracleconnectionstring);

//Create an SQL Server import class
using (SchemaMapper.Converters.SqlServerCeImport ssImport = new SchemaMapper.Converters.SqlServerCeImport(sqlconnectionstring))
{
    //Retrieve table names
    ssImport.getSchemaTable();

    //Loop over tables matching the filter
    foreach (DataRow drRowSchema in ssImport.SchemaTable.AsEnumerable().Where(x =>
             TableNameFilter.Contains(x["TABLE_NAME"].ToString())).ToList())
    {
        string SQLTableName = drRowSchema["TABLE_NAME"].ToString();
        string SQLTableSchema = drRowSchema["TABLE_SCHEMA"].ToString();
        DataTable dtSQL = ssImport.GetDataTable(SQLTableSchema, SQLTableName);

        //Create a schema mapping class
        using (SchemaMapper.SchemaMapping.SchemaMapper sm = new SchemaMapper.SchemaMapping.SchemaMapper(SQLTableSchema, SQLTableName))
        {
            foreach (DataColumn dc in dtSQL.Columns)
            {
                SchemaMapper_Column smCol = new SchemaMapper_Column();
                smCol.Name = dc.ColumnName;
                smCol.DataType = smCol.GetCorrespondingDataType(dc.DataType.ToString(), dc.MaxLength);
                sm.Columns.Add(smCol);
            }

            //Create the destination table in Oracle
            expOracle.CreateDestinationTable(sm);

            //Insert the data
            expOracle.InsertUsingOracleBulk(sm, dtSQL);

            //There are other methods, such as:
            //expOracle.InsertIntoDb(sm, dtSQL);
            //expOracle.InsertIntoDbWithParameters(sm, dtSQL);
        }
    }
}
Note: this is an open-source project. It is not fully tested and not all data types are supported; if you encounter errors, feel free to give feedback or add an issue on GitHub.
Other approach - SQL Server Import and Export Wizard
If you can do this without scheduling a job, then you can use the Import and Export Wizard, which allows you to import multiple tables into Oracle without having to build the packages manually. It will create the packages and destination tables, map the columns, and import the data.
Start the SQL Server Import and Export Wizard
Connect to an Oracle Data Source (SQL Server Import and Export Wizard)
Here is the approach I decided to go with, considering the time constraint (using C# was taking more time). For an 8 GB table it takes 11 minutes to move the data from SQL Server to Oracle.
Steps:
Dump the SQL table data into flat files (I used BIML to automate the DFT creation).
Transfer these flat files to the destination server.
Use SQL*Loader to load the data from the flat files into Oracle; a rough sketch of this step follows.
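For step 3, here is a rough Python sketch of driving SQL*Loader. The control file contents, table and column names, and connection details are all placeholders to adapt:
import subprocess

#minimal control file for one table (placeholder file/table/columns)
ctl = """LOAD DATA
INFILE 'table1.csv'
INTO TABLE table1
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(col1, col2, col3)
"""
with open('table1.ctl', 'w') as f:
    f.write(ctl)

#invoke sqlldr; direct=true uses the fast direct-path load
subprocess.run(['sqlldr', 'userid=scott/tiger@ORCL',
                'control=table1.ctl', 'log=table1.log', 'direct=true'],
               check=True)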
I have used Python to parse a txt file for specific information (dates, $ amounts, lbs, etc) and now I want to export that data to an Oracle table that I made in SQL Developer.
I have successfully connected Python to Oracle with the cx_Oracle module, but I am struggling to export or even print any data to my database from Python.
I am not proficient at using SQL; I know simple queries and that's about it. I have explored the Oracle docs and haven't found straightforward export commands. When exporting data to an Oracle table via Python, will I be using Python code or SQL code? Is it the same as importing a CSV file, for example?
I would like to understand how to write to an Oracle table from Python; I need to parse and export a very large amount of data, so this won't be a one-time export/import. Ideally I would also like a way to preview my import to ensure it aligns correctly with my already-created Oracle table, or a simple undo action if one exists.
If my problem is unclear I am more than happy to clarify it. Thanks for all help.
My code so far:
import cx_Oracle
dsnStr = cx_Oracle.makedsn("sole.wh.whoi.edu", "1526", "sole")
con = cx_Oracle.connect(user="myusername", password="mypassword", dsn=dsnStr)
print (con.version)
#imp 'Book1.csv' [this didn't work]
cursor = con.cursor()
print (cursor)
con.close()
From Import a CSV file into Oracle using CX_Oracle & Python 2.7 you can see the overall plan.
So if you have already parsed the data into a CSV, you can do it like this:
import cx_Oracle
import csv
dsnStr = cx_Oracle.makedsn("sole.wh.whoi.edu", "1526", "sole")
con = cx_Oracle.connect(user="myusername", password="mypassword", dsn=dsnStr)
print (con.version)
cursor = con.cursor()
print (cursor)
text_sql = '''
INSERT INTO tablename (firstfield, secondfield) VALUES(:1,:2)
'''
my_file = r'C:\CSVData\Book1.csv'
with open(my_file, 'r', newline='') as f:
    cr = csv.reader(f)
    for row in cr:
        print(row)
        cursor.execute(text_sql, row)
print('Imported')
con.commit()  #the inserts are not permanent until committed
con.close()
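For the very large amounts of data you mention, calling execute() once per row will be slow. cx_Oracle's executemany() sends a whole batch of rows per round trip; here is a rough sketch along the same lines (same placeholder table and connection as above, arbitrary batch size). Note that nothing is permanent until commit(), and con.rollback() undoes an uncommitted load, which gives you a crude "undo":
import csv
import cx_Oracle

dsnStr = cx_Oracle.makedsn("sole.wh.whoi.edu", "1526", "sole")
con = cx_Oracle.connect(user="myusername", password="mypassword", dsn=dsnStr)
cursor = con.cursor()
text_sql = "INSERT INTO tablename (firstfield, secondfield) VALUES (:1, :2)"

with open(r'C:\CSVData\Book1.csv', 'r', newline='') as f:
    batch = []
    for row in csv.reader(f):
        batch.append(row)
        if len(batch) == 10000:  #insert in chunks of 10,000 rows
            cursor.executemany(text_sql, batch)
            batch = []
    if batch:
        cursor.executemany(text_sql, batch)

con.commit()  #or con.rollback() to discard the whole load
con.close()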
What seemed like a trivial task turned into a real nightmare when I had to read in some data from a PowerPivot model using Python. I believe I've researched this very well over the last couple of days but now I hit a brick wall and would appreciate some help from the Python/SSAS/ADO community.
Basically, all I want to do is programmatically access raw data stored in PowerPivot models - my idea was to connect to the underlying PowerPivot (i.e. MS Analysis Services) engine via one of the methods listed below, list the tables contained in the model, then extract the raw data from each table using a simple DAX query (something like EVALUATE (table_name)). Easy peasy, right? Well, maybe not.
0. Some Background Information
As you can see, I've tried several different approaches. I'll try to document everything as carefully as possible so that those uninitiated in PowerPivot functionality will have a good idea of what I'd like to do.
First of all, some background on programmatic access to the Analysis Services engine (it says SQL Server 2005, but all of it ought to still be applicable): SQL Server Data Mining Programmability and Data providers used for Analysis Services connections.
The sample Excel/PowerPivot file I'll be using in the example below can be found here: Microsoft PowerPivot for Excel 2010 and PowerPivot in Excel 2013 Samples.
Also, note that I'm using Excel 2010, so some of my code is version-specific. E.g. wb.Connections["PowerPivot Data"].OLEDBConnection.ADOConnection should be wb.Model.DataModelConnection.ModelConnection.ADOConnection if you're using Excel 2013.
The connection string I'll be using throughout this question is based on the information found here: Connect to PowerPivot engine with C#. Additionally, some of the methods apparently require some sort of initialization of the PowerPivot model prior to data retrieval. See here: Automating PowerPivot Refresh operation from VBA.
Finally, here are a couple of links showing that this should be achievable (note, however, that these links mainly refer to C#, not Python):
Made connection to PowerPivot DataModel, how can I fill a dataset with it?
Connecting to PowerPivot with C#
2013 C# connection to PowerPivot DataModel
Connecting Tableau and PowerPivot. It just works. (showing that external apps can in fact read PowerPivot model data - note that the Tableau add-in installs Interop.ADODB.dll assembly, which I guess is what it uses to access the PowerPivot data)
1. Using ADOMD
import clr
clr.AddReference("Microsoft.AnalysisServices.AdomdClient")
import Microsoft.AnalysisServices.AdomdClient as ADOMD
ConnString = "Provider=MSOLAP;Data Source=$Embedded$;Locale Identifier=1033;
Location=H:\\PowerPivotTutorialSample.xlsx;SQLQueryMode=DataKeys"
Connection = ADOMD.AdomdConnection(ConnString)
Connection.Open()
Here, it appears the problem is that the PowerPivot model has not been initialized:
AdomdConnectionException: A connection cannot be made. Ensure that the server is running.
2. Using AMO
import clr
clr.AddReference("Microsoft.AnalysisServices")
import Microsoft.AnalysisServices as AMO
ConnString = "Provider=MSOLAP;Data Source=$Embedded$;Locale Identifier=1033;
Location=H:\\PowerPivotTutorialSample.xlsx;SQLQueryMode=DataKeys"
Connection = AMO.Server()
Connection.Connect(ConnString)
Same story, "the server is not running":
ConnectionException: A connection cannot be made. Ensure that the server is running.
Note that AMO is technically not used for querying data, but I included it as one of the potential ways of connecting to the PowerPivot model.
3. Using ADO.NET
import clr
clr.AddReference("System.Data")
import System.Data.OleDb as ADONET
ConnString = "Provider=MSOLAP;Data Source=$Embedded$;Locale Identifier=1033;
Location=H:\\PowerPivotTutorialSample.xlsx;SQLQueryMode=DataKeys"
Connection = ADONET.OleDbConnection()
Connection.ConnectionString = ConnString
Connection.Open()
This is similar to What's the simplest way to access mssql with python or ironpython?. Unfortunately, this also doesn't work:
OleDbException: OLE DB error: OLE DB or ODBC error: The following system error occurred:
The requested name is valid, but no data of the requested type was found.
4. Using ADO via adodbapi module
import adodbapi
ConnString = "Provider=MSOLAP;Data Source=$Embedded$;Locale Identifier=1033;
Location=H:\\PowerPivotTutorialSample.xlsx;SQLQueryMode=DataKeys"
Connection = adodbapi.connect(ConnString)
Similar to Opposite Workings of OLEDB/ODBC between Python and MS Access VBA. The error I get is:
OperationalError: (com_error(-2147352567, 'Exception occurred.', (0, u'Microsoft OLE DB
Provider for SQL Server 2012 Analysis Services.', u'OLE DB error: OLE DB or ODBC error: The
following system error occurred: The requested name is valid, but no data of the requested
type was found...
This is basically the same problem as with ADO.NET above.
5. Using ADO via Excel/win32com module
from win32com.client import Dispatch
Xlfile = "H:\\PowerPivotTutorialSample.xlsx"
XlApp = Dispatch("Excel.Application")
Workbook = XlApp.Workbooks.Open(Xlfile)
Workbook.Connections["PowerPivot Data"].Refresh()
Connection = Workbook.Connections["PowerPivot Data"].OLEDBConnection.ADOConnection
Recordset = Dispatch('ADODB.Recordset')
Query = "EVALUATE(dbo_DimDate)" #sample DAX query
Recordset.Open(Query, Connection)
The idea for this approach came from this blog post that uses VBA: Export a table or DAX query from Power Pivot to CSV using VBA. Note that this approach uses an explicit Refresh command that initializes the model (i.e. "server"). Here's the error message:
com_error: (-2147352567, 'Exception occurred.', (0, u'ADODB.Recordset', u'Arguments are of
the wrong type, are out of acceptable range, or are in conflict with one another.',
u'C:\\Windows\\HELP\\ADO270.CHM', 1240641, -2146825287), None)
It appears, however, that the ADO connection has been established:
type(Connection) returns instance
print(Connection) returns Provider=MSOLAP.5;Persist Security Info=True;Initial Catalog=Microsoft_SQLServer_AnalysisServices;Data Source=$Embedded$;MDX Compatibility=1;Safety Options=2;ConnectTo=11.0;MDX Missing Member Mode=Error;Subqueries=2;Optimize Response=3;Cell Error Mode=TextValue
It seems the problem lies in the creation of the ADODB.Recordset object.
6. Using ADO via Excel/win32com, direct use of ADODB.Connection
from win32com.client import Dispatch
ConnString = "Provider=MSOLAP;Data Source=$Embedded$;Locale Identifier=1033;
Location=H:\\PowerPivotTutorialSample.xlsx;SQLQueryMode=DataKeys"
Connection = Dispatch('ADODB.Connection')
Connection.Open(ConnString)
Similar to Connection to Access from Python [duplicate] and Query access using ADO in Win32 platform (Python recipe). Unfortunately, the error Python spits out is the same as in the two examples above:
com_error: (-2147352567, 'Exception occurred.', (0, u'Microsoft OLE DB Provider for SQL
Server 2012 Analysis Services.', u'OLE DB error: OLE DB or ODBC error: The following system
error occurred: The requested name is valid, but no data of the requested type was found.
..', None, 0, -2147467259), None)
7. Using ADO via Excel/win32com, direct use of ADODB.Connection plus model refresh
from win32com.client import Dispatch
Xlfile = "H:\\PowerPivotTutorialSample.xlsx"
XlApp = Dispatch("Excel.Application")
Workbook = XlApp.Workbooks.Open(Xlfile)
Workbook.Connections["PowerPivot Data"].Refresh()
ConnStringInternal = ("Provider=MSOLAP.5;Persist Security Info=True;"
                      "Initial Catalog=Microsoft_SQLServer_AnalysisServices;"
                      "Data Source=$Embedded$;MDX Compatibility=1;"
                      "Safety Options=2;ConnectTo=11.0;"
                      "MDX Missing Member Mode=Error;Optimize Response=3;"
                      "Cell Error Mode=TextValue")
Connection = Dispatch('ADODB.Connection')
Connection.Open(ConnStringInternal)
I was hoping I could initialize an instance of Excel, then initialize the PowerPivot model, and then create a connection using the internal connection string Excel uses for embedded PowerPivot data (similar to How do you copy the powerpivot data into the excel workbook as a table? - note that this connection string is different from the one I've used elsewhere). Unfortunately, this doesn't work; my guess is that Python starts the ADODB.Connection in a separate process, as I get the same error message when I execute the last three rows without first initializing Excel:
com_error: (-2147352567, 'Exception occurred.', (0, u'Microsoft OLE DB Provider for SQL
Server 2012 Analysis Services.', u'Either the user, ****** (masked), does not have access
to the Microsoft_SQLServer_AnalysisServices database, or the database does not exist.',
None, 0, -2147467259), None)
Lo and behold, I finally managed to crack the problem - it turns out that accessing Power Pivot data using Python is indeed possible! Below is a short recap of what I did; you can find a more detailed description here: Analysis Services (SSAS) on a shoestring. Note: the code has been optimized neither for efficiency nor elegance.
Install Microsoft Power BI Desktop (comes with free Analysis Services server, so no need for a costly SQL Server license - however, the same approach obviously also works if you have a proper license).
Fire up the AS engine by first creating the msmdsrv.ini settings file, then restore the database from the ABF file (using AMO.NET), then extract data using ADOMD.NET.
Here's the Python code that illustrates the AS engine + AMO.NET parts:
import psutil, subprocess, random, os, zipfile, shutil, clr, sys, pandas

def initialSetup(pathPowerBI):
    sys.path.append(pathPowerBI)
    #required Analysis Services assemblies
    clr.AddReference("Microsoft.PowerBI.Amo.Core")
    clr.AddReference("Microsoft.PowerBI.Amo")
    clr.AddReference("Microsoft.PowerBI.AdomdClient")
    global AMO, ADOMD
    import Microsoft.AnalysisServices as AMO
    import Microsoft.AnalysisServices.AdomdClient as ADOMD

def restorePowerPivot(excelName, pathTarget, port, pathPowerBI):
    #create random folder
    os.chdir(pathTarget)
    folder = os.getcwd() + str(random.randrange(10**6, 10**7))
    os.mkdir(folder)

    #extract the PowerPivot model (abf backup) from the workbook's zip archive
    archive = zipfile.ZipFile(excelName)
    for member in archive.namelist():
        if ".data" in member:
            filename = os.path.basename(member)
            abfname = os.path.join(folder, filename) + ".abf"
            source = archive.open(member)
            target = open(abfname, 'wb')
            shutil.copyfileobj(source, target)
            target.close()
    archive.close()

    #start the cmd.exe process to get its PID
    listPIDpre = [proc for proc in psutil.process_iter()]
    process = subprocess.Popen('cmd.exe /k', stdin=subprocess.PIPE)
    listPIDpost = [proc for proc in psutil.process_iter()]
    pid = [proc for proc in listPIDpost if proc not in listPIDpre
           if "cmd.exe" in str(proc)][0]
    pid = str(pid).split("=")[1].split(",")[0]
    #msmdsrv.ini settings file, filled in via .format() below
    msmdsrvText = '''<ConfigurationSettings>
<DataDir>{0}</DataDir>
<TempDir>{0}</TempDir>
<LogDir>{0}</LogDir>
<BackupDir>{0}</BackupDir>
<DeploymentMode>2</DeploymentMode>
<RecoveryModel>1</RecoveryModel>
<DisklessModeRequested>0</DisklessModeRequested>
<CleanDataFolderOnStartup>1</CleanDataFolderOnStartup>
<AutoSetDefaultInitialCatalog>1</AutoSetDefaultInitialCatalog>
<Network>
<Requests>
<EnableBinaryXML>1</EnableBinaryXML>
<EnableCompression>1</EnableCompression>
</Requests>
<Responses>
<EnableBinaryXML>1</EnableBinaryXML>
<EnableCompression>1</EnableCompression>
<CompressionLevel>9</CompressionLevel>
</Responses>
<ListenOnlyOnLocalConnections>1</ListenOnlyOnLocalConnections>
</Network>
<Port>{1}</Port>
<PrivateProcess>{2}</PrivateProcess>
<InstanceVisible>0</InstanceVisible>
<Language>1033</Language>
<Debug>
<CallStackInError>0</CallStackInError>
</Debug>
<Log>
<Exception>
<CrashReportsFolder>{0}</CrashReportsFolder>
</Exception>
<FlightRecorder>
<Enabled>0</Enabled>
</FlightRecorder>
</Log>
<AllowedBrowsingFolders>{0}</AllowedBrowsingFolders>
<ResourceGovernance>
<GovernIMBIScheduler>0</GovernIMBIScheduler>
</ResourceGovernance>
<Feature>
<ManagedCodeEnabled>1</ManagedCodeEnabled>
</Feature>
<VertiPaq>
<EnableDisklessTMImageSave>0</EnableDisklessTMImageSave>
<EnableProcessingSimplifiedLocks>1</EnableProcessingSimplifiedLocks>
</VertiPaq>
</ConfigurationSettings>'''
    #save the ini file to disk, filled with the required parameters
    msmdsrvini = open(folder + "\\msmdsrv.ini", "w")
    msmdsrvText = msmdsrvText.format(folder, port, pid)  #{0},{1},{2}
    msmdsrvini.write(msmdsrvText)
    msmdsrvini.close()

    #run the AS engine inside the cmd.exe process
    initString = "\"{0}\\msmdsrv.exe\" -c -s \"{1}\""
    initString = initString.format(pathPowerBI.replace("/", "\\"), folder)
    process.stdin.write((initString + " \n").encode())
    process.stdin.flush()

    #connect to the AS instance from Python
    AMOServer = AMO.Server()
    AMOServer.Connect("localhost:{0}".format(port))

    #restore the database from the PowerPivot abf backup, then disconnect
    AMORestoreInfo = AMO.RestoreInfo(abfname)
    AMOServer.Restore(AMORestoreInfo)
    AMOServer.Disconnect()
    return process
And the data-extraction part:
def runQuery(query, port, flag):
    #connect to the engine via ADOMD
    ADOMDConn = ADOMD.AdomdConnection("Data Source=localhost:{0}".format(port))
    ADOMDConn.Open()
    ADOMDCommand = ADOMDConn.CreateCommand()
    ADOMDCommand.CommandText = query

    #read data in via an AdomdDataReader object
    DataReader = ADOMDCommand.ExecuteReader()

    #get metadata, number of columns
    SchemaTable = DataReader.GetSchemaTable()
    numCol = SchemaTable.Rows.Count  #same as DataReader.FieldCount

    #get column names
    columnNames = []
    for i in range(numCol):
        columnNames.append(str(SchemaTable.Rows[i][0]))

    #fill with data
    data = []
    while DataReader.Read():
        row = []
        for j in range(numCol):
            try:
                row.append(DataReader[j].ToString())
            except:
                row.append(DataReader[j])
        data.append(row)
    df = pandas.DataFrame(data)
    df.columns = columnNames

    if flag == 0:
        DataReader.Close()
        ADOMDConn.Close()
        return df
    else:
        #metadata table
        metadataColumnNames = []
        for j in range(SchemaTable.Columns.Count):
            metadataColumnNames.append(SchemaTable.Columns[j].ToString())
        metadata = []
        for i in range(numCol):
            row = []
            for j in range(SchemaTable.Columns.Count):
                try:
                    row.append(SchemaTable.Rows[i][j].ToString())
                except:
                    row.append(SchemaTable.Rows[i][j])
            metadata.append(row)
        metadf = pandas.DataFrame(metadata)
        metadf.columns = metadataColumnNames

        DataReader.Close()
        ADOMDConn.Close()
        return df, metadf
The raw data are then extracted via something like this:
pathPowerBI = "C:/Program Files/Microsoft Power BI Desktop/bin"
initialSetup(pathPowerBI)
session = restorePowerPivot("D:/Downloads/PowerPivotTutorialSample.xlsx", "D:/", 60000, pathPowerBI)
df, metadf = runQuery("EVALUATE dbo_DimProduct", 60000, 1)
endSession(session)  #shuts the AS engine down; defined in the more detailed write-up linked above
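With the engine up, you can also enumerate the model's tables through a schema rowset (DMV) and dump each one to CSV along these lines. This is an untested sketch: the exact column names returned by $SYSTEM.DBSCHEMA_TABLES and the quoting of table names in the EVALUATE call may need adjusting:
#list the tables in the restored model via a DMV query
tables = runQuery("SELECT * FROM $SYSTEM.DBSCHEMA_TABLES", 60000, 0)

for name in tables["TABLE_NAME"]:
    if name.startswith("$"):  #skip internal rowsets
        continue
    tdf = runQuery("EVALUATE '{0}'".format(name), 60000, 0)
    tdf.to_csv("{0}.csv".format(name), index=False)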
The problem with getting data out of PowerPivot is that the tabular engine in PowerPivot runs in-process inside Excel, and the only way to connect to that engine is to have your code running inside Excel too. (I suspect that it may use shared memory or some other transport, but it's definitely not listening on a TCP port or a named pipe or anything like that which would allow an external process to connect.)
We do this in Dax Studio by running a C# VSTO Excel add-in inside Excel. However, that was only designed for testing analytic queries, not for bulk data extraction. We marshal the data across from the add-in to the UI using a string variable, so the entire dataset must be less than 2 GB or the response gets truncated and you will see an "unrecognizable response" error (the data is serialized into an XMLA rowset, which is quite verbose, so you may see it break when extracting only a few hundred MB of data).
If you want to build a script to automate extracting all the raw data from a model, I don't think you will be able to do it with Python, as I don't believe you can get the Python interpreter running in-process inside Excel. I would look at using a VBA macro like this one: http://www.powerpivotblog.nl/export-a-table-or-dax-query-from-power-pivot-to-csv-using-vba/
You should find that you can query the model for a list of tables with something like "SELECT * FROM $SYSTEM.DBSCHEMA_TABLES" - you could then loop over each table and extract the data with a variation of the code in the link above.
I got in touch with Tom Gleeson (aka Gobán Saor) who was kind enough to let me post his emails here. There are some interesting nuggets in them, so hopefully others will also find them useful.
Email #1
When you say Python, do you mean running Python.NET as a standalone exe? If that's the case, you're out of luck with Excel PP models (a different story for Power BI Desktop, though). I've accessed PP models (2010+) successfully from both VBA and Python.NET (via AMO) using code similar to that in your SO question. The difference (in both the VBA and .NET versions) is that my code runs in-process within Excel, using Excel's various add-in technologies. (Likely Tableau is also running as an add-in, or has embedded Excel within itself, enabling similar behaviour.) DAX Studio (a useful C# code base for learning the how-tos of PP access) runs both as an Excel add-in and as a standalone EXE, but only as an add-in can it access Excel-based PP models.
Email #2
You might find the process of using Python.NET for this somewhat challenging. You would need to embed a Python engine using C#/VB.NET Excel add-in code. I've used Excel-DNA (a fantastic open-source project) rather than MS's highly cumbersome "official" method for developing such .NET add-ins in the past, but I mainly stick to VBA where at all possible.
Using VBA you'll not be able to access the .NET-only AMO (so no ability to create calculated columns on the fly), but by loading the resulting dataset into an ADO recordset you should be able to output to a worksheet OR to a corporate database/MS Access OR to a flat file/CSV etc.
Unlike the 1M-row worksheet limit, for flat-file or database output memory (RAM) will be the limiting factor. Assuming you're using 64-bit Excel, you should be okay if you have enough memory to hold the compacted model plus the workspace for the largest of the model's tables in un-compacted form (i.e. the row-based rather than column-based format that results from a DAX query), multiplied by 2-ish (one instance within the PP workspace, the other within VBA's ADO workspace).
Having said that, I've never attempted extracting a very large dataset, and using models as a dataset-exchange medium is not one of PP's "use cases"; so very large tables might hit some other bug/constraint!
I have a script that stores results in PDF format in a particular folder. I want to create a MySQL database (which is successful with the code below) and populate it with the PDF results. What would be the best way: storing the file itself, or a reference to its location? The file size would be around 2 MB. Could someone help explain this with some working examples? I am new to both Python and MySQL. Thanks in advance.
To clarify further: I tried using LOAD DATA INFILE and the BLOB type for the result-file column, but it doesn't seem to work. I am using the PyMySQL API module to connect to the database. The code below connects to the database successfully.
import pymysql

conn = pymysql.connect(host='hostname', port=3306, user='root', passwd='abcdef', db='mydb')
cur = conn.cursor()
cur.execute("SELECT * FROM userlogin")
for r in cur.fetchall():
    print(r)
cur.close()
conn.close()
Since you seem close to getting MySQL to store strings for you (user names), your best bet is to stick with that approach and store the file path, just as you stored the strings in your userlogin table (but in a different table, with a foreign key to userlogin). It will probably be the most efficient approach in the long run anyway, especially if you store important metadata along with the file path (like keywords, or even complete n-gram sets). At that point you are talking about a file-indexing system like Google Desktop or Xapian - just so you know what you're up against if you want to do this the "best" way.
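A minimal sketch of that path-plus-metadata approach with PyMySQL (the pdf_report table, its columns, and the sample values are made up for illustration):
import pymysql

conn = pymysql.connect(host='hostname', port=3306, user='root',
                       passwd='abcdef', db='mydb')
cur = conn.cursor()

#one row per report file: a foreign key back to userlogin,
#the path on disk, and whatever metadata helps you search later
cur.execute("""CREATE TABLE IF NOT EXISTS pdf_report (
                   id INT AUTO_INCREMENT PRIMARY KEY,
                   userlogin_id INT,
                   filepath VARCHAR(512),
                   keywords TEXT
               )""")

cur.execute("INSERT INTO pdf_report (userlogin_id, filepath, keywords) "
            "VALUES (%s, %s, %s)",
            (1, '/results/2015/report_001.pdf', 'dates, amounts, lbs'))
conn.commit()

cur.close()
conn.close()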