I currently have a remote SQL Server instance with multiple databases on it. I connect through Python using the pymssql package and extract data into pandas before applying some analysis. Is there a way to iterate so that with each loop the database number changes, allowing a new database's data to be analysed?
E.g.
# connect to server
cursor.execute("SELECT TOP 100 <variable name> FROM <database_1>")
# analyse
# disconnect from server
Ideally I would have a loop that automatically reads data from, say, database_1 through to database_10.
IIUC you can easily do this using the read_sql() method:
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mssql+pymssql://USER:PWD@hostname/db_name')

for i in range(1, 11):
    # three-part name (database.schema.table) so each iteration reads from a different database
    qry = 'SELECT TOP 100 column_name FROM database_{}.dbo.table_name'.format(i)
    df = pd.read_sql(qry, engine)
    # analyse ...
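If you'd rather mirror the connect/analyse/disconnect pattern from the pseudocode above, creating one engine per database works too; a minimal sketch, with the URL details as placeholders:

import pandas as pd
from sqlalchemy import create_engine

for i in range(1, 11):
    # one engine (and connection pool) per database
    engine = create_engine('mssql+pymssql://USER:PWD@hostname/database_{}'.format(i))
    df = pd.read_sql('SELECT TOP 100 column_name FROM dbo.table_name', engine)
    # analyse ...
    engine.dispose()  # release this database's connections before moving on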
I am using pyodbc to run a query to create a temp table from a bunch of other tables. I then want to pull that whole temp table into pandas, but my pd.read_sql call takes upwards of 15 minutes. I want to try the connectorX library to see if it will speed things up.
For pandas, the working way to query the temp table looks like:
import pandas as pd
import pyodbc

conn = pyodbc.connect("connection string")
cursor = conn.cursor()
cursor.execute("""Do a bunch of stuff that ultimately creates one #finalTable""")
df = pd.read_sql("SELECT * FROM #finalTable", con=conn)
I've been reading the documentation and it appears I can only pass a connection string to the connectorx.read_sql function, and I haven't been able to find a way to pass it an existing connection that carries the temp table I need.
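For reference, connectorx.read_sql is normally called with a connection URI string plus a query, along the lines of the sketch below (the MSSQL URI details are assumptions); because connectorX opens its own connections, a session-scoped #temp table created on a separate pyodbc connection would presumably not be visible to it:

import connectorx as cx

# connectorx builds its own connection(s) from the URI string
df = cx.read_sql(
    "mssql://user:password@host:1433/database",   # placeholder credentials
    "SELECT * FROM dbo.some_permanent_table",
    return_type="pandas",
)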
Am I able to query the temp table with connectorX? If so, how?
If not, what would be a faster way to query a large temp table?
Thanks!
I'm attempting to use Python with SQLAlchemy to download some data, create a temporary staging table on a Teradata server, and then MERGE that table into another table which I've created to permanently store this data. I'm using sql = sqlalchemy.text(merge) and td_engine.execute(sql), where merge is a string similar to the below:
MERGE INTO perm_table AS p
USING temp_table AS t
    ON p.Id = t.Id
WHEN MATCHED THEN
    UPDATE
    SET col1 = t.col1,
        col2 = t.col2,
        ...
        col50 = t.col50
WHEN NOT MATCHED THEN
    INSERT (col1,
            col2,
            ...
            col50)
    VALUES (t.col1,
            t.col2,
            ...
            t.col50)
The script runs all the way to the end without error and the SQL executes properly through Teradata Studio, but for some reason the table won't update when I execute it through SQLAlchemy. However, I've also run different SQL expressions, like the insert that populated perm_table from the same python script and it worked fine. Maybe there's something specific to the MERGE and SQLAlchemy combo?
Since you're using the engine directly, without using a transaction, you're probably (barring unseen configuration on your part) relying on SQLAlchemy's version of autocommit, which works by detecting data changing operations such as INSERTs etc. Possibly MERGE is not one of the detected operations. Try
sql = sqlalchemy.text(merge).execution_options(autocommit=True)
td_engine.execute(sql)
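Alternatively, running the statement inside an explicit transaction avoids relying on autocommit detection altogether; a minimal sketch using the same engine and merge string as above:

from sqlalchemy import text

# commits on success, rolls back if the MERGE raises
with td_engine.begin() as conn:
    conn.execute(text(merge))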
I'm trying to replace some old MSSQL stored procedures with Python, in an attempt to take some of the heavy calculations off the SQL Server. The part of the procedure I'm having issues replacing is as follows:
UPDATE mytable
SET calc_value = tmp.calc_value
FROM dbo.mytable mytable
INNER JOIN #my_temp_table tmp
    ON mytable.a = tmp.a AND mytable.b = tmp.b AND mytable.c = tmp.c
WHERE (mytable.a = some_value)
  AND (mytable.x = tmp.x)
  AND (mytable.b = some_other_value)
Up to this point, I've made some queries with SQLAlchemy, stored that data in DataFrames, and done the requisite calculations on them. I now don't know how to put the data back into the server using SQLAlchemy, either with raw SQL or function calls. The dataframe I have on my end would essentially have to work in place of the temporary table created in MSSQL Server, but I'm not sure how I can do that.
The difficulty, of course, is that I don't know of a way to join a dataframe against an MSSQL table, and I'm guessing that isn't possible directly, so I'm looking for a workaround.
As the pandas docs suggest here:
from sqlalchemy import create_engine

engine = create_engine("mssql+pyodbc://user:password@DSN", echo=False)
dataframe.to_sql('tablename', engine, if_exists='replace')
The engine parameter for MSSQL is basically the connection string; check it here.
The if_exists parameter is a bit tricky, since 'replace' actually drops the table first, then recreates it and inserts all the data at once.
Setting the echo attribute to True shows all the background logging and generated SQL.
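One way to stand in for the temp table from the original procedure is to push the dataframe to a staging table with to_sql and then run the UPDATE ... JOIN server-side; a rough sketch, where the staging table name and join columns are assumptions:

from sqlalchemy import create_engine, text

engine = create_engine("mssql+pyodbc://user:password@DSN")

# upload the calculated results to a (hypothetical) staging table
dataframe.to_sql('my_staging_table', engine, if_exists='replace', index=False)

# join the staging table against the target table on the server
update_sql = text("""
    UPDATE mytable
    SET calc_value = tmp.calc_value
    FROM dbo.mytable mytable
    INNER JOIN dbo.my_staging_table tmp
        ON mytable.a = tmp.a AND mytable.b = tmp.b AND mytable.c = tmp.c
""")
with engine.begin() as conn:
    conn.execute(update_sql)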
I currently have a Python dataframe that is 23 columns and 20,000 rows.
Using Python code, I want to write my data frame into a MSSQL server that I have the credentials for.
As a test I am able to successfully write some values into the table using the code below:
import pypyodbc

connection = pypyodbc.connect('Driver={SQL Server};'
                              'Server=XXX;'
                              'Database=XXX;'
                              'uid=XXX;'
                              'pwd=XXX')
cursor = connection.cursor()

# the kind of per-row insert I'd need for every row (dataframe column names assumed to match the table)
for index, row in df_EVENT5_15.iterrows():
    cursor.execute("INSERT INTO MODREPORT(rowid, OPCODE, LOCATION, TRACKNAME) VALUES (?, ?, ?, ?)",
                   (row['rowid'], row['OPCODE'], row['LOCATION'], row['TRACKNAME']))

# as a test, inserting a couple of hard-coded values works
cursor.execute("INSERT INTO MODREPORT(rowid, location) VALUES (?, ?)", (5, 'test'))
connection.commit()
But how do I write all the rows in my data frame table to the MSSQL server? In order to do so, I need to code up the following steps in my Python environment:
Delete all the rows in the MSSQL server table
Write my dataframe to the server
When you say Python data frame, I'm assuming you're using a Pandas dataframe. If that's the case, then you could use the to_sql function.
df.to_sql("MODREPORT", connection, if_exists="replace")
With the if_exists argument set to "replace", the existing table is dropped and recreated before the records are written.
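Note that recent pandas versions expect an SQLAlchemy connectable (or a SQLite connection) for to_sql rather than a raw pypyodbc connection, so in practice this might look more like the following sketch (connection details are placeholders):

import pandas as pd
from sqlalchemy import create_engine

# SQLAlchemy engine over pyodbc; swap in your own server, database and credentials
engine = create_engine("mssql+pyodbc://uid:pwd@server/database?driver=SQL+Server")

df_EVENT5_15.to_sql("MODREPORT", engine, if_exists="replace", index=False)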
I realise it's been a while since you asked but the easiest way to delete ALL the rows in the SQL server table (point 1 of the question) would be to send the command
TRUNCATE TABLE Tablename
This will remove all the data from the table but leave the table structure and indexes in place, so you or the DBA would not need to recreate it. It also uses less of the transaction log when it runs.
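Putting the two steps together, a rough sketch might look like this, reusing the pypyodbc connection from the question for the TRUNCATE and an SQLAlchemy engine (as in the sketch above) for the bulk write:

# step 1: empty the target table, keeping its structure and indexes
cursor.execute("TRUNCATE TABLE MODREPORT")
connection.commit()

# step 2: bulk-write the dataframe, appending into the now-empty table
df_EVENT5_15.to_sql("MODREPORT", engine, if_exists="append", index=False)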
I'm trying to migrate data from a MySQL DB to HANA utilizing Python. The way we're currently implementing this migration at work is manually but the plan is to run a script everyday to collect data from the prior day (stored in MySQL) and move it to HANA to use their analytics tools. I have written a script with 2 functions, one that connects to MySQL and temporarily stores the data from the query in a Pandas Dataframe. The second function uses the sqlalchemy-hana connector to create an engine that I feed into Pandas' to_sql function to store the data into HANA.
Below is the first function, which connects to MySQL:
import pandas
import mysql.connector as myscon
from mysql.connector import errorcode

def connect_to_mysql(query):
    stagedb = None
    df = None
    try:
        # connect to the db
        stagedb = myscon.connect(
            user='user-name',
            password='password',
            host='awshost.com',
            database='sampletable',
            raise_on_warnings=True)
        df = pandas.read_sql(query, stagedb)
    except myscon.Error as err:
        if err.errno == errorcode.ER_ACCESS_DENIED_ERROR:
            print('Incorrect user name or password')
        elif err.errno == errorcode.ER_BAD_DB_ERROR:
            print('Database does not exist')
        else:
            print(err)
    finally:
        if stagedb is not None:
            stagedb.close()
    return df
This is the second function, which connects to HANA:
from sqlalchemy import create_engine

def connect_to_hana(query):
    try:
        # connect to the HANA db
        engine = create_engine('hana://username:password@host:port')
        # return dataframe from first function
        to_df = connect_to_mysql(query)
        to_df.to_sql('sample_data', engine, if_exists='append', index=False, chunksize=20000)
    except:
        raise
My HANA DB has several schemas in the catalog folder, many of them "SYS"- or "_SYS"-related. I have created a separate schema to test my code on and play around in, which has the same name as my username.
My questions are as follows: 1) Is there a more efficient way to load data from MySQL to HANA without a go-between like a CSV file or, in my case, a Pandas Dataframe? Using VS Code it takes around 90 seconds for the script to complete. 2) When using the sqlalchemy-hana connector, how does it know which schema to create the table in and store/append the data to? The read-me file didn't really explain. Luckily it's storing it in the right schema (the one with my username), but I created another one as a test and of course the table didn't show up under that one. If I try to specify the database in the create_engine line like so:
engine = create_engine('hana://username:password@host:port/Username')
I get this error: TypeError: connect() got an unexpected keyword argument 'database'.
Also, I noticed that if I run my script twice and count the number of rows in the created table, it adds the rows twice, essentially creating duplicates. Because of this, 3) would it be better to iterate through the rows of the Dataframe and insert the rows one by one using the pyhdb package?
Any advice/suggestions/answers will be very much appreciated! Thank you!
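(For what it's worth on question 2: pandas' to_sql accepts an explicit schema argument, so the target schema does not have to be inferred from the connection; a minimal sketch, assuming the schema really is named after the user:)

# write into an explicitly named HANA schema instead of the connection default
to_df.to_sql('sample_data', engine, schema='USERNAME',
             if_exists='append', index=False, chunksize=20000)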
Gee... that seems like a rather complicated workflow. Alternatively, you may want to check the HANA features Smart Data Access (SDA) and Smart Data Integration (SDI). With these, you could either establish "virtual" data access in SAP HANA, meaning the data is read from the MySQL DB into the HANA process only when you run your analytics query, or you could actually load the data into HANA, making it a data mart.
If it is really just about the "piping" for this data transfer, I probably wouldn't put 3rd party tools into the scenario. This only makes the setup more complicated than necessary.