Using Python, I am looping through a number of SQL Server databases and creating tables using SELECT INTO, but when I run the script nothing happens, i.e. there are no error messages and the tables are not created. Below is an extract of what I am doing. Can anyone advise?
df = [] # dataframe of database names as example
for i, x in df.iterrows():
    SQL = """
    Drop table if exists {x}..table
    Select
        Name
    Into
        {y}..table
    From
        MainDatabase..Details
    """.format(x=x['Database'], y=x['Database'])
    cursor.execute(SQL)
    conn.commit()
It looks like your DB driver doesn't support multiple statements in a single execute. Try splitting your query into two single statements, one with the drop and the other with the select:
for i, x in df.iterrows():
    drop_sql = """
    Drop table if exists {x}..table
    """.format(x=x['Database'])
    select_sql = """
    Select
        Name
    Into
        {y}..table
    From
        MainDatabase..Details
    """.format(x=x['Database'], y=x['Database'])
    cursor.execute(drop_sql)
    cursor.execute(select_sql)
    cursor.commit()
And a second tip: your x=x['Database'] and y=x['Database'] are the same value; is this intentional?
I have queries stored (like select * from table) in a Snowflake table and want to execute each query row by row and generate a CSV file for each one. Below is the Python code where I am able to print the queries, but I don't know how to execute each query and create a CSV file:
I believe I am close to what I want to achieve. I would really appreciate it if someone could help here.
import pyodbc
import pandas as pd
import snowflake.connector
import os
conn = snowflake.connector.connect(
user = 'User',
password = 'Pass',
account = 'Account',
autocommit = True
)
try:
    cursor = conn.cursor()
    query = ('Select Column from Table;')  # This will return two select statements
    output = cursor.execute(query)
    for i in cursor:
        print(i)
    cursor.close()
    del cursor
    conn.close()
except Exception as e:
    print(e)
You're pretty close. You just need to execute each query instead of printing it, and put the data into a file.
I haven't used pandas much myself, but this is the code that the Snowflake documentation provides for running a query and putting it into a pandas dataframe.
cursor = conn.cursor()
query = ('Select Column, row_number() over(order by Column) as Rownum from Table;')
cursor.execute(query)
resultset = cursor.fetchall()
for result in resultset:
    cursor.execute(result[0])
    df = cursor.fetch_pandas_all()
    df.to_csv(r'C:\Users\...<your filename here>' + str(result[1]), index=False)
It may take some fiddling, but here are a couple of links for reference:
Snowflake docs on creating a pandas dataframe
Exporting a pandas dataframe to csv
Update: added an example of a way to create separate files for each record. This just adds a distinct number to each row of your SQL output so you can use that number as part of the filename. Ultimately, you need some logic in your loop to create a filename, whether that's a random number, a timestamp, or whatever; it can come from the SQL or from the Python, up to you. I'd probably add a filename column to your table, but I don't know if that makes sense for you.
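If it's easier to generate the number on the Python side instead of in SQL, here is a minimal sketch of the same loop using enumerate() as the counter; the output folder is a placeholder I made up, and the query/table names are the ones from the question:
cursor = conn.cursor()
cursor.execute('Select Column from Table;')
for n, (query_text,) in enumerate(cursor.fetchall(), start=1):
    cursor.execute(query_text)        # run the stored query itself
    df = cursor.fetch_pandas_all()    # pull the result into a pandas dataframe
    df.to_csv(r'C:\output\query_{}.csv'.format(n), index=False)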
I am trying to join two tables on a column and then populate a new table with the query results.
I know that the join command gives me the table data I want, but how do I insert this data into a new table without having to loop through the results, since there are many unique column names? Is there a way to do this with an SQLite command? Doing it without an SQLite command would require nested for loops and become computationally expensive (if it even works).
Join command that works:
connection = sqlite3.connect("database1.db")
c = connection.cursor()
c.execute("ATTACH DATABASE 'database1.db' AS db_1")
c.execute("ATTACH DATABASE 'database2.db' AS db_2")
c.execute("SELECT * FROM db_1.Table1Name AS a JOIN db_2.Table2Name AS b WHERE a.Column1 = b.Column2")
My attempt at a combined join-and-insert command, which does not error but does not populate the table:
c.execute("INSERT INTO 'NewTableName' SELECT * FROM db_1.Table1Name AS a JOIN db_2.Table2Name AS b WHERE a.Column1 = b.Column2")
The SQL part is:
CREATE TABLE new_table AS
SELECT expressions
FROM existing_tables
[WHERE conditions];
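A minimal sketch of running that through the existing sqlite3 connection, reusing the attached databases from the question (NewTableName is a placeholder; if the joined tables share any column names, list the columns explicitly instead of using SELECT *):
import sqlite3

connection = sqlite3.connect("database1.db")
c = connection.cursor()
c.execute("ATTACH DATABASE 'database1.db' AS db_1")
c.execute("ATTACH DATABASE 'database2.db' AS db_2")
# CREATE TABLE ... AS SELECT builds and populates the new table in one statement
c.execute("CREATE TABLE NewTableName AS "
          "SELECT * FROM db_1.Table1Name AS a "
          "JOIN db_2.Table2Name AS b ON a.Column1 = b.Column2")
connection.commit()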
Is there a way to return the aliased column names from a SQL query executed through JayDeBeApi?
For example, I have the following query:
sql = """ SELECT visitorid AS id_alias FROM table LIMIT 1 """
I then run the following (connect_to_vdm() establishes a connection to my DB):
curs = connect_to_vdm().cursor()
curs.execute(sql)
vals = curs.fetchall()
I normally retrieve column names like so:
desc = curs.description
column_names = [col[0] for col in desc]
This returns the original column name "visitorid" and not the alias specified in the query, "id_alias".
I know I could swap the names for the values in Python, but I am hoping to have this done within the query since the alias is already defined in the SELECT statement. This behaves as expected in a SQL client, but I cannot seem to get the aliases to return when using Python/JayDeBeApi. Is there a way to do this using JayDeBeApi?
EDIT:
I have discovered that structuring my query with a CTE seems to fix the problem, but I am still wondering if there is a more straightforward solution out there. Here is how I rewrote the same query:
sql = """ WITH cte (id_alias) AS (SELECT visitorid AS id_alias FROM table LIMIT 1) SELECT id_alias from cte"""
I was able to fix this using a CTE (Common Table Expression):
sql = """ WITH cte (id_alias) AS (SELECT visitorid AS id_alias FROM table LIMIT 1) SELECT id_alias from cte"""
Hat tip to pybokeh on Github, but this worked for me.
According to IBM (here and here), the behavior of JDBC drivers changed at some point. Bizarrely, the column aliases display just fine when using a tool like DBVisualizer, but not when querying through JayDeBeApi.
To fix, add the following to the end of your DB URL:
:useJDBC4ColumnNameAndLabelSemantics=false;
Example:
jdbc:db2://[DBSERVER]:[PORT]/[DBNAME]:useJDBC4ColumnNameAndLabelSemantics=false;
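For reference, a minimal sketch of passing that URL through jaydebeapi.connect for a DB2 setup; the host, port, database name, credentials, and JAR path below are placeholders, not values from the original post:
import jaydebeapi

conn = jaydebeapi.connect(
    "com.ibm.db2.jcc.DB2Driver",
    "jdbc:db2://dbserver:50000/dbname:useJDBC4ColumnNameAndLabelSemantics=false;",
    ["user", "password"],
    "/path/to/db2jcc4.jar")
curs = conn.cursor()
curs.execute("SELECT visitorid AS id_alias FROM table LIMIT 1")
print([col[0] for col in curs.description])  # should now report the alias id_alias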
I am trying to upload data from a CSV file (it's on my local desktop) to my remote SQL database. This is my code:
dsn = "dsnname";pwd="password"
import pyodbc
csv_data =open(r'C:\Users\folder\Desktop\filename.csv')
def func(dsn):
cnnctn=pyodbc.connect(dsn)
cnnctn.autocommit =True
cur=cnnctn.cursor()
for rows in csv_data:
cur.execute("insert into database.tablename (colname) value(?)", rows)
cur.commit()
cnnctn.commit()
cur.close()
cnnctn.close()
return()
c=func(dsn)
The problem is that all of my data gets uploaded into the one column that I specified. If I don't specify a column name it won't run. I have 9 columns in my database table and I want to upload this data into separate columns.
When you insert with SQL, you need to tell it which columns you want to insert into. For example, when you execute:
INSERT INTO table (column_name) VALUES (val);
You are letting SQL know that you want to map column_name to val for that specific row. So you need to make sure that the number of columns in the first set of parentheses matches the number of values in the second set.
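In your case that means splitting each CSV line into its fields and supplying one placeholder per column. A minimal sketch, assuming nine hypothetical column names col1 through col9 on database.tablename:
import csv
import pyodbc

def load_csv(dsn, path):
    cnnctn = pyodbc.connect(dsn)
    cnnctn.autocommit = True
    cur = cnnctn.cursor()
    with open(path, newline='') as f:
        for row in csv.reader(f):  # row is a list of the nine field values
            cur.execute(
                "insert into database.tablename "
                "(col1, col2, col3, col4, col5, col6, col7, col8, col9) "
                "values (?, ?, ?, ?, ?, ?, ?, ?, ?)",
                row)
    cur.close()
    cnnctn.close()

load_csv("dsnname", r'C:\Users\folder\Desktop\filename.csv')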
My Python code produces a table with weeks as columns and URLs accessed as rows. To get the data for each cell, a query is executed against a MySQL database. The code runs very slowly. I've added indexes to the MySQL tables and this has not really helped. I thought it was because I was building the HTML table code with concatenation, but even using a list and join has not fixed the speed. The code runs slowly in both Django (using an additional database connection) and standalone Python. Any help in speeding this up would be appreciated.
Example of a query that is called from a loop:
def get_postcounts(week):
    pageviews = 0
    cursor = connections['olap'].cursor()
    sql = "SELECT SUM(F.pageview) AS pageviews FROM fact_coursevisits F INNER JOIN dim_dates D ON F.Date_Id = D.Id WHERE D.date_week=%d;" % (week)
    row_count = cursor.execute(sql)
    result = cursor.fetchall()
    for row in result:
        if row[0] is not None:
            pageviews = int(row[0])
    cursor.close()
    return pageviews
It could be because of the number of queries that you are executing (if you are having to call this method a lot).
I would suggest querying the view count and the week over the whole period in one single query and reading the results off that.
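For example, here is a minimal sketch of that idea (the function name get_all_postcounts is mine), assuming the same fact_coursevisits and dim_dates schema as the question; one GROUP BY query replaces all the per-week calls:
def get_all_postcounts():
    cursor = connections['olap'].cursor()
    cursor.execute(
        "SELECT D.date_week, SUM(F.pageview) AS pageviews "
        "FROM fact_coursevisits F "
        "INNER JOIN dim_dates D ON F.Date_Id = D.Id "
        "GROUP BY D.date_week")
    # one round trip; weeks with no visits are simply absent from the dict
    counts = {row[0]: int(row[1]) for row in cursor.fetchall() if row[1] is not None}
    cursor.close()
    return counts
In the loop that builds the HTML table, a single call to get_all_postcounts() up front followed by .get(week, 0) lookups then replaces each get_postcounts(week) query.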