I am working on pushing data from DBF files on a UNC path into a SQL Server database. There are about 50 DBF files, each with a different schema. I know I could write a program that lists all 50 tables and all 50 DBF files, but that would take forever. Is there a way to derive the DBF field names to build the insert, rather than going through every DBF and typing out every field name? Here's the code I have right now; it inserts records from two fields in one DBF file.
import pyodbc
from dbfread import DBF

# SQL Server Connection Test
cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER=**********;DATABASE=TEST_DBFIMPORT;UID=test;PWD=test')
cursor = cnxn.cursor()

dir = 'E:\\Backups\\'
table = DBF('E:\\Backups\\test.dbf', lowernames=True)

for record in table.records:
    rec1 = record['field1']
    rec2 = record['field2']
    cursor.execute("insert into tblTest (column1,column2) values(?,?)", rec1, rec2)

cnxn.commit()
Some helpful hints using my dbf package:
import dbf
import os

for filename in os.listdir('e:/backups'):
    with dbf.Table('e:/backups/' + filename) as table:
        fields = dbf.field_names(table)
        for record in table:
            values = list(record)
            # insert fields, values using odbc
If you want to transfer all fields, then you'll need to calculate the table name, the field names, and the values; some examples:
sql_table = os.path.splitext(filename)[0]
place_holders = ','.join(['?'] * len(fields))   # one ? per field, computed before joining the names
fields = ','.join(fields)
values = tuple(record)
sql = "insert into %s (%s) values (%s)" % (sql_table, fields, place_holders)
cursor.execute(sql, *values)
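Putting those pieces together, a minimal sketch of a fully generic loader using dbfread and pyodbc might look like the following. The connection string and directory are the placeholders from the question, and it assumes each DBF file maps to a SQL Server table with the same name and matching column names:
import os
import pyodbc
from dbfread import DBF

# Placeholder connection string from the question; adjust for your environment.
cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER=**********;DATABASE=TEST_DBFIMPORT;UID=test;PWD=test')
cursor = cnxn.cursor()

backup_dir = 'E:\\Backups\\'

for filename in os.listdir(backup_dir):
    if not filename.lower().endswith('.dbf'):
        continue
    table = DBF(os.path.join(backup_dir, filename), lowernames=True)
    sql_table = os.path.splitext(filename)[0]                 # assumes SQL table is named after the file
    columns = ','.join(table.field_names)                     # field names straight from the DBF header
    placeholders = ','.join(['?'] * len(table.field_names))
    sql = "insert into %s (%s) values (%s)" % (sql_table, columns, placeholders)
    for record in table:                                      # each record is an ordered mapping of field -> value
        cursor.execute(sql, list(record.values()))
    cnxn.commit()

cnxn.close()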
With the following commands I am trying to upload a CSV file whose columns are separated by tabs; null values can sometimes be assigned to a column.
conn = psycopg2.connect(host="localhost",
                        port="5432",
                        user="postgres",
                        password="somepwd",
                        database="mydb",
                        options="-c search_path=dbo")
...
cur = conn.cursor()
with open(opath, "r") as opath_file:
    next(opath_file)  # skip the header row
    cur.copy_from(opath_file, table_name[3:], null='', columns=cols.split(','))
cols is a string with the column names separated by ','.
The table named table_name[3:] belongs to the dbo schema.
This code runs and no error is reported, but no data is uploaded. The owner of the db is postgres.
Any ideas?
Would you believe the problem was that I needed to run
conn.commit()
after the cur.copy_from call?
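For completeness, a minimal sketch of the corrected flow, reusing the placeholder connection values from the question (opath, table_name, and cols are stand-ins for the real values):
import psycopg2

# Placeholder values; the real opath, table_name, and cols come from the question's setup.
opath = "data.tsv"
table_name = "xx_mytable"          # the question strips a 3-character prefix
cols = "col1,col2,col3"

conn = psycopg2.connect(host="localhost", port="5432", user="postgres",
                        password="somepwd", database="mydb",
                        options="-c search_path=dbo")
cur = conn.cursor()

with open(opath, "r") as opath_file:
    next(opath_file)  # skip the header row
    cur.copy_from(opath_file, table_name[3:], null='', columns=cols.split(','))

conn.commit()  # without the commit, the copied rows are discarded when the connection closes
conn.close()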
I am new to AWS Glue and other AWS services. I have a requirement to build an ETL framework for a project.
This is the high-level diagram. I want to understand whether, instead of creating 400 Glue pipelines, I can create a single template-style job driven by reference data from Postgres Aurora/MySQL. I am familiar with Python.
Does anyone have ideas on this? Any references or code examples?
We had a config master table in our MySQL db. For convenience we used source_table_name as the identifier column to fetch the appropriate column names/queries for creating the staging table, loading data into the staging table, inserting/updating into the target tables, etc.
We also split the INSERT/UPDATE into two different columns in the config master, since we were using ON DUPLICATE KEY UPDATE to update existing records.
Get the source table name by processing the Lambda event, which carries the landing file name, for example:
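A minimal sketch of that step, assuming the job is triggered by an S3 object-created event and that the landing file name (minus its extension) matches src_tbl_name in CONFIG_MASTER; both are assumptions, so adjust the parsing to your own naming convention:
import os

def get_source_fname(event):
    """Derive the source table identifier from an S3-triggered Lambda event.

    Assumes the landing file name (without extension) matches src_tbl_name
    in CONFIG_MASTER; adjust to your own naming convention.
    """
    s3_key = event['Records'][0]['s3']['object']['key']
    landing_file = os.path.basename(s3_key)
    return os.path.splitext(landing_file)[0]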
Fetch all the required data from the config master for that source table name. It would be something like the following:
sql_query = "SELECT * FROM {0}.CONFIG_MASTER WHERE src_tbl_name = %s ".format(mydb)
cur.execute(sql_query, (source_fname,))
result = cur.fetchall()

for row in result:
    stg_table_name = row[1]
    tgt_table_name = row[2]
    create_stg_table_qry = row[3]
    load_data_stg_table_qry = row[4]
    insert_tgt_table_qry = row[5]
    insert_tgt_table_qry_part_1 = row[6]
    insert_tgt_table_qry_part_2 = row[7]

conn.commit()
cur.close()
Pass appropriate parameters to the generic functions as below:
create_stg_table(stg_table_name, create_stg_table_qry, load_data_stg_table_qry)
loaddata(tgt_table_name, insert_tgt_table_qry_part_1, insert_tgt_table_qry_part_2, stg_table_name)
The generic functions would be something like the ones below. This is for Aurora RDS, so please make changes as needed.
def create_stg_table(stg_table_name, create_stg_table_qry, load_data_stg_table_qry):
    cur, conn = connect()
    createStgTable1 = "DROP TABLE IF EXISTS {0}.{1}".format(mydb, stg_table_name)
    createStgTable2 = "CREATE TABLE {0}.{1} {2}".format(mydb, stg_table_name, create_stg_table_qry)
    loadQry = "LOAD DATA FROM S3 PREFIX 's3://' REPLACE INTO TABLE ...".format()
    cur.execute(createStgTable1)
    cur.execute(createStgTable2)
    cur.execute(loadQry)
    conn.commit()
    conn.close()

def loaddata(tgt_table_name, insert_tgt_table_qry_part_1, insert_tgt_table_qry_part_2, stg_table_name):
    cur, conn = connect()
    insertQry = "INSERT INTO target table, from the staging table query here"
    print(insertQry)
    cur.execute(insertQry)
    conn.commit()
    conn.close()
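The connect() helper used by these functions isn't shown above. A minimal sketch, assuming Aurora MySQL reached through the pymysql driver (the endpoint, credentials, and database name are placeholders):
import pymysql

# Placeholder credentials; in practice these would come from environment
# variables, Secrets Manager, or the Glue job parameters.
mydb = "etl_config"

def connect():
    conn = pymysql.connect(host="aurora-cluster-endpoint",
                           user="etl_user",
                           password="etl_password",
                           database=mydb)
    cur = conn.cursor()
    return cur, conn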
Hope this gives an idea.
Thanks
I am using Python to establish a db connection and read a CSV file. For each line in the CSV I want to run a PostgreSQL query and get the value corresponding to that line.
The DB connection and file reading work fine. If I run the query with a hardcoded value it also works, but when I run the query for each row of the CSV file using a Python variable, I do not get the correct value.
cursor.execute("select team from users.teamdetails where p_id = '123abc'")
The above query works fine.
But when I try it with values fetched from the CSV file, I do not get the correct value:
cursor.execute("select team from users.teamdetails where p_id = queryPID")
Complete code for Reference:
import psycopg2
import csv

conn = psycopg2.connect(dbname='', user='', password='', host='', port='')
cursor = conn.cursor()

with open('playerid.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)
    for line in csv_reader:
        queryPID = line[0]
        cursor.execute("select team from users.teamdetails where p_id = queryPID")
        team = cursor.fetchone()
        print(team[0])

conn.close()
DO NOT concatenate the csv data. Use a parameterised query.
Use %s inside your string, then pass the additional variable:
cursor.execute('select team from users.teamdetails where p_id = %s', (queryPID,))
Concatenation of text leaves your application vulnerable to SQL injection.
https://www.psycopg.org/docs/usage.html
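Applied to the loop from the question, the corrected version looks roughly like this (the empty connection values are the question's placeholders):
import psycopg2
import csv

# Placeholder connection values, as in the question.
conn = psycopg2.connect(dbname='', user='', password='', host='', port='')
cursor = conn.cursor()

with open('playerid.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)
    for line in csv_reader:
        queryPID = line[0]
        # The %s placeholder lets psycopg2 pass the value safely as a query parameter.
        cursor.execute('select team from users.teamdetails where p_id = %s', (queryPID,))
        team = cursor.fetchone()
        if team:
            print(team[0])

conn.close()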
I am trying to import data from an Oracle database into a pandas dataframe.
Right now I am using:
import cx_Oracle
import pandas as pd
db_connection_string = '.../A1#server:port/servername'
con = cx_Oracle.connect(db_connection_string)
query = """SELECT*
FROM Salesdata"""
df = pd.read_sql(query, con=con)
and get the following error: DatabaseError: ORA-00942: table or view does not exist
When I run a query to get the list of all tables:
cur = con.cursor()
cur.execute("SELECT table_name FROM dba_tables")
for row in cur:
    print(row)
The output looks like this:
('A$',)
('A$BD',)
('Salesdata',)
What am I doing wrong? I used this question to get started.
If I follow the comment's suggestion to print(query), I get:
SELECT*
FROM Salesdata
Getting ORA-00942 when running SELECT can have 2 possible causes:
The table does not exist: here you should make sure the table name is prefixed by the table owner (schema name) as in select * from owner_name.table_name. This is generally needed if the current Oracle user connected is not the table owner.
You don't have SELECT privileges on the table. This is also generally needed if the current Oracle user connected is not the table owner.
You need to check both.
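For instance, if the table belongs to another schema, prefixing the owner in the query usually resolves the first cause. A minimal sketch, where owner_name is a placeholder for the actual schema owner and the connection string is the one from the question:
import cx_Oracle
import pandas as pd

# Placeholder connection string from the question; reuse your own values.
con = cx_Oracle.connect('.../A1#server:port/servername')

query = """SELECT *
           FROM owner_name.Salesdata"""   # owner_name is the schema that owns the table
df = pd.read_sql(query, con=con)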
So, I have this empty table which I created (see code below) and I need to load it with data from a csv file, using a Python-SQL connection. As I do this, I need to replace the HTML codes, change to the correct datatypes (clean the file), and finally load it into this empty SQL table.
This is the code I wrote, but without any success; when I check the table in SQL it just returns an empty table:
Python code:
import csv

with open('UFOGB_Observations.csv', 'r') as UFO_Obsr:
    ## Write to the csv file, to clean it and change the html codes:
    with open('UFO_Observations.csv', 'w') as UFO_Obsw:
        for line in UFO_Obsr:
            line = line.replace('&#44', ',')
            line = line.replace('&#39', "'")
            line = line.replace('&#33', '!')
            line = line.replace('&amp;', '&')
            UFO_Obsw.write(line)
##To Connect Python to SQL:
import pyodbc
print('Connecting...')
conn = pyodbc.connect('Trusted_Connection=yes', driver = '{ODBC Driver 13 for SQL Server}', server = '.\SQLEXPRESS', database = 'QA_DATA_ANALYSIS')
print('Connected')
cursor = conn.cursor()
print('cursor established')
cursor.execute('''DROP TABLE IF EXISTS UFO_GB_1;
CREATE TABLE UFO_GB_1 (Index_No VARCHAR(10) NOT NULL, date_time VARCHAR(15) NULL, city_or_state VARCHAR(50) NULL,
country_code VARCHAR(50) NULL, shape VARCHAR (200) NULL, duration VARCHAR(50) NULL,
date_posted VARCHAR(15) NULL, comments VARCHAR(700) NULL);
''')
print('Commands succesfully completed')
#To insert that csv into the table:
cursor.execute('''BULK INSERT QA_DATA_ANALYSIS.dbo.UFO_GB_1
   FROM 'F:\GSS\QA_DATA_ANALYSIS_LEVEL_4\MODULE_2\Challenge_2\TASK_2\UFO_Observations.csv'
   WITH ( fieldterminator = ',', rowterminator = '\n')''')
conn.commit()
conn.close()
I was expecting to see a table with all 1900+ rows, when I type SELECT * FROM table, with correct data types (i.e. date_time and date_posted columns as timestamp)
(Apologies in advance. New here so not allowed to comment.)
1) Why are you creating the table each time? Is this meant to be a temporary table?
2) What do you get as a response to your query?
3) What happens when you break the task down into parts?
Does the code create the table?
If the table already exists and you run just the insert-data code, does it work? When you import the csv and then write back to the same file, does that produce the result you are looking for, or does it crash? What if you wrote to a different file and imported that?
You are writing your queries the way you would in an SQL client, but Python needs to see the statement as a Python string before pyodbc can send it to SQL Server. So rather than wrapping the statement directly inside the execute call, build it as a string first.
This is not tested, but try something like this:
bulk_load_sql = """
BULK INSERT QA_DATA_ANALYSIS.dbo.UFO_GB_1
FROM 'F:\GSS\QA_DATA_ANALYSIS_LEVEL_4\MODULE_2\Challenge_2\TASK_2\UFO_Observations.csv'
WITH ( fieldterminator = '', rowterminator = '\n')
"""
cursor.execute(bulk_load_sql)
This uses a triple-quoted string to put the sql on multiple lines, but you may want to use a regular string.
Here is an answer that goes over formatting your query for pyodbc
https://stackoverflow.com/a/43855693/4788717
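For reference, pyodbc uses ? placeholders when you need to pass values into a statement. A minimal sketch reusing the table and connection details from the question; the inserted row is made up for illustration:
import pyodbc

# Connection details taken from the question; the inserted values are illustrative only.
conn = pyodbc.connect('Trusted_Connection=yes',
                      driver='{ODBC Driver 13 for SQL Server}',
                      server='.\\SQLEXPRESS',
                      database='QA_DATA_ANALYSIS')
cursor = conn.cursor()

insert_sql = "INSERT INTO UFO_GB_1 (Index_No, comments) VALUES (?, ?)"
cursor.execute(insert_sql, ('1', 'example comment'))  # parameters are passed separately, never concatenated
conn.commit()
conn.close()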