I have a dataframe in my Python environment that I want to push to HANA.
I am trying to push it to HANA row by row, but that's not working. If there is a way to push the full dataframe to HANA in one go, that would be even better.
For now, however, I am not able to dynamically push the values of the dataframe line by line.
Here is the Python code I have tried so far, which unfortunately is not working :(
# Assumption: the SAP HANA Python client (hdbcli) is used; the question does not name the module
from hdbcli import dbapi

conn = dbapi.connect(address="hana1345.lab1.abc.com", port=30015, user='SHEEMAZ',
                     password='Hello1')
cursor = conn.cursor()
cursor.execute('CREATE TABLE DS.BM_TEXT("Var_ID" varchar(255), "Start_Date" varchar(255), '
               '"End_Date" varchar(255), "ID" varchar(255), "Text" varchar(255))')
sql_insert_query = """ INSERT INTO DS.BM_TEXT VALUES (%s,%s,%s,%s,%s)"""
insert_tuple_2 = ("2", "Emma", "2019-05-19", "9500", "22")
cursor.execute(sql_insert_query, insert_tuple_2)
The error I am getting is:
ProgrammingError: (257, 'sql syntax error: incorrect syntax near "%": line 1 col 53 (at pos 53)')
Appreciate all help.
I am not positive which module you are using for your DB API, but usually ? is the placeholder. Without explicitly calling .format on your string, the %s markers may be passed to the database literally instead of being substituted. I could be wrong, but I am guessing that is the problem.
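If your driver does use qmark-style parameters (an assumption, since the module isn't named here), the single-row insert from the question would become:
sql_insert_query = "INSERT INTO DS.BM_TEXT VALUES (?,?,?,?,?)"  # ? placeholders instead of %s
insert_tuple_2 = ("2", "Emma", "2019-05-19", "9500", "22")
cursor.execute(sql_insert_query, insert_tuple_2)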
As for sending everything at once: it can be done with executemany(). You need an iterable structured like this:
insert_list = [("2", "Emma", "2019-05-19", "9500","22"),("3", "Smith", "2019-05-19", "9500","22")]
To send it to the database use this query:
cursor.executemany("""INSERT INTO DS.BM_TEXT VALUES (?,?,?,?,?);""", insert_list)
This will put the whole iterable into the table. I believe it still inserts row by row under the hood, but it does the heavy lifting for you. If your dataframe is not structured like this, you can create an iterable class/function that yields the data in that format from your df.
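For example, if the data lives in a pandas DataFrame whose columns are already in the table's order (an assumption), one way to build that iterable is:
insert_list = list(df.itertuples(index=False, name=None))  # plain tuples, one per DataFrame row
cursor.executemany("""INSERT INTO DS.BM_TEXT VALUES (?,?,?,?,?);""", insert_list)
conn.commit()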
I am trying to use pyodbc to update an existing MS Access database table with a very long multiline string. The string is actually a csv that has been turned into a string.
The query I am trying to use to update the table is as follows:
query = """
UPDATE Stuff
SET Results = '{}'
WHERE AnalyteName =
'{}'
""".format(df, analytename)
The full printed statement looks as follows:
UPDATE Stuff
SET Results =
'col a,col b,col c,...,col z,
Row 1,a1,b1,c1,
...,...,...,...,
Row 3000,a3000,b3000,c3000'
WHERE AnalyteName = 'Serotonin'
However this does not seem to be working, and I keep getting the following error:
pyodbc.ProgrammingError: ('42000', '[42000] [Microsoft][ODBC Microsoft Access Driver] Syntax error in UPDATE statement. (-3503) (SQLExecDirectW)')
I assume this is due to the format of the CSV string I am trying to use to update the table.
I have tried using INSERT to add a new row with the CSV string and other relevant information, and that works. However, I need to use UPDATE, as I will eventually be adding other CSV strings to these columns. This leads me to believe that either A) something is wrong with the syntax of my UPDATE query (I am new to SQL syntax), or B) I am missing something from the documentation regarding UPDATE queries.
Is executing an UPDATE query like this possible? If so, where am I going wrong?
It depends on the table's field type.
For large amounts of text you need a blob field in your database table.
A blob field stores binary data, so it will not 'see' illegal characters.
Answering my own question in case anyone else wants to use this.
It turns out what I was missing was brackets around the column names in my UPDATE statement. My final code looked something like this:
csv = df.to_csv(index=False)
name = 'some_name'
query = """
UPDATE Stuff
SET
[Results] = ?
WHERE
[AnalyteName] = ?
"""
self.cursor.execute(query, (csv, name))
I've seen several other posts here where brackets were not around the column names. However, since this is MS Access, I believe they were required for this query, or rather for this specific query, since it included a very long string in the SET clause.
I welcome anyone else here to provide a more efficient method of performing this task or someone else who can provide more insight into why this is what worked for me.
I've now spent countless hours troubleshooting this error before making a post here, to no avail.
Here's what I'm trying to do:
Import data from a CSV file into an SQL database using Python and psycopg2. Ideally without making any changes to the CSV file.
Here's my issue:
TLDR: The DB is set up for varchar/string data, yet somehow I'm getting an "invalid input syntax for type integer" error.
Here's the code that successfully creates the "Customers_1" table:
cur.execute('DROP TABLE IF EXISTS customers_1;')
cur.execute('CREATE TABLE customers_1 (customer_id SERIAL NOT NULL PRIMARY KEY,'
'Kundenummer varchar(100),'
'Start_year integer,'
'Navn varchar(100),'
'Land varchar(100),'
'Segment varchar(20));'
)
Everything should be in order here, and when I run a basic INSERT INTO command I can enter the values just fine, as seen in this screenshot from pgAdmin:
(screenshot: customers_1 table with one row successfully added)
I attempt to import the CSV data using the "copy_from" function as seen here:
f = open(r'Eksamen2\customer_segment.csv', 'r')
cur.copy_from(f, 'customers_1', sep=',')
f.close()
My CSV file includes a header, however, removing it did not fix the issue. Here's a look at the first two lines of the CSV file:
Kundenummer,Start_year,Navn,Land,Segment
K816058744,2019,GearBicycle ,Norge,Medium
The program returns this error message:
cur.copy_from(f, 'customers_1', sep=',')
psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type integer: "Kundenummer"
CONTEXT: COPY customers_1, line 1, column customer_id: "Kundenummer"
I don't understand why it is expecting an integer here, as the column "Kundenummer" is a varchar.
I'm considering simply turning the SQL text into a query and creating a for loop that goes through the CSV file, however that seems more complicated than simply using the copy_from function.
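For reference, the loop I have in mind would look roughly like this (untested sketch; it skips the header row and names the columns so the SERIAL customer_id generates itself, and conn is the psycopg2 connection):
import csv

with open(r'Eksamen2\customer_segment.csv', 'r') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header line
    for kundenummer, start_year, navn, land, segment in reader:
        cur.execute(
            'INSERT INTO customers_1 (Kundenummer, Start_year, Navn, Land, Segment) '
            'VALUES (%s, %s, %s, %s, %s)',
            (kundenummer, start_year, navn, land, segment))
conn.commit()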
Any help would be greatly appreciated!
Bonus question: I'd like to ditch the "customer_id" PK in favour of simply using the "Kundenummer" column as the PK. However, I'm unable to create a SERIAL for that PK that would automatically generate a customer number starting with K, followed by 8 or 9 numbers, like the customer IDs in my CSV file.
Using Python and psycopg2 I am trying to build a dynamic SQL query to insert rows into tables.
The variables are:
1. Table name
2. Variable list of column names
3. Variable list of values, ideally entering multiple rows in one statement
The problems I have come across are the treatment of string literals going from Python to SQL, and psycopg2 trying to keep you from exposing your code to SQL injection attacks.
Using the SQL module from psycopg2, I have resolved dynamically adding the Table name and List of columns. However I am really struggling with adding the VALUES. Firstly the values are put into the query as %(val)s and seem to be passed literally like this to the database, causing an error.
Secondly, I would then like to be able to add multiple rows at once.
Code below. All help much appreciated :)
import psycopg2 as pg2
from psycopg2 import sql
conn = pg2.connect(database='my_dbo',user='***',password='***')
cols = ['Col1','Col2','Col3']
vals = ['val1','val2','val3']
#Build query
q2 = sql.SQL("insert into my_table ({}) values ({})") \
.format(sql.SQL(',').join(map(sql.Identifier, cols)), \
sql.SQL(',').join(map(sql.Placeholder,vals)))
When I print this string as print(q2.as_string(conn)) I get:
insert into my_table ("Col1","Col2","Col3") values %(val1)s,%(val2)s,%(val3)s
And then when I try to execute this query I get the following error:
ProgrammingError: syntax error at or near "%"
LINE 1: ... ("Col1","Col2","Col3") values (%(val1)s...
^
OK, I solved this. Firstly, use Literal rather than Placeholder. Secondly, put your row values together as tuples within a tuple, loop through them adding each tuple to a list as literals, and then drop that list in at the end when building the query.
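A minimal sketch of that approach (the second row of values is just illustrative; everything else matches the code above):
import psycopg2 as pg2
from psycopg2 import sql

conn = pg2.connect(database='my_dbo', user='***', password='***')

cols = ['Col1', 'Col2', 'Col3']
rows = [('val1', 'val2', 'val3'),   # each inner tuple is one row
        ('val4', 'val5', 'val6')]

# Render each row of values as a Literal tuple, e.g. ('val1','val2','val3')
row_literals = [sql.SQL("({})").format(sql.SQL(',').join(map(sql.Literal, row)))
                for row in rows]

q2 = sql.SQL("insert into my_table ({}) values {}").format(
    sql.SQL(',').join(map(sql.Identifier, cols)),
    sql.SQL(',').join(row_literals))

cur = conn.cursor()
cur.execute(q2)
conn.commit()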
This seems like a basic function, but I'm new to Python, so maybe I'm not googling this correctly.
In Microsoft SQL Server, when you have a statement like
SELECT top 100 * FROM dbo.Patient_eligibility
you get a result like
Patient_ID | Patient_Name | Patient_Eligibility
67456 | Smith, John | Eligible
...
etc.
I am running a connection to SQL through Python as such, and would like the output to look exactly the same as in SQL. Specifically - with column names and all the data rows specified in the SQL query. It doesn't have to appear in the console or the log, I just need a way to access it to see what's in it.
Here are my current code attempts:
import pyodbc
conn = pyodbc.connect(connstr)
cursor = conn.cursor()
sql = "SELECT top 100 * FROM [dbo].[PATIENT_ELIGIBILITY]"
cursor.execute(sql)
data = cursor.fetchall()
#Query1
for row in data:
    print(row[1])
#Query2
print (data)
#Query3
data
My understanding is that somehow the results of PATIENT_ELIGIBILITY are stored in the variable data. Query 1, 2, and 3 represent methods of accessing that data that I've googled for - again seems like basic stuff.
The results of #Query1 give me the list of the first column, without a column name in the console. In the variable explorer, 'data' appears as type List. When I open it up, it just says 'Row object of pyodbc module' 100 times, one for each row. Not what I'm looking for. Again, I'm looking for the same kind of view output I would get if I ran it in Microsoft SQL Server.
Running #Query2 gets me a little closer to this end. The results appear like a .csv file - unreadable, but it's all there, in the console.
Running #Query3, just the 'data' variable, gets me the closest result but with no column names. How can I bring in the column names?
More directly, how do I get 'data' to appear as a clean table with column names somewhere? Since this seems like a basic SQL function, could you direct me to a SQL-friendly library to use instead?
Also note that none of the queries required me to know the column names or widths. My entire method here is to eyeball the results of the query and quickly check the data - I can't see whether the Patient_IDs are loading properly if I don't know which column holds them.
Thanks for your help!
That's more than one question; I'll try to help and give some advice.
I am running a connection to SQL through Python as such, and would like the output to look exactly the same as in SQL.
You are mixing up SQL as a language and the formatted output of some interactive SQL tool.
SQL itself does not say anything about how data should "look".
Also note that neither of the Queries required me to know the column names or widths. My entire method here is attempting to eyeball the results of the Query and quickly check the data - I can't see that the Patient_IDs are loading properly if I don't know which column is patient_ids.
Correct. cursor.fetchall returns only the data.
Field information can be read from cursor.description.
Read more in PEP 249.
how do i get 'data' to appear as a clean table with column names somewhere?
It depends on how you define "appear".
Do you want text output, an HTML page, or maybe a GUI?
For text output: you can read the column names from cursor.description and print them before the data.
If you want HTML/Excel/PDF/other, find a library/framework that suits your taste.
If you want an interactive experience similar to SQL tools, I recommend looking at jupyter-notebook + pandas.
Something like:
pandas.read_sql_query(sql, conn)
will give you a "clean table", nothing worse than what SQLDeveloper/SSMS/DBeaver/other tools give.
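For the plain-text option above, a minimal sketch (reusing the cursor, data, and conn names from the question's code, so those are assumed) could look like this:
columns = [col[0] for col in cursor.description]  # column names come from cursor.description
print(columns)
for row in data:
    print(row)

# or, with pandas, which keeps the column names for you:
import pandas as pd
df = pd.read_sql_query("SELECT top 100 * FROM [dbo].[PATIENT_ELIGIBILITY]", conn)
print(df)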
We don't need any external libraries.
Refer to this for more details:
Print results in MySQL format with Python
However, the latest version of MySQL gives an error with that code, so I modified it.
Below is the query for the dataset
stri = "select * from table_name"
cursor.execute(stri)
data = cursor.fetchall()
mycon.commit()
Below, it prints the dataset in tabular form:
def columnnm(name):
    v = "SELECT LENGTH("+name+") FROM table_name WHERE LENGTH("+name+") = (SELECT MAX(LENGTH("+name+")) FROM table_name) LIMIT 1;"
    cursor.execute(v)
    data = cursor.fetchall()
    mycon.commit()
    return data[0][0]

widths = []
columns = []
tavnit = '|'
separator = '+'

for cd in cursor.description:
    widths.append(max(columnnm(cd[0]), len(cd[0])))
    columns.append(cd[0])

for w in widths:
    tavnit += " %-"+"%ss |" % (w,)
    separator += '-'*w + '--+'

print(separator)
print(tavnit % tuple(columns))
print(separator)
for row in data:
    print(tavnit % row)
print(separator)
I am attempting to insert parsed dta data into a postgresql database with each row being a separate variable table, and it was working until I added in the second row "recodeid_fk". The error I now get when attempting to run this code is: pg8000.errors.ProgrammingError: ('ERROR', '42601', 'syntax error at or near "imp"').
Eventually, I want to be able to parse multiple files at the same time and insert the data into the database, but if anyone could help me understand what's going on now, that would be fantastic. I am using Python 2.7.5, the statareader is from the pandas 0.12 development version, and I have very little experience with Python.
dr = statareader.read_stata('file.dta')
a = 2
t = 1
for t in range(1,10):
    z = str(t)
for date, row in dr.iterrows():
    cur.execute("INSERT INTO tblv00{} (data, recodeid_fk) VALUES({}, {})".format(z, str(row[a]),29))
    a += 1
    t += 1
conn.commit()
cur.close()
conn.close()
To your specific error...
The syntax error probably comes from the {} strings that need quotes around them. execute() can take care of this for you automatically. Replace
execute("INSERT INTO tblv00{} (data, recodeid_fk) VALUES({}, {})".format(z, str(row[a]),29))
with
execute("INSERT INTO tblv00{} (data, recodeid_fk) VALUES(%s, %s)".format(z), (row[a],29))
The table name is completed the same way as before, but the values will be filled in by execute, which inserts quotes if they are needed. Maybe execute could fill in the table name too, and we could drop format entirely, but that would be an unusual usage, and I'm guessing execute might (wrongly) put quotes in the middle of the name.
But there's a nicer approach...
Pandas includes a function for writing DataFrames to SQL tables. PostgreSQL is not yet supported, but in simple cases you should be able to pretend that you are connected to a SQLite or MySQL database and have no trouble.
What do you intend with z here? As it is, you loop z from '1' to '9' before proceeding to the next for loop. Should the loops be nested? That is, did you mean to insert the contents of dr into nine different tables called tblv001 through tblv009?
If you meant that loop to put different parts of dr into different tables, please check the indentation of your code and clarify it.
In either case, the link above should take care of the SQL insertion.
Response to Edit
It seems like t, z, and a are doing redundant things. How about:
import pandas as pd
import string
...
# Loop through columns of dr, and count them as we go.
for i, col in enumerate(dr):
    table_name = 'tblv' + string.zfill(i, 3)  # e.g., tblv001 or tblv010
    df1 = pd.DataFrame(dr[col]).reset_index()
    df1.columns = ['data', 'recodeid_fk']
    pd.io.sql.write_frame(df1, table_name, conn)
I used reset_index to make the index into a column. The new (sequential) index will not be saved by write_frame.