Pandas read_sql modifies float value columns

I'm trying to use Pandas read_sql to validate some fields in my app.
When I read my DB using SQL Developer, I get these values:
603.29
1512.00
488.61
488.61
But reading the same SQL query with Pandas, the decimal separator is ignored and the decimal digits are appended to the whole-number part, so I end up getting these values:
60329.0
1512.0
48861.0
48861.0
How can I fix it?

I've found a workaround for now.
Convert the column you want to a string, then after Pandas has read it you can convert the string back to whatever type you want.
Even though this works, it doesn't feel right to do so.
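One reading of that workaround, as a minimal sketch (the table, column, and connection names are placeholders, and TO_CHAR assumes an Oracle source):

import pandas as pd

# Read the column as text so the driver cannot mangle the decimal point
query = "SELECT TO_CHAR(amount) AS amount FROM my_table"
df = pd.read_sql(query, connection)

# Convert the string column back to a numeric type once it is in pandas
df["amount"] = pd.to_numeric(df["amount"])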

Could you specify which SQL database you are using? I just encountered a similar problem, which I overcame by defining the datatype more specifically in the query - here is the solution for it. I guess you could use a similar approach and see if it works.
In my MySQL database I have four columns with very high precision
When I tried to read them with this query, they were truncated to 5 digits after the decimal delimiter.
query = """select
y_coef,
y_intercept,
x_coef,
x_intercept
from TABLE_NAME"""
df = pd.read_sql(query, connection)
However, when I specified that I want to have them with the precision of 15 digits after the decimal delimiter like below, they were not truncated anymore.
query = """select
cast(y_coef as decimal(15, 15)) as y_coef,
cast(y_intercept as decimal(15, 15)) as y_intercept,
cast(x_coef as decimal(15, 15)) as x_coef,
cast(x_intercept as decimal(15, 15)) as x_intercept
from TABLE_NAME"""
df = pd.read_sql(query, connection)
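For the Oracle case in the original question, an equivalent cast might look like the sketch below; the table and column names are assumptions, and the precision/scale should match your data.

query = """select
    cast(amount as number(12, 2)) as amount
from my_table"""
df = pd.read_sql(query, connection)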

Related

Oracle/SQL Server Table to a file - Without implicit data conversions

I am fetching data from SQL databases (both Oracle and MS SQL) in Python code using the pyodbc and cx_Oracle packages. Python automatically converts all datetime fields in SQL to datetime.datetime. Is there any way I can capture the data as-is from SQL into a file? The same happens to NULL and integer columns as well.
1) Date: value in DB (and expected): 12-AUG-19 12.00.01.000 -- Python output: 2019-08-12 00:00:01
2) NULL becomes NaN
3) Integer values 1 and 0 become True and False
I tried to Google the issue, and it seems to be a common problem across packages like pyodbc, cx_Oracle, and pandas.read_sql.
I would like the data to appear exactly as it does in the database.
We are calling an Oracle/SQL Server stored proc, NOT a SQL query, to get this result, and we can't change the stored proc. We cannot use CAST in a SQL query.
The pyodbc fetchall() output is the table as a list of rows. We lose the formatting of the data as soon as it is captured in Python.
Could someone help with this issue?
I'm not sure about Oracle, but on the SQL Server side, you could change the command you use so that you capture the results of the stored proc in a temp table, and then you can CAST() the columns of the temp table.
So if you currently call a stored proc on SQL Server like this: EXEC {YourProcName}
Then you could change your command to something like this:
CREATE TABLE #temp
(
    col1 INT
    ,col2 DATETIME
    ,col3 VARCHAR(20)
);
INSERT INTO #temp
EXEC [sproc];
SELECT
    col1 = CAST(col1 AS VARCHAR(20))
    ,col2 = CAST(FORMAT(col2,'dd-MMM-yy ') AS VARCHAR) + REPLACE(CAST(CAST(col2 AS TIME(3)) AS VARCHAR),':','.')
    ,col3
FROM #temp;
DROP TABLE #temp;
You'll want to create your temp table using the same column names and datatypes that are output by the proc. Then you can CAST() numeric values to VARCHAR, and for dates/datetimes you can use FORMAT() to define your date string format. The example here should produce the format you want, 12-AUG-19 12.00.01.000. I couldn't find a single format string that gave me the correct output, so I broke the date and time elements apart, formatted them in the expected way, and then concatenated the casted values.
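From the Python side, one way to run that batch with pyodbc and pandas might look like the sketch below; the connection string is a placeholder, and SET NOCOUNT ON is added at the top so the final SELECT is returned as the result set rather than being hidden behind the INSERT's row count.

import pyodbc
import pandas as pd

batch = """
SET NOCOUNT ON;
CREATE TABLE #temp (col1 INT, col2 DATETIME, col3 VARCHAR(20));
INSERT INTO #temp EXEC [sproc];
SELECT
    col1 = CAST(col1 AS VARCHAR(20))
    ,col2 = CAST(FORMAT(col2,'dd-MMM-yy ') AS VARCHAR) + REPLACE(CAST(CAST(col2 AS TIME(3)) AS VARCHAR),':','.')
    ,col3
FROM #temp;
DROP TABLE #temp;
"""

# Placeholder connection string; adjust driver/server/database to your environment
conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes")
df = pd.read_sql(batch, conn)   # every column now arrives as the string built by CAST/FORMAT
df.to_csv("output.txt", index=False)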

HASHBYTES, SHA2_256 in SQL introduces bad characters when called from Python

One of our old legacy SQL scripts converts a numerical column in SQL using the HASHBYTES function with SHA2_256.
The entire process is moving to Python as we are adding some advanced usage on top of the legacy work. However, when we call the same SQL code through the connector, HASHBYTES('sha2_256', column_name) is returning values with a lot of garbage.
Running the code in SQL results in this:
Column Encoded_Column
101286297 0x7AC82B2779116F40A8CEA0D85BE4AA02AF7F813B5383BAC60D5E71B7BDB9F705
Running the same SQL query from Python results in:
Column Encoded_Column
101286297
b"z\xc8+'y\x11o#\xa8\xce\xa0\xd8[\xe4\xaa\x02\xaf\x7f\x81;S\x83\xba\xc6\r^q\xb7\xbd\xb9\xf7\x05"
Code is
Select Column,HASHBYTES('SHA2_256', CONVERT(VARBINARY(8),Column)) as Encoded_Column from table
I have tried the usual garbage removal but it is not helping.
You are getting the right result, but it is displayed as raw bytes (this is why you have the b prefix in b"...").
Looking at the result from SQL, the data is encoded as hexadecimal.
So to transform the Python result you can do:
x = b"z\xc8+'y\x11o#\xa8\xce\xa0\xd8[\xe4\xaa\x02\xaf\x7f\x81;S\x83\xba\xc6\r^q\xb7\xbd\xb9\xf7\x05"
x.hex().upper()
And the result will be:
'7AC82B2779116F40A8CEA0D85BE4AA02AF7F813B5383BAC60D5E71B7BDB9F705'
Which is what you had in SQL.
You can read more here about the 0x at the start of the SQL result, which is not present in the Python output.
And finally, if you are working with pandas you can convert the whole column with:
df["Encoded_Column"] = df["Encoded_Column"].apply(lambda x: x.hex().upper())
# And if you want the '0x' at the start do:
df["Encoded_Column"] = "0x" + df["Encoded_Column"]

Python Replace Quoted Values In External SQL Query

I use the simple query below to select from a table based on the date:
select * from tbl where date = '2019-10-01'
The simple query is part of a much larger query that extracts information from many tables on the same server. I don't have execute access on the server, so I can't install a stored procedure to make my life easier. Instead, I read the query into Python and try to replace certain values inside single-quoted strings, such as:
select * from tbl where date = '<InForceDate>'
I use a simple Python function (below) to replace the placeholder with another value like 2019-10-01, but the str.replace() function isn't replacing anything when I look at the output. However, I tried this with a value that wasn't in quotes and it worked. I'm sure I'm missing something fundamental, but I haven't uncovered why it works without quotes and fails with quotes.
Python:
def generate_sql(sql_path, inforce_date):
    with open(sql_path, 'r') as sql_file:
        sql_string = sql_file.read()
    sql_final = sql_string.replace('<InForceDate>', inforce_date)
    return sql_final
Can anyone point me in the right direction?
Never mind, folks -- problem solved, but I haven't quite figured out why. File encoding is my guess.
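If encoding was indeed the culprit, the usual fix is to open the file with an explicit encoding; if the file had been saved as, say, UTF-16 and read with the platform default codec, the bytes would never match the literal '<InForceDate>'. A sketch of the same function with that change (the encoding value is an assumption):

def generate_sql(sql_path, inforce_date):
    # Read with an explicit encoding so the placeholder text matches exactly;
    # 'utf-8-sig' also strips a UTF-8 BOM if the file starts with one.
    with open(sql_path, 'r', encoding='utf-8-sig') as sql_file:
        sql_string = sql_file.read()
    return sql_string.replace('<InForceDate>', inforce_date)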

Dynamic SQL query Psycopg2 values problem

Using Python and psycopg2 I am trying to build a dynamic SQL query to insert rows into tables.
The variables are:
1. Table name
2. Variable list of column names
3. Variable list of values, ideally entering multiple rows in one statement
The problems I have come across are the treatment of string literals going from Python to SQL, and psycopg2 trying to protect you from exposing your code to SQL injection attacks.
Using the sql module from psycopg2, I have dynamic addition of the table name and the list of columns working. However, I am really struggling with adding the VALUES. Firstly, the values are put into the query as %(val)s and seem to be passed to the database literally like that, causing an error.
Secondly, I would then like to be able to add multiple rows at once.
Code below. All help much appreciated :)
import psycopg2 as pg2
from psycopg2 import sql

conn = pg2.connect(database='my_dbo', user='***', password='***')
cols = ['Col1', 'Col2', 'Col3']
vals = ['val1', 'val2', 'val3']

# Build query
q2 = sql.SQL("insert into my_table ({}) values ({})") \
    .format(sql.SQL(',').join(map(sql.Identifier, cols)),
            sql.SQL(',').join(map(sql.Placeholder, vals)))
When I print this string as print(q2.as_string(conn)) I get:
insert into my_table ("Col1","Col2","Col3") values %(val1)s,%(val2)s,%(val3)s
And then when I try to execute such a string I get the following error:
ProgrammingError: syntax error at or near "%"
LINE 1: ... ("Col1","Col2","Col3") values (%(val1)s...
^
OK, I solved this. Firstly, use Literal rather than Placeholder; secondly, put your row values together as tuples within a tuple, loop through adding each tuple as a group of literals, and then drop them in at the end when building the query (a sketch of this follows below).
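A minimal sketch of that approach; the column names and row values here are just examples:

import psycopg2 as pg2
from psycopg2 import sql

conn = pg2.connect(database='my_dbo', user='***', password='***')

cols = ['Col1', 'Col2', 'Col3']
rows = (('a1', 'b1', 'c1'),
        ('a2', 'b2', 'c2'))

# One parenthesised group of Literals per row, groups joined by commas
values = sql.SQL(',').join(
    [sql.SQL('({})').format(sql.SQL(',').join(map(sql.Literal, row)))
     for row in rows]
)

query = sql.SQL("insert into my_table ({}) values {}").format(
    sql.SQL(',').join(map(sql.Identifier, cols)),
    values,
)

with conn.cursor() as cur:
    cur.execute(query)
conn.commit()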

syntax error when attempting to insert data into postgresql

I am attempting to insert parsed .dta data into a PostgreSQL database, with each variable going into a separate table, and it was working until I added the second column, "recodeid_fk". The error I now get when attempting to run this code is: pg8000.errors.ProgrammingError: ('ERROR', '42601', 'syntax error at or near "imp"').
Eventually, I want to be able to parse multiple files at the same time and insert the data into the database, but if anyone could help me understand what's going on now, it would be fantastic. I am using Python 2.7.5, the statareader is from the pandas 0.12 development version, and I have very little experience with Python.
dr = statareader.read_stata('file.dta')
a = 2
t = 1
for t in range(1, 10):
    z = str(t)
for date, row in dr.iterrows():
    cur.execute("INSERT INTO tblv00{} (data, recodeid_fk) VALUES({}, {})".format(z, str(row[a]), 29))
    a += 1
    t += 1
conn.commit()
cur.close()
conn.close()
To your specific error...
The syntax error probably comes from string values {} that need quotes around them. execute() can take care of this for you automatically. Replace
execute("INSERT INTO tblv00{} (data, recodeid_fk) VALUES({}, {})".format(z, str(row[a]), 29))
with
execute("INSERT INTO tblv00{} (data, recodeid_fk) VALUES(%s, %s)".format(z), (row[a], 29))
The table name is completed the same way as before, but the values will be filled in by execute, which inserts quotes where they are needed. Maybe execute could fill in the table name too, and we could drop format entirely, but that would be an unusual usage, and I'm guessing execute might (wrongly) put quotes in the middle of the name.
But there's a nicer approach...
Pandas includes a function for writing DataFrames to SQL tables. PostgreSQL is not yet supported, but in simple cases you should be able to pretend that you are connected to a sqlite or MySQL database and have no trouble.
What do you intend with z here? As it is, you loop z from '1' to '9' before proceeding to the next for loop. Should the loops be nested? That is, did you mean to insert the contents of dr into nine different tables called tblv001 through tblv009?
If you mean that loop to put different parts of dr into different tables, please check the indentation of your code and clarify it.
In either case, the write_frame function mentioned above should take care of the SQL insertion.
Response to Edit
It seems like t, z, and a are doing redundant things. How about:
import pandas as pd
import string
...
# Loop through columns of dr, and count them as we go.
for i, col in enumerate(dr):
    table_name = 'tblv' + string.zfill(i, 3)  # e.g., tblv001 or tblv010
    df1 = pd.DataFrame(dr[col]).reset_index()
    df1.columns = ['data', 'recodeid_fk']
    pd.io.sql.write_frame(df1, table_name, conn)
I used reset_index to make the index into a column. The new (sequential) index will not be saved by write_frame.
