I have an SQL file stored locally on my PC. I want to open and read it using the pandas library. Here is what I have tried:
import pandas as pd
import sqlite3
my_file = r'C:\Users\me\Downloads\database.sql'
#I am creating an empty database
conn = sqlite3.connect(r'C:\Users\test\Downloads\test.db')
#I am reading my file
df = pd.read_sql(my_file, conn)
However, I am receiving the following error:
DatabaseError: Execution failed on sql 'C:\Users\me\Downloads\database.sql': near "C": syntax error
Try moving the file to D:\. Sometimes Python is not granted read/write access to C:\, so that may be the issue.
You can also try an alternative method using a cursor:
cur = conn.cursor()
cur.execute(query)  # query holds the SQL text to run
r = cur.fetchall()
Here r would contain your result set as a list of tuples.
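To tie this together, here is a minimal sketch of that approach, assuming the .sql file contains plain SQLite statements (the table name at the end is only a placeholder):
import sqlite3
import pandas as pd
conn = sqlite3.connect(r'C:\Users\test\Downloads\test.db')
cur = conn.cursor()
# Load the statements from the .sql file into the new database
with open(r'C:\Users\me\Downloads\database.sql', 'r') as f:
    cur.executescript(f.read())
conn.commit()
# read_sql expects a query, not a file path
df = pd.read_sql('SELECT * FROM some_table', conn)  # table name is a placeholder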
I want to connect to an Oracle DB via Python, take the query results, and create Excel or CSV reports from that data. I have never tried this before and have not seen anyone around me do something like it. Do you have any recommendations or ideas for this case?
Regards
You can connect to an Oracle DB from Python with the cx_Oracle library, using the syntax below for the connection string. Be aware that, to start with, your connection_oracle_textfile.txt file and the .py file containing your Python code must be in the same folder.
connection_oracle_textfile.txt -> username/password@HOST:PORT/SERVICE_NAME (you can find all of these except the username and password in the tnsnames.ora file)
import cx_Oracle
import pandas as pd

def get_oracle_table_from_dbm(sql_text):
    if 'connection_oracle' not in globals():
        print('Connection does not exist. Trying to connect...')
        # Read the connection string from the text file
        with open('connection_oracle_textfile.txt', 'r') as f:
            fx = f.read().strip()
        global connection_oracle
        connection_oracle = cx_Oracle.connect(fx)
        print('Connection established!')
    else:
        print('Already have a connection. Just fetching data.')
    return pd.read_sql(sql_text, con=connection_oracle)

df = get_oracle_table_from_dbm('select * from dual')
There are other Stack Overflow answers to this, e.g. How to export a table to csv or excel format. Remember to tune cursor.arraysize.
You don't strictly need the pandas library to create CSV files, though you may want it for future data analysis.
The cx_Oracle documentation discusses installation, connection, and querying, amongst other topics.
If you want to read from a CSV file, see Loading CSV Files into Oracle Database.
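For example, a minimal sketch (reusing the connection_oracle object from above; the table name and output filename are placeholders) of writing query results to CSV with the standard csv module:
import csv
cursor = connection_oracle.cursor()
cursor.arraysize = 1000  # fetch rows in larger batches
cursor.execute('select * from my_table')  # placeholder table name
with open('report.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cursor.description])  # header row from the cursor metadata
    writer.writerows(cursor)  # the cursor yields one tuple per row
cursor.close()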
I want to download data from SQL via Python, but instead of downloading the whole dataset I only need specific variables.
I am restricted to using only read_sql with the pyodbc connection.
My code is the following:
# call from SQL
import pandas as pd
import pyodbc
conn = pyodbc.connect("""DRIVER={SQL Server};
Server=BXTS131133.eu.rabonet.com\LWID_LAB_03;
Database=CORP_Modelling;
Trusted_connection=yes;""")
SQL1 = 'SELECT * FROM [CORP_Modelling].[LDM_Freeze_1].[JointObligorMonthly]'
Nevertheless, suppose that I want to download only a few variables/attributes from SQL. For example, from the table specified in SQL1 I only want to download:
var_to_download = ['MeasurementPeriodID', 'JointObligorID' ]
I cannot understand how I can modify the above code in order to download only these variables.
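For illustration only (this is a sketch, not from the original post), one common way is to name the wanted columns in the SELECT, e.g. by building the statement from that list:
var_to_download = ['MeasurementPeriodID', 'JointObligorID']
# Build a SELECT that names only the wanted columns (column names assumed to exist in the table)
SQL1 = ('SELECT ' + ', '.join(var_to_download) +
        ' FROM [CORP_Modelling].[LDM_Freeze_1].[JointObligorMonthly]')
df = pd.read_sql(SQL1, conn)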
I have been attempting to use Python to upload a table into Microsoft SQL Server. I have had great success with smaller tables, but start to get errors when there is a large number of columns or rows. I don't believe it is the filesize that is the issue, but I may be mistaken.
The same error comes up whether the data is from an Excel file, csv file, or query.
When I run the code, it does create a table in SQL Server, but only has the column headers (the rest being blank).
This is the code that I am using, which works for smaller files but gives me the below error for the larger ones:
import pyodbc
#import cx_Oracle
import pandas as pd
from sqlalchemy import create_engine
connstr_Dev = ('DSN='+ODBC_Dev+';UID='+SQLSN+';PWD='+SQLpass)
conn_Dev = pyodbc.connect(connstr_Dev)
cursor_Dev=conn_Dev.cursor()
engine_Dev = create_engine('mssql+pyodbc://'+ODBC_Dev)
upload_file= "M:/.../abc123.xls"
sql_table_name='abc_123_sql'
pd.read_excel(upload_file).to_sql(sql_table_name, engine_Dev, schema='dbo', if_exists='replace', index=False, index_label=None, chunksize=None, dtype=None)
conn_Dev.commit()
conn_Dev.close()
This gives me the following error:
ProgrammingError: (pyodbc.ProgrammingError) ('The SQL contains -13854
parameter markers, but 248290 parameters were supplied', 'HY000') .......
(Background on this error at: http://sqlalche.me/e/f405)
The error log in the provided link doesn't give me any ideas on troubleshooting.
Anything I can tweak in the code to make this work?
Thanks!
Upgrading to pandas 0.23.4 solved it for me. What is your version?
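If upgrading is not an option, one common workaround (an assumption on my part, not something confirmed in this thread) is to pass a chunksize to to_sql so each INSERT keeps its parameter count (rows times columns) under the driver's limit:
# Limit each INSERT batch to a modest number of rows
pd.read_excel(upload_file).to_sql(sql_table_name, engine_Dev, schema='dbo',
                                  if_exists='replace', index=False, chunksize=500)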
I am trying to convert a .csv file I've downloaded into a .db so that I can analyze it in DBeaver with SQLite3.
I'm using Anaconda Prompt and python within it.
Can anyone point out where I'm mistaken?
import pandas as pd
import sqlite3
df = pd.read_csv('0117002-eng.csv')
df.to_sql('health', conn)
And I just haven't been able to figure out how to set up conn appropriately. All the guides I've read have you do something like:
conn = sqlite3.connect("file.db")
But, as I mentioned, I only have the CSV file, and when I did try that, it didn't work either.
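For what it's worth, a minimal sketch of the usual flow (the output filename health.db is arbitrary; sqlite3.connect creates the file on disk if it does not already exist):
import sqlite3
import pandas as pd
df = pd.read_csv('0117002-eng.csv')
conn = sqlite3.connect('health.db')  # creates health.db if it is not there yet
df.to_sql('health', conn, if_exists='replace', index=False)
conn.close()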
I have a multi-million record SQL table that I'm planning to write out to many parquet files in a folder, using the pyarrow library. The data content seems too large to store in a single parquet file.
However, I can't seem to find an API or parameter with the pyarrow library that allows me to specify something like:
file_scheme="hive"
As is supported by the fastparquet python library.
Here's my sample code:
#!/usr/bin/python
import pyodbc
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
conn_str = ('UID=username;PWD=passwordHere;'
            'DRIVER=FreeTDS;SERVERNAME=myConfig;DATABASE=myDB')
#----> Query the SQL database into a Pandas dataframe
conn = pyodbc.connect( conn_str, autocommit=False)
sql = "SELECT * FROM ClientAccount (NOLOCK)"
df = pd.io.sql.read_sql(sql, conn)
#----> Convert the dataframe to a pyarrow table and write it out
table = pa.Table.from_pandas(df)
pq.write_table(table, './clients/' )
This throws an error:
File "/usr/local/lib/python2.7/dist-packages/pyarrow/parquet.py", line 912, in write_table
os.remove(where)
OSError: [Errno 21] Is a directory: './clients/'
If I replace that last line with the following, it works fine but writes only one big file:
pq.write_table(table, './clients.parquet' )
Any ideas how I can do the multi-file output thing with pyarrow?
Try pyarrow.parquet.write_to_dataset https://github.com/apache/arrow/blob/master/python/pyarrow/parquet.py#L938.
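For example, a minimal sketch using the table built above (partition_cols is optional, and the column name here is only a placeholder):
import pyarrow.parquet as pq
# Writes a directory of parquet files under ./clients/ instead of a single file
pq.write_to_dataset(table, root_path='./clients/',
                    partition_cols=['AccountType'])  # placeholder partition column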
I opened https://issues.apache.org/jira/browse/ARROW-1858 about adding some more documentation about this.
I recommend seeking support for Apache Arrow on the mailing list dev@arrow.apache.org. Thanks!