Open sqlite database from http in memory - python

I have code:

import sqlite3
import requests
from io import BytesIO as Memory

def download_file_to_obj(url, file_obj):
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        for chunk in r.iter_content(chunk_size=None):
            if chunk:
                file_obj.write(chunk)

def main(source_url):
    db_in_mem = Memory()
    print('Downloading..')
    download_file_to_obj(source_url, db_in_mem)
    print('Complete!')
    with sqlite3.connect(database=db_in_mem.read()) as con:
        cursor = con.cursor()
        cursor.execute('SELECT * FROM my_table limit 10;')
        data = cursor.fetchall()
        print(data)
    del(db_in_mem)
The table my_table exists in the source database.
Error:
sqlite3.OperationalError: no such table: my_table
How can I load an SQLite database into memory from HTTP?

The most common way to force an SQLite database to exist purely in memory is to open the database using the special filename :memory:. In other words, instead of passing the name of a real disk file, pass in the string :memory:. For example:
database = sqlite3.connect(":memory:")
When this is done, no disk file is opened. Instead, a new database is created purely in memory. The database ceases to exist as soon as the database connection is closed. Every :memory: database is distinct from every other. So, opening two database connections each with the filename ":memory:" will create two independent in-memory databases.
Note that in order for the special :memory: name to apply and to create a pure in-memory database, there must be no additional text in the filename. Thus, a disk-based database can be created in a file by prepending a pathname, like this: ./:memory:.
See more here: https://www.sqlite.org/inmemorydb.html

You can build on top of this solution: the downloaded bytes cannot be passed to sqlite3.connect() directly (connect expects a filename, so it silently creates a new empty database, hence the "no such table" error), but you can copy the downloaded database into an in-memory database created with db = sqlite3.connect(":memory:") and run your queries against that connection, as shown below.
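A minimal sketch of that idea, assuming the URL serves a complete SQLite database file: download it to a temporary file, then copy it into a :memory: connection with Connection.backup() (available since Python 3.7). The URL and table name are placeholders.

import sqlite3
import tempfile
import requests

def load_db_into_memory(source_url):
    # Download the database file to a temporary file on disk.
    # NOTE: on Windows, NamedTemporaryFile may need delete=False so sqlite3 can reopen it.
    with tempfile.NamedTemporaryFile(suffix='.sqlite') as tmp:
        with requests.get(source_url, stream=True) as r:
            r.raise_for_status()
            for chunk in r.iter_content(chunk_size=None):
                tmp.write(chunk)
        tmp.flush()

        # Copy the on-disk database into a pure in-memory database.
        mem_con = sqlite3.connect(':memory:')
        disk_con = sqlite3.connect(tmp.name)
        disk_con.backup(mem_con)
        disk_con.close()
    return mem_con

con = load_db_into_memory('https://example.com/my_database.sqlite')
print(con.execute('SELECT * FROM my_table LIMIT 10;').fetchall())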

Related

Is postgres COPY tablename FROM STDIN with csv at risk of SQL injection?

I am using Python and psycopg2.
If I run the code below, the user-provided CSV file is opened and read, and its contents are transferred to the database.
I want to know if the code is at risk of SQL injection when unexpected words or symbols are contained in the CSV file.

import psycopg2

conn_config = dict(port="5432", dbname="test", password="test")
with psycopg2.connect(**conn_config) as conn:
    with conn.cursor() as cur:
        with open("test.csv") as f:
            cur.copy_expert(sql="COPY test FROM STDIN", file=f)

I read some of the psycopg2 and Postgres documentation, but I did not find an answer.
Please know that English is not my native language, and I may make some confusing mistakes.
The command simply copies the data into the table. No part of the copied data can be interpreted as an SQL command, so SQL injection is out of the question. The rigid CSV format is additional security: if a row contains extra (unexpected) values, the command simply fails. The only risk of the operation is strange content ending up in the table.
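The only place injection could occur is in the SQL text you compose yourself, for example if the table name came from user input. A hedged sketch using psycopg2.sql to build that part safely (the user_supplied_table variable is hypothetical):

import psycopg2
from psycopg2 import sql

user_supplied_table = "test"  # hypothetical: a table name coming from user input

conn_config = dict(port="5432", dbname="test", password="test")
with psycopg2.connect(**conn_config) as conn:
    with conn.cursor() as cur:
        # sql.Identifier quotes the table name, so it cannot break out of the statement.
        copy_stmt = sql.SQL("COPY {} FROM STDIN WITH (FORMAT csv)").format(
            sql.Identifier(user_supplied_table)
        )
        with open("test.csv") as f:
            cur.copy_expert(copy_stmt.as_string(conn), f)

The data streamed from the file is still never parsed as SQL; only the statement text itself needs this care.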

How to batch write a table that has 10 million plus records into separate gzip files-Python/Postgres/Airflow

I have a table that has 10 million plus records (rows) in it. I am trying to do a one-time load into S3 by SELECT *'ing the table and then writing it to a gzip file in my local file system. Currently, I can run my script to collect 800,000 records into the gzip file, but then I receive an error, and the remaining records are obviously not written.
Since there is no continuation in SQL (for example, if you run ten LIMIT 800,000 queries, the rows won't come back in a consistent order), is there a way to write a Python/Airflow function that can load the 10 million+ row table in batches? Perhaps there's a way in Python where I can do a SELECT * statement and continue the statement after x amount of records into separate gzip files?
Here is my Python/Airflow script so far; when run, it only writes 800,000 records to the path variable:
import csv
import gzip

def gzip_postgres_table(table_name, **kwargs):
    path = '/usr/local/airflow/{}.gz'.format(table_name)
    server_postgres = create_tunnel_postgres()              # helper defined elsewhere
    server_postgres.start()
    etl_conn = conn_postgres_internal(server_postgres)      # helper defined elsewhere
    record = get_etl_record(kwargs['master_table'],
                            kwargs['table_name'])
    cur = etl_conn.cursor()
    unload_sql = '''SELECT *
                    FROM schema1.database1.{0} '''.format(record['table_name'])
    cur.execute(unload_sql)
    result = cur.fetchall()
    column_names = [i[0] for i in cur.description]
    fp = gzip.open(path, 'wt')
    myFile = csv.writer(fp, delimiter=',')
    myFile.writerow(column_names)
    myFile.writerows(result)
    fp.close()
    etl_conn.close()
    server_postgres.stop()
The best, I mean THE BEST approach to insert so many records into PostgreSQL, or to get them out of PostgreSQL, is to use PostgreSQL's COPY. This means you would have to change your approach drastically, but there's no better way that I know of in PostgreSQL. COPY manual
COPY writes the result of a query to a file, or loads a table from a file.
COPY moves data between PostgreSQL tables and standard file-system files.
The reason it is the best solution is that you are using PostgreSQL's default method for handling external data, without intermediaries, so it's fast and secure.
COPY works like a charm with CSV files. You should change your approach to a file-handling method and the use of COPY.
Since COPY runs with SQL, you can divide your data using LIMIT and OFFSET in the query. For example:
COPY (SELECT * FROM country LIMIT 10 OFFSET 10) TO '/usr1/proj/bray/sql/a_list_of_10_countries.copy';
-- This copies 10 rows from country, skipping the first 10
COPY only works with files that are accessible to the PostgreSQL user on the server.
PL Function (edited):
If you want COPY to be dynamic, you can wrap it in a PL/pgSQL function. For example:
CREATE OR REPLACE FUNCTION copy_table(
    table_name text,
    file_name text,
    vlimit text,
    voffset text
) RETURNS void AS $$
DECLARE
    query text;
BEGIN
    query := 'COPY (SELECT * FROM '||table_name||' LIMIT '||vlimit||' OFFSET '||voffset||') TO '''||file_name||''' DELIMITER '','' CSV';
    -- NOTE that file_name has to include its directory too.
    EXECUTE query;
END;
$$ LANGUAGE plpgsql
SECURITY DEFINER;
To execute the function you just have to do:
SELECT copy_table('test','/usr/sql/test.csv','10','10')
Notes:
If the function will be publicly accessible, you have to check for SQL injection attacks (the arguments are concatenated directly into the query).
You can program the function to suit your needs; this is just an example.
The function returns VOID, so it just does the COPY; if you need some feedback you should return something else.
The function has to be owned by the postgres user of the server, because it needs file access; that is why it uses SECURITY DEFINER, so that any database user can run it.
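If the gzip files have to end up on the Airflow worker rather than on the database server, a client-side variant of the same idea is psycopg2's copy_expert, which streams COPY ... TO STDOUT to a local file object. A rough sketch, with placeholder connection details, table name, and batch size:

import gzip
import psycopg2

BATCH_SIZE = 1000000  # rows per gzip file (placeholder value)

def unload_table_in_batches(conn, table_name, total_rows):
    with conn.cursor() as cur:
        for batch, offset in enumerate(range(0, total_rows, BATCH_SIZE)):
            path = '/usr/local/airflow/{}_{}.gz'.format(table_name, batch)
            # ORDER BY gives LIMIT/OFFSET a stable order across batches.
            copy_sql = (
                "COPY (SELECT * FROM schema1.{} ORDER BY 1 "
                "LIMIT {} OFFSET {}) TO STDOUT WITH CSV HEADER"
            ).format(table_name, BATCH_SIZE, offset)
            # copy_expert streams the COPY output straight into the gzip file,
            # so a whole batch never has to sit in Python memory at once.
            with gzip.open(path, 'wt') as fp:
                cur.copy_expert(copy_sql, fp)

conn = psycopg2.connect(host='localhost', dbname='database1')  # adjust connection details
unload_table_in_batches(conn, 'my_big_table', total_rows=10000000)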

How to Import a SQL file to Python

I'm attempting to import an .sql file that already has tables into Python. However, it doesn't seem to import what I had hoped. The only things I've seen so far are how to create a new .sql file with a table, but I'm looking to just have an already completed .sql file imported into Python. So far, I've written this:
# Python code to demonstrate SQL to fetch data.
# importing the module
import sqlite3
# connect with the myTable database
connection = sqlite3.connect("CEM3_Slice_20180622.sql")
# cursor object
crsr = connection.cursor()
# execute the command to fetch all the data from the 'Trade Details' table
crsr.execute("SELECT * FROM 'Trade Details'")
# store all the fetched data in the ans variable
ans = crsr.fetchall()
# loop to print all the data
for i in ans:
    print(i)
However, it keeps claiming that the Trade Details table, which is a table inside the file I've connected it to, does not exist. Nowhere I've looked shows me how to do this with an already created file and table, so please don't just redirect me to an answer about that.
As suggested by Rakesh above, you create a connection to the DB, not to the .sql file. The .sql file contains SQL scripts to rebuild the DB from which it was generated.
After creating the connection, you can implement the following:
cursor = connection.cursor()  # cursor object
with open('CEM3_Slice_20180622.sql', 'r') as f:  # Not sure if the 'r' is necessary, but recommended.
    cursor.executescript(f.read())
Documentation on executescript found here
To read the file into pandas DataFrame:
import pandas as pd
df = pd.read_sql('SELECT * FROM table LIMIT 10', connection)
There are two possibilities:
Your file is not in the correct format and therefore cannot be opened.
The SQLite file can exist anywhere on disk, e.g. /Users/Username/Desktop/my_db.sqlite. This means that you have to tell Python exactly where your file is; otherwise it will look inside the script's directory, see that there is no file with that name, and therefore create a new, empty database with the provided filename.
sqlite3.connect expects the full path to your database file, or ':memory:' to create a database that exists in RAM. You don't pass it a SQL file. E.g.:
connection = sqlite3.connect('example.db')
You can then read the contents of CEM3_Slice_20180622.sql as you would a normal file and execute the SQL commands against the database.
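Putting both answers together, a short sketch, assuming CEM3_Slice_20180622.sql is a dump containing CREATE TABLE and INSERT statements:

import sqlite3

# Create a real database (use ':memory:' if you don't need it on disk).
connection = sqlite3.connect('example.db')
cursor = connection.cursor()

# Rebuild the database from the SQL dump.
with open('CEM3_Slice_20180622.sql', 'r') as f:
    cursor.executescript(f.read())

# Now the tables defined in the dump exist and can be queried.
cursor.execute("SELECT * FROM 'Trade Details' LIMIT 10")
for row in cursor.fetchall():
    print(row)

connection.close()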

Copy from Oracle to Vertica via CSV: lost rows

I am trying to load data from Oracle into Vertica via a CSV file.
Using Python, I wrote this script to create the CSV from Oracle:
import csv
import json

csv_file = open(r"C:\DataBases\csv\%s_%s.csv" % (FILE_NAME, TABLE_NAME), "a", encoding='utf-8')
writer = csv.writer(csv_file, delimiter=';', quoting=csv.QUOTE_ALL)
for row in cursor:
    count_rows += 1
    result_inside = {}
    row_content = []
    for col, val in zip(col_names, row):
        result_inside[col] = val
        row_content.append(result_inside[col])
    result_select_from_oracle.append(result_inside)
    # 'file' is presumably a separate JSON output file opened elsewhere in the script
    file.write(json.dumps(result_inside, default=myconverter))
    writer.writerow(row_content)
and wrote this script to COPY the CSV into Vertica:
import vertica_python

connection = vertica_python.connect(**conn_info)
cursor = connection.cursor()
with open(r"C:\DataBases\csv\%s_%s.csv" % (FILE_NAME, TABLE_NAME), "rb") as fs:
    cursor.copy("COPY %s.%s FROM STDIN PARSER fcsvparser(type='traditional', delimiter=';', record_terminator='\n')" % (SCHEMA_NAME, TABLE_NAME), fs)
connection.commit()
connection.close()
After the operation finished I had a problem: 40,000 rows were unloaded from Oracle, but only 39,700 rows arrived in Vertica.
Where could the problem be, and how can I solve it?
COPY statement has two main stages: parsing and loading (there are other stages, but we’ll stick to these two). COPY rejects data only if it encounters problems during its parser phase. That’s when you end up with rejected data.
Potential causes for parsing errors include:
Unsupported parser options
Incorrect data types for the table into which data is being loaded
Malformed context for the parser in use
Missing delimiters
If you want the whole load to fail when even one row is rejected, use the optional parameter ABORT ON ERROR.
You may want to limit the number of rejected rows you’ll permit. Use REJECTMAX to set a threshold after which you want COPY to roll back the load process.
Vertica gives you these options to save rejected data:
Do nothing. Vertica automatically saves a rejected data file and an accompanying explanation of each rejected row (the exception) to files in a catalog subdirectory called CopyErrorLogs.
Specify file locations of your choice using the REJECTED DATA and EXCEPTIONS parameters (files will be saved on the machine you run the script on).
Save rejected data to a table. Using a table lets you query what data was rejected, and why. You can then fix any incorrect data, and reload it.
Vertica recommends saving rejected data to a table, which will contain both the rejected data and the exception in one location. Saving rejected data to a table is simple, using the REJECTED DATA AS TABLE reject_table clause in the COPY statement.
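Applied to the question's load script, a hedged sketch of capturing the rejects in a table with vertica_python; the reject table name and csv_path are placeholders, and conn_info, SCHEMA_NAME, TABLE_NAME are assumed to be defined as in the question:

import vertica_python

connection = vertica_python.connect(**conn_info)
cursor = connection.cursor()
with open(csv_path, "rb") as fs:
    # REJECTED DATA AS TABLE keeps every rejected row (and the reason it was
    # rejected) in a queryable table instead of silently dropping it.
    cursor.copy(
        "COPY %s.%s FROM STDIN "
        "PARSER fcsvparser(type='traditional', delimiter=';', record_terminator='\n') "
        "REJECTED DATA AS TABLE %s.%s_rejects" % (SCHEMA_NAME, TABLE_NAME, SCHEMA_NAME, TABLE_NAME),
        fs)
connection.commit()

# Inspect why rows were rejected, then fix and reload them.
cursor.execute("SELECT rejected_reason, rejected_data FROM %s.%s_rejects LIMIT 20"
               % (SCHEMA_NAME, TABLE_NAME))
for reason, data in cursor.fetchall():
    print(reason, data)

connection.close()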

flask/python: best way to import and parse a sqlite file

At my work we frequently work with sqlite files to perform troubleshooting. I want to create a web page, possibly in flask, that allows users to upload a .sqlite file and automatically have simple, pre-defined queries run.
What is the best way within a Flask application to import a .sqlite file, run queries on it, and then set itself up to repeat the process?
The best way to use an SQLite file with specific queries is the sqlite3 package. Just:
import sqlite3
db = sqlite3.connect('PATH TO FILE')
result = db.execute(query, args)
...
First of all, you need to upload that file to the server. To do so, you can start by reading this: http://flask.pocoo.org/docs/patterns/fileuploads/
Then, you can connect to that .sqlite file like this, and then execute queries:
import sqlite3
connection = sqlite3.connect('/path/to/your/sqlite_file')
cursor = connection.cursor()
cursor.execute('my query')
cursor.fetchall() # If you used a select statement
# OR
connection.commit() # If you inserted data, for example
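A minimal Flask sketch of the upload-then-query flow; the route, form field name, and pre-defined queries below are placeholders to adapt:

import os
import sqlite3
import tempfile
from flask import Flask, request, jsonify

app = Flask(__name__)

# Pre-defined queries to run against every uploaded file (placeholders).
PREDEFINED_QUERIES = {
    'row_count': "SELECT COUNT(*) FROM my_table",
    'latest_rows': "SELECT * FROM my_table ORDER BY id DESC LIMIT 10",
}

@app.route('/upload', methods=['POST'])
def upload_sqlite():
    uploaded = request.files['file']      # the .sqlite file from the upload form
    fd, path = tempfile.mkstemp(suffix='.sqlite')
    os.close(fd)
    uploaded.save(path)                   # write it to a temporary file

    results = {}
    try:
        connection = sqlite3.connect(path)
        for name, query in PREDEFINED_QUERIES.items():
            results[name] = connection.execute(query).fetchall()
        connection.close()
    finally:
        os.remove(path)                   # clean up so the process can repeat

    # Rows come back as lists of tuples; adapt serialization to your data types.
    return jsonify(results)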
