flask/python: best way to import and parse a sqlite file

At my work we frequently work with sqlite files to perform troubleshooting. I want to create a web page, possibly in flask, that allows users to upload a .sqlite file and automatically have simple, pre-defined queries run.
What is the best way within a Flask application to import a .sqlite file, run queries on it, and then set itself up to repeat the process?

The simplest way to run specific queries against an SQLite file is the built-in sqlite3 package:
import sqlite3
db = sqlite3.connect('PATH TO FILE')
result = db.execute(query, args)
...

First of all, you need to upload the file to the server. To do so, start with the Flask file-upload pattern: http://flask.pocoo.org/docs/patterns/fileuploads/
Then you can connect to the uploaded .sqlite file and execute queries like this:
import sqlite3
connection = sqlite3.connect('/path/to/your/sqlite_file')
cursor = connection.cursor()
cursor.execute('my query')
cursor.fetchall() # If you used a select statement
# OR
connection.commit() # If you inserted data, for example
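Putting those two answers together, here is a minimal sketch of the upload-then-query flow in Flask. The route, upload folder, table names and the PREDEFINED_QUERIES dict are illustrative assumptions, not something taken from the answers above:
import os
import sqlite3
from flask import Flask, request, jsonify

app = Flask(__name__)
UPLOAD_FOLDER = '/tmp/uploads'  # assumption: pick a folder that suits your server
os.makedirs(UPLOAD_FOLDER, exist_ok=True)

# Assumption: the pre-defined troubleshooting queries you want to run.
PREDEFINED_QUERIES = {
    'row_count': 'SELECT COUNT(*) FROM some_table',
    'recent_errors': "SELECT * FROM logs WHERE level = 'ERROR' LIMIT 20",
}

@app.route('/upload', methods=['POST'])
def upload():
    # Save the uploaded .sqlite file (see the Flask file-upload pattern above)
    uploaded = request.files['sqlite_file']
    path = os.path.join(UPLOAD_FOLDER, 'uploaded.sqlite')
    uploaded.save(path)

    # Run every pre-defined query against the uploaded database
    results = {}
    connection = sqlite3.connect(path)
    try:
        cursor = connection.cursor()
        for name, query in PREDEFINED_QUERIES.items():
            cursor.execute(query)
            results[name] = cursor.fetchall()
    finally:
        connection.close()
    return jsonify(results)
Because each upload overwrites the previous file and the connection is closed at the end of the request, every new upload simply repeats the process.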

Related

Open sqlite database from http in memory

I have this code:
import sqlite3
import requests
from io import BytesIO as Memory

def download_file_to_obj(url, file_obj):
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        for chunk in r.iter_content(chunk_size=None):
            if chunk:
                file_obj.write(chunk)

def main(source_url):
    db_in_mem = Memory()
    print('Downloading..')
    download_file_to_obj(source_url, db_in_mem)
    print('Complete!')
    with sqlite3.connect(database=db_in_mem.read()) as con:
        cursor = con.cursor()
        cursor.execute('SELECT * FROM my_table limit 10;')
        data = cursor.fetchall()
    print(data)
    del(db_in_mem)
The table my_table exists in the source database.
Error:
sqlite3.OperationalError: no such table: my_table
How to load sqlite database to memory from http?
The most common way to force an SQLite database to exist purely in memory is to open the database using the special filename :memory:. In other words, instead of passing the name of a real disk file pass in the string :memory:. For example:
database = sqlite3.connect(":memory:")
When this is done, no disk file is opened. Instead, a new database is created purely in memory. The database ceases to exist as soon as the database connection is closed. Every :memory: database is distinct from every other. So, opening two database connections each with the filename ":memory:" will create two independent in-memory databases.
Note that for the special :memory: name to apply and create a pure in-memory database, there must be no additional text in the filename. Prepending a pathname, as in ./:memory:, therefore creates an ordinary disk-based database with that name instead.
See more here: https://www.sqlite.org/inmemorydb.html
You can build on top of this: create an in-memory database with db = sqlite3.connect(":memory:"), load the downloaded data into it, and then run your queries against that database.
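One concrete way to get the downloaded bytes into such an in-memory database is Connection.deserialize(), which the sqlite3 module gained in Python 3.11 (on older versions you would write the download to a temporary file and connect to that). A minimal sketch, assuming the URL points at a valid SQLite file:
import sqlite3
import requests

def query_remote_sqlite(url):
    # Download the raw bytes of the SQLite file
    response = requests.get(url)
    response.raise_for_status()
    db_bytes = response.content

    # Load the bytes into an in-memory database (Python 3.11+)
    con = sqlite3.connect(':memory:')
    con.deserialize(db_bytes)

    cursor = con.cursor()
    cursor.execute('SELECT * FROM my_table LIMIT 10;')
    return cursor.fetchall()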

pymongo: insert_one() is running but isn't adding anything to mongodb database?

I'm trying to upload a .txt file to a mongodb database collection using PyCharm, but nothing is appearing inside the collection. Here's the script I'm using at the moment:
from pymongo import MongoClient
client = MongoClient()
db = client.memorizer_data # use a database called "memorizer_data"
collection = db.english # and inside that DB, a collection called "english"
with open('7_1_1.txt', 'r') as f:
    text = f.read()  # read the txt file

name = '7_1_1.txt'
# build a document to be inserted
text_file_doc = {"file_name": name, "contents": text}
# insert the document into the "english" collection
collection.insert_one(text_file_doc)
PyCharm gets through the script with no errors, I've also tried printing the acknowledged attribute just to see what comes up:
result = collection.insert_one(text_file_doc)
print(result.acknowledged)
Which gives me True. I wasn't sure if I was actually connecting to my database, so I tried db.list_collection_names() and my collection 'english' is in the list, so as far as I can tell I am connected to it.
I'm a newbie to MongoDB so I realize I've probably gone about things the wrong way. At the moment I'm just trying to get the script working for a single .txt file before uploading everything my project is using to the db.
What makes you think there's nothing in the collection? Here are two ways to check.
In your pymongo code, add a final debug line:
print(collection.find_one())
Or, in the mongodb shell:
use memorizer_data
db.english.findOne()
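If both of those return your document, the data really is there and the mismatch is probably in which server or database your GUI is pointing at. A few extra checks you could add to the script (these are standard PyMongo calls, added here as a suggestion):
# Confirm which server you are talking to and where the document landed
print(client.address)                   # (host, port) of the connected server
print(client.list_database_names())     # should include 'memorizer_data'
print(db.list_collection_names())       # should include 'english'
print(collection.count_documents({}))   # number of documents in 'english'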

Migrate csv from gcs to postgresql

I'm trying to migrate csv files from Google Cloud Storage (GCS), which have been exported from BigQuery, to a PostgreSQL Google cloud sql instance using a python script.
I was hoping to use the Google API but found this in the documentation:
Importing CSV data using the Cloud SQL Admin API is not supported for PostgreSQL instances.
As an alternative, I could use the psycopg2 library and stream the rows of the csv file into the SQL instance. I can do this in three ways:
Line by line: Read each line and then submit the insert command and then commit
Batch stream: Read each line and then submit the insert commands and then commit after 10 lines or 100 etc.
The entire csv: Read each line and submit the insert commands and then only commit at the end of the document.
My concern is that these csv files could contain millions of rows, and running this process with any of the three options above seems like a bad idea to me.
What alternatives do I have?
Essentially I have some raw data in BigQuery on which we do some preprocessing before exporting to GCS in preparation for importing to the PostgreSQL instance.
I need to export this preprocessed data from BigQuery to the PostgreSQL instance.
This is not a duplicate of this question, as I'm looking for a solution that exports data from BigQuery to the PostgreSQL instance, whether via GCS or directly.
You can do the import process with Cloud Dataflow as suggested by #GrahamPolley. It's true that this solution involves some extra work (getting familiar with Dataflow, setting everything up, etc). Even with the extra work, this would be the preferred solution for your situation. However, other solutions are available and I'll explain one of them below.
To set up a migration process with Dataflow, this tutorial about exporting BigQuery to Google Datastore is a good example
Alternative solution to Cloud Dataflow
Cloud SQL for PostgreSQL doesn't support importing from a .CSV but it does support .SQL files.
The file type for the specified uri.
SQL: The file contains SQL statements.
CSV: The file contains CSV data.
Importing CSV data using the Cloud SQL Admin API is not supported for PostgreSQL instances.
A direct solution would be to convert the .CSV files to .SQL with some tool (Google doesn't provide one that I know of, but there are many online) and then import the result into PostgreSQL.
If you want to implement this solution in a more "programmatic" way, I would suggest using Cloud Functions. Here is an example of how I would try to do it (a rough sketch follows the list):
Set up a Cloud Function that triggers when a file is uploaded to a Cloud Storage bucket
Code the function to get the uploaded file and check if it's a .CSV. If it is, use a csv-to-sql API (example of API here) to convert the file to .SQL
Store the new file in Cloud Storage
Import to the PostgreSQL
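A rough sketch of the first three steps as a first-generation background Cloud Function triggered by uploads to a bucket; convert_csv_to_sql is a placeholder for whichever csv-to-sql tool or API you choose, and the output bucket name is an assumption:
from google.cloud import storage

OUTPUT_BUCKET = 'my-converted-sql-files'  # assumption: bucket for the .SQL output

def convert_csv_to_sql(csv_text):
    # Placeholder: call whichever csv-to-sql converter/API you choose here.
    raise NotImplementedError

def on_csv_upload(event, context):
    """Background Cloud Function triggered by a finalized object in GCS."""
    name = event['name']
    if not name.lower().endswith('.csv'):
        return  # ignore anything that is not a CSV

    client = storage.Client()
    csv_blob = client.bucket(event['bucket']).blob(name)
    sql_text = convert_csv_to_sql(csv_blob.download_as_text())

    # Store the converted .SQL file so it can be imported into Cloud SQL
    out_blob = client.bucket(OUTPUT_BUCKET).blob(name[:-4] + '.sql')
    out_blob.upload_from_string(sql_text)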
Before you begin, you should make sure:
The database and table you are importing into must
already exist on your Cloud SQL instance.
CSV file format requirements CSV files must have one line for each row
of data and have comma-separated fields.
Then, you can import data into a Cloud SQL instance from a CSV file in a GCS bucket by following these gcloud steps:
Describe the instance you are importing into:
gcloud sql instances describe [INSTANCE_NAME]
Copy the serviceAccountEmailAddress field.
Add the service account to the bucket ACL as a writer:
gsutil acl ch -u [SERVICE_ACCOUNT_ADDRESS]:W gs://[BUCKET_NAME]
Add the service account to the import file as a reader:
gsutil acl ch -u [SERVICE_ACCOUNT_ADDRESS]:R gs://[BUCKET_NAME]/[IMPORT_FILE_NAME]
Import the file
gcloud sql import csv [INSTANCE_NAME] gs://[BUCKET_NAME]/[FILE_NAME] \
--database=[DATABASE_NAME] --table=[TABLE_NAME]
If you do not need to retain the permissions provided by the ACL you set previously, remove the ACL:
gsutil acl ch -d [SERVICE_ACCOUNT_ADDRESS] gs://[BUCKET_NAME]
I found that the psycopg2 module has copy_from(), which loads an entire csv file instead of streaming the rows individually.
The downside of this method is that the csv file still needs to be downloaded from GCS and stored locally.
Here are the details of using psycopg2's copy_from(). (From here)
import psycopg2

conn = psycopg2.connect("host=localhost dbname=postgres user=postgres")
cur = conn.cursor()
with open('user_accounts.csv', 'r') as f:
    # Notice that we don't need the `csv` module.
    next(f)  # Skip the header row.
    cur.copy_from(f, 'users', sep=',')
conn.commit()
You could just use a class to make the text you are pulling from the internet behave like a file. I have used this several times.
import io
import sys

class IteratorFile(io.TextIOBase):
    """ given an iterator which yields tuples of strings,
    return a file-like object for reading those rows """
    def __init__(self, obj):
        template = "{}|" * len(obj[0])
        # join each tuple's fields with "|" (drop the trailing separator)
        self._it = (template[:-1].format(*x) for x in obj)
        self._f = io.StringIO()

    def read(self, length=sys.maxsize):
        try:
            while self._f.tell() < length:
                self._f.write(next(self._it) + "\n")
        except StopIteration:
            # soak up StopIteration. this block is not necessary because
            # of finally, but just to be explicit
            pass
        except Exception as e:
            print("uncaught exception: {}".format(e))
        finally:
            self._f.seek(0)
            data = self._f.read(length)
            # save the remainder for the next read
            remainder = self._f.read()
            self._f.seek(0)
            self._f.truncate(0)
            self._f.write(remainder)
            return data

    def readline(self):
        return next(self._it)
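For example, reusing cur and conn from the psycopg2 snippet above, such an object can be fed straight to copy_from without a temporary file (the table and column names are made up for illustration; note that sep must match the '|' the class inserts):
# rows already parsed from the CSV held in memory, as an iterable of tuples
rows = [("alice", "1"), ("bob", "2")]

f = IteratorFile(rows)
cur.copy_from(f, 'users', sep='|', columns=('name', 'id'))
conn.commit()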

How to Import a SQL file to Python

I'm attempting to import an .sql file that already has tables into Python. However, it doesn't seem to import what I had hoped. The only things I've seen so far are how to create a new .sql file with a table, but I'm looking to have an already completed .sql file imported into Python. So far, I've written this:
# Python code to demonstrate using SQL to fetch data.
# importing the module
import sqlite3
# connect to the database
connection = sqlite3.connect("CEM3_Slice_20180622.sql")
# cursor object
crsr = connection.cursor()
# execute the command to fetch all the data from the table 'Trade Details'
crsr.execute("SELECT * FROM 'Trade Details'")
# store all the fetched data in the ans variable
ans = crsr.fetchall()
# loop to print all the data
for i in ans:
    print(i)
However, it keeps claiming that the Trade Details table, which is a table inside the file I've connected to, does not exist. Nothing I've found shows how to do this with an already created file and table, so please don't just redirect me to an answer about that.
As suggested by Rakesh above, you create a connection to the DB, not to the .sql file. The .sql file contains SQL scripts to rebuild the DB from which it was generated.
After creating the connection, you can implement the following:
cursor = connection.cursor()  # cursor object
with open('CEM3_Slice_20180622.sql', 'r') as f:  # Not sure if the 'r' is necessary, but recommended.
    cursor.executescript(f.read())
Documentation on executescript found here
To read the data into a pandas DataFrame:
import pandas as pd
df = pd.read_sql('SELECT * FROM table LIMIT 10', connection)
There are two possibilities:
Your file is not in the correct format and therefore cannot be opened.
The SQLite file can exist anywhere on disk, e.g. /Users/Username/Desktop/my_db.sqlite. This means you have to tell Python exactly where your file is; otherwise it will look inside the script's directory, see that there is no file with that name, and therefore create a new file with the provided filename.
sqlite3.connect expects the full path to your database file, or ':memory:' to create a database that exists in RAM. You don't pass it a SQL file. E.g.:
connection = sqlite3.connect('example.db')
You can then read the contents of CEM3_Slice_20180622.sql as you would a normal file and execute the SQL commands against the database.
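Putting that together, a minimal sketch (the 'Trade Details' table name comes from the question; whether the dump really creates it is an assumption):
import sqlite3

# Connect to (or create) a real database file, not the .sql dump
connection = sqlite3.connect('example.db')

# Rebuild the schema and data from the SQL script
with open('CEM3_Slice_20180622.sql', 'r') as f:
    connection.executescript(f.read())

# Now the tables exist and can be queried
cursor = connection.cursor()
cursor.execute("SELECT * FROM 'Trade Details' LIMIT 10")
for row in cursor.fetchall():
    print(row)
connection.close()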

Saving a Python-produced list as .db file

I have the following lines as part of Python code when working with .db SQLite file:
sql = "SELECT * FROM calculations"
cursor.execute(sql)
results = cursor.fetchall()
where "calculations" is a table I previously created during the execution of my code. When I do
print results
I see
[(1,3.56,7,0.3), (7,0.4,18,1.45), (11,23.18,2,4.44)]
what I need to do is save this output as another .db file named "output_YYYY_MM_DD_HH_MM_SS.db" using the module "datetime" so that when I connect to "output_YYYY_MM_DD_HH_MM_SS.db" and select all I would see an output exactly equal the list above.
Any ideas on how to do this?
Many thanks in advance.
If I remember correctly, sqlite3's connect() creates the database if it does not already exist at the given path:
"""
1. connect to the database assigning the name you want (use ``datetime`` time-to-string method)
2. execute multiple inserts on the new db to dump the list you have
3. close connection
"""
Feel free to ask if something is unclear.
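A minimal sketch of those three steps, assuming results still holds the list from the question; the column names a-d and their types are placeholders for whatever your calculations table actually contains:
import sqlite3
from datetime import datetime

# 1. connect to a new database named after the current timestamp
db_name = datetime.now().strftime('output_%Y_%m_%d_%H_%M_%S.db')
out_conn = sqlite3.connect(db_name)

# 2. recreate the table and dump the list into it
out_conn.execute(
    'CREATE TABLE calculations (a INTEGER, b REAL, c INTEGER, d REAL)'
)
out_conn.executemany('INSERT INTO calculations VALUES (?, ?, ?, ?)', results)
out_conn.commit()

# 3. close the connection
out_conn.close()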
