I'm exploring DuckDB for one of my projects.
Here I have a sample Database file downloaded from https://www.wiley.com/en-us/SQL+for+Data+Scientists%3A+A+Beginner%27s+Guide+for+Building+Datasets+for+Analysis-p-9781119669364
I'm trying to import FarmersMarketDatabase into my DuckDB database.
con.execute("IMPORT DATABASE 'FarmersMarketDatabase'")
It throws an error:
RuntimeError: IO Error: Cannot open file "FarmersMarketDatabase\schema.sql": The system cannot find the path specified.
How do I load this database into DuckDB?
As I couldn’t edit my answer above:
I had a brief look into the download you referred to.
According to the screenshots included in the zip download, it seems to be a MySQL database dump file (the tool shown in the screenshots is MySQL Workbench).
DuckDB's import/export feature only supports its own dump structure and format.
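For reference, a minimal sketch of that round trip (the file and directory names below are made up). EXPORT DATABASE writes a directory containing schema.sql, load.sql and one data file per table, and IMPORT DATABASE expects exactly that layout, which is why it complains about a missing schema.sql when pointed at a MySQL dump:

import duckdb

# Export an existing DuckDB database to a directory; DuckDB writes
# schema.sql, load.sql and the table data files into it.
con = duckdb.connect("source.duckdb")
con.execute("EXPORT DATABASE 'my_dump'")
con.close()

# Import that directory into a fresh database file.
con = duckdb.connect("target.duckdb")
con.execute("IMPORT DATABASE 'my_dump'")
con.close()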
With the above-mentioned extension, DuckDB can access SQLite database files directly. Another extension adds PostgreSQL support, but there is currently no MySQL support.
In Python you could read the tables out of the MySQL dump and save them as DataFrames.
DuckDB can query such DataFrames directly.
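A minimal sketch of that last step, assuming the dump's tables have already been parsed into pandas DataFrames (vendor_df below is a made-up stand-in):

import duckdb
import pandas as pd

# Stand-in for a table extracted from the MySQL dump.
vendor_df = pd.DataFrame({"vendor_id": [1, 2], "vendor_name": ["ACME Farm", "Berry Co"]})

con = duckdb.connect("farmers_market.duckdb")
# DuckDB can reference in-scope DataFrames by their variable name in SQL.
con.execute("CREATE TABLE vendor AS SELECT * FROM vendor_df")
print(con.execute("SELECT * FROM vendor").fetchall())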
If you prefer, with a GUI such as DBeaver you can also access the MySQL dump file and copy the tables into a DuckDB database file.
I am not sure if you can load a .sql file into DuckDB directly. Maybe you can first export the SQLite database to CSV file(s), then load the CSV file(s) into DuckDB: https://duckdb.org/docs/data/csv.
You may want to install the related extension first:
https://github.com/duckdblabs/sqlitescanner
The import/export feature seems to support DuckDB format only.
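A short sketch of both routes, with made-up file and table names; note that the extension from the linked repo may install under the name sqlite_scanner or sqlite depending on your DuckDB version:

import duckdb

con = duckdb.connect("farmers_market.duckdb")

# Route 1: load a CSV exported from the source database;
# read_csv_auto infers column names and types from the file.
con.execute("CREATE TABLE vendor AS SELECT * FROM read_csv_auto('vendor.csv')")

# Route 2: scan a SQLite file directly via the extension linked above.
con.execute("INSTALL sqlite_scanner")
con.execute("LOAD sqlite_scanner")
con.execute("CREATE TABLE booth AS SELECT * FROM sqlite_scan('farmers_market.db', 'booth')")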
Related
I've got some Python code that runs ~15 SELECT queries using SQLAlchemy and gets the data. I'd like to dump all of these into a single .sql file so that I can later load it into a MySQL database. I can't find any docs on how to do this.
I have a basic CSV report that is produced by another team on a daily basis; each report has 50k rows, and the reports are saved to a shared drive every day. I also have an Oracle DB.
I need to create an auto-scheduled (or at least less manual) process to import those CSV reports into the Oracle DB. What solution would you recommend for it?
I did not find such a solution in SQL Developer, since that is an upload from a file rather than a query. I was thinking about a Python cron script that runs automatically on a daily basis, transforms the CSV report into a .txt file with the needed SQL syntax (INSERT INTO ...), then connects to the Oracle DB and runs that file as SQL commands to insert the data.
But this looks complicated.
Maybe you know another solution that you would recommend?
Create an external table to allow you to access the content of the CSV as if it were a regular table. This assumes the file name does not change day-to-day.
Create a scheduled job to import the data in that external table and do whatever you want with it.
One common blocking issue that prevents using 'external tables' is that external tables require the data to be on the computer hosting the database. Not everyone has access to those servers. Or sometimes the external transfer of data to that machine + the data load to the DB is slower than doing a direct path load from the remote machine.
SQL*Loader with direct path load may be an option: https://docs.oracle.com/en/database/oracle/oracle-database/19/sutil/oracle-sql-loader.html#GUID-8D037494-07FA-4226-B507-E1B2ED10C144 This will be faster than Python.
If you do want to use Python, then read the cx_Oracle manual Batch Statement Execution and Bulk Loading. There is an example of reading from a CSV file.
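For reference, a rough sketch of the cx_Oracle batch-insert route; the connection details, table and column names here are all made up:

import csv
import cx_Oracle

# Hypothetical credentials, DSN, table and columns - adjust to your setup.
connection = cx_Oracle.connect(user="report_user", password="secret",
                               dsn="dbhost.example.com/orclpdb1")
cursor = connection.cursor()

sql = "INSERT INTO daily_report (customer_id, amount) VALUES (:1, :2)"
batch = []
with open("daily_report.csv", newline="") as f:
    reader = csv.reader(f)
    next(reader)                      # skip the header row
    for row in reader:
        batch.append((row[0], row[1]))
        if len(batch) == 10000:       # insert in chunks to limit round trips
            cursor.executemany(sql, batch)
            batch = []
if batch:
    cursor.executemany(sql, batch)
connection.commit()
connection.close()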
I have a SQL file titled "DreamMarket2017_product.sql". I believe it's MySQL.
How do I read this file into a Jupyter Notebook using PyMySQL? Or, should I use Psycopg2?
I'm much more familiar w/ Psycopg2 than PyMySQL.
Both PyMySQL and Psycopg2 request a database name. There is no database; I only have the files.
Do I need to create a database using a GUI like pgAdmin and load the SQL tables into the newly created database?
Also, I'm still waiting to hear from the university that created the dataset.
Yes, you need to create a database and load the data into tables, or import the table backup you have.
import psycopg2
connection = psycopg2.connect(user="dummy", password="1234", host="any", port="1234", database="demo")
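If the dump really is MySQL and you want to stay in Python, a rough PyMySQL sketch could look like the following. The credentials and database name are made up, and the naive split on ';' breaks if the dump contains semicolons inside string literals - replaying the file with the mysql command-line client is more robust:

import pymysql

# Connect without selecting a database, create one, then replay the dump.
conn = pymysql.connect(host="localhost", user="root", password="secret")
cur = conn.cursor()
cur.execute("CREATE DATABASE IF NOT EXISTS dream_market")
cur.execute("USE dream_market")

with open("DreamMarket2017_product.sql") as f:
    for statement in f.read().split(";"):
        if statement.strip():
            cur.execute(statement)
conn.commit()
conn.close()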
I have a large 3 GB CSV file, and I'd like to use Blaze to investigate the data and select down to the data I'm interested in analyzing, with the eventual goal of migrating that data into a suitable computational backend such as SQLite, PostgreSQL, etc. I can get the data into Blaze and work on it fine, but this is the part I'm having trouble with:
db = odo(bdata, 'sqlite:///report.db::report')
I'm not sure how to properly create a .db file to open with SQLite.
You can go directly from CSV to sqlite using the directions listed here.
http://odo.pydata.org/en/latest/perf.html?highlight=sqlite#csv-sqlite3-57m-31s
I think you are missing the column names as warned about here: http://odo.pydata.org/en/latest/sql.html?highlight=sqlite
dshape = discover(resource('report_2015.csv'))
t = odo('report_2015.csv', 'sqlite:///report.db::report', dshape=dshape)
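If odo gives you trouble, the same CSV-to-SQLite step can also be done with pandas and SQLAlchemy - a sketch only, assuming the CSV has a header row; reading in chunks keeps the 3 GB file out of memory:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///report.db")
for chunk in pd.read_csv("report_2015.csv", chunksize=100000):
    # append each chunk so the whole file never has to fit in memory
    chunk.to_sql("report", engine, if_exists="append", index=False)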
I have a .db file that I would like to open, process the data within, and re-save as another .db file to be loaded into a MySQL database (via MySQLdb).
I have read that the only way to open a .db file is with SQLite. I'm working in Ubuntu 11.04.
I need to write the process code in Python.
What is the correct conceptual procedure to do this?
I would recommend SQLAlchemy for this type of problem. You can use it to:
1. Open your SQLite3 DB and figure out the schema
2. Save that schema as a SQLAlchemy model
3. (Optional) do any processing you like
4. Using the same SQLAlchemy model from 1, open a MySQL connection, create the tables and load the data
Note I - you can do all this with the django ORM too - but the SQLAlchemy route will allow you to have less redundant code and more future flexibility.
Note II - sqlautocode can help you with 1.
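For illustration, a sketch of steps 1, 2 and 4 using SQLAlchemy reflection (assumes a recent SQLAlchemy with the 2.0-style API; the file name and MySQL credentials are placeholders):

from sqlalchemy import create_engine, MetaData

src = create_engine("sqlite:///source.db")
dst = create_engine("mysql+mysqldb://user:secret@localhost/target_db")

# Steps 1-2: reflect the SQLite schema into a MetaData model.
metadata = MetaData()
metadata.reflect(bind=src)

# Step 4: create the same tables on MySQL and copy the rows across.
metadata.create_all(bind=dst)
with src.connect() as s, dst.connect() as d:
    for table in metadata.sorted_tables:
        rows = [dict(r._mapping) for r in s.execute(table.select())]
        if rows:
            d.execute(table.insert(), rows)
    d.commit()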