I'm using Python and psycopg2 to remotely query some PostgreSQL databases, and I'm trying to figure out the best way to select the data I need from a remote table and insert it into a table in a separate database (on the local application server).
Most of what I've read has directed me to avoid executemany and look toward COPY operations, but I'm unsure how to implement this on a specific select statement as opposed to the entire table. Am I headed in the right direction, or am I completely off?
"but I'm unsure how to implement this on a specific select statement as opposed to the entire table"
COPY isn't limited to tables; you can use a query as the source as well. Check out the examples in the manual, which show how to use COPY to create a text file based on a query:
http://www.postgresql.org/docs/current/static/sql-copy.html#AEN59055
(3rd example)
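To make the query-based COPY concrete, here is a minimal sketch using psycopg2's cursor.copy_expert. The helper names (copy_out_sql, copy_in_sql, transfer) are made up for illustration, the connections are assumed to be open psycopg2 connections, and the table and column names are interpolated directly into the SQL, so they must come from trusted code and not from user input.

```python
import io


def copy_out_sql(select_sql):
    """Wrap a SELECT statement so COPY streams its result set as CSV."""
    query = select_sql.strip().rstrip(";")
    return "COPY ({}) TO STDOUT WITH (FORMAT csv)".format(query)


def copy_in_sql(table, columns):
    """Build the matching COPY ... FROM STDIN statement for the target table."""
    return "COPY {} ({}) FROM STDIN WITH (FORMAT csv)".format(
        table, ", ".join(columns)
    )


def transfer(src_conn, dst_conn, select_sql, dst_table, columns):
    """Copy the rows of select_sql from src_conn into dst_table on dst_conn.

    Both connections are assumed to be psycopg2 connections.
    """
    buf = io.StringIO()  # holds the whole result set in memory
    with src_conn.cursor() as src_cur:
        src_cur.copy_expert(copy_out_sql(select_sql), buf)
    buf.seek(0)  # rewind so the target side reads from the start
    with dst_conn.cursor() as dst_cur:
        dst_cur.copy_expert(copy_in_sql(dst_table, columns), buf)
    dst_conn.commit()
```

For large result sets the in-memory buffer could be swapped for a temporary file, since copy_expert only needs a file-like object.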
Take a look at http://ryrobes.com/featured-articles/using-a-simple-python-script-for-end-to-end-data-transformation-and-etl-part-1/
Granted, this is pulling from Oracle and inserting into SQL Server, but the concepts should be the same.
I have a table with 30k clients, with the ClientID as primary key.
I'm getting data from API calls and inserting it into the table using Python.
I'd like to find a way to insert rows for new clients and, if the ClientID that comes with the API call already exists in the table, update the existing record with the updated information for this client.
Thanks!!
A snippet of code would be nice to show us what exactly you are doing right now. I presume you are using an ORM like SQLAlchemy? If so, then you are looking at doing an UPSERT-type operation.
That is already answered HERE
Alternatively, if you are executing raw queries without an ORM, then you could write a custom procedure and pass the required parameters. HERE is a good write-up on how that is done in MSSQL under high concurrency. You could use this as a starting point for understanding and then rewrite it for PostgreSQL.
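For the raw-query route on PostgreSQL, one option that avoids a custom procedure is INSERT ... ON CONFLICT, which is available from PostgreSQL 9.5 onward. The sketch below is only an illustration: the table name clients and the columns name and email are hypothetical, and conn is assumed to be a psycopg2-style DB-API connection that uses %s placeholders.

```python
def build_upsert_sql(table, key, columns):
    """Build an INSERT ... ON CONFLICT statement that updates on a key clash."""
    all_cols = [key] + list(columns)
    placeholders = ", ".join(["%s"] * len(all_cols))
    # EXCLUDED refers to the row that was proposed for insertion.
    updates = ", ".join("{0} = EXCLUDED.{0}".format(c) for c in columns)
    return (
        "INSERT INTO {table} ({cols}) VALUES ({ph}) "
        "ON CONFLICT ({key}) DO UPDATE SET {updates}"
    ).format(
        table=table,
        cols=", ".join(all_cols),
        ph=placeholders,
        key=key,
        updates=updates,
    )


def upsert_clients(conn, rows):
    """Insert new clients and update existing ones.

    rows is a list of (ClientID, name, email) tuples; the name and email
    columns are hypothetical stand-ins for the real client fields.
    """
    sql = build_upsert_sql("clients", "ClientID", ["name", "email"])
    with conn.cursor() as cur:
        cur.executemany(sql, rows)
    conn.commit()
```

Because ClientID is the primary key, it can serve as the conflict target without any extra index.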
I'm starting with Impala SQL and Hadoop and have a (probably simple) question.
I have a Hadoop database with hundreds of tables that share the same schema and naming convention (e.g. process_1, process_2, process_3 and so on). How would I query all the tables and concatenate them into one big table or dataframe? Is it possible to do so using just Impala SQL, returning one dataframe in Python?
Something like:
SELECT * FROM 'process_*';
Or do I need to run SHOW TABLES 'process_*', use a loop in Python and query each table separately?
If you are looking for a pure Impala solution, one approach would be to create a view on top of all of the tables, something like the following:
create view process_view_all_tables as
select * from process_1
union all
select * from process_2
union all
...
select * from process_N;
The disadvantages of this approach are as follows:
You need to union multiple tables together. Union is an expensive operation in terms of memory utilisation. It works fine if you have a small number of tables, say in the range of 2-5.
You need to add all the tables manually. If you add a new process table in the future, you would need to ALTER the view to include it. This is a maintenance headache.
The view assumes that all the process tables have the same schema.
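The manual-maintenance point could be softened by generating the view statement from a list of table names instead of typing it out. This is only a sketch: build_union_view is a made-up helper, and the table names are interpolated directly, so they should come from a trusted source such as the output of SHOW TABLES.

```python
def build_union_view(view_name, tables):
    """Return a CREATE VIEW statement that unions the given tables."""
    if not tables:
        raise ValueError("at least one table is required")
    selects = "\nunion all\n".join(
        "select * from {}".format(table) for table in tables
    )
    return "create view {} as\n{};".format(view_name, selects)


if __name__ == "__main__":
    names = ["process_{}".format(i) for i in range(1, 4)]
    print(build_union_view("process_view_all_tables", names))
```

Regenerating the statement whenever a new process table appears keeps the view in sync without hand-editing it.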
In the second approach, as you said, you could query the list of tables from Impala using SHOW TABLES LIKE 'process*' and write a small program to iterate over the list of tables and create the files.
Once you have the files generated, you could move them back to HDFS and create a table on top of them.
The only disadvantage of the second approach is that every iteration issues an Impala database request, which is particularly undesirable in a multi-tenant database environment.
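A rough sketch of the looping part of the second approach, collecting the rows in memory rather than writing files. The function name is made up, and cursor is assumed to be any DB-API cursor connected to Impala (for example one obtained through the impyla package); the pattern follows the question's naming convention.

```python
def fetch_all_process_tables(cursor, pattern="process_*"):
    """Query every table matching pattern and concatenate the rows.

    Returns (column_names, rows). Assumes all matched tables share a schema.
    """
    cursor.execute("SHOW TABLES LIKE '{}'".format(pattern))
    tables = [row[0] for row in cursor.fetchall()]

    columns = None
    rows = []
    for table in tables:
        # One request per table, as noted in the disadvantage above.
        cursor.execute("SELECT * FROM {}".format(table))
        if columns is None:
            # Take the column names from the first table's result set.
            columns = [col[0] for col in cursor.description]
        rows.extend(cursor.fetchall())
    return columns, rows
```

If pandas is available, the result can be turned into a single dataframe with pandas.DataFrame(rows, columns=columns).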
In my opinion, you should try the second approach.
Hope this helps :)
I have a python function (pyfunc):
def pyfunc(x):
...
return someString
I want to apply this function to every item in a MySQL table column,
something like:
UPDATE tbl SET mycol=pyfunc(mycol);
This update includes tens of millions of records.
Is there an efficient way to do this?
Note: I cannot rewrite this function in sql or any other programming language.
If your pyfunc does not depend on other data sources such as APIs or a cache, and only does data processing such as string or mathematical manipulation, or depends only on data stored in the same MySQL database, you could use a MySQL user-defined function.
Let's assume you create a MySQL function called colFunc; then your query would be:
Update tbl set mycol=colFunc(mycol)
Just prepare an update.sql file using a Python script.
After this you can check the update on your local machine (with a dump of the database): connect to MySQL and run the update.sql script that the Python script prepared.
In this case the data is updated with a raw SQL query, without Python.
I think it is not a bad solution.
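If generating a SQL file is not practical, a different technique is to apply the function from Python in primary-key-ordered batches, so that each transaction stays small. The sketch below is not the update.sql approach described above. It assumes tbl has a non-negative integer primary key named id (a hypothetical column, the question does not name one) and that conn is a DB-API connection; MySQLdb and PyMySQL use %s placeholders.

```python
def update_in_batches(conn, func, batch_size=10000, placeholder="%s"):
    """Apply func to tbl.mycol in batches ordered by the id primary key.

    Returns the number of rows processed. The id column is an assumption.
    """
    select_sql = (
        "SELECT id, mycol FROM tbl WHERE id > {0} ORDER BY id LIMIT {0}"
    ).format(placeholder)
    update_sql = "UPDATE tbl SET mycol = {0} WHERE id = {0}".format(placeholder)

    last_id = -1  # assumes ids are non-negative integers
    total = 0
    cur = conn.cursor()
    while True:
        # Keyset pagination: continue after the last id already handled.
        cur.execute(select_sql, (last_id, batch_size))
        rows = cur.fetchall()
        if not rows:
            break
        params = [(func(value), row_id) for row_id, value in rows]
        cur.executemany(update_sql, params)
        conn.commit()  # commit per batch to keep transactions short
        last_id = rows[-1][0]
        total += len(rows)
    return total
```

A call would look like update_in_batches(conn, pyfunc). Committing per batch also means an interrupted run leaves the earlier batches applied, so whether a rerun is safe depends on whether pyfunc can be applied twice to the same value.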
I want to dump Oracle objects such as tables and stored procedures using cx_Oracle from Python.
Is there a tutorial on how to do this?
If you are looking for the source code for tables you can use the following:
select DBMS_METADATA.GET_DDL('TABLE','<table_name>') from DUAL;
For stored procedures you can use:
select text from all_source where name = '<procedure name>' order by line
In general this is not a cx_Oracle-specific problem: just query the Oracle-specific views (like all_source) or call the functions (like get_ddl) and read the result like any other query. There are more of these views in Oracle (like user_source, for source that you, the specific user, own), but I'm doing this off the top of my head and don't have easy access to an Oracle database to remind myself.
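As a sketch of reading those two queries from Python: the helper names below are invented, cursor is assumed to be a cx_Oracle cursor, and the object names are upper-cased on the assumption that the objects were created without quoted identifiers. GET_DDL returns a CLOB, which cx_Oracle normally hands back as a LOB object with a read() method.

```python
def read_lob(value):
    """Return the text of a LOB-like object, or the value itself if plain."""
    return value.read() if hasattr(value, "read") else value


def get_table_ddl(cursor, table_name):
    """Fetch the CREATE TABLE statement for one table via DBMS_METADATA."""
    cursor.execute(
        "select DBMS_METADATA.GET_DDL('TABLE', :name) from DUAL",
        {"name": table_name.upper()},
    )
    row = cursor.fetchone()
    return str(read_lob(row[0]))


def get_procedure_source(cursor, procedure_name):
    """Concatenate the source lines of one stored procedure from all_source."""
    cursor.execute(
        "select text from all_source "
        "where name = :name and type = 'PROCEDURE' order by line",
        {"name": procedure_name.upper()},
    )
    # Each row holds one source line, including its trailing newline.
    return "".join(row[0] for row in cursor.fetchall())
```

If the same procedure name exists under several owners, all_source returns all of them, so an additional owner filter may be needed.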
I would like to copy the contents of a MySQL database from one server to another using a third server. This could be done from the shell prompt using this:
mysqldump --host=hostname1 --user=username --password="mypwd" acme | mysql --host=hostname2 --user=username --password="mypwd" acme
However, how do I do this from within a Python script without using os.system or any of the other subprocess methods? I've read through the MySQLdb docs, but don't see a way to do a bulk export/import. Thank you!
If you don't want to use mysqldump from the command line (via os.system or similar methods), you are more or less limited to reading the data straight from MySQL and then writing it to the other server. In that respect your question looks very similar to Get Insert Statement for existing row in MySQL
You can use a query to get the schema creation SQL:
SHOW CREATE TABLE MyTable;
And then you need to implement a script that just queries the data and inserts it into the other server.
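A rough sketch of such a script for a single table is shown here. copy_table is a made-up name, both connections are assumed to be MySQLdb-style DB-API connections using %s placeholders, and the table name is interpolated directly, so it must be trusted. It does not handle foreign-key ordering, views, triggers, or tables that already exist on the target.

```python
def copy_table(src_conn, dst_conn, table, batch_size=1000):
    """Recreate table on the target server and copy its rows in batches.

    Returns the number of rows copied.
    """
    src = src_conn.cursor()
    dst = dst_conn.cursor()

    # SHOW CREATE TABLE returns (table_name, create_statement).
    src.execute("SHOW CREATE TABLE `{}`".format(table))
    create_sql = src.fetchone()[1]
    dst.execute(create_sql)

    src.execute("SELECT * FROM `{}`".format(table))
    column_count = len(src.description)
    insert_sql = "INSERT INTO `{}` VALUES ({})".format(
        table, ", ".join(["%s"] * column_count)
    )

    copied = 0
    while True:
        rows = src.fetchmany(batch_size)
        if not rows:
            break
        dst.executemany(insert_sql, rows)
        copied += len(rows)
    dst_conn.commit()
    return copied
```

To copy a whole database, the same function could be called for each name returned by SHOW TABLES on the source server.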
You could also look into third-party applications that allow you to copy data from one database to another.