Python, PostgreSQL trouble copying from CSV

I am doing this all as a test. I want to take a CSV file that has headers and copy the values into a PostgreSQL database table. The table's columns are named the same as the headers in the CSV file, case-sensitively. The table has two columns, "pkey" and "mls"; the CSV just has "mls" as its header. "pkey" is just the primary key, set up to auto-increment. As a test I just want to copy the "mls" column of the CSV file into the table.
import csv
import psycopg2

database = psycopg2.connect(database="testing", user="**",
                            password="**", host="**", port="**")
ocsvf = open("sample.csv")

def merger(conn, table_name, file_object):
    cursor = conn.cursor()
    cursor.copy_from(file_object, table_name, sep=',', columns=('mls'))
    conn.commit()
    cursor.close()

try:
    merger(database, 'tests', ocsvf)
finally:
    database.close()
When I try to run the code I get this error:
Traceback (most recent call last):
  File "csvtest.py", line 26, in <module>
    merger(database, 'tests', ocsvf)
  File "csvtest.py", line 21, in merger
    cursor.copy_from(file_object, table_name, sep=',', columns=('mls'))
psycopg2.ProgrammingError: column "m" of relation "tests" does not exist
I am sure it's something simple that I just keep overlooking, but I have also googled this, and the one thing I found was a suggestion that the primary key might not be set up right. I tested that, though, and the primary key works fine when I do manual input from pgAdmin. Any help would be great, thanks.

In this line:
cursor.copy_from(file_object, table_name, sep=',', columns=('mls'))
The expression ('mls') evaluates to the string "mls", which means that iterating over it yields three items: ['m', 'l', 's'].
You should write this line as follows:
cursor.copy_from(file_object, table_name, sep=',', columns=('mls',))
The expression ('mls',) evaluates to a tuple with one item, "mls", which is what I guess you meant to do.
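A quick interpreter session shows the difference, and why psycopg2 ended up asking Postgres for a column named "m":

>>> list(('mls'))    # parentheses alone do not make a tuple; this is still a string
['m', 'l', 's']
>>> list(('mls',))   # the trailing comma is what makes a one-element tuple
['mls']

One more caveat, since the file is described as having a header row: copy_from streams the file verbatim and does not skip headers, so you may also need to call next(file_object) before copying, or use copy_expert with a COPY ... CSV HEADER statement.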

Related

inserting dataset entries into PostgreSQL database server

I have a problem attempting to pipe some entries into a PostgreSQL database. The loader is in this file, movie_loader.py, which was provided to me:
import csv
import sys  # needed for the sys.stdout.reconfigure() call below

"""
This program generates direct SQL statements from the source Netflix Prize files in order
to populate a relational database with those files' data.

By taking the approach of emitting SQL statements directly, we bypass the need to import
some kind of database library for the loading process, instead passing the statements
directly into a database command line utility such as `psql`.
"""

# The INSERT approach is best used with a transaction. An introductory definition:
# instead of "saving" (committing) after every statement, a transaction waits on a
# commit until we issue the `COMMIT` command.
print('BEGIN;')

# For simplicity, we assume that the program runs where the files are located.
MOVIE_SOURCE = 'movie_titles.csv'
with open(MOVIE_SOURCE, 'r+', encoding='iso-8859-1') as f:
    reader = csv.reader(f)
    for row in reader:
        id = row[0]
        year = 'null' if row[1] == 'NULL' else int(row[1])
        title = ', '.join(row[2:])
        # Watch out---titles might have apostrophes!
        title = title.replace("'", "''")
        print(f'INSERT INTO movie VALUES({id}, {year}, \'{title}\');')

sys.stdout.reconfigure(encoding='UTF08')

# We wrap up by emitting an SQL statement that will update the database's movie ID
# counter based on the largest one that has been loaded so far.
print('SELECT setval(\'movie_id_seq\', (SELECT MAX(id) from movie));')

# _Now_ we can commit our transaction.
print('COMMIT;')
However, when attempting to pipe this file's output into my database, I get the following error, which seems to be some kind of encoding error. I am using git bash as my terminal.
$ python3 movie_loader.py | psql postgresql://localhost/postgres
stdin is not a tty
Traceback (most recent call last):
  File "C:\Users\dhuan\relational\movie_loader.py", line 28, in <module>
    print(f'INSERT INTO movie VALUES({id}, {year}, \'{title}\');')
OSError: [Errno 22] Invalid argument
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='cp1252'>
OSError: [Errno 22] Invalid argument
It seems as if maybe my dataset has an error? I'm not sure what specifically the error is pointing at. Any insight is appreciated.
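One detail worth checking, though it is only a guess from the traceback: stdout is cp1252 (the Windows default), and the reconfigure call comes after the prints and misspells the encoding name ('UTF08' is not a valid codec). A minimal sketch of forcing UTF-8 output before any SQL is emitted:

import sys

# Assumption: the OSError comes from print() trying to encode titles that
# cp1252 cannot represent. Reconfigure stdout *before* printing anything,
# and spell the codec 'utf-8' (not 'UTF08').
sys.stdout.reconfigure(encoding='utf-8')

print('BEGIN;')
# ... generate the INSERT statements as before ...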

Apache Superset not loading table records/columns

I am trying to add a table in Superset. The other tables get added properly, meaning their columns are fetched properly by Superset, but for my table booking_xml it does not load any columns.
The description of the table was shown in an image (not reproduced here).
After adding this table, when I click on the table name to explore it, it gives the following error:
Empty query?
Traceback (most recent call last):
  File "/home/superset/superset_venv/lib/python3.8/site-packages/superset/viz.py", line 473, in get_df_payload
    df = self.get_df(query_obj)
  File "/home/superset/superset_venv/lib/python3.8/site-packages/superset/viz.py", line 251, in get_df
    self.results = self.datasource.query(query_obj)
  File "/home/superset/superset_venv/lib/python3.8/site-packages/superset/connectors/sqla/models.py", line 1139, in query
    query_str_ext = self.get_query_str_extended(query_obj)
  File "/home/superset/superset_venv/lib/python3.8/site-packages/superset/connectors/sqla/models.py", line 656, in get_query_str_extended
    sqlaq = self.get_sqla_query(**query_obj)
  File "/home/superset/superset_venv/lib/python3.8/site-packages/superset/connectors/sqla/models.py", line 801, in get_sqla_query
    raise Exception(_("Empty query?"))
Exception: Empty query?
ERROR:superset.viz:Empty query?
However, when I try to explore it using the SQL editor, it loads up properly. I have found this difference in the form_data parameter in the URL when loading from the tables page versus from the SQL editor.
URL from SQL Lab view:
form_data={"queryFields":{"groupby":"groupby","metrics":"metrics"},"datasource":"192__table","viz_type":"table","url_params":{},"time_range_endpoints":["inclusive","exclusive"],"granularity_sqla":"created_on","time_grain_sqla":"P1D","time_range":"Last+week","groupby":[],"metrics":["count"],"all_columns":[],"percent_metrics":[],"order_by_cols":[],"row_limit":10000,"order_desc":true,"adhoc_filters":[],"table_timestamp_format":"smart_date","color_pn":true,"show_cell_bars":true}
URL from datasets list:
form_data={"queryFields":{"groupby":"groupby","metrics":"metrics"},"datasource":"191__table","viz_type":"table","url_params":{},"time_range_endpoints":["inclusive","exclusive"],"time_grain_sqla":"P1D","time_range":"Last+week","groupby":[],"all_columns":[],"percent_metrics":[],"order_by_cols":[],"row_limit":10000,"order_desc":true,"adhoc_filters":[],"table_timestamp_format":"smart_date","color_pn":true,"show_cell_bars":true}
When loading from datasets list, /explore_json/ gives 400 Bad Request.
Superset version == 0.37.1, Python version == 3.8
Superset saves the details/metadata of each table that is connected. My table had a column with a very long datatype (shown in the image in the question), but Superset stores a column's type in its metadata database as a varchar of length 32, so the database refused to store the value. That is what caused the error, and why no records were fetched even after adding the table to the datasources.
What I did was increase the length of the column's type field:
ALTER TABLE table_columns MODIFY type varchar(200)
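That statement uses MySQL/MariaDB syntax. If your Superset metadata database happens to be PostgreSQL instead (an assumption; nothing above says which backend stores the metadata), the equivalent would be:

ALTER TABLE table_columns ALTER COLUMN type TYPE varchar(200);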

MariaDB / MySQL INSERT INTO / ON DUPLICATE KEY UPDATE in Python

I'm trying to do an INSERT INTO ... ON DUPLICATE KEY UPDATE, taking the values from one table and inserting them into another. I have the following Python code.
try:
    cursor.execute("SELECT LocationId, ProviderId FROM CQCLocationDetailsUpdates")
    rows = cursor.fetchall()
    for row in rows:
        maria_cnxn.execute('INSERT INTO CQCLocationDetailsUpdates2 (LocationId, ProviderId) VALUES (%s,%s) ON DUPLICATE KEY UPDATE ProviderId = VALUES(%s)', row)
    mariadb_connection.commit()
except TypeError as error:
    print(error)
    mariadb_connection.rollback()
If I change this script to a plain INSERT INTO it works fine; the problem seems to be when I add the ON DUPLICATE KEY UPDATE. What do I have wrong? LocationId is the PRIMARY KEY.
I get this error:
Traceback (most recent call last):
  File "C:/Users/waynes/PycharmProjects/DRS_Dev/CQC_Locations_Update_MariaDB.py", line 228, in <module>
    maria_cnxn.execute('INSERT INTO CQCLocationDetailsUpdates2 (LocationId, ProviderId) VALUES (%s,%s) ON DUPLICATE KEY UPDATE ProviderId = VALUES(%s)', row)
  File "C:\Users\waynes\PycharmProjects\DRS_Dev\venv\lib\site-packages\mysql\connector\cursor.py", line 548, in execute
    stmt = RE_PY_PARAM.sub(psub, stmt)
  File "C:\Users\waynes\PycharmProjects\DRS_Dev\venv\lib\site-packages\mysql\connector\cursor.py", line 79, in __call__
    "Not enough parameters for the SQL statement")
mysql.connector.errors.ProgrammingError: Not enough parameters for the SQL statement
Your error occurs because row is a 2-element tuple while your SQL statement contains three %s placeholders.
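If you want to keep the row-by-row loop, one possible fix (a sketch against the tables named above, not tested) is to reference the to-be-inserted value with VALUES(ProviderId), since VALUES() takes a column name rather than a parameter; the statement then needs only the two parameters that row supplies:

maria_cnxn.execute(
    'INSERT INTO CQCLocationDetailsUpdates2 (LocationId, ProviderId) '
    'VALUES (%s, %s) '
    'ON DUPLICATE KEY UPDATE ProviderId = VALUES(ProviderId)',
    row)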
It is, however, possible to skip the loop entirely and use a single INSERT ... SELECT ... ON DUPLICATE KEY UPDATE, like:

maria_cnxn.execute(
    'INSERT INTO CQCLocationDetailsUpdates2 (LocationId, ProviderId) '
    'SELECT LocationId, ProviderId '
    'FROM CQCLocationDetailsUpdates orig '
    'ON DUPLICATE KEY UPDATE CQCLocationDetailsUpdates2.ProviderID = orig.ProviderID')
Whenever you end up looping around a SQL statement, you should look to see whether there is a way to do the whole thing in SQL.

not enough arguments for format string python mysql

I am having trouble with my program. I want to load database rows from a txt file. This is my source code:
import MySQLdb
import csv

db = MySQLdb.connect(user='root', passwd='toor',
                     host='127.0.0.1', db='data')
cursor = db.cursor()
csv_data = csv.reader(file('test.txt'))
for row in csv_data:
    sql = "insert into `name` (`id`,`Name`,`PoB`,`DoB`) values(%s,%s,%s,%s);"
    cursor.execute(sql, row)
db.commit()
cursor.close()
After running that program, here is the error:
Traceback (most recent call last):
  File "zzz.py", line 9, in <module>
    cursor.execute(sql,row)
  File "/home/tux/.local/lib/python2.7/site-packages/MySQLdb/cursors.py", line 187, in execute
    query = query % tuple([db.literal(item) for item in args])
TypeError: not enough arguments for format string
And this is my test.txt:
4
zzzz
sby
2017-10-10
Please help, and thanks in advance.
Now that you have posted the CSV file, the error should be obvious to you: each line contains only one field, not the four that the SQL statement requires.
If that is the real format of your data file, it is not CSV data. Instead you need to read each group of four lines as one record; something like this might work:
LINES_PER_RECORD = 4
SQL = 'insert into `name` (`id`,`Name`,`PoB`,`DoB`) values (%s,%s,%s,%s)'

with open('test.txt') as f:
    while True:
        try:
            record = [next(f).strip() for i in range(LINES_PER_RECORD)]
            cursor.execute(SQL, record)
        except StopIteration:
            # insufficient lines available for a record, treat as end of file
            break
db.commit()

Error in loading database entries into lists

I'm getting the following error:
Traceback (most recent call last):
  File "/home/pi/Nike/test_two.py", line 43, in <module>
    do_query()
  File "/home/pi/Nike/test_two.py", line 33, in do_query
    for (Product, Bin, Size, Color) in records:
ValueError: too many values to unpack
Code:
import sqlite3

def do_query():
    connection = sqlite3.connect('test_db.db')
    cursor = connection.cursor()
    cursor.execute("SELECT * FROM TESTER ORDER BY CheckNum")
    records = cursor.fetchall()
    for (Product, Bin, Size, Color) in records:
        row_1.append(Product)
        row_2.append(Bin)
        row_3.append(Size)
        row_4.append(Color)
    connection.commit()
    cursor.close()
    connection.close()

do_query()
I'm trying to load each column of the table into a separate Python list, using Python and sqlite3. Why am I getting this error?
You are using "SELECT *", which returns every column from the table. My guess is that the table in question contains more columns than the four you specified.
A better way would be to specify in the SQL exactly which columns you want, so that your code will not break if columns are added to the database.
Something like "SELECT col1, col2 FROM table".
You can run the sqlite3 tool on the db file and then view the table schema with ".schema <table_name>"
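Applied to the question's code, a minimal sketch (assuming TESTER really does have more than these four columns, and that the names below match the actual schema) is a one-line change:

# Select exactly the four columns the loop unpacks, in a fixed order,
# instead of every column the table happens to have.
cursor.execute("SELECT Product, Bin, Size, Color FROM TESTER ORDER BY CheckNum")

With that, the for (Product, Bin, Size, Color) in records: loop unpacks cleanly regardless of any extra columns in the table.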
