I have a problem when attempting to pipe generated SQL statements into a PostgreSQL database. The loader is in this file, movie_loader.py, which was provided to me:
import csv

"""
This program generates direct SQL statements from the source Netflix Prize files in order
to populate a relational database with those files’ data.

By taking the approach of emitting SQL statements directly, we bypass the need to import
some kind of database library for the loading process, instead passing the statements
directly into a database command line utility such as `psql`.
"""

# The INSERT approach is best used with a transaction. An introductory definition:
# instead of “saving” (committing) after every statement, a transaction waits on a
# commit until we issue the `COMMIT` command.
print('BEGIN;')

# For simplicity, we assume that the program runs where the files are located.
MOVIE_SOURCE = 'movie_titles.csv'
with open(MOVIE_SOURCE, 'r+', encoding='iso-8859-1') as f:
    reader = csv.reader(f)
    for row in reader:
        id = row[0]
        year = 'null' if row[1] == 'NULL' else int(row[1])
        title = ', '.join(row[2:])

        # Watch out---titles might have apostrophes!
        title = title.replace("'", "''")
        print(f'INSERT INTO movie VALUES({id}, {year}, \'{title}\');')

sys.stdout.reconfigure(encoding='UTF08')

# We wrap up by emitting an SQL statement that will update the database’s movie ID
# counter based on the largest one that has been loaded so far.
print('SELECT setval(\'movie_id_seq\', (SELECT MAX(id) from movie));')

# _Now_ we can commit our transaction.
print('COMMIT;')
However, when I attempt to pipe this script's output into my database, I get the following error, which seems to be some kind of encoding error. I am using Git Bash as my terminal.
$ python3 movie_loader.py | psql postgresql://localhost/postgres
stdin is not a tty
Traceback (most recent call last):
File "C:\Users\dhuan\relational\movie_loader.py", line 28, in <module>
print(f'INSERT INTO movie VALUES({id}, {year}, \'{title}\');')
OSError: [Errno 22] Invalid argument
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='cp1252'>
OSError: [Errno 22] Invalid argument
It seems as if maybe my dataset has an error? I'm not sure what specifically the error is pointing at. Any insight is appreciated.
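For reference, a reconfiguration of the kind the script attempts on its later sys.stdout.reconfigure line would normally be spelled with an explicit import and the standard encoding name, and placed before any print calls (this is only a sketch, assuming Python 3.7+, where reconfigure exists):
import sys

# Switch standard output to UTF-8 before any SQL statements are printed.
sys.stdout.reconfigure(encoding='utf-8')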
I am trying to add a table in Superset. The other tables get added properly, meaning their columns are fetched properly by Superset. But for my table booking_xml, Superset does not load any columns.
The table's description (its column list) was shown in a screenshot; one of the columns has a very long datatype.
After adding this table, when I click on the table name to explore it, it gives the following error
Empty query?
Traceback (most recent call last):
File "/home/superset/superset_venv/lib/python3.8/site-packages/superset/viz.py", line 473, in get_df_payload
df = self.get_df(query_obj)
File "/home/superset/superset_venv/lib/python3.8/site-packages/superset/viz.py", line 251, in get_df
self.results = self.datasource.query(query_obj)
File "/home/superset/superset_venv/lib/python3.8/site-packages/superset/connectors/sqla/models.py", line 1139, in query
query_str_ext = self.get_query_str_extended(query_obj)
File "/home/superset/superset_venv/lib/python3.8/site-packages/superset/connectors/sqla/models.py", line 656, in get_query_str_extended
sqlaq = self.get_sqla_query(**query_obj)
File "/home/superset/superset_venv/lib/python3.8/site-packages/superset/connectors/sqla/models.py", line 801, in get_sqla_query
raise Exception(_("Empty query?"))
Exception: Empty query?
ERROR:superset.viz:Empty query?
However, when I try to explore it using the SQL editor, it loads up properly. I found a difference in the form_data parameter in the URL when loading from the datasets list versus from the SQL editor.
URL from SQL Lab view:
form_data={"queryFields":{"groupby":"groupby","metrics":"metrics"},"datasource":"192__table","viz_type":"table","url_params":{},"time_range_endpoints":["inclusive","exclusive"],"granularity_sqla":"created_on","time_grain_sqla":"P1D","time_range":"Last+week","groupby":[],"metrics":["count"],"all_columns":[],"percent_metrics":[],"order_by_cols":[],"row_limit":10000,"order_desc":true,"adhoc_filters":[],"table_timestamp_format":"smart_date","color_pn":true,"show_cell_bars":true}
URL from datasets list:
form_data={"queryFields":{"groupby":"groupby","metrics":"metrics"},"datasource":"191__table","viz_type":"table","url_params":{},"time_range_endpoints":["inclusive","exclusive"],"time_grain_sqla":"P1D","time_range":"Last+week","groupby":[],"all_columns":[],"percent_metrics":[],"order_by_cols":[],"row_limit":10000,"order_desc":true,"adhoc_filters":[],"table_timestamp_format":"smart_date","color_pn":true,"show_cell_bars":true}
When loading from datasets list, /explore_json/ gives 400 Bad Request.
Superset version == 0.37.1, Python version == 3.8
Superset saves the details/metadata of every table that gets connected in its own metadata database. There, the column type is stored as a varchar of length 32, and my table had a column with a very long datatype, as you can see in the image in the question. The metadata database therefore rejected that value, which is what caused the error: no columns were fetched even after the table was added to the datasources.
What I did was increase the length of the type column:
ALTER TABLE table_columns MODIFY type varchar(200);
I am trying to query an Azure Storage table to get all rows and turn them into a table on a web site, but I cannot get the entries from the table. I get the same error every time: "azure.core.exceptions.HttpResponseError: The requested operation is not implemented on the specified resource."
For the code I am following the examples here, and it is not working as expected.
import os

from azure.data.tables import TableServiceClient
from azure.core.credentials import AzureNamedKeyCredential

def read_storage_table():
    credential = AzureNamedKeyCredential(os.environ["AZ_STORAGE_ACCOUNT"], os.environ["AZ_STORAGE_KEY"])
    service = TableServiceClient(endpoint=os.environ["AZ_STORAGE_ENDPOINT"], credential=credential)
    client = service.get_table_client(table_name=os.environ["AZ_STORAGE_TABLE"])
    entities = client.query_entities(query_filter="PartitionKey eq 'tasksSeattle'")
    client.close()
    service.close()
    return entities
Then I call the function:
table = read_storage_table()
for record in table:
    for key in record.keys():
        print("Key: {}, Value: {}".format(key, record[key]))
And that returns:
Traceback (most recent call last):
File "C:\Program Files\Python310\Lib\site-packages\azure\data\tables\_models.py", line 363, in _get_next_cb
return self._command(
File "C:\Program Files\Python310\Lib\site-packages\azure\data\tables\_generated\operations\_table_operations.py", line 386, in query_entities
raise HttpResponseError(response=response, model=error)
azure.core.exceptions.HttpResponseError: Operation returned an invalid status 'Not Implemented'
Content: {"odata.error":{"code":"NotImplemented","message":{"lang":"en-US","value":"The requested operation is not implemented on the specified resource.\nRequestId:cd29feda-1002-006b-679c-3d39e8000000\nTime:2022-03-22T03:27:00.5993216Z"}}}
Using a similar function, I am able to write to the table. But even when trying entities = client.list_entities(), I get the same error. I'm at a loss.
KrunkFu, thank you for identifying and sharing the solution here. Posting the same in the answer section to help other community members.
Replacing https://<accountname>.table.core.windows.net/<table> with
https://<accountname>.table.core.windows.net as the endpoint solved the
issue.
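For completeness, here is a minimal sketch of the working setup under that fix, reusing the environment variables from the question; materializing the results with list() before closing the clients is just one way to do it, not something the API requires:
import os

from azure.core.credentials import AzureNamedKeyCredential
from azure.data.tables import TableServiceClient

# AZ_STORAGE_ENDPOINT is assumed to be the account-level URL, e.g.
# https://<accountname>.table.core.windows.net (no table name appended).
credential = AzureNamedKeyCredential(os.environ["AZ_STORAGE_ACCOUNT"], os.environ["AZ_STORAGE_KEY"])
service = TableServiceClient(endpoint=os.environ["AZ_STORAGE_ENDPOINT"], credential=credential)
client = service.get_table_client(table_name=os.environ["AZ_STORAGE_TABLE"])

# Materialize the paged results before closing the clients.
entities = list(client.query_entities(query_filter="PartitionKey eq 'tasksSeattle'"))

client.close()
service.close()

for record in entities:
    for key, value in record.items():
        print("Key: {}, Value: {}".format(key, value))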
I'm trying to export a Postgres database table as a CSV file on my filesystem using Python, but I'm having trouble running the copy query.
import psycopg2
import Config

class postgres_to_s3():

    def __init__(self):
        app_config = Config.Config
        self.pg_conn = app_config.pg_conn
        self.pg_cur = app_config.pg_cur

    def unload_database_to_CSV(self):
        query1 = '\COPY source_checksum TO /Users/Will/Downloads/output.csv WITH (FORMAT CSV, HEADER);'
        with open('/Users/Will/Downloads/output.csv', 'w') as f:
            self.pg_cur.copy_expert(query1, f)

s1 = postgres_to_s3()
s1.unload_database_to_CSV()
I get the error:
psycopg2.ProgrammingError: syntax error at or near "\"
LINE 1: \COPY source_checksum TO /Users/Will/Downloads/output.csv
^
I was able to execute the query fine in the psql console. I tried using a double backslash, but I still get the same error.
EDIT: following this thread I removed the backslash, but now I get the error:
psycopg2.ProgrammingError: syntax error at or near "/"
LINE 1: COPY source_checksum TO /Users/Will/Downloads/output.csv
^
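For reference, a minimal sketch of the pattern copy_expert is usually paired with: a server-side COPY ... TO STDOUT statement, whose output psycopg2 streams into the local file object, so neither the psql \COPY meta-command nor a client-side path appears in the SQL. This reuses the cursor and output path from the question:
    def unload_database_to_CSV(self):
        # COPY ... TO STDOUT runs on the server but streams the rows back to the
        # client; copy_expert then writes that stream into the local file object.
        query1 = 'COPY source_checksum TO STDOUT WITH (FORMAT CSV, HEADER)'
        with open('/Users/Will/Downloads/output.csv', 'w') as f:
            self.pg_cur.copy_expert(query1, f)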
I am having trouble with my program. I want to load data into my database from a txt file. This is my source code:
import MySQLdb
import csv

db = MySQLdb.connect(user='root', passwd='toor',
                     host='127.0.0.1', db='data')
cursor = db.cursor()
csv_data = csv.reader(file('test.txt'))
for row in csv_data:
    sql = "insert into `name` (`id`,`Name`,`PoB`,`DoB`) values(%s,%s,%s,%s);"
    cursor.execute(sql, row)
db.commit()
cursor.close()
After running that program, here is the error:
Traceback (most recent call last):
File "zzz.py", line 9, in <module>
cursor.execute(sql,row)
File "/home/tux/.local/lib/python2.7/site-packages/MySQLdb/cursors.py", line 187, in execute
query = query % tuple([db.literal(item) for item in args])
TypeError: not enough arguments for format string
and this is my test.txt
4
zzzz
sby
2017-10-10
Please help, and thanks in advance.
Now that you have posted the CSV file, the error should be obvious to you: each line contains only one field, not the four that the SQL statement requires.
If that is the real format of your data file, it is not CSV data. Instead you need to read each group of four lines as one record; something like this might work:
LINES_PER_RECORD = 4
SQL = 'insert into `name` (`id`,`Name`,`PoB`,`DoB`) values (%s,%s,%s,%s)'

with open('test.txt') as f:
    while True:
        try:
            record = [next(f).strip() for i in range(LINES_PER_RECORD)]
            cursor.execute(SQL, record)
        except StopIteration:
            # insufficient lines available for record, treat as end of file
            break

db.commit()
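For completeness, a self-contained sketch of the same record-per-four-lines approach with the connection setup from the question folded in (credentials, table, and file path are taken from the question); it checks for end of file with readline rather than relying on StopIteration:
import MySQLdb

LINES_PER_RECORD = 4
SQL = 'insert into `name` (`id`,`Name`,`PoB`,`DoB`) values (%s,%s,%s,%s)'

db = MySQLdb.connect(user='root', passwd='toor', host='127.0.0.1', db='data')
cursor = db.cursor()

with open('test.txt') as f:
    while True:
        # Read the next four lines as one (id, Name, PoB, DoB) record.
        record = [f.readline().strip() for _ in range(LINES_PER_RECORD)]
        if not all(record):
            break  # ran out of lines (or hit a blank line), treat as end of file
        cursor.execute(SQL, record)

db.commit()
cursor.close()
db.close()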
I'm trying to set up a simple table existence test for a luigi task using luigi.hive.HiveTableTarget.
I create a simple table in hive just to make sure it is there:
create table test_table (a int);
Next I set up the target with luigi:
from luigi.hive import HiveTableTarget
target = HiveTableTarget(table='test_table')
>>> target.exists()
True
Great, next I try it with a table I know doesn't exist to make sure it returns false.
target = HiveTableTarget(table='test_table_not_here')
>>> target.exists()
And it raises an exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.6/site-packages/luigi/hive.py", line 344, in exists
return self.client.table_exists(self.table, self.database)
File "/usr/lib/python2.6/site-packages/luigi/hive.py", line 117, in table_exists
stdout = run_hive_cmd('use {0}; describe {1}'.format(database, table))
File "/usr/lib/python2.6/site-packages/luigi/hive.py", line 62, in run_hive_cmd
return run_hive(['-e', hivecmd], check_return_code)
File "/usr/lib/python2.6/site-packages/luigi/hive.py", line 56, in run_hive
stdout, stderr)
luigi.hive.HiveCommandError: ('Hive command: hive -e use default; describe test_table_not_here
failed with error code: 17', '', '\nLogging initialized using configuration in
jar:file:/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/jars/hive-common-0.13.1-
cdh5.2.0.jar!/hive-log4j.properties\nOK\nTime taken: 0.822 seconds\nFAILED:
SemanticException [Error 10001]: Table not found test_table_not_here\n')
edited formatting for clarity
I don't understand that last line of the exception. Of course the table is not found; that is the whole point of an existence check. Is this the expected behavior, or do I have some configuration issue I need to work out?
Okay, so it looks like this may have been a bug in the latest tagged release (1.0.19), but it is fixed on the master branch. The code responsible is:
stdout = run_hive_cmd('use {0}; describe {1}'.format(database, table))
return not "does not exist" in stdout
which is changed in the master to be:
stdout = run_hive_cmd('use {0}; show tables like "{1}";'.format(database, table))
return stdout and table in stdout
The latter works fine whereas the former throws a HiveCommandError.
If you want a solution without having to update to the master branch, you could create your own target class with minimal effort:
from luigi.hive import HiveTableTarget, run_hive_cmd

class MyHiveTarget(HiveTableTarget):

    def exists(self):
        stdout = run_hive_cmd('use {0}; show tables like "{1}";'.format(self.database, self.table))
        return self.table in stdout
This will produce the desired output.
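A quick usage sketch mirroring the checks from the question (same table names as above):
target = MyHiveTarget(table='test_table')
print(target.exists())  # True
target = MyHiveTarget(table='test_table_not_here')
print(target.exists())  # False, instead of raising HiveCommandError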