I have a model written with SQLAlchemy's declarative base.
class Roles(Base):
    __tablename__ = "roles"
    __table_args__ = (
        Index("roles_name", "name", unique=True),
    )
    id = Column(Integer, primary_key=True, default=get_uuid())
    name = Column(String(10), nullable=False)
As you may have noticed I have set the default value of primary key column id to get_uuid().
def get_uuid():
    pk = uuid.uuid4().int >> 64
    return pk
The above function returns the UUID as an integer of 64 bits or fewer. This is because the id column of this table is an integer, and Spanner integers can hold up to 64 bits.
So now to insert a row in this table -
>>> role = Roles()
>>> role.name = "Admin"
>>> session.add(role)
>>> session.commit()
This resulted in the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/google/api_core/grpc_helpers.py", line 72, in error_remapped_callable
return callable_(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/grpc/_channel.py", line 946, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/usr/local/lib/python3.10/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.FAILED_PRECONDITION
details = "Could not parse 18011687921562567628 as an integer"
debug_error_string = "UNKNOWN:Error received from peer ipv4:172.19.0.3:9010 {grpc_message:"Could not parse 18011687921562567628 as an integer", grpc_status:9, created_time:"2022-11-12T06:48:36.468914625+00:00"}"
>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/google/cloud/spanner_dbapi/cursor.py", line 269, in execute
) = self.connection.run_statement(statement)
File "/usr/local/lib/python3.10/site-packages/google/cloud/spanner_dbapi/connection.py", line 454, in run_statement
_execute_insert_heterogenous(
File "/usr/local/lib/python3.10/site-packages/google/cloud/spanner_dbapi/_helpers.py", line 57, in _execute_insert_heterogenous
transaction.execute_update(
File "/usr/local/lib/python3.10/site-packages/google/cloud/spanner_v1/transaction.py", line 302, in execute_update
response = api.execute_sql(
File "/usr/local/lib/python3.10/site-packages/google/cloud/spanner_v1/services/spanner/client.py", line 1096, in execute_sql
response = rpc(
File "/usr/local/lib/python3.10/site-packages/google/api_core/gapic_v1/method.py", line 154, in __call__
return wrapped_func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/google/api_core/retry.py", line 283, in retry_wrapped_func
return retry_target(
File "/usr/local/lib/python3.10/site-packages/google/api_core/retry.py", line 190, in retry_target
return target()
File "/usr/local/lib/python3.10/site-packages/google/api_core/timeout.py", line 99, in func_with_timeout
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/google/api_core/grpc_helpers.py", line 74, in error_remapped_callable
raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.FailedPrecondition: 400 Could not parse 18011687921562567628 as an integer
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1900, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.10/site-packages/google/cloud/sqlalchemy_spanner/sqlalchemy_spanner.py", line 1013, in do_execute
cursor.execute(statement, parameters)
File "/usr/local/lib/python3.10/site-packages/google/cloud/spanner_dbapi/cursor.py", line 70, in wrapper
return function(cursor, *args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/google/cloud/spanner_dbapi/cursor.py", line 289, in execute
raise IntegrityError(getattr(e, "details", e)) from e
google.cloud.spanner_dbapi.exceptions.IntegrityError: []
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1451, in commit
self._transaction.commit(_to_root=self.future)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 829, in commit
self._prepare_impl()
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 808, in _prepare_impl
self.session.flush()
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 3386, in flush
self._flush(objects)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 3525, in _flush
with util.safe_reraise():
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
compat.raise_(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 208, in raise_
raise exception
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 3486, in _flush
flush_context.execute()
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/unitofwork.py", line 456, in execute
rec.execute(self)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/unitofwork.py", line 630, in execute
util.preloaded.orm_persistence.save_obj(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/persistence.py", line 245, in save_obj
_emit_insert_statements(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/persistence.py", line 1238, in _emit_insert_statements
result = connection._execute_20(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1705, in _execute_20
return meth(self, args_10style, kwargs_10style, execution_options)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 333, in _execute_on_connection
return connection._execute_clauseelement(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1572, in _execute_clauseelement
ret = self._execute_context(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1943, in _execute_context
self._handle_dbapi_exception(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2124, in _handle_dbapi_exception
util.raise_(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 208, in raise_
raise exception
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1900, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.10/site-packages/google/cloud/sqlalchemy_spanner/sqlalchemy_spanner.py", line 1013, in do_execute
cursor.execute(statement, parameters)
File "/usr/local/lib/python3.10/site-packages/google/cloud/spanner_dbapi/cursor.py", line 70, in wrapper
return function(cursor, *args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/google/cloud/spanner_dbapi/cursor.py", line 289, in execute
raise IntegrityError(getattr(e, "details", e)) from e
sqlalchemy.exc.IntegrityError: (google.cloud.spanner_dbapi.exceptions.IntegrityError) []
[SQL: INSERT INTO roles (id, name) VALUES (%s, %s)]
[parameters: [18011687921562567628, 'Admin']]
(Background on this error at: https://sqlalche.me/e/14/gkpj)
What I understand from this is that Spanner is not willing to accept the generated UUID.
status = StatusCode.FAILED_PRECONDITION
details = "Could not parse 18011687921562567628 as an integer"
I have checked the function get_uuid(). It does return an int value of bit size 64 or less.
The README of this repo suggests declaring the table's primary key as Integer and generating the primary key value in hex when inserting a row. I did exactly that, but it didn't work.
The generated int value is larger than the maximum INT64 value that is allowed in Cloud Spanner:
Max allowed: 9223372036854775807
Your value : 18011687921562567628
See https://cloud.google.com/spanner/docs/reference/standard-sql/data-types#integer_types for more information on the INT64 type.
I'm no Python expert, but my guess is that the int value that you are generating is interpreted as an unsigned int, while the INT64 data type in Cloud Spanner is signed.
EDIT: Add example to get signed value.
My understanding is that you can do the following to get a signed 64-bit integer value from a UUID in Python:
import ctypes
import uuid
# c_int64 is always 64 bits wide; c_long is only 32 bits on some platforms (e.g. Windows)
ctypes.c_int64(uuid.uuid4().int >> 64).value
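For anyone who prefers to avoid ctypes, the same reinterpretation can be sketched in plain Python. This is only an illustration under stated assumptions: the helper name to_signed_int64 is made up here, and get_uuid mirrors the function from the question.

```python
import uuid

INT64_MAX = 2**63 - 1  # largest value a signed 64-bit column can hold

def to_signed_int64(value: int) -> int:
    """Reinterpret the low 64 bits of value as a signed two's-complement integer."""
    value &= 0xFFFFFFFFFFFFFFFF  # keep only 64 bits
    return value - 2**64 if value > INT64_MAX else value

def get_uuid() -> int:
    # Upper 64 bits of a random UUID, mapped into the signed INT64 range.
    return to_signed_int64(uuid.uuid4().int >> 64)

# The value from the traceback now lands inside the signed range.
print(to_signed_int64(18011687921562567628))
print(get_uuid())
```

If the function is used as a column default, it is probably worth passing the callable itself (default=get_uuid) rather than its result (default=get_uuid()), so that a fresh value is generated for every insert instead of once when the class is defined.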
I'm scraping Reddit using praw and storing records in a pandas DataFrame. I'm using a combination of sqlalchemy and pymysql to connect to my AWS RDS database, and to_sql to append records to an existing table. Everything seems to work fine until I hit the to_sql method. It throws the following error, and I'm not really sure where to go from here. Any help or suggestions would be awesome!
engine = sqlalchemy.create_engine('mysql+pymysql://username:password#database...rds.amazonaws.com:3306/socialdata')
df_comment = pd.DataFrame(comment_table)
df_comment.to_sql(name='reddit_comments', con=engine, index=False, if_exists='append')
Traceback (most recent call last):
File "/Users/ty/Desktop/Python/reddit_scraper.py", line 121, in <module>
df_comment.to_sql(name='reddit_comments', con=engine, index=False, if_exists='append')
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/generic.py", line 2605, in to_sql
sql.to_sql(
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/io/sql.py", line 589, in to_sql
pandas_sql.to_sql(
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/io/sql.py", line 1398, in to_sql
table.insert(chunksize, method=method)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/io/sql.py", line 830, in insert
exec_insert(conn, keys, chunk_iter)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/io/sql.py", line 747, in _execute_insert
conn.execute(self.table.insert(), data)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
ret = self._execute_context(
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
self._handle_dbapi_exception(
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1514, in _handle_dbapi_exception
util.raise_(exc_info[1], with_traceback=exc_info[2])
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1256, in _execute_context
self.dialect.do_executemany(
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sqlalchemy/dialects/mysql/mysqldb.py", line 148, in do_executemany
rowcount = cursor.executemany(statement, parameters)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pymysql/cursors.py", line 188, in executemany
return self._do_execute_many(q_prefix, q_values, q_postfix, args,
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pymysql/cursors.py", line 206, in _do_execute_many
v = values % escape(next(args), conn)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pymysql/cursors.py", line 120, in _escape_args
return {key: conn.literal(val) for (key, val) in args.items()}
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pymysql/cursors.py", line 120, in <dictcomp>
return {key: conn.literal(val) for (key, val) in args.items()}
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pymysql/connections.py", line 469, in literal
return self.escape(obj, self.encoders)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pymysql/connections.py", line 462, in escape
return converters.escape_item(obj, self.charset, mapping=mapping)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pymysql/converters.py", line 27, in escape_item
val = encoder(val, mapping)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pymysql/converters.py", line 123, in escape_unicode
return u"'%s'" % _escape_unicode(value)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pymysql/converters.py", line 78, in _escape_unicode
return value.translate(_escape_table)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/praw/models/reddit/base.py", line 35, in __getattr__
return getattr(self, attribute)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/praw/models/reddit/base.py", line 36, in __getattr__
raise AttributeError(
AttributeError: 'Redditor' object has no attribute 'translate'
One of the columns in your DataFrame contains a custom "Redditor" object, which doesn't map to a corresponding SQL data type. When pymysql meets a value that isn't something obvious like an int, float, or string, it falls back to escaping it as a string and calls the object's translate method, which Redditor doesn't have.
If Redditor is just a wrapper object for a username and other metadata, then you can remap that column to the string or number representation of the Redditor object. If it is an object you've defined, you can add a translate() method to the Redditor class's definition to return the appropriate value. For example, if Redditor.id contains the value that you want to store in the column:
class Redditor():
    def translate(self, table):
        # pymysql calls value.translate(table), as it would on a str;
        # replace self.id with the value you care about
        return str(self.id).translate(table)
Or, in pandas, before you save:
df[REDDITOR_COLUMN] = df[REDDITOR_COLUMN].apply(lambda x: x.id)
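The same idea can also be applied one step earlier, before the DataFrame is built. The sketch below is only an illustration: it assumes comment_table is a list of dicts, that str() on the wrapper object gives the value you want to store, and it uses a made-up FakeRedditor class in place of praw's Redditor so that it runs without praw installed.

```python
from datetime import date, datetime
from decimal import Decimal

# Types most SQL drivers can bind directly (an assumption; check your driver).
PLAIN_TYPES = (str, int, float, bool, bytes, Decimal, date, datetime, type(None))

def to_plain(value):
    """Return value unchanged if it is a plain type, else its string form."""
    return value if isinstance(value, PLAIN_TYPES) else str(value)

def clean_records(records):
    """Apply to_plain to every field of every row (rows are dicts)."""
    return [{key: to_plain(val) for key, val in row.items()} for row in records]

class FakeRedditor:
    """Hypothetical stand-in for a wrapper object such as praw's Redditor."""
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name

comment_table = [{"author": FakeRedditor("some_user"), "score": 3, "body": "hi"}]
print(clean_records(comment_table))
```

The cleaned list can then be passed to pd.DataFrame(...) as before, so that to_sql only ever sees plain values.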
I am dealing with a huge table that I need to query. I decided to do so by chunking the data by user_id and, for each chunk, reading from and writing back to SQL.
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine('mysql+pymysql://')
q1 = "SELECT max(id) FROM users"
max_users = pd.read_sql(q1, engine)
max_users = max_users.iloc[0][0]
# since user_ids start from 1 to ... I make the split based on that
data = range(max_users)
chunks = [list(data[x:x+1000]) for x in range(0, len(data), 1000)]
def make_q(userid):
    q2 = "SELECT alotofusers from bigtable WHERE userid in (" + ','.join(str(e) for e in userid) + ")"
    return q2
from multiprocessing import Pool, TimeoutError
import time
import os
table_name = "user_type_tmp6"
def f(q):
    df = pd.read_sql(q, engine)
    df.to_sql(con=engine, name=table_name, if_exists='append')
pool = Pool(processes=10)  # start 10 worker processes
pool.map(f, [make_q(item) for item in chunks[0:3]])
In fact, my table only gets populated by the first chunk, and then I get the following error:
Exception during reset or similar
Traceback (most recent call last):
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 680, in _finalize_fairy
fairy._reset(pool)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 867, in _reset
pool._dialect.do_rollback(self)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/dialects/mysql/base.py", line 2302, in do_rollback
dbapi_connection.rollback()
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pymysql/connections.py", line 430, in rollback
self._read_ok_packet()
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pymysql/connections.py", line 394, in _read_ok_packet
pkt = self._read_packet()
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pymysql/connections.py", line 671, in _read_packet
% (packet_number, self._next_seq_id))
pymysql.err.InternalError: Packet sequence number wrong - got 48 expected 1
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1246, in _execute_context
cursor, statement, parameters, context
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 581, in do_execute
cursor.execute(statement, parameters)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pymysql/cursors.py", line 170, in execute
result = self._query(query)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pymysql/cursors.py", line 328, in _query
conn.query(q)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pymysql/connections.py", line 517, in query
self._affected_rows = self._read_query_result(unbuffered=unbuffered)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pymysql/connections.py", line 732, in _read_query_result
result.read()
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pymysql/connections.py", line 1075, in read
first_packet = self.connection._read_packet()
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pymysql/connections.py", line 671, in _read_packet
% (packet_number, self._next_seq_id))
pymysql.err.InternalError: Packet sequence number wrong - got 114 expected 1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 733, in _rollback_impl
self.engine.dialect.do_rollback(self.connection)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/dialects/mysql/base.py", line 2302, in do_rollback
dbapi_connection.rollback()
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pymysql/connections.py", line 429, in rollback
self._execute_command(COMMAND.COM_QUERY, "ROLLBACK")
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pymysql/connections.py", line 750, in _execute_command
raise err.InterfaceError("(0, '')")
pymysql.err.InterfaceError: (0, '')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "user_app_usage_type.py", line 90, in f
df = pd.read_sql(q, engine) # index_col = 'user_id'
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pandas/io/sql.py", line 436, in read_sql
chunksize=chunksize,
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pandas/io/sql.py", line 1218, in read_query
result = self.execute(*args)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pandas/io/sql.py", line 1087, in execute
return self.connectable.execute(*args, **kwargs)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2182, in execute
return connection.execute(statement, *multiparams, **params)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 976, in execute
return self._execute_text(object_, multiparams, params)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1149, in _execute_text
parameters,
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1250, in _execute_context
e, statement, parameters, cursor, context
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1471, in _handle_dbapi_exception
self._autorollback()
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 79, in __exit__
compat.reraise(type_, value, traceback)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 153, in reraise
raise value
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1471, in _handle_dbapi_exception
self._autorollback()
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 861, in _autorollback
self._root._rollback_impl()
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 735, in _rollback_impl
self._handle_dbapi_exception(e, None, None, None, None)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1384, in _handle_dbapi_exception
exc_info,
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 398, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=cause)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 152, in reraise
raise value.with_traceback(tb)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 733, in _rollback_impl
self.engine.dialect.do_rollback(self.connection)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/dialects/mysql/base.py", line 2302, in do_rollback
dbapi_connection.rollback()
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pymysql/connections.py", line 429, in rollback
self._execute_command(COMMAND.COM_QUERY, "ROLLBACK")
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pymysql/connections.py", line 750, in _execute_command
raise err.InterfaceError("(0, '')")
sqlalchemy.exc.InterfaceError: (pymysql.err.InterfaceError) (0, '')
(Background on this error at: http://sqlalche.me/e/rvf5)
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "user_app_usage_type.py", line 109, in <module>
pool.map(f, [make_q(item) for item in chunks[0:3]])
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/multiprocessing/pool.py", line 268, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
sqlalchemy.exc.InterfaceError: (pymysql.err.InterfaceError) (0, '')
(Background on this error at: http://sqlalche.me/e/rvf5)
I guess I am not doing the multiprocessing correctly! Or perhaps SQLAlchemy does not work well with the process pool.
Update
From my understanding after reading the two posts suggested by Ilja, I updated my worker function as follows:
def f(q):
    engine = create_engine('mysql+pymysql://')
    df = pd.read_sql(q, engine, index_col = 'user_id')
    df.fillna(0, inplace = True)
    df.to_csv('tmp.csv')
    df.to_sql(con=engine, name=table_name, if_exists='append' )
    engine.dispose()
but now I get errors like
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1246, in _execute_context
cursor, statement, parameters, context
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 581, in do_execute
cursor.execute(statement, parameters)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pymysql/cursors.py", line 170, in execute
result = self._query(query)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pymysql/cursors.py", line 328, in _query
conn.query(q)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pymysql/connections.py", line 517, in query
self._affected_rows = self._read_query_result(unbuffered=unbuffered)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pymysql/connections.py", line 732, in _read_query_result
result.read()
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pymysql/connections.py", line 1075, in read
first_packet = self.connection._read_packet()
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pymysql/connections.py", line 684, in _read_packet
packet.check_error()
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pymysql/protocol.py", line 220, in check_error
err.raise_mysql_exception(self._data)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pymysql/err.py", line 109, in raise_mysql_exception
raise errorclass(errno, errval)
pymysql.err.InternalError: (1050, "Table 'users_usage_frequency_oly_12' already exists")
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "user_login_type.py", line 103, in f
df.to_sql(con=engine, name=table_name, schema = 'datateam', if_exists='append' ) # dtype={'user_type': Enum('Browser', 'Hoarder', 'Mementor', 'Explorer', 'Lister', 'Scanner') }
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pandas/core/generic.py", line 2712, in to_sql
method=method,
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pandas/io/sql.py", line 518, in to_sql
method=method,
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pandas/io/sql.py", line 1319, in to_sql
table.create()
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pandas/io/sql.py", line 656, in create
self._execute_create()
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pandas/io/sql.py", line 638, in _execute_create
self.table.create()
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/sql/schema.py", line 870, in create
bind._run_visitor(ddl.SchemaGenerator, self, checkfirst=checkfirst)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2049, in _run_visitor
conn._run_visitor(visitorcallable, element, **kwargs)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1618, in _run_visitor
visitorcallable(self.dialect, self, **kwargs).traverse_single(element)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/sql/visitors.py", line 138, in traverse_single
return meth(obj, **kw)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/sql/ddl.py", line 826, in visit_table
include_foreign_key_constraints,
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 982, in execute
return meth(self, multiparams, params)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/sql/ddl.py", line 72, in _execute_on_connection
return connection._execute_ddl(self, multiparams, params)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1044, in _execute_ddl
compiled,
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1250, in _execute_context
e, statement, parameters, cursor, context
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1476, in _handle_dbapi_exception
util.raise_from_cause(sqlalchemy_exception, exc_info)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 398, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=cause)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 152, in reraise
raise value.with_traceback(tb)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1246, in _execute_context
cursor, statement, parameters, context
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 581, in do_execute
cursor.execute(statement, parameters)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pymysql/cursors.py", line 170, in execute
result = self._query(query)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pymysql/cursors.py", line 328, in _query
conn.query(q)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pymysql/connections.py", line 517, in query
self._affected_rows = self._read_query_result(unbuffered=unbuffered)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pymysql/connections.py", line 732, in _read_query_result
result.read()
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pymysql/connections.py", line 1075, in read
first_packet = self.connection._read_packet()
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pymysql/connections.py", line 684, in _read_packet
packet.check_error()
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pymysql/protocol.py", line 220, in check_error
err.raise_mysql_exception(self._data)
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/site-packages/pymysql/err.py", line 109, in raise_mysql_exception
raise errorclass(errno, errval)
sqlalchemy.exc.InternalError: (pymysql.err.InternalError) (1050, "Table 'tmp_oly_12' already exists")
[SQL:
CREATE TABLE tmp_oly_12 (
user_id BIGINT,
total_logins BIGINT,
distinct_month BIGINT,
freq TEXT,
lastlogin DATETIME,
typelastlog TEXT
)
]
(Background on this error at: http://sqlalche.me/e/2j85)
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "user_login_type.py", line 125, in <module>
pool.map(f, [make_q(item) for item in chunks[0:3]])
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/multiprocessing/pool.py", line 268, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/Users/opt/anaconda3/envs/UserExperience/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
sqlalchemy.exc.InternalError: (pymysql.err.InternalError) (1050, "Table 'users_usage_frequency_oly_12' already exists")
[SQL:
CREATE TABLE tmp_oly_12 (
user_id BIGINT,
total_logins BIGINT,
distinct_month BIGINT,
freq TEXT,
lastlogin DATETIME,
I can see that the table tmp_oly_12 is populated, though not fully, but I still get this error.
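One possible reading of the 1050 error is that several workers find the table missing at the same moment and each tries to create it, since to_sql with if_exists='append' creates the table when it does not exist yet. A pattern that may avoid this is to create the table once in the parent process and let every worker open its own connection. The sketch below illustrates only that pattern, using the standard-library sqlite3 module as a stand-in for MySQL; the table name, columns, and function names are all made up for the example.

```python
import sqlite3
from multiprocessing import Pool

TABLE = "user_type_tmp"  # hypothetical table name

def create_table_once(db_path):
    """Run in the parent before any worker starts, so workers never issue CREATE TABLE."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {TABLE} (user_id INTEGER, total INTEGER)"
        )
        conn.commit()
    finally:
        conn.close()

def load_chunk(args):
    """Worker: open a private connection, append the rows for one chunk, close it."""
    db_path, user_ids = args
    conn = sqlite3.connect(db_path, timeout=30)  # wait for other writers instead of failing
    try:
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                f"INSERT INTO {TABLE} (user_id, total) VALUES (?, ?)",
                [(uid, uid * 2) for uid in user_ids],  # placeholder data
            )
    finally:
        conn.close()
    return len(user_ids)

def run(db_path, chunks, processes=4):
    """Create the table up front, then fan the chunks out to a process pool."""
    create_table_once(db_path)
    with Pool(processes=processes) as pool:
        return sum(pool.map(load_chunk, [(db_path, chunk) for chunk in chunks]))
```

With pandas and SQLAlchemy the equivalent would presumably be to create the target table before calling pool.map, and to keep calling create_engine inside the worker as in the updated f above. Calling run(...) should happen under an if __name__ == "__main__": guard so that worker processes can import the module safely.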
I'm saving a table to SQL Server and need it to replace the existing table of the same name.
df1.to_sql('customer', schema = r'Marketing\xyz', con = engine, index = False, if_exists = 'replace')
works, but adding brackets around the schema breaks it:
df1.to_sql('customer', schema = r'[Marketing\xyz]', con = engine, index = False, if_exists = 'replace')
with this error:
Traceback (most recent call last):
File "<ipython-input-11-69ec99f37fd6>", line 1, in <module>
df1.to_sql('customer', schema = r'[abc\xyz]', con = engine, index = False, if_exists = 'replace')
File "C:\Users\xyz\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py", line 2712, in to_sql
method=method,
File "C:\Users\xyz\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\sql.py", line 518, in to_sql
method=method,
File "C:\Users\xyz\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\sql.py", line 1319, in to_sql
table.create()
File "C:\Users\xyz\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\sql.py", line 656, in create
self._execute_create()
File "C:\Users\xyz\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\sql.py", line 638, in _execute_create
self.table.create()
File "C:\Users\xyz\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\sql\schema.py", line 860, in create
bind._run_visitor(ddl.SchemaGenerator, self, checkfirst=checkfirst)
File "C:\Users\xyz\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 2036, in _run_visitor
conn._run_visitor(visitorcallable, element, **kwargs)
File "C:\Users\xyz\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 1607, in _run_visitor
visitorcallable(self.dialect, self, **kwargs).traverse_single(element)
File "C:\Users\xyz\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\sql\visitors.py", line 132, in traverse_single
return meth(obj, **kw)
File "C:\Users\xyz\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\sql\ddl.py", line 826, in visit_table
include_foreign_key_constraints,
File "C:\Users\xyz\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 988, in execute
return meth(self, multiparams, params)
File "C:\Users\xyz\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\sql\ddl.py", line 72, in _execute_on_connection
return connection._execute_ddl(self, multiparams, params)
File "C:\Users\xyz\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 1050, in _execute_ddl
compiled,
File "C:\Users\xyz\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 1248, in _execute_context
e, statement, parameters, cursor, context
File "C:\Users\xyz\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 1466, in _handle_dbapi_exception
util.raise_from_cause(sqlalchemy_exception, exc_info)
File "C:\Users\xyz\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\util\compat.py", line 398, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=cause)
File "C:\Users\xyz\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\util\compat.py", line 152, in reraise
raise value.with_traceback(tb)
File "C:\Users\xyz\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 1244, in _execute_context
cursor, statement, parameters, context
File "C:\Users\xyz\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\engine\default.py", line 552, in do_execute
cursor.execute(statement, parameters)
ProgrammingError: (pyodbc.ProgrammingError) ('42S01', "[42S01] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]There is already an object named 'customer' in the database. (2714) (SQLExecDirectW)")
[SQL:
CREATE TABLE [Marketing\xyz].customer (
[Customer] not NULL
)
]
(Background on this error at: http://sqlalche.me/e/f405)
Does anyone know why adding brackets to the schema would cause pandas to fail to drop the table automatically? Thanks.
I'm using
sys.version
Out[19]: '3.7.4 (default, Aug 9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]'
pd.__version__
Out[16]: '0.25.3'
I have a pandas DataFrame 'df' that I'm attempting to upload to a Netezza database. I've been attempting this using DataFrame.to_sql and creating the appropriate SQLAlchemy engine to do so:
import pandas
import urllib
from sqlalchemy import create_engine
def upload_test(data, table):
    # Python 2: quote_plus lives in urllib (it moved to urllib.parse in Python 3)
    quoted = urllib.quote_plus('DRIVER={NetezzaSQL};Server=SERVER;Database=DATA_BASE;UID=uid;PWD=pwd;Port=5480;')
    # The engine is built with the mssql dialect on top of the Netezza ODBC driver
    engine = create_engine('mssql+pyodbc:///?odbc_connect={}'.format(quoted))
    data.to_sql(name=table, con=engine, if_exists='append', index=False)
df = pandas.DataFrame(
    {
        'VAR1': pandas.Series(['2016-05-01', '2016-05-02'])
        , 'VAR2': pandas.Series([2500, 2500])
        , 'VAR3': pandas.Series([211232, 211232])
    }
)
upload_test(data=df, table='TABLE')
This just raises a SQL error; the traceback in my console is:
Traceback (most recent call last):
File "C:\Anaconda3\envs\python_2_7_Anaconda\lib\site-packages\IPython\core\interactiveshell.py", line 2885, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-68-b2d5c19f9472>", line 19, in <module>
upload_test(data=df, table='TABLE')
File "<ipython-input-68-b2d5c19f9472>", line 4, in upload_test
data.to_sql(name=table, con=engine, if_exists='append', index=False)
File "C:\Anaconda3\envs\python_2_7_Anaconda\lib\site-packages\pandas\core\generic.py", line 1003, in to_sql
dtype=dtype)
File "C:\Anaconda3\envs\python_2_7_Anaconda\lib\site-packages\pandas\io\sql.py", line 569, in to_sql
chunksize=chunksize, dtype=dtype)
File "C:\Anaconda3\envs\python_2_7_Anaconda\lib\site-packages\pandas\io\sql.py", line 1240, in to_sql
table.create()
File "C:\Anaconda3\envs\python_2_7_Anaconda\lib\site-packages\pandas\io\sql.py", line 685, in create
if self.exists():
File "C:\Anaconda3\envs\python_2_7_Anaconda\lib\site-packages\pandas\io\sql.py", line 673, in exists
return self.pd_sql.has_table(self.name, self.schema)
File "C:\Anaconda3\envs\python_2_7_Anaconda\lib\site-packages\pandas\io\sql.py", line 1263, in has_table
schema or self.meta.schema,
File "C:\Anaconda3\envs\python_2_7_Anaconda\lib\site-packages\sqlalchemy\engine\base.py", line 1972, in run_callable
return conn.run_callable(callable_, *args, **kwargs)
File "C:\Anaconda3\envs\python_2_7_Anaconda\lib\site-packages\sqlalchemy\engine\base.py", line 1477, in run_callable
return callable_(self, *args, **kwargs)
File "C:\Anaconda3\envs\python_2_7_Anaconda\lib\site-packages\sqlalchemy\dialects\mssql\base.py", line 1466, in wrap
tablename, dbname, owner, schema, **kw)
File "C:\Anaconda3\envs\python_2_7_Anaconda\lib\site-packages\sqlalchemy\dialects\mssql\base.py", line 1475, in _switch_db
return fn(*arg, **kw)
File "C:\Anaconda3\envs\python_2_7_Anaconda\lib\site-packages\sqlalchemy\dialects\mssql\base.py", line 1621, in has_table
c = connection.execute(s)
File "C:\Anaconda3\envs\python_2_7_Anaconda\lib\site-packages\sqlalchemy\engine\base.py", line 914, in execute
return meth(self, multiparams, params)
File "C:\Anaconda3\envs\python_2_7_Anaconda\lib\site-packages\sqlalchemy\sql\elements.py", line 323, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "C:\Anaconda3\envs\python_2_7_Anaconda\lib\site-packages\sqlalchemy\engine\base.py", line 1010, in _execute_clauseelement
compiled_sql, distilled_params
File "C:\Anaconda3\envs\python_2_7_Anaconda\lib\site-packages\sqlalchemy\engine\base.py", line 1146, in _execute_context
context)
File "C:\Anaconda3\envs\python_2_7_Anaconda\lib\site-packages\sqlalchemy\engine\base.py", line 1341, in _handle_dbapi_exception
exc_info
File "C:\Anaconda3\envs\python_2_7_Anaconda\lib\site-packages\sqlalchemy\util\compat.py", line 200, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb)
File "C:\Anaconda3\envs\python_2_7_Anaconda\lib\site-packages\sqlalchemy\engine\base.py", line 1139, in _execute_context
context)
File "C:\Anaconda3\envs\python_2_7_Anaconda\lib\site-packages\sqlalchemy\engine\default.py", line 450, in do_execute
cursor.execute(statement, parameters)
ProgrammingError: (pyodbc.ProgrammingError) ('42000', '[42000] ERROR: \'SELECT [COLUMNS_1].[TABLE_SCHEMA], [COLUMNS_1].[TABLE_NAME], [COLUMNS_1].[COLUMN_NAME], [COLUMNS_1].[IS_NULLABLE], [COLUMNS_1].[DATA_TYPE], [COLUMNS_1].[ORDINAL_POSITION], [COLUMNS_1].[CHARACTER_MAXIMUM_LENGTH], [COLUMNS_1].[NUMERIC_PRECISION], [COLUMNS_1].[NUMERIC_SCALE], [COLUMNS_1].[COLUMN_DEFAULT], [COLUMNS_1].[COLLATION_NAME] FROM [INFORMATION_SCHEMA].[COLUMNS] AS [COLUMNS_1] WHERE [COLUMNS_1].[TABLE_NAME] = NULL AND [COLUMNS_1].[TABLE_SCHEMA] = NULL limit 0\'\nerror ^ found "[" (at char 8) expecting an identifier found a keyword (27) (SQLPrepare)') [SQL: u'SELECT [COLUMNS_1].[TABLE_SCHEMA], [COLUMNS_1].[TABLE_NAME], [COLUMNS_1].[COLUMN_NAME], [COLUMNS_1].[IS_NULLABLE], [COLUMNS_1].[DATA_TYPE], [COLUMNS_1].[ORDINAL_POSITION], [COLUMNS_1].[CHARACTER_MAXIMUM_LENGTH], [COLUMNS_1].[NUMERIC_PRECISION], [COLUMNS_1].[NUMERIC_SCALE], [COLUMNS_1].[COLUMN_DEFAULT], [COLUMNS_1].[COLLATION_NAME] \nFROM [INFORMATION_SCHEMA].[COLUMNS] AS [COLUMNS_1] \nWHERE [COLUMNS_1].[TABLE_NAME] = ? AND [COLUMNS_1].[TABLE_SCHEMA] = ?'] [parameters: (u'TABLE', u'dbo')]
I know the connection is solid because I've been able to use it to read back data just fine:
connection = engine.connect()
result = connection.execute("SELECT * FROM TABLE LIMIT 100")
for row in result:
    print row
From what I've seen on other sites, the problem may lie in my choice of dialect for the SQLAlchemy engine, but I'm not certain that is the cause. Is there some other object I could convert the DataFrame to? Should I try inserting one row at a time into the table?
Thanks!
I believe the issue here is, as you guessed, in your choice of dialect for SQLAlchemy.
From the last line of your error output:
...found "[" (at char 8) expecting an identifier found a keyword...
The mssql dialect is delimiting your column and table names with square brackets, which is an MSSQL-ism that Netezza will not accept.
I don't have any experience with SQLAlchemy, but if it doesn't offer a Netezza-specific dialect, try one of the Postgres dialects, since Postgres is in Netezza's family tree.
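On the row-at-a-time question: if no suitable dialect turns up, one fallback is to skip SQLAlchemy's generated SQL and send a parameterized INSERT through the DBAPI connection yourself. Here is a minimal sketch, assuming a DBAPI connection that uses "?" placeholders (pyodbc does). The standard-library sqlite3 module stands in for the Netezza connection, and insert_rows and test_table are hypothetical names.

```python
import sqlite3


def insert_rows(conn, table, columns, rows):
    """Hypothetical helper: insert rows with one parameterized INSERT statement.

    The table and column names are interpolated into the SQL text, so they
    must come from trusted code, never from user input. Values are bound
    through "?" placeholders.
    """
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    sql = "INSERT INTO {} ({}) VALUES ({})".format(table, col_list, placeholders)
    cur = conn.cursor()
    cur.executemany(sql, rows)  # one statement, many parameter sets
    conn.commit()
    return sql


# SQLite in-memory database used purely as a stand-in for the real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (VAR1 TEXT, VAR2 INTEGER, VAR3 INTEGER)")

# Same values as the DataFrame in the question, written as plain tuples.
columns = ["VAR1", "VAR2", "VAR3"]
rows = [("2016-05-01", 2500, 211232), ("2016-05-02", 2500, 211232)]

sql = insert_rows(conn, "test_table", columns, rows)
stored = conn.execute(
    "SELECT VAR1, VAR2, VAR3 FROM test_table ORDER BY VAR1"
).fetchall()
```

For a DataFrame, list(df.columns) and df.itertuples(index=False, name=None) would supply the columns and rows. The values may need converting to plain Python types before the driver accepts them, and this sketch assumes the target table already exists.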