I am trying to use pandas, SQLAlchemy and pypyodbc to write my DataFrame (df2) to a table in my SQL database. I have successfully connected to the database with pypyodbc. However, when I try to use the engine with to_sql, it throws the error in the title. I think I'm missing something in the urllib section where the engine creation string is parsed.
Here is my code snippet:
import urllib.parse
import pypyodbc
import sqlalchemy

connStr = "Driver=SQL Server Native Client 10.0; Server=Sname-L\SQLEXPRESS;database=DBname;trusted_connection=yes;"
connEncodedStr = urllib.parse.quote_plus(connStr)
engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect={}".format(connEncodedStr), module=pypyodbc, echo=True)
df2.to_sql('db_table2', engine, if_exists='replace')
print(engine)
result:
Engine(mssql+pyodbc:///?odbc_connect=Driver=SQL Server Native Client 10.0; Server=Sname-L\SQLEXPRESS;database=DBname;trusted_connection=yes;)
Here is the traceback:
Traceback (most recent call last):
  File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\sqlalchemy\pool.py", line 1122, in _do_get
    return self._pool.get(wait, self._timeout)
  File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\sqlalchemy\util\queue.py", line 145, in get
    raise Empty
sqlalchemy.util.queue.Empty
However, this also happened:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "noaadatamanip_datafram.py", line 50, in <module>
    df2.to_sql('db_table2', engine, if_exists='replace')
  File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1201, in to_sql
    chunksize=chunksize, dtype=dtype)
The traceback went on for many more lines and ended with the error in the title. Let me know if posting the whole thing would be helpful.
Sorry in advance, my error-handling skills are very limited.
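One direction worth trying (a minimal sketch, not a confirmed fix): the mssql+pyodbc dialect is developed against pyodbc, so passing module=pypyodbc may interact badly with SQLAlchemy's connection pool. Assuming pyodbc is installed, the same encoded connection string can be used without the module override; df2 here is a placeholder DataFrame:

import urllib.parse

import pandas as pd
import sqlalchemy

# Same ODBC connection string as above, URL-encoded for the engine URL
connStr = "Driver=SQL Server Native Client 10.0; Server=Sname-L\\SQLEXPRESS;database=DBname;trusted_connection=yes;"
params = urllib.parse.quote_plus(connStr)

# Let the dialect use its default DBAPI (pyodbc) instead of pypyodbc
engine = sqlalchemy.create_engine(
    "mssql+pyodbc:///?odbc_connect={}".format(params),
    echo=True,  # log the SQL that to_sql emits
)

df2 = pd.DataFrame({"a": [1, 2, 3]})  # placeholder data
df2.to_sql('db_table2', engine, if_exists='replace')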
Related
I would like to run Redshift and S3 locally and then use them for tasks that run from Airflow and other tools, to reduce CI/CD code when deploying to dev, and to avoid conflicts over resources and files.
Currently I can use LocalStack's S3, but for Redshift the only solution I have found is the combination of the redshift-fake-driver with the JayDeBeApi Python package, and it doesn't seem to work properly:
import jpype  # JPype1==1.4.1
import jaydebeapi  # JayDeBeApi==1.2.3

jars = "/Users/trancongminh/Downloads/jars/*"
jpype.startJVM(classpath=jars)
driverName = "jp.ne.opt.redshiftfake.postgres.FakePostgresqlDriver"
print(jpype.JClass(driverName))

# I spin up a Docker container for PostgreSQL
connectionString = "jdbc:postgresqlredshift://localhost:5432/docker"
uid = "docker"
pwd = "docker"
driverFileName = "/Users/trancongminh/Downloads/jars/redshift-fake-driver_2.12-1.0.15.jar"
conn = jaydebeapi.connect(
    jclassname=driverName,
    url=connectionString,
    driver_args={'user': uid, 'password': pwd},
    jars=driverFileName,
)
curs = conn.cursor()
curs.execute("SELECT * FROM pg_catalog.pg_tables limit 10;")
curs.fetchall()
curs.execute("copy db_table_name_v2 from 'http://localhost:4566/events-streaming/traveller/v2/ym_202210/d_04/hm_131901.parquet' CREDENTIALS 'aws_access_key_id=test;aws_secret_access_key=test' ")
But I get errors like "No such file or directory", or something like this:
Traceback (most recent call last):
  File "FakeConnection.scala", line 31, in jp.ne.opt.redshiftfake.FakeConnection.prepareStatement
Exception: Java Exception

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/trancongminh/Pelago/pelago-ds-env/lib/python3.9/site-packages/jaydebeapi/__init__.py", line 531, in execute
    self._prep = self._connection.jconn.prepareStatement(operation)
java.lang.NoSuchMethodError: java.lang.NoSuchMethodError: 'void scala.util.parsing.combinator.Parsers.$init$(scala.util.parsing.combinator.Parsers)'
or maybe like this:
Traceback (most recent call last):
  File "FakePreparedStatement.scala", line 138, in jp.ne.opt.redshiftfake.FakePreparedStatement$FakeAsIsPreparedStatement.execute
Exception: Java Exception

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/trancongminh/Pelago/pelago-ds-env/lib/python3.9/site-packages/jaydebeapi/__init__.py", line 534, in execute
    is_rs = self._prep.execute()
org.postgresql.util.PSQLException: org.postgresql.util.PSQLException: ERROR: could not open file "s3://events-streaming/traveller/v2/ym_202210/d_04/hm_131901.parquet" for reading: No such file or directory
  Hint: COPY FROM instructs the PostgreSQL server process to read a file. You may want a client-side facility such as psql's \copy.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/trancongminh/Pelago/pelago-ds-env/lib/python3.9/site-packages/jaydebeapi/__init__.py", line 536, in execute
    _handle_sql_exception()
  File "/Users/trancongminh/Pelago/pelago-ds-env/lib/python3.9/site-packages/jaydebeapi/__init__.py", line 165, in _handle_sql_exception_jpype
    reraise(exc_type, exc_info[1], exc_info[2])
  File "/Users/trancongminh/Pelago/pelago-ds-env/lib/python3.9/site-packages/jaydebeapi/__init__.py", line 57, in reraise
    raise value.with_traceback(tb)
  File "/Users/trancongminh/Pelago/pelago-ds-env/lib/python3.9/site-packages/jaydebeapi/__init__.py", line 534, in execute
    is_rs = self._prep.execute()
jaydebeapi.DatabaseError: org.postgresql.util.PSQLException: ERROR: could not open file "s3://events-streaming/traveller/v2/ym_202210/d_04/hm_131901.parquet" for reading: No such file or directory
  Hint: COPY FROM instructs the PostgreSQL server process to read a file. You may want a client-side facility such as psql's \copy
If anybody has experience with this pattern, please help; solutions or keywords helpful for further investigation are welcome. Thanks.
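For what it's worth, a guess rather than a verified fix: a NoSuchMethodError on scala.util.parsing.combinator.Parsers.$init$ usually means the Scala runtime jars on the classpath don't match the Scala version the driver was built for (2.12 here, per the jar name). A sketch of pinning the classpath explicitly; every jar path and version number below is hypothetical:

import jpype

# Hypothetical jar list: the fake driver plus the Scala 2.12 runtime and
# parser-combinators jars it was compiled against, and a Postgres JDBC driver.
jars = [
    "/Users/trancongminh/Downloads/jars/redshift-fake-driver_2.12-1.0.15.jar",
    "/Users/trancongminh/Downloads/jars/scala-library-2.12.17.jar",
    "/Users/trancongminh/Downloads/jars/scala-parser-combinators_2.12-1.1.2.jar",
    "/Users/trancongminh/Downloads/jars/postgresql-42.5.4.jar",
]
jpype.startJVM(classpath=jars)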
In my Python application, I make a query to the Cassandra database. I'm trying to implement pagination through the cassandra-driver package. As you can see from the code below, paging_state returns the bytes data type. I can convert this value to the string data type. Then I send the value of the str_paging_state variable to the client. If the client sends str_paging_state back to me, I want to use it in my next query.
This part of code works:
from cassandra.query import SimpleStatement

query = "select * from users where user_type = 'clients';"
statement = SimpleStatement(query, fetch_size=10)
results = session.execute(statement)
paging_state = results.paging_state
print(type(paging_state))  # <class 'bytes'>
str_paging_state = str(paging_state)
print(str_paging_state)  # "b'\\x00C\\x00\\x00\\x00\\x02\\x00\\x00\\x00\\x03_hk\\x00\\x00\\x00\\x11P]5C#\\x8bGD~\\x8b\\xc7g\\xda\\xe5rH\\xb0\\x00\\x00\\x00\\x03_rk\\x00\\x00\\x00\\x18\\xee\\x14\\xf7\\x83\\x84\\x00tTmw[\\x00\\xec\\xdb\\x9b\\xa9\\xfd\\x00\\xb9\\xff\\xff\\xff\\xff\\xfe\\x01\\x00'"
This part of the code raises an error:
results = session.execute(
    statement,
    paging_state=bytes(str_paging_state.encode())
)
Error:
[ERROR] NoHostAvailable: ('Unable to complete the operation against any hosts')
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 51, in lambda_handler
    results = cassandra_connection.execute(statement, paging_state=bytes(paging_state.encode()))
  File "/opt/python/lib/python3.8/site-packages/cassandra/cluster.py", line 2618, in execute
    return self.execute_async(query, parameters, trace, custom_payload, timeout, execution_profile, paging_state, host, execute_as).result()
  File "/opt/python/lib/python3.8/site-packages/cassandra/cluster.py", line 4877, in result
    raise self._final_exception
In the Java documentation I found the .fromString method, which creates a PagingState object from a string previously generated with toString(). Unfortunately, I didn't find an equivalent of this method in Python.
I also tried to use the codecs package to decode and encode the paging_state.
import codecs

str_paging_state = codecs.decode(paging_state, encoding='utf-8', errors='ignore')
# "\u0000C\u0000\u0000\u0000\u0002\u0000\u0000\u0000\u0003_hk\u0000\u0000\u0000\u0011P]5C#GD~grH\u0000\u0000\u0000\u0003_rk\u0000\u0000\u0000\u0018\u0014\u0000tTmw[\u0000ۛ\u0000\u0001\u0000"
# Raises an error:
results = session.execute(statement, paging_state=codecs.encode(str_paging_state, encoding='utf-8', errors='ignore'))
In this case I see the following error:
[ERROR] ProtocolException: <Error from server: code=000a [Protocol error] message="Invalid value for the paging state">
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 50, in lambda_handler
    results = cassandra_connection.execute(
  File "/opt/python/lib/python3.8/site-packages/cassandra/cluster.py", line 2618, in execute
    return self.execute_async(query, parameters, trace, custom_payload, timeout, execution_profile, paging_state, host, execute_as).result()
  File "/opt/python/lib/python3.8/site-packages/cassandra/cluster.py", line 4877, in result
    raise self._final_exception
In my case, protocol version is 4.
cluster = Cluster(..., protocol_version=4)
I would appreciate any help!
Just convert the binary data into a hex string or base64; use the binascii module for that. For hex, use the hexlify/unhexlify functions (or, in Python 3, the .hex() method on bytes), and for base64 use b2a_base64/a2b_base64.
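A minimal sketch of the hex round trip, assuming session is an already-connected Session from the cassandra-driver package:

from cassandra.query import SimpleStatement

statement = SimpleStatement("select * from users where user_type = 'clients';", fetch_size=10)
results = session.execute(statement)

# bytes -> hex str that is safe to send to the client
str_paging_state = results.paging_state.hex()

# hex str from the client -> bytes for the next page
next_page = session.execute(statement, paging_state=bytes.fromhex(str_paging_state))

Note that paging_state is None once the last page has been fetched, so check for that before calling .hex().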
Python 3.5, psycopg2 2.7.4, connecting to a PostgreSQL database; I get the error psycopg2.DatabaseError: SSL error: bad length.
What does this error mean? Is some SSL certificate not valid?
Here's the traceback:
Traceback (most recent call last):
  File "src/db_importer.py", line 278, in <module>
    main()
  File "src/db_importer.py", line 255, in main
    run_process(args, db_params)
  File "src/db_importer.py", line 198, in run_process
    import_csv(cur, schema, table_name, args.raport_data)
  File "src/db_importer.py", line 130, in import_csv
    cur.copy_from(f, '{}.{}'.format(schema, table_name), sep=';', columns=cols)
psycopg2.DatabaseError: SSL error: bad length
After freeing some disk space the script worked, so it was lack of space that caused this error.
Try VACUUM FULL ANALYZE on your table at the Postgres level; it worked for me. Dead tuples were the issue, and "after freeing some space" might have been autovacuum running.
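A minimal sketch of running that from psycopg2; the DSN and table name are placeholders. Note that VACUUM cannot run inside a transaction block, so autocommit must be enabled first:

import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")  # placeholder DSN
conn.autocommit = True  # VACUUM cannot run inside a transaction block
with conn.cursor() as cur:
    cur.execute("VACUUM FULL ANALYZE my_schema.my_table;")  # placeholder table
conn.close()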
I am trying to use the Python cqlengine package for accessing the Cassandra DB. I was able to filter when the columns are not of list type.
I get the following error message:
d = cClass().filter(lastname='text', age=2, input__contains='a')
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/cqlengine-0.21.0-py3.4.egg/cqlengine/operators.py", line 43, in get_operator
KeyError: 'CONTAINS'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.4/dist-packages/cqlengine-0.21.0-py3.4.egg/cqlengine/models.py", line 562, in filter
  File "/usr/local/lib/python3.4/dist-packages/cqlengine-0.21.0-py3.4.egg/cqlengine/query.py", line 507, in filter
  File "/usr/local/lib/python3.4/dist-packages/cqlengine-0.21.0-py3.4.egg/cqlengine/operators.py", line 45, in get_operator
cqlengine.operators.QueryOperatorException: contains doesn't map to a QueryOperator
It looks like you're using a very old (and deprecated) version of cqlengine. If you can, upgrade to the version integrated with the DataStax driver (cassandra-driver), whose API has supported contains since version 3.1.0.
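A rough sketch of what that looks like with the driver-integrated cqlengine; the model and column definitions are hypothetical stand-ins for the asker's schema, and contains requires a secondary index on the collection column:

from cassandra.cqlengine import columns
from cassandra.cqlengine.models import Model

class CClass(Model):
    lastname = columns.Text(primary_key=True)
    age = columns.Integer(index=True)
    input = columns.List(columns.Text, index=True)  # indexed list column

# __contains__ maps to CQL's CONTAINS operator on indexed collections
rows = CClass.objects.filter(lastname='text', age=2, input__contains='a')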
I am using a Python module called phoenixdb to access Phoenix, which is a SQL wrapper to query over HBase.
Here is my code snippet:
import phoenixdb

database_url = 'http://localhost:8765/'
conn = phoenixdb.connect(database_url, autocommit=True)
cursor = conn.cursor()
cursor.execute("!table")
print(cursor.fetchall())
cursor.close()
The Phoenix query to list all the schemas and tables is !table or !tables.
But when I try passing the same in the execute function as shown above, I get the following error:
Traceback (most recent call last):
  File "phoenix_hbase.py", line 7, in <module>
    cursor.execute("!table")
  File "build/bdist.linux-x86_64/egg/phoenixdb/cursor.py", line 242, in execute
  File "build/bdist.linux-x86_64/egg/phoenixdb/avatica.py", line 345, in prepareAndExecute
  File "build/bdist.linux-x86_64/egg/phoenixdb/avatica.py", line 184, in _apply
  File "build/bdist.linux-x86_64/egg/phoenixdb/avatica.py", line 90, in parse_error_page
phoenixdb.errors.ProgrammingError: ("Syntax error. Unexpected char: '!'", 601, '42P00', None)
The funny part is that when I pass a different query, for example a select query, the script executes and produces the result just fine.
Query: cursor.execute("select * from CARETESTING.EDR_BIN_SOURCE_3600_EDR_FLOW_SUBS_TOTAL limit 1")
Result:
[[2045,1023,4567]]
Is there another way to pass !table (the equivalent of show tables) to the phoenixdb library's execute function that I am missing?
I tried looking it up on the internet but unfortunately haven't come across anything helpful so far.
Thanks
!tables is sqlline grammar that cannot be parsed by the JDBC interface.
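As a workaround sketch (unverified; the column names are taken from Phoenix's SYSTEM.CATALOG metadata table, so treat them as assumptions): table metadata can usually be read through plain SQL instead of the sqlline-only !tables command:

import phoenixdb

conn = phoenixdb.connect('http://localhost:8765/', autocommit=True)
cursor = conn.cursor()
# SYSTEM.CATALOG stores one row per table/column; DISTINCT collapses them
cursor.execute("SELECT DISTINCT TABLE_SCHEM, TABLE_NAME FROM SYSTEM.CATALOG")
print(cursor.fetchall())
cursor.close()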