How to properly serialize and deserialize paging_state in Python? - python

In my Python application, I query a Cassandra database and am trying to implement pagination with the cassandra-driver package. As you can see from the code below, paging_state is of type bytes. I convert this value to a string and send the value of the str_paging_state variable to the client. When the client sends str_paging_state back, I want to use it in my next query.
This part of the code works:
from cassandra.query import SimpleStatement
query = "select * from users where user_type = 'clients';"
statement = SimpleStatement(query, fetch_size=10)
results = session.execute(statement)
paging_state = results.paging_state
print(type(paging_state)) # <class 'bytes'>
str_paging_state = str(paging_state)
print(str_paging_state) # "b'\\x00C\\x00\\x00\\x00\\x02\\x00\\x00\\x00\\x03_hk\\x00\\x00\\x00\\x11P]5C#\\x8bGD~\\x8b\\xc7g\\xda\\xe5rH\\xb0\\x00\\x00\\x00\\x03_rk\\x00\\x00\\x00\\x18\\xee\\x14\\xf7\\x83\\x84\\x00tTmw[\\x00\\xec\\xdb\\x9b\\xa9\\xfd\\x00\\xb9\\xff\\xff\\xff\\xff\\xfe\\x01\\x00'"
This part of the code raises an error:
results = session.execute(
statement,
paging_state=bytes(str_paging_state.encode())
)
Error:
[ERROR] NoHostAvailable: ('Unable to complete the operation against any hosts')
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 51, in lambda_handler
    results = cassandra_connection.execute(statement, paging_state=bytes(paging_state.encode()))
  File "/opt/python/lib/python3.8/site-packages/cassandra/cluster.py", line 2618, in execute
    return self.execute_async(query, parameters, trace, custom_payload, timeout, execution_profile, paging_state, host, execute_as).result()
  File "/opt/python/lib/python3.8/site-packages/cassandra/cluster.py", line 4877, in result
    raise self._final_exception
In the Java driver documentation I found the PagingState.fromString method, which creates a PagingState object from a string previously generated with toString(). Unfortunately, I didn't find an equivalent of this method in Python.
I also tried to use the codecs package to decode and encode the paging_state:
str_paging_state = codecs.decode(paging_state, encoding='utf-8', errors='ignore')
# "\u0000C\u0000\u0000\u0000\u0002\u0000\u0000\u0000\u0003_hk\u0000\u0000\u0000\u0011P]5C#GD~grH\u0000\u0000\u0000\u0003_rk\u0000\u0000\u0000\u0018\u0014\u0000tTmw[\u0000ۛ\u0000\u0001\u0000"
# Raise error
results = session.execute(statement, paging_state=codecs.encode(str_paging_state, encoding='utf-8', errors='ignore'))
In this case I see the following error:
[ERROR] ProtocolException: <Error from server: code=000a [Protocol error] message="Invalid value for the paging state">
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 50, in lambda_handler
    results = cassandra_connection.execute(
  File "/opt/python/lib/python3.8/site-packages/cassandra/cluster.py", line 2618, in execute
    return self.execute_async(query, parameters, trace, custom_payload, timeout, execution_profile, paging_state, host, execute_as).result()
  File "/opt/python/lib/python3.8/site-packages/cassandra/cluster.py", line 4877, in result
    raise self._final_exception
In my case, the protocol version is 4:
cluster = Cluster(..., protocol_version=4)
I would appreciate any help!

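Don't use str() for this: on a bytes object it returns the repr text (b'\x00C...'), so .encode() gives you back that text rather than the original bytes, and decoding as UTF-8 with errors='ignore' silently drops bytes. Instead, convert the binary data into a hex string or Base64 with the binascii module: hexlify/unhexlify for hex (in Python 3 you can also use the .hex() method of bytes and bytes.fromhex()), and b2a_base64/a2b_base64 for Base64.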
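A minimal sketch of that round trip, assuming the token only needs to survive a trip through JSON or a URL. encode_paging_state, decode_paging_state, to_hex and from_hex are hypothetical helper names. The Base64 pair uses the base64 module's URL-safe variants instead of binascii.b2a_base64 (which appends a trailing newline by default); either works as long as decoding mirrors encoding exactly.

```python
import base64
import binascii


def encode_paging_state(paging_state: bytes) -> str:
    """Turn the driver's raw paging_state bytes into ASCII text for the client."""
    # URL-safe alphabet avoids '+' and '/' in case the token goes into a query string.
    return base64.urlsafe_b64encode(paging_state).decode("ascii")


def decode_paging_state(token: str) -> bytes:
    """Recover the exact original bytes from a token made by encode_paging_state."""
    return base64.urlsafe_b64decode(token.encode("ascii"))


def to_hex(paging_state: bytes) -> str:
    """Hex alternative via binascii (same result as paging_state.hex())."""
    return binascii.hexlify(paging_state).decode("ascii")


def from_hex(token: str) -> bytes:
    """Inverse of to_hex (same result as bytes.fromhex(token))."""
    return binascii.unhexlify(token)
```

Usage: send encode_paging_state(results.paging_state) to the client, and when the token comes back run session.execute(statement, paging_state=decode_paging_state(token)) with the same statement.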

Related

Compress in Java, decompress in Python - snappy/redis-py-cluster

I am writing a cron script in Python for a Redis cluster, using redis-py-cluster only to read data from a prod server. A separate Java application writes to the Redis cluster with Snappy compression and a UTF-8 Java string codec.
I am able to read data but not able to decode it.
from rediscluster import RedisCluster
import snappy
host, port = "127.0.0.1", "30001"
startup_nodes = [{"host": host, "port": port}]
print("Trying connecting to redis cluster host=" + host + ", port=" + str(port))
rc = RedisCluster(startup_nodes=startup_nodes, max_connections=32, decode_responses=True)
print("Connected", rc)
print("Reading all keys, value ...\n\n")
for key in rc.scan_iter("uidx:*"):
    value = rc.get(key)
    #uncompress = snappy.uncompress(value, decoding="utf-8")
    print(key, value)
    print('\n')
print("Done. exit()")
exit()
With decode_responses=False and the decompression line commented out, it works fine. However, changing to decode_responses=True throws an error. My guess is that it cannot find the correct decoder.
Traceback (most recent call last):
File "splooks_cron.py", line 22, in <module>
print(key, rc.get(key))
File "/Library/Python/2.7/site-packages/redis/client.py", line 1207, in get
return self.execute_command('GET', name)
File "/Library/Python/2.7/site-packages/rediscluster/utils.py", line 101, in inner
return func(*args, **kwargs)
File "/Library/Python/2.7/site-packages/rediscluster/client.py", line 410, in execute_command
return self.parse_response(r, command, **kwargs)
File "/Library/Python/2.7/site-packages/redis/client.py", line 768, in parse_response
response = connection.read_response()
File "/Library/Python/2.7/site-packages/redis/connection.py", line 636, in read_response
raise e
: 'utf8' codec can't decode byte 0x82 in position 0: invalid start byte
PS: Uncommenting the line uncompress = snappy.uncompress(value, decoding="utf-8") fails with this error:
Traceback (most recent call last):
File "splooks_cron.py", line 27, in <module>
uncompress = snappy.uncompress(value, decoding="utf-8")
File "/Library/Python/2.7/site-packages/snappy/snappy.py", line 91, in uncompress
return _uncompress(data).decode(decoding)
snappy.UncompressError: Error while decompressing: invalid input
After hours of debugging, I was finally able to solve this.
I am using the xerial/snappy-java compressor in the Java code that writes to the Redis cluster. Interestingly, during compression xerial's SnappyOutputStream adds some bytes (a header) at the beginning of the compressed data. In my case it looks something like this:
"\x82SNAPPY\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x01\xb6\x8b\x06\\******actual data here*****
Because of this, the decompressor could not parse the data. I modified the code as below to strip those leading bytes from the value, and it works fine now.
for key in rc.scan_iter("uidx:*"):
    value = rc.get(key)  # value must be bytes here, so connect with decode_responses=False
    # in my case the offset was 20, and utf-8 is the default encoder/decoder for snappy
    # https://github.com/andrix/python-snappy/blob/master/snappy/snappy.py
    uncompress_value = snappy.decompress(value[20:])
    print(key, uncompress_value)
    print('\n')
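The offset of 20 matches the sample header above: an 8-byte magic string (\x82SNAPPY\x00), two 4-byte version fields, then a 4-byte big-endian length for the first compressed block. Assuming the snappy-java SnappyOutputStream framing works as described here, larger values are split into several length-prefixed blocks, so value[20:] only works when the whole value fits in one block. The sketch below walks every block instead; iter_xerial_blocks and decompress_xerial are hypothetical names, and the decompressor is passed in (for example snappy.decompress) so the parser needs only the standard library.

```python
import struct

# Assumed snappy-java (xerial) stream layout:
#   8-byte magic b"\x82SNAPPY\x00", 4-byte version, 4-byte compatible version (16 bytes),
#   then blocks, each a 4-byte big-endian length followed by that many compressed bytes.
XERIAL_MAGIC = b"\x82SNAPPY\x00"
HEADER_SIZE = 16


def iter_xerial_blocks(data: bytes):
    """Yield each raw Snappy-compressed block from a xerial-framed value."""
    if not data.startswith(XERIAL_MAGIC):
        raise ValueError("missing snappy-java magic header")
    pos = HEADER_SIZE
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError("truncated block length")
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        pos += 4
        block = data[pos:pos + length]
        if len(block) != length:
            raise ValueError("truncated block")
        yield block
        pos += length


def decompress_xerial(data: bytes, decompress) -> bytes:
    """Decompress every block with `decompress` and concatenate the results."""
    return b"".join(decompress(block) for block in iter_xerial_blocks(data))
```

Usage, with decode_responses=False so the value stays bytes: decompress_xerial(rc.get(key), snappy.decompress).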

Tornado write_message not sending dict/json

I am trying to send a file over a Tornado websocket like this:
in_file = open("/home/rootkit/Pictures/test.png", "rb")
data = in_file.read()
in_file.close()
d = {'file': base64.b64encode(data), 'filename': 'test.png'}
self.ws.write_message(message=d)
as per the Tornado documentation:
The message may be either a string or a dict (which will be encoded as json). If the binary argument is false, the message will be sent as utf8; in binary mode any byte string is allowed.
But I am getting this exception.
ERROR:asyncio:Future exception was never retrieved
future: <Future finished exception=TypeError("Expected bytes, unicode, or None; got <class 'dict'>",)>
Traceback (most recent call last):
File "/home/rootkit/.local/lib/python3.5/site-packages/tornado/gen.py", line 1147, in run
yielded = self.gen.send(value)
File "/home/rootkit/PycharmProjects/socketserver/WebSocketClient.py", line 42, in run
self.ws.write_message(message=d, binary=True)
File "/home/rootkit/.local/lib/python3.5/site-packages/tornado/websocket.py", line 1213, in write_message
return self.protocol.write_message(message, binary=binary)
File "/home/rootkit/.local/lib/python3.5/site-packages/tornado/websocket.py", line 854, in write_message
message = tornado.escape.utf8(message)
File "/home/rootkit/.local/lib/python3.5/site-packages/tornado/escape.py", line 197, in utf8
"Expected bytes, unicode, or None; got %r" % type(value)
TypeError: Expected bytes, unicode, or None; got <class 'dict'>
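The documentation you're citing is for WebSocketHandler, which is meant for serving a websocket connection, whereas you're using a websocket client, whose write_message does not accept a dict. You'll have to convert your dictionary to JSON manually. Note also that in Python 3 base64.b64encode returns bytes, which is not JSON serializable, so decode it to str first:
from tornado.escape import json_encode
# b64encode returns bytes in Python 3; decode it so json_encode can serialize the dict
d['file'] = d['file'].decode('ascii')
self.ws.write_message(message=json_encode(d))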
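For completeness, a sketch of both ends of that exchange using only the standard library (tornado.escape.json_encode is a thin wrapper around json.dumps). build_file_message and parse_file_message are hypothetical helper names; the receiving side would call the latter from its on_message handler.

```python
import base64
import json


def build_file_message(data: bytes, filename: str) -> str:
    """Build the JSON text to pass to write_message."""
    # b64encode returns bytes; decode to str so json.dumps accepts it.
    encoded = base64.b64encode(data).decode("ascii")
    return json.dumps({"file": encoded, "filename": filename})


def parse_file_message(message: str):
    """Reverse of build_file_message: return (filename, original bytes)."""
    payload = json.loads(message)
    return payload["filename"], base64.b64decode(payload["file"])
```

If the receiver doesn't need JSON, write_message(data, binary=True) can send the raw file bytes directly and skip Base64, at the cost of losing the filename field.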

error with multiprocessing parsed data to sqlite

I am trying to parse a bunch of links and append the parsed data to an sqlite3 database. I am getting "database is locked" errors; maybe it's because my pool size is too high? I lowered it to 5, but I still get the errors shown below.
My code basically looks like this:
from multiprocessing import Pool
with Pool(5) as p:
    p.map(parse_link, links)
My real code looks like this:
with Pool(5) as p:
    p.map(Get_FT_OU, file_to_set('links.txt'))
# Where Get_FT_OU(link) appends links to a sqlite3 database.
When the code runs I often get these errors. Can someone help me fix it?
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/Users/christian/Documents/GitHub/odds/CP_Parser.py", line 166, in Get_FT_OU
cursor.execute(sql_str)
sqlite3.OperationalError: database is locked
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/christian/Documents/GitHub/odds/CP_Parser.py", line 206, in <module>
p.map(Get_FT_OU, file_to_set('links.txt'))
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
sqlite3.OperationalError: database is locked
>>>
The code runs fine without multiprocessing, and also with Pool(2), but with a larger pool I get these errors. I'm using the newest MacBook Air.
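It worked after adding timeout=10 to the connection:
conn = sqlite3.connect(DB_FILENAME, timeout=10)
The timeout argument is how many seconds a connection waits for another connection to release its lock before raising "database is locked" (the default is 5.0), so with more worker processes writing at once, a longer wait lets each write eventually go through.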
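A common alternative to raising the timeout, swapped in here rather than taken from the answer above, is to keep SQLite to a single writer: the pool workers only parse and return rows, and the parent process does every insert through one connection. parse_link, the parsed table and its columns are hypothetical placeholders for Get_FT_OU's real parsing and schema.

```python
import sqlite3
from multiprocessing import Pool


def parse_link(link):
    """Hypothetical stand-in for the parsing half of Get_FT_OU.

    It only returns data and never opens the database, so worker
    processes never compete for SQLite's write lock.
    """
    return (link, len(link))


def save_rows(conn, rows):
    """Insert all parsed rows through the parent's single connection."""
    with conn:  # commits on success, rolls back on error
        conn.execute("CREATE TABLE IF NOT EXISTS parsed (link TEXT, size INTEGER)")
        conn.executemany("INSERT INTO parsed VALUES (?, ?)", rows)


def run(links, db_path, processes=5):
    # Parse in parallel, but keep every write in this (parent) process.
    with Pool(processes) as pool:
        rows = pool.map(parse_link, links)
    conn = sqlite3.connect(db_path, timeout=10)
    try:
        save_rows(conn, rows)
    finally:
        conn.close()
```

Call run(...) from inside an if __name__ == "__main__": block, which multiprocessing requires when it uses the spawn start method (the default on macOS since Python 3.8).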

Getting "TypeError: 'NoneType' object is not iterable" while doing parallel ssh

I am trying to run commands over SSH on several servers in parallel. While doing this, I get the error "TypeError: 'NoneType' object is not iterable". Kindly help.
My script is below:
from pssh import ParallelSSHClient
from pssh.exceptions import AuthenticationException, UnknownHostException, ConnectionErrorException
def parallelsshjob():
    client = ParallelSSHClient(['10.84.226.72','10.84.226.74'], user = 'root', password = 'XXX')
    try:
        output = client.run_command('racadm getsvctag', sudo=True)
        print output
    except (AuthenticationException, UnknownHostException, ConnectionErrorException):
        pass
    #print output
if __name__ == '__main__':
    parallelsshjob()
And the traceback is below:
Traceback (most recent call last):
File "parallelssh.py", line 17, in <module>
parallelsshjob()
File "parallelssh.py", line 10, in parallelsshjob
output = client.run_command('racadm getsvctag', sudo=True)
File "/Library/Python/2.7/site-packages/pssh/pssh_client.py", line 520, in run_command
raise ex
TypeError: 'NoneType' object is not iterable
Please help me with a solution, and also suggest how to use ssh-agent in this same script. Thanks in advance.
From reading the code and debugging a bit on my laptop, I believe the issue is that you don't have a file called ~/.ssh/config. It seems that parallel-ssh has a dependency on OpenSSH configuration, and this is the error you get when that file is missing.
read_openssh_config returns None here: https://github.com/pkittenis/parallel-ssh/blob/master/pssh/utils.py#L79
In turn, SSHClient.__init__ blows up when trying to unpack the values it expects to receive: https://github.com/pkittenis/parallel-ssh/blob/master/pssh/ssh_client.py#L97.
The fix is presumably to get some sort of OpenSSH config file in place, but I'm sorry to say I know nothing about that.
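The failure mode described above can be reproduced without parallel-ssh. In this sketch, read_openssh_config_stub is a hypothetical stand-in for the library helper, not parallel-ssh's actual code: it returns None when no config file exists, and tuple-unpacking that None raises the TypeError. unpack_guarded shows one defensive pattern a caller could use instead.

```python
def read_openssh_config_stub(config_exists):
    """Hypothetical stand-in: a settings tuple if ~/.ssh/config exists, else None."""
    if not config_exists:
        return None
    return ("10.84.226.72", "root", 22, None)


def unpack_unguarded(config_exists):
    # Mirrors the failing pattern: unpacking the helper's result directly.
    # With None this raises TypeError (Python 2: "'NoneType' object is not iterable").
    host, user, port, pkey = read_openssh_config_stub(config_exists)
    return host, user, port, pkey


def unpack_guarded(config_exists, default=(None, None, None, None)):
    # Treat "no config file" as "no overrides" instead of unpacking None.
    result = read_openssh_config_stub(config_exists)
    host, user, port, pkey = result if result is not None else default
    return host, user, port, pkey
```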
EDIT
After cleaning up some of parallel-ssh's exception handling, here's a better stack trace for the error:
Traceback (most recent call last):
File "test.py", line 11, in <module>
parallelsshjob()
File "test.py", line 7, in parallelsshjob
output = client.run_command('racadm getsvctag', sudo=True)
File "/Users/smarx/test/pssh/venv/lib/python2.7/site-packages/pssh/pssh_client.py", line 517, in run_command
self.get_output(cmd, output)
File "/Users/smarx/test/pssh/venv/lib/python2.7/site-packages/pssh/pssh_client.py", line 601, in get_output
(channel, host, stdout, stderr, stdin) = cmd.get()
File "/Users/smarx/test/pssh/venv/lib/python2.7/site-packages/gevent/greenlet.py", line 480, in get
self._raise_exception()
File "/Users/smarx/test/pssh/venv/lib/python2.7/site-packages/gevent/greenlet.py", line 171, in _raise_exception
reraise(*self.exc_info)
File "/Users/smarx/test/pssh/venv/lib/python2.7/site-packages/gevent/greenlet.py", line 534, in run
result = self._run(*self.args, **self.kwargs)
File "/Users/smarx/test/pssh/venv/lib/python2.7/site-packages/pssh/pssh_client.py", line 559, in _exec_command
channel_timeout=self.channel_timeout)
File "/Users/smarx/test/pssh/venv/lib/python2.7/site-packages/pssh/ssh_client.py", line 98, in __init__
host, config_file=_openssh_config_file)
TypeError: 'NoneType' object is not iterable
This was seemingly a regression in version 0.92.0 of the library, and it is resolved in 0.92.1; earlier versions also work. OpenSSH config should not be a dependency.
As for your SSH agent question: if an agent is running and enabled in the user session, it is used automatically. If you would prefer to provide a private key programmatically, you can do the following:
from pssh import ParallelSSHClient
from pssh.utils import load_private_key
pkey = load_private_key('my_private_key')
client = ParallelSSHClient(hosts, pkey=pkey)
You can also provide an agent with multiple keys programmatically:
from pssh import ParallelSSHClient
from pssh.utils import load_private_key
from pssh.agent import SSHAgent
pkey = load_private_key('my_private_key')
agent = SSHAgent()
agent.add_key(pkey)
client = ParallelSSHClient(hosts, agent=agent)
See the documentation for more examples.

Unable to ping managed nodes using ansible-2.0

I downloaded ansible-2.0.0-0.2.alpha2.tar.gz and installed it on my control machine. However, now I'm not able to ping any of my machines. Previously, using v1.9.2, I could communicate with them. Now it gives the following error:
Unexpected Exception: lstat() argument 1 must be encoded string without NULL bytes, not str
the full traceback was:
Traceback (most recent call last):
File "/usr/bin/ansible", line 79, in <module>
sys.exit(cli.run())
File "/usr/lib/python2.6/site-packages/ansible/cli/adhoc.py", line 111, in run
inventory = Inventory(loader=loader, variable_manager=variable_manager, host_list=self.options.inventory)
File "/usr/lib/python2.6/site-packages/ansible/inventory/__init__.py", line 77, in __init__
self.parse_inventory(host_list)
File "/usr/lib/python2.6/site-packages/ansible/inventory/__init__.py", line 133, in parse_inventory
host.vars = combine_vars(host.vars, self.get_host_variables(host.name))
File "/usr/lib/python2.6/site-packages/ansible/inventory/__init__.py", line 499, in get_host_variables
self.vars_per_host[hostname] = self.get_host_variables(hostname, vault_password=vault_password)
File "/usr/lib/python2.6/site-packages/ansible/inventory/__init__.py", line 529, in get_host_variables
vars = combine_vars(vars, self.get_host_vars(host))
File "/usr/lib/python2.6/site-packages/ansible/inventory/__init__.py", line 653, in get_host_vars
return self.get_hostgroup_vars(host=host, group=None, new_pb_basedir=new_pb_basedir)
File "/usr/lib/python2.6/site-packages/ansible/inventory/__init__.py", line 702, in _get_hostgroup_vars
base_path = os.path.realpath(os.path.join(basedir, "host_vars/%s" % host.name))
File "/usr/lib64/python2.6/posixpath.py", line 365, in realpath
if islink(component):
File "/usr/lib64/python2.6/posixpath.py", line 132, in islink
st = os.lstat(path)
TypeError: lstat() argument 1 must be encoded string without NULL bytes, not str
Any help would be appreciated.
This is a known bug caused by some Unicode changes made to the playbook parser in 2.0. Several versions of Python shipped with a version of shlex.split() that fails horribly on Unicode input, and you likely have one of them installed (the traceback shows Python 2.6). The bug has been worked around, and the fix will be included in the next release. See https://github.com/ansible/ansible/issues/12257
