PyMongo getting only the highest value of a string element

PyMongo getting only the highest value of a string element - python

i am running a python Application with Mongo DB Backend.
For an EOS Blockchain Query i need to find out the highest Value for an Attribute called eosTimestamp (String)
I created this code:
def getLastEOSTimestampProperty(client,db):
mydb = client[db]
collection = mydb["properties"]
ensure_index(collection,[('eosTimestamp',-1)])
filter={}
project={
'eosTimestamp': 1
}
sort=list({
'eosTimestamp': -1
}.items())
result = list(client[db][collection].find(
filter=filter,
projection=project,
sort=sort
).limit(1))
maxtimestamp=result[0]['eosTimestamp']
#return(maxblocknum)
return str(maxtimestamp)
When i run this code i get the following Error Message:
Traceback (most recent call last):
File "c:\projects\UplandDB\worker.py", line 408, in <module>
print(getLastEOSTimestampProperty(client,db))
File "c:\projects\UplandDB\worker.py", line 387, in getLastEOSTimestampProperty
result = list(client[db][collection].find(
File "C:\projects\UplandDB\.venv\lib\site-packages\pymongo\database.py", line 237, in __getitem__
return Collection(self, name)
File "C:\projects\UplandDB\.venv\lib\site-packages\pymongo\collection.py", line 212, in __init__
raise TypeError("name must be an instance of str")
TypeError: name must be an instance of str
Do You have any hints what i made wrong?

Related

Replit DB is not loading data into variables

i'm using replits database but when i try to load it in it returns an error
any ideas?
cookie = db[name]
cookiepc = db[name + "cookiepc"]
increase = db[name + "increase"]
the error is
Traceback (most recent call last):
File "main.py", line 25, in <module>
cookiepc = db[name + "cookiepc"]
File "/home/runner/Cookie-clicker/venv/lib/python3.8/site-packages/replit/database/database.py", line 439, in __getitem__
raw_val = self.get_raw(key)
File "/home/runner/Cookie-clicker/venv/lib/python3.8/site-packages/replit/database/database.py", line 479, in get_raw
raise KeyError(key)
KeyError: 'shdfgwbdhfbadwcookiepc'

According to the documentation for replit-py, a KeyError is raised when an attempt is made to read from a key that doesn't exist in the database.
You can use db.get to specify a default value for if the key doesn't exist:
print(db.get("b", "default")) # default
db["b"] = "pie"
print(db.get("b", "default")) # pie

My first hello word program in MistQL is throwing exception Expected RuntimeValueType.Number, got RuntimeValueType.String

I was trying this tutorial given in the MistQL for a personal work but this below code is throwing exception as given below
import mistql
data="{\"foo\": \"bar\"}"
query = '#.foo'
results = mistql.query(query, data)
print(results)
Traceback (most recent call last): File "C:\Temp\sample.py", line 4,
in
purchaserEmails = mistql.query(query, data) File "C:\Python3.10.0\lib\site-packages\mistql\query.py", line 18, in query
result = execute_outer(ast, data) File "C:\Python3.10.0\lib\site-packages\mistql\execute.py", line 73, in
execute_outer
return execute(ast, build_initial_stack(data, builtins)) File "C:\Python3.10.0\lib\site-packages\typeguard_init_.py", line 1033,
in wrapper
retval = func(*args, **kwargs) File "C:\Python3.10.0\lib\site-packages\mistql\execute.py", line 60, in
execute
return execute_fncall(ast.fn, ast.args, stack) File "C:\Python3.10.0\lib\site-packages\typeguard_init_.py", line 1033,
in wrapper
retval = func(*args, **kwargs) File "C:\Python3.10.0\lib\site-packages\mistql\execute.py", line 28, in
execute_fncall
return function_definition(arguments, stack, execute) File "C:\Python3.10.0\lib\site-packages\mistql\builtins.py", line 37, in
wrapped
return fn(arguments, stack, exec) File "C:\Python3.10.0\lib\site-packages\mistql\builtins.py", line 168, in
dot
return _index_single(RuntimeValue.of(right.name), left) File "C:\Python3.10.0\lib\site-packages\mistql\builtins.py", line 295, in
_index_single
assert_type(index, RVT.Number) File "C:\Python3.10.0\lib\site-packages\mistql\runtime_value.py", line 344,
in assert_type
raise MistQLTypeError(f"Expected {expected_type}, got {value.type}") mistql.exceptions.MistQLTypeError: Expected
RuntimeValueType.Number, got RuntimeValueType.String

Strings passed as data into MistQL's query method correspond to MistQL string types, and thus the #.foo query is attempting to index the string "{\"foo\": \"bar\"}") using the "foo" value, leading to the error above.
MistQL is expecting native dictionaries and lists. The correspondence between Python data types and MistQL data types can be seen here
That being said, the error message is indeed very confusing and should be fixed post-haste! Issue here

Pyspark UDF unable to use large dictionary

I've a dictionary consisting of keys = word, value = Array of 300 float numbers.
I'm unable to use this dictionary in my pyspark UDF.
When size of this dictionary is 2Million keys it does not work. But when I reduce the size to 200K it works.
This is my code for the function to be converted to UDF
def get_sentence_vector(sentence, dictionary_containing_word_vectors):
cleanedSentence = list(clean_text(sentence))
words_vector_list = np.zeros(300)# 300 dimensional vector
for x in cleanedSentence:
try:
words_vector_list = np.add(words_vector_list, dictionary_containing_word_vectors[str(x)])
except Exception as e:
print("Exception caught while finding word vector from Fast text pretrained model Dictionary: ",e)
return words_vector_list.tolist()
This is my UDF
get_sentence_vector_udf = F.udf(lambda val: get_sentence_vector(val, fast_text_dictionary), ArrayType(FloatType()))
This is how i'm calling the udf to be added as a column in my dataframe
dmp_df_with_vectors = df.filter(df.item_name.isNotNull()).withColumn("sentence_vector", get_sentence_vector_udf(df.item_name))
And this is the stack trace for the error
Traceback (most recent call last):
File "/usr/lib/spark/python/pyspark/broadcast.py", line 83, in dump
pickle.dump(value, f, 2)
SystemError: error return without exception set
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/spark/python/pyspark/sql/functions.py", line 1957, in wrapper
return udf_obj(*args)
File "/usr/lib/spark/python/pyspark/sql/functions.py", line 1916, in __call__
judf = self._judf
File "/usr/lib/spark/python/pyspark/sql/functions.py", line 1900, in _judf
self._judf_placeholder = self._create_judf()
File "/usr/lib/spark/python/pyspark/sql/functions.py", line 1909, in _create_judf
wrapped_func = _wrap_function(sc, self.func, self.returnType)
File "/usr/lib/spark/python/pyspark/sql/functions.py", line 1866, in _wrap_function
pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
File "/usr/lib/spark/python/pyspark/rdd.py", line 2377, in _prepare_for_python_RDD
broadcast = sc.broadcast(pickled_command)
File "/usr/lib/spark/python/pyspark/context.py", line 799, in broadcast
return Broadcast(self, value, self._pickled_broadcast_vars)
File "/usr/lib/spark/python/pyspark/broadcast.py", line 74, in __init__
self._path = self.dump(value, f)
File "/usr/lib/spark/python/pyspark/broadcast.py", line 90, in dump
raise pickle.PicklingError(msg)
cPickle.PicklingError: Could not serialize broadcast: SystemError: error return without exception set

How big is your fast_text_dictionary in the 2M case? It might be too big.
Try broadcast it first before running udf. e.g.
broadcastVar = sc.broadcast(fast_text_dictionary)
Then use broadcastVar instead in your udf.
See the document for broadcast

Unable to access mongo entry by "id"

I have a Mongo Document with some fields (_id, id, name, status, etc...). I wrote a typical document in a class(like a model would do):
class mod(Document):
id=fields.IntField()
name = fields.StringField()
status=fields.StringField()
description_summary = fields.StringField()
_id = fields.ObjectIdField()
And with this model, I tried to access them:
>>> from mongoengine import *
>>> from api.models import *
>>> connect('doc')
MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True, read_preference=Primary())
I tried to fetch all the entries in the "mod" document: It Worked! I can get all the fields of all the entries (id, name, etc...)
>>> mod_ = mod.objects.all()
>>> mod_[0].name
'Name of entry'
>>> mod_[0].id
102
I tried to filter and return all the entries with the field "status" = "Incomplete": It works, just like before.I tried to filter other fields: it works too
>>> mod_ = mod.objects(status="Incomplete")
>>> mod_[0].name
'Name of entry'
>>> mod_[0].id
102
But When I try to filter with the id I don't manage to get a result:
>>> mod_ = mod.objects(id=102)
>>> mod_[0].name
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/.../lib/python3.4/site-packages/mongoengine/queryset/base.py", line 193, in __getitem__
return queryset._document._from_son(queryset._cursor[key],
File "/.../lib/python3.4/site-packages/pymongo/cursor.py", line 570, in __getitem__
raise IndexError("no such item for Cursor instance")
IndexError: no such item for Cursor instance
>>> mod_[0].id
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/.../lib/python3.4/site-packages/mongoengine/queryset/base.py", line 193, in __getitem__
return queryset._document._from_son(queryset._cursor[key],
File "/.../lib/python3.4/site-packages/pymongo/cursor.py", line 570, in __getitem__
raise IndexError("no such item for Cursor instance")
IndexError: no such item for Cursor instance
So I tried with mod.objects.get(id=102)
>>> mod_ = mod.objects.get(id=102)
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/.../lib/python3.4/site-packages/mongoengine/queryset/base.py", line 271, in get
raise queryset._document.DoesNotExist(msg)
api.models.DoesNotExist: mod matching query does not exist.
Mod matching query does not exist, so it doesn't recognize the id field but when I write mod_[0].id I do have a result, so what can be wrong?
EDIT: I believe that when writing mod.objects(id=102), the field id is interpreted as the _id. How can I specify, I want to query by id and not _id? My Document is already written, so I cannot change the name of the fields.

So, the problem does not come from the difference between _id and id, like said #HourGlass. The values of the id field were stored as integer, I wrote fields.IntField() for the id field, and called mod.objects(id=102) (without quotes).
But for some reason, I have to write them as fields.StringField(), and call mod.objects(id='102').

Unable to read column family using pycassa

I've just started using pycassa, so if this is a stupid question, I apologize upfront.
I have a column family with the following schema:
create column family MyColumnFamilyTest
with column_type = 'Standard'
and comparator = 'CompositeType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.TimeUUIDType)'
and default_validation_class = 'BytesType'
and key_validation_class = 'UTF8Type'
and read_repair_chance = 0.1
and dclocal_read_repair_chance = 0.0
and populate_io_cache_on_flush = false
and gc_grace = 864000
and min_compaction_threshold = 4
and max_compaction_threshold = 32
and replicate_on_write = true
and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
and caching = 'KEYS_ONLY'
and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};
When I try to do a get() with a valid key (works fine in cassandra-cli) I get:
Traceback (most recent call last):
File "<pyshell#19>", line 1, in <module>
cf.get('mykey',column_count=3)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 664, in get
return self._cosc_to_dict(list_col_or_super, include_timestamp, include_ttl)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 368, in _cosc_to_dict
ret[self._unpack_name(col.name)] = self._col_to_dict(col, include_timestamp, include_ttl)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 444, in _unpack_name
return self._name_unpacker(b)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/marshal.py", line 140, in unpack_composite
components.append(unpacker(bytestr[2:2 + length]))
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/marshal.py", line 374, in <lambda>
return lambda v: uuid.UUID(bytes=v)
File "/usr/lib/python2.7/uuid.py", line 144, in __init__
raise ValueError('bytes is not a 16-char string')
ValueError: bytes is not a 16-char string
Here's some more information I've discovered:
When using cassandra-cli I can see the data as:
% cassandra-cli -h 10.249.238.131
Connected to: "LocalDB" on 10.249.238.131/9160
Welcome to Cassandra CLI version 1.2.10-SNAPSHOT
Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.
[default#unknown] use Keyspace;
[default#Keyspace] list ColumnFamily;
Using default limit of 100
Using default cell limit of 100
-------------------
RowKey: urn:keyspace:ColumnFamily:a36e8ab1-7032-4e4c-a53d-e3317f63a640:
=> (name=autoZoning:::, value=01, timestamp=1391298393966000)
=> (name=creationTime:::, value=00000143efd8b76e, timestamp=1391298393966000)
=> (name=inactive:::14fe78e0-8b9b-11e3-b171-005056b700bb, value=00, timestamp=1391298393966000)
=> (name=label:::14fe78e0-8b9b-11e3-b171-005056b700bb, value=726a6d2d766e782d76613031, timestamp=1391298393966000)
1 Row Returned.
Elapsed time: 16 msec(s).
Since it was unclear what was causing the exception, I decided to add a print prior to the 'return self._name_unpacker(b)' line in columnfamily.py and I see:
>>> cf.get(dict(cf.get_range(column_count=0,filter_empty=False)).keys()[0])
Attempting to unpack: <00>\rautoZoning<00><00><00><00><00><00><00><00><00><00>
Traceback (most recent call last):
File "<pyshell#172>", line 1, in <module>
cf.get(dict(cf.get_range(column_count=0,filter_empty=False)).keys()[0])
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 665, in get
return self._cosc_to_dict(list_col_or_super, include_timestamp, include_ttl)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 368, in _cosc_to_dict
ret[self._unpack_name(col.name)] = self._col_to_dict(col, include_timestamp, include_ttl)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 445, in _unpack_name
return self._name_unpacker(b)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/marshal.py", line 140, in unpack_composite
components.append(unpacker(bytestr[2:2 + length]))
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/marshal.py", line 374, in <lambda>
return lambda v: uuid.UUID(bytes=v)
File "/usr/lib/python2.7/uuid.py", line 144, in __init__
raise ValueError('bytes is not a 16-char string')
ValueError: bytes is not a 16-char string
I have no idea where the extra characters are coming from around the column name. But that got me curious so I added another print in _cosc_to_dict in columnfamily.py and I see:
>>> cf.get(dict(cf.get_range(column_count=0,filter_empty=False)).keys()[0])
list_col_or_super is: []
list_col_or_super is: [ColumnOrSuperColumn(column=Column(timestamp=1391298393966000,
name='\x00\rautoZoning\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', value='\x01', ttl=None),
counter_super_column=None, super_column=None, counter_column=None),
ColumnOrSuperColumn(column=Column(timestamp=1391298393966000,
name='\x00\x0ccreationTime\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00',
value='\x00\x00\x01C\xef\xd8\xb7n', ttl=None), counter_super_column=None, super_column=None,
counter_column=None), ColumnOrSuperColumn(column=Column(timestamp=1391298393966000,
name='\x00\x08inactive\x00\x00\x00\x00\x00\x00\x00\x00\x10\x14\xfex\xe0\x8b\x9b\x11\xe3\xb1q\x00PV\xb7\x00\xbb\x00', value='\x00', ttl=None), counter_super_column=None, super_column=None,
counter_column=None), ColumnOrSuperColumn(column=Column(timestamp=1391298393966000,
name='\x00\x05label\x00\x00\x00\x00\x00\x00\x00\x00\x10\x14\xfex\xe0\x8b\x9b\x11\xe3\xb1q\x00PV\xb7\x00\xbb\x00', value='thisIsATest', ttl=None), counter_super_column=None, super_column=None, counter_column=None)]
autoZoning unpack:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib64/python2.6/site-packages/pycassa-1.11.0-py2.6.egg/pycassa/columnfamily.py", line 666, in get
return self._cosc_to_dict(list_col_or_super, include_timestamp, include_ttl)
File "/usr/local/lib64/python2.6/site-packages/pycassa-1.11.0-py2.6.egg/pycassa/columnfamily.py", line 369, in _cosc_to_dict
ret[self._unpack_name(col.name)] = self._col_to_dict(col, include_timestamp, include_ttl)
File "/usr/local/lib64/python2.6/site-packages/pycassa-1.11.0-py2.6.egg/pycassa/columnfamily.py", line 446, in _unpack_name
return self._name_unpacker(b)
File "/usr/local/lib64/python2.6/site-packages/pycassa-1.11.0-py2.6.egg/pycassa/marshal.py", line 140, in unpack_composite
components.append(unpacker(bytestr[2:2 + length]))
File "/usr/local/lib64/python2.6/site-packages/pycassa-1.11.0-py2.6.egg/pycassa/marshal.py", line 374, in <lambda>
return lambda v: uuid.UUID(bytes=v)
File "/usr/lib64/python2.6/uuid.py", line 144, in __init__
raise ValueError('bytes is not a 16-char string')
ValueError: bytes is not a 16-char string
Am I correct in assuming that the extra characters around the column names are what is responsible for the 'ValueError: bytes is not a 16-char string' exception?
Also if I try to use the column name and select it I get:
>>> cf.get(u'urn:keyspace:ColumnFamily:a36e8ab1-7032-4e4c-a53d-e3317f63a640:',columns=['autoZoning:::'])
Traceback (most recent call last):
File "<pyshell#184>", line 1, in <module>
cf.get(u'urn:keyspace:ColumnFamily:a36e8ab1-7032-4e4c-a53d-e3317f63a640:',columns=['autoZoning:::'])
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 651, in get
cp = self._column_path(super_column, column)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 383, in _column_path
self._pack_name(column, False))
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 426, in _pack_name
return self._name_packer(value, slice_start)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/marshal.py", line 115, in pack_composite
packed = packer(item)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/marshal.py", line 298, in pack_uuid
randomize=True)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/util.py", line 75, in convert_time_to_uuid
'neither a UUID, a datetime, or a number')
ValueError: Argument for a v1 UUID column name or value was neither a UUID, a datetime, or a number
Any further thoughts?
Thanks,
Rob

Turns out that the problem wasn't with the key, it was being caused, in part, by a bug in pycassa that wasn't handling an empty (null) string in the column UUID. A short-term fix is in the answer in google groups:
https://groups.google.com/d/msg/pycassa-discuss/Vf_bSgDIi9M/KTA1kbE9IXAJ
The other part of the answer was to get at the columns by using tuples (with the UUID as a UUID and not a str) instead of a string with ':' separators because that's, as I found out, a cassandra-cli thing.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

PyMongo getting only the highest value of a string element - python

Related

Replit DB is not loading data into variables

My first hello word program in MistQL is throwing exception Expected RuntimeValueType.Number, got RuntimeValueType.String

Pyspark UDF unable to use large dictionary

Unable to access mongo entry by "id"

Unable to read column family using pycassa

Categories

Resources