python: sqlalchemy batch insert with on_conflict_update

I have to insert approximately 30,000 rows daily into my Postgres database.
The table has four columns:
id (pkey), category, createddate, updatedon.
My requirement is to update the updatedon and category columns with today's date and the new category if the id is already present; otherwise, insert a new row with createddate and updatedon set to the same value.
I found Ilja Everilä's answer (https://stackoverflow.com/a/44865375/5665430) for batch update:
insert_statement = sqlalchemy.dialects.postgresql.insert(id_tag)
upsert_statement = insert_statement.on_conflict_do_update(
    constraint='id',
    set_={"createddate": insert_statement.excluded.createddate}
)
insert_values = df.to_dict(orient='records')
conn.execute(upsert_statement, insert_values)
It's throwing an AttributeError:
Traceback (most recent call last):
File "<ipython-input-60-4c5e5e0daf14>", line 5, in <module>
set_= dict(createddate = insert_statement.excluded.createddate)
File "/home/bluepi/anaconda2/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 764, in __get__
obj.__dict__[self.__name__] = result = self.fget(obj)
File "/home/bluepi/anaconda2/lib/python2.7/site-packages/sqlalchemy/dialects/postgresql/dml.py", line 43, in excluded
return alias(self.table, name='excluded').columns
File "/home/bluepi/anaconda2/lib/python2.7/site-packages/sqlalchemy/sql/selectable.py", line 161, in alias
return _interpret_as_from(selectable).alias(name=name, flat=flat)
AttributeError: 'TextClause' object has no attribute 'alias'
I have tried one by one update as shown here http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html#postgresql-insert-on-conflict , but I am getting the same error.
Please help me understand where I am going wrong, thanks in advance.

From your comment
"id_tag is nothing but name of my table in postgres"
one could deduce that id_tag is bound to a string. If you had provided a Minimal, Complete, and Verifiable example, there would have been a lot less guesswork. As it turns out, postgresql.dml.insert() automatically wraps passed strings in a text() construct, and the result when trying to use Insert.excluded is:
In [2]: postgresql.insert('fail').excluded
~/sqlalchemy/lib/sqlalchemy/sql/selectable.py:43: SAWarning: Textual SQL FROM expression 'fail' should be explicitly declared as text('fail'), or use table('fail') for more specificity (this warning may be suppressed after 10 occurrences)
{"expr": util.ellipses_string(element)})
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-2-f176aac8b913> in <module>()
----> 1 postgresql.insert('fail').excluded
~/sqlalchemy/lib/sqlalchemy/util/langhelpers.py in __get__(self, obj, cls)
765 if obj is None:
766 return self
--> 767 obj.__dict__[self.__name__] = result = self.fget(obj)
768 return result
769
~/sqlalchemy/lib/sqlalchemy/dialects/postgresql/dml.py in excluded(self)
41
42 """
---> 43 return alias(self.table, name='excluded').columns
44
45 @_generative
~/sqlalchemy/lib/sqlalchemy/sql/selectable.py in alias(selectable, name, flat)
159
160 """
--> 161 return _interpret_as_from(selectable).alias(name=name, flat=flat)
162
163
AttributeError: 'TextClause' object has no attribute 'alias'
So, instead of passing a string containing the name of your table to postgresql.dml.insert(), pass it an actual Table object, or a lightweight table() construct that has been populated with column() objects.
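As a minimal sketch of that suggestion, the statement can be built from a lightweight table() construct. The table and column names are taken from the question. Targeting the conflict with index_elements=["id"] is an assumption that id is the primary key column (the constraint= argument expects the name of a constraint rather than a column name), and the set_ mapping follows the stated requirement of updating category and updatedon. The statement is only compiled here; executing it would still need a connection, as in the question.

```python
from sqlalchemy import column, table
from sqlalchemy.dialects import postgresql

# Lightweight table() construct; names come from the question.
id_tag = table(
    "id_tag",
    column("id"),
    column("category"),
    column("createddate"),
    column("updatedon"),
)

insert_statement = postgresql.insert(id_tag)

# Assumption: "id" is the primary key column, so it is used as the
# conflict target. On conflict, only category and updatedon are updated,
# which leaves createddate of the existing row untouched.
upsert_statement = insert_statement.on_conflict_do_update(
    index_elements=["id"],
    set_={
        "category": insert_statement.excluded.category,
        "updatedon": insert_statement.excluded.updatedon,
    },
)

# Compile against the PostgreSQL dialect to inspect the generated SQL
# without connecting to a database.
compiled_sql = str(upsert_statement.compile(dialect=postgresql.dialect()))
print(compiled_sql)
```

With a real connection, the batch would then be sent as in the question: conn.execute(upsert_statement, insert_values).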


NotImplementedError: 'split_respect_sentence_boundary=True' is only compatible with split_by='word'

I have the following lines of code:
from haystack.document_stores import InMemoryDocumentStore, SQLDocumentStore
from haystack.nodes import TextConverter, PDFToTextConverter, PreProcessor
from haystack.utils import clean_wiki_text, convert_files_to_docs, fetch_archive_from_http, print_answers
doc_dir = "C:\\Users\\abcd\\Downloads\\PDF Files\\"
docs = convert_files_to_docs(dir_path=doc_dir, clean_func=None, split_paragraphs=True)
preprocessor = PreProcessor(
    clean_empty_lines=True,
    clean_whitespace=True,
    clean_header_footer=True,
    split_by="passage",
    split_length=2)
doc = preprocessor.process(docs)
When I try to run it, I get the following error message:
NotImplementedError Traceback (most recent call last)
c:\Users\abcd\Downloads\solr9.ipynb Cell 27 in <cell line: 23>()
16 print(type(docs))
17 preprocessor = PreProcessor(
18 clean_empty_lines=True,
19 clean_whitespace=True,
20 clean_header_footer=True,
21 split_by="passage",
22 split_length=2)
---> 23 doc = preprocessor.process(docs)
File ~\AppData\Roaming\Python\Python39\site-packages\haystack\nodes\preprocessor\preprocessor.py:167, in PreProcessor.process(self, documents, clean_whitespace, clean_header_footer, clean_empty_lines, remove_substrings, split_by, split_length, split_overlap, split_respect_sentence_boundary, id_hash_keys)
165 ret = self._process_single(document=documents, id_hash_keys=id_hash_keys, **kwargs) # type: ignore
166 elif isinstance(documents, list):
--> 167 ret = self._process_batch(documents=list(documents), id_hash_keys=id_hash_keys, **kwargs)
168 else:
169 raise Exception("documents provided to PreProcessor.prepreprocess() is not of type list nor Document")
File ~\AppData\Roaming\Python\Python39\site-packages\haystack\nodes\preprocessor\preprocessor.py:225, in PreProcessor._process_batch(self, documents, id_hash_keys, **kwargs)
222 def _process_batch(
223 self, documents: List[Union[dict, Document]], id_hash_keys: Optional[List[str]] = None, **kwargs
224 ) -> List[Document]:
--> 225 nested_docs = [
226 self._process_single(d, id_hash_keys=id_hash_keys, **kwargs)
...
--> 324 raise NotImplementedError("'split_respect_sentence_boundary=True' is only compatible with split_by='word'.")
326 if type(document.content) is not str:
327 logger.error("Document content is not of type str. Nothing to split.")
NotImplementedError: 'split_respect_sentence_boundary=True' is only compatible with split_by='word'.
I don't even pass split_respect_sentence_boundary=True as an argument, and I don't have split_by='word' either; I have it set to split_by="passage".
I get the same error if I change it to split_by="sentence".
Do let me know if I am missing anything here.
As you can see in the PreProcessor API docs, the default value for split_respect_sentence_boundary is True.
In order to make your code work, you should specify split_respect_sentence_boundary=False:
preprocessor = PreProcessor(
    clean_empty_lines=True,
    clean_whitespace=True,
    clean_header_footer=True,
    split_by="passage",
    split_length=2,
    split_respect_sentence_boundary=False)
I agree that this behavior is not intuitive.
Currently, this node is undergoing a major refactoring.

Compute method is failing to assign values - Odoo 14

I am trying to assign values when a user belongs to a project, in this case the project name and the manager he is assigned to, but the compute method gives an error. I tried logging the values in case one of them was empty, but they definitely contain the required data.
We can see in the logs:
root: Staff Augmentation - project_task.project_id.display_name
root: 120 - project_task.project_id.delivery_director.id
@api.depends('user_id')
@api.model
def _get_assigned_project(self):
    today = fields.Date.today()
    for employee in self:
        project_task = self.env['project.task'].search([('user_id', '=', employee.user_id.id),
                                                        ('status', '=', 'assigned'),
                                                        ('date_end', '>', today),
                                                        ('project_id.original_project_id', '=', False)], order='date_end desc', limit=1)
        if project_task:
            employee.project_name = project_task.project_id.display_name
            if project_task.project_id.role_id.name == 'Delivery Manager' and employee.base_manager != 0:
                employee.parent_id = project_task.project_id.delivery_director.id
            else:
                employee.parent_id = project_task.project_id.delivery_manager.id
        else:
            employee.project_name = ""
            if employee.base_manager != 0:
                employee.parent_id = employee.base_manager
This is the error:
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/odoo/src/odoo/odoo/http.py", line 640, in _handle_exception
return super(JsonRequest, self)._handle_exception(exception)
File "/home/odoo/src/odoo/odoo/http.py", line 316, in _handle_exception
raise exception.with_traceback(None) from new_cause
ValueError: Compute method failed to assign hr.employee(266,).project_name
You need to assign employee.parent_id in your compute method. If your if/else branches fail to assign a value to the field, this error comes up.
So check your method, put an else branch after your if condition, and give it an assignment like employee.parent_id = False.
If the compute method fails to assign a value on any code path, even one you did not expect to be taken, Odoo raises this error.
Set a default value for the field in case the conditions for the computed assignment are not fulfilled.
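The pattern behind this advice can be sketched without Odoo: start from default values, let the branches override them, and assign once at the end, so no path leaves a field unset. The function name resolve_assignment and the dictionary keys below are hypothetical stand-ins for the records in the question, not part of the Odoo API.

```python
def resolve_assignment(task, base_manager):
    """Return (project_name, parent_id) with a value on every path.

    `task` is a hypothetical dict standing in for the project.task record
    (None when the search found nothing); `base_manager` stands in for
    employee.base_manager.
    """
    # Defaults first, so no branch can leave a field unassigned.
    project_name = ""
    parent_id = base_manager or False

    if task:
        project_name = task["project_name"]
        if task["role"] == "Delivery Manager" and base_manager:
            parent_id = task["delivery_director_id"]
        else:
            parent_id = task["delivery_manager_id"]
    return project_name, parent_id


# Hypothetical sample data mirroring the logged values in the question.
sample_task = {
    "project_name": "Staff Augmentation",
    "role": "Delivery Manager",
    "delivery_director_id": 120,
    "delivery_manager_id": 7,
}
print(resolve_assignment(sample_task, 5))
print(resolve_assignment(None, 0))
```

In the compute method, the loop body would then assign both employee.project_name and employee.parent_id from the returned pair on every iteration.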

sqlalchemy: Boolean expression for hybrid_property

I have a sqlalchemy class that represents a table with columns FileID and SettlementDate. I want to create a hybrid property to say whether a given instance is the maximum FileID for its SettlementDate, and an associated expression to use when querying. I've successfully got the property working, but am struggling with the expression. Here's the existing model:
class Hdr(model.Base):
    id = Column('ID', Integer, primary_key=True)
    file_id = Column('FileID', BIGINT, ForeignKey('FileRegister.Files.ID'))
    settlement_date = Column('SettlementDate', Date)

    @hybrid_property
    def is_latest(self):
        subquery = (
            object_session(self)
            .query(func.max(Hdr.file_id).label('file_id'))
            .group_by(Hdr.settlement_date)
            .subquery()
        )
        return (
            object_session(self)
            .query(func.count(Hdr.file_id).cast(Boolean))
            .filter(subquery.c.file_id == self.file_id)
            .scalar()
        )
I'd like to think I can do something along the lines of:
subquery = (
    select((func.max(Hdr.file_id).label('file_id'), ))
    .group_by(Hdr.settlement_date)
    .alias('a')
)
s = select(
    case(
        whens=[
            (Hdr.file_id.in_(subquery), 1)
        ],
        else_=0
    )
)
But this raises the error "Boolean value of this clause is not defined".
Any help would be greatly appreciated!
Traceback follows:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
c:\Users\\venvs\insight\lib\site-packages\sqlalchemy-1.3.0b3-py3.7-win-amd64.egg\sqlalchemy\sql\selectable.py in __init__(self, columns, whereclause, from_obj, distinct, having, correlate, prefixes, suffixes, **kwargs)
2889 try:
-> 2890 cols_present = bool(columns)
2891 except TypeError:
c:\Users\\venvs\insight\lib\site-packages\sqlalchemy-1.3.0b3-py3.7-win-amd64.egg\sqlalchemy\sql\elements.py in __bool__(self)
515 def __bool__(self):
--> 516 raise TypeError("Boolean value of this clause is not defined")
517
TypeError: Boolean value of this clause is not defined
During handling of the above exception, another exception occurred:
ArgumentError Traceback (most recent call last)
<ipython-input-20-4946a4bf7faa> in <module>
10 (Hdr.file_id.in_(subquery), 1)
11 ],
---> 12 else_=0
13 )
14 )
<string> in select(columns, whereclause, from_obj, distinct, having, correlate, prefixes, suffixes, **kwargs)
<string> in __init__(self, columns, whereclause, from_obj, distinct, having, correlate, prefixes, suffixes, **kwargs)
c:\Users\\venvs\insight\lib\site-packages\sqlalchemy-1.3.0b3-py3.7-win-amd64.egg\sqlalchemy\util\deprecations.py in warned(fn, *args, **kwargs)
128 )
129
--> 130 return fn(*args, **kwargs)
131
132 doc = fn.__doc__ is not None and fn.__doc__ or ""
c:\Users\\venvs\insight\lib\site-packages\sqlalchemy-1.3.0b3-py3.7-win-amd64.egg\sqlalchemy\sql\selectable.py in __init__(self, columns, whereclause, from_obj, distinct, having, correlate, prefixes, suffixes, **kwargs)
2891 except TypeError:
2892 raise exc.ArgumentError(
-> 2893 "columns argument to select() must "
2894 "be a Python list or other iterable"
2895 )
ArgumentError: columns argument to select() must be a Python list or other iterable
The problem is
s = select(case(...))
The first argument to select() should be a sequence of column elements or from clause objects. It seems that SQLAlchemy at some point checks if the passed sequence is empty or not by doing bool(columns). The solution is to simply wrap it in a sequence, as you have done in creating the subquery:
s = select([case(...)])
In the hybrid property's "Python side", instead of counting whether the maximum file_id of any settlement_date happens to match the instance's, you could filter by the instance's settlement_date and check against the maximum:
class Hdr(model.Base):
    @hybrid_property
    def is_latest(self):
        max_file_id = (
            object_session(self)
            .query(func.max(Hdr.file_id))
            .filter(Hdr.settlement_date == self.settlement_date)
            .scalar()
        )
        return max_file_id == self.file_id
In the expression you don't need to wrap the boolean expression in a scalar subquery, but return the boolean expression itself:
    @is_latest.expression
    def is_latest(cls):
        hdr_alias = aliased(Hdr)
        subquery = (
            select([func.max(hdr_alias.file_id)])
            .group_by(hdr_alias.settlement_date)
        )
        return cls.file_id.in_(subquery)
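To see both halves of the hybrid working together, here is a self-contained sketch against an in-memory SQLite database. It makes a few assumptions that are not in the question: SQLAlchemy 1.4 or later (where select() takes columns positionally, unlike the list form needed in 1.3), a simplified model without the foreign key, and the hypothetical table name "hdr".

```python
import datetime

from sqlalchemy import Column, Date, Integer, create_engine, func, select
from sqlalchemy.ext.hybrid import hybrid_property
from sqlalchemy.orm import Session, aliased, declarative_base, object_session

Base = declarative_base()


class Hdr(Base):
    __tablename__ = "hdr"  # hypothetical table name for this sketch

    id = Column(Integer, primary_key=True)
    file_id = Column(Integer)
    settlement_date = Column(Date)

    @hybrid_property
    def is_latest(self):
        # Python side: compare against the maximum for this instance's date.
        max_file_id = (
            object_session(self)
            .query(func.max(Hdr.file_id))
            .filter(Hdr.settlement_date == self.settlement_date)
            .scalar()
        )
        return max_file_id == self.file_id

    @is_latest.expression
    def is_latest(cls):
        # SQL side: return the boolean IN expression itself.
        hdr_alias = aliased(Hdr)
        subquery = select(func.max(hdr_alias.file_id)).group_by(
            hdr_alias.settlement_date
        )
        return cls.file_id.in_(subquery)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    day = datetime.date(2019, 1, 1)
    session.add_all(
        [Hdr(file_id=1, settlement_date=day), Hdr(file_id=2, settlement_date=day)]
    )
    session.commit()

    # Expression side, used as a filter criterion.
    latest_ids = [h.file_id for h in session.query(Hdr).filter(Hdr.is_latest).all()]
    # Python side, evaluated per loaded instance.
    python_side = {h.file_id: h.is_latest for h in session.query(Hdr).all()}

print(latest_ids, python_side)
```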

pydicom 'Dataset' object has no attribute 'TransferSyntaxUID'

I'm using pydicom 1.0.0a1, downloaded from here. When I run the following code:
ds=pydicom.read_file('./DR/abnormal/abc.dcm',force=True)
ds.pixel_array
this error occurs:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-6-d4e81d303439> in <module>()
7 ds=pydicom.read_file('./DR/abnormal/abc.dcm',force=True)
8
----> 9 ds.pixel_array
10
/Applications/anaconda/lib/python2.7/site-packages/pydicom-1.0.0a1-py2.7.egg/pydicom/dataset.pyc in __getattr__(self, name)
501 if tag is None: # `name` isn't a DICOM element keyword
502 # Try the base class attribute getter (fix for issue 332)
--> 503 return super(Dataset, self).__getattribute__(name)
504 tag = Tag(tag)
505 if tag not in self: # DICOM DataElement not in the Dataset
/Applications/anaconda/lib/python2.7/site-packages/pydicom-1.0.0a1-py2.7.egg/pydicom/dataset.pyc in pixel_array(self)
1064 The Pixel Data (7FE0,0010) as a NumPy ndarray.
1065 """
-> 1066 return self._get_pixel_array()
1067
1068 # Format strings spec'd according to python string formatting options
/Applications/anaconda/lib/python2.7/site-packages/pydicom-1.0.0a1-py2.7.egg/pydicom/dataset.pyc in _get_pixel_array(self)
1042 elif self._pixel_id != id(self.PixelData):
1043 already_have = False
-> 1044 if not already_have and not self._is_uncompressed_transfer_syntax():
1045 try:
1046 # print("Pixel Data is compressed")
/Applications/anaconda/lib/python2.7/site-packages/pydicom-1.0.0a1-py2.7.egg/pydicom/dataset.pyc in _is_uncompressed_transfer_syntax(self)
662 """Return True if the TransferSyntaxUID is a compressed syntax."""
663 # FIXME uses file_meta here, should really only be thus for FileDataset
--> 664 return self.file_meta.TransferSyntaxUID in NotCompressedPixelTransferSyntaxes
665
666 def __ne__(self, other):
/Applications/anaconda/lib/python2.7/site-packages/pydicom-1.0.0a1-py2.7.egg/pydicom/dataset.pyc in __getattr__(self, name)
505 if tag not in self: # DICOM DataElement not in the Dataset
506 # Try the base class attribute getter (fix for issue 332)
--> 507 return super(Dataset, self).__getattribute__(name)
508 else:
509 return self[tag].value
AttributeError: 'Dataset' object has no attribute 'TransferSyntaxUID'
I read the Google Group post and replaced my filereader.py with the file posted there, and then I got this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Applications/anaconda/lib/python2.7/site-packages/pydicom-1.0.0a1-py2.7.egg/pydicom/__init__.py", line 41, in read_file
from pydicom.dicomio import read_file
File "/Applications/anaconda/lib/python2.7/site-packages/pydicom-1.0.0a1-py2.7.egg/pydicom/dicomio.py", line 3, in <module>
from pydicom.filereader import read_file, read_dicomdir
File "/Applications/anaconda/lib/python2.7/site-packages/pydicom-1.0.0a1-py2.7.egg/pydicom/filereader.py", line 35, in <module>
from pydicom.datadict import dictionaryVR
ImportError: cannot import name dictionaryVR
Does anybody know how to solve this problem?
You should set the TransferSyntaxUID after reading the file and before trying to get the pixel_array:
import pydicom.uid
ds=pydicom.read_file('./DR/abnormal/abc.dcm',force=True)
ds.file_meta.TransferSyntaxUID = pydicom.uid.ImplicitVRLittleEndian # or whatever is the correct transfer syntax for the file
ds.pixel_array
The correction from the post you referenced was done before some changes in the code to harmonize some naming, so the error is thrown because the current master uses dictionary_VR rather than dictionaryVR. Setting the transfer syntax in user code as above avoids that problem.

Unable to access mongo entry by "id"

I have a Mongo document with some fields (_id, id, name, status, etc.). I described a typical document in a class (as a model would do):
class mod(Document):
    id = fields.IntField()
    name = fields.StringField()
    status = fields.StringField()
    description_summary = fields.StringField()
    _id = fields.ObjectIdField()
And with this model, I tried to access them:
>>> from mongoengine import *
>>> from api.models import *
>>> connect('doc')
MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True, read_preference=Primary())
I tried to fetch all the entries in the "mod" document, and it worked. I can get all the fields of all the entries (id, name, etc.):
>>> mod_ = mod.objects.all()
>>> mod_[0].name
'Name of entry'
>>> mod_[0].id
102
I tried to filter and return all the entries with the field "status" = "Incomplete", and it works just like before. Filtering on other fields works too:
>>> mod_ = mod.objects(status="Incomplete")
>>> mod_[0].name
'Name of entry'
>>> mod_[0].id
102
But when I try to filter by the id, I don't get a result:
>>> mod_ = mod.objects(id=102)
>>> mod_[0].name
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/.../lib/python3.4/site-packages/mongoengine/queryset/base.py", line 193, in __getitem__
return queryset._document._from_son(queryset._cursor[key],
File "/.../lib/python3.4/site-packages/pymongo/cursor.py", line 570, in __getitem__
raise IndexError("no such item for Cursor instance")
IndexError: no such item for Cursor instance
>>> mod_[0].id
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/.../lib/python3.4/site-packages/mongoengine/queryset/base.py", line 193, in __getitem__
return queryset._document._from_son(queryset._cursor[key],
File "/.../lib/python3.4/site-packages/pymongo/cursor.py", line 570, in __getitem__
raise IndexError("no such item for Cursor instance")
IndexError: no such item for Cursor instance
So I tried with mod.objects.get(id=102)
>>> mod_ = mod.objects.get(id=102)
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/.../lib/python3.4/site-packages/mongoengine/queryset/base.py", line 271, in get
raise queryset._document.DoesNotExist(msg)
api.models.DoesNotExist: mod matching query does not exist.
The query says no matching mod exists, so it doesn't seem to recognize the id field; yet mod_[0].id does return a result. What can be wrong?
EDIT: I believe that when writing mod.objects(id=102), the field id is interpreted as _id. How can I specify that I want to query by id and not _id? My Document is already written, so I cannot change the field names.
So, the problem does not come from the difference between _id and id, as @HourGlass said. The values of the id field were stored as integers, so I declared the id field as fields.IntField() and called mod.objects(id=102) (without quotes).
But for some reason, I have to declare it as fields.StringField() and call mod.objects(id='102').
