I am updating data on a Neo4j server using Python (2.7.6) and Py2Neo (1.6.4). My load function is:
from py2neo import neo4j, node, rel, cypher

session = cypher.Session('http://my_neo4j_server.com.mine:7474')

def load_data():
    tx = session.create_transaction()
    for row in dataframe.iterrows():  # dataframe is a pandas DataFrame
        name = row[1].name
        id = row[1].id
        merge_query = "MERGE (a:label {name:'%s', name_var:'%s'}) " % (id, name)
        tx.append(merge_query)
    tx.commit()
When I execute this from Spyder on Windows it works great: all the data from the dataframe is committed to Neo4j and visible in the graph. However, when I run it from a Linux server (a different machine from the Neo4j server) I get the following error at tx.commit(). Note that I have the same versions of Python and py2neo in both environments.
INFO:py2neo.packages.httpstream.http:>>> POST http://neo4j1.qs:7474/db/data/transaction/commit [1360120]
INFO:py2neo.packages.httpstream.http:<<< 200 OK [chunked]
ERROR:__main__:some part of process failed
Traceback (most recent call last):
File "my_file.py", line 132, in load_data
tx.commit()
File "/usr/local/lib/python2.7/site-packages/py2neo/cypher.py", line 242, in commit
return self._post(self._commit or self._begin_commit)
File "/usr/local/lib/python2.7/site-packages/py2neo/cypher.py", line 208, in _post
j = rs.json
File "/usr/local/lib/python2.7/site-packages/py2neo/packages/httpstream/http.py", line 563, in json
return json.loads(self.read().decode(self.encoding))
File "/usr/local/lib/python2.7/site-packages/py2neo/packages/httpstream/http.py", line 634, in read
data = self._response.read()
File "/usr/local/lib/python2.7/httplib.py", line 543, in read
return self._read_chunked(amt)
File "/usr/local/lib/python2.7/httplib.py", line 597, in _read_chunked
raise IncompleteRead(''.join(value))
IncompleteRead: IncompleteRead(128135 bytes read)
This post (IncompleteRead using httplib) suggests that it is an httplib error. I am not sure how to handle that since I am not calling httplib directly.
Any suggestions for getting this load to work on Linux or what the IncompleteRead error message means?
UPDATE:
The IncompleteRead error is being caused by a Neo4j error being returned. The line returned in _read_chunked that is causing the error is:
pe}"}]}],"errors":[{"code":"Neo.TransientError.Network.UnknownFailure"
Neo4j docs say this is an unknown network error.
Although I can't say for sure, this implies some kind of local network issue between client and server rather than a bug within the library. Py2neo wraps httplib (which is pretty solid itself) and, from the stack trace, it looks as though the client is expecting more chunks from a chunked response.
To diagnose further, you could make some curl calls from your Linux application server to your database server and see what succeeds and what doesn't. If that works, try writing a quick and dirty python script to make the same calls with httplib directly.
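For the httplib check, a quick and dirty sketch against the transactional endpoint your log shows (the hostname and statement are placeholders to adjust):

import httplib
import json

conn = httplib.HTTPConnection('my_neo4j_server.com.mine', 7474)
payload = json.dumps({"statements": [{"statement": "RETURN 1"}]})
conn.request('POST', '/db/data/transaction/commit', payload,
             {'Content-Type': 'application/json', 'Accept': 'application/json'})
resp = conn.getresponse()
print resp.status, resp.reason
body = resp.read()  # an IncompleteRead here points at the network path, not py2neo
print len(body)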
UPDATE 1: Given the update above and the fact that the server streams its responses, I'm thinking that the chunk size might represent the intended payload but the error cuts the response short. Recreating the issue with curl certainly seems like the best next step to help determine whether it is a fault in the driver, the server or something else.
UPDATE 2: Looking again this morning, I notice that you're using Python substitution for the properties within the MERGE statement. As good practice, you should use parameter substitution at the Cypher level:
merge_query = "MERGE (a:label {name:{name}, name_var:{name_var}})"
merge_params = {"name": id, "name_var": name}
tx.append(merge_query, merge_params)
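Besides avoiding quoting and escaping problems, parameters also let the server cache the query plan for the repeated MERGE, which matters when you append it once per dataframe row.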
I am getting an error very similar to the one below, but I am not in the EU:
Document AI: google.api_core.exceptions.InvalidArgument: 400 Request contains an invalid argument
When I use the raw_document and process a local pdf file, it works fine. However, when I specify a pdf file on a GCS location, it fails.
Error message:
the processor name: projects/xxxxxxxxx/locations/us/processors/f7502cad4bccdd97
the form process request: name: "projects/xxxxxxxxx/locations/us/processors/f7502cad4bccdd97"
inline_document {
uri: "gs://xxxx/temp/test1.pdf"
}
Traceback (most recent call last):
File "C:\Python39\lib\site-packages\google\api_core\grpc_helpers.py", line 66, in error_remapped_callable
return callable_(*args, **kwargs)
File "C:\Python39\lib\site-packages\grpc\_channel.py", line 946, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "C:\Python39\lib\site-packages\grpc\_channel.py", line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "Request contains an invalid argument."
debug_error_string = "{"created":"#1647296055.582000000","description":"Error received from peer ipv4:142.250.80.74:443","file":"src/core/lib/surface/call.cc","file_line":1070,"grpc_message":"Request contains an invalid argument.","grpc_status":3}"
>
Code:
client = documentai.DocumentProcessorServiceClient(client_options=opts)
# The full resource name of the processor, e.g.:
# projects/project-id/locations/location/processor/processor-id
# You must create new processors in the Cloud Console first
name = f"projects/{project_id}/locations/{location}/processors/{processor_id}"
print(f'the processor name: {name}')
# document = {"uri": gcs_path, "mime_type": "application/pdf"}
document = {"uri": gcs_path}
inline_document = documentai.Document()
inline_document.uri = gcs_path
# inline_document.mime_type = "application/pdf"
# Configure the process request
# request = {"name": name, "inline_document": document}
request = documentai.ProcessRequest(
    inline_document=inline_document,
    name=name
)
print(f'the form process request: {request}')
result = client.process_document(request=request)
I do not believe I have permission issues on the bucket since the same set up works fine for a document classification process on the same bucket.
This is a known issue for Document AI, and is already reported in this issue tracker. Unfortunately, the only workaround for now is to either:
Download your file, read it as bytes, and use process_document() with a raw document. See Document AI local processing for the sample code, and the sketch after this list.
Use batch_process_documents(), since by default it only accepts files from GCS. This avoids the extra step of downloading the file.
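For Option 1, a minimal sketch reusing the client and name from the question; it assumes google-cloud-storage is available, and the bucket and blob names are taken from the gs:// path above:

from google.cloud import documentai, storage

# Download the PDF from GCS into memory (Option 1's extra step)
storage_client = storage.Client()
image_content = storage_client.bucket("xxxx").blob("temp/test1.pdf").download_as_bytes()

raw_document = documentai.RawDocument(content=image_content, mime_type="application/pdf")
request = documentai.ProcessRequest(name=name, raw_document=raw_document)
result = client.process_document(request=request)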
This is still an issue 5 months later, and something not mentioned in the accepted answer is (and I could be wrong, but it seems to me) that batch processes can only write their results to GCS. So you still incur the extra step of downloading something from a bucket: the input document under Option 1, or the result under Option 2. On top of that, you end up having to clean up the bucket if you don't want the results left there. So under many circumstances, Option 2 won't present much of an advantage, other than that the result download will probably be smaller than the input file download.
I'm using the client library in a Python Cloud Function and I'm affected by this issue. I'm implementing Option 1 for the reason that it seems simplest and I'm holding out for the fix. I also considered using the Workflow client library to fire a Workflow that runs a Document AI process, or calling the Document AI REST API, but it's all very suboptimal.
I am working with UPnPy, and I immediately notice an issue when attempting to discover devices on my local network. Here is the basic code I am using:
import upnpy
upnp = upnpy.UPnP()
devices = upnp.discover()
This throws the following exception:
Traceback (most recent call last):
File "C:\Users\name\Projects\pythonProject\main.py", line 5, in <module>
devices = upnp.discover()
File "C:\Users\name\Projects\pythonProject\venv\lib\site-packages\upnpy\upnp\UPnP.py", line 33, in discover
for device in self.ssdp.m_search(discover_delay=delay, st='upnp:rootdevice', **headers):
File "C:\Users\name\Projects\pythonProject\venv\lib\site-packages\upnpy\ssdp\SSDPRequest.py", line 50, in m_search
devices = self._send_request(self._get_raw_request())
File "C:\Users\name\Projects\pythonProject\venv\lib\site-packages\upnpy\ssdp\SSDPRequest.py", line 100, in _send_request
device = SSDPDevice(addr, response.decode())
File "C:\Users\name\Projects\pythonProject\venv\lib\site-packages\upnpy\ssdp\SSDPDevice.py", line 87, in __init__
self._get_services_request()
File "C:\Users\name\Projects\pythonProject\venv\lib\site-packages\upnpy\ssdp\SSDPDevice.py", line 23, in wrapper
return func(device, *args, **kwargs)
File "C:\Users\name\Projects\pythonProject\venv\lib\site-packages\upnpy\ssdp\SSDPDevice.py", line 54, in wrapper
return func(instance, *args, **kwargs)
File "C:\Users\name\Projects\pythonProject\venv\lib\site-packages\upnpy\ssdp\SSDPDevice.py", line 171, in _get_services_request
event_sub_url = service.getElementsByTagName('eventSubURL')[0].firstChild.nodeValue
AttributeError: 'NoneType' object has no attribute 'nodeValue'
I have been researching the cause of this but have found nothing. I am using UPnPy version 1.1.8 and PyCharm as my IDE. I've tried previous versions of UPnPy, but none seem to work. Any help would be appreciated. Thanks!
Most likely you have a non-compliant UPnP Device in your home network that is serving a non-standard/broken XML at their description location, and upnpy was not smart enough to handle the parsing error and perhaps ignore that device.
That scenario is more common than you might think: many Smart TVs (LG ones for sure) have an embedded device that advertises as UPnP but their description endpoint answers JSON instead of XML!
Some suggestions:
Use a different library or app (you could try my own), at least to identify the culprit. Turn on verbosity and look for warnings and parsing-error logs.
Use a packet sniffer such as tcpdump to capture SSDP NOTIFY advertisements (UDP port 1900), and manually open each LOCATION URL in a browser to see if they're valid XML.
Selectively turn off / unplug devices you think might be UPnP-enabled, such as Smart TVs, home theaters, video game consoles, routers, etc., to see which one was serving the bogus XML.
Edit your local copy of upnpy to handle that error, for example by enclosing the failing function/line in a try/except block and printing some details about what it is trying to parse prior to the error, as sketched below.
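For that last suggestion, a rough sketch of the edit around the failing line in SSDPDevice._get_services_request (upnpy/ssdp/SSDPDevice.py, around line 171 in 1.1.8); the surrounding loop is an assumption about the library's internals:

try:
    event_sub_url = service.getElementsByTagName('eventSubURL')[0].firstChild.nodeValue
except (IndexError, AttributeError):
    # The device's description XML is missing or malformed at this point;
    # dump it so you can identify the culprit, then skip this service.
    print('Skipping service with unparseable description:')
    print(service.toxml())
    continue  # assumes this statement sits inside the loop over services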
Using pysolr, I'm trying to add a document to Solr with Python 3.9 and getting the HTTP 400 error below, even with only one or two fields.
The fields I'm using here are dynamic fields.
There is no issue connecting to Solr.
#!/usr/bin/python
import pysolr

solr = pysolr.Solr('http://localhost:8080/solr/', always_commit=True)
if solr.ping():
    print('connection successful')

docs = [{'id': '123c', 's_chan_name': 'TV-201'}]
solr.add(docs)
res = solr.search('123c')
print(res)
Getting below error:
connection successful
Traceback (most recent call last):
  File "/tmp/test-solr.py", line 8, in <module>
    solr.add(docs)
  File "/usr/local/lib/python3.9/site-packages/pysolr.py", line 1042, in add
    return self._update(
  File "/usr/local/lib/python3.9/site-packages/pysolr.py", line 568, in _update
    return self._send_request(
  File "/usr/local/lib/python3.9/site-packages/pysolr.py", line 463, in _send_request
    raise SolrError(error_message % (resp.status_code, solr_message))
pysolr.SolrError: Solr responded with an error (HTTP 400): [Reason: None]
<html><head><meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 400 Unexpected character '[' (code 91) in prolog;
expected '<' at [row,col {unknown-source}]: [1,1]</title>
</head><body><h2>HTTP ERROR 400</h2><p>Problem accessing /solr/update/.
Reason:<pre> Unexpected character '[' (code 91) in prolog;
expected '<' at [row,col {unknown-source}]: [1,1]</pre>
</p><hr><a href="http://eclipse.org/jetty">
Powered by Jetty://9.4.15.v20190215</a><hr/></body></html>
I’d recommend looking at the debug logs (which might require you to use logging.basicConfig() to enable them) and testing the URLs it’s using yourself. Depending on your Solr configuration, it might be the case that your Solr URL needs to have the core at the end (e.g. http://localhost:8983/solr/mycore).
In general, most reports like this which we get turn out to be oddities in how someone configured Solr. In addition to the debug logging, I highly recommend looking through the test suite since that does everything needed to run Solr and can be a handy point for comparison:
https://github.com/django-haystack/pysolr
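As a minimal sketch of both suggestions, assuming your core is named mycore (replace it with whatever core you actually created):

import logging
logging.basicConfig(level=logging.DEBUG)  # surfaces pysolr's debug logging

import pysolr

# Point at the core, not the bare /solr/ root
solr = pysolr.Solr('http://localhost:8080/solr/mycore', always_commit=True)
solr.add([{'id': '123c', 's_chan_name': 'TV-201'}])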
The solr.add method supports only a single document:
docs = {'id': '123c', 's_chan_name': 'TV-201'}
solr.add(docs)
It is not suitable for an iterable. For a list of documents, use add_many:
docs = [{'id': '123c', 's_chan_name': 'TV-201'}]
solr.add_many(docs)
This will work.
SaltStack + docker-py: AttributeError: 'RecentlyUsedContainer' object has no attribute 'lock'
I have been digging into this issue to no avail. I'm trying to use SaltStack to manage my docker images/containers but ran into this problem.
Initially I was using the salt state docker.running, but that presented as though the command did not exist. When I changed the state to docker.pulled, I got the traceback I posted over at that GitHub issue:
ID: scheduler
Function: docker.pulled
Result: False
Comment: An exception occurred in this state: Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/salt/state.py", line 1563, in call
**cdata['kwargs'])
File "/usr/lib/python2.7/dist-packages/salt/states/dockerio.py", line 271, in pulled
returned = pull(name, tag=tag, insecure_registry=insecure_registry)
File "/usr/lib/python2.7/dist-packages/salt/modules/dockerio.py", line 1599, in pull
client = _get_client()
File "/usr/lib/python2.7/dist-packages/salt/modules/dockerio.py", line 277, in _get_client
client._version = client.version()['ApiVersion']
File "/usr/local/lib/python2.7/dist-packages/docker/client.py", line 837, in version
return self._result(self._get(url), json=True)
File "/usr/local/lib/python2.7/dist-packages/docker/clientbase.py", line 86, in _get
return self.get(url, **self._set_request_timeout(kwargs))
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 310, in get
#: Stream response content default.
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 279, in request
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 374, in send
url=request.url,
File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 155, in send
**proxy_kwargs)
File "/usr/local/lib/python2.7/dist-packages/docker/unixconn/unixconn.py", line 74, in get_connection
with self.pools.lock:
AttributeError: 'RecentlyUsedContainer' object has no attribute 'lock'
Started: 09:33:42.873628
Duration: 22.115 ms
After searching Google a bit more and coming up with nothing, I went ahead and started reading the source.
After reading unixconn.py and realizing that RecentlyUsedContainer was coming from urllib3, I went and tracked down the source for that and discovered that there was a _lock attribute that was changed to lock a while ago. That seemed strange.
I looked closer at the imports and realized that unixconn.py was attempting to use requests' bundled urllib3 and then falling back to the standalone urllib3. So I checked out the requests urllib3 and found that it did, indeed, have the _lock -> lock change. But it was newer than my version of requests, so I upgraded requests and tried again. Still no dice - same AttributeError.
Now things start to get weird.
In order to get information back to my salt master, I started mucking with the docker-py and urllib3 code on my salt minion. At first I raised exceptions with urllib3.__file__ to make sure I was using the right file. But occasionally the file name it returned pointed to a file and folder that did not exist. Usually it displayed /usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/_collections.pyc, but when I deleted that file, thinking that maybe the cached .pyc was causing a problem, it would still report that as the __file__, even though it didn't exist.
Then I discovered inspect.getfile. And I got the same bizarre behavior - I could delete the .pyc file and yet inspect.getfile(self.pools) would return the non-existent file.
To make life even better, I've added
raise Exception('Pining for the Fjords')
to
/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/_collections.py
at the end of RecentlyUsedContainer.__init__. Yet that exception is never raised.
And I have just confirmed that something is in fact lying to me, because despite changing unixconn.py to
def get_connection(self, url, proxies=None):
    import inspect
    r = RecentlyUsedContainer(10)
    raise Exception(inspect.getfile(r.__class__) + '\n' + r.__doc__)
which returns /usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/_collections.pyc, when I go edit that .pyc and modify the RecentlyUsedContainer's docstring I get the original docstring.
And finally, when I edit /usr/lib/python2.7/dist-packages/urllib3/_collections.pyc and change its docstring (or the same path but _collections.py instead)...
I still get the same docstring!
Why is the wrong code getting executed here, and how can I find out where it is so I can fix the problem?
So I finally figured out the problem:
It did have something to do with salt. For some reason, the way the salt minion imported the docker-py library put some sort of... partial hold on the imports. I suspect that salt was re-importing just the docker-py library specifically, so when I made changes to those files the changes would show up.
However, since the Python import mechanism searches for already-imported modules first, the urllib3 code was never re-imported.
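A minimal illustration of that import caching, independent of salt:

import sys
import urllib3  # first import executes the file and caches the module object

# Any later import, e.g. from docker-py's unixconn.py, just returns the
# cached object from sys.modules -- the files on disk are never re-read.
print(sys.modules['urllib3'] is urllib3)  # True

import urllib3 as again  # no re-execution happens here
print(again is urllib3)  # True: edits to the .py/.pyc stay invisible
                         # until the interpreter (the salt minion) restarts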
Ultimately all that is required is to restart the salt minion:
salt 'my-minion' cmd.run "nohup /bin/sh -c 'sleep 10 && salt-call --local service.restart salt-minion'"
I'm trying to call a SOAP webservice from the Dutch land register (WSDL here). I first tried doing that using the pysimplesoap library. Although I do get relevant xml back, pysimplesoap gives a TypeError: Tag: IMKAD_Perceel invalid (type not found) (I created a SO question about that here). Since I suspect this to be a bug in pysimplesoap I'm now trying to use the suds library.
In pysimplesoap the following returned correct xml (but as I said pysimplesoap gave a TypeError):
from pysimplesoap.client import SoapClient

client = SoapClient(wsdl='http://www1.kadaster.nl/1/schemas/kik-inzage/20141101/verzoekTotInformatie-2.1.wsdl',
                    username=xxx, password=xxx, trace=True)
response = client.VerzoekTotInformatie(
    Aanvraag={
        'berichtversie': '4.7',  # refers to the schema version: http://www.kadaster.nl/web/show?id=150593&op=/1/schemas/homepage.html
        'klantReferentie': 'MyReference1',  # a reference we can set ourselves
        'productAanduiding': '1185',  # a four-digit code for whether the response should be "XML" (1185), "PDF" (1191) or "XML and PDF" (1057)
        'Ingang': {
            'Object': {
                'IMKAD_KadastraleAanduiding': {
                    'gemeente': 'ARNHEM',
                    'sectie': 'AC',
                    'perceelnummer': '1234'
                }
            }
        }
    }
)
This produced the xml below:
<soap:Body>
<VerzoekTotInformatieRequest xmlns="http://www.kadaster.nl/schemas/kik-inzage/20141101">
<Aanvraag xmlns="http://www.kadaster.nl/schemas/kik-inzage/ip-aanvraag/v20141101">
<berichtversie xmlns="http://www.kadaster.nl/schemas/kik-inzage/ip-aanvraag/v20141101">4.7</berichtversie>
<klantReferentie xmlns="http://www.kadaster.nl/schemas/kik-inzage/ip-aanvraag/v20141101">ARNHEM-AC-1234</klantReferentie>
<productAanduiding xmlns="http://www.kadaster.nl/schemas/kik-inzage/ip-aanvraag/v20141101">1185</productAanduiding>
<Ingang xmlns="http://www.kadaster.nl/schemas/kik-inzage/ip-aanvraag/v20141101">
<Object xmlns="http://www.kadaster.nl/schemas/kik-inzage/ip-aanvraag/v20141101">
<IMKAD_KadastraleAanduiding xmlns="http://www.kadaster.nl/schemas/kik-inzage/ip-aanvraag/v20141101">
<gemeente xmlns="http://www.kadaster.nl/schemas/kik-inzage/ip-aanvraag/v20141101">ARNHEM AC</gemeente>
<sectie xmlns="http://www.kadaster.nl/schemas/kik-inzage/ip-aanvraag/v20141101">AC</sectie>
<perceelnummer xmlns="http://www.kadaster.nl/schemas/kik-inzage/ip-aanvraag/v20141101">5569</perceelnummer>
</IMKAD_KadastraleAanduiding>
</Object>
</Ingang>
</Aanvraag>
</VerzoekTotInformatieRequest>
</soap:Body>
So now I tried changing this code to use suds instead. So far I came up with this:
from suds.client import Client
client = Client(url='http://www1.kadaster.nl/1/schemas/kik-inzage/20141101/verzoekTotInformatie-2.1.wsdl', username='xxx', password='xxx')
Aanvraag = client.factory.create('ns3:Aanvraag')
Aanvraag.berichtversie = '4.7'
Aanvraag.klantReferentie = 'MyReference1'
Aanvraag.productAanduiding = '1185'
IMKAD_KadastraleAanduiding = client.factory.create('ns3:IMKAD_KadastraleAanduiding')
IMKAD_KadastraleAanduiding.gemeente = 'ARNHEM'
IMKAD_KadastraleAanduiding.sectie = 'AC'
IMKAD_KadastraleAanduiding.perceelnummer = '1234'
Object = client.factory.create('ns3:Object')
Object.IMKAD_KadastraleAanduiding = IMKAD_KadastraleAanduiding
Ingang = client.factory.create('ns3:Ingang')
Ingang.Object = Object
Aanvraag.Ingang = Ingang
result = client.service.VerzoekTotInformatie(Aanvraag)
which produces the following xml:
<ns2:Body>
<ns0:VerzoekTotInformatieRequest>
<ns0:Aanvraag>
<ns1:berichtversie>4.7</ns1:berichtversie>
<ns1:klantReferentie>MyReference1</ns1:klantReferentie>
<ns1:productAanduiding>1185</ns1:productAanduiding>
<ns1:Ingang>
<ns1:Object>
<ns1:IMKAD_KadastraleAanduiding>
<ns1:gemeente>ARNHEM</ns1:gemeente>
<ns1:sectie>AC</ns1:sectie>
<ns1:perceelnummer>1234</ns1:perceelnummer>
</ns1:IMKAD_KadastraleAanduiding>
</ns1:Object>
</ns1:Ingang>
</ns0:Aanvraag>
</ns0:VerzoekTotInformatieRequest>
</ns2:Body>
Unfortunately, this results in the server returning a NullPointerException:
Traceback (most recent call last):
File "<input>", line 1, in <module>
result = client.service.VerzoekTotInformatie(Aanvraag)
File "/Library/Python/2.7/site-packages/suds/client.py", line 542, in __call__
return client.invoke(args, kwargs)
File "/Library/Python/2.7/site-packages/suds/client.py", line 602, in invoke
result = self.send(soapenv)
File "/Library/Python/2.7/site-packages/suds/client.py", line 649, in send
result = self.failed(binding, e)
File "/Library/Python/2.7/site-packages/suds/client.py", line 702, in failed
r, p = binding.get_fault(reply)
File "/Library/Python/2.7/site-packages/suds/bindings/binding.py", line 265, in get_fault
raise WebFault(p, faultroot)
WebFault: Server raised fault: 'java.lang.NullPointerException'
This error is of course terribly unhelpful. The error gives no hint whatsoever on what causes the NullPointer.
If I look at the differences between the XML that pysimplesoap and suds send over the wire, the XML from suds is missing a lot of xmlns definitions (although I don't know whether they are needed), and its tag names carry prefixes such as ns0:. I don't know if these differences are relevant, and I also don't know how I would make suds create the same XML as pysimplesoap.
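(For reference, the suds XML above can be captured via suds' standard logging; per the suds docs, the suds.client and suds.transport loggers print the envelopes and raw messages:)

import logging
logging.basicConfig(level=logging.INFO)
logging.getLogger('suds.client').setLevel(logging.DEBUG)     # sent/received envelopes
logging.getLogger('suds.transport').setLevel(logging.DEBUG)  # raw HTTP traffic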
Although the wsdl file of the service is public, the service itself is paid (€60 yearly + €3 for every successful request). So I guess it is hard/impossible for people reading this to reproduce the issue, and I can't really give out my user credentials here.
But since I'm really stuck on this issue, maybe someone can give me some tips on how to debug this? For example; how can I make suds create the same xml as pysimplesoap? Or how I can get more information on the nullpointer?
Any help is welcome!
This is not so much an answer as advice from prior experience with Python and SOAP.
1. Find some good (established, a reference for SOAP) Java tool for making SOAP queries given a WSDL.
2. Make some typical queries that interest you, and record what is sent / received as templates.
3. Forget the Python SOAP libraries and just use a template to query the SOAP endpoint (there are many templating languages for Python); see the sketch below.
If step 2 fails with the prominent Java tool, contact the tech support of the service you are paying for.
Have you checked whether all those nice XSDs are really downloaded by Python SOAP clients?
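To illustrate step 3: a rough sketch of the template approach, assuming you saved a known-good envelope (for example the one pysimplesoap produced) as envelope_template.xml with $-placeholders. The endpoint URL and SOAPAction header are placeholders you would take from the WSDL:

from string import Template

import requests

with open('envelope_template.xml') as f:
    envelope = Template(f.read()).substitute(
        gemeente='ARNHEM', sectie='AC', perceelnummer='1234')

response = requests.post(
    'https://endpoint-from-the-wsdl',  # placeholder: the real endpoint is in the WSDL
    data=envelope.encode('utf-8'),
    headers={'Content-Type': 'text/xml; charset=utf-8', 'SOAPAction': ''},
    auth=('xxx', 'xxx'),  # the same credentials as above
)
print(response.status_code)
print(response.text)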