PyODBC Cursor.fetchall() causes python to crash (segfault)

I am using Python 2.7 on Windows XP.
I have a simple python script on a schedule that uses pyodbc to grab data from an AR database which has worked perfectly until today. I get a segfault once the cursor reaches a particular row. I have similar code in C++ which has no problem retrieving the results, so I figure this is an issue with pyodbc. Either way, I'd like to "catch" this error. I've tried to use the subprocess module, but it doesn't seem to work since once the script hits a segfault it just hangs on the "python.exe has encountered a problem and needs to close." message. I guess I could set some arbitrary time frame for it to complete in and, if it doesn't, force close the process, but that seems kind of lame.
I have reported the issue here as well - http://code.google.com/p/pyodbc/issues/detail?id=278
@paulsm4 - I have answered your questions below, thanks!
Q: You're on Windows/XP (32-bit, I imagine), Python 2.7, and BMC
Remedy AR. Correct?
A: Yes, it fails on Win XP 32 bit and Win Server 2008 R2 64 bit.
Q: Is there any chance you (or perhaps your client, if they purchased
Remedy AR) can open a support call with BMC?
A: Probably not...
Q: Can you isolate which column causes the segfault? "What's
different" when the segfault occurs?
A: Just this particular row...but I have now isolated the issue with your suggestions below. I used a loop to fetch each field until a segfault occurred.
cursor.columns(table="mytable")
result = cursor.fetchall()
columns = [x[3] for x in result]
for x in columns:
    print x
    cursor.execute("""select "{0}"
                      from "mytable"
                      where id = 'abc123'""".format(x))
    cursor.fetchall()
Once I identified the column that causes the segfault, I tried a query for all columns EXCEPT that one, and sure enough it worked with no problem.
The column's data type was CHAR(1024). I used C++ to grab the data and noticed that the column for that row had the most characters in it of any row: 1023! This makes me think there is a buffer in the C code for pyodbc that is getting written beyond its boundaries.
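The check itself was just the column loop above restricted to the remaining columns, something like (names as in the earlier snippet; the suspect column name is a placeholder):
good_columns = [c for c in columns if c != "suspect_column"]  # placeholder name
query = 'select {0} from "mytable" where id = \'abc123\''.format(
    ", ".join('"{0}"'.format(c) for c in good_columns))
cursor.execute(query)
cursor.fetchall()  # completes without the segfault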
2) Enable ODBC tracing: http://support.microsoft.com/kb/274551
3) Post back the results (including the log trace of the failure)
Ok, I have created a pastebin with the results of the ODBC trace - http://pastebin.com/6gt95rB8. To protect the innocent, I have masked some string values.
Looks like it may have been due to data truncation.
Does this give us enough info as to how to fix the issue? I'm thinking it's a bug within PyODBC since using the C ODBC API directly works fine.
Update
So I compiled PyODBC for debugging and I got an interesting message -
Run-Time Check Failure #2 - Stack around the variable 'tempBuffer' was corrupted.
While I don't currently understand it, the call stack is as follows -
pyodbc.pyd!GetDataString(Cursor * cur=0x00e47100, int iCol=0) Line 410 + 0xf bytes C++
pyodbc.pyd!GetData(Cursor * cur=0x00e47100, int iCol=0) Line 697 + 0xd bytes C++
pyodbc.pyd!Cursor_fetch(Cursor * cur=0x00e47100) Line 1032 + 0xd bytes C++
pyodbc.pyd!Cursor_fetchlist(Cursor * cur=0x00e47100, int max=-1) Line 1063 + 0x9 bytes C++
pyodbc.pyd!Cursor_fetchall(_object * self=0x00e47100, _object * args=0x00000000) Line 1142 + 0xb bytes C++
Resolved!
Problem was solved by ensuring that the buffer had enough space: the column is CHAR(1024), so a full-length value plus its NUL terminator needs 1025 bytes.
In getdata.cpp on line 330
char tempBuffer[1024];
Was changed to
char tempBuffer[1025];
Compiled and replaced the old pyodbc.pyd file in site-packages and we're all good!
Thanks for your help!

Q: You're on Windows/XP (32-bit, I imagine), Python 2.7, and BMC Remedy AR. Correct?
Q: Is there any chance you (or perhaps your client, if they purchased Remedy AR) can open a support call with BMC?
Q: Can you isolate which column causes the segfault? "What's different" when the segfault occurs?
Please do the following:
1) Try different "select a,b,c" statements using Python/ODBC to see if you can reproduce the problem (independent of your program) and isolate a specific column (or, ideally, a specific column and row!)
2) Enable ODBC tracing:
http://support.microsoft.com/kb/274551
3) Post back the results (including the log trace of the failure)
4) If that doesn't work - and if you can't get BMC Technical Support involved - then Plan B might be to debug at the ODBC library level:
How to debug C extensions for Python on Windows
Q: What C/C++ compiler would work best for you?

For anyone else who might get this error, check the data type of the column being returned. In my case, it was a datetime column. Using select convert(varchar, getdate(), 20) from xxx (or any of the other convert formats) gave the desired result.
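In pyodbc terms, that workaround is just a matter of casting in the query, e.g. (table and column names are placeholders):
cursor.execute("select convert(varchar, my_datetime_col, 20) from mytable")
rows = cursor.fetchall()  # the datetime now arrives as a plain string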

Related

Searching a list using astroquery

I have the following code below that I got from https://astroquery.readthedocs.io/en/latest/gaia/gaia.html.
I get the br when ra and dec is a number. However, I have not just one number, but a list of ra and dec. When I tried putting a list in for ra and dec in the code below, I got an error saying Error 500: null. Is there a way to find br using a list of ra and dec?
coord = SkyCoord(ra=, dec=, unit=(u.degree, u.degree), frame='icrs')
width = u.Quantity(0.0005, u.deg)
height = u.Quantity(0.0005, u.deg)
r = Gaia.query_object_async(coordinate=coord, width=width, height=height)
r.pprint()
r.columns
br=[r["phot_bp_rp_excess_factor"]]
print (br)
I am new to astroquery, so any help will be appreciated.
Hi and congrats on your first StackOverflow question. As I understand it, Astroquery is a community effort, and the modules for querying individual online catalogues are in many cases being developed and maintained side-by-side with the online query systems, often by their same developers. So different modules within Astroquery are being worked on sometimes by different teams, and have some varying degree of feature-completeness and consistency in interfaces (something I'd like to see improved but it's very difficult to coordinate).
In the case of Gaia.query_object_async, the docs are not completely clear, and it's not obvious that it even supports array SkyCoord. It should, or at least if it doesn't it should give a better error.
To double check, I dug into the code and found what I kind of suspected: It does not at all allow this. It is building a SQL query using string replacement (generally considered a bad idea) and passing that SQL query to a web-based service. Because the ra and dec values are arrays, it just blindly passes those array representations into the template for the SQL query, resulting in an invalid query:
SELECT
  TOP 50
  DISTANCE(
    POINT('ICRS', ra, dec),
    POINT('ICRS', [99.00000712 87.00000767], [24.99999414 24.99999461])
  ) as dist,
  *
FROM
  gaiadr2.gaia_source
WHERE
  1 = CONTAINS(
    POINT('ICRS', ra, dec),
    BOX(
      'ICRS',
      [99.00000712 87.00000767],
      [24.99999414 24.99999461],
      0.005,
      0.005
    )
  )
ORDER BY
  dist ASC
The server, rather than return an error message suggesting that the query is malformed, instead just returns a general server error. Basically it crashes.
Long story short, you should probably open a bug report about this against astroquery, and see if it's on the Gaia maintainers' radar to deal with: https://github.com/astropy/astroquery/issues/new
In the meantime it sounds like your best bet is to make multiple queries in a loop and join their results together. Since it returns a Table, you can use astropy.table.vstack:
from astropy.table import vstack

results = []
for coord in coords:
    results.append(Gaia.query_object_async(coord, width=width, height=height))
results = vstack(results)
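For instance, if your coordinates start out as plain lists of ra/dec values, the loop might look like this (the list names and values are illustrative):
from astroquery.gaia import Gaia
from astropy.coordinates import SkyCoord
from astropy.table import vstack
import astropy.units as u

ra_list = [99.00000712, 87.00000767]    # illustrative values
dec_list = [24.99999414, 24.99999461]

width = u.Quantity(0.0005, u.deg)
height = u.Quantity(0.0005, u.deg)

results = []
for ra, dec in zip(ra_list, dec_list):
    coord = SkyCoord(ra=ra, dec=dec, unit=(u.degree, u.degree), frame='icrs')
    results.append(Gaia.query_object_async(coordinate=coord,
                                           width=width, height=height))
results = vstack(results)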

Neo4J / py2neo -- cursor-based query?

If I do something like this:
from py2neo import Graph
graph = Graph()
stuff = graph.cypher.execute("""
match (a:Article)-[p]-n return a, n, p.weight
""")
on a database with lots of articles and links, the query takes a long time and uses all my system's memory, presumably because it's copying the entire result set into memory in one go. Is there some kind of cursor-based version where I could iterate through the results one at a time without having to have them all in memory at once?
EDIT
I found the stream function:
stuff = graph.cypher.stream("""
match (a:Article)-[p]-n return a, n, p.weight
""")
which seems to be what I want according to the documentation but now I get a timeout error (py2neo.packages.httpstream.http.SocketError: timed out), followed by the server becoming unresponsive until I kill it using kill -9.
Have you tried implementing a paging mechanism? Perhaps with the skip keyword: http://neo4j.com/docs/stable/query-skip.html
Similar to using limit / offset in a postgres / mysql query.
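A paging loop along these lines might work (a sketch assuming the same py2neo cypher API used in the question; the page size and process_record handler are placeholders):
from py2neo import Graph

graph = Graph()
page_size = 1000  # tune to what your server and memory can handle
skip = 0
while True:
    page = graph.cypher.execute(
        "MATCH (a:Article)-[p]-(n) "
        "RETURN a, n, p.weight "
        "SKIP {skip} LIMIT {limit}",
        skip=skip, limit=page_size)
    records = list(page)
    if not records:
        break
    for record in records:
        process_record(record)  # placeholder for your per-record handling
    skip += page_size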
EDIT: I previously said that the entire result set was stored in memory, but it appears this is not the case when using api streaming - per Nigel's (Neo engineer) comment below.

Error message: Protocol wrong type for socket with py2neo

I am using py2neo to query a set of data from a neo4j graph database and to create relationships between nodes once appropriate information has been achieved within a python script.
Here is a basic structure of the script.
from py2neo import Graph

### get a set of data from a graph database
graph = Graph()
query = graph.cypher.execute("MATCH (p:Label1)-[:relationship]->(c:Label2 {Name:'property'}) \
                              RETURN p.Id, p.value1, c.value2 LIMIT 100")
### do a data analysis within a python script here
~ ~ ~
### update newly found information through the analysis to the graph database
tx = graph.cypher.begin()
qs1 = "UNWIND range(0,{size}-1) AS r \
       MATCH (p1:Label1 {Id:{xxxx}[r]}),(p2:Label2 {Id:{yyyy}[r]}) \
       MERGE (p1)-[:relationship {property:{value}[r]}]->(p2)"
tx.append(qs1, parameters = {"size":size, \
"xxxx":xxxx, \
"yyyy":yyyy, \
"value":value})
tx.commit()
The script does perform as it is supposed to do when the query results are limited to 100, but when I increase 200 or above, the program crashes, leaving the following error message:
--> 263 tx.commit()
SocketError: Protocol wrong type for socket
Unfortunately, besides the above statement, there is no other useful information that may hint what the problem might be. Did anyone have this sort of problem and could you suggest what the underlying issues may be?
Thank you.
I haven't looked into this in much depth.
I ran into the same error when using:
tx = graph.cypher.begin()
tx.append("my cypher statement, i.e. MATCH STUFF")
tx.process()
....
tx.commit()
I was appending 1000 statements between each process call, and every 2000 statements I'd commit.
When I reduce the number of statements between process & commit to 250/500 it works fine (similar to you limiting to 100).
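For illustration, the pattern I ended up with looks roughly like this (a sketch; the statements list is a placeholder and the batch sizes are just what worked for me):
tx = graph.cypher.begin()
for i, statement in enumerate(statements, start=1):  # statements: your cypher strings
    tx.append(statement)
    if i % 250 == 0:    # process in small batches...
        tx.process()
    if i % 500 == 0:    # ...and commit periodically, starting a fresh transaction
        tx.commit()
        tx = graph.cypher.begin()
tx.commit()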
The actual error in the stack trace is from python's http/client:
File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 888, in send
self.sock.sendall(data)
OSError: [Errno 41] Protocol wrong type for socket
This is an EPROTOTYPE error, so it's happening on the client side and not in neo4j.
Are you on OS X? (I am.) This looks like it could be the issue:
Chasing an EPROTOTYPE Through Rust, Sendto, and the OSX Kernel With C-Reduce. I don't know enough about underlying socket behaviour to tell you why reducing the amount we're sending over the connection removes the race condition.

Python for Keithley

I hooked up the Keithley 2701 DMM, installed the software and set the IPs right. I can access and control the instrument via the internet explorer webpage and the Keithley communicator. When I try to use python, it detects the instrument
i.e. a=visa.instrument("COM1") doesn't give an error.
I can write to the instrument as well:
a.write("*RST")
a.write("DISP:ENAB ON/OFF")
a.write("DISP:TEXT:STAT ON/OFF")
etc all don't give any error but no change is seen on the instrument screen.
However when I try to read back, a.ask("*IDN?") etc give me an error
saying timeout expired before operation completed.
I tried redefining as:
a=visa.instrument("COM1",timeout=None)
a=visa.instrument("TCPIP::<the IP adress>::1354::SOCKET")
and a few other possible combinations but I'm getting the same error.
Please do help.
The issue with communicating with the 2701 might be an invalid termination character. By default the termination character has the value CR+LF, which is "\r\n".
The python code to set the termination character is:
theInstrument = visa.instrument("TCPIP::<IPaddress>::1394::SOCKET", term_chars="\n")
or
theInstrument = visa.instrument("TCPIP::<IPaddress>::1394::SOCKET")
theInstrument.term_chars = "\n"
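With the termination character set, the read-back from the question should then complete instead of timing out (assuming the same legacy pyvisa API as above):
print theInstrument.ask("*IDN?")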
I hope this helps,

Passing a record over a socket

I have basic socket communication set up between python and Delphi code (text only). Now I would like to send/receive a record of data on both sides. I have a Record "C compatible" and would like to pass records back and forth have it in a usable format in python.
I use conn.send("text") in python to send the text but how do I send/receive a buffer with python and access the record items sent in python?
Record
TPacketData = record
    pID      : Integer;
    dataType : Integer;
    size     : Integer;
    value    : Double;
end;
I don't know much about Python, but I have done a lot between Delphi, C++, C# and Java, even with COBOL. Anyway, to send a record from Delphi to C, first you need to pack the record at both ends,
in Delphi
MyRecord = packed record
in C++
#pragma pack(1)
I don't know the Python equivalent, but I guess there must be a similar one. Make sure that sizeof(MyRecord) is the same length on both sides. Also, before sending the records, you should take care of byte ordering (you know, Little-Endian vs Big-Endian): use socket.htonl() and socket.ntohl() in Python and the equivalents in Delphi, which are in the WinSock unit. Also, a "double" in Delphi may not be the same as in Python; in Delphi it is 8 bytes, so check this as well and change it to Single (4 bytes) or Extended (10 bytes), whichever matches.
If all of that matches, then you can send/receive binary records in one shot; otherwise, I'm afraid, you have to send the individual fields one by one.
I know this answer is a bit late to the game, but may at least prove useful to other people finding this question in their search-results. Because you say the Delphi code sends and receives "C compatible data" it seems that for the sake of the answer about Python's handling it is irrelevant whether it is Delphi (or any other language) on the other end...
The python struct and socket modules have all the functionality for the basic usage you describe. To send the example record you would do something like the below. For simplicity and sanity I have presumed signed integers and doubles, and packed the data in "network order" (bigendian). This can easily be a one-liner but I have split it up for verbosity and reusability's sake:
import struct
t_packet_struc = '>iiid'
t_packet_data = struct.pack(t_packet_struc, pid, data_type, size, value)
mysocket.sendall(t_packet_data)
Of course the mentioned "presumptions" don't need to be made, given tweaks to the format string, data preparation, etc. See the struct inline help for a description of the possible format strings - which can even process things like Pascal-strings... By the way, the socket module allows packing and unpacking a couple of network-specific things which struct doesn't, like IP-address strings (to their bigendian int-blob form), and allows explicit functions for converting data bigendian-to-native and vice-versa. For completeness, here is how to unpack the data packed above, on the Python end:
t_packet_size = struct.calcsize(t_packet_struc)
t_packet_data = mysocket.recv(t_packet_size)
(pid, data_type, size, value) = struct.unpack(t_packet_struc, t_packet_data)
I know this works in Python version 2.x, and suspect it should work without changes in Python version 3.x too. Beware of one big gotcha (because it is easy to not think about, and hard to troubleshoot after the fact): Aside from different endianness, you can also distinguish between packing things using "standard size and alignment" (portably) or using "native size and alignment" (much faster) depending on how you prefix - or don't prefix - your format string. These can often yield wildly different results than you intended, without giving you a clue as to why... (there be dragons).
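To make that gotcha concrete, here is a quick check (the native result is typical for x86-64; exact sizes vary by platform):
import struct

# standard size/alignment, bigendian: 3 * 4-byte ints + 8-byte double = 20 bytes
print struct.calcsize('>iiid')  # 20
# native size/alignment: padding before the double commonly inflates this to 24
print struct.calcsize('iiid')   # e.g. 24 on x86-64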
