Python poplib error_proto: line too long

Python poplib error_proto: line too long - python

Recently, without changes to codes/libs, I started getting python error_proto: line too long error when reading email (poplib.retr) from hotmail inbox. I am using Python version 2.7.8. I understand that a long line may be caused this error. But is there a way to go around this or a certain version I need to put in place. Thank you for any advice/direction anyone can give.
Here is a traceback error:
"/opt/rh/python27/root/usr/lib64/python2.7/poplib.py", line 232, in retr\n return self._longcmd(\'RETR %s\' % which)\n',
' File "/opt/rh/python27/root/usr/lib64/python2.7/poplib.py", line 167, in _longcmd\n return self._getlongresp()\n',
' File "/opt/rh/python27/root/usr/lib64/python2.7/poplib.py", line 152, in _getlongresp\n line, o = self._getline()\n',
' File "/opt/rh/python27/root/usr/lib64/python2.7/poplib.py", line 377, in _getline\n raise error_proto(\'line too long\')\n',
'error_proto: line too long\n'

A python bug report exists for this issue here: https://bugs.python.org/issue16041
The work around I put inplace was as follows:
import poplib
poplib._MAXLINE=20480
I thought this was a better idea, rather than editing the poplib.py library file directly.
Woody

Are you sure you've not updated poplib? Have a look at the most recent diff, committed last night:
# Added:
...
# maximal line length when calling readline(). This is to prevent
# reading arbitrary length lines. RFC 1939 limits POP3 line length to
# 512 characters, including CRLF. We have selected 2048 just to be on
# the safe side.
_MAXLINE = 2048
...
# in_getline()...
if len(self.buffer) > _MAXLINE:
raise error_proto('line too long')
...it looks suspiciously similar to your problem.
So if you roll back to the previous version, it will probably be OK.

Related

python Twitch-chatbot MONKALOT encounters json error on startup

Presently I'm trying to make MONKALOT run on a PythonAnywhere account (customized Web Developer). I have basic knowledge of Linux but unfortunately no knowledge of dev'oping python scripts but advanced knowledge of dev'oping Java (hope that helps).
My success log so far:
After upgrading my account to Web Developer level I finally made pip download the (requirements)[https://github.com/NMisko/monkalot/blob/master/requirements.txt] and half the internet (2 of 5GB used). All modules and dependencies seem to be successfully installed.
I configured my own monkalot-channel including OAuth which serves as a staging instance for now. The next challenge was how to get monkalot starting up. Using python3.7 instead of python or any other python3 environment did the trick.
But now I'm stuck. After "completing the training stage" the monkalot-script prematurely ends with the following message:
[22:14] ...chat bot finished training.
Traceback (most recent call last):
File "monkalot.py", line 72, in <module>
bots.append(TwitchBot(path))
File "/home/Chessalot/monkalot/bot/bot.py", line 56, in __init__
self.users = self.twitch.get_chatters()
File "/home/Chessalot/monkalot/bot/data_sources/twitch.py", line 25, in get_chatters
data = requests.get(USERLIST_API.format(self.channel)).json()
File "/usr/local/lib/python3.7/site-packages/requests/models.py", line 900, in json
return complexjson.loads(self.text, **kwargs)
File "/usr/local/lib/python3.7/site-packages/simplejson/__init__.py", line 525, in loads
return _default_decoder.decode(s)
File "/usr/local/lib/python3.7/site-packages/simplejson/decoder.py", line 370, in decode
obj, end = self.raw_decode(s)
File "/usr/local/lib/python3.7/site-packages/simplejson/decoder.py", line 400, in raw_decode
return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
By now I figured out that monkalot tries to load the chatters list and expects at least an empty json array as result but actually seems to receive an empty string.
So my question is: What can I do to make the monkalot-script work? Is monkalot's current version incompatible to the current Twitch-API? Are there any outdated python libraries which may cause the incompatibility? Or is there an unrecognized configuration issue preventing the script from running successfully?
Thank you all in advance. Any ideas provided by you are highly appreciated.

The most likely cause of that is that you are using a free PythonAnywhere account and have not configured monkalot to use the proxy. Check the documentation of monkalot to determine how you can configure it to use a proxy. See https://help.pythonanywhere.com/pages/403ForbiddenError/ for the proxy details.

Only a quick thought, might not be the problem you are encountering, but it may be due to the project name. E.g.:
From github:
... I believed that the issue was something other than the project name, since I get a different error if I use a project name that doesn't exist. However, I just tried using ben-heil/saged instead of just saged for the project name and that seems to have fixed it.
EDIT: your HTTP 404 error was caused by this:
File "monkalot.py", line 72, in <module>
bots.append(TwitchBot(path))
Now this points out that the function called with path is giving an error. Especially since you see a lot of decode in the traceback error, you can deduce it has something to with your characters you inputted.
Other errors in your traceback that point this out:
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
JSONDecodeError: Expecting value: line 1 column 1 (char 0) occurs when we try to parse something that is not valid JSON as if it were. To solve the error, make sure the response or the file is not empty or conditionally check for the content type before parsing.
In most cases your json.loads- JSONDecodeError: Expecting value: line 1 column 1 (char 0) error is due to :
non-JSON conforming quoting
XML/HTML output (that is, a string starting with <), or
incompatible character encoding
In this case, the case caused the error (content type!).
Related sources:
python json decoder
JSONDecodeError: Expecting value: line 1 column 1 (char 0)

After I expected the response, I found out that I received a HTTP 400, Bad Request error WITHOUT any data in the HTTP response body. Since monkalot expects a JSON answer the errors were raised. This was due to the fact that in the channel configuration I used an uppercase letter whereas Twitch expects all letters lowercase.

struct.error: unpack requires a string argument of length 16

While processing a PDF file (2.pdf) with pdfminer (pdf2txt.py) I received the following error:
pdf2txt.py 2.pdf
Traceback (most recent call last):
File "/usr/local/bin/pdf2txt.py", line 115, in <module>
if __name__ == '__main__': sys.exit(main(sys.argv))
File "/usr/local/bin/pdf2txt.py", line 109, in main
interpreter.process_page(page)
File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 832, in process_page
self.render_contents(page.resources, page.contents, ctm=ctm)
File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 843, in render_contents
self.init_resources(resources)
File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 347, in init_resources
self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 195, in get_font
font = self.get_font(None, subspec)
File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 186, in get_font
font = PDFCIDFont(self, spec)
File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdffont.py", line 654, in __init__
StringIO(self.fontfile.get_data()))
File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdffont.py", line 375, in __init__
(name, tsum, offset, length) = struct.unpack('>4sLLL', fp.read(16))
struct.error: unpack requires a string argument of length 16
While the similar file (1.pdf) doesn't cause a problem.
I can't find any information about the error. I added an issue on the pdfminer GitHub repository, but it remained unanswered. Can someone explain to me why this is happening? What can I do to parse 2.pdf?
Update: I get a similar error with BytesIO instead of StringIO after installing pdfminer directly from the GitHub repository.
$ pdf2txt.py 2.pdf
Traceback (most recent call last):
File "/home/danil/projects/python/pdfminer-source/env/bin/pdf2txt.py", line 116, in <module>
if __name__ == '__main__': sys.exit(main(sys.argv))
File "/home/danil/projects/python/pdfminer-source/env/bin/pdf2txt.py", line 110, in main
interpreter.process_page(page)
File "/home/danil/projects/python/pdfminer-source/env/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 839, in process_page
self.render_contents(page.resources, page.contents, ctm=ctm)
File "/home/danil/projects/python/pdfminer-source/env/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 850, in render_contents
self.init_resources(resources)
File "/home/danil/projects/python/pdfminer-source/env/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 356, in init_resources
self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
File "/home/danil/projects/python/pdfminer-source/env/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 204, in get_font
font = self.get_font(None, subspec)
File "/home/danil/projects/python/pdfminer-source/env/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 195, in get_font
font = PDFCIDFont(self, spec)
File "/home/danil/projects/python/pdfminer-source/env/local/lib/python2.7/site-packages/pdfminer/pdffont.py", line 665, in __init__
BytesIO(self.fontfile.get_data()))
File "/home/danil/projects/python/pdfminer-source/env/local/lib/python2.7/site-packages/pdfminer/pdffont.py", line 386, in __init__
(name, tsum, offset, length) = struct.unpack('>4sLLL', fp.read(16))
struct.error: unpack requires a string argument of length 16

TL; DR
Thanks to #mkl and #hynecker for the extra info... With that I can confirm this is a bug in pdfminer and your PDF. Whenever pdfminer tries to get embedded file streams (e.g. font definitions), it is picking up the last one in the file before an endobj. Sadly, not all PDFs rigorously add the end tag and so pdfminer should be resilient to this.
Quick fix for this issue
I've created a patch - which has been submitted as a pull request on github. See https://github.com/euske/pdfminer/pull/159.
Detailed diagnosis
As mentioned in the other answers, the reason you're seeing this is that you're not getting the expected number of bytes from the stream as pdfminer is unpacking the data. But why?
As you can see in your stack trace, pdfminer (rightly) spots that it has a CID font to process. It then goes on to process the embedded font file as a TrueType font (in pdffont.py). It tries to parse the associated stream (stream ID 18) by reading out a set of binary tables.
This doesn't work for 2.pdf because it has a text stream. You can see this by running dumppdf -b -i 18 2.pdf. I've put the start here:
/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0
>> def /CMapName /Adobe-Identity-UCS def
...
So, garbage in, garbage out... Is this a bug in your file or pdfminer? Well, the fact that other readers can handle it made me suspicious.
Digging around a little more, I see that this stream is identical to stream ID 17, which is the cmap for the ToUnicode field. A quick look at the PDF spec shows that these cannot be the same.
Digging in to the code further, I see that all streams are getting the same data. Oops! This is the bug. The cause appears to be related to the fact that this PDF is missing some end tags - as noted by #hynecker.
The fix is to return the right data for each stream. Any other fix to just swallow the error will result in bad data being used for all streams and so, for example, incorrect font definitions.
I believe the attached patch will fix your problem and should be safe to use in general.

I fixed your problem in the source code, and I try on your file 2.pdf to make sure it worked.
In the file pdffont.py I replaced:
class TrueTypeFont(object):
class CMapNotFound(Exception):
pass
def __init__(self, name, fp):
self.name = name
self.fp = fp
self.tables = {}
self.fonttype = fp.read(4)
(ntables, _1, _2, _3) = struct.unpack('>HHHH', fp.read(8))
for _ in xrange(ntables):
(name, tsum, offset, length) = struct.unpack('>4sLLL', fp.read(16))
self.tables[name] = (offset, length)
return
by this:
class TrueTypeFont(object):
class CMapNotFound(Exception):
pass
def __init__(self, name, fp):
self.name = name
self.fp = fp
self.tables = {}
self.fonttype = fp.read(4)
(ntables, _1, _2, _3) = struct.unpack('>HHHH', fp.read(8))
for _ in xrange(ntables):
fp_bytes = fp.read(16)
if len(fp_bytes) < 16:
break
(name, tsum, offset, length) = struct.unpack('>4sLLL', fp_bytes)
self.tables[name] = (offset, length)
return
Explanations
#Nabeel Ahmed was right
The foramt string >4sLLL requires 16 bytes size of buffer, which is specified correctly to fp.read to read 16 bytes at a time.
So, the problem can only be with the buffer stream it's reading i.e. the content of your specific PDF file.
In the code we see that fp.read(16) are made in a loop without any check.Thus, we don't know for sure if it successfully read it all. It could for instance reached an EOF.
To avoid this problem, I just break out of the for loop when this kind of problem appears.
for _ in xrange(ntables):
fp_bytes = fp.read(16)
if len(fp_bytes) < 16:
break
In any regular cases, it shouldn't change anything anyway.
I will try to do a pull request on github, but I'm not even sure it will be accepted so I suggest you do a monkey patch for now and modify your /home/danil/projects/python/pdfminer-source/env/local/lib/python2.7/site-packages/pdfminer/pdffont.py file right now.

This is really an invalid PDF because there are some missing keywords endobj after three indirect objects. (object 5, 18 and 22)
The definition of an indirect object in a PDF file shall consist of its object number and generation number (separated by white space), followed by the value of the object bracketed between the keywords obj and endobj.
(chapter 7.3.10 in PDF reference)
The example 2.pdf is a simple PDF 1.3 version that uses a simple uncompressed cross reference and uncompressed object separators. The failure can be easily found by grep command and by a general file viewer that the PDF has 22 indirect objects. The pattern " obj" is found correctly exactly 22 times (never accidentally in a string object or in a stream, fortunately for simplicity), but the keyword endobj is three times missing.
$ grep --binary-files=text -B1 -A2 -E " obj|endobj" 2.pdf
...
18 0 obj
<< /Length 451967/Length1 451967/Filter [/FlateDecode] >>
stream
...
endstream % # see the missing "endobj" here
17 0 obj
<< /Length 12743 /Filter [/FlateDecode] >>
stream
...
endstream
endobj
...
Similarly the object 5 has no endobj before object 1 and the object 22 has no endobj before object 21.
It is known that broken cross references in PDF can be and should be usually reconstructed by obj/endobj keywords (see the PDF reference, chapter C.2) Some applications do probably vice-versa fix missing endobj if cross references are correct, but it is no written advice.

The last error message tells you a lot:
File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdffont.py", line 375, in
init
(name, tsum, offset, length) = struct.unpack('>4sLLL', fp.read(16))
struct.error: unpack requires a string argument of length 16
You can easily debug what is going on, for example, by putting necessary debug statements exactly in pdffont.py file. My guess is that there is something special about your pdf contents. Judging by the method name - TrueTypeFont - which throws the error message, there is some incompatibility with the font type.

Let start with explaining the statement where you're getting exception:
struct.unpack('>4sLLL', fp.read(16))
where the synopsis is:
struct.unpack(fmt, buffer)
The method unpack, unpacks from the buffer buffer (which
presumably earlier packed by pack(fmt, ...)) according to the
format string fmt. The result is a tuple even if it
contains exactly one item. The buffer’s size in bytes must match the
size required by the format, as reflected by calcsize().
The most common case is, wrong number of bytes (16) for the format used (>4sLLL) - for example, for a format expecting 4 bytes, you have specified 3 bytes:
(name, tsum, offset, length) = struct.unpack('BH', fp.read(3))
for this you'll get
struct.error: unpack requires a string argument of length 4
The reason - the format struct ('BH') expects 4 bytes i.e. when we pack something using 'BH' format it'll occupy 4 bytes of memory.
A good explanation here.
To clarify it further - let's look into the >4sLLL format string. To verify the size unpack 'd be expecting for the buffer (the bytes you're reading from the PDF file). Quoting from docs:
The buffer’s size in bytes must match the size required by the format,
as reflected by calcsize().
>>> import struct
>>> struct.calcsize('>4sLLL')
16
>>>
To this point we can say there's nothing wrong with the statement:
(name, tsum, offset, length) = struct.unpack('>4sLLL', fp.read(16))
The foramt string >4sLLL requires 16 bytes size of buffer, which is specified correctly to fp.read to read 16 bytes at a time.
So, the problem can only be with the buffer stream it's reading i.e. the content of your specific PDF file.
Can be a bug - as per this comment:
This is a bug in the upstream PDFminer by #euske There seems to be
patches for this so it should be an easy fix. Beyond this I also need
to strengthen the pdf parsing such that we never error out from a
failed parse
I'll edit the question it I find something helpful to add here - a solution, or a patch.

In case you still get some struct errors after applying Peter's patch, especially when parsing many files in one script's run (using os.listdir), try changing resource manager caching to false.
rsrcmgr = PDFResourceManager(caching=False)
It helped me to get rid of the rest of errors after applying above solutions.

Python can not compile that regex. sre_constants.error: nothing to repeat [duplicate]

This question already has answers here:
regex error - nothing to repeat
(6 answers)
Closed 3 years ago.
I'm converting a C# function to Python. It's should bug for bug compatible with exist function.
This is a regex in that function: http://[a-zA-Z0-9]+(\.[a-zA-Z0-9]+)+([-A-Z0-9a-z_$.+!*()/\\\,:;#&=?~#%]*)*. But Python can't compile it:
>>> re.compile(r"http://[a-zA-Z0-9]+(\.[a-zA-Z0-9]+)+([-A-Z0-9a-z_$.+!*()/\\\,:;#&=?~#%]*)*")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.3/re.py", line 214, in compile
return _compile(pattern, flags)
File "/usr/lib/python3.3/re.py", line 281, in _compile
p = sre_compile.compile(pattern, flags)
File "/usr/lib/python3.3/sre_compile.py", line 498, in compile
code = _code(p, flags)
File "/usr/lib/python3.3/sre_compile.py", line 483, in _code
_compile(code, p.data, flags)
File "/usr/lib/python3.3/sre_compile.py", line 75, in _compile
elif _simple(av) and op is not REPEAT:
File "/usr/lib/python3.3/sre_compile.py", line 362, in _simple
raise error("nothing to repeat")
sre_constants.error: nothing to repeat
Note: There is a JavaScript version of that regex: /http:\/\/[a-zA-Z0-9]+(\.[a-zA-Z0-9]+)+([-A-Z0-9a-z\$\.\+\!\_\*\(\)\/\,\:;#&=\?~#%]*)*/gi.
I searched about nothing to repeat Error, but got nothing.
Sorry, this is a duplicate post.
Where is the problem?

I've reproduced the error with:
re.compile(r"([A]*)*")
The problem is that [A]* can potentially match an empty string. Guess what happens when it tries to match ([A]*)* when [A]* is empty? "nothing to repeat". The regex engine won't wait around for that to actually happen, though. It fails because it is even remotely possible for the scenario to happen.
This should work for you:
r"http://[a-zA-Z0-9]+(\.[a-zA-Z0-9]+)+([-A-Z0-9a-z_$.+!*()/\\\,:;#&=?~#%]*)"
I just removed the last *.

Had the same error come up with the following regex:
re.compile(r'(?P<term>[0-9]{1,2})-(?P<features>[A-Za-z\:]*)?')
It was the '?' at the end that caused the error. Strictly-speaking, this is NOT a repeat and, in fact, this works just fine (as it should) as of python 2.7.9. However, the bug was present as of python 2.7.3.

Using visa to interface with GPIB always gives me VisaIOError

I'm trying to import visa in Python and interface with GPIB to control a device.
The name of device I'm using is "GPIB0::9::INSTR", and I think there should be no problem with this.
I ran the following code in 2.7.3 Python Shell
>>> from visa import *
>>> a = instrument("GPIB0::9", timeout = 20)
>>> a.write("*IDN?")
>>> print a.read()
Traceback (most recent call last):
File "<pyshell#53>", line 1, in <module>
print a.read()
File "C:\Python27\lib\site-packages\pyvisa\visa.py", line 433, in read
return self._strip_term_chars(self.read_raw())
File "C:\Python27\lib\site-packages\pyvisa\visa.py", line 407, in read_raw
chunk = vpp43.read(self.vi, self.chunk_size)
File "C:\Python27\lib\site-packages\pyvisa\vpp43.py", line 840, in read
visa_library().viRead(vi, buffer, count, byref(return_count))
File "C:\Python27\lib\site-packages\pyvisa\vpp43.py", line 398, in check_status
raise visa_exceptions.VisaIOError, status
VisaIOError: VI_ERROR_TMO: Timeout expired before operation completed.
Above is the error the system gave me.
Actually at the beginning, I set the Timeout to be 3, it shows this errot. But after I changed the value to be 20 as shown above, it still didn't work.
Can somebody help me?

There are different problems that could lead to a timeout.
First you should check if your device supports the *IDN?query. It is a IEEE-488.2 standard command, so chances are high it is supported (if not check your manual for commands that are).
Then you should check you're communication settings, specifically the termination character and the EOI.
If you're using the wrong termination character, visa will keep on reading and finally time out.
Note: You can use pyvisa's ask function if you're using a queryable command (it is a combined write and read).
import visa
# If you've got just one gpib card installed, you can ommit the 0.
# ASsuming the EOI line should be asserted and a termination character
# of '\n'
instrument = visa.instrument('GPIB::9', term_chars='\n', send_end=True)
# use ask to write the command and read back the response
print instrument.ask('*IDN?')

import visa
rm = visa.ResourceManager()
devices = rm.list_resources()
comm_channel = rm.open_resource(devices[0]) #assuming you only have 1 address to worry about
print(comm_channel.query("*IDN?"))
This exploits PYVisa's module and the many functions it has to offer with connecting/writing/reading to and from a USB/GPIB device.

For each one specific instruments, it will have its own command to control them. Please refer to the device's user manual.

Inconsistent file behavior

I'm trying to track down a Python UnicodeDecodeError in the following log line:
10.210.141.123 - - [09/Nov/2011:14:41:04 -0800] "gfR\x15¢\x09ì|Äbk\x0F[×ÐÖà\x11CEÐÌy\x5C¿DÌj\x08Ï ®At\x07å!;f>\x08éPW¤\x1C\x02ö*6+\x5C\x15{,ªIkCRA\x22 xþP9â\x13h\x01¢è´\x1DzõWiË\x5C\x10sòÊ¨R)¶²\x1F8äl¾¢{ÆNw\x08÷#ï" 400 166 0.000 "-" "-"
I opened the entire log file in Vim, and then yanked the line into a new file so I could test just the one line. However, my parsing script works OK with the new file - it doesn't throw a UnicodeDecodeError. I don't understand why the one file would generate an error and the other one would not, when they are (on the surface) identical.
Here's what I tried: running enca to determine the file encoding, which complained that it Cannot determine (or understand) your language preferences. file -i says that both files are Regular files. I also deleted every other line in the original log file and still got the error in one file and no error in the other. I tried deleting
set encoding=utf-8
from my .vimrc, writing the file again, and I still got the error in one file and not in the other.
The logs are nginx logs. Nginx has this note in their release notes:
*) Change: now the 0x00-0x1F, '"' and '\' characters are escaped as \xXX
in an access_log.
Thanks to Maxim Dounin.
My Python script has with open('log_file') as f and the error comes up when I try to call json.dumps on a dict.
How can I track this down?

Your question: How can I track this down?
Answer:
(1) Show us the full text of the error message that you got -- without knowing what encoding that you were trying to use, we can't tell you anything. A traceback and a snippet of code that reads the file and reproduces the error would also be handy.
(2) Write a tiny Python script to find the line in the file and then do:
print repr(the_line) # Python 2.X
print ascii(the_line) # Python 3.x
and copy/paste the result into an edit of your question, so that we can see unambiguously what is in the line.
(3) It does look like random gibberish except for the  but do tell us whether you expect that line to be text (if so, in what human language?).

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python poplib error_proto: line too long - python

A python bug report exists for this issue here: https://bugs.python.org/issue16041 The work around I put inplace was as follows: import poplib poplib._MAXLINE=20480 I thought this was a better idea, rather than editing the poplib.py library file directly. Woody

Related

python Twitch-chatbot MONKALOT encounters json error on startup

struct.error: unpack requires a string argument of length 16

Python can not compile that regex. sre_constants.error: nothing to repeat [duplicate]

Using visa to interface with GPIB always gives me VisaIOError

Inconsistent file behavior

Categories

Resources