I'm getting a strange error when a mapreduce job that writes to Google Cloud Storage completes. Has anybody seen this before?
Final result for job '158354152558......' is 'success'
....
File "/base/data/home/apps/s~app/bqmapper.360899047207944804/libs/mapreduc/handlers.py", line 539, in _finalize_job
mapreduce_spec.mapper.output_writer_class().finalize_job(mapreduce_state)
File "/base/data/home/apps/s~app/bqmapper.360899047207944804/libs/mapreduce/output_writers.py", line 571, in finalize_job
files.finalize(create_filename)
File "/base/data/home/apps/s~app/bqmapper.360899047207944804/libs/mapreduce/lib/files/file.py", line 568, in finalize
f.close(finalize=True)
File "/base/data/home/apps/s~app/bqmapper.360899047207944804/libs/mapreduce/lib/files/file.py", line 291, in close
self._make_rpc_call_with_retry('Close', request, response)
File "/base/data/home/apps/s~app/bqmapper.360899047207944804/libs/mapreduce/lib/files/file.py", line 427, in _make_rpc_call_with_retry
_make_call(method, request, response)
File "/base/data/home/apps/s~app/bqmapper.360899047207944804/libs/mapreduce/lib/files/file.py", line 252, in _make_call
_raise_app_error(e)
File "/base/data/home/apps/s~app/bqmapper.360899047207944804/libs/mapreduce/lib/files/file.py", line 186, in _raise_app_error
raise UnknownError()
UnknownError
After experimenting, I found that a file opened on Cloud Storage has to be finalized within 1 hour; otherwise it fails with this lovely UnknownError.
I mitigated the problem by increasing the number of shards to make the mapping faster, and by changing the output_sharding strategy to "input", which creates one file per shard.
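As a rough way to pick a shard count, here is a minimal sketch that estimates how many shards keep each shard's file under the one-hour window. The helper name `min_shards`, the `safety` margin, and the assumption that work divides evenly across shards are all illustrative; none of them come from the mapreduce library.

```python
import math


def min_shards(total_work_minutes, limit_minutes=60, safety=0.8):
    """Smallest shard count that keeps each shard under the time limit.

    Assumes the work splits evenly across shards, which real jobs
    only approximate.
    """
    budget = limit_minutes * safety  # leave headroom below the hard limit
    return max(1, int(math.ceil(total_work_minutes / float(budget))))
```

For example, a job estimated at 400 minutes of total mapping work would need at least `min_shards(400)` shards under these assumptions.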
tl;dr: An app that had been working fine is suddenly throwing a "Bad file descriptor" error with no other changes; I need advice for how to evaluate this.
I inherited an app that had been untouched for years, after the server crashed and I needed to move it to another machine. It's built with Flask, and uses Peewee to talk to a Postgres database over psycopg2. It has a bunch of other stuff--an Elasticsearch engine for searching, a lot of heavy JS on the front end--but that doesn't seem to be the problem here. The code is moderately complex, and I am not very knowledgeable about all of its pieces.
It took me a while to get it set up using the sketchy deployment instructions that had been left behind, but eventually I got it running, and was able to get a test version running on a clean VM and then deploy it on an actual server, using gunicorn and nginx. It's been working fine in production for a week. I'm using Debian Buster for all versions. I'm using the most recent versions of all software.
I then decided to do some basic code cleanup and ran the entire app through a linter before looking at some other changes that the end user had requested. Unfortunately, after this, the app consistently fails at the same point with a "Bad file descriptor" error. This happens in a pre-run section, which parses a large XML file and saves the info to the database and to Elasticsearch; the app receives an XML upload, forks a few processes, and runs the parse/index process in the background.
I am subsequently unable to get past this error by any means. I have launched a clean VM and installed everything from scratch; I've reverted the git repo to before I linted the code. Same problem. I don't see how it can be a code issue, as it's now at the same point it was when I started. But I'm at a loss for what to do, and terrified that the production machine will fail.
The errors I get (trimming the first few lines that refer to places in the app itself) are:
[2021-03-14 14:40:11.699837] self.execute()
[2021-03-14 14:40:11.699878] File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/peewee.py", line 1906, in inner
[2021-03-14 14:40:11.699907] return method(self, database, *args, **kwargs)
[2021-03-14 14:40:11.699946] File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/peewee.py", line 1977, in execute
[2021-03-14 14:40:11.699976] return self._execute(database)
[2021-03-14 14:40:11.700004] File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/peewee.py", line 2149, in _execute
[2021-03-14 14:40:11.700032] cursor = database.execute(self)
[2021-03-14 14:40:11.700060] File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/peewee.py", line 3156, in execute
[2021-03-14 14:40:11.700088] return self.execute_sql(sql, params, commit=commit)
[2021-03-14 14:40:11.700115] File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/peewee.py", line 3150, in execute_sql
[2021-03-14 14:40:11.700143] self.commit()
[2021-03-14 14:40:11.700171] File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/peewee.py", line 2916, in __exit__
[2021-03-14 14:40:11.700198] reraise(new_type, new_type(exc_value, *exc_args), traceback)
[2021-03-14 14:40:11.700226] File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/peewee.py", line 190, in reraise
[2021-03-14 14:40:11.700254] raise value.with_traceback(tb)
[2021-03-14 14:40:11.700282] File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/peewee.py", line 3143, in execute_sql
[2021-03-14 14:40:11.700309] cursor.execute(sql, params or ())
[2021-03-14 14:40:11.700339] OperationalError('SSL SYSCALL error: Bad file descriptor\n')
127.0.0.1 - - [14/Mar/2021 10:40:11] "POST /manage/versions/upload HTTP/1.1" 500 -
Error on request:
Traceback (most recent call last):
File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/werkzeug/serving.py", line 323, in run_wsgi
execute(self.server.app)
File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/werkzeug/serving.py", line 315, in execute
write(data)
File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/werkzeug/serving.py", line 273, in write
self.send_response(code, msg)
File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/werkzeug/serving.py", line 388, in send_response
self.wfile.write(hdr.encode("ascii"))
File "/usr/lib/python3.7/socketserver.py", line 799, in write
self._sock.sendall(b)
OSError: [Errno 9] Bad file descriptor
Exception in thread Thread-22:
Traceback (most recent call last):
File "/usr/lib/python3.7/threading.py", line 917, in _bootstrap_inner
self.run()
File "/usr/lib/python3.7/threading.py", line 865, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.7/socketserver.py", line 654, in process_request_thread
self.shutdown_request(request)
File "/usr/lib/python3.7/socketserver.py", line 509, in shutdown_request
self.close_request(request)
File "/usr/lib/python3.7/socketserver.py", line 513, in close_request
request.close()
File "/usr/lib/python3.7/socket.py", line 420, in close
self._real_close()
File "/usr/lib/python3.7/socket.py", line 414, in _real_close
_ss.close(self)
OSError: [Errno 9] Bad file descriptor
I note that the final section ("Exception in thread Thread-22") is showing the system Python, rather than my virtual environment; I don't know if that's relevant, or if that's just what's running some overall process. I didn't get to this point doing anything different, though--the app is running in the virtual environment.
I'd be very grateful for any thoughts here--I'm obviously hoping it's some kind of stupid permission error or something, as I can't easily go into the code because of its complexity.
I really don't know how to help myself here: I'm unfamiliar with this kind of error and can't find anything about it on Google. You are my last hope, since I don't know where else to go with this. I tried reinstalling all libraries and setting up a new venv. I don't trust myself enough to try anything more drastic.
The code triggering the error:
from wetterdienst import DWDObservationData
observations_daily = DWDObservationData(
station_ids=station_ids_d,
parameter=params_daily,
time_resolution=TimeResolution.DAILY,
start_date="2015-01-01",
end_date="2020-10-10",
tidy_data=True,
humanize_column_names=True,
)
for df in observations_hourly.collect_data():
name = str(df.STATION_ID.iloc[0]).strip(".0")
df.to_csv('./data/hourly/{}.csv'.format(name))
print('{} done'.format(name))
API is found here: https://github.com/earthobservations/wetterdienst
Error:
Traceback (most recent call last):
File "/Users/sashakaun/PycharmProjects/wetter2.0/main.py", line 83, in <module>
for df in observations_hourly.collect_data():
File "/Users/sashakaun/PycharmProjects/wetter2.0/venv/lib/python3.8/site-packages/wetterdienst/dwd/observations/api.py", line 178, in collect_data
df_parameter = self._collect_parameter_from_station(
File "/Users/sashakaun/PycharmProjects/wetter2.0/venv/lib/python3.8/site-packages/wetterdienst/dwd/observations/api.py", line 243, in _collect_parameter_from_station
df_period = collect_climate_observations_data(
File "/Users/sashakaun/PycharmProjects/wetter2.0/venv/lib/python3.8/site-packages/wetterdienst/dwd/observations/access.py", line 82, in collect_climate_observations_data
filenames_and_files = download_climate_observations_data_parallel(remote_files)
File "/Users/sashakaun/PycharmProjects/wetter2.0/venv/lib/python3.8/site-packages/wetterdienst/dwd/observations/access.py", line 106, in download_climate_observations_data_parallel
return list(zip(remote_files, files_in_bytes))
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/concurrent/futures/_base.py", line 611, in result_iterator
yield fs.pop().result()
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/concurrent/futures/_base.py", line 432, in result
return self.__get_result()
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
raise self._exception
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/Users/sashakaun/PycharmProjects/wetter2.0/venv/lib/python3.8/site-packages/wetterdienst/dwd/observations/access.py", line 124, in _download_climate_observations_data
return BytesIO(__download_climate_observations_data(remote_file=remote_file))
File "<decorator-gen-2>", line 2, in __download_climate_observations_data
File "/Users/sashakaun/PycharmProjects/wetter2.0/venv/lib/python3.8/site-packages/dogpile/cache/region.py", line 1356, in get_or_create_for_user_func
return self.get_or_create(
File "/Users/sashakaun/PycharmProjects/wetter2.0/venv/lib/python3.8/site-packages/dogpile/cache/region.py", line 954, in get_or_create
with Lock(
File "/Users/sashakaun/PycharmProjects/wetter2.0/venv/lib/python3.8/site-packages/dogpile/lock.py", line 185, in __enter__
return self._enter()
File "/Users/sashakaun/PycharmProjects/wetter2.0/venv/lib/python3.8/site-packages/dogpile/lock.py", line 94, in _enter
generated = self._enter_create(value, createdtime)
File "/Users/sashakaun/PycharmProjects/wetter2.0/venv/lib/python3.8/site-packages/dogpile/lock.py", line 178, in _enter_create
return self.creator()
File "/Users/sashakaun/PycharmProjects/wetter2.0/venv/lib/python3.8/site-packages/dogpile/cache/region.py", line 920, in gen_value
self.backend.set(key, value)
File "/Users/sashakaun/PycharmProjects/wetter2.0/venv/lib/python3.8/site-packages/dogpile/cache/backends/file.py", line 239, in set
dbm[key] = pickle.dumps(value, pickle.HIGHEST_PROTOCOL)
_gdbm.error: Database needs recovery
Thanks a lot!!
The GDBM file has been corrupted. You can use gdbmtool to recover the database. Install gdbmtool, then run:
gdbmtool FILENAME
where FILENAME is the name of the GDBM database file. A prompt will appear, at which you can enter:
gdbmtool> recover summary
If the database can be recovered, it will display a summary of the recovery results, e.g.:
Recovery succeeded.
Keys recovered: 6870650, failed: 5, duplicate: 0
Buckets recovered: 64830, failed: 2
Before you tell me, yes I am aware that selfbots can get you banned. My selfbot is for work purposes in a server with me and three others. I'm doing nothing shady or weird over here.
I'm using the following selfbot code: https://github.com/Supersebi3/Selfbot
Upon logging in, being that I'm in about 50 servers, I experience the following:
This carries on for several minutes, until I eventually get a MemoryError:
File "main.py", line 96, in <module>
bot.run(token, bot=False)
File "D:\Python\Python36-32\lib\site-packages\discord\client.py", line 519, in run
self.loop.run_until_complete(self.start(*args, **kwargs))
File "D:\Python\Python36-32\lib\asyncio\base_events.py", line 468, in run_until_complete
return future.result()
File "D:\Python\Python36-32\lib\site-packages\discord\client.py", line 491, in start
yield from self.connect()
File "D:\Python\Python36-32\lib\site-packages\discord\client.py", line 448, in connect
yield from self.ws.poll_event()
File "D:\Python\Python36-32\lib\site-packages\discord\gateway.py", line 431, in poll_event
yield from self.received_message(msg)
File "D:\Python\Python36-32\lib\site-packages\discord\gateway.py", line 327, in received_message
log.debug('WebSocket Event: {}'.format(msg))
MemoryError
Can anyone explain why this is happening and how I can fix it? Is there any way I can skip the chunk processing for the members of every server my selfbot account is in?
I'm trying to figure out what causes this error when I run my app using the basic Flask server during development. I start it with this:
from myapp import app
app.run(debug=True, port=5001)
All is well and I'll continue to code and refresh etc, but then after a while I get the recursion error and have to Ctrl-C the server and restart it. Not a big deal, just a little annoying to have to deal with every now and then.
Here's the full traceback, which I tried to use to determine the cause but can't see anything that stands out (possibly something to do with how werkzeug uses Cookie.py?):
Traceback (most recent call last):
File "/Users/jeff/.virtualenvs/fmll/lib/python2.7/site-packages/flask/app.py", line 1701, in __call__
return self.wsgi_app(environ, start_response)
File "/Users/jeff/.virtualenvs/fmll/lib/python2.7/site-packages/werkzeug/wsgi.py", line 411, in __call__
return self.app(environ, start_response)
(last bit repeated a bunch - trimmed to fit in posting size requirements)
File "/Users/jeff/.virtualenvs/fmll/lib/python2.7/site-packages/flask/app.py", line 1685, in wsgi_app
with self.request_context(environ):
File "/Users/jeff/.virtualenvs/fmll/lib/python2.7/site-packages/flask/ctx.py", line 274, in __enter__
self.push()
File "/Users/jeff/.virtualenvs/fmll/lib/python2.7/site-packages/flask/ctx.py", line 238, in push
self.session = self.app.open_session(self.request)
File "/Users/jeff/.virtualenvs/fmll/lib/python2.7/site-packages/flask/app.py", line 792, in open_session
return self.session_interface.open_session(self, request)
File "/Users/jeff/.virtualenvs/fmll/lib/python2.7/site-packages/flask/sessions.py", line 191, in open_session
secret_key=key)
File "/Users/jeff/.virtualenvs/fmll/lib/python2.7/site-packages/werkzeug/contrib/securecookie.py", line 309, in load_cookie
data = request.cookies.get(key)
File "/Users/jeff/.virtualenvs/fmll/lib/python2.7/site-packages/werkzeug/utils.py", line 77, in __get__
value = self.func(obj)
File "/Users/jeff/.virtualenvs/fmll/lib/python2.7/site-packages/werkzeug/wrappers.py", line 418, in cookies
cls=self.dict_storage_class)
File "/Users/jeff/.virtualenvs/fmll/lib/python2.7/site-packages/werkzeug/http.py", line 741, in parse_cookie
cookie.load(header)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/Cookie.py", line 632, in load
self.__ParseString(rawdata)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/Cookie.py", line 665, in __ParseString
self.__set(K, rval, cval)
File "/Users/jeff/.virtualenvs/fmll/lib/python2.7/site-packages/werkzeug/_internal.py", line 290, in _BaseCookie__set
morsel = self.get(key, _ExtendedMorsel())
File "/Users/jeff/.virtualenvs/fmll/lib/python2.7/site-packages/werkzeug/_internal.py", line 271, in __init__
Morsel.__init__(self)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/Cookie.py", line 438, in __init__
dict.__setitem__(self, K, "")
RuntimeError: maximum recursion depth exceeded while calling a Python object
Since this occurs during development, you could increase the recursion limit before starting your server:
import sys
sys.setrecursionlimit(2000)  # Choose the right figure for you here
# the default on my system is 1000, but this is platform-dependent
However, you should use it very carefully, and probably not in production unless you understand its impact well.
Ref : http://docs.python.org/2/library/sys.html#sys.setrecursionlimit
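If you would rather not change the limit for the whole process, one option is to raise it only around a specific block and restore it afterwards. This is a minimal sketch; the names `recursion_limit` and `depth` are hypothetical helpers written for illustration and are not part of Flask or Werkzeug.

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(limit):
    """Temporarily set the interpreter's recursion limit, restoring it on exit."""
    previous = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        yield previous
    finally:
        # Runs even if the wrapped block raises.
        sys.setrecursionlimit(previous)


def depth(n):
    """Recurse n levels deep and return n (only here to exercise the limit)."""
    return 0 if n == 0 else 1 + depth(n - 1)
```

A call such as `with recursion_limit(2000): app.run(debug=True, port=5001)` would apply the higher limit only while the development server is running. Note that this only postpones the error if something keeps adding a layer of nesting on every reload.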
I use:
MongoDB 1.6.5
Pymongo 1.9
Python 2.6.6
I have 3 types of daemons: the 1st loads data from the web, the 2nd analyzes it and saves the result, and the 3rd groups the results. All of them work with MongoDB.
At some point the 3rd daemon throws many exceptions like this (mostly when there is a large amount of data in the DB):
Traceback (most recent call last):
File "/usr/local/lib/python2.6/dist-packages/gevent-0.13.1-py2.6-linux-x86_64.egg/gevent/greenlet.py", line 405, in run
result = self._run(*self.args, **self.kwargs)
File "/data/www/spider/daemon/scripts/mainconverter.py", line 72, in work
for item in res:
File "/usr/local/lib/python2.6/dist-packages/pymongo-1.9_-py2.6-linux-x86_64.egg/pymongo/cursor.py", line 601, in next
if len(self.__data) or self._refresh():
File "/usr/local/lib/python2.6/dist-packages/pymongo-1.9_-py2.6-linux-x86_64.egg/pymongo/cursor.py", line 564, in _refresh
self.__query_spec(), self.__fields))
File "/usr/local/lib/python2.6/dist-packages/pymongo-1.9_-py2.6-linux-x86_64.egg/pymongo/cursor.py", line 521, in __send_message
**kwargs)
File "/usr/local/lib/python2.6/dist-packages/pymongo-1.9_-py2.6-linux-x86_64.egg/pymongo/connection.py", line 743, in _send_message_with_response
return self.__send_and_receive(message, sock)
File "/usr/local/lib/python2.6/dist-packages/pymongo-1.9_-py2.6-linux-x86_64.egg/pymongo/connection.py", line 724, in __send_and_receive
return self.__receive_message_on_socket(1, request_id, sock)
File "/usr/local/lib/python2.6/dist-packages/pymongo-1.9_-py2.6-linux-x86_64.egg/pymongo/connection.py", line 714, in __receive_message_on_socket
struct.unpack("<i", header[8:12])[0])
AssertionError: ids don't match -561338340 0
<Greenlet at 0x2baa628: <bound method Worker.work of <scripts.mainconverter.Worker object at 0x2ba8450>>> failed with AssertionError
Can anyone tell me what causes this exception and how to fix it?
Thanks.
This is likely a threading problem related to how you are using worker threads with gevent coroutines. It seems like the pymongo connection object is reading a response for a request it didn't make.
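To illustrate what "reading a response for a request it didn't make" means, here is a small plain-Python simulation. `FakeConnection` is a hypothetical stand-in for a single shared socket; it does not use pymongo or gevent, and the interleaving is written out by hand rather than produced by real concurrency.

```python
import threading
from collections import deque


class FakeConnection:
    """Hypothetical shared socket: replies come back in the order requests were sent."""

    def __init__(self):
        self._replies = deque()
        self._lock = threading.Lock()

    def send(self, request_id):
        # The pretend server answers each request with a reply echoing its id.
        self._replies.append(request_id)

    def receive(self, expected_id):
        reply_id = self._replies.popleft()
        if reply_id != expected_id:
            raise AssertionError("ids don't match %r %r" % (expected_id, reply_id))
        return reply_id

    def request(self, request_id):
        # Holding the lock keeps send and receive together as one unit.
        with self._lock:
            self.send(request_id)
            return self.receive(request_id)


def interleaved_calls():
    """Two workers share the connection without coordination; returns the error text."""
    conn = FakeConnection()
    conn.send(1)          # worker A sends its request
    conn.send(2)          # worker B sends before A has read its reply
    try:
        conn.receive(2)   # worker B reads first and gets A's reply
    except AssertionError as exc:
        return str(exc)
    return None


def serialized_calls():
    """Each worker completes send and receive before the next one starts."""
    conn = FakeConnection()
    return [conn.request(1), conn.request(2)]
```

In the interleaved case the second worker picks up the first worker's reply, which is the same kind of mismatch the traceback reports. Keeping each send and receive pair together, or giving each worker its own connection, avoids it.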