NBT Parser Minecraft mca file not a gzipped file error - python

I try to read a Minecraft world with Python from the filesystem and the .mca region/anvil files using the NBT 1.4.1 module (Named Binary Tag Reader/Writer), which is supposed to read the NBT format used in Minecraft. It works fine for files such as level.dat, but throws an error for the region files such as r.0.0.mca
Edit: I am referring to the auto generated world files that minecraft stores in the .minecraft/saves/"MyWorld"/ folder. Such as the level.dat (which works), and the mca files stored in the .minecraft/saves/"MyWorld"/region/ folder such as r.0.0.mca which don't work. I uploaded two sample files from one of my worlds.
Code:
from nbt import nbt
level_file = nbt.NBTFile("level.dat", "rb") # works
region_file = nbt.NBTFile("r.0.0.mca", "rb")# does not work
Error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/dist-packages/nbt/nbt.py", line 508, in __init__
self.parse_file()
File "/usr/local/lib/python3.5/dist-packages/nbt/nbt.py", line 532, in parse_file
type = TAG_Byte(buffer=self.file)
File "/usr/local/lib/python3.5/dist-packages/nbt/nbt.py", line 85, in __init__
self._parse_buffer(buffer)
File "/usr/local/lib/python3.5/dist-packages/nbt/nbt.py", line 90, in _parse_buffer
self.value = self.fmt.unpack(buffer.read(self.fmt.size))[0]
File "/usr/lib/python3.5/gzip.py", line 274, in read
return self._buffer.read(size)
File "/usr/lib/python3.5/_compression.py", line 68, in readinto
data = self.read(len(byte_view))
File "/usr/lib/python3.5/gzip.py", line 461, in read
if not self._read_gzip_header():
File "/usr/lib/python3.5/gzip.py", line 409, in _read_gzip_header
raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b'\x00\x00')
Any suggestions how to get this working?

r.0.0.mca is most definitely not compressed. About 80% of the bytes are zeros.

It turns out that the NBT library only supports .mcr region files which have been replaced by .mca files about 6 years ago. However, mcedit is written in Python and supports those files. Due the changes in the Minecraft save format, the interpretation of the content needs to be adjusted though, but the files can be successfully read.

Related

Unable to read_excel using pandas on CentOS Stream 9 VM: zipfile.BadZipFile: Bad magic number for file header

I've been running a script for several months now where I read and concat several excel exports using the following code:
files = os.listdir(os.path.abspath('exports/'))
for file in files:
if file.startswith('ap_statistics_') and file.endswith('.xlsx'):
excel_list.append(pd.read_excel('exports/' + file, sheet_name='Access Points'))
df = pd.concat(excel_list, axis=0, ignore_index=True)
This has worked just fine until this Saturday when I uploaded new exports to the CentOS Stream 9 VM where I have a cronjob running the script every hour.
Now I always get this error:
Traceback (most recent call last):
File "/root/projects/beacon_check_v8/main.py", line 310, in <module>
ap_check()
File "/root/projects/beacon_check_v8/main.py", line 260, in ap_check
siteaps_result = getaps()
File "/root/projects/beacon_check_v8/main.py", line 30, in getaps
excel_list.append(pd.read_excel('exports/' + file, sheet_name='Access Points'))
File "/root/projects/beacon_check_v8/venv/lib64/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/root/projects/beacon_check_v8/venv/lib64/python3.9/site-packages/pandas/io/excel/_base.py", line 457, in read_excel
io = ExcelFile(io, storage_options=storage_options, engine=engine)
File "/root/projects/beacon_check_v8/venv/lib64/python3.9/site-packages/pandas/io/excel/_base.py", line 1419, in __init__
self._reader = self._engines[engine](self._io, storage_options=storage_options)
File "/root/projects/beacon_check_v8/venv/lib64/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 525, in __init__
super().__init__(filepath_or_buffer, storage_options=storage_options)
File "/root/projects/beacon_check_v8/venv/lib64/python3.9/site-packages/pandas/io/excel/_base.py", line 518, in __init__
self.book = self.load_workbook(self.handles.handle)
File "/root/projects/beacon_check_v8/venv/lib64/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 536, in load_workbook
return load_workbook(
File "/root/projects/beacon_check_v8/venv/lib64/python3.9/site-packages/openpyxl/reader/excel.py", line 317, in load_workbook
reader.read()
File "/root/projects/beacon_check_v8/venv/lib64/python3.9/site-packages/openpyxl/reader/excel.py", line 277, in read
self.read_strings()
File "/root/projects/beacon_check_v8/venv/lib64/python3.9/site-packages/openpyxl/reader/excel.py", line 143, in read_strings
with self.archive.open(strings_path,) as src:
File "/usr/lib64/python3.9/zipfile.py", line 1523, in open
raise BadZipFile("Bad magic number for file header")
zipfile.BadZipFile: Bad magic number for file header
I develop on my Windows 10 notebook using PyCharm with a Python 3.9 venv, same as on the VM, where the script continued to work just fine.
When researching online all I found was that sometimes .pyc files can cause issues so I created a completely new venv on the VM, installed all libraries (netmiko, pandas, openpyxl, etc.) and tried running the script again before and after deleting all .pyc files in the directory but no luck.
I have extracted the Excel file header using the following code:
with open('exports/' + file, 'rb') as myexcel:
print(myexcel.read(4))
Unfortunately it comes back as the same values on both my Windows venv as well as the CentOS venv:
b'PK\x03\x04'
I don't know if this header value is correct or not but I can read the files on my Windows notebook just fine using pandas or excel.
Any help would be greatly appreciated.
The issue was actually the program I used to transfer the files between my notebook and the VM, WinSCP. I don't know why or how this caused the error but I was able to fix it by transferring directly over pscp.

Google App Engine dev_appserver.py: watcher_ignore_re flag "is not JSON serializable"

Why I run the dev_appserver.py with the option watcher_ignore_re, I get an error message that the regex is not JSON serializable.
Is this a bug with the development server? Am I using this command improperly? The command and callstack is printed below.
C:\Users\mes65\Documents\MyProject>"C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\bin\dev_appserver.py" ^
--watcher_ignore_re="(.*\.git|.*\.idea|tmp\.py)" ^
"C:\Users\mes65\Documents\MyProject"
WARNING 2018-06-06 09:28:59,161 appinfo.py:1622] lxml version "2.3" is deprecated, use one of: "3.7.3"
INFO 2018-06-06 09:28:59,187 devappserver2.py:120] Skipping SDK update check.
Traceback (most recent call last):
File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\dev_appserver.py", line 96, in <module>
_run_file(__file__, globals())
File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\dev_appserver.py", line 90, in _run_file
execfile(_PATHS.script_file(script_name), globals_)
File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\google\appengine\tools\devappserver2\devappserver2.py", line 454, in <module>
main()
File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\google\appengine\tools\devappserver2\devappserver2.py", line 442, in main
dev_server.start(options)
File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\google\appengine\tools\devappserver2\devappserver2.py", line 163, in start
bool(ssl_certificate_paths), options)
File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\google\appengine\tools\devappserver2\metrics.py", line 166, in Start
self._cmd_args = json.dumps(vars(cmd_args)) if cmd_args else None
File "C:\Python27\lib\json\__init__.py", line 244, in dumps
return _default_encoder.encode(obj)
File "C:\Python27\lib\json\encoder.py", line 207, in encode
chunks = self.iterencode(o, _one_shot=True)
File "C:\Python27\lib\json\encoder.py", line 270, in iterencode
return _iterencode(o, 0)
File "C:\Python27\lib\json\encoder.py", line 184, in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: <_sre.SRE_Pattern object at 0x00000000063C2188> is not JSON serializable
It looks like it is an issue with the google analytics code built into dev_appserver2 (google-cloud-sdk\platform\google_appengine\google\appengine\tools\devappserver2\devappserver2.py on or around line 316). It wants to send all of your command line options to google analytics. If you remove the analytics client id by adding the command line option --google_analytics_client_id= (note: '=' without any following value) the appserver won't call the google analytics code where it is trying to JSON serialize an SRE object and failing. However, since you are on Windows I find that the --watcher_ignore_re does not work anyway even when you get past this issue.
There is a comment in file_watcher.py
TODO: b/33178251 - Add watcher_ignore_re support for windows.
I also faced with this usability problem on Windows and was really disappointed. I tried to find some workarounds but I hadn't found any appropriate way.
In the end, I decided to make my own implementation of support watcher_ignore_re for Windows. I put required changes in my Github repo
If describe them in several words:
Add _watcher_ignore_re, _skip_files_re properties and its setters
Add import statement from google.appengine.tools.devappserver2 import watcher_common and use it in newly created def _path_ginored
Filter additional_changes before adding them to watcher changed files
For resolving mentioned problem with not serializable regex attribute we should drop them from the serialized dictionary. Fix for this is added as consequent commit and can be checked at metrics.py:185-193.
I hope it helps other guys enjoy developing on GAE on Windows :)

Python unable to open a .h5 file

I am trying to open a HDF5 file in order to read it with python, so that I can do more things with it later. There is an error when I run the program to read the file. The program is below:
import h5py # HDF5 support
import numpy
fileName = "C:/.../file.h5"
f = h5py.File(fileName, "r")
for item in f.attrs.keys():
print item + ":", f.attrs[item]
mr = f['/entry/mr_scan/mr']
i00 = f['/entry/mr_scan/I00']
print "%s\t%s\t%s" % ("#", "mr", "I00")
for i in range(len(mr)):
print "%d\t%g\t%d" % (i, mr[i], i00[i])
f.close()
If I run the program I end up seeing this error:
Traceback (most recent call last):
File "TestHD5.py", line 8, in <module>
mr = f['/entry/mr_scan/mr']
File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (C:\aroot\work\h5py\_objects.c:2587)
File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (C:\aroot\work\h5py\_objects.c:2546)
File "C:\programs\Python27\lib\site-packages\h5py\_hl\group.py", line 166, in __getitem__
oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (C:\aroot\work\h5py\_objects.c:2587)
File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (C:\aroot\work\h5py\_objects.c:2546)
File "h5py\h5o.pyx", line 190, in h5py.h5o.open (C:\aroot\work\h5py\h5o.c:3417)
KeyError: 'Unable to open object (Component not found)'
Am I just missing some modules to read the file, or is this something else. It will open the .h5 file if I use an h5 file veiwer program. Thank you
Your string:
path = "C:\Users\312001\m2020\data\20170104_145626\doPoint_20170104_150016\dataset_XMIT data_20170104_150020.h5"
is full of broken/illegal escapes (thankfully they will be turned into SyntaxErrors, though you are using Python 2), and some that actually do work, so Python thinks path is really equal to: 'C:\\Users\xca001\\m2020\\data\x8170104_145626\\doPoint_20170104_150016\\dataset_XMIT data_20170104_150020.h5' (note those \x##'s).
Your options:
Use a raw string by prefixing the string literal with r
Don't use backslashes for paths. Python will convert forward slashes to backslashes for Windows paths.
Double-backslash.
The answer that #NickT posted fixed the orginal problem I had. The problem that is shown in the new version is due to the hd5 folder names in the hd5 file not matching the folder names that the code provided.

wave.Error: file does not start with RIFF id

I am trying to use the SpeechRecognition library (https://pypi.python.org/pypi/SpeechRecognition/). When running the example code (full example) below:
#!/usr/bin/env python3
import speech_recognition as sr
# obtain path to "english.wav" in the same folder as this script
from os import path
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "english.wav")
#AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "french.aiff")
#AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "chinese.flac")
# use the audio file as the audio source
r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
audio = r.record(source) # read the entire audio file
I receive this error:
/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/bin/python3.4 /Users/adamg/te/Polli/ASR/SpeechRecognitionTest0.py
Traceback (most recent call last):
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/speech_recognition/__init__.py", line 174, in __enter__
self.audio_reader = wave.open(self.filename_or_fileobject, "rb")
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/wave.py", line 497, in open
return Wave_read(f)
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/wave.py", line 163, in __init__
self.initfp(f)
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/wave.py", line 130, in initfp
raise Error('file does not start with RIFF id')
wave.Error: file does not start with RIFF id
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/speech_recognition/__init__.py", line 179, in __enter__
self.audio_reader = aifc.open(self.filename_or_fileobject, "rb")
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/aifc.py", line 887, in open
return Aifc_read(f)
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/aifc.py", line 340, in __init__
self.initfp(f)
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/aifc.py", line 305, in initfp
raise Error('file does not start with FORM id')
aifc.Error: file does not start with FORM id
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/adamg/te/Polli/ASR/SpeechRecognitionTest0.py", line 13, in <module>
with sr.AudioFile(AUDIO_FILE) as source:
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/speech_recognition/__init__.py", line 199, in __enter__
self.audio_reader = aifc.open(aiff_file, "rb")
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/aifc.py", line 887, in open
return Aifc_read(f)
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/aifc.py", line 340, in __init__
self.initfp(f)
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/aifc.py", line 303, in initfp
chunk = Chunk(file)
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/chunk.py", line 63, in __init__
raise EOFError
EOFError
Process finished with exit code 1
How can this be fixed?
I had the same problem for a while, my mistake is to download the wav file using "save as". When I double click to play the wav file, I am not able to play the wav file using any player. After cloning or downloading the entire zip file, I am able to play the wav file and the error disappears.
Your wav file is probably corrupted. To check try to play the file using any media player if possible. Download the audio file into your environment correctly and then it should work.
I have faced this issue when I did a mistake by downloading the file the incorrect way, I downloaded the audio file using the following command in the google colab:
[!wget 'https://github.com/mozilla/DeepSpeech/blob/master/data/smoke_test/LDC93S1_pcms16le_1_16000.wav']
This threw an error as the audio file was corrupted(could not be played my media player when downloaded).
I was able to correct the issue by downloading the following way:
[!wget 'https://raw.githubusercontent.com/mozilla/DeepSpeech/master/data/smoke_test/LDC93S1_pcms16le_1_16000.wav']
For what it's worth, I ran into this problem when git LFS WAV files I had stored in a repo were not cloned properly. Their pointers were present, but the files were not. A
git lfs pull
fixed the problem.
That being said, this might just happen any time you a file is pointed to that isn't actually an audio file that is "complete". Hopefully that helps!

Merging PDF files with Python3

I am writing a small script that needs to merge many one-page pdf files. I want the script to run with Python3 and to have as few dependencies as possible.
For the PDF merging part, I tried using PyPdf. However, the Python 3 support seems to be buggy; It can't handle inkscape generated PDF files (which I need). I have the current git version of PyPdf installed, and the following test script doesn't work:
import PyPDF2
output_pdf = PyPDF2.PdfFileWriter()
with open("testI.pdf", "rb") as input:
input_pdf = PyPDF2.PdfFileReader(input)
output_pdf.addPage(input_pdf.getPage(0))
with open("test.pdf", "wb") as output:
output_pdf.write(output)
It throws the following stack trace:
Traceback (most recent call last):
File "test.py", line 7, in <module>
output.addPage(input.getPage(0))
File "/usr/lib/python3.3/site-packages/pyPdf/pdf.py", line 420, in getPage
self._flatten()
File "/usr/lib/python3.3/site-packages/pyPdf/pdf.py", line 574, in _flatten
self._flatten(page.getObject(), inherit)
File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 165, in getObject
return self.pdf.getObject(self).getObject()
File "/usr/lib/python3.3/site-packages/pyPdf/pdf.py", line 616, in getObject
retval = readObject(self.stream, self)
File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 66, in readObject
return DictionaryObject.readFromStream(stream, pdf)
File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 526, in readFromStream
value = readObject(stream, pdf)
File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 57, in readObject
return ArrayObject.readFromStream(stream, pdf)
File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 152, in readFromStream
obj = readObject(stream, pdf)
File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 86, in readObject
return NumberObject.readFromStream(stream)
File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 231, in readFromStream
return FloatObject(name.decode("ascii"))
File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 207, in __new__
return decimal.Decimal.__new__(cls, str(value), context)
TypeError: optional argument must be a context
The same script, however, works flawlessly with Python 2.7.
What am I doing wrong here? Is it a bug in the library? Can I work around it without touching the PyPDF library?
So I found the answer. The decimal.Decimal module in Python3.3 shows some weird behaviour. This is the corresponding StackOverflow question: Instantiate Decimal class I added some workaround to the PyPDF2 library and submitted a pull request.
Just to make sure you are aware of already existing tools that do exactly this:
PDFtk
PDFjam (my favourite, requires LaTeX though)
Directly with GhostScript:
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=finished.pdf file1.pdf file2.pdf

Categories