Error opening megawarc archive from Python

I need to use a Python script to access a web archive.
What I have is a 'megawarc' web archive file from http://archive.org/details/archiveteam-fanfiction-warc-11. I need to un-megawarc it using the Python script found at https://github.com/alard/megawarc.
I'm trying to run the 'restore' command, and I have the three files needed (FILE.warc.gz, FILE.tar, and FILE.json.gz) from the first link.
I have both Python 2.7 and 3.3 installed.
--------------update--------------
I've run both this method:
python megawarc restore FILE
and this method:
Make sure you have the files megawarc and ordereddict.py in the same directory, with the files you want to convert.
Rename the file megawarc to megawarc.py
Open a python console in this directory
Type the following code (line by line) :
import sys
sys.argv = ['megawarc','restore','FILE']
import megawarc
megawarc.main()
Using Python 2.7, this is what I get:
c:\Python27>python megawarc restore FILE
Traceback (most recent call last):
File "megawarc", line 563, in <module>
main()
File "megawarc", line 552, in main
mwr.process()
File "megawarc", line 460, in process
self.process_entry(entry, tar_out)
File "megawarc", line 478, in process_entry
entry["target"]["offset"], entry["target"]["size"])
File "megawarc", line 128, in copy_to_stream
raise Exception("End of file: %d bytes expected, but %d bytes read." % (buf_size, l))
Exception: End of file: 4096 bytes expected, but 236 bytes read.
Is there something else I'm missing?
I have the following files, all in
c:\python27
FILE.megawarc.json.gz
FILE.megawarc.tar
FILE.megawarc.warc.gz
megawarc
ordereddict.py
Is this some kind of corrupt-file error, or am I missing something?

The repository at the second link you provided contains two important files:
megawarc
ordereddict.py
The executable script is megawarc. To run it, you have to launch it in a shell with
python megawarc restore FILE
Alternatively, if you're on a UNIX-based system, you can make the megawarc script executable with
chmod +x megawarc
and then run it with
./megawarc restore FILE
Here, FILE is literally what you type if your three files are FILE.warc.gz, FILE.tar, and FILE.json.gz. If your files are named differently, replace FILE with the prefix common to your three input files.
EDIT:
Okay, I found an alternative that should work if you don't have a standard shell to start the script from the command line.
Here is what to do:
Make sure the files megawarc and ordereddict.py are in the same directory as the files you want to convert.
Rename the file megawarc to megawarc.py.
Open a Python console in this directory.
Type the following code (line by line):
import sys
sys.argv = ['megawarc','restore','FILE']
import megawarc
megawarc.main()
This should work; I've just tried it.
Hope this helps.
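On the "corrupt file" question: the exception says a read reached the end of a file earlier than expected, which could point to an incomplete download, though that is only a guess. A minimal sketch for checking whether a .gz file decompresses all the way to the end, assuming Python 3 (the function name gzip_reads_to_end is made up for this example and is not part of megawarc; it says nothing about the .tar file):

```python
import gzip
import zlib


def gzip_reads_to_end(path, chunk_size=1 << 20):
    """Return True if the gzip file at `path` decompresses without error.

    Hypothetical helper for illustration only. A truncated or damaged
    file makes the gzip module raise while reading; a missing file also
    ends up as False because FileNotFoundError is an OSError.
    """
    try:
        with gzip.open(path, "rb") as stream:
            # Read and discard the data in chunks so memory use stays small.
            while stream.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        return False
    return True
```

You would call it once per compressed file, for example gzip_reads_to_end("FILE.megawarc.warc.gz"). If it returns False for a file that exists, downloading that file again is probably the first thing to try.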

Related

Python on RPi: Errno 2 No such file or directory: 'config.json' but config.json is in the same folder as the main.py script

Hey all
I am trying to run a python script from GitHub, https://github.com/dudisgit/gmod_toolgun_prop for a project with a functioning screen and I put a command at the end of the .bashrc file
python3 /home/pi/gmod_toolgun_prop-main/main.py
so that the code executes as soon as the RPi powers up. When running the script in Thonny's Python IDE on my RPi 2B it executes no problem and the screen works. However when I open terminal I get an error message from the code running in the .bashrc file:
Traceback (most recent call last):
File "/home/pi/gmod_toolgun_prop-main/main.py", line 381, in <module>
main()
File "/home/pi/gmod_toolgun_prop-main/main.py", line 357, in main
with open(args.config) as config_file:
FileNotFoundError: [Errno 2] No such file or directory: 'config.json'
However, the config.json file is in the same folder as the main.py file as shown below:
[Screenshot: file explorer showing config.json in the same folder as main.py]
[Screenshot: the error message]
And here's the code that is referenced in the error message as line 357:
with open(args.config) as config_file:
    config = json.load(config_file)
The entirety of the main.py script is in the Github link as attached in the first paragraph as well as the config.json file and other relevant files.
I am fairly new to the Python programming space so I don't understand what could be causing this error nor how this script handles opening the config.json file.
I have tried creating a custom service, but it produces the same error. Crontab and the local.bashrc file simply don't work for this. This is the furthest I have got with getting it to execute on boot.
Maybe it helps to use the absolute path of config.json, i.e. (according to your description)
/home/pi/gmod_toolgun_prop-main/config.json
instead of the simple filename config.json.
If Python says the file is not there, then the file is not there. The only question is, where is there?
The filename in this case is config.json. Since there's no / (no directory name), the name is taken to be relative to the current working directory. That might or might not be the same as the directory of the main Python module, here /home/pi/gmod_toolgun_prop-main/main.py.
You can verify that by printing the current working directory just before opening the file. You can use os.getcwd to do that. Or, use strace(1) to show the interpreter's attempts to open config.json.
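If the working directory turns out to differ from the script's directory, one common approach is to build the path from the script's own location instead of relying on the working directory. A minimal sketch, assuming Python 3 (path_beside is a made-up helper name, not something from the gmod_toolgun_prop repository):

```python
from pathlib import Path


def path_beside(script_file, name):
    """Return the absolute path of `name` in the directory holding `script_file`.

    Hypothetical helper: inside main.py you would pass __file__ as
    `script_file`, so the result does not depend on the working directory.
    """
    return Path(script_file).resolve().parent / name


# Inside a script this would be used roughly like:
#     config_path = path_beside(__file__, "config.json")
#     with open(config_path) as config_file:
#         ...
# For comparison, this is the directory a bare "config.json" is resolved against:
print("current working directory:", Path.cwd())
```

Since main.py reads the path from args.config, passing an absolute path on the command line may achieve the same thing without touching the code. The exact option name is not shown in the question, so it is not guessed here.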

Opening a file in Python gives me errors

I am quite new to Python and I am having problems opening a file in Python.
I want to open a text file called 'coolStuff' in a Folder on my Desktop and this is how I type in the command but I still get an error message. The file exists and so I do not understand why I get that error message.
open("coolStuff.txt","r")
Traceback (most recent call last):
File "<pyshell#11>", line 1, in <module>
open("coolStuff.txt","r")
FileNotFoundError: [Errno 2] No such file or directory: 'coolStuff.txt'
If you want to simply supply a filename like coolStuff.txt without also providing a full directory, then you have to make sure that Python is running in the same directory as the file. If you aren't sure what directory Python is running in, try this:
import os
print(os.getcwd())
You have two options. Let's say your file is at C:\path\to\dir\coolStuff.txt.
1. Pass the full path:
open(r'C:\path\to\dir\coolStuff.txt', 'r')
2. Change the working directory first, then use the bare filename:
import os
os.chdir(r'c:\path\to\dir')
open('coolStuff.txt', 'r')
This happens because the file you want to open is not in the current directory. You can locate 'coolStuff.txt' in the terminal and launch your Python environment from that same directory.
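To make this kind of mistake easier to spot, the error can be made to say which directory Python was actually looking in. A small sketch, assuming Python 3 (open_with_context is a made-up name used only for this illustration):

```python
import os


def open_with_context(filename, mode="r"):
    """Open `filename`, adding the working directory to the error if it is missing.

    Hypothetical wrapper around the built-in open(); it changes nothing
    about how relative paths are resolved, it only improves the message.
    """
    try:
        return open(filename, mode)
    except FileNotFoundError as exc:
        raise FileNotFoundError(
            f"{filename!r} not found; current working directory is {os.getcwd()}"
        ) from exc
```

Calling open_with_context("coolStuff.txt") from the wrong directory then reports that directory in the error, which tells you whether you need a full path or an os.chdir first.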

Trying to make PLY work for the first time

I'm new to Python and I'm having some problems trying to make PLY work. For now, all I want is to successfully run the example from the PLY homepage.
At first I just downloaded PLY-3.8, put the ply folder in the same directory where I saved the example (calc.py), and ran it. To be clear, the calc.py file is in the C:\Users\...\Python directory and the ply folder is at C:\Users\...\Python\ply. But I got an ImportError: No module named 'ply'.
Then I searched for a while, tried to update something called distutils, tried to install the modules through Windows PowerShell, and so on, but none of that worked, so I reset the whole thing (reinstalling Python and all of that). I finally got past it by inserting into sys.path the path of the directory containing the script I was running (edit: in interactive mode), like this:
import sys
sys.path.insert(0,'C:\\Users\\ ... \\Python')
This fixed the ImportError but, and this is where I am now, there are a bunch of other errors:
Traceback (most recent call last):
File "C:\Users\...\Python\calc.py", line 48, in <module>
lexer = lex.lex()
File "C:\Users\...\Python\ply\lex.py", line 906, in lex
if linfo.validate_all():
File "C:\Users\...\Python\ply\lex.py", line 580, in validate_all
self.validate_rules()
File "C:\Users\...\Python\ply\lex.py", line 822, in validate_rules
self.validate_module(module)
File "C:\Users\...\Python\ply\lex.py", line 833, in validate_module
lines, linen = inspect.getsourcelines(module)
File "c:\users\...\python\python35\lib\inspect.py", line 930, in getsourcelines
lines, lnum = findsource(object)
File "c:\users\...\python\python35\lib\inspect.py", line 743, in findsource
file = getsourcefile(object)
File "c:\users\...\python\python35\lib\inspect.py", line 659, in getsourcefile
filename = getfile(object)
File "c:\users\...\python\python35\lib\inspect.py", line 606, in getfile
raise TypeError('{!r} is a built-in module'.format(object))
TypeError: <module '__main__'> is a built-in module
Now I have absolutely no idea what to do. I tried to search for a solution but had no luck. I would appreciate it if anyone could help me out.
I'm on Windows 10, using Python 3.5.0 and IEP as my IDE (www.iep-project.org), in case this information matters.
In short: I just want to successfully run the example from the PLY homepage and then I think I can figure out the rest.
EDIT: I found out that if I do:
import inspect
import __main__
inspect.getfile(__main__)
I get the exact same (last) error from before:
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "c:\users\...\python\python35\lib\inspect.py", line 606, in getfile
raise TypeError('{!r} is a built-in module'.format(object))
TypeError: <module '__main__'> is a built-in module
I think this is the culprit, but I still don't know how to fix it.
EDIT 2: I got it to work and answered the question explaining how, but if someone has a more complete answer, I would love to hear it.
To anyone having this problem: I found what the issue was. I still don't know exactly why it behaves this way, so if anyone has a more complete answer I would appreciate it (I'm still a newbie at Python).
Anyway, it seems this code can't be executed in interactive mode; it needs to be executed as a script. To do that in IEP, use Run > Run file as script or Ctrl+Shift+E. In IDLE, Open... the file (Ctrl+O) and then Run Module (F5).
As to why it can't be executed in Interactive mode, here's a little bit about the difference between interactive mode and running as script from the IEP wizard:
Interactive mode vs running as script
You can run the current file or the main file normally, or as a script. When run as script, the shell is restared (sic) to provide a clean environment. The shell is also initialized differently so that it closely resembles a normal script execution.
In interactive mode, sys.path[0] is an empty string (i.e. the current dir), and sys.argv is set to [''].
In script mode, __file__ and sys.argv[0] are set to the scripts filename, sys.path[0] and the working dir are set to the directory containing the script.
That explains a bit about why inspect.getfile(__main__) was throwing an error: __main__ had no __file__ attribute. It also explains why I had to insert the script's directory into sys.path: presumably the working directory in interactive mode was not the script's directory, so the empty sys.path[0] entry did not cover it.
I hope this helps someone.
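The difference described above can be checked directly. A small sketch (the helper name main_has_source_file is made up for illustration) that reports whether the __main__ module has a __file__ attribute, which is the thing inspect.getfile needs:

```python
import sys


def main_has_source_file():
    """Return True if the __main__ module carries a __file__ attribute.

    Hypothetical helper. Going by the IEP description quoted above, this
    should be True when a file is run as a script and False in an
    interactive session, but that is an expectation, not something
    verified here for every IDE.
    """
    main_module = sys.modules["__main__"]
    return getattr(main_module, "__file__", None) is not None


print("__main__ has a source file:", main_has_source_file())
print("sys.path[0] is:", repr(sys.path[0]))
```

Running this both ways in your IDE shows which mode you are in before you start debugging PLY itself.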

Py2exe isn't copying webdriver_prefs.json into builds

I'm using py2exe to compile a Python 2.7 script that uses Selenium 2.39.0 to open up Firefox windows and carry out some routines. In the past, I've been able to compile the code without any issue. Today though, after updating from Selenium 2.35 to 2.39, I'm running into trouble. When I try to run the .exe generated by the compiled code, I get the following error:
Exception in Tkinter callback
Traceback (most recent call last):
File "Tkinter.pyo", line 1410, in __call__
File "literatureonlineapi2.5.5.py", line 321, in startapi
File "selenium\webdriver\firefox\webdriver.pyo", line 43, in __init__
File "selenium\webdriver\firefox\firefox_profile.pyo", line 58, in __init__
IOError: [Errno 2] No such file or directory: 'C:\\Text\\Professional\\Digital H
umanities\\Programming Languages\\Python\\Query Literature Online\\LION 1.0\\2.5
\\2.5.5\\dist\\.\\selenium\\webdriver\\firefox\\webdriver_prefs.json'
(This error does not appear when I run the uncompiled code.)
I came across a google code page that led me to believe newer versions of Selenium have had trouble with this missing webdriver_prefs.json file, but that didn't help me sort out the problem.
Does anyone know how I might manually provide the missing file? I would be grateful for any help others can offer.
I found a solution, and thought I would post it in case others have a similar problem. I found the missing webdriver_prefs.json file tucked away in
C:\Python27\Lib\site-packages\selenium-2.39.0-py2.7.egg\selenium\webdriver\firefox\
After I had navigated to that directory, I grabbed the webdriver_prefs.json file and the webdriver.xpi file. I then copied both of those files into
dist\selenium\webdriver\firefox\
created by py2exe, and was able to run the compiled code as expected. God save the queen.
I did the following to fix the problem:
Create a sub-folder \selenium\webdriver\firefox\ under dist.
At a DOS command prompt, enter python.exe setup_firefox.py
You can either run the executable under dist, or copy all the files under "dist" to your own directory and run the executable from there.
Here is my setup_firefox.py:
from distutils.core import setup
import py2exe, sys, os
sys.argv.append('py2exe')
setup(
    console=[{'script': "test.py"}],
    options={
        "py2exe": {
            "skip_archive": True,
            "unbuffered": True,
            "optimize": 2
        },
    }
)
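A possible way to automate the copy step, instead of creating the sub-folder by hand, is the data_files option of distutils' setup(), which lists extra files to copy into the build output. The sketch below only builds that list; firefox_data_files is a made-up helper name, and whether the result lands exactly where this Selenium version looks for it is an assumption that has not been tested against these py2exe and Selenium versions.

```python
import os

# The two files the answers above copy by hand.
FIREFOX_FILES = ("webdriver_prefs.json", "webdriver.xpi")


def firefox_data_files(firefox_dir):
    """Build a data_files entry for the two Selenium Firefox support files.

    Hypothetical helper. `firefox_dir` is the selenium\\webdriver\\firefox
    directory inside your site-packages; the returned list has the
    [(target_dir, [source_paths])] shape that setup(data_files=...) expects.
    """
    sources = [os.path.join(firefox_dir, name) for name in FIREFOX_FILES]
    missing = [path for path in sources if not os.path.isfile(path)]
    if missing:
        raise IOError("missing Selenium files: %s" % ", ".join(missing))
    target = os.path.join("selenium", "webdriver", "firefox")
    return [(target, sources)]


# In setup_firefox.py this would be passed along roughly like:
#     setup(console=[...], data_files=firefox_data_files(firefox_dir), options={...})
```

The helper fails early with a clear message if either file is not where you expect, which is easier to diagnose than a missing file at run time.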
I had a related issue, for which I have found a workaround...
My issue
I was trying to run a python script that uses Selenium 2.48.0 and worked fine on the development machine but failed to open Firefox when py2exe'ed with the error message:
[Errno 2] No such file or directory:'C:\test\dist\library.zip\selenium\webdriver\firefox\webdriver_prefs.json'
Cause
I traced the problem to the following file in the selenium package
C:\Python27\Lib\site-packages\selenium\webdriver\firefox\firefox_profile.py
It was trying to open webdriver_prefs.json and webdriver.xpi from the same parent directory.
This works fine when running on the development machine, but when the script is run through py2exe, firefox_profile.pyc is added to library.zip while webdriver_prefs.json and webdriver.xpi aren't.
Even if you manually add these files to the appropriate location in the zip file, you will still get the 'file not found' message.
I think this is because the Selenium code can't cope with opening files from within the zip file.
Workaround
My workaround was to get py2exe to copy the two missing files to the dist directory and then modify firefox_profile.py to check the directory string.
If the string contains .zip, modify it to look in the parent directory instead.
webdriver_prefs.json
class FirefoxProfile(object):
    def __init__(self, profile_directory=None):
        if not FirefoxProfile.DEFAULT_PREFERENCES:
            '''
            The next couple of lines attempt to load the WEBDRIVER_PREFERENCES json file
            from the directory that this file is located in.
            However, if the calling script has been converted to an exe using py2exe, this file will
            now live within a zip file, which will cause the open line to fail with a 'file not found'
            message. I think this is because open can't cope with opening a file from within a zip file.
            As a workaround, in our application py2exe will copy the preferences to the parent directory
            of the zip file and attempt to load them from there.
            '''
            if '.zip' in os.path.dirname(__file__):
                # find the parent dir that contains the zipfile
                parentDir = __file__.split('.zip')[0]
                configFile = os.path.join(os.path.dirname(parentDir), WEBDRIVER_PREFERENCES)
                print "Running from within a zip file, using [%s]" % configFile
            else:
                configFile = os.path.join(os.path.dirname(__file__), WEBDRIVER_PREFERENCES)
            with open(configFile) as default_prefs:
                FirefoxProfile.DEFAULT_PREFERENCES = json.load(default_prefs)
webdriver.xpi
def _install_extension(self, addon, unpack=True):
    if addon == WEBDRIVER_EXT:
        addon = os.path.join(os.path.dirname(__file__), WEBDRIVER_EXT)
    tmpdir = None
    xpifile = None
    '''
    The next couple of lines attempt to install the webdriver xpi from the directory
    that this file is located.
    However if the calling script has been converted to an exe using py2exe this file will
    now live within a zip file which will cause the script to fail with a 'file not found'
    message. I think this is because it can't cope with opening a file from within a zip file.
    As a work round in our application py2exe will copy the .xpi to the parent directory
    of the zip file and attempt to load it from there
    '''
    if '.zip' in addon:
        # find the parent dir that contains the zipfile
        parentDir = os.path.dirname(addon.split('.zip')[0])
        addon = os.path.join(parentDir, os.path.basename(addon))
        print "Running from within a zip file, using [%s]" % addon
    if addon.endswith('.xpi'):
        tmpdir = tempfile.mkdtemp(suffix='.' + os.path.split(addon)[-1])
        compressed_file = zipfile.ZipFile(addon, 'r')
        for name in compressed_file.namelist():
            if name.endswith('/'):
                if not os.path.isdir(os.path.join(tmpdir, name)):
                    os.makedirs(os.path.join(tmpdir, name))
            else:
                if not os.path.isdir(os.path.dirname(os.path.join(tmpdir, name))):
                    os.makedirs(os.path.dirname(os.path.join(tmpdir, name)))
                data = compressed_file.read(name)
                with open(os.path.join(tmpdir, name), 'wb') as f:
                    f.write(data)
        xpifile = addon
        addon = tmpdir
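Both patches above apply the same path rewrite: if the path runs through a .zip archive, point it at a file of the same name sitting next to the archive. Pulled out as a standalone function it might look like the sketch below (sibling_of_zip is a made-up name; this is not part of Selenium, only a restatement of the string handling in the two snippets):

```python
import os


def sibling_of_zip(path):
    """Map a path inside a .zip archive to a same-named file beside the archive.

    Hypothetical helper mirroring the workaround above. Paths that do not
    pass through a '.zip' component are returned unchanged.
    """
    if ".zip" not in path:
        return path
    # Everything before the first '.zip' is the archive path minus its extension.
    archive_stem = path.split(".zip")[0]
    # The archive's directory is where py2exe was told to copy the loose files.
    return os.path.join(os.path.dirname(archive_stem), os.path.basename(path))
```

With the path from the error message above, the file inside library.zip would be looked up directly in the dist directory instead. Like the original patch, this is a plain substring check, so a directory whose name merely contains ".zip" would also be rewritten.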
I found the solution: the py2exe build can't open files inside the zip file. So after copying webdriver_prefs.json and webdriver.xpi, decompress library.zip into a folder named "library.zip".

calling python script from another script

I'm trying to get a simple Python script to call another script, just to understand better how it works. The 'main' code goes like this:
#!/usr/bin/python
import subprocess
subprocess.call('kvadrat.py')
and the script it calls - kvadrat.py:
#!/usr/bin/python
def kvadriranje(x):
    kvadrat = x * x
    return kvadrat
print kvadriranje(5)
Called script works on its own, but when called through 'main' script error occurs:
Traceback (most recent call last):
File "/Users/user/Desktop/Python/General Test.py", line 5, in <module>
subprocess.call('kvadrat.py')
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/subprocess.py", line 444, in call
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/subprocess.py", line 595, in __init__
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/subprocess.py", line 1106, in _execute_child
OSError: [Errno 2] No such file or directory
Obviously something's wrong, but being a beginner I don't see what.
You need to give it the full path to the script that you are trying to call. If you want to do this dynamically (and you're in the same directory), you can do:
import os
full_path = os.path.abspath('kvadrat.py')
Have you tried:
from subprocess import call
call(["python","kvadrat.py"]) #if in same directory, else get abs path
You should also check if your file is there:
import os
print os.path.exists('kvadrat.py')
subprocess.call requires that the file is executable and can be found on the PATH. On Unix systems, you can use subprocess.call(['./kvadrat.py']) to execute a kvadrat.py file in the current working directory, after making sure kvadrat.py has executable permissions; or you can copy it to a directory in your PATH, such as /usr/local/bin, and then it is executable from anywhere, as you wanted.
Most of the time you do not want to run other Python applications using subprocess but instead just import them as modules, however...
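Putting the suggestions above together: a sketch that runs another script through the interpreter that is already running, so the called file needs neither executable permissions nor a PATH entry. This assumes Python 3 (the question itself uses Python 2 print syntax), and run_script is a made-up helper name:

```python
import subprocess
import sys


def run_script(script_path, *args):
    """Run another Python script with the current interpreter.

    Hypothetical helper: `script_path` should be an absolute path, or a
    path relative to the current working directory. Returns the exit
    code of the child process.
    """
    command = [sys.executable, script_path] + list(args)
    return subprocess.call(command)


# Typical use from the 'main' script, building the path from its own location:
#     import os
#     here = os.path.dirname(os.path.abspath(__file__))
#     run_script(os.path.join(here, "kvadrat.py"))
```

Using sys.executable instead of the literal string "python" avoids accidentally starting a different Python installation than the one running the calling script.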
