Install virtualenv fails because an HTML is downloaded not a tar.gz - python

I followed the instructions here:
However, when I run tar xvfz on the downloaded file, I get:
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
This happens because I have apparently downloaded an HTML file:
file virtualenv-16.0.tar.gz
gives:
virtualenv-16.0.tar.gz: HTML document, ASCII text, with no line terminators
I guess the problem could be some setting on my machine, but I have no idea which one.
Thanks for any help.

This happens because the file URL given in the instructions you followed is no longer valid. Instead of the expected tar.gz file, the server returns the following HTML:
<html><head><title>301 Moved Permanently</title></head><body><center><h1>301 Moved Permanently</h1></center></body></html>
Solution
Try the URL below instead:
https://files.pythonhosted.org/packages/33/bc/fa0b5347139cd9564f0d44ebd2b147ac97c36b2403943dbee8a25fd74012/virtualenv-16.0.0.tar.gz
The download links are listed here:
https://pypi.org/project/virtualenv/#files
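To catch this kind of problem before running tar, the first bytes of the download can be checked. The following is a minimal sketch, assuming Python 3; the helper name looks_like_gzip is made up for illustration. It relies only on the fact that every gzip stream starts with the two bytes 0x1f 0x8b, which an HTML redirect page will not.

```python
GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def looks_like_gzip(path):
    """Return True if the file starts with the gzip magic number.

    Hypothetical helper: it only inspects the first two bytes, so it can
    tell an HTML error page from an archive, but it does not prove that
    the archive is complete or uncorrupted.
    """
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC
```

Calling looks_like_gzip("virtualenv-16.0.tar.gz") on the file from the question would be expected to return False, since that file is an HTML document.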

Related

Protobuf Decode Error when reading bytes from within Bitbucket

I have a protobuf file saved as bytes on Windows.
I am reading it as follows:
tlog = tlog_schema_pb2.TLog()
with open("tests/unittests/data/tlog.proto", "rb") as f:
    tlog.ParseFromString(f.read())
It all works fine locally. But when I push my changes with git to Bitbucket, I get an error there:
google.protobuf.message.DecodeError: Error parsing message with type 'globusdigital.tlogprocessing.TLog'
I cannot understand what the cause could be. What can change when I push to Bitbucket?
It turned out that the problem was in pre-commit.
I had the trailing-whitespace check enabled there, and it was corrupting the protobuf file.
To solve it, just add
files: '\.pyi?$'
at the end of .pre-commit-config.yaml, so that the hooks only run on Python files. (It would be nicer to exclude just the protobuf file, but I did not figure out how.)

Python: Error tokenizing data. C error: Calling read(nbytes) on source failed with input gzip file

I am using conda python 2.7
python --version
Python 2.7.12 :: Anaconda 2.4.1 (x86_64)
I have the following code to read large gzip files:
df = pd.read_csv(os.path.join(filePath, fileName),
                 sep='|', compression = 'gzip', dtype='unicode', error_bad_lines=False)
but when I read the file I get the following error:
pandas.parser.CParserError: Error tokenizing data. C error: Calling read(nbytes) on source failed. Try engine='python'.
Segmentation fault: 11
I read all the existing answers but most of those questions had errors such as additional columns. I was already handling that with error_bad_lines=False option.
What are my options here?
Found something interesting when I tried to uncompress the file:
gunzip -k myfile.txt.gz
gunzip: myfile.txt.gz: unexpected end of file
gunzip: myfile.txt.gz: uncompress failed
Chances are that the path you passed is actually a folder rather than the file that needs to be read.
pandas.read_csv cannot read folders; it needs the explicit name of a compatible file.
I didn't really find a Python solution, but I managed to find one using Unix tools:
First I ran zless myfile.txt.gz > uncompressedMyfile.txt
Then I used sed to remove the last line, because I could clearly see that the last line was corrupt:
sed '$d' uncompressedMyfile.txt
Then I gzipped the file again with gzip -k uncompressedMyfile.txt
After that I was able to read the file successfully with the following Python code:
try:
    df = pd.read_csv(os.path.join(filePath, fileName),
                     sep='|', compression = 'gzip', dtype='unicode', error_bad_lines=False)
except CParserError:
    print "Something is wrong with the file"
return df
Sometimes the error shows up if you already have the file open. Try closing the file and re-running.
The input gzip file is corrupted. Get a proper copy of the file from the source, or try a repair tool, before you pass it along to pandas.
This error also occurs when trying to load files from OneDrive while you are not logged in or OneDrive is not running on your PC.
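Several of the answers above come down to the archive being truncated or corrupted, as the gunzip output in the question suggests. A minimal sketch of checking this from Python before calling pandas is shown below. It assumes Python 3 (the question uses 2.7), and the helper name gzip_is_intact is hypothetical.

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1024 * 1024):
    """Return True if the whole gzip stream at `path` decompresses cleanly.

    Hypothetical helper. A missing file or a directory also yields False,
    because those cases raise OSError subclasses.
    """
    try:
        with gzip.open(path, "rb") as fh:
            # Read in chunks so a large file never has to fit in memory.
            while fh.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        # EOFError: stream ends early; OSError: not gzip or bad checksum;
        # zlib.error: corrupt compressed data.
        return False
    return True
```

If gzip_is_intact returns False for the input, re-downloading or repairing the file is likely to be more productive than changing the read_csv arguments.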

file overwrite in python

I am using Python to crawl web pages iteratively, storing the pages in three HTML files. However, these files are not being overwritten, and I keep getting the old contents. Here is the code I am using:
def Vals(a,b):
    file1="C:\\Users\\YAS_ayush\\Desktop\\dataset_recommendation\\file1.html"
    file2="C:\\Users\\YAS_ayush\\Desktop\\dataset_recommendation\\file22.html"
    file3="C:\\Users\\YAS_ayush\\Desktop\\dataset_recommendation\\file33.html"
    Query1='"http://scholar.google.com/scholar?q=%22'+a+'%22&btnG=&hl=en&as_sdt=0%2C24"'
    URL1='wget --user-agent Mozilla '+Query1+' -O '+file1
    Query2='"http://scholar.google.com/scholar?q=%22'+b+'%22&btnG=&hl=en&as_sdt=0%2C24"'
    URL2='wget --user-agent Mozilla '+Query2+' -O '+file2
    Query3='"http://scholar.google.com/scholar?q=%22'+a+'%22+%22'+b+'%22&btnG=&hl=en&as_sdt=0%2C24"'
    URL3='wget --user-agent Mozilla '+Query3+' -O '+file3
    ## print Query1
    ## print Query2
    ## print Query3
    ##
    ## print URL1
    ## print URL2
    ## print URL3
    os.system("wget "+ URL1)
    os.system("wget "+ URL2)
    os.system("wget "+ URL3)
    f1 = open(file1,'r+')
    f2 = open(file2,'r+')
    f3 = open(file3,'r+')
    S1=str(f1.readlines())
    start=S1.find("About")+6
    stop=S1.find("results",start)-1
    try:
        val1=float((S1[start:stop]).replace(",",""))
    except ValueError:
        val1=Reads('C:\\Users\\YAS_ayush\\Desktop\\dataset_recommendation\\file1.html')
    S1=str(f2.readlines())
    #f2.close()
    start=S1.find("About")+6
    stop=S1.find("results",start)-1
    try:
        val2=float((S1[start:stop]).replace(",",""))
    except ValueError:
        val2=Reads('C:\\Users\\YAS_ayush\\Desktop\\dataset_recommendation\\file22.html')
    S1=str(f3.readlines())
    #f3.close()
    start=S1.find("About")+6
    stop=S1.find("results",start)-1
    try:
        val3=float((S1[start:stop]).replace(",",""))
    except ValueError:
        val3=Reads('C:\\Users\\YAS_ayush\\Desktop\\dataset_recommendation\\file33.html')
    f1.truncate()
    f2.truncate()
    f3.truncate()
    f1.close()
    f2.close()
    f3.close()
    return (val1,val2,val3)
Can anyone tell if there is some error in closing the files or how shall I close them for my purpose.
Thanks
You're using the -O (capital O) option, which concatenates everything into one file:
‘-O file’
‘--output-document=file’
The documents will not be written to the appropriate files, but all will be concatenated together and written to file. If ‘-’ is used as file, documents will be printed to standard output, disabling link conversion. (Use ‘./-’ to print to a file literally named ‘-’.)
Use of ‘-O’ is not intended to mean simply “use the name file instead of the one in the URL;” rather, it is analogous to shell redirection: wget -O file http://foo is intended to work like wget -O - http://foo > file; file will be truncated immediately, and all downloaded content will be written there.
For this reason, ‘-N’ (for timestamp-checking) is not supported in combination with ‘-O’: since file is always newly created, it will always have a very new timestamp. A warning will be issued if this combination is used.
Similarly, using ‘-r’ or ‘-p’ with ‘-O’ may not work as you expect: Wget won't just download the first file to file and then download the rest to their normal names: all downloaded content will be placed in file. This was disabled in version 1.11, but has been reinstated (with a warning) in 1.11.2, as there are some cases where this behavior can actually have some use.
Note that a combination with ‘-k’ is only permitted when downloading a single document, as in that case it will just convert all relative URIs to external ones; ‘-k’ makes no sense for multiple URIs when they're all being downloaded to a single file; ‘-k’ can be used only when the output is a regular file.
This snippet was taken from wget's manual.
Hope this helps.
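As a side note on the Python half of the question, here is a minimal sketch of how truncation behaves for the two ways of opening a file that are relevant here. It assumes Python 3, and the function names overwrite and read_then_truncate are made up for illustration. Opening with mode 'w' empties the file immediately, much like the shell redirection the manual describes, whereas truncate() with no argument cuts the file at the current position, which is the end of the file after readlines().

```python
import os
import tempfile


def overwrite(path, text):
    """Replace the file's contents: mode 'w' truncates the file on open."""
    with open(path, "w") as fh:
        fh.write(text)


def read_then_truncate(path):
    """Read everything in 'r+' mode, then call truncate() with no argument."""
    with open(path, "r+") as fh:
        fh.readlines()  # the file position is now at the end of the file
        fh.truncate()   # truncates at the current position, so nothing is removed


if __name__ == "__main__":
    demo = os.path.join(tempfile.mkdtemp(), "page.html")
    overwrite(demo, "<html>old page</html>")
    read_then_truncate(demo)
    print(os.path.getsize(demo))  # size is unchanged by truncate() at end of file
    overwrite(demo, "<html>new page</html>")
    with open(demo) as fh:
        print(fh.read())
```

Under these assumptions, truncate(0) or reopening the file with mode 'w' would be the ways to actually empty it.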

Wrong filename when pushing a file to a user through HTTP using Python over Apache

I'm adding functionality to our website so that users can download files stored in a database. The problem is that I cannot properly specify the filename for the user: the user is instead prompted to save the file under the name of the main Python script running the website. I am setting the Content-Disposition header, but it's not working as expected. I've edited the code down to the following, which still fails to work:
import sys, os
import mydatabasemodule
PDFReport = [...read file from database ...]
print('Content-Type: application/octet-stream\n')
print('Content-Disposition: attachment; filename=\"mytest.pdf\"\n')
print(report)
sys.stdout.close()
Running this code prompts the user to download the file as mysite.py. The PDF downloads correctly just with the wrong filename.
Can anyone tell what I'm doing wrong here? In the full version of the code, I also set Content-Description and Content-Length but that also fails. The files are small and I am trying to avoid saving them to disk but even when I do so, the same problem happens.
[edit]
The webserver is running CentOS 5.5, Python 2.4.3, Apache 2.2.3, and mod_python. I've tested this on an Ubuntu 11.04 client using Google Chrome 17.0.963.46 beta and Firefox 13. If I instead try to show the PDF inline:
print('Content-type: application/pdf\n')
print('Content-Disposition: inline; filename=\"mytest.pdf\"\n')
print("Content-Length: %d" % len(report))
then Chrome shows the PDF (with a plugin) and Firefox asks to save the file, recognizing it as a PDF but still with the wrong filename i.e. the filename is still the script name.
[edit]
The solution was given below by Mike. I think the problem was the newline I added in the first line above. Since print adds a newline, this second newline signaled the end of the header so the Content-Disposition line was never read. Thanks to all for the quick help!
In Python versions < 3.0, print is a statement rather than a function, and it automatically appends a newline character. Try this:
import sys, os
import mydatabasemodule
PDFReport = [...read file from database ...]
print 'Content-Type: application/octet-stream'
print 'Content-Disposition: attachment; filename="mytest.pdf"'
print
sys.stdout.write(PDFReport)
sys.stdout.flush()
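The same idea can be sketched for Python 3, where the header block is built explicitly so that exactly one blank line separates the headers from the body. This is only an illustration under stated assumptions: a plain CGI-style script rather than the mod_python setup from the question, an ASCII filename, and a made-up function name build_download_response.

```python
import sys


def build_download_response(payload, filename):
    """Return raw response bytes: header lines, one blank line, then the body.

    Hypothetical helper; `payload` is the file content as bytes and
    `filename` is assumed to be plain ASCII without quotes.
    """
    headers = [
        "Content-Type: application/octet-stream",
        'Content-Disposition: attachment; filename="%s"' % filename,
        "Content-Length: %d" % len(payload),
    ]
    # Each header ends with one CRLF; the additional CRLF ends the header block.
    head = "\r\n".join(headers) + "\r\n\r\n"
    return head.encode("ascii") + payload


if __name__ == "__main__":
    response = build_download_response(b"%PDF-1.4 example bytes", "mytest.pdf")
    # Write bytes through the binary buffer so print() adds no extra newlines.
    sys.stdout.buffer.write(response)
    sys.stdout.buffer.flush()
```

Building the whole header block in one place makes it harder to emit a stray blank line after the first header, which is the mistake identified in the question.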

Decompress zip file with password fails - bug in Python?

I get a strange error in Python. When I try to extract a password-protected file using the zipfile module, I get an exception when I set "oy" as the password. Everything else seems to work. Is this a bug in the ZipFile module?
import zipfile
zip = zipfile.ZipFile("file.zip", "r")
zip.setpassword("oy".encode('utf-8'))
zip.extractall() #Above password "oy" generates the error here
zip.close()
This is the exception I get:
Traceback (most recent call last):
File "unzip.py", line 4, in <module>
zip.extractall()
File "C:\Program Files\Python32\lib\zipfile.py", line 1002, in extrac
l
self.extract(zipinfo, path, pwd)
File "C:\Program Files\Python32\lib\zipfile.py", line 990, in extract
return self._extract_member(member, path, pwd)
File "C:\Program Files\Python32\lib\zipfile.py", line 1035, in _extra
member
shutil.copyfileobj(source, target)
File "C:\Program Files\Python32\lib\shutil.py", line 65, in copyfileo
buf = fsrc.read(length)
File "C:\Program Files\Python32\lib\zipfile.py", line 581, in read
data = self.read1(n - len(buf))
File "C:\Program Files\Python32\lib\zipfile.py", line 633, in read1
max(n - len_readbuffer, self.MIN_READ_SIZE)
zlib.error: Error -3 while decompressing: invalid block type
If I use UTF-16 as encoding I get this error:
zlib.error: Error -3 while decompressing: invalid distance too far back
EDIT
I have now tested on a virtual Linux machine with following stuff:
Python version: 2.6.5
I created a password protected zip file with zip -e file.zip hello.txt
Now it seems the problem is something else. Now I can extract the zip file even if the password is wrong!
try:
    zip.setpassword("ks") # "ks" is wrong password but it still extracts the zip
    zip.extractall()
except RuntimeError:
    print "wrong!"
Sometimes I can extract the zip file with an incorrect password. The file (inside the zip file) is then extracted but when I try to open it the information seems to be corrupted/decrypted.
If there's a problem with the password, usually you get the following exception:
RuntimeError: ('Bad password for file', <zipfile.ZipInfo object at 0xb76dec2c>)
Since your exception complains about the block type, your .zip archive is most probably corrupted. Have you tried unpacking it with a standalone unzip utility?
Or maybe you created it with something unusual, like 7zip, which can produce incompatible .zip archives.
You don't provide enough information (OS version? Python version? ZIP archive creator and contents? Are there many files in those archives, or a single file per archive? Do all of the files give the same error, or can you unpack some of them?), so here is a quick Q&A section that should help you find and remedy the problem.
Q1. Is this a bug in Python?
A1. Unlikely.
Q2. What might cause this behaviour?
A2. Broken zip files or incompatible zip compressors. Since you don't tell us anything, it's hard to point to the exact cause.
Q3. How to find the cause?
A3. Try to isolate the problem: find the file that gives you the error, try zip.testzip() and/or decompress that particular file with a different unzip utility, and share the results. Only you have access to the problematic files, so nobody can help you unless you try something yourself.
Q4. How to fix this?
A4. You cannot. Use a different zip extractor; ZipFile won't work.
Try using the testzip() method to check the file's integrity before extracting files.
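A minimal sketch of such a check, assuming Python 3 and a hypothetical helper name first_bad_member, could look like this. ZipFile.testzip() reads every member and returns the name of the first one with a bad CRC or header, or None if all of them pass. Other failures, such as a wrong password or badly corrupted compressed data, may still surface as exceptions, so the caller should be prepared for those.

```python
import zipfile


def first_bad_member(path, password=None):
    """Return the name of the first member that fails its check, or None.

    Hypothetical helper around ZipFile.testzip(). `password`, if given,
    must be bytes on Python 3.
    """
    with zipfile.ZipFile(path, "r") as archive:
        if password is not None:
            archive.setpassword(password)
        return archive.testzip()
```

For the archive in the question this would be called as first_bad_member("file.zip", b"oy"); a non-None result or an exception would point at the archive rather than at the extraction code.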
It could possibly be a bug in zipfile, or a bug in your zip implementation. I noticed that your line numbers do not match mine, so I guess you are running a Python 3.2 release earlier than the current 3.2.3 that I have.
As for your code, it works for me on Python 3.2.3 on Linux. I suggest you update to the latest 3.2.x, as there seem to be a number of bug fixes related to zipfile and zlib, including fixes for crashes.
