urllib along with json to save to a variable


Please correct my code. I am trying to save the JSON result of this web page to a variable in Python.
Error:
Traceback (most recent call last):
File "C:/Users/Varen/Desktop/json_v1.py", line 5, in <module>
json.dump(link, f)
File "C:\Python27\lib\json\__init__.py", line 189, in dump
for chunk in iterable:
File "C:\Python27\lib\json\encoder.py", line 442, in _iterencode
o = _default(o)
File "C:\Python27\lib\json\encoder.py", line 184, in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: <addinfourl at 53244992 whose fp = <socket._fileobject object at 0x032B4AF0>> is not JSON serializable
Code:
import urllib
import json
link = urllib.urlopen("http://www.saferproducts.gov/RestWebServices/Recall?RecallDateStart=2015-01-01&RecallDateEnd=2015-12-31&format=json")
with open('link.json', 'w') as f:
    json.dump(link, f)

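You need to read the data from the file-like object returned by urlopen(). json.dump() cannot serialize the response object itself, and passing link.read() would store the whole body as a single escaped JSON string. Parse the response with json.load() to get a Python variable, then dump that:
import urllib
import json
link = urllib.urlopen("http://www.saferproducts.gov/RestWebServices/Recall?RecallDateStart=2015-01-01&RecallDateEnd=2015-12-31&format=json")
data = json.load(link)  # parse the response body into Python objects
with open('link.json', 'w') as f:
    json.dump(data, f)  # write the parsed data back out as JSON
This will do the trick.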
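On Python 3 the same idea can be sketched with urllib.request. This is a minimal sketch rather than the answerer's code: fetch_json and save_json are hypothetical helper names, and the opener parameter is an assumption added only so the function can be tried without network access.

```python
import json
from urllib.request import urlopen


def fetch_json(url, opener=urlopen):
    """Fetch url and return the parsed JSON body as Python objects.

    opener is a hypothetical hook: any callable that takes a URL and
    returns a file-like object usable as a context manager.
    """
    with opener(url) as response:
        # json.load reads the body and parses it in one step
        return json.load(response)


def save_json(data, path):
    """Write already-parsed data back to disk as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
```

Calling fetch_json with the recall URL from the question should leave the parsed result in a variable, which save_json can then write to link.json.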

Related

How to save SHA256 object to a file?

(I'm doing all this in Python 3.10.4 using pycryptodome.)
I'm trying to do this process:
1. Get a hash of a file
2. Save that hash somewhere
3. Load that hash and perform RSA signing using a private key
I'm having a problem in step 3: to save the hash I have to save it as a string, and the signer does not accept a string.
I've tried using pickle, but I'm getting
"ctypes objects containing pointers cannot be pickled"
Code generating the hash:
sha256 = SHA256.new()
with open(fileDir, 'rb') as f:
    while True:
        data = f.read(BUF_SIZE)
        if not data:
            break
        sha256.update(data)
Code to perform the signing:
get_file(fileName + '.hash', directory)
with open(currentDir + '/client_files/downloaded/' + fileName + '.hash', 'r') as f:
    hash_data = f.read()
with open(currentDir + '/client_files/private_key.pem', 'rb') as f:
    private_key = RSA.importKey(f.read())
print(private_key)
signer = PKCS1_v1_5.new(private_key)
signature = signer.sign(hash_data)
The error I'm getting:
Traceback (most recent call last):
File "c:\Users\User\Documents\Coding\VSCode Projects\practiceGround\sec_cloud_project\client\client.py", line 168, in <module>
main()
File "c:\Users\User\Documents\Coding\VSCode Projects\practiceGround\sec_cloud_project\client\client.py", line 163, in main
sign(fileName, 'worker_test_files')
File "c:\Users\User\Documents\Coding\VSCode Projects\practiceGround\sec_cloud_project\client\client.py", line 120, in sign
signature = signer.sign(hash_data)
File "C:\Users\User\anaconda3\envs\nscc_project\lib\site-packages\Crypto\Signature\pkcs1_15.py", line 77, in sign
em = _EMSA_PKCS1_V1_5_ENCODE(msg_hash, k)
File "C:\Users\User\anaconda3\envs\nscc_project\lib\site-packages\Crypto\Signature\pkcs1_15.py", line 191, in _EMSA_PKCS1_V1_5_ENCODE
digestAlgo = DerSequence([ DerObjectId(msg_hash.oid).encode() ])
AttributeError: 'str' object has no attribute 'oid'
Note that I'm currently saving the original hash as a string to a text file. If I try to use pickle to save the object as a whole, I get this error:
with open(currentDir + '/worker_files/sha256.pickle', 'wb') as f:
    pickle.dump(sha256, f)
Traceback (most recent call last):
File "c:\Users\User\Documents\Coding\VSCode Projects\practiceGround\sec_cloud_project\worker\worker.py", line 188, in <module>
main()
File "c:\Users\User\Documents\Coding\VSCode Projects\practiceGround\sec_cloud_project\worker\worker.py", line 179, in main
hash_file(fileName, 'worker_test_files')
File "c:\Users\User\Documents\Coding\VSCode Projects\practiceGround\sec_cloud_project\worker\worker.py", line 55, in hash_file
pickle.dump(sha256, f)
ValueError: ctypes objects containing pointers cannot be pickled

Thanks to @Topaco. Switching to the cryptography library for both hashing and signing seemed to work:
hash with cryptography, dump to a file with pickle, then load and sign with cryptography again.
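What needs to survive between the two steps is the digest bytes, not the hash object, which is apparently why pickling the pycryptodome object fails. The sketch below uses the standard library's hashlib in place of either crypto package (a swapped-in technique, not the code from the answer) and stores the digest as hex text. The helper names and the BUF_SIZE value are hypothetical. The signing step is left out. The cryptography package documents a Prehashed wrapper in its asymmetric utils module for signing a digest that has already been computed.

```python
import hashlib

BUF_SIZE = 65536  # assumed chunk size; the question does not give its value


def hash_file(path):
    """Return the raw SHA-256 digest (32 bytes) of the file at path."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object
        for chunk in iter(lambda: f.read(BUF_SIZE), b""):
            sha256.update(chunk)
    return sha256.digest()


def save_digest(digest, path):
    """Store the digest as hex text, which is safe to write in text mode."""
    with open(path, "w") as f:
        f.write(digest.hex())


def load_digest(path):
    """Read the hex text back and return the original digest bytes."""
    with open(path) as f:
        return bytes.fromhex(f.read().strip())
```

The bytes returned by load_digest are identical to what hash_file produced, so they can be handed to whichever signing API accepts a precomputed digest.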

Decompress a gzip compressed dictionary object using python

I need to decompress the string "H4sIAAAAAAAA/6tWKkktLjFUsjI00lEAs42UrCAMpVoAbyLr+R0AAAA=", which is the compressed form of {"test1":12, "test2": "test"}. In Python I'm using the gzip library and getting the response below:
>>> import gzip
>>> gzip.decompress("H4sIAAAAAAAA/6tWKkktLjFUsjI00lEAs42UrCAMpVoAbyLr+R0AAAA=".encode("UTF-8"))
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/data/python-3.8.10/lib/python3.8/gzip.py", line 548, in decompress
return f.read()
File "/data/python-3.8.10/lib/python3.8/gzip.py", line 292, in read
return self._buffer.read(size)
File "/data/python-3.8.10/lib/python3.8/gzip.py", line 479, in read
if not self._read_gzip_header():
File "/data/python-3.8.10/lib/python3.8/gzip.py", line 427, in _read_gzip_header
raise BadGzipFile('Not a gzipped file (%r)' % magic)
gzip.BadGzipFile: Not a gzipped file (b'H4')
Is there any way to decompress the string in Python?

The string is Base64 encoded, so decode it before decompressing:
import gzip
import base64
b = base64.b64decode('H4sIAAAAAAAA/6tWKkktLjFUsjI00lEAs42UrCAMpVoAbyLr+R0AAAA=')
r = gzip.decompress(b)
print(r.decode())
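Since the payload is described as a dictionary, a further json.loads step turns the decompressed text into a Python dict. The following is a minimal round-trip sketch; decode_payload and encode_payload are hypothetical helper names, and UTF-8 is assumed as the text encoding.

```python
import base64
import gzip
import json


def decode_payload(encoded):
    """Base64-decode, gunzip, then parse the JSON text into Python objects."""
    compressed = base64.b64decode(encoded)
    text = gzip.decompress(compressed).decode("utf-8")
    return json.loads(text)


def encode_payload(obj):
    """Inverse of decode_payload: JSON-encode, gzip, then Base64-encode."""
    raw = json.dumps(obj).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


# Round trip with the dictionary from the question
sample = {"test1": 12, "test2": "test"}
restored = decode_payload(encode_payload(sample))
```

The order matters: the Base64 layer was applied last when encoding, so it has to be removed first when decoding.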

How do we convert HTML to PDF using Python? Is there any code you can share?

I have tried the library called pdftotree, but I didn't get any result.
This is the code:
import pdftotree
file = open('C:/Users/chaitanya.naidu/Downloads/test.pdf', 'rb')
f = pdftotree.parse(file)
I am getting this error:
Traceback (most recent call last):
File "<ipython-input-4-4a9a6b72801d>", line 1, in <module>
f = pdftotree.parse(file)
File "C:\Users\chaitanya.naidu\AppData\Local\Continuum\Anaconda3\lib\site-packages\pdftotree\core.py", line 63, in parse
if not extractor.is_scanned():
File "C:\Users\chaitanya.naidu\AppData\Local\Continuum\Anaconda3\lib\site-packages\pdftotree\TreeExtract.py", line 121, in is_scanned
self.parse()
File "C:\Users\chaitanya.naidu\AppData\Local\Continuum\Anaconda3\lib\site-packages\pdftotree\TreeExtract.py", line 91, in parse
for page_num, layout in enumerate(analyze_pages(self.pdf_file)):
File "C:\Users\chaitanya.naidu\AppData\Local\Continuum\Anaconda3\lib\site-packages\pdftotree\utils\pdf\pdf_utils.py", line 117, in analyze_pages
with open(os.path.realpath(file_name), "rb") as fp:
File "C:\Users\chaitanya.naidu\AppData\Local\Continuum\Anaconda3\lib\ntpath.py", line 542, in abspath
path = os.fspath(path)
TypeError: expected str, bytes or os.PathLike object, not _io.BufferedReader

You can use pdfkit, a wrapper around the wkhtmltopdf tool (which must be installed separately). For example:
import pdfkit
pdfkit.from_url('http://google.com', 'out.pdf')
pdfkit.from_file('test.html', 'out.pdf')
pdfkit.from_string('Hello!', 'out.pdf')

How to download ms word docx file in python with raw data from http url

If the following URL is opened in a browser, the docx file is downloaded. I want to automate the download with Python.
https://hudoc.echr.coe.int/app/conversion/docx/?library=ECHR&id=001-176931&filename=CASE OF NDIDI v. THE UNITED KINGDOM.docx&logEvent=False
I have tried the following:
from docx import Document
import requests
import json
from bs4 import BeautifulSoup
dwnurl = 'https://hudoc.echr.coe.int/app/conversion/docx/?library=ECHR&id=001-176931&filename=CASE%20OF%20NDIDI%20v.%20THE%20UNITED%20KINGDOM.docx&logEvent=False'
doc = requests.get(dwnurl)
print(doc.content) #printing the document like b'PK\x03\x04\x14\x00\x06\x00\x08\x00\x00\x00!\x00!\xfb\x16\x01\x16\x02\x00\x00\xec\x0c\x00\x00\x13\x00\xc4\x01[Content_Types].xml \xa2\xc0\
print(doc.raw) #printing the document like <urllib3.response.HTTPResponse object at 0x063D8BD0>
document = Document(doc.content)
document.save('test.docx')
# Document(doc.content) raises this error:
Traceback (most recent call last):
File "scraping_hudoc.py", line 40, in <module>
document = Document(doc.content)
File "C:\Users\204387\AppData\Local\Programs\Python\Python36-32\lib\site-packages\docx\api.py", line 25, in Document
document_part = Package.open(docx).main_document_part
File "C:\Users\204387\AppData\Local\Programs\Python\Python36-32\lib\site-packages\docx\opc\package.py", line 116, in open
pkg_reader = PackageReader.from_file(pkg_file)
File "C:\Users\204387\AppData\Local\Programs\Python\Python36-32\lib\site-packages\docx\opc\pkgreader.py", line 32, in from_file
phys_reader = PhysPkgReader(pkg_file)
File "C:\Users\204387\AppData\Local\Programs\Python\Python36-32\lib\site-packages\docx\opc\phys_pkg.py", line 101, in __init__
self._zipf = ZipFile(pkg_file, 'r')
File "C:\Users\204387\AppData\Local\Programs\Python\Python36-32\lib\zipfile.py", line 1108, in __init__
self._RealGetContents()
File "C:\Users\204387\AppData\Local\Programs\Python\Python36-32\lib\zipfile.py", line 1171, in _RealGetContents
endrec = _EndRecData(fp)
File "C:\Users\204387\AppData\Local\Programs\Python\Python36-32\lib\zipfile.py", line 241, in _EndRecData
fpin.seek(0, 2)
AttributeError: 'bytes' object has no attribute 'seek'

I saved the MS Word docx file by streaming the response straight to disk:
import requests
def save_link(book_link, book_name):
    the_book = requests.get(book_link, stream=True)
    with open(book_name, 'wb') as f:
        for chunk in the_book.iter_content(1024 * 1024 * 2):  # 2 MB chunks
            f.write(chunk)
save_link("https://hudoc.echr.coe.int/app/conversion/docx/?library=ECHR&id=001-176931&filename=CASE%20OF%20NDIDI%20v.%20THE%20UNITED%20KINGDOM.docx&logEvent=False","CASE OF NDIDI v. THE UNITED KINGDOM.docx")
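The traceback in the question ends in zipfile calling seek() on a bytes object: a .docx is a zip package, and the reader wants a path or a seekable file-like object rather than raw bytes. The sketch below shows the in-memory wrapping with the standard library only. open_package and make_sample_package are hypothetical names, and the tiny zip is stand-in data so the example runs offline. With python-docx, Document(io.BytesIO(doc.content)) should work the same way, since Document accepts a path or a file-like object, but that call is not run here.

```python
import io
import zipfile


def open_package(content):
    """Open zip-based package bytes (such as a .docx) directly from memory."""
    # io.BytesIO gives the raw bytes the seek()/read() interface ZipFile needs
    return zipfile.ZipFile(io.BytesIO(content), "r")


def make_sample_package():
    """Build a tiny in-memory zip as stand-in data for a downloaded file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
    return buf.getvalue()


with open_package(make_sample_package()) as package:
    names = package.namelist()
    print(names)
```

Streaming to disk, as in the answer above, remains the simpler choice when the goal is only to save the file.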

Python lib execute error

I made this Python lib with a function that uses urllib and urllib2, but when I execute the lib's functions from the Python shell I get this error:
>>> from sabermanlib import geturl
>>> geturl("roblox.com","ggg.html")
Traceback (most recent call last):
File "<pyshell#11>", line 1, in <module>
geturl("roblox.com","ggg.html")
File "sabermanlib.py", line 21, in geturl
urllib.urlretrieve(Address,File)
File "C:\Users\Andres\Desktop\ddd\Portable Python 2.7.5.1\App\lib\urllib.py", line 94, in urlretrieve
return _urlopener.retrieve(url, filename, reporthook, data)
File "C:\Users\Andres\Desktop\ddd\Portable Python 2.7.5.1\App\lib\urllib.py", line 240, in retrieve
fp = self.open(url, data)
File "C:\Users\Andres\Desktop\ddd\Portable Python 2.7.5.1\App\lib\urllib.py", line 208, in open
return getattr(self, name)(url)
File "C:\Users\Andres\Desktop\ddd\Portable Python 2.7.5.1\App\lib\urllib.py", line 463, in open_file
return self.open_local_file(url)
File "C:\Users\Andres\Desktop\ddd\Portable Python 2.7.5.1\App\lib\urllib.py", line 477, in open_local_file
raise IOError(e.errno, e.strerror, e.filename)
IOError: [Errno 2] The system cannot find the file specified: 'roblox.com'
>>>
and here's the code for the lib i made:
import urllib
import urllib2
def geturl(Address,File):
    urllib.urlretrieve(Address,File)
EDIT 2
I can't understand why I get this error when executing this in the Python shell:
geturl(Address,File)

The traceback shows urllib falling through to open_local_file: 'roblox.com' has no scheme, so it is treated as a local file path. Pass the full URL, including http://, and urllib.urlretrieve will work. If you would rather handle the data yourself, urllib.urlopen returns a file-like object:
>>> help(urllib.urlopen)
urlopen(url, data=None, proxies=None)
    Create a file-like object for the specified URL to read from.
A geturl function that reads the response and writes it out explicitly:
def geturl(Address, FileName):
    html_data = urllib.urlopen(Address).read()  # Open the URL and read the body
    with open(FileName, 'wb') as f:  # Open the output file in binary mode
        f.write(html_data)  # Write data from URL to file
geturl(u'http://roblox.com', 'ggg.html')  # URLs must include the scheme, e.g. http://
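A Python 3 variant can guard against the missing scheme before downloading. This is a minimal sketch under stated assumptions: ensure_scheme is a hypothetical helper, and defaulting to http is a choice made here for illustration, not something the original library did.

```python
from urllib.request import urlretrieve


def ensure_scheme(address, default_scheme="http"):
    """Return address unchanged if it has a scheme, else prepend one."""
    if "://" in address:
        return address
    return default_scheme + "://" + address


def geturl(address, file_name):
    """Download address to file_name, adding a scheme when it is missing."""
    urlretrieve(ensure_scheme(address), file_name)
```

With this guard, geturl("roblox.com", "ggg.html") would request http://roblox.com instead of looking for a local file named roblox.com.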
