python wkhtmltopdf to Generate pdf


I can generate the PDF with the wkhtmltopdf command-line tool, but when I use the Python library:
from wkhtmltopdf import WKhtmlToPdf
wkhtmltopdf = WKhtmlToPdf(
    url='http://www.wikipedia.org',
    output_file='a.pdf',
)
I get:
'Exception: Missing url and output file arguments'

I think there is an issue with the current version. I had the same problem, and if you look at the project's GitHub issues page, someone reported the same issue two days ago.
According to the documentation, this should also have worked:
python -m wkhtmltopdf.main google.com ~/google.pdf
But instead I get:
optparse.OptionConflictError: option -h/--header-html: conflicting option string(s): -h
Since it's a wrapper, I'm guessing the underlying application was updated, but the wrapper has not been.

The problem is typos and a rewritten API in wkhtmltopdf/main.py.
The current API is:
from wkhtmltopdf import WKhtmlToPdf
wkhtmltopdf = WKhtmlToPdf('http://www.wikipedia.org','out.pdf')
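Since the command-line tool already works for the asker, one way to sidestep the wrapper altogether is to call the wkhtmltopdf binary through subprocess. This is only a sketch: it assumes the binary is on PATH and is invoked in the plain "wkhtmltopdf URL OUTPUT" form, and the helper names build_command and render_pdf are made up for illustration rather than taken from any library.

```python
import shutil
import subprocess


def build_command(url, output_file, binary="wkhtmltopdf"):
    """Return the argument list for a plain 'wkhtmltopdf URL OUTPUT' call."""
    return [binary, url, output_file]


def render_pdf(url, output_file, binary="wkhtmltopdf"):
    """Run wkhtmltopdf if it is installed; raise if it is missing or fails."""
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary} was not found on PATH")
    # check=True turns a non-zero exit code into CalledProcessError.
    subprocess.run(build_command(url, output_file, binary), check=True)


if __name__ == "__main__":
    print(build_command("http://www.wikipedia.org", "out.pdf"))
```

Calling render_pdf('http://www.wikipedia.org', 'out.pdf') would then do the same work as the command line, without depending on the wrapper's changing constructor signature.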

Related

Docx to pdf using pandoc in python

I am quite new to Python, so this may be a silly question, but I can't find the solution anywhere.
I have a Django site that I run locally on my machine for development.
On the site I want to convert a docx file to PDF with pandoc. I know there are other methods, such as online APIs or Python modules like "docx2pdf", but I want to use pandoc for deployment reasons.
I installed pandoc from my terminal with brew install pandoc, so it should be installed correctly.
In my Django project I am doing:
import pypandoc
import docx
from django.http import FileResponse
def making_a_doc_function(request):
    doc = docx.Document()
    doc.add_heading("MY DOCUMENT")
    doc.save('thisisdoc.docx')
    pypandoc.convert_file('thisisdoc.docx', 'docx', outputfile="thisisdoc.pdf")
    pdf = open('thisisdoc.pdf', 'rb')
    response = FileResponse(pdf)
    return response
The docx file is created without a problem, but no PDF is created. I get an error that says:
Pandoc died with exitcode "4" during conversion: b'cannot produce pdf output from docx\n'
Does anyone have any ideas?

The second argument to convert_file is the output format or, in this case, the format through which pandoc generates the PDF. Pandoc doesn't know how to produce a PDF through docx, hence the error.
Use pypandoc.convert_file('thisisdoc.docx', 'latex', outputfile="thisisdoc.pdf") or pypandoc.convert_file('thisisdoc.docx', 'pdf', outputfile="thisisdoc.pdf") instead.
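As a small sketch of the corrected call, the helper below passes 'pdf' as the second argument, as suggested above. The function name convert_docx_to_pdf and its convert parameter are hypothetical; the parameter exists only so the call can be exercised without pypandoc or pandoc installed. PDF output also presumably needs a LaTeX engine available to pandoc.

```python
def convert_docx_to_pdf(source, target, convert=None):
    """Convert a docx file to PDF, passing 'pdf' as the output format."""
    if convert is None:
        # Deferred import so this module loads even without pypandoc installed.
        import pypandoc
        convert = pypandoc.convert_file
    # The second positional argument is the output format, not the input format.
    convert(source, "pdf", outputfile=target)
    return target
```

In the Django view from the question, the convert_file line would become convert_docx_to_pdf('thisisdoc.docx', 'thisisdoc.pdf').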

Google image download with python cannot download images

I'm using the google_images_download library to download the top 20 images for a keyword. It worked perfectly when I used it over the last few days. The code is as follows.
from google_images_download import google_images_download
response = google_images_download.googleimagesdownload()
arguments = {"keywords":keyword,"limit":10,"print_urls":True}
paths = response.download(arguments)
Now it gives the following error.
Evaluating...
Starting Download...
Unfortunately all 10 could not be downloaded because some images were not downloadable. 0 is all we got for this search filter!
Errors: 0
How can I solve this error?

There have been some changes on Google's end (in how they respond to the request), which cause this issue. Joeclinton1 on GitHub has made some modifications to the original repo that provide a temporary fix.
You can find the updated repo here: https://github.com/Joeclinton1/google-images-download.git . The solution is in the patch-1 branch, if I'm not mistaken.
First, uninstall the current version of google_images_download.
Then manually install Joeclinton1's repo:
git clone https://github.com/Joeclinton1/google-images-download.git
cd google-images-download && sudo python setup.py install #no need for 'sudo' on windows Anaconda environment
Or install it with pip:
pip install git+https://github.com/Joeclinton1/google-images-download.git
This should solve the problem. Note that this repo currently supports only up to 100 images.

I faced the same issue with google-image-download, which used to work perfectly!
I would like to suggest an alternative that should solve the problem.
Solution: Instead of using google-image-download for Python, use bing-image-downloader, which downloads from the Bing search engine.
Steps:
Step 1:
Install the library by using: pip install bing-image-downloader
Step 2:
from bing_image_downloader import downloader
downloader.download(query_string, limit=100, output_dir='dataset',
                    adult_filter_off=True, force_replace=False, timeout=60)
That's it! All you would need to do is to add your image topic to the query_string.
Note:
Parameters that you can further tweak:
query_string : String to be searched.
limit : (optional, default is 100) Number of images to download.
output_dir : (optional, default is 'dataset') Name of output dir.
adult_filter_off : (optional, default is True) Enable or disable adult filtering.
force_replace : (optional, default is False) Delete folder if present and start a fresh download.
timeout : (optional, default is 60) timeout for connection in seconds.
Further Reference: https://pypi.org/project/bing-image-downloader/

If you want to download fewer than 100 images per query string, google-images-download will work better than bing-image-downloader. It handles errors better and, you know, Google Images gives noticeably better results than the Bing equivalent.
However, if you're trying to download more than 100 images, google-images-download will give you a lot of headaches. As mentioned in this answer, Google changed things on their end, and because of this the repo is having a lot of failures (more info on the situation status here).
So, if you want to download thousands of images, use bing-image-downloader:
Install the package from pip:
pip install bing-image-downloader
Run the query.
NOTE: The documentation seems to be incorrect, as importing the package with from bing_image_downloader import downloader returns a "No module found" error (as mentioned in this answer). Import it and use it like this:
from bing_image_downloader.downloader import download
query_string = 'muscle cars'
download(query_string, limit=1000, output_dir='dataset', adult_filter_off=True, force_replace=False, timeout=60, verbose=True)
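When several topics are needed, the same call can be looped over a list of query strings. The following is a rough sketch, not part of the library: download_all is a made-up helper, the import path is the one given in the answer above, and the optional download parameter is an illustrative hook that lets the loop be tried without the package installed.

```python
def download_all(queries, limit=100, output_dir="dataset", download=None):
    """Call the downloader once per query string with shared settings."""
    if download is None:
        # Import path taken from the answer above; needs bing-image-downloader.
        from bing_image_downloader.downloader import download
    for query in queries:
        download(query, limit=limit, output_dir=output_dir,
                 adult_filter_off=True, force_replace=False, timeout=60)
```

For example, download_all(['muscle cars', 'number plates'], limit=1000) would issue one download call per topic with the same settings.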

Another easy way to download any number of images:
pip install simple_image_download
from simple_image_download import simple_image_download as simp
response = simp.simple_image_download
response().download(a, b)
where a is the subject string you want to search for
and b is the number of images you want to download.

Python 3.6 - image scraping with google-image-download

I want to crawl some images for machine learning practice and found google-image-download very useful; the code works out of the box.
However, at the moment it allows no more than 100 images, which is the limit of the Google Images page (it only loads 100 images per page).
The documentation says that if you install it with pip install google_images_download (which I did), Selenium is installed alongside it, and that by using chromedriver you can download more than that limit.
However, every time I run the code with python gimages.py:
from google_images_download import google_images_download
response = google_images_download.googleimagesdownload()
arguments = {"keywords":"number plates","limit":200,"print_urls":True}
paths = response.download(arguments)
print(paths)
I get this error:
Looks like we cannot locate the path the 'chromedriver' (use the
'--chromedriver' argument to specify the path to the executable.) or
google chrome browser is not installed on your machine (exception:
expected str, bytes or os.PathLike object, not NoneType)
I checked my installation, and selenium is already installed.
Reading further, the documentation says I can download chromedriver, put it in the same folder, and call python gimages.py --chromedriver "chromedriver", but I still get the same error.
How can I resolve this?
I am using conda with Python 3.6 and running the terminal from conda. The code itself already works; only the chromedriver part does not.

You need to specify the path; "chromedriver" on its own is not a path.
You might need to give the explicit path, for example "/path/to/chromedriver/folder".
In your case: python gimages.py --chromedriver "/path/to/chromedriver/folder"
Hope this helps you!
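When the arguments are built in code, as in the question's script, the path can be resolved to an absolute one before it is handed to the library. This is a sketch under two assumptions: that the arguments dictionary accepts a "chromedriver" key mirroring the --chromedriver flag, and that the value should point at the executable itself, as the error message words it. The helper names are hypothetical.

```python
import os
import shutil


def resolve_chromedriver(candidate="chromedriver"):
    """Return an absolute path to the chromedriver executable, or None."""
    if os.path.isfile(candidate):
        # A file in the given location (for example next to the script).
        return os.path.abspath(candidate)
    # Otherwise search PATH; shutil.which returns None when nothing is found.
    return shutil.which(candidate)


def build_arguments(keywords, limit, chromedriver_path=None):
    """Build the arguments dict, adding the driver path only when known."""
    arguments = {"keywords": keywords, "limit": limit, "print_urls": True}
    if chromedriver_path:
        # Assumed key name, mirroring the --chromedriver command-line flag.
        arguments["chromedriver"] = chromedriver_path
    return arguments
```

The result of build_arguments('number plates', 200, resolve_chromedriver()) could then be passed to response.download(...) in place of the hand-written dictionary.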

Got errors, while running exe file built with pyinstaller and Google Cloud API integration in python

I am working on a single-file Python project.
I integrated the Google Cloud API for real-time speech streaming and recognition.
It works well when run with python aaa.py.
Now I need a Windows build (.exe), so I used PyInstaller and successfully produced aaa.exe.
But I got this error while running speech streaming through the Google Cloud API:
[Errno 2] No such file or directory:
'D:\AI\ai\dist\AAA\google\cloud\gapic\speech\v1\speech_client_config.json'
So I copied speech_client_config.json to the required path, after which I got the error below.
Exception in 'grpc._cython.cygrpc.ssl_roots_override_callback' ignored
E0511 01:13:14.320000000 3108 src/core/lib/security/security_connector/security_connector.cc:1170] assertion failed: pem_root_certs != nullptr
I have not been able to find a solution that gives me a working build with the Google Cloud API.
I am using Python 2.7.14.
Any help is appreciated.
Thanks.

I had the same problem. If you are willing to distribute roots.pem with your executable (just search for the file; it should be buried deep within the installation directory of grpcio), I had luck fixing this by setting the GRPC_DEFAULT_SSL_ROOTS_FILE_PATH environment variable to the full path of this roots.pem file.
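A minimal sketch of that approach sets the variable at program start, before any gRPC client is created. It assumes roots.pem has been shipped next to the executable (or inside the PyInstaller bundle) under the relative name passed in; resource_path and use_bundled_roots are illustrative names, not part of grpcio or PyInstaller.

```python
import os
import sys


def resource_path(relative_path):
    """Resolve a path inside a PyInstaller bundle, or the working directory."""
    # PyInstaller sets sys._MEIPASS in a frozen app; fall back when unfrozen.
    base = getattr(sys, "_MEIPASS", os.path.abspath("."))
    return os.path.join(base, relative_path)


def use_bundled_roots(relative_path="roots.pem"):
    """Point gRPC at the bundled certificate file and return its full path."""
    pem = resource_path(relative_path)
    os.environ["GRPC_DEFAULT_SSL_ROOTS_FILE_PATH"] = pem
    return pem
```

Calling use_bundled_roots() as the first statement of aaa.py, ahead of the Google Cloud imports, is presumably early enough for the setting to take effect.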

Update 2021
To anyone who is experiencing this issue: I got it working thanks to these amazing people. See the full conversation on this GitHub issue.
Step 1
Credits to @cbenhagen and @rising-stark on this GitHub link.
Create a Python file named hook-grpc.py with the following code; a PyInstaller hook like this does the trick:
from PyInstaller.utils.hooks import collect_data_files
datas = collect_data_files('grpc')
Step 2
Put the hook-grpc.py file in the \site-packages\PyInstaller\hooks directory of the Python environment you are using. You can typically find it at:
C:\Users\yourusername\AppData\Local\Programs\Python\Python37\Lib\site-packages\PyInstaller\hooks
Note:
Change yourusername and Python37 to your own username and the Python version you are using.
For Anaconda users the location might be different. Check this site to find the path of the Anaconda Python environment you are using.
Step 3
Once you've done that you can now convert your .py python program to .exe using pyinstaller and it should work.

This looks to me like an SSL credentials mistake. I think you are not being granted access to Google Cloud. Check this code snippet and this documentation.

Thrift error while generating python client file

I'm new to HBase and would like to communicate with it through a Python API that works with Thrift. I followed this tutorial to install it properly on my machine, and everything seemed to work fine. Then I fetched a .thrift file with the following command:
wget http://svn.apache.org/viewvc/hbase/trunk/hbase-thrift/src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift\?view\=markup -O hbase.thrift
Then I tried to generate my client as shown here, but I get the following error message:
[ERROR:/home/tests/hbase/hbase.thrift:12] (last token was '<')
syntax error
[FAILURE:/home/tests/hbase/hbase.thrift:12] Parser error during include pass.
I searched the internet for the cause of this error and found this paper. I looked in thriftl.ll to see whether I could correct the error, but the correction was already present in the file.
What more can I do to make this work?
Thank you!
EDIT:
I'm using Thrift 0.9.0.

Using a fairly recent Thrift version and, more importantly, the proper URL, I was able to generate Python sources without errors.
Please check whether your download is actually a Thrift file or an HTML page. The error message sounds very much like the latter, since HTML pages typically start with a <, whereas the correct Thrift file contains its first < at line 110 (within list<Mutation>), not around line 12.
PS: Why are you using such an old version? 0.9.1 was released a year ago and 0.9.2 is in the process of being released.
PPS: Actually, the referenced ticket THRIFT-1274 does not seem to have much to do with it. But maybe I'm overlooking something.
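The check suggested above can be automated with a few lines. This is a rough heuristic rather than a real validator: it only looks at how the content begins, and the function names are made up for illustration.

```python
def looks_like_html(text):
    """Return True if the text starts like an HTML page."""
    head = text.lstrip().lower()
    return head.startswith(("<!doctype", "<html"))


def file_looks_like_html(path):
    """Apply the same check to the first few hundred characters of a file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return looks_like_html(handle.read(512))
```

If file_looks_like_html('hbase.thrift') returns True, the download was most likely the viewvc markup page rather than the raw Thrift definition.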
