Google image download with python cannot download images

I'm using the google_images_download library to download the top 20 images for a keyword. It worked perfectly when I used it in recent days. The code is as follows.
from google_images_download import google_images_download
response = google_images_download.googleimagesdownload()
arguments = {"keywords":keyword,"limit":10,"print_urls":True}
paths = response.download(arguments)
Now it gives the following error:
Evaluating...
Starting Download...
Unfortunately all 10 could not be downloaded because some images were not downloadable. 0 is all we got for this search filter!
Errors: 0
How can I solve this error?

There have been some changes on Google's end (in how they respond to the request) which cause this issue. Joeclinton1 on GitHub has made some modifications to the original repo which provide a temporary fix.
You can find the updated repo here: https://github.com/Joeclinton1/google-images-download.git. The solution is in the patch-1 branch, if I'm not mistaken.
First uninstall the current version of google_images_download.
Then manually install Joeclinton1's repo by:
git clone https://github.com/Joeclinton1/google-images-download.git
cd google-images-download && sudo python setup.py install  # no need for 'sudo' on a Windows/Anaconda environment
or install it with pip:
pip install git+https://github.com/Joeclinton1/google-images-download.git
This should solve the problem. Note that currently this repo only supports up to 100 images.
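After reinstalling from the fork, the original snippet should work unchanged. A quick sanity check (the keyword is just an example):
from google_images_download import google_images_download

response = google_images_download.googleimagesdownload()
arguments = {"keywords": "polar bears", "limit": 10, "print_urls": True}  # example keyword
paths = response.download(arguments)
print(paths)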

I faced the same issue with google-images-download, which used to work perfectly!
I have an alternative to suggest, which should solve the problem.
Solution: Instead of using google-images-download for Python, use bing-image-downloader, which downloads from the Bing search engine.
Steps:
Step 1:
Install the library by using: pip install bing-image-downloader
Step 2:
from bing_image_downloader import downloader
downloader.download(query_string, limit=100, output_dir='dataset',
                    adult_filter_off=True, force_replace=False, timeout=60)
That's it! All you need to do is put your image topic in query_string.
Note:
Parameters that you can further tweak:
query_string : String to be searched.
limit : (optional, default is 100) Number of images to download.
output_dir : (optional, default is 'dataset') Name of output dir.
adult_filter_off : (optional, default is True) Enable or disable adult filtering.
force_replace : (optional, default is False) Delete folder if present and start a fresh download.
timeout : (optional, default is 60) timeout for connection in seconds.
Further Reference: https://pypi.org/project/bing-image-downloader/

If you want to download fewer than 100 images per query string, google-images-download will work better than bing-image-downloader. It handles errors better and, in general, Google Images gives noticeably better results than the Bing equivalent.
However, if you're trying to download more than 100 images, google-images-download will give you a lot of headaches. As mentioned in this answer, Google changed their end, and because of this the repo is having a lot of failures (more info on the situation here).
So, if you want to download thousands of images, use bing-image-downloader:
Install the package from pip:
pip install bing-image-downloader
Run the query.
NOTE: The documentation seems to be incorrect, as importing the package as from bing_image_downloader import downloader (as mentioned in this answer) returns a "No module found" error. Import and use it like this instead:
from bing_image_downloader.downloader import download
query_string = 'muscle cars'
download(query_string, limit=1000, output_dir='dataset', adult_filter_off=True, force_replace=False, timeout=60, verbose=True)

Another easy way to download any number of images:
pip install simple_image_download

from simple_image_download import simple_image_download as simp
response = simp.simple_image_download()
response.download(a, b)
where a is a string with the subject you want to download and b is the number of images you want to download.

Python 3.6 - image scraping with google-image-download

I want to crawl some images for my machine learning practice and found google-images-download very useful; the code works out of the box.
However, at the moment it allows no more than 100 images, which is the limit of the Google Images page (it only loads 100 images per page).
The documentation says that if you install with pip install google_images_download (which I did), selenium is installed along with it, and by using chromedriver you can download beyond that limit.
However, every time I run the code with python gimages.py:
from google_images_download import google_images_download
response = google_images_download.googleimagesdownload()
arguments = {"keywords":"number plates","limit":200,"print_urls":True}
paths = response.download(arguments)
print(paths)
I get this error:
Looks like we cannot locate the path the 'chromedriver' (use the
'--chromedriver' argument to specify the path to the executable.) or
google chrome browser is not installed on your machine (exception:
expected str, bytes or os.PathLike object, not NoneType)
I checked my installation, and selenium is already installed.
Reading further, it says I can download chromedriver, put it inside the same folder, and call python gimages.py --chromedriver "chromedriver", but I still get the same error.
How can I resolve this?
I am using conda with Python 3.6, running the terminal from conda. The code is already working; just the chromedriver part is not.
You need to specify the path: "chromedriver" on its own is not a path.
You need the explicit path to the chromedriver executable, e.g. "/path/to/chromedriver".
In your case: python gimages.py --chromedriver "/path/to/chromedriver"
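Alternatively, you can pass the path programmatically through the arguments dict; a minimal sketch, assuming the executable sits at /path/to/chromedriver (adjust for your machine):
from google_images_download import google_images_download

response = google_images_download.googleimagesdownload()
arguments = {
    "keywords": "number plates",
    "limit": 200,
    "print_urls": True,
    "chromedriver": "/path/to/chromedriver",  # path to the executable itself (assumed location)
}
paths = response.download(arguments)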
Hope this helps you!

How do I add python libraries to an AWS lambda function for Alexa?

I was following the tutorial to create an Alexa app using Python:
Python Alexa Tutorial
I was able to successfully follow all the steps and get the app to work. I now want to modify the Python code and use external libraries such as import requests, or any other libraries that I install using pip. How would I set up my Lambda function to include any pip packages that I install locally on my machine?
As described in the official Amazon documentation linked here, it is as simple as creating a zip of all the folder contents after installing the required packages in the folder where you have your Python Lambda code.
As Vineeth pointed out above in his comment, the very first step in moving from the inline code editor to the zip file upload approach is to change your Lambda function handler name under configuration settings to include the Python script file name that holds the lambda handler:
lambda_handler => {your-python-script-file-name}.lambda_handler.
Other solutions like python-lambda and lambda-uploader help simplify the process of uploading and, most importantly, LOCAL TESTING. These will save a lot of time in development.
The official documentation is pretty good. In a nutshell, you need to create a zip file of a directory containing both the code of your Lambda function and all the external libraries you use, at the top level.
You can simulate that by deactivating your virtualenv, copying all your required libraries into the working directory (which is always in sys.path if you invoke a script on the command line), and checking whether your script still works.
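A minimal sketch of that zip-based workflow, assuming your handler lives in lambda_function.py and depends on requests (both placeholders):
pip install requests -t ./package            # install the dependency into a build directory
cp lambda_function.py ./package/             # put the handler next to the libraries
cd package && zip -r ../deployment.zip .     # zip the contents so everything sits at the top level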
You may want to look into using frameworks such as zappa, which will handle packaging up and deploying the Lambda function for you.
You can use that in conjunction with flask-ask to have an easier time making Alexa skills. There's even a video tutorial of this (from the zappa readme) here.
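For a sense of what that looks like, here is a minimal flask-ask skill; this is a sketch, not code from the tutorial, and the spoken text is a placeholder:
from flask import Flask
from flask_ask import Ask, statement

app = Flask(__name__)
ask = Ask(app, "/")  # route Alexa requests to the root URL

@ask.launch
def launch():
    # response spoken when the user opens the skill
    return statement("Hello from Lambda")

if __name__ == "__main__":
    app.run()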
To solve this particular problem we're using a library called juniper. In a nutshell, all you need to do is create a very simple manifest file that looks like this:
functions:
  # Name the zip file you want juni to create
  router:
    # Where are your dependencies located?
    requirements: ./src/requirements.txt
    # Your source code.
    include:
      - ./src/lambda_function.py
From this manifest file, calling juni build will create the zip file artifact for you. The file will include all the dependencies you specify in requirements.txt.
The command will create the file ./dist/router.zip. We're using that file in conjunction with a SAM template; however, you can also take that zip and upload it through the console or with the awscli.
Echoing @d3ming's answer, a framework is a good way to go at this point. Creating the deployment package manually isn't impossible, but you'll need to upload your packages' compiled code, and if you're compiling that code on a non-Linux system, the chance of running into issues with differences between your system and the Lambda function's deployed environment is high.
You can work around that by compiling your code on a Linux machine or in a Docker container, but between all that complexity you can get much more from adopting a framework.
Serverless is well adopted and has support for custom Python packages. It even integrates with Docker to compile your Python dependencies and build the deployment package for you.
If you're looking for a full tutorial on this, I wrote one for Python Lambda functions here.
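As a rough sketch, a serverless.yml for a Python Alexa skill could look like this; the service and handler names are placeholders, and serverless-python-requirements is the plugin that bundles pip dependencies (optionally inside Docker):
service: alexa-skill                  # placeholder name
provider:
  name: aws
  runtime: python3.6
functions:
  skill:
    handler: handler.lambda_handler   # {file}.{function}, as noted above
    events:
      - alexaSkill                    # Alexa Skills Kit trigger
plugins:
  - serverless-python-requirements    # packages requirements.txt into the artifact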
Amazon created a repository that deals with your situation:
https://github.com/awsdocs/aws-lambda-developer-guide/tree/master/sample-apps/blank-python
The blank app is an example of how to push a Lambda function that depends on requirements, with the bonus of being made by Amazon.
All you need to do is follow the step-by-step instructions and update the repository based on your needs.
For some Lambda POCs and fast Lambda prototyping, you can include and use the following function _install_packages. You can place a call to it before the Lambda handler function (for package installation at Lambda init time, if your deps need less than 10 seconds to install), or place the call at the beginning of the Lambda handler (this will run the installation exactly once, on the first Lambda event). Given the pip install options included, packages to be installed must provide binary installable versions for manylinux.
_installed = False

def _install_packages(*packages):
    global _installed
    if not _installed:
        import os
        import sys
        import time
        _started = time.time()
        os.system("mkdir -p /tmp/packages")
        _packages = " ".join(f"'{p}'" for p in packages)
        print("INSTALLED:")
        os.system(
            f"{sys.executable} -m pip freeze --no-cache-dir")
        print("INSTALLING:")
        os.system(
            f"{sys.executable} -m pip install "
            f"--no-cache-dir --target /tmp/packages "
            f"--only-binary :all: --no-color "
            f"--no-warn-script-location {_packages}")
        sys.path.insert(0, "/tmp/packages")
        _installed = True
        _ended = time.time()
        print(f"package installation took: {_ended - _started:.2f} sec")

# usage example before lambda handler
_install_packages("pymssql", "requests", "pillow")

def lambda_handler(event, context):
    pass  # lambda code

# usage example from within the lambda handler
def lambda_handler(event, context):
    _install_packages("pymssql", "requests", "pillow")
    pass  # lambda code
The given examples install the Python packages pymssql, requests, and pillow.
Here is an example Lambda that installs requests and then calls ifconfig.me to obtain its egress IP address:
import json

_installed = False

def _install_packages(*packages):
    global _installed
    if not _installed:
        import os
        import sys
        import time
        _started = time.time()
        os.system("mkdir -p /tmp/packages")
        _packages = " ".join(f"'{p}'" for p in packages)
        print("INSTALLED:")
        os.system(
            f"{sys.executable} -m pip freeze --no-cache-dir")
        print("INSTALLING:")
        os.system(
            f"{sys.executable} -m pip install "
            f"--no-cache-dir --target /tmp/packages "
            f"--only-binary :all: --no-color "
            f"--no-warn-script-location {_packages}")
        sys.path.insert(0, "/tmp/packages")
        _installed = True
        _ended = time.time()
        print(f"package installation took: {_ended - _started:.2f} sec")

# usage example before lambda handler
_install_packages("requests")

def lambda_handler(event, context):
    import requests
    return {
        'statusCode': 200,
        'body': json.dumps(requests.get('http://ifconfig.me').content.decode())
    }
Since single-quote escaping is taken into account when building pip's command line, you can specify a version in a package spec, such as pillow<9, which will install the most recent 8.x.x version of pillow.
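For example, under that assumption (the pin is illustrative):
# assumes a 'pillow<9' spec resolves to the newest 8.x.x manylinux wheel
_install_packages("pillow<9", "requests")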
I too struggled with this for a while. After deep diving into AWS resources, I learned that the Lambda function on AWS runs on Linux, and it's very important that the Python package version you use matches that Linux environment.
You may find more information on this at:
https://aws.amazon.com/lambda/faqs/
Follow these steps to get a matching version (a command-line sketch follows the note below):
1. Find the .whl file of the package on PyPI and download it to your local machine.
2. Zip the packages and add them as layers in AWS Lambda.
3. Add the layer to the Lambda function.
Note: Please make sure that the version of the Python package you're trying to install matches the Linux OS on which AWS Lambda performs its compute tasks.
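A command-line sketch of those steps, assuming pandas as the package and an x86_64 Python 3.9 Lambda (package, platform, and versions are placeholders):
# 1. fetch manylinux wheels that match the Lambda runtime
pip download pandas --only-binary=:all: --platform manylinux2014_x86_64 --python-version 3.9 -d wheels/
# 2. layers expect code under a top-level python/ directory
mkdir -p python && pip install wheels/*.whl -t python/ && zip -r pandas-layer.zip python
# 3. publish the layer, then attach it to your function
aws lambda publish-layer-version --layer-name pandas --zip-file fileb://pandas-layer.zip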
References:
https://pypi.org/project/Pandas3/#files
A lot of Python libraries can be imported via Layers from here: https://github.com/keithrozario/Klayers, or you can use a framework like Serverless that has plugins to package dependencies directly into your artifact.

unable to completely download nltk package in python. stops at omw

I am unable to complete the nltk package download. It always stops at the items omw (Open Multilingual Wordnet); these are the only two remaining ones. I have looked at other help items (i.e. 'install nltk supporting packages' or 'error installing nltk') but the problem persists. It returns error code 11001, i.e. wrong server location. But the server index http://ntlk.org/nltk_data/ worked for all other items. I am a bit lost here.
A print screen of the error message can be found here.
I use Python 3.5 and have the latest nltk file (I downloaded and unzipped it last night).
Many thanks!
Are you connecting to the internet with a proxy server? If so, try this:
nltk.set_proxy('http://proxy.example.com:3128', ('USERNAME', 'PASSWORD'))
nltk.download()
Alternatively, try this:
Open a terminal window (use the "Run..." option on the Start menu), go to the directory where Python is installed, for example C:\Program Files\Python 3.5\, and type:
python -m nltk.downloader all
If all that fails, you should try downloading the data manually from here: http://www.nltk.org/nltk_data/ and then put your data in the C:\nltk_data directory.
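If you put the data somewhere other than C:\nltk_data, you can tell nltk where to look. A small sketch (the directory is a placeholder):
import nltk

# point nltk at the manually downloaded data (assumed location)
nltk.data.path.append(r"D:\my_nltk_data")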

python wkhtmltopdf to Generate pdf

I am able to generate the PDF using the command-line wkhtmltopdf, but when I use the Python library:
from wkhtmltopdf import WKhtmlToPdf
wkhtmltopdf = WKhtmlToPdf(
    url='http://www.wikipedia.org',
    output_file='a.pdf',
)
I get:
'Exception: Missing url and output file arguments'
I think there is an issue with the current version. I had the same issue, and if you look at their GitHub issues page, someone posted the same problem two days ago.
According to their documentation, this should also have worked:
python -m wkhtmltopdf.main google.com ~/google.pdf
But instead I get:
optparse.OptionConflictError: option -h/--header-html: conflicting option string(s): -h
Since it's a wrapper, I'm guessing the underlying application was updated, but the wrapper has not been.
The problem is typos and a rewritten API in wkhtmltopdf/main.py.
Right now the API is:
from wkhtmltopdf import WKhtmlToPdf
wkhtmltopdf = WKhtmlToPdf('http://www.wikipedia.org','out.pdf')

Using pip within a python script

I am writing a utility in Python that needs to check for (and if necessary, install or even upgrade) various other modules within a target project/virtualenv, based on user-supplied flags and/or input. I am currently trying to use pip directly/programmatically (because of its existing support for the various repo types I will need to access), but I am having difficulty finding examples or documentation on using it this way.
This seemed like the direction to go:
import pip
vcs = pip.vcs.VersionControl(url="http://path/to/repo/")
...but it gives no joy.
I need help with some of the basics, apparently: how can I use pip to pull/export a copy of an svn repo into a given local directory? Ultimately, I will also need to use it for git and mercurial checkouts as well as standard PyPI installs. Any links, docs, or pointers would be much appreciated.
Pip uses a particular format for vcs URLs. The format is
vcsname+url#rev
#rev is optional; you can use it to reference a specific commit or tag, for example git+https://github.com/user/repo.git#v1.0 (user/repo and v1.0 are placeholders).
To use pip to retrieve a repository from a generic vcs into a local directory, you can do this:
from pip.vcs import VcsSupport

req_url = 'git+git://url/repo'
dest_path = '/this/is/the/destination'
vcs = VcsSupport()
vc_type, url = req_url.split('+', 1)
backend = vcs.get_backend(vc_type)
if backend:
    vcs_backend = backend(req_url)
    vcs_backend.obtain(dest_path)
else:
    print('Not a repository')
Check https://pip.pypa.io/en/stable/reference/pip_install/#id8 to see which VCSs are supported.
