I want to crawl some images for my machine learning practice and found google-images-download to be very useful; the code works out of the box.
However, at the moment it only allows up to 100 images, which is the limit of a Google Images page (it only loads 100 images per page).
The documentation says that if you install it with pip install google_images_download (which is what I did), selenium is installed along with it, and by using chromedriver you can download more than that limit.
However, every time I run the code with python gimages.py:
from google_images_download import google_images_download
response = google_images_download.googleimagesdownload()
arguments = {"keywords":"number plates","limit":200,"print_urls":True}
paths = response.download(arguments)
print(paths)
I get this error:
Looks like we cannot locate the path the 'chromedriver' (use the
'--chromedriver' argument to specify the path to the executable.) or
google chrome browser is not installed on your machine (exception:
expected str, bytes or os.PathLike object, not NoneType)
As I checked my installation, selenium is already installed.
Reading further, it says I can download chromedriver, put it in the same folder, and call python gimages.py --chromedriver "chromedriver", but I still get the same error.
How can I resolve this?
I am using conda with Python 3.6, running the terminal from conda. The code is already working; it's just the chromedriver part that is not.
You need to specify the path... "chromedriver" is not a path.
You might need to give the explicit path to the chromedriver executable, e.g. "/path/to/chromedriver".
In your case: python gimages.py --chromedriver "/path/to/chromedriver"
Hope this helps you!
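If you are calling the library from Python rather than the CLI, it also appears to accept the chromedriver path as a key in the arguments dict (a minimal sketch; the path below is a placeholder you need to adjust):
from google_images_download import google_images_download
response = google_images_download.googleimagesdownload()
# "chromedriver" should point at the chromedriver executable (placeholder path below)
arguments = {"keywords": "number plates", "limit": 200, "print_urls": True, "chromedriver": "/path/to/chromedriver"}
paths = response.download(arguments)
print(paths)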
I was following the tutorial to create an Alexa app using Python:
Python Alexa Tutorial
I was able to successfully follow all the steps and get the app to work. I now want to modify the Python code and use external libraries such as import requests
or any other libraries that I install using pip. How would I set up my Lambda function to include any pip packages that I install locally on my machine?
As described in the official Amazon documentation (link here), it is as simple as creating a zip of all the folder contents after installing the required packages in the folder that holds your Python Lambda code.
As Vineeth pointed out above in his comment, the very first step in moving from the inline code editor to a zip-file upload approach is to change your Lambda function handler name under the configuration settings to include the Python script file name that holds the lambda handler:
lambda_handler => {your-python-script-file-name}.lambda_handler
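For example, if the file at the root of your zip is named lambda_function.py (an illustrative name), the handler setting becomes lambda_function.lambda_handler, roughly like this:

# lambda_function.py -- illustrative file name at the root of the deployment zip;
# handler setting in the Lambda configuration: lambda_function.lambda_handler
import requests  # installed into the same folder before zipping

def lambda_handler(event, context):
    # minimal sketch: use the packaged library and return a simple response
    return {"statusCode": 200, "body": requests.get("https://example.com").text[:100]}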
Other solutions like python-lambda and lambda-uploader help simplify the process of uploading and, most importantly, LOCAL TESTING. These will save a lot of time in development.
The official documentation is pretty good. In a nutshell, you need to create a zip file of a directory containing both the code of your lambda function and all external libraries you use at the top level.
You can simulate that by deactivating your virtualenv, copying all your required libraries into the working directory (which is always in sys.path if you invoke a script on the command line), and checking whether your script still works.
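As a rough sketch (file names are illustrative; it assumes a requirements.txt and a lambda_function.py in the current directory), building such a zip could look like this:

# build_package.py -- a minimal sketch of building the deployment zip
import shutil
import subprocess
import sys

BUILD_DIR = "build"

# Install all external libraries at the top level of the build directory
subprocess.check_call(
    [sys.executable, "-m", "pip", "install", "-r", "requirements.txt", "--target", BUILD_DIR]
)

# Put your own code next to the libraries
shutil.copy("lambda_function.py", BUILD_DIR)

# Zip the *contents* of the build directory into deployment.zip
shutil.make_archive("deployment", "zip", root_dir=BUILD_DIR)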
You may want to look into using frameworks such as zappa which will handle packaging up and deploying the lambda function for you.
You can use that in conjunction with flask-ask to have an easier time making Alexa skills. There's even a video tutorial of this (from the zappa readme) here
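A minimal flask-ask skill, assuming its decorator-based API (the intent name here is illustrative), looks roughly like this; zappa then packages and deploys the Flask app together with its dependencies:

# skill.py -- a minimal sketch using flask-ask (intent name is illustrative)
from flask import Flask
from flask_ask import Ask, statement

app = Flask(__name__)
ask = Ask(app, "/")

@ask.launch
def launch():
    # spoken response when the skill is opened
    return statement("Hello from Lambda")

@ask.intent("HelloWorldIntent")
def hello_world():
    return statement("Hello world")

if __name__ == "__main__":
    app.run(debug=True)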
To solve this particular problem we're using a library called juniper. In a nutshell, all you need to do is create a very simple manifest file that looks like:
functions:
  # Name the zip file you want juni to create
  router:
    # Where are your dependencies located?
    requirements: ./src/requirements.txt
    # Your source code.
    include:
      - ./src/lambda_function.py
From this manifest file, calling juni build will create the zip file artifact for you. The file will include all the dependencies you specify in the requirements.txt.
The command will create the file ./dist/router.zip. We're using that file in conjunction with a SAM template; however, you can also take that zip and upload it through the console or with the AWS CLI.
Echoing @d3ming's answer, a framework is a good way to go at this point. Creating the deployment package manually isn't impossible, but you'll need to upload your packages' compiled code, and if you're compiling that code on a non-Linux system, the chances of running into issues with differences between your system and the Lambda function's deployed environment are high.
You can work around that by compiling your code on a Linux machine or in a Docker container, but with all that complexity you can get much more from adopting a framework.
Serverless is well adopted and has support for custom python packages. It even integrates with Docker to compile your python dependencies and build the deployment package for you.
If you're looking for a full tutorial on this, I wrote one for Python Lambda functions here.
Amazon created a repository that deals with your situation:
https://github.com/awsdocs/aws-lambda-developer-guide/tree/master/sample-apps/blank-python
The blank app is an example of how to push a Lambda function that depends on requirements, with the bonus of being made by Amazon.
All you need to do is follow the step-by-step guide and update the repository based on your needs.
For some Lambda POCs and fast Lambda prototyping you can include and use the following function _install_packages. You can place a call to it before the Lambda handler function (for package installation at Lambda init time, if your deps need less than 10 seconds to install), or place the call at the beginning of the Lambda handler (this will run the installation exactly once, on the first Lambda event). Given the pip install options included, the packages to be installed must provide binary installable (manylinux) versions.
_installed = False
def _install_packages(*packages):
global _installed
if not _installed:
import os
import sys
import time
_started = time.time()
os.system("mkdir -p /tmp/packages")
_packages = " ".join(f"'{p}'" for p in packages)
print("INSTALLED:")
os.system(
f"{sys.executable} -m pip freeze --no-cache-dir")
print("INSTALLING:")
os.system(
f"{sys.executable} -m pip install "
f"--no-cache-dir --target /tmp/packages "
f"--only-binary :all: --no-color "
f"--no-warn-script-location {_packages}")
sys.path.insert(0, "/tmp/packages")
_installed = True
_ended = time.time()
print(f"package installation took: {_ended - _started:.2f} sec")
# usage example before lambda handler
_install_packages("pymssql", "requests", "pillow")
def lambda_handler(event, context):
pass # lambda code
# usage example from within the lambda handler
def lambda_handler(event, context):
_install_packages("pymssql", "requests", "pillow")
pass # lambda code
The examples above install the Python packages pymssql, requests and pillow.
Here is an example Lambda that installs requests and then calls ifconfig.me to obtain its egress IP address:
import json
_installed = False
def _install_packages(*packages):
global _installed
if not _installed:
import os
import sys
import time
_started = time.time()
os.system("mkdir -p /tmp/packages")
_packages = " ".join(f"'{p}'" for p in packages)
print("INSTALLED:")
os.system(
f"{sys.executable} -m pip freeze --no-cache-dir")
print("INSTALLING:")
os.system(
f"{sys.executable} -m pip install "
f"--no-cache-dir --target /tmp/packages "
f"--only-binary :all: --no-color "
f"--no-warn-script-location {_packages}")
sys.path.insert(0, "/tmp/packages")
_installed = True
_ended = time.time()
print(f"package installation took: {_ended - _started:.2f} sec")
# usage example before lambda handler
_install_packages("requests")
def lambda_handler(event, context):
import requests
return {
'statusCode': 200,
'body': json.dumps(requests.get('http://ifconfig.me').content.decode())
}
Since single-quote escaping is taken into account when building pip's command line, you can put a version constraint in a package spec, e.g. pillow<9, which will install the most recent 8.x.x version of pillow.
I too struggled with this for a while. After deep diving into AWS resources, I learned that the Lambda function runs on Linux, and it's very important that the Python package versions you bundle match that Linux environment.
You may find more information on this at:
https://aws.amazon.com/lambda/faqs/
Follow these steps to download the right version (a sketch follows below):
1. Find the .whl file of the package on PyPI and download it to your local machine.
2. Zip the packages and add them as a layer in AWS Lambda.
3. Add the layer to the Lambda function.
Note: Please make sure that the version of the Python package you're trying to install matches the Linux OS on which AWS Lambda runs its compute tasks.
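A minimal sketch of those steps (the package name, Python version, and platform tag are assumptions to adjust for your runtime):

# build_layer.py -- a rough sketch of steps 1-2 above; values are illustrative
import pathlib
import shutil
import subprocess
import sys

layer_dir = pathlib.Path("layer/python")   # Lambda layers expect a top-level python/ folder
layer_dir.mkdir(parents=True, exist_ok=True)

# Pull manylinux wheels matching the Lambda runtime (here: CPython 3.9 on x86_64)
subprocess.check_call([
    sys.executable, "-m", "pip", "install", "pandas",   # hypothetical dependency
    "--target", str(layer_dir),
    "--platform", "manylinux2014_x86_64",
    "--implementation", "cp",
    "--python-version", "3.9",
    "--only-binary", ":all:",
])

# Zip the layer/ folder into layer.zip and upload it as a Lambda layer (step 2)
shutil.make_archive("layer", "zip", root_dir="layer")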
References :
https://pypi.org/project/Pandas3/#files
A lot of Python libraries can be imported via layers here: https://github.com/keithrozario/Klayers, or you can use a framework like Serverless that has plugins to package dependencies directly into your artifact.
I am unable to complete the NLTK data download. It always stops at the omw (Open Multilingual Wordnet) items, which are the only two remaining ones. I have looked at other help items (i.e. Install nltk supporting packages or Error installing Nltk) but the problem persists. It returns error code 11001, i.e. wrong server location, but the server index http://nltk.org/nltk_data/ worked for all other items. I am a bit lost here.
A print screen of the error message can be found here.
I use Python 3.5 and have the latest NLTK release (downloaded and unzipped it last night).
Many thanks!
Are you connecting to the internet with a proxy server? If so, try this:
nltk.set_proxy('http://proxy.example.com:3128', ('USERNAME', 'PASSWORD'))
nltk.download()
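If only the omw item keeps failing, you could also try fetching just that package by its identifier (assuming the package id is omw):

import nltk

# download only the Open Multilingual Wordnet data (package id assumed to be 'omw')
nltk.download('omw')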
Alternatively, try this:
Open a terminal window (Use the “Run...” option on the Start menu). Go to the directory where Python is installed, for example C:\Program Files\Python 3.5\
type:
python -m nltk.downloader all
If all that fails, you should try downloading the data manually from here: http://www.nltk.org/nltk_data/ and then put your data in the C:\nltk_data directory.
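If you place the data somewhere other than a default location, you can point NLTK at it and check that the resource is found (the path below is an example):

import nltk

# tell NLTK where the manually downloaded data lives (example path)
nltk.data.path.append(r"C:\nltk_data")

# raises LookupError if the omw corpus still cannot be found
nltk.data.find("corpora/omw")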
I am able to generate the PDF using the command-line wkhtmltopdf, but when I use it via the Python lib:
from wkhtmltopdf import WKhtmlToPdf
wkhtmltopdf = WKhtmlToPdf(
url='http://www.wikipedia.org',
output_file='a.pdf',
)
I get:
'Exception: Missing url and output file arguments'
I think there is an issue with the current version. I had the same issues, and if you look at their Github issues page, someone posted the same issue two days ago.
This should have worked also, according to their documentation:
python -m wkhtmltopdf.main google.com ~/google.pdf
But instead I get:
optparse.OptionConflictError: option -h/--header-html: conflicting option string(s): -h
Since it's a wrapper, I'm guessing the underlying application was updated, but the wrapper has not been.
The problem is typos and a rewritten API in wkhtmltopdf/main.py.
Right now the API is:
from wkhtmltopdf import WKhtmlToPdf
wkhtmltopdf = WKhtmlToPdf('http://www.wikipedia.org','out.pdf')
I am writing a utility in Python that needs to check for (and if necessary, install and even upgrade) various other modules within a target project/virtualenv, based on user-supplied flags and/or input. I am currently trying to use pip directly/programmatically (because of its existing support for the various repo types I will need to access), but I am having difficulty finding examples or documentation on using it this way.
This seemed like the direction to go:
import pip
vcs = pip.vcs.VersionControl(url="http://path/to/repo/")
...but it gives no joy.
I need help with some of the basics, apparently - like how can I use pip to pull/export a copy of an SVN repo into a given local directory? Ultimately, I will also need to use it for git and mercurial checkouts as well as standard PyPI installs. Any links, docs or pointers would be much appreciated.
Pip uses a particular format for VCS urls. The format is
vcsname+url@rev
@rev is optional; you can use it to reference a specific commit or tag.
To use pip to retrieve a repository from a generic VCS into a local directory, you can do something like this (note that this relies on pip's internal VCS support as it existed in older pip releases; newer pip moved these modules under pip._internal):
from pip.vcs import VcsSupport

req_url = 'git+git://url/repo'
dest_path = '/this/is/the/destination'

# registry of the VCS backends pip knows about (git, hg, svn, bzr)
vcs = VcsSupport()

# split 'git+git://url/repo' into the vcs name and the repository url
vc_type, url = req_url.split('+', 1)

backend = vcs.get_backend(vc_type)
if backend:
    # the backend expects the full 'vcsname+url' string
    vcs_backend = backend(req_url)
    vcs_backend.obtain(dest_path)  # clone/checkout into dest_path
else:
    print('Not a repository')
Check https://pip.pypa.io/en/stable/reference/pip_install/#id8 to see which VCSs are supported.