Is there a way to dynamically determine whether the currently executing request is a standard HTTP request or a task queue task?
In some parts of my request handler, I make a few urlfetches. I would like the urlfetch timeout to be short if the request is a standard HTTP request and long if it comes from the task queue.
Pick any one of the following HTTP headers:
X-AppEngine-QueueName, the name of the queue (possibly default)
X-AppEngine-TaskName, the name of the task, or a system-generated unique ID if no name was specified
X-AppEngine-TaskRetryCount, the number of times this task has been retried; for the first attempt, this value is 0
X-AppEngine-TaskETA, the target execution time of the task, specified in microseconds since January 1st 1970.
Standard HTTP requests won't have these headers.
Task requests always include a specific set of HTTP headers, which you can check.
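For example, a minimal sketch of how the check could look inside a handler (assuming a webapp2/WSGI-style request object and the urlfetch deadline parameter; the 5/30 second values are just illustrative):

from google.appengine.api import urlfetch

def fetch_with_adaptive_deadline(request, url):
    # X-AppEngine-QueueName is only set by the task queue service;
    # App Engine strips it from external requests, so its presence is a
    # reliable signal that this request came from a task queue.
    is_task = 'X-AppEngine-QueueName' in request.headers
    deadline = 30 if is_task else 5  # seconds
    return urlfetch.fetch(url, deadline=deadline)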
I am tasked with writing a Python client app (a validating admission webhook) to determine:
which of these resources have PDBs configured
["deployments", "replicasets", "statefulsets", "horizontalpodautoscalers"]
and
if they have a PDB configured, verify that the PDB allows for at least one disruption ("ALLOWED DISRUPTIONS").
For the first requirement, how can I use the Python Kubernetes API to determine whether the incoming request (CREATE or UPDATE) refers to a resource that has a PDB?
The requirement is not that every CREATE/UPDATE operation has a PDB configured, but rather that if the resource has a PDB, the request must result in the PDB permitting "at least one" disruption.
So am I right that I must first put the incoming request through some sort of filter that determines whether the resource in question has a PDB? And if so, how would I code that filter?
What would pseudo-code for this even look like?
These are my first thoughts:
list all PDBs in the cluster
determine if the incoming request is referencing a resource that has a PDB
if the request is referencing a resource that has a PDB, determine whether or not the request would take the PDB down to "0" allowed disruptions
if "yes" to the previous step, do not allow the operation
This is what I have so far:
from kubernetes import config, client

# the webhook runs inside the cluster, so use the in-cluster config
config.load_incluster_config()
app_client = client.AppsV1Api()
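A hedged sketch of the filter step, assuming the policy/v1 API (older clusters/clients may need PolicyV1beta1Api) and a simplified matchLabels-only selector check; the helper names and the omission of matchExpressions are my own simplifications:

from kubernetes import config, client

config.load_incluster_config()
policy_client = client.PolicyV1Api()

def matching_pdbs(namespace, pod_labels):
    # List the PDBs in the request's namespace and keep those whose
    # matchLabels select the workload's pod template labels.
    pdbs = policy_client.list_namespaced_pod_disruption_budget(namespace)
    matches = []
    for pdb in pdbs.items:
        selector = pdb.spec.selector
        match_labels = (selector.match_labels or {}) if selector else {}
        if match_labels and all(pod_labels.get(k) == v for k, v in match_labels.items()):
            matches.append(pdb)
    return matches

def allows_disruption(pdb):
    # status.disruptions_allowed is maintained by the controller manager
    return pdb.status is not None and (pdb.status.disruptions_allowed or 0) >= 1

The webhook would pull the namespace and pod template labels out of the incoming AdmissionReview, call matching_pdbs, and only then apply the "at least one allowed disruption" rule; how the request itself would change the disruption count still needs separate logic.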
I need to somehow extract the plain HTTP request message from a Request object in Scrapy (so that I could, for example, copy/paste this request and replay it from Burp).
So given a scrapy.http.Request object, I would like to get the corresponding request message, e.g.
POST /test/demo_form.php HTTP/1.1
Host: w3schools.com
name1=value1&name2=value2
Clearly I have all the information I need in the Request object, but trying to reconstruct the message manually is error-prone because I could miss some edge cases. My understanding is that Scrapy first converts this Request into a Twisted object, which then writes the headers and body into a TCP transport. So maybe there's a way to do something similar, but write to a string instead?
UPDATE
I could use the following code to get an HTTP/1.0 request message, which is based on http.py. Is there a way to do something similar for HTTP/1.1 requests / http11.py, which is what is actually being sent? I would obviously like to avoid duplicating code from the Scrapy/Twisted frameworks as much as possible.
from scrapy.core.downloader import webclient
from twisted.test.proto_helpers import StringTransport

factory = webclient.ScrapyHTTPClientFactory(request)
transport = StringTransport()
protocol = webclient.ScrapyHTTPPageGetter()
protocol.factory = factory
protocol.makeConnection(transport)
request_message = transport.value()
print(request_message.decode("utf-8"))
As scrapy is open source and also has plenty of extension points, this should be doable.
The requests are finally assembled and sent out in scrapy/core/downloader/handlers/http11.py in ScrapyAgent.download_request ( https://github.com/scrapy/scrapy/blob/master/scrapy/core/downloader/handlers/http11.py#L270 )
If you place your hook there you can dump the request type, request headers, and request body.
To place your code there you can either monkey patch ScrapyAgent.download_request, or subclass ScrapyAgent to do the request logging, subclass HTTP11DownloadHandler to use your agent, and then register your handler as the DOWNLOAD_HANDLERS entry for http/https requests in your project's settings.py (for details see: https://doc.scrapy.org/en/latest/topics/settings.html#download-handlers)
In my opinion this is the closest you can get to logging the requests going out without using a packet sniffer or a logging proxy (which might be a bit overkill for your scenario).
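A minimal sketch of the monkey patching option (the patch point follows the http11.py internals linked above and may change between Scrapy versions; note this logs the Request object's fields, not the raw bytes on the wire):

from scrapy.core.downloader.handlers.http11 import ScrapyAgent

_original_download_request = ScrapyAgent.download_request

def _logging_download_request(self, request):
    # Dump what Scrapy knows about the outgoing request just before
    # the Twisted agent sends it.
    print(request.method, request.url)
    print(request.headers)
    print(request.body)
    return _original_download_request(self, request)

ScrapyAgent.download_request = _logging_download_request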
I'm writing a Django view that fetches the latest blog posts from a WordPress site.
import feedparser

def __get_latest_blog_posts(rss_url, limit=4):
    feed = feedparser.parse(rss_url)
    # e.g. return the most recent `limit` entries
    return feed.entries[:limit]
I tried in a terminal to use ETags:
>>> import feedparser
>>> d = feedparser.parse("http://a real url")
>>> d.etag
u'"2ca34419a999eae486b5e9fddaa2b2b9"'
>>> d2 = feedparser.parse("http://a real url", d.etag)
I'd like to avoid requesting the feed for every user of the web app. Maybe ETags aren't the best option?
Once the first user hits this view, can I store the ETag and reuse it for all other users? Or is there a separate thread for every user, so I can't share the value of a variable this way?
An ETag marks a unique state of a web resource, giving you a way to ask for the resource while telling the server which version you already have.
But to have a version at your client at all, you have to fetch it the first time, so the ETag is irrelevant for the first request.
See HTTP ETag on Wikipedia; it explains it all.
The typical scenario is:
fetch your page the first time and store the value of the ETag header for future use
next time you ask for the same page, add an If-None-Match header with the ETag value from your last fetch. The server checks whether there is something new: if the ETag you provide and the ETag of the current version of the resource are the same, it will not return the complete page but rather respond with HTTP status code 304 Not Modified. If the page has changed on the server, you get the page with HTTP status code 200 and a new ETag value in the response header.
If you want to optimize your app so that it does not issue the initial request for the same feed once per user, you will have to share the ETag value for a given resource globally across your application.
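For example, a minimal sketch of sharing the ETag (and the last parsed entries) across users via Django's cache framework, so it also works across worker processes; the cache keys and the 15-minute expiry are illustrative:

import feedparser
from django.core.cache import cache

def get_latest_blog_posts(rss_url, limit=4):
    etag = cache.get("blog_feed_etag")
    cached_entries = cache.get("blog_feed_entries")

    feed = feedparser.parse(rss_url, etag=etag)
    if getattr(feed, "status", None) == 304 and cached_entries is not None:
        # Nothing changed upstream; reuse the entries we already have.
        return cached_entries[:limit]

    entries = feed.entries[:limit]
    cache.set("blog_feed_etag", getattr(feed, "etag", None), 60 * 15)
    cache.set("blog_feed_entries", entries, 60 * 15)
    return entries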
On the first request the client can never use a local cache, so an ETag isn't needed there. Remember that the ETag has to be passed in the conditional request headers (If-None-Match, If-Match, etc.); the semantics of non-conditional requests are unchanged.
If your feed is a public feed, an intermediate caching proxy is also allowed to return an ETagged result for a non-conditional request, although it will always have to contact the origin server if a conditional header doesn't match.
I am currently using the Python requests package to make JSON requests. Unfortunately, the service I need to query has a daily maximum request limit. Right now, I log the executed request URLs, so if I go beyond this limit, I know where to continue the next day.
r = requests.get('http://someurl.com', params=request_parameters)
log.append(r.url)
However, to use this log the next day I need to build the request URLs in my program before actually sending the requests, so I can match them against the strings in the log. Otherwise, it would eat into my daily limit. Does anybody have an idea how to do this? I didn't find an appropriate method in the requests package.
You can use PreparedRequests.
To build the URL, you can build your own Request object and prepare it:
from requests import Session, Request
s = Session()
p = Request('GET', 'http://someurl.com', params=request_parameters).prepare()
log.append(p.url)
Later, when you're ready to send, you can just do this:
r = s.send(p)
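For example (illustrative, assuming log is the list of URLs requested so far):

if p.url not in log:
    r = s.send(p)
    log.append(p.url)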
The relevant section of the documentation is here.
I was trying to download images with multiple threads, with a limited max_count, in Python.
Each time a download thread is started, I leave it alone and start another one. I want each download to finish within 5 seconds, meaning the download has failed if opening the URL takes more than 5 seconds.
But how can I detect that and stop the failed thread?
Can you tell us which version of Python you are using?
It would also help if you posted a snippet.
Since Python 2.6, urllib2.urlopen accepts a timeout argument.
Hope this helps. The following is from the Python docs:
urllib2.urlopen(url[, data][, timeout])
Open the URL url, which can be either a string or a Request object.
Warning: HTTPS requests do not do any verification of the server's certificate.
data may be a string specifying additional data to send to the server, or None if no such data is needed. Currently HTTP requests are the only ones that use data; the HTTP request will be a POST instead of a GET when the data parameter is provided. data should be a buffer in the standard application/x-www-form-urlencoded format. The urllib.urlencode() function takes a mapping or sequence of 2-tuples and returns a string in this format. The urllib2 module sends HTTP/1.1 requests with the Connection: close header included.
The optional timeout parameter specifies a timeout in seconds for blocking operations like the connection attempt (if not specified, the global default timeout setting will be used). This actually only works for HTTP, HTTPS and FTP connections.
This function returns a file-like object with two additional methods:
geturl() — return the URL of the resource retrieved, commonly used to determine if a redirect was followed
info() — return the meta-information of the page, such as headers, in the form of a mimetools.Message instance (see Quick Reference to HTTP Headers)
Raises URLError on errors.
Note that None may be returned if no handler handles the request (though the default installed global OpenerDirector uses UnknownHandler to ensure this never happens).
In addition, the default installed ProxyHandler makes sure the requests are handled through the proxy when they are set.
Changed in version 2.6: timeout was added.
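A minimal sketch of a download worker using that timeout (urllib2/Python 2 API, matching the docs quoted above; note that the timeout makes urlopen give up rather than killing the thread):

import socket
import urllib2

def download(url, path, timeout=5):
    try:
        # A connection timeout is raised as URLError wrapping socket.timeout;
        # a timeout during read() raises socket.timeout directly.
        response = urllib2.urlopen(url, timeout=timeout)
        data = response.read()
    except (urllib2.URLError, socket.timeout):
        # Treat the download as failed.
        return False
    with open(path, "wb") as f:
        f.write(data)
    return True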