Will the Session() close after terminating the Python script? - python

I've written a simple Python script to log in to a forum, in order to stay logged in and accumulate online time. The code is as follows:
logPara = {'username':user,'password':pwd}
s = requests.Session()
s.post(forumUrl,data=logPara)
homePage = requests.get(pageUrl)
I can get the correct homePage and am sure the login is successful. But I'm curious: how long will this Session() last? If my program contains only these four lines, will the Session() close, and thus the online status be lost?

Yes, the session will definitely be lost once the script terminates.
So you have two options for making the session last longer. One is to keep the script running, as in the answer posted by @Seekheart. The second is to save the session state to a file using Python's pickle and load it again when needed. But this also depends on cookie expiration etc.
This is how you can do it.
When making the session request:
import pickle
import requests

logPara = {'username': user, 'password': pwd}
s = requests.Session()
s.post(forumUrl, data=logPara)
homePage = s.get(pageUrl)  # use the session so the login cookies are sent

with open('temp.dat', 'wb') as f:  # pickle needs a binary-mode file
    pickle.dump(s, f)
When you want to get the state back later:
import pickle

with open('temp.dat', 'rb') as f:  # binary mode again
    s = pickle.load(f)
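Once the session is restored you can keep using it as before; a quick sketch, assuming pageUrl is defined as in the question and the login cookies haven't expired:
homePage = s.get(pageUrl)  # reuses the cookies stored in the pickled session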

When you run a script, unless it's told to run endlessly or until a certain condition is met, it terminates almost immediately. So your script will end right after you run it. To keep it running you can put your code in a loop, for example:
while True:
    # Run your code
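For instance, a minimal keep-alive sketch (assuming that periodic requests are enough to keep the forum's online status, and that s and pageUrl are set up as in the question; the five-minute interval is an arbitrary choice):
import time

while True:
    s.get(pageUrl)   # periodic request so the forum keeps seeing activity
    time.sleep(300)  # wait 5 minutes between requests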

Related

Python Flask: How to wait for webhook to be executed?

I am working on a Python Flask app, and the main method start() calls an external API (third_party_api_wrapper()). That external API has an associated webhook (webhook()) that receives the output of that external API call (note that the output webhook() receives is actually different from the response returned by third_party_api_wrapper()).
The main method start() needs the result of webhook(). How do I make start() wait for webhook() to be executed? And how do we pass the returned value of webhook() back to start()?
Here is a minimal code snippet to capture the scenario.
@app.route('/webhook', methods=['POST'])
def webhook():
    return "webhook method has executed"

# this method has a webhook that calls webhook() after this method has executed
def third_party_api_wrapper():
    url = 'https://api.thirdparty.com'
    response = requests.post(url)
    return response

# this is the main entry point
@app.route('/start', methods=['POST'])
def start():
    third_party_api_wrapper()
    # The rest of this code depends on the output of webhook().
    # How do we wait until webhook() is called, and how do we access the returned value?
The answer to this question really depends on how you plan on running your app in production. It's much simpler if we make the assumption that you only plan to have a single instance of your app running at once (as opposed to multiple behind a load balancer, for example), so I'll make that assumption first to give you a place to start, and comment on a more "production-ready" solution afterwards.
A big thing to keep in mind when writing a web application is that you have to understand how you want the outside world to interact with your app. Do you expect to have the /start endpoint called only once at the beginning of your app's lifetime, or is this a generic endpoint that may start any number of background processes that you want the caller of each to wait for? Or, do you want the behavior where any caller after the first one will wait for the same process to complete as the first one? I can't answer these questions for you, it depends on the use-case you're trying to implement. I'll give you a relatively simple solution that you should be able to modify to fulfill any of the ones I mentioned though.
This solution will use the Event class from the threading standard library module; I added some comments to clarify which parts you may have to change depending on the specifics of the API you're calling and stuff like that.
import threading
import uuid
from typing import Any

import requests
from flask import Flask, Response, request

# The base URL for your app, if you're running it locally this should be fine
# however external providers can't communicate with your `localhost` so you'll
# need to change this for your app to work end-to-end.
BASE_URL = "http://localhost:5000"

app = Flask(__name__)


class ThirdPartyProcessManager:
    def __init__(self) -> None:
        self.events = {}
        self.values = {}

    def wait_for_request(self, request_id: str) -> Any:
        event = threading.Event()
        actual_event = self.events.setdefault(request_id, event)
        if actual_event is not event:
            raise ValueError(f"Request {request_id} already exists.")
        event.wait()
        return self.values.pop(request_id)

    def finish_request(self, request_id: str, value: Any) -> None:
        event = self.events.pop(request_id, None)
        if event is None:
            raise ValueError(f"Request {request_id} does not exist.")
        self.values[request_id] = value
        event.set()


MANAGER = ThirdPartyProcessManager()

# This is assuming that you can specify the callback URL per-request, otherwise
# you may have to get the request ID from the body of the request or something
@app.route('/webhook/<request_id>', methods=['POST'])
def webhook(request_id: str) -> Response:
    MANAGER.finish_request(request_id, request.json)
    return "webhook method has executed"


# Somehow in here you need to create or generate a unique identifier for this
# request--this may come from the third-party provider, or you can generate one
# yourself. There are two main paths I see here:
# - If you can specify the callback/webhook URL in each call, you can just pass them
#   <base>/webhook/<request_id> and use that to identify which request is being
#   responded to in the webhook.
# - If the provider gives you a request ID, you can return it from this function
#   then retrieve it from the request body in the webhook route.
# For now, I'll assume the first situation but you should be able to implement the
# second with minimal changes.
def third_party_api_wrapper() -> str:
    request_id = uuid.uuid4().hex
    url = 'https://api.thirdparty.com'
    # Just an example, I don't know how the third party API you're working with works
    response = requests.post(
        url,
        json={"callback_url": f"{BASE_URL}/webhook/{request_id}"}
    )
    # NOTE: unrelated to the problem at hand, you should always check for errors
    # in HTTP responses. This method is an easy way provided by requests to raise
    # for non-success status codes.
    response.raise_for_status()
    return request_id


@app.route('/start', methods=['POST'])
def start() -> Response:
    request_id = third_party_api_wrapper()
    result = MANAGER.wait_for_request(request_id)
    return result
If you want to run the example fully locally to test it, do the following:
1. Comment out the requests.post(...) call and response.raise_for_status() in third_party_api_wrapper, which actually make the external API call.
2. Add a print statement right after the request ID is generated in third_party_api_wrapper, so that you can get the ID of the "in flight" request, e.g. print("Request ID", request_id).
3. In one terminal, run the app by pasting the above code into an app.py file and running flask run in that directory.
4. In another terminal, start the process via:
curl -XPOST http://localhost:5000/start
5. Copy the request ID that will be logged in the first terminal that's running the server.
6. In a third terminal, complete the process by calling the webhook:
curl -XPOST http://localhost:5000/webhook/<your_request_id> -H Content-Type:application/json -d '{"foo":"bar"}'
You should see {"foo":"bar"} as the response in the second terminal that made the /start request.
I hope that's enough to help you get started w/ whatever problem you're trying to solve.
There are a couple of design-y comments I have based on the information provided as well:
As I mentioned before, this will not work if you have more than one instance of the app running at once. This works by storing the state of in-flight requests in a global state inside your python process, so if you have more than one process, they won't all be working and modifying the same state. If you need to run more than one instance of your process, I would use a similar approach with some database backend to store the shared state (assuming your requests are pretty short-lived, Redis might be a good choice here, but once again it'll depend on exactly what you're trying to do).
Even if you do only have one instance of the app running, flask is capable of being run in a variety of different server contexts--for example, the server might be using threads (the default), greenlets via gevent or a similar library, or multiple processes, or maybe some other approach entirely in order to handle multiple requests concurrently. If you're using an approach that creates multiple processes, you should be able to use the utilities provided by the multiprocessing module to implement the same approach as I've given above.
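A hedged sketch of those multiprocessing primitives (just the shared-state setup, not a full rework of the example; whether a single Manager is reachable from all workers depends on how your server forks):
from multiprocessing import Manager

manager = Manager()
shared_events = manager.dict()   # request_id -> Event proxy
shared_values = manager.dict()   # request_id -> webhook payload
event = manager.Event()          # usable across processes, unlike threading.Event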
This approach will probably work just fine for cases where the difference in time between the API call and the webhook response is small (on the order of a couple of seconds at most, I'd say), but you should be wary of using it when that difference can be quite large. If the connection between the client and your server fails, they'll have to make another request and run the long-running process that your third party is completing for you all over again. Some proxies and load balancers may also have timeout behavior that could terminate the request after a certain amount of time even if nothing goes wrong in the connection between your server and the client making a request to it. An alternative approach would be for your /start endpoint to return quickly and give the client a request_id that they could poll for updates, as sketched below. As an example, AWS Athena's API is structured like this--there is a StartQueryExecution method, and separate GetQueryExecution and GetQueryResults methods that the client calls to check the status of a query and retrieve the results respectively (there are also other methods like StopQueryExecution and GetQueryRuntimeStatistics available as well). You can check out the documentation here.
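A minimal sketch of that polling style, reusing the wrapper from above (the /status route, its response format, and the renamed view functions are hypothetical, chosen so the routes don't clash with the earlier example):
RESULTS = {}  # request_id -> webhook payload, filled in by the webhook route

@app.route('/start', methods=['POST'])
def start_async():
    request_id = third_party_api_wrapper()
    return {"request_id": request_id}  # the client polls /status with this ID

@app.route('/status/<request_id>', methods=['GET'])
def status(request_id):
    if request_id in RESULTS:
        return {"status": "done", "result": RESULTS.pop(request_id)}
    return {"status": "pending"}

@app.route('/webhook/<request_id>', methods=['POST'])
def webhook_async(request_id):
    RESULTS[request_id] = request.json
    return "webhook received"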
I know that's a lot of info, but I hope it helps. Happy to update the answer w/ more specific info if you'll provide some more details about your use-case.

Python Request Package Close Connection Method Does not Work

It is the first time that I am working with a REST API in a Jupyter notebook, and I don't know what I am doing wrong here. When I try to execute the following code in a cell, the cell runs forever without throwing any errors. At first I did not include the close method from the requests package, but then I thought the problem might be the open connection. However, including the close method also did not help. Do you know what could be the reason?
import requests

api_key = "exampletoken"
header = {'authorization': "Bearer {}".format(api_key)}
payload = {}

r = requests.post('exampleurl', headers=header, data=payload)
r.close()
Thanks in advance!
runs forever without throwing any errors.
By default requests does not time out, so it can wait an infinite amount of time. This might cause the behavior you described, and would mean the server did not respond. To figure out if that is the cause, set a timeout, for example:
r = requests.post('exampleurl', headers=header, data=payload, timeout=180)
This raises an exception after 180 seconds (i.e. 3 minutes) if it does not get a response. If you want to know more about timeouts in requests, I suggest reading the realpython.com tutorial.
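requests also accepts a (connect, read) timeout tuple for finer control, and the timeout surfaces as an exception you can catch; a small sketch, assuming the same header and payload as above:
import requests

try:
    # 5 seconds to establish the connection, 180 seconds to wait for a response
    r = requests.post('exampleurl', headers=header, data=payload, timeout=(5, 180))
    r.raise_for_status()
except requests.exceptions.Timeout:
    print("Server did not respond within the time limit")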

is it possible to pass data from one python program to another python program? [duplicate]

Is it possible -- other than by using something like a .txt/dummy file -- to pass a value from one program to another?
I have a program that uses a .txt file to pass a starting value to another program, and I update the value in the file between each launch (I start the program ten times, essentially simultaneously). Doing this is fine, but I would like the 'child' program to report back to the 'mother' program when it is finished, and also report back which files it found to download.
Is it possible to do this without using eleven files (one for each instance of the 'child'-to-'mother' reporting, and one for the 'mother'-to-'child')? I am talking about completely separate programs, not classes or functions or anything like that.
To operate efficiently, and not wait around for hours for everything to complete, I need the 'child' program to run ten times and get things done MUCH faster. Thus I run the child program ten times and give each instance a separate range to check through.
Both programs run fine, but I would like to get them to run and report back and forth with each other, hopefully without using file 'transmission' to accomplish the task, especially on the child-to-mother side of the data transfer.
'Mother' program...currently
import os
import sys
import subprocess
import time

os.chdir('/media/')

#find highest download video
Hival = open("Highest.txt", "r")
Histr = Hival.read()
Hival.close()
HiNext = str(int(Histr)+1)

#setup download #1
NextVal = open("NextVal.txt","w")
NextVal.write(HiNext)
NextVal.close()

#call download #1
procs=[]
proc=subprocess.Popen(['python','test.py'])
procs.append(proc)
time.sleep(2)

#setup download #2-11
Histr2 = int(Histr)/10000
Histr2 = Histr2 + 1
for i in range(10):
    Hiint = str(Histr2)+"0000"
    NextVal = open("NextVal.txt","w")
    NextVal.write(Hiint)
    NextVal.close()
    proc=subprocess.Popen(['python','test.py'])
    procs.append(proc)
    time.sleep(2)
    Histr2 = Histr2 + 1

for proc in procs:
    proc.wait()
'Child' program
import urllib
import os
from Tkinter import *
import time

root = Tk()
root.title("Audiodownloader")
root.geometry("200x200")
app = Frame(root)
app.grid()

os.chdir('/media/')
Fileval = open('NextVal.txt','r')
Fileupdate = Fileval.read()
Fileval.close()

Fileupdate = int(Fileupdate)
Filect = Fileupdate/10000
Filect2 = str(Filect)+"0009"
Filecount = int(Filect2)
while Fileupdate <= Filecount:
    root.title(Fileupdate)
    url = 'http://www.yourfavoritewebsite.com/audio/encoded/'+str(Fileupdate)+'.mp3'
    urllib.urlretrieve(url,str(Fileupdate)+'.mp3')
    statinfo = os.stat(str(Fileupdate)+'.mp3')
    if statinfo.st_size<10000L:
        os.remove(str(Fileupdate)+'.mp3')
    time.sleep(.01)
    Fileupdate = Fileupdate+1
    root.update_idletasks()
I'm trying to convert the original VB6 program over to Linux and make it much easier to use at the same time, hence the missing .mainloop. This was my first real attempt at anything in Python at all, hence the lack of def or classes. I'm trying to come back and finish this up after 1.5 months of doing nothing with it, mostly due to not knowing how. In researching it a little while ago I found this is WAY over my head. I haven't ever done anything with threads/sockets/client/server interaction, so I'm purely an idiot in this case. Google anything on it and I just get brought right back here to Stack Overflow.
Yes, I want 10 copies of the program running at the same time, to save time. I could do without the GUI interface if it were possible for the program to report back to the 'mother', so the mother could print on screen the current value being searched, as well as whether each child has finished and which files it downloaded successfully (versus downloaded and then erased for being too small). I would use the successful-download information to update Highest.txt for the next run.
I think this may clarify things MUCH better... that, or I don't understand the nature of using server/client interaction :) The only reason time.sleep is in the program is to try to make sure the files get written before the next instance of the child program runs. I didn't know what kind of timing issues I might run into, so I included those lines for safety.
This can be implemented with a simple client/server topology using the multiprocessing library. Using your mother/child terminology:
server.py
from multiprocessing.connection import Listener

# handles one child connection at a time
def child(conn):
    while True:
        msg = conn.recv()
        # this just echos the value back, replace with your custom logic
        conn.send(msg)

# server
def mother(address):
    serv = Listener(address)
    while True:
        client = serv.accept()
        child(client)

mother(('', 5000))
client.py
from multiprocessing.connection import Client
c = Client(('localhost', 5000))
c.send('hello')
print('Got:', c.recv())
c.send({'a': 123})
print('Got:', c.recv())
Run with
$ python server.py
$ python client.py
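In your mother/child terms, each child copy could open a connection when it finishes and report back what it downloaded; a hedged sketch of such a message (the dictionary format here is made up, not something multiprocessing requires):
from multiprocessing.connection import Client

c = Client(('localhost', 5000))
c.send({'child': 3, 'status': 'finished', 'downloaded': ['12345.mp3', '12346.mp3']})
print('Mother acknowledged:', c.recv())  # the echo server above just sends the message back
c.close()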
When you talk about using a txt file to pass information between programs, it depends on what language you're using. To my knowledge it is feasible in both Java and Python, though it can be laborious depending on the amount of information you want to pass. In Python, the standard library already covers reading and writing text files, and for scheduling execution you can use APScheduler.

Alternating between variables each run

I want to use different API keys for data scraping each time my program is run.
For instance, I have the following 2 keys:
apiKey1 = "123abc"
apiKey2 = "345def"
and the following URL:
myUrl = "http://myurl.com/key=..."
When the program is run, I would like myUrl to be using apiKey1. Once it is run again, I would then like it to use apiKey2 and so forth... i.e:
First Run:
url = "http://myurl.com/key=" + apiKey1
Second Run:
url = "http://myurl.com/key=" + apiKey2
Sorry if this doesn't make sense, but does anyone know a way to do this? I have no idea.
EDIT:
To avoid confusion, I've had a look at this answer, but it doesn't answer my query. My goal is to cycle between the variables across executions of my script.
I would use a persistent dictionary (like a database, but more lightweight). That way you can easily store the options and which one to use next.
There's already a module in the standard library that provides such a persistent dictionary: shelve:
import shelve

filename = 'target.shelve'

def get_next_target():
    with shelve.open(filename) as db:
        if not db:
            # Not created yet, initialize it:
            db['current'] = 0
            db['options'] = ["123abc", "345def"]
        # Get the current option
        nxt = db['options'][db['current']]
        db['current'] = (db['current'] + 1) % len(db['options'])  # increment with wraparound
        return nxt
And each call to get_next_target() will return the next option - no matter if you call it several times in the same execution or once per execution.
The logic could be simplified if you never have more than 2 options:
db['current'] = 0 if db['current'] == 1 else 1
But I thought it might be worthwhile to have a way that can easily handle multiple options.
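Using it from your script would then look something like this (building the URL as in the question):
url = "http://myurl.com/key=" + get_next_target()  # alternates on each run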
Here is an example of how you can do it with automatic file creation if no such file exists:
import os

if not os.path.exists('Checker.txt'):
    # if the file doesn't exist, this block creates it;
    # if it already exists, nothing happens
    with open('Checker.txt', 'w') as f:
        f.write('0')

myUrl = 'http://myurl.com/key='
apiKeys = ["123abc", "345def"]

with open('Checker.txt', 'r') as f:
    data = int(f.read())  # read the contents and turn them into an int

myUrl = myUrl + apiKeys[data]  # pick the apiKey by index

with open('Checker.txt', 'w') as f:
    # rewrite the file, swapping the value
    if data == 1:
        f.write('0')
    else:
        f.write('1')
I would rely on an external process to hold which key was used last time; or, even simpler, I would count executions of the script and use one key when the count is odd and the other when it is even.
So I would introduce something like redis, which will also help a lot with other (future?) features you may want to add to your project. redis is one of those tools that always gives benefits at almost no cost; it's very practical to be able to rely on external permanent storage, and it can serve many purposes.
So here is how I would do it:
first make sure redis-server is running (it can be started automatically as a daemon, depending on your system)
install the Python redis module
then, here is some Python code for inspiration:
import redis

db = redis.Redis()
if db.hincrby('execution', 'count', 1) % 2:
    key = apiKey1
else:
    key = apiKey2
That's it!
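Because the counter lives in redis, it survives between executions; building the URL then works just like in the question:
url = "http://myurl.com/key=" + key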

"file being used by another process" error message

Yes, there are many other questions here on this topic. I have looked at the responses but I have not seen any which gives a useful solution.
I have the problem in its simplest form:
import os, time
hfile = "Positions.htm"
hf = open(hfile, "w")
hf.write(str(buf))
hf.close
time.sleep(2) # give it time to catch up
os.system(hfile) # run the html file in the default browser
I get this error message: "The process cannot access the file because it is being used by another process". The file is referenced nowhere else in the program.
No other process is using it, since I can access it without error from any other program, even if I run os.system(file) from the python console.
There's no point in using unlocker, because as soon as I leave the program, I can open the html file in the browser with no complaints from the system.
It looks like 'close' is not properly releasing the file.
I run programs this way out of perl all the time, with no problem except requiring the 1 or 2 second delay.
I'm using Python 3.4 on Win7.
Any suggestions?
You're not calling close(). Needs to be:
import os, time
hfile = "Positions.htm"
hf = open(hfile, "w")
hf.write(str(buf))
hf.close() # note the parens
time.sleep(2) # give it time to catch up
os.system(hfile) # run the html file in the default browser
However to avoid problems like this you should use a context manager:
import os, time

hfile = "Positions.htm"
with open(hfile, 'w') as hf:
    hf.write(str(buf))
os.system(hfile) # run the html file in the default browser
The context manager will handle the closing of the file automatically.
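As a side note, since you're on Windows, os.startfile is a more direct way than os.system to open a file with its default application (a sketch of the same flow; buf is assumed to be defined elsewhere in your program):
import os

hfile = "Positions.htm"
with open(hfile, 'w') as hf:
    hf.write(str(buf))  # buf must already exist in your program
os.startfile(hfile)     # Windows-only: opens the file in the default browser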
