Python chat server when I can't guarantee the page won't be reset?

I'm relatively new to Python, cgi, and passenger-wsgi, so please bear with me.
I set up a Python script that's not much more than
import time

startTime = time.time()

def main():
    # Report how long this process has been alive.
    return time.time() - startTime
... just so I know how long the Passenger server has been running. I set this up a few days ago, but it's only reporting about 12 minutes now.
Keeping track of state isn't important for any of the scripts I currently have, but I'm planning on writing a simple chat page. Keeping track of the various users online and the chat groups will be very important, and I wouldn't want everything to be reset every 12 minutes.
What can I do about this?
My only thought is to store any necessary variables inside an object, then serialize that object and write it to a file every time I change it, so that if the server restarts I still have everything. Is this normally how it's done?
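That is a reasonable approach for a small app. As a minimal sketch, assuming a single-process server (the file name and the ChatState fields are illustrative):
import os
import pickle

STATE_FILE = "chat_state.pkl"  # illustrative path

class ChatState:
    def __init__(self):
        self.users = {}   # users currently online
        self.groups = {}  # chat groups

def load_state():
    # After a restart, reload the last saved state; otherwise start fresh.
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE, "rb") as f:
            return pickle.load(f)
    return ChatState()

def save_state(state):
    # Call this after every mutation so a restart loses nothing.
    with open(STATE_FILE, "wb") as f:
        pickle.dump(state, f)

state = load_state()
For anything beyond a toy, though, a real datastore (SQLite, Redis, a database) is the usual answer, since a pickle file doesn't cope well with multiple worker processes writing at once.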

Related

How can I update my data automatically in Kivy?

I have this method in my class:
import requests

def datas(self):
    # Fetch the JSON payload from the server.
    source = requests.get("http://150.150.150.150/tek.json").json()
    return source
My program uses this data, and the data is updated on that server every day at 12:05 AM.
If I run the app at 12:00 and wait until 12:10, the data is not updated in my app.
Only if I restart the app does it pull the new data.
I used a While True loop with a sleep of 10, but this made the whole program pause every 10 seconds.
How can I fix this? Should I use Clock.schedule? I don't know how to use it. Could you please help me? I want to update the data every 10 seconds so the new data can be seen on the screen.
Thanks very much.
I think what could be helpful in this case is an asyncio library. It would allow the While True loop to run every 10 seconds without freezing the whole program.
Here is an example:
https://github.com/kivy/kivy/blob/master/examples/async/asyncio_basic.py
And if that does not work for you, there are plenty of videos on YouTube, for example from "Tech with Tim", that show how to set up methods with asyncio.
Using the Clock object looks like the easiest way to solve your problem. To execute your data-fetching function every 10 seconds, you can use:
import requests
from kivy.clock import Clock

def datas(dt):
    # Clock passes the time elapsed since the last call as `dt`.
    source = requests.get("http://150.150.150.150/tek.json").json()
    return source

Clock.schedule_interval(datas, 10)
However, I don't really get why you want to retrieve the data every 10 s if you know it's updated only once per day. To me, it would make more sense to schedule the Clock every 60 s, check whether it's just past the update time (12:06, for instance), and only do the request in that case (otherwise do nothing); see the sketch below.
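A minimal sketch of that idea (the URL is the one from the question; what you do with the fetched data is left as a comment):
from datetime import datetime

import requests
from kivy.clock import Clock

UPDATE_URL = "http://150.150.150.150/tek.json"
_last_fetch_date = None

def check_for_update(dt):
    # Runs once a minute; only fetches shortly after the daily 12:05 AM update.
    global _last_fetch_date
    now = datetime.now()
    if now.hour == 0 and now.minute >= 6 and _last_fetch_date != now.date():
        source = requests.get(UPDATE_URL).json()
        _last_fetch_date = now.date()
        # ... push `source` into your widgets here ...

Clock.schedule_interval(check_for_update, 60)
Note that requests.get still blocks the UI thread for the duration of the request; if that's noticeable, move the fetch to a background thread or use the asyncio approach above.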

Scheduling in Heroku while keeping track of counter

Might be a noob question, but I'm new to hosting code online. For my first project I've created a web scraper that logs into a website and updates a number every hour, indexing it up by one. Initially I was scheduling using the schedule library. It looked a little like this:
import time

import schedule

number = 4

def job():
    global number  # without this, the assignment below would create a new local variable
    number = number + 1
    element.clear()            # `element` is a Selenium element defined elsewhere
    element.send_keys(number)

schedule.every().hour.at(":00").do(job)

while True:
    schedule.run_pending()
    time.sleep(1)
This worked fine for the first 24 hours, until the dyno had to be cycled. After that, number reverted back to 4. I could use some built-in scheduler, but I don't see how that would remedy the issue, since it would run the number = 4 line anyway. What am I missing, and how can I keep track of the running count even when the dyno resets?
This isn't really about schedulers, it's about where that data lives. Right now it is in memory and (a) will be lost whenever your dyno restarts and (b) will not behave properly if you scale your application up or out.
You need to store it somewhere.
Heroku offers a number of addons that do data storage, many of which have free tiers.
Depending on your use case, you could also store it on an off-site blob storage container like Amazon S3 or Azure Blob Storage, but a data store is likely a better choice.
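For illustration, here is a minimal sketch using Redis, assuming a Heroku Redis-style add-on that exposes its connection string in a REDIS_URL config var (the key name is arbitrary):
import os

import redis

r = redis.from_url(os.environ["REDIS_URL"])

# Seed the counter once if it doesn't exist yet.
r.setnx("scraper:number", 4)

def job():
    # INCR is atomic, so the count survives dyno restarts and even concurrent dynos.
    number = r.incr("scraper:number")
    element.clear()            # `element` as in the question's Selenium code
    element.send_keys(number)
The same idea works with Postgres or even S3; the point is that the counter lives outside the dyno's memory.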

Scraping Edgar with Python regular expressions

I am working on the initial stage of a personal project: downloading 10-Q statements from EDGAR. Quick disclaimer: I am very new to programming and Python, so the code I wrote is very basic, not even using custom functions and classes, just one very long script that I'm more comfortable editing. As a result, some solutions are quite rough (i.e. concatenating URLs using CIKs and other search options instead of doing requests with "browser" headers).
I keep running into a problem that those who have scraped EDGAR might be familiar with: every now and then my script just stops running. It doesn't raise any exceptions (I created some that append txt reports with links that can't be opened, and so forth). I suspect that either the SEC servers limit the number of requests from one IP per unit of time (if I wait a while after Ctrl-C'ing the script and run it again, it generates more output than after a rapid restart), or it could be TWC identifying me as a bot and limiting such requests.
If it's the SEC, what could potentially work? I tried learning how to work with Tor so I could get a new IP every now and then, but I can't find a basic tutorial that works for my level of expertise. Maybe someone can recommend something good on the topic?
Maybe timers would work? Like forcing the script to sleep every hour or so (I'm still trying to figure out how to make such timers and reset them when an event occurs). The main challenge with this particular problem is that I can't let it run overnight.
Thank you in advance for any advice; I've been fighting with this for days, and at this rate it could take me more than a month to get what I want (before I even start tackling 10-Ks).
It seems like delays are pretty useful - sitting at 3.5k downloads with no interruptions thanks to a simple:
import random
import time

# A short randomized pause between requests keeps the request rate polite.
time.sleep(random.randint(0, 1) + abs(random.normalvariate(0, 0.2)))
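Building on that, a hedged sketch of a small fetch helper that combines the same randomized delay with a simple retry/backoff and a descriptive User-Agent (the SEC's fair-access guidance asks automated clients to identify themselves; the retry counts, backoff times, and contact string here are illustrative):
import random
import time

import requests

HEADERS = {"User-Agent": "Personal research project admin@example.com"}  # illustrative contact

def polite_get(url, max_retries=3):
    # Fetch a URL with a randomized delay; back off and retry if throttled.
    for attempt in range(max_retries):
        time.sleep(random.randint(0, 1) + abs(random.normalvariate(0, 0.2)))
        resp = requests.get(url, headers=HEADERS)
        if resp.status_code == 200:
            return resp
        time.sleep(30 * (attempt + 1))  # longer pause before each retry
    resp.raise_for_status()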

Run python job every x minutes

I have a small Python script that basically connects to a SQL Server (Microsoft) database, gets users from there, and then syncs them to another MySQL database. Basically I'm just running queries to check whether each user exists and, if not, adding that user to the MySQL database.
The script usually takes around 1 min to sync. I need the script to do its work every 5 mins (for example), exactly once per interval (one sync per 5 mins).
What would be the best way to go about building this?
I have some test data for the users, but on the real site there are a lot more users, so I can't guarantee the script takes 1 min to execute; it might even take 20 mins. However, having an interval of, say, 15 mins between executions would be ideal for this problem...
Update:
I have the connection params for the SQL Server Windows db, so I'm using a small Ubuntu server to sync between the two databases, which live on different servers. So let's say db1 (Windows) and db2 (Linux) are the database servers; I'm using s1 (a Python server) with the pymssql and MySQL Python modules to do the syncing.
Regards
I am not sure cron is right for the job. It seems to me that if you have it run every 15 minutes, but sometimes a sync takes 20 minutes, you could have multiple processes running at once and possibly colliding.
If the driving force is a constant wait time between the variable execution times, then you need a continuously running process with a wait:
import time

def main():
    loopInt = 0
    while loopInt < 10000:
        synchDatabase()  # your existing sync routine
        loopInt += 1
        print("call #" + str(loopInt))
        time.sleep(300)  # sleep 5 minutes between syncs

main()
(Obviously not continuous, but long-running.) You can change the loop condition to while True to make it truly continuous (and comment out loopInt += 1).
Edited to add: please see the note in the comments about monitoring the process, as you don't want the script to hang or crash without you being aware of it.
You might want to use a system that handles queues, for example RabbitMQ, and use Celery as the Python interface to it. With Celery, you can add tasks (like the execution of a script) to a queue, or run a schedule that performs a task after a given interval (just like cron); see the sketch below.
Get started: http://celery.readthedocs.org/en/latest/
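A minimal sketch of the scheduled-task variant, assuming this lives in a file called sync_tasks.py and RabbitMQ runs locally (the broker URL and task body are placeholders):
from celery import Celery

app = Celery("sync_tasks", broker="amqp://guest@localhost//")

@app.task
def sync_users():
    # ... run your pymssql -> MySQL sync here ...
    pass

# Celery beat enqueues the task on a fixed schedule, much like cron.
app.conf.beat_schedule = {
    "sync-every-15-min": {
        "task": "sync_tasks.sync_users",
        "schedule": 15 * 60,  # seconds
    },
}
Start it with celery -A sync_tasks worker --beat --concurrency=1. With a single worker process, an overlong sync simply delays the next run instead of colliding with it, which addresses the 20-minute-sync concern.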

How to continuously read data from xively in a (python) heroku app?

I am trying to write a Heroku app in Python which will read and store data from a Xively feed in real time. I want the app to run independently, as a sort of 'backend process', simply storing the data in a database. (It does not need to serve anything up to site visitors.)
Right now I am working on the 'continuous reading' part. I have included my code below. It reads the datastream once each time I hit my app's Heroku URL. How do I get it to operate continuously, so that it keeps reading the data from Xively?
import os

from flask import Flask
import xively

app = Flask(__name__)

@app.route('/')
def run_xively_script():
    key = 'FEED_KEY'
    feedid = 'FEED_ID'
    client = xively.XivelyAPIClient(key)
    feed = client.feeds.get(feedid)
    datastream = feed.datastreams.get("level")
    level = datastream.current_value
    return "level is %s" % (level)
I am new to web development, Heroku, and Python... I would really appreciate any help (pointers).
PS:
I have read about the Heroku Scheduler, and from what I understand it can be used to schedule a task at specific time intervals; when it does so, it starts a one-off dyno for the task. But as I mentioned, my app is really meant to perform just one function: continuously reading and storing data from Xively. Is it necessary to schedule a separate task for that? And the one-off dyno that the scheduler starts will also consume dyno hours, which I think will exceed the free 750 dyno-hour limit (as my app's web dyno is already consuming 720 dyno-hours per month)...
Using the scheduler, as you and @Calumb have suggested, is one method to go about this.
Another method would be for you to setup a trigger on Xively. https://xively.com/dev/docs/api/metadata/triggers/
Have the trigger fire when your feed is updated. The trigger should POST to your Flask app, and the Flask app can then take the new data, manipulate it, and store it as you wish; a sketch of such an endpoint is below. This would be the most near-realtime option, I'd think, because Xively pushes the update to your system.
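A minimal sketch of the receiving endpoint (the route name and the save_reading helper are hypothetical; check the trigger docs above for the exact payload fields Xively sends):
from flask import Flask, request

app = Flask(__name__)

@app.route('/xively-trigger', methods=['POST'])
def xively_trigger():
    payload = request.get_json(force=True)
    # Pull whatever fields you need out of the trigger payload and persist them;
    # save_reading is a stand-in for your own storage code.
    save_reading(payload)
    return '', 204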
This question is more about high-level architecture decisions, and what you are trying to accomplish, than about one specific thing you should do.
Ultimately, Flask is probably not the best choice for an app like this. You would be better off with plain Python (or plain Ruby). With that being said, using the Heroku Scheduler (which you alluded to) makes it possible to do something like what you are trying to do.
The simplest way to accomplish your goal (assuming that you want to change a minimal amount of code, and that constantly reading data is really what you want to do; both of which you should reconsider) is to write a loop that runs when you call that task and grabs data for a few seconds. Just use a for loop and increment a counter for however many times you want to get the data.
Something like:
import time

import xively

for i in range(0, 5):
    key = 'FEED_KEY'
    feedid = 'FEED_ID'
    client = xively.XivelyAPIClient(key)
    feed = client.feeds.get(feedid)
    datastream = feed.datastreams.get("level")
    level = datastream.current_value
    # ... store `level` somewhere here ...
    time.sleep(1)
However, Heroku has limits on how long a request can run before it must return a value; otherwise the router returns a 503 or 500. But you could use the scheduler to run this every certain amount of time.
Again, I think Flask plus Heroku is not the best solution for what it sounds like you are trying to do. I would review your use case and go back to the drawing board on the best method to accomplish it.
