Simpliest python long polling

Simpliest python long polling - python

I saw some threads about long polling in python, but my problem is not so bit to use some additional kits like tornado etc.
I have js client. It sends requests to my /longpolling page and wait for response. Once it get response or timeout it sends new one. This is working well.
My /longpolling handler is a function:
currentTime = datetime.datetime.now()
lastUpdate = datetime.datetime.strptime(req.GET["ts"], "%Y-%m-%dT%H:%M:%S.%f")
response = {
"added": [],
"updated": [],
"deleted": []
}
while (datetime.datetime.now() - currentTime).seconds < 600:
time.sleep(2)
now = datetime.datetime.now()
#query = Log.objects.filter(time__range = (lastUpdate, now))
query = Log.objects.raw("SELECT * FROM ...log WHERE time BETWEEN %s and %s", [lastUpdate, now])
exist = False
for log in query:
exist = True
type = {
NEW: "added",
UPDATED: "updated",
DELETED: "deleted"
}[log.type]
response[type].append(json.loads(log.data))
if exist:
response["ts"] = now.isoformat()
return JsonResponse(response)
response["ts"] = datetime.datetime.now().isoformat()
return JsonResponse(response)
Every 2 seconds during 10 min I want to check for new Log instances in DB to notify js client.
I tryed to insert Log record manualy through phpMyAdmin but next Log.objects.filter(time__range = (lastUpdate, now)) returns empty QuerySet. I copy raw query from .query attr it looks like:
SELECT ... FROM ... WHERE time BETWEEN 2013-01-05 03:30:36 and 2013-01-05 03:45:18
So I quoted 2013-01-05 03:30:36 and 2013-01-05 03:45:18 and executed this SQL through phpMyAdmin and it returned my added record.
I tryed:
query = Log.objects.filter(time__range = (lastUpdate, now))
and
query = Log.objects.raw("SELECT * FROM ...log WHERE time BETWEEN %s and %s", [lastUpdate, now])
and
for log in query.iterate():
But it always returns an empty QuerySet but never my added record.
I thought there is some caching, but where?
Or the problem is that I insert new record until while True: loop was performing? Or maybe there is some thread protection? Why phpMyAdmin see record but django does not?
Please help me, I am stuck.

I haven't run into this problem, so I'm not sure. Based on #DanielRoseman's answer in the thread you linked in comments, you might be able to do this:
with transaction.commit_on_success():
query = Log.objects.raw("SELECT * FROM ...log WHERE time BETWEEN %s and %s", [lastUpdate, now])
It seems more likely, though, that you will have to wrap the lines that insert your log entries in the commit_on_success decorator. I'm not sure where in your code the log entries are inserted.

Related

Python JSON scraping - how can I handle missing values?

I'm pretty new to coding, so I'm learning a lot as I go. This problem got me stumped, and even though I can find several similar questions on here, I can't find one that works or has a recognizable syntax to me.
I'm trying to scrape various user data from a JSON API, og then store those values in a MySQL database I've set up.
The code seems to run fine for the most part, but some users does not have the attributes I'm trying to scrape in the JSON, and thus I'm left with Nonetype errors that I cant seem to foil.
If possible I'd like to just store "0" in the database where the json does not contain the attribute.
In the m/snippet below this works fine for users that has a job, but users without a job returns Nonetype on jobposition and apparently breaks the loop.
response = requests.get("URL")
json_obj = json.loads(response.text)
timer = json_obj['timestamp']
jobposition = json_obj['job']['position']
query = "INSERT INTO users (timer, jobposition) VALUES (%s, %s)"
values = (timer, jobposition)
cursor = db.cursor()
cursor.execute(query, values)
db.commit()
Thanks in advance!

You can use for that the get() method of the dictionary as follow
timer = json_obj.get('timestamp', 0)
0 is the default value and in case there is no 'timestamp' attribute it will return 0.
For job position, you can do
jobposition = json_obj['job'].get('position', 0) if 'job' in json_obj else 0

Try this
try:
jobposition = json_obj['job']['position']
except:
jobposition = 0

You can more clearly declare the data schema using dataclasses:
from dataclasses import dataclass
from validated_dc import ValidatedDC
#dataclass
class Job(ValidatedDC):
name: str
position: int = 0
#dataclass
class Workers(ValidatedDC):
timer: int
job: Job
input_data = {
'timer': 123,
'job': {'name': 'driver'}
}
workers = Workers(**input_data)
assert workers.job.position == 0
https://github.com/EvgeniyBurdin/validated_dc

I have an Error with python flask cause of an API result (probably cause of my list) and my Database

I use flask, an api and SQLAlchemy with SQLite.
I begin in python and flask and i have problem with the list.
My application work, now i try a news functions.
I need to know if my json informations are in my db.
The function find_current_project_team() get information in the API.
def find_current_project_team():
headers = {"Authorization" : "bearer "+session['token_info']['access_token']}
user = requests.get("https://my.api.com/users/xxxx/", headers = headers)
user = user.json()
ids = [x['id'] for x in user]
return(ids)
I use ids = [x['id'] for x in user] (is the same that) :
ids = []
for x in user:
ids.append(x['id'])
To get ids information. Ids information are id in the api, and i need it.
I have this result :
[2766233, 2766237, 2766256]
I want to check the values ONE by One in my database.
If the values doesn't exist, i want to add it.
If one or all values exists, I want to check and return "impossible sorry, the ids already exists".
For that I write a new function:
def test():
test = find_current_project_team()
for find_team in test:
find_team_db = User.query.filter_by(
login=session['login'], project_session=test
).first()
I have absolutely no idea to how check values one by one.
If someone can help me, thanks you :)
Actually I have this error :
sqlalchemy.exc.InterfaceError: (InterfaceError) Error binding
parameter 1 - probably unsupported type. 'SELECT user.id AS user_id,
user.login AS user_login, user.project_session AS user_project_session
\nFROM user \nWHERE user.login = ? AND user.project_session = ?\n
LIMIT ? OFFSET ?' ('my_tab_login', [2766233, 2766237, 2766256], 1, 0)

It looks to me like you are passing the list directly into the database query:
def test():
test = find_current_project_team()
for find_team in test:
find_team_db = User.query.filter_by(login=session['login'], project_session=test).first()
Instead, you should pass in the ID only:
def test():
test = find_current_project_team()
for find_team in test:
find_team_db = User.query.filter_by(login=session['login'], project_session=find_team).first()
Asides that, I think you can do better with the naming conventions though:
def test():
project_teams = find_current_project_team()
for project_team in project_teams:
project_team_result = User.query.filter_by(login=session['login'], project_session=project_team).first()

All works thanks
My code :
project_teams = find_current_project_team()
for project_team in project_teams:
project_team_result = User.query.filter_by(project_session=project_team).first()
print(project_team_result)
if project_team_result is not None:
print("not none")
else:
project_team_result = User(login=session['login'], project_session=project_team)
db.session.add(project_team_result)
db.session.commit()

Google App Engine (Python) handle large number of writing tasks

I am reading a large amount of data from an API provider. Once get the response, I need to scan through and repackage the data and put into App Engine datastore. A particular big account will contain ~50k entries.
Every time I get some entries from the API, I will store 500 entries as a batch in a temp table and send the processing task to a queue. In case too many tasks get jammed inside one queue, I use 6 queues in total:
count = 0
worker_number = 6
for folder, property in entries:
data[count] = {
# repackaging data here
}
count = (count + 1) % 500
if count == 0:
cache = ClientCache(parent=user_key, data=json.dumps(data))
cache.put()
params = {
'access_token': access_token,
'client_key': client.key.urlsafe(),
'user_key': user_key.urlsafe(),
'cache_key': cache.key.urlsafe(),
}
taskqueue.add(
url=task_url,
params=params,
target='dbworker',
queue_name='worker%d' % worker_number)
worker_number = (worker_number + 1) % 6
And the task_url will lead to the following:
logging.info('--------------------- Process File ---------------------')
user_key = ndb.Key(urlsafe=self.request.get('user_key'))
client_key = ndb.Key(urlsafe=self.request.get('client_key'))
cache_key = ndb.Key(urlsafe=self.request.get('cache_key'))
cache = cache_key.get()
data = json.loads(cache.data)
for property in data.values():
logging.info(property)
try:
key_name = '%s%s' % (property['key1'], property['key2'])
metadata = Metadata.get_or_insert(
key_name,
parent=user_key,
client_key=client_key,
# ... other info
)
metadata.put()
except StandardError, e:
logging.error(e.message)
All the tasks are running in the backend.
With such structure, it's working fine. well... most of time. But sometimes I get this error:
2013-09-19 15:10:07.788
suspended generator transaction(context.py:938) raised TransactionFailedError(The transaction could not be committed. Please try again.)
W 2013-09-19 15:10:07.788
suspended generator internal_tasklet(model.py:3321) raised TransactionFailedError(The transaction could not be committed. Please try again.)
E 2013-09-19 15:10:07.789
The transaction could not be committed. Please try again.
It seems to be the problem of writing into datastore too frequently? I want to find out how I can balance the pace and let the worker run smoothly...
Also is there any other way I can improve the performance further? My queue configuration is something like this:
- name: worker0
rate: 120/s
bucket_size: 100
retry_parameters:
task_retry_limit: 3

You are writing single entities at a time.
How about modifing your code to write in batches using ndb.put_multi that will reduce the round trip time for each transaction.
And why are you using get_or_insert as you are overwriting the record each time. You might as well just write. Both of these will reduce the workload a lot

Limit calls to external database with Python CGI

I've got a Python CGI script that pulls data from a GPS service; I'd like this information to be updated on the webpage about once every 10s (the max allowed by the GPS service's TOS). But there could be, say, 100 users viewing the webpage at once, all calling the script.
I think the users' scripts need to grab data from a buffer page that itself only upates once every ten seconds. How can I make this buffer page auto-update if there's no one directly viewing the content (and not accessing the CGI)? Are there better ways to accomplish this?

Cache the results of your GPS data query in a file or database (sqlite) along with a datetime.
You can then do a datetime check against the last cached datetime to initiate another GPS data query.
You'll probably run into concurrency issues with cgi and the datetime check though...
To get around concurrency issues, you can use sqlite, and put the write in a try/except.
Here's a sample cache implementation using sqlite.
import datetime
import sqlite3
class GpsCache(object):
db_path = 'gps_cache.db'
def __init__(self):
self.con = sqlite3.connect(self.db_path)
self.cur = self.con.cursor()
def _get_period(self, dt=None):
'''normalize time to 15 minute periods'''
if dt.minute < 15:
minute_period = 0
elif 15 <= dt.minute < 30:
minute_period = 15
elif 30 <= dt_minute < 45:
minute_period = 30
elif 45 <= dt_minute:
minute_period = 25
period_dt = datetime.datetime(year=dt.year, month=dt.month, day=dt.day, hour=dt.hour, minute=minute_period)
return period_dt
def get_cache(dt=None):
period_dt = self._get_period(dt)
select_sql = 'SELECT * FROM GPS_CACHE WHERE date_time = "%s";' % period_dt.strftime('%Y-%m-%d %H:%M')
self.cur.execut(select_sql)
result = self.cur.fetchone()[0]
return result
def put_cache(dt=None, data=None):
period_dt = self._get_period(dt)
insert_sql = 'INSERT ....' # edit to your table structure
try:
self.cur.execute(insert_sql)
self.con.commit()
except sqlite3.OperationalError:
# assume db is being updated by another process with the current resutls and ignore
pass
So we have the cache tool now the implementation side.
You'll want to check the cache first then if it's not 'fresh' (doens't return anything), go grab the data using your current method. Then cache the data you grabbed.
you should probably organize this better, but you should get the general idea here.
Using this sample, you just replace your current calls to 'remote_get_gps_data' with 'get_gps_data'.
from gps_cacher import GpsCache
def remote_get_gps_data():
# your function here
return data
def get_gps_data():
data = None
gps_cache = GpsCache()
current_dt = datetime.datetime.now()
cached_data = gps_cache.get_cache(current_dt)
if cached_data:
data = cached_data
else:
data = remote_get_gps_data()
gps_cache.put_cache(current_dt, data)
return data

What is an efficient way of inserting thousands of records into an SQLite table using Django?

I have to insert 8000+ records into a SQLite database using Django's ORM. This operation needs to be run as a cronjob about once per minute.
At the moment I'm using a for loop to iterate through all the items and then insert them one by one.
Example:
for item in items:
entry = Entry(a1=item.a1, a2=item.a2)
entry.save()
What is an efficient way of doing this?
Edit: A little comparison between the two insertion methods.
Without commit_manually decorator (11245 records):
nox#noxdevel marinetraffic]$ time python manage.py insrec
real 1m50.288s
user 0m6.710s
sys 0m23.445s
Using commit_manually decorator (11245 records):
[nox#noxdevel marinetraffic]$ time python manage.py insrec
real 0m18.464s
user 0m5.433s
sys 0m10.163s
Note: The test script also does some other operations besides inserting into the database (downloads a ZIP file, extracts an XML file from the ZIP archive, parses the XML file) so the time needed for execution does not necessarily represent the time needed to insert the records.

You want to check out django.db.transaction.commit_manually.
http://docs.djangoproject.com/en/dev/topics/db/transactions/#django-db-transaction-commit-manually
So it would be something like:
from django.db import transaction
#transaction.commit_manually
def viewfunc(request):
...
for item in items:
entry = Entry(a1=item.a1, a2=item.a2)
entry.save()
transaction.commit()
Which will only commit once, instead at each save().
In django 1.3 context managers were introduced.
So now you can use transaction.commit_on_success() in a similar way:
from django.db import transaction
def viewfunc(request):
...
with transaction.commit_on_success():
for item in items:
entry = Entry(a1=item.a1, a2=item.a2)
entry.save()
In django 1.4, bulk_create was added, allowing you to create lists of your model objects and then commit them all at once.
NOTE the save method will not be called when using bulk create.
>>> Entry.objects.bulk_create([
... Entry(headline="Django 1.0 Released"),
... Entry(headline="Django 1.1 Announced"),
... Entry(headline="Breaking: Django is awesome")
... ])
In django 1.6, transaction.atomic was introduced, intended to replace now legacy functions commit_on_success and commit_manually.
from the django documentation on atomic:
atomic is usable both as a decorator:
from django.db import transaction
#transaction.atomic
def viewfunc(request):
# This code executes inside a transaction.
do_stuff()
and as a context manager:
from django.db import transaction
def viewfunc(request):
# This code executes in autocommit mode (Django's default).
do_stuff()
with transaction.atomic():
# This code executes inside a transaction.
do_more_stuff()

Bulk creation is available in Django 1.4:
https://django.readthedocs.io/en/1.4/ref/models/querysets.html#bulk-create

Have a look at this. It's meant for use out-of-the-box with MySQL only, but there are pointers on what to do for other databases.

You might be better off bulk-loading the items - prepare a file and use a bulk load tool. This will be vastly more efficient than 8000 individual inserts.

To answer the question particularly with regard to SQLite, as asked, while I have just now confirmed that bulk_create does provide a tremendous speedup there is a limitation with SQLite: "The default is to create all objects in one batch, except for SQLite where the default is such that at maximum 999 variables per query is used."
The quoted stuff is from the docs--- A-IV provided a link.
What I have to add is that this djangosnippets entry by alpar also seems to be working for me. It's a little wrapper that breaks the big batch that you want to process into smaller batches, managing the 999 variables limit.

You should check out DSE. I wrote DSE to solve these kinds of problems ( massive insert or updates ). Using the django orm is a dead-end, you got to do it in plain SQL and DSE takes care of much of that for you.
Thomas

def order(request):
if request.method=="GET":
cust_name = request.GET.get('cust_name', '')
cust_cont = request.GET.get('cust_cont', '')
pincode = request.GET.get('pincode', '')
city_name = request.GET.get('city_name', '')
state = request.GET.get('state', '')
contry = request.GET.get('contry', '')
gender = request.GET.get('gender', '')
paid_amt = request.GET.get('paid_amt', '')
due_amt = request.GET.get('due_amt', '')
order_date = request.GET.get('order_date', '')
print(order_date)
prod_name = request.GET.getlist('prod_name[]', '')
prod_qty = request.GET.getlist('prod_qty[]', '')
prod_price = request.GET.getlist('prod_price[]', '')
print(prod_name)
print(prod_qty)
print(prod_price)
# insert customer information into customer table
try:
# Insert Data into customer table
cust_tab = Customer(customer_name=cust_name, customer_contact=cust_cont, gender=gender, city_name=city_name, pincode=pincode, state_name=state, contry_name=contry)
cust_tab.save()
# Retrive Id from customer table
custo_id = Customer.objects.values_list('customer_id').last() #It is return
Tuple as result from Queryset
custo_id = int(custo_id[0]) #It is convert the Tuple in INT
# Insert Data into Order table
order_tab = Orders(order_date=order_date, paid_amt=paid_amt, due_amt=due_amt, customer_id=custo_id)
order_tab.save()
# Insert Data into Products table
# insert multiple data at a one time from djanog using while loop
i=0
while(i<len(prod_name)):
p_n = prod_name[i]
p_q = prod_qty[i]
p_p = prod_price[i]
# this is checking the variable, if variable is null so fill the varable value in database
if p_n != "" and p_q != "" and p_p != "":
prod_tab = Products(product_name=p_n, product_qty=p_q, product_price=p_p, customer_id=custo_id)
prod_tab.save()
i=i+1

I recommend using plain SQL (not ORM) you can insert multiple rows with a single insert:
insert into A select from B;
The select from B portion of your sql could be as complicated as you want it to get as long as the results match the columns in table A and there are no constraint conflicts.

def order(request):
if request.method=="GET":
# get the value from html page
cust_name = request.GET.get('cust_name', '')
cust_cont = request.GET.get('cust_cont', '')
pincode = request.GET.get('pincode', '')
city_name = request.GET.get('city_name', '')
state = request.GET.get('state', '')
contry = request.GET.get('contry', '')
gender = request.GET.get('gender', '')
paid_amt = request.GET.get('paid_amt', '')
due_amt = request.GET.get('due_amt', '')
order_date = request.GET.get('order_date', '')
prod_name = request.GET.getlist('prod_name[]', '')
prod_qty = request.GET.getlist('prod_qty[]', '')
prod_price = request.GET.getlist('prod_price[]', '')
# insert customer information into customer table
try:
# Insert Data into customer table
cust_tab = Customer(customer_name=cust_name, customer_contact=cust_cont, gender=gender, city_name=city_name, pincode=pincode, state_name=state, contry_name=contry)
cust_tab.save()
# Retrive Id from customer table
custo_id = Customer.objects.values_list('customer_id').last() #It is return Tuple as result from Queryset
custo_id = int(custo_id[0]) #It is convert the Tuple in INT
# Insert Data into Order table
order_tab = Orders(order_date=order_date, paid_amt=paid_amt, due_amt=due_amt, customer_id=custo_id)
order_tab.save()
# Insert Data into Products table
# insert multiple data at a one time from djanog using while loop
i=0
while(i<len(prod_name)):
p_n = prod_name[i]
p_q = prod_qty[i]
p_p = prod_price[i]
# this is checking the variable, if variable is null so fill the varable value in database
if p_n != "" and p_q != "" and p_p != "":
prod_tab = Products(product_name=p_n, product_qty=p_q, product_price=p_p, customer_id=custo_id)
prod_tab.save()
i=i+1
return HttpResponse('Your Record Has been Saved')
except Exception as e:
return HttpResponse(e)
return render(request, 'invoice_system/order.html')

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Simpliest python long polling - python

Related

Python JSON scraping - how can I handle missing values?

I have an Error with python flask cause of an API result (probably cause of my list) and my Database

Google App Engine (Python) handle large number of writing tasks

Limit calls to external database with Python CGI

What is an efficient way of inserting thousands of records into an SQLite table using Django?

Categories

Resources