SQLAlchemy session query with INSERT IGNORE - python

I'm trying to do a bulk insert/update with SQLAlchemy. Here's a snippet:
for od in clist:
    where = and_(Offer.network_id==od['network_id'],
                 Offer.external_id==od['external_id'])
    o = session.query(Offer).filter(where).first()
    if not o:
        o = Offer()
    o.network_id = od['network_id']
    o.external_id = od['external_id']
    o.title = od['title']
    o.updated = datetime.datetime.now()
    payout = od['payout']
    countrylist = od['countries']
    session.add(o)
    session.flush()

    for country in countrylist:
        c = session.query(Country).filter(Country.name==country).first()
        where = and_(OfferPayout.offer_id==o.id,
                     OfferPayout.country_name==country)
        opayout = session.query(OfferPayout).filter(where).first()
        if not opayout:
            opayout = OfferPayout()
        opayout.offer_id = o.id
        opayout.payout = od['payout']
        if c:
            opayout.country_id = c.id
            opayout.country_name = country
        else:
            opayout.country_id = 0
            opayout.country_name = country
        session.add(opayout)
        session.flush()
It looks like my issue was touched on here, http://www.mail-archive.com/sqlalchemy@googlegroups.com/msg05983.html, but I don't know how to use "textual clauses" with session query objects and couldn't find much (though admittedly I haven't had as much time as I'd like to search).
I'm new to SQLAlchemy and I'd imagine there are some issues in the code besides the fact that it throws an exception on a duplicate key. For example, doing a flush after every iteration of clist (but I don't know how else to get the o.id value that is used in the subsequent OfferPayout inserts).
Guidance on any of these issues is very appreciated.

The way you should be doing these things is with session.merge().
You should also be using your objects' relation properties. So the o above should have an o.offerpayout property, which is a list (of objects), and your OfferPayout should have an offerpayout.country property, which is the related Country object.
So the above would look something like
for od in clist:
    o = Offer()
    o.network_id = od['network_id']
    o.external_id = od['external_id']
    o.title = od['title']
    o.updated = datetime.datetime.now()
    payout = od['payout']
    countrylist = od['countries']
    for country in countrylist:
        opayout = OfferPayout()
        opayout.payout = od['payout']
        country_obj = Country()
        country_obj.name = country
        opayout.country = country_obj
        o.offerpayout.append(opayout)
    session.merge(o)
    session.flush()
This should work as long as all the primary keys are correct (i.e. the country table has a primary key of name). merge() essentially checks the primary keys and, if they are there, merges your object with one in the database (it will also cascade down the joins).
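For reference, here is a minimal sketch of the mappings this approach assumes. The table names, the composite natural primary key on Offer, and the cascade setting are illustrative choices, not taken from the question; merge() matches on whatever primary key you actually have:

from sqlalchemy import (Column, DateTime, Float, ForeignKey,
                        ForeignKeyConstraint, Integer, String)
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relation

Base = declarative_base()

class Country(Base):
    __tablename__ = 'countries'
    name = Column(String(64), primary_key=True)

class Offer(Base):
    __tablename__ = 'offers'
    # merge() matches on the primary key, so the natural key is made the PK here (assumption)
    network_id = Column(Integer, primary_key=True)
    external_id = Column(Integer, primary_key=True)
    title = Column(String(255))
    updated = Column(DateTime)
    # o.offerpayout is the list the loop above appends to; merge() cascades into it
    offerpayout = relation('OfferPayout', cascade='all, delete-orphan', backref='offer')

class OfferPayout(Base):
    __tablename__ = 'offer_payouts'
    __table_args__ = (
        ForeignKeyConstraint(['network_id', 'external_id'],
                             ['offers.network_id', 'offers.external_id']),
    )
    network_id = Column(Integer, primary_key=True)
    external_id = Column(Integer, primary_key=True)
    country_name = Column(String(64), ForeignKey('countries.name'), primary_key=True)
    payout = Column(Float)
    # opayout.country is the related Country object
    country = relation(Country)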

Related

How to Optimize a loop "For" in Django

I want to optimize these functions because they take too long; each of them fetches specific attributes. I think there may be a way to access the attributes inside the function itself, if you can help me.
The functions are written with Python and Django.
This is what I've done so far.
Definition of the functions:
cand_seleccionados = ListaFinal.objects.filter(interesado__id_oferta=efectiva.oferta.id)
seleccionados_ids = cand_seleccionados.values_list("interesado_id", flat=True)

cand_postulados = Postulados.objects.filter(
    interesado__id_oferta=efectiva.oferta.id
).exclude(interesado_id__in=seleccionados_ids)
postulados_ids = cand_postulados.values_list("interesado_id", flat=True)

cand_entrevistados = Entrevistados.objects.filter(
    interesado__id_oferta=efectiva.oferta.id
).exclude(interesado_id__in=postulados_ids)
This is the loop for cand_postulados; the others are the same, so I thought it wouldn't be necessary to include more.
for p in cand_postulados:
    postulado = dict()
    telefono = Perfil.objects.values_list("telefono", flat=True).get(
        user_id=p.interesado.candidato.id
    )
    postulado["id"] = p.interesado.candidato.id
    postulado["nombre"] = p.interesado.candidato.first_name
    postulado["email"] = p.interesado.candidato.email
    postulado["teléfono"] = telefono
    if p.interesado.id_oferta.pais is None:
        postulado["pais"] = "Sin pais registrado"
    else:
        postulado["pais"] = p.interesado.id_oferta.pais.nombre
    postulado["nombre_reclutador"] = p.interesado.id_reclutador.first_name
    postulado["id_reclutador"] = p.interesado.id_reclutador.id
    postulados.append(postulado)
If I'm reading your loop correctly, this should do everything in a single query. (You may need to adjust some of the __ spanning lookups if I read things incorrectly. In particular, I don't necessarily know the reverse name of your Perfil to user relationship.)
from django.db.models import F

cand_postulados = (
    Postulados.objects
    .filter(interesado__id_oferta=efectiva.oferta.id)
    .exclude(interesado_id__in=seleccionados_ids)
)

postulados = list(cand_postulados.values(
    teléfono=F("interesado__candidato__telefono"),
    nombre=F("interesado__candidato__first_name"),
    email=F("interesado__candidato__email"),
    pais=F("interesado__id_oferta__pais__nombre"),
    nombre_reclutador=F("interesado__id_reclutador__first_name"),
    id_reclutador=F("interesado__id_reclutador__id"),
))

for datum in postulados:
    if not datum.get("pais"):
        datum["pais"] = "Sin pais registrado"

How to Handle When Request Returns None

I have a list of IDs which corresponds to a set of records (opportunities) in a database. I then pass this list as a parameter in a RESTful API request where I am filtering the results (tickets) by ID. For each match, the query returns JSON data pertaining to the individual record. However, I want to handle when the query does not find a match. I would like to assign some value for this case such as the string "None", because not every opportunity has a ticket. How can I make sure there exists some value in presales_tickets for every ID in opportunity_list? Could I provide a default value in the request for this case?
views.py
opportunities = cwObj.get_opportunities()
temp = []
opportunity_list = []
cw_presales_engineers = []
for opportunity in opportunities:
    temp.append(str(opportunity['id']))
opportunity_list = ','.join(temp)
presales_tickets = cwObj.get_tickets_by_opportunity(opportunity_list)

for opportunity in opportunities:
    try:
        if opportunity['id'] == presales_tickets[0]['opportunity']['id']:
            try:
                for presales_ticket in presales_tickets:
                    cw_engineer = presales_ticket['owner']['name']
                    cw_presales_engineers.append(cw_engineer)
            except:
                pass
        else:
            cw_engineer = 'None'
            cw_presales_engineers.append(cw_engineer)
    except AttributeError:
        cw_engineer = ''
        cw_presales_engineers.append(cw_engineer)
So, let's say you have a Ticket model and an Opportunity model, connected via a foreign key.
class Opportunity(models.Model):
    ... some fields here ...

class Ticket(models.Model):
    opportunity = models.ForeignKey(Opportunity)
and in your view, you get a list of opportunity ids
def some_view(request):
    ids = request.GET['ids']
It sounds like what you want is to fetch all the tickets for the supplied opportunities and add some default processing for the opportunities that do not have tickets. If that is the case, why not do something like:
def some_view(request):
    ids = request.GET['ids']
    tickets = Ticket.objects.filter(opportunity__id__in=ids)
    results = []
    for ticket in tickets:
        result = ... do your thing here ...
        results.append(result)

    # now handle missing opportunities
    good_ids = tickets.values_list('opportunity__id', flat=True).distinct()
    for id in ids:
        if id not in good_ids:
            result = ... do your default processing ...
            results.append(result)
Is that what you are trying to do?
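Since presales_tickets in the question actually comes back as JSON from the REST call rather than from the ORM, the same default-value idea can be applied in plain Python. A sketch, assuming the response is a list of dicts shaped like the ones the question's code indexes into:

# index the returned tickets by opportunity id (assumed response shape)
tickets_by_opportunity = {}
for ticket in presales_tickets:
    tickets_by_opportunity.setdefault(ticket['opportunity']['id'], []).append(ticket)

cw_presales_engineers = []
for opportunity in opportunities:
    matches = tickets_by_opportunity.get(opportunity['id'])
    if matches:
        for ticket in matches:
            cw_presales_engineers.append(ticket['owner']['name'])
    else:
        # default value when no ticket exists for this opportunity
        cw_presales_engineers.append('None')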

how to relate one ponyorm entity attribute with several other entities?

I'm starting to work with an existing database where attribute foo of table A is related to more than one other table, B.foo and C.foo. How do I form this relationship in PonyORM?
The database is organized like below.
from pony import orm

db = orm.Database()

class POI(db.Entity):
    '''Point of interest on a map'''
    name = orm.PrimaryKey(str)
    coordinateID = orm.Optional(('cartesian', 'polar'))  # doesn't work ofc

class cartesian(db.Entity):
    coordinateID = orm.Required(POI)
    x = orm.Required(float)
    y = orm.Required(float)

class polar(db.Entity):
    coordinateID = orm.Required(POI)
    r = orm.Required(float)
    phi = orm.Required(float)
Of course x,y from cartesian and r,phi from polar could be moved to POI, and in the database I work with, that's the same situation. But the tables are divided up between stakeholders (cartesian and polar in this example) and I don't get to change the schema anyway. I can't split coordinateID in the schema (but it would actually be nice to have different attributes of the python class).
It is not possible to relate one attribute to several entities in PonyORM, except for the case when these entities inherit from the same base entity; then you can specify the base entity as the attribute type and use any of the inherited entities as the real type.
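For completeness, a minimal sketch of that inheritance option (only workable if you could design the schema this way; entity and attribute names here are illustrative):

from pony import orm

db = orm.Database()

class POI(db.Entity):
    '''Point of interest on a map'''
    name = orm.PrimaryKey(str)
    # one relationship to the common base entity covers both subclasses
    coordinate = orm.Optional('Coordinate')

class Coordinate(db.Entity):
    '''Base entity; Cartesian and Polar share it via single-table inheritance'''
    poi = orm.Optional(POI)

class Cartesian(Coordinate):
    x = orm.Required(float)
    y = orm.Required(float)

class Polar(Coordinate):
    r = orm.Required(float)
    phi = orm.Required(float)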
If you use an existing schema that you can't change, you probably can't use inheritance and need to specify a raw id attribute instead of a relationship:
from pony import orm

db = orm.Database()

class POI(db.Entity):
    _table_ = "table_name"
    name = orm.PrimaryKey(str)
    coordinate_id = orm.Optional(int, column="coordinateID")

class Cartesian(db.Entity):
    _table_ = "cartesian"
    id = orm.PrimaryKey(int, column="coordinateID")
    x = orm.Required(float)
    y = orm.Required(float)

class Polar(db.Entity):
    _table_ = "polar"
    id = orm.PrimaryKey(int, column="coordinateID")
    r = orm.Required(float)
    phi = orm.Required(float)
And then you can perform queries like this:
left_join(poi for poi in POI
for c in Cartesian
for p in Polar
if poi.coordinate_id == c.id
and poi.coordinate_id = p.id
and <some additional conditions>)
Note that all entities used in the same query should be from the same database. If the entities belong to two different databases, you cannot use them in the same query and need to issue separate queries instead:
with db_session:
    poi = POI.get(id=some_id)
    coord = Cartesian.get(id=poi.coordinate_id)
    if coord is None:
        coord = Polar.get(id=poi.coordinate_id)
    <do something with poi and coord>
But in the case of SQLite, for example, you can attach one database to another to make them appear as a single database.

How to query multiple tables in SQLAlchemy ORM

I'm a newcomer to SQLAlchemy ORM and I'm struggling to accomplish complex-ish queries on multiple tables - queries which I find relatively straightforward to do in Doctrine DQL.
I have data objects of Cities, which belong to Countries. Some Cities also have a County ID set, but not all. As well as the necessary primary and foreign keys, each record also has a text_string_id, which links to a TextStrings table which stores the name of the City/County/Country in different languages. The TextStrings MySQL table looks like this:
CREATE TABLE IF NOT EXISTS `text_strings` (
  `id` INT UNSIGNED NOT NULL,
  `language` VARCHAR(2) NOT NULL,
  `text_string` varchar(255) NOT NULL,
  PRIMARY KEY (`id`, `language`)
)
I want to construct a breadcrumb for each city, of the form:
country_en_name > city_en_name OR
country_en_name > county_en_name > city_en_name,
depending on whether or not a County attribute is set for this city. In Doctrine this would be relatively straightforward:
$query = Doctrine_Query::create()
    ->select('ci.id, CONCAT(cyts.text_string, \'> \', IF(cots.text_string is not null, CONCAT(cots.text_string, \'> \', \'\'), cits.text_string) as city_breadcrumb')
    ->from('City ci')
    ->leftJoin('ci.TextString cits')
    ->leftJoin('ci.Country cy')
    ->leftJoin('cy.TextString cyts')
    ->leftJoin('ci.County co')
    ->leftJoin('co.TextString cots')
    ->where('cits.language = ?', 'en')
    ->andWhere('cyts.language = ?', 'en')
    ->andWhere('(cots.language = ? OR cots.language is null)', 'en');
With SQLAlchemy ORM, I'm struggling to achieve the same thing. I believe I've setup the objects correctly - in the form eg:
class City(Base):
    __tablename__ = "cities"
    id = Column(Integer, primary_key=True)
    country_id = Column(Integer, ForeignKey('countries.id'))
    text_string_id = Column(Integer, ForeignKey('text_strings.id'))
    county_id = Column(Integer, ForeignKey('counties.id'))
    text_strings = relation(TextString, backref=backref('cards', order_by=id))
    country = relation(Country, backref=backref('countries', order_by=id))
    county = relation(County, backref=backref('counties', order_by=id))
My problem is in the querying - I've tried various approaches to generating the breadcrumb but nothing seems to work. Some observations:
Perhaps using things like CONCAT and IF inline in the query is not very pythonic (is it even possible with the ORM?) - so I've tried performing these operations outside SQLAlchemy, in a Python loop of the records. However here I've struggled to access the individual fields - for example the model accessors don't seem to go n-levels deep, e.g. City.counties.text_strings.language doesn't exist.
I've also experimented with using tuples - the closest I've got to it working was by splitting it out into two queries:
# For cities without a county
for city, country in session.query(City, Country).\
        filter(Country.id == City.country_id).\
        filter(City.county_id == None).all():
    if city.text_strings.language == 'en':
        # etc

# For cities with a county
for city, county, country in session.query(City, County, Country).\
        filter(and_(City.county_id == County.id, City.country_id == Country.id)).all():
    if city.text_strings.language == 'en':
        # etc
I split it out into two queries because I couldn't figure out how to make the County join optional in just the one query. But this approach is of course terrible, and worse, the second query didn't work 100% - it wasn't joining all of the different city.text_strings for subsequent filtering.
So I'm stumped! Any help you can give me setting me on the right path for performing these sorts of complex-ish queries in SQLAlchemy ORM would be much appreciated.
The mapping for County is not present, but based on the Doctrine query I would assume it has a text_strings attribute.
The relevant portion of SQLAlchemy documentation describing aliases with joins is at:
http://www.sqlalchemy.org/docs/orm/tutorial.html#using-aliases
generation of functions is at:
http://www.sqlalchemy.org/docs/core/tutorial.html#functions
cyts = aliased(TextString)
cits = aliased(TextString)
cots = aliased(TextString)
cy = aliased(Country)
co = aliased(County)

session.query(
    City.id,
    (
        cyts.text_string + \
        '> ' + \
        func.if_(cots.text_string!=None, cots.text_string + '> ', cits.text_string)
    ).label('city_breadcrumb')
).\
    outerjoin((cits, City.text_strings)).\
    outerjoin((cy, City.country)).\
    outerjoin((cyts, cy.text_strings)).\
    outerjoin((co, City.county)).\
    outerjoin((cots, co.text_strings)).\
    filter(cits.language=='en').\
    filter(cyts.language=='en').\
    filter(or_(cots.language=='en', cots.language==None))
though I would think it's a heck of a lot simpler to just say:
city.text_strings.text_string + " > " + city.country.text_strings.text_string + " > " + city.county.text_strings.text_string
If you put a descriptor on City, Country, and County:
class City(object):
    # ...
    @property
    def text_string(self):
        return self.text_strings.text_string
then you could say city.text_string.
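If you go that plain-Python route, eager loading keeps it from issuing one query per city. A rough sketch (the relationship names come from the question's City mapping; the nested TextString relations on Country and County can be eager loaded the same way, and the exact loader-option call forms vary a little between SQLAlchemy versions):

from sqlalchemy.orm import joinedload

cities = (
    session.query(City)
    .options(
        joinedload(City.text_strings),
        joinedload(City.country),
        joinedload(City.county),
    )
    .all()
)

for city in cities:
    # build the breadcrumb in Python; county may be None
    parts = [city.country.text_strings.text_string]
    if city.county is not None:
        parts.append(city.county.text_strings.text_string)
    parts.append(city.text_strings.text_string)
    breadcrumb = ' > '.join(parts)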
Just for the record, here is the code I ended up using. Mike (zzzeek)'s answer stays as the correct and definitive answer because this is just an adaptation of his, which was the breakthrough for me.
cits = aliased(TextString)
cyts = aliased(TextString)
cots = aliased(TextString)

for (city_id, country_text, county_text, city_text) in \
        session.query(City.id, cyts.text_string, cots.text_string, cits.text_string).\
        outerjoin((cits, and_(cits.id==City.text_string_id, cits.language=='en'))).\
        outerjoin((County, City.county)).\
        outerjoin((cots, and_(cots.id==County.text_string_id, cots.language=='en'))).\
        outerjoin((Country, City.country)).\
        outerjoin((cyts, and_(cyts.id==Country.text_string_id, cyts.language=='en'))):
    # Python to construct the breadcrumb, checking county_text for None-ness
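For completeness, the Python that comment alludes to might look like this inside the loop (a sketch):

    if county_text is not None:
        breadcrumb = '%s > %s > %s' % (country_text, county_text, city_text)
    else:
        breadcrumb = '%s > %s' % (country_text, city_text)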

What is an efficient way of inserting thousands of records into an SQLite table using Django?

I have to insert 8000+ records into a SQLite database using Django's ORM. This operation needs to be run as a cronjob about once per minute.
At the moment I'm using a for loop to iterate through all the items and then insert them one by one.
Example:
for item in items:
    entry = Entry(a1=item.a1, a2=item.a2)
    entry.save()
What is an efficient way of doing this?
Edit: A little comparison between the two insertion methods.
Without commit_manually decorator (11245 records):
[nox@noxdevel marinetraffic]$ time python manage.py insrec
real 1m50.288s
user 0m6.710s
sys 0m23.445s
Using commit_manually decorator (11245 records):
[nox@noxdevel marinetraffic]$ time python manage.py insrec
real 0m18.464s
user 0m5.433s
sys 0m10.163s
Note: The test script also does some other operations besides inserting into the database (downloads a ZIP file, extracts an XML file from the ZIP archive, parses the XML file) so the time needed for execution does not necessarily represent the time needed to insert the records.
You want to check out django.db.transaction.commit_manually.
http://docs.djangoproject.com/en/dev/topics/db/transactions/#django-db-transaction-commit-manually
So it would be something like:
from django.db import transaction

@transaction.commit_manually
def viewfunc(request):
    ...
    for item in items:
        entry = Entry(a1=item.a1, a2=item.a2)
        entry.save()
    transaction.commit()
This will commit only once, instead of at each save().
In Django 1.3, context managers were introduced.
So now you can use transaction.commit_on_success() in a similar way:
from django.db import transaction

def viewfunc(request):
    ...
    with transaction.commit_on_success():
        for item in items:
            entry = Entry(a1=item.a1, a2=item.a2)
            entry.save()
In Django 1.4, bulk_create was added, allowing you to create lists of your model objects and then commit them all at once.
Note that the model's save() method will not be called when using bulk_create.
>>> Entry.objects.bulk_create([
... Entry(headline="Django 1.0 Released"),
... Entry(headline="Django 1.1 Announced"),
... Entry(headline="Breaking: Django is awesome")
... ])
In Django 1.6, transaction.atomic was introduced, intended to replace the now-legacy functions commit_on_success and commit_manually.
From the Django documentation on atomic:
atomic is usable both as a decorator:
from django.db import transaction

@transaction.atomic
def viewfunc(request):
    # This code executes inside a transaction.
    do_stuff()
and as a context manager:
from django.db import transaction

def viewfunc(request):
    # This code executes in autocommit mode (Django's default).
    do_stuff()

    with transaction.atomic():
        # This code executes inside a transaction.
        do_more_stuff()
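Applied to the insert loop from the question, the context-manager form would look something like this sketch:

from django.db import transaction

with transaction.atomic():
    for item in items:
        Entry(a1=item.a1, a2=item.a2).save()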
Bulk creation is available in Django 1.4:
https://django.readthedocs.io/en/1.4/ref/models/querysets.html#bulk-create
Have a look at this. It's meant for use out-of-the-box with MySQL only, but there are pointers on what to do for other databases.
You might be better off bulk-loading the items - prepare a file and use a bulk load tool. This will be vastly more efficient than 8000 individual inserts.
To answer the question particularly with regard to SQLite, as asked: while I have just now confirmed that bulk_create does provide a tremendous speedup, there is a limitation with SQLite: "The default is to create all objects in one batch, except for SQLite where the default is such that at maximum 999 variables per query is used."
The quoted text is from the docs; A-IV provided a link.
What I have to add is that this djangosnippets entry by alpar also seems to be working for me. It's a little wrapper that breaks the big batch you want to process into smaller batches, managing the 999-variable limit.
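A small sketch of the same chunking idea using bulk_create's own batch_size argument (assuming a Django version where bulk_create accepts it), so no single query exceeds SQLite's variable limit:

entries = [Entry(a1=item.a1, a2=item.a2) for item in items]
# each underlying INSERT touches at most batch_size rows
Entry.objects.bulk_create(entries, batch_size=500)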
You should check out DSE. I wrote DSE to solve these kinds of problems (massive inserts or updates). Using the Django ORM alone is a dead end; you've got to do it in plain SQL, and DSE takes care of much of that for you.
Thomas
I recommend using plain SQL (not the ORM); you can insert multiple rows with a single insert:
insert into A select * from B;
The select * from B portion of your SQL can be as complicated as you want, as long as the results match the columns in table A and there are no constraint conflicts.
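A sketch of how that could be issued from Django with a raw cursor; the table and column names here are placeholders, not ones from the question:

from django.db import connection

with connection.cursor() as cursor:
    # copy rows from a staging table into the target table in one statement
    cursor.execute(
        "INSERT INTO myapp_entry (a1, a2) "
        "SELECT a1, a2 FROM myapp_staging"
    )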
def order(request):
    if request.method == "GET":
        # get the values from the html page
        cust_name = request.GET.get('cust_name', '')
        cust_cont = request.GET.get('cust_cont', '')
        pincode = request.GET.get('pincode', '')
        city_name = request.GET.get('city_name', '')
        state = request.GET.get('state', '')
        contry = request.GET.get('contry', '')
        gender = request.GET.get('gender', '')
        paid_amt = request.GET.get('paid_amt', '')
        due_amt = request.GET.get('due_amt', '')
        order_date = request.GET.get('order_date', '')
        prod_name = request.GET.getlist('prod_name[]', '')
        prod_qty = request.GET.getlist('prod_qty[]', '')
        prod_price = request.GET.getlist('prod_price[]', '')

        try:
            # insert customer information into the customer table
            cust_tab = Customer(customer_name=cust_name, customer_contact=cust_cont, gender=gender,
                                city_name=city_name, pincode=pincode, state_name=state, contry_name=contry)
            cust_tab.save()

            # retrieve the id of the customer just created
            custo_id = Customer.objects.values_list('customer_id').last()  # returns a tuple from the queryset
            custo_id = int(custo_id[0])  # convert the tuple to an int

            # insert data into the Orders table
            order_tab = Orders(order_date=order_date, paid_amt=paid_amt, due_amt=due_amt, customer_id=custo_id)
            order_tab.save()

            # insert multiple product rows, one per loop iteration
            i = 0
            while i < len(prod_name):
                p_n = prod_name[i]
                p_q = prod_qty[i]
                p_p = prod_price[i]
                # only save the row if none of the values are empty
                if p_n != "" and p_q != "" and p_p != "":
                    prod_tab = Products(product_name=p_n, product_qty=p_q, product_price=p_p, customer_id=custo_id)
                    prod_tab.save()
                i = i + 1
            return HttpResponse('Your Record Has been Saved')
        except Exception as e:
            return HttpResponse(e)
    return render(request, 'invoice_system/order.html')
