I'm working on parsing a file and inserting it into a database, using sqlalchemy core. I had it set up with the orm originally but that doesn't meet the speed requirements for the project.
My database has 2 tables: Objects and Attributes. The Objects table has a primary key of obj_id. The primary key for Attributes is composite: attr_name, attr_class, and obj_id, which is also a foreign key from Objects.
The attributes are stored after parsing the file in a list of dictionaries, like so:
[
    {'obj_id': obj_id, 'attr_name': name, 'attr_class': class, etc...},
    {ETC ETC ETC}]
The data is being inserted by first bulk inserting the objects, then the attributes. The object insert works perfectly. When inserting the attributes however, I get an integrity error, saying I tried to insert a duplicate primary key.
Here is my insert code for attributes:
self.engine.execute(
    Attributes.__table__.insert(),
    [{'obj_id': attr['obj_id'],
      'attr_name': attr['attr_name'],
      'attr_class': attr['attr_class'],
      'attr_type': attr['attr_type'],
      'attr_size': attr['attr_size']} for attr in attrList])
While trying to work this error out, I printed the id, name, and class of each attribute in the list to a file to find the duplicate key. Nowhere in the list is there actually an identical primary key, so this leads me to believe it is a problem with the structure of my query.
Can anyone figure this out with the info I've given, or give me somewhere to look for more information? I've already checked the documentation pretty thoroughly and couldn't find anything helpful.
Edit:
I also tried executing each insert statement separately, as suggested by someone on sqlalchemy's google group. The results were the same. The code I used:
insert = Attributes.__table__.insert()
for attr in attrList:
    stmt = insert.values({'obj_id': attr['obj_id'], ...})
    self.engine.execute(stmt)
where ... was the rest of the values.
Edit 2:
The Integrity error is thrown as soon as I try to insert an attribute with the same name/class but a different object id. So for example:
In the format name-class-id:
By iteration 4, I've got:
Attr1-Class1-0
Attr2-Class2-0
Attr3-Class3-0
Attr4-Class4-0
On the next iteration, I try to insert Attr1-Class1-1, which fails.
I found the problem, completely unrelated to the insert code. When storing the data in the list, I was storing an Object as obj_id, which sqlalchemy didn't like. By fixing that I fixed the insertions.
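For anyone hitting the same thing, here is a minimal sketch of the fix described above, under stated assumptions: ParsedObject and build_attr_rows are hypothetical names made up for illustration, not the classes from the question. The point is only that each parameter dict should carry the scalar key, not the object that owns it.

```python
from dataclasses import dataclass


@dataclass
class ParsedObject:
    """Hypothetical stand-in for a parsed object (not the asker's real class)."""
    obj_id: int


def build_attr_rows(parsed):
    """Build insert parameter dicts, keeping obj_id a plain scalar."""
    rows = []
    for obj, name, attr_class in parsed:
        rows.append({
            'obj_id': obj.obj_id,  # the scalar key, not the object itself
            'attr_name': name,
            'attr_class': attr_class,
        })
    return rows


parsed = [
    (ParsedObject(0), 'Attr1', 'Class1'),
    (ParsedObject(1), 'Attr1', 'Class1'),  # same name/class, different object
]
rows = build_attr_rows(parsed)

# Cheap sanity checks before the rows ever reach execute():
# every bound value is a scalar, and the composite keys are distinct.
assert all(isinstance(row['obj_id'], int) for row in rows)
keys = {(r['obj_id'], r['attr_name'], r['attr_class']) for r in rows}
```

Checking the parameter types this way, before the bulk insert, would probably have surfaced the problem sooner than comparing printed keys.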
I have code in Perl where I am able to get the newly created data object's ID by assigning the result to a variable. For example:
my $data_obj = $schema->resultset('PersonTable')->create(\%psw_rec_hash);
Where the $data_obj contains the primary key's column value.
I want to be able to do the same thing using Python 3.7, Flask and flask-mysqldb,
but without having to do another query. I want to be able to use the specific
record's primary key column value for another method.
Python and flask-mysqldb inserts data like so:
query = "INSERT INTO PersonTable (fname, mname, lname) VALUES('Phil','','Vil')"
cursor = db.connection.cursor()
cursor.execute(query)
db.connection.commit()
cursor.close()
The PersonTable has a primary key column called id. So, the newly inserted data row would look
like:
23, 'Phil', 'Vil'
Because there are 22 rows of data before the last inserted row, I don't want to search
for the data, since there could be more than one entry with the same values. All I want
is the most recent data row.
Can I do something similar to Perl with python 3.7 and flask-mysqldb?
You may want to consider the Flask-SQLAlchemy package to help you with this.
Although the syntax is going to be slightly different from Perl, what you can do is, when you create the model object, you can set it to a variable. Then, when you either flush or commit on the Database session, you can pull up your primary key attribute on that model object you had created (whether it's "id" or something else), and use it as needed.
SQLAlchemy supports MySQL, as well as several other relational databases. In addition, it is able to help prevent SQL injection attacks so long as you use model objects and add/delete them to your database session, as opposed to straight SQL commands.
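A different route from the Flask-SQLAlchemy suggestion, if you would rather keep the raw cursor: DB-API cursors commonly expose a lastrowid attribute after an INSERT, and MySQLdb cursors (which flask-mysqldb hands you) are among them. The sketch below uses the standard-library sqlite3 module purely as a stand-in so it runs without a MySQL server; the table definition is an assumption based on the question. With MySQLdb the placeholders would be %s rather than ?.

```python
import sqlite3

# SQLite in-memory database used only as a runnable stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PersonTable ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, fname TEXT, mname TEXT, lname TEXT)"
)

cursor = conn.cursor()
# Parameterized insert: values are passed separately from the SQL text.
cursor.execute(
    "INSERT INTO PersonTable (fname, mname, lname) VALUES (?, ?, ?)",
    ("Phil", "", "Vil"),
)
new_id = cursor.lastrowid  # primary key of the row this cursor just inserted
conn.commit()
cursor.close()
```

new_id can then be passed to whatever method needs the primary key, with no second query.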
Okay, so currently I'm trying to upsert something in a local mongodb using pymongo. (I check whether the document is in the db and, if it is, update it; otherwise I just insert it.)
I'm using bulk_write to do that, and everything is working ok. The data is inserted/updated.
However, I need the ids of the newly inserted/updated documents, but "upserted_ids" in the bulkWriteResult object is empty, even though it states that it inserted 14 documents.
I've added this screenshot with the variable. Is it a bug, or is there something I'm not aware of?
Finally, is there a way of getting the ids of the documents without actually searching for them in the db? (If possible, I would prefer to use bulk_write)
Thank you for your time.
EDIT:
As suggested, I added part of the code so it's easier to get the general idea:
for name in input_list:
    if name not in stored_names:  # completely new entry (both name and package)
        operations.append(InsertOne({"name": name, "package": [package_name]}))
if len(operations) == 0:
    print("## No new permissions to insert")
    return
bulkWriteResult = _db_insert_bulk(collection_name, operations)
and the insert function:
def _db_insert_bulk(collection_name, operations_list):
    return db[collection_name].bulk_write(operations_list)
The upserted_ids field in the pymongo BulkWriteResult only contains the ids of the records that have been inserted as part of an upsert operation, e.g. an UpdateOne or ReplaceOne with the upsert=True parameter set.
As you are performing InsertOne which doesn't have an upsert option, the upserted_ids list will be empty.
The lack of an inserted_ids field in pymongo's BulkWriteResult is an omission in the driver; technically it conforms to the crud specification mentioned in D. SM's answer, as the field is annotated as "Drivers may choose to not provide this property.".
But ... there is an answer. If you are only doing inserts as part of your bulk update (and not mixed bulk operations), just use insert_many(). It is just as efficient as a bulk write and, crucially, does provide the inserted_ids value in the InsertManyResult object.
from pymongo import MongoClient
db = MongoClient()['mydatabase']
inserts = [{'foo': 'bar'}]
result = db.test.insert_many(inserts, ordered=False)
print(result.inserted_ids)
Prints:
[ObjectId('5fb92cafbe8be8a43bd1bde0')]
This functionality is part of the crud specification and should be implemented by compliant drivers, including pymongo. Refer to the pymongo documentation for correct usage.
Example in Ruby:
irb(main):003:0> c.bulk_write([insert_one:{a:1}])
=> #<Mongo::BulkWrite::Result:0x00005579c42d7dd0 #results={"n_inserted"=>1, "n"=>1, "inserted_ids"=>[BSON::ObjectId('5fb7e4b12c97a60f255eb590')]}>
Your output shows that zero documents were upserted, therefore there wouldn't be any ids associated with the upserted documents.
Your code doesn't appear to show any upserts at all, which again means you won't see any upserted ids.
I am trying to select a subset of columns from a table with sqlalchemy's load_only function. Unfortunately it doesn't seem to return only the columns specified in the function call. Specifically, it also seems to fetch the primary key (in my case, an auto_increment id field).
A simple example: if I use this statement to build a query,
query = session.query(table).options(load_only('col_1', 'col_2'))
Then the query.statement looks like this:
SELECT "table".id, "table"."col_1", "table"."col_2"
FROM "table"
Which is not what I would have expected, given I've specified the "only" columns to use. Where did the id come from, and is there a way to remove it?
Deferring the primary key would not make sense when querying complete ORM entities, because an entity must have an identity so that its row can be uniquely identified in the database table. That is why the query includes the primary key even though you used load_only(). If you want only the data, you should query for those columns specifically:
session.query(table.col_1, table.col_2).all()
The results are keyed tuples that you can treat like you would the entities in many cases.
There actually was an issue where having load_only() did remove the primary key from the select list, and it was fixed in 0.9.5:
[orm] [bug] Modified the behavior of orm.load_only() such that primary key columns are always added to the list of columns to be “undeferred”; otherwise, the ORM can’t load the row’s identity. Apparently, one can defer the mapped primary keys and the ORM will fail, that hasn’t been changed. But as load_only is essentially saying “defer all but X”, it’s more critical that PK cols not be part of this deferral.
I have a program inserting a bunch of data into an SQL database. The data consists of Reports, each having a number of Tags.
A Tag has a field report_id, which is a reference to the primary key of the relevant Report.
Now, each time I insert the data, there can be 200 Reports or even more, each maybe having 400 Tags. So in pseudo-code I'm now doing this:
for report in reports:
    cursor_report = sql('INSERT report...')
    cursor_report.commit()
    report_id = sql('SELECT @@IDENTITY')
    for tag in report:
        cursor_tag += sql('INSERT tag, report_id=report_id')
    cursor_tag.commit()
I don't like this for a couple of reasons. Mostly I don't like the SELECT @@IDENTITY statement.
Wouldn't this mean that if another process were inserting data at the right moment then the statement would return the wrong primary key?
I would rather like the INSERT report... to return the inserted primary key, is that possible?
Since I currently have to commit between reports, the program "pauses" during these moments. If I could commit everything at the end, it would greatly reduce the time spent. I have been considering creating a separate field in Report used for identification so I could use report_id = (SELECT id FROM reports WHERE seperate_field=?) or something similar in the Tags, but that doesn't seem very elegant.
Wouldn't this mean that if another process were inserting data at the right moment then the ["SELECT @@IDENTITY"] statement would return the wrong primary key?
No. The database engine keeps track of the last identity value inserted for each connection and returns the appropriate value for the connection on which the SELECT @@IDENTITY statement is executed.
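To make the per-connection behaviour concrete, here is a small sketch using the standard-library sqlite3 module as a stand-in (an assumption made for runnability; the engine in the question is a different one). SQLite's last_insert_rowid() is likewise tracked per connection, so two connections inserting into the same table each read back their own id. The file path and table layout are made up for illustration.

```python
import os
import sqlite3
import tempfile

# A file-backed database so that two separate connections share the same table.
db_path = os.path.join(tempfile.mkdtemp(), "reports.db")
conn_a = sqlite3.connect(db_path)
conn_b = sqlite3.connect(db_path)

conn_a.execute("CREATE TABLE reports (id INTEGER PRIMARY KEY, title TEXT)")
conn_a.commit()

# Connection A inserts first, then connection B inserts "in between".
conn_a.execute("INSERT INTO reports (title) VALUES ('from A')")
conn_a.commit()
conn_b.execute("INSERT INTO reports (title) VALUES ('from B')")
conn_b.commit()

# Each connection still reports the id of its own most recent insert.
id_seen_by_a = conn_a.execute("SELECT last_insert_rowid()").fetchone()[0]
id_seen_by_b = conn_b.execute("SELECT last_insert_rowid()").fetchone()[0]

conn_a.close()
conn_b.close()
```

Connection A reads back its own id even though B inserted afterwards, which is the same guarantee the answer describes.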
I have this model
class Type(models.Model):
    type = models.CharField(max_length=50)
    value = models.CharField(max_length=1)
And into it, I have some data from an sql file:
INSERT INTO quest_type (type, value) VALUES ('Noun', '1');
INSERT INTO quest_type (type, value) VALUES ('Adjective', '2');
INSERT INTO quest_type (type, value) VALUES ('Duration', '3');
How do I access these values in the python shell? For example, if I know the type, how do I get the value (and vice versa)? I'm not sure how the syntax works.
You should be able to get that with
Type.objects.filter(type=typeImInterestedIn)
A couple of things to be leery of:
- You probably want to avoid manually writing to a DB that you're using an ORM with. It just creates potential for mismatches.
- Naming an object Type is a little problematic since it's so close to the Python built-in function type.
It's unclear from your question how much about databases you understand, so I apologize if this answer is too basic for you (if so, please edit your question to include information about what actual database engine you're using and show some sample code trying to read from the database).
The SQL file you have is not the same as an SQL database. It is a series of commands that will create records in an SQL database. First you must install and configure a database engine on your machine then "run" that .sql file so that the records are created in the database.
After you have an actual database, you will have to configure Django so that it knows what kind of SQL engine you're using and the name and location of the database.
Finally, once the database is created and Django configured to talk to the engine, you will write python code to instantiate an instance of the Type class, read a record from the database, and inspect the values.
Also, let me point out that Type is a really, really bad name for a class in any programming language, and type and value are both bad names for columns in SQL databases.
If you are using the Python shell from Django (python manage.py shell), you first have to import your model into the namespace, so type from my_app.models import Type.
Now, if you want to get only one object from the database, the syntax is:
result = Type.objects.get(type='your_query')
If you want to fetch more than one object, the syntax goes like this:
result = Type.objects.filter(type='your_query')
The second method returns a QuerySet (possibly containing several objects) instead of a single object.
To loop through the results after using filter, write:
for item in result:
    print(item.value)  # prints the value from each matched row