I'm doing a bulk_create in Django from an uploaded CSV file. I'm using ignore_conflicts=True to ignore duplicates, but it doesn't seem to work. In any case, I would like to check for duplicates based on just one column of the CSV data. Let me explain:
My data looks like this:

name    address    number
aaa     bbbb       090
I would like to check whether the same number already exists in the database and, if it does, skip inserting that row during bulk_create.
I'm using the code below, but it throws a syntax error:
....some code...
all_db_data = OnlineDB.objects.all()
OnlineDB.objects.bulk_create([OnlineDB(
    number=row[2],
    name=row[0],
    address=row[1]
) ignore_dups(row, all_db_data) for row in reader], ignore_conflicts=True)  # reader is a csv.reader() over the uploaded file

def ignore_dups(row, all_db_data):
    match = all_db_data.filter(number=row[2])
    if match.count() > 0:
        return
    return row
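For reference, a minimal working sketch of what ignore_dups is trying to do, assuming reader is a csv.reader() over the uploaded file and the OnlineDB model is as above (the bare ignore_dups(row, all_db_data) inside the list comprehension is what raises the syntax error): collect the existing numbers once, then filter the rows before handing them to bulk_create.

existing_numbers = set(OnlineDB.objects.values_list('number', flat=True))

OnlineDB.objects.bulk_create(
    [
        # build objects only for rows whose number is not yet stored
        OnlineDB(name=row[0], address=row[1], number=row[2])
        for row in reader
        if row[2] not in existing_numbers
    ],
    ignore_conflicts=True,
)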
I'm trying to create an SQL query that takes records from a File table and a Customer table. A file can have multiple customers. I want to show only one record per File.id, concatenating the last names in alphabetical order when they differ, or showing just one name when they are the same.
Below is a picture of the relationship:
[image: File/Customer table relationship]
The results from my query currently look like this:
[image of the current query results]
I would like the query to look like this.
File ID  Name
1        Dick Dipe
2        Bill
3        Lola
Originally I tried a subquery, but I had issues because it returned multiple results and couldn't list more than one. If I could loop and append to an array, I feel like that would work.
If I were to do it in Python, I would write the following, but when I try to translate that into SQL, I get errors: either the subquery can only return one result, or the second name under file 2 gets cut off.
clients = ['Dick', 'Dipe', 'Bill', 'Lola', 'Lola']
files = [1, 2, 3]
fileDetails = [[1, 0], [1, 1], [2, 2], [3, 3], [3, 4]]

file_clients = {}
for file_id, client_index in fileDetails:
    if file_id not in file_clients:
        file_clients[file_id] = []
    client_name = clients[client_index]
    file_clients[file_id].append(client_name)

for file_id, client_names in file_clients.items():
    client_names = list(dict.fromkeys(client_names))  # drop duplicates, keep order
    client_names_string = " ".join(client_names)
    print(f"File {file_id}: {client_names_string}")
I'm using Python unittest and SQLAlchemy to test the data models that store WTForms data in MariaDB.
The test should create a dataset, write it to the DB, read it back, and check that the original dataset is the same as the stored data.
So the partial test looks like this:
#set data
myForm = NiceForm()
myForm.name = "Ben"
#write data
db.session.add(myForm)
db.session.commit()
#read data
loadedForms = NiceForm.query.all()
#check that only one entry is in db
self.assertEqual(len(loadedForms), 1)
#compare stored data with dataset
self.assertIn(myForm, loadedForms)
The test seems to work fine. Next I tried to find out whether the test fails when dataset != stored data. So I changed the dataset before comparing it, like this:
#set data
myForm = NiceForm()
myForm.name = "Ben"
#write data
db.session.add(myForm)
db.session.commit()
#read data
loadedForms = NiceForm.query.all()
#modify dataset
myForm.name = "Foo"
#show content of both
print(myForm.name)
print(loadedForms[0].name)
#check that only one entry is in db
self.assertEqual(len(loadedForms), 1)
#compare stored data with dataset
self.assertIn(myForm, loadedForms)
This test still passed. Why? I printed the contents of myForm.name and loadedForms[0].name; both were set to Foo. That is why the self.assertIn(myForm, loadedForms) passed the test, but I don't understand:
Why is the content of loadedForms changed, when Foo was only applied to myForm?
The row identity for myForm does not change when you change one of its values.
Row numbers have no meaning in a table, but to make the issue clear I will still use them.
Row 153 has 2 fields. Field name = "Ben" and field homeruns = 3.
Now we change the home runs (Ben has hit a home run);
Row 153 has 2 fields. Field name = "Ben" and field homeruns = 4.
It is still row 153, so your assertIn will still return True, even though one of the values in the row has changed. You are only testing identity.
If it didn't work that way, changing a field in a table row would have to be saved as an insert into the table rather than an update to the row. That would not be correct, of course: how many Bens do we have? One. And he has 4 home runs, not 3 or 4 depending on which record you look at.
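In SQLAlchemy terms this is the session's identity map: the query hands back the very same Python object you added, so mutating one "copy" mutates the other. A minimal sketch, assuming the NiceForm setup from the question:

# The identity map returns the object the session already knows,
# so myForm and the queried row are one and the same object.
loadedForms = NiceForm.query.all()
assert loadedForms[0] is myForm   # same identity, not merely equal

myForm.name = "Foo"
print(loadedForms[0].name)        # prints "Foo": there is only one object

# To make the test fail on a mismatch, compare values instead of identity:
# self.assertEqual(loadedForms[0].name, "Ben")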
I am developing a web application with Flask, Python, SQLAlchemy, and MySQL.
I have 2 tables:
TaskUser:
- id
- id_task (foreign key of id column of table Task)
- message
Task
- id
- id_type_task
I need to extract all the taskusers (from TaskUser) whose id_task is in a specific list of Task ids.
For example, all the taskusers where id_task is in (1,2,3,4,5)
Once I get the result, I do some stuff and use some conditions.
When I make this request:
all_tasksuser = TaskUser.query.filter(TaskUser.id_task == Task.id) \
    .filter(TaskUser.id_task.in_(list_all_tasks), Task.id_type_task).all()

for item in all_tasksuser:
    item.message = "something"
    if item.id_type_task == 2:
        # do some stuff
    if item.id_task == 7 or item.id_task == 7:
        # do some stuff
I get this output error:
if item.id_type_task == 2:
AttributeError: 'TaskUser' object has no attribute 'id_type_task'
That is normal, as my SQLAlchemy query selects from only one table; I can't access the columns of table Task.
BUT I CAN call the columns of TaskUser by their names (see item.id_task).
So I changed my SQLAlchemy query to this:
all_tasksuser=db_mysql.session.query(TaskUser,Task.id,Task.id_type_task).filter(TaskUser.id_task==Task.id) \
.filter(TaskUser.id_task.in_(list_all_tasks),Task.id_type_task).all()
This time I include the table Task in my query, BUT I CAN'T call the columns by their names; I would have to use the [index] of the columns.
I get this kind of error message:
AttributeError: 'result' object has no attribute 'message'
The problem is that I have many more columns (around 40) in both tables. It is too complicated to handle the data by column index.
I need to have a list of rows with data from 2 different tables and I need to be able to call the data by column name in a loop.
Do you think it is possible?
The key point leading to the confusion is that when you perform a query for a mapped class like TaskUser, SQLAlchemy will return instances of that class. For example:
q1 = TaskUser.query.filter(...).all() # returns a list of [TaskUser]
q2 = db_mysql.session.query(TaskUser).filter(...).all() # ditto
However, if you specify only specific columns, you will receive just a (special) list of tuples:
q3 = db_mysql.session.query(TaskUser.col1, TaskUser.col2, ...)...
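As a side note (hedged; this relies on the keyed-tuple behaviour of SQLAlchemy results rather than anything shown above), those tuples do support access by name: the entity sits under its class name and extra columns under their own keys.

# Sketch: mixed entity/column queries return keyed tuples.
results = db_mysql.session.query(TaskUser, Task.id_type_task) \
    .filter(TaskUser.id_task == Task.id).all()

for row in results:
    row.TaskUser.message = "something"   # the TaskUser entity keeps its name
    if row.id_type_task == 2:            # plain columns are addressable too
        pass                             # do some stuff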
If you switch your mindset to completely use the ORM paradigm, you will work mostly with objects. In your specific example, the workflow could be similar to below, assuming you have relationships defined on your models:
# model
class Task(...):
    id = ...
    id_type_task = ...

class TaskUser(...):
    id = ...
    id_task = Column(ForeignKey(Task.id))
    message = ...
    task = relationship(Task, backref="task_users")

# query
all_tasksuser = TaskUser.query ...

# work
for item in all_tasksuser:
    item.message = "something"
    if item.task.id_type_task == 2:  # <- CHANGED to navigate the relationship
        # do some stuff
    if item.id_task == 7:  # <- CHANGED: id_task lives on TaskUser itself
        # do some stuff
The first error message comes from the fact that query and filter without a join cannot give you columns from both tables. You need to either put both tables into the session query or join the two tables in order to gather column values from both, so this code:
all_tasksuser=TaskUser.query.filter(TaskUser.id_task==Task.id) \
.filter(TaskUser.id_task.in_(list_all_tasks),Task.id_type_task).all()
needs to look more like this:
all_tasksuser=TaskUser.query.join(Task) \
.filter(TaskUser.id_task.in_(list_all_tasks),Task.id_type_task).all()
or like this:
all_tasksuser=session.query(TaskUser, Task).filter(TaskUser.id_task==Task.id) \
.filter(TaskUser.id_task.in_(list_all_tasks),Task.id_type_task).all()
Another thing is that the data will be structured differently: in the first example you get TaskUser objects, so you reach the Task columns through the relationship attribute (assuming a relationship like the one shown above is defined):

for taskuser in all_tasksuser:
    # to reference id_type_task: taskuser.task.id_type_task
and in the second example the result is a list of (TaskUser, Task) tuples, so the for loop should look like this:
for taskuser, task in all_taskuser:
# to reference to id_type_task : task.id_type_task
NOTE: I haven't checked all these examples, so there may be errors, but the concepts are there. For more info, please refer to this page:
https://www.tutorialspoint.com/sqlalchemy/sqlalchemy_orm_working_with_joins.htm
I am trying to use the petl library to build an ETL process that copies data between two tables. The destination table contains a unique slug field. For that, I wrote my script so it would identify duplicate slugs and convert them by appending the ID to the slug value.
table = etl.fromdb(source_con, 'SELECT * FROM user')
# get whatever remains as duplicates
duplicates = etl.duplicates(table, 'slug')
for dup in [i for i in duplicates.values('id')]:
table = etl.convert(
table,
'slug',
lambda v, row: '{}-{}'.format(slugify_unicode(v), str(row.id).encode('hex')),
where=lambda row: row.id == dup,
pass_row=True
)
The above did not work as expected; the table object still contains duplicate values after the loop.
Can anyone advise?
Thanks
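A hedged guess at the cause: petl pipelines are lazy, and the where lambdas close over the loop variable dup by reference, so by the time the table is finally iterated every filter compares against the last value of dup. Binding dup as a default argument freezes it per iteration (str(row.id).encode('hex') is also Python 2 only, so plain row.id is used here):

for dup in duplicates.values('id'):
    table = etl.convert(
        table,
        'slug',
        lambda v, row: '{}-{}'.format(slugify_unicode(v), row.id),
        where=lambda row, dup=dup: row.id == dup,  # dup bound per iteration
        pass_row=True
    )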
I am trying to create a new unique ID field in an Access table. I already have one field called SITE_ID_FD, but it is historical. The format of the unique value in that field isn't what our current format is, so I am creating a new field with the new format.
Old Format = M001, M002, K003, K004, S005, M006, etc
New format = 12001, 12002, 12003, 12004, 12005, 12006, etc
I wrote the following script:
fc = r"Z:\test.gdb\testfc"
x = 12001
cursor = arcpy.UpdateCursor(fc)
for row in cursor:
row.setValue("SITE_ID", x)
cursor.updateRow(row)
x+= 1
This works fine, but it populates the new ID field based on the default ObjectID sort order. I need to sort two fields first and then populate the new ID field based on that sorting (I want to sort by a field called SITE and then by the old ID field SITE_ID_FD).
I tried manually sorting the 2 fields in hopes that Python would honor the sort, but it doesn't. I'm not sure how to do this in Python. Can anyone suggest a method?
A possible solution is to specify, when you create your update cursor, the fields by which you want it sorted; this is explained in the documentation: http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//000v0000003m000000
so it goes like this:
UpdateCursor(dataset, {where_clause}, {spatial_reference}, {fields}, {sort_fields})
You are interested only in sort_fields, so assuming that your code works well on a sorted table and that you want the table ordered ascending, the second part of your code should look like this:
fc = r"Z:\test.gdb\testfc"
x = 12001
cursor = arcpy.UpdateCursor(fc,"","","","SITE A, SITE_ID_FD A")
#if you want to sort it descending you need to write it with a D
#>> cursor = arcpy.UpdateCursor(fc,"","","","SITE D, SITE_ID_FD D")
for row in cursor:
row.setValue("SITE_ID", x)
cursor.updateRow(row)
x+= 1
I hope this helps.
I added a link to the arcpy docs in a comment, but from what I can tell, this will create a new, sorted dataset:
import arcpy
from arcpy import env

env.workspace = r"z:\test.gdb"
arcpy.Sort_management("testfc", "testfc_sort", [["SITE", "ASCENDING"],
                                                ["SITE_ID_FD", "ASCENDING"]])
And this will, on the sorted dataset, do what you want:
fc = r"Z:\test.gdb\testfc_sort"
x = 12001
cursor = arcpy.UpdateCursor(fc)
for row in cursor:
row.setValue("SITE_ID", x)
cursor.updateRow(row)
x+= 1
I'm assuming there's some way to just copy the sorted/modified dataset back over the original, so it's all good?
I'll admit, I don't use arcpy, and the docs could be a lot more explicit.
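For the copy-back step, one untested sketch using standard geoprocessing tools (the paths are the ones assumed above):

import arcpy
from arcpy import env

env.workspace = r"Z:\test.gdb"
# Replace the original feature class with the sorted copy.
arcpy.Delete_management("testfc")
arcpy.Rename_management("testfc_sort", "testfc")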