cx_Oracle equivalent of ON DUPLICATE KEY UPDATE - python

I have a list of tuples named Data:
[
    ('171115090000',
     Timestamp('2017-11-15 09:00:00'),
     'PAIR1',
     156.0)
]
I want to insert this list into an Oracle DB; my code is:
cur.executemany(
    '''INSERT INTO A
       ("SID", "DATE", "ATT", "VALUE")
       VALUES (:1, :2, :3, :4)''', Data)
And it works well. However, if I want to add or replace records in this table, I currently have to create a table B to hold those records and then merge A and B.
Is there anything like ON DUPLICATE KEY UPDATE that would let me finish the job without creating a new table?
I know I could select all records from A, convert them to a DataFrame and merge the DataFrames in Python, but is this a good solution?

Is there anything like on duplicate key update
In Oracle, it is called MERGE; have a look at the following example:
Table contents at the beginning:
SQL> select * from dept;

    DEPTNO DNAME          LOC
---------- -------------- -------------
        10 ACCOUNTING     NEW YORK
        20 RESEARCH       DALLAS
        30 SALES          CHICAGO
        40 OPERATIONS     BOSTON
MERGE statement:
merge into dept d
using (select deptno, dname, loc
         from (select 10 deptno, 'ACC' dname, 'NY' loc from dual  --> already exists, should be updated
               union all
               select 99, 'NEW DEPT', 'LONDON' from dual          --> doesn't exist, should be inserted
              )
      ) x
on (d.deptno = x.deptno)
when matched then update set
  d.dname = x.dname,
  d.loc = x.loc
when not matched then insert (d.deptno, d.dname, d.loc)
values (x.deptno, x.dname, x.loc);

2 rows merged.
The result: as you can see, values for existing DEPTNO = 10 were updated, while the new DEPTNO = 99 was inserted into the table.
SQL> select * from dept;

    DEPTNO DNAME          LOC
---------- -------------- -------------
        10 ACC            NY
        20 RESEARCH       DALLAS
        30 SALES          CHICAGO
        40 OPERATIONS     BOSTON
        99 NEW DEPT       LONDON
I don't speak Python so I can't compose code you might use, but I hope that you'll manage to do it yourself.
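For completeness, the same MERGE can be driven from Python with cx_Oracle's executemany; here is a minimal sketch, assuming the question's table A and that ("SID", "DATE", "ATT") is the unique key (the connection string and key columns are assumptions; adjust them to your setup):

import cx_Oracle

# A sketch, not a drop-in: the connection string and the choice of
# key columns ("SID", "DATE", "ATT") are assumptions.
conn = cx_Oracle.connect("user/password@host/service")
cur = conn.cursor()
cur.executemany('''
    MERGE INTO A t
    USING (SELECT :1 "SID", :2 "DATE", :3 "ATT", :4 "VALUE" FROM dual) s
    ON (t."SID" = s."SID" AND t."DATE" = s."DATE" AND t."ATT" = s."ATT")
    WHEN MATCHED THEN UPDATE SET t."VALUE" = s."VALUE"
    WHEN NOT MATCHED THEN INSERT ("SID", "DATE", "ATT", "VALUE")
        VALUES (s."SID", s."DATE", s."ATT", s."VALUE")''', Data)
conn.commit()

Each tuple in Data is bound once into the USING subquery, so existing keys are updated and new keys are inserted without a staging table.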

Related

Python/Pandas: Delete rows where value in one column is not present in another column in the same data frame

I have a data frame containing a multi-parent hierarchy of employees. node (int64) is a key that identifies each unique employee-manager combination. parent (float64) is the node key of that row's manager.
Due to some source data anomalies, there are parent keys present in the data frame that do not exist as nodes. I would like to delete all rows where this occurs.
empId  empName    mgrId  mgrName  node  parent
111    Alice      222    Bob      1     3
111    Alice      333    Charlie  2     4
222    Bob        444    Dave     3     5
333    Charlie    444    Dave     4     5
444    Dave                       5
555    Elizabeth  333    Charlie  6     7
In the above sample, the offending row is employee ID 555, because parent key 7 is not present anywhere in the node column.
Here's what I tried so far:
This removes some rows, but not all of them. I'm not sure why.
df1 = df[df['parent'].isin(df['node'])]
I thought maybe it was because parent is float and node is int64, so I converted and tried again, but got the same result.
df1 = df[df['parent'].astype('int64').isin(df['node'])]
Something to consider is that the data frame contains around 1.5 million rows.
I also tried this, but it just keeps running forever - I assume because .map loops through the entire data frame (around 1.5 million rows):
df[df['parent'].map(lambda x: np.isin(x, df['node']).all())]
I'm especially perplexed that the first two snippets consistently filter out some of the rows that fail the condition, but not all of them.
Again, 'parent' is float64 and has empty values. 'node' is int64 and has no empty values. A more realistic example of node and parent keys is as follows:
Node - 13192210227
Parent - 12668210227.0
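One possibly relevant detail (a sketch with hypothetical sample data mirroring the table above): isin() returns False for NaN, so rows with an empty parent are dropped by the first snippet along with the genuinely orphaned keys; keeping them explicitly separates the two cases.

import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the table above.
df = pd.DataFrame({
    "empId":  [111, 111, 222, 333, 444, 555],
    "node":   [1, 2, 3, 4, 5, 6],
    "parent": [3.0, 4.0, 5.0, 5.0, np.nan, 7.0],
})

# isin() is False for NaN, so top-level employees (empty parent, such
# as Dave) are removed too; keep them explicitly if that is not intended.
mask = df["parent"].isin(df["node"]) | df["parent"].isna()
print(df[mask])  # only empId 555 is removed (parent 7 matches no node)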

SQLite removing a row, and reordering the rest of the rows

I'm struggling with something relatively simple.
Let's say I have a table whose first column is an autoincrement primary key:
id  name   age
1   tom    22
2   harry  33
3   greg   44
4   sally  55
I want to remove row 2 and have the remaining rows automatically renumber so it looks like this:
id  name   age
1   tom    22
2   greg   44
3   sally  55
I have tried every bit of advice I can find online; it all involves deleting the table's entry from the sqlite_sequence table, which doesn't work.
Is there a simple way to achieve what I am after?
I don't see the point/need to resequence the id column, as all values there will continue to be unique, even after deleting one or more records. If you really wanted to view your data this way, you could delete the record mentioned, and then use ROW_NUMBER:
DELETE
FROM yourTable
WHERE id = 2;
SELECT ROW_NUMBER() OVER (ORDER BY id) id, name, age
FROM yourTable
ORDER BY id;
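For reference, a runnable sketch of that approach with Python's sqlite3 module (hypothetical in-memory database; ROW_NUMBER needs SQLite 3.25 or later):

import sqlite3

# Hypothetical in-memory database seeded with the sample rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE yourTable (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, age INTEGER)")
con.executemany("INSERT INTO yourTable (name, age) VALUES (?, ?)",
                [("tom", 22), ("harry", 33), ("greg", 44), ("sally", 55)])

# Delete the row; the stored ids keep their gap...
con.execute("DELETE FROM yourTable WHERE id = 2")

# ...but ROW_NUMBER renumbers the rows at query time.
for row in con.execute("""SELECT ROW_NUMBER() OVER (ORDER BY id) AS id, name, age
                          FROM yourTable ORDER BY id"""):
    print(row)  # (1, 'tom', 22), (2, 'greg', 44), (3, 'sally', 55)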

Is there a way to join 2 dataframes using another reference table in python

I have 2 data frames created from CSV files, and another data frame which is a reference for these tables. For example:
1 Employee demographic (Emp_id, dept_id)
2 Employee detail (Emp_id, RM_ID)
I have a 3rd dataframe (dept_manager) which has only 2 columns (dept_id, RM_id). Now I need to join tables 1 and 2 using the 3rd dataframe as the reference.
I'm trying this out in pandas (Python); any help here would be much appreciated. Thanks in advance.
Table1
Empdemogr
Empid dept_id
1 10
2 20
1 30
Table2
Empdetail
Empid RM_id
1 E120
2 E140
3 E130
Table3
dept_manager
dept_id RM_id
10 E110
10 E120
10 E121
10 E122
10 E123
20 E140
20 E141
20 E142
30 E130
30 E131
30 E132
Output:
Emp_id dept_id RM_id
1 10 E120
2 20 E140
1 30 E130
So I'm trying to express this SQL in Python:
select a.Emp_id, a.dept_id, b.RM_id
from Empdemogr a, Empdetail b, dept_manager d
where
a.emp_id=b.emp_id
and a.dept_id=d.dept_id
and b.RM_id=d.RM_id
I'm trying to figure out whether you have a typo or a misunderstanding: the SQL above would not output the result you are looking for with the provided data. I do not think you will see dept_id 30 in there.
But going by your SQL query, here is how you can write the same with pandas dataframes:
Preparing DataFrames (I will leave it up to you how you load the dataframes):
import pandas as pd
EmployeeDemo = pd.read_csv(r"YourEmployeeDemoFile.txt")
EmpDetail = pd.read_csv(r"YourEmpDetailFile.txt")
Dept_Manager = pd.read_csv(r"YourDept_Manager.txt")
Code to Join the DataFrames:
joined_dataframe = pd.merge(pd.merge(EmployeeDemo, EmpDetail, on="Empid"), Dept_Manager, on=["dept_id", "RM_id"])
print(joined_dataframe)
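To sanity-check the join without CSV files, the frames can be built inline with the question's sample data (a sketch; the frame names match the snippet above):

import pandas as pd

EmployeeDemo = pd.DataFrame({"Empid": [1, 2, 1], "dept_id": [10, 20, 30]})
EmpDetail = pd.DataFrame({"Empid": [1, 2, 3], "RM_id": ["E120", "E140", "E130"]})
Dept_Manager = pd.DataFrame({
    "dept_id": [10, 10, 10, 10, 10, 20, 20, 20, 30, 30, 30],
    "RM_id": ["E110", "E120", "E121", "E122", "E123",
              "E140", "E141", "E142", "E130", "E131", "E132"],
})

joined_dataframe = pd.merge(pd.merge(EmployeeDemo, EmpDetail, on="Empid"),
                            Dept_Manager, on=["dept_id", "RM_id"])
print(joined_dataframe)
# Only (1, 10, E120) and (2, 20, E140) survive: the (1, 30) row drops
# out because employee 1's RM_id is E120, which does not manage dept 30
# in Dept_Manager - exactly the caveat noted above.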

UPDATE table after LEFT JOIN

I'm doing a LEFT OUTER JOIN with some conditions. The code I'm using for that is:
SELECT *
FROM
(SELECT ADS, Unit, Quantity, ZXY FROM TABLE1) as A
LEFT OUTER JOIN (SELECT ADS, Name, Unit_U, Price FROM TABLE2) as B
ON ((A.ADS = B.ADS OR A.ADS = B.Name) and A.Unit = B.Unit_U) COLLATE nocase
Doing this I can print the result, but the table is not updated (if I close the connection and reopen it, I don't see the new column).
And if I try to print the Price column selecting from TABLE1, I get an error saying that the column doesn't exist.
Here the example that I'm trying to solve :
TABLE 1
ADS Unit Quantity ZXY
--------------------------------------
1 KG 2 None
2 KG 1 None
3 KG 3 None
4 KG 5 None
5 KG 7 None
6 KG 1 None
TABLE 2
ADS Name Unit_U Price
--------------------------------------
1 15 KG 7.00
25 2 KG 8.00
3 14 KG 5.00
25 4 G 8.00
TABLE AFTER LEFT JOIN
ADS Unit Quantity ZXY Price
--------------------------------------
1 KG 2 None 7.00
2 KG 1 None 8.00
3 KG 3 None 5.00
4 KG 5 None None
5 KG 7 None None
6 KG 1 None None
How can I UPDATE the table and save the modifications after the LEFT OUTER JOIN?
First add a Price column to TABLE1:
ALTER TABLE TABLE1 ADD COLUMN Price REAL;
Then run the following update to populate the Price column with values from TABLE2, if available:
UPDATE TABLE1
SET Price = (SELECT t2.Price FROM TABLE2 t2
             WHERE (LOWER(TABLE1.ADS) = LOWER(t2.ADS) OR LOWER(TABLE1.ADS) = LOWER(t2.Name))
               AND LOWER(TABLE1.Unit) = LOWER(t2.Unit_U));
SQLite does not support joins in UPDATE statements (UPDATE ... FROM was only added in SQLite 3.33), so a correlated subquery is the alternative.
Update:
One way to do case insensitive comparisons of your fields is to compare the lowercase version of the left and right hand sides.
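Putting both steps together with Python's sqlite3 module (a sketch on a hypothetical in-memory database with abbreviated sample rows):

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TABLE1 (ADS TEXT, Unit TEXT, Quantity INTEGER, ZXY TEXT)")
con.execute("CREATE TABLE TABLE2 (ADS TEXT, Name TEXT, Unit_U TEXT, Price REAL)")
con.executemany("INSERT INTO TABLE1 VALUES (?, ?, ?, NULL)",
                [("1", "KG", 2), ("2", "KG", 1), ("4", "KG", 5)])
con.executemany("INSERT INTO TABLE2 VALUES (?, ?, ?, ?)",
                [("1", "15", "KG", 7.0), ("25", "2", "KG", 8.0)])

# Step 1: add the column; step 2: populate it via the correlated subquery.
con.execute("ALTER TABLE TABLE1 ADD COLUMN Price REAL")
con.execute("""UPDATE TABLE1
               SET Price = (SELECT t2.Price FROM TABLE2 t2
                            WHERE (LOWER(TABLE1.ADS) = LOWER(t2.ADS)
                                   OR LOWER(TABLE1.ADS) = LOWER(t2.Name))
                              AND LOWER(TABLE1.Unit) = LOWER(t2.Unit_U))""")
con.commit()
print(con.execute("SELECT * FROM TABLE1").fetchall())
# [('1', 'KG', 2, None, 7.0), ('2', 'KG', 1, None, 8.0), ('4', 'KG', 5, None, None)]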
Just use an UPDATE query as follows:
UPDATE <table name>
SET <each required column in the table being updated> = <the corresponding column from the join query>

Combine two ids into a new table?

I have a task about text processing and I don't know how to combine some columns from separate tables into one table.
So here is the case:
I have a table named list with id_doc and title columns.
Then I create a new table named term_list, which contains the list of terms that result from text processing of the titles in list.
The term_list table has id_term, term, df, and idf columns. Lastly, I want a table named term_freq with columns id, id_term, id_doc, tf, and normalized_tf.
For example:
The list table looks like this:
id_doc titles
11 information retrieval system
12 operating system
13 business information
The term_list table is below:
id_term term df idf
21 information 2 --
22 retrieval 1 --
23 system 2 --
24 operating 1 --
25 business 1 --
How do I create the term_freq table so that it becomes like this?
id id_term id_doc tf normalized_tf
31 21 11 1 --
32 22 11 1 --
33 23 11 1 --
34 24 12 1 --
35 23 12 1 --
36 25 13 1 --
37 21 13 1 --
The main problem is that I have to join id_term and id_doc into one table, where one id_doc relates to several id_term values, but I don't know how to correlate them because list and term_list don't share any column.
Please help :(
You can iterate over the rows in term_list:
SELECT id_term, term FROM term_list
and for each term run:
SELECT id_doc FROM list WHERE titles LIKE '%term%'
then save the (id_term, id_doc) pairs into the term_freq table.
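A sketch of that loop with Python's sqlite3 module (hypothetical in-memory database seeded with the sample rows; the generated ids start at 1 rather than 31, and tf is left at 1 as in the sample rather than counted):

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE list (id_doc INTEGER, titles TEXT);
    CREATE TABLE term_list (id_term INTEGER, term TEXT);
    CREATE TABLE term_freq (id INTEGER PRIMARY KEY AUTOINCREMENT,
                            id_term INTEGER, id_doc INTEGER, tf INTEGER);
    INSERT INTO list VALUES (11, 'information retrieval system'),
                            (12, 'operating system'),
                            (13, 'business information');
    INSERT INTO term_list VALUES (21, 'information'), (22, 'retrieval'),
                                 (23, 'system'), (24, 'operating'),
                                 (25, 'business');
""")

for id_term, term in con.execute("SELECT id_term, term FROM term_list").fetchall():
    # Every document whose title contains the term gets a row.
    for (id_doc,) in con.execute(
            "SELECT id_doc FROM list WHERE titles LIKE ?", (f"%{term}%",)):
        con.execute("INSERT INTO term_freq (id_term, id_doc, tf) VALUES (?, ?, 1)",
                    (id_term, id_doc))
con.commit()
print(con.execute("SELECT id_term, id_doc, tf FROM term_freq").fetchall())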
