I'm doing a LEFT OUTER JOIN with some conditions. The code I'm using for that is:
SELECT *
FROM
(SELECT ADS, Unit, Quantity, ZXY FROM TABLE1) as A
LEFT OUTER JOIN (SELECT ADS, Name, Unit_U, Price FROM TABLE2) as B
ON ((A.ADS = B.ADS OR A.ADS = B.Name) and A.Unit = B.Unit_U) COLLATE nocase
Doing this I manage to print the result, but the table is not updated (if I close the connection and reopen it, I don't see the last column).
Even if I print the column 'Price' by selecting from TABLE1, I get an error saying that the column doesn't exist.
Here is the example I'm trying to solve:
TABLE 1
ADS   Unit   Quantity   ZXY
---------------------------
1     KG     2          None
2     KG     1          None
3     KG     3          None
4     KG     5          None
5     KG     7          None
6     KG     1          None

TABLE 2
ADS   Name   Unit_U   Price
---------------------------
1     15     KG       7.00
25    2      KG       8.00
3     14     KG       5.00
25    4      G        8.00

TABLE AFTER LEFT JOIN
ADS   Unit   Quantity   ZXY    Price
------------------------------------
1     KG     2          None   7.00
2     KG     1          None   8.00
3     KG     3          None   5.00
4     KG     5          None   None
5     KG     7          None   None
6     KG     1          None   None
How can I UPDATE the table and save the modifications after the LEFT OUTER JOIN?
First add a Price column to TABLE1:
ALTER TABLE TABLE1 ADD COLUMN Price REAL;
Then run the following update to populate the Price column with values from TABLE2, if available:
UPDATE TABLE1
SET Price = (SELECT t2.Price
             FROM TABLE2 t2
             WHERE (LOWER(TABLE1.ADS) = LOWER(t2.ADS) OR LOWER(TABLE1.ADS) = LOWER(t2.Name))
               AND LOWER(TABLE1.Unit) = LOWER(t2.Unit_U));
SQLite does not support joins in UPDATE statements, so a correlated subquery is the alternative.
Update:
One way to do case-insensitive comparisons of your fields is to compare the lowercase versions of the left- and right-hand sides.
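If you are running this from Python's sqlite3 module (as the mention of closing and reopening the connection suggests), remember to commit, otherwise the changes are discarded when the connection closes. A minimal sketch, assuming a database file named example.db (the file name and connection handling are assumptions, not taken from the question):

import sqlite3

conn = sqlite3.connect("example.db")  # hypothetical database file
cur = conn.cursor()

# Add the Price column once; ignore the error if it already exists from a previous run.
try:
    cur.execute("ALTER TABLE TABLE1 ADD COLUMN Price REAL")
except sqlite3.OperationalError:
    pass  # column already exists

# Populate Price from TABLE2 using the case-insensitive matching described above.
cur.execute("""
    UPDATE TABLE1
    SET Price = (SELECT t2.Price
                 FROM TABLE2 t2
                 WHERE (LOWER(TABLE1.ADS) = LOWER(t2.ADS) OR LOWER(TABLE1.ADS) = LOWER(t2.Name))
                   AND LOWER(TABLE1.Unit) = LOWER(t2.Unit_U))
""")

conn.commit()  # without this, the new column's values vanish when the connection closes
conn.close()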
Alternatively, just use an UPDATE query of this general form:
UPDATE <table to be updated>
SET each required column in the table being updated to the corresponding column produced by the join/subquery.
I want to ask a conceptual question.
I have a table that looks like
UPC_CODE  A_PRICE  A_QTY  DATE        COMPANY_CODE  A_CAT
1001      100.25   2      2021-05-06  1             PB
1001      2122.75  10     2021-05-01  1             PB
1002      212.75   5      2021-05-07  2             PT
1002      3100.75  10     2021-05-01  2             PB
I want the latest row to be picked up for each UPC_CODE and COMPANY_CODE.
To achieve this, I have both SQL and Python versions:
Using SQL:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY UPC_CODE, COMPANY_CODE ORDER BY DATE DESC) rn
FROM yourTable)
SELECT UPC_CODE, A_PRICE, A_QTY, DATE, COMPANY_CODE, A_CAT
FROM cte
WHERE rn = 1;
Using Python:
df = df.groupby(['UPC_CODE', 'COMPANY_CODE']).\
     agg(Date=('DATE', 'max'), A_PRICE=('A_PRICE', 'first'),
         A_QTY=('A_QTY', 'first'), A_CAT=('A_CAT', 'first')).reset_index()
Ideally I should be getting the following resultant table:
UPC_CODE  A_PRICE  A_QTY  DATE        COMPANY_CODE  A_CAT
1001      100.25   2      2021-05-06  1             PB
1002      212.75   5      2021-05-07  2             PT
With SQL I get exactly that, but not with the Python version.
What am I missing here?
The date column can be converted to a datetime type and then ranked within each group with rank(method='first', ascending=False) (descending order, so the most recent date gets rank 1); grouping by UPC_CODE and COMPANY_CODE with dataframe.groupby() and filtering the rows where df['rn'] equals 1 gives the latest row per group.
df['DATE'] = pd.to_datetime(df['DATE'])
df['rn'] = df.groupby(['UPC_CODE', 'COMPANY_CODE'])['DATE'].rank(method='first', ascending=False)
print(df[df['rn'] == 1])
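For reference, an equivalent and arguably more direct pandas approach (a sketch, assuming df has the columns shown in the question) is to sort by date and keep the last row of each group:

import pandas as pd

df['DATE'] = pd.to_datetime(df['DATE'])

# Sort so the newest row per (UPC_CODE, COMPANY_CODE) comes last, then keep that row.
latest = (df.sort_values('DATE')
            .groupby(['UPC_CODE', 'COMPANY_CODE'])
            .tail(1))
print(latest)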
The problem is as follows. I need to process some input (it can be stored in a database, a CSV file, an Excel file, etc.). I am trying to solve the problem with MySQL. The same order may contain multiple products, so orders branch out, and the same product appears differently across orders, which makes this a real challenge for me. Any help would be very much appreciated.
input:
Orderid  Itemid  Quantity
001      a1      1
001      a2      1
002      a1      1
003      a1      1
003      a2      1
004      a1      1
005      a1      3
006      a2      1
007      a1      1
008      a1      1
output:
ordersum  percent  Cumulative  itemdetail
4         50.00%   50.00%      a1[1]
2         25.00%   75.00%      a1[1]a2[1]
1         12.50%   87.50%      a1[3]
1         12.50%   100.00%     a2[1]
I have been writing this for an afternoon and the statistical results are still not very satisfactory. Note that the field names in the problem description differ from the fields of my original table. My approach is as follows:
SET @csum := 0;
select order_volume,
       shop_proportion,
       concat(round((@csum := @csum + shop_proportion), 2), '', '%') cumulative_proportion,
       shop_name
from
(select t1.shop_name,
        t1.order_volume,
        concat(round(t1.order_volume / t2.ordersum * 100, 2), '', '%') shop_proportion
 from
 (select shop shop_name, count(distinct(order_number)) order_volume
  from order20190801
  group by shop
  order by count(distinct(order_number)) desc) t1,
 (select count(distinct(order_number)) ordersum
  from order20190801) t2) t3
About 5.8 seconds, which is a bit long!!!
SET @csum := 0;
select 订单量,
concat(round(订单量 / 订单总量* 100 ,5),'','%') 订单占比,
concat(round(@csum := @csum + concat(round(订单量 / 订单总量 * 100, 5), '', '%'), 5), '', '%') AS 累计占比,
订单明细
from(
select count(订单明细) 订单量,订单明细
from (
select 订单编号,GROUP_CONCAT(商家编码,'|',货品数量) 订单明细
from order20190801
group by 订单编号
)t1
group by 订单明细
order by count(订单明细) desc
)t2,(select count(distinct(订单编号)) 订单总量 from order20190801
)t3
# 10.437 seconds!!!
I solved my own problem, but the execution time is relatively long. If anyone has optimization suggestions, I would greatly appreciate them.
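For comparison, if the input can also be loaded into pandas (the question mentions CSV and Excel as possible sources), the same ordersum / percent / Cumulative / itemdetail summary can be computed there. This is only a sketch of an alternative approach, not the original MySQL solution; the DataFrame below is built from the sample input above and every name in it is an assumption:

import pandas as pd

# Sample input matching the "input" table above.
df = pd.DataFrame({
    'Orderid':  ['001', '001', '002', '003', '003', '004', '005', '006', '007', '008'],
    'Itemid':   ['a1', 'a2', 'a1', 'a1', 'a2', 'a1', 'a1', 'a2', 'a1', 'a1'],
    'Quantity': [1, 1, 1, 1, 1, 1, 3, 1, 1, 1],
})

# Build each order's item-detail string, e.g. "a1[1]a2[1]".
detail = (df.sort_values(['Orderid', 'Itemid'])
            .assign(part=lambda d: d['Itemid'] + '[' + d['Quantity'].astype(str) + ']')
            .groupby('Orderid')['part'].agg(''.join)
            .rename('itemdetail'))

# Count orders per distinct item detail, then derive percent and cumulative percent.
summary = (detail.to_frame()
                 .groupby('itemdetail', as_index=False).size()
                 .rename(columns={'size': 'ordersum'})
                 .sort_values('ordersum', ascending=False)
                 .reset_index(drop=True))
summary['percent'] = (summary['ordersum'] / summary['ordersum'].sum() * 100).round(2)
summary['Cumulative'] = summary['percent'].cumsum()

print(summary[['ordersum', 'percent', 'Cumulative', 'itemdetail']])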
I have 2 data frames created from CSV files, and there is another data frame which serves as a reference for these tables. For example:
1 Employee demographic (Emp_id, dept_id)
2 Employee detail (Emp_id, RM_ID)
I have a 3rd dataframe (dept_manager) which has only 2 columns (dept_id, RM_ID). Now I need to join tables 1 and 2 using the 3rd dataframe as the reference.
I'm trying this out in pandas (Python); any help here would be much appreciated. Thanks in advance.
Table1
Empdemogr
Empid  dept_id
1      10
2      20
1      30

Table2
Empdetail
Empid  RM_id
1      E120
2      E140
3      E130

Table3
dept_manager
dept_id  RM_id
10       E110
10       E120
10       E121
10       E122
10       E123
20       E140
20       E141
20       E142
30       E130
30       E131
30       E132
Output:
Emp_id  dept_id  RM_id
1       10       E120
2       20       E140
1       30       E130
So I'm trying to translate this SQL into Python:
select a.Emp_id, a.dept_id, b.RM_id
from Empdemogr a, Empdetail b, dept_manager d
where
a.emp_id=b.emp_id
and a.dept_id=d.dept_id
and b.RM_id=d.RM_id
I'm trying to figure out whether you had a typo or a wrong understanding: your SQL above would not output the result you are looking for based on the provided data, and I do not think you will see dept_id '30' in it.
But going by your SQL query, here is how you can write the same thing with pandas dataframes:
Preparing DataFrames (I will leave it up to you how you load the dataframes):
import pandas as pd
EmpployeeDemo=pd.read_csv(r"YourEmployeeDemoFile.txt")
EmpDetail=pd.read_csv(r"YourEmpDetailFile.txt")
Dept_Manager=pd.read_csv(r"YourDept_Manager.txt")
Code to Join the DataFrames:
joined_dataframe = pd.merge(pd.merge(EmpployeeDemo, EmpDetail, on="Empid"),Dept_Manager, on=["dept_id", "RM_id"])
print(joined_dataframe)
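For a quick check without the CSV files, the same join can be reproduced on the sample data from the question. This is just a sketch; the inline frames below are built from the tables above and reuse the variable names from the answer:

import pandas as pd

# Sample data taken from the tables in the question.
EmpployeeDemo = pd.DataFrame({'Empid': [1, 2, 1], 'dept_id': [10, 20, 30]})
EmpDetail = pd.DataFrame({'Empid': [1, 2, 3], 'RM_id': ['E120', 'E140', 'E130']})
Dept_Manager = pd.DataFrame({
    'dept_id': [10, 10, 10, 10, 10, 20, 20, 20, 30, 30, 30],
    'RM_id':   ['E110', 'E120', 'E121', 'E122', 'E123',
                'E140', 'E141', 'E142', 'E130', 'E131', 'E132'],
})

# Same two-step merge as above: join on Empid, then filter via the reference table.
joined_dataframe = pd.merge(pd.merge(EmpployeeDemo, EmpDetail, on="Empid"),
                            Dept_Manager, on=["dept_id", "RM_id"])
print(joined_dataframe)
# As noted above, the (1, 30, E130) row does not appear: employee 1's RM_id is E120,
# and (30, E120) is not present in dept_manager.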
Now I have a list of tuples named "Data":
[
('171115090000',
Timestamp('2017-11-15 09:00:00'),
'PAIR1',
156.0)
]
I want to insert this list into an Oracle DB; my code is:
cur.executemany(
'''INSERT INTO A
("SID","DATE","ATT","VALUE")
VALUES(:1,:2,:3,:4)''',Data)
And it works well. However, if I want to add or replace records in this database, I currently have to create a table B to hold those records and then merge A and B.
Is there anything like ON DUPLICATE KEY UPDATE that would let me finish the job without creating a new table?
I know I could select all records from A, convert them to a DataFrame and merge the DataFrames in Python; is that a good solution?
Is there anything like on duplicate key update
In Oracle, it is called MERGE; have a look at the following example:
Table contents at the beginning:
SQL> select * From dept;
DEPTNO DNAME LOC
---------- -------------- -------------
10 ACCOUNTING NEW YORK
20 RESEARCH DALLAS
30 SALES CHICAGO
40 OPERATIONS BOSTON
MERGE statement:
SQL> merge into dept d
2 using (select deptno, dname, loc
3 from (select 10 deptno, 'ACC' dname, 'NY' loc from dual --> already exists, should be updated
4 union all
5 select 99 , 'NEW DEPT' , 'LONDON' from dual --> doesn't exist, should be inserted
6 )
7 ) x
8 on (d.deptno = x.deptno)
9 when matched then update set
10 d.dname = x.dname,
11 d.loc = x.loc
12 when not matched then insert (d.deptno, d.dname, d.loc)
13 values (x.deptno, x.dname, x.loc);
2 rows merged.
The result: as you can see, values for existing DEPTNO = 10 were updated, while the new DEPTNO = 99 was inserted into the table.
SQL> select * From dept;
DEPTNO DNAME LOC
---------- -------------- -------------
10 ACC NY
20 RESEARCH DALLAS
30 SALES CHICAGO
40 OPERATIONS BOSTON
99 NEW DEPT LONDON
SQL>
I don't speak Python so I can't compose code you might use, but I hope that you'll manage to do it yourself.
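To tie this back to the Python side of the question: the same MERGE pattern can be driven from cur.executemany() with the existing Data list of tuples. The sketch below is an assumption-heavy illustration, not code from the answer; the key columns in the ON clause and the con connection variable are guesses, so adjust them to the real schema.

# Upsert rows from Data into A: update VALUE when a matching row exists, insert otherwise.
# The ON-clause key (SID, DATE, ATT) is an assumption; use the table's actual unique key.
merge_sql = """
    MERGE INTO A a
    USING (SELECT :1 AS "SID", :2 AS "DATE", :3 AS "ATT", :4 AS "VALUE" FROM dual) src
    ON (a."SID" = src."SID" AND a."DATE" = src."DATE" AND a."ATT" = src."ATT")
    WHEN MATCHED THEN UPDATE SET a."VALUE" = src."VALUE"
    WHEN NOT MATCHED THEN INSERT (a."SID", a."DATE", a."ATT", a."VALUE")
        VALUES (src."SID", src."DATE", src."ATT", src."VALUE")
"""
cur.executemany(merge_sql, Data)
con.commit()  # `con` is assumed to be the connection the cursor was created from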
I need some help iterating over a groupby object in python. I have people nested under a single ID variable, and then under each one of those, they have balances for anywhere from 3 to 6 months. So, printing the groupby object looks, for example, like this:
(1,   Primary BP  Product  Rpt Month  Closing Balance
 0    1           CHECK    201708     10.04
 1    1           CHECK    201709     11.1
 2    1           CHECK    201710     11.16
 3    1           CHECK    201711     11.22
 4    1           CHECK    201712     11.28
 5    1           CHECK    201801     11.34)
(2,   Primary BP  Product  Rpt Month  Closing Balance
 79   2           CHECK    201711     52.42
 85   2           CHECK    201712     31.56
 136  2           CHECK    201801     99.91)
I want to create another column that standardizes the closing balance based on their first amount. So the ideal output would then look like this:
(1,   Primary BP  Product  Rpt Month  Closing Balance  standardized
 0    1           CHECK    201708     10.04            0
 1    1           CHECK    201709     11.1             1.1
 2    1           CHECK    201710     11.16            1.16
 3    1           CHECK    201711     11.22            1.22
 4    1           CHECK    201712     11.28            1.28
 5    1           CHECK    201801     11.34            1.34)
(2,   Primary BP  Product  Rpt Month  Closing Balance  standardized
 79   2           CHECK    201711     52.42            0
 85   2           CHECK    201712     31.56            -20.86
 136  2           CHECK    201801     99.91            47.79)
I just can't quite figure out how to write a nice for loop (or any other approach) that iterates within the groups of a groupby object, taking the first closing balance and subtracting it from each subsequent closing balance to essentially create a difference score.
I solved it! Only two weeks later. Did it without the use of a groupby object. Here is how:
bpid = []
diffs = []
# These two lines were just a bit of cleaning needed to make the vals numeric
data['Closing Balance'] = data['Closing Balance'].str.replace(",", "")
data['Closing Balance'] = pd.to_numeric(data['Closing Balance'])
# Create a new variable in monthly_data that simply shows the increase in closing balance for each month,
# setting the first month to 0
for index, row in data.iterrows():
    bp = row.iloc[0]             # Primary BP id (first column)
    if bp not in bpid:
        bpid.append(bp)
        first = row.iloc[3]      # first closing balance seen for this person
    bal = row.iloc[3]            # this month's closing balance
    diff = round(bal - first, 2)
    diffs.append(diff)
    row['balance increase'] = diff  # note: this only modifies the row copy, not data
# Just checking to make sure there are the right number of values. Same as data, so good to go
print(len(diffs))
# Convert my list of differences in closing balance to a series object, and merge with the monthly_data
se = pd.Series(diffs)
data['balance increase'] = se.values
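For what it's worth, the same difference score can be computed without the explicit loop, using a groupby transform. A short sketch, assuming the frame is called data and the person identifier column is named 'Primary BP' as in the printout above (that column name is an assumption based on the printout):

# Subtract each person's first closing balance from all of their balances.
first_balance = data.groupby('Primary BP')['Closing Balance'].transform('first')
data['standardized'] = (data['Closing Balance'] - first_balance).round(2)
print(data)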