SQLAlchemy: how to divide 2 columns from different tables - Python

I have 2 tables named as company_info and company_income:
company_info :
| id | company_name | staff_num | year |
|----|--------------|-----------|------|
| 0 | A | 10 | 2010 |
| 1 | A | 10 | 2011 |
| 2 | A | 20 | 2012 |
| 3 | B | 20 | 2010 |
| 4 | B | 5 | 2011 |
company_income :
| id | company_name | income | year |
|----|--------------|--------|------|
| 0 | A | 10 | 2010 |
| 1 | A | 20 | 2011 |
| 2 | A | 30 | 2012 |
| 3 | B | 20 | 2010 |
| 4 | B | 15 | 2011 |
Now I want to calculate the average income per staff member of each company; the result looks like this:
result :
| id | company_name | avg_income | year |
|----|--------------|------------|------|
| 0 | A | 1 | 2010 |
| 1 | A | 2 | 2011 |
| 2 | A | 1.5 | 2012 |
| 3 | B | 1 | 2010 |
| 4 | B | 3 | 2011 |
How can I get this result using Python SQLAlchemy? The tables live in a MySQL database.

Join the tables and do the division. You'd want to either set up a view in MySQL with this query or build it straight into your program.
SELECT
a.company_name,
a.year,
(b.income / a.staff_num) as avg_income
FROM
company_info as a
LEFT JOIN
company_income as b
ON
a.company_name = b.company_name
AND
a.year = b.year
You'd want a few WHERE clauses as well (such as WHERE staff_num IS NOT NULL AND staff_num <> 0, and the same for income). Also, if you can have multiple rows for the same company/year in either table, you'll want to SUM the values and GROUP BY company_name and year.

Try this:
SELECT
info.company_name,
(inc.income / info.staff_num) as avg_income,
info.year
FROM
company_info info JOIN company_income inc
ON
info.company_name = inc.company_name
AND
info.year = inc.year
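Since the question asks for SQLAlchemy specifically, here is a minimal sketch running the same join through SQLAlchemy's `text()` construct. An in-memory SQLite database stands in for MySQL so the snippet is self-contained; against MySQL only the `create_engine()` URL would change. The `CAST` guards against integer division, which SQLite (unlike MySQL's `/`) performs on two integers.

```python
from sqlalchemy import create_engine, text

# In-memory SQLite stands in for the question's MySQL database.
engine = create_engine("sqlite://")

with engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE company_info "
        "(id INTEGER, company_name TEXT, staff_num INTEGER, year INTEGER)"))
    conn.execute(text(
        "CREATE TABLE company_income "
        "(id INTEGER, company_name TEXT, income INTEGER, year INTEGER)"))
    conn.execute(text(
        "INSERT INTO company_info VALUES "
        "(0,'A',10,2010),(1,'A',10,2011),(2,'A',20,2012),"
        "(3,'B',20,2010),(4,'B',5,2011)"))
    conn.execute(text(
        "INSERT INTO company_income VALUES "
        "(0,'A',10,2010),(1,'A',20,2011),(2,'A',30,2012),"
        "(3,'B',20,2010),(4,'B',15,2011)"))

    # CAST forces real division; SQLite would otherwise truncate INT / INT.
    rows = conn.execute(text("""
        SELECT info.company_name,
               CAST(inc.income AS REAL) / info.staff_num AS avg_income,
               info.year
        FROM company_info info
        JOIN company_income inc
          ON info.company_name = inc.company_name
         AND info.year = inc.year
        ORDER BY info.id
    """)).all()

for company, avg_income, year in rows:
    print(company, year, avg_income)
```

This prints one line per company/year with the 1, 2, 1.5, 1, 3 values from the expected result.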

Related

How can I convert 31 columns (referring to the days of the month) into a single datetime column?

I have this:
Febuary_Sells
(31 columns referring to the days of the month)
Store | Product | 1 | 2 | 3 | 4 | 5 | ... | 31 |
Store 1 | Iphone | 0 | 3 | 1 | 3 | 2 | ... | 0 |
Store 1 | 4k TV | 1 | 4 | 2 | 3 | 0 | ... | 0 |
And I want to have something like this:
Store | Product | Date | Quantity |
Store 1 | Iphone | 01/02/2022 | 0 |
Store 1 | 4k TV | 01/02/2022 | 1 |
Store 1 | Iphone | 02/02/2022 | 3 |
Store 1 | 4k TV | 02/02/2022 | 4 |
I just want to get rid of the 31 columns and transform them into a single datetime column (while also keeping the quantity sold of each item).
I really don't know how to solve this problem...
For some reason, I can't put images in my question.
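No answer was posted, but the wide-to-long reshape described here is exactly what `pandas.melt` does. A sketch, assuming the table is already loaded as a DataFrame with string column names "1"–"31" (a three-day sample stands in for the full table; note that February has at most 29 days, so day columns beyond that would need dropping before building real dates):

```python
import pandas as pd

# Small stand-in for the "Febuary_Sells" table from the question.
wide = pd.DataFrame({
    "Store": ["Store 1", "Store 1"],
    "Product": ["Iphone", "4k TV"],
    "1": [0, 1], "2": [3, 4], "3": [1, 2],
})

# melt turns each day column into a (Day, Quantity) row pair.
long = wide.melt(id_vars=["Store", "Product"],
                 var_name="Day", value_name="Quantity")

# Build a real datetime from the day number, assuming February 2022.
long["Date"] = pd.to_datetime("2022-02-" + long["Day"].str.zfill(2))
long = long.drop(columns="Day").sort_values(["Date", "Product"])
print(long)
```

If the day columns are integers rather than strings, convert them first with `long["Day"].astype(str)`.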

Transform a pandas dataframe into one with multi-level columns

I have the following pandas dataframe, where the column id is the dataframe index
+----+-----------+------------+-----------+------------+
| | price_A | amount_A | price_B | amount_b |
|----+-----------+------------+-----------+------------|
| 0 | 0.652826 | 0.941421 | 0.823048 | 0.728427 |
| 1 | 0.400078 | 0.600585 | 0.194912 | 0.269842 |
| 2 | 0.223524 | 0.146675 | 0.375459 | 0.177165 |
| 3 | 0.330626 | 0.214981 | 0.389855 | 0.541666 |
| 4 | 0.578132 | 0.30478 | 0.789573 | 0.268851 |
| 5 | 0.0943601 | 0.514878 | 0.419333 | 0.0170096 |
| 6 | 0.279122 | 0.401132 | 0.722363 | 0.337094 |
| 7 | 0.444977 | 0.333254 | 0.643878 | 0.371528 |
| 8 | 0.724673 | 0.0632807 | 0.345225 | 0.935403 |
| 9 | 0.905482 | 0.8465 | 0.585653 | 0.364495 |
+----+-----------+------------+-----------+------------+
And I want to convert this dataframe into a multi-column dataframe that looks like this
+----+-----------+------------+-----------+------------+
| | A | B |
+----+-----------+------------+-----------+------------+
| id | price | amount | price | amount |
|----+-----------+------------+-----------+------------|
| 0 | 0.652826 | 0.941421 | 0.823048 | 0.728427 |
| 1 | 0.400078 | 0.600585 | 0.194912 | 0.269842 |
| 2 | 0.223524 | 0.146675 | 0.375459 | 0.177165 |
| 3 | 0.330626 | 0.214981 | 0.389855 | 0.541666 |
| 4 | 0.578132 | 0.30478 | 0.789573 | 0.268851 |
| 5 | 0.0943601 | 0.514878 | 0.419333 | 0.0170096 |
| 6 | 0.279122 | 0.401132 | 0.722363 | 0.337094 |
| 7 | 0.444977 | 0.333254 | 0.643878 | 0.371528 |
| 8 | 0.724673 | 0.0632807 | 0.345225 | 0.935403 |
| 9 | 0.905482 | 0.8465 | 0.585653 | 0.364495 |
+----+-----------+------------+-----------+------------+
I've tried transforming my old pandas dataframe into a dict this way:
dict = {"A": df[["price_a","amount_a"]], "B":df[["price_b", "amount_b"]]}
df = pd.DataFrame(dict, index=df.index)
But I had no success, how can I do that?
Try renaming columns manually:
df.columns=pd.MultiIndex.from_tuples([x.split('_')[::-1] for x in df.columns])
df.index.name='id'
Output:
A B b
price amount price amount
id
0 0.652826 0.941421 0.823048 0.728427
1 0.400078 0.600585 0.194912 0.269842
2 0.223524 0.146675 0.375459 0.177165
3 0.330626 0.214981 0.389855 0.541666
4 0.578132 0.304780 0.789573 0.268851
5 0.094360 0.514878 0.419333 0.017010
6 0.279122 0.401132 0.722363 0.337094
7 0.444977 0.333254 0.643878 0.371528
8 0.724673 0.063281 0.345225 0.935403
9 0.905482 0.846500 0.585653 0.364495
You can split the column names on the underscore and convert each to a tuple. Once every column name is mapped to a tuple, pandas converts the Index to a MultiIndex for you. From there we just need to call swaplevel so the letter level comes first, and reassign to the dataframe.
Note: in my input dataframe I replaced the column name "amount_b" with "amount_B", because that lined up with your expected output, so I assumed it was a typo.
df.columns = df.columns.str.split("_", expand=True).swaplevel()
print(df)
A B
price amount price amount
0 0.652826 0.941421 0.823048 0.728427
1 0.400078 0.600585 0.194912 0.269842
2 0.223524 0.146675 0.375459 0.177165
3 0.330626 0.214981 0.389855 0.541666
4 0.578132 0.304780 0.789573 0.268851
5 0.094360 0.514878 0.419333 0.017010
6 0.279122 0.401132 0.722363 0.337094
7 0.444977 0.333254 0.643878 0.371528
8 0.724673 0.063281 0.345225 0.935403
9 0.905482 0.846500 0.585653 0.364495

How to create a table by joining two or more tables with this structure?

Lets say I have two tables with the following structure and same values-
+-----------+-----------+---------+-------+--------+---------+--------+---------+
| TEACHER | STUDENT | CLASS | SEC | HB_a | VHB_b | HG_c | VHG_d |
|-----------+-----------+---------+-------+--------+---------+--------+---------|
| 1 | - | - | - | 1 | 1 | 1 | 1 |
| - | 1 | 10 | D | 1 | 1 | 1 | 1 |
| - | 1 | 9 | D | 1 | 1 | 1 | 1 |
+-----------+-----------+---------+-------+--------+---------+--------+---------+
CLASS can go from 6-12 and SEC from A-Z.
*Where there's a value in TEACHER there's nothing in STUDENT, CLASS, and SEC, and vice versa.
Now I want to create a table joining two tables with the exact structure and data given above, i.e. I want the result to be something like below:
+-----------+-----------+---------+-------+--------+---------+--------+---------+
| TEACHER | STUDENT | CLASS | SEC | HB_a | VHB_b | HG_c | VHG_d |
|-----------+-----------+---------+-------+--------+---------+--------+---------|
| 2 | - | - | - | 2 | 2 | 2 | 2 |
| - | 2 | 10 | D | 2 | 2 | 2 | 2 |
| - | 2 | 9 | D | 2 | 2 | 2 | 2 |
+-----------+-----------+---------+-------+--------+---------+--------+---------+
I tried something like this, but it doesn't work well; the output isn't what I want:
__tbl_sy = f"""
CREATE TABLE <tbl>
AS SELECT CLASS, SEC, SUM(TEACHER), SUM(STUDENT), SUM(HB_a), SUM(VHB_b), SUM(HG_c), SUM(VHG_d)
FROM <tbl1>
UNION
SELECT CLASS, SEC, SUM(TEACHER), SUM(STUDENT), SUM(HB_a), SUM(VHB_b), SUM(HG_c), SUM(VHG_d)
FROM <tbl2>
GROUP BY CLASS, SEC
"""
Cursor.execute(__tbl_sy)
For the sample data that you posted this will work:
select
sum(teacher) teacher, sum(student) student,
class, sec,
sum(hb_a) hb_a, sum(vhb_b) vhb_b, sum(hg_c) hg_c, sum(vhg_d) vhg_d
from (
select * from tbl1
union all
select * from tbl2
)
group by class, sec
Results:
| teacher | student | CLASS | SEC | hb_a | vhb_b | hg_c | vhg_d |
| ------- | ------- | ----- | --- | ---- | ----- | ---- | ----- |
| 2 | | | | 2 | 2 | 2 | 2 |
| | 2 | 10 | D | 2 | 2 | 2 | 2 |
| | 2 | 9 | D | 2 | 2 | 2 | 2 |
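Since the question drives the query from a Python cursor, the accepted approach can be tried end to end with the standard-library sqlite3 module. A self-contained sketch (the question's actual database may differ; the "-" cells are represented as NULL, and NULLs group together under GROUP BY):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Two tables with the question's structure and identical sample rows.
for name in ("tbl1", "tbl2"):
    cur.execute(f"CREATE TABLE {name} (teacher INT, student INT, "
                "class INT, sec TEXT, hb_a INT, vhb_b INT, hg_c INT, vhg_d INT)")
    cur.executemany(f"INSERT INTO {name} VALUES (?,?,?,?,?,?,?,?)", [
        (1, None, None, None, 1, 1, 1, 1),
        (None, 1, 10, "D", 1, 1, 1, 1),
        (None, 1, 9, "D", 1, 1, 1, 1),
    ])

# UNION ALL stacks both tables, then one GROUP BY sums matching rows.
rows = cur.execute("""
    SELECT sum(teacher), sum(student), class, sec,
           sum(hb_a), sum(vhb_b), sum(hg_c), sum(vhg_d)
    FROM (SELECT * FROM tbl1 UNION ALL SELECT * FROM tbl2)
    GROUP BY class, sec
    ORDER BY class
""").fetchall()
print(rows)
```

The teacher row (NULL class/sec) and the two student rows each come back doubled, matching the expected result table.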

How do I get the change from the same quarter in the previous year in a pandas dataframe grouped by more than 1 column

I have a dataframe that looks like this (but with more than 1 country and many more years' worth of data):
| Country | Year | Quarter | Amount |
-------------------------------------------
| UK | 2014 | 1 | 200 |
| UK | 2014 | 2 | 250 |
| UK | 2014 | 3 | 200 |
| UK | 2014 | 4 | 150 |
| UK | 2015 | 1 | 230 |
| UK | 2015 | 2 | 200 |
| UK | 2015 | 3 | 200 |
| UK | 2015 | 4 | 160 |
-------------------------------------------
I want to get the change for each row from the same quarter in the previous year. So for the first 4 rows in the example the change would be null (because there is no previous data for that quarter). For 2015 quarter 1, the difference would be 30 (because quarter 1 for the previous year is 200, so 230 - 200 = 30). So the data table I'm trying to get is:
| Country | Year | Quarter | Amount | Change |
---------------------------------------------------|
| UK | 2014 | 1 | 200 | NaN |
| UK | 2014 | 2 | 250 | NaN |
| UK | 2014 | 3 | 200 | NaN |
| UK | 2014 | 4 | 150 | NaN |
| UK | 2015 | 1 | 230 | 30 |
| UK | 2015 | 2 | 200 | -50 |
| UK | 2015 | 3 | 200 | 0 |
| UK | 2015 | 4 | 160 | 10 |
---------------------------------------------------|
From looking at other questions I've tried using the .diff() method but I'm not quite sure how to get it to do what I want (or if I'll actually need to do something more brute force to work this out), e.g. I've tried:
df.groupby(by=["Country", "Year", "Quarter"]).sum().diff().head(10)
This yields the difference from the previous row in the table as a whole though, rather than the difference from the same quarter for the previous year.
Since you want the change per Country and Quarter rather than per Year, you have to remove the year from the group keys.
df['Change'] = df.groupby(['Country', 'Quarter']).Amount.diff()
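A self-contained sketch with the question's sample data; the sort guarantees year order inside each (Country, Quarter) group before `diff` runs:

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["UK"] * 8,
    "Year": [2014] * 4 + [2015] * 4,
    "Quarter": [1, 2, 3, 4, 1, 2, 3, 4],
    "Amount": [200, 250, 200, 150, 230, 200, 200, 160],
})

# Ensure years are in order within each group, then diff across years.
df = df.sort_values(["Country", "Quarter", "Year"])
df["Change"] = df.groupby(["Country", "Quarter"])["Amount"].diff()
df = df.sort_index()  # restore the original row order
print(df)
```

Each 2014 row gets NaN (no prior year to compare against), and the 2015 rows get 30, -50, 0, and 10, matching the expected table.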

Discount for repeated rows

A user creates one plan to purchase some items on N (five) different dates.
+--------------+---------+------------+
| plan_date_id | plan_id | ad_date |
+--------------+---------+------------+
| 1 | 1 | 2015-09-13 |
| 2 | 1 | 2015-09-15 |
| 3 | 1 | 2015-09-17 |
| 4 | 1 | 2015-09-21 |
| 5 | 1 | 2015-09-24 |
+--------------+---------+------------+
Week span: for each product, the week span is calculated from the date on which the product was sold for the first time, plus 6 days.
I.e., for product_id 10 the first purchase was made on 2015-09-13, so
the week span runs from 2015-09-13 to 2015-09-19 (2015-09-13 + 6).
Discount logic for a product: (total number of repetitions within the week span - 1) * 10%.
But the maximum discount can be 30%.
+-----------------+--------------+------------+
| plan_product_id | plan_date_id | product_id |
+-----------------+--------------+------------+
| 1 | 1 | 10 |
| 2 | 2 | 5715 |
| 3 | 2 | 10 |
| 4 | 3 | 10 |
| 5 | 3 | 128900 |
| 6 | 4 | 10 |
| 7 | 5 | 10 |
+-----------------+--------------+------------+
So in my example I want discount as follow.
+-----------------+--------------+------------+------------+
| plan_product_id | plan_date_id | product_id | discount |
+-----------------+--------------+------------+------------+
| 1 | 1 | 10 | 0% |
| 2 | 2 | 5715 | 0% |
| 3 | 2 | 10 | 10% |
| 4 | 3 | 10 | 20% |
| 5 | 3 | 128900 | 0% |
| 6 | 4 | 10 | 0% |
| 7 | 5 | 10 | 0% |
+-----------------+--------------+------------+------------+
Please note there will be a 0% discount for plan_product_id 6 and 7, since their dates fall outside the week span.
Currently, I am doing the discount calculation in Python.
First I get all the required records, then create a dict with product_id as the key; the value is another dict holding the base date and the number of repetitions within the week. Then I loop over all the records.
What would be the best way to do this?
Is it possible to do it purely in MySQL, or with the Django ORM?
Would looping in MySQL be more performance-efficient?
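No answer was posted. One way to avoid the per-row Python loop is to vectorise the rule in pandas: join the two tables, take each product's first date, and give every repetition inside the first seven-day span (repetition count - 1) * 10%, capped at 30%, with rows outside the span getting 0%. A sketch over the question's sample data, assuming a single span per product as the expected output implies (in MySQL 8+ the same logic can be expressed with window functions):

```python
import pandas as pd

dates = pd.DataFrame({
    "plan_date_id": [1, 2, 3, 4, 5],
    "plan_id": 1,
    "ad_date": pd.to_datetime(
        ["2015-09-13", "2015-09-15", "2015-09-17", "2015-09-21", "2015-09-24"]),
})
products = pd.DataFrame({
    "plan_product_id": [1, 2, 3, 4, 5, 6, 7],
    "plan_date_id": [1, 2, 2, 3, 3, 4, 5],
    "product_id": [10, 5715, 10, 10, 128900, 10, 10],
})

df = products.merge(dates, on="plan_date_id").sort_values("plan_product_id")

# Week span: first purchase date of each product plus 6 days.
first = df.groupby("product_id")["ad_date"].transform("min")
in_span = df["ad_date"] <= first + pd.Timedelta(days=6)

# cumcount is 0 for the first purchase, 1 for the first repeat, and so on.
rank = df.groupby("product_id").cumcount()
df["discount"] = (rank * 10).clip(upper=30).where(in_span, 0)
print(df[["plan_product_id", "product_id", "discount"]])
```

For the sample data this reproduces the expected 0, 0, 10, 20, 0, 0, 0 column. Whether this beats a MySQL-side solution depends on data volume; for a few rows per plan the difference is negligible.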
