How can I group by month from a date field - Python

I have a DataFrame similar to this one:
| date | Murders | State |
|-----------|--------- |------- |
| 6/2/2017 | 100 | Ags |
| 5/23/2017 | 200 | Ags |
| 5/20/2017 | 300 | BC |
| 6/22/2017 | 400 | BC |
| 6/21/2017 | 500 | Ags |
I would like to group the above data by month and state to get an output like:
| date | Murders(SUM) | State |
|------|--------------|-------|
| May  | 200          | Ags   |
| June | 600          | Ags   |
| May  | 300          | BC    |
| June | 400          | BC    |
I tried with this:
dg = DF.groupby(pd.Grouper(key='date', freq='1M')).sum()  # group by 1-month periods
dg.index = dg.index.strftime('%B')
But these lines only sum the murders by month, without taking the State into account.

We can do
df.groupby([pd.to_datetime(df.date).dt.strftime('%B'),df.State]).Murders.sum().reset_index()
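As a quick check, here is that one-liner run against the sample data above (a minimal sketch; column names as in the question):

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["6/2/2017", "5/23/2017", "5/20/2017", "6/22/2017", "6/21/2017"],
    "Murders": [100, 200, 300, 400, 500],
    "State": ["Ags", "Ags", "BC", "BC", "Ags"],
})

# Group on the month name derived from the date, plus the State.
out = df.groupby([pd.to_datetime(df.date).dt.strftime('%B'), df.State]).Murders.sum().reset_index()
print(out)
#    date State  Murders
# 0  June   Ags      600
# 1  June    BC      400
# 2   May   Ags      200
# 3   May    BC      300
```

One caveat: month names sort alphabetically, so if chronological order matters, group on pd.to_datetime(df.date).dt.to_period('M') instead and format the month names afterwards.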

Reading data to python dataframe

I am struggling with reading data into a Python DataFrame. I'm an R programmer trying to do things in Python. How would I read the following data into a pandas DataFrame? The data is the result of an API call.
Thanks.
b'{"mk_id":"1200011617609","doc_type":"sales_order","opr_code":"0","count_code":"1051885/2022","doc_date":"2022-08-23+02:00","partner":{"mk_id":"400020633177","business_entity":"false","taxpayer":"false","foreign_county":"true","customer":"Emilia Chabadova","street":"Ga\xc5\xa1tanov\xc3\xa1 2915/13","street_number":"2915/13","post_number":"92101","place":"Pie\xc5\xa1\xc5\xa5any","country":"Slovakia","count_code":"5770789334526546744","partner_contact":{"gsm":"+421949340254","email":"emily.chabadova#gmail.com"},"mk_address_id":"400020530565","country_iso_2":"SK","buyer":"true","supplier":"false"},"receiver":{"mk_id":"400020633177","business_entity":"false","taxpayer":"false","foreign_county":"true","customer":"Emilia Chabadova","street":"Ga\xc5\xa1tanov\xc3\xa1 2915/13","street_number":"2915/13","post_number":"92101","place":"Pie\xc5\xa1\xc5\xa5any","country":"Slovakia","count_code":"5770789334526546744","partner_contact":{"gsm":"+421949340254","email":"emily.chabadova#gmail.com"},"mk_address_id":"400020530565","country_iso_2":"SK","buyer":"true","supplier":"false"},"currency_code":"EUR","status_code":"Zaklju\xc4\x8dena","doc_created_email":"stifter.rok#gmail.com","buyer_order":"SK-956103","warehouse":"glavno","delivery_type":"Gls_sk","product_list":[{"count_code":"54","mk_id":"266405022384","code":"MSS","name":"Mousse","unit":"kos","amount":"1","price":"16.66","price_with_tax":"19.99","tax":"200"},{"count_code":"53","mk_id":"266405022383","code":"MIT","name":"Mitt","unit":"kos","amount":"1","price":"0","tax":"200"},{"count_code":"48","mk_id":"266404892511","code":"TM","name":"Tanning mist","name_desc":"TM","unit":"kos","amount":"1","price":"0","tax":"200"}],"extra_column":[{"name":"tracking_number","value":"91114278162"}],"sum_basic":"16.66","sum_tax_200":"3.33","sum_all":"19.99","sum_paid":"19.99","profit_center":"SHINE BROWN, PROIZVODNJA, TRGOVINA IN STORITVE, D.O.O.","bank_ref_number":"10518852022","method_of_payment":"Pla\xc4\x8dilo po povzetju","order_create_ts":"2022-08-23T09:43:00+02:00","created_ts":"2022-08-23T11:59:14+02:00","shipped_date":"2022-08-24+02:00","doc_link_list":[{"mk_id":"266412181173","count_code":"SK-MK-36044","doc_type":"sales_bill_foreign"},{"mk_id":"400015161112","count_code":"1043748/2022","doc_type":"warehouse_packing_list"},{"mk_id":"1200011617609","count_code":"1051885/2022","doc_type":"sales_order"}]}'
You can start by parsing the bytes (the payload is valid JSON, so json.loads accepts it directly) and wrapping the resulting dict in a list:
import json
result = json.loads(data)  # data is the bytes object shown in the question
pd.DataFrame([result])
Here is a way using BytesIO and pd.json_normalize:
from ast import literal_eval
from io import BytesIO
import pandas as pd
data = b'{"mk_id":"1200011617609","doc_type":"sales_order","opr_code":"0","count_code":"1051885/2022","doc_date":"2022-08-23+02:00","partner":{"mk_id":"400020633177","business_entity":"false","taxpayer":"false","foreign_county":"true","customer":"Emilia Chabadova","street":"Ga\xc5\xa1tanov\xc3\xa1 2915/13","street_number":"2915/13","post_number":"92101","place":"Pie\xc5\xa1\xc5\xa5any","country":"Slovakia","count_code":"5770789334526546744","partner_contact":{"gsm":"+421949340254","email":"emily.chabadova#gmail.com"},"mk_address_id":"400020530565","country_iso_2":"SK","buyer":"true","supplier":"false"},"receiver":{"mk_id":"400020633177","business_entity":"false","taxpayer":"false","foreign_county":"true","customer":"Emilia Chabadova","street":"Ga\xc5\xa1tanov\xc3\xa1 2915/13","street_number":"2915/13","post_number":"92101","place":"Pie\xc5\xa1\xc5\xa5any","country":"Slovakia","count_code":"5770789334526546744","partner_contact":{"gsm":"+421949340254","email":"emily.chabadova#gmail.com"},"mk_address_id":"400020530565","country_iso_2":"SK","buyer":"true","supplier":"false"},"currency_code":"EUR","status_code":"Zaklju\xc4\x8dena","doc_created_email":"stifter.rok#gmail.com","buyer_order":"SK-956103","warehouse":"glavno","delivery_type":"Gls_sk","product_list":[{"count_code":"54","mk_id":"266405022384","code":"MSS","name":"Mousse","unit":"kos","amount":"1","price":"16.66","price_with_tax":"19.99","tax":"200"},{"count_code":"53","mk_id":"266405022383","code":"MIT","name":"Mitt","unit":"kos","amount":"1","price":"0","tax":"200"},{"count_code":"48","mk_id":"266404892511","code":"TM","name":"Tanning mist","name_desc":"TM","unit":"kos","amount":"1","price":"0","tax":"200"}],"extra_column":[{"name":"tracking_number","value":"91114278162"}],"sum_basic":"16.66","sum_tax_200":"3.33","sum_all":"19.99","sum_paid":"19.99","profit_center":"SHINE BROWN, PROIZVODNJA, TRGOVINA IN STORITVE, D.O.O.","bank_ref_number":"10518852022","method_of_payment":"Pla\xc4\x8dilo po povzetju","order_create_ts":"2022-08-23T09:43:00+02:00","created_ts":"2022-08-23T11:59:14+02:00","shipped_date":"2022-08-24+02:00","doc_link_list":[{"mk_id":"266412181173","count_code":"SK-MK-36044","doc_type":"sales_bill_foreign"},{"mk_id":"400015161112","count_code":"1043748/2022","doc_type":"warehouse_packing_list"},{"mk_id":"1200011617609","count_code":"1051885/2022","doc_type":"sales_order"}]}'
df = pd.DataFrame(BytesIO(data))
df[0] = df[0].str.decode("utf-8").apply(literal_eval)
df = pd.json_normalize(
    data=df.pop(0),
    record_path="product_list",
    meta=["mk_id", "doc_type", "opr_code", "count_code", "doc_date", "currency_code",
          "status_code", "doc_created_email", "buyer_order", "warehouse", "delivery_type"],
    meta_prefix="meta."
)
print(df.to_markdown())
| | count_code | mk_id | code | name | unit | amount | price | price_with_tax | tax | name_desc | meta.mk_id | meta.doc_type | meta.opr_code | meta.count_code | meta.doc_date | meta.currency_code | meta.status_code | meta.doc_created_email | meta.buyer_order | meta.warehouse | meta.delivery_type |
|---:|-------------:|-------------:|:-------|:-------------|:-------|---------:|--------:|-----------------:|------:|:------------|--------------:|:----------------|----------------:|:------------------|:-----------------|:---------------------|:-------------------|:-------------------------|:-------------------|:-----------------|:---------------------|
| 0 | 54 | 266405022384 | MSS | Mousse | kos | 1 | 16.66 | 19.99 | 200 | nan | 1200011617609 | sales_order | 0 | 1051885/2022 | 2022-08-23+02:00 | EUR | Zaključena | stifter.rok#gmail.com | SK-956103 | glavno | Gls_sk |
| 1 | 53 | 266405022383 | MIT | Mitt | kos | 1 | 0 | nan | 200 | nan | 1200011617609 | sales_order | 0 | 1051885/2022 | 2022-08-23+02:00 | EUR | Zaključena | stifter.rok#gmail.com | SK-956103 | glavno | Gls_sk |
| 2 | 48 | 266404892511 | TM | Tanning mist | kos | 1 | 0 | nan | 200 | TM | 1200011617609 | sales_order | 0 | 1051885/2022 | 2022-08-23+02:00 | EUR | Zaključena | stifter.rok#gmail.com | SK-956103 | glavno | Gls_sk |

How to bring column values from one DataFrame into another as a new column, after matching on columns both DataFrames share?

I'm trying to create a new column in a DataFrame and populate it with values stored in a different DataFrame, by first matching the values of columns that both DataFrames share. For example:
df1 >>>
| name | team | week | dates | interceptions | pass_yds | rating |
| ---- | ---- | -----| ---------- | ------------- | --------- | -------- |
| maho | KC | 1 | 2020-09-10 | 0 | 300 | 105 |
| went | PHI | 1 | 2020-09-13 | 2 | 225 | 74 |
| lock | DEN | 1 | 2020-09-14 | 0 | 150 | 89 |
| dris | DEN | 2 | 2020-09-20 | 1 | 220 | 95 |
| went | PHI | 2 | 2020-09-20 | 2 | 250 | 64 |
| maho | KC | 2 | 2020-09-21 | 1 | 245 | 101 |
df2 >>>
| name | team | week | catches | rec_yds | rec_tds |
| ---- | ---- | -----| ------- | ------- | ------- |
| ertz | PHI | 1 | 5 | 58 | 1 |
| fant | DEN | 2 | 6 | 79 | 0 |
| kelc | KC | 2 | 8 | 105 | 1 |
| fant | DEN | 1 | 3 | 29 | 0 |
| kelc | KC | 1 | 6 | 71 | 1 |
| ertz | PHI | 2 | 7 | 91 | 2 |
| goed | PHI | 2 | 2 | 15 | 0 |
I want to create a dates column in df2 with the values of the dates stored in the dates column in df1 after matching the teams and the weeks columns. After the matching, df2 in this example should look something like this:
df2 >>>
| name | team | week | catches | rec_yds | rec_tds | dates |
| ---- | ---- | -----| ------- | ------- | ------- | ---------- |
| ertz | PHI | 1 | 5 | 58 | 1 | 2020-09-13 |
| fant | DEN | 2 | 6 | 79 | 0 | 2020-09-20 |
| kelc | KC | 2 | 8 | 105 | 1 | 2020-09-20 |
| fant | DEN | 1 | 3 | 29 | 0 | 2020-09-14 |
| kelc | KC | 1 | 6 | 71 | 1 | 2020-09-10 |
| ertz | PHI | 2 | 7 | 91 | 2 | 2020-09-20 |
| goed | PHI | 2 | 2 | 15 | 0 | 2020-09-20 |
I'm looking for an efficient solution. I've already tried nested for loops comparing the week and team columns of both DataFrames, but that hasn't worked, and at this point I'm out of ideas. Please help!
Disclaimer: the actual DataFrames I'm working with are much larger: many more rows and columns, and many more distinct teams, dates, and weeks.
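
A left merge on the shared keys does this in one step; here is a minimal sketch (assuming each (team, week) pair maps to a single date in df1, as in the sample):

```python
import pandas as pd

# df1 and df2 as shown above.
# Keep one (team, week, dates) row per pairing, then merge it onto df2.
lookup = df1[['team', 'week', 'dates']].drop_duplicates(subset=['team', 'week'])
df2 = df2.merge(lookup, on=['team', 'week'], how='left')
```

Merging is vectorized, so it scales far better to large frames than nested loops. If a team can have two different dates in the same week, you'd need to decide which date wins before building the lookup.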

Manipulate pandas columns with datetime

Please see this SO post: Manipulating pandas columns. There I shared this DataFrame:
+----------+------------+-------+-----+------+
| Location | Date | Event | Key | Time |
+----------+------------+-------+-----+------+
| i2 | 2019-03-02 | 1 | a | |
| i2 | 2019-03-02 | 1 | a | |
| i2 | 2019-03-02 | 1 | a | |
| i2 | 2019-03-04 | 1 | a | 2 |
| i2 | 2019-03-15 | 2 | b | 0 |
| i9 | 2019-02-22 | 2 | c | 0 |
| i9 | 2019-03-10 | 3 | d | |
| i9 | 2019-03-10 | 3 | d | 0 |
| s8 | 2019-04-22 | 1 | e | |
| s8 | 2019-04-25 | 1 | e | |
| s8 | 2019-04-28 | 1 | e | 6 |
| t14 | 2019-05-13 | 3 | f | |
+----------+------------+-------+-----+------+
This is a follow-up question. Consider two more columns after Date as shown below.
+-----------------------+----------------------+
| Start Time (hh:mm:ss) | Stop Time (hh:mm:ss) |
+-----------------------+----------------------+
| 13:24:38 | 14:17:39 |
| 03:48:36 | 04:17:20 |
| 04:55:05 | 05:23:48 |
| 08:44:34 | 09:13:15 |
| 19:21:05 | 20:18:57 |
| 21:05:06 | 22:01:50 |
| 14:24:43 | 14:59:37 |
| 07:57:32 | 09:46:21 |
| 19:21:05 | 20:18:57 |
| 21:05:06 | 22:01:50 |
| 14:24:43 | 14:59:37 |
| 07:57:32 | 09:46:21 |
+-----------------------+----------------------+
The task remains the same: get the time difference, but now in hours, between the Stop Time of the first row and the Start Time of the last row for each Key.
Based on the answer, I was trying something like this:
df['Time'] = df.groupby(['Location','Event']).Date.\
    transform(lambda x: (x.iloc[-1] - x.iloc[0]))[~df.duplicated(['Location','Event'], keep='last')]
df['Time_h'] = df.groupby(['Location','Event'])['Start Time (hh:mm:ss)','Stop Time (hh:mm:ss)'].\
    transform(lambda x, y: (x.iloc[-1] - y.iloc[0]))[~df.duplicated(['Location','Event'], keep='last')]  # this raises an error on transform
to get the difference in days and hours separately and then combine them. Is there a better way?
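
One way (a sketch, not from the original thread) is to combine Date with each time column into full timestamps first, so a single subtraction per group gives the span directly, which then converts to hours:

```python
import pandas as pd

# Assumes df holds the columns shown above, with Date as a date-like string.
df['start_dt'] = pd.to_datetime(df['Date'].astype(str) + ' ' + df['Start Time (hh:mm:ss)'])
df['stop_dt'] = pd.to_datetime(df['Date'].astype(str) + ' ' + df['Stop Time (hh:mm:ss)'])

def span_hours(g):
    # Start Time of the last row minus Stop Time of the first row, in hours
    return (g['start_dt'].iloc[-1] - g['stop_dt'].iloc[0]).total_seconds() / 3600

hours = df.groupby(['Location', 'Event']).apply(span_hours).rename('Time_h').reset_index()
df = df.merge(hours, on=['Location', 'Event'], how='left')
```

To put the value only on each group's last row (as in your snippet), mask with ~df.duplicated(['Location', 'Event'], keep='last') afterwards.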

SQLAlchemy: how to divide 2 columns from different tables

I have 2 tables named company_info and company_income:
company_info :
| id | company_name | staff_num | year |
|----|--------------|-----------|------|
| 0 | A | 10 | 2010 |
| 1 | A | 10 | 2011 |
| 2 | A | 20 | 2012 |
| 3 | B | 20 | 2010 |
| 4 | B | 5 | 2011 |
company_income :
| id | company_name | income | year |
|----|--------------|--------|------|
| 0 | A | 10 | 2010 |
| 1 | A | 20 | 2011 |
| 2 | A | 30 | 2012 |
| 3 | B | 20 | 2010 |
| 4 | B | 15 | 2011 |
Now I want to calculate the average income per staff member for each company, so the result looks like this:
result :
| id | company_name | avg_income | year |
|----|--------------|------------|------|
| 0 | A | 1 | 2010 |
| 1 | A | 2 | 2011 |
| 2 | A | 1.5 | 2012 |
| 3 | B | 1 | 2010 |
| 4 | B | 3 | 2011 |
How can I get this result using Python SQLAlchemy? The tables live in a MySQL database.
Join the tables and do the division. You'd want to either set yourself up a view in MySQL with this query or run it straight from your program.
SELECT
    a.company_name,
    a.year,
    (b.income / a.staff_num) AS avg_income
FROM company_info AS a
LEFT JOIN company_income AS b
    ON a.company_name = b.company_name
    AND a.year = b.year
You'd want a few WHERE clauses as well (for example, where staff_num is not null and not zero, and the same for income). Also, if the same company/year can appear multiple times in either table, you'll want to SUM the values and GROUP BY company_name and year.
Try this:
SELECT
    info.company_name,
    (inc.income / info.staff_num) AS avg,
    info.year
FROM company_info AS info
JOIN company_income AS inc
    ON info.company_name = inc.company_name
    AND info.year = inc.year
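
Since the question asks for SQLAlchemy specifically, here is a sketch of the same join in SQLAlchemy Core (1.4+ style); the connection URL is a placeholder and the table definitions are reflected from the database:

```python
from sqlalchemy import create_engine, MetaData, Table, select, and_

engine = create_engine("mysql+pymysql://user:password@localhost/mydb")  # placeholder URL
metadata = MetaData()

# Reflect the existing tables rather than redeclaring them.
info = Table("company_info", metadata, autoload_with=engine)
income = Table("company_income", metadata, autoload_with=engine)

stmt = (
    select(
        info.c.company_name,
        info.c.year,
        (income.c.income / info.c.staff_num).label("avg_income"),
    )
    .select_from(
        info.join(
            income,
            and_(
                info.c.company_name == income.c.company_name,
                info.c.year == income.c.year,
            ),
        )
    )
)

with engine.connect() as conn:
    for row in conn.execute(stmt):
        print(row.company_name, row.year, row.avg_income)
```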

Interpolate in SQL based on subgroup in django models

I have the following sheetinfo model with the following data:
| Trav | Group | Subgroup | Sheet_num | T_val |
|-----------|--------|----------|-----------|-------|
| SAT123A01 | SAT123 | A | 1 | 400 |
| SAT123A02 | SAT123 | A | 2 | 0 |
| SAT123A03 | SAT123 | A | 3 | 0 |
| SAT123A04 | SAT123 | A | 4 | 0 |
| SAT123A05 | SAT123 | A | 5 | 500 |
| SAT123B05 | SAT123 | B | 5 | 400 |
| SAT123B04 | SAT123 | B | 4 | 0 |
| SAT123B03 | SAT123 | B | 3 | 0 |
| SAT123B02 | SAT123 | B | 2 | 500 |
| SAT124A01 | SAT124 | A | 1 | 400 |
| SAT124A02 | SAT124 | A | 2 | 0 |
| SAT124A03 | SAT124 | A | 3 | 0 |
| SAT124A04 | SAT124 | A | 4 | 475 |
I would like to interpolate and update the table with the correct T_val.
The formula is:
new_t_val = delta / (cnt - 1) * (sheet_num - 1) + min_tvc_of_subgroup
where delta is the difference between the subgroup's max and min T_val and cnt is the number of sheets in the subgroup.
For instance, the top 5 rows become:
| Trav | Group | Subgroup | Sheet_num | T_val |
|-----------|--------|----------|-----------|-------|
| SAT123A01 | SAT123 | A | 1 | 400 |
| SAT123A02 | SAT123 | A | 2 | 425 |
| SAT123A03 | SAT123 | A | 3 | 450 |
| SAT123A04 | SAT123 | A | 4 | 475 |
| SAT123A05 | SAT123 | A | 5 | 500 |
I have a Django query that works to update the data, but it is SLOW and stops after a while (due to type errors, etc.).
My question: is there a way to accomplish this in SQL?
The ability to do this in one database call doesn't exist in stock Django. Third-party packages exist, though: https://github.com/aykut/django-bulk-update
Example of how that package works:
rows = Model.objects.all()
for row in rows:
    # modify each row as appropriate
    row.T_val = delta / (cnt - 1) * (row.sheet_num - 1) + min_tvc_of_subgroup
Model.objects.bulk_update(rows)
For datasets up to the 1,000,000-row range, this should perform reasonably. Most of the cost of iterating through rows and .save()-ing each object is the per-call database overhead; the Python part is reasonably fast. The example above issues only two database calls, so it should be one to two orders of magnitude faster.
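
If you are on Django 2.2 or later, stock QuerySet.bulk_update covers the same ground without the third-party package. A minimal sketch, assuming the model is named Sheetinfo with fields group, subgroup, sheet_num, and t_val (all names are assumptions), implementing the question's formula per subgroup:

```python
from collections import defaultdict

# Bucket rows by (group, subgroup), derive each subgroup's endpoints from
# the nonzero T_val rows, then interpolate and write everything back at once.
rows = list(Sheetinfo.objects.all())
groups = defaultdict(list)
for r in rows:
    groups[(r.group, r.subgroup)].append(r)

for members in groups.values():
    endpoints = [r.t_val for r in members if r.t_val]  # nonzero rows anchor the line
    if len(members) < 2 or not endpoints:
        continue
    lo, hi, cnt = min(endpoints), max(endpoints), len(members)
    for r in members:
        # the question's formula, with the (sheet_num - 1) offset as above
        r.t_val = (hi - lo) / (cnt - 1) * (r.sheet_num - 1) + lo

Sheetinfo.objects.bulk_update(rows, ['t_val'])  # one UPDATE round trip (Django 2.2+)
```

This keeps the interpolation in Python but reduces the database traffic to a single SELECT and a single batched UPDATE, which is usually the bottleneck.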
