I've come across a situation at work where we need a report that forecasts how occupied our field service staff will be.
All the data is in SAP; the example below was extracted as an Excel sheet:
We need to calculate how many Man Days (column C) we have per month (based on start date and end date) and per work center (column G), and also by activity type (column J). Monthly capacity is the number of working days in the month multiplied by the number of employees. The idea is to have a bar chart with a line representing the capacity.
It is possible to do this manually, but I'm trying to find a more practical way, because today it is an extremely manual job with copy and paste all over the place. Does anyone have any idea how to do this with Python or even Power BI?
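One possible starting point in pandas is sketched below. It assumes each order's man days are spread evenly over the Monday-to-Friday days between its start and end date, and that public holidays can be ignored. The column names (`work_center`, `activity_type`, `start_date`, `end_date`, `man_days`), the sample rows and the `headcount` mapping are all made up for illustration, since the real SAP layout is not shown.

```python
import pandas as pd

# Hypothetical stand-in for the SAP export; every column name here is an assumption.
orders = pd.DataFrame({
    "work_center": ["WC1", "WC1", "WC2"],
    "activity_type": ["Install", "Repair", "Install"],
    "start_date": pd.to_datetime(["2023-01-30", "2023-02-06", "2023-02-01"]),
    "end_date": pd.to_datetime(["2023-02-03", "2023-02-08", "2023-02-02"]),
    "man_days": [10.0, 3.0, 4.0],
})

# Hypothetical number of employees per work center, used for the capacity line.
headcount = {"WC1": 3, "WC2": 2}

rows = []
for order in orders.itertuples(index=False):
    # Business days (Mon-Fri) covered by the order; public holidays are ignored.
    days = pd.bdate_range(order.start_date, order.end_date)
    if len(days) == 0:
        # Order lies entirely on a weekend: book everything on the start date.
        days = pd.DatetimeIndex([order.start_date])
    per_day = order.man_days / len(days)
    # Count how many of the order's days fall into each calendar month.
    for month, n_days in days.to_period("M").value_counts().items():
        rows.append({
            "month": month,
            "work_center": order.work_center,
            "activity_type": order.activity_type,
            "man_days": per_day * n_days,
        })

# Planned man days per month, work center and activity type.
demand = (
    pd.DataFrame(rows)
    .groupby(["month", "work_center", "activity_type"], as_index=False)["man_days"]
    .sum()
)


def working_days(month):
    """Number of Mon-Fri days in a monthly pd.Period."""
    return len(pd.bdate_range(month.start_time, month.end_time.normalize()))


# Capacity = working days in the month x employees in the work center.
capacity = demand[["month", "work_center"]].drop_duplicates().reset_index(drop=True)
capacity["capacity"] = [
    working_days(m) * headcount[wc]
    for m, wc in zip(capacity["month"], capacity["work_center"])
]
```

From here, `demand` could be pivoted into a stacked bar chart per work center, with `capacity` drawn as the line on top. The same two tables can also be loaded into Power BI if that is the preferred front end.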
Overflow Data/CSV/Pandas peeps hivemind....I need your help!
I've only recently started using Python/Pandas and I found a really good project to possibly work on, that would save me a lot of time.
I do weekly reports and report on the differences in data week by week.
I don't know Pandas 100%, but I don't think this would be that hard to do with code, and I feel like this project would be a great way for me to learn.
Here is an example of the report I have:
Now, I have a list of item names (they appear as part of the concatenated text in the item info column) that I'm supposed to report on:
I'm essentially trying to have code that can compute:
-IF the name (from my list) is found in the item info column AND the Week number(s) is a particular number AND the year(s) is 2022 THEN aggregate the total number of the POS/sales altogether as data A
&
-IF there is viable data for Week 16 as well (compute the same total for that week as data B), then take the difference between these weeks (A minus B) and output it to me as information point C (aka the difference)
-THEN if that difference is positive, divide C by B (aka, give me the percentage of that move)
TL;DR: I want to aggregate the total sales of an item for the week, subtract the corresponding total for the previous week for the same item, and check the difference as well as the percentage change.
I only know so much Pandas right now; would anyone be able to point me in a direction that could help? I feel like this shouldn't be that hard to do, and I'd love to make it a weekend project that saves me a good bit of time at work and teaches me how to automate some work tasks too. :)
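A minimal sketch of the A / B / C logic described above might look like this. The column names (`item_info`, `year`, `week`, `sales`) and the sample rows are assumptions, since the real report layout is not shown, and the percentage is computed for both positive and negative differences rather than only positive ones.

```python
import pandas as pd

# Hypothetical stand-in for the weekly report; column names are assumptions.
report = pd.DataFrame({
    "item_info": ["Widget A 10oz", "Widget A 20oz", "Widget B", "Widget A 10oz", "Widget B"],
    "year": [2022, 2022, 2022, 2022, 2022],
    "week": [16, 16, 16, 17, 17],
    "sales": [100, 50, 80, 180, 60],
})
items = ["Widget A", "Widget B"]  # names to report on


def weekly_change(df, name, year, week, prev_week):
    """Total sales for `name` in `week` (A) and `prev_week` (B), plus C = A - B and C / B."""
    # Plain substring match of the name inside the concatenated item info text.
    mask = df["item_info"].str.contains(name, regex=False) & (df["year"] == year)
    a = df.loc[mask & (df["week"] == week), "sales"].sum()
    b = df.loc[mask & (df["week"] == prev_week), "sales"].sum()
    c = a - b
    # Avoid dividing by zero when there were no sales in the previous week.
    pct = c / b if b != 0 else float("nan")
    return {"item": name, "A": a, "B": b, "C": c, "pct_change": pct}


summary = pd.DataFrame([weekly_change(report, n, 2022, 17, 16) for n in items])
```

If only positive moves should get a percentage, the `pct_change` column can be blanked out afterwards wherever `C` is not greater than zero.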
I am working on a panel dataset that includes daily stock returns of 450 firms for 5 years and daily (momentum-based) ESG scores for the same 5 years. I want to regress stock returns on the daily ESG scores with firm and year fixed effects. I used the linearmodels.panel module in Python and set the index to ("Stock ticker", "Date") before running the regressions with entity and time effects. In the regression result, the number of entities shows 450, which is correct, but the number of time periods shows 1800. How is Python capturing the time effects? Is it based on year or something else? What I want is year fixed effects, where all firms share the same indicator variable for a particular year. Can someone please help me do this the right way?
The image shows the format of the data, where the panel is based on daily returns.
Sounds like your model is capturing daily fixed effects instead of yearly fixed effects. This is happening because you set Date as an index, so you're telling Python that you want one fixed effect per date.
You have to create a new column that only contains the year. That is, convert the date column to datetime format (see pandas.to_datetime) and then:
import pandas as pd
# Parse the dates, then extract the year from Date
df['Date'] = pd.to_datetime(df['Date'])
df['Year'] = df['Date'].dt.year
# Set indices
df = df.set_index(['Ticker', 'Year'])
Then run your model.
I recommend using linearmodels.PanelOLS because that module is specifically made for fitting fixed effects models.
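To see what "one indicator per firm and one per year" means numerically, here is a small pandas/NumPy sketch that applies the two-way within transformation by hand on synthetic data. It is not the linearmodels implementation; the data, the effect sizes and the helper name `demean_two_way` are made up, and the simple double demeaning is only exact for a balanced panel like this one, where every firm has the same dates. PanelOLS also has an `other_effects` argument, which may be a way to pass a `Year` column while keeping the daily `Date` index, but I have not tested that here.

```python
import numpy as np
import pandas as pd

# Synthetic balanced panel: 3 firms, all business days of 2 years (values are made up).
dates = pd.bdate_range("2020-01-01", "2021-12-31")
firms = ["AAA", "BBB", "CCC"]
rng = np.random.default_rng(0)

df = pd.DataFrame([(f, d) for f in firms for d in dates], columns=["Ticker", "Date"])
df["Year"] = df["Date"].dt.year
df["ESG"] = rng.normal(size=len(df))

# Return = 2.0 * ESG + firm effect + year effect, so the true slope is 2.0.
firm_effect = df["Ticker"].map({"AAA": 0.5, "BBB": -0.2, "CCC": 0.1})
year_effect = df["Year"].map({2020: 0.3, 2021: -0.4})
df["Return"] = 2.0 * df["ESG"] + firm_effect + year_effect


def demean_two_way(s, entity, time):
    """Within transformation for a balanced panel: remove entity and time means."""
    return (
        s
        - s.groupby(entity).transform("mean")
        - s.groupby(time).transform("mean")
        + s.mean()
    )


# Demeaning by Ticker and Year (not by Date) is what makes these *year* effects.
y = demean_two_way(df["Return"], df["Ticker"], df["Year"])
x = demean_two_way(df["ESG"], df["Ticker"], df["Year"])

# OLS slope on the demeaned data.
beta = float((x * y).sum() / (x * x).sum())
```

Because the synthetic returns contain no noise, `beta` comes back as the true slope, which shows that the firm and year effects were absorbed. Grouping by `Date` instead of `Year` in `demean_two_way` would reproduce the daily effects the original model was fitting.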
For future reference, post your code and a replicable example so we can help you out more easily.
Which modelling strategy (time frame, features, technique) would you recommend to forecast 3-month sales for total customer base?
At my company, we often analyse the effect of e.g. marketing campaigns that run at the same time for the total customer base. In order to get an idea of the true incremental impact of the campaign, we - among other things - want to use predicted sales as a counterfactual for the campaign, i.e. what sales were expected to be assuming no marketing campaign.
Time frame used to train the model - I'm currently considering 2 options (static time frame and rolling window); let me know what you think.
1. Static: Use the same period last year as the dependent variable to build a specific model for this particular 3 month time frame. Data of 12 months before are used to generate features.
2. Rolling window: Use a rolling window logic of 3 months, dynamically creating dependent time frames and features. Not yet sure what the benefit of that would be. It uses more recent data for the model creation but feels less specific because it uses any 3 month period in a year as dependent. Not sure what the theory says for this particular example. Experiences, thoughts?
Features - currently building features per customer from one year pre-period data, e.g. Sales in individual months, 90,180,365 days prior, max/min/avg per customer, # of months with sales, tenure, etc. Takes quite a lot of time - any libraries/packages you would recommend for this?
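As a rough illustration of the per-customer features listed above, the following pandas sketch builds them from a transaction table relative to a cutoff date. The table layout (`customer_id`, `date`, `sales`), the sample rows and the function name `build_features` are assumptions made for the example, not part of the original setup.

```python
import pandas as pd

# Hypothetical transaction table; column names are assumptions.
tx = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "date": pd.to_datetime(
        ["2021-01-15", "2021-10-01", "2021-12-20", "2021-06-30", "2021-12-01"]
    ),
    "sales": [100.0, 50.0, 25.0, 200.0, 80.0],
})
cutoff = pd.Timestamp("2022-01-01")  # first day of the 3-month target window


def build_features(tx, cutoff, windows=(90, 180, 365)):
    """Per-customer features from the 365 days before `cutoff`."""
    start = cutoff - pd.Timedelta(days=365)
    pre = tx[(tx["date"] >= start) & (tx["date"] < cutoff)]
    grouped = pre.groupby("customer_id")

    # max / min / mean transaction value per customer
    feats = grouped["sales"].agg(["max", "min", "mean"]).add_prefix("sales_")

    # total sales in the last 90 / 180 / 365 days before the cutoff
    for w in windows:
        recent = pre[pre["date"] >= cutoff - pd.Timedelta(days=w)]
        feats[f"sales_{w}d"] = recent.groupby("customer_id")["sales"].sum()

    # number of distinct calendar months with at least one sale
    feats["months_with_sales"] = grouped["date"].agg(
        lambda d: d.dt.to_period("M").nunique()
    )
    # customers without sales in a window get 0 instead of NaN
    return feats.fillna(0.0)


features = build_features(tx, cutoff)
```

Calling `build_features` once with last year's cutoff would correspond to the static option, while calling it for a series of cutoffs and stacking the results would give the training rows for the rolling window option.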
Modelling technique - currently considering GLM, XGBoost, S/ARIMA, and LSTM networks. Any experience here?
To clarify, even though I'm considering e.g. ARIMA, I do not need to predict any seasonal patterns of the 3 month window. As a first stab, a single number, total predicted sales of customer base for these 3 months, would be sufficient.
Any experience or comment would be highly appreciated.
Thanks,
F
I have daily stats churned out by a system that outputs total sales and units sold per region group. For my analysis, I want to break the entries down into regions instead of region groups. I'm looking for a way to split each row into one row per region with the respective measures.
I have historical percentages on the market share per region which I'll use to come up with the estimated sales and units sold.
I can do this manually in Excel, but given that I'll be doing this on a weekly basis, I'm looking for a way to automate it with Python.
My data: https://imgur.com/a/pBr3y4D
Goal: https://imgur.com/a/Uc56PVR
Well, first of all, when you're doing DS research, try to find the approach that fits your particular case best. There's nothing wrong with solving this with plain Excel functionality, scripting, etc.
However, if you really want to use pandas, here is what I would do in your case: .append() the rows (pd.concat() in pandas 2.0 and later, where DataFrame.append() was removed) and then split them by region and group by sales, or write a function with a for loop.
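Another way to get one row per region, sketched here under assumptions, is to merge the stats with a table of historical shares and scale the measures. The column names (`region_group`, `region`, `share`, `sales`, `units`) and all numbers are made up, because the real data is only shown in the linked images.

```python
import pandas as pd

# Hypothetical daily stats per region group; column names are assumptions.
stats = pd.DataFrame({
    "date": ["2021-01-04", "2021-01-04"],
    "region_group": ["North", "South"],
    "sales": [1000.0, 500.0],
    "units": [100.0, 40.0],
})

# Hypothetical historical market share of each region within its group
# (shares add up to 1 per region group).
shares = pd.DataFrame({
    "region_group": ["North", "North", "South", "South"],
    "region": ["N1", "N2", "S1", "S2"],
    "share": [0.6, 0.4, 0.25, 0.75],
})

# The merge repeats each region-group row once per region in that group.
by_region = stats.merge(shares, on="region_group", how="left")

# Scale the group totals by each region's share to get the estimates.
by_region["sales"] = by_region["sales"] * by_region["share"]
by_region["units"] = by_region["units"] * by_region["share"]
by_region = by_region[["date", "region_group", "region", "sales", "units"]]
```

As long as the shares sum to 1 within each region group, the regional estimates add back up to the original group totals.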
I am using QuantLib 1.7 with the Python interface.
I have constructed the JPY Fixed-Float swap curve following the standard convention. For the swap schedules I have a JointCalendar with Japan and UnitedKingdom. My JPYLibor index has the UK calendar only.
When I set the market date to 2009-May-1, I do a bootstrap using PiecewiseFlatForward with settlement date 2009-May-8, because in the Japan calendar there was a long holiday from 2009-May-4 (Monday) to 2009-May-6.
Now, with this bootstrapped curve, I try to value a swap that has a floating payment on 2009-May-7. When I try to value it (or call amount() on the next floatingLeg cashflow, which has a reset date of 2009-May-5) I get the error message "2nd leg: negative time (-0.00277778) given".
I guess that this is related to the fact that 2009-May-5, which is the London fixing date for value date 2009-May-7, falls on a Japanese holiday?
My swap payment schedules and reset schedule match Bloomberg, so I am confident that this is, in theory, the correct convention. I have read some old posts about an apparently similar issue for a US swap, but as far as I understood, that was a bug which was fixed around the time of QuantLib 0.9.
Could my problem be related to the same bug, or am I not using QuantLib correctly?
The problem is that the value date for the payment, May 7th, is between today's date and the reference date of the curve. The fixing needs to be forecast, since it's in the future (the fixing date is on May 5th); but because the curve effectively starts on May 8th, it can't return the May 7th discount which is required to forecast the fixing.
The reason why this doesn't usually happen is that, when the value date is between today and the reference date, the fixing date is usually before today's date and thus the fixing can be loaded from past ones.
In this particular case, the way to make it work would be to create a curve with no settlement days so that its reference date is the same as today's date. If you then wanted the price as-of May 8th, you'd have to manually adjust the swap NPV for the discount between May 1st and 8th.
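The manual adjustment is plain arithmetic: divide the NPV computed as of today by the discount factor between today and the settlement date. The sketch below uses made-up numbers and a flat, continuously compounded rate as a stand-in; with an actual QuantLib curve whose reference date is May 1st, the discount factor would instead come from the curve's discount() method for May 8th.

```python
import math

# Illustrative numbers only; none of these come from the question.
npv_today = 12_500.0       # swap NPV as of May 1st, from a curve starting today
zero_rate = 0.005          # assumed continuously compounded rate, May 1st to May 8th
year_fraction = 7 / 365    # seven calendar days, Actual/365 assumed

# Discount factor from May 1st to May 8th implied by the assumed rate.
discount_to_settlement = math.exp(-zero_rate * year_fraction)

# Value as of May 8th: undo the discounting over the first week.
npv_at_settlement = npv_today / discount_to_settlement
```

For positive rates the discount factor is below 1, so the May 8th value is slightly larger in magnitude than the May 1st value.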