Trying to create an ML model - python

https://docs.google.com/spreadsheets/d/1_UkbdbxHKKfS7ibnR7HVS2lYE_Cr0WMk5zKRbTBddJY/edit?usp=sharing
Above is the data table.
I am trying to find the product that each cust_id (customer) purchased the most.
The same customer ID appears in multiple rows, and I expect
output like this:
CustId | Product_ID
1 | Food
2 | Electronic
I also need to plot this.
For any given data, the ML model should give me output like this.
I tried the groupby function and many other pandas functions, but as I'm new to machine learning I have been unable to solve this. Can anyone help me?
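A minimal sketch of the groupby approach, assuming the sheet has one row per purchase with columns named cust_id and product_id (both names and the sample rows are made up here). It counts purchases per customer and product, then keeps the most frequent product for each customer; no ML model is needed for this step.

```python
import pandas as pd

# Hypothetical sample data; the real sheet's column names may differ.
purchases = pd.DataFrame({
    "cust_id": [1, 1, 1, 2, 2, 2, 2],
    "product_id": ["Food", "Food", "Electronic",
                   "Electronic", "Electronic", "Food", "Electronic"],
})

# Count how many times each customer bought each product.
counts = (
    purchases.groupby(["cust_id", "product_id"])
    .size()
    .reset_index(name="n_purchases")
)

# Keep the row with the highest count for every customer.
# On a tie, idxmax keeps the first row it encounters.
top_rows = counts.loc[counts.groupby("cust_id")["n_purchases"].idxmax()]
top_product = top_rows.set_index("cust_id")["product_id"]
print(top_product)
```

For the plot, something like top_rows.plot.bar(x="cust_id", y="n_purchases") should work if matplotlib is installed.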

Related

Why my data visualization model takes index values instead of "Year" column values

I'm trying to visualize my data. My table has 3 columns and I'm trying to visualize them, but when I pass the "Year" column, my model doesn't use the Year column; it uses the table's index values instead. Can you please help me? I'm new to working with data.
This is the table that I'm working on
This is the problem that I mentioned
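One possible cause, sketched under assumptions: pandas plots against the index unless told otherwise, so the x-axis shows 0, 1, 2 instead of the years. The column names "Sales" and "Profit" and the values below are invented for illustration; only "Year" comes from the question.

```python
import pandas as pd

# Hypothetical three-column table; only "Year" is named in the question.
df = pd.DataFrame({
    "Year": [2015, 2016, 2017],
    "Sales": [10, 14, 9],
    "Profit": [2, 3, 1],
})

# By default the x-axis comes from the index (0, 1, 2), not from "Year".
# Making "Year" the index gives plotting calls the real year values.
by_year = df.set_index("Year")
print(by_year.index.tolist())
```

With matplotlib installed, by_year.plot() or df.plot(x="Year", y="Sales") would then put the years on the x-axis.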

Using multivariable LSTM to predict only certain values

Okay, so, the question might be a bit tricky.
For a project I'm working on, I'm supposed to predict a store's sales values for certain products. Easy enough: I've built two functional models that, by analyzing the sales of a single product over the past 10 years, are capable of predicting its future sales.
However, here's where it gets complicated:
My dataframe looks something like this:
df={month : [...], id : [...], n_sales : [...], group : [...], brand : [...]}
Id refers to the product, whereas group refers to the type of product and brand is just the brand.
It's important to understand that, of course, a single id has only one group and one brand, whereas a group or a brand can have multiple different ids.
Finally, my data is organized by month (ascending) and by id (also ascending).
Meaning that, let's say the store has 50 products (50 id's).
Then the first 50 rows of my dataset would be:
----Date----|--Id--|--n_sales--|......
2012-01-01 | 1 ......
2012-01-01 | 2 ......
2012-01-01 | 3 ......
......
2012-01-01 | 50 .....
Then the next 50 rows would be the respective sales of each product for the month 2012-02-01 and so on until now.
I'm sorry if this is confusing, I'm trying to explain it as clearly as I can.
Okay, I'm almost done. It's understandable that, if I isolate a single product, it would be easy to analyze the data.
I could just plot the sales from the known months alongside the sales from the prediction.
However, in order to make a more accurate prediction, I was asked to run a multivariable LSTM model, meaning that I have to take into account both group and brand. This, of course, means training my model with all the data from all the products. This is better understood with an example:
Let's say a new ice cream from Nestle was created just last November. Analyzing only the sales of that ice cream could not predict that sales will go up in summer, since the only data the model would have is the few sales made in the cold months.
Nonetheless, if I analyze all the products, the LSTM would know that products from Nestle sell considerably more in summer and would take this into account when making the prediction for this new product.
And there's the problem. So now, getting to the question: how can I analyze all the data from all the products but only get the predictions for a single id?
Note: It has to be with LSTM, other models aren't an option.
And to anyone making it this far, even if you are not able to help, thank you for reading such a mess!
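One way to approach this, sketched under assumptions: build sliding windows for every product, with group and brand one-hot encoded as extra features, train on all windows, and at prediction time feed the model only the windows belonging to the id of interest. The helper make_windows, the window length, and the toy data are all hypothetical; the LSTM itself is left out and only indicated in comments.

```python
import numpy as np
import pandas as pd

def make_windows(df, feature_cols, n_steps):
    """Return (X, y, ids): one sliding window per product and start month."""
    X, y, ids = [], [], []
    for pid, grp in df.sort_values("month").groupby("id"):
        feats = grp[feature_cols].to_numpy(dtype=float)
        target = grp["n_sales"].to_numpy(dtype=float)
        for start in range(len(grp) - n_steps):
            X.append(feats[start:start + n_steps])  # n_steps months of features
            y.append(target[start + n_steps])       # sales of the following month
            ids.append(pid)
    return np.array(X), np.array(y), np.array(ids)

# Tiny made-up dataset: 2 products, 6 months each.
months = pd.date_range("2012-01-01", periods=6, freq="MS")
df = pd.DataFrame({
    "month": list(months) * 2,
    "id": np.repeat([1, 2], 6),
    "n_sales": np.arange(12),
    "group": np.repeat(["ice_cream", "soup"], 6),
    "brand": np.repeat(["Nestle", "Other"], 6),
})

# One-hot encode the categorical columns so they can be fed to the network.
encoded = pd.get_dummies(df, columns=["group", "brand"], dtype=float)
feature_cols = [c for c in encoded.columns if c not in ("month", "id")]

X, y, ids = make_windows(encoded, feature_cols, n_steps=3)

# Train on every product (hypothetical Keras-style call, not run here):
#   model.fit(X, y)
# Predict for a single product by selecting only its windows:
X_one = X[ids == 2]
#   model.predict(X_one)
print(X.shape, X_one.shape)
```

The model still learns from every product, but the boolean mask on ids restricts the output to one of them.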

Creating 30 Plotly charts based on multi-index Pandas series

I've got a DataFrame with PLAYER_NAME, the corresponding cluster they're assigned to, their team's net rating, and their team ids. This is what it looks like:
I'd like to have a bunch of bar charts for each team that look like the following:
These would be matched with the team's net rating and id. I've tried using groupby like this to get a multi-index Pandas series with a team_id and a cluster number, whose values are the number of times that cluster appears for a given team. It looks like this:
Unfortunately this removes the net rating, so I'm really not sure how to get all that info. I'm using plotly right now but if there's a solution with matplotlib that's fine.
This is the groupby code:
temp_df = pos_clusters.groupby(['TEAM_ID'])['Cluster'].value_counts().sort_index()
Thanks so much!
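A possible sketch for keeping the net rating next to the cluster counts: compute the counts and the per-team net rating separately, then look both up by TEAM_ID when building each chart. The column name NET_RATING and the sample rows are assumptions; TEAM_ID, Cluster, and pos_clusters come from the question.

```python
import pandas as pd

# Made-up rows; NET_RATING is an assumed column name.
pos_clusters = pd.DataFrame({
    "PLAYER_NAME": ["A", "B", "C", "D", "E"],
    "TEAM_ID": [10, 10, 10, 20, 20],
    "Cluster": [0, 0, 2, 1, 1],
    "NET_RATING": [3.5, 3.5, 3.5, -1.2, -1.2],
})

# Cluster counts per team: one row per team, one column per cluster.
cluster_counts = pd.crosstab(pos_clusters["TEAM_ID"], pos_clusters["Cluster"])

# Net rating is constant within a team, so take the first value per team.
net_rating = pos_clusters.groupby("TEAM_ID")["NET_RATING"].first()

for team_id, row in cluster_counts.iterrows():
    title = f"Team {team_id} (net rating {net_rating[team_id]})"
    print(title, row.to_dict())
    # A plotting call such as row.plot.bar(title=title) could go here.
```

Unlike value_counts, crosstab also fills in 0 for clusters a team does not have, so every chart gets the same set of bars.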

Forecasting, (finding the right model)

Using Python, I am trying to predict the future sales count of a product, using historical sales data. I am also trying to predict these counts for various groups of products.
For example, my columns look like this:
Date Sales_count Department Item Color
8/1/2018, 50, Homegoods, Hats, Red_hat
If I want to build a model that predicts the sales_count for each Department/Item/Color combo using historical data (time), what is the best model to use?
If I do Linear regression on time against sales, how do I account for various categories? Can I group them?
Would I instead use multilinear regression, treating the various categories as independent variables?
The best way I have come across for forecasting in Python is the SARIMAX (Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors) model in the statsmodels library. Here is the link for a very good tutorial on SARIMAX using Python
Also, if you are able to group the data frame according to your Department/Item/Color combo, you can put the groups in a loop and apply the same model to each.
Maybe you can create a key for each unique combination and forecast the sales for each key.
For example,
import pandas as pd

df = pd.read_csv('your_file.csv')
# build one key per Department/Item/Color combination
df['key'] = df['Department'] + '_' + df['Item'] + '_' + df['Color']
for key in df['key'].unique():
    temp = df.loc[df['key'] == key]  # filter only the specific group
    # aggregate the sum of sales per date; ignore if not required
    temp = temp.groupby('Date')['Sales_count'].sum().reset_index()
    # write the forecasting code here from the tutorial

Transform a list of customers and products into a matrix in python

I recently started reading about product recommendation. Basically I would like to build a small recommendation engine using python. My problem is that I have three lists:
List of all customers
List of all products
List of all orders
The list of orders is actually a 2D array containing customer's ID and product's ID.
What I would like to do is create a Boolean matrix with customer IDs as rows and product IDs as columns, and then fill it in: 1 if the customer has bought the product and 0 if the customer hasn't.
Let me know if I wasn't clear. Help will be much appreciated. Thanks a lot.
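A minimal sketch of one way to build such a matrix, assuming the orders are (customer ID, product ID) pairs. The sample IDs and the names row_of and col_of are made up for illustration.

```python
import numpy as np

# Hypothetical inputs matching the three lists described above.
customers = ["c1", "c2", "c3"]
products = ["p1", "p2"]
orders = [("c1", "p1"), ("c1", "p2"), ("c3", "p2"), ("c1", "p1")]

# Map each ID to its row or column position in the matrix.
row_of = {c: i for i, c in enumerate(customers)}
col_of = {p: j for j, p in enumerate(products)}

# Start with all zeros, then mark every (customer, product) pair that was ordered.
matrix = np.zeros((len(customers), len(products)), dtype=int)
for cust, prod in orders:
    matrix[row_of[cust], col_of[prod]] = 1

print(matrix)
```

Repeated orders of the same product simply set the same cell to 1 again, so the result stays Boolean.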
