Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
Pandas: I need to find repeated problems for the same customer. Note: a problem is considered repeated only if it occurred within 30 days with the same code.
Let's group by customer ID and problem code, compute the consecutive differences between dates within each group, convert the time delta to days, and check whether the absolute value is less than or equal to 30.
df['Date'] = pd.to_datetime(df['Date'])  # coerce to datetime
df[abs(df.groupby(['CT_ID','Problem_code'])['Date'].diff().dt.days).le(30)]
CT_ID Problem_code Date
3 XO1 code_1 2021-01-03 11:35:00
5 XO3 code_4 2020-09-20 09:35:00
8 XO3 code_4 2020-10-10 11:35:00
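The steps above can be sketched end-to-end on made-up sample data (the CT_ID, Problem_code and Date values below are invented for illustration):

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "CT_ID": ["XO1", "XO1", "XO3", "XO3", "XO2"],
    "Problem_code": ["code_1", "code_1", "code_4", "code_4", "code_2"],
    "Date": ["2021-01-01 10:00:00", "2021-01-03 11:35:00",
             "2020-09-20 09:35:00", "2020-10-10 11:35:00",
             "2020-05-01 08:00:00"],
})

df["Date"] = pd.to_datetime(df["Date"])  # coerce to datetime

# Within each (customer, problem code) group, take the gap in days
# between consecutive occurrences and keep rows whose gap is <= 30.
repeated = df[abs(df.groupby(["CT_ID", "Problem_code"])["Date"]
                    .diff().dt.days).le(30)]
print(repeated)
```

The first occurrence in each group has no predecessor, so its diff is NaT and it is excluded; only the later rows of a repeated pair survive the filter.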
I would like to know how to find the values in a list that sum to a specific target. For example, I have a list of payments, and one payment is made up of several different values from the list that, added together, equal the total value of that payment.
I just want to know how to find the values of such a sum in a way that I can program it.
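This is the subset-sum problem. A minimal brute-force sketch using itertools.combinations (the function name, payment values, and tolerance are all made up for illustration; runtime grows exponentially with the list length, so this only suits short lists):

```python
from itertools import combinations

def find_subset_with_sum(values, target, tol=0.01):
    """Return the first combination of values whose sum matches target
    (within a small tolerance, useful for currency amounts)."""
    for r in range(1, len(values) + 1):
        for combo in combinations(values, r):
            if abs(sum(combo) - target) <= tol:
                return list(combo)
    return None  # no subset adds up to the target

payments = [125.50, 200.00, 74.50, 310.00]
print(find_subset_with_sum(payments, 400.00))  # -> [125.5, 200.0, 74.5]
```

For large lists, a dynamic-programming approach would be needed instead.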
I have a pandas DataFrame with a column containing the number of new COVID cases per day.
After plotting it, I get a graph (image not shown).
Now I want to find out at what rate the cases are growing. How can I do this?
The rate at which cases grow will be: (cases in current day - cases in previous day) / cases in previous day
There are several ways to do this. The easiest is to use df.pct_change().
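A minimal sketch with made-up numbers (the column name new_cases is an assumption; the question does not give it):

```python
import pandas as pd

# Hypothetical daily new-case counts.
df = pd.DataFrame({"new_cases": [100, 110, 121, 121]})

# Day-over-day growth rate: (current - previous) / previous
df["growth_rate"] = df["new_cases"].pct_change()
print(df)
```

The first row has no previous day, so its growth rate is NaN; a constant day yields 0.0.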
I am looking to flag, in the order data of 2018, the customers that are going to churn in 2019, so that I can do some analyses such as where those customers come from and whether their order size has been decreasing compared to customers that will not churn.
The 2018 order data is a pandas df called 'order_data', and I have a list of customers that will churn in 2019 called 'churn_customers_2019'. In order_data there is a column called Customer_id; the list contains the Customer_id values of the clients that will churn.
However, my logic is not working correctly.
order_data['churn in 2019?'] = str('N')
for x in order_data['Customer_id']:
    if x in churn_customers_2019:
        order_data['churn in 2019?'][x] = 'Y'
If I run this code, everything is set to 'N' instead of some rows also being 'Y'. Only about 10% of the customers churn.
I would suggest using np.where and isin for your problem, like so:
order_data['churn in 2019?'] = np.where(order_data['Customer_id'].isin(churn_customers_2019), 'Y', 'N')
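Putting it together on hypothetical data (the Customer_id values and the churn list below are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the question's structure.
order_data = pd.DataFrame({"Customer_id": ["A", "B", "C", "D"]})
churn_customers_2019 = ["B", "D"]

# Vectorised flag: 'Y' where the customer id appears in the churn list.
order_data["churn in 2019?"] = np.where(
    order_data["Customer_id"].isin(churn_customers_2019), "Y", "N")
print(order_data)
```

Unlike the loop, this avoids chained indexing (which silently writes to a copy) and runs in a single vectorised pass.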
I'm doing a neural network project in which one feature is the time elapsed from a user's last activity until certain specific times. For example, imagine there is a list of times (March 15th, April 1st, April 24th, etc.) and we want to find, for each specific time, the interval between the user's last activity before that time and the time itself. To be more clear, imagine user1 has actions on March 10th, March 13th and March 24th; the value for him/her relative to March 15th would be 2 days (March 13th). Now what if the user has no actions before March 15th?
Now, due to some algorithms, I am joining some temp tables, which results in lots of NaN values. How can I tell the network that these cells should not be considered?
edit1
the code to fill the cells is this:
for action_time in all_action_times:
    interval_tmp = actions_df.loc[actions_df['when'] < action_time].drop_duplicates(subset="device_id", keep='last')
    interval_tmp['action_' + str(action_time)] = interval_tmp['when'].apply(lambda x: action_time - x)
    del interval_tmp['when']
    interval = interval.merge(interval_tmp, on="device_id", how="outer")
    previous_action_time = action_time
and the result is something like this (output image not shown):
thanks
If you have a large dataset, you could drop any rows that have NaN values.
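A sketch of dropping such rows with DataFrame.dropna, on a made-up frame shaped like the outer-join result above (the device_id values and action_* columns are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical merged frame with gaps, as an outer join would produce.
interval = pd.DataFrame({
    "device_id": [1, 2, 3],
    "action_2021-03-15": [2.0, np.nan, 5.0],
    "action_2021-04-01": [np.nan, 1.0, 3.0],
})

# Drop any row that still contains a NaN after the merges.
cleaned = interval.dropna()
print(cleaned)
```

This discards every device with at least one missing interval; if that loses too much data, imputing a sentinel or adding a "missing" indicator column are common alternatives.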
If the integer is 2013, then the maximum digit would be 3.
How would I proceed to do so?
max([int(c) for c in str(2013)])
First you convert the number to a string; this lets you look at each digit one by one. Then you convert it into a list of single digits and take the maximum.
Another solution is
largest = -1  # renamed from `max` to avoid shadowing the built-in
for c in str(2013):
    i = int(c)
    if i > largest:
        largest = i
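For completeness, the same maximum digit can be found arithmetically, without converting to a string, using integer division and modulo (a sketch; the function name is made up):

```python
def max_digit(n):
    """Largest decimal digit of a non-negative integer."""
    largest = 0
    while n > 0:
        largest = max(largest, n % 10)  # inspect the last digit
        n //= 10                        # drop the last digit
    return largest

print(max_digit(2013))  # -> 3
```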