Change column value based on list pandas python [closed] - python

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 1 year ago.
Improve this question
I want to flag, in the 2018 order data, the customers who will churn in 2019, so that I can run analyses such as where those customers come from and whether their order size has been decreasing compared to customers who will not churn.
The 2018 order data is a pandas DataFrame called 'order_data', and I have a list of the customers who will churn in 2019 called 'churn_customers_2019'. order_data has a column called Customer_id, and the list contains the Customer_id values of the clients that will churn.
However, my logic is not working.
order_data['churn in 2019?'] = str('N')
for x in order_data['Customer_id']:
    if x in churn_customers_2019:
        order_data['churn in 2019?'][x] = 'Y'
If I run this code, every row ends up as N and none as Y, even though about 10% of the customers churn.

I would suggest using np.where and isin for your problem, like this:
import numpy as np
order_data['churn in 2019?'] = np.where(order_data['Customer_id'].isin(churn_customers_2019), 'Y', 'N')
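
A likely reason the loop in the question has no visible effect is that `order_data['churn in 2019?'][x]` uses the customer id `x` as an index label rather than a row position, and it assigns through chained indexing, which may write to a temporary copy. A minimal sketch of the mask-based approach follows; the sample rows and the `order_size` and `churn_loc` columns are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real order_data.
order_data = pd.DataFrame({
    "Customer_id": ["c1", "c2", "c3", "c1", "c4"],
    "order_size": [10, 5, 7, 3, 8],
})
churn_customers_2019 = ["c1", "c4"]

# isin builds a boolean mask, one entry per row of the DataFrame.
is_churner = order_data["Customer_id"].isin(churn_customers_2019)

# np.where maps True to 'Y' and False to 'N'.
order_data["churn in 2019?"] = np.where(is_churner, "Y", "N")

# Equivalent without NumPy: default to 'N', then assign through .loc with the mask.
order_data["churn_loc"] = "N"
order_data.loc[is_churner, "churn_loc"] = "Y"
```

Both columns end up identical; the `.loc` form is handy when only some rows need to be overwritten.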

Related

How can I best split this entry into searchable words? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 months ago.
This is a picture of my dataset. It comes from Kaggle; here is the link: https://www.kaggle.com/code/gleblevankov/exploring-spotify-data
The type of the column is object, and I want to split this column into a list of words that are also searchable.
I want to split the column at every "," to get the individual words. I am looking for a function that turns the entry in each row into a list of searchable words. For example, if I plot the column to see which genre is used most, I do not want to see a combined genre like "rap,pop,kpop" but "rap", "pop" and "k-pop" separately.
I tried to change the type to list, but that aggregates the whole column into a single list.
Is there another way to transform this column?
Try splitting each entry on "," and counting the resulting words:
import pandas as pd
pd.Series([x.strip() for item in df.Genere for x in item.split(',')]).value_counts()
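
To keep the words searchable per row as well as countable, one option is `str.split` followed by `explode`. This is only a sketch: the `genre` column name and the sample rows are assumptions, since the real column name is not visible in the question.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle data; the real column name may differ.
df = pd.DataFrame({"genre": ["rap,pop,kpop", "pop", "rock, pop", "rap"]})

# Split each string on "," so every row holds a list of genres.
df["genre_list"] = df["genre"].str.split(",")

# explode yields one row per genre; strip removes stray spaces around words.
genres = df["genre_list"].explode().str.strip()

# Count how often each single genre occurs (ready for a bar plot).
genre_counts = genres.value_counts()

# Searching: mark the rows whose list contains a given genre.
has_pop = df["genre_list"].apply(lambda words: "pop" in [w.strip() for w in words])
```

`genre_counts.plot(kind="bar")` would then show one bar per individual genre instead of one per combined string.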

How to find a rate at which data is growing with pandas? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
I have a pandas DataFrame where there is a column which contains the new number of covid cases per day.
After plotting it, I get this graph:
Now I want to find out the rate at which the cases are growing. How can I do this?
The rate at which cases grow will be: (cases in current day - cases in previous day) / cases in previous day
There are several ways to do this. The easiest is to use df.pct_change(), which computes this ratio between each row and the previous one.
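
As a small illustration of the formula above, the sketch below computes the rate both with `pct_change` and by hand. The `new_cases` column name and the numbers are invented for the example.

```python
import pandas as pd

# Hypothetical daily new-case counts; the real column name is not given.
df = pd.DataFrame({"new_cases": [100, 110, 121, 121]})

# pct_change: (current - previous) / previous; the first row has no previous day.
df["growth_rate"] = df["new_cases"].pct_change()

# The same formula written out explicitly with diff and shift.
manual = df["new_cases"].diff() / df["new_cases"].shift(1)
```

The first entry is NaN because there is no previous day, and a day with the same count as the day before gets a rate of 0.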

I am having a problem with coding this in python [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I have been trying to solve this problem for a long time. When I use one dictionary for the days and one for the students, I can't store the birthdays, since they are dictionaries, or maybe I am just lost. How can I write efficient code for this simple problem?
In this simulation question, we will simulate the birthday problem for the case where at least
3 students have the same birthday in a class with 200 students.
For the simulation you can consider this scenario: Assume a bag filled with
numbers from 1 to 365. Then let 200 students pick a ball and then put it back. The
number the students pick is their birthday. Convert this process into a ’function’
in your program, where the return value is an array of size 200 filled with the
birthdays.
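You could use Python's random module to draw 200 numbers in the range 1-365. Note that random.sample (https://docs.python.org/3/library/random.html#random.sample) draws without replacement, so it would never produce two equal birthdays. Because each student puts the ball back, random.choices, which draws with replacement, fits the task better.
Something like:
import random
birthdays = random.choices(range(1, 366), k=200)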
Does it have to be a dictionary? This may be simpler with a list. You can append every birthday drawn in the simulation to the list, so that the list indices correspond to the students, i.e. student 1 is index 0 and that student's birthday is list[index].
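
A minimal sketch of the whole simulation with plain lists, assuming birthdays are drawn with replacement (the ball is put back) via `random.choices`. The function names and the default of 1000 trials are illustrative choices, not part of the assignment.

```python
import random
from collections import Counter


def draw_birthdays(n_students=200, n_days=365):
    """Return one birthday (1..n_days) per student, drawn with replacement."""
    return random.choices(range(1, n_days + 1), k=n_students)


def has_shared_birthday(birthdays, group_size=3):
    """True if at least group_size students share the same birthday."""
    return max(Counter(birthdays).values()) >= group_size


def estimate_probability(trials=1000, n_students=200, group_size=3):
    """Fraction of simulated classes containing such a group."""
    hits = sum(
        has_shared_birthday(draw_birthdays(n_students), group_size)
        for _ in range(trials)
    )
    return hits / trials
```

Calling `estimate_probability()` repeats the experiment and returns the share of classes in which at least three students have the same birthday; more trials give a steadier estimate.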

handle undefined data in neural networks [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I'm doing a neural network project in which one feature is the time elapsed between a user's last activity and certain specific times. For example, imagine there is a list of times (March 15th, April 1st, April 24th, etc.) and, for each of them, we want the interval between that time and the user's last activity before it. To be clearer, imagine user1 has actions on March 10th, March 13th and March 24th; the value for that user with respect to March 15th would be 2 days (since March 13th). Now what if the user has no actions before March 15th?
Because of how the algorithm works, I am joining some temporary tables, which results in lots of NaN values. How do I tell the network that these cells should not be considered?
edit1
the code to fill the cell is this:
for aciton_time in all_aciton_times:
    interval_tmp = actions_df.loc[(actions_df['when'] < aciton_time)].drop_duplicates(subset="device_id", keep='last')
    interval_tmp['aciton_' + str(aciton_time)] = interval_tmp['when'].apply(lambda x: aciton_time - x)
    del interval_tmp['when']
    interval = interval.merge(interval_tmp, on="device_id", how="outer")
    previous_aciton_time = aciton_time
and the result is something like this:
thanks
If you have a large dataset, you could drop any rows that have NaN values, as long as enough rows remain afterwards.
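
A rough sketch of what dropping those rows could look like with `dropna`. The table, the column names and the second option (a 0/1 "missing" flag plus a placeholder value) are assumptions for illustration and are not part of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical merged table: days since last activity before two reference dates.
interval = pd.DataFrame({
    "device_id": ["a", "b", "c"],
    "days_before_mar15": [2.0, np.nan, 5.0],
    "days_before_apr01": [8.0, 3.0, np.nan],
})
feature_cols = ["days_before_mar15", "days_before_apr01"]

# Option 1: drop every row that has a NaN in any feature column.
complete_rows = interval.dropna(subset=feature_cols)

# Option 2 (an assumption): keep all rows, record missingness in an extra
# 0/1 column, and replace NaN with a placeholder value.
filled = interval.copy()
for col in feature_cols:
    filled[col + "_missing"] = filled[col].isna().astype(int)
    filled[col] = filled[col].fillna(0.0)
```

Option 1 loses users with any gap, which only works if plenty of rows remain. Option 2 keeps them and lets the model see which values were absent, though whether that helps depends on the data.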

how to plot graph based on attendance [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I have a csv file that contains the attendance of a few students on particular dates.
Here is my csv file
Name,RollNumber,Attendance,Date,Day,Time
student1,1,Present,1/30/2019,Wednesday,12:34:05
student2,2,Present,1/30/2019,Wednesday,12:34:05
student3,3,Present,1/30/2019,Wednesday,12:34:05
student4,4,Present,1/30/2019,Wednesday,12:34:05
student1,1,Absent,1/31/2019,Thursday,23:34:05
student2,2,Present,1/31/2019,Thursday,23:34:05
student3,3,Present,1/31/2019,Thursday,23:34:05
student4,4,Present,1/31/2019,Thursday,12:34:05
student1,1,Present,2/1/2019,Friday,12:34:05
student2,2,Absent,2/1/2019,Friday,12:34:05
student3,3,Absent,2/1/2019,Friday,12:34:05
student4,4,Present,2/1/2019,Friday,12:34:05
student1,1,Absent,2/2/2019,Saturday,12:34:05
student2,2,Absent,2/2/2019,Saturday,12:34:05
student3,3,Absent,2/2/2019,Saturday,12:34:05
student4,4,Absent,2/2/2019,Saturday,12:34:05
I want to plot a graph that shows the number of students present and absent on each date from the csv file. How do I do this with matplotlib?
The easiest way, in my opinion, is to use pandas pivot_table as follows:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('your_csv_filepath_here')
# Create a duplicate of the target column so there is something to count
df['attendance'] = df.Attendance
# Pivot the dataframe: one row per date, one column per attendance status
df_pivot = df.pivot_table(index=['Date'], columns='Attendance', values='attendance', aggfunc='count')
# Plot it using pandas (a bar plot is probably what you want)
df_pivot.plot(kind='bar')
plt.show()
Of course, further plot customizations are possible, and other methods would achieve the same result.
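
One such alternative, sketched under the assumption that a plain count per date and status is all that is needed, is `pd.crosstab`, which avoids the duplicated column. The inline CSV is a shortened copy of the layout in the question, and `plot_attendance` is a hypothetical helper name.

```python
import io
import pandas as pd

# A few rows in the same layout as the CSV in the question (shortened).
csv_text = """Name,RollNumber,Attendance,Date,Day,Time
student1,1,Present,1/30/2019,Wednesday,12:34:05
student2,2,Present,1/30/2019,Wednesday,12:34:05
student1,1,Absent,1/31/2019,Thursday,23:34:05
student2,2,Present,1/31/2019,Thursday,23:34:05
student1,1,Absent,2/2/2019,Saturday,12:34:05
student2,2,Absent,2/2/2019,Saturday,12:34:05
"""
df = pd.read_csv(io.StringIO(csv_text))

# Parse the dates so the rows sort chronologically rather than as text.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Rows: dates, columns: Absent/Present, values: number of students.
counts = pd.crosstab(df["Date"], df["Attendance"])


def plot_attendance(counts):
    """Draw grouped bars per date; needs matplotlib installed."""
    import matplotlib.pyplot as plt

    ax = counts.plot(kind="bar")
    ax.set_xlabel("Date")
    ax.set_ylabel("Number of students")
    plt.tight_layout()
    plt.show()
```

Calling `plot_attendance(counts)` draws one pair of bars per date. Combinations that never occur, such as nobody absent on a given day, show up as 0 in `counts` instead of NaN.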
