Function to match orders to batch not working - python

#df =
   order_number  Product1  ...  Product16  Product17
0    4329374937         1  ...          0          0
1    3483872349         1  ...          0          0
2    2394287383         1  ...          0          0
3    3423984902         1  ...          1          0
4    9378374873         0  ...          0          0
Batch1 = ["Product1", "Product2", "Product 6"]
for indices in df.index:
    for column in columns:
        if df[column] > 0 and in Batch1 df[B1] = True
        else df[B1] = False
print(df.head))
I am trying to look through each order and check whether the products ordered (the columns with values greater than 0) are within my listed batch, and to store the result as a boolean in a new column for each row. I am getting a syntax error.

From what I understand, you want to take the columns in your batch, sum them per row, and check where the sum is > 0. So:
batch_1 = ["Product1", "Product2", "Product6"]
df['B1'] = df[batch_1].sum(axis=1)>0
df
output:
    order_number    Product1    Product2    Product6    Product16    Product17    B1
--  --------------  ----------  ----------  ----------  -----------  -----------  -----
 0      4329374937           1           1           3            0            0  True
 1      3483872349           1           0           1            0            0  True
 2      2394287383           1           0           2            0            0  True
 3      3423984902           1           0           1            1            0  True
 4      9378374873           0           0           0            0            0  False
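If you only care whether any product in the batch was ordered at all, rather than the summed quantity, an equivalent check is .any, and it extends naturally to several batches. A sketch on the same data, where batch 'B2' and its product list are hypothetical:
batch_1 = ["Product1", "Product2", "Product6"]
df['B1'] = df[batch_1].gt(0).any(axis=1)  # True if any batch product was ordered

# hypothetical second batch, to show the pattern scales
batches = {'B1': batch_1, 'B2': ["Product16", "Product17"]}
for name, cols in batches.items():
    df[name] = df[cols].gt(0).any(axis=1)
For non-negative quantity columns the sum and any versions are equivalent; they would only differ if quantities could be negative.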

Related

Appending 2 dataframes with duplicates without removing the duplicates

I'm trying to append predictions to my original data, which is:
product_id         date        views  wishlists  cartadds  orders  order_units  gmv  score
mp000000000001321  01-09-2022      0          0         0       0            0    0      0
mp000000000001321  02-09-2022      0          0         0       0            0    0      0
mp000000000001321  03-09-2022      0          0         0       0            0    0      0
mp000000000001321  04-09-2022      0          0         0       0            0    0      0
I have sequence lengths of [1, 3], and for each sequence length I have a prediction. I want to add those predictions to my original data so that my output looks like this:
product_id         date        views  wishlists  cartadds  orders  order_units  gmv  score  prediction  sequence_length
mp000000000001321  01-09-2022      0          0         0       0            0    0      0        5.75                1
mp000000000001321  01-09-2022      0          0         0       0            0    0      0        5.88                3
mp000000000001321  02-09-2022      0          0         0       0            0    0      0        5.88                3
mp000000000001321  03-09-2022      0          0         0       0            0    0      0        5.88                3
I have tried the following:
df1 = df_batch.head(sequence_length)
dfff = pd.DataFrame.from_dict(predictions_dict, orient='index')
dfff.index.names = ['product_id']
merged_df = df1.merge(dfff, on='product_id')
merged_df.to_csv('data_prediction'+str(sequence_length)+'.csv', index_label='product_id')
but this only saves the data for the last product_id that was sent, and it saves each sequence length to a different csv. I want everything to be in 1 csv instead. How do I do that?
Edit: sample predictions_dict:
{'mp000000000001321': {'sequence_length': 1, 'prediction': 5.75}}
{'mp000000000001321': {'sequence_length': 3, 'prediction': 5.88}}
So, I found a fix:
# new_df is initialised once as an empty DataFrame before this runs,
# inside the loop over product ids and sequence lengths
df1 = df_batch[df_batch['product_id'] == product_id].iloc[:sequence_length]
dfff = pd.DataFrame.from_dict(predictions_dict, orient='index')
dfff.index.names = ['product_id']
merged_df = df1.merge(dfff, on='product_id')
new_df = pd.concat([new_df, merged_df], ignore_index=True)
This way I'm able to get the desired output for unique product ids.
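For completeness, a minimal sketch of the whole loop writing a single csv; prediction_runs here is a hypothetical iterable of (product_id, sequence_length, predictions_dict) triples, standing in for however the predictions are produced:
import pandas as pd

new_df = pd.DataFrame()  # initialise once, before the loop

for product_id, sequence_length, predictions_dict in prediction_runs:
    df1 = df_batch[df_batch['product_id'] == product_id].iloc[:sequence_length]
    dfff = pd.DataFrame.from_dict(predictions_dict, orient='index')
    dfff.index.names = ['product_id']
    new_df = pd.concat([new_df, df1.merge(dfff, on='product_id')], ignore_index=True)

new_df.to_csv('data_prediction.csv', index=False)  # one file for everything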

Calculate count of a numeric column into new columns Pandas DataFrame

I have a pandas DataFrame like this:
   Movie  Rate
0   5821     4
1   2124     2
2   7582     1
3   3029     5
4  17479     1
Both the movie and the rating can be repeated. I need to transform this DataFrame into something like this:
   Movie  Rate_1_Count  Rate_2_Count  ...  Rate_5_Count
0   5821            20             1  ...             5
1   2124             2             0  ...            99
2   7582            50            22  ...            22
...
in which the movie ids are unique and Rate_{Number}_Count is the count of the ratings for that movie equal to {Number}.
I have already accomplished this task using the code below, which I believe is very messy. I guess there must be a neater way to do it. Can anyone help me with it?
self.movie_df_tmp = self.rating_df[['MovieId', 'Rate']]
self.movie_df_tmp['RaCount'] = self.movie_df_tmp.groupby(['MovieId'])['Rate'].transform('count')
self.movie_df_tmp['Sum'] = self.movie_df_tmp.groupby(['MovieId'])['Rate'].transform('sum')
self.movie_df_tmp['NORC'] = self.movie_df_tmp.groupby(['MovieId', 'Rate'])['Rate'].transform('count')
self.movie_df_tmp = self.movie_df_tmp.drop_duplicates()
self.movie_df_tmp['Rate1C'] = self.movie_df_tmp[self.movie_df_tmp['Rate'] == 1]['NORC']
self.movie_df_tmp['Rate2C'] = self.movie_df_tmp[self.movie_df_tmp['Rate'] == 2]['NORC']
self.movie_df_tmp['Rate3C'] = self.movie_df_tmp[self.movie_df_tmp['Rate'] == 3]['NORC']
self.movie_df_tmp['Rate4C'] = self.movie_df_tmp[self.movie_df_tmp['Rate'] == 4]['NORC']
self.movie_df_tmp['Rate5C'] = self.movie_df_tmp[self.movie_df_tmp['Rate'] == 5]['NORC']
self.movie_df_tmp = self.movie_df_tmp.replace(np.nan, 0)
self.movie_df = self.movie_df_tmp[['MovieId', 'RaCount', 'Sum']].drop_duplicates()
self.movie_df_tmp = self.movie_df_tmp.drop(columns=['Rate', 'NORC', 'Sum', 'RaCount'])
self.movie_df_tmp = self.movie_df_tmp.groupby(['MovieId'])[['Rate1C', 'Rate2C', 'Rate3C', 'Rate4C', 'Rate5C']].apply(
    lambda x: x.astype(int).sum())
self.movie_df = self.movie_df.merge(self.movie_df_tmp, left_on='MovieId', right_on='MovieId')
self.movie_df = pd.DataFrame(self.movie_df.values,
                             columns=['MovieId', 'Rate1C', 'Rate2C', 'Rate3C', 'Rate4C', 'Rate5C'])
Try with pd.crosstab:
pd.crosstab(df['Movie'], df['Rate'])
Rate   1  2  4  5
Movie
2124   0  1  0  0
3029   0  0  0  1
5821   0  0  1  0
7582   1  0  0  0
17479  1  0  0  0
Fix the axis and column names with rename + reset_index + rename_axis:
new_df = (
pd.crosstab(df['Movie'], df['Rate'])
.rename(columns=lambda c: f'Rate_{c}_Count')
.reset_index()
.rename_axis(columns=None)
)
   Movie  Rate_1_Count  Rate_2_Count  Rate_4_Count  Rate_5_Count
0   2124             0             1             0             0
1   3029             0             0             0             1
2   5821             0             0             1             0
3   7582             1             0             0             0
4  17479             1             0             0             0
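Note the sample has no rating of 3, so no Rate_3_Count column appears. If you need all five columns regardless of what occurs in the data, one way (a sketch) is to reindex the crosstab before renaming:
new_df = (
    pd.crosstab(df['Movie'], df['Rate'])
    .reindex(columns=[1, 2, 3, 4, 5], fill_value=0)  # force every rate column to exist
    .rename(columns=lambda c: f'Rate_{c}_Count')
    .reset_index()
    .rename_axis(columns=None)
)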
This should give you the desired output:
grouper = df.groupby(['Movie', 'Rate']).size()
dg = pd.DataFrame()
dg['Movie'] = df['Movie'].unique()
for i in [1, 2, 3, 4, 5]:
    dg['Rate_' + str(i) + '_Count'] = dg['Movie'].apply(
        lambda x: grouper[x, i] if (x, i) in grouper.index else 0)
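Another spelling of the same idea, sketched with groupby plus value_counts, which some may find more readable than crosstab:
out = (df.groupby('Movie')['Rate']
         .value_counts()
         .unstack(fill_value=0)
         .reindex(columns=range(1, 6), fill_value=0)  # guarantee rates 1-5
         .add_prefix('Rate_')
         .add_suffix('_Count')
         .rename_axis(columns=None)
         .reset_index())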

creating conditions on np.where in Pandas based on value in current column

I have a dataframe in Pandas (subset below).
DATE        IN  200D_MA  TEST
10/30/2013   0        1     0
10/31/2013   0        1     0
11/1/2013    1        1     1    IN = 1 & 200D_MA = 1, so TEST = 1
11/4/2013    0        1     1    previous TEST row = 1 & 200D_MA = 1, so TEST = 1
11/5/2013    0        1     1    previous TEST row = 1 & 200D_MA = 1, so TEST = 1
11/6/2013    0        1     1
11/7/2013    0        1     1
11/8/2013    0        1     1
11/11/2013   0        0     0    previous TEST row = 1 & 200D_MA = 0, so TEST = 0
This is easy to do in Excel, so I thought it would be easy to do in Python. I have this code using nested np.where formulas:
df3['TEST'] = np.where((df3['IN'] == 1) & (df3['200D_MA'] == 1), 1,
                       np.where((df3['TEST'].shift(-1) == 1) & (df3['200D_MA'] == 1), 1, 0))
but it throws a KeyError: 'IN', presumably because I am using a condition from a column that has not been created yet. Can anyone help me figure out how to do this?
Seems like you need a conditional ffill:
df['TEST'] = df.loc[df.IN == 1, 'IN']  # seed TEST with 1 wherever IN == 1, NaN elsewhere
# carry those 1s forward, but only through rows where 200D_MA == 1
df.loc[df['200D_MA'] == 1, 'TEST'] = df.loc[df['200D_MA'] == 1, 'TEST'].ffill()
df.fillna(0, inplace=True)             # rows that never qualified become 0
df.TEST = df.TEST.astype(int)
df
Out[349]:
         DATE  IN  200D_MA  TEST
0  10/30/2013   0        1     0
1  10/31/2013   0        1     0
2   11/1/2013   1        1     1
3   11/4/2013   0        1     1
4   11/5/2013   0        1     1
5   11/6/2013   0        1     1
6   11/7/2013   0        1     1
7   11/8/2013   0        1     1
8  11/11/2013   0        0     0
I think you can use rolling to approximate the previous TEST row:
df['TEST'] = ((df['IN'] | df['200D_MA'].rolling(2).min().shift(1).fillna(0).astype(int))
              & df['200D_MA']).astype(int)
Output:
DATE        IN  200D_MA  TEST
10/30/2013   0        1     0
10/31/2013   0        1     0
11/1/2013    1        1     1
11/4/2013    0        1     1
11/5/2013    0        1     1
11/6/2013    0        1     1
11/7/2013    0        1     1
11/8/2013    0        1     1
11/11/2013   0        0     0
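Since TEST genuinely depends on its own previous value, the relationship is recursive, and if the vectorised tricks above feel opaque, a plain loop that states the recurrence TEST[i] = 200D_MA[i] and (IN[i] or TEST[i-1]) directly is a reasonable fallback. A sketch, assuming integer 0/1 columns as in the question:
import numpy as np

in_arr = df['IN'].to_numpy()
ma_arr = df['200D_MA'].to_numpy()
test = np.zeros(len(df), dtype=int)

prev = 0
for i in range(len(df)):
    # TEST is 1 when 200D_MA is 1 and either IN is 1 or the previous TEST was 1
    test[i] = int(ma_arr[i] == 1 and (in_arr[i] == 1 or prev == 1))
    prev = test[i]

df['TEST'] = test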

Convert Dictionary to Pandas in Python

I have a dict as follows:
data_dict = {'1.160.139.117': ['712907', '742068'],
             '1.161.135.205': ['667386', '742068'],
             '1.162.51.21': ['326136', '663056', '742068']}
I want to convert the dict into a dataframe:
df= pd.DataFrame.from_dict(data_dict, orient='index')
How can I create a dataframe whose columns represent the values of the dictionary and whose rows represent its keys, as below?
The best option is #4
pd.get_dummies(df.stack()).sum(level=0)
Option 1:
One way you could do it:
df.stack().reset_index(level=1)\
  .set_index(0, append=True)['level_1']\
  .unstack().notnull().mul(1)
Output:
               326136  663056  667386  712907  742068
1.160.139.117       0       0       0       1       1
1.161.135.205       0       0       1       0       1
1.162.51.21         1       1       0       0       1
Option 2
Or with a little reshaping and pd.crosstab:
df2 = df.stack().reset_index(name='Values')
pd.crosstab(df2.level_0,df2.Values)
Output:
Values         326136  663056  667386  712907  742068
level_0
1.160.139.117       0       0       0       1       1
1.161.135.205       0       0       1       0       1
1.162.51.21         1       1       0       0       1
Option 3
df.stack().reset_index(name="Values")\
  .pivot(index='level_0', columns='Values')['level_1']\
  .notnull().astype(int)
Output:
Values         326136  663056  667386  712907  742068
level_0
1.160.139.117       0       0       0       1       1
1.161.135.205       0       0       1       0       1
1.162.51.21         1       1       0       0       1
Option 4 (@Wen pointed out a short solution, and the fastest so far)
pd.get_dummies(df.stack()).sum(level=0)
Output:
               326136  663056  667386  712907  742068
1.160.139.117       0       0       0       1       1
1.161.135.205       0       0       1       0       1
1.162.51.21         1       1       0       0       1
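One caveat with Option 4: the level= argument of sum() was deprecated in pandas 1.x and removed in 2.0, so on current versions the same idea is written with an explicit groupby:
pd.get_dummies(df.stack()).groupby(level=0).sum()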

Counting number of consecutive zeros in a Dataframe [closed]

I want to count the number of consecutive zeros in each row of my DataFrame, shown below:
   DEC  JAN  FEB  MARCH  APRIL  MAY  consecutive zeros
0    X    X    X      1      0    1                  0
1    X    X    X      1      0    1                  0
2    0    0    1      0      0    1                  2
3    1    0    0      0      1    1                  3
4    0    0    0      0      0    1                  5
5    X    1    1      0      0    0                  3
6    1    0    0      1      0    0                  2
7    0    0    0      0      1    0                  4
For each row, you want the cumulative sum of (1 - row), reset at every point where row == 1; then you take the row max.
For example
ts = pd.Series([0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0])
ts2 = 1 - ts                   # 1 where ts is 0
tsgroup = ts.cumsum()          # group id grows at every 1, so each run of 0s shares a group
consec_0 = ts2.groupby(tsgroup).transform(pd.Series.cumsum)
consec_0.max()
will give you 4 as needed.
Write that in a function and apply it to your dataframe, for example as sketched below.
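A sketch of that, assuming df holds just the month columns, whose cells may be int 0/1 or strings like 'X' and '0'; note the question's expected output counts an isolated zero as 0, so runs of length 1 are zeroed out:
import pandas as pd

def max_consec_zeros(row):
    # 1 where the cell is a zero, 0 otherwise
    is_zero = pd.Series([1 if str(v) == '0' else 0 for v in row])
    # group id increments at every non-zero, so each run of zeros shares a group
    groups = (1 - is_zero).cumsum()
    longest = is_zero.groupby(groups).cumsum().max()
    return longest if longest >= 2 else 0  # expected output zeroes out single zeros

df['consecutive zeros'] = df.apply(max_consec_zeros, axis=1)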
Here's my two cents...
Think of all the non-zero elements as 1; then you have a binary code. All you need to do now is find the largest interval with no bit flip that starts with 0.
We can write a function and apply it with a lambda:
def len_consec_zeros(a):
    a = np.array(list(a))                # convert elements to `str`
    rr = np.argwhere(a == '0').ravel()   # find out positions of `0`
    if not rr.size:                      # if there are no zeros, return 0
        return 0
    full = np.arange(rr[0], rr[-1] + 1)  # get the range of spread of 0s
    # get the indices where `0` was flipped to something else
    diff = np.setdiff1d(full, rr)
    if not diff.size:                    # if there are no bit flips, return the
        return len(full)                 # size of the full range
    # break the array into pieces wherever there's a bit flip
    # and the result is the size of the largest chunk
    pos, difs = full[0], []
    for el in diff:
        difs.append(el - pos)
        pos = el + 1
    difs.append(full[-1] + 1 - pos)
    # return size of the largest chunk
    res = max(difs) if max(difs) != 1 else 0
    return res
Now that you have this function, call it on every row...
# join all columns to get a string column
# assuming you have your data in `df`
df['concated'] = df.astype(str).apply(lambda x: ''.join(x), axis=1)
df['consecutive_zeros'] = df.concated.apply(lambda x: len_consec_zeros(x))
Here's one approach -
# Inspired by https://stackoverflow.com/a/44385183/
def pos_neg_counts(mask):
    idx = np.flatnonzero(mask[1:] != mask[:-1])
    if len(idx) == 0:  # To handle all 0s or all 1s cases
        if mask[0]:
            return np.array([mask.size]), np.array([0])
        else:
            return np.array([0]), np.array([mask.size])
    else:
        count = np.r_[[idx[0] + 1], idx[1:] - idx[:-1], [mask.size - 1 - idx[-1]]]
        if mask[0]:
            return count[::2], count[1::2]  # True, False counts
        else:
            return count[1::2], count[::2]  # True, False counts

def get_consecutive_zeros(df):
    arr = df.values
    mask = (arr == 0) | (arr == '0')
    zero_count = np.array([pos_neg_counts(i)[0].max() for i in mask])
    zero_count[zero_count < 2] = 0
    return zero_count
Sample run -
In [272]: df
Out[272]:
  DEC JAN FEB MARCH APRIL MAY
0   X   X   X     1     0   1
1   X   X   X     1     0   1
2   0   0   1     0     0   1
3   1   0   0     0     1   1
4   0   0   0     0     0   1
5   X   1   1     0     0   0
6   1   0   0     1     0   0
7   0   0   0     0     1   0

In [273]: df['consecutive_zeros'] = get_consecutive_zeros(df)

In [274]: df
Out[274]:
  DEC JAN FEB MARCH APRIL MAY  consecutive_zeros
0   X   X   X     1     0   1                  0
1   X   X   X     1     0   1                  0
2   0   0   1     0     0   1                  2
3   1   0   0     0     1   1                  3
4   0   0   0     0     0   1                  5
5   X   1   1     0     0   0                  3
6   1   0   0     1     0   0                  2
7   0   0   0     0     1   0                  4
