Place Variable into a Specific Location [row,column] Pandas Python - python

I've been working to place a string variable "revenue" into a pandas dataframe df1. As you can see, I used df.ait.
More details about the code: It's about finding the specific date row, by m counting loop.
My issue occurs at the .iat.
if info[1] == "1": #Get Kikko1
listofdates = df.Date.tolist()
m = 0
for i in listofdates:
if i != date: #Counting the rows
m = m+1
elif i == date: #Select the row with the matched date
df.iat[m, 9] = "revenue"
break
The error says:
IndexError: index 36 is out of bounds for axis 0 with size 31

One of the main benefits of using a package like pandas is to avoid this kind of manual looping, which is very difficult to follow and modify.
I think you can do what you need to in one line. Something like:
df1.loc[date, 9] = 'revenue'
If that doesn't work, could you edit into your question some example data and your desired output?

Related

calculate descriptive statistics in pandas dataframe based on a condition cyclically

I have a pandas DataFrame with more than 100 thousands of rows. Index represents the time and two columns represents the sensor data and the condition.
When the condition becomes 1, I want to start calculating score card (average and standard deviation) till the next 1 comes. This needs to be calculated for the whole dataset.
Here is a picture of the DataFrame for a specific time span:
What I thought is to iterate through index and items of the df and when condition is met I start to calculate the descriptive statistics.
cycle = 0
for i, row in df_b.iterrows():
if row['condition'] == 1:
print('Condition is changed')
cycle += 1
print('cycle: ', cycle)
#start = ?
#end = ?
#df_b.loc[start:end]
I am not sure how to calculate start and end for this DataFrame. The end will be the start for the next cycle. Additionally, I think this iteration is not the optimal one because it takes a bit of long time to iterate. I appreciate any idea or solution for this problem.
Maybe start out with getting the rows where condition == 1:
cond_1_df = df.loc[df['condition'] == 1]
This dataframe will only contain the rows that meet your condition (being 1).
From here on, you can access the timestamps pairwise, meaning that the first element is beginning and second element is end, sketched below:
former = 0
stamp_pairs = []
df = cond_1_df.reset_index() # make sure indexes pair with number of rows
for index, row in df.iterrows():
if former != 0:
beginning = former
end = row["timestamp"]
former = row["timestamp"]
else:
beginning = 0
end = row["timestamp"]
former = row["timestamp"]
stamp_pairs.append([beginning, end])
This should give you something like this:
[[stamp0, stamp1], [stamp1,stamp2], [stamp2, stamp3]...]
for each of these pairs, you can again create a df containing only the subset of rows where stamp_x < timestamp < stamp_x+1:
time_cond_df = df.loc[(df['timestamp'] > stamp_x) & (df['timestamp'] < stamp_x+1)]
Finally, you get one time_cond_df per timestamp tuple, on which you can perform your score calculations.
Just make shure that your timestamps are comparable with operators ">" and "<"! We can't tell since you did not explicate how you produced the timestamps.

Define new column based on matching values between multiple columns in two dataframes

I'm currently trying to define a class label for a dataset I'm building. I have two different datasets that I need to consult, with df_port_call being the one that will ultimately contain the class label.
The conditions in the if statements need to be satisfied for the row to receive a class label of 1. Basically, if a row exists in df_deficiency that matches the if statement conditions listed below, the Class column in df_port_call should get a label of 1. But I'm not sure how to vectorize this and the loop is running very slowly (will take about 8 days to terminate). Any assistance here would be great!
df_port_call["Class"] = 0
for index, row in tqdm(df_port_call.iterrows()):
for index_def, row_def in df_deficiency.iterrows():
if row['MMSI'] == row_def['Primary VIN'] or row['IMO'] == row_def['Primary VIN'] or row['SHIP NAME'] == row_def['Vessel Name']:
if row_def['Inspection Date'] >= row['ARRIVAL IN USA (UTC)'] and row_def['Inspection Date'] <= row['DEPARTURE (UTC)']:
row['Class'] = 1
Without input data and expected outcome, it's difficult to answer. However you can use something like this with np.where:
df_port_call['Class'] = \
np.where(df_port_call['MMSI'].eq(df_deficiency['Primary VIN'])
| df_port_call['IMO'].eq(df_deficiency['Primary VIN'])
| df_port_call['SHIP NAME'].eq(df_deficiency['Vessel Name'])
& df_deficiency['Inspection Date'].between(df_port_call['ARRIVAL IN USA (UTC)'],
df_port_call['DEPARTURE (UTC)']),
1, 0)
Adapt to your code but I think this is the right way.

beginner panda change row data based upon code

I'm a beginner in panda and python, trying to learn it.
I would like to iterate over panda rows, to apply simple coded logic.
Instead of fancy mapping functions, I just want simple coded logic.
So then I can easily adapt it later for other coded logic rules as well.
In my dataframe dc,
I like to check if column AgeUnkown == 1 (or >0 )
And if so it should move the value of column Age to AgeUnknown.
And then make Age equal to 0.0
I tried various combinations of my below code but it won't work.
# using a row reference #########
for index, row in dc.iterrows():
r = row['AgeUnknown']
if (r>0):
w = dc.at[index,'Age']
dc.at[index,'AgeUnknown']=w
dc.at[index,'Age']=0
Another attempt
for index in dc.index:
r = dc.at[index,'AgeUnknown'].[0] # also tried .sum here
if (r>0):
w= dc.at[index,'Age']
dc.at[index,'AgeUnknown']=w
dc.at[index,'Age']=0
Also tried
if(dc[index,'Age']>0 #wasnt allowed either
Why isn't this working as far as I understood a dataframe should be able to be addressed like above.
I realize you requested a solution involving iterating the df, but I thought I'd provide one that I think is more traditional.
A non-iterating solution to your problem is something like this- 1) get all the indexes that meet your criteria 2) set those indexes of the df to what you want.
# indexes where column AgeUnknown is >0
inds = dc[dc['AgeUnknown'] > 0].index.tolist()
# change the indexes of AgeUnknown to to the Age column
dc.loc[inds, 'AgeUnknown'] = dc.loc[inds, 'Age']
# change the Age to 0 at those indexes
dc.loc[inds, 'Age'] = 0

Pandas create column from another dataframe if certain conditions match

Hey all I am trying to create a new column in a dataframe based on if certain conditions are meet. The end goal is go have all rows that condition is unoccupied in a column as long as the building, floor, and location matches. And time is greater then the occupied time.
Sample CSV File
I tried looking at this beforehand but I don't believe that it fits what I am trying to do. Other Stack Overflow Post
Would love to get pointed into the right direction for this.
current code that I am playing around with: (Also attempted with a loop but I no longer have the code to post it below)
[from IPython.display import display
df = pd.read_csv("/Users/username/Desktop/test.csv")
df2 = pd.DataFrame()
df2['Location'] = df.Location
df2['Type'] = df.Type
df2['Floor'] = df.Floor
df2['Building'] = df.Building
df2['Time'] = df['Date/Time']
df2['Status'] = df['Status']
df2 = df[~df['Condition'].isin(['Unoccupied'])]
df2['Went Unoccupied'] = np.where((df2['Location']==df['Location'])&(df2['Time'] < df['Date/Time']))
The OP tried to add the unoccupied time for each row that has Condition == occupied. It seems the data is well sorted and alternates between occupied and unoccupied. Thus, we shift the dataset backward and create a new column time_of_next_row. Then, query for the condition that df1.Condition == "Occupied".
df["time_of_next_row"] = df.shift(-1)["Date/Time"]
df_occ = df1[df1.Condition == "Occupied"]

Looking to find values in one dataframe and input them into another dataframe

So, I have 5 dataframes that I need to loop through and they all follow a similar format:
RX Dataframe
And here is the final dataframe:
So, essentially I need to pull the cluster of a specific index in the first martrx and find :
If that index is present in the new dataframe
If it is, find it and put the cluster value in the appropriate column
Ended up figuring it out:
for i in final_inds:
for j in range(0,5):
try:
cluster_values = all_dfs[j].loc[i,"clusters"]
except:
cluster_values = -1
final_df.loc[i, cols[j]] = cluster_values
final_df.head()

Categories