I'm iterating over a dataframe and I want to add new elements to each row, so that I can add the new row to a second dataframe.
for index, row in df1.iterrows():
    # I'm looking for something like this:
    new_row = row.append({'column_name_A': 10})
    df2 = df2.append(new_row, ignore_index=True)
If I understand correctly, you want to have a copy of your original dataframe with a new column added. You can create a copy of the original dataframe, add the new column to it and then iterate over the rows of the new dataframe to update the values of the new column as you would have done in your code posted in the question.
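df2 = df1.copy()
df2['column_name_A'] = 0
for index, row in df2.iterrows():
    # Assign through df2.loc: iterrows() yields copies, so writing to `row` may not change df2
    df2.loc[index, 'column_name_A'] = some_value  # some_value: whatever you compute for this row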
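If the goal is exactly what the question describes (extend each row, then collect the extended rows in a second dataframe), note that `DataFrame.append` and `Series.append` were removed in pandas 2.0. Here is a minimal sketch of one alternative, assuming small made-up data (the columns `a` and `b` are hypothetical): collect each extended row as a dict and build `df2` once at the end.

```python
import pandas as pd

# Hypothetical input frame; the column names are for illustration only.
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

rows = []
for index, row in df1.iterrows():
    new_row = row.to_dict()          # copy the existing values of this row
    new_row["column_name_A"] = 10    # add the new element
    rows.append(new_row)             # plain list append, not DataFrame.append

# Build the second dataframe in one step from the collected rows.
df2 = pd.DataFrame(rows)
print(df2)
```

Building the frame once from a list is usually cheaper than growing a dataframe row by row inside the loop.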
Related
I am trying to create a dataframe from an .xlsx file that transforms a string that is in a cell into a number of strings that are arranged in a single cell.
For example, I have a dataframe as follows:
column_name1    column_name2
A;B;C           D;E
F;G;H           I;J
My intention is that 5 columns are created: "column_name1_1", "column_name1_2", "column_name1_3", "column_name2_1", "column_name2_2". Can the column names be generated automatically?
After the dataframe is created, my intention is to enter the data "A" in the first column, "B" in the second column, and so on. "F" would also go in the first column, but under "A" and "G" would go in the second column, but under "B".
Is there any way to achieve this result? It would also be useful for me not to create the name of the columns, but to distribute the information in the way I stated above.
I have created this simple code that separates the letters into lists:
for headers in df.columns:
    for cells in df[headers]:
        cells = str(cells)
        sublist = cells.split(character)
        print(sublist)
I am using pandas for the first time and this is my first post. Any advice is welcome. Thank you all very much!
You can achieve this using Pandas.
Here you go!
import pandas as pd

# Load the .xlsx file into a Pandas dataframe
df = pd.read_excel("file.xlsx")

# Collect the split columns produced from each original column
parts = []
for header in df.columns:
    # Split every cell on ";" so that each element gets its own column
    expanded = df[header].astype(str).str.split(";", expand=True)
    # Name the new columns "<original name>_1", "<original name>_2", ...
    expanded.columns = [f"{header}_{i + 1}" for i in range(expanded.shape[1])]
    parts.append(expanded)

# Place all split columns side by side in one dataframe
split_df = pd.concat(parts, axis=1)

# Save the split_df dataframe to a new .xlsx file
split_df.to_excel("split_file.xlsx", index=False)
This code will split the values in a .xlsx file into a new dataframe, with each value separated into its own column. The new columns will be named based on the original column names and the position of the value in the list. The new dataframe will then be saved to a new .xlsx file named "split_file.xlsx".
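To check the expected layout without an Excel file, here is a minimal sketch that assumes the cells hold plain ";"-separated strings such as "A;B;C" (the in-memory frame below stands in for the spreadsheet and is an assumption). It relies on `Series.str.split` with `expand=True`, which returns one column per element.

```python
import pandas as pd

# Stand-in for the spreadsheet contents described in the question.
df = pd.DataFrame({
    "column_name1": ["A;B;C", "F;G;H"],
    "column_name2": ["D;E", "I;J"],
})

parts = []
for header in df.columns:
    # One new column per ";"-separated element
    expanded = df[header].str.split(";", expand=True)
    # Generate names like column_name1_1, column_name1_2, ...
    expanded.columns = [f"{header}_{i + 1}" for i in range(expanded.shape[1])]
    parts.append(expanded)

split_df = pd.concat(parts, axis=1)
print(split_df)
```

With this input, "A" and "F" end up in column_name1_1, "B" and "G" in column_name1_2, and so on.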
I want to create a new column, V, in an existing DataFrame, df. I would like the value of the new column to be the difference between the value in the 'x' column in that row, and the value of the 'x' column in the row below it.
As an example, in the picture below, I want the value of the new column to be
93.244598 - 93.093285 = 0.151313.
I know how to create a new column based on existing columns in Pandas, but I don't know how to reference other rows using this method. Is there a way to do this that doesn't involve iterating over the rows in the dataframe? (since I have read that this is generally a bad idea)
You can use pandas.DataFrame.shift for your use case.
The last row has no row below it to subtract, so its value will be NaN.
df['temp_x'] = df['x'].shift(-1)
df['new_col'] = df['x'] - df['temp_x']
Or as a one-liner:
df['new_col'] = df['x'] - df['x'].shift(-1)
The column new_col will contain the expected data.
An ideal solution is to use diff:
df['new'] = df['x'].diff(-1)
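As a quick check that the two approaches agree, here is a small sketch using the two values quoted in the question plus one made-up third value (92.9 is an assumption, added only so that the frame has a middle row).

```python
import pandas as pd

# First two values come from the question; the third is hypothetical.
df = pd.DataFrame({"x": [93.244598, 93.093285, 92.9]})

# Current row minus the row below it, written two ways
df["via_shift"] = df["x"] - df["x"].shift(-1)
df["via_diff"] = df["x"].diff(-1)

print(df)
```

Both columns hold the same differences, and in both the last row is NaN because there is no row below it.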
for date in list_of_dates:
    df = *dataframe with identifiers for rows and dates for columns*
    new_column = *a new column with a new date to be added to the df*
    df_incl_new_column = *original df merged with new column*
I want to use the 'df_incl_new_column' at the start of the loop as the new 'df' and keep iteratively adding new columns for each date, and using the dataframe with the new column as the start 'df' again and again.
I want to do this for a list of over 20 dates to build a new dataframe with all the new columns.
Each new column has data which changes depending on the previous new column having been added to the df.
What is the best way to do this?
It may be that a for loop is not appropriate, but I need to build the dataframe gradually, using the latest data in the dataframe to add the next column.
You should try this:
df = *dataframe with identifiers for rows and dates for columns*
for date in list_of_dates:
    df[date] = *a new column with a new date to be added to the df*  # key each column by its date so earlier columns are not overwritten
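A minimal runnable sketch of this pattern, assuming a made-up rule in which each new column is twice the most recently added one (the rule, the identifiers and the dates are all hypothetical). Because `df` is created once before the loop and extended in place, every iteration sees the columns added so far.

```python
import pandas as pd

# Hypothetical starting frame: identifiers as the index, one date column.
df = pd.DataFrame({"2024-01-01": [1.0, 2.0]}, index=["id_a", "id_b"])
list_of_dates = ["2024-01-02", "2024-01-03"]

for date in list_of_dates:
    previous = df.iloc[:, -1]   # the most recently added column
    df[date] = previous * 2     # placeholder rule that depends on it

print(df)
```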
I have a dataframe df with one column and 500k rows (its first 5 elements are shown below). I want to add new data to the existing column. The new data is a matrix of 200k rows and 1 column. How can I do it? I also want to add a new column named op.
X098_DE_time
0.046104
-0.037134
-0.089496
-0.084906
-0.038594
We can use the concat function after renaming the column of the second dataframe.
df2.rename(columns={'op': 'X098_DE_time'}, inplace=True)
new_df = pd.concat([df, df2], axis=0)
Note: If we don't rename the df2 column, the resulting new_df will have 2 different columns.
To add a new column, you can use
df["new column"] = list_of_values  # list_of_values must have the same length as df
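A small sketch of both steps together, assuming the new data arrives as a one-column dataframe (the name `new_data` and its values are made up; the first frame reuses two values from the question). Passing `ignore_index=True` renumbers the combined rows 0..n-1.

```python
import pandas as pd

df = pd.DataFrame({"X098_DE_time": [0.046104, -0.037134]})
df2 = pd.DataFrame({"new_data": [0.5, 0.6, 0.7]})  # hypothetical new rows

# Give both frames the same column name so the rows stack in one column.
df2 = df2.rename(columns={"new_data": "X098_DE_time"})
new_df = pd.concat([df, df2], axis=0, ignore_index=True)

# Add the extra column; the values must match the number of rows.
new_df["op"] = list(range(len(new_df)))

print(new_df)
```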
I am trying to iterate a Pandas DataFrame (row-by-row) that has non-sequential index labels. In other words, one Dataframe's index labels look like this: 2,3,4,5,6,7,8,9,11,12,.... There is no row/index label 10. I would like to iterate the DataFrame to update/edit certain values in each row based on a condition since I am reading Excel sheets (which has merged cells) into DataFrames.
I tried the following code (# Manuel's answer) to iterate through each row of df and edit each row if conditions apply.
for col in list(df):  # All columns
    for row in df[1:].iterrows():  # All rows, except first
        if pd.isnull(df.loc[row[0], 'Album_Name']):  # If this cell is empty, all in the same row are too.
            continue
        elif pd.isnull(df.loc[row[0], col]) and pd.isnull(df.loc[row[0]+1, col]):  # If a cell and next one are empty, take previous value.
            df.loc[row[0], col] = df.loc[row[0]-1, col]
However, since the DataFrame has non-sequential index labels, I get the following error message: KeyError: the label [10] is not in the [index]. How can I iterate and edit the DataFrame (row-by-row) with non-sequential index labels?
For reference, here is what my Excel sheet and DataFrame look like:
Yes. Iterate over row positions instead of index labels, and look up the previous and next rows by position with iloc, so that gaps in the labels no longer matter:
for col_pos in range(df.shape[1]):  # All columns
    for pos in range(1, len(df) - 1):  # All rows except the first (and the last, which has no next row)
        if pd.isnull(df['Album_Name'].iloc[pos]):  # If this cell is empty, all in the same row are too.
            continue
        elif pd.isnull(df.iloc[pos, col_pos]) and pd.isnull(df.iloc[pos + 1, col_pos]):  # If a cell and next one are empty, take previous value.
            df.iloc[pos, col_pos] = df.iloc[pos - 1, col_pos]
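Another option, sketched here on made-up data, is to reset the index temporarily so that the labels become 0..n-1, which makes the `label - 1` and `label + 1` lookups valid, and then restore the original labels afterwards. The `Artist` column and all values are hypothetical, and the loop stops before the last row because that row has no next row to inspect.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index skips the label 10.
df = pd.DataFrame(
    {"Album_Name": ["A", "A", "A", "B"], "Artist": ["x", np.nan, np.nan, "y"]},
    index=[8, 9, 11, 12],
)

original_index = df.index
df = df.reset_index(drop=True)      # labels are now 0, 1, 2, 3

for col in df.columns:
    for i in range(1, len(df) - 1):  # skip the first and last rows
        if pd.isnull(df.loc[i, "Album_Name"]):
            continue
        # If this cell and the next one are empty, take the previous value.
        if pd.isnull(df.loc[i, col]) and pd.isnull(df.loc[i + 1, col]):
            df.loc[i, col] = df.loc[i - 1, col]

df.index = original_index           # put the non-sequential labels back
print(df)
```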