I have a data frame that I transposed, and it looks like this.
I would like to know how I can transform this group into filled rows, following the example below,
where the first column is filled with the first value down to the last empty row.
How can I do this if the column is grouped?
In your case, repeat the index of your data frame five times, save the result in a new column, and then set that column as the new index:
ibov_transpose['index'] = ibov_transpose.index.repeat(5)
ibov_transpose = ibov_transpose.set_index('index')
Note that set_index returns a new data frame unless you pass inplace=True, so assign the result back; by default it also consumes the 'index' column, so you don't need to delete it afterwards.
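Alternatively, if the group labels sit in a regular column with empty rows below them, a forward fill does the same job. A minimal sketch, assuming a hypothetical column named 'group' and that the empty cells are NaN (neither detail is visible in the frame as posted):
ibov_transpose['group'] = ibov_transpose['group'].ffill()  # 'group' is a placeholder name; fill each label down to the next one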
I am trying to replace the values in the "All Assortment" column of the "buyer" data frame.
I need to replace them with the data from the "All Stores" column of the "asl" data frame. The twist is that the index values of the asl data frame are the values that need to match for the replacement to work.
Hard to say without a minimal reproducible example, but try mapping the values of buyer['All Assortment'] to corresponding values from the asl['All Stores'] column based on the asl index:
buyer['All Assortment'] = buyer['All Assortment'].map(asl['All Stores'])
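For illustration, with made-up frames (no sample data was posted): map looks up each value of buyer['All Assortment'] in the index of asl['All Stores'] and returns the matching entry, or NaN when there is no match.
import pandas as pd

buyer = pd.DataFrame({'All Assortment': ['S1', 'S2', 'S3']})
asl = pd.DataFrame({'All Stores': ['Store One', 'Store Two']}, index=['S1', 'S2'])

# Each value is looked up in asl's index; unmatched values become NaN
buyer['All Assortment'] = buyer['All Assortment'].map(asl['All Stores'])
# 'S1' -> 'Store One', 'S2' -> 'Store Two', 'S3' -> NaN (no matching index label)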
I have a data frame that contains product sales for each day from 2018 to 2021. The dataframe contains four columns (Date, Place, ProductCategory and Sales). In the first two columns (Date, Place) I want to use the available data to fill in the gaps. Once those are filled, I would like to delete the rows that have no data in ProductCategory. I would like to do this in Python pandas.
The sample of my data set looked like this:
I would like the dataframe to look like this:
Use fillna with method 'ffill' (or the equivalent .ffill()), which propagates the last valid observation forward. Then drop the rows that contain NAs.
df['Date'] = df['Date'].ffill()
df['Place'] = df['Place'].ffill()
df.dropna(inplace=True)
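To see it end to end on toy data standing in for the posted sample (all values below are invented):
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Date': ['2018-01-01', np.nan, np.nan],
    'Place': ['Krakow', np.nan, np.nan],
    'ProductCategory': ['A', np.nan, 'B'],
    'Sales': [10, 20, 30],
})

df[['Date', 'Place']] = df[['Date', 'Place']].ffill()   # fill Date and Place gaps downward
df = df.dropna(subset=['ProductCategory'])              # drop rows with no category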
Use the forward-filling method to replace null values with the nearest value above them: df[['Date', 'Place']] = df[['Date', 'Place']].fillna(method='ffill'). Next, drop the rows with missing values: df.dropna(subset=['ProductCategory'], inplace=True). Congrats, now you have your desired df 😄
Documentation: Pandas fillna function, Pandas dropna function
Compute the frequency of the categories in the column by plotting; the bars show the most repeated values.
df['column'].value_counts().plot.bar()
Then get the most frequent value via the index: index[0] gives the most repeated value, index[1] the second most repeated, and so on, so you can choose as per your requirement.
most_frequent_attribute = df['column'].value_counts().index[0]
Then fill the missing values with it:
df['column'].fillna(most_frequent_attribute, inplace=True)
To fill multiple columns with the same method, just define this as a function, like this:
def impute_nan(df, column):
    most_frequent_category = df[column].mode()[0]
    df[column].fillna(most_frequent_category, inplace=True)

for feature in ['column1', 'column2']:
    impute_nan(df, feature)
I have a list of lists that contains the indexes of the minimum values in each column of a DataFrame whose column names go from 0 to 399 and whose row names go from 0 to 1595. I want to use this list to access the data of another DataFrame. For example, if I have the list (43, 579, 100), I want to access the 43rd, 579th and 100th values of a column in the second DataFrame. However, this DataFrame has row names that do not go from 0 to 1595, so I don't want to make the mistake of accessing the data on the row that may be named "43"; I want to access the 43rd row.
I added a picture of my Data Frames.
I would like to get a list with the data on the selected rows.
You can use .values to convert the column data to a numpy array and index with your list. For example, if your data is in variable df and the list of indexes is idxs, then for a given column:
df[column].values[idxs]
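A toy example (the row labels below are invented) to show that the lookup is positional rather than label-based:
import pandas as pd

df = pd.DataFrame({'a': [10, 20, 30, 40]}, index=[7, 3, 43, 9])
idxs = [2, 0]

df['a'].values[idxs]   # array([30, 10]): positions 2 and 0, row labels ignored
df['a'].iloc[idxs] is an equivalent, purely positional alternative that keeps the result as a Series.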
I want to aggregate my data based on a field known as COLLISION_ID, with a count of each COLLISION_ID.
I want to remove repeating COLLISION_IDs since they have the same coordinates, but retain a count of their occurrences in the original data set.
My code is below
df2 = df1.groupby(['COLLISION_ID'])[['COLLISION_ID']].count()
This returns the following:
I would like my data returned as the COLLISION_ID numbers, the count, and the remaining columns of my data which are not shown here (~40 additional columns that will be filtered later).
If you are talking about a filter, you should use transform:
df1['count_col'] = df1.groupby(['COLLISION_ID'])['COLLISION_ID'].transform('count')
Then you can filter df1 using the count_col column.
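For instance, to keep one row per COLLISION_ID together with the count and all the other columns, a sketch of the intended follow-up (not from the original answer):
df1['count_col'] = df1.groupby('COLLISION_ID')['COLLISION_ID'].transform('count')
df2 = df1.drop_duplicates(subset='COLLISION_ID')   # one row per ID, every column retained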
I have two questions.
I would like to select rows from a dataframe called df_12 when the value of the STABBR column is one of the elements of an array called states100.
The array is the states that had 100+ respondents to a survey.
states100 = ['CA','TX','NY','FL','PA','OH','IL','MI','MO',
'NC','MA','GA','TN','VA','NJ','IN','MN','PR',
'OK','AZ','CO','WA','WI','LA','KY','SC','CT',
'KS'
]
And dataframe is like this:
What I would like is to select the rows where the value of STABBR is one of the elements of the states100 array. I tried to use a for loop, but I am not sure how a for loop works in a dataframe situation.
The next question is: I would like to fill an empty pandas dataframe with those selected rows. When I used pd.DataFrame(index=df_12.index, columns=df_12.columns) to create an empty dataframe, it already had a fixed shape (the shape of df_12), and when the number of selected rows is less than that shape, there are plenty of NaNs left in the dataframe. I would like the empty dataframe to grow whenever a new row is added.
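For what it's worth, a minimal sketch of the selection step using isin, assuming that fits your need; the boolean mask keeps only the matching rows, so the result already has exactly the right number of rows and nothing needs to be extended:
selected = df_12[df_12['STABBR'].isin(states100)]   # rows whose STABBR appears in states100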