I have a list of lists that contains the indexes of the minimum values in each column of a DataFrame whose column names go from 0 to 399 and whose row names go from 0 to 1595. I want to use these lists to access the data of another DataFrame. For example, given the list (43, 579, 100), I want to access the 43rd, 579th and 100th values of a column in the second DataFrame. However, this DataFrame has row names that do not go from 0 to 1595, so I don't want to make the mistake of accessing the row that happens to be named "43"; I want to access the 43rd row by position.
I added a picture of my DataFrames.
I would like to get a list with the data from the selected rows.
You can use .values to convert the column data to a numpy array and index with your list. For example, if your data is in variable df and the list of indexes is idxs, then for a given column:
df[column].values[idxs]
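For instance, a minimal sketch (the frame, labels and values here are made up for illustration):
import pandas as pd

#second DataFrame whose row labels do not run from 0 to n
df2 = pd.DataFrame({'col': [10, 20, 30, 40, 50]}, index=[7, 3, 99, 42, 5])
idxs = [0, 2, 4]

#positional access: the row labels are ignored entirely
print(df2['col'].values[idxs]) #[10 30 50]

#equivalent with .iloc, which also selects by position rather than by label
print(df2['col'].iloc[idxs].tolist()) #[10, 30, 50]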
Related
Suppose I have a DataFrame.
I want to convert this into a DataFrame which has a single value in each row, like below.
Is there any way to obtain this?
The dataset I am using has a number of columns which hold criminal offence codes (e.g. 90, 120, 10) for prisoners. The columns are sparsely populated because of the complex survey routing logic used to capture the data. The data needs to be one-hot encoded to feed into a machine learning model. Creating (number of columns where offenses are held) x (number of offense codes) indicator columns does one-hot encode the data, but it creates a dataset that is far too sparse.
I therefore want to create one column for each offense code and, for each row in the dataset, populate it with the count of that code across all columns that hold offenses.
I can imagine a way to do this by converting the dataframe to a dictionary, but this seems very slow and like bad practice for pandas.
import pandas as pd

#dataset is a dataframe
#offense_columns is a list of strings corresponding to column names in the dataset

#create a list of all the codes that appear across all offense columns
all_possible_offense_codes = []
for colname in offense_columns:
    for value in dataset[colname].dropna().unique():
        if value not in all_possible_offense_codes:
            all_possible_offense_codes.append(value)
#create a copy subset of the dataframe with just the offense columns
offense_cols_subset = dataset[offense_columns]
#convert to dictionary - quicker to loop through than df
offense_cols_dict = offense_cols_subset.to_dict(orient='index')
#create an empty dictionary to hold the counts and append back onto the main dataframe
all_offense_counts = {}
#look at each row in the dataframe (converted into a dict) one by one
for row, variables in offense_cols_dict.items():
    #create a dict with every offense code as key and 0 as value (starting count)
    #considered using get(code, 0) rather than prepopulating keys and vals...
    #but think different vals across dicts would create alignment issues...
    #when appending back onto dataset df
    this_row_offense_counts = {code: 0 for code in all_possible_offense_codes}
    #then go through each offense column
    for column in offense_columns:
        #find the code stored in this column for this row
        code = variables[column]
        #skip the empty cells in these sparsely populated columns
        if pd.isna(code):
            continue
        #increment count by 1
        this_row_offense_counts[code] = this_row_offense_counts[code] + 1
    #once all columns have been counted, store counts in dictionary
    all_offense_counts[row] = this_row_offense_counts
#once all rows have been counted, turn into a dataframe
offense_counts_cols = pd.DataFrame.from_dict(all_offense_counts, orient='index')
#join to the original dataframe
dataset = dataset.join(offense_counts_cols)
#drop the sparsely populated offense_columns
dataset = dataset.drop(offense_columns, axis=1)
From what I understood, the melt function should help; please try this:
pd.melt(dataset, id_vars=[please add a unique id column here], value_vars=offense_columns)
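Building on melt, a minimal sketch of how the per-row counts could then be produced with pd.crosstab; the prisoner_id and offense_* names are made up for illustration, and only dataset and offense_columns come from the question:
import pandas as pd

#toy data: one row per prisoner, offense codes spread across sparse columns
dataset = pd.DataFrame({
    'prisoner_id': [1, 2, 3],
    'offense_1': [90, 120, 90],
    'offense_2': [10, None, 90],
})
offense_columns = ['offense_1', 'offense_2']

#reshape to one (prisoner, code) pair per row, dropping the empty cells
melted = pd.melt(dataset, id_vars=['prisoner_id'], value_vars=offense_columns).dropna(subset=['value'])

#count each code per prisoner, then join the counts back on
counts = pd.crosstab(melted['prisoner_id'], melted['value'])
result = dataset.drop(columns=offense_columns).join(counts, on='prisoner_id')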
I have a data frame that I transposed, and it looks like this.
I would like to know how I can transform this group into filled lines; see the example below,
where the first column is filled with the first value until the last empty row.
How can I do this if the column is grouped?
In your case, repeat the indices of your data frame five times, save them in a new column, and then make that column the index in place of the original one.
ibov_transpose['index'] = ibov_transpose.index.repeat(5)
#set_index returns a new frame, so assign it back; it also consumes the 'index' column, so no separate delete is needed
ibov_transpose = ibov_transpose.set_index('index')
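If the aim is simply to carry each group label down through the blank rows beneath it, forward fill is another option; a minimal sketch, assuming the blanks are empty strings (the frame here is made up for illustration):
import pandas as pd

df = pd.DataFrame({'group': ['A', '', '', 'B', ''],
                   'value': [1, 2, 3, 4, 5]})

#treat empty strings as missing, then carry the last seen label forward
df['group'] = df['group'].replace('', pd.NA).ffill()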
I have two questions.
I would like to select rows from a dataframe called df_12 when the value of the STABBR column is one of the elements of an array called states100.
The array is the states that had 100+ respondents to a survey.
states100 = ['CA','TX','NY','FL','PA','OH','IL','MI','MO',
'NC','MA','GA','TN','VA','NJ','IN','MN','PR',
'OK','AZ','CO','WA','WI','LA','KY','SC','CT',
'KS'
]
And the dataframe looks like this:
What I would like is to select the rows where the value of STABBR is one of the elements of the states100 array. I tried to use a for loop, but I am not sure how for loops work with dataframes.
The next question is: with those selected rows, I would like to fill an empty pandas dataframe. When I did pd.DataFrame(index=df_12.index, columns=df_12.columns) to create an empty dataframe, it already had a set shape (the shape of df_12), and when the number of selected rows is less than the empty dataframe's shape, there will be plenty of NaN in the dataframe. I would like to extend the empty dataframe whenever a new row is added.
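A minimal sketch of one way to handle both parts, using isin for the selection and building the result by concatenation rather than preallocating an empty frame; only STABBR, states100 and the name df_12 come from the question, and the toy data is made up:
import pandas as pd

#toy stand-in for df_12 (the real one comes from the survey data)
df_12 = pd.DataFrame({'STABBR': ['CA', 'ZZ', 'TX'], 'respondents': [120, 5, 300]})

#first question: boolean masking with isin replaces the for loop
selected = df_12[df_12['STABBR'].isin(states100)]

#second question: instead of growing an empty dataframe row by row,
#collect the pieces in a list and concatenate them once at the end
pieces = [selected] #append further selections here as they arrive
result = pd.concat(pieces, ignore_index=True)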
Simple indexing question.
I have the following list object named as xlist.
It could also be a df.
[ result
6221 0.974214
6220 0.973909
6222 0.973447
3032 0.973444
3033 0.973387]
I would like to get the index 'value' of a particular row (or is it a column when taken as a list?). So for example: I would like to specify row 2 and get 6222.
I am confused as to why this is not straightforward (to me, anyway).
If you have a list of DataFrames, you can do this:
#for the first DataFrame in the list; the number passed to iloc is the row position in that DataFrame
needed_value = xlist[0].reset_index().iloc[2]['index'] #iloc[2] selects the row at position 2, whose index label is 6222
print(needed_value)
6222
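A shorter equivalent, assuming xlist[0] is a DataFrame, is to read the label straight off the index, which supports positional access:
xlist[0].index[2] #returns 6222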