I imported two csv files in python using read_csv. So now I have 2 dataframes with dimensions 40x300. What I want to do is create a new csv file with dimensions 40x300, where each cell will have the mean value calculated using the values of the respective position in the other two csv files. For example, if the cell with position 1x2 in the first dataframe is 10 and the cell with the same position in the second dataframe is 20, I want a third dataframe with dimensions 40x300 which has the value of 15 in position 1x2. I tried
frame1.add(frame2)
but this created a new dataframe with dimensions 40x600. Any help would be very appreciated.
pandas aligns on index and column labels, so when you add two dataframes you always need to make sure they have the same index and columns:
frame2.index=frame1.index
frame2.columns=frame1.columns
frame1.add(frame2)/2
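After changing the index and columns, concat will work as well (the level argument of mean was removed in pandas 2.0, so group by the index level instead):
pd.concat([frame1, frame2]).groupby(level=0).mean()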
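A minimal sketch of the approach above, using two small made-up frames in place of the 40x300 ones. The values and the column labels "a", "b", "x", "y" are assumptions for illustration only:

```python
import pandas as pd

# Small stand-ins for the two 40x300 frames; values and labels are made up.
frame1 = pd.DataFrame([[10, 20], [30, 40]], columns=["a", "b"])
frame2 = pd.DataFrame([[20, 40], [50, 60]], columns=["x", "y"])

# With different labels, add() takes the union of the columns,
# which is why the result ends up wider than either input.
misaligned = frame1.add(frame2)

# Give frame2 the same labels as frame1, then average cell by cell.
frame2.index = frame1.index
frame2.columns = frame1.columns
mean_frame = frame1.add(frame2) / 2

# Same result by stacking the frames and averaging rows that share an index label.
mean_concat = pd.concat([frame1, frame2]).groupby(level=0).mean()
```

mean_frame.to_csv("some_file.csv") would then write the averaged values out; the file name is a placeholder.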
Related
I can't seem to find a way to split all of the array values in a column of a dataframe.
I have managed to get all the array values using this code:
The dataframe is as follows:
I want to use value_counts() on the dataframe and I get this
I want the array values that are clubbed together to be split so that I can get the accurate count of every value.
Thanks in advance!
You could try .explode(), which would create a new row for every value in each list.
df_mentioned_id_exploded = pd.DataFrame(df_mentioned_id.explode('entities.user_mentions'))
With the above code you would create a new dataframe df_mentioned_id_exploded with a single column entities.user_mentions, which you could then use .value_counts() on.
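A small sketch of this, assuming each cell of entities.user_mentions holds a Python list. The user names below are made up:

```python
import pandas as pd

# Made-up data: each cell holds a list of mentioned users.
df_mentioned_id = pd.DataFrame(
    {"entities.user_mentions": [["alice", "bob"], ["bob"], ["bob", "carol"]]}
)

# explode() gives every list element its own row (the original index is repeated).
df_mentioned_id_exploded = df_mentioned_id.explode("entities.user_mentions")

# Count how often each individual value occurs.
counts = df_mentioned_id_exploded["entities.user_mentions"].value_counts()
```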
I have a list of lists that contains the indexes of the minimum values in each column of a DataFrame whose row and column names run from 0 to 399 (columns) and 0 to 1595 (rows). I want to use this list to access the data of another DataFrame. For example, I have the list (43,579,100) and I want to access the 43rd, 579th and 100th value of a column in the second DataFrame. However, this DataFrame has row labels that do not go from 0 to 1595, so I don't want to make the mistake of accessing the row that happens to be named "43"; I want to access the 43rd row.
I added a picture of my Data Frames.
I would like to get a list with the data on the selected rows.
You can use .values to convert the column data to a numpy array and index with your list. For example, if your data is in variable df and the list of indexes is idxs, then for a given column:
df[column].values[idxs]
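To illustrate the difference between positions and labels, here is a small sketch with a made-up frame whose row labels are deliberately out of order. The column name "value" is an assumption; .iloc is shown as an alternative that also selects by position:

```python
import pandas as pd

# Made-up frame whose row labels are NOT 0..n-1.
df = pd.DataFrame({"value": [1.5, 2.5, 3.5, 4.5]}, index=[43, 7, 100, 2])
idxs = [2, 0, 3]  # positions, not labels

# Positional lookup on the underlying NumPy array.
by_values = df["value"].values[idxs]

# Positional lookup with .iloc, converted to a plain list.
by_iloc = df["value"].iloc[idxs].tolist()

# Label lookup, for contrast: label 2 is the LAST row here, not the third.
by_label = df["value"].loc[2]
```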
I have two questions.
I would like to select rows from a dataframe called df_12 when the value of STABBR column is one of the elements of an array called states100.
The array is the states that had 100+ respondents to a survey.
states100 = ['CA','TX','NY','FL','PA','OH','IL','MI','MO',
'NC','MA','GA','TN','VA','NJ','IN','MN','PR',
'OK','AZ','CO','WA','WI','LA','KY','SC','CT',
'KS'
]
And dataframe is like this:
What I would like is to select the rows where the value of STABBR is one of the elements of the states100 array. I tried to use a for loop, but I am not sure how a for loop works with a dataframe.
The next question is: with those selected rows I would like to fill in an empty pandas dataframe. When I did pd.DataFrame(index=df_12.index, columns=df_12.columns) to create an empty dataframe, it already had a fixed shape (the shape of df_12), so when the number of selected rows is less than the empty dataframe's shape, there are plenty of NaN values in the dataframe. I would like to extend the empty dataframe whenever a new row is added.
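One possible approach, sketched under assumptions: Series.isin builds a boolean mask, and selecting with that mask already returns a new dataframe of exactly the right size, so no pre-allocated empty frame is needed. The "score" column and the shortened states100 list below are made up for the example:

```python
import pandas as pd

# Made-up stand-in for df_12; only the STABBR column name comes from the question.
df_12 = pd.DataFrame(
    {"STABBR": ["CA", "AK", "TX", "WY", "NY"], "score": [1, 2, 3, 4, 5]}
)
states100 = ["CA", "TX", "NY"]  # shortened list for the example

# Boolean mask: True where STABBR is one of the listed states.
mask = df_12["STABBR"].isin(states100)

# Selecting with the mask yields a frame that contains only the matching rows.
selected = df_12[mask].reset_index(drop=True)
```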
I have a pandas dataframe as shown below. My goal is to create a new column in the existing dataframe and assign a value to each row of that column based on the values in two other columns. The catch is that this has nothing to do with arithmetic: one column holds the sensor number and the other holds the slope. In the new column, I have to add the matching component for the corresponding sensor number and slope.
I tried creating a dictionary and mapping it onto the dataframe, but I am only able to include the sensor number there and not the corresponding slope.
Component_LIST= {1015:'Blade_Fatigue', 215:'Hub=pitch_Bolts'}
df_fatigue_filtered['Component_List'] = df_fatigue_filtered['Sensor_Num'].map(Component_LIST)
The above code creates the component list for the corresponding sensor number, but every sensor has more than one slope, so I want to know how the slope can also be mapped here.
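One way this could be done, as a sketch only: keep the (sensor number, slope) to component pairs in a small lookup table and left-merge it onto the dataframe. The column name Slope, the slope values and the component names below are all hypothetical, since the original table is not shown:

```python
import pandas as pd

# Made-up data; 'Slope' is an assumed column name and the slope values are hypothetical.
df_fatigue_filtered = pd.DataFrame(
    {"Sensor_Num": [1015, 1015, 215, 215], "Slope": [4, 10, 4, 10]}
)

# Lookup table keyed on BOTH sensor number and slope (component names are hypothetical).
component_lookup = pd.DataFrame(
    {
        "Sensor_Num": [1015, 1015, 215],
        "Slope": [4, 10, 4],
        "Component_List": ["Blade_Fatigue_m4", "Blade_Fatigue_m10", "Hub_Bolts_m4"],
    }
)

# A left merge keeps every row of the original frame;
# (sensor, slope) pairs missing from the lookup get NaN.
result = df_fatigue_filtered.merge(
    component_lookup, on=["Sensor_Num", "Slope"], how="left"
)
```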
I took data from an .xlsx file and stored it in the dataframe. The data frame is called df, and the size of the dataframe is (51,3). 51 rows. 3 columns. The columns are unnamed and numbered 0,1,2. The rows are indexed from 0-50. What syntax would I use to extract data from a dataframe with pandas in python and put it into a csv? I know I would use DataFrame.to_csv("outputFile.csv" ), but I'm not sure how to identify a specific piece of data (row/column pair), so I can put it in a new location in the csv table in comparison to the old excel table.
You can use integer based indexing using iloc: http://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-position
In your case, the row/col value you are looking for can be retrieved by:
df.iloc[row_id, col_id]
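As a rough sketch of how this could fit together, the example below reads one cell by position, writes it to a different position in a copy, and renders the result as CSV text. The 3x3 frame is a made-up stand-in for the 51x3 one:

```python
import pandas as pd

# Made-up 3x3 stand-in for the 51x3 frame read from Excel (columns are 0, 1, 2).
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Read one cell by position: row 1, column 2.
cell = df.iloc[1, 2]

# Put that value in a new location in a copy, leaving the original untouched.
out = df.copy()
out.iloc[0, 0] = cell

# With no path given, to_csv returns the CSV as a string instead of writing a file.
csv_text = out.to_csv(index=False, header=False)
```

Passing a path such as "outputFile.csv" as the first argument to to_csv writes the same content to disk.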