I have a Pandas data frame, and I would like to remove all rows where there is a character "?" in column 6.
Assuming df is my data frame, I tried:
df2 = df[df[6].str.contains("\?")==False]
This, however, only seems to generate a view of my original frame (when I print df2, the rows I wanted to remove are gone, but the row indices skip values at the removed rows...).
How can I obtain an independent new data frame df2 where the targeted rows are gone?
edit: the frame looks like this:
You can do it like this ("?" is a regex quantifier, so pass regex=False to match it literally):
df2 = df[~df[6].str.contains("?", regex=False)].reset_index(drop=True)
df2
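A minimal sketch of the same idea on made-up data (the sample frame and its values are assumptions, not the asker's data). `regex=False` makes `str.contains` treat "?" as a literal character rather than a regular-expression quantifier, and `reset_index(drop=True)` renumbers the remaining rows from 0:

```python
import pandas as pd

# Hypothetical sample: integer column labels 0..6, with "?" in some rows of column 6
df = pd.DataFrame({i: ["a", "b", "c", "d"] for i in range(6)})
df[6] = ["x", "?", "y?", "z"]

# True for every row whose column 6 contains a literal "?"
mask = df[6].str.contains("?", regex=False)

# Keep the other rows, renumber them, and copy so df2 is independent of df
df2 = df[~mask].reset_index(drop=True).copy()
print(df2)
```

The explicit `.copy()` is optional here, but it makes the independence from the original frame obvious.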
I am trying to reshape a data frame from 2 rows into 1 row, but I am encountering some issues. Do you have any idea how to do that? Here is the code and the df:
Thanks!
If you are looking to convert two rows into one, you can do the following...
Stack the dataframe and reset the index at level=1. Stacking turns the columns into an inner index level, and resetting that level leaves two columns: the original column headers (called level_1) and the data (called 0).
Then set level_1 as the index, which moves the column names into the index.
Remove the index name (level_1), then transpose the dataframe.
Code is shown below.
df3 = df3.stack().reset_index(level=1).set_index('level_1')
df3.index.name = None
df3 = df3.T
Output
df3
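To see what these steps produce, here is a small sketch on a made-up two-row frame (the column names `A` and `B` and the values are assumptions, since the asker's frame is not shown):

```python
import pandas as pd

# Hypothetical two-row frame standing in for the asker's df3
df3 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# stack() moves the column labels into an inner index level;
# reset_index(level=1) turns that level into a column named 'level_1'
stacked = df3.stack().reset_index(level=1)

# Use the old column labels as the index, drop the index name, then transpose
one_row = stacked.set_index("level_1")
one_row.index.name = None
one_row = one_row.T
print(one_row)
```

The result is a single row whose column labels repeat (`A, B, A, B`), one pair per original row. If unique labels are needed, they would have to be renamed afterwards.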
I am looking to split the columns of this data frame into multiple data frames, each with the date column and one of the other columns. How do I write a function that automates this? We would end up with n data frames, n being the number of columns in the original data frame minus 1 (the date column).
The first thing to do is set the date column as the index (set_index returns a new frame, so assign the result):
df = df.set_index('Date')
Then, when you filter the data frame by a single column you will get a series object with the date and your column of interest:
e.g. df.P19245Y8E will give a series of the second column.
I think this will do what you need, but if you really want to create separate dataframes for each column then you just iterate through the columns:
new_dfs = []
for col in df.columns:
    new_dfs.append(df[[col]])  # double brackets keep a one-column DataFrame instead of a Series
or with a list comprehension:
new_dfs = [df[[col]] for col in df.columns]
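If the date should stay a regular column in every piece, as the question asks, a small helper can do the split. This is only a sketch: `split_by_column` is a hypothetical name, the sample values and the `OTHER` column are made up, and it assumes `Date` has not been moved into the index.

```python
import pandas as pd

# Hypothetical sample frame; only the 'Date' and 'P19245Y8E' names come from the thread
df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-01", "2021-01-02"]),
    "P19245Y8E": [1.0, 2.0],
    "OTHER": [3.0, 4.0],
})

def split_by_column(frame, date_col="Date"):
    """Return {column name: DataFrame holding date_col plus that one column}."""
    value_cols = [c for c in frame.columns if c != date_col]
    return {c: frame[[date_col, c]].copy() for c in value_cols}

parts = split_by_column(df)
print(parts["P19245Y8E"])
```

A dict keyed by column name is used instead of a list so each piece can be looked up by the column it came from.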
I have a data frame that I transposed, and it looks like this
I would like to know how I can fill in the empty rows of each group, as in the example below
The first column is filled with the first value down to the last empty row.
How can I do this if the column is grouped?
In your case, repeat the indices of your data frame five times, save them in a new column, and then make that column the index.
ibov_transpose['index'] = ibov_transpose.index.repeat(5)
ibov_transpose = ibov_transpose.set_index('index')  # set_index returns a new frame and moves 'index' out of the columns, so no del is needed
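A different technique that may fit the description in the question ("filled with the first value until the last empty row") is a forward fill. This is a sketch on made-up data, not the asker's frame: the `group` and `value` column names are assumptions, and it assumes the empty cells are NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: each label appears once, followed by empty (NaN) cells
ibov_transpose = pd.DataFrame({
    "group": ["A", np.nan, np.nan, "B", np.nan],
    "value": [1, 2, 3, 4, 5],
})

# ffill() copies the last seen label down into the empty rows below it
ibov_transpose["group"] = ibov_transpose["group"].ffill()
print(ibov_transpose)
```

If the empty cells are empty strings rather than NaN, they would need to be replaced with NaN first for `ffill` to act on them.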
Essentially, I'd like to be able to do something like this to filter a data frame to show only rows where the value in every column is positive:
for column in df.columns:
    df = df.loc[df[column] > 0, :]
but in only one line of code. Is that possible?
This should work.
df_pos = df.loc[(df > 0).all(axis=1)]
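A short worked example of the one-liner, on made-up numbers (the frame and the name `positive_rows` are assumptions for illustration):

```python
import pandas as pd

# Hypothetical numeric frame
df = pd.DataFrame({"a": [1, -1, 2], "b": [3, 4, 0]})

# (df > 0) is a boolean frame of the same shape;
# .all(axis=1) is True only for rows where every column is positive
positive_rows = df.loc[(df > 0).all(axis=1)]
print(positive_rows)
```

Only the first row survives here: the second has a negative value and the third contains a zero, which is not strictly positive.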
I have two data frames that I imported as spreadsheets into Pandas and cleaned up. They have a similar key value called 'PurchaseOrders' that I am using to match product numbers to a shipment number. When I attempt to merge them, I only end up with a df of 34 rows, but I have over 400 pairs of matching product to shipment numbers.
This is the closest I've gotten, but I have also tried using join()
ShipSheet = pd.merge(new_df, orders, how ='inner')
ShipSheet.shape
Here is my order df
orders df
and here is my new_df that I want to add to my orders df using the 'PurchaseOrders' key
new_df
In the end, I want them to look like this
end goal df
I am not sure whether I'm using the merge function improperly, but my end product should have around 300+ rows. I will note that the new_df data frame's 'PurchaseOrders' values had to be split out of a single delimited column into separate rows, so I guess this could have something to do with it.
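Use the merge function and specify the key explicitly. Without a key, merge joins on every column name the two frames have in common, not just 'PurchaseOrders'.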
merged_inner = pd.merge(left=df_left, right=df_right, left_on='PurchaseOrders', right_on='PurchaseOrders')
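One way to find out why rows go missing is to normalize the key on both sides and then inspect an outer merge with `indicator=True`. This is a sketch under assumptions: the sample frames are made up, and stray whitespace left over from splitting the delimited column is only a guess at the cause.

```python
import pandas as pd

# Hypothetical stand-ins for the two frames; note the stray spaces in new_df
orders = pd.DataFrame({"PurchaseOrders": ["1001", "1002", "1003"],
                       "Product": ["A", "B", "C"]})
new_df = pd.DataFrame({"PurchaseOrders": [" 1001", "1002 ", "1004"],
                       "Shipment": ["S1", "S2", "S3"]})

# Normalize the key on both sides: same dtype, no surrounding whitespace
for frame in (orders, new_df):
    frame["PurchaseOrders"] = frame["PurchaseOrders"].astype(str).str.strip()

# An outer merge with indicator=True adds a '_merge' column
# saying whether each key was found in the left frame, the right frame, or both
check = pd.merge(orders, new_df, on="PurchaseOrders", how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched)

# Once the keys line up, the inner merge keeps only the matching pairs
ship_sheet = pd.merge(orders, new_df, on="PurchaseOrders", how="inner")
```

Looking at `unmatched` shows exactly which purchase orders exist on only one side, which is usually quicker than comparing row counts.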
Use the pandas concat function and specify the axis.
final_df = pd.concat([new_df, orders], axis=1)
Be careful when you specify the axis: with axis=0 the second data frame is placed under the first one, and with axis=1 the second data frame is placed to the right of the first one.
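A small sketch of the two axis settings on made-up frames (all names and values here are assumptions). Note that `concat` lines rows up by index label, not by a key column such as 'PurchaseOrders', so it only pairs the right rows when both frames share the same index.

```python
import pandas as pd

# Hypothetical frames
top = pd.DataFrame({"x": [1, 2]})
bottom = pd.DataFrame({"x": [3, 4]})
side = pd.DataFrame({"y": [5, 6]})

# axis=0: the second frame goes under the first (rows are appended)
stacked = pd.concat([top, bottom], axis=0)

# axis=1: the second frame goes to the right (columns are appended, aligned on index)
side_by_side = pd.concat([top, side], axis=1)

print(stacked.shape, side_by_side.shape)
```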