This question already has an answer here:
Pandas select rows and columns based on boolean condition
(1 answer)
Closed 4 years ago.
What is the pandas equivalent of
SELECT Column2
FROM DF
WHERE column3 = "value"
when we are using a DataFrame? Thank you.
You can use .loc with a boolean condition on a DataFrame to select the relevant rows, similar to a WHERE clause. You can pull out your desired column(s) using a second argument to .loc, e.g.
df.loc[df['column3'] == 'value', ['column2']]
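For instance, a minimal runnable sketch with made-up data (the column names are placeholders taken from the question):

import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    'column2': [10, 20, 30],
    'column3': ['value', 'other', 'value'],
})

# Equivalent of: SELECT Column2 FROM DF WHERE column3 = "value"
result = df.loc[df['column3'] == 'value', ['column2']]
print(result)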
This question already has answers here:
How do I melt a pandas dataframe?
(3 answers)
Closed 7 months ago.
I have a table in csv that looks like this:
I want to transpose it to look like this, where the columns are now rows of a new column called ACCOUNTLABEL, and the values are in a corresponding column called VALUE:
Any help? Thanks!
You might want to look at the pandas.melt function: https://pandas.pydata.org/docs/reference/api/pandas.melt.html
I wouldn't call that a 'transposition' but 'un-pivoting' a table.
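Since the table itself was posted as an image, here is a minimal sketch with made-up account columns, assuming one identifier column and the rest to be un-pivoted:

import pandas as pd

# Made-up wide table; the real column names come from the CSV
df = pd.DataFrame({
    'DATE': ['2021-01', '2021-02'],
    'CASH': [100, 110],
    'RECEIVABLES': [200, 210],
})

# Un-pivot: every non-id column becomes a row under ACCOUNTLABEL,
# with its value in the VALUE column
long_df = df.melt(id_vars=['DATE'], var_name='ACCOUNTLABEL', value_name='VALUE')
print(long_df)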
Edit: I just noticed that your question has nothing to do with transposing a DataFrame, but I will leave this here, in case it helps.
Use df.T for this. It uses the linked method.
I didn't downvote your question, but someone did because the provided link is the first search result if you google 'transpose pandas dataframe'.
This question already has answers here:
How can repetitive rows of data be collected in a single row in pandas?
(3 answers)
pandas group by and find first non null value for all columns
(3 answers)
Closed 7 months ago.
Implementing the logic with iterrows takes a lot of time. Can someone suggest how I could optimize the code with vectorized operations or apply()?
Below is the input table. From a partition of (ITEMSALE, ITEMID), I need to populate rows with rank=1. If any column value is null in rank=1, I need to populate the next available value in that column. This has to be done for all columns in the dataset.
Below is the expected output format.
I have tried the logic below using iterrows, where I am accessing values row-wise. Performance is too low with this method.
This should get you what you need:
df.loc[df.loc[df['Item_ID'].isna()].groupby('Item_Sale')['Date'].idxmin()]
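If the goal is the first non-null value per column within each partition, note that GroupBy.first() already skips nulls column by column; a sketch with column names guessed from the question:

import numpy as np
import pandas as pd

# Toy data; the real columns come from the question's table
df = pd.DataFrame({
    'Item_Sale': ['A', 'A', 'B', 'B'],
    'Item_ID': [np.nan, 1, 2, np.nan],
    'Price': [10.0, np.nan, np.nan, 5.0],
})

# first() returns the first non-null value in each column per group,
# which avoids iterrows entirely
out = df.groupby('Item_Sale', as_index=False).first()
print(out)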
This question already has answers here:
Python renaming Pandas DataFrame Columns
(4 answers)
Multiple aggregations of the same column using pandas GroupBy.agg()
(4 answers)
Closed 11 months ago.
I'm new to Python. I used the code...
StoreGrouper= DonHenSaless.groupby('Store')
StoreGrouperT= StoreGrouper["SalesDollars"].agg(np.sum)
StoreGrouperT.rename(columns={SalesDollars:TotalSalesDollars})
to group stores and sum SalesDollars, then rename SalesDollars to TotalSalesDollars. It produced the following error...
NameError: name 'SalesDollars' is not defined
I also tried using quotes
StoreGrouper= DonHenSaless.groupby('Store')
StoreGrouperT= StoreGrouper["SalesDollars"].agg(np.sum)
StoreGrouperT= StoreGrouperT.rename(columns={'SalesDollars':'TotalSalesDollars'})
This produced the error: rename() got an unexpected keyword argument 'columns'
Here is my df
In order to rename a column you need quotes, so it would be:
StoreGrouperT.rename(columns={'SalesDollars':'TotalSalesDollars'})
Also, I usually assign it to a variable:
StoreGrouperT = StoreGrouperT.rename(columns={'SalesDollars':'TotalSalesDollars'})
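For what it's worth, the second error ("rename() got an unexpected keyword argument 'columns'") likely happens because aggregating a single column returns a Series, which has no columns to rename. A sketch, with a made-up stand-in for DonHenSaless:

import pandas as pd

# Toy stand-in for DonHenSaless
DonHenSaless = pd.DataFrame({
    'Store': ['A', 'A', 'B'],
    'SalesDollars': [100.0, 50.0, 75.0],
})

# A single-column aggregation yields a Series, not a DataFrame
StoreGrouperT = DonHenSaless.groupby('Store')['SalesDollars'].sum()

# Rename the Series itself, then convert back to a DataFrame
StoreGrouperT = StoreGrouperT.rename('TotalSalesDollars').reset_index()
print(StoreGrouperT)  # columns: Store, TotalSalesDollars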
Use the pandas rename method to change the column name. You can also set inplace to True if you want the change reflected in the DataFrame rather than reassigning it to the df variable:
df.rename(columns={'old_name':'new_name'}, inplace=True)
This question already has answers here:
pandas read_csv remove blank rows
(4 answers)
Closed 1 year ago.
I’ve read a file into a dataframe, and every second row is n/a. How do I remove the offending blank rows?
I am assuming there are many ways to do this, but I just use iloc:
df = df.iloc[::2,:]
Try it and let me know if it worked for you.
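The slice above assumes the blanks really do land on every second row; if the offending rows are simply all-NaN, dropna is more robust. A quick sketch:

import numpy as np
import pandas as pd

# Toy frame with an all-NaN row standing in for the blank lines
df = pd.DataFrame({'a': [1, np.nan, 3], 'b': ['x', np.nan, 'z']})

# Drop only the rows where every value is NaN
df = df.dropna(how='all')
print(df)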
This question already has answers here:
How to filter Pandas dataframe using 'in' and 'not in' like in SQL
(11 answers)
Closed 5 years ago.
This should be incredibly easy, but I can't get it to work.
I want to filter my dataset on two or more values.
#this works, when I filter for one value
df.loc[df['channel'] == 'sale']
#if I have to filter, two separate columns, I can do this
df.loc[(df['channel'] == 'sale')&(df['type']=='A')]
#but what if I want to filter one column by more than one value?
df.loc[df['channel'] == ('sale','fullprice')]
Would this have to be an OR statement? Can I do something like SQL's IN?
There is a df.isin(values) method which tests whether each element in the DataFrame is contained in values.
So, as #MaxU wrote in the comment, you can use
df.loc[df['channel'].isin(['sale','fullprice'])]
to filter one column by multiple values.
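And for the NOT IN case from the linked duplicate, negating the mask with ~ works; a quick sketch with made-up data:

import pandas as pd

df = pd.DataFrame({'channel': ['sale', 'fullprice', 'outlet'],
                   'units': [1, 2, 3]})

# SQL: WHERE channel IN ('sale', 'fullprice')
print(df.loc[df['channel'].isin(['sale', 'fullprice'])])

# SQL: WHERE channel NOT IN ('sale', 'fullprice')
print(df.loc[~df['channel'].isin(['sale', 'fullprice'])])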