I am trying to fetch values from an Excel file using a pandas DataFrame and print out the value of each cell. I'm using read_excel() to populate the DataFrame, and I am looking for specific rows with the following line of code:
df.loc[df['Parcel_ID'] == parcel]
where parcel is an arbitrary input from the user. I then simply use this to print out the cells:
row = df.loc[df['Parcel_ID'] == parcel]
print(row['Category'])
print(row['Sub_Category'])
... (more values)
What I want is only the values from the cells, yet I get dtypes, column names, and other junk that I don't want to see. How do I print out only the value from each cell?
If you have several values in your row, you could use the following:
row['Category'].values.tolist()
row['Sub_Category'].values.tolist()
IIUC the following should work:
print(row['Category'].iloc[0])
print(row['Sub_Category'].iloc[0])
What is returned is a Series; in your case a Series with a single element, which you index into with .iloc[0] to return a scalar value.
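For example, a minimal sketch with made-up data along the lines of the question:
import pandas as pd

df = pd.DataFrame({'Parcel_ID': ['A1', 'B2'],
                   'Category': ['Residential', 'Commercial']})
row = df.loc[df['Parcel_ID'] == 'B2']
print(row['Category'])          # a Series: prints index, value, name and dtype
print(row['Category'].iloc[0])  # just the scalar value: Commercial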
How to find a value in one column (Column2name) using a value in another column (Column1name):
import pandas as pd

df = pd.read_excel('file.xlsm', sheet_name='SheetName')  # read an Excel sheet
Then you have two ways to get that:
First Way:
Value_Target = df.loc[df['Column1name'] == 'Value_Key']['Column2name'].values
Second Way:
Value_Target = df['Column2name'][df['Column1name'] == 'Value_Key'].values[0]
Value_Key is the value you already have (in Column1name); Value_Target is the value you want to find (in Column2name) using Value_Key.
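A quick toy illustration (made-up data, just to show the difference: the first way returns an array, the second a scalar):
import pandas as pd

df = pd.DataFrame({'Column1name': ['x', 'y', 'z'],
                   'Column2name': ['red', 'green', 'blue']})
# both ways look up the row where Column1name == 'y' and read Column2name
print(df.loc[df['Column1name'] == 'y']['Column2name'].values)   # ['green'] (an array)
print(df['Column2name'][df['Column1name'] == 'y'].values[0])    # green (a scalar)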
I've been trying to search for a specific keyword in a specific CSV column and print the value found, but when I print the result it returns the whole row.
I have tried the data.filter function, but I didn't have much luck.
What I plan to do is:
Get to the right column
Search for a value in the specific column
Print a value from another named column but not the whole row.
The code:
import pandas as pd
# list only specific columns
list_columns = ["Name", "Size", "Colour"]
# read the CSV, keeping only the columns defined above
data = pd.read_csv("Database.csv", usecols=list_columns)
## print a specific column only
#print(data["Colour"])
# define the keyword, and where/what to look for
keyword = data[data["Name"] == 'Jeans']
# print the value found
print(keyword.head())
And the output:
Name Colour Size
0 Jeans Black X-Large
I feel like I need to tell it to print only a value from a specific column instead of keyword.head() - is that correct?
From the comments:
Use
print(*keyword["Size"], sep='\n')
If you want to store the results in a variable you can use
lst = [*keyword["Size"]]
or even better,
lst = keyword['Size'].tolist()
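Putting it together as a minimal sketch, assuming the Database.csv layout shown in the question:
import pandas as pd

list_columns = ["Name", "Size", "Colour"]
data = pd.read_csv("Database.csv", usecols=list_columns)
keyword = data[data["Name"] == 'Jeans']

print(*keyword["Size"], sep='\n')   # one size per line, no index or dtype
sizes = keyword['Size'].tolist()    # e.g. ['X-Large'] for the row shown above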
I have a use case where I need to fill a new pandas column with the contents of a specific cell in the same table. There are 60 countries in Europe, so I need to fill a shared currency column with the contents of one country's currency (as an example only).
I need the pandas equivalent of an SQL "WHERE" clause that:
1. Searches the dataframe rows for the single occurrence of "Britain" in column "country"
2. Returns a single, unique value "pound" from df['currency'].
3. Creates a new column filled with just this value = string "pound"
My (not working) attempt:
w['Euro_currency'] = w['Euro_currency'].map(w.loc["country"]=="Britain"["currency"])
# [Britain][currency] - contains the value "Pound"
When this works correctly, every row in the new column 'Euro_currency' should contain the value "pound".
How about you take the value from that cell and just create a new column with it as below:
p = w.loc["Britain"]["currency"]   # note: this assumes the dataframe is indexed by country
w['Euro_currency'] = p
Does this work for you?
Thanks for the help. I found this answer by @anton-protopopov at "extract column value based on another column pandas dataframe":
currency_value = df.loc[df['country'] == 'Britain', 'currency'].iloc[0]
df['currency_britain'] = currency_value
@anderson-zhu also mentioned that .item() would work as well:
currency_value = df.loc[df['country'] == 'Britain', 'currency'].item()
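One difference worth noting: .item() raises an error unless there is exactly one match, while .iloc[0] silently takes the first of several. A minimal sketch with made-up data:
import pandas as pd

df = pd.DataFrame({'country': ['France', 'Britain'],
                   'currency': ['euro', 'pound']})
currency_value = df.loc[df['country'] == 'Britain', 'currency'].item()  # 'pound'
df['currency_britain'] = currency_value  # every row now holds 'pound'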
I'm trying to clean an Excel file that has some random formatting. The file has blank rows at the top, with the actual column headings at row 8. I've gotten rid of the blank rows, and now want to use the row 8 strings as the true column headings in the dataframe.
I use this code to get the position of the column headings by searching for the string 'Destination' in the whole dataframe, and then take the location of the True value in the Boolean mask to get the list for renaming the column headers:
boolmsk = df.apply(lambda row: row.astype(str).str.contains('Destination').any(), axis=1)
print(boolmsk)
hdrindex = boolmsk.index[boolmsk == True].tolist()
print(hdrindex)
hdrstr = df.loc[7]
print(hdrstr)
df2 = df.rename(columns=hdrstr)
However, when I try to use hdrindex as a variable, I get errors when the second dataframe is created (i.e. when I try to use hdrstr to replace the column headings):
boolmsk = df.apply(lambda row: row.astype(str).str.contains('Destination').any(), axis=1)
print(boolmsk)
hdrindex = boolmsk.index[boolmsk == True].tolist()
print(hdrindex)
hdrstr = df.loc[hdrindex]
print(hdrstr)
df2 = df.rename(columns=hdrstr)
How do I use a variable to specify an index, so that the resulting list can be used as column headings?
I assume your indicator of the actual header row in the dataframe is the string "Destination". Let's find where it is:
start_tag = df.eq("Destination").any(axis=1)
We'll keep the index of the first occurrence of the word "Destination" for further use:
start_row = df.loc[start_tag].index.min()
Using the index number we can get the list of values in the "header" row:
new_col_names = df.iloc[start_row].values.tolist()
And here we can assign the new column names to the dataframe:
df.columns = new_col_names
From here you can play with the new dataframe, the actual column names, and proper indexing:
df2 = df.iloc[start_row+1:].reset_index(drop=True)
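Put together as one runnable sketch (the file name is a stand-in; this assumes the sheet is read without headers so the real header row stays inside the data):
import pandas as pd

# read with header=None so the real header row stays inside the data
df = pd.read_excel('messy.xlsx', header=None)   # 'messy.xlsx' is a placeholder name

start_tag = df.eq("Destination").any(axis=1)         # rows containing the marker
start_row = df.loc[start_tag].index.min()            # first such row

df.columns = df.iloc[start_row].values.tolist()      # promote that row to headers
df2 = df.iloc[start_row + 1:].reset_index(drop=True) # keep only the data below it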
I have been trying to wrap my head around this for a while now and have yet to come up with a solution.
My question is: how do I change the current values in multiple columns, based on the column name, if a criterion is met?
I have survey data which has been read in from CSV as a pandas dataframe:
import pandas as pd
df = pd.read_csv("survey_data")
I have created a dictionary with the column names and the values I want in each column if the current column value is equal to 1. Each column contains 1 or NaN. Basically, any column within the data frame ending in '_SA' becomes 5, '_A' becomes 4, '_NO' becomes 3, '_D' becomes 2, and '_SD' stays at the current value 1. All of the NaN values remain as they are. This is the dictionary:
op_dict = {
    'op_dog_SA': 5,
    'op_dog_A': 4,
    'op_dog_NO': 3,
    'op_dog_D': 2,
    'op_dog_SD': 1,
    'op_cat_SA': 5,
    'op_cat_A': 4,
    'op_cat_NO': 3,
    'op_cat_D': 2,
    'op_cat_SD': 1,
    'op_fish_SA': 5,
    'op_fish_A': 4,
    'op_fish_NO': 3,
    'op_fish_D': 2,
    'op_fish_SD': 1}
I have also created a list, op_cols, of the columns within the data frame that I would like to change if the current column value is 1. I have been trying to use something like this, which iterates through the values in those columns and replaces 1 with the mapped value from the dictionary:
for i in df[op_cols]:
    if i == 1:
        df[op_cols].apply(lambda x: op_dict.get(x, x))
df[op_cols]
It doesn't raise an error, but it isn't replacing the 1 values with the corresponding values from the dictionary; they remain as 1.
Any advice on why this doesn't work, or suggestions for a more efficient approach, would be greatly appreciated.
So, if I understand your question, you want to replace all the ones in a column with 1, 2, 3, 4 or 5 depending on the column name?
I think all you need to do is iterate through your list and multiply by the value your dict returns:
for col in op_cols:
    df[col] = df[col] * op_dict[col]
This does what you describe and is far faster than replacing every value individually. NaNs will still be NaNs; you could also handle those in the loop with fillna if you like.
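For instance, a toy sketch of the multiplication trick (NaN * 5 is still NaN, so missing answers survive untouched):
import numpy as np
import pandas as pd

df = pd.DataFrame({'op_dog_SA': [1, np.nan], 'op_dog_A': [np.nan, 1]})
op_dict = {'op_dog_SA': 5, 'op_dog_A': 4}
op_cols = list(op_dict)

for col in op_cols:
    df[col] = df[col] * op_dict[col]

print(df)   # op_dog_SA becomes [5.0, NaN]; op_dog_A becomes [NaN, 4.0]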
My question is very similar to this one: Find unique values in a Pandas dataframe, irrespective of row or column location.
I am very new to coding, so I apologize in advance for any cringeworthy code.
I have a .csv file which I open as a pandas dataframe, and would like to be able to return unique values across the entire dataframe, as well as all unique strings.
I have tried:
for row in df:
    pd.unique(df.values.ravel())
This fails to iterate through rows.
The following code prints what I want:
for index, row in df.iterrows():
    if isinstance(row, object):
        print('%s\n%s' % (index, row))
However, trying to place these values into a previously defined set (myset = set()) fails when I hit a blank column (NoneType error):
for index, row in df.iterrows():
    if isinstance(row, object):
        myset.update(print('%s\n%s' % (index, row)))
I get closest to what I want when I try the following:
for index, row in df.iterrows():
    if isinstance(row, object):
        myset.update('%s\n%s' % (index, row))
However, my set then contains individual characters rather than the strings/floats/values that appear on my screen when I print above.
Someone please help point out where I fail miserably at this task. Thanks!
I think the following should work for almost any dataframe. It will extract each value that is unique in the entire dataframe.
Post a comment if you encounter a problem and I'll try to solve it.
# Replace all Nones / NaNs with empty strings - so they won't bother us later
df = df.fillna('')
# Prepare a list
list_sets = []
# Iterate over all columns (much faster than iterating over rows)
for col in df.columns:
    # List containing all the unique values of this column
    this_set = list(set(df[col].values))
    # Create a combined list
    list_sets = list_sets + this_set
# Make a set of the combined list
final_set = list(set(list_sets))
# For completion's sake, remove the empty string introduced by the fillna step
final_set.remove('')
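As a self-contained toy sketch of the same idea (with a guard so remove() doesn't fail when there were no missing values):
import pandas as pd

df = pd.DataFrame({'a': ['x', 'y', None], 'b': ['y', 'z', 'z']})
df = df.fillna('')

list_sets = []
for col in df.columns:
    list_sets = list_sets + list(set(df[col].values))

final_set = list(set(list_sets))
if '' in final_set:        # only remove the fillna placeholder if it's there
    final_set.remove('')
print(final_set)           # ['x', 'y', 'z'] in some order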
Edit:
I think I know what is happening. You must have some float columns, and fillna is failing on those, as the code I gave you was replacing missing values with an empty string. Try one of these:
df = df.fillna(np.nan)
or
df = df.fillna(0)
For the first option, you'll need to import numpy first (import numpy as np). It must already be installed, since you have pandas.