Pandas: print the mark if the name exists - python

Name   Mark
Ben    20
James  50
Jimmy  70
I have a dataframe which looks something like this. I want to check whether a name exists and, if it does, print the mark for that specific person.
if len(df[(df['Name'] == "James")]) != 0:
    print(len(df["Mark"]))
Above is my code. Hope to get some advice!

Better to use a Series here, with get and a default argument:
marks = df.set_index('Name')['Mark']
marks.get('James', 'missing')
# 50
marks.get('Nonexistent', 'missing')
# missing
Or without default, get returns None:
marks.get('Nonexistent') # no output

You can return the mark of a specified name in your Name column using loc. The below will print the Mark of the name you pass, and will return an empty series if the name does not exist in your Name column:
name_to_retrieve_mark = 'Ben'
df.loc[df.Name.eq(name_to_retrieve_mark),'Mark']
Out[13]:
0    20
Name: Mark, dtype: int64
name_to_retrieve_mark = 'Sophocles'
df.loc[df.Name.eq(name_to_retrieve_mark),'Mark']
Out[15]: Series([], Name: Mark, dtype: int64)
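If you want the original check-then-print flow, here is a minimal sketch building on the loc lookup above (assuming the three-row df from the question):
result = df.loc[df.Name.eq('James'), 'Mark']
if not result.empty:
    print(result.iloc[0])  # 50
else:
    print('missing')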

Related

Pandas remove every entry with a specific value

I would like to go through every row (entry) in my df and remove every entry that has the value "" (which, yes, is an empty string).
So if my data set is:
Name  Gender  Age
Jack          5
Anna  F       6
Carl  M       7
Jake  M       7
Therefore Jack would be removed from the dataset.
On another note, I would also like to remove entries that have the values "Unspecified" and "Undetermined" as well.
Eg:
Name  Gender  Age  Address
Jack          5    *address*
Anna  F       6    *address*
Carl  M       7    Undetermined
Jake  M       7    Unspecified
Now,
Jack will be removed due to empty field.
Carl will be removed due to the value Undetermined present in a column.
Jake will be removed due to the value Unspecified present in a column.
For now, this has been my approach but I keep getting a TypeError.
list = []
for i in df.columns:
    if df[i] == "":
        # every time there is an empty string, add 1 to list
        list.append(1)
# count list to see how many entries there are with an empty string
len(list)
Please help me with this. I would prefer a for loop, since there are about 22 columns and 9,000+ rows in my actual dataset.
Note - I do understand that there are other questions like this; it's just that none of them apply to my situation: most of them only work for a few columns, and I do not wish to hardcode all 22 columns.
Edit - Thank you for all your feedbacks, you all have been incredibly helpful.
To delete a row based on a condition, use the following:
df = df.drop(df[condition].index)
For example:
df = df.drop(df[df.Age == 5].index) will drop the rows where Age is 5.
I've come across a post about the same thing dating back to 2017; it should help you understand this more clearly.
Regarding question 2, here's how to remove rows with the specified values in a given column:
df = df[~df["Address"].isin(("Undetermined", "Unspecified"))]
Let's assume we have a Pandas DataFrame object df.
To remove every row given your conditions, simply do:
df = df[~(df.Gender.eq("") | df.Age.eq("") | df.Address.isin(["", "Undetermined", "Unspecified"]))]
If the unspecified fields are NaN, you can also do:
df = df.dropna(how="any", axis=0)
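If they are not NaN yet, one option (a sketch, assuming the sentinel values above) is to convert them to NaN first and let dropna do the rest:
import numpy as np

df = df.replace(["", "Undetermined", "Unspecified"], np.nan).dropna(how="any", axis=0)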
The answers from #ThatCSFresher and #Bence will help you remove rows based on a single column, which is great! However, your query has multiple conditions that need to be checked across multiple columns at once. apply with a lambda can do that job; try the following code:
df = pd.DataFrame({"Name":["Jack","Anna","Carl","Jake"],
"Gender":["","F","M","M"],
"Age":[5,6,7,7],
"Address":["address","address","Undetermined","Unspecified"]})
df["Noise_Tag"] = df.apply(lambda x: "Noise" if ("" in list(x)) or ("Undetermined" in list(x)) or ("Unspecified" in list(x)) else "No Noise",axis=1)
df1 = df[df["Noise_Tag"] == "No Noise"]
del df1["Noise_Tag"]
# Output of df:
   Name Gender  Age       Address Noise_Tag
0  Jack            5       address     Noise
1  Anna      F    6       address  No Noise
2  Carl      M    7  Undetermined     Noise
3  Jake      M    7   Unspecified     Noise
# Output of df1:
   Name Gender  Age  Address
1  Anna      F    6  address
Well, OP actually wants to delete any row with an "empty" string.
df = df[~(df=="").any(axis=1)] # deletes all rows that have empty string in any column.
If you want to delete based specifically on the Address column, then you can just use
df = df[~df["Address"].isin(("Undetermined", "Unspecified"))]
Or, if any column may contain Undetermined or Unspecified, do the same as the first solution in my post, just replacing the empty string with Undetermined or Unspecified.
df = df[~((df=="Undetermined") | (df=="Unspecified")).any(axis=1)]
You can build masks and then filter the df according to it:
m1 = df.eq('').any(axis=1)
# m1 is True if any cell in a row has an empty string
m2 = df['Address'].isin(['Undetermined', 'Unspecified'])
# m2 is True if a row has one of the values in the list in column 'Address'
out = df[~m1 & ~m2]  # invert both conditions to get the desired output
print(out)
Output:
   Name Gender  Age    Address
1  Anna      F    6  *address*
Used Input:
df = pd.DataFrame({'Name': ['Jack', 'Anna', 'Carl', 'Jake'],
                   'Gender': ['', 'F', 'M', 'M'],
                   'Age': [5, 6, 7, 7],
                   'Address': ['*address*', '*address*', 'Undetermined', 'Unspecified']})
Using a lambda function
Code:
df[df.apply(lambda x: False if (x.Address in ['Undetermined', 'Unspecified'] or '' in list(x)) else True, axis=1)]
Output:
   Name Gender  Age    Address
1  Anna      F    6  *address*

How would I groupby and see if all members of the group meet a certain condition?

I want to groupby and see if all members in the group meet a certain condition. Here's a dummy example:
x = ['Mike','Mike','Mike','Bob','Bob','Phil']
y = ['Attended','Attended','Attended','Attended','Not attend','Not attend']
df = pd.DataFrame({'name':x,'attendance':y})
And what I want to do is return a 3x2 dataframe that shows for each name, who was always in attendance. It should look like below:
new_df = pd.DataFrame({'name':['Mike','Bob','Phil'],'all_attended':[True,False,False]})
What's the best way to do this?
Thanks so much.
Let's try
out = (df['attendance'].eq('Attended')
         .groupby(df['name']).all()
         .to_frame('all_attended').reset_index())
print(out)
name all_attended
0 Bob False
1 Mike True
2 Phil False
One way could be:
df.groupby('name')['attendance'].apply(lambda x: (x.unique() == 'Attended').all())
name
Bob False
Mike True
Phil False
Name: attendance, dtype: bool
I would stay away from strings for data that does not need to be a string:
z = [s == 'Attended' for s in y]
df = pd.DataFrame({'name': x, 'attended': z})
Now you can check if all the elements for a given group are True:
>>> df.groupby('name')['attended'].all()
name
Bob     False
Mike     True
Phil    False
Name: attended, dtype: bool
If something can only be a 0 or 1, using a string introduces the possibility of errors because someone might type Atended instead of Attended, for example.
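If you do keep strings, a quick sanity check on the original 'attendance' column along these lines (a sketch, assuming 'Attended' and 'Not attend' are the only valid values) catches such typos early:
valid = {'Attended', 'Not attend'}
bad = df.loc[~df['attendance'].isin(valid), 'attendance']
if not bad.empty:
    print('Unexpected values:', bad.unique())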

Put a value in variable using Pandas

I have this csv file :
     Names  Credit
0    James      21
1     John      30
2    Lucas      20
3  William      11
What I want to do using Pandas is: if I enter any name, like John, I want to put his Credit into a variable to do some math with it.
I'm trying this:
import pandas as pd
df = pd.read_csv('file.csv')
n = input('Enter a name: ')
x = df[df['Names'] == n]['Credit']
print(x)
but it doesn't work for me:
Enter a name: John
1 30
Name: Credit, dtype: int64
(I'm trying to get just the number: 30)
You can .squeeze() that last dimension:
>>> df[df.Names == 'John']['Credit'].squeeze()
30
Note that .squeeze() only returns a scalar when exactly one row matches; with zero or several matches you get a Series back.
What you want is loc:
n = input('Enter a name: ')
x = df.loc[df['Names'] == n, 'Credit']
x is a pandas Series; if you want only one value, you can do it like this:
x = df[df['Names'] == n]['Credit'].tolist()[0]
But if you have two "John" in your data frame you will only get the credit for the first, thus make sure your Names column is always unique. If it is always unique, consider doing the following:
df = df.set_index('Names', drop=True)
This will make 'Names' the index of your data frame, and then you can get the credit more easily in the following way:
x = df.loc[n, 'Credit']
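Putting it all together with the lookup-miss case handled, a minimal sketch (assuming the file and column names above):
import pandas as pd

df = pd.read_csv('file.csv')
credits = df.set_index('Names')['Credit']

n = input('Enter a name: ')
x = credits.get(n)  # None if the name is missing
if x is None:
    print(n, 'not found')
else:
    print(x * 2)  # now you can do math with it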

How to replace the entry of a column with different name by recognizing a pattern?

I have a column, let's say 'Match Place', with entries like 'MANU # POR', 'MANU vs. UTA', 'MANU # IND', 'MANU vs. GRE', etc. So each entry has three parts: the first is the code MANU, the second is '#' or 'vs.', and the third is the second country's code. What I want is: if '#' appears in an entry, the whole entry should be changed to 'away', and if 'vs.' appears, it should be changed to 'home'. So 'MANU # POR' should become 'away' and 'MANU vs. GRE' should become 'home'.
I wrote some code to do this using for/if/else, but it is taking way too long to compute, and my dataset has 30697 rows in total. Is there any other way to reduce the time?
Below is my code.
Please help.
for i in range(len(df)):
    if is_na(df['home/away'][i]) == True:
        temp = (df['home/away'][i]).split()
        if temp[1] == '#':
            df['home/away'][i] = 'away'
        else:
            df['home/away'][i] = 'home'
You can use np.select to assign multiple conditions:
s = df['Match Place'].str.split().str[1]  # select the middle element
c1, c2 = s.eq('#'), s.eq('vs.')           # assign the conditions
np.select([c1, c2], ['away', 'home'])     # assign this to the desired column
# array(['away', 'home', 'away', 'home'], dtype='<U11')
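The result still needs to be written back to the column; a one-line sketch (assuming the column is named 'Match Place'):
df['Match Place'] = np.select([c1, c2], ['away', 'home'])
Here every row matches one of the two conditions, so no default value is needed.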
Use np.where with str.contains to check whether the substring exists or not:
import numpy as np
import pandas as pd

df = pd.DataFrame(data={"col1": ["manu vs. abc", "manu # pro"]})
df['type'] = np.where(df['col1'].str.contains("#"), "away", "home")
           col1  type
0  manu vs. abc  home
1    manu # pro  away
You can use .str.contains(..) [pandas-doc] to check if the string contains an #, and then use .map(..) [pandas-doc] to fill in values accordingly. For example:
>>> df
match
0 MANU # POR
1 MANU vs. UTA
2 MANU # IND
3 MANU vs. GRE
>>> df['match'].str.contains('#').map({False: 'home', True: 'away'})
0 away
1 home
2 away
3 home
Name: match, dtype: object
A fun usage of replace (for more info, check the docs): because the replacement value 0 is not a string, any cell matching the '#' pattern is replaced wholesale with 0, which astype(bool) then turns into False:
df['match'].replace({'#':0},regex=True).astype(bool).map({False: 'away', True: 'home'})
0 away
1 home
2 away
3 home
Name: match, dtype: object

Scalar error when swapping out a hard coded string for a variable with pandas

df1 looks something like this:
    name  age
1  Bobby   17
2  Sally   23
3   John   19
df2 looks like this:
    name       city state
1  Bobby   Lakeside    MN
2  Sally  Carlstown    MS
3   John  Wallsburg    UT
I am looping through a DataFrame, df1, like this:
for row in df1.itertuples(name='Pandas', index=True):
    name = getattr(row, "name")
    print(type(name))
    print(name)
and I will get (as expected):
<type 'str'>
Bobby
<type 'str'>
Sally
<type 'str'>
John
Then I am searching a second dataframe, df2, and getting its row location (index) number, so I can get additional information.
i = df2[(df2['name'] == "Bobby")].index.item()
i is now the integer... worked like a champ. It found Bobby in the other DataFrame, df2, and voilà! Gave me the index number.
However... if I try swapping out the hard coded string "Bobby" to the variable like this...
for row in df1.itertuples(name='Pandas', index=True):
    name = getattr(row, "name")
    i = df2[(df2['name'] == name)].index.item()
then it explodes and dies.
for row in df1.itertuples(name='Pandas', index=True):
    name = getattr(row, "name")
    i = df2[(df2['name'] == str(name))].index.item()
I get the following exception:
ValueError: can only convert an array of size 1 to a Python scalar
I am at a complete loss, help! and Thank you!
Your logic seems overcomplicated. For the record, the ValueError comes from .index.item(): it requires the mask to match exactly one row, and with zero or multiple matches the index cannot be converted to a scalar. That aside, you can create a name-to-age mapping from df1 and iterate with df2.iterrows. There is no need to access indices, unless you have repeated names. In the latter case, you can use the index.
s = df1.set_index('name')['age']
for _, row in df2.iterrows():
    print('{0} who is {1} lives in {2}'.format(row['name'], s.get(row['name']), row.city))
Bobby who is 17 lives in Lakeside
Sally who is 23 lives in Carlstown
John who is 19 lives in Wallsburg
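If the end goal is just to combine the two frames, a merge avoids the manual lookups entirely (a sketch, assuming names are unique):
merged = df2.merge(df1, on='name', how='left')
print(merged)
#     name       city state  age
# 0  Bobby   Lakeside    MN   17
# 1  Sally  Carlstown    MS   23
# 2   John  Wallsburg    UT   19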
