When I searched for a way to remove an entire row in pandas if it contains a null/NaN value, the only appropriate function I found was dropna(). For some reason, it's not removing the entire row as intended, but instead replacing the null values with zero. Since I want to discard the entire row and then compute the mean age of the animals in the DataFrame, I need a way to not count the NaN values.
Here's the code:
import numpy as np
import pandas as pd
data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, labels)
df.dropna(inplace=True)
df.head()
In this case, I need to delete dog 'd' and cat 'h'. But the output that comes out is:
Note that I have also tried this, and it didn't work either:
df2 = df.dropna()
You have to specify axis=1 and how='any' to remove columns.
See: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dropna.html
df.dropna(axis=1, inplace=True, how='any')
If you just want to delete the rows:
df.dropna(inplace=True, how='any')
I am having trouble applying filters with pandas. The problem looks like this.
The first entry in filter_names should correspond to the first entry in filter_values, and the value in the column named by the second entry should be greater than or equal to the value given.
In other words, in the input like this:
df = pd.DataFrame({'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
'name': ['Murzik', 'Pushok', 'Kaa', 'Bobik', 'Strelka', 'Vaska', 'Kaa2', 'Murka', 'Graf', 'Muhtar'],
'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']},
index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
filter_names = ["animal", "age"]
filter_values = ["cat", 3]
the condition to be put in the query looks like this:
"cat"=="animal", "age"<3.
It should provide the DF below:
  animal  age    name  visits priority
a    cat  2.5  Murzik       1      yes
f    cat  2.0   Vaska       3       no
I wrote the following code to achieve this effect:
df_filtered = df[(filter_names[0]==filter_values[0])&(df[filter_names[1]]>=filter_values[1])]
to no avail.
What do I seem to be missing?
I think you lost the df[...] in the first condition and used the wrong sign in the second one:
df[(df[filter_names[0]] == filter_values[0]) & (df[filter_names[1]] < filter_values[1])]
It will work like this:
In [2]: df[(df[filter_names[0]] == filter_values[0]) & (df[filter_names[1]] < filter_values[1])]
Out[2]:
  animal  age    name  visits priority
a    cat  2.5  Murzik       1      yes
f    cat  2.0   Vaska       3       no
This is the dataframe:
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
I'm trying to print only the even values of a column named "attempts".
Is there a function I can use?
such as:
df["attempts"].even()
You can use
df.attempts[df.attempts % 2 == 0]
If you look at df.attempts % 2 == 0, you'll see it is a Series of True and False values. We then use it as a boolean mask to select the entries that leave a remainder of 0 when divided by 2, i.e., the even ones.
Use DataFrame.loc to filter by a boolean mask: compare the column modulo 2 with 0 and pass the column name:
df = pd.DataFrame(exam_data)
print (df.loc[df.attempts % 2 == 0, 'attempts'])
2 2
4 2
8 2
Name: attempts, dtype: int64
For odd values compare by 1:
print (df.loc[df.attempts % 2 == 1, 'attempts'])
0 1
1 3
3 3
5 3
6 1
7 1
9 1
Name: attempts, dtype: int64
You can try with the following:
print(df[df['attempts']%2==0]['attempts'].values)
Example df:
df = pd.DataFrame({
'group': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b',],
'value': ['yes', 'yes', 'yes', 'no', 'no', 'yes', 'no', 'no', 'no']
})
I need to count each value in group and get something like this:
group yes no
a 3 2
b 1 3
I tried df.groupby(['group', 'value'])['value'].count().to_frame() and it looks fine, but it is too multi-indexed; I need a simple table like the example above.
Try:
df.groupby(['group', 'value'])['value'].count().reset_index(name='count')
Next, you can change the column names.
Consider the following Python dictionary data and Python list labels:
data = {'birds': ['Cranes', 'Cranes', 'plovers', 'spoonbills', 'spoonbills', 'Cranes', 'plovers', 'Cranes', 'spoonbills', 'spoonbills'],
'age': [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
'visits': [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
Create a DataFrame birds from this dictionary data, using labels as the index, with pandas.
Assuming your dictionary values are already in the correct order for the labels:
import numpy as np
import pandas as pd
data = {'birds': ['Cranes', 'Cranes', 'plovers', 'spoonbills', 'spoonbills', 'Cranes', 'plovers', 'Cranes', 'spoonbills', 'spoonbills'],
'age': [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
'visits': [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
data['labels'] = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, columns=['birds', 'age', 'visits', 'priority', 'labels'])
df = df.set_index('labels')
Try the code below:
import numpy as np
import pandas as pd

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
data = {
'birds': ['Cranes', 'Cranes', 'plovers', 'spoonbills', 'spoonbills', 'Cranes', 'plovers', 'Cranes', 'spoonbills', 'spoonbills'],
'age': [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
'visits': [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
'labels' : labels
}
df = pd.DataFrame.from_dict(data)
df = df.set_index('labels')
You can reduce the code a bit:
DataFrame gives us the flexibility to pass in values such as data, columns, index, and so on.
If we pass a dictionary, by default its keys are treated as column names and its values become the rows.
In the following code I use the index's name attribute on the DataFrame object:
df = pd.DataFrame(data, index=labels)  # custom index from the labels list
df.index.name = 'labels'               # the index has no name by default; this sets it to 'labels'
I hope this will be helpful for you.
I encountered the exact same issue a few days back, and there is a very nice library for handling dataframes that I find better than pandas.
Search for turicreate in Python; it is very similar to pandas but has a lot more to offer.
You can define SFrames in Turi Create, somewhat similar to the pandas DataFrame. After that you just have to run:
dataframe_name.show()
.show() visualizes any data structure in Turi Create.
You can visit the mentioned notebook for a better understanding: https://colab.research.google.com/drive/1DIFmRjGYx0UOiZtvMi4lOZmaBMnu_VlD
You can try out this.
import pandas as pd
import numpy as np
from pandas import DataFrame
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
data = {'birds': ['Cranes', 'Cranes', 'plovers', 'spoonbills', 'spoonbills', 'Cranes', 'plovers', 'Cranes', 'spoonbills', 'spoonbills'],
'age': [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
'visits': [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
df = DataFrame(data, index=labels)
print(df)