Convert variables from dataframe into nums - python

I need to convert the Sex and Embarked values into numbers.
My solution doesn't work properly: it changes the values in all columns, not just the one I filter on.
titset[titset['Sex']=='male'] = 2
titset[titset['Sex']=='female'] = 1
titset
Piece of my dataframe:
import pandas as pd
id = [1,2,3,4,5,6,7,8,9,10]
data = {
'Fare': ['male', 'female', 'male', 'male', 'female','male','male','female','male','female'],
'Embarked': ['S','C','Q','S','S','S','C','C','S','S']
}
titanic = pd.DataFrame(data=data, index=id)
titanic

Use Series.map:
titanic["Fare"] = titanic["Fare"].map({"male": 2, "female": 1})

You can use the code from @Code_Different, or you can use the DataFrame.replace function:
titanic = titanic.replace({'Fare': {'male': 2, 'female': 1}})
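The original attempt changed every column because assigning through a boolean mask on the whole frame (titset[mask] = 2) writes to all columns of the matching rows. A minimal sketch of mapping each column separately is below. The column name 'Fare' follows the sample frame in the question, and the numeric codes for Embarked are an assumption for illustration, since the question does not say which numbers are wanted.

```python
import pandas as pd

titanic = pd.DataFrame(
    {
        "Fare": ["male", "female", "male"],  # column name as in the question's sample
        "Embarked": ["S", "C", "Q"],
    },
    index=[1, 2, 3],
)

sex_codes = {"male": 2, "female": 1}
embarked_codes = {"S": 0, "C": 1, "Q": 2}  # assumed codes, not given in the question

# Map one column at a time; values missing from a dictionary become NaN.
encoded = titanic.copy()
encoded["Fare"] = encoded["Fare"].map(sex_codes)
encoded["Embarked"] = encoded["Embarked"].map(embarked_codes)
print(encoded)
```

Because map only touches the Series it is called on, the other columns are left unchanged.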

Related

Using 'isin' in python for three filters

I have the following dataframe
# Import pandas library
import pandas as pd
import numpy as np
# initialize list elements
data = ['george',
'instagram',
'nick',
'basketball',
'tennis']
# Create the pandas DataFrame with column name is provided explicitly
df = pd.DataFrame(data, columns=['Unique Words'])
# print dataframe.
df
and I want to create a new column based on the following two lists that looks like this
key_words = ["football", "basketball", "tennis"]
usernames = ["instagram", "facebook", "snapchat"]
Label
-----
0
2
0
1
1
So the words that are in the list key_words get the label 1, the words in the list usernames get the label 2, and all the others get the label 0.
Thank you so much for your time and help!
One way to do this is to create a label map by numbering all of the elements in the first list as 1, and the other as 2. Then you can use .map in pandas to map the values and fillna with 0.
# Import pandas library
import pandas as pd
import numpy as np
# initialize list elements
data = ['george',
'instagram',
'nick',
'basketball',
'tennis']
# Create the pandas DataFrame with column name is provided explicitly
df = pd.DataFrame(data, columns=['Unique Words'])
key_words = ["football", "basketball", "tennis"]
usernames = ["instagram", "facebook", "snapchat"]
label_map = {e: i+1 for i, l in enumerate([key_words,usernames]) for e in l}
print(label_map)
df['Label'] = df['Unique Words'].map(label_map).fillna(0).astype(int)
print(df)
Output
{'football': 1, 'basketball': 1, 'tennis': 1, 'instagram': 2, 'facebook': 2, 'snapchat': 2}
Unique Words Label
0 george 0
1 instagram 2
2 nick 0
3 basketball 1
4 tennis 1
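Since the title asks about isin, here is a sketch of the same labelling done with Series.isin and numpy.select instead of a label map. It is an alternative to the answer above, not a correction of it, and it assumes the same sample data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Unique Words": ["george", "instagram", "nick", "basketball", "tennis"]}
)
key_words = ["football", "basketball", "tennis"]
usernames = ["instagram", "facebook", "snapchat"]

# One boolean condition per list; the first matching condition decides the label.
conditions = [
    df["Unique Words"].isin(key_words),
    df["Unique Words"].isin(usernames),
]
df["Label"] = np.select(conditions, [1, 2], default=0)
print(df)
```

If a word could appear in both lists, the order of the conditions decides which label it gets.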

Extract values from dictionary and conditionally assign them to columns in pandas

I am trying to extract values from a column of dictionaries in pandas and assign them to their respective columns that already exist. I have hardcoded an example below of the data set that I have:
df_have = pd.DataFrame(
{
'value_column':[np.nan, np.nan, np.nan]
,'date':[np.nan, np.nan, np.nan]
,'string_column':[np.nan, np.nan, np.nan]
, 'dict':[[{'value_column':40},{'date':'2017-08-01'}],[{'value_column':30},
{'string_column':'abc'}],[{'value_column':10},{'date':'2016-12-01'}]]
})
df_have
df_want = pd.DataFrame(
{
'value_column':[40, 30, 10]
,'date':['2017-08-01', np.nan, '2016-12-01']
,'string_column':[np.nan, 'abc', np.nan]
,'dict':[[{'value_column':40},{'date':'2017-08-01'}],[{'value_column':30},
{'string_column':'abc'}],[{'value_column':10},{'date':'2016-12-01'}]]})
df_want
I have managed to extract the values out of the dictionaries using loops:
'''
for row in range(len(df_have)):
    row_holder = df_have.dict[row]
    number_of_dictionaries_in_the_row = len(row_holder)
    for dictionary in range(number_of_dictionaries_in_the_row):
        variable_holder = df_have.dict[row][dictionary].keys()
        variable = list(variable_holder)[0]
        value = df_have.dict[row][dictionary].get(variable)
'''
I now need to somehow conditionally turn df_have into df_want. I am happy to take a completely new approach and recreate the whole thing from scratch. We could even assume that I only have a dataframe with the dictionaries and nothing else.
You could use the pandas string methods to pull the data out, although I think nesting data structures inside a pandas column is inefficient:
df_have.loc[:, "value_column"] = df_have["dict"].str.get(0).str.get("value_column")
df_have.loc[:, "date"] = df_have["dict"].str.get(-1).str.get("date")
df_have.loc[:, "string_column"] = df_have["dict"].str.get(-1).str.get("string_column")
   value_column        date string_column                                              dict
0            40  2017-08-01          None    [{'value_column': 40}, {'date': '2017-08-01'}]
1            30        None           abc  [{'value_column': 30}, {'string_column': 'abc'}]
2            10  2016-12-01          None    [{'value_column': 10}, {'date': '2016-12-01'}]
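The question also allows starting from a frame that holds only the dictionaries. A sketch under that assumption: merge each row's single-key dictionaries into one dictionary and let the DataFrame constructor create one column per key. The names merged, extracted and result are only illustrative.

```python
import pandas as pd

dicts = [
    [{"value_column": 40}, {"date": "2017-08-01"}],
    [{"value_column": 30}, {"string_column": "abc"}],
    [{"value_column": 10}, {"date": "2016-12-01"}],
]
df = pd.DataFrame({"dict": dicts})

# Merge each row's list of single-key dictionaries into one dictionary per row.
merged = [{k: v for d in row for k, v in d.items()} for row in df["dict"]]

# One column per key; keys that a row does not have become NaN.
extracted = pd.DataFrame(merged, index=df.index)
result = df.join(extracted)
print(result)
```

This does not depend on the position of each key inside the list, unlike the .str.get(0) / .str.get(-1) approach.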

How to add new line to existing pandas dataframe? [duplicate]

This question already has answers here:
Create a Pandas Dataframe by appending one row at a time
I have a pandas dataframe that has been defined as empty and then I would like to add some rows to it after doing some calculations.
I have tried to do the following:
test = pd.DataFrame(columns=['Name', 'Age', 'Gender'])
if True:  # some statement
    test.append(['James', '95', 'M'])
If I print test, append to it, and print it again, this is what is shown:
print(test)
test.append(['a', 'a', 'a', 'a', 'a', 'a'])
print(test)
>>>
Empty DataFrame
Columns: [Name, Age, Gender]
Index: []
Empty DataFrame
Columns: [Name, Age, Gender]
Index: []
So clearly the line is not being added to the dataframe.
I want the output to be
Name | Age | Gender
James | 95 | M
Use append with dictionary as:
test = test.append(dict(zip(test.columns,['James', '95', 'M'])), ignore_index=True)
print(test)
Name Age Gender
0 James 95 M
You can pass as a Series:
test.append(pd.Series(['James', 95, 'M'], index=test.columns), ignore_index=True)
[out]
Name Age Gender
0 James 95 M
Try to append it as dictionary:
>>> test = test.append({'Name': "James", "Age": 95, "Gender": "M"}, ignore_index=True)
>>> print(test)
Outputs:
Name Age Gender
0 James 95 M
append converts a plain list into a DataFrame with an automatically generated column 0, whereas test has columns=['Name', 'Age', 'Gender'], so the values do not line up with the existing columns. In addition, append does not change test in place; it returns a new DataFrame. Running the following code should make this clearer.
import pandas as pd
# A plain list becomes a single-column DataFrame (column 0); the frame being appended should have the same columns as test
data = pd.DataFrame(['James', '95', 'M'])
print(data)
# Working version
test = pd.DataFrame(columns=['Name', 'Age', 'Gender'])
test = test.append(pd.DataFrame([['James', '95', 'M']],columns=['Name', 'Age', 'Gender']))
print(test)
Try this,
>>> test.append(dict(zip(test.columns,['James', '95', 'M'])), ignore_index=True)
Name Age Gender
0 James 95 M
First, the append method does not modify the DataFrame in place; it returns a new, appended DataFrame.
Second, the new row passed in should be a DataFrame, a dict/Series, or a list of these.
#using dict
row = {'Name': "James", "Age": 95, "Gender": "M"}
# using Series
row = pd.Series(['James', 95, 'M'], index=test.columns)
Try print(test.append(row, ignore_index=True)) and see the result. (A dict, or a Series without a name, can only be appended with ignore_index=True.)
What you need is to save the return value of test.append. You can assign it back to the same name if you don't care about the previous version, which gives:
test = test.append(row, ignore_index=True)
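Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the answers above only run on older versions. A minimal sketch of the equivalent with pd.concat follows; the second row ('Anna') is made up purely to illustrate building a frame from several rows.

```python
import pandas as pd

test = pd.DataFrame(columns=["Name", "Age", "Gender"])
row = {"Name": "James", "Age": 95, "Gender": "M"}

# Wrap the row in a one-row DataFrame and concatenate.
# Like append, concat returns a new object, so the result must be assigned.
test = pd.concat([test, pd.DataFrame([row])], ignore_index=True)
print(test)

# For many rows, collecting dicts in a list and building the frame once is cheaper.
rows = [row, {"Name": "Anna", "Age": 35, "Gender": "F"}]  # second row is illustrative
batch = pd.DataFrame(rows, columns=["Name", "Age", "Gender"])
print(batch)
```

Recent pandas versions may emit a FutureWarning when concatenating with an empty frame; building the frame once from a list of rows avoids that.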

Python Error: 'list' has no attribute 'mean'

I am trying to get the mean value for a list of percentages from an Excel file which has data. My current code is as follows:
import numpy as pd
data = pd.DataFrame =({'Percentages': [.20, .10, .05], 'Nationality':['American', 'Mexican', 'Russian'],
'Gender': ['Male', 'Female'], 'Question': ['They have good looks']})
pref = data[data.Nationality == 'American']
prefPref = pref.pivot_table(data.Percentage.mean(), index=['Question'], column='Gender')
The error comes from where I try to get the .mean() from my ['Percentage'] list. So, how can I get the mean from the list of Percentages? Do I need to create a variable for the mean value, and if so, how do I implement that in the code?
["Percentage"] is a list containing the single string item "Percentage". It isn't possible to calculate a mean from a list of text.
In addition, plain Python lists have no .mean() method. Have a look at numpy for calculating means and other mathematical operations.
For example:
import numpy
numpy.array([4,2,6,5]).mean()
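As a sketch of the other two common options, assuming the percentages from the question: the standard-library statistics module works directly on a plain list, and a pandas column (a Series) has its own .mean() method.

```python
import statistics

import pandas as pd

percentages = [0.20, 0.10, 0.05]

# Plain Python list: use the statistics module (or sum(...) / len(...)).
list_mean = statistics.mean(percentages)

# pandas column: select it by its exact name, then call .mean() on the Series.
data = pd.DataFrame({"Percentages": percentages})
column_mean = data["Percentages"].mean()

print(list_mean, column_mean)
```

Note that the column has to be selected by its exact name ('Percentages' here); the question's code mixes 'Percentages' and 'Percentage'.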
Here is a reworked version of your pd.pivot_table. See also How to pivot a dataframe.
import pandas as pd, numpy as np
data = pd.DataFrame({'Percentages': [0.20, 0.10, 0.05],
'Nationality': ['American', 'American', 'Russian'],
'Gender': ['Male', 'Female', 'Male'],
'Question': ['Q1', 'Q2', 'Q3']})
pref = data[data['Nationality'] == 'American']
prefPref = pref.pivot_table(values='Percentages', index='Question',
                            columns='Gender', aggfunc='mean')
# Gender Female Male
# Question
# Q1 NaN 0.2
# Q2 0.1 NaN

Get a list of keys and values in a nested dictionary oriented by index

I have an Excel file with a structure like this:
name age status
anna 35 single
petr 27 married
I have converted such a file into a nested dictionary with a structure like this:
{'anna': {'age': 35, 'status': 'single'},
 'petr': {'age': 27, 'status': 'married'}}
using pandas:
import pandas as pd
df = pd.read_excel('path/to/file')
df.set_index('name', inplace=True)
print(df.to_dict(orient='index'))
But now, when I run list(df.keys()), it returns a list of all the keys in the dictionary ('age', 'status', etc.) but not 'name'.
My eventual goal is to get all the keys and values for a person by typing a name.
Is that possible somehow? Or should I import the data some other way to achieve this? In the end I need a dictionary anyway, because I will merge it with other dictionaries by key.
I think you need to pass drop=False to set_index so that the name column is not dropped:
import pandas as pd
df = pd.read_excel('path/to/file')
df.set_index('name', inplace=True, drop=False)
print (df)
      name  age   status
name
anna  anna   35   single
petr  petr   27  married
d = df.to_dict(orient='index')
print (d)
{'anna': {'age': 35, 'status': 'single', 'name': 'anna'},
'petr': {'age': 27, 'status': 'married', 'name': 'petr'}}
print (list(df.keys()))
['name', 'age', 'status']
Given a dataframe from excel, you should do this to obtain the thing you want:
resulting_dict = {}
# Series.iteritems() was removed in pandas 2.0; .items() works in old and new versions
for name, info in df.groupby('name').apply(lambda x: x.to_dict()).items():
    stats = {}
    for key, values in info.items():
        if key != 'name':
            value = list(values.values())[0]
            stats[key] = value
    resulting_dict[name] = stats
Try this :
import pandas as pd
df = pd.read_excel('path/to/file')
df[df['name']=='anna'] #Get all details of anna
df[df['name']=='petr'] #Get all details of petr
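For the stated goal (all keys and values for a given name), a minimal sketch: df.keys() returns the DataFrame's column labels, which is why 'name' is missing once it has become the index, while the names are the outer keys of the dictionary returned by to_dict(orient='index'). The frame below is a stand-in for the Excel file, built from the sample rows in the question.

```python
import pandas as pd

# Stand-in for pd.read_excel('path/to/file'), using the question's sample rows.
df = pd.DataFrame(
    {"name": ["anna", "petr"], "age": [35, 27], "status": ["single", "married"]}
)

records = df.set_index("name").to_dict(orient="index")

print(list(records))    # the names are the outer keys
print(records["anna"])  # all keys and values for one name
```

The resulting plain dictionary can then be merged with other dictionaries by key, as the question requires.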
