translate dataframe with dictionary in python [duplicate] - python

This question already has answers here:
Remap values in pandas column with a dict, preserve NaNs
(11 answers)
Closed 5 years ago.
Given the following pandas DataFrame sample:
import pandas as pd
df = pd.DataFrame([[1,2],[1,2],[3,5]])
df
   0  1
0  1  2
1  1  2
2  3  5
And the following dictionary:
d = {1:'foo',2:'bar',3:'tar',4:'tartar',5:'foofoo'}
I would like to "translate" the dataframe using the dictionary d. The desired output is:
result = pd.DataFrame([['foo','bar'],['foo','bar'],['tar','foofoo']])
result
   0    1
0  foo  bar
1  foo  bar
2  tar  foofoo
I would like to avoid using for loops; I'm looking for something with map or similar.

Solution
Replacing values across the whole dataframe:
result_1 = df.replace(d)
Replacing values in a single column only ("COLUMN" stands for the column label, e.g. 1 in the sample above):
result_2 = df.replace({"COLUMN":d})
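For comparison, here is a minimal sketch of the map-based approach the question asks about, using the sample df and d from above: Series.map is applied to each column through DataFrame.apply. One behavioral difference is worth knowing: replace leaves values that are not keys of d untouched, while map turns them into NaN. The one-row frame containing 9 is a made-up example added only to show that difference.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 2], [3, 5]])
d = {1: 'foo', 2: 'bar', 3: 'tar', 4: 'tartar', 5: 'foofoo'}

# replace(): every value found in d is swapped, all others are kept as-is
replaced = df.replace(d)

# Series.map applied to each column: values missing from d become NaN
mapped = df.apply(lambda col: col.map(d))

# Nested dict {column_label: mapping} restricts replace() to one column (here column 1)
only_col_1 = df.replace({1: d})

# Hypothetical frame with a value (9) that is not a key of d
missing = pd.DataFrame([[1, 9]])
print(missing.replace(d))                     # 9 is kept
print(missing.apply(lambda col: col.map(d)))  # 9 becomes NaN
```

If NaN for unknown values is acceptable (or desired), map is the stricter choice; if unknown values should survive, replace is the safer one.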

Related

Transpose a table using pandas [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 7 months ago.
I have a dataframe that looks like this:
A       type  val
first   B     20
second  B     30
first   C     200
second  C     300
I need to get it to look like this:
A       B   C
first   20  200
second  30  300
How do I do this using Pandas? I tried using transpose, but couldn't get it to this exact table.
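df = df.pivot(index='A', columns='type')  # columns become a MultiIndex: ('val', 'B'), ('val', 'C')
df.columns = [x[1] for x in df.columns]  # keep only the type level: 'B', 'C'
df = df.reset_index()  # turn A back into a regular column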
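A self-contained sketch of the same reshape, rebuilding the question's table by hand. Passing values='val' to pivot is a small variation on the answer above: the result then has flat B and C columns, so the list-comprehension step is not needed.

```python
import pandas as pd

# The question's table, typed in by hand
df = pd.DataFrame({
    'A': ['first', 'second', 'first', 'second'],
    'type': ['B', 'B', 'C', 'C'],
    'val': [20, 30, 200, 300],
})

# One row per value of A, one column per value of type, cells taken from val
wide = df.pivot(index='A', columns='type', values='val')
wide.columns.name = None  # drop the leftover 'type' label on the column axis
wide = wide.reset_index()  # turn A back into a regular column
print(wide)
```

pivot requires each (A, type) pair to occur at most once and raises a ValueError otherwise; pivot_table with an aggfunc is the usual alternative when duplicates are possible.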

How to aggregate duplicate rows in python? [duplicate]

This question already has answers here:
Get statistics for each group (such as count, mean, etc) using pandas GroupBy?
(9 answers)
How to count duplicate rows in pandas dataframe?
(10 answers)
Closed 2 years ago.
I have a dataframe that looks like this:
Cell1 Cell2 Cell3
A B B
A B B
B B A
C B A
I am trying to get the following output:
Cell1 Cell2 Cell3 sum
A B B 2
B B A 1
C B A 1
I tried the aggregate function, but couldn't get it to produce this.
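One common way to get this, sketched here on the question's sample data: group on every column so each distinct row forms one group, then count the rows in each group with size(). The column is named sum only to match the requested output; it holds a count.

```python
import pandas as pd

df = pd.DataFrame({
    'Cell1': ['A', 'A', 'B', 'C'],
    'Cell2': ['B', 'B', 'B', 'B'],
    'Cell3': ['B', 'B', 'A', 'A'],
})

# Group on all columns, count rows per distinct combination,
# and move the group keys back into regular columns
counts = df.groupby(df.columns.tolist()).size().reset_index(name='sum')
print(counts)
```

On pandas 1.1 or later, df.value_counts() returns the same counts as a Series, ordered by count instead of by row values.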

Doing .diff() on pandas column(s) gives wrong output? [duplicate]

This question already has answers here:
Subtract consecutive columns in a Pandas or Pyspark Dataframe
(2 answers)
Closed 2 years ago.
I am trying to take the difference of a column using .diff() in a dataframe with a date column and a value column.
import pandas as pd
d = {'Date':['11/11/2011', '11/12/2011', '11/13/2011'], 'a': [2, 3,4]}
df1 = pd.DataFrame(data=d)
df1.diff(axis = 1)
Pandas gives me this output:
Date a
0 11/11/2011 2
1 11/12/2011 3
2 11/13/2011 4
That is just df1 unchanged rather than the difference. I expected the output to be:
Date a
0 11/11/2011 NaN
1 11/12/2011 1
2 11/13/2011 1
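Setting Date as the index and differencing along the rows gives the expected result: df1.set_index('Date').diff(axis=0)
axis=1 subtracts adjacent columns, not rows. Your target result is a difference between consecutive rows, so use axis=0 (the default) instead.
Second, subtraction is not defined for strings in Python, so the Date column cannot take part in the difference; moving it into the index keeps it out of the computation.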
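A runnable version of the fix, using the question's data. It shows the set_index approach from the answer next to a variant that differences only the numeric column and keeps Date as an ordinary column; the a_diff column name is just an illustrative choice.

```python
import pandas as pd

d = {'Date': ['11/11/2011', '11/12/2011', '11/13/2011'], 'a': [2, 3, 4]}
df1 = pd.DataFrame(data=d)

# Option 1: move the string column into the index, then diff down the rows
by_date = df1.set_index('Date').diff(axis=0)

# Option 2: diff only the numeric column; Date stays a regular column
df1['a_diff'] = df1['a'].diff()
print(by_date)
print(df1)
```

In both cases the first row is NaN because there is no previous row to subtract.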

How to aggregate rows in a dataframe [duplicate]

This question already has answers here:
How do I Pandas group-by to get sum?
(11 answers)
Closed 2 years ago.
I have a dataframe that contains values by country (and by region in certain countries) and looks like this:
For each country that is repeated, I would like to add up the values across its regions so that there is only one row per country, giving the following file:
How can I do this in Python? Since I'm new to Python, I'd prefer a longer set of clearly explained steps over a compact one-liner that is hard to understand.
Thanks for your help.
You want to study the split-apply-combine paradigm of Pandas DataFrame manipulation. You can do a lot with it. What you want to do is common, and can be accomplished in one line.
>>> import pandas as pd
>>> df = pd.DataFrame({"foo": ["a","b","a","b","c"], "bar": [6,5,4,3,2]})
>>> df
foo bar
0 a 6
1 b 5
2 a 4
3 b 3
4 c 2
>>> df.groupby("foo").sum()
bar
foo
a 10
b 8
c 2
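Since the asker prefers explicit steps, here is the same one-liner split into its split, apply and combine stages. The question's actual table is not shown, so the column names country, region and value and the numbers below are hypothetical.

```python
import pandas as pd

# Hypothetical data: column names and values are assumptions,
# since the question's table is not included
df = pd.DataFrame({
    'country': ['France', 'Spain', 'Spain', 'Italy'],
    'region': [None, 'North', 'South', None],
    'value': [10, 4, 6, 7],
})

# Step 1 (split): one group per country
groups = df.groupby('country')

# Step 2 (apply): sum the 'value' column inside each group
totals = groups['value'].sum()

# Step 3 (combine): back to a DataFrame with one row per country
result = totals.reset_index()
print(result)
```

The region column is simply left out of the sum, so countries with and without regions end up with exactly one row each.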

how to use map in index of pandas dataframe [duplicate]

This question already has answers here:
Map dataframe index using dictionary
(6 answers)
Closed 1 year ago.
I want to create a new column on a pandas dataframe using values on the index and a dictionary that translates these values into something more meaningful. My initial idea was to use map. I arrived at a solution, but it is very convoluted and there must be a more elegant way to do it. Suggestions?
import pandas as pd
# dataframe and dict definition
df=pd.DataFrame({'foo':[1,2,3],'boo':[3,4,5]},index=['a','b','c'])
d={'a':'aa','b':'bb','c':'cc'}
df['new column']=df.reset_index().set_index('index',drop=False)['index'].map(d)
Creating a new series explicitly is a bit shorter:
df['new column'] = pd.Series(df.index, index=df.index).map(d)
After to_series, you can use map or replace. Note that this replaces the index itself rather than adding a new column:
df.index=df.index.to_series().map(d)
df
    boo  foo
aa    3    1
bb    4    2
cc    5    3
Another way, starting again from the original df:
df['New']=pd.Series(d).get(df.index)
df
   boo  foo  New
a    3    1   aa
b    4    2   bb
c    5    3   cc
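A shorter option, sketched on the question's data: pandas Index objects have their own map method that accepts a dict, so no intermediate Series is needed. DataFrame.rename(index=d) is shown as well, for the case where the index labels themselves should change.

```python
import pandas as pd

df = pd.DataFrame({'foo': [1, 2, 3], 'boo': [3, 4, 5]}, index=['a', 'b', 'c'])
d = {'a': 'aa', 'b': 'bb', 'c': 'cc'}

# Index.map takes the dict directly; the result follows the index order
df['new column'] = df.index.map(d)

# Alternatively, relabel the index itself instead of adding a column
renamed = df.rename(index=d)
print(df)
print(renamed)
```

As with Series.map, labels missing from d become NaN with Index.map, while rename leaves them unchanged.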
