This question already has answers here:
How do I Pandas group-by to get sum?
(11 answers)
Closed 2 years ago.
I have a dataframe that contains values by country (and, for certain countries, by region), and it looks like this:
For each country that appears more than once, I would like to add up the values across its regions so that there is only one row per country, and obtain the following file:
How can I do this in Python? Since I'm really new to Python, I don't mind a long set of instructions, as long as the procedure is clear, rather than a single line of compact but hard-to-understand code.
Thanks for your help.
You want to study the split-apply-combine paradigm of Pandas DataFrame manipulation. You can do a lot with it. What you want to do is common, and can be accomplished in one line.
>>> import pandas as pd
>>> df = pd.DataFrame({"foo": ["a","b","a","b","c"], "bar": [6,5,4,3,2]})
>>> df
  foo  bar
0   a    6
1   b    5
2   a    4
3   b    3
4   c    2
>>> df.groupby("foo").sum()
     bar
foo
a     10
b      8
c      2
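Applied to your data, a minimal sketch might look like the following; the file name and the column names Country, Region, and Value are placeholders, since your table isn't reproduced here.
import pandas as pd

# read your file; names below are assumptions about your data
df = pd.read_csv("countries.csv")

# one row per country: sum the values of all of its regions
result = df.groupby("Country", as_index=False)["Value"].sum()

# write the aggregated table back out
result.to_csv("countries_aggregated.csv", index=False)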
This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 7 months ago.
I have a dataframe that looks like this:
A       type  val
first   B     20
second  B     30
first   C     200
second  C     300
I need to get it to look like this:
A       B    C
first   20   200
second  30   300
How do I do this using pandas? I tried using transpose, but couldn't get exactly this table.
df = df.pivot(index='A', columns='type')      # columns become a MultiIndex like ('val', 'B')
df.columns = [col[1] for col in df.columns]   # keep only the 'type' level
df = df.reset_index()                         # turn 'A' back into a regular column
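A variant sketch, starting again from the original dataframe: passing values explicitly avoids the MultiIndex, so the column-flattening step isn't needed.
df = df.pivot(index='A', columns='type', values='val')  # columns are just B and C
df = df.reset_index()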
This question already has answers here:
Pandas Merging 101
(8 answers)
How to filter Pandas dataframe using 'in' and 'not in' like in SQL
(11 answers)
Closed 2 years ago.
I have two pandas DataFrames, each with one column (ID).
The first one looks like this:
ID
1
2
3
4
5
and the second one looks like this:
ID
3
4
5
6
7
I want to make a new DataFrame by combining those two DataFrames, keeping only the values that exist in both.
This is the result that I want:
ID
3
4
5
Can you show me how to do this in the most efficient way with pandas? Thank you.
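A minimal sketch, assuming the two frames are named df1 and df2 (the question doesn't name them): an inner merge on ID keeps only the values present in both.
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({"ID": [3, 4, 5, 6, 7]})

# inner join: keep only IDs that appear in both frames
result = pd.merge(df1, df2, on="ID", how="inner")

# an equivalent filtering approach
result_alt = df1[df1["ID"].isin(df2["ID"])]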
This question already has answers here:
Map dataframe index using dictionary
(6 answers)
Closed 1 year ago.
I want to create a new column on a pandas dataframe using values from the index and a dictionary that translates these values into something more meaningful. My initial idea was to use map. I arrived at a solution, but it is very convoluted and there must be a more elegant way to do it. Suggestions?
# dataframe and dict definition
import pandas as pd

df = pd.DataFrame({'foo': [1, 2, 3], 'boo': [3, 4, 5]}, index=['a', 'b', 'c'])
d = {'a': 'aa', 'b': 'bb', 'c': 'cc'}

# convoluted: move the index into a column just to be able to map it
df['new column'] = df.reset_index().set_index('index', drop=False)['index'].map(d)
Creating a new series explicitly is a bit shorter:
df['new column'] = pd.Series(df.index, index=df.index).map(d)
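A slightly more direct sketch, relying on the fact that Index.map also accepts a dict:
df['new column'] = df.index.map(d)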
After to_series, you can use map or replace:
df.index = df.index.to_series().map(d)   # note: this rewrites the index itself rather than adding a column
df
Out[806]:
    boo  foo
aa    3    1
bb    4    2
cc    5    3
Or, another way of thinking about it:
df['New'] = pd.Series(d).get(df.index)   # build a Series from the dict and look up df's index labels
df
Out[818]:
   boo  foo New
a    3    1  aa
b    4    2  bb
c    5    3  cc
This question already has answers here:
Pandas DENSE RANK
(4 answers)
pandas group by and assign a group id then ungroup
(3 answers)
Closed 5 years ago.
I have a pandas dataframe with a column, call it range_id, that looks something like this:
range_id
1
1
2
2
5
5
5
8
8
10
10
...
I want to maintain the number buckets (rows that share a value still share a value), but make the numbers ascend consecutively. So the new column would look like this:
range_id
1
1
2
2
3
3
3
4
4
5
5
...
I could write a lambda function that maps these values to achieve the desired output, but I was wondering whether pandas has any built-in functionality for this, as it has surprised me before with what it is capable of. Thanks for the help!
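A minimal sketch, assuming the dataframe is called df and the column is named range_id as above: a dense rank gives equal values the same rank and increases the rank by one between groups, which matches the desired output here because the original values already ascend.
# dense rank: ties share a rank, ranks increase by 1 between distinct values
df['range_id'] = df['range_id'].rank(method='dense').astype(int)

# equivalent, numbering groups in order of first appearance
# df['range_id'] = pd.factorize(df['range_id'])[0] + 1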
This question already has answers here:
Remap values in pandas column with a dict, preserve NaNs
(11 answers)
Closed 5 years ago.
Given the following pandas DataFrame sample:
df = pd.DataFrame([[1,2],[1,2],[3,5]])
df
   0  1
0  1  2
1  1  2
2  3  5
And the following dictionary:
d = {1:'foo',2:'bar',3:'tar',4:'tartar',5:'foofoo'}
I would like to "translate" the dataframe by using the dictionary d. The output looks like:
result = pd.DataFrame([['foo','bar'],['foo','bar'],['tar','fofo']])
result
0 1
0 foo bar
1 foo bar
2 tar fofo
I would like to avoid using for loops. The solution I'm trying to find is something with map or similar...
Solution
Replacing the whole DataFrame:
result_1 = df.replace(d)
Replacing a specific column of a DataFrame:
result_2 = df.replace({"COLUMN": d})