This question already has answers here:
How to remove nan value while combining two column in Panda Data frame?
(5 answers)
Closed 4 years ago.
I have a pretty simple Pandas question that deals with merging two series. I have two series in a dataframe together that are similar to this:
Column1 Column2
0 Abc NaN
1 NaN Abc
2 Abc NaN
3 NaN Abc
4 NaN Abc
The answer will probably end up being a really simple .merge() or .concat() command, but I'm trying to get a result like this:
Column1
0 Abc
1 Abc
2 Abc
3 Abc
4 Abc
The idea is that for each row, there is a string of data in either Column1 or Column2, but never both. I spent about 10 minutes looking for answers on Stack Overflow and on Google, but I couldn't find a similar question that cleanly applied to what I was trying to do.
I realize that a lot of this question just stems from my ignorance of the functions Pandas has for sticking series and dataframes together. Any help is very much appreciated. Thank you!
You can just use pd.Series.fillna:
df['Column1'] = df['Column1'].fillna(df['Column2'])
Merge or concat are not appropriate here; they are used primarily for combining dataframes or series based on labels.
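For reference, a self-contained version of that approach on the sample data from the question:
import pandas as pd
import numpy as np

# Recreate the example frame
df = pd.DataFrame({'Column1': ['Abc', np.nan, 'Abc', np.nan, np.nan],
                   'Column2': [np.nan, 'Abc', np.nan, 'Abc', 'Abc']})

# Fill the gaps in Column1 from Column2, then drop Column2
df['Column1'] = df['Column1'].fillna(df['Column2'])
df = df.drop(columns=['Column2'])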
Use groupby with first
df.groupby(df.columns.str[:-1],axis=1).first()
Out[294]:
Column
0 Abc
1 Abc
2 Abc
3 Abc
4 Abc
Or:
ndf = pd.DataFrame({'Column1': df.fillna('').sum(1)})
Related
This question already has answers here:
How do I transpose dataframe in pandas without index?
(3 answers)
Closed 1 year ago.
I have the following DataFrame df
   value  type
0    one     1
1    two     2
2  three     3
which I want to reshape so that the desired output looks like this:
one  two  three
  1    2      3
I used
df.pivot(columns="value", values="type")
which gave me this:
   one  two  three
0    1  NaN    NaN
1  NaN    2    NaN
2  NaN  NaN      3
How can I get around the redundancies?
You don't need to pivot the data; you can just transpose it:
df.set_index('value').T
Out[22]:
value one two three
type 1 2 3
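If you also want the plain look of the desired output above (no leftover "value"/"type" labels), a small follow-up sketch:
out = df.set_index('value').T
out.columns.name = None           # drop the leftover 'value' label above the columns
out = out.reset_index(drop=True)  # replace the 'type' row label with a default index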
I currently have this dataframe:
original dataframe
However, I would like to obtain a dataframe (not containing the 't') that looks like this (considering the index):
The index we want for our original dataframe
This is of course easily done with .groupby().agg(), but the thing is that I don't have a simple aggregation function such as 'max' or 'mean' that I want to use. Hence my question: 'Is it possible to group a dataframe with a customized aggregation function and without using SQL? If so, please let me know!'
I would love to get some help!
Simplified code example explaining my question:
df_example =
     C  D  E
A B
1 2  5  8  9
  3  7  9  3
2 4  9  5  5
  6  1  4  5
We would like to obtain:
df_example_groupedby_A_only_aggregating_with_custom_function =
   Z_custom
A
1        33
2        34
The values in Z_custom are obtained by using the custom aggregation function which uses the values in columns [C,D,E] from df_example.
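A minimal sketch of one way to do this: group by the first index level and apply your own function with .apply(). The aggregation function below is made up for illustration (so it won't reproduce the exact 33/34 above); substitute whatever computation you have in mind.
import pandas as pd

# Rebuild the simplified example with a two-level (A, B) index
df_example = pd.DataFrame(
    {'C': [5, 7, 9, 1], 'D': [8, 9, 5, 4], 'E': [9, 3, 5, 5]},
    index=pd.MultiIndex.from_tuples([(1, 2), (1, 3), (2, 4), (2, 6)], names=['A', 'B'])
)

# Any function that maps a group's rows (a sub-DataFrame with columns C, D, E)
# to a single value can be used; this one is purely illustrative
def my_custom_agg(group):
    return group['C'].max() + group['D'].sum() + group['E'].min()

result = df_example.groupby(level='A').apply(my_custom_agg).to_frame('Z_custom')
print(result)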
This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 2 years ago.
I have two dataframes of different sizes and I want to merge them.
It's like an "update" to a dataframe column based on another dataframe of a different size.
This is an example input:
dataframe 1
CODUSU Situação TIPO1
0 1AB P A0
1 2C3 C B1
2 3AB P C1
dataframe 2
CODUSU Situação ABC
0 1AB A 3
1 3AB A 4
My output should be like this:
dataframe 3
CODUSU Situação TIPO1
0 1AB A A0
1 2C3 C B1
2 3AB A C1
PS: I did it with a loop, but I think there should be a better and easier way to do it!
I read this content: pandas merging 101 and wrote this code:
import numpy as np

# left-join df2 onto df1 on CODUSU (the suffixed Situação_x/Situação_y columns come from the overlap)
df3 = df1.merge(df2, on=['CODUSU'], how='left', indicator=False)
# take the updated value from df2 where df1 has 'P' and df2 has 'A', otherwise keep df1's value
df3['Situação'] = np.where((df3['Situação_x'] == 'P') & (df3['Situação_y'] == 'A'), df3['Situação_y'], df3['Situação_x'])
df3 = df3.drop(columns=['Situação_x', 'Situação_y', 'ABC'])
df3 = df3[['CODUSU', 'Situação', 'TIPO1']]
And Voilà, df3 is exactly what I needed!
Thanks for everyone!
PS: I already found my answer, is there a better place to answer my own question?
df1.merge(df2, how='left', left_on='CODUSU', right_on='CODUSU')
This should do the trick.
Also, worth noting that if you don't want the resulting dataframe to contain the column ABC, you'd merge with df2.drop(columns='ABC') instead of just df2.
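For instance, to keep ABC out of the merge result (the two Situação columns still have to be reconciled afterwards, as in the code above):
df3 = df1.merge(df2.drop(columns='ABC'), how='left', on='CODUSU')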
This question already has answers here:
How do I Pandas group-by to get sum?
(11 answers)
Closed 2 years ago.
I have a dataframe that contains values by country (and by region for certain countries), which looks like this:
For each country that is repeated, I would like to add up the values across its regions so that there is only one row per country, obtaining the following file:
How can I do this in Python? Since I'm really new to Python, I don't mind a long set of instructions, as long as the procedure is clear, rather than a single line of code that is compact but hard to understand.
Thanks for your help.
You want to study the split-apply-combine paradigm of Pandas DataFrame manipulation. You can do a lot with it. What you want to do is common, and can be accomplished in one line.
>>> import pandas as pd
>>> df = pd.DataFrame({"foo": ["a","b","a","b","c"], "bar": [6,5,4,3,2]})
>>> df
foo bar
0 a 6
1 b 5
2 a 4
3 b 3
4 c 2
>>> df.groupby("foo").sum()
bar
foo
a 10
b 8
c 2
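Applied to the country case it is the same pattern. The column names below are assumptions, since the original file isn't shown; adapt them to yours:
import pandas as pd

df = pd.DataFrame({
    'Country': ['France', 'France', 'Spain', 'Italy'],
    'Region':  ['North',  'South',  None,    None],
    'Value':   [10,       5,        7,       3],
})

# One row per country: sum the values across its regions
per_country = df.groupby('Country', as_index=False)['Value'].sum()
print(per_country)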
This question already has an answer here:
Deleting DataFrame row in Pandas where column value in list
(1 answer)
Closed 3 years ago.
I have a pandas dataframe, for example like:
id column1 column2
1 aaa mmm
2 bbb nnn
3 ccc ooo
4 ddd ppp
5 eee qqq
I have a list that contains some values from column1:
['bbb', 'ddd', 'eee']
I need Python code to delete from the dataframe all rows whose column1 value appears in that list.
PS: my dataframe contains 280,000 rows, so I need fast code.
Thanks
You can use isin and its negation (~):
df[~df.column1.isin(['bbb','ddd', 'eee'])]
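A self-contained sketch of that on the example data (the boolean isin filter is vectorized, so it stays fast even with a few hundred thousand rows):
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'column1': ['aaa', 'bbb', 'ccc', 'ddd', 'eee'],
                   'column2': ['mmm', 'nnn', 'ooo', 'ppp', 'qqq']})
to_remove = ['bbb', 'ddd', 'eee']

# Keep only the rows whose column1 value is NOT in the list
df = df[~df['column1'].isin(to_remove)]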
Try this (where values_to_remove is your list; avoid naming it list, since that shadows the built-in):
df = df.loc[~df['column1'].isin(values_to_remove), :]