Adding columns to DataFrame from other DataFrame without intersection - python

I have two DataFrames with different sizes and columns. I need to add the columns from one DataFrame to the other and fill every row with the same data.
For instance, one of them:
Out[48]:
A B
0 1 2
1 1 2
2 1 2
3 1 2
and the other
Out[49]:
C D
0 3 4
I want to get a new one like this:
A B C D
0 1 2 3 4
1 1 2 3 4
2 1 2 3 4
3 1 2 3 4
Is it possible?

You can use assign, unpacking the single row of df1 (a pd.Series) into keyword arguments:
df.assign(**df1.loc[0])
Out[11]:
A B C D
0 1 2 3 4
1 1 2 3 4
2 1 2 3 4
3 1 2 3 4
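To spell out what the one-liner does, here is a minimal sketch that rebuilds the two frames from the question (the names df and df1 follow this answer). df1.loc[0] is a Series indexed by column name, ** turns it into keyword arguments, and assign broadcasts each scalar to every row.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 1], 'B': [2, 2, 2, 2]})
df1 = pd.DataFrame({'C': [3], 'D': [4]})

# df1.loc[0] is a Series indexed by column label: C -> 3, D -> 4.
row = df1.loc[0]

# ** unpacks it into keyword arguments, i.e. df.assign(C=3, D=4);
# assign broadcasts each scalar value to every row of df.
result = df.assign(**row)
print(result)
```

Because ** only accepts string keys, this approach assumes the column labels of df1 are strings.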

Using join with ffill:
df1.join(df2).ffill().astype(int)
A B C D
0 1 2 3 4
1 1 2 3 4
2 1 2 3 4
3 1 2 3 4
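If your pandas version is 1.2 or newer, a cross merge is another option worth trying. This is a sketch using the df1/df2 names from this answer: how='cross' forms the Cartesian product of the rows, and since df2 has a single row, every row of df1 gets C=3 and D=4.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 1, 1, 1], 'B': [2, 2, 2, 2]})
df2 = pd.DataFrame({'C': [3], 'D': [4]})

# how='cross' pairs every row of df1 with every row of df2.
# With a one-row df2 this simply appends C and D to each row of df1.
result = df1.merge(df2, how='cross')
print(result)
```

Unlike join + ffill, no NaN values are introduced, so the integer dtypes survive without astype. Note that merge returns a fresh 0..n-1 index, so a non-default index on df1 would be lost.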

Related

Autoincrement indexing after groupby with pandas on the original table

I cannot solve a very easy/simple problem in pandas. :(
I have the following table:
df = pd.DataFrame(data=dict(a=[1, 1, 1,2, 2, 3,1], b=["A", "A","B","A", "B", "A","A"]))
df
Out[96]:
a b
0 1 A
1 1 A
2 1 B
3 2 A
4 2 B
5 3 A
6 1 A
I would like to make an incrementing ID for each unique group (grouped by columns a and b). So the result would look like this (column c):
Out[98]:
a b c
0 1 A 1
1 1 A 1
2 1 B 2
3 2 A 3
4 2 B 4
5 3 A 5
6 1 A 1
I tried with:
df.groupby(["a", "b"]).nunique().cumsum().reset_index()
Result:
Out[105]:
a b c
0 1 A 1
1 1 B 2
2 2 A 3
3 2 B 4
4 3 A 5
Unfortunately, this works only on the grouped dataset and not on the original one. As you can see, the original table has 7 rows, but the grouped result has only 5.
Could someone please help me get the desired table?
a b c
0 1 A 1
1 1 A 1
2 1 B 2
3 2 A 3
4 2 B 4
5 3 A 5
6 1 A 1
Thank you in advance!
Use groupby + ngroup:
df['c'] = df.groupby(['a', 'b']).ngroup() + 1
a b c
0 1 A 1
1 1 A 1
2 1 B 2
3 2 A 3
4 2 B 4
5 3 A 5
6 1 A 1
Use pd.factorize after creating a tuple from the (a, b) columns:
df['c'] = pd.factorize(df[['a', 'b']].apply(tuple, axis=1))[0] + 1
print(df)
# Output
a b c
0 1 A 1
1 1 A 1
2 1 B 2
3 2 A 3
4 2 B 4
5 3 A 5
6 1 A 1
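Both answers give the same IDs here only because the groups happen to first appear in sorted key order. The sketch below uses a small made-up frame (hypothetical, not from the question) where that is not the case: ngroup() with the default sort=True numbers groups by sorted key, while groupby(..., sort=False).ngroup() and pd.factorize both number them in order of first appearance.

```python
import pandas as pd

# Hypothetical frame in which the first-seen group, (2, 'A'), does not
# have the smallest key, so the two numbering orders differ.
other = pd.DataFrame({'a': [2, 1, 2], 'b': ['A', 'A', 'A']})

# Default sort=True: groups are numbered by sorted key.
by_key = other.groupby(['a', 'b']).ngroup() + 1

# sort=False: groups are numbered in order of first appearance ...
by_appearance = other.groupby(['a', 'b'], sort=False).ngroup() + 1

# ... which is also the order in which pd.factorize assigns its codes.
by_factorize = pd.factorize(other[['a', 'b']].apply(tuple, axis=1))[0] + 1

print(by_key.tolist())         # [2, 1, 2]
print(by_appearance.tolist())  # [1, 2, 1]
print(by_factorize.tolist())   # [1, 2, 1]
```

The groupby route also avoids the row-wise apply(tuple, axis=1), which tends to be slow on large frames.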

how to loop through columns of a dataframe which have integers as column names

I have a DataFrame with column names 1, 2, 3, 4, ..., 10. I have a subset of those columns:
sub_cols = ['1','2','3']
I want to loop through these sub_cols:
for col in sub_cols:
    print('column: '+str(col))
    data[col]
    len(data[col])
I get an output
column: 1
column: 2
column: 3
but neither the column data nor its length is printed, and I don't see any error either. Where am I going wrong?
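Your code, corrected (bare expressions such as data[col] inside a loop are evaluated but never displayed, so you need print):
for col in sub_cols:
    print('column: ' + str(col))
    print(data[col])
    print(len(data[col]))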
It looks like your fundamental issue is that your list of integers is actually a list of strings.
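A minimal sketch of that fix, assuming the column labels really are the integers 1 to 10 (the one-row data frame below is made up for illustration): convert the string labels in sub_cols to int before indexing, because data['1'] and data[1] refer to different labels.

```python
import pandas as pd

# Hypothetical frame whose column labels are the integers 1..10.
data = pd.DataFrame([[c * 10 for c in range(1, 11)]], columns=range(1, 11))
sub_cols = ['1', '2', '3']  # strings: data['1'] would raise KeyError

# Convert the labels to int so they match the integer column names.
int_cols = [int(c) for c in sub_cols]
for col in int_cols:
    print('column: ' + str(col))
    print(data[col])
    print(len(data[col]))
```

The opposite direction also works: data.columns = data.columns.astype(str) makes every label a string, so the original sub_cols can be used unchanged.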
The following code shows how to select the columns whose names are integers, using a simple list comprehension with isinstance(c, int):
import numpy as np
import pandas as pd
cols = [i for i in range(5)] + list("abcd")
df = pd.DataFrame(np.random.randint(1, 5, 5 * len(cols)).reshape(5, len(cols)), columns=cols)
# keep only the columns whose label is an int
df.loc[:, [c for c in df.columns if isinstance(c, int)]]
df
0 1 2 3 4 a b c d
0 3 4 1 2 4 4 4 1 4
1 1 4 3 2 1 2 4 4 1
2 2 3 4 1 1 4 4 4 2
3 3 1 3 1 2 4 2 4 3
4 3 4 1 4 1 4 1 1 1
df.loc[:, [c for c in df.columns if isinstance(c, int)]]
0 1 2 3 4
0 3 4 1 2 4
1 1 4 3 2 1
2 2 3 4 1 1
3 3 1 3 1 2
4 3 4 1 4 1

Pandas Dataframe: Update values in a certain columns for last n rows

In the example below, I want to update column C for the last 3 rows to the value 0.
Source Dataframe
A B C D E
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
Target Dataframe
A B C D E
1 1 1 1 1
2 2 2 2 2
3 3 0 3 3
4 4 0 4 4
5 5 0 5 5
I tried something like
df.tail(3)['C']=0
but it does not work. Any idea?
You can use .loc with the index labels of the last 3 rows:
df.loc[df.tail(3).index, 'C'] = 0
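You can use iloc, converting the column label to its position (chained indexing such as df.iloc[-3:]['C'] = 0 assigns to a temporary copy, so it may raise SettingWithCopyWarning and leave df unchanged):
df.iloc[-3:, df.columns.get_loc('C')] = 0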
Output:
A B C D E
0 1 1 1 1 1
1 2 2 2 2 2
2 3 3 0 3 3
3 4 4 0 4 4
4 5 5 0 5 5
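Another way is to slice the index labels directly (df[-3:]['C'] = 0 is also chained assignment and has the same problem):
df.loc[df.index[-3:], 'C'] = 0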
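A minimal sketch of the two single-step assignments, rebuilding the source frame from the question. Each one selects rows and column in one indexer call, so pandas writes into df itself rather than into an intermediate copy. The .copy() calls only exist to keep the original df around for comparison.

```python
import pandas as pd

df = pd.DataFrame({c: [1, 2, 3, 4, 5] for c in 'ABCDE'})

# Label-based: take the index labels of the last 3 rows, then set column C.
out_loc = df.copy()
out_loc.loc[out_loc.index[-3:], 'C'] = 0

# Position-based: convert the column label to its position for iloc.
out_iloc = df.copy()
out_iloc.iloc[-3:, out_iloc.columns.get_loc('C')] = 0

print(out_loc)
```

The .loc version relies on index labels and the .iloc version on positions; both give the same result here because the index is the default 0..n-1.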

Pandas DataFrame assign hierarchical number to element

I have the following Dataframe:
a b c d
0 1 4 9 2
1 2 5 8 7
2 4 6 2 3
3 3 2 7 5
I want to assign a number to each element in a row according to its order (rank) within that row. The result should look like this:
a b c d
0 1 3 4 2
1 1 2 4 3
2 3 4 1 2
3 2 1 4 3
I tried to use the np.argsort function, but it doesn't work. Does someone know an easy way to do this? Thanks.
Use DataFrame.rank:
df = df.rank(axis=1).astype(int)
print (df)
a b c d
0 1 3 4 2
1 1 2 4 3
2 3 4 1 2
3 2 1 4 3
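For reference, np.argsort alone returns the column positions that would sort each row, not each element's rank; argsorting that permutation a second time inverts it and yields the ranks. A sketch on the question's data, plus a small made-up tie example showing why the rank() method matters before astype(int):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 4, 3], 'b': [4, 5, 6, 2],
                   'c': [9, 8, 2, 7], 'd': [2, 7, 3, 5]})

# One argsort gives, per row, the column positions in sorted order
# (row 0: [0, 3, 1, 2]). Argsorting that permutation inverts it,
# which yields 0-based ranks; +1 makes them 1-based.
order = np.argsort(df.to_numpy(), axis=1, kind='stable')
ranks = np.argsort(order, axis=1, kind='stable') + 1
ranked = pd.DataFrame(ranks, index=df.index, columns=df.columns)

# Ties (hypothetical row): rank() defaults to method='average', so
# astype(int) would truncate 1.5 to 1. method='first' breaks ties by
# column order, as the stable double argsort above does.
tied = pd.DataFrame({'a': [1], 'b': [1], 'c': [2]})
average = tied.rank(axis=1).iloc[0].tolist()                            # [1.5, 1.5, 3.0]
first = tied.rank(axis=1, method='first').astype(int).iloc[0].tolist()  # [1, 2, 3]
```

Other rank() methods ('min', 'max', 'dense') give different tie handling; which one is right depends on how tied elements should be numbered.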

How can I count the number of cycles in a column with cyclic values?

I have this DataFrame:
import pandas as pd
data = {'c': [1,2,1,2,3,2,3], 'b': [5,6,4,5,5,6,4]}
df = pd.DataFrame(data = data)
and I want to create the column N with the cycle number of c:
b c N
0 5 1 1
1 6 2 1
2 4 1 2
3 5 2 2
4 5 3 2
5 6 2 3
6 4 3 3
How can I do that?
You can use shift to see if c stops increasing:
(df.c < df.c.shift()).cumsum().add(1)
0 1
1 1
2 2
3 2
4 2
5 3
6 3
Name: c, dtype: int32
Use diff and cumsum
(df.c.diff() <0).cumsum()
0 0
1 0
2 1
3 1
4 1
5 2
6 2
If needed, add 1:
(df.c.diff() <0).cumsum() + 1
0 1
1 1
2 2
3 2
4 2
5 3
6 3
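Putting both answers side by side, here is a minimal sketch that stores the result as column N. It assumes a new cycle starts only when c strictly decreases, so a repeated value stays in the same cycle.

```python
import pandas as pd

data = {'c': [1, 2, 1, 2, 3, 2, 3], 'b': [5, 6, 4, 5, 5, 6, 4]}
df = pd.DataFrame(data=data)

# A new cycle starts wherever c is lower than in the previous row.
# On row 0, shift() and diff() both produce NaN, and a comparison
# with NaN is False, so row 0 always belongs to cycle 1.
via_shift = (df['c'] < df['c'].shift()).cumsum() + 1
via_diff = (df['c'].diff() < 0).cumsum() + 1

df['N'] = via_diff
print(df)
```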
