Populate Dataframe column from information in other Dataframe [duplicate] - python

This question already has answers here:
Pandas: how to merge two dataframes on a column by keeping the information of the first one?
(4 answers)
Closed 1 year ago.
I have two dataframes. One (A) contains the notes associated with certain accounts. The other (B) is a list of accounts to which I want to add a column containing the note for each account. Sometimes an account number in dataframe B will not be in dataframe A, and I would like to fill those rows with either NaN or 0.
Input:
Dataframe A:
Account Note
11 a
12 b
13 c
14 d
15 e
16 f
Dataframe B:
Account
11
25
42
14
15
19
26
Desired Output:
Dataframe C:
Account Note
11 a
25
42
14 d
15 e
19
26
Note that in my real-world example, Dataframe B will be much larger than A.

Try merge with how='left' and on='Account':
>>> df_b.merge(df_a, how='left', on='Account')
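As a minimal sketch of the left merge on the sample data from the question (the frames are rebuilt here by hand, and `df_c` / `df_c_filled` are names chosen for illustration), unmatched accounts come back as NaN and can optionally be replaced with 0:

```python
import pandas as pd

# Sample data copied from the question.
df_a = pd.DataFrame({"Account": [11, 12, 13, 14, 15, 16],
                     "Note": ["a", "b", "c", "d", "e", "f"]})
df_b = pd.DataFrame({"Account": [11, 25, 42, 14, 15, 19, 26]})

# A left merge keeps every row of df_b in its original order;
# accounts missing from df_a get NaN in the Note column.
df_c = df_b.merge(df_a, how="left", on="Account")

# Optional: use 0 instead of NaN for the missing notes.
# Casting to object first lets the column hold both strings and the integer 0.
df_c_filled = df_c.copy()
df_c_filled["Note"] = df_c_filled["Note"].astype(object).fillna(0)

print(df_c)
print(df_c_filled)
```

This assumes each account appears at most once in Dataframe A. If an account had several notes, the merge would return one row per matching note.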

Related

Merging two dataframes while considering overlaps and missing indexes [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 1 year ago.
I have multiple dataframes that each have an ID and a value, and I am trying to merge them such that each ID has all the values in its row.
ID Value
1 10
3 21
4 12
5 43
7 11
And then I have another dataframe:
ID Value2
1 12
2 14
4 55
6 23
7 90
I want to merge these two so that IDs already in the first dataframe keep their row, and an ID that is in the second dataframe but not in the first is added as a new row with Value2 filled in and Value left empty. This is what my result would look like:
ID Value Value2
1 10 12
3 21 -
4 12 55
5 43 -
7 11 90
2 - 14
6 - 23
Hope this makes sense. I don't really care about the order of the ID numbers; they can be sorted or not. My goal is to be able to create a dictionary for each ID with "Value", "Value2", "Value3", ... as keys and the corresponding numbers as the values. Please let me know if any clarification is needed.
You can use pandas' merge method:
import pandas as pd
df1.merge(df2, how='outer', on='ID')
Specifying how='outer' uses the union of the keys from both dataframes, so every ID from either frame appears in the result.
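The following sketch applies the outer merge to the two sample frames from the question and then builds the per-ID dictionaries the asker mentions. Chaining the merge with `functools.reduce` is an assumption about how one might extend this to more than two frames; the names `frames`, `merged` and `records` are illustrative only.

```python
from functools import reduce

import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"ID": [1, 3, 4, 5, 7], "Value": [10, 21, 12, 43, 11]})
df2 = pd.DataFrame({"ID": [1, 2, 4, 6, 7], "Value2": [12, 14, 55, 23, 90]})

# Further frames (Value3, ...) could be appended to this list.
frames = [df1, df2]

# Outer-merge the frames pairwise on ID; IDs missing from a frame get NaN.
merged = reduce(lambda left, right: left.merge(right, how="outer", on="ID"), frames)

# One dictionary per ID, keyed by column name.
records = merged.set_index("ID").to_dict(orient="index")

print(merged)
print(records[1])
```

Because missing entries are stored as NaN, the value columns are converted to floats in the merged frame.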

Split dataframe into 3 equally sized new dataframes - Pandas [duplicate]

This question already has answers here:
Split large Dataframe into smaller equal dataframes
(2 answers)
Split a large pandas dataframe
(10 answers)
Closed 1 year ago.
I have a dataframe that is 100,227 records long.
I would like to export this dataframe into 3 equally sized CSV files.
I did the following but am getting an error. Is there perhaps a simpler approach?
df_seen = pd.read_csv("data.csv")
df1 = df_seen.shape.iloc[:, :33409]
df2 = df_seen.shape.iloc[33410:, 66818:]
df3 = df_seen.shape.iloc[66819:, 100227]
df1.to_csv('data1.csv', index=False)
df2.to_csv('data2.csv',index = False)
df3.to_csv('data3.csv',index = False)
For example, the data looks similar to the sample below, where I would take the single column of 15 numbers and split it into 3 columns of 5 each:
Numbers
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Desired output (note that each column represents a new CSV with a single column):
1 6 11
2 7 12
3 8 13
4 9 14
5 10 15
Try using numpy.array_split:
import numpy as np
df1, df2, df3 = np.array_split(df_seen, 3)
To save each DataFrame to a separate file, you could do:
for i, df in enumerate(np.array_split(df_seen, 3)):
    df.to_csv(f"data{i+1}.csv", index=False)
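A variant of the same idea, sketched under the assumption that you would rather split the row positions and slice with `iloc` than pass the DataFrame itself to NumPy. The 15-row frame stands in for `data.csv`, and the temporary output directory is used only to keep the example self-contained.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Stand-in for pd.read_csv("data.csv"): the 15-number sample from the question.
df_seen = pd.DataFrame({"Numbers": range(1, 16)})

# Split the row positions 0..n-1 into 3 nearly equal groups,
# then select each group of rows by position.
positions = np.array_split(np.arange(len(df_seen)), 3)
chunks = [df_seen.iloc[idx] for idx in positions]

# Write each chunk to its own CSV file (a temporary directory is used here).
out_dir = Path(tempfile.mkdtemp())
for i, chunk in enumerate(chunks, start=1):
    chunk.to_csv(out_dir / f"data{i}.csv", index=False)

print([len(chunk) for chunk in chunks])
```

When the row count is not divisible by 3, `np.array_split` makes the leading chunks one row longer rather than raising an error.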

groupby with multiple columns with addition and frequency counts in pandas [duplicate]

This question already has answers here:
Multiple aggregations of the same column using pandas GroupBy.agg()
(4 answers)
Closed 4 years ago.
I have a table that looks as follows:
name type val
A online 12
B online 24
A offline 45
B online 32
A offline 43
B offline 44
I want to group the dataframe by the columns name and type, with additional columns that return the count of records in each group and the sum of val over those records. It should look as follows:
name type count val
A online 1 12
offline 2 88
B online 2 56
offline 1 44
I have tried df.groupby(['name', 'type'])['val'].sum(), which gives the sum, but I am unable to add the count of records.
Pass sort=False to groupby to avoid the default sorting, aggregate with agg using a list of tuples of new column names and aggregate functions, and finally call reset_index to turn the MultiIndex into columns:
df1 = (df.groupby(['name', 'type'], sort=False)['val']
         .agg([('count', 'count'),('val', 'sum')])
         .reset_index())
print (df1)
  name     type  count  val
0    A   online      1   12
1    B   online      2   56
2    A  offline      2   88
3    B  offline      1   44
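The same result can also be sketched with pandas' named aggregation syntax (keyword arguments to `agg`), which is an alternative to the list of tuples used above rather than the answer's own method. The frame is rebuilt from the question's table.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["A", "B", "A", "B", "A", "B"],
    "type": ["online", "online", "offline", "online", "offline", "offline"],
    "val": [12, 24, 45, 32, 43, 44],
})

# Each keyword is an output column: (input column, aggregate function).
# sort=False keeps the groups in order of first appearance.
df1 = (
    df.groupby(["name", "type"], sort=False)
      .agg(count=("val", "count"), val=("val", "sum"))
      .reset_index()
)

print(df1)
```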
You can also try a pivot table:
df.pivot_table(index=['name','type'],aggfunc=['count','sum'],values='val')
              count sum
                val val
name type
A    offline      2  88
     online       1  12
B    offline      1  44
     online       2  56

add a column with a counter per group in pandas [duplicate]

This question already has answers here:
Pandas number rows within group in increasing order
(2 answers)
Generate column of unique ID in pandas
(1 answer)
Closed 4 years ago.
I have a pandas DataFrame in python with one column A with numerical values:
A
11
12
13
12
14
I want to add a column that contains a counter that counts the number of elements per group up to that index in column A, like this:
A B
11 1
12 1
13 1
12 2
14 1
How do I create column B?
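One common way to build such a counter, sketched here on the sample column from the question, is `groupby(...).cumcount()`, which numbers the rows within each group starting from 0. Adding 1 gives the 1-based counter shown in the desired output.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"A": [11, 12, 13, 12, 14]})

# cumcount() numbers the rows inside each group of equal A values from 0,
# in order of appearance; + 1 makes the counter start at 1.
df["B"] = df.groupby("A").cumcount() + 1

print(df)
```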

Pandas: Get top 10 values AFTER grouping

I have a pandas data frame with a column 'id' and a column 'value'. It is already sorted first by id (ascending) and then by value (descending). What I need is the top 10 values per id.
I assumed that something like the following would work, but it doesn't:
df.groupby("id", as_index=False).aggregate(lambda (index,rows) : rows.iloc[:10])
What I get is just a list of ids; the value column (and other columns that I omitted for the question) is no longer there.
Any ideas how it might be done, without iterating through each of the single rows and appending the first ten to another data structure?
Is this what you're looking for?
df.groupby('id').head(10)
I would like to answer this with an example dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.array([['a','a','b','c','a','c','b'],[4,6,1,8,9,4,1],[12,11,7,1,5,5,7],[123,54,146,96,10,114,200]]).T,columns=['item','date','hour','value'])
df['value'] = pd.to_numeric(df['value'])
This gives you a dataframe
item date hour value
a 4 12 123
a 6 11 54
b 1 7 146
c 8 1 96
a 9 5 10
c 4 5 114
b 1 7 200
Now group by item and display the first 2 values of each group:
df.groupby(['item'])['value'].head(2)
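Note that `head(n)` returns the first n rows of each group in their existing order, so it yields the top values only if the frame is already sorted, as in the question. The sketch below, using the item and value columns from the example above, sorts first and then takes the head; `top2` is an illustrative name.

```python
import pandas as pd

# The item and value columns from the example above.
df = pd.DataFrame({
    "item": ["a", "a", "b", "c", "a", "c", "b"],
    "value": [123, 54, 146, 96, 10, 114, 200],
})

# Sort by item (ascending) and value (descending), then keep the first
# 2 rows of each item. All columns of the original frame are retained.
top2 = (
    df.sort_values(["item", "value"], ascending=[True, False])
      .groupby("item")
      .head(2)
)

print(top2)
```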
