Pandas cross join without duplication [duplicate]


This question already has answers here:
Concatenate rows of two dataframes in pandas
(3 answers)
Closed 5 years ago.
I have two Pandas DataFrames, each with different columns. I want to basically glue them together horizontally (they each have the same number of rows so this shouldn't be an issue).
There must be a simple way of doing this but I've gone through the docs and concat isn't what I'm looking for (I don't think).
Any ideas?
Thanks!

concat is indeed what you're looking for; you just have to pass axis=1 instead of the default axis=0. Code sample below:
import pandas as pd
df1 = pd.DataFrame({
    'A': [1,2,3,4,5],
    'B': [1,2,3,4,5]
})
df2 = pd.DataFrame({
    'C': [1,2,3,4,5],
    'D': [1,2,3,4,5]
})
df_concat = pd.concat([df1, df2], axis=1)
print(df_concat)
With the result being:
   A  B  C  D
0  1  1  1  1
1  2  2  2  2
2  3  3  3  3
3  4  4  4  4
4  5  5  5  5
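One caveat worth illustrating: concat with axis=1 aligns rows by index label, not by position. The sketch below uses a hypothetical df2 whose index does not match df1's (the index values 5, 6, 7 are made up for illustration) and shows that resetting both indexes first gives a purely positional "glue":

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3]})
# Hypothetical frame whose index does not match df1's default 0..2 index
df2 = pd.DataFrame({'C': [10, 20, 30]}, index=[5, 6, 7])

# Aligning on index labels: no labels are shared, so rows are padded with NaN
misaligned = pd.concat([df1, df2], axis=1)

# Dropping both indexes first makes the concatenation positional
aligned = pd.concat(
    [df1.reset_index(drop=True), df2.reset_index(drop=True)],
    axis=1,
)

print(misaligned.shape)
print(aligned)
```

If both frames already share the same index, as in the question, the reset_index calls are unnecessary.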

Related

How to get multiple values from Pandas dataframe whose indices and columns are known? [duplicate]

This question already has answers here:
Vectorized lookup on a pandas dataframe
(3 answers)
Closed 4 years ago.
I have a Pandas dataframe that looks like this
>>> pd.DataFrame({'A': [1,2,3], 'B':[4,5,6], 'C':[7,8,9]})
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
I want to select values 1,6,8 that correspond to index-column pairs (0,'A'),(2,'B'),(1,'C'). How do I simultaneously select them?

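Use lookup (note that DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0, so this only works on older versions):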
import pandas as pd
df = pd.DataFrame({'A': [1,2,3], 'B':[4,5,6], 'C':[7,8,9]})
rows, cols = zip(*[(0,'A'),(2,'B'),(1,'C')])
result = df.lookup(rows, cols)
print(result)
Output
[1 6 8]
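On pandas versions where DataFrame.lookup is no longer available, one possible replacement is to translate the labels into integer positions and index the underlying NumPy array. This is a sketch of that idea, not the only way to do it; the names row_pos and col_pos are introduced here for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
rows, cols = zip(*[(0, 'A'), (2, 'B'), (1, 'C')])

# Convert index labels and column labels to integer positions
row_pos = df.index.get_indexer(list(rows))
col_pos = df.columns.get_indexer(list(cols))

# Paired (fancy) indexing picks one value per (row, column) pair
result = df.to_numpy()[row_pos, col_pos]
print(result)
```

get_indexer returns -1 for labels that are not found, so it is worth checking for that before indexing if the labels come from untrusted input.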

simple pivot dataframe in pandas [duplicate]

This question already has answers here:
How to switch columns rows in a pandas dataframe
(2 answers)
Closed 4 years ago.
Have a simple df:
df = pd.DataFrame({"v": [1, 2]}, index = pd.Index(data = ["a", "b"], name="colname"))
Want to reshape it to look like this:
   a  b
0  1  2
How do I do that? I looked at the docs for pd.pivot and pd.pivot_table but
df.reset_index().pivot(columns = "colname", values = "v")
produces a df that has NaNs obviously.
Update: I want DataFrames, not Series, because I am going to concatenate a bunch of them together to store the results of a computation.

From your setup
         v
colname
a        1
b        2
Seems like you need to transpose
>>> df.T
or
>>> df.transpose()
Both yield
colname  a  b
v        1  2
You can always reset the index to get 0 and set the column name to None to get your expected output
ndf = df.T.reset_index(drop=True)
ndf.columns.name = None
   a  b
0  1  2

How about:
df.T.reset_index(drop=True)
[out]
colname  a  b
0        1  2
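Since the question mentions concatenating many such frames, here is a rough sketch of how the transpose approach might be combined with pd.concat. The helper to_row and the second result frame (values 3 and 4) are hypothetical, added only for illustration:

```python
import pandas as pd

def to_row(df):
    # Turn the single-column frame into a single-row frame,
    # dropping the "v" row label and the "colname" columns name.
    row = df.T.reset_index(drop=True)
    row.columns.name = None
    return row

idx = pd.Index(["a", "b"], name="colname")
results = [
    pd.DataFrame({"v": [1, 2]}, index=idx),
    pd.DataFrame({"v": [3, 4]}, index=idx),  # hypothetical second result
]

# Stack the one-row frames vertically and renumber the rows 0..n-1
stacked = pd.concat([to_row(r) for r in results], ignore_index=True)
print(stacked)
```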

Better alternative to a groupby with a merge [duplicate]

This question already has answers here:
Pandas grouby and transform('count') gives placement error - works fine on smaller dataset
(1 answer)
Merging a pandas groupby result back into DataFrame
(3 answers)
Closed 4 years ago.
I was wondering if anyone knew of a better method to what I am currently doing. Here is an example data set:
ID  Number
a   1
a   2
a   3
b   4
c   5
c   6
c   7
c   8
For example, to get a count of Number per ID in the table above, I would first group by ID and count Number, then merge the results back into the original table like so:
df2 = df.groupby('ID').agg({'Number':'count'}).reset_index()
df2 = df2.rename(columns = {'Number':'Number_Count'})
df = pd.merge(df, df2, on = ['ID'])
This results in the original table with an extra Number_Count column.
It feels like a roundabout way of doing this; does anyone know a better alternative? I ask because, when working with large data sets, this method can chew up a lot of memory (by creating another table and then merging the two).

You can do that quite simply with this:
import pandas as pd
df = pd.DataFrame({'ID': list('aaabcccc'),
                   'Number': range(1,9)})
# Select the column before transform so a single Series is assigned
df['Number_Count'] = df.groupby('ID')['Number'].transform('count')
df
#   ID  Number  Number_Count
# 0  a       1             3
# 1  a       2             3
# 2  a       3             3
# 3  b       4             1
# 4  c       5             4
# 5  c       6             4
# 6  c       7             4
# 7  c       8             4
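Calling transform on the whole groupby returns one column per non-key column, so it only fits into a single new column when the frame has exactly one other column. The sketch below assumes a hypothetical extra column Other to show the safer pattern of selecting the column first, plus an alternative based on value_counts that counts rows per ID:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': list('aaabcccc'),
    'Number': range(1, 9),
    'Other': list('stuvwxyz'),  # hypothetical extra column
})

# Selecting 'Number' first yields a Series aligned with df's index
df['Number_Count'] = df.groupby('ID')['Number'].transform('count')

# Alternative: count rows per ID and map the counts back onto each row.
# This counts rows, so it differs from 'count' if Number contains NaN.
df['Row_Count'] = df['ID'].map(df['ID'].value_counts())

print(df)
```

Neither variant builds an intermediate table to merge, which addresses the memory concern raised in the question.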

Python: how to merge two dataframes with different size? [duplicate]

This question already has answers here:
Merge two dataframes by index
(7 answers)
pandas: merge (join) two data frames on multiple columns
(6 answers)
Pandas Merging 101
(8 answers)
Closed 4 years ago.
I have two dataframes, df and df1. The first one contains all the information about all the possible combination of a dataset while the second one is just a subset without the information.
df
x  y  distance
0  1  4
0  2  3
0  3  2
1  2  2
1  3  5
2  3  1
df1
x  y
1  3
2  3
2  3
I would like to merge df and df1 in order to have the following:
df1
x  y  distance
1  3  5
2  3  1
2  3  1

You can use the merge method:
df.merge(df1, on=['x','y'], how='right')
Here you're merging df on the left with df1 on the right, using the columns x and y as the merge keys and keeping only the rows that are present in the right dataframe.
You can read more about merging and joining dataframes in the pandas documentation.
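For reference, here is a self-contained sketch using the data from the question. It starts from df1 and uses a left join, which is the mirror image of the right join above and keeps df1's row order. The validate argument is optional and is added here only as a guard that each (x, y) pair appears at most once in df:

```python
import pandas as pd

df = pd.DataFrame({
    'x': [0, 0, 0, 1, 1, 2],
    'y': [1, 2, 3, 2, 3, 3],
    'distance': [4, 3, 2, 2, 5, 1],
})
df1 = pd.DataFrame({'x': [1, 2, 2], 'y': [3, 3, 3]})

# Left join from df1: every row of df1 is kept and gains its distance
result = df1.merge(df, on=['x', 'y'], how='left', validate='many_to_one')
print(result)
```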

Combine two Pandas dataframes with the same index [duplicate]

This question already has an answer here:
What are the 'levels', 'keys', and names arguments for in Pandas' concat function?
(1 answer)
Closed 4 years ago.
I have two dataframes with the same index but different columns. How do I combine them into one with the same index but containing all the columns?
I have:
    A
1  10
2  11
    B
1  20
2  21
and I need the following output:
    A   B
1  10  20
2  11  21

pandas.concat([df1, df2], axis=1)

You've got a few options depending on how complex the dataframe is:
Option 1:
df1.join(df2, how='outer')
Option 2:
pd.merge(df1, df2, left_index=True, right_index=True, how='outer')
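A small sketch with the frames from the question, showing that join, merge on the index, and concat with axis=1 give the same result here. The variable names joined, merged and concatenated are introduced only for this example:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [10, 11]}, index=[1, 2])
df2 = pd.DataFrame({'B': [20, 21]}, index=[1, 2])

joined = df1.join(df2, how='outer')
merged = pd.merge(df1, df2, left_index=True, right_index=True, how='outer')
concatenated = pd.concat([df1, df2], axis=1)

print(joined)
```

With how='outer', index labels present in only one frame would be kept and filled with NaN; since both frames share the same index here, no NaN appears.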
