This question already has answers here:
How to switch columns rows in a pandas dataframe
(2 answers)
Closed 4 years ago.
I have a simple df:
df = pd.DataFrame({"v": [1, 2]}, index=pd.Index(data=["a", "b"], name="colname"))
I want to reshape it to look like this:
a b
0 1 2
How do I do that? I looked at the docs for pd.pivot and pd.pivot_table, but
df.reset_index().pivot(columns="colname", values="v")
produces a df that has NaNs, obviously.
Update: I want DataFrames, not Series, because I am going to concatenate a bunch of them together to store the results of a computation.
From your setup:
         v
colname
a        1
b        2
It seems like you need to transpose:
>>> df.T
or
>>> df.transpose()
which yields
colname  a  b
v        1  2
You can always reset the index to get 0 and set the column name to None to get your expected output
ndf = df.T.reset_index(drop=True)
ndf.columns.name = None
a b
0 1 2
How about:
df.T.reset_index(drop=True)
[out]
colname a b
0 1 2
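Since the update in the question mentions concatenating a bunch of these one-row frames together, here is a minimal sketch of how that could look (the as_row helper and the second result frame are made up for illustration):
import pandas as pd

def as_row(values, labels):
    # Hypothetical helper: turn labels/values into a single-row DataFrame.
    return pd.DataFrame([values], columns=labels)

result_a = as_row([1, 2], ["a", "b"])   # the frame from the question
result_b = as_row([3, 4], ["a", "b"])   # a made-up second result

# ignore_index=True gives the combined frame a fresh 0..n-1 index.
combined = pd.concat([result_a, result_b], ignore_index=True)
print(combined)
#    a  b
# 0  1  2
# 1  3  4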
This question already has answers here:
Pandas reset index is not taking effect [duplicate]
(4 answers)
Closed 5 days ago.
I was wondering why reset_index() has no effect in the following piece of code.
import pandas as pd

data = [0, 10, 20, 30, 40, 50]
df = pd.DataFrame(data, columns=['Numbers'])
df.drop(df.index[2:4], inplace=True)
df.reset_index()
df
Numbers
0 0
1 10
4 40
5 50
UPDATE:
If I use df.reset_index(inplace=True), I see a new index column, which is not desired.
index Numbers
0 0 0
1 1 10
2 4 40
3 5 50
Because reset_index() has inplace=False as the default, you need to either reassign the result (df = df.reset_index()) or call reset_index(inplace=True). Docs
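The update in the question points at a second wrinkle: by default reset_index keeps the old index as a new column. Passing drop=True as well discards it; a minimal sketch using the question's own data:
import pandas as pd

data = [0, 10, 20, 30, 40, 50]
df = pd.DataFrame(data, columns=['Numbers'])
df.drop(df.index[2:4], inplace=True)

# drop=True throws the old index away instead of adding it back as an "index" column
df.reset_index(drop=True, inplace=True)
print(df)
#    Numbers
# 0        0
# 1       10
# 2       40
# 3       50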
Please try this code
import pandas as pd
# create a sample DataFrame
df = pd.DataFrame({'column_name': [1, 2, 0, 4, 0, 6]})
# drop rows where column 'column_name' has value of 0
df = df[df['column_name'] != 0]
# reset the index of the resulting DataFrame
df = df.reset_index(drop=True)
print(df)
I have a DataFrame with 100 columns (though I show only three of them here) and I want to build a new DataFrame with two columns. Here is the DataFrame:
import pandas as pd
df = pd.DataFrame()
df['id'] = [1, 2, 3]
df['c1'] = [1, 5, 1]
df['c2'] = [-1, 6, 5]
df
I want to stack the values of all the value columns for each id and put them in one column. For example, for id=1 I want to stack its c1 and c2 values (1 and -1) into one column, next to the id. The DataFrame I want has two columns: the id and the stacked values.
Note: df.melt does not solve my question, since I want to have the ids as well.
Note 2: I already tried stack and reset_index, and it does not help:
df = df.stack().reset_index()
df.columns = ['id','c']
df
You could first set_index with "id"; then stack + reset_index:
out = (df.set_index('id')
         .stack()
         .droplevel(1)
         .reset_index(name='c'))
Output:
id c
0 1 1
1 1 -1
2 2 5
3 2 6
4 3 1
5 3 5
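For what it's worth, melt can keep the ids too if you pass id_vars; a sketch of an arguably equivalent approach (a stable sort restores the per-id grouping, since melt emits the data column by column):
out = (df.melt(id_vars='id', value_name='c')   # columns: id, variable, c
         .drop(columns='variable')
         .sort_values('id', kind='mergesort')  # mergesort is stable
         .reset_index(drop=True))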
This question already has answers here:
Creating a new column assigning same index to repeated values in Pandas DataFrame [closed]
(2 answers)
Closed 2 years ago.
Consider the sample dataframe ('value' column is of no significance here):
df = pd.DataFrame({'key':list('AABBBC'), 'value': [1, 2, 3, 4, 5, 6]})
What I want is a column that numbers the unique values of the 'key' column, the caveat being that the numbering is incrementally ascending and only goes up when a value appears that hasn't occurred in any previous row. So here "A" will be assigned 1, "B" 2 and "C" 3, i.e. the desired result is the frame above with an extra count_unique column holding those numbers.
Right now I can only achieve this with a couple of steps:
df1 = df.drop_duplicates('key').reset_index(drop=True).drop(columns=['value'])
df1['count_unique'] = df1.index + 1
pd.merge(df, df1.set_index(['key']), left_on=['key'], right_index=True, how='left')
It doesn't look very Pythonic and is not the most efficient. Any advice is appreciated.
Is it:
df['count_unique'] = df['key'].factorize()[0] + 1
Output:
key value count_unique
0 A 1 1
1 A 2 1
2 B 3 2
3 B 4 2
4 B 5 2
5 C 6 3
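factorize assigns integer codes in order of first appearance, which is exactly the numbering asked for. If you prefer a groupby-based spelling, this should be equivalent (sort=False keeps first-appearance order):
df['count_unique'] = df.groupby('key', sort=False).ngroup() + 1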
This question already has answers here:
Concatenate rows of two dataframes in pandas
(3 answers)
Closed 5 years ago.
I have two Pandas DataFrames, each with different columns. I want to basically glue them together horizontally (they each have the same number of rows, so this shouldn't be an issue).
There must be a simple way of doing this, but I've gone through the docs and concat doesn't seem to be what I'm looking for (I don't think).
Any ideas?
Thanks!
concat is indeed what you're looking for; you just have to pass it a different value for the "axis" argument than the default. Code sample below:
import pandas as pd
df1 = pd.DataFrame({
'A': [1,2,3,4,5],
'B': [1,2,3,4,5]
})
df2 = pd.DataFrame({
'C': [1,2,3,4,5],
'D': [1,2,3,4,5]
})
df_concat = pd.concat([df1, df2], axis=1)
print(df_concat)
With the result being:
A B C D
0 1 1 1 1
1 2 2 2 2
2 3 3 3 3
3 4 4 4 4
4 5 5 5 5
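One caveat worth noting: concat(axis=1) aligns rows by index label, not by position. If the two frames have the same length but different indexes (e.g. after filtering), you may want to reset them first; a minimal sketch:
df_concat = pd.concat(
    [df1.reset_index(drop=True), df2.reset_index(drop=True)],
    axis=1,
)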
This question already has answers here:
How to add multiple columns to pandas dataframe in one assignment?
(13 answers)
Closed 2 years ago.
I have the following simple DataFrame, df:
0
0 1
1 2
2 3
When I try to create new columns and assign some values to them, as in the example below:
df['col2', 'col3'] = [(2,3), (2,3), (2,3)]
I get the following structure:
0 (col2, col3)
0 1 (2, 3)
1 2 (2, 3)
2 3 (2, 3)
However, I am looking for a way to get this:
0 col2, col3
0 1 2, 3
1 2 2, 3
2 3 2, 3
Looks like the solution is simple:
df['col2'], df['col3'] = zip(*[(2,3), (2,3), (2,3)])
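In case the one-liner looks opaque: zip(*rows) transposes a list of row tuples into one tuple per column, which is what makes the parallel assignment work. A tiny illustration:
rows = [(2, 3), (2, 3), (2, 3)]
print(list(zip(*rows)))
# [(2, 2, 2), (3, 3, 3)]  -> the first tuple feeds col2, the second feeds col3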
There is a convenient solution for joining multiple columns to a DataFrame via a list of tuples: you can construct a DataFrame from your list of tuples before the assignment.
df = pd.DataFrame({0: [1, 2, 3]})
df[['col2', 'col3']] = pd.DataFrame([(2,3), (2,3), (2,3)])
print(df)
0 col2 col3
0 1 2 3
1 2 2 3
2 3 2 3
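One thing to be careful about (an assumption about your data, since the example uses the default 0..2 index): the assignment aligns on the index, so if df has a non-default index the temporary frame should be built with index=df.index, roughly:
df[['col2', 'col3']] = pd.DataFrame([(2, 3), (2, 3), (2, 3)], index=df.index)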
This is convenient, for example, when you wish to join an arbitrary number of series.
Alternatively, assign can be used:
df.assign(col2=2, col3=3)
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.assign.html
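Note that assign returns a new DataFrame rather than modifying df in place, so you would normally reassign the result:
df = df.assign(col2=2, col3=3)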
I ran across this issue when trying to apply multiple scalar values to multiple new columns and couldn't find a better way. If I'm missing something blatantly obvious, let me know, but df[['b','c']] = 0 doesn't work. Here's the simplified code:
# Create the "current" dataframe
df = pd.DataFrame({'a': [1, 2]})
# List of columns I want to add
col_list = ['b', 'c']
# Quickly create a key : scalar value dictionary
scalar_dict = {c: 0 for c in col_list}
# Create the dataframe for those columns - the key here is setting index=df.index
df[col_list] = pd.DataFrame(scalar_dict, index=df.index)
Or, what appears to be slightly faster is to use .assign():
df = df.assign(**scalar_dict)
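For reference, with the sample df and scalar_dict above, either approach should leave df looking like this:
print(df)
#    a  b  c
# 0  1  0  0
# 1  2  0  0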