Dividing a pandas dataframe into separate dataframes by rows - python

I have a dataframe, for example:
a b c
0 1 2
3 4 5
6 7 8
and I need to separate it by rows and create a new dataframe from each row.
I tried to iterate over the rows and then, for each row (which is a Series), I tried the command row.to_df(), but it gives me a weird result.
Basically I'm looking to create new dataframes as such:
a b c
0 1 2
a b c
3 4 5
a b c
6 7 8

You can simply iterate row-by-row and use .to_frame(). For example:
for _, row in df.iterrows():
    print(row.to_frame().T)
    print()
Prints:
a b c
0 0 1 2
a b c
1 3 4 5
a b c
2 6 7 8

You can try doing:
for _, row in df.iterrows():
    new_df = pd.DataFrame(row).T.reset_index(drop=True)
This will create a new DataFrame object from each row (a Series object) in the original DataFrame df.
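If you need the per-row frames as objects rather than just printing them, here is a minimal sketch combining both answers (the names row_dfs and d are my own):
import pandas as pd

df = pd.DataFrame([[0, 1, 2], [3, 4, 5], [6, 7, 8]], columns=['a', 'b', 'c'])

# One single-row DataFrame per row; reset_index(drop=True) makes
# each new frame start at index 0.
row_dfs = [row.to_frame().T.reset_index(drop=True) for _, row in df.iterrows()]

for d in row_dfs:
    print(d)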

Related

Is there a way to get the data after a specific condition in Pandas?

I want to know if there is a way to take the data from a dataframe after a specific condition is met, and keep taking that data until another condition applies.
I have the following dataframe:
column_1 column_2
0 1 a
1 1 a
2 1 b
3 4 b
4 4 c
5 4 c
6 0 d
7 0 d
8 0 e
9 4 e
10 4 f
11 4 f
12 1 g
13 1 g
I want to select from this dataframe only the rows where column_1 changes from 1->4 and stays 4 until it changes to another value, as follows:
column_1 column_2
3 4 b
4 4 c
5 4 c
Is there a way to do this in Pandas and not make them lists?
Another option is to find the cut-off points using shift + eq, then use groupby.cummax to create a boolean filter:
df[(df['column_1'].shift().eq(1) & df['column_1'].eq(4)).groupby(df['column_1'].diff().ne(0).cumsum()).cummax()]
Output:
column_1 column_2
3 4 b
4 4 c
5 4 c
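For readability, here is the same filter unpacked step by step (the intermediate names are my own; out holds the same rows as above):
is_start = df['column_1'].shift().eq(1) & df['column_1'].eq(4)  # True at a 1 -> 4 change
groups = df['column_1'].diff().ne(0).cumsum()                   # label each run of equal values
mask = is_start.groupby(groups).cummax()                        # extend True through each run
out = df[mask]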
You can first create a helper column new that labels each run of consecutive values. Then test whether the shifted value is 1 while the current value is 4, and collect the new labels of those rows. Finally, keep all rows whose new label is among the collected values, which selects every row in the matching runs of 4s:
df['new'] = df['column_1'].ne(df['column_1'].shift()).cumsum()
s = df.loc[df['column_1'].shift().eq(1) & df['column_1'].eq(4), 'new']
df = df[df['new'].isin(s)]
print(df)
column_1 column_2 new
3 4 b 2
4 4 c 2
5 4 c 2

Is there a way to filter out rows from a table with an unnamed column

I'm currently trying to analyze rolling correlations of a dataset with four compared values, but I only need the output rows containing 'a'.
I got my data frame by using the command newdf = df.rolling(3).corr()
Sample input (random numbers)
a b c d
1 a
1 b
1 c
1 d
2 a
2 b
2 c
2 d
3 a
3 b 5 6 3
3 c 4 3 1
3 d 3 4 2
4 a 1 3 5 6
4 b 6 2 4 1
4 c 8 6 6 7
4 d 2 5 4 6
5 a 2 5 4 1
5 b 1 4 6 3
5 c 2 6 3 7
5 d 3 6 3 7
and need the output
a b c d
1 a 1 3 5 6
2 a 2 5 4 1
I've tried filtering it by doing adf = newdf.filter(['a'], axis=0), however that gets rid of everything, and when doing it for the other axis it filters by column. Unfortunately, the column containing the rows with values a, b, c, d is unnamed, so I can't filter that column individually. This wouldn't be an issue, however, if it's possible to flip the rows and columns, with the values listed by index, to get the desired output.
Try using loc. Put the column of abcdabcd ... values as the index and just use loc:
df.loc['a']
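As a minimal sketch of what this assumes (a hypothetical frame whose letters live in an ordinary column, here named letter):
import pandas as pd

df = pd.DataFrame({'letter': ['a', 'b', 'a', 'b'], 'x': [1, 2, 3, 4]})

# With the letter column as the (single) index, .loc picks all 'a' rows:
print(df.set_index('letter').loc['a'])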
The actual source of the problem in your case is that your DataFrame has a MultiIndex. So when you attempt to execute newdf.filter(['a'], axis=0), you want to keep rows whose index is exactly the string "a". But since your DataFrame has a MultiIndex, each row with "a" at level 1 also contains some number at level 0.
To get your intended result, run:
newdf.filter(like='a', axis=0)
maybe followed by .dropna().
An alternative solution is:
newdf.xs('a', level=1, drop_level=False)
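A self-contained sketch of both suggestions on the MultiIndex that rolling(3).corr() produces (the random data and seed are my own):
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((6, 4)), columns=list('abcd'))
newdf = df.rolling(3).corr()   # row index becomes (window end, column name)

print(newdf.filter(like='a', axis=0).dropna())            # match labels containing 'a'
print(newdf.xs('a', level=1, drop_level=False).dropna())  # match level 1 exactly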

Pandas: How to merge columns containing the same name within a single data frame?

I have a dataframe extracted from an Excel file which I have manipulated into the following form (there are multiple rows, but this is reduced to make my question as clear as possible):
|A|B|C|A|B|C|
index 0: 1 2 3 4 5 6
As you can see, there are repetitions of the column names. I would like to merge this dataframe to look like the following:
|A|B|C|
index 0: 1 2 3
index 1: 4 5 6
I have tried to use the melt function but have not had any success thus far.
import pandas as pd
df = pd.DataFrame([[1,2,3,4,5,6]], columns = ['A', 'B','C','A', 'B','C'])
df
A B C A B C
0 1 2 3 4 5 6
pd.concat(x for _, x in df.groupby(df.columns.duplicated(), axis=1))
A B C
0 1 2 3
0 4 5 6
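If you also want a clean 0, 1, ... index, and want to avoid grouping on axis=1 (deprecated in recent pandas), a variant of the same idea for the two-copy case shown:
mask = df.columns.duplicated()
out = pd.concat([df.loc[:, ~mask], df.loc[:, mask]], ignore_index=True)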

Pandas: Create new dataframe based on existing dataframe

What is the most elegant way to create a new dataframe from an existing dataframe, by 1. selecting only certain columns and 2. renaming them at the same time?
For instance I have the following dataframe, where I want to pick column B, D and F and rename them into X, Y, Z
base dataframe
A B C D E F
1 2 3 4 5 6
1 2 3 4 5 6
new dataframe
X Y Z
2 4 6
2 4 6
You can select and rename the columns in one line
df2 = df[['B','D','F']].rename({'B':'X','D':'Y','F':'Z'}, axis=1)
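A small variant (my own) that keeps the selection and the renaming in one mapping, so the two lists cannot drift apart:
mapping = {'B': 'X', 'D': 'Y', 'F': 'Z'}
df2 = df[list(mapping)].rename(columns=mapping)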
Slightly more general selection of every other column:
df = pd.DataFrame({'A':[1,2,3], 'B':[4,5,6],
                   'C':[7,8,9], 'D':[10,11,12]})
df_half = df.iloc[:, ::2]
with df_half being:
A C
0 1 7
1 2 8
2 3 9
You can then use the rename method mentioned in the answer by @G. Anderson or directly assign to the columns:
df_half.columns = ['X','Y']
returning:
X Y
0 1 7
1 2 8
2 3 9
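If you prefer not to assign to .columns in place, set_axis chains the renaming onto the selection (a variant of the above, not from the original answer):
df_half = df.iloc[:, ::2].set_axis(['X', 'Y'], axis=1)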

Modifying DataFrames in loop

Given this data frame:
import pandas as pd
df=pd.DataFrame({'A':[1,2,3],'B':[4,5,6],'C':[7,8,9]})
df
A B C
0 1 4 7
1 2 5 8
2 3 6 9
I'd like to create 3 new data frames; one from each column.
I can do this one at a time like this:
a=pd.DataFrame(df[['A']])
a
A
0 1
1 2
2 3
But instead of doing this for each column, I'd like to do it in a loop.
Here's what I've tried:
a = b = c = df.copy()
dfs = [a, b, c]
fields = ['A', 'B', 'C']
for d, f in zip(dfs, fields):
    d = pd.DataFrame(d[[f]])
...but when I then print each one, I get the whole original data frame as opposed to just the column of interest.
a
A B C
0 1 4 7
1 2 5 8
2 3 6 9
Update:
My actual data frame will have some columns that I do not need and the columns will not be in any sort of order, so I need to be able to get the columns by name.
Thanks in advance!
A simple list comprehension should be enough.
In [68]: df_list = [df[[x]] for x in df.columns]
Printing out the list, this is what you get:
In [69]: for d in df_list:
    ...:     print(d)
    ...:     print('-' * 5)
    ...:
A
0 1
1 2
2 3
-----
B
0 4
1 5
2 6
-----
C
0 7
1 8
2 9
-----
Each element in df_list is its own data frame, corresponding to each data frame from the original. Furthermore, you don't even need fields, use df.columns instead.
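That said, given the update (only some columns are needed, looked up by name), a dict keyed by column name may be a better fit than a positional list; a minimal sketch reusing fields from the question:
fields = ['A', 'B', 'C']
df_dict = {f: df[[f]] for f in fields}
print(df_dict['B'])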
Or you can try this: instead of creating copies of df, use locals() to create one named variable per column. This method returns each result as its own DataFrame rather than a list; however, I think saving the DataFrames in a list is better:
dfs=['a','b','c']
fields=['A','B','C']
variables = locals()
for d, f in zip(dfs, fields):
    variables["{0}".format(d)] = df[[f]]
a
Out[743]:
A
0 1
1 2
2 3
b
Out[744]:
B
0 4
1 5
2 6
c
Out[745]:
C
0 7
1 8
2 9
You should use iloc for positional selection; the double brackets keep the result a DataFrame rather than a Series:
a = df.iloc[:, [0]]
and then loop through like:
dfs = [df.iloc[:, [i]] for i in range(df.columns.size)]
