moving last two dataframe rows - python

I'm trying to move the last two rows up:
import pandas as pd
import numpy as np  # needed for np.roll below

df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "C": [5, 6, 7, 8],
    "D": [9, 10, 11, 12],
    "E": [13, 14, 15, 16],
})
print(df)
Output:
   A  C   D   E
0  1  5   9  13
1  2  6  10  14
2  3  7  11  15
3  4  8  12  16
Desired output:
   A  C   D   E
0  3  7  11  15
1  4  8  12  16
2  1  5   9  13
3  2  6  10  14
I was able to move the last row using
df = df.reindex(np.roll(df.index, shift=1))
But I can't get the second-to-last row to move as well. Any advice on the most efficient way to do this without creating a copy of the dataframe?

Using your code, you can just change the roll's shift value.
import pandas as pd
import numpy as np
df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "C": [5, 6, 7, 8],
    "D": [9, 10, 11, 12],
    "E": [13, 14, 15, 16],
})
df = df.reindex(np.roll(df.index, shift=2), copy=False)
df.reset_index(inplace=True, drop=True)
print(df)
   A  C   D   E
0  3  7  11  15
1  4  8  12  16
2  1  5   9  13
3  2  6  10  14
The shift value controls how many positions the rows are rolled, and afterwards we just reset the index of the dataframe so that it goes back to 0, 1, 2, 3.
Based on the comment about wanting to swap indexes 0 and 1 around, we can use an answer in @CatalinaChou's link to do that. I am choosing to do it after the roll so as to only have to contend with indexes 0 and 1 once everything has been shifted.
# continuing from where the last code fence ends
swap_indexes = {1: 0, 0: 1}
df.rename(swap_indexes, inplace=True)
df.sort_index(inplace=True)
print(df)
   A  C   D   E
0  4  8  12  16
1  3  7  11  15
2  1  5   9  13
3  2  6  10  14
A notable difference is the use of inplace=True, which means the methods can't be chained, but this is to fulfil the requirement of not copying the dataframe at all (or as much as possible; I'm not sure whether df.reindex makes an internal copy even with copy=False).
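For reference, here is a minimal method-chained sketch of the same steps (my own rewrite, not part of the original answer); it trades the inplace calls for readability and assumes the intermediate objects created by reset_index and sort_index are acceptable:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "C": [5, 6, 7, 8],
    "D": [9, 10, 11, 12],
    "E": [13, 14, 15, 16],
})

# Roll the last two rows to the top, renumber, then swap rows 0 and 1.
df = (
    df.reindex(np.roll(df.index, shift=2), copy=False)
      .reset_index(drop=True)
      .rename(index={1: 0, 0: 1})
      .sort_index()
)
print(df)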

Related

Pandas dataframe with N columns

I need to use Python with Pandas to write a DataFrame with N columns. This is a simplified version of what I have:
import pandas as pd

Ind = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
DAT = pd.DataFrame([Ind[0], Ind[1], Ind[2], Ind[3]], index=None).T
DAT.head()
Out
   0  1  2   3
0  1  4  7  10
1  2  5  8  11
2  3  6  9  12
This is the result that I want, but my real Ind has 121 sets of points and I really don't want to write each one in the DataFrame's argument. Is there a way to write this easily? I tried using a for loop, but that didn't work out.
You can just pass the list directly:
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
df = pd.DataFrame(data, index=None).T
df.head()
Outputs:
   0  1  2   3
0  1  4  7  10
1  2  5  8  11
2  3  6  9  12
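As a side note, a minimal alternative sketch (not from the original answer) is to transpose with zip while constructing the frame, which avoids the .T:
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
df = pd.DataFrame(zip(*data))  # each inner list becomes a column; same result as .T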

Function in pandas to stack rows into columns by number of rows?

Suppose I have heterogeneous dataframe:
    a   b   c   d
1   1   2   3   4
2   5   6   7   8
3   9  10  11  12
4  13  14  15  16
And I want to stack the rows like so:
   a         b          c          d
1  1,5,9,13  2,6,10,14  3,7,11,15  4,8,12,16
Etc...
All the references for groupby etc. seem to require some feature to group on; I just want to put x rows into columns, regardless of their content. Each row has a timestamp, and I am looking to group values by sample count, so I want one row with all the values of x sample rows as columns.
I should end up with a dataframe that has x times the original number of columns, and the original number of rows divided by x.
I'm sure there must be some simple method I'm missing here that avoids a series of loops, etc.
If you need to join all values into strings, use:
df1 = df.astype(str).agg(','.join).to_frame().T
print (df1)
          a          b          c          d
0  1,5,9,13  2,6,10,14  3,7,11,15  4,8,12,16
Or if you need to create lists, use:
df2 = pd.DataFrame([[list(df[x]) for x in df]], columns=df.columns)
print (df2)
               a               b               c               d
0  [1, 5, 9, 13]  [2, 6, 10, 14]  [3, 7, 11, 15]  [4, 8, 12, 16]
If you need scalars with a MultiIndex (generated from the index and column labels), use:
df3 = df.unstack().to_frame().T
print (df3)
   a            b             c             d
   1  2  3   4  1  2   3   4  1  2   3   4  1  2   3   4
0  1  5  9  13  2  6  10  14  3  7  11  15  4  8  12  16
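For the more general case the question describes (grouping every x sample rows rather than collapsing the whole frame), a hedged sketch is to group on a block index; x = 2 is an assumed block size here, and each block is aggregated into lists as in df2 above:
import numpy as np

x = 2  # assumed number of sample rows per output row
blocks = np.arange(len(df)) // x
df4 = df.groupby(blocks).agg(list)
print(df4)
         a         b         c         d
0   [1, 5]    [2, 6]    [3, 7]    [4, 8]
1  [9, 13]  [10, 14]  [11, 15]  [12, 16]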

Find nearest value from multiple columns and add to a new column in Python

I have the following dataframe:
import pandas as pd
import numpy as np
data = {
    "index": [1, 2, 3, 4, 5],
    "A": [11, 17, 5, 9, 10],
    "B": [8, 6, 16, 17, 9],
    "C": [10, 17, 12, 13, 15],
    "target": [12, 13, 8, 6, 12],
}
df = pd.DataFrame.from_dict(data)
print(df)
I would like to find, for each row, the value in columns A, B and C nearest to the value in column target, and put those values into column result. As far as I know, I need to use the abs() and argmin() functions.
Here is the output I expected:
   index   A   B   C  target  result
0      1  11   8  10      12      11
1      2  17   6  17      13      17
2      3   5  16  12       8       5
3      4   9  17  13       6       9
4      5  10   9  15      12      10
Here are a possible solution and some links I have found on Stack Overflow which may help:
(df.assign(closest=df.apply(lambda x: x.abs().argmin(), axis='columns'))
.apply(lambda x: x[x['target']], axis='columns'))
Identifying closest value in a column for each filter using Pandas
https://codereview.stackexchange.com/questions/204549/lookup-closest-value-in-pandas-dataframe
Subtract "target" from the other columns, use idxmin to get the column of the minimum difference, followed by a lookup:
idx = df.drop(['index', 'target'], axis=1).sub(df.target, axis=0).abs().idxmin(1)
df['result'] = df.lookup(df.index, idx)
df
   index   A   B   C  target  result
0      1  11   8  10      12      11
1      2  17   6  17      13      17
2      3   5  16  12       8       5
3      4   9  17  13       6       9
4      5  10   9  15      12      10
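Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0; on current versions, prefer the NumPy integer-indexing approach shown below.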
General solution handling string columns and NaNs (along with your requirement of replacing NaN values in target with value in "v1"):
df2 = df.select_dtypes(include=[np.number])
idx = df2.drop(['index', 'target'], axis=1).sub(df2.target, axis=0).abs().idxmin(1)
df['result'] = df2.lookup(df2.index, idx.fillna('v1'))
You can also index into the underlying NumPy array by getting integer indices using df.columns.get_indexer.
# idx = df[['A', 'B', 'C']].sub(df.target, axis=0).abs().idxmin(1)
idx = df.drop(['index', 'target'], axis=1).sub(df.target, axis=0).abs().idxmin(1)
# df['result'] = df.values[np.arange(len(df)), df.columns.get_indexer(idx)]
df['result'] = df.values[df.index, df.columns.get_indexer(idx)]
df
   index   A   B   C  target  result
0      1  11   8  10      12      11
1      2  17   6  17      13      17
2      3   5  16  12       8       5
3      4   9  17  13       6       9
4      5  10   9  15      12      10
You can use NumPy positional integer indexing with argmin:
col_lst = list('ABC')
col_indices = df[col_lst].sub(df['target'], axis=0).abs().values.argmin(1)
df['result'] = df[col_lst].values[np.arange(len(df.index)), col_indices]
Or you can lookup column labels with idxmin:
col_labels = df[list('ABC')].sub(df['target'], axis=0).abs().idxmin(1)
df['result'] = df.lookup(df.index, col_labels)
print(df)
   index   A   B   C  target  result
0      1  11   8  10      12      11
1      2  17   6  17      13      17
2      3   5  16  12       8       5
3      4   9  17  13       6       9
4      5  10   9  15      12      10
The principle is the same, though for larger dataframes you may find NumPy more efficient:
# Python 3.7, NumPy 1.14.3, Pandas 0.23.0
def np_lookup(df):
    col_indices = df[list('ABC')].sub(df['target'], axis=0).abs().values.argmin(1)
    df['result'] = df[list('ABC')].values[np.arange(len(df.index)), col_indices]
    return df

def pd_lookup(df):
    col_labels = df[list('ABC')].sub(df['target'], axis=0).abs().idxmin(1)
    df['result'] = df.lookup(df.index, col_labels)
    return df
df = pd.concat([df]*10**4, ignore_index=True)
assert df.pipe(pd_lookup).equals(df.pipe(np_lookup))
%timeit df.pipe(np_lookup) # 7.09 ms
%timeit df.pipe(pd_lookup) # 67.8 ms

How to replace elements inside the list in series

I have a DataFrame like below,
df1
         col1
0          10
1  [5, 8, 11]
2          15
3          12
4          13
5          33
6    [12, 19]
Code to generate this df1:
df1 = pd.DataFrame({"col1":[10,[5,8,11],15,12,13,33,[12,19]]})
df2
   col1  col2
0    12     1
1    10     2
2     5     3
3    11    10
4     7     5
5    13     4
6     8     7
Code to generate this df2:
df2 = pd.DataFrame({"col1":[12,10,5,11,7,13,8],"col2":[1,2,3,10,5,4,7]})
I want to replace the elements in df1 with the corresponding values from df2.
If the series contained only scalar (non-list) elements, I could simply replace them with map:
df1['res'] = df1['col1'].map(df2.set_index('col1')["col2"].to_dict())
But this series contains a mix of lists and scalars.
How can I replace both the list elements and the scalar values in the series efficiently?
Expected Output
         col1         res
0          10           2
1  [5, 8, 11]  [3, 7, 10]
2          15          15
3          12           1
4          13           4
5          33          33
6    [12, 19]     [1, 19]
Your series is of dtype object, as it contains int and list objects. This is inefficient for Pandas and means a vectorised solution won't be possible.
You can create a mapping dictionary and use pd.Series.apply. To account for list objects, you can catch TypeError. You meet this specific error for lists since they are not hashable, and therefore cannot be used as dictionary keys.
d = df2.set_index('col1')['col2'].to_dict()

def mapvals(x):
    try:
        return d.get(x, x)
    except TypeError:
        # x is a list (unhashable), so map each element individually
        return [d.get(i, i) for i in x]
df1['res'] = df1['col1'].apply(mapvals)
print(df1)
         col1         res
0          10           2
1  [5, 8, 11]  [3, 7, 10]
2          15          15
3          12           1
4          13           4
5          33          33
6    [12, 19]     [1, 19]
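On pandas 0.25 or newer, an alternative sketch (my own, not from the original answer) is to explode the column so every element is a scalar, map, and rebuild the lists. One caveat: with this grouping rule, a one-element list would come back as a scalar.
s = df1['col1'].explode()              # lists flattened, scalars kept; index repeats per row
mapped = s.map(lambda v: d.get(v, v))  # map each scalar, keeping unmatched values as-is
df1['res'] = mapped.groupby(level=0).apply(
    lambda g: g.tolist() if len(g) > 1 else g.iloc[0]
)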

How to make separate lists out of multiple dataframe columns?

Yep, much discussed and similar questions down voted multiple times.. I still can't figure this one out..
Say I have a dataframe like this:
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
I want to end up with four separate list (a, b, c and d) with the data from each column.
Logically (to me anyway) I would do:
list_of_lst = df.values.T.astype(str).tolist()
for column in df.columns:
    i = 0
    while i < len(df.columns) - 1:
        column = list_of_lst[1]
        i = i + 1
But assigning variable names in a loop is not doable/recommended...
Any suggestions how I can get what I need?
I think the best approach is to create a dictionary of lists with DataFrame.to_dict:
np.random.seed(456)
df = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=list('ABCD'))
print (df)
   A  B  C  D
0  5  9  4  5
1  7  1  8  3
2  5  2  4  2
3  2  8  4  8
4  5  6  0  9
5  8  2  3  6
6  7  0  0  3
7  3  5  6  6
8  3  8  9  6
9  5  1  6  1
d = df.to_dict('list')
print (d['A'])
[5, 7, 5, 2, 5, 8, 7, 3, 3, 5]
If you really want separate A, B, C and D lists:
for k, v in df.to_dict('list').items():
    globals()[k] = v
print (A)
[5, 7, 5, 2, 5, 8, 7, 3, 3, 5]
retList = dict()
for i in df.columns:
    iterator = df[i].tolist()
    retList[i] = iterator
You'd get a dictionary with the keys as the column names and values as the list of values in that column.
Modify it to any data structure you want.
retList.values() will give you a collection of size 4, with each inner list holding one column's values.
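The same dictionary can also be built with a comprehension (an equivalent one-liner, shown for comparison):
retList = {col: df[col].tolist() for col in df.columns}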
You can transpose your dataframe and use df.T.values.tolist(). But, if you are manipulating numeric arrays thereafter, it's advisable you skip the tolist() part.
df = pd.DataFrame(np.random.randint(0, 100, size=(5, 4)), columns=list('ABCD'))
#     A   B   C   D
# 0  17  56  57  31
# 1   3  44  15   0
# 2  94  36  87  30
# 3  44  49  56  76
# 4  29   5  35  24
list_of_lists = df.T.values.tolist()
# [[17, 3, 94, 44, 29],
# [56, 44, 36, 49, 5],
# [57, 15, 87, 56, 35],
# [31, 0, 30, 76, 24]]
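Following the note above about skipping tolist(): if you are going to do numeric work afterwards, a minimal sketch is to keep the transposed NumPy array itself:
arr = df.T.to_numpy()  # shape (4, 5): one row per original column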
