import pandas as pd

columns = ['NAME', 'AB', 'H']
df = pd.DataFrame([['Harper', '10', '5'], ['Trout', '10', '5'], ['Ohtani', '10', '5'], ['TOTAL', '30', '15']], columns=columns)
df1 = df.sort_values(by='NAME')
print(df1)
The result is:
NAME AB H
0 Harper 10 5
2 Ohtani 10 5
3 TOTAL 30 15
1 Trout 10 5
I want to sort the dataframe, excluding the 'TOTAL' row.
Try the following code to sort the df by 'NAME' while excluding 'TOTAL':
df1 = df[df.NAME!='TOTAL'].sort_values(by='NAME')
Output:
NAME AB H
0 Harper 10 5
2 Ohtani 10 5
1 Trout 10 5
You can append the 'TOTAL' row back after sorting. DataFrame.append is deprecated (and removed in pandas 2.0), so use pd.concat:
df1 = pd.concat([df1, df[df.NAME == 'TOTAL']])
Output:
NAME AB H
0 Harper 10 5
2 Ohtani 10 5
1 Trout 10 5
3 TOTAL 30 15
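For reference, the filter, sort, and re-append steps can also be combined in a single pd.concat call. A minimal sketch, assuming the same df as above (the mask name is illustrative):
import pandas as pd

columns = ['NAME', 'AB', 'H']
df = pd.DataFrame([['Harper', '10', '5'], ['Trout', '10', '5'],
                   ['Ohtani', '10', '5'], ['TOTAL', '30', '15']],
                  columns=columns)

# Sort every row except 'TOTAL', then reattach 'TOTAL' at the bottom.
mask = df.NAME == 'TOTAL'
df1 = pd.concat([df[~mask].sort_values(by='NAME'), df[mask]])
print(df1)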
I have a pandas DataFrame.
I want to add two rows of 0s to the dataframe, one at the start and one at the end.
# create DataFrame
df_x = pd.DataFrame({'logvalue': ['20', '20.5', '18.5', '2', '10'],
                     'ID': ['1', '2', '3', '4', '5']})
Output should look like below.
  logvalue ID violatedInstances
0        0  0                 0
1       20  1                 0
2     20.5  2                 1
3     18.5  3                 0
4        2  4                 1
5       10  5                 1
6        0  0                 0
The output should rearrange the indexes of the dataframe as well.
How can I do this in pandas?
You can use concat:
First create a new dataframe (df_y) that contains the zeroed row.
Use the concat function to join this dataframe with the original.
Use reset_index(drop=True) to reset the index.
Code:
import pandas as pd

df_x = pd.DataFrame({'logvalue': [20.0, 20.5, 18.5, 2.0, 10.0],
                     'ID': [1, 2, 3, 4, 5],
                     'violatedInstances': [0, 1, 0, 1, 1]})

# Extract the column names from the original dataframe
column_names = df_x.columns
number_of_columns = len(column_names)
row_of_zeros = [0] * number_of_columns

# Create a new dataframe that holds a single row of zeros
df_y = pd.DataFrame([row_of_zeros], columns=column_names)

# Join the dataframes together: zeros, data, zeros
output = pd.concat([df_y, df_x, df_y]).reset_index(drop=True)
print(output)
Output:
   logvalue  ID  violatedInstances
0       0.0   0                  0
1      20.0   1                  0
2      20.5   2                  1
3      18.5   3                  0
4       2.0   4                  1
5      10.0   5                  1
6       0.0   0                  0
Example
df_x = pd.DataFrame({'logvalue': ['20', '20.5', '18.5', '2', '10'],
                     'ID': ['1', '2', '3', '4', '5']})
df_x
logvalue ID
0 20 1
1 20.5 2
2 18.5 3
3 2 4
4 10 5
Code
Use reindex with fill_value:
idx = ['start'] + df_x.index.tolist() + ['end']
df_x.reindex(idx, fill_value=0).reset_index(drop=True)
result:
logvalue ID
0 0 0
1 20 1
2 20.5 2
3 18.5 3
4 2 4
5 10 5
6 0 0
The 'start' and 'end' labels in idx can be any labels that are not already in the index of df_x.
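For completeness, here is the reindex approach as a self-contained sketch, assuming the string-valued df_x from the example above (padded is an illustrative name):
import pandas as pd

df_x = pd.DataFrame({'logvalue': ['20', '20.5', '18.5', '2', '10'],
                     'ID': ['1', '2', '3', '4', '5']})

# Surround the existing index with labels that are not in it;
# reindex creates those rows and fills them with 0 via fill_value.
idx = ['start'] + df_x.index.tolist() + ['end']
padded = df_x.reindex(idx, fill_value=0).reset_index(drop=True)
print(padded)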
I am trying to create a pandas df like this post.
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list('123'))
df
This displays the dataframe (and describe() its summary) with the default integer index 0, 1, 2.
Is there a way to set the name of each row (i.e. the index) in df to 'A', 'B', 'C' instead of 0, 1, 2?
Use df.index:
df.index=['A', 'B', 'C']
print(df)
1 2 3
A 0 1 2
B 3 4 5
C 6 7 8
A more scalable and general solution is a list comprehension:
df.index = [chr(ord('a') + x).upper() for x in df.index]
print(df)
1 2 3
A 0 1 2
B 3 4 5
C 6 7 8
Add the index parameter in the DataFrame constructor:
df = pd.DataFrame(np.arange(9).reshape(3, 3),
                  index=list('ABC'),
                  columns=list('123'))
print(df)
1 2 3
A 0 1 2
B 3 4 5
C 6 7 8
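Another option, if you prefer not to mutate df.index in place, is set_axis, which returns a relabeled copy. A sketch assuming a recent pandas, where returning a copy is the default behavior:
import numpy as np
import pandas as pd
import string

df = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list('123'))

# set_axis returns a copy with the new row labels applied;
# string.ascii_uppercase scales past three rows without hand-writing labels.
df = df.set_axis(list(string.ascii_uppercase[:len(df)]), axis=0)
print(df)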
I am trying to find the number of unique IDs in each 'group'. So in the code below I am trying to count the unique id_match values (101, 201, 26) for each demographic (A, B, C).
tst = pd.DataFrame({'demographic': ['A', 'B', 'B', 'A', 'C', 'C'],
                    'id_match': ['101', '101', '201', '201', '26', '26']})
tst['num_unq'] = tst.groupby('demographic')['id_match'].nunique()
Expected output
  demographic id_match  num_unq
0           A      101        2
1           B      101        2
2           B      201        2
3           A      201        2
4           C       26        1
5           C       26        1
However, instead of the expected output I simply get a column of NaNs. Does anyone know why this happens, and is there an alternative method?
Use transform. The plain nunique aggregation returns a Series indexed by the group keys (demographic), which does not align with the original row index, hence the NaNs; transform broadcasts the per-group result back to the original shape:
tst = pd.DataFrame({'demographic': ['A', 'B', 'B', 'A', 'C', 'C'],
                    'id_match': ['101', '101', '201', '201', '26', '26']})
tst['num_unq'] = tst.groupby('demographic')['id_match'].transform('nunique')
print(tst)
Output
demographic id_match num_unq
0 A 101 2
1 B 101 2
2 B 201 2
3 A 201 2
4 C 26 1
5 C 26 1
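An alternative sketch that avoids transform: compute the per-group aggregate once and map it back onto the column, using the same tst frame:
import pandas as pd

tst = pd.DataFrame({'demographic': ['A', 'B', 'B', 'A', 'C', 'C'],
                    'id_match': ['101', '101', '201', '201', '26', '26']})

# groupby(...).nunique() yields a Series indexed by demographic;
# Series.map aligns it back to each row's demographic value.
tst['num_unq'] = tst['demographic'].map(
    tst.groupby('demographic')['id_match'].nunique())
print(tst)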
I am new to Python pandas, so sorry if this question is very easy.
I have three lists:
A = ['A','B','C']
M = ['1','2','3']
F = ['plus','minus','square']
I want to make those lists combined and show it in data frame.
I have tried to use list.append:
new_list = []
for i in A:
    new_list.append(i)
    for j in M:
        new_list.append(j)
print(new_list)
['A', '1', '2', '3', 'B', '1', '2', '3', 'C', '1', '2', '3']
I am confused about how to get the output like this (in a dataframe):
It seems as if you want to create a list of all possible combinations (the Cartesian product). You can do this with itertools and pandas; itertools is part of Python's standard library:
import pandas as pd
import itertools
A = ['A','B','C']
M = ['1','2','3']
F = ['plus','minus','square']
df = pd.DataFrame(list(itertools.product(A,M,F)), columns=['A','M','F'])
print(df)
Output:
A M F
0 A 1 plus
1 A 1 minus
2 A 1 square
3 A 2 plus
4 A 2 minus
5 A 2 square
6 A 3 plus
7 A 3 minus
8 A 3 square
9 B 1 plus
10 B 1 minus
11 B 1 square
12 B 2 plus
13 B 2 minus
14 B 2 square
15 B 3 plus
16 B 3 minus
17 B 3 square
18 C 1 plus
19 C 1 minus
20 C 1 square
21 C 2 plus
22 C 2 minus
23 C 2 square
24 C 3 plus
25 C 3 minus
26 C 3 square
What you need is a Cartesian product of the three sets:
import pandas as pd
from itertools import product
pd.DataFrame(list(product(A,M,F)), columns=['A', 'M', 'F'])
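If you would rather stay inside pandas, MultiIndex.from_product builds the same Cartesian product. A sketch assuming the three lists above:
import pandas as pd

A = ['A', 'B', 'C']
M = ['1', '2', '3']
F = ['plus', 'minus', 'square']

# from_product enumerates the Cartesian product directly, and
# to_frame(index=False) turns it into an ordinary dataframe.
df = pd.MultiIndex.from_product([A, M, F], names=['A', 'M', 'F']).to_frame(index=False)
print(df)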
I have a text file like below.
A1 1234 56
B2 1234 56
C3 2345167
I also have a table of start positions and lengths, which gives where each field starts in the file above and how long each field is.
start length
1 1
2 1
3 1
4 2
6 2
8 2
10 1
I would like to read it like below, according to the start positions and lengths.
A 1 nan 12 34 5 6
B 2 nan 12 34 5 6
C 3 nan 23 45 16 7
First, I tried
pd.read_csv('file.txt', sep=' ')
but I couldn't figure out how to split the fields.
How can I read and split the dataframe?
As mentioned in the comments, this isn't a CSV format, so I had to produce a workaround.
def get_row_format(length_file):
    with open(length_file, 'r') as fd_len:
        # Read in the file; it is not a CSV!
        # This double list comprehension produces a list of lists.
        rows = [[x.strip() for x in y.split()] for y in fd_len.readlines()]
    # Determine the row format from the rows; [1:] skips the header.
    row_form = {int(x[0]): int(x[1]) for x in rows[1:]}
    return row_form

def read_with_row_format(data_file, rform):
    with open(data_file, 'r') as fd_data:
        for row in fd_data.readlines():
            # Slice each field out of the line (1-indexed start, given length)
            formatted_output = [row[k-1:k+v-1].strip() for k, v in rform.items()]
            print(formatted_output)
The first function gets the row format, and the second function applies that row format to each line in the file.
Usage:
rform = get_row_format('lengths.csv')
read_with_row_format('data.csv', rform)
Output:
['A', '1', '', '12', '34', '5', '6']
['B', '2', '', '12', '34', '5', '6']
['C', '3', '', '23', '45', '16', '7']
This is a fixed width file, you can use pandas.read_fwf:
import pandas as pd
from io import StringIO
s = StringIO("""A1 1234 56
B2 1234 56
C3 2345167""")
pd.read_fwf(s, widths=widths.length, header=None)
# 0 1 2 3 4 5 6
#0 A 1 NaN 12 34 5 6
#1 B 2 NaN 12 34 5 6
#2 C 3 NaN 23 45 16 7
The widths data frame:
widths = pd.read_csv(StringIO("""start length
1 1
2 1
3 1
4 2
6 2
8 2
10 1"""), sep=r"\s+")
Since you have the starting position and length of each field, use them.
Here is code to carry that out. Each line is taken in turn; each field is the slice from its start position to that position plus the field's length.
I leave the type conversions to you.
data = [
"A1 1234 56",
"B2 1234 56",
"C3 2345167"
]
table = [
[1, 1],
[2, 1],
[3, 1],
[4, 2],
[6, 2],
[8, 2],
[10, 1]
]
for line in data:
    fields = [line[(start - 1):(start + length - 1)] for start, length in table]
    print(fields)