How to extract a value from a list in Pandas - python

Hi I have a dataframe and looks like this:
0 1
0 0 [03/25/93]
1 1 [6/18/85]
2 2 [7/8/71]
3 3 [9/27/75]
4 4 []
5 5 []
How can I extract the value inside the list in another column of the DataFrame???
0 1
0 0 03/25/93
1 1 6/18/85
2 2 7/8/71
3 3 9/27/75
4 4 NaN
5 5 Nan
Thank you very much.

Use str[0]:
df[1] = df[1].str[0]
print (df)
0 1
0 0 03/25/93
1 1 6/18/85
2 2 7/8/71
3 3 9/27/75
4 4 NaN
5 5 NaN

Related

Replace all column values with value from single column - Pandas

I'm hoping to replace values in all columns within a df using integers from a specified column. Using the df below I want to use the values in Code and replace them in all other columns.
df = pd.DataFrame({
'Place' : ['X','Y','X','Y','X','Y','X','Y'],
'Number' : ['A','B','C','D','F','G','H','I'],
'Code' : [1,2,3,0,1,2,5,4],
'Value' : ['','','','','','','','']
})
df[:] = df['Code'].apply(lambda x: x if np.isreal(x) else 0).astype(int)
print(df)
Intended Output:
Place Number Code Value
0 1 1 1 1
1 2 2 2 2
2 3 3 3 3
3 0 0 0 0
4 1 1 1 1
5 2 2 2 2
6 5 5 5 5
7 4 4 4 4
Use reindex, ffill, bfill
df[['Code']].reindex(columns=df.columns).ffill(1).bfill(1).astype(int)
Out[256]:
Place Number Code Value
0 1 1 1 1
1 2 2 2 2
2 3 3 3 3
3 0 0 0 0
4 1 1 1 1
5 2 2 2 2
6 5 5 5 5
7 4 4 4 4
Numpy solution
df[:] = np.transpose([df.Code] * df.shape[1])
Out[314]:
Place Number Code Value
0 1 1 1 1
1 2 2 2 2
2 3 3 3 3
3 0 0 0 0
4 1 1 1 1
5 2 2 2 2
6 5 5 5 5
7 4 4 4 4
Try this:
df[df.columns] = df[['Code', 'Code', 'Code', 'Code']]
or:
df[df.columns] = df[['Code']*len(df.columns)]
Hope it helps you.

Pandas Insert a new row after every nth row

I have a dataframe that looks like below:
**L_Type L_ID C_Type E_Code**
0 1 1 9
0 1 2 9
0 1 3 9
0 1 4 9
0 2 1 2
0 2 2 2
0 2 3 2
0 2 4 2
0 3 1 3
0 3 2 3
0 3 3 3
0 3 4 3
I need to insert a new row after every 4 row and increment the value in third column (C_Type) by 01 like below table while keeping the values same as first two columns and does not want any value in last column:
L_Type L_ID C_Type E_Code
0 1 1 9
0 1 2 9
0 1 3 9
0 1 4 9
0 1 5
0 2 1 2
0 2 2 2
0 2 3 2
0 2 4 2
0 2 5
0 3 1 3
0 3 2 3
0 3 3 3
0 3 4 3
0 3 5
I have searched other threads but could not figure out the exact solution:
How to insert n DataFrame to another every nth row in Pandas?
Insert new rows in pandas dataframe
You can seelct rows by slicing, add 1 to column C_Type and 0.5 to index, for 100% sorrect slicing, because default method of sorting in DataFrame.sort_index is quicksort. Last join together, sort index and create default by concat with DataFrame.reset_index and drop=True:
df['C_Type'] = df['C_Type'].astype(int)
df2 = (df.iloc[3::4]
.assign(C_Type = lambda x: x['C_Type'] + 1, E_Code = np.nan)
.rename(lambda x: x + .5))
df1 = pd.concat([df, df2], sort=False).sort_index().reset_index(drop=True)
print (df1)
L_Type L_ID C_Type E_Code
0 0 1 1 9.0
1 0 1 2 9.0
2 0 1 3 9.0
3 0 1 4 9.0
4 0 1 5 NaN
5 0 2 1 2.0
6 0 2 2 2.0
7 0 2 3 2.0
8 0 2 4 2.0
9 0 2 5 NaN
10 0 3 1 3.0
11 0 3 2 3.0
12 0 3 3 3.0
13 0 3 4 3.0
14 0 3 5 NaN

Creating a new column in panda dataframe using logical indexing and group by

I have a data frame like below
df=pd.DataFrame({'a':['a','a','b','a','b','a','a','a'], 'b' : [1,0,0,1,0,1,1,1], 'c' : [1,2,3,4,5,6,7,8],'d':['1','2','1','2','1','2','1','2']})
df
Out[94]:
a b c d
0 a 1 1 1
1 a 0 2 2
2 b 0 3 1
3 a 1 4 2
4 b 0 5 1
5 a 1 6 2
6 a 1 7 1
7 a 1 8 2
I want something like below
df[(df['a']=='a') & (df['b']==1)]
In [97]:
df[(df['a']=='a') & (df['b']==1)].groupby('d')['c'].rank()
df[(df['a']=='a') & (df['b']==1)].groupby('d')['c'].rank()
Out[97]:
0 1
3 1
5 2
6 2
7 3
dtype: float64
I want this rank as a new column in dataframe df and wherever there is no rank I want NaN. SO final output will be something like below
a b c d rank
0 a 1 1 1 1
1 a 0 2 2 NaN
2 b 0 3 1 NaN
3 a 1 4 2 1
4 b 0 5 1 NaN
5 a 1 6 2 2
6 a 1 7 1 2
7 a 1 8 2 3
I will appreciate all the help and guidance. Thanks a lot.
Almost there, you just need to call transform to return a series with an index aligned to your orig df:
In [459]:
df['rank'] = df[(df['a']=='a') & (df['b']==1)].groupby('d')['c'].transform(pd.Series.rank)
df
Out[459]:
a b c d rank
0 a 1 1 1 1
1 a 0 2 2 NaN
2 b 0 3 1 NaN
3 a 1 4 2 1
4 b 0 5 1 NaN
5 a 1 6 2 2
6 a 1 7 1 2
7 a 1 8 2 3

Pandas: expand index of a series so it contains all values in a range

I have a pandas series that looks like this:
>>> x.sort_index()
2 1
5 2
6 3
8 4
I want to fill out this series so that the "missing" index rows are represented, filling in the data values with a 0.
So that when I list the new series, it looks like this:
>>> z.sort_index()
1 0
2 1
3 0
4 0
5 2
6 3
7 0
8 4
I have tried creating a "dummy" Series
>>> y = pd.Series([0 for i in range(0,8)])
0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
And then concat'ing them together - but the results are either:
>>> pd.concat([x,z],axis=0)
2 1
5 2
6 3
8 4
0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
Or
>>> pd.concat([x,z],axis=1)
0 1
0 NaN 0
1 NaN 0
2 1 0
3 NaN 0
4 NaN 0
5 2 0
6 3 0
7 NaN 0
8 4 NaN
Neither of which is my target structure listed above.
I could try performing some arithmetic on the axis=1 version, and taking a sum of columns 1 and 2, but am looking for a neater, one-line version of this - does such an index filling/cleansing operation exist, and if so, what is it?
What you want is a reindex. First create the index as you want (in this case just a range), and then reindex with it:
In [64]: x = pd.Series([1,2,3,4], index=[2,5,6,8])
In [65]: x
Out[65]:
2 1
5 2
6 3
8 4
dtype: int64
In [66]: x.reindex(range(9), fill_value=0)
Out[66]:
0 0
1 0
2 1
3 0
4 0
5 2
6 3
7 0
8 4
dtype: int64
Apologies - slightly embarrassing situation, but having read here on what to do in this situation, am offering an answer to my own question.
I read the documentation here - one way of doing what I'm looking for is this:
>>> x.combine_first(y)
0 0
1 0
2 1
3 0
4 0
5 2
6 3
7 0
8 4
dtype: float64
N.B. in the above,
>>> y = pd.Series([0 for i in range(0,8)])

Python: Replace a cell value in Dataframe with if statement

I have a matrix with that looks like this:
com 0 1 2 3 4 5
AAA 0 5 0 4 2 1 4
ABC 0 9 8 9 1 0 3
ADE 1 4 3 5 1 0 1
BCD 1 6 7 8 3 4 1
BCF 2 3 4 2 1 3 0 ...
Where AAA, ABC ... is the dataframe index. The dataframe columns are com 0 1 3 4 5 6
I want to set the cell values in my dataframe equal to 0 when the row values of com is equal the column "number". So for instance, the above matrix will look like:
com 0 1 2 3 4 5
AAA 0 0 0 4 2 1 4
ABC 0 0 8 9 1 0 3
ADE 1 4 0 5 1 0 1
BCD 1 6 0 8 3 4 1
BCF 2 3 4 0 1 3 0 ...
I tried to iterate over rows and use both .loc and .ix but no success.
Just require some numpy trick
In [22]:
print df
0 1 2 3 4 5
0 5 0 4 2 1 4
0 9 8 9 1 0 3
1 4 3 5 1 0 1
1 6 7 8 3 4 1
2 3 4 2 1 3 0
[5 rows x 6 columns]
In [23]:
#making a masking matrix, 0 where column and index values equal, 1 elsewhere, kind of the vectorized way of doing if TURE 0, else 1
print df*np.where(df.columns.values==df.index.values[..., np.newaxis], 0,1)
0 1 2 3 4 5
0 0 0 4 2 1 4
0 0 8 9 1 0 3
1 4 0 5 1 0 1
1 6 0 8 3 4 1
2 3 4 0 1 3 0
[5 rows x 6 columns]
I think this should work.
for line in range(len(matrix)):
matrix[matrix[line][0]+1]=0
NOTE
Depending on your matrix setup you may not need the +1
Basically it takes the first digit of each line in the matrix and uses that as the index of the value to change to 0
i.e. if the row was
c 0 1 2 3 4 5
AAA 4 3 2 3 9 5 9,
it would change the 5 below the number 4 to 0
c 0 1 2 3 4 5
AAA 4 3 2 3 9 0 9

Categories