How to align text inside a cell in pandas - python

I have a column whose cells contain either 2 or 3 characters.
I need to format each cell like this:
<2spaces>XX<2spaces>
and, if it contains 3 characters:
<2spaces>XXX<1space>
I use new-style formatting:
dx['C'] = dx['C'].map('{:^4s}'.format)
Note: dx['C'] is a column in a pandas DataFrame.

Given:
C
0 aaa
1 aa
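Doing:
df.C = (' ' + df.C).str.center(6)
print(df.C.tolist())
Output:
['  aaa ', '  aa  ']
Prefixing one space makes 'aaa' 4 characters long and 'aa' 3; centering those in a width of 6 with Python's str.center gives '  aaa ' and '  aa  ', which are exactly the two requested layouts. No conditional is needed, and a test like len(df.C) % 2 would be wrong anyway: len(df.C) is the number of rows, not the length of each cell.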
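The same layout can also be produced with the new-style format the question already uses. This is a sketch that assumes every cell holds 2 or 3 characters: put two literal spaces in front, then left-justify the value in a field of width 4.

```python
import pandas as pd

# Sample data from the question; assumes every value is 2 or 3 characters long
df = pd.DataFrame({'C': ['aaa', 'aa']})

# Two fixed leading spaces, then the value left-justified in a 4-wide field:
# 'aa'  -> '  aa  '  (2 spaces + XX + 2 spaces)
# 'aaa' -> '  aaa '  (2 spaces + XXX + 1 space)
df['C'] = df['C'].map('  {:<4s}'.format)

print(df['C'].tolist())
```

Compared with centering, the fixed left margin makes the rule explicit: the left padding is always two spaces and only the right padding varies.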

Related

Can we pass a column instead of a variable to access nth item of a list?

My data contains multiple columns; I have done a group by on them and assigned row numbers within each group. I'm using Python.
The column 'Text' is a list of strings. Each 'Text' value was originally a single string, which I split into a list using ; as the delimiter. Row_num values are integers.
What I want is to treat Row_num as a pointer into Text for the output column:
If Row_num = 0, the output should be Text[0], i.e. a.
If Row_num = 1, the output should be the 2nd item of the list, Text[1], i.e. b.
To achieve this, I tried:
df['Output'] = df.apply(lambda x: x['Text'].split(';'), axis=1)[df['Row_num']]
But I got the error "cannot reindex from a duplicate axis".
I'm not entirely sure what it means.
I have attached an image of my data and output, but I have also written down the format in case the image isn't available. I hope I explained the situation clearly enough.
The original Text is not a,b,c; I just used that for easier understanding. This is the actual text:
Text: [TAF KPHX 010246Z 0103/0206 VRB04KT P6SM FEW070 BKN160 ;FM010700 10005KT P6SM SCT060 BKN150 ;FM012100 21006KT P6SM SCT070 SCT140 ;FM020000 26005KT P6SM FEW070 SCT140]
Row num Text Output
0 [a,b,c,d] a
1 [a,b,c,d] b
2 [a,b,c,d] c
3 [a,b,c,d] d
0 [d,e,f] d
1 [d,e,f] e
2 [d,e,f] f
As I see it, the delimiter should be "," not ";". Since "Text" is a single string, first remove the square brackets using replace(), then split it on ',' and extract the element at the index given by "Row_num":
import pandas as pd
df = pd.DataFrame({"Row_num": [0,1,2,3,0,1,2],"Text":['[a,b,c,d]', '[a,b,c,d]', '[a,b,c,d]', '[a,b,c,d]', '[d,e,f]', '[d,e,f]', '[d,e,f]']})
df["Output"] = df.apply(lambda x: x.Text.replace("[","").replace("]","").split(",")[x.Row_num], axis=1)
print(df)
Row_num Text Output
0 0 [a,b,c,d] a
1 1 [a,b,c,d] b
2 2 [a,b,c,d] c
3 3 [a,b,c,d] d
4 0 [d,e,f] d
5 1 [d,e,f] e
6 2 [d,e,f] f
If the Text column uses ";" as the delimiter instead, try:
df =pd.DataFrame({"Row_num": [0,1,2,3,4],"Text":["a;b;c;d;e"]*5})
df["Output"] = df.apply(lambda x: x.Text.split(";")[x.Row_num], axis=1)
print(df)
Row_num Text Output
0 0 a;b;c;d;e a
1 1 a;b;c;d;e b
2 2 a;b;c;d;e c
3 3 a;b;c;d;e d
4 4 a;b;c;d;e e
Make sure every Row_num is a valid position in its Text list; otherwise split(...)[x.Row_num] raises an IndexError.
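About the error in the question: indexing the result of apply with df['Row_num'] selects whole rows by their index labels (0, 1, 2, 3, 0, 1, 2), not list elements, so the selection has duplicate labels and pandas cannot align it back onto df when assigning the column. A sketch that avoids both apply and label alignment, assuming ';'-delimited strings as in the question: split once, then pair each list with its Row_num positionally using zip.

```python
import pandas as pd

# Sample data in the shape described in the question (';'-delimited Text)
df = pd.DataFrame({
    "Row_num": [0, 1, 2, 3, 0, 1, 2],
    "Text": ["a;b;c;d"] * 4 + ["d;e;f"] * 3,
})

# Split every Text value once, then pick the element at each row's Row_num.
# zip pairs the values by position, so index labels play no part.
parts = df["Text"].str.split(";")
df["Output"] = [items[i] for items, i in zip(parts, df["Row_num"])]

print(df)
```

With the real TAF string, each piece keeps the space that precedes ';', so items[i].strip() may be wanted.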

Formatting specific rows in Dash Datatable with %, $, etc

I am using Dash DataTable to create a table in Plotly/Python. I would like to format individual rows of the Value column differently: for example, Row[1] with a $ sign and Row[2] with %. TIA
#Row  KPI  Value
0     AA   1
1     BB   $230
2     CC   54%
3     DD   5.6
4     EE   $54000
I have been looking into this issue as well. Unfortunately, I didn't find anything built in either; if you do, please let me know.
However, the solution I implemented was the following function, which converts DataFrame elements to strings with the formatting I want:
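import pandas as pd

def dt_formatter(df: pd.DataFrame,
                 formatter: str,
                 slicer=None) -> pd.DataFrame:
    # Format cells as strings using formatter (a str.format pattern):
    # every column if slicer is None, otherwise only df.loc[slicer],
    # which must select a DataFrame (use a list of column names).
    if slicer is None:
        for col in df.columns:
            df[col] = df[col].apply(formatter.format)
        return df
    dfs = df.loc[slicer].copy()
    for col in dfs.columns:
        dfs[col] = dfs[col].apply(formatter.format)
    # Cast the target columns to object so they can hold strings
    df[dfs.columns] = df[dfs.columns].astype(object)
    df.loc[slicer] = dfs
    return df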
and then use it with your regular slicing/filtering on your base DataFrame df. Assuming your base df looks like this:
>>> df
#Row KPI Value
0 AA 1
1 BB 230
2 CC 54
3 DD 5.6
4 EE 54000
>>> # '{:g}%' appends a percent sign; '{:.0%}' would multiply by 100 first (230 -> '23000%')
>>> df = dt_formatter(df, '{:g}%', pd.IndexSlice[df['#Row'] == 1, ['Value']])
>>> df
#Row KPI Value
0 AA 1
1 BB 230%
2 CC 54
3 DD 5.6
4 EE 54000
Using a different slicer and a different format string each time, you can "build" your DataFrame with such a helper function.
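An alternative sketch of the same "format to strings" idea for the question's table, choosing the pattern per row in one pass. The row_formats mapping (row number to format string) is hypothetical and assumes the Value column is numeric; rows not in the mapping fall back to '{:g}'.

```python
import pandas as pd

df = pd.DataFrame({
    "#Row": [0, 1, 2, 3, 4],
    "KPI": ["AA", "BB", "CC", "DD", "EE"],
    "Value": [1, 230, 54, 5.6, 54000],
})

# Hypothetical mapping: row number -> str.format pattern
row_formats = {1: "${:g}", 2: "{:g}%", 4: "${:g}"}

# Format each value with its row's pattern ('{:g}' drops the trailing .0)
df["Value"] = [
    row_formats.get(row, "{:g}").format(value)
    for row, value in zip(df["#Row"], df["Value"])
]
print(df)
```

The formatted frame could then be handed to the table, e.g. as data=df.to_dict('records'). Because the values are now strings, any sorting done in the table would be text-based rather than numeric.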

Add a character in a string inside a column dataframe

I have a dataframe with some numbers (or strings, it doesn't actually matter). The thing is that I need to add a character in the middle of them. The dataframe looks like this (I got it from Google Takeout)
id A B
1 512343 -1234
1 213 1231345
1 18379 187623
And I want to add a comma in the second position
id A B
1 51,2343 -12,34
1 21,3 12,31345
1 18,379 18,7623
A and B are actually longitude and latitude, so I don't think it is possible to always put the comma in the right place: there is no way to know whether a coordinate should have one or two digits before the separator. Putting the comma after the second digit would do the trick, though.
This should do the trick:
df[["A", "B"]]=df[["A", "B"]].astype(str).replace(r"(\d{2})(\d+)", r"\1,\2", regex=True)
Outputs:
id A B
0 1 51,2343 -12,34
1 1 21,3 12,31345
2 1 18,379 18,7623
Here's another approach with str.extract:
for c in ['A', 'B']:
    df[c] = df[c].astype(str).str.extract(r'(-?\d{2})(\d*)').agg(','.join, axis=1)
Output:
id A B
0 1 51,2343 -12,34
1 1 21,3 12,31345
2 1 18,379 18,7623
You could do something like this (A and B must be numeric for the >= 0 test):
import numpy as np
df['A'] = np.where(df['A']>=0,'', '-') + ( df['A'].abs().astype(str).str[:2] + ',' + df['A'].abs().astype(str).str[2:] )
df['B'] = np.where(df['B']>=0,'', '-') + ( df['B'].abs().astype(str).str[:2] + ',' + df['B'].abs().astype(str).str[2:] )
df
id A B
0 1 51,2343 -12,34
1 1 21,3 12,31345
2 1 18,379 18,7623
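A single regex can also handle the sign. In this sketch, the pattern inserts a comma after an optional minus sign and the first two digits, but only when at least one more digit follows (the lookahead (?=\d)), so a two-digit value does not get a trailing comma.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 1, 1],
                   "A": [512343, 213, 18379],
                   "B": [-1234, 1231345, 187623]})

# ^(-?\d{2})  : optional minus sign plus the first two digits (captured)
# (?=\d)      : only if another digit follows, so '12' does not become '12,'
for col in ["A", "B"]:
    df[col] = df[col].astype(str).str.replace(r"^(-?\d{2})(?=\d)", r"\1,", regex=True)

print(df)
```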

How to insert string value into specific column value on python pandas?

I have the following dataframe.
import pandas as pd
data=['ABC1','ABC2','ABC3','ABC4']
data = pd.DataFrame(data,columns=["Column A"])
Column A
0 ABC1
1 ABC2
2 ABC3
3 ABC4
How can I insert "-" after ABC in Column A of data?
Output:
Column A
0 ABC-1
1 ABC-2
2 ABC-3
3 ABC-4
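The simplest solution is to use the replace method with regex=True and assign the result back to the column. (Calling replace(..., inplace=True) on data['Column A'] is chained assignment, which may not update data in recent pandas versions with copy-on-write.)
data['Column A'] = data['Column A'].replace('ABC', 'ABC-', regex=True)
print(data)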
Column A
0 ABC-1
1 ABC-2
2 ABC-3
3 ABC-4
A possible solution, if the number is always a single trailing digit, is:
data['Column A'] = data['Column A'].str[:-1] + '-' + data['Column A'].str[-1]
print (data)
# Column A
#0 ABC-1
#1 ABC-2
#2 ABC-3
#3 ABC-4
Here's a way which only assumes that the numbers to be preceded by a dash are at the end (note [A-Za-z] rather than [A-z], which would also match [, \, ], ^, _ and `):
data['Column A'].str.split(r'([A-Za-z]+)(\d+)').str.join('-').str.strip('-')
0 ABC-1
1 ABC-2
2 ABC-3
3 ABC-4
Another example:
df = pd.DataFrame({'ColumnA':['asf1','Ads2','A34']})
Will give:
df['ColumnA'].str.split(r'([A-Za-z]+)(\d+)').str.join('-').str.strip('-')
0 asf-1
1 Ads-2
2 A-34
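A regex replacement does the same without splitting and joining. This sketch assumes, as above, that the digits to be preceded by a dash are always at the end of the string: it captures that trailing run of digits and puts a '-' in front of it. A value made only of digits would get a leading dash, and a value with no trailing digits is left unchanged.

```python
import pandas as pd

data = pd.DataFrame({"Column A": ["ABC1", "ABC2", "ABC3", "ABC4"]})
other = pd.DataFrame({"ColumnA": ["asf1", "Ads2", "A34"]})

# (\d+)$ matches the run of digits at the end of each string;
# the replacement puts '-' in front of that captured group.
data["Column A"] = data["Column A"].str.replace(r"(\d+)$", r"-\1", regex=True)
other["ColumnA"] = other["ColumnA"].str.replace(r"(\d+)$", r"-\1", regex=True)

print(data)
print(other)
```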

how to replace non-numeric or decimal in string in pandas

I have a column with values in degrees with the degree sign.
42.9377º
42.9368º
42.9359º
42.9259º
42.9341º
The digit 0 should replace the degree symbol.
I tried using regex or str.replace, but I can't figure out the exact Unicode character.
The source xls shows it as º,
the error shows it as an obelus ÷,
and printing the DataFrame shows it as ?.
The exact position of the degree sign may vary depending on how the decimals are rounded, so I can't replace it by string position.
Use str.replace:
df['a'] = df['a'].str.replace('º', '0')
print (df)
a
0 42.93770
1 42.93680
2 42.93590
3 42.92590
4 42.93410
# check the hex code of the character: 'ba' means U+00BA (º, MASCULINE ORDINAL INDICATOR), not the degree sign U+00B0 (°)
print ("{:02x}".format(ord('º')))
ba
df['a'] = df['a'].str.replace(u'\xba', '0')
print (df)
a
0 42.93770
1 42.93680
2 42.93590
3 42.92590
4 42.93410
A solution that extracts the float:
df['a'] = df['a'].str.extract(r'(\d+\.\d+)', expand=False) + '0'
print (df)
a
0 42.93770
1 42.93680
2 42.93590
3 42.92590
4 42.93410
Or, if º is always the last character, use indexing with str:
df['a'] = df['a'].str[:-1] + '0'
print (df)
a
0 42.93770
1 42.93680
2 42.93590
3 42.92590
4 42.93410
If you know that it's always the last character, you could remove that character and append a "0".
s = "42.9259º"
s = s[:-1]+"0"
print(s) # 42.92590
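To follow the title more literally (replace whatever is not a digit or a decimal point, whichever code point it really is), a sketch that replaces every such character with '0' and then converts to float. It assumes each value contains only one stray character such as º; negative values would need '-' added to the character class.

```python
import pandas as pd

df = pd.DataFrame({"a": ["42.9377º", "42.9368º", "42.9359º", "42.9259º", "42.9341º"]})

# [^\d.] matches any character that is neither a digit nor '.',
# so the symbol is replaced regardless of its exact code point.
df["a"] = df["a"].str.replace(r"[^\d.]", "0", regex=True)

# Optional: convert the cleaned strings to numbers
df["a_float"] = df["a"].astype(float)
print(df)
```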
