Convert float64 to string with 2 decimal - python

I have a column of data that I am trying to format to 2 decimal places, to get
73.35
35.72
35.51 etc.
It currently looks like this when I read the excel file into python.
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
...
49056 73.345000
49057 35.720833
49058 35.505000
49059 17.075000
49060 27.710000
Name: AMOUNT, Length: 49061, dtype: object
I am using
pd.to_numeric(df['PAYMENT']).fillna(0).astype(str).mask(df['PAYMENT'].isnull())
But only get this
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
...
49056 73.345
49057 35.720833
49058 35.505
49059 17.075
49060 27.71
Name: AMOUNT, Length: 49061, dtype: object
Any help is appreciated!

df['PAYMENT'].round(2).astype(str)
round(2) rounds the number to two decimal places, and astype(str) then converts it to a string. I am not sure why you need it to be a string, though.

Two ideas:
Do you have values >= 1000? Excel might write them as 1,234.567
To round your values to 2 decimals use .round(2), e.g.
df['PAYMENT'].astype(float).fillna(0).round(2)
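The answers above round the values, but round(2) followed by astype(str) does not pad with zeros (35.5 would become '35.5'). If the goal is a string that always shows two decimals while the NaN rows stay NaN, one possible sketch is to format each value instead. The sample Series below is a made-up stand-in for the PAYMENT column; values that sit exactly halfway in decimal (such as 73.345) may round up or down because of their binary float representation.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mimicking the object-dtype column read from Excel.
s = pd.Series([np.nan, "73.345000", "35.720833", "35.505000", "27.710000"],
              name="PAYMENT", dtype=object)

# Convert to float first; anything unparseable becomes NaN.
num = pd.to_numeric(s, errors="coerce")

# Format every non-missing value with exactly two decimals; NaN is left untouched.
formatted = num.map(lambda v: f"{v:.2f}", na_action="ignore")
print(formatted)
```

The result has dtype object: the formatted entries are Python strings and the missing entries remain NaN.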

Related

Add commas to decimal column without rounding off

I have pandas column named Price_col. which look like this.
Price_col
1. 1000000.000
2. 234556.678900
3. 2345.00
4.
5. 23.56
I am trying to add commas to my Price_col to look like this.
Price_col
1. 1,000,000.000
2. 234,556.678900
3. 2,345.00
4.
5. 23.56
When I try to convert the values, they always get rounded off. Is there a way to keep the original value without rounding?
I tried the code below. This is what I got for the value 234556.678900:
n = "{:,}".format(234556.678900)
print(n)
>>> 234,556.6789
Add f for fixed-point
>>> "{:,}".format(234556.678900)
'234,556.6789'
>>> "{:,f}".format(234556.678900)
'234,556.678900'
You can also control the precision with .p, where p is the number of digits after the decimal point (and you probably should). Beware: because you are dealing with floats, the stored value is only the nearest IEEE 754 binary approximation, so asking for many digits exposes that approximation, although formatting at a sensible precision looks fine regardless of the backing data.
>>> "{:,.5f}".format(234556.678900)
'234,556.67890'
>>> "{:,.20f}".format(234556.678900)
'234,556.67889999999897554517'
The full Format Specification Mini-Language can be found here:
https://docs.python.org/3/library/string.html#format-specification-mini-language
From your comment, I realized you may really want something else as described in How to display pandas DataFrame of floats using a format string for columns? and only change the view of the data
Creating a new string column formatted as a string
>>> df = pd.DataFrame({"Price_col": [1000000.000, 234556.678900, 2345.00, None, 23.56]})
>>> df["price2"] = df["Price_col"].apply(lambda x: f"{x:,f}")
>>> df
Price_col price2
0 1000000.0000 1,000,000.000000
1 234556.6789 234,556.678900
2 2345.0000 2,345.000000
3 NaN nan
4 23.5600 23.560000
>>> df.dtypes
Price_col float64
price2 object
dtype: object
Temporarily changing how data is displayed
>>> df = pd.DataFrame({"Price_col": [1000000.000, 234556.678900, 2345.00, None, 23.56]})
>>> print(df)
Price_col
0 1000000.0000
1 234556.6789
2 2345.0000
3 NaN
4 23.5600
>>> with pd.option_context('display.float_format', '€{:>18,.6f}'.format):
... print(df)
...
Price_col
0 € 1,000,000.000000
1 € 234,556.678900
2 € 2,345.000000
3 NaN
4 € 23.560000
>>> print(df)
Price_col
0 1000000.0000
1 234556.6789
2 2345.0000
3 NaN
4 23.5600
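Once a value is stored as a float, its original trailing zeros are gone, so "{:,f}" can only pad to a fixed precision. If the prices are still available as text (for example, read with dtype=str, which is an assumption about the data), a possible sketch is to go through decimal.Decimal, which remembers how many digits were written. The helper name add_commas is hypothetical.

```python
from decimal import Decimal


def add_commas(text):
    """Insert thousands separators into a numeric string without changing its digits."""
    text = text.strip()
    if not text:
        return ""  # keep empty cells empty
    # Decimal preserves trailing zeros, and its format() supports the ',' option.
    return format(Decimal(text), ",")


# Hypothetical sample matching the values in the question (as strings).
prices = ["1000000.000", "234556.678900", "2345.00", "", "23.56"]
formatted = [add_commas(p) for p in prices]
print(formatted)
```

With a pandas column of strings, the same helper could be applied with df["Price_col"].map(add_commas).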

How to create a dataframe from series object when iterating

I am iterating and as a result of a single iteration I acquire a pandas series object which looks like this:
DE_AT 118.55
DE_CZ 62.73
PL_DE 263.36
PL_SK 315.07
dtype: float64
Sometimes I might get different names and lengths of this series for example I might get:
DE_AT 118.55
DE_CZ 62.73
PL_DE 263.36
PL_NL 315.07
PL_UK 420
dtype: float64
Now I want to create a dataframe from these series objects when iterating such that I will have all names as the index, from these two series objects I would like to get:
index 1 2
DE_AT 118.55 118.55
DE_CZ 62.73 62.73
PL_DE 263.36 263.36
PL_SK 315.07 NaN
PL_NL NaN 315.07
PL_UK NaN 420
Or maybe I can store them in a list and later create a dataframe?
Basic outer join of two series:
s1=pd.Series(index=["DE_AT","DE_CZ","PL_DE", "PL_SK"], data=[1,2,3,4]).to_frame()
s2=pd.Series(index=["DE_AT","DE_CZ","PL_DE", "PL_NL", "PL_UK"], data=[1,2,3,4,5]).to_frame()
s1.join(s2, how="outer",lsuffix="1",rsuffix="2")
Output:
index   00   01
DE_AT  1.0  1.0
DE_CZ  2.0  2.0
PL_DE  3.0  3.0
PL_NL  NaN  4.0
PL_SK  4.0  NaN
PL_UK  NaN  5.0
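The question also asks whether the Series can be stored in a list and turned into a DataFrame afterwards. A minimal sketch of that approach uses pd.concat with axis=1, which aligns the Series on the union of their index labels. The results list below is a hypothetical stand-in for whatever each iteration produces.

```python
import pandas as pd

# Hypothetical stand-ins for the Series obtained in each iteration.
results = [
    pd.Series({"DE_AT": 118.55, "DE_CZ": 62.73, "PL_DE": 263.36, "PL_SK": 315.07}),
    pd.Series({"DE_AT": 118.55, "DE_CZ": 62.73, "PL_DE": 263.36,
               "PL_NL": 315.07, "PL_UK": 420.0}),
]

collected = []
for s in results:
    collected.append(s)  # in real code this would be the per-iteration result

# Outer-align all Series on their index; the columns are numbered 1, 2, ...
df = pd.concat(collected, axis=1, keys=list(range(1, len(collected) + 1)))
print(df)
```

Labels missing from a given Series show up as NaN in that column. Building the frame once at the end is usually cheaper than joining inside the loop.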

Convert DataFrame with 'N/As' to float to compute percent change

I am trying to convert the following DataFrame (which contains several 'N/A' values) to float so that I can perform a percent change operation:
d = pd.DataFrame({"A":['N/A','$10.00', '$5.00'],
"B":['N/A', '$10.00', '-$5.00']})
Ultimately, I would like the result to be:
(UPDATE: I do not want to remove the original N/A values. I'd like to keep them there as placeholders.)
Because there aren't any flags for dealing with negative numbers, I cannot use:
pct_change(-1)
So, I need to use:
d['A'].diff(-1)/d['A'].shift(-1).abs()
But, I get the error:
TypeError: unsupported operand type(s) for -: 'str' and 'str'
For a first step, I am trying to convert the data from object/string to float, but the output is unexpected (to me). I am getting float 'NaNs' instead of the actual number.
>d['A_float'] = pd.to_numeric(d['A'], errors='coerce')
>d
A B A_float
0 N/A N/A NaN
1 $10.00 -$100.00 NaN
2 $5.00 -$5.00 NaN
>d.dtypes
A object
B object
A_float float64
dtype: object
As a simple test, I tried subtracting '1' from the value, but still got float 'NaN'.
>d['A_float_minus1_test'] = pd.to_numeric(d['A'], errors='coerce')-1
>d
A B A_float A_float_minus1_test
0 N/A N/A NaN NaN
1 $10.00 -$100.00 NaN NaN
2 $5.00 -$5.00 NaN NaN
>d.dtypes
A object
B object
A_float float64
A_float_minus1_test float64
dtype: object
Is there a simple way to get the following result? The way I am thinking is to individually change each DataFrame column to float, then perform the operation. There must be an easier way.
Desired output:
(UPDATE: I do not want to remove the original N/A values. I'd like to keep them there as placeholders.)
Thanks!
To convert your columns from string to float, you can use apply, like this:
d['A_float'] = d['A'].apply(lambda x: float(x.replace('$', '')) if x != 'N/A' else float('nan'))
Here x.replace('$', '') removes the $ character while keeping any leading minus sign, and 'N/A' becomes NaN so the placeholder rows are preserved.
I am not sure what you are trying to do, but if you are trying to compute B as a percentage of A, you can use np.vectorize like this (assuming B has been converted the same way into a B_float column):
import numpy as np
def percent(p1, p2):
    return (100 * p2) / p1
d['Percent'] = np.vectorize(percent)(d['A_float'], d['B_float'])
import pandas as pd
d = pd.DataFrame({"A":['N/A','$10.00', '$5.00'],
"B":['N/A', '$10.00', '-$5.00']})
# Convert to number: remove the literal '$', then assign to new columns
d[['dA','dB']] = d[['A','B']].apply(lambda s: s.str.replace('$', '', regex=False)).apply(pd.to_numeric, errors='coerce')
# Perform calculations across desired column
d[['dA','dB']] = d[['dA','dB']].diff(-1)/d[['dA','dB']].shift(-1).abs()
print(d)
A B dA dB
0 N/A N/A NaN NaN
1 $10.00 $10.00 1.0 3.0
2 $5.00 -$5.00 NaN NaN

Excluding the NaN values while doing a Sum operation across the rows inside FOR Loop

I have two data frames, as given below
df1=
2492 3853 2486 3712 2288
0 4 NaN 3.5 NaN NaN
1 3 NaN 2.0 4.5 3.5
2 3 3.5 4.5 NaN 3.5
3 3. NaN 3.5 4.5 NaN
df2=
2492 0.476683
3853 0.464110
2486 0.438992
3712 0.400275
2288 0.379856
For each row of df1, I would like to get the sum of the df2 values for the columns that are not NaN in that row
Expected output
0 0.915675[0.476683+0.438992]
1 1.695806[0.476683+0.438992+0.400275+0.379856]
2 1.759641[0.476683+0.464110+0.438992+0.379856]
3 1.31595 [0.476683+0.438992+0.400275]
Please let me know how to achieve this (without replacing the NaN values with 0).
df2.sum(1).sum()
Should be enough and skip NaNs.
The first sum is a DataFrame method that returns a Series which contains the sum for every line, then the second is summing the values on this Series.
NaNs are ignored by default.
edit: using simply df2.sum() should be enough
You can do:
>>> ((df1.fillna(0)>0)*1).mul(df2.iloc[:,1].values).sum(axis=1)
0 0.915675
1 1.695806
2 1.759641
3 1.315950
dtype: float64
Note that NaN are not replaced "by reference", you still have NaN in your original df1 after this operation.
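An alternative sketch that avoids fillna altogether is to build a mask with notna() and weight it by the df2 values. It assumes df2 is (or has been turned into) a Series indexed by the df1 column labels, called weights here; the answer above instead assumes a two-column DataFrame.

```python
import numpy as np
import pandas as pd

# Data reconstructed from the question.
df1 = pd.DataFrame(
    [[4, np.nan, 3.5, np.nan, np.nan],
     [3, np.nan, 2.0, 4.5, 3.5],
     [3, 3.5, 4.5, np.nan, 3.5],
     [3, np.nan, 3.5, 4.5, np.nan]],
    columns=[2492, 3853, 2486, 3712, 2288],
)
# Assumption: the df2 values as a Series keyed by the df1 column labels.
weights = pd.Series([0.476683, 0.464110, 0.438992, 0.400275, 0.379856],
                    index=[2492, 3853, 2486, 3712, 2288])

# 1.0 where df1 has a value, 0.0 where it is NaN; df1 itself is not modified.
mask = df1.notna().astype(float)

# Multiply each column by its weight and add up across every row.
row_sums = mask.mul(weights, axis=1).sum(axis=1)
print(row_sums)
```

Unlike the fillna(0) > 0 test, this mask also counts entries of df1 that are zero or negative, since it only checks whether a value is present.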

Pandas converting column of strings and NaN (floats) to integers, keeping the NaN [duplicate]

I have problems in converting a column which contains both numbers of 2 digits in string format (type: str) and NaN (type: float64). I want to obtain a new column made this way: NaN where there was NaN and integer numbers where there was a number of 2 digits in string format.
As an example: I want to obtain column Yearbirth2 from column YearBirth1 like this:
YearBirth1 #numbers here are formatted as strings: type(YearBirth1[0])=str
34 # and NaN are floats: type(YearBirth1[2])=float64.
76
Nan
09
Nan
91
YearBirth2 #numbers here are formatted as integers: type(YearBirth2[0])=int
34 #NaN can remain floats as they were.
76
Nan
9
Nan
91
I have tried this:
csv['YearBirth2'] = (csv['YearBirth1']).astype(int)
And as I expected i got this error:
ValueError: cannot convert float NaN to integer
So I tried this:
csv['YearBirth2'] = (csv['YearBirth1']!=NaN).astype(int)
And got this error:
NameError: name 'NaN' is not defined
Finally I have tried this:
csv['YearBirth2'] = (csv['YearBirth1']!='NaN').astype(int)
NO error, but when I checked the column YearBirth2, this was the result:
YearBirth2:
1
1
1
1
1
1
Very bad. I think the idea is right, but I cannot get Python to understand what I mean by NaN, or maybe the method I tried is wrong.
I also used pd.to_numeric(), but this way I obtain floats, not integers.
Any help?!
Thanks to everyone!
P.S: csv is the name of my DataFrame;
Sorry if I am not so clear, I am on improving with English language!
You can use to_numeric, but it is impossible to get an int column that contains NaN values; they are always converted to float (see NA type promotions).
df['YearBirth2'] = pd.to_numeric(df.YearBirth1, errors='coerce')
print (df)
YearBirth1 YearBirth2
0 34 34.0
1 76 76.0
2 Nan NaN
3 09 9.0
4 Nan NaN
5 91 91.0
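That limitation applies to the default NumPy-backed integer dtypes. pandas 0.24 and later also provide a nullable integer dtype, 'Int64' (capital I), which can hold integers alongside missing values. A minimal sketch, using a made-up Series in place of the YearBirth1 column:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the YearBirth1 column: strings mixed with NaN.
df = pd.DataFrame({"YearBirth1": ["34", "76", np.nan, "09", np.nan, "91"]})

# Parse to numbers first (this yields float64 because of the NaN values) ...
as_float = pd.to_numeric(df["YearBirth1"], errors="coerce")

# ... then cast to the nullable integer dtype; missing entries become <NA>.
df["YearBirth2"] = as_float.astype("Int64")
print(df)
print(df.dtypes)
```

The cast only succeeds when every non-missing float is a whole number, which holds for two-digit years like these.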
