Add commas to decimal column without rounding off - python

I have a pandas column named Price_col, which looks like this:
Price_col
1. 1000000.000
2. 234556.678900
3. 2345.00
4.
5. 23.56
I am trying to add commas to Price_col so that it looks like this:
Price_col
1. 1,000,000.000
2. 234,556.678900
3. 2,345.00
4.
5. 23.56
When I try to convert the values, they always get rounded off. Is there a way to keep the original value without rounding?
I tried the code below. This is what I got for the value 234556.678900:
n = "{:,}".format(234556.678900)
print(n)
>>> 234,556.6789

Add f for fixed-point
>>> "{:,}".format(234556.678900)
'234,556.6789'
>>> "{:,f}".format(234556.678900)
'234,556.678900'
You can also control the precision with .p, where p is the number of digits after the decimal point (and you probably should). Beware: because you are dealing with floats, you will see IEEE 754 representation artifacts at high precision, although the formatted output is usually fine at a sensible precision regardless of the backing data.
>>> "{:,.5f}".format(234556.678900)
'234,556.67890'
>>> "{:,.20f}".format(234556.678900)
'234,556.67889999999897554517'
The full Format Specification Mini-Language can be found here:
https://docs.python.org/3/library/string.html#format-specification-mini-language
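If the prices are available as text (for example, a column read as strings), one way to avoid float rounding altogether is decimal.Decimal, which keeps trailing zeros when formatted with a comma. The sketch below is an illustration under that assumption, not part of the answer above; add_commas is a hypothetical helper name.

```python
from decimal import Decimal

def add_commas(text):
    """Insert thousands separators into a numeric string, keeping every digit."""
    text = text.strip()
    if not text:
        return ""  # leave missing entries blank
    # Decimal remembers the number of decimal places in the input,
    # so no precision needs to be specified here.
    return format(Decimal(text), ",")

prices = ["1000000.000", "234556.678900", "2345.00", "", "23.56"]
formatted = [add_commas(p) for p in prices]
print(formatted)
```

This only helps if the values never passed through a float; once a value is stored as float64, the original number of trailing zeros is no longer recoverable.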

From your comment, I realized you may really want something else, as described in "How to display pandas DataFrame of floats using a format string for columns?", and only change how the data is displayed.
Creating a new column formatted as a string
>>> df = pd.DataFrame({"Price_col": [1000000.000, 234556.678900, 2345.00, None, 23.56]})
>>> df["price2"] = df["Price_col"].apply(lambda x: f"{x:,f}")
>>> df
Price_col price2
0 1000000.0000 1,000,000.000000
1 234556.6789 234,556.678900
2 2345.0000 2,345.000000
3 NaN nan
4 23.5600 23.560000
>>> df.dtypes
Price_col float64
price2 object
dtype: object
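The string column above renders the missing value as the text "nan". If a blank is preferred, as in the question's desired output, a small NaN-aware formatter is one option. This is a sketch, and with_commas is a hypothetical helper name.

```python
import pandas as pd

df = pd.DataFrame({"Price_col": [1000000.000, 234556.678900, 2345.00, None, 23.56]})

def with_commas(x):
    """Format a float with thousands separators; return '' for missing values."""
    return "" if pd.isna(x) else f"{x:,f}"

df["price2"] = df["Price_col"].map(with_commas)
print(df)
```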
Temporarily changing how data is displayed
>>> df = pd.DataFrame({"Price_col": [1000000.000, 234556.678900, 2345.00, None, 23.56]})
>>> print(df)
Price_col
0 1000000.0000
1 234556.6789
2 2345.0000
3 NaN
4 23.5600
>>> with pd.option_context('display.float_format', '€{:>18,.6f}'.format):
...     print(df)
...
Price_col
0 € 1,000,000.000000
1 € 234,556.678900
2 € 2,345.000000
3 NaN
4 € 23.560000
>>> print(df)
Price_col
0 1000000.0000
1 234556.6789
2 2345.0000
3 NaN
4 23.5600

Related

Convert float64 to string with 2 decimal

I have a column of data that I am trying to format to 2 decimal places, to get
73.35
35.72
35.51 etc.
It currently looks like this when I read the Excel file into Python.
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
...
49056 73.345000
49057 35.720833
49058 35.505000
49059 17.075000
49060 27.710000
Name: AMOUNT, Length: 49061, dtype: object
I am using
pd.to_numeric(df['PAYMENT']).fillna(0).astype(str).mask(df['PAYMENT'].isnull())
But only get this
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
...
49056 73.345
49057 35.720833
49058 35.505
49059 17.075
49060 27.71
Name: AMOUNT, Length: 49061, dtype: object
Any help is appreciated!
df['PAYMENT'].round(2).astype(str)
round() rounds the number to two decimal places; astype(str) then converts it to a string. I'm not sure why you need it to be a string, though.
Two ideas:
Do you have values >= 1000? Excel might write them as 1,234.567
To round your values to 2 decimals use .round(2), e.g.
df['PAYMENT'].astype(float).fillna(0).round(2)
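Note that round(2) followed by astype(str) drops trailing zeros (27.7 stays '27.7'). If exactly two decimals are wanted in the text, a format string is one alternative. The sketch below uses made-up sample values, and two_decimals is a hypothetical helper name; missing values are passed through unchanged.

```python
import pandas as pd

payments = pd.Series([None, 73.345, 35.720833, 17.1, 27.71], name="PAYMENT")

def two_decimals(x):
    """Return a string with exactly two decimals, or the value itself if missing."""
    return x if pd.isna(x) else f"{x:.2f}"

as_text = payments.map(two_decimals)
print(as_text)
```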

Pandas how to truncate small float values

I'm using pandas.DataFrame.round to truncate columns on a DataFrame, but I have a column of p-values that have small values, which are being rounded to zero. For example, all the values below are being rounded to 0.
p-value
2.298564e-17
6.848231e-91
1.089847e-10
9.390048e-04
5.628517e-35
4.621786e-19
4.601818e-54
9.639073e-19
I want something like
p-value
2.29e-17
6.84e-91
1.08e-10
9.39e-04
5.62e-35
4.62e-19
4.60e-54
9.63e-19
Numpy has functions for this.
data = """p-value
2.298564e-17
6.848231e-91
1.089847e-10
9.390048e-04
5.628517e-35
4.621786e-19
4.601818e-54
9.639073e-19"""
a = [x for x in data.split("\n")]
df = pd.DataFrame({"p-value":a[1:]})
df["p-value"] = df["p-value"].astype(float)  # np.float was removed in NumPy 1.24; use the builtin float
df["p-value"].apply(lambda x: np.format_float_scientific(x, precision=2))
output
0 2.3e-17
1 6.85e-91
2 1.09e-10
3 9.39e-04
4 5.63e-35
5 4.62e-19
6 4.60e-54
7 9.64e-19
Name: p-value, dtype: object
not quite truncate, but rather round:
df['p-value'].apply(lambda x: f'{x:.2e}')
Output:
0 2.30e-17
1 6.85e-91
2 1.09e-10
3 9.39e-04
4 5.63e-35
5 4.62e-19
6 4.60e-54
7 9.64e-19
Name: p-value, dtype: object
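Both answers above round (2.298564e-17 becomes 2.30e-17), whereas the question's desired output truncates (2.29e-17). One possible way to truncate to three significant digits is to let the decimal module round toward zero before formatting. This is a sketch, and truncate_sci is a hypothetical helper name.

```python
from decimal import Context, ROUND_DOWN

def truncate_sci(x, digits=3):
    """Format x in scientific notation, truncating (not rounding) to `digits` significant digits."""
    ctx = Context(prec=digits, rounding=ROUND_DOWN)
    # repr(float(x)) gives the shortest decimal string that round-trips the float;
    # create_decimal then cuts it to `digits` significant digits, rounding toward zero.
    truncated = ctx.create_decimal(repr(float(x)))
    # Convert back to float only to get the usual two-digit exponent style (e-04).
    return f"{float(truncated):.{digits - 1}e}"

p_values = [2.298564e-17, 6.848231e-91, 1.089847e-10, 9.390048e-04, 4.601818e-54]
truncated = [truncate_sci(p) for p in p_values]
print(truncated)
```

With a DataFrame this could be applied as df['p-value'].apply(truncate_sci), which yields strings just like the answers above.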

Convert DataFrame with 'N/As' to float to compute percent change

I am trying to convert the following DataFrame (which contains several 'N/As') to float so that I can perform a percent change operation:
d = pd.DataFrame({"A":['N/A','$10.00', '$5.00'],
"B":['N/A', '$10.00', '-$5.00']})
Ultimately, I would like the result to be:
(UPDATE: I do not want to remove the original N/A values. I'd like to keep them there as placeholders.)
Because there aren't any flags for dealing with negative numbers, I cannot use:
pct_change(-1)
So, I need to use:
d['A'].diff(-1)/d['A'].shift(-1).abs()
But, I get the error:
TypeError: unsupported operand type(s) for -: 'str' and 'str'
For a first step, I am trying to convert the data from object/string to float, but the output is unexpected (to me). I am getting float 'NaNs' instead of the actual number.
>d['A_float'] = pd.to_numeric(d['A'], errors='coerce')
>d
A B A_float
0 N/A N/A NaN
1 $10.00 -$100.00 NaN
2 $5.00 -$5.00 NaN
>d.dtypes
A object
B object
A_float float64
dtype: object
As a simple test, I tried subtracting '1' from the value, but still got float 'NaN'.
>d['A_float_minus1_test'] = pd.to_numeric(d['A'], errors='coerce')-1
>d
A B A_float A_float_minus1_test
0 N/A N/A NaN NaN
1 $10.00 -$100.00 NaN NaN
2 $5.00 -$5.00 NaN NaN
>d.dtypes
A object
B object
A_float float64
A_float_minus1_test float64
dtype: object
Is there a simple way to get the following result? The way I am thinking is to individually change each DataFrame column to float, then perform the operation. There must be an easier way.
Desired output:
(UPDATE: I do not want to remove the original N/A values. I'd like to keep them there as placeholders.)
Thanks!
To convert your columns from string to float, you can use apply, like this:
d['A_float'] = d['A'].apply(lambda x: float(x.replace('$', '')) if x != 'N/A' else float('nan'))
The x.replace('$', '') removes the $ character while keeping any leading minus sign, and 'N/A' entries become NaN so they stay in place as placeholders.
I am not sure what you are trying to do next, but if you want to compute B as a percentage of A, you can use np.vectorize like this:
def percent(p1, p2):
    return (100 * p2) / p1
# assumes columns A and B have already been converted to float
d['Percent'] = np.vectorize(percent)(d['A'], d['B'])
import pandas as pd
d = pd.DataFrame({"A":['N/A','$10.00', '$5.00'],
"B":['N/A', '$10.00', '-$5.00']})
# Convert to number: remove '$' (as a literal character, not a regex), assign to new columns
d[['dA','dB']] = d[['A','B']].apply(lambda s: s.str.replace('$', '', regex=False)).apply(pd.to_numeric, errors='coerce')
# Perform calculations across desired column
d[['dA','dB']] = d[['dA','dB']].diff(-1)/d[['dA','dB']].shift(-1).abs()
print(d)
A B dA dB
0 N/A N/A NaN NaN
1 $10.00 $10.00 1.0 3.0
2 $5.00 -$5.00 NaN NaN
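The parsing and the change calculation can also be illustrated without pandas, which makes the handling of 'N/A' and of the minus sign explicit. This is a minimal sketch; parse_money and change_vs_next are hypothetical helper names, and the code assumes the next value is never exactly zero.

```python
import math

def parse_money(text):
    """Turn '$10.00' or '-$5.00' into a float; anything unparseable becomes NaN."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return math.nan

def change_vs_next(values):
    """(current - next) / abs(next) for each row, mirroring diff(-1) / shift(-1).abs()."""
    following = values[1:] + [math.nan]  # the last row has no next value
    return [(cur - nxt) / abs(nxt) for cur, nxt in zip(values, following)]

column_b = [parse_money(s) for s in ["N/A", "$10.00", "-$5.00"]]
changes = change_vs_next(column_b)
print(column_b, changes)
```

The NaN entries propagate through the arithmetic, so the 'N/A' rows stay in place as placeholders.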

Rounding up decimals Python

I am new to python pandas and I am having difficulties trying to round up all the values in the column as there is a white space between the decimal point and zero. For example,
Hi
21. 0
8. 0
52. 0
45. 0
I tried using my current code below, but it gave me:
invalid literal for float(): 21. 0
df.Hi.astype(float).round()
Try using replace on the string to replace all whitespace in the string before converting to a float:
df.Hi.str.replace(' ', '').astype(float).round()
If I understand you correctly, you want to convert the values in column Hi to float.
You have a white space between the decimal points and zeros, which means your values are strings.
You can convert them to float using a lambda function.
df['Hi'] = df['Hi'].apply(lambda x: x.replace(" ", "")).astype(float)
print(df)
Hi
0 21.0
1 8.0
2 52.0
3 45.0
print(df.dtypes)
Hi float64
dtype: object
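The replace(' ', '') calls above only remove plain spaces. If the stray whitespace might also include tabs or non-breaking spaces, splitting and rejoining the string removes every kind of whitespace. A small sketch with a hypothetical helper name, to_float:

```python
def to_float(text):
    """Remove all whitespace from a numeric string, then convert it to float."""
    return float("".join(text.split()))

raw = ["21. 0", "8. 0", "52.\t0", "45. 0"]
cleaned = [round(to_float(v)) for v in raw]
print(cleaned)
```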

Precision lost while using read_csv in pandas

I have text files in the format below, which I am trying to read into a pandas DataFrame.
895|2015-4-23|19|10000|LA|0.4677978806|0.4773469340|0.4089938425|0.8224291972|0.8652525793|0.6829942860|0.5139162227|
As you can see, there are 10 digits after the decimal point in the input file.
df = pd.read_csv('mockup.txt',header=None,delimiter='|')
When I read it into a DataFrame, I do not get the last 4 digits:
df[5].head()
0 0.467798
1 0.258165
2 0.860384
3 0.803388
4 0.249820
Name: 5, dtype: float64
How can I get the complete precision as present in the input file? I have some matrix operations that need to be performed, so I cannot cast it as a string.
I figured out that I have to do something about dtype but I am not sure where I should use it.
It is only a display problem; see the docs:
# temporarily set display precision
with pd.option_context('display.precision', 10):
    print(df)
0 1 2 3 4 5 6 7 \
0 895 2015-4-23 19 10000 LA 0.4677978806 0.477346934 0.4089938425
8 9 10 11 12
0 0.8224291972 0.8652525793 0.682994286 0.5139162227 NaN
EDIT: (Thank you Mark Dickinson):
Pandas uses a dedicated decimal-to-binary converter that sacrifices perfect accuracy for the sake of speed. Passing float_precision='round_trip' to read_csv fixes this. See the documentation for more.
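The two points above can be combined in a short sketch: float_precision='round_trip' makes the parsed value identical to what Python's own float() would produce, and display.precision controls how many digits are printed. The sample line is shortened from the question, and io.StringIO stands in for the file.

```python
import io
import pandas as pd

sample = "895|2015-4-23|19|10000|LA|0.4677978806|0.4773469340|\n"

df = pd.read_csv(
    io.StringIO(sample),
    header=None,
    delimiter="|",
    float_precision="round_trip",  # use the slower, exact decimal-to-binary converter
)

# The stored values already carry full precision; only the display is widened here.
with pd.option_context("display.precision", 10):
    print(df)
```

The trailing delimiter on each line produces an extra, all-NaN last column, as in the output shown above.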
