Rounding up decimals in Python

I am new to Python pandas and I am having difficulty rounding all the values in a column because there is a white space between the decimal point and the zero. For example:
Hi
21. 0
8. 0
52. 0
45. 0
I tried the code below:
df.Hi.astype(float).round()
but it gave me:
ValueError: invalid literal for float(): 21. 0

Try using str.replace to remove the whitespace from the string before converting to float:
df.Hi.str.replace(' ', '').astype(float).round()

If I understand you correctly, you want to convert the values in column Hi to float.
You have a white space between the decimal points and zeros, which means your values are strings.
You can convert them to float using a lambda function.
df['Hi'] = df['Hi'].apply(lambda x: x.replace(" ", "")).astype(float)
print(df)
Hi
0 21.0
1 8.0
2 52.0
3 45.0
print(df.dtypes)
Hi float64
dtype: object
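If the column might contain other stray whitespace or unparseable junk, a slightly more defensive sketch (same assumed column name Hi from the question) strips all whitespace with a regex and uses pd.to_numeric so bad cells become NaN instead of raising:

```python
import pandas as pd

df = pd.DataFrame({'Hi': ['21. 0', '8. 0', '52. 0', '45. 0']})
# remove every whitespace character, then coerce; unparseable cells become NaN
df['Hi'] = pd.to_numeric(df['Hi'].str.replace(r'\s+', '', regex=True),
                         errors='coerce').round()
print(df['Hi'].tolist())  # -> [21.0, 8.0, 52.0, 45.0]
```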

Related

Add commas to decimal column without rounding off

I have a pandas column named Price_col, which looks like this:
Price_col
1. 1000000.000
2. 234556.678900
3. 2345.00
4.
5. 23.56
I am trying to add commas to my Price_col to make it look like this:
Price_col
1. 1,000,000.000
2. 234,556.678900
3. 2,345.00
4.
5. 23.56
When I try to convert the values, they always get rounded off. Is there a way I can keep the original value without rounding?
I tried the code below; this is what I got for the value 234556.678900:
n = "{:,}".format(234556.678900)
print(n)
>>> 234,556.6789
Add f for fixed-point presentation:
>>> "{:,}".format(234556.678900)
'234,556.6789'
>>> "{:,f}".format(234556.678900)
'234,556.678900'
You can also control the precision with .p, where p is the number of digits (and you should probably do so). Beware that, since you're dealing with floats, you'll see some IEEE 754 aliasing at high precision, though the representation produced by format should be quite nice regardless of the backing data:
>>> "{:,.5f}".format(234556.678900)
'234,556.67890'
>>> "{:,.20f}".format(234556.678900)
'234,556.67889999999897554517'
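The aliasing above comes from the binary float backing the value; if the digits must round-trip exactly, the stdlib decimal module (a hedged alternative, not part of the original answer) keeps them, since a Decimal built from a string never passes through a float and supports the same format mini-language:

```python
from decimal import Decimal

# exact: the string constructor bypasses binary floating point entirely
d = Decimal("234556.678900")
print(f"{d:,f}")  # -> 234,556.678900
```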
The full Format Specification Mini-Language can be found here:
https://docs.python.org/3/library/string.html#format-specification-mini-language
From your comment, I realized you may really want something else, as described in How to display pandas DataFrame of floats using a format string for columns?, and only change how the data is displayed.
Creating a new column formatted as a string
>>> df = pd.DataFrame({"Price_col": [1000000.000, 234556.678900, 2345.00, None, 23.56]})
>>> df["price2"] = df["Price_col"].apply(lambda x: f"{x:,f}")
>>> df
Price_col price2
0 1000000.0000 1,000,000.000000
1 234556.6789 234,556.678900
2 2345.0000 2,345.000000
3 NaN nan
4 23.5600 23.560000
>>> df.dtypes
Price_col float64
price2 object
dtype: object
Temporarily changing how data is displayed
>>> df = pd.DataFrame({"Price_col": [1000000.000, 234556.678900, 2345.00, None, 23.56]})
>>> print(df)
Price_col
0 1000000.0000
1 234556.6789
2 2345.0000
3 NaN
4 23.5600
>>> with pd.option_context('display.float_format', '€{:>18,.6f}'.format):
... print(df)
...
Price_col
0 € 1,000,000.000000
1 € 234,556.678900
2 € 2,345.000000
3 NaN
4 € 23.560000
>>> print(df)
Price_col
0 1000000.0000
1 234556.6789
2 2345.0000
3 NaN
4 23.5600

Adding 0 in front of date

New to programming:
I have a CSV file in which the date is given in the format DDMMYYYY; when reading the file in Python, its type is inferred as int. So a date such as 01022020 is read as 1022020. I need to add the leading 0 back to all dates whose length is less than 8.
Index Date Value
0 10042020 10.5
1 03052020 14.2
2 09052020 16.3
3 13052020 17.5
I converted the column to str using df.Date.map(str) but can't work out how to proceed.
I tried:
if len(df.Date[i])==7:
    df.Date[i] = df.Date.str["0"] + df.Date.str[i]
It's not working. I have two queries regarding this:
1. I want to understand why this is wrong logically, and what the best solution is.
2. While reading the data from the CSV file, can a column containing only integers be converted to string directly?
Please help.
print(df)#input
Index Date Value
0 0 10042020 10.5
1 1 3052020 14.2
2 2 9052020 16.3
3 3 13052020 17.5
Convert the Date column to string using .astype(str), then pad any strings whose length is less than 8 using the .str.pad() method:
df['Date']=df['Date'].astype(str).str.pad(width=8, side='left', fillchar='0')
Index Date Value
0 0 10042020 10.5
1 1 03052020 14.2
2 2 09052020 16.3
3 3 13052020 17.5
If a datetime object is needed, then:
df['Date']=pd.to_datetime(df['Date'],format='%d%m%Y')
Chained together:
df['Date']=pd.to_datetime(df['Date'].astype(str).str.pad(width=8, side='left', fillchar='0'),format='%d%m%Y')
Use .str.zfill:
s = pd.Series([1122020, 2032020, 12312020])
s
Input series:
0 1122020
1 2032020
2 12312020
dtype: int64
Cast to string, then use zfill:
s.astype(str).str.zfill(8)
Output:
0 01122020
1 02032020
2 12312020
dtype: object
Then you can use pd.to_datetime with format:
pd.to_datetime(s.astype(str).str.zfill(8), format='%m%d%Y')
Output:
0 2020-01-12
1 2020-02-03
2 2020-12-31
dtype: datetime64[ns]
The simplest solution I've seen for converting an int to a string that's left-padded with zeroes is the zfill method, e.g. str(df.Date[i]).zfill(8)
Assuming you're using pandas for your csv load, you can specify the dtype on load: df = pd.read_csv('test.csv', dtype={'Date': 'string'})
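Putting the load-time fix together with the parsing step, a minimal sketch (using io.StringIO to stand in for the real CSV file) might look like:

```python
import io
import pandas as pd

# io.StringIO plays the role of the CSV file on disk
csv = io.StringIO("Date,Value\n10042020,10.5\n03052020,14.2\n09052020,16.3\n13052020,17.5\n")

# read Date as text so the leading zeros are never stripped
df = pd.read_csv(csv, dtype={'Date': str})
df['Date'] = pd.to_datetime(df['Date'], format='%d%m%Y')
print(df['Date'].iloc[1])  # -> 2020-05-03 00:00:00
```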

Convert DataFrame with 'N/As' to float to compute percent change

I am trying to convert the following DataFrame (which contains several 'N/A' strings) to float so that I can perform a percent-change operation:
d = pd.DataFrame({"A":['N/A','$10.00', '$5.00'],
"B":['N/A', '$10.00', '-$5.00']})
Ultimately, I would like the result to be:
(UPDATE: I do not want to remove the original N/A values. I'd like to keep them there as placeholders.)
Because there aren't any flags for dealing with negative numbers, I cannot use:
pct_change(-1)
So, I need to use:
d['A'].diff(-1)/d['A'].shift(-1).abs()
But, I get the error:
TypeError: unsupported operand type(s) for -: 'str' and 'str'
For a first step, I am trying to convert the data from object/string to float, but the output is unexpected (to me): I am getting float NaNs instead of the actual numbers.
>d['A_float'] = pd.to_numeric(d['A'], errors='coerce')
>d
A B A_float
0 N/A N/A NaN
1 $10.00 $10.00 NaN
2 $5.00 -$5.00 NaN
>d.dtypes
A object
B object
A_float float64
dtype: object
As a simple test, I tried subtracting 1 from the value, but still got float NaN.
>d['A_float_minus1_test'] = pd.to_numeric(d['A'], errors='coerce')-1
>d
A B A_float A_float_minus1_test
0 N/A N/A NaN NaN
1 $10.00 $10.00 NaN NaN
2 $5.00 -$5.00 NaN NaN
>d.dtypes
A object
B object
A_float float64
A_float_minus1_test float64
dtype: object
Is there a simple way to get the following result? The way I'm thinking of is to individually change each DataFrame column to float and then perform the operation, but there must be an easier way.
Desired output:
(UPDATE: I do not want to remove the original N/A values. I'd like to keep them there as placeholders.)
Thanks!
To convert your columns from string to float, you can use apply, like such:
d['A_float'] = d['A'].apply(lambda x: float(x.replace('$', '')) if '$' in x else np.nan)
The x.replace('$', '') strips the $ character while keeping any minus sign in front of it, and cells without a $ (the 'N/A' placeholders) become NaN instead of raising an error.
Then, I am not sure exactly what you are trying to do, but if you are trying to compute the percentage of A from B (after converting B the same way into B_float), you can use np.vectorize like this:
def percent(p1, p2):
    return (100 * p2) / p1

d['Percent'] = np.vectorize(percent)(d['A_float'], d['B_float'])
import pandas as pd
d = pd.DataFrame({"A":['N/A','$10.00', '$5.00'],
"B":['N/A', '$10.00', '-$5.00']})
# Covert to number, remove '$', assign to new columns
d[['dA','dB']] = d[['A','B']].apply(lambda s: s.str.replace('$', '', regex=False)).apply(pd.to_numeric, errors='coerce')
# Perform calculations across desired column
d[['dA','dB']] = d[['dA','dB']].diff(-1)/d[['dA','dB']].shift(-1).abs()
print(d)
A B dA dB
0 N/A N/A NaN NaN
1 $10.00 $10.00 1.0 3.0
2 $5.00 -$5.00 NaN NaN
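The question also asks to keep the original 'N/A' cells as placeholders; one hedged way to do that, building on the approach above, is to compute on the numeric copies and then mask the results back to 'N/A' wherever the source column held it:

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({"A": ['N/A', '$10.00', '$5.00'],
                  "B": ['N/A', '$10.00', '-$5.00']})

# strip '$' and coerce; the 'N/A' strings become NaN
num = (d[['A', 'B']]
       .apply(lambda s: s.str.replace('$', '', regex=False))
       .apply(pd.to_numeric, errors='coerce'))

pct = num.diff(-1) / num.shift(-1).abs()

# write the 'N/A' placeholder back wherever the source column said 'N/A'
result = pct.astype(object).mask(d[['A', 'B']].eq('N/A'), 'N/A')
print(result)
```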

Pandas inconsistency with regex "." dot metacharacter?

Consider
df
Cost
Store 1 22.5
Store 1 .........
Store 2 ...
To convert these dots to NaN, I can use:
df.replace(r'^\.+$', np.nan, regex=True)
Cost
Store 1 22.5
Store 1 NaN
Store 2 NaN
What I don't understand is why the following pattern also works:
df.replace('^.+$', np.nan, regex=True)
Cost
Store 1 22.5
Store 1 NaN
Store 2 NaN
Note that in this case I haven't escaped the ., so it should be treated as the match-any metacharacter, resulting in every single row being converted to NaN... but it isn't. Only the dotted rows are matched, even though I used the match-any character.
Contrast this with:
import re
re.sub('^.+$', '', '22.5')
''
Which returns an empty string.
So what's going on?
Halfway through writing this question, I realised what the problem was:
df.Cost.dtype
dtype('O')
df.Cost.values
array([22.5, '.........', '...'], dtype=object)
So the 22.5 happens to be a numeric value, and the regex replacement simply skips over non-string values. Doing an astype conversion makes this obvious:
df.astype(str).replace('.+', np.nan, regex=True)
Cost
Store 1 NaN
Store 1 NaN
Store 2 NaN
Problem solved. Leaving this up in case anyone else is confused by this.
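A minimal sketch of the skip-over behaviour described above, for anyone who wants to reproduce it:

```python
import numpy as np
import pandas as pd

# an object column mixing a real float with dotted strings
s = pd.Series([22.5, '.........', '...'], dtype=object)
out = s.replace('^.+$', np.nan, regex=True)
# the float 22.5 is left alone; only the actual strings matched the pattern
print(out.tolist())
```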

Rounding down values in Pandas dataframe column with NaNs

I have a Pandas dataframe that contains a column of float64 values:
tempDF = pd.DataFrame({ 'id': [12,12,12,12,45,45,45,51,51,51,51,51,51,76,76,76,91,91,91,91],
'measure': [3.2,4.2,6.8,5.6,3.1,4.8,8.8,3.0,1.9,2.1,2.4,3.5,4.2,5.2,4.3,3.6,5.2,7.1,6.5,7.3]})
I want to create a new column containing just the integer part. My first thought was to use .astype(int):
tempDF['int_measure'] = tempDF['measure'].astype(int)
This works fine but, as an extra complication, the column I have contains a missing value:
tempDF.loc[10,'measure'] = np.nan
This missing value causes the .astype(int) method to fail with:
ValueError: Cannot convert NA to integer
I thought I could round down the floats in the column of data. However, the .round(0) function will round to the nearest integer (higher or lower) rather than rounding down. I can't find a function equivalent to ".floor()" that will act on a column of a Pandas dataframe.
Any suggestions?
You could just apply numpy.floor:
import numpy as np
tempDF['int_measure'] = tempDF['measure'].apply(np.floor)
id measure int_measure
0 12 3.2 3.0
1 12 4.2 4.0
2 12 6.8 6.0
...
9 51 2.1 2.0
10 51 NaN NaN
11 51 3.5 3.0
...
19 91 7.3 7.0
Note the result stays float64, because NaN cannot be stored in an integer column.
You could also try:
df.apply(lambda s: s // 1)
Using np.floor is faster, however.
The answers here are pretty dated; as of pandas 0.25.2 (perhaps earlier), assigning a converted column back onto a slice can raise the warning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
The corresponding assignment for one particular column would be:
df.iloc[:,0] = df.iloc[:,0].astype(int)
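If an actual integer dtype is needed despite the missing value, newer pandas versions offer the nullable Int64 extension type (not mentioned in the answers above; a hedged alternative): floor first, then convert, and the NaN becomes pd.NA:

```python
import numpy as np
import pandas as pd

s = pd.Series([3.2, 4.2, np.nan, 7.3])
# nullable integer dtype: the NaN survives as <NA> instead of raising
int_s = np.floor(s).astype('Int64')
print(int_s.tolist())  # -> [3, 4, <NA>, 7]
```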
