I have a DataFrame from this question:
temp=u"""Total,Price,test_num
0,71.7,2.04256e+14
1,39.5,2.04254e+14
2,82.2,2.04188e+14
3,42.9,2.04171e+14"""
df = pd.read_csv(pd.compat.StringIO(temp))
print (df)
Total Price test_num
0 0 71.7 2.042560e+14
1 1 39.5 2.042540e+14
2 2 82.2 2.041880e+14
3 3 42.9 2.041710e+14
If I convert the floats to strings, I get a trailing .0:
print (df['test_num'].astype('str'))
0 204256000000000.0
1 204254000000000.0
2 204188000000000.0
3 204171000000000.0
Name: test_num, dtype: object
The solution is to convert the floats to int64 first:
print (df['test_num'].astype('int64'))
0 204256000000000
1 204254000000000
2 204188000000000
3 204171000000000
Name: test_num, dtype: int64
print (df['test_num'].astype('int64').astype(str))
0 204256000000000
1 204254000000000
2 204188000000000
3 204171000000000
Name: test_num, dtype: object
My question is: why does it convert this way?
I added this poor explanation, but I feel it should be better:
Poor explanation:
You can check the dtype of the column - it returns float64.
print (df['test_num'].dtype)
float64
After converting to string, the exponential notation is removed but the value is still a float, so a trailing .0 is added:
print (df['test_num'].astype('str'))
0 204256000000000.0
1 204254000000000.0
2 204188000000000.0
3 204171000000000.0
Name: test_num, dtype: object
When you use pd.read_csv to import data and do not define the datatypes,
pandas makes an educated guess and in this case decides that column
values like "2.04256e+14" are best represented by a float.
This, converted back to string, adds a ".0". As you correctly write,
converting to int64 fixes this.
If you know that the column contains only int64 values before input (and
no empty values, which np.int64 cannot handle), you can force this type on import to avoid the unneeded conversions.
from io import StringIO
import numpy as np
import pandas as pd

temp = u"""Total,Price,test_num
0,71.7,2.04256e+14
1,39.5,2.04254e+14
2,82.2,2.04188e+14
3,42.9,2.04171e+14"""
df = pd.read_csv(StringIO(temp), dtype={2: np.int64})
print(df)
returns
Total Price test_num
0 0 71.7 204256000000000
1 1 39.5 204254000000000
2 2 82.2 204188000000000
3 3 42.9 204171000000000
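If you prefer to keep the column as float and only need the plain string form, a minimal alternative sketch (assuming the values have no fractional part, as here) is to format each value without the decimal portion:
# Format each float with no decimals; avoids the intermediate int64 cast.
print (df['test_num'].map('{:.0f}'.format))
0 204256000000000
1 204254000000000
2 204188000000000
3 204171000000000
Name: test_num, dtype: object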
Related
I'm using pandas.DataFrame.round to truncate columns of a DataFrame, but I have a column of p-values with small values that are being rounded to zero. For example, all the values below are being rounded to 0.
p-value
2.298564e-17
6.848231e-91
1.089847e-10
9.390048e-04
5.628517e-35
4.621786e-19
4.601818e-54
9.639073e-19
I want something like
p-value
2.29e-17
6.84e-91
1.08e-10
9.39e-04
5.62e-35
4.62e-19
4.60e-54
9.63e-19
NumPy has functions for this.
data = """p-value
2.298564e-17
6.848231e-91
1.089847e-10
9.390048e-04
5.628517e-35
4.621786e-19
4.601818e-54
9.639073e-19"""
import numpy as np
import pandas as pd

a = data.split("\n")
df = pd.DataFrame({"p-value": a[1:]})
df["p-value"] = df["p-value"].astype(float)
df["p-value"].apply(lambda x: np.format_float_scientific(x, precision=2))
Output:
0 2.3e-17
1 6.85e-91
2 1.09e-10
3 9.39e-04
4 5.63e-35
5 4.62e-19
6 4.60e-54
7 9.64e-19
Name: p-value, dtype: object
This doesn't quite truncate, but rather rounds:
df['p-value'].apply(lambda x: f'{x:.2e}')
Output:
0 2.30e-17
1 6.85e-91
2 1.09e-10
3 9.39e-04
4 5.63e-35
5 4.62e-19
6 4.60e-54
7 9.64e-19
Name: p-value, dtype: object
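If the scientific notation is only needed for display, a minimal sketch using pandas' display.float_format option (the data stays float64; only the printing changes) would be:
# Only affects how the Series is printed, not the stored values.
with pd.option_context('display.float_format', '{:.2e}'.format):
    print(df['p-value'])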
New to programming:
I have a CSV file in which dates are given in the format DDMMYYYY, but when the file is read in Python their type is taken as int. So a date like 01022020 is being read as 1022020. I need to add the 0 in front of all dates whose length is less than 8.
Index Date Value
0 10042020 10.5
1 03052020 14.2
2 09052020 16.3
3 13052020 17.5
I converted the column to str using df.Date.map(str), but I can't work out how to proceed.
I tried:
if len(df.Date[i]) == 7:
    df.Date[i] = df.Date.str["0"] + df.Date.str[i]
It's not working. I have two queries regarding this:
I want to understand why this is wrong logically, and what the best solution is.
While reading the data from a CSV file, can a column containing only integers be converted to string directly?
Please help.
print(df)  # input
Index Date Value
0 0 10042020 10.5
1 1 3052020 14.2
2 2 9052020 16.3
3 3 13052020 17.5
Convert the date column to string using .astype(str), then pad any strings whose length is less than 8 using the .str.pad() method:
df['Date']=df['Date'].astype(str).str.pad(width=8, side='left', fillchar='0')
Index Date Value
0 0 10042020 10.5
1 1 03052020 14.2
2 2 09052020 16.3
3 3 13052020 17.5
If it is needed as a datetime object, then:
df['Date']=pd.to_datetime(df['Date'],format='%d%m%Y')
Chained together:
df['Date']=pd.to_datetime(df['Date'].astype(str).str.pad(width=8, side='left', fillchar='0'),format='%d%m%Y')
Use .str.zfill:
s = pd.Series([1122020, 2032020, 12312020])
s
Input series:
0 1122020
1 2032020
2 12312020
dtype: int64
Cast to string, then use zfill:
s.astype(str).str.zfill(8)
Output:
0 01122020
1 02032020
2 12312020
dtype: object
Then you can use pd.to_datetime with format:
pd.to_datetime(s.astype(str).str.zfill(8), format='%m%d%Y')
Output:
0 2020-01-12
1 2020-02-03
2 2020-12-31
dtype: datetime64[ns]
The simplest solution I've seen for converting an int to a string that's left-padded with zeroes is to use the zfill method, e.g. str(df.Date[i]).zfill(8).
Assuming you're using pandas for your CSV load, you can specify the dtype on load: df = pd.read_csv('test.csv', dtype={'Date': 'string'})
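For example, a minimal sketch of that load-time approach (assuming the file is named test.csv, that it really contains the leading zeros, and that the column is called Date), followed by parsing to datetime:
# Read Date as a string so the leading zero is never dropped, then parse it.
df = pd.read_csv('test.csv', dtype={'Date': str})
df['Date'] = pd.to_datetime(df['Date'], format='%d%m%Y')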
print(dataframe)
Total Price test_num
0 71.7 2.04256e+14
1 39.5 2.04254e+14
2 82.2 2.04188e+14
3 42.9 2.04171e+14
I have an error when uploading to MongoDB after converting it to str.
print(data_frame.astype(str))
Total Price test_num
0 71.7 204255705072224.0
1 39.5 204253951078915.0
2 82.2 204188075120577.0
3 42.9 204171098699772.0
When converting to str, a .0 is added at the end.
How can I effectively eliminate the .0?
Thank you.
Use astype with int64:
df['test_num'] = df['test_num'].astype('int64')
#alternative
#df['test_num'] = df['test_num'].astype(np.int64)
print (df)
Total Price test_num
0 0 71.7 204256000000000
1 1 39.5 204254000000000
2 2 82.2 204188000000000
3 3 42.9 204171000000000
Explanation:
You can check the dtype of the column - it returns float64.
print (df['test_num'].dtype)
float64
After converting to string, the exponential notation is removed but the value is still a float, so a trailing .0 is added:
print (df['test_num'].astype('str'))
0 204256000000000.0
1 204254000000000.0
2 204188000000000.0
3 204171000000000.0
Name: test_num, dtype: object
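Applied to your data_frame before the whole-frame astype(str) call, a minimal sketch would be:
# Convert only the problem column to int64 first; data_frame.astype(str)
# then yields '204255705072224' instead of '204255705072224.0'.
data_frame['test_num'] = data_frame['test_num'].astype('int64')
print(data_frame.astype(str))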
I read some weather data from a .csv file into a DataFrame named "weather". The problem is that the data type of one of the columns is object. This is weird, as it represents temperature. How do I change it to a float data type? I tried to_numeric, but it can't parse it.
weather.info()
weather.head()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 304 entries, 2017-01-01 to 2017-10-31
Data columns (total 2 columns):
Temp 304 non-null object
Rain 304 non-null float64
dtypes: float64(1), object(1)
memory usage: 17.1+ KB
Temp Rain
Date
2017-01-01 12.4 0.0
2017-02-01 11 0.6
2017-03-01 10.4 0.6
2017-04-01 10.9 0.2
2017-05-01 13.2 0.0
You can use pandas.Series.astype
You can do something like this :
weather["Temp"] = weather.Temp.astype(float)
You can also use pd.to_numeric, which will convert the column from object to float.
For details on how to use it, check out this link: http://pandas.pydata.org/pandas-docs/version/0.20/generated/pandas.to_numeric.html
Example:
s = pd.Series(['apple', '1.0', '2', -3])
print(pd.to_numeric(s, errors='ignore'))
print("=========================")
print(pd.to_numeric(s, errors='coerce'))
Output:
0 apple
1 1.0
2 2
3 -3
dtype: object
=========================
0 NaN
1 1.0
2 2.0
3 -3.0
dtype: float64
In your case you can do something like this:
weather["Temp"] = pd.to_numeric(weather.Temp, errors='coerce')
Another option is to use convert_objects (note that it is deprecated in newer pandas versions, as mentioned below).
An example is as follows:
>>> pd.Series([1, 2, 3, 4, '.']).convert_objects(convert_numeric=True)
0 1
1 2
2 3
3 4
4 NaN
dtype: float64
You can use this as follows:
weather["Temp"] = weather.Temp.convert_objects(convert_numeric=True)
I have shown you these examples because any column value that is not a number will be converted to NaN, so be careful while using it.
I tried all the methods suggested here, but sadly none worked. Instead, I found this to work:
df['column'] = pd.to_numeric(df['column'],errors = 'coerce')
And then check it using:
print(df.info())
I eventually used:
weather["Temp"] = weather["Temp"].convert_objects(convert_numeric=True)
It worked just fine, except that I got the following message.
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:3: FutureWarning:
convert_objects is deprecated. Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
You can try the following:
df['column'] = df['column'].map(lambda x: float(x))
First check your data, because you may get an error if you have ',' instead of '.'.
If so, you need to transform every ',' into '.' with a function:
def replacee(s):
    i = str(s).find(',')
    if i > 0:
        return s[:i] + '.' + s[i+1:]
    else:
        return s
Then you need to apply this function to every row in your column:
dfOPA['Montant']=dfOPA['Montant'].apply(replacee)
Then the conversion will work fine:
dfOPA['Montant'] = pd.to_numeric(dfOPA['Montant'],errors = 'coerce')
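A vectorized alternative to the row-wise function, a minimal sketch assuming the column holds strings like '12,5', is:
# Replace the decimal comma with a dot across the whole column, then convert.
dfOPA['Montant'] = pd.to_numeric(
    dfOPA['Montant'].astype(str).str.replace(',', '.', regex=False),
    errors='coerce')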
E.g., for converting a $40,000.00 object to a 40000 int or float32, follow these steps:
1. Remove the $ sign: 40,000.00
2. Remove the , comma: 40000.00
3. Remove the . dot: 4000000
4. Remove empty spaces: 4000000
5. Remove NA values: 4000000
6. The column is still object type, so convert to int using .astype(int): 4000000
7. Divide by 100: 40000
Implementing the code in pandas:
table1["Price"] = table1["Price"].str.replace('$','')<br>
table1["Price"] = table1["Price"].str.replace(',','')<br>
table1["Price"] = table1["Price"].str.replace('.','')<br>
table1["Price"] = table1["Price"].str.replace(' ','')
table1 = table1.dropna()<br>
table1["Price"] = table1["Price"].astype(int)<br>
table1["Price"] = table1["Price"] / 100<br>
Finally, it's done.
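The same cleanup can also be done in one pass with a regular expression; a minimal sketch that keeps the decimal point and converts straight to float (instead of stripping the dot and dividing by 100):
# Strip everything except digits and the decimal point, then convert.
table1["Price"] = (table1["Price"]
                   .str.replace(r'[^0-9.]', '', regex=True)
                   .astype(float))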
Consider the pd.Series s:
s = pd.Series([-1.23, 4.56])
s
0 -1.23
1 4.56
dtype: float64
I can format floats with the pandas display.float_format option:
with pd.option_context('display.float_format', '${:,.2f}'.format):
    print(s)
0 $-1.23
1 $4.56
dtype: float64
But how do I format it in such a way that I get the - sign in front of the $?
0 -$1.23
1 $4.56
dtype: float64
You can substitute the formatting function with your own. Below is just a demo of how it works, you can tune it to your own needs:
def formatfunc(*args, **kwargs):
    value = args[0]
    if value >= 0:
        return '${:,.2f}'.format(value)
    else:
        return '-${:,.2f}'.format(abs(value))

with pd.option_context('display.float_format', formatfunc):
    print(s)
And you get:
0 -$1.23
1 $4.56
dtype: float64
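The same result can be had inline with a lambda, if you prefer not to define a named function; a minimal sketch:
# Put the minus sign in front of the dollar sign for negative values.
fmt = lambda v: ('-' if v < 0 else '') + '${:,.2f}'.format(abs(v))
with pd.option_context('display.float_format', fmt):
    print(s)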