Integers are coming out with decimals using pandas - python

I'm using pandas to apply some format-level changes to a csv and storing the result in a target file. The source file has some integers, but after the pandas operation the integers are converted to decimals; for example, 3 in the source file becomes 3.0. I would like the integers to remain integers.
Any pointers on how to get this working? Will be really helpful, thank you!
import pandas as pd
# read the source csv file
df = pd.read_csv(source)
# update the column values (cells equal to ',' are replaced with '_,')
df['Regular'] = df['Regular'].replace({',': '_,'})
# write the result to the target file
df.to_csv(target, index=False)

You can specify the data type for pandas read_csv(), e.g.:
df = pd.read_csv(source, dtype={'column_name_a': 'Int32', 'column_name_b': 'Int32'})
The capitalised 'Int32' is pandas' nullable integer dtype. The usual cause of the int-to-float conversion is missing values in the column: a plain numpy int column cannot hold NaN, so pandas upcasts it to float, while the nullable dtype can.
See the docs: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
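If the column has already been read in as float, the same nullable dtype can be applied after the fact with astype. A minimal sketch, assuming the 'Regular' column from the question came in as float because of missing values:
import pandas as pd
df = pd.DataFrame({'Regular': [3.0, None, 7.0]})
# 'Int64' (capital I) is pandas' nullable integer dtype; it stores
# missing values as <NA> instead of forcing the column to float
df['Regular'] = df['Regular'].astype('Int64')
A plain numpy 'int64' would fail here, since it has no representation for missing values.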

Related

How to change the data format to one column and export as csv in Python and Pandas

The picture is just an example; I hope to change this data (str type) to one column (float type) and save it as csv in Python. Pandas or Numpy may help, but I don't know how to do that. Could someone help me please? Thank you very much.
[image: data example]
To convert to float, use the astype method on your dataframe; to save as csv, use the to_csv method on your dataframe.
Here's an example for your case:
import pandas as pd
df = pd.DataFrame({"String_type":["2.1","4","5.2"]})
# convert string to float
df["float_type"] = df["String_type"].astype(float)
# save as csv, change out.csv to your filename
df.to_csv('out.csv')
print(df.dtypes)
Hope this is helpful.
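One detail worth adding: to_csv writes the row index as an extra first column by default. If you don't want it in the output file, pass index=False, as the answer to the first question above does:
df.to_csv('out.csv', index=False)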

Saving columns from csv

I am trying to write code that reads a csv file and saves each column as a specific variable. I am having difficulty because the header is 7 lines long (something I can control, but I would like to just ignore it if I can handle it in code), and my data is full of important decimal places, so it cannot be changed to int (or maybe string?). I've also tried saving each column by its position in the file but am struggling to get that to run. Any ideas?
The image shows my current code, slimmed down to the important parts, with the data that prints in my console circled.
save each column as a specific variable
import pandas as pd
df = pd.read_csv('file.csv')
x_col = df['X']
See the indexing user guide: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html
If what you are looking for is how to iterate through the columns, no matter how many there are (which is what I think you are asking), then this code should do the trick:
import pandas as pd
# skip the first 6 lines; line 7 is then used as the header row
data = pd.read_csv('optitest.csv', skiprows=6)
for column in data.columns:
    # You will need to define what this save() method is;
    # it is just placed here as an example.
    save(data[column])
The line about formatting your data as a number or a string was a little vague, but if it's decimal data, then you need to use float. See #9637665.
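As a concrete stand-in for the undefined save() above, here is a minimal sketch that collects every column into a dict keyed by column name; the filename and skiprows value are reused from the answer, and 'X' is an assumed column name:
import pandas as pd
data = pd.read_csv('optitest.csv', skiprows=6)
# one Series per column; float precision is preserved
columns = {name: data[name] for name in data.columns}
x_col = columns['X']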

pandas vs sasdataset: values are not exactly correct

Before being read into pandas, the data is used in a SAS dataset. My data looks like:
SNYDJCM (integer column)
740.19999981
After reading into pandas my data changes as below:
SNYDJCM (converted to float)
740.200000
How do I get the same value after reading into a pandas dataframe?
Steps followed:
1)
import pandas as pd
2)
pd.read_sas(path, format='sas7bdat', encoding='iso-8859-1')
Need your help.
Try importing SAS7BDAT and opening your file before reading:
from sas7bdat import SAS7BDAT
import pandas as pd
SAS7BDAT('FILENAME.sas7bdat')
df = pd.read_sas('FILENAME.sas7bdat', format='sas7bdat')
or use it to directly read the file:
from sas7bdat import SAS7BDAT
sas_file = SAS7BDAT('FILENAME.sas7bdat')
df = sas_file.to_data_frame()
or use pyreadstat to read the file:
import pyreadstat
df, meta = pyreadstat.read_sas7bdat('FILENAME.sas7bdat')
First, 740.19999981 is not an integer; 740 would be the nearest integer. Also, when you round 740.19999981 to 6 decimal places you get 740.200000, which is just pandas' default display precision. I would suggest printing with higher precision and seeing whether the value has really changed:
print("%.12f" % (x,))

Fixed Width File manipulation in Pandas

I have a fixed-width file with the following format:
5678223313570888271712000000024XAXX0101010006461801325345088800.0784001501.25abc@yahoo.com
5678223324686600271712000000070XAXX0101010006461801325390998280.0784001501.25abcde.12345@gmail.com 5678123422992299
Here's what I tried:
import pandas as pd
ColSpecs = [(0,16),(16,31),(31,44),(44,62),(62,70),(70,73),(73,77),(77,127),(127,143)]
# the keyword is header (lowercase); the file has no header row
df = pd.read_fwf("~/filename.txt", colspecs=ColSpecs, header=None)
Now this surely helps me to convert cleanly into a pandas dataframe. However, the blanks (the fixed-width padding spaces) get trimmed off. For example, the Email field (#8) is fixed at 50 characters, and its padding gets truncated as soon as the data is imported into the dataframe.
For the data manipulation, I am creating 3 new fields that are extracted from the values of the previously imported fields.
Final Output file structure:
[(0,16),(16,31),(31,44),(44,62),(62,70),(70,73),(73,77),(77,127),(127,143),(143,153),(153,163),(164,165)]
Since I haven't found any to_fwf method on dataframes, or any other pandas route from dataframe to flat file that keeps the original lengths intact, I would really appreciate it if anyone has a better solution.
P.S.: I read that awk/sed in Unix works better for this, but I would still like to know how to do it in Python.
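Even though pandas has no to_fwf, one can be sketched by padding every field back to its fixed width before writing. A minimal example, assuming the nine-column layout from the question's ColSpecs; the output path, the to_fwf name, and the left-justified padding are all assumptions:
import pandas as pd

# field widths derived from the question's ColSpecs
widths = [16, 15, 13, 18, 8, 3, 4, 50, 16]

def to_fwf(df, path, widths):
    with open(path, 'w') as f:
        for _, row in df.iterrows():
            # pad (or truncate) each value to its fixed width
            line = ''.join(str(v).ljust(w)[:w] for v, w in zip(row, widths))
            f.write(line + '\n')

to_fwf(df, 'output.txt', widths)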

python dataframe write to R data format

I have a question about writing a dataframe to an R data format.
I have 1000 column x 77 row data, and I want to write this dataframe out for R.
When I use
r_dataframe = com.convert_to_r_dataframe(df)
it gives me an error like "DataFrame object has no attribute type".
When I look at the code of com.convert_to_r_dataframe(), it just takes each column of the dataframe and reads column.dtype.type.
At that moment the column is itself a dataframe; does a dataframe with this many columns hold dataframes inside?
Does anyone have an idea how to solve this problem?
The data.frame transfer from Python to R can be accomplished with the feather format; see the feather documentation for more information.
Quick example.
Export in Python:
import feather
path = 'my_data.feather'
feather.write_dataframe(df, path)
Import in R:
library(feather)
path <- "my_data.feather"
df <- read_feather(path)
In this case you'll have the data in R as a data.frame. You can then decide to write it to an RData file.
save(df, file = 'my_data.RData')
The simplest practical solution is to export to csv:
import pandas as pd
dataframe.to_csv('mypath/file.csv')
and then read it in R using read.csv.
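For completeness, mirroring the feather example above, the R side would be:
df <- read.csv('mypath/file.csv')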
