I have a csv file containing numerical values such as 1524.449677. There are always exactly 6 decimal places.
When I import the csv file (along with other columns) via pandas read_csv, the column automatically gets the dtype object. My issue is that the values are shown as 2470.6911370000003 when they actually should be 2470.691137. Similarly, the value 2484.30691 is shown as 2484.3069100000002.
This seems to be a datatype issue in some way. I tried to explicitly provide the data type when importing via read_csv by giving the dtype argument as {'columnname': np.float64}. Still the issue did not go away.
How can I get the values imported and shown exactly as they are in the source csv file?
Pandas uses a dedicated decimal-to-binary converter that sacrifices accuracy for speed.
Passing float_precision='round_trip' to read_csv fixes this.
Check out this page for more detail on this.
After processing your data, if you want to save it back to a csv file, you can pass float_format="%.nf" to the corresponding method, where n is the number of decimal places you want.
A full example:
import pandas as pd
df_in = pd.read_csv(source_file, float_precision='round_trip')
df_out = ... # some processing of df_in
df_out.to_csv(target_file, float_format="%.3f") # for 3 decimal places
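To see the effect directly, you can compare the default parser with the round-trip parser on an in-memory sample; a minimal sketch using the values from the question (the exact default output may vary by pandas version):
import io
import pandas as pd

csv_data = "value\n2470.691137\n2484.306910\n"

default = pd.read_csv(io.StringIO(csv_data))
exact = pd.read_csv(io.StringIO(csv_data), float_precision='round_trip')

print(default['value'][0])  # may print 2470.6911370000003 with the fast parser
print(exact['value'][0])    # prints 2470.691137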
I realise this is an old question, but maybe this will help someone else:
I had a similar problem, but couldn't quite use the same solution. Unfortunately the float_precision option only exists when using the C engine and not with the python engine. So if you have to use the python engine for some other reason (for example because the C engine can't deal with regex literals as delimiters), this little "trick" worked for me:
In the pd.read_csv arguments, define dtype=str and then convert your dataframe to whatever dtype you want, e.g. df = df.astype('float64').
Bit of a hack, but it seems to work. If anyone has any suggestions on how to solve this in a better way, let me know.
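A minimal sketch of that workaround, assuming a hypothetical whitespace-delimited data.txt containing only numeric columns:
import pandas as pd

# Read everything as strings so the parser performs no float conversion at all
df = pd.read_csv('data.txt', sep=r'\s+', engine='python', dtype=str)

# Python's own str -> float conversion round-trips correctly
df = df.astype('float64')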
I have the following dataset:
https://github.com/felipe0216/fdfds/blob/master/sheffield_weather_station.csv
I know it is a csv, so I can use the pd.read_csv() function. However, if you look into the file, it is neither comma separated nor tab separated, and I am not sure what separator it uses exactly. Any ideas about how to open it?
The proper way to do this is as follows:
df = pd.read_csv("sheffield_weather_station.csv", comment="#", delim_whitespace=True)
You should first download the .csv file. You have to tell pandas that there are comment lines (starting with #) and that the columns are separated by whitespace. The number of spaces does not matter.
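Note that delim_whitespace was deprecated in pandas 2.2 in favour of sep, so on recent versions the equivalent call is:
import pandas as pd

df = pd.read_csv("sheffield_weather_station.csv", comment="#", sep=r"\s+")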
I want to read a txt file with pandas, and the problem is that the separator/delimiter consists of a number followed by a minimum of two blanks.
I already tried something similar to this code (How to make separator in pandas read_csv more flexible wrt whitespace?):
pd.read_csv("whitespace.txt", header=None, delimiter=r"\s+")
This only works if the separator is one blank or more. So I adjusted it to the following:
delimiter=r"\d\s\s+"
But this separates my dataframe whenever it sees two blanks or more, whereas I strictly need the number before it followed by at least two blanks. Does anyone have an idea how to fix it?
My data looks as follows:
I am an example of a dataframe
I have Problems to get read
100,00
So How can I read it
20,00
so the first row should be:
I am an example of a dataframe I have Problems to get read 100,00
followed by the second row:
So How can I read it 20,00
I'd try it like this.
I'd manipulate the text file before attempting to parse it into a dataframe, as follows:
import pandas as pd
import re

# Join all lines into a single string
with open("whitespace.txt", "r") as f:
    g = f.read().replace("\n", " ")

# Append a '#' marker after every number of the form 123,45
prepared_text = re.sub(r'(\d+,\d+)', r'\1#', g)

# Split on the marker to recover one record per row
df = pd.DataFrame({'My columns': prepared_text.split('#')})
print(df)
This gives the following:
My columns
0 I am an example of a dataframe I have Problems...
1 So How can I read it 20,00
2
I guess this will suffice as long as the input file isn't too large, but using the re module and substitution gives you the control you seek.
The (\d+,\d+) parentheses mark a group that we want to match; we're basically matching any of the numbers in your text file.
Then we use \1, which is called a backreference: it refers to the matched group when specifying a replacement. So each match of \d+,\d+ is replaced by itself followed by a #.
Then we use the inserted character as a delimiter.
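As a standalone illustration of the backreference (the input string here is made up):
import re

print(re.sub(r'(\d+,\d+)', r'\1#', 'first row 100,00 second row 20,00'))
# first row 100,00# second row 20,00#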
There are some good examples here:
https://lzone.de/examples/Python%20re.sub
I've sought out an answer on multiple forums and YouTube but to no avail; sorry in advance if it is widely available and my keywords just weren't right.
I'm attempting to execute a simple pandas.read_csv('.csv', sep=','). However, the output I'm receiving does not split the data out into multiple columns as I imagine it should.
I'm getting back all of my headers in one row, separated by commas. The same is true for each line item tied to the respective headers.
I've tried setting this data up in a dataframe, manipulating the headers, manually adding the headers with no success.
For better understanding I've copied and pasted from Ipython notebook of what I'm seeing:
In [15]:
import pandas as pd
pd.read_csv(r'C:\Users\Dale\Desktop\ShpData\TrackerTW0.csv', sep=',')
Out[15]:
PurchaseOrderNumber,ShipmentFinalDestinationCity,TransferPointCity,POType,PlannedMode,ProgramType,FreightPaymentTerms,ContainerNumber,BL/AWB#,Mode,ShipmentFinalDestinationLocation,CarrierSCAC,Carrier,Forwarder,BrandDesc,POLCity,PODCity,InDCOutlookDate,InDCOriginalDate,AnticipatedShipDate,PlannedStockedDate,ExFactoryActualDate(LT),OriginConsolActualDate(LT),DepartLoadPortActualDate(LT),FullOutGatefromOceanTerminal(CYorPort)ActualDate(LT),DPArrivalActualDate(LT),FreightAvailableActualDate(LT),DestConsolActualDate(LT),DomDepartActualDate(LT),YardArrivalActualDate(LT),CarrierDropActualDate(LT),InDCActualDate(LT),StockedActualDate(LT),Vessel,VesselETADischargePortCity,DPArrivalOutlookDate,VesselETADischargePortActualDate(LT),FullOutGatefromOceanTerminal(CYorPort)OutlookDate,StockedOutlookDate,ShipmentLeg#,Metrics,TotalShippedQty
0 1251708,Rugby,Tuticorin,Initial Order,Ocean,Re...
1 1262597,Rugby,Hong Kong,Initial Order,Ocean,Re...
Thanks
You might want to try passing the column names explicitly, since you have around 40 columns:
import pandas as pd
df = pd.read_csv('input.csv', names=['PurchaseOrderNumber','ShipmentFinalDestinationCity','TransferPointCity','POType','PlannedMode','ProgramType','FreightPaymentTerms','ContainerNumber','BL/AWB#','Mode','ShipmentFinalDestinationLocation','CarrierSCAC','Carrier','Forwarder','BrandDesc','POLCity','PODCity','InDCOutlookDate','InDCOriginalDate','AnticipatedShipDate','PlannedStockedDate','ExFactoryActualDate(LT)','OriginConsolActualDate(LT)','DepartLoadPortActualDate(LT)','FullOutGatefromOceanTerminal(CYorPort)ActualDate(LT)','DPArrivalActualDate(LT)','FreightAvailableActualDate(LT)','DestConsolActualDate(LT)','DomDepartActualDate(LT)','YardArrivalActualDate(LT)','CarrierDropActualDate(LT)','InDCActualDate(LT)','StockedActualDate(LT)','Vessel','VesselETADischargePortCity','DPArrivalOutlookDate','VesselETADischargePortActualDate(LT)','FullOutGatefromOceanTerminal(CYorPort)OutlookDate','StockedOutlookDate','ShipmentLeg#','Metrics','TotalShippedQty'])
print(df)
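One caveat: if input.csv already contains its own header row, passing names alone keeps that row as data; passing header=0 as well tells pandas to replace it. A sketch, with column_names standing for the list above:
import pandas as pd

# column_names = the full list of names from the answer above
df = pd.read_csv('input.csv', header=0, names=column_names)
print(df)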
Recently, I wanted to process a csv file, and my code was like this:
data = pd.read_csv(path, sep=" ")
print(data)
The output also put all the values in one row. Then I just used the default sep value, and the problem was solved:
data = pd.read_csv(path, sep=",")
My situation seems different from the one the asker raised, but I hope it's helpful for some other friends like me. This is my first answer, and I hope it's not too bad!
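If you are not sure which separator a file actually uses, the standard library's csv.Sniffer can detect it from a sample; a minimal sketch (the file name is a placeholder):
import csv
import pandas as pd

with open('input.csv', newline='') as f:
    dialect = csv.Sniffer().sniff(f.read(2048))  # inspect the first 2 KB

df = pd.read_csv('input.csv', sep=dialect.delimiter)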
It may not be the best option but it works!
Read the file as it is:
df = pd.read_csv('input.csv')
Get all the column names by splitting the single combined header, and assign them to a variable:
names = df.columns[0].split(',')
Split all the values by ',':
df = df.iloc[:, 0].str.split(',', expand=True)
Finally, assign 'names' to column names and that's it!
df.columns = names
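Putting those steps together (assuming, as above, a file input.csv whose rows each land in a single comma-joined column):
import pandas as pd

df = pd.read_csv('input.csv')                    # everything ends up in one column
names = df.columns[0].split(',')                 # recover the real header names
df = df.iloc[:, 0].str.split(',', expand=True)   # split every row on ','
df.columns = names                               # restore the header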
I was also having the same issue: all of the columns were coming in as one value. The following worked for me.
df = pd.read_csv('/content/Reviews.csv',
                 sep=',',
                 error_bad_lines=False,
                 engine='python')
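Note that error_bad_lines was deprecated in pandas 1.3 and removed in 2.0; on current versions the equivalent is on_bad_lines='skip':
df = pd.read_csv('/content/Reviews.csv',
                 sep=',',
                 on_bad_lines='skip',
                 engine='python')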