I'm saving my pd.DataFrame with
df.to_csv('df.csv', encoding='utf-8-sig')
My CSV file has a problem.
Please see the rows containing content2-1, content2-2, and content2-3 in this picture.
Before saving (to_csv), there was no problem. All the data was in the right columns, and 'content2' was not split. But after df -> csv...
'content2' is split across several rows, and the remaining values of 'id2' end up in the wrong columns.
"2018-04-21" should be in column D, the 0s in columns E, F, and G, and the URL in column I.
Why does this happen? Is it because the CSV file is large (774,740 KB), because of the language (Korean), or because CSV cannot handle the Enter key? (Every problematic field, such as content2, was split wherever it contained a line break.)
How can I resolve this? I have no idea.
Unfortunately, I never figured out the reason for this. I assumed it had something to do with the size of the data I was working with and Excel not liking it.
What worked for me, though, was using .to_excel() instead of .to_csv(). I know it's far from a perfect answer, but I thought I'd put it here in case it is enough for your case.
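The question notes that every broken field was split wherever it contained a line break, which suggests (an assumption, since the picture isn't available) that 'content2' holds embedded newlines. By default pandas quotes such fields, so read_csv reads them back correctly, but the program used to open the file may not. A minimal sketch, using a hypothetical sample frame and a hypothetical helper flatten_newlines, that checks the round trip and then replaces line breaks before writing:

```python
import io

import pandas as pd


def flatten_newlines(frame: pd.DataFrame, repl: str = " ") -> pd.DataFrame:
    """Return a copy of `frame` with CR/LF sequences inside string cells replaced by `repl`."""
    # regex=True makes replace() substitute inside strings; non-string cells are left alone
    return frame.replace(r"\r\n|\r|\n", repl, regex=True)


# Hypothetical sample: 'content2' holds one multi-line text value
df = pd.DataFrame({
    "id2": [1, 2],
    "content2": ["content2-1\ncontent2-2\ncontent2-3", "one line"],
    "date": ["2018-04-21", "2018-04-22"],
})

# 1) pandas quotes the multi-line field, so reading it back keeps 2 rows x 3 columns
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
roundtrip = pd.read_csv(buf)
print(roundtrip.shape)  # (2, 3)

# 2) If the program opening the file still breaks rows apart, flatten before writing
flat = flatten_newlines(df)
out = io.StringIO()  # in-memory stand-in for 'df.csv'
flat.to_csv(out, index=False)
```

With a real file you would call flat.to_csv('df.csv', encoding='utf-8-sig') instead of writing to the in-memory buffer. Replacing the line breaks changes the data, so keep the original frame if you need them later.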
I'm currently working on a project that takes a CSV list of the names of students who attended a meeting and converts it into a list (later to be compared to the full student roster, but one thing at a time). I've been looking for answers for hours, but I still feel stuck. I've tried using both pandas and the csv module. I'd like to stick with pandas, but if it's easier with the csv module, that works too. A CSV file example and my code are below.
The file is autogenerated by our video call software, so the formatting is a little weird.
Attendance.csv
(See the sample image; I can't insert images yet.)
Code:
import pandas
data = pandas.read_csv("2A Attendance Report.csv", header=3)
AttendanceList = data['A'].to_list()
print(str(AttendanceList))
However, this raises KeyError: 'A'.
Any help is really appreciated. Thank you!
As seen in the sample image, your column headers are in the first row itself. Hence you need to remove header=3 from your read_csv call: either replace it with header=0 or don't specify an explicit header value at all.
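Even with the header fixed, data['A'] will still raise KeyError if 'A' is the spreadsheet column letter rather than the header text (an assumption, since the image isn't available): pandas names columns after the header row. A minimal sketch with a hypothetical in-memory stand-in for the report, selecting the first column by position instead:

```python
import io

import pandas as pd

# Hypothetical stand-in for "2A Attendance Report.csv": headers are on the first line
sample = io.StringIO(
    "Name,Join Time,Leave Time\n"
    "Alice,9:00,9:45\n"
    "Bob,9:02,9:40\n"
)

data = pd.read_csv(sample)  # header=0 is the default
print(data.columns.tolist())  # ['Name', 'Join Time', 'Leave Time']

# Select by header text (data['Name']) or, if the name is unknown, by position
attendance_list = data.iloc[:, 0].tolist()
print(attendance_list)  # ['Alice', 'Bob']
```

Printing data.columns first is a quick way to see which names pandas actually assigned.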
I am trying to load a large text file into a pandas DataFrame. One thing I noticed is that, to load it successfully, I have to drop all the bad lines. But I would like to load all rows first, take a look, and then clean them manually. Is there a way to do that?
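import pandas as pd
# pandas >= 1.3: on_bad_lines='warn' replaces error_bad_lines=False (removed in pandas 2.0)
data = pd.read_csv('filename.txt', sep="\t", on_bad_lines='warn', engine='python')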
Here are the warnings I got. It's a common error, but all the solutions just skip those lines, and I really need to load all rows. Any thoughts?
Skipping XXX line: Expected 28 fields in line XXX, saw 29
Without knowing more about the specific CSV file, it looks like there is either:
Too many columns in that row (an extra comma)
Quoting is off, meaning there's a comma that should be quoted but isn't
The best way to remedy this is to fix the problem in the CSV file.
Technically you're not just loading the file but also parsing it at the same time. It looks like you've handled the delimiter properly, so, as you may have guessed, some of your rows have too many or too few columns. That may actually be the case, or perhaps you have tabs within text fields that are being interpreted as delimiters.
In any case, pandas isn't going to parse those inconsistent lines.
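That said, pandas 1.4 and later let the Python engine hand you each rejected line instead of silently dropping it: on_bad_lines accepts a callable that receives the line's fields as a list of strings. A minimal sketch with a hypothetical in-memory sample; keep_bad_line is an illustrative name. Note that the callable is only invoked for lines with too many fields, while short lines are padded with NaN.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample: the third line has one field too many
raw = "a\tb\tc\n1\t2\t3\n4\t5\t6\t7\n8\t9\t10\n"

bad_rows = []  # rejected lines, kept for manual review


def keep_bad_line(fields):
    """Called by pandas for each over-long line; `fields` is the list of split strings."""
    bad_rows.append(fields)
    return None  # returning None drops the line from the DataFrame


df = pd.read_csv(io.StringIO(raw), sep="\t", engine="python", on_bad_lines=keep_bad_line)
print(df.shape)  # (2, 3)
print(bad_rows)
```

Returning a corrected list from the callable instead of None would keep the row in the DataFrame.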
A typical approach is to open the file in a robust text editor and look at the lines that pandas reports as errors. See what's actually wrong and either fix it in the text editor, or use Python's built-in open() function to read the entire file and iterate over it line by line, with logic that fixes whatever the problem is.
Once you're certain that every row has the same number of columns, load it with pandas.
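The line-by-line inspection suggested above can be sketched as a small scanner that reports every line whose field count differs from the expected one. This is an illustration only: find_bad_lines is a hypothetical helper, it splits naively on the separator (so it does not understand quoted fields), and the utf-8 encoding in the commented-out usage is an assumption.

```python
import io


def find_bad_lines(lines, expected, sep="\t"):
    """Yield (line_number, field_count, text) for each line whose field count != expected.

    This is a plain split on `sep`: it does NOT understand quoting, so a quoted
    field that contains the separator is counted as several fields.
    """
    for lineno, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        count = len(text.split(sep))
        if count != expected:
            yield lineno, count, text


# Demo on a hypothetical in-memory sample (3 columns expected)
sample = io.StringIO("a\tb\tc\n1\t2\t3\n4\t5\t6\t7\n")
for lineno, count, text in find_bad_lines(sample, expected=3):
    print(lineno, count, repr(text))  # 3 4 '4\t5\t6\t7'

# Against the real file, using the 28 fields from the warning:
# with open("filename.txt", encoding="utf-8") as fh:
#     for lineno, count, text in find_bad_lines(fh, expected=28):
#         print(lineno, count, text[:80])
```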
I have the following dataset:
https://github.com/felipe0216/fdfds/blob/master/sheffield_weather_station.csv
I know it is a CSV, so I can use the pd.read_csv() function. However, if you look into the file, it is neither comma-separated nor tab-separated, and I am not sure exactly what separator it uses. Any ideas about how to open it?
The proper way to do this is as follows:
# sep=r"\s+" is equivalent to delim_whitespace=True, which is deprecated since pandas 2.2
df = pd.read_csv("sheffield_weather_station.csv", comment="#", sep=r"\s+")
You should first download the .csv file. You have to tell pandas that the file contains comments and that the columns are separated only by spaces. The number of spaces does not matter.
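To see what comment="#" and a whitespace separator do, here is a sketch on a hypothetical in-memory excerpt; the column names and values are made up for illustration and are not taken from the Sheffield file.

```python
import io

import pandas as pd

# Hypothetical excerpt: a '#' comment line, then columns padded with varying spaces
sample = io.StringIO(
    "# hypothetical header comment\n"
    "yyyy  mm   tmax   tmin\n"
    "2000   1    5.0    1.0\n"
    "2000   2    6.5    2.0\n"
)

# comment="#" drops the comment line; sep=r"\s+" treats any run of whitespace as one separator
df = pd.read_csv(sample, comment="#", sep=r"\s+")
print(df.columns.tolist())  # ['yyyy', 'mm', 'tmax', 'tmin']
```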
I've sought an answer on multiple forums and YouTube, but to no avail; sorry in advance if it is widely available and my keywords just weren't right.
I'm attempting to execute a simple pandas.read_csv('.csv', sep=','). However, the output I'm receiving does not split the data into multiple columns as I expect it should.
I'm getting back all of my headers in one field, separated by commas. The same is true for each row of data under those headers.
I've tried setting this data up in a DataFrame, manipulating the headers, and manually adding the headers, with no success.
For better understanding, I've copied and pasted what I'm seeing from the IPython notebook:
In [15]:
import pandas as pd
pd.read_csv(r'C:\Users\Dale\Desktop\ShpData\TrackerTW0.csv', sep=',')
Out[15]:
PurchaseOrderNumber,ShipmentFinalDestinationCity,TransferPointCity,POType,PlannedMode,ProgramType,FreightPaymentTerms,ContainerNumber,BL/AWB#,Mode,ShipmentFinalDestinationLocation,CarrierSCAC,Carrier,Forwarder,BrandDesc,POLCity,PODCity,InDCOutlookDate,InDCOriginalDate,AnticipatedShipDate,PlannedStockedDate,ExFactoryActualDate(LT),OriginConsolActualDate(LT),DepartLoadPortActualDate(LT),FullOutGatefromOceanTerminal(CYorPort)ActualDate(LT),DPArrivalActualDate(LT),FreightAvailableActualDate(LT),DestConsolActualDate(LT),DomDepartActualDate(LT),YardArrivalActualDate(LT),CarrierDropActualDate(LT),InDCActualDate(LT),StockedActualDate(LT),Vessel,VesselETADischargePortCity,DPArrivalOutlookDate,VesselETADischargePortActualDate(LT),FullOutGatefromOceanTerminal(CYorPort)OutlookDate,StockedOutlookDate,ShipmentLeg#,Metrics,TotalShippedQty
0 1251708,Rugby,Tuticorin,Initial Order,Ocean,Re...
1 1262597,Rugby,Hong Kong,Initial Order,Ocean,Re...
Thanks
You might want to try this; you have 42 columns.
import pandas as pd
df = pd.read_csv('input.csv', names=['PurchaseOrderNumber','ShipmentFinalDestinationCity','TransferPointCity','POType','PlannedMode','ProgramType','FreightPaymentTerms','ContainerNumber','BL/AWB#','Mode','ShipmentFinalDestinationLocation','CarrierSCAC','Carrier','Forwarder','BrandDesc','POLCity','PODCity','InDCOutlookDate','InDCOriginalDate','AnticipatedShipDate','PlannedStockedDate','ExFactoryActualDate(LT)','OriginConsolActualDate(LT)','DepartLoadPortActualDate(LT)','FullOutGatefromOceanTerminal(CYorPort)ActualDate(LT)','DPArrivalActualDate(LT)','FreightAvailableActualDate(LT)','DestConsolActualDate(LT)','DomDepartActualDate(LT)','YardArrivalActualDate(LT)','CarrierDropActualDate(LT)','InDCActualDate(LT)','StockedActualDate(LT)','Vessel','VesselETADischargePortCity','DPArrivalOutlookDate','VesselETADischargePortActualDate(LT)','FullOutGatefromOceanTerminal(CYorPort)OutlookDate','StockedOutlookDate','ShipmentLeg#','Metrics','TotalShippedQty'])
print(df)
Recently, I wanted to process a CSV file with code like this:
data = pd.read_csv(dir, sep=" ")
print(data)
The output also put all the values in one row.
Then I just used the default sep value (','), and the problem was solved.
data = pd.read_csv(dir, sep=",")
The situation seems different from the one the asker raised, but I hope it's helpful for other friends like me.
This is my first comment; I hope it's not too bad!
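When you aren't sure which separator a file uses, read_csv can guess it: with sep=None and engine="python", pandas asks Python's csv.Sniffer to detect the delimiter. A minimal sketch on a hypothetical semicolon-separated sample, first showing the single-column symptom from a wrong sep:

```python
import io

import pandas as pd

# Hypothetical semicolon-separated sample
raw = "id;name;qty\n1;apple;3\n2;pear;5\n"

# A wrong sep leaves every row in a single column
wrong = pd.read_csv(io.StringIO(raw), sep=",")
print(wrong.shape)  # (2, 1)

# sep=None lets the Python engine sniff the delimiter
df = pd.read_csv(io.StringIO(raw), sep=None, engine="python")
print(df.columns.tolist())  # ['id', 'name', 'qty']
```

Sniffing is a heuristic, so it is worth checking df.columns afterwards; once you know the delimiter, passing it explicitly is more predictable.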
It may not be the best option, but it works!
Read the file as it is:
df = pd.read_csv('input.csv')
Get all the column names and assign them to a variable.
names = df.columns[0].split(',')
Split all the values on ',':
df = df.iloc[:, 0].str.split(',', expand=True)
Finally, assign 'names' as the column names, and that's it!
df.columns = names
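One situation that produces this single-column symptom, assumed here purely for illustration, is a file in which every whole line is wrapped in double quotes, so read_csv sees each line as one quoted field. A sketch of the split approach above on such a hypothetical sample (values reused from the question's output):

```python
import io

import pandas as pd

# Hypothetical file in which every line is wrapped in quotes, so read_csv sees one column
raw = (
    '"PurchaseOrderNumber,ShipmentFinalDestinationCity,TransferPointCity"\n'
    '"1251708,Rugby,Tuticorin"\n'
    '"1262597,Rugby,Hong Kong"\n'
)
df = pd.read_csv(io.StringIO(raw))
print(df.shape)  # (2, 1)

# The single header string holds all the names; split it into a list
names = df.columns[0].split(",")

# Split each row's single string into separate columns, then apply the names
df = df.iloc[:, 0].str.split(",", expand=True)
df.columns = names
print(df.shape)  # (2, 3)
```

All values come back as strings after the split (e.g. "1251708"), so convert numeric columns with pd.to_numeric if needed. This simple split also breaks on values that themselves contain commas.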
I was also having the same issue: all of the columns were coming in as one value. The following worked for me.
# on_bad_lines='warn' replaces error_bad_lines=False (deprecated in pandas 1.3, removed in 2.0)
df = pd.read_csv('/content/Reviews.csv',
                 sep=',',
                 on_bad_lines='warn',
                 engine='python')