problems dealing with pandas read csv - python
I've got a problem with pandas read_csv. I have many txt files with stock market data. They look like this:
SecCode,SecName,Tdate,Ttime,LastClose,OP,CP,Tq,Tm,Tt,Cq,Cm,Ct,HiP,LoP,SYL1,SYL2,Rf1,Rf2,bs,s5,s4,s3,s2,s1,b1,b2,b3,b4,b5,sv5,sv4,sv3,sv2,sv1,bv1,bv2,bv3,bv4,bv5,bsratio,spd,rpd,depth1,depth2
600000,浦发银行,20120104,091501,8.490,.000,.000,0,.000,0,0,.000,0,.000,.000,.000,.000,.000,.000, ,.000,.000,.000,.000,8.600,8.600,.000,.000,.000,.000,0,0,0,0,1100,1100,38900,0,0,0,.00,.000,.00,.00,.00
600000,浦发银行,20120104,091506,8.490,.000,.000,0,.000,0,0,.000,0,.000,.000,.000,.000,.000,.000, ,.000,.000,.000,.000,8.520,8.520,.000,.000,.000,.000,0,0,0,0,56795,56795,33605,0,0,0,.00,.000,.00,.00,.00
600000,浦发银行,20120104,091511,8.490,.000,.000,0,.000,0,0,.000,0,.000,.000,.000,.000,.000,.000, ,.000,.000,.000,.000,8.520,8.520,.000,.000,.000,.000,0,0,0,0,56795,56795,34605,0,0,0,.00,.000,.00,.00,.00
600000,浦发银行,20120104,091551,8.490,.000,.000,0,.000,0,0,.000,0,.000,.000,.000,.000,.000,.000, ,.000,.000,.000,.000,8.520,8.520,.000,.000,.000,.000,0,0,0,0,56795,56795,35205,0,0,0,.00,.000,.00,.00,.00
600000,浦发银行,20120104,091621,8.490,.000,.000,0,.000,0,0,.000,0,.000,.000,.000,.000,.000,.000, ,.000,.000,.000,.000,8.520,8.520,.000,.000,.000,.000,0,0,0,0,57795,57795,34205,0,0,0,.00,.000,.00,.00,.00
When I use this code to read it:
import pandas as pd
fields = ['SecCode', 'Tdate','Ttime','LastClose','OP','CP','Rf1','Rf2']
df = pd.read_csv('SHL1_TAQ_600000_201201.txt',usecols=fields)
But I got this error:
Traceback (most recent call last):
File "E:/workspace/Senti/highlevel/highlevel.py", line 8, in <module>
df = pd.read_csv('SHL1_TAQ_600000_201201.txt',usecols=fields,header=1)
File "D:\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 562, in parser_f
return _read(filepath_or_buffer, kwds)
File "D:\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 315, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "D:\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 645, in __init__
self._make_engine(self.engine)
File "D:\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 799, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "D:\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 1257, in __init__
raise ValueError("Usecols do not match names.")
ValueError: Usecols do not match names.
I can't find any problem similar to mine. It's also weird that when I copy the contents of the txt file into another file, the code runs fine, but the original file causes the error above. How can I solve it?
In your message, you said that you're running:
df = pd.read_csv('SHL1_TAQ_600000_201201.txt',usecols=fields)
which did not throw an error for me or for Anil_M. But your traceback shows that the command you actually ran is a different one:
df = pd.read_csv('SHL1_TAQ_600000_201201.txt',usecols=fields, header=1)
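which includes header=1, and that is what throws the error. With header=1, pandas takes the column names from the second line of the file (the first data row, 600000,浦发银行,...) and discards the line above it, so names such as 'SecCode' in fields no longer exist.
So I would guess the error comes from a mix-up in your code: the version you ran is not the one you posted.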
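The point about header=1 can be reproduced without the original file. A small sketch using an in-memory sample with a few of the question's rows and columns; it assumes a reasonably recent pandas, where the message reads "Usecols do not match columns, ..." instead of the older "Usecols do not match names.", but both are a ValueError.

```python
import io

import pandas as pd

# In-memory stand-in for the question's file: header line plus two data rows
# (only the first five columns are kept).
sample = (
    "SecCode,SecName,Tdate,Ttime,LastClose\n"
    "600000,浦发银行,20120104,091501,8.490\n"
    "600000,浦发银行,20120104,091506,8.490\n"
)
fields = ["SecCode", "Tdate", "Ttime", "LastClose"]

# Default header=0: names come from the first line, so usecols matches.
ok = pd.read_csv(io.StringIO(sample), usecols=fields)
print(ok.columns.tolist())

# header=1: the first data row becomes the header, so "SecCode" etc. are
# no longer column names and pandas raises ValueError.
try:
    pd.read_csv(io.StringIO(sample), usecols=fields, header=1)
    header1_failed = False
except ValueError as exc:
    header1_failed = True
    print("ValueError:", exc)
```

Dropping header=1 (or using header=0) is enough for the file shown in the question.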
Use names instead of usecols when specifying the parameter.
Related
Receiving error when trying to read csv file with pandas
I am trying to create a data frame from a csv file I downloaded. I tried to read the file with pandas but cannot manage to do so. I believe the code is correct, but it returns an OSError. I tried turning the file path into a raw string, I tried using double backslashes, and I tried changing the name of the file in case the "-" in the "ml-100k" name was the issue. Nothing worked.
import pandas as pd
user = pd.read_csv(r"C:\Users\Luca\Documents\ml-100k/u.user", sep="|", names = ["User ID","Age","Gender","Occupation","Zip Code"])
This returned an error of:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 12-13: truncated \UXXXXXXXX escape
import pandas as pd
user = pd.read_csv("C:/Users/Luca/Documents/ml-100k/u.user", sep="|", names = ["User ID","Age","Gender","Occupation","Zip Code"])
This returned an error of:
Traceback (most recent call last):
File "<pyshell#2>", line 1, in <module>
user = pd.read_csv("C:/Users/Luca/Documents/ml-100k/u.user", sep="|", names = ["User ID","Age","Gender","Occupation","Zip Code"])
File "C:\Users\Luca\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\util\_decorators.py", line 211, in wrapper
return func(*args, **kwargs)
File "C:\Users\Luca\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\util\_decorators.py", line 331, in wrapper
return func(*args, **kwargs)
File "C:\Users\Luca\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 950, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\Luca\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 605, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\Users\Luca\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 1442, in __init__
self._engine = self._make_engine(f, self.engine)
File "C:\Users\Luca\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 1735, in _make_engine
self.handles = get_handle(
File "C:\Users\Luca\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\common.py", line 856, in get_handle
handle = open(
OSError: [Errno 22] Invalid argument: '\u202aC:/Users/Luca/Documents/ml-100k/u.user'
Please show me a way to format the file path so that I can read the csv file. Thank you.
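The '\u202a' at the start of the path in the last error is U+202A LEFT-TO-RIGHT EMBEDDING, an invisible formatting character that commonly comes along when a Windows path is copied from a dialog and pasted into code. Retyping the path by hand avoids it. Alternatively, here is a minimal sketch that strips such characters before the path reaches read_csv; clean_path is a hypothetical helper name, and it removes every character in Unicode category "Cf" (format characters), which is an assumption about what you want to drop.

```python
import unicodedata


def clean_path(raw_path: str) -> str:
    """Remove invisible Unicode format characters (category "Cf") from a path.

    U+202A LEFT-TO-RIGHT EMBEDDING is one such character: invisible when
    pasted, but it makes open() reject the path.
    """
    return "".join(ch for ch in raw_path if unicodedata.category(ch) != "Cf")


pasted = "\u202aC:/Users/Luca/Documents/ml-100k/u.user"
path = clean_path(pasted)
print(repr(pasted))  # repr() makes the hidden '\u202a' prefix visible
print(repr(path))
# The cleaned path can then be used as before, e.g.:
# pd.read_csv(path, sep="|", names=["User ID", "Age", "Gender", "Occupation", "Zip Code"])
```

Printing repr(path) is also a quick way to spot invisible characters in any path that "looks right" but fails to open.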
Why does pandas.read_csv read all of the blank values in one script but not the other?
I am writing code that requires the pandas.read_csv function, and I like to use a test Python file before I implement my code in the main Python file. The part of the CSV file I am trying to read just has data displayed in a seemingly random fashion, with no real columns or headers. I just want to read the information into a dataframe so I can parse the data exactly where I know it is going to be in every file. For the test code and the main code I am using the same list of CSV files. The only difference is that the test code runs in a different folder and does not sit inside a function. In my test code I have no issue extracting data from the CSV file using read_csv, but in my main program it gives me errors.
In my test code I can easily use pd.read_csv this way:
for x in range(len(filelist)):
    df = pd.read_csv(filelist[x], index_col=False, nrows=15, header=None, usecols=(0,1,2,3), dtype={0:"string",1:"string",2:"string",3:"string"})
    print(df)
The output is shown below:
[Image: Output from test code execution]
However, when I try to port this over into my main code, it doesn't work the same way. If I copy the code exactly, it says there is no column 1, 2, or 3. My next step was to remove the usecols and dtype arguments, and then it gave me the error:
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 2, saw 2
I tried adding the comma delimiter and I tried changing the engine to python. Neither worked. Eventually, I determined from the tokenizing error that the program was expecting fewer fields on certain lines, so I split the code into two dataframes, each holding its respective number of columns. This finally worked.
The dataframes I created were structured as shown below:
df1 = pd.read_csv(filelist[x], skiprows=range(1), index_col=False, nrows=11, header=None)
df2 = pd.read_csv(filelist[x], skiprows=range(0,13), index_col=False, nrows=2, usecols=(0,1,2), header=None)
print(df1)
print(df2)
The output for this is shown below:
[Image: Output from main code execution]
This gives me something I can work with to accomplish my task, but working through it was extremely frustrating and I have no idea why I needed to go through all of this. I still have to go back and make some final adjustments, including all the calls to the variables I need from these, so if I can figure out why it doesn't work the same way in the main code it would make my life a little easier.
Does anyone have any clue why I had to make these adjustments? It seems the code in my main program is not reading empty cells, or it just takes the number of fields in the first row it looks at and assumes the rest should be the same. Any information would be greatly appreciated. Thank you.
I am adding the full error messages below. I made it so it only calls the first file in the list for debugging purposes.
This first one is when I copy the read_csv command over exactly:
Traceback (most recent call last):
File "c:\Users\jacob.hollidge\Desktop\DCPR Threshold\DCPRthresholdV2.0.py", line 484, in <module>
checkfilevariables(filelist)
File "c:\Users\jacob.hollidge\Desktop\DCPR Threshold\DCPRthresholdV2.0.py", line 221, in checkfilevariables
df = pd.read_csv(filelist[0], index_col=False, nrows=15, header=None, usecols=(0,1,2,3),
File "C:\Users\jacob.hollidge\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "C:\Users\jacob.hollidge\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 680, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\jacob.hollidge\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 575, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\Users\jacob.hollidge\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 933, in __init__
self._engine = self._make_engine(f, self.engine)
File "C:\Users\jacob.hollidge\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 1231, in _make_engine
return mapping[engine](f, **self.options)
File "C:\Users\jacob.hollidge\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 146, in __init__
self._validate_usecols_names(
File "C:\Users\jacob.hollidge\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\base_parser.py", line 913, in _validate_usecols_names
raise ValueError(
ValueError: Usecols do not match columns, columns expected but not found: [1, 2, 3]
This next error occurs after I remove usecols and dtype from the parameters.
Traceback (most recent call last):
File "c:\Users\jacob.hollidge\Desktop\DCPR Threshold\DCPRthresholdV2.0.py", line 483, in <module>
checkfilevariables(filelist)
File "c:\Users\jacob.hollidge\Desktop\DCPR Threshold\DCPRthresholdV2.0.py", line 221, in checkfilevariables
df = pd.read_csv(filelist[0], index_col=False, nrows=15, header=None)
File "C:\Users\jacob.hollidge\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "C:\Users\jacob.hollidge\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 680, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\jacob.hollidge\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 581, in _read
return parser.read(nrows)
File "C:\Users\jacob.hollidge\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 1250, in read
index, columns, col_dict = self._engine.read(nrows)
File "C:\Users\jacob.hollidge\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 225, in read
chunks = self._reader.read_low_memory(nrows)
File "pandas\_libs\parsers.pyx", line 817, in pandas._libs.parsers.TextReader.read_low_memory
File "pandas\_libs\parsers.pyx", line 861, in pandas._libs.parsers.TextReader._read_rows
File "pandas\_libs\parsers.pyx", line 847, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas\_libs\parsers.pyx", line 1960, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 2, saw 2
This final set of errors is given after I add the delimiter=',' and engine='python' parameters while usecols and dtype are still removed.
Traceback (most recent call last):
File "c:\Users\jacob.hollidge\Desktop\DCPR Threshold\DCPRthresholdV2.0.py", line 483, in <module>
checkfilevariables(filelist)
File "c:\Users\jacob.hollidge\Desktop\DCPR Threshold\DCPRthresholdV2.0.py", line 221, in checkfilevariables
df = pd.read_csv(filelist[0], index_col=False, nrows=15, header=None, delimiter=',', engine='python')
File "C:\Users\jacob.hollidge\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "C:\Users\jacob.hollidge\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 680, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\jacob.hollidge\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 581, in _read
return parser.read(nrows)
File "C:\Users\jacob.hollidge\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 1250, in read
index, columns, col_dict = self._engine.read(nrows)
File "C:\Users\jacob.hollidge\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\python_parser.py", line 270, in read
alldata = self._rows_to_cols(content)
File "C:\Users\jacob.hollidge\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\python_parser.py", line 1013, in _rows_to_cols
self._alert_malformed(msg, row_num + 1)
File "C:\Users\jacob.hollidge\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\python_parser.py", line 739, in _alert_malformed
raise ParserError(msg)
pandas.errors.ParserError: Expected 2 fields in line 13, saw 4
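These errors are consistent with read_csv working out how many columns to expect from the first row(s) when neither usecols nor names is given: a file whose first line has one field and a later line has two or four then fails to tokenize. A common workaround is to tell pandas the width up front through names, so shorter rows are padded with missing values. A minimal sketch, assuming a comma-separated file; the sample text is made up to mimic the layout described above.

```python
import csv
import io

import pandas as pd

# Made-up sample: a one-field title line, then rows of 2 and 4 fields.
# Read without names, this layout triggers "Expected 1 fields in line 2, saw 2".
raw = "Report header\nkey,value\na,b,c,d\ne,f\n"

# Find the widest row first so pandas knows how many columns to expect.
width = max(len(row) for row in csv.reader(io.StringIO(raw)))

# With explicit column names, shorter rows are padded with missing values
# instead of raising a tokenizing error.
df = pd.read_csv(
    io.StringIO(raw),
    header=None,
    names=list(range(width)),
    dtype="string",
)
print(df)
```

For a real file, the width can be computed the same way with `open(path, newline="")` passed to csv.reader; if you already know the widest row has, say, 4 fields, `names=list(range(4))` alone is enough.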
Pandas and glob: convert all xlsx files in folder to csv – TypeError: __init__() got an unexpected keyword argument 'xfid'
I have a folder with many xlsx files that I'd like to convert to csv files. During my research, I found several threads about this topic, such as this or that one. Based on these, I wrote the following code using glob and pandas:
import glob
import pandas as pd
path = r'/Users/.../xlsx files'
excel_files = glob.glob(path + '/*.xlsx')
for excel in excel_files:
    out = excel.split('.')[0]+'.csv'
    df = pd.read_excel(excel) # error occurs here
    df.to_csv(out)
Unfortunately, I got the following error message, which I could not interpret in this context, and I could not figure out how to solve the problem:
Traceback (most recent call last):
File "<input>", line 11, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/util/_decorators.py", line 299, in wrapper
return func(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 336, in read_excel
io = ExcelFile(io, storage_options=storage_options, engine=engine)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 1131, in __init__
self._reader = self._engines[engine](self._io, storage_options=storage_options)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 475, in __init__
super().__init__(filepath_or_buffer, storage_options=storage_options)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 391, in __init__
self.book = self.load_workbook(self.handles.handle)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 486, in load_workbook
return load_workbook(
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 317, in load_workbook
reader.read()
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 281, in read
apply_stylesheet(self.archive, self.wb)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 198, in apply_stylesheet
stylesheet = Stylesheet.from_tree(node)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 103, in from_tree
return super(Stylesheet, cls).from_tree(node)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
obj = desc.expected_type.from_tree(el)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
obj = desc.expected_type.from_tree(el)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 103, in from_tree
return cls(**attrib)
TypeError: __init__() got an unexpected keyword argument 'xfid'
Does anyone know how to fix this? Thanks a lot for your help!
I had the same problem. After some hours of thinking and searching, I realized the problem is actually the file. I opened it in MS Excel and saved it. Alakazam, problem solved. The file was downloaded, so I thought it was a "security" error or just an error in how the file was created. xD
EDIT: It's not a security problem, but actually an error in how the file was generated. The correct file is about twice the size (in KB) of the broken one. One solution: if the file can be opened with xlrd==1.2.0, you can open it with xlrd and then call read_excel on the resulting Book:
import pandas as pd
import xlrd
# df = pd.read_excel('TabelaPrecos.xlsx')
# The line above fails in the same way
a = xlrd.open_workbook('TabelaPrecos.xlsx')
b = pd.read_excel(a)
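Independently of the xfid error, the loop in the question builds output names with excel.split('.')[0], which cuts the path at the first dot anywhere in it, including dots in folder names. A sketch of the same glob-and-convert loop using pathlib, assuming openpyxl is installed for read_excel; csv_path_for and convert_folder are hypothetical helper names. It collects files that fail to load (such as those hitting the 'xfid' error) instead of stopping, so you can re-save just those in Excel as described above. It also passes index=False to to_csv, which drops the pandas index column; remove that if you want the index written as in the original loop.

```python
from pathlib import Path

import pandas as pd


def csv_path_for(xlsx_path: Path) -> Path:
    # with_suffix replaces only the final extension, so dots elsewhere in
    # the path are left alone (unlike str.split('.')[0]).
    return xlsx_path.with_suffix(".csv")


def convert_folder(folder: Path) -> list:
    """Convert every .xlsx in folder to .csv and return the files that failed."""
    failed = []
    for xlsx in sorted(folder.glob("*.xlsx")):
        try:
            df = pd.read_excel(xlsx)
        except Exception as exc:  # e.g. the openpyxl 'xfid' TypeError
            failed.append((xlsx, exc))
            continue
        df.to_csv(csv_path_for(xlsx), index=False)
    return failed


example = Path("/data/v1.2/report.xlsx")
print(csv_path_for(example).as_posix())  # /data/v1.2/report.csv
print(str(example).split(".")[0] + ".csv")  # /data/v1.csv (what the original loop does)
```

Calling `convert_folder(Path("/Users/.../xlsx files"))` with your real folder returns a list of (path, exception) pairs for the workbooks that still need fixing.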
Cannot import CSV file in Spyder using read_csv, ValueError: Only callable can be used as callback
This is my code. I am trying to import my dataset, which is in the same directory I am working in, but it gives me a ValueError.
Code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#Importing Dataset
dataset = pd.read_csv("dataset.csv")
Error with full traceback:
Traceback (most recent call last):
File "<ipython-input-34-7b10dca7f8e2>", line 6, in <module>
dataset = pd.read_csv("dataset.csv")
File "C:\Users\saraj\anaconda3\envs\AI project\lib\site-packages\pandas\io\parsers.py", line 688, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\saraj\anaconda3\envs\AI project\lib\site-packages\pandas\io\parsers.py", line 460, in _read
data = parser.read(nrows)
File "C:\Users\saraj\anaconda3\envs\AI project\lib\site-packages\pandas\io\parsers.py", line 1213, in read
df = DataFrame(col_dict, columns=columns, index=index)
File "C:\Users\saraj\anaconda3\envs\AI project\lib\site-packages\pandas\core\frame.py", line 468, in __init__
mgr = init_dict(data, index, columns, dtype=dtype)
File "C:\Users\saraj\anaconda3\envs\AI project\lib\site-packages\pandas\core\internals\construction.py", line 259, in init_dict
if missing.any() and not is_integer_dtype(dtype):
File "C:\Users\saraj\anaconda3\envs\AI project\lib\site-packages\pandas\core\generic.py", line 11580, in logical_func
return self._reduce(
File "C:\Users\saraj\anaconda3\envs\AI project\lib\site-packages\pandas\core\series.py", line 4248, in _reduce
with np.errstate(all="ignore"):
File "C:\Users\saraj\anaconda3\envs\AI project\lib\site-packages\numpy\core\_ufunc_config.py", line 436, in __enter__
self.oldcall = seterrcall(self.call)
File "C:\Users\saraj\anaconda3\envs\AI project\lib\site-packages\numpy\core\_ufunc_config.py", line 308, in seterrcall
raise ValueError("Only callable can be used as callback")
ValueError: Only callable can be used as callback
Please help me understand what is happening here and how to solve it.
This error can occur when your .csv file contains corrupted data. One solution is:
import pandas as pd
dataset = pd.read_csv("dataset.csv", dtype=str)
Sometimes this resolves the problem, because the data might be in a different format from the one you're trying to read it as. There might also be inconsistencies in the delimiter. So please check the data accordingly; that might solve your problem.
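The delimiter check mentioned above can be done programmatically. A sketch using the standard library's csv.Sniffer on a made-up sample; in practice the sample would be the first few KB of dataset.csv. Sniffer is a heuristic and can guess wrong, so restricting it to the delimiters you consider plausible (here ",;\t|", an assumption) makes the guess more reliable.

```python
import csv
import io

import pandas as pd

# Made-up sample that uses ';' instead of ','.
sample_text = "name;age;city\nAnn;31;Oslo\nBo;27;Lund\n"

# Guess the dialect from the sample, considering only these delimiters.
dialect = csv.Sniffer().sniff(sample_text, delimiters=",;\t|")
print("detected delimiter:", repr(dialect.delimiter))

# Read everything as strings, as suggested above, with the detected delimiter.
dataset = pd.read_csv(io.StringIO(sample_text), sep=dialect.delimiter, dtype=str)
print(dataset)
```

For a file on disk, something like `open("dataset.csv", newline="").read(4096)` gives a suitable sample to pass to sniff().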
Strategy to open a corrupt csv file in pandas
I have a bunch of csv files that I am loading in pandas just fine, but one file is acting up. I'm opening it this way:
df = pd.DataFrame.from_csv(csv_file)
error:
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.py", line 1268, in from_csv
encoding=encoding,tupleize_cols=False)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 400, in parser_f
return _read(filepath_or_buffer, kwds)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 198, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 479, in __init__
self._make_engine(self.engine)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 586, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 957, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "parser.pyx", line 477, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4434)
File "parser.pyx", line 599, in pandas.parser.TextReader._get_header (pandas/parser.c:5831)
pandas.parser.CParserError: Passed header=0 but only 0 lines in file
To me, this means there is some sort of corruption in the file. A quick look suggests it's fine, but it's a big file and visually checking every single line is not an option. What would be a good strategy to troubleshoot a csv file that pandas won't open?
Thank you
It looks like pandas treats line 0 as the header. Try calling:
df = pd.DataFrame.from_csv(csv_file, header=None)
or
df = pd.read_csv(csv_file, header=None)
However, it's strange that the file seems to have zero lines (i.e. it's empty). Maybe the file path is wrong?
If you're on Linux, inspect the file with head from the shell, then fix it with awk or sed. If you're on Windows, you could also try Vim to inspect and fix it. In short, it's probably not best to fix the file in pandas. You most likely have odd line endings (since the error message says 0 lines), so looking at the file with head, cat or Vim is needed to determine the line endings, so that you can decide how best to fix or handle them.
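The same line-ending check can also be done from Python when no shell tools are at hand. A sketch, with line_ending_counts as a hypothetical helper name: it reads the raw bytes in binary mode, so no newline translation hides what is actually in the file, and counts CRLF, bare CR and bare LF endings in the first 1 MiB (an arbitrary limit).

```python
import os
import tempfile


def line_ending_counts(path, nbytes=1 << 20):
    """Count CRLF, bare CR and bare LF line endings in the first nbytes of a file."""
    with open(path, "rb") as fh:  # binary mode: no newline translation
        data = fh.read(nbytes)
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "CR": data.count(b"\r") - crlf,
        "LF": data.count(b"\n") - crlf,
    }


# Demo on a temporary file that uses old Mac-style bare CR line endings.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"a,b\r1,2\r3,4\r")
counts = line_ending_counts(tmp.name)
os.remove(tmp.name)
print(counts)  # {'CRLF': 0, 'CR': 3, 'LF': 0}
```

A file dominated by bare CR, or mixing all three kinds, is a good hint that the line endings need normalising before pandas reads it.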
I encountered the same issue:
/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.13.1_601_g4663353-py2.7-macosx-10.9-x86_64.egg/pandas/io/parsers.pyc in __init__(self, src, **kwds)
    970 kwds['allow_leading_cols'] = self.index_col is not False
    971
--> 972 self._reader = _parser.TextReader(src, **kwds)
    973
    974 # XXX
/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.13.1_601_g4663353-py2.7-macosx-10.9-x86_64.egg/pandas/parser.so in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4628)()
/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.13.1_601_g4663353-py2.7-macosx-10.9-x86_64.egg/pandas/parser.so in pandas.parser.TextReader._get_header (pandas/parser.c:6068)()
CParserError: Passed header=0 but only 0 lines in file
My code is:
df = pd.read_csv('/Users/steven/Documents/Mywork/Python/sklearn/beer/data')
Finally, I found that I had made a mistake: I passed a directory path instead of a file path to read_csv. The correct code is:
df = pd.read_csv('/Users/steven/Documents/Mywork/Python/sklearn/beer/data/beer_reviews.csv')
It runs correctly. So I think the cause of your issue lies in the path you passed. Maybe it is a directory path, just as in my case. Or maybe the file is empty, corrupt, or uses the wrong encoding.
I hope the above is helpful to you.
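Both causes mentioned here (a directory passed instead of a file, or an empty file) can be ruled out before pandas ever sees the path. A minimal sketch with a hypothetical read_csv_checked wrapper; newer pandas versions already raise pandas.errors.EmptyDataError for an empty file and an OS error for a directory, so the explicit checks mainly make the message more specific. The demo file name is borrowed from the answer above and its contents are made up.

```python
import os
import tempfile

import pandas as pd


def read_csv_checked(path, **kwargs):
    """Fail early with a clear message before handing path to pd.read_csv."""
    if os.path.isdir(path):
        raise IsADirectoryError(f"{path!r} is a directory, not a CSV file")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"{path!r} does not exist")
    if os.path.getsize(path) == 0:
        raise ValueError(f"{path!r} is empty (0 bytes)")
    return pd.read_csv(path, **kwargs)


with tempfile.TemporaryDirectory() as folder:
    message = ""
    try:
        read_csv_checked(folder)  # the mistake described above
    except IsADirectoryError as exc:
        message = str(exc)
        print(message)
    csv_file = os.path.join(folder, "beer_reviews.csv")
    with open(csv_file, "w") as fh:
        fh.write("beer,score\nipa,4.5\n")
    df = read_csv_checked(csv_file)
print(df.shape)  # (1, 2)
```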