ParserError when using Pandas script - python

I'm getting the below error when trying to convert csx to xlx format using Pandas script.
I tried running the below script:
import os
os.chdir("/opt/alb_test/alb/albt1/Source/alb/al/conversion/scripts")
# Reading the csv file
import pandas as pd
print(pd.__file__)
df_new = pd.read_csv("sourcefile.csv", sep="|", header=None).dropna(axis=1, how="all")
# saving xlsx file
df_new.to_excel("sourcefile.xlsx", index=False)
I am getting the error mentioned below:
Traceback (most recent call last):
File "/opt/alb_test/alb/albt1/Source/alb/al/conversion/scripts/pythn.py", line 13, in <module>
df = pd.read_csv("ff_mdm_reject_report.csv", lineterminator='\n')
File "/opt/infa/MDW-env/lib/python3.9/site-packages/pandas/util/_decorators.py", line 211, in wrapper
return func(*args, **kwargs)
File "/opt/infa/MDW-env/lib/python3.9/site-packages/pandas/util/_decorators.py", line 331, in wrapper
return func(*args, **kwargs)
File "/opt/infa/MDW-env/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 950, in read_csv
return _read(filepath_or_buffer, kwds)
File "/opt/infa/MDW-env/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 611, in _read
return parser.read(nrows)
File "/opt/infa/MDW-env/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1778, in read
) = self._engine.read( # type: ignore[attr-defined]
File "/opt/infa/MDW-env/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 230, in read
chunks = self._reader.read_low_memory(nrows)
File "pandas/_libs/parsers.pyx", line 808, in pandas._libs.parsers.TextReader.read_low_memory
File "pandas/_libs/parsers.pyx", line 866, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 852, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 1973, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 30488, saw 2
Can anyone guide me how to fix it?
Thanks in advance!

Related

TypeError: __init__() got an unexpected keyword argument 'xxid'

I have been getting the below errors for trying to open a file with xlsx extension.
import pandas as pd
readFile = pd.read_excel("test.xlsx")
Error below
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/dummy/venv/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/Users/dummy/venv/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 457, in read_excel
io = ExcelFile(io, storage_options=storage_options, engine=engine)
File "/Users/dummy/venv/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 1419, in __init__
self._reader = self._engines[engine](self._io, storage_options=storage_options)
File "/Users/dummy/venv/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 525, in __init__
super().__init__(filepath_or_buffer, storage_options=storage_options)
File "/Users/dummy/venv/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 518, in __init__
self.book = self.load_workbook(self.handles.handle)
File "/Users/dummy/venv/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 536, in load_workbook
return load_workbook(
File "/Users/dummy/venv/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 317, in load_workbook
reader.read()
File "/Users/dummy/venv/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 281, in read
apply_stylesheet(self.archive, self.wb)
File "/Users/dummy/venv/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 198, in apply_stylesheet
stylesheet = Stylesheet.from_tree(node)
File "/Users/dummy/venv/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 103, in from_tree
return super(Stylesheet, cls).from_tree(node)
File "/Users/dummy/venv/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
obj = desc.expected_type.from_tree(el)
File "/Users/dummy/venv/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
obj = desc.expected_type.from_tree(el)
File "/Users/dummy/venv/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 103, in from_tree
return cls(**attrib)
TypeError: __init__() got an unexpected keyword argument 'xxid'
NB:
I found a hack around reading this file. I open the file with my Excel on my pc and I do a Ctrl+S. After that, it works fine. But it can be very exhausting when I'm trying to work with over 100 files.
Any help is appreciated. Thank you

Could someone help solving a Pandas error when saving a csv file with python-requests?

Could someone help me understand why this code thrown an error and provide a solution how I could save my file on my local machine:
myLast = result[:1]
for x in myLast:
urldailyLocal= os.path.basename(x)
s=requests.get(x, verify=False).content
c=pd.read_csv(s)
c.to_csv('path/to/my/file/'+urldailyLocal, index=False)
error after running the above code:
Traceback (most recent call last):
File "<stdin>", line 4, in <module>
File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 448, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 880, in __init__
self._make_engine(self.engine)
File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 1114, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 1891, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 374, in pandas._libs.parsers.TextReader.__cinit__
File "pandas/_libs/parsers.pyx", line 694, in pandas._libs.parsers.TextReader._setup_parser_source
OSError: Expected file path name or file-like object, got <class 'bytes'> type
myLast list stores an url in such format:
'https://test.com/something/example_2020-09-27-10.51PST_ALL.csv'
The below should work for you
s = requests.get(x, verify=False).content
df = pd.read_csv(io.StringIO(s.decode('utf-8')))

I am getting error when opening a CSV file in pycharm

I am using pycharm and when i run a code of opening a csv file using pandas I am getting an error of no existence.
I saved the csv file in my project directory and called it using pandas.
import pandas as pd
df = pd.read_csv("E:\\students")
print(df)
The error when i run the code:
Traceback (most recent call last): File "E:/untitled232/file1.py", line 2, in <module>
df = pd.read_csv("E:\\students") File "E:\untitled232\venv\lib\site-packages\pandas\io\parsers.py", line 678, in parser_f
return _read(filepath_or_buffer, kwds) File "E:\untitled232\venv\lib\site-packages\pandas\io\parsers.py", line 440, in _read
parser = TextFileReader(filepath_or_buffer, **kwds) File "E:\untitled232\venv\lib\site-packages\pandas\io\parsers.py", line 787, in __init__
self._make_engine(self.engine) File "E:\untitled232\venv\lib\site-packages\pandas\io\parsers.py", line 1014, in _make_engine
self._engine = CParserWrapper(self.f, **self.options) File "E:\untitled232\venv\lib\site-packages\pandas\io\parsers.py", line 1708, in __init__
self._reader = parsers.TextReader(src, **kwds) File "pandas\_libs\parsers.pyx", line 384, in pandas._libs.parsers.TextReader.__cinit__ File "pandas\_libs\parsers.pyx", line 695, in pandas._libs.parsers.TextReader._setup_parser_source FileNotFoundError: File b'E:\\students' does not exist
It seems I had to put .csv after the name.

Import strings with line terminator from csv to dask dataframe

I have a csv with strings containing line terminator I can import with panda with this code :
df_desc = pd.read_csv(import_desc, sep="|")
But when I try to import it in a dask dataframe :
import dask.dataframe as ddf
import_info = "data/info.csv"
df_desc = ddf.read_csv(import_desc, sep="|", blocksize=None, dtype='str')
I get this error :
Traceback (most recent call last):
File "/Applications/PyCharm CE.app/Contents/helpers/pydev/pydevd.py", line 1578, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "/Applications/PyCharm CE.app/Contents/helpers/pydev/pydevd.py", line 1015, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/data_extraction_dask.py", line 10, in <module>
df_desc = ddf.read_table(import_desc, sep="|", blocksize=None, dtype='str')
File "/anaconda2/lib/python2.7/site-packages/dask/dataframe/io/csv.py", line 323, in read
**kwargs)
File "/anaconda2/lib/python2.7/site-packages/dask/dataframe/io/csv.py", line 243, in read_pandas
head = reader(BytesIO(b_sample), **kwargs)
File "/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 655, in parser_f
return _read(filepath_or_buffer, kwds)
File "/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 411, in _read
data = parser.read(nrows)
File "/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 982, in read
ret = self._engine.read(nrows)
File "/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 1719, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 890, in pandas._libs.parsers.TextReader.read (pandas/_libs/parsers.c:10862)
File "pandas/_libs/parsers.pyx", line 912, in pandas._libs.parsers.TextReader._read_low_memory (pandas/_libs/parsers.c:11138)
File "pandas/_libs/parsers.pyx", line 966, in pandas._libs.parsers.TextReader._read_rows (pandas/_libs/parsers.c:11884)
File "pandas/_libs/parsers.pyx", line 953, in pandas._libs.parsers.TextReader._tokenize_rows (pandas/_libs/parsers.c:11755)
File "pandas/_libs/parsers.pyx", line 2184, in pandas._libs.parsers.raise_parser_error (pandas/_libs/parsers.c:28765)
pandas.errors.ParserError: Error tokenizing data. C error: EOF inside string starting at line 130
The documentation mention :
It should also be noted that this function may fail if a CSV file
includes quoted strings that contain the line terminator. To get
around this you can specify blocksize=None to not split files into
multiple partitions, at the cost of reduced parallelism.
That's why I used blocksize=None but this function use a sampling strategy that use the first bytes of the file to determine the type of columns and , I think, generate this error.
I can't skip the samping step even by indicating the type with dtypes.
Is there any workaround ?

MemoryError in Python when trying to load a big CSV file

When I load a big CSV file with pandas I get the following MemoryError:
Traceback (most recent call last):
File "/home/k/workspace/loans/src/loans.py", line 100, in <module>
X_test = testdata('test_v2.csv')
File "/home/k/workspace/loans/src/loans.py", line 18, in testdata
X = pd.read_table(filename, sep=',', warn_bad_lines=True, error_bad_lines=True)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 420, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 225, in _read
return parser.read()
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 626, in read
ret = self._engine.read(nrows)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1070, in read
data = self._reader.read(nrows)
File "parser.pyx", line 727, in pandas.parser.TextReader.read (pandas/parser.c:6866)
File "parser.pyx", line 777, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:7452)
File "parser.pyx", line 1788, in pandas.parser._concatenate_chunks (pandas/parser.c:20462)
MemoryError
The file is of size 1 GB. R opens it without much of a trouble (which is weird because if I understand correctly, R is higher level than Python...)
I am running the code on Intel(R) Core(TM) i3 CPU 550 # 3.20GHz with 4GB RAM. I am running the code on Linux Ubuntu 12.04 32-bit.
Is there any trick to make it work?
Thanks!

Categories