I'm getting the error below when trying to convert a CSV file to XLSX format with a pandas script.
I ran the following script:
import os
os.chdir("/opt/alb_test/alb/albt1/Source/alb/al/conversion/scripts")
# Reading the csv file
import pandas as pd
print(pd.__file__)
df_new = pd.read_csv("sourcefile.csv", sep="|", header=None).dropna(axis=1, how="all")
# saving xlsx file
df_new.to_excel("sourcefile.xlsx", index=False)
I am getting the error mentioned below:
Traceback (most recent call last):
File "/opt/alb_test/alb/albt1/Source/alb/al/conversion/scripts/pythn.py", line 13, in <module>
df = pd.read_csv("ff_mdm_reject_report.csv", lineterminator='\n')
File "/opt/infa/MDW-env/lib/python3.9/site-packages/pandas/util/_decorators.py", line 211, in wrapper
return func(*args, **kwargs)
File "/opt/infa/MDW-env/lib/python3.9/site-packages/pandas/util/_decorators.py", line 331, in wrapper
return func(*args, **kwargs)
File "/opt/infa/MDW-env/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 950, in read_csv
return _read(filepath_or_buffer, kwds)
File "/opt/infa/MDW-env/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 611, in _read
return parser.read(nrows)
File "/opt/infa/MDW-env/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1778, in read
) = self._engine.read( # type: ignore[attr-defined]
File "/opt/infa/MDW-env/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 230, in read
chunks = self._reader.read_low_memory(nrows)
File "pandas/_libs/parsers.pyx", line 808, in pandas._libs.parsers.TextReader.read_low_memory
File "pandas/_libs/parsers.pyx", line 866, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 852, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 1973, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 30488, saw 2
Can anyone guide me on how to fix it?
Thanks in advance!
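One likely reading of the traceback: the call that fails is pd.read_csv("ff_mdm_reject_report.csv", lineterminator='\n'), which uses the default comma separator rather than sep="|". With a comma separator, every line that contains no comma parses as 1 field, so the first line that does contain a comma (line 30488 here) produces "saw 2". The sketch below uses Python's csv module to count the fields in each record under a chosen separator and list the records that differ from the most common count, so a file can be checked before it is handed to pandas. The helper name field_count_report is hypothetical, not part of pandas. Record numbers should line up with pandas' line numbers as long as no quoted field spans several lines.

```python
import csv
from collections import Counter


def field_count_report(lines, sep="|"):
    """Count fields per record.

    Returns (most common field count, [(record_no, count), ...] for records that differ).
    `lines` is any iterable of text lines, e.g. a file opened with newline="".
    """
    reader = csv.reader(lines, delimiter=sep)
    counts = [(n, len(row)) for n, row in enumerate(reader, start=1)]
    if not counts:
        return 0, []
    # the most frequent field count is taken as the "expected" width
    typical = Counter(c for _, c in counts).most_common(1)[0][0]
    return typical, [(n, c) for n, c in counts if c != typical]
```

For the file in the question this would be called as field_count_report(open("sourcefile.csv", newline=""), sep="|"). If it reports only a handful of malformed records, passing sep="|" explicitly (as the script already does) and, on pandas 1.3 or later, on_bad_lines="warn" or on_bad_lines="skip" lets read_csv report or drop them instead of raising.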
I have been getting the error below when trying to open a file with an .xlsx extension.
import pandas as pd
readFile = pd.read_excel("test.xlsx")
The error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/dummy/venv/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/Users/dummy/venv/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 457, in read_excel
io = ExcelFile(io, storage_options=storage_options, engine=engine)
File "/Users/dummy/venv/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 1419, in __init__
self._reader = self._engines[engine](self._io, storage_options=storage_options)
File "/Users/dummy/venv/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 525, in __init__
super().__init__(filepath_or_buffer, storage_options=storage_options)
File "/Users/dummy/venv/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 518, in __init__
self.book = self.load_workbook(self.handles.handle)
File "/Users/dummy/venv/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 536, in load_workbook
return load_workbook(
File "/Users/dummy/venv/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 317, in load_workbook
reader.read()
File "/Users/dummy/venv/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 281, in read
apply_stylesheet(self.archive, self.wb)
File "/Users/dummy/venv/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 198, in apply_stylesheet
stylesheet = Stylesheet.from_tree(node)
File "/Users/dummy/venv/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 103, in from_tree
return super(Stylesheet, cls).from_tree(node)
File "/Users/dummy/venv/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
obj = desc.expected_type.from_tree(el)
File "/Users/dummy/venv/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
obj = desc.expected_type.from_tree(el)
File "/Users/dummy/venv/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 103, in from_tree
return cls(**attrib)
TypeError: __init__() got an unexpected keyword argument 'xxid'
NB:
I found a workaround for reading this file: I open it in Excel on my PC and press Ctrl+S. After that, it reads fine. But this gets very tedious when I'm working with over 100 files.
Any help is appreciated. Thank you!
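The traceback ends inside openpyxl's stylesheet parsing (apply_stylesheet, then Stylesheet.from_tree), so the unexpected xxid keyword appears to come from an attribute in the workbook's xl/styles.xml that openpyxl does not recognise; re-saving in Excel presumably rewrites that part without it. A minimal sketch of automating the same effect for many files, assuming the attribute really lives in xl/styles.xml and can simply be dropped: copy each .xlsx archive with the standard zipfile module and strip xxid="..." attributes from that one member. The function name strip_xxid is hypothetical.

```python
import re
import zipfile

# Matches an xxid="..." (or xxid='...') attribute together with the whitespace before it.
_XXID_ATTR = re.compile(rb"""\s+xxid=(?:"[^"]*"|'[^']*')""")


def strip_xxid(src_path, dst_path, member="xl/styles.xml"):
    """Copy an .xlsx (zip) archive, dropping xxid attributes from one XML member."""
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == member:
                data = _XXID_ATTR.sub(b"", data)
            # new entry with the same name, timestamp and compression as the original
            out_info = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            out_info.compress_type = info.compress_type
            dst.writestr(out_info, data)
```

For 100+ files, loop over them (for example with pathlib.Path("in").glob("*.xlsx")), write the fixed copies to a separate folder, and call pd.read_excel on the copies. If openpyxl still fails, the attribute may also appear in other parts of the archive, which this sketch does not touch.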
Could someone help me understand why this code throws an error, and suggest how I can save the file on my local machine?
myLast = result[:1]
for x in myLast:
    urldailyLocal = os.path.basename(x)
    s = requests.get(x, verify=False).content
    c = pd.read_csv(s)
    c.to_csv('path/to/my/file/' + urldailyLocal, index=False)
The error after running the above code:
Traceback (most recent call last):
File "<stdin>", line 4, in <module>
File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 448, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 880, in __init__
self._make_engine(self.engine)
File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 1114, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 1891, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 374, in pandas._libs.parsers.TextReader.__cinit__
File "pandas/_libs/parsers.pyx", line 694, in pandas._libs.parsers.TextReader._setup_parser_source
OSError: Expected file path name or file-like object, got <class 'bytes'> type
The myLast list stores a URL in this format:
'https://test.com/something/example_2020-09-27-10.51PST_ALL.csv'
The following should work for you:
import io
s = requests.get(x, verify=False).content
df = pd.read_csv(io.StringIO(s.decode('utf-8')))
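read_csv accepts a path or a file-like object, not raw bytes, which is why passing .content directly fails. A slightly fuller sketch that also covers the saving part of the question, wrapping the bytes in io.BytesIO instead of decoding them first. The helper names csv_bytes_to_file and download_csv are hypothetical, and dest_dir stands in for the question's 'path/to/my/file/'.

```python
import io
import os
from urllib.parse import urlparse

import pandas as pd


def csv_bytes_to_file(content, url, dest_dir):
    """Parse downloaded CSV bytes with pandas and save them under the URL's file name."""
    # read_csv wants a path or file-like object, so wrap the raw bytes
    df = pd.read_csv(io.BytesIO(content))
    # take the name from the URL path only, so a query string never ends up in it
    filename = os.path.basename(urlparse(url).path)
    out_path = os.path.join(dest_dir, filename)
    df.to_csv(out_path, index=False)
    return out_path


def download_csv(url, dest_dir):
    """Download one CSV URL and save it locally; returns the saved path."""
    import requests  # imported here so csv_bytes_to_file works without requests installed

    resp = requests.get(url, verify=False)  # verify=False copied from the question
    resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return csv_bytes_to_file(resp.content, url, dest_dir)
```

In the question's loop this becomes download_csv(x, "path/to/my/file") for each x in result[:1]. verify=False disables TLS certificate checks, so drop it if the server's certificate is valid.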
I am using PyCharm, and when I run code that opens a CSV file with pandas, I get an error saying the file does not exist.
I saved the CSV file in my project directory and read it with pandas:
import pandas as pd
df = pd.read_csv("E:\\students")
print(df)
The error when I run the code:
Traceback (most recent call last):
File "E:/untitled232/file1.py", line 2, in <module>
df = pd.read_csv("E:\\students")
File "E:\untitled232\venv\lib\site-packages\pandas\io\parsers.py", line 678, in parser_f
return _read(filepath_or_buffer, kwds)
File "E:\untitled232\venv\lib\site-packages\pandas\io\parsers.py", line 440, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "E:\untitled232\venv\lib\site-packages\pandas\io\parsers.py", line 787, in __init__
self._make_engine(self.engine)
File "E:\untitled232\venv\lib\site-packages\pandas\io\parsers.py", line 1014, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "E:\untitled232\venv\lib\site-packages\pandas\io\parsers.py", line 1708, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas\_libs\parsers.pyx", line 384, in pandas._libs.parsers.TextReader.__cinit__
File "pandas\_libs\parsers.pyx", line 695, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: File b'E:\\students' does not exist
It turned out I had to add the .csv extension to the file name.
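Windows Explorer hides known file extensions by default, so a file that shows up as students is often really students.csv on disk. A small diagnostic sketch using only pathlib: if the requested path does not exist, it lists files in the same folder whose names are the requested name plus an extension. suggest_paths is a hypothetical helper, not a pandas function.

```python
from pathlib import Path


def suggest_paths(path):
    """Return [path] if it exists, else files in its folder named path.name + '.<ext>'."""
    p = Path(path)
    if p.exists():
        return [p]
    # e.g. for "students", glob "students.*" in the same folder to find students.csv
    return sorted(c for c in p.parent.glob(p.name + ".*") if c.is_file())
```

Calling print(suggest_paths(r"E:\students")) before pd.read_csv shows which file name actually exists, if any.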
I have a CSV file whose quoted strings contain line terminators. I can import it with pandas using this code:
df_desc = pd.read_csv(import_desc, sep="|")
But when I try to import it into a Dask DataFrame:
import dask.dataframe as ddf
import_desc = "data/info.csv"
df_desc = ddf.read_csv(import_desc, sep="|", blocksize=None, dtype='str')
I get this error:
Traceback (most recent call last):
File "/Applications/PyCharm CE.app/Contents/helpers/pydev/pydevd.py", line 1578, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "/Applications/PyCharm CE.app/Contents/helpers/pydev/pydevd.py", line 1015, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/data_extraction_dask.py", line 10, in <module>
df_desc = ddf.read_table(import_desc, sep="|", blocksize=None, dtype='str')
File "/anaconda2/lib/python2.7/site-packages/dask/dataframe/io/csv.py", line 323, in read
**kwargs)
File "/anaconda2/lib/python2.7/site-packages/dask/dataframe/io/csv.py", line 243, in read_pandas
head = reader(BytesIO(b_sample), **kwargs)
File "/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 655, in parser_f
return _read(filepath_or_buffer, kwds)
File "/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 411, in _read
data = parser.read(nrows)
File "/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 982, in read
ret = self._engine.read(nrows)
File "/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 1719, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 890, in pandas._libs.parsers.TextReader.read (pandas/_libs/parsers.c:10862)
File "pandas/_libs/parsers.pyx", line 912, in pandas._libs.parsers.TextReader._read_low_memory (pandas/_libs/parsers.c:11138)
File "pandas/_libs/parsers.pyx", line 966, in pandas._libs.parsers.TextReader._read_rows (pandas/_libs/parsers.c:11884)
File "pandas/_libs/parsers.pyx", line 953, in pandas._libs.parsers.TextReader._tokenize_rows (pandas/_libs/parsers.c:11755)
File "pandas/_libs/parsers.pyx", line 2184, in pandas._libs.parsers.raise_parser_error (pandas/_libs/parsers.c:28765)
pandas.errors.ParserError: Error tokenizing data. C error: EOF inside string starting at line 130
The documentation mentions:
It should also be noted that this function may fail if a CSV file
includes quoted strings that contain the line terminator. To get
around this you can specify blocksize=None to not split files into
multiple partitions, at the cost of reduced parallelism.
That's why I used blocksize=None, but this function uses a sampling strategy that reads the first bytes of the file to infer the column types, and I think that is what generates this error.
I can't skip the sampling step, even when I specify the types with dtype.
Is there any workaround?
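The traceback shows the error being raised while pandas parses Dask's byte sample (head = reader(BytesIO(b_sample), **kwargs)), so one workaround is to remove the line breaks from inside quoted fields before Dask ever sees the file. A minimal sketch using only the standard csv module, assuming the file is UTF-8 and that replacing embedded line breaks with a space (or another marker of your choice) is acceptable for your data. The function name flatten_embedded_newlines and the output path are hypothetical.

```python
import csv


def _flatten(field, replacement):
    # replace Windows, Unix and old Mac line breaks, in that order
    return field.replace("\r\n", replacement).replace("\n", replacement).replace("\r", replacement)


def flatten_embedded_newlines(src_path, dst_path, sep="|", replacement=" "):
    """Rewrite a delimited file so that no quoted field contains a line break."""
    # newline="" lets the csv module handle line breaks inside quoted fields itself
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src, delimiter=sep)
        writer = csv.writer(dst, delimiter=sep, lineterminator="\n")
        for row in reader:
            writer.writerow([_flatten(field, replacement) for field in row])
```

Afterwards, something like ddf.read_csv("data/info_flat.csv", sep="|", dtype='str') should no longer hit quoted line terminators, so blocksize=None is no longer needed for correctness. If the text must stay intact, an alternative is to run the working pandas call inside dask.delayed and build the Dask frame with dask.dataframe.from_delayed.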
When I load a big CSV file with pandas I get the following MemoryError:
Traceback (most recent call last):
File "/home/k/workspace/loans/src/loans.py", line 100, in <module>
X_test = testdata('test_v2.csv')
File "/home/k/workspace/loans/src/loans.py", line 18, in testdata
X = pd.read_table(filename, sep=',', warn_bad_lines=True, error_bad_lines=True)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 420, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 225, in _read
return parser.read()
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 626, in read
ret = self._engine.read(nrows)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1070, in read
data = self._reader.read(nrows)
File "parser.pyx", line 727, in pandas.parser.TextReader.read (pandas/parser.c:6866)
File "parser.pyx", line 777, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:7452)
File "parser.pyx", line 1788, in pandas.parser._concatenate_chunks (pandas/parser.c:20462)
MemoryError
The file is 1 GB. R opens it without much trouble (which is odd, because as I understand it R is higher-level than Python...).
I am running the code on an Intel(R) Core(TM) i3 CPU 550 @ 3.20GHz with 4 GB RAM, under Linux Ubuntu 12.04 32-bit.
Is there any trick to make it work?
Thanks!
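The traceback shows the MemoryError raised in _concatenate_chunks, apparently while pandas stitches its internal chunks into one frame, and a 32-bit process cannot address more than 4 GB in total, so the usual levers are reading in chunks and shrinking the data. A minimal sketch, assuming the whole table really is needed in memory: read with chunksize, optionally restrict usecols and dtype, downcast numeric columns per chunk with pd.to_numeric, then concatenate. read_in_chunks is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def read_in_chunks(path, chunksize=100000, usecols=None, dtype=None):
    """Read a large CSV chunk by chunk, shrinking numeric columns before concatenating."""
    pieces = []
    reader = pd.read_csv(path, chunksize=chunksize, usecols=usecols, dtype=dtype)
    for chunk in reader:
        # downcast ints/floats to the smallest dtype that holds this chunk's values
        for col in chunk.select_dtypes(include=np.integer).columns:
            chunk[col] = pd.to_numeric(chunk[col], downcast="integer")
        for col in chunk.select_dtypes(include=np.floating).columns:
            chunk[col] = pd.to_numeric(chunk[col], downcast="float")
        pieces.append(chunk)
    # concat upcasts any column whose chunks ended up with different dtypes
    return pd.concat(pieces, ignore_index=True)
```

Downcasting floats to float32 trades precision for memory. If the concatenated frame still does not fit, aggregate or write out each chunk inside the loop instead of collecting them all.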