Python: Issue with class variables pointing to functions

I want to use objects to handle reading various types of input data. The final implementation will be much more complicated: the parent class will define some methods that just call read_func, and the subclasses will handle the implementation of read_func, probably pointing it at something like pd.read_excel or adding a few data-cleaning steps. But I'm getting an odd error. Here's a small reproducible example:
test.py:
import pandas as pd

class test:
    read_func = pd.read_excel

print(pd.read_excel("test.xlsx"))      # prints the excel fine
print(test().read_func)                # prints <bound method read_excel of <__main__.test object at 0x104cebc70>>
print(test().read_func("test.xlsx"))   # throws error
The error trace looks like this:
Traceback (most recent call last):
File "my/file/path/test.py", line 6, in <module>
test().read_func("test.xlsx")
File "/opt/homebrew/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/opt/homebrew/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 457, in read_excel
io = ExcelFile(io, storage_options=storage_options, engine=engine)
File "/opt/homebrew/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 1376, in __init__
ext = inspect_excel_format(
File "/opt/homebrew/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 1250, in inspect_excel_format
with get_handle(
File "/opt/homebrew/lib/python3.9/site-packages/pandas/io/common.py", line 670, in get_handle
ioargs = _get_filepath_or_buffer(
File "/opt/homebrew/lib/python3.9/site-packages/pandas/io/common.py", line 427, in _get_filepath_or_buffer
raise ValueError(msg)
ValueError: Invalid file path or buffer object type: <class '__main__.test'>
What am I doing wrong here?

Try this: assign the function to the instance in __init__ instead of leaving it as a class attribute. A plain function stored as a class attribute is looked up as a method, so calling it on an instance passes that instance as the first argument; that is exactly why the traceback shows pd.read_excel receiving <class '__main__.test'> as its file argument. Storing the function on the instance avoids the binding:

import pandas as pd

class test:
    def __init__(self):
        self.read_func = pd.read_excel

print(pd.read_excel("test.xlsx"))
print(test().read_func)
print(test().read_func("test.xlsx"))
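Alternatively (not part of the original answer, just a common pattern), you can keep read_func as a class attribute and wrap it in staticmethod so it is never bound to the instance; a minimal sketch:

import pandas as pd

class test:
    # staticmethod stops Python from binding the function to instances,
    # so no implicit first argument is passed to pd.read_excel
    read_func = staticmethod(pd.read_excel)

print(test().read_func("test.xlsx"))   # calls pd.read_excel("test.xlsx") directly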

Related

Pandas concat() in Python 3.9 in a def returning the error: No objects to concatenate, while inline scripting there is no error

Using Python 3.9, I have several pandas dataframes, each stored within the self.pointcloud dict.
This looks something like the following, named after their date of capture:
self.pointcloud['20180712']['data']
self.pointcloud['20180713']['data']
self.pointcloud['20180714']['data']
Each ['data'] entry is a pandas dataframe, and they all contain the same columns.
I'm running into an issue when I try to concat them into a single dataframe.
I can easily get all the dataframes in a list:
def get_data(self, tag):
    return [self.pointcloud[pc]['data'] for pc in self.pointcloud
            if self.pointcloud[pc]['data']['tag'].unique() == [tag]]
This list comprehension filters the data based on the tag value in the tag column, but that's not the issue.
Now that I have all the data, I want to merge it into one single dataframe. If I run my code in debug mode, the following gives me no issue and runs as expected:
tag = 'a value'
df_list = self.get_data(tag)
new_df = pd.concat(df_list)
However, when I put this inside a def, I get the error ValueError: No objects to concatenate:
def merge_pointclouds(self, tag):
    df_list = self.get_data(tag)
    return pd.concat(df_list)

self.merge_pointclouds('a value')
As I stated, it works in debug mode, even within the merge_pointclouds def. Is there anything obvious that I'm missing?
The exact error from PyCharm:
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.2.2\plugins\python-ce\helpers\pydev\pydevd.py", line 1448, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.2.2\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/.../PycharmProjects/PCPro2023_0_3_2py39-64/pc.py", line 927, in <module>
debug(debugging=debugging)
File "C:/.../PycharmProjects/PCPro2023_0_3_2py39-64/pc.py", line 68, in debug
pc.set_distance_between()
File "C:/.../PycharmProjects/PCPro2023_0_3_2py39-64/pc.py", line 388, in set_distance_between
data_top = self.merge_pointclouds(tag='Top')
File "C:/.../PycharmProjects/PCPro2023_0_3_2py39-64/pc.py", line 221, in merge_pointclouds
res = pd.concat(df_list)
File "C:\...\PycharmProjects\PCPro2023_0_3_py39-64\venv\lib\site-packages\pandas\util\_decorators.py", line 331, in wrapper
return func(*args, **kwargs)
File "C:\...\PycharmProjects\PCPro2023_0_3_py39-64\venv\lib\site-packages\pandas\core\reshape\concat.py", line 368, in concat
op = _Concatenator(
File "C:\...\PycharmProjects\PCPro2023_0_3_py39-64\venv\lib\site-packages\pandas\core\reshape\concat.py", line 425, in __init__
raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate
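Since "No objects to concatenate" means df_list came back empty, a minimal debugging sketch (an assumption based only on the structure shown in the question, not a confirmed fix) is to make the filter explicit and fail loudly when nothing matches:

def merge_pointclouds(self, tag):
    # Membership test instead of comparing the whole unique() array;
    # the original comparison only behaves as intended when the column
    # holds exactly one unique value per dataframe.
    df_list = [self.pointcloud[pc]['data'] for pc in self.pointcloud
               if tag in self.pointcloud[pc]['data']['tag'].unique()]
    if not df_list:   # an empty list is what triggers "No objects to concatenate"
        raise ValueError(f"No pointclouds found for tag {tag!r}")
    return pd.concat(df_list)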

Pandas and glob: convert all xlsx files in folder to csv – TypeError: __init__() got an unexpected keyword argument 'xfid'

I have a folder with many xlsx files that I'd like to convert to csv files.
During my research, I found several threads about this topic, such as this one or that one. Based on these, I formulated the following code using glob and pandas:
import glob
import pandas as pd
path = r'/Users/.../xlsx files'
excel_files = glob.glob(path + '/*.xlsx')
for excel in excel_files:
    out = excel.split('.')[0] + '.csv'
    df = pd.read_excel(excel)  # error occurs here
    df.to_csv(out)
But unfortunately, I got the following error message, which I could not interpret in this context and could not figure out how to solve:
Traceback (most recent call last):
File "<input>", line 11, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/util/_decorators.py", line 299, in wrapper
return func(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 336, in read_excel
io = ExcelFile(io, storage_options=storage_options, engine=engine)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 1131, in __init__
self._reader = self._engines[engine](self._io, storage_options=storage_options)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 475, in __init__
super().__init__(filepath_or_buffer, storage_options=storage_options)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 391, in __init__
self.book = self.load_workbook(self.handles.handle)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 486, in load_workbook
return load_workbook(
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 317, in load_workbook
reader.read()
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 281, in read
apply_stylesheet(self.archive, self.wb)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 198, in apply_stylesheet
stylesheet = Stylesheet.from_tree(node)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 103, in from_tree
return super(Stylesheet, cls).from_tree(node)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
obj = desc.expected_type.from_tree(el)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
obj = desc.expected_type.from_tree(el)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 103, in from_tree
return cls(**attrib)
TypeError: __init__() got an unexpected keyword argument 'xfid'
Does anyone know how to fix this? Thanks a lot for your help!
I had the same problem. After some hours of thinking and searching I realized the problem is actually the file itself: I opened it with MS Excel, saved it again, and the problem was solved.
The file was downloaded, so I initially thought it was a "security" issue or just an error from how the file was created.
EDIT:
It's not a security problem, but an error in how the file was generated; the correctly generated file is about twice the size (in KB) of the broken one.
One workaround: with xlrd==1.2.0 the file can be opened, and you can then pass the resulting Book (the file opened by xlrd) to read_excel:
import xlrd
import pandas as pd

# df = pd.read_excel('TabelaPrecos.xlsx')
# The line above gives the same result
a = xlrd.open_workbook('TabelaPrecos.xlsx')
b = pd.read_excel(a)
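Tying that workaround back to the original conversion loop, a minimal sketch (assuming xlrd==1.2.0 is installed alongside pandas, as the answer above describes) would be:

import glob
import pandas as pd
import xlrd  # pinned to xlrd==1.2.0, which can still open .xlsx files

path = r'/Users/.../xlsx files'

for excel in glob.glob(path + '/*.xlsx'):
    out = excel.split('.')[0] + '.csv'
    book = xlrd.open_workbook(excel)   # open with xlrd first
    df = pd.read_excel(book)           # then hand the Book object to pandas
    df.to_csv(out)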

Can not import CSV file in Spyder using read_csv, ValueError: Only callable can be used as callback

This is my code. I am trying to import my dataset, which is in the same directory I am working in, but it gives me a ValueError.
Code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#Importing Dataset
dataset = pd.read_csv("dataset.csv")
Error with full traceback:
Traceback (most recent call last):
File "<ipython-input-34-7b10dca7f8e2>", line 6, in <module>
dataset = pd.read_csv("dataset.csv")
File "C:\Users\saraj\anaconda3\envs\AI project\lib\site-packages\pandas\io\parsers.py", line 688, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\saraj\anaconda3\envs\AI project\lib\site-packages\pandas\io\parsers.py", line 460, in _read
data = parser.read(nrows)
File "C:\Users\saraj\anaconda3\envs\AI project\lib\site-packages\pandas\io\parsers.py", line 1213, in read
df = DataFrame(col_dict, columns=columns, index=index)
File "C:\Users\saraj\anaconda3\envs\AI project\lib\site-packages\pandas\core\frame.py", line 468, in __init__
mgr = init_dict(data, index, columns, dtype=dtype)
File "C:\Users\saraj\anaconda3\envs\AI project\lib\site-packages\pandas\core\internals\construction.py", line 259, in init_dict
if missing.any() and not is_integer_dtype(dtype):
File "C:\Users\saraj\anaconda3\envs\AI project\lib\site-packages\pandas\core\generic.py", line 11580, in logical_func
return self._reduce(
File "C:\Users\saraj\anaconda3\envs\AI project\lib\site-packages\pandas\core\series.py", line 4248, in _reduce
with np.errstate(all="ignore"):
File "C:\Users\saraj\anaconda3\envs\AI project\lib\site-packages\numpy\core\_ufunc_config.py", line 436, in __enter__
self.oldcall = seterrcall(self.call)
File "C:\Users\saraj\anaconda3\envs\AI project\lib\site-packages\numpy\core\_ufunc_config.py", line 308, in seterrcall
raise ValueError("Only callable can be used as callback")
ValueError: Only callable can be used as callback
Please help me understand what is happening here and how to solve it?
When your .csv has corrupted data, that error might be obtained.
The solution for that is:
import pandas as pd
dataset = pd.read_csv("dataset.csv", dtype=str)
Sometimes this resolves the problem, because the data might be in a different format from the one you're trying to read, or there might be inconsistencies in the delimiter. So please check the data accordingly; it might solve your problem.
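If the delimiter is the suspect, spelling it out with sep is the usual follow-up check. A minimal sketch (the ";" separator here is only an assumed example, not something stated in the question):

import pandas as pd

# read every column as a string and name the separator explicitly;
# replace ";" with whatever the file actually uses
dataset = pd.read_csv("dataset.csv", sep=";", dtype=str)
print(dataset.head())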

Any easy solution to resolve pickling error

I am using savez to save the weights. Following is my code:
class vgg16:
    def __init__(self, imgs1, imgs2, weights=None, sess=None):
        .........
        self.weight_list = []
        self.keys = []
        ........
        self.SaveWeights()
        ....neural network............

    def SaveWeights(self):
        tmp = file("vgg16_predict.npz", 'wb')
        np.savez(self, **dict(zip(self.keys, self.weight_list)))
        tmp.close
I keep getting the pickling error. There are different solutions out there, but is there an easy way to make this work?
Here is the traceback:
Traceback (most recent call last):
File "f.py", line 350, in <module>
vgg = vgg16(imgs1,imgs2, 'vgg16_weights.npz', sess)
File "f.py", line 43, in __init__
self.SaveWeights()
File "f.py", line 339, in SaveWeights
np.savez(self,**dict(zip(self.keys, self.weight_list)))
File "/usr/local/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 574, in savez
_savez(file, args, kwds, False)
File "/usr/local/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 639, in _savez
pickle_kwargs=pickle_kwargs)
File "/usr/local/lib/python2.7/dist-packages/numpy/lib/format.py", line 573, in write_array
pickle.dump(array, fp, protocol=2, **pickle_kwargs)
cPickle.PicklingError: Can't pickle <type 'module'>: attribute lookup __builtin__.module failed
Exception AttributeError: "vgg16 instance has no attribute 'tell'" in <bound method ZipFile.__del__ of <zipfile.ZipFile object at 0x7f812dec99d0>> ignored
You need to pass a file or path to np.savez, not self. Just use the path directly:
np.savez("vgg16_predict.npz", **dict(zip(self.keys, self.weight_list)))
So this should be your full method:
def SaveWeights(self):
    np.savez("vgg16_predict.npz", **dict(zip(self.keys, self.weight_list)))

problems dealing with pandas read csv

I've got a problem with pandas read_csv. I have many txt files associated with the stock market. They look like this:
SecCode,SecName,Tdate,Ttime,LastClose,OP,CP,Tq,Tm,Tt,Cq,Cm,Ct,HiP,LoP,SYL1,SYL2,Rf1,Rf2,bs,s5,s4,s3,s2,s1,b1,b2,b3,b4,b5,sv5,sv4,sv3,sv2,sv1,bv1,bv2,bv3,bv4,bv5,bsratio,spd,rpd,depth1,depth2
600000,浦发银行,20120104,091501,8.490,.000,.000,0,.000,0,0,.000,0,.000,.000,.000,.000,.000,.000, ,.000,.000,.000,.000,8.600,8.600,.000,.000,.000,.000,0,0,0,0,1100,1100,38900,0,0,0,.00,.000,.00,.00,.00
600000,浦发银行,20120104,091506,8.490,.000,.000,0,.000,0,0,.000,0,.000,.000,.000,.000,.000,.000, ,.000,.000,.000,.000,8.520,8.520,.000,.000,.000,.000,0,0,0,0,56795,56795,33605,0,0,0,.00,.000,.00,.00,.00
600000,浦发银行,20120104,091511,8.490,.000,.000,0,.000,0,0,.000,0,.000,.000,.000,.000,.000,.000, ,.000,.000,.000,.000,8.520,8.520,.000,.000,.000,.000,0,0,0,0,56795,56795,34605,0,0,0,.00,.000,.00,.00,.00
600000,浦发银行,20120104,091551,8.490,.000,.000,0,.000,0,0,.000,0,.000,.000,.000,.000,.000,.000, ,.000,.000,.000,.000,8.520,8.520,.000,.000,.000,.000,0,0,0,0,56795,56795,35205,0,0,0,.00,.000,.00,.00,.00
600000,浦发银行,20120104,091621,8.490,.000,.000,0,.000,0,0,.000,0,.000,.000,.000,.000,.000,.000, ,.000,.000,.000,.000,8.520,8.520,.000,.000,.000,.000,0,0,0,0,57795,57795,34205,0,0,0,.00,.000,.00,.00,.00
I use this code to read one of them:
fields = ['SecCode', 'Tdate','Ttime','LastClose','OP','CP','Rf1','Rf2']
df = pd.read_csv('SHL1_TAQ_600000_201201.txt',usecols=fields)
But I got a problem:
Traceback (most recent call last):
File "E:/workspace/Senti/highlevel/highlevel.py", line 8, in <module>
df = pd.read_csv('SHL1_TAQ_600000_201201.txt',usecols=fields,header=1)
File "D:\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 562, in parser_f
return _read(filepath_or_buffer, kwds)
File "D:\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 315, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "D:\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 645, in __init__
self._make_engine(self.engine)
File "D:\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 799, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "D:\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 1257, in __init__
raise ValueError("Usecols do not match names.")
ValueError: Usecols do not match names.
I can't find any problem similar to mine. It's also weird that when I copy the txt file into another one, the code runs well, but the original one causes the above problem. How can I solve it?
In your message, you said that you're running:
df = pd.read_csv('SHL1_TAQ_600000_201201.txt',usecols=fields)
which did not throw an error for me or for @Anil_M. But from your traceback you can see that the command actually used is a different one:
df = pd.read_csv('SHL1_TAQ_600000_201201.txt',usecols=fields, header=1)
which includes header=1, and that is what throws the error mentioned.
So I would guess the error comes from some confusion in your code: header=1 tells pandas to use the second line of the file as the header, so the resulting column names no longer match the ones listed in usecols.
Use names instead of usecols when specifying the parameter.
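To illustrate the difference between the two calls, a minimal sketch reusing the file name and column list from the question:

import pandas as pd

fields = ['SecCode', 'Tdate', 'Ttime', 'LastClose', 'OP', 'CP', 'Rf1', 'Rf2']

# header=0 (the default) takes the first line of the file as the column names,
# so every entry in usecols matches a real column
df = pd.read_csv('SHL1_TAQ_600000_201201.txt', usecols=fields, header=0)

# header=1 would treat the first data row as the header instead, so the names
# in usecols no longer exist and pandas raises "Usecols do not match names."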
