How to read HDF table from pandas? - python

I have a my_file.h5 file that, presumably, contains data in HDF5 format (PyTables). I try to read this file using pandas:
import pandas as pd
store = pd.HDFStore('my_file.h5')
Then I try to use the store object:
print store
As a result I get:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/pymodules/python2.7/pandas/io/pytables.py", line 133, in __repr__
kind = v._v_attrs.pandas_type
File "/usr/lib/python2.7/dist-packages/tables/attributeset.py", line 302, in __getattr__
(name, self._v__nodePath)
AttributeError: Attribute 'pandas_type' does not exist in node: '/data'
Does anybody know what I am doing wrong? Could the problem be that my *.h5 file is not really what I think it is (that is, not data in HDF5 format)?

In your /usr/lib/pymodules/python2.7/pandas/io/pytables.py, line 133
kind = v._v_attrs.pandas_type
In my pytables.py I see
kind = getattr(n._v_attrs,'pandas_type',None)
By using getattr, if there is no pandas_type attribute, then kind is set to None. I'm guessing my version of Pandas
In [7]: import pandas as pd
In [8]: pd.__version__
Out[8]: '0.10.0'
is newer than yours. If so, the fix is to upgrade your pandas.
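To illustrate the difference between the two lines, here is a minimal sketch that needs neither pandas nor PyTables. FakeAttrs is a made-up stand-in for the PyTables attribute set (it is not the real class); it only mimics raising AttributeError when an attribute is missing.

```python
class FakeAttrs:
    """Hypothetical stand-in for a PyTables attribute set; not the real class."""

    def __init__(self, **attrs):
        self._attrs = attrs

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self._attrs[name]
        except KeyError:
            raise AttributeError(
                "Attribute %r does not exist in node: '/data'" % name
            )


# A node written without pandas has no pandas_type attribute.
plain_node = FakeAttrs(TITLE="written without pandas")
pandas_node = FakeAttrs(pandas_type="frame")

# Newer style: a missing attribute quietly becomes None.
kind_plain = getattr(plain_node, "pandas_type", None)
kind_pandas = getattr(pandas_node, "pandas_type", None)

# Older style: direct attribute access raises, as in the traceback.
try:
    plain_node.pandas_type
    direct_access_failed = False
except AttributeError:
    direct_access_failed = True
```

So with the getattr form, a node that pandas did not write is simply reported with kind None instead of breaking the repr of the store.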

I had an h5 table made with PyTables, independent of pandas, and needed to turn it into a list of tuples and then import that into a DataFrame. This worked nicely because it lets me use my PyTables index to run a "where" on the input, which saves me from reading all the rows.
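A rough sketch of that approach, assuming PyTables, NumPy and pandas are installed. The helper name read_rows_where, the node path /data and the column names are made up for illustration, and it passes a NumPy structured array to pandas rather than a literal list of tuples.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import tables


def read_rows_where(path, node_path, condition):
    """Return the rows of a PyTables table that match `condition` as a DataFrame."""
    with tables.open_file(path, mode="r") as h5:
        table = h5.get_node(node_path)
        # The condition is evaluated by PyTables, so only the matching
        # rows are loaded (as a NumPy structured array).
        records = table.read_where(condition)
    return pd.DataFrame.from_records(records)


# Build a small throwaway file so the sketch runs on its own.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
rows = np.array(
    [(1, 0.5), (2, 1.5), (3, 2.5), (4, 3.5)],
    dtype=[("id", "i8"), ("value", "f8")],
)
with tables.open_file(demo_path, mode="w") as h5:
    h5.create_table("/", "data", obj=rows)

df = read_rows_where(demo_path, "/data", "value > 1.0")
```

This sidesteps pd.HDFStore entirely, which matters when the file has no pandas metadata such as pandas_type.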

Related

Facing Weird Issue using pandas-0.24.2

I have been using Python and pandas on Windows 10 for the last month and had not seen the issue below until it suddenly appeared. I have a CSV file that I read with pandas. However, the DataFrame arbitrarily joins the comma-separated headings into one and, while doing so, drops some characters, so my very simple code fails. Has anyone seen this kind of problem? Suggestions for overcoming it would be a great help.
I was trying to check that the dates are in 'yyyy-mm-dd' format. Since I got the error, I added a print statement to check the column names.
I reinstalled Python 3.6.8, pandas, etc., but that did not help.
import pandas as pd
df = pd.read_csv('Data.csv','r')
print(df.columns)
for pdt in df.PublicDate:
    try:
        dat = pdt[0:10]
        if dat[4] != '-' or dat[7] != '-':
            print('\nPub Date Format Error',dat)
    except TypeError as e:
        print(e)
Test Data csv file has:
PIC,PublicDate,Version,OriginalDate,BPD
ABCD,2019-06-15T19:25:22.000000000Z,1,2019-06-1519.25.22.0000000000,15-06-2019
EFGH,06/15/2019T19:26:22.000000000Z,,2019-06-1519.26.22.0000000000,15-06-2019
IJKL,2019-06-15T20:26:22.000000000Z,1,2019-06-1520.26.22.0000000000,6/25/2019
MNOP,,,2019-06-1520.26.22.0000000000,6/25/2019
QRST,2019-06-15T22:26:22.000000000Z,1,,6/25/2019
Expected:
dates of the format 6/25/2019 should be pointed out for not being in the format 2019-06-25
Actual Result: Below Error
=============== RESTART: H:\Python\DateFormat.py ===============
Index(['PIC,PublicDate,Ve', 'sion,O', 'iginalDate,BPD'], dtype='object')
Traceback (most recent call last):
File "H:\Program Files\Python\DateFormat.py", line 8, in <module>
for pdt in df.PublicDate:
File "G:\Program Files\lib\site-packages\pandas\core\generic.py", line 5067, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'PublicDate'
The problem is the second parameter:
df = pd.read_csv('Data.csv','r')
Without it the example works fine:
df = pd.read_csv('Data.csv')
This happens because the second positional parameter is the separator, not a file access mode. With 'r' as the separator, pandas can still read the file, but it splits each line at every letter r instead of at the commas, so the column names come out wrong.
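The mangled column names in the question can be reproduced with plain string splitting. This is only an approximation of what the parser does with 'r' as the separator, using the header line from the test data:

```python
header = "PIC,PublicDate,Version,OriginalDate,BPD"

# With "r" as the separator, the header is cut at every letter "r",
# and the "r" characters themselves are dropped.
columns_with_r = header.split("r")

# With a comma as the separator, the five expected names come out.
columns_with_comma = header.split(",")

print(columns_with_r)
print(columns_with_comma)
```

The first list matches the Index printed in the question, which is why df.PublicDate does not exist there.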

Open and edit excel via python

I want to import an existing Excel file and edit it. But when I copy the Excel file and try to edit it, I get some errors. I did not get errors when executing the "write" command, but I have a problem when I try to read values from a cell.
import xlsxwriter
from xlrd import open_workbook
from xlwt import Workbook, easyxf
import xlwt
from xlutils.copy import copy
workbook=open_workbook("month.xlsx")
sheet=workbook.sheet_by_index(0)
print sheet.nrows
book = copy(workbook)
w_sheet=book.get_sheet(0)
print w_sheet.cell(0,0).value
Error: Traceback (most recent call last):
File "excel.py", line 18, in <module>
print w_sheet.cell(0,0).value
AttributeError: 'Worksheet' object has no attribute 'cell'
I haven't used this library, but looking at the documentation I think you are trying to do something it doesn't support. The worksheet documentation lists its functionality, and cell() is not there.
I think this library is for writing Excel files only, not reading them.
Perhaps try pandas read_excel() to read the Excel documents you create?
You can then use pandas iloc on the resulting DataFrame to get the value you want:
value=pd.read_excel("file.xlsx", sheet_name="sheet").iloc[0,0]
I think that's correct, although I can't run the code to check just now...

openpyxl read excel with filtered data

With openpyxl, I am reading an excel file which has some filters applied already.
from openpyxl import load_workbook
wb = load_workbook(r'C:\Users\dsivaji\Downloads\testcases.xlsx')
ws = wb['TestCaseList']
print ws['B3'].value
My goal is to loop through the content of column 'B'. With the code above I can read the content of cell 'B3'. But if filters are applied, I don't want to start from the initial cell.
That is, I want to fetch only the rows that are visible in Excel after the filters have been applied.
After searching the web for some time, I found that ws.row_dimensions can help with a visible property, but still no luck.
>>> ws.row_dimensions[1]
<openpyxl.worksheet.dimensions.RowDimension object at 0x03EF5B48>
>>> ws.row_dimensions[2]
<openpyxl.worksheet.dimensions.RowDimension object at 0x03EF5B70>
>>> ws.row_dimensions[3].visible
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'RowDimension' object has no attribute 'visible'
How can I achieve this?
You are almost there. The name of the attribute is hidden. If you replace visible in your code with hidden, it should work.
openpyxl is a library for the OOXML file format (.xlsx) and not a replacement for an application like Microsoft Excel. As such, support for filters is limited to reading and writing their definitions; it does not apply them.
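A minimal sketch of looping over column B while skipping hidden rows, assuming openpyxl is installed. It builds a small workbook in memory instead of loading testcases.xlsx, and it hides row 3 by hand to stand in for a row that a filter has hidden. Rows hidden manually look the same through this attribute.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
for row in range(1, 6):
    ws.cell(row=row, column=2, value="case %d" % row)

# Stand-in for a filtered-out row: mark row 3 as hidden.
ws.row_dimensions[3].hidden = True

# Keep only the column B values whose row is not hidden.
visible_values = [
    ws.cell(row=row, column=2).value
    for row in range(1, ws.max_row + 1)
    if not ws.row_dimensions[row].hidden
]

print(visible_values)
```

For a real file, the same loop should work on the worksheet returned by load_workbook, provided the file was saved by Excel with the filter already applied.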

rpy2 and pandas: PandasError: DataFrame constructor not properly called

I am trying to create a pandas DataFrame from an R Dataframe. I am encountering the following error, which I cannot figure out.
Traceback (most recent call last):
File "", line 1, in
File "/Library/Python/2.7/site-packages/pandas/core/frame.py", line 291, in __init__
raise PandasError('DataFrame constructor not properly called!')
PandasError: DataFrame constructor not properly called!
The code I am using is:
import pandas as pd
import rpy2.robjects as robjects
from rpy2.robjects import r
robjects.r['load']("file.RData")
my_data = pd.DataFrame(r['ops.data'])
and the error comes after the last line.
You need to read the data in sequentially using a for loop. DataFrames don't easily read in data the way you are representing it; they are much better suited to dictionaries. Write some headers and then write the data underneath the headers.
Furthermore, ['ops.data'] specifies "ops.data" as a data header. Obviously, you can't read an entire file in as a column header.

Python cannot import CSV using DataFrame

I am using pandas to import a CSV file into Python.
My code is:
import pandas as pd
data_df = pd.read_csv('highfrequency2.csv')
print data_df.head()
but there is always an error message:
Traceback (most recent call last):
File "G:\Python\sdfasdfasdfasdfasdf.py", line 7, in <module>
import pandas as pd
File "G:\Python\pandas.py", line 9, in <module>
from pandas import DataFrame
ImportError: cannot import name DataFrame
Can someone figure out why? Many thanks!
It looks like you've called one of your own programs pandas:
G:\Python\pandas.py
So this is the one Python is trying to import, and the one which doesn't have a DataFrame object.
Rename your program, delete any cached objects (pandas.pyc or pandas.pyo), and restart your Python interpreter.
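One way to check for this kind of shadowing is to ask Python where it would load a module name from. A small sketch using only the standard library; module_origin is a helper name made up for this example:

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module name would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# A standard-library module resolves to a file inside the Python installation.
json_origin = module_origin("json")

# A name that cannot be found gives None.
missing_origin = module_origin("surely_not_an_installed_module_xyz")

print(json_origin)
print(missing_origin)
```

Calling module_origin("pandas") from the folder of the failing script would be expected to show G:\Python\pandas.py rather than a path under site-packages, which confirms the local file is the one being imported.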
