Python Pandas convert date to epoch timestamp - python

From a CSV file, I'm trying to use pandas to convert a date column to an epoch timestamp as follows, but I get some errors:
csv:
<<Electric power and temperature Information>>
Date,Electric power average,Electric power maximum value,Electric power minimum value,...,...
2021/12/02 00:00:00,1524,1553,1506,22,22,22,,,,,,,21,21,21,,,,,,,,,,,,,,,,,,,,,,,,
2021/12/01 22:00:00,1521,1547,1468,22,22,22,,,,,,,21,21,21,,,,,,,,,,,,,,,,,,,,,,,,
2021/12/01 20:00:00,1546,1613,1524,22,22,22,,,,,,,21,21,21,,,,,,,,,,,,,,,,,,,,,,,,
2021/12/01 18:00:00,1553,1595,1525,22,22,22,,,,,,,21,21,21,,,,,,,,,,,,,,,,,,,,,,,,
2021/12/01 16:00:00,1541,1593,1520,22,22,22,,,,,,,21,21,21,,,,,,,,,,,,,,,,,,,,,,,,
2021/12/01 14:00:00,1540,1580,1514,22,22,22,,,,,,,21,21,21,,,,,,,,,,,,,,,,,,,,,,,,
code:
csv_envfile = csvfile.csv
df = pd.read_csv(csv_envfile[0], skiprows=[0])
date_pattern='%Y/%m/%d %H:%M:%S '
df['epoch'] = df.apply(lambda row: int(time.mktime(time.strptime(row.time,date_pattern))), axis=0) # create epoch as a column
print("epoch:",df['epoch'])
error:
Traceback (most recent call last):
File "./02-pickle-client.py", line 622, in <module>
main()
File "./02-pickle-client.py", line 576, in main
execute_run_csv_environnement(confcsv_path, storage_type, serial)
File "./02-pickle-client.py", line 434, in execute_run_csv_environnement
run_csv_environnement(sock, delay, csvfile, storage_type, serial)
File "./02-pickle-client.py", line 402, in run_csv_environnement
df['epoch'] = df.apply(lambda row: int(time.mktime(time.strptime(row.time,date_pattern))), axis=0) # create epoch as a column
File "/usr/local/lib64/python3.6/site-packages/pandas/core/frame.py", line 7552, in apply
return op.get_result()
File "/usr/local/lib64/python3.6/site-packages/pandas/core/apply.py", line 185, in get_result
return self.apply_standard()
File "/usr/local/lib64/python3.6/site-packages/pandas/core/apply.py", line 276, in apply_standard
results, res_index = self.apply_series_generator()
File "/usr/local/lib64/python3.6/site-packages/pandas/core/apply.py", line 305, in apply_series_generator
results[i] = self.f(v)
File "./02-pickle-client.py", line 402, in <lambda>
df['epoch'] = df.apply(lambda row: int(time.mktime(time.strptime(row.time,date_pattern))), axis=0) # create epoch as a column
File "/usr/local/lib64/python3.6/site-packages/pandas/core/generic.py", line 5141, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'time'
Many thanks for any help.

The error occurs because `row.time` refers to a column named `time`, which does not exist, and `axis=0` passes whole columns rather than rows to the lambda. Select the Date column and apply the conversion element-wise instead:
import pandas as pd
import time

df = pd.read_csv("csvfile.csv", skiprows=[0])
date_pattern = '%Y/%m/%d %H:%M:%S'  # note: no trailing space, unlike the pattern in the question
df['epoch'] = df["Date"].apply(lambda d: int(time.mktime(time.strptime(d, date_pattern))))
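An alternative worth noting (not part of the original answer): pandas can do this conversion vectorized, without a per-row lambda. Be aware that `pd.to_datetime` treats these naive timestamps as UTC, whereas `time.mktime` interprets them in the machine's local timezone, so the two approaches differ by the local UTC offset:

```python
import pandas as pd

# sample values taken from the CSV above
df = pd.DataFrame({"Date": ["2021/12/02 00:00:00", "2021/12/01 22:00:00"]})
# parse the whole column once, then convert nanoseconds since the epoch to seconds
parsed = pd.to_datetime(df["Date"], format="%Y/%m/%d %H:%M:%S")
df["epoch"] = parsed.astype("int64") // 10**9
print(df["epoch"].tolist())
```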

Related

Handling OutOfBoundsDatetime error in pandas

I am trying to apply a few functions to a pandas data frame, but I am getting an OutOfBoundsDatetime error:
def data_cleanser(data):
    clean_df = data.str.strip().str.replace(r'\\', '')
    return clean_df
connection = pyodbc.connect(conn)
sql = 'SELECT * FROM {}'.format(tablename)
df = pd.read_sql_query(sql, connection)
df.replace([None], np.nan, inplace=True)
df.fillna('', inplace=True)
df=df.applymap(str)
df = df.apply(lambda x: data_cleanser(x))
Error message:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Python_Scripts/lib64/python3.4/site-packages/pandas/core/internals.py", line 763, in replace
copy=not inplace) for b in blocks]
File "/Python_Scripts/lib64/python3.4/site-packages/pandas/core/internals.py", line 763, in <listcomp>
copy=not inplace) for b in blocks]
File "/Python_Scripts/lib64/python3.4/site-packages/pandas/core/internals.py", line 2135, in convert
blocks = self.split_and_operate(None, f, False)
File "/Python_Scripts/lib64/python3.4/site-packages/pandas/core/internals.py", line 478, in split_and_operate
nv = f(m, v, i)
File "/Python_Scripts/lib64/python3.4/site-packages/pandas/core/internals.py", line 2125, in f
values = fn(v.ravel(), **fn_kwargs)
File "/Python_Scripts/lib64/python3.4/site-packages/pandas/core/dtypes/cast.py", line 807, in soft_convert_objects
values = lib.maybe_convert_objects(values, convert_datetime=datetime)
File "pandas/_libs/src/inference.pyx", line 1290, in pandas._libs.lib.maybe_convert_objects
File "pandas/_libs/tslib.pyx", line 1575, in pandas._libs.tslib.convert_to_tsobject
File "pandas/_libs/tslib.pyx", line 1669, in pandas._libs.tslib.convert_datetime_to_tsobject
File "pandas/_libs/tslib.pyx", line 1848, in pandas._libs.tslib._check_dts_bounds
pandas._libs.tslib.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-01 07:00:00
Sample data frame:
appointment_id  start_time           end_time             emp_id
302205          2016-10-26 17:30:00  2016-10-26 18:30:00  45807
462501          2017-04-10 13:00:00  NaT                  45807
How can I avoid this error?
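For context, a nanosecond-resolution pandas.Timestamp can only represent dates between roughly 1677 and 2262, and the "1-01-01 07:00:00" value in the traceback falls outside that range. A common workaround (not from the original thread) is to parse the datetime columns explicitly and coerce out-of-bounds values to NaT before any other processing:

```python
import pandas as pd

# a valid timestamp and a year-1 value like the one in the traceback
s = pd.Series(["2016-10-26 17:30:00", "0001-01-01 07:00:00"])
# errors="coerce" turns out-of-bounds/unparseable values into NaT instead of raising
parsed = pd.to_datetime(s, errors="coerce")
print(parsed.isna().tolist())
```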

KeyError: 0 when used in existing function, otherwise the code works fine

I want to do the following:
I have data in long format organized by dates
Sometimes, data is missing as there is no record of it
I found a solution that interpolates missing data using reindex, which works fine when used outside of a function but, for some reason, doesn't work when used inside one
def sum_customer_portfolio(country, sold_to):
    df = pd.merge(etl_customer_portfolio(), etl_week(), how="left", on=["Country", "GCAS"])
    df = df.loc[df["Country"].isin(country)]
    df = df.loc[df["Sold_to"].isin(sold_to)]
    df_week = etl_week()
    df_week = df_week.dropna(subset=["Sold_to"])
    df_week = df_week[["Week_num", "Date_range"]]
    df_week = df_week.drop_duplicates(subset=["Date_range"])
    sum_df = pd.merge(df, df_week, how="outer", on=["Week_num", "Date_range"])
    sum_df["Stat_unit_qty"] = sum_df["Stat_unit_qty"].fillna(0, axis=0)
    sum_df[["Country", "Sold_to", "Customer"]] = sum_df[["Country", "Sold_to", "Customer"]].fillna(method="ffill", axis=0)
    sum_df = sum_df.fillna("DUMMY_NOT_USE").replace("DUMMY_NOT_USE", np.nan)
    reindex_subset = sum_df[["GCAS", "Week_num", "Stat_unit_qty"]]
    reindex_subset = reindex_subset.dropna()
    reindex_subset = reindex_subset.set_index("Week_num")
    reindex_subset = (reindex_subset.groupby("GCAS")
                      .apply(lambda x: x.reindex(list(range(reindex_subset.index.min(),
                                                            reindex_subset.index.max() + 1)),
                                                 fill_value=0))
                      .drop("GCAS", axis=1)
                      .reset_index("GCAS")
                      .fillna(0)
                      .reset_index())
    reindex_subset = reindex_subset.drop(columns=["Stat_unit_qty"])
    final_df = pd.merge(sum_df, reindex_subset, how="outer", on=["GCAS", "Week_num"])
    current_date = datetime.now().strftime("%d%m%Y_%H%M%S")
    # return sum_df.to_excel(f"CUSTOMER_PORTFOLIO-{current_date}.xlsx", sheet_name="GCAS_SUM", index=False)
    return final_df
Code above keeps giving me the following error:
Traceback (most recent call last):
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\indexes\base.py", line 3361, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 103, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 135, in pandas._libs.index.IndexEngine._get_loc_duplicates
File "pandas\_libs\index_class_helper.pxi", line 51, in pandas._libs.index.Float64Engine._maybe_get_bool_indexer
File "pandas\_libs\index.pyx", line 161, in pandas._libs.index.IndexEngine._unpack_bool_indexer
KeyError: 0
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\main.py", line 167, in <module>
sum_customer_portfolio(country=["Croatia", "Slovenia"], sold_to=[2000829798, 2000558171]).to_excel(writer, index=False, sheet_name="GCAS_SUM")
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\main.py", line 113, in sum_customer_portfolio
reindex_subset = (reindex_subset.groupby(["GCAS", "Sold_to"]).apply(
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\groupby\groupby.py", line 1253, in apply
result = self._python_apply_general(f, self._selected_obj)
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\groupby\groupby.py", line 1287, in _python_apply_general
keys, values, mutated = self.grouper.apply(f, data, self.axis)
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\groupby\ops.py", line 783, in apply
result_values, mutated = splitter.fast_apply(f, sdata, group_keys)
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\groupby\ops.py", line 1328, in fast_apply
return libreduction.apply_frame_axis0(sdata, f, names, starts, ends)
File "pandas\_libs\reduction.pyx", line 369, in pandas._libs.reduction.apply_frame_axis0
File "pandas\_libs\reduction.pyx", line 428, in pandas._libs.reduction.BlockSlider.__init__
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\frame.py", line 3430, in __getitem__
indexer = convert_to_index_sliceable(self, key)
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\indexing.py", line 2329, in convert_to_index_sliceable
return idx._convert_slice_indexer(key, kind="getitem")
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\indexes\numeric.py", line 242, in _convert_slice_indexer
return self.slice_indexer(key.start, key.stop, key.step, kind=kind)
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\indexes\base.py", line 5686, in slice_indexer
start_slice, end_slice = self.slice_locs(start, end, step=step)
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\indexes\base.py", line 5894, in slice_locs
end_slice = self.get_slice_bound(end, "right")
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\indexes\base.py", line 5808, in get_slice_bound
raise err
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\indexes\base.py", line 5802, in get_slice_bound
slc = self.get_loc(label)
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\indexes\base.py", line 3363, in get_loc
raise KeyError(key) from err
KeyError: 0
When loading the data directly from Excel (the same data produced by the function), for example "CUSTOMER_PORTFOLIO-11082021_234057.xlsx", and running the following code:
sum_df = pd.read_excel("CUSTOMER_PORTFOLIO-11082021_234057.xlsx")
reindex_subset = sum_df[["GCAS", "Week_num", "Stat_unit_qty"]]
reindex_subset = reindex_subset.dropna()
reindex_subset = reindex_subset.set_index("Week_num")
reindex_subset = (reindex_subset.groupby("GCAS")
                  .apply(lambda x: x.reindex(list(range(reindex_subset.index.min(),
                                                        reindex_subset.index.max() + 1)),
                                             fill_value=0))
                  .drop("GCAS", axis=1)
                  .reset_index("GCAS")
                  .fillna(0)
                  .reset_index())
reindex_subset = reindex_subset.drop(columns=["Stat_unit_qty"])
final_df = pd.merge(sum_df, reindex_subset, how="outer", on=["GCAS", "Week_num"])
The code gives me results that I want.
What am I missing? I tried searching for this on Stack Overflow, but no success so far. I have tried resetting the index, but unfortunately it didn't help.
UPDATE: Pasted the full error traceback. Moreover, as I said above, when I run the function without the part of the code that "reindexes" the data, it works just fine. I have also tried the following, still with no luck:
df_new = df.copy(deep=True)
df_week= df_week.copy(deep=True)
And when I run the "reindex" part of the code on a finished .xlsx file, it works just fine, which is strange in itself.
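For reference, the reindex-per-group pattern the function relies on can be reproduced in isolation on toy data (the values below are hypothetical), which helps narrow down whether the failure comes from the pattern itself or from the state of the data inside the function:

```python
import pandas as pd

df = pd.DataFrame({
    "GCAS": ["A", "A", "B"],
    "Week_num": [1, 3, 2],
    "Stat_unit_qty": [10, 20, 30],
})
# fill every missing week per GCAS with a zero quantity
weeks = list(range(df["Week_num"].min(), df["Week_num"].max() + 1))
filled = (df.set_index("Week_num")
            .groupby("GCAS")
            .apply(lambda g: g.drop(columns="GCAS").reindex(weeks, fill_value=0))
            .reset_index())
print(filled)
```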

How to read json format from binance api using pandas?

I want to get live cryptocurrency prices from the Binance REST API.
I am using:
def inCoin(coin):
    url = 'https://api.binance.com/api/v3/ticker/price?symbol=' + coin + 'USDT'
    df = pd.read_json(url)
    df.columns = ["symbol", "price"]
    return df
It gives the following error when this function is called:
Traceback (most recent call last):
File "ee2.py", line 201, in <module>
aa = inCoin('BTC')
File "ee2.py", line 145, in inCoin
df = pd.read_json(url, orient='index')
File "/home/hspace/.local/lib/python3.6/site-packages/pandas/io/json/json.py", line 422, in read_json
result = json_reader.read()
File "/home/hspace/.local/lib/python3.6/site-packages/pandas/io/json/json.py", line 529, in read
obj = self._get_object_parser(self.data)
File "/home/hspace/.local/lib/python3.6/site-packages/pandas/io/json/json.py", line 546, in _get_object_parser
obj = FrameParser(json, **kwargs).parse()
File "/home/hspace/.local/lib/python3.6/site-packages/pandas/io/json/json.py", line 638, in parse
self._parse_no_numpy()
File "/home/hspace/.local/lib/python3.6/site-packages/pandas/io/json/json.py", line 861, in _parse_no_numpy
loads(json, precise_float=self.precise_float), dtype=None).T
File "/home/hspace/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 348, in __init__
mgr = self._init_dict(data, index, columns, dtype=dtype)
File "/home/hspace/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 459, in _init_dict
return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
File "/home/hspace/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 7356, in _arrays_to_mgr
index = extract_index(arrays)
File "/home/hspace/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 7393, in extract_index
raise ValueError('If using all scalar values, you must pass'
ValueError: If using all scalar values, you must pass an index
Previously, I used this function to call historical data from binance api:
def Cryptodata2(symbol, tick_interval='1m'):
    url = 'https://api.binance.com/api/v1/klines?symbol=' + symbol + '&interval=' + tick_interval
    df = pd.read_json(url)
    df.columns = ["date", "open", "high", "low", "close", "volume",
                  "close time", "quote asset volume", "number of trades",
                  "taker buy base asset volume", "Taker buy quote asset volume", "ignore"]
    df['date'] = pd.to_datetime(df['date'], dayfirst=True, unit='ms')
    df.set_index('date', inplace=True)
    del df['ignore']
    return df
And this works fine.
I just want the price of that coin, as a number or a dataframe, from this URL:
https://api.binance.com/api/v3/ticker/price?symbol=BTCUSDT
Thanks for helping me. Also, it would be great if you could give more detail on how to debug such "value" errors.
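The difference between the two endpoints explains the error: the klines endpoint returns a JSON array (one row per candle), which pandas can turn into a frame directly, while the ticker endpoint returns a single JSON object whose values are all scalars, hence "If using all scalar values, you must pass an index". A minimal sketch of the fix (the sample payload below is hypothetical; the real one comes from the URL above) is to wrap the object in a list before building the frame:

```python
import pandas as pd

def ticker_to_frame(payload):
    # a list of dicts gives pandas a row axis; a bare dict of scalars does not
    return pd.DataFrame([payload])

# shape of the /api/v3/ticker/price response: one object with two string fields
sample = {"symbol": "BTCUSDT", "price": "43250.10"}
df = ticker_to_frame(sample)
print(float(df.loc[0, "price"]))
```

In the original function this would mean fetching the JSON yourself (e.g. with `requests.get(url).json()`) and passing the resulting dict to `pd.DataFrame([...])`, instead of pointing `pd.read_json` at the URL.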

MemoryError merging two dataframes with pandas and dask - how can I do this?

I have two dataframes in pandas that I would like to merge, but I keep running into MemoryErrors. What workaround could I use?
Here is the setup:
import pandas as pd
df1 = pd.read_csv("first1.csv")
df2 = pd.read_csv("second2.csv")
print(df1.shape) # output: (4757076, 4)
print(df2.shape) # output: (428764, 45)
df1.head()
  column1  begin    end category
0  class1  10001  10468    third
1  class1  10469  11447    third
2  class1  11505  11675   fourth
3  class2  15265  15355  seventh
4  class2  15798  15849   second
df2.head()
  column1  begin ....
0  class1  10524 ....
1  class1  10541 ....
2  class1  10549 ....
3  class1  10565 ...
4  class1  10596 ...
I would simply like to merge these two DataFrames on "column1". However, this always causes a memory error.
Let's try this in pandas first, on a system with approximately 2 TB of RAM and hundreds of threads:
import pandas as pd
df1 = pd.read_csv("first1.csv")
df2 = pd.read_csv("second2.csv")
merged = pd.merge(df1, df2, on="column1", how="outer", suffixes=("", "_repeated"))
Here's the error I get:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/nfs/sw/python/python-3.5.1/lib/python3.5/site-packages/pandas/tools/merge.py", line 39, in merge
return op.get_result()
File "/nfs/sw/python/python-3.5.1/lib/python3.5/site-packages/pandas/tools/merge.py", line 217, in get_result
join_index, left_indexer, right_indexer = self._get_join_info()
File "/nfs/sw/python/python-3.5.1/lib/python3.5/site-packages/pandas/tools/merge.py", line 353, in _get_join_info
sort=self.sort, how=self.how)
File "/nfs/sw/python/python-3.5.1/lib/python3.5/site-packages/pandas/tools/merge.py", line 559, in _get_join_indexers
return join_func(lkey, rkey, count, **kwargs)
File "pandas/src/join.pyx", line 160, in pandas.algos.full_outer_join (pandas/algos.c:61256)
MemoryError
That didn't work. Let's try with dask:
import pandas as pd
import dask.dataframe as dd
from numpy import nan
ddf1 = dd.from_pandas(df1, npartitions=2)
ddf2 = dd.from_pandas(df2, npartitions=2)
merged = dd.merge(ddf1, ddf2, on="column1", how="outer", suffixes=("","_repeat")).compute(num_workers=60)
Here's the error I get:
Traceback (most recent call last):
File "repeat_finder.py", line 15, in <module>
merged = dd.merge(ddf1, ddf2,on="column1", how="outer", suffixes=("","_repeat")).compute(num_workers=60)
File "/path/python3.5/site-packages/dask/base.py", line 78, in compute
return compute(self, **kwargs)[0]
File "/path/python3.5/site-packages/dask/base.py", line 178, in compute
results = get(dsk, keys, **kwargs)
File "/path/python3.5/site-packages/dask/threaded.py", line 69, in get
**kwargs)
File "/path/python3.5/site-packages/dask/async.py", line 502, in get_async
raise(remote_exception(res, tb))
dask.async.MemoryError:
Traceback
---------
File "/path/python3.5/site-packages/dask/async.py", line 268, in execute_task
result = _execute_task(task, data)
File "/path/python3.5/site-packages/dask/async.py", line 249, in _execute_task
return func(*args2)
File "/path/python3.5/site-packages/dask/dataframe/methods.py", line 221, in merge
suffixes=suffixes, indicator=indicator)
File "/path/python3.5/site-packages/pandas/tools/merge.py", line 59, in merge
return op.get_result()
File "/path/python3.5/site-packages/pandas/tools/merge.py", line 503, in get_result
join_index, left_indexer, right_indexer = self._get_join_info()
File "/path/python3.5/site-packages/pandas/tools/merge.py", line 667, in _get_join_info
right_indexer) = self._get_join_indexers()
File "/path/python3.5/site-packages/pandas/tools/merge.py", line 647, in _get_join_indexers
how=self.how)
File "/path/python3.5/site-packages/pandas/tools/merge.py", line 876, in _get_join_indexers
return join_func(lkey, rkey, count, **kwargs)
File "pandas/src/join.pyx", line 226, in pandas._join.full_outer_join (pandas/src/join.c:11286)
File "pandas/src/join.pyx", line 231, in pandas._join._get_result_indexer (pandas/src/join.c:11474)
File "path/python3.5/site-packages/pandas/core/algorithms.py", line 1072, in take_nd
out = np.empty(out_shape, dtype=dtype, order='F')
How could I get this to work, even if it was shamelessly inefficient?
EDIT: In response to the suggestion of merging on two columns/indices, I don't think I can do this. Here is the code I am trying to run:
import pandas as pd
import dask.dataframe as dd
df1 = pd.read_csv("first1.csv")
df2 = pd.read_csv("second2.csv")
ddf1 = dd.from_pandas(df1, npartitions=2)
ddf2 = dd.from_pandas(df2, npartitions=2)
merged = dd.merge(ddf1, ddf2, on="column1", how="outer", suffixes=("","_repeat")).compute(num_workers=60)
merged = merged[(ddf1.column1 == row.column1) & (ddf2.begin >= ddf1.begin) & (ddf2.begin <= ddf1.end)]
merged = dd.merge(ddf2, merged, on = ["column1"]).compute(num_workers=60)
merged.to_csv("output.csv", index=False)
You can't merge the two data frames on column1 alone, as column1 is not a unique identifier for each instance in either data frame. Try:
merged = pd.merge(df1, df2, on=["column1", "begin"], how="outer", suffixes=("","_repeated"))
If df2 also has an end column, you may need to try:
merged = pd.merge(df1, df2, on=["column1", "begin", "end"], how="outer", suffixes=("","_repeated"))
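To see why the single-key outer join runs out of memory, consider how duplicate keys multiply: every left row pairs with every right row that shares the key. A toy sketch (hypothetical data):

```python
import pandas as pd

left = pd.DataFrame({"column1": ["class1"] * 3, "begin": [1, 2, 3]})
right = pd.DataFrame({"column1": ["class1"] * 4, "other": [10, 20, 30, 40]})

# 3 left rows x 4 right rows sharing the key -> 12 result rows
merged = pd.merge(left, right, on="column1", how="outer")
print(len(merged))
```

With ~4.7 M rows in df1 and ~430 K in df2 concentrated in a handful of column1 classes, the single-key join result can easily run to billions of rows, which is why adding begin (and end) to the join keys keeps the result bounded.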

"Already tz-aware" error when reading h5 file using pandas, python 3 (but not 2)

I have an h5 store named weather.h5. My default Python environment is 3.5.2. When I try to read this store I get TypeError: Already tz-aware, use tz_convert to convert.
I've tried both pd.read_hdf('weather.h5','weather_history') and pd.io.pytables.HDFStore('weather.h5')['weather_history'], but I get the error no matter what.
I can open the h5 in a Python 2.7 environment. Is this a bug in Python 3 / pandas?
I have the same issue. I'm using Anaconda Python 3.4.5 and 2.7.3; both are using pandas 0.18.1.
Here is a reproducible example:
generate.py (to be executed with Python2):
import pandas as pd
from pandas import HDFStore
index = pd.DatetimeIndex(['2017-06-20 06:00:06.984630-05:00', '2017-06-20 06:03:01.042616-05:00'], dtype='datetime64[ns, CST6CDT]', freq=None)
p1 = [0, 1]
p2 = [0, 2]
# Saving any of these dataframes cause issues
df1 = pd.DataFrame({"p1":p1, "p2":p2}, index=index)
df2 = pd.DataFrame({"p1":p1, "p2":p2, "i":index})
store = HDFStore("./test_issue.h5")
store['df'] = df1
#store['df'] = df2
store.close()
read_issue.py:
import pandas as pd
from pandas import HDFStore
store = HDFStore("./test_issue.h5", mode="r")
df = store['/df']
store.close()
print(df)
Running read_issue.py in Python 2 has no issues and produces this output:
p1 p2
2017-06-20 11:00:06.984630-05:00 0 0
2017-06-20 11:03:01.042616-05:00 1 2
But running it in Python 3 fails with this traceback:
Traceback (most recent call last):
File "read_issue.py", line 5, in <module>
df = store['df']
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 417, in __getitem__
return self.get(key)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 634, in get
return self._read_group(group)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 1272, in _read_group
return s.read(**kwargs)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 2779, in read
ax = self.read_index('axis%d' % i)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 2367, in read_index
_, index = self.read_index_node(getattr(self.group, key))
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 2492, in read_index_node
_unconvert_index(data, kind, encoding=self.encoding), **kwargs)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/indexes/base.py", line 153, in __new__
result = DatetimeIndex(data, copy=copy, name=name, **kwargs)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/util/decorators.py", line 91, in wrapper
return func(*args, **kwargs)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/tseries/index.py", line 321, in __new__
raise TypeError("Already tz-aware, use tz_convert "
TypeError: Already tz-aware, use tz_convert to convert.
Closing remaining open files:./test_issue.h5...done
So, there is an issue with indices. However, if you save df2 in generate.py (datetime as a column, not as an index), then running read_issue.py in Python 3 produces a different error:
Traceback (most recent call last):
File "read_issue.py", line 5, in <module>
df = store['/df']
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 417, in __getitem__
return self.get(key)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 634, in get
return self._read_group(group)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 1272, in _read_group
return s.read(**kwargs)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 2788, in read
placement=items.get_indexer(blk_items))
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/core/internals.py", line 2518, in make_block
return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/core/internals.py", line 90, in __init__
len(self.mgr_locs)))
ValueError: Wrong number of items passed 2, placement implies 1
Closing remaining open files:./test_issue.h5...done
Also, if you execute generate.py in Python 3 (saving either df1 or df2), then there is no problem running read_issue.py in either Python 3 or Python 2.
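As an aside, the error message itself comes from the distinction between localizing and converting: tz_localize attaches a timezone to naive timestamps and raises exactly this TypeError when the index is already tz-aware, while tz_convert translates an aware index between timezones. A small sketch:

```python
import pandas as pd

# 06:00 in June under CST6CDT is daylight time, i.e. UTC-5
idx = pd.DatetimeIndex(["2017-06-20 06:00:06"]).tz_localize("CST6CDT")
try:
    # localizing an already-aware index is the operation the traceback complains about
    idx.tz_localize("UTC")
except TypeError as exc:
    print(exc)
converted = idx.tz_convert("UTC")  # the supported direction
print(converted[0].hour)
```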
