Python Pandas convert date to epoch timestamp
From a CSV file, I'm trying to use pandas to convert a date column to an epoch timestamp as follows, but I get some errors:
csv:
<<Electric power and temperature Information>>
Date,Electric power average,Electric power maximum value,Electric power minimum value,...,...
2021/12/02 00:00:00,1524,1553,1506,22,22,22,,,,,,,21,21,21,,,,,,,,,,,,,,,,,,,,,,,,
2021/12/01 22:00:00,1521,1547,1468,22,22,22,,,,,,,21,21,21,,,,,,,,,,,,,,,,,,,,,,,,
2021/12/01 20:00:00,1546,1613,1524,22,22,22,,,,,,,21,21,21,,,,,,,,,,,,,,,,,,,,,,,,
2021/12/01 18:00:00,1553,1595,1525,22,22,22,,,,,,,21,21,21,,,,,,,,,,,,,,,,,,,,,,,,
2021/12/01 16:00:00,1541,1593,1520,22,22,22,,,,,,,21,21,21,,,,,,,,,,,,,,,,,,,,,,,,
2021/12/01 14:00:00,1540,1580,1514,22,22,22,,,,,,,21,21,21,,,,,,,,,,,,,,,,,,,,,,,,
code:
csv_envfile = csvfile.csv
df = pd.read_csv(csv_envfile[0], skiprows=[0])
date_pattern='%Y/%m/%d %H:%M:%S '
df['epoch'] = df.apply(lambda row: int(time.mktime(time.strptime(row.time,date_pattern))), axis=0) # create epoch as a column
print("epoch:",df['epoch'])
error:
Traceback (most recent call last):
File "./02-pickle-client.py", line 622, in <module>
main()
File "./02-pickle-client.py", line 576, in main
execute_run_csv_environnement(confcsv_path, storage_type, serial)
File "./02-pickle-client.py", line 434, in execute_run_csv_environnement
run_csv_environnement(sock, delay, csvfile, storage_type, serial)
File "./02-pickle-client.py", line 402, in run_csv_environnement
df['epoch'] = df.apply(lambda row: int(time.mktime(time.strptime(row.time,date_pattern))), axis=0) # create epoch as a column
File "/usr/local/lib64/python3.6/site-packages/pandas/core/frame.py", line 7552, in apply
return op.get_result()
File "/usr/local/lib64/python3.6/site-packages/pandas/core/apply.py", line 185, in get_result
return self.apply_standard()
File "/usr/local/lib64/python3.6/site-packages/pandas/core/apply.py", line 276, in apply_standard
results, res_index = self.apply_series_generator()
File "/usr/local/lib64/python3.6/site-packages/pandas/core/apply.py", line 305, in apply_series_generator
results[i] = self.f(v)
File "./02-pickle-client.py", line 402, in <lambda>
df['epoch'] = df.apply(lambda row: int(time.mktime(time.strptime(row.time,date_pattern))), axis=0) # create epoch as a column
File "/usr/local/lib64/python3.6/site-packages/pandas/core/generic.py", line 5141, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'time'
Many thanks for your help.
You should select the Date column when applying the lambda function: with axis=0, apply passes each whole column to the function as a Series, which has no .time attribute (and your frame has no time column anyway, only Date). Apply the conversion element-wise on df["Date"] instead, and drop the trailing space from date_pattern, since the dates in the file don't end with one. In your case this should work:
import pandas as pd
import time
csv_envfile = ["csvfile.csv"]  # hypothetical list of CSV paths, standing in for the question's csvfile variable
df = pd.read_csv(csv_envfile[0], skiprows=[0])
date_pattern='%Y/%m/%d %H:%M:%S'
df['epoch'] = df["Date"].apply(lambda row: int(time.mktime(time.strptime(row,date_pattern))))
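As a side note, pd.to_datetime gives a vectorized alternative that avoids the Python-level loop. A minimal sketch, with one caveat: time.mktime interprets the timestamps in your local timezone, while this version treats them as UTC, so pick whichever matches your data:
df["Date"] = pd.to_datetime(df["Date"], format="%Y/%m/%d %H:%M:%S")
# integer seconds since the Unix epoch, computed column-wise
df["epoch"] = (df["Date"] - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")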
Related
Handling OutOfBoundsDatetime error in pandas
I am trying to apply a few functions to a pandas data frame, but I am getting an OutOfBoundsDatetime error.
def data_cleanser(data):
    clean_df = data.str.strip().str.replace(r'\\', '')
    return clean_df

connection = pyodbc.connect(conn)
sql = 'SELECT * FROM {}'.format(tablename)
df = pd.read_sql_query(sql, connection)
df.replace([None], np.nan, inplace=True)
df.fillna('', inplace=True)
df = df.applymap(str)
df = df.apply(lambda x: data_cleanser(x))
Error message:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Python_Scripts/lib64/python3.4/site-packages/pandas/core/internals.py", line 763, in replace
copy=not inplace) for b in blocks]
File "/Python_Scripts/lib64/python3.4/site-packages/pandas/core/internals.py", line 763, in <listcomp>
copy=not inplace) for b in blocks]
File "/Python_Scripts/lib64/python3.4/site-packages/pandas/core/internals.py", line 2135, in convert
blocks = self.split_and_operate(None, f, False)
File "/Python_Scripts/lib64/python3.4/site-packages/pandas/core/internals.py", line 478, in split_and_operate
nv = f(m, v, i)
File "/Python_Scripts/lib64/python3.4/site-packages/pandas/core/internals.py", line 2125, in f
values = fn(v.ravel(), **fn_kwargs)
File "/Python_Scripts/lib64/python3.4/site-packages/pandas/core/dtypes/cast.py", line 807, in soft_convert_objects
values = lib.maybe_convert_objects(values, convert_datetime=datetime)
File "pandas/_libs/src/inference.pyx", line 1290, in pandas._libs.lib.maybe_convert_objects
File "pandas/_libs/tslib.pyx", line 1575, in pandas._libs.tslib.convert_to_tsobject
File "pandas/_libs/tslib.pyx", line 1669, in pandas._libs.tslib.convert_datetime_to_tsobject
File "pandas/_libs/tslib.pyx", line 1848, in pandas._libs.tslib._check_dts_bounds
pandas._libs.tslib.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-01 07:00:00
Sample data frame:
appointment_id  start_time           end_time             emp_id
302205          2016-10-26 17:30:00  2016-10-26 18:30:00  45807
462501          2017-04-10 13:00:00  NaT                  45807
How can I avoid this error?
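One way to sidestep this, offered as a hedged sketch rather than a confirmed fix: assuming the out-of-bounds values such as 1-01-01 live in the datetime columns (start_time, end_time in the sample), convert those columns explicitly before the replace/applymap chain, so out-of-range dates become NaT instead of raising during pandas' implicit object-dtype conversion:
import pandas as pd

# assumed column names from the sample frame; errors="coerce" turns
# out-of-bounds dates into NaT instead of raising OutOfBoundsDatetime
for col in ("start_time", "end_time"):
    df[col] = pd.to_datetime(df[col], errors="coerce")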
KeyError: 0 when used in existing function, otherwise the code works fine
I want to do the following: I have data in long format, organized by dates. Sometimes data is missing, as there is no record of it. I found a solution that interpolates the missing data using reindex; it works fine when used outside of a function, but for some reason it doesn't work when used inside one.
def sum_customer_portfolio(country, sold_to):
    df = pd.merge(etl_customer_portfolio(), etl_week(), how="left", on=["Country", "GCAS"])
    df = df.loc[df["Country"].isin(country)]
    df = df.loc[df["Sold_to"].isin(sold_to)]
    df_week = etl_week()
    df_week = df_week.dropna(subset=["Sold_to"])
    df_week = df_week[["Week_num", "Date_range"]]
    df_week = df_week.drop_duplicates(subset=["Date_range"])
    sum_df = pd.merge(df, df_week, how="outer", on=["Week_num", "Date_range"])
    sum_df["Stat_unit_qty"] = sum_df["Stat_unit_qty"].fillna(0, axis=0)
    sum_df[["Country", "Sold_to", "Customer"]] = sum_df[["Country", "Sold_to", "Customer"]].fillna(method="ffill", axis=0)
    sum_df = sum_df.fillna("DUMMY_NOT_USE").replace("DUMMY_NOT_USE", np.nan)
    reindex_subset = sum_df[["GCAS", "Week_num", "Stat_unit_qty"]]
    reindex_subset = reindex_subset.dropna()
    reindex_subset = reindex_subset.set_index("Week_num")
    reindex_subset = (reindex_subset.groupby("GCAS").apply(
        lambda x: x.reindex(list(range(reindex_subset.index.min(), reindex_subset.index.max() + 1)), fill_value=0))
        .drop("GCAS", axis=1)
        .reset_index("GCAS")
        .fillna(0)
        .reset_index())
    reindex_subset = reindex_subset.drop(columns=["Stat_unit_qty"])
    final_df = pd.merge(sum_df, reindex_subset, how="outer", on=["GCAS", "Week_num"])
    current_date = datetime.now().strftime("%d%m%Y_%H%M%S")
    # return sum_df.to_excel(f"CUSTOMER_PORTFOLIO-{current_date}.xlsx", sheet_name="GCAS_SUM", index=False)
    return final_df
The code above keeps giving me the following error:
Traceback (most recent call last):
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\indexes\base.py", line 3361, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 103, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 135, in pandas._libs.index.IndexEngine._get_loc_duplicates
File "pandas\_libs\index_class_helper.pxi", line 51, in pandas._libs.index.Float64Engine._maybe_get_bool_indexer
File "pandas\_libs\index.pyx", line 161, in pandas._libs.index.IndexEngine._unpack_bool_indexer
KeyError: 0
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\main.py", line 167, in <module>
sum_customer_portfolio(country=["Croatia", "Slovenia"], sold_to=[2000829798, 2000558171]).to_excel(writer, index=False, sheet_name="GCAS_SUM")
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\main.py", line 113, in sum_customer_portfolio
reindex_subset = (reindex_subset.groupby(["GCAS", "Sold_to"]).apply(
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\groupby\groupby.py", line 1253, in apply
result = self._python_apply_general(f, self._selected_obj)
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\groupby\groupby.py", line 1287, in _python_apply_general
keys, values, mutated = self.grouper.apply(f, data, self.axis)
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\groupby\ops.py", line 783, in apply
result_values, mutated = splitter.fast_apply(f, sdata, group_keys)
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\groupby\ops.py", line 1328, in fast_apply
return libreduction.apply_frame_axis0(sdata, f, names, starts, ends)
File "pandas\_libs\reduction.pyx", line 369, in pandas._libs.reduction.apply_frame_axis0
File "pandas\_libs\reduction.pyx", line 428, in pandas._libs.reduction.BlockSlider.__init__
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\frame.py", line 3430, in __getitem__
indexer = convert_to_index_sliceable(self, key)
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\indexing.py", line 2329, in convert_to_index_sliceable
return idx._convert_slice_indexer(key, kind="getitem")
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\indexes\numeric.py", line 242, in _convert_slice_indexer
return self.slice_indexer(key.start, key.stop, key.step, kind=kind)
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\indexes\base.py", line 5686, in slice_indexer
start_slice, end_slice = self.slice_locs(start, end, step=step)
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\indexes\base.py", line 5894, in slice_locs
end_slice = self.get_slice_bound(end, "right")
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\indexes\base.py", line 5808, in get_slice_bound
raise err
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\indexes\base.py", line 5802, in get_slice_bound
slc = self.get_loc(label)
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\indexes\base.py", line 3363, in get_loc
raise KeyError(key) from err
KeyError: 0
When loading the data directly from Excel (the same data produced by the function), for example "CUSTOMER_PORTFOLIO-11082021_234057.xlsx", and running the following code:
sum_df = pd.read_excel("CUSTOMER_PORTFOLIO-11082021_234057.xlsx")
reindex_subset = sum_df[["GCAS", "Week_num", "Stat_unit_qty"]]
reindex_subset = reindex_subset.dropna()
reindex_subset = reindex_subset.set_index("Week_num")
reindex_subset = (reindex_subset.groupby("GCAS").apply(
    lambda x: x.reindex(list(range(reindex_subset.index.min(), reindex_subset.index.max() + 1)), fill_value=0))
    .drop("GCAS", axis=1)
    .reset_index("GCAS")
    .fillna(0)
    .reset_index())
reindex_subset = reindex_subset.drop(columns=["Stat_unit_qty"])
final_df = pd.merge(sum_df, reindex_subset, how="outer", on=["GCAS", "Week_num"])
the code gives me the results that I want. What am I missing? I tried searching for this on SO, but no success as of yet. I have tried resetting the index, but unfortunately it didn't help.
UPDATE: Pasted the full error traceback. Moreover, as I said above, when I run the function without the part of the code that "reindexes" the data, it works just fine. I have also tried the following, still with no luck:
df_new = df.copy(deep=True)
df_week = df_week.copy(deep=True)
And when I run the "reindex" part of the code on a finished .xlsx, it works just fine, which is strange in itself.
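One plausible cause, offered as an assumption rather than a confirmed diagnosis: after the outer merge inside the function, Week_num picks up NaNs and is upcast to float, and a float index breaks both range(min, max + 1) and the positional slicing in groupby.apply's fast path (note the Float64Engine frames in the traceback). Reading from the finished .xlsx would regenerate clean integer dtypes, which would explain why that path works. A minimal sketch of the guard, reusing the question's own names:
# hedged sketch: force an integer Week_num before the groupby/reindex step
reindex_subset = sum_df[["GCAS", "Week_num", "Stat_unit_qty"]].dropna()
reindex_subset["Week_num"] = reindex_subset["Week_num"].astype(int)
reindex_subset = reindex_subset.set_index("Week_num")
full_range = list(range(int(reindex_subset.index.min()), int(reindex_subset.index.max()) + 1))
reindex_subset = (reindex_subset.groupby("GCAS")
                  .apply(lambda x: x.drop("GCAS", axis=1).reindex(full_range, fill_value=0))
                  .reset_index())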
How to read json format from binance api using pandas?
I want to get live prices of cryptocurrency from the Binance REST API. I am using:
def inCoin(coin):
    url = 'https://api.binance.com/api/v3/ticker/price?symbol='+coin+'USDT'
    df = pd.read_json(url)
    df.columns = ["symbol","price"]
    return df
It gives the following error when this function is called:
Traceback (most recent call last):
File "ee2.py", line 201, in <module>
aa = inCoin('BTC')
File "ee2.py", line 145, in inCoin
df = pd.read_json(url, orient='index')
File "/home/hspace/.local/lib/python3.6/site-packages/pandas/io/json/json.py", line 422, in read_json
result = json_reader.read()
File "/home/hspace/.local/lib/python3.6/site-packages/pandas/io/json/json.py", line 529, in read
obj = self._get_object_parser(self.data)
File "/home/hspace/.local/lib/python3.6/site-packages/pandas/io/json/json.py", line 546, in _get_object_parser
obj = FrameParser(json, **kwargs).parse()
File "/home/hspace/.local/lib/python3.6/site-packages/pandas/io/json/json.py", line 638, in parse
self._parse_no_numpy()
File "/home/hspace/.local/lib/python3.6/site-packages/pandas/io/json/json.py", line 861, in _parse_no_numpy
loads(json, precise_float=self.precise_float), dtype=None).T
File "/home/hspace/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 348, in __init__
mgr = self._init_dict(data, index, columns, dtype=dtype)
File "/home/hspace/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 459, in _init_dict
return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
File "/home/hspace/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 7356, in _arrays_to_mgr
index = extract_index(arrays)
File "/home/hspace/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 7393, in extract_index
raise ValueError('If using all scalar values, you must pass'
ValueError: If using all scalar values, you must pass an index
Previously, I used this function to get historical data from the Binance API:
def Cryptodata2(symbol, tick_interval='1m'):
    url = 'https://api.binance.com/api/v1/klines?symbol='+symbol+'&interval='+tick_interval
    df = pd.read_json(url)
    df.columns = ["date","open","high","low","close","volume",
                  "close time","quote asset volume","number of trades","taker buy base asset volume",
                  "Taker buy quote asset volume","ignore"]
    df['date'] = pd.to_datetime(df['date'], dayfirst=True, unit='ms')
    df.set_index('date', inplace=True)
    del df['ignore']
    return df
And this works fluently. I just want the price of that coin, as an integer or a dataframe, from this url: https://api.binance.com/api/v3/ticker/price?symbol=BTCUSDT
Thanks for helping me. Also, it would be great if you could provide more detail on how to debug such "value" errors.
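Not an answer from the original page, but a sketch of one common fix: the /ticker/price endpoint returns a single JSON object ({"symbol": ..., "price": ...}) rather than an array, and a dict of scalars is exactly what triggers pandas' "If using all scalar values, you must pass an index". Fetching the JSON yourself and wrapping the object in a list gives a one-row frame:
import requests
import pandas as pd

def inCoin(coin):
    url = 'https://api.binance.com/api/v3/ticker/price?symbol=' + coin + 'USDT'
    data = requests.get(url).json()  # one dict, e.g. {"symbol": "BTCUSDT", "price": "..."}
    df = pd.DataFrame([data])        # wrapping in a list supplies the missing index
    df["price"] = df["price"].astype(float)
    return df

print(inCoin('BTC'))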
MemoryError merging two dataframes with pandas and dask: how can I do this?
I have two dataframes in pandas that I would like to merge, but I keep running into memory errors. What is a workaround I could use?
Here is the setup:
import pandas as pd
df1 = pd.read_csv("first1.csv")
df2 = pd.read_csv("second2.csv")
print(df1.shape)  # output: (4757076, 4)
print(df2.shape)  # output: (428764, 45)
df1.head():
  column1  begin    end  category
0  class1  10001  10468     third
1  class1  10469  11447     third
2  class1  11505  11675    fourth
3  class2  15265  15355   seventh
4  class2  15798  15849    second
df2.head():
  column1  begin  ....
0  class1  10524  ....
1  class1  10541  ....
2  class1  10549  ....
3  class1  10565  ...
4  class1  10596  ...
I would simply like to merge these two DataFrames on "column1". However, this always causes a memory error. Let's try this in pandas first, on a system with approximately 2 TB of RAM and hundreds of threads:
import pandas as pd
df1 = pd.read_csv("first1.csv")
df2 = pd.read_csv("second2.csv")
merged = pd.merge(df1, df2, on="column1", how="outer", suffixes=("", "_repeated"))
Here's the error I get:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/nfs/sw/python/python-3.5.1/lib/python3.5/site-packages/pandas/tools/merge.py", line 39, in merge
return op.get_result()
File "/nfs/sw/python/python-3.5.1/lib/python3.5/site-packages/pandas/tools/merge.py", line 217, in get_result
join_index, left_indexer, right_indexer = self._get_join_info()
File "/nfs/sw/python/python-3.5.1/lib/python3.5/site-packages/pandas/tools/merge.py", line 353, in _get_join_info
sort=self.sort, how=self.how)
File "/nfs/sw/python/python-3.5.1/lib/python3.5/site-packages/pandas/tools/merge.py", line 559, in _get_join_indexers
return join_func(lkey, rkey, count, **kwargs)
File "pandas/src/join.pyx", line 160, in pandas.algos.full_outer_join (pandas/algos.c:61256)
MemoryError
That didn't work. Let's try with dask:
import pandas as pd
import dask.dataframe as dd
from numpy import nan
ddf1 = dd.from_pandas(df1, npartitions=2)
ddf2 = dd.from_pandas(df2, npartitions=2)
merged = dd.merge(ddf1, ddf2, on="column1", how="outer", suffixes=("", "_repeat")).compute(num_workers=60)
Here's the error I get:
Traceback (most recent call last):
File "repeat_finder.py", line 15, in <module>
merged = dd.merge(ddf1, ddf2, on="column1", how="outer", suffixes=("", "_repeat")).compute(num_workers=60)
File "/path/python3.5/site-packages/dask/base.py", line 78, in compute
return compute(self, **kwargs)[0]
File "/path/python3.5/site-packages/dask/base.py", line 178, in compute
results = get(dsk, keys, **kwargs)
File "/path/python3.5/site-packages/dask/threaded.py", line 69, in get
**kwargs)
File "/path/python3.5/site-packages/dask/async.py", line 502, in get_async
raise(remote_exception(res, tb))
dask.async.MemoryError:
Traceback
---------
File "/path/python3.5/site-packages/dask/async.py", line 268, in execute_task
result = _execute_task(task, data)
File "/path/python3.5/site-packages/dask/async.py", line 249, in _execute_task
return func(*args2)
File "/path/python3.5/site-packages/dask/dataframe/methods.py", line 221, in merge
suffixes=suffixes, indicator=indicator)
File "/path/python3.5/site-packages/pandas/tools/merge.py", line 59, in merge
return op.get_result()
File "/path/python3.5/site-packages/pandas/tools/merge.py", line 503, in get_result
join_index, left_indexer, right_indexer = self._get_join_info()
File "/path/python3.5/site-packages/pandas/tools/merge.py", line 667, in _get_join_info
right_indexer) = self._get_join_indexers()
File "/path/python3.5/site-packages/pandas/tools/merge.py", line 647, in _get_join_indexers
how=self.how)
File "/path/python3.5/site-packages/pandas/tools/merge.py", line 876, in _get_join_indexers
return join_func(lkey, rkey, count, **kwargs)
File "pandas/src/join.pyx", line 226, in pandas._join.full_outer_join (pandas/src/join.c:11286)
File "pandas/src/join.pyx", line 231, in pandas._join._get_result_indexer (pandas/src/join.c:11474)
File "/path/python3.5/site-packages/pandas/core/algorithms.py", line 1072, in take_nd
out = np.empty(out_shape, dtype=dtype, order='F')
How could I get this to work, even if it is shamelessly inefficient?
EDIT: In response to the suggestion of merging on two columns/indices, I don't think I can do this. Here is the code I am trying to run:
import pandas as pd
import dask.dataframe as dd
df1 = pd.read_csv("first1.csv")
df2 = pd.read_csv("second2.csv")
ddf1 = dd.from_pandas(df1, npartitions=2)
ddf2 = dd.from_pandas(df2, npartitions=2)
merged = dd.merge(ddf1, ddf2, on="column1", how="outer", suffixes=("", "_repeat")).compute(num_workers=60)
merged = merged[(ddf1.column1 == row.column1) & (ddf2.begin >= ddf1.begin) & (ddf2.begin <= ddf1.end)]
merged = dd.merge(ddf2, merged, on=["column1"]).compute(num_workers=60)
merged.to_csv("output.csv", index=False)
You can't merge the two data frames on column1 alone, because column1 is not a unique identifier for each row in either data frame, so the outer join multiplies every matching pair of rows. Try:
merged = pd.merge(df1, df2, on=["column1", "begin"], how="outer", suffixes=("", "_repeated"))
If df2 also has an end column, you may need to try:
merged = pd.merge(df1, df2, on=["column1", "begin", "end"], how="outer", suffixes=("", "_repeated"))
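To see why the row count explodes, here is a small self-contained illustration with hypothetical data, not taken from the question: if a key value occurs n times in one frame and m times in the other, the merge emits n * m rows for that key.
import pandas as pd

# a key repeated 3 times in each frame yields 3 * 3 = 9 rows after merging
df1 = pd.DataFrame({"column1": ["class1"] * 3, "begin": [1, 2, 3]})
df2 = pd.DataFrame({"column1": ["class1"] * 3, "x": [10, 20, 30]})
merged = pd.merge(df1, df2, on="column1", how="outer")
print(len(merged))  # 9
With roughly 4.7 million rows in df1 sharing a handful of class values, that product grows far beyond what even 2 TB of RAM can hold.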
"Already tz-aware" error when reading h5 file using pandas, python 3 (but not 2)
I have an h5 store named weather.h5. My default Python environment is 3.5.2. When I try to read this store I get TypeError: Already tz-aware, use tz_convert to convert. I've tried both pd.read_hdf('weather.h5', 'weather_history') and pd.io.pytables.HDFStore('weather.h5')['weather_history'], but I get the error no matter what. I can open the h5 file in a Python 2.7 environment. Is this a bug in Python 3 / pandas?
I have the same issue. I'm using Anaconda Python 3.4.5 and 2.7.3, both with pandas 0.18.1. Here is a reproducible example:
generate.py (to be executed with Python 2):
import pandas as pd
from pandas import HDFStore

index = pd.DatetimeIndex(['2017-06-20 06:00:06.984630-05:00', '2017-06-20 06:03:01.042616-05:00'],
                         dtype='datetime64[ns, CST6CDT]', freq=None)
p1 = [0, 1]
p2 = [0, 2]
# Saving either of these dataframes causes issues
df1 = pd.DataFrame({"p1": p1, "p2": p2}, index=index)
df2 = pd.DataFrame({"p1": p1, "p2": p2, "i": index})
store = HDFStore("./test_issue.h5")
store['df'] = df1
#store['df'] = df2
store.close()
read_issue.py:
import pandas as pd
from pandas import HDFStore

store = HDFStore("./test_issue.h5", mode="r")
df = store['/df']
store.close()
print(df)
Running read_issue.py in Python 2 has no issues and produces this output:
                                  p1  p2
2017-06-20 11:00:06.984630-05:00   0   0
2017-06-20 11:03:01.042616-05:00   1   2
But running it in Python 3 produces an error with this traceback:
Traceback (most recent call last):
File "read_issue.py", line 5, in <module>
df = store['df']
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 417, in __getitem__
return self.get(key)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 634, in get
return self._read_group(group)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 1272, in _read_group
return s.read(**kwargs)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 2779, in read
ax = self.read_index('axis%d' % i)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 2367, in read_index
_, index = self.read_index_node(getattr(self.group, key))
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 2492, in read_index_node
_unconvert_index(data, kind, encoding=self.encoding), **kwargs)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/indexes/base.py", line 153, in __new__
result = DatetimeIndex(data, copy=copy, name=name, **kwargs)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/util/decorators.py", line 91, in wrapper
return func(*args, **kwargs)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/tseries/index.py", line 321, in __new__
raise TypeError("Already tz-aware, use tz_convert "
TypeError: Already tz-aware, use tz_convert to convert.
Closing remaining open files:./test_issue.h5...done
So, there is an issue with indices. However, if you save df2 in generate.py (datetime as a column, not as an index), then Python 3 in read_issue.py produces a different error:
Traceback (most recent call last):
File "read_issue.py", line 5, in <module>
df = store['/df']
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 417, in __getitem__
return self.get(key)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 634, in get
return self._read_group(group)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 1272, in _read_group
return s.read(**kwargs)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 2788, in read
placement=items.get_indexer(blk_items))
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/core/internals.py", line 2518, in make_block
return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/core/internals.py", line 90, in __init__
len(self.mgr_locs)))
ValueError: Wrong number of items passed 2, placement implies 1
Closing remaining open files:./test_issue.h5...done
Also, if you execute generate.py in Python 3 (saving either df1 or df2), then there is no problem executing read_issue.py in either Python 3 or Python 2.
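Since files written by Python 3 read fine everywhere, one workaround to consider, as an untested sketch rather than a fix from the original thread, is to strip the timezone before writing in Python 2 and re-attach it after reading, so the Python 2 tz metadata never has to round-trip:
# In the Python 2 writer: store the index as tz-naive UTC
df1.index = df1.index.tz_convert("UTC").tz_localize(None)
store = HDFStore("./test_issue_utc.h5")
store['df'] = df1
store.close()

# In the Python 3 reader: re-localize and convert back
store = HDFStore("./test_issue_utc.h5", mode="r")
df = store['df']
store.close()
df.index = df.index.tz_localize("UTC").tz_convert("CST6CDT")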