I want to get live prices of a cryptocurrency from the Binance REST API.
I am using:
def inCoin(coin):
    url = 'https://api.binance.com/api/v3/ticker/price?symbol=' + coin + 'USDT'
    df = pd.read_json(url)
    df.columns = ["symbol", "price"]
    return df
It gives the following error when this function is called:
Traceback (most recent call last):
File "ee2.py", line 201, in <module>
aa = inCoin('BTC')
File "ee2.py", line 145, in inCoin
df = pd.read_json(url, orient='index')
File "/home/hspace/.local/lib/python3.6/site-packages/pandas/io/json/json.py", line 422, in read_json
result = json_reader.read()
File "/home/hspace/.local/lib/python3.6/site-packages/pandas/io/json/json.py", line 529, in read
obj = self._get_object_parser(self.data)
File "/home/hspace/.local/lib/python3.6/site-packages/pandas/io/json/json.py", line 546, in _get_object_parser
obj = FrameParser(json, **kwargs).parse()
File "/home/hspace/.local/lib/python3.6/site-packages/pandas/io/json/json.py", line 638, in parse
self._parse_no_numpy()
File "/home/hspace/.local/lib/python3.6/site-packages/pandas/io/json/json.py", line 861, in _parse_no_numpy
loads(json, precise_float=self.precise_float), dtype=None).T
File "/home/hspace/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 348, in __init__
mgr = self._init_dict(data, index, columns, dtype=dtype)
File "/home/hspace/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 459, in _init_dict
return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
File "/home/hspace/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 7356, in _arrays_to_mgr
index = extract_index(arrays)
File "/home/hspace/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 7393, in extract_index
raise ValueError('If using all scalar values, you must pass'
ValueError: If using all scalar values, you must pass an index
Previously, I used this function to fetch historical data from the Binance API:
def Cryptodata2(symbol, tick_interval='1m'):
    url = 'https://api.binance.com/api/v1/klines?symbol=' + symbol + '&interval=' + tick_interval
    df = pd.read_json(url)
    df.columns = ["date", "open", "high", "low", "close", "volume",
                  "close time", "quote asset volume", "number of trades",
                  "taker buy base asset volume", "Taker buy quote asset volume", "ignore"]
    df['date'] = pd.to_datetime(df['date'], dayfirst=True, unit='ms')
    df.set_index('date', inplace=True)
    del df['ignore']
    return df
And this works fine.
I just want the price of that coin, shown as a plain number or a DataFrame, from this URL:
https://api.binance.com/api/v3/ticker/price?symbol=BTCUSDT
Thanks for helping me.
Also, it would be great if you could provide more detail on debugging this kind of ValueError.
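For reference, here is a minimal sketch of one way to get that (assuming the requests library is available; the function name in_coin is mine). The ValueError happens because this endpoint returns a single JSON object of scalars, e.g. {"symbol": "BTCUSDT", "price": "..."}, and pandas cannot build a DataFrame from all-scalar values without being given an index; wrapping the parsed dict in a list (or using pd.read_json(url, typ='series')) sidesteps that:
import requests
import pandas as pd

def in_coin(coin):
    # fetch the single-symbol ticker and return the price plus a one-row DataFrame
    url = 'https://api.binance.com/api/v3/ticker/price?symbol=' + coin + 'USDT'
    data = requests.get(url, timeout=10).json()   # e.g. {"symbol": "BTCUSDT", "price": "..."}
    price = float(data['price'])                  # price as a plain number
    df = pd.DataFrame([data])                     # the one-element list gives the frame a default index
    return price, df

# price, df = in_coin('BTC')
As a general debugging step for this kind of ValueError, it helps to look at the raw JSON first (for example requests.get(url).json()) to see whether it is a single object, a list of objects, or something nested, and then pick read_json's typ/orient accordingly.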
I am trying to clean and filter a large dataset I receive each month. Recently the data has grown even larger for a variety of reasons, and I can no longer use pandas. I've been attempting to find an alternative, and so far Dask has seemed to work until it comes to the export step. My simplified code is:
import pandas as pd
import numpy as np
import dask.dataframe as dd
import dask.array as da

MAP = pd.read_excel('map.csv')
MAP = dd.from_pandas(MAP, npartitions=1)
MAP2 = dd.read_csv('map2.csv')
MAP3 = dd.read_csv('map3.csv')

MAP = dd.merge(
    MAP,
    MAP2,
    how="left",
    left_on=("id1", "id2", "id3"),
    right_on=("id1", "id2", "id3"),
    indicator=False)
MAP = MAP.drop_duplicates(subset=["id1", "id2", "id3", 'col1', 'col2'])

BIG_DATA = dd.read_csv("BIG_DATA.gz",
                       sep='|',
                       compression='gzip',
                       header=None,
                       blocksize=None,
                       dtype={0: np.int64, 1: np.int64, 4: np.int16, 6: str, 8: np.int64,
                              9: np.float32, 10: np.float32, 11: np.float32, 19: str,
                              32: str, 37: np.int32, 40: np.float32})

BIG_DATA = dd.merge(
    BIG_DATA,
    MAP3,
    how="left",
    left_on=("id3", "id4"),
    right_on=("id3", "id4"),
    indicator=False)

BIG_DATA = BIG_DATA[filter condition]
groupvars = [g1, g2, g3, g4, g5, g6, g7...g17]
sumlist = [s1, s2, s3, s4]
BIG_DATA = BIG_DATA.groupby(groupvars)[sumlist].sum().reset_index()

BIG_DATA = dd.merge(
    BIG_DATA,
    MAP,
    how="outer",
    left_on=("id1", "id2"),
    right_on=("id1", "id2"),
    indicator=True)

BIG_DATA = BIG_DATA[(BIG_DATA['_merge'].isin(['right_only']) == False)]
BIG_DATA1 = BIG_DATA[filter condition1]
BIG_DATA2 = BIG_DATA[filter condition2]
OUTPUT = dd.concat([BIG_DATA1, BIG_DATA2]).reset_index()
OUTPUT = OUTPUT.repartition(npartitions=100000)  # have tried multiple values here
OUTPUT.to_csv(r'\\files\User\test.csv', single_file=True)
When using pandas, this process crashes at the groupby statement. I thought Dask might be the way around this, but it seems to always fail when I try to export to CSV. I'm new to Python and Dask, but I'm guessing it is delaying the groupby statement until the export and then failing for the same reason as pandas? I've created the same result set using Fortran, and it comes out to a 100 MB CSV file with approximately 600k rows of data. I'm not really sure how to go about changing this so that it will work.
Exact error:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\envs\pythonProject1\lib\site-packages\IPython\core\interactiveshell.py", line 3553, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-52-29ce59d87600>", line 350, in <cell line: 350>
plinePSA.to_csv(r'\\files\User\test.csv', single_file=True, chunksize = 100)
File "C:\ProgramData\Anaconda3\envs\pythonProject1\lib\site-packages\dask\dataframe\core.py", line 1699, in to_csv
return to_csv(self, filename, **kwargs)
File "C:\ProgramData\Anaconda3\envs\pythonProject1\lib\site-packages\dask\dataframe\io\csv.py", line 972, in to_csv
return list(dask.compute(*values, **compute_kwargs))
File "C:\ProgramData\Anaconda3\envs\pythonProject1\lib\site-packages\dask\base.py", line 598, in compute
results = schedule(dsk, keys, **kwargs)
File "C:\ProgramData\Anaconda3\envs\pythonProject1\lib\site-packages\dask\threaded.py", line 89, in get
results = get_async(
File "C:\ProgramData\Anaconda3\envs\pythonProject1\lib\site-packages\dask\local.py", line 511, in get_async
raise_exception(exc, tb)
File "C:\ProgramData\Anaconda3\envs\pythonProject1\lib\site-packages\dask\local.py", line 319, in reraise
raise exc
File "C:\ProgramData\Anaconda3\envs\pythonProject1\lib\site-packages\dask\local.py", line 224, in execute_task
result = _execute_task(task, data)
File "C:\ProgramData\Anaconda3\envs\pythonProject1\lib\site-packages\dask\core.py", line 119, in _execute_task
return func(*(_execute_task(a, cache) for a in args))
File "C:\ProgramData\Anaconda3\envs\pythonProject1\lib\site-packages\dask\optimization.py", line 990, in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
File "C:\ProgramData\Anaconda3\envs\pythonProject1\lib\site-packages\dask\core.py", line 149, in get
result = _execute_task(task, cache)
File "C:\ProgramData\Anaconda3\envs\pythonProject1\lib\site-packages\dask\core.py", line 119, in _execute_task
return func(*(_execute_task(a, cache) for a in args))
File "C:\ProgramData\Anaconda3\envs\pythonProject1\lib\site-packages\dask\dataframe\io\csv.py", line 129, in __call__
df = pandas_read_text(
File "C:\ProgramData\Anaconda3\envs\pythonProject1\lib\site-packages\dask\dataframe\io\csv.py", line 182, in pandas_read_text
df = reader(bio, **kwargs)
File "C:\ProgramData\Anaconda3\envs\pythonProject1\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\pythonProject1\lib\site-packages\pandas\io\parsers\readers.py", line 586, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\ProgramData\Anaconda3\envs\pythonProject1\lib\site-packages\pandas\io\parsers\readers.py", line 488, in _read
return parser.read(nrows)
File "C:\ProgramData\Anaconda3\envs\pythonProject1\lib\site-packages\pandas\io\parsers\readers.py", line 1059, in read
df = DataFrame(col_dict, columns=columns, index=index)
File "C:\ProgramData\Anaconda3\envs\pythonProject1\lib\site-packages\pandas\core\frame.py", line 614, in __init__
mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
File "C:\ProgramData\Anaconda3\envs\pythonProject1\lib\site-packages\pandas\core\internals\construction.py", line 464, in dict_to_mgr
return arrays_to_mgr(
File "C:\ProgramData\Anaconda3\envs\pythonProject1\lib\site-packages\pandas\core\internals\construction.py", line 135, in arrays_to_mgr
return create_block_manager_from_arrays(
File "C:\ProgramData\Anaconda3\envs\pythonProject1\lib\site-packages\pandas\core\internals\managers.py", line 1773, in create_block_manager_from_arrays
blocks = _form_blocks(arrays, names, axes, consolidate)
File "C:\ProgramData\Anaconda3\envs\pythonProject1\lib\site-packages\pandas\core\internals\managers.py", line 1838, in _form_blocks
numeric_blocks = _multi_blockify(
File "C:\ProgramData\Anaconda3\envs\pythonProject1\lib\site-packages\pandas\core\internals\managers.py", line 1928, in _multi_blockify
values, placement = _stack_arrays(
File "C:\ProgramData\Anaconda3\envs\pythonProject1\lib\site-packages\pandas\core\internals\managers.py", line 1957, in _stack_arrays
stacked = np.empty(shape, dtype=dtype)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 2.11 GiB for an array with shape (3, 189229412) and data type float32
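A hedged observation rather than a confirmed fix: the traceback shows pandas running out of memory while parsing the gzipped file itself. Because gzip is not splittable, blocksize=None forces Dask to treat the whole file as a single partition, so the lazy pipeline still has to materialize roughly 190 million rows at once when to_csv finally triggers it. One workaround, assuming the file can first be decompressed to disk (the path below is hypothetical), is to read the uncompressed text so Dask can chunk it:
import numpy as np
import dask.dataframe as dd

# hypothetical path to the decompressed pipe-delimited file (e.g. after gunzip)
BIG_DATA = dd.read_csv(
    "BIG_DATA.psv",
    sep='|',
    header=None,
    blocksize="64MB",          # Dask can now split the file into many small partitions
    dtype={0: np.int64, 1: np.int64, 4: np.int16, 6: str, 8: np.int64,
           9: np.float32, 10: np.float32, 11: np.float32, 19: str,
           32: str, 37: np.int32, 40: np.float32})
If the aggregated result really is only about 600k rows, repartitioning down (for example OUTPUT.repartition(npartitions=1)) before to_csv is also likely to behave better than npartitions=100000.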
I want to do the following:
I have data in long format organized by dates.
Sometimes, data is missing because there is simply no record of it.
I found a solution that fills in the missing data using reindex; it works fine when used outside of a function, but for some reason it doesn't work when used inside a function:
def sum_customer_portfolio(country, sold_to):
    df = pd.merge(etl_customer_portfolio(), etl_week(), how="left", on=["Country", "GCAS"])
    df = df.loc[df["Country"].isin(country)]
    df = df.loc[df["Sold_to"].isin(sold_to)]

    df_week = etl_week()
    df_week = df_week.dropna(subset=["Sold_to"])
    df_week = df_week[["Week_num", "Date_range"]]
    df_week = df_week.drop_duplicates(subset=["Date_range"])

    sum_df = pd.merge(df, df_week, how="outer", on=["Week_num", "Date_range"])
    sum_df["Stat_unit_qty"] = sum_df["Stat_unit_qty"].fillna(0, axis=0)
    sum_df[["Country", "Sold_to", "Customer"]] = sum_df[["Country", "Sold_to", "Customer"]].fillna(method="ffill", axis=0)
    sum_df = sum_df.fillna("DUMMY_NOT_USE").replace("DUMMY_NOT_USE", np.nan)

    reindex_subset = sum_df[["GCAS", "Week_num", "Stat_unit_qty"]]
    reindex_subset = reindex_subset.dropna()
    reindex_subset = reindex_subset.set_index("Week_num")
    reindex_subset = (reindex_subset.groupby("GCAS")
                      .apply(lambda x: x.reindex(list(range(reindex_subset.index.min(), reindex_subset.index.max() + 1)), fill_value=0))
                      .drop("GCAS", axis=1)
                      .reset_index("GCAS")
                      .fillna(0)
                      .reset_index())
    reindex_subset = reindex_subset.drop(columns=["Stat_unit_qty"])

    final_df = pd.merge(sum_df, reindex_subset, how="outer", on=["GCAS", "Week_num"])
    current_date = datetime.now().strftime("%d%m%Y_%H%M%S")
    # return sum_df.to_excel(f"CUSTOMER_PORTFOLIO-{current_date}.xlsx", sheet_name="GCAS_SUM", index=False)
    return final_df
The code above keeps giving me the following error:
Traceback (most recent call last):
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\indexes\base.py", line 3361, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 103, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 135, in pandas._libs.index.IndexEngine._get_loc_duplicates
File "pandas\_libs\index_class_helper.pxi", line 51, in pandas._libs.index.Float64Engine._maybe_get_bool_indexer
File "pandas\_libs\index.pyx", line 161, in pandas._libs.index.IndexEngine._unpack_bool_indexer
KeyError: 0
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\main.py", line 167, in <module>
sum_customer_portfolio(country=["Croatia", "Slovenia"], sold_to=[2000829798, 2000558171]).to_excel(writer, index=False, sheet_name="GCAS_SUM")
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\main.py", line 113, in sum_customer_portfolio
reindex_subset = (reindex_subset.groupby(["GCAS", "Sold_to"]).apply(
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\groupby\groupby.py", line 1253, in apply
result = self._python_apply_general(f, self._selected_obj)
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\groupby\groupby.py", line 1287, in _python_apply_general
keys, values, mutated = self.grouper.apply(f, data, self.axis)
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\groupby\ops.py", line 783, in apply
result_values, mutated = splitter.fast_apply(f, sdata, group_keys)
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\groupby\ops.py", line 1328, in fast_apply
return libreduction.apply_frame_axis0(sdata, f, names, starts, ends)
File "pandas\_libs\reduction.pyx", line 369, in pandas._libs.reduction.apply_frame_axis0
File "pandas\_libs\reduction.pyx", line 428, in pandas._libs.reduction.BlockSlider.__init__
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\frame.py", line 3430, in __getitem__
indexer = convert_to_index_sliceable(self, key)
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\indexing.py", line 2329, in convert_to_index_sliceable
return idx._convert_slice_indexer(key, kind="getitem")
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\indexes\numeric.py", line 242, in _convert_slice_indexer
return self.slice_indexer(key.start, key.stop, key.step, kind=kind)
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\indexes\base.py", line 5686, in slice_indexer
start_slice, end_slice = self.slice_locs(start, end, step=step)
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\indexes\base.py", line 5894, in slice_locs
end_slice = self.get_slice_bound(end, "right")
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\indexes\base.py", line 5808, in get_slice_bound
raise err
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\indexes\base.py", line 5802, in get_slice_bound
slc = self.get_loc(label)
File "C:\Users\xxxxxxx\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\indexes\base.py", line 3363, in get_loc
raise KeyError(key) from err
KeyError: 0
When loading the data directly from Excel (the same data produced by the function), for example "CUSTOMER_PORTFOLIO-11082021_234057.xlsx", and running the following code:
sum_df = pd.read_excel("CUSTOMER_PORTFOLIO-11082021_234057.xlsx")
reindex_subset = sum_df[["GCAS", "Week_num", "Stat_unit_qty"]]
reindex_subset = reindex_subset.dropna()
reindex_subset = reindex_subset.set_index("Week_num")
reindex_subset = (reindex_subset.groupby("GCAS")
                  .apply(lambda x: x.reindex(list(range(reindex_subset.index.min(), reindex_subset.index.max() + 1)), fill_value=0))
                  .drop("GCAS", axis=1)
                  .reset_index("GCAS")
                  .fillna(0)
                  .reset_index())
reindex_subset = reindex_subset.drop(columns=["Stat_unit_qty"])
final_df = pd.merge(sum_df, reindex_subset, how="outer", on=["GCAS", "Week_num"])
The code gives me the results that I want.
What am I missing? I have tried searching for this on Stack Overflow, but no success as of yet. I have tried resetting the index, but unfortunately it didn't help.
UPDATE: Pasted the full error traceback. Moreover, as I said above, when I run the function without the part of the code that "reindexes" the data, the code works just fine. I have also tried the following, still with no luck:
df_new = df.copy(deep=True)
df_week= df_week.copy(deep=True)
And when I run the "reindex" part of the code on a finished .xlsx, it works just fine, which is strange in itself.
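For what it's worth, here is a hedged sketch of two small changes that often help in this situation, reusing the reindex_subset variable from the snippet above and assuming Week_num is integer-valued: cast the Week_num index back to int (the outer merge can promote it to float, and a float index is what the traceback is choking on), and compute the target week range once, as plain ints, before the groupby, so the lambda does not capture the outer index:
reindex_subset.index = reindex_subset.index.astype(int)   # undo any float promotion of Week_num caused by the merge

week_min = int(reindex_subset.index.min())                # week bounds taken once, as ints
week_max = int(reindex_subset.index.max())
full_weeks = list(range(week_min, week_max + 1))

reindex_subset = (reindex_subset.groupby("GCAS")
                  .apply(lambda x: x.reindex(full_weeks, fill_value=0))
                  .drop("GCAS", axis=1)
                  .reset_index("GCAS")
                  .fillna(0)
                  .reset_index())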
When I try to convert some XML to a dataframe using xmltodict, a particular column ends up containing all the info I need as a dict or a list of dicts. I'm able to expand this column into multiple columns with pandas, but I'm not able to perform a similar operation in dask.
It is not possible to hard-code meta because I have no idea of all the possible fields available in the XML, and dask is necessary because the real XML files are bigger than 1 GB each.
example.xml:
<?xml version="1.0" encoding="UTF-8"?>
<itemList>
  <eventItem uid="1">
    <timestamp>2019-07-04T09:57:35.044Z</timestamp>
    <eventType>generic</eventType>
    <details>
      <detail>
        <name>columnA</name>
        <value>AAA</value>
      </detail>
      <detail>
        <name>columnB</name>
        <value>BBB</value>
      </detail>
    </details>
  </eventItem>
  <eventItem uid="2">
    <timestamp>2019-07-04T09:57:52.188Z</timestamp>
    <eventType>generic</eventType>
    <details>
      <detail>
        <name>columnC</name>
        <value>CCC</value>
      </detail>
    </details>
  </eventItem>
</itemList>
Working pandas code:
import xmltodict
import collections
import pandas as pd
def pd_output_dict(details):
    detail = details.get("detail", [])
    ret_value = {}
    if type(detail) in (collections.OrderedDict, dict):
        ret_value[detail["name"]] = detail["value"]
    elif type(detail) == list:
        for i in detail:
            ret_value[i["name"]] = i["value"]
    return pd.Series(ret_value)

with open("example.xml", "r", encoding="utf8") as f:
    df_dict_list = xmltodict.parse(f.read()).get("itemList", {}).get("eventItem", [])

df = pd.DataFrame(df_dict_list)
df = pd.concat([df, df.apply(lambda row: pd_output_dict(row.details), axis=1, result_type="expand")], axis=1)
print(df.head())
Not working dask code:
import xmltodict
import collections
import dask
import dask.bag as db
import dask.dataframe as dd
def dd_output_dict(row):
    detail = row.get("details", {}).get("detail", [])
    ret_value = {}
    if type(detail) in (collections.OrderedDict, dict):
        row[detail["name"]] = detail["value"]
    elif type(detail) == list:
        for i in detail:
            row[i["name"]] = i["value"]
    return row

with open("example.xml", "r", encoding="utf8") as f:
    df_dict_list = xmltodict.parse(f.read()).get("itemList", {}).get("eventItem", [])

df_bag = db.from_sequence(df_dict_list)
df = df_bag.to_dataframe()
df = df.apply(lambda row: dd_output_dict(row), axis=1)
The idea is to get in dask a result similar to what I have in pandas, but at the moment I'm receiving errors:
>>> df = df.apply(lambda row: output_dict(row), axis=1)
Traceback (most recent call last):
File "C:\Anaconda3\lib\site-packages\dask\dataframe\utils.py", line 169, in raise_on_meta_error
yield
File "C:\Anaconda3\lib\site-packages\dask\dataframe\core.py", line 4711, in _emulate
return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
File "C:\Anaconda3\lib\site-packages\dask\utils.py", line 854, in __call__
return getattr(obj, self.method)(*args, **kwargs)
File "C:\Anaconda3\lib\site-packages\pandas\core\frame.py", line 6487, in apply
return op.get_result()
File "C:\Anaconda3\lib\site-packages\pandas\core\apply.py", line 151, in get_result
return self.apply_standard()
File "C:\Anaconda3\lib\site-packages\pandas\core\apply.py", line 257, in apply_standard
self.apply_series_generator()
File "C:\Anaconda3\lib\site-packages\pandas\core\apply.py", line 286, in apply_series_generator
results[i] = self.f(v)
File "<stdin>", line 1, in <lambda>
File "<stdin>", line 4, in output_dict
AttributeError: ("'str' object has no attribute 'get'", 'occurred at index 0')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Anaconda3\lib\site-packages\dask\dataframe\core.py", line 3964, in apply
M.apply, self._meta_nonempty, func, args=args, udf=True, **kwds
File "C:\Anaconda3\lib\site-packages\dask\dataframe\core.py", line 4711, in _emulate
return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
File "C:\Anaconda3\lib\contextlib.py", line 130, in __exit__
self.gen.throw(type, value, traceback)
File "C:\Anaconda3\lib\site-packages\dask\dataframe\utils.py", line 190, in raise_on_meta_error
raise ValueError(msg)
ValueError: Metadata inference failed in `apply`.
You have supplied a custom function and Dask is unable to
determine the type of output that that function returns.
To resolve this please provide a meta= keyword.
The docstring of the Dask function you ran should have more information.
Original error is below:
------------------------
AttributeError("'str' object has no attribute 'get'", 'occurred at index 0')
Traceback:
---------
File "C:\Anaconda3\lib\site-packages\dask\dataframe\utils.py", line 169, in raise_on_meta_error
yield
File "C:\Anaconda3\lib\site-packages\dask\dataframe\core.py", line 4711, in _emulate
return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
File "C:\Anaconda3\lib\site-packages\dask\utils.py", line 854, in __call__
return getattr(obj, self.method)(*args, **kwargs)
File "C:\Anaconda3\lib\site-packages\pandas\core\frame.py", line 6487, in apply
return op.get_result()
File "C:\Anaconda3\lib\site-packages\pandas\core\apply.py", line 151, in get_result
return self.apply_standard()
File "C:\Anaconda3\lib\site-packages\pandas\core\apply.py", line 257, in apply_standard
self.apply_series_generator()
File "C:\Anaconda3\lib\site-packages\pandas\core\apply.py", line 286, in apply_series_generator
results[i] = self.f(v)
File "<stdin>", line 1, in <lambda>
File "<stdin>", line 4, in output_dict
Right, so operations like map_partitions will need to know the column names and data types. As you've mentioned, you can specify this with the meta= keyword.
Perhaps you can run through your data once to compute what these will be, and then construct a proper meta object, and pass that in? This is inefficient, and requires reading through all of your data, but I'm not sure that there is another way.
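To make that concrete, here is a rough sketch of that approach (the flatten helper and variable names are mine, and the extra pass over the data is exactly the inefficiency mentioned above): flatten each record in the bag, run over the data once to collect the union of keys, build an empty DataFrame as the meta, and pass it to to_dataframe:
import pandas as pd

def flatten(item):
    # pull every <detail> name/value pair up to the top level of the record
    detail = item.get("details", {}).get("detail", [])
    if isinstance(detail, dict):                  # xmltodict yields a single dict when there is one <detail>
        detail = [detail]
    out = {k: v for k, v in item.items() if k != "details"}
    for d in detail:
        out[d["name"]] = d["value"]
    return out

flat = df_bag.map(flatten)                        # df_bag as built in the question above
all_columns = sorted(set().union(*flat.map(lambda d: set(d)).compute()))   # first pass: discover every column
meta = pd.DataFrame(columns=all_columns, dtype=object)                     # column names and dtypes only
df = flat.map(lambda d: {c: d.get(c) for c in all_columns}).to_dataframe(meta=meta)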
I am trying to get the common elements of two pandas tables by indexing the data and merging it. I use this for a very large amount of data (millions of rows). The first table (df) is constant, and the second (d2) changes in every loop, with its new elements being merged with the first table.
Here is my code for this process:
df = pd.read_csv("inputfile.csv", header=None)
d1 = pd.DataFrame(df).set_index(0)

for i in range(0, len(df)):
    try:
        follower_id = twitter.get_followers_ids(user_id=df.iloc[i][0], cursor=next_cursor)
        f = follower_id['ids']
        json.dumps(f)
        d2 = pd.DataFrame(f).set_index(0)
        match_result = pd.merge(d1, d2, left_index=True, right_index=True)
        fk = [df.iloc[i][0] for number in range(len(match_result))]
        DF = pd.DataFrame(fk)
        DF.to_csv(r'output1.csv', header=None, sep=' ', index=None)
        match_result.to_csv(r'output2.csv', header=None, sep=' ')
I have found that this code runs well for a while, but after that (probably related to the size of the second dataframe, which changes every loop) the program gives me the following error message and stops running:
Traceback (most recent call last):
File "halozat3.py", line 39, in <module>
d2 = pd.DataFrame(f).set_index(0) #1Trump koveto kovetolistaja
File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 2372, in set_index
level = frame[col].values
File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1678, in __getitem__
return self._getitem_column(key)
File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1685, in _getitem_column
return self._get_item_cache(key)
File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 1052, in _get_item_cache
values = self._data.get(item)
File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 2565, in get
loc = self.items.get_loc(item)
File "/usr/lib/python2.7/dist-packages/pandas/core/index.py", line 1181, in get_loc
return self._engine.get_loc(_values_from_object(key))
File "index.pyx", line 129, in pandas.index.IndexEngine.get_loc (pandas/index.c:3656)
File "index.pyx", line 149, in pandas.index.IndexEngine.get_loc (pandas/index.c:3534)
File "hashtable.pyx", line 381, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:7035)
File "hashtable.pyx", line 387, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6976)
KeyError: 0
What could be the problem?
Do you have only one row in your dataframe? You must write as many rows as you like.
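Another possibility, offered as a guess from the traceback rather than a confirmed diagnosis: if the API returns an empty ids list for some user, pd.DataFrame([]) has no column 0, so set_index(0) raises exactly this KeyError: 0. A small guard (the helper name followers_frame is mine) would be:
import pandas as pd

def followers_frame(ids):
    # return a DataFrame indexed by follower id, or None when the list is empty
    if not ids:                      # empty list -> DataFrame has no column 0 -> set_index(0) raises KeyError: 0
        return None
    return pd.DataFrame(ids).set_index(0)

# inside the loop:
#     d2 = followers_frame(follower_id['ids'])
#     if d2 is None:
#         continue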
I have an h5 store named weather.h5. My default Python environment is 3.5.2. When I try to read this store I get TypeError: Already tz-aware, use tz_convert to convert.
I've tried both pd.read_hdf('weather.h5','weather_history') and pd.io.pytables.HDFStore('weather.h5')['weather_history'], but I get the error no matter what.
I can open the h5 in a Python 2.7 environment. Is this a bug in Python 3 / pandas?
I have the same issue. I'm using Anaconda Python: 3.4.5 and 2.7.3. Both are using pandas 0.18.1.
Here is a reproducible example:
generate.py (to be executed with Python2):
import pandas as pd
from pandas import HDFStore
index = pd.DatetimeIndex(['2017-06-20 06:00:06.984630-05:00', '2017-06-20 06:03:01.042616-05:00'], dtype='datetime64[ns, CST6CDT]', freq=None)
p1 = [0, 1]
p2 = [0, 2]
# Saving any of these dataframes cause issues
df1 = pd.DataFrame({"p1":p1, "p2":p2}, index=index)
df2 = pd.DataFrame({"p1":p1, "p2":p2, "i":index})
store = HDFStore("./test_issue.h5")
store['df'] = df1
#store['df'] = df2
store.close()
read_issue.py:
import pandas as pd
from pandas import HDFStore
store = HDFStore("./test_issue.h5", mode="r")
df = store['/df']
store.close()
print(df)
Running read_issue.py in Python2 has no issues and produces this output:
p1 p2
2017-06-20 11:00:06.984630-05:00 0 0
2017-06-20 11:03:01.042616-05:00 1 2
But running it in Python 3 produces an error with this traceback:
Traceback (most recent call last):
File "read_issue.py", line 5, in
df = store['df']
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 417, in getitem
return self.get(key)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 634, in get
return self._read_group(group)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 1272, in _read_group
return s.read(**kwargs)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 2779, in read
ax = self.read_index('axis%d' % i)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 2367, in read_index
_, index = self.read_index_node(getattr(self.group, key))
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 2492, in read_index_node
_unconvert_index(data, kind, encoding=self.encoding), **kwargs)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/indexes/base.py", line 153, in new
result = DatetimeIndex(data, copy=copy, name=name, **kwargs)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/util/decorators.py", line 91, in wrapper
return func(*args, **kwargs)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/tseries/index.py", line 321, in new
raise TypeError("Already tz-aware, use tz_convert "
TypeError: Already tz-aware, use tz_convert to convert.
Closing remaining open files:./test_issue.h5...done
So, there is an issue with indices. However, if you save df2 in generate.py (datetime as a column, not as an index), then running read_issue.py in Python 3 produces a different error:
Traceback (most recent call last):
File "read_issue.py", line 5, in
df = store['/df']
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 417, in getitem
return self.get(key)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 634, in get
return self._read_group(group)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 1272, in _read_group
return s.read(**kwargs)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 2788, in read
placement=items.get_indexer(blk_items))
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/core/internals.py", line 2518, in make_block
return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/core/internals.py", line 90, in init
len(self.mgr_locs)))
ValueError: Wrong number of items passed 2, placement implies 1
Closing remaining open files:./test_issue.h5...done
Also, if you execute generate.py in Python 3 (saving either df1 or df2), then there is no problem executing read_issue.py in either Python 3 or Python 2.
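Until pandas is upgraded on both sides (this looks like a pandas serialization issue rather than a Python one), one hedged workaround is to re-save the store from the environment that can read it (Python 2 here) with a tz-naive UTC index, and re-localize after reading in Python 3. The file names follow the reproducer above:
from pandas import HDFStore

# run under Python 2, where the original store opens fine
store = HDFStore("./test_issue.h5", mode="r")
df = store["df"]
store.close()

# drop the timezone but keep UTC wall times, then write a new store
df.index = df.index.tz_convert("UTC").tz_localize(None)
df.to_hdf("./test_issue_naive.h5", "df", mode="w")

# then, under Python 3:
# import pandas as pd
# df = pd.read_hdf("./test_issue_naive.h5", "df")
# df.index = df.index.tz_localize("UTC").tz_convert("CST6CDT")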