Issue with selecting dataframe rows between two dates

Issue with selecting dataframe rows between two dates - python

I am trying to select rows between a set date range but keep running into the same error. I tried all the methods described in this post with no luck.
Select dataframe rows between two dates
Here is my dataframe:
And this is the error I keep getting when trying to select dates:
Traceback (most recent call last): File "X:/GIS/Projects/scripts/test.py", line 7, in <module>
mask = (df['TransDate'] > '2017-01-15') & (df['TransDate'] <= '2017-12-20') File "C:\Python27\ArcGISx6410.6\lib\site-packages\pandas\core\frame.py", line 1997, in __getitem__
return self._getitem_column(key) File "C:\Python27\ArcGISx6410.6\lib\site-packages\pandas\core\frame.py", line 2004, in _getitem_column
return self._get_item_cache(key) File "C:\Python27\ArcGISx6410.6\lib\site-packages\pandas\core\generic.py", line 1350, in _get_item_cache
values = self._data.get(item) File "C:\Python27\ArcGISx6410.6\lib\site-packages\pandas\core\internals.py", line 3290, in get
loc = self.items.get_loc(item) File "C:\Python27\ArcGISx6410.6\lib\site-packages\pandas\indexes\base.py", line 1947, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas\index.c:4154)
File "pandas/index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas\index.c:4018)
File "pandas/hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12368)
File "pandas/hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12322)
KeyError: 'TransDate'

Related

Pandas reading html table

import pandas as pd
import pandas_datareader.data as web
coins = pd.read_html('https://coinmarketcap.com/')
for name in coins[0][1][1:]:
print(name)
Results in the error message below. When I print coins, I get the complete table, but when I try and get specific info it gives me this error message. I know this format works as I have copied it exactly from other exercises I have been learning from, and have just changed the website. Many thanks.
C:\Users\AppData\Local\Programs\Python\Python36-32\python.exe C:/Users/Desktop/python_work/crypto/crypto_corr.py
Traceback (most recent call last):
File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexes\base.py", line 2525, in get_loc
return self._engine.get_loc(key)
File "pandas\_libs\index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/Desktop/python_work/crypto/crypto_corr.py", line 6, in <module>
for name in coins[0][1][1:]:
File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\frame.py", line 2139, in __getitem__
return self._getitem_column(key)
File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\frame.py", line 2146, in _getitem_column
return self._get_item_cache(key)
File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\generic.py", line 1842, in _get_item_cache
values = self._data.get(item)
File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\internals.py", line 3843, in get
loc = self.items.get_loc(item)
File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexes\base.py", line 2527, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\_libs\index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 1
Process finished with exit code 1

If df is a dataframe, indexing like df[column] looks for columns called column. In your case, coins[0] is a dataframe, which does not have a column 1. However, it does have a column Name, so to print all names do the following:
df = coins[0]
for name in df['Name']:
print(name)

TypeError in Dask dataframe while converting to pandas using compute()

I can't figure out what is the problem in the given code:
I am using dask to merge several dataframes. After merging I want to find the unique values from one of the column. I am getting type error while converting from dask to pandas using unique().compute(). But, I cannot seem to find what actually is the problem. It says that str cannot be assigned as int but, in some of the files the code passses through and in some it doesn't. I also cannot find the problem with data structure.
Any suggestions??
import pandas as pd
import dask.dataframe as dd
# Everything is fine until merging
# I have put several print(markers) to find the problem code
print('dask cols')
print(df_by_dask_merged.columns)
print()
print(dask_cols)
print()
print('find unique contigs values in dask dataframe')
pd_df = df_by_dask_merged['contig']
print(pd_df)
print()
print('mark 02')
# this is the problem code ??
pd_df_contig = pd_df.unique().compute()
print(pd_df_contig)
print('mark 03')
Output on Terminal:
dask cols
Index(['contig', 'pos', 'ref', 'all-alleles', 'ms01e_PI', 'ms01e_PG_al',
'ms02g_PI', 'ms02g_PG_al', 'all-freq'],
dtype='object')
['contig', 'pos', 'ref', 'all-alleles', 'ms01e_PI', 'ms01e_PG_al', 'ms02g_PI', 'ms02g_PG_al', 'all-freq']
find unique contigs values in dask dataframe
Dask Series Structure:
npartitions=1
int64
...
Name: contig, dtype: int64
Dask Name: getitem, 52 tasks
mark 02
Traceback (most recent call last):
File "/home/everestial007/.local/lib/python3.5/site-packages/pandas/indexes/base.py", line 2145, in get_value
return tslib.get_value_box(s, key)
File "pandas/tslib.pyx", line 880, in pandas.tslib.get_value_box (pandas/tslib.c:17368)
File "pandas/tslib.pyx", line 889, in pandas.tslib.get_value_box (pandas/tslib.c:17042)
TypeError: 'str' object cannot be interpreted as an integer
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "merge_haplotype.py", line 305, in <module>
main()
File "merge_haplotype.py", line 152, in main
pd_df_contig = pd_df.unique().compute()
File "/home/everestial007/anaconda3/lib/python3.5/site-packages/dask/base.py", line 155, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/home/everestial007/anaconda3/lib/python3.5/site-packages/dask/base.py", line 404, in compute
results = get(dsk, keys, **kwargs)
File "/home/everestial007/anaconda3/lib/python3.5/site-packages/dask/threaded.py", line 75, in get
pack_exception=pack_exception, **kwargs)
File "/home/everestial007/anaconda3/lib/python3.5/site-packages/dask/local.py", line 521, in get_async
raise_exception(exc, tb)
File "/home/everestial007/anaconda3/lib/python3.5/site-packages/dask/compatibility.py", line 67, in reraise
raise exc
File "/home/everestial007/anaconda3/lib/python3.5/site-packages/dask/local.py", line 290, in execute_task
result = _execute_task(task, data)
File "/home/everestial007/anaconda3/lib/python3.5/site-packages/dask/local.py", line 271, in _execute_task
return func(*args2)
File "/home/everestial007/anaconda3/lib/python3.5/site-packages/dask/dataframe/core.py", line 3404, in apply_and_enforce
df = func(*args, **kwargs)
File "/home/everestial007/anaconda3/lib/python3.5/site-packages/dask/utils.py", line 687, in __call__
return getattr(obj, self.method)(*args, **kwargs)
File "/home/everestial007/.local/lib/python3.5/site-packages/pandas/core/frame.py", line 4133, in apply
return self._apply_standard(f, axis, reduce=reduce)
File "/home/everestial007/.local/lib/python3.5/site-packages/pandas/core/frame.py", line 4229, in _apply_standard
results[i] = func(v)
File "merge_haplotype.py", line 249, in <lambda>
apply(lambda row : update_cols(row, sample_name), axis=1, meta=(int))
File "merge_haplotype.py", line 278, in update_cols
if 'N|N' in df_by_dask[sample_name + '_PG_al']:
File "/home/everestial007/.local/lib/python3.5/site-packages/pandas/core/series.py", line 601, in __getitem__
result = self.index.get_value(self, key)
File "/home/everestial007/.local/lib/python3.5/site-packages/pandas/indexes/base.py", line 2153, in get_value
raise e1
File "/home/everestial007/.local/lib/python3.5/site-packages/pandas/indexes/base.py", line 2139, in get_value
tz=getattr(series.dtype, 'tz', None))
File "pandas/index.pyx", line 105, in pandas.index.IndexEngine.get_value (pandas/index.c:3338)
File "pandas/index.pyx", line 113, in pandas.index.IndexEngine.get_value (pandas/index.c:3041)
File "pandas/index.pyx", line 161, in pandas.index.IndexEngine.get_loc (pandas/index.c:4024)
File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13161)
File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13115)
KeyError: ('ms02g_PG_al', 'occurred at index 0')

KeyError: 0 when changing time format of data

I have a column of data that are date formatted as "%d%m%Y" like "15022016".
I need to convert them as "%Y-%m-%d" like"2016-02-15".
The data frame have 911,462 rows, and the code is as below:
for i in range(0,911462):
df['Date'][i]=datetime.datetime.strftime(datetime.datetime.strptime(df['Date'][i],"%d%m%Y"),"%Y-%m-%d")
Then I met with error as below:
Traceback (most recent call last):
File "C:\Users\liangfan\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexes\base.py", line 2393, in get_loc
return self._engine.get_loc(key)
File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5239)
File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5085)
File "pandas\_libs\hashtable_class_helper.pxi", line 1207, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20405)
File "pandas\_libs\hashtable_class_helper.pxi", line 1215, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20359)
KeyError: 0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<input>", line 2, in <module>
File "C:\Users\liangfan\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\frame.py", line 2062, in __getitem__
return self._getitem_column(key)
File "C:\Users\liangfan\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\frame.py", line 2074, in _getitem_column
result = result[key]
File "C:\Users\liangfan\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\frame.py", line 2062, in __getitem__
return self._getitem_column(key)
File "C:\Users\liangfan\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\frame.py", line 2069, in _getitem_column
return self._get_item_cache(key)
File "C:\Users\liangfan\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\generic.py", line 1534, in _get_item_cache
values = self._data.get(item)
File "C:\Users\liangfan\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\internals.py", line 3590, in get
loc = self.items.get_loc(item)
File "C:\Users\liangfan\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexes\base.py", line 2395, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5239)
File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5085)
File "pandas\_libs\hashtable_class_helper.pxi", line 1207, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20405)
File "pandas\_libs\hashtable_class_helper.pxi", line 1215, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20359)
KeyError: 0
I check the raw data in excel, they are all fine so there should be no problems with the raw data.
It's quite wired that Key Error is 0. I totally have no idea what's wrong with it and how to deal with it.
Thanks for reading and waiting for your help! :)

You need pandas.to_datetime with parameter format:
df = pd.DataFrame({'Date':[15022016,15022016]})
print (df)
Date
0 15022016
1 15022016
df['Date'] = pd.to_datetime(df['Date'], format='%d%m%Y')
print (df)
Date
0 2016-02-15
1 2016-02-15
print (df['Date'].dtype)
datetime64[ns]

Python 2 DataFrame KeyError: u'\x

The code:
from __future__ import unicode_literals
#...
print df.columns
print df[u'总资产']
The output:
Index([u'余额', u'参考市值', u'可用', u'总资产', u'盈亏', u'资产(资金股份)'], dtype='object')
Traceback (most recent call last): "user_code.py", line 36, in handle_data
print df[u'总资产'] File "/home/server/y/envs/kuanke/lib/python2.7/site-packages/pandas/core/frame.py", line 1797, in __getitem__
return self._getitem_column(key) File "/home/server/y/envs/kuanke/lib/python2.7/site-packages/pandas/core/frame.py", line 1804, in _getitem_column
return self._get_item_cache(key) File "/home/server/y/envs/kuanke/lib/python2.7/site-packages/pandas/core/generic.py", line 1084, in _get_item_cache
values = self._data.get(item) File "/home/server/y/envs/kuanke/lib/python2.7/site-packages/pandas/core/internals.py", line 2851, in get
loc = self.items.get_loc(item) File "/home/server/y/envs/kuanke/lib/python2.7/site-packages/pandas/core/index.py", line 1572, in get_loc
return self._engine.get_loc(_values_from_object(key)) File "pandas/index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas/index.c:3824) File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:3704) File "pandas/hashtable.pyx", line 686, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12280) File "pandas/hashtable.pyx", line 694, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12231)
KeyError: u'\xe6\x80\xbb\xe8\xb5\x84\xe4\xba\xa7'
Another version works fine.
#...
print df.columns
print df[unicode('总资产')]
---- Update ---------------------------------------
import pandas as pd
df = pd.DataFrame(columns=[u'总资产'])
print df.columns
print df[u'总资产']
return
Output is garbled
2016-04-05 09:30:00 - INFO - Index([u'æ»èµäº§'], dtype='object')
2016-04-05 09:30:00 - INFO - Series([], Name: æ»èµäº§, dtype:
object)

Pandas hashtable with gives key error:0

I am trying to get the same elements of two pandas data table, with indexing the datas and merge it. I use it for a very large amount of data(millions). The frist table (df) is constatn, and the second(d2) is changing in every loop, with the new elements will be merged with the first table.
here is my code for this process:
df = pd.read_csv("inputfile.csv",header=None)
d1 = pd.DataFrame(df).set_index(0)
for i in range(0, len(df)):
try:
follower_id=twitter.get_followers_ids(user_id=df.iloc[i][0],cursor=next_cursor)
f=follower_id['ids']
json.dumps(f)
d2 = pd.DataFrame(f).set_index(0)
match_result = pd.merge(d1,d2,left_index=True,right_index=True)
fk=[df.iloc[i][0] for number in range(len(match_result))]
DF = pd.DataFrame(fk)
DF.to_csv(r'output1.csv',header=None,sep=' ',index=None)
match_result.to_csv(r'output2.csv', header=None, sep=' ')
I have experienced, that this code, runs well for a while, but after that- probably it is relatad to the second databasses size wich is change every loop- the program gives me the following error message, and stop running:
Traceback (most recent call last):
File "halozat3.py", line 39, in <module>
d2 = pd.DataFrame(f).set_index(0) #1Trump koveto kovetolistaja
File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 2372, in set_index
level = frame[col].values
File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1678, in __getitem__
return self._getitem_column(key)
File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1685, in _getitem_column
return self._get_item_cache(key)
File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 1052, in _get_item_cache
values = self._data.get(item)
File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 2565, in get
loc = self.items.get_loc(item)
File "/usr/lib/python2.7/dist-packages/pandas/core/index.py", line 1181, in get_loc
return self._engine.get_loc(_values_from_object(key))
File "index.pyx", line 129, in pandas.index.IndexEngine.get_loc (pandas/index.c:3656)
File "index.pyx", line 149, in pandas.index.IndexEngine.get_loc (pandas/index.c:3534)
File "hashtable.pyx", line 381, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:7035)
File "hashtable.pyx", line 387, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6976)
KeyError: 0
What could be the problem?

Have you only one row in your dataframe?
You must write as many rows as you like
Look

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Issue with selecting dataframe rows between two dates - python

Related

Pandas reading html table

TypeError in Dask dataframe while converting to pandas using compute()

KeyError: 0 when changing time format of data

Python 2 DataFrame KeyError: u'\x

Pandas hashtable with gives key error:0

Categories

Resources