I'm trying to slice a time series. It works perfectly this way:
subseries = series['2015-07-07 01:00:00':'2015-07-07 03:30:00']
But the following code doesn't work:
import datetime
import pandas as pd

def GetDatetime():
    # Ask the user for a date and build a datetime object from it
    Y = int(raw_input("Year "))
    M = int(raw_input("Month "))
    D = int(raw_input("Day "))
    d = datetime.datetime(Y, M, D)
    return d

filePath = "pathtofile.csv"
series = pd.read_csv(str(filePath), index_col='date')
series.index = pd.to_datetime(series.index, unit='s')
d = GetDatetime()
f = GetDatetime()
subseries = series[d:f]
The last line generates this error:
Traceback (most recent call last):
File "dontgivemeerrorsbrasommek.py", line 37, in <module>
brasla7nina= df[d:f]
File "/usr/local/lib/python2.7/dist-packages/pandas-0.20.2-py2.7-linux-x86_64.egg/pandas/core/frame.py", line 1952, in __getitem__
indexer = convert_to_index_sliceable(self, key)
File "/usr/local/lib/python2.7/dist-packages/pandas-0.20.2-py2.7-linux-x86_64.egg/pandas/core/indexing.py", line 1896, in convert_to_index_sliceable
return idx._convert_slice_indexer(key, kind='getitem')
File "/usr/local/lib/python2.7/dist-packages/pandas-0.20.2-py2.7-linux-x86_64.egg/pandas/core/indexes/base.py", line 1407, in _convert_slice_indexer
indexer = self.slice_indexer(start, stop, step, kind=kind)
File "/usr/local/lib/python2.7/dist-packages/pandas-0.20.2-py2.7-linux-x86_64.egg/pandas/core/indexes/datetimes.py", line 1515, in slice_indexer
return Index.slice_indexer(self, start, end, step, kind=kind)
File "/usr/local/lib/python2.7/dist-packages/pandas-0.20.2-py2.7-linux-x86_64.egg/pandas/core/indexes/base.py", line 3350, in slice_indexer
kind=kind)
File "/usr/local/lib/python2.7/dist-packages/pandas-0.20.2-py2.7-linux-x86_64.egg/pandas/core/indexes/base.py", line 3538, in slice_locs
start_slice = self.get_slice_bound(start, 'left', kind)
File "/usr/local/lib/python2.7/dist-packages/pandas-0.20.2-py2.7-linux-x86_64.egg/pandas/core/indexes/base.py", line 3487, in get_slice_bound
raise err
KeyError: 1435802520000000000
I think it's a timestamp conversion problem, so I tried the following, but it still doesn't work:
d3 = pandas.Timestamp(datetime(Y, M, D, H, m))
d2 = pandas.to_datetime(d)
Your help would be appreciated, thank you. :)
Change the return value of the GetDatetime() function to:
return str(d)
This returns a datetime string, which the time series is able to deal with.
If I understand your code correctly, when you do this:
subseries = series['2015-07-07 01:00:00':'2015-07-07 03:30:00']
you're slicing series (by the way, that name is confusing, since pandas has a Series data type) with two strings.
If that works, then for subseries = df[d:f] to work, d and f need to be strings as well.
You can get them by calling the datetime method .strftime(), e.g.:
d = GetDatetime().strftime('%Y-%m-%d 00:00:00')
f = GetDatetime().strftime('%Y-%m-%d 00:00:00')
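A minimal sketch of the string-based slice, assuming a sorted hourly DatetimeIndex. The sample data and the names start and end are made up for illustration; they stand in for the two values returned by GetDatetime().

```python
import datetime

import pandas as pd

# Hypothetical sample data: four days of hourly values starting 2015-07-06.
index = pd.date_range("2015-07-06", periods=96, freq="60min")
series = pd.Series(range(len(index)), index=index)

# Stand-ins for the two dates the user would type in.
start = datetime.datetime(2015, 7, 7)
end = datetime.datetime(2015, 7, 8)

# Format both bounds as strings and slice by label; both ends are included.
fmt = "%Y-%m-%d %H:%M:%S"
subseries = series.loc[start.strftime(fmt):end.strftime(fmt)]
print(subseries.index[0], subseries.index[-1], len(subseries))
```

Label slices on a DatetimeIndex include both endpoints, so this selection covers midnight to midnight of the next day.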
Related
I am currently exploring Py-Polars and am having some difficulty getting the Date32 format into its dataframe. I have tried the following approaches:
Conversion from Pandas to PyPolars directly
import pandas as pd
import pypolars as pyp
a = pd.read_csv(*CSV File*)
b = pyp.from_pandas(a)
The error is as follows:
Traceback (most recent call last):
File "<pyshell#29>", line 1, in <module>
pyp.from_pandas(a)
File "C:\Users\*Username*\AppData\Local\Programs\Python\Python37\lib\site-packages\pypolars\functions.py", line 235, in from_pandas
pl_s = Series(k, s, nullable=True).cast(datatypes.Date64)
File "C:\Users\*Username*\AppData\Local\Programs\Python\Python37\lib\site-packages\pypolars\series.py", line 783, in cast
return wrap_s(f())
RuntimeError: Any(ArrowError(ComputeError("Casting from Int32 to Date64 not supported")))
Converting the datetime columns to strings in Pandas, converting the frame to PyPolars, then converting the strings back to dates in PyPolars
def changeDateTime(value):
    return str(value)

a["ACTUAL_DROP_DATE"] = a["ACTUAL_DROP_DATE"].apply(changeDateTime)
a["ACTUAL_END_DATE"] = a["ACTUAL_END_DATE"].apply(changeDateTime)
b = pyp.from_pandas(a)

def changeStrBack(value):
    if value == np.str("NaT"):
        return ""
    else:
        year = int(value[0:4])
        month = int(value[5:7])
        day = int(value[8:10])
        return pyp.datetime(year, month, day)

b["ACTUAL_DROP_DATE"] = b["ACTUAL_DROP_DATE"].apply(changeStrBack, dtype_out = pyp.Date32)
b["ACTUAL_END_DATE"] = b["ACTUAL_END_DATE"].apply(changeStrBack, dtype_out = pyp.Date32)
This gave me nothing but null values after the conversion (i.e. both columns are completely null).
I hope someone has an idea of how I can get these columns into a datetime type in PyPolars.
Thank you!
Reference to: Python pandas convert unix timestamp with timezone into datetime
I searched on this topic but still can't find the answer.
I have a dataframe in the following format:
df timestamp
1 1549914000
2 1549913400
3 1549935000
3 1549936800
5 1549936200
I use the following to convert epoch to date:
df['date'] = pd.to_datetime(df['timestamp'], unit='s')
This line will produce a date that is always 8 hours behind my local time.
So I followed the example in the link and used apply with tz_localize to set the time zone to Asia/Singapore. I added the following line right after the code above:
df['date'] = df.apply(lambda x: x['date'].tz_localize(x['Asia/Singapore']), axis=1)
but Python returns the error below:
Traceback (most recent call last):
File "/home/test/script.py", line 479, in <module>
schedule.every(10).minutes.do(main).run()
File "/opt/cloudera/parcels/Anaconda-4.0.0/lib/python2.7/site-packages/schedule/__init__.py", line 411, in run
ret = self.job_func()
File "/home/test/script.py", line 361, in main
df['date'] = df.apply(localize_ts, axis = 1)
File "/opt/cloudera/parcels/Anaconda-4.0.0/lib/python2.7/site-packages/pandas/core/frame.py", line 4877, in apply
ignore_failures=ignore_failures)
File "/opt/cloudera/parcels/Anaconda-4.0.0/lib/python2.7/site-packages/pandas/core/frame.py", line 4973, in _apply_standard
results[i] = func(v)
File "/home/test/script.py", line 359, in localize_ts
return pd.to_datetime(row['date']).tz_localize(row['Asia/Singapore'])
File "/opt/cloudera/parcels/Anaconda-4.0.0/lib/python2.7/site-packages/pandas/core/series.py", line 623, in __getitem__
result = self.index.get_value(self, key)
File "/opt/cloudera/parcels/Anaconda-4.0.0/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 2574, in get_value
raise e1
KeyError: ('Asia/Singapore', u'occurred at index 0')
Did I replace .tz_localize(x['tz']) correctly?
As written, your code is looking for a column named Asia/Singapore. Try this instead:
df['date'] = df['date'].dt.tz_localize('Asia/Singapore')
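One caveat, offered as a sketch rather than a definitive fix: tz_localize attaches a time zone to naive timestamps without shifting the clock reading, so the displayed hours stay the same. If the epoch values are seconds since 1970 in UTC (an assumption, though it is the usual meaning of a Unix timestamp), an alternative is to parse them as UTC and then convert to Asia/Singapore. The two sample values are taken from the question.

```python
import pandas as pd

# Two of the epoch values from the question.
df = pd.DataFrame({'timestamp': [1549914000, 1549913400]})

# Parse the epoch seconds as timezone-aware UTC timestamps...
utc_dates = pd.to_datetime(df['timestamp'], unit='s', utc=True)

# ...then convert to the local zone, which shifts the clock reading by +8 hours.
df['date'] = utc_dates.dt.tz_convert('Asia/Singapore')
print(df)
```

The resulting column is timezone-aware. If naive local times are wanted instead, `.dt.tz_localize(None)` can be chained on afterwards to drop the zone.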
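You can also try adding the 8-hour offset (28800 seconds) by hand before converting:
import pandas as pd
df = pd.DataFrame({'timestamp': [1549952400, 1549953600]}, index=['1', '2'])
# Shift the epoch values by 8 hours, convert, then drop the helper column
df['timestamp2'] = df['timestamp'] + 28800
df['date'] = pd.to_datetime(df['timestamp2'], unit='s')
df = df.drop('timestamp2', axis=1)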
I run into a problem when I try to assign a dict to a cell of the DataFrame df:
df.loc[index,'count'] = dict()
as I get this error message:
Incompatible indexer with Series
To work around this problem, I can do this:
df.loc[index,'count'] = [dict()]
but I don't like this solution, since I have to unwrap the list before getting the dictionary, i.e.
a = (df.loc[index,'count'])[0]
How can I solve this situation in a more elegant way?
EDIT1
One way to reproduce the problem is as follows:
Code:
import pandas as pd
df = pd.DataFrame(columns= ['count', 'aaa'])
d = dict()
df.loc[0, 'count'] = [d]; print('OK!');
df.loc[0, 'count'] = d
Output:
OK!
Traceback (most recent call last):
File "<ipython-input-193-67bbd89f2c69>", line 4, in <module>
df.loc[0, 'count'] = d
File "/usr/lib64/python3.6/site-packages/pandas/core/indexing.py", line 194, in __setitem__
self._setitem_with_indexer(indexer, value)
File "/usr/lib64/python3.6/site-packages/pandas/core/indexing.py", line 625, in _setitem_with_indexer
value = self._align_series(indexer, Series(value))
File "/usr/lib64/python3.6/site-packages/pandas/core/indexing.py", line 765, in _align_series
raise ValueError('Incompatible indexer with Series')
ValueError: Incompatible indexer with Series
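One possible way around this, sketched under assumptions and not verified against the pandas version in the traceback: the traceback shows .loc wrapping the value in a Series and trying to align it, whereas the scalar accessor .at sets exactly one cell and stores the object as given. The sketch assumes the row already exists and that the column has object dtype. The sample frame and the dict contents are made up for illustration.

```python
import pandas as pd

# Hypothetical frame: 'count' is an object column whose rows already exist.
df = pd.DataFrame({'count': [None, None], 'aaa': [1, 2]})

# .at addresses a single cell, so the dict is stored as-is.
df.at[0, 'count'] = {'hits': 3}

# Reading the cell back returns the dict directly, with no list to unwrap.
value = df.at[0, 'count']
print(type(value), value)
```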
My objective is as follows:
If the DataFrame is empty, I need to insert a row whose index is the value of the variable URL and whose columns are the value of URL followed by sorted_list.
If it is non-empty, I need to insert a row whose index is the value of the variable URL and whose columns are sorted_list.
What I did was this: I initialized a DataFrame self.df, and then for each row with the values described above I created a local DataFrame variable df1 and appended it to self.df.
My code:
import pandas as pd

class Reward_Matrix:
    def __init__(self):
        self.df = pd.DataFrame()

    def add(self, URL, webpage_list):
        sorted_list = []
        check_list = list(self.df.columns.values)
        print('check_list: ',check_list)
        for i in webpage_list: #to ensure no duplication columns
            if i not in check_list:
                sorted_list.append(i)
        if self.df.empty:
            sorted_list.insert(0, URL)
            df1 = pd.DataFrame(0,index=[URL], columns=[sorted_list])
        else:
            df1 = pd.DataFrame(0,index=[URL], columns=[sorted_list])
        print(df1)
        print('sorted_list: ',sorted_list)
        print("length: ",len(df1.columns))
        self.df.append(df1)
But I get the following error:
Traceback (most recent call last):
File "...Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 4294, in create_block_manager_from_blocks
placement=slice(0, len(axes[0])))]
File "...Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 2719, in make_block
return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
File "...Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 115, in __init__
len(self.mgr_locs)))
ValueError: Wrong number of items passed 1, placement implies 450
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "...eclipse-workspace\Crawler\crawl_core\src_main\run.py", line 23, in test_start
test.crawl_run(self.URL)
File "...eclipse-workspace\Crawler\crawl_core\src_main\test_crawl.py", line 42, in crawl_run
self.reward.add(URL, webpage_list)
File "...eclipse-workspace\Crawler\crawl_core\src_main\dynamic_matrix.py", line 21, in add
df1 = pd.DataFrame(0,index=[URL], columns=[sorted_list])
File "...Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 352, in __init__
copy=False)
File "...Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 483, in _init_ndarray
return create_block_manager_from_blocks([values], [columns, index])
File "...Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 4303, in create_block_manager_from_blocks
construction_error(tot_items, blocks[0].shape[1:], axes, e)
File "...Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 4280, in construction_error
passed, implied))
ValueError: Shape of passed values is (1, 1), indices imply (450, 1)
I am not well-versed in DataFrames and pandas. I have been getting this error for quite some time, and the similar questions on Stack Overflow only confuse me further, as I can't work out where I went wrong.
Can someone help me out?
I think you need to remove the [], because otherwise you pass a nested list:
df1 = pd.DataFrame(0,index=[URL], columns=sorted_list)
Sample:
sorted_list = ['a','b','c']
URL = 'url1'
df1 = pd.DataFrame(0,index=[URL], columns=sorted_list)
print (df1)
      a  b  c
url1  0  0  0
df1 = pd.DataFrame(0,index=[URL], columns=[sorted_list])
print (df1)
>ValueError: Shape of passed values is (1, 1), indices imply (3, 1)
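Putting this together, here is a hedged sketch of how the add method could look with the flat column list. Two further changes are assumptions on my part and not part of the answer above: the result is assigned back to self.df (append returns a new frame and does not modify the original), and pd.concat is used because DataFrame.append was removed in pandas 2.0. The class name RewardMatrix and the sample URLs are illustrative only.

```python
import pandas as pd

class RewardMatrix:
    def __init__(self):
        self.df = pd.DataFrame()

    def add(self, url, webpage_list):
        # Keep only the pages that are not already columns.
        new_columns = [page for page in webpage_list if page not in self.df.columns]
        if self.df.empty:
            # First row: the URL itself becomes the first column.
            new_columns.insert(0, url)
        # Pass the list itself, not [list], so each entry becomes one column.
        row = pd.DataFrame(0, index=[url], columns=new_columns)
        # Assign the result back; concat returns a new frame.
        self.df = row if self.df.empty else pd.concat([self.df, row])

matrix = RewardMatrix()
matrix.add('url1', ['a', 'b'])
matrix.add('url2', ['b', 'c'])
print(matrix.df)
```

Cells that a row does not define come out as NaN after the concat; a call such as `matrix.df.fillna(0)` would replace them with zeros if that is what is wanted.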
I am using the code below to convert datetime values to integers. It works great except for NaT values. If I do this in an iteration, how can I handle NaT values so that I don't get errors such as 'NaTType does not support timetuple'?
import time
from datetime import datetime
t=datetime.now()
t1=t.timetuple()
int(time.mktime(t1)/60/60/24)
Here is the code to create sample data and what I have tried to iterate so far:
create data:
df = pd.DataFrame(data={'date':['05/16/16',''], 'Indicator':[1,0]})
df['date']=pd.to_datetime(df['date'])
Data:
   Indicator       date
0          1 2016-05-16
1          0        NaT
Iteration code:
def date2int(df):
    if df.date:
        t=df['date']
        t1=t.timetuple()
        return int(time.mktime(t1))

df['date2int']=df.apply(date2int,axis=1)
Error message:
Traceback (most recent call last):
File "", line 1, in
df['date2int']=df.apply(date2int,axis=1)
File "/Users/Chen/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 4042, in apply
return self._apply_standard(f, axis, reduce=reduce)
File "/Users/Chen/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 4138, in _apply_standard
results[i] = func(v)
File "", line 4, in date2int
t1=t.timetuple()
File "pandas/tslib.pyx", line 723, in pandas.tslib._make_error_func.f (pandas/tslib.c:16109)
ValueError: ('NaTType does not support timetuple', u'occurred at index 1')
Solution #1:
def date2int(df):
    if df.date:
        t=df['date']
        try:
            t1=t.timetuple()
            return int(time.mktime(t1))
        except ValueError:
            return None

df['date2int']=df.apply(date2int,axis=1)
Solution #2:
df=df.dropna()
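A third option, sketched here as an alternative and not taken from the solutions above, is to skip the row-by-row apply and subtract the Unix epoch from the whole column. NaT rows then come out as NaN instead of raising. Note the assumption: this treats the dates as UTC, whereas time.mktime interprets them in the machine's local time zone, so the numbers can differ by the local UTC offset. The sample frame mirrors the one in the question.

```python
import pandas as pd

# Sample frame like the one in the question: one valid date and one NaT.
df = pd.DataFrame({'date': pd.to_datetime(['2016-05-16', None]),
                   'Indicator': [1, 0]})

# Seconds since 1970-01-01, computed for the whole column at once.
# NaT propagates to NaN, so no per-row error handling is needed.
epoch = pd.Timestamp('1970-01-01')
df['date2int'] = (df['date'] - epoch).dt.total_seconds()
print(df)
```

The result is a float column because NaN cannot live in a plain integer column; dividing by 86400 gives days, as in the first snippet of the question.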