Reformatting a date column in Pandas - python

I'm using a market data API and am using Pandas dataframes to filter and restructure before storing in a database. Before it goes into InfluxDB, I need to restructure a date column which currently looks like:
Earnings
May 15/b
Apr 09/a
What I have so far is below, but I'm getting the error -
TypeError: Unrecognized value type: <class 'str'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/OMT/Demo/scratchpad.py", line 41, in <module>
filtered['NewEarnings'] = pd.to_datetime(filtered['NewEarnings'], format='%b%d')
File "/usr/local/lib/python3.8/dist-packages/pandas/core/tools/datetimes.py", line 801, in to_datetime
cache_array = _maybe_cache(arg, format, cache, convert_listlike)
File "/usr/local/lib/python3.8/dist-packages/pandas/core/tools/datetimes.py", line 178, in _maybe_cache
cache_dates = convert_listlike(unique_dates, format)
File "/usr/local/lib/python3.8/dist-packages/pandas/core/tools/datetimes.py", line 460, in _convert_listlike_datetimes
raise e
File "/usr/local/lib/python3.8/dist-packages/pandas/core/tools/datetimes.py", line 423, in _convert_listlike_datetimes
result, timezones = array_strptime(
File "pandas/_libs/tslibs/strptime.pyx", line 144, in pandas._libs.tslibs.strptime.array_strptime
ValueError: time data '-' does not match format '%b%d' (match)
The code in question -
filtered = df
earningsColumn = filtered['Earnings'].squeeze()
stripped = earningsColumn.str.rstrip('. /a/b')
filtered['NewEarnings'] = stripped
filtered['NewEarnings'] = pd.to_datetime(filtered['NewEarnings'], format='%b%d')
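A possible fix, sketched below (this is an assumption about the data, not a confirmed solution): the values look like "May 15/b", so split off the "/a" or "/b" suffix, match with '%b %d' (note the space), and let placeholder values such as '-' become NaT via errors='coerce':

```python
import pandas as pd

# Illustrative data matching the shapes shown in the question,
# including a '-' placeholder that triggered the ValueError.
filtered = pd.DataFrame({"Earnings": ["May 15/b", "Apr 09/a", "-"]})

# Drop the "/a" / "/b" suffix, then parse with a space in the format;
# errors='coerce' turns unparseable placeholders into NaT.
stripped = filtered["Earnings"].str.split("/").str[0].str.strip()
filtered["NewEarnings"] = pd.to_datetime(stripped, format="%b %d",
                                         errors="coerce")
print(filtered["NewEarnings"])
```

Without a year in the format, pandas defaults the parsed dates to 1900, so you may want to add the current year before storing in InfluxDB.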

Related

KeyError: "['SALINITY;PSU'] not in index"

So, I have this code to process CTD data. It works normally until I try to process the salinity column, which gives me this error:
KeyError: "['SALINITY;PSU'] not in index"
But when I check the columns, they're all there. Here is the code:
df = pd.read_csv('/home/labdino/PycharmProjects/CTDprocessing/venv/DadosCTD_tabulacao.csv',
sep='\t',
skiprows=header,
)
down, up = df.split()
down = down[["TEMPERATURE;C", "SALINITY;PSU"]]
process = (down.remove_above_water()
.remove_up_to(idx=7)
.despike(n1=2, n2=20, block=100)
.lp_filter()
.press_check()
.interpolate()
.bindata(delta=1, method="average")
.smooth(window_len=21, window="hanning")
)
process.head()
Output:
Traceback (most recent call last):
File "/home/labdino/PycharmProjects/CTDprocessing/venv/CTDLab.py", line 47, in <module>
down = down[["TEMPERATURE;C", "SALINITY;PSU"]]
File "/home/labdino/.local/lib/python3.10/site-packages/pandas/core/frame.py", line 3811, in __getitem__
indexer = self.columns._get_indexer_strict(key, "columns")[1]
File "/home/labdino/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6108, in _get_indexer_strict
self._raise_if_missing(keyarr, indexer, axis_name)
File "/home/labdino/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6171, in _raise_if_missing
raise KeyError(f"{not_found} not in index")
KeyError: "['SALINITY;PSU'] not in index"
When I use this code with any other column it works, but with salinity it doesn't, and I checked the csv file and it looks normal.
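A diagnostic sketch (the CSV content below is hypothetical, not the asker's file): a KeyError like this with a column that "looks present" usually means the header doesn't match exactly, e.g. a trailing space or BOM character. Printing repr() of the column names exposes it, and stripping the headers fixes it:

```python
import pandas as pd
from io import StringIO

# Simulated file with a trailing space in the second header cell.
csv = "TEMPERATURE;C\tSALINITY;PSU \n20.1\t35.2\n"
df = pd.read_csv(StringIO(csv), sep="\t")

print([repr(c) for c in df.columns])   # repr() reveals 'SALINITY;PSU '
df.columns = df.columns.str.strip()    # normalize headers before selecting
down = df[["TEMPERATURE;C", "SALINITY;PSU"]]
print(down)
```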

passing values with df.insert

I want to insert data into a DataFrame like:
df = pd.DataFrame(columns=["Date", "Title", "Artist"])
insertion happens here:
df.insert(loc=0, column="Date", value=dateTime.group(0), allow_duplicates=True)
df.insert(loc=0, column="Title", value=title, allow_duplicates=True)
df.insert(loc=0, column="Artist", value=artist, allow_duplicates=True)
Sadly, I don't know how to handle these errors:
Traceback (most recent call last):
File "/Users/sashakaun/IdeaProjects/python/venv/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 697, in _try_cast
subarr = maybe_cast_to_datetime(arr, dtype)
File "/Users/sashakaun/IdeaProjects/python/venv/lib/python3.8/site-packages/pandas/core/dtypes/cast.py", line 1067, in maybe_cast_to_datetime
value = maybe_infer_to_datetimelike(value)
File "/Users/sashakaun/IdeaProjects/python/venv/lib/python3.8/site-packages/pandas/core/dtypes/cast.py", line 865, in maybe_infer_to_datetimelike
if isinstance(value, (ABCDatetimeIndex, ABCPeriodIndex,
File "/Users/sashakaun/IdeaProjects/python/venv/lib/python3.8/site-packages/pandas/core/dtypes/generic.py", line 9, in _check
return getattr(inst, attr, '_typ') in comp
TypeError: 'in <string>' requires string as left operand, not NoneType
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/sashakaun/IdeaProjects/scrapyscrape/test.py", line 24, in <module>
df.insert(loc=0, column="Title", value=title, allow_duplicates=True)
File "/Users/sashakaun/IdeaProjects/python/venv/lib/python3.8/site-packages/pandas/core/frame.py", line 3470, in insert
self._ensure_valid_index(value)
File "/Users/sashakaun/IdeaProjects/python/venv/lib/python3.8/site-packages/pandas/core/frame.py", line 3424, in _ensure_valid_index
value = Series(value)
File "/Users/sashakaun/IdeaProjects/python/venv/lib/python3.8/site-packages/pandas/core/series.py", line 261, in __init__
data = sanitize_array(data, index, dtype, copy,
File "/Users/sashakaun/IdeaProjects/python/venv/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 625, in sanitize_array
subarr = _try_cast(data, False, dtype, copy, raise_cast_failure)
File "/Users/sashakaun/IdeaProjects/python/venv/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 720, in _try_cast
subarr = np.array(arr, dtype=object, copy=copy)
File "/Users/sashakaun/IdeaProjects/python/venv/lib/python3.8/site-packages/bs4/element.py", line 971, in __getitem__
return self.attrs[key]
KeyError: 0
It's my first question, so please be kind.
Thanks in advance.
The error seems to come from your value=dateTime.group(0) value. Can you elaborate on the structure of dateTime?
Also, df.insert() inserts a column rather than adding data to the DataFrame.
You should first transform your data into a Series object and then use pd.concat() to concatenate it with the original DataFrame.
Below are some references:
https://kite.com/python/answers/how-to-insert-a-row-into-a-pandas-dataframe
Add one row to pandas DataFrame
https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
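A minimal sketch of the suggestion above (the row values are illustrative, not the asker's scraped data): since df.insert adds a column, build each row as a dict, wrap it in a one-row frame, and concatenate:

```python
import pandas as pd

df = pd.DataFrame(columns=["Date", "Title", "Artist"])

# Append one scraped record as a row; pd.concat with ignore_index
# keeps a clean 0..n-1 index as rows accumulate.
row = {"Date": "2020-01-01", "Title": "Some Song", "Artist": "Some Artist"}
df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)

print(df)
```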

Timezone issue with UNTIL in ical RRULE statement

The following code throws an exception, which most probably pertains to the TZID replacements I needed to do to fix some other bugs. If I remove the "UNTIL" statement from the ical string, the code works just fine.
from icalendar.cal import Calendar
import datetime
from dateutil import rrule
from dateutil.tz import gettz
cal_str = "BEGIN:VEVENT\nDTSTART;TZID=America/Los_Angeles:20171019T010000\nDTEND;TZID=America/Los_Angeles:20171019T230000\nRRULE:FREQ=WEEKLY;BYDAY=MO,TU,WE,TH,FR;UNTIL=20180423T191500\nX-OP-ENTRY-STATE:unlocked\nEND:VEVENT"
ical = Calendar.from_ical(cal_str)
start_time_dt = ical.get("DTSTART").dt
end_time_dt = ical.get("DTEND").dt
tzinfo = gettz(str(start_time_dt.tzinfo))
start_time_dt = start_time_dt.replace(tzinfo=tzinfo)
recurring_rule = ical.get('RRULE').to_ical().decode('utf-8')
rules = rrule.rruleset()
first_rule = rrule.rrulestr(recurring_rule, dtstart=start_time_dt)
rules.rrule(first_rule)
event_delta = end_time_dt - start_time_dt
now = datetime.datetime.now(datetime.timezone.utc)
for s in rules.between(now - event_delta, now + datetime.timedelta(minutes=1)):
print(s)
Here is the exception:
Traceback (most recent call last):
File "ical_test.py", line 27, in <module>
for s in rules.between(now - event_delta, now + datetime.timedelta(minutes=1)):
File "/usr/local/lib/python3.5/dist-packages/dateutil/rrule.py", line 290, in between
for i in gen:
File "/usr/local/lib/python3.5/dist-packages/dateutil/rrule.py", line 1362, in _iter
self._genitem(rlist, gen)
File "/usr/local/lib/python3.5/dist-packages/dateutil/rrule.py", line 1292, in __init__
self.dt = advance_iterator(gen)
File "/usr/local/lib/python3.5/dist-packages/dateutil/rrule.py", line 861, in _iter
if until and res > until:
TypeError: can't compare offset-naive and offset-aware datetimes
Can anyone help find the root cause of this error and a way to fix it?
First of all, in dateutil > 2.7.1 the exception was made more explicit:
Traceback (most recent call last):
File "ical_test.py", line 23, in <module>
first_rule = rrule.rrulestr(recurring_rule, dtstart=start_time_dt)
File "/usr/local/lib/python3.5/dist-packages/dateutil/rrule.py", line 1664, in __call__
return self._parse_rfc(s, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/dateutil/rrule.py", line 1547, in _parse_rfc
tzinfos=tzinfos)
File "/usr/local/lib/python3.5/dist-packages/dateutil/rrule.py", line 1506, in _parse_rfc_rrule
return rrule(dtstart=dtstart, cache=cache, **rrkwargs)
File "/usr/local/lib/python3.5/dist-packages/dateutil/rrule.py", line 461, in __init__
'RRULE UNTIL values must be specified in UTC when DTSTART '
ValueError: RRULE UNTIL values must be specified in UTC when DTSTART is timezone-aware
The solution is to express the UNTIL time in UTC and append Z to the end of the time string, as described in the RFC:
https://icalendar.org/iCalendar-RFC-5545/3-8-5-3-recurrence-rule.html
The correct RRULE string should look like this:
cal_str = "BEGIN:VEVENT\nDTSTART;TZID=America/Los_Angeles:20171019T010000\nDTEND;TZID=America/Los_Angeles:20171019T230000\nRRULE:FREQ=WEEKLY;BYDAY=MO,TU,WE,TH,FR;UNTIL=20180423T001500Z\nX-OP-ENTRY-STATE:unlocked\nEND:VEVENT"
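The conversion itself can be sketched like this (the concrete datetime is illustrative; it shows the mechanics, not the exact value the answer chose):

```python
from datetime import datetime
from dateutil import tz

# A local UNTIL timestamp in the event's zone (hypothetical value).
local_until = datetime(2018, 4, 23, 19, 15,
                       tzinfo=tz.gettz("America/Los_Angeles"))

# Convert to UTC and render in the RFC 5545 basic format with a Z suffix.
utc_until = local_until.astimezone(tz.UTC)
until_str = utc_until.strftime("%Y%m%dT%H%M%SZ")
print(until_str)  # 20180424T021500Z (PDT is UTC-7 in late April)
```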

"Already tz-aware" error when reading h5 file using pandas, python 3 (but not 2)

I have an h5 store named weather.h5. My default Python environment is 3.5.2. When I try to read this store I get TypeError: Already tz-aware, use tz_convert to convert.
I've tried both pd.read_hdf('weather.h5','weather_history') and pd.io.pytables.HDFStore('weather.h5')['weather_history'], but I get the error no matter what.
I can open the h5 in a Python 2.7 environment. Is this a bug in Python 3 / pandas?
I have the same issue. I'm using Anaconda Python: 3.4.5 and 2.7.3. Both are using pandas 0.18.1.
Here is a reproducible example:
generate.py (to be executed with Python2):
import pandas as pd
from pandas import HDFStore
index = pd.DatetimeIndex(['2017-06-20 06:00:06.984630-05:00', '2017-06-20 06:03:01.042616-05:00'], dtype='datetime64[ns, CST6CDT]', freq=None)
p1 = [0, 1]
p2 = [0, 2]
# Saving any of these dataframes cause issues
df1 = pd.DataFrame({"p1":p1, "p2":p2}, index=index)
df2 = pd.DataFrame({"p1":p1, "p2":p2, "i":index})
store = HDFStore("./test_issue.h5")
store['df'] = df1
#store['df'] = df2
store.close()
read_issue.py:
import pandas as pd
from pandas import HDFStore
store = HDFStore("./test_issue.h5", mode="r")
df = store['/df']
store.close()
print(df)
Running read_issue.py in Python2 has no issues and produces this output:
p1 p2
2017-06-20 11:00:06.984630-05:00 0 0
2017-06-20 11:03:01.042616-05:00 1 2
But running it in Python3 produces Error with this traceback:
Traceback (most recent call last):
File "read_issue.py", line 5, in <module>
df = store['df']
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 417, in __getitem__
return self.get(key)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 634, in get
return self._read_group(group)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 1272, in _read_group
return s.read(**kwargs)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 2779, in read
ax = self.read_index('axis%d' % i)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 2367, in read_index
_, index = self.read_index_node(getattr(self.group, key))
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 2492, in read_index_node
_unconvert_index(data, kind, encoding=self.encoding), **kwargs)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/indexes/base.py", line 153, in __new__
result = DatetimeIndex(data, copy=copy, name=name, **kwargs)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/util/decorators.py", line 91, in wrapper
return func(*args, **kwargs)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/tseries/index.py", line 321, in __new__
raise TypeError("Already tz-aware, use tz_convert "
TypeError: Already tz-aware, use tz_convert to convert.
Closing remaining open files:./test_issue.h5...done
So, there is an issue with indices. However, if you save df2 in generate.py (datetime as a column, not as an index), then Python3 in read_issue.py produces a different error:
Traceback (most recent call last):
File "read_issue.py", line 5, in <module>
df = store['/df']
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 417, in __getitem__
return self.get(key)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 634, in get
return self._read_group(group)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 1272, in _read_group
return s.read(**kwargs)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/io/pytables.py", line 2788, in read
placement=items.get_indexer(blk_items))
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/core/internals.py", line 2518, in make_block
return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
File "/home/denper/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/core/internals.py", line 90, in __init__
len(self.mgr_locs)))
ValueError: Wrong number of items passed 2, placement implies 1
Closing remaining open files:./test_issue.h5...done
Also, if you execute generate.py in Python3 (saving either df1 or df2), then there is no problem executing read_issue.py in either Python3 or Python2.
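A possible workaround, sketched below (an assumption, not a fix confirmed in the thread): convert the index to UTC before writing, so the Python 3 reader never has to re-localize the named CST6CDT zone:

```python
import pandas as pd

# Build a tz-aware index in the problematic named zone, then normalize
# it to UTC before it ever reaches the HDFStore.
index = pd.date_range("2017-06-20 06:00:06", periods=2,
                      freq="min", tz="CST6CDT").tz_convert("UTC")
df1 = pd.DataFrame({"p1": [0, 1], "p2": [0, 2]}, index=index)
# store["df"] = df1  # write the UTC-indexed frame instead of the CST6CDT one
print(df1.index.tz)
```

After reading the store back, tz_convert("CST6CDT") restores local times if you need them for display.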

String indexes must be integers?

Hey, I am having problems accessing some lists.
I can access Items using this code:
data = session.get(BASE_URL + 'tori_market.php',params={'format': 'json'}).json()
except ValueError:
for item in data['items']:
print(item['price'])
But I can not access User using the same code:
data = session.get(BASE_URL + 'tori_market.php',params={'format': 'json'}).json()
except ValueError:
for users in data['user']:
print(user['max'])
Edit: I posted the wrong code; here is the one I'm using.
data = session.get(BASE_URL + 'tori_market.php',params={'format': 'json'}).json()
except ValueError:
for users in data['user']:
print(users['balance'])
What is wrong with it?
You can check how the API is structured at this link.
The full traceback is:
Traceback (most recent call last):
File "/Users/Cristi/Desktop/RealBot/prova.py", line 34, in <module>
data = session.get(BASE_URL + 'tori_market.php',params={'format': 'json'}).json()
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/requests/models.py", line 799, in json
return json.loads(self.text, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/json/__init__.py", line 318, in loads
return _default_decoder.decode(s)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/json/decoder.py", line 343, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/json/decoder.py", line 361, in raw_decode
raise ValueError(errmsg("Expecting value", s, err.value)) from None
ValueError: Expecting value: line 1 column 1 (char 0)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/Cristi/Desktop/RealBot/prova.py", line 37, in <module>
print(users['balance'])
TypeError: string indices must be integers
Since it's a password-protected page, I can only give you a screenshot, here.
Edited answer
The user above is a key in the JSON data, so when you do
for users in data["user"]
you are already iterating over its keys.
Instead, for the sake of brevity, do:
for key in data["user"]:
    print(key, data["user"][key])
This will print all the data within the user dict. key will take values such as "balance".
Original answer
This is a typo between users and user; you use:
for users in data['user']:
but access it as:
print(user['max'])
Instead access it as:
print(users['max'])
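The difference can be sketched like this (the payload shape is assumed from the tracebacks, not taken from the real API): iterating a list of dicts yields dicts, but iterating a dict yields its keys, which are strings, hence "string indices must be integers" when you subscript one with 'max':

```python
# Hypothetical response shaped like the one implied by the question.
data = {
    "items": [{"price": 10}, {"price": 12}],  # list -> each item is a dict
    "user": {"balance": 100, "max": 250},     # dict -> iteration yields keys
}

prices = [item["price"] for item in data["items"]]  # fine: items are dicts

# Iterating the dict gives key strings; look each one up explicitly.
user_info = {key: data["user"][key] for key in data["user"]}
print(prices, user_info)
```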