I have a DataFrame with a column whose value depends on the existence of another column, so I tried to use np.where to set the value conditionally, like this:
the_dataframe["dependant_value"] = np.where(
"independant_value_" + the_dataframe["suffix"].str.lower()
in the_dataframe.columns,
the_dataframe["another_column"]
* the_dataframe[
"independent_value_" + the_dataframe["suffix"].str.lower()
],
0,
)
But I'm getting this error:
File "C:\the_file.py", line 271, in the_method
"independent_value_" + the_dataframe["suffix"].str.lower()
File "C:\the_file.py", line 4572, in __contains__
hash(key)
TypeError: unhashable type: 'Series'
I suppose there must be a proper way to evaluate the condition, but I haven't found it.
There is no such syntax as Series in Series/Index:
>>> s = pd.Series([1, 2])
>>> s in s
Traceback (most recent call last):
File "~/sourcecode/test/so/73460893.py", line 44, in <module>
s in s
File "~/.local/lib/python3.10/site-packages/pandas/core/generic.py", line 1994, in __contains__
return key in self._info_axis
File "~/.local/lib/python3.10/site-packages/pandas/core/indexes/range.py", line 365, in __contains__
hash(key)
TypeError: unhashable type: 'Series'
You might want Series.isin:
("independent_value_" + the_dataframe["suffix"].str.lower()).isin(the_dataframe.columns),
I am parsing some data with predefined columns, and sometimes these columns are duplicated, e.g.:
df = pd.DataFrame([['A','B']], columns=['A','A'])
The above works just fine, but I also want to specify the dtype for the column, e.g.:
df = pd.DataFrame([['A','B']], columns=['A','A'], dtype={'A': str})
However, the above errors out with the following:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 513, in __init__
dtype = self._validate_dtype(dtype)
File "/home/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 345, in _validate_dtype
dtype = pandas_dtype(dtype)
File "/home/anaconda3/lib/python3.7/site-packages/pandas/core/dtypes/common.py", line 1799, in pandas_dtype
npdtype = np.dtype(dtype)
File "/home/anaconda3/lib/python3.7/site-packages/numpy/core/_internal.py", line 62, in _usefields
names, formats, offsets, titles = _makenames_list(adict, align)
File "/home/anaconda3/lib/python3.7/site-packages/numpy/core/_internal.py", line 30, in _makenames_list
n = len(obj)
TypeError: object of type 'type' has no len()
Is there a way around this?
Your syntax is invalid irrespective of the duplicated columns: the dtype parameter expects a single dtype. From the documentation:
dtype dtype, default None
Data type to force. Only a single dtype is allowed. If None, infer.
You can use:
df = pd.DataFrame([['A','B']], columns=['A','A']).astype({'A':str})
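A minimal sketch of the construct-then-cast pattern (data made up for illustration):
import pandas as pd

# The constructor's dtype= only takes a single dtype, but .astype
# accepts a per-column mapping, so the cast happens after construction.
df = pd.DataFrame([['1', '2']], columns=['A', 'A']).astype({'A': str})
print(df.dtypes)  # both 'A' columns end up as object (string) dtype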
I am trying to add a new column to a DataFrame using pandas, but I keep getting an error. Here is my code:
classes = pd.DataFrame(data['class'])
df['class'] = classes
And every time I run this I get:
Traceback (most recent call last):
File "c:\Users\gjohn\Documents\code\machineLearning\trading_bot\filter.py", line 147, in <module>
df['class'] = list(classes)
File "C:\Users\gjohn\AppData\Local\Programs\Python\Python39\lib\site-packages\scipy\sparse\_index.py", line 76, in __setitem__
row, col = self._validate_indices(key)
File "C:\Users\gjohn\AppData\Local\Programs\Python\Python39\lib\site-packages\scipy\sparse\_index.py", line 138, in _validate_indices
row = self._asindices(row, M)
File "C:\Users\gjohn\AppData\Local\Programs\Python\Python39\lib\site-packages\scipy\sparse\_index.py", line 162, in _asindices
raise IndexError('Index dimension must be <= 2')
IndexError: Index dimension must be <= 2
Why am I getting this?
I have a vaex.dataframe.DataFrame called df with a time column called timestamp of type string. I convert the column to datetime as follows:
import numpy as np
from pandas.api.types import is_datetime64_any_dtype as is_datetime
if not is_datetime(df['timestamp']):
    df['timestamp'] = df['timestamp'].apply(np.datetime64)
Then I just want to select the rows of df where the timestamp is in a specific range. Let's say:
sliced_df = df[(df['timestamp'] > np.datetime64("2022-01-01"))]
I am doing that in SageMaker and it throws a huge error, mainly with the following messages:
Traceback (most recent call last):
File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python/lib/python3.7/site-packages/vaex/scopes.py", line 106, in evaluate
result = self[expression]
File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python/lib/python3.7/site-packages/vaex/scopes.py", line 166, in __getitem__
raise KeyError("Unknown variables or column: %r" % (variable,))
KeyError: "Unknown variables or column: 'datetime64(__timestamp)'"
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python/lib/python3.7/site-packages/vaex/expression.py", line 1327, in _apply
scalar_result = self.f(*[fix_type(k[i]) for k in args], **{key: value[i] for key, value in kwargs.items()})
ValueError: Error parsing datetime string "nan" at position 0
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python/lib/python3.7/site-packages/vaex/scopes.py", line 265, in __getitem__
values = self.evaluate(expression) # , out=self.buffers[variable])
File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python/lib/python3.7/site-packages/vaex/scopes.py", line 188, in evaluate
result = eval(expression, expression_namespace, self)
File "<string>", line 1, in <module>
File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python/lib/python3.7/site-packages/vaex/arrow/numpy_dispatch.py", line 136, in wrapper
result = f(*args, **kwargs)
File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python/lib/python3.7/site-packages/vaex/expression.py", line 1312, in __call__
return vaex.multiprocessing.apply(self._apply, args, kwargs, self.multiprocessing)
File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python/lib/python3.7/site-packages/vaex/multiprocessing.py", line 32, in apply
result = _get_pool().apply(f, args, kwargs)
File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python/lib/python3.7/multiprocessing/pool.py", line 261, in apply
return self.apply_async(func, args, kwds).get()
File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
ValueError: Error parsing datetime string "nan" at position 0
ERROR:MainThread:vaex.scopes:error in evaluating: 'timestamp'
"""
The df holds values similar to these under the timestamp column:
<pyarrow.lib.StringArray object at 0x7f569e5f54b0>
[
"2021-12-19 06:01:10.789",
"2021-12-20 07:02:11.89",
"2022-01-01 08:02:12.678",
"2022-01-02 09:03:13.567",
"2022-01-03 10:04:14.456"
]
The timestamps look fine to me. I compared them with previous data where the comparison worked, and nothing seems to be different. I have no clue why this is not working anymore. I have been trying to wrap my head around it for days but really can't find why it's throwing that error.
When I check with
df[df.timestamp.isna()]
it returns nothing, so I don't understand why it found "nan" at the first position, as stated in the error message above.
I appreciate any help. Thanks in advance!
It is probably because you are comparing Arrow timestamps to numpy timestamps. You need to choose one framework and work with that.
This issue on vaex's GitHub discusses what you are facing and might clear things up:
https://github.com/vaexio/vaex/issues/1704
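As a very rough sketch of what staying in one framework could look like (illustrative only; astype's behavior on string columns varies across vaex versions, so treat this as a direction rather than a verified fix):
import numpy as np

# Cast the string column once into numpy's datetime framework, then
# compare against a numpy scalar instead of mixing Arrow and numpy types.
df['timestamp'] = df['timestamp'].astype('datetime64[ns]')
sliced_df = df[df['timestamp'] > np.datetime64('2022-01-01')]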
Good morning Stack Overflow.
I am trying to find a better way to read in a CSV file and parse the datetime. Unfortunately my data comes in as '%j:%H:%M:%S.%f', such as 234:17:33:00.000206700. The year sits in another field in the header that I skip over, so this is my method of converting before setting the index, since I have date rollovers to account for. It works, but it is slower than I would like and not intuitive.
dataframe = pd.read_csv(data_file, skiprows=np.arange(0, meta_lines), header=[0, 1, 2])
dataframe['Temp'] = meta['Date'].split('-')[2] + ' '  # splitting the year off '08-22-2019'
dataframe['Temp'] = dataframe[['Temp', 'AbsoluteTime']].apply(lambda x: ''.join(x), axis=1)
dataframe['AbsoluteTime'] = pd.to_datetime(dataframe['Temp'], format='%Y %j:%H:%M:%S.%f')
del dataframe['Temp']
dataframe.set_index('AbsoluteTime', inplace=True)
Originally I wanted to have pd.to_datetime parse without the %Y (resulting in the year 1900) and then add X years with a timedelta; however, when I started down that path, I came across this error.
dataframe['AbsoluteTime']
Out[8]:
DDD:HH:MM:SS.sssssssss
Absolute Time
0 234:17:33:00.000206700
1 234:17:33:00.011264914
2 234:17:33:00.015721314
...
pd.to_datetime(dataframe['AbsoluteTime'],format='%j:%H:%M:%S.%f')
Traceback (most recent call last):
File "<ipython-input-10-bfbf7ee22833>", line 1, in <module>
pd.to_datetime(dataframe['AbsoluteTime'],format='%j:%H:%M:%S.%f')
File "C:\Users\fkatzenb\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py", line 512, in to_datetime
result = _assemble_from_unit_mappings(arg, errors=errors)
File "C:\Users\fkatzenb\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py", line 582, in _assemble_from_unit_mappings
unit = {k: f(k) for k in arg.keys()}
File "C:\Users\fkatzenb\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py", line 582, in <dictcomp>
unit = {k: f(k) for k in arg.keys()}
File "C:\Users\fkatzenb\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py", line 577, in f
if value.lower() in _unit_map:
AttributeError: 'tuple' object has no attribute 'lower'
What gives? My problem isn't from having double brackets [[]] like the other threads with this error address. If I do this as a test, I see:
pd.to_datetime(['234:17:33:00.000206700'],format='%j:%H:%M:%S.%f')
Out[6]: DatetimeIndex(['1900-08-22 17:33:00.000206700'], dtype='datetime64[ns]', freq=None)
I was then just going to add a timedelta to that to shift the year to the current year.
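For what it's worth, a hypothetical sketch of that year-shift idea (the year value here is made up; pd.Timedelta has no year unit, so pd.DateOffset is used instead):
import pandas as pd

# Parsing without %Y defaults the year to 1900.
idx = pd.to_datetime(['234:17:33:00.000206700'], format='%j:%H:%M:%S.%f')

# Shift every timestamp into the target year, keeping month and day.
year = 2019  # in practice this would come from the metadata header
shifted = idx + pd.DateOffset(years=year - 1900)
print(shifted)  # DatetimeIndex(['2019-08-22 17:33:00.000206700'], ...)
Note that shifting by calendar years preserves the month and day rather than the day-of-year, so around leap years the result can be one day off.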
My only thought is that it has to do with my multi-level column header (see my read_csv call above). Thoughts? Suggestions?
Thanks!
I'm trying to create a column in my DataFrame with the difference between a column's value and the same column's value for the previous month. If the previous month doesn't exist, the difference shouldn't be calculated.
Result table example
df_ranking['cat_race'] = df_ranking.groupby(df_ranking['ID'], df_ranking['DATE'])['POINTS'].shift(1)
But the error message I get is:
Traceback (most recent call last):
File "C:/Users/jhoyo/PycharmProjects/Tennis-Ranking/venv/ranking_2_db.py", line 95, in <module>
df_ranking['cat_race'] = df_ranking.groupby(df_ranking['licencia'], df_ranking['date'])['puntos'].shift(1)
File "C:\Users\jhoyo\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\generic.py", line 7629, in groupby
axis = self._get_axis_number(axis)
File "C:\Users\jhoyo\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\generic.py", line 351, in _get_axis_number
axis = cls._AXIS_ALIASES.get(axis, axis)
File "C:\Users\jhoyo\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\generic.py", line 1816, in __hash__
' hashed'.format(self.__class__.__name__))
TypeError: 'Series' objects are mutable, thus they cannot be hashed
You have to define the groupby like this (pass the column labels as a list; passing two Series as separate positional arguments makes pandas treat the second one as the axis argument, which it then tries to hash, hence the TypeError):
df_ranking['cat_race'] = df_ranking.groupby(['ID', 'DATE'])['POINTS'].shift(1)
Hope it works.
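A runnable sketch of the corrected call on made-up data:
import pandas as pd

df_ranking = pd.DataFrame({
    'ID':     [1, 1, 2, 2],
    'DATE':   ['2020-01', '2020-02', '2020-01', '2020-02'],
    'POINTS': [10, 12, 7, 9],
})

# Group by the key columns (labels in a list), then take each group's
# previous POINTS value.
df_ranking['cat_race'] = df_ranking.groupby(['ID', 'DATE'])['POINTS'].shift(1)
print(df_ranking)
Note that if every (ID, DATE) pair is unique, as in this toy frame, each group has a single row and shift(1) is NaN everywhere; grouping by 'ID' alone, with rows sorted by date, is what yields the previous month's value.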