Some operations on DataFrame


I am working on parsing a *.csv file, so I am trying to create a class that simplifies some operations on a DataFrame.
I've created two methods in order to parse a column 'z' that contains values for the 'Price' column.
def subr(self):
    isone = self.df.z == 1.0
    if isone.any():
        atone = self.df.Price[isone].iloc[0]
        self.df.loc[self.df.z.between(0.8, 2.5), 'Benchmark'] = atone
        # df.loc[(df.r >= .8) & (df.r <= 1.4), 'value'] = atone
    return self.df
def obtain_z(self):
    "Return a column with z for E_ref"
    self.z_col = self.subr()
    self.dfnew = self.df.groupby((self.df.z < self.df.z.shift()).cumsum()).apply(self.z_col)
    return self.dfnew
def main():
    x = ParseDataBase('data.csv')
    file_content = x.read_file()
    new_df = x.obtain_z()
I'm getting the following error:
'DataFrame' objects are mutable, thus they cannot be hashed
I understand that a DataFrame being mutable means its elements can be changed, but I don't see where I am hashing anything.
The apply(self.z_col) call seems to be where it goes wrong, but I have no clue how to fix it.

You are passing the DataFrame self.df returned by self.subr() to apply, but apply expects a function (a callable) as its first argument, not a DataFrame.
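As a rough sketch of what passing a function to apply could look like here, the example below moves the per-group logic of subr into a standalone function. The names fill_benchmark and run_id and the sample data are made up for illustration and are not from the question.

```python
import pandas as pd


def fill_benchmark(group: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical per-group version of subr: works on the group it receives."""
    group = group.copy()
    isone = group["z"] == 1.0
    if isone.any():
        # Price at the first row of this group where z == 1.0
        atone = group.loc[isone, "Price"].iloc[0]
        group.loc[group["z"].between(0.8, 2.5), "Benchmark"] = atone
    return group


# Made-up sample data: z restarts at row 3, which begins a second group.
df = pd.DataFrame({
    "z": [0.5, 1.0, 2.0, 0.5, 1.0, 3.0],
    "Price": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# A new group starts whenever z drops below its previous value.
run_id = (df["z"] < df["z"].shift()).cumsum().rename("run_id")

# Pass the function itself (no parentheses), not the result of calling it.
result = df.groupby(run_id, group_keys=False).apply(fill_benchmark)
print(result)
```

The key difference from the question is that apply receives fill_benchmark itself, and pandas then calls it once per group.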

Related

How do I solve the "AttributeError: 'Series' object has no attribute '_check_fillna'" error when using the technical-analysis-library-in-python

I am using the following library to fill a Dataframe with modified data (indicator of financial data) https://technical-analysis-library-in-python.readthedocs.io/en/latest/
However, the library has a number of classes that seem to miss certain attributes; or lack the inheritance from another class.
I have created a pandas.Series filled with ones to demonstrate. I call the method aroon_up() from class AroonIndicator with the aforementioned series as input, but I get the error 'Series' object has no attribute '_check_fillna'. I see that there is no attribute _check_fillna in the AroonIndicator class, but there is in the IndicatorMixin. I have tried to run the Series through the IndicatorMixin class, but it states that this class takes no arguments.
Can someone explain to me what I am doing wrong?
Library
class IndicatorMixin:
    """Util mixin indicator class"""
    _fillna = False
    def _check_fillna(self, series: pd.Series, value: int = 0) -> pd.Series:
        """Check if fillna flag is True.
        Args:
            series(pandas.Series): calculated indicator series.
            value(int): value to fill gaps; if -1 fill values using 'backfill' mode.
        Returns:
            pandas.Series: New feature generated.
        """
        if self._fillna:
            series_output = series.copy(deep=False)
            series_output = series_output.replace([np.inf, -np.inf], np.nan)
            if isinstance(value, int) and value == -1:
                series = series_output.fillna(method="ffill").fillna(method='bfill')
            else:
                series = series_output.fillna(method="ffill").fillna(value)
        return series
    @staticmethod
    def _true_range(
        high: pd.Series, low: pd.Series, prev_close: pd.Series
    ) -> pd.Series:
        tr1 = high - low
        tr2 = (high - prev_close).abs()
        tr3 = (low - prev_close).abs()
        true_range = pd.DataFrame(data={"tr1": tr1, "tr2": tr2, "tr3": tr3}).max(axis=1)
        return true_range
class AroonIndicator(IndicatorMixin):
    """Aroon Indicator
    Identify when trends are likely to change direction.
    Aroon Up = ((N - Days Since N-day High) / N) x 100
    Aroon Down = ((N - Days Since N-day Low) / N) x 100
    Aroon Indicator = Aroon Up - Aroon Down
    https://www.investopedia.com/terms/a/aroon.asp
    Args:
        close(pandas.Series): dataset 'Close' column.
        window(int): n period.
        fillna(bool): if True, fill nan values.
    """
    def __init__(self, close: pd.Series, window: int = 25, fillna: bool = False):
        self._close = close
        self._window = window
        self._fillna = fillna
        # self._check_fillna = checkfillna
        self._run()
        self._check_fillna(IndicatorMixin._check_fillna())
    def _run(self):
        min_periods = 0 if self._fillna else self._window
        rolling_close = self._close.rolling(
            self._window, min_periods=min_periods)
        self._aroon_up = rolling_close.apply(
            lambda x: float(np.argmax(x) + 1) / self._window * 100, raw=True
        )
    def aroon_up(self) -> pd.Series:
        """Aroon Up Channel
        Returns:
            pandas.Series: New feature generated.
        """
        aroon_up_series = self._check_fillna(self._aroon_up, value=0)
        return pd.Series(aroon_up_series, name=f"aroon_up_{self._window}")
My program
# Create an empty DataFrame
table = pd.DataFrame()
# Create a serie of ones
list = np.ones((100))
sr = pd.Series(list)
# fill the empty Dataframe with the indicator of the Series
'try 1:'
table['numbers'] = AroonIndicator.aroon_up(sr)
'try 2:'
table['numbers'] = AroonIndicator.aroon_up(IndicatorMixin(sr))
# print the table
print(table)

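The error comes from calling aroon_up() on the class instead of on an instance: AroonIndicator.aroon_up(sr) passes your Series in as 'self', so self._check_fillna is looked up on the Series, which has no such attribute.
Create an AroonIndicator instance from the Series first, then call aroon_up() on that instance. When the only parameter of a method is 'self', you do not pass an argument when you call it.
The Aroon methods return their values as a pandas Series, which you can print directly or assign to a column of your DataFrame.
Lastly, don't shadow built-in names like 'list' with your own variables.
Try: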
import pandas as pd
import numpy as np
list_values = pd.Series(np.ones(100))
sr = AroonIndicator(list_values)
sr = sr.aroon_up()
print(sr)
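To illustrate why the class-level call fails, here is a minimal sketch using a made-up stand-in class (Doubler and its methods are hypothetical and not part of the ta library). It shows that calling a method on the class hands the Series over as self, whereas calling it on an instance works.

```python
import pandas as pd


class Doubler:
    """Hypothetical stand-in for an indicator class such as AroonIndicator."""

    def __init__(self, close: pd.Series):
        self._close = close

    def _helper(self, series: pd.Series) -> pd.Series:
        return series * 2

    def doubled(self) -> pd.Series:
        # Looks up _helper on whatever object was passed in as self.
        return self._helper(self._close)


sr = pd.Series([1.0, 2.0, 3.0])

# Calling on the class: sr becomes self, and a Series has no _helper.
try:
    Doubler.doubled(sr)
except AttributeError as exc:
    message = str(exc)
    print(message)

# Calling on an instance: self is a Doubler, so the lookup succeeds.
result = Doubler(sr).doubled()
print(result)
```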

Python return statement failing to return a list to be written into a pandas DF

For the life of me I cannot figure out why this function is not returning anything. Any insight will be greatly appreciated!
Basically I create a list of strings that I want to store in a pandas DataFrame. I use the DataFrame to pull the arguments for the function via the .apply() method, but the new column ends up filled with None.
def add_combinations_to_directory(comb_tuples, person_id):
    meta_list = []
    for comb in comb_tuples:
        concat_name = generate_normalized_name(comb)
        metaphone_tuple = doublemetaphone(concat_name)
        meta_list.append(metaphone_tuple[0])
        if metaphone_tuple[1] != '':
            meta_list.append(metaphone_tuple[1])
        if metaphone_tuple[0] in __lookup_dict[0]:
            __lookup_dict[0][metaphone_tuple[0]].append(person_id)
        else:
            __lookup_dict[0][metaphone_tuple[0]] = [person_id]
        if metaphone_tuple[1] in __lookup_dict[1]:
            __lookup_dict[1][metaphone_tuple[1]].append(person_id)
        else:
            __lookup_dict[1][metaphone_tuple[1]] = [person_id]
    print(meta_list)
    return meta_list
def add_person_to_lookup_directory(person_id, name_tuple):
    add_combinations_to_directory(name_tuple, person_id)
def create_meta_names(x, id):
    add_person_to_lookup_directory(id, x)
other['Meta_names'] = other.apply(lambda x: create_meta_names(x['Owners'], x['place_id']), axis=1)

Figured it out! It was a problem of nested functions. The list returned by add_combinations_to_directory went back to add_person_to_lookup_directory, which has no return statement, so it never passed through create_meta_names to the DataFrame. Each wrapper function has to return the value it receives.
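A minimal sketch of the fix, with simplified stand-in functions (inner, middle, outer and the sample data are made up for illustration; the metaphone logic is replaced by upper-casing): every wrapper forwards the result with return, so apply receives the list.

```python
import pandas as pd


def inner(names, person_id):
    # Stand-in for add_combinations_to_directory: builds and returns a list.
    return [name.upper() for name in names]


def middle(person_id, names):
    # Without this return, the list would be dropped and None returned.
    return inner(names, person_id)


def outer(names, person_id):
    # The outermost wrapper must forward the value as well.
    return middle(person_id, names)


other = pd.DataFrame({
    "Owners": [["ann", "bob", "cat"], ["cy"]],
    "place_id": [1, 2],
})
other["Meta_names"] = other.apply(
    lambda row: outer(row["Owners"], row["place_id"]), axis=1
)
print(other)
```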

Return a unique dataframe name from function

I would like to return several dataframes from def function using unique names based on variables. My code as follows:
def plots_without_outliers(parameter):
    """
    The function removes outliers from dataframe variables and plots boxplots and histograms
    """
    Q1 = df[parameter].quantile(0.25)
    Q3 = df[parameter].quantile(0.75)
    IQR = Q3 - Q1
    df_without_outliers = df[(df[parameter] > (Q1-1.5*IQR)) & (df[parameter] < (Q3+1.5*IQR))]
    g = sns.FacetGrid(df_without_outliers, col='tariff', height=5)
    g.map(sns.boxplot, parameter, order=['ultra', 'smart'], color='#fec44f', showmeans=True)
    g = sns.FacetGrid(df_without_outliers, col='tariff', height=5)
    g.map(plt.hist, parameter, bins = 12, color='#41ab5d')
    return df_without_outliers
Then I pass a number of variables :
plots_without_outliers('total_minutes_spent_per_month')
plots_without_outliers('number_sms_spent_per_month')
In addition to graphs I want to have dataframes returned with unique names to use them later on. For example:
df_without_outliers_total_minutes_spent_per_month
and
df_without_outliers_number_sms_spent_per_month
What would be the best way to deal with this issue? Thank you very much for your help.

A common way to deal with this is by using a dictionary, which you can make a global variable outside of the function and then update with the returned dataframe and the corresponding name as dictionary key.
dict_of_dfs = dict()
def plots_without_outliers(parameter):
    # your function statements
    return df_without_outliers
for col in ['total_minutes_spent_per_month', 'number_sms_spent_per_month']:
    dict_of_dfs['df_without_outliers_' + col] = (
        plots_without_outliers(col)
    )
You can then get each dataframe from the dictionary with e.g., dict_of_dfs['df_without_outliers_total_minutes_spent_per_month']

'Series' object has no attribute 'values_counts'

When I try to apply the values_count() method to series within a function, I am told that 'Series' object has no attribute 'values_counts'.
def replace_1_occ_feat(col_list, df):
    for col in col_list:
        feat_1_occ = df[col].values_counts()[df[col].values_counts() == 1].index
        feat_means = df[col].groupby(col)['SalePrice'].mean()
        feat_means_no_1_occ = feat_means.iloc[feat_means.difference(feat_1_occ),:]
        for feat in feat_1_occ:
            # Find the closest mean SalePrice
            replacement = (feat_means_no_1_occ - feat_means.iloc[feat,:]).idxmin()
            df.col.replace(feat, replacement, inplace = True)
However when running df.column.values_count() outside a function it works.
The problem occurs on the first line when the values_counts() methods is used.
I checked the pandas version it's 0.23.0.

The method is value_counts(). Note that 'value' is singular; only 'counts' is plural.
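For reference, a small sketch of the correctly spelled call, used the way the first line of the question's loop intends (the sample Series is made up for illustration):

```python
import pandas as pd

# Made-up sample column: "a" occurs twice, "b" and "c" once each.
s = pd.Series(["a", "b", "a", "c"])

counts = s.value_counts()          # note the spelling: value_counts
once = counts[counts == 1].index   # labels that occur exactly once

print(sorted(once))
```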

ValueError: DataFrame constructor not properly called

I am trying to create a dataframe with Python, which works fine with the following command:
df_test2 = DataFrame(index = idx, data=(["-54350","2016-06-25T10:29:57.340Z","2016-06-25T10:29:57.340Z"]))
but, when I try to get the data from a variable instead of hard-coding it into the data argument; eg. :
r6 = ["-54350", "2016-06-25T10:29:57.340Z", "2016-06-25T10:29:57.340Z"]
df_test2 = DataFrame(index = idx, data=(r6))
I expect this is the same and it should work? But I get:
ValueError: DataFrame constructor not properly called!

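Reason for the error:
The DataFrame constructor raises this error when data is a plain string instead of a list, dict, or array. A real list like the r6 shown above is accepted, so this applies if your r6 is in fact a string representation of the list (for example, read from a file).
Fix/Solutions:
import ast
# convert the string representation to a Python object (here, a list)
values = ast.literal_eval(r6)
# and use it as the input
df_test2 = DataFrame(index = idx, data=values)
which will solve the error.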
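The following sketch reproduces the situation under the assumption that the data arrives as a string; the index labels in idx and the variable name raw are made up for illustration, since the question does not show them.

```python
import ast

import pandas as pd

idx = ["id", "start", "end"]  # hypothetical index labels, not from the question

# Assumed input: the list arrives as text, e.g. read from a file.
raw = '["-54350", "2016-06-25T10:29:57.340Z", "2016-06-25T10:29:57.340Z"]'

# Passing the string straight to the constructor fails.
try:
    pd.DataFrame(index=idx, data=raw)
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# Parsing the string into a real list first makes the constructor accept it.
values = ast.literal_eval(raw)
df_test2 = pd.DataFrame(index=idx, data=values)
print(df_test2)
```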
