Shapes not aligned in Python AutoImpute data imputation package?

I'm trying to use the (relatively new) Python AutoImpute package, but I keep getting a shape mismatch error when trying to use a particular column as a predictor.
This is what my pandas dataframe looks like (screenshot not reproduced here).
I can impute using the 'sex', 'group', and 'binned_age' columns, but not using the 'experiment' column. When I try doing that, I get this error:
ValueError: shapes (9,) and (4,13) not aligned: 9 (dim 0) != 4 (dim 0)
This is my code for actually fitting and running the imputer:
cat_predictors = ['experiment', 'sex', 'group', 'binned_age']
si = SingleImputer(
    strategy={'FSIQ': 'default predictive'},
    predictors={'FSIQ': cat_predictors},
)
imputed_data = si.fit_transform(df2)
While trying to diagnose the problem, I found that if I reduce the number of unique strings in the 'experiment' column to 3 or fewer, the problem goes away for some reason. But I don't want to drop data to do that. Any help? (One possible workaround is sketched after the full trace.)
Full trace below:
ValueError Traceback (most recent call last)
<ipython-input-11-3d4388ba92e4> in <module>
1 si = SingleImputer(
2 strategy={'FSIQ': 'pmm'}, imp_kwgs={'pmm': {'tune': 10000, 'sample':10000}})
----> 3 data_imputed_once = si.fit_transform(df2)
/om/user/agupta81/anaconda/envs/myenv/lib/python3.8/site-packages/autoimpute/imputations/dataframe/single_imputer.py in fit_transform(self, X, y)
288 X (pd.DataFrame): imputed in place or copy of original.
289 """
--> 290 return self.fit(X, y).transform(X)
/om/user/agupta81/anaconda/envs/myenv/lib/python3.8/site-packages/autoimpute/utils/checks.py in wrapper(d, *args, **kwargs)
59 err = f"Neither {d_err} nor {a_err} are of type pd.DataFrame"
60 raise TypeError(err)
---> 61 return func(d, *args, **kwargs)
62 return wrapper
63
/om/user/agupta81/anaconda/envs/myenv/lib/python3.8/site-packages/autoimpute/utils/checks.py in wrapper(d, *args, **kwargs)
124
125 # return func if no missingness violations detected, then return wrap
--> 126 return func(d, *args, **kwargs)
127 return wrapper
128
/om/user/agupta81/anaconda/envs/myenv/lib/python3.8/site-packages/autoimpute/utils/checks.py in wrapper(d, *args, **kwargs)
171 err = f"All values missing in column(s) {nc}. Should be removed."
172 raise ValueError(err)
--> 173 return func(d, *args, **kwargs)
174 return wrapper
175
/om/user/agupta81/anaconda/envs/myenv/lib/python3.8/site-packages/autoimpute/imputations/dataframe/single_imputer.py in transform(self, X, imp_ixs)
274
275 # perform imputation given the specified imputer and value for x_
--> 276 X.loc[imp_ix, column] = imputer.impute(x_)
277 return X
278
/om/user/agupta81/anaconda/envs/myenv/lib/python3.8/site-packages/autoimpute/imputations/series/pmm.py in impute(self, X)
187 # imputed values are actual y vals corresponding to nearest neighbors
188 # therefore, this is a form of "hot-deck" imputation
--> 189 y_pred_bayes = alpha_bayes + beta_bayes.dot(X.T)
190 n_ = self.neighbors
191 if X.columns.size == 1:
ValueError: shapes (9,) and (4,13) not aligned: 9 (dim 0) != 4 (dim 0)
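If it helps, here is one possible workaround (a sketch, not a fix inside autoimpute itself): dummy-encode the offending 'experiment' column yourself so each level becomes a numeric 0/1 column, then pass those columns as predictors instead of the raw categorical. The df2 and column names below come from the question; everything else is an assumption.
import pandas as pd
from autoimpute.imputations import SingleImputer

# One-hot encode 'experiment' up front so the imputer never has to
# dummy-encode a categorical predictor with more than 3 levels itself.
dummies = pd.get_dummies(df2['experiment'], prefix='experiment')
df3 = pd.concat([df2.drop(columns='experiment'), dummies], axis=1)

predictors = list(dummies.columns) + ['sex', 'group', 'binned_age']
si = SingleImputer(
    strategy={'FSIQ': 'default predictive'},
    predictors={'FSIQ': predictors},
)
imputed_data = si.fit_transform(df3)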

Related

lifetimes BetaGeoFitter model is not working, it gives ConvergenceError

I just wanted to apply the BetaGeoFitter model to my dataframe:
df = summary_data_from_transaction_data(df, "COMPANY_ID", "INVOICE_DATE", "TOTAL_PRICE",
                                        include_first_transaction=True,
                                        observation_period_end=today_date, freq="W")
bgf = BetaGeoFitter(penalizer_coef=0.0)
bgf.fit(df['frequency'], df['recency'], df['T'])
It gives the error below (only the last rows, because the full error is too long). I don't know where the problem is or what this error is telling me. By the way, it gives the same error when I use a larger penalizer_coef. Can anyone help me fix it?
C:\ProgramData\Miniconda3\lib\site-packages\autograd\tracer.py:48: RuntimeWarning: invalid value encountered in multiply
return f_raw(*args, **kwargs)
C:\ProgramData\Miniconda3\lib\site-packages\autograd\tracer.py:48: RuntimeWarning: invalid value encountered in subtract
return f_raw(*args, **kwargs)
C:\ProgramData\Miniconda3\lib\site-packages\autograd\numpy\numpy_vjps.py:78: RuntimeWarning: invalid value encountered in double_scalars
defvjp(anp.log, lambda ans, x : lambda g: g / x)
---------------------------------------------------------------------------
ConvergenceError Traceback (most recent call last)
Cell In [19], line 2
1 bgf = BetaGeoFitter(penalizer_coef=0.0)
----> 2 bgf.fit(df['frequency'], df['recency'], df['T'])
3 print(bgf)
File C:\ProgramData\Miniconda3\lib\site-packages\lifetimes\fitters\beta_geo_fitter.py:137, in BetaGeoFitter.fit(self, frequency, recency, T, weights, initial_params, verbose, tol, index, **kwargs)
134 scaled_recency = recency * self._scale
135 scaled_T = T * self._scale
--> 137 log_params_, self._negative_log_likelihood_, self._hessian_ = self._fit(
138 (frequency, scaled_recency, scaled_T, weights, self.penalizer_coef),
139 initial_params,
140 4,
141 verbose,
142 tol,
143 **kwargs
144 )
146 self.params_ = pd.Series(np.exp(log_params_), index=["r", "alpha", "a", "b"])
147 self.params_["alpha"] /= self._scale
File C:\ProgramData\Miniconda3\lib\site-packages\lifetimes\fitters\__init__.py:115, in BaseFitter._fit(self, minimizing_function_args, initial_params, params_size, disp, tol, bounds, **kwargs)
113 return output.x, output.fun, hessian_
114 print(output)
--> 115 raise ConvergenceError(
116 dedent(
117 """
118 The model did not converge. Try adding a larger penalizer to see if that helps convergence.
119 """
120 )
121 )
ConvergenceError:
The model did not converge. Try adding a larger penalizer to see if that helps convergence.
Try grouping identical (frequency, recency, T) rows together and fitting with weights; this worked for me:
df_ = df.groupby(["frequency", "recency", "T"]).size().reset_index()
BetaGeoBetaBinomFitter().fit(df_['frequency'], df_['recency'], df_['T'], weights=df_[0])
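Also worth running before the fit (a sketch, assuming the summary frame produced by summary_data_from_transaction_data): the RuntimeWarnings about invalid values often come from rows with NaNs, negative values, or recency greater than T, and dropping those rows can restore convergence.
from lifetimes import BetaGeoFitter

# Flag rows that commonly break the BG/NBD likelihood: NaNs anywhere,
# negative frequency/recency/T, or recency exceeding the window T.
cols = ['frequency', 'recency', 'T']
bad = (
    df[cols].isna().any(axis=1)
    | (df[cols] < 0).any(axis=1)
    | (df['recency'] > df['T'])
)
print(f"dropping {bad.sum()} problematic rows")
df_clean = df.loc[~bad]

bgf = BetaGeoFitter(penalizer_coef=0.001)  # small penalizer, as the error message suggests
bgf.fit(df_clean['frequency'], df_clean['recency'], df_clean['T'])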

Sktime - how to make in-sample and out-of-sample forecasts with exogenous variables?

I'm trying to make forecasts using sktime for my entire training data and an arbitrary length of out-of-sample data but can't figure it out.
# Generate 2 years of daily data
data = np.random.random(365 * 2,)
df = pd.DataFrame({'y': data})
# Arbitrary X variable (8% per year growth as a daily increase)
df['daily_growth'] = 8 / 365
# Forecast for entire dataset and 1 year into the future
fh = np.arange(-len(df)+1, 365+1)
# Fit model
arima = AutoARIMA()
arima.fit(df.y, X=df.daily_growth)
# Create forecast df for in-sample and out-of-sample data...
# ...this is probably where the problem lies
forecast_df = pd.DataFrame(index=range(len(fh))) # `index=fh` also fails
forecast_df['daily_growth'] = 8 / 365
# ValueError...
preds_with_X = arima.predict(fh=fh, X=forecast_df)
Output
ValueError Traceback (most recent call last)
Input In [3], in <cell line: 15>()
13 preds_no_X = arima_no_X.predict(fh=fh)
14 len(fh) == len(forecast_df)
---> 15 preds_with_X = arima_with_X.predict(fh=fh, X=forecast_df)
17 plt.plot(df.y, label='Actual')
18 plt.plot(preds_no_X, label='preds_no_X')
File ~/opt/anaconda3/envs/humbl_keywords/lib/python3.9/site-packages/sktime/forecasting/base/_base.py:318, in BaseForecaster.predict(self, fh, X)
316 # we call the ordinary _predict if no looping/vectorization needed
317 if not self._is_vectorized:
--> 318 y_pred = self._predict(fh=fh, X=X_inner)
319 else:
320 # otherwise we call the vectorized version of predict
321 y_pred = self._vectorize("predict", X=X_inner, fh=fh)
File ~/opt/anaconda3/envs/humbl_keywords/lib/python3.9/site-packages/sktime/forecasting/base/adapters/_pmdarima.py:84, in _PmdArimaAdapter._predict(self, fh, X)
81 # both in-sample and out-of-sample values
82 else:
83 y_ins = self._predict_in_sample(fh_ins, X=X)
---> 84 y_oos = self._predict_fixed_cutoff(fh_oos, X=X)
85 return pd.concat([y_ins, y_oos])
File ~/opt/anaconda3/envs/humbl_keywords/lib/python3.9/site-packages/sktime/forecasting/base/adapters/_pmdarima.py:177, in _PmdArimaAdapter._predict_fixed_cutoff(self, fh, X, return_pred_int, alpha)
162 """Make predictions out of sample.
163
164 Parameters
(...)
174 Returns series of predicted values.
175 """
176 n_periods = int(fh.to_relative(self.cutoff)[-1])
--> 177 result = self._forecaster.predict(
178 n_periods=n_periods,
179 X=X,
180 return_conf_int=False,
181 alpha=DEFAULT_ALPHA,
182 )
184 fh_abs = fh.to_absolute(self.cutoff)
185 fh_idx = fh.to_indexer(self.cutoff)
File ~/opt/anaconda3/envs/humbl_keywords/lib/python3.9/site-packages/pmdarima/utils/metaestimators.py:53, in _IffHasDelegate.__get__.<locals>.<lambda>(*args, **kwargs)
50 attrgetter(self.delegate_names[-1])(obj)
52 # lambda, but not partial, allows help() to work with update_wrapper
---> 53 out = (lambda *args, **kwargs: self.fn(obj, *args, **kwargs))
54 # update the docstring of the returned function
55 update_wrapper(out, self.fn)
File ~/opt/anaconda3/envs/humbl_keywords/lib/python3.9/site-packages/pmdarima/arima/auto.py:257, in AutoARIMA.predict(self, n_periods, X, return_conf_int, alpha, **kwargs)
247 @if_has_delegate("model_")
248 def predict(self,
249 n_periods=10,
(...)
254
255 # Temporary shim until we remove `exogenous` support completely
256 X, _ = pm_compat.get_X(X, **kwargs)
--> 257 return self.model_.predict(
258 n_periods=n_periods,
259 X=X,
260 return_conf_int=return_conf_int,
261 alpha=alpha,
262 )
File ~/opt/anaconda3/envs/humbl_keywords/lib/python3.9/site-packages/pmdarima/arima/arima.py:785, in ARIMA.predict(self, n_periods, X, return_conf_int, alpha, **kwargs)
783 X = self._check_exog(X) # type: np.ndarray
784 if X is not None and X.shape[0] != n_periods:
--> 785 raise ValueError('X array dims (n_rows) != n_periods')
787 # f = self.arima_res_.forecast(steps=n_periods, exog=X)
788 arima = self.arima_res_
ValueError: X array dims (n_rows) != n_periods
Alas, pmdarima doesn't print what input it receives for n_rows and n_periods. But I think I am passing the correct shapes.
len(fh) == len(forecast_df) # True
fh.shape, forecast_df.shape # ((1095,), (1095, 1))
P.S. I'm not sure my daily_growth var would actually have any impact on the results. Advice on this point and how to get the model to have 8% growth would be helpful too!
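One workaround to try (a sketch based on reading the trace, not on documented sktime behaviour): _predict_fixed_cutoff hands your full 1095-row X to pmdarima, which expects only the 365 out-of-sample rows, so predicting the in-sample and out-of-sample parts separately, each with an X of matching length, sidesteps the n_rows != n_periods check.
# In-sample part: X is just the training exogenous data.
fh_ins = np.arange(-len(df) + 1, 1)
preds_ins = arima.predict(fh=fh_ins, X=df[['daily_growth']])

# Out-of-sample part: X has exactly as many rows as forecast steps.
fh_oos = np.arange(1, 365 + 1)
future_X = pd.DataFrame({'daily_growth': [8 / 365] * 365},
                        index=range(len(df), len(df) + 365))
preds_oos = arima.predict(fh=fh_oos, X=future_X)

preds = pd.concat([preds_ins, preds_oos])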

What is 'G' in CVXPY and how to fix it

I'm trying to use a binary integer linear program to assign members of my staff to different shifts. I have a 16x9 matrix of preferences for my staff in a csv (16 staff members, 9 slots to fill), and I used the following code to try to assign them:
weights = pd.read_csv("holiday_green day.csv", index_col=0)
weights = weights.to_numpy().astype(float)
assignments = cvx.Variable((9, 16), boolean=True)
row_sum_vector = np.ones((16, 1)).astype(float)
result_constraint = np.ones((9, 1)).astype(float) * 2
objective = cvx.Minimize(cvx.trace(weights @ assignments))
prob = cvx.Problem(objective, [assignments @ row_sum_vector == result_constraint])
prob.solve()
When I try running this, I get the error TypeError: G must be a 'd' matrix and I don't know where to start debugging. I looked at this post, but it wasn't helpful. Can someone help me figure out what G is and what it means by a 'd' matrix? It's my first time actually using CVXPY and I'm very lost.
Full Stack Trace:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-23-d07ad22cbc25> in <module>()
6 objective = cvx.Minimize(cvx.atoms.affine.trace.trace(weights @ assignments))
7 prob = cvx.Problem(objective, [assignments @ row_sum_vector == result_constraint])
----> 8 prob.solve()
3 frames
/usr/local/lib/python3.7/dist-packages/cvxpy/problems/problem.py in solve(self, *args, **kwargs)
288 else:
289 solve_func = Problem._solve
--> 290 return solve_func(self, *args, **kwargs)
291
292 @classmethod
/usr/local/lib/python3.7/dist-packages/cvxpy/problems/problem.py in _solve(self, solver, warm_start, verbose, parallel, gp, qcp, **kwargs)
570 self._intermediate_problem)
571 solution = self._solving_chain.solve_via_data(
--> 572 self, data, warm_start, verbose, kwargs)
573 full_chain = self._solving_chain.prepend(self._intermediate_chain)
574 inverse_data = self._intermediate_inverse_data + solving_inverse_data
/usr/local/lib/python3.7/dist-packages/cvxpy/reductions/solvers/solving_chain.py in solve_via_data(self, problem, data, warm_start, verbose, solver_opts)
194 """
195 return self.solver.solve_via_data(data, warm_start, verbose,
--> 196 solver_opts, problem._solver_cache)
/usr/local/lib/python3.7/dist-packages/cvxpy/reductions/solvers/conic_solvers/glpk_mi_conif.py in solve_via_data(self, data, warm_start, verbose, solver_opts, solver_cache)
73 data[s.B],
74 set(int(i) for i in data[s.INT_IDX]),
---> 75 set(int(i) for i in data[s.BOOL_IDX]))
76 results_dict = {}
77 results_dict["status"] = results_tup[0]
TypeError: G must be a 'd' matrix
Edit: I tried casting all the numpy arrays as float, as suggested in a different post. It didn't work.
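For reference, a cleaned-up sketch of the model (the random weights stand in for the CSV, and solver availability depends on your install): using one consistent variable name and forcing an explicit mixed-integer solver at least narrows down whether the 'd' matrix complaint comes from the GLPK_MI backend.
import cvxpy as cvx
import numpy as np

weights = np.random.rand(16, 9)              # stand-in for the CSV preferences
assignments = cvx.Variable((9, 16), boolean=True)
row_sum_vector = np.ones((16, 1))
result_constraint = 2 * np.ones((9, 1))

# trace(weights @ assignments) sums the preference weight of every
# (staff, slot) pair that the solver selects.
objective = cvx.Minimize(cvx.trace(weights @ assignments))
constraints = [assignments @ row_sum_vector == result_constraint]
prob = cvx.Problem(objective, constraints)
prob.solve(solver=cvx.ECOS_BB)               # or cvx.GLPK_MI / cvx.CBC if installed
print(prob.status, prob.value)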

Logistic Regression Model (binary) crosstab error = shape of passed values issue

I am currently trying to run logistic regression on a data set. I dummy-encoded my categorical variables, normalized my continuous variables, and filled null values with -1 (which works for my dataset). I don't get any errors until I try to run my crosstab, where it complains about the shape of the values passed. I get the same error for logistic regression both with and without CV. My code is below; I did not include the encoding, because that does not seem to be the issue, or the code for the no-CV version, because it is basically identical except that it excludes the CV.
# read in the df w/ encoded variables
allyrs = pd.read_csv("C:/Users/cyrra/OneDrive/Documents/Pythonread/HDS805/CS1W1/modelready_working.csv")

# find where to trim the data down, selecting only the encoded variables
allyrs.columns.get_loc("BMI_C__-1.0")   # 23
allyrs.columns.get_loc("N_BMIR")        # 152

# find the location of the y column
allyrs.columns.get_loc("CM")            # 23

# create new X and y for binary LR
y_bi = allyrs[["CM"]]
X_bi = allyrs.iloc[0:1305720, 23:152]
I then went ahead and checked the lengths of both variables and checked for all the columns in the X set; everything was there. The values are as follows: y_bi = 1305720 rows × 1 column, X_bi = 1305720 rows × 129 columns.
# Create test/train split for the binary column
from sklearn.model_selection import train_test_split
Xbi_train, Xbi_test, ybi_train, ybi_test = train_test_split(X_bi, y_bi,
                                                            train_size=0.8, test_size=0.2)
Again I check the sizes of Xbi_train and ybi_train: Xbi_train = 1044576 rows × 129 columns, ybi_train = 1044576 rows × 1 column.
# LR w/ CV for the binary col
from sklearn.linear_model import LogisticRegressionCV
logitbi_cv = LogisticRegressionCV(cv=2, random_state=0).fit(Xbi_train, ybi_train)

# Set predicted (checking to see if it's an array)
logitbi_cv.predict(Xbi_train)
array([0, 0, 0, ..., 0, 0, 0], dtype=int64)

# Set predicted to its own variable
pred_logitbi_cv = logitbi_cv.predict(Xbi_train)

# Cross tab for LR w/ CV
from sklearn.metrics import confusion_matrix
ct_bi_cv = pd.crosstab(ybi_train, pred_logitbi_cv)
The error:
[OUT]:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\internals\managers.py in create_block_manager_from_arrays(arrays, names, axes)
1701 blocks = _form_blocks(arrays, names, axes)
-> 1702 mgr = BlockManager(blocks, axes)
1703 mgr._consolidate_inplace()
~\anaconda3\lib\site-packages\pandas\core\internals\managers.py in __init__(self, blocks, axes, do_integrity_check)
142 if do_integrity_check:
--> 143 self._verify_integrity()
144
~\anaconda3\lib\site-packages\pandas\core\internals\managers.py in _verify_integrity(self)
322 if block.shape[1:] != mgr_shape[1:]:
--> 323 raise construction_error(tot_items, block.shape[1:], self.axes)
324 if len(self.items) != tot_items:
ValueError: Shape of passed values is (1, 2), indices imply (1044576, 2)
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-121-c669b17c171f> in <module>
1 # LR W/ CV
2 # Cross tab LR w/0ut
----> 3 ct_bi_cv=pd.crosstab(ybi_train, pred_logitbi_cv)
~\anaconda3\lib\site-packages\pandas\core\reshape\pivot.py in crosstab(index, columns, values, rownames, colnames, aggfunc, margins, margins_name, dropna, normalize)
596 **dict(zip(unique_colnames, columns)),
597 }
--> 598 df = DataFrame(data, index=common_idx)
599 original_df_cols = df.columns
600
~\anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
527
528 elif isinstance(data, dict):
--> 529 mgr = init_dict(data, index, columns, dtype=dtype)
530 elif isinstance(data, ma.MaskedArray):
531 import numpy.ma.mrecords as mrecords
~\anaconda3\lib\site-packages\pandas\core\internals\construction.py in init_dict(data, index, columns, dtype)
285 arr if not is_datetime64tz_dtype(arr) else arr.copy() for arr in arrays
286 ]
--> 287 return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
288
289
~\anaconda3\lib\site-packages\pandas\core\internals\construction.py in arrays_to_mgr(arrays, arr_names, index, columns, dtype, verify_integrity)
93 axes = [columns, index]
94
---> 95 return create_block_manager_from_arrays(arrays, arr_names, axes)
96
97
~\anaconda3\lib\site-packages\pandas\core\internals\managers.py in create_block_manager_from_arrays(arrays, names, axes)
1704 return mgr
1705 except ValueError as e:
-> 1706 raise construction_error(len(arrays), arrays[0].shape, axes, e)
1707
1708
ValueError: Shape of passed values is (1, 2), indices imply (1044576, 2)
I realize this is saying that the number of rows being passed into the crosstab doesn't match, but can someone tell me why this is happening or where I am going wrong? I am copying the example code, with my own data, exactly as it was provided in the book I am working from.
Thank you so much!
Your target variable should be of shape (n,), not (n, 1), which is what you get when you call y_bi = allyrs[["CM"]]. See the relevant help page. There should be a warning about this, because the fit will not work, but I guess it was missed somehow.
If you instead call y_bi = allyrs["CM"], it works. For example, if I set up some dummy data:
import numpy as np
import pandas as pd
np.random.seed(111)
allyrs = pd.DataFrame(np.random.binomial(1,0.5,(100,4)),columns=['x1','x2','x3','CM'])
X_bi = allyrs.iloc[:,:4]
y_bi = allyrs["CM"]
Then run the train test split followed by the fit:
from sklearn.model_selection import train_test_split
Xbi_train, Xbi_test, ybi_train, ybi_test = train_test_split(X_bi, y_bi,
                                                            train_size=0.8, test_size=0.2)
from sklearn.linear_model import LogisticRegressionCV
logitbi_cv = LogisticRegressionCV(cv=2, random_state=0).fit(Xbi_train, ybi_train)
pred_logitbi_cv =logitbi_cv.predict(Xbi_train)
pd.crosstab(ybi_train, pred_logitbi_cv)
col_0   0   1
CM
0      39   0
1       0  41
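An equivalent fix, if you would rather keep the rest of your code unchanged (a sketch of the same idea): flatten the (n, 1) frame into a 1-D array before fitting and cross-tabulating.
# .values.ravel() turns the (n, 1) DataFrame into the (n,) array sklearn expects.
ybi_train_1d = ybi_train.values.ravel()
logitbi_cv = LogisticRegressionCV(cv=2, random_state=0).fit(Xbi_train, ybi_train_1d)
pd.crosstab(ybi_train_1d, logitbi_cv.predict(Xbi_train))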

How do I style a subset of a pandas dataframe?

I previously asked How do I style only the last row of a pandas dataframe? and got a perfect answer to the toy problem that I gave.
Turns out I should have made the toy problem a bit closer to my real problem. Consider a dataframe with more than 1 column of text data (which I can apply styling to):
import pandas as pd
import numpy as np
import seaborn as sns
cm = sns.diverging_palette(-5, 5, as_cmap=True)
df = pd.DataFrame(np.random.randn(3, 4))
df['text_column'] = 'a'
df['second_text_column'] = 'b'
df.style.background_gradient(cmap=cm)
However, like the previous question, I wish to only apply this styling to the last row. The answer to the previous question was:
df.style.background_gradient(cmap=cm, subset=df.index[-1])
which in this case gives the error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/usr/local/miniconda/lib/python3.7/site-packages/IPython/core/formatters.py in __call__(self, obj)
343 method = get_real_method(obj, self.print_method)
344 if method is not None:
--> 345 return method()
346 return None
347 else:
/usr/local/miniconda/lib/python3.7/site-packages/pandas/io/formats/style.py in _repr_html_(self)
161 Hooks into Jupyter notebook rich display system.
162 """
--> 163 return self.render()
164
165 @Appender(_shared_docs['to_excel'] % dict(
/usr/local/miniconda/lib/python3.7/site-packages/pandas/io/formats/style.py in render(self, **kwargs)
457 * table_attributes
458 """
--> 459 self._compute()
460 # TODO: namespace all the pandas keys
461 d = self._translate()
/usr/local/miniconda/lib/python3.7/site-packages/pandas/io/formats/style.py in _compute(self)
527 r = self
528 for func, args, kwargs in self._todo:
--> 529 r = func(self)(*args, **kwargs)
530 return r
531
/usr/local/miniconda/lib/python3.7/site-packages/pandas/io/formats/style.py in _apply(self, func, axis, subset, **kwargs)
536 if axis is not None:
537 result = data.apply(func, axis=axis,
--> 538 result_type='expand', **kwargs)
539 result.columns = data.columns
540 else:
/usr/local/miniconda/lib/python3.7/site-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, result_type, args, **kwds)
6485 args=args,
6486 kwds=kwds)
-> 6487 return op.get_result()
6488
6489 def applymap(self, func):
/usr/local/miniconda/lib/python3.7/site-packages/pandas/core/apply.py in get_result(self)
149 return self.apply_raw()
150
--> 151 return self.apply_standard()
152
153 def apply_empty_result(self):
/usr/local/miniconda/lib/python3.7/site-packages/pandas/core/apply.py in apply_standard(self)
255
256 # compute the result using the series generator
--> 257 self.apply_series_generator()
258
259 # wrap results
/usr/local/miniconda/lib/python3.7/site-packages/pandas/core/apply.py in apply_series_generator(self)
284 try:
285 for i, v in enumerate(series_gen):
--> 286 results[i] = self.f(v)
287 keys.append(v.name)
288 except Exception as e:
/usr/local/miniconda/lib/python3.7/site-packages/pandas/core/apply.py in f(x)
76
77 def f(x):
---> 78 return func(x, *args, **kwds)
79 else:
80 f = func
/usr/local/miniconda/lib/python3.7/site-packages/pandas/io/formats/style.py in _background_gradient(s, cmap, low, high, text_color_threshold)
941 smin = s.values.min()
942 smax = s.values.max()
--> 943 rng = smax - smin
944 # extend lower / upper bounds, compresses color range
945 norm = colors.Normalize(smin - (rng * low), smax + (rng * high))
TypeError: ("unsupported operand type(s) for -: 'str' and 'str'", 'occurred at index text_column')
<pandas.io.formats.style.Styler at 0x7f948dde7278>
which seems to come from the fact that it's trying to do an operation to strings in the text_column. Fair enough. How do I tell it to only apply to the last row for all non-text columns? I'm ok with giving it explicit column names to use or avoid, but I don't know how to pass that into this inscrutable subset method.
I am running:
python version 3.7.3
pandas version 0.24.2
Using a tuple for subset worked for me, but not sure if it is the most elegant solution:
df.style.background_gradient(cmap=cm,
                             subset=(df.index[-1], df.select_dtypes(float).columns))
Output: a styled table with the gradient applied only to the last row of the float columns (image not reproduced here).
You want to apply a style to a pandas dataframe and set different colors on different columns or rows.
Here you can find code ready to run on your own df. :)
Apply along rows using axis=0 and a subset of df.index, or, as in this example, along columns with a subset of df.columns:
cmaps = [
'Greys', 'Purples', 'Blues', 'Greens', 'Oranges', 'Reds',
'YlOrBr', 'YlOrRd', 'OrRd', 'PuRd', 'RdPu', 'BuPu',
'GnBu', 'PuBu', 'YlGnBu', 'PuBuGn', 'BuGn', 'YlGn'
]
df.style.\
background_gradient(
cmap=cmaps[1], axis=0,
subset= (
df.index[:],
df.columns[df.columns.get_loc('nb tickets'):df.columns.get_loc('nb ref_prod')+1]
)
).\
background_gradient(
cmap=cmaps[3],
subset= (
df.index[:],
df.columns[df.columns.get_loc('am'):df.columns.get_loc('pm')+1]
)
).\
background_gradient(
cmap=cmaps[4],
subset= (
df.index[:],
df.columns[df.columns.get_loc('Week_1'):df.columns.get_loc('Week_5')+1]
)
).\
background_gradient(
cmap=cmaps[5],
subset= (
df.index[:],
df.columns[df.columns.get_loc('sum qty'):df.columns.get_loc('sum euro')+1]
)
)
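Back to the original last-row question: an arguably more readable spelling of the accepted tuple idea uses pd.IndexSlice with select_dtypes to pick out only the numeric columns (a sketch, assuming a pandas version whose Styler accepts IndexSlice subsets).
idx = pd.IndexSlice
numeric_cols = df.select_dtypes('number').columns
# Style only the last row, restricted to numeric columns, so the string
# columns never reach the gradient computation.
df.style.background_gradient(cmap=cm, subset=idx[df.index[-1], numeric_cols])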
