Iterating forward and backward in Python

I have a coding interface with a counter component. It simply increments by 1 on every update; consider it an infinite generator of {1, 2, 3, ...} over time, which I have to use.
I need to use this value to iterate from -1.5 to 1.5 and back: the iteration should start at -1.5, climb to 1.5, then go from 1.5 back down to -1.5, repeatedly.
How can I use this infinite iterator to generate an iteration over that range?

You can use cycle from itertools to repeat a sequence.
from itertools import cycle

# Build -1.5, -1.4, ..., 1.5 with a 0.1 increment
v = [(x - 15) / 10 for x in range(31)]
v = v + list(reversed(v))
cv = cycle(v)

for c in my_counter:  # my_counter is your infinite counter
    x = next(cv)
This will repeat the list v:
-1.5, -1.4, -1.3, -1.2, -1.1, -1.0, -0.9, -0.8, -0.7, -0.6, -0.5, -0.4,
-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
1.1, 1.2, 1.3, 1.4, 1.5, 1.5, 1.4, 1.3, 1.2, 1.1, 1.0, 0.9, 0.8, 0.7,
0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0, -0.1, -0.2, -0.3, -0.4, -0.5, -0.6,
-0.7, -0.8, -0.9, -1.0, -1.1, -1.2, -1.3, -1.4, -1.5, -1.5, -1.4, -1.3,
-1.2, -1.1, -1.0, -0.9, -0.8, -0.7, -0.6, -0.5, -0.4, -0.3, -0.2, -0.1,
0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3,
1.4, 1.5, 1.5, 1.4, 1.3, 1.2, 1.1, 1.0, 0.9 ...
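As the listing shows, `v + list(reversed(v))` repeats the endpoints (..., 1.5, 1.5, ... and ..., -1.5, -1.5, ...) at each turnaround. If you would rather not dwell two ticks at the extremes, a small variant (a sketch; the name `tri` is introduced here) drops the duplicated endpoints:

```python
from itertools import cycle

# -1.5, -1.4, ..., 1.5 in 0.1 steps
v = [(x - 15) / 10 for x in range(31)]

# Append the descending leg without repeating either endpoint
tri = v + v[-2:0:-1]          # 31 + 29 = 60 values per full cycle

cv = cycle(tri)
one_period = [next(cv) for _ in range(60)]
```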

Something like:
import itertools

infGenGiven = itertools.count()  # similar to your generator

def func(x):
    # alternate between the two extremes
    return 1.5 if x % 2 == 0 else -1.5

infGenCycle = map(func, infGenGiven)  # itertools.imap in Python 2

count = 0
while count < 10:
    print(next(infGenCycle))
    count += 1
Output:
1.5
-1.5
1.5
-1.5
1.5
-1.5
1.5
-1.5
1.5
-1.5
Note that this starts at 1.5 because the first value produced by infGenGiven is 0. Your generator starts at 1 instead, so infGenCycle will begin at -1.5 and give you the alternation you want.

Thank you all.
I guess the best approach is to use a trigonometric function (sine or cosine), which oscillates between plus and minus one.
More details at: https://en.wikipedia.org/wiki/Trigonometric_functions
Cheers
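Following up on that idea, here is one possible sketch (the function name and the period of 60 counter steps are assumptions, not from the question) that maps the counter value n directly to a smooth oscillation between -1.5 and 1.5:

```python
import math

def wave(n, period=60, amplitude=1.5):
    # Phase-shift by -pi/2 so the wave starts at -amplitude when n == 0
    return amplitude * math.sin(2 * math.pi * n / period - math.pi / 2)

# Sample one full period at quarter intervals
samples = [round(wave(n), 3) for n in (0, 15, 30, 45, 60)]
```

Each counter tick advances the phase by 2*pi/period, so the value rises from -1.5 to 1.5 over the first half period and falls back over the second, with no list to precompute.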


Prophet search trial freezes with optuna

I am currently using Prophet for a forecasting prototype, and I do the hyperparameter search with optuna's TPESampler. I have around 300 dataframes to run the hyperparameter optimization on. After completing a couple of dataframes, the search suddenly freezes without any warning message and without exiting the process; it just gets blocked at a certain trial number (the current search stopped at trial 786 out of 1000, after completing the optimization for 130 dataframes). I've tried increasing and decreasing the number of trials per dataframe and, as expected, the fewer trials I use, the more dataframes get computed, while with more trials (like 2000) fewer dataframes get computed. Has anyone encountered this before? It is really annoying because I have to keep restarting the search, and it never reaches the end. I am not sure whether this is a problem with optuna or with Prophet, so I will post it on their GitHub issues too.
It gets stuck inside the search function I've created, in Prophet's cross_validation() method. (I can tell because that is where it creates parallel processes, one per fold being tested for the current set of hyperparameters; in my code the folds are defined by cutoffs.)
Optuna version: 3.0.3
Python version: 3.9.13
OS: Linux-5.15.0-56-generic-x86_64-with-glibc2.31
numpy: 1.23.5
pandas: 1.5.2
prophet: 1.1.1
scikit_learn: 1.2.0
Here is the code I am using:
def generate_cutoffs(df, weeks_to_forecast):
    # Nine cutoff dates, spaced weeks_to_forecast rows apart,
    # counted back from the end of the dataframe (oldest first)
    cutoffs = pd.to_datetime(
        [df.at[len(df) - (weeks_to_forecast * i + 1), "ds"] for i in range(9, 0, -1)]
    )
    return cutoffs
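As a quick sanity check of the index arithmetic behind those cutoffs (pure Python; the row count 285 is an assumed example from the 280-290 range mentioned below):

```python
n_rows = 285
weeks_to_forecast = 12

# Same arithmetic as the nine df.at lookups, oldest cutoff first
idx = [n_rows - (weeks_to_forecast * i + 1) for i in range(9, 0, -1)]
```

Each index is weeks_to_forecast rows after the previous one, and the newest sits weeks_to_forecast + 1 rows before the end of the frame.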
def get_best_hp_prophet_cv(df_for_cv, weeks_to_forecast, freq, holidays_df):
    logging.getLogger('prophet').setLevel(logging.ERROR)
    logging.getLogger('fbprophet').setLevel(logging.ERROR)
    cutoffs = generate_cutoffs(df_for_cv, weeks_to_forecast)

    # Shared grid of candidate prior scales
    PRIOR_SCALES = [0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.09, 0.1, 0.2,
                    0.5, 0.7, 0.8, 0.9, 1.0, 2.0, 3.0, 5.0, 9.0, 10.0]

    # Extra regressors, each with its own tunable prior scale
    REGRESSOR_NAMES = [
        'month', 'week_num',
        'avg_month_qty_over_df', 'avg_week_nr_qty_over_df',
        'avg_solar_rad_of_month', 'avg_solar_rad_of_week_nr',
        'avg_temp_of_month', 'avg_temp_of_week_nr',
        'avg_pres_of_month', 'avg_pres_of_week_nr',
        'avg_solar_rad_last_week_week_nr', 'avg_temp_last_week_week_nr',
        'avg_pres_last_week_week_nr',
        'avg_solar_rad_last_2weeks_week_nr', 'avg_temp_last_2weeks_week_nr',
        'avg_pres_last_2weeks_week_nr',
        'avg_solar_rad_next_week_week_nr', 'avg_temp_next_week_week_nr',
        'avg_pres_next_week_week_nr',
        'avg_solar_rad_of_last_month', 'avg_temp_of_last_month',
        'avg_pres_of_last_month',
        'avg_qty_last_2weeks_week_nr', 'avg_qty_next_week_week_nr',
    ]

    def objective(trial):
        param_grid = {
            "changepoint_prior_scale": trial.suggest_categorical("changepoint_prior_scale", PRIOR_SCALES),
            "seasonality_prior_scale": trial.suggest_categorical("seasonality_prior_scale", PRIOR_SCALES),
            "seasonality_mode": trial.suggest_categorical("seasonality_mode", ["multiplicative", "additive"]),
            "growth": trial.suggest_categorical("growth", ["linear"]),
            "yearly_seasonality": trial.suggest_categorical("yearly_seasonality", [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17]),
            "daily_seasonality": trial.suggest_categorical("daily_seasonality", [False]),
            "weekly_seasonality": trial.suggest_categorical("weekly_seasonality", [False]),
            "uncertainty_samples": trial.suggest_categorical("uncertainty_samples", [0]),
        }
        regressor_priors = {
            name: trial.suggest_categorical('prior_scale_' + name, PRIOR_SCALES)
            for name in REGRESSOR_NAMES
        }

        m = Prophet(**param_grid, holidays=holidays_df)
        for name, prior_scale in regressor_priors.items():
            m.add_regressor(name, prior_scale=prior_scale)
        # (Earlier experiments also added holiday/BH indicator regressors such as
        # 'BH_Minus_1_Week', 'Xmas_NY', 'Holiday_Easter', 'Half_Term';
        # those add_regressor lines are commented out.)

        print("Model params currently are:", param_grid)
        print("Non-model params currently are:", regressor_priors)

        m.fit(df_for_cv)
        df_cv = cross_validation(
            m, cutoffs=cutoffs,
            horizon="{} days".format(weeks_to_forecast * 7),
            parallel="processes",
        )
        df_p = performance_metrics(df_cv, rolling_window=1)
        return df_p["mse"].values[0]

    # Find the best parameters
    optuna_prophet = optuna.create_study(
        direction="minimize", sampler=TPESampler(seed=321)
    )
    optuna_prophet.optimize(objective, n_trials=1000)
    prophet_trial_params = optuna_prophet.best_trial.params

    # Split the winning trial into Prophet constructor kwargs
    # and per-regressor prior scales
    param_grid = {}
    params_outside_the_param_grid = {}
    for param_name, value in prophet_trial_params.items():
        if param_name.startswith('prior_scale_'):
            params_outside_the_param_grid[param_name] = value
        else:
            param_grid[param_name] = value
    return param_grid, params_outside_the_param_grid
The 300 dataframes I am optimizing the Prophet model on have between 280 and 290 rows each. I do a 9-fold cross-validation using cutoffs (a list of datetime objects created by generate_cutoffs(), where df is the dataframe I want to run the hyperparameter optimization on, and weeks_to_forecast is always 12).
Has anybody experienced something similar? Or does anybody know any workarounds?

How can a tensor in TensorFlow be sliced using elements of another array as an index?

I'm looking for a function similar to tf.unsorted_segment_sum, but I don't want to sum the segments; I want to get every segment as its own tensor.
For example, I have this code:
(In reality, I have a tensor of shape (10000, 63), and the number of segments would be 2500.)
to_be_sliced = tf.constant([[0.1, 0.2, 0.3, 0.4, 0.5],
                            [0.3, 0.2, 0.2, 0.6, 0.3],
                            [0.9, 0.8, 0.7, 0.6, 0.5],
                            [2.0, 2.0, 2.0, 2.0, 2.0]])
indices = tf.constant([0, 2, 0, 1])
num_segments = 3
tf.unsorted_segment_sum(to_be_sliced, indices, num_segments)
Here the output would be:
array([sum(row1 + row3), row4, row2])
What I am looking for is three tensors with different shapes (maybe a list of tensors): the first containing the first and third rows of the original (shape (2, 5)), the second containing the 4th row (shape (1, 5)), and the third containing the second row, like this:
[array([[0.1, 0.2, 0.3, 0.4, 0.5],
        [0.9, 0.8, 0.7, 0.6, 0.5]]),
 array([[2.0, 2.0, 2.0, 2.0, 2.0]]),
 array([[0.3, 0.2, 0.2, 0.6, 0.3]])]
Thanks in advance!
You can do that like this:
import tensorflow as tf

to_be_sliced = tf.constant([[0.1, 0.2, 0.3, 0.4, 0.5],
                            [0.3, 0.2, 0.2, 0.6, 0.3],
                            [0.9, 0.8, 0.7, 0.6, 0.5],
                            [2.0, 2.0, 2.0, 2.0, 2.0]])
indices = tf.constant([0, 2, 0, 1])
num_segments = 3
# One boolean mask per segment selects the rows whose index equals i
result = [tf.boolean_mask(to_be_sliced, tf.equal(indices, i)) for i in range(num_segments)]
with tf.Session() as sess:
    print(*sess.run(result), sep='\n')
Output:
[[0.1 0.2 0.3 0.4 0.5]
[0.9 0.8 0.7 0.6 0.5]]
[[2. 2. 2. 2. 2.]]
[[0.3 0.2 0.2 0.6 0.3]]
For your case, you can use NumPy-style slicing in TensorFlow. So this will work:
sliced_1 = to_be_sliced[:3, :]
# [[0.1 0.2 0.3 0.4 0.5]
#  [0.3 0.2 0.2 0.6 0.3]
#  [0.9 0.8 0.7 0.6 0.5]]
sliced_2 = to_be_sliced[3, :]
# [2. 2. 2. 2. 2.]
Or, as a more general option, you can do it the following way:
to_be_sliced = tf.constant([[0.1, 0.2, 0.3, 0.4, 0.5],
                            [0.3, 0.2, 0.2, 0.6, 0.3],
                            [0.9, 0.8, 0.7, 0.6, 0.5],
                            [2.0, 2.0, 2.0, 2.0, 2.0]])
first_tensor = tf.gather_nd(to_be_sliced, [[0], [2]])
second_tensor = tf.gather_nd(to_be_sliced, [[3]])
third_tensor = tf.gather_nd(to_be_sliced, [[1]])
concat = tf.concat([first_tensor, second_tensor, third_tensor], axis=0)
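For completeness, TensorFlow also ships tf.dynamic_partition, which returns exactly such a list of num_partitions tensors grouped by an index vector. The grouping it performs can be sketched in plain Python (no TensorFlow needed; values taken from the question):

```python
rows = [[0.1, 0.2, 0.3, 0.4, 0.5],
        [0.3, 0.2, 0.2, 0.6, 0.3],
        [0.9, 0.8, 0.7, 0.6, 0.5],
        [2.0, 2.0, 2.0, 2.0, 2.0]]
indices = [0, 2, 0, 1]
num_segments = 3

# Row i goes into bucket indices[i]; buckets may end up with different lengths
segments = [[row for row, seg in zip(rows, indices) if seg == i]
            for i in range(num_segments)]
```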

Normalize with respect to row and column

I have an array of probabilities. I would like the columns to sum to 1 (representing probability) and the rows to sum to X (where X is an integer, say 9 for example).
I thought I could normalize the columns, then normalize the rows and multiply by X. But this didn't work: the resulting row and column sums were not exactly X and 1.0.
This is what I tried:
# B is 5 rows by 30 columns
# Normalizing columns to 1.0
col_sum = []
for col in B.T:
    col_sum.append(sum(col))
for row in range(B.shape[0]):
    for col in range(B.shape[1]):
        if B[row][col] != 0.0 and B[row][col] != 1.0:
            B[row][col] = B[row][col] / col_sum[col]

# Normalizing rows to X (9.0)
row_sum = []
for row in B:
    row_sum.append(sum(row))
for row in range(B.shape[0]):
    for col in range(B.shape[1]):
        if B[row][col] != 0.0 and B[row][col] != 1.0:
            B[row][col] = (B[row][col] / row_sum[row]) * 9.0
I'm not sure if I understood correctly, but it seems like what you're trying to accomplish might not be mathematically feasible.
Imagine a 2x2 matrix where you want the rows to sum to 1 and the columns to 10. Even if you set every entry to 1 (its maximum possible value, given the row constraint), each column would only sum to 2, never 10.
This can only work if your matrix's number of columns is X times the number of rows. For example, if X = 3 and you have 5 rows, then you must have 15 columns. So your 5x30 matrix can work for X = 6 but not for X = 9.
The reason is that if each column sums to 1.0, the total of all values in the matrix is 1.0 times the number of columns; and since each row must sum to X, that same total is also X times the number of rows.
So: Columns * 1.0 = X * Rows
If that constraint is met and the initial values are properly balanced, you only have to scale every value by X / sum(row) and both dimensions work out automatically. (The matrix is balanced when, after all rows are adjusted to have the same sum, all columns also end up with the same sum.) If the matrix is not balanced, adjusting the values would be similar to solving a sudoku, and the result would be largely unrelated to the initial values.
[0.7, 2.1, 1.4, 0.7, 1.4, 1.4, 0.7, 1.4, 1.4, 2.1, 0.7, 2.1, 1.4, 2.1, 1.4] 21
[2.8, 1.4, 0.7, 2.1, 1.4, 2.1, 0.7, 1.4, 2.1, 1.4, 0.7, 0.7, 1.4, 0.7, 1.4] 21
[1.4, 1.4, 1.4, 1.4, 1.4, 1.4, 1.4, 1.4, 1.4, 0.7, 2.8, 0.7, 0.7, 1.4, 2.1] 21
[1.4, 1.4, 1.4, 1.4, 2.1, 1.4, 1.4, 1.4, 0.7, 0.7, 2.1, 1.4, 1.4, 1.4, 1.4] 21
[0.7, 0.7, 2.1, 1.4, 0.7, 0.7, 2.8, 1.4, 1.4, 2.1, 0.7, 2.1, 2.1, 1.4, 0.7] 21
Apply x = x * 3 / 21 to all elements, and every row sums to 3.0 while every column sums to 1.0:
[0.1, 0.3, 0.2, 0.1, 0.2, 0.2, 0.1, 0.2, 0.2, 0.3, 0.1, 0.3, 0.2, 0.3, 0.2] 3.0
[0.4, 0.2, 0.1, 0.3, 0.2, 0.3, 0.1, 0.2, 0.3, 0.2, 0.1, 0.1, 0.2, 0.1, 0.2] 3.0
[0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.1, 0.4, 0.1, 0.1, 0.2, 0.3] 3.0
[0.2, 0.2, 0.2, 0.2, 0.3, 0.2, 0.2, 0.2, 0.1, 0.1, 0.3, 0.2, 0.2, 0.2, 0.2] 3.0
[0.1, 0.1, 0.3, 0.2, 0.1, 0.1, 0.4, 0.2, 0.2, 0.3, 0.1, 0.3, 0.3, 0.2, 0.1] 3.0
[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
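The scaling step above can be sketched in a few lines; this uses an assumed small balanced matrix (2 rows, 4 columns, so X must be 4 / 2 = 2) rather than the 5x15 example:

```python
# A small balanced example: both rows sum to 4, all columns sum to 2
M = [[0.5, 1.5, 1.0, 1.0],
     [1.5, 0.5, 1.0, 1.0]]
X = len(M[0]) / len(M)  # columns / rows = 2.0

# Scale every value by X / sum(row); each row then sums to X
scaled = [[v * X / sum(row) for v in row] for row in M]

row_sums = [sum(r) for r in scaled]        # each 2.0
col_sums = [sum(c) for c in zip(*scaled)]  # each 1.0, because M was balanced
```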

How to add a value within certain intervals only in Python

I have a dataframe with a column whose values range from -1 to 1. I want to add 0.1 to every value between -1 and 0.6 only. Is it possible to do that?
Suppose a is my list:
a = [-1.0, -0.5, 0.1, 0.2, 0.45, 0.7, 0.64, 1]
and I want this:
[-0.9, -0.4, 0.2, 0.3, 0.55, 0.7, 0.74, 1]
Yes, it's possible:
a = [-1.0, -0.5, 0.1, 0.2, 0.45, 0.7, 0.64, 1]
a = [x + 0.1 if -1 <= x <= 0.6 else x for x in a]
print(a)
Results:
[-0.9, -0.4, 0.2, 0.3, 0.55, 0.7, 0.64, 1]
(Note that 0.64 is greater than 0.6, so by the stated rule it stays 0.64, unlike the 0.74 in the question's example output.)
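Since the question mentions a dataframe, the same conditional update can be sketched with pandas (the column name "val" is an assumption):

```python
import pandas as pd

df = pd.DataFrame({"val": [-1.0, -0.5, 0.1, 0.2, 0.45, 0.7, 0.64, 1.0]})

# Add 0.1 only where the value lies in the closed interval [-1, 0.6]
mask = df["val"].between(-1, 0.6)
df.loc[mask, "val"] += 0.1

result = df["val"].round(2).tolist()
```

Series.between is inclusive on both ends by default, matching the list-comprehension condition -1 <= x <= 0.6.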

Combining rows of the same key into single array

I have a pandas dataframe as follows:
error
0:  [[0.1, 0.4, -0.3]]
1:  [[-0.6, -0.3, 0.2]]
...
99: [[0.4, -0.7, 0.1]]
I would like to combine all values into a single array like this:
[0.1,0.4,-0.3,-0.6,-0.3,0.2,...,0.4,-0.7,0.1]
Is there a fast way to do this using pandas, or do I need to iterate over the data and build the array "manually"?
The data order, in this case, is not important.
In a more general case, how do I combine arrays that don't have the same size (e.g. row 0 contains an array of 3 elements, row 1 contains an array of 6 elements, etc.)?
Use numpy.ravel:
L = np.array(df['error'].values.tolist()).ravel().tolist()
print(L)
[0.1, 0.4, -0.3, -0.6, -0.3, 0.2, 0.4, -0.7, 0.1]
A more general solution, using str[0] to select the nested lists (it also works when the rows have different lengths):
print(df)
              error
0   [[0.1, 0.4, -0.3]]
1       [[-0.6, -0.3]]
99  [[0.4, -0.7, 0.1]]

from itertools import chain
L = list(chain.from_iterable(df['error'].str[0]))
print(L)
[0.1, 0.4, -0.3, -0.6, -0.3, 0.4, -0.7, 0.1]

L = np.concatenate(df['error'].str[0].values).tolist()
print(L)
[0.1, 0.4, -0.3, -0.6, -0.3, 0.4, -0.7, 0.1]
df = pd.DataFrame([[0.1, 0.4, -0.3], [-0.6, -0.3, 0.2]])
df.values.flatten()
will return:
array([ 0.1, 0.4, -0.3, -0.6, -0.3, 0.2])
If you would like to take the elements column by column instead:
df.values.flatten(order='F')
will return:
array([ 0.1, -0.6, 0.4, -0.3, -0.3, 0.2])
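For the general case where rows have different lengths, flatten() no longer applies (NumPy cannot build a regular 2-D array from ragged rows), but itertools.chain handles it directly. A small pure-Python sketch (the row values are assumed for illustration):

```python
from itertools import chain

rows = [[0.1, 0.4, -0.3],
        [-0.6, -0.3],      # shorter row
        [0.4, -0.7, 0.1]]

# Concatenate all inner lists in order, whatever their lengths
flat = list(chain.from_iterable(rows))
```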
