Sort PowerPoint chart data with Python? - python

I am sourcing chart data from an Excel spreadsheet using openpyxl.
I need to be able to sort this data largest to smallest without it getting jumbled. Sometimes there are multiple series in the charts.
Is there a method by which this can be accomplished?
If it cannot be sorted before being dropped into the plot, is there a means by which it can be sorted afterward? I imagine it would need to be located, then sorted. This would need to happen for every chart in the document.

Here's what I did to solve this problem, and it seems to be working currently. If anyone has suggestions to make this process more streamlined, I'm all ears!
indexcounter = 0  # position of the current category (must start at 0)
data_sort = []  # one row per category: [first series value, category, other series values...]
for cat in data_collect['categories']:  # creates new lists by category
    new_list = []
    for key in data_collect:
        if key != 'categories':
            new_list.append(data_collect[key][indexcounter])
    new_list.insert(1, cat)
    indexcounter += 1
    data_sort.append(new_list)
data_sort.sort()  # sorts list by first column
cat_list_sorted = []
for lst in data_sort:  # removes sorted category list and drops it into the chart
    cat_list_sorted.append(lst[1])
    del lst[1]
chart_data.categories = cat_list_sorted
indexcounter = 0  # pulls data back apart and creates new series lists. Drops each series into ChartData()
while indexcounter < len(series_name_list):
    series_sorted = []
    for lst in data_sort:
        series_sorted.append(lst[indexcounter])
    series_name = series_name_list[indexcounter]
    chart_data.add_series(series_name, series_sorted, '0%')
    indexcounter += 1
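One possible way to streamline this, offered as a sketch rather than a drop-in replacement: compute the sorted order of the category positions once, then reorder every list with that same order so the series stay aligned with their categories. The function name `sort_chart_data`, its parameters, and the sample data are made up for illustration; only the `data_collect` layout (a `'categories'` list plus one list per series) is taken from the code above.

```python
def sort_chart_data(data_collect, sort_key, descending=True):
    """Return a copy of data_collect with every list reordered by one series.

    data_collect is assumed to map 'categories' and each series name to
    lists of equal length (the layout used in the answer above).
    sort_key is the name of the series whose values decide the order.
    """
    n = len(data_collect['categories'])
    # Positions 0..n-1, ordered by the value of the chosen series.
    order = sorted(range(n), key=lambda i: data_collect[sort_key][i],
                   reverse=descending)
    # Apply the same order to the categories and to every series.
    return {name: [values[i] for i in order]
            for name, values in data_collect.items()}


# Hypothetical sample data
data_collect = {
    'categories': ['A', 'B', 'C'],
    'Series 1': [0.2, 0.5, 0.1],
    'Series 2': [0.3, 0.1, 0.6],
}
sorted_data = sort_chart_data(data_collect, 'Series 1')
print(sorted_data['categories'])
```

The sorted categories can then be assigned to `chart_data.categories`, and each remaining key passed to `chart_data.add_series`, as in the loop above.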

Related

Different ways of iterating through pandas DataFrame

I am currently working on a short pandas project. The project assessment keeps marking this task as incorrect for me even though the resulting list appears to be the same as when the provided correct code is used. Is my code wrong and it just happens to give the same results for this particular DataFrame?
My code:
# Define an empty list
colors = []
# Iterate over rows of netflix_movies_col_subset
for t in netflix_movies_col_subset['genre']:
    if t == 'Children':
        colors.append('red')
    elif t == 'Documentaries':
        colors.append('blue')
    elif t == 'Stand-up':
        colors.append('green')
    else:
        colors.append('black')
# Inspect the first 10 values in your list
print(colors[:10])
Provided code:
# Define an empty list
colors = []
# Iterate over rows of netflix_movies_col_subset
for lab, row in netflix_movies_col_subset.iterrows():
    if row['genre'] == 'Children':
        colors.append('red')
    elif row['genre'] == 'Documentaries':
        colors.append('blue')
    elif row['genre'] == 'Stand-up':
        colors.append('green')
    else:
        colors.append('black')
# Inspect the first 10 values in your list
print(colors[0:10])
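As a quick sanity check, here is a small sketch with made-up data suggesting that both loops visit the same values in the same order, so they should build the same list for any DataFrame. The helper `color_for` and the tiny DataFrame are hypothetical stand-ins, not part of the project.

```python
import pandas as pd


def color_for(genre):
    """Hypothetical helper holding the if/elif chain used by both versions."""
    if genre == 'Children':
        return 'red'
    elif genre == 'Documentaries':
        return 'blue'
    elif genre == 'Stand-up':
        return 'green'
    return 'black'


# Made-up stand-in for netflix_movies_col_subset
df = pd.DataFrame({'genre': ['Children', 'Dramas', 'Stand-up', 'Documentaries']})

# Version 1: iterate over the column values directly
colors_from_column = [color_for(g) for g in df['genre']]
# Version 2: iterate over rows with iterrows()
colors_from_rows = [color_for(row['genre']) for _, row in df.iterrows()]

print(colors_from_column == colors_from_rows)
```

If the lists match, a failed assessment is more likely caused by the checker comparing the code itself (for example, expecting `iterrows()`) than by a difference in results, though that is only a guess.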
I've always been told that the best way to iterate over a DataFrame row by row is NOT TO DO IT.
In your case, you could use df.ne().
First create a DataFrame that holds all genres (df_genres), then use
netflix_movies_col_subset['genre'].ne(df_genres, axis=0)
This should create a DataFrame that has a row for every movie and a column for every genre. Note that ne() tests for inequality, so for a documentary the Documentaries column would be False and every other column True; eq() gives the opposite.
Vectorized comparisons like this are typically much faster than iterating with multiple if statements.
Does this help? I haven't tested it yet.
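Another vectorized option, sketched here under the assumption that each row has exactly one genre string, is to map the column through a dictionary and fill the unmatched rows with the default color. `Series.map` and `fillna` are standard pandas methods; the `genre_colors` dictionary and the sample data are made up for illustration.

```python
import pandas as pd

# Lookup table replacing the if/elif chain
genre_colors = {'Children': 'red', 'Documentaries': 'blue', 'Stand-up': 'green'}

# Made-up stand-in for the real DataFrame
netflix_movies_col_subset = pd.DataFrame(
    {'genre': ['Children', 'Dramas', 'Stand-up', 'Documentaries']}
)

# Genres missing from the dictionary become NaN, which fillna turns into 'black'
colors = (
    netflix_movies_col_subset['genre']
    .map(genre_colors)
    .fillna('black')
    .tolist()
)
print(colors[:10])
```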
# Define an empty list
colors = []
# Iterate over rows of netflix_movies_col_subset
for t in netflix_movies_col_subset['genre']:
    if t == 'Children':
        x = 'red'
    elif t == 'Documentaries':
        x = 'blue'
    elif t == 'Stand-up':
        x = 'green'
    else:
        x = 'black'
    colors.append(x)
# Inspect the first 10 values in your list
print(colors[:10])
Or you can use match-case (Python 3.10 or later).
# Define an empty list
colors = []
# Iterate over rows of netflix_movies_col_subset
for t in netflix_movies_col_subset['genre']:
    match t:
        case 'Children':
            x = 'red'
        case 'Documentaries':
            x = 'blue'
        case 'Stand-up':
            x = 'green'
        case _:  # default branch; match has no else clause
            x = 'black'
    colors.append(x)
# Inspect the first 10 values in your list
print(colors[:10])

Index count being wrapped into the data as one object

First time on Stack Overflow, so bear with me. Code is below. Basically, df_history is a dataframe with different variables. I am trying to pull the 'close' variable and sort it based on the categorical type of the currency.
When I pull data over using the .query command, it gives me one object with all the individual observations together, separated by a space. I know how to separate that back into independent data, but the issue is that it is pulling the index count along with the observations. In the image you can see 179, 178, 177 etc. in the BTC object. I don't want that there and didn't intend to pull it. How do I get rid of that?
additional_rows = []
for currency in selected_coins:
    df_history = df_history.sort_values(['date'], ascending=True)
    # @currency refers to the local Python variable inside the query string
    row_data = [currency,
                df_history.query("granularity == 'daily' and currency == @currency")['close'],
                df_history.query("granularity == 'daily' and currency == @currency").head(180)['close'].pct_change(),
                df_history['date']
                ]
    additional_rows.append(row_data)
df_additional_info = pd.DataFrame(additional_rows, columns=['currency',
                                                            'close',
                                                            'returns',
                                                            'df_history'])
df_additional_info.set_index('currency').transpose()
import ast
list_of_lists = df_additional_info.close.to_list()
flat_list = [i for sublist in list_of_lists for i in ast.literal_eval(sublist)]
uniq_list = list(set(flat_list))
len(uniq_list),len(flat_list)
I was trying to pull data from one dataframe to the next and sort it based on a categorical input from the currency variable. It is not transferring over well.
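A possible explanation, offered as a sketch: each cell ends up holding a whole pandas Series, and a Series always carries its index, which is why the row numbers show up next to the values. Converting each Series to a plain list with `to_list()` before storing it keeps only the values. The sample `df_history` below is made up; the column names follow the question.

```python
import pandas as pd

# Made-up stand-in for df_history
df_history = pd.DataFrame({
    'currency': ['BTC', 'BTC', 'ETH', 'ETH'],
    'granularity': ['daily'] * 4,
    'date': pd.to_datetime(['2021-01-02', '2021-01-01', '2021-01-02', '2021-01-01']),
    'close': [110.0, 100.0, 22.0, 20.0],
})
selected_coins = ['BTC', 'ETH']

df_history = df_history.sort_values('date')  # sorting once is enough
rows = []
for currency in selected_coins:
    daily = df_history.query("granularity == 'daily' and currency == @currency")
    close = daily['close']
    rows.append({
        'currency': currency,
        'close': close.to_list(),                   # values only, no index
        'returns': close.pct_change().to_list(),    # first entry is NaN
    })

df_additional_info = pd.DataFrame(rows).set_index('currency')
print(df_additional_info.loc['BTC', 'close'])
```

With plain lists in the cells there is nothing to parse afterwards, so the `ast.literal_eval` step may no longer be needed.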

Operating on Dictionary of Pandas Dataframes

I have a dictionary of dataframes dico where each dataframe looks something like:
   Cust  Cont    Rate
0  Cust  Cont    Rate
1  Vent  8001  TOU-GS
2  Vent  8001  TOU-GS
3   nan   nan     nan
I am trying to operate on the dictionary to clean up each dataframe, first by dropping the row containing column headers (whichever row they happen to be in) and dropping any rows or columns full of nulls.
colheaders = ['Cust', 'Cont', 'Rate']
for key, item in dico.items():
    item = item.drop(item[item.iloc[:, 0].isin(colheaders)].index)
    item = item.dropna(how='all')
    item = item.dropna(how='all', axis=1)
My code doesn't return any errors, but it doesn't show any changes. Any idea what I'm doing wrong here? Operating on dictionary of dataframes in this fashion seemed to work for this solution. Perhaps this is a larger lesson of learning how to operate on dataframes in a loop, but I just can't seem to crack it.
You forgot to reassign the values in your dictionary, which is why the changes have no effect. Inside the loop, each `item = ...` statement only rebinds the local name item to a new dataframe; it does not modify dico.
Use this:
colheaders = ['Cust', 'Cont', 'Rate']
for key, item in dico.items():
    item = item.drop(item[item.iloc[:, 0].isin(colheaders)].index)
    item = item.dropna(how='all')
    item = item.dropna(how='all', axis=1)
    dico[key] = item
Quick note: use a name such as dico, my_dict, or dfs_dict for your dictionary variable rather than dict, since dict is a Python built-in type.
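The same cleanup can also be written as a small function plus a dictionary comprehension, which builds a new dictionary instead of reassigning inside the loop. This is only a sketch: the helper name `clean` and the sample dataframe are made up, and the missing values are assumed to be real NaN rather than the string 'nan'.

```python
import numpy as np
import pandas as pd


def clean(df, colheaders):
    """Drop repeated header rows, then all-null rows and columns."""
    df = df[~df.iloc[:, 0].isin(colheaders)]  # keep rows that are not headers
    return df.dropna(how='all').dropna(how='all', axis=1)


colheaders = ['Cust', 'Cont', 'Rate']
# Made-up stand-in for the real dictionary of dataframes
dico = {
    'a': pd.DataFrame({
        'Cust': ['Cust', 'Vent', 'Vent', np.nan],
        'Cont': ['Cont', '8001', '8001', np.nan],
        'Rate': ['Rate', 'TOU-GS', 'TOU-GS', np.nan],
    })
}

# Build a new dictionary holding the cleaned dataframes
dico = {key: clean(df, colheaders) for key, df in dico.items()}
print(dico['a'])
```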

Comparing data from 2 nested dictionaries and producing a box-plot

I am trying to produce a box plot using matplotlib with data from nested dictionaries. Below is a rough outline of the structure of dictionary in question.
m_data = {scenario: {variable: {'model_name': value, 'model_name': value, ...}}}
One issue is that I want to look at the change in the models' output between the two different scenarios (scenario 1 [VAR1] - scenario 2 [VAR2]) and then plot this difference in a box plot.
I have managed to do this, however, I want to be able to label the outliers with the model name. My current method separates the keys from the values, therefore the outlier data point has no name associated with it anymore.
# BOXPLOT
import matplotlib.pyplot as plt

# set up blank lists
future_rain = []
past_rain = []
future_temp = []
past_temp = []
# single out the values for each model from the nested dictionaries
for key, val in m_data[FUTURE_SCENARIO][VAR1].items():
    future_rain.append(val)
for key, val in m_data[FUTURE_SCENARIO][VAR2].items():
    future_temp.append(val)
for key, val in m_data['historical'][VAR1].items():
    past_rain.append(val)
for key, val in m_data['historical'][VAR2].items():
    past_temp.append(val)
# blanks for final data
bx_plt_rain = []
bx_plt_temp = []
# allow for the subtraction of two lists
zip_object = zip(future_temp, past_temp)
for future_temp_i, past_temp_i in zip_object:
    bx_plt_temp.append(future_temp_i - past_temp_i)
zip_object = zip(future_rain, past_rain)
for future_rain_i, past_rain_i in zip_object:
    bx_plt_rain.append(future_rain_i - past_rain_i)
# colour outliers red
c = 'red'
outlier_col = {'flierprops': dict(color=c, markeredgecolor=c)}
# plot
bp = plt.boxplot(bx_plt_rain, patch_artist=True, showmeans=True, vert=False, meanline=True, **outlier_col)
bp['boxes'][0].set(facecolor='lightgrey')
plt.show()
If anyone knows of a workaround for this I would be extremely grateful.
As a bit of a hack you could create a function that looks through the dict for the outlier value and returns the key.
def outlier_name(outlier_val, inner_dict):
    for key, value in inner_dict.items():
        if value == outlier_val:
            return key
This could be pretty intensive if your data sets are large.
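An alternative that avoids searching afterwards, sketched here with made-up numbers, is to keep the model name attached to each difference from the start and then pick out the outliers by name. The cutoff below is 1.5 times the interquartile range beyond the quartiles, which I believe matches matplotlib's default whisker setting (`whis=1.5`); treat that as an assumption and check it against your plot. The dictionaries `future` and `past` stand in for `m_data[FUTURE_SCENARIO][VAR1]` and `m_data['historical'][VAR1]`.

```python
import statistics

# Made-up stand-ins for the two inner dictionaries
future = {'A': 5.0, 'B': 5.5, 'C': 6.0, 'D': 5.2, 'E': 12.0}
past = {'A': 4.0, 'B': 4.4, 'C': 5.1, 'D': 4.1, 'E': 4.0}

# Difference per model, keyed by model name so the name is never lost
diff = {model: future[model] - past[model] for model in future if model in past}

# Quartiles with linear interpolation between data points
q1, _, q3 = statistics.quantiles(diff.values(), n=4, method='inclusive')
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Models whose difference falls outside the whisker range
outliers = {model: value for model, value in diff.items()
            if value < low or value > high}
print(outliers)
```

The values for the box plot are then `list(diff.values())`, and each entry in `outliers` can be labelled on the axes, for example with `plt.annotate(model, (value, 1))` for a single horizontal box.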

Creating multiple lists in for loop with dynamic names in Python

I'm trying to find the averages and standard deviations of multiple columns of my dataset and then save them as new columns in a new dataframe, i.e. for every 'GROUP' in the dataset, I want one column in the new dataframe with its average and SD. I came up with the following script, but I'm not able to name the lists dynamically.
Average_F1_S_list, Average_F1_M_list, SD_F1_S_list, SD_F1_M_list = ([] for i in range(4))
Groups = DF['GROUP'].unique().tolist()
for key in Groups:
    Average_F1_S = DF_DICT[key]['F1_S'].mean()
    Average_F1_S_list.append(Average_F1_S)
    SD_F1_S = DF_DICT[key]['F1_S'].std()
    SD_F1_S_list.append(SD_F1_S)
    Average_F1_M = DF_DICT[key]['F1_M'].mean()
    Average_F1_M_list.append(Average_F1_M)
    SD_F1_M = DF_DICT[key]['F1_M'].std()
    SD_F1_M_list.append(SD_F1_M)
df = pd.DataFrame({'Group': Groups,
                   'Average_F1_S': Average_F1_S_list, 'Standard_Dev_F1_S': SD_F1_S_list,
                   'Average_F1_M': Average_F1_M_list, 'Standard_Dev_F1_M': SD_F1_M_list},
                  columns=['Group', 'Average_F1_S', 'Standard_Dev_F1_S', 'Average_F1_M', 'Standard_Dev_F1_M'])
This will not be a good solution as there are too many features. Is there any way I can create the lists dynamically?
This should do the trick! Hope this helps
# These are all the keys you want
key_names = ['F1_S', 'F1_M']
# Holds the data you want to pass to the dataframe.
df_info = {'Groups': Groups}
for group_name in Groups:
    # For each group in the groups, we iterate over all the keys we want.
    for key in key_names:
        # Generate a keyname that you want for your dataframe.
        avg_key_name = key + '_Average'
        std_key_name = key + '_Standard_Dev'
        if avg_key_name not in df_info:
            df_info[avg_key_name] = []
            df_info[std_key_name] = []
        df_info[avg_key_name].append(DF_DICT[group_name][key].mean())
        df_info[std_key_name].append(DF_DICT[group_name][key].std())
df = pd.DataFrame(df_info)
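If each entry of DF_DICT is simply the subset of DF for that group (an assumption, since the question does not show how DF_DICT is built), pandas can compute all of this in one step with `groupby` and `agg`. The sketch below uses made-up data; the flattened column names such as `F1_S_mean` are my own naming choice.

```python
import pandas as pd

# Made-up stand-in for DF
DF = pd.DataFrame({
    'GROUP': ['a', 'a', 'b', 'b'],
    'F1_S': [1.0, 3.0, 2.0, 6.0],
    'F1_M': [10.0, 20.0, 30.0, 50.0],
})

key_names = ['F1_S', 'F1_M']  # extend this list to cover more features

# Mean and sample standard deviation of every listed column, per group
summary = DF.groupby('GROUP')[key_names].agg(['mean', 'std'])
# Flatten the two-level column index into names like 'F1_S_mean'
summary.columns = [f'{col}_{stat}' for col, stat in summary.columns]
summary = summary.reset_index()
print(summary)
```

Adding a feature then only means adding its name to `key_names`; no extra lists are needed.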
