I have read many questions and answers, but none of them really fits what I'm looking for...
Here's the "story": I'm anonymizing 1000 time series, and now I need to compare the real patient time series with the anonymized ones. I have the following parameters to test:
list_patient = ['pa', 'pr'] # pa = anonymized patients / pr = real patients
list_param = ['avg', 'std', 'med', 'max', 'min'] # Measures that I want to do
list_param_physio = ['FC', 'PAS', 'PAM', 'PAD'] # Physiological parameters such as cardiac frequency (FC)
And I would definitely prefer to avoid this ("3 lessons to make developers cry..."), which is what a big part of my code looks like without automation.
Just a taste of it :
avg_pa_fc, avg_pa_pas, avg_pa_pam, avg_pa_pad, std_pa_fc, std_pa_pas, std_pa_pam, std_pa_pad = ([] for i in range(8))
med_pa_fc, med_pa_pas, med_pa_pam, med_pa_pad = ([] for i in range(4))
min_pa_fc, min_pa_pas, min_pa_pam, min_pa_pad, max_pa_fc, max_pa_pas, max_pa_pam, max_pa_pad = ([] for i in range(8))
...
# Calculate the mean and stdev for each pa file, append them to a np.array and convert it to a pd.DataFrame
avg_pa_fc, std_pa_fc = pd.DataFrame(np.append(avg_pa_fc, (statistics.mean(pa_series_fc)))), pd.DataFrame(np.append(std_pa_fc, (statistics.stdev(pa_series_fc))))
avg_pa_pas, std_pa_pas = pd.DataFrame(np.append(avg_pa_pas, (statistics.mean(pa_series_pas)))), pd.DataFrame(np.append(std_pa_pas, (statistics.stdev(pa_series_pas))))
avg_pa_pam, std_pa_pam = pd.DataFrame(np.append(avg_pa_pam, (statistics.mean(pa_series_pam)))), pd.DataFrame(np.append(std_pa_pam, (statistics.stdev(pa_series_pam))))
avg_pa_pad, std_pa_pad = pd.DataFrame(np.append(avg_pa_pad, (statistics.mean(pa_series_pad)))), pd.DataFrame(np.append(std_pa_pad, (statistics.stdev(pa_series_pad))))
So, I would like to automatically create empty lists named in this format: {list1}_{list2}_{list3} = [] (with underscores between the list items, as above)
I tried many things, such as :
list = []
for i in range(40):
    list.append(f'{list_param}_{list_patient}_{list_param_physio}')
print(list)
# Output : ["['pa', 'pr']_['avg', 'std', 'med', 'max', 'min']_['FC', 'PAS', 'PAM', 'PAD']", "['pa', 'pr']_....
for param, patient, param_phy in enumerate(list_param, list_patient, list_param_physio):
    list.append(f'{param}_{patient}_{param_phy}')
print(list)
# TypeError: enumerate() takes at most 2 arguments (3 given)
# For anonymized patients :
for param, param_phy in enumerate(list_param, list_param_physio):
    list.append((f'{param}_pa_{param_phy}').aslist())
print(list)
# TypeError: 'list' object cannot be interpreted as an integer
I also tried to use dictionaries, but with 3 parameters, it begins to be too tricky for me...
If you have any idea, that would be great !
Since you seemed interested in my idea of using a single map to store all of your patient combinations rather than trying to create a bunch of variable names, here's an example of how to do that. This initially sets each key (variable) to an empty list, but you can set each one to whatever you want.
import itertools
from pprint import pprint
list_patient = ['pa', 'pr'] # pa = anonymized patients / pr = real patients
list_param = ['avg', 'std', 'med', 'max', 'min'] # Measures that I want to do
list_param_physio = ['FC', 'PAS', 'PAM', 'PAD'] # Physiological parameters such as cardiac frequency (FC)
patients = {}
for patient, param, param_physio in itertools.product(list_patient, list_param, list_param_physio):
    key = f"{patient}_{param}_{param_physio}"
    patients[key] = []
pprint(patients)
Result:
{'pa_avg_FC': [],
'pa_avg_PAD': [],
'pa_avg_PAM': [],
'pa_avg_PAS': [],
'pa_max_FC': [],
'pa_max_PAD': [],
'pa_max_PAM': [],
'pa_max_PAS': [],
'pa_med_FC': [],
'pa_med_PAD': [],
'pa_med_PAM': [],
'pa_med_PAS': [],
'pa_min_FC': [],
'pa_min_PAD': [],
'pa_min_PAM': [],
'pa_min_PAS': [],
'pa_std_FC': [],
'pa_std_PAD': [],
'pa_std_PAM': [],
'pa_std_PAS': [],
'pr_avg_FC': [],
'pr_avg_PAD': [],
'pr_avg_PAM': [],
'pr_avg_PAS': [],
'pr_max_FC': [],
'pr_max_PAD': [],
'pr_max_PAM': [],
'pr_max_PAS': [],
'pr_med_FC': [],
'pr_med_PAD': [],
'pr_med_PAM': [],
'pr_med_PAS': [],
'pr_min_FC': [],
'pr_min_PAD': [],
'pr_min_PAM': [],
'pr_min_PAS': [],
'pr_std_FC': [],
'pr_std_PAD': [],
'pr_std_PAM': [],
'pr_std_PAS': []}
Rather than reducing your data down to a flat list with complex names, you're probably better off preserving the structure. Here's an example of how to do that. This creates a single structure where you can access each patient, then each physio for that patient, and then each of the stats (avg, std, etc.) for that physio:
patients = {}
for patient_name in list_patient:
    patient = {}
    for physio_name in list_param_physio:
        physio = {}
        for param in list_param:
            physio[param] = 0.0
        patient[physio_name] = physio
    patients[patient_name] = patient
pprint(patients)
Result:
{'pa': {'FC': {'avg': 0.0, 'max': 0.0, 'med': 0.0, 'min': 0.0, 'std': 0.0},
'PAD': {'avg': 0.0, 'max': 0.0, 'med': 0.0, 'min': 0.0, 'std': 0.0},
'PAM': {'avg': 0.0, 'max': 0.0, 'med': 0.0, 'min': 0.0, 'std': 0.0},
'PAS': {'avg': 0.0, 'max': 0.0, 'med': 0.0, 'min': 0.0, 'std': 0.0}},
'pr': {'FC': {'avg': 0.0, 'max': 0.0, 'med': 0.0, 'min': 0.0, 'std': 0.0},
'PAD': {'avg': 0.0, 'max': 0.0, 'med': 0.0, 'min': 0.0, 'std': 0.0},
'PAM': {'avg': 0.0, 'max': 0.0, 'med': 0.0, 'min': 0.0, 'std': 0.0},
'PAS': {'avg': 0.0, 'max': 0.0, 'med': 0.0, 'min': 0.0, 'std': 0.0}}}
Here's a simple example of how to use this structure. This shows you how to set and reference the 'avg' stat for the 'FC' physio for patient 'pr':
patients['pr']['FC']['avg'] = 1234.56
print(patients['pr']['FC']['avg'])
Result:
1234.56
If you want all the stats for the 'PAS' physio for the 'pa' patient, that's:
pprint(patients['pa']['PAS'])
Result:
{'avg': 0.0, 'max': 0.0, 'med': 0.0, 'min': 0.0, 'std': 0.0}
This structure will be easy to iterate over to gather or compute values within it without any of your code having any idea how many items there are at each level.
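As a rough sketch of that kind of iteration, the nested structure can be filled directly from the raw series with the statistics module. The `series` dict and its numbers below are made up for illustration, and the physio list is shortened to keep the example small:

```python
import statistics

list_patient = ['pa', 'pr']
list_param_physio = ['FC', 'PAS']  # shortened for the example

# Hypothetical raw data: one short series per (patient, physio) pair.
series = {
    ('pa', 'FC'): [60, 62, 64],
    ('pa', 'PAS'): [110, 120, 130],
    ('pr', 'FC'): [58, 61, 64],
    ('pr', 'PAS'): [115, 120, 125],
}

# Map each stat name to the function that computes it.
stat_funcs = {
    'avg': statistics.mean,
    'std': statistics.stdev,
    'med': statistics.median,
    'max': max,
    'min': min,
}

# patient -> physio -> stat name -> value
patients = {
    patient: {
        physio: {name: func(series[patient, physio])
                 for name, func in stat_funcs.items()}
        for physio in list_param_physio
    }
    for patient in list_patient
}

print(patients['pa']['FC']['avg'])
```

Adding a new statistic then only means adding one entry to `stat_funcs`; none of the loops need to change.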
You could do it with:
>>> list_patient = ['pa', 'pr']
>>> list_param = ['avg', 'std', 'med', 'max', 'min']
>>> list_param_physio = ['FC', 'PAS', 'PAM', 'PAD']
>>> {f"{x}_{y}_{z}": [] for x in list_patient for y in list_param for z in list_param_physio}
This will give you following output:
{
'pa_avg_FC': [],
'pa_avg_PAS': [],
'pa_avg_PAM': [],
'pa_avg_PAD': [],
'pa_std_FC': [],
'pa_std_PAS': [],
'pa_std_PAM': [],
'pa_std_PAD': [],
'pa_med_FC': [],
'pa_med_PAS': [],
'pa_med_PAM': [],
'pa_med_PAD': [],
'pa_max_FC': [],
'pa_max_PAS': [],
'pa_max_PAM': [],
'pa_max_PAD': [],
'pa_min_FC': [],
'pa_min_PAS': [],
'pa_min_PAM': [],
'pa_min_PAD': [],
'pr_avg_FC': [],
'pr_avg_PAS': [],
'pr_avg_PAM': [],
'pr_avg_PAD': [],
'pr_std_FC': [],
'pr_std_PAS': [],
'pr_std_PAM': [],
'pr_std_PAD': [],
'pr_med_FC': [],
'pr_med_PAS': [],
'pr_med_PAM': [],
'pr_med_PAD': [],
'pr_max_FC': [],
'pr_max_PAS': [],
'pr_max_PAM': [],
'pr_max_PAD': [],
'pr_min_FC': [],
'pr_min_PAS': [],
'pr_min_PAM': [],
'pr_min_PAD': []
}
You can also use the following code if you want to create the variables without creating a dictionary:
for x in list_patient:
    for y in list_param:
        for z in list_param_physio:
            exec(f"{x}_{y}_{z} = list()")
However, be careful: exec() is a dangerous function and should not be used recklessly.
Please assume the following dict is given
sources_targets =
{'front': {'source': [[0.021025050381526675, -0.39686011326197257, 3.328947963092819], [1.0601052368302668, -0.3938359761868055, 3.3247223740425893], [1.0543731204008824, -0.038184154352961984, 2.941639590795943], [0.017868184643970383, -0.0445863307249157, 2.9604912584916665]], 'target': [[-250.0, 0.0, 60.0], [-250.0, 0.0, -60.0], [-190.0, 0.0, -60.0], [-190.0, 0.0, 60.0]]}, 'left': {'source': [[-0.9522471062122733, -1.8444069007997075, 5.372044839796925], [-0.9665739520089994, -1.001259794819009, 5.0057689609608005], [0.9940538769534978, -1.851333804840362, 5.340677879647542], [0.9959517362506759, -1.0049420919111534, 4.942663843894899]], 'target': [[60.0, 0.0, 140.0], [60.0, 0.0, 80.0], [-60.0, 0.0, 140.0], [-60.0, 0.0, 80.0]]}, 'right': {'source': [[-0.8596841529333474, -3.0721166255322663, 4.182871479604773], [-0.8796404109762729, -2.117062488877432, 4.147040556143069], [1.0791152756424247, -2.0436646487085532, 4.08578012939533], [1.0951903113036177, -2.994375693306352, 4.124102127893507]], 'target': [[-60.0, 0.0, -140.0], [-60.0, 0.0, -80.0], [60.0, 0.0, -80.0], [60.0, 0.0, -140.0]]}, 'rear': {'source': [[0.08792816743383122, -0.5260295091566244, 3.0182522276468458], [0.9012540916522604, -0.5012267882763559, 3.0172622143554695], [0.8942115223224005, -0.15635208604951806, 2.6353057539009934], [0.08814313470840558, -0.18017837896764446, 2.6579174137231463]], 'target': [[250.0, 0.0, -40.0], [250.0, 0.0, 60.0], [190.0, 0.0, 60.0], [190.0, 0.0, -40.0]]}}
bundle_names = ['front', 'left', 'right', 'rear']
I want to get a list of all the sources and another list of all the targets.
Also, it needs to be done in a one-liner list comprehension.
My successful attempt is
sources3d = [st[1] for name in self._bundle_names for st in sources_targets[name].items() if st[0] == "source"]
targets3d = [st[1] for name in self._bundle_names for st in sources_targets[name].items() if st[0] == "target"]
with the correct output (for "sources" for example)
[[[0.021025050381526675, -0.39686011326197257, 3.328947963092819], [1.0601052368302668, -0.3938359761868055, 3.3247223740425893], [1.0543731204008824, -0.038184154352961984, 2.941639590795943], [0.017868184643970383, -0.0445863307249157, 2.9604912584916665]], [[-0.9522471062122733, -1.8444069007997075, 5.372044839796925], [-0.9665739520089994, -1.001259794819009, 5.0057689609608005], [0.9940538769534978, -1.851333804840362, 5.340677879647542], [0.9959517362506759, -1.0049420919111534, 4.942663843894899]], [[-0.8596841529333474, -3.0721166255322663, 4.182871479604773], [-0.8796404109762729, -2.117062488877432, 4.147040556143069], [1.0791152756424247, -2.0436646487085532, 4.08578012939533], [1.0951903113036177, -2.994375693306352, 4.124102127893507]], [[0.08792816743383122, -0.5260295091566244, 3.0182522276468458], [0.9012540916522604, -0.5012267882763559, 3.0172622143554695], [0.8942115223224005, -0.15635208604951806, 2.6353057539009934], [0.08814313470840558, -0.18017837896764446, 2.6579174137231463]]]
The way I accessed the inner dict using .items() and accessing the tuple by index seems cumbersome.
I would like to access some way with
[st["source"] for name in self._bundle_names for st in sources_targets[name]]
which doesn't work but would be much cleaner.
I am sure there is a way to do this correctly.
You iterate all the items and then pick the one (!) item that has the desired key. Instead, access the key directly.
This should work, yielding a result equal to your sources3d and targets3d:
sources3d = [sources_targets[name]["source"] for name in bundle_names]
targets3d = [sources_targets[name]["target"] for name in bundle_names]
If you need more than one item, you can basically do the same, but also iterate over the list of keys/items you would like to get:
>>> items = ["source", "target"]
>>> [sources_targets[name][item] for name in bundle_names for item in items]
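If you would rather keep one list per key than a single flat list, a dict comprehension keeps them apart. This is only a sketch: the cut-down `sources_targets` below uses placeholder coordinates, not the real data, and `by_item` is a name invented for the example.

```python
# Cut-down stand-in for the dict above; the values are placeholders.
sources_targets = {
    'front': {'source': [[0.0, 0.1, 0.2]], 'target': [[-250.0, 0.0, 60.0]]},
    'left': {'source': [[1.0, 1.1, 1.2]], 'target': [[60.0, 0.0, 140.0]]},
}
bundle_names = ['front', 'left']
items = ['source', 'target']

# One list per requested key, each in bundle_names order.
by_item = {
    item: [sources_targets[name][item] for name in bundle_names]
    for item in items
}

sources3d = by_item['source']
targets3d = by_item['target']
print(sources3d)
print(targets3d)
```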
So the task is to check selected_movie_genres against all_genres and put either 1 or 0 into the result array, in all_genres order.
In the example below, we first check whether Action is in the list of selected movie genres; if so we put 1, else 0. Then we move on to Adventure: if the selected movie has it we append 1, otherwise 0.
If the selected movie has any genre that is not listed in all_genres, we also put 1 in Other_genre's position.
all_genres = ["Action", "Adventure", "Fantasy",
"Science Fiction", "Crime", "Drama",
"Thriller", "Animation", "Family", "Western",
"Comedy", "Romance", "Horror", "Mystery", "History", "War", "Music",
"Documentary", "Foreign", "TV Movie", "Other_genre"]
selected_movie_genres = [
{
"id": 12,
"name": "Action"
},
{
"id": 18,
"name": "Drama"
},
{
"id": 878,
"name": "Autobiography"
}
]
So the expected output must be
result = [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
# Drama, Action and Other_genre categories are on
What is the optimal way to achieve this result without repeating ourselves?
This is not very efficient or Pythonic, but it gets the job done. I'm fairly new to Python but thought I'd give it a go.
new = []
ans = []
# Collect the names of the selected genres.
for item in selected_movie_genres:
    new.append(item['name'])
# 1 if the genre is selected, else 0, in all_genres order.
for i in range(len(all_genres)):
    if all_genres[i] in new:
        ans.append(1)
    else:
        ans.append(0)
# Any genre missing from all_genres switches on the last slot (Other_genre).
for item in new:
    if item not in all_genres:
        ans[-1] = 1
True and False map to 1 and 0, so you need to generate a list of True and False values from all_genres depending on if the genre is in selected_movie_genres and then map them to their integer values.
First create a collection of just the selected genre names. I used a set here because it should have a faster lookup time when we check whether a genre is in it:
selected_genres = set(genre['name'] for genre in selected_movie_genres)
Then loop over all the genres, yielding True if a genre is in the selected genres and False if not (we use int() to convert True to 1 and False to 0):
result = [int(genre_name in selected_genres) for genre_name in all_genres]
# create a dictionary from `all_genres` with zero as default value
ag_dic = dict.fromkeys(all_genres, 0.)
# Check if genre in dictionary;
# if so, increment by one
# if not, increment "Other_genre" by one
for genre in selected_movie_genres:
    selected = genre["name"]
    if selected in ag_dic:
        ag_dic[selected] += 1.
    else:
        # We can use the last item in the list to keep it pretty generic
        # assuming that the last item will always be an 'other' category.
        ag_dic[all_genres[-1]] += 1.
ag_dic:
{'Action': 1.0,
'Adventure': 0.0,
'Fantasy': 0.0,
'Science Fiction': 0.0,
'Crime': 0.0,
'Drama': 1.0,
'Thriller': 0.0,
'Animation': 0.0,
'Family': 0.0,
'Western': 0.0,
'Comedy': 0.0,
'Romance': 0.0,
'Horror': 0.0,
'Mystery': 0.0,
'History': 0.0,
'War': 0.0,
'Music': 0.0,
'Documentary': 0.0,
'Foreign': 0.0,
'TV Movie': 0.0,
'Other_genre': 1.0}
List values of resulting dictionary:
result = list(ag_dic.values())
result
[1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
First would be getting all the selected genres:
selected_genres = {movie['name'] for movie in selected_movie_genres}
Then go through all the genres and determine if they are in the selection:
result = [float(genre in selected_genres) for genre in all_genres]
Then for the 'Other_genre' just figure out if there are any outliers:
result[-1] = float(any(genre not in all_genres for genre in selected_genres))
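Putting these steps together, a minimal sketch could look like the following. It assumes, as one of the answers above does, that the last entry of all_genres is the catch-all slot; the function name `encode_genres` and the shortened genre list are made up for the example.

```python
def encode_genres(all_genres, selected_movie_genres):
    """Return a 0.0/1.0 list in all_genres order; the last slot is the catch-all."""
    selected = {genre['name'] for genre in selected_movie_genres}
    known = set(all_genres[:-1])  # every genre except the catch-all slot
    result = [float(name in selected) for name in all_genres[:-1]]
    # The catch-all is on when at least one selected genre is unknown.
    result.append(float(bool(selected - known)))
    return result


all_genres = ["Action", "Adventure", "Drama", "Other_genre"]  # shortened list
selected_movie_genres = [
    {"id": 12, "name": "Action"},
    {"id": 18, "name": "Drama"},
    {"id": 878, "name": "Autobiography"},
]

print(encode_genres(all_genres, selected_movie_genres))
# [1.0, 0.0, 1.0, 1.0]
```

Using set difference for the catch-all means the result stays 0.0 or 1.0 no matter how many unknown genres a movie has.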
Is it possible to rename/alter all the keys of a dict? As an example, let's look at the following dictionary:
a_dict = {'a_var1': 0.05,
'a_var2': 4.0,
'a_var3': 100.0,
'a_var4': 0.3}
I want to remove all the a_ in the keys, so I end up with
a_dict = {'var1': 0.05,
'var2': 4.0,
'var3': 100.0,
'var4': 0.3}
If you want to alter the existing dict, instead of creating a new one, you can loop the keys, pop the old one, and insert the new, modified key with the old value.
>>> for k in list(a_dict):
...     a_dict[k[2:]] = a_dict.pop(k)
...
>>> a_dict
{'var2': 4.0, 'var1': 0.05, 'var3': 100.0, 'var4': 0.3}
(Iterating a list(a_dict) will prevent errors due to concurrent modification.)
Strictly speaking, this, too, does not alter the existing keys, but inserts new keys, as it has to re-insert them according to their new hash codes. But it does alter the dictionary as a whole.
As noted in comments, updating the keys in the dict in a loop can in fact be slower than a dict comprehension. If this is a problem, you could also create a new dict using a dict comprehension, and then clear the existing dict and update it with the new values.
>>> b_dict = {k[2:]: a_dict[k] for k in a_dict}
>>> a_dict.clear()
>>> a_dict.update(b_dict)
You can use:
{k[2:]: v for k, v in a_dict.items()}
You can do that easily enough with a dict comprehension.
a_dict = {'a_var1': 0.05,
'a_var2': 4.0,
'a_var3': 100.0,
'a_var4': 0.3}
a_dict = { k[2:]:v for k,v in a_dict.items() }
Result:
{'var1': 0.05, 'var2': 4.0, 'var3': 100.0, 'var4': 0.3}
You could use str.replace to rewrite each key into the desired format.
a_dict = {'a_var1': 0.05,
'a_var2': 4.0,
'a_var3': 100.0,
'a_var4': 0.3}
a_dict = {k.replace('a_', ''): v for k, v in a_dict.items()}
# {'var1': 0.05, 'var2': 4.0, 'var3': 100.0, 'var4': 0.3}
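One caveat with str.replace is that it removes every occurrence of 'a_', not only a leading one, and k[2:] cuts two characters even from keys that lack the prefix. A small sketch that strips only a leading prefix follows; the helper name `strip_prefix` and the extra keys are invented for the example, and on Python 3.9+ the built-in str.removeprefix does the same job.

```python
def strip_prefix(key, prefix):
    """Return key without a leading prefix; keys lacking it are unchanged."""
    return key[len(prefix):] if key.startswith(prefix) else key


a_dict = {'a_var1': 0.05, 'a_beta_1': 4.0, 'other': 100.0}

renamed = {strip_prefix(k, 'a_'): v for k, v in a_dict.items()}
print(renamed)
# {'var1': 0.05, 'beta_1': 4.0, 'other': 100.0}

# For comparison, str.replace also removes the inner 'a_' of 'a_beta_1'.
print('a_beta_1'.replace('a_', ''))
# bet1
```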
To efficiently get the frequencies of the letters of a given alphabet (here ABC) in a string code, as a dictionary, I can write a function like this (Python 3):
def freq(code):
    return {n: code.count(n)/float(len(code)) for n in 'ABC'}
Then
code='ABBBC'
freq(code)
Gives me
{'A': 0.2, 'C': 0.2, 'B': 0.6}
But how can I get the frequencies for each position along a list of strings of unequal lengths ? For instance mcode=['AAB', 'AA', 'ABC', ''] should give me a nested structure like a list of dict (where each dict is the frequency per position):
[{'A': 1.0, 'C': 0.0, 'B': 0.0},
{'A': 0.66, 'C': 0.0, 'B': 0.33},
{'A': 0.0, 'C': 0.5, 'B': 0.5}]
I cannot figure out how to compute the frequencies per position across all strings and wrap this in a list comprehension. Inspired by other SO posts about word counts, e.g. the well-discussed "Python: count frequency of words in a list", I thought the Counter class from collections might help.
Understand it like this - write the mcode strings on separate lines:
AAB
AA
ABC
Then what I need is the column-wise frequencies (AAA, AAB, BC) of the alphabet ABC in a list of dict where each list element is the frequencies of ABC per columns.
A much shorter solution:
from itertools import zip_longest
def freq(code):
    l = len(code) - code.count(None)
    return {n: code.count(n)/l for n in 'ABC'}
mcode=['AAB', 'AA', 'ABC', '']
results = [ freq(code) for code in zip_longest(*mcode) ]
print(results)
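To see why this works, it may help to look at what zip_longest(*mcode) actually yields: one tuple per position, padded with None where a string is too short. A small sketch:

```python
from itertools import zip_longest

mcode = ['AAB', 'AA', 'ABC', '']

# Each tuple is one "column"; None marks strings that ended earlier.
columns = list(zip_longest(*mcode))
print(columns)
# [('A', 'A', 'A', None), ('A', 'A', 'B', None), ('B', None, 'C', None)]

# Number of real characters per column, i.e. the divisor used by freq().
lengths = [len(col) - col.count(None) for col in columns]
print(lengths)
# [3, 3, 2]
```

Because the empty string contributes only None values, it never affects the divisor.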
Here is an example; the steps are briefly explained in the comments. Counter from the collections module is not used, because the mapping for a position must also contain characters that are not present at this position, and the order of the frequencies does not seem to matter.
def freq(*words):
    # All dictionaries contain all characters as keys, even
    # if a character is not present at a position.
    # Create a sorted list of characters in chars.
    chars = set()
    for word in words:
        chars |= set(word)
    chars = sorted(chars)
    # Get the number of positions.
    max_position = max(len(word) for word in words)
    # Initialize the result list of dictionaries.
    result = [
        dict((char, 0) for char in chars)
        for position in range(max_position)
    ]
    # Count characters.
    for word in words:
        for position in range(len(word)):
            result[position][word[position]] += 1
    # Change to frequencies
    for position in range(max_position):
        count = sum(result[position].values())
        for char in chars:
            result[position][char] /= count  # float(count) for Python 2
    return result
# Testing
from pprint import pprint
mcode = ['AAB', 'AA', 'ABC', '']
pprint(freq(*mcode))
Result (Python 3):
[{'A': 1.0, 'B': 0.0, 'C': 0.0},
{'A': 0.6666666666666666, 'B': 0.3333333333333333, 'C': 0.0},
{'A': 0.0, 'B': 0.5, 'C': 0.5}]
In CPython 3.6, dictionaries preserve insertion order (guaranteed by the language from Python 3.7), so the keys come out sorted here; earlier versions can use OrderedDict from collections instead of dict.
Your code isn't efficient at all:
- You first need to define which letters you'd like to count.
- You need to parse the string once for each distinct letter.
You could just use Counter:
import itertools
from collections import Counter
mcode=['AAB', 'AA', 'ABC', '']
all_letters = set(''.join(mcode))
def freq(code):
    code = [letter for letter in code if letter is not None]
    n = len(code)
    counter = Counter(code)
    return {letter: counter[letter]/n for letter in all_letters}
print([freq(x) for x in itertools.zip_longest(*mcode)])
# [{'A': 1.0, 'C': 0.0, 'B': 0.0}, {'A': 0.6666666666666666, 'C': 0.0, 'B': 0.3333333333333333}, {'A': 0.0, 'C': 0.5, 'B': 0.5}]
For Python2, you could use itertools.izip_longest.