Is it possible to rename/alter all the keys of a dict? As an example, let's look at the following dictionary:
a_dict = {'a_var1': 0.05,
'a_var2': 4.0,
'a_var3': 100.0,
'a_var4': 0.3}
I want to remove all the a_ in the keys, so I end up with
a_dict = {'var1': 0.05,
'var2': 4.0,
'var3': 100.0,
'var4': 0.3}
If you want to alter the existing dict instead of creating a new one, you can loop over the keys, pop each old key, and insert the new, modified key with the old value.
>>> for k in list(a_dict):
...     a_dict[k[2:]] = a_dict.pop(k)
...
>>> a_dict
{'var2': 4.0, 'var1': 0.05, 'var3': 100.0, 'var4': 0.3}
(Iterating over list(a_dict), which is a snapshot of the keys, prevents errors caused by modifying the dict while iterating over it.)
Strictly speaking, this, too, does not alter the existing keys, but inserts new keys, as it has to re-insert them according to their new hash codes. But it does alter the dictionary as a whole.
As noted in comments, updating the keys in the dict in a loop can in fact be slower than a dict comprehension. If this is a problem, you could also create a new dict using a dict comprehension, and then clear the existing dict and update it with the new values.
>>> b_dict = {k[2:]: a_dict[k] for k in a_dict}
>>> a_dict.clear()
>>> a_dict.update(b_dict)
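To illustrate why the in-place variants above matter, here is a minimal sketch. The helper name `strip_prefix_in_place` and the variable `alias` are made up for this illustration and are not part of the original answer. It shows that a second reference to the dict sees the renamed keys only because the original object is mutated rather than rebound.

```python
def strip_prefix_in_place(d, n):
    """Drop the first n characters of every key, mutating d itself."""
    # Build the renamed mapping first, then swap the contents in one go.
    renamed = {k[n:]: v for k, v in d.items()}
    d.clear()
    d.update(renamed)


a_dict = {'a_var1': 0.05, 'a_var2': 4.0, 'a_var3': 100.0, 'a_var4': 0.3}
alias = a_dict  # second reference to the same dict object

strip_prefix_in_place(a_dict, 2)
print(alias)  # alias sees the new keys because the object was mutated

rebound = {k.upper(): v for k, v in a_dict.items()}  # a brand-new dict
print(rebound is a_dict)  # False: a comprehension never touches the original
```

If nothing else holds a reference to the dict, simply rebinding the name to the result of a comprehension is usually the simpler choice.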
You can use:
{k[2:]: v for k, v in a_dict.items()}
You can do that easily enough with a dict comprehension.
a_dict = {'a_var1': 0.05,
'a_var2': 4.0,
'a_var3': 100.0,
'a_var4': 0.3}
a_dict = { k[2:]:v for k,v in a_dict.items() }
Result:
{'var1': 0.05, 'var2': 4.0, 'var3': 100.0, 'var4': 0.3}
You could use str.replace to rewrite each key into the desired format. Note that it replaces every occurrence of 'a_' in the key, not only a leading one.
a_dict = {'a_var1': 0.05,
'a_var2': 4.0,
'a_var3': 100.0,
'a_var4': 0.3}
a_dict = {k.replace('a_', ''): v for k, v in a_dict.items()}
# {'var1': 0.05, 'var2': 4.0, 'var3': 100.0, 'var4': 0.3}
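The slicing and str.replace approaches above both assume that every key starts with 'a_' and contains it nowhere else. A hedged sketch of a stricter variant follows; the helper `strip_prefix` and the extra sample keys are hypothetical and only serve to show the difference.

```python
def strip_prefix(key, prefix='a_'):
    """Return key without a leading prefix; leave other keys untouched."""
    if key.startswith(prefix):
        return key[len(prefix):]
    return key


# Sample data invented for this illustration.
a_dict = {'a_var1': 0.05, 'a_data_var': 4.0, 'other': 1.0}

by_replace = {k.replace('a_', ''): v for k, v in a_dict.items()}
by_prefix = {strip_prefix(k): v for k, v in a_dict.items()}

print(by_replace)  # {'var1': 0.05, 'datvar': 4.0, 'other': 1.0}
print(by_prefix)   # {'var1': 0.05, 'data_var': 4.0, 'other': 1.0}
```

On Python 3.9 and later, the built-in `str.removeprefix` does the same job as the helper.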
How can I use dictionary comprehension for values that are literals or list
Right now I can iterate through a nested dictionary and get a dict with the nested values converted, but I would also like the output to include values that are lists or literals (int, str), not only dicts.
Here is my example (I know that isinstance is not needed here):
nested_dict = {'first':{'a':1}, 'second':{'b':2}, 'third': 3, 'fourth': [1, 2, 3, 4]}
float_dict = {
    outer_k: { float(inner_v)
        for (inner_k, inner_v) in outer_v.items()}
    for (outer_k, outer_v) in nested_dict.items()
    if isinstance(outer_v, dict)
}
print(float_dict)
Expected output:
{'first': {'a': 1.0}, 'second': {'b': 2.0}, 'third': 3.0, 'fourth': [1.0, 2.0, 3.0, 4.0]}
It's not (reasonably) possible with a single comprehension. You want a recursive function like this:
def floatify(v):
    if isinstance(v, list):
        return list(map(floatify, v))
    if isinstance(v, dict):
        return {k: floatify(_v) for k, _v in v.items()}
    return float(v)
>>> floatify(nested_dict)
{'first': {'a': 1.0}, 'second': {'b': 2.0}, 'third': 3.0, 'fourth': [1.0, 2.0, 3.0, 4.0]}
Note that you can make this function even more generic:
def anyify(v, f):
    if isinstance(v, list):
        return [anyify(_v, f) for _v in v]
    if isinstance(v, dict):
        return {k: anyify(_v, f) for k, _v in v.items()}
    return f(v)
anyify(nested_dict, float)
Or, without recursion, you can do it in a one-liner, as long as the data is nested at most one level deep:
{outer_k: ({inner_k: float(inner_v) for (inner_k, inner_v) in outer_v.items()} if isinstance(outer_v, dict) else ([float(i) for i in outer_v] if isinstance(outer_v, list) else float(outer_v))) for (outer_k, outer_v) in nested_dict.items()}
Ex:
nested_dict = {'first':{'a':1}, 'second':{'b':2}, 'third': 3, 'fourth': [1, 2, 3, 4]}
float_dict = {outer_k: ({inner_k: float(inner_v) for (inner_k, inner_v) in outer_v.items()} if isinstance(outer_v, dict) else ([float(i) for i in outer_v] if isinstance(outer_v, list) else float(outer_v))) for (outer_k, outer_v) in nested_dict.items()}
print(float_dict)
Output:
{'first': {'a': 1.0}, 'second': {'b': 2.0}, 'third': 3.0, 'fourth': [1.0, 2.0, 3.0, 4.0]}
Here is a solution that uses a queue. Because dicts and lists are mutable, we can enqueue those objects (including inner ones, however deeply nested) and still update the source data through them. The loop traverses and enqueues each inner container of the data. If an element is an int, it is converted to a float.
import copy
nested_dict = {'first':{'a':1}, 'second':{'b':2}, 'third': 3, 'fourth': [4, {"fifth": 5}, 6, [{"sixth": [7, 8]}, 9], 10]}
float_dict = copy.deepcopy(nested_dict)
queue = [float_dict]
while queue:
    data = queue.pop()
    items = data.items() if isinstance(data, dict) else enumerate(data)
    for key, value in items:
        if isinstance(value, int):
            data[key] = float(value)
        elif isinstance(value, (dict, list)):
            queue.append(value)
print(float_dict)
Output
{'first': {'a': 1.0}, 'second': {'b': 2.0}, 'third': 3.0, 'fourth': [4.0, {'fifth': 5.0}, 6.0, [{'sixth': [7.0, 8.0]}, 9.0], 10.0]}
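The same traversal can be wrapped in a reusable function. The sketch below is one possible packaging, not the answer's own code. The name `floatify_iterative` is made up, and the `bool` guard is an addition: since `bool` is a subclass of `int`, a plain `isinstance(value, int)` test would also turn `True` into `1.0`.

```python
import copy


def floatify_iterative(data):
    """Return a deep copy of data with every int (but not bool) as a float."""
    result = copy.deepcopy(data)  # leave the caller's structure untouched
    pending = [result]            # containers still to be visited
    while pending:
        container = pending.pop()
        if isinstance(container, dict):
            items = list(container.items())
        else:
            items = list(enumerate(container))
        for key, value in items:
            if isinstance(value, (dict, list)):
                pending.append(value)
            elif isinstance(value, int) and not isinstance(value, bool):
                container[key] = float(value)
    return result


# Sample data invented for this illustration.
nested = {'first': {'a': 1}, 'flag': True, 'fourth': [4, {'fifth': 5}, [6, 7]]}
converted = floatify_iterative(nested)
print(converted)
# {'first': {'a': 1.0}, 'flag': True, 'fourth': [4.0, {'fifth': 5.0}, [6.0, 7.0]]}
```

Because `list.pop()` takes from the end, the container list behaves as a stack, so the traversal is depth-first. The visiting order does not affect the result.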
When I pass a list of strings into this function, I want it to return a matrix saying how often each unique word appears in each string. Instead, I get a matrix with the values for the first string repeated 4 times.
This is the code:
def tf(corp):
    words_set = set()
    for i in corp:
        a=i.split(' ')
        for j in a:
            words_set.add(j)
    words_dict = {i:0 for i in words_set}
    wcount=0
    matr=list()
    for doc in corp:
        for worduni in words_dict:
            count=0
            for words in doc.split(' '):
                if words==worduni:
                    count+=1
            words_dict[worduni]=count/len(doc.split(' '))
        print(words_dict)
        matr.append(words_dict)
    return matr
When I print the value of matr, I get:
[{'the': 0.2,
'first': 0.2,
'document': 0.2,
'third': 0.0,
'is': 0.2,
'one': 0.0,
'and': 0.0,
'this': 0.2,
'second': 0.0},
{'the': 0.2,
'first': 0.2,
'document': 0.2,
'third': 0.0,
'is': 0.2,
'one': 0.0,
'and': 0.0,
'this': 0.2,
'second': 0.0},
{'the': 0.2,
'first': 0.2,
'document': 0.2,
'third': 0.0,
'is': 0.2,
'one': 0.0,
'and': 0.0,
'this': 0.2,
'second': 0.0},
{'the': 0.2,
'first': 0.2,
'document': 0.2,
'third': 0.0,
'is': 0.2,
'one': 0.0,
'and': 0.0,
'this': 0.2,
'second': 0.0}]
What your code is doing is repeatedly adding the same object (words_dict) to matr. Since matr is a list, it can handle this, and you end up with multiple references to the same dictionary. Meanwhile, you keep updating that dictionary. So what you see when you print the list is the final state of the dictionary, N times.
Now I suspect that you intended to save snapshots of the state of words_dict in matr. If that is what you want to do, you need to save copies of words_dict in matr, e.g.
matr.append(words_dict.copy())
On the other hand, if you intend to generate a separate word frequency dictionary for each doc in corp, then you need to move the creation and initialization of words_dict inside the outer loop.
Separately from the above, the way you are counting the words and computing the frequency seems to be completely wrong. I am assuming that is what you are trying to do here.
Note: if you use more meaningful method and variable names and/or add appropriate comments to your code, it will be easier for other people to understand what your code is intended to do.
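The second option, building a fresh dictionary for each document, could look roughly like the sketch below. The function name `term_frequencies`, the variable names, and the two-document sample corpus are assumptions made for illustration. It treats "frequency" as the word count divided by the number of space-separated tokens, as the question's code does.

```python
def term_frequencies(corpus):
    """Return one {word: relative frequency} dict per document."""
    vocabulary = set()
    for doc in corpus:
        vocabulary.update(doc.split(' '))

    matrix = []
    for doc in corpus:
        tokens = doc.split(' ')
        # A fresh dict per document, so earlier rows are never overwritten.
        row = {word: tokens.count(word) / len(tokens) for word in vocabulary}
        matrix.append(row)
    return matrix


# Sample corpus invented for this illustration.
corpus = ['this is the first document', 'and this is the third one']
rows = term_frequencies(corpus)
print(rows[0]['first'], rows[1]['first'])  # 0.2 0.0
```

Each element of the returned list is a distinct object, so printing the list shows one row per document instead of the same dictionary repeated.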
I modified your function so that it returns non-duplicated data matching what your print calls show:
def tf(corp):
    words_set = set()
    for i in corp:
        a=i.split(' ')
        for j in a:
            words_set.add(j)
    words_dict = {i:0 for i in words_set}
    wcount=0
    matr=list()
    for doc in corp:
        for worduni in words_dict:
            count=0
            for words in doc.split(' '):
                if words==worduni:
                    count+=1
            words_dict[worduni]=count/len(doc.split(' '))
        print(words_dict)
        matr.append(words_dict.copy())
    return matr
Starting from this dataframe
import pandas as pd
df2 = pd.DataFrame({'t': ['a', 'a', 'a', 'b', 'b', 'b'],
'x': [1.1, 2.2, 3.3, 1.1, 2.2, 3.3],
'y': [1.0, 2.0, 3.0, 2.0, 3.0, 4.0]})
it's possible to simplify these nested for loops:
for t, df in df2.groupby('t'):
    print("t:", t)
    for d in df.to_dict(orient='records'):
        print({'x': d['x'], 'y': d['y']})
by separating the inner loop into a function:
def handle(df):
    for d in df.to_dict(orient='records'):
        print({'x': d['x'], 'y': d['y']})

for t, df in df2.groupby('t'):
    print("t:", t)
    handle(df)
How might I similarly separate a nested comprehension:
mydict = {
    t: [{'x': d['x'], 'y': d['y']} for d in df.to_dict(orient='records')]
    for t, df in df2.groupby(['t'])
}
into two separate pieces?
I'm asking the question with just two levels of nesting, yet with just two nested loops the need is hardly critical. The motivations are:
By the time there are a few levels, the code becomes tough to read.
Developing and testing smaller blocks guards against (present and future) mistakes at more than the outer level.
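One way to approach this, sketched here without pandas so that it runs on its own, is to move the inner comprehension into a named function and keep only the outer comprehension at the top level. The function names `pick_xy` and `build_mapping` and the hand-written `groups` list are assumptions for illustration. With the dataframe above, the pairs could instead be produced by `((t, df.to_dict(orient='records')) for t, df in df2.groupby('t'))`.

```python
def pick_xy(records):
    """Inner level: keep only the 'x' and 'y' fields of each record."""
    return [{'x': d['x'], 'y': d['y']} for d in records]


def build_mapping(groups):
    """Outer level: one entry per (key, records) pair."""
    return {t: pick_xy(records) for t, records in groups}


# Hand-written stand-in for the grouped records (not real query output).
groups = [
    ('a', [{'t': 'a', 'x': 1.1, 'y': 1.0}, {'t': 'a', 'x': 2.2, 'y': 2.0}]),
    ('b', [{'t': 'b', 'x': 1.1, 'y': 2.0}]),
]
mydict = build_mapping(groups)
print(mydict)
# {'a': [{'x': 1.1, 'y': 1.0}, {'x': 2.2, 'y': 2.0}], 'b': [{'x': 1.1, 'y': 2.0}]}
```

Each level can then be tested in isolation, for example by calling `pick_xy` on a single hand-built list of records.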
I have a list from an MSSQL query that contains Decimals, such as:
[(1, Decimal('33.00'), Decimal('5.30'), Decimal('50.00')),
(2, Decimal('17.00'), Decimal('0.50'), Decimal('10.00'))]
I want to transform it into a dict with float values, like this:
{1: [33.00, 5.30, 50.00],
2: [17.00, 0.50, 10.00]}
I wrote the following line:
load_dict = {key: values for key, *values in dataRead}
which results in:
{1: [Decimal('33.00'), Decimal('105.30'), Decimal('25650.00')],
2: [Decimal('17.00'), Decimal('40.50'), Decimal('10000.00')]}
Is there any way to do this transformation with a list/dict comprehension?
You could use a dict comprehension with a cast to float, like this:
from decimal import Decimal
lst = [(1, Decimal('33.00'), Decimal('5.30'), Decimal('50.00')),
(2, Decimal('17.00'), Decimal('0.50'), Decimal('10.00'))]
ret = {key: [float(f) for f in values] for key, *values in lst}
print(ret)
# {1: [33.0, 5.3, 50.0], 2: [17.0, 0.5, 10.0]}
Apply float to values:
from decimal import Decimal
data = [(1, Decimal('33.00'), Decimal('5.30'), Decimal('50.00')),
(2, Decimal('17.00'), Decimal('0.50'), Decimal('10.00'))]
load_dict = {key: list(map(float, values)) for key, *values in data}
print(load_dict)
Output
{1: [33.0, 5.3, 50.0], 2: [17.0, 0.5, 10.0]}
I have two multi-index dataframes: mean and std
import numpy as np
import pandas as pd
arrays = [['A', 'A', 'B', 'B'], ['Z', 'Y', 'X', 'W']]
mean=pd.DataFrame(data={0.0:[np.nan,2.0,3.0,4.0], 60.0: [5.0,np.nan,7.0,8.0], 120.0:[9.0,10.0,np.nan,12.0]},
index=pd.MultiIndex.from_arrays(arrays, names=('id', 'comp')))
mean.columns.name='Times'
std=pd.DataFrame(data={0.0:[10.0,10.0,10.0,10.0], 60.0: [10.0,10.0,10.0,10.0], 120.0:[10.0,10.0,10.0,10.0]},
index=pd.MultiIndex.from_arrays(arrays, names=('id', 'comp')))
std.columns.name='Times'
My task is to combine them into a dictionary keyed by id at the first level, with a second-level dictionary keyed by comp, and then for each comp a list of tuples of the form (time point, mean, std). So the result should look like this:
{'A': {
'Z': [(60.0,5.0,10.0),
(120.0,9.0,10.0)],
'Y': [(0.0,2.0,10.0),
(120.0,10.0,10.0)]
},
'B': {
'X': [(0.0,3.0,10.0),
(60.0,7.0,10.0)],
'W': [(0.0,4.0,10.0),
(60.0,8.0,10.0),
(120.0,12.0,10.0)]
}
}
Additionally, when there is a NaN in the data, the triplet is left out: here that applies to A,Z at time 0, A,Y at time 60, and B,X at time 120.
How do I get there? I have already constructed a dict of dict of list of tuples for a single row:
iter=0
{mean.index[iter][0]:{mean.index[iter][1]:list(zip(mean.columns, mean.iloc[iter], std.iloc[iter]))}}
>{'A': {'Z': [(0.0, 1.0, 10.0), (60.0, 5.0, 10.0), (120.0, 9.0, 10.0)]}}
Now I need to extend this to a full dictionary by looping over each row (inner dict) and adding the ids (outer dict). I started with iterrows() and a dict comprehension, but I have problems indexing with the tuple ('A', 'Z') that I get from iterrows(), and building the whole dict iteratively.
{mean.index[iter[1]]:list(zip(mean.columns, mean.loc[iter[1]], std.loc[iter[1]])) for (iter,row) in mean.iterrows()}
raises an error, and even if it worked I would only have the inner level:
KeyError: 'the label [Z] is not in the [index]'
Thanks!
EDIT: I exchanged the numbers to float in this example, because here integers were generated before which was not consistent with my real data, and which would fail in following json dump.
Here is a solution using a defaultdict:
from collections import defaultdict
mean_as_dict = mean.to_dict(orient='index')
std_as_dict = std.to_dict(orient='index')
mean_clean_sorted = {k: sorted([(i, j) for i, j in v.items()]) for k, v in mean_as_dict.items()}
std_clean_sorted = {k: sorted([(i, j) for i, j in v.items()]) for k, v in std_as_dict.items()}
sol = {k: [j + (std_clean_sorted[k][i][1],) for i, j in enumerate(v) if not np.isnan(j[1])] for k, v in mean_clean_sorted.items()}
solution = defaultdict(dict)
for k, v in sol.items():
    solution[k[0]][k[1]] = v
Resulting dict will be defaultdict object that you can change to dict easily:
solution = dict(solution)
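The regrouping step at the end, turning tuple keys into two nesting levels, can be tried in isolation. This is a minimal sketch with a hand-written `flat` dict (values taken from the expected result in the question) standing in for `sol`. The variable names are hypothetical.

```python
from collections import defaultdict

# Stand-in for the tuple-keyed intermediate result.
flat = {
    ('A', 'Z'): [(60.0, 5.0, 10.0), (120.0, 9.0, 10.0)],
    ('A', 'Y'): [(0.0, 2.0, 10.0), (120.0, 10.0, 10.0)],
    ('B', 'X'): [(0.0, 3.0, 10.0), (60.0, 7.0, 10.0)],
}

nested = defaultdict(dict)
for (outer, inner), triples in flat.items():
    # defaultdict creates the inner dict the first time an outer key appears.
    nested[outer][inner] = triples

nested = dict(nested)  # plain dict, e.g. before a JSON dump
print(nested['A']['Z'])  # [(60.0, 5.0, 10.0), (120.0, 9.0, 10.0)]
```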
con = pd.concat([mean, std])
primary = dict()
for i in set(con.index.values):
    if i[0] not in primary.keys():
        primary[i[0]] = dict()
    primary[i[0]][i[1]] = list()
    for x in con.columns:
        # Each (id, comp) pair occurs twice in con: once from mean, once from std.
        primary[i[0]][i[1]].append((x, tuple(con.loc[i[0]].loc[i[1]][x].values)))
I found a comprehension-based way of building this nested dict:
mean_dict_items=mean.to_dict(orient='index').items()
{k[0]:{u[1]:list(zip(mean.columns, mean.loc[u], std.loc[u]))
for u,v in mean_dict_items if (k[0],u[1]) == u} for k,l in mean_dict_items}
creates:
{'A': {'Y': [(0.0, 2.0, 10.0), (60.0, nan, 10.0), (120.0, 10.0, 10.0)],
'Z': [(0.0, nan, 10.0), (60.0, 5.0, 10.0), (120.0, 9.0, 10.0)]},
'B': {'W': [(0.0, 4.0, 10.0), (60.0, 8.0, 10.0), (120.0, 12.0, 10.0)],
'X': [(0.0, 3.0, 10.0), (60.0, 7.0, 10.0), (120.0, nan, 10.0)]}}
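The result above still contains the NaN triplets that the question wants to leave out. A hedged sketch of a post-processing step follows, written in plain Python so it runs without pandas. The function name `drop_nan_triples` is made up, and the input literal simply reproduces the output shown above.

```python
import math

nan = float('nan')
with_nans = {
    'A': {'Y': [(0.0, 2.0, 10.0), (60.0, nan, 10.0), (120.0, 10.0, 10.0)],
          'Z': [(0.0, nan, 10.0), (60.0, 5.0, 10.0), (120.0, 9.0, 10.0)]},
    'B': {'W': [(0.0, 4.0, 10.0), (60.0, 8.0, 10.0), (120.0, 12.0, 10.0)],
          'X': [(0.0, 3.0, 10.0), (60.0, 7.0, 10.0), (120.0, nan, 10.0)]},
}


def drop_nan_triples(nested):
    """Return a new nested dict without triples whose mean (index 1) is NaN."""
    return {
        outer: {
            inner: [t for t in triples if not math.isnan(t[1])]
            for inner, triples in comps.items()
        }
        for outer, comps in nested.items()
    }


cleaned = drop_nan_triples(with_nans)
print(cleaned['A']['Z'])  # [(60.0, 5.0, 10.0), (120.0, 9.0, 10.0)]
```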