Sum values from different dictionaries based on "temporal instant" - python

I have a dictionary such as the one below.
For each key (A1 and A2) I have a list of purchases made at different places (T1 and T2); each list alternates between a timestep and the amount spent at that timestep. Example: person A1 spent 3.0 at timestep 1 at supermarket T1.
{'A1': {'T1': [1, 3.0, 3, 4.0], 'T2': [2, 2.0]}, 'A2': {'T1': [1, 0.0, 3, 5.0], 'T2': [2, 3.0]}}
What I want to do is sum each sub dictionary, to obtain the total spent at each timestep in each supermarket:
(Each list holds pairs: timestep, then money spent.)
timestep 1: A1 T1 + A2 T1 = 3.0 + 0.0 = 3.0
timestep 2: A1 T2 + A2 T2 = 2.0 + 3.0 = 5.0
timestep 3: A1 T1 + A2 T1 = 4.0 + 5.0 = 9.0
[3.0, 5.0, 9.0] <<<< output
How can I do this? I've tried a for loop, but it turned into a big mess.
Output:
[3.0, 5.0, 9.0]

Here is a quick solution for your problem, but I would suggest storing the timestamped values in a dict, so that timestamps and values are clearly distinguished.
Instead of this:
{"A1": {"T1": [1, 3.0, 3, 4.0]}}
Do this:
{"A1": {"T1": {1: 3.0, 3: 4.0}}}
import json

# "data" instead of "dict", so the built-in dict type is not shadowed
data = json.loads('{"A1": {"T1": [1, 3.0, 3, 4.0], "T2": [2, 2.0]}, "A2": {"T1": [1, 0.0, 3, 5.0], "T2": [2, 3.0]}}')
result = {}
for person, supermarkets in data.items():
    for _, timestamped_values in supermarkets.items():
        # Even positions hold timesteps; the next position holds the amount
        for i in range(0, len(timestamped_values) - 1, 2):
            result.setdefault(timestamped_values[i], []).append(timestamped_values[i + 1])
print(result)
Result is:
{1: [3.0, 0.0], 3: [4.0, 5.0], 2: [2.0, 3.0]}
Just add up the values in each list to get your result. I kept the values separate in case you need to do other operations on the timestamped values.
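The final summing step might look like the sketch below. It assumes `result` is the dictionary printed above; the name `totals` is introduced here only for illustration.

```python
# Grouped values from the loop above: timestep -> amounts spent
result = {1: [3.0, 0.0], 3: [4.0, 5.0], 2: [2.0, 3.0]}

# Sum each group, visiting the timesteps in ascending order
totals = [sum(result[t]) for t in sorted(result)]
print(totals)  # [3.0, 5.0, 9.0]
```

Sorting the keys matters here because the dictionary was filled in the order 1, 3, 2.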

Related

How to print the same map result 5 times in a loop?

Here I have a simple associative array (a dict) of map objects. I want to loop and print arr['b'] 5 times.
number = 0
arr = {}
arr['a'] = map(float, [1, 2, 3])
arr['b'] = map(float, [4, 5, 6])
arr['c'] = map(float, [7, 8, 9])
arr['d'] = map(float, [10, 11, 12])
while number < 5:
    print(list(arr['b']))
    number = number + 1
Why is the output as such, instead of [4.0, 5.0, 6.0] repeating 5 times? How can I loop to get arr['b'] result 5 times?
Output:
[4.0, 5.0, 6.0]
[]
[]
[]
[]
This is the output I really want.
Intended Output:
[4.0, 5.0, 6.0]
[4.0, 5.0, 6.0]
[4.0, 5.0, 6.0]
[4.0, 5.0, 6.0]
[4.0, 5.0, 6.0]
In Python 3, map returns a lazy iterator, which is consumed the first time you iterate over it. The first time you convert it to a list you get the expected result, but the second time the iterator is already exhausted and the resulting list is empty. Simple example:
a = map(float, [1, 2, 3])
print(list(a))
# out: [1.0, 2.0, 3.0]
print(list(a))
# out: []
Convert the map object to a list once (outside the loop!) and you can print it as often as you need: arr['a'] = list(map(float, [1, 2, 3])) etc.
Other improvement: in Python you don't need a manual counter for a loop like this one. To do something 5 times, use range (the _ by convention denotes a value we are not interested in):
for _ in range(5):
    print(list(arr['b']))
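Putting both suggestions together, a minimal sketch could look like this. The `printed` list is not part of the original code; it is added here only so the behaviour can be checked.

```python
# Materialize each map object once, so the data can be read repeatedly
arr = {
    'a': list(map(float, [1, 2, 3])),
    'b': list(map(float, [4, 5, 6])),
}

printed = []  # hypothetical helper: records what each iteration saw
for _ in range(5):
    print(arr['b'])  # arr['b'] is already a list, so no list() call is needed
    printed.append(list(arr['b']))
```

Each of the five iterations prints the same three values, because a list can be iterated any number of times.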

Cumulative sum by category in time series data in Python pandas

I am trying to make this data frame into a dictionary so I can create a plot in matplotlib. My solution is the following, but I wonder if there is a more elegant way.
import datetime as dt
import pandas as pd
today = dt.date.today()
monday = today - dt.timedelta(days=today.weekday(), weeks=1)
date_range = pd.Series(monday + dt.timedelta(days=x) for x in range(5))
date_range1 = pd.DataFrame({"create_date":pd.to_datetime(date_range)})
countries = list(df['country'].unique())
dic = {}
for country in countries:
    lst = df[df.country == country]
    sub = date_range1.merge(lst, on='create_date', how='outer')
    dic[country] = list(sub['frequency'].fillna(0).cumsum())
DataFrame
create_date country frequency
0 2020-08-24 AU 9.0
1 2020-08-24 CN 3.0
2 2020-08-24 FJ 1.0
3 2020-08-25 CN 3.0
4 2020-08-25 ID 2.0
5 2020-08-26 ID 1.0
6 2020-08-27 NaN NaN
Result
{
'AU': [9, 9, 9, 9],
'CN': [3, 6, 6, 6],
'FJ': [1, 1, 1, 1],
'ID': [0, 2, 3, 3]
}
Use DataFrame.pivot (recent pandas versions require keyword arguments here):
df2 = df.pivot(index="create_date", columns="country", values="frequency").fillna(0).cumsum()
df2[df2.columns.dropna()].to_dict("list")
Output:
{'AU': [9.0, 9.0, 9.0, 9.0],
'CN': [3.0, 6.0, 6.0, 6.0],
'FJ': [1.0, 1.0, 1.0, 1.0],
'ID': [0.0, 2.0, 3.0, 3.0]}
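If carrying the row with a missing country through the pivot feels awkward, one possible variant (a sketch, not part of the original answer) drops that row first and restores its date with reindex. The sample frame is reconstructed from the table in the question, and the `all_dates` range and the name `wide` are assumptions made for illustration.

```python
import pandas as pd

# Sample data reconstructed from the question's table
df = pd.DataFrame({
    "create_date": pd.to_datetime([
        "2020-08-24", "2020-08-24", "2020-08-24",
        "2020-08-25", "2020-08-25", "2020-08-26", "2020-08-27",
    ]),
    "country": ["AU", "CN", "FJ", "CN", "ID", "ID", None],
    "frequency": [9.0, 3.0, 1.0, 3.0, 2.0, 1.0, None],
})

# Assumed date range: every day that should appear in the result
all_dates = pd.date_range("2020-08-24", "2020-08-27", freq="D")

wide = (
    df.dropna(subset=["country"])          # drop the row without a country
      .pivot(index="create_date", columns="country", values="frequency")
      .reindex(all_dates)                  # bring back dates with no data
      .fillna(0)
      .cumsum()
)
result = wide.to_dict("list")
print(result)
```

The reindex step is what keeps the last date in the output even though its only row was dropped.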

A more pythonic way to split a column into multiple columns and sum two of them

Sample code:
import pandas as pd
df = pd.DataFrame({'id': [1, 2, 3], 'bbox': [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]})
Goal:
df = pd.DataFrame({'id': [1, 2, 3], 'bbox': [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]], 'x1': [1, 5, 9], 'y1': [2, 6, 10], 'x2': [4, 12, 20], 'y2': [6, 14, 22]})
In words, I want to add four integer columns to the dataframe, where the first two are just the first two elements of each list in bbox, and the last two are respectively the sum of the first and third element of each list, and the sum of the second and fourth one. Currently, I do this:
df[['x1', 'y1', 'w', 'h']] = pd.DataFrame(df['bbox'].values.tolist(), index=df.index).astype(int)
df = df.assign(x2 = df['x1']+df['w'], y2 = df['y1']+df['h'])  # assign returns a new frame
df = df.drop(['w', 'h'], axis = 1)  # drop returns a new frame as well
It seems a bit convoluted to me. Is there a way to avoid creating the intermediate columns w and h, or would that make the code less readable? Readability is a higher priority for me than saving one line of code, so if there are no readable alternatives, I'll settle for this solution.
I think you can create x2 and y2 in first step:
df1 = pd.DataFrame(df['bbox'].values.tolist(),index=df.index).astype(int)
df[['x1', 'y1', 'x2', 'y2']] = df1
df = df.assign(x2 = df['x1']+df['x2'], y2 = df['y1']+df['y2'])
print (df)
id bbox x1 y1 x2 y2
0 1 [1.0, 2.0, 3.0, 4.0] 1 2 4 6
1 2 [5.0, 6.0, 7.0, 8.0] 5 6 12 14
2 3 [9.0, 10.0, 11.0, 12.0] 9 10 20 22
Or use +=:
df1 = pd.DataFrame(df['bbox'].values.tolist(),index=df.index).astype(int)
df[['x1', 'y1', 'x2', 'y2']] = df1
df['x2'] += df['x1']
df['y2'] += df['y1']
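Another option, shown here only as a sketch and not taken from the answers above, is to do the arithmetic on a NumPy array before anything is written back to the frame, so no intermediate w and h columns exist at any point. The name `boxes` is introduced for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'bbox': [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]],
})

# One row per bbox, columns in the order x1, y1, w, h
boxes = np.array(df['bbox'].tolist()).astype(int)

df['x1'] = boxes[:, 0]
df['y1'] = boxes[:, 1]
df['x2'] = boxes[:, 0] + boxes[:, 2]  # x1 + w
df['y2'] = boxes[:, 1] + boxes[:, 3]  # y1 + h
print(df)
```

This assumes every bbox list has exactly four elements; otherwise the array would not be two-dimensional.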

Summing over repeated indices in a dictionary and returning the resulting values [closed]

I have a dictionary:
d = {
'inds': [0, 3, 7, 3, 3, 5, 1],
'vals': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
}
I want to sum the vals by index, adding together the values whose inds are repeated, and output the following:
ind: 0 1 2 3* 4 5 6 7
x == [1.0, 7.0, 0.0, 11.0, 0.0, 6.0, 0.0, 3.0]
I've tried various loops but can't seem to figure it out, and I have no idea where else to begin.
>>> from collections import defaultdict
>>> indices = [0,3,7,3,3,5,1]
>>> vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
>>> d = defaultdict(float)
>>> for i, idx in enumerate(indices):
...     d[idx] += vals[i]
...
>>> print(d)
defaultdict(<class 'float'>, {0: 1.0, 3: 11.0, 7: 3.0, 5: 6.0, 1: 7.0})
>>> x = []
>>> for i in range(max(indices)+1):
...     x.append(d[i])
...
>>> x
[1.0, 7.0, 0.0, 11.0, 0.0, 6.0, 0.0, 3.0]
Using itertools.groupby
>>> import itertools
>>> z = sorted(zip(indices, vals), key=lambda x: x[0])
>>> z
[(0, 1.0), (1, 7.0), (3, 2.0), (3, 4.0), (3, 5.0), (5, 6.0), (7, 3.0)]
>>> for k, g in itertools.groupby(z, key=lambda x: x[0]):
...     print(k, sum(t[1] for t in g))
...
0 1.0
1 7.0
3 11.0
5 6.0
7 3.0
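The groupby output only covers indices that actually occur. To get the full list x from the question, the per-index sums can be collected in a dict and the gaps filled with 0.0. This is a sketch; the names `pairs` and `sums` are introduced here for illustration.

```python
import itertools

indices = [0, 3, 7, 3, 3, 5, 1]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

# groupby only merges adjacent items, so sort by index first
pairs = sorted(zip(indices, vals), key=lambda p: p[0])
sums = {
    k: sum(v for _, v in group)
    for k, group in itertools.groupby(pairs, key=lambda p: p[0])
}

# Fill in 0.0 for indices that never occur
x = [sums.get(i, 0.0) for i in range(max(indices) + 1)]
print(x)  # [1.0, 7.0, 0.0, 11.0, 0.0, 6.0, 0.0, 3.0]
```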
You need x to be a list with one entry for every value i in the range of d['inds'] (min to max), where each entry is the sum of the d['vals'] whose d['inds'] at the same position equals i.
d = {
'inds': [0, 3, 7, 3, 3, 5, 1],
'vals': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
}
result = [sum([val for ind, val in zip(d['inds'], d['vals']) if ind == i])
          for i in range(min(d['inds']), max(d['inds']) + 1)]
print(result)
The output:
[1.0, 7.0, 0, 11.0, 0, 6.0, 0, 3.0]
No libraries required. The list comprehension isn't exactly easy to read, and it rescans all pairs for every i, but for inputs of this size that is fine and it matches the description closely.
A breakdown of the list comprehension into its parts:
for i in range(min(d['inds']), max(d['inds']) + 1) just gets i to range from the smallest value found in d['inds'] to the largest; the + 1 takes into account that range goes up to (but does not include) the second argument passed to it.
zip(d['inds'], d['vals']) pairs up elements from d['inds'] and d['vals'] and the surrounding for ind, val in .. makes these pairs available as ind, val.
[val for ind, val in .. if ind == i] generates a list of val where ind matches the current i
So, all put together, it creates a list that has the sums of those values that have an index that matches some i for each i in the range of the minimum d['inds'] to the maximum d['inds'].
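The breakdown above can also be written out as explicit loops. The sketch below is meant to be equivalent to the comprehension; the name `matching` is introduced here for illustration.

```python
d = {
    'inds': [0, 3, 7, 3, 3, 5, 1],
    'vals': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
}

result = []
# Outer loop: every i from the smallest to the largest index
for i in range(min(d['inds']), max(d['inds']) + 1):
    matching = []
    # Inner loop: collect the values whose index equals i
    for ind, val in zip(d['inds'], d['vals']):
        if ind == i:
            matching.append(val)
    # sum([]) is the integer 0, which is why gaps show up as 0 rather than 0.0
    result.append(sum(matching))

print(result)  # [1.0, 7.0, 0, 11.0, 0, 6.0, 0, 3.0]
```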

pandas dataframe how to do "elementwise" concatenation?

I have two pandas dataframes A,B with identical shape, index and column. Each element of A is a np.ndarray with shape (n,1), and each element of B is a float value. Now I want to efficiently append B elementwise to A. A minimal example:
index = ['fst', 'scd']
column = ['a','b']
A
Out[23]:
a b
fst [1, 2] [1, 4]
scd [3, 4] [3, 2]
B
Out[24]:
a b
fst 0.392414 0.641136
scd 0.264117 1.644251
resulting_df = pd.DataFrame([[np.append(A.loc[i,j], B.loc[i,j]) for i in index] for j in column], columns=column, index=index)
resulting_df
Out[27]:
a b
fst [1.0, 2.0, 0.392414377685] [3.0, 4.0, 0.264117463613]
scd [1.0, 4.0, 0.641136433253] [3.0, 2.0, 1.64425062851]
Is there something similar to pd.DataFrame.applymap that can operate elementwise between two instead of just one pandas dataframe?
You can convert the elements of B to lists using applymap and then use ordinary addition to combine the lists, i.e.
index = ['fst', 'scd']
column = ['a','b']
A = pd.DataFrame([[[1, 2],[1, 4]],[[3, 4],[3, 2]]],index,column)
B = pd.DataFrame([[0.392414,0.264117],[ 0.641136 , 1.644251]],index,column)
Option 1 :
n = B.applymap(lambda y: [y])
ndf = A.apply(lambda x : x+n[x.name])
Option 2 :
Use pd.concat followed by groupby, i.e.
pd.concat([A,B]).groupby(level=0).apply(lambda g: pd.Series({i: np.hstack(g[i].values) for i in A.columns}))
To make your current method give the correct output, swap the loops, i.e.
pd.DataFrame([[np.append(A.loc[i,j], B.loc[i,j]) for j in A.columns] for i in A.index], columns=A.columns, index=A.index)
Output:
a b
fst [1.0, 2.0, 0.392414] [1.0, 4.0, 0.264117]
scd [3.0, 4.0, 0.641136] [3.0, 2.0, 1.644251]
You can simply do this:
>>> A + B.applymap(lambda x : [x])
a b
fst [1, 2, 0.392414] [1, 4, 0.264117]
scd [3, 4, 0.641136] [3, 2, 1.644251]
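For reference, the same result can be built without applymap, which pandas 2.1 deprecated in favor of DataFrame.map. The sketch below uses plain Python lists per column and assumes A and B share the same index order and columns; the name `combined` is introduced for illustration.

```python
import pandas as pd

index = ['fst', 'scd']
A = pd.DataFrame({'a': [[1, 2], [3, 4]], 'b': [[1, 4], [3, 2]]}, index=index)
B = pd.DataFrame({'a': [0.392414, 0.641136], 'b': [0.264117, 1.644251]}, index=index)

# For each column, pair up the cells of A and B row by row and append
combined = pd.DataFrame(
    {col: [list(a) + [b] for a, b in zip(A[col], B[col])] for col in A.columns},
    index=A.index,
)
print(combined)
```

Because the pairing is positional, both frames need their rows in the same order; align them first (for example with B.reindex(A.index)) if that is not guaranteed.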
