Aggregate Monthly Values

I have a Python list containing multiple lists:
A = [['1/1/1999', '3.0'],
['1/2/1999', '4.5'],
['1/3/1999', '6.8'],
......
......
['12/31/1999', '8.7']]
What I need is to group all the values for each month, preferably as a dictionary with months as keys and lists of their values as values.
Example:
>>> A['1/99']
['3.0', '4.5', '6.8'.....]
Or in the form of a list of lists, so that:
Example:
>>> A[0]
['3.0', '4.5', '6.8'.....]
Thanks.

Pandas is perfect for this, if you don't mind another dependency:
For example:
import pandas
import numpy as np
# Generate some data: a random walk with one value per day of 1999
dates = pandas.date_range('1/1/1999', '12/31/1999')
values = (np.random.random(dates.size) - 0.5).cumsum()
df = pandas.DataFrame(values, index=dates)
# Group by the month number of the DatetimeIndex; each group is a sub-DataFrame
for month, group in df.groupby(lambda x: x.month):
    print(month)
    print(group)
The really neat thing, though, is aggregating the grouped DataFrame. For example, to see the min, max, and mean of the values grouped by month:
print(df.groupby(lambda x: x.month).agg(['min', 'max', 'mean']))
This yields something like the following (the data is random, so the numbers differ on every run):
         min       max      mean
1  -0.812627  1.247057  0.328464
2  -0.305878  1.205256  0.472126
3   1.079633  3.862133  2.264204
4   3.237590  5.334907  4.025686
5   3.451399  4.832100  4.303439
6   3.256602  5.294330  4.258759
7   3.761436  5.536992  4.571218
8   3.945722  6.849587  5.513229
9   6.630313  8.420436  7.462198
10  4.414918  7.169939  5.759489
11  5.134333  6.723987  6.139118
12  4.352905  5.854000  5.039873
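If you would rather not add the pandas dependency, the same min/max/mean-per-month aggregation can be sketched with the standard library alone. This is a minimal sketch, assuming the data is the question's list A of ['M/D/YYYY', value-string] pairs (only a few rows are used here). Like the pandas `lambda x: x.month`, it groups by month number only, so the same month from different years would be merged.

```python
from collections import defaultdict
from statistics import mean

# Sample rows in the same shape as the question's list A (assumed, truncated).
A = [['1/1/1999', '3.0'],
     ['1/2/1999', '4.5'],
     ['1/3/1999', '6.8'],
     ['12/31/1999', '8.7']]

# Collect float values per month number (the first field of M/D/YYYY).
by_month = defaultdict(list)
for d, v in A:
    month = int(d.split('/')[0])
    by_month[month].append(float(v))

# The same three statistics as agg(['min', 'max', 'mean']), keyed by month.
stats = {m: (min(vals), max(vals), mean(vals)) for m, vals in sorted(by_month.items())}
for m, (low, high, avg) in stats.items():
    print(m, low, high, avg)
```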

from collections import defaultdict
from datetime import date
month_aggregate = defaultdict(list)
for d, v in A:
    month, day, year = map(int, d.split('/'))
    # Key on the first day of the month; don't rebind the name `date`,
    # or the next call to date(...) fails
    key = date(year, month, 1)
    month_aggregate[key].append(v)
I iterate over each date and value, pull out the year and month, and create a date from them with the day fixed to 1. I then append the value to the list associated with that year and month.
Alternatively, if you want to use a string as the key, you can do:
from collections import defaultdict
month_aggregate = defaultdict(list)
for d, v in A:
    month, day, year = d.split('/')
    # Key like '1/99': the month plus the last two digits of the year
    month_aggregate[month + "/" + year[2:]].append(v)
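Date keys have one practical advantage over strings such as '10/99': they sort chronologically across years, whereas strings sort character by character, so '10/99' would come before '2/99'. A minimal sketch of the date-keyed version with a lookup and a chronological listing, assuming a few sample rows (the year-2000 row is made up for illustration):

```python
from collections import defaultdict
from datetime import date

# Sample rows (assumed); the last one is a hypothetical row from another year.
A = [['1/1/1999', '3.0'],
     ['1/2/1999', '4.5'],
     ['12/31/1999', '8.7'],
     ['2/1/2000', '1.2']]

month_aggregate = defaultdict(list)
for d, v in A:
    month, day, year = map(int, d.split('/'))
    # The day is fixed to 1, so every date in a month maps to the same key.
    month_aggregate[date(year, month, 1)].append(v)

# Look up one month by constructing its key.
print(month_aggregate[date(1999, 1, 1)])

# .get() reads a missing month without inserting an empty list into the defaultdict.
print(month_aggregate.get(date(1999, 3, 1), []))

# date keys sort chronologically, including across years.
for key in sorted(month_aggregate):
    print(key.strftime('%m/%y'), month_aggregate[key])
```

Indexing a defaultdict with a missing key (month_aggregate[some_date]) silently adds an empty list, which is why the lookup for a month with no data uses .get().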

Here is my solution without any imports:
def getKeyValue(lst):
    # 'M/D/YYYY' -> key 'M/YY', paired with the value
    a = lst[0].split('/')
    return '%s/%s' % (a[0], a[2][2:]), lst[1]
def createDict(lst):
    d = {}
    for e in lst:
        k, v = getKeyValue(e)
        if k not in d:
            d[k] = [v]
        else:
            d[k].append(v)
    return d
A = [['1/1/1999', '3.0'],
     ['1/2/1999', '4.5'],
     ['1/3/1999', '6.8'],
     ['12/31/1999', '8.7']]
print(createDict(A))
# {'1/99': ['3.0', '4.5', '6.8'], '12/99': ['8.7']}
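The answers above all produce the dictionary form; the question also mentions a list of lists where A[0] holds the first month's values. One way to sketch that, assuming the same 'M/D/YYYY' strings, is to group by a (year, month) tuple and emit the groups in sorted order. month_key and group_as_lists are hypothetical helper names, not from any answer.

```python
def month_key(d):
    # Hypothetical helper: (year, month) tuple from an 'M/D/YYYY' string.
    month, _day, year = d.split('/')
    return int(year), int(month)

def group_as_lists(rows):
    # Hypothetical helper: one inner list of values per month, oldest month first.
    groups = {}
    for d, v in rows:
        groups.setdefault(month_key(d), []).append(v)
    # (year, month) tuples sort chronologically.
    return [groups[k] for k in sorted(groups)]

A = [['1/1/1999', '3.0'],
     ['1/2/1999', '4.5'],
     ['1/3/1999', '6.8'],
     ['12/31/1999', '8.7']]
B = group_as_lists(A)
print(B[0])
```

Months with no data are simply absent, so B[i] is the i-th month that has data, not necessarily month i + 1.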

Related

Group and sum values of elements of a list, according by month of another time-stamp value

I have this list in Python:
L_initial=[[1,'2019-01-01'], [1,'2019-01-01'],[2,'2019-02-02'],[4,'2019-03-03'],[5,'2019-03-04']]
I want to sum the first value of each element, grouped by the month of the timestamp string, so that the final list looks like this:
L_final=[[2,'2019-01'],[2,'2019-02'],[9,'2019-03']]
I am able to do it by day but not by month. This is my code for summing day by day:
dico = {}
for v, k in L_initial:
    dico[k] = dico.get(k, 0) + v
L_final = [[v, k] for k, v in dico.items()]

Since your dates are in YYYY-MM-DD format, you can simply take the first 7 characters of each date to group by month:
liste_initial = [[1,'2019-01-01'], [1,'2019-01-01'],[2,'2019-02-02'],[4,'2019-03-03'],[5,'2019-03-04']]
dico = {}
for v, k in liste_initial:
    # k[:7] is the 'YYYY-MM' prefix of the date
    dico[k[:7]] = dico.get(k[:7], 0) + v
liste_final = [[v, k] for k, v in dico.items()]
print(liste_final)
Output:
[[2, '2019-01'], [2, '2019-02'], [9, '2019-03']]
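Slicing k[:7] relies on zero-padded YYYY-MM-DD strings. If the timestamps might come in another layout (for example the M/D/YYYY strings from the first question), a sketch that parses each date with datetime.strptime and re-formats it as a year-month key could look like this. sum_by_month and its fmt parameter are illustrative names, not part of the original answer.

```python
from datetime import datetime

liste_initial = [[1, '2019-01-01'], [1, '2019-01-01'], [2, '2019-02-02'],
                 [4, '2019-03-03'], [5, '2019-03-04']]

def sum_by_month(rows, fmt='%Y-%m-%d'):
    # Illustrative helper: rows are [value, date_string] pairs in format fmt.
    totals = {}
    for v, s in rows:
        # Parse the timestamp, then re-format it as a 'YYYY-MM' key.
        key = datetime.strptime(s, fmt).strftime('%Y-%m')
        totals[key] = totals.get(key, 0) + v
    # 'YYYY-MM' keys sort chronologically, even if the input is unordered.
    return [[v, k] for k, v in sorted(totals.items())]

print(sum_by_month(liste_initial))
print(sum_by_month([[3, '1/5/1999'], [4, '1/20/1999']], fmt='%m/%d/%Y'))
```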

How do I keep only min and max values of sublists within a list?

I have a dataframe column as:
df = pd.DataFrame({'a':[1,3,5,7,4,5,6,4,7,8,9]})
The output list I need (keeping the min and max values of each range) is:
[[1,4],[5,7],[8,9]]
Here's how far I got:
import pandas as pd
df = pd.DataFrame({'a':[1,3,5,7,4,5,6,4,7,8,9]})
# Convert df column to a unique value list and sort ascending
us = df['a'].unique().tolist()
us.sort()
lst1 = [int(v) for v in us]
# Create 3 groups of values
lst2 = [lst1[i:i + 3] for i in range(0, len(lst1), 3)]
# Keep only min and max of these groups
How do I convert this:
[[1,3,4],[5,6,7],[8,9]]
into my desired output?

You can use a list comprehension for this:
lst3 = [[min(i), max(i)] for i in lst2]
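Putting the question's steps and this answer together, the whole pipeline can be sketched in plain Python, assuming the column is available as an ordinary list (for example via df['a'].tolist()):

```python
# The DataFrame column as a plain list (assumed input).
values = [1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9]

# Unique values in ascending order.
lst1 = sorted(set(values))
# Chunks of three consecutive unique values.
lst2 = [lst1[i:i + 3] for i in range(0, len(lst1), 3)]
# Keep only the smallest and largest value of each chunk.
lst3 = [[min(chunk), max(chunk)] for chunk in lst2]
print(lst3)
```

Because each chunk is cut from a sorted list, [chunk[0], chunk[-1]] would give the same result as min and max.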

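You can also do it with the DataFrame itself:
import pandas as pd
df = pd.DataFrame({'a': [1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9]})
# Sort, drop repeated values, and renumber the index 0, 1, 2, ...
df = df.sort_values("a").drop_duplicates().reset_index(drop=True)
# index // 3 puts every three consecutive rows into one group
df.groupby(df.index // 3).agg(['min', 'max']).values.tolist()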

Python - sorting a list of numbers based on indexes

I need to create a program with a class that creates "Food" objects and a list called "fridge" that holds these Food objects.
class Food:
    def __init__(self, name, expiration):
        self.name = name
        self.expiration = expiration
fridge = [Food("beer", 4), Food("steak", 1), Food("hamburger", 1), Food("donut", 3)]
That part was not hard. Then I created a function that returns the highest expiration number:
def exp(fridge):
    expList = []
    xen = 0
    for i in range(0, len(fridge)):
        expList.append(fridge[xen].expiration)
        xen += 1
    print(expList)
    sortedList = sorted(expList)
    return sortedList.pop()
exp(fridge)
This one works too. Now I have to create a function that returns a list where each index is an expiration value and the element at that index is the number of foods with that expiration.
The output should look like [0, 2, 0, 1, 1]: the first element, 0, means there is no food with expiration 0; the element at index 1 means there are 2 pieces of food with 1 day left, and so on. I got stuck with too many if statements and can't get this to work at all. How should I approach this? Thanks for the help.

To return it as a list, you first need to find the maximum expiration value in the fridge.
max_expiration = max(food.expiration for food in fridge) + 1  # +1 because 0 is also a possible expiration
exp_list = [0] * max_expiration
for food in fridge:
    exp_list[food.expiration] += 1
print(exp_list)
This prints [0, 2, 0, 1, 1].

You can iterate over the list of Food objects and update a dictionary keyed on expiration, whose values are the number of items with that expiration. You avoid storing zero counts, as a list would, by using a collections.Counter object (a subclass of dict):
from collections import Counter
d = Counter(food.expiration for food in fridge)
# fetch number of food with expiration 0
print(d[0]) # -> 0
# fetch number of food with expiration 1
print(d[1]) # -> 2
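If you still need the list form from the question, a Counter converts to it easily, because looking up a missing key returns 0 instead of raising KeyError (and does not add the key). A minimal sketch, reusing the question's Food class and fridge:

```python
from collections import Counter

class Food:
    def __init__(self, name, expiration):
        self.name = name
        self.expiration = expiration

fridge = [Food("beer", 4), Food("steak", 1), Food("hamburger", 1), Food("donut", 3)]

d = Counter(food.expiration for food in fridge)
# max(d) is the largest expiration; missing expirations count as 0.
exp_list = [d[i] for i in range(max(d) + 1)]
print(exp_list)
```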

You can use itertools.groupby to create a dict where the key is the food's expiration and the value is the number of times it occurs in the list:
>>> from itertools import groupby
>>> fridge = [Food("beer",4), Food("steak",1), Food("hamburger",1), Food("donut",3),]
>>> d = dict((k, len(list(v))) for k, v in groupby(sorted(fridge, key=lambda x: x.expiration), key=lambda x: x.expiration))
Here we tell groupby to group all elements of the list that have the same expiration (note the key argument to groupby). groupby yields pairs (k, group), where k is the group key and group is an iterator over the items in that group. Because groupby only groups consecutive items, the list is sorted by the same key first.
This produces:
>>> d
{1: 2, 3: 1, 4: 1}
At this point we have expiration and number of times a particular expiration occurs in a list, stored in a dict d.
Next we need a list that holds d[x] when x is a key of d and 0 otherwise, for every x from 0 up to the largest key in d:
>>> [d.get(x, 0) for x in range(0, max(d.keys()) + 1)]
This yields the required output:
[0, 2, 0, 1, 1]

Here is a flexible method using collections.defaultdict:
from collections import defaultdict
def ReverseDictionary(input_dict):
    # Map each value to the set of keys that had that value
    reversed_dict = defaultdict(set)
    for k, v in input_dict.items():
        reversed_dict[v].add(k)
    return reversed_dict
fridge_dict = {f.name: f.expiration for f in fridge}
exp_food = ReverseDictionary(fridge_dict)
# defaultdict(<class 'set'>, {4: {'beer'}, 1: {'steak', 'hamburger'}, 3: {'donut'}})
exp_count = {k: len(exp_food.get(k, set())) for k in range(max(exp_food) + 1)}
# {0: 0, 1: 2, 2: 0, 3: 1, 4: 1}

You could also modify your own function to use list.count():
def exp(fridge):
    output = []
    exp_list = [i.expiration for i in fridge]
    for i in range(0, max(exp_list) + 1):
        # Number of foods whose expiration equals i
        output.append(exp_list.count(i))
    return output

How to get an average from a row then make a list out of it [duplicate]

This question already has answers here:
Reading a CSV file, calculating averages and printing said averages
(2 answers)
Closed 5 years ago.
Suppose I have CSV data with two columns:
years grades
2001 98
2001 75
2008 100
2003 57
2008 50
I have more values, but this should show what I am trying to get.
I want to get the average for each year. For instance, for 2001 the answer would be (98 + 75) / 2, since 2001 appears twice.
def construct_values(file_path):
    """
    Parameters
    ----------
    file_path: path to the CSV file
    Returns
    -------
    years: array of integers
    average_grades: array of floats
    """
    years, average_grades = [], []
    grades = []
    with open(file_path, 'r') as filing:
        next(filing)  # skip the header row
        for row in filing:
            year, grade = (s.strip() for s in row.split(','))
            years.append(year)
            grades.append(grade)
    return years, average_grades
What I have done so far only builds two arrays of years and grades. I don't know how to compute the averages and then print them like:
2001, 88.5555 (if 88.5555 were the average for 2001).
Rather than dictionaries, what I want is two arrays that are returned together.

Why not build a dictionary of grades keyed by year:
from collections import defaultdict
grades = defaultdict(list)
with open('grades.csv', 'r') as f:
    next(f)  # skip the header row
    for row in f:
        year, grade = (s.strip() for s in row.split(','))
        grades[year].append(float(grade))
Then print the averages:
for y, g in grades.items():
    print('{}: {}'.format(y, sum(g) / len(g)))

You can use defaultdict to build a dictionary whose keys are years and whose values are lists of grades, appending each grade to its year. After reading the file, data will be a defaultdict(list):
defaultdict(<class 'list'>, {'2001': ['98', '75'], '2008': ['100', '50'], '2003': ['57']})
Then loop over the keys and values to calculate the averages:
from collections import defaultdict
data = defaultdict(list)
average_grade_by_year = dict()
with open('grades.csv', 'r') as filing:
    next(filing)  # skip the header row
    for row in filing:
        year, grade = (s.strip() for s in row.split(','))
        data[year].append(grade)
for k, v in data.items():
    average_grade_by_year[k] = float(sum(int(x) for x in v)) / len(v)
print(average_grade_by_year)
average_grade_by_year will be: {'2001': 86.5, '2008': 75.0, '2003': 57.0}
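Since the question asks for two arrays returned together rather than a dictionary, here is a sketch of construct_values that returns parallel lists sorted by year. It assumes the file is comma-separated with a single header row, as the split(',') in the question implies, and it takes an open file (or any iterable of lines) instead of a path; io.StringIO stands in for grades.csv here.

```python
import csv
import io
from collections import defaultdict

def construct_values(lines):
    """Return (years, average_grades) as two parallel lists sorted by year.

    Assumes `lines` is an iterable of CSV lines with one header row.
    """
    reader = csv.reader(lines)
    next(reader)  # skip the "years,grades" header
    grades_by_year = defaultdict(list)
    for year, grade in reader:
        grades_by_year[int(year)].append(float(grade))
    years = sorted(grades_by_year)
    average_grades = [sum(grades_by_year[y]) / len(grades_by_year[y]) for y in years]
    return years, average_grades

# In-memory stand-in for grades.csv, using the rows from the question.
sample = io.StringIO("years,grades\n2001,98\n2001,75\n2008,100\n2003,57\n2008,50\n")
years, average_grades = construct_values(sample)
for y, avg in zip(years, average_grades):
    print(f"{y}, {avg}")
```

With a real file, call it as `with open('grades.csv') as f: years, average_grades = construct_values(f)`.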

Splitting data - specific case

I'm trying to split some data that is in this form:
['20150406,34.4800,34.8100,34.2300,34.4200,21480500', '20150407,34.5400,34.8900,34.5100,34.6300,14331200']
The first field of each string in the list is a date. I am trying to split the list at a chosen date while keeping each string whole. For example, if my chosen date was 2015-04-07, the above data would split like this:
['20150406,34.4800,34.8100,34.2300,34.4200,21480500']
['20150407,34.5400,34.8900,34.5100,34.6300,14331200']
This also has to work for lists with many strings in the same form.

Use next() and enumerate() to find the position of the string with the desired date, then slice:
>>> d = '20150407'
>>> l = [
... '20150406,34.4800,34.8100,34.2300,34.4200,21480500',
... '20160402,34.1,32.8100,33.2300,31.01,22282510',
... '20150407,34.5400,34.8900,34.5100,34.6300,14331200',
... '20120101,2.540,14.8201,32.00,30.1311,12331230'
... ]
>>> index = next(i for i, item in enumerate(l) if item.startswith(d))
>>> l[:index]
['20150406,34.4800,34.8100,34.2300,34.4200,21480500', '20160402,34.1,32.8100,33.2300,31.01,22282510']
>>> l[index:]
['20150407,34.5400,34.8900,34.5100,34.6300,14331200', '20120101,2.540,14.8201,32.00,30.1311,12331230']
Couple notes:
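next() raises a StopIteration exception if there is no match. You should either handle it with try/except or provide a default value. A default of len(l) keeps the slices meaningful (l[:index] is then the whole list and l[index:] is empty), whereas a default of -1 would make l[:index] silently drop the last item:
next((i for i, item in enumerate(l) if item.startswith(d)), len(l))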
To check whether the date matches, we simply check whether an item starts with the desired date string. If the desired date comes as a date or datetime object, you need to format it first using strftime():
>>> from datetime import datetime
>>> d = datetime(2015, 4, 7)
>>> d = d.strftime("%Y%m%d")
>>> d
'20150407'
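The two notes above can be combined into one small helper. This is a sketch under the same assumptions as the answer (comma-separated rows whose first field is a YYYYMMDD date); split_at_date is a hypothetical name. It compares the whole first field instead of using startswith, so a shortened or malformed date string cannot match a longer one, and it falls back to len(rows) when the date is absent.

```python
from datetime import date

def split_at_date(rows, target):
    # Hypothetical helper: target may be a date/datetime or a 'YYYYMMDD' string.
    if isinstance(target, date):  # datetime is a subclass of date
        target = target.strftime("%Y%m%d")
    # Index of the first row whose first field equals target, else len(rows).
    index = next((i for i, row in enumerate(rows) if row.split(',', 1)[0] == target),
                 len(rows))
    return rows[:index], rows[index:]

l = [
    '20150406,34.4800,34.8100,34.2300,34.4200,21480500',
    '20160402,34.1,32.8100,33.2300,31.01,22282510',
    '20150407,34.5400,34.8900,34.5100,34.6300,14331200',
    '20120101,2.540,14.8201,32.00,30.1311,12331230',
]
before, after = split_at_date(l, date(2015, 4, 7))
print(before)
print(after)

# No match: everything ends up in the first part.
missing_before, missing_after = split_at_date(l, '19990101')
print(len(missing_before), len(missing_after))
```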

I think you want a groupby that separates strings that don't start with the date from ones that do, so that the date delimits the groups:
l = ['20150406,34.4800,34.8100,34.2300,34.4200,21480500', '2015010,34.5400,34.8900,34.5100,34.6300,14331200'
, '20150407,34.5400,34.8900,34.5100,34.6300,14331200']
dte = "2015-04-07"
delim = dte.replace("-","") + ","
from itertools import groupby
print([list(v) for k,v in groupby(l,key=lambda x: not x.startswith(delim))])
[['20150406,34.4800,34.8100,34.2300,34.4200,21480500', '2015010,34.5400,34.8900,34.5100,34.6300,14331200'], ['20150407,34.5400,34.8900,34.5100,34.6300,14331200']]
groupby starts a new group whenever the key changes, so each run of consecutive strings that start with the date forms its own group, and the strings between those runs form the other groups.

Extending alecxe's answer, this code splits the original list into several sublists, starting a new one at every occurrence of the input date:
d = '20150407'
l = [
    '20150406,34.4800,34.8100,34.2300,34.4200,21480500',
    '20160402,34.1,32.8100,33.2300,31.01,22282510',
    '20150407,34.5400,34.8900,34.5100,34.6300,14331200',
    '20120101,2.540,14.8201,32.00,30.1311,12331230',
    '20150407,34.5400,34.8900,34.5100,34.6300,14331200',
]
# Positions of every row that starts with the date
index = [i for i, item in enumerate(l) if item.startswith(d)]
# Slice between consecutive split points: [0, i1), [i1, i2), ..., [ik, end)
print([l[i:j] for i, j in zip([0] + index, index + [None])])
Output:
[['20150406,34.4800,34.8100,34.2300,34.4200,21480500', '20160402,34.1,32.8100,33.2300,31.01,22282510'], ['20150407,34.5400,34.8900,34.5100,34.6300,14331200', '20120101,2.540,14.8201,32.00,30.1311,12331230'], ['20150407,34.5400,34.8900,34.5100,34.6300,14331200']]
