Python dictionary merge up two dictionaries and building cumulative

Python dictionary merge up two dictionaries and building cumulative - python

I have two arrays date_IN and date_OUT that contain the dates when animals come in and leave the farm. Now I want to plot the total population over time.
date_IN, date_OUT
26.09.1999,19.12.2006
26.09.1999,19.01.2005
26.09.1999,15.02.2007
26.09.1999,29.03.2006
...
I tried to first count the entries for each day, subtract the number of animals that leave the farm from the number of animals that come in and then sum up the sorted values.
But unfortunately the subtraction doesn't work.
date_EIN, date_AUS=np.genfromtxt("Gesamtbestand.txt",delimiter=',',unpack = True, converters={ 0: mdates.strpdate2num('%d.%m.%Y'), 1: mdates.strpdate2num('%d.%m.%Y') or 0})
c = Counter(date_EIN)
d = Counter(date_AUS)
tn_each_day = c - d
sorted_keys = sorted(tn_each_day,key=tn_each_day.get)
z = cumsum(sorted(d.values())) # or z = cumsum([d[k] for k in sorted_keys])
tn = dict(zip(sorted_keys,z))
Does anyone have an idea how to fix this?

For brevity, let's use numbers instead of dates:
>>> from collections import Counter
>>> import itertools as it
>>> import operator as op
>>> ein = Counter([1,2,2,3,3])
>>> aus = Counter([1,2,3,4])
>>> delta = {k:ein.get(k,0)-aus.get(k,0) for k in set(it.chain(ein,aus))}
>>> delta
{1: 0, 2: 1, 3: 1, 4: -1}
>>> sorted_dates = sorted(delta)
>>> population = dict(zip(sorted_dates, it.accumulate((delta[k] for k in sorted_dates), add)))
>>> population
{1: 0, 2: 1, 3: 2, 4: 1}
i.e. for each date, population holds the number of animals present in the farm.
e.g.
on date 1 one animal went in and one went out -> 0 farm population
on date 2 two animals went in and one went out -> 1 farm population
on date 3 two animals went in and one went out -> 2 farm population
on date 4 zero animals went in and one went out -> 1 farm population

Subtracting a Counter from another Counter just removes the keys that exist in both Counters from the first Counter. This is not what you are trying to do. Here's a working example:
from collections import Counter
def compareDates(d1, d2):
d1, d2 = d1.split('.'), d2.split('.')
for i in range(2,-1,-1):
if d1[i] > d2[i]:
return 1
if d1[i] < d2[i]:
return -1
return 0
IN = ['26.09.1999', '26.09.1999', '26.09.1999', '26.09.1999', '26.8.2008']
OUT = ['19.12.2006', '19.01.2005', '15.02.2007', '29.03.2006', '27.8.2008']
c_IN = Counter(IN)
c_OUT = Counter(OUT)
tn_each_day = {}
for date, count in c_IN.items():
if date not in tn_each_day :
tn_each_day[date] = 0
tn_each_day[date] += count
for date, count in c_OUT.items():
if date not in tn_each_day :
tn_each_day [date] = 0
tn_each_day[date] -= count
cumulative = {}
population = 0
for date in sorted(tn_each_day, cmp=compareDates):
population += tn_each_day[date]
cumulative[date] = population
print '{}: {}'.format(date, population)
This produces an output of:
26.09.1999: 4
19.01.2005: 3
29.03.2006: 2
19.12.2006: 1
15.02.2007: 0
26.8.2008: 1
27.8.2008: 0

Related

(Python) pandas.DataFrame doesn't update value each for-loop cycle, why?

I am new to Pandas DataFrame and was curious why my basic thinking of adding new values to a new line doesn't work here.
I also tried using different ways with .loc[], .append(), but obciously used them in an incorrect way (still plenty to learn).
Instructions
Add a column to data named length, defined as the length of each word.
Add another column named frequency, which is defined as follows for each word in data:
If count > 10, frequency is "frequent".
If 1 < count <= 10, frequency
is "infrequent".
If count == 1, frequency is "unique".
My if sentenses record for all DataFrame only by last value of dictionary like object (Counter from pandas/numpy?). Word and count values are all returned within for cycle, so I don't understand why DataFrame cannot append values each cycle
data['length'] = ''
data['frequency'] = ''
for word, count in counted_text.items():
if count > 10:
data.length = len(word)
data.frequency = 'frequent'
if 1 < count <=10:
data.length = len(word)
data.frequency = 'infrequent'
if count == 1:
data.length = len(word)
data.frequency = 'unique'
print(word, len(word), '\n')
"""
This is working code that I googled
-----------------------------------
data = pd.DataFrame({
"word": list(counted_text.keys()),
"count": list(counted_text.values())
})
data["length"] = data["word"].apply(len)
data.loc[data["count"] > 10, "frequency"] = "frequent"
data.loc[data["count"] <= 10, "frequency"] = "infrequent"
data.loc[data["count"] == 1, "frequency"] = "unique"
"""
print(data.head(), '\n')
print(data.tail())
Output:
finis 5
word count length frequency
1 the 935 5 unique
2 tragedie 3 5 unique
3 of 576 5 unique
4 hamlet 97 5 unique
5 45513 5 unique
word count length frequency
5109 shooteexeunt 1 5 unique
5110 marching 1 5 unique
5111 peale 1 5 unique
5112 ord 1 5 unique
5113 finis 1 5 unique

Assuming you have only word and count in the data dataframe and that count will not have a value of 0, you could try the following -
import numpy as np
data['length'] = data['word'].str.len()
data['frequency'] = np.where(data['count'] > 10, 'frequent',\
np.where((data['count'] > 1) & (data['count'] <= 10),\
'infrequent', 'unique'))

After #Sajan gave a valid code, I came to a conclusion, that DataFrame doesn't need for-loop at all.

List manipulation for a Conditional Round-Robin Rugby Draw

One of the Rugby Coaches at my school have asked me to code a conditional rugby match draw for the upcoming games with the task laid out something like this: Given a list of teams from 1 - 12 split into 3 groups ([Group1 = 1, 2, 3, 4], [Group2 = 5, 6, 7, 8,], [Group3 = 9, 10, 11, 12])
generate and print an 11 round-robin matchup with the conditions that:
Teams in Group1 does NOT verse teams in Group3
Teams in Group1 verses every other team in Group 1 twice (Eg. 1v2, 2v1, 1v3, 3v1, 1v4, 4v1, 1v5, 5v1.....)
This same rule applies to teams in Group3 as they verse other teams in Group3
Teams in Group2 verse every other team once.
Teams in Group1 and Group3 need one Bye Game.
I have attempted multiple times but inevitably become stuck, below are my 2 attempts:
Attempt 1:
import operator
import functools
import random
###First Generation (Flawed unclean round robin)
def fixtures(teams):
if len(teams) % 2:
teams.append('Day off') # if team number is odd - use 'day off' as fake team
rotation = list(teams) # copy the list
random.shuffle(rotation)
fixtures = []
for i in range(0, len(teams)-1):
fixtures.append(rotation)
rotation = [rotation[0]] + [rotation[-1]] + rotation[1:-1]
return fixtures
def main():
# demo code
teams = ["Team1","Team2","Team3","Team4","Team5","Team6","Team7","Team8","Team9","Team10","Team11","Team12"]
groupA = ["Team1","Team2","Team3","Team4"]
groupB = ["Team5","Team6","Team7","Team8"]
groupC = ["Team9","Team10","Team11","Team12"]
# for one match each - use this block only
matches = fixtures(teams)
print("flawed matches:")
RoundCounter = 0
homeTeams = []
awayTeams = []
for f in matches:
#print(f)
homeTeams = f[::2]
awayTeams = f[1::2]
print("Home Teams:{}".format(homeTeams))
print("Away Teams:{}".format(awayTeams))
HomeTeamGroupA = set(homeTeams).intersection(groupA)
HomeTeamGroupC = set(homeTeams).intersection(groupC)
AwayTeamGroupA = set(awayTeams).intersection(groupA)
AwayTeamGroupC = set(awayTeams).intersection(groupC)
VSCounter = 0
for p, o in zip(homeTeams, awayTeams):
if p in HomeTeamGroupA:
if o in AwayTeamGroupC:
AvsCPosition = awayTeams.index(o)
VSCounter += 1
RoundCleanUp(homeTeams, awayTeams, AvsCPosition, VSCounter) #if this is returned begin cleaning the round
else: print("GroupA is versing either Group B or GroupA") #if this is returned it is a team 1-4 but is vs either group b or group a
elif p in HomeTeamGroupC:
if o in AwayTeamGroupA:
AvsCPosition = awayTeams.index(o)
VSCounter += 1
RoundCleanUp(homeTeams, awayTeams, AvsCPosition, VSCounter) #if this is returned begin cleaning the round
else:
print("GroupC is versing either Group B or GroupC") #if this is returned it is a team 9-12 but is vs either group b or group c
else:
pass
def RoundCleanUp(HTeam, ATeam, AvsCPos, VSCounter):
##gets Value of List at position
HTeamVal = HTeam[AvsCPos]
ATeamVal = ATeam[AvsCPos]
main()
Attempt 2:
import operator
import functools
import random
def make_round(rotation, num_teams, fixtures):
for i in range(num_teams - 1):
rotation = list(range(1, num_teams + 1))
# clip to 0 .. num_teams - 2 # if i == 0, no rotation is needed (and using -0 as list index will cause problems)
i %= (num_teams - 1)
if i:
rotation = rotation[:1] + rotation[-i:] + rotation[1:-i]
half = num_teams // 2
fixtures.append(list(rotation[:half]))
fixtures.append(list(rotation[half:][::-1]))
return fixtures
def make_schedule(teams):
"""Produces RoundRobin"""
# number of teams must be even
TeamLength = len(teams)
if TeamLength % 2:
TeamLength += 1 # add a dummy team for padding
# build first round-robin
rotation = list(teams)
Fixture = []
schedule = make_round(rotation, TeamLength, Fixture)
return schedule
def homeAwayRotation(matches):
for homeTeams, awayTeams in zip(matches[0::2], matches[1::2]):
print("Home Rotation: {}".format(homeTeams))
print("Away Rotation: {}".format(awayTeams))
validation(homeTeams, awayTeams)
def validation(homeTeams, awayTeams):
groupA = [1, 2, 3, 4]
groupC = [9, 10, 11, 12]
for x, y in zip(homeTeams, awayTeams):
if x in groupA:
if y in groupC:
AvsCPosition = awayTeams.index(y)
cleanDirtyData(homeTeams, awayTeams, AvsCPosition)
else:
# if this is returned it is a team 1-4 but is vs either group b or group a
print("Group A vsing either itself or GroupB\n")
elif x in groupC:
if y in groupA:
AvsCPosition = awayTeams.index(y)
cleanDirtyData(homeTeams, awayTeams, AvsCPosition)
else:
# if this is returned it is a team 9-12 but is vs either group b or group c
print("Group C vsing either itself or GroupB\n")
else:
# if this is returned it is a team in group B
print("This is team B\n")
def cleanDirtyData(homeTeams, awayTeams, AvsCPosition):
HTeamVal = homeTeams[AvsCPosition]
ATeamVal = awayTeams[AvsCPosition]
Dirtlist = []
Dirtlist.append(HTeamVal)
Dirtlist.append(ATeamVal)
def main():
# demo code
teams = ["Team1", "Team2", "Team3", "Team4", "Team5", "Team6",
"Team7", "Team8", "Team9", "Team10", "Team11", "Team12"]
# for one match each - use this block only
matches = make_schedule(teams)
print("flawed matches:")
homeAwayRotation(matches)
main()
My expected results would be printing each round showing which team is versing which and each team having a history a bit like this:
a team in Group1 has a verse history of: (in any random order)
1v2, 2v1, 1v3, 3v1, 1v4, 4v1, 1v5, 1v6, 1v7, 1v8, bye
a team in Group2 has a verse history of: (in any random order)
5v1, 5v2, 5v3, 5v4, 5v6, 5v7, 5v8, 5v9 5v10, 5v11, 5v12
a team in Group3 has a verse history of: (in any random order)
9v10, 10v9, 9v11, 11v9, 9v12, 12v9, 9v5, 9v6, 9v7, 9v8, bye
Any pointers or improvements I could possibly do would be greatly appreciated as I have been stuck on the final hurdle for the last 2 weeks

If I have understood the problem correct, then all you need is some combining of teams with every member in different groups.
I put some code together that should solve your problem:
def vs(team, group):
matchups = map(lambda opponent: (team,opponent), group)
matchups = filter(lambda tup: tup[0] != tup[1], matchups)
return list(matchups)
def matches(teams):
group_size = len(teams) // 3
# Make the groups, basically just splitting the team list in three parts
groups = [teams[:group_size], teams[group_size:2*group_size], teams[2*group_size:]]
matchups = []
for index, team in enumerate(teams):
group_index = index // group_size
current_matchup = []
# Check if we're working with group 1 or 3
if group_index == 0 or group_index == 2:
# Flip the order of a tuple
def flip(x):
return (x[1], x[0])
own_group = vs(team, groups[group_index])
# Add matches against everyone in the group
current_matchup.extend(own_group)
# Add matches agains everyone in the group again, but now the current team is 'away'
current_matchup.extend(list(map(flip, own_group)))
# Add matches against everyone in group 2
current_matchup.extend(vs(team, groups[1]))
# Lastly, add the bye
current_matchup.append((team, "bye"))
else:
# Just all matches against all other teams, once.
current_matchup.extend(vs(team, teams))
matchups.append(current_matchup)
return matchups
# This can be anything. Numbers, 'Team 1' or even 'The wondrous flying squirrels of death'
teams = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
# Make matches
matchups = matches(teams)
# Just pretty print
for i in range(len(matchups)):
matches = '\n\t'.join(map(lambda m: f'{str(m[0]).rjust(10)} vs {str(m[1]).ljust(10)}', matchups[i]))
print(f"Team '{teams[i]}' matches:\n\t{matches}")

Feature extraction from the training data

I have a training data like below which have all the information under a single column. The data set has above 300000 data.
id features label
1 name=John Matthew;age=25;1.=Post Graduate;2.=Football Player; 1
2 name=Mark clark;age=21;1.=Under Graduate;Interest=Video Games; 1
3 name=David;age=12;1:=High School;2:=Cricketer;native=america; 2
4 name=George;age=11;1:=High School;2:=Carpenter;married=yes 2
.
.
300000 name=Kevin;age=16;1:=High School;2:=Driver;Smoker=No 3
Now i need to convert this training data like below
id name age 1 2 Interest married Smoker
1 John Matthew 25 Post Graduate Football Player Nan Nan Nan
2 Mark clark 21 Under Graduate Nan Video Games Nan Nan
.
.
Is there any efficient way to do this. I tried the below code but it took 3 hours to complete
#Getting the proper features from the features column
cols = {}
for choices in set_label:
collection_list = []
array = train["features"][train["label"] == choices].values
for i in range(1,len(array)):
var_split = array[i].split(";")
try :
d = (dict(s.split('=') for s in var_split))
for x in d.keys():
collection_list.append(x)
except ValueError:
Error = ValueError
count = Counter(collection_list)
for k , v in count.most_common(5):
key = k.replace(":","").replace(" ","_").lower()
cols[key] = v
columns_add = list(cols.keys())
train = train.reindex(columns = np.append( train.columns.values, columns_add))
print (train.columns)
print (train.shape)
#Adding the values for the newly created problem
for row in train.itertuples():
dummy_dic = {}
new_dict={}
value = train.loc[row.Index, 'features']
v_split = value.split(";")
try :
dummy_dict = (dict(s.split('=') for s in v_split))
for k, v in dummy_dict.items():
new_key = k.replace(":","").replace(" ","_").lower()
new_dict[new_key] = v
except ValueError:
Error = ValueError
for k,v in new_dict.items():
if k in train.columns:
train.loc[row.Index, k] = v
Is there any useful function that i can apply here for efficient way of feature extraction ?

Create two DataFrames (in the first one all the features are the same for every data point and the second one is a modification of the first one introducing different features for some data points) meeting your criteria:
import pandas as pd
import numpy as np
import random
import time
import itertools
# Create a DataFrame where all the keys for each datapoint in the "features" column are the same.
num = 300000
NAMES = ['John', 'Mark', 'David', 'George', 'Kevin']
AGES = [25, 21, 12, 11, 16]
FEATURES1 = ['Post Graduate', 'Under Graduate', 'High School']
FEATURES2 = ['Football Player', 'Cricketer', 'Carpenter', 'Driver']
LABELS = [1, 2, 3]
df = pd.DataFrame()
df.loc[:num, 0]= ["name={0};age={1};feature1={2};feature2={3}"\
.format(NAMES[np.random.randint(0, len(NAMES))],\
AGES[np.random.randint(0, len(AGES))],\
FEATURES1[np.random.randint(0, len(FEATURES1))],\
FEATURES2[np.random.randint(0, len(FEATURES2))]) for i in xrange(num)]
df['label'] = [LABELS[np.random.randint(0, len(LABELS))] for i in range(num)]
df.rename(columns={0:"features"}, inplace=True)
print df.head(20)
# Create a modified sample DataFrame from the previous one, where not all the keys are the same for each data point.
mod_df = df
random_positions1 = random.sample(xrange(10), 5)
random_positions2 = random.sample(xrange(11, 20), 5)
INTERESTS = ['Basketball', 'Golf', 'Rugby']
SMOKING = ['Yes', 'No']
mod_df.loc[random_positions1, 'features'] = ["name={0};age={1};interest={2}"\
.format(NAMES[np.random.randint(0, len(NAMES))],\
AGES[np.random.randint(0, len(AGES))],\
INTERESTS[np.random.randint(0, len(INTERESTS))]) for i in xrange(len(random_positions1))]
mod_df.loc[random_positions2, 'features'] = ["name={0};age={1};smoking={2}"\
.format(NAMES[np.random.randint(0, len(NAMES))],\
AGES[np.random.randint(0, len(AGES))],\
SMOKING[np.random.randint(0, len(SMOKING))]) for i in xrange(len(random_positions2))]
print mod_df.head(20)
Assume that your original data is stored in a DataFrame called df.
Solution 1 (all the features are the same for every data point).
def func2(y):
lista = y.split('=')
value = lista[1]
return value
def function(x):
lista = x.split(';')
array = [func2(i) for i in lista]
return array
# Calculate the execution time
start = time.time()
array = pd.Series(df.features.apply(function)).tolist()
new_df = df.from_records(array, columns=['name', 'age', '1', '2'])
end = time.time()
new_df
print 'Total time:', end - start
Total time: 1.80923295021
Edit: The one thing you need to do is to edit accordingly the columns list.
Solution 2 (The features might be the same or different for every data point).
import pandas as pd
import numpy as np
import time
import itertools
# The following functions are meant to extract the keys from each row, which are going to be used as columns.
def extract_key(x):
return x.split('=')[0]
def def_columns(x):
lista = x.split(';')
keys = [extract_key(i) for i in lista]
return keys
df = mod_df
columns = pd.Series(df.features.apply(def_columns)).tolist()
flattened_columns = list(itertools.chain(*columns))
flattened_columns = np.unique(np.array(flattened_columns)).tolist()
flattened_columns
# This function turns each row from the original dataframe into a dictionary.
def function(x):
lista = x.split(';')
dict_ = {}
for i in lista:
key, val = i.split('=')
dict_[key ] = val
return dict_
df.features.apply(function)
arr = pd.Series(df.features.apply(function)).tolist()
pd.DataFrame.from_dict(arr)

Suppose your data is like this :
features= ["name=John Matthew;age=25;1:=Post Graduate;2:=Football Player;",
'name=Mark clark;age=21;1:=Under Graduate;2:=Football Player;',
"name=David;age=12;1:=High School;2:=Cricketer;",
"name=George;age=11;1:=High School;2:=Carpenter;",
'name=Kevin;age=16;1:=High School;2:=Driver; ']
df = pd.DataFrame({'features': features})
I will start by this answer and try to replace all separator (name, age , 1:= , 2:= ) by ;
with this function
def replace_feature(x):
for r in (("name=", ";"), (";age=", ";"), (';1:=', ';'), (';2:=', ";")):
x = x.replace(*r)
x = x.split(';')
return x
df = df.assign(features= df.features.apply(replace_feature))
After applying that function to your df all the values will a list of features. where you can get each one by index
then I use 4 customs function to get each attribute name, age, grade; job,
Note: There can be a better way to do this by using only one function
def get_name(df):
return df['features'][1]
def get_age(df):
return df['features'][2]
def get_grade(df):
return df['features'][3]
def get_job(df):
return df['features'][4]
And finaly applying that function to your dataframe :
df = df.assign(name = df.apply(get_name, axis=1),
age = df.apply(get_age, axis=1),
grade = df.apply(get_grade, axis=1),
job = df.apply(get_job, axis=1))
Hope this will be quick and fast

As far as I understand your code, the poor performances comes from the fact that you create the dataframe element by element. It's better to create the whole dataframe at once whith a list of dictionnaries.
Let's recreate your input dataframe :
from StringIO import StringIO
data=StringIO("""id features label
1 name=John Matthew;age=25;1.=Post Graduate;2.=Football Player; 1
2 name=Mark clark;age=21;1.=Under Graduate;2.=Football Player; 1
3 name=David;age=12;1:=High School;2:=Cricketer; 2
4 name=George;age=11;1:=High School;2:=Carpenter; 2""")
df=pd.read_table(data,sep=r'\s{3,}',engine='python')
we can check :
print df
id features label
0 1 name=John Matthew;age=25;1.=Post Graduate;2.=F... 1
1 2 name=Mark clark;age=21;1.=Under Graduate;2.=Fo... 1
2 3 name=David;age=12;1:=High School;2:=Cricketer; 2
3 4 name=George;age=11;1:=High School;2:=Carpenter; 2
Now we can create the needed list of dictionnaries with the following code :
feat=[]
for line in df['features']:
line=line.replace(':','.')
lsp=line.split(';')[:-1]
feat.append(dict([elt.split('=') for elt in lsp]))
And the resulting dataframe :
print pd.DataFrame(feat)
1. 2. age name
0 Post Graduate Football Player 25 John Matthew
1 Under Graduate Football Player 21 Mark clark
2 High School Cricketer 12 David
3 High School Carpenter 11 George

Python: Why am I getting a "ZeroDivisionError: division by zero" in function?

My goal is to count the total amount of tweets in a file that fall under certain time zones.
I have the following function (I have noted the trouble area near the end of the function with comments):
def readTweets(inFile, wordsName):
words = []
lat = 0
long = 0
keyword = keywords(wordsName)
sents = keywordSentiment(wordsName)
value = 0
eastern = 0
central = 0
mountain = 0
pacific = 0
a = 0
b = 0
c = 0
d = 0
easternTweets = 0
centralTweets = 0
mountainTweets = 0
pacificTweets = 0
for line in inFile:
entry = line.split()
for n in range(0, len(entry) - 1):
entry[n] = entry[n].strip("[],!?#./-=+_#")
if n > 4: # n>4 because words begin on 5th index of list
entry[n] = entry[n].lower()
words.append(entry[n])
lat = float(entry[0])
long = float(entry[1])
timezone = getTimeZone(lat, long)
if timezone == "eastern":
easternTweets += 1
if timezone == "central":
centralTweets += 1
if timezone == "mountain":
mountainTweets += 1
if timezone == "pacific":
pacificTweets += 1
for i in range(0, len(words)):
for k in range(0, len(keyword)):
if words[i] == keyword[k]:
value = int(sents[k])
if timezone == "eastern":
eastern += value
a += 1
if timezone == "central":
central += value
b += 1
if timezone == "mountain":
mountain += value
c += 1
if timezone == "pacific":
pacific += value
d += 1
# the values of a,b,c,d are 0
easternTotal = eastern/a # getting error
centralTotal = central/b # for
mountainTotal = mountain/c # these
pacificTotal = pacific/d # values
print("Total tweets per time zone:")
print("Eastern: %d" % easternTweets)
print("Central: %d" % centralTweets)
print("Mountain: %d" % mountainTweets)
print("Pacific: %d" % pacificTweets)
I am getting a ZeroDivisionError: division by zero error for easternTotal and the other total values that use a, b, c, and d for division.
If I print the values of a, b, c, or d it shows 0. My question is why are their values 0? Does the value of a, b, c, and d not change in the if statements?

So the only way this can happen is because the code that increments a, b, c and d is never reached.
That can have a few reasons:
inFile is empty so the whole for loop never enters its body
len(words) is 0, so that for loop never enters its body
len(keywords) is 0, so that for loop never enters its body
The value of timezone is something other than those values you test for
words is initially [], so its length can stay 0 if that loop that appends things to it never runs.
From here, it's impossible for us to see which of these is happening, but it should be very easy for you with some print statements or such.

you divide eastern by 0. You can avoid it by doing
easternTotal = eastern/a if a > 0 else eastern

because you set a,b,c,d=0;
when readTweets(inFile, wordsName) did not get any data, "eastern/a" may cause "eastern/0 " .
So, make sure your readTweets() did get data first.

Python - getting the average of two dict lists?

Here I have two lists: "donlist" is a list of random donation amounts between $1 and $100, and "charlist" is a list of random charity numbers between 1 and 15. I used two "dict"'s: "totals" to calculate the total donation per charity, and "numdon" to calculate the number of donations per charity. I now have to find the average donation per charity. I tried dividing "totals" by "numdon", but the output is just a list of "1.0"'s. I think it's because the dict has the charity number as well as the total/number of donations in them. Please help me calculate the average donation per charity. Thank you!
from __future__ import division
import random
from collections import defaultdict
from pprint import pprint
counter = 0
donlist = []
charlist = []
totals = defaultdict(lambda:0)
numdon = defaultdict(lambda:0)
while counter != 100:
d = round(random.uniform(1.00,100.00),2)
c = random.randint(1,15)
counter +=1
donlist.append(d)
donlist = [round(elem,2) for elem in donlist]
charlist.append(c)
totals[c] += d
numdon[c] += 1
if counter == 100:
break
print('Charity \t\tDonations')
for (c,d) in zip(charlist,donlist):
print(c,d,sep='\t\t\t')
print("\nTotal Donations per Charity:")
pprint(totals)
print("\nNumber of Donations per Charity:")
pprint(numdon)
# The average array doesn't work; I think it's because the "totals" and "numdon" have the charity numbers in them, so it's not just two lists of floats to divide.
avg = [x/y for x,y in zip(totals,numdon)]
pprint(avg)

Solve your problem:
avg = [totals[i] / numdon[i] for i in numdon]
Reason:
In python list comprehension of a dict, the default iteration will be on the key of a dict. Try this:
l = {1: 'a', 2: 'b'}
for i in l:
print(i)
# output:
# 1
# 2

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python dictionary merge up two dictionaries and building cumulative - python

Related

(Python) pandas.DataFrame doesn't update value each for-loop cycle, why?

List manipulation for a Conditional Round-Robin Rugby Draw

Feature extraction from the training data

Python: Why am I getting a "ZeroDivisionError: division by zero" in function?

Python - getting the average of two dict lists?

Categories

Resources