Get a running total from a list - python

I'm reading in items:
for line in sys.stdin:
    line = line.strip()
    data = line.split("-")
If I print data as it is read, it looks like:
['Adam', '5']
['Peter', '7']
['Adam', '8']
['Lucy', '2']
['Peter', '4']
How can I get a running total for each unique name, such that my new list would look like:
['Adam', '13'],
['Peter', '11'],
['Lucy', '2']

Use a collections.Counter() to add up the values for each name:
import collections
lines = [['Adam', '5'],
['Peter', '7'],
['Adam', '8'],
['Lucy', '2'],
['Peter', '4']]
counter = collections.Counter()
for data in lines:
    counter[data[0]] += int(data[1])
print(counter)
You'll get:
Counter({'Adam': 13, 'Peter': 11, 'Lucy': 2})
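To connect this to the original input loop and get back the list-of-lists shape the question asks for, something like the following might work. It is only a minimal sketch: io.StringIO stands in for sys.stdin, and the names sample_input and totals are made up for the demo.

```python
import collections
import io

# Stand-in for sys.stdin so the sketch runs on its own (assumption for the demo).
sample_input = io.StringIO("Adam-5\nPeter-7\nAdam-8\nLucy-2\nPeter-4\n")

counter = collections.Counter()
for line in sample_input:
    name, value = line.strip().split("-")
    counter[name] += int(value)

# Convert back to [name, total-as-string] pairs, as shown in the question.
totals = [[name, str(total)] for name, total in counter.items()]
print(totals)
```

On Python 3.7 and later the pairs come out in the order the names were first seen.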

Initialize a defaultdict with type int and use the name as the key:
import sys
from collections import defaultdict
name_list = defaultdict(int)
for line in sys.stdin:
    line = line.strip()
    data = line.split("-")
    name = data[0]
    value = int(data[1])
    name_list[name] += value
for key, value in name_list.items():
    print(key, value)

I recommend creating a dictionary and updating it as you go. I have assumed that data is a list of lists.
finalList = {}
for name, value in data:
    if name in finalList:
        finalList[name] = finalList[name] + int(value)
    else:
        finalList[name] = int(value)
print(finalList)

Pandas handles this kind of situation very well:
import pandas as pd
# path is the location of the input file shown below
df_data = pd.read_csv(filepath_or_buffer=path, sep='_', names=['Name', 'value'])
df = df_data.groupby(['Name'])['value'].sum()
print(df)
Output:
'Adam' 13
'Lucy' 2
'Peter' 11
Input file
Adam_5
Peter_7
Adam_8
Lucy_2
Peter_4

Related

creating dictionary with nested loop

I tried to create a dictionary with nested loops but failed. I do not know what's wrong:
dict={}
for i in range(0,4):
    node_1=str(i)
    for j in range(0,4):
        node_2=str(j)
        dict[node_1]=[node_2]
print(dict)
It should have created:
{'0':['1','2','3'],'1':['0','2','3'],'2':['0','1','3']}
In your code, you are overwriting the previous j value with the new j value. Instead, you should be appending it to a list.
mydict = {}
for i in range(0,4):
    node_1 = str(i)
    mydict[node_1] = []  # assign empty list
    for j in range(0,4):
        node_2 = str(j)
        mydict[node_1].append(node_2)  # append in list
print(mydict)
Output:
{'0': ['0', '1', '2', '3'], '1': ['0', '1', '2', '3'], '2': ['0', '1', '2', '3'], '3': ['0', '1', '2', '3']}
Note: You should not name your variable dict, because that shadows the built-in dict type.
Something like this?:
d = {}
for i in range(0,4):
    node_1=str(i)
    for j in range(0,4):
        node_2=str(j)
        if node_1 not in d:
            d[node_1] = []
        d[node_1].append(node_2)
print(d)
Please do not use dict for variable name.
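The expected output in the question leaves each node out of its own list. If that is the intent (an assumption, since neither answer addresses it), a dict comprehension with a filter is one way to sketch it. The names nodes and neighbors are hypothetical.

```python
nodes = [str(i) for i in range(4)]

# Map every node to all the other nodes, skipping the node itself.
neighbors = {a: [b for b in nodes if b != a] for a in nodes}
print(neighbors)
```

Note that this also produces an entry for '3', which the expected output in the question omits.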

How to add numbers in duplicate list

I've collected data from a txt file and put it into a list (there are actually a lot more players, so it is impossible to count without a loop), like:
data_list = [
    ['FW', '1', 'Khan', '2', '0'],
    ['FW', '25', 'Daniel', '0', '0'],
    ['FW', '3', 'Daniel', '1', '0'],
    ['FW', '32', 'Daniel', '0', '0'],
    ['FW', '4', 'Khan', '1', '0']
]
and I want to add up the goals for Khan and Daniel and make a list like:
['Khan', 3]
['Daniel', 1]
I have a name list (name_list = ['Khan', 'Daniel']).
I've tried to do it with a for loop, like:
goal = []
num = 0
for i in name_list:
    for j in data_list:
        if i == j[2]:
            num += int(j[3])
            goal.append([i, num])
        else:
            continue
and it did not work.
I am a novice, so your comments will be a really big help.
Thanks!
Your code is very close to working; there are some syntax errors and one real problem.
The problem is that you are appending num too soon. You should sum over the rows that contain the name you are looking for and then, once all rows have been seen, append the value:
data_list = [
    ['pos', 'num', 'name', 'goal', 'assist'],
    ['FW', '1', 'Khan', '2', '0'],
    ['FW', '25', 'Daniel', '0', '0'],
    ['FW', '3', 'Daniel', '1', '0'],
    ['FW', '32', 'Daniel', '0', '0'],
    ['FW', '4', 'Khan', '1', '0']
]
name_list = ['Khan', 'Daniel']
goal = []
for name in name_list:
    total_score = 0
    for j in data_list:
        if name == j[2]:
            total_score += int(j[3])
    # Append once per name, after all rows have been seen
    goal.append([name, total_score])
On the other hand, this strategy is not the most efficient, since for every name the code iterates over all rows. By using a dictionary to store intermediate results, you need only a single pass over the rows, regardless of how many names you are looking for.
name_list = {'Khan', 'Daniel'}
goal = dict()
for row in data_list:
    if row[2] in name_list:
        if row[2] not in goal:
            goal[row[2]] = 0
        goal[row[2]] += int(row[3])
This sets goal to {'Khan': 3, 'Daniel': 1}.
This can be made more readable using defaultdict. A default dictionary performs the existence check and the initialisation of a given key automatically, which simplifies the code:
from collections import defaultdict
goal = defaultdict(int)
for row in data_list:
    if row[2] in name_list:
        goal[row[2]] += int(row[3])
Which does the exact same thing as before. At that point it's not even clear that we really need to provide a list of names (unless memory is an issue). Getting a dictionary for all names would again simplify the code (we just need to make sure to ignore the first row using the slice notation [1:]):
goal = defaultdict(int)
for row in data_list[1:]:
    goal[row[2]] += int(row[3])
You can create a dictionary to keep the total number of goals, with the names as keys. This will make it easier to access the values:
goals_dict = {}
for name in name_list:
    goals_dict[name] = 0
# {'Khan': 0, 'Daniel': 0}
Then just sum it:
for name in name_list:
    for data in data_list:
        if data[2] == name:
            goals_dict[name] += int(data[3])
Now your dictionary is populated correctly. To get the result as the list you requested, do this:
result = [[key, value] for key, value in goals_dict.items()]
Don't bother doing it manually. Use a Counter instead:
from collections import Counter
c = Counter()
for j in data_list:
    name = j[2]
    goal = int(j[3])
    c[name] += goal
print(c.most_common()) # -> [('Khan', 3), ('Daniel', 1)]
In your code, num is initialized only once, so it keeps accumulating across players, and the append happens inside the inner loop. You'll want to reset it to 0 for each name, just before the inner for loop, and append the name/goal pair once that loop has finished:
goal = []
for i in name_list:
    # Reset the count for each player
    num = 0
    # Iterate through each data entry
    for j in data_list:
        if i == j[2]:
            # Increment goal count for this player
            num += int(j[3])
    # Append final count to goal list
    goal.append([i, num])
This should have the desired effect, although as @wjandrea has pointed out, a Counter would be a much cleaner implementation.
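Putting the pieces from these answers together, a single pass that also returns the list format from the question could look like the sketch below. The function name total_goals is hypothetical, and it assumes the name sits in column 2 and the goals in column 3, as in the sample data.

```python
from collections import defaultdict

def total_goals(rows, names):
    """Sum the goals column per player; return [name, total] pairs in the order of names."""
    wanted = set(names)
    totals = defaultdict(int)
    for row in rows:
        if row[2] in wanted:
            totals[row[2]] += int(row[3])
    return [[name, totals[name]] for name in names]

data_list = [
    ['FW', '1', 'Khan', '2', '0'],
    ['FW', '25', 'Daniel', '0', '0'],
    ['FW', '3', 'Daniel', '1', '0'],
    ['FW', '32', 'Daniel', '0', '0'],
    ['FW', '4', 'Khan', '1', '0'],
]
result = total_goals(data_list, ['Khan', 'Daniel'])
print(result)
```

A name that never appears in the rows ends up with a total of 0 rather than being dropped.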

Iterating through txt file and adding words to separate lists in Python

I have a text file that has about 50 lines and follows the following format:
immediate ADC #oper 69 2 2
absolute ADC oper 6D 3 4
etc..
What I would like to do is create 6 different lists and add every word in each column on a single line to the separate lists, so that the output becomes this
addressing: ['immediate', 'absolute']
symbol: ['ADC', 'ADC']
symbol2: ['#oper', 'oper']
opcode: ['69', '6D']
bytes: ['2', '3']
cycles: ['2', '4']
I'm trying to do this in Python but at the moment my code isn't working and adds every word into every list:
addressing: ['immediate', 'ADC', '#oper', '69', '2', '2', 'absolute', 'ADC', 'oper', '6D', '3', '4']
symbol: ['immediate', 'ADC', '#oper', '69', '2', '2', 'absolute', 'ADC', 'oper', '6D', '3', '4']
symbol2: ['immediate', 'ADC', '#oper', '69', '2', '2', 'absolute', 'ADC', 'oper', '6D', '3', '4']
opcode: ['immediate', 'ADC', '#oper', '69', '2', '2', 'absolute', 'ADC', 'oper', '6D', '3', '4']
bytes: ['immediate', 'ADC', '#oper', '69', '2', '2', 'absolute', 'ADC', 'oper', '6D', '3', '4']
cycles: ['immediate', 'ADC', '#oper', '69', '2', '2', 'absolute', 'ADC', 'oper', '6D', '3', '4']
How can I change the below code so that it produces the output I want?
addressing = []
symbol = []
symbol2 = []
opcode = []
bytes = []
cycles = []
index = 1
for line in f:
    for word in line.split():
        if index == 1:
            addressing.append(word)
            index += 1
            print(index)
        if index == 2:
            symbol.append(word)
            index += 1
            print(index)
        if index == 3:
            symbol2.append(word)
            index += 1
            print(index)
        if index == 4:
            opcode.append(word)
            index += 1
            print(index)
        if index == 5:
            bytes.append(word)
            index += 1
            print(index)
        if index == 6:
            cycles.append(word)
            index += 1
            print(index)
        index = 1
There are two ways to solve this:
1. The static way, which assumes the format will never change and each row will have the same number of values.
2. The dynamic way, which is flexible to format changes and a variable number of items per row, assuming that the order of the items remains the same.
I'll detail both ways below.
Static Way: Split the line and append using indexes.
addressing = []
symbol = []
symbol2 = []
opcode = []
bytes = []
cycles = []
for line in f:
    splitted = line.split()
    addressing.append(splitted[0])
    symbol.append(splitted[1])
    symbol2.append(splitted[2])
    opcode.append(splitted[3])
    bytes.append(splitted[4])
    cycles.append(splitted[5])
Dynamic Way: Create a dictionary and iterate over keys.
information = {}
information['addressing'] = []
information['symbol'] = []
information['symbol2'] = []
information['opcode'] = []
information['bytes'] = []
information['cycles'] = []
key_list = list(information.keys())
for line in f:
    splitted = line.split()
    for i in range(0,len(splitted)):
        information[key_list[i]].append(splitted[i])
print(information)
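One caveat with the dynamic version: indexing key_list[i] raises an IndexError if a line has more words than there are keys. A possible variation, sketched here with inline sample lines instead of a file (the lines list is made up for the demo), pairs keys and words with zip, which simply stops at the shorter of the two.

```python
keys = ['addressing', 'symbol', 'symbol2', 'opcode', 'bytes', 'cycles']
information = {key: [] for key in keys}

# Inline stand-in for the file; the second line is deliberately short.
lines = ["immediate ADC #oper 69 2 2", "absolute ADC oper"]

for line in lines:
    # zip stops at the shorter sequence, so missing or extra columns are ignored.
    for key, word in zip(keys, line.split()):
        information[key].append(word)

print(information['symbol2'])
print(information['cycles'])
```

Whether silently ignoring missing or extra columns is acceptable depends on the data; for a strict format, the static version fails loudly instead.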
You can use regular expressions to split each line on runs of whitespace (\s+):
import re
with open('filename.txt') as infile:
    f = [re.split(r'\s+', i.strip('\n')) for i in infile]
final_data = [{a:list(i)} for a, i in zip(['addressing', 'symbol', 'symbol2', 'opcode', 'bytes', 'cycles'], zip(*f))]
Output:
[{'addressing': ['immediate', 'absolute']}, {'symbol': ['ADC', 'ADC']}, {'symbol2': ['#oper', 'oper']}, {'opcode': ['69', '6D']}, {'bytes': ['2', '3']}, {'cycles': ['2', '4']}]
You can use the built-in zip function to transpose your rows of data into columns. The code below puts the data into a dictionary of tuples, with the field names as the keys. For this demo I've embedded the data into the script, since that's simpler than reading from a file, but it's easy to modify the code to read from a file.
file_data = '''\
immediate ADC #oper 69 2 2
absolute ADC oper 6D 3 4
'''.splitlines()
fields = 'addressing', 'symbol', 'symbol2', 'opcode', 'bytes', 'cycles'
values = zip(*[row.split() for row in file_data])
data = dict(zip(fields, values))
for k in fields:
    print(k, data[k])
output
addressing ('immediate', 'absolute')
symbol ('ADC', 'ADC')
symbol2 ('#oper', 'oper')
opcode ('69', '6D')
bytes ('2', '3')
cycles ('2', '4')
If you really want separate named variables, that's even easier, but as you can see it's more painful to work with.
file_data = '''\
immediate ADC #oper 69 2 2
absolute ADC oper 6D 3 4
'''.splitlines()
(addressing, symbol, symbol2,
opcode, bytecode, cycles) = zip(*[row.split() for row in file_data])
print(addressing)
print(symbol)
print(symbol2)
print(opcode)
print(bytecode)
print(cycles)
output
('immediate', 'absolute')
('ADC', 'ADC')
('#oper', 'oper')
('69', '6D')
('2', '3')
('2', '4')
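Reading from a real file instead of an embedded string, as mentioned above, might look like the sketch below. The temporary file exists only to make the demo self-contained; in practice you would open your own file.

```python
import os
import tempfile

# Write the sample rows to a temporary file so the sketch is runnable as-is.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("immediate ADC #oper 69 2 2\nabsolute ADC oper 6D 3 4\n")
    path = tmp.name

fields = ('addressing', 'symbol', 'symbol2', 'opcode', 'bytes', 'cycles')
with open(path) as f:
    # Skip blank lines, split the rest on whitespace.
    rows = [line.split() for line in f if line.strip()]
os.remove(path)

# Transpose rows into columns and label each column with its field name.
data = dict(zip(fields, zip(*rows)))
print(data['opcode'])
```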
The issue is that you're incrementing the index in every if block. So at the end of this block:
if index == 1:
    addressing.append(word)
    index += 1
    print(index)
The value of index is 2. Then when it hits if index == 2: that evaluates to True, adds that word to the second list, increments the index, and so on.
You could solve this by changing the inner for loop to iterate over an index, e.g. for index in range(1, 7):, and no longer incrementing index manually. But if you know that every line has 6 words, it might be better to remove the inner for loop altogether and assign the words to the lists manually.
for line in f:
    words = line.split()
    addressing.append(words[0])
    symbol.append(words[1])
    # ...etc
As already commented, you should remove all index += 1 statements and leave just a single index += 1 right at the end of the inner for loop. Or use elif instead of if.
Also, consider using enumerate(). There is no need to manually update the index variable:
# Example use of enumerate()
for line in f:
    for index, word in enumerate(line.split()):
        print(index, word)
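Building on enumerate(), the index can select the target list directly, which removes the chain of if statements. This is a rough sketch, assuming at most six whitespace-separated columns per line and using inline sample lines in place of the file. byte_counts is a hypothetical rename that avoids shadowing the built-in bytes.

```python
addressing, symbol, symbol2, opcode, byte_counts, cycles = ([] for _ in range(6))
columns = [addressing, symbol, symbol2, opcode, byte_counts, cycles]

# Inline stand-in for the lines of the text file.
lines = ["immediate ADC #oper 69 2 2", "absolute ADC oper 6D 3 4"]

for line in lines:
    for index, word in enumerate(line.split()):
        # The position of the word in the line picks the list it belongs to.
        columns[index].append(word)

print(addressing)
print(opcode)
```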

Adding dictionary keys and values after line split?

If I have for instance the file:
;;;
;;;
;;;
A 1 2 3
B 2 3 4
C 3 4 5
And I want to read it into a dictionary of {str: list of str}:
{'A': ['1', '2', '3'], 'B': ['2', '3', '4'], 'C': ['3', '4', '5']}
I have the following code:
d = {}
with open('file_name') as f:
    for line in f:
        while ';;;' not in line:
            (key, val) = line.split(' ')
            #missingcodehere
return d
What should I put in after the line.split to assign the keys and values as a str and list of str?
To focus on your code and what is going wrong:
You are pretty much in an infinite loop with your while ';;;' not in line, because line never changes inside the loop. Instead, use a conditional statement to check whether ';;;' is in the line.
Then, when you get your key and value from line.strip().split(' ', 1), you simply assign them to your dictionary as d[key] = val. The maxsplit of 1 matters: without it, the split returns four items and the unpacking into key and val fails. However, you want a list, and val is still a string at this point, so call split on val as well.
Furthermore, you do not need parentheses around key and val; they only add noise to your code.
The end result will give you:
d = {}
with open('new_file.txt') as f:
    for line in f:
        if ';;;' not in line:
            key, val = line.strip().split(' ', 1)
            d[key] = val.split()
print(d)
Using your sample input, output is:
{'C': ['3', '4', '5'], 'A': ['1', '2', '3'], 'B': ['2', '3', '4']}
Finally, the implementation can be made more Pythonic. We can simplify the code and split more generically on any whitespace, rather than counting explicit spaces:
with open('new_file.txt') as fin:
    valid = (line.split(None, 1) for line in fin if ';;;' not in line)
    d = {k:v.split() for k, v in valid}
Above, you will notice that the split looks like this: split(None, 1), where we pass maxsplit=1.
The docstring of split explains it pretty well:
Return a list of the words in S, using sep as the
delimiter string. If maxsplit is given, at most maxsplit
splits are done. If sep is not specified or is None, any
whitespace string is a separator and empty strings are
removed from the result.
Finally, we simply use a dictionary comprehension to obtain our final result.
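To see what the maxsplit argument changes, here is a tiny sketch on one sample line from the question (the variable names are made up for illustration):

```python
line = "A 1 2 3\n"

# Splitting on every single space yields four items.
by_space = line.strip().split(' ')

# With sep=None and maxsplit=1, only the first run of whitespace is split on,
# so the line unpacks cleanly into a key and the rest of the line.
key, rest = line.split(None, 1)

print(by_space)
print(key, rest.split())
```

The second form leaves exactly two pieces, which is what the key, value unpacking in the dictionary comprehension relies on.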
Why not simply:
def make_dict(f_name):
    with open(f_name) as f:
        # maxsplit=1 keeps the values together so the pair unpacks into k, v
        d = {k: v.split()
             for k, v in [line.strip().split(' ', 1)
                          for line in f
                          if ';;;' not in line]}
    return d
Then
>>> print(make_dict('file_name'))
{'A': ['1', '2', '3'], 'B': ['2', '3', '4'], 'C': ['3', '4', '5']}

Defaultdict appending trick

I have a text file where elements are stored in two columns, like the following:
a 1,a 3,a 4,b 1,b 2,b 3,b 4,c 1,c 2.... etc
The file contains two columns, one is the key a,b,c etc, and the other is the elements 1,2,3,4 etc.
I stored these items using defaultdict and appended them.
The items in the default dict are:
defaultdict(<type 'list'>, {'a': ['0', '1', '2', '3', '4'], 'c': ['1', '2'], 'b': ['1', '2', '3', '4']})
I used the following code:
from collections import defaultdict
positions = defaultdict(list)
with open('test.txt') as f:
    for line in f:
        sob = line.split()
        key = sob[0]
        ele = sob[1]
        positions[key].append(ele)
print(positions)
Instead of defaultdict you can use OrderedDict, which keeps the keys in insertion order:
from collections import OrderedDict
positions = OrderedDict()
with open('test.txt') as f:
    for line in f:
        key, ele = line.strip().split()
        positions[key] = positions.get(key, []) + [ele]
print(positions)
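For comparison, here is a small self-contained sketch of the same grouping with inline sample pairs instead of test.txt (the pairs list is made up for the demo). On Python 3.7 and later, regular dicts keep insertion order, so OrderedDict is mostly needed on older versions. setdefault is shown as one alternative to defaultdict.

```python
# Inline stand-in for the lines of test.txt.
pairs = ["a 1", "a 3", "a 4", "b 1", "b 2", "c 1"]

positions = {}
for line in pairs:
    key, ele = line.split()
    # setdefault creates the empty list the first time a key is seen.
    positions.setdefault(key, []).append(ele)

print(positions)
```

Unlike positions.get(key, []) + [ele], this appends in place and does not build a new list on every line.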
