How to add numbers in duplicate list - python

I've collected data from a txt file and put it into a list (there are actually a lot more players, so it is impossible to count without a loop), like:
data_list = [
['FW', '1', 'Khan', '2', '0'],
['FW', '25', 'Daniel', '0', '0'],
['FW', '3', 'Daniel', '1', '0'],
['FW', '32', 'Daniel', '0', '0'],
['FW', '4', 'Khan', '1', '0']
]
and I want to add up the goals for each of Khan and Daniel and make a list like:
['Khan', 3]
['Daniel', 1]
I have a name list (name_list = ['Khan', 'Daniel']).
I've tried to do it with a for loop, like:
goal = []
num = 0
for i in name_list:
    for j in data_list:
        if i == j[2]:
            num += int(j[3])
            goal.append([i, num])
        else:
            continue
and it did not work.
I am very novice, so your comments will be a really big help.
Thanks!

Your code is very close to working; there is a syntax error and one real problem.
The problem is that you are appending num too soon. You should sum over the rows that contain the name you are looking for and then, once all rows have been seen, append the value:
data_list = [
['pos', 'num', 'name', 'goal', 'assist'],
['FW', '1', 'Khan', '2', '0'],
['FW', '25', 'Daniel', '0', '0'],
['FW', '3', 'Daniel', '1', '0'],
['FW', '32', 'Daniel', '0', '0'],
['FW', '4', 'Khan', '1', '0']
]
name_list = ['Khan', 'Daniel']
goal = []
for name in name_list:
    total_score = 0
    for j in data_list:
        if name == j[2]:
            total_score += int(j[3])
    goal.append([name, total_score])
On the other hand, this strategy is not the most efficient, since for every name the code iterates over all rows. Using a dictionary to store intermediate results, you need only a single pass over the rows, independently of the number of names you are looking for.
name_list = {'Khan', 'Daniel'}
goal = dict()
for row in data_list:
    if row[2] in name_list:
        if row[2] not in goal:
            goal[row[2]] = 0
        goal[row[2]] += int(row[3])
This sets goal to {'Khan': 3, 'Daniel': 1}.
This could be made more readable with defaultdict. A default dictionary performs the existence check for a given key and the initialisation automatically, which simplifies the code:
from collections import defaultdict
goal = defaultdict(int)
for row in data_list:
    if row[2] in name_list:
        goal[row[2]] += int(row[3])
Which does the exact same thing as before. At that point it's not even clear that we really need to provide a list of names (unless memory is an issue). Getting a dictionary for all names would again simplify the code (we just need to make sure to ignore the first row using the slice notation [1:]):
goal = defaultdict(int)
for row in data_list[1:]:
    goal[row[2]] += int(row[3])
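To tie this back to the [name, total] pairs asked for in the question, here is a minimal self-contained sketch. It assumes the data includes the header row used in this answer; goal_pairs is a name introduced only for illustration.

```python
from collections import defaultdict

# Sample rows from the question, plus the header row used in this answer.
data_list = [
    ['pos', 'num', 'name', 'goal', 'assist'],
    ['FW', '1', 'Khan', '2', '0'],
    ['FW', '25', 'Daniel', '0', '0'],
    ['FW', '3', 'Daniel', '1', '0'],
    ['FW', '32', 'Daniel', '0', '0'],
    ['FW', '4', 'Khan', '1', '0'],
]

goal = defaultdict(int)
for row in data_list[1:]:          # skip the header row
    goal[row[2]] += int(row[3])    # column 2 is the name, column 3 the goals

# Convert the mapping into [name, total] pairs (goal_pairs is a hypothetical name).
goal_pairs = [[name, total] for name, total in goal.items()]
print(goal_pairs)
```

The pairs come out in the order each name is first seen, since dictionaries preserve insertion order in Python 3.7 and later.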

You can create a dictionary to keep the total number of goals, with the names as keys. This makes it easier to access the values:
goals_dict = {}
for name in name_list:
    goals_dict[name] = 0
# {'Khan': 0, 'Daniel': 0}
Then just sum it:
for name in name_list:
    for data in data_list:
        if data[2] == name:
            goals_dict[name] += int(data[3])
Now your dictionary is populated correctly. To get the result as the list you requested, do this:
result = [[key, value] for key, value in goals_dict.items()]

Don't bother doing it manually. Use a Counter instead:
from collections import Counter
c = Counter()
for j in data_list:
    name = j[2]
    goal = int(j[3])
    c[name] += goal
print(c.most_common()) # -> [('Khan', 3), ('Daniel', 1)]
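One caveat: if the data contains a header row (as in another answer here), int() would fail on the header text. A rough sketch that skips such rows, assuming goal counts are always non-negative whole numbers written as digits; result is a name introduced only for illustration:

```python
from collections import Counter

data_list = [
    ['pos', 'num', 'name', 'goal', 'assist'],   # optional header row
    ['FW', '1', 'Khan', '2', '0'],
    ['FW', '25', 'Daniel', '0', '0'],
    ['FW', '3', 'Daniel', '1', '0'],
    ['FW', '32', 'Daniel', '0', '0'],
    ['FW', '4', 'Khan', '1', '0'],
]

c = Counter()
for row in data_list:
    if not row[3].isdigit():       # assumption: non-numeric goal column means header
        continue
    c[row[2]] += int(row[3])

# most_common() yields (name, total) tuples sorted by total, highest first;
# convert them to lists to match the format asked for in the question.
result = [list(pair) for pair in c.most_common()]
print(result)
```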

In your code, num is set to 0 only once, before both loops, so it is never reset between players, and the pair is appended on every matching row. You'll want to initialize it to 0 inside the outer loop, before the inner for loop, and append the name and goal count once the inner loop has finished:
for i in name_list:
    # Init num
    num = 0
    # Iterate through each data entry
    for j in data_list:
        if i == j[2]:
            # Increment goal count for this player
            num += int(j[3])
    # Append final count to goal list
    goal.append([i, num])
This should have the desired effect, although as @wjandrea has pointed out, a Counter would be a much cleaner implementation.

Related

creating dictionary with nested loop

I tried to create a dictionary with nested loops but failed. I do not know what's wrong:
dict={}
for i in range(0,4):
    node_1=str(i)
    for j in range(0,4):
        node_2=str(j)
        dict[node_1]=[node_2]
print(dict)
It should have created:
{'0':['1','2','3'],'1':['0','2','3'],'2':['0','1','3']}
In your code, you are overwriting the previous j value with the new j value. Instead, you should be appending it to a list.
mydict = {}
for i in range(0,4):
    node_1 = str(i)
    mydict[node_1] = []  # assign empty list
    for j in range(0,4):
        node_2 = str(j)
        mydict[node_1].append(node_2)  # append in list
print(mydict)
Output:
{'0': ['0', '1', '2', '3'], '1': ['0', '1', '2', '3'], '2': ['0', '1', '2', '3'], '3': ['0', '1', '2', '3']}
Note: You should not name your variable dict, which is the name of a built-in type.
Something like this?
d = {}
for i in range(0,4):
    node_1=str(i)
    for j in range(0,4):
        node_2=str(j)
        if node_1 not in d:
            d[node_1] = []
        d[node_1].append(node_2)
print(d)
Please do not use dict as a variable name.
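Note that the expected output in the question leaves each node out of its own list, which the snippets above do not do. A short sketch of that variant, assuming the omission is intentional and that node '3' should also get an entry (the question's expected output stops at '2'); nodes and neighbours are names made up for this example:

```python
# Node labels '0'..'3', as produced by str(i) in the question's loop.
nodes = [str(i) for i in range(4)]

# For every node, list all the other nodes (skip the node itself).
neighbours = {a: [b for b in nodes if b != a] for a in nodes}
print(neighbours)
```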

Iterating through txt file and adding words to separate lists in Python

I have a text file that has about 50 lines and follows the following format:
immediate ADC #oper 69 2 2
absolute ADC oper 6D 3 4
etc..
What I would like to do is create 6 different lists and add every word in each column on a single line to the separate lists, so that the output becomes this
addressing: ['immediate', 'absolute']
symbol: ['ADC', 'ADC']
symbol2: ['#oper', 'oper']
opcode: ['69', '6D']
bytes: ['2', '3']
cycles: ['2', '4']
I'm trying to do this in Python but at the moment my code isn't working and adds every word into every list:
addressing: ['immediate', 'ADC', '#oper', '69', '2', '2', 'absolute', 'ADC', 'oper', '6D', '3', '4']
symbol: ['immediate', 'ADC', '#oper', '69', '2', '2', 'absolute', 'ADC', 'oper', '6D', '3', '4']
symbol2: ['immediate', 'ADC', '#oper', '69', '2', '2', 'absolute', 'ADC', 'oper', '6D', '3', '4']
opcode: ['immediate', 'ADC', '#oper', '69', '2', '2', 'absolute', 'ADC', 'oper', '6D', '3', '4']
bytes: ['immediate', 'ADC', '#oper', '69', '2', '2', 'absolute', 'ADC', 'oper', '6D', '3', '4']
cycles: ['immediate', 'ADC', '#oper', '69', '2', '2', 'absolute', 'ADC', 'oper', '6D', '3', '4']
How can I change the below code so that it produces the output I want?
addressing = []
symbol = []
symbol2 = []
opcode = []
bytes = []
cycles = []
index = 1
for line in f:
    for word in line.split():
        if index == 1:
            addressing.append(word)
            index += 1
            print(index)
        if index == 2:
            symbol.append(word)
            index += 1
            print(index)
        if index == 3:
            symbol2.append(word)
            index += 1
            print(index)
        if index == 4:
            opcode.append(word)
            index += 1
            print(index)
        if index == 5:
            bytes.append(word)
            index += 1
            print(index)
        if index == 6:
            cycles.append(word)
            index += 1
            print(index)
        index = 1
There are two ways to solve this:
The static way, which assumes the format will never change and each row will have the same number of values.
The dynamic way, which is flexible to format changes and a variable number of items per row, assuming that the order of the items remains the same.
I'll detail both ways below.
The Static Way:
Split the line and append using indexes.
addressing = []
symbol = []
symbol2 = []
opcode = []
bytes = []
cycles = []
for line in f:
    splitted = line.split()
    addressing.append(splitted[0])
    symbol.append(splitted[1])
    symbol2.append(splitted[2])
    opcode.append(splitted[3])
    bytes.append(splitted[4])
    cycles.append(splitted[5])
Dynamic Way: Create a dictionary and iterate over keys.
information = {}
information['addressing'] = []
information['symbol'] = []
information['symbol2'] = []
information['opcode'] = []
information['bytes'] = []
information['cycles'] = []
key_list = list(information.keys())
for line in f:
    splitted = line.split()
    for i in range(0,len(splitted)):
        information[key_list[i]].append(splitted[i])
print(information)
You can use regular expressions to split each line at the longest block of \s:
import re
f = [re.split(r'\s+', i.strip('\n')) for i in open('filename.txt')]
final_data = [{a:list(i)} for a, i in zip(['addressing', 'symbol', 'symbol2', 'opcode', 'bytes', 'cycles'], zip(*f))]
Output:
[{'addressing': ['immediate', 'absolute']}, {'symbol': ['ADC', 'ADC']}, {'symbol2': ['#oper', 'oper']}, {'opcode': ['69', '6D']}, {'bytes': ['2', '3']}, {'cycles': ['2', '4']}]
You can use the built-in zip function to transpose your rows of data into columns. The code below puts the data into a dictionary of tuples, with the field names as the keys. For this demo I've embedded the data into the script, since that's simpler than reading from a file, but it's easy to modify the code to read from a file.
file_data = '''\
immediate ADC #oper 69 2 2
absolute ADC oper 6D 3 4
'''.splitlines()
fields = 'addressing', 'symbol', 'symbol2', 'opcode', 'bytes', 'cycles'
values = zip(*[row.split() for row in file_data])
data = dict(zip(fields, values))
for k in fields:
    print(k, data[k])
output
addressing ('immediate', 'absolute')
symbol ('ADC', 'ADC')
symbol2 ('#oper', 'oper')
opcode ('69', '6D')
bytes ('2', '3')
cycles ('2', '4')
If you really want separate named variables, that's even easier, but as you can see it's more painful to work with.
file_data = '''\
immediate ADC #oper 69 2 2
absolute ADC oper 6D 3 4
'''.splitlines()
(addressing, symbol, symbol2,
opcode, bytecode, cycles) = zip(*[row.split() for row in file_data])
print(addressing)
print(symbol)
print(symbol2)
print(opcode)
print(bytecode)
print(cycles)
output
('immediate', 'absolute')
('ADC', 'ADC')
('#oper', 'oper')
('69', '6D')
('2', '3')
('2', '4')
The issue is that you're incrementing the index in every if block. So at the end of this block:
if index == 1:
    addressing.append(word)
    index += 1
    print(index)
The value of index is 2. Then when it hits if index == 2: that evaluates to True, adds that word to the second list, increments the index, and so on.
You could solve this by changing the inner for loop to for index in range(1, 7): and no longer incrementing index manually, but if you know that every line has 6 words it might be better to remove the inner for loop altogether and assign the words to the lists manually.
for line in f:
    words = line.split()
    addressing.append(words[0])
    symbol.append(words[1])
    ...etc
As already commented, you should remove all index += 1 statements and leave just a single index += 1 right at the end of the inner for loop. Or use elif instead of if.
Also, consider using enumerate(). There is no need to manually update the index variable:
# Example use of enumerate()
for line in f:
    for index, word in enumerate(line.split()):
        print(index, word)
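Building on enumerate(), the index can pick the target list directly, so no if chain is needed. This is only a sketch: io.StringIO stands in for the opened file f, and names and columns are made up for the example. It assumes every line has at most six whitespace-separated words.

```python
import io

# Stand-in for the opened file `f` (assumption: whitespace-separated columns).
f = io.StringIO("immediate ADC #oper 69 2 2\nabsolute ADC oper 6D 3 4\n")

# Column names in file order; each one gets its own list.
names = ['addressing', 'symbol', 'symbol2', 'opcode', 'bytes', 'cycles']
columns = {name: [] for name in names}

for line in f:
    # enumerate() yields 0-based positions, which index straight into names.
    for index, word in enumerate(line.split()):
        columns[names[index]].append(word)

for name in names:
    print(name, columns[name])
```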

Get a running total from a list

I'm reading in items:
for line in sys.stdin:
    line = line.strip()
    data = line.split("-")
If I print data as it is read, it looks like:
['Adam', '5']
['Peter', '7']
['Adam', '8']
['Lucy', '2']
['Peter', '4']
How can I get a running total for each unique name, such that my new list would look like:
['Adam', '13'],
['Peter', '11'],
['Lucy', '2']
Use a collections.Counter() to accumulate the total for each name:
import collections
lines = [['Adam', '5'],
['Peter', '7'],
['Adam', '8'],
['Lucy', '2'],
['Peter', '4']]
counter = collections.Counter()
for data in lines:
    counter[data[0]] += int(data[1])
print(counter)
You'll get:
Counter({'Adam': 13, 'Peter': 11, 'Lucy': 2})
Initialize a defaultdict with type int and use the name as the key:
import sys
from collections import defaultdict
name_list = defaultdict(int)
for line in sys.stdin:
    line = line.strip()
    data = line.split("-")
    name = data[0]
    value = int(data[1])
    name_list[name] += value
for key, value in name_list.items():
    print(key, value)
I recommend creating a dictionary and updating it as you go. I have assumed your data is a list of lists.
finalList = {}
for name, value in data:
    if name in finalList.keys():
        finalList[name] = finalList[name] + int(value)
    else:
        finalList[name] = int(value)
print(finalList)
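To connect the dictionary approach to the "Name-value" lines and the list-of-lists output in the question, a small sketch follows. The lines list is a stand-in for sys.stdin, and totals and result are names chosen for the example; the totals are converted back to strings because the desired output in the question shows them quoted.

```python
# Stand-in for the lines read from sys.stdin in the question.
lines = ["Adam-5", "Peter-7", "Adam-8", "Lucy-2", "Peter-4"]

totals = {}
for line in lines:
    name, value = line.strip().split("-")
    # dict.get() supplies 0 the first time a name is seen.
    totals[name] = totals.get(name, 0) + int(value)

# Back to a list of [name, total] pairs, in first-seen order (Python 3.7+).
result = [[name, str(total)] for name, total in totals.items()]
print(result)
```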
Pandas handles this kind of situation very well:
import pandas as pd
# path is the location of the input file shown below
df_data=pd.read_csv(filepath_or_buffer=path,sep='_',names =['Name','value'])
df=df_data.groupby(['Name'])['value'].sum()
print(df)
output
'Adam' 13
'Lucy' 2
'Peter' 11
Input file
Adam_5
Peter_7
Adam_8
Lucy_2
Peter_4

How to remove the first occurrence of an integer in a list

This is my code:
positions = []
for i in lines[2]:
    if i not in positions:
        positions.append(i)
print (positions)
print (lines[1])
print (lines[2])
the output is:
['1', '2', '3', '4', '5']
['is', 'the', 'time', 'this', 'ends']
['1', '2', '3', '4', '1', '5']
I want the variable "positions" to be ['2','3','4','1','5'],
so instead of dropping the second occurrence of a duplicate in "lines[2]" it should drop the first one.
You can reverse your list, create the positions and then reverse it back, as mentioned by @tobias_k in the comments:
lst = ['1', '2', '3', '4', '1', '5']
positions = []
for i in reversed(lst):
    if i not in positions:
        positions.append(i)
list(reversed(positions))
# ['2', '3', '4', '1', '5']
You'll need to first detect what values are duplicated before you can build positions. Use a collections.Counter() object to test if a value has been seen more than once:
from collections import Counter
counts = Counter(lines[2])
positions = []
for i in lines[2]:
    counts[i] -= 1
    if counts[i] == 0:
        # only add if this is the 'last' value
        positions.append(i)
This'll work for any number of repetitions of values; only the last value to appear is ever used.
You could also reverse the list, and track what you have already seen with a set, which is faster than testing against the list:
positions = []
seen = set()
for i in reversed(lines[2]):
    if i not in seen:
        # only add if this is the first time we see the value
        positions.append(i)
        seen.add(i)
positions = positions[::-1]  # reverse the output list
Both approaches require two iterations: in the first approach, one to create the counts mapping; in the second, one to reverse the output list. Which is faster will depend on the size of lines[2] and the number of duplicates in it, and whether or not you are using Python 3 (where Counter performance was significantly improved).
You can use a dictionary to save the last position of each element and then build a new list with that information:
>>> data=['1', '2', '3', '4', '1', '5']
>>> temp={ e:i for i,e in enumerate(data) }
>>> sorted(temp, key=lambda x:temp[x])
['2', '3', '4', '1', '5']
>>>
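A compact variant of the reverse-and-deduplicate idea, sketched here under the assumption of Python 3.7 or later, where dictionaries keep insertion order. dict.fromkeys() keeps the first time each key appears, so feeding it the reversed list keeps the last occurrence of each value in the original order.

```python
data = ['1', '2', '3', '4', '1', '5']

# Deduplicate from the end, then flip the result back to the original direction.
positions = list(dict.fromkeys(reversed(data)))[::-1]
print(positions)
```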

Getting rid of characters in CSV file to get mean of columns

I asked for help a while ago and thought I had what I was looking for, but unfortunately I ran into another problem. In my CSV file I have ?'s in place of missing data in some rows of the 13 columns. I have an idea of how to fix it but have yet to be successful in implementing it. My current idea is to use ord and chr to change the ? to 0, but I'm not sure how to apply that to a list. This is the error I get:
File "C:\Users\David\Documents\Python\asdf.py", line 46, in <module>
iList_sum[i] += float(ill_data[i])
ValueError: could not convert string to float: '?'
Just so you know, I cannot use NumPy or pandas. I am also trying to refrain from using mapping, since I am trying to keep the code very simple.
import csv
#turn csv files into a list of lists
with open('train.csv','rU') as csvfile:
    reader = csv.reader(csvfile)
    csv_data = list(reader)
# Create two lists to handle the patients
# And two more lists to collect the 'sum' of the columns
# The one that needs to hold the sum 'must' have 0 so we
# can work with them more easily
iList = []
iList_sum = [0,0,0,0,0,0,0,0,0,0,0,0,0]
hList = []
hList_sum = [0,0,0,0,0,0,0,0,0,0,0,0,0]
# Only use one loop to make the process mega faster
for row in csv_data:
    # If row 13 is greater than 0, then place them as unhealthy
    if (row and int(row[13]) > 0):
        # This appends the whole 'line'/'row' for storing :)
        # That's what you want (instead of saving only one cell at a time)
        iList.append(row)
    # If it failed the initial condition (greater than 0), then row 13
    # is either less than or equal to 0. That's simply the logical outcome
    else:
        hList.append(row)
# Use these to verify the data and make sure we collected the right thing
# print iList
# [['67', '1', '4', '160', '286', '0', '2', '108', '1', '1.5', '2', '3', '3', '2'], ['67', '1', '4', '120', '229', '0', '2', '129', '1', '2.6', '2', '2', '7', '1']]
# print hList
# [['63', '1', '1', '145', '233', '1', '2', '150', '0', '2.3', '3', '0', '6', '0'], ['37', '1', '3', '130', '250', '0', '0', '187', '0', '3.5', '3', '0', '3', '0']]
# We can use list comprehension, but since this is a beginner task, let's go with basics:
# Loop through all the 'rows' of the ill patient
for ill_data in iList:
    # Loop through the data within each row, and sum them up
    for i in range(0,len(ill_data) - 1):
        iList_sum[i] += float(ill_data[i])
# Now repeat the process for healthy patient
# Loop through all the 'rows' of the healthy patient
for healthy_data in hList:
    # Loop through the data within each row, and sum them up
    for i in range(0,len(healthy_data) - 1):
        hList_sum[i] += float(healthy_data[i])
# Using list comprehension, I basically go through each number
# In ill list (sum of all columns), and divide it by the lenght of iList that
# I found from the csv file. So, if there are 22 ill patients, then len(iList) will
# be 22. You can see that the whole thing is wrapped in brackets, so it would show
# as a python list
ill_avg = [ ill / len(iList) for ill in iList_sum]
hlt_avg = [ hlt / len(hList) for hlt in hList_sum]
Here is a screenshot of the CSV file.
Simply check the value you get from the list:
# Loop through the data within each row, and sum them up
qmark_counter = 0
for i in range(0,len(ill_data) - 1):
    if ill_data[i] == '?':
        val = 0
        qmark_counter += 1
    else:
        val = ill_data[i]
    iList_sum[i] += float(val)
And so on for the other ones. There are many other improvements that could be done; for instance, I would put the snippet of code in a function so that it does not have to be repeated multiple times.
EDIT: added the counter for question marks. If you want to keep track of question marks separately for each list, you may want to use a dictionary.
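As a rough illustration of putting the snippet into a function so it can be reused for both patient lists: column_sums is a hypothetical helper name, the rows are made-up sample data, and '?' is treated as contributing 0, as in the answer above.

```python
def column_sums(rows, n_cols):
    """Sum the first n_cols columns of rows, counting '?' cells as 0."""
    sums = [0.0] * n_cols
    qmark_counter = 0
    for row in rows:
        for i in range(n_cols):
            if row[i] == '?':
                qmark_counter += 1      # missing value: adds nothing to the sum
            else:
                sums[i] += float(row[i])
    return sums, qmark_counter

# Made-up rows: three data columns followed by the diagnosis column.
iList = [['67', '1', '?', '2'], ['63', '?', '4', '1']]

iList_sum, qmarks = column_sums(iList, 3)
ill_avg = [total / len(iList) for total in iList_sum]
print(ill_avg, qmarks)
```

Keep in mind that counting '?' as 0 pulls the averages down; dividing each column by the number of non-missing values instead would be a different design choice.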
