Averaging specific list elements iteratively? - python

Say I have a dataset with a variable, lines, that looks like this:
lines = ['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6']
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6']
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6']
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6']
How do I, only for lines whose first element (lines[0]) is exactly the same, average specific values in the rest of the list and combine them into one averaged list? Of course, I will have to convert all the numbers to floats.
In the specific example, I want a single list where all the numeric values besides lines[1] and lines[-1] are averaged. Is there an easy way?
Expected output
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', 1, avg_of_var, avg_of_var, avg, , '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6']
Basically (and I see now that my example data is unfortunate, as all values are the same) I want a single list containing the average of the numeric values of the four lines in the example.

You can use pandas to create a DataFrame, group by lines[0], and then aggregate by mean (for the desired columns only). However, you also need to specify an aggregation method for the other columns. I will assume you need the mean for those columns as well.
import pandas as pd
from numpy import mean
lines = [['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', 1, 10, 38, 0.0, 9,
20050407, 20170319, 0, 0, 0, 0, 1, 1, 281.6],
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', 1, 10, 38, 0.0, 9,
20050407, 20170319, 0, 0, 0, 0, 1, 1, 281.6],
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', 1, 10, 38, 0.0, 9,
20050407, 20170319, 0, 0, 0, 0, 1, 1, 281.6],
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', 1, 10, 38, 0.0, 9,
20050407, 20170319, 0, 0, 0, 0, 1, 1, 281.6]]
# I have removed the quotes around numbers for simplification but this can also be handled by pandas.
# create a data frame and give names to your fields.
# Here 'KEY' is the name of the first field we will use for grouping
df = pd.DataFrame(lines,columns=['KEY','a','b','c','d','e','f','g','h','i','j','k','l','m','n'])
This yields something like this:
KEY a b c d e f g h i j k l m n
0 QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ= 1 10 38 0.0 9 20050407 20170319 0 0 0 0 1 1 281.6
1 QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ= 1 10 38 0.0 9 20050407 20170319 0 0 0 0 1 1 281.6
2 QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ= 1 10 38 0.0 9 20050407 20170319 0 0 0 0 1 1 281.6
3 QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ= 1 10 38 0.0 9 20050407 20170319 0 0 0 0 1 1 281.6
This is the operation you are looking for:
data = df.groupby('KEY',as_index=False).aggregate(mean)
This yields:
KEY a b c d e f g h i j k l m n
0 QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ= 1 10 38 0.0 9 20050407 20170319 0 0 0 0 1 1 281.6
You can specify the aggregation type by field by using a dictionary (assuming 'mean' for every field):
data = df.groupby('KEY',as_index=False).aggregate({'a':mean,'b':mean,'c':mean,'d':mean,'e':mean,'f':mean,'g':mean,'h':mean,'i':mean,'j':mean,'k':mean,'l':mean,'m':mean,'n':mean})
More information about groupby can be found here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.DataFrameGroupBy.agg.html
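The answer above mentions that quoted numbers can also be handled by pandas and that the aggregation can differ per column. A minimal sketch of both points follows, assuming shortened hypothetical rows (keys `k1`/`k2` and the column names `a`, `b`, `c`, `n` are made up for illustration) and assuming that the non-averaged columns should simply keep the first value seen in each group:

```python
import pandas as pd

# Hypothetical, shortened rows: key first, then numbers stored as strings
lines = [
    ['k1', '1', '10', '38', '281.6'],
    ['k1', '1', '20', '40', '281.6'],
    ['k2', '2', '5', '7', '100.0'],
]
cols = ['KEY', 'a', 'b', 'c', 'n']
df = pd.DataFrame(lines, columns=cols)

# Convert every column except the key from str to float
df = df.astype({c: float for c in cols[1:]})

# Average b and c; keep the first value of a and n within each group
data = df.groupby('KEY', as_index=False).agg(
    {'a': 'first', 'b': 'mean', 'c': 'mean', 'n': 'first'})

result = data.values.tolist()
print(result)
```

Using the string names `'mean'` and `'first'` avoids importing `mean` from numpy; whether `'first'` is the right choice for the untouched columns depends on the data.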

This simple Python snippet should work:
# I am assuming lines is a list of line
lines = [['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6'],
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6'],
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6'],
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6']]
# Use a dict keyed on line[0] to group the lines.
# The first time a key is seen, store its values converted to float;
# otherwise add each value to the running sum at the same index.
# Also keep track of the number of lines to compute the average at the end.
average = {}
for line in lines:
    # first time: store the data as floats and initialise qty to 1
    if line[0] not in average:
        average[line[0]] = {
            'data': [float(value) for value in line[1:]],
            'qty': 1
        }
        continue
    # add column data after type conversion to float
    for i, value in enumerate(line[1:]):
        average[line[0]]['data'][i] += float(value)
    average[line[0]]['qty'] += 1
# now create another list of the merged lines
merged_lines = []
for key in average:
    line = [key]
    # divide each sum by the number of lines to get the average
    for element in average[key]['data']:
        line.append(element / average[key]['qty'])
    merged_lines.append(line)
print(merged_lines)
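The question asks for only some columns to be averaged while the rest stay as they are. A minimal pure-Python sketch of that variant follows; the helper name `average_by_key`, the `avg_columns` parameter, and the shortened sample rows are all hypothetical, and it assumes that non-averaged columns should keep the value from the first line of each group:

```python
from collections import defaultdict
from statistics import mean


def average_by_key(lines, avg_columns):
    """Merge lines sharing line[0], averaging only the given column indexes."""
    groups = defaultdict(list)
    for line in lines:
        groups[line[0]].append(line)

    merged = []
    for key, rows in groups.items():
        out = list(rows[0])  # non-averaged columns keep the first row's value
        for i in avg_columns:
            out[i] = mean(float(row[i]) for row in rows)
        merged.append(out)
    return merged


# Hypothetical, shortened sample data
lines = [['k1', '1', '10', '38', '281.6'],
         ['k1', '1', '20', '40', '281.6'],
         ['k2', '2', '5', '7', '100.0']]
merged = average_by_key(lines, avg_columns=[2, 3])
print(merged)
```

Grouping first and averaging afterwards keeps the conversion to float in one place, at the cost of holding all rows of a group in memory.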

Related

how do i convert results to list python

I have code that takes two columns of a CSV file, which I've converted into lists, and compares the numbers in them with < and >. Now I want to collect the results of this comparison into multiple lists, each holding six results.
eg I get
1
2
3
4
5
6
7
8
9
10
11
12
and i want to display this as
[1,2,3,4,5,6]
[7,8,9,10,11,12]
this is the code I'm using for comparing the lists
'''
for i in range(len(fsa)):
    if fsa[i] < ghf[i]:
        print('1')
    else:
        print('0')
'''
The code that's not working, which is meant to show the results in lists of six, is this one:
'''
print()
start = 0
end = len(''' i want the length of my results from the previous code, the 1's and 0's here. ''')
for x in range(start,end,6):
    print('''i want the results here as my list'''[x:x+6])
'''
I'm a beginner, please help, how do i make the results a list?
I got the answer I wanted. In case someone else is struggling with this as
well, here's my solution:
'''
kol = []
for i in range(len(fsa)):
    if fsa[i] < ghf[i]:
        kol.append('1')
    else:
        kol.append('0')
start = 0
end = len(fsa)
for x in range(start,end,6):
    print(kol[x:x+6])
'''
outcome
'''
['1', '1', '0', '0', '1', '1']
['1', '0', '0', '1', '0', '0']
['0', '0', '0', '0', '1', '1']
['1', '1', '1', '0', '1', '1']
'''
You just need to make a new list and append to it instead of printing.
...
...
temp = []
for x in range(start,end,6):
    temp.append(fsa[x:x+6])
print(temp)
#[[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
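Putting the two steps together (collect the comparison results, then slice them into groups of six) could look like the sketch below. The helper name `chunk` and the sample values for `fsa` and `ghf` are hypothetical; the real lists come from the CSV columns.

```python
def chunk(items, size):
    """Split items into consecutive sublists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Hypothetical sample columns
fsa = [1, 5, 2, 8, 3, 9, 4]
ghf = [2, 4, 3, 7, 4, 8, 5]

# '1' where the fsa value is smaller than the ghf value, otherwise '0'
results = ['1' if a < b else '0' for a, b in zip(fsa, ghf)]
groups = chunk(results, 6)

for group in groups:
    print(group)
```

The last group is shorter when the number of results is not a multiple of six.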

How to transform index values into columns using Pandas?

I have a dictionary like this:
my_dict = {'RuleSet': {'0': {'RuleSetID': '0',
'RuleSetName': 'Allgemein',
'Rules': [{'RulesID': '10',
'RuleName': 'Gemeinde Seiten',
'GroupHits': '2',
'KeyWordGroups': ['100', '101', '102']}]},
'1': {'RuleSetID': '1',
'RuleSetName': 'Portale Berlin',
'Rules': [{'RulesID': '11',
'RuleName': 'Portale Berlin',
'GroupHits': '4',
'KeyWordGroups': ['100', '101', '102', '107']}]},
'6': {'RuleSetID': '6',
'RuleSetName': 'Zwangsvollstr. Berlin',
'Rules': [{'RulesID': '23',
'RuleName': 'Zwangsvollstr. Berlin',
'GroupHits': '1',
'KeyWordGroups': ['100', '101']}]}}}
When using this code snippet it can be transformed into a dataframe:
rules_pd = pd.DataFrame(my_dict['RuleSet'])
rules_pd
The result is:
I would like to make it look like this:
Does anyone know how to tackle this challenge?
Doing from_dict with index
out = pd.DataFrame.from_dict(my_dict['RuleSet'],'index')
Out[692]:
RuleSetID ... Rules
0 0 ... [{'RulesID': '10', 'RuleName': 'Gemeinde Seite...
1 1 ... [{'RulesID': '11', 'RuleName': 'Portale Berlin...
6 6 ... [{'RulesID': '23', 'RuleName': 'Zwangsvollstr....
[3 rows x 3 columns]
#out.columns
#Out[693]: Index(['RuleSetID', 'RuleSetName', 'Rules'], dtype='object')
You could try using transpose():
rules_pd = pd.DataFrame(my_dict['RuleSet']).transpose()
print(rules_pd)

Is there a way to compare a list index with a dictionary key, then rearrange that list element to be in the value position?

I have a dictionary that defines the mapping I want. For example, I want list item 2 to be moved to the list item 5 position.
# FOR ENCRYPTION - Create dictionary mapping inputs to outputs of pbox of form {input:output}
pboxEncrypt = {1: 1, 2: 5, 3: 9, 4: 13, 5: 2, 6: 6, 7: 10, 8: 14, 9: 3, 10: 7, 11: 11, 12: 15, 13: 4, 15: 12, 16: 16}
I have a divided_list containing 16 individual binary string values, namely:
Divided List : ['0', '0', '1', '1', '1', '0', '0', '0', '0', '0', '1', '0', '0', '0', '0', '0']
I would like to build a function that takes divided_list and pbx and loops over the list, checking whether each list index matches a dictionary key. If the list index equals a key, the list element is stored in a temporary list at the index given by the dictionary value. The function returns the temporary list of rearranged values.
(assuming an index starting position of 1) an example output of this function would be:
list item at index position #3 ['1'] should be stored in the temporary list at index position #9
divided_list = [x , x , 1 , x , x , x , x , x , x , x , x , x , x , x , x , x ]
After dictionary mapping swap
temp_list = [x , x , x , x , x , x , x , x , 1 , x , x , x , x , x , x , x ]
Now for my approach to solving this issue:
def mapPBOX(div_list, pbx):
    temp_list = []
    for i in range(1,16):
        key_list = pbx.keys() #make list of dict keys
        val_list = pbx.values() #make list of dict values
        for match in key_list:
            if div_list[i] == key_list[match]:
                temp_list[val_list[match]] = div_list[i] #map
    return temp_list
Which returns an error of:
if div_list[i] == key_list[match]:
TypeError: 'dict_keys' object is not subscriptable
If temp_list[val_list[match]] = div_list[i] is not subscriptable, how should I retrieve the numerical value in the dictionary, and then perform a swap on the div_list item location?
I am familiar with using arrays in other languages like C, but python and dictionaries are both new to me. Thanks for your help
IIUC, a simple list comprehension would work.
Note that indexing starts at 0 in Python, so you need to correct for that here.
Also, your dictionary is missing a mapping (there is no key 14). I left indexes without a mapping unchanged.
pboxEncrypt = {1: 1, 2: 5, 3: 9, 4: 13, 5: 2, 6: 6, 7: 10, 8: 14, 9: 3, 10: 7, 11: 11, 12: 15, 13: 4, 15: 12, 16: 16}
divded_list = ['0', '0', '1', '1', '1', '0', '0', '0', '0', '0', '1', '0', '0', '0', '0', '0']
out = [divded_list[pboxEncrypt.get(i+1, i+1)-1]
       for i in range(len(divded_list))]
Output:
['0', '1', '0', '0', '0', '0', '0', '0', '1', '0', '1', '0', '1', '0', '0', '0']
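For comparison, the function described in the question (move the item at 1-based position `key` to position `value`) can be sketched as below. The name `map_pbox` is hypothetical, and the entry `14: 8` is an assumption added to complete the mapping; it is not in the original dictionary.

```python
def map_pbox(div_list, pbx):
    """Return a new list where the item at 1-based position k moves to pbx[k]."""
    out = list(div_list)  # positions without a mapping keep their original value
    for src, dst in pbx.items():
        out[dst - 1] = div_list[src - 1]  # convert 1-based positions to 0-based
    return out


# Mapping from the question, with 14: 8 added as an assumption
pbox_encrypt = {1: 1, 2: 5, 3: 9, 4: 13, 5: 2, 6: 6, 7: 10, 8: 14,
                9: 3, 10: 7, 11: 11, 12: 15, 13: 4, 14: 8, 15: 12, 16: 16}
divided_list = ['0', '0', '1', '1', '1', '0', '0', '0',
                '0', '0', '1', '0', '0', '0', '0', '0']

permuted = map_pbox(divided_list, pbox_encrypt)
print(permuted)
```

Iterating over `pbx.items()` avoids indexing into `dict_keys`, which is what raised the TypeError in the question.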

How can I loop through a Python list and perform math calculations on elements of the list?

I am attempting to create a contract bridge match point scoring system. In the list below the 1st, 3rd, etc. numbers are the pair numbers (players) and the 2nd, 4th etc. numbers are the scores achieved by each pair. So pair 2 scored 430, pair 3 scored 420 and so on.
I want to loop through the list and score as follows:
for each pair score that pair 2 beats they receive 2 points, for each they tie 1 point and where they don't beat they get 0 points. The loop then continues and compares each pair's score in the same way. In the example below, pair 2 gets 7 points (beating 3 other pairs and a tie with 1), pair 7 gets 0 points, pair 6 gets 12 points beating every other pair.
My list (generated from an elasticsearch json object) is:
['2', '430', '3', '420', '4', '460', '5', '400', '7', '0', '1', '430', '6', '480']
The python code I have tried (after multiple variations) is:
nsp_mp = 0
ewp_mp = 0
ns_list = []
for row in arr["hits"]["hits"]:
    nsp = row["_source"]["nsp"]
    nsscore = row["_source"]["nsscore"]
    ns_list.append(nsp)
    ns_list.append(nsscore)
print(ns_list)
x = ns_list[1]
for i in range(6): #number of competing pairs
    if x > ns_list[1::2][i]:
        nsp_mp = nsp_mp + 2
    elif x == ns_list[1::2][i]:
        nsp_mp = nsp_mp
    else:
        nsp_mp = nsp_mp + 1
print(nsp_mp)
which produces:
['2', '430', '3', '420', '4', '460', '5', '400', '7', '0', '1', '430', '6', '480']
7
which as per calculation above is correct. But when I try to execute a loop it does not return the correct results.
Maybe the approach is wrong. What is the correct way to do this?
The elasticsearch json object is:
arr = {'took': 0, 'timed_out': False, '_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0}, 'hits': {'total': 7, 'max_score': 1.0, 'hits': [{'_index': 'match', '_type': 'score', '_id': 'L_L122cBjpp4O0gQG0qd', '_score': 1.0, '_source': {'tournament_id': 1, 'board_number': '1', 'nsp': '2', 'ewp': '9', 'contract': '3NT', 'by': 'S', 'tricks': '10', 'nsscore': '430', 'ewscore': '0', 'timestamp': '2018-12-23T16:45:32.896151'}}, {'_index': 'match', '_type': 'score', '_id': 'MPL122cBjpp4O0gQHEog', '_score': 1.0, '_source': {'tournament_id': 1, 'board_number': '1', 'nsp': '3', 'ewp': '10', 'contract': '4S', 'by': 'N', 'tricks': '10', 'nsscore': '420', 'ewscore': '0', 'timestamp': '2018-12-23T16:45:33.027631'}}, {'_index': 'match', '_type': 'score', '_id': 'MfL122cBjpp4O0gQHEqk', '_score': 1.0, '_source': {'tournament_id': 1, 'board_number': '1', 'nsp': '4', 'ewp': '11', 'contract': '3NT', 'by': 'N', 'tricks': '11', 'nsscore': '460', 'ewscore': '0', 'timestamp': '2018-12-23T16:45:33.158060'}}, {'_index': 'match', '_type': 'score', '_id': 'MvL122cBjpp4O0gQHUoj', '_score': 1.0, '_source': {'tournament_id': 1, 'board_number': '1', 'nsp': '5', 'ewp': '12', 'contract': '3NT', 'by': 'S', 'tricks': '10', 'nsscore': '400', 'ewscore': '0', 'timestamp': '2018-12-23T16:45:33.285460'}}, {'_index': 'match', '_type': 'score', '_id': 'NPL122cBjpp4O0gQHkof', '_score': 1.0, '_source': {'tournament_id': 1, 'board_number': '1', 'nsp': '7', 'ewp': '14', 'contract': '3NT', 'by': 'S', 'tricks': '8', 'nsscore': '0', 'ewscore': '50', 'timestamp': '2018-12-23T16:45:33.538710'}}, {'_index': 'match', '_type': 'score', '_id': 'LvL122cBjpp4O0gQGkqt', '_score': 1.0, '_source': {'tournament_id': 1, 'board_number': '1', 'nsp': '1', 'ewp': '8', 'contract': '3NT', 'by': 'N', 'tricks': '10', 'nsscore': '430', 'ewscore': '0', 'timestamp': '2018-12-23T16:45:32.405998'}}, {'_index': 'match', '_type': 'score', '_id': 'M_L122cBjpp4O0gQHUqg', '_score': 1.0, '_source': {'tournament_id': 1, 
'board_number': '1', 'nsp': '6', 'ewp': '13', 'contract': '4S', 'by': 'S', 'tricks': '11', 'nsscore': '480', 'ewscore': '0', 'timestamp': '2018-12-23T16:45:33.411104'}}]}}
A list appears to be a poor data structure for this; I think you are making everything worse by flattening your elasticsearch object.
Note there are a few minor mistakes in listings below - to make sure
I'm not solving someone's homework for free. I also realize this is
not the most efficient way of doing so.
Try with dicts:
1) convert elasticsearch json you have to a dict with a better structure:
scores = {}
for row in arr["hits"]["hits"]:
    nsp = row["_source"]["nsp"]
    nsscore = row["_source"]["nsscore"]
    scores[nsp] = nsscore
This will give you something like this:
{'1': '430',
'2': '430',
'3': '420',
'4': '460',
'5': '400',
'6': '480',
'7': '0'}
2) write a function to calculate pair score:
def calculate_score(pair, scores):
    score = 0
    for p in scores:
        if p == pair:
            continue
        if scores[p] < scores[pair]:
            score += 2 # win
        elif scores[p] == scores[pair]:
            score += 1
    return score
This should give you something like this:
In [13]: calculate_score('1', scores)
Out[13]: 7
In [14]: calculate_score('7', scores)
Out[14]: 0
3) loop over all pairs, calculating scores. I'll leave this as exercise.
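One possible way to do step 3 is sketched below. It is self-contained rather than reusing the function above, it converts the scores to int so the comparisons are numeric, and the name `match_points` is hypothetical.

```python
def match_points(scores):
    """Return {pair: match points}: 2 per pair beaten, 1 per tie, 0 otherwise."""
    points = {}
    for pair, score in scores.items():
        total = 0
        for other, other_score in scores.items():
            if other == pair:
                continue  # a pair is not compared with itself
            if score > other_score:
                total += 2
            elif score == other_score:
                total += 1
        points[pair] = total
    return points


# Scores as produced in step 1 (strings), converted to int before comparing
raw = {'1': '430', '2': '430', '3': '420', '4': '460',
       '5': '400', '6': '480', '7': '0'}
scores = {pair: int(score) for pair, score in raw.items()}
points = match_points(scores)

for pair, total in points.items():
    print(pair, ":", total)
```

This compares every pair with every other pair, which is quadratic in the number of pairs; for a single board that is unlikely to matter.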
The main problem with your code is that the loop is one iteration short: you have 7 entries, but range(6) only covers 6 of them. You should also convert the numbers to int so that the comparison is numeric rather than string-based. In addition, your code awards 0 points for ties.
Instead of a list with flattened pairs, you should use tuple pairs.
ns_list = []
for row in arr["hits"]["hits"]:
    nsp = int(row["_source"]["nsp"])
    nsscore = int(row["_source"]["nsscore"])
    ns_list.append((nsp, nsscore))
print(ns_list)
x = ns_list[0][1]
nsp_mp = 0
# skip the first pair itself so it is not counted as a tie with its own score
for nsp, nsscore in ns_list[1:]:
    if x > nsscore:
        nsp_mp += 2
    elif x == nsscore:
        nsp_mp += 1
print(nsp_mp)
So we can do it like so:
import itertools
d = [(i['_source']['nsp'], i['_source']['nsscore']) for i in arr['hits']['hits']]
d
[('2', '430'),
('3', '420'),
('4', '460'),
('5', '400'),
('7', '0'),
('1', '430'),
('6', '480')]
c = itertools.combinations(d, 2)
counts = {}
for tup in c:
    p1, p2 = tup
    if not counts.get(p1[0]):
        counts[p1[0]] = 0
    if int(p1[1]) > int(p2[1]):
        counts[p1[0]] += 1
counts
{'2': 3, '3': 2, '4': 3, '5': 1, '7': 0, '1': 0}
I first convert your list of scores to a dictionary using itertools, then iterate through each key and compare its value with every value in the dictionary, adding points according to the rules you gave.
With this approach each pair is also compared with itself, which always adds 1, so I subtract 1 from the final score. There may be a better approach for this.
import itertools
ls = ['2', '430', '3', '420', '4', '460', '5', '400', '7', '0', '1', '430', '6', '480']
d = dict(itertools.zip_longest(*[iter(ls)] * 2, fillvalue=""))
values = d.values()
for item in d.keys():
    score = 0
    for i in values:
        # compare as integers; comparing the strings would be lexicographic
        if int(d[item]) > int(i):
            score += 2
        elif int(d[item]) == int(i):
            score += 1
    print(item, ":", score - 1)
Output:
2 : 7
3 : 4
4 : 10
5 : 2
7 : 0
1 : 7
6 : 12

replace blanks in numpy array

The third column in my numpy array is Age. In this column about 75% of the entries are valid and 25% are blank. Column 2 is Gender and using some manipulation I have calculated the average age of the men in my dataset to be 30. The average age of women in my dataset is 28.
I want to replace all blank Age values for men to be 30 and all blank age values for women to be 28.
However I can't seem to do this. Anyone have a suggestion or know what I am doing wrong?
Here is my code:
# my entire data set is stored in a numpy array defined as x
ismale = x[::,1]=='male'
maleAgeBlank = x[ismale][::,2]==''
x[ismale][maleAgeBlank][::,2] = 30
For whatever reason when I'm done with the above code, I type x to display the data set and the blanks still exist even though I set them to 30. Note that I cannot do x[maleAgeBlank] because that list will include some female data points since the female data points are not yet excluded.
Is there any way to get what I want? For some reason, if I do x[ismale][::,1] = 1 (setting the column with 'male' equal to 1), that works, but x[ismale][maleAgeBlank][::,2] = 30 does not work.
sample of array:
#output from typing x
array([['3', '1', '22', ..., '0', '7.25', '2'],
['1', '0', '38', ..., '0', '71.2833', '0'],
['3', '0', '26', ..., '0', '7.925', '2'],
...,
['3', '0', '', ..., '2', '23.45', '2'],
['1', '1', '26', ..., '0', '30', '0'],
['3', '1', '32', ..., '0', '7.75', '1']],
dtype='<U82')
#output from typing x[0]
array(['3', '1', '22', '1', '0', '7.25', '2'],
dtype='<U82')
Note that I have changed column 2 to be 0 for female and 1 for male already in the above output
How about this:
import numpy as np
my_data = np.array([['3', '1', '22', '0', '7.25', '2'],
['1', '0', '38', '0', '71.2833', '0'],
['3', '0', '26', '0', '7.925', '2'],
['3', '0', '', '2', '23.45', '2'],
['1', '1', '26', '0', '30', '0'],
['3', '1', '32', '0', '7.75', '1']],
dtype='<U82')
ismale = my_data[:, 1] == '1'  # 1 = male, 0 = female, as noted in the question
missing_age = my_data[:, 2] == ''
maleAgeBlank = missing_age & ismale
femaleAgeBlank = missing_age & ~ismale
my_data[maleAgeBlank, 2] = '30'
my_data[femaleAgeBlank, 2] = '28'
Result:
>>> my_data
array([[u'3', u'1', u'22', u'0', u'7.25', u'2'],
[u'1', u'0', u'38', u'0', u'71.2833', u'0'],
[u'3', u'0', u'26', u'0', u'7.925', u'2'],
[u'3', u'0', u'28', u'2', u'23.45', u'2'],
[u'1', u'1', u'26', u'0', u'30', u'0'],
[u'3', u'1', u'32', u'0', u'7.75', u'1']],
dtype='<U82')
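As for why the chained assignment in the question has no effect: indexing with a boolean mask returns a copy, so `x[ismale][...] = ...` writes into a temporary array and `x` itself is left alone. A small sketch with a hypothetical three-column array (columns: class, gender, age) shows the difference:

```python
import numpy as np

# Hypothetical sample: class, gender ('1' = male, '0' = female), age
x = np.array([['3', '1', ''],
              ['1', '0', ''],
              ['3', '1', '22']], dtype='<U82')
ismale = x[:, 1] == '1'

# Boolean indexing makes a copy, so this assignment changes only a temporary
x[ismale][:, 2] = '30'
after_chained = x.copy()  # still has the blanks

# Combining the masks and indexing x once writes into x itself
blank = x[:, 2] == ''
x[blank & ismale, 2] = '30'
x[blank & ~ismale, 2] = '28'
print(x)
```

A single indexing expression on the left-hand side of the assignment is what makes NumPy update the original array.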
You can use the where function:
import numpy as np
arr = np.array([['3', '1', '22', '1', '0', '7.25', '2'],
['3', '', '22', '1', '0', '7.25', '2']],
dtype='<U82')
blank = np.where(arr=='')
arr[blank] = 20
array([[u'3', u'1', u'22', u'1', u'0', u'7.25', u'2'],
[u'3', u'20', u'22', u'1', u'0', u'7.25', u'2']],
dtype='<U82')
If you want to change a specific column, you can do the following:
male = np.where((arr[:, 1] == '1') & (arr[:, 2] == ''))[0] # male rows with a blank in column 2
arr[male, 2] = '30'
female = np.where((arr[:, 1] == '0') & (arr[:, 2] == ''))[0] # female rows with a blank in column 2
arr[female, 2] = '28'
You could try iterating through the array in a simpler way. It's not the most efficient solution, but it should get the job done.
for row in x:  # each row is a view, so assigning to it updates x
    if row[2] == '':
        if row[1] == '1':
            row[2] = '30'
        else:
            row[2] = '28'
