I created a list in Python with 18 other lists inside. See the example:
[[3.29588, 3.14241, 2.53874, 1.87257, 1.01365, 0.844504, 0.761601, 1.28007, 1.95795, 2.33491, 3.21032, 3.6976],
[3.74857, 3.4343, 2.97245, 1.7386, 0.931359, 0.82109, 0.840537, 1.46436, 1.75026, 2.467, 3.36575, 3.6428],
[3.2517, 3.37892, 2.84753, 1.7375, 1.11921, 0.761399, 0.780625, 1.40971, 1.80878, 2.49257, 3.0503, 3.22026],
[4.86471, 3.95591, 3.31745, 2.16819, 1.40167, 0.962902, 1.01542, 1.56245, 2.2488, 3.30197, 3.78625, 4.16218],
[4.37859, 3.58889, 2.18892, 1.85142, 1.36302, 1.04413, 1.14967, 1.63279, 2.06895, 3.36799, 3.64174, 4.00779],
[3.78213, 2.85967, 2.29597, 2.0755, 1.32856, 1.07074, 1.05019, 1.43226, 2.01495, 2.96983, 4.20358, 3.97129],
[4.11538, 2.98188, 2.51697, 1.81049, 1.23526, 0.982138, 1.09718, 1.55118, 2.42966, 3.4746, 3.70046, 4.6149],
[4.28626, 4.00553, 3.36899, 2.40897, 1.40696, 0.961761, 0.881263, 1.25325, 2.05434, 2.54193, 4.13187, 4.60115],
[4.15797, 3.16266, 3.31037, 2.16276, 1.42262, 0.924327, 1.11161, 1.57012, 2.21882, 2.94404, 4.18211, 4.19463],
[3.94132, 3.74934, 3.52944, 1.98444, 1.33248, 0.974261, 0.976807, 1.63763, 1.96279, 3.17012, 2.96314, 4.23448],
[4.21067, 4.1027, 3.48602, 2.26189, 1.36373, 1.06551, 1.06262, 1.24214, 2.11701, 3.19951, 3.83816, 4.18072],
[4.52377, 4.02346, 3.10936, 2.41148, 1.44596, 1.03784, 0.997611, 1.66809, 2.2909, 3.13247, 4.07816, 3.4008],
[2.40782, 3.18881, 2.95376, 1.84203, 1.28495, 0.957945, 1.03246, 1.80852, 2.15366, 2.74635, 4.26849, 4.12046],
[4.48346, 3.81883, 2.96019, 2.34712, 1.33384, 1.01678, 1.09052, 1.44302, 2.18529, 3.29472, 3.90009, 4.67098],
[4.34282, 4.45031, 3.55955, 2.35169, 1.44429, 1.02647, 1.24539, 1.73125, 2.3716, 3.3476, 4.21021, 4.11485],
[4.5259, 4.21495, 3.26138, 2.38399, 1.55304, 1.21289, 1.17101, 1.79027, 2.24747, 3.03854, 3.31494, 3.70687],
[4.47717, 4.6265, 3.10359, 2.15151, 1.26597, 0.886686, 1.18106, 1.67292, 2.45298, 3.21713, 4.20611, 4.35356],
[4.10159, 3.83354, 2.95835, 1.65168, 1.26774, 0.846464, 0.943836, 1.49787, 2.01609, 2.84914, 3.47291, 3.63075]]
How can I compute the mean for each position across these lists? I need to take the first element of each list and calculate the mean, then take the second element of each list and calculate the mean, and so on for each of the twelve positions. In the end, I'll have just one list with 12 elements, each being the mean of the corresponding elements of all the lists.
Thank you so much for the help!
Here is a solution (lst is your list of lists):
means = [sum(sublst[i] for sublst in lst) / len(lst) for i in range(len(lst[0]))]
Using map and zip functions would be appropriate here:
list(map(lambda x: sum(x)/len(x), zip(*lst)))
[4.049761666666666,
3.695478333333333,
3.015501666666667,
2.067323888888889,
1.3063504999999997,
0.9665465000000002,
1.0216338888888887,
1.5359944444444444,
2.130572222222222,
2.993912222222222,
3.7513661111111114,
4.029226111111111]
You could also use statistics.mean:
from statistics import mean
list(map(mean, zip(*lst)))
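To see why zip(*lst) works here, it helps to try it on a tiny input: unpacking the outer list passes each row as a separate argument, and zip then groups the items by position, which amounts to a transpose. A minimal sketch (the sample rows and the helper name column_means are made up for illustration, and all rows are assumed to have the same length):

```python
from statistics import mean

def column_means(rows):
    # zip(*rows) yields one tuple per column position
    return [mean(col) for col in zip(*rows)]

rows = [[1.0, 2.0, 3.0],
        [3.0, 4.0, 5.0]]
print(list(zip(*rows)))    # [(1.0, 3.0), (2.0, 4.0), (3.0, 5.0)]
print(column_means(rows))  # [2.0, 3.0, 4.0]
```

Note that zip stops at the shortest row, so rows of unequal length would be truncated silently.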
a = [[3.29588, 3.14241, 2.53874, 1.87257, 1.01365, 0.844504, 0.761601, 1.28007, 1.95795, 2.33491, 3.21032, 3.6976], [3.74857, 3.4343, 2.97245, 1.7386, 0.931359, 0.82109, 0.840537, 1.46436, 1.75026, 2.467, 3.36575, 3.6428], [3.2517, 3.37892, 2.84753, 1.7375, 1.11921, 0.761399, 0.780625, 1.40971, 1.80878, 2.49257, 3.0503, 3.22026], [4.86471, 3.95591, 3.31745, 2.16819, 1.40167, 0.962902, 1.01542, 1.56245, 2.2488, 3.30197, 3.78625, 4.16218], [4.37859, 3.58889, 2.18892, 1.85142, 1.36302, 1.04413, 1.14967, 1.63279, 2.06895, 3.36799, 3.64174, 4.00779], [3.78213, 2.85967, 2.29597, 2.0755, 1.32856, 1.07074, 1.05019, 1.43226, 2.01495, 2.96983, 4.20358, 3.97129], [4.11538, 2.98188, 2.51697, 1.81049, 1.23526, 0.982138, 1.09718, 1.55118, 2.42966, 3.4746, 3.70046, 4.6149], [4.28626, 4.00553, 3.36899, 2.40897, 1.40696, 0.961761, 0.881263, 1.25325, 2.05434, 2.54193, 4.13187, 4.60115], [4.15797, 3.16266, 3.31037, 2.16276, 1.42262, 0.924327, 1.11161, 1.57012, 2.21882, 2.94404, 4.18211, 4.19463], [3.94132, 3.74934, 3.52944, 1.98444, 1.33248, 0.974261, 0.976807, 1.63763, 1.96279, 3.17012, 2.96314, 4.23448], [4.21067, 4.1027, 3.48602, 2.26189, 1.36373, 1.06551, 1.06262, 1.24214, 2.11701, 3.19951, 3.83816, 4.18072], [4.52377, 4.02346, 3.10936, 2.41148, 1.44596, 1.03784, 0.997611, 1.66809, 2.2909, 3.13247, 4.07816, 3.4008], [2.40782, 3.18881, 2.95376, 1.84203, 1.28495, 0.957945, 1.03246, 1.80852, 2.15366, 2.74635, 4.26849, 4.12046], [4.48346, 3.81883, 2.96019, 2.34712, 1.33384, 1.01678, 1.09052, 1.44302, 2.18529, 3.29472, 3.90009, 4.67098], [4.34282, 4.45031, 3.55955, 2.35169, 1.44429, 1.02647, 1.24539, 1.73125, 2.3716, 3.3476, 4.21021, 4.11485], [4.5259, 4.21495, 3.26138, 2.38399, 1.55304, 1.21289, 1.17101, 1.79027, 2.24747, 3.03854, 3.31494, 3.70687], [4.47717, 4.6265, 3.10359, 2.15151, 1.26597, 0.886686, 1.18106, 1.67292, 2.45298, 3.21713, 4.20611, 4.35356], [4.10159, 3.83354, 2.95835, 1.65168, 1.26774, 0.846464, 0.943836, 1.49787, 2.01609, 2.84914, 3.47291, 3.63075]]
avgs = []
cnts = []
N = len(a)
M = len(a[0])
# initialize arrays
for i in range(0, M):
    avgs.append(0)
    cnts.append(0)
# update averages
for i in range(0, N):
    for j in range(0, len(a[i])):
        cnts[j] += 1
        avgs[j] += a[i][j]
# divide by count
for i in range(0, M):
    avgs[i] /= cnts[i]
# print averages
print(avgs)
I am running my own little experiment and need a little help with the code.
I am creating a list that stores 100 sets in index locations 0-99, with each stored set storing random numbers ranging from 1 to 100 that came from a randomly generated list containing 100 numbers.
For each list of numbers, I use set() to filter out any duplicates before appending the resulting set to a list... so basically I have a list of 100 sets which contain numbers between 1 and 100.
I wrote a little bit of code to check the length of each set and noticed that my sets were often 60-69 elements in length! Basically, about 1/3 of all numbers are duplicates.
The code:
from random import randint
sets = []
#Generate list containing 100 sets of sets.
#sets contain numbers between 1 and 100 and no duplicates.
for i in range(0, 100):
    nums = []
    for x in range(1, 101):
        nums.append(randint(1, 100))
    sets.append(set(nums))
#print sizes of each set
for i in range(0, len(sets)):
    print(len(sets[i]))
#I now want to create a final set
#using the data stored within all sets to
#see if there is any unique value.
So here is the bit I can't get my head around...I want to see if there is a unique number in all of those sets! What I can't work out is how I go about doing that.
I know I can directly compare a set with another set if they are stored in their own variables...but I can't work out an efficient way of looping through a list of sets and compare them all to create a new set which, I hope, might contain just one unique value!
I have seen this code in the documentation...
s.symmetric_difference_update(t)
But I can't work out how I might apply that to my code.
Any help would be greatly appreciated!!
You could use a Counter dict to count the occurrences, keeping only the values whose count is 1 across all sets:
from collections import Counter
from itertools import chain
from random import randint
sets = [{randint(1, 100) for _ in range(100)} for i in range(100)]
cn = Counter(chain.from_iterable(sets))
unique = [k for k, v in cn.items() if v == 1] # use {} to get a set
print(unique)
For an element to be unique to a single set, its count must be 1 across all sets in your list.
If we use a simple example where we add a value definitely outside our range:
In [27]: from random import randint
In [28]: from collections import Counter
In [29]: from itertools import chain
In [30]: sets = [{randint(1, 100) for _ in range(100)} for i in range(0, 100)]+ [{1, 2, 102},{3,4,103}]
In [31]: cn = Counter(chain.from_iterable(sets))
In [32]: unique = [k for k, v in cn.items() if v == 1]
In [33]: print(unique)
[103, 102]
If you want to find the sets that contain any of those elements:
In [34]: for st in sets:
   ....:     if not st.isdisjoint(unique):
   ....:         print(st)
   ....:
set([1, 2, 102])
set([3, 4, 103])
For the edited part of your question you can still use a Counter dict, using Counter.most_common to get the minimum and maximum occurrence:
from collections import Counter
from random import randint
MAX = 100
cn = Counter()
identified_sets = 0
sets = ({randint(1, MAX) for _ in range(MAX)} for i in range(MAX))
for i, st in enumerate(sets):
    cn.update(st)
    if len(st) < 60 or len(st) > 70:
        print("Set {} Set Length: {}, Duplicates discarded: {:.0f}% *****".
              format(i, len(st), (float((MAX - len(st)))/MAX)*100))
        identified_sets += 1
    else:
        print("Set {} Set Length: {}, Duplicates discarded: {:.0f}%".
              format(i, len(st), (float((MAX - len(st)))/MAX)*100))
#print lowest frequency
comm = cn.most_common()
print("key {} : count {}".format(comm[-1][0],comm[-1][1]))
#print highest frequency
print("key {} : count {}".format(comm[0][0], comm[0][1]))
print("Count of identified sets: {}, {:.0f}%".
      format(identified_sets, (float(identified_sets)/MAX)*100))
If you call random.seed(0) before you create the sets in this and your own code you will see they both return identical numbers.
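A small sketch of that reproducibility claim, assuming the sets are built with the same sequence of randint calls each time (the helper name make_sets is made up for illustration):

```python
import random

def make_sets(max_value, seed):
    # Re-seeding puts the generator back into the same state,
    # so the same sequence of randint() results follows.
    random.seed(seed)
    return [{random.randint(1, max_value) for _ in range(max_value)}
            for _ in range(max_value)]

first = make_sets(100, seed=0)
second = make_sets(100, seed=0)
print(first == second)  # True
```

This is handy when comparing two versions of the experiment, because any difference in the output then comes from the code rather than from the random draws.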
Well, you can do:
result = set()
for s in sets:
    result.symmetric_difference_update(s)
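One caveat worth checking before relying on this: folding symmetric_difference_update over more than two sets keeps every value that occurs in an odd number of sets, not only the values that occur in exactly one. A small sketch with hand-picked sets (made up for illustration) that compares it with a plain count:

```python
from collections import Counter
from itertools import chain

sets = [{1, 2, 3}, {2, 3, 4}, {3, 5}]

result = set()
for s in sets:
    result.symmetric_difference_update(s)
print(sorted(result))  # [1, 3, 4, 5]  (3 survives: it is in all three sets)

# Counting occurrences keeps only values seen in exactly one set.
counts = Counter(chain.from_iterable(sets))
unique = {k for k, v in counts.items() if v == 1}
print(sorted(unique))  # [1, 4, 5]
```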
After looking through the comments I decided to do things a little differently to accomplish my goal. Essentially, I realised I just wanted to check the frequency of numbers generated by a random number generator after all duplicates have been removed. I thought I could do this by using sets to remove duplicates and then using a set to remove duplicates found in sets... but this actually doesn't work!!
I also noticed that with 100 sets containing a maximum of 100 possible numbers, on average the share of duplicated numbers was around 30-40%. As you increase the maximum number of sets and thus the maximum number of numbers generated, the % of duplicated numbers discarded decreases in a clear pattern.
After further investigation you can work out the % of discarded numbers: it's all down to the probability of hitting the same number again once a number has been generated...
Anyway...thanks for the help!
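The share of discarded duplicates can be estimated with a short calculation. Assuming each draw is independent and uniform, a given value is missed by all n draws with probability (1 - 1/m)^n, so the expected number of distinct values among n draws from m possibilities is m * (1 - (1 - 1/m)^n). A sketch of that formula (the function name expected_distinct is hypothetical):

```python
def expected_distinct(m, n):
    # Each of the m values is missed by all n draws with probability (1 - 1/m)**n.
    return m * (1 - (1 - 1 / m) ** n)

for m in (10, 100, 1000):
    distinct = expected_distinct(m, m)
    share_discarded = 1 - distinct / m
    print(m, round(distinct, 1), "{:.1%}".format(share_discarded))
```

For m = n = 100 this gives roughly 63.4 distinct values, which fits the observed set lengths of 60-69; for large m the discarded share tends to 1/e, about 36.8%.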
The code updated:
from random import randint
sets = []
identified_sets = 0
MAX = 100
for i in range(0, MAX):
    nums = []
    for x in range(1, MAX + 1):
        nums.append(randint(1, MAX))
    nums.sort()
    print("Set %i" % i)
    print(nums)
    print()
    sets.append(set(nums))
for i in range(0, len(sets)):
    #Only relevant when using MAX == 100
    if len(sets[i]) < 60 or len(sets[i]) > 70:
        print("Set %i Set Length: %i, Duplicates discarded: %.0f%s *****" %
              (i, len(sets[i]), (float((MAX - len(sets[i])))/MAX)*100, "%"))
        identified_sets += 1
    else:
        print("Set %i Set Length: %i, Duplicates discarded: %.0f%s" %
              (i, len(sets[i]), (float((MAX - len(sets[i])))/MAX)*100, "%"))
#dictionary of numbers
count = {}
for i in range(1, MAX + 1):
    count[i] = 0
#count occurrences of numbers
for s in sets:
    for e in s:
        count[int(e)] += 1
#print lowest frequency
print("key %i : count %i" %
      (min(count, key=count.get), count[min(count, key=count.get)]))
#print highest frequency
print("key %i : count %i" %
      (max(count, key=count.get), count[max(count, key=count.get)]))
#print identified sets <60 and >70 in length as these appear less often
print("Count of identified sets: %i, %.0f%s" %
      (identified_sets, (float(identified_sets)/MAX)*100, "%"))
You can also keep the reversed matrix: a mapping from each number to the set of indexes of the sets that contain it. In general this mapping should be a dict (from numbers to sets), but a simple list of sets does the trick here.
(We could use a Counter too, instead of keeping the whole reversed matrix.)
from random import randint
sets = [set() for _ in range(100)]
# one slot per number 0..100; slot 0 stays unused because numbers start at 1
byNum = [set() for _ in range(101)]
#Generate list containing 100 sets of sets.
#sets contain numbers between 1 and 100 and no duplicates.
for setIndex in range(0, 100):
    for numIndex in range(100):
        num = randint(1, 100)
        byNum[num].add(setIndex)
        sets[setIndex].add(num)
#print sizes of each set
for setIndex, _set in enumerate(sets):
    print(setIndex, len(_set))
#I now want to create a final set
#using the data stored within all sets to
#see if there is any unique value.
# enumerate() cannot be sliced, so slice the list and start counting at 1
for num, setIndexes in enumerate(byNum[1:], start=1):
    if len(setIndexes) == 100:
        print('number %s has appeared in all the random sets' % num)
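The Counter variant mentioned in passing above could look roughly like this. Because each set holds a number at most once, a number's count equals how many sets contain it. The sample sets are hand-picked for illustration:

```python
from collections import Counter
from itertools import chain

sets = [{1, 2, 3}, {2, 3, 4}, {2, 5}]

# Count in how many sets each number occurs.
counts = Counter(chain.from_iterable(sets))

in_all = sorted(n for n, c in counts.items() if c == len(sets))
print(in_all)  # [2]
```

For this particular question, set.intersection(*sets) would give the same numbers without counting anything; the Counter is more useful when other thresholds are also of interest.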
Hi, I'm trying to build a list containing, for each unique prefix in a list of strings, the entry with the maximum number.
example:
a = ['DS10.json', 'DS11.json', 'DT4.json', 'DT5.json', 'DT6.json', 'CJ6.json', 'CJ7.json']
should return me a list of the following:
['DS11.json', 'DT6.json', 'CJ7.json']
I have tried the following code:
def j(l):
    p = []
    for i in l:
        digcode = i.split('.')[0]
        if any(s.startswith(digcode[:2]) for s in p): #there exists prefix in list
            if digcode[2:] > p[[n for n, l in enumerate(p) if l.startswith(digcode[:2])][0]][2:]:
                p.pop([n for n, l in enumerate(p) if l.startswith(digcode[:2])][0])
                p.append(digcode)
            else:
                pass
        else:
            p.append(digcode)
    return p
But when I apply it to a larger sample it does not give the correct result:
>>> o = ['AS6.json', 'AS7.json', 'AS8.json', 'AS9.json', 'BS1.json', 'BS2.json', 'BS3.json', 'BS4.json', 'BS5.json', 'CS1.json', 'CS2.json', 'CS3.json', 'CS4.json', 'CS5.json', 'CS6.json', 'DS10.json', 'DS11.json', 'DS4.json', 'DS5.json', 'DS6.json', 'DS7.json', 'DS8.json', 'DS9.json', 'ES4.json', 'ES5.json', 'ES6.json', 'FS5.json', 'FS6.json', 'FS7.json', 'FS8.json', 'MS4.json', 'MS5.json', 'MS6.json', 'MS7.json', 'MS8.json', 'MS9.json', 'NR1.json', 'NR2.json', 'NR3.json', 'NR4.json', 'NR5.json', 'NR6.json', 'NR7.json', 'NR8.json', 'VR1.json', 'VR2.json', 'VR3.json', 'VR4.json', 'VR5.json', 'VR6.json', 'VR7.json', 'VR8.json', 'XS11.json', 'XS9.json']
>>> j(o)
['AS9', 'BS5', 'CS6', 'DS9', 'ES6', 'FS8', 'MS9', 'NR8', 'VR8', 'XS9']
which is incorrect: for example, XS11 and DS11 exist but are missing from the result.
I would appreciate it if someone could help me fix this or perhaps suggest a simpler solution. Thank you.
You are making string comparisons; '9' is greater than '11' because strings are compared character by character and the character '9' sorts after '1'. You'll have to convert those to integers first.
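A quick check of the difference, using a few names in the style of the question (this snippet is only an illustration, not part of the original code):

```python
# Strings compare character by character, so '9' versus '1' decides the result.
print('9' > '11')            # True
print(int('9') > int('11'))  # False

names = ['DS9', 'DS10', 'DS11']
as_text = max(names)                              # compares the raw strings
as_number = max(names, key=lambda s: int(s[2:]))  # compares the numeric part
print(as_text, as_number)    # DS9 DS11
```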
I'd use a dictionary to map prefixes to the maximum number:
def find_latest(lst):
    prefixes = {}
    for entry in lst:
        code, value = entry[:2], int(entry.partition('.')[0][2:])
        if value > prefixes.get(code, (float('-inf'), ''))[0]:
            prefixes[code] = (value, entry)
    return [entry for value, entry in prefixes.values()]
This is far more efficient because it doesn't loop over a list again for every entry. Your version does work on the order of N^2 (add one element and you add up to N more tests to work through), whereas this one processes your list in N steps. So instead of up to 100 tests for 10 elements, it executes just 10.
Demo:
>>> sample = ['AS6.json', 'AS7.json', 'AS8.json', 'AS9.json', 'BS1.json', 'BS2.json', 'BS3.json', 'BS4.json', 'BS5.json', 'CS1.json', 'CS2.json', 'CS3.json', 'CS4.json', 'CS5.json', 'CS6.json', 'DS10.json', 'DS11.json', 'DS4.json', 'DS5.json', 'DS6.json', 'DS7.json', 'DS8.json', 'DS9.json', 'ES4.json', 'ES5.json', 'ES6.json', 'FS5.json', 'FS6.json', 'FS7.json', 'FS8.json', 'MS4.json', 'MS5.json', 'MS6.json', 'MS7.json', 'MS8.json', 'MS9.json', 'NR1.json', 'NR2.json', 'NR3.json', 'NR4.json', 'NR5.json', 'NR6.json', 'NR7.json', 'NR8.json', 'VR1.json', 'VR2.json', 'VR3.json', 'VR4.json', 'VR5.json', 'VR6.json', 'VR7.json', 'VR8.json', 'XS11.json', 'XS9.json']
>>> def find_latest(lst):
...     prefixes = {}
...     for entry in lst:
...         code, value = entry[:2], int(entry.partition('.')[0][2:])
...         if value > prefixes.get(code, (float('-inf'), ''))[0]:
...             prefixes[code] = (value, entry)
...     return [entry for value, entry in prefixes.values()]
...
>>> find_latest(sample)
['FS8.json', 'VR8.json', 'AS9.json', 'MS9.json', 'BS5.json', 'CS6.json', 'XS11.json', 'NR8.json', 'DS11.json', 'ES6.json']
It looks as though your digcode[2:] values are being compared lexicographically (dictionary order), rather than numerically.
So 9 is considered to be "larger than" 11, because in a list of words, a word that began with "9" would come after a word that began with "11".
For comparison purposes you should convert digcode[2:] to a number, i.e. int(digcode[2:]). Change
if digcode[2:] > p[[n for n, l in enumerate(p) if l.startswith(digcode[:2])][0]][2:]:
to
if int(digcode[2:]) > int(p[[n for n, l in enumerate(p) if l.startswith(digcode[:2])][0]][2:]):
This gives:
>>> j(o)
['AS9', 'BS5', 'CS6', 'DS11', 'ES6', 'FS8', 'MS9', 'NR8', 'VR8', 'XS11']
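If the prefixes are not always exactly two characters long, slicing with [:2] stops working. A hedged sketch that splits each name with a regular expression instead; the pattern (letters, then digits, then .json) and the name latest_per_prefix are assumptions made here, not something stated in the question:

```python
import re

NAME = re.compile(r'([A-Za-z]+)(\d+)\.json$')  # assumed file-name layout

def latest_per_prefix(names):
    best = {}  # prefix -> (number, original name)
    for name in names:
        match = NAME.match(name)
        if match is None:
            continue  # skip names that do not fit the assumed layout
        prefix, number = match.group(1), int(match.group(2))
        if prefix not in best or number > best[prefix][0]:
            best[prefix] = (number, name)
    return sorted(entry for _, entry in best.values())

print(latest_per_prefix(['DS10.json', 'DS11.json', 'DT4.json', 'DT5.json',
                         'DT6.json', 'CJ6.json', 'CJ7.json']))
# ['CJ7.json', 'DS11.json', 'DT6.json']
```

The result is sorted only to make the output order predictable; drop the sorted() call if the order does not matter.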
Instead of a complete shuffle, I am looking for a partial shuffle function in Python.
Example: "string" must give rise to "stnrig", but not "nrsgit".
It would be better if I could define a specific "percentage" of characters that have to be rearranged.
The purpose is to test string comparison algorithms. I want to determine the "percentage of shuffle" beyond which my algorithm will mark two (shuffled) strings as completely different.
Update:
Here is my code. Improvements are welcome!
import random
percent_to_shuffle = int(raw_input("Give the percent value to shuffle : "))
to_shuffle = list(raw_input("Give the string to be shuffled : "))
num_of_chars_to_shuffle = int((len(to_shuffle)*percent_to_shuffle)/100)
for i in range(0,num_of_chars_to_shuffle):
    x=random.randint(0,(len(to_shuffle)-1))
    y=random.randint(0,(len(to_shuffle)-1))
    z=to_shuffle[x]
    to_shuffle[x]=to_shuffle[y]
    to_shuffle[y]=z
print ''.join(to_shuffle)
This problem is simpler than it looks, and as usual the language has the right tools so that nothing stands between you and the idea:
import random

def pashuffle(string, perc=10):
    data = list(string)
    for index, letter in enumerate(data):
        if random.randrange(0, 100) < perc/2:
            new_index = random.randrange(0, len(data))
            data[index], data[new_index] = data[new_index], data[index]
    return "".join(data)
Your problem is tricky, because there are some edge cases to think about:
Strings with repeated characters (e.g. how would you shuffle "aaaab"?)
How do you measure chained character swaps or rearranged blocks?
In any case, the metric used to shuffle strings up to a certain percentage is likely to be the same one your algorithm uses to see how close they are.
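One simple candidate for such a metric is the share of positions whose character changed. The sketch below (the function name changed_fraction is made up here, and equal-length strings are assumed) also shows the repeated-character problem: swapping two identical letters changes nothing that this metric can see.

```python
def changed_fraction(original, shuffled):
    # Fraction of positions where the two equal-length strings differ.
    if len(original) != len(shuffled):
        raise ValueError("strings must have the same length")
    if not original:
        return 0.0
    differing = sum(a != b for a, b in zip(original, shuffled))
    return differing / len(original)

print(changed_fraction("string", "stnrig"))  # 0.5
print(changed_fraction("aaaab", "aaaab"))    # 0.0, e.g. after swapping two a's
```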
My code to shuffle n characters:
import random

def shuffle_n(s, n):
    idx = list(range(len(s)))  # list() so random.shuffle also works on Python 3
    random.shuffle(idx)
    idx = idx[:n]
    mapping = dict((idx[i], idx[i-1]) for i in range(n))
    return ''.join(s[mapping.get(x,x)] for x in range(len(s)))
It basically chooses n positions to swap at random, and then exchanges each of them with the next one in the list... This way it ensures that no inverse swaps are generated and exactly n characters are swapped (if characters are repeated, bad luck).
Explained run with 'string', 3 as input:
idx is [0, 1, 2, 3, 4, 5]
we shuffle it, now it is [5, 3, 1, 4, 0, 2]
we take just the first 3 elements, now it is [5, 3, 1]
those are the characters that we are going to swap
s t r i n g
  ^   ^   ^
t (1) will be i (3)
i (3) will be g (5)
g (5) will be t (1)
the rest will remain unchanged
so we get 'sirgnt'
The bad thing about this method is that it does not generate all the possible variations, for example, it could not make 'gnrits' from 'string'. This could be fixed by making partitions of the indices to be shuffled, like this:
import random

def randparts(l):
    n = len(l)
    if n == 0: # nothing to split (randint would fail on an empty range)
        return
    s = random.randint(0, n-1) + 1
    if s >= 2 and n - s >= 2: # the split makes two valid parts
        yield l[:s]
        for p in randparts(l[s:]):
            yield p
    else: # the split would make a single cycle
        yield l

def shuffle_n(s, n):
    idx = list(range(len(s)))  # list() so random.shuffle also works on Python 3
    random.shuffle(idx)
    # the outer loop must come first so that x is bound before range(len(x))
    mapping = dict((x[i], x[i-1])
                   for x in randparts(idx[:n])
                   for i in range(len(x)))
    return ''.join(s[mapping.get(x,x)] for x in range(len(s)))
import random

def partial_shuffle(a, part=0.5):
    # which characters are to be shuffled:
    idx_todo = random.sample(range(len(a)), int(len(a) * part))
    # what are the new positions of these to-be-shuffled characters:
    idx_target = idx_todo[:]
    random.shuffle(idx_target)
    # map all "normal" character positions {0:0, 1:1, 2:2, ...}
    mapper = dict((i, i) for i in range(len(a)))
    # update with all shuffles in the string: {old_pos:new_pos, old_pos:new_pos, ...}
    mapper.update(zip(idx_todo, idx_target))
    # use mapper to modify the string:
    return ''.join(a[mapper[i]] for i in range(len(a)))

for i in range(5):
    print(partial_shuffle('abcdefghijklmnopqrstuvwxyz', 0.2))
prints
abcdefghljkvmnopqrstuxwiyz
ajcdefghitklmnopqrsbuvwxyz
abcdefhwijklmnopqrsguvtxyz
aecdubghijklmnopqrstwvfxyz
abjdefgcitklmnopqrshuvwxyz
Evil and using a deprecated API:
import random
# adjust constant to taste
# 0 -> no effect, 0.5 -> completely shuffled, 1.0 -> reversed
# Of course this assumes your input is already sorted ;)
''.join(sorted(
'abcdefghijklmnopqrstuvwxyz',
cmp = lambda a, b: cmp(a, b) * (-1 if random.random() < 0.2 else 1)
))
Maybe like so:
>>> import random
>>> s = 'string'
>>> shufflethis = list(s[2:])
>>> random.shuffle(shufflethis)
>>> s[:2]+''.join(shufflethis)
'stingr'
Borrowing from fortran's idea, I'm adding this to the collection. It's pretty fast:
import random

def partial_shuffle(st, p=20):
    p = int(round(p/100.0*len(st)))
    idx = range(len(st))  # was len(s), which is undefined inside this function
    sample = random.sample(idx, p)
    res=str()
    samptrav = 1
    for i in range(len(st)):
        if i in sample:
            res += st[sample[-samptrav]]
            samptrav += 1
            continue
        res += st[i]
    return res