I have a file, say file1.txt, with multiple rows and columns. I want to read it and store it as a list of lists, then pair the rows so that no row is paired with an identical row. The second-to-last column represents the class. Below is my file:
27,28,29,30,1,0.67
31,32,33,34,1,0.84
35,36,37,38,1,0.45
39,40,41,42,1,0.82
43,44,45,46,1,0.92
43,44,45,46,1,0.92
Here all 6 rows are class 1. I am using the logic below to do the pairing.
from operator import itemgetter
rule_file_name = 'file1.txt'
rule_fp = open(rule_file_name)
list1 = []
for line in rule_fp.readlines():
    list1.append(line.replace("\n","").split(","))
list1=sorted(list1,key=itemgetter(-1),reverse=True)
length = len(list1)
middle_index = length // 2
first_half = list1[:middle_index]
second_half = list1[middle_index:]
result=[]
result=list(zip(first_half,second_half))
for a,b in result:
    if a==b:
        result.remove((a, b))
print(result)
print("-------------------")
It works fine when there is only one class. But if my file has multiple classes, I want the pairing to be done within the same class only. For example, if my file looks like the one below (say file2):
27,28,29,30,1,0.67
31,32,33,34,1,0.84
35,36,37,38,1,0.45
39,40,41,42,1,0.82
43,44,45,46,1,0.92
43,44,45,46,1,0.92
51,52,53,54,2,0.28
55,56,57,58,2,0.77
59,60,61,62,2,0.39
63,64,65,66,2,0.41
75,76,77,78,3,0.51
90,91,92,93,3,0.97
Then I want to make 3 pairs from class 1, 2 from class 2 and 1 from class 3. I am using this logic to build a dictionary whose keys are the classes.
d = {}
sorted_grouped = []
for row in list1:
    # Add name to dict if not exists
    if row[-2] not in d:
        d[row[-2]] = []
    # Add all non-Name attributes as a new list
    d[row[-2]].append(row)
#print(d.items())
for k,v in d.items():
    sorted_grouped.append(v)
#print(sorted_grouped)
gp_vals = {}
for i in sorted_grouped:
    gp_vals[i[0][-2]] = i
print(gp_vals)
How can I do this? Please help!
My desired output for file2 is:
[([43,44,45,46,1,0.92], [39,40,41,42,1,0.82]), ([43,44,45,46,1,0.92], [27,28,29,30,1,0.67]), ([31,32,33,34,1,0.84], [35,36,37,38,1,0.45])]
[([55,56,57,58,2,0.77], [59,60,61,62,2,0.39]), ([63,64,65,66,2,0.41], [51,52,53,54,2,0.28])]
[([90,91,92,93,3,0.97], [75,76,77,78,3,0.51])]
Edit1:
All files will have an even number of rows, and every class will have an even number of rows as well.
For a particular class (say class 2) with n rows, there can be at most n/2 identical rows for that class in the dataset.
My primary intention was to get random pairing while making sure no self-pairing is allowed. For that, I thought of taking the row with the highest fitness value (the last column) within a class and pairing it with any other row from that class chosen at random, making sure the two rows are not exactly the same. The same thing is repeated for every class separately.
First, read in the data from the file. I'd use assert here to communicate your assumptions to people who read the code (including future you) and to confirm that they actually hold for the file. If they don't, an AssertionError is raised.
rule_file_name = 'file2.txt'
list1 = []
with open(rule_file_name) as rule_fp:
    for line in rule_fp.readlines():
        list1.append(line.replace("\n","").split(","))
assert len(list1) & 1 == 0 # confirm length is even
Then use a defaultdict to store the lists for each class.
from collections import defaultdict
classes = defaultdict(list)
for _list in list1:
    classes[_list[4]].append(_list)
Then use sample to draw pairs and confirm they aren't the same. Here I'm including a seed to make the results reproducible but you can take that out for randomness.
from random import sample, seed
seed(1) # remove this line when you want actual randomness
for key, _list in classes.items():
    assert len(_list) & 1 == 0 # each class must also have an even count, else the data is wrong
    _list.sort(key=lambda x: float(x[5])) # sort by fitness as a number, not as a string
    pairs = []
    while _list:
        first = _list[-1] # row with the highest fitness
        candidate = sample(_list, 1)[0]
        if first != candidate:
            print(f'first {first}, candidate{candidate}')
            pairs.append((first, candidate))
            _list.remove(first)
            _list.remove(candidate)
    classes[key] = pairs
Note that the sampling scheme (as stated in the edit) implicitly assumes that any duplicates are among the highest fitness values. If that is not true, the rows left over at the end may all be identical, and the while loop will never terminate.
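One way to avoid hanging in that situation is to draw the candidate only from rows that differ from the current top row, and to fail loudly when none remain. This is a minimal sketch, not the code above: it assumes rows are lists of strings with the fitness in the last column, and the names `pair_rows` and `rng` are hypothetical.

```python
import random

def pair_rows(rows, rng=random):
    """Pair the highest-fitness row with a random, non-identical row until none are left."""
    remaining = sorted(rows, key=lambda r: float(r[-1]))  # ascending fitness
    pairs = []
    while remaining:
        first = remaining.pop()  # row with the highest fitness
        # Indices of rows that are not an exact copy of `first`
        choices = [i for i, r in enumerate(remaining) if r != first]
        if not choices:
            raise ValueError(f"no non-identical partner left for {first}")
        pairs.append((first, remaining.pop(rng.choice(choices))))
    return pairs

lines = ["27,28,29,30,1,0.67", "43,44,45,46,1,0.92",
         "43,44,45,46,1,0.92", "39,40,41,42,1,0.82"]
rows = [line.split(",") for line in lines]
pairs = pair_rows(rows, random.Random(1))  # seeded only for reproducibility
print(pairs)
```

This does not guarantee that a valid pairing is always found; an unlucky draw can still leave only identical rows, but the function then raises a ValueError instead of looping forever.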
If you want to print them then iterate over the dictionary again:
for key, pairs in classes.items():
    print(key, pairs)
which for me gives:
1 [(['43', '44', '45', '46', '1', '0.92'], ['27', '28', '29', '30', '1', '0.67']), (['43', '44', '45', '46', '1', '0.92'], ['31', '32', '33', '34', '1', '0.84']), (['39', '40', '41', '42', '1', '0.82'], ['35', '36', '37', '38', '1', '0.45'])]
2 [(['55', '56', '57', '58', '2', '0.77'], ['51', '52', '53', '54', '2', '0.28']), (['63', '64', '65', '66', '2', '0.41'], ['59', '60', '61', '62', '2', '0.39'])]
3 [(['90', '91', '92', '93', '3', '0.97'], ['75', '76', '77', '78', '3', '0.51'])]
Using these values for file2.txt (the leading numbers are row numbers and not part of the actual file):
1 27,28,29,30,1,0.67
2 31,32,33,34,1,0.84
3 35,36,37,38,1,0.45
4 39,40,41,42,1,0.82
5 43,44,45,46,1,0.92
6 43,44,45,46,1,0.92
7 51,52,53,54,2,0.28
8 55,56,57,58,2,0.77
9 59,60,61,62,2,0.39
10 63,64,65,66,2,0.41
11 75,76,77,78,3,0.51
12 90,91,92,93,3,0.97
There are lots of similar posts out there, but I could not find something that directly matched, or resulted in a solution to, the issue I am dealing with.
I want to use the second instance of a repeated index contained in a list as the index of another list. When the function is executed, I want all numbers from the start of the list up to the first '\*' to print after Code1, all numbers between the first '\*' and the second '\*' to print after Code2, and then all numbers following the second '\*' until the end of the list to print after Code3. Example data for digit would be "['1', '2', '3', '4', '5', '\*', '6', '\*', '7', '8', '9', '10', '1']".
In other words, I want the code below to print (assuming those digits exist) User Code: 12345, Pass Code: 6, Pin Code: 789101, all on one line.
print_string += 'User Code: {} '.format(''.join(str(dig) for dig in digit[:digit.index('*')])) + \
'Pass Code: {} '.format(''.join(str(dig) for dig in digit[digit.index('*'):digit.index('*')])) + \
'Pin Code: {} '.format(''.join(str(dig) for dig in digit[digit.index('*'):]))
print(print_string)
Essentially, I would like to call the first asterisk as the right index for User Code, the first asterisk as the left index and the second asterisk as the right index for Pass Code, and the second asterisk as the left index for Pin Code.
I just cannot figure out how to make it look for sequential asterisks. If there is a simpler way to do this, please let me know!
Given,
L = ['1', '2', '3', '4', '5', '\*', '6', '\*', '7', '8', '9', '10', '1']
Then
str.join('', L)
will form a string
'12345\\*6\\*789101'
which you can split into the three parts
parts = str.join('', L).split('\*')
and then pull out what you need
user_code = parts[0]
pass_code = parts[1]
pin = parts[2]
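If joining and re-splitting feels roundabout, the two separators can also be located directly: `list.index` accepts an optional start position, so the second search can begin just past the first hit. A small sketch, assuming the separator stored in the list is a plain `'*'` string; the names `first` and `second` are illustrative.

```python
digit = ['1', '2', '3', '4', '5', '*', '6', '*', '7', '8', '9', '10', '1']

first = digit.index('*')              # position of the first asterisk
second = digit.index('*', first + 1)  # search again, starting after the first

user_code = ''.join(digit[:first])
pass_code = ''.join(digit[first + 1:second])
pin_code = ''.join(digit[second + 1:])

print(f'User Code: {user_code} Pass Code: {pass_code} Pin Code: {pin_code}')
# prints: User Code: 12345 Pass Code: 6 Pin Code: 789101
```

If fewer than two asterisks are present, `index` raises a ValueError, which may or may not be the behaviour you want.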
If what you actually have is the digits in a list-like shape inside a string,
"['1', '2', '3', '4', '5', '\*', '6', '\*', '7', '8', '9', '10', '1']"
it might be worth just having them as a list, then you can use the join/split method above.
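For that conversion, `ast.literal_eval` from the standard library can parse the string back into a list without resorting to `eval`. A sketch, assuming the string is a valid Python list literal and the separator is a plain `'*'`; the name `raw` is made up for the example.

```python
import ast

raw = "['1', '2', '3', '4', '5', '*', '6', '*', '7', '8', '9', '10', '1']"

digit = ast.literal_eval(raw)      # safely parse the literal into a real list
parts = ''.join(digit).split('*')  # same join/split idea as above
user_code, pass_code, pin_code = parts

print(user_code, pass_code, pin_code)
```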
this is my code:
positions = []
for i in lines[2]:
    if i not in positions:
        positions.append(i)
print (positions)
print (lines[1])
print (lines[2])
the output is:
['1', '2', '3', '4', '5']
['is', 'the', 'time', 'this', 'ends']
['1', '2', '3', '4', '1', '5']
I want the output of the variable "positions" to be ['2','3','4','1','5'],
so instead of dropping the second occurrence of a duplicate in "lines[2]", it should drop the first.
You can reverse your list, build positions, and then reverse it back, as mentioned by @tobias_k in the comments:
lst = ['1', '2', '3', '4', '1', '5']
positions = []
for i in reversed(lst):
    if i not in positions:
        positions.append(i)
list(reversed(positions))
# ['2', '3', '4', '1', '5']
You'll need to first detect which values are duplicated before you can build positions. Use a collections.Counter() object to track how many occurrences of each value are still to come:
from collections import Counter
counts = Counter(lines[2])
positions = []
for i in lines[2]:
    counts[i] -= 1
    if counts[i] == 0:
        # only add if this is the 'last' value
        positions.append(i)
This'll work for any number of repetitions of values; only the last value to appear is ever used.
You could also reverse the list, and track what you have already seen with a set, which is faster than testing against the list:
positions = []
seen = set()
for i in reversed(lines[2]):
    if i not in seen:
        # only add if this is the first time we see the value
        positions.append(i)
        seen.add(i)
positions = positions[::-1] # reverse the output list
Both approaches require two iterations: in the first, one to create the counts mapping; in the second, one to reverse the output list. Which is faster will depend on the size of lines[2] and the number of duplicates in it, and whether or not you are using Python 3 (where Counter performance was significantly improved).
You can use a dictionary to save the last position of each element and then build a new list from that information:
>>> data=['1', '2', '3', '4', '1', '5']
>>> temp={ e:i for i,e in enumerate(data) }
>>> sorted(temp, key=lambda x:temp[x])
['2', '3', '4', '1', '5']
>>>
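A more compact variant of the same idea, assuming Python 3.7 or later (where plain dicts preserve insertion order): build a dict from the reversed list so that each value is kept at its last occurrence, then reverse the keys back. The function name `keep_last` is illustrative only.

```python
def keep_last(items):
    """Drop duplicates, keeping the last occurrence of each value in original order."""
    # dict.fromkeys keeps the first time a key is seen; on the reversed
    # sequence that corresponds to the last occurrence in the original list.
    return list(dict.fromkeys(reversed(items)))[::-1]

result = keep_last(['1', '2', '3', '4', '1', '5'])
print(result)  # ['2', '3', '4', '1', '5']
```

Like the dictionary approach above, this requires the elements to be hashable.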
I am trying to find the maximum value for different subsets of a list.
def max_value(filename):
    CHR=['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', 'X']
    SNP = [ ]
    chr_max=[ ]
    for n in CHR:
        for r in reader:
            if r[1]==n:
                SNP.append(r[2]) #append values into empty list SNP
        SNP = [try_int(x) for x in SNP] #convert to integers
        max_val=max(SNP) #find the maximum value
        chr_max.append((n, max_val)) #append this maximum to a new list
        del SNP[:] #clear the list and loop for next item in CHR list
    return chr_max
I keep getting
ValueError: max() arg is an empty sequence
When I remove the del SNP[:] step I get output, but it returns the max value for n='1' on all 20 iterations (it is the overall maximum, so it keeps being returned when the list is never cleared).
How do I clear the SNP list at the end of each loop, so I can find the maximum value for different subsets of the list?
You need to swap the reader and CHR loops so that you only loop over reader once:
SNPs = {}
for r in reader:
    for n in CHR:
        if r[1]==n:
            SNPs.setdefault(n, []).append(r[2]) #append values into empty list SNP
for n in CHR:
    SNP = SNPs[n]
    # I didn't change anything below here..
    SNP = [try_int(x) for x in SNP] #convert to integers
    max_val=max(SNP) #find the maximum value
    chr_max.append((n, max_val)) #append this maximum to a new list
Note you can also use
from collections import defaultdict
SNPs = defaultdict(list)
and change the append to:
SNPs[n].append(r[2])
If reader is a file object or csv.reader() object, you cannot loop over it multiple times and expect it to start from the beginning again.
A file object would need to be rewound to the start with reader.seek(0), for example.
As a consequence, the second time your code reaches the for r in reader: loop, the loop terminates immediately without executing any iterations, no new elements are added to SNP and it remains empty.
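The exhaustion is easy to reproduce in isolation. In the sketch below, `io.StringIO` stands in for an open file and the three sample rows are invented for the demonstration; note that for a `csv.reader` it is the underlying file object that gets rewound, not the reader itself.

```python
import csv
import io

# io.StringIO stands in for an open file here (an assumption for the demo)
fp = io.StringIO("rs1,1,100\nrs2,1,250\nrs3,2,90\n")
reader = csv.reader(fp)

first_pass = [row for row in reader]   # consumes every row
second_pass = [row for row in reader]  # reader is exhausted: no rows

fp.seek(0)                             # rewind the underlying file object
third_pass = [row for row in csv.reader(fp)]

print(len(first_pass), len(second_pass), len(third_pass))  # 3 0 3
```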
You could just sort the input from the reader iterable into a dictionary instead of looping over it repeatedly:
CHR=['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', 'X']
values = {c: [] for c in CHR}
for row in reader:
    if row[1] in values:
        values[row[1]].append(try_int(row[2]))
return [max(values[c]) for c in CHR if values[c]]
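If only the maximum per chromosome is needed, the intermediate lists can be skipped by keeping a running maximum. This is a sketch under assumptions: each row is a sequence whose second field is the chromosome and whose third field is an integer string, `int` stands in for the question's `try_int` helper (whose definition is not shown), and `max_per_chromosome` is a hypothetical name.

```python
def max_per_chromosome(rows, chromosomes):
    """Return (chromosome, max value) tuples in the order given by `chromosomes`."""
    best = {}
    wanted = set(chromosomes)
    for row in rows:  # single pass over the data
        chrom, value = row[1], int(row[2])
        if chrom in wanted and (chrom not in best or value > best[chrom]):
            best[chrom] = value
    # Chromosomes with no rows are skipped rather than raising on max([])
    return [(c, best[c]) for c in chromosomes if c in best]

rows = [['rs1', '1', '100'], ['rs2', '1', '250'], ['rs3', 'X', '90']]
result = max_per_chromosome(rows, ['1', '2', 'X'])
print(result)  # [('1', 250), ('X', 90)]
```

Unlike the list comprehension above, this keeps the chromosome label next to each maximum, which matches the (n, max_val) tuples built in the question.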