I'm trying to make a python script, that gets me the longest repeated character in a given matrix (horizontally and vertically).
Example:
I have this matrix:
afaaf
rbaca
rlaff
Giving this matrix for input, it should result: a 3
You can see that that the 3rd column of the matrix, is full of a's and also, it's the most repeated character in the matrix.
What I have:
#!/bin/python2.7
#Longest string in matrix
#Given a matrix filled with letters. Find the longest string, containing only the same letter, which can be obtained by starting
#with any position and then moving horizontally and vertically (each cell can be visited no more than 1 time).
# Settings here
# -------------
string_matrix = """
afaaf
rbaca
rlaff
"""
pos = (0,0)
# -------------
import pdb
import time
import collections
from collections import defaultdict
import re
rows = 0
columns = 0
matrix = []
matrix2 = []
counter = 0
res_l = []
i = 0
c = ''
# if matrix2 is full of 1's, stop
def stop():
for i in range(0, rows):
for j in range(0, columns):
if matrix2[i][j] == 0:
return False
return True
# checks the points, and returns the most repeated char and length
def check_points(points1, points2):
r = []
r.append(-1)
r.append('')
# create strings from matrix
s1 = ''
s2 = ''
for point in points1:
s1 += matrix[point[0]][point[1]]
for point in points2:
s2 += matrix[point[0]][point[1]]
rr = {}
for c in s1:
rr[c] = 0
for c in s2:
rr[c] = 0
for i in range(0, len(s1)):
k = 1
for j in range(i+1, len(s1)):
if s1[i] == s1[j]:
k += 1
else:
break
if k > rr[s1[i]]:
rr[s1[i]] = k
for i in range(0, len(s2)):
k = 1
for j in range(i+1, len(s2)):
if s2[i] == s2[j]:
k += 1
else:
break
if k > rr[s2[i]]:
rr[s2[i]] = k
m = -1
c = ''
for key,value in rr.iteritems():
if value > m:
m = value
c = key
return m, c
# Depth-first search, recursive
def search(pos):
global res_l
global matrix2
global c
counter = 0
x = pos[0]
y = pos[1]
c = matrix[x][y]
# base clause
# when matrix2 is all checked
if stop():
return counter, c
points1 = []
points2 = []
allpoints = []
for i in range(0, columns):
if matrix2[x][i] != 1:
points1.append([x, i])
allpoints.append([x, i])
for i in range(0, rows):
if matrix2[i][x] != 1:
points2.append([i, x])
allpoints.append([i, x])
r = check_points(points1, points2)
if r[0] > counter:
counter = r[0]
c = r[1]
matrix2[x][y] = 1
for point in allpoints:
rr = search(point)
if rr[0] > counter:
counter = int(rr[0])
c = rr[1]
#print 'c: ' + str(c) + ' - k: ' + str(counter)
return counter, c
def main():
# create the matrix from string
string_matrix_l = string_matrix.strip()
splited = string_matrix_l.split('\n')
global rows
global columns
global matrix
global matrix2
rows = len(splited)
columns = len(splited[1])
# initialize matrixes with 0
matrix = [[0 for x in range(columns)] for x in range(rows)]
matrix2 = [[0 for x in range(columns)] for x in range(rows)]
# string to matrix
i = 0
for s in splited:
s = s.strip()
if s == '':
continue
j = 0
for c in s:
try:## Heading ##
matrix[i][j] = c
#print 'ok: ' + str(i) + ' ' + str(j) + ' ' + c
except:
print 'fail: index out of range matrix[' + str(i) + '][' + str(j)+'] ' + c
j = j + 1
i = i + 1
# print some info
print 'Given matrix: ' + str(matrix) + '\n'
print 'Start position: ' + str(pos)
print 'Start character: ' + str(matrix[pos[0]][pos[1]])
# get the result
res = search(pos)
print '-------------------------------------'
print '\nChar: ' + str(res[1]) + '\nLength: ' + str(res[0])
if __name__ == "__main__":
main()
This is my source code.
The example given above, is also used in the source code. The result given is: r 2 which is wrong ... again, should be a 3
It has 4 functions: main, search, stop and check_points.
main is initializing things up,
search is my recursive function that takes one parameter (the start point), and should recursively check for the longest string. I have another matrix, same length as original, which is just 1 and 0. 1 means the position was visited, 0, not. The search function is setting 1 on the right position after a certain position was processed by the search function.
stop is checking if matrix2 is full of 1's, in this case, the matrix was all parsed
check_points takes 2 parameters, 2 list of points, and returns the most repeated character and it's length for those points
What doesn't work:
Most of the time is giving me the wrong character as result, even thought the count might be right sometimes. Sometimes it's working on horizontally, sometimes it doesn't. I am sure that I'm doing something wrong, but ... it's over 1 week now since I'm trying to figure out how to do this. Asked another question here on stackoverflow, got bit further but ... still stuck.
Any suggestion is appreciated.
You can use itertools.groupby to quickly find the count of repetitions of some character, and izip_longest(*matrix) to transpose the matrix (swap its rows and columns).
from itertools import groupby, izip_longest
matrix_string = """
afaaf
rbaca
rlaff
"""
def longest_repetition(row):
return max((sum(1 for item in group), letter)
for letter, group in groupby(row)
if letter is not None)
def main():
matrix = [[letter for letter in row.strip()]
for row in matrix_string.strip().split('\n')]
count, letter = max(
max(longest_repetition(row) for row in matrix),
max(longest_repetition(col) for col in izip_longest(*matrix))
)
print letter, count
if __name__ == '__main__':
main()
Since you've updated the requirement here is a recursive version of the code with some explanations. If it were not an assignment and this task came up in some real life problem, you should really have used the first version.
matrix_string = """
afaaf
rbaca
rlaff
"""
def find_longest_repetition(matrix):
rows = len(matrix)
cols = len(matrix[0])
# row, col - row and column of the current character.
# direction - 'h' if we are searching for repetitions in horizontal direction (i.e., in a row).
# 'v' if we are searching in vertical direction.
# result - (count, letter) of the longest repetition we have seen by now.
# This order allows to compare results directly and use `max` to get the better one
# current - (count, letter) of the repetition we have seen just before the current character.
def recurse(row, col, direction, result, current=(0, None)):
# Check if we need to start a new row, new column,
# new direction, or finish the recursion.
if direction == 'h': # If we are moving horizontally
if row >= rows: # ... and finished all rows
return recurse( # restart from the (0, 0) position in vertical direction.
0, 0,
'v',
result
)
if col >= cols: # ... and finished all columns in the current row
return recurse( # start the next row.
row + 1, 0,
direction,
result
)
else: # If we are moving vertically
if col >= cols: # ... and finished all columns
return result # then we have analysed all possible repetitions.
if row >= rows: # ... and finished all rows in the current column
return recurse( # start the next column.
0, col + 1,
direction,
result
)
# Figure out where to go next in the current direction
d_row, d_col = (0, 1) if direction == 'h' else (1, 0)
# Try to add current character to the current repetition
count, letter = current
if matrix[row][col] == letter:
updated_current = count + 1, letter
else:
updated_current = 1, matrix[row][col]
# Go on with the next character in the current direction
return recurse(
row + d_row,
col + d_col,
direction,
max(updated_current, result), # Update the result, if necessary
updated_current
)
return recurse(0, 0, 'h', (0, None))
def main():
matrix = [[letter for letter in row.strip()]
for row in matrix_string.strip().split('\n')]
count, letter = find_longest_repetition(matrix)
print letter, count
if __name__ == '__main__':
main()
You can also try the collections.Counter(string).most_common() to get the most repetitions of a character.
from collections import Counter
string_matrix = """
afaaf
rbaca
rlaff
"""
def GetMostRepetitions(pos):
mc = []
for ii in range(pos[0],len(working_mat)):
mc.extend(Counter(working_mat[ii]).most_common(1))
for jj in range(pos[1],len(working_mat[0])):
column = []
for kk in range(ii,len(working_mat)):
column.append(working_mat[kk][jj])
mc.extend(Counter(column).most_common(1))
m = 0
for item in mc:
if item[1] > m:
m = item[1]
k = item[0]
print(k, m)
working_mat = string_matrix.strip().split('\n')
for ii in range(len(working_mat)):
for jj in range(len(working_mat[0])):
pos = (ii,jj)
GetMostRepetitions(pos)
As Kolmar said, you can also use a better way to transpose the matrix.
Related
I'm trying to make a spiral matrix, but when I tried to make the numbers appear one by one, it only shows the lines one by one.
Please help!
import numpy as np
from time import sleep
n = int(input("Width : "))
k = int(input("Space : "))
a = np.zeros((n,n))
print(a/n)
i = 0 #i line
j = 0 #j column
it = -1 #it upper line
id = n #id downer line
jt = -1 #jt left column
jp = n #jp right column
x = k #x starter number
while j < jp:
while j < jp:
a[i][j] = x
x += k
j +=1
it +=1
i=it+1
j=jp-1
while i< id:
a[i][j] = x
x += k
i +=1
jp -=1
j=jp-1
i=id-1
while j > jt:
a[i][j] = x
x += k
j -=1
id -=1
i=id-1
j=jt+1
while i>it:
a[i][j] = x
x += k
i -=1
jt +=1
i=it+1
j=jt+1
for x in a:
print(x)
sleep(0.1)
Here's an example:
Each number is suppose to appear one by one.
(I'm just putting this here so I can post this since I need to add more details)
Not quite trivial.
I found a solution using an empty character array and cursor-manipulation.
This should result in the desired output:
# replace your last "for x in a" loop by the following
def print_array(arr, block_size=4):
"""
prints a 2-D numpy array in a nicer format
"""
for a in arr:
for elem in a:
print(elem.rjust(block_size), end="")
print(end="\n")
# empty matrix which gets filled with rising numbers
matrix_to_be_filled = np.chararray(a.shape, itemsize=3, unicode=True)
for _ in range(a.size):
# find position of minimum
row, col = np.unravel_index(a.argmin(), a.shape)
# add minimum at correct position
matrix_to_be_filled[row, col] = f'{a.min():.0f}'
# write partially filled matrix
print_array(matrix_to_be_filled)
# delete old minimum in a-matrix
a[row, col] = a.max()+1
sleep(0.1)
# bring cursor back up except for last element
if _ < a.size-1:
print('\033[F'*(matrix_to_be_filled.shape[0]+1))
If you are using pycharm, you need to edit the run/debug configuration and active "emulate terminal in output console" to make the special "move up" character work
Given two matrices A and B.
Is there efficient or more pythonic way to find if B is a Sub-Matrix of A?
The code below is worked but I got a Time limit exceeded when the matrix is so big!
Note: The matrix data structure is represented as an array of strings.
[Input]
A=[
'7283455864',
'6731158619',
'8988242643',
'3830589324',
'2229505813',
'5633845374',
'6473530293',
'7053106601',
'0834282956',
'4607924137']
B=[
'9505',
'3845',
'3530']
[Code]
def gridSearch(G, P):
# Write your code here
n=len(G) # Matrix G row number
m=len(G[0]) # Matrix G col number
r=len(P) # Matrix P row number
c=len(P[0]) # Matrix P col number
found='NO'
for i in range(n-r+1):
for j in range(m-c+1):
start_row=i
end_row=i+r
start_col=j
end_col=j+c
sub=[]
if G[start_row][start_col]==P[0][0]:
sub=[x[start_col:end_col] for x in G[start_row:end_row] ]
if (sub==P):
found='YES'
#print(i,j)
return(found)
[Expected Output]
YES
Rather than search on a letter by letter basis as you seem to be doing with your code, I would utilize the ability to search for strings within another string as follows:
def gridSearch(G, P):
g_sze = len(G)
p_sze = len(P)
p_ptr = 0
for g_ptr, g_row in enumerate(G):
if P[p_ptr] in g_row:
p_ptr += 1
if p_ptr == p_sze:
return True
else:
p_ptr = 0
if g_sze - g_ptr < p_sze-p_ptr:
return False
return False
The above code makes use of two efficiency approaches, first it searches the arrays by rows and secondly it stops searching rows when the ability to match the remaining rows of the smaller array is no longer possible
Given your second example, here is an approach which uses the string search approach and requires column alignment.
def gridSearch2(G, P):
g_sze = len(G)
p_sze = len(P)
candidates = []
# First scan for possible start postions
for row in range(g_sze - p_sze + 1):
idx = 0
while idx < g_sze - p_sze:
ptr = G[row].find(P[0], idx)
if ptr < 0:
break
candidates.append((row, ptr+idx))
idx = idx + ptr + 1
# test each possible start postion for matching follow on rows
while len(candidates) > 0:
row, idx = candidates.pop(0);
rslt = True
for pr in range(1, p_sze):
if G[row + pr].find(P[pr], idx) != idx:
rslt = False
break
if rslt:
return True
return False
The aim of this program is to build a 3x3 matrix which then reduces additional rows, but, for some reason, after the second row is added to M in the while loop, it replaces it with the new row, rather than adding a third row, and, then, reducing additional (most likely 3) vectors after that. Here's the code:
from sympy import *
init_printing(use_unicode= True)
A = []
def reduceOneRow(M):
k = 0
for i in range(k,min(M.shape)-1):
if M[i,i]!=0 or i ==2:
for j in range(k,min(M.shape)-1):
T = Matrix([M.row(j+1)-(M[j+1,i]/M[i,i])*M.row(i)])
A.append(M[j+1]/M[i,i])
M.row_del(j+1)
M = M.row_insert(j+1,T)
k = k+1
else:
i = i+1
return M
# M = Matrix([[1,1,1],[1,4,7],[3,2,5]])
# reduceOneRow(M)
# A
#The following block of code generates a list of monomials, but not in reverse
#lexicagraphical order. This can be fixed later. Ultimately, I'd like to
#make it it's own function
sigma = symbols('x1:4')
D = [1]
for d in D:
for s in sigma:
if s*d not in D:
D.append(s*d)
if len(D) > 20:
break
# print(D)
# print(D[9].subs([('x1',4),('x2',2),('x3',3)]))
#We begin with the set up described in C1
P = [(1,2,3),(4,5,6),(7,8,9)]
G = []
Q = []
S = []
L = [1]
M = Matrix([])
#Here we being step C2.
while L != []:#what follows this while statement is the loop C2-C5 and back
t = L[0]
L.remove(L[0])
K = Matrix([]) #K is a kind of bucket matrix
if t==1: #this block generates the firs line in M. It had to be separate
for j in range(len(P)):#because of the way sympy works. 1 is int, rather
K = K.col_insert(j,Matrix([1])) #than a symbol
else: #here we generate all other rows of M, using K for the name of the rows
for p in P:
K = K.col_insert(0,Matrix([t.subs([(sigma[0],p[0]),(sigma[1],p[1]),(sigma[2],p[2])])]))
# K = K.col_insert(i,Matrix([t.subs([(sigma[0],p[0]),(sigma[1],p[1]),(sigma[2],p[2])]))
M = M.row_insert(min(M.shape)+1,K) #K gets added to M
M
A = []
reduceOneRow(M)#row reduces M and produces the ai in C3
sum = 0
for n in range(len(A)):
sum = sum + A[n]*S[n]
V = M.row(-1)
if V == zeros(1,len(V)):
G.append(t - sum)
M.row_del(-1)
else:
S.append(t-sum)
Q.append(t)
for i in range(1,4):
#if not t*D[i] == Q[0]:
L.append(t*D[i])
L
print('G =',' ',G,' ','Q =',Q)
I figure it out. I changed 'reduceRowOne(M)' to 'M = reduceRowOne'. Ugh.
Thank you all who took a look at this!
I need to write a function to find the smallest window that contains all the elements in an array. Below is what I have tried:
def function(item):
x = len(set(item))
i = 0
j = len(item) - 1
result = len(item)
while i <= j:
if len(set(item[i + 1: j + 1])) == x:
result = min(result, len(item[i + 1: j + 1]))
i += 1
elif len(set(item[i:j])) == x:
result = min(result, len(item[i:j]))
j -= 1
else:
return result
return result
print(function([8,8,8,8,1,2,5,7,8,8,8,8]))
The time complexity is in O(N^2), Can someone help me to improve it to O(N) or better? Thanks.
You can use the idea from How to find smallest substring which contains all characters from a given string? for this specific case and get a O(N) solution.
Keep a counter for how many copies of each unique number is included in the window and move the end of the window to the right until all unique numbers are included at least once. Then move the start of the window until one unique number disappears. Then repeat:
from collections import Counter
def smallest_window(items):
element_counts = Counter()
n_unique = len(set(items))
characters_included = 0
start_enumerator = enumerate(items)
min_window = len(items)
for end, element in enumerate(items):
element_counts[element] += 1
if element_counts[element] == 1:
characters_included += 1
while characters_included == n_unique:
start, removed_element = next(start_enumerator)
min_window = min(end-start+1, min_window)
element_counts[removed_element] -= 1
if element_counts[removed_element] == 0:
characters_included -= 1
return min_window
>>> smallest_window([8,8,8,8,1,2,5,7,8,8,8,8])
5
This problem can be solved as below.
def lengthOfLongestSublist(s):
result = 0
#set a dictionary to store item in s as the key and index as value
d={}
i=0
j=0
while (j < len(s)):
#if find the s[j] value is already exist in the dictionary,
#move the window start point from i to i+1
if (s[j] in d):
i = max(d[s[j]] + 1,i)
#each time loop, compare the current length of s to the previouse one
result = max(result,j-i+1)
#store s[j] as key and the index of s[j] as value
d[s[j]] = j
j = j + 1
return result
lengthOfLongestSubstring([8,8,8,8,8,5,6,7,8,8,8,8,])
Output: 4
Set a dictionary to store the value of input list as key and index
of the list as the value. dic[l[j]]=j
In the loop, find if the current value exists in the dictionary. If
exist, move the start point from i to i + 1.
Update result.
The complexity is O(n).
I am having difficulty getting this scoring function to work. The objective of my program is to make a t x n matrix and find a consensus sequence.
I keep getting a error :
TypeError: 'int' object is not subscriptable.
Any help would be appreciated.
def Score(s, i, l, dna):
t = len(dna) # t = number of dna sequences
# Step 1: Extract the alignment corresponding to starting positions in s
alignment = []
for j in range(0, i):
alignment.append(dna[j][s[j]:s[j]+l])
# Step 2: Create the corresponding profile matrix
profile = [[],[],[],[]] # prepare an empty 4 x l profile matrix first
for j in range(0, 4):
profile[j] = [0] * l
for c in range(0, l): # for each column number c
for r in range(0, i): # for each row number r in column c
if alignment[r][c] == 'a':
profile[0][c] = profile[0][c] + 1
elif alignment[r][c] == 't':
profile[1][c] = profile[1][c] + 1
elif alignment[r][c] == 'g':
profile[2][c] = profile[2][c] + 1
else:
profile[3][c] = profile[3][c] + 1
# Step 3: Compute the score from the profile matrix
score = 0
for c in range(0, l):
score = score + max([profile[0][c], profile[1][c], profile[2][c], profile[3][c]])
return score
Is your variable dna a dictionary,
if so use def Score(s, i, l, **dna)
If it is int variable, you can't access it as dna[j][s[j]:s[j]+l]