I need to check whether the numbers in my NxM matrix (a NumPy array) are in gradeScale. If a number, for example 8, is not in gradeScale, I would like to append that number to an empty list and its row number to another list.
So how do I check whether a number in my matrix isn't in gradeScale? I have tried different types of loops, but they don't work.
wrongNumber = []
Rows = []
gradeScale = np.array([-3,0,2,4,7,10,12])
if there is a number i matrix which is not i gradeScale
wrongNumber.append[number]
Rows.append[rownumber]
print("the grade {} in line {} is out of range",format(wrongNumber),
format(Rows))
You can use numpy.ndarray.shape to go through your rows.
for row in range(matrix.shape[0]):
    for x in matrix[row]:
        if x not in gradeScale:
            wrongNumber.append(x)
            Rows.append(row)
In addition, you do not use format correctly. Your print statement should be
print("The grade {} in line {} is out of range".format(wrongNumber, Rows))
The following post has more information on formatting: String formatting in Python.
Example
import numpy as np
wrongNumber = []
Rows = []
matrix = np.array([[1,2],[3,4],[5,6],[7,8]])
gradeScale = [1,3,4,5,8]
for row in range(matrix.shape[0]):
    for x in matrix[row]:
        if x not in gradeScale:
            wrongNumber.append(x)
            Rows.append(row)
print("The grades {} in lines {} (respectively) are out of range.".format(wrongNumber, Rows))
Output
The grades [2, 6, 7] in lines [0, 2, 3] (respectively) are out of range.
A for loop with enumerate() is probably what you are looking for.
Example:
for rowNumber, row in enumerate(matrix):
    for number in row:
        if number not in gradeScale:
            wrongNumber.append(number)
            Rows.append(rowNumber)
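The same check can probably also be done without an explicit Python loop. The sketch below is not from the answers above; it assumes NumPy is installed and reuses the example matrix and gradeScale from the first answer. np.isin builds a boolean mask, and np.nonzero gives the row and column indices where the mask is True.

```python
import numpy as np

matrix = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
gradeScale = np.array([1, 3, 4, 5, 8])

# True wherever the entry is NOT one of the allowed grades.
invalid = ~np.isin(matrix, gradeScale)

# Row and column indices of the invalid entries, in row-major order.
rows, cols = np.nonzero(invalid)

# Convert to plain Python lists so they print without NumPy type wrappers.
wrongNumber = matrix[invalid].tolist()
Rows = rows.tolist()

print("The grades {} in lines {} (respectively) are out of range.".format(wrongNumber, Rows))
```

For a large matrix this avoids the per-element "x not in gradeScale" test, which scans gradeScale once for every entry.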
Related
I want to find all combinations of numbers {0,1,2,3,4}, and print them out in a table, with the first column being the order number and the second column being a particular combination. My desired output should take the following form:
1 (0,)
2 (1,)
... ...
6 (0,1)
... ...
I tried the following code:
import numpy as np
import itertools
rows=list(range(5))
combrows=[]
for k in range(1,5): # the number of rows k takes values from 1 to 4 (range excludes 5)
    for combo in itertools.combinations(rows,k):
        combrows.append(combo)
ind=1
store=[]
for i in combrows:
    store.append([[ind],[i]])
    ind=ind+1
print(store)
But the resulting table is a horizontal line instead of a 2D rectangular table with two columns. How could I fix this? Thanks!
This is a quite straightforward solution:
from itertools import combinations
numbers = list(range(5))
lst = []
for l in (combinations(numbers, r) for r in range(1, 5)):
    lst.extend(l)
for i, j in enumerate(lst):
    print(i+1, j)
enumerate generates the line numbers automatically.
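As a variation on the same idea, enumerate accepts a start argument, so the i+1 can be dropped. The sketch below is an illustration rather than the answerer's code: the names all_combos and table are made up here, and unlike the answer above it also includes the single 5-element combination.

```python
from itertools import chain, combinations

numbers = range(5)

# Chain the combinations of every size 1..5 into one stream.
all_combos = chain.from_iterable(
    combinations(numbers, k) for k in range(1, len(numbers) + 1)
)

# Number the rows from 1 instead of 0.
table = list(enumerate(all_combos, start=1))

for index, combo in table:
    print(index, combo)
```

Each print call emits one row, which is what gives the two-column layout instead of a single horizontal line.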
I have a function that builds an adjacency matrix. To make the matrix easier for humans to read, I decided to print the row index like this:
Now I want to print the column index in the same way, but I can't get it right. The best result I get is this:
Any ideas or suggestions on how I can print the column indexes neatly?
Source code here.
def generate_adjacency_matrix(vertices):
    # Create empty Matrix
    matrix = [['.' for _ in range(len(vertices))] for _ in range(len(vertices))]
    # Fill Matrix
    for row in range(len(matrix)):
        for num in range(len(matrix)):
            if num in vertices[row]:
                matrix[row][num] = '1'
    # Print column numbers
    numbers = list(range(len(matrix)))
    for i in range(len(numbers)):
        numbers[i] = str(numbers[i])
    print(' ', numbers)
    #Print matrix and row numbers
    for i in range(len(matrix)):
        if len(str(i)) == 1:
            print(str(i) + ' ', matrix[i])
        else:
            print(i, matrix[i])
In case it matters, the parameter of my function is a dictionary that looks like this:
{0:[1],
1:[0,12,8],
2:[3,8,15]
....
20:[18]
}
If you know you're only going up to 20, then just pad everything to 2 characters.
For the header row:
numbers[i] = str(numbers[i]).zfill(2)
For the other rows, set each cell to ". " or ".1" or something else that looks neat.
That would seem to be the easiest way.
An alternative is to have two header rows, one above the other: the first holds the tens digit and the second the units digit. That keeps the column width at 1 in the table, which you may need.
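A minimal sketch of the padding idea, assuming the dictionary keys are exactly 0..n-1 as in the question. The function name format_adjacency_matrix is hypothetical, and it pads with spaces via str.rjust rather than zeros via zfill; the column width is taken from the largest index instead of being fixed at 2.

```python
def format_adjacency_matrix(vertices):
    """Return the matrix as text lines with aligned row and column indices."""
    n = len(vertices)
    width = len(str(n - 1))  # widest index, e.g. 2 when indices go up to 20

    # Header: blank corner cell, then right-aligned column indices.
    header = " " * width + " " + " ".join(str(c).rjust(width) for c in range(n))
    lines = [header]

    for r in range(n):
        cells = ("1" if c in vertices[r] else "." for c in range(n))
        body = " ".join(cell.rjust(width) for cell in cells)
        lines.append(str(r).rjust(width) + " " + body)
    return lines


example = {0: [1], 1: [0, 2], 2: [1]}
for line in format_adjacency_matrix(example):
    print(line)
```

Building the lines as strings, rather than printing Python lists, is what removes the brackets and quotes that make the columns drift.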
I have a very large square matrix, around 570,000 x 570,000, and I want to square it (raise it to the power of 2).
The data is in JSON format and loads as a nested associative array (a dict inside a dict in Python).
Let's say I want to represent this matrix:
[ [0, 0, 0],
[1, 0, 5],
[2, 0, 0] ]
In json it's stored like:
{"3": {"1": 2}, "2": {"1": 1, "3": 5}}
For example, "3": {"1": 2} means that the number in the 3rd row and 1st column is 2.
I want the output to be in the same JSON format, but for the squared matrix (matrix multiplication).
The programming language isn't important. I want to calculate it the fastest way (in less than 2 days, if possible).
So I tried to use NumPy in Python (numpy.linalg.matrix_power), but it seems that it doesn't work with my nested, unsorted dict format.
I wrote some simple Python code to do that, but I estimated that it would take 18 days to finish:
jsonFileName = "file.json"
def matrix_power(arr):
    result = {}
    for x1,subarray in arr.items():
        print("doing item:",x1)
        for y1,value1 in subarray.items():
            for x2,subarray2 in arr.items():
                if(y1 != x2):
                    continue
                for y2,value2 in subarray2.items():
                    partSum = value1 * value2
                    result[x1][y2] = result.setdefault(x1,{}).setdefault(y2,0) + partSum
    return result
import json
with open(jsonFileName, 'r') as reader:
    jsonFile = reader.read()
    print("reading is succesful")
jsonArr = json.loads(jsonFile)
print("matrix is in array form")
matrix = matrix_power(jsonArr)
print("Well Done! matrix is powered by 2 now")
output = json.dumps(matrix)
print("result is in json format")
writer = open("output.json", 'w+')
writer.write(output)
writer.close()
print("Task is done! you can close this window now")
Here, X1,Y1 is the row and col of the first matrix which then is multiplied by the corresponding element of the second matrix (X2,Y2).
NumPy is not the problem; you need to supply the data in a format that NumPy can understand. Since your matrix is really big, it probably won't fit in memory as a dense array, so it's a good idea to use a sparse matrix (scipy.sparse.csr_matrix):
import scipy.sparse

# data is the dict loaded from the JSON file. Its keys are 1-based strings,
# so subtract 1 to get the 0-based row/column indices that csr_matrix expects.
m = scipy.sparse.csr_matrix((
    [v for row in data.values() for v in row.values()], (
        [int(row_n) - 1 for row_n, row in data.items() for v in row],
        [int(column) - 1 for row in data.values() for column in row]
    )
))
Then it's just a matter of doing:
m**2
Now I have to somehow translate the csr_matrix back into something JSON-serializable.
Here's one way to do that, using the attributes data, indices, indptr - m is the csr_matrix:
d = {}
end = m.indptr[0]
for row in range(m.shape[0]):
    start = end
    end = m.indptr[row+1]
    if end > start: # if row not empty
        d.update({str(1+row): dict(zip([str(1+i) for i in m.indices[start:end]], m.data[start:end]))})
output = json.dumps(d, default=int)
I don't understand how the data fits in memory as a csr_matrix but not as a dictionary: d.update raises a MemoryError after some time.
Here's a variant which doesn't construct the whole output dictionary and JSON string in memory, but prints the individual rows directly to the output file; this should need considerably less memory.
#!/usr/bin/env python3
…
import json
import sys
sys.stdout = open("output.json", 'w')
delim = '{'
end = m.indptr[0]
for row in range(m.shape[0]):
    start = end
    end = m.indptr[row+1]
    if end > start: # if row not empty
        print(delim, '"'+str(1+row)+'":',
              json.dumps(dict(zip([str(1+i) for i in m.indices[start:end]], m.data[start:end])), default=int)
              )
        delim = ','
print('}')
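To sanity-check the SciPy result on a small input, a dictionary-only version of the squaring step may be handy. The sketch below is not part of the answers above, and the name square_sparse is made up here. It follows the question's matrix_power, but replaces the inner scan over every row with a direct dictionary lookup of row y1, which is where most of the estimated 18 days would likely go.

```python
def square_sparse(arr):
    """Square a matrix stored as {row: {col: value}} with string keys."""
    result = {}
    for i, row in arr.items():
        out_row = {}
        for k, a_ik in row.items():
            # Look up row k directly; rows with no entries are simply absent.
            for j, a_kj in arr.get(k, {}).items():
                out_row[j] = out_row.get(j, 0) + a_ik * a_kj
        if out_row:  # keep the output sparse: skip rows that stay empty
            result[i] = out_row
    return result


example = {"3": {"1": 2}, "2": {"1": 1, "3": 5}}
print(square_sparse(example))
```

This stays in the original 1-based string keys, so no index conversion is needed, but for a matrix of the size described the SciPy route is still expected to be much faster.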
I am trying to compare a variable to the values that are stored in an array. The values in the array are extracted out from a csv file. If the values of the array are equal to the variable it will print out true.
import csv
array=[]
values = csv.reader(open('SampleEEG data Insight-1-30.11.15.17.36.16.csv', 'r'),
                    delimiter=',',
                    quotechar='|')
for row in values:
    array.append(row[5])
number= 4200
for a in array:
    if number == a:
        print ('True')
print ('False')
The code only compares one value in the array and returns a false. How do I compare all the values in the array to the variable?
Use the all() function with a generator expression:
number = 10
array = [1, 2, 3, 4]
print( all(number == a for a in array) )
# False
array = [10, 10, 10, 10]
print( all(number == a for a in array) )
# True
You can use the built-in all() function:
all(number == a for a in array)
From what I could figure out from your comment, this is probably what you are looking for:
array=[]
with open('SampleEEG data Insight-1-30.11.15.17.36.16.csv', 'r') as file:
    lines = [line.split() for line in file.readlines()]
for line in lines:
    try:
        array.append(float(line[5]))
    except ValueError:
        pass
number= 4200
for a in array:
    if number == a:
        print ('True')
print ('Done, all checked')
Because it is exiting from the loop after it hits the first true value. Use the following code:
for i in array:
    if number == i:
        print ('True')
    else:
        print ('False')
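One detail that may explain the original symptom: csv.reader yields strings, so comparing them to the integer 4200 is never true. The sketch below is an illustration under assumptions, not the asker's data: the in-memory sample stands in for the real CSV file, whose layout is not shown in the question. It converts column 5 to float and then uses any() and all() to summarise the comparison.

```python
import csv
import io

# Hypothetical stand-in for the real CSV file.
sample = io.StringIO("a,b,c,d,e,f\n0,0,0,0,0,4100\n0,0,0,0,0,4200\n")

values = []
for row in csv.reader(sample):
    try:
        values.append(float(row[5]))  # csv.reader yields strings
    except ValueError:
        pass  # skip the header or any non-numeric cell

number = 4200
has_match = any(v == number for v in values)  # at least one value equals number
all_match = all(v == number for v in values)  # every value equals number
print(has_match, all_match)
```

Which of the two is wanted depends on whether "compare all the values" means "is the number present" or "are all values equal to it".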
I have a protein sequence file that looks like this:
>102L:A MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL -------------------------------------------------------------------------------------------------------------------------------------------------------------------XX
The first field is the name of the sequence, the second is the actual protein sequence, and the third is an indicator that shows whether any coordinates are missing. In this case, notice that there are two "X" characters at the end. That means that the last two residues of the sequence, which are "NL" in this case, are missing coordinates.
By coding in Python I would like to generate a table which should look like this:
1) the name of the sequence
2) the total number of missing coordinates (which is the number of X)
3) the range of these missing coordinates (which is the range of the positions of those X)
4) the length of the sequence
5) the actual sequence
So the final result should look like this:
>102L:A 2 163-164 164 MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL
And my code looks like this so far:
total_seq = []
with open('sample.txt') as lines:
    for l in lines:
        split_list = l.split()
        # Assign the list number
        header = split_list[0] # 1
        seq = split_list[1] # 5
        disorder = split_list[2]
        # count sequence length and total residue of missing coordinates
        sequence_length = len(seq) # 4
        for x in disorder:
            counts = 0
            if x == 'X':
                counts = counts + 1
        total_seq.append([header, seq, str(counts)]) # obviously I haven't finish coding 2 & 3
with open('new_sample.txt', 'a') as f:
    for lol in total_seq:
        f.write('\n'.join(lol))
I'm new to Python. Could anyone help, please?
Here's your modified code. It now produces your desired output.
with open("sample.txt") as infile:
    matrix = [line.split() for line in infile.readlines()]
header_list = [row[0] for row in matrix]
seq_list = [str(row[1]) for row in matrix]
disorder_list = [str(row[2]) for row in matrix]
f = open('new_sample.txt', 'a')
for i in range(len(header_list)):
    header = header_list[i]
    seq = seq_list[i]
    disorder = disorder_list[i]
    # count sequence length and total residue of missing coordinates
    sequence_length = len(seq)
    # get total number of missing coordinates
    num_missing = disorder.count('X')
    # get the range of these missing coordinates
    # (find/rfind are 0-based; add 1 to report 1-based positions such as 163-164)
    first_X_pos = disorder.find('X') + 1
    last_X_pos = disorder.rfind('X') + 1
    range_missing = '-'.join([str(first_X_pos), str(last_X_pos)])
    reformat_seq=" ".join([header, str(num_missing), range_missing, str(sequence_length), seq, '\n'])
    f.write(reformat_seq)
f.close()
Some more tips:
Don't forget about Python's string methods. They will solve a lot of your problems automatically. The documentation is very good.
If you searched for how to do just part 2 or just part 3 in your question, you would find the results elsewhere.
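As a small illustration of those string methods, the sketch below builds one output row from the three fields. It is a sketch under assumptions: the function name summarize and the toy input are invented for illustration, positions are reported 1-based to match the desired output, and the "-" placeholder for a sequence with no missing coordinates is a guess, since the question does not say what should be printed in that case.

```python
def summarize(header, seq, disorder):
    """Build one output row: name, count, 1-based range, length, sequence."""
    num_missing = disorder.count("X")
    if num_missing:
        first = disorder.find("X") + 1   # str.find is 0-based
        last = disorder.rfind("X") + 1   # position of the last X
        range_missing = f"{first}-{last}"
    else:
        range_missing = "-"  # hypothetical placeholder when nothing is missing
    return " ".join([header, str(num_missing), range_missing, str(len(seq)), seq])


print(summarize(">toy:A", "MNIFEKNL", "------XX"))
```

Note that find and rfind only give the first and last X, so a sequence with several separate gaps would be reported as one range.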