Search for unknown numbers in a txt and plot them - python

Recently, I started to evaluate some data with Python. However, it seems complicated to evaluate and manipulate my recorded data.
For instance, my .txt file consists of:
1551356567 0598523403 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1551356567 0598523436 0000003362 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
1551356567 0598523469 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1551356567 0598523502 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1551356567 0598523535 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1551356567 0598523766 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1551356567 0598523799 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1551356567 0598523832 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1551356567 0598523865 0000003314 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
1551356567 0598523898 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1551356567 0598523931 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1551356568 0598524756 0000003384 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
The important values are only the third column (with 3362) and the first one (1551...), whereby the third column should be the x axis and the first the y axis. Only the lines with a value not equal to 0 are important. The idea is to create a loop which searches for values in the third column, and if there is a value != 0, then this value should be saved in a x-list (x) and the corresponding y value in a y-list (y).
Currently my script to read and manipulate the data looks like this:
import numpy as np
rawdata = np.loadtxt("file.txt")
num_lines = sum(1 for line in open("file.txt"))
with open("file.txt") as hv:
    line = hv.readline()
    x = list()
    y = list()
    i = 1
    j = 0
    while line != num_lines:
        if rawdata[j][2] != 0:
            x = x.append(rawdata[j][2])
            y = x.append(rawdata[j][0])
        else:
            j += 1
        if i == num_lines:
            break
        i += 1
print(x)
print(y)
I think there are some local and global variable problems, but I couldn't solve them to, let's say, "update" my lists with the new values. At the end there should be lists containing only:
[3362, 3314, 3384] for x and
[1551356567, 1551356567, 1551356568] for y
Do you have any suggestions how I can "update" my list?

As you read each line, split it on whitespace and convert each column to integers:
x = []
y = []
with open('file.txt') as f:
    for line in f:
        data = [int(col) for col in line.split()]
        if data[2] != 0:
            x.append(data[2])
            y.append(data[0])
print(x)
print(y)
Output:
[3362, 3314, 3384]
[1551356567, 1551356567, 1551356568]
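One reason the loop in the question never fills the lists is that list.append modifies the list in place and returns None, so x = x.append(...) replaces the list with None. Since the question already loads the data with np.loadtxt, a boolean mask is an alternative to an explicit loop. This is only a minimal sketch, assuming the same column layout; the inline sample (shortened to five columns) is a stand-in for file.txt:

```python
import io
import numpy as np

# Stand-in for file.txt (assumption: same layout, fewer columns).
sample = io.StringIO(
    "1551356567 0598523403 0 0 0\n"
    "1551356567 0598523436 0000003362 0 0\n"
    "1551356567 0598523865 0000003314 0 0\n"
    "1551356568 0598524756 0000003384 0 0\n"
)

rawdata = np.loadtxt(sample)       # parsed as floats by default
mask = rawdata[:, 2] != 0          # True for rows whose third column is non-zero

x = rawdata[mask, 2].astype(np.int64).tolist()
y = rawdata[mask, 0].astype(np.int64).tolist()

print(x)  # [3362, 3314, 3384]
print(y)  # [1551356567, 1551356567, 1551356568]
```

With a real file, passing the file name to np.loadtxt instead of the StringIO object should work the same way.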


Tokenizer keras.preprocessing.text always returns a list of many zeros

I created a simple neural network with 9 labels to detect hate-speech comments.
I used glove.6B.100d as the base for my model.
After building the model I wanted to predict on a sentence, but after I call tokenizer.texts_to_sequences on my text, the resulting array is just "[[]]".
Here is the code for the prediction:
import tensorflow as tf
from keras.preprocessing.text import Tokenizer
Z = []
#toxic_comments is a csv with some comments and the labels
sentences = list(toxic_coments["Text"])
for sen in sentences:
    Z.append(preprocess_text(str(sen)))
tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts(Z)
X = tokenizer.texts_to_sequences(['Naughty Comment']) #<= Bad Test Comment
X = pad_sequences(X, padding='post', maxlen=maxlen)
print(X)
labels = ["Reject Newspaper","Reject Crowd","Rejection Count Crowd","Sexism Count Crowd","Threat Count Crowd","Insult Count Crowd","Profanity Count Crowd","Meta Count Crowd","Advertisement Count Crowd"]
pred = model.predict(X)
for i, p in enumerate(pred[0]):
    print(labels[i], p)
The Output:
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
Reject Newspaper 0.08198401
Reject Crowd 0.044294477
Rejection Count Crowd 0.31464878
Sexism Count Crowd 0.015070438
Threat Count Crowd 0.0095721185
Insult Count Crowd 0.17211622
Profanity Count Crowd 0.07748076
Meta Count Crowd 0.06826261
Advertisement Count Crowd 0.017438024
What am I doing wrong here? Why is it always zeros, no matter what I put in the text field?
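An all-zero row after padding means the tokenizer produced an empty sequence for that text. One possible cause (a guess, since the training data and preprocess_text are not shown) is that none of the words in the test comment occur in the fitted vocabulary: Keras' Tokenizer silently drops unknown words unless an oov_token is set. The sketch below is a simplified pure-Python stand-in for that behaviour, not Keras code; fit_vocab, to_sequence and pad_post are hypothetical helpers written for illustration:

```python
from collections import Counter

def fit_vocab(texts):
    """Map each word to an index by frequency; 0 stays reserved for padding."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}

def to_sequence(text, word_index, oov_index=None):
    """Convert text to indices; unknown words are dropped unless oov_index is given."""
    seq = []
    for word in text.lower().split():
        idx = word_index.get(word)
        if idx is not None:
            seq.append(idx)
        elif oov_index is not None:
            seq.append(oov_index)
    return seq

def pad_post(seq, maxlen):
    """Pad with zeros on the right (simplified version of post-padding)."""
    return (seq + [0] * maxlen)[:maxlen]

vocab = fit_vocab(["this article is great", "great comment"])
known = pad_post(to_sequence("great article", vocab), 6)
unknown = pad_post(to_sequence("Naughty words", vocab), 6)

print(known)    # [1, 3, 0, 0, 0, 0]
print(unknown)  # [0, 0, 0, 0, 0, 0]
```

If this is what happens, checking whether the test words appear in tokenizer.word_index, and running the test comment through the same preprocess_text as the training data, would be reasonable first steps.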

Python keeps updating wrong list

I'm trying to write code that simulates the spread of something on an n×n 2D list. My issue is this: when I create a temporary copy of my original list via temp = [*board], temp = board[:], etc., changes to the copy still show up in both lists, and instead of returning
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
returns
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
My code is here:
def spread(board, iterations, size):
    temp = board[:]
    for iteration in range(iterations):
        for x in range(size):
            for y in range(size):
                if board[x][y] == 1:
                    if x+1 < size:
                        temp[x+1][y] = 1
                    if x-1 >= 0:
                        temp[x-1][y] = 1
                    if y+1 < size:
                        temp[x][y+1] = 1
                    if y-1 >= 0:
                        temp[x][y-1] = 1
        board = temp[:]
    return board
and I called it via
new_board = spread(my_board, 1, 15)
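This is programming 101. Remember, lists are stored on the heap, and variables only hold references (pointers) to them.
So the variable board really points to the place on the heap where that list is stored. A plain assignment such as temp = board just creates a second reference to the same list. A slice copy such as board[:] or [*board] does create a new outer list, but it is only a shallow copy: the new list still holds references to the same row lists, so temp[x][y] = 1 also changes board[x][y]. I suggest taking a look at this using Python Tutor: https://pythontutor.com/visualize.html#mode=edit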
For example:
b = [1,2,3,4,5]
a = b
a[0] = 2
print(b)
will output
[2, 2, 3, 4, 5]
Try it out in Python Tutor and you'll see what's happening!
To solve your problem, create a deep copy:
def deep_copy(board):
    temp = []
    for i in range(len(board)):
        row_copy = []
        for j in range(len(board[0])):
            row_copy.append(board[i][j])
        temp.append(row_copy)
    return temp
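The standard library's copy.deepcopy does the same job as a hand-written deep copy. Below is a sketch of how the spread function might look with it; this is a variation for illustration, not the asker's exact code (the size parameter is derived from the board, and the four neighbour checks are folded into a loop). The 5x5 board is made up for the demo:

```python
import copy

def spread(board, iterations):
    """Return a new board after spreading 1s to the four direct neighbours."""
    size = len(board)
    for _ in range(iterations):
        temp = copy.deepcopy(board)   # rows are copied too, so board stays untouched
        for x in range(size):
            for y in range(size):
                if board[x][y] == 1:
                    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nx, ny = x + dx, y + dy
                        if 0 <= nx < size and 0 <= ny < size:
                            temp[nx][ny] = 1
        board = temp                  # rebinds the local name only
    return board

my_board = [[0] * 5 for _ in range(5)]
my_board[2][2] = 1
new_board = spread(my_board, 1)

for row in new_board:
    print(*row)
```

Because every iteration reads from board and writes to a separate temp, a cell that was just set to 1 does not spread again within the same iteration, and the caller's my_board is left unchanged.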

Storing integers from text file to an array

I have a task to assign integers from a text file to an array in python.
I tried reading the file line by line and splitting, but neither worked.
The task goes like this: we have an array
1 4 5 7 3 2 8 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 4 0 0 0 0 0
0 0 0 0 0 0 0 0 0 3 0 0 0 0
0 0 0 0 0 0 0 5 0 0 6 0 0 0
0 0 0 0 0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 2 9 0
0 0 0 0 0 0 0 0 0 0 10 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 11
0 0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 8
0 0 0 0 0 0 0 0 0 0 0 0 0 9
0 0 0 0 0 0 0 0 0 0 0 0 0 14
0 0 0 0 0 0 0 0 0 0 0 0 0 5
0 0 0 0 0 0 0 0 0 0 0 0 0 0
and this needs to be assigned to an array x in order to use it in further functions.
Do something like:
with open('my_raw_file.txt', 'r') as file:
    all_file = file.read().strip()  # Read and remove any extra new line
all_file_list = all_file.split('\n')  # make a list of lines
final_data = [[int(each_int) for each_int in line.split()] for line in all_file_list]  # make list of list and convert to int
print(final_data)
If you don't mind pandas (this returns a DataFrame; call .to_numpy() on it if you need a NumPy array):
import pandas as pd
integers = pd.read_csv('test.txt', sep=" ", header=None)
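If a plain NumPy array is enough, np.loadtxt with dtype=int is another option. A small sketch, where the inline sample (three columns instead of fourteen) is an assumption standing in for the text file:

```python
import io
import numpy as np

# Stand-in for the text file: whitespace-separated integers, one row per line.
sample = io.StringIO(
    "1 4 5\n"
    "0 0 10\n"
    "0 11 0\n"
)

x = np.loadtxt(sample, dtype=int)   # 2D integer array, one row per line
print(x.shape)   # (3, 3)
print(x[1, 2])   # 10
```

With a real file, the file name can be passed in place of the StringIO object.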

Find nearest item given an angle in 2D numpy array

Given a 2D numpy array, what would be the best way to get the nearest item (for this example '1') from a specified coordinate (where 'X' is located), given an angle?
For example, let's say we have 'X' located at (1,25) in the 2D array shown below, with an angle of 225 degrees, assuming 0 degrees points straight to the right and 90 degrees points straight up. How can I get the coordinate of the nearest '1' in that direction?
[
0000000000000000000000000000
0000000000000000000000000X00
0000000000000000000000000000
1110000000000000000000000000
1111100000000000000000000000
1111110000000000000000000000
1111111000000000000000000000
1111111110000000000000000000
1111111111100000000000000000
]
I'm assuming that by "towards that direction" you mean something like "on that ray". In that case 255° has no solution, so I took the liberty of changing it to 195°.
You could then brute-force it:
import numpy as np
a = """
0000000000000000000000000000
0000000000000000000000000X00
0000000000000000000000000000
1110000000000000000000000000
1111100000000000000000000000
1111110000000000000000000000
1111111000000000000000000000
1111111110000000000000000000
1111111111100000000000000000
"""
a = np.array([[int(i) for i in row] for row in a.strip().replace('X', '2').split()], dtype=np.uint8)
x = np.argwhere(a==2)[0]
y = np.argwhere(a==1)
d = y-x
phi = 195 # 255 has no solutions
on_ray = np.abs(d@(np.sin(np.radians(-phi-90)), np.cos(np.radians(-phi-90))))<np.sqrt(0.5)
show_ray = np.zeros_like(a)
show_ray[tuple(y[on_ray].T)] = 1
print(show_ray)
ymin=y[on_ray][np.argmin(np.einsum('ij,ij->i', d[on_ray], d[on_ray]))]
print(ymin)
Output:
# [[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# [1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# [1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
# [6 6]
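The brute-force test above measures the distance to the whole line through X, so a '1' lying behind X would also qualify. A possible refinement, sketched here as an assumption about what "on that ray" should mean, additionally requires a positive projection onto the ray direction. The function name nearest_on_ray and its parameters are made up for this example:

```python
import numpy as np

def nearest_on_ray(grid, origin, angle_deg, target=1, half_width=np.sqrt(0.5)):
    """Return (row, col) of the nearest `target` cell close to the ray, or None."""
    theta = np.radians(angle_deg)
    # Unit direction in (row, col) terms: 0 deg points right (+col),
    # 90 deg points up, i.e. towards smaller row indices.
    direction = np.array([-np.sin(theta), np.cos(theta)])
    normal = np.array([direction[1], -direction[0]])

    cells = np.argwhere(grid == target)
    d = cells - np.asarray(origin)
    along = d @ direction           # signed distance along the ray
    across = np.abs(d @ normal)     # perpendicular distance from the ray
    keep = (along > 0) & (across < half_width)
    if not keep.any():
        return None
    dist2 = (d[keep] ** 2).sum(axis=1)
    row, col = cells[keep][np.argmin(dist2)]
    return int(row), int(col)

# Rebuild the grid from the question: number of leading 1s in each row.
ones_per_row = [0, 0, 0, 3, 5, 6, 7, 9, 11]
grid = np.zeros((9, 28), dtype=np.uint8)
for r, n in enumerate(ones_per_row):
    grid[r, :n] = 1

hit = nearest_on_ray(grid, (1, 25), 195)
miss = nearest_on_ray(grid, (1, 25), 225)
print(hit, miss)  # (6, 6) None
```

The half_width default mirrors the np.sqrt(0.5) tolerance used in the brute-force version; widening it makes the ray "thicker".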

Python: How to read one element first and followed by two elements?

I would like to scan through sequences and return 1 or 0 for each position to indicate whether a pattern is present or absent there. For example, for XYZXYZ:
X Y Z X Y Z
1 0 0 1 0 0 - X
0 1 0 0 1 0 - Y
0 0 1 0 0 1 - Z
0 0 0 0 0 0 - XX
1 1 0 1 1 0 - XY
0 0 0 0 0 0 - XZ
0 0 0 0 0 0 - YX
0 0 0 0 0 0 - YY
0 1 1 0 1 1 - YZ
0 0 1 1 0 0 - ZX
0 0 0 0 0 0 - ZY
0 0 0 0 0 0 - ZZ
For a two-element pattern like XY, both positions covered by a match get the value 1: the position of X and the position of Y.
The example code below only scans one element at a time. When I replaced this line of code,
CHARS = ['X','Y','Z']
to
CHARS = ['X','Y','Z','XX','XY','XZ',...,'ZZ']
it can't match the two-element patterns.
The code below returns the binary values in one row per sequence: first the values for X, then for Y, then for Z.
import numpy as np
seqs = ["XYZXYZ","YZYZYZ"]
CHARS = ['X','Y','Z']
CHARS_COUNT = len(CHARS)
maxlen = max(map(len, seqs))
res = np.zeros((len(seqs), CHARS_COUNT * maxlen), dtype=np.uint8)
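for si, seq in enumerate(seqs):
    seqlen = len(seq)
    # One character per element; the original np.chararray((seqlen,), buffer=seq)
    # call relied on Python 2 byte strings.
    arr = np.array(list(seq))
    for ii, char in enumerate(CHARS):
        res[si][ii*seqlen:(ii+1)*seqlen][arr == char] = 1
print(res)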
Example output of the code above:
[[1 0 0 1 0 0 0 1 0 0 1 0 0 0 1 0 0 1]
[0 0 0 0 0 0 1 0 1 0 1 0 0 1 0 1 0 1]]
How can I make it scan single elements first, followed by two-element patterns?
Expected output:
[[1 0 0 1 0 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 1 0 1 0 1 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0]]
I'm not sure if I completely get all the details, but this is what I'd do:
import numpy as np

seqs = ['xyzxyz', 'yzyzyz']
chars = ['x','y','z','xx','xy','xz','yx','yy','yz','zx','zy','zz']
N = len(chars)
out = []
for i, seq in enumerate(seqs):
    M = len(seq)  # if different seqs have different lengths, this will break!
    tmp = np.array([], dtype=int)
    for c in chars:
        o = np.array([0]*M)
        index = -1
        try:
            # seq.index raises ValueError once there are no more matches
            while True:
                index = seq.index(c, index+1)
                o[index:(index+len(c))] = 1
        except ValueError:
            pass
        finally:
            tmp = np.r_[tmp, o]
    out.append(tmp)
out = np.array(out)
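The same idea can be wrapped in a function that also generates the pattern list, so the two-element patterns do not have to be typed out by hand. This is only a sketch under the assumption that the patterns are all combinations of the alphabet up to length two, in the order used in the question; encode and its parameters are names invented for this example:

```python
from itertools import product
import numpy as np

def encode(seq, alphabet="XYZ", max_k=2):
    """Return (patterns, flat 0/1 vector) marking every position covered by a match."""
    patterns = [
        "".join(p)
        for k in range(1, max_k + 1)
        for p in product(alphabet, repeat=k)
    ]
    rows = []
    for pat in patterns:
        row = np.zeros(len(seq), dtype=np.uint8)
        start = seq.find(pat)
        while start != -1:                    # str.find returns -1 when no match is left
            row[start:start + len(pat)] = 1   # mark every position the match covers
            start = seq.find(pat, start + 1)
        rows.append(row)
    return patterns, np.concatenate(rows)

patterns, vec = encode("XYZXYZ")
table = vec.reshape(len(patterns), -1)        # one row per pattern, for readability
for pat, row in zip(patterns, table):
    print(*row, "-", pat)
```

Each sequence is encoded independently here, so sequences of different lengths simply produce vectors of different lengths; stacking them into one 2D array would still need padding.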
