Suppose I have a tf tensor with indices for two samples:
x = [[2,3,5], [5,7,5]]
I would like to create a tensor with a certain shape (samples, 10), where the indices of each sample in x are set to 1 and the rest to 0 like this:
output = [[0, 0, 1, 1, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 1, 0, 0]]
What is the best way to do this, without creating a lot of intermediary matrices?
The closest I got was using tf.scatter_nd, but I couldn't figure out how to transform x and the updates correctly, except manually adding additional information like this:
>>> tf.cast(tf.scatter_nd([[0,2], [0,3], [0,5], [1,5], [1,7], [1,5]],
...                       [1, 1, 1, 1, 1, 1], [2, 10]) > 0, dtype="int64")
<tf.Tensor: id=1191, shape=(2, 10), dtype=int64, numpy=
array([[0, 0, 1, 1, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 1, 0, 0]])>
Also, this approach will aggregate duplicate indices at first, which makes an intermediary boolean matrix necessary. (This I could live with though, the main problem is getting from x to a matrix with shape (samples, 10) where non-existent indices are 0 for each sample.)
Thanks for any help! :)
I found a solution (tensorflow 2.2.0):
class BinarizeSequence(tf.keras.layers.Layer):
    """
    Transforms an integer sequence into a binary representation
    with shape (samples, vocab_size).

    Example:
        In: [[2,3,5], [5,7,5]]
        Out: [[0, 0, 1, 1, 0, 1, 0, 0, 0, 0],
              [0, 0, 0, 0, 0, 1, 0, 1, 0, 0]]

    By default the output is returned as a SparseTensor.
    Use dense_output=True if you need a dense representation.
    """

    def __init__(self, vocab_size, dense_output=False, **kwargs):
        super(BinarizeSequence, self).__init__(**kwargs)
        self.vocab_size = vocab_size
        self.dense_output = dense_output

    def get_config(self):
        config = super().get_config().copy()
        config.update(
            {"vocab_size": self.vocab_size, "dense_output": self.dense_output}
        )
        return config

    def call(self, x, mask=None):
        # create indices for the binarized representation
        x = tf.cast(x, dtype=tf.int32)
        x_1d = tf.reshape(x, [-1])
        sample_dim = tf.repeat(
            tf.range(tf.shape(x)[0], dtype=tf.int32), tf.shape(x)[1]
        )
        indices = tf.transpose(tf.stack([sample_dim, x_1d]))
        # only keep unique indices
        # (see https://stackoverflow.com/a/42245425/979377)
        indices64 = tf.bitcast(indices, type=tf.int64)
        unique64, idx = tf.unique(indices64)
        unique_indices = tf.bitcast(unique64, type=tf.int32)
        # build the binarized representation
        updates = tf.ones(tf.shape(unique_indices)[0])
        output_shape = [tf.shape(x)[0], self.vocab_size]
        if self.dense_output:
            output = tf.scatter_nd(unique_indices, updates, output_shape)
        else:
            output = tf.sparse.SparseTensor(
                tf.cast(unique_indices, tf.int64), updates, output_shape
            )
        return output
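A quick usage sketch (assuming TF 2.x; tf.sparse.reorder is used because scattered indices are not guaranteed to be in canonical order):

layer = BinarizeSequence(vocab_size=10)
out = layer(tf.constant([[2, 3, 5], [5, 7, 5]]))  # SparseTensor of shape (2, 10)
print(tf.sparse.to_dense(tf.sparse.reorder(out)))  # matches the example above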
In PyTorch, you can hard-code your filters to be whatever you like.
At the moment, I'm doing text detection and I need to identify the location of a certain piece of information, which always starts with the letter 'X'. Could hard-coding the 'X' filter radically improve detection performance?
Here's what I have so far:
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
kernel = (torch.zeros((9, 9)) +
          torch.eye(9) +
          torch.rot90(torch.eye(9))).type(torch.bool) * 1
print(kernel)
tensor([[1, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 1, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 1, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 1, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 1]])
We can visualize it like this:
plt.imshow(kernel)
plt.show()
Then, we can set the filter weights as such:
conv = nn.Conv2d(in_channels=1,
                 out_channels=1,
                 kernel_size=9,  # must match the 9x9 kernel above
                 stride=3,
                 bias=False)
# Conv2d weights have shape (out_channels, in_channels, H, W) and must be float
conv.weight.data = kernel.float().view(1, 1, 9, 9)
No, I do not think this will improve detection performance.
Detection performance at test time is usually called "inference": running your network on new data where the labels are unknown. Hard-coding the weights will make absolutely no difference to the test performance of the network, as you still need to compute the convolutions either way.
We could also ask if it will improve the training performance. Here, too, I expect that the answer is no. One of the reasons neural networks achieve such high accuracy is that they pick up on subtle patterns in the training data. A real 'X' on a real page is very unlikely to align with the pixels you set to 1 in your example. Slight rotations, sub-pixel shifts, or even different aspect ratios of the letter will change what the optimal filter looks like.
Indeed, one of the major changes in Machine Learning as we move into the Deep Learning era is that neural networks do a better job of picking the low-level features than a human engineer could do.
But thank you for the question -- just the code snippet of how to hard-code the value of a layer was useful to me!
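One footnote: if you do hard-code a layer's weights, you usually also want to freeze them so training does not overwrite them. A minimal sketch, reusing the conv from the question (the dummy image size is arbitrary):

# Freeze the hard-coded filter so the optimizer leaves it untouched
conv.weight.requires_grad = False
# Apply it to a dummy 1-channel image (batch, channels, height, width)
img = torch.randn(1, 1, 27, 27)
response = conv(img)  # strong responses where the 'X' pattern aligns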
I need to calculate the eigenvalues of an 8x8 matrix and plot each of them as a function of a symbolic variable occurring in the matrix. For the matrix I'm using, I get 8 different eigenvalues, each representing a function of "W", my symbolic variable.
Using Python, I tried calculating the eigenvalues with SciPy and SymPy, which sort of worked, but the results are stored in a way that seems strange to me (at least as a newbie who doesn't understand much of programming so far) and I didn't find a way to extract just one eigenvalue in order to plot it.
import numpy as np
import sympy as sp
W = sp.Symbol('W')
w0=1/780
wl=1/1064
# This is my 8x8-matrix
A= sp.Matrix([[w0+3*wl, 2*W, 0, 0, 0, np.sqrt(3)*W, 0, 0],
[2*W, 4*wl, 0, 0, 0, 0, 0, 0],
[0, 0, 2*wl+w0, np.sqrt(3)*W, 0, 0, 0, np.sqrt(2)*W],
[0, 0, np.sqrt(3)*W, 3*wl, 0, 0, 0, 0],
[0, 0, 0, 0, wl+w0, np.sqrt(2)*W, 0, 0],
[np.sqrt(3)*W, 0, 0, 0, np.sqrt(2)*W, 2*wl, 0, 0],
[0, 0, 0, 0, 0, 0, w0, W],
[0, 0, np.sqrt(2)*W, 0, 0, 0, W, wl]])
# Calculating eigenvalues
eva = A.eigenvals()
evaRR = np.array(list(eva.keys()))
eva1p = evaRR[0] # <- this is my try to refer to the first eigenvalue
In the end I hope to get a plot over "W" where the interesting range is [-0.002 0.002]. For the ones interested it's about atomic physics and W refers to the rabi frequency and I'm looking at so called dressed states.
You're not doing anything incorrectly -- I think you're just thrown off because your eigenvalues look so jumbled and complicated.
import numpy as np
import sympy as sp
import matplotlib.pyplot as plt
W = sp.Symbol('W')
w0=1/780
wl=1/1064
# This is my 8x8-matrix
A= sp.Matrix([[w0+3*wl, 2*W, 0, 0, 0, np.sqrt(3)*W, 0, 0],
[2*W, 4*wl, 0, 0, 0, 0, 0, 0],
[0, 0, 2*wl+w0, np.sqrt(3)*W, 0, 0, 0, np.sqrt(2)*W],
[0, 0, np.sqrt(3)*W, 3*wl, 0, 0, 0, 0],
[0, 0, 0, 0, wl+w0, np.sqrt(2)*W, 0, 0],
[np.sqrt(3)*W, 0, 0, 0, np.sqrt(2)*W, 2*wl, 0, 0],
[0, 0, 0, 0, 0, 0, w0, W],
[0, 0, np.sqrt(2)*W, 0, 0, 0, W, wl]])
# Calculating eigenvalues
eva = A.eigenvals()
evaRR = np.array(list(eva.keys()))
# The above is copied from your question
# We have to answer what exactly the eigenvalue is in this case
print(type(evaRR[0])) # >>> Piecewise
# Okay, so it's a piecewise function (link to documentation below).
# In the documentation we see that we can use the .subs method to evaluate
# the piecewise function by substituting a symbol for a value. For instance,
print(evaRR[0].subs(W, 0)) # Will substitute 0 for W
# This prints out something really nasty with tons of fractions..
# We can evaluate this mess with sympy's numerical evaluation method, N
print(sp.N(evaRR[0].subs(W, 0)))
# >>> 0.00222190090611143 - 6.49672880062804e-34*I
# That's looking more like it! Notice the e-34 exponent on the imaginary part...
# I think it's safe to assume we can just trim that off.
# This is done by setting the chop keyword to True when using N:
print(sp.N(evaRR[0].subs(W, 0), chop=True)) # >>> 0.00222190090611143
# Now let's try to plot each of the eigenvalues over your specified range
fig, ax = plt.subplots(3, 3)  # 3x3 grid of plots (for our 8 e.vals)
ax = ax.flatten()  # This is so we can index the axes easier
plot_range = np.linspace(-0.002, 0.002, 10)  # 10 points from -0.002 to 0.002
for n in range(8):
    current_eigenval = evaRR[n]
    # There may be a way to vectorize this computation, but I'm not
    # familiar enough with sympy.
    evaluated_array = np.zeros(np.size(plot_range))
    # This will be our Y-axis (the eigenvalue evaluated at each W-value).
    # It has the same shape as plot_range and starts filled with zeros.
    for i in range(np.size(plot_range)):
        evaluated_array[i] = sp.N(current_eigenval.subs(W, plot_range[i]),
                                  chop=True)
        # The above line evaluates the eigenvalue at a specific point,
        # approximates it numerically, and chops off the imaginary part.
    ax[n].plot(plot_range, evaluated_array, "c-")
    ax[n].set_title("Eigenvalue #{}".format(n))
    ax[n].grid()
plt.tight_layout()
plt.show()
And as promised, the Piecewise documentation.
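As an aside on the vectorization comment in the code: sympy's lambdify can compile each symbolic eigenvalue into a numpy-callable function, which avoids the per-point subs/N loop. A sketch, assuming the evaRR and plot_range from above (if the Piecewise translation complains, the subs loop remains the safe fallback):

f = sp.lambdify(W, evaRR[0], modules="numpy")
# Evaluate on a complex-typed grid so fractional powers of negative
# intermediates don't produce NaNs, then keep the real part (as chop=True did)
values = np.real(f(plot_range.astype(complex)))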
I've been creating a solar system simulation as a project for fun and practice in Python. The problem I'm facing is that storing the data for the planets in the .py file itself is getting rather hectic. Example:
#shaped as: name, parent, type, size, orbital radius (AU), x, y, r, t, hidden, theta, orbitalperiod (y), color
#for type 0=sun, 1=planet, 2=moon, 3=asteroid (unused)
#x, y, r and t start as 0 and get assigned values later on. hidden is 0 or 1, if obscured by body
solsystem = [('sun', 'none', 0, 20, 0, centerx, centery, 0, 0, 0, 0, 0, (255, 255, 0)),
('earth', 'sun', 1, 2, 1, 0, 0, 0, 0, 0, 0, 1, (0, 0, 255)),
('luna', 'earth', 1, 1, 0.04, 0, 0, 0, 0, 0, 0, 0.075, (169,169,169)), #actual radius is 0.00254
('venus', 'sun', 1, 2, 0.675, 0, 0, 0, 0, 0, 0, 0.616, (255,255,0)),
('mercury', 'sun', 1, 2, 0.387, 0, 0, 0, 0, 0, 0, 0.24, (169,169,169)),
('mars', 'sun', 1, 2, 1.524, 0, 0, 0, 0, 0, 0, 1.88, (255, 0, 0)),
('jupiter', 'sun', 1, 4, 5.20, 0, 0, 0, 0, 0, 0, 11.86, (255, 0, 0)),
('io', 'jupiter', 1, 1, 0.08, 0, 0, 0, 0, 0, 0, 0.00484, (169,169,169)), #different radiuses for moons to keep visibility
('europa', 'jupiter', 1, 1, 0.12, 0, 0, 0, 0, 0, 0, 0.0097, (169,169,169)),
('ganymede', 'jupiter', 1, 1, 0.16, 0, 0, 0, 0, 0, 0, 0.0195, (169,169,169)),
('callisto', 'jupiter', 1, 1, 0.2, 0, 0, 0, 0, 0, 0, 0.0456, (169,169,169))]
This is what I currently have, and I plan on adding asteroids, more planets and moons, and all that stuff... What would be a better way to do this? I'd like to store the data in a more organized way, something like a spreadsheet perhaps, so I could easily add more values if needed.
For reference, the full code: https://pastebin.com/L8n23bLt (It's working pretty decently, but there are quite a few kinks and bugs I still want to work out. Any tips on stuff I'm doing wrong here are appreciated too!)
It's OK to store values like that if it works for you and your project.
I prefer to use the configparser library or JSON files.
ConfigParser vs JSON files for config
Here is a short example from my project:
import configparser

def get_current_scenario_number():
    """
    Get scenario number from temporary config.
    :return:
    """
    config = configparser.ConfigParser()
    config.read('scenario_data.ini')
    scenario_number = config['scenario_data']['scenario_config']
    return int(scenario_number)

def set_current_scenario_number(scenario_number):
    """
    Change scenario number in temporary config.
    :param scenario_number:
    :return:
    """
    config = configparser.ConfigParser()
    config.read('scenario_data.ini')
    try:
        config['scenario_data']['scenario_config'] = str(scenario_number)
    except KeyError:
        config.add_section('scenario_data')
        config.set('scenario_data', 'scenario_config', str(scenario_number))
    with open('scenario_data.ini', 'w') as configfile:
        config.write(configfile)
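For the solar-system data in the question, the JSON route might look like this sketch (the file name and field names are illustrative, not from the original code):

import json

bodies = [
    {"name": "earth", "parent": "sun", "size": 2,
     "orbit_radius": 1.0, "orbital_period": 1.0, "color": [0, 0, 255]},
]
# Write the data out once, then edit the .json file by hand as the system grows
with open('solsystem.json', 'w') as f:
    json.dump(bodies, f, indent=2)
# Read it back at program start
with open('solsystem.json') as f:
    bodies = json.load(f)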
Consider a simple csv or yaml file as the next step. Both will allow you to explicitly name the fields and read the elements as dictionaries. If maintaining a file by hand gets too cumbersome, consider sqlite.
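For instance, a sketch of the csv route with named fields (the header row and file name are illustrative):

import csv

# bodies.csv starts with a header row:
# name,parent,type,size,orbit_radius,orbital_period,color
with open('bodies.csv', newline='') as f:
    for row in csv.DictReader(f):
        print(row['name'], float(row['orbit_radius']))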
There's a lot of options in this space that work well. At the most simple end of things would be a csv file (a text-based spreadsheet) storing all of your objects, which you could then read in and parse using the csv module of the standard library.
Your CSV file could look like this:
"sun","none",0,20,0,111,111,0,0,0,0,0,"(255,255,0)"
"earth","sun",1,2,1,0,0,0,0,0,0,1,"(0,0,255)"
"luna","earth",1,1,0.04,0,0,0,0,0,0,0.075,"(169,169,169)"
...
And the reading code like this:
import csv, ast

with open('test.csv') as csvfile:
    reader = csv.reader(csvfile, quoting=csv.QUOTE_NONNUMERIC)
    solsystem = []
    for row in reader:
        solsystem.append(row)
        # convert the quoted color string, e.g. "(255,255,0)", into a tuple
        solsystem[-1][12] = ast.literal_eval(solsystem[-1][12])
A similar alternative is to format your files using JSON, YAML, or XML, and read and parse them with the json or xml modules of the standard library (YAML needs a third-party parser such as PyYAML).
At the more complex end of the space are full relational (and non-relational) databases. Your use case, plus the fact that there is a sqlite3 module in the standard library makes sqlite a good potential option in this space.
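If you do outgrow flat files, a minimal sqlite3 sketch (table and column names are illustrative):

import sqlite3

conn = sqlite3.connect('solsystem.db')
conn.execute("""CREATE TABLE IF NOT EXISTS bodies
                (name TEXT PRIMARY KEY, parent TEXT, size REAL,
                 orbit_radius REAL, orbital_period REAL)""")
conn.execute("INSERT OR REPLACE INTO bodies VALUES (?, ?, ?, ?, ?)",
             ('earth', 'sun', 2, 1.0, 1.0))
conn.commit()
for row in conn.execute("SELECT name, orbit_radius FROM bodies"):
    print(row)
conn.close()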
I have a simple CSV-file like this:
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Note ,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32
1,,,,,X,,,,,,,,X,,,,,,,,X,,,,,,,,X,,,
2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
I need to parse it into a 2D array containing 1s where the Xs are and 0s otherwise, ignoring the headers/extra rows.
After reading the docs on the csv module I wrote a simple script like so:
import csv
csvfile = open('input.csv', 'rb')
reader = csv.reader(csvfile, dialect='excel', delimiter=' ', quotechar='|')
data = []
rowCount = 0
for row in reader:
    if rowCount > 2:  # skip first 3 rows (2 empty and 1 label)
        dataRow = []
        for i in xrange(1, len(row[0])):  # skip 1st label column
            dataRow.append(1 if row[0][i] == 'X' else 0)  # 1 for X, 0 otherwise
        data.append(dataRow)
    rowCount += 1
print data
This gives me the expected output:
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
for
,,,,,X,,,,,,,,X,,,,,,,,X,,,,,,,,X,,,
The ternary condition could be written as ord(row[0][i])/88, but would it be possible to map each string row to an integer row of ones and zeros?
Is there a more 'pythonic' way of writing this?
You should use delimiter=',':
reader = csv.reader(csvfile, dialect='excel', delimiter=',', quotechar='|')
Actually, dialect='excel' and delimiter=',' are the defaults, and quotechar='|' is not needed for your example file (keep it if you need it). So this is shorter:
reader = csv.reader(csvfile)
Throw away the first three lines:
[next(reader) for _ in range(3)]
Read all lines:
data = [[1 if entry=='X' else 0 for entry in row[1:]] for row in reader]
This is equivalent to:
data = []
for row in reader:
    data.append([1 if entry == 'X' else 0 for entry in row[1:]])
Of course, open your file with a context manager, so it is closed automatically after the block is dedented:
with open('input.csv', 'rb') as csvfile:
    # Put the rest of the algorithm here.
    # The file is closed automatically as soon as execution leaves the with-block.
This is the prime example of a context manager.
Putting it all together:
import csv

with open('input.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile)
    [next(reader) for _ in range(3)]
    data = [[1 if entry == 'X' else 0 for entry in row[1:]] for row in reader]
Firstly, skipping three rows can be done like:
for _ in range(3):
    next(reader)
then you can use a list comprehension on the rest:
data = [[int(cell == 'X') for cell in line[1:]] for line in reader]
This gives you a list of lists:
>>> data
[[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
If efficiency is important and the lines are long, using itertools.islice lets you slice each line without creating a new list.
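A sketch of that variant (same reader as above):

from itertools import islice

data = [[int(cell == 'X') for cell in islice(line, 1, None)] for line in reader]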
Note that your delimiter and quotechar settings don't seem to match with the example file, so you might want to double-check that.
One thing I want to add to @jonrsharpe's answer.
The Pythonic way of using a file is given below. This takes care of closing the file for you when your calculation is complete:
with open('input.csv', 'rb') as csvfile:
    # use the csvfile here
A: Try to avoid overheads ... a smart-enough reader comes with numpy.genfromtxt():
DATA = np.genfromtxt( aFH,  # aFH = open( <aFileNAME>, "r" )
                      # skiprows = 1,  # DeprecationWarning: `skiprows` will be removed in numpy 2.0; use skip_header
                      skip_header = 3,  # twice ",,,..." + "Note,1,2,3,..."
                      delimiter  = ",",
                      converters = { 1: lambda aString: mPlotDATEs.date2num( datetime.datetime.strptime( aString[1:-1], "%m/%d/%y %H:%M" ) ),
                                     0: lambda aString: float( aString[1:-1] )
                                     }  # left as an example of the powers genfromtxt's inline converters give you
                      )
print "DATA has shape of ", DATA.shape
aFH.close()
The converters (a dict, specified per column) are the key -- i.e. they may post-process 'X'-s into 1-s and blanks into 0-s, as in the sketch below.
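A sketch of that idea for the file in the question (the 32 data columns come from the header row; encoding=None assumes a numpy recent enough to pass str values to the converters):

import numpy as np

# 'X' -> 1, blank -> 0, for data columns 1..32 (column 0 is the row label)
convert = {i: (lambda s: 1 if s.strip() == 'X' else 0) for i in range(1, 33)}
DATA = np.genfromtxt('input.csv', delimiter=',', skip_header=3,
                     usecols=list(range(1, 33)), converters=convert,
                     encoding=None)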
+ Other benefits
Any serious number crunching is, visibly or not, based on numpy. So the imported modules' footprint stays far smaller than if you imported pandas (very smart and very powerful) just to read rows of text without any further use of its advanced DataFrame capabilities.
Additionally, you may benefit from other, smarter numpy features if your DATA.size grows bigger and bigger. As your sample data suggests, this is a typical situation for SPARSE matrices, where you do not pay the immense overhead costs of handling almost-empty cells (the 0-s).
This may give you an even more memory-compact solution than a boolean-mask array, which still spends more than a few bits on the 0-cells of the fully-meshed (DENSE) matrix.
Q: Is there a more-pythonic approach?
Hopefully this will not set off a flame war. As a minimalist view: the for-loops were avoided (a first plus, both formally and performance-wise); one-liner populists may also be happy, as the whole import process takes one line (well, a bit of a talkative one, but still one SLOC, if anyone seriously cares :o)). Finally, the lambda inliners are smart and powerful, Pythonic non-plus-ultra: get in love with them, as they give you immense power, even though in principle these "Chuck Norrises" of code originated in the LISP generation of efficient-software-design research in the late 1950s.