List and NumPy array in Python

I was trying to one-hot encode data.
data is a list of length vocabulary_size = 17005207.
To one-hot encode it, I made a list of inputs with num_labels = 100.
The following code:
inputs = []
for i in range(vocabulary_size):
    inputs.append(np.arange(num_labels) == data[i]).astype(np.float32)
throws this error:
AttributeError: 'NoneType' object has no attribute 'astype'
I tried passing dtype=np.float32 inside the append call, but that also fails.
When I try this:
inputs = []
for i in range(vocabulary_size):
    inputs.append(np.arange(num_labels) == data[i])
inputs = np.array(inputs, dtype=np.float32)
I get the correct answer: a one-hot-encoded input sequence of shape vocabulary_size x num_labels.
Is there an alternative one-line solution without using NumPy?
Solved: Can it be done directly using a NumPy array (inputs) with the list (data)?
Info about data: data = np.ndarray(len(words), dtype=np.int32)
Reformat function:
def reformat(data):
    num_labels = vocabulary_size
    print(type(data))
    data = (np.arange(num_labels) == data[:, None]).astype(np.int32)
    print(data, len(data))
    return data
New question: data has shape (vocabulary_size,). How can I convert it, using ravel or reshape, to shape (1, vocabulary_size)?

Not sure whether I've understood correctly what you're asking for, but if what you want is a one-liner, you could transform your already-working code into this:
inputs = np.array([np.arange(num_labels) == data[i] for i in range(vocabulary_size)], dtype=np.float32)
(The original AttributeError occurs because list.append returns None, so .astype(np.float32) was being called on None.)
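As for the follow-up about reshaping to (1, vocabulary_size): a minimal sketch, using small stand-in values for the sizes since the real data isn't shown. Both reshape and np.newaxis return views without copying, and the broadcasting trick from the reformat function replaces the Python loop entirely:

import numpy as np

vocabulary_size = 10   # small stand-in for 17005207
num_labels = 100
data = np.arange(vocabulary_size, dtype=np.int32)  # hypothetical label data

# (vocabulary_size,) -> (1, vocabulary_size); the two forms are equivalent views
row = data.reshape(1, -1)
row = data[np.newaxis, :]

# Vectorized one-hot encoding via broadcasting, no Python loop needed
inputs = (np.arange(num_labels) == data[:, None]).astype(np.float32)
print(row.shape, inputs.shape)  # (1, 10) (10, 100)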


Error when trying to simulate one FMU with 4 inputs from csv files

I have an FMU created in GT-SUITE. I am trying to work with it in Python using the PyFMI package.
My code:
from pyfmi import load_fmu
import numpy as np
model = load_fmu('AHUPIv2b.fmu')
t = np.linspace(0.,100.,100)
u = np.linspace(3.5,4.5,100)
v = np.linspace(900,1000,100)
u_traj = np.transpose(np.vstack((t,u)))
v_traj = np.transpose(np.vstack((t,v)))
input_object = (('InputVarI','InputVarP'),(u_traj,v_traj))
res = model.simulate(final_time=500, input=input_object, options={'ncp':500})
res = model.simulate(final_time=10)
model.simulate takes input as one of its parameters. The documentation says:
input --
Input signal for the simulation. The input should be a 2-tuple
consisting of first the names of the input variable(s) and then
the data matrix.
'InputVarI','InputVarP' are the input variables and u_traj,v_traj are data matrices.
My code gives an error:
TypeError: tuple indices must be integers or slices, not tuple
Is the input_object created wrong? Can someone help with how to create the input tuples correctly as per the documentation?
The input object is created incorrectly. The second element of the input tuple should be a single data matrix, not two data matrices.
The correct input should be:
data = np.transpose(np.vstack((t,u,v)))
input_object = (['InputVarI','InputVarP'],data)
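For completeness, a minimal sketch (untested, reusing the FMU and variable names from the question) of how the corrected input object plugs into the simulate call:

import numpy as np
from pyfmi import load_fmu

model = load_fmu('AHUPIv2b.fmu')
t = np.linspace(0., 100., 100)
u = np.linspace(3.5, 4.5, 100)
v = np.linspace(900, 1000, 100)

# Single data matrix: time in column 0, then one column per input variable
data = np.transpose(np.vstack((t, u, v)))
input_object = (['InputVarI', 'InputVarP'], data)

res = model.simulate(final_time=500, input=input_object, options={'ncp': 500})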
See also pyFMI parameter change don't change the simulation output

Tensorflow transform each element of a string tensor

I have a tensor of strings. Some example strings are as follows.
com.abc.display,com.abc.backend,com.xyz.forte,blah
com.pqr,npr.goog
I want to do some preprocessing which splits the CSV into its parts, then splits each part at the dots, and then creates multiple strings where one string is a prefix of another. Also, all blahs have to be dropped.
For example, given the first string com.abc.display,com.abc.backend,com.xyz.forte, it is transformed into an array/list of the following strings.
['com', 'com.abc', 'com.abc.display', 'com.abc.backend', 'com.xyz', 'com.xyz.forte']
The resulting list has no duplicates (that is why the prefixed strings for com.abc.backend didn't show up as those were already included - com and com.abc).
I wrote the following Python function that does the above for a single CSV string.
def expand_meta(meta):
    expanded_subparts = []
    meta_parts = set([x for x in meta.split(',') if x != 'blah'])
    for part in meta_parts:
        subparts = part.split('.')
        for i in range(len(subparts) + 1):
            expanded = '.'.join(subparts[:i])
            if expanded:
                expanded_subparts.append(expanded)
    return list(set(expanded_subparts))
Calling this method on the first example
expand_meta('com.abc.display,com.abc.backend,com.xyz.forte,blah')
returns
['com.abc.display',
'com.abc',
'com.xyz',
'com.xyz.forte',
'com.abc.backend',
'com']
I know that tensorflow has this map_fn method. I was hoping to use that to transform each element of the tensor. However, I am getting the following error.
File "mypreprocess.py", line 152, in expand_meta
meta_parts = set([x for x in meta.split(',') if x != 'blah'])
AttributeError: 'Tensor' object has no attribute 'split'
So, it seems like I can't use a regular python function with map_fn since it expects the elements to be tensors. How can I do what I intend to do here?
(My Tensorflow version is 1.11.0)
I think this does what you want:
import tensorflow as tf

# Function to process a single string
def make_splits(s):
    s = tf.convert_to_tensor(s)
    # Split by comma
    split1 = tf.strings.split([s], ',').values
    # Remove blahs
    split1 = tf.boolean_mask(split1, tf.not_equal(split1, 'blah'))
    # Split by period
    split2 = tf.string_split(split1, '.')
    # Get dense split tensor
    split2_dense = tf.sparse.to_dense(split2, default_value='')
    # Accumulated concatenations
    concats = tf.scan(lambda a, b: tf.string_join([a, b], '.'),
                      tf.transpose(split2_dense))
    # Get relevant concatenations
    out = tf.gather_nd(tf.transpose(concats), split2.indices)
    # Remove duplicates
    return tf.unique(out)[0]

# Test
with tf.Graph().as_default(), tf.Session() as sess:
    # Individual examples
    print(make_splits('com.abc.display,com.abc.backend,com.xyz.forte,blah').eval())
    # [b'com' b'com.abc' b'com.abc.display' b'com.abc.backend' b'com.xyz'
    #  b'com.xyz.forte']
    print(make_splits('com.pqr,npr.goog').eval())
    # [b'com' b'com.pqr' b'npr' b'npr.goog']
    # Apply to multiple strings with a loop
    data = tf.constant([
        'com.abc.display,com.abc.backend,com.xyz.forte,blah',
        'com.pqr,npr.goog'])
    ta = tf.TensorArray(size=data.shape[0], dtype=tf.string,
                        infer_shape=False, element_shape=[None])
    _, ta = tf.while_loop(
        lambda i, ta: i < tf.shape(data)[0],
        lambda i, ta: (i + 1, ta.write(i, make_splits(data[i]))),
        [0, ta])
    out = ta.concat()
    print(out.eval())
    # [b'com' b'com.abc' b'com.abc.display' b'com.abc.backend' b'com.xyz'
    #  b'com.xyz.forte' b'com' b'com.pqr' b'npr' b'npr.goog']
I'm not sure if you want the total results concatenated like that, or maybe you want to apply tf.unique to the global result, but in any case the idea is the same.
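If a globally deduplicated result is wanted instead, one option (a sketch under the same TF 1.x graph-mode assumptions as above) is to apply tf.unique once more to the concatenated TensorArray:

# Deduplicate across all input strings, not just within each one
out = tf.unique(ta.concat())[0]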

Python Crashing with error with Nested For Loop: ValueError: Buffer has wrong number of dimensions (expected 1, got 0)

I have the following nested for loop. Its purpose is to create an adjacency matrix (295x295) from my data set, which is (658, 295) in size.
When I run this code, I get the error in the title, and I am not sure how to solve it. Any help is appreciated.
gensum = np.sum(data)
p = len(gensum)
W = pd.DataFrame(np.zeros([p, p]))
for i in range(np.shape(data)[1]):
    for k in range(np.shape(data)[1]):
        temp = data.iloc[:, [k, i+1]]
        temp['new'] = temp.iloc[:, 0] + temp.iloc[:, 1]
        temp['new'] = temp['new'].map({0: 0, 1: 0, 2: 1})
        assign = np.sum(temp['new'])
        W.iat[k, i+1] = assign
        W.iat[i+1, k] = assign
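No answer was recorded for this question, but two observations may help. First, i+1 runs one past the last column when i reaches the final index, so the iloc lookup will eventually go out of bounds. Second, assuming data holds only 0/1 values, the quantity the inner loop computes (the number of rows where both columns are 1) is exactly the dot product of the two columns, so the whole adjacency matrix can be built in one step. A minimal sketch with hypothetical stand-in data:

import numpy as np
import pandas as pd

# Hypothetical binary data set of shape (658, 295)
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.integers(0, 2, size=(658, 295)))

# For 0/1 columns a and b, sum((a + b) == 2) equals the dot product of a and b,
# so the full 295x295 co-occurrence matrix is a single matrix product
W = data.T.dot(data)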

How to create an empty numpy array with semi-specified dims?

I am trying to read data from a server like this:
with requests.Session() as s:
    data = {}
    r = s.get('https://something.com', json=data).json()

training_set1 = np.empty([-1, 4])
training_set1[:, 0] = r["o"]
training_set1[:, 1] = r["h"]
training_set1[:, 2] = r["l"]
training_set1[:, 3] = r["c"]
But I don't know the length of the arrays, so I used -1 and got this error message:
ValueError: negative dimensions are not allowed
How can I fix this code? The response r is a JSON object:
{"t":[1322352000,1322438400],
"o":[123,123],
"h":[123,123],
"l":[123,123],
"c":[123,123]}
that I am trying to rearrange into a NumPy array.
NumPy arrays have fixed sizes; you cannot initialize a dynamically sized array. What you can do is use a list of lists and convert it to a NumPy array later.
Something like this should work, assuming each r["x"] is a list (untested code):
with requests.Session() as s:
    data = {}
    r = s.get('https://something.com', json=data).json()

t_set1 = []
t_set1.append(r["o"])
t_set1.append(r["h"])
t_set1.append(r["l"])
t_set1.append(r["c"])
training_set1 = np.array(t_set1)
Edit: edited for the order "o", "h", "l", "c" after the OP edited the question.
You cannot declare a NumPy array with an unknown dimension, but you can build it in a single operation:
training_set1 = np.array([r["o"], r["h"], r["l"], r["c"]])
or even better:
training_set1 = np.array([r[k] for k in "ohlc"])
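One caveat (an addition, not part of the original answers): stacking the four lists as rows yields shape (4, N), whereas the original indexing training_set1[:, 0] = r["o"] implies one column per field, i.e. (N, 4). A transpose bridges the two, sketched here with the sample JSON from the question:

import numpy as np

# Sample response from the question
r = {"o": [123, 123], "h": [123, 123], "l": [123, 123], "c": [123, 123]}

# .T turns the (4, N) row stack into the (N, 4) column layout
training_set1 = np.array([r[k] for k in "ohlc"], dtype=float).T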

Convert a tensorflow tf.data.Dataset FlatMapDataset to TensorSliceDataset

I want to pass a list of tf.Strings to the .map(_parse_function) function.
def _parse_function(self, img_path):
    img_str = tf.read_file(img_path)
    img_decode = tf.image.decode_jpeg(img_str, channels=3)
    img_decode = tf.divide(tf.cast(img_decode, tf.float32), 255)
    return img_decode
When the tf.data.Dataset is of type TensorSliceDataset,
dataset_from_slices = tf.data.Dataset.from_tensor_slices((tensor_with_filenames))
I can simply do
dataset_from_slices.map(_parse_function), which works.
However, dataset_from_generator = tf.data.Dataset.from_generator(...) returns a Dataset which is an instance of FlatMapDataset type and dataset_from_generator.map(_parse_function) gives the following error:
InvalidArgumentError: Input filename tensor must be scalar, but had shape: [32]
If I change the first line to:
img_str = tf.read_file(img_path[0])
that also works but then I only get the first image, which is not what I am looking for. Any suggestions?
It sounds like the elements of your dataset_from_generator are batched. The simplest remedy is to use tf.contrib.data.unbatch() to convert them back into individual elements:
# Each element is a vector of strings.
dataset_from_generator = tf.data.Dataset.from_generator(...)
# Converts each vector of strings into multiple individual elements.
dataset = dataset_from_generator.apply(tf.contrib.data.unbatch())
dataset = dataset.map(_parse_function)
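As a side note (not part of the original answer): in TF 2.x the contrib namespace is gone and unbatching is a built-in Dataset method, so the equivalent would be roughly:

# TF 2.x equivalent: unbatch() is a method on tf.data.Dataset
dataset = dataset_from_generator.unbatch().map(_parse_function)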
