I am new to Python as well as to cryptography. I want to generate a private key from a variable. After looking at many options, I have only found ways to generate a key pair from a random function.
Here is my variable
a = [[[3, 1, 85, 33, 0, 0, 255, 254, 255, 254, 255, 254, 255, 254, 255, 6, 248, 0, 240, 0, 224, 0, 192, 0, 192, 0, 128, 0, 128, 0, 128, 0, 128, 0, 128, 0, 128, 0, 192, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 108, 181, 11, 254, 92, 186, 205, 94, 27, 192, 36, 126, 26, 164, 13, 159, 114, 59, 162, 95, 50, 189, 78, 95, 99, 24, 140, 124, 72, 165, 204, 156, 114, 40, 99, 28, 57, 25, 141, 122, 107, 39, 226, 122, 24, 169, 35, 186, 93, 39, 162, 242, 24, 185, 229, 120, 38, 169, 205, 57, 31, 184, 141, 153, 78, 152, 12, 86, 92, 152, 13, 22, 77, 155, 226, 182, 66, 163, 163, 16, 89, 157, 227, 180, 67, 160, 204, 142, 101, 165, 226, 244, 45, 167, 99, 20, 117, 157, 76, 44, 38, 181, 225, 44, 107, 32, 75, 179, 110, 32, 163, 16, 38, 165, 101, 48, 32, 43, 14, 76, 38, 50, 204, 205, 50, 163, 227, 172, 38, 46, 98, 108, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [3, 1, 101, 34, 0, 0, 255, 254, 255, 14, 248, 2, 240, 2, 224, 0, 192, 0, 192, 0, 128, 0, 128, 0, 128, 0, 128, 0, 192, 0, 192, 0, 224, 0, 224, 0, 240, 0, 248, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
36, 17, 229, 30, 78, 19, 140, 30, 99, 21, 226, 190, 113, 22, 97, 190, 30, 152, 36, 94, 115, 162, 75, 222, 21, 35, 228, 126, 119, 168, 97, 222, 99, 40, 140, 222, 32, 52, 14, 222, 108, 53, 204, 94, 67, 55, 78, 222, 76, 193, 100, 62, 57, 172, 142, 31, 92, 181, 77, 191, 47, 55, 101, 31, 119, 184, 75, 63, 75, 11, 22, 92, 108, 15, 227, 156, 44, 157, 34, 28, 25, 158, 13, 220, 32, 33, 228, 92, 35, 165, 141, 220, 44, 152, 140, 189, 100, 142, 139, 218, 35, 156, 205, 154, 32, 169, 228, 90, 95, 11, 140, 18, 94, 141, 226, 146, 40, 170, 14, 82, 38, 172, 100, 210, 51, 149, 33, 208, 70, 13, 162, 215, 44, 147, 228, 180, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]]
So my questions are:
Is it possible to generate a private key from a sequence of data?
If so, how should I proceed?
And last but not least, which algorithm should I use (RSA or ECC)?
No, it's not possible to generate a private key from just a sequence/data.
Keys are generated using a random number generator (RNG) or a pseudo-random number generator (PRNG) seeded from system entropy, which makes them much harder for an attacker to guess.
Take RSA as an example. As you mentioned, you can create a private key from a random function, or you can take more control over key generation by setting the key components manually. The 'Crypto.PublicKey.RSA.construct()' function of the 'PyCryptodome' package builds a key from components that you choose yourself. For this, you need a good understanding of RSA.
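To make "setting the key components manually" concrete, here is a minimal sketch of the arithmetic behind an RSA key, using only the standard library. The primes 61 and 53 and the exponent 17 are toy values chosen for illustration and are far too small for real use, and the helper name rsa_components is hypothetical.

```python
from math import gcd

def rsa_components(p, q, e):
    """Compute (n, e, d) for an RSA key from two primes and a public exponent."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime with (p - 1) * (q - 1)")
    d = pow(e, -1, phi)  # modular inverse of e (Python 3.8+)
    return n, e, d

# Toy primes for illustration only; real keys use primes of 1024 bits or more.
n, e, d = rsa_components(61, 53, 17)

message = 65
cipher = pow(message, e, n)      # encrypt with the public exponent
recovered = pow(cipher, d, n)    # decrypt with the private exponent
print(n, e, d, recovered)
```

With PyCryptodome, components like these would be passed as a tuple such as (n, e, d, p, q) to RSA.construct(), which is why the answer says you need to understand RSA before going down this route.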
It is possible to derive a key from user input in symmetric cryptography. 'Scrypt' and 'Argon2' are two key derivation functions that help you achieve this, and both are available as Python packages. The input, which can be a string or numbers, has to be converted to bytes first. Here is a simple example using the 'scrypt' package. You can find more details here.
import scrypt, secrets
password = b'not a number'
salt = secrets.token_bytes(32)
scrypt_key = scrypt.hash(password, salt, N=16384, r=8, p=1, buflen=32)
print('Salt: ', salt)
print('Key: ', scrypt_key)
Output:
Salt: b'\xdbS\x1e\xa2\x81e\xd3\x948p\xc3lmk\xd6\x8b\xb94\x1c\xd5A/\xa5gZ\xb1\xc2\x15\x99\x9d\xc8\xb8'
Key: b'J\xc1\xc1"\xfd\x05\xfb\x14J\x96\xea\xe3\x1d\xa6\xbb\x01\xf7sj\x87\xf9\x18%\x00YK\x1f\xe8\xc8\x8d\xff%'
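The same idea can be applied to a nested list of integers like the one in the question. The sketch below uses hashlib.scrypt from the standard library instead of the 'scrypt' package, and assumes every value is in the range 0-255. The helper names flatten and derive_key are hypothetical, and the short sample list stands in for the full variable a.

```python
import hashlib

def flatten(nested):
    """Yield the integers of an arbitrarily nested list in their original order."""
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item

def derive_key(data, salt, length=32):
    """Derive a symmetric key of `length` bytes from nested byte values."""
    password = bytes(flatten(data))  # raises ValueError if a value is outside 0-255
    return hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=length)

# Shortened stand-in for the variable `a` from the question.
sample = [[[3, 1, 85, 33], [3, 1, 101, 34]]]
# Fixed salt only to keep the demo repeatable; normally use secrets.token_bytes()
# and store the salt next to the ciphertext.
salt = b'\x00' * 16
key = derive_key(sample, salt)
print(len(key), key.hex())
```

The same data and salt always give the same key, so the key is only as unpredictable as the input data itself.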
The choice of algorithm depends on your application. That said, ECC is nowadays often preferred over RSA because it offers comparable security with much smaller keys.
I'm pretty amateur with Python and computer data, so bear with me -
I have a file called "input.nodes" and I want to have python read each byte, convert them to base10, and then organize each byte chronologically into a list. I have almost succeeded with this.
I'm using Python 3.10, which if I recall correctly, is the latest version.
To put it even more simply -- How do I make python read each byte as base10, and organize all of them chronologically into a list with NO extra characters whatsoever?
Here's how I attempted to do this:
with open('input.nodes') as f:
    dataImport = f.read()
dataImportSplit = []  # << The reason we are adding an empty list is because we will
                      # sort all of the bytes of the stickfigure into this list.
for chr in dataImport:  # << As stated, this will take all of the bytes
    dataImportSplit.append(ord(chr))  # from input.nodes and sort them into a list.
print('BYTE LIST (in Base10):\n' + str(dataImportSplit))  # << Prints out "dataImportSplit", the
                                                          # list we just sorted bytes into.
                                                          # Just for debugging purposes.
To elaborate on this code:
Import input.nodes as a string, and let "dataImport" be the variable for it.
Create an empty list to later store all of the bytes into, called "dataImportSplit" .
Find the Unicode code point of each character in dataImport (via the ord function), and append each of them individually into a list.
Print the list into the console as a string for debugging.
--
That code almost worked; most objects in the list were the base10 representation of the byte and I double-checked with a decimal-to-binary and decimal-to-hex calculator. However, there seemed to be a few outliers that have zero correlation to anything, to my knowledge.
Here is the output in the python terminal:
BYTE LIST (in Base10):
[0, 0, 1, 78, 63, 8364, 0, 0, 255, 112, 8250, 68, 255, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 63, 8364, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 32, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0, 0, 63, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 31, 31, 31, 255, 127, 127, 127, 255, 127, 127, 127, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 63, 8364, 0, 0, 0, 0, 0, 100, 0, 0, 0, 100, 0, 0, 0, 32, 0, 0, 0, 32, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 45, 0, 0, 0, 45, 0, 0, 0, 45, 255, 112, 8250, 68, 255, 112, 8250, 68, 255, 112, 8250, 68, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Obviously you can't understand what this means unless I give the actual contents of input.nodes, so for reference, here is input.nodes in a hex editor:
[image: input.nodes in a hex editor]
If this is still not enough information, you can download input.nodes for yourself here.
As you can see, the list entries containing '8364' or '8250' don't appear to correlate with anything, no matter what they are converted to.
What am I missing here? What do the numbers "8364" and "8250" have to do with anything?
The bytes type in Python represents a sequence of byte-valued integers (0-255), but displays by default as an immutable byte string. If you open your file in binary mode ('rb'), the data you read is a byte string whose elements can be accessed as integers through indexing or iteration, or you can convert it explicitly to a list.
Opening in text mode (the default) converts the bytes to Unicode code points using an encoding. If the encoding parameter is not given, an implicit default is used that varies by OS. That is where values above 255 such as 8364 and 8250 come from: they are code points, not bytes.
If you want the individual bytes, read in binary to prevent any conversion:
with open(r'downloads\input.nodes', 'rb') as f:
    data = f.read()
print(data[:20])  # displays first 20 bytes as a byte string
print(data[:20].hex(' '))  # hexadecimal dump separated by spaces
for b in data[20:40]:  # prints next 20 bytes as integers
    print(b)
print(list(data))  # convert to list
Output:
b'\x00\x00\x01N?\x80\x00\x00\xffp\x9bD\xff\x00\x00\x00\x00\x00\x00\x00'
00 00 01 4e 3f 80 00 00 ff 70 9b 44 ff 00 00 00 00 00 00 00
0
0
0
0
0
0
0
0
0
0
0
0
0
0
63
128
0
0
0
0
[0, 0, 1, 78, 63, 128, 0, 0, 255, 112, 155, 68, 255, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 63, 128, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 32, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0, 0, 63, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 31, 31, 31, 255, 127, 127, 127, 255, 127, 127, 127, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 63, 128, 0, 0, 0, 0, 0, 100, 0, 0, 0, 100, 0, 0, 0, 32, 0, 0, 0, 32, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 45, 0, 0, 0, 45, 0, 0, 0, 45, 255, 112, 155, 68, 255, 112, 155, 68, 255, 112, 155, 68, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
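The specific values 8364 and 8250 are consistent with the file having been decoded as Windows-1252 (cp1252). That encoding is an assumption here, though it is a common text-mode default on Windows. A minimal sketch using five bytes taken from the dump above:

```python
# Five bytes from the file: 3f 80 70 9b 44
raw = bytes([0x3F, 0x80, 0x70, 0x9B, 0x44])

# What text mode does: decode the bytes, then ord() returns Unicode code points.
as_text = [ord(ch) for ch in raw.decode('cp1252')]

# What binary mode does: iterating over bytes yields the raw values 0-255.
as_bytes = list(raw)

print(as_text)   # [63, 8364, 112, 8250, 68]
print(as_bytes)  # [63, 128, 112, 155, 68]
```

In cp1252, byte 0x80 maps to the euro sign (U+20AC, decimal 8364) and byte 0x9B maps to a single right angle quotation mark (U+203A, decimal 8250), which matches the outliers in the question.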
I'm trying to make a speech recognition algorithm. I have a wav file containing about 20 minutes of speech. I've read it into a numpy array in which each chunk of 1024 values is a row. Because not all chunks returned by the wave module's readframes method have the same length, some rows are padded with zeros using the numpy.pad function to give the array a homogeneous shape. This padding is only appended at the end of the array, and thus cannot cause the following.
I've noticed that there are columns in the array that contain only zeros. These columns always occur in pairs and are always separated by two columns containing normal values. The sixth column seems to be an exception to this pattern: it sometimes contains ones as well. Are these columns real recorded speech, or are they inserted by my computer for some reason? This is important to know, as I don't want my algorithm to train on computer-generated values. Should I delete those values, or is it better to keep them? The array does not have to be playable as audio anymore.
Here's a sample from the array. I can't attach the complete array, as it is way too big for that. A more complete version can be found here.
[78, 1, 0, 0, 79, 1, 0, 0, 12, 1, 0, 0, 185, 0, 0, 0, 177, 0, 0, 0, 28, 1, 0, 0, 245, 1, 0, 0, 38, 3, 0, 0, 106, 4, 0, 0, 81, 5, 0, 0, 148, 5, 0, 0, 74, 5, 0, 0, 168, 4, 0, 0, 229, 3, 0, 0, 83, 3, 0, 0, 31, 3, 0, 0, 26, 3, 0, 0, 33, 3, 0, 0, 40, 3, 0, 0, 22, 3, 0, 0, 246, 2, 0, 0, 211, 2, 0, 0, 136, 2, 0, 0, 240, 1, 0, 0, 247, 0, 0, 0, 176, 255, 0, 0, 97, 254, 0, 0, 69, 253, 0, 0, 131, 252, 0, 0, 54, 252, 0, 0, 81, 252, 0, 0, 188, 252, 0, 0, 79, 253, 0, 0, 207, 253, 0, 0, 48, 254, 0, 0, 97, 254, 0, 0, 73, 254, 0, 0, 6, 254, 0, 0, 175, 253, 0, 0, 90, 253, 0, 0, 58, 253, 0, 0, 73, 253, 0, 0, 101, 253, 0, 0, 132, 253, 0, 0, 147, 253, 0, 0, 164, 253, 0, 0, 199, 253, 0, 0, 224, 253, 0, 0, 8, 254, 0, 0, 97, 254, 0, 0, 228, 254, 0, 0, 163, 255, 0, 0, 138, 0, 0, 0, 89, 1, 0, 0, 251, 1, 0, 0, 45, 2, 0, 0, 157, 1, 0, 0, 95, 0, 0, 0, 161, 254, 0, 0, 185, 252, 0, 0, 56, 251, 0, 0, 134, 250, 0, 0, 226, 250, 0, 0, 51, 252, 0, 0, 246, 253, 0, 0, 175, 255, 0, 0, 208, 0, 0, 0, 240, 0, 0, 0, 96, 0, 0, 0, 124, 255, 0, 0, 119, 254, 0, 0, 223, 253, 0, 0, 243, 253, 0, 0, 120, 254, 0, 0, 77, 255, 0, 0, 254, 255, 0, 0, 253, 255, 0, 0, 63, 255, 0, 0, 224, 253, 0, 0, 61, 252, 0, 0, 27, 251, 0, 0, 35, 251, 0, 0, 180, 252, 0, 0, 154, 255, 0, 0, 1, 3, 0, 0, 6, 6, 0, 0, 189, 7, 0, 0, 116, 7, 0, 0, 111, 5, 0, 0, 118, 2, 0, 0, 116, 255, 0, 0, 133, 253, 0, 0, 55, 253, 0, 0, 124, 254, 0, 0, 17, 1, 0, 0, 19, 4, 0, 0, 87, 6, 0, 0, 42, 7, 0, 0, 53, 6, 0, 0, 195, 3, 0, 0]
[240, 0, 0, 0, 235, 254, 0, 0, 155, 254, 0, 0, 87, 0, 0, 0, 80, 3, 0, 0, 25, 6, 0, 0, 137, 7, 0, 0, 253, 6, 0, 0, 146, 4, 0, 0, 57, 1, 0, 0, 35, 254, 0, 0, 73, 252, 0, 0, 56, 252, 0, 0, 185, 253, 0, 0, 251, 255, 0, 0, 60, 2, 0, 0, 204, 3, 0, 0, 18, 4, 0, 0, 15, 3, 0, 0, 21, 1, 0, 0, 137, 254, 0, 0, 95, 252, 0, 0, 99, 251, 0, 0, 137, 251, 0, 0, 210, 252, 0, 0, 46, 255, 0, 0, 161, 1, 0, 0, 40, 3, 0, 0, 102, 3, 0, 0, 95, 2, 0, 0, 127, 0, 0, 0, 92, 254, 0, 0, 160, 252, 0, 0, 18, 252, 0, 0, 216, 252, 0, 0, 133, 254, 0, 0, 181, 0, 0, 0, 136, 2, 0, 0, 248, 2, 0, 0, 213, 1, 0, 0, 130, 255, 0, 0, 202, 252, 0, 0, 199, 250, 0, 0, 241, 249, 0, 0, 56, 250, 0, 0, 121, 251, 0, 0, 15, 253, 0, 0, 72, 254, 0, 0, 235, 254, 0, 0, 188, 254, 0, 0, 180, 253, 0, 0, 61, 252, 0, 0, 191, 250, 0, 0, 167, 249, 0, 0, 108, 249, 0, 0, 33, 250, 0, 0, 144, 251, 0, 0, 106, 253, 0, 0, 20, 255, 0, 0, 227, 255, 0, 0, 169, 255, 0, 0, 173, 254, 0, 0, 82, 253, 0, 0, 3, 252, 0, 0, 69, 251, 0, 0, 115, 251, 0, 0, 146, 252, 0, 0, 109, 254, 0, 0, 137, 0, 0, 0, 64, 2, 0, 0, 28, 3, 0, 0, 241, 2, 0, 0, 233, 1, 0, 0, 148, 0, 0, 0, 143, 255, 0, 0, 60, 255, 0, 0, 183, 255, 0, 0, 185, 0, 0, 0, 203, 1, 0, 0, 149, 2, 0, 0, 208, 2, 0, 0, 97, 2, 0, 0, 127, 1, 0, 0, 103, 0, 0, 0, 66, 255, 0, 0, 74, 254, 0, 0, 172, 253, 0, 0, 121, 253, 0, 0, 158, 253, 0, 0, 218, 253, 0, 0, 242, 253, 0, 0, 207, 253, 0, 0, 97, 253, 0, 0, 191, 252, 0, 0, 91, 252, 0, 0, 144, 252, 0, 0, 82, 253, 0, 0, 100, 254, 0, 0, 122, 255, 0, 0, 34, 0, 0, 0]
[1, 0, 0, 0, 65, 255, 0, 0, 89, 254, 0, 0, 151, 253, 0, 0, 47, 253, 0, 0, 69, 253, 0, 0, 221, 253, 0, 0, 237, 254, 0, 0, 76, 0, 0, 0, 166, 1, 0, 0, 187, 2, 0, 0, 133, 3, 0, 0, 3, 4, 0, 0, 86, 4, 0, 0, 179, 4, 0, 0, 47, 5, 0, 0, 198, 5, 0, 0, 96, 6, 0, 0, 197, 6, 0, 0, 210, 6, 0, 0, 145, 6, 0, 0, 37, 6, 0, 0, 217, 5, 0, 0, 226, 5, 0, 0, 73, 6, 0, 0, 35, 7, 0, 0, 86, 8, 0, 0, 105, 9, 0, 0, 11, 10, 0, 0, 252, 9, 0, 0, 222, 8, 0, 0, 224, 6, 0, 0, 151, 4, 0, 0, 73, 2, 0, 0, 111, 0, 0, 0, 152, 255, 0, 0, 125, 255, 0, 0, 130, 255, 0, 0, 105, 255, 0, 0, 214, 254, 0, 0, 135, 253, 0, 0, 223, 251, 0, 0, 108, 250, 0, 0, 144, 249, 0, 0, 117, 249, 0, 0, 255, 249, 0, 0, 244, 250, 0, 0, 239, 251, 0, 0, 101, 252, 0, 0, 38, 252, 0, 0, 99, 251, 0, 0, 105, 250, 0, 0, 195, 249, 0, 0, 239, 249, 0, 0, 4, 251, 0, 0, 216, 252, 0, 0, 218, 254, 0, 0, 61, 0, 0, 0, 161, 0, 0, 0, 19, 0, 0, 0, 227, 254, 0, 0, 190, 253, 0, 0, 69, 253, 0, 0, 179, 253, 0, 0, 209, 254, 0, 0, 21, 0, 0, 0, 250, 0, 0, 0, 35, 1, 0, 0, 76, 0, 0, 0, 162, 254, 0, 0, 204, 252, 0, 0, 104, 251, 0, 0, 239, 250, 0, 0, 177, 251, 0, 0, 127, 253, 0, 0, 162, 255, 0, 0, 42, 1, 0, 0, 95, 1, 0, 0, 246, 255, 0, 0, 23, 253, 0, 0, 154, 249, 0, 0, 220, 246, 0, 0, 235, 245, 0, 0, 32, 247, 0, 0, 33, 250, 0, 0, 214, 253, 0, 0, 203, 0, 0, 0, 227, 1, 0, 0, 234, 0, 0, 0, 137, 254, 0, 0, 202, 251, 0, 0, 251, 249, 0, 0, 42, 250, 0, 0, 77, 252, 0, 0, 154, 255, 0, 0, 24, 3, 0, 0, 162, 5, 0, 0, 107, 6, 0, 0, 139, 5, 0, 0, 154, 3, 0, 0]
It turned out that the file contained two sound channels. The second channel was not recorded, as I have only one microphone, so it defaulted to all zeros. Because this .wav file uses two bytes to record each sample of each channel, this resulted in two positions being filled with zeros. The wave module then interleaved the channels into frames, each containing 2 valid bytes and 2 zero bytes, and combined 100 of these frames, my chunk size, into one chunk, which produced the array shown.
I could remove the zeros from the array without any loss of audio quality. However, I also had to reduce the number of channels to 1, as I had just deleted the second channel.
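Rather than deleting zero columns by hand, the first channel can be decoded directly from the interleaved frames. This is a minimal sketch that assumes 16-bit little-endian signed PCM with two channels, which matches the byte pattern in the sample rows. The helper name first_channel is hypothetical, and the input bytes are taken from the first row shown above.

```python
import struct

def first_channel(raw):
    """Return the channel-0 samples from interleaved 16-bit stereo frames."""
    # '<hh' = two little-endian signed 16-bit integers per 4-byte frame:
    # one sample for channel 0, one for channel 1.
    return [left for left, _right in struct.iter_unpack('<hh', raw)]

# Three frames from the sample: (78, 1, 0, 0), (79, 1, 0, 0), (176, 255, 0, 0)
row = bytes([78, 1, 0, 0, 79, 1, 0, 0, 176, 255, 0, 0])
samples = first_channel(row)
print(samples)  # [334, 335, -80]
```

Each pair of non-zero bytes is one sample (78 + 1 * 256 = 334), so the values 252 to 255 in the second byte position are simply small negative amplitudes, not noise.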
I'm trying to implement an LSTM with both character and word embeddings as shown here, except my problem is not NER, just simple text prediction. Right now I'm getting this error:
ValueError: Shapes (None, 135) and (None, 10, 135) are incompatible
This is my model summary:
Model: "model_13"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_34 (InputLayer) [(None, 10, 30)] 0
__________________________________________________________________________________________________
input_33 (InputLayer) [(None, 10)] 0
__________________________________________________________________________________________________
time_distributed_43 (TimeDistri (None, 10, 30, 20) 2380 input_34[0][0]
__________________________________________________________________________________________________
embedding_16 (Embedding) (None, 10, 128) 26887296 input_33[0][0]
__________________________________________________________________________________________________
time_distributed_44 (TimeDistri (None, 10, 20) 3280 time_distributed_43[0][0]
__________________________________________________________________________________________________
concatenate_14 (Concatenate) (None, 10, 148) 0 embedding_16[8][0]
time_distributed_44[0][0]
__________________________________________________________________________________________________
spatial_dropout1d_14 (SpatialDr (None, 10, 148) 0 concatenate_14[0][0]
__________________________________________________________________________________________________
bidirectional_14 (Bidirectional (None, 10, 100) 79600 spatial_dropout1d_14[0][0]
__________________________________________________________________________________________________
time_distributed_45 (TimeDistri (None, 10, 135) 13635 bidirectional_14[0][0]
==================================================================================================
Total params: 26,986,191
Trainable params: 98,895
Non-trainable params: 26,887,296
__________________________________________________________________________________________________
My inputs are X_word, X_char and Y. X_word is a list of encoded words, 10 words per sentence, with shape (2770, 10), and X_word[0] looks like this:
array([[ 16871, 298, 0, 0, 0, 0, 0, 0,
0, 0]])
And it's a padded sentence with two words.
My X_char is a list of characters for those words:
array([[ 7, 101, 16, 101, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0],
[ 56, 102, 16, 34, 102, 61, 6, 102, 93, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0]])
X_char has a shape of (2770, 10, 30).
I have 135 labels, so Y has shape (2770, 135), and I fit everything like this:
history = model.fit([X_word_tr,
                     (np.array(X_char_tr)).astype('float32').reshape((len(X_char_tr), max_len, max_len_char))],
                    np.array(to_categorical(y_tr)), epochs=10, verbose=1)
I can't help but think my logic is flawed somewhere.
If you defined the model like in the tutorial you linked (which you should include in the post), the error occurs because the LSTM is set to return a sequence, so the model produces one prediction for each of the 10 time steps. That gives an output of shape (None, 10, 135), while your labels have shape (None, 135). You need to set return_sequences=False so it returns a single output per sample, and remove the last TimeDistributed layer that wraps around the last Dense layer.
main_lstm = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units=50, return_sequences=False,
recurrent_dropout=0.6))(x)
out = tf.keras.layers.Dense(n_tags + 1, activation="sigmoid")(main_lstm)
In the summary, you will see that the model now returns a single vector of shape (None, 135) per sample, which can be compared to your labels:
_______________________________________________________________________________________________
dense_5 (Dense) (None, 135) 13635 bidirectional_5[0][0]
===============================================================================================
I have a training set of pairs of images, where each image is described by 64 features and each pair has a label attached, i.e. matched/not matched.
How can I feed this data in a neural network using keras?
My data is as follows:
[
[
[
239,
1,
255,
255,
255,
255,
2,
0,
130,
3,
1,
101,
22,
154,
0,
240,
30,
0,
2,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
128,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
71,
150,
212
],
[
239,
1,
255,
255,
255,
255,
2,
0,
130,
3,
1,
101,
22,
154,
0,
240,
30,
0,
2,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
128,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
71,
150,
212
],
"true"
],
[
[
239,
1,
255,
255,
255,
255,
2,
0,
130,
3,
1,
81,
28,
138,
0,
241,
254,
128,
6,
0,
2,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
128,
0,
128,
2,
128,
2,
192,
6,
224,
6,
224,
62,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
13,
62
],
[
239,
1,
255,
255,
255,
255,
2,
0,
130,
3,
1,
81,
28,
138,
0,
241,
254,
128,
6,
0,
2,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
128,
0,
128,
2,
128,
2,
192,
6,
224,
6,
224,
62,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
13,
62
],
"true"
],
....
]
I want to train a neural network so that, after training, if I provide it with 2 arrays of 64 features, it is able to tell whether they match or not.
Since you have already extracted the features, I'd suggest just using some dense layers, converting "true" and "false" to 1 and 0 respectively, and using a sigmoid on the final dense layer.
Try something simple first, see how it goes, and continue from there. If you need more help, just ask.
EDIT
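import numpy as np
from tensorflow import keras  # assumption: Keras as bundled with TensorFlow

# `data` is the full list from the question: [features_a, features_b, "true"/"false"] per entry.

def generator(batch_size=10, nr_features=126):
    feed_data = np.zeros((batch_size, nr_features))
    labels = np.zeros(batch_size)
    i = 0
    for entry in data:
        # The last element is the label: "true" -> 1, anything else -> 0.
        labels[i] = 1 if entry[-1] == "true" else 0
        # Concatenate the two feature arrays into a single feature vector.
        feed_data[i, :] = np.array(entry[:-1]).flatten()
        i += 1
        if not (i % batch_size):
            i = 0
            yield feed_data, labels

model = keras.Sequential()
model.add(keras.layers.Dense(126, input_dim=126))
model.add(keras.layers.Dense(20))
model.add(keras.layers.Dense(1))
model.add(keras.layers.Activation('sigmoid'))
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Train on one batch at a time.
for d, l in generator():
    model.train_on_batch(d, l)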
So what happens:
The data in the generator is your full data. I take the true/false label off each entry, convert it to 1/0 and put it into the label array, and I concatenate all features into a feature vector of 126. So feed_data.shape = (10, 126) and labels.shape = (10,).
I feed that to a simple fully connected network that ends with a sigmoid. A sigmoid is useful for probabilities, so in this case the output will be the probability that a feature vector is true, and I just feed the data.
This is a simple example, not the full code, but it should get you started. I tested it and it runs for me. I did not train anything yet though; that's something for you. Good luck!
Oh, and if you have questions, ask away.
I have the following dict and would like to get keys from it based on a list of values:
d = {
'Mot': [5250, 1085, 1085, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
'Dek': [0, 0, 0, 105, 0, 0, 0, 0, 0, 0, 0, 0, 0],
'Nas': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
'Ost': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
'Suk': [0, 0, 0, 0, 0, 0, 0, 3156, 1320, 450, 0, 0, 0],
'Tas': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 250, 0, 0],
'Sat': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6551, 5000]
}
dz = [[5250, 1085, 1085, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 105, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
I tried to adapt the get() method without any success (it returns the error unhashable type: 'list'; the same happened when I tried numpy arrays instead, which returned unhashable type: 'numpy.ndarray'):
tN = []
for index, element in enumerate(dz):
    tN.append(dict((v, k) for k, v in d.items()).get(element))
Is there any way to retrieve the keys from the dictionary like that?
You can do something like this:
d = {
'Mot': [5250, 1085, 1085, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
'Dek': [0, 0, 0, 105, 0, 0, 0, 0, 0, 0, 0, 0, 0],
'Nas': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
'Ost': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
'Suk': [0, 0, 0, 0, 0, 0, 0, 3156, 1320, 450, 0, 0, 0],
'Tas': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 250, 0, 0],
'Sat': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6551, 5000]
}
dz = [[5250, 1085, 1085, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],[0, 0, 0, 105, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
keys = [key for elem in dz for key, value in d.items() if elem==value]
print(keys)
Output:
['Mot', 'Dek']
UPDATE:
Modified code so that you get the right order of keys. In case of:
dz = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6551, 5000],
[5250, 1085, 1085, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 105, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
the output is:
['Sat', 'Mot', 'Dek']
The solution posted above by Vasilis G. unfortunately reorders the returned values so in case of having:
dz = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6551, 5000],
[5250, 1085, 1085, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 105, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
It outputs:
['Mot', 'Dek', 'Sat']
When you need to preserve the order of the values in dz, this might be a better choice:
keys = []
for i in dz:
    keys.append(list(d.keys())[list(d.values()).index(i)])
# using list comprehension:
# keys = [list(d.keys())[list(d.values()).index(i)] for i in dz]
print(keys)
It prints:
['Sat', 'Mot', 'Dek']
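The "unhashable type: 'list'" error from the question can also be avoided by converting each list to a tuple, which is hashable, and building a reverse lookup once. This is a minimal sketch of that alternative, not the approach of either answer above; the name lookup is hypothetical.

```python
d = {
    'Mot': [5250, 1085, 1085, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    'Dek': [0, 0, 0, 105, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    'Nas': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    'Ost': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    'Suk': [0, 0, 0, 0, 0, 0, 0, 3156, 1320, 450, 0, 0, 0],
    'Tas': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 250, 0, 0],
    'Sat': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6551, 5000],
}
dz = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6551, 5000],
      [5250, 1085, 1085, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
      [0, 0, 0, 105, 0, 0, 0, 0, 0, 0, 0, 0, 0]]

# Tuples are hashable, so they can serve as dictionary keys.
lookup = {tuple(values): key for key, values in d.items()}

# One dictionary lookup per row; the order of dz is preserved.
keys = [lookup.get(tuple(row)) for row in dz]
print(keys)  # ['Sat', 'Mot', 'Dek']
```

Note that 'Nas' and 'Ost' hold identical lists, so the reverse lookup keeps only the later key ('Ost') for an all-zero row, and rows that match nothing come back as None.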