Dissimilarity between point and centroid in a k-modes cluster - Python

Is there a way to measure and plot the dissimilarity between cluster centroids and the rows of a matrix, maybe without clustering both? The centroids come from a previous cluster analysis. I have five cluster centroids and want to check whether the rows in the matrix can be assigned to one of the clusters. Both are categorical data.
In [1]: matrixDf = array([['a', 'a', 'c', 'b', 'b'],
                          ['a', 'c', 'c', 'b', 'b'],
                          ['a', 'c', 'e', 'c', 'b'],
                          ['a', 'c', 'e', 'b', 'b'],
                          ...
                          ['a', 'a', 'c', 'c', 'b']])
In [10]: clusterCentroidsDf = pd.DataFrame(km_cao.cluster_centroids_)
Out [3]: clusterCentroidsDf = array([['c', 'a', 'a', 'c', 'b'],
                                     ['a', 'c', 'a', 'c', 'a'],
                                     ['b', 'f', 'b', 'd', 'c'],
                                     ['d', 'c', 'f', 'd', 'd'],
                                     ['d', 'e', 'd', 'c', 'b']])
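A minimal sketch, assuming the simple matching dissimilarity that k-modes uses by default: the number of attributes on which a row and a centroid differ. No re-clustering is needed; every row is compared directly against every stored centroid and given the label of the closest one. The helper name matching_dissimilarity is hypothetical, and only the rows actually shown in the question are used (the rows elided with "..." are left out).

```python
import numpy as np

def matching_dissimilarity(rows, centroids):
    """Number of mismatched categories between every row and every centroid.

    Returns an integer array of shape (n_rows, n_centroids)."""
    rows = np.asarray(rows)
    centroids = np.asarray(centroids)
    # (n_rows, 1, n_features) != (1, n_centroids, n_features) -> boolean mismatches,
    # summed over the feature axis
    return (rows[:, None, :] != centroids[None, :, :]).sum(axis=2)

# The rows shown in the question (rows elided with "..." are not included)
matrix = np.array([['a', 'a', 'c', 'b', 'b'],
                   ['a', 'c', 'c', 'b', 'b'],
                   ['a', 'c', 'e', 'c', 'b'],
                   ['a', 'c', 'e', 'b', 'b'],
                   ['a', 'a', 'c', 'c', 'b']])
centroids = np.array([['c', 'a', 'a', 'c', 'b'],
                      ['a', 'c', 'a', 'c', 'a'],
                      ['b', 'f', 'b', 'd', 'c'],
                      ['d', 'c', 'f', 'd', 'd'],
                      ['d', 'e', 'd', 'c', 'b']])

dist = matching_dissimilarity(matrix, centroids)
labels = dist.argmin(axis=1)  # index of the closest centroid for each row
print(dist)
print(labels)
```

Ties go to the lowest-numbered centroid because argmin returns the first minimum. Whether a row "can be assigned" at all is a judgment call; one option is to treat rows whose smallest dissimilarity exceeds a cutoff you choose as unassigned. If km_cao is still available, its predict() method should give the same nearest-centroid labels under the default matching dissimilarity. The dist matrix can be plotted as a heatmap, for example with matplotlib's imshow.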

Related

How do I input custom arrays into the rows and columns of a 2D character array?

Rows = int(input("give the number of rows:"))
Columns = int(input("Give the number of columns:"))
matrix = []
for i in range(Rows):
    matrix.append(['a', 'b', 'c', 'd', 'e'])
    for vector in matrix:
        print(matrix)
Here's the output:
give the number of rows:3
Give the number of columns:3
[['a', 'b', 'c', 'd', 'e']]
[['a', 'b', 'c', 'd', 'e'], ['a', 'b', 'c', 'd', 'e']]
[['a', 'b', 'c', 'd', 'e'], ['a', 'b', 'c', 'd', 'e']]
[['a', 'b', 'c', 'd', 'e'], ['a', 'b', 'c', 'd', 'e'], ['a', 'b', 'c', 'd', 'e']]
[['a', 'b', 'c', 'd', 'e'], ['a', 'b', 'c', 'd', 'e'], ['a', 'b', 'c', 'd', 'e']]
[['a', 'b', 'c', 'd', 'e'], ['a', 'b', 'c', 'd', 'e'], ['a', 'b', 'c', 'd', 'e']]
What I need, when the user enters 3 rows and 3 columns, is this:
a b c
d e f
g h i
There are many ways to initialize an array with a specific size. Below is one of the more concise ways.
Rows = int(input("Give the number of rows:"))
Columns = int(input("Give the number of columns:"))
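# Build a separate inner list for each row; [["a"] * Columns] * Rows would repeat
# the same list object, so changing one row would change every row
matrix = [["a"] * Columns for _ in range(Rows)]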
print(matrix)
This will give the output
Give the number of rows:3
Give the number of columns:3
[['a', 'a', 'a'], ['a', 'a', 'a'], ['a', 'a', 'a']]
This gives the array sizing that you are looking for.
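The question actually asks for consecutive letters (a b c / d e f / g h i) rather than a grid of 'a'. A minimal sketch of that, built with one independent list per row; letter_grid is a hypothetical helper name, and wrapping back to 'a' after 'z' is an assumption for grids with more than 26 cells:

```python
from string import ascii_lowercase

def letter_grid(rows, columns):
    """Build a rows x columns grid filled with consecutive letters a, b, c, ...

    Assumption: letters wrap around to 'a' after 'z'."""
    return [[ascii_lowercase[(r * columns + c) % 26] for c in range(columns)]
            for r in range(rows)]

Rows = 3     # would come from input() as in the question
Columns = 3
for row in letter_grid(Rows, Columns):
    print(' '.join(row))
# a b c
# d e f
# g h i
```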

Splitting a list into subsequences

I have the following sequence of data:
['A', 'A', 'A', 'A', 'A', 'A', 'A', 'D', 'D', 'D', 'D', 'D', 'D', 'A', 'A', 'A', 'A', 'A', 'D', 'D', 'D', 'D', 'D', 'D']
How can I create subsequences (a list of lists) like this:
[['A', 'A', 'A', 'A', 'A', 'A', 'A'], ['D', 'D', 'D', 'D', 'D', 'D'], ['A', 'A', 'A', 'A', 'A'], ['D', 'D', 'D', 'D', 'D', 'D']]
That is, I want to start a sublist with the first value encountered (e.g. 'A' or 'D') and keep appending to it until a different letter appears. The next sublist then collects the run of that different letter, and so on until the last element of the main list.
itertools.groupby is a good tool for this kind of task:
from itertools import groupby
lst = ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'D', 'D', 'D', 'D', 'D', 'D', 'A', 'A', 'A', 'A', 'A', 'D', 'D', 'D', 'D', 'D', 'D']
output = [list(g) for _, g in groupby(lst)]
print(output)
# [['A', 'A', 'A', 'A', 'A', 'A', 'A'], ['D', 'D', 'D', 'D', 'D', 'D'], ['A', 'A', 'A', 'A', 'A'], ['D', 'D', 'D', 'D', 'D', 'D']]
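To see what groupby is doing here, a plain-loop sketch of the same idea that follows the procedure described in the question; split_runs is a hypothetical name. Like groupby, it only merges consecutive equal values, which is why the two 'A' runs stay separate:

```python
def split_runs(items):
    """Split items into sublists of consecutive equal values.

    Plain-loop equivalent of [list(g) for _, g in itertools.groupby(items)]."""
    runs = []
    for item in items:
        if runs and runs[-1][0] == item:
            runs[-1].append(item)  # same value as the current run: extend it
        else:
            runs.append([item])    # first item or a new value: start a new run
    return runs

lst = ['A'] * 7 + ['D'] * 6 + ['A'] * 5 + ['D'] * 6
print(split_runs(lst))
```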
Solution:
import itertools
lis = ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'D', 'D', 'D', 'D', 'D', 'D', 'A', 'A', 'A', 'A', 'A', 'D', 'D', 'D', 'D', 'D', 'D']
print([list(x[1]) for x in itertools.groupby(lis)])
Output:
[['A', 'A', 'A', 'A', 'A', 'A', 'A'], ['D', 'D', 'D', 'D', 'D', 'D'], ['A', 'A', 'A', 'A', 'A'], ['D', 'D', 'D', 'D', 'D', 'D']]

How to cut a txt file into a matrix in Python?

I've got a .txt file that looks like this:
JB651 ACDCCADBCADCDA
JB831 ACACCBBBBBDDAC
JC124 DACBDBBACBDCDC
JD329 BAACDBABCCDDAB
JD830 BDDCDBABBBAAAD
JI428 DCBCCBBBBCCCCA
I have to cut it into a matrix (I suppose I first have to count the lines for a for loop) that looks like this:
[[JI428] [D,C,B,C,C,B,B,B,B,C,C,C,C,A]]
And then, how can I refer to any line, or to any letter of the second part? (I'm a total beginner.)
You can open the file, read each line and split it at the space. To be able to access any line, and thus any letter, it may be best to use a dictionary:
with open('filename.txt') as f:
    # split() with no argument also drops the trailing newline; blank lines are skipped
    rows = [line.split() for line in f if line.strip()]
final_data = {key: list(letters) for key, letters in rows}
Output:
{'JD830': ['B', 'D', 'D', 'C', 'D', 'B', 'A', 'B', 'B', 'B', 'A', 'A', 'A', 'D'], 'JC124': ['D', 'A', 'C', 'B', 'D', 'B', 'B', 'A', 'C', 'B', 'D', 'C', 'D', 'C'], 'JB651': ['A', 'C', 'D', 'C', 'C', 'A', 'D', 'B', 'C', 'A', 'D', 'C', 'D', 'A'], 'JB831': ['A', 'C', 'A', 'C', 'C', 'B', 'B', 'B', 'B', 'B', 'D', 'D', 'A', 'C'], 'JD329': ['B', 'A', 'A', 'C', 'D', 'B', 'A', 'B', 'C', 'C', 'D', 'D', 'A', 'B'], 'JI428': ['D', 'C', 'B', 'C', 'C', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'A']}
You can access the rows by passing a key from the first column:
print(final_data['JD830'])
Output:
['B', 'D', 'D', 'C', 'D', 'B', 'A', 'B', 'B', 'B', 'A', 'A', 'A', 'D']
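For the second part of the question (referring to any line or any letter), a small sketch. io.StringIO stands in for the real file so the example is self-contained; the data is the six lines from the question. Letters are indexed from 0, and rows can also be reached by position because dicts keep insertion order in Python 3.7+:

```python
import io

# Stand-in for open('filename.txt'): the six lines from the question
text = io.StringIO("""JB651 ACDCCADBCADCDA
JB831 ACACCBBBBBDDAC
JC124 DACBDBBACBDCDC
JD329 BAACDBABCCDDAB
JD830 BDDCDBABBBAAAD
JI428 DCBCCBBBBCCCCA
""")

rows = [line.split() for line in text if line.strip()]
final_data = {key: list(letters) for key, letters in rows}

# By ID, then by 0-based position within that row
print(final_data['JD830'][0])    # B
print(final_data['JI428'][-1])   # A (last letter)

# By line number: the keys come out in file order
keys = list(final_data)
print(keys[2], final_data[keys[2]][3])   # JC124 B

# As a plain list of lists, indexed matrix[row][column]
matrix = [list(letters) for _, letters in rows]
print(matrix[5][0])   # D
```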

Appending a value to start of list

I'm currently working on a 2D transposition cipher in Python. I have a list that contains an encoded message, like the one below:
['BF', 'AF', 'AF', 'DA', 'CD', 'DD', 'BC', 'EF', 'DA', 'AA', 'EF', 'BF']
The next step is taking that list, splitting it up and putting it into a new matrix according to a keyword that the user enters, which I have done below:
Enter the keyword for final encryption: hide
H I D E
['B', 'F', 'A', 'F']
['A', 'F', 'D', 'A']
['C', 'D', 'D', 'D']
['B', 'C', 'E', 'F']
['D', 'A', 'A', 'A']
['E', 'F', 'B', 'F']
What I would like to do next, and haven't managed yet, is to rearrange the columns above so that their keyword letters are in alphabetical order, giving another cipher text like below:
D E H I
['A', 'F', 'B', 'F']
['D', 'A', 'A', 'F']
['D', 'D', 'C', 'D']
['E', 'F', 'B', 'C']
['A', 'A', 'D', 'A']
['B', 'F', 'E', 'F']
Here's my code:
def encodeFinalCipher(keyword, firstEncryption):
    matrix2 = []
    # Convert keyword to upper case
    keywordKey = list(keyword.upper())
    # Join the first-stage digraphs (e.g. 'BF', 'AF', ...) into one string
    firstEncryptionString = ''.join(str(x) for x in firstEncryption)
    # Split the string into rows that are as wide as the keyword
    keywordList = list(firstEncryptionString)
    for x in range(0, len(keywordList), len(keyword)):
        matrix2.append(keywordList[x:x + len(keyword)])
    # Print the first table: the keyword above the rows of firstEncryption
    print(' %s' % ' '.join(map(str, keywordKey)))
    for letters in matrix2:
        print(letters)
    # Return the matrix so its columns can be reordered afterwards
    return matrix2
I have traversed the 2D matrix and got all the column entries like below:
b = [[matrix2[i][j] for i in range(len(matrix2))] for j in range(len(matrix2[0]))]
for index, item in enumerate(b):
    print("\n", index, item)
Output:
0 ['B', 'A', 'C', 'B', 'D', 'E']
1 ['F', 'F', 'D', 'C', 'A', 'F']
2 ['A', 'D', 'D', 'E', 'A', 'B']
3 ['F', 'A', 'D', 'F', 'A', 'F']
How would I append each letter of the keywordKey (e.g. 'H' 'I' 'D' 'E') to the list where the numbers 0,1,2,3 are?
Or, probably more efficiently, how would I attach the letters of keywordKey to the columns when creating the matrix? Would a dictionary help here? Then I could sort the dictionary and print the final cipher.
Many thanks
You can do something like this:
>>> from operator import itemgetter
>>> from pprint import pprint
>>> lst = [['B', 'F', 'A', 'F'],
...        ['A', 'F', 'D', 'A'],
...        ['C', 'D', 'D', 'D'],
...        ['B', 'C', 'E', 'F'],
...        ['D', 'A', 'A', 'A'],
...        ['E', 'F', 'B', 'F']]
>>> key = 'HIDE'
Sort range(len(key)) (xrange(len(key)) in Python 2) using the corresponding characters of key; you then have the column indices in alphabetical keyword order:
>>> indices = sorted(range(len(key)), key=key.__getitem__)
>>> indices
[2, 3, 0, 1]
Now all we need to do is loop over the list and apply these indices to each item using operator.itemgetter and get the corresponding items:
>>> pprint([list(itemgetter(*indices)(x)) for x in lst])
[['A', 'F', 'B', 'F'],
['D', 'A', 'A', 'F'],
['D', 'D', 'C', 'D'],
['E', 'F', 'B', 'C'],
['A', 'A', 'D', 'A'],
['B', 'F', 'E', 'F']]
# or simply
>>> pprint([[x[i] for i in indices] for x in lst])
[['A', 'F', 'B', 'F'],
['D', 'A', 'A', 'F'],
['D', 'D', 'C', 'D'],
['E', 'F', 'B', 'C'],
['A', 'A', 'D', 'A'],
['B', 'F', 'E', 'F']]
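Putting the pieces together, a sketch of the whole step from the first-stage list to the reordered table; columnar_reorder is a hypothetical name. It sorts column indices, as in the answer above, rather than using a dictionary keyed by keyword letters, since such a dictionary would lose columns whenever the keyword repeats a letter. sorted() is stable, so repeated letters keep their left-to-right order. It assumes a message whose length is not a multiple of the keyword length should simply leave the last row short:

```python
def columnar_reorder(digraphs, keyword):
    """Lay out the letters of `digraphs` in rows as wide as `keyword`, then
    reorder the columns so the keyword letters are in alphabetical order."""
    key = keyword.upper()
    letters = ''.join(digraphs)
    width = len(key)
    rows = [list(letters[i:i + width]) for i in range(0, len(letters), width)]
    # Column indices sorted by their keyword letter ('HIDE' -> [2, 3, 0, 1])
    order = sorted(range(width), key=key.__getitem__)
    header = [key[i] for i in order]
    # A short final row simply skips the columns it does not have
    reordered = [[row[i] for i in order if i < len(row)] for row in rows]
    return header, reordered

message = ['BF', 'AF', 'AF', 'DA', 'CD', 'DD', 'BC', 'EF', 'DA', 'AA', 'EF', 'BF']
header, table = columnar_reorder(message, 'hide')
print(' '.join(header))
for row in table:
    print(row)
```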

Iteratively collect first two elements of each vector of a matrix

I have a matrix:
matrix = [['F', 'B', 'F', 'A', 'C', 'F'],
['D', 'E', 'B', 'E', 'B', 'E'],
['F', 'A', 'D', 'B', 'F', 'B'],
['B', 'E', 'F', 'B', 'D', 'D']]
I want to remove and collect the first two elements of each sub-list, and add them to a new list.
So far I have:
while messagecypher:
    for vector in messagecypher:
        final.extend(vector[:2])
The problem is that the slice doesn't remove the elements, so the while loop never ends and I end up with a huge list of repeated characters. I could use .pop(0) twice, but that isn't very clean.
NOTE: the reason I remove the elements is that I need to keep going over each vector until the matrix is empty.
You can keep your slice and do:
final = []
while any(matrix):  # stop once every row is empty
    for i in range(len(matrix)):
        matrix[i], final = matrix[i][2:], final + matrix[i][:2]
Note that this simultaneously assigns the rest of each row back to matrix and adds the sliced-off first two elements to final.
Well, you can use a list comprehension to get it done, but it's perhaps counter-intuitive:
>>> matrix = [['F', 'B', 'F', 'A', 'C', 'F'],
...           ['D', 'E', 'B', 'E', 'B', 'E'],
...           ['F', 'A', 'D', 'B', 'F', 'B'],
...           ['B', 'E', 'F', 'B', 'D', 'D']]
>>> while [] not in matrix: print([i for var in matrix for i in [var.pop(0), var.pop(0)]])
['F', 'B', 'D', 'E', 'F', 'A', 'B', 'E']
['F', 'A', 'B', 'E', 'D', 'B', 'F', 'B']
['C', 'F', 'B', 'E', 'F', 'B', 'D', 'D']
EDIT:
Using range makes the syntax look cleaner:
>>> matrix = [['C', 'B', 'B', 'D', 'F', 'B'], ['D', 'B', 'B', 'A', 'B', 'A'], ['B', 'D', 'E', 'F', 'C', 'B'], ['B', 'A', 'C', 'B', 'E', 'F']]
>>> while [] not in matrix: print([var.pop(0) for var in matrix for i in range(2)])
['C', 'B', 'D', 'B', 'B', 'D', 'B', 'A']
['B', 'D', 'B', 'A', 'E', 'F', 'C', 'B']
['F', 'B', 'B', 'A', 'C', 'B', 'E', 'F']
Deleting elements is not an efficient way to go about your task. It requires Python to perform a lot of unnecessary work shifting things around to fill the holes left by the deleted elements. Instead, just shift your slice over by two places each time through the loop:
final = []
# Assumes every vector has the same length as the first one
for i in range(0, len(messagecypher[0]), 2):
    for vector in messagecypher:
        final.extend(vector[i:i+2])
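A variant of the same sliding-slice idea that also keeps the per-pass grouping shown by the pop(0) answers, without mutating the matrix; collect_pairs is a hypothetical name, and like the loop above it assumes all rows have the same length:

```python
def collect_pairs(matrix, step=2):
    """Yield, for each pass, the next `step` elements of every row.

    The matrix is left unchanged; only the slice window moves."""
    width = len(matrix[0]) if matrix else 0
    for start in range(0, width, step):
        yield [item for row in matrix for item in row[start:start + step]]

matrix = [['F', 'B', 'F', 'A', 'C', 'F'],
          ['D', 'E', 'B', 'E', 'B', 'E'],
          ['F', 'A', 'D', 'B', 'F', 'B'],
          ['B', 'E', 'F', 'B', 'D', 'D']]
for chunk in collect_pairs(matrix):
    print(chunk)
```

Flattening the passes, e.g. [x for chunk in collect_pairs(matrix) for x in chunk], gives the same single final list as the loop above.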
