Is it possible to handle a matrix with strings and numbers? - python
I am a beginner, self-taught.
I am wondering whether a matrix containing both strings and numbers can be built in Python, and handled in a way similar to how I would handle it in bash.
The type of matrix (14 rows x 14 columns) that I would like to handle is:
,,1,2,3,4,5,6,7,8,9,10,11,12
,,C,O,O,C,H,H,H,C,C,H,H,H
1,C,0.0,1.205475107329386,1.3429319010227962,2.3430136323519886,3.22738313640333,2.640130058756468,2.6401484355574363,1.4784953771865779,2.4427526711622995,3.4404701049315856,2.6506415109695562,2.173942147030341
2,O,1.205475107329386,0.0,2.245467917547002,2.6443156030953032,3.702905546101439,2.6354536594179083,2.6355724561170515,2.3918864536893496,2.871975783234887,3.9479515489105172,2.5936449600745437,3.2896946757332293
3,O,1.3429319010227962,2.245467917547002,0.0,1.418915551312475,2.015476882415432,2.0693088134923188,2.0692958839669946,2.3236193736523485,3.560975969980456,4.431347320573397,3.951843753512012,2.4366421143893597
4,C,2.3430136323519886,2.6443156030953032,1.418915551312475,0.0,1.0868846056358739,1.0921261760040055,1.092126228351473,3.6419246237091034,4.772348473634059,5.725281935435472,4.948741644534887,3.855293676517857
5,H,3.22738313640333,3.702905546101439,2.015476882415432,1.0868846056358739,0.0,1.7916118321336392,1.7916073980710447,4.336840746006843,5.570012200282658,6.44436962662531,5.876935928592363,4.304036910039309
6,H,2.640130058756468,2.6354536594179083,2.0693088134923188,1.0921261760040055,1.7916118321336392,0.0,1.774322615322816,3.999843247699306,5.001451201004137,5.992370839831868,5.038926795069471,4.349546588337786
7,H,2.6401484355574363,2.6355724561170515,2.0692958839669946,1.092126228351473,1.7916073980710447,1.774322615322816,0.0,3.9999029642804302,5.001556219427222,5.992449776200327,5.039085741282741,4.349558376763068
8,C,1.4784953771865779,2.3918864536893496,2.3236193736523485,3.6419246237091034,4.336840746006843,3.999843247699306,3.9999029642804302,0.0,1.324770443414403,2.107792016824585,2.085364895492881,1.079295724832157
9,C,2.4427526711622995,2.871975783234887,3.560975969980456,4.772348473634059,5.570012200282658,5.001451201004137,5.001556219427222,1.324770443414403,0.0,1.0763707503087891,1.0781013610472885,2.1192372863195152
10,H,3.4404701049315856,3.9479515489105172,4.431347320573397,5.725281935435472,6.44436962662531,5.992370839831868,5.992449776200327,2.107792016824585,1.0763707503087891,0.0,1.8418880170159488,2.4949700018092598
11,H,2.6506415109695562,2.5936449600745437,3.951843753512012,4.948741644534887,5.876935928592363,5.038926795069471,5.039085741282741,2.085364895492881,1.0781013610472885,1.8418880170159488,0.0,3.067298402780731
12,H,2.173942147030341,3.2896946757332293,2.4366421143893597,3.855293676517857,4.304036910039309,4.349546588337786,4.349558376763068,1.079295724832157,2.1192372863195152,2.4949700018092598,3.067298402780731,0.0
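If your data comes from a CSV file, you can use the standard csv module. Note that csv.reader expects an open file object (or any iterable of lines), not a file name, and that it returns every cell as a string:
import csv
with open('data.csv', newline='') as f:
    reader = csv.reader(f)
    matrix = list(reader)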
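For the distance matrix in the question, the rows returned by csv.reader can be split into labels and numbers. The following is a minimal sketch, assuming the layout shown in the question (two header rows and two label columns). The three-atom sample and the names `sample`, `labels` and `dist` are illustrative only.

```python
import csv
import io

# Illustrative three-atom excerpt of the question's table (not the full file)
sample = """,,1,2,3
,,C,O,O
1,C,0.0,1.205475107329386,1.3429319010227962
2,O,1.205475107329386,0.0,2.245467917547002
3,O,1.3429319010227962,2.245467917547002,0.0
"""

# io.StringIO stands in for an opened file such as open('data.csv', newline='')
rows = list(csv.reader(io.StringIO(sample)))

# The second header row carries the element symbols from column 2 onward
labels = rows[1][2:]

# Remaining rows: skip the two label columns and convert the strings to floats
dist = [[float(cell) for cell in row[2:]] for row in rows[2:]]

print(labels)
print(dist[0][1])
```

Keeping the labels and the numeric block in separate lists avoids mixing types in one container, which makes later arithmetic on the distances simpler.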
Alternatively, you can use the pandas package (pip install pandas or conda install pandas):
import pandas as pd
matrix = pd.read_csv('data.csv')
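With the layout from the question, a plain pd.read_csv call would treat the first line as column names, so some extra handling is probably needed. One possible sketch, assuming two header rows and two label columns as in the question, reads everything as text and slices out the pieces by position. The sample data and the label scheme ('C1', 'O2', ...) are assumptions made for illustration.

```python
import io
import pandas as pd

# Illustrative three-atom excerpt of the question's table (not the full file)
sample = """,,1,2,3
,,C,O,O
1,C,0.0,1.205475107329386,1.3429319010227962
2,O,1.205475107329386,0.0,2.245467917547002
3,O,1.3429319010227962,2.245467917547002,0.0
"""

# header=None keeps both header rows as data; dtype=str avoids type guessing
raw = pd.read_csv(io.StringIO(sample), header=None, dtype=str)

# Combine element symbol and atom number into labels such as 'C1', 'O2'
labels = (raw.iloc[1, 2:] + raw.iloc[0, 2:]).tolist()

# Numeric block: everything below the header rows and right of the label columns
values = raw.iloc[2:, 2:].astype(float).to_numpy()

matrix = pd.DataFrame(values, index=labels, columns=labels)
print(matrix.loc['C1', 'O2'])
```

The resulting DataFrame holds only floats, while the strings live in its row and column labels, so lookups like `matrix.loc['C1', 'O2']` work by name.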
If you are entering the values by hand, you can store mixed data types in a single NumPy array by setting dtype to object:
import numpy as np
# 1D array
matrix = np.array(['d',1,'e','c',2,5],dtype='object')
# 2D Matrix (as per your case)
matrix = np.array([[1, 's', 2], ['h', 4, 6]],dtype='object')
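An object array stores references to ordinary Python objects, so numeric operations on it are not vectorised in the usual way. A small sketch, using an illustrative excerpt of the question's table, shows how the numeric block might be sliced out and converted before doing arithmetic:

```python
import numpy as np

# Illustrative excerpt with a label row and a label column
matrix = np.array([['',  'C', 'O'],
                   ['C', 0.0, 1.205475107329386],
                   ['O', 1.205475107329386, 0.0]], dtype='object')

# Elements keep their original Python types
print(type(matrix[0, 1]), type(matrix[1, 2]))

# Slice out the numeric block and convert it to float for numeric work
numbers = matrix[1:, 1:].astype(float)
print(numbers.max())
```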
Related
Is there a numpy function to find an array in multi dimensional array?
I have a NumPy array with n rows and p columns. I want to check whether a given row is in my array and find its index. For example, I have a NumPy array like this:
[[1,0,8,7,2,2],[1,3,7,0,3,0],[1,7,1,0,1,0],[1,9,1,0,6,0],[1,8,1,7,9,0],....]
I want to check whether the array [6,0,5,8,2,1] is in my NumPy array, and where. Is there a NumPy function for that? I'm sorry for asking a naive question, but I'm quite confused right now.
You can use == and .all(axis=1) to match entire rows, then use numpy.where() to get the index:
import numpy as np
a = np.array([[1,0,8,7,2,2],[1,3,7,0,3,0],[1,7,1,0,1,0],[1,9,1,0,6,0],[1,8,1,7,9,0],
              [6,0,5,8,2,1]])
b = np.array([6,0,5,8,2,1])
print(np.where((a==b).all(axis=1)))
Output:
(array([5], dtype=int32),)
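The same comparison can be wrapped in a small helper that also covers the "not found" case. This is only a sketch; `find_row` is a hypothetical name, not a NumPy function.

```python
import numpy as np

def find_row(a, row):
    """Return the indices of all rows of `a` equal to `row` (hypothetical helper)."""
    a = np.asarray(a)
    row = np.asarray(row)
    # Broadcasting compares `row` with every row; all(axis=1) requires a full match
    return np.flatnonzero((a == row).all(axis=1))

a = np.array([[1, 0, 8, 7, 2, 2],
              [1, 3, 7, 0, 3, 0],
              [6, 0, 5, 8, 2, 1]])

print(find_row(a, [6, 0, 5, 8, 2, 1]))            # indices of matching rows
print(find_row(a, [9, 9, 9, 9, 9, 9]).size == 0)  # an empty result means "not found"
```

For floating-point data, exact equality can miss rows that differ only by rounding, so replacing `a == row` with `np.isclose(a, row)` may be more appropriate.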
Python split string inside a numpy array
I have a NumPy array like this:
Input
array([['ATS1, ATS2', 'P_CD'],
       ['ATS1,ATS2,ATS3', 'C_CD']], dtype=object)
I would like to convert it as shown below:
Expected output
array([['ATS1', 'ATS2', 'P_CD'],
       ['ATS1', 'ATS2', 'ATS3', 'C_CD']], dtype=object)
As you can see above, I would like to split each string on a delimiter and make every piece a separate entry. Any suggestions on how to achieve this in Python?
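You can use join and re.split. Because the resulting rows have different lengths, the result is a 1-D NumPy array (dtype=object) whose elements are Python lists rather than a regular 2-D array. The pattern below also consumes whitespace around the commas, so 'ATS1, ATS2' does not leave a leading space in ' ATS2'.
import numpy as np
import re
arr = np.array([['ATS1, ATS2', 'P_CD'], ['ATS1,ATS2,ATS3', 'C_CD']], dtype=object)
# Join each row into one comma-separated string, then split on commas and any surrounding spaces
arr = np.array([re.split(r'\s*,\s*', ','.join(ele)) for ele in arr], dtype=object)
print(arr)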
Export a numpy matrix of complex numbers to CSV
I am having the following trouble in Python. Assume a numpy.matrix A whose entries have dtype complex128. I want to export A in CSV format so that the entries are separated by commas and each line of the output file corresponds to a row of A. I also need 18 decimal places of precision for both the real and imaginary parts, and no spaces within an entry. For example, I need this
`6.103515626000000000e+09+1.712134684679831166e+05j`
instead of
`6.103515626000000000e+09 + 1.712134684679831166e+05j`
The following command works, but only for a 1-by-1 matrix:
numpy.savetxt('A.out', A, fmt='%.18e%+.18ej', delimiter=',')
If I use:
numpy.savetxt('A.out', A, delimiter=',')
there are two problems. First, I don't know how many decimal places are preserved by default. Second, each complex entry is put in parentheses, like
(6.103515626000000000e+09+1.712134684679831166e+05j)
and I cannot read the file in Matlab. What do you suggest?
This is probably not the most efficient way of converting the data in a large matrix, and I am sure a more efficient one-line solution exists, but you can try the code below and see if it works. Here I use pandas to save the data to a CSV file. The two columns in the generated CSV file are the real and imaginary parts, respectively. I also assume that the input matrix has dimension Nx1.
import pandas as pd
import numpy as np

def to_csv(t, nr_of_decimal=18):
    # Flatten the Nx1 complex matrix to a 1-D array of complex values
    values = np.asarray(t).ravel()
    # Column 0 holds the real parts, column 1 the imaginary parts
    t_new = np.column_stack((values.real, values.imag))
    # Write without index or header, in scientific notation with the requested precision
    pd.DataFrame(t_new).to_csv('out.csv', index=False, header=False,
                               float_format='%.{}e'.format(nr_of_decimal))

# Assume t is your complex matrix
t = np.matrix([[6.103515626000000000e+09+1.712134684679831166e+05j],
               [6.103515626000000000e+09+1.712134684679831166e+05j]])
to_csv(t)
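If each entry should stay a single field of the form `real+imagj` without spaces or parentheses, as the question asks, one option is to format the entries by hand. This is a sketch under that assumption; `complex_rows` is a hypothetical helper, and the small matrix with 3 decimals is used only to keep the output short.

```python
import io
import numpy as np

def complex_rows(a, decimals=18):
    """Format a 2-D complex array as CSV lines like '1.5e+00-2.2e+00j,...' (hypothetical helper)."""
    a = np.atleast_2d(np.asarray(a))  # also accepts numpy.matrix input
    return [
        # '+' in the second format spec forces an explicit sign on the imaginary part
        ",".join(f"{z.real:.{decimals}e}{z.imag:+.{decimals}e}j" for z in row)
        for row in a
    ]

A = np.array([[1.5 - 2.25j, 0.5 + 1j],
              [0.0 + 0.0j, -3.0 + 4.0j]])

lines = complex_rows(A, decimals=3)

# io.StringIO stands in for a real file such as open('A.out', 'w')
out = io.StringIO()
out.write("\n".join(lines) + "\n")
print(out.getvalue())
```

Each line corresponds to one row of the matrix, and Python's own `complex()` can parse the entries back, for example `complex('1.500e+00-2.250e+00j')`.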
Numpy 2D array to Table
I've got an 18x18 2d numpy array (it's a confusion matrix)...and I need/would like to display it as a table in an ipython notebook. When I simply print it out, it displays with overlap--the rows are so long they take up two lines. Is there a library that will allow me to print this array in a sort of spreadsheet format?
You can use pandas for that:
import pandas as pd
print(pd.DataFrame(yourArray))
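When printing a wide DataFrame, pandas may still wrap columns to fit the terminal width. A possible way around this, sketched here with a stand-in array in place of the real confusion matrix, is `DataFrame.to_string()`, which does not wrap by default:

```python
import numpy as np
import pandas as pd

# Stand-in for the 18x18 confusion matrix from the question
your_array = np.arange(18 * 18).reshape(18, 18)

df = pd.DataFrame(your_array)

# to_string() renders every column on one line per row instead of wrapping
text = df.to_string()
print(text)
```

In a notebook, leaving `df` as the last expression of a cell renders it as an HTML table, which is often the most readable option.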
Note: Konstantinos' proposal holds only for 1-D and 2-D arrays!
You can use numpy.array2string():
import numpy as np
array = np.array([[1,2,3], [4,5,6]])
print(np.array2string(array).replace('[[',' [').replace(']]',']'))
Output:
 [1 2 3]
 [4 5 6]
See also: Printing Lists as Tabular Data
Data.Frames in Python Numpy
How do I build data.frames containing multiple types of data (strings, int, logical), and both continuous variables and factors, in Python with NumPy?
The following code turns my headers and everything except my float values into NaN:
from numpy import genfromtxt
my_data = genfromtxt('FlightDataTraining.csv', delimiter=',')
This puts a "b'data'" on all of my data, so that year becomes "b'year'":
import numpy as np
d = np.loadtxt('FlightDataTraining.csv',delimiter=',',dtype=str)
Try genfromtxt('FlightDataTraining.csv', delimiter=',', dtype=None). This tells genfromtxt to intelligently guess the dtype of each column. If that does not work, please post a sample of your CSV and what the desired output should look like. The b in b'data' is Python's way of representing bytes as opposed to str objects. So the b'data' is okay. If you want strs, you would need to decode the bytes. NumPy does not have a dtype for representing factors, though Pandas does have a pd.Categorical type.
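A sketch of this suggestion follows. The CSV content and its column names (year, carrier, delay) are made up for illustration, since the real FlightDataTraining.csv is not shown. Passing names=True takes field names from the header line, and encoding='utf-8' asks genfromtxt to return str rather than bytes.

```python
import io
import numpy as np
import pandas as pd

# Made-up stand-in for FlightDataTraining.csv; the column names are hypothetical
sample = """year,carrier,delay
2008,AA,12.5
2008,UA,3.0
2009,AA,7.25
"""

# dtype=None lets genfromtxt pick a type per column; names=True reads the header row
data = np.genfromtxt(io.StringIO(sample), delimiter=',', dtype=None,
                     names=True, encoding='utf-8')

print(data.dtype.names)
print(data['carrier'])

# A pandas Categorical can play the role of an R factor
carrier = pd.Categorical(data['carrier'])
print(carrier.categories.tolist())
```

The result is a structured array with one field per column, so `data['year']` is an integer column while `data['carrier']` holds strings.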