Read contents of a text file into a dictionary - python

The contents of my text file are as follows:
a 0 45 124 234 53 12 34
a 90 294 32 545 190 87
a 180 89 63 84 73 63 83
How can I read the contents into a dictionary so that a0 becomes the key and the remaining numbers become the values? I want my dictionary to look like this:
{a0: [45, 124, 234, 53, 12, 34], a90: [294, 32, 545, 190, 87], a180: [89, 63, 84, 73, 63, 83]}
I have tried the conventional approach where I remove the delimiter and then store it in the dictionary, as shown below:
newt = {}
newt = {t[0]: t[1:] for t in data}
But here I get only a as the key

This may help you out (it's about Christmas time after all)
d = {}
with open("dd.txt") as f:
    for line in f:
        els = line.split()
        k = ''.join(els[:2])
        d[k] = list(map(int, els[2:]))
print(d)
with an input file of
a 0 45 124 234 53 12 34
a 90 294 32 545 190 87
a 180 89 63 84 73 63 83
it produces
{'a90': [294, 32, 545, 190, 87],
'a180': [89, 63, 84, 73, 63, 83],
'a0': [45, 124, 234, 53, 12, 34]}
It reads each line from the file and splits it into chunks on whitespace.
It then uses the first two elements to compose the key and the rest to compose a list, converting each element into an integer.
I have assumed you want the numbers as integers. If you want them as strings, you can skip the conversion to int:
d[k] = els[2:]
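The comprehension in the question yields only a as the key because t[0] is just the first field (or first character) of each line, so the number that follows never becomes part of the key. Below is a minimal sketch of the same idea wrapped in a function. The name parse_lines and the key_fields parameter are made up for illustration, and io.StringIO stands in for the real file so that the snippet runs on its own.

```python
import io

def parse_lines(lines, key_fields=2):
    """Build {joined key fields: [ints...]} from whitespace-separated lines."""
    result = {}
    for line in lines:
        parts = line.split()
        if not parts:              # skip blank lines
            continue
        key = ''.join(parts[:key_fields])
        result[key] = [int(p) for p in parts[key_fields:]]
    return result

# io.StringIO stands in for open("dd.txt") here (an assumption for the demo).
sample = io.StringIO(
    "a 0 45 124 234 53 12 34\n"
    "a 90 294 32 545 190 87\n"
    "a 180 89 63 84 73 63 83\n"
)
parsed = parse_lines(sample)
print(parsed)
```

Any iterable of lines works, so an open file object can be passed in place of the StringIO.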

If you like one-liners (kind-of):
with open('my_file.txt') as f:
    res = {''.join(r.split(None, 2)[:2]): [int(x) for x in r.split()[2:]] for r in f}
>>> res
{'a0': [45, 124, 234, 53, 12, 34],
'a180': [89, 63, 84, 73, 63, 83],
'a90': [294, 32, 545, 190, 87]}

Related

Accessing required number of indices in an array

I have an array like:
a=np.array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79])
Requirement:
I would like to access 10 indices of the array.
The array length is 60 and 60/10 = 6, so I need every 6th index of the array a.
Required output: [0,6,12,18,24,30,36,42,48,54,60]
NumPy is powerful; I would recommend reading the documentation about indexing in NumPy.
everySixthEntry=a[np.arange(0,a.shape[0],6)]
You can generate the indexes for any array a with np.arange(len(a)). To access every 6th index, use a slice a[start:stop:step]. Jack posted one way; here is a more detailed version.
import numpy as np
# define your data. a = [20, ..., 79]
a = np.arange(60) + 20
# generate indexes for the array, index start at 0 till len(a)-1
indexes = np.arange(len(a))
# reduce the indexes to every 6th index
indexes = indexes[::6] # [start:stop:step]
print(indexes)
# -> array([ 0, 6, 12, 18, 24, 30, 36, 42, 48, 54])
# 60 isn't included because the array has 60 elements, so the last valid index is 59
You can get the same result a bit differently by using the step argument of np.arange.
# the same result, obtained a bit differently
indexes = np.arange(0, len(a), 6) # (start,stop,step)
print(indexes)
# -> array([ 0, 6, 12, 18, 24, 30, 36, 42, 48, 54])
and in case you want to access the values of your original array
print(a[indexes])
# -> array([20, 26, 32, 38, 44, 50, 56, 62, 68, 74])
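If the step should be derived from the number of entries wanted rather than hard-coded, something like the following sketch may help. The variable names count and step are illustrative, and the code assumes the array length divides evenly by count.

```python
import numpy as np

a = np.arange(20, 80)            # same values as the array in the question
count = 10                       # how many evenly spaced entries are wanted
step = len(a) // count           # 60 // 10 == 6 (assumes an even division)

indexes = np.arange(0, len(a), step)
values = a[::step]               # same selection as a[indexes]

print(indexes.tolist())
print(values.tolist())
```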
Basics of slicing
a[start:stop:step] is equivalent to a[slice(start, stop, step)]. If you don't want to specify one of start, stop, or step, set it to None. start and stop take index values between 0 and len(a), and a negative value counts from the end of the array.
Some Slice Examples:
step = 20
a[slice(None, None, step)], a[slice(0, -1, step)], a[0: -1: step], a[::step]
# all -> array([20, 40, 60])
# the first 4 elements
step = 1
start = 0 # or None
end = 4 # stop is exclusive, so this selects indices 0 to 3
a[slice(start, end, step)], a[slice(start, end)] , a[start: end: step] , a[start:end]
# all -> array([20, 21, 22, 23])
# the last 4 elements
step = 1
start = -4
end = None # -1 would cut off the last entry
a[slice(start, end, step)], a[slice(start, end)] , a[start: end: step] , a[start:end]
# all -> array([76, 77, 78, 79])
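To see how None and negative values are resolved for a given length, the built-in slice.indices method can be used. This is a small sketch on a plain list holding the same values; it is not part of the original answer.

```python
a = list(range(20, 80))   # plain list with the same 60 values
n = len(a)

# slice.indices(n) returns the concrete (start, stop, step) for length n.
every_20th = slice(None, None, 20).indices(n)
last_four = slice(-4, None).indices(n)
drop_last = slice(0, -1).indices(n)

print(every_20th)
print(last_four)
print(drop_last)
print(a[slice(*last_four)])
```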
I think you meant to say:
The required index values are [0,6,12,18,24,30,36,42,48,54,60]
Corresponding output values are [20, 26, 32, 38, 44, 50, 56, 62, 68, 74]
The code below should give you the values for every 6th index.
a=np.array([20,21,22,23,24,25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79,])
Out=[]
for i in range(10):
    Out.append(a[6*i])
print(Out)
Output is :
[20, 26, 32, 38, 44, 50, 56, 62, 68, 74]
If the index values are required, do the following:
Out1=[]
for i in range(0,11): # i goes from 0 to 10, giving 11 index values (0 to 60); index 60 itself is past the end of a
    print(6*i)
    Out1.append(6*i)
print("The required index values is : {}".format(Out1))
This gives an output :
0
6
12
18
24
30
36
42
48
54
60
The required index values is : [0, 6, 12, 18, 24, 30, 36, 42, 48, 54, 60]

Getting each column in a 3d numpy array

I converted an image from RGB to CIELAB, and now I need to use the CIELAB values to calculate some equations.
I have been trying to get the values of each column in the array. For example, if I have:
List =
[[[ 65 234 169]
[203 191 245]
[ 36 58 196]
[207 208 143]
[251 208 187]]
[[ 79 69 237]
[ 13 124 42]
[104 165 82]
[170 178 178]
[ 66 42 210]]
[[ 40 163 219]
[142 37 140]
[ 75 205 143]
[246 30 221]
[ 16 98 102]]]
How can I get the values of each column, like this:
1st_column =
65
203
36
207
251
79
13
104
170
66
40
142
75
246
16
Thank you.
Try:
>>> m[:, :, 0]
array([[ 65, 203, 36, 207, 251],
[ 79, 13, 104, 170, 66],
[ 40, 142, 75, 246, 16]])
As suggested by @mozway, you can use the ellipsis syntax: m[..., 0].
To learn more, read "How do you use the ellipsis slicing syntax in Python?"
You can also flatten your array:
>>> m[:, :, 0].flatten()
array([ 65, 203, 36, 207, 251, 79, 13, 104, 170, 66, 40, 142, 75, 246, 16])
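If all three columns are needed at once, one option is to reshape the array so that every pixel becomes a row. The sketch below rebuilds the array from the question under the name m (the name used in this answer); first, second and third are illustrative names.

```python
import numpy as np

m = np.array([
    [[65, 234, 169], [203, 191, 245], [36, 58, 196], [207, 208, 143], [251, 208, 187]],
    [[79, 69, 237], [13, 124, 42], [104, 165, 82], [170, 178, 178], [66, 42, 210]],
    [[40, 163, 219], [142, 37, 140], [75, 205, 143], [246, 30, 221], [16, 98, 102]],
])

# Collapse the first two axes: shape (3, 5, 3) becomes (15, 3).
flat = m.reshape(-1, m.shape[-1])

# Transposing gives one 1-D array per column (channel).
first, second, third = flat.T

print(first)
```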

Access x_train columns after train test split function

After splitting my data, I'm trying a feature ranking, but when I try to access X_train.columns I get this error: 'numpy.ndarray' object has no attribute 'columns'.
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, chi2
y=df['DIED'].values
x=df.drop('DIED',axis=1).values
X_train,X_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=42)
print('X_train',X_train.shape)
print('X_test',X_test.shape)
print('y_train',y_train.shape)
print('y_test',y_test.shape)
bestfeatures = SelectKBest(score_func=chi2, k="all")
fit = bestfeatures.fit(X_train,y_train)
dfscores = pd.DataFrame(fit.scores_)
dfcolumns = pd.DataFrame(X_train.columns)
I know that train_test_split returns a NumPy array here, but how should I deal with it?
Maybe this code makes it clear:
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
# here I imitate your example data
df = pd.DataFrame(data = np.random.randint(100, size = (50,5)), columns = ['DIED']+[f'col_{i}' for i in range(4)])
df.head()
Out[1]:
DIED col_0 col_1 col_2 col_3
0 36 0 23 43 55
1 81 59 83 37 31
2 32 86 94 50 87
3 10 69 4 69 27
4 1 16 76 98 74
# df here is a DataFrame with all its attributes, such as df.columns
y=df['DIED'].values
x=df.drop('DIED',axis=1).values # <- .values returns a plain 2D NumPy array (not a DataFrame), so it has no column names
x
Out[2]:
array([[ 0, 23, 43, 55],
[59, 83, 37, 31],
[86, 94, 50, 87],
[69, 4, 69, 27],
[16, 76, 98, 74],
[17, 50, 52, 31],
[95, 4, 56, 68],
[82, 35, 67, 76],
.....
# now you can access columns by index, like this:
x[:,2] # <- gives you access to the 3rd column
Out[3]:
array([43, 37, 50, 69, 98, 52, 56, 67, 81, 64, 48, 68, 14, 41, 78, 65, 11,
86, 80, 1, 11, 32, 93, 82, 93, 81, 63, 64, 47, 81, 79, 85, 60, 45,
80, 21, 27, 37, 87, 31, 97, 16, 59, 91, 20, 66, 66, 3, 9, 88])
# or you can convert the 2D array back to a DataFrame
pd.DataFrame(data = x, columns = df.columns[1:])
Out[4]:
col_0 col_1 col_2 col_3
0 0 23 43 55
1 59 83 37 31
2 86 94 50 87
3 69 4 69 27
....
The same approach works for all your variables: X_train, X_test, y_train, y_test.
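Another option, sketched here with made-up random data, is to skip .values entirely: when train_test_split receives pandas objects, it returns pandas objects, so X_train.columns remains available. The column names below simply mirror the imitation data above and are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up data with the same layout as the example above.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(50, 5)),
                  columns=['DIED'] + [f'col_{i}' for i in range(4)])

y = df['DIED']
X = df.drop('DIED', axis=1)   # no .values, so X stays a DataFrame

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

print(type(X_train).__name__)
print(list(X_train.columns))
```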

Python: Randomly insert multiple rows into 3D numpy array

I have a 3D array arr of size (2, 5, 5). I also have another array rows_to_ins of size (3, 5).
I would like to randomly insert rows_to_ins into each page of arr. However, rows_to_ins must not be inserted as a block. In addition, the positions to insert at should be random for every page of arr.
However, I am struggling to insert rows_to_ins efficiently. My current solution uses a for-loop.
import numpy as np
arr = np.arange(100, 125).reshape(5, 5)
arr = np.repeat(arr[None, :, :], 2, axis=0)
rows_to_ins = np.random.randint(0, 99, (3,5))
row_nums_3D = np.random.randint(0, arr.shape[1], (2, 1, 3))
arr_ins = list()
for i in range(row_nums_3D.shape[0]):
    arr_ins.append(np.insert(arr[i, :, :], np.squeeze(row_nums_3D[i, :, :]), rows_to_ins, axis=0))
arr_ins = np.asarray(arr_ins)
I am wondering if I can avoid the for-loop. What would a vectorized solution look like?
Maybe a more concrete example will help to explain my problem.
# arr - shape (2, 5, 5)
[[[100 101 102 103 104]
[105 106 107 108 109]
[110 111 112 113 114]
[115 116 117 118 119]
[120 121 122 123 124]]
[[100 101 102 103 104]
[105 106 107 108 109]
[110 111 112 113 114]
[115 116 117 118 119]
[120 121 122 123 124]]]
# rows_to_ins - shape (3, 5)
[[37 31 28 34 10]
[ 2 97 89 36 11]
[66 14 70 37 45]]
I am looking for a potential result like this:
# 3D array with inserted rows - shape (2, 8, 5)
[[[100 101 102 103 104]
[ 2 97 89 36 11]
[66 14 70 37 45]
[105 106 107 108 109]
[110 111 112 113 114]
[115 116 117 118 119]
[120 121 122 123 124]
[37 31 28 34 10]]
[[66 14 70 37 45]
[100 101 102 103 104]
[105 106 107 108 109]
[ 2 97 89 36 11]
[110 111 112 113 114]
[37 31 28 34 10]
[115 116 117 118 119]
[120 121 122 123 124]]]
Here's a vectorized way -
def insert_random_places(arr, rows_to_ins):
    m,n,r = arr.shape
    N = len(rows_to_ins) + n
    idx = np.random.rand(m,N).argsort(1)
    out = np.zeros((m,N,r),dtype=np.result_type(arr, rows_to_ins))
    np.put_along_axis(out,np.sort(idx[:,:n,None],axis=1),arr,axis=1)
    np.put_along_axis(out,idx[:,n:,None],rows_to_ins,axis=1)
    return out
Sample run -
In [58]: arr
Out[58]:
array([[[100, 101, 102, 103, 104],
[105, 106, 107, 108, 109],
[110, 111, 112, 113, 114],
[115, 116, 117, 118, 119]],
[[100, 101, 102, 103, 104],
[105, 106, 107, 108, 109],
[110, 111, 112, 113, 114],
[115, 116, 117, 118, 119]]])
In [59]: rows_to_ins
Out[59]:
array([[77, 72, 9, 20, 80],
[69, 79, 47, 64, 82]])
In [60]: np.random.seed(0)
In [61]: insert_random_places(arr, rows_to_ins)
Out[61]:
array([[[100, 101, 102, 103, 104],
[ 69, 79, 47, 64, 82],
[105, 106, 107, 108, 109],
[110, 111, 112, 113, 114],
[115, 116, 117, 118, 119],
[ 77, 72, 9, 20, 80]],
[[100, 101, 102, 103, 104],
[ 77, 72, 9, 20, 80],
[ 69, 79, 47, 64, 82],
[105, 106, 107, 108, 109],
[110, 111, 112, 113, 114],
[115, 116, 117, 118, 119]]])
Another one based on masking -
def insert_random_places_v2(arr, rows_to_ins):
    m,n,r = arr.shape
    L = len(rows_to_ins)
    N = L + n
    insert_idx = np.random.rand(m,N).argpartition(kth=-L,axis=1)[:,-L:]
    mask = np.zeros((m,N),dtype=bool)
    np.put_along_axis(mask,insert_idx,1,axis=1)
    out = np.zeros((m,N,r),dtype=np.result_type(arr, rows_to_ins))
    rows_to_ins_3D = rows_to_ins[np.random.rand(m,L).argsort(1)]
    out[mask] = rows_to_ins_3D.reshape(-1,r)
    out[~mask] = arr.reshape(-1,r)
    return out
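As a rough way to check the behaviour, the sketch below repeats the first approach with an explicit np.random.default_rng generator (a variation for reproducibility, not the code posted above; the name insert_random_places_rng is made up) and verifies that the original rows keep their order on every page.

```python
import numpy as np

def insert_random_places_rng(arr, rows_to_ins, rng):
    """Variant of the first approach that takes an explicit random generator."""
    m, n, r = arr.shape
    N = len(rows_to_ins) + n
    # One random permutation of the N output slots per page.
    idx = rng.random((m, N)).argsort(axis=1)
    out = np.zeros((m, N, r), dtype=np.result_type(arr, rows_to_ins))
    # The first n slots, sorted so the original row order is kept, receive arr.
    np.put_along_axis(out, np.sort(idx[:, :n, None], axis=1), arr, axis=1)
    # The remaining slots receive the rows to insert, in random order.
    np.put_along_axis(out, idx[:, n:, None], rows_to_ins, axis=1)
    return out

rng = np.random.default_rng(0)
arr = np.repeat(np.arange(100, 125).reshape(5, 5)[None], 2, axis=0)
rows_to_ins = rng.integers(0, 99, (3, 5))
out = insert_random_places_rng(arr, rows_to_ins, rng)

# Rows from arr hold values >= 100, while inserted rows hold values < 99.
for page, original in zip(out, arr):
    from_arr = page[page[:, 0] >= 100]
    assert np.array_equal(from_arr, original)   # original order preserved

print(out.shape)
```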

How to convert ASCII integers back to char within a loop?

I'm trying to convert multiple ASCII ints back to char and have it as a single string. I know how to do it one by one but I can't think of how to do it in a loop. This is the code I have to grab all the ascii ints in my ascii_message variable:
for c in ascii_message:
    ascii_int = ord(c)
Thanks!
An efficient way to do this in Python 2 is to load the list into a bytearray object & then convert that to a string. Like this:
ascii_message = [
83, 111, 109, 101, 32, 65, 83, 67,
73, 73, 32, 116, 101, 120, 116, 46,
]
a = bytearray(ascii_message)
s = str(a)
print s
output
Some ASCII text.
Here's a variation that works correctly in both Python 2 & 3.
a = bytearray(ascii_message)
s = a.decode('ASCII')
However, in Python 3, it'd be more usual to use an immutable bytes object rather than a mutable bytearray.
a = bytes(ascii_message)
s = a.decode('ASCII')
The reverse procedure can also be done efficiently with a bytearray in both Python 2 and 3.
s = 'Some ASCII text.'
a = list(bytearray(s.encode('ASCII')))
print(a)
output
[83, 111, 109, 101, 32, 65, 83, 67, 73, 73, 32, 116, 101, 120, 116, 46]
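For completeness, the loop the question asks about can also be written directly with chr, the inverse of ord. This is a minimal Python 3 sketch; the names chars, text and text_short are illustrative.

```python
ascii_message = [
    83, 111, 109, 101, 32, 65, 83, 67,
    73, 73, 32, 116, 101, 120, 116, 46,
]

# Explicit loop: convert each code point and collect the characters.
chars = []
for code in ascii_message:
    chars.append(chr(code))
text = ''.join(chars)

# The same thing as a generator expression.
text_short = ''.join(chr(code) for code in ascii_message)

# ord() reverses the conversion.
round_trip = [ord(c) for c in text]

print(text)
```

For long lists the bytearray or bytes approach is likely faster, since it avoids creating one small string per character.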
If your "list of numbers" is actually a string, you can convert it to a proper list of integers like this.
numbers = '48 98 49 48 49 49 48 48 48 49 48 49 48 49 48 48'
ascii_message = [int(u) for u in numbers.split()]
print(ascii_message)
a = bytearray(ascii_message)
s = a.decode('ASCII')
print(s)
output
[48, 98, 49, 48, 49, 49, 48, 48, 48, 49, 48, 49, 48, 49, 48, 48]
0b10110001010100
That looks like the binary representation of a 14-bit number. So I guess there are further steps to solving this puzzle. Good luck!
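If the decoded string really is a binary literal, a possible next step is to parse it with int using base 2, which accepts the 0b prefix. This is only a guess at what the puzzle wants.

```python
bits = '0b10110001010100'

# int() with base 2 accepts an optional "0b" prefix.
value = int(bits, 2)

print(value)
print(value.bit_length())   # number of significant bits
```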
