I have a file testforce.dat whose values are arranged in 9 columns and 3 rows. The columns are named as follows:
p1 p2 p3 f1 f2 f3 r1 r2 r3
18 5 27 20 21 8 14 12 25
9 26 23 1 4 10 7 16 24
19 22 15 13 17 6 11 2 3
I have 100 files of this kind.
I now want to calculate, for the file force_00000.dat, the vector g = [sum(p1*f1), sum(p2*f2), sum(p3*f3)], but for the next file, force_00001.dat, the vector should use the other columns: h = [sum(p1*r1), sum(p2*r2), sum(p3*r3)].
At the moment I am using the glob function to find my files and loadtxt to read them into arrays. Every row of a file ends up as one array.
I am not sure how to get my alternating array multiplication done and would appreciate any suggestions :)
import numpy as np
import glob
i = 100
for x in range(0, int(i)):
    ## turns x into a string and adds if necessary "0" to achieve a fixed digit number;
    y = str(x).zfill(5)
    ## the structure of the forcefile is "force_[00000-00099]";
    files = sorted(glob.glob('.//results/force/force_%s.dat' % y))
    column_names = ('#position')
    print(files)
    ## loads the file data into arrays
    arrays = [np.loadtxt(filename) for filename in files]
    print(arrays)
Edit: I tested the load of the first file with:
b=np.array(arrays)
print(b.shape)
And I get (1,3,9) for the shape of my generated array.
Edit2: I had the idea to use "usecols" and then multiply the desired values:
xposition = [np.loadtxt(filename, usecols=(0,1,2)) for filename in files]
xforce1 = [np.loadtxt(filename, usecols=(3,4,5)) for filename in files]
print(xposition)
print(xforce1)
xp = np.asarray(xposition)
xf1 = np.asarray(xforce1)
print(xp)
g = np.multiply(xp, xf1)
print(g)
this generated the following output:
[[[ 360. 105. 216.]
[ 9. 104. 230.]
[ 247. 374. 90.]]]
which means I have (p11 and f11 being the values of the first row, p21 from second row...)
[[[p11*f11 p12*f12 p13*f13]
[p21*f21 p22*f22 p23*f23]
[p31*f31 p32*f32 p33*f33]]]
which suggests I am almost done for at least one file. The desired g(g1,g2,g3) should look like:
p11*f11+p21*f21+p31*f31= g1
p12*f12+p22*f22+p32*f32= g2
p13*f13+p23*f23+p33*f33= g3
Sorry if that is a total newbie question, but I am not so familiar with Python yet :)
For the issue with the alternating values, I was thinking about using an if statement that checks whether the loop counter "x" is an even number.
loadtxt returns an array. [loadtxt(name) for name in filenames] produces a list of arrays, one array per name. np.array([...]) produces an array from that list. If the individual arrays are all the same size, the resulting array will be 3d.
If you need to treat every other file differently you could access them as a set with indexing
arr[::2,...]
arr[1::2,...]
To multiply the 2 sets of columns from your example file:
In [558]: txt=b"""p1 p2 p3 f1 f2 f3 r1 r2 r3
...: 18 5 27 20 21 8 14 12 25
...: 9 26 23 1 4 10 7 16 24
...: 19 22 15 13 17 6 11 2 3"""
In [560]: arr = np.loadtxt(txt.splitlines(),skiprows=1,dtype=int)
In [561]: arr
Out[561]:
array([[18, 5, 27, 20, 21, 8, 14, 12, 25],
[ 9, 26, 23, 1, 4, 10, 7, 16, 24],
[19, 22, 15, 13, 17, 6, 11, 2, 3]])
In [562]: arr[:, 0:3]*arr[:, 3:6]
Out[562]:
array([[360, 105, 216],
[ 9, 104, 230],
[247, 374, 90]])
In [563]: arr[:, 0:3]*arr[:, 6:9]
Out[563]:
array([[252, 60, 675],
[ 63, 416, 552],
[209, 44, 45]])
If arr were a 3d array built by loading multiple files,
arr1 = arr[::2,...]
arr2 = arr[1::2,...]
arr1[:,:,0:3] * arr1[:,:,3:6]
etc
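A minimal sketch of how the even/odd split could be combined with the column-wise sum the question asks for. It assumes each file has one header line (hence skiprows=1; drop it if the real files have none) and uses four in-memory copies of the sample as stand-ins for force_00000.dat to force_00003.dat. SAMPLE is a name made up for this illustration.

```python
import io
import numpy as np

# Sample content copied from the question; real code would pass file paths to loadtxt.
SAMPLE = """p1 p2 p3 f1 f2 f3 r1 r2 r3
18 5 27 20 21 8 14 12 25
9 26 23 1 4 10 7 16 24
19 22 15 13 17 6 11 2 3
"""

# Stand-in for loading four files: shape (n_files, n_rows, n_cols) = (4, 3, 9).
arr = np.array([np.loadtxt(io.StringIO(SAMPLE), skiprows=1) for _ in range(4)])

even = arr[::2]    # files 0, 2, ... use the f columns
odd = arr[1::2]    # files 1, 3, ... use the r columns

# Multiply column-wise, then sum over the rows of each file (axis=1).
g = (even[:, :, 0:3] * even[:, :, 3:6]).sum(axis=1)  # one [g1, g2, g3] per even file
h = (odd[:, :, 0:3] * odd[:, :, 6:9]).sum(axis=1)    # one [h1, h2, h3] per odd file

print(g[0])
print(h[0])
```

Summing over axis=1 collapses the three rows of each file, so g and h each hold one 3-element vector per file.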
For example I have a matrix array
a=np.arange(25).reshape(5,5)
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]]
How do I make an 1D array of elements that I would like to choose manually? For example [2,3], [4,1], [1,0] and [2,2], so I get the following:
b=[13, 21, 5, 12]
The array should be a reference rather than a copy.
You can make a function for this.
import numpy as np
# defining the function
def get_value(matrix, row_list, col_list):
    # indexing with two lists pairs them up: (row_list[0], col_list[0]), (row_list[1], col_list[1]), ...
    return matrix[row_list, col_list]
# initializing the array
a = np.arange(0, 25, 1).reshape(5, 5)
# getting the required values and printing
b = get_value(a, [2,4,1,2], [3,1,0,2])
# output
print(b)
Edit
I'll leave the previous answer as is, in case anyone else stumbles upon it and needs it.
What the question wants is to give a value from b (i.e. b[0] which is 13) and change the value from the original matrix a based on the index of that passed value from b in a.
def change_the_value(old_mat, val_to_change, new_val):
    # (row, col) of the first element equal to val_to_change
    mat_coor = np.argwhere(old_mat == val_to_change)[0]
    old_mat[mat_coor[0], mat_coor[1]] = new_val
a = np.arange(0, 25, 1).reshape(5,5)
b = [13, 16, 5, 12]
change_the_value(a, b[0], 0)
a=np.arange(25).reshape(5,5)
search=[[2,3], [4,1], [1,0], [2,2]]
for row,col in search:
    print(row,col, a[row][col])
output:
r c result
2 3 13
4 1 21
1 0 5
2 2 12
First of all, I've found that a view onto an arbitrary (irregular) set of elements of a Numpy array is not natively possible: a view has to be describable by regular strides over the array's memory, which is what makes Numpy fast, and indexing with lists of positions returns a copy instead.
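A short sketch that illustrates this distinction, using np.shares_memory to check whether a result still refers to the original buffer. The names rows, cols, picked and block are made up for this illustration.

```python
import numpy as np

a = np.arange(25).reshape(5, 5)
rows = np.array([2, 4, 1, 2])
cols = np.array([3, 1, 0, 2])

picked = a[rows, cols]   # indexing with index arrays: returns a copy
block = a[1:3, 2:4]      # basic slicing: returns a view

print(np.shares_memory(a, picked))  # False
print(np.shares_memory(a, block))   # True

picked[0] = -1            # modifies only the copy
unchanged = int(a[2, 3])  # a[2, 3] is still 13

a[rows, cols] = 0         # assignment through the indices does write into a
print(a)
```

So although the selected elements cannot be held as a view, keeping the indices and assigning through them gives the same practical effect as a reference.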
Here's a solution I found that works the best so far:
Instead of having a view to an array, I construct a collection of the indices that I would like to process: [2,3], [4,1], [1,0], [2,2].
The collection type I have chosen is a set, because it excludes duplicates and its add and discard methods do not require a search. Keeping order was not necessary.
To use them for indexing an array, they have to be converted from a set of tuples {(2,3),(4,1),(1,0),(2,2)} to a tuple of arrays (array([2,4,1,2]), array([3,1,0,2])).
Which can be achieved by unzipping a set and constructing a tuple of arrays:
import numpy as np
a=np.arange(25).reshape(5,5)
>>>[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]]
my_set = {(2,3),(4,1),(1,0),(2,2)}
uzip_set = list(zip(*my_set))
seq_from_set = (np.asarray(uzip_set[0]),np.asarray(uzip_set[1]))
print(seq_from_set)
>>>(array([2, 4, 1, 2]), array([3, 1, 0, 2]))
And array a can be manipulated by providing such a sequence of indices:
b = a[seq_from_set]
print(b)
>>>[13 21  5 12]
a[seq_from_set] = 0
print(a)
>>>[[ 0 1 2 3 4]
[ 0 6 7 8 9]
[10 11 0 0 14]
[15 16 17 18 19]
[20 0 22 23 24]]
The solution is a bit more involved than something native would be, but it works surprisingly fast. It allows easy management of the collection of indices and supports quick conversion to index arrays on demand.
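The set-to-index conversion described above could be wrapped in a small helper, sketched here. to_index is a hypothetical name, and sorting the set is an extra choice made only so the element order is reproducible (iteration order of a set is otherwise arbitrary).

```python
import numpy as np

def to_index(index_set):
    """Convert a set of (row, col) tuples into a tuple of index arrays."""
    if not index_set:
        empty = np.empty(0, dtype=int)
        return (empty, empty)
    rows, cols = zip(*sorted(index_set))  # sorted only for a stable order
    return (np.asarray(rows), np.asarray(cols))

a = np.arange(25).reshape(5, 5)
selected = {(2, 3), (4, 1), (1, 0), (2, 2)}

# The collection can be edited cheaply before it is used for indexing.
selected.add((0, 0))
selected.discard((2, 2))

values = a[to_index(selected)]   # copy of the selected elements
a[to_index(selected)] = -1       # write through the same indices
print(values)
print(a)
```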
I have a dataset which is a numpy array with shape (1536 x 16 x 48). A quick explanation of these dimensions that might be helpful:
The dataset consists of data collected by EEG sensors at 256Hz rate (1 second = 256 measures/values);
1536 values represent 6 seconds of EEG data (256 * 6 = 1536);
16 is the number of electrodes used to collect data;
48 is the number of samples.
In summary: I have 48 samples of 6 seconds (1536 values) of EEG data, collected by 16 electrodes.
I need to create a pandas dataframe with all this data, and therefore turn this 3D array into 2D. The depth dimension (48) can be removed if I stack all samples one above another. So the new dataset will be shaped (1536 * 48) x 16.
In addition to that, since this is a classification problem, I have a vector with 48 values that represents the class of each EEG sample. The new dataset should also have this as a "class" column, so the real shape would be: (1536 * 48) x (16 + 1 (class)).
I could easily do that by looping through the depth dimension of the 3D array and concatenating everything into a new 2D one. But this looks bad, since I will be dealing with many datasets like this one and performance is an issue. I would like to know if there is a more clever way of doing it.
I tried to provide as much information as I could for this question, but since it is not a trivial task, feel free to ask for further details if needed.
Thanks in advance.
Setup
>>> import numpy as np
>>> import pandas as pd
>>> a = np.zeros((4,3,3),dtype=int) + [0,1,2]
>>> a *= 10
>>> a += np.array([1,2,3,4])[:,None,None]
>>> a
array([[[ 1, 11, 21],
[ 1, 11, 21],
[ 1, 11, 21]],
[[ 2, 12, 22],
[ 2, 12, 22],
[ 2, 12, 22]],
[[ 3, 13, 23],
[ 3, 13, 23],
[ 3, 13, 23]],
[[ 4, 14, 24],
[ 4, 14, 24],
[ 4, 14, 24]]])
Split evenly along the last dimension; stack those elements, reshape, feed to DataFrame. Using the lengths of the array's dimensions simplifies the process.
>>> d0,d1,d2 = a.shape
>>> pd.DataFrame(np.stack(np.dsplit(a,d2)).reshape(d0*d2,d1))
0 1 2
0 1 1 1
1 2 2 2
2 3 3 3
3 4 4 4
4 11 11 11
5 12 12 12
6 13 13 13
7 14 14 14
8 21 21 21
9 22 22 22
10 23 23 23
11 24 24 24
>>>
Using your shape.
>>> b = np.random.random((1536, 16, 48))
>>> d0,d1,d2 = b.shape
>>> df = pd.DataFrame(np.stack(np.dsplit(b,d2)).reshape(d0*d2,d1))
>>> df.shape
(73728, 16)
>>>
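After making the DataFrame from the 3d array, add the classification column to it with df['class'] = data, where data must hold one entry per row of the DataFrame (73728 here). See "Column selection, addition, deletion" in the pandas documentation.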
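A small sketch of how the 48-value class vector could be expanded to one entry per row. With the dsplit/stack/reshape approach each sample occupies d0 consecutive rows, so np.repeat(labels, d0) lines up with them. The tiny shapes and the labels array are assumptions chosen for illustration; they stand in for (1536, 16, 48) and the real class vector.

```python
import numpy as np
import pandas as pd

d0, d1, d2 = 4, 3, 2                       # stand-ins for 1536, 16, 48
data = np.arange(d0 * d1 * d2).reshape(d0, d1, d2)
labels = np.array([0, 1])                  # hypothetical: one class per sample

# Same flattening as above: rows 0..d0-1 are sample 0, the next d0 rows are sample 1, ...
flat = np.stack(np.dsplit(data, d2)).reshape(d0 * d2, d1)

df = pd.DataFrame(flat)
df['class'] = np.repeat(labels, d0)        # each label repeated d0 times
print(df)
```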
For the numpy part
x = np.random.random((1536, 16, 48)) # ndarray with similar shape
x = x.swapaxes(1,2) # swap axes 1 and 2 i.e 16 and 48
x = x.reshape((-1, 16), order='C') # order is important, you may want to check the docs
c = np.zeros((x.shape[0], 1)) # class column, shape=(73728, 1)
x = np.hstack((x, c)) # final dataset
x.shape
Output
(73728, 17)
or in one line
x = np.hstack((x.swapaxes(1,2).reshape((-1, 16), order='C'), c))
Finally,
x = pd.DataFrame(x)
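One thing worth checking is the row order, sketched below on a tiny array. With swapaxes(1, 2) followed by a C-order reshape, consecutive rows cycle through the samples for each time step. If the samples should instead be stacked as whole blocks one above another, as the question describes, transpose(2, 0, 1) before the reshape appears to give that order. The shapes here are small stand-ins and the variable names are made up for this illustration.

```python
import numpy as np

x = np.arange(4 * 3 * 2).reshape(4, 3, 2)   # (time, electrode, sample)
n_electrodes = x.shape[1]

# Rows ordered time-first: t0/s0, t0/s1, t1/s0, t1/s1, ...
time_major = x.swapaxes(1, 2).reshape(-1, n_electrodes)

# Rows ordered sample-first: all time steps of sample 0, then all of sample 1.
sample_major = x.transpose(2, 0, 1).reshape(-1, n_electrodes)

print(time_major[1], x[0, :, 1])     # second row is time 0 of sample 1
print(sample_major[4], x[0, :, 1])   # fifth row is time 0 of sample 1
```

Whichever order is used, the class column has to follow the same order: np.tile of the class vector for the time-first layout, np.repeat for the sample-first layout.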
I have a list of variables. I want to use the name of this list as a column name in a dataframe. The name stress and its elements keep changing.
stress = ['M13', 'M14', 'M15', 'M16', 'M17', 'M18']
outputlist = [ 13, 14, 15, 16, 17, 18 ] ### obtained from analysis
resultdf[stress] = outputlist ### I want to name the column same as list name.
I want something like this given below.
print(resultdf)
stress
0 13
1 14
2 15
3 16
4 17
5 18
This results in an error, because all the values of the list are used as column headers. How can I achieve this?
The column name just needs to be a string. You are trying to use the variable itself as a column name. Instead write
resultdf["stress"] = outputlist
This might be what you're looking for, although I'm not sure what the result data looks like:
>>> stress = ['M13', 'M14', 'M15', 'M16', 'M17', 'M18']
>>> data = [[1,2,3,4,5,6], [7,8,9,10,11,12], [13,14,15,16,17,18], [19,20,21,22,23,24], [25,26,27,28,29,30], [31,32,33,34,35,36]]
>>> result = {x: y for x,y in zip(stress, data)}
>>> result
{'M13': [1, 2, 3, 4, 5, 6], 'M14': [7, 8, 9, 10, 11, 12], 'M15': [13, 14, 15, 16, 17, 18], 'M16': [19, 20, 21, 22, 23, 24], 'M17': [25, 26, 27, 28, 29, 30], 'M18': [31, 32, 33, 34, 35, 36]}
Then you can convert the dictionary to a DataFrame:
>>> import pandas as pd
>>> d = pd.DataFrame(result)
>>> d
M13 M14 M15 M16 M17 M18
0 1 7 13 19 25 31
1 2 8 14 20 26 32
2 3 9 15 21 27 33
3 4 10 16 22 28 34
4 5 11 17 23 29 35
5 6 12 18 24 30 36
Edit (based on your update)
If you literally just want a single column with the variable as the name, put it in quotes:
>>> d = pd.DataFrame({'stress': outputlist})
>>> d
stress
0 13
1 14
2 15
3 16
4 17
5 18
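Since the label keeps changing, one option is to carry it around as a string next to the values, sketched below. column_name is a hypothetical variable introduced for this illustration; Python does not offer a reliable way to recover a variable's own name, so the name has to be stored explicitly.

```python
import pandas as pd

outputlist = [13, 14, 15, 16, 17, 18]   # values from the question
column_name = "stress"                   # hypothetical: the label kept as a string

resultdf = pd.DataFrame()
resultdf[column_name] = outputlist       # a string key creates exactly one column
print(resultdf)
```

Passing the list stress itself as the key makes pandas treat every element ('M13', 'M14', ...) as a separate column label, which is why the original assignment fails.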
I have a larger 2 dimensional matrix which is 36*72 and I want to select a small matrix from it by using indexes.
The matrix looks like this:
[ [312, 113, 525, 543, ...] ,
[...],
[...],
... ].
And I print the shape like this:
print(array(matrix).shape)
(36, 72)
But when I try to print out the small matrix like this
print(matrix[6:9][9])
The error is "IndexError: list index out of range"
Then I tried
print(matrix[6:9,9])
It showed "TypeError: list indices must be integers, not tuple"
Then I tried
print(matrix[6:9][8:9])
I get the empty list. But when I tried
print(matrix[9][9])
It did give out some number.
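The attempts in the question fail because matrix is a plain list of lists, which cannot be indexed with a tuple such as [6:9, 9]. With numpy arrays you can index rows and columns in one step, using slices as well as what is referred to as fancy indexing.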
Let's try that with a small example 2D-array:
import numpy as np
a=np.arange(48).reshape(6, 8)
print(a)
#[[ 0 1 2 3 4 5 6 7]
# [ 8 9 10 11 12 13 14 15]
# [16 17 18 19 20 21 22 23]
# [24 25 26 27 28 29 30 31]
# [32 33 34 35 36 37 38 39]
# [40 41 42 43 44 45 46 47]]
If you now want to index e.g. rows 2 and 3 and columns 3 to 6, you can simply write that down in slices, no matter if by constants or variables:
r1 = 2; r2 = 4
print(a[r1:r2, 3:7])
#[[19 20 21 22]
# [27 28 29 30]]
You might want to read further here: https://docs.scipy.org/doc/numpy/user/basics.indexing.html
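Applied to the 36 x 72 case from the question, a sketch could look like this. The matrix contents are made-up stand-in values, and the slice 8:10 is only an example column range. A pure-list version is included for comparison.

```python
import numpy as np

# Stand-in for the 36 x 72 list of lists from the question.
matrix = [[r * 72 + c for c in range(72)] for r in range(36)]

# Plain lists: slice the rows first, then slice each row.
sub_list = [row[8:10] for row in matrix[6:9]]

# NumPy: convert once, then slice rows and columns together.
arr = np.array(matrix)
sub_arr = arr[6:9, 8:10]

print(sub_list)
print(sub_arr)
```

Note that matrix[6:9][9] fails on a list because matrix[6:9] is itself a list of only three rows, so index 9 is out of range.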
Here's an example. I have a 3x3 matrix, named 'a' and I want to select the top left 2x2 matrix named 'c'.
>>> import numpy as np # importing numpy
>>> a=np.matrix('1 2 3;4 5 6;7 8 9') # creating an example matrix, named a
>>> a
matrix([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
>>> b=[[a.item(0,0),a.item(0,1)],[a.item(1,0),a.item(1,1)]] # creating a list, with 1,1 1,2 2,1 and 2,2 indices of a. remember, in math indexing starts from 1 but in most programming languages, it starts from 0
>>> b
[[1, 2], [4, 5]]
>>> c=np.matrix(b) # creating an numpy matrix object from b which is a part of a
>>> c
matrix([[1, 2],
[4, 5]])
noe = Mx1[2:4, 2:4]  # this is the solution; you use 2 because indexing starts at 0
# Mx1[row_start:row_stop, col_start:col_stop] | be careful
# It is confusing, but it works
noe =[[ 8750. 8750.]
[ 8750. 70000.]]
Mx1 = [[ 8750. 8750. -8750. -8750.]
[ 8750. 8750. -8750. -8750.]
[-8750. -8750. 8750. 8750.]
[-8750. -8750. 8750. 70000.]]
If I have a txt file with the contents as such:
4 2 45 21
0 92 12 2
345 9 3 4
1 2 39 93
Is there a quick and easy way to turn this into a matrix of int?
Right now, I have accessed the file this way:
file = open(testFile, 'r')
data = []
for row in file:
    data.append(row)
This stores the data as an array where each line is a string. Instead of going through and converting the data types and then turning it into a matrix, is there a way I can immediately store this data in matrix form as ints as I read it in?
Simple way of doing this is:
>>> for row in file:
...     data.append([int(x) for x in row.split()])
...
>>> data
[[4, 2, 45, 21], [0, 92, 12, 2], [345, 9, 3, 4], [1, 2, 39, 93]]
IMO, this is the most pythonic way
for a nested list:
text = """4 2 45 21
0 92 12 2
345 9 3 4
1 2 39 93"""
[[*map(int, line.split())] for line in text.split('\n')]
Out[16]: [[4, 2, 45, 21], [0, 92, 12, 2], [345, 9, 3, 4], [1, 2, 39, 93]]
If you're okay with having your data stored as a numpy.ndarray, you can use numpy's genfromtxt() with the dtype argument set to int:
from io import StringIO
import numpy as np
text = """4 2 45 21
0 92 12 2
345 9 3 4
1 2 39 93"""
a = np.genfromtxt(StringIO(text), dtype=int) #replace the arg with your filename
print(a)
#[[ 4 2 45 21]
# [ 0 92 12 2]
# [345 9 3 4]
# [ 1 2 39 93]]
An alternative is to use loadtxt() instead of genfromtxt(), as @Zhiya pointed out in the comments.
a = np.loadtxt(StringIO(text), dtype=int)
As per this post, both functions are basically the same except that genfromtxt() provides more options for dealing with missing data.
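For completeness, a sketch of reading from an actual file path rather than a StringIO object. It writes the sample to a temporary file first so the example is self-contained; the file name matrix.txt is made up for this illustration.

```python
import os
import tempfile
import numpy as np

text = "4 2 45 21\n0 92 12 2\n345 9 3 4\n1 2 39 93\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "matrix.txt")   # hypothetical file name
    with open(path, "w") as fh:
        fh.write(text)

    # NumPy: parse the whole file into an integer array in one call.
    a = np.loadtxt(path, dtype=int)

    # Pure Python: build a nested list of ints while reading, skipping blank lines.
    with open(path) as fh:
        nested = [[int(tok) for tok in line.split()] for line in fh if line.strip()]

print(a)
print(nested)
```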