(Python 3.x) Splitting arrays and saving them into new arrays - python

I'm writing a Python script intended to split a big array of numbers into equal sub-arrays. For that purpose, I use NumPy's array_split function as follows:
test=numpy.array_split(raw,nslices)
where raw is the complete array containing all the values, which are of type float64 by the way.
nslices is the number of sub-arrays I want to create from the raw array.
In the script, nslices may vary depending on the size of the raw array, so I would like to "automatically" save each created sub-array in a particular array, as resultsarray(i), in a similar way to how it can be done in MATLAB/Octave.
I tried to use a for ... in range loop in Python but I am only able to save the last sub-array in a variable.
What is the correct way to save the sub-array at each increment from 1 to nslices?
Here is the complete code as it is now (I am a Python beginner, so please pardon the low quality of the script).
import numpy as np
file = open("results.txt", "r")
raw = np.loadtxt(fname=file, delimiter="/n", dtype='float64')
nslices = 3
rawslice = np.array_split(raw,nslices)
for i in range(0, len(rawslice)):
    resultsarray = (rawslice[i])
    print(rawslice[i])
Thank you very much for your help solving this problem!

First - you screwed up the delimiter :)
It should be a backslash + n, \n, instead of /n.
Second - as Serge already mentioned in a comment, you can just access the split parts by index (rawslice[0] to rawslice[2]). But if you really want to assign each part to a separate variable you can do it in the following way:
result_1_of_3, result_2_of_3, result_3_of_3 = rawslice
print(result_1_of_3, result_2_of_3, result_3_of_3)
But probably it isn't the way you should go.
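If the number of slices varies, a plain list indexed by position is the more natural container; here is a minimal sketch along those lines (variable names follow the question, and the fixed \n delimiter from above is assumed):
import numpy as np

raw = np.loadtxt("results.txt", delimiter="\n", dtype='float64')
nslices = 3
resultsarray = np.array_split(raw, nslices)  # a list of nslices sub-arrays
for i in range(nslices):
    print(resultsarray[i])  # the i-th sub-array, analogous to resultsarray(i) in MATLAB/Octave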

Related

How to do math operation on imported csv data?

I have read in a csv file ('Air.csv') and have performed some operations to get rid of the header (not important). Then, I used dB_a.append(row[1]) to put this column of the csv data into an array which I could later plot.
This data is dB data, and I want to convert it to power using the simple equation P = 10^(dB/10) for every value. I am new to Python, so I don't quite understand how operations within arrays, lists, etc. work. I think there is something I need to do to iterate over the full data set, which was my attempt at a for loop, but I am still receiving errors. Any suggestions?
Thank you!
import csv
import itertools

frequency_a = []
dB_a = []
a = csv.reader(open('Air.csv'))
for row in itertools.islice(a, 18, 219):
    frequency_a.append(row[0])
    dB_a.append(row[1])
#print(frequency_a)
print(dB_a)
for item in dB_a:
    power_a = 10**(dB_a/10)
print(power_a)
In your for loop, item is the iterator, so you need to use that. So instead of:
power_a = 10**(dB_a/10)
use:
power_a = 10**(item/10)
A nicer way to create a new list with that data could be:
power_a = [10**(db/10) for db in dB_a]
EDIT: The other issue, as pointed out in the comment, is that the values are strings. The .csv file is essentially a text file, so a collection of strings rather than numbers. What you can do is convert them to numeric values using int(db) or float(db), depending on whether you have whole or floating-point numbers.
EDIT2: As pointed out by @J. Meijers, I was using multiplication instead of exponentiation - this has been fixed in the answer.
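Combining both fixes (exponentiation applied to each item, plus the string-to-float conversion), a minimal sketch:
power_a = [10**(float(db)/10) for db in dB_a]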
To build on the answer @Ed Jaras posted.
power_a = [10*(db/10) for db in dB_a]
is not correct, since this divides by 10, and then multiplies by the same.
It should be:
power_a = [10**(db/10) for db in dB_a]
Credits still go to @Ed Jaras though
Note:
If you're wondering what this [something for something in a list] is, it is a list comprehension. They are amazingly elegant constructs that python allows.
What it basically means is [..Add this element to the result.. for ..my element.. in ..a list..].
You can even add conditionals to them if you want.
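For example, a hypothetical comprehension with a conditional that skips empty strings:
power_a = [10**(float(db)/10) for db in dB_a if db]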
If you want to read more about them, I suggest checking out:
http://www.secnetix.de/olli/Python/list_comprehensions.hawk
Addition:
@k-schneider: You are probably doing numerical operations (dividing, power, etc.) on a string; this happens because when importing a csv, it is possible for fields to be imported as strings.
To make sure that you are working with numbers, you can cast db to a numeric type by doing:
int(db) or float(db)

Excel worksheet to Numpy array

I'm trying to do an unbelievably simple thing: load parts of an Excel worksheet into a Numpy array. I've found a kludge that works, but it is embarrassingly unpythonic:
say my worksheet was loaded as "ws", the code:
import numpy as np

A = np.zeros((37, 3))
for i in range(2, 39):
    for j in range(1, 4):
        A[i-2, j-1] = ws.cell(row=i, column=j).value
loads the contents of "ws" into array A.
There MUST be a more elegant way to do this. For instance, csvread allows one to do this much more naturally, and while I could well convert the .xlsx file into a csv one, the whole purpose of working with openpyxl was to avoid that conversion. So there we are, Collective Wisdom of the Mighty Intertubes: what's a more pythonic way to perform this conceptually trivial operation?
Thank you in advance for your answers.
PS: I operate Python 2.7.5 on a Mac via Spyder, and yes, I did read the openpyxl tutorial, which is the only reason I got this far.
You could do
A = np.array([[i.value for i in j] for j in ws['C1':'E38']])
EDIT - further explanation.
(firstly thanks for introducing me to openpyxl, I suspect I will use it quite a bit from time to time)
- the method of getting multiple cells from the worksheet object produces a generator. This is probably much more efficient if you want to work your way through a large sheet, as you can start straight away without waiting for it all to load into your list.
- to force a generator to make a list you can either use list(ws['C1':'E38']) or a list comprehension as above.
- each row is a tuple (even if only one column wide) of Cell objects. These have a lot more about them than just a number, but if you want to get the number for your array you can use the .value attribute. This is really the crux of your question: csv files don't contain the structured info of an excel spreadsheet.
- there isn't (as far as I can tell) a built-in method for extracting values from a range of cells, so you will have to do something effectively as you have sketched out.
The advantages of doing it my way are: no need to work out the dimensions of the array and make an empty one to start with, no need to work out the corrected index numbers for the np array, and list comprehensions are faster. The disadvantage is that it needs the "corners" defined in "A1" format. If the range isn't known then you would have to use iter_rows, rows or columns
A = np.array([[i.value for i in j[2:5]] for j in ws.rows])
if you don't know how many columns then you will have to loop and check values more like your original idea
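For reference, a sketch using iter_rows with explicit bounds for the same C1:E38 range (the keyword arguments assume a reasonably recent openpyxl; the filename is hypothetical):
import numpy as np
from openpyxl import load_workbook

wb = load_workbook('data.xlsx')  # hypothetical filename
ws = wb.active
# min/max row and column bounds correspond to C1:E38
A = np.array([[cell.value for cell in row]
              for row in ws.iter_rows(min_row=1, max_row=38, min_col=3, max_col=5)])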
If you don't need to load data from multiple files in an automated manner, the package tableconvert I recently wrote may help. Just copy and paste the relevant cells from the excel file into a multiline string and use the convert() function.
import numpy as np
from tableconvert.converter import convert
array = convert("""
123 456 3.14159
SOMETEXT 2,71828 0
""")
print(type(array))
print(array)
Output:
<class 'numpy.ndarray'>
[[ 123. 456. 3.14159]
[ nan 2.71828 0. ]]

Creating and Storing Multi-Dimensional Array in a netCDF File

This question has potentially two parts, but maybe only one if the first part can be encapsulated by the second. I am using python with numpy and netCDF4.
First:
I have four lists of different variable values (hereafter referred to as elevation values), each of which has a length of 28. These four lists exist for each of 5 different latitude values, which in turn exist for each of 24 different time values.
So 24 times...each time with 5 latitudes...each latitude with four lists...each list with 28 values.
I want to create an array with the following dimensions (elevation, latitude, time, variable)
In words, I want to be able to specify which of the four lists I access, which index in the list, and a specific time and latitude. So an index into this array would look like this:
array(0,1,2,3), where the 3 specifies the 4th list, the 0 specifies the first index of that list, the 1 specifies the 2nd latitude, the 2 specifies the 3rd time, and the output is the value at that point.
I won't include my code for this part since literally the only things worth mentioning are the lists
list1=[...]
list2=[...]
list3=[...]
list4=[...]
How can I do this, is there an easier structure for the array, or is there anything else I am missing?
Second:
I have created a netCDF file with variables with these four dimensions. I need to set those variables to the array structure made above. I have no idea how to do this, and the netCDF4 documentation only covers a 1-d array in a fairly cryptic way. If the arrays can be made directly in the netCDF file, bypassing the need to use numpy first, by all means show me how.
Thanks!
After talking to a few people where I work we came up with this solution:
First we made an array of zeros using the following call:
array1=np.zeros((28,5,24,4))
Then filled this array by specifying which slice of the array we wanted to change:
array1[:,0,0,0]=list1
This inserted the values of the list into the first entry in the array.
Next, to write the array to a netCDF file, I created a netCDF file in the same program I made the array in, made a single variable, and gave it values like this:
netcdfvariable[:]=array1
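Putting the pieces together, a minimal end-to-end sketch (the file, dimension, and variable names here are hypothetical):
import numpy as np
from netCDF4 import Dataset

list1 = [0.0] * 28  # stand-in for the question's list1 (length 28)
array1 = np.zeros((28, 5, 24, 4))
array1[:, 0, 0, 0] = list1  # first latitude, first time, first of the four lists

nc = Dataset('output.nc', 'w')  # hypothetical filename
for name, size in [('elevation', 28), ('latitude', 5), ('time', 24), ('variable', 4)]:
    nc.createDimension(name, size)
ncvar = nc.createVariable('data', 'f8', ('elevation', 'latitude', 'time', 'variable'))
ncvar[:] = array1  # the netcdfvariable[:] = array1 step from above
nc.close()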
Hope that helps anyone who finds this.

How do I make a dynamically expanding array in python

Ok, I have this part of the code:
def Reading_Old_File(self, Path, turn_index, SKU):
    print "Reading Old File! Turn Index = ", turn_index, "SKU= ", SKU
    lenght_of_array = 0
    array_with_data = []
    if turn_index == 1:
        reading_old_file = open(Path, 'rU')
        data = np.genfromtxt(reading_old_file, delimiter="''", dtype=None)
        for index, line_in_data in enumerate(data, start=0):
            if index < 3:
                print index, "Not Yet"
            if index >= 3:
                print ">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Reading All Old Items"
                i = index - 3
                old_items_data[i] = line_in_data.split("\t")
                old_items_data[i] = [lines_old.strip() for lines_old in old_items_data]
                print old_items_data[i]
    print len(old_items_data)
So what I am doing here is: I'm reading a file, and on my first turn I want to read it all and keep all the data, so it would be something like:
old_items_data[1]=['123','dog','123','dog','123','dog']
old_items_data[2]=['124','cat','124','cat','124','cat']
old_items_data[n]=['amount of list members is equal each time']
each line of the file should be stored in a list, so I can use it in the future for comparing; when turn_index is greater than 2, I'll compare each incoming line with the lines in every list (array) by iterating over all lists.
So the question is: how do I do it, or is there any better way to compare lists?
I'm new to python so maybe someone could help me with this issue?
Thanks
You just need to use append.
old_items_data.append(line_in_data.split("\t"))
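Applied to the loop in the question, a minimal sketch (keeping the original Python 2 style):
old_items_data = []
for index, line_in_data in enumerate(data):
    if index >= 3:
        # split the line into fields and strip whitespace from each
        fields = [field.strip() for field in line_in_data.split("\t")]
        old_items_data.append(fields)
print len(old_items_data)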
I would use the package pandas for this. It will not only be much quicker, but also simpler. Use pandas.read_table to import the data (specifying delimiter and row-skipping can be done here by passing arguments to sep and skiprows). Then, use pandas.DataFrame.apply to apply your function to the rows of your data.
The speed gains are going to come from the fact that pandas was optimized to perform actions across lists like this (in the case of a pandas DataFrame, these would be called rows). This applies to both importing the data and applying a function to every row. The simplicity gains should hopefully be clear.
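A sketch of that approach (the filename, sep, and skiprows values are assumptions based on the question):
import pandas as pd

# read the tab-separated file, skipping the first 3 lines
df = pd.read_table('old_file.txt', sep='\t', skiprows=3, header=None)
# strip whitespace from string columns, mirroring the .strip() calls above
df = df.apply(lambda col: col.str.strip() if col.dtype == object else col)
print len(df)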

Conditionally add 1 to int element of a NumPy record array

I have a large NumPy record array, 250 million rows by 9 columns (MyLargeRec), and I need to add 1 to the 7th column (dtype = "int") if the index of that row is in another list of 300,000 integers (MyList). If this was a normal python list I would use the following simple code...
for m in MyList:
    MyLargeRec[m][6] += 1
However I cannot seem to get similar functionality using the NumPy record array. I have tried a few options, such as nditer, but this will not let me select the specific indices I want.
Now you may say that this is not what NumPy was designed for, so let me explain why I am using this format. I am using it because it only takes 30 mins to build the record array from scratch, whereas it takes over 24 hours using a conventional 2D list format. I spent all of yesterday trying to find a way to do this and could not; I eventually converted it to a list using...
MyLargeList = list(MyLargeRec)
so I could use the simple code above to achieve what I want; however, this took 8.5 hours to perform.
Therefore, can anyone tell me: first, is there a method to achieve what I want within a NumPy record array? And second, if not, any ideas on the best methods within python 2.7 to create, update and store such a large 2D matrix?
Many thanks
Tom
your_array[index_list, 6] += 1
Numpy allows you to construct some pretty neat slices. This selects the 6th column of all rows in your list of indices and adds 1 to each. (Note that if an index appears multiple times in your list of indices, this will still only add 1 to the corresponding cell.)
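A toy sketch of the indexing trick (using a small plain 2D array as a stand-in for the real record array; if duplicate indices should each add 1, NumPy's unbuffered np.add.at handles that case):
import numpy as np

MyLargeRec = np.zeros((10, 9), dtype=int)  # small stand-in for the 250M-row array
MyList = [2, 5, 5]

MyLargeRec[MyList, 6] += 1             # index 5 is incremented only once
np.add.at(MyLargeRec, (MyList, 6), 1)  # index 5 is incremented twice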
This code...
for m in MyList:
    MyLargeRec[m][6] += 1
does actually work, silly question by me.
