Save float and int to file simultaneously in Python - python

I have a NumPy array with a few columns of floats, and I want to add one more column containing only zeros and save the result to a file. The program that reads this file needs the last column to appear as 0 instead of 0.00000e+00. I tried this:
z = np.zeros(len(data[:,0])).astype(int)
new_data = np.column_stack((data,z))
np.savetxt("data_new.dat",new_data)
but it doesn't work: the zeros are still written as floats. One more thing: how can I also specify the number of decimal places the floats are saved with? Thank you!
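A minimal sketch of one common way to get this output, assuming the fmt argument of np.savetxt is given one format string per column. The sample values in data, the three-decimal precision, and the temporary output path are assumptions made for illustration; only the calls themselves come from the question.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample data standing in for the asker's `data` array.
data = np.array([[1.5, 2.25],
                 [3.0, 4.125]])

# A plain NumPy array holds a single dtype, so column_stack upcasts the
# integer zeros to float64. That is why they were written as floats.
z = np.zeros(len(data), dtype=int)
new_data = np.column_stack((data, z))

# One format per column: three decimals for the floats, "%d" for the zeros.
fmt = ["%.3f"] * data.shape[1] + ["%d"]

# Write to a temporary directory (an assumption, to keep the sketch tidy).
out_path = os.path.join(tempfile.mkdtemp(), "data_new.dat")
np.savetxt(out_path, new_data, fmt=fmt)

with open(out_path) as fh:
    saved_lines = fh.read().splitlines()
print(saved_lines)
```

Passing a single string such as fmt="%.3f" would instead apply the same precision to every column, including the zeros.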

Related

(Python3.x)Splitting arrays and saving them into new arrays

I'm writing a Python script intended to split a big array of numbers into equal sub-arrays. For that purpose, I use NumPy's array_split function as follows:
test=numpy.array_split(raw,nslices)
where raw is the complete array containing all the values, which are float64-type by the way.
nslices is the number of sub-arrays I want to create from the raw array.
In the script, nslices may vary depending on the size of the raw array, so I would like to "automatically" save each created sub-array under an indexed name such as resultsarray(i), similar to what can be done in MATLAB/Octave.
I tried to use a for ... in range loop in Python, but I am only able to save the last sub-array in a variable.
What is the correct way to save the sub-array at each iteration from 1 to nslices?
Here is the complete code as it is now (I am a Python beginner, so please excuse the low level of the script).
import numpy as np
file = open("results.txt", "r")
raw = np.loadtxt(fname=file, delimiter="/n", dtype='float64')
nslices = 3
rawslice = np.array_split(raw,nslices)
for i in range(0,len(rawslice)):
    resultsarray=(rawslice[i])
    print(rawslice[i])
Thank you very much for your help solving this problem!
First, you screwed up the delimiter :)
It should be a backslash followed by n (\n), not /n.
Second, as Serge already mentioned in a comment, you can simply access the split parts by index (rawslice[0] to rawslice[2]). But if you really want to assign each part to a separate variable, you can do it in the following way:
result_1_of_3, result_2_of_3, result_3_of_3 = rawslice
print(result_1_of_3, result_2_of_3, result_3_of_3)
But that probably isn't the way you should go.
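As a rough illustration of the index-based approach mentioned above: np.array_split already returns a list, so rawslice[i] plays the role of MATLAB's resultsarray(i). The input values and the dictionary of labels below are assumptions added for the example, not part of the original script.

```python
import numpy as np

# Hypothetical stand-in for the values loaded from results.txt.
raw = np.arange(10, dtype="float64")
nslices = 3

# array_split returns a list of sub-arrays; sizes may differ by one
# when len(raw) is not divisible by nslices.
rawslice = np.array_split(raw, nslices)

# Each sub-array stays available by index after the loop finishes.
for i, part in enumerate(rawslice):
    print(i, part)

# Optional: a dict gives each part a readable label without creating
# a separate variable per slice (the key names are illustrative).
named = {f"result_{i + 1}_of_{nslices}": part
         for i, part in enumerate(rawslice)}
print(named["result_1_of_3"])
```

This keeps working when nslices changes, which unpacking into a fixed number of variables does not.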

Fastest way to format a column with openpyxl (Python)

I'm using Openpyxl and applying number formatting for a dynamically determined number of columns and rows (based on available data), e.g.
ws.cell(row=i, column=idx + 1).number_format = '_(* #,##0.00_);_(* (#,##0.00);_(* "-"??_);_(#_)'
It takes a long time to format some of the bigger workbooks.
All I'm trying to accomplish is creating workbooks that treat integers and floats as numbers (either no decimal places or two decimal places), rather than strings, and I want that for all idx columns. I've read that it's possible, presumably related to this: https://openpyxl.readthedocs.io/en/stable/_modules/openpyxl/styles/numbers.html but I'm not sure how to implement this.
If all you are trying to accomplish is to make Excel treat the numbers as numbers rather than strings, you can try converting them to float in Python. This method is almost 50% faster than assigning a format to each cell:
ws.cell(row=i, column=idx + 1).value = float(ws.cell(row=i, column=idx + 1).value)
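A small sketch of the same idea applied to a whole sheet, assuming every cell holds a numeric string. The workbook contents here are made up for illustration, and iter_rows is used only as a convenient way to visit each cell once.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# Hypothetical data that arrived as strings.
ws.append(["12", "3.5"])
ws.append(["7", "0.25"])

# Convert every cell to float so Excel stores it as a number.
# This assumes each value parses as a number; float() raises
# ValueError otherwise.
for row in ws.iter_rows(min_row=1, max_row=ws.max_row):
    for cell in row:
        cell.value = float(cell.value)

print(ws["A1"].value, ws["B2"].value)
```

If two decimal places are still wanted for display, a number format would have to be set in addition, which brings back the per-cell cost described in the question.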

Creating and Storing Multi-Dimensional Array in a netCDF File

This question potentially has two parts, but maybe only one if the first part can be encapsulated by the second. I am using Python with NumPy and netCDF4.
First:
I have four lists of different variable values (hereafter referred to as elevation values), each of which has a length of 28. There is one such set of four lists for each of 5 different latitude values, and one such set of latitudes for each of 24 different time values.
So 24 times...each time with 5 latitudes...each latitude with four lists...each list with 28 values.
I want to create an array with the following dimensions: (elevation, latitude, time, variable).
In words, I want to be able to specify which of the four lists I access, which index in the list, and a specific time and latitude. So an index into this array would look like this:
array(0,1,2,3), where 0 specifies the first index of the 4th list (selected by the 3), 1 specifies the 2nd latitude, and 2 specifies the 3rd time; the output is the value at that point.
I won't include my code for this part, since literally the only things worth mentioning are the lists
list1=[...]
list2=[...]
list3=[...]
list4=[...]
How can I do this? Is there an easier structure for the array, or is there anything else I am missing?
Second:
I have created a netCDF file with variables that have these four dimensions. I need to set those variables to the array structure made above. I have no idea how to do this, and the netCDF4 documentation only shows a 1-D array, in a fairly cryptic way. If the arrays can be written directly into the netCDF file, bypassing the need to use NumPy first, by all means show me how.
Thanks!
After talking to a few people where I work, we came up with this solution:
First we made an array of zeros using the following call:
array1=np.zeros((28,5,24,4))
Then we filled this array by specifying which slice we wanted to change:
array1[:,0,0,0]=list1
This inserted the values of the list into the first entry in the array.
Next, to write the array to a netCDF file, I created a netCDF file in the same program in which I made the array, created a single variable, and gave it values like this:
netcdfvariable[:]=array1
Hope that helps anyone who finds this.
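The filling step above can be sketched for all times, latitudes, and lists at once. The helper fake_lists is hypothetical and only stands in for wherever the four 28-value lists come from; its values are chosen so each entry encodes its own indices. Only the NumPy part is shown, and writing the result to a netCDF variable is left as described in the answer.

```python
import numpy as np

n_elev, n_lat, n_time, n_var = 28, 5, 24, 4
array1 = np.zeros((n_elev, n_lat, n_time, n_var))


def fake_lists(lat, time):
    """Hypothetical data source: four lists of 28 values for one
    (latitude, time) pair. Each value encodes its own indices."""
    return [[k + 100 * lat + 1000 * time + 100000 * var
             for k in range(n_elev)]
            for var in range(n_var)]


# Fill one elevation column per (latitude, time, variable) combination.
for t in range(n_time):
    for lat in range(n_lat):
        for var, values in enumerate(fake_lists(lat, t)):
            array1[:, lat, t, var] = values

# First elevation index, 2nd latitude, 3rd time, 4th list.
print(array1[0, 1, 2, 3])
```

Note that NumPy indexing uses square brackets, so the array(0,1,2,3) from the question becomes array1[0, 1, 2, 3].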

Converting long integers to strings in pandas (to avoid scientific notation)

I want the following records (currently displaying as 3.200000e+18 but actually (hopefully) each a different long integer), created using pd.read_excel(), to be interpreted differently:
ipdb> self.after['class_parent_ref']
class_id
3200000000000515954 3.200000e+18
3200000000000515951 NaN
3200000000000515952 NaN
3200000000000515953 NaN
3200000000000515955 3.200000e+18
3200000000000515956 3.200000e+18
Name: class_parent_ref, dtype: float64
Currently, they seem to 'come out' as scientifically notated strings:
ipdb> self.after['class_parent_ref'].iloc[0]
3.2000000000005161e+18
Worse, though, it's not clear to me that the number has been read correctly from my .xlsx file:
ipdb> self.after['class_parent_ref'].iloc[0] -3.2e+18
516096.0
The number in Excel (the data source) is 3200000000000515952.
This is not about the display, which I know I can change here. It's about keeping the underlying data in the same form it was in when read (so that if/when I write it back to Excel, it'll look the same and so that if I use the data, it'll look like it did in Excel and not Xe+Y). I would definitely accept a string if I could count on it being a string representation of the correct number.
You may notice that the number I want to see is in fact (incidentally) one of the labels. Pandas correctly read those in as strings (perhaps because Excel treated them as strings?) unlike this number which I entered. (Actually though, even when I enter ="3200000000000515952" into the cell in question before redoing the read, I get the same result described above.)
How can I get 3200000000000515952 out of the dataframe? I'm wondering if pandas has a limitation with long integers, but the only thing I've found on it is 1) a little dated, and 2) doesn't look like the same thing I'm facing.
Thank you!
Convert the NaN values in your column to 0, then typecast that column to integer:
df[['class_parent_ref']] = df[['class_parent_ref']].fillna(value = 0)
df['class_parent_ref'] = df['class_parent_ref'].astype(int)
Or, when reading your file, specify keep_default_na=False for pd.read_excel() and na_filter=False for pd.read_csv().
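A short check of the arithmetic behind the 516096.0 seen in the question, assuming the column is float64 as the printed dtype indicates. Near 3.2e18, consecutive float64 values are 512 apart, so the Excel value cannot be stored exactly and converting the float back to an integer does not recover it. Only the standard library is used, and the variable names are illustrative.

```python
excel_value = 3200000000000515952

# Storing the value as a 64-bit float rounds it to the nearest
# representable number, which is a multiple of 512 at this magnitude.
as_float = float(excel_value)
round_trip = int(as_float)

print(round_trip)                 # 3200000000000516096
print(as_float - 3.2e18)          # 516096.0
print(round_trip == excel_value)  # False

# A Python int or a string keeps every digit.
as_text = str(excel_value)
print(as_text)                    # 3200000000000515952
```

So any cast applied after the value has become a float64 works on the rounded number; the exact digits only survive if the value is kept as an integer or a string from the start.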

numpy array values to be converted from string to float?

I have a dataset like the one shown below
http://i.stack.imgur.com/1uxCK.png
I am able to read them into a NumPy array, but the values have a string data type after being read from the CSV file. I am unable to convert them to float, and without that I cannot proceed further. Note that there are blank spaces between the two data columns shown in the first screenshot.
When printed, the NumPy array structure looks like the screenshot below:
http://i.stack.imgur.com/JFfzw.png
Note: observe the single quotation marks at the start and end of each data line in the screenshot, which show that NumPy has stored the data as strings rather than floats.
Any help converting the data from string to float would be appreciated. I have tried many things, but all in vain!
numpy.loadtxt(filename) should work out of the box: it yields numbers.
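A minimal sketch of both routes, assuming the two columns are separated by whitespace as the question describes. The sample text is made up, and io.StringIO stands in for the real file; a comma-separated file would need delimiter="," passed to loadtxt.

```python
import io

import numpy as np

# Hypothetical file contents: two columns separated by blank spaces.
text = "1.0   2.5\n3.25   4.0\n"

# loadtxt splits on whitespace by default and returns float64 values.
from_file = np.loadtxt(io.StringIO(text))
print(from_file.dtype, from_file.shape)

# If the data is already in memory as strings, astype converts it.
strings = np.array([["1.0", "2.5"], ["3.25", "4.0"]])
converted = strings.astype(float)
print(converted)
```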
