I am trying to output a few separate variables (separate data sets created earlier in my code) as a single csv file. I have had the most luck using np.asarray and np.savetxt. I am now trying to format the csv output so that each variable occupies its own column (header, then its data below). I have successfully transferred the data to the csv file and added column title headers, but I cannot get the variables to format from one row into separate rows.
I have tried changing the order from 'C' to 'F' in np.asarray. I have also tried a few things with Python's csv writing library, but np.savetxt and np.asarray seemed like the best routes (so far).
My code is as follows. Each variable in csvData is listed as 'float64' in my variable explorer, if that helps at all.
csvData = np.asarray([[timestep], [IQR_bot], [IQR_top], [median],
                      [prcnt95], [prcnt5], [KFTG_tot]], dtype=object, order='C')
np.savetxt("pathout.csv", csvData, fmt='%s',
           header='Timestep,IQR_bot,IQR_top,median,prcnt95,prcnt5,KFTG_top')
I want each input variable from csvData to be its own separate column of data tied with the respective header listed in np.savetxt. This code is not currently throwing any error messages, but the output format is not what I want it to be.
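One way to get each variable into its own column is to stack the 1-D arrays side by side with np.column_stack before calling np.savetxt. A minimal sketch, using made-up stand-in values for three of the variables from the question:

```python
import numpy as np

# Hypothetical stand-in data for three of the variables from the question.
timestep = np.array([1.0, 2.0, 3.0])
IQR_bot = np.array([0.1, 0.2, 0.3])
IQR_top = np.array([0.9, 0.8, 0.7])

# column_stack makes each 1-D array its own column of a 2-D array.
csvData = np.column_stack([timestep, IQR_bot, IQR_top])

# comments='' prevents savetxt from prefixing the header line with '# '.
np.savetxt("pathout.csv", csvData, fmt='%s', delimiter=',',
           header='Timestep,IQR_bot,IQR_top', comments='')
```

Note that wrapping each variable in its own brackets, as in the question's `[[timestep], [IQR_bot], ...]`, produces one row per variable rather than one column per variable, which is why the output comes out sideways.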
First post here. I am very new to programming, so sorry if this is confusing.
I built a database by collecting different data online. All these data are in one xlsx file (one column per variable), which I converted to csv afterwards because my teacher only showed us how to use csv files in Python.
I installed pandas and had it read my csv file, but it seems it doesn't understand that I have multiple columns; it reads everything as one column. Thus, I can't get the info on each variable (and so I can't transform the data).
I tried df.info() and df.info(verbose=True, show_counts=True), but they give the same result:
len(df.columns) = 1, which proves it doesn't see that each variable has its own column
len(df) = 1923, which is right
I was expecting that : https://imgur.com/a/UROKtxN (different project, not the same database)
database used: https://imgur.com/a/Wl1tsYb
And I have that instead : https://imgur.com/a/iV38YNe
database used: https://imgur.com/a/VefFrL4
I don't know; the two files look pretty similar, so why doesn't it work? :((
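A common cause of everything landing in one column is the delimiter: Excel often exports csv with ';' instead of ',' depending on locale. A small self-contained sketch (with made-up sample data, not the actual database) showing the symptom and the fix via read_csv's sep parameter:

```python
import pandas as pd
from io import StringIO

# Made-up sample data using ';' as delimiter, mimicking an Excel export
# in a European locale.
raw = "name;age;city\nAda;36;London\nAlan;41;Cambridge\n"

# With the default sep=',' everything collapses into a single column:
one_col = pd.read_csv(StringIO(raw))
print(len(one_col.columns))  # 1

# Passing the right separator gives one column per variable:
df = pd.read_csv(StringIO(raw), sep=";")
print(len(df.columns))  # 3

# If you are unsure of the delimiter, pandas can sniff it
# (this needs the slower python engine):
sniffed = pd.read_csv(StringIO(raw), sep=None, engine="python")
```

Opening the csv in a plain text editor will show which character actually separates the fields.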
Thanks.
I have a pandas dataframe abc which I created as follows:
abc = pd.DataFrame({"A":[1,2,3],"B":[2,3,4]})
I added some additional attributes of the dataframe as follows:
abc.attrs = {"Name":"John", "Country":"Nepal"}
I'd like to save the pandas dataframe into an Excel file in xlsx or CSV format. I can do that using abc.to_excel("filename.xlsx") or abc.to_csv("filename.csv") where filename is the required name of the file.
However, I am not able to print the attributes in the saved file. I'd like to save the dataframe in Excel file such that first row gives Name and second row gives Country in two columns as shown below:
How can I do that?
Unfortunately, .to_excel() and .to_csv() do not provide any explicit functionality to insert meta information ahead of the actual dataframe as documented for the Excel and CSV write functions.
Regardless, one could exploit the header argument to hardcode this preamble into the frame. This can be achieved, for example, with
abc.to_csv("filename.csv", header=[str(k) + ',' + str(v) + '\n' for k,v in abc.attrs.items()])
Please note, however, that data tables store homogeneous data across rows and columns. Adding meta information on top makes the data harder to read and process. Consider adding it (a) in the file name, (b) in a distinct table, or (c) dropping it altogether.
Additionally, it shall be noted that as of now (pandas 1.4.3), the attrs feature is experimental and could change or disappear in any future version, which makes any implementation that relies on it brittle.
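Another option, if the metadata rows must live in the same file, is to write them yourself and then append the dataframe to the same open handle; to_csv accepts a file object. A sketch using the frame from the question:

```python
import pandas as pd

abc = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})
abc.attrs = {"Name": "John", "Country": "Nepal"}

# Write the attributes as their own rows first, then append the
# dataframe below them; this keeps the real column header intact.
with open("filename.csv", "w", newline="") as f:
    for k, v in abc.attrs.items():
        f.write(f"{k},{v}\n")
    abc.to_csv(f, index=False)
```

Reading the file back later then requires skiprows=2 (or however many attribute rows were written), since the preamble is not valid csv data.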
I have a program that will add a variable number of rows of data to an hdf5 file as shown below.
data_without_cosmic.to_hdf(new_file,key='s', append=True, mode='r+', format='table')
Here new_file is the file name and data_without_cosmic is a pandas data frame with 'x', 'y', 'z', and 'i' columns representing positional data and a scalar quantity. I may add several data frames of this form to the file each time I run the full program. For each data frame I add, the 'z' values are constant.
The next time I use the program, I would need to access the last batch of rows that was added to the data in order to perform some operations. I wondered if there was a fast way to retrieve just the last data frame that was added to the file or if I could group the data in some way as I add it in order to be able to do so.
The only other way I can think of achieving my goal is by reading the entire file and then checking the z values from bottom up until it changes, but that seemed a little excessive. Any ideas?
P.S. I am very inexperienced with hdf5 files, but I read that they are efficient to work with.
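One way to avoid scanning from the bottom is to give each batch its own key instead of appending everything under a single key; the latest batch is then the key with the largest index and can be read back directly. A sketch under those assumptions (the file name, the "batch_N" key scheme, and the sample values are all hypothetical):

```python
import numpy as np
import pandas as pd

# A stand-in batch with the columns described in the question;
# 'z' is constant within the batch.
batch = pd.DataFrame({"x": np.random.rand(5), "y": np.random.rand(5),
                      "z": np.full(5, 2.0), "i": np.random.rand(5)})

with pd.HDFStore("new_file.h5") as store:
    # Count existing batches to pick the next key name.
    n = sum(k.startswith("/batch_") for k in store.keys())
    store.put(f"batch_{n}", batch, format="table")
    last = store[f"batch_{n}"]  # the batch just written
```

Alternatively, since format='table' supports queries, pd.read_hdf(new_file, 's', where='z == some_value') can pull only the rows with a given constant z without loading the whole file, if you record the last z value somewhere.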
I opened a .mat file in python. I can see that there is one main column named 'CloudData' within the CloudData there are two columns 'Points' and 'RGB'. I can access the Points columns by using:
points_data=(data['CloudData']['Points'][0:1])
where data is the name of the file read in Python using scipy.io. But when I try to read the data values inside RGB as below:
channel_data=(data['RGB']['data_values'])
I get the error complaining:
KeyError: 'RGB'
Is there anything special to take into account after opening a .mat file in Python?
In MATLAB, the variable data_values is displayed with CloudData.RGB.data_values as its title in the variable viewer window.
As a first step, I want to read the values inside the RGB field, which is inside CloudData.
From what you wrote it looks to me that you should access RGB with
channel_data = (data['CloudData']['RGB'][0:1])
or
channel_data = (data['CloudData']['RGB']['data_values'][0:1])
depending on how your .mat file is constructed in MATLAB.
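The KeyError happens because loadmat returns a dict keyed only by the top-level MATLAB variable names, so nested structs have to be indexed step by step starting from 'CloudData'. A self-contained sketch that builds a tiny .mat file with the same nesting as in the question and then reads it back (the file name and values are made up):

```python
import numpy as np
import scipy.io as sio

# Build a small .mat file with the nesting described in the question:
# CloudData.Points and CloudData.RGB.data_values.
sio.savemat("cloud.mat", {"CloudData": {
    "Points": np.arange(6).reshape(2, 3),
    "RGB": {"data_values": np.array([[255, 0, 0], [0, 255, 0]])},
}})

# struct_as_record=False turns MATLAB structs into objects whose fields
# are attributes, and squeeze_me=True drops the extra singleton
# dimensions, which makes nested access much more readable.
data = sio.loadmat("cloud.mat", squeeze_me=True, struct_as_record=False)
cloud = data["CloudData"]
points = cloud.Points
channel_data = cloud.RGB.data_values
```

Printing data.keys() right after loadmat is a quick way to see which names exist at the top level before digging deeper.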
I have a pandas data frame with two columns:
years of experience and salary
I want to save a csv file with these two columns and also have some stats at the head of the file as in the image:
Is there any option to handle this with pandas or any other library, or do I have to write a script that builds the file line by line, adding the commas between fields?
Pandas does not support what you want to do here. The problem is that your format is not valid CSV. The RFC for CSV states that "each record is located on a separate line", implying that a line corresponds to a record, with an optional header line. Your format adds the average and max values, which do not correspond to records.
As I see it, you have three paths to go from here:
i. You create two separate data frames and map them to csv files (to be super precise, that would be three), one with your records, one with the additional values.
ii. You write your data frame to csv first, then open that file and insert your additional values at the top.
iii. If your goal is an import into Excel, however, #gefero's suggestion is the right hint: try using the xlsxwriter package to write directly to cells in a spreadsheet.
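Option ii can be sketched in a few lines, since to_csv accepts an already-open file handle; the column names and values below are placeholders for the ones in your frame:

```python
import pandas as pd

# Placeholder data for the two columns described in the question.
df = pd.DataFrame({"experience": [1, 3, 5], "salary": [40, 60, 80]})

# Write the stats rows first, then append the records below them.
with open("salaries.csv", "w", newline="") as f:
    f.write(f"average,{df['salary'].mean()}\n")
    f.write(f"max,{df['salary'].max()}\n")
    df.to_csv(f, index=False)
```

Keep in mind the resulting file is no longer valid csv, so anything reading it back will need to skip the stats rows (e.g. pandas.read_csv with skiprows=2).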
You can read the file as two separate parts (stats and csv)
Reading stats:
number_of_stats_rows = 3
stats = pandas.read_csv(file_path, nrows=number_of_stats_rows, header=None).fillna('')
Reading remaining file:
other_data = pandas.read_csv(file_path, skiprows=number_of_stats_rows).fillna('')
Take a look at xlsxwriter. Perhaps it's what you are looking for.