I opened a .mat file in Python. I can see that there is one main column named 'CloudData', and within CloudData there are two columns, 'Points' and 'RGB'. I can access the Points column by using:
points_data=(data['CloudData']['Points'][0:1])
where data is the object returned by reading the file in Python with scipy.io. But when I try to read the data values inside RGB, as below:
channel_data=(data['RGB']['data_values'])
I get the error complaining:
KeyError: 'RGB'
Is there anything to take into account after opening a .mat file in Python?
In MATLAB, the variable data_values is displayed with CloudData.RGB.data_values as the title of the variable viewer window.
As a first step, I want to read the values inside the RGB header, which is inside CloudData.
From what you wrote it looks to me that you should access RGB with
channel_data = (data['CloudData']['RGB'][0:1])
or
channel_data = (data['CloudData']['RGB']['data_values'][0:1])
depending on how your .mat file is constructed in MATLAB.
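The KeyError arises because 'RGB' is not a top-level variable of the file; it sits inside 'CloudData'. A minimal sketch of the nested access follows, assuming CloudData is a MATLAB struct with fields Points and RGB, and that RGB in turn has a field data_values (as the variable viewer title suggests). The file name cloud.mat and the array contents are made up for the demonstration; squeeze_me and struct_as_record are existing loadmat options.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Build a small stand-in file; nested dicts are saved as MATLAB structs.
path = os.path.join(tempfile.mkdtemp(), "cloud.mat")  # hypothetical file name
savemat(path, {
    "CloudData": {
        "Points": np.arange(6.0).reshape(2, 3),
        "RGB": {"data_values": np.array([[255, 0, 0], [0, 255, 0]])},
    }
})

# squeeze_me removes the 1x1 wrappers around structs;
# struct_as_record=False exposes struct fields as attributes.
data = loadmat(path, squeeze_me=True, struct_as_record=False)

cloud = data["CloudData"]
points = cloud.Points
channel_data = cloud.RGB.data_values

print([k for k in data if not k.startswith("__")])  # top-level variables
print(cloud._fieldnames)                             # fields of the struct
print(channel_data.shape)
```

Printing the top-level keys and the struct's field names first is a quick way to confirm how the file is actually laid out before indexing into it.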
I am trying to create an h5 file. I was successful in storing numerical values using the following lines of code:
`hf = h5py.File('test_file.hdf5', 'w')
col1 = pd.read_csv('dataset.csv')
hf['/col1'] = col1`
Part of the Excel sheet data for the above, which contains numerical values, is attached as an image.
When I used the same code as above to store strings, it gave me the following error:
TypeError: Object dtype dtype('O') has no native HDF5 equivalent
The data that I am trying to save, which contains characters, is attached in the image. I want to store it exactly as it appears in the image.
I tried the following code to store the string data, but it appeared to store everything in a single cell. I want each string in a separate cell, as shown in the image:
`my_Dat = pd.read_csv('dta.csv')
hf['/col2'] = np.array(my_Dat)`
Help would be appreciated.
I have checked numerous solutions for this in the Stack Overflow community, but none seemed to work. I want each string in a separate cell, as shown in the image. Adding them individually to the h5 file using HDFView would be cumbersome, since I have 1600+ rows.
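One possible approach, sketched under assumptions: the TypeError comes from handing h5py an object-dtype array without telling it how to store the strings, so giving the dataset an explicit variable-length string dtype may be enough. The DataFrame below stands in for the CSV, and the column name "name" is hypothetical; h5py.string_dtype needs h5py 2.10 or later, and asstr() needs h5py 3.0 or later.

```python
import os
import tempfile

import h5py
import pandas as pd

# Stand-in for pd.read_csv('dta.csv'); the column name is hypothetical.
my_dat = pd.DataFrame({"name": ["alpha", "beta", "gamma"]})

path = os.path.join(tempfile.mkdtemp(), "test_file.hdf5")
str_dtype = h5py.string_dtype(encoding="utf-8")  # variable-length UTF-8 strings

with h5py.File(path, "w") as hf:
    # One dataset element per row, so each string gets its own cell.
    values = my_dat["name"].astype(str).to_numpy(dtype=object)
    hf.create_dataset("col2", data=values, dtype=str_dtype)

with h5py.File(path, "r") as hf:
    stored = hf["col2"].asstr()[:]  # decode the stored bytes back to str

print(stored.tolist())
```

Writing one column per dataset keeps the layout simple; a table with several string columns would need either one dataset per column or a compound dtype.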
After scraping, I put the information in a dataframe and want to export it to a .csv, but one of the three columns ("Content") comes out empty in the .csv file. This is odd, since all three columns are visible in the dataframe; see the screenshot.
Screenshot dataframe
Line I use to convert:
df.to_csv('filedestination.csv')
Inspecting the df returns objects:
Inspecting dataframe
Does anyone know why the last column, "Content", does not show any data in the .csv file?
Screenshot .csv file
After some suggestions, it seems that the data is there when the file is opened as .txt. How is it possible that Excel does not show the data properly?
Screenshot .txt file data
What is the data type of the Content column?
It is not a string; you can convert it to a string and then call df.to_csv.
Sometimes this happens for no obvious reason, and what you view differs from what gets exported. Try resetting the index before exporting to .csv/Excel. This always works for me.
df = df.reset_index()
then,
df.to_csv(r'file location/filename.csv')
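The string conversion suggested above could look like the following sketch. The sample frame is made up (the real "Content" column may hold something other than lists), and the check at the end simply reads the file back with pandas rather than Excel, which helps separate an export problem from a display problem.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the scraped data; "Content" holds non-string objects (lists).
df = pd.DataFrame({
    "Title": ["first", "second"],
    "Content": [["some", "text"], ["more", "text"]],
})
print(df.dtypes)  # inspect the column types before exporting

df["Content"] = df["Content"].astype(str)  # force plain strings

path = os.path.join(tempfile.mkdtemp(), "filedestination.csv")
df.to_csv(path, index=False)

# Read the file back to confirm the column really was written.
check = pd.read_csv(path)
print(check["Content"].tolist())
```

If the column is present when read back this way but empty in Excel, the file itself is fine and the issue likely lies in how Excel parses it (for example, delimiter or line breaks inside cells).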
I am trying to output a few separate variables (separate data sets created earlier in my code) as a single CSV file. I have had the most luck with np.asarray and np.savetxt. I am now trying to format the CSV output so that each variable occupies its own column (a header with that variable's data below it). The data and the column headers are written to the CSV file successfully, but I cannot get each variable to go from a single row into its own column.
I have tried changing the order from 'C' to 'F' in np.asarray. I have also tried a few things with Python's csv library, but np.savetxt and np.asarray have been the best routes so far.
My code is as follows. Each variable in csvData is listed as 'float64' in my variable explorer, if that helps.
csvData = np.asarray([[timestep], [IQR_bot], [IQR_top], [median],
                      [prcnt95], [prcnt5], [KFTG_tot]], dtype=object, order='C')
np.savetxt("pathout.csv", csvData, fmt='%s',
           header='Timestep, IQR_bot,IQR_top,median,prcnt95,prcnt5, KFTG_top')
I want each input variable from csvData to be its own separate column of data tied with the respective header listed in np.savetxt. This code is not currently throwing any error messages, but the output format is not what I want it to be.
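Wrapping each variable in its own list, as in the code above, makes every variable one row of the array. A minimal sketch of a column layout follows, assuming all seven variables are 1-D float arrays of the same length; the sample values are made up, and np.column_stack is swapped in for np.asarray.

```python
import os
import tempfile

import numpy as np

# Made-up stand-ins for the seven float64 series from the question.
n = 4
timestep = np.arange(n, dtype=float)
IQR_bot, IQR_top, median = timestep + 1, timestep + 2, timestep + 3
prcnt95, prcnt5, KFTG_tot = timestep + 4, timestep + 5, timestep + 6

# column_stack turns each 1-D array into one column: shape (n, 7).
csv_data = np.column_stack(
    [timestep, IQR_bot, IQR_top, median, prcnt95, prcnt5, KFTG_tot]
)

path = os.path.join(tempfile.mkdtemp(), "pathout.csv")
np.savetxt(
    path,
    csv_data,
    delimiter=",",
    fmt="%s",
    header="Timestep,IQR_bot,IQR_top,median,prcnt95,prcnt5,KFTG_tot",
    comments="",  # otherwise the header line starts with "# "
)
```

The delimiter argument matters here: without it, savetxt separates values with spaces, so a spreadsheet would still show everything in one column.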
I have a program that will add a variable number of rows of data to an hdf5 file as shown below.
data_without_cosmic.to_hdf(new_file,key='s', append=True, mode='r+', format='table')
new_file is the file name and data_without_cosmic is a pandas data frame with 'x', 'y', 'z', and 'i' columns representing positional data and a scalar quantity. I may add several data frames of this form to the file each time I run the full program. Within each data frame I add, the 'z' value is constant.
The next time I use the program, I would need to access the last batch of rows that was added to the data in order to perform some operations. I wondered if there was a fast way to retrieve just the last data frame that was added to the file or if I could group the data in some way as I add it in order to be able to do so.
The only other way I can think of achieving my goal is by reading the entire file and then checking the z values from bottom up until it changes, but that seemed a little excessive. Any ideas?
P.S. I am very inexperienced with HDF5 files, but I read that they are efficient to work with.
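One way this might be done without reading the whole file, sketched under assumptions: record the row count just before each append and store it as an attribute on the table, then pass it as start= when selecting. The helper names append_batch and read_last_batch and the attribute name last_batch_start are hypothetical, the sample frames are made up, and pd.HDFStore is used here in place of to_hdf (it requires PyTables, as to_hdf does).

```python
import os
import tempfile

import pandas as pd


def append_batch(path, frame, key="s"):
    """Append frame and remember the row at which this batch starts."""
    with pd.HDFStore(path, mode="a") as store:
        start = store.get_storer(key).nrows if key in store else 0
        store.append(key, frame, format="table")
        # Custom attribute stored next to the table (hypothetical name).
        store.get_storer(key).attrs.last_batch_start = int(start)


def read_last_batch(path, key="s"):
    """Read only the rows written by the most recent append_batch call."""
    with pd.HDFStore(path, mode="r") as store:
        start = int(store.get_storer(key).attrs.last_batch_start)
        return store.select(key, start=start)


new_file = os.path.join(tempfile.mkdtemp(), "data.h5")
first = pd.DataFrame(
    {"x": [0.0, 1.0], "y": [0.0, 1.0], "z": [5.0, 5.0], "i": [1.0, 2.0]}
)
second = pd.DataFrame({"x": [2.0], "y": [3.0], "z": [7.0], "i": [9.0]})
append_batch(new_file, first)
append_batch(new_file, second)

last = read_last_batch(new_file)
print(last)
```

An alternative with a similar effect would be to add a batch-number column, declare it in data_columns when appending, and filter on it with the where argument of select.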
I want to extract certain information from many PNG/JPEG files through pytesseract and write it to an Excel file if possible.
I've figured out how to extract the text from the pictures but what I haven't figured out is:
1) How do I extract specific information instead of a whole blob of words? For example, I want the account numbers and reference numbers from each photo, nothing else.
2) How do I write these account numbers and reference numbers into an external file such as excel?
I'll attach what I've got so far below:
I've heard that using pandas dataframes was a good way to append data into columns for Excel but I'm not sure if I can do that for a task like this.
from PIL import Image
import pytesseract
import pandas as pd
pytesseract.pytesseract.tesseract_cmd = "C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe"
im = Image.open("C:/Users/user1/desktop/scripts/ocr/example bills/pic.jpg")
content = pd.DataFrame()
text = pytesseract.image_to_string(im, lang= 'eng')
temp = pd.DataFrame({'Words':[text]})
content.append(temp)
content.head()
print(text)
writer = pd.ExcelWriter('wordstest.xlsx')
content.to_excel(writer,'Sheet1')
writer.save()
Expected Results:
An excel file with two columns, account number and reference number.
Actual Results:
An excel file with no data.
To convert a dataframe to a spreadsheet, try this:
content.to_csv('wordstest.csv',sep=',')
This can be opened in Excel. If you need more columns, just add them to the dataframe and then write the CSV file.
You have to either filter the text that you have read from the image, or find the portions of the image that you want to read before actually reading them with Tesseract. For filtering the text you can use regexes; for finding the portions of the image you'll have to use computer vision algorithms that predict regions of the image (object detection) and train them on your data.
And for adding the dataframe to Excel, just use the pandas to_csv or to_excel methods.
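The regex filtering route could be sketched as below. The label patterns ("Account No", "Reference Number") and the sample OCR strings are assumptions, since the real bills are not shown; in practice each string would come from pytesseract.image_to_string. Note also that in the question's code the result of content.append(temp) is discarded, which would explain the empty file (DataFrame.append returned a new frame and was removed in pandas 2.0), so the sketch builds the dataframe from a list of rows instead.

```python
import os
import re
import tempfile

import pandas as pd

# Hypothetical label patterns; adjust them to the wording on the real bills.
ACCOUNT_RE = re.compile(
    r"Account\s*(?:No\.?|Number)\s*[:#]?\s*(\d+)", re.IGNORECASE
)
REFERENCE_RE = re.compile(
    r"Reference\s*(?:No\.?|Number)\s*[:#]?\s*(\w+)", re.IGNORECASE
)


def extract_fields(text):
    """Return the first account and reference number found in OCR text."""
    account = ACCOUNT_RE.search(text)
    reference = REFERENCE_RE.search(text)
    return {
        "Account Number": account.group(1) if account else None,
        "Reference Number": reference.group(1) if reference else None,
    }


# Stand-ins for pytesseract.image_to_string(...) output, one per image.
ocr_texts = [
    "ACME Utilities\nAccount No: 123456\nReference Number: AB789\nTotal 42.00",
    "Some Bank\nAccount Number 987654\nNo reference printed here",
]

# Collect one dict per image, then build the dataframe once.
content = pd.DataFrame([extract_fields(t) for t in ocr_texts])

out_path = os.path.join(tempfile.mkdtemp(), "wordstest.csv")
content.to_csv(out_path, index=False)
print(content)
```

Replacing the last write with content.to_excel('wordstest.xlsx', index=False) should produce a native Excel file instead, provided an Excel engine such as openpyxl is installed.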