Reading .ASC format file in Python

I am working with an .asc file that contains weight and height data, and I want to compute the BMI of each person from it. I cannot make sense of the DataFrame I get after reading the file:
import pandas as pd
df = pd.read_table("data.asc")
Please help me out.

I recently had to work with a file with the extension "asc". My solution was the following:
I opened the file in a text editor, checked which separator it used, and then converted it into a spreadsheet format. In my case, I saved the document as a "csv" file.
After that I ran:
import pandas as pd
df = pd.read_csv('path to your file')
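The two steps above (find the separator, then load the table) can also be sketched in code, without converting the file by hand. This is only a sketch under assumptions: it presumes the .asc file is plain delimited text with a header row, and the column names `weight_kg` and `height_cm`, the units, and the sample rows are all hypothetical stand-ins, since the real file is not shown.

```python
import csv
import io


def read_delimited(text):
    """Guess the delimiter from the text, then parse the rows into dicts."""
    dialect = csv.Sniffer().sniff(text, delimiters=",;\t ")
    return list(csv.DictReader(io.StringIO(text), dialect=dialect))


def bmi(weight_kg, height_cm):
    """BMI = weight in kilograms divided by the square of height in metres."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


# Hypothetical sample standing in for the contents of data.asc
sample = "name;weight_kg;height_cm\nA;70;175\nB;82;180\n"

rows = read_delimited(sample)
for row in rows:
    row["bmi"] = round(bmi(float(row["weight_kg"]), float(row["height_cm"])), 1)
    print(row["name"], row["bmi"])
```

Once the separator is known, the same file can usually be loaded directly with `pd.read_csv(path, sep=...)`, so the manual conversion to "csv" may not be needed.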

Related

Open an existing excel file and writing a new line in Python

I've searched through some old answers about my problem, but couldn't find an answer.
The problem: I want to open an existing Excel file, then write a new line with a list, and then save the current Excel file.
My current code is:
import pandas as pd
l_bsp = range(1,13)
df = pd.read_excel("Existing_file.xlsx")
df.loc[df.shape[0]+1] = l_bsp
print(df)
Right now it only changes my DataFrame without changing the Excel file. How do I add the list to the existing Excel file?
Thank you.
Your code only reads the Excel file and never writes it back. You can use a method such as df.to_excel() to do that. For a better understanding of the writing process, I suggest you also take a look at the section on writing Excel files in the pandas user guide.
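A minimal sketch of that read, append, write-back cycle is below. The function names `append_row` and `append_row_to_excel` are illustrative, not part of pandas, and the demo DataFrame is made up. It assumes the sheet has a default 0-based row index; under that assumption the question's `df.shape[0]+1` skips one index label, so `len(df)` is used instead.

```python
import pandas as pd


def append_row(df, values):
    """Return a copy of df with `values` added as a new last row."""
    out = df.copy()
    # With a default RangeIndex, the label len(out) is the next free one.
    out.loc[len(out)] = list(values)
    return out


def append_row_to_excel(path, values):
    """Read the workbook, add one row, and write the result back to the same file."""
    df = pd.read_excel(path)
    append_row(df, values).to_excel(path, index=False)


# In-memory check of the append step (no Excel file needed)
demo = pd.DataFrame([[0] * 12], columns=[f"c{i}" for i in range(12)])
result = append_row(demo, range(1, 13))
print(result)
```

Calling `append_row_to_excel("Existing_file.xlsx", range(1, 13))` would then update the file from the question. Note that `to_excel` rewrites the file with only the DataFrame's contents, so any formatting or extra sheets in the original workbook would be lost, and writing .xlsx files requires an engine such as openpyxl to be installed.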

Reading XLSB (binary) file with Pandas read_excel using pyxlsb reads empty rows for some xlsb file

I'm trying to read binary Excel files using read_excel method in pandas with pyxlsb engine as below:
import pandas as pd
df = pd.read_excel('test.xlsb', engine='pyxlsb')
For some xlsb files, such as this file (right now I'm sharing it via WeTransfer, but if there is a better way to share files on StackOverflow, let me know), the returned DataFrame is filled with NaNs. I suspected this might be because the file was originally saved with the active cell pointing at the empty cells after the data. So I tried this:
import pandas as pd
with open('test.xlsb', 'rb') as data:
    data.seek(0, 0)
    df = pd.read_excel(data, engine='pyxlsb')
but it still doesn't seem to work. I also tried reading the data from byte number 0 (from the beginning), writing it into a new file, 'test_1.xlsb', and finally reading it with pandas, but that doesn't work.
with open('test.xlsb', 'rb') as data:
    data.seek(0, 0)
    with open('test_1.xlsb', 'wb') as outfile:
        outfile.write(data.read())
df = pd.read_excel('test_1.xlsb', engine='pyxlsb')
If anyone has a suggestion as to what might be going on and how to resolve it, I'd greatly appreciate the help.

How do you read rows from a CSV file and store them in an array using Python?

I have a CSV file, diseases_matrix_KNN.csv, which contains an Excel table.
Now, I would like to store all the numbers from a row, like:
Hypothermia = [0,-1,0,0,0,0,0,0,0,0,0,0,0,0]
For some reason, I am unable to find a solution to this, even though I have looked. Please let me know how I can read this type of data into the chosen form using Python.
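The most common way to work with tabular data like this is to use pandas.
Here is an example:
import pandas as pd
filename = 'diseases_matrix_KNN.csv'
df = pd.read_csv(filename, index_col=0)  # the file is a CSV; assumes the first column holds the row names
print(df.loc['Hypothermia'].tolist())  # .loc selects by label (.iloc only accepts integer positions)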
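If pandas is not wanted, the same lookup can be sketched with the standard `csv` module. This assumes a layout the question does not confirm: one header row, then one row per disease with the name in the first cell and integers in the rest. The sample text and the helper name `rows_by_name` are hypothetical.

```python
import csv
import io


def rows_by_name(csv_text):
    """Map the first cell of each data row to the row's remaining cells as ints."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    return {row[0]: [int(cell) for cell in row[1:]] for row in reader if row}


# Hypothetical excerpt; the real diseases_matrix_KNN.csv is not shown in the question
sample = "disease,s1,s2,s3\nHypothermia,0,-1,0\nFlu,1,0,1\n"

table = rows_by_name(sample)
print(table["Hypothermia"])
```

For the real file, the text would come from `open('diseases_matrix_KNN.csv', newline='')` instead of the in-memory sample.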

How do I add specifically found (OCR) text into a list and write it to an excel file? [pytesseract]

I want to extract certain information from many PNG/JPEG files through pytesseract and write them into an excel file if possible.
I've figured out how to extract the text from the pictures but what I haven't figured out is:
1) How do I extract specific information instead of a whole blob of words? For example, I want the account numbers and reference numbers from each photo, nothing else.
2) How do I write these account numbers and reference numbers into an external file such as excel?
I'll attach what I've got so far below:
I've heard that using pandas dataframes was a good way to append data into columns for Excel but I'm not sure if I can do that for a task like this.
from PIL import Image
import pytesseract
import pandas as pd
pytesseract.pytesseract.tesseract_cmd = "C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe"
im = Image.open("C:/Users/user1/desktop/scripts/ocr/example bills/pic.jpg")
content = pd.DataFrame()
text = pytesseract.image_to_string(im, lang= 'eng')
temp = pd.DataFrame({'Words':[text]})
content.append(temp)
content.head()
print(text)
writer = pd.ExcelWriter('wordstest.xlsx')
content.to_excel(writer,'Sheet1')
writer.save()
Expected Results:
An excel file with two columns, account number and reference number.
Actual Results:
An excel file with no data.
To convert a dataframe to a spreadsheet, try this:
content.to_csv('wordstest.csv',sep=',')
The resulting file can be opened in Excel. If you need more columns, just add them to the dataframe and then write the CSV file.
You have to either filter the text after reading it from the image, or find the portions of the image that you want before reading them with tesseract. For filtering the text you can use regular expressions. For finding the portions of the image you will need a computer vision approach that predicts regions of the image (object detection), trained on your data.
To write the dataframe to Excel, just use the pandas to_csv or to_excel methods.
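The regex filtering suggested above might look like the sketch below. The label formats ("Account Number:", "Reference No:"), the digits-only assumption, the sample text, and the helper name `extract_fields` are all hypothetical, because the actual bills are not shown; the patterns would need adjusting to the real OCR output.

```python
import re

# Hypothetical label formats; adjust the patterns to match the actual bills.
ACCOUNT_RE = re.compile(r"Account\s*(?:No\.?|Number)?\s*[:#]?\s*(\d+)", re.IGNORECASE)
REFERENCE_RE = re.compile(r"Ref(?:erence)?\s*(?:No\.?|Number)?\s*[:#]?\s*(\d+)", re.IGNORECASE)


def extract_fields(text):
    """Return the first account and reference number found in OCR text, or None."""
    account = ACCOUNT_RE.search(text)
    reference = REFERENCE_RE.search(text)
    return {
        "account_number": account.group(1) if account else None,
        "reference_number": reference.group(1) if reference else None,
    }


# Stand-in for the string returned by pytesseract.image_to_string(im)
ocr_text = "ACME Utilities\nAccount Number: 12345678\nReference No: 998877\nTotal due 41.20"

rows = [extract_fields(ocr_text)]  # one dict per image
print(rows)
```

Collecting one dict per image and then calling `pd.DataFrame(rows).to_excel('wordstest.xlsx', index=False)` would give the two expected columns. Building the DataFrame once from a list also sidesteps the cause of the empty file in the question: `content.append(temp)` returns a new DataFrame instead of modifying `content`, so its result was discarded (and `DataFrame.append` was removed in pandas 2.0).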

Creating a dataframe from a csv file in pandas: column issue

I have a messy text file that I need to sort into columns in a dataframe so I can do the data analysis I need to do. Here is the messy looking file:
Messy text
I can read it in as a CSV file, which looks a bit nicer, using:
import pandas as pd
data = pd.read_csv('phx_30kV_indepth_0_0_outfile.txt')
print(data)
This prints out the data aligned, but the issue is that the output is [640 rows x 1 column]. I need to separate it into multiple columns and manipulate it as a dataframe.
I have tried a number of solutions using StringIO that have worked here before, but nothing seems to be doing the trick.
However, when I do this, there is the issue that the
Use delim_whitespace=True:
df = pd.read_csv('phx_30kV_indepth_0_0_outfile.txt', delim_whitespace=True)
Your input file is actually not in CSV format. As you provided only a .png picture, it is not even clear whether this file is divided into rows. If it is not, you first have to cut the content into individual lines and then read from the output file produced by that cutting.
I think this is the first step before you can use either read_csv or read_table (with delim_whitespace=True, of course).
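Assuming the file does turn out to be split into lines with columns separated by runs of spaces, the whitespace-delimited read could be sketched as follows. The column names and values are made up, since the real file is only shown as an image. The sketch uses `sep=r"\s+"`, which behaves like `delim_whitespace=True`; the latter is deprecated as of pandas 2.2.

```python
import io

import pandas as pd

# Hypothetical excerpt; the real file's contents are only shown as an image
raw = """depth   energy   counts
0.0     30.0     12
0.5     29.1     15
1.0     28.3     9
"""

# sep=r"\s+" treats any run of spaces or tabs as a single separator
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df.shape)
print(df.columns.tolist())
```

If the real file starts with free-text header lines before the table, the `skiprows` parameter of `read_csv` can be used to skip them.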
