I am dealing with a certain .asc file format which contains some data regarding weight and height. I just want to find BMI indexes of people with this data. I am not able to make sense of the dataframe formed after reading the data.
import pandas as pd
df = pd.read_table("data.asc")
I am not able to make sense of the result that I get. Please help me out
I recently had to work with a file with the extension "asc". My solution was the following:
I opened the file with a text editor and check the separator for the file, then transform it into a spreadsheet. In my case, I turned the document into a "csv" file.
After that I ran:
import pandas as pd
df = pd.read_csv('path to your file')
I've searched through some old answers about my problem, but couldn't find an answer.
The problem: I want to open an existing Excel file, than write a new-line with a list and than save the current Excel-File.
My current code is:
import pandas
import pandas as pd
l_bsp = range(1,13)
df = pd.read_excel("Existing_file.xlsx")
df.loc[df.shape[0]+1] = l_bsp
print(df)
Right now it only changes my Dataframe without changing the Excel File. How to I add the list into the existing excel file?
Thank you.
Your code only read the Excel file without writing it back. You can use command such as df.to_excel(). For better understanding of the writing process, suggest you also take a look at the pandas user guide on Writing Excel files
I'm attempting to convert a JSON file to an SQLite or CSV file so that I can manipulate the data with python. Here is where the data is housed: JSON File.
I found a few converters online, but those couldn't handle the quite large JSON file I was working with. I tried using a python module called sqlbiter but again, like the others, was never really able to output or convert the file.
I'm not. sure where to go now, if anyone has any recommendations or insights on how to get this data into a database, I'd really appreciate it.
Thanks in advance!
EDIT: I'm not looking for anyone to do it for me, I just need to be pointed in the right direction. Are there other methods I haven't tried that I could learn?
You can utilize pandas module for this data processing task as follows:
First, you need to read the JSON file using with, open and json.load.
Second, you need to change the format of your file a bit by changing the large dictionary that has a main key for every airport into a list of dictionaries instead.
Third, you can now utilize some pandas magic to convert your list of dictionaries into a DataFrame using pd.DataFrame(data=list_of_dicts).
Finally, you can utilize pandas's to_csv function to write your DataFrame as a CSV file into disk.
It would look something like this:
import pandas as pd
import json
with open('./airports.json.txt','r') as f:
j = json.load(f)
l = list(j.values())
df = pd.DataFrame(data=l)
df.to_csv('./airports.csv', index=False)
You need to load your json file and parse it to have all the fields available, or load the contents to a dictionary, then you could using pyodbc to write to the database these fields, or write them to the csv if you use import csv first.
But this is just a general idea. You need to study python and how to do every step.
For instance for writting to the database you could do something like:
for i in range(0,max_len):
sql_order = "UPDATE MYTABLE SET MYTABLE.MYFIELD ...."
cursor1.execute(sql_order)
cursor1.commit()
I met a DF file which is encoded in binary format. But when I open it using Vim, still I can see characters like "pandas.core.frame", "numpy.core.multiarray". So I guess it is related with Python. However I know little about the Python language. Though I have tried using pandas and numpy modules, I failed to read the file. Could you guys give any suggestion on this issue? Thank you in advance. Here is the Dropbox link to the DF file: https://www.dropbox.com/s/b22lez3xysvzj7q/flux.df
Looks like DataFrame stored with pickle, use read_pickle() to read it:
import pandas as pd
df = pd.read_pickle('flux.df')
The Python library pandas can read Excel spreadsheets and convert them to a pandas.DataFrame with pandas.read_excel(file) command. Under the hood, it uses xlrd library which does not support ods files.
Is there an equivalent of pandas.read_excel for ods files? If not, how can I do the same for an Open Document Formatted spreadsheet (ods file)? ODF is used by LibreOffice and OpenOffice.
This is available natively in pandas 0.25. So long as you have odfpy installed (conda install odfpy OR pip install odfpy) you can do
pd.read_excel("the_document.ods", engine="odf")
You can read ODF (Open Document Format .ods) documents in Python using the following modules:
odfpy / read-ods-with-odfpy
ezodf
pyexcel / pyexcel-ods
py-odftools
simpleodspy
Using ezodf, a simple ODS-to-DataFrame converter could look like this:
import pandas as pd
import ezodf
doc = ezodf.opendoc('some_odf_spreadsheet.ods')
print("Spreadsheet contains %d sheet(s)." % len(doc.sheets))
for sheet in doc.sheets:
print("-"*40)
print(" Sheet name : '%s'" % sheet.name)
print("Size of Sheet : (rows=%d, cols=%d)" % (sheet.nrows(), sheet.ncols()) )
# convert the first sheet to a pandas.DataFrame
sheet = doc.sheets[0]
df_dict = {}
for i, row in enumerate(sheet.rows()):
# row is a list of cells
# assume the header is on the first row
if i == 0:
# columns as lists in a dictionary
df_dict = {cell.value:[] for cell in row}
# create index for the column headers
col_index = {j:cell.value for j, cell in enumerate(row)}
continue
for j, cell in enumerate(row):
# use header instead of column index
df_dict[col_index[j]].append(cell.value)
# and convert to a DataFrame
df = pd.DataFrame(df_dict)
P.S.
ODF spreadsheet (*.ods files) support has been requested on the pandas issue tracker: https://github.com/pydata/pandas/issues/2311, but it is still not implemented.
ezodf was used in the unfinished PR9070 to implement ODF support in pandas. That PR is now closed (read the PR for a technical discussion), but it is still available as an experimental feature in this pandas fork.
there are also some brute force methods to read directly from the XML code (here)
Here is a quick and dirty hack which uses ezodf module:
import pandas as pd
import ezodf
def read_ods(filename, sheet_no=0, header=0):
tab = ezodf.opendoc(filename=filename).sheets[sheet_no]
return pd.DataFrame({col[header].value:[x.value for x in col[header+1:]]
for col in tab.columns()})
Test:
In [92]: df = read_ods(filename='fn.ods')
In [93]: df
Out[93]:
a b c
0 1.0 2.0 3.0
1 4.0 5.0 6.0
2 7.0 8.0 9.0
NOTES:
all other useful parameters like header, skiprows, index_col, parse_cols are NOT implemented in this function - feel free to update this question if you want to implement them
ezodf depends on lxml make sure you have it installed
pandas now supports .ods files. you must install the odfpy module first. then it will work like a normal .xls file.
conda install -c conda-forge odfpy
then
pd.read_excel('FILE_NAME.ods', engine='odf')
Edit: Happily, this answer below is now out of date, if you can update to a recent Pandas version.
If you'd still like to work from a Pandas version of your data, and update it from ODS only when needed, read on.
It seems the answer is No!
And I would characterize the tools to read in ODS still ragged.
If you're on POSIX, maybe the strategy of exporting to xlsx on the fly before using Pandas' very nice importing tools for xlsx is an option:
unoconv -f xlsx -o tmp.xlsx myODSfile.ods
Altogether, my code looks like:
import pandas as pd
import os
if fileOlderThan('tmp.xlsx','myODSfile.ods'):
os.system('unoconv -f xlsx -o tmp.xlsx myODSfile.ods ')
xl_file = pd.ExcelFile('tmp.xlsx')
dfs = {sheet_name: xl_file.parse(sheet_name)
for sheet_name in xl_file.sheet_names}
df=dfs['Sheet1']
Here fileOlderThan() is a function (see http://github.com/cpbl/cpblUtilities) which returns true if tmp.xlsx does not exist or is older than the .ods file.
Another option: read-ods-with-odfpy. This module takes an OpenDocument Spreadsheet as input, and returns a list, out of which a DataFrame can be created.
If you only have a few .ods files to read, I would just open it in openoffice and save it as an excel file. If you have a lot of files, you could use the unoconv command in Linux to convert the .ods files to .xls programmatically (with bash)
Then it's really easy to read it in with pd.read_excel('filename.xls')
I've had good luck with pandas read_clipboard.
Selecting cells and then copy from excel or opendocument.
In python run the following.
import pandas as pd
data = pd.read_clipboard()
Pandas will do a good job based on the cells copied.
Some responses have pointed out that odfpy or other external packages are needed to get this functionality, but note that in recent versions of Pandas (current is 1.1, August-2020) there is support for ODS format in functions like pd.ExcelWriter() and pd.read_excel(). You only need to specify the propper engine "odf" to be able of working with OpenDocument file formats (.odf, .ods, .odt).
Based heavily on the answer by davidovitch (thank you), I have put together a package that reads in a .ods file and returns a DataFrame. It's not a full implementation in pandas itself, such as his PR, but it provides a simple read_ods function that does the job.
You can install it with pip install pandas_ods_reader. It's also possible to specify whether the file contains a header row or not, and to specify custom column names.
There is support for reading Excel files in Pandas (both xls and xlsx), see the read_excel command. You can use OpenOffice to save the spreadsheet as xlsx. The conversion can also be done automatically on the command line, apparently, using the convert-to command line parameter.
Reading the data from xlsx avoids some of the issues (date formats, number formats, unicode) that you may run into when you convert to CSV first.
If possible, save as CSV from the spreadsheet application and then use pandas.read_csv(). IIRC, an 'ods' spreadsheet file actually is an XML file which also contains quite some formatting information. So, if it's about tabular data, extract this raw data first to an intermediate file (CSV, in this case), which you can then parse with other programs, such as Python/pandas.