I am a Python pandas user, but I recently found out about Polars dataframes, which seem promising and very fast. I cannot find a way to open an Excel file in Polars. Polars happily reads csv, json, etc., but not Excel.
I use Excel files extensively in pandas, and I want to try Polars. My Excel files have many sheets, which pandas reads automatically. How can I do the same with Polars?
What am I missing?
This is more of a workaround than a real answer, but you can read it into pandas and then convert it to a polars dataframe.
import polars as pl
import pandas as pd
df = pd.read_excel(...)
df_pl = pl.from_pandas(df)
You could, however, make a feature request to the Apache Arrow community to support excel files.
Polars now has a read_excel method, as of this PR!
https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.read_excel.html
You should be able to just do:
import polars as pl
df = pl.read_excel("file.xlsx")
I've noticed that Polars read_excel (which uses xlsx2csv) doesn't include rows that are filtered out in the Excel file(s). However, Polars read_excel has a parameter for passing xlsx2csv options (xlsx2csv_options: dict[str, Any]).
I checked the xlsx2csv documentation for such an option but couldn't find one. ChatGPT suggested trying "--skip-filtered-rows", but I'm not sure how to pass that to the read_excel xlsx2csv_options parameter as a dictionary.
The workaround I've used is to read the Excel file with pandas first and then convert it to Polars, but I would prefer to do this with Polars directly.
pl.from_pandas(pd.read_excel(file))
xlsx2csv version 0.8.0
polars version 0.15.13
EDIT: Found the same issue in github as well: https://github.com/dilshod/xlsx2csv/issues/246
It looks like a new setting, "skip_hidden_rows", was added to xlsx2csv in version 0.8.0. However, when I tried the code below, I still got only the filtered rows from the Excel files.
pl.read_excel(file, xlsx2csv_options={"skip_hidden_rows": False})
I need to build a Table and was wondering whether there is a native alternative to index_col in Python. In my assignment the csv file is inverted, and I need to make the last row the column labels. I am not allowed to use the pandas library, only the datascience one.
This is how the csv file looks
Code is like this:
from datascience import Table
table = Table.read_table(document_name)
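One way to approach this without pandas is to read the rows with the standard csv module and treat the last row as the labels. The following is only a sketch under assumptions: the helper name read_with_last_row_as_header and the sample data are made up for illustration, the file is assumed to be a plain comma-separated file, and the labels are assumed to be unique.

```python
import csv
import io


def read_with_last_row_as_header(text):
    """Parse CSV text whose LAST row holds the column labels.

    Hypothetical helper, not part of the datascience library.
    Returns a dict mapping each label to the list of values in that column.
    """
    # Read every non-empty row; values stay as strings.
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows:
        return {}

    # Everything except the last row is data; the last row is the labels.
    *data_rows, labels = rows

    columns = {label: [] for label in labels}
    for row in data_rows:
        for label, value in zip(labels, row):
            columns[label].append(value)
    return columns


# Made-up sample: two data rows followed by the label row.
sample = "1,2,3\n4,5,6\na,b,c\n"
columns = read_with_last_row_as_header(sample)
print(columns)
```

For a file on disk, the text could come from open(document_name).read(). The resulting label-to-values mapping can then be turned into whichever table type the assignment requires.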
Is there any way to open a file with the .sframe extension directly in pandas?
Something simple like this:
df = pd.read_csv('people.sframe')
Thank you.
No, you can't import sframe files directly with pandas. Instead, you can use a free Python library named sframe:
import sframe
import pandas as pd
sf = sframe.SFrame('people.sframe')
Then you can convert it to a pandas DataFrame using:
df = sf.to_dataframe()
I'm trying to read binary Excel files using the read_excel method in pandas with the pyxlsb engine, as below:
import pandas as pd
df = pd.read_excel('test.xlsb', engine='pyxlsb')
If the xlsb file is like this file (I'm currently sharing it via WeTransfer; if there is a better way to share files on StackOverflow, let me know), the returned dataframe is filled with NaNs. I suspected this might be because the file was originally saved with the active cell pointing at the empty cells after the data. So I tried this:
import pandas as pd
with open('test.xlsb', 'rb') as data:
    data.seek(0,0)
    df = pd.read_excel(data, engine='pyxlsb')
but it still doesn't seem to work. I also tried reading the data from byte 0 (the beginning of the file), writing it into a new file, 'test_1.xlsb', and finally reading that file with pandas, but that doesn't work either.
with open('test.xlsb','rb') as data:
    data.seek(0,0)
    with open('test_1.xlsb','wb') as outfile:
        outfile.write(data.read())
df = pd.read_excel('test_1.xlsb', engine='pyxlsb')
If anyone has a suggestion as to what might be going on and how to resolve it, I'd greatly appreciate the help.
I am new to Python, coming from MATLAB. In MATLAB, I used to create a table variable (copied from Excel into MATLAB) and save it as a .mat file. Whenever I needed the data, I imported it using:
A = importdata('Filename.mat');
[Filename is 38x5 table, see the attached photo]
Is there a way I can do this in Python? I have to work with about 35 such tables, and loading them from Excel every time is not the best way.
In order to import excel tables into your python environment you have to install pandas.
Check out the detailed guideline.
import pandas as pd
xl = pd.ExcelFile('myFile.xlsx')
I hope this helps.
Use pandas:
import pandas as pd
dataframe = pd.read_csv("your_data.csv")
print(dataframe.head())  # prints the first five rows of your data
Or from Excel:
dataframe = pd.read_excel('your_excel_sheet.xlsx')
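To avoid re-reading Excel on every run, much like saving a .mat file, a dataframe can be written to disk once and reloaded later. The snippet below is a minimal sketch using pandas' to_pickle and read_pickle. The small table and the cache path are made-up stand-ins for a table that would really come from pd.read_excel.

```python
import os
import tempfile

import pandas as pd

# Stand-in for a table loaded once from Excel, e.g. with pd.read_excel(...).
table = pd.DataFrame({"x": [1, 2, 3], "y": [0.5, 1.5, 2.5]})

# Hypothetical cache location; any writable path would do.
cache_path = os.path.join(tempfile.mkdtemp(), "table.pkl")

# Save the table once ...
table.to_pickle(cache_path)

# ... and reload it in later sessions without touching Excel.
restored = pd.read_pickle(cache_path)
print(restored.equals(table))
```

Pickle files should only be loaded from sources you trust. For sharing data with other tools, a format such as csv may be a better fit.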