How to load sframe format file in pandas? - python

Is there a way to open a file with the .sframe extension directly in pandas?
Something simple, like:
df = pd.read_csv('people.sframe')
Thank you.

No, pandas cannot read SFrame files directly. Instead, you can use the free Python library sframe:
import sframe
import pandas as pd
sf = sframe.SFrame('people.sframe')
Then you can convert it to a pandas DataFrame using:
df = sf.to_dataframe()

Related

How to open excel file in Polars dataframe?

I am a pandas user, but I recently found out about Polars DataFrames, which seem promising and very fast. I cannot find a way to open an Excel file in Polars. Polars happily reads CSV, JSON, etc., but not Excel.
I use Excel files extensively in pandas and want to try Polars. My Excel files have many sheets, which pandas reads automatically. How can I do the same with Polars?
What am I missing?
This is more of a workaround than a real answer, but you can read it into pandas and then convert it to a polars dataframe.
import polars as pl
import pandas as pd
df = pd.read_excel(...)
df_pl = pl.from_pandas(df)
You could, however, make a feature request to the Apache Arrow community to support excel files.
Polars now has a read_excel method, as of this PR!
https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.read_excel.html
You should be able to just do:
import polars as pl
df = pl.read_excel("file.xlsx")

How do I create a dataframe from a GeoJSON file without using geopandas?

I'm looking to turn a GeoJSON file into a pandas dataframe that I can work with in Python. However, for some reason, the geojson package will not install on my computer.
So I wanted to know how I could turn a GeoJSON file into a dataframe without using the geojson package.
This is what I have so far
import json
import pandas as pd
with open('Local_Authority_Districts_(December_2020)_UK_BGC.geojson') as f:
    data = json.load(f)
Here is a link to the GeoJSON file that I'm working with. I'm new to Python, so any help would be much appreciated. https://drive.google.com/file/d/1V4WljiJcASqq9ksh8CHM_2nBC0K2PR18/view?usp=sharing
You could use geopandas. It's as easy as this:
import geopandas as gpd
gdf = gpd.read_file('Local_Authority_Districts_(December_2020)_UK_BGC.geojson')
You can turn the resulting GeoDataFrame into a regular DataFrame with:
import pandas as pd
df = pd.DataFrame(gdf)
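If geopandas is not an option, as the question asks, the parsed JSON can be flattened by hand. The following is only a minimal sketch using the standard library: it assumes the file is a GeoJSON FeatureCollection, and the helper name features_to_rows and the sample property names ("name", "code") are made up for illustration, not taken from the linked file.

```python
import json


def features_to_rows(geojson):
    """Flatten a GeoJSON FeatureCollection into one dict per feature."""
    rows = []
    for feature in geojson.get("features", []):
        # Copy the attribute table so the input is not modified.
        row = dict(feature.get("properties") or {})
        geometry = feature.get("geometry") or {}
        # Keep the geometry as plain Python data (type + nested coordinate lists).
        row["geometry_type"] = geometry.get("type")
        row["coordinates"] = geometry.get("coordinates")
        rows.append(row)
    return rows


# Tiny made-up FeatureCollection standing in for the real file.
sample = json.loads("""
{"type": "FeatureCollection", "features": [
  {"type": "Feature",
   "properties": {"name": "A", "code": "E01"},
   "geometry": {"type": "Point", "coordinates": [-1.5, 54.0]}},
  {"type": "Feature",
   "properties": {"name": "B", "code": "E02"},
   "geometry": {"type": "Point", "coordinates": [-0.1, 51.5]}}
]}
""")

rows = features_to_rows(sample)
```

With the real file, rows = features_to_rows(data) followed by pd.DataFrame(rows) should give one row per feature, with the geometry kept as nested lists rather than geometry objects.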

How to use the gzip module to open a csv file

I am looking to read in a .csv.gz file that is in the same directory as my Python script, using only the gzip and pandas modules.
So far I have,
import gzip
import pandas as pd
data = gzip.open('test_data.csv.gz', mode='rb')
How do I proceed in converting / reading this file in as a dataframe without using the csv module as seen in similarly answered questions?
You can use pandas.read_csv directly:
import pandas as pd
df = pd.read_csv('test_data.csv.gz', compression='gzip')
If you must use gzip:
with gzip.open('test_data.csv.gz', mode='rb') as f:
    df = pd.read_csv(f)
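Both approaches can be tried end to end with a throwaway file. This is a sketch, not a definitive recipe: the temporary path and the two-row CSV content are invented for the example.

```python
import gzip
import os
import tempfile

import pandas as pd

# Write a small throwaway .csv.gz so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "test_data.csv.gz")
with gzip.open(path, mode="wt") as f:
    f.write("id,value\n1,10\n2,20\n")

# Text mode ("rt") lets gzip handle decoding; pandas reads the open handle.
with gzip.open(path, mode="rt") as f:
    df = pd.read_csv(f)

# Passing the path works too: pandas infers gzip from the ".gz" suffix.
df_direct = pd.read_csv(path)
```

Neither variant needs the csv module; pandas does the parsing in both cases.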

Python - read parquet data from a variable

I am reading a parquet file and transforming it into dataframe.
from fastparquet import ParquetFile
pf = ParquetFile('file.parquet')
df = pf.to_pandas()
Is there a way to read a parquet file from a variable (that previously read and now hold parquet data)?
Thanks.
pandas has a method for reading Parquet files (see the docs). Something like this:
import pandas as pd
pd.read_parquet('file.parquet')
should work. Also please read this post for engine selection.
You can also read Parquet data from a variable with pandas.read_parquet by wrapping the bytes in a buffer. I tested this with the pyarrow backend, but it should also work with the fastparquet backend.
import pandas as pd
import io
with open("file.parquet", "rb") as f:
    data = f.read()
buf = io.BytesIO(data)
df = pd.read_parquet(buf)

Import data tables in Python

I am new to Python, coming from MATLAB. In MATLAB, I would create a table variable (copied from Excel), save it as a .mat file, and import it whenever I needed the data:
A = importdata('Filename.mat');
[Filename is 38x5 table, see the attached photo]
Is there a way I can do this in Python? I have to work with about 35 such tables, and loading them from Excel every time is not the best way.
To import Excel tables into your Python environment, you need to install pandas.
Check out the detailed guideline.
import pandas as pd
xl = pd.ExcelFile('myFile.xlsx')
I hope this helps.
Use pandas:
import pandas as pd
dataframe = pd.read_csv("your_data.csv")
dataframe.head() # prints out first rows of your data
Or from Excel:
dataframe = pd.read_excel('your_excel_sheet.xlsx')
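To avoid re-reading Excel on every run, which is what the question is really after, one option is to save each table once and reload the saved copy later, roughly like saving a .mat file. The sketch below uses pandas pickling; the table contents, the file name table_A.pkl, and the temporary directory are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Stand-in for a table loaded once via pd.read_excel(...); values are made up.
table = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [0.5, 0.25, 0.125]})

# Save once ...
cache_path = os.path.join(tempfile.mkdtemp(), "table_A.pkl")
table.to_pickle(cache_path)

# ... and reload in later sessions without touching Excel.
A = pd.read_pickle(cache_path)
```

Only load pickle files you created yourself or otherwise trust, since unpickling can execute arbitrary code.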
