Import data tables in Python - python

I am new to Python, coming from MATLAB. In MATLAB, I used to create a table variable (copied from Excel), save it as a .mat file, and whenever I needed the data I imported it using:
A = importdata('Filename.mat');
[Filename is 38x5 table, see the attached photo]
Is there a way I can do this in Python? I have to work with about 35 such tables, and loading them from Excel every time is not the best way.

To import Excel tables into your Python environment, you have to install pandas.
Check out the detailed guideline.
import pandas as pd
xl = pd.ExcelFile('myFile.xlsx')
I hope this helps.

Use pandas:
import pandas as pd
dataframe = pd.read_csv("your_data.csv")
dataframe.head() # prints out first rows of your data
Or from Excel:
dataframe = pd.read_excel('your_excel_sheet.xlsx')
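If the goal is to mimic the MATLAB habit of saving a table once and reloading it quickly, one option is to read each table a single time and cache it in a binary file. The sketch below assumes pickle is an acceptable format for that. The table contents and the file name table_01.pkl are made up for illustration; in practice the DataFrame would come from pd.read_excel.

```python
import os
import tempfile

import pandas as pd

# Stand-in for a table read once with pd.read_excel (values are illustrative).
table = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": ["a", "b", "c"]})

# Hypothetical cache location; any writable folder works.
cache_dir = tempfile.mkdtemp()
cache_path = os.path.join(cache_dir, "table_01.pkl")

# Save once ...
table.to_pickle(cache_path)

# ... and reload later without touching Excel again.
restored = pd.read_pickle(cache_path)
print(restored.equals(table))
```

With 35 tables, the same two calls could be run in a loop over the file names, so that Excel is only parsed the first time.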

Related

Inserting Data into an Excel file using Pandas - Python

I have an excel file that contains the names of 60 datasets.
I'm trying to write a piece of code that "enters" the Excel file, accesses a specific dataset (whose name is in the Excel file), gathers and analyses some data and finally, creates a new column in the Excel file and inserts the information gathered beforehand.
I can do most of it, except for the part of adding a new column and entering the data.
I was trying to do something like this:
path_data = **the path to the excel file**
recap = pd.read_excel(os.path.join(path_data,'My_Excel.xlsx')) # where I access the Excel file
recap['New information Column'] = Some Value
Is this a correct way of doing this? And if not, can someone suggest a better way (one that works, hehe)?
Thank you a lot!
You can import the Excel file into Python using pandas.
import pandas as pd
df = pd.read_excel(r'Path\Filename.xlsx')
print(df)
If the workbook has several sheets, you can select one by name:
import pandas as pd
df = pd.read_excel(r'Path\Filename.xlsx', sheet_name='sheetname')
print(df)
To add a new column you could do the following:
df['name of the new column'] = 'things to add'
Then when you're ready, you can export it as xlsx:
import openpyxl  # pandas needs openpyxl installed to write .xlsx files
# to excel
df.to_excel(r'Path\filename.xlsx')
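When the new column should hold a different value for each dataset rather than one constant, a per-row computation might look like the sketch below. The column name "Dataset", the dataset names, and the analyse function are all hypothetical placeholders, and the DataFrame is built in memory instead of being read from the Excel file.

```python
import pandas as pd

# Stand-in for the sheet read with pd.read_excel; the column name
# "Dataset" and the dataset names are assumptions for illustration.
recap = pd.DataFrame({"Dataset": ["set_a", "set_b", "set_c"]})


def analyse(name):
    """Hypothetical analysis step: here it just returns the name length."""
    return len(name)


# Build the new column by applying the analysis to every dataset name.
recap["New information Column"] = recap["Dataset"].apply(analyse)

print(recap)
# Writing back would then be: recap.to_excel("My_Excel.xlsx", index=False)
```

Passing index=False when writing keeps the row index from being saved as an extra first column.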

How do I create a dataframe from a GeoJSON file without using geopandas?

I'm looking to turn a GeoJSON file into a pandas dataframe that I can work with using Python. However, for some reason, the geojson package will not install on my computer.
So I wanted to know how I could turn a GeoJSON file into a dataframe without using the geojson package.
This is what I have so far:
import json
import pandas as pd
with open('Local_Authority_Districts_(December_2020)_UK_BGC.geojson') as f:
    data = json.load(f)
Here is a link to the GeoJSON file that I'm working with. I'm new to Python, so any help would be much appreciated. https://drive.google.com/file/d/1V4WljiJcASqq9ksh8CHM_2nBC0K2PR18/view?usp=sharing
You could use geopandas. It's as easy as this:
import geopandas as gpd
gdf = gpd.read_file('Local_Authority_Districts_(December_2020)_UK_BGC.geojson')
You can turn the resulting geodataframe into a regular dataframe with:
df = pd.DataFrame(gdf)
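If geopandas is not an option, a plain json plus pandas route may be enough, since a GeoJSON FeatureCollection is ordinary JSON with a "features" list. The sketch below uses a tiny hand-written collection instead of the real file; the property names and values are invented, so the actual columns will differ.

```python
import json

import pandas as pd

# A tiny hand-written FeatureCollection standing in for the real file;
# the property names and values are invented for illustration.
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"NAME": "District A", "CODE": "X01"},
     "geometry": {"type": "Point", "coordinates": [0.0, 51.5]}},
    {"type": "Feature",
     "properties": {"NAME": "District B", "CODE": "X02"},
     "geometry": {"type": "Point", "coordinates": [-1.5, 53.8]}}
  ]
}
"""
data = json.loads(geojson_text)  # with a file on disk: json.load(f)

# One row per feature; nested keys become dotted column names.
df = pd.json_normalize(data["features"])
print(df.columns.tolist())
```

The geometry coordinates stay as nested Python lists in a single column, so this gives access to the attributes but none of the spatial operations geopandas provides.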

How to extract the result from Python into an xls file

I'm a novice in Python and I need to extract references from scientific literature. The following is the code I'm using:
from refextract import extract_references_from_url
references = extract_references_from_url('https://arxiv.org/pdf/1503.07589.pdf')
print(references)
Please guide me on how to export this printed information to an xls file. Thank you so much.
You could use the pandas library to write the references into excel.
from refextract import extract_references_from_url
import pandas as pd
references = extract_references_from_url('https://arxiv.org/pdf/1503.07589.pdf')
print(references)
# convert to pandas dataframe
dfref = pd.DataFrame(references)
# write dataframe into excel
dfref.to_excel('./refs.xlsx')
You should have a look at xlsxwriter, a module for creating excel files.
Your code could then look like this:
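import xlsxwriter
from refextract import extract_references_from_url
references = extract_references_from_url('https://arxiv.org/pdf/1503.07589.pdf')
workbook = xlsxwriter.Workbook('References.xlsx')
worksheet = workbook.add_worksheet()
# write() needs a row, a column and a value, so write one reference per row
for row, reference in enumerate(references):
    worksheet.write(row, 0, str(reference))
# close() has to be called (with parentheses), otherwise the file is not saved
workbook.close()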
(modified based upon https://xlsxwriter.readthedocs.io/tutorial01.html)
After going through the documentation of refextract here, I found that your variable references is a dictionary. To write such a dictionary to an Excel file you can use pandas as follows:
import pandas as pd
# create a pandas dataframe using a dictionary
df = pd.DataFrame(data=references, index=[0])
# Take transpose of the dataframe
df = (df.T)
# write the dictionary to an excel file
df.to_excel('extracted_references.xlsx')
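One thing that may matter before exporting: if each reference is a dictionary whose values are lists (an assumption here, so check what print(references) actually shows), the Excel cells would contain list representations instead of plain text. A rough sketch of flattening them first, using invented sample data and not real refextract output:

```python
import pandas as pd

# Hypothetical sample shaped like a list of reference dictionaries whose
# values are lists; the entries are invented, not real refextract output.
references = [
    {"author": ["A. Author", "B. Author"], "year": ["2001"]},
    {"author": ["C. Author"], "year": ["2015"]},
]

# Join each list into a single string so every cell holds plain text.
flat = [
    {key: "; ".join(values) for key, values in ref.items()}
    for ref in references
]

dfref = pd.DataFrame(flat)
print(dfref)
# dfref.to_excel("refs.xlsx", index=False) would then write the sheet.
```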

How do I split the columns of a CSV file after importing it in a Jupyter notebook?

I have imported the CSV file into a Jupyter notebook, but I am unable to visualize it properly.
Use pandas library and read your data as a DataFrame:
import pandas as pd
dataframe = pd.read_csv(r'\filepath')  # raw string, so that \f is not read as an escape sequence
Then you visualize your columns as:
dataframe.columns
or you can visualize a snapshot of your data like this:
dataframe.head()
or perhaps access a column value by referencing it like this: dataframe['column_name'].
Read a quick tutorial here: https://www.datacamp.com/community/blog/python-pandas-cheat-sheet
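If the actual problem is that all the values end up in a single column, the file may use a delimiter other than a comma. The question does not say which one, so the semicolon below is only a guess, and the file contents are made up.

```python
import io

import pandas as pd

# Made-up file contents using ';' as the delimiter instead of ','.
raw = "name;age;city\nAnn;31;Oslo\nBob;45;Rome\n"

# With the default separator everything lands in one column ...
one_column = pd.read_csv(io.StringIO(raw))
print(one_column.shape)

# ... while passing sep=';' splits the fields into separate columns.
dataframe = pd.read_csv(io.StringIO(raw), sep=";")
print(dataframe.columns.tolist())
```

For a real file, the same sep argument goes into pd.read_csv next to the file path.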

Update a single cell in an Excel spreadsheet using Pandas

I'm just wondering how to update a single cell in an Excel spreadsheet with pandas in a Python script. I don't want any of the other cells in the file to be overwritten, just the one cell I'm trying to update. I tried using .at[], .iat[], and .loc[], but my Excel spreadsheet does not update. None of the deprecated methods like .set_value() work either. What am I doing wrong?
import pandas as pd
tp = pd.read_excel("testbook.xlsx", sheet_name = "Sheet1")
tp.at[1, 'A'] = 10
I might suggest using xlwings for this operation, as it might be easier than reading and writing a sheet in pandas dataframes. The example below changes the value of "A1".
import xlwings as xw
sheet = xw.Book("testbook.xlsx").sheets("Sheet1")
sheet.range("A1").value = "hello world"
Also note that xlwings is included with Anaconda, if you're using that: https://docs.xlwings.org/en/stable/api.html
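As for why the pandas attempt in the question changes nothing: read_excel loads a copy of the sheet into memory, so .at[] only edits that copy and the file stays as it was until something is written back. If Excel itself is not installed (xlwings drives a running Excel), openpyxl may be an alternative for editing one cell in place. The sketch below builds a throwaway workbook first so it can run on its own; the cell values are invented.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Build a small throwaway workbook so the sketch is self-contained;
# the file location and cell values are invented for illustration.
path = os.path.join(tempfile.mkdtemp(), "testbook.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
ws["A1"] = "old value"
ws["B1"] = "untouched"
wb.save(path)

# Open the existing file, change one cell, and save it back.
wb = load_workbook(path)
wb["Sheet1"]["A1"] = 10
wb.save(path)

# Reload to confirm that only A1 changed.
check = load_workbook(path)["Sheet1"]
print(check["A1"].value, check["B1"].value)
```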
