How to read an .xlsx file on SharePoint into a pandas dataframe? - python

I have a Python script that loads an .xlsx file into a pandas dataframe using read_excel:
import os
import pandas as pd
V_file = "My_file.xlsx"
V_path = r"C:\My_folder"
os.chdir(V_path)
V_df = pd.read_excel(V_file, sheet_name = "Sheet1")
This works for files saved locally. However, I want to read a file that is saved in SharePoint. Does anyone know how I can adapt the code above to do this? And, if it's not too much trouble, could you also explain what the adapted code is doing?
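A minimal sketch of one common approach, not an official recipe: pd.read_excel does not need a path on disk, it also accepts a binary file-like object, so you can download the workbook's bytes and wrap them in io.BytesIO. The authenticated download is left as a `fetch` callable you supply (assumption: for example, a function that gets an access token with MSAL and performs an HTTP GET, or one built on the Office365-REST-Python-Client package). `graph_content_url` and `read_sharepoint_excel` are hypothetical helper names, and the URL shape assumes the file sits in the site's default document library and is addressed with Microsoft Graph v1.0 path syntax.

```python
import io
import urllib.parse
from typing import Callable

import pandas as pd


def graph_content_url(site_id: str, item_path: str) -> str:
    """Hypothetical helper: Graph URL for the raw bytes of a file in the site's default library."""
    # item_path is relative to the library root, e.g. "My_folder/My_file.xlsx";
    # spaces and other unsafe characters are percent-encoded, "/" is kept.
    quoted = urllib.parse.quote(item_path.strip("/"))
    return f"https://graph.microsoft.com/v1.0/sites/{site_id}/drive/root:/{quoted}:/content"


def read_sharepoint_excel(
    fetch: Callable[[str], bytes], url: str, sheet_name="Sheet1"
) -> pd.DataFrame:
    """Download the workbook with `fetch`, then parse it in memory instead of from disk."""
    # fetch is assumed to perform an authenticated GET and return the file's bytes
    content = fetch(url)
    # read_excel accepts any binary file-like object, so no temporary file is needed
    return pd.read_excel(io.BytesIO(content), sheet_name=sheet_name)
```

Usage would look like `df = read_sharepoint_excel(my_fetch, graph_content_url(site_id, "My_folder/My_file.xlsx"))`, where `my_fetch` and `site_id` are yours to provide. Keeping the download separate from the parsing means only the fetch step changes if your authentication method differs.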

Related

How to make pandas recognise my xlsx file as a multiple-columned dataframe

While making my bot set permissions automatically when it joins a guild, the code for this was getting too long. So I wanted my bot to load an xlsx file as a dataframe and set permissions from the data inside it.
I want to load this xlsx file as a dataframe with multi-level columns, but I don't think my program recognises it as one. Is the error in my code below, or do I have to change my Excel file so that it is recognised the way I want?
from pandas import read_excel
perm_data = read_excel('E:/Discord bot/Grail-Relique/data/xlsx/TextPermission.xlsx', header=[0,1], engine='openpyxl')
print(perm_data)
print(perm_data.loc[0,(0,0)])
This should do the job:
import pandas as pd
df = pd.read_excel('your/path/to/file.xlsx',
header=[0,1],
index_col=0)
print(df.head())
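The question's `perm_data.loc[0,(0,0)]` looks columns up by label, but with header=[0, 1] the column labels are the text of the two header rows (a MultiIndex), not (0, 0). A small sketch with made-up column names (hypothetical, standing in for whatever the two header rows contain) shows label access with a tuple, positional access with iloc, and selecting a whole top-level group:

```python
import pandas as pd

# Hypothetical two-row header, standing in for what read_excel(header=[0, 1]) builds.
columns = pd.MultiIndex.from_tuples(
    [
        ("everyone", "send_messages"),
        ("everyone", "view_channel"),
        ("mods", "view_channel"),
    ]
)
perm_data = pd.DataFrame(
    [[False, True, True], [False, False, True]],
    index=["general", "announcements"],  # what index_col=0 would take from the first column
    columns=columns,
)

# Label-based access needs the header text as a (top, bottom) tuple.
print(perm_data.loc["general", ("everyone", "view_channel")])

# Positional access (first row, first column) uses iloc instead of loc.
print(perm_data.iloc[0, 0])

# Selecting only the top level returns a sub-frame of its sub-columns.
print(perm_data["everyone"])
```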

Reading XLSB (binary) file with Pandas read_excel using pyxlsb reads empty rows for some xlsb files

I'm trying to read binary Excel files using read_excel method in pandas with pyxlsb engine as below:
import pandas as pd
df = pd.read_excel('test.xlsb', engine='pyxlsb')
If the xlsb file is like this file (right now I'm sharing it via WeTransfer, but if there is a better way to share files on StackOverflow, let me know), the returned dataframe is filled with NaNs. I suspected this might be because the file was originally saved with the active cell pointing at empty cells after the data. So I tried this:
import pandas as pd
with open('test.xlsb', 'rb') as data:
    data.seek(0,0)
    df = pd.read_excel(data, engine='pyxlsb')
but it still doesn't seem to work. I also tried reading the data from byte 0 (the beginning), writing it into a new file, 'test_1.xlsb', and then reading that with pandas, but that didn't work either.
with open('test.xlsb','rb') as data:
    data.seek(0,0)
    with open('test_1.xlsb','wb') as outfile:
        outfile.write(data.read())
df = pd.read_excel('test_1.xlsb', engine='pyxlsb')
If anyone has suggestions as to what might be going on and how to resolve it, I'd greatly appreciate the help.
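A workaround sketch, not a fix for why pyxlsb reports the extra rows: if the real values are present and the rest of the frame is padding made of completely empty rows and columns, you can strip that padding after reading. This assumes the padding is entirely NaN; if every cell in the frame is NaN, the data is not being read at all and this will not help. `drop_empty_rows_and_columns` is a hypothetical helper name, and the frame below is a made-up stand-in for the read_excel result.

```python
import numpy as np
import pandas as pd


def drop_empty_rows_and_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Remove rows and columns in which every value is NaN, then renumber the rows."""
    trimmed = df.dropna(how="all").dropna(axis=1, how="all")
    return trimmed.reset_index(drop=True)


# Stand-in for what pd.read_excel('test.xlsb', engine='pyxlsb') might return:
# real data on top, followed by fully empty rows and a fully empty column.
raw = pd.DataFrame(
    {
        "a": [1.0, 2.0, np.nan, np.nan],
        "b": [3.0, np.nan, np.nan, np.nan],
        "c": [np.nan, np.nan, np.nan, np.nan],
    }
)
print(drop_empty_rows_and_columns(raw))
```

Rows that are only partly empty (like the second row here) are kept, so genuine gaps inside the data survive.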

Reading xls file after resaving

I am downloading an Excel file from a website.
If I just use pandas to open the file:
import pandas as pd
df = pd.read_excel('filepath')
I get the error CompDocError: Workbook corruption: seen[2] == 4
If I resave the file before opening it, everything works fine:
import pandas as pd
import win32com.client
def resave_excel(filename):
    xcl = win32com.client.Dispatch('Excel.Application')
    wb = xcl.workbooks.open(filename)
    xcl.DisplayAlerts = False
    wb.Save()
    xcl.Quit()
resave_excel('filepath')
df = pd.read_excel('filepath')
The problem with this approach is that it actually launches the Excel application, which is not the safest thing to do, especially if I want to run the full script on an automated schedule or on a different platform.
Is there a different approach that I am missing?
The only solution I found is discussed at https://github.com/python-excel/xlrd/issues/149.
Instead of pandas, you need to use xlrd and make changes to xlrd/compdoc.py.
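A sketch of an alternative to hand-editing compdoc.py, assuming xlrd 2.0 or newer is installed: its open_workbook accepts ignore_workbook_corruption=True, which skips the check that raises this CompDocError, and pd.read_excel accepts an already-open xlrd Book. This has not been verified against the particular file in the question, and `read_xls_ignoring_corruption` is a hypothetical helper name.

```python
import os

import pandas as pd


def read_xls_ignoring_corruption(path: str, sheet_name=0) -> pd.DataFrame:
    """Open an .xls with xlrd's corruption check disabled, then hand the Book to pandas."""
    # Imported here so this module still loads where xlrd is absent;
    # assumes xlrd >= 2.0, the release that added ignore_workbook_corruption.
    import xlrd

    with open(os.devnull, "w") as devnull:
        # logfile=devnull silences xlrd's messages about the damaged structure
        book = xlrd.open_workbook(
            path, logfile=devnull, ignore_workbook_corruption=True
        )
        # read_excel accepts an xlrd.Book directly when engine="xlrd"
        return pd.read_excel(book, sheet_name=sheet_name, engine="xlrd")
```

Unlike resaving through Excel, this runs without Excel installed, so it should also work on non-Windows machines and in scheduled jobs.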

I am trying to load a CSV file into Python (Azure) but get an IOError saying the file does not exist

My code is:
import pandas as pd
df = pd.read_csv('Project_Wind_Data.csv', usecols=['U100', 'V100'])
with open('Project_Wind_Data.csv', "r") as csvfile:
    ...  # rest of the code not shown
I am trying to access certain columns within the CSV file, but I receive an error message saying that the data file does not exist.
My data is in the following form:
This must be a trivial issue, but help would be much appreciated.
If your CSV file is in the current working directory (usually, but not always, the folder that contains your .py script), you can use the file name directly:
import pandas as pd
df = pd.read_csv('Project_Wind_Data.csv', usecols=['U100', 'V100'])
If the file is in another directory, replace 'Project_Wind_Data.csv' with the full path to the file, like C:/Users/Documents/file.txt
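Because relative paths are resolved against the current working directory, which in a notebook environment may not be the folder you expect, it can help to fail with a message that shows where Python actually looked. A minimal sketch; `load_wind_columns` is a hypothetical helper name and the default column names are taken from the question.

```python
from pathlib import Path

import pandas as pd


def load_wind_columns(csv_path, columns=("U100", "V100")) -> pd.DataFrame:
    """Read selected columns, raising an error that shows the path Python tried."""
    path = Path(csv_path)
    if not path.is_absolute():
        # Relative paths are resolved against the current working directory,
        # not necessarily the folder that holds the script or notebook.
        path = Path.cwd() / path
    if not path.exists():
        raise FileNotFoundError(f"{path} not found; working directory is {Path.cwd()}")
    return pd.read_csv(path, usecols=list(columns))
```

If the error shows an unexpected working directory, either pass an absolute path or move the file there.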

python pandas read_excel returns an AssertionError: importing a file with images

I can't use the read_excel method from the pandas library in my IPython notebook.
After some testing and cleanup of the Excel file, I found that it contains an entire column of drawings (or images). When I deleted this column, the error message went away. Does anybody know how to configure read_excel so that it collects only the data? This is my code:
import pandas as pd
import os
# File selection
userfilepath = r'C:\Temp'
filename = "exportCS12.xlsx"
filenameCS12 = os.path.join(userfilepath, filename)
print(filenameCS12)
# workbook upload
df = pd.read_excel(filenameCS12, sheet_name='Sheet1')
The pandas import was not working because the Excel file was not clean. I solved the problem with openpyxl, which let me navigate the workbook and read only the validated areas.
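A minimal sketch of that openpyxl route, assuming the data sits in a known rectangular area whose first row holds the headers: open the workbook in read-only mode (which, as far as I know, streams cell data and does not load drawings), pull plain values from the chosen range, and build the dataframe yourself. `read_cell_range` is a hypothetical helper name, and the row/column bounds are yours to choose.

```python
import pandas as pd
from openpyxl import load_workbook


def read_cell_range(path, sheet="Sheet1", min_row=1, max_row=None, min_col=1, max_col=None):
    """Read plain cell values from a rectangular area into a DataFrame (first row = header)."""
    # read_only=True streams cells; data_only=True returns cached formula results
    wb = load_workbook(path, read_only=True, data_only=True)
    try:
        ws = wb[sheet]
        rows = list(
            ws.iter_rows(
                min_row=min_row, max_row=max_row,
                min_col=min_col, max_col=max_col,
                values_only=True,
            )
        )
    finally:
        wb.close()  # read-only workbooks keep the file handle open until closed
    if not rows:
        return pd.DataFrame()
    header, *data = rows
    return pd.DataFrame(data, columns=header)
```

For the file in the question, passing a max_col that stops before the image column would keep that column out of the read entirely.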
