I am trying to read an Excel file from SharePoint into Python.
Q1: There are two URLs for the file. If I directly copy the link of the file, I get:
https://company.sharepoint.com/:x:/s/project/letters-numbers?e=lettersnumbers
If instead I click through the folders on the webpage one by one until I open the Excel file, the URL is:
https://company.sharepoint.com/:x:/r/sites/project/_layouts/15/Doc.aspx?sourcedoc=letters-numbers&file=Table.xlsx&action=default&mobileredirect=true
Which one should I use?
Q2: My code below:
import io

import pandas as pd
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File

URL = "https://company.sharepoint.com/:x:/s/project/letters-numbers?e=lettersnumbers"
USERNAME = "abc@a.com"
PASSWORD = "abcd"
ctx_auth = AuthenticationContext(URL)
if ctx_auth.acquire_token_for_user(USERNAME, PASSWORD):
    ctx = ClientContext(URL, ctx_auth)
    web = ctx.web
    ctx.load(web)
    ctx.execute_query()
    print("Authentication successful")
else:
    print(ctx_auth.get_last_error())

response = File.open_binary(ctx, URL)
bytes_file_obj = io.BytesIO()
bytes_file_obj.write(response.content)
bytes_file_obj.seek(0)
df = pd.read_excel(bytes_file_obj, sheet_name="Sheet2")
It works until pd.read_excel(), where I get:
ValueError: Excel file format cannot be determined, you must specify an engine manually.
I don't know where it went wrong, or whether there will be further problems with loading. I would appreciate it if someone could warn me of the pitfalls or leave an example.
If you take a look at the pandas documentation for read_excel (https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html), you'll see that there is an engine parameter.
Try the different options and see which one works, since your error says that an engine has to be specified manually.
If this fixes it, then in the future take error messages literally and check the documentation.
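For example, a minimal self-contained sketch of passing the engine explicitly (openpyxl handles .xlsx; the in-memory workbook here is just so the snippet runs on its own):

```python
import io

import pandas as pd

# Build a small .xlsx in memory so the example is self-contained.
buf = io.BytesIO()
pd.DataFrame({"a": [1, 2]}).to_excel(buf, index=False, engine="openpyxl")
buf.seek(0)

# Specify the engine explicitly, as the error message suggests.
df = pd.read_excel(buf, engine="openpyxl")
```

Note that if the downloaded bytes are not actually an Excel file (e.g. an HTML error page), every engine will fail, which is a hint that the download itself went wrong.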
I have tried different URLs (and different ways of obtaining them) and received different binary files. They were either an HTTP status code (like 403), a warning, or something that looks like a header. So I believe the problem is the URL format.
Here (github.com/vgrem) I found the answer.
It basically says that ClientContext needs an absolute URL,
URL = "https://company.sharepoint.com/:x:/r/sites/project"
and File needs a server-relative path that overlaps with that URL:
RELATIVE_PATH = "/sites/project/Shared%20Documents/Folder/Table.xlsx"
The RELATIVE_PATH can be found like this:
1. Go to the folder of the file in Teams (or on the webpage).
2. Choose the file and select Open in app (Excel).
3. In Excel, go to File -> Properties, copy the path, and adapt it to the format above.
4. Replace each space with "%20".
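The last two steps can also be done programmatically (a sketch; the path below is the hypothetical one from above, as copied out of Excel's file properties):

```python
from urllib.parse import quote, urlparse

# Hypothetical document URL copied from Excel's file properties.
full_path = "https://company.sharepoint.com/sites/project/Shared Documents/Folder/Table.xlsx"

# Keep only the server-relative part and percent-encode the spaces ("/" stays as-is).
relative_path = quote(urlparse(full_path).path)
```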
ctx_auth = AuthenticationContext(URL)
if ctx_auth.acquire_token_for_user(USERNAME, PASSWORD):
    ctx = ClientContext(URL, ctx_auth)
    web = ctx.web
    ctx.load(web)
    ctx.execute_query()
    print("Authentication successful")
else:
    print(ctx_auth.get_last_error())

response = File.open_binary(ctx, RELATIVE_PATH)
bytes_file_obj = io.BytesIO()
bytes_file_obj.write(response.content)
bytes_file_obj.seek(0)
df = pd.read_excel(bytes_file_obj, sheet_name="Sheet2")
Note: if sheet_name=None is passed and the .xlsx has multiple sheets, pd.read_excel() loads all of them, and the df here is actually a dict of DataFrames keyed by sheet name, not a single DataFrame.
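A self-contained illustration of the dict behaviour (a sketch that builds a two-sheet workbook in memory, so sheet and column names here are placeholders):

```python
import io

import pandas as pd

# Build a workbook with two sheets in memory.
buf = io.BytesIO()
with pd.ExcelWriter(buf, engine="openpyxl") as writer:
    pd.DataFrame({"a": [1]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"b": [2]}).to_excel(writer, sheet_name="Sheet2", index=False)
buf.seek(0)

# sheet_name=None loads every sheet: the result is a dict, not a DataFrame.
sheets = pd.read_excel(buf, sheet_name=None)
```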
I have encountered a problem while building my website with Flask.
I have a CSV file that I converted to a DataFrame, and I want to return it as CSV to the user so they can download it.
However, when I do that, the Arabic letters do not show correctly, only symbols. I have tried different approaches, but unfortunately none of them work.
@flask_app.route('/getPlotCSV')
def download_file():
    path = "marked_test_df.csv"
    # path.encode('utf-8-sig')
    try:
        df = pd.read_csv(path, encoding='utf-8-sig')
        df_ob = df.to_csv(encoding='utf-8-sig')
        resp = make_response(df_ob)
        resp.headers["Content-Disposition"] = "attachment; filename=marked_test_df.csv"
        # resp.charset = 'utf-8-sig'
        # resp.iter_encoded = 'utf-8-sig'
        resp.headers["Content-Type"] = "text/csv; charset=utf-16"
        resp.content_encoding = 'utf-16'
        os.remove(path)
    except Exception as error:
        flask_app.logger.error("Error removing or closing downloaded file handle", error)
    return resp
Here is a sample of the CSV before the user downloads it (screenshot not included).
However, when the user downloads it through the download button, it shows garbled symbols instead (screenshot not included).
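One likely cause is the mismatch between the UTF-8 body and the declared utf-16 charset, plus the missing byte-order mark that Excel uses to detect UTF-8. A minimal sketch of a route that keeps everything UTF-8 with a BOM (route and file names are taken from the question; this is not the asker's confirmed fix):

```python
import pandas as pd
from flask import Flask, make_response

flask_app = Flask(__name__)

@flask_app.route('/getPlotCSV')
def download_file():
    df = pd.read_csv("marked_test_df.csv", encoding="utf-8-sig")
    # Prepend a BOM so Excel recognises the download as UTF-8 and renders Arabic text.
    body = "\ufeff" + df.to_csv(index=False)
    resp = make_response(body)
    resp.headers["Content-Disposition"] = "attachment; filename=marked_test_df.csv"
    resp.headers["Content-Type"] = "text/csv; charset=utf-8"
    return resp
```

The key point is that the declared charset, the encoded body, and the BOM all agree.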
I am an absolute beginner when it comes to working with REST APIs in Python. We have received a SharePoint URL which has multiple folders and multiple files inside those folders in the 'Documents' section. I have been provided an 'app_id' and a 'secret_token'.
I am trying to access the .csv files and read them as DataFrames to perform operations on them.
The code for the operations is ready; I downloaded the .csv files and did it locally, but I need help connecting to SharePoint from Python so that I don't have to download such heavy files ever again.
I know there have already been multiple questions about this on Stack Overflow, but none helped me get to where I want.
I did the following, and I am unsure of what to do next:
import json
from office365.runtime.auth.user_credential import UserCredential
from office365.sharepoint.client_context import ClientContext
from office365.runtime.http.request_options import RequestOptions
site_url = "https://<company-name>.sharepoint.com"
ctx = ClientContext(site_url).with_credentials(UserCredential("{app_id}", "{secret_token}"))
For site_url above, should I use the whole URL, or is it fine up to ####.com?
This is what I have so far. Next, I want to read files from the respective folders and convert them into DataFrames. The files will always be in .csv format.
The example hierarchy of the folders are as follows:
Documents --> Folder A, Folder B
Folder A --> a1.csv, a2.csv
Folder B --> b1.csv, b2.csv
I should be able to move to whichever folder I want and read the files based on my requirement.
Thanks for the help.
This works for me, using a Sharepoint App Identity with an associated client Id and client Secret.
First, I demonstrate authenticating and reading a specific file, then getting a list of files from a folder and reading the first one.
import pandas as pd
import json
import io
from office365.runtime.auth.client_credential import ClientCredential
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
#Authentication (shown for a 'modern teams site', but I think it should work for a company.sharepoint.com site too):
site="https://<myteams.companyname.com>/sites/<site name>/<sub-site name>"
#Read credentials from a json configuration file:
spo_conf = json.load(open(r"conf\spo.conf", "r"))
client_credentials = ClientCredential(spo_conf["RMAppID"]["clientId"],spo_conf["RMAppID"]["clientSecret"])
ctx = ClientContext(site).with_credentials(client_credentials)
#Read a specific CSV file into a dataframe:
folder_relative_url = "/sites/<site name>/<sub site>/<Library Name>/<Folder Name>"
filename = "MyFileName.csv"
response = File.open_binary(ctx, "/".join([folder_relative_url, filename]))
df = pd.read_csv(io.BytesIO(response.content))
#Get a list of file objects from a folder and read one into a DataFrame:
def getFolderContents(relativeUrl):
    contents = []
    library = ctx.web.get_list(relativeUrl)
    all_items = library.items.filter("FSObjType eq 0").expand(["File"]).get().execute_query()
    for item in all_items:  # type: ListItem
        cur_file = item.file
        contents.append(cur_file)
    return contents
fldrContents = getFolderContents('/sites/<site name>/<sub site>/<Library Name>')
response2 = File.open_binary(ctx, fldrContents[0].serverRelativeUrl)
df2 = pd.read_csv(io.BytesIO(response2.content))
Some References:
Related SO thread.
Office365 library github site.
Getting a list of contents in a doc library folder.
Additional notes following up on comments:
The site path does not include the full URL of the site home page (ending in .aspx); it just ends with the name of the site (or sub-site, if relevant to your case).
You don't need to use a configuration file to store your authentication credentials for the Sharepoint application identity - you could just replace spo_conf["RMAppID"]["clientId"] with the value for the Sharepoint-generated client Id and do similarly for the client Secret. But this is a simple example of what the text of a JSON file could look like:
{
    "MyAppName": {
        "clientId": "my-client-id",
        "clientSecret": "my-client-secret",
        "title": "name_for_application"
    }
}
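Reading that configuration back then looks like this (a sketch; the key "MyAppName" follows the sample JSON above rather than the "RMAppID" key used in the earlier code, and the JSON is inlined so the snippet runs on its own):

```python
import json

# Sample configuration text matching the JSON shape shown above.
conf_text = '''
{
    "MyAppName": {
        "clientId": "my-client-id",
        "clientSecret": "my-client-secret",
        "title": "name_for_application"
    }
}
'''

spo_conf = json.loads(conf_text)
client_id = spo_conf["MyAppName"]["clientId"]
client_secret = spo_conf["MyAppName"]["clientSecret"]
```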
I am trying to build a simple Streamlit app where I upload a CSV file, load it into a DataFrame, display the DataFrame, and then upload it to a pre-defined FTP server.
The first part works: the file is successfully uploaded and visualized. But then I cannot upload it to the FTP server. This is my code:
import ftplib
import pandas as pd
import streamlit as st
ftp_server = "ftp.test.com"
ftp_username = "user"
ftp_password = "password"
input_file = st.file_uploader("Upload a CSV File",type=['csv'])
if (input_file is not None) and input_file.name.endswith(".csv"):
    df = pd.read_csv(input_file, delimiter="\t", encoding='ISO-8859-1')
    st.dataframe(df)
    session = ftplib.FTP(ftp_server, ftp_username, ftp_password)
    file = open(input_file, "rb")
    session.storbinary(input_file.name, input_file)
    input_file.close()
    session.quit()
    st.success(f"The {input_file.name} was successfully uploaded to the FTP server: {ftp_server}!")
I am getting this error:
TypeError: expected str, bytes or os.PathLike object, not UploadedFile
I am using Streamlit v.1.1.0.
Please note that I have simplified my code and replaced the FTP credentials. In the real world, I would probably use try/except for the session connection, etc.
I guess you get the error here:
file = open(input_file, "rb")
That line is both wrong and useless (you never use file). Remove it.
You might also need to seek input_file back to the beginning after read_csv has consumed it:
input_file.seek(0)
Finally, you are missing the upload command (STOR) in the storbinary call:
session.storbinary("STOR " + input_file.name, input_file)
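Putting the three fixes together, the upload step might look like this (a sketch; the helper name and the in-memory buffer are mine, not from the question, and with Streamlit `session` would be a real ftplib.FTP connection and `data` the UploadedFile itself):

```python
import ftplib
import io

def upload_csv(session, name, data):
    """Upload a file-like object over FTP, applying the fixes above."""
    data.seek(0)                   # rewind: read_csv has already consumed the stream
    cmd = "STOR " + name           # storbinary needs the full STOR command
    session.storbinary(cmd, data)  # no open() call on the uploaded file object
    return cmd
```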
I am trying to save data pulled from a PostgreSQL db into a designated MS SharePoint folder. First I retrieve the data from the local db, then I need to save it to the SharePoint folder. I tried using the office365 API, but no data was saved to SharePoint. Does anyone have experience doing this in Python? Any workaround or thoughts?
My current attempt:
first, I did pull up data from local postgresql db as follow:
from sqlalchemy import create_engine
import pandas as pd
import os.path
hostname = 'localhost'
database_name = 'postgres'
user = 'kim'
pw = 'password123'
engine = create_engine('postgresql+psycopg2://'+user+':'+pw+'@'+hostname+'/'+database_name)
sql = """ select * from mytable """
with engine.connect() as conn:
    df = pd.read_sql_query(sql, con=conn)
then, I tried to store/save the data to designated sharepoint folder as follow:
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
url_shrpt = 'https://xyzcompany.sharepoint.com/_layouts/15/sharepoint.aspx?'
username_shrpt = 'kim@xyzcompany.com'
password_shrpt = 'password123'
folder_url_shrpt = 'https://xyzcompany.sharepoint.com/:f:/g/EnIh9jxkDVpOsTnAUbo-LvIBdsN0X_pJifX4_9Rx3rchnQ'
ctx_auth = AuthenticationContext(url_shrpt)
if ctx_auth.acquire_token_for_user(username_shrpt, password_shrpt):
    ctx = ClientContext(url_shrpt, ctx_auth)
    web = ctx.web
    ctx.load(web)
    ctx.execute_query()
else:
    print(ctx_auth.get_last_error())

response = File.open_binary(ctx, df)
with open("Your_Offline_File_Path", 'wb') as output_file:
    output_file.write(response.content)
but the file was not saved to the SharePoint folder.
Objective: I want to write the data pulled from the PostgreSQL db to a SharePoint folder. The attempt above didn't save anything. Can anyone suggest a possible way of doing this in Python?
I think you should write the CSV files locally first, then do the following to upload them to the SharePoint folder:
import os
from pathlib import Path

from shareplum import Site
from shareplum import Office365
from shareplum.site import Version

UN = "myself@xyzcompany.com"
PW = "hello#"
authcookie = Office365('https://xyzcompany.sharepoint.com', username=UN, password=PW).GetCookies()
site = Site('https://xyzcompany.sharepoint.com/sites/sample_data/', version=Version.v365, authcookie=authcookie)
folder = site.Folder('Shared Documents/New Folder')
files = Path(os.getcwd()).glob('*.csv')
for file in files:
    with open(file, mode='rb') as rowFile:
        fileContent = rowFile.read()
    folder.upload_file(fileContent, os.path.basename(file))
This is a working solution for uploading files to a SharePoint folder.
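The "write CSV files locally first" step from the answer above can be as simple as the following (a sketch; the DataFrame and file name are placeholders standing in for the data pulled from PostgreSQL):

```python
import pandas as pd

# Placeholder DataFrame standing in for the result of the PostgreSQL query.
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# Write it to the current directory so the *.csv glob in the upload loop picks it up.
df.to_csv("mytable.csv", index=False)
```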
A slight variation on what @Jared had above: for example, if one wants to create a folder named after a date and upload files to it from a location other than the root folder on the user's computer. This will be handy for people interested in such a solution, a problem I had.
import glob
import os

from shareplum import Site
from shareplum import Office365
from shareplum.site import Version
import pendulum  # install it for manipulation of dates

todaysdate = pendulum.now()  # get today's date
foldername1 = todaysdate.strftime('%d-%m-%Y')  # folder name in a format such as 19-06-2021
UN = "myself@xyzcompany.com"
PW = "hello#"
path = r"C:\xxxx\xxx\xxx"  # path where the files to be uploaded are stored
doc_library = "xxxxx/yyyy"  # library where the new folder (foldername1) will be created
authcookie = Office365('https://xyzcompany.sharepoint.com', username=UN, password=PW).GetCookies()
site = Site('https://xyzcompany.sharepoint.com/sites/sample_data/', version=Version.v365, authcookie=authcookie)
folder = site.Folder(doc_library + '/' + foldername1)  # creates the new folder matching today's date
files = glob.glob(path + "\\*.csv")
for file in files:
    with open(file, mode='rb') as rowFile:
        fileContent = rowFile.read()
    folder.upload_file(fileContent, os.path.basename(file))
That's a solution that will work well for anyone looking around for such code.
I have a requirement to download files from and upload files to SharePoint sites. This has to be done using Python.
My site is of the form https://ourOrganizationName.sharepoint.com/Followed by Further links
Initially I thought I could do this using requests, BeautifulSoup, etc., but I am not able to "Inspect Element" on the body of the site at all.
I have tried libraries such as sharepoint, HttpNtlmAuth, and office365, but without success; it always returns 403.
I have googled as much as I can, again without success. Even YouTube hasn't helped.
Could anyone help me with how to do this? Suggestions for libraries, with documentation links, are really appreciated.
Thanks
Have you tried the Office365-REST-Python-Client library? It supports SharePoint Online authentication and allows you to download/upload a file as demonstrated below:
Download a file
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
ctx_auth = AuthenticationContext(url)
ctx_auth.acquire_token_for_user(username, password)
ctx = ClientContext(url, ctx_auth)
response = File.open_binary(ctx, "/Shared Documents/User Guide.docx")
with open("./User Guide.docx", "wb") as local_file:
    local_file.write(response.content)
Upload a file
import os

ctx_auth = AuthenticationContext(url)
ctx_auth.acquire_token_for_user(username, password)
ctx = ClientContext(url, ctx_auth)
path = "./User Guide.docx"  # local path
with open(path, 'rb') as content_file:
    file_content = content_file.read()
target_url = "/Shared Documents/{0}".format(os.path.basename(path))  # target url of the file
File.save_binary(ctx, target_url, file_content)  # upload the file
Usage
Install the latest version (from GitHub):
pip install git+https://github.com/vgrem/Office365-REST-Python-Client.git
Refer to /examples/sharepoint/files/* for more details.
You can also try this solution for uploading a file. For me, the upload part of the first solution didn't work.
First step: pip3 install Office365-REST-Python-Client==2.3.11
import os

from office365.sharepoint.client_context import ClientContext
from office365.runtime.auth.user_credential import UserCredential

def print_upload_progress(offset):
    print("Uploaded '{0}' bytes out of '{1}'...[{2}%]".format(offset, file_size, round(offset / file_size * 100, 2)))

# Load the file to upload ('filename' holds the file's name, defined elsewhere):
path = './' + filename  # if the file to upload is in the same directory
try:
    with open(path, 'rb') as content_file:
        file_content = content_file.read()
except Exception as e:
    print(e)

file_size = os.path.getsize(path)
site_url = "https://YOURDOMAIN.sharepoint.com"
user_credentials = UserCredential('user_login', 'user_password')  # this user must be able to log in to the site
ctx = ClientContext(site_url).with_credentials(user_credentials)
size_chunk = 1000000
target_url = "/sites/folder1/folder2/folder3/"
target_folder = ctx.web.get_folder_by_server_relative_url(target_url)

# Upload the file to SharePoint in chunks:
try:
    uploaded_file = target_folder.files.create_upload_session(path, size_chunk, print_upload_progress).execute_query()
    print('File {0} has been uploaded successfully'.format(uploaded_file.serverRelativeUrl))
except Exception as e:
    print("Error while uploading to SharePoint:\n", e)
Based on: https://github.com/vgrem/Office365-REST-Python-Client/blob/e2b089e7a9cf9a288204ce152cd3565497f77215/examples/sharepoint/files/upload_large_file.py