I am trying to find any way possible to get a SharePoint list into Python. I was able to connect to SharePoint and get the XML data using the REST API via this video: https://www.youtube.com/watch?v=dvFbVPDQYyk, but I'm not sure how to get the list data into Python. The ultimate goal is to get the SharePoint data and import it into SSMS daily.
Here is what I have so far:
import requests
from requests_ntlm import HttpNtlmAuth
url='URL would go here'
username='username would go here'
password='password would go here'
r=requests.get(url, auth=HttpNtlmAuth(username,password),verify=False)
I believe these would be the next steps. I really only need help getting the data from SharePoint, preferably in Excel/CSV format, and I should be fine from there. Any recommendations would be helpful.
#PARSE XML VIA REST API
#PRINT INTO DATAFRAME AND CONVERT INTO CSV
#IMPORT INTO SQL SERVER
#EMAIL RESULTS
from shareplum import Site
from requests_ntlm import HttpNtlmAuth
server_url = "https://sharepoint.xxx.com/"
site_url = server_url + "sites/org/"
auth = HttpNtlmAuth('xxx\\user', 'pwd')
site = Site(site_url, auth=auth, verify_ssl=False)
sp_list = site.List('list name in my share point')
data = sp_list.GetListItems('All Items', rowlimit=200)
This can be done using SharePlum and pandas. The following is a working code snippet:
import pandas as pd  # pandas will write the SharePoint list to Excel or CSV
from shareplum import Site
from requests_ntlm import HttpNtlmAuth

cred = HttpNtlmAuth('userid_here', 'password_here')
site = Site('sharepoint_url_here', auth=cred)
sp_list = site.List('sharepoint_list_name_here')  # this creates a SharePlum list object
data = sp_list.GetListItems('All Items')  # this retrieves all items from the list
# This creates a pandas DataFrame; you can perform any operation you like
# within pandas' capabilities.
data_df = pd.DataFrame(data)
data_df.to_excel("data.xlsx")
I know this doesn't directly answer your question (and you probably have an answer by now), but I would give the SharePlum library a try. It should hopefully simplify the process you have for interacting with SharePoint.
Also, I am not sure if you have a requirement to export the data into a CSV, but you can connect directly to SQL Server and insert your data more directly.
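For example, a minimal sketch of the direct-to-SQL-Server idea, assuming a DataFrame built from the SharePlum results and the SQLAlchemy/pyodbc stack; the driver, server, database, and table names here are placeholders:

import urllib
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection details -- adjust the driver, server, and database
# to your environment (this assumes ODBC Driver 17 and Windows authentication).
params = urllib.parse.quote_plus(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;"
)
engine = create_engine("mssql+pyodbc:///?odbc_connect=" + params)

# 'data' would be the list of items returned by sp_list.GetListItems(...)
df = pd.DataFrame(data)
df.to_sql("sharepoint_list", engine, if_exists="replace", index=False)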
I would have just added this into the comments but don't have a high enough reputation yet.
I can help with most of these issues:
import requests
import xml.etree.ElementTree as ET
import csv
from requests_ntlm import HttpNtlmAuth
response = requests.get("your_url", auth=HttpNtlmAuth('xxxx\\username','password'))
tree = ET.ElementTree(ET.fromstring(response.content))
tree.write('file_name_xml.xml')
root = tree.getroot()
# Create the CSV file
csv_file = open('file_name_csv.csv', 'w', newline='', encoding='ansi')  # 'ansi' is a Windows-only encoding alias
csvwriter = csv.writer(csv_file)
col_names = ['Col_1', 'Col_2', 'Col_3', 'Col_n']  # your output column headers
csvwriter.writerow(col_names)
field_tag = ['dado_1', 'dado_2', 'dado_3', 'dado_n']  # internal names of the SharePoint list fields
# Microsoft XML schema namespaces
ns0 = "http://www.w3.org/2005/Atom"
ns1 = "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
ns2 = "http://schemas.microsoft.com/ado/2007/08/dataservices"
for member in root:
    if member.tag == '{' + ns0 + '}entry':
        for element in member:
            if element.tag == '{' + ns0 + '}content':
                data_line = []
                for field in element[0]:
                    for count in range(0, len(field_tag)):
                        if field.tag == '{' + ns2 + '}' + field_tag[count]:
                            data_line.append(field.text)
                csvwriter.writerow(data_line)
csv_file.close()
I am trying to read an Excel file from SharePoint into Python.
Q1: There are two URLs for the file. If I directly copy the link to the file, I get:
https://company.sharepoint.com/:x:/s/project/letters-numbers?e=lettersnumbers
If I click through the folders on the webpage one after another until I open the Excel file, the URL is:
https://company.sharepoint.com/:x:/r/sites/project/_layouts/15/Doc.aspx?sourcedoc=letters-numbers&file=Table.xlsx&action=default&mobileredirect=true
Which one should I use?
Q2: My code below:
import io
import pandas as pd
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File

URL = "https://company.sharepoint.com/:x:/s/project/letters-numbers?e=lettersnumbers"
USERNAME = "abc@a.com"
PASSWORD = "abcd"

ctx_auth = AuthenticationContext(URL)
if ctx_auth.acquire_token_for_user(USERNAME, PASSWORD):
    ctx = ClientContext(URL, ctx_auth)
    web = ctx.web
    ctx.load(web)
    ctx.execute_query()
    print("Authentication successful")
else:
    print(ctx_auth.get_last_error())

response = File.open_binary(ctx, URL)
bytes_file_obj = io.BytesIO()
bytes_file_obj.write(response.content)
bytes_file_obj.seek(0)
df = pd.read_excel(bytes_file_obj, sheet_name="Sheet2")
It works until the pd.read_excel(), where I get ValueError.
ValueError: Excel file format cannot be determined, you must specify an engine manually.
I don't know where it went wrong or whether there will be further problems with loading. It would be highly appreciated if someone could warn me of the problems or leave an example.
If you take a look at the pandas documentation for 'read_excel' (https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html), you'll see that there is an 'engine' parameter.
Try the different options and see which one works, since your error says that an engine has to be specified manually.
If this is correct, then in the future take the error messages literally and check the documentation.
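For instance, a minimal sketch, assuming the downloaded bytes really are an .xlsx file and openpyxl is installed:

# Name the engine explicitly so pandas doesn't have to infer the format
# from an extension-less BytesIO object.
df = pd.read_excel(bytes_file_obj, sheet_name="Sheet2", engine="openpyxl")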
I have tried different URLs (and different ways to obtain them) and received different binary files. They contain either an HTTP status code (like 403), a warning, or something that looks like a header. So I believe the problem is the URL format.
Here (github.com/vgrem) I found the answer.
It basically says that for ClientContext you need an absolute URL,
URL = "https://company.sharepoint.com/:x:/r/sites/project"
And for File you need a server-relative path that overlaps with the URL:
RELATIVE_PATH = "/sites/project/Shared%20Documents/Folder/Table.xlsx"
The RELATIVE_PATH can be found like this:
1. Go to the folder of the file in Teams (or on the webpage).
2. Choose the file and Open in app (Excel).
3. In Excel, go to File -> Properties, copy the path, and adapt it to the above format.
4. Replace spaces with "%20".
ctx_auth = AuthenticationContext(URL)
if ctx_auth.acquire_token_for_user(USERNAME, PASSWORD):
    ctx = ClientContext(URL, ctx_auth)
    web = ctx.web
    ctx.load(web)
    ctx.execute_query()
    print("Authentication successful")
else:
    print(ctx_auth.get_last_error())

response = File.open_binary(ctx, RELATIVE_PATH)
bytes_file_obj = io.BytesIO()
bytes_file_obj.write(response.content)
bytes_file_obj.seek(0)
df = pd.read_excel(bytes_file_obj, sheet_name='Sheet2')
If sheet_name is not specified (or sheet_name=None is passed) and the original .xlsx has multiple sheets, pd.read_excel() may generate warnings, and the df here is actually a dict of DataFrames rather than a single DataFrame.
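For example, a small sketch of that behavior (passing sheet_name=None explicitly forces the dict form):

# sheet_name=None loads every sheet; the result is a dict keyed by sheet name
dfs = pd.read_excel(bytes_file_obj, sheet_name=None)
df = dfs["Sheet2"]  # pick one sheet out of the dict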
I am an absolute beginner when it comes to working with REST APIs in Python. We have received a SharePoint URL which has multiple folders, with multiple files inside those folders, in the 'Documents' section. I have been provided an 'app_id' and a 'secret_token'.
I am trying to access the .csv files, read them as dataframes, and perform operations on them.
The code for the operations is ready after I downloaded the .csv files and worked locally, but I need help connecting to SharePoint with Python so that I don't have to download such heavy files ever again.
I know there have been multiple questions about this on Stack Overflow already, but none helped me get to where I want.
I did the following and am unsure of what to do next:
import json
from office365.runtime.auth.user_credential import UserCredential
from office365.sharepoint.client_context import ClientContext
from office365.runtime.http.request_options import RequestOptions
site_url = "https://<company-name>.sharepoint.com"
ctx = ClientContext(site_url).with_credentials(UserCredential("{app_id}", "{secret_token}"))
For site_url above, should I use the whole URL, or is it fine up to ####.com?
This is what I have so far; next I want to read files from the respective folders and convert them into a dataframe. The files will always be in .csv format.
The example hierarchy of the folders are as follows:
Documents --> Folder A, Folder B
Folder A --> a1.csv, a2.csv
Folder B --> b1.csv, b2.csv
I should be able to move to whichever folder I want and read the files based on my requirement.
Thanks for the help.
This works for me, using a Sharepoint App Identity with an associated client Id and client Secret.
First, I demonstrate authenticating and reading a specific file, then getting a list of files from a folder and reading the first one.
import pandas as pd
import json
import io
from office365.sharepoint.client_context import ClientCredential
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
#Authentication (shown for a 'modern teams site', but I think it should work for a company.sharepoint.com site too):
site="https://<myteams.companyname.com>/sites/<site name>/<sub-site name>"
#Read credentials from a json configuration file:
spo_conf = json.load(open(r"conf\spo.conf", "r"))
client_credentials = ClientCredential(spo_conf["RMAppID"]["clientId"],spo_conf["RMAppID"]["clientSecret"])
ctx = ClientContext(site).with_credentials(client_credentials)
#Read a specific CSV file into a dataframe:
folder_relative_url = "/sites/<site name>/<sub site>/<Library Name>/<Folder Name>"
filename = "MyFileName.csv"
response = File.open_binary(ctx, "/".join([folder_relative_url, filename]))
df = pd.read_csv(io.BytesIO(response.content))
#Get a list of file objects from a folder and read one into a DataFrame:
def getFolderContents(relativeUrl):
    contents = []
    library = ctx.web.get_list(relativeUrl)
    all_items = library.items.filter("FSObjType eq 0").expand(["File"]).get().execute_query()
    for item in all_items:  # type: ListItem
        cur_file = item.file
        contents.append(cur_file)
    return contents
fldrContents = getFolderContents('/sites/<site name>/<sub site>/<Library Name>')
response2 = File.open_binary(ctx, fldrContents[0].serverRelativeUrl)
df2 = pd.read_csv(io.BytesIO(response2.content))
Some References:
Related SO thread.
Office365 library github site.
Getting a list of contents in a doc library folder.
Additional notes following up on comments:
The site path doesn't include the full URL for the site home page (ending in .aspx); it just ends with the name of the site (or sub-site, if relevant to your case).
You don't need to use a configuration file to store your authentication credentials for the SharePoint application identity; you could just replace spo_conf["RMAppID"]["clientId"] with the value of the SharePoint-generated client Id, and do similarly for the client Secret. But this is a simple example of what the text of a JSON file could look like:
{
    "MyAppName": {
        "clientId": "my-client-id",
        "clientSecret": "my-client-secret",
        "title": "name_for_application"
    }
}
I'm using the simple-smartsheet library to read data from a sheet in Smartsheet and download the existing attachments on each row of the sheet.
I can already read the data for each row; however, I cannot download the existing attachments.
import config
from simple_smartsheet import Smartsheet

# Assuming config holds the API token; creating the client was omitted
# from the original snippet.
smartsheet = Smartsheet(config.API_TOKEN)
sheet = smartsheet.sheets.get(id=config.SHEET_ID)
for row in sheet.rows:
    attachments = row.attachments
    print(attachments)
When executing the above code, I get this as a result:
[]
I use the simple-smartsheet library as it is the only one that supports Python versions 3.6+.
My Python version is 3.7.5.
You can use list_row_attachments (from the official Smartsheet Python SDK) to find information about the attachments that belong to a row.
The code might look like this:
import config
import smartsheet  # official Smartsheet SDK, which provides Attachments.list_row_attachments
from simple_smartsheet import Smartsheet

smart = Smartsheet(config.API_TOKEN)  # simple-smartsheet client, used to read the rows
smartsheet_client = smartsheet.Smartsheet(config.API_TOKEN)  # official SDK client, used for attachments

sheet = smart.sheets.get(id=config.SHEET_ID)
for row in sheet.rows:
    response = smartsheet_client.Attachments.list_row_attachments(
        config.SHEET_ID,
        row.id,
        include_all=True
    )
    attachments = response.data
    print(attachments)
My solution is not very pythonic, but it works. It consists of two steps:
1. Get the attachment links.
2. Save the file to a local HDD (I'm doing backups too) as a pivot place.
1. To get the list of attachments:
import smartsheet
import urllib.request

smart = smartsheet.Smartsheet()  # with no argument, the SDK reads the SMARTSHEET_ACCESS_TOKEN environment variable
att_list = smart.Attachments.list_all_attachments(<sheet_id>, include_all=True).data  # .data holds the Attachment objects
2. To download the attachments to the local disk, create a loop to go through the list of attachments; you can also add your own conditions to choose which ones to download:
for attach in att_list:
    att_id = attach.id  # get the id of the attachment
    att_name = attach.name  # get the name of the attachment
    retrieve_att = smart.Attachments.get_attachment(<sheet_id>, att_id)  # fetches the attachment metadata
    dest_dir = "C:\\path\\to\\folder\\"
    dest_file = dest_dir + str(att_name)  # building the destination path
    dwnld_url = retrieve_att.url  # this link gives you access to download the file; it expires after roughly 5 to 10 minutes
    urllib.request.urlretrieve(dwnld_url, dest_file)  # retrieving the attachment and saving it locally
Now you have the file, and you can do whatever you need with it.
It looks like that library has not implemented logic for dealing with attachments yet.
As an alternative way of solving this problem, I implemented a solution with the code below:
import requests

# token = 'Your Smartsheet token'
# sheetId = 'Your sheet id'
# rowId = 'The row id'
r = requests.get(
    f'https://api.smartsheet.com/2.0/sheets/{sheetId}/rows/{rowId}/attachments',
    headers={'Authorization': f'Bearer {token}'}
)  # the URL must be an f-string so sheetId and rowId are interpolated
response_json = r.json()
print(response_json)
See Get Attachments in the Smartsheet API documentation for more details on handling attachments.
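For completeness, a possible download step in the same REST style. This is a sketch that assumes the listing above returned at least one attachment and that token and sheetId are defined as in the snippet; per the Smartsheet API, fetching a single attachment returns a short-lived url field you can download from:

att_id = response_json['data'][0]['id']  # assumes at least one attachment exists
att = requests.get(
    f'https://api.smartsheet.com/2.0/sheets/{sheetId}/attachments/{att_id}',
    headers={'Authorization': f'Bearer {token}'}
).json()
file_bytes = requests.get(att['url']).content  # 'url' expires after a short time
with open(att['name'], 'wb') as out:
    out.write(file_bytes)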
I am trying to save data pulled from a PostgreSQL db to a designated MS SharePoint folder. To do so, I first retrieved the data from the local db; then I need to store/save this data in the SharePoint folder. I tried using the Office365 API to do this, but no data was saved in the SharePoint folder. Does anyone have similar experience doing this in Python? Is there any workaround to do this in Python? Any thoughts?
My current attempt:
First, I pulled the data from the local PostgreSQL db as follows:
from sqlalchemy import create_engine
import pandas as pd
import os.path

hostname = 'localhost'
database_name = 'postgres'
user = 'kim'
pw = 'password123'
engine = create_engine('postgresql+psycopg2://'+user+':'+pw+'@'+hostname+'/'+database_name)
sql = """ select * from mytable """
with engine.connect() as conn:
    df = pd.read_sql_query(sql, con=conn)
Then, I tried to store/save the data to the designated SharePoint folder as follows:
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File

url_shrpt = 'https://xyzcompany.sharepoint.com/_layouts/15/sharepoint.aspx?'
username_shrpt = 'kim@xyzcompany.com'
password_shrpt = 'password123'
folder_url_shrpt = 'https://xyzcompany.sharepoint.com/:f:/g/EnIh9jxkDVpOsTnAUbo-LvIBdsN0X_pJifX4_9Rx3rchnQ'

ctx_auth = AuthenticationContext(url_shrpt)
if ctx_auth.acquire_token_for_user(username_shrpt, password_shrpt):
    ctx = ClientContext(url_shrpt, ctx_auth)
    web = ctx.web
    ctx.load(web)
    ctx.execute_query()
else:
    print(ctx_auth.get_last_error())

response = File.open_binary(ctx, df)
with open("Your_Offline_File_Path", 'wb') as output_file:
    output_file.write(response.content)
But the file was not saved in the SharePoint folder. How should we save the data from PostgreSQL to the SharePoint folder using Python? Is there any workaround to do this? Any thoughts?
Objective:
I want to write the data pulled from the PostgreSQL db to the SharePoint folder. The attempt above didn't save the data there. Can anyone suggest a possible way of doing this?
I think you should write the CSV files locally, then try the following to upload them to the SharePoint folder (a sketch of the local-CSV step follows after the code):
import os
from pathlib import Path

from shareplum import Site
from shareplum import Office365
from shareplum.site import Version

UN = "myself@xyzcompany.com"
PW = "hello#"
authcookie = Office365('https://xyzcompany.sharepoint.com', username=UN, password=PW).GetCookies()
site = Site('https://xyzcompany.sharepoint.com/sites/sample_data/', version=Version.v365, authcookie=authcookie)
folder = site.Folder('Shared Documents/New Folder')
files = Path(os.getcwd()).glob('*.csv')
for file in files:
    with open(file, mode='rb') as rowFile:
        fileContent = rowFile.read()
    folder.upload_file(fileContent, os.path.basename(file))
This is an error-free and working solution; it should work for uploading files to a SharePoint folder.
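For the "write CSV files locally" step, a minimal sketch, assuming df is the DataFrame pulled from PostgreSQL earlier in the question:

# Hypothetical file name; writing to the current working directory
# lets the glob('*.csv') above pick the file up for upload.
df.to_csv('mytable.csv', index=False)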
A slight variation on what @Jared had up there: for example, if one wants to create a folder based on a date and upload files to it from a location other than the root folder on the user's computer. This will be handy for anyone interested in such a solution, a problem I had.
import glob
import os

from shareplum import Site
from shareplum import Office365
from shareplum.site import Version
import pendulum  # install it for manipulation of dates

todaysdate = pendulum.now()  # get today's date
foldername1 = todaysdate.strftime('%d-%m-%Y')  # folder name in a format such as 19-06-2021
UN = "myself@xyzcompany.com"
PW = "hello#"
path = r"C:\xxxx\xxx\xxx"  # path where the files to be uploaded are stored
doc_library = "xxxxx/yyyy"  # library where the new folder (foldername1) will be created
authcookie = Office365('https://xyzcompany.sharepoint.com', username=UN, password=PW).GetCookies()
site = Site('https://xyzcompany.sharepoint.com/sites/sample_data/', version=Version.v365, authcookie=authcookie)
folder = site.Folder(doc_library + '/' + foldername1)  # creates the new folder matching today's date
files = glob.glob(path + "\\*.csv")
for file in files:
    with open(file, mode='rb') as rowFile:
        fileContent = rowFile.read()
    folder.upload_file(fileContent, os.path.basename(file))
That's a solution that will work well for anyone looking around for such code.
I am currently using the code below to fetch a Salesforce report and try to write it to a CSV file. When I take the length of items it's 2000, but when I execute this code it produces a CSV file that only contains 55 rows in total. My guess is that something is off in the write function, but I am unsure.
Any suggestions would be appreciated.
import csv
from salesforce_reporting import Connection
import salesforce_reporting
sf = Connection(username='user',password='pw',security_token='token')
report = sf.get_report('report_id',details=True)
parser = salesforce_reporting.ReportParser(report)
items = parser.records()
with open("output.csv", "w") as f:
writer = csv.writer(f)
writer.writerows(items)
I was able to figure out that the issue was indeed in the writing aspect of my code. The code below will export your report without headers.
import csv
from salesforce_reporting import Connection
import salesforce_reporting

sf = Connection(username='user', password='pw', security_token='token')
report = sf.get_report('report_id', details=True)
parser = salesforce_reporting.ReportParser(report)
items = parser.records()

with open('test_output.csv', 'w', newline='') as f:  # newline='' prevents the csv module from writing blank rows on Windows
    writer = csv.writer(f)
    writer.writerows(items)
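If you also want column headers, a possible variation; this is hedged, since it assumes salesforce_reporting's ReportParser.records_dict(), which returns each row as a dict keyed by column name:

import csv
from salesforce_reporting import Connection, ReportParser

sf = Connection(username='user', password='pw', security_token='token')
parser = ReportParser(sf.get_report('report_id', details=True))
rows = parser.records_dict()  # one dict per report row, keyed by column name

if rows:
    with open('test_output_with_headers.csv', 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()  # writes the column names as the first row
        writer.writerows(rows)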