I am trying to save data pulled from a PostgreSQL db to a designated MS SharePoint folder. To do so, I first retrieve the data from the local db, then I need to store/save it to the SharePoint folder. I tried using the office365 api to do this, but no data was saved to the SharePoint folder. Does anyone have experience doing this in Python? Is there any workaround to do this in Python? Any thoughts?
My current attempt:
First, I pulled the data from the local PostgreSQL db as follows:
from sqlalchemy import create_engine
import pandas as pd
import os.path

hostname = 'localhost'
database_name = 'postgres'
user = 'kim'
pw = 'password123'

engine = create_engine('postgresql+psycopg2://' + user + ':' + pw + '@' + hostname + '/' + database_name)
sql = """ select * from mytable """

with engine.connect() as conn:
    df = pd.read_sql_query(sql, con=conn)
Then, I tried to store/save the data to the designated SharePoint folder as follows:
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File

url_shrpt = 'https://xyzcompany.sharepoint.com/_layouts/15/sharepoint.aspx?'
username_shrpt = 'kim@xyzcompany.com'
password_shrpt = 'password123'
folder_url_shrpt = 'https://xyzcompany.sharepoint.com/:f:/g/EnIh9jxkDVpOsTnAUbo-LvIBdsN0X_pJifX4_9Rx3rchnQ'

ctx_auth = AuthenticationContext(url_shrpt)
if ctx_auth.acquire_token_for_user(username_shrpt, password_shrpt):
    ctx = ClientContext(url_shrpt, ctx_auth)
    web = ctx.web
    ctx.load(web)
    ctx.execute_query()
else:
    print(ctx_auth.get_last_error())

response = File.open_binary(ctx, df)
with open("Your_Offline_File_Path", 'wb') as output_file:
    output_file.write(response.content)
But the file was not saved to the SharePoint folder. How can I save data from PostgreSQL to a SharePoint folder using Python? Is there any workaround to do this? Any thoughts?
Objective:
I want to write the data pulled from the PostgreSQL db to a SharePoint folder. My attempt above didn't save any data to the SharePoint folder. Can anyone suggest a possible way of doing this?
I think you should write the CSV files locally first, then try the following to upload them to the SharePoint folder:
from shareplum import Site
from shareplum import Office365
from shareplum.site import Version
from pathlib import Path
import os

UN = "myself@xyzcompany.com"
PW = "hello#"
authcookie = Office365('https://xyzcompany.sharepoint.com', username=UN, password=PW).GetCookies()
site = Site('https://xyzcompany.sharepoint.com/sites/sample_data/', version=Version.v365, authcookie=authcookie)
folder = site.Folder('Shared Documents/New Folder')

files = Path(os.getcwd()).glob('*.csv')
for file in files:
    with open(file, mode='rb') as rowFile:
        fileContent = rowFile.read()
        folder.upload_file(fileContent, os.path.basename(file))
This is a working solution and should work for uploading files to a SharePoint folder.
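For the "write the CSV files locally first" step mentioned above, here is a minimal sketch, assuming the df DataFrame from the question and a hypothetical output name mytable.csv:

import os

# Hypothetical file name; written next to the script so the glob('*.csv') above picks it up
df.to_csv(os.path.join(os.getcwd(), 'mytable.csv'), index=False)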
A slight variation on what @Jared posted above, for when one wants to create a folder based on a date and upload files to it from a location other than the root folder on the user's computer. This will be handy for people interested in such a solution, a problem I had.
from shareplum import Site
from shareplum import Office365
from shareplum.site import Version
import pendulum  # Install it for manipulation of dates
import glob
import os

todaysdate = pendulum.now()  # Get today's date
foldername1 = todaysdate.strftime('%d-%m-%Y')  # Folder name in a format such as 19-06-2021

UN = "myself@xyzcompany.com"
PW = "hello#"
path = r"C:\xxxx\xxx\xxx"  # Path where the files to be uploaded are stored.
doc_library = "xxxxx/yyyy"  # Folder where the new folder (foldername1) will be stored

authcookie = Office365('https://xyzcompany.sharepoint.com', username=UN, password=PW).GetCookies()
site = Site('https://xyzcompany.sharepoint.com/sites/sample_data/', version=Version.v365, authcookie=authcookie)
folder = site.Folder(doc_library + '/' + foldername1)  # Creates the new folder matching today's date.

files = glob.glob(path + "\\*.csv")
for file in files:
    with open(file, mode='rb') as rowFile:
        fileContent = rowFile.read()
        folder.upload_file(fileContent, os.path.basename(file))
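If you would rather avoid the extra pendulum dependency, the same folder name can be produced with the standard library datetime module:

from datetime import datetime
foldername1 = datetime.now().strftime('%d-%m-%Y')  # e.g. 19-06-2021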
That's a solution that will work well for anyone looking around for such code.
I am trying to read a PDF file which I have uploaded to an Azure storage account. I am trying to do this using Python.
I have tried using the SAS token/URL of the file and passing it through PDFMiner, but I am not able to get a path for the file that PDFMiner will accept. I am using something like the code below:
from azure.storage.filedatalake import DataLakeServiceClient
from azure.storage.filedatalake import generate_file_sas
import os

storage_account_name = "mystorageaccount"
storage_account_key = "mystoragekey"
container_name = "mycontainer"
directory_name = 'mydirectory'

service_client = DataLakeServiceClient(
    account_url="{}://{}.dfs.core.windows.net".format("https", storage_account_name),
    credential=storage_account_key)

file_system_client = service_client.get_file_system_client(file_system=container_name)
directory_client = file_system_client.get_directory_client(directory_name)
file_client = directory_client.get_file_client('XXX.pdf')

download = file_client.download_file()
downloaded_bytes = download.readall()

file_sas = generate_file_sas(account_name=storage_account_name,
                             file_system_name=container_name,
                             directory_name=directory_name,
                             file_name='XXX.pdf',
                             credential=storage_account_key)

from pdfminer.pdfpage import PDFPage

# Attempt 1: pass the downloaded bytes to open()
with open(downloaded_bytes, 'rb') as infile:
    PDFPage.get_pages(infile, check_extractable=False)

# Attempt 2: pass the SAS token to open()
with open(file_sas, 'rb') as infile:
    PDFPage.get_pages(infile, check_extractable=False)
Neither of these options works.
Initially the input_dir was set up locally, so the code was able to fetch the PDF file and read it.
Is there a different way to pass the URL/path of the file from the storage account to the pdf's read function?
Any help is appreciated.
I tried this in my environment and got the results below:
Initially, I tried the same process of reading the PDF directly from the Azure Data Lake storage account without downloading it, and got no results. But AFAIK, downloading the PDF first and then reading it is the workable approach.
I tried the code below to read the PDF file with the PyPDF2 module, and it executed successfully and printed the content.
Code:
from azure.storage.filedatalake import DataLakeFileClient
import PyPDF2

# Connect directly to the file in the Data Lake file system
file_client = DataLakeFileClient.from_connection_string(
    "<your storage connection string>",
    file_system_name="test",
    file_path="dem.pdf")

# Download the PDF to a local file
with open("dem.pdf", 'wb') as file:
    data = file_client.download_file()
    data.readinto(file)

# Read the downloaded PDF with PyPDF2
pdf_obj = open("dem.pdf", 'rb')
pdfread = PyPDF2.PdfFileReader(pdf_obj)
print("Number of pages:", pdfread.numPages)
pageObj = pdfread.getPage(0)
print(pageObj.extractText())
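If you prefer to stick with PDFMiner from the question, a minimal sketch (an assumption on my part, not something I ran end to end) is to wrap the downloaded bytes in a BytesIO so PDFMiner gets a file-like object instead of a raw bytes value or a URL; pdfminer.six also provides a high-level extract_text helper:

import io
from pdfminer.high_level import extract_text
from pdfminer.pdfpage import PDFPage

# downloaded_bytes comes from file_client.download_file().readall() as in the question
pdf_stream = io.BytesIO(downloaded_bytes)

# Either iterate over the pages ...
pages = list(PDFPage.get_pages(pdf_stream, check_extractable=False))
print("Number of pages:", len(pages))

# ... or extract the text with the high-level helper
pdf_stream.seek(0)
text = extract_text(pdf_stream)
print(text[:200])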
You can also read the PDF file in the browser using the file URL with the SAS token appended:
https://<storage account name>.dfs.core.windows.net/test/dem.pdf?<sas-token>
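The same URL-with-SAS-token approach can also be sketched in Python, for example by fetching the URL with requests and feeding the bytes to PyPDF2 (the URL below is a placeholder; treat this as an untested sketch):

import io
import requests
import PyPDF2

# Placeholder SAS URL, as in the browser example above
sas_url = "https://<storage account name>.dfs.core.windows.net/test/dem.pdf?<sas-token>"

resp = requests.get(sas_url)
resp.raise_for_status()

pdfread = PyPDF2.PdfFileReader(io.BytesIO(resp.content))
print("Number of pages:", pdfread.numPages)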
I use this code to read an Excel file in SharePoint. Reading works very well with no problem, but I would also like to modify the file live.
I read that this is possible with the O365 module, but I don't know how to combine the Office365-REST-Python-Client connection with O365.
Here is my test code:
client_id = "client_id"
client_secret = "client_secret"
site_url = "https://site.sharepoint.com/sites/test"
path = 'test/General/test.xlsx'

from office365.runtime.auth.client_credential import ClientCredential
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
from xlrd import open_workbook  # test for opening the Excel file

# This part works: connect to the site and read it
credentials = ClientCredential(client_id, client_secret)
ctx = ClientContext(site_url).with_credentials(credentials)
web = ctx.web
ctx.load(web)
ctx.execute_query()
print("Web title: {0}".format(web.properties['Title']))
print(web.properties)

# This is the part I don't know how to connect to the above
from O365.excel import WorkBook
# given a File instance that is a xlsx file ...
excel_file = WorkBook(my_file_instance)  # my_file_instance should be an instance of File.
ws = excel_file.get_worksheet('my_worksheet')
cella1 = ws.get_range('A1')
cella1.values = 35
cella1.update()
If anyone has a solution, thanks for your help.
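A hedged sketch of how the O365 module on its own might be wired up to get the File instance and update the cell. The tenant id, site host/path and file path below are assumptions based on the site_url in the question, and the Azure AD app registration would need Microsoft Graph application permissions (e.g. Sites.ReadWrite.All):

from O365 import Account
from O365.excel import WorkBook

client_id = "client_id"
client_secret = "client_secret"
tenant_id = "your-tenant-id"  # assumption: required for the client-credentials flow

account = Account((client_id, client_secret), auth_flow_type='credentials', tenant_id=tenant_id)
if account.authenticate():
    # Assumed host name and site path, taken from the site_url in the question
    site = account.sharepoint().get_site('site.sharepoint.com', '/sites/test')
    drive = site.get_default_document_library()
    my_file_instance = drive.get_item_by_path('/General/test.xlsx')  # assumed path inside the library

    excel_file = WorkBook(my_file_instance)
    ws = excel_file.get_worksheet('my_worksheet')
    cella1 = ws.get_range('A1')
    cella1.values = 35
    cella1.update()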
I am an absolute beginner when it comes to working with REST APIs in Python. We have received a SharePoint URL which has multiple folders and multiple files inside those folders in the 'Documents' section. I have been provided an 'app_id' and a 'secret_token'.
I am trying to access the .csv files, read them as dataframes and perform operations on them.
The code for the operations is ready, since I downloaded the .csv files and worked on them locally, but I need help connecting to SharePoint from Python so that I don't have to download such heavy files ever again.
I know there have already been multiple questions about this on Stack Overflow, but none of them got me to where I want to be.
I did the following and I am unsure of what to do next:
import json
from office365.runtime.auth.user_credential import UserCredential
from office365.sharepoint.client_context import ClientContext
from office365.runtime.http.request_options import RequestOptions
site_url = "https://<company-name>.sharepoint.com"
ctx = ClientContext(site_url).with_credentials(UserCredential("{app_id}", "{secret_token}"))
For site_url above, should I use the whole URL or is it fine up to ####.com?
This is what I have so far; next, I want to read files from the respective folders and convert them into dataframes. The files will always be in .csv format.
The example hierarchy of the folders are as follows:
Documents --> Folder A, Folder B
Folder A --> a1.csv, a2.csv
Folder B --> b1.csv, b2.csv
I should be able to move to whichever folder I want and read the files based on my requirement.
Thanks for the help.
This works for me, using a Sharepoint App Identity with an associated client Id and client Secret.
First, I demonstrate authenticating and reading a specific file, then getting a list of files from a folder and reading the first one.
import pandas as pd
import json
import io
from office365.sharepoint.client_context import ClientCredential
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
# Authentication (shown for a 'modern teams site', but I think it should work for a company.sharepoint.com site):
site="https://<myteams.companyname.com>/sites/<site name>/<sub-site name>"
#Read credentials from a json configuration file:
spo_conf = json.load(open(r"conf\spo.conf", "r"))
client_credentials = ClientCredential(spo_conf["RMAppID"]["clientId"],spo_conf["RMAppID"]["clientSecret"])
ctx = ClientContext(site).with_credentials(client_credentials)
#Read a specific CSV file into a dataframe:
folder_relative_url = "/sites/<site name>/<sub site>/<Library Name>/<Folder Name>"
filename = "MyFileName.csv"
response = File.open_binary(ctx, "/".join([folder_relative_url, filename]))
df = pd.read_csv(io.BytesIO(response.content))
#Get a list of file objects from a folder and read one into a DataFrame:
def getFolderContents(relativeUrl):
    contents = []
    library = ctx.web.get_list(relativeUrl)
    all_items = library.items.filter("FSObjType eq 0").expand(["File"]).get().execute_query()
    for item in all_items:  # type: ListItem
        cur_file = item.file
        contents.append(cur_file)
    return contents
fldrContents = getFolderContents('/sites/<site name>/<sub site>/<Library Name>')
response2 = File.open_binary(ctx, fldrContents[0].serverRelativeUrl)
df2 = pd.read_csv(io.BytesIO(response2.content))
Additional notes following up on comments:
The site path does not include the full URL for the site home page (ending in .aspx) - it just ends with the name of the site (or sub-site, if relevant to your case).
You don't need to use a configuration file to store your authentication credentials for the Sharepoint application identity - you could just replace spo_conf["RMAppID"]["clientId"] with the value for the Sharepoint-generated client Id and do similarly for the client Secret. But this is a simple example of what the text of a JSON file could look like:
{
    "MyAppName": {
        "clientId": "my-client-id",
        "clientSecret": "my-client-secret",
        "title": "name_for_application"
    }
}
I'm currently using shareplum and was able to download a file using the code below:
from shareplum import Site
from shareplum import Office365
from shareplum.site import Version
import csv
authcookie = Office365('https://bboxxeng.sharepoint.com/', username='---', password='---').GetCookies()
site = Site('https://bboxxeng.sharepoint.com/sites/TESTIAN', version=Version.v2016, authcookie=authcookie)
folder = site.Folder('Shared%20Documents/Test')
data = folder.get_file('Office ss E1.csv')
with open('asas.csv', 'wb') as f:
    f.write(data)
I tried using list_data = sp_list.GetListItems() but had no luck extracting the file names. I've also read around and tried googling, but still no luck.
I understand you want to list all files in a folder so that you can download them or make other modifications via the file name. If so, you can get this via the attribute below:
files
folder = site.Folder('Shared Documents/test')
allfiles= folder.files
print(allfiles)
Updated:
The result contains the file name and other properties.
For example, to get the name of the first file from the returned result:
allfiles= folder.files
demofile= allfiles[0]
print(demofile['Name'])
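Combining this with the get_file call from the question, a small sketch that downloads every file in the folder by name (same folder path as above):

folder = site.Folder('Shared Documents/test')
for f in folder.files:               # each entry is a dict of file properties
    name = f['Name']
    data = folder.get_file(name)     # file content as bytes, as in the question
    with open(name, 'wb') as out:
        out.write(data)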
I'm looking for a way, using the Azure Files SDK, to upload files to my Azure Databricks blob storage.
I tried many things using functions from this page, but nothing worked. I don't understand why.
Example:
from azure.storage.file import FileService

file_service = FileService(account_name='MYSECRETNAME', account_key='mySECRETkey')

generator = file_service.list_directories_and_files('MYSECRETNAME/test')  # listing files in folder /test, working well
for file_or_dir in generator:
    print(file_or_dir.name)

file_service.get_file_to_path('MYSECRETNAME', 'test/tables/input/referentials/', 'test.xlsx', '/dbfs/FileStore/test6.xlsx')
where test.xlsx is the name of the file in my Azure file share, and /dbfs/FileStore/test6.xlsx is the path where I want to put the file in my DBFS file system.
I have the error message:
Exception=The specified resource name contains invalid characters
I tried changing the name, but that doesn't seem to work.
Edit: I'm not even sure the function is doing what I want. What is the best way to load a file from Azure Files?
In my experience, I think the best way to load a file from Azure Files is to read it directly via its URL with a SAS token.
For example, as shown in the figures below, I have a file named test.xlsx in my test file share; I viewed it using Azure Storage Explorer and then generated its URL with a SAS token.
Fig 1. Right click the file and then click the Get Shared Access Signature...
Fig 2. You must select the Read permission option to read the file content directly.
Fig 3. Copy the url with sas token
Here is my sample code; you can run it in your Azure Databricks with the SAS token URL of your file.
import pandas as pd
url_sas_token = 'https://<my account name>.file.core.windows.net/test/test.xlsx?st=2020-01-27T10%3A16%3A12Z&se=2020-01-28T10%3A16%3A12Z&sp=rl&sv=2018-03-28&sr=f&sig=XXXXXXXXXXXXXXXXX'
# Directly read the file content from its url with sas token to get a pandas dataframe
pdf = pd.read_excel(url_sas_token)
# Then, to convert the pandas dataframe to a PySpark dataframe in Azure Databricks
df = spark.createDataFrame(pdf)
Alternatively, to use the Azure File Storage SDK to generate the URL with a SAS token for your file, or to get the bytes of your file for reading, please refer to the official document Develop for Azure Files with Python and my sample code below.
# Create a client of the Azure File Service, same as yours
from azure.storage.file import FileService
account_name = '<your account name>'
account_key = '<your account key>'
share_name = 'test'
directory_name = None
file_name = 'test.xlsx'
file_service = FileService(account_name=account_name, account_key=account_key)
To generate the SAS token URL of a file:
from azure.storage.file import FilePermissions
from datetime import datetime, timedelta
sas_token = file_service.generate_file_shared_access_signature(
    share_name, directory_name, file_name,
    permission=FilePermissions.READ,
    expiry=datetime.utcnow() + timedelta(hours=1))
url_sas_token = f"https://{account_name}.file.core.windows.net/{share_name}/{file_name}?{sas_token}"
import pandas as pd
pdf = pd.read_excel(url_sas_token)
df = spark.createDataFrame(pdf)
Or use the get_file_to_stream function to read the file content:
from io import BytesIO
import pandas as pd

stream = BytesIO()
file_service.get_file_to_stream(share_name, directory_name, file_name, stream)
stream.seek(0)  # rewind the stream before handing it to pandas
pdf = pd.read_excel(stream)
df = spark.createDataFrame(pdf)
Just as an addition to @Peter Pan's answer: an alternative approach, without using pandas, is the Python azure-storage-file-share library.
Very detailed documentation: https://pypi.org/project/azure-storage-file-share/#downloading-a-file
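Based on that documentation, a minimal sketch could look like the following (the connection string, share name, file path and destination path are placeholders):

from azure.storage.fileshare import ShareFileClient

file_client = ShareFileClient.from_connection_string(
    conn_str="<your storage connection string>",  # placeholder
    share_name="test",
    file_path="test.xlsx")

with open("/dbfs/FileStore/test6.xlsx", "wb") as file_handle:
    data = file_client.download_file()   # returns a StorageStreamDownloader
    data.readinto(file_handle)           # stream the content into the local/DBFS path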