Retrieve files from google drive folder based on search term - python

I am quite new to working with Google Drive, and I am well aware that I can't ask Stack Overflow for a complete example of the scenario below; however, if you can direct me to something similar, it would be really helpful. I am quite stuck and can't move forward.
I have uploaded 7-8 GB of documents (PDF, DOCX, PPT, etc.) to Google Drive. My goal is to list all the files that contain the term queried by the user. For instance, if I search for 'computer vision using google drive api', the results should contain the list of files that contain the term 'computer vision'.
This scenario works when I type something into the Google Drive search box: when I type 'machine learning', I get a list of matching files. How can I retrieve the same results programmatically? I have read the Google Drive API documentation and came across the syntax 'fullText contains term', but I don't know how to use it.

As you correctly said, an easy way to do this is to use the q parameter of the request, along with the fullText contains X operator. Below you can see an adaptation of the Python Quickstart from the reference that uses this feature:
from __future__ import print_function
import pickle
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
# If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://www.googleapis.com/auth/drive.metadata.readonly']
def main():
    """Searches Drive for files whose full text contains a term and
    prints their names and ids.
    """
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)
    service = build('drive', 'v3', credentials=creds)
    # Call the Drive v3 API
    results = service.files().list(
        pageSize=1000, fields="nextPageToken, files(id, name)",
        q="fullText contains 'computer vision'").execute()
    items = results.get('files', [])
    if not items:
        print('No files found.')
    else:
        print('Files:')
        for item in items:
            print(u'{0} ({1})'.format(item['name'], item['id']))

if __name__ == '__main__':
    main()
Notice the q parameter passed in the service.files().list() call.
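One caveat: a single files().list() call returns at most one page of results (pageSize is capped at 1000), so with 7-8 GB of documents the matches may not all fit in one response. Below is a minimal sketch of paging through all results with nextPageToken, reusing the service object built above:

page_token = None
while True:
    # Request one page of matches, passing the token from the previous page.
    results = service.files().list(
        q="fullText contains 'computer vision'",
        pageSize=100,
        fields="nextPageToken, files(id, name)",
        pageToken=page_token).execute()
    for item in results.get('files', []):
        print(u'{0} ({1})'.format(item['name'], item['id']))
    page_token = results.get('nextPageToken')
    if page_token is None:  # no more pages
        break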
Reference
Google Drive API - Search for Files
Python Drive API v3 reference - list()

Related

Trying to create a simple python script. Keep getting errors

This is my first time posting here, so I apologize if I have not posted in the correct format!
import os.path  # needed for the os.path.exists() call below

from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
# If modifying these scopes, delete the file token.json.
SCOPES = ['https://www.googleapis.com/auth/drive.metadata.readonly']
def main():
    """Shows basic usage of the Drive v3 API.
    Prints the names and ids of the first 10 files the user has access to.
    """
    creds = None
    # The file token.json stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.json'):
        creds = Credentials.from_authorized_user_file('token.json', SCOPES)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.json', 'w') as token:
            token.write(creds.to_json())

    try:
        service = build('drive', 'v3', credentials=creds)
        # Call the Drive v3 API
        results = service.files().list(
            pageSize=10, fields="nextPageToken, files(id, name)").execute()
        items = results.get('files', [])
        if not items:
            print('No files found.')
            return
        print('Files:')
        for item in items:
            print(u'{0} ({1})'.format(item['name'], item['id']))
    except HttpError as error:
        # TODO(developer) - Handle errors from drive API.
        print(f'An error occurred: {error}')

if __name__ == '__main__':
    main()
I'm trying to create a simple Python script that does the following:
1. Import the necessary libraries.
2. Connect/authenticate to Google Drive.
3. Check a folder on Google Drive for file names, then upload all files from a specific folder on my computer that are not on the Drive yet to the checked folder on Google Drive.
4. Repeat every hour; if no new files are found, restart the loop.
This is what I tried so far, but I get errors when running the script. The issue is that it cannot find the credentials.json file.
The credentials.json file is in the same folder as the settings.yaml and the Setup.py.
I was expecting to be directed to a login page when running the script.
To be fair, I know very little about Python; most of my coding experience is from my old modded Minecraft days using Lua on ComputerCraft. Thanks for your help in advance.
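One likely cause of the missing credentials.json, assuming the file really does sit next to the script, is that from_client_secrets_file resolves a relative path against the current working directory, not the script's folder. A minimal sketch of one fix (the explicit path handling here is an assumption, not part of the original script):

import os.path

# Build an absolute path to credentials.json next to this script, so the
# script works regardless of which directory it is launched from
# (assumed layout: credentials.json sits beside this script).
SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
CREDENTIALS_FILE = os.path.join(SCRIPT_DIR, 'credentials.json')

flow = InstalledAppFlow.from_client_secrets_file(CREDENTIALS_FILE, SCOPES)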

Is it possible to sync or upload the file from google drive without copying the whole thing in the folder using python?

I just started learning Python scripting, and I created a script using PyDrive that uploads all files from a local folder (Linux OS) to Google Drive. I'm planning to modify the script for my automation and add a function that uploads only the most recent file added to the local folder, without reuploading all the files already in it. May I know if this is possible with a Python script alone?
Thank you in advance!
You don't need to use PyDrive; you can use the Google API Python client library directly. As far as I know, PyDrive uses the client library internally. There's a starter example here:
Python quickstart
from __future__ import print_function
import os.path
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
# If modifying these scopes, delete the file token.json.
SCOPES = ['https://www.googleapis.com/auth/drive.metadata.readonly']
def main():
    """Shows basic usage of the Drive v3 API.
    Prints the names and ids of the first 10 files the user has access to.
    """
    creds = None
    # The file token.json stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.json'):
        creds = Credentials.from_authorized_user_file('token.json', SCOPES)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.json', 'w') as token:
            token.write(creds.to_json())

    try:
        service = build('drive', 'v3', credentials=creds)
        # Call the Drive v3 API
        results = service.files().list(
            pageSize=10, fields="nextPageToken, files(id, name)").execute()
        items = results.get('files', [])
        if not items:
            print('No files found.')
            return
        print('Files:')
        for item in items:
            print(u'{0} ({1})'.format(item['name'], item['id']))
    except HttpError as error:
        # TODO(developer) - Handle errors from drive API.
        print(f'An error occurred: {error}')

if __name__ == '__main__':
    main()
Manage uploads
from googleapiclient.http import MediaFileUpload

# 'drive_service' is the Drive v3 service object built as in the quickstart above.
file_metadata = {'name': 'photo.jpg'}
media = MediaFileUpload('files/photo.jpg', mimetype='image/jpeg')
file = drive_service.files().create(body=file_metadata,
                                    media_body=media,
                                    fields='id').execute()
print('File ID: %s' % file.get('id'))
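For the "upload only new files" part of the question, one approach is to list the names already in the target Drive folder and skip those when uploading. A minimal sketch, assuming a service object built as in the quickstart (note this needs a broader scope than drive.metadata.readonly, e.g. https://www.googleapis.com/auth/drive), and hypothetical FOLDER_ID and LOCAL_DIR values:

import os
from googleapiclient.http import MediaFileUpload

FOLDER_ID = 'your-drive-folder-id'   # hypothetical: ID of the target Drive folder
LOCAL_DIR = '/path/to/local/folder'  # hypothetical: local folder to sync

# Collect the names of files already in the Drive folder, page by page.
existing = set()
page_token = None
while True:
    resp = service.files().list(
        q="'%s' in parents and trashed = false" % FOLDER_ID,
        fields='nextPageToken, files(name)',
        pageToken=page_token).execute()
    existing.update(f['name'] for f in resp.get('files', []))
    page_token = resp.get('nextPageToken')
    if page_token is None:
        break

# Upload only the local files whose names are not in Drive yet.
for name in sorted(os.listdir(LOCAL_DIR)):
    path = os.path.join(LOCAL_DIR, name)
    if os.path.isfile(path) and name not in existing:
        # Generic MIME type for the sketch; pass the real type if you know it.
        media = MediaFileUpload(path, mimetype='application/octet-stream')
        service.files().create(body={'name': name, 'parents': [FOLDER_ID]},
                               media_body=media).execute()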

Is it possible to list files in a specified directory in Google Drive, using google drive api?

For example, in my Google Drive, I have a directory called raw_pdf, is it possible to list all the files in that directory using Google Drive API?
Using the q parameter, which is part of files.list, allows you to do a file search.
The in operator on the parents collection checks whether the parents collection contains the specified ID,
so you can send something like
'1234' in parents
where 1234 is the file ID of your raw_pdf directory.
I recommend following the official Python quickstart example, which shows how to authenticate your application and how to use files.list; you will just need to add the q parameter to the request.
from __future__ import print_function
import pickle
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
# If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://www.googleapis.com/auth/drive.metadata.readonly']
def main():
    """Shows basic usage of the Drive v3 API.
    Prints the names and ids of the first 10 files the user has access to.
    """
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)
    service = build('drive', 'v3', credentials=creds)
    # Call the Drive v3 API
    results = service.files().list(
        pageSize=10, fields="nextPageToken, files(id, name)").execute()
    items = results.get('files', [])
    if not items:
        print('No files found.')
    else:
        print('Files:')
        for item in items:
            print(u'{0} ({1})'.format(item['name'], item['id']))

if __name__ == '__main__':
    main()
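To scope the listing to the raw_pdf directory, the same list() call just gains the q string. A short sketch, where FOLDER_ID is a placeholder for the file ID of your raw_pdf folder:

results = service.files().list(
    q="'FOLDER_ID' in parents",  # FOLDER_ID: placeholder for raw_pdf's file ID
    pageSize=10,
    fields="nextPageToken, files(id, name)").execute()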

Create new contacts through google contacts API or people API

Is it possible to create Google contacts using the Google Contacts API or the People API?
I'm having trouble creating new contacts using the Google APIs.
I've been searching for days and found the following information:
1 - It looks like the People API is intended to replace the Google Contacts API:
https://gsuite-developers.googleblog.com/2017/07/google-people-api-now-supports-updates.html
2 - Many people are unable to create new contacts with Python 3+ using the gdata and atom packages.
3 - The People API appears to be the one recommended by G Suite:
https://support.google.com/a/answer/6103110?hl=pt-BR
I would like to know if anyone is creating new contacts using these Google APIs.
Is a G Suite email required?
How do I get an access token?
I've done all the setup on Google Cloud Platform (enabled the APIs and OAuth2); I have the JSON file, secret key, and client ID.
Edit:
I am managing to list my 50 contacts with this code, and am now modifying it to create new contacts:
from __future__ import print_function
import pickle
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
# If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://www.googleapis.com/auth/contacts']
def main():
    """Shows basic usage of the People API.
    Prints the names of the first 50 connections.
    """
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)
    service = build('people', 'v1', credentials=creds)
    # Call the People API
    print('List 50 connection names')
    results = service.people().connections().list(
        resourceName='people/me',
        pageSize=50,
        personFields='names,emailAddresses').execute()
    connections = results.get('connections', [])
    for person in connections:
        names = person.get('names', [])
        if names:
            name = names[0].get('displayName')
            print(name)

if __name__ == '__main__':
    main()
Since you already have auth working to list contacts, you should be able to do something like this to create one:
newContact = { "names": [{ "givenName": "John", "familyName": "Doe" }] }
result = service.people().createContact(body=newContact).execute()
The full definition of what can be in the body/person is here.
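For instance, a small sketch that also sets an email address (field shapes follow the People API Person resource; the values are made up):

newContact = {
    "names": [{"givenName": "John", "familyName": "Doe"}],
    "emailAddresses": [{"value": "john.doe@example.com"}],
}
result = service.people().createContact(body=newContact).execute()
# The created contact comes back with a resourceName identifier.
print(result.get('resourceName'))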

Looking for example using MediaFileUpload

Does anyone know where I can find complete sample code for uploading a local file and getting contents with MediaFileUpload?
I really need to see both the HTML form used to post and the code to accept it. I'm pulling my hair out and so far only getting partial answers.
I found this question while trying to figure out where the heck "MediaFileUpload" came from in the Google API examples, and I eventually figured it out. Here is a more complete code example that I used to test things with Python 2.7.
You need a JSON credentials file for this code to work. This is the credentials file you get from your Google app / project / thing.
You also need a file to upload, I'm using "test.html" here in the example.
from oauth2client.service_account import ServiceAccountCredentials
from apiclient.discovery import build
from apiclient.http import MediaFileUpload
#Set up a credentials object I think
creds = ServiceAccountCredentials.from_json_keyfile_name('credentials_from_google_app.json', ['https://www.googleapis.com/auth/drive'])
#Now build our api object, thing
drive_api = build('drive', 'v3', credentials=creds)
file_name = "test"
print "Uploading file " + file_name + "..."
#We have to make a request hash to tell the google API what we're giving it
body = {'name': file_name, 'mimeType': 'application/vnd.google-apps.document'}
#Now create the media file upload object and tell it what file to upload,
#in this case 'test.html'
media = MediaFileUpload('test.html', mimetype = 'text/html')
#Now we're doing the actual post, creating a new file of the uploaded type
fiahl = drive_api.files().create(body=body, media_body=media).execute()
#Because verbosity is nice
print "Created file '%s' id '%s'." % (fiahl.get('name'), fiahl.get('id'))
A list of valid Mime Types to use in the "body" hash is available at
https://developers.google.com/drive/v3/web/mime-types
A list of valid mimetype strings for the MediaFileUpload (they'll attempt to convert your file to whatever you put here):
https://developers.google.com/drive/v3/web/integrate-open#open_files_using_the_open_with_contextual_menu
Python 2.7, resumable upload.
https://github.com/googleapis/google-api-python-client/blob/master/docs/media.md
from __future__ import print_function
import pickle
import os.path
from googleapiclient.http import MediaFileUpload
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
# If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://www.googleapis.com/auth/drive']
def main():
    """Uploads a local file to a Drive folder using a resumable upload."""
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)
    service = build('drive', 'v3', credentials=creds)

    media = MediaFileUpload(
        'big.jpeg',
        mimetype='image/jpeg',
        resumable=True
    )
    request = service.files().create(
        media_body=media,
        body={'name': 'Big', 'parents': ['<your folder Id>']}
    )
    response = None
    while response is None:
        status, response = request.next_chunk()
        if status:
            print("Uploaded %d%%." % int(status.progress() * 100))
    print("Upload Complete!")

if __name__ == '__main__':
    main()
You won't need to post JSON yourself; the client library handles that for you.
We provide full code samples already, which can be found here: https://github.com/gsuitedevs/python-samples
Also, you could check the files.insert reference documentation, which contains a Python sample: https://developers.google.com/drive/v2/reference/files/insert
If this does not answer what you want, perhaps you could explain in more detail what you want to achieve and the architecture you currently have in place.
I want to provide additional information on uploading to a specific Drive folder. I am providing the example from my AWS Lambda with Python 3.7.
Note:
You need the folder ID for your desired location. You can find it by opening the folder in Drive and looking at the ID in the browser's URL.
For example, in the URL https://drive.google.com/drive/u/0/folders/1G91IKgQqI9YgNj8Odc8SIOPHrWOjdvOO, 1G91IKgQqI9YgNj8Odc8SIOPHrWOjdvOO would be your ID.
You need to give your service account's email access to the folder. The service account email is found in the IAM section of your Google Cloud account. Grant access by opening the folder in Drive, clicking the "i" icon at the top right, clicking Details, then Manage access.
You also need the JSON file associated with the service account. Find or create it on the KEYS tab of the Service Accounts section in Google Cloud IAM. The file contains the private key for your project; store it where your code can access it.
I'm not sure which dependencies you need to install, but I think pydrive installed them all for me: pip3 install pydrive
from apiclient.discovery import build
from google.oauth2 import service_account
from googleapiclient.http import MediaFileUpload
# This provides what authority the service account has as well as the location of the JSON file containing the private key.
scopes = ['https://www.googleapis.com/auth/drive']
service_account_file = 'path/to/service_account.json'
# Create the credentials object for the service account
credentials = service_account.Credentials.from_service_account_file(service_account_file, scopes=scopes)
drive = build('drive', 'v3', credentials=credentials)
# Create the metadata for the file and upload it to the Drive folder. Supply
# the corresponding MIME type for your file. The 'parents' field is very
# important: this is where you supply the ID you found for your Drive folder.
body = {'name': 'testfile.txt', 'mimeType': 'text/plain', 'parents': ["theStringForTheDriveFolder"]}
media = MediaFileUpload('path/to/testfile.txt', mimetype='text/plain')
drive.files().create(body=body, media_body=media).execute()
Here's the documentation I followed:
How to use service accounts to call google APIs: https://developers.google.com/identity/protocols/oauth2/service-account#python
Documentation for the Drive API V3 to upload files:
https://developers.google.com/drive/api/v3/reference/files/create
Documentation for the Google-Python API client:
https://github.com/googleapis/google-api-python-client/blob/main/docs/oauth.md
How to perform various upload types with the Drive API:
https://developers.google.com/drive/api/guides/manage-uploads#simple
How to create and use Service Accounts, including generating the JSON file/private keys:
https://developers.google.com/identity/protocols/oauth2/service-account