The Google Apps Script built-in services like
GmailApp are a great onramp for users with limited coding experience, the flip side is you can find yourself easily getting results but not in the most efficient way.
DriveApp is a great example where Google have made it easy to iterate across folders and files, but when you have lots of folders and files it becomes a time consuming process and you hit execution limits.
An alternative approach is using Google Apps Script to make direct calls to the Drive API. The benefits of this approach is you can be more specific in the data you want back and it gives more flexibility with how you call the API, in some cases with the ability to make batch or asynchronous processes.
In this post I highlight a method ideal for scenarios when you want to index larger volumes of My Drive files and folders to a Google Sheet with calls directly to the Drive API. The post includes some sample code you can use which instead is able to reduce a 4 minute runtime to index 10,000 files and folders to one that can complete in under 40 seconds!
Recently I’ve had a couple of customer projects where I needed to generate reports on the contents and folder structure in Google Drive. A common approach to do this is recursively call
DriveApp using the file and folder iterator methods to get data of the users file structure. Here is a gist for a
DriveApp file/folder iterator approach.
This is a good solution when there aren’t many folders, but can be slow for larger file/folder structures. Running this script on my personal My Drive took almost 4 minutes to index 10,000 files/folders. The issue with speed is related to the number of calls to
DriveApp. To highlight this consider the following simplified file/folder structure which includes the required calls to
In the diagram above note the number of calls to DriveApp which return empty responses. For example, because Folder A only contains subfolders, when
DriveApp.getFiles() is called on the folder it has no data to return.
Instead of using the
DriveApp built-in service an alternative approach is directly call the Google Drive API. The benefit of this is we have more control over what data is queried and returned. For example, instead of individually getting each file/folder branch we can use the following approach:
- Get all the folder IDs which also includes the parent folder ID
- Filter the folder IDs to the root parent folder ID
- Get all the files where the parent folder ID is in the filtered list
This is more efficient because the Drive API can return up to 1,000 items in a single call. Once we get all the files we need, we can then reconstruct the structure using the data in the file response object. This means if you have a filetree with less than 1,000 folders and 1,000 files you can potentially get all the data in two API calls as illustrated below:
To see what this looks like here is an example Apps Script gist for calling the Drive API. When testing with My Drive it can get 10,000 files/folders in a Google Sheet in 40 seconds! [If you would like to try both methods on your own Google My Drive you can make a copy of this Google Sheet].
There are a couple of considerations to keep in mind with this approach. This version of the script is limited to reporting on My Drive, but with minor modification you can implement shared drive support, or by setting up a service account with delegated admin view other My Drives and Shared Drives.
This approach is also more efficient when you are indexing high volumes of files/folders close or at the root of the drive. I would anticipate as you move further away from the root there is a crossover point where you go from: the Drive API for files and folders; to a mixed
DriveAppfor folders and Drive API for files; and
DriveAppfor small numbers of folders and files.
So in summary there is a lot you can do ‘out-of-the-box’ very easily with Google Apps Script, but there are times where knowledge of the underlying APIs can greatly improve the solutions you develop … a little engineering knowledge can greatly improve your productivity.