Two ways to remove duplicate rows from Google Sheets using Google Apps Script

For many years users had to find a variety of workarounds if they wanted to remove duplicate rows from Google Sheets. This all changed in 2019 when Google announced new features, which included removing duplicates from Google Sheets. Recently I got tagged in a conversation with Andrea Guerri who shared some ‘remove duplicate’ example scripts. This sent me down a bit of a rabbit hole looking at various ‘remove duplicate’ Apps Script solutions and I’ll share two of my favourites. If you prefer to jump straight to testing both of these solutions you can Make a Copy of Remove Duplicates Example Data, which includes the sample code.

Remove duplicates from 2D arrays

The first comes from the link/example Andrea shared to a related post looking at a JavaScript approach to remove duplicates from 2D arrays, which is ideal for handling Google Sheets data handled using .getValues()/.setValues(). More detail about this approach is detailed in a related blog post by kirupa on Removing Duplicate Arrays from an Array of Arrays and the Apps Script equivalent has been added by Andrea to StackOverflow and copied below:

// From https://stackoverflow.com/a/76327128/1027723 
function removeDuplicates() {
  const sheet = SpreadsheetApp.getActiveSheet(),
        data = sheet.getDataRange().getValues();

  let newValue = data.map(JSON.stringify),
      newData = [...new Set(newValue)],
      unique = Array.from(newData, JSON.parse);

   sheet.getRange(1, 1, unique.length, unique[0].length).setValues(unique);
 }

One limitation of this code is that because JSON.stringify/JSON.parse is used, cell date objects are removed in the unique array. A workaround I found for this was to use a JSON.parse() reviver to match and ‘revive’ date strings. With this Andrea’s code can be rewritten as:

/**
 * Removes duplicates from the active sheet and writes the results to the output sheet using array manipulation.
 * @see https://stackoverflow.com/a/76327128/1027723
 * @param {Sheet} outputSheet The sheet to write the results to.
 */
function removeDuplicatesWithArray(outputSheet) {
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Data'),
        data = sheet.getDataRange().getValues();

  let newValue = data.map(JSON.stringify),
      newData = [...new Set(newValue)],
      unique = Array.from(newData, x => JSON.parse(x, dateTimeReviver));

   outputSheet.getRange(1, 1, unique.length, unique[0].length).setValues(unique);
 }

/**
 * A reviver function that parses a date string into a Date object.
 * @see https://stackoverflow.com/a/14509447/1027723
 * @param {string} key The key of the value.
 * @param {string} value The value to parse.
 * @returns {Date|String} The parsed Date object.
 */
const dateTimeReviver = (key, value) => {
  if (typeof value === 'string') {
    const a = /^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{3}Z)$/.exec(value);
    if (a) return new Date(a[1]);
  }
  return value;
}

removeDuplicates() built-in method

In 2019, shortly after the ‘remove duplicate’ feature was added to the Google Sheets UI, the Range.removeDuplicates() method was added to Google Apps Script. As noted in the documentation:

Removes rows within this range that contain values that are duplicates of values in any previous row. Rows with identical values but different letter cases, formatting, or formulas are considered to be duplicates. This method also removes duplicates [sic] rows hidden from view (for example, due to a filter). Content outside of this range isn’t removed.

Range.removeDuplicates()

This method can be wrapped in a similar function to compare performance:

/**
 * Removes duplicates from the active sheet and writes the results to the output sheet using built-in method.
 * @param {Sheet} outputSheet The sheet to write the results to.
 */
function removeDuplicatesBuiltIn(outputSheet) {
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Data'),
    data = sheet.getDataRange().getValues();
  outputSheet.getRange(1, 1, data.length, data[0].length).setValues(data);
  outputSheet.getDataRange().removeDuplicates();
}

In Copy of Remove Duplicates Example Data you can open the Script Editor and use the test() function to try both these methods out.

Results

Limited testing of both of these approaches indicated that the array method was slightly quicker, but the difference is not significant and if you normalise the data using the process time when there 0 rows of data it’s even closer. There are some untested variables to consider as my limited testing was based on 10% duplicated data with one date column.

Ultimately, the approach you use is more likely to be informed by how you need to handle the data. The nice thing about programmatically removing duplicates is the result is stored in memory, while the built-in method requires the data to be in a Google Sheet range.

Easter Egg – Faking Data using Google Colab

Finally, a little easter egg. To generate my fake data with duplicate rows I used Google Colab to create the fake dataset then write it to a Google Sheet. To generate the code I asked both Bard and ChatGPT using the prompt:

Can you write a colab notebook that can make mock data that includes 10% duplicates with the column headings Blood Type, City, Age, Number of Blood Donations, Check and Date?

ChatGPT gave me a better starting point, with some additions from me to get the data into a Google Sheet. My notebook is here if you’d like to view/copy.

chevron_left
chevron_right

Leave a comment

Your email address will not be published. Required fields are marked *

Comment
Name
Email
Website

css.php