Migrating a call-off process to Twilio Functions

Upd:

Introduction

As I discussed in my article on my reporting automation workflow, I don't get a lot of IT resources to lean on, so I've relied a lot on local code execution to accomplish process automation.

My best (worst) example is the process for the associate call-off voicemail. We have a voicemail that associates are required to call when they miss, or will miss, any part of a scheduled shift. Call the number, leave a voicemail with your information: chat notifications go out and the voicemail gets logged. My automation flow for this depends on no fewer than four independent points of failure and brittle steps that need to be babysat.

This year I decided to re-implement the whole thing on a cloud communications platform to improve error handling, operate consistently with no or minimal manual intervention, and be easily handed off to another person or team. I ended up going with Twilio due to their in-platform serverless function support (Twilio Functions). This post covers my learning process in assembling the new flow and walks through the final result.

Note: This is firmly in the Learning In Public category, so don't depend on any of the following to represent correctness or best practices. Feedback welcome, though.

Background

Before I automated any of it, the call-off process was entirely manual: all of the leads would get a copy of the voicemail email and listen to the audio to determine who, when, and why, then document that information in the right place. It often resulted in duplicate efforts and required a lot of manual steps (e.g., notice email, open email, listen to the audio, document the occurrence, etc.). Poorly-configured call forwarding from a toll-free number meant that caller ID information didn't even come through, so associates had to be relied upon to correctly identify themselves (amazingly hit-or-miss).

I went through a few iterations of automation for the above process, but it eventually landed at this:

  1. Outlook forwarding rule to forward the Mitel voicemail emails from the contact email to a distribution list (server-side).
  2. Outlook rule to run a VBA script to save the voicemail attachment locally (with caller ID information) (client-side).
  3. Sync script to send the voicemails to a different machine to run the actual scripts, since Windows Outlook is required to save attachments with mail rules, but Windows Python made running the scripts more painful than I wanted to deal with.
  4. Hazel rule to monitor the incoming voicemail directory and run a script to convert the audio to FLAC from WAV.
  5. Hazel rule to monitor for new FLAC files and run the final Python script.
  6. Python script that:
    1. Does a roster lookup for the associate (based on phone number).
    2. Transcribes the voicemail audio using Google Cloud Speech-to-Text.
    3. Appends the occurrence information to a Google Sheet.
    4. Sends a notification to the relevant internal chat rooms.

Yikes.

Why status-quo doesn't work

There are four separate points of failure here (the VoIP system, the Outlook computer, the scripts computer, and the cloud servers) and it requires that the two local machines remain online and connected to VPN to function. When any of those are turned off or disconnected, the whole process stops.

Outlook also has a helpful feature that monitors attachment file activity and will warn you if attachments are open to help avoid file state issues (I guess?). Unfortunately, the warning dialog is modal and the sync script seems to trigger it infrequently on the inbound voicemail emails. The result is a dialog that stops all other operations (mail handling and rule processing) until OK is clicked. This happens a couple times a month, so Outlook needs babysitting.

Our VPN does not automatically reconnect, so if the gateway or the connection to the gateway goes down for any reason, I need to login again. I have to funnel notifications for VPN disconnects through my own SMTP server since the work server is also behind VPN.

Finally, all of this mess is dependent on me operating and babysitting it. If I change roles or leave the company it can't be easily handed off to somebody else.

Finding a real solution

That's all clearly untenable. It's brittle, not reliable, and not easily transferable. It's an excellent example of how not to implement process automation. An ideal solution needs to be the opposite: reliably handle error conditions, operate consistently with no or minimal manual intervention, and can easily be handed off to another person or team.

I evaluated a couple of potential platforms to meet the needs of a VoIP provider with SMS and voicemail support, built-in voice transcription, contact flows, and serverless functions. Plivo and Twilio both ended up on my radar, but as only Twilio seems to have serverless functions support right now, so I went with them.

I started nearly from scratch with this project. I have some previous experience with JavaScript, but it's been years since I did anything serious with the language and I've never worked with Node.js, which is what Twilio is using for their serverless functions. I also haven't had much opportunity to do asynchronous JavaScript.

Twilio cloud platform

Twilio's solution involved a few pieces:

  1. Provision a number to use (I'm skipping trying to transfer the existing number).
  2. Create a Studio flow to handle the various steps of contact handling.
  3. Implement Functions for the API calls and data lookups that can't be handled by Studio.
  4. Point the call and SMS handling hooks for your provisioned number at the Studio flow.

The new automation flow

Studio is Twilio's contact handling workflow editor. With it, we can solicit caller input, send messages or play text-to-speech, trigger API calls, and use branching logic. We have three potential arrival points in a Studio flow:

  1. SMS
  2. Phone call
  3. API

I only implemented SMS and phone call, as that's what associates are going to be using. I may implement an API front-end in the future with a web form they can use to submit occurrences.

Twilio Studio contact flow
Twilio Studio contact flow

SMS

The SMS flow starts with the arrival of an inbound text message. The first step is to call our roster-lookup function to look up the username of the associate based on the phone number. We branch here depending on whether we can find a match.

Known phone number

If the phone number is in the roster, we assign the associate_ad and department variables. The department variable is used later to tell us which Trillian rooms to send the notification to. We then reply noting the call-off has been logged and ask the associate to verify their username:

Thank you, {{flow.variables.associate_ad}}. Your call-off has been recorded.

(If you are not {{flow.variables.associate_ad}}, please reply with the correct Active Directory username [e.g., "sams"])

Wait fifteen seconds for a reply and then proceed if none is received. If we get a reply, then assign the reply to associate_ad and do another roster using the ad-lookup function to get department for the new Active Directory username.

Unknown phone number

If the phone number is not in the roster, we reply to the SMS and ask them to provide the Active Directory name:

We don't have your phone number on file. What is your Active Directory username (e.g., "sams")?

Wait sixty seconds for a reply, then assign the reply to the associate_ad variable and do another roster lookup using the ad-lookup function to get department for the Active Directory username. End the interaction if no reply is received.

Now we should have complete information for the associate (unless they're missing entirely from the roster), so we can call the Trillian notification function to construct a chat message and send it to the right rooms.

Whether that succeeds or fails, next we call the Google Sheets function to append the call-off information to the Google spreadsheet for tracking attendance.

Fin.

Voicemail

The voicemail flow has a lot less Studio logic. This is partly because there's no branching on the phone portion because there's no easy way to collect an Active Directory username for an unknown user via DTMF or voice transcription, and partly the actions are triggered by a callback from the transcription job, which operates outside the Studio flow, so once we're waiting on the transcription text we're stuck in that function.

Once we receive a call, we play the voicemail announcement with message instructions, then record the voicemail. The transcription callback calls our call-off-voicemail function which gets the roster information, sends the Trillian notifications, and appends to the Google Sheet.

Functions

Twilio Functions
Twilio Functions for the call-off project

The Twilio Functions serverless functions, much like AWS Lambda, make it much easier to do real program logic without having to maintain infrastructure. I could just use AWS for this combined with some other cloud communications provider, but having it within a single client with one billing contact makes this an easier sell at work and makes it easier to hand off in the future.

This is the source code for each of the serverless functions I used in the flow. Some of the stuff is fairly specific to our environment (I doubt many people are using Trillian), but it should give you a good idea of how to structure different steps in a workflow.

roster-lookup

We provide the phone number from the message trigger and look it up in our roster.csv static asset to get the associate's username and department, then return that back to the Studio flow.

Depends on lookup-csv for the CSV search.

/*
 * Looks up a phone number to find the Active Directory name and department.
 *
 * References the associate-phone-book.csv phone roster.
 */

const lookup_csv = require('lookup-csv');

exports.handler = function (context, event, callback) {
    // Get the roster path from the Twilio static assets.
    let file = Runtime.getAssets()['/roster.csv'].path;

    // CSV lookup on the phoneNum column in our roster data.
    const lookup_table = lookup_csv(file, ['phoneNum']);

    // Validate against any possible phone number pattern and sanitize input.
    let phone_regex = (
        /^\s*(?:\+?(\d{1,3}))?[-. (]*(\d{3})[-. )]*(\d{3})[-. ]*(\d{4})(?: *x(\d+))?\s*$/
    );
    if (phone_regex.test(event.From)) {
        // Since we're only looking at US numbers, strip anything that's not
        // a numeral.
        // TODO: International number support
        var phone_num = event.From.replace(/\D/g, '');
    } else {
        // Respond with 400 if we didn't get a valid phone number.
        return callback(null, 400);
    }

    // Match the row to get the associate information.
    let matching_row = lookup_table.get(phone_num);

    // If we find a match, return it, otherwise we'll give a 404 response.
    if (typeof(matching_row) !== 'undefined') {
        let response = {
            'associate_ad': matching_row['prettyUsername'],
            'department': matching_row['department'],
        };

        return callback(null, response);
    } else {
        return callback(null, 404);
    };
};

ad-lookup

We provide the Active Directory username, given by the associate in a reply text, and look it up in our roster.csv static asset to get the department, then return that back to the Studio flow.

This for the portion of the SMS flow where the from phone number is not in the roster.

Depends on lookup-csv to do the CSV search.

/*
 * Looks up an Active Directory username to find the department.
 *
 * References the associate-phone-book.csv phone roster.
 */

const lookup_csv = require('lookup-csv');

exports.handler = function (context, event, callback) {
    // Get the roster path from the Twilio static assets,.
    let file = Runtime.getAssets()['/roster.csv'].path;

    // CSV lookup on the activeDirectory column in our roster data.
    const lookup_table = lookup_csv(file, ['activeDirectory']);

    // Match the row to get the associate information. We'll also sanitize the
    // input a bit.
    let matching_row = lookup_table.get(event.Ad.toLowerCase().trim());

    // If we find a match, return it, otherwise we'll give a 404 response.
    if (typeof matching_row !== 'undefined') {
        // The roster is long instead of wide, so we only need the first result.
        let response = {
            'associate_ad': matching_row[0]['prettyUsername'],
            'department': matching_row[0]['department'],
        };

        return callback(null, response);
    } else {
        return callback(null, 404);
    };
};

trillian-notification-sms

This constructs a chat message and sends it to the appropriate rooms based on the department provided. Twilio doesn't include the received date among the data accessible from the Studio flow, so we have to make a separate API call to get this using the message ID.

Depends on axios for the Trillian REST calls, qs to stringify the data payloads for the Trillian requests, and luxon for date and time formatting.

/*
 * This function sends the Trillian notifications for an SMS call-off.
 *
 * Notification destination depends on the associate's department.
 *
 */

const axios = require('axios');
const qs = require('qs');
const { DateTime } = require('luxon');

exports.handler = async function(context, event, callback) {
    // Trillian rooms to send notification to for each department
    let room_ids = {
        '<redacted>': '<redacted>',
    };
    let room_keys = {
        '<redacted>': '<redacted>',
    };

    const client = context.getTwilioClient();

    // Fetch the message record details
    const message = await client.messages(event.MessageSid).fetch().then((message) =>  {
        return message;
    }).catch((error) => {
        console.log(error);
        return callback(error);
    });

    // Get the date and turn it into a DateTime object.
    let date = DateTime.fromJSDate(message.dateCreated);
    // Contact Center does everything in Eastern time.
    date = date.setZone('America/New_York');

    // Construct the chat message and payload for the API call
    var trillian_message = (
        `New SMS from ${event.AssociateAd} (${event.From}) at ` +
        `${date.toLocaleString(DateTime.DATETIME_FULL)}: "${event.Body}"`
    );

    // Our Trillian API instance.
    const instance = axios.create({
        baseURL: 'https://api.trillian.im',
        timeout: 300,
        headers: {
            'X-Custom-Header': 'Twilo',
            'Content-Type': 'application/x-www-form-urlencoded',
        },
    });

    // Get the correct room ID based on the associate's department.
    let room_id = room_ids[event.Department];

    // Build the Trillian API payloads.
    // For the Call-off voicemail room.
    let all_payload = qs.stringify({
        'from': '<redacted>',
        'message': trillian_message,
        'token': room_keys['all'],
    });
    // For the department room.
    let dept_payload = qs.stringify({
        'from': '<redacted>',
        'message': trillian_message,
        'token': room_keys[event.Department],
    });

    // POST the chat message to the Trillian API
    // For the Call-off voicemail room.
    const all_message = instance.post(
        `<redacted>/groupchats/${room_ids['all']}/messages/`,
        all_payload,
    ).then((response) =>  {
        console.log(JSON.stringify(response.data));
    }).catch((error) => {
        console.log(error);
    });

    const dept_message = instance.post(
        `<redacted>/groupchats/${room_ids[event.Department]}/messages/`,
        dept_payload,
    ).then((response) =>  {
        console.log(JSON.stringify(response.data));
    }).catch((error) => {
        console.log(error);
    });

    // Wait until we get back both Trillian promises and finish.
    Promise.all([all_message, dept_message]).then((values) => {
        return callback(null);
    }).catch((error) => {
        return callback(error);
    });
};

google-sheets-sms

This appends the call-off contact data to the attendance tracking Google Sheet. We need to fetch the Twilio message details again.

Depends on luxon for date and time formatting, and google-spreadsheet to interact with the Google Sheets API.

/*
 * This function appends the details from an SMS call-off to the Google Sheet.
 */

const fs = require('fs');
const { DateTime } = require('luxon');
const { GoogleSpreadsheet } = require('google-spreadsheet');

exports.handler = async function(context, event, callback) {
    /**
     * Twilio details
     */

    const client = context.getTwilioClient();

    // Fetch the message record details
    const message = await client.messages(event.MessageSid).fetch().then((message) =>  {
        return message;
    }).catch((error) => {
        console.log(error);
        return callback(error);
    });

    // Get the date and turn it into a DateTime object.
    let date = DateTime.fromJSDate(message.dateCreated);
    // Contact Center does everything in Eastern time.
    date = date.setZone('America/New_York');

    // Get the Google Sheets credentials, which are a Twilio static asset
    const credentials = JSON.parse(
        fs.readFileSync(Runtime.getAssets()['/credentials.json'].path, 'utf8')
    );

    // Initialize the spreadsheet object and authenticate
    const doc = new GoogleSpreadsheet(context.GOOGLE_SPREADSHEET_KEY);
    await doc.useServiceAccountAuth(credentials);
    await doc.loadInfo();

    // Appending to the first sheet
    const sheet = doc.sheetsById[0];
    // Append the call-off data
    sheet.addRow({
        'Transcript': event.Body,
        'Type': 'SMS',
        'Date': date.toISODate(),
        'Time (Eastern)': date.toFormat('HH:mm'),
        'Associate': event.AssociateAd,
    }).then((response) => {
        return callback(null);
    }).catch((error) => {
        console.log(error);
        return callback(error);
    });
};

call-off-voicemail

This is a mess of a monolithic function that combines the functionality of the above functions. I am omitting it here because it's mostly a duplication of the above with some minor changes to pull the call-relevant data (e.g., event.TranscriptionText instead of event.Body) and to format things for "voicemail" instead of "SMS".

Challenges

My biggest challenge was wrapping my head around the details of asynchronous JavaScript. The first step was understanding promises. They're liberally applied in the Twilio examples, so this was a relatively easy concept to identify the need for, but took some time to internalize. Promise.all() (actually, you should probably use Promise.allSettled(), but Twilio uses Node 10 and that's a Node 12 feature) was helpful for parallelizing what I could to to ensure we stay under the 10-second execution limit (especially for the call-off-voicemail function, which is large).

Next was figuring out the async/await stuff to deal with race conditions on asynchronous functions. I stumbled on the answer almost immediately but I didn't have success when I tried it in an early attempt. I think I either omitted the async part on the event handler or entered it incorrectly. I didn't come back to it until after many other dead ends and wasted a lot of time here.

Finally, I went down a rabbit hole with the GSuite APIs trying to use the googleapis basic library and going through the same confusion I went through implementing it in Python between the OAuth2 authentication and using a service account with an API key. I finally stumbled on the google-spreadsheets wrapper library, which made things a lot easier, as it both streamlined the service account authentication and the actual appending to the spreadsheet.

Conclusion

With this approach we eliminated dependence on local infrastructure; encapsulated the whole flow within a single provider to make maintenance, hand-offs, and billing easier; added features like SMS and flow branching; and made the whole thing more reliable. Rather than depending on seven interaction points for the end-to-end process we now have three.

The project took me about two weeks to complete. Testing, training, and deployment of the new process to the team will probably take another week. This is longer than I would have preferred, but has the side benefit of familiarizing me with the bare basics of Node.js which has relevance to other projects on the horizon. Particularly in a serverless context. Someone familiar with the tools and language involved could probably throw this together in a day or two.

I find process automation really rewarding, but the duct-tape-and-creativity approach I've had to take in the past has resulted in a lot of processes that are brittle, unmanageable, non-transferrable messes. This was a big step in cleaning up one of the more important ones and I'm happy with the results. Twilio provides a nice platform and the learning experience was challenging enough be interesting while not causing me to beat my head against a wall needlessly.

Resources

I found a few references that were particularly helpful: