Enterprise Explainable AI User Guide

Document Overview

This document details the use of Chatterbox Labs’ Enterprise Explainable AI software product. The intended readers of this document are business users looking to use the software via the browser-based user interface.

Prerequisites

Prior to using the Explainable AI software, you should have a trained machine learning system which has a predict function that takes input and returns a score as output. You must also have at least one test data point that is to be explained.

Accessing the software

Once the software is deployed, users access it via a web browser. Navigate to the computer on which it is installed, using port 3000. For example, if the software is deployed on the local machine, navigate to:

http://localhost:3000

If it is installed on a network location with IP 10.10.10.10, navigate to:

http://10.10.10.10:3000

Connect an AI Model

The first step is to connect to the endpoint which you wish to explain. Connectors are provided for each of the cloud providers (AWS, IBM Watson, Microsoft Azure & Google Cloud), along with support for OpenAPI auto-generated connectors and for custom connectors via REST or gRPC.

Here we are connecting to a REST endpoint. We enter the endpoint (the URL of the predict function) and the key to the JSON payload (typically ‘text’, ‘images’ or ‘payload’).
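As an illustration, the JSON body the connector sends to a REST predict endpoint might be shaped as follows. The payload key and example text here are hypothetical; substitute the values for your own model:

```python
import json

# Hypothetical payload key -- typically 'text', 'images' or 'payload'.
PAYLOAD_KEY = "text"

def build_payload(key: str, value: str) -> str:
    """Build the JSON body that is POSTed to the predict endpoint."""
    return json.dumps({key: value})

body = build_payload(PAYLOAD_KEY, "This film was wonderful")
# The endpoint is then expected to return a score,
# e.g. something like {"label": "positive", "score": 0.97}.
```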

Hit Done to move on to the next step.

Explaining Text

Connect Data

The test data point is the text data point you wish to explain.  Paste the text in the box.

If you wish to carry out the trace step, you can upload the training data.  The training data is not needed for the Explain step. The training data should be in CSV format with a maximum file size of 40MB.  Larger datasets can be processed when the software is integrated (see the Developer Documentation).
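Because uploads through the user interface are capped at 40MB, it can be useful to check the file size before uploading; a minimal sketch (the file path would be your own):

```python
import os

MAX_UPLOAD_BYTES = 40 * 1024 * 1024  # 40MB upload limit

def within_upload_limit(path: str) -> bool:
    """Return True if the CSV file is small enough to upload via the UI."""
    return os.path.getsize(path) <= MAX_UPLOAD_BYTES
```

Files over this limit should instead be processed through the integrated software (see the Developer Documentation).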

Test the connection by entering the label to be explained and hit Predict.  To explain the prediction result that has just been returned, hit Explain.

Explain

The explain step will interrogate the endpoint many times as it assesses the contribution of various components of the text to the final prediction. 

Without needing any knowledge of the underlying model, the Chatterbox Labs software extracts complex, multi-word phrases.  This goes well beyond standard feature importance methods, which rely on the machine learning features having meaning (which features in deep networks often do not), do not model the interaction between words (a critical part of text & language), and struggle to scale given the high dimensionality of text.

Immediately you have transparency into the machine learning prediction: you can see which complex text components are most responsible for it.  A chart ranks the phrases by importance, with two further visuals: one shows each phrase in the context of the whole text data point, and the other lets you drill down into a phrase to understand the interaction between its subcomponents.
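For intuition, the importance ranking shown in the chart is conceptually a mapping of phrase to contribution score, sorted from most to least important. The phrases and scores below are invented purely for illustration; the real values come from the Explain step:

```python
# Invented phrase-to-importance scores for one text data point.
phrase_scores = {
    "not worth the price": 0.42,
    "fast delivery": 0.18,
    "okay packaging": 0.03,
}

# Rank phrases from most to least important to the prediction.
ranked = sorted(phrase_scores.items(), key=lambda kv: kv[1], reverse=True)
```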

Explaining Mixed Data

Connect Data

Enter the relevant data on this step. There is no hard requirement to have access to the training data. If you do not have access to a data sample, provide a description of your data domain using the user interface components. If you have a sample of data, or the training data set, this data domain description can be learned automatically.

The data sample should be in CSV format with a maximum file size of 40MB. Larger datasets can be processed when the software is integrated (see the Developer Documentation). It is important to ensure that the order of the variables in your CSV file matches the order that your model (and hence prediction endpoint) is expecting. The order of columns in the CSV is preserved.
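A quick way to verify that a CSV header matches the column order your model expects is a simple comparison; the feature names below are hypothetical:

```python
import csv
import io

# Hypothetical feature order that the prediction endpoint expects.
EXPECTED_COLUMNS = ["age", "income", "tenure"]

def header_matches(csv_text: str, expected) -> bool:
    """Check that the CSV header matches the expected column order exactly."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return header == list(expected)
```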

The test data points are the data points you wish to explain. Load a CSV file with these data points or paste CSV data into the test box.

Test the connection by entering the label to be explained and hit Predict. To explain the prediction result that has just been returned, hit Explain.

Explain

The explain step will interrogate the endpoint many times as it assesses the contribution of each variable (and the interactions between these variables) to the final predictions.

Without needing any knowledge of the underlying model, the software identifies which features are important for each test point. The explanation will likely differ for each data point that is shown.

Aggregate scores are shown across all data points initially; select an individual data point from the data table to show the explanation specific to that data point.

Immediately, valuable information can be seen. Variables that increase the prediction score (either the confidence or probability of the prediction for a classification task, or the value that is returned for a regression task) are shown with positive scores; those with negative scores decrease this value.
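Conceptually, splitting an explanation into the variables that push the prediction up and those that pull it down is a partition by sign. The variable names and scores below are invented for illustration:

```python
# Invented per-variable contribution scores from one explanation.
scores = {"age": 0.31, "income": -0.12, "tenure": 0.05, "region": -0.02}

# Positive scores increase the prediction value; negative scores decrease it.
increases = {k: v for k, v in scores.items() if v > 0}
decreases = {k: v for k, v in scores.items() if v < 0}
```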

Explaining Images

Connect Data

Locate the image which you wish to explain.  It is good practice to ensure that the dimensions of the image match those expected by your machine learning model; commonly this is 224 x 224.  This is not a hard requirement; however, it can significantly improve performance.
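One common way to prepare an image before resizing it to the model's dimensions (e.g. 224 x 224) is a centered square crop. The box arithmetic can be sketched as below; the actual crop and resize would then be done with your image library of choice:

```python
def center_crop_box(width: int, height: int) -> tuple:
    """Return a (left, upper, right, lower) box for a centered square crop."""
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return (left, upper, left + side, upper + side)

# A 640x480 image is cropped to a centered 480x480 square
# before resizing to 224x224.
box = center_crop_box(640, 480)
```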

Provide some details about the image to kick-start the explanation algorithm.  Does the target of your image classification model take up a large area of the image or a small area?  Move the slider to control this.

Test the connection by entering the label to be explained and hit Predict.  To explain the prediction result that has just been returned, hit Explain.

Explain

The explain step will interrogate the endpoint many times as it assesses the contribution of various areas of the image to the final prediction. 

The explanation is returned and rendered using a heatmap.  If you wish to access the underlying data that generated this heatmap, please see the Developer Documentation. 

The heatmap ranges from yellow to purple, showing areas that are important to the classifier in yellow and those that either decrease performance or pull towards another class in purple.  The heatmap colours can be scaled by the highest contribution found in this test image (the default) or normalized to the theoretical maximum contribution any image on this task could make.
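The two scaling modes can be sketched as follows; the contribution values here are invented, and the real ones come from the Explain step:

```python
def scale_contributions(values, theoretical_max=None):
    """Scale contribution values into [-1, 1].

    By default, scale by the largest absolute contribution found in this
    test image; if theoretical_max is given, normalize by that instead.
    """
    denom = theoretical_max if theoretical_max is not None else max(abs(v) for v in values)
    return [v / denom for v in values]

per_image = scale_contributions([0.2, -0.4, 0.1])                        # default mode
normalized = scale_contributions([0.2, -0.4, 0.1], theoretical_max=0.8)  # theoretical max
```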

Tracing Text & Mixed Data

Now that we understand the why of the prediction, we need to Trace this prediction back to the training data. We do this to audit and validate the business case.  This is necessary because machine learning systems in the wild are often subject to business specification drift: whilst they may still be confident, they may no longer be doing what we expect them to.

Trace applies to text and mixed data, here we will use text as an example.

Move onto the Trace step. Variable types will be automatically assigned, however you can modify them if you wish. Ignore variables which are not part of your machine learning model (such as IDs, test/train splits or the target value).

The Trace step identifies the training data points which are most similar to your test data point, so that you can audit the business specification.  This similarity, or distance, is shown in two ways: the raw data is listed in the table, whilst the bubble plot renders it visually.  Data points on the right are farthest away; data points on the left are closest.  You can choose to split the data, most often by the target variable in your data.

You can now check whether your test data point is most similar to training data of the same class (which shows that the business specification still holds), most similar to a different class (the model has been subject to business specification drift), or whether there is no clear separation of classes at all (the business specification was not clear in the first place).
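Conceptually, this check resembles looking at the class labels of the nearest training points. The distances and labels below are invented for illustration:

```python
from collections import Counter

def majority_label(neighbours, k=3):
    """neighbours: (distance, label) pairs; return the majority label
    among the k nearest training points."""
    nearest = sorted(neighbours)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Invented trace output: the test point's nearest neighbours and their classes.
label = majority_label([(0.12, "churn"), (0.20, "churn"),
                        (0.90, "stay"), (0.35, "stay")])
```

If this majority label matches the test point's predicted class, the business specification likely still holds.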

Hardware Requirements

XAI sits as a layer on top of an existing ML model; therefore, the underlying machine learning model will have its own resource requirements.

Typical minimum requirements for the XAI application are:

  • CPU: Dual Core @ > 2GHz
  • Memory: 16GB
  • Software: JDK 11 or Docker

Client computers used to access the software (if different from where the application is deployed) should have a minimum of:

  • CPU: Dual Core @ > 2GHz
  • Memory: 8GB
  • Web browser: Firefox, Google Chrome, Safari or Microsoft Edge (Chromium)

Get in Touch