Testing AI

The key to mitigating AI risks is transparency. For bias, we need insight into the representativeness of the training data and labelling, but most of all we need insight into how important expectations and consequences for all parties involved are reflected in the results.

Building the right amount of confidence and traceability needs transparency too. Transparency will not be achieved by illuminating the code. Even if this were possible, a heat map of the code showing which part of the neural network is active when a particular part of an object is analysed, or when a calculation in a layer is produced, means close to nothing. Looking inside a brain will never show a thought or a decision. It could show which part is activated, but every mental process involves multiple parts of the brain and, most of all, experience from the past.

AI systems are black boxes, so we should test them as we do in black box testing: from the outside, developing test cases that are modelled on real-life input. From there, expectations on the output are determined. Sounds traditional and well known, doesn’t it?

The basic logic of testing AI might be familiar, but the specific tasks and elements are very different.
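As a minimal sketch of the outside-in approach described above, the following Python treats the model as an opaque callable and checks only its outputs against expectations derived from realistic inputs. Everything named here is an assumption made for illustration: `BlackBoxCase`, `run_black_box` and the keyword-based `stub_model` are hypothetical, and the stub merely stands in for a trained system.

```python
from typing import Callable, Dict, List, NamedTuple


class BlackBoxCase(NamedTuple):
    """One test case: a realistic input and the label we expect for it."""
    description: str
    model_input: str
    expected_label: str


# The system under test is only known by its interface:
# text in, one probability per label out.
Model = Callable[[str], Dict[str, float]]


def run_black_box(model: Model, cases: List[BlackBoxCase]) -> List[str]:
    """Return the descriptions of all cases whose top label is not the expected one."""
    failures = []
    for case in cases:
        output = model(case.model_input)
        predicted = max(output, key=output.get)  # label with the highest probability
        if predicted != case.expected_label:
            failures.append(case.description)
    return failures


def stub_model(text: str) -> Dict[str, float]:
    """Hypothetical stand-in for a trained classifier (assumption, not a real model)."""
    if "prize" in text.lower():
        return {"spam": 0.9, "ham": 0.1}
    return {"spam": 0.2, "ham": 0.8}


cases = [
    BlackBoxCase("prize mail is spam", "You won a PRIZE, click here", "spam"),
    BlackBoxCase("meeting mail is ham", "Agenda for Monday's meeting", "ham"),
    BlackBoxCase("lottery mail is spam", "Claim your lottery winnings", "spam"),
]

print(run_black_box(stub_model, cases))  # ['lottery mail is spam']
```

The failing case is found without inspecting the model's internals: the test only shows that a realistic input was not covered by whatever the model has learned.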

Traditionally, requirements and specifications are determined upfront and testers receive them ready to use at the start. In AI, requirements and specifications are too diverse and dynamic to expect them to be determined completely, once and for all, at the start. Product owners and business consultants should deliver requirements, but testers need to take the initiative to get the requirements in the form, at the granularity and as up to date as they need them.

The challenges of testing AI, and the measures that accompany them from start to finish, are discussed next.

  • Testing AI means dealing with black box development, where the algorithm is not explicitly coded but emerges from training data, parameterization, and neural networks.
  • Machine learning in AI is akin to human learning, where neural networks create models based on training data to classify input and assign labels.
  • The algorithm in AI is a combination of data, code, and labels, not directly fixable like traditional code due to its reliance on training data and labels.
  • AI calculations produce probabilities, not definitive results, with outputs representing the extent to which criteria have been met for each label.
  • Developing a neural network involves configuring input, labels, and parameterizing results, with tweaks impacting overall performance and potentially causing regression.
  • Massive regression testing is crucial in AI due to the unintended impacts of parameter tweaks, requiring overall evaluation based on graded results to determine system improvements.
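The last three points can be illustrated with a small sketch. It assumes, hypothetically, that each model version returns a probability per label for every test case, and it compares two versions on the probability given to the expected label. The names `compare_versions` and `tolerance`, and the sample numbers, are illustrative assumptions rather than part of any particular framework.

```python
from typing import Dict, List, Tuple

# A graded result: the probability the model assigned to each label.
Graded = Dict[str, float]


def expected_label_score(output: Graded, expected: str) -> float:
    """Probability assigned to the expected label (0.0 if the label is missing)."""
    return output.get(expected, 0.0)


def compare_versions(
    cases: List[Tuple[str, str]],      # (case id, expected label)
    old_outputs: Dict[str, Graded],    # results before the tweak
    new_outputs: Dict[str, Graded],    # results after the tweak
    tolerance: float = 0.05,           # assumed threshold for "got worse"
) -> Tuple[float, float, List[str]]:
    """Return (mean old score, mean new score, ids of regressed cases)."""
    if not cases:
        raise ValueError("at least one test case is required")
    old_total, new_total, regressed = 0.0, 0.0, []
    for case_id, expected in cases:
        old = expected_label_score(old_outputs[case_id], expected)
        new = expected_label_score(new_outputs[case_id], expected)
        old_total += old
        new_total += new
        if new < old - tolerance:
            regressed.append(case_id)
    return old_total / len(cases), new_total / len(cases), regressed


cases = [("img1", "cat"), ("img2", "dog"), ("img3", "cat")]
old = {
    "img1": {"cat": 0.70, "dog": 0.30},
    "img2": {"cat": 0.40, "dog": 0.60},
    "img3": {"cat": 0.80, "dog": 0.20},
}
new = {
    "img1": {"cat": 0.90, "dog": 0.10},
    "img2": {"cat": 0.35, "dog": 0.65},
    "img3": {"cat": 0.60, "dog": 0.40},
}

old_mean, new_mean, regressed = compare_versions(cases, old, new)
print(round(old_mean, 3), round(new_mean, 3), regressed)
```

In this made-up data the tweak raises the overall mean score while one case (`img3`) gets clearly worse, which is why a change has to be judged on the whole test set and not on the case that prompted it.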

Understanding the Review of Neural Network, Training Data, and Labelling

  • Neural Network Review:

    • In the context of testing AI, it is essential to review the neural network being used. This involves assessing whether the chosen neural network is suitable for the specific task it is intended for.
    • The review also includes evaluating the setup of the neural network to ensure it aligns with the objectives of the AI system.
    • Testers need to have a comprehensive understanding of various types of neural networks, their functionalities, strengths, and weaknesses to effectively review and assess the neural network in use.
  • Assessment of Fitness for Purpose:

    • One crucial aspect of reviewing the neural network is determining if it is fit for the intended purpose. This assessment involves analyzing whether the neural network architecture and design are appropriate for the tasks it is expected to perform.
    • Testers need to consider factors such as the complexity of the problem, the type of data being processed, and the desired outcomes when evaluating the fitness of the neural network for the specific AI application.
  • Exploring Alternatives:

    • As part of the review process, testers should also explore alternative neural network architectures or setups that could potentially better suit the requirements of the AI system.
    • Understanding the various alternatives available and their respective strengths and limitations is crucial in making informed decisions about the neural network to be used.
  • Comprehensive Knowledge Requirement:

    • Reviewing the neural network requires testers to have a broad knowledge base encompassing different types of neural networks, such as convolutional neural networks, recurrent neural networks, and other deep learning architectures.
    • Testers need to be familiar with the specific qualities and shortcomings of each type of neural network to make informed judgments during the review process.
  • Training Data and Label Assessment:

    • In addition to reviewing the neural network, testers also need to assess the training data and labels used to train the AI system.
    • This assessment involves reviewing the quality, relevance, and potential biases present in the training data, as well as evaluating the accuracy and appropriateness of the labels assigned to the data.
    • Testers must identify any sensitivity to risks within the training data and labels that could impact the performance and outcomes of the AI system.
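One simple, partial check on training data and labels is to compare how often each label occurs in the training set with how often it is expected to occur in real use. The sketch below assumes the tester can obtain the expected shares from the business side. The function names, the `max_shortfall` threshold and the sample figures are hypothetical, and a skewed distribution is only a signal to investigate, not proof of bias.

```python
from collections import Counter
from typing import Dict, Iterable, List


def label_shares(labels: Iterable[str]) -> Dict[str, float]:
    """Fraction of the training examples carrying each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labels supplied")
    return {label: count / total for label, count in counts.items()}


def underrepresented(
    labels: Iterable[str],
    expected_shares: Dict[str, float],  # assumed real-world distribution
    max_shortfall: float = 0.10,        # assumed acceptable gap
) -> List[str]:
    """Labels whose share in the training data falls short of the expected share."""
    shares = label_shares(labels)
    return sorted(
        label
        for label, expected in expected_shares.items()
        if shares.get(label, 0.0) < expected - max_shortfall
    )


# Made-up training labels: 80% "approved", 20% "rejected".
training_labels = ["approved"] * 80 + ["rejected"] * 20
expected = {"approved": 0.6, "rejected": 0.4}

print(underrepresented(training_labels, expected))  # ['rejected']
```

The same comparison can be repeated per subgroup in the data (for example per region or age band) to look for groups that are thinly covered.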