AI Testing

Testing AI
The key to mitigating AI risks is transparency. For bias, we need insight into the
representativeness of the training data and the labelling, but most of all we need
insight into how the important expectations of, and consequences for, all parties
involved are reflected in the results.
Building the right amount of confidence, and ensuring traceability, also requires transparency.
Transparency will not be achieved by illuminating the code. Even if this were
possible, showing a heat-map of the code that indicates which part of the neural
network is active when a particular part of an object is analysed, or when a calculation
in a layer is produced, means close to nothing. Looking inside a brain will never show
a thought or decision. It could show which part is activated, but mental processes
always involve multiple parts of the brain and, above all, experience from
the past.
AI systems are black boxes, so we should test them the way we do in black-box
testing: from the outside, developing test cases that are modelled on real-life input.
From there, expectations about the output are determined. Sounds traditional and well
known, doesn't it?
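As an illustration of this outside-in approach, the sketch below exercises a classifier purely through its prediction interface and compares the outcomes with expectations derived from real-life input. The predict() function, the test cases and the confidence threshold are hypothetical placeholders, not part of any particular system.

```python
# Minimal black-box test sketch: the model is only exercised through its
# public prediction interface, never by inspecting weights or code.
# predict(), the test cases and MIN_CONFIDENCE are illustrative assumptions.

def predict(image_features):
    """Stand-in for the system under test; returns (label, confidence)."""
    # In a real project this would call the deployed model or its API.
    return ("stop_sign", 0.97)

# Test cases modelled on real-life input, each with its expected label.
test_cases = [
    {"name": "clear daylight stop sign", "input": [0.1, 0.8, 0.3], "expected": "stop_sign"},
    {"name": "partially occluded stop sign", "input": [0.2, 0.6, 0.4], "expected": "stop_sign"},
]

MIN_CONFIDENCE = 0.90  # acceptance criterion agreed with the product owner


def run_black_box_suite(cases):
    """Return the cases whose outcome violates the expectations."""
    failures = []
    for case in cases:
        label, confidence = predict(case["input"])
        if label != case["expected"] or confidence < MIN_CONFIDENCE:
            failures.append((case["name"], label, confidence))
    return failures


if __name__ == "__main__":
    for name, label, confidence in run_black_box_suite(test_cases):
        print(f"FAIL: {name} -> {label} (confidence {confidence:.2f})")
```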
The basic logic of testing AI may be familiar, but the specific tasks and elements
are very different.
Traditionally, requirements and specifications are determined upfront, and testers
receive them ready to be used at the start. In AI, requirements and specifications are
too diverse and dynamic to expect them to be determined completely, once and for all,
at the start. Product owners and business consultants should deliver
requirements, but testers need to take the initiative to get the requirements in the form,
granularity and timeliness that they need.
The challenges of testing AI, and the measures that accompany them from start to finish,
are discussed next.
- AI development is black-box development: the algorithm is not explicitly coded but emerges from training data, parameterization, and the neural network.
- Machine learning in AI is akin to human learning: neural networks create models based on training data to classify input and assign labels.
- The algorithm in AI is a combination of data, code and labels; because it emerges from training data and labelling, it cannot be fixed directly the way traditional code can.
- AI calculations produce probabilities, not definitive results: the outputs represent the extent to which the criteria for each label have been met (see the sketch after this list).
- Developing a neural network involves configuring the input and labels and parameterizing the results, with tweaks impacting overall performance and potentially causing regression.
- Massive regression testing is crucial in AI because parameter tweaks have unintended impacts; an overall evaluation based on graded results is needed to determine whether the system as a whole has improved, as the sketch below also illustrates.
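To make the last two points more concrete, here is a toy sketch (not a prescribed method): a classifier assigns a probability to each label via a softmax, and a regression run is judged on an aggregate graded score rather than on individual pass/fail verdicts, so the effect of a parameter tweak on the system as a whole becomes visible. The labels, logits and numbers are invented for illustration.

```python
import numpy as np

# Toy illustration: per-label probabilities plus an aggregate graded score
# used to compare two model versions over the same regression set.

LABELS = ["cat", "dog", "rabbit"]

def softmax(logits):
    """Turn raw model outputs into a probability per label."""
    exp = np.exp(logits - np.max(logits))
    return exp / exp.sum()

def grade(model_logits, regression_set):
    """Average probability assigned to the correct label over the whole set."""
    total = 0.0
    for sample_id, true_label in regression_set:
        probs = softmax(model_logits[sample_id])
        total += probs[LABELS.index(true_label)]
    return total / len(regression_set)

# Hypothetical raw outputs for three regression samples, before and after a
# parameter tweak: the tweak helps sample "b" but hurts sample "c".
before = {"a": np.array([2.0, 0.1, 0.1]), "b": np.array([0.5, 0.4, 0.1]), "c": np.array([0.1, 2.5, 0.2])}
after  = {"a": np.array([2.1, 0.1, 0.1]), "b": np.array([1.5, 0.3, 0.1]), "c": np.array([0.9, 1.0, 0.2])}

regression_set = [("a", "cat"), ("b", "cat"), ("c", "dog")]

print(f"graded score before tweak: {grade(before, regression_set):.3f}")
print(f"graded score after tweak:  {grade(after, regression_set):.3f}")
```

Whether such an aggregate grade counts as an improvement is a judgement call agreed with the stakeholders; individual samples may still regress even when the overall score goes up.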
Reviewing the Neural Network, Training Data, and Labelling
Neural Network Review:
- In the context of testing AI, it is essential to review the neural network being used: is the chosen network suitable for the specific task it is intended for?
- The review also includes evaluating the setup of the neural network, to ensure it aligns with the objectives of the AI system (a small inspection sketch follows after this list).
- Testers need a comprehensive understanding of the various types of neural networks, their functionality, strengths and weaknesses, to review and assess the network in use effectively.
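As a hypothetical illustration of reviewing the setup, the sketch below prints the layer types and parameter counts of a small convolutional network. In a real review the model would be the actual network under assessment; PyTorch and the toy architecture are only assumptions made for the example.

```python
import torch.nn as nn

# Illustrative only: a quick way a tester might inspect a network's setup
# (layer types and parameter counts) during a review. The small CNN below
# is a placeholder, not the system under review.

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 10),   # assumes 32x32 RGB input and 10 labels
)

total = 0
for name, module in model.named_children():
    params = sum(p.numel() for p in module.parameters())
    total += params
    print(f"layer {name}: {module.__class__.__name__:10s} parameters={params}")
print(f"total trainable parameters: {total}")
```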
Assessment of Fitness for Purpose:
- One crucial aspect of reviewing the neural network is determining whether it is fit for the intended purpose: are the network's architecture and design appropriate for the tasks it is expected to perform?
- Testers need to consider factors such as the complexity of the problem, the type of data being processed, and the desired outcomes when evaluating the fitness of the neural network for the specific AI application.
Exploring Alternatives:
- As part of the review process, testers should also explore alternative neural network architectures or setups that might better suit the requirements of the AI system.
- Understanding the various alternatives available and their respective strengths and limitations is crucial in making informed decisions about the neural network to be used.
Comprehensive Knowledge Requirement:
- Reviewing the neural network requires testers to have a broad knowledge base encompassing different types of neural networks, such as convolutional neural networks, recurrent neural networks, and deep learning architectures.
- Testers need to be familiar with the specific qualities and shortcomings of each type of neural network to make informed judgments during the review process.
Training Data and Label Assessment:
- In addition to reviewing the neural network, testers also need to assess the training data and labels used to train the AI system.
- This assessment involves reviewing the quality, relevance, and potential biases present in the training data, as well as evaluating the accuracy and appropriateness of the labels assigned to the data.
- Testers must identify any risk-sensitive aspects of the training data and labels that could impact the performance and outcomes of the AI system; a minimal sketch of such a check follows below.
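Under the assumption of a simple tabular dataset, such an assessment might start by checking the label distribution (representativeness), rejecting unknown labels, and flagging identical inputs that carry conflicting labels. All names and records in the sketch are invented placeholders.

```python
from collections import Counter

# Illustrative first checks on training data and labels: class balance,
# label validity, and conflicting labels for identical inputs.

ALLOWED_LABELS = {"approve", "reject"}

training_data = [
    ({"age": 34, "income": 52000}, "approve"),
    ({"age": 34, "income": 52000}, "reject"),   # same input, conflicting label
    ({"age": 61, "income": 18000}, "reject"),
    ({"age": 29, "income": 47000}, "approve"),
]

# 1. Label validity and class balance (a skewed distribution hints at bias)
label_counts = Counter(label for _, label in training_data)
unknown = set(label_counts) - ALLOWED_LABELS
print("label distribution:", dict(label_counts))
if unknown:
    print("unknown labels found:", unknown)

# 2. Identical inputs with conflicting labels point at noisy labelling
seen = {}
for features, label in training_data:
    key = tuple(sorted(features.items()))
    if key in seen and seen[key] != label:
        print("conflicting labels for identical input:", features)
    seen[key] = label
```

Checks like these do not prove the data is representative or unbiased, but they surface the obvious risks early enough for the team to act on them.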