Examples of AI applications and how we could possibly test them

Recently I attended an online crowdchat hosted by the Ministry of Testing about testing AI applications.

The questions were very interesting, but it was hard to think of a single right answer that covered all AI applications, as this is a very broad field. Explaining it over Twitter would be confusing, so I thought I may as well create a post giving some examples.

Kudos to someone on Twitter who mentioned supervised and unsupervised learning at the end of the chat. I was very sleepy at the time (the chat started at 4am my time), so I was not able to find his tweet in the morning to vote for it. I think we can better understand the types of AI applications out there if we divide them into supervised and unsupervised learning. More information here.

Supervised learning examples

The idea behind it is easy to understand: these applications have a learning phase in which we keep feeding them data, rewarding them when they produce a correct result and punishing them when they don't, until the produced results match the expected results within a threshold (in other words, until we are happy with the results).

Let’s ignore for now the exact ways in which we can punish or reward a machine and just focus on the general idea.

After this learning phase, we generally just use the application and no more learning takes place; we "turn off" the learning. This is called the inference phase. Not all applications have an inference phase; sometimes we want them to keep learning from the users, but this can turn out to be problematic, as we will see further on.

I think these are the easiest AI applications to test, functionally speaking, as we just need to pass new data and check the results obtained against the expected ones. Apart from this, they behave just like any other application, and we can also go through the other types of testing without many changes (performance, security, system...).
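
To make that idea concrete, here is a minimal sketch of such a functional check in Python. Everything in it is a placeholder: the model object, the load_holdout_cases helper and the 95% threshold are assumptions for illustration, not a real API.

```python
# Minimal sketch: functional check of a model that is already in the
# inference phase. The model, load_holdout_cases() and the 0.95 threshold
# are hypothetical placeholders, not a real library API.

def test_frozen_model_accuracy(model):
    cases = load_holdout_cases()  # list of (input, expected_result) pairs
    correct = sum(
        1 for sample, expected in cases
        if model.predict(sample) == expected
    )
    accuracy = correct / len(cases)
    # We rarely expect 100% from a learned model, so we assert against an
    # agreed threshold rather than exact equality on every single case.
    assert accuracy >= 0.95, f"accuracy dropped to {accuracy:.2%}"
```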

NPR / OCR:

Imagine, for example, a number plate recognition system: once the system learns how to recognize the numbers on a license plate, you don't have to keep training it. The application can use the learned patterns to verify new number plates.

There are many tests we could think of here, without caring about how the application gets the results: try characters with unusual typography (if allowed in the country), tilt the number plate, check the boundary distances from the vehicle...
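
A parametrized sketch of some of those cases could look like the snippet below. The recognize_plate function, the image fixtures and the module name are all hypothetical; only pytest's parametrize mechanism is real.

```python
import pytest

# Hypothetical system under test: recognize_plate(image) -> plate string.
# The image files are placeholders for captures prepared with unusual
# typography, a tilted plate and the maximum readable distance.
from plates import recognize_plate, load_test_image  # hypothetical module

@pytest.mark.parametrize("image_name, expected", [
    ("plain_plate.png",        "AB12 CDE"),
    ("unusual_font_plate.png", "AB12 CDE"),  # strange but legal typography
    ("tilted_15_degrees.png",  "AB12 CDE"),  # plate not square to the camera
    ("max_distance.png",       "AB12 CDE"),  # boundary distance from the vehicle
])
def test_plate_recognition_edge_cases(image_name, expected):
    image = load_test_image(image_name)
    assert recognize_plate(image) == expected
```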

An OCR (optical character recognition) application could also be built with this technique. In fact, a number plate recognition system could be considered a specific type of OCR.

Digital personal assistants (Cortana, Siri, Alexa...):

Quite common nowadays, these help you find information using voice commands. They could also use supervised learning (although I believe the right classification for them would be "semi-supervised learning", let's think of them as just supervised for the sake of the example). However, in this case the application keeps learning from the users; it stays in the learning phase.

The reason they can "safely" do this is that they collect data from the users, but not the users' direct input on whether the result should be penalized or rewarded. An example of an application getting direct input from the user in order to keep learning would be a chatbot that guesses something and asks whether the guess was correct. This could easily be tricked by dishonest users.

Applications that keep learning are much trickier to test, even functionally, because if we pass them wrong inputs while testing, they will learn the wrong things. If I had to test one of these, I would take a copy of the state at each iteration we want to test and run it in an isolated environment, so we don't spoil the good learning already acquired. For performance testing it would be best to use valid data, to ensure the learning process continues well.
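
As a rough illustration of the "test a copy of the state" idea, the sketch below deep-copies a hypothetical assistant object before feeding it deliberately tricky inputs, so the live model never sees the test data. The assistant and its observe/respond methods are made up for the example.

```python
import copy

# Sketch of testing a continuously-learning system in isolation.
# live_assistant and its observe()/respond() methods are hypothetical.

def test_against_snapshot(live_assistant, tricky_inputs):
    # Work on a copy of the current learned state so that deliberately
    # wrong or adversarial test inputs never reach the production model.
    sandbox = copy.deepcopy(live_assistant)

    for utterance, expected in tricky_inputs:
        sandbox.observe(utterance)  # the sandbox may keep learning here
        assert sandbox.respond(utterance) == expected

    # The live model is untouched; only the sandbox absorbed the test data.
```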

If anybody is concerned about AI gaining consciousness, this type of application would be the problematic one, as they could be learning things we are not aware of, depending on the power that the programmer and the user give them and the data they are able to collect. This brings up the question: should testers be responsible for testing consciousness?

Unsupervised learning examples

The key to these applications is discovering relationships in the data without direct penalization or reward. They are very useful when we are not sure what the output should be, and for discovering things we would not naturally think of as related.

There are two types: clustering (when the system discovers groupings in the data) and association (discovering rules that describe the data). I won't go deep into them in this post, as there is a lot of information here already.
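
Just to give a flavor of clustering, here is a tiny example using scikit-learn's KMeans (assuming scikit-learn is available). The points are made up, and the system groups them without ever being told what the groups mean.

```python
# Tiny clustering illustration with scikit-learn's KMeans (assumed installed).
# The data is made up: two obvious groups of 2-D points.
from sklearn.cluster import KMeans

points = [
    [1.0, 1.2], [0.9, 1.1], [1.1, 0.8],   # one natural group
    [8.0, 8.1], [7.9, 8.3], [8.2, 7.9],   # another natural group
]

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(model.labels_)  # e.g. [0 0 0 1 1 1] -- groupings we never labeled ourselves
```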

Tailored content and advertising (Amazon, Netflix, Google...)

These applications try to predict what a customer who bought something will be interested in next. In fact, digital personal assistants could also use this data to help you find what you want (that's why I mentioned earlier that they should be classified as "semi-supervised" learning). I cannot think of many ways of testing this beyond checking the impact on sales after the application is in place, but that could be subject to chance or other factors unrelated to the application itself.
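
As a toy illustration of the idea behind "customers who bought X also bought Y", the sketch below just counts how often items appear together in made-up purchase baskets. Real recommendation systems are far more sophisticated, and none of these names come from an actual product.

```python
from collections import Counter
from itertools import combinations

# Naive sketch of "customers who bought X also bought Y", built from made-up
# purchase baskets, purely to illustrate association between items.
baskets = [
    {"book", "bookmark"},
    {"book", "bookmark"},
    {"book", "lamp"},
    {"kettle", "mug"},
]

co_occurrence = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        co_occurrence[pair] += 1

def recommend_for(item):
    # Suggest the item most often bought together with `item`.
    related = {pair: count for pair, count in co_occurrence.items() if item in pair}
    if not related:
        return None
    a, b = max(related, key=related.get)
    return b if a == item else a

print(recommend_for("book"))  # -> "bookmark" (bought together twice vs. lamp once)
```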

Apart from checking the results, testing this should be much the same as what we already do with non-AI applications (not just the results, but how the user inputs the data and how the application responds and shows the data back...). Imagine it as one feature of a bigger product; all the other features would also need to be tested.

The moral impact of these applications, in my opinion, is that at some point they might be telling you (as a user) what you want, even before you know you want it.

What could possibly go wrong?

What should we be careful about in AI applications that might not need as much attention in other apps?

Things could go very wrong if we leave apps learning constantly and let the users provide the penalization or rewards. You have probably heard of applications such as image recognition systems and chatbots becoming racist and sexist. Sometimes this is because the training data given to the application is biased, but it could also be because of trolls playing around with the application in unexpected ways and giving rewards when the application is wrong.

Leaving apps to learn on their own is also not the best idea, as we do not control what they are actually learning, as mentioned before.

If you are interested, I found an article with some more examples of issues with AI applications here.

What else have you got?

Below is a list of readings that I found very interesting while researching for this post (a couple of the links are about video games and AI):

How the "Hello Neighbor" game's AI works

AI predicting coding mistakes before developers make them

Examples of AI

Game examples of AI

How would you test these applications?

What do you think about the moral connotations?

If used well, AI could be harmless and powerful. In fact, it could also be a good tool that we could use for automating our testing, but that’s…well…another story.