As
part of our AI auditing framework blog series, Reuben Binns, our Research Fellow in
Artificial Intelligence (AI), Peter Brown, Technology Policy Group Manager, and
Valeria Gallo, Technology Policy Adviser, look at how AI can exacerbate known
security risks and make them more difficult to manage.
Personal
data must always be processed in a manner that ensures appropriate levels of security
against unauthorised processing, accidental loss, destruction or damage.
There is no
“one-size-fits-all” approach to security. The appropriate security measures
organisations should adopt depend on the level and type of risks that arise
from specific processing activities. Using AI to process any personal data will
have important implications for an organisation’s security risk profile, which need
to be assessed and managed carefully.
Some implications may
be triggered by the introduction of new types of risks, eg adversarial attacks
on machine learning models, which we will examine in future blogs. In this post
we will focus on the way AI can adversely affect security by making known risks
worse and more challenging to control.
Information
security is a key component of our AI Auditing Framework, but is also central
to our work as the information rights regulator. The ICO is planning to expand
its general security guidance to take into account the additional requirements
set out in the new General Data Protection Regulation (GDPR). While this
guidance will not be AI-specific, it will cover a range of topics that are
relevant for organisations using AI, including software supply chain security
and increasing use of open-source software.
We are therefore particularly keen
to hear your views on this topic so we can integrate them into both the
framework and the guidance. We encourage you to use the comments section below,
or to email us, to share your thoughts on AI-related security
challenges, best practices, and any additional guidance you would like the ICO
to issue.
Managing security
in AI vs. traditional technologies
Some
of the unique characteristics of AI mean compliance with security requirements
can be more challenging than with more established technologies, both from a
technological and human perspective.
From
a technological perspective, AI systems introduce new kinds of complexity not
found in the IT systems most organisations will have dealt with previously.
They are also likely to rely heavily on third party code or relationships, and
will need to be integrated with several other new and existing IT components, which
are also intricately connected. This complexity may make it more difficult to
identify and manage some security risks, and may increase others, such as the
risk of outages.
From
a human perspective, the people involved in building and deploying AI systems
are likely to have a wider range of backgrounds than usual, including traditional
software engineers, systems administrators, data scientists, statisticians
and domain experts. Security practices and expectations may vary
significantly, and for some there may be less understanding of broader security
compliance requirements. Security of personal data may not always have been a
key priority, especially if someone was previously building AI applications
with non-personal data or in a research capacity.
Common
practices for processing data securely in data science and AI engineering
are still developing, which causes further complications.
It
is not possible to list all known security risks that might be exacerbated when
AI is used to process personal data. The impact of AI on security will depend
on the way the technology is built and deployed, the complexity of the
organisation, and the strength and maturity of the existing risk management
capabilities.
The
following hypothetical scenario should raise awareness of some of the known
security risks that AI can exacerbate and some of the challenges involved in managing them.
Our
key message for organisations is: review
risk management practices to ensure personal data is secure in an AI context.
Hypothetical
scenario: AI in recruitment
A
recruitment firm decides to use an AI system based on machine learning (ML) to match
CVs to job descriptions automatically, rather than through a manual review. The
AI system will select the best candidates to be forwarded to potential employers
for consideration. To make a recommendation, the AI system will process the job
descriptions, personal data provided by the candidates themselves, and data provided
by the employers about previous hiring decisions for similar roles.
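To make the scenario concrete, here is a purely illustrative sketch of how such a matching model might be trained on past hiring decisions. The field names, the pairing of CV text with job description text, and the use of scikit-learn are all assumptions for illustration, not a description of any real recruitment system.

```python
# Illustrative only: a toy CV-to-job-description matching model trained on
# past hiring decisions. All data and field choices are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each training example pairs a candidate's CV with a job description,
# labelled with the employer's previous decision (1 = put forward, 0 = not).
past_pairs = [
    "experienced sales manager, led regional team || sales manager, B2B software",
    "junior data analyst, SQL and Excel || sales manager, B2B software",
]
past_decisions = [1, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(past_pairs, past_decisions)

# Score a new candidate against the vacancy; the firm would forward the
# highest-scoring candidates to the employer.
new_pair = ["sales team lead, five years in field sales || sales manager, B2B software"]
print(model.predict_proba(new_pair)[0][1])
```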
Risk example #1 – Losing track of training data
ML
systems require large sets of training and testing data to be shared. In the
example above, for the AI system to be effective, employers will need to share
data about similar previous hiring decisions (e.g. sales manager) with the
recruitment firm.
While
some sharing of personal data (e.g. candidates’ CVs) would have taken place while
the CV scanning process was manual, it did not involve the transfer of large
quantities of personal data between the employers and the recruitment firm.
Leaving aside
questions about the legal basis for the processing, sharing this additional data
could involve creating multiple copies in different formats, stored in
different locations (see below), raising important security and
information governance considerations:
- The employer may need to copy HR and recruitment data into a separate database
system to interrogate and select the data relevant to the vacancies the
recruitment firm is working on.
- The selected data subsets will need to be saved and exported into files, and then
transferred to the recruitment firm in compressed form.
- Upon receipt the recruitment firm could upload the files to a remote location, eg
the cloud.
- Once in the cloud, the files may be loaded into a programming environment to be
cleaned and used in building the AI system.
- Once ready, the data is likely to be saved into a new file to be used at a later time.
For both the
recruitment firm and employers, this will increase the risk of a data breach, including
unauthorised processing, loss, destruction and damage.
What should organisations do?
All copies of
training data will need to be shared, managed and, when necessary, deleted in
line with security policies. While many recruitment firms will already have
information governance and security policies in place, these may no longer be fit-for-purpose
once AI is adopted, and should be reviewed and, if necessary, updated.
Technical
teams should record and document all movements and storage of personal data from
one location to another. This will help organisations apply the appropriate security
risk controls and monitor their effectiveness. Clear audit trails are also necessary
to satisfy accountability and documentation requirements.
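As a purely illustrative sketch of the kind of record-keeping described above, a technical team could log every copy or transfer of a training-data file to a simple machine-readable audit trail. The function name, log location and fields below are assumptions, not a prescribed format.

```python
# Illustrative sketch: a minimal audit trail for movements of personal data.
# File names, fields and the logging approach are hypothetical.
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path

AUDIT_LOG = Path("data_movement_log.csv")

def record_data_movement(source: str, destination: str, purpose: str) -> None:
    """Append one row for each copy or transfer of a file containing personal data."""
    digest = hashlib.sha256(Path(source).read_bytes()).hexdigest()
    is_new = not AUDIT_LOG.exists()
    with AUDIT_LOG.open("a", newline="") as log:
        writer = csv.writer(log)
        if is_new:
            writer.writerow(["timestamp_utc", "source", "destination", "sha256", "purpose"])
        writer.writerow([datetime.now(timezone.utc).isoformat(), source,
                         destination, digest, purpose])

# Example: recording the export of the selected HR subset before transfer.
# record_data_movement("hr_subset.csv", "sftp://recruitment-firm/incoming/", "model training")
```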
In
addition, any intermediate files containing personal data, eg compressed versions
of files created to transfer data between systems, should be deleted as soon as
they are no longer required.
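As a minimal sketch of one way to do this, the compressed transfer file could be created in a temporary directory so that it is removed automatically once the transfer completes; the transfer function here is a hypothetical placeholder.

```python
# Illustrative sketch: keep the compressed copy of personal data in a temporary
# directory so it cannot outlive the transfer it was created for.
import shutil
import tempfile
from pathlib import Path

def transfer_to_recruitment_firm(archive_path: str) -> None:
    """Placeholder for the real secure transfer mechanism (eg an SFTP upload)."""
    ...

def send_training_subset(source_dir: str) -> None:
    with tempfile.TemporaryDirectory() as tmp:
        archive = shutil.make_archive(str(Path(tmp) / "hr_subset"), "zip", source_dir)
        transfer_to_recruitment_firm(archive)
    # Leaving the 'with' block deletes the temporary archive, so no stray
    # intermediate copy of the personal data is left on disk.
```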
Depending
on the likelihood and severity of the risk to data subjects, organisations may
also need to apply de-identification techniques to training data before it is
extracted from its source and shared internally or externally.
For
example, the employers may need to remove certain features from their HR data,
or apply privacy enhancing technologies (PETs) like differential privacy,
before sharing it with the recruitment firm.
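At the simpler end of that spectrum, the sketch below shows direct identifiers being stripped from an HR extract before it leaves the source system. The column names are hypothetical, and removing columns alone is not a substitute for a proper anonymisation assessment or for more robust techniques such as differential privacy.

```python
# Illustrative sketch: drop direct identifiers from an HR extract before sharing.
# Column names are hypothetical; a real de-identification exercise would also
# consider indirect identifiers and re-identification risk.
import pandas as pd

DIRECT_IDENTIFIERS = ["name", "email", "phone", "home_address", "national_insurance_no"]

def prepare_training_extract(path: str) -> pd.DataFrame:
    hr_data = pd.read_csv(path)
    return hr_data.drop(columns=DIRECT_IDENTIFIERS, errors="ignore")

# prepare_training_extract("hr_decisions_sales_manager.csv").to_csv(
#     "training_extract.csv", index=False)
```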
For
more on these techniques, see our Anonymisation
Code of Practice and future blog posts on data minimisation. New guidance on anonymisation will
also be published soon.
Risk example #2 – Security risks introduced by externally maintained software used to build AI systems
Very
few organisations build AI systems entirely in-house. In most cases, the design,
building, and running of AI systems will be provided, at least in part, by
third parties that the organisation may not always have a contractual
relationship with.
Even
if an organisation hires its own ML engineers, they may still rely
significantly on third-party frameworks and code libraries. In fact, many of
the most popular ML development frameworks are open source.
Using
third-party and open source code is a valid option. Developing all software components
of an AI system from scratch requires a large investment of time and resources
that many organisations cannot afford and, unlike building on open source
tools, would not benefit from the rich ecosystem of contributors and services
built up around existing frameworks.
However,
one important drawback is that these standard ML frameworks often depend on
other pieces of software being already installed on an IT system. To give a
sense of the risks involved, a recent study found the most
popular ML development frameworks include up to 887,000 lines of code and rely
on 137 external dependencies. Therefore, implementing AI will require changes to
an organisation’s software stack (and possibly hardware) that may introduce additional
security risks.
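A practical first step, sketched below, is simply to keep an up-to-date inventory of the packages an ML environment actually pulls in, so that security advisories can be checked against it. This uses only the Python standard library and is an illustration rather than a complete dependency-management process.

```python
# Illustrative sketch: list every installed package and version in the ML
# environment, as a starting point for matching against security advisories.
from importlib import metadata

def dependency_inventory() -> list[tuple[str, str]]:
    return sorted((dist.metadata["Name"], dist.version) for dist in metadata.distributions())

for name, version in dependency_inventory():
    print(f"{name}=={version}")
```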
For
example, let’s say the recruitment firm above hired an ML engineer to build the
automated CV filtering system using a Python-based ML framework. The ML framework
depends on a number of specialist open-source programming libraries, which need
to be downloaded onto the firm’s IT system.
One
of these libraries contains a software function to convert the raw training
data into the format required to train the ML model. It is later discovered that the
function has a security vulnerability. Due to an unsafe default configuration,
an attacker introduced and executed malicious code remotely on the system by
disguising it as training data.
This
is not a far-fetched example: in January 2019, such a vulnerability
was discovered in ‘NumPy’, a popular library for the Python programming
language used by many machine learning developers.
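By way of illustration, this kind of unsafe default can often be turned off explicitly when loading data files. The sketch below assumes the training data arrives as NumPy ‘.npy’ files; it is a general defensive measure rather than a full account of that specific vulnerability.

```python
# Illustrative sketch: load training data defensively. Refusing pickled objects
# means a crafted 'data' file cannot smuggle in executable code via this route.
import numpy as np

def load_training_array(path: str) -> np.ndarray:
    return np.load(path, allow_pickle=False)

# load_training_array("cv_features.npy")
```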
What should organisations do?
Whether
AI systems are built in-house, externally, or a combination of both, they will
need to be assessed for security risks. As well as ensuring the security of any
code developed in-house, organisations need to assess the security of any externally
maintained code and frameworks.
The
ICO has already produced some guidance
on managing security of internal and external code in the related context of
online services. This includes external code security measures, such as subscribing
to security advisories to be notified of vulnerabilities, and internal code
security measures, such as coding standards and source code review. The same or
similar measures will apply to AI applications. However, as we mentioned at the
beginning, the ICO is developing further security guidance, which will include
additional recommendations for the oversight and review of externally
maintained source code, as well as its implications for security and data protection
by design.
In
addition, organisations developing ML systems can further mitigate the security
risks associated with third-party code by separating the ML development
environment from the rest of their IT infrastructure where possible.
Two ways to
achieve this are:
- Use ‘virtual machines’ or ‘containers’ – emulations of a computer system that run
inside, but isolated from, the rest of the IT system. These can be pre-configured
specifically for ML tasks. In our recruitment example, if the ML engineer had used a
virtual machine, then the vulnerability could have been contained.
- Many ML systems are developed using programming languages, like Python, that are
well suited to scientific and machine learning uses but are not necessarily the most
secure. However, it is possible to train an ML model using one programming language
(eg Python) but then, before deployment, convert the model into another language (eg
Java) that makes insecure coding less likely; a sketch of this approach follows this
list. To return to our recruitment example, another way the ML engineer could have
mitigated the risk of a malicious attack on the CV filtering model would have been to
convert the model into a different programming language prior to deployment.
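As an illustration of the second option, a model trained in Python can be exported to a portable format and then served from a separate runtime written in another language. The sketch below assumes a scikit-learn model and the open source skl2onnx converter; these are example tools, not a recommendation.

```python
# Illustrative sketch: train in Python, then export to ONNX so the model can be
# served from another runtime (eg a Java service) without shipping the Python
# training stack into production. skl2onnx is an assumed optional dependency.
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from sklearn.linear_model import LogisticRegression

# A toy stand-in for the trained CV filtering model (two numeric features).
model = LogisticRegression().fit([[0.0, 1.0], [1.0, 0.0]], [0, 1])

onnx_model = convert_sklearn(model, initial_types=[("input", FloatTensorType([None, 2]))])
with open("cv_filter.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
```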
We would like to hear your views on
this topic and welcome any feedback on our current thinking. In particular, we
would appreciate your insights on the following questions:
- How and to what degree are organisations currently inspecting externally maintained software code for potential vulnerabilities?
- Are there any other well-known security risks which AI is likely to exacerbate? If so, which ones and what effect will AI have?
- What should any additional ICO security guidance cover?
Dr Reuben Binns, a researcher working on AI and data protection, joined the ICO on a fixed term fellowship in December 2018. During his two-year term, Reuben will research and investigate a framework for auditing algorithms and conduct further in-depth research activities in AI and machine learning.