Chatbots and Personal Data: Benefits and Risks

6 Apr

Chatbots – the ultimate disruptor?

On social media, the Internet, the airwaves and the pages of the good old-fashioned ‘dead tree’ press, the chat has been all about chatbots and artificial intelligence (AI).  OK, let’s start by defining a ‘chatbot’.  ChatGPT¹, currently the best-known bot, describes itself as “a language model developed by OpenAI [a software startup backed by Microsoft], and it can answer questions and generate text based on the information it has been trained on.”  In other words, it is an AI program that writes answers to any question under the sun you care to ask it, as though you were corresponding with a human being (and an exceptionally well-informed human being at that).

You may not have heard about ChatGPT, but it is highly likely you have already read something created using it (not this blog, I hasten to add – I’m a real person, honest!).

But to characterise chatbots (Google also has its own version, called Bard) as merely the next generation of conversational search engines would be to underestimate their potential to revolutionise how we humans work across a wide variety of sectors.  Sectors already looking nervously over their shoulder include the law and other professional advisory services, education, journalism, and any of the creative industries that depend on generating text (e.g., advertising).  And even, dare we say, consultancy work!

The game-changing USP of chatbots is that the answers they produce (typed on your screen in real time within a few seconds of submitting your question) are set out in grammatical sentences (even if they do use US spelling!).  The responses are supposedly authoritative summaries culled from the vast datasets, including material from the Internet, on which the chatbot has been ‘trained’.  A response can be of whatever length the user specifies, from a job application letter to a 5,000-word essay, and is (again purportedly) accurate to a level that most mere humans cannot guarantee.  And the more exact your question (and chatbots love a precise question, it must be said), the more accurate the answer will be.

These features obviously lend themselves to ChatGPT answers being incorporated more or less wholesale into reports, papers, analyses and, ahem, school homework and university assignments.  All this sounds like a copy-and-paste plagiarist’s charter, of course, although the terms of use for ChatGPT from OpenAI (the developer of the software) state that ‘Users may not …represent that output from the Services was human-generated when it is not’ (good luck enforcing that, many would say!).  However, this blog will focus not on the obvious potential copyright ramifications, nor on the potential for bias inherent in all machine learning algorithms, but rather on the privacy implications of using this new AI ‘super tool’.

1) In the name ChatGPT, the GPT part stands for ‘Generative Pre-trained Transformer’, by the way, and if that doesn’t sound like something out of Philip K. Dick, I’m Isaac Asimov.

Data protection benefits

As in other areas where they are used, chatbots offer a potentially huge step forward in helping laypeople and organisations understand the impact of the substantial expansion of data protection (DP) law since the GDPR came into effect.  Chatbots could certainly help users better understand how to apply and comply with DP law.  It can be argued that a large part (perhaps 80 or even 90%) of the value solicitors impart to clients lies in using their knowledge and experience to help clients formulate the correct legal question (i.e., the precise request for advice which exactly suits the facts of their particular circumstances).  Once the right question is identified, finding the answer is very often the ‘easy’ part!  So, when applied to data protection compliance questions, and if used correctly, chatbots such as ChatGPT could be of significant value in summarising, logically and concisely, the questions which organisations can then put to their human advisors (such as consultants).  In the process, this could save client organisations the time and money they would otherwise spend clarifying their possible requirements, leaving the consultancy free to respond to them and advise accordingly.  Clients often report that the hardest part of their DP compliance journey is knowing where to start: the right questions to ask to understand what they need, be it advice, documentation or other materials.  Anything which makes it easier for clients to articulate their DP requirements to their privacy professionals, and thus improve their organisations’ compliance, is a good thing, and if it comes at no extra cost (currently!) to the client, so much the better.

Data protection risks

Chatbots are still very much in their infancy – ChatGPT was only launched in November 2022.  But some potential privacy law pitfalls in using ChatGPT are already emerging.  These currently fall into three broad types of legal risk, all of which revolve around the terms of use which the developer, OpenAI, enters into with the user.  The first data risk arises from the way the ChatGPT algorithm is trained on very large datasets, including text drawn from the Internet.  These datasets could very well include personal data (data capable of identifying a living individual or individuals, like a name or a face), including sensitive or ‘special category’ information.  However, the GDPR (including the UK GDPR) requires every organisation that processes personal data to have a lawful basis for doing so.  Because ChatGPT cannot check whether it has the right to process the personal data in its training data, OpenAI’s terms of use place the legal onus on the user to ensure that the user has the right (including the lawful basis) to process any personal data so included.  However, this ignores the likelihood that OpenAI, in training ChatGPT, is itself acting as a controller of any personal data in the training datasets (i.e., OpenAI decides how that data is used for machine learning by its chatbot and therefore needs a lawful basis for such processing, independent of the users’ lawful basis or bases).  This could prove costly for OpenAI: last year the UK’s Information Commissioner’s Office (ICO), along with other regulators, heavily fined another AI company, Clearview AI, for harvesting large amounts of publicly available personal data from the Internet without an adequate lawful basis.  Furthermore, the ICO has said that personal data processing by AI organisations will remain a primary focus of its enforcement activity for the foreseeable future.

The second data risk occurs where users input personal data into ChatGPT (in their questions, for example).  ChatGPT’s terms of use state that the user will be considered the controller of that data and, therefore, responsible for discharging all of a controller’s obligations in relation to the processing and the data subjects involved, i.e., the individuals whose identifiable information is being processed by the user.  For example, the controller will be responsible for informing data subjects about the ChatGPT processing by way of privacy notices, adhering to all of the GDPR’s data processing principles, being able to demonstrate accountability, practising data protection by design and by default, and so on.  The OpenAI terms go on to provide that, unless they specifically opt out, users “agree and instruct” OpenAI to use any input data and the output ChatGPT provides to “develop and improve” OpenAI’s services.  This again places the responsibility on the layperson user (as controller of the data) to ensure that the transfer to OpenAI of any personal data contained in such inputs and outputs, and its use by OpenAI for its own business purposes, is permitted under DP law (unless, as stated, the user opts out of this use by OpenAI).  One way to avoid part of this risk would be for users to refrain from including any personal data at all in the questions they put to the chatbot, for example by anonymising the input data.  On this point, it is interesting to note that Microsoft itself was recently reported to have advised its employees not to submit questions containing sensitive personal data to ChatGPT.
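To make the anonymising suggestion a little more concrete, here is a short Python sketch that strips two obvious kinds of identifier (e-mail addresses and UK-style phone numbers) from a question before it is pasted into a chatbot.  The function name `redact_prompt` and both patterns are our own illustrative assumptions, not part of any OpenAI tool.  Note that this is not anonymisation in the GDPR sense: names, addresses, ID numbers and contextual details would all pass straight through, and anything that can still be linked back to an individual remains personal data.

```python
import re

# Illustrative patterns only (assumptions): they catch simple e-mail addresses
# and UK-style phone numbers, but NOT names, postal addresses, ID numbers, etc.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
UK_PHONE = re.compile(r"(?:\+44\s?|0)\d{2,4}[\s-]?\d{3,4}[\s-]?\d{3,4}")


def redact_prompt(text: str) -> str:
    """Hypothetical helper: replace e-mail addresses and UK-style phone
    numbers with placeholders before the text is sent to a chatbot."""
    text = EMAIL.sub("[EMAIL]", text)
    text = UK_PHONE.sub("[PHONE]", text)
    return text


if __name__ == "__main__":
    prompt = "Can we keep jane.doe@example.com and 07700 900123 on file for 6 years?"
    print(redact_prompt(prompt))
    # prints: Can we keep [EMAIL] and [PHONE] on file for 6 years?
```

A pattern-based filter like this is, at best, a first line of defence; the safer habit remains not typing personal data into the chatbot in the first place.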

Thirdly, data protection law also requires that, where an organisation uses a third-party supplier to process personal data on its behalf, the parties must enter into what is called a ‘data processing agreement’ (DPA), sometimes known as a ‘data processing addendum’.  Such an agreement must contain certain mandatory minimum provisions.  ChatGPT’s terms do require users who (as data controllers) use the chatbot (as a processor) to process personal data on their behalf to enter into OpenAI’s standard data processing addendum.  However, this does not apply automatically: users have to opt in to the DPA.  To do so, users first need to be aware that they are in fact entering into a controller-processor relationship that requires them to execute a DPA.  Many users who are controllers may not be aware of this and so, by using ChatGPT without a DPA in place, may well be entering into controller-processor relationships unlawfully.

Heavily qualified liability

The terms of use for ChatGPT purport to exclude as much of OpenAI’s liability as it is probably reasonably possible to do.  They oblige users to indemnify OpenAI against any claims for damages arising from the users’ use of the tool (even non-negligent use), and state that ChatGPT is provided "as is" (that is, without any warranty that its output will be accurate – NB! – or fit for the user’s particular purpose).  These disclaimers are wider than those found in most commercial contracts, and more extensive than most businesses would willingly accept in their normal course of trading.  It should of course be noted that, at the time of writing, a version of ChatGPT is available free of charge, and Microsoft has built OpenAI’s GPT technology into its Bing search engine at no cost to users.  Hefty exclusions of liability should therefore come as no surprise.  Even the restricted liability which OpenAI does accept is limited to the greater of $100 or the amount paid for the service in the previous twelve months, which is, of course, a very low liability cap.

So, if a commercial organisation intends to embed ChatGPT in its business model, it must first be aware of the major curbs on its rights of redress against OpenAI if any of the above data protection risks materialise.  Breach of privacy claims tend to come in at a lot more than $100...!

Proceed with care?

In late March 2023, it was reported that the Italian privacy regulator, the Garante, had issued an order against the creator of ChatGPT, OpenAI, requiring the company to stop using the personal data of Italian people to train the tool or in the answers it produces.  The grounds were that OpenAI lacked a sufficient lawful basis to collect and process such data in these ways, and that the service had no effective age verification.  Whether OpenAI can even comply with this ban in the long term remains to be seen.  How, for example, would OpenAI be able even to identify, let alone remove, the data of Italian people from its immense training materials (which, remember, span almost the entire Internet)?  OpenAI responded by saying that it had temporarily disabled ChatGPT for users in Italy, at the request of the data protection authority, while the Garante’s investigation proceeds.

The UK Government and the ICO apparently have no immediate plans to follow Italy’s lead.  This has attracted criticism from privacy experts, who were quoted as describing the UK as ‘asleep at the wheel’ when it comes to regulating the serious potential harms of AI large language models such as ChatGPT.  They contrasted the UK’s position unfavourably with the more proactive stances taken by lawmakers and regulators in the US and the EU (the latter having its draft AI Act, a world first, in the pipeline).  So data protection challenges, along with intellectual property issues and ethical risks (such as machine learning bias, the spreading of deliberate misinformation for fraud and propaganda purposes, and the ability to generate incredibly convincing fake photos that are practically indistinguishable from the real thing), appear to be at the forefront of the threats which this revolutionary technology brings.

As stated, this blog has only considered, at a high level, a single ‘slice’ of the various possible legal ramifications of using ChatGPT and similar AI tools, namely those concerned with data protection.  To repeat, the potential exposure to claims for intellectual property infringement flowing from use of ChatGPT is at least as significant (and possibly more so).  A prudent approach for many organisations in the short term might therefore be to ‘wait and see’ until the legal position on the copyright and DP exposure resulting from use becomes clearer; it is still very early days in the use of these tools.  Until we get that clarification, many organisations will be advising employees to refrain from business-related use of chatbots, and especially to avoid incorporating their outputs into commissioned work paid for by clients.  Indeed, it is understood that two large US organisations, the bank JPMorgan and the mobile phone network Verizon, recently took this stance, reportedly issuing outright bans on their staff using ChatGPT at work.

People, be they teachers, university tutors, law firm clients, consumers of news or customers of consultancies, ultimately want to see the creativity of the individual humans they have instructed: bespoke, original material, expressed through the still irreplaceable filter of human experience, and written with feeling and a sense of identification with other human beings (and maybe a little appropriate humour too – have you seen a joke written by ChatGPT?).  What they don’t want to see is the generic output of a machine, however factually accurate and comprehensive.  They could easily ask a chatbot for that themselves².

2) P.S. Do me one favour: when you have finished reading this, please resist the temptation to ask ChatGPT “What are the privacy pros and cons of ChatGPT?” and compare its answer with this blog.  It’s the beginning of the end otherwise!

Gain a sound grounding and practical interpretation of the GDPR and the DPA 2018!

By attending URM’s online BCS Foundation Certificate in Data Protection course, you will gain valuable insights into the key aspects of current DP legislation including rights of data subjects and data controller obligations.