Artilium Blog

Monday, May 25, 2009

Do I know who you are?

Previous blogs have discussed the need for explicit subscriber opt-in when personal data obtained within the mobile network is used. However, grouped subscriber data can be useful for gathering intelligence about subscriber behavior in general. This helps in understanding subscribers’ needs and enables services to be optimized for customers as a whole. The data can be grouped (or bulked) in a number of ways, most often by subscriber demographics, subscriber location and time.

You might assume that using this type of bulked data does not require the subscriber’s explicit permission, since it no longer appears to be “personal data”. However, the position is not that clear cut: it depends very much on whether the bulk data can still be considered anonymous.

Under the definition of “personal data” in the Data Protection Act 1998 (i), data that is obviously about a particular individual is definitely “personal data”. However, data may appear to be anonymous and not obviously about an individual, yet it may still be possible to identify the individual with reasonable probability even after the data has been de-personalised.

Data can be de-personalised in two ways:

  • Individual records are anonymised by removing personal details (e.g. name, address, phone, email).
  • Records are anonymised by grouping the data so that the personal details become irrelevant and are not stored. The source data is destroyed once it has been grouped.

In either case, the de-personalised data may not automatically be considered anonymous. For example, suppose subscriber John has his location history stored under the pseudonym Peter, with no direct link between the two names. If subscriber home addresses are known within the system, and Peter’s location history shows that he goes to a particular location each night that matches only one subscriber home address in the database, then Peter can be connected to the real name John with reasonable probability, and so the location history data may not be considered anonymous.
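The John/Peter example can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function names, the choice of midnight to 6 am as "night", and the sample postcodes and subscribers (other than John and Peter) are hypothetical and not taken from any real system.

```python
from collections import Counter

def night_location(history, night_hours=range(0, 6)):
    """Return the location seen most often during night hours, or None.

    history is a list of (hour_of_day, location) pairs stored under a pseudonym.
    The 00:00-05:59 night window is an assumption made for this sketch.
    """
    counts = Counter(loc for hour, loc in history if hour in night_hours)
    return counts.most_common(1)[0][0] if counts else None

def reidentify(history, home_addresses):
    """Return the one subscriber whose home matches the night location, else None."""
    loc = night_location(history)
    matches = [name for name, home in home_addresses.items() if home == loc]
    # Only a unique match links the pseudonym to a real subscriber.
    return matches[0] if len(matches) == 1 else None

# Hypothetical data: Peter's pseudonymous history and the known home addresses.
peter_history = [(1, "KY11 8GR"), (2, "KY11 8GR"), (14, "EH1 1AA"), (3, "KY11 8GR")]
home_addresses = {"John": "KY11 8GR", "Mary": "EH1 1AA", "Anne": "G1 2BB"}

reidentified = reidentify(peter_history, home_addresses)
```

Here the pseudonym Peter is linked back to John because only one home address matches the night-time location. If two or more subscribers shared that address, the function would return None and the link could not be made with the same confidence.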

We favour the second method: de-personalising data by grouping it so that it forms part of a sample. There are two reasons for this: first, grouped data is very useful for looking at statistics, demographics and trends, and second, it is the more effective way to protect the privacy of the subscribers.

The data can be grouped according to the categories of location, time and subscriber demographics (age and gender). To ensure that there are enough samples within each group, we can widen the bands used for each category until the sample sizes are large enough that the data cannot be related back to a specific subscriber. For example, we might try to store the location postcode of subscribers aged 30-35 in 5-minute intervals. If we find that only one or two subscribers fit some of the groupings, we might move to 15-minute intervals, widen the age range to 25-40, or widen the postcode resolution to just the outer code (e.g. KY11 rather than KY11 8GR).
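The widening described above could be sketched as follows. This is a minimal illustration, not a description of any production system: the record layout (postcode, age, minute of day), the order of the widening steps, and the names LEVELS, group_counts and first_safe_level are all assumptions, and each record is assumed to belong to a different subscriber.

```python
from collections import Counter

# Hypothetical widening steps, narrowest first, following the example in the text.
LEVELS = [
    {"interval": 5, "age_band": (30, 35), "outer_only": False},
    {"interval": 15, "age_band": (30, 35), "outer_only": False},
    {"interval": 15, "age_band": (25, 40), "outer_only": False},
    {"interval": 15, "age_band": (25, 40), "outer_only": True},
]

def group_counts(records, interval, age_band, outer_only):
    """Count records per (postcode, interval start) group within the age band."""
    lo, hi = age_band
    counts = Counter()
    for postcode, age, minute in records:
        if not lo <= age <= hi:
            continue  # outside the demographic being studied
        if outer_only:
            postcode = postcode.split()[0]  # "KY11 8GR" -> "KY11"
        counts[(postcode, minute // interval * interval)] += 1
    return counts

def first_safe_level(records, k):
    """Return the first level where every group holds at least k records."""
    for level in LEVELS:
        counts = group_counts(records, **level)
        if counts and min(counts.values()) >= k:
            return level, counts
    return None, Counter()

# Hypothetical records: (postcode, age, minute of day).
records = [
    ("KY11 8GR", 31, 600),
    ("KY11 8GR", 33, 607),
    ("KY11 8AB", 27, 610),
    ("KY11 8AB", 38, 612),
]
level, counts = first_safe_level(records, k=3)
```

With these four sample records and a deliberately small threshold of 3, the first three levels all leave groups of one or two records, so only the last level (15-minute intervals, ages 25-40, outer postcode only) passes. If no level passes, the sketch returns None, which would mean the data should not be stored at any of the listed resolutions.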

As a rule of thumb, we consider information to be anonymised if there are at least 10 subscribers to whom it could refer. Even if someone did attempt to use additional data to identify a subscriber, at best they would still have only a 10% chance of correctly identifying them. This is often called k-anonymity (ii), where k is the minimum number of subscribers in any group (here k = 10) and reflects the level of privacy protection.
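The rule of thumb reduces to a simple check on group sizes. The sketch below assumes the group sizes have already been counted and that every subscriber in a group is equally likely to be the one sought; the function names are hypothetical.

```python
def is_k_anonymous(group_sizes, k=10):
    """True if every group contains at least k subscribers."""
    return min(group_sizes) >= k

def max_reidentification_chance(group_sizes):
    """Best-case chance of picking the right subscriber in the smallest group.

    Assumes every subscriber in a group is an equally likely candidate.
    """
    return 1 / min(group_sizes)

# Hypothetical group sizes after grouping by location, time and demographics.
sizes = [10, 25, 40]
safe = is_k_anonymous(sizes)
chance = max_reidentification_chance(sizes)
```

For group sizes of 10, 25 and 40 the check passes and the worst-case chance is 1 in 10, matching the 10% figure above. A single group of 9 would fail the check even if every other group were large.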

So, in answer to the original question, “Do I know who you are?”: after the data has been grouped, the answer should be “No, I do not know who you are and I have little chance of figuring it out.”

(i) http://www.ico.gov.uk/upload/documents/library/data_protection/detailed_specialist_guides/
(ii) http://privacy.cs.cmu.edu/people/sweeney/kanonymity.pdf
L. Sweeney, “k-anonymity: a model for protecting privacy,” Int. J. Uncertain. Fuzziness Knowl.-Based Syst., vol. 10, no. 5, pp. 557–570, 2002.

Posted on 05/25 at 06:49 AM
