Leaked clarifications about Chatcontrol

Context

A week ago (2022-06-29), netzpolitik.org published a leaked document showing answers from the european commission to the member states regarding the chatcontrol regulation. The leak is mostly in german, very long and was/is mostly ignored by mainstream media (even in germany). In this post, I've translated the parts I personally consider the most relevant and added some thoughts/criticism/relevant context.

Format notes

Quotes from the article (translated or not) are formatted as blockquotes; any emphasis or links have been added by me. The questions are in english and taken straight from the source, answers from the commission have been translated by me (which I should have noted everywhere).

I'll go over the text in order. To find the original german text in the leak (or other linked documents), I note the search terms using Ctrl-F "the search term in quotes".

I haven't translated the whole leak, but instead stuck to the points I think are important. Despite this, the post is longer than my original chatcontrol criticism. You may want to skip around using the headings or read the summary at the end.

Disclaimer

Obvious disclaimer: I'm not a lawyer, nor a professional translator, so I may get things wrong. Send me a message if you want mistakes to be fixed.

The leak

The leak is a diplomatic cable, mostly in german and addressed to multiple german government agencies. It contains notes from a session of the LEWP (info here). During the session, the written questions about chatcontrol asked by different member states were answered.

In the following I'll translate the points made by the european commission, including the question asked by germany wherever it is noted. Titles are mine; where I considered it necessary, I added some context. After most points I added my own notes/thoughts, which you can safely skip if you are only interested in a rough translation. (Though in that case, you may prefer the full translation here).

Does the regulation violate a mass surveillance ban?

Context

Article 15 of the eCommerce directive prohibits mass surveillance:

Member States shall not impose a general obligation on providers [...] to monitor the information which they transmit or store, nor a general obligation actively to seek facts or circumstances indicating illegal activity.

Despite this, due to the planned regulation, service providers will be forced to scan most communication. How does the commission resolve this problem?

The commission claims surveillance is specific

(Translated from german, Ctrl-F "KOM legte dar")

The commission explained that the regulation draft is in accordance with Article 15 of the eCommerce directive (TN: Ctrl-F "Article 15", 2nd match) in conjunction with recital 47 (TN: Ctrl-F "(47)"). Targeted detection of definitively illegal material based on national orders is not covered by this ban. Specifically, the regulation is proportionate because detection orders are only issued in specific cases [...]. Finally, the commission highlighted that, independent of context, CSAM is illegal in every case.

It's still mass surveillance

It may be true that detection orders are issued in specific cases. However, those cases can occur when any member state has decided that a service provider has not done enough to detect Child Sexual Abuse Material (CSAM). In any case, there is an obligation for service providers to conduct a risk assessment for their services, and then conduct surveillance based on that risk assessment. In most cases, a detection order will therefore be unnecessary.

Therefore, to avoid detection orders, in the process of fulfilling the obligations of the chatcontrol regulation, service providers will be forced to scan interactions between users. As it is impossible to know in advance which users will exchange CSAM or try to groom children, this surveillance must apply to everyone - mass surveillance.

The specific reason is said to be "targeted detection of definitively illegal material", however I personally do not think this is specific enough. Would we accept that every physical letter sent by post is opened and searched for hints of child abuse? I don't think so - such searches would obviously be considered mass surveillance. Why would it not be the same thing if the messages are exchanged online?

Illegal in every case?

The statement that "CSAM is illegal in every case" is misleading at best: The technologies used for detection are not good enough and will cause many false positives, which are most likely NOT illegal. Even if "real" CSAM is detected, what is/isn't considered CSAM varies between member states due to differing age of consent laws (mentioned later). That's not even taking into consideration the varying laws about cartoon child pornography.

In short:

what is illegal varies, and is not illegal in every case
detection must logically affect all users
there will be false positives with completely legal images/messages

To claim that this legally mandated monitoring is specific seems very far-fetched.

Why not prefer voluntary measures?

The commission thinks companies aren't doing enough

The commission claims four things, presented as a list for ease of reading (Ctrl-F "Zu DEU Frage 20" in the leak):

The measures taken by companies strongly vary (in 2020 more than 1600 companies were mandatory reporters to NCMEC, only 10% reported anything, 95% of reports were by meta)
Voluntary measures may not last due to changes in company policy
Voluntary measures still impact human rights and shouldn't be left to companies
Even companies actively taking measures do not support victims enough when seeking removal of CSAM, and victims do not have a legal basis to ask for search of CSAM.

My thoughts

Sure, how much the companies are doing varies. However Meta is basically the messaging service provider, of course they submit the most reports. Not every company has Meta's/Facebook's ability to process such a huge amount of data, and few companies have access to tools to reliably detect CSAM. Additionally, I would be curious to see the false positive rate of different providers - do most of them filter out false positives while facebook just reports everything it detects?

It's also entirely unsurprising that few companies scan messages: From December 2020-July 2021 mass scanning of messages (chatcontrol) was explicitly forbidden in the EU (Ctrl-F "EECC")! Only at the insistence of companies already scanning messages, an interim regulation was established legalizing this scanning.

It seems quite ridiculous to me for the commission to claim that the scanning impacts human rights, then have their reaction be to make scanning mandatory and force every company to violate these human rights, at the same time claiming that no rights will be impacted.

I agree that the decisions over human rights must not be made by companies, but the proposed regulation will not solve this problem! Once implemented, company policy will still dictate:

what they want to scan
the technologies used for scanning
how to react in case of CSAM detection
when to forward information about detections
how to deal with false positives
if the company scans/cooperates at all (unless forced by detection orders)

This means that companies effectively have a rather high degree of control over the whole scanning process, which allows them to scan more than legally required just as well as to scan too little if they don't want to. The decision on which measures to take effectively remains with the company.

Put yourself in the shoes of a small commercial service provider: You are forced to make a decision on whether or not to scan all communication you process. The detection will require you to buy additional servers to process the data and possibly paid employees to review the many false positives. Worse, your customers do not want their messages to be scanned, and some may even be willing to move to a non-commercial service provider or one outside the EU. This regulation may cost you customers! Do you:

continuously spend money to establish a monitoring system and support victims as much as you can OR
deny any responsibility and avoid scanning everything, telling your customers you value their privacy?

I think it's rather obvious which possibility will be chosen. Forcing providers to cooperate will not lead to providers supporting CSAM victims more. It will only lead to companies doing the bare, legally required, minimum. Given the claim that not even companies voluntarily helping are supportive enough, I doubt companies which are forced to implement measures they don't want to implement will be more cooperative when working with victims.

Meanwhile, while some companies have been voluntarily cooperating, police do not report CSAM they find to service providers and sometimes even distribute CSAM themselves. To complain that companies voluntarily performing CSAM detection don't support victims enough while police are not even doing the bare minimum is laughable.

To summarize my criticism:

companies that want to cooperate already do so and likely aren't willing to do more
companies that don't want to cooperate will likely fight using company policies and lawyers to avoid having to support victims more
abusers will ignore the laws
police do not combat CSAM, or even actively distribute it
none of these issues will be fixed by the regulation

How will the regulation deal with encryption?

The commission was asked by the german government:

Could the COM please describe in detail on technology that does not break end-to-endencryption, protect the terminal equipment and can still detect CSA-material? Are there any technical or legal boundaries (existing or future) for using technologies to detect online child sexual abuse?

The commission's answer

(Translated from german, Ctrl-F "KOM-Entwurf richte")

The regulation is not targeted against encryption. The commission does not attempt to break encryption as it recognizes its importance. Encryption is not only important to protect private communication, but also helps & protects criminals, which is why the commission did not want to exclude encryption from the draft regulation. The commission referenced Annex 8 of the Impact Assessment, which presented different technologies. The commission is in contact with companies which are prepared to implement various detection technologies or are already using them. The commission highlighted that the least intrusive technology is to be used. If there is no technology available for detection, no detection may be required.

The regulation still attacks encryption

Even though the commission emphasizes how important it is to protect private communication, in the very next sentence it continues to attack encryption by claiming that encryption protects criminals.

The mentioned Annex 8 does not actually concern encryption, but evaluates different approaches to detect CSAM. Scanning on a device before sending or when receiving messages (so-called client side scanning) indeed does not require breaking or banning encryption. However, the result is the same as scanning encrypted messages, but worse:

Since the code performing detection is running on the sending/receiving message client, it can be modified by the provider to report whatever the service provider wants, effectively nullifying the benefit of encryption. This scanning solution is also easy to disable, as users may modify the code on their devices to block/disable sending reports. There are many other technical issues. If you want to know more, I wrote at length about many of these here.
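To illustrate the order of operations, here is a minimal Python sketch of client side scanning. Everything in it is my own assumption for illustration: the names client_send, provider_matcher and toy_encrypt are hypothetical, the matcher is a trivial substring check, and toy_encrypt is only a placeholder, not real encryption. It is not how any actual messenger or the tools in Annex 8 are implemented.

```python
from typing import Callable, List


def toy_encrypt(data: bytes) -> bytes:
    """Placeholder only: reverses the bytes. NOT real encryption."""
    return data[::-1]


def provider_matcher(message: bytes) -> bool:
    """Hypothetical detection logic shipped (and changeable) by the provider."""
    return b"flagged" in message


def client_send(
    plaintext: bytes,
    matcher: Callable[[bytes], bool],
    report: Callable[[bytes], None],
) -> bytes:
    """Scan the plaintext on the device, then encrypt it for transport."""
    if matcher(plaintext):
        # Reporting happens before encryption, so a flagged plaintext
        # leaves the device no matter how strong the encryption is.
        report(plaintext)
    return toy_encrypt(plaintext)


reports: List[bytes] = []
client_send(b"hello", provider_matcher, reports.append)
client_send(b"flagged content", provider_matcher, reports.append)
print(reports)  # [b'flagged content']

# A user who modifies their client can replace the reporting hook with a no-op.
client_send(b"flagged content", provider_matcher, lambda _message: None)
print(len(reports))  # 1
```

The sketch shows both problems at once: whoever controls the matcher decides what gets reported, and whoever controls the device can silence the report.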

Even if client side scanning is the least intrusive technology, it remains very intrusive. It can also be trivially bypassed by e.g. switching to voice chat, as there (hopefully) won't be a detection obligation due to the insufficiently advanced technologies.

No exceptions for small companies

From the same answer as the previous section:

The commission was also asked if there were going to be exceptions for small and medium-sized companies (TN: german KMU). The commission confirmed there would be no exceptions, but that the draft regulation contains measures to support these companies. Support would be in the form of providing detection technologies, help for the evaluation of what needs to be detected as well as training for employees in coordination with Europol. Most notably the EU Centre would review reports.

No comment by me.

Clarifications: Who is affected?

The commission clarified which service providers are affected, the following points are translated/summarized from multiple questions:

Search engines are currently not affected by the draft. However, they play a role in the distribution of CSAM and the commission is open for a discussion to include them in the draft
Livestreaming is included in the definition of CSAM
Classified communication and company/government communication do not fall under the scope of the regulation
[file/image] hosting service providers fall under the scope of the regulation, it is a question of proportionality to determine who can be addressed for detection orders. As an example, cloud services which only provide infrastructure are not suitable addressees for detection orders

My thoughts

Expanding the scope to search engines would be a terrible step, as it would make it very difficult, if not impossible, to legally develop an independent web search engine. Together with automated removal, it's conceivable that this leads to a future where whole websites cannot be found because the few remaining search engines (such as google) mistakenly identified an innocent image as CSAM and removed it from their results.

Company/government communication (especially unofficial/unclassified) may take place via third party providers (e.g. WhatsApp, e-mail). Even if the regulation does not apply to this company/government communication directly, surveillance may still take place if companies/governments rely on such third parties for communication. It's not easily possible to split between private and company/government communication.

National laws apply

Translated, Ctrl-F "Zu Artikel 2 j)"

To define "child users", the age for "sexual consent", which varies by member state, is relevant. Providers are unable to determine "consent in peers". In case of a report national laws must be taken into account.

My thoughts

These two sentences create a huge mess:

How will providers determine the age of people in pictures/videos to detect if they are over/under the age of consent? Will there be mandatory age verification?
Will companies enact different policies for each country?
From what country are the laws to be followed? The one where the company is located? The one of an abuser sending a message? The one where the "child user" is located? The most restrictive? All of them?

What surveillance technologies are appropriate?

What kind of (technological) measures does COM consider necessary for providers of hosting services and providers of interpersonal communication in the course of risk assessment? [...] How can these providers fulfill the obligation if their service is end-to-end encrypted?

The commission's answer

(Again translated & summarized as list:)

The technologies to be used depend on the specific service and risk
End-To-End Encryption must not prevent the service provider from performing an analysis
Service providers usually have good knowledge about the risks of their services, e.g. due to content reported by users
The EU Centre will release (non exhaustive) guidelines

My thoughts

The second point again shows that even encrypted messages must be scanned. This basically means outlawing encryption unless client side scanning is used!

Publishing guidelines seems sensible, however it is not enough: The commission does not guarantee anything if these guidelines are followed, so it'd still be possible for detection orders to be issued even to a company which does its best to follow the guidelines. Those guidelines could also change at any time without any democratic control.

How precise are the technologies?

The german government asked about the risks of false positives (Ctrl-F "Zu DEU Frage 7"):

How mature are state-of-the-art technologies to avoid false positive hits? What proportion of false positive hits can be expected when technologies are used to detect grooming? In order to reduce false positive hits, does COM deem it necessary to stipulate that hits are only disclosed if the method meets certain parameters (e.g., a hit probability of 99.9% that the content in question is appropriate)?

The commission's answer

The commission did not directly answer the question, instead stating (translated from german):

There are technologies such as PhotoDNA that have been used for years. The accuracy [sic!] of grooming detection is about 90%, meaning about 9 out of 10 pieces of reported content are actually grooming. False positives are to be filtered out by the EU centre. The commission does not prescribe numeric values [for accuracy/precision] to remain open for new technologies.

90% precision is not enough

If you read my previous post about chatcontrol, you'll remember I described the tradeoffs between precision and recall. The grooming detection falls on the side of high precision, but low recall, which means that many criminals will get away undetected (of course the commission doesn't state the recall value). But maybe we'll be happy with just these few, mostly correct reports, right?
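For readers who haven't seen the earlier post, the two terms can be made concrete with a minimal Python sketch. The counts below are invented by me purely for illustration; the commission gave no recall value, so the recall printed here is not a claim about any real tool.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of reported items that really are what the tool claims."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of all real cases that the tool actually reports."""
    return true_positives / (true_positives + false_negatives)


# Invented example: 100 reports, 90 of them correct,
# while 910 real cases are never reported at all.
tp, fp, fn = 90, 10, 910

print(f"precision: {precision(tp, fp):.0%}")  # precision: 90%
print(f"recall: {recall(tp, fn):.0%}")        # recall: 9%
```

A tool can therefore be "90% precise" and still miss the vast majority of cases, which is why quoting precision alone says little.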

Although 90% precision (which the commission mistakenly calls accuracy) seems very high, in practice it's still too low to be reliable. A short estimate of how many people will be investigated while innocent: Based on the number of user reports by facebook, the commission estimated there will be 8_812_811 reports per year. At 90% precision, this would mean about 881_281 false positive reports.

Let's speculate that only 1% (a number I just made up) of those false reports are actually investigated. This would still mean over 8_800 innocent people investigated PER YEAR. Even if not convicted, just being accused of child abuse likely means they will be ostracized by most of society, including friends, family and employers.

I should also point out that in 2020, europol had a backlog of about 40 million images to analyze. This means that, unless the backlog is more than five years old, there will be way more than 8 million reports per year, so the over 800_000 false reports per year with over 8_800 wrongfully investigated people are likely to be an underestimate!

The 90% precision mentioned for grooming detection is likely rounded up (!) from the 88% precision (mistakenly referred to as accuracy) claimed (!) by microsoft in the impact report of the commission. Rounding up by "just" 2 percentage points is actually quite generous: At 88% precision there would be 20% more false reports than at 90%, meaning an additional 176_000 false reports per year. Nothing you should just round away.
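The arithmetic behind these numbers can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the report count is the commission's estimate quoted above, the 1% investigation share is the number I made up, and the function name false_reports is my own.

```python
REPORTS_PER_YEAR = 8_812_811  # the commission's estimate quoted above


def false_reports(reports: int, precision_percent: int) -> int:
    """Number of false positive reports, using integer arithmetic."""
    return reports * (100 - precision_percent) // 100


at_90 = false_reports(REPORTS_PER_YEAR, 90)
at_88 = false_reports(REPORTS_PER_YEAR, 88)
investigated = at_90 // 100  # the made-up 1% share of false reports

print(f"{at_90:_}")          # 881_281
print(f"{investigated:_}")   # 8_812
print(f"{at_88 - at_90:_}")  # 176_256
```

The last line is the cost of the "generous" rounding: the difference between 88% and 90% precision alone is more than 176_000 false reports per year.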

Even this precision is unlikely to be reached in practice, as this grooming detection technology by microsoft refers to a tool which as of 2020 did not support any language but english and was neither designed for law enforcement nor for end-to-end encrypted communication. Once the tool is adapted to real-world usage, expect a significantly lower precision.

Finally, even though a 90% precision is not good enough in practice, it is an unrealistically high goal due to the various challenges for accurate detection.

How is abuse of scanning prevented?

How do you want to ensure that providers solely use the technology – especially the one offered by the EU Centre – for executing the detection order? How would we handle an error? How should eventual cases of misuse be detected?

The commission's answer

There would be penalties in case of abuse of the technologies. The compliance will be checked by national agencies. The technologies are only suitable for detection of CSAM.

My thoughts

Since the regulation affects every company, no matter the size, thousands (millions?) of companies will be affected. It is obviously impossible for national agencies to continuously check for abuse. Given that abuse could also occur in daily usage (e.g. by employees reviewing reports for false positives), it may not even be possible for agencies to detect abuse.

The statement that the technologies are suitable only for detection of CSAM is laughable. Once implemented, it is trivial to add another rule to a working detection technology, such as a text pattern for grooming detection or a specific image hash.
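How little it takes to extend a working detection system can be shown with a minimal Python sketch. This is a deliberately simplified model and all names in it (hash_rule, pattern_rule, is_flagged) are hypothetical: real tools such as PhotoDNA use perceptual hashes, whereas I substitute exact SHA-256 matching here because it is easy to reproduce. The point is only the structure, a list of rules that anyone with access can append to.

```python
import hashlib
import re
from typing import Callable, List

Rule = Callable[[bytes], bool]


def hash_rule(known_digest: str) -> Rule:
    """Match content whose SHA-256 digest equals a known digest."""
    return lambda content: hashlib.sha256(content).hexdigest() == known_digest


def pattern_rule(pattern: str) -> Rule:
    """Match content containing a text pattern (case-insensitive)."""
    compiled = re.compile(pattern.encode(), re.IGNORECASE)
    return lambda content: compiled.search(content) is not None


def is_flagged(content: bytes, rules: List[Rule]) -> bool:
    """Content is flagged as soon as any single rule matches."""
    return any(rule(content) for rule in rules)


# The system as deployed: one rule for a known image (stand-in bytes).
rules: List[Rule] = [hash_rule(hashlib.sha256(b"known image bytes").hexdigest())]
message = b"A message about an Unrelated Topic"
print(is_flagged(message, rules))  # False

# Repurposing the system is a one-line change.
rules.append(pattern_rule(r"unrelated topic"))
print(is_flagged(message, rules))  # True
```

Nothing in the matching code knows or cares what the rules are about, so "only suitable for CSAM" is a property of the rule list, not of the technology.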

Human review?

Could you please elaborate on the human oversight and how it can prevent errors by the technologies used?

The commission's answer

Providers are not required to perform human review for every detection. The EU centre guarantees human oversight for reports of known CSAM. The centre is committed to human oversight for new CSAM and grooming. The centre works as filter between LEAs and providers.

My thoughts

This does not mean that providers are not required to perform human review at all. In the proposal it is stated that regular evaluation of accuracy & detection rates is required (regulation Article 10, paragraph 4, letter f), which is only possible with human review.

Not requiring human review by providers raises issues, such as: How will the providers know about false positives? Will the EU centre reply that a report was a false positive?

The centre working as a filter between LEAs and providers actually sounds like a great idea, but it does not require a mass surveillance infrastructure! The centre could just as well work on CSAM reported by users or found by law enforcement, and (unfortunately) still remain very busy, even without introducing mass surveillance.

How many cases will there be?

What number of cases does COM expect for the reports to EU CSA? How many cases will be forwarded to the competent national law enforcement authorities and/or Europol?

The commission's answer

The commission cannot name an exact number. When basing estimates on numbers from the NCMEC, it should be considered that US law is not specific. Currently, many reports are not actionable because they are not CSAM according to EU law or because of missing information. There are also no filters for false positives. Overall, it is expected to see not only an increase in the number of reports but also an increase in the quality of reports to LEAs.

My thoughts

It's strange the commission doesn't name a number, or even an estimate. Are there no detailed plans on how the many new reports will affect the number of cases?

If there currently are no filters for false positives and material not illegal in the EU but only in the US, this will make training AI for detection of new material much more difficult, as the training data itself is flawed.

The missing information problem cannot be solved by the new monitoring obligations, as providers cannot hand out information they don't have.

Requirements for fast removal

Context

The draft regulation requires removal of content within 24 hours once the provider receives a removal order (see Article 14 of the draft). This is similar to a different recent regulation, which requires terrorist content to be removed within one hour of a removal order.

Q & A

At what point can knowledge of the content be assumed to have been obtained by the provider, is human knowledge required?

The draft does not specify that human knowledge is required; a more detailed specification may be needed. The commission explained the differences to the removal of terrorist content: In contrast to terrorist content online (TN: abbreviated as TCO, regulated by this regulation), CSAM is always illegal, no matter in which context. Terrorist content online is usually distributed publicly on hosting services. CSAM instead is distributed via interpersonal communication (2/3 of current reports are from interpersonal communication). Due to this difference in distribution, different safeguards are necessary. Contrary to the TCO regulation, the draft does not provide for cross-border removal orders, as the draft is designed in a national context.

One member state requested the maximum duration for removal to be reduced to 1 hour, matching the TCO regulation. The commission is open to discussion, but considers 24 hours appropriate.

Will the right to an effective redress be affected by the obligation under art. 14 to execute a removal order within 24 hours?

Complaints of the providers against removal orders have no suspensive effect. Complaints do not free providers from the obligation to remove content.

My thoughts

To assert that CSAM is illegal in every case is very questionable, especially when taking false positives into account.

I'd very much like to know where the statistic on reports comes from, as I've been unable to find detailed statistics on this topic. From my research, collections of CSAM are mostly hosted in encrypted form on regular file services, while links and passwords for access are shared on secretive illegal forums dedicated to exchanging CSAM - one more reason why chatcontrol is useless.

Removal of material within 24 hours is already a very strict requirement which will most likely lead to overblocking. Further reducing the timeframe for removal (as suggested) to just 1 hour would hardly leave any alternative but to automatically block content and ask questions later. This is made worse by the fact that complaints against a removal order do not stop the removal order from being valid.

The only saving grace is that removal orders are not automatically created, but must be approved by a judicial authority of the same member state as the one in which the service provider is located. As that is done manually, the process will hopefully reduce overblocking. But still, mistakes may occur.

How will information be exchanged?

Ctrl-F "Frage 49"

Article 39 (2) does not provide for the national law enforcement authorities to be directly connected to the information exchange systems. In which way will reports be passed on to national LEAs?

Commission's answer

(Translated)

Law enforcement agencies are not explicitly named, this may have to be added. Recital 55 mentions the necessity to cooperate. SIENA is not explicitly named due to possible further development/name changes. The fact is that the new EU centre will forward reports to europol and national law enforcement agencies. Additional text may have to be inserted [into the draft].

My thoughts

SIENA is europol's network for information exchange. Few technical details are publicly available, it doesn't even have an english wikipedia page (just a german one). It seems to be mostly used to exchange messages, a german police website calls the system "the european outlook". In 2021 europol tweeted (nitter alt) that more than 1.5 million messages had been exchanged in that year. With an expected 8.8 million reports per year, even at 1 message per report the reports alone would amount to almost 6 times the current message volume. Is SIENA, and more importantly, the police, capable of handling this increase in messages?
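As a rough check of the scale, here is the same comparison in Python. Both inputs are rounded figures from the text above (the tweet only says "more than 1.5 million"), and the assumption of exactly one message per report is mine, so treat the output as an order of magnitude rather than a forecast.

```python
CURRENT_MESSAGES = 1_500_000  # "more than 1.5 million" messages in 2021
EXPECTED_REPORTS = 8_800_000  # roughly the commission's yearly estimate

# New reports compared to today's traffic, assuming 1 message per report.
reports_vs_current = EXPECTED_REPORTS / CURRENT_MESSAGES

# Total traffic (existing messages plus reports) compared to today's traffic.
total_growth = (CURRENT_MESSAGES + EXPECTED_REPORTS) / CURRENT_MESSAGES

print(f"{reports_vs_current:.1f}")  # 5.9
print(f"{total_growth:.1f}")        # 6.9
```

Whether you count only the new reports or the total traffic, the system would have to handle several times its current load.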

Undercover in Fortnite

Here's the one somewhat positive thing I discovered in the leak (only tangentially related to the regulation): At the start of the session, there were multiple presentations ("Information by the Presidency"): (original in german, Ctrl-F "Fortnite")

The presentation showed an "undercover avatar" for preventive contact in Video Games (Fortnite), a project supported by europol.

I got curious, searched and found it was a project (more explanation) where volunteers and police operated an avatar on Fortnite. Since most children are allowed to play video games unsupervised, it was possible for them to contact the avatar and tell about any abuse occurring at home, which made it possible to rescue them.

I think this shows an alternative to chatcontrol with great potential: It creates an easy way for children to get help, has a basically zero chance of false positives and does not require mass surveillance. Why can't we have more projects like this instead of chatcontrol?

End of my translation

I feel this covered the most important details. There is a lot more in this document, however it is more concerned with various legal aspects (such as how the regulation relates to the planned Digital Services Act (DSA)).

If you want to read more, someone translated the full article to english here.

Summary

The leaks showed the commission either misunderstands or does not care about the consequences of its actions. It recognizes encryption as important, yet starts criticizing and undermining it in the next sentence. Later it clarifies that encryption must not prevent scanning.

The commission displays a lack of basic knowledge about statistics, mixing up accuracy and precision. While acknowledging the existence of false positives, the commission does not take into account their scale or impact and refuses to set a minimum required accuracy for detection tools.

There is little focus on human review, as it will not be mandatory for service providers to perform review of every report. Abuse of scanning technologies is also likely to remain undetected, as compliance is supposed to be checked by national agencies, which obviously won't have the capacity to perform regular checks for every company.

It's unclear how many cases there will be and how information will be exchanged between law enforcement agencies.

Even though I didn't cover much of the legal aspects, it seems that the legality of this regulation is questionable and that it creates a legal minefield for companies as laws will vary by country.

The leak confirmed that there will be no exceptions even for small companies. Worse, the commission considers expanding the draft to cover search engines or reducing the time in which CSAM must be removed down to 1 hour.

Even though different companies already voluntarily support victims and scan communication, the commission claims that voluntary measures are not enough.

This is far from everything in the leak, however I consider these the most relevant parts. In case you want to read more: Someone fully translated the leak here. If you find something that I should have mentioned but didn't, please contact me and I'll add it to this post.

Finally, I must note that all these issues caused by the new regulation are completely unnecessary, as the regulation won't help to reduce child abuse or CSAM distribution. If you want to know more about the various issues and what to do about chatcontrol, read through the other post I've written on this topic.