RPKI Repository Analysis and Requirements
draft-tbruijnzeels-sidr-repo-analysis-00

Abstract

The current RPKI Resource Certificate Repository Structure (RFC6480 & RFC6481) uses rsync as both a delta and transfer protocol. Concerns have been raised about the scalability of this repository and the reliance on rsync. This document provides an analysis of these concerns and formulates requirements for future work.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on October 13, 2013.

Copyright Notice

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

1. Requirements notation
2. Introduction
3. Concerns With current repository
3.1. Scalability of rsync(d) deltas
3.2. Update frequency and propagation times
3.2.1. Migrating to another ASN
3.2.2. Error in ROA
3.2.3. BGPSec
3.3. Lack of rsync standard and implementations
3.4. Inconsistent Responses
3.5. Single publication point per CA
3.6. Scalability through hierarchical fetching
4. Delta Protocol Requirements and Recommendations
4.1. Transport Agnostic
4.2. Support Publication Sets
4.3. Support non-hierarchical repository lay-out
4.4. Expected factors affecting repository load
4.4.1. Disclaimer
4.4.2. Size aspects of the global RPKI
4.4.3. Churn
4.4.4. Number of Relying Parties
4.4.5. Fetch frequency of Relying Parties
4.5. Expected RPKI Repository Requirements
4.5.1. Objects and Relying Parties
4.5.2. Update related throughput
4.5.3. Update related concurrency
4.5.4. Update related traffic volume
4.6. Reduce Load on Central Repositories
4.7. Update notifications
4.8. Reduce Churn
4.9. Signed Deltas
5. Security Considerations
6. Acknowledgements
7. Normative References
Authors' Addresses

1. Requirements notation

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119] .

2. Introduction

The current RPKI Resource Certificate Repository Structure (RFC6480 & RFC6481) uses rsync as both a delta and transfer protocol and recommends that repositories be set up in a hierarchical way such that relying parties (validation tools) can fetch all updates in a single repository efficiently while performing top-down validation.

This structure has its benefits. In particular it has allowed for early deployment of RPKI without the need to re-invent a delta protocol, and this has allowed early adopters of RPKI to build up operational experience more quickly. The delta protocol also has benefits for relying party tools, allowing them to quickly retrieve what’s new in a repository limiting fetch time and bandwidth usage.

Having said this, operational experience, as well as lab testing, have shown that there are concerns with regards to the current infrastructure that justify that the WG thinks about improvements in this space.

3. Concerns With current repository

3.1. Scalability of rsync(d) deltas

Rsync is a very efficient tool when used 1:1 between a client and server. The problem is that in a globally deployed RPKI we can expect in the order of 40k clients, roughly corresponding to the number of ASNs, to connect regularly to a repository server.

When the rsync built-in delta protocol is used (recursive fetching), the server is computationally involved in calculating the delta for each connected client. Performance measurements in a lab have shown that the maximum number of clients that can be processed per second (throughput), and the maximum number of concurrent clients are both linearly dependent on the repository size. The throughput is limited by server CPU. Concurrency is limited by server memory.

As a result a sufficiently large repository has to invest heavily in running multiple rsyncd instances to cope with the expected regular load of a large number of clients, and to counter the risks of DDoS attacks.

3.2. Update frequency and propagation times

The retrieval of signed objects is described in RFC6480 (section 6). There are no formal limits imposed by this informational RFC on the update frequency, but to prevent the overloading of repository servers as described above, the typical update interval of current tools is between 1-24 hours.

In previous discussions in the WG it was suggested that human scale propagation times (i.e. up to 24 hours) are good enough for the problem that we are trying to solve. There are however good reasons why much faster retrieval of newly signed objects is desirable.

3.2.1. Migrating to another ASN

In this scenario, a ROA exists for a prefix and ASN, but the prefix needs to be announced from another ASN.

In many cases the RP can foresee this and create an appropriate ROA well in advance, but there are also failure cases possible where this is not foreseeable.

3.2.2. Error in ROA

The CA operator made a mistake when it created a ROA. The ROA causes announcements that should be considered VALID to appear as INVALID, or vice versa. The CA would like to take appropriate action and revoke the ROA or issue additional ROAs. However, RPs that have received the mistaken ROA may not see these updates for some time.

3.2.3. BGPSec

The BGPSec protocol is still being discussed. However there are indications that it would be desirable if propagation times for new router certificates and CRLs could be reduced. In particular in the context of:

Planned router key roll-overs
Unplanned roll-out of new router hardware with new keys
Replay protection strategies that rely on having shorter cycles & propagation times for router certs and keys

3.3. Lack of rsync standard and implementations

There is only one known implementation of rsync and the standard is not described by any RFC. The implementation is non-modular, making it impossible to use the code as a library even when coding in the same language.

As a result all current implementations of relying party tooling have had no option but to use rsync as a pre-installed external process.

This has several major draw backs for the quality of implementations:

RP tools require that rsync is installed on a system, and they can not assume which version is installed.
Because there is only one implementation all RPKI repositories can be affected by a single bug or exploit in rsync.
Calling an external process is expensive limiting the benefits that can be gained from parallel processing.
Parsing downloaded objects is inefficient -- objects have to be downloaded to disk first before they can be read and parsed.
Dealing with errors is complicated -- exit codes are not always clear, stderr may have to be parsed. Exit codes and messages are not guaranteed to be the same across rsync versions.

3.4. Inconsistent Responses

An 'inconsistent' set of objects is a set of retrieved objects for a CA Certificate where there differences between the objects retrieved and the objects mentioned on the corresponding manifest. If any objects are missing, or if additional objects not mentioned on the manifest are found, or if any of the objects does not match the sha256 hash mentioned on the manifest, then the set as a whole is considered inconsistent. RFC6486 has text advising RPs on possible ways to treat each of these cases. However, there is a large degree of uncertainty as to how different RP tools, and operators, will deal with these corner cases because most decisions are left to local policy. This can lead to inconsistent and possibly surprising differences in the validation of RPKI data.

Missing ROA objects can be particularly problematic because other ROAs, that can be found and validated, may invalidate announcements that would have been marked as valid by these missing ROAS. Additional ROA objects are confusing because to the RP it's not clear whether this ROA was intentional and the MFT is out of date, or not.

The use of rsync as a delta protocol is problematic in this context, because rsync is non-transactional. As a result an RP may get partially updated CA repository objects if it happens to fetch while the objects on disk are being updated. This is confusing to the RP who can not tell the difference between this and a persistent error on the publisher side, or an attack involving partial replay or withholding of objects.

All of this adds up to a repository infrastructure and corresponding validation rules that leave a high degree of uncertainty in case of corner cases. The authors believe that it would be better to (1) improve the standards so that these corner cases are less likely to occur, and (2) formulate much stricter validation rules so that the uncertainty with regards to how RPs may deal with corner cases is further reduced.

3.5. Single publication point per CA

In the current design only publication point per CA is envisioned.

Even though such a publication point may employ various techniques to achieve high-availability, this leaves concerns with regards to:

Attacks on DNS for the publication point
A contractual tie-in between CA and publication server, with no way for planned migration (while staying up to date)
Failure of the publication server
(Legal) attacks on the publication server

3.6. Scalability through hierarchical fetching

The notion that child CAs can publish in a sub-directory of their parent CA publication point has been suggested as mitigation strategy for scalability of fetching RPKI data using rsync.

There are a number of reasons why this hierarchical model may not be advisable or even possible:

The parent CA may not wish to imply responsibility over objects issued by its child
Recursive rsync fetches on sufficiently large repositories are expensive. The parent CA may have no choice but to disallow recursive fetching to mitigate its DDoS vulnerabilities
CAs may wish to publish their own content, or they may wish to publish their content in a repository that is provided by an organisation other than their parent CA.

4. Delta Protocol Requirements and Recommendations

4.1. Transport Agnostic

A future delta protocol should be transport agnostic, allowing agility in future transport protocols between RPs and repositories, and sharing of deltas between RPs.

4.2. Support Publication Sets

A future delta protocol should enable CAs to publish new objects as a set so that errors in evaluating route origin validity as a result of incomplete information may be avoided as much as possible.

4.3. Support non-hierarchical repository lay-out

The scalability of a future delta protocol should not depend on a hierarchical repository lay-out. This is particularly important if one considers the possibility of third party publication servers and/or the possible use of mirror repositories. In both cases the CAs for which objects are published can most likely not be considered children of the publication server, or each other.

4.4. Expected factors affecting repository load

4.4.1. Disclaimer

The numbers cited below reflect our best current estimates based on relevant statistics currently at our disposal. They are intended to provide context for load testing proposed solutions.

We are of course open any suggestions and real world statistics that can improve these estimates.

4.4.2. Size aspects of the global RPKI

4.4.2.1. Mirroring

For the purpose of scalability it would be prudent to assume that mirroring should be supported in the RPKI to the point where one publication server can, in principle, mirror the complete global RPKI. In reality this may not happen to this extent, but any design that can support this should be adequate to support smaller numbers.

4.4.2.2. Number of CAs in the global RPKI

Assuming that only current RIR member organisations that are holding IPv4, IPv6 and/or ASN resources would act as Certificate Authorities (CA), the expected number of CAs in the global RPKI is expected to be around 50.000. However, if one also considers holders of Provider Independent (PI) resources this number may be larger. For reference: the RIPE NCC has roughly 25.000 PI prefixes registered, vs just roughly 9000 regular members. If these numbers are similar for all regions, and we assume the 'worst case' where all PI holders have their own CAs, then we are looking at number that is roughly four times larger: i.e. 200.000 CAs.

Note that each organisation will most likely find all their resources on one certificate, however in case an organisation holds resources from multiple parent sources more than certificate may be needed. For the moment we will assume that the number of CA certificates per organisation will be close to 1.

For each CA certificate 4 objects will be published in the global RPKI: the CA certificate itself (by the parent CA), one manifest, one CRL and one ghostbuster record.

4.4.2.3. Number of ROAs in the global RPKI

The number of ROAs in the global RPKI does not depend on the number of CAs. A small organisation may have only 1 ROA, while a large organisation will need many. Instead it is expected that the number of ROAs is related to the number of intended announcements that are seen in the global BGP. The current routing table has roughly 500.000 such announcements, but the size of the table has been growing steadily.

It should be noted that ROAs can be used to authorise more than one announcement, but there are restrictions:

The ASN must be the same.
The prefixes must all be held by the CA.
Furthermore the prefixes must occur on the same parent certificate. In other words: if an organisation has signed resources from more than one source they can not be aggregated on the same ROA.

Statistics for the RIPE region indicate that an aggregation factor of 3 announcements per ROA is reasonable. This would put the expected number of ROAs in the order of 200.000.

4.4.2.4. Number of router certificates in the global RPKI

The number of router certificates depends on the number of keys that will be used by BGPSec speaking routers.

At a specific ASN, different physical BGPSec speaking routers MAY use the same key, and therefore may require only one certificate for that key. On the other hand to support BGPSec roll-overs it may be advisable to publish not one, but two keys at the same time. Plus some operators may choose to use unique keys per physical router.

All in all it is not entirely clear to the authors how many certified keys may be, but on list numbers as high as 2.000.000 have been mentioned.

4.4.2.5. Total number of objects in the global RPKI

Using the number of objects cited in the previous sections, we can describe the total number of objects in the RPKI with the formula:

Ototal = #CAorganisations * #Avg_CAcert_per_organisation * 4 + #ROAs + #Router Certs

Ototal = 200k * ~1 * 4 + 200k + 2M = 3M

4.4.2.6. Total size of objects in the global RPKI

Based on the current repositories deployed by the RIRs we find these average sizes for different object types:

type	size (bytes)	size in model
CA certificate	1416	1.5 kB
Manifest	1951	2 kB
CRL	692	0.7 kB
ROA	1846	2 kB
Ghostbuster record	unknown, expect similar to ROA	1.5 kB
Router certificate	unknown, expect similar to CA certificate	2 kB

We use rounded off decimal numbers for our calculations for simplicity, and because our predictions are intended to give an idea of the expected order of magnitude of the repository size only.

Using these numbers we can predict a global repository size with the formula:

Stotal = #CAorganisations * ( CA_certificate_size + MFT_size + CRL_size + GB_size) + #ROAs * ROA_size + #Router Certs * Router_Cert_size

Stotal = 200k * ( 1.5k + 2k + 0.7k + 2k) + 200k * 2k + 2M * 2k

Stotal = 4.6G

4.4.3. Churn

The daily churn in the RPKI, i.e. the amount of new objects we're expected to see, per 24 hours is another important factor to consider in the context of scalability

The current RIR managed RPKI services typically update MFT and CRLs for each CA every 8 hours, accounting for a churn of 200k * 2 * (24 hours / 8 hours) = 1.2 M objects per 24 hour (833 per minute). The volume of this churn is expected to amount to 200k * (2kB + 0.7kB) * (24 hours / 8 hours) = 1.6 GB per 24 hour = 1.1 MB per minute.

The expected churn in ROAs and router certificates are expected to depend on:

The amount of new planned announcements in BGP
The average number of routers requiring new keys being rolled-out daily.
BGP Sec key roll-overs

We have no clear idea about these numbers at this time, but we expect this number to be relatively small compared to the churn rate caused by republishing MFTs and CRLs.

4.4.4. Number of Relying Parties

It seems plausible that in a full deployment scenario each ASN will run at least two RP tool instance (one back-up).

There are currently around 40.000 ASNs in the global BGP, so this would suggest a number of 80.000 distinct client RPs accessing repositories.

Others have suggested that RPs can use (file) sharing techniques to reduce their dependency on central repository servers. If this approach would be deployed this could reduce the number of RPs that central repository servers have to serve. We expect though that this sharing will be mostly used to ensure redundancy with an ASN, and much less between ASNs.

4.4.5. Fetch frequency of Relying Parties

The repository servers have little control of the fetch frequencies used by Relying Parties. As mentioned in section 3.2 Relying Parties have an interest in fetching new information much more frequently than they do currently. It's not clear right now what frequency will be most common in a full deployment scenario. We expect though that the desired update frequency will be in the order of every ten minutes. This seems to be in-line with operator time scale changes that would have to be made in BGP and the RPKI.

4.5. Expected RPKI Repository Requirements

4.5.1. Objects and Relying Parties

It should be noted that repository servers have no control over relying parties. RPs are responsible for their own infrastructure and keeping up to date. Well functioning RPs will try to stay up to date at all times, while avoiding to overload the server(s). Badly configured RPs may however fail to retrieve updates, or they may insist on checking for updates at a well-above average rate and cause additional server load.

Having said that we believe that we can stipulate some ball park parameters that large repositories should be prepared to deal with based on the estimates mentioned in the previous section.

The numbers below all assume averages of well behaved RPs. In reality repositories will have to deal with peak loads that may result from a number of different factors, like:

A large number of updates is available
The number of RP connections is not evenly distributed
There is an attack on the server

Defining these factors in formulas is however fairly complicated. The authors believe that it is a more pragmatic and useful strategy to take the naive estimates defined below as a starting point and require that new protocols are load tested to a degree where we can be confident that new implementations will be able to meet the normal load requirements easily, as well as peak load conditions that may exceed normal load by factors of 5-10.

4.5.2. Update related throughput

We define "throughput" as the total number of RP connections that the repository can server per minute. Note that this does not imply anything about the time each connection takes.

By definition the server has to be able to process a number of connections per time unit that is at least equal, and preferably comfortably bigger, than the number of new connections that are expected over that time unit. Failure to meet this number will inevitably lead to a build-up of client connections to the point where the server will no longer be able to accept new connections

Based on 80k RP tools fetching updates every 10 minutes we may assume that a throughput number of 8k connections / min. is the bare minimum that needs to be supported.

4.5.3. Update related concurrency

Another interesting load factor is given by the number of expected concurrent connections. A naive formula for this number is given by:

#Concurrent connections = #New Connections (conn / minute) * # Avg processing time (min / conn)

E.g. if it would take 30 seconds to process the average connection, we would need to support:

8k conn/min * 0.5 min/conn = 4k concurrent RPs

4.5.4. Update related traffic volume

Based un RPs getting deltas alone we expect that the volume of data that the repository server has to serve per minute can be determined by the formula:

Vol_min = Churn_vol_min * #RPs

Vol_min = 1.1 M/min * 80k = 88 GB/min

4.6. Reduce Load on Central Repositories

In a full deployment scenario a large number of RPs are expected to approach a single repository server regularly. Estimates of how large this number of RPs is, and how regularly they will fetch updates, vary. However as a starting point one might expect one RP tool for each ASN, currently 40k, to fetch updates every five minutes. Whatever the real numbers may be it should be noted that the repository server has very little control over these numbers.

Because of this it makes sense to look into a delta protocol where the number of clients and frequency of fetching, has the least possible effect on the central repository server. E.g by enabling pre-computing of updates and offloading to caches or CDNs.

Although this may result in a protocol that causes the Relying Party to do more work, the trade-off of offloading CPU cycles to a large number of frequently polling RPs as opposed to spending CPU on the server is expected to scale much better.

4.7. Update notifications

Higher update frequencies and shorter propagation times are desired. On the other hand it would also be good if unnecessary synchronisation attempts were prevented to reduce load. For this reason a delta protocol would do well to support update notifications to RPs. Both push and poll based strategies may be used for this purpose.

4.8. Reduce Churn

A large part of the churn in the RPKI is caused by the regular republishing of Manifests and CRLs. If this frequency could be reduced without compromising security, such as an RP's sensitivity to replay attacks, then the total load on repository servers could be reduced significantly.

4.9. Signed Deltas

It should be noted that although RPs retrieve objects from untrusted sources, these objects are cryptographically validated. In other words a publisher, or monkey-in-the-middle, can not mislead the RP and generate valid objects without having access to the associated private keys. Having said that, this still leaves RPs vulnerable to attacks where information is withheld or replayed. RPs can notice such attacks if they rely on manifests to inform them about:

which objects should be expected
when the next update is expected at the latest

If deltas were signed it would be possible for RPs to detect attacks or transport errors sooner. However, signing deltas comes at a cost of complexity. In particular it will be difficult to communicate keys used for signing in a secure and dynamic (allow rolls) way. The benefit seems too limited to warrant this.

5. Security Considerations

TBD

6. Acknowledgements

TBD

7. Normative References

[RFC2119]

Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

Authors' Addresses

Tim Bruijnzeels RIPE NCC EMail: tim@ripe.net

Oleg Muravskiy RIPE NCC EMail: oleg@ripe.net

Bryan Weber Cobenian EMail: bryan@cobenian.com