GStreamer RTP Session Handling in Rust

Recently we at Centricular had the opportunity to write some new GStreamer RTP elements in Rust as part of a grant from the Sovereign Tech Fund. We of course jumped at the chance.
The existing RTP elements are all written in the C programming language and have been around for over a decade. Their age shows in some of the architectural decisions and user-visible interfaces.
There is also a general desire within the GStreamer community to write parsers and other elements handling potentially untrusted data in a much more memory-safe language, like Rust. As a bonus, Rust comes with better out-of-the-box developer tooling than C/C++.

In this blog post we will be looking into how RTP sessions are managed in GStreamer, independent of the specifics of a particular RTP payload format.

The Prior Art: rtpbin

In GStreamer, the pre-existing interface for handling RTP sessions is the rtpbin element. The rtpbin element is over 15 years old and has held up surprisingly well over that time, with features added as new use cases were discovered and implemented. rtpbin itself does not do much with the actual data; it is more of a managing element that ensures that any required information is where it needs to be. The actual data processing occurs in child elements such as rtpsession, rtpjitterbuffer, rtpssrcdemux and rtpptdemux. Other elements can also be injected into rtpbin for additional use cases, such as Retransmissions (RTX) (RFC 4588) and multiple types of Forward Error Correction (FEC).

With the benefit of hindsight, some parts of rtpbin's design are not quite optimal. A non-exhaustive list includes:
  • Use of GStreamer elements for almost all functionality. Want a different FEC implementation? Write a GStreamer element conforming to a certain interface. Want to implement special parsing of packets per payload type (but not per SSRC)? Write a GStreamer element conforming to a certain interface. There is almost no new feature that cannot be solved by adding yet another element into rtpbin.
  • Synchronisation of multiple streams is split across multiple different components. rtpbin, rtpsession, and rtpjitterbuffer all play a part in the synchronisation calculations, through GObject signals and function calls that in turn trigger other GObject signals.
  • RTCP scheduling of multiple RTP sessions requires one RTCP thread per RTP session. In most cases these threads do almost nothing, only waking up when RTCP needs to be sent to the peer. This overhead adds up for rtpbin elements that handle many hundreds or even thousands of sessions.
  • SFU and MCU use cases require manually constructing a pipeline with 'wormhole' elements in order to avoid a loop in the pipeline graph. This is due to rtpbin handling both sending and receiving in the same element.
  • Adding internal elements (implementing extra functionality) is done through very similarly named GObject signals. We cannot rename those signals or move the resulting element to a different position in the pipeline, as that would constitute an API/ABI break. Adding more signals does not help either.
Don't get me wrong, the flexibility of rtpbin is amazing but this flexibility is overkill for the vast majority of scenarios. It is also this flexibility that confuses many new users.

A Better Future

With some effort, some (but not all) of the above concerns with rtpbin could be fixed without breaking API/ABI. However, other issues cannot be solved within the current design of rtpbin and require something new. Starting from scratch lets us address these issues in the new design. This is what the new rtpsend and rtprecv elements are intended to be: a better interface for implementing RTP sessions.

Two Elements: rtpsend and rtprecv

The first thing to note is that there are now two elements instead of one. To support server-side MCU/SFU use cases more effectively, the send half and the receive half of a single RTP session have been split into two separate elements. They are connected to each other by giving them the same "rtp-id" property value before rtpsend and/or rtprecv progress to the READY state. In an SFU/MCU environment, it is very common to need to both send data to and receive data from multiple peers. Take, for example, a simple RTP forwarder with two peers sending to and receiving from each other using rtpbin:

rtpbin name=rtpbin \
udpsrc name=peer0-recv ! rtpbin.recv_rtp_sink_0 \
rtpbin.recv_rtp_src_0_{pt}_{ssrc} ! rtpbin.send_rtp_sink_1 \
rtpbin.send_rtp_src_1 ! udpsink name=peer1-send \
udpsrc name=peer1-recv ! rtpbin.recv_rtp_sink_1 \
rtpbin.recv_rtp_src_1_{pt}_{ssrc} ! rtpbin.send_rtp_sink_0 \
rtpbin.send_rtp_src_0 ! udpsink name=peer0-send


This pipeline will fail to start because of the loop in the graph between rtpbin.recv_rtp_src_0_{pt}_{ssrc} ! rtpbin.send_rtp_sink_1 and rtpbin.recv_rtp_src_1_{pt}_{ssrc} ! rtpbin.send_rtp_sink_0. Using a separate rtpbin element for each peer does not help either. The graph loop can be avoided by using pairs of wormhole elements such as appsink and appsrc, intersink and intersrc, proxysink and proxysrc, or similar.
With the new rtpsend and rtprecv there are no wormhole elements required to implement this functionality:

rtprecv name=rtprecv rtpsend name=rtpsend \
udpsrc name=peer0-recv ! rtprecv.rtp_recv_sink_0 \
rtprecv.rtp_src_0_{pt}_{ssrc} ! rtpsend.rtp_sink_1 \
rtpsend.rtp_src_1 ! udpsink name=peer1-send \
udpsrc name=peer1-recv ! rtprecv.rtp_recv_sink_1 \
rtprecv.rtp_src_1_{pt}_{ssrc} ! rtpsend.rtp_sink_0 \
rtpsend.rtp_src_0 ! udpsink name=peer0-send
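Pairing rtpsend and rtprecv through a matching "rtp-id" can be pictured as a process-wide lookup keyed by that string. Both halves then end up holding the same shared session object. The sketch below illustrates that idea in plain Rust under that assumption. `SharedSession`, `registry` and `session_for` are hypothetical names, not the elements' actual internals.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical state shared by the send and receive halves of one RTP session.
#[derive(Default)]
pub struct SharedSession {
    pub send_attached: bool,
    pub recv_attached: bool,
}

type Registry = Mutex<HashMap<String, Arc<Mutex<SharedSession>>>>;

/// Process-wide map from an "rtp-id" value to the session shared by both halves.
fn registry() -> &'static Registry {
    static REGISTRY: OnceLock<Registry> = OnceLock::new();
    REGISTRY.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Called by each half (for example while changing from NULL to READY). Elements
/// using the same id receive the same shared session object. Entries are never
/// removed in this sketch; a real implementation would need to clean them up.
pub fn session_for(rtp_id: &str) -> Arc<Mutex<SharedSession>> {
    let mut map = registry().lock().unwrap();
    map.entry(rtp_id.to_owned())
        .or_insert_with(|| Arc::new(Mutex::new(SharedSession::default())))
        .clone()
}
```

Because the lookup happens before data flows, neither half needs a pad link to the other. This is why the forwarding pipeline contains no graph loop.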

RTCP Scheduling

Within rtpsend, RTCP scheduling is implemented as an async Stream that returns the RTCP data to be sent to the peer. This makes it easy for multiple sessions (even from different element instances) to share a single thread for scheduling RTCP. The downstream (blocking) push of these packets is done from a shared thread pool, which also limits the number of dedicated threads required for sending RTCP across many sessions.
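The key idea is that one thread only needs to know the earliest upcoming RTCP deadline across all sessions. The real rtpsend exposes this as an async Stream. The std-only sketch below swaps that for a plain deadline queue (a min-heap) to show the same principle. `RtcpScheduler` and its methods are hypothetical, and time is simulated as a `Duration` since start so the behaviour is deterministic.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::time::Duration;

/// Hypothetical scheduler serving the RTCP deadlines of many sessions from one thread.
pub struct RtcpScheduler {
    // Min-heap ordered by next deadline; ties are broken by session id.
    queue: BinaryHeap<Reverse<(Duration, u32)>>,
    interval: Duration,
}

impl RtcpScheduler {
    pub fn new(interval: Duration) -> RtcpScheduler {
        // A zero interval would make `poll_due` loop forever.
        assert!(interval > Duration::ZERO, "interval must be non-zero");
        RtcpScheduler { queue: BinaryHeap::new(), interval }
    }

    /// Register a session whose first RTCP packet is due at `first_due`.
    pub fn add_session(&mut self, session_id: u32, first_due: Duration) {
        self.queue.push(Reverse((first_due, session_id)));
    }

    /// Return every session whose RTCP is due at or before `now`, rescheduling each
    /// one `interval` after `now`. A real implementation would randomise the
    /// interval as described in RFC 3550 section 6.2.
    pub fn poll_due(&mut self, now: Duration) -> Vec<u32> {
        let mut due = Vec::new();
        while let Some(&Reverse((deadline, id))) = self.queue.peek() {
            if deadline > now {
                break;
            }
            self.queue.pop();
            due.push(id);
            self.queue.push(Reverse((now + self.interval, id)));
        }
        due
    }

    /// Earliest upcoming deadline, i.e. how long the single thread may sleep.
    pub fn next_deadline(&self) -> Option<Duration> {
        self.queue.peek().map(|Reverse((d, _))| *d)
    }
}
```

The thread sleeps until `next_deadline()`, calls `poll_due()`, and hands the due sessions' packets to a pool for the blocking push. This keeps one timer thread for any number of sessions.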

Internally sans-IO

The entirety of the session management inside rtpsend and rtprecv is contained within a component that never accesses any external state. All external state (the current time, incoming and outgoing packets) is passed in as input arguments, and output data and timeouts are polled from the session externally. This is the sans-IO design pattern. As a result, the internal RTP session implementation can be thoroughly tested against the many different scenarios it has to handle. Another interesting resource on sans-IO is this talk at FOSDEM 2019.
This design also makes it possible to reuse the RTP session implementation elsewhere should the need arise, for example in a threadsharing implementation or any other kind of implementation.
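A sans-IO component can be reduced to two kinds of calls. You feed it inputs together with the current time, and you poll it for what to do next. The sketch below is a heavily simplified, hypothetical session (`Session`, `Output`, `handle_rtp` and `poll` are invented names, not the rtpsend/rtprecv API). It only counts packets and emits a placeholder RTCP payload, but it has the same shape.

```rust
use std::time::{Duration, Instant};

/// What the caller should do next; the session itself never performs any I/O.
#[derive(Debug, PartialEq)]
pub enum Output {
    /// Send these bytes to the peer as RTCP.
    SendRtcp(Vec<u8>),
    /// Nothing to do until this instant; call `poll` again then.
    WaitUntil(Instant),
}

/// Hypothetical, heavily simplified sans-IO session: time and packets arrive as
/// arguments and every output is returned to the caller.
pub struct Session {
    packets_received: u64,
    next_rtcp: Instant,
    rtcp_interval: Duration,
}

impl Session {
    pub fn new(now: Instant, rtcp_interval: Duration) -> Session {
        Session { packets_received: 0, next_rtcp: now + rtcp_interval, rtcp_interval }
    }

    /// Feed an incoming RTP packet along with the time it was received.
    pub fn handle_rtp(&mut self, _now: Instant, _packet: &[u8]) {
        self.packets_received += 1;
    }

    /// Ask the session what should happen at `now`.
    pub fn poll(&mut self, now: Instant) -> Output {
        if now >= self.next_rtcp {
            self.next_rtcp = now + self.rtcp_interval;
            // Placeholder payload: a real session would build a receiver report here.
            Output::SendRtcp(self.packets_received.to_be_bytes().to_vec())
        } else {
            Output::WaitUntil(self.next_rtcp)
        }
    }
}
```

A test can drive this with synthetic instants (for example `t0 + Duration::from_secs(1)`) instead of sleeping. Because the caller owns the clock and the sockets, the same code can be hosted by a GStreamer element, a threadshare runtime, or a unit test.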

Performance

Initial performance comparisons between rtpbin and rtpsend/rtprecv show that the new elements are slightly faster than rtpbin in the tested scenarios, even though the current implementation of rtpsend and rtprecv is reasonably naïve. Their performance can be improved further by adding support for buffer lists and by batching interactions with the internal jitterbuffer. There are also some ideas on how the jitterbuffer 'thread' might be implemented more efficiently.

Synchronisation

rtprecv contains a dedicated component that performs the necessary synchronisation without involving the jitterbuffer implementation or requiring many round trips through multiple different components. This results in a much cleaner, more localised, more readable and more maintainable implementation.
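At its core, inter-stream synchronisation relies on the standard RFC 3550 mapping between a stream's RTP timestamps and the sender's NTP wallclock. That mapping is announced in RTCP Sender Reports. The sketch below shows only that textbook calculation under the assumption of a known clock rate. It is not rtprecv's code, and `SenderReportTimes` and both function names are hypothetical.

```rust
/// Sender Report timing fields used for synchronisation (RFC 3550 section 6.4.1).
#[derive(Clone, Copy)]
pub struct SenderReportTimes {
    /// 64-bit NTP timestamp: seconds in the upper 32 bits, fraction in the lower 32.
    pub ntp: u64,
    /// RTP timestamp corresponding to the same instant as `ntp`.
    pub rtp: u32,
}

/// Convert a 64-bit NTP timestamp to nanoseconds since the NTP epoch.
pub fn ntp_to_nanos(ntp: u64) -> u128 {
    let secs = (ntp >> 32) as u128;
    let frac = (ntp & 0xffff_ffff) as u128;
    secs * 1_000_000_000 + ((frac * 1_000_000_000) >> 32)
}

/// Map an RTP timestamp onto the sender's NTP timeline (in nanoseconds) using the
/// latest Sender Report. The 32-bit difference is read as signed, so timestamps
/// slightly before the report and 32-bit wraparound are both handled.
/// `clock_rate` must be non-zero.
pub fn rtp_to_ntp_nanos(sr: SenderReportTimes, rtp: u32, clock_rate: u32) -> i128 {
    let delta_ticks = rtp.wrapping_sub(sr.rtp) as i32 as i128;
    let delta_nanos = delta_ticks * 1_000_000_000 / clock_rate as i128;
    ntp_to_nanos(sr.ntp) as i128 + delta_nanos
}
```

Two streams from the same sender, for example audio at 48 kHz and video at 90 kHz, can be aligned by comparing the NTP times their RTP timestamps map to. Keeping this calculation in one component is what avoids the round trips between elements.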

rtp-types and rtcp-types

Alongside work on some new RTP payloaders and depayloaders, we also produced two general-purpose Rust crates for parsing and writing RTP and RTCP packets. The aim of both crates is to be as performant as reasonably possible while offering a user-friendly API. Parsing a packet performs no memory copies of the data, and every access to a particular field is computed from the underlying data. The rtp-types and rtcp-types crates are not GStreamer-specific and can be used outside of GStreamer itself.
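Zero-copy parsing means the parsed packet is just a borrowed view over the input bytes, and each accessor decodes its field on demand. The sketch below illustrates that approach for the RTP fixed header from RFC 3550. `RtpView` and `ParseError` are hypothetical names, and the sketch rejects header extensions and padding for brevity. The actual rtp-types API differs and is documented with the crate.

```rust
/// Errors returned by the simplified parser below.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    TooShort,
    BadVersion,
    /// Header extensions and padding are not handled by this sketch.
    Unsupported,
}

/// Borrowed view over an RTP packet. Parsing copies no bytes; every accessor
/// reads its field straight from the underlying slice.
pub struct RtpView<'a> {
    data: &'a [u8],
}

impl<'a> RtpView<'a> {
    pub fn parse(data: &'a [u8]) -> Result<RtpView<'a>, ParseError> {
        if data.len() < 12 {
            return Err(ParseError::TooShort);
        }
        if data[0] >> 6 != 2 {
            return Err(ParseError::BadVersion);
        }
        if data[0] & 0x30 != 0 {
            // Padding (P, 0x20) or extension (X, 0x10) bit is set.
            return Err(ParseError::Unsupported);
        }
        let view = RtpView { data };
        if data.len() < view.header_len() {
            return Err(ParseError::TooShort);
        }
        Ok(view)
    }

    pub fn marker(&self) -> bool {
        self.data[1] & 0x80 != 0
    }

    pub fn payload_type(&self) -> u8 {
        self.data[1] & 0x7f
    }

    pub fn sequence_number(&self) -> u16 {
        u16::from_be_bytes([self.data[2], self.data[3]])
    }

    pub fn timestamp(&self) -> u32 {
        let d = self.data;
        u32::from_be_bytes([d[4], d[5], d[6], d[7]])
    }

    pub fn ssrc(&self) -> u32 {
        let d = self.data;
        u32::from_be_bytes([d[8], d[9], d[10], d[11]])
    }

    /// Fixed header plus 4 bytes per CSRC entry.
    fn header_len(&self) -> usize {
        12 + 4 * (self.data[0] & 0x0f) as usize
    }

    /// The payload is a sub-slice of the input, not a copy.
    pub fn payload(&self) -> &'a [u8] {
        &self.data[self.header_len()..]
    }
}
```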

Thanks to the flexibility of Rust's type system, it is possible to write RTP packets using custom data types for the payload, the extension data, and the output. There are of course implementations for &[u8] as payload and extension data. Implementations also exist for writing to a &mut [u8], a newly created Vec, or a &mut Vec.
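Abstracting the output behind a trait lets one writer target both growable and fixed-size buffers. The sketch below shows that general technique with a hypothetical `PacketSink` trait, a `SliceSink` wrapper and a `write_rtp` function. These names are illustrative only and are not the rtp-types builder or writer API.

```rust
/// Hypothetical output abstraction: anything a packet can be written into.
pub trait PacketSink {
    /// Remaining capacity in bytes, or `None` if the sink can grow.
    fn remaining(&self) -> Option<usize>;
    /// Append bytes; callers must check `remaining` first.
    fn put(&mut self, bytes: &[u8]);
}

impl PacketSink for Vec<u8> {
    fn remaining(&self) -> Option<usize> {
        None
    }
    fn put(&mut self, bytes: &[u8]) {
        self.extend_from_slice(bytes);
    }
}

/// Writes into a caller-provided fixed-size buffer.
pub struct SliceSink<'a> {
    buf: &'a mut [u8],
    pos: usize,
}

impl<'a> SliceSink<'a> {
    pub fn new(buf: &'a mut [u8]) -> SliceSink<'a> {
        SliceSink { buf, pos: 0 }
    }
    pub fn written(&self) -> usize {
        self.pos
    }
}

impl<'a> PacketSink for SliceSink<'a> {
    fn remaining(&self) -> Option<usize> {
        Some(self.buf.len() - self.pos)
    }
    fn put(&mut self, bytes: &[u8]) {
        let end = self.pos + bytes.len();
        self.buf[self.pos..end].copy_from_slice(bytes);
        self.pos = end;
    }
}

/// Write a minimal RTP packet (no CSRCs, extension or padding) into any sink.
/// Returns false, without writing anything, if the packet does not fit.
pub fn write_rtp<S: PacketSink>(out: &mut S, pt: u8, seq: u16, ts: u32, ssrc: u32, payload: &[u8]) -> bool {
    if let Some(left) = out.remaining() {
        if left < 12 + payload.len() {
            return false;
        }
    }
    let mut header = [0u8; 12];
    header[0] = 0x80; // version 2, no padding, no extension, no CSRCs
    header[1] = pt & 0x7f; // marker bit cleared
    header[2..4].copy_from_slice(&seq.to_be_bytes());
    header[4..8].copy_from_slice(&ts.to_be_bytes());
    header[8..12].copy_from_slice(&ssrc.to_be_bytes());
    out.put(&header);
    out.put(payload);
    true
}
```

Checking capacity up front means a too-small fixed buffer is left untouched rather than holding a half-written packet.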

rtp-types also has some support for editing parts of an RTP packet in place. Specifically, the values in the RTP fixed header can be edited and rewritten without any allocations. If the payload data needs to be modified instead, a new RTP packet should be created using the RTP packet builder API.
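In-place editing works because the fixed header has fixed offsets (RFC 3550). A forwarder can overwrite, say, the sequence number and SSRC directly in the packet buffer without touching the payload. The function below is a hypothetical std-only illustration of that, not the rtp-types editing API.

```rust
/// Overwrite the sequence number and SSRC of an RTP packet in place, e.g. when
/// forwarding a stream under a new SSRC. No allocation; the payload is untouched.
/// Returns false if the buffer is not a version-2 packet with a full fixed header.
pub fn rewrite_header(packet: &mut [u8], seq: u16, ssrc: u32) -> bool {
    if packet.len() < 12 || packet[0] >> 6 != 2 {
        return false;
    }
    packet[2..4].copy_from_slice(&seq.to_be_bytes()); // bytes 2-3: sequence number
    packet[8..12].copy_from_slice(&ssrc.to_be_bytes()); // bytes 8-11: SSRC
    true
}
```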

rtp-types is also widely used by the new Rust RTP payloaders and depayloaders.

rtcp-types supports RTCP packet types for the RTP/AVP and RTP/AVPF profiles. This includes, but is not limited to, sender reports, receiver reports, SDES, BYE and APP packets, as well as Transport Feedback and Payload Feedback RTCP packets. The full list can be gleaned from the structs listed in the documentation at the crate root. Custom RTCP packets can also be supported externally, if required, by implementing a couple of traits. If there is an RFC for a particular RTCP packet, we would also like to support it upstream in rtcp-types.
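RTCP is usually sent as a compound packet. Several packets sit back to back, each starting with a common header whose length field gives its size in 32-bit words minus one. The sketch below splits a compound packet and classifies each part by the packet types from RFC 3550 (200-204) and RFC 4585 (205, 206). `RtcpKind` and `split_compound` are hypothetical names, not the rtcp-types API. The `Unknown` variant marks where a custom packet type would be handled.

```rust
/// RTCP packet types from RFC 3550 (200-204) and RFC 4585 (205, 206).
#[derive(Debug, PartialEq)]
pub enum RtcpKind {
    SenderReport,
    ReceiverReport,
    Sdes,
    Bye,
    App,
    TransportFeedback,
    PayloadFeedback,
    Unknown(u8),
}

fn kind(packet_type: u8) -> RtcpKind {
    match packet_type {
        200 => RtcpKind::SenderReport,
        201 => RtcpKind::ReceiverReport,
        202 => RtcpKind::Sdes,
        203 => RtcpKind::Bye,
        204 => RtcpKind::App,
        205 => RtcpKind::TransportFeedback,
        206 => RtcpKind::PayloadFeedback,
        other => RtcpKind::Unknown(other),
    }
}

/// Split a compound RTCP packet into (kind, packet bytes) pairs without copying.
/// Returns None if a header is truncated, has the wrong version, or overruns the data.
pub fn split_compound(data: &[u8]) -> Option<Vec<(RtcpKind, &[u8])>> {
    let mut out = Vec::new();
    let mut rest = data;
    while !rest.is_empty() {
        if rest.len() < 4 || rest[0] >> 6 != 2 {
            return None;
        }
        // Length field: packet size in 32-bit words, minus one.
        let len = (u16::from_be_bytes([rest[2], rest[3]]) as usize + 1) * 4;
        if rest.len() < len {
            return None;
        }
        let (packet, tail) = rest.split_at(len);
        out.push((kind(packet[1]), packet));
        rest = tail;
    }
    Some(out)
}
```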

Additional Functionality

Currently, rtpsend and rtprecv do not support any additional functionality. Features like Retransmissions (RTX) and Forward Error Correction (FEC) may need to interact with the jitterbuffer, and careful design work is needed to integrate them seamlessly. There are some preliminary ideas but nothing concrete at this stage.

Users

As part of the same work with the Sovereign Tech Fund, we also wrote rtspsrc2, an RTSP client element that can use the new session management presented here. During the current testing phase, whether rtspsrc2 uses the new elements is controlled with the USE_RTP2=1 environment variable.

The Future

Rewriting a mature RTP implementation is a complicated endeavour that will require many more months of effort to complete. If you would like to help make a secure, mature RTP implementation a reality, please get in touch. Otherwise, stay tuned for updates as we make this happen.
