Feng - the RTSP/RTP streaming server

Feng is a multimedia streaming server compliant with the IETF standards for real-time streaming of multimedia content over the Internet. Feng implements RTSP, the Real-Time Streaming Protocol (RFC 2326), and RTP/RTCP, the Real-Time Transport Protocol and RTP Control Protocol (RFC 3550), and supports the RTP Profile for Audio and Video Conferences with Minimal Control (RFC 3551).

Feng supports the following encoding standards:

The main characteristics of Feng are its container support, its ability to handle seeking (also used internally by the compositor metademuxer), and a modular structure designed to make it easy to extend codec and protocol support.

Feng 2.1.0_rc1 is available; alternatively, you may fetch the live sources from our public git tree.

Feng is released under the GNU Lesser General Public License version 2.1

Dependencies

NOTE: the clients that currently work out of the box with Feng are VLC, GStreamer and MPlayer. Currently only GStreamer supports Vorbis and Theora playback; libnemesi will provide support for them soon.

Client compatibility list

Feature           libnemesi  Live555.com  FFPlay  GStreamer  RealPlayer   HelixPlayer  QuickTime
General RTSP      OK         OK           OK      OK         OK           OK           OK
Seek support      Yes        Partial (1)  No      No         Yes          Yes          Yes
Pause support     Yes        Partial (1)  Yes     Yes        Yes          Yes          Yes
General RTP/RTCP  OK         OK           OK      OK         OK           OK           Partial (2)
MPEG Video 1/2    Yes        Yes          Yes     Yes (3)    Partial (4)  No           Yes
MPEG Audio        Yes        Yes          Yes     Yes        Yes          Yes          Partial (5)
Vorbis            Yes        No           No      Yes        No           No           No
Theora            No         No           No      Yes        No           No           No
H.264             Yes        Yes          Yes     Yes        No           No           Yes
H.263/H.263+      Yes        Yes          Yes     Yes        Partial (6)  No           Partial (7)
MPEG 4 Visual     Yes        Yes          Yes     Yes        Partial (4)  No           Yes
AAC               Yes        Yes          Yes     Yes        Yes          No           Yes

(1) Live555's seek and pause implementation is not compliant with RFC 2326, so pause works only with seekable streams, and seek may behave strangely or freeze the client.

(2) QuickTime does not support interleaved RTP/AVP/TCP, only a proprietary form of RTSP tunnelling over HTTP, so only UDP is supported.

(3) GStreamer fails to decode MPEG-1/2 video when gst-ffmpeg is installed (it will be fixed in gst-plugins-good 0.10.7).

(4) Works on Windows and Mac only.

(5) MPEG Audio Layer-III (MP3) is not supported.

(6) H.263+ is not decoded correctly.

(7) Only standard resolutions (e.g. CIF) seem to work.

Internals

Currently feng provides only sparse documentation about its internals; you can generate it from the git sources using doxygen. The following text aims to give some additional high-level insight into the structure before you dive into the doxygen output.

Feng structure

One of the main improvements over fenice is the modular design. Roughly, feng is split into layers and a few functional modules. Each component will eventually provide external hooks (e.g. RTCP feedback, scheduling behaviours) so that you can enhance and experiment without having to dive into the full source.

Network layer: Socket management is implemented as a separate library called NetEmbryo. Feng currently supports UDP, TCP and SCTP.

RTSP:

RTP:

RTCP:

Buffer management: The bufferpool system bridges the container parsing and packet producing modules to the delivery scheduler and then to the network layers. Through the use of shared memory it is possible to use external helper applications like felix to provide advanced features such as distributed live streaming.

Mediathread: It encompasses all the parsing and packetization activities besides RTP encapsulation. It can be split into two families of modules:

Parsers: provide facilities to packetize proper RTP payloads out of specific codec bitstreams.

Demuxers: read from traditional containers (like mkv, mov or nut) or from more specific metademuxers such as stream descriptions or edit lists.
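
As a rough illustration of what the packetization output eventually looks like on the wire, the sketch below prepends the 12-byte fixed RTP header defined in RFC 3550 to a payload. It is not taken from feng: the function name build_rtp_packet and the sample values are hypothetical, and payload type 32 is the static type RFC 3551 assigns to MPV.

```python
import struct

def build_rtp_packet(payload, payload_type, seq, timestamp, ssrc, marker=False):
    """Prepend a 12-byte RTP fixed header (RFC 3550, section 5.1) to a payload."""
    first_byte = 2 << 6  # version 2, no padding, no extension, no CSRC entries
    second_byte = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack(
        "!BBHII",                 # network byte order: 1+1+2+4+4 bytes
        first_byte,
        second_byte,
        seq & 0xFFFF,             # 16-bit sequence number
        timestamp & 0xFFFFFFFF,   # 32-bit media timestamp
        ssrc & 0xFFFFFFFF,        # 32-bit synchronization source id
    )
    return header + payload

# Hypothetical sample values, for illustration only.
packet = build_rtp_packet(b"\x00\x01", payload_type=32, seq=1,
                          timestamp=90000, ssrc=0x1234)
```

Real payload formats add codec specific headers after this fixed header; that is the part a parser module is responsible for.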

Authentication and Access

Other streaming servers support access control and authentication in a fairly complete and reasonably nice way. We could either try to come up with something similar (they tried to make theirs similar to the apache way) or look at alternatives.

Methods

There are two methods defined, plain and digest. We should implement just plain and run it over TLS: digest is a bit complex, and TLS should be better from a security point of view.
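
A minimal sketch of the plain method, assuming that "plain" refers to the HTTP Basic scheme (user and password joined by a colon and base64 encoded in the Authorization header). The function name parse_basic_credentials is hypothetical, and the credentials are only protected if the connection itself runs over TLS.

```python
import base64

def parse_basic_credentials(header_value):
    """Return (user, password) from an 'Authorization' header value, or None."""
    scheme, _, encoded = header_value.strip().partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return None
    user, sep, password = decoded.partition(":")
    if not sep:
        return None  # no colon, so not a user:password pair
    return user, password

# Hypothetical credentials, for illustration only.
token = base64.b64encode(b"alice:secret").decode("ascii")
credentials = parse_basic_credentials("Basic " + token)
```

Here credentials is ("alice", "secret"), while any other scheme or a malformed token yields None.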

Access lists

There are various ways to describe who has access to what. Usually you create a list of paths and the users that have access to them. Sometimes you define groups of users and then use them in those lists. Sometimes you put a list of users next to the content, or even INSIDE the content, that you want to restrict or grant access to. Since reinventing the wheel is pointless when there are already more or less round ones around, it is better to consider what others did for their needs:

  • darwin and apache way [todo:describe it]
  • lighttpd way [todo:describe it]

Since we are aiming at the small and slick niche, I think we should lean more and more toward the lighttpd way.

Initial underpinnings for auth are already available on gitweb.
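
To make the first two styles mentioned above concrete (a list of paths with their users, plus named groups), here is a small sketch. It does not reproduce the darwin, apache or lighttpd syntax; the GROUPS and ACL tables, the "@group" notation and the rule that the longest matching path prefix wins are all assumptions made for illustration.

```python
# Hypothetical data: none of these names come from feng itself.
GROUPS = {"staff": {"alice", "bob"}}
ACL = {
    "/private": {"@staff"},        # "@name" refers to a group
    "/private/board": {"carol"},   # a more specific rule overrides the parent
}

def allowed(path, user):
    """Apply the longest ACL prefix matching path; paths with no rule are public."""
    best = None
    for prefix in ACL:
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        return True
    for entry in ACL[best]:
        if entry.startswith("@"):
            if user in GROUPS.get(entry[1:], set()):
                return True
        elif entry == user:
            return True
    return False
```

With these tables, alice can read /private/a.mov but not /private/board/x.mov, which only carol may access.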

Live support

Planning

This page is just for planning.

What is live support

The idea is to be able to stream a live event from a stream encoded in real time. It sounds quite simple but has a series of caveats. You have different streams (video, audio, subs, metadata) that are, more often than not, produced by independent sources and that must be pushed to the network as soon as possible. Because the sources are independent, it is our task to assign timestamps that keep things synchronous (e.g. voice and lips, or voice and text). Seeking is not supported since it is a live event.
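
One possible way to give independent sources consistent timestamps is to stamp every incoming unit against a single shared clock and convert that time to each stream's RTP clock rate. The sketch below assumes this approach; rtp_timestamp and EPOCH are hypothetical names. 90000 Hz is the clock rate RFC 3551 assigns to MPV, and 8000 Hz is used only as an example audio clock.

```python
def rtp_timestamp(capture_time, epoch, clock_rate, offset=0):
    """Map a capture time in seconds to a 32-bit RTP timestamp.

    epoch is the shared instant all streams are measured from; offset is the
    per-stream initial timestamp, which RFC 3550 recommends be random.
    """
    elapsed = capture_time - epoch
    return (offset + round(elapsed * clock_rate)) % (1 << 32)

EPOCH = 1000.0  # hypothetical instant at which the live session started

# Two units captured at the same instant by independent sources.
video_ts = rtp_timestamp(1000.5, EPOCH, 90000)
audio_ts = rtp_timestamp(1000.5, EPOCH, 8000)
```

Both values describe the same instant, half a second into the session, in their own clock units (45000 and 4000 ticks), which is what lets a client line the streams up again.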

Resource details

Supporting live means that we have a resource that is particular:

  • Every client gets the same data; the data must not be consumed until the last client gets it (bufferpool already does that)
  • The sdp remains the same across the different calls, but the data used to produce it could be long gone by the time we receive later requests.
  • We need a way to present the multiple streams as a single resource (sd does that already)
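
The first requirement above (data is kept until the last client has read it) can be illustrated with a toy queue that tracks one read position per client. This is only a sketch of the idea, not feng's bufferpool: the SharedBuffer class and its method names are hypothetical, and it ignores locking and shared memory entirely.

```python
class SharedBuffer:
    """Toy single-producer, multi-consumer queue; not feng's actual bufferpool."""

    def __init__(self):
        self.base = 0       # absolute index of packets[0]
        self.packets = []   # packets still needed by at least one consumer
        self.cursors = {}   # consumer id -> absolute index of next packet to read

    def attach(self, consumer):
        # A new client starts at the live edge: it only sees future packets.
        self.cursors[consumer] = self.base + len(self.packets)

    def put(self, packet):
        self.packets.append(packet)

    def get(self, consumer):
        index = self.cursors[consumer]
        if index >= self.base + len(self.packets):
            return None  # nothing new yet
        packet = self.packets[index - self.base]
        self.cursors[consumer] = index + 1
        self._release()
        return packet

    def _release(self):
        # Drop packets that every attached consumer has already read.
        slowest = min(self.cursors.values())
        del self.packets[: slowest - self.base]
        self.base = slowest
```

A packet stays in the list for as long as the slowest attached client has not read it, and is dropped as soon as that client moves past it.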

Problems

  • How are we going to feed the stream to the server?
    • which format?
      • the streams must be elementary streams by definition
      • they are the output from an encoder (ffmpeg? vlc? gst?)
      • the best would be having them already packed in rtp
    • how to communicate?
      • named pipes?
      • sockets?
      • shared memory?
  • Do we need a specific demuxer?
    • Yes, possibly one that gets the sdp data and the rtp streams directly and just pushes them to the bufferpool with almost no work
    • No, we already have demuxer_sd, which worked quite well in fenice even if it requires lots of manual labour for h264.

Metademuxer

Metademuxers in short

A metademuxer acts exactly like a standard demuxer and exposes the very same interface, but instead of reading from a file container it performs some manipulation of already mapped resources, based on per-resource configuration files.

Currently implemented

  • demuxer_ds : it simply collates a list of time-referenced parts of resources. The resources MUST contain exactly the same kind of media (e.g. same codecs, resolutions and parameters) and be seekable; the resulting resource is seekable too.

    Example configuration file

# Version 1 - free form text here
resource.mov 0.0 1.0
resource.ds 1.0 3.0
resource.nut 3.0 4.0
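
A minimal sketch of reading the format above, assuming each non-comment line is "resource start stop" with times in seconds and that lines starting with "#" (such as the version line) can be skipped. The names Segment and parse_edit_list are hypothetical, and resource names containing spaces are not handled.

```python
from typing import NamedTuple

class Segment(NamedTuple):
    resource: str
    start: float
    stop: float

def parse_edit_list(text):
    """Parse 'resource start stop' lines, skipping blank lines and '#' comments."""
    segments = []
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {number}: expected 'resource start stop'")
        start, stop = float(fields[1]), float(fields[2])
        if stop <= start:
            raise ValueError(f"line {number}: stop must be greater than start")
        segments.append(Segment(fields[0], start, stop))
    return segments

EXAMPLE = """\
# Version 1 - free form text here
resource.mov 0.0 1.0
resource.ds 1.0 3.0
resource.nut 3.0 4.0
"""
segments = parse_edit_list(EXAMPLE)
total = sum(s.stop - s.start for s in segments)
```

For the example file this yields three segments covering 4.0 seconds in total.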

TODO: move the configuration to the unified yaml based configuration, use a uniform uri handler for them

  • demuxer_sd : it is basically the old fenice configuration file, it maps different live streams as a single resource.

Example configuration file

stream
    file_name mq:///video
    encoding_name MPV
stream_end
stream
    file_name mq:///audio
    encoding_name MPA
stream_end
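
The block structure above can be read with a few lines of code. This sketch assumes that every setting is a "key value" pair on its own line between stream and stream_end, which is all the example shows; the function name parse_stream_description is hypothetical and the real .sd syntax may accept more than this.

```python
def parse_stream_description(text):
    """Turn 'stream ... stream_end' blocks into a list of dicts."""
    streams, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line == "stream":
            if current is not None:
                raise ValueError("nested 'stream' block")
            current = {}
        elif line == "stream_end":
            if current is None:
                raise ValueError("'stream_end' without 'stream'")
            streams.append(current)
            current = None
        else:
            if current is None:
                raise ValueError(f"setting outside a stream block: {line!r}")
            parts = line.split(None, 1)  # key, then the rest of the line
            current[parts[0]] = parts[1] if len(parts) > 1 else ""
    if current is not None:
        raise ValueError("missing 'stream_end'")
    return streams

EXAMPLE = """\
stream
    file_name mq:///video
    encoding_name MPV
stream_end
stream
    file_name mq:///audio
    encoding_name MPA
stream_end
"""
streams = parse_stream_description(EXAMPLE)
```

Each dict then describes one live stream of the aggregated resource, for example streams[0]["encoding_name"] is "MPV".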

Unified configuration syntax

Background

Fenice started with a simple, plain configuration file for the global settings; then another syntax was used for the stream description, and in feng we had to add yet another for the compositions.

The overall results are the following:

  • we have three ugly hand-made parsers for those syntaxes; they have been mostly fixed but remain pretty inflexible.
  • we'll need more of them once we need to express/expose other features to the end user.
  • users have to learn (and we have to document) many different syntaxes.

Abstract ideas

Main configuration

The main configuration will reside in a feng.conf file as before, only the path will be $sysconfdir/feng/feng.conf. Optionally, additional sub-configuration files (like module specific configurations) can stay in separate files and be included from the main feng.conf. The main configuration must provide support for host and base virtuals, which means that the following structure should be presented:

  • main
    • global settings
      • privileges
        • user
        • group
      • max session
      • log tag
      • log file
    • addresses
      • address1
        • ports
          • tcp ports
          • sctp ports
          • name1
            • root paths
            • max session
            • log tag
            • log file
            • acl conf
    • module specific setting
      • module1
        • option1
    • include files/dir
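
Once parsed, whatever the on-disk syntax ends up being, the outline above maps naturally onto nested dictionaries. The sketch below is hypothetical throughout: the key names, addresses and values are invented, it assumes a named virtual host sits alongside the port lists under its address, and it assumes that per-host settings such as max session fall back to the global value when absent.

```python
# Hypothetical values; only the general layout follows the outline above.
CONFIG = {
    "global": {"max_session": 100, "log_file": "/var/log/feng.log"},
    "addresses": {
        "192.0.2.1": {
            "ports": {"tcp": [554], "sctp": []},
            "hosts": {
                "media.example.org": {
                    "root_paths": ["/srv/avroot"],
                    "max_session": 20,
                },
            },
        },
    },
}

def host_setting(config, address, host, key):
    """Return a per-host setting, falling back to the global value."""
    host_conf = config["addresses"].get(address, {}).get("hosts", {}).get(host, {})
    if key in host_conf:
        return host_conf[key]
    return config["global"].get(key)
```

Here host_setting(CONFIG, "192.0.2.1", "media.example.org", "max_session") gives the host's own limit of 20, while "log_file" falls back to the global setting.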

It should allow (maybe at a later stage) including additional configuration files (like the include directive present in many languages) and merging the information.
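
How included files are merged is not decided here. One simple policy, shown purely as an assumption, is a recursive merge in which the included file wins on conflicting keys; merge_config and the sample settings are hypothetical.

```python
def merge_config(base, extra):
    """Recursively merge extra into a copy of base; extra wins on conflicts."""
    merged = dict(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged

# Hypothetical contents of feng.conf and of an included file.
main = {"global": {"max_session": 100, "log_tag": "feng"}}
included = {"global": {"max_session": 50},
            "modules": {"module1": {"option1": True}}}
merged = merge_config(main, included)
```

The result keeps log_tag from the main file, takes max_session from the included one and adds the module section, without modifying either input.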

Resource configuration

The resource configuration should provide demuxer specific settings and additional information useful for the stream. There is a group of settings that is common to every demuxer. The resource configurations will replace the current .sd and .ds files. Each will have a magic string on the first line, so that the demuxer layer knows it is a resource configuration without relying on the extension to guess which kind of file it is. They will reside in the avroot tree, and each file name and path will be used as the resource identifier.
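
Detecting a resource configuration by its first line could be as simple as the sketch below. The magic string itself is not defined in this document, so MAGIC is a placeholder, and is_resource_config is a hypothetical name.

```python
import io

MAGIC = "#feng-resource"  # hypothetical; the real magic string is not defined yet

def is_resource_config(stream):
    """Return True if the first line of a text stream starts with MAGIC."""
    first_line = stream.readline()
    return first_line.startswith(MAGIC)

# In-memory stand-ins for files found in the avroot tree.
tagged = io.StringIO("#feng-resource v1\nlist:\n")
legacy = io.StringIO("stream\n    file_name mq:///video\nstream_end\n")
```

Only the first line is read, so the check stays cheap even when it is run against large media files found in the same tree.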

  • resource
    • aggregate
      • resource pointer stream_id
      • resource
        • …inline resource description
      • map
    • list
      • item
        • resource
        • time
          • start
          • stop
    • uri
    • metadata
      • name
      • author
      • cc tag
    • overrides
      • sdp specific settings

Implementation

The main idea is to use a single syntax across the configurations. Resources, global settings and per-path settings will share the same parser, which should preferably be a robust library. We have limited resources, and I'd rather spend our time implementing streaming features than writing minimalistic parsers when full-featured, fast and complete ones are already available.

XML based syntax

Pros:

  • There are many stable and well known libraries
  • Many people know it already
  • There are tools around to validate the syntax, provided we create a schema/dtd for it.

Cons:

  • It could be quite complex to hand write.
  • Many people hate it

YAML based syntax

Pros:

  • It is widely available across many different languages
  • It is quite simple to write by hand, yet as expressive as XML
  • Provides all the facilities XML offers

Cons:

  • There are just two C libraries available: libyaml, basically born as the core library for the python parser, and syck, again more focused on being used by other languages than on being used directly.
  • The mentioned libs are quite robust for our usage, but their releases aren't frequent.