Remember this diagram from the first post in this series:
You might be wondering, what is running on the server?
A few things. First, there is a camera capture pipeline for each camera. It captures the video stream from the camera, processes it, and puts it into the database. Motion events are collected through a web server module and also funneled into the database. Finally, there is a daemon that allows the video and motion events to be queried so the iOS and macOS apps can display live and recorded video.
Let’s take a look at one of the camera capture pipelines:
The camera capture pipelines are built with gstreamer, an open-source multimedia framework. The pipelines take the video from the camera and put it into the database. To make this work, I wrote a custom gstreamer plugin called “dbsink” in a mixture of C and C++ to process the video received from the camera over rtsp and insert it into the database. Before discussing what is in the plugin, let me present a little background on h264 video compression and playback, and then describe how the video is stored in the database.
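Conceptually, each capture pipeline is equivalent to a gst-launch command line along these lines (a sketch on my part: the exact element chain is an assumption, and the rtsp URL is a placeholder):

```shell
# Illustrative sketch: pull h264 over rtsp, depayload and parse it,
# then hand complete packets to the custom dbsink element.
# "rtsp://CAMERA_HOST/stream" is a placeholder, not a real URL.
gst-launch-1.0 rtspsrc location="rtsp://CAMERA_HOST/stream" \
  ! rtph264depay \
  ! h264parse \
  ! dbsink
```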
- h264 video consists of parameter packets, key frame packets, and delta frame packets.
- To start to play the video, the decoder must be initialized with parameter packets and a key frame packet.
- The number of delta frames between each key frame can vary, but typically represents a few seconds of video.
Parameter packets are only sent by the camera when it begins to stream, and a streaming session may last hours or days (or ideally, forever). Keyframes are sent at intervals. However, the video player needs to be able to start playback at any time: it can only play video after a keyframe is received, and it needs the parameter packets before the decoder can be initialized at all.
Solution: The video is stored in segments in the database. Each segment contains parameter packets, a keyframe packet, and all the delta frames up until the next keyframe. The player can start playback at the beginning of any segment.
Here’s what the video segments stored in the database look like, enabling random-access playback:
Including the parameter packets and keyframe at the beginning of each video segment results in independently playable video segments that each contain all the information needed to start playback. The packets are each timestamped and prepended with a size field before being stuffed into the segment stream. The segment streams are stored in the postgresql database as “large objects.”
So, here’s what is inside the dbsink gstreamer plugin:
Basically, h264 binary data is transported with “quote characters” (called emulation prevention bytes) that keep the start-code pattern from appearing inside a packet, so the real start codes can be detected cleanly. The dbsink plugin removes these escape bytes and the start codes. Then, it identifies the h264 parameter packets and keyframe and caches them. Upon seeing a packet that is neither a parameter packet nor a keyframe, it starts a new segment and dumps out the stored parameter packets and the keyframe data. Then it starts copying the other packets into the segment.
The next block in dbsink handles packet Presentation TimeStamp conversion. The packet wrapper comes next, followed by a database client that creates the postgresql large objects to store the video segments and inserts the rows tracking the video segment metadata into the mpeg_segments table.
In my next post, I’ll go into more detail about how the Presentation TimeStamps are handled.