IP Camera Viewer – iOS and macOS apps

Hi!

This blog post is the last in a series about an IP security camera system I built for my home. It covers the iOS and macOS apps used to view the video and motion events. The apps are written in Objective-C and Objective-C++. Here’s what is in the apps (so far!):

The Internet communication to the server uses the gRPC system. gRPC takes files describing the data and methods exposed by the server and generates code for both the server and the client. When a method is called on the client’s stub, the corresponding method on the server’s implementation object is invoked with the data passed to the stub. This simplifies the work of writing an Internet service. The gRPC system also handles SSL encryption and various forms of authentication to protect the service on the server.
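
Since the apps are Objective-C++, they can call the generated C++ gRPC stubs directly. Here’s a rough sketch of what a server-streaming client call looks like; the service, method, and message names are made up for illustration, not the actual names from my proto files:

    // Hypothetical sketch of a gRPC streaming client call, assuming a
    // service like:
    //   service VideoService {
    //     rpc StreamSegments(SegmentRequest) returns (stream VideoSegment);
    //   }
    // (The real app's service and message names may differ.)
    #include <grpcpp/grpcpp.h>
    #include "video_service.grpc.pb.h"  // generated from the hypothetical .proto

    void StreamVideo(const std::string& server_address) {
      // An SSL channel protects the connection, as described above.
      auto channel = grpc::CreateChannel(
          server_address, grpc::SslCredentials(grpc::SslCredentialsOptions()));
      auto stub = VideoService::NewStub(channel);

      SegmentRequest request;
      request.set_camera_id(1);  // hypothetical request field

      grpc::ClientContext context;
      // Server-streaming call: the stub returns a reader for multiple replies.
      std::unique_ptr<grpc::ClientReader<VideoSegment>> reader(
          stub->StreamSegments(&context, request));

      VideoSegment segment;
      while (reader->Read(&segment)) {
        // Hand each video segment to the player for decoding and display.
      }
      grpc::Status status = reader->Finish();
    }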

The main function of the app is a video player that plays the recorded video from the database. To play the video, the timestamps first have to be modified (see Managing h264 Presentation TimeStamps (PTS)), then the AVFoundation AVSampleBufferDisplayLayer object is used to play the h264 video. The same class is available on both macOS and iOS, and it handles hardware-accelerated h264 video decoding.
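
For the curious, here’s a minimal Objective-C++ sketch of the core AVSampleBufferDisplayLayer calls; it illustrates the approach rather than the app’s actual player code, and it assumes the frames have already been converted to length-prefixed form:

    // Objective-C++ sketch (illustrative, not the app's real code).
    #import <AVFoundation/AVFoundation.h>
    #import <CoreMedia/CoreMedia.h>

    // Build a format description from the segment's SPS/PPS parameter sets.
    static CMVideoFormatDescriptionRef MakeFormat(const uint8_t* sps, size_t spsLen,
                                                  const uint8_t* pps, size_t ppsLen) {
      const uint8_t* paramSets[2] = { sps, pps };
      const size_t paramSizes[2] = { spsLen, ppsLen };
      CMVideoFormatDescriptionRef format = NULL;
      CMVideoFormatDescriptionCreateFromH264ParameterSets(
          kCFAllocatorDefault, 2, paramSets, paramSizes,
          4 /* NAL unit length field size */, &format);
      return format;
    }

    // Wrap one length-prefixed h264 frame in a CMSampleBuffer and enqueue it.
    static void EnqueueFrame(AVSampleBufferDisplayLayer* layer,
                             CMVideoFormatDescriptionRef format,
                             void* frameData, size_t frameLen, CMTime pts) {
      CMBlockBufferRef block = NULL;
      CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault, frameData, frameLen,
                                         kCFAllocatorNull, NULL, 0, frameLen, 0,
                                         &block);
      CMSampleTimingInfo timing = { kCMTimeInvalid, pts, kCMTimeInvalid };
      const size_t sampleSize = frameLen;
      CMSampleBufferRef sample = NULL;
      CMSampleBufferCreate(kCFAllocatorDefault, block, true, NULL, NULL, format,
                           1, 1, &timing, 1, &sampleSize, &sample);
      [layer enqueueSampleBuffer:sample];  // hardware-accelerated decode/display
      CFRelease(sample);
      CFRelease(block);
    }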

Here’s what the iOS app looks like on an iPhone 7 simulator. This is the main screen the app launches to, which shows a live stream of the selected camera:

The timestamp at the bottom of the frame is based on the timestamp of the video segment from the database, plus the presentation timestamp of the currently displayed video frame. Zooming and panning the video is supported with touch gestures:
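
In code, that on-screen time is just an addition (the names here are hypothetical):

    // Sketch: on-screen time = the segment's database timestamp plus the
    // PTS of the frame currently being displayed.
    double displayed_time(double segment_start_epoch, double frame_pts_seconds) {
      return segment_start_epoch + frame_pts_seconds;
    }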

Here’s the date picker. It is displayed when the date on the screen is touched, and it lets you choose whether to play back video from a specific date or to switch to live video:

The camera picker is displayed when the camera name is touched, and it makes it easy to change cameras:


Finally, here’s the event picker, which shows all the times and locations where motion was detected. Touching an entry starts playing back the video recorded at that time:

That’s it for now.


What is running on the camera server? (part 4- video service daemon)

Hi! This is the last thing of note running on the camera server- the video service daemon. This daemon handles requests for video from the iOS and macOS apps that are received over the Internet via the gRPC library. Here’s a diagram:

Incoming RPC calls are received from the iOS and macOS apps over an SSL channel. The sender is authenticated, then queries are executed in the database. The results of the queries are sent back, usually as stream replies (multiple responses) to the RPC calls. These reply streams are received by the iOS and macOS apps and the video or motion events are displayed. The video service daemon is multi-threaded to support the multiple services needed for the iOS and macOS apps and is written entirely in C++.
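
Here’s a hedged sketch of the daemon’s server side, reusing the hypothetical VideoService from the client example earlier on this page; the real daemon’s services and multi-threading setup are more involved:

    // Minimal sketch of a server-streaming gRPC service in C++;
    // service/message names are hypothetical.
    #include <grpcpp/grpcpp.h>
    #include <vector>
    #include "video_service.grpc.pb.h"  // generated from the hypothetical .proto

    // Hypothetical helper standing in for the real database query.
    static std::vector<VideoSegment> QuerySegments(const SegmentRequest&) {
      return {};  // the real daemon returns rows from the postgresql database
    }

    class VideoServiceImpl final : public VideoService::Service {
      grpc::Status StreamSegments(grpc::ServerContext*,
                                  const SegmentRequest* request,
                                  grpc::ServerWriter<VideoSegment>* writer) override {
        // Each matching database row goes back as a separate stream reply.
        for (const VideoSegment& segment : QuerySegments(*request)) {
          if (!writer->Write(segment)) break;  // stop if the client disconnects
        }
        return grpc::Status::OK;
      }
    };

    int main() {
      VideoServiceImpl service;
      grpc::ServerBuilder builder;
      // The real daemon would use grpc::SslServerCredentials here.
      builder.AddListeningPort("0.0.0.0:50051", grpc::InsecureServerCredentials());
      builder.RegisterService(&service);
      std::unique_ptr<grpc::Server> server = builder.BuildAndStart();
      server->Wait();
      return 0;
    }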

What is running on the camera server? (part 3- a big database)

You’ve dropped into a multi-post writeup about an IP camera system I put together for home use. This post describes the 10 terabyte database that stores 1 week of video, recorded 24/7 from 8 cameras, along with motion events.

As discussed in other posts, the recorded video is split into segments at keyframes. Each video segment is stored in a postgresql database “large object.” Creating a pgsql large object returns an identifier, which is stored in another table so that the large object can be retrieved later. This table is called “mpeg_segments,” and its important fields are a timestamp marking when the segment began, the large object identifier, and a camera_id so that video from multiple cameras can be kept separate.
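
As a sketch of what storing one segment might look like with libpq’s large object API — the mpeg_segments table name comes from above, but the column names and helper are my guesses:

    // Hedged C++ sketch of storing one video segment via libpq large objects.
    #include <libpq-fe.h>
    #include <libpq/libpq-fs.h>  // INV_READ / INV_WRITE
    #include <string>

    void store_segment(PGconn* conn, int camera_id, const std::string& data) {
      // Large object operations must run inside a transaction.
      PQclear(PQexec(conn, "BEGIN"));

      Oid oid = lo_creat(conn, INV_READ | INV_WRITE);  // returns the identifier
      int fd = lo_open(conn, oid, INV_WRITE);
      lo_write(conn, fd, data.data(), data.size());
      lo_close(conn, fd);

      // Record the identifier, start timestamp, and camera so the segment can
      // be found later. (Real code would use a parameterized query and the
      // segment's actual start time rather than now().)
      std::string sql =
          "INSERT INTO mpeg_segments (segment_time, segment_oid, camera_id) "
          "VALUES (now(), " + std::to_string(oid) + ", " +
          std::to_string(camera_id) + ")";
      PQclear(PQexec(conn, sql.c_str()));

      PQclear(PQexec(conn, "COMMIT"));
    }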

One of the problems with continuous video recording systems is that the video data is a firehose that never stops. To keep the disk containing the database from filling up, I installed the pg_partman package. Partman partitions a pgsql table by date and can be configured to drop old tables. Each night, after partman runs on my database and the oldest table is dropped, a cron job kicks off vacuumlo, which deletes the orphaned large objects containing the video segments that were referenced by the dropped table.
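
The nightly cleanup can be wired up with an ordinary crontab entry. This is just an illustration; the schedule, path, user name, and database name here are made up:

    # Run vacuumlo each night, after partman has dropped the oldest
    # partition, to delete the now-orphaned large objects.
    30 3 * * * /usr/bin/vacuumlo -U camerauser camera_db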

Here’s a simplified diagram of the most important tables in the database schema and how they are used:

The events table is used to collect event notifications sent by the cameras when they detect motion.

What is running on the camera server? (part 2- motion event module)

Hi!

If you’ve been following this series, you’ll know we talked about the dbsink gstreamer plugin for archiving video into a database. Then we discussed the exciting world of h264 PTS timestamps, careful tweaking of which can make random-access video playback possible. In this post, I will discuss how motion events are captured from the IP cameras and stored in the database.

This turned out to be super-simple. Remember this diagram of the IP camera?

The Axis cameras I’m using can be configured to access a GET URL on a server when motion is detected. I wrote a C/C++ module for the nginx web server that writes a row into a database table when this happens.

The URL configured in the camera has an argument that indicates which camera detected motion. Postgresql NOTIFY is used to notify other listening database clients that a new motion event has been added to the database.
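
Stripped of the nginx module boilerplate, the heart of the handler might look something like this sketch; the events table is the one described in the database post, but the column names are guesses:

    // Hedged sketch of the handler core; real code parses the camera
    // argument out of the GET URL and uses parameterized queries.
    #include <libpq-fe.h>
    #include <string>

    void record_motion_event(PGconn* conn, int camera_id) {
      // Insert the event row; the timestamp defaults to now() in this sketch.
      std::string insert =
          "INSERT INTO events (event_time, camera_id) VALUES (now(), " +
          std::to_string(camera_id) + ")";
      PQclear(PQexec(conn, insert.c_str()));

      // Wake up any listening database clients (e.g. the video service daemon).
      std::string notify =
          "SELECT pg_notify('motion_events', '" + std::to_string(camera_id) + "')";
      PQclear(PQexec(conn, notify.c_str()));
    }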

Managing h264 Presentation TimeStamps (PTS)

Hi!

Welcome to the series of posts I’m writing about my home security camera system. This post will discuss presentation timestamps, as they apply to an IP camera system.

In my IP camera system, the gstreamer rtsp plugin is used to receive data from the IP cameras. The resulting timestamps start at 0 when the camera starts streaming to the server and increase as long as the stream stays connected. If the stream is ended and started again, the timestamps reset back to zero. Since, as discussed in my other post, the video player needs random access to the video, the video is split into segments when it is stored in the database. Each segment from each camera must be independently playable. I made this possible by tracking the pts offset and modifying the timestamps so that the timestamps for each video segment start at 0. This makes it possible to start playing at any video segment. To play video continuously, the video player modifies the timestamps again, adding the maximum timestamp of the previous segment as an offset to each timestamp in the current segment.
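
Here’s a small C++ sketch of the two adjustments described above; the variable names are mine, not from the actual code:

    #include <cstdint>

    // Capture side: rebase each segment so its timestamps start at 0.
    // segment_base is latched to the PTS of the first packet in the segment.
    int64_t rebase_pts(int64_t camera_pts, int64_t segment_base) {
      return camera_pts - segment_base;
    }

    // Player side: shift each segment up by the running total of the
    // previous segments' durations so playback is continuous.
    int64_t player_pts(int64_t segment_pts, int64_t playback_offset) {
      return segment_pts + playback_offset;
    }
    // After a segment finishes, the player advances the offset:
    //   playback_offset += max_pts_of_previous_segment;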

The captured timestamps are unusable for other reasons, too: besides not starting from zero, they can reset at any time if the camera restarts streaming, and the streams from multiple cameras are not in sync with one another. Instead, each video segment contains its own pts timestamp sequence, and the segments themselves are timestamped by the server so that video segments captured at any time can be queried accurately.

Here’s a mocked-up diagram of what the pts timestamps look like from the camera, in the database, and in the player:

In my next post, I will resume discussion of the software running in the server.

What is running on the camera server? (part 1- gstreamer pipelines with dbsink plugin)

Remember this diagram from the first post in this series:

You might be wondering, what is running on the server?

A few things. First, there is a camera capture pipeline for each camera. It captures the video stream from the camera, processes it, and puts it into the database. Motion events are collected through a web server module and also funneled into the database. Finally, there is a daemon that allows the video and motion events to be queried so the iOS and macOS apps can display live and recorded video.

Let’s take a look at one of the camera capture pipelines:

The camera capture pipelines are built with gstreamer, an open source multimedia framework. The pipelines take the video from the camera and put it into the database. To make this work, I wrote a custom gstreamer plugin called “dbsink” in a mixture of C and C++ to process the video received from the camera through rtsp and insert it into the database. Before discussing what is in the plugin, let me present a little background on h264 video compression and playback, and then describe how the video is stored in the database.
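
Before that background, to make the pipeline concrete, here’s a hedged sketch of how such a pipeline could be assembled with gst_parse_launch(); the element chain is my best guess at what feeds dbsink, and the dbsink property shown is made up:

    // Illustrative only: the real capture pipelines may differ.
    #include <gst/gst.h>

    int main(int argc, char** argv) {
      gst_init(&argc, &argv);
      GError* error = NULL;
      // rtspsrc pulls h264 from the camera; depay/parse produce a clean
      // elementary stream; dbsink segments it and writes to the database.
      GstElement* pipeline = gst_parse_launch(
          "rtspsrc location=rtsp://camera1/stream ! rtph264depay ! "
          "h264parse ! dbsink camera-id=1",
          &error);
      if (!pipeline) {
        g_printerr("Failed to build pipeline: %s\n", error->message);
        return 1;
      }
      gst_element_set_state(pipeline, GST_STATE_PLAYING);
      g_main_loop_run(g_main_loop_new(NULL, FALSE));  // run until interrupted
      return 0;
    }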

  • h264 video consists of parameter packets, key frame packets, and delta frame packets.
  • To start to play the video, the decoder must be initialized with parameter packets and a key frame packet.
  • The number of delta frames between each key frame can vary, but typically represents a few seconds of video.

Parameter packets are only sent by the camera when it begins to stream. A streaming session may last hours or days (or ideally, forever). Keyframes are sent at intervals. However, the video player needs to be able to start playback at any time; it can only begin playing at a keyframe, and it needs the parameter packets before the decoder can be initialized at all.

Solution: The video is stored in segments in the database. Each segment contains parameter packets, a keyframe packet, and all the delta frames up until the next keyframe. The player can start playback at the beginning of any segment.

Here’s what the video segments stored in the database look like, to enable random-access playback:

Including the parameter packets and keyframe at the beginning of each video segment results in independently playable video segments that each contain all the information needed to start playback. The packets are each timestamped and prepended with a size field before being stuffed into the segment stream. The segment streams are stored in the postgresql database as “large objects.”
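
As an illustration, the segment stream writer might look like the following sketch; the exact field widths and byte order in the real system may differ:

    #include <cstdint>
    #include <cstring>
    #include <vector>

    // Append one timestamped, size-prefixed packet to the segment stream.
    // (A real format would pin down byte order; this is a sketch.)
    void append_packet(std::vector<uint8_t>& segment, int64_t pts,
                       const uint8_t* data, uint32_t size) {
      uint8_t header[12];
      std::memcpy(header, &size, 4);     // size field prepended to the packet
      std::memcpy(header + 4, &pts, 8);  // per-packet presentation timestamp
      segment.insert(segment.end(), header, header + sizeof header);
      segment.insert(segment.end(), data, data + size);
    }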

So, here’s what is inside the dbsink gstreamer plugin:

Basically, h264 binary data is often transported with escape bytes (called emulation prevention bytes) inserted so that the start codes can be detected cleanly. The dbsink plugin removes these bytes along with the start codes. Then it identifies the h264 parameter packets and the keyframe and caches them. Upon seeing the first packet that is neither a parameter packet nor a keyframe, it starts a new segment, writes out the cached parameter packets and keyframe data, and then copies the subsequent packets into the segment.
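
Here’s a sketch of the unescaping step; this shows the standard emulation-prevention-byte removal rather than dbsink’s exact code, and it assumes the start-code scanning has already split the stream into packets:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // In the escaped stream, 0x00 0x00 0x03 marks an inserted escape byte;
    // dropping the 0x03 restores the raw packet data.
    std::vector<uint8_t> unescape_nal(const uint8_t* data, size_t len) {
      std::vector<uint8_t> out;
      out.reserve(len);
      size_t zeros = 0;
      for (size_t i = 0; i < len; ++i) {
        if (zeros >= 2 && data[i] == 0x03 && i + 1 < len && data[i + 1] <= 0x03) {
          zeros = 0;  // skip the emulation prevention byte
          continue;
        }
        zeros = (data[i] == 0x00) ? zeros + 1 : 0;
        out.push_back(data[i]);
      }
      return out;
    }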

The next block in dbsink handles packet Presentation TimeStamp conversion. The packet wrapper comes next, followed by a database client that creates the postgresql large objects to store the video segments and inserts the rows tracking the video segment metadata into the mpeg_segments table.

In my next post, I’ll go into more detail about how the Presentation TimeStamps are handled.

What kind of IP cameras do I use?

I tried out IP cameras from three vendors. I ended up settling on cameras from Axis Communications. The hardware is great, and so is the software. Here is my experience with two other vendors:

Vendor X’s camera looked great on paper, but I didn’t like it! These kinds of cameras have an on-board web server that you visit to configure camera parameters, motion detection, h264 encoding, etc. With brand X, I was never able to get the motion detection feature to work reliably, and playing around too much in the on-board website could crash the camera, which then had to be power-cycled. The camera also got incredibly hot in continuous use in a cool environment; I put it in my garage, so I worried about how hot it would get on a hot day. The video quality wasn’t great during the day and was blurry at night when the IR illuminators were enabled. This camera’s h264 encoding uses the most bandwidth (and storage in the database) of all the cameras I have. Finally, after only a few months, I think the image quality has degraded some over time, even though it has been indoors (outdoors can be a more difficult environment because of the sun).

I tried Vendor Y’s cameras. I was really impressed with the hardware quality: the image quality is great, and the camera itself is sturdy and well-designed. It is also pretty small, which is great for home use, and the price could not be beat. The web configuration site on the camera is well-designed and worked well. The only software issue I had was that I could not get the camera to upload video to a server when motion was detected (but I didn’t end up needing this feature). I wanted to like this camera, but it could not compare with the camera software features and product breadth (the range of camera styles offered) of Axis Communications.

Once I tried out Axis cameras, I was kind of stuck on them. They have such good software, good hardware, and a wide range of (pricey) products. The built-in motion detection allows you to create many detection and exclusion windows in the scene. You can configure the minimum size of an object to be detected, and you can ignore swaying objects or things that go by very quickly, like bugs at night. The camera has dozens of options for things it can do when motion is detected, such as uploading videos to a file share or sending emails. The option I ended up using hits a web site: I run a web server on the camera network that records a motion event into a database when a camera accesses the URL. The camera supports configurable https for security, and you can generate a self-signed certificate on-the-fly or upload a real one.

Here is a diagram of what’s going on in the cameras as I’ve configured them:

Originally I was thinking “Wow, this camera does everything. I should have this set up in a few weeks.” But it turned out to take much longer before everything was working together… Stay tuned!

So, I decided to set up some security cameras at home.

I bought a new house last year and decided to install some cameras to monitor the property. I tried out a couple types of consumer cameras and didn’t like the hardware, image quality, features, or apps, so I decided to make a whole big homebrew project out of setting up a camera system. I ended up building a pretty full-featured camera system that records video 24/7 from multiple cameras, detects and records motion events, and has serviceable iOS and macOS apps. Over one week of video retention is available from eight cameras, at resolutions of 1080p and up, with 10 TB of storage. This post is the first in a series that will describe my IP camera system.

Let’s start with a high-level overall network diagram:

I wanted the network architecture to keep the cameras away from the Internet because of the recent DDoS attacks driven by hacked cameras. I used power-over-Ethernet (PoE) cameras, which are powered through their network cables. I chose this type of camera connectivity because cameras need wires for power anyway, and, long term, wired networks are more reliable than wireless networks. I also didn’t want the cameras on my home WiFi network, for security and performance reasons. Finally, since the camera power comes from a central location via PoE, it can all be backed up with a single UPS.

By the way, the green blocks represent a bunch of software pieces I had to build myself to get things to work the way I wanted. I learned a ton about h264 compression and gRPC along the way and had fun as well. More details to come!