2008/05/18

The Labs.Com Video Lab Raw Video Studio Specifications
Last update 2000/05/19
The Labs - Design & Functionality For The Net

RawVideoStudio: Specifications

There are already several formats and codecs available, yet we might use our own codecs for special purposes (e.g. low-bitrate recording). But we definitely want to support at least MPEG2 and QuickTime, alongside our own codecs. Here are the RVS format specifications:
  1. RVS Format Proposal
  2. Order of Chunks
  3. Video Header
  4. Frame Header, Frame Chunk-Header & Frame Chunk
  5. Audio Header & Chunk
  6. Considerations
  7. Video Devices
  8. RawVideo API
  9. Codec Matrix
  10. Controllers
Specifications
1. RVS Format Proposal
Since we primarily target interlaced (interleaved) video & audio, we split both streams into chunks. To keep the design open, we propose:
  • each chunk starts with an ASCII line terminated by "\n":
    1. first word is chunk-name
    2. followed by nothing or whatever ASCII-content terminated by "\n"
  • next line contains the length of the chunk written in ASCII terminated by "\n"
  • next data are binary-data

    Comments start with "#" and should be ignored (ending with "\n").

    Sample:

     video 15,320,240 
     0 
     title 2001 Odyssey 
     0 
     author Stanley Kubrick 
     0 
     frame jpeg 
     4048 
     <binary-data-chunk of 4048 bytes follows> 
     ... 

    Any chunk whose name is not recognized should be ignored and skipped.
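
    The header rules above can be sketched in C as follows. This is only a minimal sketch, not the project's reader: the names RVSChunk, rvs_read_chunk_header() and rvs_count_known_chunks() are hypothetical, and it assumes that comment lines appear only in front of a chunk header line, never between the name line and the length line.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory view of one chunk header (not part of the spec). */
typedef struct {
    char name[32];   /* first word of the header line, e.g. "frame" */
    char args[128];  /* rest of the header line, may be empty */
    long length;     /* number of binary bytes following the header */
} RVSChunk;

/* Read one "\n"-terminated line and strip the "\n"; 0 on EOF or overlong line. */
static int rvs_read_line(FILE *fp, char *buf, size_t size)
{
    if (fgets(buf, (int)size, fp) == NULL) return 0;
    size_t n = strlen(buf);
    if (n == 0 || buf[n - 1] != '\n') return 0;
    buf[n - 1] = '\0';
    return 1;
}

/* Parse the name line and the length line of the next chunk; 1 on success. */
int rvs_read_chunk_header(FILE *fp, RVSChunk *c)
{
    char line[256];
    do {                                   /* assumption: comments precede headers */
        if (!rvs_read_line(fp, line, sizeof line)) return 0;
    } while (line[0] == '#');

    char *sp = strchr(line, ' ');          /* split "name args" at the first space */
    if (sp) *sp++ = '\0';
    snprintf(c->name, sizeof c->name, "%s", line);
    snprintf(c->args, sizeof c->args, "%s", sp ? sp : "");

    if (!rvs_read_line(fp, line, sizeof line)) return 0;
    char *end;
    c->length = strtol(line, &end, 10);    /* length is written in ASCII */
    return end != line && c->length >= 0;
}

/* 1 if the name is one of the chunk-names proposed in this document. */
static int rvs_known(const char *name)
{
    static const char *const names[] = {
        "video", "title", "author", "copyright", "frame", "audio",
        "framestart", "framechunk", "audiochunk", "nextfile"
    };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (strcmp(name, names[i]) == 0) return 1;
    return 0;
}

/* Walk a stream, skipping the binary data of every chunk, and count the
   chunks whose name is recognized; unknown chunks are simply skipped. */
int rvs_count_known_chunks(FILE *fp)
{
    RVSChunk c;
    int n = 0;
    while (rvs_read_chunk_header(fp, &c)) {
        if (rvs_known(c.name)) n++;
        if (fseek(fp, c.length, SEEK_CUR) != 0) break;
    }
    return n;
}
```

    Because every chunk carries its own length, a reader can step over chunks it does not understand without knowing anything about their content.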

    Following chunk-names are proposed:

    video        first chunk defining fps, width, height
    title        title of stream
    author       author information
    copyright    copyright notice
    frame        defining codec of frames
    audio        defining codec of audio
    framestart   frame start
    framechunk   chunk of frame
    audiochunk   chunk of audio
    nextfile     file pointer to next file (2GB filesize problem)

    At first we just wanted to support JPEG (MJPEG) and PPM as picture formats and PCM for audio, but then decided to leave the design open and support 'plug-ins' for other compression formats. To begin with, we will support only MJPEG & PPM for pictures and PCM for audio.

    We may have to create a unique file header, e.g. "# RawVideo-version\n" or similar, so that applications can recognize the file.

    2. Order of Chunks

    Interlaced:
    video
    frame
    audio
    framestart
    framechunk
    audiochunk
    framechunk
    audiochunk
    framechunk
    audiochunk
    framechunk
    ...
    framestart
    frame
    audio
    ...

    Random:
    video
    frame
    framestart
    framechunk
    framechunk
    framechunk
    framechunk
    ...
    audio
    audiochunk
    audiochunk
    audiochunk
    audiochunk
    audiochunk
    audiochunk
    ...

    Random:
    video
    frame
    framestart
    framestart
    framestart
    framestart
    framestart
    ...
    audio
    audiochunk

    There are two ways to "order" the chunks, either interlaced or random.

    Interlaced is best for capturing and playing: during capture, frames and audio arrive at the same time and are interlaced into the stream, and during playback they are read back in the same order.

    Random is best when adding or removing frames or audio, or when creating video from still pictures.

    Since we propose both ways, each frame must carry a time-code. To play randomly ordered video files, one has to index all frames and audio first; needless to say, this only makes sense for small video files.


    3. Video Header

    video fps,width,height,order

    fps      frames per second
    width    frame width
    height   frame height
    order    either interlaced or random
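
    Parsing the argument part of the video line could look like the sketch below. The names RVSVideoInfo and rvs_parse_video_args() are hypothetical. Since the sample in section 1 writes "video 15,320,240" without an order, the sketch assumes that a missing order means interlaced; the specification itself does not say so.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the values of the video header. */
typedef struct {
    int fps, width, height;
    int interlaced;          /* 1 = interlaced, 0 = random */
} RVSVideoInfo;

/* Parse e.g. "15,320,240,random"; returns 1 on success, 0 on malformed input. */
int rvs_parse_video_args(const char *args, RVSVideoInfo *v)
{
    char order[16] = "";
    int n = sscanf(args, "%d,%d,%d,%15[a-z]",
                   &v->fps, &v->width, &v->height, order);
    if (n < 3) return 0;
    if (v->fps <= 0 || v->width <= 0 || v->height <= 0) return 0;

    if (n == 3 || strcmp(order, "interlaced") == 0)
        v->interlaced = 1;   /* assumption: interlaced when order is omitted */
    else if (strcmp(order, "random") == 0)
        v->interlaced = 0;
    else
        return 0;            /* unknown order keyword */
    return 1;
}
```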

    4. Frame Header, Frame Chunk-Header & Frame Chunk

    Header
     

    frame type
    type: jpeg, rtjpeg, ppm, pgm, cmap, rgb555, rgb565, rgb24, yuv444, yuv422, yuv421, yuv420

    Chunk Header
     

    framestart
    length        length in bytes of the full frame
    binary-data   binary-data, obviously starting with first frame-chunk

    Chunk
     

    framechunk
    length        length in bytes of the chunk
    binary-data   binary-data of the chunk

    The frame-type is based on 'plug-in' encoders & decoders and should remain open. We will implement jpeg & ppm (rgb24) to begin with.

    Codec Overview
     

    Format:   Compression:   Comment:
    jpeg      20-5:1         jpeg
    rtjpeg    20-10:1        realtime jpeg codec (yuv 4:2:0)
    ppm       1:1            simple format just RGB (3 bytes, 24bpp)
    pgm       3:1            greyscale 8bpp
    grey      3:1            greyscale 8bpp
    rgb24     1:1            RGB (3 bytes, 24bpp)
    rgb555    3:2            two bytes 15bpp (one bit undefined)
    rgb565    3:2            two bytes fully used 16bpp
    rgb332    3:1            one byte 8bpp
    cmap      3:1            2^n colormap (1 up to 8 bpp), arguments: (int size, int cmap[size]), e.g. cmap(16,000000,00ff00,...)
    yuv444    1:1
    yuv422    4:3
    yuv421    4:3
    yuv420    3:1.5

    The jpeg codec is preferred for long recordings; all others (except cmap and the yuv formats) are simple to implement.

    Datarate
     

    Frames/sec   160x120    320x240     640x480
    10fps        288KB/s    2.3MB/s     9MB/s
    15fps        432KB/s    3.4MB/s     13.5MB/s
    20fps        576KB/s    4.6MB/s     18MB/s
    25fps        720KB/s    5.7MB/s     22.5MB/s
    30fps        864KB/s    6.9MB/s     27MB/s
    50fps        1.4MB/s    11.5MB/s    45MB/s
    60fps        1.7MB/s    13.8MB/s    54MB/s
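
    The figures in this table follow from width x height x bytes per pixel x frames per second. A small sketch of that arithmetic, with hypothetical function names and assuming uncompressed rgb24 (3 bytes per pixel):

```c
/* Bytes per second of uncompressed video with a fixed pixel size. */
long rvs_video_datarate(int width, int height, int bytes_per_pixel, int fps)
{
    return (long)width * height * bytes_per_pixel * fps;
}

/* Total bytes of a recording of the given duration in seconds. */
double rvs_recording_size(long bytes_per_second, double seconds)
{
    return (double)bytes_per_second * seconds;
}
```

    For example, rvs_video_datarate(320, 240, 3, 10) gives 2,304,000 bytes per second (the 2.3MB/s entry), and 640x480 at 30fps gives 27,648,000 bytes per second (the 27MB/s entry). At 3 bytes per pixel, 160x120 at 10fps works out to 576,000 bytes per second, twice the value listed in the 160x120 column, so that column may assume a smaller pixel size.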

    JPEG Compression
     JPEG compression definitely helps us to record long videos: 20 minutes of uncompressed 320x240 @ 30fps video gives an 8.4GB file. Using LIBJPEG we got the following table:

    Format:             Rate:          Compression:   Quality:
    PPM 320x256         255KB/frame    1:1            perfect
    JPEG 75% 320x256    17KB/frame     1:15           very good
    JPEG 65% 320x256    14KB/frame     1:18           very good
    JPEG 55% 320x256    12KB/frame     1:21           good
    JPEG 45% 320x256    11KB/frame     1:23           good
    JPEG 35% 320x256    9KB/frame      1:28           reasonable
    JPEG 25% 320x256    8KB/frame      1:31           fairly acceptable
    JPEG 15% 320x256    6KB/frame      1:42           blocks, not good
    JPEG 10% 320x256    5KB/frame      1:51           blocks, not good
    JPEG 5% 320x256     3KB/frame      1:85           not usable

    According to Justin Schoeman's tests, RTJpeg (RealTime JPEG) achieves 20:1 compression for 60fps @ 384x288; 384x288 @ 12.5fps then needs 253KB/s instead of 4MB/s (a ratio of 15:1). If we use the original LIBJPEG we have to check its performance (the RTJpeg and LIBJPEG encoders differ).

    In general we can say that we surely reach 20:1 compression, or at least 15:1, without losing too much. At 20:1, a 20-minute recording at 320x240 @ 30fps gives a 420MB file (instead of 8.4GB), which is a very reasonable reduction.

    Independent JPEG Group    Library source-code
    JPEG.Org: Links           More infos
    RTJPEG                    real-time JPEG library

    Other Compressions
     With the other compressions, at ratios of 3:2 or 3:1, we don't really gain much, except that the conversion is much faster than JPEG compression. For YUV compression we have to do some cumbersome color transformation when recording and playing back, so good coding is required! Using a colormap isn't really practical for video recording, but may be usable for frame-based computer-generated videos (animated GIFs, for example).

     RGB to YUV Conversion 
      
     Y  =  (0.257 * R) + (0.504 * G) + (0.098 * B) + 16 
     Cr =  (0.439 * R) - (0.368 * G) - (0.071 * B) + 128 
     Cb = -(0.148 * R) - (0.291 * G) + (0.439 * B) + 128 
      
     YUV to RGB Conversion 
      
     B = 1.164(Y - 16)                   + 2.018(Cb - 128) 
     G = 1.164(Y - 16) - 0.813(Cr - 128) - 0.391(Cb - 128) 
     R = 1.164(Y - 16) + 1.596(Cr - 128) 
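
    The two sets of formulas above translate directly into C. The sketch below uses floating point for clarity rather than speed; the function names are hypothetical, and results are rounded and clamped to 0..255.

```c
/* Round to nearest and clamp to the 0..255 range of one byte. */
static unsigned char clamp8(double v)
{
    if (v <= 0.0) return 0;
    if (v >= 255.0) return 255;
    return (unsigned char)(v + 0.5);
}

/* RGB to Y/Cr/Cb, using the coefficients given above. */
void rgb_to_ycrcb(unsigned char r, unsigned char g, unsigned char b,
                  unsigned char *y, unsigned char *cr, unsigned char *cb)
{
    *y  = clamp8( 0.257 * r + 0.504 * g + 0.098 * b + 16.0);
    *cr = clamp8( 0.439 * r - 0.368 * g - 0.071 * b + 128.0);
    *cb = clamp8(-0.148 * r - 0.291 * g + 0.439 * b + 128.0);
}

/* Y/Cr/Cb back to RGB, using the coefficients given above. */
void ycrcb_to_rgb(unsigned char y, unsigned char cr, unsigned char cb,
                  unsigned char *r, unsigned char *g, unsigned char *b)
{
    double yy = 1.164 * (y - 16);
    *b = clamp8(yy + 2.018 * (cb - 128));
    *g = clamp8(yy - 0.813 * (cr - 128) - 0.391 * (cb - 128));
    *r = clamp8(yy + 1.596 * (cr - 128));
}
```

    With these coefficients black maps to Y=16 and white to Y=235, and a round trip can be off by one or two steps per channel because of rounding. A real-time version would probably replace the floating-point multiplications with integer arithmetic or lookup tables.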

    5. Audio Header & Chunk

    Header
     

    audio type
    type: pcm(speed,bits,channels)
          adpcm(bits)
          mp3(bitrate)

    Chunk
     

    audiochunk
    length        length in bytes
    binary-data   binary-data

    The audio-type is open and will be 'plug-in' based. pcm (Pulse Code Modulation) is surely the simplest but also the most memory-intensive, and mp3 (MPEG Layer 3) will be implemented if we get hold of a simple library.

    Datarate
     

    Rate       8bit Mono   8bit Stereo /   16bit Stereo
                           16bit Mono
    8000Hz     8KB/s       16KB/s          32KB/s
    11025Hz    11KB/s      22KB/s          44KB/s
    22050Hz    22KB/s      44KB/s          88KB/s
    44100Hz    44KB/s      88KB/s          176KB/s

    Audio Codecs with Arguments
     

    Type:         pcm
    Arguments:    int speed, int bits, int channels
    Example:      pcm(22050,8,2)
    Description:  22.050kHz, 8 bits, Stereo
    Usage:        16bit @ 22kHz or 44.1kHz lossless CD-quality, no compression at all (1:1)

    Type:         adpcm
    Arguments:    int bits
    Example:      adpcm(4)
    Description:  if bits=4, then compression 1/4
    Usage:        almost lossless compression (4:1)

    Type:         mp3
    Arguments:    int bitrate
    Example:      mp3(64)
    Description:  64kb/s (8KB/s)
    Usage:        128kb/s near CD-quality, good compression (10:1)

    Type:         lvocoder
    Arguments:    int bands, int start, int end, int last
    Example:      lvocoder(12,50,5000,100)
    Description:  linear vocoder, 12 bands, (50Hz - 5kHz), 100 msec packets
    Usage:        voice only, very high compression (100:1 or more)

    We will likely implement only pcm to begin with.
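
    Because the codec and its arguments are plain ASCII, reading a pcm type string takes a single sscanf() call. The sketch below uses hypothetical names (RVSPcm, rvs_parse_pcm(), rvs_pcm_datarate()) and assumes that only 8 and 16 bits per sample are valid, as in the datarate table.

```c
#include <stdio.h>

/* Hypothetical holder for the pcm arguments. */
typedef struct {
    int speed;     /* samples per second, e.g. 22050 */
    int bits;      /* bits per sample */
    int channels;  /* 1 = mono, 2 = stereo */
} RVSPcm;

/* Parse "pcm(speed,bits,channels)"; returns 1 on success, 0 otherwise. */
int rvs_parse_pcm(const char *type, RVSPcm *p)
{
    char close = 0;
    if (sscanf(type, "pcm(%d,%d,%d%c",
               &p->speed, &p->bits, &p->channels, &close) != 4)
        return 0;
    if (close != ')') return 0;
    if (p->speed <= 0 || p->channels <= 0) return 0;
    if (p->bits != 8 && p->bits != 16) return 0;  /* assumption: 8 or 16 only */
    return 1;
}

/* Bytes per second of the uncompressed pcm stream. */
long rvs_pcm_datarate(const RVSPcm *p)
{
    return (long)p->speed * (p->bits / 8) * p->channels;
}
```

    For "pcm(22050,8,2)" this yields 44,100 bytes per second, which matches the 44KB/s entry for 8bit stereo at 22050Hz.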

    6. Considerations

    The data ratio between frames and audio is about 100:1 (e.g. 320x240 @ 20fps = 4.6MB/s vs 16bit mono @ 22kHz = 44KB/s, i.e. 104:1). For that reason the frames need to be subdivided: each frame is split into chunks of e.g. 4KB.

    Recording
     We have to check whether the disk write speed is sufficient to save all frames; for audio it should be sufficient. Good timing code is required.

    The 2GB filesize problem: we allow both sequential writing (a "filepointer" to the next file) and parallel writing (chunk1 -> filea, chunk2 -> fileb, chunk3 -> filea, etc.).

    One of the reasons for defining codecs and arguments in ASCII is to avoid little/big-endian problems. All frame & audio code is, or should be, independent of little/big endian.

    7. Video Devices

    At the moment, drivers for Bt848, QuickCam and a few other cards are under development. Check Video4Linux Resources for updates.

    8. RawVideo API

    Function:
        Comment:

    RVVideoHeader *rvreadfile(char *fname);
        open video-stream

    rvclosefile(RVVideoHeader *rvh);
        close video-stream

    rvframeadd(RVVideoHeader *rvh, char *type, int len, void *data);
        add frame, type could be e.g. "ppm"; all frames must be the same type

    rvaudioadd(RVVideoHeader *rvh, char *type, int len, void *data);
        add audio, type could be e.g. "pcm(22050,8,2)"; all audio must be the same type

    RVVideoHeader *rvrecord(char *fname, char *dev,
                            int (*stop_func)(), int time,
                            int fps, int w, int h,
                            int bits, int speed, int channels);
        fname, dev: filename and device name (/dev/video0)
        stop_func, time: stop-button, or time in msec
        fps, w, h: frame info
        bits, speed, channels: sound info

    RVFrame *rvframeread(char *fname)
        read single frame, ppm, jpeg and gif will be supported

    rvframefree(RVFrame *f)

    RVAudio *rvfetchaudio(RVVideoHeader *rvh)
        playing: get audio-chunk

    RVFrame *rvfetchframe(RVVideoHeader *rvh, int type)
        playing: get video-frame, type is preferred video-type

    type2itype(char *type)
        converts string like "ppm" to video-type RV_VIDEO_PPM (used in rvlib.h)

    Conversion of different video-types: we plan to target rgb24 as the internal standard format for MIT-XSHM playback. For that reason rvlib.h provides anytorgb24(type, src, dest), where type is RV_VIDEO_*, and src and dest are void* or unsigned char* pointers which are incremented automatically. For RV_VIDEO_JPEG we will add an appropriate function call. For now anytorgb24() is a macro for the sake of speed (to avoid a function call). There is also rgb24toany(), which doesn't support many types yet.

    PLEASE NOTE: this API is not final at all, it's a proposal and subject to change at any time. Once the package is programmed we will document the API fully.

    9. Codec Matrix

    We should be able to read and write any format we like. For that purpose each supported codec must support xxxtorgb24 and rgb24toxxx, or more:

    RGBxxx
     The following conversions will be provided: rgb32torgb24(), rgb555torgb24(), rgb565torgb24(), rgb24torgb32(), rgb24torgb555(), rgb24torgb565().
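
    As an illustration of how simple these conversions are, here is a sketch of the rgb565 pair. The function names and signatures are hypothetical (the real rvlib.h interface is not fixed yet), and the sketch assumes each rgb565 pixel is held as a 16-bit value with red in the top 5 bits, green in the middle 6 and blue in the low 5.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand one rgb565 pixel to 3 bytes (R, G, B). The high bits are copied
   into the low bits so that full intensity maps to 255 and not to 248. */
void rgb565_pixel_to_rgb24(uint16_t p, unsigned char *dest)
{
    unsigned r = (p >> 11) & 0x1f;
    unsigned g = (p >> 5) & 0x3f;
    unsigned b = p & 0x1f;
    dest[0] = (unsigned char)((r << 3) | (r >> 2));
    dest[1] = (unsigned char)((g << 2) | (g >> 4));
    dest[2] = (unsigned char)((b << 3) | (b >> 2));
}

/* Pack 3 bytes (R, G, B) into one rgb565 pixel by dropping the low bits. */
uint16_t rgb24_pixel_to_rgb565(const unsigned char *src)
{
    return (uint16_t)(((src[0] >> 3) << 11) | ((src[1] >> 2) << 5) | (src[2] >> 3));
}

/* Convert n pixels from rgb565 to rgb24; dest must hold 3 * n bytes. */
void rgb565_to_rgb24_n(const uint16_t *src, unsigned char *dest, size_t n)
{
    for (size_t i = 0; i < n; i++)
        rgb565_pixel_to_rgb24(src[i], dest + 3 * i);
}
```

    Working on 16-bit values rather than on raw bytes leaves the byte order question to the code that reads the frame data.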

    YUVxxx
     The following conversions will be provided: yuv444torgb24(), yuv422torgb24(), yuv420torgb24(), rgb24toyuv444(), rgb24toyuv422(), rgb24toyuv420().

    RTJPEG
     Assume we have a video device providing YUV420. Since the RTJPEG codec (for now) only supports YUV420, we do not convert to RGB24 before compressing, but pass the entire frame directly to the compressor:

     YUV420 -> RTJPEG-COMPRESSOR -> RTJPEG 

    In case we have another device which just has RGB24, then we use:

     RGB24 -> YUV420 -> RTJPEG-COMPRESSOR -> RTJPEG 

    To play back an RTJPEG frame we use this pipe:

     RTJPEG -> RTJPEG-UNCOMPRESSOR -> YUV420 -> RGB24 

    Supported conversions: yuv420tortjpeg(), rtjpegtoyuv420().

    Based on this we get a codec-matrix:

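    in-codec /
    out-codec   rgb24  rgb32  yuv444  yuv422  yuv420  rtjpeg  jpeg  mjpeg
    rgb24       X      X      X       X       X       -       X     X
    rgb32       X      X      -       -       -       -       -     -
    yuv444      X      -      X       -       -       -       -     -
    yuv422      X      -      -       X       -       -       -     -
    yuv420      X      -      -       -       X       X       -     -
    rtjpeg      -      -      -       -       X       X       -     -
    jpeg        X      -      -       -       -       -       X     -
    mjpeg       X      -      -       -       -       -       -     X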

    This is just theoretical, and includes the current RTJPEG limitation of not producing rgb24 directly in either direction. A small algorithm should determine how to get from "any" input frame-codec to "any" output frame-codec. In addition, each "X" (supported) should carry a weight for the conversion: 1 for a fast conversion, 2 for one that takes twice as long, etc. The algorithm then "walks" through the matrix looking for the fastest conversion and constructs the conversion pipe.
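
    One possible form of this "small algorithm" is a shortest-path search over the matrix. The sketch below uses Dijkstra's algorithm, which is our choice for illustration and not something this document prescribes. The weights are invented for the example (1 for plain pixel conversions, 2 for the compressing codecs), a lower weight meaning a faster conversion, and rvs_find_pipe() is a hypothetical name.

```c
#include <limits.h>

enum { RGB24, RGB32, YUV444, YUV422, YUV420, RTJPEG, JPEG, MJPEG, NCODEC };

/* weight[in][out]: 0 = no direct conversion, otherwise its cost.
   All values are illustrative assumptions, not measurements. */
static const int weight[NCODEC][NCODEC] = {
    /* out:      rgb24 rgb32 yuv444 yuv422 yuv420 rtjpeg jpeg mjpeg */
    /* rgb24  */ { 0,    1,    1,     1,     1,     0,     2,   2 },
    /* rgb32  */ { 1,    0,    0,     0,     0,     0,     0,   0 },
    /* yuv444 */ { 1,    0,    0,     0,     0,     0,     0,   0 },
    /* yuv422 */ { 1,    0,    0,     0,     0,     0,     0,   0 },
    /* yuv420 */ { 1,    0,    0,     0,     0,     2,     0,   0 },
    /* rtjpeg */ { 0,    0,    0,     0,     2,     0,     0,   0 },
    /* jpeg   */ { 2,    0,    0,     0,     0,     0,     0,   0 },
    /* mjpeg  */ { 2,    0,    0,     0,     0,     0,     0,   0 },
};

/* Find the cheapest chain of conversions from codec 'from' to codec 'to'.
   Writes the codecs of the pipe (including both ends) into path[] and the
   summed weight into *total; returns the number of entries, 0 if no pipe. */
int rvs_find_pipe(int from, int to, int path[NCODEC], int *total)
{
    int dist[NCODEC], prev[NCODEC], done[NCODEC];
    for (int i = 0; i < NCODEC; i++) {
        dist[i] = INT_MAX;
        prev[i] = -1;
        done[i] = 0;
    }
    dist[from] = 0;

    for (;;) {
        int u = -1;                 /* closest codec not yet finished */
        for (int i = 0; i < NCODEC; i++)
            if (!done[i] && dist[i] != INT_MAX && (u < 0 || dist[i] < dist[u]))
                u = i;
        if (u < 0 || u == to) break;
        done[u] = 1;
        for (int v = 0; v < NCODEC; v++)
            if (weight[u][v] > 0 && dist[u] + weight[u][v] < dist[v]) {
                dist[v] = dist[u] + weight[u][v];
                prev[v] = u;
            }
    }
    if (dist[to] == INT_MAX) return 0;

    int n = 0, tmp[NCODEC];         /* walk back from the target, then reverse */
    for (int c = to; c != -1; c = prev[c]) tmp[n++] = c;
    for (int i = 0; i < n; i++) path[i] = tmp[n - 1 - i];
    if (total) *total = dist[to];
    return n;
}
```

    With these weights, rvs_find_pipe(RTJPEG, RGB24, ...) returns the three-step pipe RTJPEG -> YUV420 -> RGB24, the same playback pipe described in the RTJPEG paragraph above.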

    10. Controllers

    Additionally, controllers (e.g. brightness, contrast, saturation, cropping) should also be treated as "format-converters": they don't convert, but apply additional effects and adjustments to the frame.

     RTJPEG -> RTJPEG-UNCOMPRESSOR -> YUV420 -> BRIGHTNESS -> RGB24 

    List of "controllers":

    • brightness (best used with YUV)
    • saturation (best used with YUV)
    • contrast (best used with YUV)
    • cropping (cut-out or filled with color)
    • border
    • rescale
    • etc.
    More sophisticated effects should be avoided, because the requirement for every conversion is real-time operation, and we are talking about 50fps here :-)

    Since RGB24 is our preferred internal format, we will implement all controllers for RGB24, and for YUV444 too (which should not require many changes from RGB24).
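
    To show why brightness is cheap in YUV, here is a sketch of a brightness controller that only touches the Y samples of a frame. The name rvs_brightness_y() is hypothetical, and the sketch assumes the Y samples are stored contiguously and stay within 16..235, the range produced by the RGB to YUV formulas in section 4.

```c
#include <stddef.h>

/* Add delta to each of the n luma (Y) samples, clamped to 16..235.
   A positive delta brightens the frame, a negative one darkens it. */
void rvs_brightness_y(unsigned char *y, size_t n, int delta)
{
    for (size_t i = 0; i < n; i++) {
        int v = y[i] + delta;
        if (v < 16) v = 16;
        if (v > 235) v = 235;
        y[i] = (unsigned char)v;
    }
}
```

    The same adjustment in RGB24 has to touch all three bytes of every pixel, which is three times the work per frame.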

                                                                                                                                       


    All Rights Reserved - (C) 1997 - 2008 by The Labs.Com
