Transcoding

Data model

This section explains the ffmpeg/fffw data model in detail.

ffmpeg command line structure

Let’s look at the short command line produced by fffw in Write your first command:

ffmpeg -loglevel level+info -y \
-t 5.0 -i input.mp4 \
-filter_complex "[0:v]scale=w=1280:h=720[vout0]" \
-map "[vout0]" -c:v libx264 \
-map 0:a -c:a aac \
output.mp4
The first section contains common ffmpeg flags:
  • -loglevel - logging setup

  • -y - overwrite mode

The second part contains parameters related to input files:
  • -t - total input read duration

  • -i - input file name

After that comes the -filter_complex parameter, which describes the stream processing graph. We’ll discuss it in detail in the Filter graph definition section.

The next part contains codec parameters:
  • -map - the input for this codec, either an input stream or a graph edge.

  • -c:v - video codec identifier.

  • -c:a - audio codec identifier.

This section usually contains lots of codec-specific parameters like bitrate or the number of audio channels.

The last part is the output file definition section. Usually it’s just the output file name (output.mp4), but it may also contain muxer parameters.

Filter graph definition

ffmpeg provides a very powerful tool for video and audio stream processing - the filter graph. This graph contains filters - nodes connected by named edges.

  • A filter is a node that receives one or more input streams and produces one or more output streams.

  • Each stream is a sequence of frames (video or audio).

  • Another node type is an input stream: it is a starting node for the graph that originates from a decoder (a component that receives chunks of encoded video from the demuxer and decodes them to a raw image / audio sample sequence).

  • The last node type is a codec: it is an output node for the graph that receives a raw video/audio stream from the filter graph, compresses it and passes it to a muxer, which writes the resulting file.

There are two syntaxes to define edges between graph nodes:

  • Short syntax describes a linear sequence of filters:

    deint,crop=0:10:1920:1060,scale=1280:720
    

    This syntax has no named edges and means that three filters (deint, crop and scale) are applied sequentially to a single video stream.

  • Full syntax describes a complicated filter graph:

    [0:v]scale=100:100[logo];
    [1:v][logo]overlay=x=1800:y=100[vout0]
    

    This syntax has named input stream identifiers ([0:v], [1:v]) and named edges ([logo], [vout0]) to control how nodes are connected to each other and to codecs.
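
Note that fffw always generates the full syntax with autogenerated edge names, even for linear chains (as in the command line at the top of this section). A minimal sketch using the built-in Scale filter; the exact edge names like [vout0] are chosen by fffw:

from fffw.encoding import *

ff = FFMPEG(input='input.mp4')

# a linear chain, as in the short-syntax example; fffw renders it
# using the full syntax, e.g. "[0:v]scale=w=1280:h=720[vout0]"
cursor = ff.video | Scale(width=1280, height=720)

output = cursor > output_file('output.mp4', VideoCodec('libx264'))
ff > output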

Implementation

Let’s look at how this command line structure is implemented in fffw.

Common ffmpeg flags

The FFMPEG class is responsible for rendering common flags like overwrite or loglevel. There are a lot of other flags that are not covered by the provided implementation; they should be added manually via FFMPEG inheritance, as discussed in Extending fffw.

from fffw.encoding import FFMPEG
ff = FFMPEG(overwrite=True)
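
The same pattern covers the loglevel flag from the command line above; a short sketch (assuming the loglevel parameter and the get_cmd helper behave as in current fffw releases):

from fffw.encoding import FFMPEG

# renders "-loglevel level+info -y" as in the command line above
ff = FFMPEG(loglevel='level+info', overwrite=True)

# inspect the command line that would be executed
print(ff.get_cmd())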

Input file flags

Input files in fffw are described by Input, which stores a list of Stream objects. When the Input is a file, each Stream is a video or audio sequence in this file. An Input could also be a capture device like x11grab or a network client like hls.

You may initialize Input directly or use input_file helper.

Each Stream can contain metadata - information about dimensions, duration, bitrate and other characteristics described by VideoMeta and AudioMeta.

For an input file you can set flags such as fast seek or input format; a sketch of these flags follows the example below.

from pymediainfo import MediaInfo

from fffw.encoding import *
from fffw.graph.meta import *

# detect information about input file
mi = MediaInfo.parse('input.mp4')

# initializing streams with metadata
streams = []
for track in from_media_info(mi):
    if isinstance(track, VideoMeta):
        streams.append(Stream(VIDEO, meta=track))
    else:
        streams.append(Stream(AUDIO, meta=track))

# initialize input file
source = Input(input_file='input.mp4', streams=tuple(streams))

# if no metadata is required, just use the string variant
ff = FFMPEG(input='logo.png')

# add another input to ffmpeg
ff < source
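
A short sketch of the input flags mentioned above (the fast_seek and duration parameter names, mapping to the -ss and -t flags, are assumptions; check them against your fffw version):

from fffw.encoding import *

# hypothetical parameter names: fast_seek renders the -ss flag,
# duration renders the -t flag shown in the command line above
seeked = input_file('input.mp4', fast_seek=10.0, duration=5.0)

ff = FFMPEG()
ff < seeked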

Filter complex

FilterComplex hides all the complexity of properly linking filters together. It is also responsible for tracing metadata transformations (like the dimensions change in the Scale filter or the duration change in Trim).

from fffw.encoding import *

ff = FFMPEG()

source = ff < input_file('input.mp4')
logo = ff < input_file('logo.png')

# pass the first video stream (from the source input file) as the
# bottom layer to the overlay filter
overlay = ff.video | Overlay(x=1720, y=100)
# scale logo to 100x100 and pass as top layer to overlay filter
logo | Scale(width=100, height=100) | overlay

# output video with logo to destination file
output = overlay > output_file('output.mp4', VideoCodec('libx264'))
# tell ffmpeg that it'll output something to the destination file
ff > output
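
To see how FilterComplex linked everything together, you can print the resulting command line; a sketch continuing the example above (get_cmd availability is an assumption):

# the -filter_complex argument now contains scale and overlay
# nodes connected by named edges
print(ff.get_cmd())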

Output files

ffmpeg results are defined by the Output class, which contains a list of Codec objects representing the encoded video and audio streams in the destination file.

  • Each codec has a -map parameter which links it either to an input stream or to a destination node in the filter graph.

  • Codec defines a set of encoding parameters like bitrate or the number of audio channels. These parameters are not defined by the fffw library and should be added via inheritance, as discussed in Extending fffw (see the sketch after this list).

  • The codec list definition is followed by a set of muxing parameters (like format) and the destination file name. These parameters are kept by the Output instance.

  • FFMPEG may have multiple outputs.
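
A minimal sketch of such a codec subclass (the preset and crf parameters are illustrative, not part of fffw; see Extending fffw for the exact pattern):

from dataclasses import dataclass

from fffw.encoding import VideoCodec
from fffw.wrapper import param

@dataclass
class X264(VideoCodec):
    codec = 'libx264'
    # hypothetical codec-specific parameters rendered after
    # "-c:v libx264" as "-preset ..." and "-crf ..."
    preset: str = param()
    crf: int = param()

The following example combines these pieces, splitting a single video stream into four renditions: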

from fffw.encoding import *
from fffw.graph import VIDEO

ff = FFMPEG(input='input.mp4')

split = ff.video | Split(VIDEO, output_count=4)

# define video codecs
vc1 = VideoCodec('libx264', bitrate=4_000_000)
split | Scale(1920, 1080) > vc1
vc2 = VideoCodec('libx264', bitrate=2_000_000)
split | Scale(1280, 720) > vc2
vc3 = VideoCodec('libx264', bitrate=1_000_000)
split | Scale(960, 480) > vc3
vc4 = VideoCodec('libx264', bitrate=500_000)
split | Scale(640, 360) > vc4

# add an audio codec for each quality
ac1, ac2, ac3, ac4 = [AudioCodec('aac') for _ in range(4)]

# tell ffmpeg to take the single audio stream and encode
# it 4 times, once for each output
audio_stream = ff.audio
audio_stream > ac1
audio_stream > ac2
audio_stream > ac3
audio_stream > ac4

# define outputs as a filename with codec set
ff > output_file('full_hd.mp4', vc1, ac1)
ff > output_file('hd.mp4', vc2, ac2)
ff > output_file('middle.mp4', vc3, ac3)
ff > output_file('low.mp4', vc4, ac4)

Usage

To process something with fffw you need to:

  1. Create an FFMPEG instance

  2. Add one or more Input files to it

  3. If necessary, initialize a processing graph

  4. Add one or more Output files

  5. Run the command with FFMPEG.run
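
Putting these steps together, a minimal sketch that reproduces the command line from the beginning of this section (the duration parameter name is an assumption, see above):

from fffw.encoding import *

# 1. create an FFMPEG instance
ff = FFMPEG(overwrite=True, loglevel='level+info')

# 2. add an input file (-t 5.0 -i input.mp4)
ff < input_file('input.mp4', duration=5.0)

# 3. initialize a processing graph
cursor = ff.video | Scale(width=1280, height=720)

# 4. add an output file; the audio codec is mapped to the input
#    audio stream automatically (-map 0:a), as in the command
#    line at the top of this section
output = cursor > output_file('output.mp4',
                              VideoCodec('libx264'),
                              AudioCodec('aac'))
ff > output

# 5. run the command
ff.run()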