Transcoding¶
Data model¶
This section explains the ffmpeg/fffw data model in detail.
ffmpeg command line structure¶
Let’s look at the short command line produced by fffw in Write your first command:
ffmpeg -loglevel level+info -y \
-t 5.0 -i input.mp4 \
-filter_complex "[0:v]scale=w=1280:h=720[vout0]" \
-map "[vout0]" -c:v libx264 \
-map 0:a -c:a aac \
output.mp4
- The first section contains common ffmpeg flags:
  - -loglevel - logging setup
  - -y - overwrite mode
- The second section contains parameters related to input files:
  - -t - total input read duration
  - -i - input file name
After that there is a -filter_complex parameter that describes the stream processing graph. We’ll discuss it in detail in the section Filter graph definition.
- The next section contains codec parameters:
  - -map - what the input for this codec is: an input stream or a graph edge
  - -c:v - video codec identifier
  - -c:a - audio codec identifier

  This section usually contains lots of codec-specific parameters like bitrate or the number of audio channels.
The last part is the output file definition section. Usually it’s just the output file name (output.mp4), but it may contain some muxer parameters.
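Putting those sections together, the example command can be sketched as a plain list concatenation. This is only a toy illustration of the command structure, not fffw's renderer:

```python
# Toy illustration of how the command-line sections combine.
# This shows the structure of the command, not fffw's implementation.
common = ["-loglevel", "level+info", "-y"]          # common flags
inputs = ["-t", "5.0", "-i", "input.mp4"]           # input file section
graph = ["-filter_complex", "[0:v]scale=w=1280:h=720[vout0]"]
codecs = ["-map", "[vout0]", "-c:v", "libx264",
          "-map", "0:a", "-c:a", "aac"]             # codec section
output = ["output.mp4"]                             # output file section

args = ["ffmpeg"] + common + inputs + graph + codecs + output
print(" ".join(args))
```

The order matters to ffmpeg: input flags must precede their `-i`, and codec/muxer flags must precede the output file name they apply to.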
Filter graph definition¶
ffmpeg provides a very powerful tool for video and audio stream processing: the filter graph. This graph contains filters (nodes) connected with named edges.
- A filter is a node that receives one or more input streams and produces one or more output streams.
- Each stream is a sequence of frames (video or audio).
- Another node type is an input stream: it is a starting node for the graph, fed from a decoder (a thing that receives chunks of encoded video from a demuxer and decodes them to a raw image / audio sample sequence).
- The last node type is a codec: it is an output node that receives a raw video/audio stream from the filter graph, compresses it and passes it to a muxer, which writes the resulting file.
There are two syntaxes to define edges between graph nodes:
- Short syntax describes a linear sequence of filters:

  deint,crop=0:10:1920:1060,scale=1280:720

  This syntax has no named edges and means that three filters (deint, crop and scale) are applied sequentially to a single video stream.
- Full syntax describes a complicated filter graph:

  [0:v]scale=100:100[logo]; [1:v][logo]overlay=x=1800:y=100[vout0]

  This syntax has named input stream identifiers ([0:v], [1:v]) and named edges ([logo], [vout0]) to control how nodes are connected to each other and to codecs.
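The full syntax can be modeled with a small toy renderer. This illustrates how named edges join filters into a `-filter_complex` string; it is not fffw's internal representation:

```python
# Toy model of a filter graph with named edges (not fffw's internals).
# Each filter is (input edge names, definition, output edge names).
filters = [
    (["0:v"], "scale=100:100", ["logo"]),
    (["1:v", "logo"], "overlay=x=1800:y=100", ["vout0"]),
]

def render(filters):
    """Render the full filter_complex syntax: [in]filter[out]; ..."""
    parts = []
    for inputs, definition, outputs in filters:
        ins = "".join(f"[{name}]" for name in inputs)
        outs = "".join(f"[{name}]" for name in outputs)
        parts.append(f"{ins}{definition}{outs}")
    return "; ".join(parts)

print(render(filters))
# -> [0:v]scale=100:100[logo]; [1:v][logo]overlay=x=1800:y=100[vout0]
```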
Implementation¶
Let’s look at how this command line structure is implemented in fffw.
Common ffmpeg flags¶
The FFMPEG class is responsible for rendering common flags like overwrite or loglevel. There are a lot of other flags that are not covered by the provided implementation; they should be added manually via FFMPEG inheritance, as discussed in Extending fffw.
from fffw.encoding import FFMPEG
ff = FFMPEG(overwrite=True)
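Conceptually, FFMPEG maps keyword arguments to command-line flags. A minimal sketch of that idea (not fffw's actual rendering code):

```python
# Minimal sketch of keyword-to-flag rendering; not fffw's actual code.
def render_common_flags(overwrite=False, loglevel=None):
    args = []
    if loglevel is not None:
        args += ["-loglevel", loglevel]
    if overwrite:
        args.append("-y")  # boolean flags render as a bare flag
    return args

print(render_common_flags(overwrite=True, loglevel="level+info"))
# -> ['-loglevel', 'level+info', '-y']
```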
Input file flags¶
Input files in fffw are described by Input, which stores a list of Stream objects. When Input is a file, a Stream is a video or audio sequence in this file. An Input could also be a capture device like x11grab or a network client like hls.
You may initialize Input directly or use the input_file helper.
Each Stream can contain metadata: information about dimensions, duration, bitrate and other characteristics described by VideoMeta and AudioMeta.
For an input file you can set flags such as fast seek or input format.
from pymediainfo import MediaInfo
from fffw.encoding import *
from fffw.graph.meta import *
# detect information about input file
mi = MediaInfo.parse('input.mp4')
# initializing streams with metadata
streams = []
for track in from_media_info(mi):
    if isinstance(track, VideoMeta):
        streams.append(Stream(VIDEO, meta=track))
    else:
        streams.append(Stream(AUDIO, meta=track))
# initialize input file
source = Input(input_file='input.mp4', streams=tuple(streams))
# if no metadata is required, just use text variant
ff = FFMPEG(input='logo.png')
# add another input to ffmpeg
ff < source
Filter complex¶
FilterComplex hides all the complexity of properly linking filters together. It is also responsible for tracing metadata transformations (like dimension changes in the Scale filter or duration changes in Trim).
from fffw.encoding import *
ff = FFMPEG()
source = ff < input_file('input.mp4')
logo = ff < input_file('logo.png')
# pass first video stream (from source input file) as bottom
# layer to overlay filter.
overlay = ff.video | Overlay(x=1720, y=100)
# scale logo to 100x100 and pass as top layer to overlay filter
logo | Scale(width=100, height=100) | overlay
# output video with logo to destination file
output = overlay > output_file('output.mp4', VideoCodec('libx264'))
# tell ffmpeg that it'll output something to destination file
ff > output
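The metadata tracing mentioned above can be illustrated with a toy transform. fffw's real VideoMeta carries many more fields, so this is only a sketch of the idea:

```python
from dataclasses import dataclass, replace

# Toy stand-in for video metadata; fffw's VideoMeta has more fields.
@dataclass(frozen=True)
class Meta:
    width: int
    height: int
    duration: float

def scale(meta, width, height):
    """A Scale-like filter changes dimensions, keeps duration."""
    return replace(meta, width=width, height=height)

def trim(meta, end):
    """A Trim-like filter changes duration, keeps dimensions."""
    return replace(meta, duration=min(meta.duration, end))

src = Meta(width=1920, height=1080, duration=300.0)
out = trim(scale(src, 1280, 720), end=5.0)
print(out)  # Meta(width=1280, height=720, duration=5.0)
```

Because each filter produces a new metadata object, the final output's metadata can be computed without running ffmpeg at all.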
Output files¶
ffmpeg results are defined by the Output class, which contains a list of Codec objects representing video and audio streams in the destination file, encoded by some codecs.
- Each codec has a -map parameter which links it either to an input stream or to a destination node in the filter graph.
- A codec defines a set of encoding parameters like bitrate or the number of audio channels. These parameters are not defined by the fffw library and should be defined via inheritance, as discussed in Extending fffw.
- The codec list definition is followed by a set of muxing parameters (like format) and the destination file name. These parameters are kept by the Output instance.
- FFMPEG may have multiple outputs.
from fffw.encoding import *
from fffw.graph import VIDEO
ff = FFMPEG(input='input.mp4')
split = ff.video | Split(VIDEO, output_count=4)
# define video codecs
vc1 = VideoCodec('libx264', bitrate=4_000_000)
split | Scale(1920, 1080) > vc1
vc2 = VideoCodec('libx264', bitrate=2_000_000)
split | Scale(1280, 720) > vc2
vc3 = VideoCodec('libx264', bitrate=1_000_000)
split | Scale(960, 480) > vc3
vc4 = VideoCodec('libx264', bitrate=500_000)
split | Scale(640, 360) > vc4
# add an audio codec for each quality
ac1, ac2, ac3, ac4 = [AudioCodec('aac') for _ in range(4)]
# tell ffmpeg to take single audio stream and encode
# it 4 times for each output
audio_stream = ff.audio
audio_stream > ac1
audio_stream > ac2
audio_stream > ac3
audio_stream > ac4
# define outputs as a filename with codec set
ff > output_file('full_hd.mp4', vc1, ac1)
ff > output_file('hd.mp4', vc2, ac2)
ff > output_file('middle.mp4', vc3, ac3)
ff > output_file('low.mp4', vc4, ac4)
Usage¶
To process something with fffw you need to:

1. Create an FFMPEG instance
2. Add one or more Input files to it
3. If necessary, initialize a processing graph
4. Add one or more Output files
5. Run the command with FFMPEG.run
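These steps can be mirrored with a toy builder. This is a conceptual sketch only; fffw's FFMPEG, Input and Output classes do the real work:

```python
# Toy builder mirroring the steps above; not the fffw API.
class Command:
    def __init__(self):                 # step 1: create an instance
        self.inputs = []
        self.outputs = []

    def add_input(self, filename):      # step 2: add input files
        self.inputs.append(filename)
        return self

    def add_output(self, filename):     # step 4: add output files
        self.outputs.append(filename)
        return self

    def render(self):                   # step 5: produce the command to run
        args = ["ffmpeg"]
        for name in self.inputs:
            args += ["-i", name]
        return args + self.outputs

cmd = Command().add_input("input.mp4").add_output("output.mp4")
print(" ".join(cmd.render()))  # ffmpeg -i input.mp4 output.mp4
```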