When AFGS1 metadata are not present in the bitstream, the AFGS1 film grain synthesis module shall produce output pictures that are identical in all respects and have the same output order as the input pictures to the AFGS1 film grain synthesis module.
When AFGS1 metadata is present in the bitstream, a film grain synthesis module conforming to this specification shall implement a film grain synthesis process that modifies its input arrays OutY, OutU, OutV. The reference film grain synthesis process is described in [section 8.2][].
When AFGS1 metadata is present in the bitstream, a conformant AFGS1 film grain synthesis module shall satisfy at least one of the following two options:
-
A conformant AFGS1 film grain synthesis module shall produce output pictures that are identical in all respects and have the same output order as those produced by the AFGS1 film grain synthesis process specified herein including applying the exact film grain synthesis process as specified in [section 8.2][].
-
A conformant AFGS1 film grain synthesis module shall produce output pictures that are in the same order and do not have perceptually significant differences with the pictures produced by the reference film grain synthesis process specified in [section 8.2][] when applied to the input pictures of the film grain synthesis process with the film grain parameters signaled for these pictures. The definition of "perceptually significant differences" is beyond the scope of this specification and may be specified, for example, by a service provider as part of their accreditation program. The film grain synthesis process applied by a conformant AFGS1 film grain synthesis module should be feature complete with regards to the reference film grain synthesis process of [section 8.2][] including scaling strength of the film grain as a function of intensity according to the signaled parameters, same maximum AR lag, and similar modeling of correlation between luma and chroma and smoothing of transitions between blocks of grain when applicable.
Note: To ensure conformance, film grain module manufacturers are advised to implement the film grain synthesis process as specified in [section 8.2][]. One reason to choose the second conformance option is implementation of optional processing steps before input or after the output of the film grain synthesis process, in which case there could be minor differences in the output with the reference film grain synthesis process of [section 8.2][]. Examples of these optional processing steps are algorithms improving output picture quality, such as de-banding filtering and coding artefacts removal. {:.alert .alert-info }
Note: Some applications, such as transcoding from or to another video coding standard, may use intermediate output pictures for transcoding. In such cases, the film grain synthesis information may be adapted and inserted in the transcoded bitstream. {:.alert .alert-info }
The input to this process is a sequence of the reconstructed pictures and a sequence of associated AFGS1 film grain metadata messages. AFGS1 film grain metadata messages may be contained in an elementary bitstream or in a transport bitstream.
The output from this process is a sequence of updated pictures.
AFGS1 film grain metadata messages are processed in the order they are included in the elementary bitstream, which is typically a picture decoding order.
For each AFGS1 film grain metadata message, in turn the syntax elements are extracted as specified in [section 5][].
The syntax tables include function calls indicating when the corresponding decode processes are triggered.
A film grain parameter set associated with the av1_film_grain_params( ) function is selected and is associated with the output picture.
If afgs1_enable_flag is equal to 1 and apply_grain_flag is present and equal to 1, then the film grain synthesis process specified in [section 8.2][] is invoked with inputs of w, h, SubX, and SubY.
Note: The apply_grain_flag is from the selected film grain parameter set av1_film_grain_params( ). {:.alert .alert-info }
This process modifies arrays OutY, OutU, and OutV that contain the output picture samples of the video decoder or
intermediate values of pictures from the previous post-processing modules.
Finally, the picture to be processed is defined to be the arrays OutY, OutU, and OutV where the bit depth for each sample is BitDepth.
This picture is the overall output and further processing (such as color conversion) is outside the scope of this specification.
For example, a real implementation might use these arrays to display the picture to the user, or a test system might save the arrays so the output can be verified.
Note: If NumPlanes is equal to 1, then the U and V planes should be ignored. {:.alert .alert-info }
The inputs to this process are:
-
variables w and h specifying the width and height of the picture,
-
variables SubX and SubY specifying the subsampling parameters of the picture,
-
arrays OutY, OutU, and OutV,
-
variable BitDepth specifying the bit depth of the picture samples.
The process modifies the arrays OutY, OutU, and OutV to add film grain noise as follows:
-
The variable RandomRegister (used for generating pseudo-random numbers) is set equal to grain_seed.
-
The variable GrainCenter is set equal to 128 << (BitDepth - 8).
-
The variable GrainMin is set equal to = -GrainCenter.
-
The variable GrainMax is set equal to (256 << (BitDepth - 8)) - 1 - GrainCenter.
-
The generate grain process specified in [section 8.2.3][] is invoked.
-
The scaling lookup initialization process specified in [section 8.2.4][] is invoked.
-
The add noise process specified in [section 8.2.5][] is invoked with w, h, SubX, and SubY as inputs.
The input to this process is a variable bits specifying the number of random bits to return.
The output of this process is a pseudo-random number based on the state in RandomRegister.
The process is specified as follows:
get_random_number( bits ) {
r = RandomRegister
bit = ((r >> 0) ^ (r >> 1) ^ (r >> 3) ^ (r >> 12)) & 1
r = (r >> 1) | (bit << 15)
result = (r >> (16 - bits)) & ((1 << bits) - 1)
RandomRegister = r
return result
}
The output of this process is the variable result.
This process generates noise via an auto-regressive filter.
First, an array LumaGrain, which is 82 samples wide and 73 samples high, of white noise is generated for luma as follows:
shift = 12 - BitDepth + grain_scale_shift
for ( y = 0; y < 73; y++ ) {
for ( x = 0; x < 82; x++ ) {
if ( num_y_points > 0 ) {
g = Gaussian_Sequence[ get_random_number( 11 ) ]
} else {
g = 0
}
LumaGrain[ y ][ x ] = Round2( g, shift )
}
}
where the function call get_random_number() invokes the random number process specified in [section 8.2.2][].
Then an auto-regressive filter is applied to the white noise as follows:
shift = ar_coeff_shift_minus6 + 6
for ( y = 3; y < 73; y++ ) {
for ( x = 3; x < 82 - 3; x++ ) {
s = 0
pos = 0
for ( deltaRow = -ar_coeff_lag; deltaRow <= 0; deltaRow++ ) {
for ( deltaCol = -ar_coeff_lag; deltaCol <= ar_coeff_lag; deltaCol++ ) {
if ( deltaRow == 0 && deltaCol == 0 )
break
c = ArCoeffsYPlus128[ pos ] - 128
s += LumaGrain[ y + deltaRow ][ x + deltaCol ] * c
pos++
}
}
LumaGrain[ y ][ x ] = Clip3( GrainMin, GrainMax, LumaGrain[ y ][ x ] + Round2( s, shift ) )
}
}
If luma_only_flag is equal to 0, the chroma grain is generated in a similar way, except the filtering includes a coefficient that introduces a correlation with the luma grain.
The variable chromaW (representing the width of the chroma noise array) is set equal to (SubX ? 44 : 82).
The variable chromaH (representing the height of the chroma noise array) is set equal to (SubY ? 38 : 73).
White noise arrays CbGrain and CrGrain, which are chromaW samples wide and chromaH samples high, are generated as follows:
shift = 12 - BitDepth + grain_scale_shift
RandomRegister = grain_seed ^ 0xb524
for ( y = 0; y < chromaH; y++ ) {
for ( x = 0; x < chromaW; x++ ) {
if ( num_cb_points > 0 || chroma_scaling_from_luma_flag) {
g = Gaussian_Sequence[ get_random_number( 11 ) ]
} else {
g = 0
}
CbGrain[ y ][ x ] = Round2( g, shift )
}
}
RandomRegister = grain_seed ^ 0x49d8
for ( y = 0; y < chromaH; y++ ) {
for ( x = 0; x < chromaW; x++ ) {
if ( num_cr_points > 0 || chroma_scaling_from_luma_flag) {
g = Gaussian_Sequence[ get_random_number( 11 ) ]
} else {
g = 0
}
CrGrain[ y ][ x ] = Round2( g, shift )
}
}
Then the auto-regressive filter is applied as follows:
shift = ar_coeff_shift_minus6 + 6
for ( y = 3; y < chromaH; y++ ) {
for ( x = 3; x < chromaW - 3; x++ ) {
s0 = 0
s1 = 0
pos = 0
for ( deltaRow = -ar_coeff_lag; deltaRow <= 0; deltaRow++ ) {
for ( deltaCol = -ar_coeff_lag; deltaCol <= ar_coeff_lag; deltaCol++ ) {
c0 = ArCoeffsCbPlus128[ pos ] - 128
c1 = ArCoeffsCrPlus128[ pos ] - 128
if ( deltaRow == 0 && deltaCol == 0 ) {
if ( num_y_points > 0 ) {
luma = 0
lumaX = ( (x - 3) << SubX ) + 3
lumaY = ( (y - 3) << SubY ) + 3
for ( i = 0; i <= SubY; i++ )
for ( j = 0; j <= SubX; j++ )
luma += LumaGrain[ lumaY + i ][ lumaX + j ]
luma = Round2( luma, SubX + SubY )
s0 += luma * c0
s1 += luma * c1
}
break
}
s0 += CbGrain[ y + deltaRow ][ x + deltaCol ] * c0
s1 += CrGrain[ y + deltaRow ][ x + deltaCol ] * c1
pos++
}
}
CbGrain[ y ][ x ] = Clip3( GrainMin, GrainMax, CbGrain[ y ][ x ] + Round2( s0, shift ) )
CrGrain[ y ][ x ] = Clip3( GrainMin, GrainMax, CrGrain[ y ][ x ] + Round2( s1, shift ) )
}
}
Note: When num_y_points is equal to 0, this process may use uninitialized values within ArCoeffsYPlus128 to compute LumaGrain. However, LumaGrain will never be read in this case so it does not matter what values are constructed. Similarly, when num_cr_points/num_cb_points are equal to 0 and chroma_scaling_from_luma_flag is equal to 0, the CbGrain/CrGrain arrays will never be read. {:.alert .alert-info }
This process computes a lookup table for each available color component.
Each lookup table ScalingLut[ plane ] contains 256 entries constructed by a piecewise linear interpolation of the given points as follows:
for ( plane = 0; plane < NumPlanes; plane++ ) {
if ( plane == 0 || chroma_scaling_from_luma_flag )
numPoints = num_y_points
else if ( plane == 1 )
numPoints = num_cb_points
else
numPoints = num_cr_points
if ( numPoints == 0 ) {
for ( x = 0; x < 256; x++ ) {
ScalingLut[ plane ][ x ] = 0
}
} else {
for ( x = 0; x < get_x( plane, 0 ); x++ ) {
ScalingLut[ plane ][ x ] = get_y( plane, 0 )
}
for ( i = 0; i < numPoints - 1; i++ ) {
deltaY = get_y( plane, i + 1 ) - get_y( plane, i )
deltaX = get_x( plane, i + 1 ) - get_x( plane, i )
delta = deltaY * ( ( 65536 + (deltaX >> 1) ) / deltaX )
for ( x = 0; x < deltaX; x++ ) {
v = get_y( plane, i ) + ( ( x * delta + 32768 ) >> 16 )
ScalingLut[ plane ][ get_x( plane, i ) + x ] = v
}
}
for ( x = get_x( plane, numPoints - 1 ); x < 256; x++ ) {
ScalingLut[ plane ][ x ] = get_y( plane, numPoints - 1 )
}
}
}
where the functions get_x() and get_y() return the coordinates for a specific point and are specified as:
get_x( plane, i ) {
if ( plane == 0 || chroma_scaling_from_luma_flag )
return PointYValue[ i ]
else if ( plane == 1 )
return PointCbValue[ i ]
else
return PointCrValue[ i ]
}
get_y( plane, i ) {
if ( plane == 0 || chroma_scaling_from_luma_flag )
return PointYScaling[ i ]
else if ( plane == 1 )
return PointCbScaling[ i ]
else
return PointCrScaling[ i ]
}
The inputs to this process are:
-
variables w and h specifying the width and height of the picture,
-
variables SubX and SubY specifying the subsampling parameters of the picture.
This process combines the film grain with the image data.
First an array of noise data noiseStripe is generated for each 32 luma sample high stripe of the image.
noiseStripe[ lumaNum ][ 0 ] is 34 samples high and w samples wide (a few additional samples across are actually written to the array, but these are never read) and contains noise for the luma component.
noiseStripe[ lumaNum ][ 1 ] and noiseStripe[ lumaNum ][ 2 ] are (34 >> SubY) samples high and Round2(w, SubX) samples wide and contain noise for the chroma components.
noiseStripe represents the result of constructing square grain blocks and blending horizontally adjacent blocks together (although blending is only applied if overlap_flag is equal to 1) and is constructed as follows:
lumaNum = 0
for ( y = 0; y < (h + 1)/2 ; y += 16 ) {
RandomRegister = grain_seed
RandomRegister ^= ((lumaNum * 37 + 178) & 255) << 8
RandomRegister ^= ((lumaNum * 173 + 105) & 255)
for ( x = 0; x < (w + 1)/2 ; x += 16 ) {
rand = get_random_number( 8 )
offsetX = rand >> 4
offsetY = rand & 15
for ( plane = 0 ; plane < NumPlanes; plane++ ) {
planeSubX = ( plane > 0) ? SubX : 0
planeSubY = ( plane > 0) ? SubY : 0
planeOffsetX = planeSubX ? 6 + offsetX : 9 + offsetX * 2
planeOffsetY = planeSubY ? 6 + offsetY : 9 + offsetY * 2
for ( i = 0; i < 34 >> planeSubY ; i++ ) {
for ( j = 0; j < 34 >> planeSubX ; j++ ) {
if ( plane == 0 )
g = LumaGrain[ planeOffsetY + i ][ planeOffsetX + j ]
else if ( plane == 1 )
g = CbGrain[ planeOffsetY + i ][ planeOffsetX + j ]
else
g = CrGrain[ planeOffsetY + i ][ planeOffsetX + j ]
if ( planeSubX == 0 ) {
if ( j < 2 && overlap_flag && x > 0 ) {
old = noiseStripe[ lumaNum ][ plane ][ i ][ x * 2 + j ]
if ( j == 0 ) {
g = old * 27 + g * 17
} else {
g = old * 17 + g * 27
}
g = Clip3( GrainMin, GrainMax, Round2(g, 5) )
}
noiseStripe[ lumaNum ][ plane ][ i ][ x * 2 + j ] = g
} else {
if ( j == 0 && overlap_flag && x > 0 ) {
old = noiseStripe[ lumaNum ][ plane ][ i ][ x + j ]
g = old * 23 + g * 22
g = Clip3( GrainMin, GrainMax, Round2(g, 5) )
}
noiseStripe[ lumaNum ][ plane ][ i ][ x + j ] = g
}
}
}
}
}
lumaNum++
}
Then the noise stripes are blended together to form a noise image noiseImage as follows:
for ( plane = 0; plane < NumPlanes; plane++ ) {
planeSubX = ( plane > 0) ? SubX : 0
planeSubY = ( plane > 0) ? SubY : 0
for ( y = 0; y < ( (h + planeSubY) >> planeSubY ) ; y++ ) {
lumaNum = y >> ( 5 - planeSubY )
i = y - (lumaNum << ( 5 - planeSubY ) )
for ( x = 0; x < ( (w + planeSubX) >> planeSubX) ; x++ ) {
g = noiseStripe[ lumaNum ][ plane ][ i ][ x ]
if ( planeSubY == 0 ) {
if ( i < 2 && lumaNum > 0 && overlap_flag ) {
old = noiseStripe[ lumaNum - 1 ][ plane ][ i + 32 ][ x ]
if ( i == 0 ) {
g = old * 27 + g * 17
} else {
g = old * 17 + g * 27
}
g = Clip3( GrainMin, GrainMax, Round2(g, 5) )
}
} else {
if ( i < 1 && lumaNum > 0 && overlap_flag ) {
old = noiseStripe[ lumaNum - 1 ][ plane ][ i + 16 ][ x ]
g = old * 23 + g * 22
g = Clip3( GrainMin, GrainMax, Round2(g, 5) )
}
}
noiseImage[ plane ][ y ][ x ] = g
}
}
}
Note: Although this process is specified in terms of full size noiseStripe and noiseImage arrays, the reference code shows how it is possible to implement the grain synthesis with just 2 line buffers for luma, and 1 line buffer for each chroma component. {:.alert .alert-info }
Finally, the noise is blended with the original image data as follows:
if ( clip_to_restricted_range_flag ) {
minValue = 16 << (BitDepth - 8)
maxLuma = 235 << (BitDepth - 8)
if ( matrix_coefficients == MC_IDENTITY )
maxChroma = maxLuma
else
maxChroma = 240 << (BitDepth - 8)
} else {
minValue = 0
maxLuma = (256 << (BitDepth - 8)) - 1
maxChroma = maxLuma
}
ScalingShift = grain_scaling_minus8 + 8
for ( y = 0; y < ( (h + SubY) >> SubY) ; y++ ) {
for ( x = 0; x < ( (w + SubX) >> SubX) ; x++ ) {
lumaX = x << SubX
lumaY = y << SubY
lumaNextX = Min( lumaX + 1, w - 1 )
if ( SubX )
averageLuma = Round2( OutY[ lumaY ][ lumaX ] + OutY[ lumaY ][ lumaNextX ], 1 )
else
averageLuma = OutY[ lumaY ][ lumaX ]
if ( num_cb_points > 0 || chroma_scaling_from_luma_flag ) {
orig = OutU[ y ][ x ]
if ( chroma_scaling_from_luma_flag ) {
merged = averageLuma
} else {
combined = averageLuma * ( CbLumaMult - 128 ) + orig * ( CbMult - 128 )
merged = Clip1( ( combined >> 6 ) + ( (CbOffset - 256 ) << (BitDepth - 8) ) )
}
noise = noiseImage[ 1 ][ y ][ x ]
noise = Round2( scale_lut( 1, merged ) * noise, ScalingShift )
OutU[ y ][ x ] = Clip3( minValue, maxChroma, orig + noise )
}
if ( num_cr_points > 0 || chroma_scaling_from_luma_flag) {
orig = OutV[ y ][ x ]
if ( chroma_scaling_from_luma_flag ) {
merged = averageLuma
} else {
combined = averageLuma * ( CrLumaMult - 128 ) + orig * ( CrMult - 128 )
merged = Clip1( ( combined >> 6 ) + ( (CrOffset - 256 ) << (BitDepth - 8) ) )
}
noise = noiseImage[ 2 ][ y ][ x ]
noise = Round2( scale_lut( 2, merged ) * noise, ScalingShift )
OutV[ y ][ x ] = Clip3( minValue, maxChroma, orig + noise )
}
}
}
for ( y = 0; y < h ; y++ ) {
for ( x = 0; x < w ; x++ ) {
orig = OutY[ y ][ x ]
noise = noiseImage[ 0 ][ y ][ x ]
noise = Round2( scale_lut( 0, orig ) * noise, ScalingShift )
if ( num_y_points > 0 ) {
OutY[ y ][ x ] = Clip3( minValue, maxLuma, orig + noise )
}
}
}
where scale_lut is a function that performs a piecewise linear interpolation into the appropriate scaling table. The scale_lut function is specified as follows:
scale_lut( plane, index ) {
shift = BitDepth - 8
x = index >> shift
rem = index - ( x << shift )
if ( BitDepth == 8 || x == 255) {
return ScalingLut[ plane ][ x ]
} else {
start = ScalingLut[ plane ][ x ]
end = ScalingLut[ plane ][ x + 1 ]
return start + Round2( (end - start) * rem, shift )
}
}