Awesome
Finding the best methods for image scaling
There is no such thing as a universally best method or the best kernel function, or... Which methods and settings will be best for you will depend on your use case scenario. In this research, several scenarios will be tested using mpv player (https://mpv.io). Settings will be adjusted for the beset or close to best results on tested images (videos). I will try to restrain myself from making claims that this or that is the best or worst, I will try to base all claims on the test results, but do note that in different scenarios, you may get different results.
Testing methodology
All test images were created directly from the same illustration (https://www.freepik.com/free-vector/vector-illustration-mountain-landscape_1215613.htm#query=illustrations&position=11&from_view=keyword&track=sph) in Adobe Illustrator with added text as extra details, and then they were converted with ffmpeg to mp4 with this command ffmpeg -loop 1 -i input.png -c:v libx264 -t 60 -r 30 -pix_fmt yuv444p output.mp4
. Videos were then upscaled from lower resolutions to 1080p and compared to screenshot taken of the original 1080p video, in case of upscaling. The oposite was done in case of downscaling. For comparation was used dssim (https://en.wikipedia.org/wiki/Structural_similarity) in ImageMagick (https://imagemagick.org) with this command magick compare -metric dssim org.png scaled.png x:
. For taking screenshots Single Menu Screenshot (https://github.com/garamond13/SingleMenuScreenshot) was used. The shaders and the config used in testing were modified as needed.
Notes that there is no perfect methodology for testing. I went with this methodology because of past experiences. This methodology also avoids blurring, ringing, and aliasing artifacts that occur in the preparation of test images by conventional scaling of the original image. Also, note that this image may not be very suitable for testing downscaling as it won’t have serious issues with aliasing. And it's possible that errors ocour during testing.
Upscale (upsample)
Ringing and anti-ringing
Reference on ringing https://en.wikipedia.org/wiki/Ringing_artifacts
Let’s first establish whether we should use anti-ringing in further testing. Here we will test what amount of antiringing produces best results. The amount is in range [0.0, 1.0]. We will use altUpscaleHDR.glsl user shader with kernel-function=lanczos radius=2.0 blur=1.0. All tested resoultions are upscaled to 1080p.
anti-ringing amounts and corresponding dssim result
resolution | 0.0 | 0.75 | 0.99 | 1.0 |
---|---|---|---|---|
720 | 0.0238535 | 0.0235485 | 0.0235562 | 0.0234667 |
540 | 0.0372175 | 0.0365213 | 0.0364534 | 0.0363607 |
360 | 0.0646968 | 0.0637967 | 0.0636646 | 0.0636103 |
Based on these results, it should be safe to assume that the optimal setting for anti-ringing is 1.0. However, we will perform a few more tests. We will use altUpscaleHDR.glsl user shader with kernel-function=welch radius=2.0 blur=1.0.
anti-ringing amounts and corresponding dssim result
resolution | 0.99 | 1.0 |
---|---|---|
720 | 0.0229898 | 0.0228918 |
540 | 0.0360343 | 0.0359319 |
360 | 0.0632565 | 0.0631909 |
Now we will use altUpscaleHDR.glsl user shader with kernel-function=bicubic a=-0.5.
anti-ringing amounts and corresponding dssim result
resolution | 0.75 | 1.0 |
---|---|---|
720 | 0.0236675 | 0.0235908 |
540 | 0.0366226 | 0.0364864 |
360 | 0.0638264 | 0.0636606 |
Again we are getting consistent results where 1.0 looks to be most optimal. Based on these results, we will use an anti-ringing value of 1.0 for further testing.
Kernel functions
Reference on kernel https://en.wikipedia.org/wiki/Kernel_(image_processing)
Kernel functions are used for the calculation of kernel weights. We will test a few types of scaling filters and several kernel functions. The filters we are going to test are mpv's ewa or polar (mpv --vo=gpu-next), at the time of the research my experimental jinc (which is similar to mpv's ewa or polar), and orthogonal or separated (alt-scale shaders). Kernel functions we are going to use here will be windowed sinc or jinc, or their alternatives.
Let’s first have a look at windowed sinc and the following windows: cosine, welch, lanczos (sinc window), hann, and hamming. For all of these windows, it can be said that they are controlled by the kernel radius. We will test at what radius these windows are giving best results. We will use altUpscaleHDR.glsl user shader with blur=1.0 anti-ringing=1.0.
720p
window | radius | dssim |
---|---|---|
lanczos | 4.6 | 0.0221655 |
cosine | 4.5 | 0.0221147 |
welch | 4.5 | 0.0220882 |
hann | 6.6 | 0.0221957 |
hamming | 5.9 | 0.0221789 |
540p
window | radius | dssim |
---|---|---|
lanczos | 4.3 | 0.0354828 |
cosine | 4.3 | 0.035486 |
welch | 4.3 | 0.0354907 |
hann | 4.9 | 0.0354921 |
hamming | 4.6 | 0.035502 |
360p
window | radius | dssim |
---|---|---|
lanczos | 2.7 | 0.0628738 |
cosine | 2.6 | 0.0627836 |
welch | 2.5 | 0.0627289 |
hann | 3.4 | 0.0629509 |
hamming | 3.7 | 0.0628955 |
Here, we can notice that all windows can achieve similar results. However, one other thing to note is that hann and hamming usually need a larger radius to achieve those results, which makes them worse in terms of performance.
Now lets test mpv's ewa or polar with these settings added to base config scale=ewa_lanczos scaler-lut-size=10 scale-window= scale-radius= scale-cutoff=0.0 scale-blur=1.0. For antiringing we will use https://github.com/haasn/gentoo-conf/blob/xor/home/nand/.mpv/shaders/antiring.hook without chroma part (default settings).
Note: dont confuse ewa_lanczos and lanczos, ewa_lanczos is jinc windowed jinc and lanczos is sinc windowed sinc, but sinc window is also called lanczos window. Here we will only test sinc windowed jinc (under the name lanczos) also known as ewa_ginseng.
720p
window | radius | dssim |
---|---|---|
lanczos | 3.0 | 0.0228769 |
cosine | 2.9 | 0.0226756 |
welch | 2.8 | 0.0225573 |
hann | 5.3 | 0.0227386 |
hamming | 4.4 | 0.022792 |
540p
window | radius | dssim |
---|---|---|
lanczos | 2.9 | 0.036318 |
cosine | 2.9 | 0.036318 |
welch | 2.7 | 0.0359962 |
hann | 3.4 | 0.0366169 |
hamming | 4.0 | 0.0365035 |
360p
window | radius | dssim |
---|---|---|
lanczos | 2.9 | 0.0638985 |
cosine | 2.7 | 0.0635906 |
welch | 2.7 | 0.0634135 |
hann | 3.3 | 0.0641833 |
hamming | 4.1 | 0.0641346 |
The results are very consistent. For similar radiuses, lanczos, cosine, and welch achieve their best overall results at lower radiuses compared to hann and hamming. This is why I will only test lanczos welch and hann in my experimental jinc (usually if not mentioned assume antiringing=1.0 and blur=1.0).
720p
window | radius | dssim |
---|---|---|
lanczos | 3.0 | 0.0227903 |
welch | 2.8 | 0.0224323 |
hann | 5.3 | 0.0227661 |
540p
window | radius | dssim |
---|---|---|
lanczos | 2.9 | 0.0365359 |
welch | 2.7 | 0.036151 |
hann | 3.4 | 0.0368122 |
360p
window | radius | dssim |
---|---|---|
lanczos | 2.9 | 0.0642838 |
welch | 2.7 | 0.0638784 |
hann | 3.3 | 0.0645392 |
Again we are getting consitent results. Based on the last tests, we could conclude that windows controlled by radius can achieve around similar results, but some windows need higher radiuses to achieve those results.
So far, we have tested windows that don’t offer any control with free parameters. We only get to control them with radius. Now we will test windows with one free parameter blackman, power of cosine, and here I will present the new window with one free parameter.
w(x) = 1.0 - pow(abs(x) / R, n)
, where n is in the range (0.0, +inf] and R is the kernel radius
at n=1.0, w(x)=linear window
at n=2.0, w(x)=welch window
as n aproaches +inf, w(x)=box window
I will refer to it as the garamond window.
Taking into consideration the fact that at n=2.0, the garamond window is exactly the same as the welch window, we will use the previous results of welch and test them against the garamond window at the same radius, and possibly lower radius. We will use my experimental jinc.
720p
window | radius | dssim |
---|---|---|
welch | 2.8 | 0.0224323 |
garamond n=4.0 | 2.8 | 0.0221757 |
garamond n=3.7 | 2.7 | 0.0221628 |
Using the garamond window, we were able to achieve better result at the same radius, and we were able to lower the radius further while achieving even better results.
Lets now test power of cosine with free parameter n.
Few interisting facts about it:
at n=0.0, box window
at n=1.0, cosine window
at n=2.0, hann window
We will use my experimental jinc and compare the results to welch because it achieved the best result in the previous test.
540p
window | radius | dssim |
---|---|---|
welch | 2.7 | 0.036151 |
pow cosine n=0.6 | 2.7 | 0.0360422 |
pow cosine n=0.4 | 2.5 | 0.0357687 |
Again we were able to achive better resut at the same radius, and we were able to lower the radius further while achieving even better results.
Now lets take the best result from earlier test and test blackman against it. (at a=0.16, common blackman). We will use altUpscaleHDR.glsl.
720p
window | radius | dssim |
---|---|---|
welch | 4.5 | 0.0220882 |
blackman a=-0.8 | 4.5 | 0.0219042 |
blackman a=-0.7 | 3.6 | 0.0218609 |
Again, we are getting consistent results. Based on this, we could conclude that these windows, still controlled by radius, and now with one free parameter, can achieve better results and further reduce the needed radius for those results compared to windows controlled only by radius.
Now, we will test the generalized normal window (gnw) and said, windows with two free parameters. One thing to note about them is that they are not affected by radius directly, as previously tested windows. We will take the best results from earlier test and test both against. We will use altUpscaleHDR.glsl.
720p
window | radius | dssim |
---|---|---|
welch | 4.5 | 0.0220882 |
blackman a=-0.7 | 3.6 | 0.0218609 |
gnw s=4.9 n=3.5 | 3.9 | 0.0219217 |
said chi=0.16 eta=1.0 | 4.0 | 0.0220121 |
Here we can see that gnw and said can achive similar results. Now we will use my experimental jinc.
540p
window | radius | dssim |
---|---|---|
welch | 2.7 | 0.036151 |
pow cosine n=0.4 | 2.5 | 0.0357687 |
gnw s=3.0 n=2.9 | 2.5 | 0.0358475 |
said chi=0.17 eta=0.0 | 2.5 | 0.0358198 |
And we are getting consistent results. Based on this, we could conclude that we are able to get at least close to the best results compared to kernel functions with one free parameter (which are affected by radius, so we could count it as a second parameter as well).
Kernel blur
With the kernel blur we can control the kernel funcion's central lobe width. So far, we have conducted tests with a blur value of 1.0, which means neutral or off. Now, we will test whether we can improve results by adjusting the kernel blur. We will use altUpscaleHDR.glsl.
720p
window | radius | blur | dssim |
---|---|---|---|
blackman a=-0.7 | 3.6 | 1.0 | 0.0218609 |
blackman a=-0.7 | 3.6 | 0.93 | 0.0215603 |
Now we will use my experimental jinc.
540p
window | radius | blur | dssim |
---|---|---|---|
pow cosine n=0.4 | 2.5 | 1.0 | 0.0357687 |
pow cosine n=0.4 | 2.5 | 0.89 | 0.0346785 |
By adjusting the kernel blur, we were able to improve some of the best previous results.
Sigmoidal upscale
Reference https://legacy.imagemagick.org/Usage/color_mods/#sigmoidal
So far, tests were done in gamma light because of simplicity. Now, we will test upscaling in sigmoidal light. We control sigmoidal curve with contrast (c) and midpoint (m) parameters. Now we will compare gamma upscale (previous results altUpscaleHDR.glsl) and same settings in sigmoidal light using altUpscale.glsl.
720p
window | radius | blur | dssim |
---|---|---|---|
blackman a=-0.7 (gamma) | 3.6 | 0.93 | 0.0215603 |
blackman a=-0.7 (c=6.0 m=0.6) | 3.6 | 0.93 | 0.0214525 |
And now, I will modify my experimental jinc to upscale in sigmoidal light. For simplicity, I will use the same settings for sigmoidal light control.
540p
window | radius | blur | dssim |
---|---|---|---|
pow cosine n=0.4 (gamma) | 2.5 | 0.89 | 0.0346785 |
pow cosine n=0.4 (c=6.0 m=0.6) | 2.5 | 0.89 | 0.034432 |
By performing upscale in sigmoidal light, we were able to improve the results. Based on this, we could conclude that performing upscale in sigmoidal light can achieve better results. However, for most of the further testing, I will continue to use gamma light because of simplicity.
Post-upscale sharpening
Now, we will test post-upscale sharpening, essentially, can we further improve the results? We will use unsharp mask which is controled by sigma value (s), its kernel radius (r) and sharpening amount (a). We wil still use sigmoidal light altUpscale.glsl.
720p
window | radius | blur | dssim |
---|---|---|---|
blackman a=-0.7 (c=6.0 m=0.6) no sharpening | 3.6 | 0.93 | 0.0214525 |
blackman a=-0.7 (c=6.0 m=0.6) s=1.1 r=2.0 a=0.1 | 3.6 | 0.93 | 0.0213019 |
And we are able to further iprove results by sharpening image after the upscale.
Alternative kernel functions
Alternative kernel functions are mostly designed as alternatives to sinc. However, in this case, we will only test a few adjustable alternatives, meaning we should be able to adjust them to perform well with jinc as well. We will test bicubic with one free parameter (a) and a fixed radius of 2.0, bc-spline with two parameters (b) and (c) and a fixed radius of 2.0. Additionally, I will present here the new modified fsr kernel based on https://github.com/GPUOpen-Effects/FidelityFX-FSR .The original fsr kernel has one parameter (b) and a fixed radius of 2.0. The new modified fsr kernel has two parameters (b) and (c). The c parameter will have a similar effect to the kernel blur in windowed sinc and windowed jinc kernel functions. It will also have a similar effect to the (b) parameter of bc-spline.
modified fsr kernel, fixed radius 2.0
b != 0.0 && b != 2.0 && c != 0.0
(1.0 / (2.0 * b - b * b) * (b / (c * c) * x * x - 1.0) * (b / (c * c) * x * x - 1.0) - (1.0 / (2.0 * b - b * b) - 1.0)) * (0.25 * x * x - 1.0) * (0.25 * x * x - 1.0)
at c=1.0, the original fsr kernel
Now we will test bicubic using altUpscaleHDR.glsl.
resolution | a | dssim |
---|---|---|
720 | -1.1 | 0.0225256 |
540 | -0.8 | 0.0359437 |
360 | -0.9 | 0.0631531 |
Now we will test bicubic using my experimental jinc.
resolution | a | dssim |
---|---|---|
720 | -0.7 | 0.0224556 |
540 | -0.7 | 0.0347427 |
360 | -0.6 | 0.0622983 |
Now we will test bc-spline using altUpscaleHDR.glsl.
resolution | b and c | dssim |
---|---|---|
720 | b=-0.5, c=1.1 | 0.0219281 |
540 | b=-0.5 c=0.9 | 0.0346389 |
360 | b=-0.7 c=0.9 | 0.0619001 |
Now we will test bc-spline using my experimental jinc.
resolution | b and c | dssim |
---|---|---|
720 | b=0.3 c=0.7 | 0.0218987 |
540 | b=0.2 c=0.7 | 0.0343549 |
360 | b=0.2 c=0.6 | 0.0621524 |
Now we will test modified fsr using altUpscaleHDR.glsl.
resolution | b and c | dssim |
---|---|---|
720 | b=0.2, c=0.95 | 0.0218886 |
540 | b=0.2 c=0.88 | 0.034506 |
360 | b=0.2 c=0.94 | 0.0625743 |
Now we will test modified fsr using my experimental jinc.
resolution | b and c | dssim |
---|---|---|
720 | b=0.43 c=1.06 | 0.0220508 |
540 | b=0.44 c=1.03 | 0.0344618 |
360 | b=0.44 c=1.04 | 0.0623959 |
Here, we are able to get results that are close to the windowed sinc and windowed jinc kernel functions. Based on these results, we could conclude that it is easier to achieve good results with bc-spline and modified fsr kernel compared to bicubic. This probably shouldn’t be surprising as bicubic has only one parameter compared to the two parameters in the previously mentioned kernel functions. Additionally, note that all three kernel functions use kernels with a radius of 2.0.
Downscale (downsample)
Downscaling will be kept a bit simpler because it’s more straightforward if done correctly. The correct way to do downscaling is to use linear light because otherwise, the image can be significantly darkened (reference http://www.ericbrasseur.org/gamma.html). The kernel has to be scaled appropriately, and antiringing is generally bad because it can increase aliasing. These rules can be broken and yield better results on some images, but generally, for video with a lot of different frames (images), these rules should probably be enforced.
We will skip testing of many windows since we have already established that more adjustable windows can be adjusted for better results. We will test the power of the cosine window because for n=1 it’s a cosine window, and for n=2, it’s a hann window. We will also test the garamond window because for n=1, it’s a linear window, and for n=2, it’s a Welch window. Now we will use altDownscale.glsl.
540p
window | radius | dssim |
---|---|---|
lanczos | 2.2 | 0.0184956 |
pow cosine n=0.8 | 2.0 | 0.0185596 |
garamond n=0.1 | 2.4 | 0.0173727 |
said chi=0.4 eta=0.0 | 2.0 | 0.0183955 |
720p
window | radius | dssim |
---|---|---|
lanczos | 2.1 | 0.0137789 |
pow cosine n=1.3 | 2.0 | 0.0140686 |
garamond n=0.1 | 2.4 | 0.0132567 |
said chi=0.5 eta=0.0 | 2.1 | 0.0138891 |
Based on these results we could conclude that the garamond window can be adjusted for the best results. Now we will use my experimental jinc.
540p
window | radius | dssim |
---|---|---|
lanczos | 2.5 | 0.0204201 |
pow cosine n=0.6 | 2.1 | 0.0206342 |
garamond n=0.1 | 2.7 | 0.0192115 |
said chi=0.4 eta=0.0 | 2.2 | 0.0202307 |
And we are getting consistent results. Now we will test can we improve the results by adjusting the kernel blur. First we will use altDownscale.glsl.
540p
window | radius | blur | dssim |
---|---|---|---|
garamond n=0.1 | 2.4 | 1.0 | 0.0173727 |
garamond n=0.1 | 2.4 | 0.75 | 0.0153649 |
Now we will use my experimental jinc.
540p
window | radius | blur | dssim |
---|---|---|---|
garamond n=0.1 | 2.7 | 1.0 | 0.0192115 |
garamond n=0.1 | 2.7 | 0.69 | 0.0155021 |
Based on these results, we could conclude that the results can be improved by adjusting the kernel blur.
Now we will test alternative kernel functions. We will test only bc-spline and the modified fsr kernel. We will use altDownscale.glsl.
540p
window | parameters | dssim |
---|---|---|
bcspline | b=-0.5 c=0.6 | 0.0169073 |
modified fsr | b=0.45 c=0.94 | 0.0177153 |
And now we will use my experimental jinc.
540p
window | parameters | dssim |
---|---|---|
bcspline | b=-0.1 c=0.2 | 0.016042 |
modified fsr | b=0.53 c=0.94 | 0.0164078 |
These results may not be too impressive, but note that these kernel functions should be significantly faster and, on top of that, use a radius of 2.0.
Finally, I’m going to just mention that downscaling could be further improved by using pre-filtering (gaussian blur), which will reduce aliasing, and we could use post-scale sharpening for obvious reasons.