Video Alignment functions for Vapoursynth

Useful when two sources are available and you would like to combine them in certain ways, which only becomes possible once they are perfectly aligned. For example: doing a color transfer, patching black-crushed areas, transferring textures, creating a paired dataset, or combining high-resolution Blu-ray chroma with better DVD luma.

Requirements

pytorch
pip install numpy
pip install pyiqa && pip install -U setuptools (optional, only for temporal alignment precision=3)
julek-plugin (optional, only for temporal alignment precision=2)
tivtc (optional, only for temporal alignment with different frame rates)

Setup

Put the entire vs_align folder into your vapoursynth scripts folder.
Or install via pip: pip install git+https://github.com/pifroggi/vs_align.git

Spatial Alignment

Aligns the content of a frame to a reference frame using a modified RIFE AI model. Remove any black borders from the frames beforehand. The output clip will have the same dimensions as the reference clip, so resize the reference clip to get the desired output scale. Examples: https://slow.pics/c/rqeq3D97

import vs_align
clip = vs_align.spatial(clip, ref, precision=3, iterations=1, blur_strength=0, device="cuda")

clip
Misaligned clip. Must be in RGBS format.

ref
Reference clip that misaligned clip will be aligned to. Must be in RGBS format.

precision
1, 2, 3, 4, or 5. Higher values will internally align at higher resolutions to increase precision. Each step up doubles the internal resolution, which will in turn increase processing time and VRAM usage. Lower values are less precise, but can correct larger misalignments. 3 works great in most cases.

iterations (optional)
Runs the alignment multiple times to dial it in even further. With more than around 5 passes, artifacts can appear.

blur_strength (optional)
Blur is only used internally and will not be visible on the output. It can help the alignment process ignore small details (like compression artifacts, noise, or halos) and focus more on the general shapes. If lines on the output get thinner or thicker, try increasing the blur a little. Blur reduces accuracy, so keep it as low as possible. Good values are 0-10. The best alignment will be at blur 0.

device (optional)
Possible values are "cuda" to use with an Nvidia GPU, or "cpu". This will be very slow on CPU.
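
As a rough illustration of how precision affects cost, the sketch below computes the internal resolution factor relative to precision=1, based only on the statement that each step doubles the internal resolution. The helper name `relative_internal_scale` is hypothetical and not part of vs_align, and the absolute resolution used at precision=1 is not stated here.

```python
def relative_internal_scale(precision):
    """Internal resolution factor relative to precision=1.

    Hypothetical helper, not part of vs_align. It only encodes the
    documented rule that each precision step doubles the resolution.
    """
    if precision not in (1, 2, 3, 4, 5):
        raise ValueError("precision must be 1, 2, 3, 4, or 5")
    return 2 ** (precision - 1)


# Factor for every allowed precision value, relative to precision=1.
scales = {p: relative_internal_scale(p) for p in range(1, 6)}
print(scales)
```

If the doubling applies to both width and height (an assumption), the number of pixels processed grows with the square of this factor, which would explain why processing time and VRAM usage rise quickly at precision 4 and 5.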

Temporal Alignment

Syncs two clips in time by searching through one clip and selecting the frame that most closely matches the reference clip frame. It is recommended to minimize the difference between the two clips by preprocessing, for example by removing black borders, cropping to the overlapping region, rough color matching, or dehaloing. The closer the clips look to each other, the better the temporal alignment will be. Adapted from decimatch by po5.

import vs_align
clip = vs_align.temporal(clip, ref, clip2=None, tr=20, precision=1, fallback=None, thresh=40, device="cuda", fp16=False, debug=False)

clip
Misaligned clip. Must be same format and dimensions as ref.

ref
Reference clip that misaligned clip will be aligned to. Must be same format and dimensions as clip.

clip2 (optional)
Clip and ref will be used for the calculations, but the actual output frame is then copied from clip2 if set. This is useful if you would like to do preprocessing on clip and ref (like downsizing to increase speed), but would like the output frame to be unaltered.

tr
Temporal radius. How many frames it will search forward and back to find a match.
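
The search within the temporal radius can be pictured with the following pure-Python sketch. It is not the vs_align implementation: frames are modeled as plain lists of pixel values, and `mean_abs_diff` and `best_match` are hypothetical names, with the mean absolute difference standing in for the real difference metric.

```python
def mean_abs_diff(a, b):
    """Stand-in difference metric between two frames (lists of pixel values)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def best_match(clip, ref, n, tr):
    """Return (index, difference) of the clip frame closest to ref[n].

    Only clip frames from n - tr to n + tr are searched, clamped to the
    valid frame range of clip.
    """
    first = max(0, n - tr)
    last = min(len(clip) - 1, n + tr)
    diffs = {i: mean_abs_diff(clip[i], ref[n]) for i in range(first, last + 1)}
    best = min(diffs, key=diffs.get)
    return best, diffs[best]


# Toy data: clip is ref delayed by two unrelated frames.
ref = [[0, 0], [10, 10], [20, 20], [30, 30], [40, 40]]
clip = [[90, 90], [91, 91]] + ref

print(best_match(clip, ref, 1, tr=3))  # radius is large enough to reach the true match
print(best_match(clip, ref, 1, tr=1))  # radius too small, the best available frame is wrong
```

In this toy example the true match sits two frames away, so tr must be at least 2 to find it. A larger tr tolerates larger offsets at the cost of more comparisons per frame.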

precision

Value	Precision	Speed	Use case	Method
1	worst	very fast	when clips are identical besides the temporal misalignment	PlaneStats
2	better	slow	more robust to differences between clips	Butteraugli
3	best	very slow	extremely accurate with large differences and spatial misalignments between clips	TOPIQ

fallback (optional)
Optional fallback clip in case no frame below thresh can be found. Must have the same format and dimensions as clip (or clip2 if it is set).

thresh (optional)
Threshold for fallback clip. If frame difference is higher than this value, fallback clip is used. Use "debug=True" to get an idea for the values.
Does nothing if no fallback clip is set.
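
The interaction between fallback and thresh can be summarized by the small sketch below. The function `select_frame` is a hypothetical illustration of the rule described above, not code from vs_align, and frames are represented by simple placeholder strings.

```python
def select_frame(best_frame, best_diff, fallback_frame=None, thresh=40):
    """Pick the output frame according to the documented fallback rule.

    The fallback frame is used only if one is provided and the best
    match's difference is higher than thresh.
    """
    if fallback_frame is not None and best_diff > thresh:
        return fallback_frame
    return best_frame


print(select_frame("matched", 55, fallback_frame="fallback"))  # difference above thresh
print(select_frame("matched", 12, fallback_frame="fallback"))  # difference below thresh
print(select_frame("matched", 55))                             # no fallback clip set
```

Without a fallback clip the best match is always returned, no matter how large its difference is, which is why thresh has no effect in that case.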

device, fp16 (optional)
"cpu", or "cuda" with an Nvidia GPU. Fp16 will give a slight speed boost and halve VRAM usage if the GPU supports it.
Only has an effect with "precision=3", which will be very slow on CPU.

debug (optional)
Overlays computed difference values for all surrounding frames and the best match directly onto the frame.

clip_num, clip_den, ref_num, ref_den (optional)
Frame rate numerator and denominator for clip and ref (clip2 uses the same values as clip). When set, clip is resampled to match ref's frame rate. Set these only if clip and ref have different frame rates (e.g., 29.97fps and 23.976fps), as doing so will double processing time. Requires all input clips to be in YUV8..16 format.
To avoid removal of the wrong frames during resampling, frames are doubled, resampled, aligned, then halved again.
Example: clip_num=30000, clip_den=1001, ref_num=24000, ref_den=1001
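
To make the numerator and denominator values less abstract, the sketch below uses Python's standard `fractions` module to show how the example values map to the familiar frame rates and to build the keyword arguments. The variable names are illustrative only, and how frames are actually selected during resampling is internal to the plugin.

```python
from fractions import Fraction

# Exact frame rates from the example above.
clip_fps = Fraction(30000, 1001)  # commonly written as 29.97fps
ref_fps = Fraction(24000, 1001)   # commonly written as 23.976fps

# Ratio of output frames to input frames when going from clip's rate to ref's rate.
ratio = ref_fps / clip_fps

# Keyword arguments in the form expected by the parameters described above.
fps_kwargs = {
    "clip_num": clip_fps.numerator,
    "clip_den": clip_fps.denominator,
    "ref_num": ref_fps.numerator,
    "ref_den": ref_fps.denominator,
}

print(round(float(clip_fps), 3), round(float(ref_fps), 3))
print(ratio)
print(fps_kwargs)
```

The ratio of 4/5 means that, on average, four out of every five frames of clip remain after resampling. The dictionary could then be passed along with the other arguments, for example `vs_align.temporal(clip, ref, **fps_kwargs)`.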

Tips & Troubleshooting

Enums are available in vs_align/enums.py if needed.
For problematic cases of spatial misalignment, it can be helpful to chain multiple alignment calls with increasing precision.
Temporal Alignment precision=3 may need a little time on the first run, as the model needs to download first.
Even at half or quarter resolution, Temporal Alignment precision=2 and 3 are still more accurate than precision=1.
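
The tip about chaining spatial alignment calls with increasing precision could look roughly like the sketch below. The helper `align_coarse_to_fine` is hypothetical and not part of vs_align. It takes the alignment function as an argument (in a real script this would be `vs_align.spatial`), so the chaining logic can be shown without VapourSynth installed. The precision sequence (1, 2, 3) is only an assumed starting point.

```python
def align_coarse_to_fine(clip, ref, spatial_fn, precisions=(1, 2, 3), **kwargs):
    """Run spatial_fn repeatedly, feeding each result into the next pass.

    Low precision passes correct large misalignments first, higher
    precision passes then refine the result. Extra keyword arguments
    (e.g. device) are forwarded unchanged to every call.
    """
    for precision in precisions:
        clip = spatial_fn(clip, ref, precision=precision, **kwargs)
    return clip


# Stand-in for vs_align.spatial that only records how it was called.
calls = []

def fake_spatial(clip, ref, precision, **kwargs):
    calls.append((precision, kwargs))
    return clip + [precision]

result = align_coarse_to_fine([], "ref", fake_spatial, device="cpu")
print(result)
print(calls)
```

In a VapourSynth script the call would be `align_coarse_to_fine(clip, ref, vs_align.spatial, device="cuda")`, with clip and ref already converted to RGBS.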