Glossary

The AI video dictionary

Every term you'll meet making video with AI — from camera and codec basics to diffusion, avatars and agentic generation — defined in plain English.

74 terms · Video · AI · AI video

Agentic video · AI video
An AI agent that plans and runs the whole production — script, scenes, voice, avatars and edits — from a single brief, rather than one clip at a time.
AI avatar · AI video
An AI-generated or cloned on-screen presenter that can speak your script in a chosen voice and language.
AI dubbing · AI video
Replacing or adding spoken audio in another language, ideally matched to the speaker's voice and lip movements.
Aspect ratio · Video
The width-to-height proportion of the frame — 16:9 (widescreen), 9:16 (vertical for Reels and TikTok), or 1:1 (square). It decides how your video fits each platform and screen.
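You can find the aspect ratio of a frame by dividing both pixel dimensions by their greatest common divisor. The Python sketch below illustrates this. The function names `aspect_ratio` and `height_for` are made up for this example.

```python
from math import gcd

def aspect_ratio(width: int, height: int) -> str:
    """Reduce pixel dimensions to their simplest width:height ratio."""
    d = gcd(width, height)
    return f"{width // d}:{height // d}"

def height_for(width: int, ratio_w: int, ratio_h: int) -> int:
    """Frame height that matches a given width at a target ratio (rounded)."""
    return round(width * ratio_h / ratio_w)

print(aspect_ratio(1920, 1080))  # 16:9 (widescreen)
print(aspect_ratio(1080, 1920))  # 9:16 (vertical)
```

Some dimensions don't reduce to their marketing label. For example, 2560×1080 comes out as 64:27, which is usually sold as "21:9".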
B-roll · Video
Supplemental footage cut in over the main shot to add context, illustrate a point, or hide an edit.
Bitrate · Video
How much data is used per second of video, measured in kbps or Mbps. A higher bitrate keeps more detail but makes larger files.
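Bitrate is data per second, so file size is roughly bitrate × duration ÷ 8 (converting bits to bytes). Here is a minimal Python sketch. It assumes a constant bitrate and ignores container overhead. `estimate_file_size_mb` is a hypothetical helper, not part of any tool.

```python
def estimate_file_size_mb(video_kbps: float, audio_kbps: float, seconds: float) -> float:
    """Approximate file size in megabytes (10**6 bytes) at a constant total bitrate."""
    total_kilobits = (video_kbps + audio_kbps) * seconds
    total_bytes = total_kilobits * 1000 / 8  # 1 kilobit = 1000 bits; 8 bits per byte
    return total_bytes / 1_000_000
```

For example, 8,000 kbps of video plus 128 kbps of audio for 60 seconds comes to about 61 MB. Real encoders usually vary the bitrate from scene to scene, so treat this as a ballpark figure.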
Bokeh · Video
The soft, pleasing out-of-focus blur in the background of a shot, often rendered as glowing circles of light.
Camera control · AI video
Directing virtual camera moves — pan, zoom, orbit, dolly — within an AI-generated shot.
Captions / subtitles · Video
On-screen text of the spoken audio. Captions also note sounds and speakers for accessibility; subtitles usually transcribe or translate the dialogue.
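Captions are often delivered as a separate sidecar file. The sketch below writes the widely used SubRip (.srt) layout. Each cue has a number, a `start --> end` time line with comma-separated milliseconds, the text, and then a blank line. The helper names are illustrative, and the second cue shows a caption describing a sound.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def to_srt(cues):
    """cues: list of (start_seconds, end_seconds, text) tuples, in playback order."""
    blocks = []
    for i, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)  # blank line between cues

print(to_srt([(0.0, 2.5, "Welcome to the studio."), (2.5, 5.0, "[upbeat music]")]))
```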
Checkpoint · AI
A saved snapshot of a model's weights. Checkpoints are often shared as the downloadable 'model file' people run.
Chroma key (green screen) · Video
Replacing a solid-coloured background — usually green — with another image or video by making that colour transparent.
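At its simplest, keying tests each pixel for "green enough" and swaps in the background there. This Python sketch uses a crude rule: green must beat both red and blue by a fixed `threshold`, a value chosen arbitrarily here. Real keyers produce soft mattes, suppress colour spill and often work in other colour spaces.

```python
Pixel = tuple[int, int, int]  # (R, G, B), each 0-255

def is_key_green(p: Pixel, threshold: int = 40) -> bool:
    """True if green clearly dominates red and blue (a simple heuristic)."""
    r, g, b = p
    return g - r > threshold and g - b > threshold

def chroma_key(fg: list[list[Pixel]], bg: list[list[Pixel]], threshold: int = 40) -> list[list[Pixel]]:
    """Composite fg over bg, showing bg wherever fg is green screen.

    fg and bg are same-sized frames given as rows of pixels.
    """
    return [
        [b if is_key_green(f, threshold) else f for f, b in zip(f_row, b_row)]
        for f_row, b_row in zip(fg, bg)
    ]
```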
Codec · Video
The algorithm that compresses and decompresses video — such as H.264, H.265/HEVC, AV1 or VP9. It balances visual quality against file size.
Colour grading · Video
The creative step of adjusting the colour, contrast and mood of footage in post-production to give it a consistent, intentional look.
Compositing · Video
Layering multiple visual elements — footage, graphics, effects, text — into a single combined frame.
Container (file format) · Video
The file wrapper that holds the video, audio and metadata together — MP4, MOV, WebM or MKV. It is separate from the codec stored inside it.
Dataset · AI
The collection of examples — videos, images, text — that a model is trained on. Its quality and variety shape what the model can do.
Deep learning · AI
Machine learning that uses many-layered neural networks. It powers today’s image, video, voice and language models.
Deepfake · AI video
Synthetic media that realistically swaps or fabricates someone's face or voice. Powerful, but it raises real consent, authenticity and legal concerns.
Depth of field · Video
The range of distances in front of the camera that appear acceptably sharp. A shallow depth of field keeps only the subject in focus and blurs the background, making the subject pop.
Diffusion model · AI
The dominant approach behind AI images and video: the model starts from random noise and, step by step, removes it until a coherent result matching your prompt appears.
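The loop below sketches this noise-removal process with a deterministic, DDIM-style update. The "model" is a hypothetical stand-in that cheats by already knowing the target. A real model learns to predict the noise from its training data and your prompt. `alpha_bars` is an assumed schedule of signal levels, from mostly noise (near 0) to a clean result (1.0).

```python
import math
import random

def ddim_sample(predict_noise, alpha_bars, size, seed=0):
    """Deterministic DDIM-style sampler (eta = 0) on a flat list of numbers.

    alpha_bars: signal levels from noisiest to cleanest; the last must be 1.0.
    predict_noise(x, alpha_bar): returns the estimated noise contained in x.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(size)]  # start from random noise
    for a_t, a_next in zip(alpha_bars, alpha_bars[1:]):
        eps = predict_noise(x, a_t)
        # Estimate the clean result by removing the predicted noise...
        x0_hat = [(xi - math.sqrt(1 - a_t) * e) / math.sqrt(a_t) for xi, e in zip(x, eps)]
        # ...then step to the next, less noisy level (pure x0_hat when a_next is 1.0).
        x = [math.sqrt(a_next) * c + math.sqrt(1 - a_next) * e for c, e in zip(x0_hat, eps)]
    return x

# Hypothetical "perfect" model for demonstration only: it knows the answer.
TARGET = [0.5, -0.25, 1.0]

def oracle_noise(x, a):
    return [(xi - math.sqrt(a) * t) / math.sqrt(1 - a) for xi, t in zip(x, TARGET)]

result = ddim_sample(oracle_noise, [0.01, 0.1, 0.3, 0.6, 0.9, 1.0], size=3)
```

Each pair of neighbouring values in `alpha_bars` is one sampling step, so this schedule takes five.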
Digital human / digital twin · AI video
A photorealistic AI replica of a real person, trained once and reused as an on-camera presenter.
Embedding · AI
A list of numbers (a vector) that captures the meaning of text, an image or audio, so the model can compare and combine different inputs.
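Embeddings are commonly compared with cosine similarity: vectors pointing in similar directions score close to 1. The three-number vectors below are invented for illustration. Real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1 = same direction, 0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up toy embeddings.
cat_clip = [0.9, 0.1, 0.2]
kitten_prompt = [0.85, 0.15, 0.25]
car_prompt = [0.1, 0.9, 0.3]

print(cosine_similarity(cat_clip, kitten_prompt) > cosine_similarity(cat_clip, car_prompt))  # True
```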
Establishing shot · Video
A wide opening shot that sets the location and context of a scene before you cut in closer.
Fine-tuning · AI
Further training a base model on specific data to specialise it — for a particular style, brand or person.
First & last frame · AI video
Supplying a start frame and/or an end frame that the model animates between, giving you precise control over a shot's beginning and end.
Foundation model · AI
A large, general-purpose model trained on broad data that can be adapted to many downstream tasks.
Frame interpolation · AI video
Generating in-between frames to raise the frame rate or smooth motion — for example turning 24fps into silky 60fps.
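Whatever the method, an interpolator first has to work out where each new frame sits between the originals. This sketch covers that timing arithmetic, plus a naive cross-fade as a baseline. AI interpolators instead estimate motion and synthesise genuinely new in-between images, which avoids the ghosting a plain blend produces. Both function names are hypothetical.

```python
def source_position(out_index: int, src_fps: float, dst_fps: float) -> tuple[int, int, float]:
    """Where output frame `out_index` falls in the source clip.

    Returns (frame_before, frame_after, weight), where weight (0-1) is how far
    the new frame sits between the two source frames.
    """
    t = out_index * src_fps / dst_fps  # position measured in source frames
    before = int(t)                    # floor, since t is never negative here
    return before, before + 1, t - before

def cross_fade(frame_a: list[float], frame_b: list[float], weight: float) -> list[float]:
    """Naive in-between frame: a weighted blend of pixel values (not motion-aware)."""
    return [(1 - weight) * a + weight * b for a, b in zip(frame_a, frame_b)]
```

Going from 24fps to 60fps, output frame 1 lands 40% of the way from source frame 0 to source frame 1. Every fifth output frame lines up exactly with a source frame.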
Frame rate (FPS) · Video
Frames per second: how many still images play each second. 24fps feels cinematic, 30fps is common for web video, and 60fps looks ultra-smooth for motion and sports.
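Frame counts and timecode are two views of the same thing. This sketch converts a frame number to HH:MM:SS:FF timecode for whole-number frame rates. It deliberately ignores the drop-frame timecode used with 29.97fps footage, and the function name is illustrative.

```python
def frames_to_timecode(frame: int, fps: int) -> str:
    """Convert a frame count to non-drop-frame HH:MM:SS:FF timecode."""
    total_seconds, ff = divmod(frame, fps)  # whole seconds, leftover frames
    hh, rem = divmod(total_seconds, 3600)
    mm, ss = divmod(rem, 60)
    return f"{hh:02}:{mm:02}:{ss:02}:{ff:02}"

print(frames_to_timecode(1500, 24))  # 00:01:02:12
```

At 24fps, a 10-second clip holds 240 frames, and frame 1500 falls at 1 minute, 2 seconds and 12 frames.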
GAN · AI
Generative Adversarial Network: an earlier method in which two networks compete. A generator creates samples while a discriminator (the critic) tries to tell them apart from real data. Largely replaced by diffusion for high-quality video.
Guidance scale (CFG) · AI
Short for classifier-free guidance scale: how strictly the model follows your prompt versus improvising. Higher values stick closer to the words; lower values give the model more creative freedom.
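In many diffusion pipelines, the model predicts the noise twice at every step: once with your prompt and once without. The guidance scale sets how far to push from the second prediction towards the first. Here is a minimal sketch of that combining rule on plain lists of numbers. Where a negative prompt is supported, its prediction commonly takes the place of the unconditional one.

```python
def classifier_free_guidance(noise_uncond: list[float], noise_cond: list[float], scale: float) -> list[float]:
    """Guided noise estimate: uncond + scale * (cond - uncond), element by element."""
    return [u + scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]
```

A scale of 1 returns the prompt-conditioned prediction unchanged. A scale of 0 ignores the prompt entirely. Values above 1 exaggerate the difference the prompt makes.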
Hallucination · AI
When a model produces confident output that is wrong or invented — like garbled text, extra fingers, or impossible motion.
HDR (High Dynamic Range) · Video
Video that carries a wider range of brightness and colour than standard dynamic range (SDR) video, for more lifelike highlights and shadows and richer tones.
Image-to-video (I2V) · AI video
Bringing a still image to life as a video, often guided by a prompt that describes the motion you want.
Inference · AI
Running an already-trained model to produce an output, for example generating your video from a prompt. On hosted platforms, this is typically what you pay for with each generation.
Inpainting / outpainting · AI video
Filling in part of a frame (inpainting) or extending beyond its edges (outpainting). In video, used to remove, replace or expand regions over time.
Keyframe · Video
In editing, a marked frame that sets a value (position, scale, opacity) the software animates between. In compression, a full reference frame that nearby frames are rebuilt from.
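Between two editing keyframes, the software calculates the in-between values for you. This sketch uses straight linear interpolation and assumes the keyframes are sorted by strictly increasing frame number. Real editors also offer easing curves. `value_at` is a hypothetical name.

```python
def value_at(keyframes: list[tuple[int, float]], frame: int) -> float:
    """Linearly interpolate a property (e.g. opacity) from (frame, value) keyframes."""
    if frame <= keyframes[0][0]:
        return keyframes[0][1]   # before the first keyframe: hold its value
    if frame >= keyframes[-1][0]:
        return keyframes[-1][1]  # after the last keyframe: hold its value
    for (f0, v0), (f1, v1) in zip(keyframes, keyframes[1:]):
        if f0 <= frame <= f1:
            return v0 + (v1 - v0) * (frame - f0) / (f1 - f0)
    raise ValueError("keyframes must be sorted by frame")

# Fade in over the first second at 24fps, then hold.
fade_in = [(0, 0.0), (24, 1.0), (48, 1.0)]
print(value_at(fade_in, 12))  # 0.5
```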
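Latent space · AI
A compressed mathematical representation in which the model actually works. In latent diffusion models, which include many current image and video generators, generation happens here first and the result is then decoded into visible pixels.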
Letterboxing · Video
Black bars added above and below a video so a wider picture fits a less-wide frame without cropping. The equivalent bars at the sides, used when a narrower picture fits a wider frame, are called pillarboxing.
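To work out the bars, you scale the picture until it just fits and then split the leftover space evenly. The sketch below uses a hypothetical helper that returns per-side bar sizes. A non-zero `bar_y` means letterboxing and a non-zero `bar_x` means pillarboxing.

```python
def fit_with_bars(src_w: int, src_h: int, dst_w: int, dst_h: int) -> tuple[int, int, int, int]:
    """Scale a picture to fit inside a frame without cropping.

    Returns (scaled_w, scaled_h, bar_x, bar_y): bar_x is the width of each side bar,
    bar_y the height of each top/bottom bar, in pixels.
    """
    scale = min(dst_w / src_w, dst_h / src_h)  # largest scale that still fits
    w, h = round(src_w * scale), round(src_h * scale)
    return w, h, (dst_w - w) // 2, (dst_h - h) // 2
```

For example, a 1920×800 widescreen picture in a 1920×1080 frame gets two 140-pixel bars, one at the top and one at the bottom. A 1440×1080 (4:3) picture in the same frame gets two 240-pixel bars at the sides instead.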
Lip sync · AI video
Matching a character or avatar's mouth movements to spoken audio so it looks like they are really saying the words.
LoRA · AI
Low-Rank Adaptation — a lightweight way to teach a model a new style, character or concept with a small add-on file, instead of retraining the whole model.
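LoRA leaves the original weight matrix W untouched and learns two thin matrices, B and A, whose product is added to W. The sketch below shows both the update and why the add-on file is small. The names are illustrative. The optional `scale` stands in for the alpha/rank factor that LoRA implementations commonly apply.

```python
def matmul(X: list[list[float]], Y: list[list[float]]) -> list[list[float]]:
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def apply_lora(W, B, A, scale=1.0):
    """Return W + scale * (B @ A): frozen base weights plus the low-rank update."""
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]

def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Numbers stored for a full d_out x d_in matrix vs. its rank-r LoRA factors."""
    full = d_out * d_in
    lora = d_out * rank + rank * d_in  # B is d_out x rank, A is rank x d_in
    return full, lora

print(lora_param_counts(4096, 4096, 16))  # (16777216, 131072)
```

For a single 4096×4096 layer at rank 16, the LoRA stores 131,072 numbers instead of 16,777,216, under 1% of the original.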
Lower third · Video
Text placed in the lower part of the frame, typically a speaker's name and title, or a caption.
LUT (Look-Up Table) · Video
A preset that remaps colours to apply a specific look in one click, or to convert footage between colour spaces.
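A LUT really is a table: you look up the input value and read off the output. Grading LUTs are usually 3D and are often shipped as .cube files that map full RGB triples. This sketch shows the simpler 1D case for a single channel, interpolating linearly between table entries. The curve values are made up.

```python
def apply_1d_lut(value: float, lut: list[float]) -> float:
    """Map a channel value in 0.0-1.0 through a 1D LUT, interpolating between entries."""
    value = min(max(value, 0.0), 1.0)  # clamp to the table's input range
    pos = value * (len(lut) - 1)       # fractional index into the table
    i = min(int(pos), len(lut) - 2)    # lower entry; keeps i + 1 in range
    frac = pos - i
    return lut[i] + (lut[i + 1] - lut[i]) * frac

# Hypothetical 5-entry "contrast" curve: darkens shadows, brightens highlights.
CONTRAST = [0.0, 0.15, 0.5, 0.85, 1.0]
```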
Model · AI
A trained AI system that turns an input — like a text prompt — into an output, like a video. Different models have different strengths, speeds and prices.
Motion control / motion brush · AI video
Tools that let you direct where and how things move in a generated clip, instead of leaving it entirely to the model.
Multimodal · AI
A model that understands or generates more than one type of data at once — for example text, image, video and audio together.
Negative prompt · AI
A description of what you do NOT want in the output. It steers the model away from unwanted objects, styles or artefacts.
Neural network · AI
A model loosely inspired by the brain: layers of connected 'neurons' that learn patterns from data. It is the foundation of modern generative AI.
Open-weight model · AI
A model whose weights are published so anyone can run, study or fine-tune it (e.g. on fal or locally), as opposed to a closed model reached only through an API.
Parameters (weights) · AI
The internal numbers a model learns during training. They store what the model 'knows'; more parameters can mean more capability.
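A rough rule of thumb follows from this: the memory for the weights alone is the parameter count times the bytes per number. The sketch assumes the listed precisions and ignores everything else a running model needs, such as activations and caches, so real requirements are higher.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}

def weight_memory_gb(num_params: float, precision: str = "fp16") -> float:
    """Gigabytes (10**9 bytes) needed just to store the weights."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9
```

For example, a hypothetical 14-billion-parameter model held in 16-bit precision needs about 28 GB for its weights alone, or about 56 GB in 32-bit.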
Prompt · AI
The instruction you give the model — usually text, sometimes plus an image — describing the video you want it to create.
Prompt engineering · AI
The craft of wording prompts so the model reliably produces the result you intend, including subject, style, camera and mood.
Reference image · AI video
An image you give the model to guide the subject, character or style of the generated video.
Render / rendering · Video
Processing a project into a finished video file — or, in AI, the model generating frames into a final clip.
Resolution · Video
The pixel dimensions of each frame, written width × height (e.g. 1920×1080). More pixels means more detail. Common tiers are 720p (HD), 1080p (Full HD), 4K and 8K.
RLHF · AI
Reinforcement Learning from Human Feedback — training that uses people's preferences to align a model's outputs with what humans actually want.
Sampling steps · AI
How many iterations a diffusion model takes to turn noise into the final output. More steps can mean higher quality but slower, costlier generation.
Seed · AI
The number that initialises the random noise a generation starts from. Reusing the same seed with the same prompt, model and settings usually reproduces the same result, which is handy for consistency and small tweaks.
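Under the hood, the seed initialises a pseudo-random number generator that draws the starting noise, so the same seed yields identical noise. This sketch uses Python's standard `random` module for illustration. Real pipelines use their framework's generator (for example PyTorch's `torch.Generator`), and some GPU operations can still introduce tiny differences.

```python
import random

def starting_noise(seed: int, size: int) -> list[float]:
    """Draw the initial Gaussian noise for a generation from a seeded generator."""
    rng = random.Random(seed)  # independent generator; global random state is untouched
    return [rng.gauss(0.0, 1.0) for _ in range(size)]

same_a = starting_noise(1234, 5)
same_b = starting_noise(1234, 5)
different = starting_noise(1235, 5)
print(same_a == same_b, same_a == different)  # True False
```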
Shot · Video
A single continuous piece of footage. Common types include the wide shot, the medium shot and the close-up.
Storyboard · Video
A planned sequence of sketches or frames mapping out each shot before you produce or generate a video.
Style transfer · AI video
Applying the visual style of one reference to your own footage or generation.
Talking head · AI video
A video centred on a person speaking to camera — the classic use case for AI avatars and presenters.
Temporal consistency · AI video
Keeping characters, objects and style stable from frame to frame so the video doesn't flicker, warp or morph unnaturally.
Text-to-speech (TTS) · AI video
Turning written text into natural spoken audio with a synthetic voice — the engine behind AI voiceovers.
Text-to-video (T2V) · AI video
Generating a video clip directly from a written description — no camera, actors or stock footage required.
Token · AI
The smallest chunk of input a model processes: a piece of a word for text, or, for video, a small patch of pixels or latents that often spans a few frames.
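For video, the token count grows with both duration and resolution. That is a big part of why long, high-resolution clips are costly, because the work of standard self-attention grows with the square of the number of tokens. The sketch below counts the tokens. The latent size and the 1×2×2 patch size are assumptions for illustration, since every model chooses its own.

```python
def video_token_count(frames: int, height: int, width: int,
                      t_patch: int, h_patch: int, w_patch: int) -> int:
    """Number of spatio-temporal patches ("tokens") a clip is cut into."""
    return (frames // t_patch) * (height // h_patch) * (width // w_patch)

# Hypothetical latent: 16 frames of 64x64, cut into 1x2x2 patches.
print(video_token_count(16, 64, 64, 1, 2, 2))  # 16384
```

Doubling the clip length doubles the token count, which roughly quadruples the attention cost.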
Training · AI
Teaching a model by showing it huge amounts of data and gradually adjusting its internal parameters until it produces good results.
Transformer · AI
A neural-network architecture built on 'attention', which weighs how strongly each part of the input relates to every other part. It underpins large language models and many modern video models.
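The core attention step compares each query with every key, turns the scores into weights with a softmax, and returns a weighted mix of the values. Below is a plain-Python sketch of scaled dot-product attention. It covers a single head and leaves out the learned projections a real transformer adds.

```python
import math

def softmax(xs: list[float]) -> list[float]:
    """Turn scores into positive weights that sum to 1 (shifted for numerical stability)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention, softmax(Q K^T / sqrt(d)) V, one output row per query."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return out
```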
Transition · Video
How one shot changes into the next — a hard cut, a dissolve, a fade, or a wipe.
Upscaling · AI video
Using AI to increase a video’s resolution — say 1080p to 4K — adding plausible detail rather than just stretching pixels.
Video extension · AI video
Continuing a clip past its original length by generating additional frames that follow on naturally.
Video-to-video (V2V) · AI video
Transforming an existing clip into a new style or look while preserving its original motion and timing.
Voice cloning · AI video
Recreating a specific person's voice from a short sample so it can speak new text in that same voice.
Watermark · Video
A logo or text overlaid on a video to mark ownership. Many free AI generators add one; paid plans usually remove it.
World model · AI video
An AI that builds an internal simulation of how scenes, objects and physics behave, helping it generate longer, more coherent and consistent video.

From prompt to render: the language of AI video

Making video with AI sits at the crossroads of two worlds — decades of filmmaking and video-production vocabulary, and the fast-moving language of machine learning. This dictionary brings both together, plus the new terms unique to generative video, so you can read any tool, tutorial or model card with confidence.

Start with the fundamentals: aspect ratio, resolution, frame rate and codecs decide how your video looks and where it plays. Then the AI layer — models, diffusion, prompts, seeds and LoRAs shape what gets generated. Finally the AI-video specifics — text-to-video, image-to-video, lip sync, avatars, temporal consistency and world models — describe what today’s generators can actually do.

Every definition is written in plain English, no maths required. Search by keyword, filter by topic, or browse A–Z — then put the vocabulary to work in the Vivideo studio.