The canonical view volume is a cube with its extreme points at [-1, -1, -1] and [1, 1, 1]. Coordinates in this view volume are called normalized device coordinates (NDC). The objective of this step is to build a transformation matrix so that a region of space we want to render, called the view volume, is mapped to the canonical view volume.

$$\mathbf{v}_{ndc} = M_{proj}\,\mathbf{v}_{view}$$

Some points expressed in view space won’t be part of the view volume and will be discarded after the transformation. This process is called clipping (we only need to check if any coordinate of a point is outside the range [-1, 1] to discard it).

Later, we'll see that the perspective transformation involves a division, and a neat trick is to use projective geometry to keep that division out of the matrix. Any point of the form $(\alpha x, \alpha y, \alpha z, 1)$ can be represented as $(x, y, z, \tfrac{1}{\alpha})$ in homogeneous coordinates. So we can introduce an intermediate step that transforms the points to clip coordinates, and then obtain normalized device coordinates by dividing by the w-coordinate: $\frac{1}{1/\alpha} = \alpha$.

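$$\begin{aligned} \mathbf{v}_{clip} &= M_{proj}\,\mathbf{v}_{view} \\ \mathbf{v}_{ndc} &= \frac{\mathbf{v}_{clip}}{w_{clip}} = \alpha\,\mathbf{v}_{clip} \end{aligned}$$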
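As a minimal sketch of this division (plain Python, with points stored as `(x, y, z, w)` tuples; `to_ndc` is a hypothetical helper name, not an OpenGL call), both homogeneous forms of the same point land on the same NDC position:

```python
def to_ndc(clip):
    """Divide x, y and z by w (the division OpenGL performs after clipping)."""
    x, y, z, w = clip
    return (x / w, y / w, z / w)


alpha = 4.0
x, y, z = 0.25, -0.5, 0.125

# (x, y, z, 1/alpha) and (alpha*x, alpha*y, alpha*z, 1) represent the same point.
print(to_ndc((x, y, z, 1 / alpha)))                    # (1.0, -2.0, 0.5)
print(to_ndc((alpha * x, alpha * y, alpha * z, 1.0)))  # (1.0, -2.0, 0.5)
```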

Orthographic Projection

An orthographic projection matrix is built with six parameters:

  • left, right: planes in the x-axis
  • bottom, top: planes in the y-axis
  • near, far: planes in the z-axis

These parameters bound the view volume, which is an axis-aligned bounding box.

[Figure: Orthographic Projection]

Since the mapping of the range $[l, r]$ to the range $[-1, 1]$ is linear, we could use the equation of a line $y = mx + b$ and solve for $m$ and $b$. However, we can get the same equation more intuitively by creating a function $f(x)$ such that $f(0) = -1$ and $f(1) = 1$, and a nested function $g(x)$ such that $g(l) = 0$ and $g(r) = 1$ (note that $[l, r]$ is the input range). Then $f(x)$ has the form:

$$f(x) = -1 + 2\,g(x) \tag{1}$$

$$g(x) = \frac{x - l}{r - l} \tag{2}$$

Finally, f(x) has the form:

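$$\begin{aligned} f(x) &= -1 + 2\,\frac{x - l}{r - l} \\ &= \frac{l - r}{r - l} + \frac{2}{r - l}x - \frac{2l}{r - l} \\ &= \frac{2}{r - l}x + \frac{-l - r}{r - l} \\ &= \frac{2}{r - l}x - \frac{r + l}{r - l} \end{aligned} \tag{3}$$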
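Equation (3) is easy to check numerically. The Python sketch below (the function names are hypothetical) evaluates both the nested form $-1 + 2g(x)$ and the simplified form (3) for an arbitrary range $[l, r] = [-2, 6]$:

```python
import math


def remap_nested(x, l, r):
    """f(x) = -1 + 2 g(x), with g mapping [l, r] to [0, 1] (equations 1 and 2)."""
    g = (x - l) / (r - l)
    return -1 + 2 * g


def remap_to_ndc(x, l, r):
    """Simplified linear mapping of [l, r] to [-1, 1] (equation 3)."""
    return 2 / (r - l) * x - (r + l) / (r - l)


l, r = -2.0, 6.0
for x in (l, 2.0, r):  # left bound, midpoint, right bound
    assert math.isclose(remap_nested(x, l, r), remap_to_ndc(x, l, r), abs_tol=1e-12)
    print(x, remap_to_ndc(x, l, r))
# -2.0 -1.0
# 2.0 0.0
# 6.0 1.0
```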

We can adapt (3) to have a similar form for the y-coordinate using t and b. These equations are transformations from view space to clip space:

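$$\begin{aligned} x_{clip} &= \frac{2}{r - l}x_{view} - \frac{r + l}{r - l} \\ y_{clip} &= \frac{2}{t - b}y_{view} - \frac{t + b}{t - b} \end{aligned}$$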

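The $z_{clip}$ value is computed differently from the ones above. In OpenGL's view space the camera looks down the negative z-axis, so the near and far planes sit at $z = -n$ and $z = -f$, and we are mapping $[-n, -f] \to [-1, 1]$. Substituting $l = -n$ and $r = -f$ in (3):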

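$$\begin{aligned} z_{clip} &= \frac{2}{-f - (-n)}z_{view} - \frac{-f + (-n)}{-f - (-n)} \\ &= \frac{2}{-f + n}z_{view} - \frac{-f - n}{-f + n} \\ &= -\frac{2}{f - n}z_{view} + \frac{-f - n}{f - n} \\ &= -\frac{2}{f - n}z_{view} - \frac{f + n}{f - n} \end{aligned}$$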
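A quick numeric check of this derivation, as a Python sketch with arbitrary values $n = 1$ and $f = 9$ (the function names are hypothetical): plugging $l = -n$ and $r = -f$ into the generic mapping (3) gives the same values as the simplified expression, the near and far planes land on $-1$ and $1$, and the midway depth lands on $0$ because the orthographic mapping is linear.

```python
def remap(x, lo, hi):
    """Generic linear mapping of [lo, hi] to [-1, 1] (equation 3)."""
    return 2 / (hi - lo) * x - (hi + lo) / (hi - lo)


def ortho_z(z_view, n, f):
    """Simplified orthographic z mapping derived above."""
    return -2 / (f - n) * z_view - (f + n) / (f - n)


n, f = 1.0, 9.0
for z_view in (-n, -5.0, -f):  # near plane, midway, far plane
    print(z_view, remap(z_view, -n, -f), ortho_z(z_view, n, f))
# -1.0 -1.0 -1.0
# -5.0 0.0 0.0
# -9.0 1.0 1.0
```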

The w-coordinate is left untouched because the orthographic projection involves no division. The general orthographic projection matrix is:

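$$M_{proj} = \begin{bmatrix} \frac{2}{r - l} & 0 & 0 & -\frac{r + l}{r - l} \\ 0 & \frac{2}{t - b} & 0 & -\frac{t + b}{t - b} \\ 0 & 0 & -\frac{2}{f - n} & -\frac{f + n}{f - n} \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{4}$$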

The transformation matrix from view space to clip space is:

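$$\begin{aligned} \mathbf{v}_{clip} &= M_{proj}\,\mathbf{v}_{view} \\ \begin{bmatrix} x_{clip} \\ y_{clip} \\ z_{clip} \\ w_{clip} \end{bmatrix} &= \begin{bmatrix} \frac{2}{r - l} & 0 & 0 & -\frac{r + l}{r - l} \\ 0 & \frac{2}{t - b} & 0 & -\frac{t + b}{t - b} \\ 0 & 0 & -\frac{2}{f - n} & -\frac{f + n}{f - n} \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_{view} \\ y_{view} \\ z_{view} \\ w_{view} \end{bmatrix} \end{aligned}$$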

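Finally, note that $w_{clip}$ always equals $w_{view} = 1$. Therefore, the division that produces NDC leaves the clip coordinates unchanged:

$$\begin{bmatrix} x_{ndc} \\ y_{ndc} \\ z_{ndc} \end{bmatrix} = \begin{bmatrix} x_{clip}/1 \\ y_{clip}/1 \\ z_{clip}/1 \end{bmatrix}$$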
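Putting matrix (4) to work, here is a small Python sketch using list-of-rows matrices (`ortho` and `mat_vec` are hypothetical helpers; the matrix itself corresponds to the one documented for the legacy `glOrtho` call). The corners of an arbitrary view volume map to the corners of the canonical cube, and $w$ stays 1, so NDC equal clip coordinates:

```python
def ortho(l, r, b, t, n, f):
    """Orthographic projection matrix (4) as a list of rows."""
    return [
        [2 / (r - l), 0, 0, -(r + l) / (r - l)],
        [0, 2 / (t - b), 0, -(t + b) / (t - b)],
        [0, 0, -2 / (f - n), -(f + n) / (f - n)],
        [0, 0, 0, 1],
    ]


def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


m = ortho(-2.0, 6.0, -1.0, 3.0, 1.0, 9.0)
# Bottom-left-near corner [l, b, -n] and top-right-far corner [r, t, -f].
print(mat_vec(m, [-2.0, -1.0, -1.0, 1.0]))  # [-1.0, -1.0, -1.0, 1.0]
print(mat_vec(m, [6.0, 3.0, -9.0, 1.0]))    # [1.0, 1.0, 1.0, 1.0]
```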

Building the Matrix Using Combined Transformations

A simpler way to think about this orthographic projection transformation is by splitting it into three steps:

  • Translate the bottom-left-near corner to the origin, i.e., $[l, b, -n] \to [0, 0, 0]$.
  • Scale the box into a cube with sides of length 2 (the z scale factor is negative, which flips the z-axis).
  • Translate that corner from the origin to $[-1, -1, -1]$, i.e., $[0, 0, 0] \to [-1, -1, -1]$.
Mproj=[1001010100110001][2rl00002tb00002fn00001][100l010b001n0001] =[1001010100110001][2rl002lrl02tb02btb002fn2nfn0001] =[2rl002lrl102tb02btb1002fn2nfn10001] =[2rl00r+lrl02tb0t+btb002fnf+nfn0001]
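The same result can be reproduced by multiplying the three matrices numerically. A Python sketch with hypothetical helpers `translate`, `scale` and `matmul` and arbitrary bounds; as in the product above, the rightmost matrix is applied first:

```python
def matmul(a, b):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def translate(tx, ty, tz):
    """4x4 translation matrix."""
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def scale(sx, sy, sz):
    """4x4 scale matrix."""
    return [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]


l, r, b, t, n, f = -2.0, 6.0, -1.0, 3.0, 1.0, 9.0

# Move [l, b, -n] to the origin, scale into a 2-unit cube
# (the negative z factor flips the z-axis), then move that corner to [-1, -1, -1].
combined = matmul(
    translate(-1, -1, -1),
    matmul(scale(2 / (r - l), 2 / (t - b), -2 / (f - n)), translate(-l, -b, n)),
)

direct = [  # matrix (4)
    [2 / (r - l), 0, 0, -(r + l) / (r - l)],
    [0, 2 / (t - b), 0, -(t + b) / (t - b)],
    [0, 0, -2 / (f - n), -(f + n) / (f - n)],
    [0, 0, 0, 1],
]

matches = all(abs(combined[i][j] - direct[i][j]) < 1e-12 for i in range(4) for j in range(4))
print(matches)  # True
```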

Perspective Projection

This type of projection relies on concepts from projective geometry, in particular the fact that objects farther from the viewpoint appear smaller after projection. It mimics how we perceive objects in reality.

A perspective projection matrix is built with six parameters: left, right, bottom, top, near, far.

  • left, right: x-axis bounds for the near plane.
  • bottom, top: y-axis bounds for the near plane.
  • near, far: planes in the z-axis. The line passing through the origin parallel to the vector $[l, b, -n]$ intersects the far plane at the bottom-left-far corner of the view volume. The other far-plane corners are found the same way.

These parameters define a truncated pyramid, also called a frustum.

[Figure: Perspective Projection]

General Perspective Projection Matrix

The mapping of the range $[l, r]$ to the range $[-1, 1]$ can be split into two steps:

  • Project all the points onto the near plane. This way, the x- and y-coordinates of every point inside the frustum fall in the range $[l, r] \times [b, t]$.
  • Map the values in the ranges $[l, r]$ and $[b, t]$ to the range $[-1, 1]$.

[Figure: Top view of the frustum]

[Figure: Side view of the frustum]

Let $\mathbf{v}_{view}$ be a vector in view space that is going to be transformed to clip space. By similar triangles, the values of $x_p$ and $y_p$ (the coordinates projected onto the near plane) are:

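$$\frac{x_p}{x_{view}} = \frac{-n}{z_{view}} \implies x_p = \frac{n\,x_{view}}{-z_{view}} \tag{5}$$

$$\frac{y_p}{y_{view}} = \frac{-n}{z_{view}} \implies y_p = \frac{n\,y_{view}}{-z_{view}} \tag{6}$$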
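Equations (5) and (6) can be sanity-checked in code. A minimal Python sketch (the function name is hypothetical), assuming the point lies in front of the camera so that $z_{view} < 0$; moving the point twice as far away halves its projected coordinates:

```python
def project_to_near(x_view, y_view, z_view, n):
    """Project a view-space point onto the near plane z = -n (equations 5 and 6).

    Assumes z_view < 0, i.e. the point is in front of the camera.
    """
    x_p = n * x_view / -z_view
    y_p = n * y_view / -z_view
    return x_p, y_p


n = 1.0
print(project_to_near(2.0, 1.0, -4.0, n))  # (0.5, 0.25)
# The same point moved twice as far away projects to half the size.
print(project_to_near(2.0, 1.0, -8.0, n))  # (0.25, 0.125)
```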

Note that both quantities are inversely proportional to $z_{view}$. The projected point also has $z_p = -n = \frac{n\,z_{view}}{-z_{view}}$, so we can write all three coordinates over a common denominator:

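$$\begin{bmatrix} \frac{n\,x_{view}}{-z_{view}} & \frac{n\,y_{view}}{-z_{view}} & \frac{n\,z_{view}}{-z_{view}} \end{bmatrix}^T = \frac{\begin{bmatrix} n\,x_{view} & n\,y_{view} & n\,z_{view} \end{bmatrix}^T}{-z_{view}}$$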

The point in homogeneous coordinates is:

$$\begin{bmatrix} n\,x_{view} & n\,y_{view} & n\,z_{view} & -z_{view} \end{bmatrix}^T$$

OpenGL then projects any 4D homogeneous coordinate onto the 3D hyperplane $w = 1$ by dividing each coordinate by $w$. Note that this division is not done by the application but by OpenGL itself, in a later stage of the rendering pipeline.

We can take advantage of this step and use $-z_{view}$ as our $w$. With this in mind, we can construct a transformation matrix so that transformed points have $w = -z_{view}$:

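$$\begin{bmatrix} x_{clip} \\ y_{clip} \\ z_{clip} \\ w_{clip} \end{bmatrix} = \begin{bmatrix} \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \\ 0 & 0 & -1 & 0 \end{bmatrix} \begin{bmatrix} x_{view} \\ y_{view} \\ z_{view} \\ w_{view} \end{bmatrix} \implies w_{clip} = -z_{view} \tag{7}$$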

where $x_{clip}, y_{clip}, z_{clip}, w_{clip}$ are clip-space coordinates. Dividing each coordinate by $w_{clip}$ gives NDC:

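$$\begin{bmatrix} x_{ndc} \\ y_{ndc} \\ z_{ndc} \end{bmatrix} = \begin{bmatrix} x_{clip}/w_{clip} \\ y_{clip}/w_{clip} \\ z_{clip}/w_{clip} \end{bmatrix}$$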
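To see that dividing by $w = -z_{view}$ recovers the near-plane projection, here is a Python sketch that divides the homogeneous point $[n\,x_{view}, n\,y_{view}, n\,z_{view}, -z_{view}]^T$ (the helper name is hypothetical, example values are arbitrary). The third coordinate comes out as $-n$, the near plane, regardless of the input:

```python
def divide_by_w(v):
    """Perspective divide: homogeneous (x, y, z, w) -> (x/w, y/w, z/w)."""
    x, y, z, w = v
    return (x / w, y / w, z / w)


n = 1.0
x_view, y_view, z_view = 2.0, 1.0, -4.0

# Homogeneous form of the near-plane projection, with w = -z_view.
homogeneous = (n * x_view, n * y_view, n * z_view, -z_view)
print(divide_by_w(homogeneous))  # (0.5, 0.25, -1.0)
```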

Next, $x_p$ and $y_p$ are mapped linearly to $[-1, 1]$ using the linear mapping (3):

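$$\begin{aligned} x_{ndc} &= \frac{2}{r - l}x_p - \frac{r + l}{r - l} \\ y_{ndc} &= \frac{2}{t - b}y_p - \frac{t + b}{t - b} \end{aligned} \tag{8}$$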

Next, we substitute $x_p$ from (5) into $x_{ndc}$ from (8):

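$$\begin{aligned} x_{ndc} &= \frac{2}{r - l}\,\frac{n\,x_{view}}{-z_{view}} - \frac{r + l}{r - l} \\ &= \frac{2n}{r - l}\,\frac{x_{view}}{-z_{view}} + \frac{r + l}{r - l}\,\frac{z_{view}}{-z_{view}} \\ &= \left(\frac{2n}{r - l}x_{view} + \frac{r + l}{r - l}z_{view}\right) \Big/ (-z_{view}) \end{aligned}$$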

Note that the second term is rewritten so that it also has $-z_{view}$ as its denominator (using $-1 = \frac{z_{view}}{-z_{view}}$). The quantity in parentheses is the clip-space coordinate $x_{clip}$:

$$x_{clip} = \frac{2n}{r - l}x_{view} + \frac{r + l}{r - l}z_{view}$$

Similarly, the value of $y_{clip}$ is:

$$y_{clip} = \frac{2n}{t - b}y_{view} + \frac{t + b}{t - b}z_{view}$$
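Both routes to NDC should agree: projecting first and then remapping with (8), or computing $x_{clip}$, $y_{clip}$ and dividing by $w_{clip} = -z_{view}$. A Python sketch with arbitrary example values (all variable names are illustrative):

```python
import math

l, r, b, t, n = -1.0, 3.0, -2.0, 2.0, 1.0
x_view, y_view, z_view = 2.0, 1.0, -4.0

# Route 1: project onto the near plane (5, 6), then remap linearly (8).
x_p = n * x_view / -z_view
y_p = n * y_view / -z_view
x_ndc_1 = 2 / (r - l) * x_p - (r + l) / (r - l)
y_ndc_1 = 2 / (t - b) * y_p - (t + b) / (t - b)

# Route 2: clip-space coordinates, then divide by w_clip = -z_view.
x_clip = 2 * n / (r - l) * x_view + (r + l) / (r - l) * z_view
y_clip = 2 * n / (t - b) * y_view + (t + b) / (t - b) * z_view
w_clip = -z_view
x_ndc_2 = x_clip / w_clip
y_ndc_2 = y_clip / w_clip

print(x_ndc_1, x_ndc_2)  # -0.25 -0.25
print(y_ndc_1, y_ndc_2)  # 0.125 0.125
```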

Then the transformation matrix seen in (7) is now:

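$$\begin{bmatrix} x_{clip} \\ y_{clip} \\ z_{clip} \\ w_{clip} \end{bmatrix} = \begin{bmatrix} \frac{2n}{r - l} & 0 & \frac{r + l}{r - l} & 0 \\ 0 & \frac{2n}{t - b} & \frac{t + b}{t - b} & 0 \\ \cdot & \cdot & \cdot & \cdot \\ 0 & 0 & -1 & 0 \end{bmatrix} \begin{bmatrix} x_{view} \\ y_{view} \\ z_{view} \\ w_{view} \end{bmatrix} \tag{9}$$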

Next, we need to find the value of $z_{clip}$. Projecting z onto the near plane the way we did for x and y doesn't work: every point would get $\frac{n\,z_{view}}{-z_{view}} = -n$, a constant, and the depth information would be lost. We need $z_{clip}$ to be unique for clipping and the depth test, and we should also be able to unproject it (through an inverse transformation).

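Since $z_{ndc}$ doesn't depend on $x_{view}$ or $y_{view}$, only the entries that multiply $z_{view}$ and $w_{view}$ need to be non-zero; borrowing the w-coordinate gives us a constant term with which to relate $z_{ndc}$ to $z_{view}$. Calling those entries $A$ and $B$, the third row of (9) becomes: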

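$$\begin{bmatrix} x_{clip} \\ y_{clip} \\ z_{clip} \\ w_{clip} \end{bmatrix} = \begin{bmatrix} \frac{2n}{r - l} & 0 & \frac{r + l}{r - l} & 0 \\ 0 & \frac{2n}{t - b} & \frac{t + b}{t - b} & 0 \\ 0 & 0 & A & B \\ 0 & 0 & -1 & 0 \end{bmatrix} \begin{bmatrix} x_{view} \\ y_{view} \\ z_{view} \\ w_{view} \end{bmatrix} \tag{10}$$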

Then $z_{ndc}$ has the form:

$$z_{ndc} = \frac{z_{clip}}{w_{clip}} = \frac{A\,z_{view} + B\,w_{view}}{-z_{view}}$$

Since $w_{view} = 1$ in view space:

$$z_{ndc} = \frac{A\,z_{view} + B}{-z_{view}}$$

Note that this relation is not linear in $z_{view}$, but it still has to map $[-n, -f] \to [-1, 1]$. Substituting $z_{view} = -n$ with $z_{ndc} = -1$, and $z_{view} = -f$ with $z_{ndc} = 1$, gives a system of equations:

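$$\begin{cases} -1 = \dfrac{-An + B}{n} \\[6pt] 1 = \dfrac{-Af + B}{f} \end{cases} \implies \begin{cases} -An + B = -n \\ -Af + B = f \end{cases}$$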

Subtracting the second equation from the first:

$$\begin{aligned} -An + B + Af - B &= -n - f \\ A(f - n) &= -n - f \\ A &= -\frac{f + n}{f - n} \end{aligned}$$

Solving for B given A:

$$\frac{f + n}{f - n}\,n + B = -n$$

$$B = -n - \frac{f + n}{f - n}\,n = \frac{-fn + n^2 - fn - n^2}{f - n} = -\frac{2fn}{f - n}$$
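Plugging the solved $A$ and $B$ back in confirms the mapping and also shows the non-linearity: with arbitrary values $n = 1$ and $f = 9$, the view-space midpoint $z_{view} = -5$ lands at $0.8$ rather than $0$, so more of the NDC depth range is spent close to the camera. A Python sketch (the function name is hypothetical):

```python
n, f = 1.0, 9.0
A = -(f + n) / (f - n)
B = -2 * f * n / (f - n)


def z_ndc(z_view):
    """z_clip / w_clip with w_view = 1, z_clip = A*z_view + B and w_clip = -z_view."""
    return (A * z_view + B) / -z_view


print(z_ndc(-n))    # -1.0
print(z_ndc(-f))    # 1.0
print(z_ndc(-5.0))  # 0.8
```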

Substituting the values of A and B in (10), we have the general perspective projection matrix:

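$$M_{proj} = \begin{bmatrix} \frac{2n}{r - l} & 0 & \frac{r + l}{r - l} & 0 \\ 0 & \frac{2n}{t - b} & \frac{t + b}{t - b} & 0 \\ 0 & 0 & -\frac{f + n}{f - n} & -\frac{2fn}{f - n} \\ 0 & 0 & -1 & 0 \end{bmatrix} \tag{11}$$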
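A Python sketch of (11) with arbitrary bounds (`frustum` and `to_ndc` are hypothetical helpers; the matrix should correspond to the one documented for the legacy `glFrustum` call, which takes the same six parameters). It checks that the near-plane corners and the matching far-plane corner land on the expected corners of the canonical cube after the divide by $w$:

```python
def frustum(l, r, b, t, n, f):
    """General perspective projection matrix (11) as a list of rows."""
    return [
        [2 * n / (r - l), 0, (r + l) / (r - l), 0],
        [0, 2 * n / (t - b), (t + b) / (t - b), 0],
        [0, 0, -(f + n) / (f - n), -2 * f * n / (f - n)],
        [0, 0, -1, 0],
    ]


def to_ndc(m, p):
    """Transform a view-space point to clip space, then divide by w."""
    clip = [sum(m[i][k] * p[k] for k in range(4)) for i in range(4)]
    w = clip[3]
    return [c / w for c in clip[:3]]


l, r, b, t, n, f = -1.0, 3.0, -2.0, 2.0, 1.0, 9.0
m = frustum(l, r, b, t, n, f)
print(to_ndc(m, [l, b, -n, 1.0]))  # [-1.0, -1.0, -1.0]
print(to_ndc(m, [r, t, -n, 1.0]))  # [1.0, 1.0, -1.0]
# Far-plane corner on the line through the origin and [l, b, -n].
print(to_ndc(m, [l * f / n, b * f / n, -f, 1.0]))  # [-1.0, -1.0, 1.0]
```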

Symmetric Perspective Projection Matrix

If the viewing volume is symmetric, i.e., $r = -l$ and $t = -b$, then some quantities can be simplified:

$$\begin{aligned} r + l &= 0, & r - l &= 2r \\ t + b &= 0, & t - b &= 2t \end{aligned}$$

Then (11) becomes:

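$$M_{proj} = \begin{bmatrix} \frac{n}{r} & 0 & 0 & 0 \\ 0 & \frac{n}{t} & 0 & 0 \\ 0 & 0 & -\frac{f + n}{f - n} & -\frac{2fn}{f - n} \\ 0 & 0 & -1 & 0 \end{bmatrix} \tag{12}$$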
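A quick Python check (hypothetical helper names, arbitrary values) that (12) agrees with the general matrix (11) when $l = -r$ and $b = -t$:

```python
import math


def frustum(l, r, b, t, n, f):
    """General perspective projection matrix (11)."""
    return [
        [2 * n / (r - l), 0, (r + l) / (r - l), 0],
        [0, 2 * n / (t - b), (t + b) / (t - b), 0],
        [0, 0, -(f + n) / (f - n), -2 * f * n / (f - n)],
        [0, 0, -1, 0],
    ]


def symmetric_frustum(r, t, n, f):
    """Symmetric perspective projection matrix (12), assuming l = -r and b = -t."""
    return [
        [n / r, 0, 0, 0],
        [0, n / t, 0, 0],
        [0, 0, -(f + n) / (f - n), -2 * f * n / (f - n)],
        [0, 0, -1, 0],
    ]


r, t, n, f = 2.0, 1.5, 0.5, 50.0
general = frustum(-r, r, -t, t, n, f)
symmetric = symmetric_frustum(r, t, n, f)
same = all(
    math.isclose(general[i][j], symmetric[i][j], abs_tol=1e-12)
    for i in range(4)
    for j in range(4)
)
print(same)  # True
```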

Symmetric Perspective Projection Matrix from Field of View/Aspect

gluPerspective receives, instead of the x and y bounds, two arguments:

  • field of view (fov), which specifies the field of view angle in the y direction.
  • aspect (aspect), which is the aspect ratio that determines the field of view in the x direction, calculated as $\frac{x}{y}$. The value is commonly $\frac{\text{screen width}}{\text{screen height}}$.

[Figure: fov]

We see that the value of t (top) is:

$$\tan\left(\frac{fov}{2}\right) = \frac{t}{n} \tag{13}$$

$$t = n\tan\left(\frac{fov}{2}\right) \tag{14}$$

We can find the value of r (right) with the aspect ratio:

$$aspect = \frac{2r}{2t} = \frac{r}{t} \tag{15}$$

$$r = aspect \cdot t \tag{16}$$

$$r = aspect \cdot n\tan\left(\frac{fov}{2}\right) \tag{17}$$

Substituting (14) and (17) in (12):

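$$M_{proj} = \begin{bmatrix} \frac{1}{aspect \cdot \tan(fov/2)} & 0 & 0 & 0 \\ 0 & \frac{1}{\tan(fov/2)} & 0 & 0 \\ 0 & 0 & -\frac{f + n}{f - n} & -\frac{2fn}{f - n} \\ 0 & 0 & -1 & 0 \end{bmatrix} \tag{18}$$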
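To tie (18) back to the general matrix, the Python sketch below builds it from a vertical field of view and an aspect ratio, then compares it with (11) evaluated at $t = n\tan(fov/2)$ and $r = aspect \cdot t$. The helper `perspective` is hypothetical; like `gluPerspective`, it takes the field of view in degrees, so it converts to radians before calling `math.tan`. The example values are arbitrary.

```python
import math


def perspective(fov_deg, aspect, n, f):
    """Perspective matrix (18); fov_deg is the vertical field of view in degrees."""
    tan_half = math.tan(math.radians(fov_deg) / 2)
    return [
        [1 / (aspect * tan_half), 0, 0, 0],
        [0, 1 / tan_half, 0, 0],
        [0, 0, -(f + n) / (f - n), -2 * f * n / (f - n)],
        [0, 0, -1, 0],
    ]


def frustum(l, r, b, t, n, f):
    """General perspective projection matrix (11)."""
    return [
        [2 * n / (r - l), 0, (r + l) / (r - l), 0],
        [0, 2 * n / (t - b), (t + b) / (t - b), 0],
        [0, 0, -(f + n) / (f - n), -2 * f * n / (f - n)],
        [0, 0, -1, 0],
    ]


fov, aspect, n, f = 60.0, 16 / 9, 0.1, 100.0
t = n * math.tan(math.radians(fov) / 2)  # equation (14)
r = aspect * t                           # equation (17)

from_fov = perspective(fov, aspect, n, f)
from_bounds = frustum(-r, r, -t, t, n, f)
same = all(
    math.isclose(from_fov[i][j], from_bounds[i][j], abs_tol=1e-12)
    for i in range(4)
    for j in range(4)
)
print(same)  # True
```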