Mmmm....2 GeForces to play with
Chris and Jeremy love the card; it runs QERadiant about
10-30 times faster. It is a hard amount to quantify, but
it is at least an order of magnitude faster.
As Rick and Jake have stated - Ghoul can make use of GL
lighting, a feature that has always been in GL but has
never been fast enough to use in a game. Using this, Gil
got about a 20% speed improvement on the GeForce.
Generally your frame rate is limited by one of several
bottlenecks - I shall endeavor to explain what these are
and in what situations they are most prevalent. I'll
try to make it as untechnical as possible and most likely
fail miserably.
1. Fill rate - raw pumping of texels to the screen.
This used to be the major hold up, but with the latest
cards is not so much of an issue. For example, the TNT2
can handle 350 million marketing pixels per second, which
is enough to draw an entire 800x600 screen about 12 times
at 60fps. Even in a game with oodles of fancy spell
effects (such as Heretic2) this is plenty. Marketing pixels
are the theoretical peak rate of rendering; real world
apps can achieve about half that.
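
To show where the "about 12 times" figure comes from, here is
a rough sketch of the arithmetic in Python. The function name
is made up for illustration, and the 0.5 factor is just the
"about half" rule of thumb from above, not a measured number.

```python
def full_screen_passes(fill_rate, width, height, fps):
    """Number of times the whole screen can be filled per frame."""
    pixels_per_frame = width * height
    return fill_rate / (pixels_per_frame * fps)

PEAK_FILL_RATE = 350_000_000  # TNT2 "marketing pixels" per second

# Theoretical peak at 800x600 and 60fps.
theoretical = full_screen_passes(PEAK_FILL_RATE, 800, 600, 60)

# Rule of thumb: real world apps get about half the peak rate.
real_world = full_screen_passes(PEAK_FILL_RATE * 0.5, 800, 600, 60)

print(f"theoretical: {theoretical:.1f} passes per frame")
print(f"real world:  {real_world:.1f} passes per frame")
```

Even the halved figure leaves room to touch every pixel on
the screen several times per frame.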
2. Texture uploading - the amount of texture in a game
This is still a major bottleneck, especially in games
with lots of procedural textures (eg Unreal). If all the
textures in a level will fit into the card's local memory,
then this is not an issue. However, in the real world we
normally have a lot more texture than will fit for any
given scene, so the game is continually swapping textures
in and out of the card's local RAM. Setting gl_picmip in
Quake based games will reduce the memory required for
textures and so reduce the amount of uploading, but will
have the downside of reducing the final image quality.
S3TC has a similar effect on uploading with far less loss
of image quality. In a game with procedural textures, the
CPU creates the texture and then has to upload it, thereby
aggravating this bottleneck. D3D can handle this better
than OpenGL purely because you can manage your own textures
and optimise for your own special cases.
There are many factors that affect this. Voodoos are
quicker at uploading than TNTs, but they have less
memory so have to do it more often. The Permedia3 takes a
very interesting approach in that it will only upload the
portion of the texture it needs.
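
As a rough illustration of why gl_picmip cuts the upload
load, here is a small Python sketch. It assumes each step of
gl_picmip halves both dimensions of a texture, which is how
the Quake engines treat it as far as I know. The 256x256 size
and the 2 bytes per texel are illustrative assumptions, and
the function name is made up.

```python
def texture_bytes(width, height, bytes_per_texel, picmip=0):
    """Memory for the top mip level after gl_picmip is applied."""
    # Each picmip step halves both dimensions (never below 1 texel).
    w = max(1, width >> picmip)
    h = max(1, height >> picmip)
    return w * h * bytes_per_texel

# Hypothetical 256x256 texture with 16 bit texels.
for picmip in range(3):
    size = texture_bytes(256, 256, 2, picmip)
    print(f"gl_picmip {picmip}: {size // 1024} KB")
```

Each step quarters the memory for a texture, so a level that
was swapping constantly may fit on the card outright, at the
cost of blurrier textures.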
3. Geometry - the number of verts in the world.
This is definitely where SoF is limited. The CPU has to
perform lots of calculations to work out where to place
a vertex on the screen. This can either be done by the
driver or by the game. We use the OpenGL transforming
for ease, and although it may be a little slower on a
vanilla machine, it enables the driver writers to
optimise for SSE or 3dNow! instructions transparently,
and allows us to automagically take advantage of hardware
T&L. The GeForce acts as the equivalent of a parallel
Pentium processor running at about 2GHz (source : nVidia)
which purely handles the transforming and lighting,
thereby offloading a major chunk of work the CPU had
to do to a much faster custom processor. In practice this
means a much faster game.
The downside of this is you are limited by the transform
speed of the card, so if you have a 3GHz processor in your
machine, you may actually take a performance hit. Then
again, feeding the card enough verts to make this noticeable
would choke the AGP bus completely. I am fairly
confident in saying this will not be a problem for some
time to come.
Any OpenGL game that uses the OpenGL transform pipeline
will be accelerated by hardware T&L. I can't comment on
any games other than Heretic2, SoF and Trek, which all do.
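
To give a feel for the per-vertex work being offloaded, here
is a minimal Python sketch of transforming one vertex to the
screen. It is not the driver's or the game's actual code. The
row-major matrix layout, the function name and the identity
matrix used in the example are all assumptions made for
illustration.

```python
def transform_vertex(matrix, vertex, width, height):
    """Map a 3D vertex to 2D screen coordinates."""
    x, y, z = vertex
    v = (x, y, z, 1.0)  # homogeneous coordinates

    # 4x4 matrix times vector (matrix stored row-major here).
    clip = [sum(matrix[row][col] * v[col] for col in range(4))
            for row in range(4)]
    cx, cy, _cz, cw = clip

    # Perspective divide gives coordinates in the range -1..1.
    ndc_x = cx / cw
    ndc_y = cy / cw

    # Viewport mapping from -1..1 to pixels.
    screen_x = (ndc_x + 1.0) * 0.5 * width
    screen_y = (ndc_y + 1.0) * 0.5 * height
    return screen_x, screen_y

IDENTITY = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

# With an identity matrix the origin lands mid-screen.
print(transform_vertex(IDENTITY, (0.0, 0.0, 0.0), 800, 600))
```

This has to happen for every vertex in every frame, before
any lighting is done, which is why handing it to dedicated
hardware frees up so much CPU time.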
4. Game code - the amount of processing required to run the game.
If the AI of the game is your bottleneck, no video card
is going to help!
The overall approach to getting performance is to find out
where your limiting process is, the bottleneck that
is holding up the other processes, and speed that up.
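
That approach can be sketched as a toy model in Python. It
assumes the four stages overlap perfectly, so the slowest one
sets the frame time, which is a simplification. The timings
are invented for illustration, not measurements from any game.

```python
def find_bottleneck(stage_ms):
    """Return the slowest stage and its time in milliseconds."""
    name = max(stage_ms, key=stage_ms.get)
    return name, stage_ms[name]

# Made-up per-frame costs for the four bottlenecks above.
stages = {
    "fill rate": 4.0,
    "texture upload": 6.0,
    "geometry": 14.0,
    "game code": 9.0,
}
name, ms = find_bottleneck(stages)
print(f"limited by {name}: about {1000.0 / ms:.0f} fps")

# Hardware T&L cuts the geometry cost, so the limit moves on.
with_hw_tnl = dict(stages, geometry=3.0)
name2, ms2 = find_bottleneck(with_hw_tnl)
print(f"limited by {name2}: about {1000.0 / ms2:.0f} fps")
```

Once geometry is no longer the slowest stage, further speed
ups there gain nothing. The next limit (game code in this
made-up case) is the one to attack.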