Controlling Frame Rate (FPS)
This page gives an overview of the factors that affect frame rate (FPS, or frames per second) within a game, and of how to adjust your design to bring problems under control and speed up rendering.
We'll try to help explain all that we can, but at the end of the day, especially in an online game environment, your game's design, your art, and the types of computers you plan to require your players to have in order to play your game will be the main contributing factors to providing good performance to your playerbase.
HeroEngine allows developers to create intricate areas filled with an unlimited number of details. Unfortunately, this means that it's easy to cause framerate to drop below acceptable levels. By the time a developer starts to address this problem and decides to improve framerate, the causes of the framerate problems may be very difficult to identify and fix, because many contributing factors lead to framerate problems. Before you can figure out what needs to be fixed, you need to understand the contributing factors and how each part affects the whole.
Why is FPS important?
Frames Per Second (FPS) is the lifeblood of any 3D Game. When FPS drops below 30fps, the human eye starts to perceive the 3D image as jerky and "laggy". The lower the FPS, the worse this effect. FPS at or above 15fps is sometimes still considered acceptable, as the game still appears to respond appropriately to user input. The user can also still perceive what's happening on their monitor. Under 15fps, gameplay starts to become more difficult, and at around 8 or 9 fps, the gaming experience starts to hit the "can't play that stupid game" state. It doesn't matter how beautiful a game may be: if FPS is poor, players will probably think the game is poor.
But keep in mind that Frames Per Second has only to do with what's being displayed on the client. It has nothing to do with client-to-server communication, internet speed, or routing. Players frequently confuse the two kinds of "lag". Low FPS, or graphical lag, occurs because too much is happening on the client. Gameplay lag, or slow responsiveness, is frequently due to the client taking too long to get a message back from the server, but this can often be overcome by smart game development and by learning not to rely on direct, immediate feedback from the server at all times. But that's for another page.
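The FPS thresholds above are often easier to reason about as a time budget per frame. The following is a minimal sketch in Python; the function name is illustrative and not part of HeroEngine. It only converts a frame rate into the number of milliseconds available to produce one frame.

```python
def frame_budget_ms(fps: float) -> float:
    """Return the time available to produce one frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


if __name__ == "__main__":
    # The frame rates discussed in the text, plus 60 fps for comparison.
    for fps in (60, 30, 15, 8):
        print(f"{fps:>3} fps -> {frame_budget_ms(fps):.1f} ms per frame")
```

Everything the CPU and GPU do for one frame has to fit inside that budget, which is why halving the frame rate target doubles the time you can spend per frame.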
Presumptions this document makes
For the purposes of this discussion, the CPU and the video card are considered independent of each other. In other words, the CPU processes its data at the same time as the GPU processes its data. There are some dependencies between the two but it's useful to treat them as completely separate.
The Rules of Thumb and recommendations here are BEFORE characters, ability effects, NPCs and creatures are added. This includes everything else that goes into designing an area.
The two key parts that affect frame rate: CPU and GPU
CPU Side: All FPS boils down to two main aspects, the CPU and the video card (GPU). HeroBlade runs on the CPU side of the equation. It sets up graphical data and feeds it to the video card. It also runs scripts, animations and a few other things.
GPU Side: The video card receives data from HeroBlade, processes that data, runs the shader code on that data, and then presents the end result as the 3D game world. Note: Our rendering engine also sometimes takes advantage of the GPU by offloading some tasks to the GPU in order to further relieve the burden on the CPU, but that is outside of the scope of this particular document.
Video cards are like CPUs (Central Processing Units), which is why they are often referred to as GPUs (Graphics Processing Units): they are like having a mini-computer inside your computer. The CPU can run HeroBlade only as fast as it's able. The GPU can only process and present graphical data to the monitor at a fixed maximum rate. The faster/better your CPU, the faster HeroBlade can process data and hand it to the GPU, and the higher the FPS. The faster/better the GPU, the more graphic data it can process and the faster it can process it. If a CPU can present 1,000,000 bits of data per second to the video card, but the GPU can only process 750,000 bits of data per second, then each second's worth of CPU output takes the GPU about 1.33 seconds, and the CPU must wait roughly an extra 1/3 second for the GPU to finish processing the extra bits. Slow video cards can slow down CPUs (GPU-bound). If a CPU can present 750,000 bits of data per second to the video card and the video card can handle 1,000,000 bits of data per second, then the CPU has failed to take full advantage of the video card's power (CPU-bound). Slow CPUs waste GPU power.
There is a third factor and that's bandwidth between the CPU and GPU. It's possible that the CPU may be so fast that it can prepare more data than can be sent to the GPU in the time available, even though the GPU could process all that data if it had it. However, this is not considered as a factor in FPS in the information on this page because the machines and video cards used by developers will rarely encounter this limitation.
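The arithmetic in the example can be checked with a short sketch. This is only an illustration under simplified assumptions (a single steady throughput number per side, in the same arbitrary units the text calls "bits"); the function names are hypothetical and nothing here reflects how HeroBlade measures load.

```python
def bottleneck(cpu_rate: float, gpu_rate: float):
    """Classify a CPU/GPU pair and return (label, effective_rate).

    Both rates are in the same units per second. The slower side
    determines the effective throughput of the pair.
    """
    if cpu_rate <= 0 or gpu_rate <= 0:
        raise ValueError("rates must be positive")
    effective = min(cpu_rate, gpu_rate)
    if cpu_rate > gpu_rate:
        label = "GPU-bound"
    elif cpu_rate < gpu_rate:
        label = "CPU-bound"
    else:
        label = "balanced"
    return label, effective


def extra_wait_seconds(cpu_rate: float, gpu_rate: float) -> float:
    """Extra time the GPU needs to finish one second's worth of CPU output."""
    return max(0.0, cpu_rate / gpu_rate - 1.0)


if __name__ == "__main__":
    print(bottleneck(1_000_000, 750_000))
    print(round(extra_wait_seconds(1_000_000, 750_000), 3))
    print(bottleneck(750_000, 1_000_000))
```

For the 1,000,000 versus 750,000 case, the GPU needs 1,000,000 / 750,000 ≈ 1.33 seconds per second of CPU output, so the wait is about a third of a second. In the reverse case there is no wait at all; the GPU simply sits idle part of the time.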
There's a fourth factor, which involves how much system RAM you have and whether your computer is "hitting" Swap files (your hard drive is getting accessed constantly even when you are not moving the 3D view or doing anything with HeroBlade). This is worth knowing about, but the only solutions are: Get more RAM; or reduce the memory requirements of HeroBlade; or reduce the number of assets/instances in an area. This situation is not factored into this discussion but there may eventually be a maximum memory size allowed for any single area which will be enforced differently and so should not be a factor in FPS.
CPU-Bound This term is used when the CPU side of the equation can no longer process and present data to the GPU at an FPS that's considered acceptable. It also applies to CPUs that cannot keep up with the GPU. That is, the GPU can process its data faster than the CPU can present it to the GPU (even if FPS is fine).
GPU-Bound This term is used when the GPU cannot process the data given to it as fast as the CPU can send it. This happens with older video cards.
Area This refers to the entire area, all its rooms and assets, whether they are instantiated or not and whether they are in the scene (view frustum) or not.
Scene or View Frustum This refers to what the engine "thinks" you can see in the current scene. This is what it actually draws and renders. If a portion of any asset instance is visible in the view, the entire asset is processed and presented to the GPU.
Culling This refers to algorithms that remove geometry from being processed any further if that geometry can't be seen. Our Rendering engine has three methods of culling:
- The first is room-based culling which is controlled entirely by the developers.
- The second is View Frustum culling, where the engine determines which assets are outside the view (outside the upper, lower, left and right edges of the monitor) and can't possibly be seen. Those assets are neither processed nor passed on to the GPU. Note that it takes some effort to cull out assets from the view.
- Additionally, HeroEngine utilizes a third-party technology called Umbra (formerly dPVS) to perform occlusion culling.
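To make the view frustum idea concrete, here is a minimal sketch of a bounding-sphere test against a set of planes. It is a generic textbook technique, not HeroEngine's or Umbra's actual implementation; the `Plane` and `Instance` types and the two-plane "frustum" in the usage example are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Plane:
    normal: Vec3  # unit normal pointing into the visible volume
    d: float      # plane equation: dot(normal, p) + d = 0


@dataclass
class Instance:
    name: str
    center: Vec3   # center of the bounding sphere
    radius: float  # radius of the bounding sphere


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def is_potentially_visible(inst: Instance, planes: List[Plane]) -> bool:
    """Keep an instance unless its sphere lies entirely behind some plane.

    An instance that only pokes into the volume is kept whole,
    which mirrors the all-or-nothing behavior described in the text.
    """
    for plane in planes:
        if dot(plane.normal, inst.center) + plane.d < -inst.radius:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical volume bounded only on the left (x >= -1) and right (x <= 1).
    planes = [Plane((1.0, 0.0, 0.0), 1.0), Plane((-1.0, 0.0, 0.0), 1.0)]
    partly_in = Instance("tree_a", (1.5, 0.0, 0.0), 1.0)
    fully_out = Instance("tree_b", (3.0, 0.0, 0.0), 1.0)
    print(is_potentially_visible(partly_in, planes))
    print(is_potentially_visible(fully_out, planes))
```

Note that the test itself costs CPU time for every instance, including the ones it rejects, which is the "effort" mentioned above.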
Over Draw This happens when you have two objects that partially overlap each other (occlusion culling takes care of the situation where one object completely obscures the other). The one behind is drawn first and then the one in front is drawn over the one behind. The overlapping part is where Over Draw occurs. Rooms are our only way of controlling how much overdraw we have. But, in the end, there will be lots of over draw. Even the most efficient 3D engines have lots of overdraw. The only way you can control over draw is to avoid overlapping geometry where possible. In forests, for instance, there is a huge amount of overdraw happening with trees. Efforts should be made to place fewer trees and spread them out so there's less over draw.
Context Switches This term relates to the GPU. A context switch is a very slow operation (relatively speaking), so context switches are bad. Every different Hero Material used in a 3D scene has the potential to cause a context switch on the GPU. In the statistics panel, the statistic that most closely relates to this potential problem is the "buckets" statistic. Each bucket represents a different Hero Material which "could" cause a context switch on the GPU. There's no way to tell if that's happening. The general rule of thumb here is that you want as few buckets as possible, so you should have as few different assets (that use different materials) as possible.
Hero Material This is a group of textures with some information on how to use those textures. Each bucket in the statistics panel equates to a different Hero Material. Each new asset you add to your Areas has the potential of adding a new Hero Material (bucket) to the rendering pipeline. The rendering pipeline is simply the steps the game engine takes in the process of preparing data to be sent to the GPU. The rendering pipeline ends at the point where the data is sent to the GPU.
Basic components that affect FPS
This is a list of only some of the many things that affect FPS in the rendering engine. The scope of this section is simply to convey some basic issues, and certain key highlights to explain what's going on.
Polygons (Polys) or Primitives This is the key number in relation to framerate. Each asset is built around a mesh of triangles (Polys) which are referred to as primitives. They are called primitives because they are the basic building block of the 3D world. Everything is done against primitives (textures, glow effects, shader effects, etc.). The number of primitives being rendered in a View is the single most important factor affecting FPS. Buckets and HeroMaterials are applied to primitives. Controlling this one number impacts both the CPU and GPU. If there were one thing to reduce to improve FPS, it would be the number of primitives in view at any one time.
Textures / Hero Materials In Hero's Journey, our textures are part of Hero Materials, which are made up of several textures and some information on how the engine should use those textures. In the Statistics panel, the buckets represent how many Hero Materials are currently being displayed. The higher this number, the more likely you will affect framerate.
Controlling Buckets: You can control how many Hero Materials are used in a scene by limiting the number of different assets which use different Hero Materials. If you have ten assets but they all use the same Hero Material, then it results in only one bucket. However, two assets that seem to use the same Hero Material can result in two buckets instead of one. This is because the settings in the two Hero Materials are different (they use different shaders).
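The bucket rule above can be sketched as counting distinct combinations of texture set and shader settings. This is only an illustration of the counting rule as described in the text; the `HeroMaterialUse` type and its two fields are hypothetical and do not correspond to any real HeroEngine data structure.

```python
from typing import Iterable, NamedTuple


class HeroMaterialUse(NamedTuple):
    textures: str  # hypothetical identifier for the material's texture set
    shader: str    # hypothetical identifier for its shader settings


def count_buckets(instances: Iterable[HeroMaterialUse]) -> int:
    """Count distinct (textures, shader) pairs among the visible instances."""
    return len(set(instances))


if __name__ == "__main__":
    # Ten instances sharing one Hero Material collapse into a single bucket.
    same = [HeroMaterialUse("stone_wall", "bump+spec")] * 10
    print(count_buckets(same))

    # Same textures but different shader settings give two buckets.
    mixed = [
        HeroMaterialUse("stone_wall", "bump+spec"),
        HeroMaterialUse("stone_wall", "bump"),
    ]
    print(count_buckets(mixed))
```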
Each Draw Call is a single mesh. An asset can be made up of more than one mesh (which means it has more than one Hero Material on it). So, this number roughly is the number of different instances of assets that are currently being rendered. It's much more efficient for the CPU and the GPU to have fewer draw calls.
Controlling Draw Calls: You can control the number of draw calls by reducing the number of asset instances in your view. Favor deleting the smallest instances first.
World Updates is the entire process of gathering, preparing and setting up data to be sent to the GPU. It's used here as a catch all for things that don't fit into the Buckets, Draw Calls or Primitives. The more you have of any of the following, the more work the CPU must do to process the data. The less the CPU has to process, the higher your FPS.
Particles: These should rarely be set to COLLIDE unless there is a very clear need for it and very few collide particles are created. Unsorted billboard particles result in only one additional draw call. Particles which use a model result in one draw call per particle. Sorting a particle can result in more than one draw call depending on how many different particles are in the scene.
The Rule of Thumb: Use Unsorted, billboard particles that don't collide whenever possible. Use sorting and collide only when necessary.
Collision Detection: Anything that is set to collide is put into the collision detection system. The greater the number of assets set to collide, the slower the engine runs. This isn't usually a problem, but when tweaking for FPS, overriding anything that's collide but will never actually be reachable by players will help.
Rule of Thumb: Any asset instance that can't possibly collide with any player should be set to IGNORE.
Lights: Lights double the number of draw calls on any asset instance that the light touches.
Mirrors: Mirrors double the number of draw calls on any asset instance that the mirror reflects. Judiciously set instances to NOREFLECT and use the other techniques described in the documentation.
Scripts: Poorly written scripts (or even well written ones) can take a lot of time to run. If your area is script heavy, you need to do some testing and then optimize code where you can. The flashing HeroScript Performance warning message in the viewport will let you know when scripts are considered a problem, but even if that message is not flashing, speeding up scripts is still a useful way to speed up FPS.
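The draw call effects of lights and mirrors described above can be estimated with a small sketch. It assumes one draw call per mesh, at most one light per instance, and that the light and mirror factors compound when both apply; these are simplifying assumptions for illustration, and the `PlacedInstance` type is hypothetical rather than a HeroEngine structure.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PlacedInstance:
    meshes: int              # one draw call per mesh (per the text)
    lit: bool = False        # touched by a light
    reflected: bool = False  # reflected by a mirror


def estimated_draw_calls(instances: Iterable[PlacedInstance]) -> int:
    """Rough draw call estimate; assumes light and mirror doubling compound."""
    total = 0
    for inst in instances:
        calls = inst.meshes
        if inst.lit:
            calls *= 2  # a light doubles the draw calls for this instance
        if inst.reflected:
            calls *= 2  # a mirror doubles them again (assumption: compounding)
        total += calls
    return total


if __name__ == "__main__":
    scene = [
        PlacedInstance(meshes=2),
        PlacedInstance(meshes=1, lit=True),
        PlacedInstance(meshes=3, lit=True, reflected=True),
    ]
    print(estimated_draw_calls(scene))  # 2 + 2 + 12
```

Even under these rough assumptions, the last instance shows why lit and mirrored assets are the first candidates for NOREFLECT or for trimming a light's reach.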
Primitives Modern GPUs can gobble up lots and lots of primitives. When you look at the specs on a video card box, it will boast of how many triangles it can process in a second. They are referring to primitives, but it's based on a best case scenario. When you add in shader effects and multiple textures, many different draw calls and buckets, the real number of primitives a video card can handle is actually only a fraction of what they boast their card can process. In the real world, the quality of the graphics you're trying to achieve can greatly diminish how many primitives can be processed by the video card. And, of course, games must present graphics at 15fps or higher, which is why the actual number of primitives a video card can process in a game engine seems so much less than the marketing hype you read on video card boxes. In general, however, it's not the number of primitives that slow down a GPU, as much as it's the Textures and Shader effects that are applied to those primitives that bog down the GPU.
Shaders All Hero Materials can designate what kinds of shader effects are applied to them. Shader effects include bump mapping, displacement mapping, specularity, and so forth. Shader effects do not add to the CPU load, but they do add to the GPU load. Shaders are all handled by the GPU. Modern GPUs can handle this, but it does impact framerate. For instance, if you create a new area, drop in one terrain piece and then add a low poly asset that has a HeroMaterial on it with a variety of shader effects, you will probably notice that framerate drops. It drops partly because of CPU load, but mostly because of the added load on the GPU. Clone that instance up to 50 times and you'll see FPS drop significantly. This is almost all due to the GPU load and not the CPU. You can test whether your area is CPU-bound or GPU-bound by going to the render panel and turning off the shader effects. If FPS improves a decent amount, that's a good indication that you're GPU-bound. If the FPS doesn't change much, that's a good indication that you're CPU-bound.
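The shader on/off test can be written down as a simple comparison of two FPS readings that you note by hand. This is a sketch only: the function name is illustrative, and the 15% threshold for "a decent amount" is an arbitrary assumption, not a HeroEngine recommendation.

```python
def classify_bound(fps_shaders_on: float, fps_shaders_off: float,
                   threshold: float = 0.15) -> str:
    """Guess the bottleneck from FPS measured with shader effects on and off.

    threshold is the relative FPS gain treated as "a decent amount";
    0.15 is an assumed placeholder value.
    """
    if fps_shaders_on <= 0:
        raise ValueError("fps_shaders_on must be positive")
    gain = (fps_shaders_off - fps_shaders_on) / fps_shaders_on
    if gain >= threshold:
        return "likely GPU-bound"
    return "likely CPU-bound"


if __name__ == "__main__":
    print(classify_bound(30, 45))  # large gain with shaders off
    print(classify_bound(30, 31))  # almost no change
```

Take both readings from the same camera position, with the same panels open, so that the only difference between them is the shader setting.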
Rule of Thumb: Use fewer assets with fewer Shader Effects.
The above rule is not easy to follow. However, you can find out what kinds of shader effects are being used on an asset. While in HeroBlade, press CTRL-T. This brings up a tool tip as you hover over asset instances, with information about the effects the artists have added to that Hero Material. The effects can be discerned from the file names listed near the bottom of the tool tip. See Misc Tools below for more information.
Context Switches Context Switches are bad for GPUs. Each time a new texture is passed to the video card, there is a chance that a context switch might occur. A new texture can be presented to the video card for many different reasons and there's no way to tell if context switching is happening at any particular moment or due to any particular texture change. The only thing you can do about this is to first understand that every bucket is a possible context switch. Every different particle (not each instantiation of that particle) is a possible context switch. Sorted particles can create multiple opportunities for context switches.
Rule of Thumb: Keep Buckets to a minimum, keep Draw Calls to a minimum, Keep the number of different particles to a minimum, avoid sorting particles.
Draw Calls Each draw call is work the GPU has to do. It likes to eat meshes in large chunks and hates to eat small nibbles. There is a good deal in HeroEngine that minimizes Draw Calls, but there are some things that you can do as a designer to help avoid excessive Draw Calls.
First, pay attention to the statistics panel. Avoid using lots of tiny assets to block a large area. Use larger assets whenever possible.
Rule of Thumb: Fewer asset instances are better. Use bigger assets in place of lots of smaller ones.
All of the above are heavily influenced by the number of objects that are visible in the scene. One way to minimize the number of visible objects is through occlusion culling.
The Statistics Panel (CPU / GPU)
- Main page: Statistics panel
This is your best friend for identifying FPS problems early on and for diagnosing FPS problems in current areas. When working on FPS issues, keep this panel open to the "Render" group. Keep in mind, though, that this panel impacts framerate. There is plenty of useful information in this panel. However, three items are particularly important.
The number of triangles / polygons that are being drawn to the screen. On modern computer hardware, this number is typically in the range of 2 million to 4 million. However, on older systems, this number may be as low as 100,000. In any case, drawing fewer primitives will usually result in higher frame rates.
The number of times the engine needs to switch between textures in the current view. This number should not be excessive.
The number of separate calls to the video card to draw meshes. Assets can be made up of multiple meshes. Sometimes meshes are combined into one draw call because they use the same texture (billboard particles). So, this number isn't equal to the number of instances, though it's probably close. This number should be no more than 1000.
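If you note these three numbers by hand while building, a small budget check makes it easy to see which one is out of line. The sketch below is illustrative: the function name is hypothetical, the values are typed in from the panel (nothing here reads them from HeroBlade), and only the draw call limit of 1000 comes from the text. The primitive and bucket budgets are left for you to choose for your minimum spec machine.

```python
def check_render_stats(primitives: int, buckets: int, draw_calls: int, *,
                       max_primitives: int, max_buckets: int,
                       max_draw_calls: int = 1000) -> list:
    """Return a warning string for each reading that exceeds its budget."""
    readings = [
        ("primitives", primitives, max_primitives),
        ("buckets", buckets, max_buckets),
        ("draw calls", draw_calls, max_draw_calls),
    ]
    return [
        f"{name}: {value} exceeds budget of {limit}"
        for name, value, limit in readings
        if value > limit
    ]


if __name__ == "__main__":
    # Example budgets are placeholders chosen for illustration.
    for warning in check_render_stats(50_000, 40, 1200,
                                      max_primitives=100_000, max_buckets=100):
        print(warning)
```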
Identifying which shader permutations are being used (GPU)
CTRL-T (how to identify shader effects): This keyboard command brings up a tool tip when you hover the cursor over any asset. The section labeled "Material" lists all the components of the HeroMaterial being used for that instance. You have to go by the name of the file, which will include the words bump, spec (for specularity), macro, etc. These identify the various shaders. The information under Material includes other information, such as whether it is set to walk, collide, etc. These are not shader effects. The effects appear under the .dds texture file near the bottom of the list of information.
In the image above, under MATERIAL, the shader effects being used are identified right after the .dds file name.
Baselines (how to calibrate your computer)
Here, you actually care less about your computer than you do about your players' computers. The difference between a 4 year old machine and a new machine can be light years. But the difference between a 2 year old machine with a dedicated video card and a modern day machine with an integrated Intel video chipset using all shared memory is also light years, and this difference is in the wrong direction. Serious game developers and world builders are always going to have a minimum spec machine, a target spec machine, and a cutting edge machine to test their games against. Even making single player games requires a complicated balance between great looking worlds and the minimum spec you set for your players. After all, there's a reason most AAA multiplayer games end up putting people in tiny maps, or big maps with lots of corridors.
Making an MMO or even just an arena based FPS game brings even more complexity. How many players will be in one area? How many particles will they be casting, of how many different types, and how many texture atlases will they all be loading into memory? How do you plan to handle 500 people all standing in one place? Will you script your game to only show 100 people? To only show people that matter to the player? To cause everyone's textures to downrez to a 128x128 file? It's all up to you. But we encourage you to at least keep a minimum spec machine around to test on.
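One of the options raised above, showing only a limited number of players, can be sketched as "keep the nearest N". This is a generic illustration in Python rather than HeroScript, and it is only one possible policy; the function name, the tuple layout for players and the default limit of 100 are assumptions for the example.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Player = Tuple[str, Tuple[float, float, float]]  # (name, position)


def nearest_players(viewer: Sequence[float], players: List[Player],
                    limit: int = 100) -> List[Player]:
    """Return at most `limit` players, nearest to the viewer first."""
    return heapq.nsmallest(
        limit, players, key=lambda player: math.dist(viewer, player[1])
    )


if __name__ == "__main__":
    crowd = [
        ("far", (0.0, 0.0, 10.0)),
        ("near", (0.0, 0.0, 1.0)),
        ("mid", (0.0, 0.0, 5.0)),
    ]
    for name, _ in nearest_players((0.0, 0.0, 0.0), crowd, limit=2):
        print(name)
```

A real game would probably weight the choice by more than distance (party members, current target, and so on), which corresponds to the "people that matter to the player" option.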
Interesting side note: using just our starting Fantasy Lowlands area in the HeroCloud, different machines run the area at the following frame rates:
- A modern day gaming PC from Dell: 200 FPS
- A modern day gaming laptop from Alienware: 150 FPS
- A modern day MacBook Pro (running Windows): 90 FPS
- A middle of the road home office PC with a dedicated video card (from somewhere like Best Buy): 80 FPS
- A lower end home office laptop with a dedicated video card and dedicated memory: 45 FPS
- A lower end home office PC with an integrated Intel chipset: 30 FPS
- A lower end home office laptop with an integrated Intel chipset: 20 FPS
The last two machines in that list make up 60 percent of the computers out there, so it's important to keep that in mind.
Framerate Graph (CTRL-F)
CTRL-F brings up a framerate graph with useful information for the developer:
- RED: This is the frames per second.
- BRIGHT YELLOW HORIZONTAL LINE: Marks the 50fps divisions (useful only for the FPS (red) line)
- DARK GREENISH/YELLOWISH HORIZONTAL LINES: Marks 10fps divisions (useful only for the FPS (red) line)
- BLUE: Number of Draw calls
- CYAN: Number of calls to SetTexture
- GREEN: KBytes of data received from server
- YELLOW: Minimap texture re-render
- PINK: Area server ping
- ORANGE: Area server message round-trip time (for internal use)
Warning Lights (CPU side only)
On the left side of the viewport, it is common to see words flashing from time to time. These are the "warning lights" which are there to help developers understand what might be causing FPS problems. For example, if HeroScript scripts are using too much CPU, the message "HeroScript Exec" will flash on the left side of the screen.
For a complete list of warning messages, see Performance warning.
Q: What DO blockers really do to affect frame rate in a long line of sight situation? Under what conditions?
A: Blockers are barriers used to control visibility so that your primitive counts, buckets and draw calls can be kept within reasonable limits. Long lines of sight are only problematic because they make it easier to break these limits. In towns they make it harder to separate out the inside of a building from the outside for visibility purposes, though there are ways to overcome this.
Q: When does an asset instance actually get rendered?
A: First, the asset must be in a visible room. Then, an asset gets rendered only when the rendering engine thinks the instance is visible within the view frustum. An asset can be off the screen visually (you don't see it with your eyes), but the engine may still think it's in the view and so renders it. If any part of an asset is considered visible (no matter how small), the entire asset is rendered. It's an all or nothing thing. Anything that is not within the view frustum (according to the engine) is culled out and not rendered. Large numbers of unrendered assets still have an impact on framerate, because some processing power is needed to determine whether they are within the view frustum in the first place and to mark them so they won't be sent to the video card for rendering.
Q: Does audio substantially affect framerate?
A: Audio is generally not a framerate problem. However, that assumes hardware-accelerated sound. On sound cards without hardware acceleration, there is a more noticeable framerate hit. Even so, you want as few sounds in an area as possible, perhaps no more than 10. Future changes to the way we deal with sound may change this, but probably only for the better.
Q: When I see 99 trees in the stats panel, where are the 99 trees?
A: They are in your view frustum. They may be behind other assets or just barely off the edge of the screen, but the culling has not removed them because some invisible part is still within the frustum.
Q: Why do the numbers on the stat panel change (slightly) even though I have not moved?
A: The answer depends on what stats you're talking about. It's normal that certain stats fluctuate, such as memory, even though you've not moved your view around. If you're talking about numbers of primitives, draw calls or buckets, then it's possible your viewport did indeed move slightly (even though you didn't see it move) according to the rendering engine's internal calculations. It's also possible that creatures, NPCs or other moving assets moved close enough to the edge of your screen that they are no longer being culled.
Q: What can be learned from looking at an area in collision mode and what do red, green and white really mean?
A: White means collide ignore, Green means it's a walkable surface and Red means it's not walkable but collides with the character or camera. Red and Green go through the collision detection system. White is completely outside the collision detection system. Generally speaking you want anything that's never going to collide with the camera or the character to be set to no collide/white. In some cases, complex buildings or other assets are set to white/no collide and a very rough outer collide shell is placed very close to it. This is why you sometimes see a white surface and still appear to collide with it (the collision poly is just under the surface). Note: Physical collision of the character controller is provided by the Physics plug-in, which has a much greater detail debug rendering system.
Q: Would it make sense to make a "standard" area of perhaps one or two completely dressed rooms that can be used for measurement purposes? This way a developer could go there, see that on his or her machine they get 60fps and know that when they are building, they should have around 60fps as a goal. A different developer might get 30 fps in the same area, and would know that on his or her system, the aim should be around 30 fps.
A: Yes. Specifically, it's recommended to create a single area in multiple states between initial layout and final details. That way a designer can see what the fps should be at each stage.
Keeping Primitive Counts Low
Keeping primitives low in a well dressed area is difficult to do. The following are only suggestions, but they should help make it easier to control FPS as you build your area. Always be on the lookout for creative ways to place assets so that you get the maximum effect with the fewest assets possible.
- Choose your Assets based on their poly/primitive count.
- Create a library for your area and populate it with assets as you use them.
- Only place assets into your area that exist in this library (never cheat, always add any new assets to this library)
- This library will be your first line of defense at controlling the number of buckets stat
- Identify 5-10 trees and other plant assets you plan to use for your area...don't use any more than this...fewer is better
- Identify a limited number of general dressing assets (5-10?) that you plan to use throughout your area.
- Re-use is king. Try to do more with less, always.
- Build the layout of your area first, and break it into rooms later only if you need to drastically improve FPS; Umbra will do much of this for you.
- Do not dress the area out until you're confident with its performance: large things are easy to change, dozens of tiny things are annoying to change later.
- It's OK to see popping if it's minor. Use dressing to hide any popping...be aggressive and keep framerate high. Don't fear popping at this point.
- Use the lowest poly assets first, preferably larger assets to make your initial effort at detailing your area.
- In all cases, lower poly assets are preferable to higher poly assets unless you can't get the necessary visuals (subjective stuff)
- Dynamic details can impact framerate significantly.
- If FPS is an issue, make a copy of your desired dynamic detail and then remove details as needed.
- If necessary, create your own particle/s and emitter/s and add them to your dynamic detail and then adjust the particle/emitter for framerate.
- Mirrors and Lights will probably force you to reduce the amount of dressing. Do NOT ignore fps at this juncture. Go back and prune your details.
- Mirrors DOUBLE the number of primitives on every asset instance that is reflected.
Use mirrors only with care and keep an eye on primitive count.
- Lights DOUBLE the number of primitives on every asset instance that is affected by that light
Isolate the lights so that they affect only enough assets to get your effect. Avoid lighting assets that are also reflected in a mirror. Avoid overlapping lights wherever possible.
- You need to push for quality but never lose sight of the FPS
- Look for ways to reduce poly counts by introducing lower poly assets.
- Higher poly assets should be easily seen where their impact can justify their higher cost.
Strategies for Increasing FPS in areas with Low FPS already
First off, accept that it's going to hurt to fix the problem. Small changes are not going to increase FPS enough. You have to plan for making big reductions in your primitive count, in your buckets and in your draw calls.
- Pick one room and work on fixing that one room, then move on to the adjacent rooms.
- Fixing that one room will cause you to fix the other adjacent rooms as well.
- Do some testing...delete massive quantities of various assets until fps is acceptable
This will help you get an idea of what you need to do...don't be too shocked, it'll probably be a lot of assets being deleted or replaced
- First, look for High poly assets and then get rid of them or replace them
- Second, look for details that can be deleted without impacting the look of the area noticeably
- Try shifting things around or using lower poly assets as fillers
- Third, check for mirrors and lights.
- Check for too many lights, and make sure they are not lighting up too much, overlapping with other lights, or lighting things that are mirrored
- Fourth, Re-cut your rooms and change your area layout to reduce the number of primitives.
- Check how many textures you have loaded in the scene, and if you are using a large amount of terrain textures, evaluate if they are using alpha, and being overlapped in a number of places.
- Ensure that your dds files are as small as they can be, and that they have been exported with the correct dxt settings.
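To get a feel for what "as small as they can be" means for the texture files mentioned in the last two points, the size of the pixel data in a DXT-compressed texture can be estimated from its dimensions. The sketch below uses the standard block sizes (8 bytes per 4x4 block for DXT1, 16 bytes for DXT3 and DXT5) and leaves out the DDS file header; the function name is illustrative and not part of any HeroEngine tool.

```python
# Bytes per 4x4 pixel block for the common DXT formats.
BYTES_PER_BLOCK = {"DXT1": 8, "DXT3": 16, "DXT5": 16}


def dxt_size_bytes(width: int, height: int, fmt: str = "DXT1",
                   mipmaps: bool = True) -> int:
    """Estimate the compressed pixel data size, excluding the DDS header."""
    block = BYTES_PER_BLOCK[fmt]
    total = 0
    w, h = width, height
    while True:
        # Each mip level is stored as whole 4x4 blocks, at least one per axis.
        blocks_w = max(1, (w + 3) // 4)
        blocks_h = max(1, (h + 3) // 4)
        total += blocks_w * blocks_h * block
        if not mipmaps or (w == 1 and h == 1):
            break
        w, h = max(1, w // 2), max(1, h // 2)
    return total


if __name__ == "__main__":
    for size in (2048, 1024, 512, 128):
        for fmt in ("DXT1", "DXT5"):
            kib = dxt_size_bytes(size, size, fmt) / 1024
            print(f"{size}x{size} {fmt}: {kib:.0f} KiB")
```

Halving a texture's width and height cuts its size to roughly a quarter, and choosing DXT1 over DXT5 (when no smooth alpha channel is needed) halves it again, so both settings are worth checking on every exported file.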