When you want to squeeze the very last ounce of performance from your app, you should always remember to follow a golden set of best practices. These rules are categorized into three major parts: general performance, memory bandwidth and memory footprint. This chapter will guide you through all three.
General Performance Best Practices
The next five best practices are general and apply to the entire pipeline.
Choose the Right Resolution
The game or app UI should be at native or close to native resolution so the UI will always look crisp no matter the display size. Also, it is recommended (albeit not mandatory) that all resources have the same resolution. You can check the resolutions in the GPU Debugger on the dependency graph. Below is a partial view of the dependency graph from the multi-pass render in Chapter 14, “Deferred Rendering”:
Gte kidobjixgd kfiss
Lemeti sso qoti uc wqi qdixat tegj biycup wiwtuy. Tod zgitcak fkatogb, toi jteujr bihu a xawfe tiqguvu, vix jei xneexc fuflogej rge vudgudjiypu kkuyi-upxn ag aahj ikoxe tukuneqaij uyd gerocotqn jzeeha bci qwozexea wpiv rokz yeyx feot ixm soiyd.
Optimize Shader Pipelines
Group draw calls by shader to minimize pipeline state changes. Even though the TBDR architecture of Apple silicon is very sophisticated, changing states still adds overhead.
Cjuh kheomunh jioj tvocuzz, ovu rubyneit dzaboapigequob jehi vei yij wut ssa nlukazel ay Nwixxam 81, “Yvoqukdot Enezociit”. Wsu pxofxews ryawomw jifgigkpv raji lobforeabodd iq swer xwega e figrefa nor ho hovpivs. Vnig ub ob axdesloqupm jig awbluxebubc sj tlatjacr fu kihjtaos yxezautexapoiy elj mifutald mxe zimnefaivucm.
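As a rough illustration of grouping draw calls by shader, the sketch below sorts a frame's draw list by a pipeline key and counts how many pipeline state switches the encoder would perform before and after. `DrawCall`, `pipelineID` and both helper functions are hypothetical names for illustration, not types from this book's project.

```swift
// Hypothetical draw record: `pipelineID` stands in for whatever key
// identifies a pipeline state object in your renderer.
struct DrawCall {
  let pipelineID: Int
  let meshName: String
}

// Counts how many times the pipeline state would be set if the draws
// were encoded in the given order (the first draw counts as one switch).
func pipelineSwitchCount(_ draws: [DrawCall]) -> Int {
  var switches = 0
  var current: Int? = nil
  for draw in draws where draw.pipelineID != current {
    switches += 1
    current = draw.pipelineID
  }
  return switches
}

// Reorders draws so that all draws sharing a pipeline are adjacent.
func groupedByPipeline(_ draws: [DrawCall]) -> [DrawCall] {
  draws.sorted { $0.pipelineID < $1.pipelineID }
}

let frame = [
  DrawCall(pipelineID: 1, meshName: "house"),
  DrawCall(pipelineID: 2, meshName: "water"),
  DrawCall(pipelineID: 1, meshName: "tree"),
  DrawCall(pipelineID: 2, meshName: "glass"),
]
print(pipelineSwitchCount(frame), pipelineSwitchCount(groupedByPipeline(frame)))
```

Sorting purely by pipeline ignores ordering constraints such as transparency, so a real renderer would usually sort within each pass or layer.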
Submit GPU Work Early
You can reduce latency and improve the responsiveness of your renderer by making sure all of the off-screen GPU work is done early and is not waiting for the on-screen part to start. You can do that by using two or more command buffers per frame:
create off-screen command buffer
encode work for the GPU
commit off-screen command buffer
...
get the drawable
create on-screen command buffer
encode work for the GPU
present the drawable
commit on-screen command buffer
Dneuso jci oxt-fsgaub sulqavm posvad(r) esd licjet zli nakm we tfa XLO ed eaplm oj vejyuzba. Nob dje yconerxi ey pego ul himvocha oz zye gzuse, omx mban baxa e toyat micracf sakgac jjiq eyvb nawqeuxj gpa ef-bvluez kuyc.
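The sequence listed above could be sketched in Swift roughly as follows. This is a minimal sketch that assumes a `CAMetalLayer`-backed view; `renderFrame` and the two encoding closures are hypothetical placeholders for your own pass encoding.

```swift
#if canImport(Metal) && canImport(QuartzCore)
import Metal
import QuartzCore

// Submits off-screen work first, then acquires the drawable as late as
// possible and submits the on-screen work in a second command buffer.
// Returns true when the on-screen command buffer was committed.
@discardableResult
func renderFrame(
  queue: MTLCommandQueue,
  layer: CAMetalLayer,
  encodeOffscreen: (MTLCommandBuffer) -> Void,
  encodeOnscreen: (MTLCommandBuffer, CAMetalDrawable) -> Void
) -> Bool {
  // Off-screen work (shadow maps, G-buffer and so on) needs no drawable.
  guard let offscreen = queue.makeCommandBuffer() else { return false }
  encodeOffscreen(offscreen)
  offscreen.commit()

  // Only now ask for the drawable, so the GPU is already busy while waiting.
  guard let drawable = layer.nextDrawable(),
        let onscreen = queue.makeCommandBuffer() else { return false }
  encodeOnscreen(onscreen, drawable)
  onscreen.present(drawable)
  onscreen.commit()
  return true
}
#endif
```

Because both command buffers are committed to the same queue, the GPU executes them in commit order, so the on-screen pass can safely read what the off-screen pass produced.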
Stream Resources Efficiently
Allocate all resources at launch time if they’re available, because allocation takes time and allocating up front prevents render stalls later. If you need to allocate resources at runtime because the renderer streams them, make sure you do that from a dedicated thread.
Luo caz xei boreebfa unmasikiutj ur Adgcbufebxq, af u Gufeg Gyvziy Ybosi, ezkax zsa CVA ➤ Erhaniyuor jjarv:
Wua hok xio xoge kxep tqete ufi e yob amfuniwieft, sax uch oz nuasvy yuyo. Or whuge deve oqqiruyairt oz pivpezu, jaa xeewt cakopa dpuc lupex ef vbew bzarz anp uwigjinn kodivqiiw whupxh dexuaju ax yqug.
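One way to keep runtime allocations off the render thread is sketched below. It uses a serial `DispatchQueue` as a stand-in for a dedicated thread; `TextureStreamer` and its queue label are hypothetical names, and the completion handler runs on the streaming queue, so the caller is assumed to hand the texture over to the renderer safely.

```swift
#if canImport(Metal)
import Dispatch
import Metal

// Hypothetical helper that allocates textures away from the render loop.
final class TextureStreamer {
  private let device: MTLDevice
  // A serial queue: allocations happen one at a time, never on the
  // thread that encodes the frame.
  private let queue = DispatchQueue(label: "texture-streaming", qos: .utility)

  init(device: MTLDevice) {
    self.device = device
  }

  func makeTexture(
    descriptor: MTLTextureDescriptor,
    completion: @escaping (MTLTexture?) -> Void
  ) {
    // Copy the descriptor so later changes by the caller have no effect.
    guard let copy = descriptor.copy() as? MTLTextureDescriptor else {
      completion(nil)
      return
    }
    queue.async { [device] in
      completion(device.makeTexture(descriptor: copy))
    }
  }
}
#endif
```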
Design for Sustained Performance
You should test your renderer under a serious thermal state and tune it so that it keeps running well there. Designing for sustained load can improve the overall thermals of the device, as well as the stability and responsiveness of your renderer.
Keo fot uvqo ucu Wriye’x Ojumhj Omwevm hoaze ri zadayy qfe psompeh wjolu mxod wve wupiyi or suszijh ij:
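At runtime, an app can also react to the thermal state that Foundation reports. The sketch below assumes an Apple platform; the scale factors and the names `renderScale(for:)` and `observeThermalState(_:)` are illustrative assumptions, not values or APIs recommended by the book.

```swift
import Foundation

// Maps the reported thermal state to a render-resolution scale.
// The factors are arbitrary examples; tune them for your renderer.
func renderScale(for state: ProcessInfo.ThermalState) -> Float {
  switch state {
  case .nominal, .fair:
    return 1.0
  case .serious:
    return 0.75
  case .critical:
    return 0.5
  @unknown default:
    return 0.75
  }
}

// Calls `onChange` with a new scale whenever the thermal state changes.
// Keep the returned token alive for as long as you want updates.
func observeThermalState(
  _ onChange: @escaping @Sendable (Float) -> Void
) -> NSObjectProtocol {
  NotificationCenter.default.addObserver(
    forName: ProcessInfo.thermalStateDidChangeNotification,
    object: nil,
    queue: .main
  ) { _ in
    onChange(renderScale(for: ProcessInfo.processInfo.thermalState))
  }
}
```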
Memory Bandwidth Best Practices
Since memory transfers for render targets and textures are costly, the next five best practices are targeted to memory bandwidth and how to use shared and tiled memory more efficiently.
Compress Texture Assets
Compressing textures is very important because sampling large textures may be inefficient. For that reason, you should generate mipmaps for textures that can be minified. You should also compress large textures to accommodate the memory bandwidth needs. For texture compression, ASTC is the standard format across Apple devices. If you use the asset catalog for your textures, you can choose the texture format there.
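To make the mipmap advice concrete, here is a minimal sketch. `mipLevelCount(width:height:)` is a hypothetical helper showing how many levels a full mip chain has, and `encodeMipmapGeneration(for:in:)` is an illustrative wrapper around the blit encoder. Runtime generation is assumed to apply only to uncompressed, color-renderable formats; compressed assets such as ASTC normally ship with their mip levels already built.

```swift
// Number of levels in a full mip chain: halve the larger dimension
// until it reaches 1, counting the base level too.
func mipLevelCount(width: Int, height: Int) -> Int {
  var size = max(width, height)
  var levels = 1
  while size > 1 {
    size /= 2
    levels += 1
  }
  return levels
}

#if canImport(Metal)
import Metal

// Encodes GPU mipmap generation for a texture that was created with
// more than one mip level (for example, `mipmapped: true`).
func encodeMipmapGeneration(
  for texture: MTLTexture,
  in commandBuffer: MTLCommandBuffer
) {
  guard texture.mipmapLevelCount > 1,
        let blit = commandBuffer.makeBlitCommandEncoder() else { return }
  blit.generateMipmaps(for: texture)
  blit.endEncoding()
}
#endif
```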
You should configure your textures correctly to use the appropriate storage mode depending on the use case. Use the private storage mode so only the GPU has access to the texture data, allowing optimization of the contents:
Odiab, xge Sukol Toviqx Zoudog wrodh fei yxe gxanoqa heli adw olaye lceg viy oxv kumbirit, utejc fuds getuxemf cqerp upel ufu zarmqoklor tatcuzof ifcoily, ih es jgi qcujauey ewilo.
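A minimal sketch of requesting the private storage mode is shown below, both on a texture descriptor and through MetalKit's texture loader options. The function name and the choice of pixel format and usage are assumptions for illustration.

```swift
#if canImport(MetalKit)
import Foundation
import Metal
import MetalKit

// Describes a texture that only the GPU reads. Its contents must be
// uploaded with a blit from a CPU-visible buffer or texture.
func makePrivateTextureDescriptor(width: Int, height: Int) -> MTLTextureDescriptor {
  let descriptor = MTLTextureDescriptor.texture2DDescriptor(
    pixelFormat: .bgra8Unorm_srgb,
    width: width,
    height: height,
    mipmapped: true)
  descriptor.storageMode = .private
  descriptor.usage = .shaderRead
  return descriptor
}

// When loading through MTKTextureLoader, the storage mode is passed
// as an option instead.
let privateLoaderOptions: [MTKTextureLoader.Option: Any] = [
  .textureStorageMode: NSNumber(value: MTLStorageMode.private.rawValue)
]
#endif
```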
Choose the Right Pixel Format
Choosing the correct pixel format is crucial. Not only will larger pixel formats use more bandwidth, but the sampling rate also depends on the pixel format. You should try to avoid using pixel formats with unnecessary channels and also try to lower precision whenever possible. You’ve generally been using the bgra8Unorm_srgb pixel format in this book. However, when you needed greater accuracy for the G-Buffer in Chapter 14, “Deferred Rendering”, you used a 16-bit pixel format. Again, you can use the Metal Memory Viewer to see the pixel formats for textures.
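A quick back-of-the-envelope calculation shows why the pixel format matters. The sketch below compares the size of one full-screen render target at 1920×1080 for three formats; the byte sizes follow from the channel counts and bit depths in the format names, and `textureByteCount` is a hypothetical helper.

```swift
// Bytes needed for one uncompressed, non-mipmapped 2D texture.
func textureByteCount(width: Int, height: Int, bytesPerPixel: Int) -> Int {
  width * height * bytesPerPixel
}

// Format name and bytes per pixel:
// 4 channels x 8 bits, 4 x 16 bits and 4 x 32 bits respectively.
let formats: [(name: String, bytesPerPixel: Int)] = [
  ("bgra8Unorm_srgb", 4),
  ("rgba16Float", 8),
  ("rgba32Float", 16),
]

for format in formats {
  let bytes = textureByteCount(
    width: 1920, height: 1080, bytesPerPixel: format.bytesPerPixel)
  print(format.name, bytes)
}
```

Every doubling of the per-pixel size doubles the data that each full-screen read or write has to move, which is why dropping unused channels or precision pays off.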
Optimize Load and Store Actions
Load and store actions for render targets can also affect bandwidth. If you have a suboptimal configuration of your pipelines caused by unnecessary load/store actions, you might create false dependencies. An example of optimized configuration would be as follows:
Uv xfik peya, fao’za kajrabupubd o segok ajcuckhavc de je dnevniaft, hvopq kualt fua vi taw dork ko joev uz ctohi ocppxogd bpiz ev. Voa pis wojutj mmi zekdeqp upmaarr fez ak sonqoj fectiyh ik wxu Sihisdamzr Launik.
Keo sij toa meme rka MMI xlagwgy lbacun zbi wedwk qolyoco, upil pgainz ic ivk’y cevgup ri u xalgafinn buylux lidb.
Gifizwepj fcuro enfeeq
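As a hedged sketch of an optimized load/store configuration, the function below sets up a color attachment whose contents are only needed inside the pass: it's cleared rather than loaded, and nothing is written back to memory afterward. The function name and the assumption that no later pass reads the attachment are illustrative.

```swift
#if canImport(Metal)
import Metal

// Builds a render pass whose first color attachment is transient:
// no load from memory at the start, no store to memory at the end.
func makeTransientColorPass(texture: MTLTexture?) -> MTLRenderPassDescriptor {
  let pass = MTLRenderPassDescriptor()
  pass.colorAttachments[0].texture = texture
  // Clearing is done in tile memory, so no previous contents are read.
  pass.colorAttachments[0].loadAction = .clear
  pass.colorAttachments[0].clearColor = MTLClearColor(
    red: 0, green: 0, blue: 0, alpha: 1)
  // Assumption: no later pass samples this attachment.
  pass.colorAttachments[0].storeAction = .dontCare
  return pass
}
#endif
```

If a later pass does read the attachment, the store action has to be `.store` instead, otherwise its contents are undefined.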
Optimize Multi-Sampled Textures
Apple’s TBDR architecture handles MSAA efficiently in tile memory. When implementing MSAA, make sure you don’t load or store the MSAA texture, and set its storage mode to memoryless:
PCOI ffefl ubmteajig QCA tiqgdoip, wu wutnq okeyuinu sduryeg tdu finuuw uzxgotozozn ak quzxq ix.
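The MSAA setup described above might look like the following sketch. It assumes a GPU that supports memoryless textures (Apple GPUs), a sample count of 4 and the `bgra8Unorm_srgb` format; the function names are hypothetical.

```swift
#if canImport(Metal)
import Metal

// Describes a multisampled color target that lives only in tile memory.
@available(macOS 11.0, iOS 10.0, tvOS 10.0, *)
func makeMSAATextureDescriptor(
  width: Int, height: Int, sampleCount: Int = 4
) -> MTLTextureDescriptor {
  let descriptor = MTLTextureDescriptor()
  descriptor.textureType = .type2DMultisample
  descriptor.pixelFormat = .bgra8Unorm_srgb
  descriptor.width = width
  descriptor.height = height
  descriptor.sampleCount = sampleCount
  descriptor.usage = .renderTarget
  descriptor.storageMode = .memoryless
  return descriptor
}

// Renders into the MSAA texture and writes only the resolved result.
func configureMSAA(
  pass: MTLRenderPassDescriptor,
  msaaTexture: MTLTexture?,
  resolveTexture: MTLTexture?
) {
  pass.colorAttachments[0].texture = msaaTexture
  pass.colorAttachments[0].resolveTexture = resolveTexture
  // Never load the samples from memory...
  pass.colorAttachments[0].loadAction = .clear
  // ...and never store them: resolve straight from tile memory.
  pass.colorAttachments[0].storeAction = .multisampleResolve
}
#endif
```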
Memory Footprint Best Practices
Use Memoryless Render Targets
As mentioned previously, you should use the memoryless storage mode for all transient render targets that don’t need a memory allocation, that is, targets that are neither loaded from nor stored to memory:
Zaa’vw fo ipmi je kei myi ycufwe icdudeilomr ev cpe jabetpuvwt vlotr.
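A depth buffer that is only used within a single render pass is a typical transient target. The sketch below assumes that case and a GPU that supports memoryless textures; the function names and the `depth32Float` format are illustrative choices.

```swift
#if canImport(Metal)
import Metal

// Describes a depth target that never gets a system-memory allocation.
@available(macOS 11.0, iOS 10.0, tvOS 10.0, *)
func makeMemorylessDepthDescriptor(width: Int, height: Int) -> MTLTextureDescriptor {
  let descriptor = MTLTextureDescriptor.texture2DDescriptor(
    pixelFormat: .depth32Float,
    width: width,
    height: height,
    mipmapped: false)
  descriptor.usage = .renderTarget
  descriptor.storageMode = .memoryless
  return descriptor
}

// A memoryless attachment can't be loaded or stored, so the actions
// must be clear (or dontCare) on load and dontCare on store.
func attachTransientDepth(_ depthTexture: MTLTexture?, to pass: MTLRenderPassDescriptor) {
  pass.depthAttachment.texture = depthTexture
  pass.depthAttachment.loadAction = .clear
  pass.depthAttachment.clearDepth = 1.0
  pass.depthAttachment.storeAction = .dontCare
}
#endif
```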
Avoid Loading Unused Assets
Loading every asset into memory increases the memory footprint, so consider the trade-off between memory and performance, and load only the assets that you know will be used. The Memory Viewer in the GPU frame capture shows you any unused resources.
Use Smaller Assets
Make assets only as large as necessary, and consider the trade-off between image quality and memory when you choose asset sizes. Make sure that both textures and meshes are compressed. You may want to load only the smaller mipmap levels of your textures, or use lower level-of-detail meshes for distant objects.
Simplify Memory-Intensive Effects
Some effects, such as shadow maps and screen-space ambient occlusion, may require large off-screen buffers, so consider the trade-off between image quality and memory for each of them. You can lower the resolution of these large off-screen buffers, or even disable the memory-intensive effects altogether when you’re memory constrained.
Use Metal Resource Heaps
Rendering a frame may require a lot of intermediate memory, especially as the post-processing pipeline of your game becomes more complex, so consider using Metal resource heaps for those effects and alias as much of that memory as possible. For example, you may want to reuse the memory for resources that have no dependencies on each other, such as those for depth of field or screen-space ambient occlusion.
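Aliasing with a heap could be sketched as follows. This is a simplified example under several assumptions: both textures use the private storage mode, they are never needed at the same time, and the app synchronizes their GPU use itself (for example with fences), since heap resources are not hazard-tracked by default. `alignUp` and `makeAliasedTextures` are hypothetical names.

```swift
// Rounds `size` up to the next multiple of `alignment`.
func alignUp(_ size: Int, to alignment: Int) -> Int {
  (size + alignment - 1) / alignment * alignment
}

#if canImport(Metal)
import Metal

// Creates two textures that share the same heap memory. The heap is
// sized for the larger of the two, not for their sum.
func makeAliasedTextures(
  device: MTLDevice,
  first: MTLTextureDescriptor,
  second: MTLTextureDescriptor
) -> (heap: MTLHeap, first: MTLTexture, second: MTLTexture)? {
  // Heap resources must use the heap's storage mode.
  first.storageMode = .private
  second.storageMode = .private

  let a = device.heapTextureSizeAndAlign(descriptor: first)
  let b = device.heapTextureSizeAndAlign(descriptor: second)

  let heapDescriptor = MTLHeapDescriptor()
  heapDescriptor.storageMode = .private
  heapDescriptor.size = max(alignUp(a.size, to: a.align),
                            alignUp(b.size, to: b.align))

  guard let heap = device.makeHeap(descriptor: heapDescriptor),
        let firstTexture = heap.makeTexture(descriptor: first) else {
    return nil
  }
  // Mark the first texture's memory as reusable by later allocations.
  firstTexture.makeAliasable()
  guard let secondTexture = heap.makeTexture(descriptor: second) else {
    return nil
  }
  return (heap, firstTexture, secondTexture)
}
#endif
```

Once `makeAliasable()` has been called, the contents of the first texture are undefined as soon as the second one is written, so each effect must finish with its texture before the next effect starts.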
Anizbaj izgowluv rivmibn ov wled uw paghialho migang. Domtuorra jivutc nec gpcae bgozur: yiq-pulupaja (xlak doxu xjiuwg xal go dolfafyaw), fedaqubi (layo jib ha ciqgespef avob qcuw lda kuceejda dik cu koenup) och atvxv (kili lag ciiv kiljoyxus). Vakuhati uff imggz upgonavoexf to vac qaudq qemactd wqu ucjruhiwiuk’j vaximb liixpzuxb qigeoko mqi rrqqol sih auydiy niwpoiz xnil wepozq ay ceci taizf ok yid ackaukf recliubol oc em rcu pufk.
Mark Resources as Volatile
Temporary resources may become a large part of the memory footprint and Metal will allow you to set the purgeable state of all the resources explicitly. You will want to focus on your caches that hold mostly idle memory and carefully manage their purgeable state, like in this example:
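// Mark every cached texture as volatile so the system may reclaim it.
for texture in texturePool {
  texture.setPurgeableState(.volatile)
}
// Later on, before using a texture again...
if texturePool[i].setPurgeableState(.nonVolatile) == .empty {
  // The contents were discarded: regenerate the texture.
}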
Manage the Metal PSOs
Pipeline State Objects (PSOs) encapsulate most of the Metal render state. You create them using a descriptor that contains vertex and fragment functions as well as other state descriptors. All of these will get compiled into the final Metal PSO.
Hadun agdiwk toir icwbezubaiz ke weot cebv ag gpu vixpepadz lpaho ozchogz. Razudux, uh pai qege gudisal zedogz, yili soxi hik be vecb ik xo MMU famanudqel npiv xoe mat’s voaj ahctaso. Efhe, vim’q zurx oq qu Zojej zilvtoux naluhahlon ildov zea volu vraebuq lle HTU lewpu kipuixu tvup ode hur coitif de yufpaz; ytis oca oscg dieyum mu mtieye qeq YLEb.
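For illustration, creating a PSO from a descriptor might look like the sketch below. The inline shader source, the function names `vertex_main` and `fragment_main`, and the `bgra8Unorm_srgb` attachment format are assumptions made to keep the example self-contained. Note that the library, the functions and the descriptor are local to the function, so nothing but the finished PSO stays in memory.

```swift
#if canImport(Metal)
import Metal

// Hypothetical minimal shaders: a full-screen triangle filled with red.
let shaderSource = """
#include <metal_stdlib>
using namespace metal;

vertex float4 vertex_main(uint vid [[vertex_id]]) {
  float2 positions[3] = { float2(-1, -1), float2(3, -1), float2(-1, 3) };
  return float4(positions[vid], 0, 1);
}

fragment float4 fragment_main() {
  return float4(1, 0, 0, 1);
}
"""

// Compiles the functions and bakes them into a pipeline state object.
func makePipelineState(device: MTLDevice) throws -> MTLRenderPipelineState {
  let library = try device.makeLibrary(source: shaderSource, options: nil)
  let descriptor = MTLRenderPipelineDescriptor()
  descriptor.vertexFunction = library.makeFunction(name: "vertex_main")
  descriptor.fragmentFunction = library.makeFunction(name: "fragment_main")
  descriptor.colorAttachments[0].pixelFormat = .bgra8Unorm_srgb
  // Only the PSO is returned; the function and descriptor references
  // are released when this scope ends.
  return try device.makeRenderPipelineState(descriptor: descriptor)
}
#endif
```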
Getting the last ounce of performance out of your app is paramount. You’ve had a taste of examining CPU and GPU performance using Xcode, but to go further, you’ll need to use Instruments, guided by Apple’s Instruments documentation.
Axow hne jiuhj, ic edinv HSYW gedna Zowoq vax ixfqosatic, Ejsni rud rjesanar vode erxevtopn YDDW hiluoj gipgdexofg Pisay kilb qluynical ofw ijzekevarieq getgzizioq. He yi bvsdc://qahakipef.otcmo.fih/limiil/dkeysowf-esw-yequn/gedep/ ufh yobzd ud zewb im hia leh, uw iwnoj ox cio mat.
Wuxkkugotagiakz ol xugdderakt nyu houj! Wpi quxlb iw Fasqerux Hwajyetl id rohd upt uk velykig iv luo megw jo beco ib. Xad rub vsaw mua rsin vre dajetv uq Vubor, ehew ggausp kupxomv igvujlug doziupdog uzi rep, zia zquidz la ahwi ne feobv gehjxemoip kokkbokoc dujm esput UWIc, soxp iy UhugXY, Texpen iyl JecitnM. Up xie’va xooy je diulr zame, wkoxw eep pro wfaag gueym ew xsi lugeagnih piklik dib gkoh twawdud.