I wouldn't say the diminishing returns argument is bullshit. The work required for marginally more fidelity, at an asset level, has been increasing exponentially with each generation. As scene density increases, the asset count also increases. Couple the two and either budgets balloon or significant compromises must be made. The compromises often result in more generic looking assets (owing to extensive outsourcing with minimal art direction, the use of photogrammetry at some stage of the pipeline or parametric tools drawing on a common data set). The pursuit of realism is a matter of diminishing returns, and it reduces creative freedom.
Same with the computational cost of rendering techniques. Compare early stencil shadows with PCSS. Now compare PCSS to RT shadows. The former is a significant leap, the latter not so much. The same can be said for AO, reflections and GI. The problem with a lot of approximations is that they usually proved to be unstable or heavily constrained. Cryengine's SVOGI, for example, can impose substantial constraints on how you build environments, especially interiors, and still suffers from ghosting and artifacts.
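For anyone wondering why the stencil-to-PCSS jump reads as the bigger leap: stencil shadows have a hard edge everywhere, while PCSS varies the blur with the distance between blocker and receiver. A rough sketch of the standard similar-triangles penumbra estimate is below. The function name and the sample distances are made up for illustration, and a real implementation does this per pixel in a shader after a blocker search, not in Python.

```python
def pcss_penumbra_width(d_receiver: float, d_blocker: float, light_size: float) -> float:
    """Estimate penumbra width from similar triangles.

    d_receiver: distance from the light to the shaded point
    d_blocker:  average distance from the light to the occluders
    light_size: width of the area light
    """
    if d_blocker <= 0:
        raise ValueError("blocker distance must be positive")
    # The further the receiver sits behind the blocker, the wider the blur.
    return (d_receiver - d_blocker) * light_size / d_blocker


# Illustrative values only: same receiver, blocker close to it vs. far from it.
near_contact = pcss_penumbra_width(d_receiver=10.0, d_blocker=8.0, light_size=1.0)
far_from_contact = pcss_penumbra_width(d_receiver=10.0, d_blocker=2.0, light_size=1.0)
print(near_contact, far_from_contact)
```

The shadow stays sharp near the contact point and softens with distance, which is the perceptual win. RT shadows mostly refine that same result.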
Rasterization-based approximations had decades to mature and still weren't great. Developer laziness is inexcusable, but the biggest sin was selling real-time raytracing, which is desirable for any number of reasons, as viable multiple hardware generations too early and doing so before the associated tech had matured. And it was done to sell GPUs, not games.
the use of photogrammetry at some stage of the pipeline or parametric tools drawing on a common data set
Unless your game is a stylised Nintendo game, it's always cheaper to use photogrammetry, which will give you 1:1 realistic topology, and with today's pipelines you can use your smartphone for asset capture. It's also how Capcom managed to cut millions out of the budget of the newer Resident Evil games -- they talked about it in an interview a while back, using LiDAR for a bunch of the baked in scene decoration to avoid having to model the assets manually.
The real cost in "realistic" art assets is in retopology and baking a usable mesh into your runtime, which, funnily enough, requires a lot of time downscaling the mesh to work well within the engine pipeline (ergo, creating various LODs). But it's still faster/cheaper than manually making the asset, if you're actually a good modeler.
But as Current Horror pointed out, a lot of studios are now bypassing the optimisation phase by using Nanite to do the work for them, even though it results in horrible performance, which they attempt to bypass with frame generation and motion blur, using horrible temporal anti-aliasing to hide the bad optimisation, which results in smeary ghosting, which is really apparent in a ton of games.
The thing is, a lot of indie studios make a ton of photorealistic walking sims using photogrammetry or LiDAR on next-to-nothing budgets. It's not the "realism" that is costly, it's actually making a functional, optimised game out of those realistic assets that is costly, but most studios do not put in the time to do so.
Yes. It's cheaper to build film sets and fly scouting teams to multiple exotic locations for months on end to capture approximations of your art direction than to hand author assets. But realism isn't costly... Look past the criticisms of Dunning-Kruger infused YouTube videos man, and you'll see that asset creation is incomparable to the basic textures on primitive geometry of a couple generations ago. Production has changed in a way that not only stifles creativity, but lends itself to poorly polished and poorly performing games. All in the pursuit of "realism", where many prefer the aesthetics of last generation. That's the definition of diminishing returns.
which will give you 1:1 realistic topology
Photogrammetry doesn't produce "realistic topology" - there isn't such a thing. Topology refers to the mathematical structure of geometry, primarily in regards to how it deforms and renders. Outside of the handful of deforming meshes, the only concerns are 1) density and 2) correctness (manifold geometry, excessive concavity, micro-triangles introducing overdraw, incompatibility with the rest of your pipeline, etc.). By necessity, retopology sees heavy automation. LOD generation, with very few exceptions, is done parametrically, with engines providing the functionality in engine. Seen that Simplygon logo at startup? That's one middleware that provides such functionality. The entire retopology/unwrap/baking process can be, and often is, entirely automated by the likes of Houdini TOPs for certain classes of photogrammetry assets. Again, by necessity given the sheer number of assets modern games require.
The time spent on photogrammetry is primarily clean up - whether it be removing objects from environments, filling areas not available for imaging or fixing the myriad artifacts that the process produces - it's far from perfect. That is IF you can find a 1:1 analogue for what you want in your game, which brings me to this notion:
Unless your game is a stylised Nintendo game
Because even half the assets released in a third of games in a year are viable candidates for photogrammetry? Sure. Overlooking that games are already criticized for looking generic, a by-product of excessive photogrammetry, art direction is still a thing, and is more important than fidelity in regards to appeal. One key concern is consistency - that the style and fidelity of assets are consistent throughout a scene. Introduce photogrammetry, and every hand authored asset now has to target that level of realism lest it stick out like a sore thumb. Even when utilising scanned materials as bases, maintaining that level of realism is time consuming and limiting.
Now consider how environments are actually constructed. The majority of game worlds rely on modularity - utilizing instanceable geometry along with trim textures/geometry, topped off with a small number of versatile tiling textures. This isn't just to speed up environment creation, it's to reduce required GPU bandwidth. Real world objects seldom conform to this approach, outside of surface scans used as tileables, and photogrammetry results in unique texture data per asset. Wonder why games have ballooned in size and constantly suffer from streaming hitches? Look no further.
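To put rough numbers on the unique-texture point, here is a back-of-envelope sketch. Everything in it is an assumption for illustration: 2048x2048 maps, three maps per material set (e.g. albedo, normal, packed roughness/metal/AO), BC7-style compression at one byte per texel, a full mip chain adding about a third, and made-up counts of 12 shared tiling/trim sets versus 200 uniquely textured scans. Real projects will differ.

```python
MIB = 1024 * 1024


def texture_set_mib(size: int = 2048, maps: int = 3, bytes_per_texel: float = 1.0) -> float:
    """Approximate size in MiB of one material's texture set, including mips."""
    base = size * size * bytes_per_texel * maps
    # A full mip chain costs roughly one third on top of the base level.
    return base * 4 / 3 / MIB


def scene_texture_mib(distinct_texture_sets: int, **kwargs) -> float:
    """Total texture data if every distinct set must be shipped and streamed."""
    return distinct_texture_sets * texture_set_mib(**kwargs)


# Hypothetical counts: a modular kit reusing a few sets vs. one set per scan.
modular = scene_texture_mib(12)
scanned = scene_texture_mib(200)
print(f"modular kit:  {modular:.0f} MiB")
print(f"unique scans: {scanned:.0f} MiB")
```

The absolute figures matter less than the scaling: with shared tiling and trim textures the cost grows with the size of the kit, while with per-asset scans it grows with the asset count.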
a lot of studios are now bypassing the optimisation phase by using Nanite to do the work for them
Nanite is an alternative to traditional triangle rasterization, intended to allow more complex geometry than traditional LOD systems can practically provide. Better handling of unoptimized scenes is a side-effect, not its purpose or a recommendation. Outside of Unreal, mesh shaders are being used for the same reason, with similar results - additional overdraw. See Alan Wake 2. It's a new approach, with the associated growing pains, but "realism" demanded more geometry, so here we are.
Long term, it'll be resolved. That's not to say it's a substitute for optimisation, or was billed as such. It's just a convenient scapegoat. Ironically, Unreal does have some major architectural issues. The entire streaming system is built with Fortnite in mind - with the idea of a persistent server side world. The actor system/tick handling is poor for complex non-linear worlds, resulting in game thread congestion, and the actual streaming is far too coarse for large, dense worlds. The collaboration with CDPR is at least seeing some progress there - here's to hoping more games benefit from it moving forward.
Yes. It's cheaper to build film sets and fly scouting teams to multiple exotic locations for months on end to capture approximations of your art direction, than hand author assets.
We just use an iPhone and Gaussian splatting. You don't even have to go outside, you can even use AR captured imagery as well:
https://youtu.be/UdCKeO4c_xM
EDIT: Just wanted to address this part because I forgot something...
Production has changed in a way that not only stifles creativity, but lends itself to poorly polished and poorly performing games. All in the pursuit of "realism", where many prefer the aesthetics of last generation. That's the definition of diminishing returns
This is true, and part of the point, but also we see that on the flip side we have games like Bodycam, made by two guys, one of whom was 17 at the time when they started, and it looks more realistic than any AAA shooter and most people are none the wiser to how it was made. Mostly UE5 Blueprints and asset packs made from laser-scanned entities.
https://www.youtube.com/shorts/_Gh0x9mtIuQ?feature=share
In this regard, they managed to make a top-selling game that fools a lot of people into thinking it looks "real" without having spent an arm and a leg to do so.
It's possible to get creative, push boundaries and use these tools and techniques to build out fascinating, unique, or groundbreaking games, but as you stated, most studios do not do this.
Fidelity and design are fundamentally different concepts. Films with memorable visual identity that are remembered decades after their release aren't a product of fidelity. The same is largely true for games or any other visual medium. Architectural styles, furniture, clothing, weapons, foliage and entire biomes were designed, with much thought by talented people, to maximize interest/appeal. Even in the case of realistic settings, set designers curate and build specialized props to maximize appeal. Scanning your immediate vicinity is the antithesis of this, which is why, at great expense, AAA studios combine externally sourced photogrammetry assets and scouting operations with a large number of custom assets.
Simply put, photogrammetry may provide a shortcut to fidelity. But fixation on fidelity over design is a shortcut to churning out a generic and visually uninteresting product. There are settings that get away with this. Most do not. And I'd rather play an interesting game than a realistic one.
The generative AI/photogrammetry approach is really interesting. I'd wager it's a poor substitute for coherent, deliberate design throughout a project though. Good design involves intent, understanding and consistency. AI, for the time being, lacks all of the above.