Statistics: Posted by Veta — 15 Jun 2013, 11:52
Statistics: Posted by eXcalibur — 14 Jun 2013, 16:18
Statistics: Posted by pip — 14 Jun 2013, 13:12
Statistics: Posted by Swkoll — 14 Jun 2013, 01:41
If you can process it sequentially, then you don't need multithreading. The entire point of threading is to improve performance.
The problem is that your method is entirely reliant on a uniform distribution of units among the nodes. This will rarely occur in a real scenario, and thus your algorithm will unfortunately be efficient mostly when performance isn't needed and inefficient when it is.
As in my example, I'm not speaking of direct unit-to-unit calculations such as collision detection, but of querying unit data from other units, as is the case with things like drones. There is no way to implement your algorithm without severely crippling such communication.
The sim drop you mentioned earlier during critical engagements is hardly "fine". My point is to use a simpler and more efficient algorithm to increase performance in a new engine, not merely match the lackadaisical performance from a much older engine using an inefficient and awkward algorithm.
In the context of FA, what matters most is the sim speed drop, since rendering is independent of the simulation calculation. Take your own example: the sim dropping to -5 during an ASF engagement. Your proposed algorithm would not help in that scenario. A tasklet wrapper would.
Even if your algorithm were guaranteed to be more efficient during the remainder of the game, it would be irrelevant, as the sim would not be running slow either way. But it cannot be more efficient than a tasklet wrapper, and it is far more likely to be *less* efficient, not only for the majority of the game but assuredly during the performance-critical scenarios where efficiency is needed most.
Lastly, it has already been established that there are far more edge cases to handle than with a general tasklet solution. They are certainly surmountable, but the fact remains that they are there to begin with.
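To make the earlier point about unit distribution concrete, here is a minimal sketch (illustrative only, using plain std::thread; ComputeNextState is a placeholder, not engine API) of why a tasklet-style split over the flat unit list does not care where units are standing: each worker simply grabs the next unit, so a thousand ASFs stacked over one spot still spread evenly across all cores.

#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

struct Unit { /* position, orders, weapon state, ... */ };

// Dynamic dispatch over the flat unit list: each worker grabs the next unit
// regardless of where it is on the map, so a huge blob of units over one spot
// still gets spread across every core instead of pinning one zone's core.
void UpdateUnitsParallel(std::vector<Unit>& units, unsigned workerCount)
{
    std::atomic<std::size_t> nextIndex{0};
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < workerCount; ++w)
    {
        workers.emplace_back([&]
        {
            for (;;)
            {
                const std::size_t i = nextIndex.fetch_add(1);
                if (i >= units.size())
                    break;
                // ComputeNextState(units[i]);  // reads only last tick's cached state
            }
        });
    }
    for (auto& t : workers) t.join();  // barrier before the single-threaded finalize step
}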
Statistics: Posted by rootbeer23 — 13 Jun 2013, 01:19
uberge3k wrote: The concept of adjacency is why this is far more complex than it first appears. You must use both axes when separating the nodes, or else there *will* be race conditions at the extreme edges. That means that every node has 8 adjacent nodes:
000
010
000
So, assuming a 10x10 grid for simplicity, the first pass would look like:
0000000000
0101010101
0000000000
0101010101
0000000000
0101010101
0000000000
0101010101
0000000000
0101010101
rootbeer23 wrote: And so on until every node is updated. Assuming that you aren't doing this all in one tick, that is a *lot* of added latency in an engine where latency is already an issue. Furthermore, there would be an obvious "stepping", with units in one zone being updated sooner than those in the adjacent zone. This could of course be covered up by interpolating between a set of timestamped prior states in each unit in order to maintain visual temporal coherency for the player, but the displayed state would necessarily differ dramatically from the internal positions of the units, as it is no longer one tick behind but 1 to N ticks behind, where N is the number of passes needed to process all nodes without adjacency.
you can process every unit in one tick. for the example above you need not simulate 100km^2 sequentially, but only 2*10km^2 per core (first you simulate 5*10km^2 on 5 cores, then you simulate the other 5*10km^2 on the same 5 cores), so you need only a fifth of the time given an optimal unit distribution. all in one tick, and indistinguishable from a pure sequential calculation (because there has to be a processing order in any case, so it may as well be defined by association with a zone).
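a rough sketch of that schedule (illustrative names, not engine code; it assumes zones at least as wide as the longest interaction range and a 1-d even/odd split; with diagonal adjacency on a 2-d grid it becomes four phases instead of two):

#include <cstddef>
#include <thread>
#include <vector>

struct Zone { /* units, props, projectiles inside this cell */ };

// one sim tick: update all even zones in parallel, join, then all odd zones.
// a zone being updated may read (but never write) its idle neighbours, so no
// two threads ever touch the same data and the result stays deterministic.
void SimTick(std::vector<Zone>& zones)
{
    for (std::size_t phase = 0; phase < 2; ++phase)      // even zones, then odd zones
    {
        std::vector<std::thread> workers;
        for (std::size_t z = phase; z < zones.size(); z += 2)
            workers.emplace_back([&zones, z] { /* UpdateZone(zones[z]); */ });
        for (auto& t : workers) t.join();                // phase barrier
    }
}

in practice each phase would be fed to a fixed pool of worker threads rather than spawning one thread per zone, but the ordering argument is the same.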
rootbeer23 wrote: So now there are limitations on what you can query, based on an arbitrary spatial location? As a rule of thumb, if a feature only works some of the time, it may as well not be there. This would therefore break a *lot* of functionality, or worse, it would function only inconsistently, and for reasons that are clear only to the developer of the spatial division system.
why would a tank need information about a tank on the other end of the map? the available information is not arbitrary; it is the information within a unit's zone of influence, in principle akin to the speed-of-light limit in the physical world. the same goes for artillery projectiles that travel the whole map and get handed from zone to zone.
only the aiming of the artillery must be a sequential operation, so that it consistently aims at either the new or the old position.
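a sketch of the hand-off (again just an illustration, the names are made up): a projectile is owned by exactly one zone per tick, and anything that leaves the zone's bounds is queued and re-filed single-threaded at the phase barrier, so only one thread ever writes to it:

#include <cstddef>
#include <vector>

struct Projectile { float x, y, vx, vy; };

struct Zone
{
    float minX, maxX, minY, maxY;           // world-space bounds of this zone
    std::vector<Projectile> projectiles;    // written only by this zone's thread
};

// runs inside one zone's update. projectiles that leave the bounds are moved
// into 'leaving'; the barrier code then files them into their new zone while
// nothing else is running, so ownership is never shared between threads.
void StepProjectiles(Zone& zone, std::vector<Projectile>& leaving, float dt)
{
    for (std::size_t i = 0; i < zone.projectiles.size(); )
    {
        Projectile& p = zone.projectiles[i];
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        if (p.x < zone.minX || p.x >= zone.maxX || p.y < zone.minY || p.y >= zone.maxY)
        {
            leaving.push_back(p);                        // hand off at the barrier
            zone.projectiles[i] = zone.projectiles.back();
            zone.projectiles.pop_back();                 // swap-and-pop, don't advance i
        }
        else
        {
            ++i;
        }
    }
}

the aiming step mentioned above could then read the target's position from the previous tick's snapshot (or run inside the barrier), so every client makes the same old-position/new-position choice.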
rootbeer23 wrote: It's only easier if you are willing to accept the staggering limitations that this method imposes on inter-unit communication, as well as the fact that it will be extremely unoptimized in the very cases that *need* optimization, namely large-scale unit engagements that tend to occur in the same place. That is in addition to the myriad subtle edge cases that would need to be handled for it to function deterministically.
supcom simulates 8 player 10x10km games just fine on a single core. the grid can be quite large.
rootbeer23 wrote: I do not see any advantages whatsoever over a simple tasklet wrapper utilizing the update-cache-finalize pattern.
dunno. i am ambivalent about which method to use. both seem viable.
Statistics: Posted by uberge3k — 10 Jun 2013, 14:54
And so on until every node is updated. Assuming that you aren't doing this all in one tick, that is a *lot* of added latency in an engine where latency is already an issue. Furthermore, there would be an obvious "stepping", with units in one zone being updated sooner than those in the adjacent zone. This could of course be covered up by interpolating between a set of timestamped prior states in each unit in order to maintain visual temporal coherency for the player, but the displayed state would necessarily differ dramatically from the internal positions of the units, as it is no longer one tick behind but 1 to N ticks behind, where N is the number of passes needed to process all nodes without adjacency.
Also note that you still have to account for literal corner cases. Remember that planets are spherical. Assuming that they are generated by taking the 6 faces of a cube and projecting them to a sphere, each node field will have four adjacent node fields which will then need to communicate with each other.
So now there are limitations on what you can query, based on an arbitrary spatial location? As a rule of thumb, if a feature only works some of the time, it may as well not be there. This would therefore break a *lot* of functionality, or worse, it would function only inconsistently, and for reasons that are clear only to the developer of the spatial division system.
It's only easier if you are willing to accept the staggering limitations that this method imposes on inter-unit communication, as well as the fact that it will be extremely unoptimized in the very cases that *need* optimization, namely large-scale unit engagements that tend to occur in the same place. That is in addition to the myriad subtle edge cases that would need to be handled for it to function deterministically.
I do not see any advantages whatsoever over a simple tasklet wrapper utilizing the update-cache-finalize pattern.
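For reference, a bare-bones illustration of the update-cache-finalize idea (a sketch only, not the proposed implementation; the names are invented): every tasklet computes next-tick state while reading only last tick's cached state, and a short single-threaded finalize commits the results, so the order in which threads happen to run can never change the outcome.

#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

struct UnitState { float x = 0, y = 0, health = 0; /* ... */ };

struct Sim
{
    std::vector<UnitState> current;   // read-only while the tasklets run
    std::vector<UnitState> next;      // each tasklet writes only its own units' slots

    void Tick(unsigned workerCount)
    {
        next = current;                                   // cache: snapshot last tick's state

        // update: tasklets read 'current' and write disjoint slices of 'next'
        std::vector<std::thread> workers;
        const std::size_t chunk = (current.size() + workerCount - 1) / workerCount;
        for (unsigned w = 0; w < workerCount; ++w)
        {
            const std::size_t begin = w * chunk;
            const std::size_t end   = std::min(current.size(), begin + chunk);
            workers.emplace_back([this, begin, end]
            {
                for (std::size_t i = begin; i < end; ++i)
                    next[i].x = current[i].x + 0.1f;      // placeholder for real unit logic
            });
        }
        for (auto& t : workers) t.join();

        // finalize: single-threaded commit, so anything order-dependent happens here
        current.swap(next);
    }
};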
Statistics: Posted by rootbeer23 — 10 Jun 2013, 14:31
rootbeer23 wrote:
you can enforce access rules (for example, a tank in an even zone can only query the position of a tank in an odd zone or in its own zone) in the lua interpreter, or whatever your scripting language would be. you need this only in a debug build. this kind of disallowed access is trivial to detect, since each entity is associated with a zone.
that comes in addition to the fact that the target-selection function for a tank will naturally ignore targets that are out of range.
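a sketch of what such a debug-only check could look like on the engine side of the script binding (the names are invented; the actual lua binding is not shown):

#include <cassert>

struct Entity { int zone; /* ... */ };

// zones are coloured like a checkerboard. while colour 'activeColour' is being
// updated in parallel, a script acting for 'self' may only read entities in its
// own zone or in zones of the other colour (those are idle this phase).
inline bool AccessAllowed(const Entity& self, const Entity& other, int activeColour)
{
    return other.zone == self.zone || (other.zone % 2) != activeColour;
}

// wrap every script->engine query with this in a debug build; assert() compiles
// away when NDEBUG is defined, so release builds pay nothing.
inline void CheckAccess(const Entity& self, const Entity& other, int activeColour)
{
    assert(AccessAllowed(self, other, activeColour) &&
           "script read crosses into a zone that is being updated");
    (void)self; (void)other; (void)activeColour;
}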
rootbeer23 wrote:
and correctness is a priori easier to ensure here than the precondition itself: making a single-threaded sim deterministic.
that is because spatial independence is not something you have to impose on the code; it's a natural property of an RTS.
Statistics: Posted by uberge3k — 10 Jun 2013, 13:47
Okay, so that's a rare edge case that will almost never happen and we can easily maintain a list of units that should be processed outside of the node structure. Fair enough.
But multiply this one scenario by all of the different unit types and the myriad ways units can interact with one another. And couple it with the extremely high probability that whoever innocently writes unit code won't be familiar with the multithreaded architecture and will inadvertently write subtle desync-creating code.
Statistics: Posted by rootbeer23 — 10 Jun 2013, 05:25
Statistics: Posted by uberge3k — 10 Jun 2013, 04:18
rootbeer23 wrote: each thread has its own PRNG. no synchronization necessary.
Statistics: Posted by rootbeer23 — 10 Jun 2013, 00:45
uberge3k wrote: rootbeer23 wrote: i agree keeping double state is a possibility.
although memory latency is not a trivial matter.
50 nanosecond busy wait on a cache miss is probably a longer time than doing all the calculations for a projectile.
Taking steps to maintain proper cache coherency will ensure that this happens only a vanishingly small percentage of the time. Keep in mind that each unit's state is tiny, while L1/L2 caches are relatively huge nowadays.
Even in the absolute worst-case scenario, it will still be much, much faster than not threading it at all.
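To put rough numbers on the cache point (back-of-the-envelope only; the layout below is hypothetical, not the actual engine's): if the hot per-unit state is kept to a single 64-byte cache line, even a modest L1/L2 holds a lot of units, and the double-buffered copy mostly stays cache-resident.

#include <cstdint>

// Hypothetical "hot" per-unit state, padded to one 64-byte cache line.
struct alignas(64) HotUnitState
{
    float         x, y, z;         // position
    float         vx, vy, vz;      // velocity
    float         health;
    float         reload;
    std::uint32_t targetId;
    std::uint32_t flags;
    std::uint8_t  pad[24];         // pad out to exactly 64 bytes
};
static_assert(sizeof(HotUnitState) == 64, "keep the hot state to one cache line");

// 64 bytes/unit * 2 copies (current + next) = 128 bytes/unit:
//    32 KB of L1  ->  ~256 units resident
//   256 KB of L2  ->  ~2000 units resident
// so even a large engagement mostly avoids the 50 ns misses mentioned above.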
Statistics: Posted by uberge3k — 09 Jun 2013, 23:38
rootbeer23 wrote: uberge3k wrote: - Unit A and Unit B both need to call random() this tick in order to find out what their weapon's muzzle spread will be.
- They are on different threads.
- On Player 1's PC, Unit A calls random() first. On Player 2's PC, it's unit B.
- Desyncs ensue.
Could this potentially be solved? Yes. Is it more difficult in addition to being less efficient and scalable than the aforementioned threading architecture? Also yes.
each thread has its own PRNG. no synchronization necessary.
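one caveat worth making explicit (an assumption added here, with invented names): for the sequence to match across machines, the generator has to belong to something whose identity is deterministic, a zone or the unit itself, rather than to the raw OS thread, since which thread picks up which work can differ from machine to machine. a minimal sketch:

#include <cstdint>

// tiny deterministic xorshift64 generator owned by each zone (or each unit).
// its identity and seed come from game state, not from the OS thread running
// the update, so every client draws the same values in the same order no
// matter how the scheduler distributes the work.
struct SimRandom
{
    std::uint64_t state;

    explicit SimRandom(std::uint64_t seed) : state(seed ? seed : 1) {}

    std::uint64_t Next()
    {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        return state;
    }

    // uniform float in [0, 1), e.g. for muzzle spread
    float NextFloat() { return float(Next() >> 40) / 16777216.0f; }
};

struct Zone
{
    SimRandom rng;
    Zone(std::uint32_t zoneId, std::uint32_t gameSeed)
        : rng((std::uint64_t(gameSeed) << 32) | zoneId) {}   // deterministic seed
};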
Statistics: Posted by uberge3k — 09 Jun 2013, 23:38
rootbeer23 wrote: i agree keeping double state is a possibility.
although memory latency is not a trivial matter.
50 nanosecond busy wait on a cache miss is probably a longer time than doing all the calculations for a projectile.
Statistics: Posted by rootbeer23 — 09 Jun 2013, 22:49
uberge3k wrote:
- Unit A and Unit B both need to call random() this tick in order to find out what their weapon's muzzle spread will be.
- They are on different threads.
- On Player 1's PC, Unit A calls random() first. On Player 2's PC, it's unit B.
- Desyncs ensue.
Could this potentially be solved? Yes. Is it more difficult in addition to being less efficient and scalable than the aforementioned threading architecture? Also yes.
Statistics: Posted by rootbeer23 — 09 Jun 2013, 22:43