I agree keeping double state is a possibility, although memory latency is not a trivial matter.
A 50-nanosecond busy wait on a cache miss is probably longer than doing all the calculations for a projectile.
Forged Alliance Forever Forums
rootbeer23 wrote:
uberge3k wrote: Putting the simulation of each planet on a separate thread is indeed a suboptimal solution. It does not scale well, while the above solution will - e.g., what if there is a 1k unit battle on a single planet? Or what if there are 100 planets, each with 50 units? And so on.
1k units on a single planet can be handled by a single thread.
100 planets with 50 units each can be handled by 100 threads, or by 8 threads if you only have 8 cores, with each thread simulating multiple planets in sequence.
If you have 10k units on a single planet, you can divide the units among several threads; partitioning by planets is only the most straightforward special case of partitioning by spatial coordinates.
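A minimal sketch of that kind of spatial partitioning (Python used for illustration only; the grid cell size, unit fields, and worker count are invented for the example): units are grouped by grid cell, and a fixed pool of workers processes the cells, each worker handling several cells in sequence.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 8  # e.g. one worker per core

def region_of(unit, cell_size=64.0):
    # Partition by spatial coordinates: map each unit to a grid cell.
    return (int(unit["x"] // cell_size), int(unit["y"] // cell_size))

def simulate_region(units):
    # All units inside one region are stepped sequentially on one worker.
    for u in units:
        u["x"] += u["vx"]
        u["y"] += u["vy"]

def tick(units):
    # Group units by region, then let the worker pool process the
    # regions; one worker may simulate many regions per tick.
    regions = {}
    for u in units:
        regions.setdefault(region_of(u), []).append(u)
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        list(pool.map(simulate_region, regions.values()))
```

An empty region simply produces no work item, and a dense region still runs on a single worker - which is exactly the corner case debated below.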
Ze_PilOt wrote:If you want something to happen, do it yourself.
rootbeer23 wrote: I agree keeping double state is a possibility, although memory latency is not a trivial matter. A 50-nanosecond busy wait on a cache miss is probably longer than doing all the calculations for a projectile.
uberge3k wrote:This would be incredibly uneven due to unit distribution. You are likely to be wasting an enormous amount of processing time by dedicating threads to sparsely populated areas, or one thread being hammered by a densely populated area.
uberge3k wrote:- Unit A and Unit B both need to call random() this tick in order to find out what their weapon's muzzle spread will be.
- They are on different threads.
- On Player 1's PC, Unit A calls random() first. On Player 2's PC, it's unit B.
- Desyncs ensue.
Could this potentially be solved? Yes. Is it more difficult in addition to being less efficient and scalable than the aforementioned threading architecture? Also yes.
uberge3k wrote:
rootbeer23 wrote: I agree keeping double state is a possibility, although memory latency is not a trivial matter. A 50-nanosecond busy wait on a cache miss is probably longer than doing all the calculations for a projectile.

Taking steps to maintain proper cache coherency will ensure that this happens a vanishingly small percentage of the time. Keep in mind that each unit's state is tiny, and L1/L2 caches are relatively huge nowadays.
Even in the absolute worst-case scenario, it will still be much, much faster than not threading it at all.
rootbeer23 wrote: If there are no units, the threads dedicated to an area don't do any work.
If you have 32 threads on a 4-core machine, then you will have enough threads to populate idle cores.
If the area is densely populated, then that's a suboptimal corner case. Gosh, if SupCom only went to -1 or -2 during 500v500 ASF battles and remained at +0 otherwise, I would not be complaining.
rootbeer23 wrote:
uberge3k wrote: - Unit A and Unit B both need to call random() this tick in order to find out what their weapon's muzzle spread will be.
- They are on different threads.
- On Player 1's PC, Unit A calls random() first. On Player 2's PC, it's Unit B.
- Desyncs ensue.
Could this potentially be solved? Yes. Is it more difficult, in addition to being less efficient and scalable than the aforementioned threading architecture? Also yes.
Each thread has its own PRNG; no synchronization is necessary.
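A sketch of that per-thread PRNG idea (Python for illustration; the seed-mixing constant is arbitrary and invented for the example): each worker derives its generator deterministically from the shared match seed and its thread index, so two clients reproduce identical per-thread random streams without any locking.

```python
import random

def make_thread_rngs(match_seed, num_threads):
    # Derive one independent PRNG per worker thread from the shared
    # match seed.  The mixing constant just keeps the seeds distinct;
    # determinism comes from every client computing the same seeds.
    return [random.Random(match_seed * 1_000_003 + i)
            for i in range(num_threads)]
```

As long as each thread only ever draws from its own generator, the draw order within a thread is fixed and no cross-thread ordering matters.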
rootbeer23 wrote:
uberge3k wrote:
rootbeer23 wrote: I agree keeping double state is a possibility, although memory latency is not a trivial matter. A 50-nanosecond busy wait on a cache miss is probably longer than doing all the calculations for a projectile.

Taking steps to maintain proper cache coherency will ensure that this happens a vanishingly small percentage of the time. Keep in mind that each unit's state is tiny, and L1/L2 caches are relatively huge nowadays.
Even in the absolute worst-case scenario, it will still be much, much faster than not threading it at all.

DRAM access time isn't a good argument anyway, because you would not actually commit any state twice. What you have is two versions of each unit's state: in the first tick you read version 1 and write version 2, in the second tick you read version 2 and write version 1, and so on.
The only drawbacks are: double the state memory (OK, we can survive that), and you must copy the state of units that you didn't actually modify (we can survive that with a little scar).
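The read-one-version, write-the-other scheme can be sketched like this (Python for illustration; the `step` function and unit representation are invented for the example):

```python
def run_ticks(initial_units, num_ticks, step):
    # Keep two full copies of the simulation state.  Each tick reads
    # one buffer and writes the other, then the roles swap, so readers
    # never observe partially updated state.
    buffers = [list(initial_units), list(initial_units)]
    read, write = 0, 1
    for _ in range(num_ticks):
        src, dst = buffers[read], buffers[write]
        for i, unit in enumerate(src):
            # step() must write every slot, even for unchanged units --
            # that is the "must copy unmodified state" cost above.
            dst[i] = step(unit)
        read, write = write, read
    return buffers[read]
```

The swap is just an index exchange, so the per-tick overhead beyond the copies is negligible.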
uberge3k wrote:
rootbeer23 wrote: Each thread has its own PRNG; no synchronization is necessary.

But how does each client keep its threads' PRNGs in sync, and in what order do they initialize and activate? What happens when units cross bounding thresholds at different rates? And so on. My point is that these types of edge cases are what take up the majority of development time, so minimizing them, especially when the algorithm is superior, should be a priority.
rootbeer23 wrote: The threads on two nodes process the units in their area of influence in exactly the same sequence, so they will use the same random number for the same calculation, starting at the same time. After tick 1, the threads on both nodes will have used 547382 random numbers in exactly the same way. This is the same solution used to keep a single-threaded simulation in sync.
The same is true for all PRNGs across different nodes that process the same area and thus perform exactly the same calculations in the same sequence. Units will cross borders between regions (= move to another thread) after the simulation tick, again keeping the state the same across nodes.
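One way to keep that property (a hypothetical sketch in Python; `REGION_WIDTH`, the unit fields, and the ID ordering are invented for the example): step units in a canonical order inside each region, and apply border hand-offs only between ticks, never during them.

```python
import random

REGION_WIDTH = 10.0  # hypothetical extent of a region along the x axis

def tick_region(region_units, rng):
    # Step units in a canonical order (sorted by unit ID) so that every
    # node consumes this region's PRNG stream in exactly the same way,
    # regardless of how the units happen to be stored in memory.
    moved = []
    for unit in sorted(region_units, key=lambda u: u["id"]):
        unit["x"] += rng.random()       # identical draw on every node
        if unit["x"] >= REGION_WIDTH:   # unit crossed the border
            moved.append(unit)
    # Hand-offs to the neighbouring region/thread happen only after the
    # tick, so the processing sequence within a tick never differs.
    for unit in moved:
        region_units.remove(unit)
    return moved
```

Because the order is derived from stable unit IDs rather than memory layout or scheduling, two nodes holding the same region end each tick with identical state.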
uberge3k wrote: However, I was unclear about what I meant by units crossing bounds at different rates. Imagine a unit that needs to communicate with a unit in a different node. For example, a shoulder drone needs to query the ACU's state to determine what it should be doing. If they are in separate nodes, and they are processed in different orders on different clients, the results will differ and the game will desync.
Okay, so that's a rare edge case that will almost never happen, and we can easily maintain a list of units that should be processed outside of the node structure. Fair enough.
But multiply this one scenario by all of the different types of units and all the permutations of the myriad ways units can interact with each other. And couple it with the extremely high probability that the guy innocently writing unit code won't be familiar with the multithreaded architecture and will inadvertently write subtle desync-creating code.