I agree keeping double state is a possibility, although memory latency is not a trivial matter.
A 50-nanosecond busy wait on a cache miss is probably longer than doing all the calculations for a projectile.
Forged Alliance Forever Forums
rootbeer23 wrote:
uberge3k wrote: Putting the simulation of each planet on a separate thread is indeed a suboptimal solution. It does not scale well, while the above solution will - e.g., what if there is a 1k unit battle on a single planet? Or what if there are 100 planets, each with 50 units? And so on.
1k units on a single planet can be handled by a single thread.
100 planets with 50 units each can be handled by 100 threads, or by 8 threads if you only have 8 cores, with each thread simulating multiple planets in sequence.
If you have 10k units on a single planet, you can divide the units among several threads; partitioning by planets is only the most straightforward special case of partitioning by spatial coordinates.
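A minimal sketch of that kind of spatial partitioning (Python used for illustration only; the grid cell size, unit fields, and worker count are invented for the example): units are grouped by grid cell, and a fixed pool of workers processes the cells, each worker handling several cells in sequence.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 8  # e.g. one worker per core

def region_of(unit, cell_size=64.0):
    # Partition by spatial coordinates: map each unit to a grid cell.
    return (int(unit["x"] // cell_size), int(unit["y"] // cell_size))

def simulate_region(units):
    # All units inside one region are stepped sequentially on one worker.
    for u in units:
        u["x"] += u["vx"]
        u["y"] += u["vy"]

def tick(units):
    # Group units by region, then let the worker pool process the
    # regions; one worker may simulate many regions per tick.
    regions = {}
    for u in units:
        regions.setdefault(region_of(u), []).append(u)
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        list(pool.map(simulate_region, regions.values()))
```

An empty region simply produces no work item, and a dense region still runs on a single worker - which is exactly the corner case debated below.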
Ze_PilOt wrote:If you want something to happen, do it yourself.
rootbeer23 wrote: I agree keeping double state is a possibility, although memory latency is not a trivial matter. A 50-nanosecond busy wait on a cache miss is probably longer than doing all the calculations for a projectile.
uberge3k wrote:This would be incredibly uneven due to unit distribution. You are likely to be wasting an enormous amount of processing time by dedicating threads to sparsely populated areas, or one thread being hammered by a densely populated area.
uberge3k wrote:- Unit A and Unit B both need to call random() this tick in order to find out what their weapon's muzzle spread will be.
- They are on different threads.
- On Player 1's PC, Unit A calls random() first. On Player 2's PC, it's unit B.
- Desyncs ensue.
Could this potentially be solved? Yes. Is it more difficult in addition to being less efficient and scalable than the aforementioned threading architecture? Also yes.
uberge3k wrote:
rootbeer23 wrote: I agree keeping double state is a possibility, although memory latency is not a trivial matter. A 50-nanosecond busy wait on a cache miss is probably longer than doing all the calculations for a projectile.

Taking steps to maintain proper cache coherency will ensure that this happens a vanishingly small percentage of the time. Keep in mind that each unit's state is tiny, and L1/L2 caches are relatively huge nowadays.
Even in the absolute worst-case scenario, it will still be much, much faster than not threading it at all.
rootbeer23 wrote: If there are no units, the threads dedicated to an area don't do any work.
If you have 32 threads on a 4-core machine, then you will have enough threads to populate idle cores.
If the area is densely populated, then that's a suboptimal corner case. Gosh, if SupCom only went to -1 or -2 during 500v500 ASF battles and remained at +0 otherwise, I would not be complaining.
rootbeer23 wrote:
uberge3k wrote: - Unit A and Unit B both need to call random() this tick in order to find out what their weapon's muzzle spread will be.
- They are on different threads.
- On Player 1's PC, Unit A calls random() first. On Player 2's PC, it's Unit B.
- Desyncs ensue.
Could this potentially be solved? Yes. Is it more difficult, in addition to being less efficient and scalable than the aforementioned threading architecture? Also yes.
Each thread has its own PRNG; no synchronization is necessary.
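A sketch of that per-thread PRNG idea (Python for illustration; the seed-mixing constant is arbitrary and invented for the example): each worker derives its generator deterministically from the shared match seed and its thread index, so two clients reproduce identical per-thread random streams without any locking.

```python
import random

def make_thread_rngs(match_seed, num_threads):
    # Derive one independent PRNG per worker thread from the shared
    # match seed.  The mixing constant just keeps the seeds distinct;
    # determinism comes from every client computing the same seeds.
    return [random.Random(match_seed * 1_000_003 + i)
            for i in range(num_threads)]
```

As long as each thread only ever draws from its own generator, the draw order within a thread is fixed and no cross-thread ordering matters.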
rootbeer23 wrote:
uberge3k wrote:
rootbeer23 wrote: I agree keeping double state is a possibility, although memory latency is not a trivial matter. A 50-nanosecond busy wait on a cache miss is probably longer than doing all the calculations for a projectile.

Taking steps to maintain proper cache coherency will ensure that this happens a vanishingly small percentage of the time. Keep in mind that each unit's state is tiny, and L1/L2 caches are relatively huge nowadays.
Even in the absolute worst-case scenario, it will still be much, much faster than not threading it at all.

DRAM access time isn't a good argument anyway, because you would not actually commit any state twice. What you have is two versions of each unit's state: in the first tick you read version 1 and write version 2, in the second tick you read version 2 and write version 1, and so on.
The only drawbacks are: double the state memory (OK, we can survive that), and you must copy the state of units that you didn't actually modify (we can survive that with a little scar).
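The read-one-version, write-the-other scheme can be sketched like this (Python for illustration; the `step` function and unit representation are invented for the example):

```python
def run_ticks(initial_units, num_ticks, step):
    # Keep two full copies of the simulation state.  Each tick reads
    # one buffer and writes the other, then the roles swap, so readers
    # never observe partially updated state.
    buffers = [list(initial_units), list(initial_units)]
    read, write = 0, 1
    for _ in range(num_ticks):
        src, dst = buffers[read], buffers[write]
        for i, unit in enumerate(src):
            # step() must write every slot, even for unchanged units --
            # that is the "must copy unmodified state" cost above.
            dst[i] = step(unit)
        read, write = write, read
    return buffers[read]
```

The swap is just an index exchange, so the per-tick overhead beyond the copies is negligible.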
uberge3k wrote:
rootbeer23 wrote: Each thread has its own PRNG; no synchronization is necessary.

But how does each client keep its threads' PRNGs in sync, and in what order do they initialize and activate? What happens when units cross bounding thresholds at different rates? And so on. My point is that these types of edge cases are what take up the majority of development time, so minimizing them, especially when the algorithm is superior, should be a priority.
rootbeer23 wrote: The threads on two nodes process the units in their area of influence in exactly the same sequence, so they will use the same random number for the same calculation, starting at the same time. After tick 1, the threads on both nodes will have used 547382 random numbers in exactly the same way. This is the same solution used to keep a single-threaded simulation in sync.
The same is true for all PRNGs across different nodes that process the same area and thus perform exactly the same calculations in the same sequence. Units will cross borders between regions (= move to another thread) after the simulation tick, again keeping the state the same across nodes.
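One way to keep that property (a hypothetical sketch in Python; `REGION_WIDTH`, the unit fields, and the ID ordering are invented for the example): step units in a canonical order inside each region, and apply border hand-offs only between ticks, never during them.

```python
import random

REGION_WIDTH = 10.0  # hypothetical extent of a region along the x axis

def tick_region(region_units, rng):
    # Step units in a canonical order (sorted by unit ID) so that every
    # node consumes this region's PRNG stream in exactly the same way,
    # regardless of how the units happen to be stored in memory.
    moved = []
    for unit in sorted(region_units, key=lambda u: u["id"]):
        unit["x"] += rng.random()       # identical draw on every node
        if unit["x"] >= REGION_WIDTH:   # unit crossed the border
            moved.append(unit)
    # Hand-offs to the neighbouring region/thread happen only after the
    # tick, so the processing sequence within a tick never differs.
    for unit in moved:
        region_units.remove(unit)
    return moved
```

Because the order is derived from stable unit IDs rather than memory layout or scheduling, two nodes holding the same region end each tick with identical state.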
uberge3k wrote: However, I was unclear about what I meant by units crossing bounds at different rates. Imagine a unit that needs to communicate with a unit in a different node. For example, a shoulder drone needs to query the ACU's state to determine what it should be doing. If they are in separate nodes, and they are processed in different orders on different clients, the results will differ and the game will desync.
Okay, so that's a rare edge case that will almost never happen, and we can easily maintain a list of units that should be processed outside of the node structure. Fair enough.
But multiply this one scenario by all of the different types of units and all the permutations of the myriad ways units can interact with each other. And couple it with the extremely high probability that the guy innocently writing unit code won't be familiar with the multithreaded architecture and will inadvertently write subtle desync-creating code.