Forged Alliance Forever Forged Alliance Forever Forums 2013-06-15T11:52:48+02:00 /feed.php?f=39&t=1709 2013-06-15T11:52:48+02:00 2013-06-15T11:52:48+02:00 /viewtopic.php?t=1709&p=46231#p46231 <![CDATA[Re: Planetary Annihilation "spiritual successor to T.A"]]>
i would've preferred early access at $40 though, even as a backer, because steam users are generally more forgiving of unfinished games than of high price points. it also didn't help that valve wouldn't let PA list its $40 preorder price point; apparently you cannot list preorders and early access on the same store page. a lot of people were confused and thought the game was retailing for this price.

moral of the story, KS backers may be investors but investors don't know marketing

Statistics: Posted by Veta — 15 Jun 2013, 11:52


]]>
2013-06-14T16:18:44+02:00 2013-06-14T16:18:44+02:00 /viewtopic.php?t=1709&p=46206#p46206 <![CDATA[Re: Planetary Annihilation "spiritual successor to T.A"]]>
although, there is a legitimate way of explaining the high price of the alpha: the more expensive you make it, the more you attract people who really want to contribute to the development process. later on, the beta and release version become cheaper and therefore more accessible to a wider public.

i have seen for quite some time now that they use every chance they can to raise more money (even at the end of the kickstarter they sold "metal planets" way overpriced), despite promising in the kickstarter that 900 k$ was enough to make the game...

every time they implicitly beg for more money, i feel they broke their last promise that they have enough money to make the best game in everyone's mind.

the problem is that the further they come in development, the more decisions they have to make about how the game is supposed to be, and therefore they disappoint a lot of people (e.g. me). this method (first promising everything, then disappointing most of the people) is PR suicide, because the game can never turn out as good as initially promised. it is just impossible!

e.g. everybody, who loves the complexity and/or depth and/or many gameplay possibilities in supcom, just has to be disappointed/underwhelmed by PA.

the only chance i see is that UberEntertainment releases addons to the game (look at sins of a solar empire for comparison) and somebody writes a "complex mod" for this game (maybe it will not be possible to mod something into the game which is not there from the beginning). but this needs at least a couple of years to become reality...

Statistics: Posted by eXcalibur — 14 Jun 2013, 16:18


]]>
2013-06-14T13:12:37+02:00 2013-06-14T13:12:37+02:00 /viewtopic.php?t=1709&p=46188#p46188 <![CDATA[Re: Planetary Annihilation "spiritual successor to T.A"]]> They can justify it all they want with the kickstarter pledge level: selling an unfinished product outside the kickstarter environment for twice the price of the retail makes no sense except if they need money. It's a way to (not) say : "the kickstarter money we got is not enough to finish the product at the level we promised" and pretend to say : 'we love the community so we offer them to pay double price to have a chance to play our awesome alpha' (even though it's unplayable and buggy).
This is really a bad move, though they did achieve something epic : failure.

Now, if there are hiccups in the game development (delays, a poor release with bugs, etc.), players won't forgive them. They can't afford any more missteps and are under huge pressure to deliver a top-notch product; otherwise, the image of the company itself will be irremediably marred.

Statistics: Posted by pip — 14 Jun 2013, 13:12


]]>
2013-06-14T01:41:55+02:00 2013-06-14T01:41:55+02:00 /viewtopic.php?t=1709&p=46165#p46165 <![CDATA[Re: Planetary Annihilation "spiritual successor to T.A"]]>
http://www.metacritic.com/game/pc/planetary-annihilation

Statistics: Posted by Swkoll — 14 Jun 2013, 01:41


]]>
2013-06-13T01:19:10+02:00 2013-06-13T01:19:10+02:00 /viewtopic.php?t=1709&p=46071#p46071 <![CDATA[Re: Planetary Annihilation "spiritual successor to T.A"]]>
uberge3k wrote:
If they were on the same thread, the order wouldn't matter. If they were not, then the order definitely does matter and there will be race conditions. What if a node needs access to a node directly above or below it?


the processing order of adjacent zones is guaranteed to be sequential.
first, all zones marked '0' are processed by N threads, then all threads synchronize, then all zones marked "1" are
processed by N threads. if a unit from zone "0" interacts with a unit in zone "1" then it is guaranteed that no thread
is processing a unit in zones marked "1", because threads processing zone "1" have to wait for threads processing zone "0" to finish. same goes for the inverse interaction.
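
A minimal sketch of the two-phase schedule described above, assuming each zone is simply an element of a list and `process_zone` is a hypothetical per-zone update function (neither name comes from the thread). Phase "0" runs every even-indexed zone on a thread pool, all workers are waited for, and only then phase "1" runs the odd-indexed zones.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate_tick(zones, process_zone, workers=4):
    """Run one sim tick: all zones marked 0 in parallel, then all zones marked 1."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for phase in (0, 1):
            batch = [zone for index, zone in enumerate(zones) if index % 2 == phase]
            # list() drains the result iterator, so every zone of this phase
            # has finished before the next phase starts (the sync point).
            list(pool.map(process_zone, batch))


# Hypothetical demo: each zone holds one unit that advances by its speed.
def move_units(zone):
    for unit in zone:
        unit["x"] += unit["speed"]


zones = [[{"x": 0, "speed": index + 1}] for index in range(6)]
simulate_tick(zones, move_units)
```

Because the pool is drained between phases, a unit in a zone marked "0" that touches a neighbouring zone marked "1" never races with a worker of that neighbour.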


If you can process it sequentially, then you don't need multithreading. The entire point of threading is to improve performance.


this kind of BS is what made me hesitate to write anything at all at first.


The problem is that your method is entirely reliant on a uniform distribution of units among the nodes. This will rarely occur in a real scenario, and thus your algorithm will unfortunately be efficient mostly when performance isn't needed and inefficient when it is.


that depends on the size of the zones. if they are small enough, then a single thread can process them in time.
and i said it before, in supcom you can have 10x10 with 1000 units and it runs at +0.
then compare the range of tanks to a 10x10km map. you can easily subdivide it into multiple zones.


As in my example, I'm not speaking of direct unit to unit calculations such as collision detection, but querying unit data from other units as is the case with things like drones. There is no way to implement your algorithm without severely crippling such communication.


drones cannot assist anything beyond a very small radius, so i dont see how they are special.
except if the kennel gets destroyed and the drone has to self-destruct. that requires special handling.
you can easily send a signal from the kennel to the drone that is handed from zone to zone obeying order of execution.
kind of like a very fast (indeed maximum speed) invisible projectile, but way easier to implement.


The sim drop you mentioned earlier during critical engagements is hardly "fine". My point is to use a simpler and more efficient algorithm to increase performance in a new engine, not merely match the lackadaisical performance from a much older engine using an inefficient and awkward algorithm.


i dont agree. it just doesnt bother me if a game slows down for a short time.


In the context of FA, what is of most concern is the sim speed drops as rendering is independent from the simulation calculation. As in your example, the sim dropping to -5 during an ASF engagement. Your proposed algorithm would not help this scenario. A tasklet wrapper would.


you can simulate a lot on the map in parallel, even if none of the units close together could be.
what you can achieve is close to the speed you would have with 500vs500 asf with no other units around.
certainly an improvement.


Even if your algorithm was guaranteed to be more efficient during the remainder of the game, it would be irrelevant as the sim would not be running slow either way. But the fact remains that it cannot be more efficient than a tasklet wrapper, and is indeed far more likely to be *less* efficient not only a majority of the time, but assuredly during the performance critical scenarios where efficiency would be needed most.


keeping a portion of the game state twice is not overhead?
you may be convinced that it performs better, but that is not obvious to me.


Lastly, it has already been established that there are far more edge cases to handle than with a general tasklet solution. They are certainly surmountable, but the fact remains that they are there to begin with.


i wasnt around when that was established.

i have a question of my own:

1) if you simulate a tank being hit by 2 projectiles at the same time and you cannot decide which projectile hits first until
after you commit the result of the simulation in sequence with all other results, do you generate 2 possible future states for that tank? then would you not need a mutex each time you modify a future state of a unit? even if it is unlikely to be locked, that still adds overhead.
2) each time there is a conflict - for example 2 tanks want to drive onto the same location - only one tank can win.
in the single thread model, the loser tank examines the new situation and decides to move somewhere else.
what happens to the loser tank when its simulation result was aborted because of a conflicting commit by another tank?
does it do nothing at all then?
3) if a tank shoots a projectile but it is later decided that the projectile was generated after the tank was destroyed, you have a lot of cleaning up to do during the commit phase, which has to run in sequence.
4) given that you have less time for the commit phase, which has to execute sequentially and only after the simulation phase, are you sure you can process each unit then? you do not have to do the actual simulation there, which is a few floating point calculations, but the simulation does not involve deallocating a destroyed unit's resources, for example, and the commit does.
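
To make question 2 concrete, here is one possible commit policy for an update-cache-finalize scheme. It is only a sketch under stated assumptions, not a description of what uberge3k proposes: the function name `commit_moves`, the grid-cell positions and the "loser keeps its old cell this tick" rule are all hypothetical. The wanted moves are assumed to have been computed in parallel from the old state; the commit itself runs sequentially in unit id order so every client resolves conflicts the same way.

```python
def commit_moves(positions, wanted):
    """positions: {unit_id: cell}, wanted: {unit_id: cell} cached during the
    parallel update phase. Returns (new_positions, losers)."""
    new_positions = dict(positions)
    occupied = set(positions.values())
    losers = []
    for unit_id in sorted(wanted):  # fixed commit order keeps it deterministic
        target = wanted[unit_id]
        if target in occupied:
            losers.append(unit_id)  # conflicting commit: unit stays where it is
            continue
        occupied.discard(new_positions[unit_id])
        occupied.add(target)
        new_positions[unit_id] = target
    return new_positions, losers


# Two tanks want the same cell; the lower id wins, the other waits a tick.
new_positions, losers = commit_moves(
    {1: (0, 0), 2: (2, 0)},
    {1: (1, 0), 2: (1, 0)},
)
```

Under this policy the loser does nothing for one tick and re-plans on the next one, which is exactly the lost reaction the question is worried about; the commit loop is also the sequential part that question 4 asks about.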

i am not arguing that it doesnt work, but i have my doubts.
and in the end, you need more resources and cpu time to achieve the same result, i dont have to keep state twice.

Statistics: Posted by rootbeer23 — 13 Jun 2013, 01:19


]]>
2013-06-10T14:54:50+02:00 2013-06-10T14:54:50+02:00 /viewtopic.php?t=1709&p=45786#p45786 <![CDATA[Re: Planetary Annihilation "spiritual successor to T.A"]]>
rootbeer23 wrote:
uberge3k wrote:The concept of adjacency is why this is far more complex than it first appears. You must use both axes when separating the nodes, or else there *will* be edge race conditions at the extreme edges. That means that there are 8 nodes adjacent to every other node:

000
010
000

So, assuming a 10x10 grid for simplicity, the first pass would look like:

0000000000
0101010101
0000000000
0101010101
0000000000
0101010101
0000000000
0101010101
0000000000
0101010101


it looks like this (first half of 1 simtick):

0000000000
1111111111
0000000000
1111111111
0000000000
....

second half:

1111111111
0000000000
1111111111
0000000000
1111111111
....


If they were on the same thread, the order wouldn't matter. If they were not, then the order definitely does matter and there will be race conditions. What if a node needs access to a node directly above or below it?

rootbeer23 wrote:
And so on until every node is updated. Assuming that you aren't doing this all in one tick, that is a *lot* of added latency in an engine where latency is already an issue. Furthermore, there would be an obvious "stepping" with units in one zone being updated sooner than those in the adjacent zone. This could of course be covered up by interpolating between a set of timestamped prior states in each unit in order to maintain visual temporal coherency for the player, but the displayed state would necessarily dramatically differ from the internal positions of the units as it is no longer one tick behind, but 1-N ticks behind where N is the number of passes needed to process all nodes without adjacency.


you can process every unit in one tick. for the example above you need not simulate 100km^2 sequentially, but only
2*10km^2 sequentially (first you simulate 5*10km^2 on 5 cores, then you simulate the other 5*10km^2 on the same 5 cores), so you only need 1/5th the time given optimal unit distribution. all in one tick, and indistinguishable from pure sequential calculation. (because there has to be a processing order in any case, so it may as well be defined by association with a zone).

If you can process it sequentially, then you don't need multithreading. The entire point of threading is to improve performance.

The problem is that your method is entirely reliant on a uniform distribution of units among the nodes. This will rarely occur in a real scenario, and thus your algorithm will unfortunately be efficient mostly when performance isn't needed and inefficient when it is.

rootbeer23 wrote:
So now there are limitations on what you can only query based on an arbitrary spatial location? As a rule of thumb, if a feature only works some of the time, it may as well not be there. Therefore, this would break a *lot* of functionality, or worse, it would only function inconsistently and for reasons that are clear only to the developer of the spatial division system.


why would a tank need information about a tank on the other end of the map? the available information is not arbitrary, but is that information which is in the zone of influence of a unit. in principle akin to the speed of light limit in the physical world. same goes for artillery projectiles that travel the whole map and get handed from zone to zone.
only the aiming of the artillery must be a sequential operation so it does aim either at the new or the old position consistently.

As in my example, I'm not speaking of direct unit to unit calculations such as collision detection, but querying unit data from other units as is the case with things like drones. There is no way to implement your algorithm without severely crippling such communication.

rootbeer23 wrote:
It's only easier if you are willing to accept the staggering limitations that this method imposes on inter-unit communication, as well as the fact that it will be extremely unoptimized in the very cases that *need* optimization, which is during large scale unit engagements that will tend to occur near the same place. As well as the myriad of subtle edge cases that would need to be handled in order for it to function deterministically.


supcom simulates 8 player 10x10km games just fine on a single core. the grid can be quite large.

The sim drop you mentioned earlier during critical engagements is hardly "fine". My point is to use a simpler and more efficient algorithm to increase performance in a new engine, not merely match the lackadaisical performance from a much older engine using an inefficient and awkward algorithm.

rootbeer23 wrote:
I do not see any advantages whatsoever over a simple tasklet wrapper utilizing the update-cache-finalize pattern.


dunno. i am ambivalent about which method to use. both seem viable.

Performance optimization should always be done with the worst case scenario in mind.

Take the example of occlusion detection with terrain rendering algorithms. These algorithms fell out of favor for a simple, logical reason: they only work during non-worst case scenarios, where there is performance to spare, and during the worst case scenario of the largest possible section of terrain being in view, they actually add overhead and worsen the problem. [yes, certain games would indeed benefit from it more than other competing techniques, but I'm speaking of the more general cases where the camera is more or less free the majority of the time, as is the case in an RTS]

Performance optimization is all about removing or minimizing those spikes of low frame rates or slow processing, as that is what players notice. In other words, in the context of a game, the player won't care if it's running at 60 FPS or 5000FPS. He will care if it suddenly drops to 10 FPS.

In the context of FA, what is of most concern is the sim speed drops as rendering is independent from the simulation calculation. As in your example, the sim dropping to -5 during an ASF engagement. Your proposed algorithm would not help this scenario. A tasklet wrapper would.

Even if your algorithm was guaranteed to be more efficient during the remainder of the game, it would be irrelevant as the sim would not be running slow either way. But the fact remains that it cannot be more efficient than a tasklet wrapper, and is indeed far more likely to be *less* efficient not only a majority of the time, but assuredly during the performance critical scenarios where efficiency would be needed most.

Lastly, it has already been established that there are far more edge cases to handle than with a general tasklet solution. They are certainly surmountable, but the fact remains that they are there to begin with.

Statistics: Posted by uberge3k — 10 Jun 2013, 14:54


]]>
2013-06-10T14:31:58+02:00 2013-06-10T14:31:58+02:00 /viewtopic.php?t=1709&p=45784#p45784 <![CDATA[Re: Planetary Annihilation "spiritual successor to T.A"]]>
uberge3k wrote:
The concept of adjacency is why this is far more complex than it first appears. You must use both axes when separating the nodes, or else there *will* be edge race conditions at the extreme edges. That means that there are 8 nodes adjacent to every other node:

000
010
000

So, assuming a 10x10 grid for simplicity, the first pass would look like:

0000000000
0101010101
0000000000
0101010101
0000000000
0101010101
0000000000
0101010101
0000000000
0101010101


it looks like this (first half of 1 simtick):

0000000000
1111111111
0000000000
1111111111
0000000000
....

second half:

1111111111
0000000000
1111111111
0000000000
1111111111
....


And so on until every node is updated. Assuming that you aren't doing this all in one tick, that is a *lot* of added latency in an engine where latency is already an issue. Furthermore, there would be an obvious "stepping" with units in one zone being updated sooner than those in the adjacent zone. This could of course be covered up by interpolating between a set of timestamped prior states in each unit in order to maintain visual temporal coherency for the player, but the displayed state would necessarily dramatically differ from the internal positions of the units as it is no longer one tick behind, but 1-N ticks behind where N is the number of passes needed to process all nodes without adjacency.


you can process every unit in one tick. for the example above you need not simulate 100km^2 sequentially, but only
2*10km^2 sequentially (first you simulate 5*10km^2 on 5 cores, then you simulate the other 5*10km^2 on the same 5 cores), so you only need 1/5th the time given optimal unit distribution. all in one tick, and indistinguishable from pure sequential calculation. (because there has to be a processing order in any case, so it may as well be defined by association with a zone).
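
The arithmetic can be checked with a small cost model. This is a sketch under simplifying assumptions: the work of a zone is a single number, there is at least one core per zone in each pass, and thread overhead is ignored. The function name `two_pass_fraction` is made up for the illustration. Each pass costs as much as its most loaded zone, and the result is expressed as a fraction of the purely sequential cost.

```python
def two_pass_fraction(zone_loads):
    """Wall time of the even/odd schedule relative to simulating every zone
    on a single core. Each pass lasts as long as its slowest zone."""
    even_pass = max(zone_loads[0::2])
    odd_pass = max(zone_loads[1::2])
    return (even_pass + odd_pass) / sum(zone_loads)


# Ten equally loaded zones of 10km^2 each: two zone-times out of ten.
print(two_pass_fraction([10] * 10))  # 0.2

# Same total work, but almost everything happens in one zone.
print(two_pass_fraction([1, 1, 1, 1, 91, 1, 1, 1, 1, 1]))  # 0.92
```

The second call shows the objection raised elsewhere in the thread: when units clump in one zone, the schedule degrades towards single-core speed.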


Also note that you still have to account for literal corner cases. Remember that planets are spherical. Assuming that they are generated by taking the 6 faces of a cube and projecting them to a sphere, each node field will have four adjacent node fields which will then need to communicate with each other.


like the above example, make zones along the lines of latitude


So now there are limitations on what you can only query based on an arbitrary spatial location? As a rule of thumb, if a feature only works some of the time, it may as well not be there. Therefore, this would break a *lot* of functionality, or worse, it would only function inconsistently and for reasons that are clear only to the developer of the spatial division system.


why would a tank need information about a tank on the other end of the map? the available information is not arbitrary, but is that information which is in the zone of influence of a unit. in principle akin to the speed of light limit in the physical world. same goes for artillery projectiles that travel the whole map and get handed from zone to zone.
only the aiming of the artillery must be a sequential operation so it does aim either at the new or the old position consistently.


It's only easier if you are willing to accept the staggering limitations that this method imposes on inter-unit communication, as well as the fact that it will be extremely unoptimized in the very cases that *need* optimization, which is during large scale unit engagements that will tend to occur near the same place. As well as the myriad of subtle edge cases that would need to be handled in order for it to function deterministically.


supcom simulates 8 player 10x10km games just fine on a single core. the grid can be quite large.



I do not see any advantages whatsoever over a simple tasklet wrapper utilizing the update-cache-finalize pattern.


dunno. i am ambivalent about which method to use. both seem viable.

Statistics: Posted by rootbeer23 — 10 Jun 2013, 14:31


]]>
2013-06-10T13:47:15+02:00 2013-06-10T13:47:15+02:00 /viewtopic.php?t=1709&p=45775#p45775 <![CDATA[Re: Planetary Annihilation "spiritual successor to T.A"]]>
rootbeer23 wrote:
no adjacent areas are processed in parallel. you divide the map into horizontal zones and in the first pass you simulate
all even zones, in the second pass you simulate all odd zones.
processing order is strict within a zone (as a precondition), interaction between units of adjacent zones is in sequence, because adjacent zones are not simulated in parallel.

The concept of adjacency is why this is far more complex than it first appears. You must use both axes when separating the nodes, or else there *will* be edge race conditions at the extreme edges. That means that there are 8 nodes adjacent to every other node:

000
010
000

So, assuming a 10x10 grid for simplicity, the first pass would look like:

0000000000
0101010101
0000000000
0101010101
0000000000
0101010101
0000000000
0101010101
0000000000
0101010101

And the second pass:

0101010101
0000000000
0101010101
0000000000
0101010101
0000000000
0101010101
0000000000
0101010101
0000000000

...And then the 3rd pass:

1010101010
0000000000
1010101010
0000000000
1010101010
0000000000
1010101010
0000000000
1010101010
0000000000

And so on until every node is updated. Assuming that you aren't doing this all in one tick, that is a *lot* of added latency in an engine where latency is already an issue. Furthermore, there would be an obvious "stepping" with units in one zone being updated sooner than those in the adjacent zone. This could of course be covered up by interpolating between a set of timestamped prior states in each unit in order to maintain visual temporal coherency for the player, but the displayed state would necessarily dramatically differ from the internal positions of the units as it is no longer one tick behind, but 1-N ticks behind where N is the number of passes needed to process all nodes without adjacency.
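
If all 8 neighbours of a node must be kept out of the same pass, a flat grid needs four passes. The sketch below assigns each node a pass number from the parity of its row and column and checks that no two nodes of one pass touch, not even diagonally. The helper names are hypothetical, the pass numbering need not match the order of the diagrams above, and the check covers a flat grid only (no wrap-around and no cube-face seams).

```python
def pass_of(row, col):
    """Pass index 0..3 derived from row and column parity."""
    return (row % 2) * 2 + (col % 2)


def passes(rows, cols):
    """Group all nodes of a rows x cols grid by the pass that updates them."""
    groups = {index: [] for index in range(4)}
    for row in range(rows):
        for col in range(cols):
            groups[pass_of(row, col)].append((row, col))
    return groups


def no_adjacent_pairs(cells):
    """True if no two cells are neighbours, diagonals included."""
    cell_set = set(cells)
    for row, col in cells:
        for d_row in (-1, 0, 1):
            for d_col in (-1, 0, 1):
                if (d_row, d_col) != (0, 0) and (row + d_row, col + d_col) in cell_set:
                    return False
    return True


groups = passes(10, 10)
assert all(no_adjacent_pairs(cells) for cells in groups.values())
```

On the 10x10 example every pass updates 25 nodes, so with enough cores a tick costs four node-times instead of one hundred.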

Also note that you still have to account for literal corner cases. Remember that planets are spherical. Assuming that they are generated by taking the 6 faces of a cube and projecting them to a sphere, each node field will have four adjacent node fields which will then need to communicate with each other.

rootbeer23 wrote:
you can enforce access rules (like for example a tank in an even zone can only query the position of a tank in an odd zone or its own zone etc.) in the lua interpreter, or whatever your scripting language would be. you need this only in a debug build. this kind of unallowed access is trivial to detect, since each entity is associated with a zone.
that comes in addition to the fact that the target selection function for a tank will ignore targets that are out of range naturally.

So now there are limitations on what you can only query based on an arbitrary spatial location? As a rule of thumb, if a feature only works some of the time, it may as well not be there. Therefore, this would break a *lot* of functionality, or worse, it would only function inconsistently and for reasons that are clear only to the developer of the spatial division system.

rootbeer23 wrote:
and correctness is a priori easier to ensure than for the precondition: that is making a single thread deterministic.
that is because the spatial independence is not a thing that you have to impose on the code, its a natural property of an RTS.

It's only easier if you are willing to accept the staggering limitations that this method imposes on inter-unit communication, as well as the fact that it will be extremely unoptimized in the very cases that *need* optimization, which is during large scale unit engagements that will tend to occur near the same place. As well as the myriad of subtle edge cases that would need to be handled in order for it to function deterministically.

I do not see any advantages whatsoever over a simple tasklet wrapper utilizing the update-cache-finalize pattern.

Statistics: Posted by uberge3k — 10 Jun 2013, 13:47


]]>
2013-06-10T05:25:10+02:00 2013-06-10T05:25:10+02:00 /viewtopic.php?t=1709&p=45724#p45724 <![CDATA[Re: Planetary Annihilation "spiritual successor to T.A"]]>
uberge3k wrote:
However, I was unclear with what I meant by units crossing bounds at different rates. Imagine a unit that needs to communicate with a unit in a different node. For example, a shoulder drone needs to query the ACU's state to determine what it should be doing. If they are in separate nodes, and they are processed in different orders on different clients, the results will differ and the game will desync.


no adjacent areas are processed in parallel. you divide the map into horizontal zones and in the first pass you simulate
all even zones, in the second pass you simulate all odd zones.
processing order is strict within a zone (as a precondition), interaction between units of adjacent zones is in sequence, because adjacent zones are not simulated in parallel.
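
As a sketch of how units could be sorted into horizontal zones and split into the two passes: the unit fields `id` and `y`, the function names and the choice of the unit id as the strict in-zone order are assumptions made for the illustration, not details given in the post.

```python
def zone_of(y, map_height, zone_count):
    """Index of the horizontal zone containing coordinate y."""
    index = int(y * zone_count / map_height)
    return min(index, zone_count - 1)  # y == map_height falls into the last zone


def split_into_passes(units, map_height, zone_count):
    """Return (even_zones, odd_zones); each zone is a list of units."""
    zones = [[] for _ in range(zone_count)]
    for unit in units:
        zones[zone_of(unit["y"], map_height, zone_count)].append(unit)
    for zone in zones:
        zone.sort(key=lambda unit: unit["id"])  # strict order inside a zone
    return zones[0::2], zones[1::2]


units = [
    {"id": 7, "y": 0.5},
    {"id": 3, "y": 0.9},
    {"id": 5, "y": 1.5},
    {"id": 9, "y": 10.0},
]
even_zones, odd_zones = split_into_passes(units, map_height=10.0, zone_count=10)
```

The first pass would then simulate `even_zones` in parallel and the second pass `odd_zones`, so two adjacent zones are never live at the same time.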


Okay, so that's a rare edge case that will almost never happen and we can easily maintain a list of units that should be processed outside of the node structure. Fair enough.


like artillery and such.


But multiply this one scenario by all of the different types of units and different permutations regarding the myriad of different ways units can interact with other units. And couple it with the extremely high probability that the guy innocently writing unit code won't be familiar with the multithreaded architecture and will inadvertently write subtle desync-creating code.


you can enforce access rules (like for example a tank in an even zone can only query the position of a tank in an odd zone or its own zone etc.) in the lua interpreter, or whatever your scripting language would be. you need this only in a debug build. this kind of unallowed access is trivial to detect, since each entity is associated with a zone.
that comes in addition to the fact that the target selection function for a tank will ignore targets that are out of range naturally.
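
The access rule could be checked along these lines. The sketch is in Python rather than Lua, and `DEBUG`, `ZoneAccessError` and `check_query` are hypothetical names; a real engine would hook such a check into its scripting layer. A query is allowed if the target is in the caller's own zone or in a zone of the other parity, which is idle during the current pass.

```python
DEBUG = True  # assumed build flag; the check would be compiled out in release


class ZoneAccessError(Exception):
    """Raised when a unit queries a zone that may be simulated concurrently."""


def check_query(source_zone, target_zone):
    """Allow queries into the caller's own zone or into a zone of the other
    parity; reject queries into a different zone of the same parity."""
    if not DEBUG:
        return
    same_zone = source_zone == target_zone
    idle_zone = source_zone % 2 != target_zone % 2
    if not (same_zone or idle_zone):
        raise ZoneAccessError(
            f"unit in zone {source_zone} queried zone {target_zone}, "
            "which may be running on another thread"
        )


check_query(2, 2)  # own zone: allowed
check_query(2, 3)  # odd zone during the even pass: allowed
```

A call such as `check_query(2, 4)` raises, because zones 2 and 4 are simulated in the same pass.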

what we have then is a solution that is deterministic inside a zone (precondition), deterministic with regard to adjacent zones (they are processed in sequence) and deterministic between parallel processed zones (there is no interaction and thus order is irrelevant).

and correctness is a priori easier to ensure than for the precondition: that is making a single thread deterministic.
that is because the spatial independence is not a thing that you have to impose on the code, its a natural property of an RTS.

Statistics: Posted by rootbeer23 — 10 Jun 2013, 05:25


]]>
2013-06-10T04:18:55+02:00 2013-06-10T04:18:55+02:00 /viewtopic.php?t=1709&p=45723#p45723 <![CDATA[Re: Planetary Annihilation "spiritual successor to T.A"]]>
rootbeer23 wrote:
the threads on 2 nodes process the units in their area of influence in exactly the same sequence, so they will use the same random number for the same calculation, starting at the same time. after tick 1 the threads on both nodes will have used
547382 random numbers in the exact same way. this is the same solution as how to do a single thread sync simulation.
the same is true for all PRNGs across different nodes which process the same area and thus do the exact same calculation in the same sequence. units will cross borders between regions (= move to another thread) after the simulation tick, again, keeping the state the same across nodes.


So each node would have to be pregenerated, maintained in the same order using the same seed across all clients. I suppose this wouldn't be an issue for tiny 'planets' the size of PA's if the node count was kept low.

However, I was unclear with what I meant by units crossing bounds at different rates. Imagine a unit that needs to communicate with a unit in a different node. For example, a shoulder drone needs to query the ACU's state to determine what it should be doing. If they are in separate nodes, and they are processed in different orders on different clients, the results will differ and the game will desync.

Okay, so that's a rare edge case that will almost never happen and we can easily maintain a list of units that should be processed outside of the node structure. Fair enough.

But multiply this one scenario by all of the different types of units and different permutations regarding the myriad of different ways units can interact with other units. And couple it with the extremely high probability that the guy innocently writing unit code won't be familiar with the multithreaded architecture and will inadvertently write subtle desync-creating code.

Statistics: Posted by uberge3k — 10 Jun 2013, 04:18


]]>
2013-06-10T00:45:13+02:00 2013-06-10T00:45:13+02:00 /viewtopic.php?t=1709&p=45708#p45708 <![CDATA[Re: Planetary Annihilation "spiritual successor to T.A"]]>
uberge3k wrote:
rootbeer23 wrote:each thread has its own PRNG. no synchronization necessary.

But how does each client keep track of synchronizing their threads' PRNGs, and in what order they initialize and activate themselves? What happens when units cross bounding thresholds at different rates? And so on and so forth. My point is that these types of edge cases are what take up the majority of development, so minimizing them, especially when the algorithm is superior, should be a priority.


the threads on 2 nodes process the units in their area of influence in exactly the same sequence, so they will use the same random number for the same calculation, starting at the same time. after tick 1 the threads on both nodes will have used
547382 random numbers in the exact same way. this is the same solution as how to do a single thread sync simulation.
the same is true for all PRNGs across different nodes which process the same area and thus do the exact same calculation in the same sequence. units will cross borders between regions (= move to another thread) after the simulation tick, again, keeping the state the same across nodes.
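
A small sketch of why one PRNG per zone sidesteps the ordering problem, assuming each zone's generator is seeded from a shared game seed plus the zone index (the seeding formula and all function names are made up for the illustration). `zone_order` stands in for whatever order the threads happen to run in on a given machine.

```python
import random


def zone_rng(game_seed, zone_index):
    """One generator per zone, seeded identically on every client."""
    return random.Random(game_seed * 1_000_003 + zone_index)


def simulate_per_zone(game_seed, zone_order):
    """Each zone draws from its own generator, so scheduling order is irrelevant."""
    rngs = {zone: zone_rng(game_seed, zone) for zone in zone_order}
    return {zone: [rngs[zone].random() for _ in range(3)] for zone in zone_order}


def simulate_shared(game_seed, zone_order):
    """All zones draw from one generator, so results depend on who goes first."""
    rng = random.Random(game_seed)
    return {zone: [rng.random() for _ in range(3)] for zone in zone_order}


client_a = simulate_per_zone(42, [0, 2, 4])
client_b = simulate_per_zone(42, [4, 0, 2])
assert client_a == client_b  # same rolls per zone despite different scheduling

assert simulate_shared(42, [0, 2, 4]) != simulate_shared(42, [4, 0, 2])  # desync
```

The sketch leaves out what happens to a unit that crosses into another zone; as the post says, that hand-over has to happen after the tick so every client does it at the same point.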

Statistics: Posted by rootbeer23 — 10 Jun 2013, 00:45


]]>
2013-06-09T23:38:47+02:00 2013-06-09T23:38:47+02:00 /viewtopic.php?t=1709&p=45698#p45698 <![CDATA[Re: Planetary Annihilation "spiritual successor to T.A"]]>
rootbeer23 wrote:
uberge3k wrote:
rootbeer23 wrote:i agree keeping double state is a possibility.
although memory latency is not a trivial matter.
50 nanosecond busy wait on a cache miss is probably a longer time than doing all the calculations for a projectile.

Taking steps to maintain proper cache coherency will ensure that this happens a vanishingly small percentage of the time. Keep in mind that each unit's state is vanishingly small, and L1/L2 cache is relatively huge nowadays.

Even in the absolute worst possible scenario, it will still be much, much faster than if you didn't thread it.


DRAM access time isnt a good argument anyway, because you would not actually commit any state twice. what you have
is each state in 2 versions per unit. then in the first tick you read version 1 and write version 2, in the second tick you read
version 2 and write version 1 etc pp and so on.
only drawback is: double state memory (ok, we can survive that)
must copy the state of units that you didnt actually modify (we can survive that with a little scar).

Exactly the point I was trying to make. :)
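
The two-version state scheme from the quote can be sketched as a double buffer. This is an illustration only; the class name `DoubleBuffer` and the shape of the `update` callback are assumptions. Each tick reads the current version, writes every unit's next state into the other version (including units that did not change, which is the copying cost mentioned in the quote), then swaps the two.

```python
class DoubleBuffer:
    """Unit states kept in two versions; one is read while the other is written."""

    def __init__(self, initial_states):
        self.buffers = [list(initial_states), list(initial_states)]
        self.read_index = 0

    @property
    def current(self):
        return self.buffers[self.read_index]

    def tick(self, update):
        source = self.buffers[self.read_index]
        target = self.buffers[1 - self.read_index]
        for index, state in enumerate(source):
            # update() only sees the old snapshot, so the order in which
            # units are processed (or which thread does it) cannot matter.
            target[index] = update(state, source)
        self.read_index = 1 - self.read_index


# Hypothetical update rule: every unit advances by the largest old value.
world = DoubleBuffer([1, 2, 3])
world.tick(lambda state, snapshot: state + max(snapshot))
```

After the call, `world.current` is `[4, 5, 6]`: every unit saw the old maximum of 3, no matter which unit was written first.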

Statistics: Posted by uberge3k — 09 Jun 2013, 23:38


]]>
2013-06-09T23:38:29+02:00 2013-06-09T23:38:29+02:00 /viewtopic.php?t=1709&p=45697#p45697 <![CDATA[Re: Planetary Annihilation "spiritual successor to T.A"]]>
rootbeer23 wrote:
if there are no units, the threads dedicated to an area dont do any work.
if you have 32 threads on a 4 core machine, then you will have enough threads to populate idle cores.
if the area is densely populated, then thats a suboptimal corner case. gosh, if supcom only went to -1 or -2 during 500vs500 asf battles and remained at +0 otherwise, i would not be complaining.

Which would be one of those suboptimal corner cases, and precisely why this method is insufficient for this type of game.

Sadly, it appears that PA won't have 500vs500 ASF battles, so I suppose the point is moot either way.

rootbeer23 wrote:
uberge3k wrote:
- Unit A and Unit B both need to call random() this tick in order to find out what their weapon's muzzle spread will be.
- They are on different threads.
- On Player 1's PC, Unit A calls random() first. On Player 2's PC, it's unit B.
- Desyncs ensue.

Could this potentially be solved? Yes. Is it more difficult in addition to being less efficient and scalable than the aforementioned threading architecture? Also yes.


each thread has its own PRNG. no synchronization necessary.

But how does each client keep its threads' PRNGs synchronized, and in what order do they initialize and activate themselves? What happens when units cross region boundaries at different rates? And so on and so forth. My point is that these types of edge cases are what take up the majority of development time, so minimizing them, especially when the alternative algorithm is superior anyway, should be a priority.

Statistics: Posted by uberge3k — 09 Jun 2013, 23:38


]]>
2013-06-09T22:49:48+02:00 2013-06-09T22:49:48+02:00 /viewtopic.php?t=1709&p=45691#p45691 <![CDATA[Re: Planetary Annihilation "spiritual successor to T.A"]]>
uberge3k wrote:
rootbeer23 wrote:
i agree keeping double state is a possibility.
although memory latency is not a trivial matter.
50 nanosecond busy wait on a cache miss is probably a longer time than doing all the calculations for a projectile.

Taking steps to maintain good cache locality will ensure that this happens only a vanishingly small percentage of the time. Keep in mind that each unit's state is tiny, and L1/L2 cache is relatively huge nowadays.

Even in the absolute worst possible scenario, it will still be much, much faster than if you didn't thread it.


DRAM access time isn't a good argument anyway, because you would not actually commit any state twice. what you have
is each unit's state in 2 versions. in the first tick you read version 1 and write version 2, in the second tick you read
version 2 and write version 1, and so on.
the only drawbacks are:
- double state memory (ok, we can survive that)
- you must also copy the state of units that you didn't actually modify (we can survive that with a little scar).
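
The two-version scheme could look roughly like this. It is only a sketch: the class name `DoubleBufferedWorld` and the "move halfway toward the next unit" rule are made up for illustration, and a unit's state is reduced to a single position. Every update reads only last tick's version and writes the other one, so the order in which units are processed (or which thread processes them) cannot change the result.

```python
class DoubleBufferedWorld:
    """Unit positions kept in two versions; each tick reads one, writes the other."""

    def __init__(self, positions):
        self.buffers = [list(positions), list(positions)]
        self.read_index = 0

    def tick(self, order=None):
        read = self.buffers[self.read_index]
        write = self.buffers[1 - self.read_index]
        n = len(read)
        for i in (range(n) if order is None else order):
            # Hypothetical rule: move halfway toward the next unit, using only
            # last tick's positions, so update order cannot matter.
            target = read[(i + 1) % n]
            write[i] = read[i] + 0.5 * (target - read[i])
        self.read_index = 1 - self.read_index   # swap roles for the next tick

    def positions(self):
        return list(self.buffers[self.read_index])

forward = DoubleBufferedWorld([0.0, 4.0, 8.0])
backward = DoubleBufferedWorld([0.0, 4.0, 8.0])
forward.tick()
backward.tick(order=[2, 1, 0])
assert forward.positions() == backward.positions() == [2.0, 6.0, 4.0]
```

Note that the loop writes every unit on every tick, including units whose state did not change. That is the copy cost mentioned above.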

Statistics: Posted by rootbeer23 — 09 Jun 2013, 22:49


]]>
2013-06-09T22:43:35+02:00 2013-06-09T22:43:35+02:00 /viewtopic.php?t=1709&p=45690#p45690 <![CDATA[Re: Planetary Annihilation "spiritual successor to T.A"]]>
uberge3k wrote:
This would be incredibly uneven due to unit distribution. You are likely to be wasting an enormous amount of processing time by dedicating threads to sparsely populated areas, or one thread being hammered by a densely populated area.


if there are no units, the threads dedicated to an area dont do any work.
if you have 32 threads on a 4 core machine, then you will have enough threads to populate idle cores.
if the area is densely populated, then thats a suboptimal corner case. gosh, if supcom only went to -1 or -2 during 500vs500 asf battles and remained at +0 otherwise, i would not be complaining.
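
To put rough numbers on this, here is a toy scheduling model, not a measurement. The per-region costs are invented, and `tick_time` is a hypothetical helper that assumes each idle core simply picks up the next non-empty region.

```python
import heapq

def tick_time(region_costs, cores=4):
    """Simulated wall time when each idle core picks up the next region task."""
    finish = [0.0] * cores          # time at which each core becomes free
    heapq.heapify(finish)
    for cost in region_costs:
        if cost == 0:
            continue                # empty region: no work scheduled
        free_at = heapq.heappop(finish)
        heapq.heappush(finish, free_at + cost)
    return max(finish)

# Same total work (32 units of cost) in both cases, made-up numbers.
spread_out = [1.0] * 32                            # units spread over 32 regions
one_big_fight = [29.0] + [1.0] * 3 + [0.0] * 28    # almost everything in one region

assert tick_time(spread_out) == 8.0       # 4 cores share the work evenly
assert tick_time(one_big_fight) == 29.0   # one core does nearly all of it
```

In this model empty regions cost nothing and the evenly spread case gets the full 4x speedup, while the single dense region is barely faster than one thread doing all 32 units of work. That dense case is the corner case in question.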

uberge3k wrote:
- Unit A and Unit B both need to call random() this tick in order to find out what their weapon's muzzle spread will be.
- They are on different threads.
- On Player 1's PC, Unit A calls random() first. On Player 2's PC, it's unit B.
- Desyncs ensue.

Could this potentially be solved? Yes. Is it more difficult in addition to being less efficient and scalable than the aforementioned threading architecture? Also yes.


each thread has its own PRNG. no synchronization necessary.
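
A sketch of what that could look like, with the caveat that the seeding scheme and all names (`GAME_SEED`, `make_region_rngs`, `simulate_region`, `run_tick`) are assumptions for illustration and not anything PA or supcom is known to do. Each region gets its own generator seeded from a shared game seed and the region id, and units inside a region are always handled in sorted id order, so the draw a unit gets does not depend on how many threads run or which one finishes first.

```python
import random
from concurrent.futures import ThreadPoolExecutor

GAME_SEED = 1234  # hypothetical seed, agreed on by all players at game start

def make_region_rngs(num_regions):
    # One independent generator per region, seeded from (game seed, region id).
    return [random.Random(GAME_SEED * 1_000_003 + region_id)
            for region_id in range(num_regions)]

def simulate_region(rng, unit_ids):
    # Units inside a region are always processed in sorted order, so each
    # unit sees the same draw on every machine.
    return {uid: rng.uniform(-1.0, 1.0) for uid in sorted(unit_ids)}

def run_tick(regions, workers):
    rngs = make_region_rngs(len(regions))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(simulate_region, rngs, regions))
    spread = {}
    for region_result in results:
        spread.update(region_result)
    return spread   # unit id -> muzzle spread for this tick

regions = [[3, 1], [7], [], [5, 2, 9]]
player1 = run_tick(regions, workers=1)   # single-threaded machine
player2 = run_tick(regions, workers=4)   # four worker threads
assert player1 == player2
```

With one shared generator, the value a unit gets would depend on which thread called it first, which is the desync described in the quote.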

Statistics: Posted by rootbeer23 — 09 Jun 2013, 22:43


]]>