I don't really get why this discussion has to veer from something static (benchmarks) to something else we weren't talking about. Then reply to the person who started bragging that Apple is going to be so great in benchmarks, and point out to him that it's about app performance. But no, of course you go after the person who proves the opposite.
And meanwhile I'm being dragged into yet another endless discussion that is no longer about what it's supposed to be about...
Anyway, I'll play along.
The main point of the 8 cores is that 4 cores do the heavy work and 4 do the light work, so effectively it is more of a quad-core than an octa-core. Unfortunately there is a certain group of people who are all too keen to keep stressing that it's 8 cores (without saying what kind of cores), because that is no less than 4x as many cores as Apple uses, and that sounds nice and impressive!
I haven't read the whole article because I have neither the time nor the inclination, but here are the conclusions of each section. Note especially how often it says that things scale very well on the 4+4 setup and that more than 4 threads are used:
When looking at the total amount of threads on the system, we can see that the S-Browser makes good use of at least 4 CPU cores with some peaks of up to 5 threads. All in all, this is a scenario which doesn't necessarily make use of 8 cores per se, however the 4+4 setup of big.LITTLE SoCs does seem to be fully utilized for power management as the computational load shifts between the clusters depending on the needed performance.
The total amount of threads on the system doesn't change much compared to the previous scenario: The S-Browser still manages to actively make good use of up to 4 cores with the occasional burst of up to 5 threads.
The total run-queue depths for the system look very different for Chrome. We see a consistent use of 4-5 cores and a large burst of up to 8 threads. This is a very surprising finding and impact on the way we perceive the core count usage of Chrome.
The total run-queue depths for the system again confirm what we saw in the previous scenario: Chrome is able to consistently make use of a large amount of threads, so that we see use of up to 6 CPUs with small bursts of up to almost 9 threads.
In general, the workload is optimized towards 4-core CPUs. Because 4x4 big.LITTLE SoCs in a sense can be seen as 4-core designs, we don't see an issue here. On the other hand, symmetric 8-core designs here would see very little benefit from the additional cores.
As the big cores didn't have much scheduled on them, the total rq-graph for the whole system doesn't look very different from the one on the little cores. Writing messages is definitely a low-end task that doesn't require too much processing power.
Overall the app launch doesn't seem to take much advantage of advanced multi-threading as we just manage to peak at 3 threads in the run-queue.
Again, in terms of actual total run-queue depth this task could have easily been done by a 2-core SoC such as Apple's A6/7/8 without losing on performance. What one has to consider though is that the per-CPU load on such a SoC would be much higher, requiring higher single-CPU performance or frequency. Because the app actually manages to spread out the load equally over 4 CPUs, it should actually be able to take advantage of parallelism for the sake of power efficiency instead of performance.
Overall, the Play Store app also seems to be optimized and aimed for 4-core designs. Here big.LITTLE seems to work well as we see a mix of small threads with a mix of big threads running concurrently on both clusters.
When looking at the total system run-queue depth, things look, for lack of a better description, quite ridiculous. We routinely have peaks where all 8 cores of the system are fully loaded and peak at over 10 threads. It looks like Google is able to massively parallelize the app update process and take advantage of even the highest core-count SoCs. This scenario is absolutely about maximum throughput and performance while utilizing all available hardware resources.
Samsung seems to be able to parallelize well the camera application as this is again a sensible scenario that makes good usage of the 4.4 big.LITTLE topology of the SoC.
Overall the picture capture sees quite a surprising peak in terms of the run-queue depth, fully utilizing up to 6 CPUs on the system.
All in all it looks like video recording is about a large number of small threads. There are two spikes in the total run-queue depth that were predominantly caused by the little CPU cluster, which pushed the total rq-depth up to 8 for short moments. I'm not sure what caused the spikes as I remained relatively still during the recording and did no special activity to warrant such behaviour.
Overall, the game's rq-depth averages around 2.5 during the main loading sequence with a larger burst of 7 threads when the 3D intro starts playing.
I think it's pretty safe to come to the conclusion that Real Racing 3 is coded with quad-core CPUs in mind as we see exactly 4 major threads loading the SoC's CPUs to various extent.
Even though the total rq-depth might be a bit misleading here while it's showing an average of around 2.5, we can see that in the individual per-CPU runqueues we have 4 major threads at work. Again this is a case of using parallelization for the sake of power efficiency instead of performance. The 3 smaller threads on the little cores could have well been handled by a single larger CPU at higher frequency, but it wouldn't have been nearly as power efficient as spreading them onto the smaller cores.
And then, drum roll please, a few nice bits from the final conclusion:
So while a 2-core design could handle bursts where ~3-4 threads are placed onto the big cluster, the CPUs would need to scale up higher in frequency to provide the same performance compared to a wider 4-core design. And scaling up higher in frequency has a quadratically detrimental effect on power efficiency as we need higher operating voltages.
At the end of the day I think the 4 big core designs are not only the better performing ones but also the more efficient ones.
What is clear though, albeit there are corner-cases, is that the vast majority of applications do seem to be optimal for quad-core SoCs. This is why traditional 4-core and 4.4 big.LITTLE designs still appear to make the most sense in terms of providing a balanced configuration and making most use of the hardware at hand. For big.LITTLE, even if there were no use-cases where all cores are concurrently used, it's not a big deal as what we are aiming for in heterogeneous systems is power efficiency gains.
In the end what we should take away from this analysis is that Android devices can make much better use of multi-threading than initially expected. There's very solid evidence that not only are 4.4 big.LITTLE designs validated, but we also find practical benefits of using 8-core "little" designs over similar single-cluster 4-core SoCs.
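To put rough numbers on the frequency/voltage argument from that conclusion, here is a minimal Python sketch. It assumes the textbook approximation that dynamic power scales with cores × C × V² × f. The function name `dynamic_power` and both operating points (1.0 GHz at 0.8 V versus 2.0 GHz at 1.1 V) are made up for illustration; they are not measurements from the article.

```python
def dynamic_power(cores, freq_ghz, volts, cap=1.0):
    """Relative dynamic power in arbitrary units: cores * C * V^2 * f.

    Assumption: the classic CMOS dynamic-power approximation; static
    leakage and imperfect thread scaling are ignored.
    """
    return cores * cap * volts ** 2 * freq_ghz


# Hypothetical operating points (NOT measured values).
# Both deliver the same nominal throughput of 4 core-GHz.
wide = dynamic_power(cores=4, freq_ghz=1.0, volts=0.8)    # 4 slower cores
narrow = dynamic_power(cores=2, freq_ghz=2.0, volts=1.1)  # 2 faster cores

print(f"4 cores @ 1.0 GHz, 0.8 V: {wide:.2f}")
print(f"2 cores @ 2.0 GHz, 1.1 V: {narrow:.2f}")
print(f"narrow / wide power ratio: {narrow / wide:.2f}")
```

With these assumed values the 2-core setup needs roughly 1.9x the power for the same nominal throughput, because the higher clock also needs a higher voltage and voltage counts squared. Real SoCs behave less neatly, so take it as an illustration of the trend only.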
PS: if I were you I'd read the article more carefully next time, because it clearly shows that more cores are useful.
[Comment edited by watercoolertje on 25 July 2024 15:12]