commit bdfc5375f219d6def81effda4e57cb56d01fc917
Author: Sergei Trofimovich <slyfox@gentoo.org>
Date:   Tue Aug 30 12:10:54 2016 +0100

    rts: enable parallel GC scan of large (32M+) allocation area

    Parallel GC does not scan a large allocation area (-A)
    effectively, as it does not do work stealing from the
    nursery by default.

    That leads to a large imbalance when only one of the threads
    overflows its allocation area: most of the GC threads finish
    quickly (as there is not much to collect) and sit idle,
    waiting while a single GC thread finishes scanning the
    allocation area of that one thread.

    The patch enables work stealing (the equivalent of -qb0)
    for an allocation area of -A32M or larger.

    Tested on the highlighting-kate package from Trac #9221.

    On an 8-core machine wall-clock time improves by around 5%.
    On a 24-core VM the speedup is 20%.

    Signed-off-by: Sergei Trofimovich <siarheit@google.com>

    Test Plan: measured wall time and GC parallelism on highlighting-kate build

    Reviewers: austin, bgamari, erikd, simonmar

    Reviewed By: bgamari, simonmar

    Subscribers: thomie

    Differential Revision: https://phabricator.haskell.org/D2483

    GHC Trac Issues: #9221

diff --git a/rts/RtsFlags.c b/rts/RtsFlags.c
index fda33f0..7a719b9 100644
--- a/rts/RtsFlags.c
+++ b/rts/RtsFlags.c
@@ -237,1 +237,1 @@ void initRtsFlagsDefaults(void)
-    RtsFlags.ParFlags.parGcLoadBalancingGen = 1;
+    RtsFlags.ParFlags.parGcLoadBalancingGen = ~0u; /* auto, based on -A */
@@ -1398,2 +1390,19 @@ static void normaliseRtsOpts (void)
     }
 
+#ifdef THREADED_RTS
+    if (RtsFlags.ParFlags.parGcLoadBalancingGen == ~0u) {
+        StgWord alloc_area_bytes
+            = RtsFlags.GcFlags.minAllocAreaSize * BLOCK_SIZE;
+
+        // If allocation area is larger than CPU cache
+        // we can finish scanning quicker doing work-stealing
+        // scan. Trac #9221
+        // 32M looks big enough not to fit into L2 cache
+        // of popular modern CPUs.
+        if (alloc_area_bytes >= 32 * 1024 * 1024) {
+            RtsFlags.ParFlags.parGcLoadBalancingGen = 0;
+        } else {
+            RtsFlags.ParFlags.parGcLoadBalancingGen = 1;
+        }
+    }
+#endif
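
The selection rule this patch adds to normaliseRtsOpts can be tried outside the RTS. The following is a minimal standalone sketch, not the RTS code itself: the helper name, the SKETCH_ macros and the 4 KiB block size are assumptions made for illustration, standing in for RtsFlags.ParFlags.parGcLoadBalancingGen, RtsFlags.GcFlags.minAllocAreaSize (treated here as a count of blocks) and BLOCK_SIZE.

```c
#include <stdint.h>

/* Assumption: 4 KiB blocks, standing in for the RTS's BLOCK_SIZE. */
#define SKETCH_BLOCK_SIZE 4096u

/* Sentinel used by the patch: "not set explicitly, decide from -A". */
#define SKETCH_GEN_AUTO (~0u)

/* Threshold from the patch: 32 MiB. */
#define SKETCH_STEAL_THRESHOLD_BYTES (32ull * 1024 * 1024)

/*
 * Hypothetical helper mirroring the logic in the diff above.
 * requested_gen:      current load-balancing generation, or
 *                     SKETCH_GEN_AUTO when it was left at the default.
 * alloc_area_blocks:  allocation area size in blocks.
 * Returns the generation from which work stealing is enabled:
 * 0 for large allocation areas, 1 otherwise.
 */
static unsigned int
sketch_resolve_load_balancing_gen(unsigned int requested_gen,
                                  uint64_t alloc_area_blocks)
{
    if (requested_gen != SKETCH_GEN_AUTO) {
        /* An explicitly chosen value is left untouched. */
        return requested_gen;
    }

    uint64_t alloc_area_bytes = alloc_area_blocks * SKETCH_BLOCK_SIZE;

    return alloc_area_bytes >= SKETCH_STEAL_THRESHOLD_BYTES ? 0u : 1u;
}
```

Under the 4 KiB block assumption, 8192 blocks is exactly 32 MiB, so sketch_resolve_load_balancing_gen(SKETCH_GEN_AUTO, 8192) selects generation 0, while 8191 blocks falls back to generation 1. Any value other than the sentinel is returned as-is, which matches the patch only overriding the setting when it is still at its "auto" default.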