@MoSal

MoSal@lemm.ee · 9 months ago

Thank you for working on this. I found Okhsl to be of great utility.

MoSal@lemm.ee · edit-2 1 year ago

Okay. I updated mold to v2.0.0, added `"-Z", "time-passes"` to get link times, and ran cargo with `--timings` to get CPU utilization graphs. I tested on two projects of mine (the one from yesterday is “X”).

Link times are the best of 3-4 runs, changing only whitespace in main.rs between runs.

| `lto="fat"` | lld | mold |
|---|---|---|
| Project X (cu=1) | 105.923 | 106.380 |
| Project X (cu=8) | 103.512 | 103.513 |
| Project S (cu=1) | 94.290 | 94.969 |
| Project S (cu=8) | 100.118 | 100.449 |

Observations (lto="fat"): As expected, there is little multi-core utilization. Using codegen-units larger than 1 may even cause a regression in link time (as with Project S). The choice of linker between lld and mold appears to be of no significance.

| `lto="thin"` | lld | mold |
|---|---|---|
| Project X (cu=1) | 46.596 | 47.118 |
| Project X (cu=8) | 34.167 | 33.839 |
| Project X (cu=16) | 36.296 | 36.621 |
| Project S (cu=1) | 41.817 | 41.404 |
| Project S (cu=8) | 32.062 | 32.162 |
| Project S (cu=16) | 35.780 | 36.074 |

Observations (lto="thin"): Here, we see parallel LLVM_lto_optimize runs kicking in. I also tested with codegen-units=16. In that case, the number of parallel LLVM_lto_optimize runs was large enough that the synchronization overhead caused a regression on my humble workstation, powered by an Intel i7-7700K processor (only 4 physical, 8 logical cores). The results for cu=16 would probably look different on a more powerful setup. Still, the choice of linker between lld and mold appears to be of no significance.
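To put numbers on the scaling, here is a small Python sketch that computes the speedup relative to codegen-units=1 from the lld column of the `lto="thin"` results above. The helper name `speedup` and the dictionary layout are only illustrative; the timings are copied as reported.

```python
# Link times (lld column, lto="thin"), copied from the results above.
thin_lld = {
    "Project X": {1: 46.596, 8: 34.167, 16: 36.296},
    "Project S": {1: 41.817, 8: 32.062, 16: 35.780},
}

def speedup(times, cu):
    """Speedup of `cu` codegen units relative to codegen-units=1."""
    return times[1] / times[cu]

for project, times in thin_lld.items():
    for cu in (8, 16):
        print(f"{project} cu={cu}: {speedup(times, cu):.2f}x vs cu=1")
```

For both projects the cu=8 speedup comes out higher than the cu=16 one, which is the regression described above.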

| `lto=false` | lld | mold |
|---|---|---|
| Project X (cu=1) | 29.160 | 29.231 |
| Project X (cu=8) | 8.130 | 8.293 |
| Project X (cu=16) | 7.076 | 6.953 |
| Project S (cu=1) | 11.996 | 12.069 |
| Project S (cu=8) | 4.418 | 4.462 |
| Project S (cu=16) | 4.357 | 4.455 |

Observations (lto=false): Here, codegen-units becomes the dominant factor, since no heavy LLVM_lto_optimize runs are involved. Going above codegen-units=8 does not hurt link time. Still, the choice of linker between lld and mold appears to be of no significance.

| `lto="off"` | lld | mold |
|---|---|---|
| Project X (cu=1) | 29.109 | 29.201 |
| Project X (cu=8) | 5.896 | 6.117 |
| Project X (cu=16) | 3.479 | 3.637 |
| Project S (cu=1) | 11.732 | 11.742 |
| Project S (cu=8) | 2.354 | 2.355 |
| Project S (cu=16) | 1.517 | 1.499 |

Observations (lto="off"): Same observations as lto=false. Still, the choice of linker between lld and mold appears to be of no significance.
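As a rough check of the "no significance" claim, this Python sketch computes the relative difference between mold and lld for the `lto="off"` results above. The function name `relative_diff` is illustrative, and the timings are copied as reported.

```python
# (lld, mold) link times from the lto="off" results above.
off_times = {
    ("Project X", 1): (29.109, 29.201),
    ("Project X", 8): (5.896, 6.117),
    ("Project X", 16): (3.479, 3.637),
    ("Project S", 1): (11.732, 11.742),
    ("Project S", 8): (2.354, 2.355),
    ("Project S", 16): (1.517, 1.499),
}

def relative_diff(lld, mold):
    """Relative difference of mold vs lld; positive means mold was slower."""
    return (mold - lld) / lld

diffs = {key: relative_diff(*pair) for key, pair in off_times.items()}
worst = max(diffs.values(), key=abs)

for (project, cu), d in diffs.items():
    print(f"{project} cu={cu}: {d:+.2%}")
print(f"largest gap: {worst:+.2%}")
```

The largest gap stays under 5% and amounts to a fraction of a second in absolute terms. Whether gaps of that size are noise would need more runs to tell.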

Debug builds link in under 0.4 seconds.

MoSal@lemm.ee · 1 year ago

`codegen-units=1`, `debug=true`, varying `lto`

`lto = "fat"`

| Flags | Clean build time | Pre-strip size | Post-strip size |
|---|---|---|---|
| (default) | 2:31 | 90.8207MiB | 7.3374MiB |
| `["-Z", "gcc-ld=lld"]` | 2:31 | 91.9731MiB | 7.3332MiB |
| `linker = "clang"` | 2:32 | 90.8207MiB | 7.3375MiB |
| `linker = "clang"; fuse-ld="mold"` | 2:31 | 92.1107MiB | 7.3334MiB |

`lto = "thin"`

| Flags | Clean build time | Pre-strip size | Post-strip size |
|---|---|---|---|
| (default) | 1:33 | 96.9630MiB | 8.1695MiB |
| `["-Z", "gcc-ld=lld"]` | 1:32 | 98.3889MiB | 8.1777MiB |
| `linker = "clang"` | 1:33 | 96.9631MiB | 8.1695MiB |
| `linker = "clang"; fuse-ld="mold"` | 1:32 | 98.6903MiB | 8.1797MiB |

`lto = false`

| Flags | Clean build time | Pre-strip size | Post-strip size |
|---|---|---|---|
| (default) | 1:32 | 113.5656MiB | 8.0601MiB |
| `["-Z", "gcc-ld=lld"]` | 1:30 | 115.1210MiB | 8.1122MiB |
| `linker = "clang"` | 1:32 | 113.5656MiB | 8.0602MiB |
| `linker = "clang"; fuse-ld="mold"` | 1:31 | 115.4679MiB | 8.0663MiB |

`lto = "off"`

| Flags | Clean build time | Pre-strip size | Post-strip size |
|---|---|---|---|
| (default) | 1:33 | 113.5666MiB | 8.0601MiB |
| `["-Z", "gcc-ld=lld"]` | 1:31 | 115.1231MiB | 8.1122MiB |
| `linker = "clang"` | 1:32 | 113.5667MiB | 8.0602MiB |
| `linker = "clang"; fuse-ld="mold"` | 1:31 | 115.4697MiB | 8.0662MiB |

`codegen-units=8`, `debug=true`, varying `lto`

`lto = "fat"`

| Flags | Clean build time | Pre-strip size | Post-strip size |
|---|---|---|---|
| (default) | 2:21 | 104.9842MiB | 7.6304MiB |
| `["-Z", "gcc-ld=lld"]` | 2:19 | 106.1436MiB | 7.6264MiB |
| `linker = "clang"` | 2:21 | 104.9882MiB | 7.6344MiB |
| `linker = "clang"; fuse-ld="mold"` | 2:19 | 106.2864MiB | 7.6325MiB |

`lto = "thin"`

| Flags | Clean build time | Pre-strip size | Post-strip size |
|---|---|---|---|
| (default) | 1:12 | 134.1112MiB | 9.0445MiB |
| `["-Z", "gcc-ld=lld"]` | 1:09 | 136.1897MiB | 9.0660MiB |
| `linker = "clang"` | 1:12 | 134.1113MiB | 9.0446MiB |
| `linker = "clang"; fuse-ld="mold"` | 1:09 | 136.4466MiB | 9.0494MiB |

`lto = false`

| Flags | Clean build time | Pre-strip size | Post-strip size |
|---|---|---|---|
| (default) | 1:14 | 158.1049MiB | 9.0328MiB |
| `["-Z", "gcc-ld=lld"]` | 1:11 | 159.9998MiB | 9.1129MiB |
| `linker = "clang"` | 1:14 | 158.1050MiB | 9.0328MiB |
| `linker = "clang"; fuse-ld="mold"` | 1:12 | 160.3123MiB | 9.0428MiB |

`lto = "off"`

| Flags | Clean build time | Pre-strip size | Post-strip size |
|---|---|---|---|
| (default) | 0:57 | 145.9463MiB | 9.4586MiB |
| `["-Z", "gcc-ld=lld"]` | 0:54 | 148.6021MiB | 9.6001MiB |
| `linker = "clang"` | 0:57 | 145.9464MiB | 9.4587MiB |
| `linker = "clang"; fuse-ld="mold"` | 0:55 | 148.8842MiB | 9.4668MiB |

mold appears to perform similarly to lld, but not faster.

With the caveat that this is not a proper benchmark, since:

- I didn’t measure link time alone.
- I didn’t bother running each case multiple times and picking the fastest run (since I perceived the differences to be insignificant).
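For the second caveat, a best-of-N measurement could look roughly like this Python sketch. The helper `best_of` is hypothetical and the workload is a stand-in; a real run would invoke the build or link step instead, and this does nothing to isolate link time from the rest of the build.

```python
import time

def best_of(run, repeats=3):
    """Call `run` `repeats` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return min(timings)

if __name__ == "__main__":
    # Stand-in workload; replace with a call that runs the actual build.
    fastest = best_of(lambda: sum(range(100_000)), repeats=4)
    print(f"fastest of 4 runs: {fastest:.6f} s")
```

Taking the minimum rather than the mean is a common choice here, since background noise on the machine can only make a run slower.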

As a side note, `lto = false` appears to be practically useless.
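To compare the two settings directly, this Python sketch puts the default-linker rows for `lto = false` and `lto = "off"` side by side. The helper `to_seconds` and the `rows` layout are illustrative; the values are copied from the results above.

```python
def to_seconds(mmss):
    """Convert an 'M:SS' string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

# (clean build time, post-strip size in MiB) from the "(default)" rows above,
# keyed by (codegen-units, lto setting).
rows = {
    (1, "false"): ("1:32", 8.0601),
    (1, "off"): ("1:33", 8.0601),
    (8, "false"): ("1:14", 9.0328),
    (8, "off"): ("0:57", 9.4586),
}

for cu in (1, 8):
    t_false, size_false = rows[(cu, "false")]
    t_off, size_off = rows[(cu, "off")]
    dt = to_seconds(t_false) - to_seconds(t_off)
    print(f'cu={cu}: lto=false is {dt:+d} s vs lto="off", '
          f"post-strip size {size_false - size_off:+.4f} MiB")
```

With codegen-units=1 the two settings are nearly identical. With codegen-units=8, `lto = false` took 17 seconds longer and gave a stripped binary about 0.43 MiB smaller.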
