Discussion:
Parallelize the compilation using Threads
Giuliano Augusto Faulin Belinassi
2018-11-14 21:47:41 UTC
Permalink
As a brief introduction, I am a graduate student who got interested
in the "Parallelize the compilation using threads" project (GSoC 2018 [1]).
I am a newcomer to GCC, but I have already sent some patches, some of
which have been accepted [2].

I brought this subject up in IRC, but maybe here is a proper place to
discuss this topic.

From my point of view, parallelizing GCC itself will only speed up the
compilation of projects which have a big file that creates a
bottleneck in the whole project compilation (note: by big, I mean the
amount of code to generate). Additionally, I know that GCC must not
change the project layout, but from the software engineering perspective,
this may be a bad smell that indicates that the file should be broken
into smaller files. Finally, the Makefiles will take care of the
parallelization task.

My questions are:

1. Is there any project whose compilation would be significantly improved
if GCC ran in parallel? Does anyone have data about anything related
to that? How about the Linux kernel? If not, I can try to gather some.

2. Did I correctly understand the goal of the parallelization? Can
anyone provide extra details to me?

I am willing to base my master's thesis on this and also apply to GSoC
2019 if it proves fruitful.

[1] https://gcc.gnu.org/wiki/SummerOfCode
[2] https://patchwork.ozlabs.org/project/gcc/list/?submitter=74682


Thanks
Richard Biener
2018-11-15 10:29:02 UTC
Permalink
On Wed, Nov 14, 2018 at 10:47 PM Giuliano Augusto Faulin Belinassi wrote:
Post by Giuliano Augusto Faulin Belinassi
As a brief introduction, I am a graduate student who got interested
in the "Parallelize the compilation using threads" project (GSoC 2018 [1]).
I am a newcomer to GCC, but I have already sent some patches, some of
which have been accepted [2].
I brought this subject up in IRC, but maybe here is a proper place to
discuss this topic.
From my point of view, parallelizing GCC itself will only speed up the
compilation of projects which have a big file that creates a
bottleneck in the whole project compilation (note: by big, I mean the
amount of code to generate).
That's true. During GCC bootstrap there are some of those (see PR84402).

One way to improve parallelism is to use link-time optimization where
even single source files can be split up into multiple link-time units. But
then there's the serial whole-program analysis part.
Post by Giuliano Augusto Faulin Belinassi
Additionally, I know that GCC must not
change the project layout, but from the software engineering perspective,
this may be a bad smell that indicates that the file should be broken
into smaller files. Finally, the Makefiles will take care of the
parallelization task.
What do you mean by GCC must not change the project layout? GCC
happily re-orders functions and link-time optimization will reorder
TUs (well, linking may as well).
Post by Giuliano Augusto Faulin Belinassi
1. Is there any project whose compilation would be significantly improved
if GCC ran in parallel? Does anyone have data about anything related
to that? How about the Linux kernel? If not, I can try to gather some.
We do not have any data about this apart from experiments with
splitting up source files for PR84402.
Post by Giuliano Augusto Faulin Belinassi
2. Did I correctly understand the goal of the parallelization? Can
anyone provide extra details to me?
You may want to search the mailing list archives since we had a
student application (later revoked) for the task with some discussion.

In my view (I proposed the thing) the most interesting parts are
getting GCC's global state documented and reduced. The parallelization
itself is an interesting experiment but whether there will be any
substantial improvement for builds that can already benefit from make
parallelism remains a question.
Post by Giuliano Augusto Faulin Belinassi
I am willing to base my master's thesis on this and also apply to GSoC
2019 if it proves fruitful.
[1] https://gcc.gnu.org/wiki/SummerOfCode
[2] https://patchwork.ozlabs.org/project/gcc/list/?submitter=74682
Thanks
Jeff Law
2018-11-15 16:56:09 UTC
Permalink
Post by Richard Biener
Post by Giuliano Augusto Faulin Belinassi
2. Did I correctly understand the goal of the parallelization? Can
anyone provide extra details to me?
You may want to search the mailing list archives since we had a
student application (later revoked) for the task with some discussion.
In my view (I proposed the thing) the most interesting parts are
getting GCC's global state documented and reduced. The parallelization
itself is an interesting experiment but whether there will be any
substantial improvement for builds that can already benefit from make
parallelism remains a question.
Agreed. Driving down the amount of global state is good in and of
itself. It's also a prerequisite for parallelizing GCC itself using
threads.

I suspect driving down global state probably isn't that interesting for
a master's thesis though :-)

jeff
Szabolcs Nagy
2018-11-15 18:07:21 UTC
Permalink
Post by Richard Biener
In my view (I proposed the thing) the most interesting parts are
getting GCC's global state documented and reduced. The parallelization
itself is an interesting experiment but whether there will be any
substantial improvement for builds that can already benefit from make
parallelism remains a question.
in the common case (a project with many small files, many more than
the core count) i'd expect a regression:

if gcc itself tries to parallelize, that introduces inter-thread
synchronization and potential false sharing in gcc (e.g. malloc
locks) that does not exist with make parallelism (glibc can avoid
some atomic instructions when a process is single threaded).
Martin Jambor
2018-11-16 13:05:24 UTC
Permalink
Hi Giuliano,
Post by Richard Biener
You may want to search the mailing list archives since we had a
student application (later revoked) for the task with some discussion.
Specifically, the whole thread beginning with
https://gcc.gnu.org/ml/gcc/2018-03/msg00179.html

Martin
Giuliano Augusto Faulin Belinassi
2018-11-16 18:59:51 UTC
Permalink
Hi! Sorry for the late reply again :P

On Thu, Nov 15, 2018 at 8:29 AM Richard Biener wrote:
Post by Richard Biener
On Wed, Nov 14, 2018 at 10:47 PM Giuliano Augusto Faulin Belinassi wrote:
Post by Giuliano Augusto Faulin Belinassi
As a brief introduction, I am a graduate student who got interested
in the "Parallelize the compilation using threads" project (GSoC 2018 [1]).
I am a newcomer to GCC, but I have already sent some patches, some of
which have been accepted [2].
I brought this subject up in IRC, but maybe here is a proper place to
discuss this topic.
From my point of view, parallelizing GCC itself will only speed up the
compilation of projects which have a big file that creates a
bottleneck in the whole project compilation (note: by big, I mean the
amount of code to generate).
That's true. During GCC bootstrap there are some of those (see PR84402).
One way to improve parallelism is to use link-time optimization where
even single source files can be split up into multiple link-time units. But
then there's the serial whole-program analysis part.
Did you mean this: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 ?
That is a lot of data :-)

It seems that 'phase opt and generate' is the most time-consuming
part. Is that the 'GIMPLE optimization pipeline' you were talking
about in this thread:
https://gcc.gnu.org/ml/gcc/2018-03/msg00202.html
Post by Richard Biener
Post by Giuliano Augusto Faulin Belinassi
Additionally, I know that GCC must not
change the project layout, but from the software engineering perspective,
this may be a bad smell that indicates that the file should be broken
into smaller files. Finally, the Makefiles will take care of the
parallelization task.
What do you mean by GCC must not change the project layout? GCC
happily re-orders functions and link-time optimization will reorder
TUs (well, linking may as well).
I think this is in response to a comment I made on IRC. Giuliano said
that if a project has a very large file that dominates the total build
time, the file should be split up into smaller pieces. I said "GCC
can't restructure people's code; it can only try to compile it
faster". We weren't referring to code transformations in the compiler
like re-ordering functions, but physically refactoring the source
code.
Yes. But from one of the attachments to PR84402, it seems that such
files exist in GCC itself:
https://gcc.gnu.org/bugzilla/attachment.cgi?id=43440
Post by Richard Biener
Post by Giuliano Augusto Faulin Belinassi
1. Is there any project whose compilation would be significantly improved
if GCC ran in parallel? Does anyone have data about anything related
to that? How about the Linux kernel? If not, I can try to gather some.
We do not have any data about this apart from experiments with
splitting up source files for PR84402.
Post by Giuliano Augusto Faulin Belinassi
2. Did I correctly understand the goal of the parallelization? Can
anyone provide extra details to me?
You may want to search the mailing list archives since we had a
student application (later revoked) for the task with some discussion.
In my view (I proposed the thing) the most interesting parts are
getting GCC's global state documented and reduced. The parallelization
itself is an interesting experiment but whether there will be any
substantial improvement for builds that can already benefit from make
parallelism remains a question.
While I agree that documenting GCC's global state is good for the
community and the development of GCC, I really don't think that is a
good motivation for parallelizing a compiler from a research standpoint.
There must be something or someone that could take advantage of the
fine-grained parallelism. But that data from PR84402 seems to hold the
answer to it. :-)
Post by Szabolcs Nagy
Post by Richard Biener
In my view (I proposed the thing) the most interesting parts are
getting GCCs global state documented and reduced. The parallelization
itself is an interesting experiment but whether there will be any
substantial improvement for builds that can already benefit from make
parallelism remains a question.
in the common case (project with many small files, much more than
core count) i'd expect a regression:
if gcc itself tries to parallelize that introduces inter thread
synchronization and potential false sharing in gcc (e.g. malloc
locks) that does not exist with make parallelism (glibc can avoid
some atomic instructions when a process is single threaded).
That is what I am mostly worried about. Or that the most costly part is
not parallelizable at all. Also, I would expect a regression on very
small files, which could probably be avoided by implementing this
feature behind a flag?
Post by Martin Jambor
Hi Giuliano,
Post by Richard Biener
You may want to search the mailing list archives since we had a
student application (later revoked) for the task with some discussion.
Specifically, the whole thread beginning with
https://gcc.gnu.org/ml/gcc/2018-03/msg00179.html
Martin
Yes, I will research this carefully ;-)

Thank you
Richard Biener
2018-11-19 10:53:20 UTC
Permalink
On Fri, Nov 16, 2018 at 8:00 PM Giuliano Augusto Faulin Belinassi wrote:
Post by Giuliano Augusto Faulin Belinassi
Hi! Sorry for the late reply again :P
On Thu, Nov 15, 2018 at 8:29 AM Richard Biener wrote:
Post by Richard Biener
On Wed, Nov 14, 2018 at 10:47 PM Giuliano Augusto Faulin Belinassi wrote:
Post by Giuliano Augusto Faulin Belinassi
As a brief introduction, I am a graduate student who got interested
in the "Parallelize the compilation using threads" project (GSoC 2018 [1]).
I am a newcomer to GCC, but I have already sent some patches, some of
which have been accepted [2].
I brought this subject up in IRC, but maybe here is a proper place to
discuss this topic.
From my point of view, parallelizing GCC itself will only speed up the
compilation of projects which have a big file that creates a
bottleneck in the whole project compilation (note: by big, I mean the
amount of code to generate).
That's true. During GCC bootstrap there are some of those (see PR84402).
One way to improve parallelism is to use link-time optimization where
even single source files can be split up into multiple link-time units. But
then there's the serial whole-program analysis part.
Did you mean this: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 ?
That is a lot of data :-)
It seems that 'phase opt and generate' is the most time-consuming
part. Is that the 'GIMPLE optimization pipeline' you were talking
https://gcc.gnu.org/ml/gcc/2018-03/msg00202.html
It's everything that comes after the frontend parsing bits, thus this
includes in particular RTL optimization and early GIMPLE optimizations.
Post by Giuliano Augusto Faulin Belinassi
Post by Richard Biener
Post by Giuliano Augusto Faulin Belinassi
Additionally, I know that GCC must not
change the project layout, but from the software engineering perspective,
this may be a bad smell that indicates that the file should be broken
into smaller files. Finally, the Makefiles will take care of the
parallelization task.
What do you mean by GCC must not change the project layout? GCC
happily re-orders functions and link-time optimization will reorder
TUs (well, linking may as well).
I think this is in response to a comment I made on IRC. Giuliano said
that if a project has a very large file that dominates the total build
time, the file should be split up into smaller pieces. I said "GCC
can't restructure people's code; it can only try to compile it
faster". We weren't referring to code transformations in the compiler
like re-ordering functions, but physically refactoring the source
code.
Yes. But from one of the attachments to PR84402, it seems that such
files exist in GCC itself:
https://gcc.gnu.org/bugzilla/attachment.cgi?id=43440
Post by Richard Biener
Post by Giuliano Augusto Faulin Belinassi
1. Is there any project whose compilation would be significantly improved
if GCC ran in parallel? Does anyone have data about anything related
to that? How about the Linux kernel? If not, I can try to gather some.
We do not have any data about this apart from experiments with
splitting up source files for PR84402.
Post by Giuliano Augusto Faulin Belinassi
2. Did I correctly understand the goal of the parallelization? Can
anyone provide extra details to me?
You may want to search the mailing list archives since we had a
student application (later revoked) for the task with some discussion.
In my view (I proposed the thing) the most interesting parts are
getting GCC's global state documented and reduced. The parallelization
itself is an interesting experiment but whether there will be any
substantial improvement for builds that can already benefit from make
parallelism remains a question.
While I agree that documenting GCC's global state is good for the
community and the development of GCC, I really don't think that is a
good motivation for parallelizing a compiler from a research standpoint.
True ;) Note that my suggestions to the other GSoC student were
purely based on where it's easiest to experiment with parallelization
and not where it would be most beneficial.
Post by Giuliano Augusto Faulin Belinassi
There must be something or someone that could take advantage of the
fine-grained parallelism. But that data from PR84402 seems to have the
answer to it. :-)
Post by Szabolcs Nagy
Post by Richard Biener
In my view (I proposed the thing) the most interesting parts are
getting GCCs global state documented and reduced. The parallelization
itself is an interesting experiment but whether there will be any
substantial improvement for builds that can already benefit from make
parallelism remains a question.
in the common case (project with many small files, much more than
core count) i'd expect a regression:
if gcc itself tries to parallelize that introduces inter thread
synchronization and potential false sharing in gcc (e.g. malloc
locks) that does not exist with make parallelism (glibc can avoid
some atomic instructions when a process is single threaded).
That is what I am mostly worried about. Or that the most costly part is
not parallelizable at all. Also, I would expect a regression on very
small files, which could probably be avoided by implementing this
feature behind a flag?
I think the issue should be avoided by avoiding fine-grained parallelism,
which might be somewhat hard given there are core data structures that
are shared (the memory allocator for a start).

The other issue I am more worried about is that we probably have to
interact with make somehow so that we do not end up with 64 threads
when one does -j8 on an 8-core machine. That's basically the same
issue we run into with -flto and its threaded WPA writeout or recursive
invocation of make.
Post by Giuliano Augusto Faulin Belinassi
Post by Martin Jambor
Hi Giuliano,
Post by Richard Biener
You may want to search the mailing list archives since we had a
student application (later revoked) for the task with some discussion.
Specifically, the whole thread beginning with
https://gcc.gnu.org/ml/gcc/2018-03/msg00179.html
Martin
Yes, I will research this carefully ;-)
Thank you