Apache Flink 学习:Flink Job ExecutionGraph生成过程

Apache Flink 学习:Flink Job 执行计划生成过程

Transformations

TransformationClasses

并不是每一个 Transformation 都会转换成runtime层中的物理操作。有一些只是逻辑概念,比如union、split/select、partition等。如下图所示的转换树,在运行时会优化成下方的操作图。

Transformations

执行计划转换过程

4层转换

1.转换过程 StreamExecutionEnvironment 存放的 transformation -> StreamGraph -> JobGraph -> ExecutionGraph -> 物理执行图
2.StreamExecutionEnvironment 存放的 transformation -> StreamGraph -> JobGraph 在客户端完成,然后提交 JobGraph 到 JobManager
3.JobManager 的主节点 JobMaster,将 JobGraph 转化为 ExecutionGraph,然后发送到不同的 taskManager,得到实际的物理执行图

LocalStreamEnvironment 中 parallelism

其中 LocalStreamEnvironment Task 中的 parallelism 数量是根据以下代码生成的

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
@Public
public abstract class StreamExecutionEnvironment {
...

private static int defaultLocalParallelism = Runtime.getRuntime().availableProcessors();
/**
* Creates a {@link LocalStreamEnvironment}. The local execution environment
* will run the program in a multi-threaded fashion in the same JVM as the
* environment was created in. The default parallelism of the local
* environment is the number of hardware contexts (CPU cores / threads),
* unless it was specified differently by {@link #setParallelism(int)}.
*
* @return A local execution environment.
*/
public static LocalStreamEnvironment createLocalEnvironment() {
return createLocalEnvironment(defaultLocalParallelism);
}

/**
* Creates a {@link LocalStreamEnvironment}. The local execution environment
* will run the program in a multi-threaded fashion in the same JVM as the
* environment was created in. It will use the parallelism specified in the
* parameter.
*
* @param parallelism
* The parallelism for the local environment.
* @return A local execution environment with the specified parallelism.
*/
public static LocalStreamEnvironment createLocalEnvironment(int parallelism) {
return createLocalEnvironment(parallelism, new Configuration());
}

...
}