P
US11768754B2ActiveUtilityPatentIndex 70

Parallel program scalability bottleneck detection method and computing device

Assignee: UNIV TSINGHUAPriority: Aug 27, 2020Filed: Aug 27, 2020Granted: Sep 26, 2023
Est. expiryAug 27, 2040(~14.1 yrs left)· nominal 20-yr term from priority
Inventors:ZHAI JIDONGJIN YUYANGCHEN WENGUANGZHENG WEIMIN
G06F 11/3466G06F 8/433G06F 11/36Y02D10/00G06F 2201/865G06F 11/3404G06F 11/3409
70
PatentIndex Score
4
Cited by
14
References
16
Claims

Abstract

A computer executed parallel program scalability bottleneck detection method is provided, which includes: building a program structure graph for a program source code; collecting performance data based on a sampling technique during runtime; the performance data including: performance data of each vertex of the program structure graph and inter-process communication dependence of communication vertices; building a program performance graph by filling the program structure graph with the collected performance data, the program performance graph recording data and control dependence of each process as well as inter-process communication dependence; detecting problematic vertices from the program performance graph, and starting from some or all of the problematic vertices, backtracking through data/control dependence edges within a process and communication dependence edges between different processes, to detect scalability bottleneck vertices.

Claims

exact text as granted — not AI-modified
The invention claimed is: 
     
       1. A computer executed parallel program scalability bottleneck detection method, which is used for detecting scalability bottlenecks of a parallel program, the method comprising:
 building, by a processor, a program structure graph for a program source code; 
 collecting, by the processor, performance data based on a sampling technique during runtime, the performance data comprises performance data of each vertex of the program structure graph and inter-process communication dependence of communication vertices; 
 building, by the processor, a program performance graph by filling the program structure graph with the collected performance data, the program performance graph recording data and control dependence edges of each process as well as inter-process communication dependence; 
 detecting, by the processor, problematic vertices from the program performance graph, the detecting problematic vertices from the program performance graph comprises detecting non-scalable vertices and abnormal vertices, wherein a non-scalable vertex refers to a vertex whose performance-process count curve does not reach a pre-defined performance growth standard when a process count increases, and an abnormal vertex refers to a vertex whose difference from other vertices is greater than a pre-defined threshold during comparison of performance data of a same vertex between different processes; and 
 starting, by the processor, from some or all of the problematic vertices, backtracking through data and control dependence edges within a process and communication dependence edges between different processes, to detect scalability bottleneck vertices. 
 
     
     
       2. The parallel program scalability bottleneck detection method according to  claim 1 , wherein, the pre-defined performance growth standard is an average performance growth level of all vertices. 
     
     
       3. The parallel program scalability bottleneck detecting method according to  claim 1 , wherein, the backtracking comprises:
 reversing all edges in the program performance graph; 
 backtracking through data and control dependence edges within a process and communication dependence edges between different processes, until root vertices or collective communication vertices are accessed; and 
 taking vertices on backtracking paths as candidates for scalability bottleneck vertices. 
 
     
     
       4. The parallel program scalability bottleneck detection method according to  claim 1 , further comprising:
 only preserving the communication dependence edge if a waiting event exists, and pruning other communication dependence edges, to reduce searching space for backtracking. 
 
     
     
       5. The parallel program scalability bottleneck detection method according to  claim 1 , wherein, the building the program structure graph comprises:
 obtaining a preliminary program structure graph at compile time; and 
 contracting the program structure graph, comprising removing edges that meet pre-defined criteria in the program structure graph and merging a plurality of vertices into one vertex. 
 
     
     
       6. The parallel program scalability bottleneck detection method according to  claim 5 , wherein, the removing edges that meet pre-defined criteria in the program structure graph comprises:
 only preserving a structure comprising Message Passing Interface (MPI) invocations and loops; and 
 removing a loop vertex whose nested depth exceeds the pre-defined threshold. 
 
     
     
       7. The parallel program scalability bottleneck detection method according to  claim 5 , wherein, the merging a plurality of vertices into one vertex comprises:
 merging continuous vertices with tiny workload into a larger vertex, for computation vertices in the program structure graph. 
 
     
     
       8. The parallel program scalability bottleneck detection method according to  claim 1 , wherein, the collecting performance data based on sampling comprises:
 collecting hardware counter performance data of each vertex of the program structure graph during execution, a hardware counter interface being configured to sample and collect hardware performance data, the program being interrupted at regular clock cycles and program call stack and related performance data being recorded; and 
 associating performance data with a corresponding program structure graph vertex, according to the program call stack information. 
 
     
     
       9. The parallel program scalability bottleneck detection method according to  claim 1 , wherein, the collecting inter-process communication dependence performance data, comprises:
 sampling-based instrumentation, inserted code that collects performance data being executed based on sampling. 
 
     
     
       10. The parallel program scalability bottleneck detection method according to  claim 9 , wherein, the sampling-based instrumentation comprises:
 executing statement that generates a random number at a beginning of the inserted code; 
 generating one random number every time the inserted code is executed, judging whether the random number generated every time the inserted code is executed falls into an interval of a pre-defined threshold; and 
 collecting performance data only when the random number generated every time the inserted code is executed falls into the interval of the pre-defined threshold. 
 
     
     
       11. The parallel program scalability bottleneck detection method according to  claim 1 , the collecting inter-process communication dependence performance data, comprises:
 graph-guided communication compression, communication operation parameters being only recorded once for repeated communications with same parameters, and 
 communication operations on a same group of program structure graph vertices for different time steps being merged together. 
 
     
     
       12. The parallel program scalability bottleneck detection method according to  claim 11 , wherein, the collecting performance data further comprises:
 collecting calling information before an entry and an exit of indirect calls, linking the calling information with real function calls with unique function IDs, and then refining the program structure graph obtained after an inter-procedural analysis. 
 
     
     
       13. The parallel program scalability bottleneck detection method according to  claim 1 , wherein, the collecting inter-process communication dependence performance data, comprises:
 (1) using MPI_Comm_get_info to acquire the information, for collective communication; 
 (2) recording a source or dest process and tag directly, for blocking point to point communication; and 
 (3) using status parameter of synchronous communication functions to identify communication dependence, for non-blocking communication. 
 
     
     
       14. The parallel program scalability bottleneck detection method according to  claim 3 , further comprising:
 obtaining one or more causal paths that connect a set of problematic vertices, as scalability bottleneck candidates. 
 
     
     
       15. The parallel program scalability bottleneck detection method according to  claim 1 , wherein, the performance data of each vertex of the program structure graph is hardware counter performance data. 
     
     
       16. A computing device, comprising a memory, wherein, the memory has computer executable instructions stored thereon, and the executable instructions execute the parallel program scalability bottleneck detection method of  claim 1  by the processor.

Cited by (0)

No later patents cite this yet.

References (0)

No backward citations on record.