aliworkslower shows work items that experience high latency between
workqueue_queue_work (the work is queued) and workqueue_execute_start
(its callback is about to run).

Users can specify a threshold. Whenever the latency between
workqueue_queue_work and workqueue_execute_start exceeds this
threshold, information about the work item is printed.

For example,

# ./aliworkslower 15
Tracing high work sched latency... Hit Ctrl-C to end.
Worker Thread                 Work    Total Q2S latency (us)          Q2A latency (us)          A2S latency (us)
     19783        vmstat_update                        36                         0                        35
     29195        vmstat_update                        16                         0                        15

- column "Worker Thread": PID of the worker thread
- column "Work": work callback
- column "Total Q2S latency": latency between workqueue_queue_work (Q)
  and workqueue_execute_start (S), in us
- column "Q2A latency": latency between workqueue_queue_work (Q)
  and workqueue_activate_work (A), in us
- column "A2S latency": latency between workqueue_activate_work (A)
  and workqueue_execute_start (S), in us
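
The relationship between the three latency columns and the threshold can
be sketched as follows. This is only an illustration, not the tool's
actual code: the function names are hypothetical, and it assumes that
each event carries a nanosecond timestamp and that every column is
truncated to whole microseconds on its own.

```python
# Hypothetical sketch: names and nanosecond timestamps are assumptions,
# not taken from the aliworkslower source.

DEFAULT_THRESHOLD_US = 10000  # default threshold shown in the help text


def latency_breakdown(queue_ns, activate_ns, start_ns):
    """Return (q2s, q2a, a2s) in whole microseconds for one work item."""
    q2s = (start_ns - queue_ns) // 1000      # queue    -> execute start
    q2a = (activate_ns - queue_ns) // 1000   # queue    -> activate
    a2s = (start_ns - activate_ns) // 1000   # activate -> execute start
    return q2s, q2a, a2s


def exceeds_threshold(q2s_us, threshold_us=DEFAULT_THRESHOLD_US):
    """True when the total Q2S latency is strictly above the threshold."""
    return q2s_us > threshold_us


# Made-up timestamps for a single work item, checked against a 15 us threshold.
q2s, q2a, a2s = latency_breakdown(queue_ns=0, activate_ns=900, start_ns=36_500)
if exceeds_threshold(q2s, threshold_us=15):
    print(f"{q2s:>10} {q2a:>10} {a2s:>10}")
```

With these made-up timestamps the breakdown is 36, 0 and 35 us. Under
the truncation assumption above, Q2A + A2S can come out one microsecond
lower than the total Q2S, which would be consistent with the sample
rows shown earlier.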


# ./aliworkslower --help
usage: aliworkslower.py [-h] [threshold]

Trace high work sched latency

positional arguments:
  threshold   threshold of Q2S latency, in microsecond

optional arguments:
  -h, --help  show this help message and exit

examples:
    ./aliworkslower            # trace work sched latency higher than 10000 us (default)
    ./aliworkslower 1000       # trace work sched latency higher than  1000 us
