Overview
Tracing levels
Documentum job scheduler
Xense Session Manager
 
 
 
 
 
 
 
 
Job scheduler
architecture
Tracing
Job selection query
Tracing job
scheduling
Summary
 
 
 
Documentum job scheduler
Tracing job scheduling

The following trace extract is an abbreviated trace for the execution of a single job:

1 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] do_exec: execquery,s0,F,SELECT ALL r_object_id, a_next_invocation FROM dm_job WHERE ( (run_now = 1) OR ((is_inactive = 0) AND ( ( a_next_invocation <= DATE('now') AND a_next_invocation IS NOT NULLDATE ) OR ( a_next_continuation <= DATE('now') AND a_next_continuation IS NOT NULLDATE ) ) AND ((expiration_date > DATE('now')) OR (expiration_date IS NULLDATE)) AND ((max_iterations = 0) OR (a_iterations < max_iterations))) ) AND (i_is_reference = 0 OR i_is_reference is NULL) AND (i_is_replica = 0 OR i_is_replica is NULL) ORDER BY a_next_invocation, r_object_id

2 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] do_get: getlastcoll,s0

3 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] do_next: next,s0,q0

4 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] do_get: get,s0,q0,r_object_id

5 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] do_get_attr (result): 08053cc280000208

6 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] do_exec: fetch,s0,08053cc280000208,dm_job

7 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] do_get: get,s0,08053cc280000208,object_name

8 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] do_get_attr (result): dm_FulltextMgr

9 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] Processing Job: dm_FulltextMgr
<retrieve attributes method_name,max_iterations,run_interval,run_mode,etc.>

10 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] do_get: get,s0,08053cc280000208,target_server

11 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] do_get_attr (result): dm2.dm2@xenselap1

12 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] do_get: get,s0,serverconfig,object_name

13 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] do_get_attr (result): dm2

14 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] do_get: get,s0,serverconfig,r_host_name

15 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] do_get_attr (result): xenselap1

16 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] check_target_server: dm2.dm2@xenselap1

17 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] current_docbase: dm2

18 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] current_config: dm2

19 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] current_machine: xenselap1

20 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] do_get: get,s0,08053cc280000208,r_lock_owner

21 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] do_get_attr (result):

22 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] do_get: checkout,s0,08053cc280000208,,,agentexec
<calculate job output file name>

23 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] Output File Name: G:\Documentum\dba\log\00053cc2\agentexec\job_08053cc280000208

24 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] File is not locked.
<a_next_invocation calculation>

25 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] do_set: set,s0,08053cc280000208,a_next_invocation

26 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] do_set (value): Wed Dec 29 15:55:00 2004

27 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] do_set: set,s0,08053cc280000208,a_next_continuation

28 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] do_set (value): nulldate

29 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] do_exec: save,s0,08053cc280000208,T

30 Wed Dec 29 15:54:01 2004 [AGENTEXEC 1160] start_method_launcher

31 Wed Dec 29 15:54:01 2004 [LAUNCHER 808] set_trace_header

32 Wed Dec 29 15:54:01 2004 [LAUNCHER 808] do_launcher_main

33 Wed Dec 29 15:54:01 2004 [LAUNCHER 808] do_get: connect,dm2.dm2,dmadmin,'',,,try_native_first

34 Wed Dec 29 15:54:01 2004 [LAUNCHER 808] do_get: get,s0,serverconfig,object_name

35 Wed Dec 29 15:54:01 2004 [LAUNCHER 808] do_get_attr (result): dm2

36 Wed Dec 29 15:54:01 2004 [LAUNCHER 808] add_msg_param: dm2

37 Wed Dec 29 15:54:01 2004 [LAUNCHER 808] add_msg_param: Wed Dec 29 15:54:01 2004 [INFORMATION] [LAUNCHER 808] Detected while preparing job dm_FulltextMgr for execution: Agent Exec connected to server dm2:

38 Wed Dec 29 15:54:01 2004 [AGENTEXEC 1160] ARG[0] = dm_agent_exec.exe

39 Wed Dec 29 15:54:01 2004 [AGENTEXEC 1160] CreateProcess dm_agent_exec.exe

40 Wed Dec 29 15:54:01 2004 [AGENTEXEC 1160] ARG[1] = -docbase_name dm2.dm2

41 Wed Dec 29 15:54:01 2004 [AGENTEXEC 1160] ARG[2] = -docbase_owner dmadmin

42 Wed Dec 29 15:54:01 2004 [AGENTEXEC 1160] ARG[3] = -trace_level 1

43 Wed Dec 29 15:54:01 2004 [AGENTEXEC 1160] ARG[4] = -job_id 08053cc280000208

44 Wed Dec 29 15:54:01 2004 [AGENTEXEC 1160] ARG[5] = -parent_handle 548

45 Wed Dec 29 15:54:01 2004 [AGENTEXEC 1160] do_sleep: 30

Lines 1 to 3 are the job scheduler DQL query again but this time round we have at least one job to run since lines 4 and 5 show that we have retrieved the r_object_id of a runnable job, in fact it is the dm_FulltextMgr job as shown in lines 7,8 and 9.



I have omitted a whole section of trace showing the attributes on the dm_job object being retrieved and validated, however I have chosen to retain lines 10 to 19. These lines show the checks that the job scheduler makes to ensure that the target_server attribute on the retrieved job object is compatible with the server config and host machine for the current connection. This is probably the single most common cause for jobs not running particularly in a multi-Content Server environment where there will be multiple server configs/host machines. It is also a common problem when restoring a docbase from backup to a different machine - everything will work fine except jobs 'mysteriously' fail to run. Every dm_job object will have a target_server attribute that is incompatible with the current host machine context. Simply run some DQL after the restore to update the target_server attribute to its correct value.

Lines 30 to 44 show the creation of the process to run the job. During process creation dm_agent_exec forks a new dm_agent_exec process to handle the execution of the job's method. This happens even on Windows systems where the more usual software design is to use an additional thread in the same process. The new process connects to the docbase and then executes the method. Line 30 is the starting point for the process creation in the parent agentexec, lines 31 to 37 are the trace output from the child agentexec (notice the different text within the square brackets, the number is the OS ID for the process). When the method has been invoked the child agentexec remains in existence until the method completes and then it too exits, the parent agentexec sleeps for 30 seconds (line 45) and then continues processing the next job from the job scheduler query.

The next extract is a highly edited trace from a system where 4 jobs have been scheduled to run at the same time:

1 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] do_exec: execquery,s0,F,SELECT ALL r_object_id, a_next_invocation FROM dm_job WHERE ( (run_now = 1) OR ((is_inactive = 0) AND ( ( a_next_invocation <= DATE('now') AND a_next_invocation IS NOT NULLDATE ) OR ( a_next_continuation <= DATE('now') AND a_next_continuation IS NOT NULLDATE ) ) AND ((expiration_date > DATE('now')) OR (expiration_date IS NULLDATE)) AND ((max_iterations = 0) OR (a_iterations < max_iterations))) ) AND (i_is_reference = 0 OR i_is_reference is NULL) AND (i_is_replica = 0 OR i_is_replica is NULL) ORDER BY a_next_invocation, r_object_id

2 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] do_get: getlastcoll,s0

3 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] do_next: next,s0,q0

4 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] do_get: get,s0,q0,r_object_id}

5 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] do_get_attr (result): 08053cc280000208

6 Wed Dec 29 15:54:00 2004 [AGENTEXEC 1160] Processing Job: dm_FulltextMgr

7 Wed Dec 29 15:54:01 2004 [AGENTEXEC 1160] CreateProcess dm_agent_exec.exe

8 Wed Dec 29 15:54:01 2004 [AGENTEXEC 1160] do_sleep: 30

9 Wed Dec 29 15:54:31 2004 [AGENTEXEC 1160] do_next: next,s0,q0
Wed Dec 29 15:54:31 2004 [AGENTEXEC 1160] do_get: get,s0,q0,r_object_id

10 Wed Dec 29 15:54:31 2004 [AGENTEXEC 1160] do_get_attr (result): 08053cc28000022f

11 Wed Dec 29 15:54:31 2004 [AGENTEXEC 1160] Processing Job: dm_DataDictionaryPublisher

12 Wed Dec 29 15:54:31 2004 [AGENTEXEC 1160] CreateProcess dm_agent_exec.exe

13 Wed Dec 29 15:54:31 2004 [AGENTEXEC 1160] do_sleep: 30

14 Wed Dec 29 15:55:01 2004 [AGENTEXEC 1160] do_next: next,s0,q0

15 Wed Dec 29 15:55:01 2004 [AGENTEXEC 1160] do_get: get,s0,q0,r_object_id

16 Wed Dec 29 15:55:01 2004 [AGENTEXEC 1160] do_get_attr (result): 08053cc280000231

17 Wed Dec 29 15:55:01 2004 [AGENTEXEC 1160] Processing Job: dm_CleanFTIndex

18 Wed Dec 29 15:55:01 2004 [AGENTEXEC 1160] CreateProcess dm_agent_exec.exe

19
Wed Dec 29 15:55:01 2004 [AGENTEXEC 1160] do_sleep: 30

20 Wed Dec 29 15:55:31 2004 [AGENTEXEC 1160] do_exec: close,s0,q0

21 Wed Dec 29 15:55:31 2004 [AGENTEXEC 1160] do_sleep: 60

22 Wed Dec 29 15:56:31 2004 [AGENTEXEC 1160] do_exec: execquery,s0,F,SELECT ALL r_object_id, a_next_invocation FROM dm_job WHERE ( (run_now = 1) OR ((is_inactive = 0) AND ( ( a_next_invocation <= DATE('now') AND a_next_invocation IS NOT NULLDATE ) OR ( a_next_continuation <= DATE('now') AND a_next_continuation IS NOT NULLDATE ) ) AND ((expiration_date > DATE('now')) OR (expiration_date IS NULLDATE)) AND ((max_iterations = 0) OR (a_iterations < max_iterations))) ) AND (i_is_reference = 0 OR i_is_reference is NULL) AND (i_is_replica = 0 OR i_is_replica is NULL) ORDER BY a_next_invocation, r_object_id

23 Wed Dec 29 15:56:31 2004 [AGENTEXEC 1160] do_get: getlastcoll,s0

24 Wed Dec 29 15:56:31 2004 [AGENTEXEC 1160] do_next: next,s0,q0

25 Wed Dec 29 15:56:31 2004 [AGENTEXEC 1160] do_get: get,s0,q0,r_object_id

26 Wed Dec 29 15:56:31 2004 [AGENTEXEC 1160] do_get_attr (result): 08053cc280000210

27 Wed Dec 29 15:56:31 2004 [AGENTEXEC 1160] Processing Job: dm_StateOfDocbase

28 Wed Dec 29 15:56:31 2004 [AGENTEXEC 1160] CreateProcess dm_agent_exec.exe

29 Wed Dec 29 15:56:31 2004 [AGENTEXEC 1160] do_sleep: 30

30 Wed Dec 29 15:57:01 2004 [AGENTEXEC 1160] do_next: next,s0,q0

31 Wed Dec 29 15:57:01 2004 [AGENTEXEC 1160] do_exec: close,s0,q0

32 Wed Dec 29 15:57:01 2004 [AGENTEXEC 1160] do_sleep: 60



Line 1 shows the execution of the job scheduler query, lines 3 to 7 show the selection and execution of a job (dm_FulltextMgr) followed by the 30 second sleep (line 8). Lines 9-13 (notice the timestamps have advanced by 30 seconds) show the selection and execution of the second job (dm_DataDictionaryPublisher) and a further 30 second sleep. The 3rd job (dm_CleanFTIndex) is executed in lines 14-19. However when agentexec wakes up from the 3rd 30 second sleep the query collection is closed (line 20), even though we know there is a 4th job to run! What is going on?

Recall from the Content Server Administrators guide there is a parameter -max_concurrent_jobs that can be set for agentexec. The default is 3, which means agentexec runs the job scheduler query and executes up to 3 jobs before going to sleep and then starting over. We can see this in lines 21 and 22 - agentexec sleeps for 60 seconds then runs the job scheduler query and executes the 4th job (dm_StateOfDocbase) in lines 23 to 28.

Note that the sleep period between query executions is 60 seconds by default and is controllable by the -override_sleep_duration agentexec parameter. The sleep period between processing of rows of the job scheduler query is 30 seconds, this duration cannot be altered.

This leads to 2 observations. First, there is a commonly held belief that the 'job scheduler polls every 5 minutes'. I have heard this during numerous site visits and have no idea where it originates from. Clearly this is not the case, if there are no other jobs executing the worse case latency is 60 seconds (plus a few seconds for startup time) when the job scheduler has just gone to sleep after processing the job scheduler query. I guess it probably seems like a long time when you are waiting for the job to start.

Secondly, even if you reduce the inter-query sleep time using -override_sleep_duration parameter there is a built in 30 second latency between job executions. Don't count on application designs that involve using large numbers of frequently executing jobs. next