New job scheduler

Today there are five different tasks, and most of them have specific needs. This was not the case two years ago, when the initial version of the task scheduler was defined.

Different kinds of needs

Network inventory

The agent needs to receive a range of IP addresses to scan. This range depends on the networks the agent is connected to, so the server must be able to associate each agent with the correct network.

The work is shared among a list of agents, selected either manually or automatically.
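The association between agents and networks described above can be sketched with Python's standard `ipaddress` module. This is a minimal illustration under stated assumptions. The server is assumed to know each agent's IP address and to hold a list of configured networks in CIDR notation. "Automatic selection" is assumed to mean "every agent whose address lies inside the target network". Both function names are hypothetical.

```python
"""Hypothetical sketch: associate agents with the networks they can scan.

Assumption: the server knows each agent's IP address and a list of
configured networks (CIDR strings). An agent is associated with every
configured network that contains its address.
"""
import ipaddress


def networks_for_agent(agent_ip, networks):
    """Return the configured networks that contain the agent's address."""
    addr = ipaddress.ip_address(agent_ip)
    return [net for net in networks if addr in ipaddress.ip_network(net)]


def agents_for_network(network, agents):
    """Automatic selection (assumed rule): agents located inside `network`."""
    net = ipaddress.ip_network(network)
    return sorted(name for name, ip in agents.items()
                  if ipaddress.ip_address(ip) in net)


if __name__ == "__main__":
    networks = ["192.168.0.0/24", "10.0.0.0/8"]
    agents = {"agent-a": "192.168.0.10", "agent-b": "192.168.0.20",
              "agent-c": "10.1.2.3"}
    print(networks_for_agent("10.1.2.3", networks))      # ['10.0.0.0/8']
    print(agents_for_network("192.168.0.0/24", agents))  # ['agent-a', 'agent-b']
```

A manual selection would simply bypass `agents_for_network` and use the list chosen by the user.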

Inventory of network devices

This task has the same needs.

ESX inventory

The OCS legacy

The OCS protocol has some serious drawbacks, which is why we are moving away from it. These drawbacks also explain some old architectural decisions.

  • The agent gets the work to be done only once, even if that work will take hours.
  • The server used to send huge XML files even though only a tiny part was actually needed for the coming time frame.

The proposal

The server gives the agent only part of a large job. Every time an agent requests a job, the server shifts one job off a large stack.
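The splitting described above can be sketched as follows. This is a minimal Python illustration, not the plugin's code. It assumes the large job is "scan this network" given in CIDR form (a /24 here), and that one small job covers a single host address. "Shifting" a job is modelled with `deque.popleft()`. The function names are hypothetical.

```python
"""Hypothetical sketch: split one large job into small per-address jobs.

Assumptions: the network is given in CIDR notation and one job covers one
host address. Each agent request shifts the next job off the front.
"""
import ipaddress
from collections import deque


def build_job_stack(network):
    """One job per usable host address, in ascending order."""
    return deque(str(host) for host in ipaddress.ip_network(network).hosts())


def next_job(stack):
    """Shift the next job off the stack, or return None when it is empty."""
    return stack.popleft() if stack else None


if __name__ == "__main__":
    stack = build_job_stack("192.168.0.0/24")
    print(len(stack))       # 254 (network and broadcast addresses excluded)
    print(next_job(stack))  # 192.168.0.1
    print(next_job(stack))  # 192.168.0.2
```

Because the agent only receives one address at a time, the server never has to send a large description of the whole network up front.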

Example: the server wants the network 192.168.0.0 to be scanned:
  • Agent A asks Server for a job.
    • Server gives 192.168.0.1 to scan and sets Agent A as the owner.
  • Agent B asks Server for a job.
    • Server gives 192.168.0.2 to scan and sets Agent B as the owner.
  • Agent A sends back information to Server.
    • Server deletes the associated job from the stack.
  • Agent A asks Server for a job.
    • Server gives 192.168.0.3 to scan and sets Agent A as the owner.
  • Agent B is disconnected from the network.
    • Its job fails, but it has no way to update Server, so it saves the last job status until it can reconnect.
  • Agent A sends back information to Server.
    • Server deletes the associated job from the stack.
    • Server times out the 192.168.0.2 job, because Agent B never reported back, and regenerates it.
  • Agent A asks Server for a job.
    • Server gives 192.168.0.2 to scan and sets Agent A as the owner.
  • Agent B reconnects to the network and sends Server the status of its 192.168.0.2 job.
    • Server updates the job associated with 192.168.0.2 with Agent B's status and deletes the job.
  • Agent C asks Server for a job.
    • Server gives 192.168.0.4 to scan and sets Agent C as the owner.
  • Agent B asks Server for a job.
    • Server gives 192.168.0.5 to scan and sets Agent B as the owner.
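The scenario above can be replayed with a small job server that tracks owners and timeouts. This is a hypothetical Python sketch, not the actual FusionInventory implementation. Time is assumed to be a plain integer clock passed in by the caller, and every job is assumed to share the same timeout. A timed-out job is assumed to be regenerated on top of the stack so that it is handed out next. A late report from a former owner is accepted and deletes the job, as in the example.

```python
"""Hypothetical sketch of a job stack with owners and timeouts.

Assumptions: an integer clock supplied by the caller, one timeout for all
jobs, and timed-out jobs regenerated on top of the stack.
"""
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    target: str                  # e.g. the IP address to scan
    owner: Optional[str] = None  # agent currently running the job
    started: int = 0             # clock value when the owner received it


class JobServer:
    def __init__(self, targets, timeout):
        self.stack = deque(Job(t) for t in targets)  # jobs not handed out yet
        self.running = {}                            # target -> owned Job
        self.results = {}                            # target -> (agent, status)
        self.timeout = timeout

    def expire(self, now):
        """Regenerate every running job whose owner missed the timeout."""
        for target, job in list(self.running.items()):
            if now - job.started >= self.timeout:
                del self.running[target]
                self.stack.appendleft(Job(target))  # fresh job, no owner

    def request(self, agent, now):
        """Shift the next job off the stack and record the agent as owner."""
        self.expire(now)
        if not self.stack:
            return None
        job = self.stack.popleft()
        job.owner, job.started = agent, now
        self.running[job.target] = job
        return job.target

    def report(self, agent, target, status, now):
        """Store an agent's status and delete the running job for `target`.

        The report is accepted even from an agent that is no longer the
        owner, e.g. one that reconnects after its job timed out.
        """
        job = self.running.pop(target, None)
        if job is not None:
            self.results[target] = (agent, status)
        self.expire(now)
        return job is not None


if __name__ == "__main__":
    server = JobServer([f"192.168.0.{i}" for i in range(1, 6)], timeout=3)
    print(server.request("A", now=0))                   # 192.168.0.1
    print(server.request("B", now=0))                   # 192.168.0.2
    server.report("A", "192.168.0.1", "done", now=1)
    print(server.request("A", now=1))                   # 192.168.0.3
    # Agent B is disconnected and never reports 192.168.0.2 in time.
    server.report("A", "192.168.0.3", "done", now=3)    # .2 times out here
    print(server.request("A", now=3))                   # 192.168.0.2
    server.report("B", "192.168.0.2", "error", now=4)   # B's late status
    print(server.request("C", now=4))                   # 192.168.0.4
    print(server.request("B", now=4))                   # 192.168.0.5
```

Checking timeouts lazily on every request or report avoids a separate timer. The trade-off is that a dead owner is only detected when another agent talks to the server, which matches the example, where the timeout is noticed when Agent A reports.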

Changes

Each task should support the REST/JSON interface so that an agent can request part of a job.
  • A specific task must inherit from the FusioninventoryTask class and must be uniquely identified (e.g. deploy, snmp, esx...).
  • The agent must request a job from the task manager (e.g. FusioninventoryTaskManager). The task manager will answer with a generated JSON file associated with that agent.
  • The task manager will save the job in a JSON file when a task is scheduled or when its execution is forced. If a task is modified afterwards, the previously scheduled job will not be modified.
  • A job must have an owner to be considered as currently running.
  • A job may have a timeout. If the owner cannot update the status before this timeout, the job will be considered failed, and the task manager will regenerate another job according to the next scheduled date, the periodicity, or the number of retries.
  • A task does only one job.
  • A task must generate only one job per agent.
  • A user should be able to force a task to run (i.e. put a job on top of the job stack and wake up the concerned agent).
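Several of the rules listed above can be illustrated with a small Python model. These are the unique task identifiers, the one job per agent limit, the snapshot of a scheduled job, the owner marking a job as running, and forced jobs on top of the stack. Only the names FusioninventoryTask and FusioninventoryTaskManager come from this design. The methods, the JSON fields, the `SnmpTask` example and its `ip_range` parameter, and the timeout unit are hypothetical. Waking up the agent on a forced run is left out.

```python
"""Hypothetical Python sketch of the task / task-manager rules.

Only the class names FusioninventoryTask and FusioninventoryTaskManager come
from the design; methods, JSON fields and parameters are illustrative.
Waking up the agent on a forced run is not modelled.
"""
import copy
import json
from abc import ABC, abstractmethod


class FusioninventoryTask(ABC):
    """Base class every specific task inherits from."""
    name = None  # unique identifier, e.g. "deploy", "snmp", "esx"

    @abstractmethod
    def job_params(self, agent):
        """Return this task's job parameters for one agent."""


class SnmpTask(FusioninventoryTask):
    name = "snmp"

    def __init__(self, ip_range):
        self.ip_range = ip_range  # hypothetical parameter

    def job_params(self, agent):
        return {"ip_range": self.ip_range}


class FusioninventoryTaskManager:
    def __init__(self, timeout=3600):
        self.tasks = {}         # task identifier -> task instance
        self.stack = []         # scheduled jobs; index 0 is the top
        self.timeout = timeout  # seconds (assumed unit)

    def register(self, task):
        if task.name in self.tasks:
            raise ValueError(f"duplicate task identifier: {task.name}")
        self.tasks[task.name] = task

    def schedule(self, task_name, agent, force=False):
        """Save one job for (task, agent); a forced job goes on top."""
        for job in self.stack:
            if job["task"] == task_name and job["agent"] == agent:
                return None  # only one job per agent for a given task
        params = self.tasks[task_name].job_params(agent)
        job = {
            "task": task_name,
            "agent": agent,
            "owner": None,
            "timeout": self.timeout,
            # Snapshot: editing the task later does not change this job.
            "params": copy.deepcopy(params),
        }
        if force:
            self.stack.insert(0, job)
        else:
            self.stack.append(job)
        return job

    def request(self, agent):
        """Answer an agent's request with its next job serialized as JSON."""
        for job in self.stack:
            if job["agent"] == agent and job["owner"] is None:
                job["owner"] = agent  # an owned job counts as running
                return json.dumps(job)
        return None


if __name__ == "__main__":
    snmp = SnmpTask("192.168.0.0/24")
    manager = FusioninventoryTaskManager(timeout=600)
    manager.register(snmp)
    manager.schedule("snmp", "agent-a")
    snmp.ip_range = "10.0.0.0/8"  # task modified after scheduling
    manager.schedule("snmp", "agent-b", force=True)
    # The returned JSON still carries the original 192.168.0.0/24 range.
    print(manager.request("agent-a"))
```

Copying the parameters when the job is scheduled is one simple way to honour the rule that modifying a task leaves already-scheduled jobs untouched.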

Server side

Each task needs to

Warnings

We will have various interfaces through which the agent contacts the server. We must ensure that each of them correctly protects the data.