MyGrid
Open Source Grid for distributed computing
Job Dependencies and distributed parallel execution
HOWTO: Create your own Job Processor.
Using Submit(Job) method of IJobProvider interface
What if you want the cluster to create Jobs from an XML file?
What if you want your own job feed, for example from the Sql database?
MyGrid wants to be what MySQL became for proprietary databases: an open source alternative you can use in your applications and infrastructure, extending the code and adding new features if you like. Like commercial grid products, MyGrid includes a powerful Web frontend that allows full management of the grid.
In order to install MyGrid, you need to install at least one Cluster, and one or more Engines. For the purpose of this QuickStart, everything can be installed on a single machine.
MyGrid.Cluster.Console.exe
You should see something like the following:
MyGrid.GUI.exe: the GUI lets you see what's going on in the grid in human-readable form, although you can also get a lot of information from the messages generated by the Engine and the Cluster.
MyGrid.Engine.Console.exe
Immediately, our little grid network, now consisting of 1 Engine, 1 Cluster and a GUI client, will come to life!
Submit a Job from GUI using Jobs Editor
MyGrid comes as a set of several parts:
At a very high level, it works like this: a Cluster starts up and asks its Feeders for job feeds. The Cluster then offers jobs to all Engines connected to it. Some Engines may be down or busy, or may not have a suitable Job Processor to handle the job, but some will say: "I can pick this job". The Cluster then allocates the job, assigning it to an Engine for execution. The Engine passes the job to its local Job Processor. When the job completes or fails, its status is signaled back to the Cluster, which then decides what to do next: re-submit the job for processing or process other jobs.
The picture below shows a very high-level view of the Grid. The Cluster (which may be hosted by a service, or embedded in your own application) is the analog of the DataSynapse GridServer. It receives jobs from the job Feeders. Two job feeders ship with MyGrid (MyGrid.Feeders.Xml and MyGrid.Feeders.FolderWatcher), but any developer can write their own feeders and simply copy them to the /feeders folder of the Cluster. A job is an abstract entity that can only be understood by a Job Processor, to which the Engine (the job broker) hands the job. One job processor ships with the Grid, the Batch processor (MyGrid.JobProcessors.Shell). You can write your own Job Processor very easily: inherit from the JobProcessor abstract class and copy the resulting assembly to the /processors folder of the Engine.
You can also add job processors on the fly, by simply copying them into the /processors folder of a running Engine. The Engine will identify those processors and add them.
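How the Engine identifies processors internally is not described here. As a rough sketch of one way such discovery could work, the code below uses reflection to find concrete classes derived from a base class in every .dll of a folder. JobProcessorBase, SampleProcessor and ProcessorScanner are hypothetical stand-ins, not MyGrid types, and picking up processors on the fly would additionally require re-scanning the folder periodically or on file events.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;

// Hypothetical stand-in for MyGrid's JobProcessor abstract class.
public abstract class JobProcessorBase { }

// Hypothetical concrete processor, present only so the scan has something to find.
public sealed class SampleProcessor : JobProcessorBase { }

public static class ProcessorScanner
{
    // Returns every concrete class in the assembly that derives from baseType.
    public static List<Type> FindProcessors(Assembly assembly, Type baseType)
    {
        var found = new List<Type>();
        foreach (Type type in assembly.GetTypes())
        {
            if (type.IsClass && !type.IsAbstract && baseType.IsAssignableFrom(type))
                found.Add(type);
        }
        return found;
    }

    // Loads each .dll in the folder and collects the processor types it contains.
    public static List<Type> ScanFolder(string folder, Type baseType)
    {
        var found = new List<Type>();
        foreach (string path in Directory.GetFiles(folder, "*.dll"))
        {
            try
            {
                found.AddRange(FindProcessors(Assembly.LoadFrom(path), baseType));
            }
            catch (BadImageFormatException)
            {
                // Not a managed assembly (for example a native helper .dll): skip it.
            }
        }
        return found;
    }
}
```

A discovered type could then be instantiated with Activator.CreateInstance(type) when a matching job arrives.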
MyGrid has an integrated job dependency mechanism that lets you declare dependencies between jobs. The following graph illustrates how MyGrid understands dependencies:
As you can see, during the lifetime of your distributed application or group of applications, a MyGrid cluster may have several input feeds. A feed is simply metadata describing jobs, which an object called a Feeder interprets in order to create the jobs and submit them to MyGrid. Within one feed, jobs refer to each other by name. This type of dependency can be described as 'dependency by name', or 'local feed dependency':
<Job>
  <Name>sleep</Name>
  <Depends>
    <Dependency jobName="readEmail" />
    <Dependency jobName="browseWeb" />
    <Dependency jobName="doSomeWork" />
  </Depends>
</Job>
Within a feed, jobs depend on each other by name, but they can also reference globally available jobs (jobs running within the distributed MyGrid network you are connected to) by id. The feed then looks more like this:
<Job>
  <Name>sleep</Name>
  <Depends>
    <Dependency jobName="readEmail" />
    <Dependency jobName="browseWeb" />
    <Dependency jobName="doSomeWork" jobId="cd0d89a0-6724-43cd-868f-8305df50ad" />
  </Depends>
</Job>
Why is this important?
Well, as a human, you need an easy way to create dependencies between your jobs, and names provide it. Job ids are only known at run time, so unless you are sure that an id exists, you shouldn't include it in a dependency. For dependencies by name, MyGrid generates the corresponding id dependencies automatically when the Cluster processes the feed.
The feed you submit may fail for 2 reasons:
· Circular references between jobs
· A global jobId that does not exist at the time the Feeder runs
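Both failure cases can be checked before any job is queued. The following is a minimal sketch of such a check, not MyGrid's actual implementation: FeedJob, FeedValidator and their members are hypothetical names, and a dependency name that does not occur in the feed is simply ignored here, which may differ from what MyGrid does.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for one <Job> entry of a feed; not a MyGrid type.
public sealed class FeedJob
{
    public string Name;
    public List<string> DependsOnNames = new List<string>(); // jobName="..." dependencies
    public List<string> DependsOnIds = new List<string>();   // jobId="..." dependencies
}

public static class FeedValidator
{
    // Returns null when the feed passes both checks, otherwise a description of the first problem.
    public static string Validate(IList<FeedJob> feed, ICollection<string> knownGlobalIds)
    {
        var byName = new Dictionary<string, FeedJob>();
        foreach (FeedJob job in feed) byName[job.Name] = job;

        // Check 1: every global jobId must already exist.
        foreach (FeedJob job in feed)
            foreach (string id in job.DependsOnIds)
                if (!knownGlobalIds.Contains(id))
                    return "Unknown global jobId: " + id;

        // Check 2: depth-first search for a cycle among name dependencies.
        // State per job: 0 = unvisited, 1 = on the current path, 2 = fully explored.
        var state = new Dictionary<string, int>();
        foreach (FeedJob job in feed)
            if (ReachesCycle(job.Name, byName, state))
                return "Circular reference reachable from job: " + job.Name;

        return null;
    }

    private static bool ReachesCycle(string name, Dictionary<string, FeedJob> byName, Dictionary<string, int> state)
    {
        int s;
        state.TryGetValue(name, out s);   // s stays 0 when the job has not been seen yet
        if (s == 1) return true;          // we came back to a job on the current path
        if (s == 2) return false;         // already explored, no cycle through it

        FeedJob job;
        if (!byName.TryGetValue(name, out job)) return false; // name not in this feed: ignored

        state[name] = 1;
        foreach (string dependency in job.DependsOnNames)
            if (ReachesCycle(dependency, byName, state)) return true;
        state[name] = 2;
        return false;
    }
}
```

With this sketch, a feed in which job1 depends on job2 and job2 depends on job1 is rejected, while the same feed without the second dependency is accepted.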
Let's summarize the roles of every component of the Grid:
Cluster
· Maintains the job lists
· Provides recovery and fault tolerance mechanisms
· Invokes the feeders, if any are present in the /feeders folder (loosely coupled)
· Communicates with the Engines
· Prioritizes Engines based on CPU and RAM usage
· Prioritizes Jobs based on their order
· Embeddable in own apps: yes
· Works as a service/daemon: yes
· XML configuration: yes
· Automation: yes (via Feeders and XML)
Engine
· Communicates with the Clusters
· Checks availability based on CPU and RAM usage
· Brokers jobs to the Job Processors
· Embeddable in own apps: yes
· Works as a service/daemon: yes
· XML configuration: yes
Job Processor
· Can execute one type of job
· Communicates the result to the Engine
Feeder
· Creates Jobs based on the job metadata and submits them to the Cluster
Web frontend
· Complete Web-based management application for MyGrid
· Runs on an embedded C#-based web server, as well as Mono's XSP, Cassini, Apache (with Cassini), Apache (with mod_mono) and IIS
In MyGrid, you typically install a Cluster as a service/daemon and configure its job feeds, then install Engines on several machines and run them as services/daemons as well. Finally, start the Cluster and the Engines. Engines provide full logging, using log4net.
Typical Job Feed XML:
<?xml version="1.0" encoding="utf-8"?>
<JobFeed xmlns="http://tempuri.org/JobFeed.xsd">
  <Feeder name="MyGrid.Feeders.Xml"> <!-- type name of the Feeder -->
    <Job>
      <Name>download</Name>
      <Processor>
        <Type>MyGrid.JobProcessors.Downloader</Type>
        <Assembly>MyGrid.JobProcessors.Downloader.dll</Assembly>
        <MaxCPU>20</MaxCPU>
        <MinRAM>100</MinRAM>
      </Processor>
      <Context>
        <ContextElement name="ProgressStep" value="5" />
        <ContextElement name="Files">
          <FilesCollection source="http://localhost/Grid.ppt" destination="resources/apps" />
        </ContextElement>
      </Context>
    </Job>
    <Job>
      <Name>job1</Name>
      <Processor>
        <Type>MyGrid.JobProcessors.Shell</Type> <!-- type name of the JobProcessor -->
        <Assembly>MyGrid.JobProcessors.Shell.dll</Assembly> <!-- assembly of the JobProcessor -->
        <MaxCPU>20</MaxCPU> <!-- optional: maximum CPU usage, under which the job will still run -->
        <MinRAM>100</MinRAM> <!-- optional: minimum RAM availability (in Mbytes), under which the job will still run -->
        <OS>Win32NT</OS>
        <!-- optional: if present, the job will run ONLY on listed engines (Regular Expression) -->
        <!-- <Engines>
          <EngineName>MACHINE01.*</EngineName>
          <EngineName>MACHINE02.*</EngineName>
        </Engines> -->
      </Processor>
      <Context>
        <ContextElement name="Command" value="TestShellProcess.exe" />
        <ContextElement name="Arguments" value="-a -b -c request.done" />
      </Context>
      <Depends>
        <Dependency jobName="download" /> <!-- jobId="aaa" will create a GLOBAL dependency: MUST BE VALID. THE WHOLE FEED WILL FAIL IF A DEPENDENCY IS INVALID -->
        <Dependency jobName="job2" />
        <Dependency jobName="job3" />
      </Depends>
    </Job>
    <Job>
      <Name>job2</Name>
      <Processor>
        <Type>MyGrid.JobProcessors.Shell</Type>
        <Assembly>MyGrid.JobProcessors.Shell.dll</Assembly>
        <MaxCPU>20</MaxCPU>
        <MinRAM>100</MinRAM>
      </Processor>
      <Context>
        <ContextElement name="Command" value="TestShellProcess.exe" />
        <ContextElement name="Arguments" value="-a -b -c request.done" />
      </Context>
      <Depends>
        <!-- <Dependency type="Job" jobName="job1" /> --> <!-- will create circular dependency!!! -->
        <Dependency jobName="job3" />
      </Depends>
    </Job>
    <Job>
      <Name>job3</Name>
      <Processor>
        <Type>MyGrid.JobProcessors.Shell</Type>
        <Assembly>MyGrid.JobProcessors.Shell.dll</Assembly>
      </Processor>
      <Context>
        <ContextElement name="Command" value="TestShellProcess.exe" />
        <ContextElement name="Arguments" value="-a -b -c request.done" />
        <ContextElement name="Time" value="10:00AM" />
      </Context>
    </Job>
    <Job>
      <Name>job4</Name>
      <Processor>
        <Type>MyGrid.JobProcessors.Shell</Type>
        <Assembly>MyGrid.JobProcessors.Shell.dll</Assembly>
      </Processor>
      <Context>
        <ContextElement name="Command" value="TestShellProcess.exe" />
        <ContextElement name="Arguments" value="-a -b -c request.done" />
      </Context>
    </Job>
  </Feeder>
  <Feeder name="MyGrid.Feeders.FolderWatcher">
    <Job>
      <Name>Incoming file</Name>
      <Processor>
        <Type>MyGrid.JobProcessors.Shell</Type>
        <Assembly>MyGrid.JobProcessors.Shell.dll</Assembly>
        <MaxCPU>20</MaxCPU>
        <MinRAM>100</MinRAM>
      </Processor>
      <Context>
        <ContextElement name="Command" value="TestShellProcess.exe" />
        <ContextElement name="Arguments" value="-a -b -c {Pattern}" />
      </Context>
    </Job>
    <FeedContext>
      <FeedContextElement name="Folder" value="c:\arc\tmp\tmp" />
      <FeedContextElement name="Pattern" value="request*.done" />
    </FeedContext>
  </Feeder>
</JobFeed>
This job feed tells the Cluster that there are 2 feeders: MyGrid.Feeders.Xml and MyGrid.Feeders.FolderWatcher. The Cluster will attempt to locate and invoke each Feeder, and pass the Job metadata to it. Each Feeder then instantiates Job objects and submits them into the Cluster queues. The behaviour of each Feeder is specific to that Feeder. For example, the FolderWatcher watches for events from the folder defined in FeedContextElement name="Folder" and creates a job each time an event arrives, i.e. Job creation is delayed until there are incoming files.
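To make the delayed job creation concrete, here is a small sketch built on the standard .NET FileSystemWatcher class. It is an illustration under assumptions, not the FolderWatcher source: FolderFeedSketch and the submit callback are hypothetical, and it assumes that the {Pattern} placeholder in "Arguments" is replaced by the name of the file that arrived.

```csharp
using System;
using System.IO;

public static class FolderFeedSketch
{
    // Assumption: {Pattern} in the argument template stands for the matched file name.
    public static string ExpandArguments(string template, string fileName)
    {
        return template.Replace("{Pattern}", fileName);
    }

    // Watches the folder for new files matching the pattern and calls submit once per file
    // with the expanded arguments. A real feeder would build a Job and submit it to the
    // Cluster at this point. Dispose the returned watcher to stop watching.
    public static FileSystemWatcher Watch(string folder, string pattern,
                                          string argumentTemplate, Action<string> submit)
    {
        var watcher = new FileSystemWatcher(folder, pattern);
        watcher.Created += (sender, e) => submit(ExpandArguments(argumentTemplate, e.Name));
        watcher.EnableRaisingEvents = true; // no events are raised until this is set
        return watcher;
    }
}
```

For the feed above, the call would be FolderFeedSketch.Watch(@"c:\arc\tmp\tmp", "request*.done", "-a -b -c {Pattern}", submitCallback), where submitCallback is whatever creates and submits the job.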
The <Job> element is specific to the <Processor> the job requires. A <Job> can only be processed if there is a matching Job Processor. For example, the processor for most of the jobs above is MyGrid.JobProcessors.Shell.
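The optional <MaxCPU>, <MinRAM>, <OS> and <Engines> elements of the feed further restrict where a job may run. The sketch below shows how such a check could look. All type and member names are hypothetical, and the exact comparison rules (inclusive limits, unanchored regular expressions, case-insensitive OS names) are assumptions rather than documented MyGrid behaviour.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical view of the optional elements inside <Processor>.
public sealed class ProcessorRequirements
{
    public int? MaxCpuPercent;                                // <MaxCPU>, null when absent
    public int? MinRamMb;                                     // <MinRAM>, null when absent
    public string Os;                                         // <OS>, null when absent
    public List<string> EnginePatterns = new List<string>();  // <EngineName> regular expressions
}

// Hypothetical snapshot of what an Engine reports about itself.
public sealed class EngineSnapshot
{
    public string Name;
    public string Os;
    public int CpuPercent;
    public int FreeRamMb;
}

public static class EngineMatcher
{
    // True when the engine satisfies every requirement that is present.
    public static bool CanRun(ProcessorRequirements req, EngineSnapshot engine)
    {
        if (req.MaxCpuPercent.HasValue && engine.CpuPercent > req.MaxCpuPercent.Value) return false;
        if (req.MinRamMb.HasValue && engine.FreeRamMb < req.MinRamMb.Value) return false;
        if (!string.IsNullOrEmpty(req.Os) &&
            !string.Equals(req.Os, engine.Os, StringComparison.OrdinalIgnoreCase)) return false;

        // No <Engines> list means any engine is acceptable.
        if (req.EnginePatterns.Count == 0) return true;
        foreach (string pattern in req.EnginePatterns)
            if (Regex.IsMatch(engine.Name, pattern)) return true;
        return false;
    }
}
```

With MaxCpuPercent = 20 and MinRamMb = 100, an engine at 10% CPU with 512 MB free qualifies, while the same engine at 35% CPU does not.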
MyGrid provides a very useful and simple job processor, called Shell. You can use this job processor and its code as a template for your own processors, but you may also simply reuse it, especially if your job is handled by an executable.
This job processor reads the "Command" and "Arguments" fields of the Job's Context and spawns the process specified in "Command" with those arguments. After the process exits, the Cluster is notified that the Job completed successfully if the return code is 0, and that the Job failed otherwise.
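The core of that behaviour can be sketched with the standard System.Diagnostics.Process class. This is an illustration, not the Shell processor's source: ShellRunner is a hypothetical helper, and the mapping of its result onto job statuses is left to the caller.

```csharp
using System.Diagnostics;

public static class ShellRunner
{
    // Starts the command with the given arguments, waits for it to exit,
    // and returns true when the exit code is 0.
    public static bool RunToCompletion(string command, string arguments)
    {
        var startInfo = new ProcessStartInfo(command, arguments)
        {
            UseShellExecute = false, // start the executable directly, without the OS shell
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode == 0;
        }
    }
}
```

Inside a processor's Run method the result could then select the status passed to Response, for example GridJobStatus.Complete on success. The name of the failure status is not shown in this document, so check the GridJobStatus enumeration for it.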
JobProcessors are invoked from the Engines, and are responsible for executing jobs. There's one job processor supplied with the Grid, called Shell, that can execute any job that is a runnable shell file (.exe, .cmd, .bat). You can use this processor as an example for your own processors, or use it as is. To implement a job processor, you inherit from the JobProcessor abstract class and implement its Run(Job job) method. This is the method the Engine invokes to execute a job it has picked up from the Cluster.
1. Create a class library
2. Reference the shared MyGrid.Shared assembly (using MyGrid.Shared;)
3. Inherit your class from JobProcessor (there can be many job processors in one physical assembly)
public class Batch : JobProcessor
4. Implement the Run method
public override void Run(Job job)
{
    // run the job
    // ... (logic specific to your job processor)

    // signal job completion back (may be synchronous or asynchronous)
    Response(GridJobStatus.Complete, job);
}
5. Copy the JobProcessor assembly and all related .exe files, .dlls etc. into the Engine's /processors folder
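There are 3 ways you can submit Jobs to the Cluster: calling the Submit(Job) method of the IJobProvider interface, letting one of the shipped Feeders create Jobs from an XML file or a watched folder, or writing your own Feeder.
Example of using the Submit(Job) method of the IJobProvider interface: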
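// Describe the job; the Context entries are handed to the Job Processor
Job job = new Job();
job.Context["Command"] = "Client.LittleBatch.exe";
job.Context["Arguments"] = "arg1 arg2 -arg3 --arg4";
job.Processor = this.textProcessor.Text; // Job Processor name, taken here from a text box
job.Name = "Report Generation";
// Obtain a remoting proxy for the Cluster; clusterName.Text holds the Cluster's URL
IJobProvider provider = (IJobProvider)Activator.GetObject(typeof(IJobProvider), clusterName.Text);
// Submit the job; the Cluster hands back a Job object
Job j = provider.Submit(job);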
Easy, isn’t it?
If you want Jobs to be created whenever files arrive in a folder, you may use MyGrid.Feeders.FolderWatcher. This feeder creates every job specified in its configuration whenever a new file appears in the folder you're watching.
<Feeder name="MyGrid.Feeders.FolderWatcher">
  <FeedContext>
    <FeedContextElement name="Folder" value="c:\arc\tmp\tmp" />
    <FeedContextElement name="Pattern" value="request*.done" />
  </FeedContext>
  <Job>
    <Name>Run little batch test</Name>
    <Processor>MyGrid.Engine.Batch</Processor>
    <Context>
      <ContextElement name="Command" value="Client.LittleBatch.exe" />
      <ContextElement name="Arguments" value="-a -b -c {Pattern}" />
    </Context>
  </Job>
</Feeder>
If you want the Cluster to create Jobs from an XML file, use MyGrid.Feeders.Xml, which submits every job listed in the feed:
<Feeder name="MyGrid.Feeders.Xml">
  <Job>
    <Name>Run little batch test</Name>
    <Processor>MyGrid.JobProcessors.Batch</Processor>
    <Context>
      <ContextElement name="Command" value="Client.LittleBatch.exe" />
      <ContextElement name="Arguments" value="-a -b -c request.done" />
    </Context>
  </Job>
  <Job>
    <Name>Run little batch test</Name>
    <Processor>MyGrid.JobProcessors.ReportGenerator</Processor>
    <Context>
      <ContextElement name="Command" value="Client.LittleBatch.exe" />
      <ContextElement name="Arguments" value="-a -b -c request.done" />
    </Context>
  </Job>
</Feeder>
If you want your own job feed, for example from a SQL database, create your own Feeder by implementing the IFeeder interface and place the feeder assembly in the /feeders folder of the Cluster. The interface is:
public interface IFeeder
{
void Submit(IJobProvider provider, JobFeed.FeederRow feed);
}
Example (based on Xml feeder):
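using System;
using MyGrid.Shared; // assumed to contain IFeeder, IJobProvider, Job and JobFeed

namespace MyGrid.Feeders
{
    /// <summary>
    /// Feeder takes JobFeed in, and submits jobs to the cluster
    /// </summary>
    public class Xml : IFeeder
    {
        /// <summary>
        /// Submits all jobs specified by the feed to the provider
        /// </summary>
        /// <param name="provider">Cluster-side provider that accepts the jobs</param>
        /// <param name="feed">The Feeder section of the job feed</param>
        public void Submit(IJobProvider provider, JobFeed.FeederRow feed)
        {
            // can only process feeds addressed to the Xml feeder
            // (ToString() yields the full type name, MyGrid.Feeders.Xml)
            if (feed.name == this.ToString())
            {
                foreach (JobFeed.JobRow j in feed.GetJobRows())
                {
                    try
                    {
                        // create the Job object and copy its context elements
                        Job job = new Job();
                        foreach (JobFeed.ContextElementRow context in j.GetContextRows()[0].GetContextElementRows())
                        {
                            job.Context.Add(context.name, context.value);
                        }
                        job.Processor = j.Processor;
                        job.Name = j.Name;
                        provider.Submit(job);
                    }
                    catch (Exception)
                    {
                        // a job that cannot be built or submitted is skipped,
                        // so that the rest of the feed is still processed
                    }
                }
            }
        }
    }
}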
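For a feed that comes from a SQL database, the same idea applies: read rows, build one Job per row and submit it. The following sketch only illustrates that loop and makes several assumptions. The Job and IJobProvider types below are minimal stand-ins so that the code compiles on its own (a real feeder would use the MyGrid types and implement IFeeder), the column names Name, Processor, Command and Arguments are hypothetical, and RecordingProvider exists only to try the code without a Cluster.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Minimal stand-ins for the MyGrid types, so that the sketch is self-contained.
public class Job
{
    public string Name;
    public string Processor;
    public Dictionary<string, string> Context = new Dictionary<string, string>();
}

public interface IJobProvider
{
    Job Submit(Job job);
}

// Test double that records submitted jobs instead of sending them to a Cluster.
public sealed class RecordingProvider : IJobProvider
{
    public List<Job> Jobs = new List<Job>();
    public Job Submit(Job job) { Jobs.Add(job); return job; }
}

public static class SqlFeedSketch
{
    // Submits one Job per row of the reader and returns the number of jobs submitted.
    // Expected columns (hypothetical): Name, Processor, Command, Arguments.
    public static int SubmitAll(IDataReader reader, IJobProvider provider)
    {
        int submitted = 0;
        while (reader.Read())
        {
            var job = new Job();
            job.Name = Convert.ToString(reader["Name"]);
            job.Processor = Convert.ToString(reader["Processor"]);
            job.Context["Command"] = Convert.ToString(reader["Command"]);
            job.Context["Arguments"] = Convert.ToString(reader["Arguments"]);
            provider.Submit(job);
            submitted++;
        }
        return submitted;
    }
}
```

Any ADO.NET provider can supply the IDataReader, for example the result of ExecuteReader() on a command that selects the pending jobs. Because the method depends only on the IDataReader interface, it can also be exercised with DataTable.CreateDataReader().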