MyGrid

Open Source Grid for distributed computing

MyGrid aims to be for commercial grid products what MySQL became for proprietary databases: an open source alternative that you can use in your applications and infrastructure, and whose code you can extend with new features if you like. Like commercial grid products, MyGrid includes a powerful Web frontend that allows full management of the grid.

QUICKSTART

In order to install MyGrid, you need to install at least one Cluster, and one or more Engines. For the purpose of this QuickStart, everything can be installed on a single machine.

Start Cluster

Go to the command line version of the cluster in MyGrid/Cluster/bin and run

MyGrid.Cluster.Console.exe

You should see something like the following:

Start GUI

Go to MyGrid/GUI/bin and run

MyGrid.GUI.exe

The GUI lets you see what’s going on in the grid in a human-readable form, although you can also get a lot of information from the messages generated by the Engine and the Cluster.

Click the Connect to Grid button. The GUI client should connect to the grid and show a list of sample jobs already pending on the cluster.

Start Engine

Go to the command line version of the engine in MyGrid/Engine/bin and run

MyGrid.Engine.Console.exe

Immediately, our little grid network, now consisting of 1 Engine, 1 Cluster and a GUI client, will come to life!

The Engine should start picking up jobs for processing. When a job is acquired by the Engine, you’ll see something like:

Submit a Job from GUI using Jobs Editor

Open the Jobs Editor tab and click [+] to expand.
Click Feeder.
In the edit box, type MyGrid.Feeders.Xml.
Hit [+], then Feeder_Job. Type my real job. Click [+]. Hit Job_Processor.
Type MyGrid.JobProcessors.Shell in the TYPE field, then 20 in MaxCPU and 100 in MinRAM.
Navigate back by hitting the back arrow in the upper right corner of the editor. Click [+], then Job_Context.
Hit Context_ContextElement. Add the following values
Click the Preview Job Feed button.
You’ve just defined a job that you can submit to the Cluster for processing.

MyGrid Parts

MyGrid comes as a set of several parts:

Cluster
Engine
Shared assembly/library
Job Processors
Feeders
Web Frontend
SDK

At a very high level, MyGrid works as follows. A Cluster starts up and asks its Feeders for job feeds. The Cluster then starts sending jobs to all Engines connected to it. Some Engines may be down, busy, or lack a suitable Job Processor for the job, but some will answer: “I can pick this job”. The Cluster then allocates the job, assigning it to an Engine for execution. The Engine passes the job to its local Job Processor. When the job completes or fails, its status is signaled back to the Cluster, which then decides what to do next: re-submit the job for processing or process other jobs.
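The allocation step described above can be sketched in a few lines of C#. This is only an illustration under stated assumptions: SketchJob, SketchEngine and SketchCluster are hypothetical stand-ins invented for this example and are not MyGrid types, and the sketch ignores networking, status reporting and re-submission.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for illustration only; these are not MyGrid types.
public class SketchJob
{
    public string Name;
    public string Processor;   // name of the processor type the job requires
}

public class SketchEngine
{
    public string Name;
    public bool Busy;
    public HashSet<string> Processors = new HashSet<string>();

    // An engine can pick a job when it is idle and has a suitable processor.
    public bool CanPick(SketchJob job)
    {
        return !Busy && Processors.Contains(job.Processor);
    }
}

public static class SketchCluster
{
    // Offers the job to every engine and allocates it to the first one that
    // answers "I can pick this job". Returns null when no engine can take it,
    // in which case the job would simply stay queued.
    public static SketchEngine Allocate(SketchJob job, IEnumerable<SketchEngine> engines)
    {
        foreach (SketchEngine engine in engines)
        {
            if (engine.CanPick(job))
            {
                engine.Busy = true;
                return engine;
            }
        }
        return null;
    }
}
```

Picking the first willing engine keeps the sketch short; a real Cluster would also weigh the engines against each other before allocating.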

The picture below shows a very high level view of the Grid. The Cluster (which may be hosted by a service or embedded in your own application) is the analog of the DataSynapse GridServer. It receives jobs from the job Feeders. MyGrid ships with 2 job feeders (MyGrid.Feeders.Xml and MyGrid.Feeders.FolderWatcher), but any developer can write their own feeders and simply copy them to the /feeders folder of the Cluster. A job is an abstract entity that can only be understood by a Job Processor, to which the Engine (the job broker) hands the job. MyGrid ships with 1 job processor, the Batch processor (MyGrid.JobProcessors.Shell). You can easily write your own Job Processor: inherit from the JobProcessor abstract class and copy the resulting assembly to the /processors folder of the Engine.

You can also add job processors on the fly by simply copying them into the /processors folder of a running Engine. The Engine will identify those processors and add them.

Job Dependencies and distributed parallel execution

MyGrid has an integrated job dependency mechanism that allows the user to set dependencies between jobs. The following graph illustrates MyGrid’s understanding of dependencies:

As you can see, during the lifetime of your distributed application or group of applications, MyGrid clusters may have several input feeds. A feed is simply metadata describing jobs; an object called a Feeder interprets that metadata and submits the jobs to MyGrid. Within a feed, a job can depend on another job of the same feed. This type of dependency can be described as ‘dependency by name’, or ‘local feed dependency’:

<Job>

<Name>sleep</Name>

</Depends>

</Job>

Within a feed, jobs depend on each other by name, but they can also reference globally available jobs (jobs running within the distributed MyGrid network you are connected to). The feed then looks more like:

<Job>

<Name>sleep</Name>

</Depends>

</Job>

Why is this important?

Well, as a human, you need an easy way to create dependencies between your jobs. Job ids are only known at run time, so you shouldn’t include an id dependency unless you’re sure that the id exists. “Id” dependencies are generated by MyGrid automatically from the feeds when the feed is processed by the Cluster.

The feed you submit may fail for 2 reasons:

· Circular references between jobs

· You indicated a global jobId which did not exist at the time the Feeder ran
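To make the first failure reason concrete, here is a minimal sketch of how circular references between named jobs could be detected before a feed is submitted. It is not MyGrid's own validation code; the DependencyCheck class and the dictionary layout (job name mapped to the names it depends on) are assumptions made for this example. The check is a standard depth-first search that reports a cycle when it reaches a job that is still on the current path.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of MyGrid.
public static class DependencyCheck
{
    // dependsOn maps a job name to the names of the jobs it depends on.
    // Returns true when following the dependencies leads back to a job
    // that is still being visited, i.e. the feed contains a cycle.
    public static bool HasCircularReference(IDictionary<string, List<string>> dependsOn)
    {
        var state = new Dictionary<string, int>(); // 1 = in progress, 2 = done
        foreach (string job in dependsOn.Keys)
        {
            if (Visit(job, dependsOn, state)) return true;
        }
        return false;
    }

    private static bool Visit(string job, IDictionary<string, List<string>> dependsOn,
                              Dictionary<string, int> state)
    {
        int seen;
        if (state.TryGetValue(job, out seen))
        {
            return seen == 1; // job is already on the current path: cycle
        }
        state[job] = 1;
        List<string> dependencies;
        if (dependsOn.TryGetValue(job, out dependencies))
        {
            foreach (string dependency in dependencies)
            {
                if (Visit(dependency, dependsOn, state)) return true;
            }
        }
        state[job] = 2;
        return false;
    }
}
```

A name that appears only as a dependency, and not as a key, is treated as a job without dependencies of its own.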

Details

Let’s summarize roles of every component of the Grid:

Cluster

· Maintains the jobs lists

· Provides recovery and fault tolerance mechanism

· Invokes any feeders found in the /feeders folder (loosely coupled)

· Communicates with the Engines

· Prioritizes Engines based on CPU and RAM usage

· Prioritizes Jobs based on their order

· Embeddable in own apps: yes

· Works as a service/daemon: yes

· XML configuration: yes

· Automation: yes (via Feeders and XML)

Engine

· Communicates with the Clusters

· Checks availability based on CPU and RAM usage

· Brokers jobs to the Job Processors

· Embeddable in own apps: yes

· Works as a service/daemon: yes

· XML configuration: yes
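The CPU and RAM checks listed for the Cluster and the Engine can be pictured with the sketch below. It rests on assumptions: EngineLoad and EngineSelection are hypothetical types, and the reading of the limits (an engine is available while its CPU usage is at or below a maximum and its free RAM is at or above a minimum, in the spirit of the MaxCPU and MinRAM fields of the Jobs Editor) is a guess at the semantics, not MyGrid's documented behaviour.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical load snapshot reported by an engine; not a MyGrid type.
public class EngineLoad
{
    public string Name;
    public double CpuPercent;  // current CPU usage, 0..100
    public double FreeRamMb;   // currently free RAM in megabytes
}

public static class EngineSelection
{
    // Assumed rule: the engine is available when its CPU usage does not
    // exceed maxCpu and it has at least minRam megabytes of free RAM.
    public static bool IsAvailable(EngineLoad engine, double maxCpu, double minRam)
    {
        return engine.CpuPercent <= maxCpu && engine.FreeRamMb >= minRam;
    }

    // Keeps only available engines and orders them: lowest CPU usage first,
    // most free RAM as the tie-breaker.
    public static List<EngineLoad> Prioritize(IEnumerable<EngineLoad> engines,
                                              double maxCpu, double minRam)
    {
        return engines
            .Where(e => IsAvailable(e, maxCpu, minRam))
            .OrderBy(e => e.CpuPercent)
            .ThenByDescending(e => e.FreeRamMb)
            .ToList();
    }
}
```

Filtering before sorting means an overloaded engine never appears in the candidate list, however much RAM it has free.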

Job Processor

· Executes one type of job

· Communicates result to the Engine

Feeder

· Creates Jobs based on the job metadata

· Submits Jobs to the Cluster

Web Frontend

· Complete Web-based management application for MyGrid

· Runs on an embedded C#-based web server, as well as on Mono’s XSP, Cassini, Apache (with Cassini), Apache (with mod_mono), and IIS

Examples

Typical Installation

In a typical MyGrid installation, you install a Cluster as a service/daemon and configure its job feeds. You then install Engines on several machines, also as services/daemons. Finally, you start the Cluster and the Engines. Engines provide full logging, using log4net.

Typical Job Feed XML:

<?xml version="1.0" encoding="utf-8"?>

<Job>

<Name>download</Name>

<Type>MyGrid.JobProcessors.Downloader</Type>

<Assembly>MyGrid.JobProcessors.Downloader.dll</Assembly>

</Processor>

</ContextElement>

</Context>

</Job>

<Job>

<Type>MyGrid.JobProcessors.Shell</Type>

<Assembly>MyGrid.JobProcessors.Shell.dll</Assembly>

<!-- <Engines>

<EngineName>MACHINE01.*</EngineName>

<EngineName>MACHINE02.*</EngineName>

</Engines> -->

</Processor>

</Context>

</Depends>

</Job>

<Job>

<Type>MyGrid.JobProcessors.Shell</Type>

<Assembly>MyGrid.JobProcessors.Shell.dll</Assembly>

</Processor>

</Context>

</Depends>

</Job>

<Job>

<Type>MyGrid.JobProcessors.Shell</Type>

<Assembly>MyGrid.JobProcessors.Shell.dll</Assembly>

</Processor>

</Context>

</Job>

<Job>

<Type>MyGrid.JobProcessors.Shell</Type>

<Assembly>MyGrid.JobProcessors.Shell.dll</Assembly>

</Processor>

</Context>

</Job>

</Feeder>

<Job>

<Name>Incoming file</Name>

<Type>MyGrid.JobProcessors.Shell</Type>

<Assembly>MyGrid.JobProcessors.Shell.dll</Assembly>

</Processor>

</Context>

</Job>

</FeedContext>

</Feeder>

</JobFeed>

This job feed tells the Cluster that there are 2 feeders: MyGrid.Feeders.Xml and MyGrid.Feeders.FolderWatcher. The Cluster will attempt to locate and invoke each Feeder and pass the Job metadata to it. Each Feeder then instantiates Job objects and submits them to the Cluster queues. The behaviour of each Feeder is specific to that Feeder. For example, the FolderWatcher watches for events from the folder defined in FeedContextElement name="Folder" and creates a job each time an event arrives, i.e. Job creation is delayed until there are incoming files.

The <Job> element is specific to the <Processor> the job requires. A <Job> can only be processed if there’s a matching Job Processor. For example, the processor for most of the jobs above is MyGrid.JobProcessors.Shell.

HOWTO: Create your own Job Processor.

MyGrid provides a simple and useful job processor called Shell. You can use this job processor and its code as a template for your own processors. You may also simply reuse it as is, especially if your job is handled by an executable.

This job processor reads the “Command” and “Arguments” fields from the Job’s Context and spawns the process specified in “Command”. After the process exits, a return code of 0 causes the Cluster to be notified that the Job completed successfully; any other return code reports the Job as failed.
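The exit-code rule described above can be sketched with the standard System.Diagnostics.Process class. This is not the source of the Shell processor; ShellSketch and SketchStatus are hypothetical names used only here, and treating a process that cannot be started as a failed job is an assumption of this sketch.

```csharp
using System;
using System.Diagnostics;

// Hypothetical status enum for this sketch; MyGrid's own GridJobStatus may differ.
public enum SketchStatus { Complete, Failed }

public static class ShellSketch
{
    // Starts the command with the given arguments, waits for it to exit and
    // maps exit code 0 to Complete and any other exit code to Failed.
    public static SketchStatus Run(string command, string arguments)
    {
        var info = new ProcessStartInfo(command, arguments);
        info.UseShellExecute = false;   // start the executable directly
        try
        {
            using (Process process = Process.Start(info))
            {
                process.WaitForExit();
                return process.ExitCode == 0 ? SketchStatus.Complete : SketchStatus.Failed;
            }
        }
        catch (Exception)
        {
            // The process could not be started (for example, command not found).
            return SketchStatus.Failed;
        }
    }
}
```

In a real job processor, the command and arguments would come from the job's Context, and the resulting status would be passed to the Engine rather than returned.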

Quick Start

Job Processors are invoked by the Engines and are responsible for executing jobs. The Grid comes with one job processor, called Shell, that can execute any job that is a runnable file (.exe, .cmd, .bat). You can use this processor as an example for your own processors, or use it as is. To implement a job processor, you inherit from the JobProcessor abstract class and implement its Run(Job job) method. This is the method that the Engine invokes to execute a job it has obtained from the Cluster.

Quick Start

1. Create a class library

2. Reference the shared MyGrid.Shared assembly (using MyGrid.Shared;)

3. Inherit your class from JobProcessor (one physical assembly may contain many job processors)

public class Batch : JobProcessor

4. Implement the Run method

public override void Run(Job job)
{
    // run the job
    // ... (logic specific to your job processor)

    // signal job completion back (may be synchronous or asynchronous)
    Response(GridJobStatus.Complete, job);
}

5. Copy the JobProcessor assembly and all related files (.exe, .dll, etc.) into the Engine's /processors folder

Advanced

When the engine is started, it looks for any assemblies in the /processors folder. It tries to find any types inherited from MyGrid.Shared.JobProcessor. All types and related physical assembly names are added to internal hashtables, as "capabilities" of the Engine.

When the Engine connects to the grid, and receives notifications about available jobs, it checks if the Job.Processor property matches any of the types it discovered at the previous stage. If a type is found, the engine is capable of processing this job, and it attempts to request it from the Cluster.

The engine instantiates the job processor type, sets its status delegates and executes its Run() method. When the job finishes, the processor notifies the engine, and the engine in turn notifies the cluster.
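The discovery and instantiation steps above map naturally onto .NET reflection. The sketch below shows one way this could look; it is an assumption about the technique, not the Engine's actual code. SketchJobProcessor stands in for MyGrid.Shared.JobProcessor, EchoProcessor and CapabilityScanner are hypothetical, and the sketch scans an assembly that is already loaded instead of loading files from the /processors folder.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Stand-in base class for this sketch; the real one is MyGrid.Shared.JobProcessor.
public abstract class SketchJobProcessor
{
    public abstract void Run(string jobName);
}

// Hypothetical processor used to give the scanner something to find.
public class EchoProcessor : SketchJobProcessor
{
    public override void Run(string jobName)
    {
        Console.WriteLine("running " + jobName);
    }
}

public static class CapabilityScanner
{
    // Returns a map from processor type name to the name of the assembly that
    // defines it, for every concrete type deriving from the base class.
    public static Dictionary<string, string> Scan(Assembly assembly)
    {
        var capabilities = new Dictionary<string, string>();
        foreach (Type type in assembly.GetTypes())
        {
            if (!type.IsAbstract && type.IsSubclassOf(typeof(SketchJobProcessor)))
            {
                capabilities[type.FullName] = assembly.GetName().Name;
            }
        }
        return capabilities;
    }

    // Creates an instance of a discovered processor type by name, or returns
    // null when the assembly has no matching concrete processor type.
    public static SketchJobProcessor Create(Assembly assembly, string typeName)
    {
        Type type = assembly.GetType(typeName);
        if (type == null || type.IsAbstract || !type.IsSubclassOf(typeof(SketchJobProcessor)))
        {
            return null;
        }
        return (SketchJobProcessor)Activator.CreateInstance(type);
    }
}
```

Keeping the type name as the dictionary key mirrors the matching step: a job names the processor it needs, and the engine only has to look that name up.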

HOWTO: Create your own Feeder

Concepts

There are 3 ways to submit Jobs to the Cluster:

By using the Submit method of the IJobProvider interface
By using the existing feeders: FolderWatcher or Xml
By writing your own Feeder

Using Submit(Job) method of IJobProvider interface

Example:

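// Describe the job: the Shell processor reads "Command" and "Arguments" from the context
Job job = new Job();
job.Context["Command"] = "Client.LittleBatch.exe";
job.Context["Arguments"] = "arg1 arg2 -arg3 --arg4";
job.Processor = this.textProcessor.Text;  // processor type name, here read from a UI control
job.Name = "Report Generation";

// Obtain a remoting proxy to the Cluster; clusterName.Text holds the Cluster's URL
IJobProvider provider = (IJobProvider)Activator.GetObject(typeof(IJobProvider), clusterName.Text);

// Submit the job to the Cluster
Job j = provider.Submit(job);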

Easy, isn’t it?

What if you want the cluster to create Jobs for each incoming file in the folder, and process that file on a different machine?

Then you can use MyGrid.Feeders.FolderWatcher. This feeder creates the jobs you specified in the configuration file whenever a new file appears in the folder you’re watching.

</FeedContext>

<Job>

<Name>Run little batch test</Name>

<Processor>MyGrid.Engine.Batch</Processor>

</Context>

</Job>

</Feeder>
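The idea of creating jobs only when files arrive can be illustrated with .NET's FileSystemWatcher. Whether MyGrid.Feeders.FolderWatcher is built on this class is an assumption; FolderWatchSketch and its callback are hypothetical names for this example, and in a real feeder the callback would build a Job for the new file and submit it to the Cluster.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not the FolderWatcher feeder itself.
public static class FolderWatchSketch
{
    // Watches the folder and calls onNewFile with the full path of each file
    // created in it. Dispose the returned watcher to stop watching.
    public static FileSystemWatcher Watch(string folder, Action<string> onNewFile)
    {
        var watcher = new FileSystemWatcher(folder);
        watcher.Created += (sender, e) => onNewFile(e.FullPath);
        watcher.EnableRaisingEvents = true;
        return watcher;
    }
}
```

Because the callback runs on a background thread whenever the event fires, nothing is created until a file actually shows up, which is the delayed behaviour described for the FolderWatcher.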

What if you want the cluster to create Jobs from an XML file?

<Job>

<Name>Run little batch test</Name>

<Processor>MyGrid.JobProcessors.Batch</Processor>

</Context>

</Job>

<Job>

<Name>Run little batch test</Name>

<Processor>MyGrid.JobProcessors.ReportGenerator</Processor>

</Context>

</Job>

</Feeder>

What if you want your own job feed, for example from a SQL database?

Create your own Feeder by implementing the IFeeder interface, shown below, and place the feeder assembly in the /feeders folder of the Cluster:

public interface IFeeder
{
    void Submit(IJobProvider provider, JobFeed.FeederRow feed);
}

Example (based on Xml feeder):

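using System;
using MyGrid.Shared;   // Job and other shared types (see the Job Processor Quick Start)

namespace MyGrid.Feeders
{
    /// <summary>
    /// Feeder takes JobFeed in, and submits jobs to the cluster
    /// </summary>
    public class Xml : IFeeder
    {
        /// <summary>
        /// Submits all jobs specified by the feed to the provider
        /// </summary>
        /// <param name="provider">Job provider that receives the jobs</param>
        /// <param name="feed">Feeder row of the job feed to process</param>
        public void Submit(IJobProvider provider, JobFeed.FeederRow feed)
        {
            // this.ToString() yields the full type name (MyGrid.Feeders.Xml),
            // so this feeder can only process feeds addressed to the Xml feeder
            if (feed.name == this.ToString())
            {
                foreach (JobFeed.JobRow j in feed.GetJobRows())
                {
                    try
                    {
                        // create the Job object and copy the context elements into it
                        Job job = new Job();
                        foreach (JobFeed.ContextElementRow context in j.GetContextRows()[0].GetContextElementRows())
                        {
                            job.Context.Add(context.name, context.value);
                        }
                        job.Processor = j.Processor;
                        job.Name = j.Name;
                        provider.Submit(job);
                    }
                    catch (Exception)
                    {
                        // a job that cannot be built or submitted is skipped;
                        // the remaining jobs in the feed are still processed
                    }
                }
            }
        }
    }
}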
