MyGrid

Open Source Grid for distributed computing


 

Introduction

What is MyGrid?

QUICKSTART

Start Cluster

Start GUI

Start Engine

MyGrid Parts

Job Dependencies and distributed parallel execution

Details

Cluster

Engine

Job Processor

Feeder

Web Frontend

Examples

Typical Installation

HOWTO: Create your own Job Processor.

Quick Start

Advanced

HOWTO: Create your own Feeder

Concepts

Using Submit(Job) method of IJobProvider interface

What if you want the cluster to create Jobs for each incoming file in the folder, and process that file on a different machine?

What if you want the cluster to create Jobs from an XML file?

What if you want your own job feed, for example from the Sql database?

 

 

Introduction

What is MyGrid?

 

MyGrid wants to be what MySQL became for proprietary databases: an open source alternative that you can use in your applications and infrastructure, and whose code you can extend with new features if you like. Like commercial grid products, MyGrid includes a powerful Web frontend that allows full management of the grid.

 

 

QUICKSTART

 

In order to install MyGrid, you need to install at least one Cluster, and one or more Engines. For the purpose of this QuickStart, everything can be installed on a single machine.

 

Start Cluster

 

  1. Go to the command line version of the cluster in MyGrid/Cluster/bin and run

 

MyGrid.Cluster.Console.exe

 

You should see something like the following:

 

 

 

 

Start GUI

 

  1. Go to MyGrid/GUI/bin and run

 

    MyGrid.GUI.exe

The GUI lets you see what’s going on in the grid in a human-readable form, although you can also get a lot of information from the messages generated by the Engine and the Cluster.

 



  2. Click the Connect to Grid button. The GUI client should connect to the grid and show a list of sample jobs already pending on the cluster.



Start Engine

 

  1. Go to the command line version of the engine in MyGrid/Engine/bin and run

 

MyGrid.Engine.Console.exe

 

Immediately, our little grid network consisting now of 1 Engine, 1 Cluster and a GUI client will come to life!

 

 

  2. The Engine should start picking up jobs for processing. When a job is acquired by the Engine, you’ll see something like:

 

Submit a Job from the GUI using the Jobs Editor

 

  1. Open the Jobs Editor tab and click [+] to expand.
  2. Click Feeder.
  3. In the edit box, type MyGrid.Feeders.Xml.
  4. Click [+], then Feeder_Job. Type “my real job”. Click [+], then Job_Processor.
  5. Type MyGrid.JobProcessors.Shell in the TYPE field, and 20 and 100 in MaxCPU and MinRAM respectively.
  6. Navigate back by clicking the back arrow in the upper right corner of the editor. Click [+], then Job_Context.
  7. Click Context_ContextElement. Add the following values
  8. Click the Preview Job Feed button.

You’ve just defined a job that you can submit to the Cluster for processing.

 

 

MyGrid Parts

MyGrid comes as a set of several parts:

 

 

At a very high level, it works like this: a Cluster starts up and asks its Feeders for job feeds. The Cluster then starts offering jobs to all Engines connected to it. Some Engines may be down or busy, or may not have a suitable Job Processor to handle the job, but some will say: “I can pick this job”. The Cluster then allocates the job by assigning it to an Engine for execution. The Engine passes the job to its local Job Processor. When the job completes or fails, its status is signaled back to the Cluster, which then decides what to do next: re-submit the job for processing or process other jobs.

 

The picture below shows a very high level view of the Grid. The Cluster (which may be hosted by a service, or embedded in your own application) is the analog of the DataSynapse GridServer. It receives jobs from the job Feeders. Two job feeders ship with MyGrid (MyGrid.Feeders.Xml and MyGrid.Feeders.FolderWatcher), but any developer can write their own feeders and simply copy them to the /feeders folder of the Cluster. A job is an abstract entity that only a Job Processor can understand; the Engine acts as a job broker and hands the job to the processor. One job processor already ships with the Grid, called the Batch processor (MyGrid.JobProcessors.Shell). You can write your own Job Processor very easily: inherit from the JobProcessor abstract class and copy the resulting assembly to the /processors folder of the Engine.

 

You can also add job processors on the fly, by simply copying them into the /processors folder of the working Engine. The Engine will identify those processors and add them.

 

Job Dependencies and distributed parallel execution

 

MyGrid has an integrated job dependencies mechanism that allows the user to set dependencies for jobs. The following graph illustrates MyGrid’s understanding of dependencies:

 

 

 

As you can see, during the lifetime of your distributed application or group of applications, a MyGrid cluster may have several input feeds. A feed is simply metadata describing jobs, which an object called a Feeder interprets in order to submit them to MyGrid. Within a feed, a job can depend on other jobs by name. This type of dependency can be described as ‘dependency by name’, or ‘local feed dependency’:

 

<Job>
    <Name>sleep</Name>
    <Depends>
        <Dependency jobName="readEmail" />
        <Dependency jobName="browseWeb" />
        <Dependency jobName="doSomeWork" />
    </Depends>
</Job>

 

Within the feed, jobs depend on each other by name, but they can also reference globally available jobs (running within the distributed MyGrid network you are connected to) by id. A job definition then looks more like:

 

<Job>
    <Name>sleep</Name>
    <Depends>
        <Dependency jobName="readEmail" />
        <Dependency jobName="browseWeb" />
        <Dependency jobName="doSomeWork" jobId="cd0d89a0-6724-43cd-868f-8305df50ad" />
    </Depends>
</Job>

 

Why is this important?

 

Well, as a human, you need an easy way to create dependencies between your jobs. Job ids are only known at run time, so unless you’re sure that this id exists, you shouldn’t include that dependency. “Id” dependencies are generated by MyGrid automatically from the feeds when the feed is processed by the Cluster.

 

The feed you submit may fail for two reasons:

·        Circular references between jobs

·        You specified a global jobId that does not exist at the time the Feeder runs
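
The first failure mode can be checked before a feed is submitted. The sketch below is not MyGrid code. It is a minimal, self-contained illustration that assumes each job has been reduced to its name and the list of job names it depends on; the FeedCheck class and the HasCircularReference method are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;

public static class FeedCheck
{
    // Hypothetical helper: jobs maps a job name to the names it depends on.
    // Returns true if following the dependencies ever leads back to a job
    // that is still being visited, i.e. the feed has a circular reference.
    public static bool HasCircularReference(IDictionary<string, List<string>> jobs)
    {
        var state = new Dictionary<string, int>(); // 1 = being visited, 2 = finished
        foreach (string name in jobs.Keys)
        {
            if (Visit(name, jobs, state)) return true;
        }
        return false;
    }

    private static bool Visit(string name, IDictionary<string, List<string>> jobs,
                              Dictionary<string, int> state)
    {
        int seen;
        if (state.TryGetValue(name, out seen))
        {
            return seen == 1; // reaching a job that is still open closes a cycle
        }
        state[name] = 1;
        List<string> dependencies;
        if (jobs.TryGetValue(name, out dependencies))
        {
            foreach (string dependency in dependencies)
            {
                if (Visit(dependency, jobs, state)) return true;
            }
        }
        state[name] = 2;
        return false;
    }
}
```

With job1 depending on job2 and job3, and job2 depending on job3, the check returns false. Adding a dependency from job2 back to job1 makes it return true.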

 

 

 

Details

 

Let’s summarize the role of each component of the Grid:

 

Cluster

·         Maintains the jobs lists

·         Provides recovery and fault tolerance mechanism

·         Invokes any feeders found in the /feeders folder (loosely coupled)

·         Communicates with the Engines

·         Prioritizes Engines based on CPU, RAM usage

·         Prioritizes Jobs based on the order

·         Embeddable in own apps: yes

·         Works as a service/daemon: yes

·         XML configuration: yes

·         Automation: yes (via Feeders and XML)

 

Engine

·         Communicates with the Clusters

·         Checks availability based on CPU and RAM usage

·         Brokers jobs to the Job Processors

·         Embeddable in own apps: yes

·         Works as a service/daemon: yes

·         XML configuration: yes

 

Job Processor

·         Can execute one type of the job

·         Communicates result to the Engine

 

Feeder

·         Creates Jobs based on the jobs metadata

·         Submits Jobs to the Cluster

 

Web Frontend

·         Complete Web based management application for MyGrid

·         Runs on an embedded C#-based web server, as well as Mono’s XSP, Cassini, Apache (with Cassini), Apache (with mod_mono), and IIS

 

 

Examples

Typical Installation

 

In a typical MyGrid installation, you install a Cluster as a service/daemon and configure its job feeds. You then install Engines on several machines and set them up as services/daemons as well. Finally, you start the Cluster and the Engines. Engines provide full logging, using log4net.

  

Typical Job Feed XML:

 

<?xml version="1.0" encoding="utf-8"?>
<JobFeed xmlns="http://tempuri.org/JobFeed.xsd">
    <Feeder name="MyGrid.Feeders.Xml"> <!-- type name of the Feeder -->
        <Job>
            <Name>download</Name>
            <Processor>
                <Type>MyGrid.JobProcessors.Downloader</Type>
                <Assembly>MyGrid.JobProcessors.Downloader.dll</Assembly>
                <MaxCPU>20</MaxCPU>
                <MinRAM>100</MinRAM>
            </Processor>
            <Context>
                <ContextElement name="ProgressStep" value="5" />
                <ContextElement name="Files">
                    <FilesCollection source="http://localhost/Grid.ppt" destination="resources/apps" />
                </ContextElement>
            </Context>
        </Job>
        <Job>
            <Name>job1</Name>
            <Processor>
                <Type>MyGrid.JobProcessors.Shell</Type> <!-- type name of the JobProcessor -->
                <Assembly>MyGrid.JobProcessors.Shell.dll</Assembly> <!-- assembly of the JobProcessor -->
                <MaxCPU>20</MaxCPU> <!-- optional: maximum CPU usage, under which the job will still run -->
                <MinRAM>100</MinRAM> <!-- optional: minimum RAM availability (in Mbytes), under which the job will still run -->
                <OS>Win32NT</OS>
                <!-- optional: if present, the job will run ONLY on listed engines (Regular Expression) -->
                <!-- <Engines>
                    <EngineName>MACHINE01.*</EngineName>
                    <EngineName>MACHINE02.*</EngineName>
                </Engines> -->
            </Processor>
            <Context>
                <ContextElement name="Command" value="TestShellProcess.exe" />
                <ContextElement name="Arguments" value="-a -b -c request.done" />
            </Context>
            <Depends>
                <!-- jobId="aaa" will create a GLOBAL dependency: MUST BE VALID. THE WHOLE FEED WILL FAIL IF A DEPENDENCY IS INVALID -->
                <Dependency jobName="download" />
                <Dependency jobName="job2" />
                <Dependency jobName="job3" />
            </Depends>
        </Job>
        <Job>
            <Name>job2</Name>
            <Processor>
                <Type>MyGrid.JobProcessors.Shell</Type>
                <Assembly>MyGrid.JobProcessors.Shell.dll</Assembly>
                <MaxCPU>20</MaxCPU>
                <MinRAM>100</MinRAM>
            </Processor>
            <Context>
                <ContextElement name="Command" value="TestShellProcess.exe" />
                <ContextElement name="Arguments" value="-a -b -c request.done" />
            </Context>
            <Depends>
                <!-- <Dependency type="Job" jobName="job1" /> --> <!-- will create circular dependency!!! -->
                <Dependency jobName="job3" />
            </Depends>
        </Job>
        <Job>
            <Name>job3</Name>
            <Processor>
                <Type>MyGrid.JobProcessors.Shell</Type>
                <Assembly>MyGrid.JobProcessors.Shell.dll</Assembly>
            </Processor>
            <Context>
                <ContextElement name="Command" value="TestShellProcess.exe" />
                <ContextElement name="Arguments" value="-a -b -c request.done" />
                <ContextElement name="Time" value="10:00AM" />
            </Context>
        </Job>
        <Job>
            <Name>job4</Name>
            <Processor>
                <Type>MyGrid.JobProcessors.Shell</Type>
                <Assembly>MyGrid.JobProcessors.Shell.dll</Assembly>
            </Processor>
            <Context>
                <ContextElement name="Command" value="TestShellProcess.exe" />
                <ContextElement name="Arguments" value="-a -b -c request.done" />
            </Context>
        </Job>
    </Feeder>
    <Feeder name="MyGrid.Feeders.FolderWatcher">
        <Job>
            <Name>Incoming file</Name>
            <Processor>
                <Type>MyGrid.JobProcessors.Shell</Type>
                <Assembly>MyGrid.JobProcessors.Shell.dll</Assembly>
                <MaxCPU>20</MaxCPU>
                <MinRAM>100</MinRAM>
            </Processor>
            <Context>
                <ContextElement name="Command" value="TestShellProcess.exe" />
                <ContextElement name="Arguments" value="-a -b -c {Pattern}" />
            </Context>
        </Job>
        <FeedContext>
            <FeedContextElement name="Folder" value="c:\arc\tmp\tmp" />
            <FeedContextElement name="Pattern" value="request*.done" />
        </FeedContext>
    </Feeder>
</JobFeed>

 

 

This job feed tells the Cluster that there are two feeders: MyGrid.Feeders.Xml and MyGrid.Feeders.FolderWatcher. The Cluster will attempt to locate and invoke each Feeder and pass the Job metadata to it. Each Feeder then instantiates Job objects and submits them to the Cluster queues. The behaviour of each Feeder is specific to that Feeder. For example, the FolderWatcher watches for events from the folder defined in FeedContextElement name="Folder" and creates a job each time an event arrives; that is, Job creation is delayed until there are incoming files.

 

The <Job> element is specific to the <Processor> the job requires. A <Job> can only be processed if there is a matching Job Processor. For example, the processor for most of the jobs above is MyGrid.JobProcessors.Shell.
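
The optional Processor constraints in the feed above (MaxCPU, MinRAM, Engines) can be read as a simple eligibility test. The following is a rough sketch, not the actual Engine or Cluster code. The JobRequirements and EngineMatch types are hypothetical, and so are the assumptions that CPU usage is expressed as a percentage and that an engine pattern may match anywhere in the engine name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical container for the optional <Processor> constraints of a job.
public class JobRequirements
{
    public int? MaxCpu;      // from <MaxCPU>, null when absent
    public int? MinRamMb;    // from <MinRAM>, null when absent
    public List<string> EnginePatterns = new List<string>(); // from <Engines>, empty when absent
}

public static class EngineMatch
{
    // Returns true when an engine in the given state may run the job.
    public static bool CanRun(JobRequirements req, string engineName,
                              int cpuUsage, int freeRamMb)
    {
        // Too busy: current CPU usage is above the job's ceiling.
        if (req.MaxCpu.HasValue && cpuUsage > req.MaxCpu.Value) return false;

        // Not enough free memory for the job.
        if (req.MinRamMb.HasValue && freeRamMb < req.MinRamMb.Value) return false;

        // No engine list means any engine is acceptable.
        if (req.EnginePatterns.Count == 0) return true;

        // Otherwise the engine name must match at least one listed pattern.
        foreach (string pattern in req.EnginePatterns)
        {
            if (Regex.IsMatch(engineName, pattern)) return true;
        }
        return false;
    }
}
```

Keeping the three checks independent mirrors the feed format, where each constraint is optional and a missing element simply imposes no restriction.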

 

HOWTO: Create your own Job Processor.

 

MyGrid provides a very useful and simple job processor, called Shell. You can use this job processor and its code as a template for your own processors. You may also simply reuse it, especially if your job is handled by an executable.

This job processor reads the “Command” and “Arguments” fields of the Job’s Context and spawns the process specified in “Command”. After the process exits, the Cluster is notified that the Job completed successfully if the return code is 0; otherwise it is told that the Job has failed.
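
The spawn-and-check behaviour described above can be approximated with the standard System.Diagnostics.Process class. This is a sketch under assumptions rather than the source of the Shell processor: the ShellSketch class and its method names are hypothetical, and it returns a bool where a real job processor would report a status back to the grid.

```csharp
using System;
using System.Diagnostics;

public static class ShellSketch
{
    // Exit code 0 is treated as success, anything else as failure.
    public static bool IsSuccess(int exitCode)
    {
        return exitCode == 0;
    }

    // Spawns the command with its arguments, waits for it to exit,
    // and reports whether it finished successfully.
    public static bool RunCommand(string command, string arguments)
    {
        var info = new ProcessStartInfo(command, arguments);
        info.UseShellExecute = false; // start the executable directly
        using (Process process = Process.Start(info))
        {
            process.WaitForExit();
            return IsSuccess(process.ExitCode);
        }
    }
}
```

Note that Process.Start throws if the executable cannot be found, so a real processor would also need to treat that case as a failed job.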

 


 

JobProcessors are invoked by the Engines, and are responsible for executing jobs. One job processor, called Shell, is supplied with the Grid; it can execute any job that is a runnable shell file (.exe, .cmd, .bat). You can use this processor as an example for your own processors, or use it as is. To implement a job processor, inherit from the JobProcessor abstract class and implement its Run(Job job) method. This is the method the Engine invokes when it hands a job acquired from the Cluster to the processor.

 

Quick Start

 

1. Create a class library

2. Reference shared MyGrid.Shared assembly (using MyGrid.Shared;)

3. Inherit your class from JobProcessor (there could be many job processors in one physical assembly)

 

public class Batch : JobProcessor

 

4. Implement Run method


public override void Run(Job job)
{
    // Run the job (logic specific to your job processor).
    // ...

    // Signal job completion back (may be synchronous or asynchronous).
    Response(GridJobStatus.Complete, job);
}

5. Copy JobProcessor assembly and all related .exe's, .dlls etc into Engine's /processors folder

 

Advanced 

 

 

 

 

HOWTO: Create your own Feeder

 

Concepts

 

There are three ways to submit Jobs to the Cluster:

 

  1. By using Submit method of the IJobProvider interface.
  2. By using existing feeders: FolderWatcher or Xml
  3. By writing your own Feeder

 

Using Submit(Job) method of IJobProvider interface

Example:

 

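// textProcessor and clusterName are UI text fields that are not shown here.
Job job = new Job();
job.Name = "Report Generation";
job.Processor = this.textProcessor.Text;   // type name of the Job Processor
job.Context["Command"] = "Client.LittleBatch.exe";
job.Context["Arguments"] = "arg1 arg2 -arg3 --arg4";

// Obtain a remoting proxy for the Cluster at the URL held in clusterName.Text.
IJobProvider provider = (IJobProvider)Activator.GetObject(typeof(IJobProvider), clusterName.Text);
Job submitted = provider.Submit(job);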

 

Easy, isn’t it?

 

What if you want the cluster to create Jobs for each incoming file in the folder, and process that file on a different machine?

 

Then you can use MyGrid.Feeders.FolderWatcher. Whenever a new file appears in the folder you’re watching, this feeder creates the jobs you specified in the configuration file, however many there are.

 

<Feeder name="MyGrid.Feeders.FolderWatcher">
    <FeedContext>
        <FeedContextElement name="Folder" value="c:\arc\tmp\tmp" />
        <FeedContextElement name="Pattern" value="request*.done" />
    </FeedContext>
    <Job>
        <Name>Run little batch test</Name>
        <Processor>MyGrid.Engine.Batch</Processor>
        <Context>
            <ContextElement name="Command" value="Client.LittleBatch.exe" />
            <ContextElement name="Arguments" value="-a -b -c {Pattern}" />
        </Context>
    </Job>
</Feeder>

 

 

What if you want the cluster to create Jobs from an XML file?

 

 

 

<Feeder name="MyGrid.Feeders.Xml">
    <Job>
        <Name>Run little batch test</Name>
        <Processor>MyGrid.JobProcessors.Batch</Processor>
        <Context>
            <ContextElement name="Command" value="Client.LittleBatch.exe" />
            <ContextElement name="Arguments" value="-a -b -c request.done" />
        </Context>
    </Job>
    <Job>
        <Name>Run little batch test</Name>
        <Processor>MyGrid.JobProcessors.ReportGenerator</Processor>
        <Context>
            <ContextElement name="Command" value="Client.LittleBatch.exe" />
            <ContextElement name="Arguments" value="-a -b -c request.done" />
        </Context>
    </Job>
</Feeder>

 

What if you want your own job feed, for example from a SQL database?

Create your own Feeder by implementing the IFeeder interface, shown here, and place the feeder assembly in the /feeders folder on the Cluster:

 

public interface IFeeder
{
    void Submit(IJobProvider provider, JobFeed.FeederRow feed);
}

 

Example (based on Xml feeder):

 

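using System;
using MyGrid.Shared; // assumed to contain IFeeder, IJobProvider, Job and JobFeed

namespace MyGrid.Feeders
{
    /// <summary>
    /// Feeder takes a JobFeed in and submits its jobs to the cluster.
    /// </summary>
    public class Xml : IFeeder
    {
        /// <summary>
        /// Submits all jobs specified by the feed to the provider.
        /// </summary>
        /// <param name="provider">Provider that accepts the submitted jobs.</param>
        /// <param name="feed">The Feeder entry of the job feed.</param>
        public void Submit(IJobProvider provider, JobFeed.FeederRow feed)
        {
            // ToString() yields the full type name, "MyGrid.Feeders.Xml",
            // so this feeder only processes feeds addressed to it.
            if (feed.name != this.ToString())
            {
                return;
            }

            foreach (JobFeed.JobRow j in feed.GetJobRows())
            {
                try
                {
                    // Create the Job object and copy its context from the feed.
                    Job job = new Job();
                    foreach (JobFeed.ContextElementRow context in j.GetContextRows()[0].GetContextElementRows())
                    {
                        job.Context.Add(context.name, context.value);
                    }
                    job.Processor = j.Processor;
                    job.Name = j.Name;
                    provider.Submit(job);
                }
                catch (Exception)
                {
                    // A job that fails to build or submit is skipped; the rest of the feed continues.
                }
            }
        }
    }
}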