Thilina's Blog

I might be wrong, but…

Hardware co-simulations for image processing applications using MATLAB Simulink Xilinx Block-set

When working on image processing applications at the hardware level, I personally find it very hard to work with raw bit/byte data during simulation without seeing the resulting image or video. This kind of application has to stream data bits/bytes into and out of the hardware module we implement. MATLAB Simulink combined with the Xilinx Block-set is a great help here. With MATLAB we can reshape an image into a data stream and reshape a data stream back into an image, as I discussed in my previous article.
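As a rough illustration of the reshaping idea (this is not the code from the previous article), the Python sketch below flattens an image into a byte stream and rebuilds it. The function names are hypothetical, the image is a synthetic gradient, and row-major ordering is an assumption; MATLAB's own reshape walks the matrix column by column, so the order has to match whatever the hardware expects.

```python
def image_to_stream(image):
    """Flatten a 2-D image (list of rows) into a row-major list of bytes."""
    stream = []
    for row in image:
        for pixel in row:
            if not 0 <= pixel <= 255:
                raise ValueError("pixel out of uint8 range")
            stream.append(pixel)
    return stream


def stream_to_image(stream, rows, cols):
    """Rebuild a rows x cols image from a row-major byte stream."""
    if len(stream) != rows * cols:
        raise ValueError("stream length does not match image size")
    return [stream[r * cols:(r + 1) * cols] for r in range(rows)]


# Synthetic 128x128 gradient standing in for a real grayscale image.
ROWS, COLS = 128, 128
image = [[(r + c) % 256 for c in range(COLS)] for r in range(ROWS)]

stream = image_to_stream(image)            # one byte per clock cycle
restored = stream_to_image(stream, ROWS, COLS)
```

One byte of `stream` corresponds to one clock cycle on the 8-bit input bus, so a 128×128 frame takes 16384 cycles to send.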

But problems arise as the system of interest gets more complex. Simulation time depends on the performance of your computer, and it can sometimes take hours. Hardware co-simulation in MATLAB is a solution to this PC performance limitation. In this article I am going to share my experience of performing a hardware co-simulation in MATLAB with the Atlys Spartan-6 FPGA development kit (XC6SLX45 FPGA, -3 speed grade).

Let’s take the contrast stretching technique in image processing. In MATLAB it is, as a matter of fact, just a few lines of code. But at the hardware level there are many important operations involved. To perform this kind of operation we need to concentrate on the following factors.

  • Data input is not a single matrix, but a stream of bits/bytes (in this case I use an 8-bit parallel bus to transmit 1 byte per clock cycle)
  • Data must be input only when the system is ready to accept it
  • You cannot input/output floating point data to or from the Xilinx block set; it must be either integer or fixed point
  • Integer to double conversion; in this case we can use fixed to float conversion with acceptable precision
  • Obtaining row and column
  • Division
  • Floating point multiplication: resources vs. latency and clock frequency
  • Conversion back to fixed point (in this case to integer)
  • All internal modules have to be synchronized

In my application I use a 128×128, uint8 grayscale image for contrast stretching. An overview of my module is given below.

Inputs

  • 8bit image data
  • 1bit clock
  • 1bit dataStart
  • 1 bit reset
  • 8bit x1 (first transformation threshold)
  • 8bit x2 (second transformation threshold)
  • 8bit y1 (first projected intensity)
  • 8bit y2 (second projected intensity)

Outputs

  • 8bit image data
  • 1bit systemReady (flag)
  • 1bit dataOutOk
  • 1bit frameFinished

clip_image002

clip_image003

Even though the block diagram shows 3 transformation blocks, it can be optimized into one operation block. Since the m and c values need to be calculated only once per operation, we can sacrifice a few clock cycles and further optimize hardware resource usage by converting the system to a simple state machine.
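For reference, the three transformations can be written as one piecewise function. This is reconstructed from the thresholds (x1, x2) and projected intensities (y1, y2) in the input list, with m and c used as in the text; it assumes 0 < x1 < x2 < 255 so that none of the denominators is zero.

```latex
\[
y =
\begin{cases}
m_0\,x + c_0, & 0 \le x < x_1 \\
m_1\,(x - x_1) + c_1, & x_1 \le x < x_2 \\
m_2\,(x - x_2) + c_2, & x_2 \le x \le 255
\end{cases}
\]
\[
m_0 = \frac{y_1}{x_1},\quad
m_1 = \frac{y_2 - y_1}{x_2 - x_1},\quad
m_2 = \frac{255 - y_2}{255 - x_2},\quad
(c_0, c_1, c_2) = (0,\; y_1,\; y_2)
\]
```

Because each segment subtracts its own starting threshold from x, the value that gets multiplied always fits in 8 bits, and c is simply the intensity at the start of the segment.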

To improve performance and resource usage, the system is designed as a state machine with two main states and another state machine nested under the first state. The first state configures the working parameters for the upcoming frame, and the second state performs the contrast stretching operation on the image data stream. Let’s have a brief look at what happens in the configuring state.

Configuring state

S1:

  • Store c0, c1, c2 {0,y1,y2}
  • Fixed to float x1, y1 {x1-0, y1-0}
  • u = x2-x1
  • v = y2-y1

S2:

  • Fixed to float u,v
  • u = 255-x2
  • v = 255-y2

S3:

  • Fixed to float u,v

S4:

  • Wait till fixed to float ready
  • If ready: w = div(v/u)

S5:

  • w = div(v/u)

S6:

  • w = div(v/u)

S7:

  • Wait till div result ready
  • Store m0 <= w

S8:

  • Store m1 <= w

S9:

  • Store m2 <= w
  • Go to Main State 2 (processing mode)
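The configuring sequence above can be mirrored in software to get reference values for m0..m2 and c0..c2 when checking the hardware. The Python sketch below is only such a reference, not the hardware code: `stretch_coefficients` is a hypothetical name, it uses ordinary double precision instead of the 24-bit floats of the cores, and it assumes 0 < x1 < x2 < 255 and y1 ≤ y2 ≤ 255, which is what the unsigned 8-bit subtractions in the sequence appear to expect.

```python
def stretch_coefficients(x1, y1, x2, y2):
    """Return ([m0, m1, m2], [c0, c1, c2]) for the three line segments."""
    if not 0 < x1 < x2 < 255:
        raise ValueError("need 0 < x1 < x2 < 255 to avoid division by zero")
    if not 0 <= y1 <= y2 <= 255:
        raise ValueError("need 0 <= y1 <= y2 <= 255")

    # (u, v) pairs in the order they are converted to float in S1..S3.
    pairs = [
        (x1 - 0, y1 - 0),        # first segment
        (x2 - x1, y2 - y1),      # second segment
        (255 - x2, 255 - y2),    # third segment
    ]

    # Three back-to-back divisions w = v / u, stored as m0, m1, m2 (S4..S9).
    slopes = [v / u for u, v in pairs]

    # c values are kept as integers (stored in S1).
    offsets = [0, y1, y2]
    return slopes, offsets


slopes, offsets = stretch_coefficients(64, 32, 192, 224)
print(slopes, offsets)
```

Feeding the same x1, y1, x2, y2 to the module and comparing the stored m values against `slopes` (within the precision of the chosen float format) is a quick way to check the configuring state on its own.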

After the operations performed in state 1, the coefficients for the three transforming functions y = mx + c are available: all m values are stored as floating point numbers and the c values are stored as integers. The processing state is a pipelined system. It contains two delay blocks: one holds m in the pipeline until the x value has been converted to float, and the other holds c until x has been converted to float, multiplied by m and converted back to fixed point, as below.

One always block checks the value of x to decide its region and sends it to the float conversion; at the same time, using the region flag, the corresponding m and c values are sent into the delay pipelines so that they arrive at the proper stages. The multiplication starts when the floating point conversion ends. At the end of the pipeline the result is converted back to fixed point, the mapped c value is added, and the result is streamed out from the core.
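To see why the two delay blocks are needed, the pipeline can be modelled cycle by cycle in software. The Python sketch below is a simplified model under stated assumptions: the three latency constants are hypothetical (the real values depend on how the cores are configured), the names `delay_line`, `shift` and `run_pipeline` are made up for this example, doubles stand in for the 24-bit floats, and round-to-nearest in the float to fixed step is assumed.

```python
from collections import deque

# Hypothetical latencies in clock cycles (assumptions, not core settings).
FIX_TO_FLOAT_LATENCY = 3
MULTIPLY_LATENCY = 4
FLOAT_TO_FIX_LATENCY = 3
TOTAL_LATENCY = FIX_TO_FLOAT_LATENCY + MULTIPLY_LATENCY + FLOAT_TO_FIX_LATENCY


def delay_line(length):
    """Shift register pre-filled with empty (None) slots."""
    return deque([None] * length, maxlen=length)


def shift(line, value):
    """Clock the line once: push a value in, return the one falling out."""
    oldest = line[0]
    line.append(value)
    return oldest


def run_pipeline(pixels, x1, y1, x2, y2, slopes):
    m0, m1, m2 = slopes
    x_conv = delay_line(FIX_TO_FLOAT_LATENCY)    # fixed to float core
    m_delay = delay_line(FIX_TO_FLOAT_LATENCY)   # delay block for m
    product = delay_line(MULTIPLY_LATENCY + FLOAT_TO_FIX_LATENCY)
    c_delay = delay_line(TOTAL_LATENCY)          # delay block for c
    out = []
    # Feed the pixels, then idle cycles (None) to flush the pipeline.
    for pixel in list(pixels) + [None] * TOTAL_LATENCY:
        if pixel is None:
            x_in = m_in = c_in = None
        elif pixel < x1:
            x_in, m_in, c_in = pixel, m0, 0
        elif pixel < x2:
            x_in, m_in, c_in = pixel - x1, m1, y1
        else:
            x_in, m_in, c_in = pixel - x2, m2, y2
        x_float = shift(x_conv, x_in)
        m_aligned = shift(m_delay, m_in)         # arrives together with x_float
        prod_in = None if x_float is None else float(x_float) * m_aligned
        prod_out = shift(product, prod_in)
        c_aligned = shift(c_delay, c_in)         # arrives together with prod_out
        if prod_out is not None:
            # 8-bit addition, wrapping like the imageout assignment in the listing.
            out.append((round(prod_out) + c_aligned) & 0xFF)
    return out


result = run_pipeline([0, 32, 64, 128, 192, 255], 64, 32, 192, 224,
                      (0.5, 1.5, 31 / 63))
print(result)
```

If the m delay is made shorter or longer than the conversion latency, each pixel gets multiplied by the slope of a neighbouring pixel, which only shows up at region boundaries; that is the kind of mismatch the delay blocks prevent.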

You can use the Xilinx Core Generator to generate the modules required for floating point arithmetic and the delay elements. The modules and their source cores are as below.

Module                 Core source                             Configuration
Fixed to float         Floating point (under math operations)  Fixed to float
Float to fixed         Floating point (under math operations)  Float to fixed
Float multiplication   Floating point (under math operations)  Multiplication
Float division         Floating point (under math operations)  Division
Delay                  RAM based shift register                None

The important parts of the algorithm code are as below.

Co-efficient generation part

case(confstate)
  0: begin
     end
  1: begin
       intx <= {1'b0,x1};
       inty <= {1'b0,y1};
       pipevalx <= x2 - x1;
       pipevaly <= y2 - y1;
       fx2flnd <= 1;
       confstate <= 2;
     end
  2: begin
       intx <= {1'b0,pipevalx};
       inty <= {1'b0,pipevaly};
       pipevalx <= 8'b11111111 - x2;
       pipevaly <= 8'b11111111 - y2;
       confstate <= 3;
     end
  3: begin
       intx <= {1'b0,pipevalx};
       inty <= {1'b0,pipevaly};
       fx2flnd <= 0;
       confstate <= 4;
     end
  4: begin
       if(confRdy == 1) begin
         confstate<= 5;
         adiv <= fltb;
         bdiv <= flta;
         div_nd <= 1;
       end
       else begin
         confstate<= 4;
       end
     end
  5: begin
       confstate<= 6;
       adiv <= fltb;
       bdiv <= flta;
     end
  6: begin
       confstate<= 7;
       adiv <= fltb;
       bdiv <= flta;
       div_nd <=0;
     end
  7: begin
       if(divRdy==1) begin
         confstate<= 8;
         m0 <= divRes;
       end
       else begin
         confstate<= 7;
       end
     end
  8: begin
       confstate<= 9;
       m1 <= divRes;
     end
  9: begin
       confstate<= 10;
       m2 <= divRes;
       ready <= 1;
     end
  10:begin
       if(stream==1) begin
         state <= 1;
       end
       else begin
         confstate<=10;
       end
     end
endcase

Data processing part

case(region)
  0: begin
       md <= m0;
       cd <= 9'b0;
       intx <= streamdata;
       fx2flnd <= 1;
     end
  1: begin
       md <= m1;
       cd <= {1'b0,y1};
       intx <= streamdata;
       fx2flnd <= 1;
     end
  2: begin
       md <= m2;
       cd <= {1'b0,y2};
       intx <= streamdata;
       fx2flnd <= 1;
     end
endcase

Region defining part

always @(negedge clk) begin
  if( imagedata<x1) begin
    streamdata <= {1'b0,imagedata};
    region <= 0;
  end
  else if (imagedata<x2) begin
    streamdata <= {1'b0,imagedata - x1};
    region <= 1;
  end
  else begin
    streamdata <= {1'b0,imagedata - x2};
    region <= 2;
  end
end

For hardware co-simulation in MATLAB we need to follow some additional coding practices to generate the co-simulation model. They are as below.

1. Put only the I/O port names in the module port list and declare their sizes in the body of the module.

2. Do not set initial values in the variable declarations; initialize them separately within an initial block.

3. Do not use capital letters in I/O port names, in the names of instantiated modules, or in the module name. Otherwise MATLAB will give you an error when you instantiate a black box.

4. Use the core instantiation template (given after this list) to instantiate your cores in the main module.

5. Make sure to start MATLAB with “Run as Administrator”, or else nothing will happen.

core_name  instant_name
(
  .port_name_of_core(port_name_in_module),
  //..............(other ports)
);

The code formatted in this way is as below.

module contrast_stretching
(
  imagedata,
  x1,
  y1,
  x2,
  y2,
  clk,
  enable,
  stream,
  outready,
  ready,
  imageout
);
input          [7:0]   imagedata;
input          [7:0]   x1;
input          [7:0]   x2;
input          [7:0]   y1;
input          [7:0]   y2;
input                  clk;
input                  enable;
input                  stream;
output reg             outready;
output reg             ready;
output reg    [7:0]    imageout;

reg           [2:0]    state;
reg           [5:0]    confstate;
reg           [7:0]    pipevalx;
reg           [7:0]    pipevaly;
reg           [8:0]    intx;
reg           [8:0]    inty;
reg           [23:0]   m0;
reg           [23:0]   m1;
reg           [23:0]   m2;
reg           [23:0]   adiv;
reg           [23:0]   bdiv;

wire          [23:0]   divRes;
wire          [23:0]   flta;
wire          [23:0]   fltb;

reg                    fx2flnd;
reg                    div_nd;
reg           [2:0]    region;
reg           [8:0]    streamdata;
reg           [23:0]   md;
reg           [8:0]    cd;

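wire          [8:0]    resfx;
wire          [8:0]    cq;
wire          [23:0]   mq;
wire          [23:0]   resy;

// Handshake nets, declared explicitly so they exist before their first use
wire                   confRdy;
wire                   divRdy;
wire                   rdyfx;
wire                   fl2fx_nd;
wire                   strmRdy;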

initial begin
  outready  = 0;
  ready     = 0;
  state     = 0;
  confstate = 0;
  pipevalx  = 0;
  pipevaly  = 0;
  intx      = 0;
  inty      = 0;
  m0        = 0;
  m1        = 0;
  m2        = 0;
  adiv      = 0;
  bdiv      = 0;
  fx2flnd   = 0;
  div_nd    = 0;
  region    = 0;
  streamdata= 0;
  md        = 0;
  cd        = 0;
end

always@(posedge clk) begin
  case(state)
    0: begin
         if(enable==1 & ready == 0 & stream == 0) begin
           confstate <= 1;
         end
         case(confstate)
           0: begin
              end
           1: begin
                intx <= {1'b0,x1};
                inty <= {1'b0,y1};
                pipevalx <= x2 - x1;
                pipevaly <= y2 - y1;
                fx2flnd <= 1;
                confstate <= 2;
              end
           2: begin
                intx <= {1'b0,pipevalx};
                inty <= {1'b0,pipevaly};
                pipevalx <= 8'b11111111 - x2;
                pipevaly <= 8'b11111111 - y2;
                confstate <= 3;
              end
           3: begin
                intx <= {1'b0,pipevalx};
                inty <= {1'b0,pipevaly};
                fx2flnd <= 0;
                confstate <= 4;
              end
           4: begin
                if(confRdy == 1) begin
                  confstate<= 5;
                  adiv <= fltb;
                  bdiv <= flta;
                  div_nd <= 1;
                end
                else begin
                  confstate<= 4;
                end
              end
           5: begin
                confstate<= 6;
                adiv <= fltb;
                bdiv <= flta;
              end
           6: begin
                confstate<= 7;
                adiv <= fltb;
                bdiv <= flta;
                div_nd <=0;
              end
           7: begin
                if(divRdy==1) begin
                  confstate<= 8;
                  m0 <= divRes;
                end
                else begin
                  confstate<= 7;
                end
              end
           8: begin
                confstate<= 9;
                m1 <= divRes;
              end
           9: begin
                confstate<= 10;
                m2 <= divRes;
                ready <= 1;
              end
          10: begin
                if(stream==1) begin
                  state <= 1;
                end
                else begin
                  confstate<=10;
                end
              end
         endcase
       end
    1: begin
         case(region)
           0: begin
                md <= m0;
                cd <= 9'b0;
                intx <= streamdata;
                fx2flnd <= 1;
              end
           1: begin
                md <= m1;
                cd <= {1'b0,y1};
                intx <= streamdata;
                fx2flnd <= 1;
              end
           2: begin
                md <= m2;
                cd <= {1'b0,y2};
                intx <= streamdata;
                fx2flnd <= 1;
              end
         endcase
    end
  endcase
end
always @(negedge clk) begin
  if( imagedata<x1) begin
    streamdata <= {1'b0,imagedata};
    region <= 0;
  end
  else if (imagedata<x2) begin
    streamdata <= {1'b0,imagedata - x1};
    region <= 1;
  end
  else begin
    streamdata <= {1'b0,imagedata - x2};
    region <= 2;
  end
end

always @(posedge clk) begin
  imageout <= resfx[7:0] + cq[7:0];
  outready <= rdyfx;
end

assign strmRdy = confRdy & stream;

wire loadless;

fixedtofloat fxdtoflta
(
  .a(intx), // input [8 : 0] a
  .operation_nd(fx2flnd), // input operation_nd
  .clk(clk), // input clk
  .result(flta), // output [23 : 0] result
  .rdy(confRdy) // output rdy
);

fixedtofloat fxdtofltb
(
  .a(inty), // input [8 : 0] a
  .operation_nd(fx2flnd), // input operation_nd
  .clk(clk), // input clk
  .result(fltb), // output [23 : 0] result
  .rdy(loadless) // output rdy
);

floatdivision fltdiv
(
  .a(adiv), // input [23 : 0] a
  .b(bdiv), // input [23 : 0] b
  .operation_nd(div_nd), // input operation_nd
  .clk(clk), // input clk
  .result(divRes), // output [23 : 0] result
  .rdy(divRdy) // output rdy
);

floatmultiplication fltmul
(
  .a(mq), // input [23 : 0] a
  .b(flta), // input [23 : 0] b
  .operation_nd(strmRdy), // input operation_nd
  .clk(clk), // input clk
  .result(resy), // output [23 : 0] result
  .rdy(fl2fx_nd) // output rdy
);

floattofix flttofxd
(
  .a(resy), // input [23 : 0] a
  .operation_nd(fl2fx_nd), // input operation_nd
  .clk(clk), // input clk
  .result(resfx), // output [8 : 0] result
  .rdy(rdyfx) // output rdy
);

mtunnel mdelay
(
  .d(md), // input [23 : 0] d
  .clk(clk), // input clk
  .q(mq) // output [23 : 0] q
);

ctunnel cdelay
(
  .d(cd), // input [8 : 0] d
  .clk(clk), // input clk
  .q(cq) // output [8 : 0] q
);

endmodule

The simulation model is as follows.

clip_image005

Now we are ready to create the hardware co-simulation model for our code. Create a new blank model for hardware co-simulation in MATLAB Simulink. Place the Xilinx icon (the System Generator token) on the model window. Next add a black box and, when prompted to locate the source, give the path of the top module of the design.

Then, in the black box configuration file, go to the section which states

% Add addtional source files as needed.
% |-------------
% | Add files in the order in which they should be compiled.
% | If two files "a.vhd" and "b.vhd" contain the entities
% | entity_a and entity_b, and entity_a contains a
% | component of type entity_b, the correct sequence of
% | addFile() calls would be:
% | this_block.addFile('b.vhd');
% | this_block.addFile('a.vhd');
% |-------------
% this_block.addFile('');
% this_block.addFile('');

There, add the *.ngc files from the IP-core directory of your Xilinx project. After that, add the *.v files of the above cores as well. Now your configuration file is ready. Next select input and output ports from the Xilinx block set and connect them to the model. Set the data type to Boolean on the inputs for the flags and connect them to step functions. For the image data input, set the type to unsigned with a size of 8 bits; for now, connect it to a constant. Connect the image data output to a “To Workspace” block and the other output flags to scopes. These connected blocks are not very important for the model generation; they are only used to initiate the core generation. It is much better to rename the input and output ports, since those will be the port names of the generated block. But again, make sure not to use capital letters and not to reuse the names of the black box’s ports. Your model will now look as follows.

clip_image007

Next double click on the Xilinx icon on the model and select

Compilation>> Hardware Co-simulation >> Atlys >> Ethernet >> Point to point

clip_image009

And click the Generate button. If everything is done correctly, your co-simulation library model will be generated after a few minutes. Next, for the simulations, create a simulation framework similar to the one I described in my previous article. Copy the generated block into the simulation framework and connect it as below. Set the simulation time to infinity (inf).

clip_image011

Then double click on the library model and set the tab parameters as shown.

clip_image013

Finally connect the FPGA board to the Ethernet port and USB, and power it up. Click on the run button to run your simulation. I hope you got a rough idea of how to perform hardware co-simulation using MATLAB Simulink for Xilinx FPGA devices. Thank you very much for reading.

2011 December 6 - Posted by | Electronics, FPGA, Image Processing, MATLAB, Technology

1 Comment »

  1. Hello sir
    It’s a really good article on Simulink and Xilinx block co-simulation. I also tried to read an image, process it and display it. When I read the image, serialize it, deserialize it again and send it to the video viewer, it works OK (this is only for demo purposes). But when I connect the Gateway In, Black Box and Gateway Out, I get only a black image. Can you help me with the problem? If you give me your email address, I can also send the model to you. Expecting your reply.

    Regards
    Ravi

    Comment by Ravindra Patil | 2011 December 29 | Reply

