November 30, 2013

A ball-chasing robot

 
The current version of my robot uses some basic image detection to find a blue ping pong ball. Here you can see it in action:


And here a slightly modified version of the software, where the robot follows me while I am wearing a blue jersey:



The webcam (a Logitech) is connected to a Beaglebone Black, which does the image processing. The Beaglebone is connected via SPI (which I bitbang on the Beaglebone) to an Atmega 328 hidden inside the robot, which is responsible for all the menial tasks like controlling the motors and handling the infrared remote. The robot also has an infrared distance sensor at the front so that it does not run into a wall and get stuck.

This is a typical picture the webcam on the robot shoots:






The non-roundness of the ball on the left stems from me stepping onto it at some point, not from a camera failure.


The Beaglebone uses the OpenCV image processing library to process such a picture. The following is basically the image processing code the Beaglebone runs:


#include <opencv2/core/core_c.h>
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc_c.h>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui_c.h>
#include <opencv2/highgui/highgui.hpp>
#include <iostream>
#include <unistd.h>
using namespace std;
using namespace cv;

int main(){

   VideoCapture cap(1);
   cap.set(CV_CAP_PROP_FRAME_WIDTH, 320);
   cap.set(CV_CAP_PROP_FRAME_HEIGHT, 240);
   Mat frame,  hsvframe;
   cap.read(frame);
   cap.read(frame);
   cap.read(frame);
   sleep(1);
   cap.read(frame);
   hsvframe = Mat::zeros(frame.size(), CV_8UC3);
   cvtColor(frame, hsvframe, CV_BGR2HSV);
   Mat imgthreshed =  Mat::zeros(frame.size(), CV_8UC3);
   inRange(hsvframe, Scalar(80, 50,20), Scalar(120, 255,255), imgthreshed);
   vector< vector<Point> > contours;
   vector<Vec4i> hierarchy;
   Mat img  = imgthreshed.clone();
   findContours(img, contours, hierarchy, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE, Point(0,0));
   Rect boundrect;
   Mat drawing = Mat::zeros(frame.size(), CV_8UC3);
   int area = 10; //this controls which size an object must have to be detected
   int IdLargestContour = -1;
   int areabuffer;
   for(unsigned int i = 0; i < contours.size(); i++){
     areabuffer = contourArea(contours[i]);
     if( areabuffer > area){
        area = areabuffer;
        IdLargestContour = i;
     }
  }

    if(IdLargestContour == -1){
        cout << "Nothing to see here" <<endl;
        imwrite("purepicture.jpg", frame);

    }
    else{
        cout << "I see something!"<<endl;
        boundrect = boundingRect(Mat(contours[IdLargestContour]));
        imwrite("purepicture.jpg", frame);
        rectangle(frame, boundrect, Scalar(0,255,0), 1,8,0);
        imwrite("picturewithrect.jpg", frame);
        imwrite("thresh.jpg", imgthreshed);


    }
    return 0;
}


To compile this, you have to link against the libraries (DLLs) of the included OpenCV modules, that is core, imgproc and highgui - they should be easy to find online. I am using the C++ API of OpenCV here; there is also a C API, but the two are largely incompatible, so be wary when you google for examples.

The first three lines of main open the webcam and set the resolution to 320x240 - the actual program on the Beaglebone only uses 160x120 for speed reasons. The argument to the VideoCapture constructor is the number of the webcam you want to access - if you only have a single webcam, use 0; my laptop also has an internal one, so I use webcam 1, the external one.

Mat is the basic image type of OpenCV, so we define two of those and save a snapshot from the webcam into frame via cap.read(frame). I do this several times since the first two or three pictures from the webcam always end up way too dark for some reason. In line 24 (the cvtColor call), we convert the image to the HSV color space, which is better suited to color-based image processing than BGR. Line 26 (the inRange call) turns the HSV image into a black-and-white image: the blue pixels (those with a hue between 80 and 120 on OpenCV's 0-179 hue scale and with enough saturation and brightness) are white, all others black. The result is the following thresholded image:
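To make the thresholding step concrete, here is a minimal OpenCV-free sketch of the test inRange applies to each pixel, using the same bounds as the program above. The names HsvPixel, isBlue and thresholdRow are made up for this illustration and are not part of OpenCV or of the robot's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one pixel of an 8-bit HSV image.
// For 8-bit images OpenCV stores the hue as 0..179 (degrees / 2).
struct HsvPixel {
    std::uint8_t h, s, v;
};

// Same test as inRange(hsv, Scalar(80, 50, 20), Scalar(120, 255, 255), mask)
// for a single pixel: every channel must lie inside its inclusive bounds.
inline bool isBlue(const HsvPixel& p) {
    const bool hueOk = p.h >= 80 && p.h <= 120;  // roughly 160..240 degrees
    const bool satOk = p.s >= 50;                // upper bound 255 always holds
    const bool valOk = p.v >= 20;                // upper bound 255 always holds
    return hueOk && satOk && valOk;
}

// Thresholds one image row: white (255) for blue pixels, black (0) otherwise.
// The mask must already have the same length as the row.
inline void thresholdRow(const std::vector<HsvPixel>& row,
                         std::vector<std::uint8_t>& mask) {
    assert(mask.size() == row.size());
    for (std::size_t i = 0; i < row.size(); ++i) {
        mask[i] = isBlue(row[i]) ? 255 : 0;
    }
}
```

The lower bounds on saturation and value are what keep grey or very dark pixels, whose hue is essentially noise, out of the mask.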

In line 30 (the findContours call), we search for the connected white components of the picture (there is probably only one such component here). In line 36 and following (the for loop), we search for the largest component among those we have just found. Finally, in line 53 (the rectangle call), we draw a green rectangle around the largest component into our original picture and save all images:
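The search for the largest component can be sketched on its own, without OpenCV. In this illustration the vector areas plays the role of contourArea(contours[i]), and indexOfLargest is a hypothetical helper name; unlike the program above, the sketch keeps the areas as double instead of truncating them to int.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of the largest area that is strictly greater than
// minArea, or -1 if no area exceeds it (the "Nothing to see here" case).
inline int indexOfLargest(const std::vector<double>& areas, double minArea) {
    assert(minArea >= 0.0);
    int best = -1;
    double bestArea = minArea;  // doubles as the detection threshold
    for (std::size_t i = 0; i < areas.size(); ++i) {
        if (areas[i] > bestArea) {
            bestArea = areas[i];
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

Starting bestArea at the minimum size means small specks of blue noise never become the detected object, which is what the initial value of area does in the program above.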



The Beaglebone is fast enough to perform these calculations relatively quickly; the actual bottleneck is that the webcam driver buffers several frames, so you get an outdated picture if you only request a single snapshot. I am currently "solving" this problem by requesting 5 pictures in a row and only using the last one for the image processing, but this takes some additional time.
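The workaround of throwing away buffered frames can be written as a small helper. This is only a sketch: readFreshFrame and CountingSource are hypothetical names, and the helper merely assumes a source with a bool read(Frame&) member, which is the shape of cv::VideoCapture::read with a cv::Mat; it has not been tested against a real camera here.

```cpp
#include <cassert>

// Reads and overwrites `discard` stale frames, then reads one more frame,
// which is the one left in `out`. Returns false if any read fails.
template <typename Source, typename Frame>
bool readFreshFrame(Source& source, Frame& out, int discard) {
    assert(discard >= 0);
    for (int i = 0; i < discard; ++i) {
        if (!source.read(out)) {
            return false;  // the camera stopped delivering frames
        }
    }
    return source.read(out);
}

// Hypothetical fake camera for trying the helper without hardware:
// every read() hands out the next frame number.
struct CountingSource {
    int next = 0;
    bool read(int& frame) {
        frame = next++;
        return true;
    }
};
```

With discard set to 4 this performs the five reads described above. How many frames actually have to be dropped depends on the driver, so the number is best found by experiment.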




November 11, 2013

The PRU of the Beaglebone Black

For timing-critical tasks, the Beaglebone Black has two built-in microcontrollers, the PRUs (programmable real-time units). It is not obvious how to use the PRUs; in this post, I try to put together some information on them.

The PRUs can currently only be programmed in assembler. This is not as bad as it sounds: the assembler instruction set is fairly easy to use, and since it is straightforward to share data between the PRU and the Beaglebone, we only have to write the time-critical parts of the code in assembler and can write the glue logic in a normal C or C++ program on the Beaglebone Black.

First, you have to install the assembler and the C library for communication between the PRU and the Beaglebone. You can find instructions at https://npmjs.org/package/pru under "Driver Library and Assembler". The install also comes with a few example programs.

After installation, you have to enable the PRU via a device tree overlay. On your Beaglebone, navigate to /lib/firmware, create a file called BB-BONE-PRU-00A0.dts and copy the following into the file:


/dts-v1/;
/plugin/;

/ {
  compatible = "ti,beaglebone", "ti,beaglebone-black";

 
  part-number = "BB-BONE-PRU";
  version = "00A0";

  exclusive-use =
    "P8.12";

  fragment@0 {
    target = <&am33xx_pinmux>;
    __overlay__ {
      mygpio: pinmux_mygpio{
        pinctrl-single,pins = <
          0x30 0x06
          >;
      };
    };
  };

  fragment@1 {
    target = <&ocp>;
    __overlay__ {
      test_helper: helper {
        compatible = "bone-pinmux-helper";
        pinctrl-names = "default";
        pinctrl-0 = <&mygpio>;
        status = "okay";
      };
    };
  };

  fragment@2{
  target = <&pruss>;
    __overlay__ {
      status = "okay";
    };
  };
};

This enables the PRU and gives the PRU direct access to the GPIO1_12 pin. This is achieved in lines 14-23 (the fragment@0 block), in particular by the line 0x30 0x06. Where do these numbers come from? This is actually fairly non-obvious since both the reference manual and the technical reference manual are silent on this question.

Have a look at selsinork's table in post 5 at http://www.element14.com/community/thread/23952?tstart=0.
Oddly, a similar but less informative table appears in the reference manual, which you can find at http://www.farnell.com/datasheets/1701090.pdf; I do not know in which official document the table linked above appears, but it seems to be correct. You should see columns up to mode 7; if not, there is a PDF version of the table in the post after the table.

Looking at the columns mode 6 and mode 7, we see that mode 6 of the gpio1_12 pin is pr1_pru0_pru_r30_14.  In this mode, we can control the behaviour of that particular pin directly in the assembler code of the PRU. This is why we have the 0x06 in the above code: it tells the Beaglebone that we want to switch a pin to mode 6. The other number, 0x30, tells the Beaglebone which pin to switch. The helpful table linked above actually gives the memory address of the pin: it has an offset of 0x830. This is the offset of the pin conf_gpmc_ad12 in the Control module part of the memory of the Beaglebone, as you can find in the technical reference manual http://elinux.org/images/6/65/Spruh73c.pdf of the (processor used in the) Beaglebone in Chapter 9, Control module. Consulting selsinork's table, we see that the pin conf_gpmc_ad12 is indeed GPIO1, pin 12. Since the registers actually controlling the behaviour of the pins start at 0x800, we only pass the offset after this memory address, which is 0x30. 
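The arithmetic behind the two numbers can be spelled out in a few lines. This is only an illustration: PinEntry and makePinEntry are hypothetical names, and the sketch sets nothing but the mux mode, leaving the remaining bits of the pad configuration value at zero exactly as 0x06 does.

```cpp
#include <cassert>
#include <cstdint>

// One "offset value" pair as it appears in pinctrl-single,pins.
struct PinEntry {
    std::uint32_t offset;  // register offset relative to 0x800
    std::uint32_t config;  // pad configuration; the low 3 bits are the mode
};

// Builds the pair from the control module offset given in the table
// (e.g. 0x830 for conf_gpmc_ad12) and the desired mux mode (0..7).
inline PinEntry makePinEntry(std::uint32_t controlModuleOffset, unsigned mode) {
    assert(controlModuleOffset >= 0x800);  // pad registers start at 0x800
    assert(mode <= 7);                     // the mode field is 3 bits wide
    return PinEntry{controlModuleOffset - 0x800, mode};
}
```

For conf_gpmc_ad12 in mode 6, makePinEntry(0x830, 6) yields the pair 0x30 and 0x06 used in the overlay.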

Now we have to compile the device tree overlay. In /lib/firmware, the command

dtc -O dtb -o BB-BONE-PRU-00A0.dtb -b 0 BB-BONE-PRU-00A0.dts

compiles the .dts file to a file usable by Linux. Should the compiler not be installed on your system, you can find instructions at https://npmjs.org/package/pru under "Device tree". To enable the overlay, go to /sys/devices/bone_capemgr.8 (or maybe 9, depending on your version of Linux) and load the device tree overlay:


echo BB-BONE-PRU > slots

It should now appear at the end of the list you get with

cat slots

Now you have enabled the PRU. This is not a permanent way to enable the device tree overlay - you will have to do it again after each reboot; forgetting to load the overlay is usually the reason for weird error messages you get when starting the PRU. Some googling should give you good results if you want to make this overlay permanent.

The device tree stuff is unfortunately quite confusing, but you can forget about all of it for now. While we are at the command line, we also load the PRU driver via

modprobe uio_pruss

I think the driver is only required if you actually want to access the memory of the Beaglebone from the PRUs, but better safe than sorry.

Finally, we can program our PRU. The following two programs ask the user for an input and move a servo motor connected to GPIO1_12 accordingly. This already requires timing accurate to a few microseconds, which is really tricky to achieve with direct GPIO manipulation from Linux and leads to a jittery servo. It could probably also be done with a PWM pin, but getting PWM to work also seems to be tricky.
 
To play it safe, you should use a separate battery for the servo motor if you try this (remember to connect the grounds!) since the servo can draw quite a lot of peak current. I would also advise putting a 10k resistor in the line from the pin to the servo, just in case something goes wrong.

The assembler code compiles to a bin-file; the following C-program shows how to upload the bin-file to the PRU.

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <iostream>
#include <bitset>
#include <string.h>
// Driver header file
#include <prussdrv.h>
#include <pruss_intc_mapping.h>
 /*****************************************************************************
* Explicit External Declarations                                             *
*****************************************************************************/

/*****************************************************************************
* Local Macro Declarations                                                   *
*****************************************************************************/
#define PRU_NUM  0
#define ADDEND1  0x0010F012u
#define ADDEND2  0x0000567Au
#define OFFSET_DDR  0x00001000 
#define PRU_ADDR 0x4A300000
#define SHAREDRAM_OFFSET 0x00012000
#define PRUSS0_SHARED_DATARAM    4
#define AM33XX
int main(){
     int fd = open("/dev/mem",O_RDWR | O_SYNC);
     ulong* prusharedmemory = (ulong*) mmap(NULL, 0x10000, PROT_READ | PROT_WRITE, MAP_SHARED, fd, PRU_ADDR+SHAREDRAM_OFFSET);
     prusharedmemory[0]= 0b100011111;
     unsigned int ret;
     tpruss_intc_initdata pruss_intc_initdata = PRUSS_INTC_INITDATA;
    
    printf("\nINFO: Starting %s example.\r\n", "Servo");
    /* Initialize the PRU */
    prussdrv_init ();  
    

    ret = prussdrv_open(PRU_EVTOUT_0);
    if (ret)
    {
        printf("prussdrv_open open failed\n");
        return (ret);
    }
    
    /* Get the interrupt initialized */
    prussdrv_pruintc_init(&pruss_intc_initdata);



     int a;
/*load program into the pru*/
     prussdrv_exec_program (PRU_NUM, "./servo.bin");
    while(1){
          std::cin >> a;
           if(a == 255){
              prusharedmemory[0] = a;
              break;
           }
           a = 90+(100*a)/180;
           prusharedmemory[0] = a;

    }
    prussdrv_pru_wait_event (PRU_EVTOUT_0);
    printf("\tINFO: PRU completed transfer.\r\n");
    prussdrv_pru_clear_event (PRU0_ARM_INTERRUPT);
    /* Disable PRU and close memory mapping*/
    prussdrv_pru_disable (PRU_NUM);
    prussdrv_exit ();
    return 0;
}

We start by defining a memory map into the part of the memory that can be accessed directly from both the PRU and the Beaglebone. PRU_ADDR is the start of the address range of the PRU subsystem; the first part of that range holds the data RAMs of the two PRUs, and offset 0x12000 lies in the shared RAM of the PRUs. We can now write something to this place in memory and easily read it from the PRU. The pruss library is supposed to have its own function for this, but I could not get it to work, and the memory map approach is more transparent anyway.
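As a quick sanity check of the address used for the memory map, the following sketch classifies an offset inside the PRU address range. The region boundaries (two 8 KB data RAMs at offsets 0x0000 and 0x2000, 12 KB of shared RAM at 0x10000) are my reading of the technical reference manual and worth double-checking there; PruRegion, regionOf and armAddress are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kPruBase = 0x4A300000;  // PRU_ADDR in the program above

enum class PruRegion { DataRam0, DataRam1, SharedRam, Other };

// Classifies an offset relative to kPruBase (assumed memory layout, see text).
inline PruRegion regionOf(std::uint32_t offset) {
    if (offset < 0x2000) return PruRegion::DataRam0;
    if (offset < 0x4000) return PruRegion::DataRam1;
    if (offset >= 0x10000 && offset < 0x13000) return PruRegion::SharedRam;
    return PruRegion::Other;
}

// Physical address as seen from Linux, i.e. what is passed to mmap as offset.
inline std::uint32_t armAddress(std::uint32_t offset) {
    assert(offset <= 0xFFFFFFFFu - kPruBase);  // no 32-bit overflow
    return kPruBase + offset;
}
```

The resulting address 0x4A312000 is a multiple of 4096, which matters because mmap only accepts page-aligned offsets.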

After that, we initialize the PRU interrupt, which allows the PRU to notify our C-program once it is finished, and load the bin file with the prussdrv_exec_program call. We then read in a number representing the angle we want to move the servo to. If this number is 255, we pass 255 to the PRU and break the loop; otherwise, we pass a rough estimate of the length of the pulse needed to reach the desired servo position, in units of 10 microseconds. Finally, we wait until the PRU tells us it is done and do some cleanup. Depending on your servo, you probably have to tweak the numbers a bit.
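The conversion from angle to pulse length is easier to tweak once it is written out separately. The sketch below uses the same formula as the program above; the function names are hypothetical, and the restriction to 0..180 degrees is an assumption of the sketch (in the program, the input 255 is reserved as the exit signal).

```cpp
#include <cassert>

constexpr int kTickMicros = 10;  // one unit of the PRU delay loop is 10 us

// Pulse length in delay-loop ticks, as computed in the C-program:
// 0 degrees -> 90 ticks, 180 degrees -> 190 ticks (integer division).
inline int pulseTicksForAngle(int degrees) {
    assert(degrees >= 0 && degrees <= 180);
    return 90 + (100 * degrees) / 180;
}

// The same pulse length in microseconds.
inline int pulseMicrosForAngle(int degrees) {
    return pulseTicksForAngle(degrees) * kTickMicros;
}
```

So the pulses range from 0.9 ms at 0 degrees to 1.9 ms at 180 degrees, with 1.4 ms in the middle. If your servo does not reach its end positions, the constants 90 and 100 are the numbers to adjust.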

To compile the code, you need to link to several libraries. From the command line,


g++ YourFileName.c -o YourExecutableName -lpthread -lprussdrv

does the job.

Now for the assembler code:

.setcallreg r29.w0
.origin 0
.entrypoint START

#define AM33XX
#define PRU0_PRU1_INTERRUPT     17
#define PRU1_PRU0_INTERRUPT     18
#define PRU0_ARM_INTERRUPT      19
#define PRU1_ARM_INTERRUPT      20
#define ARM_PRU0_INTERRUPT      21
#define ARM_PRU1_INTERRUPT      22

#define CONST_PRUDRAM   C24
#define CONST_L3RAM     C30
#define CONST_DDR       C31
#define GPIO1 0x4804c000
#define GPIO_CLEARDATAOUT 0x190
#define GPIO_SETDATAOUT 0x194
#define CONST_PRUSHAREDRAM   C28
#define CTBIR_0         0x22020
// Address for the Constant table Block Index Register (CTBIR)
#define CTBIR          0x22020

// Address for the Constant table Programmable Pointer Register 0(CTPPR_0) 
#define CTPPR_0         0x22028 

// Address for the Constant table Programmable Pointer Register 1(CTPPR_1) 
#define CTPPR_1         0x2202C  
#define sp r24
#define lr r23


START:
LBCO      r0, C4, 4, 4
CLR     r0, r0, 4         // Clear SYSCFG[STANDBY_INIT] to enable OCP master port
SBCO      r0, C4, 4, 4
LOOPN:
MOV r3, 0x00012000
LBBO r4, r3, 0, 4
QBEQ EXIT, r4, 255
SET r30, 14
MOV r2, r4
CALL DEL
CLR r30, 14
MOV r2, 2000
CALL DEL
JMP LOOPN



DEL: //delay r2 * 10 us
MOV r5, 1000
INDEL:
SUB r5, r5, 1
QBNE INDEL, r5, 0
SUB r2, r2, 1
QBNE DEL, r2, 0
RET
EXIT:
MOV R31.b0, PRU0_ARM_INTERRUPT+16
HALT 

You should save this in a .p file - at least, that is what the examples do. It can be compiled via

pasm -p YourFileName.p

and you should change the file name in the prussdrv_exec_program call of the C-code to reflect your filename.

You can find a description of all the instructions at http://processors.wiki.ti.com/index.php/PRU_Assembly_Instructions.

In the very first line, we change the register used for storing return addresses of function calls: by default, it is r30, which interferes with our program since r30 is also the register we use to control the pin. More precisely, bit 14 of r30 now controls the gpio1_12 pin thanks to our device tree overlay; you can see it set and cleared in lines 41 and 44 (SET r30, 14 and CLR r30, 14).

Lines 34-36 (the three instructions after START) enable access of the PRU to the actual memory of the Beaglebone, not only the parts dedicated to the PRU. We do not actually use it here; you should be a bit careful with direct manipulation of the Beaglebone memory. You may find the information at http://hipstercircuits.com/beaglebone-pru-ddr-memory-access-the-right-way/ useful if you are interested in this.

In line 39, the LBBO instruction loads the content of memory at the location stored in register r3 into register r4. In our case, this is the same location (0x12000) our memory map in the preceding C-program pointed to, so this instruction loads the value we have stored there (and before that, passed to the C-program via the command line) into register r4. If this value is 255, we jump to the end of the program. Everything else should be fairly straightforward if you have programmed any assembler before, or can easily be found in the instruction set wiki linked above. Line 60 (the MOV to R31.b0) triggers the interrupt, notifying the C-program that it can quit now.
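The "delay r2 * 10 us" comment on the DEL routine can be checked with a little arithmetic. The sketch assumes that the PRU runs at 200 MHz and that each of the instructions in the loop takes one cycle; it ignores the CALL and RET around the routine, and all names in it are made up for the illustration.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kNanosPerCycle = 5;       // assumed 200 MHz PRU clock
constexpr std::uint64_t kInnerIterations = 1000;  // MOV r5, 1000
constexpr std::uint64_t kInnerInstructions = 2;   // SUB r5 + QBNE INDEL

// Cycles spent in DEL for a given r2: per outer pass one MOV, the inner
// loop, and the SUB r2 / QBNE DEL pair.
inline std::uint64_t delayCycles(std::uint32_t r2) {
    assert(r2 > 0);  // r2 == 0 would wrap around and delay for a very long time
    const std::uint64_t perPass = 1 + kInnerIterations * kInnerInstructions + 2;
    return perPass * r2;
}

// The same delay in nanoseconds.
inline std::uint64_t delayNanos(std::uint32_t r2) {
    return delayCycles(r2) * kNanosPerCycle;
}
```

Under these assumptions one pass takes 2003 cycles, or about 10.015 us, so the comment is accurate to within 0.2 percent, and the pause of 2000 units between pulses comes out at roughly 20 ms.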

Hopefully, this is useful for someone - it likely will be useful for me in 6 months once I have forgotten everything again.