
[!NOTE] This page is mostly out of date and describes a fairly old snapshot of Smoothie. It is left here for historical reasons.

This page is intended as a description of the current status of acceleration and stepping in Smoothie, followed by a log of the process of moving to better acceleration and better (maybe) stepping.

These are notes to myself, please be nice :)

Here is the plan:

  1. Describe the current status of stepping
  2. Request suggestions on how to make it better/faster
  3. Make acceleration better by accelerating on every step
  4. See if that completely ruins our lives
  5. If not, try to optimize the heck out of things, then call it smoothie’s new internal organs.

How Smoothie steps

A summary of how we get smoothie to generate steps:

  1. We receive a line over UART (for example), and dispatch it as an event to all modules.
  2. If that line is a GCode, we dispatch that as an event to all modules.
  3. The Robot likes to listen to GCodes, converts them into small line segments, and passes those to the Planner.
  4. The Planner receives this segment, and turns it into a Block on the Conveyor, then does all kinds of acceleration math/planning on it to figure out the maximal speeds for each movement. The block is now at the end of the Conveyor’s queue.
  5. If this is the first block we added ever, the conveyor queue starts playing music: the Conveyor calls the on_block_begin event, warning all modules that this block starts playing.
  6. Some modules, like Stepper and Extruder, like to take responsibility for a Block. We are here interested in Stepper as it’s the one responsible for doing movements.
  7. Now the very interesting (for us, don’t be sad, planner people) part begins: Stepper instructs its StepperMotor objects to move. They are instructed to move a certain number of steps. Also, the StepperMotor is made active in the StepTicker’s list of active motors.
  8. Things now stop happening in the normal, main loop context, and start happening elsewhere.
  9. The stepping interrupt is executed by the interrupt thingie, at a fixed frequency (we used 100kHz, the default).
  10. When this interrupt occurs, the main thing to happen is that every active StepperMotor’s tick() method gets called.
  11. The main job of this method is something close to Bresenham’s line algorithm: we increment a counter by a fixed value on every tick(), and when that counter reaches or exceeds a given value (determined by the speed we want to move at), we generate a step signal to the stepper driver. (Strictly speaking, this is not Bresenham; we do a floating-point DDA on three axes.)

This is the core of what smoothie does.
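The counter-and-threshold idea from the last step can be sketched as follows. This is a minimal illustration, not Smoothie's actual StepperMotor: it borrows the fixed-point counter names (`fx_counter`, `fx_ticks_per_step`) from the code listing further down this page, while `SketchMotor`, `set_speed`, `steps_emitted`, `run_ticks` and the choice to keep the remainder after a step are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for a StepperMotor; not the real class.
struct SketchMotor {
    uint64_t fx_counter = 0;         // 32.32 fixed-point tick accumulator
    uint64_t fx_ticks_per_step = 0;  // 32.32 fixed-point number of ticks between steps
    uint32_t steps_emitted = 0;      // stands in for pulsing the step pin

    // Set the speed from the tick rate and the wanted step rate (both in Hz).
    void set_speed(uint32_t tick_hz, uint32_t steps_per_second) {
        assert(steps_per_second > 0 && steps_per_second <= tick_hz);
        fx_ticks_per_step = (static_cast<uint64_t>(tick_hz) << 32) / steps_per_second;
    }

    // Called once per stepping interrupt.
    void tick() {
        fx_counter += static_cast<uint64_t>(1) << 32;  // one tick has elapsed
        if (fx_counter >= fx_ticks_per_step) {
            // Assumption: keep the remainder so the timing error does not accumulate.
            fx_counter -= fx_ticks_per_step;
            ++steps_emitted;
        }
    }
};

// Run the motor for a number of ticks and report how many steps came out.
uint32_t run_ticks(SketchMotor& m, uint32_t ticks) {
    for (uint32_t i = 0; i < ticks; ++i) { m.tick(); }
    return m.steps_emitted;
}
```

With a 100kHz tick and a requested 25000 steps per second, the threshold is exactly 4 ticks, so one second of ticks yields 25000 steps. Speeds that do not divide the tick rate evenly still average out correctly because the remainder is carried over.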

Additionally, a painful thing we have to do is update the speed very often, to be able to accelerate and decelerate.

This happens in a separate (lower priority) timer, at regular but less frequent intervals.

  1. This method gets called periodically and checks whether we are accelerating, decelerating, or cruising.
  2. Depending on the result, it changes the speed (or leaves it alone) for all the stepper motors taking part in the movement.
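A rough sketch of such a periodic speed update is shown below. Everything in it is hypothetical (`Phase`, `SketchBlock`, `acceleration_tick`, the field names, and the clamping to the cruise and exit rates); it only illustrates the shape of "look at the phase, then nudge the rate", not the real acceleration code.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical phases of a trapezoidal move.
enum class Phase { Accelerating, Cruising, Decelerating };

// Hypothetical per-block state, all rates in steps per second.
struct SketchBlock {
    float rate;         // current step rate
    float cruise_rate;  // rate to hold while cruising
    float exit_rate;    // rate to reach by the end of the block
    float rate_delta;   // change applied on each acceleration tick
};

// One acceleration tick: adjust the rate for the current phase and return it.
float acceleration_tick(SketchBlock& b, Phase phase) {
    assert(b.rate_delta >= 0.0f);
    switch (phase) {
        case Phase::Accelerating:
            // Assumption: never overshoot the cruise rate.
            b.rate = std::min(b.rate + b.rate_delta, b.cruise_rate);
            break;
        case Phase::Decelerating:
            // Assumption: never drop below the exit rate.
            b.rate = std::max(b.rate - b.rate_delta, b.exit_rate);
            break;
        case Phase::Cruising:
            break;  // the speed is left unchanged
    }
    return b.rate;
}
```

Because the rate only changes once per acceleration tick rather than once per step, the speed profile is a staircase approximation of the ideal ramp.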

This behavior is inherited from grbl and is the source of many approximations/small problems.

You can end up with deceleration curves reaching zero, or near-zero speed a few steps too early, and instead of a direction change, you get a pause. This currently (Feb 2013) can lead, at high speeds, to «shocky» end of moves, and even missed steps.

Step generation’s current cost

Step generation is what Smoothie spends most of its time doing.

The reason for this is simple:

Basically, we want this to take as little time as possible in order to do it as often as possible.

To analyze how long the various things we do take to execute (their cost), we can simply turn a pin high at the beginning of each thing and low at the end of it, plug in a logic analyzer, and look at the graphs.

This has been extremely useful in the past to figure out where we spent too much time, what needed fixing, etc.
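The pin-toggling technique can be sketched with a small scope guard. This is an illustration only: `FakePort` is a plain variable standing in for a memory-mapped GPIO register so the example runs anywhere, and `ScopedPin` and `pins_during_work` are hypothetical names, not part of Smoothie.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a GPIO output register (a real one would be memory-mapped).
struct FakePort {
    uint32_t pins = 0;
};

// Drives one pin high for the lifetime of the object (hypothetical helper).
class ScopedPin {
public:
    ScopedPin(FakePort& port, unsigned bit) : port_(port), mask_(make_mask(bit)) {
        port_.pins |= mask_;   // pin goes high when the measured section starts
    }
    ~ScopedPin() {
        port_.pins &= ~mask_;  // pin goes low when the measured section ends
    }
    ScopedPin(const ScopedPin&) = delete;
    ScopedPin& operator=(const ScopedPin&) = delete;

private:
    static uint32_t make_mask(unsigned bit) {
        assert(bit < 32);
        return static_cast<uint32_t>(1) << bit;
    }
    FakePort& port_;
    uint32_t mask_;
};

// Wrap some placeholder work and return what the port looked like during it.
uint32_t pins_during_work(FakePort& port, unsigned bit) {
    ScopedPin marker(port, bit);
    volatile uint32_t sum = 0;
    for (uint32_t i = 0; i < 10; ++i) { sum = sum + i; }  // placeholder for real work
    return port.pins;  // snapshot taken while the marker pin is still high
}
```

On the logic analyzer, the width of the high pulse on that pin is then the duration of the wrapped section, plus the small cost of the two pin writes themselves.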

Step generation graph

  1. In white, is the X-axis step signal. It is turned on here and turned off here.
  2. In brown is the duration of this interrupt. This is the total time we spend generating steps. It is the time we want to reduce as much as possible. A bit of the rest of the time is spent doing acceleration, all the rest in the main loop, mostly doing Planner math.
  3. In red, which never goes high here, is the duration of this condition, if it ever becomes true. It is not very interesting here. Basically, if we spend so much time in one occurrence of this interrupt that it overlaps the next occurrence, we skip the next one, while making sure we do not mess up any of the rest of the math.
  4. In orange, is the duration of the tick() method of each StepperMotor
  5. In yellow, is the duration of this condition inside the tick() method, if true. Basically, if we have to generate a step, this turns high.

The width of the screen should represent 10 microseconds.
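The "skip the next occurrence without breaking the math" idea behind the red trace can be sketched as a pure function. It mirrors the catch-up loop at the end of the code listing further down (`LPC_TIM0->TC` becomes `timer_count`, `LPC_TIM0->MR0` becomes `match`), but `next_match` and `skipped_ticks` are hypothetical names, all values are assumed to be in timer counts, and counter wrap-around is ignored.

```cpp
#include <cassert>
#include <cstdint>

// Return the next match value that is not already in the past,
// advancing only by whole periods so ticks stay on the same time grid.
uint32_t next_match(uint32_t timer_count, uint32_t match, uint32_t period) {
    assert(period > 0);
    while (timer_count > match) {
        match += period;  // this occurrence was missed, move to the following one
    }
    return match;
}

// Number of whole periods the match value had to be pushed forward.
uint32_t skipped_ticks(uint32_t timer_count, uint32_t match, uint32_t period) {
    return (next_match(timer_count, match, period) - match) / period;
}
```

Advancing by whole periods, instead of simply rescheduling "one period from now", keeps later ticks aligned with the original schedule.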

So, what is important to see here?

Now the very interesting stuff.

We are here to hunt waste. We want this to run as fast as possible. And looking at the curves, we clearly have a problem (not sure it has a solution).

The orange line is the time we spend doing useful work. The work we are here to do.

The brown line is the total time it actually takes us to do that work. As you can see, we spend a great deal of time doing stuff that is not directly useful (though it may be necessary). That would be iterating over the active StepperMotors list, calling methods, etc… maintenance, if you like.

We want that to last as little time as possible!

There are two situations here:

  1. When we do not output a step signal (the short brown duration)
  2. When we output one or more step signals (the long brown duration)

The additional duration of the second one compared to the first is due to the fact that we have a second timer interrupt, with a higher priority, to turn the step pin low after one microsecond. This interrupts this interrupt to turn the pin off, making this interrupt last longer. I think that’s the main cause of the longer execution time.

Now when we do not output a step signal, the total interrupt time is still much more than the useful (orange) time.

It would probably be interesting to look at the assembly to see how long each thing takes.

What it looks like in the code

This is schematic C++ representing what happens in your typical, short step interrupt.

// This is where work is done
extern "C" void TIMER0_IRQHandler (void){

    // If no axes enabled, just ignore for now. This costs us a tiny bit of time
    if( global_step_ticker->active_motor_bm == 0 ){ return; }

    // We set the match register to a very high value so the timer does not reach it again if this takes too long
    LPC_TIM0->MR0 = 2000000;

    // This calls the tick method.
    global_step_ticker->tick();

    // Let's inline it below:
    _isr_context = true;
    int i;
    uint32_t bm;

    // This is your usual loop. I have no idea how costly it is. It seems like it is costly as we don't do much else.
    for (i = 0, bm = 1; i < 12; i++, bm <<= 1){
        if (this->active_motor_bm & bm){

            // We call the tick() method for each StepperMotor.
            this->active_motors[i]->tick();

            // Let's inline this below:
            void StepperMotor::tick(){

                // Increase the (fixed point) counter by one tick
                this->fx_counter += (uint64_t)((uint64_t)1<<32);

                // If we are to step now.
                if( this->fx_counter >= this->fx_ticks_per_step ){

                      // Here we don't care about the case where we do. We care about the time we waste before and after this.

                }
            }
        }
    }

    // The iteration over the active StepperMotors is now finished, all useful work is done
    _isr_context = false;

    // Return to the main interrupt function

    // If we did set a pin high, we want to set the other timer to set it low one microsecond from now
    if( global_step_ticker->reset_step_pins ){
        // But we don't care about the case where we did here. Still the check is expensive
        LPC_TIM1->TCR = 3;
        LPC_TIM1->TCR = 1;
        global_step_ticker->reset_step_pins = false;
    }

    // If a move finished in this tick, we have to tell the actuator to act accordingly. This is not happening here either as we did not generate a step.
    if( global_step_ticker->moves_finished ){ global_step_ticker->signal_moves_finished(); }

    // If we spent too much time inside the interrupt. This should probably never happen if the previous condition was not true.
    if( LPC_TIM0->TC > global_step_ticker->period ){ 
        // We don't care, not happening
    }

    // This is just a safety measure to make sure we never miss our match register
    while( LPC_TIM0->TC > LPC_TIM0->MR0 ){
        LPC_TIM0->MR0 += global_step_ticker->period;
    }

}

// And that's it
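The loop in the listing above tests all 12 bit positions on every tick, even when only a few motors are active. Purely as an illustration of how that kind of overhead could be reduced (this page does not say whether anything like it was done), a bitmask can be walked by visiting only its set bits. `for_each_active` and `sum_of_active_indices` are hypothetical names, and `__builtin_ctz` is a GCC/Clang builtin, so this assumes one of those compilers.

```cpp
#include <cassert>
#include <cstdint>

// Calls visit(index) for every set bit in the mask, lowest bit first,
// and returns how many bits were visited.
template <typename Visit>
unsigned for_each_active(uint32_t active_bm, Visit visit) {
    unsigned visited = 0;
    while (active_bm != 0) {
        // Index of the lowest set bit (GCC/Clang builtin; undefined for 0,
        // which the loop condition rules out).
        unsigned index = static_cast<unsigned>(__builtin_ctz(active_bm));
        visit(index);
        active_bm &= active_bm - 1;  // clear the lowest set bit
        ++visited;
    }
    return visited;
}

// Demonstration helper: sums the indices of the set bits.
unsigned sum_of_active_indices(uint32_t active_bm) {
    unsigned sum = 0;
    unsigned visited = for_each_active(active_bm, [&sum](unsigned i) { sum += i; });
    assert(visited <= 32);
    return sum;
}
```

With this shape, the loop body runs once per active motor rather than once per slot. Whether that is actually faster on the target would have to be checked with the logic analyzer, as with everything else here.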

Several optimizations found and applied:

These are optimizations that are most useful in the case we don’t do anything. Applying them gives us the following signals/durations:

Optimized step generation graph

Compared to the previous graph, we now spend significantly less time in the interrupt when no step is generated, and a bit less time when one or more steps are generated.