Acceleration Branch

This is mainly out of date now, and describes a pretty old snapshot of smoothie. Left here for historical reasons.

This page is intended as a description of the current status of acceleration and stepping in Smoothie, and then a log of the process of moving to better acceleration and better (maybe) stepping.

These are notes to myself, please be nice :)

Here is the plan:

Describe the current status of stepping
Request suggestions on how to make it better/faster
Make acceleration better by accelerating on every step
See if that completely ruins our lives
If not, try to optimize the heck out of things, then call it smoothie’s new internal organs.

How Smoothie steps

A summary of how we get smoothie to generate steps:

This describes v1 architecture. See the version-specific sections below for v2 differences in step rates and capabilities.

We receive a line over UART (for example), and dispatch it as an event to all modules.
If that line is a GCode, we dispatch that as an event to all modules.
The Robot likes to listen to GCodes, converts them into small line segments, and passes those to the Planner.
The Planner receives this segment, and turns it into a Block on the Conveyor, then does all kinds of acceleration math/planning on it to figure out the maximal speeds for each movement. The block is now at the end of the Conveyor’s queue.
If this is the first block we added ever, the conveyor queue starts playing music: the Conveyor calls the on_block_begin event, warning all modules that this block starts playing.
Some modules, like Stepper and Extruder, like to take responsibility for a Block. We are here interested in Stepper as it’s the one responsible for doing movements.
Now the very interesting (for us, don’t be sad, planner people) part begins: Stepper instructs its StepperMotor objects to move. They are instructed to move a certain number of steps. Also, the StepperMotor is made active in the StepTicker’s list of active motors.
Things now stop happening in the normal, main loop context, and start happening elsewhere.
The stepping interrupt is executed by the interrupt thingie, at a fixed frequency (we used 100kHz, the default).
When this interrupt occurs, the main thing to happen is that every active StepperMotor’s tick() method gets called.
The main job of this method is simply to do what is called Bresenham’s line algorithm. Basically, we increment a counter by a fixed value every tick(), and when that counter is higher than a given value (determined by the speed we want to move at), we generate a step signal to the stepper driver.

(Note we do not actually do Bresenham, we do a floating-point DDA on three axes).

This is the core of what smoothie does.
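As a rough sketch of the counter-and-roof idea described above (not Smoothie's actual code): the struct and method names are made up for illustration, the 32.32 fixed-point layout is borrowed from the schematic interrupt code later on this page, and carrying the remainder over after a step instead of zeroing the counter is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical single-axis step generator; names are illustrative only.
struct TickCounter {
    uint64_t fx_counter = 0;         // 32.32 fixed-point tick accumulator
    uint64_t fx_ticks_per_step = 0;  // 32.32 fixed-point "roof", set by the speed
    uint32_t steps_done = 0;

    // Convert a wanted step rate into a roof, given the interrupt frequency.
    void set_rate(double steps_per_second, double tick_hz) {
        assert(steps_per_second > 0.0 && steps_per_second <= tick_hz);
        const double ticks_per_step = tick_hz / steps_per_second;
        fx_ticks_per_step = static_cast<uint64_t>(ticks_per_step * 4294967296.0); // * 2^32
    }

    // Called once per stepping interrupt.
    // Returns true when a step pulse should be sent to the driver.
    bool tick() {
        fx_counter += uint64_t{1} << 32;            // one more tick has elapsed
        if (fx_counter < fx_ticks_per_step) return false;
        fx_counter -= fx_ticks_per_step;            // assumption: keep the fractional remainder
        ++steps_done;
        return true;
    }
};
```

With a 100 kHz tick and a requested 40,000 steps per second, the roof is 2.5 ticks, so ten calls to tick() produce four steps, alternating between gaps of three and two ticks.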

Additionally, a painful thing we have to do, is to update the speed very often, to be able to accelerate, and decelerate.

This happens in a separate (lower priority) timer, at regular, but more rare intervals.

This method gets called periodically and, depending on whether we are accelerating, decelerating, or cruising, changes the speed (or leaves it unchanged) for all moving stepper motors.

This behavior is inherited from grbl and is the source of many approximations/small problems.
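A minimal sketch of what such a periodic speed update can look like. The struct, the enum and the clamping to a target rate are all assumptions made for illustration; they are not taken from Smoothie or grbl.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical state for one move; all rates are in steps per second.
struct SpeedRamp {
    double rate;         // current step rate
    double cruise_rate;  // rate to hold during the cruise phase
    double exit_rate;    // rate we want to reach at the end of the move
    double accel;        // acceleration in steps/s^2
};

enum class Phase { Accelerating, Cruising, Decelerating };

// Called from the slower, lower priority timer every dt seconds.
// Returns the new step rate, which would then be turned into a new roof value.
inline double acceleration_tick(SpeedRamp& r, Phase phase, double dt) {
    assert(dt > 0.0 && r.accel >= 0.0);
    const double delta = r.accel * dt;  // speed change allowed in one interval
    switch (phase) {
        case Phase::Accelerating:
            r.rate = std::min(r.rate + delta, r.cruise_rate);
            break;
        case Phase::Decelerating:
            r.rate = std::max(r.rate - delta, r.exit_rate);
            break;
        case Phase::Cruising:
            break;  // nothing to change
    }
    return r.rate;
}
```

Because the speed only changes once per interval, the step rate is a staircase rather than a smooth ramp, and the coarser the interval, the further the staircase can drift from the planned curve.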

You can end up with deceleration curves reaching zero, or near-zero speed a few steps too early, and instead of a direction change, you get a pause. This currently (Feb 2013) can lead, at high speeds, to «shocky» end of moves, and even missed steps.

Step generation’s current cost

Step generation is what Smoothie spends most of its time doing.

The reason for this is simple:

The more often we increment the Bresenham counter, the more precise the step generation is.
So we do it as often as possible
But doing it takes time, so the less time it takes the more often we can do it.

Basically, we want this to take as little time as possible in order to do it as often as possible.

To measure how long the various things we do take to execute (their cost), we can simply set a pin high at the beginning of a piece of code and low at the end of it, plug in a logic analyzer, and look at the graphs.

This has been extremely useful in the past to figure out where we spent too much time, what needed fixing, etc.
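One way to sketch this pin-toggling trick is a small scope guard that raises a debug pin when it is created and lowers it when it goes out of scope. FakePort below is a stand-in for a real memory-mapped GPIO port so the example can run on a desktop; both class names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a GPIO output register (hypothetical, for illustration only).
struct FakePort {
    uint32_t bits = 0;
    uint32_t rising_edges = 0;  // what a logic analyzer would count
    void set(uint32_t mask) {
        if ((bits & mask) != mask) ++rising_edges;
        bits |= mask;
    }
    void clear(uint32_t mask) { bits &= ~mask; }
};

// Holds a debug pin high for as long as the object is alive.
class ScopedProbe {
public:
    ScopedProbe(FakePort& port, unsigned pin) : port_(port), mask_(mask_for(pin)) {
        port_.set(mask_);
    }
    ~ScopedProbe() { port_.clear(mask_); }
    ScopedProbe(const ScopedProbe&) = delete;
    ScopedProbe& operator=(const ScopedProbe&) = delete;

private:
    static uint32_t mask_for(unsigned pin) {
        assert(pin < 32);
        return uint32_t{1} << pin;
    }
    FakePort& port_;
    uint32_t mask_;
};
```

Putting a `ScopedProbe` at the top of the interrupt handler and another inside tick() would give nested pulses, with the width of each pulse being the time spent in that scope. The toggling itself costs a few cycles, so the measured durations are slightly inflated.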

Step generation graph showing timing analysis

In white, is the X-axis step signal. It is turned on here and turned off here.
In brown is the duration of this interrupt. This is the total time we spend generating steps. It is the time we want to reduce as much as possible. A bit of the rest of the time is spent doing acceleration, all the rest in the main loop, mostly doing Planner math.
In red, which never goes high here, is the duration of this condition, if it ever becomes true; it is not very interesting here. Basically, if we spend so long in one occurrence of this interrupt that it overlaps the next occurrence, we skip the next one, while making sure this does not mess up the rest of the math.
In orange, is the duration of the tick() method of each StepperMotor.
In yellow, is the duration of this condition inside the tick() method, if true. Basically, if we have to generate a step, this turns high.

The width of the screen should represent 10 microseconds.

So, what is important to see here?

When the brown line is high, this means something is happening. All the rest of the time is either acceleration updates or the main loop (mostly doing Planner math). About half of each seems to be a healthy ratio, but we can probably go higher.
The time the brown line stays high is what we want to reduce to a minimum, because the shorter it lasts, the more often we can do it.
You can note that the interrupt does not always take the same time to execute. Here is what it always does:
- For each active StepperMotor
  - Call its tick() method, in there:
  - Increment the Bresenham counter
  - If the counter is higher than the current Bresenham «roof» value (dependent on the speed we want)
    - Generate a step signal, reset the counter, increment the step counter
    - If we stepped the number of steps we were asked, signal the end of the move
It should be noted that the time it takes to run this interrupt depends on the two “ifs” for each StepperMotor. The more conditions are true, the longer we stay.
This can be seen on the orange line. The normal situation is that we spend very little time in each tick() method (just incrementing the Bresenham counter, then leaving). But if the counter is big enough, the condition is true, and we spend more time there, setting the step pin high etc… When this happens can be seen on the yellow line.
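The loop and its two “ifs” can be modelled roughly as follows. This is a simplified sketch: the types are hypothetical, the counter is a plain integer instead of fixed point, and the step signal is just a pulse count.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified motor.
struct Motor {
    uint32_t counter = 0;
    uint32_t ticks_per_step = 1;  // the "roof"
    uint32_t steps_left = 0;

    // Returns true if the move finished on this tick.
    bool tick(uint32_t& step_pulses) {
        if (++counter < ticks_per_step) return false;  // first "if": cheap, common path
        counter = 0;
        ++step_pulses;                                 // stands in for raising the step pin
        return --steps_left == 0;                      // second "if": end of the move
    }
};

struct Ticker {
    std::array<Motor, 3> motors{};
    uint32_t active_bm = 0;    // bit i set means motors[i] is moving
    uint32_t step_pulses = 0;

    void start(unsigned i, uint32_t steps, uint32_t ticks_per_step) {
        assert(i < motors.size() && steps > 0 && ticks_per_step > 0);
        motors[i].counter = 0;
        motors[i].ticks_per_step = ticks_per_step;
        motors[i].steps_left = steps;
        active_bm |= uint32_t{1} << i;
    }

    // One occurrence of the stepping interrupt.
    void tick() {
        for (unsigned i = 0; i < motors.size(); ++i) {
            const uint32_t bit = uint32_t{1} << i;
            if ((active_bm & bit) && motors[i].tick(step_pulses)) {
                active_bm &= ~bit;  // move finished: stop ticking this motor
            }
        }
    }
};
```

Most calls take only the cheap path (increment, compare, return), which is why the loop overhead around tick() matters so much.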

Now the very interesting stuff.

We are here to hunt waste. We want this to run as fast as possible. And looking at the curves, we clearly have a problem (not sure it has a solution).

The orange line is the time we spend doing useful work. The work we are here to do.

The brown line is the total time it actually takes us to do that work. As you can see, we spend a lot of time doing stuff that is not directly useful (though it may be necessary). That would be iterating over the active StepperMotors list, calling methods, etc… maintenance, if you like.

We want that to last as little time as possible!

There are two situations here:

When we do not output a step signal (the short brown duration)
When we output one or more step signals (the long brown duration)

The additional duration of the second one compared to the first is due to the fact that we have a second timer interrupt, with a higher priority, to turn the step pin low after one microsecond. This interrupts this interrupt to turn the pin off, making this interrupt last longer. I think that’s the main cause of the longer execution time.

Now when we do not output a step signal, the total interrupt time is still much more than the useful (orange) time.

It would probably be interesting to look at the assembly to see how long we spend doing each thing.

What it looks like, in the code.

This is schematic C++ representing what happens in your typical, short step interrupt.

// This is where work is done
extern "C" void TIMER0_IRQHandler (void){

    // If no axes enabled, just ignore for now. This costs us a tiny bit of time
    if( global_step_ticker->active_motor_bm == 0 ){ return; }

    // We set the timer to a very high value so we don't overflow the timer if this takes too long
    LPC_TIM0->MR0 = 2000000;

    // This calls the tick method.
    global_step_ticker->tick();

    // Let's inline it below:
    _isr_context = true;
    int i;
    uint32_t bm;

    // This is your usual loop. I have no idea how costly it is. It seems like it is costly as we don't do much else.
    for (i = 0, bm = 1; i < 12; i++, bm <<= 1){
        if (this->active_motor_bm & bm){

            // We call the tick() method for each StepperMotor.
            this->active_motors[i]->tick();

            // Let's inline this below:
            { // Inlined body of StepperMotor::tick(); "this" below refers to the StepperMotor

                // Increase the (fixed point) counter by one tick
                this->fx_counter += (uint64_t)((uint64_t)1<<32);

                // If we are to step now.
                if( this->fx_counter >= this->fx_ticks_per_step ){

                      // Here we don't care about the case where we do. We care about the time we waste before and after this.

                }
            }
        }
    }

    // The iteration over the active StepperMotors is now finished, all useful work is done
    _isr_context = false;

    // Return to the main interrupt function

    // If we did set a pin high, we want to set the other timer to set it low one microsecond from now
    if( global_step_ticker->reset_step_pins ){
        // But we don't care about the case where we did here. Still the check is expensive
        LPC_TIM1->TCR = 3;
        LPC_TIM1->TCR = 1;
        global_step_ticker->reset_step_pins = false;
    }

    // If a move finished in this tick, we have to tell the actuator to act accordingly. This is not happening here either as we did not generate a step.
    if( global_step_ticker->moves_finished ){ global_step_ticker->signal_moves_finished(); }

    // If we spent too much time inside the interrupt. This should probably never happen if the previous condition was not true.
    if( LPC_TIM0->TC > global_step_ticker->period ){ 
        // We don't care, not happening
    }

    // This is just a security to make sure we never miss our match register
    while( LPC_TIM0->TC > LPC_TIM0->MR0 ){
        LPC_TIM0->MR0 += global_step_ticker->period;
    }

}

// And that's it
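The final safety loop can be looked at on its own. The sketch below models only that loop, with plain integers standing in for the timer registers; the function name is made up, and counter wrap-around is ignored.

```cpp
#include <cassert>
#include <cstdint>

// tc:     current timer count (stands in for LPC_TIM0->TC)
// mr0:    current match value (stands in for LPC_TIM0->MR0)
// period: timer counts between two stepping interrupts
// Returns the match value pushed forward, one period at a time,
// until it is no longer behind the counter.
inline uint32_t next_match(uint32_t tc, uint32_t mr0, uint32_t period) {
    assert(period > 0);
    while (tc > mr0) {
        mr0 += period;
    }
    return mr0;
}
```

For example, with a period of 100 counts, a match value of 100 and a counter already at 250, the match moves to 300: the occurrences at 100 and 200 are skipped, but the next interrupt still lands on a multiple of the period.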

Several optimizations found and applied:

Exit the interrupt early if no step was generated.
Disable the interrupt when not moving.
Inline the StepperMotor’s tick() function.

These are optimizations that are most useful in the case we don’t do anything. Applying them gives us the following signals/durations:

Optimized step generation graph showing improved performance

Compared to the previous graph, we now spend significantly less time in the interrupt when no step is generated, and a bit less time when one or more steps are generated.

V1 vs V2 Step Generation Comparison

The stepping and acceleration architecture has evolved significantly between V1 and V2:

V1 Step Generation

Hardware Capabilities:

Maximum step rate: 100 kHz
Microstepping: Up to 1/32
Processor: NXP LPC1769 (100-120 MHz)
Timer: Hardware timer interrupt at fixed frequency
Processing: Modest CPU headroom

Configuration: Set the step frequency and acceleration using:

- Default 100 kHz
- Default acceleration in mm/s²

Limitations:

Fixed stepping frequency at 100 kHz provides good precision but limits maximum speed at high microstepping ratios
Coarse microstepping (max 1/32) means lower resolution at maximum speeds

V2 Step Generation

Hardware Capabilities:

Maximum step rate: 200 kHz (2× faster than V1)
Microstepping: Up to 1/256 with interpolation (8× finer than V1)
Processor: STM32H745 (480 MHz M7 core, 8.2× DMIPS faster than V1)
Timer: Hardware timer with <1% jitter
Processing: Extensive CPU headroom for advanced features

Configuration: Set the step frequency and acceleration using:

- Default 200 kHz
- Default acceleration in mm/s²
- Configurable pulse width (1-3+ microseconds)

Improvements:

2× faster stepping (200 kHz vs 100 kHz) enables fine microstepping at full speed
1/256 microstepping (vs 1/32) provides 8× finer resolution
Configurable step pulse width for driver compatibility
Lower jitter timing for more precise motion control
Enables: 1/64 microstepping at full speed, fast rapids (30,000 mm/min), smooth delta motion
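To see how step rate, microstepping and speed relate, here is a small worked calculation. The mechanics are assumed for illustration only (a 200 full steps per revolution motor moving the axis 40 mm per revolution); none of these numbers come from Smoothie's configuration.

```cpp
#include <cassert>

// Steps needed to move the axis by one millimetre.
constexpr double steps_per_mm(double full_steps_per_rev, double microsteps, double mm_per_rev) {
    return full_steps_per_rev * microsteps / mm_per_rev;
}

// Step rate (Hz) needed to move at a given speed.
inline double required_step_rate(double speed_mm_per_min, double spmm) {
    assert(speed_mm_per_min >= 0.0 && spmm > 0.0);
    return speed_mm_per_min / 60.0 * spmm;
}

// Highest speed (mm/min) reachable with a given maximum step rate.
inline double max_speed_mm_per_min(double max_step_hz, double spmm) {
    assert(max_step_hz >= 0.0 && spmm > 0.0);
    return max_step_hz / spmm * 60.0;
}
```

Under these assumed mechanics, 1/64 microstepping gives 320 steps/mm, so 30,000 mm/min needs 160 kHz: within a 200 kHz limit but beyond a 100 kHz one, where the same axis tops out at 18,750 mm/min.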

Performance Summary

Metric	V1	V2	Improvement
Maximum step rate	100 kHz	200 kHz	2× faster
Maximum microstepping	1/32	1/256	8× finer
Processor speed	100 MHz	480 MHz	4.8× faster
CPU headroom	Limited	Extensive	Enables more features

The increased step rate and finer microstepping in V2, combined with significantly more CPU power, enables much smoother motion at higher speeds and supports advanced features like sensorless homing and dynamic load compensation on the stepper drivers.
