STM32G4: CORDIC vs sinf() for Motor Control


Posted on April 14, 2020 by admin

Today I decided to turn on the CORDIC in my STM32G431 and compare its speed at calculating sines and cosines against the sinf() function from math.h. I also wanted to see how much the CORDIC can speed up the DQ transforms used in motor control.

First up: evaluating sinf. We declare a float s and assign it the value of sinf(theta). To measure how long this takes, we set PA7 high before the call and low after it, then look at the pulse width on a scope. Here is the code:

GPIOA->ODR |= GPIO_ODR_OD7;   // turn on PA7
float s = sinf(theta);
GPIOA->ODR &= ~GPIO_ODR_OD7;  // turn off PA7
WriteDACch1(theta/6.283185f);
WriteDACch2((s/2.0f) + 0.5f );

The results of this were interesting. sinf takes a surprisingly variable amount of time to execute, between 500 ns and almost 4 us.

The varying execution time actually caused some weird effects in my loop, making it overrun sometimes. So, taking the worst case: sinf: 3.6 us.

Next up was to test the DQ transforms implemented with sinf. These transforms use some interesting trig identities to reduce the number of sines and cosines which have to be calculated. Here was the code:

GPIOA->ODR |= GPIO_ODR_OD7;
float c = cosf(theta);
float s = sinf(theta);
i_d = 0.667f*( c*i_a + ( 0.866f*s-.5f*c)*i_b + (-0.866f*s-.5f*c)*i_c);  
i_q = 0.667f*(-s*i_a - (-0.866f*c-.5f*s)*i_b - ( 0.866f*c-.5f*s)*i_c);
GPIOA->ODR &= ~GPIO_ODR_OD7;

Result: 8 us worst case. Still quite variable, with a minimum of about 3.5 us.

Next up was the non-optimized transforms. I decided to try these to see how bad they were. Additionally, if sines and cosines are free, it may actually be faster to calculate the transforms this way. Here was the code:

GPIOA->ODR |= GPIO_ODR_OD7;
i_d = (2.0f/3.0f)*(i_a*cosf(theta) + i_b*cosf(theta+2.094f) + i_c*cosf(theta-2.094f));
i_q = (2.0f/3.0f)*(i_a*sinf(theta) + i_b*sinf(theta+2.094f) + i_c*sinf(theta-2.094f));
GPIOA->ODR &= ~GPIO_ODR_OD7;

Result: still highly variable. The maximum time this transform took was 14 us.

Next up: Turning on the CORDIC.

The CORDIC is actually incredibly simple to turn on, having only three registers in total: the control register, data in, and data out. However, making these registers work turned out to be incredibly, incredibly frustrating, mostly because I had never worked with q1.15 format before. But eventually the registers succumbed and the CORDIC gave reasonable answers. I chose to use the CORDIC in 16-bit mode, because that is precise enough for what I need to do. Additionally, 32-bit mode requires two sequential writes to the same register, which is weird.
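For anyone else meeting q1.15 for the first time, here is a minimal host-side sketch of the conversions. It assumes the usual convention that a signed 16-bit value x stands for x/32768, and that angles are scaled so the full 16-bit range covers [-pi, pi). The helper names are hypothetical, not from any ST library.

```c
#include <stdint.h>

/* Radians -> q1.15 angle, where the int16 range spans [-pi, pi).
   Going through int32 and then uint16 makes angles outside that range wrap
   around. The final uint16 -> int16 step is implementation-defined in C;
   GCC and Clang wrap, which is what we want here. */
static int16_t angle_to_q15(float theta) {
    int32_t scaled = (int32_t)(theta * 10430.378f); /* 32768 / pi */
    return (int16_t)(uint16_t)scaled;
}

/* q1.15 -> float in [-1, 1). */
static float q15_to_float(int16_t x) {
    return (float)x / 32768.0f;
}

/* float -> q1.15, saturating because +1.0 itself is not representable. */
static int16_t float_to_q15(float x) {
    int32_t v = (int32_t)(x * 32768.0f);
    if (v > 32767) v = 32767;
    if (v < -32768) v = -32768;
    return (int16_t)v;
}
```

So 0.5 becomes 16384, 1.0 saturates to 32767 (0x7fff), and an angle of 3*pi/2 wraps around to the same code as -pi/2.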

Here is the code which made the CORDIC spit out reasonable results. It took about 380 ns to execute, a full order of magnitude improvement over the worst-case time of the sinf function. Additionally, the CORDIC calculates sine and cosine simultaneously!

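GPIOA->ODR |= GPIO_ODR_OD7;                  // PA7 high: start of timed region
volatile unsigned int cordicin = 0x7fff0000; // upper half: magnitude = ~1.0 in q1.15
short rth = (short)(int)(theta*10430.0f);    // radians -> q1.15 angle (32768/pi ~= 10430)
cordicin |= (unsigned short)rth;             // lower half: angle, no sign extension into the magnitude
CORDIC->WDATA = cordicin;                    // writing the argument starts the calculation
unsigned int out0 = CORDIC->RDATA;           // both results come back packed in one word
short out1 = (short)(out0 >> 16);            // upper half of the result
short out3 = (short)(out0 & 0xffff);         // lower half of the result
float ss2 = (float)out3/32768.0f;            // q1.15 -> float
GPIOA->ODR &= ~GPIO_ODR_OD7;                 // PA7 low: end of timed region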

It turns out that the CORDIC run itself and all the associated bit shifting between shorts takes only 200 ns, and that the additional 180 ns is spent converting between floats and shorts.
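Since nearly half of that time goes to float conversion, one option is to leave the CORDIC output in q1.15 and do the multiplies in integers. Below is a minimal sketch of a q1.15 multiply; mul_q15 is a hypothetical helper, and it assumes the value being scaled is small enough (for example a 12-bit ADC reading) that the 32-bit product cannot overflow.

```c
#include <stdint.h>

/* Multiply a signed integer by a q1.15 factor, rounding to nearest.
   Adding 1 << 14 before the shift is the rounding step. Right-shifting a
   negative value is implementation-defined in C; GCC and Clang on ARM use
   an arithmetic shift, which this sketch relies on. */
static int32_t mul_q15(int32_t value, int16_t factor_q15) {
    return (value * (int32_t)factor_q15 + (1 << 14)) >> 15;
}
```

For example, mul_q15(1000, 16384) scales 1000 by 0.5 and gives 500, with no float in sight.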

CORDIC calculating a cosine:

I packed it into a function. Sadly, declaring the variables and calling this function bumps the computation time up to about 0.8 us. For maximum blazing speed, the way to go would definitely be to keep everything in integer land, as the ADC readings are in integer form anyways. But floats are nice and easy to think about, so we’ll stick with them for now. Here is the code:

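void RunCordic(float theta, float *cos_out, float *sin_out) {
	unsigned int cordicin = 0x7fff0000;              // upper half: magnitude = ~1.0 in q1.15
	short thetashort = (short)(int)(theta*10430.0f); // radians -> q1.15 angle, wraps past +/-pi
	cordicin |= (unsigned short)thetashort;          // lower half: angle, no sign extension into the magnitude
	CORDIC->WDATA = cordicin;                        // writing the argument starts the calculation
	unsigned int out0 = CORDIC->RDATA;               // both results come back packed in one word
	short out2 = (short)(out0 >> 16);                // upper half: sine
	short out1 = (short)(out0 & 0xffff);             // lower half: cosine
	*cos_out = (float)out1/32768.0f;                 // q1.15 -> float
	*sin_out = (float)out2/32768.0f;
}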
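One thing to watch when packing the argument word: adding a negative 16-bit angle to a 32-bit word sign-extends it, which borrows one count from the magnitude in the upper half. A minimal host-side sketch of packing and unpacking with explicit masks follows; the helper names are hypothetical, and the layout (magnitude in the upper half, angle in the lower) is the one used in the code above.

```c
#include <stdint.h>

/* Build the 32-bit argument word: magnitude in the upper half, angle in the
   lower half. Casting each half to uint16_t first keeps a negative angle
   from spilling into the magnitude bits. */
static uint32_t cordic_pack(int16_t magnitude_q15, int16_t angle_q15) {
    return ((uint32_t)(uint16_t)magnitude_q15 << 16) | (uint16_t)angle_q15;
}

/* Split a 32-bit result word into its two signed q1.15 halves. */
static void cordic_unpack(uint32_t word, int16_t *low, int16_t *high) {
    *low  = (int16_t)(uint16_t)(word & 0xffffu);
    *high = (int16_t)(uint16_t)(word >> 16);
}
```

With an angle of -16384 (that is, -pi/2), cordic_pack(0x7fff, -16384) gives 0x7fffc000, whereas a plain 32-bit add of the sign-extended angle would give 0x7ffec000.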

For motor control purposes, the transforms need sines and cosines of three angles, one per phase. The first version of the transforms uses some trig identities to require only one sine and one cosine, at the cost of more floating point multiplies. The second version requires three sines and three cosines, but this may be faster if sines and cosines are fast to calculate. Since the CORDIC returns a sine and a cosine together, it makes sense to calculate each pair in one go.

I wrote a function to run the CORDIC three times with the correct phasing done in integer math, to save some more time. This function plus the declaration of float outputs takes only 2 us to run. I paired this function with the version of the transforms which uses three sines.

I tested the full Park and Clarke transforms in the loop, both as they were originally and with the CORDIC code. The original Park and Clarke transforms combined take a total of 9 us. This is pretty impressive: since the DQ code measured above took 8 us, that leaves only about 1 us for the entire Clarke transform. With the CORDIC, drumroll please….. 3.5 us!!
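Doing the phasing in integer math could look something like the sketch below. It is not the function from my loop, just an illustration: the name is made up, the angle is assumed to already be in q1.15 (where 65536 counts make one full turn), and the +/- assignment follows the three-sine transform code earlier in the post (phase b at theta + 2*pi/3, phase c at theta - 2*pi/3).

```c
#include <stdint.h>

#define Q15_TWO_PI_OVER_3 21845u /* 65536 / 3: 2*pi/3 in q1.15 angle counts */

/* Given the rotor angle in q1.15, produce the three phase angles.
   Truncating to uint16_t wraps modulo 65536, which is exactly one turn,
   so no range checks or fmodf calls are needed. */
static void three_phase_angles(int16_t theta_q15, int16_t out[3]) {
    uint16_t t = (uint16_t)theta_q15;
    out[0] = (int16_t)t;
    out[1] = (int16_t)(uint16_t)(t + Q15_TWO_PI_OVER_3);
    out[2] = (int16_t)(uint16_t)(t - Q15_TWO_PI_OVER_3);
}
```

Each of the three angles can then be written to the CORDIC in turn. The offsets cost one integer add each, compared with a float add plus a float-to-integer conversion per phase.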

The final test I did was just using the CORDIC once and using the original optimized transforms as they were. It turned out that this was actually faster than the previous method, taking only about 2.8 us. I guess floats are actually just really fast to calculate. Yay! That's a reduction of about 3x over the original transforms, a saving of 6 us of loop time. Pretty cool! With a 20 kHz loop (a 50 us period), that’s 12%. Not a ton, but enough that it’s worth doing.

Cool pic of cosines and sines.

Happy G4’ing!!

