April 2020

Introducing: the Hybrid-Electric Moped!


Posted on April 26, 2020 by admin

Hey all, this project has been a long time in the making. Here goes.

Over my senior year, I built the electric bike, which somehow turned out to be awesome. I estimate it has covered over a thousand miles, and it has literally never left me stranded- not once. It’s small enough to fit on the commuter rail, yet produces enough torque to literally do a backflip. It is the perfect fun city ride. Yet, the electric bike does leave some things to be desired. Over the summer of 2019, I decided on some features that I wanted in a new vehicle:

  • Suspension. A lack of suspension is OK for short distance, and in fact even beneficial because it makes the bike simpler and lighter. However, after 20 miles of no suspension, it really starts to get painful.
  • 50 mile range. The bike’s 20 miles is enough to explore the Boston area, but it runs on fumes by the time it reaches Lincoln. Western Mass is out of reach.
  • Higher speeds. The electric bike will hit 50 (just barely), but it is terrifying. It comfortably cruises around 25. I would like my new vehicle to cruise at 35 ish to keep up with traffic.
  • Waterproofing. I want to be able to ride through puddles, a feature the electric bike does not have. Waterproofing is a big headache: EVERYTHING must be enclosed, you need o-ringed connectors, bearings must be sealed, etc. But this is necessary for a daily driver vehicle.
  • Looks dank. I want this thing to look sciencey. Electric bike looks like a dumpster fire, not sleek.

With these requirements laid out, I looked for a donor frame. The strategy of “find a frame and attach motor” worked out really well on the electric bike, as I am not really interested in making frames. Also, it turns out that you can get a really, really well made frame for only several dollars. Bike frames were more or less figured out in 1900. Several options existed for a frame: Mountain bike, Motorcycle, or Moped. I ultimately settled on moped for a few reasons:

  • Bike frames are not designed to take motors. Either you end up with a motor tumor mid drive thing or a motor tumor on the side.
  • I wanted something that was a little heavier than a bike, for a slightly smoother ride at higher speeds.
  • Motorcycles are scary. A motorcycle is happiest doing 80+ on the highway, which I am not interested in doing- I only want to do 35!
  • Additionally, motorcycles are really big and heavy- most motorcycles are about 500 lbs if not more. The motor and battery requirements become fairly substantial and expensive at motorcycle power levels.

A moped seemed to be the correct choice, as it is designed to take a motor and is not a motorcycle. So, onto the craigslist!!! I looked for a moped with a tube frame. A lot of mopeds have stamped frames with integrated gas tanks. I find the stamped frames very ugly and also I do not need the gas tank.

Eventually I found exactly what I was looking for- a Kreidler MP-19 with a seized motor, for $150. I went all the way out to western mass to pick it up.

I sold the gas tank and the motor and recouped $50, so all in all ended up with it all for $100. I cleaned it up as best as I could.

Next up was coming up with a vague plan for the battery and motor. The motor I was interested in using was a bar wound motor, the 2007 Toyota Camry A/C compressor motor. This motor has a 30% longer stator stack than the Prius motor and a flux linkage about 10% higher, but has only 2/3 the resistance because it is bar wound. Therefore the motor constant is a full 35% higher!! I estimate this motor is good for about 10 Nm peak. This motor demands a bus voltage of at least 200 V for good performance.
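The 35% figure follows from the motor constant scaling with flux linkage divided by the square root of winding resistance. A quick check in Python, taking the ratios quoted above as inputs (the function name is just for illustration, not from any library):

```python
import math

def motor_constant_ratio(flux_ratio, resistance_ratio):
    """Ratio of motor constants, using Km proportional to flux_linkage / sqrt(R)."""
    return flux_ratio / math.sqrt(resistance_ratio)

# Camry motor relative to the Prius motor, per the numbers in the text:
# flux linkage about 10% higher, resistance about 2/3.
ratio = motor_constant_ratio(flux_ratio=1.10, resistance_ratio=2.0 / 3.0)
print(f"motor constant ratio: {ratio:.3f}")
```

Most of the gain comes from the lower resistance: 1/sqrt(2/3) alone is worth about 22%.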

This left choosing the battery, which proved to be slightly more challenging. On the electric bike, the perfect pack made of Greenworks packs sort of fell into my lap. The electric bike uses four 4Ah 40V packs in series for a total of 160V and 600Wh. For the moped, I needed the pack to be substantially larger, because I would like longer range at higher sustained speeds on a larger and heavier vehicle. MIT’s FSAE team luckily had a few barely used 16S9P packs lying around, so I snapped them up. These packs had some weird issues where they would short to their aluminum cooling plates, so we used big force to rip the cooling plates off. Problem solved! My average power draw is low enough that these packs will not need any special cooling.

Ripping off what is left of the cooling plate.

The original plan was to run one of these packs and a 4x booster for a final bus voltage of 240V. This plan was not ideal, however, as it is difficult to make a 4x boost converter with high efficiency, and the pack was still kind of small at only 1.6kWh. After looking sadly at the rest of the FSAE batteries, which were destined for the trash, I eventually revised my plans to run about 1.6 of these FSAE modules for a total of 26S 9P followed by just a doubler, for a final bus voltage of about 200 V. While on the low end, this is fine for what I need. This plan required sawing a battery in half, but surprisingly this was accomplished without much difficulty.
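As a rough sanity check on those numbers, here is a small Python sketch. The 3.7 V nominal cell voltage is my assumption (the post does not state it), and the energy figure simply scales the 1.6kWh quoted for a 16S9P module by the series count.

```python
NOMINAL_CELL_V = 3.7  # assumption: typical Li-ion nominal voltage, not stated in the post

def pack_voltage(series_cells, cell_v=NOMINAL_CELL_V):
    """Nominal pack voltage for a given series cell count."""
    return series_cells * cell_v

def scaled_energy_kwh(series_cells, ref_series=16, ref_kwh=1.6):
    """Energy scaled from the 16S9P module, keeping the same 9P parallel count."""
    return ref_kwh * series_cells / ref_series

bus_v = 2 * pack_voltage(26)      # 26S pack followed by a voltage doubler
energy = scaled_energy_kwh(26)
print(f"bus ~{bus_v:.0f} V, pack ~{energy:.1f} kWh")
```

Under that assumption the doubled 26S pack lands a little under 200 V nominal, with about 2.6 kWh on board.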

At the moment, the project is more or less on hold due to coronavirus. I have been working on getting some little OLED screens working, which will eventually become the dashboard. These little OLED screens are nice, but take about 100 ms to write a whole frame buffer because they talk over I2C which is extremely slow. Getting these to work on a Nucleo and not bog everything down requires extensive utilization of the DMA, which has been a cool coding experience.
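The 100 ms figure is about what the bus arithmetic predicts. A back-of-envelope sketch in Python, assuming a 128x64 monochrome panel on standard-mode 100 kHz I2C with 9 clock bits per byte (8 data plus ACK); neither the panel size nor the bus speed is stated in the post:

```python
def i2c_frame_time_ms(width_px, height_px, bus_hz, bits_per_byte=9):
    """Time to push one 1-bit-per-pixel frame buffer over I2C, ignoring addressing overhead."""
    frame_bytes = width_px * height_px // 8
    return frame_bytes * bits_per_byte / bus_hz * 1000.0

print(f"100 kHz: {i2c_frame_time_ms(128, 64, 100_000):.1f} ms per frame")
print(f"400 kHz: {i2c_frame_time_ms(128, 64, 400_000):.1f} ms per frame")
```

Either way the transfer is far too long to block on, which is why handing it to the DMA makes sense.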

Stay tuned!



STM32G4: CORDIC vs sinf() for Motor Control


Posted on April 14, 2020 by admin

Today I decided to turn on the CORDIC in my STM32G431 and compare its speed at calculating sines and cosines to the sinf() function in the math.h library. I also decided to evaluate how much the CORDIC speeds up the DQ transforms used in motor control.

First up: evaluating sinf. We declare a float s and assign it the value of sinf(theta). We measure the time it takes by scoping PA_7. Here is the code:

GPIOA->ODR |= GPIO_ODR_OD7;   // turn on PA7
float s = sinf(theta);
GPIOA->ODR &= ~GPIO_ODR_OD7;  // turn off PA7
WriteDACch1(theta/6.283185f);
WriteDACch2((s/2.0f) + 0.5f );

The results of this were interesting. sinf takes a surprisingly variable amount of time to execute, between 500 ns and almost 4 us.

The changing length actually caused some weird effects with my loop, causing it to overrun sometimes. So: sinf: 3.6 us.

Next up was to test the DQ transforms implemented with sinf. These transforms use some interesting trig identities to reduce the number of sines and cosines which have to be calculated. Here was the code:

GPIOA->ODR |= GPIO_ODR_OD7;
float c = cosf(theta);
float s = sinf(theta);
i_d = 0.667f*( c*i_a + ( 0.866f*s-.5f*c)*i_b + (-0.866f*s-.5f*c)*i_c);  
i_q = 0.667f*(-s*i_a - (-0.866f*c-.5f*s)*i_b - ( 0.866f*c-.5f*s)*i_c);
GPIOA->ODR &= ~GPIO_ODR_OD7;

Result: 8 us. Still quite variable, but at a minimum 3.5 us or so.

Next up was the non-optimized transforms. I decided to try these to see how bad they were. Additionally, if sines and cosines are free, it may actually be faster to calculate the transforms this way. Here was the code:

GPIOA->ODR |= GPIO_ODR_OD7;
i_d =  (2.0f/3.0f)*(i_a*cosf(theta) + i_b*cosf(theta-2.094f) + i_c*cosf(theta+2.094f));  // same phase order and sign as the identity version
i_q = -(2.0f/3.0f)*(i_a*sinf(theta) + i_b*sinf(theta-2.094f) + i_c*sinf(theta+2.094f));
GPIOA->ODR &= ~GPIO_ODR_OD7;

Result: still highly variable. The maximum time this transform took was 14 us.
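The identity-based transform and the direct three-angle transform should give the same answer, which is easy to check off-target. This is a minimal Python sketch, not the firmware: the function names are made up, exact constants replace the rounded 0.866 and 0.667, and the direct form assumes the b phase lags a by 120 degrees, which is the ordering the identity-based coefficients imply.

```python
import math

K = 2.0 * math.pi / 3.0      # 120 degrees
SQRT3_2 = math.sqrt(3.0) / 2.0

def dq_identities(theta, i_a, i_b, i_c):
    """One sine and one cosine, expanded with angle-sum identities."""
    c, s = math.cos(theta), math.sin(theta)
    i_d = (2.0 / 3.0) * (c * i_a + (SQRT3_2 * s - 0.5 * c) * i_b + (-SQRT3_2 * s - 0.5 * c) * i_c)
    i_q = (2.0 / 3.0) * (-s * i_a - (-SQRT3_2 * c - 0.5 * s) * i_b - (SQRT3_2 * c - 0.5 * s) * i_c)
    return i_d, i_q

def dq_direct(theta, i_a, i_b, i_c):
    """Three sines and three cosines, one pair per phase."""
    i_d = (2.0 / 3.0) * (i_a * math.cos(theta) + i_b * math.cos(theta - K) + i_c * math.cos(theta + K))
    i_q = -(2.0 / 3.0) * (i_a * math.sin(theta) + i_b * math.sin(theta - K) + i_c * math.sin(theta + K))
    return i_d, i_q

# Balanced unit currents aligned with the d axis should map to i_d = 1, i_q = 0.
theta = 0.7
currents = (math.cos(theta), math.cos(theta - K), math.cos(theta + K))
print(dq_identities(theta, *currents))
print(dq_direct(theta, *currents))
```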

Next up: Turning on the CORDIC.

The CORDIC is actually incredibly simple to turn on, having only three registers in total: the control register, data in, and data out. However, making these registers work turned out to be incredibly, incredibly frustrating, mostly because I had never worked with q1.15 format before. But eventually the registers succumbed and the CORDIC gave reasonable answers. I chose to use the CORDIC in 16-bit mode, because that is precise enough for what I need to do. Additionally, 32-bit mode requires two sequential writes to the same register, which is weird.

Here was the code which made the CORDIC spit out reasonable results. This code took about 380 ns to execute, a full order of magnitude improvement from worst case time of the sinf function. Additionally, the CORDIC calculates sine and cosine simultaneously! Here is the code:

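GPIOA->ODR |= GPIO_ODR_OD7;                   // turn on PA7
volatile unsigned int cordicin = 0x7fff0000;  // upper half: magnitude 0x7fff (about 1.0 in q1.15)
int rth = theta*10430;                        // radians -> q1.15 angle (32768/pi is about 10430)
cordicin |= (rth & 0xffff);                   // lower half: angle, masked so it cannot disturb the magnitude
CORDIC->WDATA = cordicin;                     // writing the argument starts the calculation
unsigned int out0 = CORDIC->RDATA;            // read returns once the result is ready
short out1 = (out0&0xffff0000)>>16;           // upper half: sine (unused here)
short out3 = out0&0xffff;                     // lower half: cosine
float ss2 = (float)out3/32768.0f;             // q1.15 -> float
GPIOA->ODR &= ~GPIO_ODR_OD7;                  // turn off PA7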

It turns out that the CORDIC run itself and all the associated bit shifting between shorts takes only 200 ns, and that the additional 180 ns is spent converting between floats and shorts.

CORDIC calculating a cosine:

I packed it into a function. Sadly, declaring the variables and running this function bumps the computation time up to about 0.8 us. For maximum blazing speed, the best way would definitely be to keep everything in integer land, as the ADC readings are in integer form anyway. But floats are nice and easy to think about, so we’ll stick with them for now. Here is the code:

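void RunCordic(float theta, float *cos_out, float *sin_out) {
	unsigned int cordicin = 0x7fff0000;               // upper half: mag = 1 (0x7fff in q1.15)
	short thetashort = (short)(int)(theta*10430.0f);  // radians -> q1.15 angle; the cast to short wraps it
	cordicin |= (unsigned short)thetashort;           // lower half: angle, masked so a negative angle leaves the magnitude alone
	CORDIC->WDATA = cordicin;                         // writing the argument starts the CORDIC
	unsigned int out0 = CORDIC->RDATA;                // read returns once the result is ready
	short out2 = (out0&0xffff0000)>>16;               // upper half: sine
	short out1 = out0&0xffff;                         // lower half: cosine
	*cos_out = (float)out1/32768.0f;                  // q1.15 -> float
	*sin_out = (float)out2/32768.0f;
}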
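Since q1.15 was the frustrating part, here is the packing and unpacking worked through in Python, where it is easy to poke at. This is a sketch of the number format only: emulate_cordic is a stand-in built on math.cos and math.sin, not the hardware algorithm, and all the function names are hypothetical.

```python
import math

def to_signed16(halfword):
    """Interpret a 16-bit value as two's-complement."""
    return halfword - 0x10000 if halfword & 0x8000 else halfword

def pack_cordic_arg(theta):
    """Magnitude 0x7fff in the upper half, q1.15 angle (32768 = pi) in the lower half."""
    angle = round(theta * 32768 / math.pi) & 0xFFFF   # wraps to [-pi, pi)
    return (0x7FFF << 16) | angle

def emulate_cordic(word):
    """Stand-in for the peripheral: sine in the upper half, cosine in the lower half."""
    theta = to_signed16(word & 0xFFFF) * math.pi / 32768
    mag = (word >> 16) / 32768
    c = max(-32768, min(32767, round(mag * math.cos(theta) * 32768)))
    s = max(-32768, min(32767, round(mag * math.sin(theta) * 32768)))
    return ((s & 0xFFFF) << 16) | (c & 0xFFFF)

def unpack_q15(word):
    """Split a result word into (cos, sin) floats."""
    return to_signed16(word & 0xFFFF) / 32768, to_signed16(word >> 16) / 32768

cos_out, sin_out = unpack_q15(emulate_cordic(pack_cordic_arg(math.pi / 3)))
print(cos_out, sin_out)
```

Note that the angle wraps for free: anything past pi rolls over into the negative half of the 16-bit range, which is why the firmware can feed it 0 to 2*pi without any range reduction.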

For motor control purposes, the transforms need sines and cosines at three phase angles. The first version of the transforms uses some trig identities to require only one sine and one cosine, at the cost of needing more floating point multiplies and divides. The second version of the transforms requires three sines and three cosines, but this may be faster if sines and cosines are fast to calculate. It is ideal to calculate these together to save some time, so I wrote a function to run the CORDIC three times with the correct phasing done in integer math. This function plus the declaration of float outputs takes only 2 us to run. I paired this function with the version of the transforms which uses three sines.

I tested the full Park and Clarke transforms in the loop, both as they were originally and with the CORDIC code. The original Park and Clarke transforms combined take a total of 9 us. This is pretty impressive as it means the entire Clarke transform only takes 1 us. With the CORDIC, drumroll please….. 3.5 us!!
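Doing the phasing in integer math works because 120 degrees is just a constant in q1.15 angle units, and 16-bit overflow does the wrapping. A Python sketch of the idea, assuming the same 32768 = pi scaling as the CORDIC input; the names are hypothetical and this is not the firmware function:

```python
import math

PHASE_120 = round(32768 * 2 / 3)   # 120 degrees in q1.15 angle units (0x5555)

def to_q15_angle(theta):
    """Radians to a 16-bit q1.15 angle, wrapping modulo 2*pi."""
    return round(theta * 32768 / math.pi) & 0xFFFF

def q15_to_radians(angle):
    """16-bit q1.15 angle back to radians in [-pi, pi)."""
    signed = angle - 0x10000 if angle & 0x8000 else angle
    return signed * math.pi / 32768

def three_phase_angles(theta):
    """Angles for theta, theta - 120 deg, theta + 120 deg using only integer adds."""
    a = to_q15_angle(theta)
    return a, (a - PHASE_120) & 0xFFFF, (a + PHASE_120) & 0xFFFF

for angle in three_phase_angles(0.0):
    print(hex(angle), round(math.cos(q15_to_radians(angle)), 4))
```

The rounding of 120 degrees to 0x5555 costs about a third of an LSB of angle, which is well below the 16-bit CORDIC's own resolution.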

The final test I did was just using the CORDIC once and using the original optimized transforms as they were. It turned out that this was actually faster than the previous method, taking only about 2.8 us. I guess float math is actually just really fast. Yay! A reduction of 3x over the original transforms, a saving of 6 us of loop time. Pretty cool! With a 20 kHz loop (50 us per cycle), that’s 12%. Not a ton, but enough that it’s worth doing.

Cool pic of cosines and sines.

Happy G4’ing!!



Turning on the STM32G431RB


Posted on April 10, 2020 by admin

A while ago I bought some STM32G4 nucleos, with the intent to eventually switch to them for my motor control needs. The G4 series is ST’s newest line of micros, incorporating the rich set of analog peripherals of the F3 series with the clock speed of the F4 series. The G4s have on average more memory than the F3 series, but not as much as the F4s. Here are the datasheets of the three previous micros I have used extensively and some of their features:

STM32F303K8: 72 MHz, 64 KB flash, 16 KB RAM, dual 5 MSPS ADCs, Ultrafast Comparators, and some op amps. Nucleo in LQFP32 package. Easy to solder!

STM32F401RE: 84 MHz, 512 KB flash, 96 KB RAM, single 2.4 MSPS ADC. Nucleo in LQFP64 package.

STM32F446RE: 180 MHz, 512 KB flash, 128 KB RAM, three 2.4 MSPS ADCs, LOTS of communication interfaces. Nucleo in LQFP64 package.

And the newcomer: STM32G431RB: 170 MHz, 128 KB flash, 32 KB RAM, dual 4 MSPS ADCs with hardware oversampling, op amps, comparators, Math Accelerator. Here is the nucleo. It is a little different than the old nucleo, most importantly it lacks the break-off programmer we have become used to seeing.

The G4 has some interesting features which are definitely worth noting:

  • It has a math accelerator which consists of two modules: a filter accelerator (the FMAC) and a CORDIC. Both are super useful for motor control. The FMAC can run FIR filters in hardware and the CORDIC can quickly calculate both a sine and a cosine at the same time, also in hardware!
  • Cool timer features such as timer dithering. I haven’t looked too deeply into this but it could provide some benefits for motor control when running at low duty cycles.
  • It also includes the op-amps and the comparators from the F303, which is cool. So far I haven’t used them for any motor control but I have used the comparators for a boost converter and the op amp for a school project.
  • It includes some resistors on the chip to charge the VBAT battery, which is pretty cool! Useful for data loggers.
  • It is available in the UFQFPN packages! These packages are both very small and surprisingly easy to solder. For my next motor controller, I will likely use the UFQFPN48 package. This will give me a bunch of extra pins to use for LEDs and overvoltage/overcurrent comparators. The UFQFPN48’s footprint of 53 mm^2 is substantially smaller than the LQFP32’s 94 mm^2.

At the time of this writing, the F303K8 is $4.20, the F446 is $7.28, and the G431CB is $4.75, all in quantity 1 on Digikey.

ONWARD- LETS TURN IT ON. Turning on this nucleo is actually quite annoying because it is not Mbed-compatible (yet). Which means that I had no idea what to do.

Bayley recommended this thing called “AC6” which I’d never heard of before. But apparently it is a thing that you can download from the interwebs, so I did. I also downloaded “STM32CubeIDE” as a backup. At the time all I knew about these programs was that they were compilers, but just also somehow had loads and loads of additional buttons as well for no apparent reason. So I plugged in my nucleo and loaded up an example program in AC6, and immediately just started getting random, weird errors:

????? I’m not even programming in java??? I got similarly cryptic errors in STM32CubeIDE. Eventually though I was able to make Cube work about half of the time, well enough to write some code. I was eventually able to get the LED to blink using STM32CubeIDE in ‘Debug’ mode:

After enough head banging and trying random stuff I figured out that the problems with AC6 were coming from the launcher (whatever that is? who knows). It was building OK but not copying that onto the nucleo for whatever reason. I also figured out that I needed to make the IDE spit out a .bin file instead of the .elf file. I then could manually drag and drop said .bin onto the nucleo. I made a crappy “turn on the LED with registers” program andddd……

Nothing happened!

Back to CubeIDE… This is the classic Austin strategy for making firmware stuff work: try two approaches with different programs and just alternate between them (with some yelling) until one works. I loaded up the example program into CubeIDE and saw how they were configuring things, and selectively copied into AC6. Eventually I figured out some real dumb things.

1. the GPIO MODER registers initialize to all ones by default, not all zeros. So when I was trying to write them into 01 by just |= 0b01 they just stayed as 11 and didn’t work.

2. the line “RCC->AHB2ENR |= RCC_AHB2ENR_GPIOAEN” must be called twice. I think it actually just needs to be called once and then wait for one instruction but calling it twice just works.
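The first gotcha is easy to see with plain integers. A small Python sketch of the bit manipulation, using an all-ones reset value as described above (the helper names are made up, and this only models the register as a 32-bit number):

```python
MODER_RESET = 0xFFFFFFFF   # every 2-bit pin field starts as 0b11, per the text

def set_mode_or_only(moder, pin, mode):
    """What `MODER |= mode << (2*pin)` does: it can set bits but never clear them."""
    return moder | (mode << (2 * pin))

def set_mode(moder, pin, mode):
    """Clear the pin's 2-bit field first, then write the new mode."""
    shift = 2 * pin
    cleared = moder & ~(0b11 << shift) & 0xFFFFFFFF
    return cleared | ((mode & 0b11) << shift)

def get_mode(moder, pin):
    return (moder >> (2 * pin)) & 0b11

pin = 5
print(bin(get_mode(set_mode_or_only(MODER_RESET, pin, 0b01), pin)))  # field is still 0b11
print(bin(get_mode(set_mode(MODER_RESET, pin, 0b01), pin)))          # field is now 0b01
```

In C the same fix is the usual clear-then-set pair on the register, one `&= ~mask` followed by one `|= value`.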

And with that things worked!!! Yay. The LED turned on!

Next up was the clock configuration. There are a few settings which must be diddled before the clock speed can be increased. Here is my function:

FLASH->ACR |= FLASH_ACR_LATENCY_8WS;		// adjust number of wait states to 8
RCC->CFGR |= RCC_CFGR_HPRE_3;		// divide sysclk by 2 with AHB prescaler- for some reason this is necessary (page 191)
PWR->CR5 &= ~PWR_CR5_R1MODE;		// clear R1MODE
for (int i = 0; i < 20; i++) {GPIOA->ODR = 0x00;}  // wait
RCC->CFGR &= ~RCC_CFGR_HPRE_3;		// reset sysclk prescaler to 1
for (int i = 0; i < 20; i++) {GPIOA->ODR = 0x00;}  // wait

Then the PLL can be configured to use the HSE and run the core at that juicy 170 MHz:

// =======   configure PLL  =========
RCC->CR |= RCC_CR_HSEON;      // turn on HSE
while (((RCC->CR)&RCC_CR_HSERDY)==0) {}            //wait til HSE is ready
RCC->PLLCFGR |= RCC_PLLCFGR_PLLSRC_0 | RCC_PLLCFGR_PLLSRC_1;  // PLL source = HSE
RCC->PLLCFGR |= RCC_PLLCFGR_PLLM_0 | RCC_PLLCFGR_PLLM_2; // PLL input div (M) = 6
RCC->PLLCFGR |= 85<<RCC_PLLCFGR_PLLN_Pos;
RCC->PLLCFGR |= RCC_PLLCFGR_PLLREN;
RCC->PLLCFGR |= RCC_PLLCFGR_PLLPDIV_1;
RCC->CR |= RCC_CR_PLLON;      // turn on PLL
while (((RCC->CR)&RCC_CR_PLLRDY)==0) {}            //wait til PLL is ready


for (int i = 0; i < 20; i++) {GPIOA->ODR = 0x00;}  // wait
RCC->CFGR |= RCC_CFGR_SW_0 | RCC_CFGR_SW_1;      // use PLL as sysclk
for (int i = 0; i < 20; i++) {GPIOA->ODR = 0x00;}  // wait

All this had to be done with zero debugging at all because I am too stupid to figure out how to use the debugger. But luckily it worked on the first try. I checked this with the MCO output set to SYSCLK/16, by some miracle it output the expected 10.6 MHz. I’m not sure if the “wait by doing 20 GPIOA writes” are necessary but they can’t hurt.
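The 10.6 MHz reading lines up with the PLL settings if you run the divider arithmetic. A Python sketch of that check, assuming a 24 MHz HSE crystal on the Nucleo and the PLLR output divider left at divide-by-2 (the code above never writes PLLR, so both are my assumptions):

```python
HSE_HZ = 24_000_000   # assumption: crystal fitted on the Nucleo board

def pll_sysclk_hz(hse_hz, pllm_field, plln, pllr_div):
    """SYSCLK from the PLL: HSE / M * N / R, where the PLLM field stores (M - 1)."""
    m_div = pllm_field + 1
    vco_hz = hse_hz / m_div * plln
    return vco_hz / pllr_div

# PLLM_0 | PLLM_2 is a field value of 0b0101, i.e. divide by 6; N is 85.
sysclk = pll_sysclk_hz(HSE_HZ, pllm_field=0b0101, plln=85, pllr_div=2)
mco = sysclk / 16
print(f"SYSCLK = {sysclk / 1e6} MHz, MCO = {mco / 1e6} MHz")
```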

I set up the serial which worked after some head scratching. Initially I set up the wrong serial but eventually found the right one which for some reason is the LPUART:

I tried to get printf working but it seemed very confusing. I decided to ignore it for now.

Next up was turning on TIM1 which was not too bad luckily:

Next up was turning on the FPU. For some reason this turned out to be unnecessarily hard. I spent about an afternoon/night calmly browsing the interwebs (read: yelling at the compiler) to find the problem. It turned out that I was missing a bunch of .c files and those were necessary to turn on the FPU. I am also too stupid to figure out how to include files properly, so I copied the entire pile of files straight into the src folder and that solved the problem!

With the proper .c files included I tested by making some random ramps and exponentials out the DAC. Turns out including the proper files makes it work.

Next up was doing something incredibly dumb, which I’ve wanted to do for a while: hardware in the loop boost converter simulator. You have one Nucleo think it’s driving a boost converter, and then you have another Nucleo pretending to be a boost converter. This enables testing without the fear of blowing up expensive hardware.

I’ve wanted to make a boost converter for a while, for charging my electric bike. It is hard to find a 160V charger lying around. A battery charger is just a power supply, which consists of two stages: an isolation stage and a CC/CV stage. The isolation stage is a pain to build because it requires weird optoisolators and stuff like that, and it is actually quite easy to find in the form of a fixed voltage output supply. I plan to just make the CC/CV stage, in the form of a boost converter. You could also have a power supply which outputs 300V followed by a buck converter, but I opted to go the boost route mostly because I found a 48 volt, 500W power supply lying around.

I also decided to include a bonus diode in my battery charger. This eliminates the problem of needing to precharge the charger, as it can be left outputting 0 volts until the battery is plugged in.

After some code writing, it worked! Yellow is the simulated input gate drive signal, blue is the simulated inductor current, and purple is the simulated output capacitor voltage, which ripples due to simulated battery resistance. I was able to get the simulation running at 500 kHz.
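For a feel of what the "pretend boost converter" has to compute each step, here is a rough Python sketch of the idea. It is not the Nucleo code: it uses simple Euler integration, ideal switches, and made-up component values (only the 48 V input and the 500 kHz step rate come from this post), and it includes the series output diode so the battery cannot feed back.

```python
def simulate_boost(duty, steps, dt=2e-6, period_steps=20,
                   v_in=48.0, v_batt=150.0, r_batt=0.5,
                   inductance=1e-3, capacitance=100e-6):
    """Step an ideal boost converter charging a battery; returns (i_l, v_c)."""
    i_l = 0.0        # inductor current, A
    v_c = v_batt     # output capacitor starts at the battery voltage
    on_steps = round(duty * period_steps)
    for n in range(steps):
        switch_on = (n % period_steps) < on_steps
        # Output diode: current only flows from the capacitor into the battery.
        i_batt = max((v_c - v_batt) / r_batt, 0.0)
        if switch_on:
            di = v_in / inductance * dt            # inductor sees the full input voltage
            i_diode = 0.0
        else:
            di = (v_in - v_c) / inductance * dt    # inductor dumps into the output
            i_diode = i_l
        i_l = max(i_l + di, 0.0)                   # boost diode blocks reverse current
        v_c += (i_diode - i_batt) / capacitance * dt
    return i_l, v_c

print(simulate_boost(duty=0.0, steps=5000))   # nothing happens: 48 V cannot push into 150 V
print(simulate_boost(duty=0.7, steps=5000))   # inductor current builds and lifts the output
```

The default dt of 2 us corresponds to the 500 kHz simulation rate; on the real setup the gate signal would come from the other Nucleo instead of the duty argument.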

TLDR: G4s are really cool- expect to see them on my next version of motor controllers. Stay tuned!



Coronavirus Project 1: Bus Voltage Controller


Posted on April 3, 2020 by admin

It’s coronavirus season, which means NO MITERS. RIP. Last week I grabbed my sourdough starter out of the fridge and we shut MITERS down for the long winter.

Back at my apartment with no CNC mill, I’ve been up to exclusively software projects. It turns out my list of software “back burner” ideas is actually quite long, and the insideness has given me a fair amount of time to work on projects from this list. Notable items from this list:

  • Lookup table interpolator firmware for multiple bus voltages
  • Motor controller firmware re-organization because it is currently a garbage fire
  • Boost converter firmware
  • Boost converter board layout
  • OLED screen board layout

I decided to tackle the lookup table interpolator first. For those unfamiliar, the lookup tables in question are used to generate the reference currents for IPM motors at a given speed and torque command. Effectively these lookup tables command a phase angle for the currents. It just seems to be easier to encode this information in D/Q currents instead of magnitude/phase although the information is the same.

Note that there is actually very little Q current on these tables at high speeds, even at high throttles- this table is for a battery voltage of 160V so we’re field weakening quite hard to get to such high ripems.

So the general rule of IPMs is that when you field weaken, you effectively end up in a constant-power regime. For greatest efficiency, you ideally want to field weaken as little as possible: field weakening creates torque at a lower Nm per amp, meaning that more losses are generated per unit torque. Additionally, to generate maximum power on a motor, you should use all the volts you have to generate as much V*I as possible. So, a precise lookup table is important for both power and efficiency. Ideally, the hypotenuse of Vd and Vq should exactly equal the bus voltage.

However, the motor equations are complicated:

Vd = i_d*Rs - w*Lq*i_q

Vq = i_q*Rs + w*Ld*i_d + w*flux_link

The equations alone don’t seem that bad at first, but it turns out that Ld, Lq, and the flux linkage are all functions of i_d and i_q because saturation is a thing. Lq changes most dramatically, decreasing by about 20-40% at high currents.
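To make the equations concrete, here they are as a Python function, with constant inductances (so saturation is ignored). The motor parameters in the example are placeholders I picked for illustration, not measurements of any real motor.

```python
import math

def dq_voltages(i_d, i_q, w, rs, ld, lq, flux_link):
    """Steady-state D/Q voltages from the two motor equations above."""
    v_d = i_d * rs - w * lq * i_q
    v_q = i_q * rs + w * ld * i_d + w * flux_link
    return v_d, v_q

# Placeholder parameters: 50 mOhm, 0.2 mH / 0.4 mH, 50 mWb, 2000 rad/s electrical.
params = dict(w=2000.0, rs=0.05, ld=0.2e-3, lq=0.4e-3, flux_link=0.05)

for i_d in (0.0, -50.0, -100.0):
    v_d, v_q = dq_voltages(i_d, 20.0, **params)
    print(f"i_d = {i_d:6.1f} A -> |V| = {math.hypot(v_d, v_q):.1f} V")
```

With these numbers the voltage magnitude drops as i_d goes more negative, which is the field weakening effect: negative D current cancels part of the w*flux_link term in Vq.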

What about just finding the lookup tables by dynoing? What if we swept over all current phases at every single operating point, taking 5 minutes for every phase sweep?

  • 20 speeds
  • 20 throttle positions
  • 10 bus voltages
  • = a little under 14 days of continuous dynoing. Suffice it to say you should do something a little more intelligent.
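The total above works out if each phase sweep takes about 5 minutes. A quick check in Python, with that per-sweep time as the stated assumption:

```python
def dyno_days(speeds, throttles, bus_voltages, seconds_per_sweep):
    """Total continuous dyno time, in days, for one phase sweep per operating point."""
    sweeps = speeds * throttles * bus_voltages
    return sweeps * seconds_per_sweep / 86400.0

print(f"{dyno_days(20, 20, 10, 300):.1f} days")   # 4000 sweeps at 5 minutes each
```

At 5 seconds per sweep the same 4000-point grid would take under 6 hours, so the per-sweep time dominates the estimate.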

Anyway, generating the lookup tables is hard. More on that in a later post if I ever get around to it. Back to the main content of this post: what to do once you have the lookup table.

The three axes of the lookup table matrix are as follows: speed, throttle command, and bus voltage. Again, the goal of the lookup table is to generate current setpoints such that the hypotenuse of the voltages present is exactly equal to the bus voltage.
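One simple way to handle the bus voltage axis is to keep one speed/throttle table per voltage and blend linearly between the two nearest ones. The Python sketch below shows that idea only; it is an assumption about how such an interpolator could look, not the actual firmware, and the table layout and function name are hypothetical.

```python
from bisect import bisect_right

def interpolate_setpoint(tables, v_bus, speed_idx, thr_idx):
    """Blend (i_d, i_q) between the two tables that bracket v_bus.

    tables maps a bus voltage to a 2D grid indexed [speed][throttle] of (i_d, i_q).
    Voltages outside the tabulated range are clamped to the nearest table.
    """
    voltages = sorted(tables)
    v = min(max(v_bus, voltages[0]), voltages[-1])
    hi = min(bisect_right(voltages, v), len(voltages) - 1)
    lo = max(hi - 1, 0)
    v_lo, v_hi = voltages[lo], voltages[hi]
    frac = 0.0 if v_hi == v_lo else (v - v_lo) / (v_hi - v_lo)
    d_lo, q_lo = tables[v_lo][speed_idx][thr_idx]
    d_hi, q_hi = tables[v_hi][speed_idx][thr_idx]
    return d_lo + frac * (d_hi - d_lo), q_lo + frac * (q_hi - q_lo)

# Tiny made-up tables with a single speed/throttle cell each.
tables = {100: [[(-40.0, 10.0)]], 120: [[(-20.0, 30.0)]]}
print(interpolate_setpoint(tables, 110, 0, 0))
```

The same blend can be repeated along the speed and throttle axes to get full trilinear interpolation.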

I wrote a simple little motor simulator and put in some basic lookup tables. Here I command a slowly increasing throttle value with a bus voltage of 100V.

Here is the voltage magnitude. It increases with throttle until the field weakening kicks in, and then it maintains a level 100 volts, exactly as desired.

But what if instead of increasing throttle slowly, we just bang full throttle? Here is what happens if I clip all voltage magnitudes above 105 V, as they are not physically possible:

You can see that the voltage instantly rails at 105 because it takes volts to slew the inductive load. This causes the controllers to take a poop:

Note the D axis current and its great lack of tracking.
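The clipping step in that experiment can be sketched as follows. I am assuming the clip scales both components together so the voltage angle is preserved; the post does not say exactly how the simulator clips, and the function name is made up.

```python
import math

def clip_voltage(v_d, v_q, v_max):
    """Limit the D/Q voltage vector to v_max while keeping its direction."""
    mag = math.hypot(v_d, v_q)
    if mag <= v_max:
        return v_d, v_q
    scale = v_max / mag
    return v_d * scale, v_q * scale

print(clip_voltage(-90.0, 120.0, 105.0))   # a 150 V request scaled back to 105 V
```

Once the clip is active, the two PI controllers are asking for more volts than they get, which is one way to see why the D axis stops tracking.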

So, anyway, we need a new scheme. The fundamental issue here is that the two controllers are not actually independent, as they are linked by the requirement that the magnitude of the D and Q voltages equal the bus voltage when under field weakening. If we rigidly fix the voltage magnitude, we have only one degree of freedom (the phase) to control the two outputs (the currents).

I tried a number of schemes where you only control one current and let the other be unregulated, but none gave really good performance. Most of the plots looked like this:

After enough messing I got something that was stable at some operating points. The strategy here was to only regulate Q and let D be whatever volts was left. The problem here is that this controller could effectively get “stuck” (as it does here at 20ms in). Even with all the leftover volts on D, it still cannot slew fast enough because the increasing Q current generates a voltage which decreases D current.

So, no luck here.

I thought about this for a few more days and eventually came up with the following bad scheme:

  • run a PI controller on both D and Q as with the naive implementation.
  • Implement a slew rate limit on the throttle command. This alone helps a ton.
  • Relax the voltage magnitude constraint a bit to allow for up to 7% overmodulation. This will decrease efficiency slightly while the throttle slews, but negligibly so.
  • Implement a “bus voltage fudge factor” with a PI controller. This is the bad part.
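The slew rate limit in the second bullet is only a few lines. A Python sketch, assuming throttle is normalized to 0..1 and the loop runs at 20 kHz (the loop rate is my assumption for the example); the 10 ms full-scale slew matches the test described further down.

```python
def slew_limit(previous, target, max_step):
    """Move from previous toward target by at most max_step per call."""
    delta = target - previous
    delta = max(-max_step, min(max_step, delta))
    return previous + delta

LOOP_HZ = 20_000                        # assumed control loop rate
SLEW_TIME_S = 0.010                     # 0-100% throttle in 10 ms
MAX_STEP = 1.0 / (SLEW_TIME_S * LOOP_HZ)

throttle = 0.0
for _ in range(100):                    # 5 ms after banging full throttle
    throttle = slew_limit(throttle, 1.0, MAX_STEP)
print(round(throttle, 3))
```

Because the step is clamped toward the target rather than added blindly, the output never overshoots the commanded throttle.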

What is this nonsense of “bus voltage fudge factor” ??????

It’s pretty simple- normally the lookup tables are generated for a bus voltage of whatever is actually attached. However, in this case I look up the tables with a fudge factor applied to the bus voltage (it is subtracted, so the tables are read as if the bus were slightly lower than it really is). Kinda bad, but also not bad! Additionally, I use a gain scheduler to run the integrator much faster when nearing the limit of the overmodulation. Code:

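# How far the commanded voltage magnitude exceeds the real bus voltage
V_bus_fudge_err = np.hypot(controller.v_d, controller.v_q) - V_bus_actual

# Gain scheduling: integrate 10x faster when the error is large
if abs(V_bus_fudge_err) > 3.5:
    KI_BUSV = 0.5
else:
    KI_BUSV = 0.05

# PI controller on the fudge factor, clamped to 0..10 V
V_bus_fudge_int += V_bus_fudge_err*KI_BUSV
V_bus_fudge_int = np.clip(V_bus_fudge_int, 0, 10)
V_bus_fudge = V_bus_fudge_err*0.5 + V_bus_fudge_int
V_bus_fudge = np.clip(V_bus_fudge, 0, 10)

# Look up the current references as if the bus were lower (100 V is the simulated bus voltage)
(controller.i_d_ref, controller.i_q_ref) = LUTmapper(thr2, 100 - V_bus_fudge)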

This actually seemed to work pretty well. Using a fairly fast throttle slew of 0-100% throttle in 10 ms, the peak voltage reached was 105, and it only did so for one sample. The fudge factor is the red trace.

A D term on the fudge factor controller could improve this, but I believe there is too much noise for this to really work. 105 is good enough! The current controllers track well with this scheme.

This strategy is likely what I will move forward with on future motor control adventures. Time to implement this in C++!!

