How to use inline ASM with WinAVR

I have been working on optimising one of my C programs and needed one function to be as fast as possible. I decided to use inline ASM to achieve this, and to write a few lines about it.

There are a few rules that must be followed. Each ASM statement is divided by colons into three or four parts:

  1. Assembler instructions;
  2. A list of output operands (comma separated);
  3. A list of input operands (comma separated);
  4. A list of clobbered registers (optional, usually left empty).

asm(code : output operand list : input operand list [: clobber list]);

Because of its optimization strategy, the compiler decides which registers are used for the ASM code, and it may even remove inline ASM code whose outputs are never used. To prevent the code from being removed, it is recommended to use the keyword volatile:

asm volatile(code : output operand list : input operand list [: clobber list]);

Let's go through it with some examples.

Let us say we want to enable or disable global interrupts. A simple inline ASM statement will do this:

asm volatile("cli"::);

asm volatile("sei"::);

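An empty instruction (nop) may be inserted like this:

asm volatile("nop ;this is a comment"                 "\n\t"
             "nop ;this inline ASM includes 2 nops"   "\n\t"
             ::);

Note: the "\n" ends each instruction, which matters here because the assembler treats everything after ";" up to the end of the line as a comment. The "\t" only keeps the generated listing neatly indented.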

When inserting inline ASM code into a C program, you can use some special symbols for registers that do not have to be passed as operands:

Symbol         Register
__SREG__       Status register at address 0x3F
__SP_H__       Stack pointer high byte at address 0x3E
__SP_L__       Stack pointer low byte at address 0x3D
__tmp_reg__    Register r0, used for temporary storage
__zero_reg__   Register r1, always zero

Input and output operands are each described by a constraint string followed by a C expression in parentheses:

Constraint  Used for                          Range
a           Simple upper registers            r16 to r23
b           Base pointer registers pairs      y, z
d           Upper register                    r16 to r31
e           Pointer register pairs            x, y, z
G           Floating point constant           0.0
I           6-bit positive integer constant   0 to 63
J           6-bit negative integer constant   -63 to 0
K           Integer constant                  2
L           Integer constant                  0
l           Lower registers                   r0 to r15
M           8-bit integer constant            0 to 255
N           Integer constant                  -1
O           Integer constant                  8, 16, 24
P           Integer constant                  1
q           Stack pointer register            SPH:SPL
r           Any register                      r0 to r31
t           Temporary register                r0
w           Special upper register pairs      r24, r26, r28, r30
x           Pointer register pair X           x (r27:r26)
y           Pointer register pair Y           y (r29:r28)
z           Pointer register pair Z           z (r31:r30)

The following table shows all assembler mnemonics which require operands and related constraints.

Mnemonic  Constraints    Mnemonic  Constraints
adc       r,r            add       r,r
adiw      w,I            and       r,r
andi      d,M            asr       r
bclr      I              bld       r,I
brbc      I,label        brbs      I,label
bset      I              bst       r,I
cbi       I,I            cbr       d,I
com       r              cp        r,r
cpc       r,r            cpi       d,M
cpse      r,r            dec       r
elpm      t,z            eor       r,r
in        r,I            inc       r
ld        r,e            ldd       r,b
ldi       d,M            lds       r,label
lpm       t,z            lsl       r
lsr       r              mov       r,r
movw      r,r            mul       r,r
neg       r              or        r,r
ori       d,M            out       I,r
pop       r              push      r
rol       r              ror       r
sbc       r,r            sbci      d,M
sbi       I,I            sbic      I,I
sbiw      w,I            sbr       d,M
sbrc      r,I            sbrs      r,I
ser       d              st        e,r
std       b,r            sts       label,r
sub       r,r            subi      d,M
swap      r

Constraint characters may be prepended by a single constraint modifier. Constraints without a modifier specify read-only operands. Modifiers are:

Modifier  Specifies
=         Write-only operand, usually used for all output operands.
+         Read-write operand (listed as unsupported in older avr-libc documentation; current GCC versions accept it)
&         Early clobber: the register is used for output only and must not overlap any input register

Note: Unless you use "+", output operands must always be write-only.

An input operand does not have to be read-only, for instance if you need the same register for input and output. In that case you may use a digit in the constraint string:

asm volatile("swap %0" : "=r" (value) : "0" (value));

The constraint "0" tells the compiler to use the same register as operand number 0 (%0).
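
To try the digit constraint in a complete function, here is a minimal sketch. The function name swap_nibbles and the plain C fallback for non-AVR compilers are my own assumptions, added only so the same file can also be compiled and checked on a PC:

```c
#include <stdint.h>

/* Swap the high and low nibbles of a byte.
 * swap_nibbles is a hypothetical helper name, not part of avr-libc. */
static inline uint8_t swap_nibbles(uint8_t value)
{
#if defined(__AVR__)
    /* "0" ties the input to the same register as output operand %0. */
    __asm__ __volatile__("swap %0" : "=r" (value) : "0" (value));
    return value;
#else
    /* Portable fallback (assumption, for host testing only). */
    return (uint8_t)((value << 4) | (value >> 4));
#endif
}
```

For example, swap_nibbles(0x12) gives 0x21 on either path.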

Let's look at another example:

asm volatile("in %0,%1"    "\n\t"
             "out %1, %2"  "\n\t" 
             : "=&r" (input) 
             : "I" (_SFR_IO_ADDR(PORTD)), "r" (output)
            );

Let's take a look at the first line, "in %0,%1". Operand %0 is replaced with the register that holds the variable input. That register is write-only, and the & modifier ensures it is used for output only, so it does not share a register with any input operand. Operand %1 is replaced with the constant given by "I" (_SFR_IO_ADDR(PORTD)), which is the I/O address of PORTD.

Note: the I/O register address always has to be passed as an input operand.

The second line of ASM code is similar, except that operand %2 may be placed in any register from r0 to r31.

What if we need to pass a 32-bit value to inline ASM? Then we can use different letters, which refer to the individual 8-bit registers of the operand:

uint32_t value=0xffffffff;
asm volatile("mov __tmp_reg__, %A0" "\n\t"
             "mov %A0, %D0"         "\n\t"
             "mov %D0, __tmp_reg__" "\n\t"
             "mov __tmp_reg__, %B0" "\n\t"
             "mov %B0, %C0"         "\n\t"
             "mov %C0, __tmp_reg__" "\n\t"
             : "=r" (value)
             : "0" (value)
            );

%A0 is the lowest byte of the 32-bit value and %D0 is the highest. All operations are performed on these bytes separately, and the result is returned as a 32-bit output operand by using a digit as the input constraint ("0" in this example).
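
The byte shuffling above swaps %A0 with %D0 and %B0 with %C0, which reverses the byte order of the value. As a rough sketch, it can be wrapped in a function like this. The name reverse_bytes32 and the C fallback for non-AVR compilers are assumptions made for illustration and host testing:

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value.
 * reverse_bytes32 is a hypothetical helper name. */
static inline uint32_t reverse_bytes32(uint32_t value)
{
#if defined(__AVR__)
    __asm__ __volatile__(
        "mov __tmp_reg__, %A0" "\n\t"   /* swap lowest and highest byte */
        "mov %A0, %D0"         "\n\t"
        "mov %D0, __tmp_reg__" "\n\t"
        "mov __tmp_reg__, %B0" "\n\t"   /* swap the two middle bytes */
        "mov %B0, %C0"         "\n\t"
        "mov %C0, __tmp_reg__" "\n\t"
        : "=r" (value)
        : "0" (value)
    );
    return value;
#else
    /* Portable fallback (assumption, for host testing only). */
    return ((value & 0x000000FFu) << 24) |
           ((value & 0x0000FF00u) << 8)  |
           ((value & 0x00FF0000u) >> 8)  |
           ((value & 0xFF000000u) >> 24);
#endif
}
```

Calling it with 0x11223344 should give 0x44332211.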

The last thing I would like to cover is pointers. An input operand can be defined like this:

: "e" (ptr)

If the compiler selects register Z (r31:r30), then:

%A0 refers to r30

%B0 refers to r31

But if you need to access the memory location whose address is stored in the Z register, as in

ld r24, Z

then you need to write the operand with a lower-case letter:

ld r24, %a0

A few words about clobbers. When your code uses registers that have not been passed as operands, you need to inform the compiler by listing them as clobbers. For instance:

asm volatile(
    "cli"               "\n\t"
    "ld r24, %a0"       "\n\t"
    "inc r24"           "\n\t"
    "st %a0, r24"       "\n\t"
    "sei"               "\n\t"
    :
    : "e" (ptr)
    : "r24"
);

In this example we are using register r24. The compiler produces the following code fragment in the listing:

    cli
    ld r24, Z
    inc r24
    st Z, r24
    sei

Another clobber may be "memory", which tells the compiler that the assembler code may modify any memory location. It forces the compiler to write all cached variables back to memory before executing the ASM code and to reload them afterwards. Try to avoid clobbers where possible, because that gives the compiler more freedom to optimize the code.
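
Here is a small sketch that combines both kinds of clobber in one function. Since the code stores through the pointer, I list "memory" next to "r24". The function name increment_byte and the C fallback for non-AVR compilers are assumptions for illustration:

```c
#include <stdint.h>

/* Increment the byte that ptr points to with interrupts disabled.
 * increment_byte is a hypothetical helper name. */
static inline void increment_byte(volatile uint8_t *ptr)
{
#if defined(__AVR__)
    __asm__ __volatile__(
        "cli"          "\n\t"
        "ld r24, %a0"  "\n\t"
        "inc r24"      "\n\t"
        "st %a0, r24"  "\n\t"
        "sei"          "\n\t"   /* note: re-enables interrupts unconditionally */
        :
        : "e" (ptr)
        : "r24", "memory"       /* r24 is used directly; *ptr is written */
    );
#else
    /* Portable fallback (assumption, for host testing only). */
    *ptr = (uint8_t)(*ptr + 1);
#endif
}
```

Like the inc instruction, the fallback wraps from 255 to 0.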

If you need to reuse some assembler parts more than once, it is recommended to define macros. You may find many of them in AVR Libc. To avoid compiler warnings (for example when compiling in strict ANSI mode), use __asm__ instead of asm and __volatile__ instead of volatile. The other options are the same as in regular inline assembler:

#define loop_until_bit_is_clear(port,bit)     \
        __asm__ __volatile__ (                \
        "1: " "sbic %0, %1" "\n\t"            \
                 "rjmp 1b"                    \
                 : /* no outputs */           \
                 : "I" (_SFR_IO_ADDR(port)),  \
                   "I" (bit)                  \
        )

For my AVR-controlled generator I wrote a stub function (a function that contains nothing but assembler code). For larger routines it is better to write such stub functions, because a macro inserts its whole body at every place it is used instead of calling it, which inflates code size. My stub function for the AVR DDS generator:

void signalOUT(const uint8_t *signal, uint8_t ad2, uint8_t ad1, uint8_t ad0)
{
    asm volatile( "eor r28, r28 ;r28<-0"                  "\n\t"
                  "eor r29, r29 ;r29<-0"                  "\n\t"
                  "Loop1:"                                "\n\t"
                  "add r28, %0 ;1 cycle"                  "\n\t"
                  "adc r29, %1 ;1 cycle"                  "\n\t"
                  "adc %A0, %2 ;1 cycle"                  "\n\t"
                  "lpm __tmp_reg__, %a3+ ;3 cycles"       "\n\t"
                  "out %4, __tmp_reg__ ;1 cycle"          "\n\t"
                  "rjmp Loop1 ;2 cycles. Total 9 cycles"  "\n\t"
                  :
                  : "r" (ad0), "r" (ad1), "r" (ad2), "e" (signal), "I" (_SFR_IO_ADDR(PORTD))
                  : "r28", "r29"
    );
}

Listing output fragment:

/* #APP */
00f6 CC27    eor r28, r28 ;r28<-0
00f8 DD27    eor r29, r29 ;r29<-0
         Loop1:
00fa C20F    add r28, r18 ;1 cycle
00fc D41F    adc r29, r20 ;1 cycle
00fe 261F    adc r18, r22 ;1 cycle
0100 0590    lpm __tmp_reg__, Z+ ;3 cycles
0102 02BA    out 18, __tmp_reg__ ;1 cycle
0104 FACF    rjmp Loop1 ;2 cycles. Total 9 cycles
/* #NOAPP */

Note: the /* #APP */ and /* #NOAPP */ comments are generated by the compiler to mark the statements that it did not generate itself (inline ASM).

I wanted the loop to be as short as possible, and managed to get it down to 9 clock cycles per iteration. The code fragment is from https://www.myplace.nu/avr/minidds/minidds.asm

Another benefit is that signal timings are easier to calculate, because the inline ASM is not altered by compiler optimisation.
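
To check the timing arithmetic on a PC, the phase-accumulator idea behind this loop can be modelled in plain C. This is only a sketch under my own assumptions: a 256-entry wave table, a 24-bit tuning word built from ad2:ad1:ad0, and, as in the referenced minidds loop, the carry from the two low accumulator bytes feeding the low byte of the table pointer. The name dds_generate is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Model of a 24-bit DDS phase accumulator.
 * step is the 24-bit tuning word (ad2:ad1:ad0); the top 8 bits of the
 * phase select the table entry, like the low byte of Z in the ASM loop. */
static void dds_generate(const uint8_t table[256], uint32_t step,
                         uint8_t *out, size_t n)
{
    uint32_t phase = 0;
    for (size_t i = 0; i < n; i++) {
        /* add first, then fetch: same order as add/adc/adc followed by lpm */
        phase = (phase + step) & 0x00FFFFFFu;
        out[i] = table[phase >> 16];
    }
}
```

One pass through the table takes 2^24 / step iterations of 9 cycles each, so under these assumptions the output frequency works out to roughly step * f_clk / (9 * 2^24).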

Read more about inline ASM with WinAVR at https://www.nongnu.org/avr-libc/user-manual/inline_asm.html

8 Comments:

  1. why is the line of font so tiny? e.g.
    asm(code : output operand list : input operand list [: clobber list]);
    next line of tiny font
    asm volatile(code : output operand list : input operand list [: clobber list]);

    yet the font on the rest of the page is normal size readable font

  2. I agree about the font mess. This is somehow related to the WYSIWYG editor. I'll try to fix this issue. Thanks 😉

  3. GCC complains when I try to use “ldd”
    instruction. Can you give an example of proper use
    of ldd ?

  4. never mind, it works

  5. What encoding did you use to write this article? I have unreadable characters inside brackets in the volatile() command. Very interesting article, but I feel like a kid who is looking into a candy store through the window. You can see it, but you cannot have it.

    Could you please send me this article in some more compatible format like *.pdf? Please, do not use *.doc. I have had a bad experience not being able to read a text written on another computer.

    Thank you.

    Konstantin

  6. It seems that older articles were corrupted earlier during some major website upgrade. I will try to fix the article. Thanks for letting me know. If you find more, just drop a comment. Thanks

  7. The article has been fixed. Sorry for this inconvenience.

  8. Thanks a lot dude, found your tips very useful in getting around issues caused by the automated optimisation in the GCC toolchain.
