colinux and Blackfin
Linux assemblers: A comparison of GAS and NASM
A side-by-side look at GNU Assembler (GAS) and Netwide Assembler (NASM)
17 Oct 2007
This article explains some of the more important syntactic and semantic differences between two of the most popular assemblers for Linux®, GNU Assembler (GAS) and Netwide Assembler (NASM), including differences in basic syntax, variables and memory access, macro handling, functions and external routines, stack handling, and techniques for easily repeating blocks of code.
Unlike other languages, assembly programming involves understanding the processor architecture of the machine that is being programmed. Assembly programs are not at all portable and are often cumbersome to maintain and understand, and can often contain a large number of lines of code. But with these limitations comes the advantage of speed and size of the runtime binary that executes on that machine.
Though much information is already available on assembly-level programming on Linux, this article aims to show the differences between the two syntaxes in a way that will help you convert more easily from one flavor of assembly to another. The article evolved from my own quest to improve at this conversion.
This article uses a series of program examples. Each program illustrates some feature and is followed by a discussion and comparison of the syntaxes. Although it's not possible to cover every difference that exists between NASM and GAS, I do try to cover the main points and provide a foundation for further investigation. And for those already familiar with both NASM and GAS, you might still find something useful here, such as macros.
This article assumes you have at least a basic understanding of assembly terminology and have programmed with an assembler using Intel® syntax, perhaps using NASM on Linux or Windows. This article does not teach how to type code into an editor or how to assemble and link (but see the sidebar for a quick refresher). You should be familiar with the Linux operating system (any Linux distribution will do; I used Red Hat and Slackware) and basic GNU tools such as gcc and ld, and you should be programming on an x86 machine.
Now I'll describe what this article does and does not cover.
This article covers:
- Basic syntactical differences between NASM and GAS
- Common assembly level constructs such as variables, loops, labels, and macros
- A bit about calling external C routines and using functions
- Assembly mnemonic differences and usage
- Memory addressing methods
This article does not cover:
- The processor instruction set
- Various forms of macros and other constructs particular to an assembler
- Assembler directives peculiar to either NASM or GAS
- Features that are not commonly used or are found only in one assembler but not in the other
For more information, refer to the official assembler manuals (see Resources for links), as those are the most complete sources of information.
Listing 1 shows a very simple program that exits with an exit code of 2. This little program illustrates the basic structure of an assembly program for both GAS and NASM.
Now for a bit of explanation.
One of the biggest differences between NASM and GAS is the syntax. GAS uses the AT&T syntax, a relatively archaic syntax that is specific to GAS and some older assemblers, whereas NASM uses the Intel syntax, supported by a majority of assemblers such as TASM and MASM. (Modern versions of GAS do support a directive called .intel_syntax, which allows the use of Intel syntax with GAS.)
The following are some of the major differences summarized from the GAS manual:
- AT&T and Intel syntax use the opposite order for source and destination operands. For example:
Intel: mov eax, 4
AT&T: movl $4, %eax
- In AT&T syntax, immediate operands are preceded by $; in Intel syntax, immediate operands are not. The $4 in the example above is one such operand.
- In AT&T syntax, register operands are preceded by %; in Intel syntax, they are not.
- In AT&T syntax, the size of memory operands is determined from the last character of the opcode name. Opcode suffixes of b, w, and l specify byte (8-bit), word (16-bit), and long (32-bit) memory references. Intel syntax accomplishes this by prefixing memory operands (not the opcodes themselves) with byte ptr, word ptr, and dword ptr. Thus:
Intel: mov al, byte ptr foo
AT&T: movb foo, %al
- Immediate form long jumps and calls are lcall/ljmp $section, $offset in AT&T syntax; the Intel syntax is call/jmp far section:offset. The far return instruction is lret $stack-adjust in AT&T syntax, whereas Intel uses ret far stack-adjust.
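The operand-order, prefix, and suffix rules in this list can be sketched as a toy converter. This is only an illustration written in Python, not a feature of either assembler: the function name intel_to_att is hypothetical, and the sketch assumes two-operand instructions whose operands are 32-bit registers or plain integer immediates.

```python
REGISTERS_32 = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


def intel_to_att(line):
    """Rewrite 'op dst, src' (Intel order) as 'opl src, dst' (AT&T order).

    Hypothetical helper: handles only 32-bit registers and integer immediates.
    """
    mnemonic, _, operand_text = line.strip().partition(" ")
    dst, src = [part.strip() for part in operand_text.split(",")]

    def convert(operand):
        if operand in REGISTERS_32:
            return "%" + operand          # registers get a % prefix
        int(operand, 0)                   # raises ValueError if not an integer
        return "$" + operand              # immediates get a $ prefix

    # AT&T puts the source first and marks the 32-bit size with an l suffix.
    return f"{mnemonic}l {convert(src)}, {convert(dst)}"


print(intel_to_att("mov eax, 4"))
print(intel_to_att("add ebx, eax"))
```

Memory operands, other operand sizes, and the far jump forms are deliberately left out; they need the size and addressing rules covered later in the article.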
In both assemblers, the names of registers remain the same, but the syntax for using them is different, as is the syntax for addressing modes. In addition, assembler directives in GAS begin with a ".", but not in NASM.
The .text section is where the processor begins code execution. The global (.globl or .global in GAS) keyword is used to make a symbol visible to the linker and available to other linking object modules. On the NASM side of Listing 1, global _start marks the symbol _start as a visible identifier so the linker knows where to jump into the program and begin execution. As with NASM, GAS looks for this _start label as the default entry point of a program. A label definition ends with a colon in GAS; NASM accepts the colon too, although it is optional there.
Interrupts are a way to inform the OS that its services are required. The int instruction in line 16 does this job in our program. Both GAS and NASM use the same mnemonic for interrupts. GAS uses the 0x prefix to specify a hex number, whereas NASM uses the h suffix (NASM also accepts the 0x prefix). Because immediate operands are prefixed with $ in GAS, 80 hex is written $0x80. The instruction int $0x80 (or int 80h in NASM) is used to invoke Linux and request a service. The service code is present in the EAX register. A value of 1 (for the Linux exit system call) is stored in EAX to request that the program exit. Register EBX contains the exit code (2, in our case), a number that is returned to the OS. (You can track this number by typing echo $? at the command prompt.)
Finally, a word about comments. GAS supports C style (/* */), C++ style (//), and shell style (#) comments. NASM supports single-line comments that begin with the ";" character.
Variables and accessing memory
This section begins with an example program that finds the largest of three numbers.
You can see several differences above in the declaration of memory variables. NASM uses the dd, dw, and db directives to declare 32-, 16-, and 8-bit numbers, respectively, whereas GAS uses .long, .word, and .byte for the same purpose. GAS has other directives, too, such as .string. In GAS, you declare variables just like other labels (using a colon), but in NASM you simply type a variable name (without the colon) before the memory allocation directive (dd, dw, etc.), followed by the value of the variable.
Line 18 in Listing 2 illustrates the memory indirect addressing mode. NASM uses square brackets to dereference the value at the address pointed to by a memory location: [var1]. GAS uses parentheses to dereference the same value: (var1). The use of other addressing modes is covered later in this article.
Listing 3 illustrates the concepts of this section; it accepts the user's name as input and returns a greeting.
The heading for this section promises a discussion of macros, and both NASM and GAS certainly support them. But before we get into macros, a few other features are worth comparing.
Listing 3 illustrates the concept of uninitialized memory, defined using the .bss section directive (line 14). BSS historically stands for "block started by symbol," and the memory reserved in the BSS section is initialized to zero when the program starts. Objects in the BSS section have only a name and a size, and no value. Unlike variables in the data segment, variables declared in the BSS section take no space in the executable file.
NASM uses the resb, resw, and resd keywords to allocate byte, word, and dword space in the BSS section. GAS, on the other hand, uses the .lcomm keyword to allocate byte-level space. Notice the way the variable name is declared in both versions of the program. In NASM the variable name precedes the resb (or resw or resd) keyword, followed by the amount of space to be reserved, whereas in GAS the variable name follows the .lcomm keyword, which is then followed by a comma and then the amount of space to be reserved. This shows the difference (NASM first, then GAS):
varname resb size
.lcomm varname, size
Listing 3 also introduces the concept of a location counter (line 6). NASM provides special variables (the $ and $$ variables) to work with the location counter. GAS exposes the location counter as the special symbol . (dot), but the idiom shown below uses labels instead to calculate the next storage location (data, instruction, etc.).
For example, to calculate the length of a string, you would use the following idiom in NASM:
prompt_str db 'Enter your name: '
STR_SIZE equ $ - prompt_str ; $ is the location counter
$ gives the current value of the location counter, and subtracting the value of the label (all variable names are labels) from this location counter gives the number of bytes present between the declaration of the label and the current location. The equ directive is used to set the value of the variable STR_SIZE to the expression following it. A similar idiom in GAS looks like this:
prompt_str:
    .ascii "Enter Your Name: "
pstr_end:
    .set STR_SIZE, pstr_end - prompt_str
The end label (pstr_end) gives the next location address, and subtracting the starting label address gives the size. Also note the use of .set to initialize the value of the variable STR_SIZE to the expression following the comma. A corresponding .equ can also be used. NASM has no assembler-level directive equivalent to GAS's .set, whose symbols can be reassigned later; an equ constant is fixed once defined.
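The two length idioms can be modelled outside the assembler. The following Python sketch is only an illustration, assuming a hypothetical helper assemble_data that lays data out back to back; its counter variable plays the role of the location counter, and an empty item stands in for a label that emits no bytes.

```python
def assemble_data(items):
    """Lay out (label, data) pairs back to back and record each label's offset.

    `counter` plays the role of the location counter: it always holds the
    offset at which the next byte would be placed.
    """
    offsets = {}
    counter = 0
    for label, data in items:
        offsets[label] = counter
        counter += len(data)
    return offsets, counter


prompt = b"Enter your name: "
# pstr_end emits no data; it only marks the next free offset.
offsets, counter = assemble_data([("prompt_str", prompt), ("pstr_end", b"")])

# NASM idiom: $ - prompt_str
size_from_counter = counter - offsets["prompt_str"]
# GAS idiom: pstr_end - prompt_str
size_from_labels = offsets["pstr_end"] - offsets["prompt_str"]
print(size_from_counter, size_from_labels)
```

Both expressions yield the same byte count because the end label sits exactly where the location counter stands after the string.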
As I mentioned, Listing 3 uses macros (line 21). Different macro techniques exist in NASM and GAS, including single-line macros and macro overloading, but I only deal with the basic type here. A common use of macros in assembly is clarity. Instead of typing the same piece of code again and again, you can create reusable macros that both avoid this repetition and enhance the look and readability of the code by reducing clutter.
NASM users might be familiar with declaring macros using the %macro directive and ending them with an %endmacro directive. A %macro directive is followed by the macro name. After the macro name comes a count, the number of macro arguments the macro is supposed to have. In NASM, macro arguments are numbered sequentially starting with 1. That is, the first argument to a macro is %1, the second is %2, the third is %3, and so on. For example:
%macro macroname 2
    mov eax, %1
    mov ebx, %2
%endmacro
This creates a macro with two arguments, the first being %1 and the second being %2. Thus, a call to the above macro would look something like this:
macroname 5, 6
Macros can also be created without arguments, in which case the count is given as 0.
Now let's take a look at how GAS uses macros. GAS provides the .macro and .endm directives to create macros. A .macro directive is followed by a macro name, which may or may not have arguments. In GAS, macro arguments are given by name. For example:
.macro macroname arg1, arg2
    movl \arg1, %eax
    movl \arg2, %ebx
.endm
A backslash precedes the name of each argument of the macro when the name is actually used inside a macro. If this is not done, the names are treated as labels rather than as arguments, and an error is reported.
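The difference between numbered and named macro arguments can be sketched as plain text substitution. This Python model is a simplification, not how either assembler is implemented; the helper names expand_positional and expand_named are hypothetical.

```python
def expand_positional(body, args):
    """NASM style: replace %1, %2, ... with the call's arguments."""
    # Substitute the highest number first so that %1 never clobbers part of %10.
    for number in range(len(args), 0, -1):
        body = body.replace(f"%{number}", args[number - 1])
    return body


def expand_named(body, params, args):
    """GAS style: replace a backslash-prefixed name with its bound argument."""
    # Longest names first, so one parameter name is never matched inside another.
    for name, value in sorted(zip(params, args), key=lambda pair: -len(pair[0])):
        body = body.replace("\\" + name, value)
    return body


nasm_body = "mov eax, %1\nmov ebx, %2"
gas_body = "movl \\arg1, %eax\nmovl \\arg2, %ebx"

print(expand_positional(nasm_body, ["5", "6"]))
print(expand_named(gas_body, ["arg1", "arg2"], ["$5", "$6"]))
```

Note that the GAS call has to supply $5 and $6 for the expansion to load immediate values, since the $ prefix is part of the operand text that gets substituted.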
Functions, external routines, and the stack
The example program for this section implements a selection sort on an array of integers.
Listing 4 might look overwhelming at first, but in fact it's very simple. The listing introduces the concept of functions, various memory addressing schemes, the stack, and the use of a library function. The program sorts an array of 10 numbers and uses the external C library functions puts and printf to print out the entire contents of the unsorted and sorted array. For modularity and to introduce the concept of functions, the sort routine itself is implemented as a separate procedure along with the array print routine. Let's deal with them one by one.
After the data declarations, the program execution begins with a call to puts (line 31). The puts function displays a string on the console. Its only argument is the address of the string to be displayed, which is passed to it by pushing the address of the string onto the stack (line 30).
In NASM, any label that is not part of our program and needs to be resolved during link time must be predefined, which is the function of the extern keyword (line 24). GAS doesn't have such requirements. After this, the address of the string usort_str is pushed onto the stack (line 30). In NASM, a memory variable such as usort_str represents the address of the memory location itself, and thus a call such as push usort_str actually pushes the address on top of the stack. In GAS, on the other hand, the variable usort_str must be prefixed with $, so that it is treated as an immediate address. If it's not prefixed with $, the actual bytes represented by the memory variable are pushed onto the stack instead of the address.
Since pushing a variable essentially moves the stack pointer by a dword, the stack pointer is adjusted by adding 4 (the size of a dword) to it (line 32).
Three arguments are now pushed onto the stack, and the print_array10 function is called (line 37). Functions are declared the same way in both NASM and GAS. They are nothing but labels, which are invoked using the call instruction.
After a function call, ESP represents the top of the stack, which now holds the return address. With the usual stack frame in place (described next), ebp + 4 holds the return address and ebp + 8 holds the first argument to the function. All subsequent arguments are accessed by adding the size of a dword variable (that is, ebp + 12, ebp + 16, and so on).
Once inside a function, a local stack frame is created by copying esp to ebp (line 62). You can also allocate space for local variables as is done in the program (line 63). You do this by subtracting the number of bytes required from esp. A value of esp - 4 represents a space of 4 bytes allocated for a local variable, and this can continue as long as there is enough space in the stack to accommodate your local variables.
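The frame layout can be checked with a small simulation. This Python sketch is a toy model, assuming the conventional prologue (push ebp, copy esp to ebp, subtract from esp) and arguments pushed right to left; the Stack class, the starting address 0x1000, and the placeholder values are all hypothetical.

```python
class Stack:
    """A toy 32-bit stack that grows downward, one dword per push."""

    def __init__(self, top=0x1000):
        self.esp = top
        self.memory = {}

    def push(self, value):
        self.esp -= 4
        self.memory[self.esp] = value


stack = Stack()

# Caller: push the arguments right to left, then call.
for argument in ("arg3", "arg2", "arg1"):
    stack.push(argument)
stack.push("return address")      # what the call instruction does

# Callee prologue: push ebp / copy esp to ebp / subtract 4 from esp
stack.push("saved ebp")
ebp = stack.esp
stack.esp -= 4                    # room for one local variable
stack.memory[ebp - 4] = "local"

for offset in (-4, 0, 4, 8, 12, 16):
    print(f"[ebp{offset:+d}] = {stack.memory[ebp + offset]}")
```

Relative to EBP, the local variable sits below the saved frame pointer, and the return address and arguments sit above it, which is why arguments use positive offsets and locals use negative ones.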
Listing 4 illustrates the base indirect addressing mode (line 64), so called because you start with a base address and add an offset to it to arrive at a final address. On the NASM side of the listing, [ebp + 8] is one such example, as is [ebp - 4] (line 71). In GAS, the addressing is a bit more terse: 8(%ebp) and -4(%ebp), respectively.
In the print_array10 routine, you can see another kind of addressing mode being used after the push_loop label (line 74). The line is represented in NASM and GAS, respectively, like so:
mov al, byte [ebx + esi]
movb (%ebx, %esi, 1), %al
This addressing mode is the base indexed addressing mode. Here, there are three entities: one is the base address, the second is the index register, and the third is the multiplier. Because it's not possible to determine the number of bytes to be accessed from a memory address alone, a method is needed to find out the amount of memory addressed. NASM uses the byte operator to tell the assembler that a byte of data is to be moved. In GAS the same problem is solved by using a multiplier as well as using the b, w, or l suffix in the mnemonic (for example, movb). The syntax of GAS can seem somewhat complex when first encountered.
The general form of base indexed addressing in GAS is as follows:
%segment:ADDRESS (, index, multiplier)
%segment:(offset, index, multiplier)
%segment:ADDRESS(base, index, multiplier)
The final address is calculated using this formula:
ADDRESS or offset + base + index * multiplier.
Thus, to step through an array of bytes, a multiplier of 1 is used; for words, 2; and for dwords, 4. Of course, NASM uses a simpler syntax. Thus, the above in NASM would be represented like so:
Segment:[ADDRESS or offset + index * multiplier]
A prefix of byte, word, or dword is used before this memory address to access 1, 2, or 4 bytes of memory, respectively.
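The address formula is easy to verify by hand with a few numbers. The Python sketch below is a minimal model of the calculation, assuming 32-bit wraparound; the function name effective_address and the sample addresses are made up for illustration.

```python
def effective_address(displacement=0, base=0, index=0, multiplier=1):
    """Return displacement + base + index * multiplier as a 32-bit address."""
    if multiplier not in (1, 2, 4, 8):
        raise ValueError("x86 allows only multipliers of 1, 2, 4, or 8")
    return (displacement + base + index * multiplier) & 0xFFFFFFFF


# (%ebx, %esi, 1) in GAS, [ebx + esi] in NASM: third byte past the base
print(hex(effective_address(base=0x2000, index=3, multiplier=1)))

# table(, %esi, 4) in GAS, [table + esi * 4] in NASM: third dword of a table
print(hex(effective_address(displacement=0x3000, index=2, multiplier=4)))

# -4(%ebp) in GAS, [ebp - 4] in NASM: a negative displacement from the base
print(hex(effective_address(displacement=-4, base=0x1000)))
```

The multiplier only scales the index; how many bytes are then read or written at the resulting address is decided separately by the mnemonic suffix in GAS or the size prefix in NASM.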
Listing 5 reads a list of command line arguments, stores them in memory, and then prints them.
Listing 5 shows a construct that repeats instructions in assembly. Naturally enough, it's called the repeat construct. In GAS, the repeat construct is started using the .rept directive (line 6). This directive has to be closed using an .endr directive (line 8). In GAS, .rept is followed by a count that specifies the number of times the block enclosed inside the .rept/.endr construct is to be repeated. Any instruction placed inside this construct is equivalent to writing that instruction count number of times, each on a separate line.
For example, for a count of 3:
.rept 3
    movl $2, %eax
.endr
This is equivalent to:
movl $2, %eax
movl $2, %eax
movl $2, %eax
In NASM, a similar construct is used at the preprocessor level. It begins with the %rep directive and ends with %endrep. The %rep directive is followed by an expression (unlike in GAS, where the .rept directive is followed by a count).
There is also an alternative in NASM, the times directive. Like %rep, it is followed by an expression, but it works at the assembler level rather than in the preprocessor. For example, this:
%rep 3
    mov eax, 2
%endrep
is equivalent to this:
times 3 mov eax, 2
and both are equivalent to this:
mov eax, 2
mov eax, 2
mov eax, 2
In Listing 5, the .rept (or %rep) directive is used to create a memory data area for 10 double words. The command line arguments are then accessed one by one from the stack and stored in the memory area until the command table gets full.
As for command line arguments, they are accessed similarly with both assemblers. ESP or the top of the stack stores the number of command line arguments supplied to a program, which is 1 by default (for no command line arguments). esp + 4 stores the first command line argument (as a pointer to the string), which is always the name of the program that was invoked from the command line. esp + 8, esp + 12, and so on store subsequent command line arguments.
Also watch the way the memory command table is being accessed on both sides in Listing 5. Here, memory indirect addressing mode (line 33) is used to access the command table along with an offset in ESI (and EDI) and a multiplier. Thus, [cmd_tbl + esi * 4] in NASM is equal to cmd_tbl(, %esi, 4) in GAS.
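The stack layout at program entry and the indexed copy into the command table can be modelled as follows. This Python sketch is an assumption-laden illustration: the helper initial_stack, the address 0x2000, and the sample program name are hypothetical, and strings stand in for the pointers a real stack would hold.

```python
def initial_stack(argv, esp=0x2000):
    """Return {address: value} for the dwords at and above ESP at program entry."""
    memory = {esp: len(argv)}                        # [esp] holds the argument count
    for position, argument in enumerate(argv):
        # [esp + 4], [esp + 8], ... hold the arguments (really pointers to them)
        memory[esp + 4 * (position + 1)] = argument
    return memory


esp = 0x2000
memory = initial_stack(["./args", "one", "two"], esp)

# Copy each argument into a table, indexing by dword like [cmd_tbl + esi * 4].
cmd_tbl = []
for esi in range(memory[esp]):
    cmd_tbl.append(memory[esp + 4 + esi * 4])

print(memory[esp], cmd_tbl)
```

With no extra arguments the count is 1 and the table holds only the program name, matching the description above.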
Even though the differences between these two assemblers are substantial, it's not that difficult to convert from one form to another. You might find that the AT&T syntax seems at first difficult to understand, but once mastered, it's as simple as the Intel syntax.
- Consult the NASM and GAS manuals for complete introductions to these two assemblers.
- Read this Wikipedia entry for an explanation of selection sort.
Ram holds a post graduate degree in computer science and is working as a software engineer in IBM's India Software Labs in the Rational Division, developing and adding features to Rational Clearcase. He has worked on various flavours of Linux/UNIX and Windows, along with real-time mobile-based operating systems like Symbian and Windows mobile. In his spare time, he hacks Linux and reads books.
change timezone in a linux system
one can observe this file by
# zdump /etc/localtime
to change the default timezone, one can do
1. keep a copy of the original one
# cd /etc
# mv localtime localtime.orig
2. select the proper zone file. e.g.
# cp /usr/share/zoneinfo/Asia/Taipei /etc/localtime
now, the default timezone is changed.
software upgrade/install in fedora
the fedora server may be too slow for upgrades.
it is possible to point yum at a faster mirror host.
a. in /etc/yum.repos.d, create a file, say local-core.mirror, with the following content
# local mirrorlist for Taiwan
b. in /etc/yum.repos.d, modify fedora-core.repo, fedora-extras.repo and fedora-updates.repo,
replace the original 'mirrorlist=' line with
now, yum will use the hosts listed in local-core.mirror for upgrade/install.
2. how to clean up broken downloaded headers/files etc.?
# yum clean headers
# yum clean metadata
# yum clean dbcache
# yum clean cache
3. my fedora core doesn't include compiler tools, how to install them?
# yum install gcc
colinux, Xming, X11, Forwarding
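I only have one notebook at hand, and it can only run Windows XP and can't be reinstalled, so I installed colinux together with Xming to run X Window programs.
I. Installing colinux
First, go to www.colinux.org and download colinux version 0.6.4.
The colinux download page is SF.net > Projects > Cooperative Linux > Files
1. Installing the main colinux program
Run coLinux-0.6.4.exe and install it into the %ProgramFiles%\colinux directory.
Then extract the executables from colinux-0.6.4-20060912-update.zip into %ProgramFiles%\colinux
2. Installing FedoraCore 6
and extract it into %ProgramFiles%\colinux
3. Create the startup configuration file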
<?xml version="1.0" encoding="UTF-8"?>
<block_device index="0" alias=hda1 path="\DosDevices\c:\Program Files\coLinux\FedoraCore6.img" enabled="true" />
<block_device index="2" alias=hda3 path="\DosDevices\c:\Program Files\coLinux\fc5-2gb-1.ext3" enabled="true" />
<block_device index="1" alias=hda2 path="\DosDevices\c:\Program Files\coLinux\swap.img" enabled="true" />
<block_device index="29" path="\DosDevices\c:\Program Files\coLinux\FC-6-i386-DVD.iso" enabled="true" />
<block_device index="29" path="\DosDevices\c:\Program Files\coLinux\102p.iso" enabled="true" />
<block_device index="2" path="\DosDevices\c:\Program Files\coLinux\slack102_rootdisk.img" enabled="true" />
<bootparams>ro root=/dev/hda1 fastboot nogui</bootparams>
<image path="vmlinux" />
<memory size="128" />
<network index="0" type="tap"/>
<network index="1" type="bridged" name="LAN" />
a. Rename "區域連線" (Local Area Connection) to "區域連線(LAN)".
b. Configure TAP-Win32 Adaptor V8 (coLinux): in its TCP/IP properties, set the IP address to 192.168.2.1
5. Set up colinux as a service
Change to the %ProgramFiles%\colinux directory and run
colinux-daemon --install-service fc6 -c fc6.xml
net start fc6 starts colinux
net stop fc6 shuts down colinux
6. Network settings on the colinux side
Change to the %ProgramFiles%\colinux directory and run colinux-console-nt.exe to enter Fedora 6.
a. Set eth0 to 192.168.2.2
b. Set eth1 to an address on the local LAN.
II. Installing Xming
Xming is an X server for Windows (Cygwin is not required).
1. Download Xming
III. Using X11
1. On the Fedora side, make sure /etc/ssh/sshd_config has the setting
2. In the %ProgramFiles%\Xming directory, run
plink -ssh -pw root -X firstname.lastname@example.org xterm
You can also put the command above into %ProgramFiles%\Xming\Xmingrc.
For example, add the following to menu apps
colinux exec "plink -ssh -pw root -X email@example.com xterm"
Right-click the Xming icon in the system tray, and you can then choose Applications -> colinux
plink usage:
PuTTY Link: command-line connection utility
Usage: plink [options] [user@]host [command]
("host" can also be a PuTTY saved session name)
-V print version information and exit
-pgpfp print PGP key fingerprints and exit
-v show verbose messages
-load sessname Load settings from saved session
-ssh -telnet -rlogin -raw
force use of a particular protocol
-P port connect to specified port
-l user connect with specified username
-batch disable all interactive prompts
The following options only apply to SSH connections:
-pw passw login with specified password
Dynamic SOCKS-based port forwarding
Forward local port to remote address
Forward remote port to local address
-X -x enable / disable X11 forwarding
-A -a enable / disable agent forwarding
-t -T enable / disable pty allocation
-1 -2 force use of particular protocol version
-4 -6 force use of IPv4 or IPv6
-C enable compression
-i key private key file for authentication
-noagent disable use of Pageant
-agent enable use of Pageant
-m file read remote command(s) from file
-s remote command is an SSH subsystem (SSH-2 only)
-N don't start a shell/command (SSH-2 only)
open tunnel in place of session (SSH-2 only)
| Prototype || Dojo || Mochikit || Yahoo! || Google ||JQuery|
| Drag n Drop || || || || || || |
| Basic Visual Effects || || || || || || |
| Advanced Visual FX || || || || || |
|Java integration|| |
| Event handling || || || || || || |
| Back button support with Ajax || || |
|Rated Features (0-4 stars)|
|Minimal Learning Curve|
| Ease of use (API) |
|Widget Collection (useful or not)|
| Refined UI effect examples |
|Filesize Range (KB)||46-137||18-276||5-113||2-300||10-44|
| Licensing || MIT || AFL / BSD || MIT/AFL || BSD || Apache ** || MIT |
|More Info||Prototype JS Library||Dojo JS Toolkit||Mochikit JS Toolkit||Yahoo UI Library||Google Web Toolkit||JQuery JS Library|
Other JS libraries not evaluated here:
- Zimbra Ajax TK (Kabuki)
* I will assume that the terms library, toolkit, and framework are inter-changeable. This may merit its own discussion page, just not here.