The Art of
ASSEMBLY LANGUAGE PROGRAMMING

Chapter Sixteen (Part 7)

Table of Content

Chapter Sixteen (Part 9) 

CHAPTER SIXTEEN:
PATTERN MATCHING (Part 8)
16.8 - Some Sample Pattern Matching Applications
16.8.1 - Converting Written Numbers to Integers
16.8 Some Sample Pattern Matching Applications

The best way to learn how to convert a pattern matching problem to the respective pattern matching algorithms is by example. The following sections provide several examples of some small pattern matching problems and their solutions.

16.8.1 Converting Written Numbers to Integers

One interesting pattern matching problem is to convert written (English) numbers to their integer equivalents. For example take the string "one hundred ninety-two" and convert it to the integer 192. Although written numbers represent a pattern quite a bit more complex than the ones we've seen thus far a little study will show that it is easy to decompose such strings.

The first thing we will need to do is enumerate the English words we will need to process written numbers. This includes the following words:

zero one two three four five six seven eight nine ten eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen nineteen twenty thirty forty fifty sixty seventy eighty ninety hundred and thousand.

With this set of words we can build all the values between zero and 65 535 (the values we can represent in a 16 bit integer.

Next we've got to decide how to put these words together to form all the values between zero and 65 535. The first thing to note is that zero only occurs by itself it is never part of another number. So our first production takes the form:

Number   zero | NonZero

The next thing to note is that certain values may occur in pairs denoting addition. For example eighty-five denotes the sum of eighty plus five. Also note that certain other pairs denote multiplication. If you have a statement like "two hundred" or "fifteen hundred" the "hundred" word says multiply the preceding value by 100. The multiplicative words "hundred" and "thousand" are also additive. Any value following these terms is added in to the total; e.g. "one hundred five" means 1*100+5. By combining the appropriate rules we obtain the following grammar

NonZero Thousands Maybe100s | Hundreds
Thousands Under100 thousand
Maybe100s Hundreds |
Hundreds Under100 hundred After100 | Under100
After100 Under100 |
Under100 Tens Maybe1s| Teens | ones
Maybe1s Ones |
ones one | two | three | four | five | six | seven | eight | nine
teens ten | eleven | twelve | thirteen | fourteen | fifteen | sixteen | seventeen | eighteen | nineteen
tens twenty | thirty | forty | fifty | sixty | seventy | eighty | ninety

The final step is to add semantic actions to actually convert the strings matched by this grammar to integer values. The basic idea is to initialize an accumulator value to zero. Whenever you encounter one of the strings that ones teens or tens matches you add the corresponding value to the accumulator. If you encounter the hundred or thousand strings you multiply the accumulator by the appropriate factor. The complete program to do the conversion follows:

; Numbers.asm
;
; This program converts written English numbers in the range "zero"
; to "sixty five thousand five hundred thirty five" to the corresponding
; integer value.

.xlist
include         stdlib.a
includelib      stdlib.lib
matchfuncs
.list


dseg            segment para public 'data'

Value           word    0                       ;Store results here.
HundredsVal     word    0
ThousandsVal    word    0


Str0            byte    "twenty one"
0
Str1            byte    "nineteen hundred thirty-five"
0
Str2            byte    "thirty three thousand two hundred nineteen"
0
Str3            byte    "three"
0
Str4            byte    "fourteen"
0
Str5            byte    "fifty two"
0
Str6            byte    "seven hundred"
0
Str7            byte    "two thousand seven"
0
Str8            byte    "four thousand ninety six"
0
Str9            byte    "five hundred twelve"
0
Str10           byte    "twenty three thousand two hundred ninety-five"
0
Str11           byte    "seventy-five hundred"
0
Str12           byte    "sixty-five thousand"
0
Str13           byte    "one thousand"
0


; The following grammar is what we use to process the numbers.
; Semantic actions appear in the braces.
;
; Note: begin by initializing Value
HundredsVal
and ThousandsVal to zero.
;
; N             -> separators zero
;               | N4
;
; N4            -> do1000s maybe100s
;               | do100s
;
; Maybe100s     -> do100s
;               | <empty string>
;
; do1000s       -> Under100 "THOUSAND" separators
;                               {ThousandsVal := Value*1000}
;
; do100s        -> Under100 "HUNDRED"
;                       {HundredsVal := Value*100} After100
;               | Under100
;
; After100      -> {Value := 0} Under100
;               | {Value := 0} <empty string>
;
; Under100              -> {Value := 0} try20 try1s
;               | {Value := 0} doTeens
;               | {Value := 0} do1s
;
; try1s         -> do1s | <empty string>
;
; try20         -> "TWENTY" {Value := Value + 20}
;               | "THIRTY" {Value := Value + 30}
;               | ...
;               | "NINETY" {Value := Value + 90}
;
; doTeens               -> "TEN"        {Value := Value + 10}
;               | "ELEVEN"      {Value := Value + 11}
;               | ...
;               | "NINETEEN"    {Value := Value + 19}
;
; do1s          -> "ONE"        {Value := Value + 1}
;               | "TWO" {Value := Value + 2}
;               | ...
;               | "NINE"        {Value := Value + 9}


separators      pattern {anycset
delimiters
0
delim2}
delim2          pattern {spancset
delimiters}
doSuccess       pattern {succeed}
AtLast          pattern {sl_match2
separators
AtEOS
AtEOS}
AtEOS           pattern {EOS}


N               pattern {sl_match2
separators
N2
N2}
N2              pattern {matchistr
zero
N3
AtLast}
zero            byte    "ZERO"
0

N3              pattern {sl_match2
N4
0
AtLast}
N4              pattern {sl_match2
do1000s
do100s
Maybe100s}
Maybe100s       pattern {sl_match2
do100s
AtLast
AtLast}

do1000s         pattern {sl_match2
Under100
0
do1000s2}
do1000s2        pattern {matchistr
str1000
0
do1000s3}
do1000s3        pattern {sl_match2
separators
do1000s4
do1000s5}
do1000s4        pattern {EOS
0
0
do1000s5}
do1000s5        pattern {Get1000s}
str1000         byte    "THOUSAND"
0

do100s          pattern {sl_match2
do100s1
Under100
After100}
do100s1         pattern {sl_match2
Under100
0
do100s2}
do100s2         pattern {matchistr
str100
0
do100s3}
do100s3         pattern {sl_match2
separators
do100s4
do100s5}
do100s4         pattern {EOS
0
0
do100s5}
do100s5         pattern {Get100s}
str100          byte    "HUNDRED"
0

After100        pattern {SetVal
0
0
After100a}
After100a       pattern {sl_match2
Under100
doSuccess}

Under100        pattern {SetVal
0
0
Under100a}
Under100a       pattern {sl_match2
try20
Under100b
Do1orE}
Under100b       pattern {sl_match2
doTeens
do1s}

Do1orE          pattern {sl_match2
do1s
doSuccess
0}



NumPat          macro   lbl
next
Constant
string
local   try
SkipSpcs
val
str
tryEOS
lbl             pattern {sl_match2
try
next}
try             pattern {matchistr
str
0
SkipSpcs}
SkipSpcs                pattern {sl_match2
separators
tryEOS
val}
tryEOS          pattern {EOS
0
0
val}
val             pattern {AddVal
Constant}
str             byte    string
byte    0
endm

NumPat  doTeens
try11
10
"TEN"
NumPat  try11
try12
11
"ELEVEN"
NumPat  try12
try13
12
"TWELVE"
NumPat  try13
try14
13
"THIRTEEN"
NumPat  try14
try15
14
"FOURTEEN"
NumPat  try15
try16
15
"FIFTEEN"
NumPat  try16
try17
16
"SIXTEEN"
NumPat  try17
try18
17
"SEVENTEEN"
NumPat  try18
try19
18
"EIGHTEEN"
NumPat  try19
0
19
"NINETEEN"

NumPat  do1s
try2
1
"ONE"
NumPat  try2
try3
2
"TWO"
NumPat  try3
try4
3
"THREE"
NumPat  try4
try5
4
"FOUR"
NumPat  try5
try6
5
"FIVE"
NumPat  try6
try7
6
"SIX"
NumPat  try7
try8
7
"SEVEN"
NumPat  try8
try9
8
"EIGHT"
NumPat  try9
0
9
"NINE"

NumPat  try20
try30
20
"TWENTY"
NumPat  try30
try40
30
"THIRTY"
NumPat  try40
try50
40
"FORTY"
NumPat  try50
try60
50
"FIFTY"
NumPat  try60
try70
60
"SIXTY"
NumPat  try70
try80
70
"SEVENTY"
NumPat  try80
try90
80
"EIGHTY"
NumPat  try90
0
90
"NINETY"

include stdsets.a

dseg            ends



cseg            segment para public 'code'
assume  cs:cseg
ds:dseg


; Semantic actions for our grammar:
;
;
;
; Get1000s-     We've just processed the value one..nine
grab it from
;               the value variable
multiply it by 1000
and store it
;               into thousandsval.

Get1000s        proc    far
push    ds
push    dx
mov     ax
dseg
mov     ds
ax

mov     ax
1000
mul     Value
mov     ThousandsVal
ax
mov     Value
0

pop     dx
mov     ax
di                  ;Required by sl_match.
pop     ds
stc                             ;Always return success.
ret
Get1000s        endp


; Get100s-      We've just processed the value one..nine
grab it from
;               the value variable
multiply it by 100
and store it
;               into hundredsval.

Get100s         proc    far
push    ds
push    dx
mov     ax
dseg
mov     ds
ax

mov     ax
100
mul     Value
mov     HundredsVal
ax
mov     Value
0

pop     dx
mov     ax
di                  ;Required by sl_match.
pop     ds
stc                             ;Always return success.
ret
Get100s         endp


; SetVal-       This routine sets Value to whatever is in si

SetVal          proc    far
push    ds
mov     ax
dseg
mov     ds
ax
mov     Value
si
mov     ax
di
pop     ds
stc
ret
SetVal          endp

; AddVal-       This routine sets adds whatever is in si to Value

AddVal          proc    far
push    ds
mov     ax
dseg
mov     ds
ax
add     Value
si
mov     ax
di
pop     ds
stc
ret
AddVal          endp


; Succeed matches the empty string. In other words
it matches anything
; and always returns success without eating any characters from the input
; string.

Succeed         proc    far
mov     ax
di
stc
ret
Succeed         endp


; This subroutine expects a pointer to a string containing the English
; version of an integer number. It converts this to an integer and
; prints the result.

ConvertNumber   proc    near
mov     value
0
mov     HundredsVal
0
mov     ThousandsVal
0

ldxi    N
xor     cx
cx
match
jnc     NoMatch
mov     al
"'"
putc
puts
print
byte    "' = "
0
mov     ax
ThousandsVal
add     ax
HundredsVal
add     ax
Value
putu
putcr
jmp     Done

NoMatch:        print
byte    "Illegal number"
cr
lf
0

Done:           ret
ConvertNumber   endp



Main            proc
mov     ax
dseg
mov     ds
ax
mov     es
ax

meminit                         ;Init memory manager.

; Union in a "-" to the delimiters set because numbers can have
; dashes in them.

lesi    delimiters
mov     al
'-'
addchar

; Some calls to test the ConvertNumber routine and the conversion process.

lesi    Str0
call    ConvertNumber
lesi    Str1
call    ConvertNumber
lesi    Str2
call    ConvertNumber
lesi    Str3
call    ConvertNumber
lesi    Str4
call    ConvertNumber
lesi    Str5
call    ConvertNumber
lesi    Str6
call    ConvertNumber
lesi    Str7
call    ConvertNumber
lesi    Str8
call    ConvertNumber
lesi    Str9
call    ConvertNumber
lesi    Str10
call    ConvertNumber
lesi    Str11
call    ConvertNumber
lesi    Str12
call    ConvertNumber
lesi    Str13
call    ConvertNumber


Quit:           ExitPgm
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
end     Main

Sample output:

'twenty one' = 21
'nineteen hundred thirty-five' = 1935
'thirty three thousand two hundred nineteen' = 33219
'three' = 3
'fourteen' = 14
'fifty two' = 52
'seven hundred' = 700
'two thousand seven' = 2007
'four thousand ninety six' = 4096
'five hundred twelve' = 512
'twenty three thousand two hundred ninety-five' = 23295
'seventy-five hundred' = 7500
'sixty-five thousand' = 65000
'one thousand' = 1000

Chapter Sixteen (Part 7)

Table of Content

Chapter Sixteen (Part 9) 

Chapter Sixteen: Pattern Matching (Part 8)
29 SEP 1996