Chapter Sixteen (Part 7) |
Table of Content | Chapter Sixteen (Part 9) |
CHAPTER SIXTEEN: PATTERN MATCHING (Part 8) |
16.8 -
Some Sample Pattern Matching Applications 16.8.1 - Converting Written Numbers to Integers |
16.8 Some Sample Pattern Matching Applications |
The best way to learn how to convert a pattern matching problem to the respective pattern matching algorithms is by example. The following sections provide several examples of some small pattern matching problems and their solutions.
16.8.1 Converting Written Numbers to Integers
One interesting pattern matching problem is to convert written (English) numbers to their integer equivalents. For example take the string "one hundred ninety-two" and convert it to the integer 192. Although written numbers represent a pattern quite a bit more complex than the ones we've seen thus far a little study will show that it is easy to decompose such strings.
The first thing we will need to do is enumerate the English words we will need to process written numbers. This includes the following words:
zero one two three four five six seven eight nine ten eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen nineteen twenty thirty forty fifty sixty seventy eighty ninety hundred and thousand.
With this set of words we can build all the values between zero and 65 535 (the values we can represent in a 16 bit integer.
Next we've got to decide how to put these words together to form all the values between zero and 65 535. The first thing to note is that zero only occurs by itself it is never part of another number. So our first production takes the form:
Number zero | NonZero
The next thing to note is that certain values may occur in pairs denoting addition. For example eighty-five denotes the sum of eighty plus five. Also note that certain other pairs denote multiplication. If you have a statement like "two hundred" or "fifteen hundred" the "hundred" word says multiply the preceding value by 100. The multiplicative words "hundred" and "thousand" are also additive. Any value following these terms is added in to the total; e.g. "one hundred five" means 1*100+5. By combining the appropriate rules we obtain the following grammar
NonZero | Thousands Maybe100s | Hundreds | |
Thousands | Under100 thousand | |
Maybe100s | Hundreds | | |
Hundreds | Under100 hundred After100 | Under100 | |
After100 | Under100 | | |
Under100 | Tens Maybe1s| Teens | ones | |
Maybe1s | Ones | | |
ones | one | two | three | four | five | six | seven | eight | nine | |
teens | ten | eleven | twelve | thirteen | fourteen | fifteen | sixteen | seventeen | eighteen | nineteen | |
tens | twenty | thirty | forty | fifty | sixty | seventy | eighty | ninety |
The final step is to add semantic actions to actually convert the strings matched by this grammar to integer values. The basic idea is to initialize an accumulator value to zero. Whenever you encounter one of the strings that ones teens or tens matches you add the corresponding value to the accumulator. If you encounter the hundred or thousand strings you multiply the accumulator by the appropriate factor. The complete program to do the conversion follows:
; Numbers.asm ; ; This program converts written English numbers in the range "zero" ; to "sixty five thousand five hundred thirty five" to the corresponding ; integer value. .xlist include stdlib.a includelib stdlib.lib matchfuncs .list dseg segment para public 'data' Value word 0 ;Store results here. HundredsVal word 0 ThousandsVal word 0 Str0 byte "twenty one" 0 Str1 byte "nineteen hundred thirty-five" 0 Str2 byte "thirty three thousand two hundred nineteen" 0 Str3 byte "three" 0 Str4 byte "fourteen" 0 Str5 byte "fifty two" 0 Str6 byte "seven hundred" 0 Str7 byte "two thousand seven" 0 Str8 byte "four thousand ninety six" 0 Str9 byte "five hundred twelve" 0 Str10 byte "twenty three thousand two hundred ninety-five" 0 Str11 byte "seventy-five hundred" 0 Str12 byte "sixty-five thousand" 0 Str13 byte "one thousand" 0 ; The following grammar is what we use to process the numbers. ; Semantic actions appear in the braces. ; ; Note: begin by initializing Value HundredsVal and ThousandsVal to zero. ; ; N -> separators zero ; | N4 ; ; N4 -> do1000s maybe100s ; | do100s ; ; Maybe100s -> do100s ; | <empty string> ; ; do1000s -> Under100 "THOUSAND" separators ; {ThousandsVal := Value*1000} ; ; do100s -> Under100 "HUNDRED" ; {HundredsVal := Value*100} After100 ; | Under100 ; ; After100 -> {Value := 0} Under100 ; | {Value := 0} <empty string> ; ; Under100 -> {Value := 0} try20 try1s ; | {Value := 0} doTeens ; | {Value := 0} do1s ; ; try1s -> do1s | <empty string> ; ; try20 -> "TWENTY" {Value := Value + 20} ; | "THIRTY" {Value := Value + 30} ; | ... ; | "NINETY" {Value := Value + 90} ; ; doTeens -> "TEN" {Value := Value + 10} ; | "ELEVEN" {Value := Value + 11} ; | ... ; | "NINETEEN" {Value := Value + 19} ; ; do1s -> "ONE" {Value := Value + 1} ; | "TWO" {Value := Value + 2} ; | ... ; | "NINE" {Value := Value + 9} separators pattern {anycset delimiters 0 delim2} delim2 pattern {spancset delimiters} doSuccess pattern {succeed} AtLast pattern {sl_match2 separators AtEOS AtEOS} AtEOS pattern {EOS} N pattern {sl_match2 separators N2 N2} N2 pattern {matchistr zero N3 AtLast} zero byte "ZERO" 0 N3 pattern {sl_match2 N4 0 AtLast} N4 pattern {sl_match2 do1000s do100s Maybe100s} Maybe100s pattern {sl_match2 do100s AtLast AtLast} do1000s pattern {sl_match2 Under100 0 do1000s2} do1000s2 pattern {matchistr str1000 0 do1000s3} do1000s3 pattern {sl_match2 separators do1000s4 do1000s5} do1000s4 pattern {EOS 0 0 do1000s5} do1000s5 pattern {Get1000s} str1000 byte "THOUSAND" 0 do100s pattern {sl_match2 do100s1 Under100 After100} do100s1 pattern {sl_match2 Under100 0 do100s2} do100s2 pattern {matchistr str100 0 do100s3} do100s3 pattern {sl_match2 separators do100s4 do100s5} do100s4 pattern {EOS 0 0 do100s5} do100s5 pattern {Get100s} str100 byte "HUNDRED" 0 After100 pattern {SetVal 0 0 After100a} After100a pattern {sl_match2 Under100 doSuccess} Under100 pattern {SetVal 0 0 Under100a} Under100a pattern {sl_match2 try20 Under100b Do1orE} Under100b pattern {sl_match2 doTeens do1s} Do1orE pattern {sl_match2 do1s doSuccess 0} NumPat macro lbl next Constant string local try SkipSpcs val str tryEOS lbl pattern {sl_match2 try next} try pattern {matchistr str 0 SkipSpcs} SkipSpcs pattern {sl_match2 separators tryEOS val} tryEOS pattern {EOS 0 0 val} val pattern {AddVal Constant} str byte string byte 0 endm NumPat doTeens try11 10 "TEN" NumPat try11 try12 11 "ELEVEN" NumPat try12 try13 12 "TWELVE" NumPat try13 try14 13 "THIRTEEN" NumPat try14 try15 14 "FOURTEEN" NumPat try15 try16 15 "FIFTEEN" NumPat try16 try17 16 "SIXTEEN" NumPat try17 try18 17 "SEVENTEEN" NumPat try18 try19 18 "EIGHTEEN" NumPat try19 0 19 "NINETEEN" NumPat do1s try2 1 "ONE" NumPat try2 try3 2 "TWO" NumPat try3 try4 3 "THREE" NumPat try4 try5 4 "FOUR" NumPat try5 try6 5 "FIVE" NumPat try6 try7 6 "SIX" NumPat try7 try8 7 "SEVEN" NumPat try8 try9 8 "EIGHT" NumPat try9 0 9 "NINE" NumPat try20 try30 20 "TWENTY" NumPat try30 try40 30 "THIRTY" NumPat try40 try50 40 "FORTY" NumPat try50 try60 50 "FIFTY" NumPat try60 try70 60 "SIXTY" NumPat try70 try80 70 "SEVENTY" NumPat try80 try90 80 "EIGHTY" NumPat try90 0 90 "NINETY" include stdsets.a dseg ends cseg segment para public 'code' assume cs:cseg ds:dseg ; Semantic actions for our grammar: ; ; ; ; Get1000s- We've just processed the value one..nine grab it from ; the value variable multiply it by 1000 and store it ; into thousandsval. Get1000s proc far push ds push dx mov ax dseg mov ds ax mov ax 1000 mul Value mov ThousandsVal ax mov Value 0 pop dx mov ax di ;Required by sl_match. pop ds stc ;Always return success. ret Get1000s endp ; Get100s- We've just processed the value one..nine grab it from ; the value variable multiply it by 100 and store it ; into hundredsval. Get100s proc far push ds push dx mov ax dseg mov ds ax mov ax 100 mul Value mov HundredsVal ax mov Value 0 pop dx mov ax di ;Required by sl_match. pop ds stc ;Always return success. ret Get100s endp ; SetVal- This routine sets Value to whatever is in si SetVal proc far push ds mov ax dseg mov ds ax mov Value si mov ax di pop ds stc ret SetVal endp ; AddVal- This routine sets adds whatever is in si to Value AddVal proc far push ds mov ax dseg mov ds ax add Value si mov ax di pop ds stc ret AddVal endp ; Succeed matches the empty string. In other words it matches anything ; and always returns success without eating any characters from the input ; string. Succeed proc far mov ax di stc ret Succeed endp ; This subroutine expects a pointer to a string containing the English ; version of an integer number. It converts this to an integer and ; prints the result. ConvertNumber proc near mov value 0 mov HundredsVal 0 mov ThousandsVal 0 ldxi N xor cx cx match jnc NoMatch mov al "'" putc puts print byte "' = " 0 mov ax ThousandsVal add ax HundredsVal add ax Value putu putcr jmp Done NoMatch: print byte "Illegal number" cr lf 0 Done: ret ConvertNumber endp Main proc mov ax dseg mov ds ax mov es ax meminit ;Init memory manager. ; Union in a "-" to the delimiters set because numbers can have ; dashes in them. lesi delimiters mov al '-' addchar ; Some calls to test the ConvertNumber routine and the conversion process. lesi Str0 call ConvertNumber lesi Str1 call ConvertNumber lesi Str2 call ConvertNumber lesi Str3 call ConvertNumber lesi Str4 call ConvertNumber lesi Str5 call ConvertNumber lesi Str6 call ConvertNumber lesi Str7 call ConvertNumber lesi Str8 call ConvertNumber lesi Str9 call ConvertNumber lesi Str10 call ConvertNumber lesi Str11 call ConvertNumber lesi Str12 call ConvertNumber lesi Str13 call ConvertNumber Quit: ExitPgm Main endp cseg ends sseg segment para stack 'stack' stk db 1024 dup ("stack ") sseg ends zzzzzzseg segment para public 'zzzzzz' LastBytes db 16 dup (?) zzzzzzseg ends end Main
Sample output:
'twenty one' = 21 'nineteen hundred thirty-five' = 1935 'thirty three thousand two hundred nineteen' = 33219 'three' = 3 'fourteen' = 14 'fifty two' = 52 'seven hundred' = 700 'two thousand seven' = 2007 'four thousand ninety six' = 4096 'five hundred twelve' = 512 'twenty three thousand two hundred ninety-five' = 23295 'seventy-five hundred' = 7500 'sixty-five thousand' = 65000 'one thousand' = 1000
Chapter Sixteen (Part 7) |
Table of Content | Chapter Sixteen (Part 9) |
Chapter Sixteen: Pattern Matching
(Part 8)
29 SEP 1996