
Chapter 4. HTML Processing

4.1. Diving in

I often see questions on comp.lang.python like “How can I list all the [headers|images|links] in my HTML document?” “How do I [parse|translate|munge] the text of my HTML document but leave the tags alone?” “How can I [add|remove|quote] attributes of all my HTML tags at once?” This chapter will answer all of these questions.

Here is a complete, working Python program in two parts. The first part, BaseHTMLProcessor.py, is a generic tool that helps you process HTML files by walking through the tags and text blocks. The second part, dialect.py, is an example of how to use BaseHTMLProcessor.py to translate the text of an HTML document but leave the tags alone. Read the doc strings and comments to get an overview of what's going on. Most of it will look like black magic, because it's not obvious how any of these class methods ever get called. Don't worry, all will be revealed in due time.

Example 4.1. BaseHTMLProcessor.py

If you have not already done so, you can download this and other examples used in this book (Windows, UNIX, Mac OS).

from sgmllib import SGMLParser

class BaseHTMLProcessor(SGMLParser):
    def reset(self):
        # extend (called by SGMLParser.__init__)
        self.pieces = []
        SGMLParser.reset(self)

    def unknown_starttag(self, tag, attrs):
        # called for each start tag
        # attrs is a list of (attr, value) tuples
        # e.g. for <pre class="screen">, tag="pre", attrs=[("class", "screen")]
        # Ideally we would like to reconstruct original tag and attributes, but
        # we may end up quoting attribute values that weren't quoted in the source
        # document, or we may change the type of quotes around the attribute value
        # (single to double quotes).
        # Note that improperly embedded non-HTML code (like client-side Javascript)
        # may be parsed incorrectly by the ancestor, causing runtime script errors.
        # All non-HTML code must be enclosed in HTML comment tags (<!-- code -->)
        # to ensure that it will pass through this parser unaltered (in handle_comment).
        strattrs = "".join([' %s="%s"' % (key, value) for key, value in attrs])
        self.pieces.append("<%(tag)s%(strattrs)s>" % locals())

    def unknown_endtag(self, tag):
        # called for each end tag, e.g. for </pre>, tag will be "pre"
        # Reconstruct the original end tag.
        self.pieces.append("</%(tag)s>" % locals())

    def handle_charref(self, ref):
        # called for each character reference, e.g. for "&#160;", ref will be "160"
        # Reconstruct the original character reference.
        self.pieces.append("&#%(ref)s;" % locals())

    def handle_entityref(self, ref):
        # called for each entity reference, e.g. for "&copy;", ref will be "copy"
        # Reconstruct the original entity reference.
        self.pieces.append("&%(ref)s;" % locals())

    def handle_data(self, text):
        # called for each block of plain text, i.e. outside of any tag and
        # not containing any character or entity references
        # Store the original text verbatim.
        self.pieces.append(text)

    def handle_comment(self, text):
        # called for each HTML comment, e.g. <!-- insert Javascript code here -->
        # Reconstruct the original comment.
        # It is especially important that the source document enclose client-side
        # code (like Javascript) within comments so it can pass through this
        # processor undisturbed; see comments in unknown_starttag for details.
        self.pieces.append("<!--%(text)s-->" % locals())

    def handle_pi(self, text):
        # called for each processing instruction, e.g. <?instruction>
        # Reconstruct original processing instruction.
        self.pieces.append("<?%(text)s>" % locals())

    def handle_decl(self, text):
        # called for the DOCTYPE, if present, e.g.
        # <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
        #     "http://www.w3.org/TR/html4/loose.dtd">
        # Reconstruct original DOCTYPE
        self.pieces.append("<!%(text)s>" % locals())

    def output(self):
        """Return processed HTML as a single string"""
        return "".join(self.pieces)

Example 4.2. dialect.py

import re
from BaseHTMLProcessor import BaseHTMLProcessor

class Dialectizer(BaseHTMLProcessor):
    subs = ()

    def reset(self):
        # extend (called from __init__ in ancestor)
        # Reset all data attributes
        self.verbatim = 0
        BaseHTMLProcessor.reset(self)

    def start_pre(self, attrs):
        # called for every <pre> tag in HTML source
        # Increment verbatim mode count, then handle tag like normal
        self.verbatim += 1
        self.unknown_starttag("pre", attrs)

    def end_pre(self):
        # called for every </pre> tag in HTML source
        # Decrement verbatim mode count
        self.unknown_endtag("pre")
        self.verbatim -= 1

    def handle_data(self, text):
        # override
        # called for every block of text in HTML source
        # If in verbatim mode, save text unaltered;
        # otherwise process the text with a series of substitutions
        self.pieces.append(self.verbatim and text or self.process(text))

    def process(self, text):
        # called from handle_data
        # Process text block by performing series of regular expression
        # substitutions (actual substitions are defined in descendant)
        for fromPattern, toPattern in self.subs:
            text = re.sub(fromPattern, toPattern, text)
        return text

class ChefDialectizer(Dialectizer):
    """convert HTML to Swedish Chef-speak

    based on the classic chef.x, copyright (c) 1992, 1993 John Hagerman
    """
    subs = ((r'a([nu])', r'u\1'),
            (r'A([nu])', r'U\1'),
            (r'a\B', r'e'),
            (r'A\B', r'E'),
            (r'en\b', r'ee'),
            (r'\Bew', r'oo'),
            (r'\Be\b', r'e-a'),
            (r'\be', r'i'),
            (r'\bE', r'I'),
            (r'\Bf', r'ff'),
            (r'\Bir', r'ur'),
            (r'(\w*?)i(\w*?)$', r'\1ee\2'),
            (r'\bow', r'oo'),
            (r'\bo', r'oo'),
            (r'\bO', r'Oo'),
            (r'the', r'zee'),
            (r'The', r'Zee'),
            (r'th\b', r't'),
            (r'\Btion', r'shun'),
            (r'\Bu', r'oo'),
            (r'\BU', r'Oo'),
            (r'v', r'f'),
            (r'V', r'F'),
            (r'w', r'w'),
            (r'W', r'W'),
            (r'([a-z])[.]', r'\1.  Bork Bork Bork!'))

class FuddDialectizer(Dialectizer):
    """convert HTML to Elmer Fudd-speak"""
    subs = ((r'[rl]', r'w'),
            (r'qu', r'qw'),
            (r'th\b', r'f'),
            (r'th', r'd'),
            (r'n[.]', r'n, uh-hah-hah-hah.'))

class OldeDialectizer(Dialectizer):
    """convert HTML to mock Middle English"""
    subs = ((r'i([bcdfghjklmnpqrstvwxyz])e\b', r'y\1'),
            (r'i([bcdfghjklmnpqrstvwxyz])e', r'y\1\1e'),
            (r'ick\b', r'yk'),
            (r'ia([bcdfghjklmnpqrstvwxyz])', r'e\1e'),
            (r'e[ea]([bcdfghjklmnpqrstvwxyz])', r'e\1e'),
            (r'([bcdfghjklmnpqrstvwxyz])y', r'\1ee'),
            (r'([bcdfghjklmnpqrstvwxyz])er', r'\1re'),
            (r'([aeiou])re\b', r'\1r'),
            (r'ia([bcdfghjklmnpqrstvwxyz])', r'i\1e'),
            (r'tion\b', r'cioun'),
            (r'ion\b', r'ioun'),
            (r'aid', r'ayde'),
            (r'ai', r'ey'),
            (r'ay\b', r'y'),
            (r'ay', r'ey'),
            (r'ant', r'aunt'),
            (r'ea', r'ee'),
            (r'oa', r'oo'),
            (r'ue', r'e'),
            (r'oe', r'o'),
            (r'ou', r'ow'),
            (r'ow', r'ou'),
            (r'\bhe', r'hi'),
            (r've\b', r'veth'),
            (r'se\b', r'e'),
            (r"'s\b", r'es'),
            (r'ic\b', r'ick'),
            (r'ics\b', r'icc'),
            (r'ical\b', r'ick'),
            (r'tle\b', r'til'),
            (r'll\b', r'l'),
            (r'ould\b', r'olde'),
            (r'own\b', r'oune'),
            (r'un\b', r'onne'),
            (r'rry\b', r'rye'),
            (r'est\b', r'este'),
            (r'pt\b', r'pte'),
            (r'th\b', r'the'),
            (r'ch\b', r'che'),
            (r'ss\b', r'sse'),
            (r'([wybdp])\b', r'\1e'),
            (r'([rnt])\b', r'\1\1e'),
            (r'from', r'fro'),
            (r'when', r'whan'))

def translate(url, dialect="chef"):
    """fetch URL and translate using dialect

    dialect in ("chef", "fudd", "olde")"""
    import urllib
    sock = urllib.urlopen(url)
    htmlSource = sock.read()
    sock.close()
    parserName = "%sDialectizer" % dialect.capitalize()
    parserClass = globals()[parserName]
    parser = parserClass()
    parser.feed(htmlSource)
    parser.close()
    return parser.output()

def test(url):
    """test all dialects against URL"""
    for dialect in ("chef", "fudd", "olde"):
        outfile = "%s.html" % dialect
        fsock = open(outfile, "wb")
        fsock.write(translate(url, dialect))
        fsock.close()
        import webbrowser
        webbrowser.open_new(outfile)

if __name__ == "__main__":
    test("http://diveintopython.org/odbchelper_list.html")

Example 4.3. Output of dialect.py

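Running this script translates the “Lists 101” section into mock Swedish Chef-speak (from The Muppets), mock Elmer Fudd-speak (from Bugs Bunny cartoons), and mock Middle English (loosely based on Chaucer's The Canterbury Tales). If you look at the HTML source of the output pages, you'll see that all the HTML tags and attributes are untouched, but the text between the tags has been “translated” into the mock language. If you look closer, you'll see that, in fact, only the titles and paragraphs were translated; the code listings and screen examples were left untouched.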

4.2. Introducing sgmllib.py

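HTML processing is broken into three steps: breaking down the HTML into its constituent pieces, fiddling with the pieces, and reconstructing the pieces into HTML again. The first step is done by sgmllib.py, a part of the standard Python library.

sgmllib.py contains one important class: SGMLParser. In this context, a parser is a program that takes an unstructured blob of data and breaks it into smaller, structured pieces. There are many kinds of parsers; the standard Python library includes modules for parsing command-line arguments, .INI files, email headers, robots.txt files, and XML documents.

Some parsers keep state: they parse the data, store it internally in a structured form, and let you access that structured data once they are done. SGMLParser does not store the data it parses. Instead, as it breaks the data into useful pieces, it calls a method on itself depending on what kind of piece it found. To use the parser, you subclass SGMLParser and override these methods.

SGMLParser parses HTML into 8 kinds of data, and calls a separate method for each of them: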

Start tag
An HTML tag that starts a block, like <html>, <head>, <body>, or <pre>, or a standalone tag like <br> or <img>. When it finds a start tag tagname, SGMLParser looks for a method called start_tagname or do_tagname. For instance, when it finds a <pre> tag, it looks for a start_pre or do_pre method. If found, SGMLParser calls this method with a list of the tag's attributes; otherwise, it calls unknown_starttag with the tag name and list of attributes.
End tag
An HTML tag that ends a block, like </html>, </head>, </body>, or </pre>. When it finds an end tag, SGMLParser looks for a method called end_tagname. If found, SGMLParser calls this method; otherwise, it calls unknown_endtag with the tag name.
Character reference
An escaped character referenced by its decimal or hexadecimal equivalent, like &#160;. When found, SGMLParser calls handle_charref with the text of the decimal or hexadecimal character equivalent.
Entity reference
An HTML entity, like &copy;. When found, SGMLParser calls handle_entityref with the name of the HTML entity.
Comment
An HTML comment, enclosed in <!-- ... -->. When found, SGMLParser calls handle_comment with the body of the comment.
Processing instruction
An HTML processing instruction, enclosed in <? ... >. When found, SGMLParser calls handle_pi with the body of the processing instruction.
Declaration
An HTML declaration, such as a DOCTYPE, enclosed in <! ... >. When found, SGMLParser calls handle_decl with the body of the declaration.
Text data
A block of text: anything that doesn't fit into the other 7 categories. When found, SGMLParser calls handle_data with the text.
Important
Python 2.0 had a bug where SGMLParser would not recognize declarations at all (handle_decl would never be called), which meant that DOCTYPEs were silently ignored. This is fixed in Python 2.1.

sgmllib.py comes with a test suite to illustrate this. You can run sgmllib.py, passing the name of an HTML file on the command line, and it will print out the tags and other elements as it parses them. It does this by subclassing the SGMLParser class and defining unknown_starttag, unknown_endtag, handle_data and other methods which simply print their arguments.

Tip
In the Python IDE on Windows, you can specify command-line arguments in the “Run script” dialog.
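sgmllib exists only in Python 2; it was removed in Python 3. As a rough modern counterpart to the test suite described above, here is a minimal sketch assuming Python 3's html.parser.HTMLParser. Its callbacks are named differently (handle_starttag rather than unknown_starttag, for instance) and it has no start_tagname/do_tagname lookup. The EventLogger class and the sample input are made up for illustration.

```python
from html.parser import HTMLParser

class EventLogger(HTMLParser):
    """Record every callback, roughly like the sgmllib test suite prints them."""

    def __init__(self):
        # convert_charrefs=False so &#160; and &copy; reach their own handlers
        super().__init__(convert_charrefs=False)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start tag", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end tag", tag))

    def handle_charref(self, name):
        self.events.append(("charref", name))

    def handle_entityref(self, name):
        self.events.append(("entityref", name))

    def handle_comment(self, data):
        self.events.append(("comment", data))

    def handle_pi(self, data):
        self.events.append(("pi", data))

    def handle_decl(self, decl):
        self.events.append(("decl", decl))

    def handle_data(self, data):
        self.events.append(("data", data))

logger = EventLogger()
logger.feed('<!DOCTYPE html><p class=pubdate>28&#160;Feb &copy; 2001<!-- note --></p>')
logger.close()
for event in logger.events:
    print(event)
```

Passing convert_charrefs=False is the choice that matters here: with the default (True), html.parser decodes character and entity references itself and folds them into handle_data, so handle_charref and handle_entityref would never be called.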

Example 4.4. Sample test of sgmllib.py

Here is a snippet from the table of contents of the HTML version of this book, toc.html.

<h1>
  <a name='c40a'></a>
  Dive Into Python
</h1>
<p class='pubdate'>
  28 Feb 2001
</p>
<p class='copyright'>
  Copyright &copy; 2000, 2001 by
  <a href='mailto:f8dy@diveintopython.org' title='send e-mail to the author'>
    Mark Pilgrim
  </a>
</p>
<p>
  <a name='c40ab2b4'></a>
  <b></b>
</p>
<p>
  This book lives at
  <a href='http://diveintopython.org/'>
    http://diveintopython.org/
  </a>
  .
  If you're reading it somewhere else, you may not have the latest version.
</p>

Running this through the test suite of sgmllib.py yields this output:

start tag: <h1>
start tag: <a name="c40a" >
end tag: </a>
data: 'Dive Into Python'
end tag: </h1>
start tag: <p class="pubdate" >
data: '28 Feb 2001'
end tag: </p>
start tag: <p class="copyright" >
data: 'Copyright '
*** unknown entity ref: &copy;
data: ' 2000, 2001 by '
start tag: <a href="mailto:f8dy@diveintopython.org" title="send e-mail to the author" >
data: 'Mark Pilgrim'
end tag: </a>
end tag: </p>
start tag: <p>
start tag: <a name="c40ab2b4" >
end tag: </a>
start tag: <b>
end tag: </b>
end tag: </p>
start tag: <p>
data: 'This book lives at '
start tag: <a href="http://diveintopython.org/" >
data: 'http://diveintopython.org/'
end tag: </a>
data: ".\012If you're reading it somewhere else, you may not have the lates"
data: 't version.\012'
end tag: </p>

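Here's the roadmap for the rest of the chapter:

  1. Subclass SGMLParser to create classes that extract interesting data out of HTML documents.
  2. Subclass SGMLParser to create BaseHTMLProcessor, which overrides all 8 handler methods and uses them to reconstruct the original HTML from the pieces.
  3. Subclass BaseHTMLProcessor to create Dialectizer, which handles <pre> blocks specially and overrides handle_data to run the text between tags through a series of substitutions.
  4. Subclass Dialectizer to create the classes that define the substitution rules (ChefDialectizer, FuddDialectizer, OldeDialectizer).
  5. Write a test that fetches a real web page from http://diveintopython.org/ and processes it.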

4.3. Extracting data from HTML documents

To extract data from HTML documents, subclass the SGMLParser class and define methods for each tag or entity you want to capture.

The first step to extracting data from an HTML document is getting some HTML. If you have some HTML lying around on your hard drive, you can use file functions to read it, but the real fun begins when you get HTML from live web pages.

Example 4.5. Introducing urllib

>>> import urllib                                       1
>>> sock = urllib.urlopen("http://diveintopython.org/") 2
>>> htmlSource = sock.read()                            3
>>> sock.close()                                        4
>>> print htmlSource                                    5
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><html><head>
      <meta http-equiv='Content-Type' content='text/html; charset=ISO-8859-1'>
   <title>Dive Into Python</title>
<link rel='stylesheet' href='diveintopython.css' type='text/css'>
<link rev='made' href='mailto:f8dy@diveintopython.org'>
<meta name='keywords' content='Python, Dive Into Python, tutorial, object-oriented, programming, documentation, book, free'>
<meta name='description' content='a free Python tutorial for experienced programmers'>
</head>
<body bgcolor='white' text='black' link='#0000FF' vlink='#840084' alink='#0000FF'>
<table cellpadding='0' cellspacing='0' border='0' width='100%'>
<tr><td class='header' width='1%' valign='top'>diveintopython.org</td>
<td width='99%' align='right'><hr size='1' noshade></td></tr>
<tr><td class='tagline' colspan='2'>Python&nbsp;for&nbsp;experienced&nbsp;programmers</td></tr>

[...snip...]
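1 The urllib module is part of the standard Python library. It contains functions for getting information about, and actually retrieving data from, Internet-based URLs (mainly web pages).
2 The simplest use of urllib is to retrieve the entire text of a web page using the urlopen function. Opening a URL is similar to opening a file. The return value of urlopen is a file-like object, which has some of the same methods as a file object.
3 The simplest thing to do with the file-like object returned by urlopen is read, which reads the entire HTML of the web page into a single string. The object also supports readlines, which reads the text line by line into a list.
4 When you're done with the object, make sure to close it, just like a normal file object.
5 You now have the complete HTML of the home page of http://diveintopython.org/ in a string, and you're ready to parse it.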
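In Python 3, urlopen lives in urllib.request and read() returns bytes rather than a string. Here is a small sketch under those assumptions; fetch_html is a hypothetical helper, and it assumes UTF-8 content instead of reading the charset from the response headers. To run without network access it uses a data: URL (supported by urllib.request since Python 3.4); an http:// URL is opened the same way.

```python
from urllib.request import urlopen

def fetch_html(url):
    """Return the body of url as text (hypothetical helper, assumes UTF-8)."""
    # The response is a file-like object; the with-block closes it,
    # just as sock.close() does in the example above.
    with urlopen(url) as sock:
        raw = sock.read()          # bytes in Python 3
    return raw.decode("utf-8")

html_source = fetch_html("data:text/html,<p>Dive%20Into%20Python</p>")
print(html_source)
```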

Example 4.6. Introducing urllister.py

If you have not already done so, you can download this and other examples used in this book (Windows, UNIX, Mac OS).

from sgmllib import SGMLParser

class URLLister(SGMLParser):
    def reset(self):                              1
        SGMLParser.reset(self)
        self.urls = []

    def start_a(self, attrs):                     2
        href = [v for k, v in attrs if k=='href'] 3 4
        if href:
            self.urls.extend(href)
1 reset is called by the __init__ method of SGMLParser, and it can also be called manually once an instance of the parser has been created. So if you need to do any initialization, do it in reset, not in __init__, so that it will be re-initialized properly when someone re-uses a parser instance.
2 start_a is called by SGMLParser whenever it finds an <a> tag. The tag may contain an href attribute, and/or other attributes, like name or title. The attrs parameter is a list of tuples, [(attribute, value), (attribute, value), ...]. Or it may be just an <a>, a valid (if useless) HTML tag, in which case attrs would be an empty list.
3 You can find out whether this <a> tag has an href attribute with a simple multi-variable list comprehension.
4 String comparisons like k=='href' are always case-sensitive, but that's safe in this case, because SGMLParser converts attribute names to lowercase while building attrs.
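Here is the same idea as a Python 3 sketch, assuming html.parser.HTMLParser in place of the removed sgmllib. Because HTMLParser never looks for a start_a method, the check for the a tag moves into handle_starttag. Like SGMLParser, HTMLParser lowercases tag and attribute names, so the comparisons stay safe even for uppercase markup.

```python
from html.parser import HTMLParser

class URLLister(HTMLParser):
    """Collect the href of every <a> tag (Python 3 sketch)."""

    def reset(self):
        # HTMLParser.__init__ calls reset(), so set up state here,
        # exactly as urllister.py does
        super().reset()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # no start_a hook in html.parser: filter on the lowercased tag name
        if tag == "a":
            self.urls.extend(v for k, v in attrs if k == "href")

parser = URLLister()
parser.feed('<A HREF="toc.html">Contents</A> <a name="top"></a> <a href="#download">Get it</a>')
parser.close()
print(parser.urls)
```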

Example 4.7. Using urllister.py

>>> import urllib, urllister
>>> usock = urllib.urlopen("http://diveintopython.org/")
>>> parser = urllister.URLLister()
>>> parser.feed(usock.read())         1
>>> usock.close()                     2
>>> parser.close()                    3
>>> for url in parser.urls: print url 4
toc.html
#download
toc.html
history.html
download/dip_pdf.zip
download/dip_pdf.tgz
download/dip_pdf.hqx
download/diveintopython.pdf
download/diveintopython.zip
download/diveintopython.tgz
download/diveintopython.hqx

[...snip...]
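1 Call the feed method, defined in SGMLParser, to get HTML into the parser.[7] It takes a string, which is what usock.read() returns.
2 Like files, you should close your URL objects as soon as you're done with them.
3 You should close your parser object, too, but for a different reason. The feed method isn't guaranteed to process all the HTML you give it; it may buffer it, waiting for more. Once there is no more input, call close to flush the buffer and force everything to be fully parsed.
4 Once the parser is closed, the parsing is complete, and parser.urls contains a list of all the linked URLs in the HTML document.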
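To see the buffering described above, here is a sketch that splits one tag across two feed calls, as can happen when a page arrives in network-sized chunks. It assumes Python 3's html.parser as a stand-in for SGMLParser, and TagRecorder is a hypothetical name. The parser holds the incomplete fragment until the rest of the tag arrives.

```python
from html.parser import HTMLParser

class TagRecorder(HTMLParser):
    """Hypothetical helper: remember start-tag names in the order seen."""

    def reset(self):
        super().reset()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

parser = TagRecorder()
parser.feed("<p>Hello <b")           # chunk ends in the middle of a tag
after_first_chunk = list(parser.tags)
parser.feed("r>world</p>")           # the rest of the tag arrives later
parser.close()                       # flush anything still buffered
print(after_first_chunk, parser.tags)
```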

4.4. Introducing BaseHTMLProcessor.py

SGMLParser doesn't produce anything by itself. It parses and parses and parses, and it calls a method for each interesting thing it finds, but the methods don't do anything. SGMLParser is an HTML consumer: it takes HTML and breaks it down into small, structured pieces. As you saw in the previous section, you can subclass SGMLParser to define classes that catch specific tags and produce useful things, like a list of all the links on a web page. Now you'll take this one step further by defining a class that catches everything SGMLParser throws at it and reconstructs the complete HTML document. In technical terms, this class will be an HTML producer.

BaseHTMLProcessor subclasses SGMLParser and provides all 8 essential handler methods: unknown_starttag, unknown_endtag, handle_charref, handle_entityref, handle_comment, handle_pi, handle_decl, and handle_data.

Example 4.8. Introducing BaseHTMLProcessor

class BaseHTMLProcessor(SGMLParser):
    def reset(self):                        1
        self.pieces = []
        SGMLParser.reset(self)

    def unknown_starttag(self, tag, attrs): 2
        strattrs = "".join([' %s="%s"' % (key, value) for key, value in attrs])
        self.pieces.append("<%(tag)s%(strattrs)s>" % locals())

    def unknown_endtag(self, tag):          3
        self.pieces.append("</%(tag)s>" % locals())

    def handle_charref(self, ref):          4
        self.pieces.append("&#%(ref)s;" % locals())

    def handle_entityref(self, ref):        5
        self.pieces.append("&%(ref)s;" % locals())

    def handle_data(self, text):            6
        self.pieces.append(text)

    def handle_comment(self, text):         7
        self.pieces.append("<!--%(text)s-->" % locals())

    def handle_pi(self, text):              8
        self.pieces.append("<?%(text)s>" % locals())

    def handle_decl(self, text):
        self.pieces.append("<!%(text)s>" % locals())
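1 reset, called by SGMLParser.__init__, initializes self.pieces as an empty list before calling the ancestor method. self.pieces is a data attribute which will hold the pieces of the HTML document you're assembling. Each handler method reconstructs the HTML that SGMLParser parsed, and each method appends that string to self.pieces. Note that self.pieces is a list. You might be tempted to define it as a string and just keep appending each piece to it. That would work, but Python is much more efficient at dealing with lists.[8]
2 Since BaseHTMLProcessor does not define any methods for specific tags (like the start_a method in URLLister), SGMLParser will call unknown_starttag for every start tag. This method takes the tag (tag) and the list of attribute name/value pairs (attrs), reconstructs the original HTML, and appends it to self.pieces. The string formatting here is a little strange; you'll untangle it in Section 4.6.
3 Reconstructing end tags is much simpler; just take the tag name and wrap it in the </...> brackets.
4 When SGMLParser finds a character reference, it calls handle_charref with the bare reference. If the HTML document contains the reference &#160;, ref will be 160. Reconstructing the original complete character reference just involves wrapping ref in &#...; characters.
5 Entity references are similar to character references, but without the hash mark (#). Reconstructing the original entity reference requires wrapping ref in &...; characters.
6 Blocks of text are simply appended to self.pieces unaltered.
7 HTML comments are wrapped in <!--...--> characters.
8 Processing instructions are wrapped in <?...> characters.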
Important
The HTML specification requires that all non-HTML (like client-side JavaScript) be enclosed in HTML comments, but not all web pages do this properly (and all modern web browsers are forgiving if they don't). BaseHTMLProcessor is not forgiving; if script is improperly embedded, it will be parsed as if it were HTML. For instance, if the script contains less-than and equals signs, SGMLParser may incorrectly think that it has found tags and attributes. SGMLParser always converts tags and attribute names to lowercase, which may break the script, and BaseHTMLProcessor always encloses attribute values in double quotes (even if the original HTML document used single quotes or no quotes), which will certainly break the script. Always protect your client-side script within HTML comments.
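For comparison, here is a sketch of the same reconstruction approach in Python 3, assuming html.parser.HTMLParser as the base class instead of sgmllib.SGMLParser. It is not the book's code: the handler names follow html.parser, attribute values are re-escaped with html.escape because html.parser hands them over already unescaped, and a handle_startendtag override keeps <br/> from turning into <br></br>. Like the original, it lowercases names and double-quotes every attribute value.

```python
import html
from html.parser import HTMLParser

class BaseHTMLProcessor(HTMLParser):
    """Reassemble HTML from parser callbacks (Python 3 sketch on html.parser)."""

    def __init__(self):
        # convert_charrefs=False keeps &#160; and &copy; as separate callbacks
        super().__init__(convert_charrefs=False)

    def reset(self):
        # called from HTMLParser.__init__, so state is set up here
        self.pieces = []
        super().reset()

    @staticmethod
    def _attrs(attrs):
        # value is None for bare attributes such as <hr noshade>
        return "".join(
            " %s" % key if value is None
            else ' %s="%s"' % (key, html.escape(value))
            for key, value in attrs)

    def handle_starttag(self, tag, attrs):
        self.pieces.append("<%s%s>" % (tag, self._attrs(attrs)))

    def handle_startendtag(self, tag, attrs):
        # <br/>-style tags; the default would add a bogus end tag
        self.pieces.append("<%s%s />" % (tag, self._attrs(attrs)))

    def handle_endtag(self, tag):
        self.pieces.append("</%s>" % tag)

    def handle_charref(self, ref):
        self.pieces.append("&#%s;" % ref)

    def handle_entityref(self, ref):
        self.pieces.append("&%s;" % ref)

    def handle_data(self, text):
        self.pieces.append(text)

    def handle_comment(self, text):
        self.pieces.append("<!--%s-->" % text)

    def handle_pi(self, text):
        self.pieces.append("<?%s>" % text)

    def handle_decl(self, text):
        self.pieces.append("<!%s>" % text)

    def output(self):
        """Return processed HTML as a single string."""
        return "".join(self.pieces)

p = BaseHTMLProcessor()
p.feed("<P CLASS=note>Fish &amp; chips&#160;<!-- menu --><br/></P>")
p.close()
print(p.output())
```

The uppercase, unquoted input comes back lowercased and double-quoted, which is the same side effect Section 4.7 relies on.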

Example 4.9. BaseHTMLProcessor output

    def output(self):               1
        """Return processed HTML as a single string"""
        return "".join(self.pieces) 2
1 This is the one method in BaseHTMLProcessor that is never called by the ancestor SGMLParser. Since the other handler methods store their reconstructed HTML in self.pieces, this function is needed to join all those pieces into one string. As noted before, Python is great at lists and mediocre at strings, so you only create the complete string when somebody explicitly asks for it.
2 If you prefer, you could use the join function of the string module instead: string.join(self.pieces, "")
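The list-then-join idiom can be shown in isolation. Both helpers below (hypothetical names) produce the same string; the join version builds the result once at the end, while repeated += may copy the growing string on each step. Note that the string.join function mentioned above no longer exists in Python 3; "".join(...) is the spelling that remains.

```python
def build_with_join(pieces):
    parts = []
    for piece in pieces:
        parts.append(piece)          # lists grow in place
    return "".join(parts)            # one string built at the end

def build_with_concat(pieces):
    result = ""
    for piece in pieces:
        result += piece              # strings are immutable; may copy each time
    return result

pieces = ["<p>", "Hello", "</p>"]
print(build_with_join(pieces))
print(build_with_concat(pieces))
```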

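4.5. locals and globals

Python has two built-in functions, locals and globals, which provide dictionary-based access to local and global variables.

First, a word on namespaces. This is a dry subject, but it's important, so pay attention. Python uses what are called namespaces to keep track of variables. A namespace is just like a dictionary where the keys are names of variables and the dictionary values are the values of those variables. In fact, you can access a namespace as a Python dictionary, as you'll see in a minute.

At any particular point in a Python program, there are several namespaces available. Each function has its own namespace, called the local namespace, which keeps track of the function's variables, including function arguments and locally defined variables. Each module has its own namespace, called the global namespace, which keeps track of the module's variables, including functions, classes, any other imported modules, and module-level variables and constants. And there is the built-in namespace, accessible from any module, which holds built-in functions and exceptions.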

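When a line of code asks for the value of a variable x, Python searches for that variable in all the available namespaces, in order:

  1. local namespace - specific to the current function or class method. If the function defines a local variable x, or has an argument x, Python will use this and stop searching.
  2. global namespace - specific to the current module. If the module has defined a variable, function, or class called x, Python will use that and stop searching.
  3. built-in namespace - global to all modules. As a last resort, Python will assume that x is the name of a built-in function or variable.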

If Python doesn't find x in any of these namespaces, it gives up and raises a NameError with the message There is no variable named 'x'. You saw this back in Chapter 1, but you didn't appreciate how much work Python was doing before giving you that error.
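Here is a small Python 3 sketch of the lookup order just described (the function and variable names are made up): a local x shadows the global one, a name with no local or global binding falls through to the built-in namespace, and a name found nowhere raises NameError. The exact wording of the NameError message depends on the Python version.

```python
import builtins

x = "global x"                    # module (global) namespace

def show_lookup():
    x = "local x"                 # local namespace: searched first
    return x, len("abc")          # len: not local or global, so built-in

def show_global():
    return x                      # no local x, so the module's x is used

def show_missing():
    try:
        return undefined_name     # bound nowhere: NameError
    except NameError as err:
        return str(err)

print(show_lookup())
print(show_global())
print(show_missing())
print("len" in vars(builtins))    # built-ins live in their own namespace
```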

Important
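Python 2.2 introduced a subtle but important change that affects the namespace search order: nested scopes. In Python 2.0, when you reference a variable within a nested function or lambda function, Python searches for that variable in the current (nested or lambda) function's namespace, and then in the module's namespace. Python 2.2 searches for the variable in the current (nested or lambda) function's namespace, then in the parent function's namespace, and then in the module's namespace. Python 2.1 can work either way; by default, it works like Python 2.0, but you can add the following line of code at the top of your module to make your module work like Python 2.2: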
from __future__ import nested_scopes
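A sketch of what nested scopes make possible (make_tag_wrapper is a hypothetical example): the inner function uses tag from its parent function's namespace. Nested scopes are always on from Python 2.2 onward, so current Python needs no __future__ import for this.

```python
def make_tag_wrapper(tag):
    # tag lives in make_tag_wrapper's local namespace
    def wrap(text):
        # nested scopes: wrap finds tag in its parent function's namespace
        return "<%s>%s</%s>" % (tag, text, tag)
    return wrap

bold = make_tag_wrapper("b")
print(bold("Bork"))
```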

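Like many things in Python, namespaces are directly accessible at run-time. Specifically, the local namespace is accessible via the built-in locals function, and the global (module-level) namespace is accessible via the built-in globals function.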

Example 4.10. Introducing locals

>>> def foo(arg): 1
...     x = 1
...     print locals()
...     
>>> foo(7)        2
{'arg': 7, 'x': 1}
>>> foo('bar')    3
{'arg': 'bar', 'x': 1}
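1 The function foo has two variables in its local namespace: arg, whose value is passed in to the function, and x, which is defined within the function.
2 locals returns a dictionary of name/value pairs. The keys of this dictionary are the names of the variables as strings; the values of the dictionary are the actual values of the variables. So calling foo with 7 prints the dictionary containing the function's two local variables: arg (7) and x (1).
3 Remember, Python has dynamic typing, so you could just as easily pass a string in for arg; the function (and the call to locals) would still work just as well. locals works with all variables of all datatypes.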
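Here is the same experiment in Python 3 syntax, returning the dictionary instead of printing it, plus one caveat: in CPython, treat the dictionary that locals() returns inside a function as read-only, because writing into it does not rebind the variable. try_to_rebind is a hypothetical name.

```python
def foo(arg):
    x = 1
    return locals()              # a dict of the local names right now

def try_to_rebind(arg):
    x = 1
    locals()["x"] = 99           # changes a dictionary, not the variable x
    return x

print(foo(7))
print(foo("bar"))
print(try_to_rebind(7))
```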

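What locals does for the local (function) namespace, globals does for the global (module) namespace. globals is more exciting, though, because a module's namespace is more exciting.[9] Not only does the module's namespace include module-level variables and constants, it includes all the functions and classes defined in the module. Plus, it includes anything that was imported into the module.

Remember the difference between from module import and import module? With import module, the module itself is imported, but it retains its own namespace, which is why you need to use the module name to access any of its functions or attributes: module.function. But with from module import, you're actually importing specific functions and attributes from another module into your own namespace, which is why you access them directly without referencing the original module they came from. With the globals function, you can actually see this happen.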

Example 4.11. Introducing globals

Add the following block to BaseHTMLProcessor.py:

if __name__ == "__main__":
    for k, v in globals().items():             1
        print k, "=", v
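1 Just so you don't get intimidated, remember that you've seen all this before. The globals function returns a dictionary, and you're iterating through the dictionary using the items method and multi-variable assignment. The only thing new here is the globals function.

Now running the script from the command line gives this output: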

c:\docbook\dip\py>python BaseHTMLProcessor.py
SGMLParser = sgmllib.SGMLParser                1
__doc__ = None                                 2
BaseHTMLProcessor = __main__.BaseHTMLProcessor 3
__name__ = __main__                            4
__builtins__ = <module '__builtin__' (built-in)>
1 SGMLParser was imported from sgmllib, using from module import. That means that it was imported directly into the module's namespace, and here it is.
2 Every module has a doc string, accessible through the built-in attribute __doc__. This module didn't explicitly define one, so it defaults to None.
3 This module only defines one class, BaseHTMLProcessor, and here it is. Note that the value here is the class itself, not a specific instance of the class.
4 Remember the if __name__ trick? When you run a module (as opposed to importing it from another module), its built-in __name__ attribute has a special value, __main__. Since you ran this module as a script from the command line, __name__ is __main__, which is why the little test code that prints the globals got executed.
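A Python 3 sketch of the import difference discussed above, checked through globals() rather than printed in full: import os binds only the name os in the module namespace, while from os.path import join binds join itself.

```python
import os                    # binds the name "os" only
from os.path import join     # binds "join" directly in this namespace

g = globals()                # this module's namespace, as a dictionary
print("os" in g, "join" in g)
print(g["join"] is os.path.join)   # the very same function object
```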
Note
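Using the locals and globals functions, you can get the value of arbitrary variables dynamically, providing the variable name as a string. This mirrors the functionality of the getattr function, which allows you to access arbitrary functions dynamically by providing the function name as a string.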
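To make the note concrete, here is a sketch that fetches a module-level variable from globals() by name and calls a method with getattr by name. Greeter, greeting_style, lookup and call_method are all hypothetical.

```python
class Greeter:                       # hypothetical class for this demo
    def hello(self):
        return "hello"

greeting_style = "polite"            # hypothetical module-level variable

def lookup(name):
    # globals() is the module namespace, so a string key finds the variable
    return globals()[name]

def call_method(obj, method_name):
    # getattr does the same for an object's attributes, including methods
    return getattr(obj, method_name)()

print(lookup("greeting_style"))
print(call_method(Greeter(), "hello"))
```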

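Conceptually, Python keeps track of variables exactly this way, as mappings from names to values; locals and globals give you a direct, dictionary-shaped view of those namespaces. That's why they work with every variable, every attribute, and every datatype, without restriction. Remember my mantra that everything is an object? Well, here's a new one for you: everything is a dictionary.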

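4.6. Dictionary-based string formatting

String formatting lets you insert values into strings with ease. Values are listed in a tuple and inserted in order into the string in place of each formatting marker. While this is efficient, it is not always the easiest code to read, especially when multiple values are being inserted. You can't simply scan through the string in one pass and understand what the result will be; you're constantly switching between reading the string and reading the tuple of values.

There is an alternative form of string formatting that uses dictionaries instead of tuples of values.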

Example 4.12. Introducing dictionary-based string formatting

>>> params = {"server":"mpilgrim", "database":"master", "uid":"sa", "pwd":"secret"}
>>> "%(pwd)s" % params                                    1
'secret'
>>> "%(pwd)s is not a good password for %(uid)s" % params 2
'secret is not a good password for sa'
>>> "%(database)s of mind, %(database)s of body" % params 3
'master of mind, master of body'
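1 Instead of a tuple of explicit values, this form of string formatting uses a dictionary, params. And instead of a simple %s marker in the string, the marker contains a name in parentheses. This name is used as a key in the params dictionary, and the corresponding value, secret, is substituted in place of the %(pwd)s marker.
2 Dictionary-based string formatting works with any number of named keys. Each key must exist in the given dictionary, or the formatting will fail with a KeyError.
3 You can even specify the same key twice; each occurrence will be replaced with the same value.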
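The examples above in Python 3, with the KeyError case made explicit. The last formatting line shows str.format_map, a Python 3 method that takes the same dictionary with {name} placeholders instead of %(name)s; it is an alternative, not what BaseHTMLProcessor uses.

```python
params = {"server": "mpilgrim", "database": "master", "uid": "sa", "pwd": "secret"}

sentence = "%(pwd)s is not a good password for %(uid)s" % params
repeated = "%(database)s of mind, %(database)s of body" % params

try:
    "%(port)s" % params              # "port" is not a key in params
    missing = None
except KeyError as err:
    missing = str(err)               # the missing key, quoted

# Python 3 alternative: {name} placeholders with str.format_map
modern = "{pwd} is not a good password for {uid}".format_map(params)

print(sentence)
print(repeated)
print(missing)
print(modern)
```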

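So why would you use dictionary-based string formatting? Well, it does seem like overkill to set up a dictionary of keys and values simply to do string formatting in the next line; it's really most useful when you happen to have a dictionary of meaningful keys and values already. Like locals.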

Example 4.13. Dictionary-based string formatting in BaseHTMLProcessor.py

    def handle_comment(self, text):
        self.pieces.append("<!--%(text)s-->" % locals()) 1
1 Using the built-in locals function is the most common use of dictionary-based string formatting. It means that you can use the names of local variables within your string (in this case, text, which was passed to the class method as an argument) and each named variable will be replaced by its value. If text is 'Begin page footer', the string formatting "<!--%(text)s-->" % locals() will resolve to the string '<!--Begin page footer-->'.
    def unknown_starttag(self, tag, attrs):
        strattrs = "".join([' %s="%s"' % (key, value) for key, value in attrs]) 1
        self.pieces.append("<%(tag)s%(strattrs)s>" % locals())                              2
1 When this method is called, attrs is a list of key/value tuples, just like the items of a dictionary, which means you can use multi-variable assignment to iterate through it. This should be a familiar pattern by now, but there's a lot going on here, so let's break it down:
  1. Suppose attrs is [('href', 'index.html'), ('title', 'Go to home page')].
  2. In the first round of the list comprehension, key will get 'href', and value will get 'index.html'.
  3. The string formatting ' %s="%s"' % (key, value) will resolve to ' href="index.html"'. This string becomes the first element of the list comprehension's return value.
  4. In the second round, key will get 'title', and value will get 'Go to home page'.
  5. The string formatting will resolve to ' title="Go to home page"'.
  6. The list comprehension returns a list of these two resolved strings, and strattrs will join both elements of this list together to form ' href="index.html" title="Go to home page"'.
2 Now, using dictionary-based string formatting, you insert the values of tag and strattrs into a string. So if tag is 'a', the final result would be '<a href="index.html" title="Go to home page">', and that is what gets appended to self.pieces.
Important
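Using dictionary-based string formatting with locals is a convenient way of making complex string formatting expressions more readable, but it comes with a price. There is a slight performance hit in making the call to locals, since locals builds a copy of the local namespace. Generally this isn't enough to worry about, but if you have the string formatting expression inside a loop (for example, in a list comprehension), you may be better off using ordinary tuple-based string formatting.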
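The walkthrough above can be checked by wrapping the two lines of unknown_starttag in a stand-alone function, written here for Python 3 (format_start_tag and format_start_tag_tuple are hypothetical names). The tuple-based variant is the alternative suggested for hot loops; both produce the same tag.

```python
def format_start_tag(tag, attrs):
    # attrs is a list of (key, value) tuples, as in unknown_starttag
    strattrs = "".join([' %s="%s"' % (key, value) for key, value in attrs])
    # dictionary-based formatting: tag and strattrs are looked up in locals()
    return "<%(tag)s%(strattrs)s>" % locals()

def format_start_tag_tuple(tag, attrs):
    strattrs = "".join([' %s="%s"' % (key, value) for key, value in attrs])
    # tuple-based formatting: no locals() call, values passed by position
    return "<%s%s>" % (tag, strattrs)

attrs = [("href", "index.html"), ("title", "Go to home page")]
print(format_start_tag("a", attrs))
print(format_start_tag_tuple("a", attrs))
```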

4.7. Quoting attribute values

A common question on comp.lang.python is “I have a bunch of HTML documents with unquoted attribute values, and I want to properly quote them all. How can I do this?”[10] (This is generally precipitated by a project manager who has found the HTML-is-a-standard religion joining a large project and proclaiming that all pages must validate against an HTML validator. Unquoted attribute values are a common violation of the HTML standard.) Whatever the reason, unquoted attribute values are easy to fix by feeding HTML through BaseHTMLProcessor.

BaseHTMLProcessor consumes HTML (since it's descended from SGMLParser) and produces equivalent HTML, but the HTML output is not identical to the input. Tags and attribute names will end up in lowercase, even if they started in uppercase or mixed case, and attribute values will be enclosed in double quotes, even if they started in single quotes or with no quotes at all. It is this last side effect that you can take advantage of.

Example 4.14. Quoting attribute values

>>> htmlSource = """        1
...     <html>
...     <head>
...     <title>Test page</title>
...     </head>
...     <body>
...     <ul>
...     <li><a href=index.html>Home</a></li>
...     <li><a href=toc.html>Table of contents</a></li>
...     <li><a href=history.html>Revision history</a></li>
...     </body>
...     </html>
...     """
>>> from BaseHTMLProcessor import BaseHTMLProcessor
>>> parser = BaseHTMLProcessor()
>>> parser.feed(htmlSource) 2
>>> print parser.output()   3
<html>
<head>
<title>Test page</title>
</head>
<body>
<ul>
<li><a href="index.html">Home</a></li>
<li><a href="toc.html">Table of contents</a></li>
<li><a href="history.html">Revision history</a></li>
</body>
</html>
1 Note that the attribute values of the href attributes in the <a> tags are not properly quoted. (Also note that you're using triple quotes for something other than a doc string, and directly in the IDE, no less. They're very useful.)
2 Feed the parser.
3 Using the output method defined in BaseHTMLProcessor, you get the output as a single string, complete with quoted attribute values. While this may seem anti-climactic, think about how much has actually happened here: SGMLParser parsed the entire HTML document, breaking it down into tags, refs, data, and so forth; BaseHTMLProcessor used those elements to reconstruct pieces of HTML (which are still stored in parser.pieces, if you want to see them); finally, you called parser.output, which combined all the pieces of HTML into one string.

4.8. Introducing dialect.py

Dialectizer is a simple (and silly) descendant of BaseHTMLProcessor. It runs blocks of text through a series of substitutions, but it makes sure that anything within a <pre>...</pre> block passes through unaltered.

To handle the <pre> blocks, you define two methods in Dialectizer: start_pre and end_pre.

Example 4.15. Handling specific tags

    def start_pre(self, attrs):             1
        self.verbatim += 1                  2
        self.unknown_starttag("pre", attrs) 3

    def end_pre(self):                      4
        self.unknown_endtag("pre")          5
        self.verbatim -= 1                 6
1 start_pre is called every time SGMLParser finds a <pre> tag in the HTML source. (In a minute, we'll see exactly how this happens.) The method takes a single parameter, attrs, which contains the attributes of the tag (if any). attrs is a list of key/value tuples, the same format that unknown_starttag takes.
2 In the reset method, we initialize a data attribute that serves as a counter for <pre> tags. Every time we hit a <pre> tag, we increment the counter; every time we hit a </pre> tag, we decrement it. (We could use this as a simple flag, setting it to 1 and resetting it to 0, but a counter is just as easy, and it also handles the odd (but possible) case of nested <pre> tags.) In a minute, we'll see how this counter is put to good use.
3 That's it; that's the only special processing we do for <pre> tags. Now we pass the list of attributes along to unknown_starttag so it can do the default processing.
4 end_pre is called every time SGMLParser finds a </pre> tag. Since end tags can not contain attributes, the method takes no parameters.
5 First, we want to do the default processing, just like for any other end tag.
6 Second, we decrement our counter to signal that this <pre> block has been closed.
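Why a counter rather than a flag? With a flag, the inner </pre> of a nested pair would reset it to 0 while we are still inside the outer block. The Python 3 sketch below uses a hypothetical PreAwareUpper class built on html.parser.HTMLParser; since html.parser has no start_pre/end_pre dispatch, it checks the tag name itself. It upper-cases text outside <pre> and leaves text inside alone; upper() is only a stand-in for Dialectizer's real substitutions.

```python
from html.parser import HTMLParser


class PreAwareUpper(HTMLParser):
    """Upper-case text outside <pre> blocks; leave <pre> content untouched.

    Hypothetical sketch: attributes are dropped to keep it short.
    """

    def reset(self):
        self.pieces = []
        self.verbatim = 0  # depth of currently open <pre> tags, not a 0/1 flag
        super().reset()

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.verbatim += 1
        self.pieces.append("<%s>" % tag)

    def handle_endtag(self, tag):
        self.pieces.append("</%s>" % tag)
        if tag == "pre" and self.verbatim > 0:
            self.verbatim -= 1

    def handle_data(self, text):
        # Inside any <pre> (depth > 0) the text passes through unaltered.
        self.pieces.append(text if self.verbatim else text.upper())

    def output(self):
        return "".join(self.pieces)


p = PreAwareUpper()
p.feed("a<pre>b<pre>c</pre>d</pre>e")
p.close()
print(p.output())  # A<pre>b<pre>c</pre>d</pre>E
```

With a 0/1 flag instead of a counter, the "d" after the inner </pre> would have been upper-cased even though it is still inside the outer <pre>.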

At this point, it's worth digging a little further into SGMLParser. I've claimed repeatedly (and you've taken it on faith so far) that SGMLParser looks for and calls specific methods for each tag, if they exist. For instance, we just saw the definitions of start_pre and end_pre to handle <pre> and </pre>. But how does this happen? Well, it's not magic; it's just good Python coding.

Example 4.16. SGMLParser

    def finish_starttag(self, tag, attrs):               1
        try:
            method = getattr(self, 'start_' + tag)       2
        except AttributeError:                           3
            try:
                method = getattr(self, 'do_' + tag)      4
            except AttributeError:
                self.unknown_starttag(tag, attrs)        5
                return -1
            else:
                self.handle_starttag(tag, method, attrs) 6
                return 0
        else:
            self.stack.append(tag)
            self.handle_starttag(tag, method, attrs)
            return 1

    def handle_starttag(self, tag, method, attrs):
        method(attrs)                                    7
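1 At this point, SGMLParser has already found a start tag and parsed the attribute list. The only thing left to do is figure out whether there is a specific handler method for this tag, or whether we should fall back on the default method (unknown_starttag).
2 The “magic” of SGMLParser is nothing more than our old friend, getattr. What you may not have realized before is that getattr looks the method up on the actual instance, so code written in an ancestor class like SGMLParser can find methods that are defined only in a descendant class. Here the object is self, the current instance. So if tag is 'pre', this call to getattr will look for a start_pre method on the current instance, which is an instance of the Dialectizer class.
3 getattr raises an AttributeError if the method it's looking for doesn't exist on the object (that is, in its class or any of that class's ancestors), but that's okay, because we wrapped the call to getattr inside a try...except block and explicitly caught the AttributeError.
4 Since we didn't find a start_xxx method, we also look for a do_xxx method before giving up. This alternate naming scheme is generally used for standalone tags, like <br>, which have no corresponding end tag. But you can use either naming scheme; as you can see, SGMLParser tries both for every tag. (You shouldn't define both a start_xxx and a do_xxx handler method for the same tag, though; only the start_xxx method will get called.)
5 Another AttributeError, which means that the call to getattr failed with do_xxx. Since we found neither a start_xxx nor a do_xxx method for this tag, we catch the exception and fall back on the default method, unknown_starttag.
6 Remember, try...except blocks can have an else clause, which is called if no exception is raised during the try...except block. Logically, that means that we did find a do_xxx method for this tag, so we're going to call it.
7 The start_xxx and do_xxx methods are not called directly; the tag, method, and attributes are passed to this function, handle_starttag, so that descendants can override it and change the way all start tags are dispatched. We don't need that level of control, so we just let this method do its thing, which is to call the method (start_xxx or do_xxx) with the list of attributes. Remember, method is a function, returned from getattr, and functions are objects. (I know you're getting tired of hearing it, and I promise I'll stop saying it as soon as I run out of ways to use it to my advantage.) Here, the function object is passed into this dispatch method as an argument, and this method turns around and calls the function. At this point, we don't need to know what the function is, what it's named, or where it's defined; the only thing we need to know about the function is that it is called with one argument, attrs.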
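The lookup order is easy to verify without sgmllib at all. The sketch below is a hypothetical, stripped-down MiniDispatcher that keeps only the start_xxx / do_xxx / unknown_starttag fallback; it drops the return codes and the tag stack, and each handler simply returns its own name so you can see which one ran.

```python
class MiniDispatcher:
    """Hypothetical reduction of finish_starttag's method lookup."""

    def finish_starttag(self, tag, attrs):
        try:
            method = getattr(self, "start_" + tag)        # preferred name
        except AttributeError:
            try:
                method = getattr(self, "do_" + tag)       # alternate name
            except AttributeError:
                return self.unknown_starttag(tag, attrs)  # default handler
        return self.handle_starttag(tag, method, attrs)

    def handle_starttag(self, tag, method, attrs):
        # method is a bound method object; calling it passes self implicitly.
        return method(attrs)

    def unknown_starttag(self, tag, attrs):
        return "unknown_starttag(%s)" % tag


class Demo(MiniDispatcher):
    # These exist only on the subclass, yet getattr(self, ...) in the base
    # class finds them, because self is a Demo instance.
    def start_pre(self, attrs):
        return "start_pre"

    def do_br(self, attrs):
        return "do_br"

    def start_hr(self, attrs):
        return "start_hr"

    def do_hr(self, attrs):
        return "do_hr"  # never reached: start_hr is found first


d = Demo()
for tag in ("pre", "br", "hr", "blink"):
    print(tag, "->", d.finish_starttag(tag, []))
```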

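Now back to our regularly scheduled program: Dialectizer. When we left, we were in the process of defining specific handler methods for <pre> and </pre> tags. There's only one thing left to do, and that is to process text blocks with our predefined substitutions. For that, we need to override the handle_data method.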

Example 4.17. Overriding the handle_data method

    def handle_data(self, text):                                         1
        self.pieces.append(self.verbatim and text or self.process(text)) 2
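1 handle_data is called with only one argument, the text to process.
2 In the ancestor BaseHTMLProcessor, the handle_data method simply appended the text to the output buffer, self.pieces. Here the logic is only slightly more complicated. If we're in the middle of a <pre>...</pre> block, self.verbatim will be some value greater than 0, and we want to put the text in the output buffer unaltered. Otherwise, we call a separate method to process the substitutions, then put the result of that into the output buffer. In Python, this is a one-liner, using the and-or trick.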
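The and-or trick has one well-known trap: if the middle value is false, the or branch runs even when the condition is true. The sketch below isolates the expression from handle_data, with a hypothetical shout function standing in for self.process, and compares it with the conditional expression (available from Python 2.5 on), which has no such trap.

```python
def shout(text):
    """Hypothetical stand-in for self.process()."""
    return "<" + text.upper() + ">"


def pick_and_or(verbatim, text):
    # The idiom from handle_data: only correct if `text` is true whenever
    # `verbatim` is true.
    return verbatim and text or shout(text)


def pick_conditional(verbatim, text):
    # Conditional expression: the choice depends only on `verbatim`.
    return text if verbatim else shout(text)


print(pick_and_or(1, "keep me"))         # keep me
print(pick_and_or(0, "change me"))       # <CHANGE ME>
print(repr(pick_and_or(1, "")))          # '<>'  (the trap: '' is false)
print(repr(pick_conditional(1, "")))     # ''
```

Whether the trap matters for Dialectizer depends on what process does with an empty string; a purely substitution-based process would most likely return it unchanged, so the one-liner is probably safe there, but the conditional expression does not need that argument.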

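We're close to completely understanding Dialectizer. The only missing link is the nature of the text substitutions themselves. If you know any Perl, you know that when complex text substitutions are required, the only real solution is regular expressions.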
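As a preview of where this is heading, here is a minimal sketch of a substitution pass: a list of (pattern, replacement) pairs applied in order with re.sub. The FUDD_SUBS list and the process function are hypothetical illustrations, not the actual contents of dialect.py; the regular-expression syntax itself is explained in the next section.

```python
import re

# Hypothetical "Elmer Fudd" style rules: (pattern, replacement) pairs.
FUDD_SUBS = [
    (r"[rl]", "w"),
    (r"[RL]", "W"),
]


def process(text, subs=FUDD_SUBS):
    """Apply each substitution to the whole text, in list order."""
    for pattern, replacement in subs:
        # Each rule sees the output of the previous one, so order matters.
        text = re.sub(pattern, replacement, text)
    return text


print(process("really rude"))   # weawwy wude
print(process("Rabbit Lover"))  # Wabbit Wovew
```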

4.9. Regular expressions 101

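Regular expressions are a powerful (and fairly standardized) way of searching, replacing, and parsing text with complex patterns of characters. If you've used regular expressions in other languages (like Perl), you should skip this section and just read the summary of the re module to get an overview of the available functions and their arguments.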

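Strings have methods for searching (index, find, and count), replacing (replace), and parsing (split), but they are limited to the simplest of cases. The search methods look for a single, hard-coded substring, and they are always case-sensitive; to do case-insensitive searches of a string s, you must call s.lower() or s.upper() and make sure your search strings are in the matching case. The replace and split methods have the same limitations. Wherever possible, you should use them (they're fast and readable), but for more complex things, you'll need to move up to regular expressions.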
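A quick comparison of the two approaches to a case-insensitive search (the address string is made up for illustration): the string-method route has to rewrite the text first, while the re module can ignore case directly.

```python
import re

s = "100 North Broad Road"

# String methods match exact case only.
print(s.count("ROAD"))                       # 0
# Normalizing case first works, but the search string must match that case.
print(s.upper().count("ROAD"))               # 2
# A regular expression can ignore case without rewriting the text.
print(re.findall("road", s, re.IGNORECASE))  # ['road', 'Road']
```

Note that both approaches also find the "road" inside "Broad"; that is exactly the problem the following examples run into.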

Example 4.18. Matching at the end of a string

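This series of examples was inspired by a real-life problem I had in my day job several years ago, when I needed to scrub and standardize street addresses exported from a legacy system before importing them into a newer system. (See, I don't just make this stuff up; it's actually useful.)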

>>> s = '100 NORTH MAIN ROAD'
>>> s.replace('ROAD', 'RD.')               1
'100 NORTH MAIN RD.'
>>> s = '100 NORTH BROAD ROAD'
>>> s.replace('ROAD', 'RD.')               2
'100 NORTH BRD. RD.'
>>> s[:-4] + s[-4:].replace('ROAD', 'RD.') 3
'100 NORTH BROAD RD.'
>>> import re                              4
>>> re.sub('ROAD$', 'RD.', s)              5 6
'100 NORTH BROAD RD.'
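1 My goal is to standardize a street address so that 'ROAD' is always abbreviated as 'RD.'. At first glance, I thought this was simple enough that I could just use the string method replace. After all, all the data was already uppercase, so case mismatches would not be a problem. And the search string, 'ROAD', was a constant. And in this deceptively simple example, s.replace does indeed work.
2 Life, unfortunately, is full of counterexamples, and I quickly discovered this one. The problem here is that 'ROAD' appears twice in the address, once as part of the street name 'BROAD' and once as its own word. The replace method sees these two occurrences and blindly replaces both of them; meanwhile, I see my addresses getting destroyed.
3 To solve the problem of addresses with more than one 'ROAD' substring, we could resort to something like this: only search and replace 'ROAD' in the last four characters of the address (s[-4:]), and leave the rest of the string alone (s[:-4]). But you can see that this is already getting unwieldy. For example, the pattern depends on the length of the string we're replacing (if we were replacing 'STREET' with 'ST.', we would need to use s[:-6] and s[-6:].replace(...)). Would you like to come back in six months and debug this? I know I wouldn't.
4 It's time to move up to regular expressions. In Python, all functionality related to regular expressions is contained in the re module.
5 Take a look at the first parameter: 'ROAD$'. This is a simple regular expression that matches 'ROAD' only when it occurs at the end of a string. The $ means “end of the string.” (There is a corresponding character, the caret ^, which means “beginning of the string.”)
6 Using the re.sub function, we search the string s for the regular expression 'ROAD$' and replace it with 'RD.'. This matches the ROAD at the end of the string s, but does not match the ROAD that's part of the word BROAD, because that's in the middle of s.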
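To see the two anchors side by side, here is a short script (the address strings are made up for illustration) comparing $, ^ and no anchor at all, plus the 'STREET' case that made the slicing approach awkward.

```python
import re

s = "ROAD 1 BROAD ROAD"
print(re.sub("ROAD$", "RD.", s))  # ROAD 1 BROAD RD.
print(re.sub("^ROAD", "RD.", s))  # RD. 1 BROAD ROAD
print(re.sub("ROAD", "RD.", s))   # RD. 1 BRD. RD.

# Unlike s[:-6] + s[-6:].replace(...), the pattern does not depend on the
# length of the word being replaced.
print(re.sub("STREET$", "ST.", "1 MAIN STREET"))  # 1 MAIN ST.
```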

Example 4.19. Matching whole words

>>> s = '100 BROAD'
>>> re.sub('ROAD$', 'RD.', s)     1
'100 BRD.'
>>> re.sub('\\bROAD$', 'RD.', s)  2
'100 BROAD'
>>> re.sub(r'\bROAD$', 'RD.', s)  3
'100 BROAD'
>>> s = '100 BROAD ROAD APT. 3'
>>> re.sub(r'\bROAD$', 'RD.', s)  4
'100 BROAD ROAD APT. 3'
>>> re.sub(r'\bROAD\b', 'RD.', s) 5
'100 BROAD RD. APT. 3'
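1 Continuing with my story of scrubbing addresses, I soon discovered that the previous example, matching 'ROAD' at the end of the address, was not good enough, because not all addresses included a street designation at all; some just ended with the street name. Most of the time, I got away with it, but if the street name was 'BROAD', then the regular expression would match 'ROAD' at the end of the string as part of the word 'BROAD', which is not what I wanted.
2 What I really wanted was to match 'ROAD' when it was at the end of the string and it was its own whole word, not a part of some larger word. To express this in a regular expression, you use \b, which means “a word boundary must occur right here.” In Python, this is complicated by the fact that the '\' character in a string must itself be escaped. (This is sometimes referred to as the backslash plague, and it is one reason why regular expressions are easier in Perl than in Python. On the down side, Perl mixes regular expressions with other syntax, so if you have a bug, it may be hard to tell whether it's a bug in syntax or a bug in your regular expression.)
3 To work around the backslash plague, you can use what is called a raw string, by prefixing the string with the letter r. This tells Python that nothing in this string should be escaped; '\t' is a tab character, but r'\t' is really the backslash character \ followed by the letter t. I recommend always using raw strings when dealing with regular expressions; otherwise, things get too confusing too quickly (and regular expressions get confusing quickly enough all by themselves).
4 *sigh* Unfortunately, I soon found more cases that contradicted my logic. In this case, the street address contained the word 'ROAD' as a whole word by itself, but it wasn't at the end, because the address had an apartment number after the street designation. Because 'ROAD' isn't at the very end of the string, it doesn't match, so the entire call to re.sub ends up replacing nothing at all, and we get the original string back, which is not what we want.
5 To solve this problem, I removed the $ character and added another \b. Now the regular expression reads “match 'ROAD' when it's a whole word by itself anywhere in the string,” whether at the end, the beginning, or somewhere in the middle.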
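The backslash plague and raw strings are easy to check directly. The sketch below (Python 3 syntax; the last address is made up) also shows why forgetting the r prefix is a silent bug rather than an error: in an ordinary string, "\b" is a valid escape for the backspace character.

```python
import re

s = "100 BROAD ROAD APT. 3"

# A raw string keeps the backslash, so these two literals are the same pattern.
assert "\\bROAD\\b" == r"\bROAD\b"
assert len("\t") == 1 and len(r"\t") == 2

# Without the r prefix, "\b" is the backspace character (ASCII 8), so this
# pattern looks for literal backspaces and never matches an address.
assert "\b" == "\x08"
print(re.sub("\bROAD\b", "RD.", s))    # 100 BROAD ROAD APT. 3  (unchanged)
print(re.sub(r"\bROAD\b", "RD.", s))   # 100 BROAD RD. APT. 3

# \b also works at the very start of the string.
print(re.sub(r"\bROAD\b", "RD.", "ROAD TO NOWHERE"))  # RD. TO NOWHERE
```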

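This is just the tiniest tip of the iceberg of what regular expressions can do. They are extremely powerful, and there are entire books devoted to them. But they are not the right solution for every problem. You should learn enough about them to know when they are appropriate, when they will solve your problems, and when they will cause more problems than they solve.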

 

Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.

 
--Jamie Zawinski, in comp.lang.emacs 

Further reading

4.10. Putting it all together

Sorry, you've reached the end of the chapter; this is everything that has been written so far. Check back at http://diveintopython.org/ to see whether it has been updated.



[7] The technical term for a parser like SGMLParser is a consumer: it consumes HTML and breaks it down. Presumably, the name feed was chosen to fit into the whole “consumer” motif. Personally, it makes me think of an exhibit in the zoo where there's just a dark cage with no trees or plants or evidence of life of any kind, but if you stand perfectly still and look really closely you can make out two beady eyes staring back at you from the far left corner, but you convince yourself that that's just your mind playing tricks on you, and the only way you can tell that the whole thing isn't just an empty cage is a small innocuous sign on the railing that reads, “Do not feed the parser.” But maybe that's just me. In any event, it's an interesting mental image.

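[8] The reason Python is better at lists than at strings is that lists are mutable but strings are immutable. This means that appending to a list just adds the element and updates the index. Since strings can not be changed after they are created, code like s = s + newpiece will create an entirely new string out of the concatenation of the original and the new piece, then throw away the original string. This involves a lot of expensive memory management, and the amount of effort involved increases as the string gets longer, so doing s = s + newpiece in a loop is deadly. In technical terms, appending n items to a list is O(n), while appending n items to a string is O(n²).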

[9] I don't get out much.

[10] All right, it's not that common a question. It's not up there with “What editor should I use to write Python code?” (answer: Emacs) or “Is Python better or worse than Perl?” (answer: “Perl is worse than Python because people wanted it worse.” -Larry Wall, 10/14/1998). But questions about HTML processing pop up in one form or another about once a month, and among those questions, this is a popular one.