|
tuple | I_CANT_BELIEVE_THEYRE_NESTABLE_BLOCK_TAGS = ('noscript',) |
|
| I_CANT_BELIEVE_THEYRE_NESTABLE_INLINE_TAGS = \ |
|
tuple | NESTABLE_TAGS |
|
tuple | CHARSET_RE = re.compile("((^|;)\s*charset=)([^;]*)", re.M) |
|
tuple | NESTABLE_BLOCK_TAGS = ('blockquote', 'div', 'fieldset', 'ins', 'del') |
|
tuple | NESTABLE_INLINE_TAGS |
|
dictionary | NESTABLE_LIST_TAGS |
|
dictionary | NESTABLE_TABLE_TAGS |
|
tuple | NESTABLE_TAGS |
|
tuple | NON_NESTABLE_BLOCK_TAGS = ('address', 'form', 'p', 'pre') |
|
tuple | PRESERVE_WHITESPACE_TAGS = set(['pre', 'textarea']) |
|
dictionary | QUOTE_TAGS = {'script' : None, 'textarea' : None} |
|
tuple | RESET_NESTING_TAGS |
|
tuple | SELF_CLOSING_TAGS |
|
| ALL_ENTITIES = XHTML_ENTITIES |
|
string | HTML_ENTITIES = "html" |
|
list | MARKUP_MASSAGE |
|
dictionary | NESTABLE_TAGS = {} |
|
list | PRESERVE_WHITESPACE_TAGS = [] |
|
dictionary | QUOTE_TAGS = {} |
|
dictionary | RESET_NESTING_TAGS = {} |
|
string | ROOT_TAG_NAME = u'[document]' |
|
dictionary | SELF_CLOSING_TAGS = {} |
|
dictionary | STRIP_ASCII_SPACES = { 9: None, 10: None, 12: None, 13: None, 32: None, } |
|
string | XHTML_ENTITIES = "xhtml" |
|
string | XML_ENTITIES = "xml" |
|
| fetch = findAll |
|
| findChild = find |
|
| findChildren = findAll |
|
| first = find |
|
The BeautifulSoup class is oriented towards skipping over
common HTML errors like unclosed tags. However, sometimes it makes
errors of its own. For instance, consider this fragment:
<b>Foo<b>Bar</b></b>
This is perfectly valid (if bizarre) HTML. However, the
BeautifulSoup class will implicitly close the first b tag when it
encounters the second 'b'. It will think the author wrote
"<b>Foo<b>Bar", and didn't close the first 'b' tag, because
there's no real-world reason to bold something that's already
bold. When it encounters '</b></b>' it will close two more 'b'
tags, for a grand total of three tags closed instead of two. This
can throw off the rest of your document structure. The same is
true of a number of other tags, listed below.
It's much more common for someone to forget to close a 'b' tag
than to actually use nested 'b' tags, and the BeautifulSoup class
handles the common case. This class handles the not-co-common
case: where you can't believe someone wrote what they did, but
it's valid HTML and BeautifulSoup screwed up by assuming it
wouldn't be.
Definition at line 1626 of file BeautifulSoup.py.