Static Public Attributes | |
tuple | I_CANT_BELIEVE_THEYRE_NESTABLE_BLOCK_TAGS = ('noscript',) |
I_CANT_BELIEVE_THEYRE_NESTABLE_INLINE_TAGS = \ | |
tuple | NESTABLE_TAGS |
Static Public Attributes inherited from BeautifulSoup.BeautifulSoup | |
tuple | CHARSET_RE = re.compile("((^|;)\s*charset=)([^;]*)", re.M) |
tuple | NESTABLE_BLOCK_TAGS = ('blockquote', 'div', 'fieldset', 'ins', 'del') |
tuple | NESTABLE_INLINE_TAGS |
dictionary | NESTABLE_LIST_TAGS |
dictionary | NESTABLE_TABLE_TAGS |
tuple | NESTABLE_TAGS |
tuple | NON_NESTABLE_BLOCK_TAGS = ('address', 'form', 'p', 'pre') |
tuple | PRESERVE_WHITESPACE_TAGS = set(['pre', 'textarea']) |
dictionary | QUOTE_TAGS = {'script' : None, 'textarea' : None} |
tuple | RESET_NESTING_TAGS |
tuple | SELF_CLOSING_TAGS |
Static Public Attributes inherited from BeautifulSoup.BeautifulStoneSoup | |
ALL_ENTITIES = XHTML_ENTITIES | |
string | HTML_ENTITIES = "html" |
list | MARKUP_MASSAGE |
dictionary | NESTABLE_TAGS = {} |
list | PRESERVE_WHITESPACE_TAGS = [] |
dictionary | QUOTE_TAGS = {} |
dictionary | RESET_NESTING_TAGS = {} |
string | ROOT_TAG_NAME = u'[document]' |
dictionary | SELF_CLOSING_TAGS = {} |
dictionary | STRIP_ASCII_SPACES = { 9: None, 10: None, 12: None, 13: None, 32: None, } |
string | XHTML_ENTITIES = "xhtml" |
string | XML_ENTITIES = "xml" |
Static Public Attributes inherited from BeautifulSoup.Tag | |
fetch = findAll | |
findChild = find | |
findChildren = findAll | |
first = find | |
The BeautifulSoup class is oriented towards skipping over common HTML errors like unclosed tags. However, sometimes it makes errors of its own. For instance, consider this fragment: <b>Foo<b>Bar</b></b> This is perfectly valid (if bizarre) HTML. However, the BeautifulSoup class will implicitly close the first b tag when it encounters the second 'b'. It will think the author wrote "<b>Foo<b>Bar", and didn't close the first 'b' tag, because there's no real-world reason to bold something that's already bold. When it encounters '</b></b>' it will close two more 'b' tags, for a grand total of three tags closed instead of two. This can throw off the rest of your document structure. The same is true of a number of other tags, listed below. It's much more common for someone to forget to close a 'b' tag than to actually use nested 'b' tags, and the BeautifulSoup class handles the common case. This class handles the not-co-common case: where you can't believe someone wrote what they did, but it's valid HTML and BeautifulSoup screwed up by assuming it wouldn't be.
Definition at line 1626 of file BeautifulSoup.py.
|
static |
Definition at line 1656 of file BeautifulSoup.py.
|
static |
Definition at line 1651 of file BeautifulSoup.py.
|
static |
Definition at line 1658 of file BeautifulSoup.py.