Public Attributes | |
declaredHTMLEncoding | |
originalEncoding | |
Public Attributes inherited from BeautifulSoup.BeautifulStoneSoup | |
builder | |
convertEntities | |
convertHTMLEntities | |
convertXMLEntities | |
currentData | |
currentTag | |
declaredHTMLEncoding | |
escapeUnrecognizedEntities | |
fromEncoding | |
hidden | |
instanceSelfClosingTags | |
literal | |
markup | |
markupMassage | |
originalEncoding | |
parseOnlyThese | |
previous | |
quoteStack | |
smartQuotesTo | |
tagStack | |
Public Attributes inherited from BeautifulSoup.PageElement | |
next | |
nextSibling | |
parent | |
previous | |
previousSibling | |
Static Public Attributes | |
compiled regex | CHARSET_RE = re.compile("((^|;)\s*charset=)([^;]*)", re.M) |
list | NESTABLE_BLOCK_TAGS = ['blockquote', 'div', 'fieldset', 'ins', 'del'] |
list | NESTABLE_INLINE_TAGS |
dictionary | NESTABLE_LIST_TAGS |
dictionary | NESTABLE_TABLE_TAGS |
tuple | NESTABLE_TAGS |
list | NON_NESTABLE_BLOCK_TAGS = ['address', 'form', 'p', 'pre'] |
set | PRESERVE_WHITESPACE_TAGS = set(['pre', 'textarea']) |
dictionary | QUOTE_TAGS = {'script' : None, 'textarea' : None} |
tuple | RESET_NESTING_TAGS |
tuple | SELF_CLOSING_TAGS |
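QUOTE_TAGS names the tags, 'script' and 'textarea', whose contents should be read as text even when they contain something that looks like a tag. The following is a minimal sketch of that idea using a regular-expression scanner. It is an illustration only, not the library's parser; `tokenize` and `TAG_RE` are hypothetical names, and the scanner only recognizes simple `<name ...>` and `</name>` tags.

```python
import re

# Same keys as the QUOTE_TAGS table above; the values are unused here.
QUOTE_TAGS = {"script": None, "textarea": None}

# Hypothetical, simplified tag pattern: "<name ...>" or "</name>".
TAG_RE = re.compile(r"<(/?)(\w+)[^>]*>")


def tokenize(markup):
    """Yield ('tag', ...) and ('text', ...) tokens.

    Inside a QUOTE_TAGS element, everything up to the matching close tag
    is yielded as one text token, even if it looks like markup.
    """
    pos = 0
    while pos < len(markup):
        m = TAG_RE.search(markup, pos)
        if m is None:
            yield ("text", markup[pos:])
            break
        if m.start() > pos:
            yield ("text", markup[pos:m.start()])
        yield ("tag", m.group(0))
        pos = m.end()
        name = m.group(2).lower()
        if not m.group(1) and name in QUOTE_TAGS:
            # Skip straight to this element's close tag; what lies between is text.
            end = markup.lower().find("</" + name, pos)
            if end == -1:
                end = len(markup)
            if end > pos:
                yield ("text", markup[pos:end])
            pos = end
```

Without the QUOTE_TAGS branch, the `<b) x = "<p>` inside a script such as `if (a<b) x = "<p>";` would be misread as a tag.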
Static Public Attributes inherited from BeautifulSoup.BeautifulStoneSoup | |
ALL_ENTITIES = XHTML_ENTITIES | |
string | HTML_ENTITIES = "html" |
list | MARKUP_MASSAGE |
dictionary | NESTABLE_TAGS = {} |
list | PRESERVE_WHITESPACE_TAGS = [] |
dictionary | QUOTE_TAGS = {} |
dictionary | RESET_NESTING_TAGS = {} |
string | ROOT_TAG_NAME = u'[document]' |
dictionary | SELF_CLOSING_TAGS = {} |
dictionary | STRIP_ASCII_SPACES = { 9: None, 10: None, 12: None, 13: None, 32: None, } |
string | XHTML_ENTITIES = "xhtml" |
string | XML_ENTITIES = "xml" |
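STRIP_ASCII_SPACES has the shape that Python's str.translate (unicode.translate in Python 2) accepts as a deletion table: keys are code points, and a value of None deletes that character. This page does not say where the table is used; the sketch below assumes one plausible use, testing whether a run of text is nothing but ASCII whitespace. `is_ascii_whitespace` is a hypothetical helper.

```python
# Copy of the STRIP_ASCII_SPACES table: tab (9), line feed (10),
# form feed (12), carriage return (13) and space (32).
STRIP_ASCII_SPACES = {9: None, 10: None, 12: None, 13: None, 32: None}


def is_ascii_whitespace(text):
    """True if text contains only characters listed in the table."""
    # str.translate deletes every character whose code point maps to None.
    return text.translate(STRIP_ASCII_SPACES) == ""
```

Note that vertical tab (11) and non-ASCII spaces such as U+00A0 are not in the table, so they survive the translation.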
Static Public Attributes inherited from BeautifulSoup.PageElement | |
fetchNextSiblings = findNextSiblings | |
fetchParents = findParents | |
fetchPrevious = findAllPrevious | |
fetchPreviousSiblings = findPreviousSiblings | |
This parser knows the following facts about HTML:

* Some tags have no closing tag and should be interpreted as being closed as soon as they are encountered.

* The text inside some tags (e.g. 'script') may contain tags which are not really part of the document and which should be parsed as text, not tags. If you want to parse that text as tags, you can always fetch it and parse it explicitly.

* Tag nesting rules:

  Most tags can't be nested at all. For instance, the occurrence of a <p> tag should implicitly close the previous <p> tag.

    <p>Para1<p>Para2
  should be transformed into:
    <p>Para1</p><p>Para2

  Some tags can be nested arbitrarily. For instance, the occurrence of a <blockquote> tag should _not_ implicitly close the previous <blockquote> tag.

    Alice said: <blockquote>Bob said: <blockquote>Blah
  should NOT be transformed into:
    Alice said: <blockquote>Bob said: </blockquote><blockquote>Blah

  Some tags can be nested, but the nesting is reset by the interposition of other tags. For instance, a <tr> tag should implicitly close the previous <tr> tag within the same <table>, but not close a <tr> tag in another table.

    <table><tr>Blah<tr>Blah
  should be transformed into:
    <table><tr>Blah</tr><tr>Blah
  but
    <tr>Blah<table><tr>Blah
  should NOT be transformed into:
    <tr>Blah<table></tr><tr>Blah

Differing assumptions about tag nesting rules are a major source of problems with the BeautifulSoup class. If BeautifulSoup does not treat a tag as nestable but your page's author does, try ICantBelieveItsBeautifulSoup, MinimalSoup, or BeautifulStoneSoup before writing your own subclass.
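The three nesting rules can be illustrated with a small stack-based sketch. It is not the library's algorithm: the tables `NON_NESTABLE` and `RESET_TRIGGERS` are hypothetical and much smaller than BeautifulSoup's NESTABLE_TAGS and RESET_NESTING_TAGS, the scanner only understands bare `<name>` and `</name>` tags, and tags still open at the end are left unclosed, as in the examples in the description.

```python
import re

# Hypothetical, simplified tables (not BeautifulSoup's actual values).
NON_NESTABLE = {"p"}                # a new <p> closes any open <p>
RESET_TRIGGERS = {"tr": {"table"}}  # an open <table> shields an outer <tr>
# Any other tag, such as <blockquote>, nests freely.

TOKEN_RE = re.compile(r"<(/?)(\w+)>|([^<]+)")


def open_tag(stack, name, out):
    """Emit implicit close tags if the rules require it, then open `name`."""
    if name in NON_NESTABLE or name in RESET_TRIGGERS:
        triggers = RESET_TRIGGERS.get(name, set())
        # Walk from the innermost open tag outward.
        for i in range(len(stack) - 1, -1, -1):
            if stack[i] in triggers:
                break  # nesting is reset here; leave outer tags open
            if stack[i] == name:
                # Close the earlier tag and everything opened inside it.
                while len(stack) > i:
                    out.append("</%s>" % stack.pop())
                break
    stack.append(name)
    out.append("<%s>" % name)


def close_tag(stack, name, out):
    """Close `name` and anything still open inside it; ignore stray end tags."""
    if name in stack:
        while True:
            top = stack.pop()
            out.append("</%s>" % top)
            if top == name:
                break


def normalize(markup):
    stack, out = [], []
    for slash, name, text in TOKEN_RE.findall(markup):
        if text:
            out.append(text)
        elif slash:
            close_tag(stack, name, out)
        else:
            open_tag(stack, name, out)
    return "".join(out)
```

For example, `normalize("<p>Para1<p>Para2")` returns `"<p>Para1</p><p>Para2"`, while `normalize("<tr>Blah<table><tr>Blah")` returns its input unchanged because the open <table> resets the nesting of <tr>.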
Definition at line 1447 of file BeautifulSoup.py.
def BeautifulSoup.BeautifulSoup.__init__(self, *args, **kwargs)
Definition at line 1495 of file BeautifulSoup.py.
References BeautifulSoup.BeautifulSoup.__init__(), BeautifulSoup.buildTagMap(), and BeautifulSoup.BeautifulStoneSoup.HTML_ENTITIES.
def BeautifulSoup.BeautifulSoup.extractCharsetFromMeta(self, attrs)
Beautiful Soup can detect a charset included in a META tag, try to convert the document to that charset, and re-parse the document from the beginning.
Definition at line 1553 of file BeautifulSoup.py.
References BeautifulSoup.BeautifulStoneSoup._feed(), BeautifulSoup.BeautifulStoneSoup.declaredHTMLEncoding, BeautifulSoup.BeautifulSoup.extractCharsetFromMeta(), BeautifulSoup.BeautifulStoneSoup.fromEncoding, BeautifulSoup.BeautifulStoneSoup.originalEncoding, and BeautifulSoup.BeautifulStoneSoup.unknown_starttag().
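This page does not show how extractCharsetFromMeta applies CHARSET_RE, so the sketch below only exercises the pattern itself on the value of a META tag's content attribute. The pattern is copied from the CHARSET_RE entry and written as a raw string so that Python 3 does not warn about `\s`; `charset_from_content` and `replace_charset` are hypothetical helpers.

```python
import re

# The CHARSET_RE pattern, as a raw string.
# Group 1 is the "charset=" prefix (with any leading ";"), group 3 the value.
CHARSET_RE = re.compile(r"((^|;)\s*charset=)([^;]*)", re.M)


def charset_from_content(content):
    """Return the charset named in a META content value, or None."""
    match = CHARSET_RE.search(content)
    # [^;]* can capture trailing spaces, so strip them.
    return match.group(3).strip() if match else None


def replace_charset(content, new_charset):
    """Swap the charset value, keeping the matched 'charset=' prefix."""
    return CHARSET_RE.sub(lambda m: m.group(1) + new_charset, content)
```

As shown, the pattern has no re.I flag, so it is case-sensitive: a value such as `text/html; Charset=UTF-8` does not match.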
BeautifulSoup.BeautifulSoup.CHARSET_RE | static |
Definition at line 1551 of file BeautifulSoup.py.
BeautifulSoup.BeautifulSoup.declaredHTMLEncoding |
Definition at line 1592 of file BeautifulSoup.py.
Referenced by BeautifulSoup.UnicodeDammit.__init__(), and BeautifulSoup.UnicodeDammit._detectEncoding().
BeautifulSoup.BeautifulSoup.NESTABLE_BLOCK_TAGS | static |
Definition at line 1518 of file BeautifulSoup.py.
BeautifulSoup.BeautifulSoup.NESTABLE_INLINE_TAGS | static |
Definition at line 1512 of file BeautifulSoup.py.
BeautifulSoup.BeautifulSoup.NESTABLE_LIST_TAGS | static |
Definition at line 1521 of file BeautifulSoup.py.
BeautifulSoup.BeautifulSoup.NESTABLE_TABLE_TAGS | static |
Definition at line 1529 of file BeautifulSoup.py.
BeautifulSoup.BeautifulSoup.NESTABLE_TAGS | static |
Definition at line 1547 of file BeautifulSoup.py.
BeautifulSoup.BeautifulSoup.NON_NESTABLE_BLOCK_TAGS | static |
Definition at line 1538 of file BeautifulSoup.py.
BeautifulSoup.BeautifulSoup.originalEncoding |
Definition at line 1575 of file BeautifulSoup.py.
Referenced by BeautifulSoup.UnicodeDammit.__init__(), and BeautifulSoup.UnicodeDammit._convertFrom().
BeautifulSoup.BeautifulSoup.PRESERVE_WHITESPACE_TAGS | static |
Definition at line 1505 of file BeautifulSoup.py.
BeautifulSoup.BeautifulSoup.QUOTE_TAGS | static |
Definition at line 1507 of file BeautifulSoup.py.
BeautifulSoup.BeautifulSoup.RESET_NESTING_TAGS | static |
Definition at line 1542 of file BeautifulSoup.py.
BeautifulSoup.BeautifulSoup.SELF_CLOSING_TAGS | static |
Definition at line 1501 of file BeautifulSoup.py.