Public Member Functions
def | __init__
def | find_codec
Public Attributes
declaredHTMLEncoding
markup
originalEncoding
smartQuotesTo
triedEncodings
unicode
Static Public Attributes
dictionary | CHARSET_ALIASES
EBCDIC_TO_ASCII_MAP = None
dictionary | MS_CHARS
Private Member Functions
def | _codec
def | _convertFrom
def | _detectEncoding
def | _ebcdic_to_ascii
def | _subMSChar
def | _toUnicode
A class for detecting the encoding of a *ML document and converting it to a Unicode string. If the source encoding is windows-1252, it can also replace Microsoft smart quotes with their HTML or XML equivalents.
Definition at line 1734 of file BeautifulSoup.py.
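The general idea of trying candidate encodings in order and recording which ones were attempted can be illustrated with a minimal Python 3 sketch. This is not the code in BeautifulSoup.py: the function name `decode_with_fallbacks`, its parameters, and the fallback list are assumptions made for illustration only.

```python
from typing import Iterable, List, Optional, Tuple


def decode_with_fallbacks(
    markup: bytes,
    override_encodings: Iterable[str] = (),
    fallbacks: Iterable[str] = ("utf-8", "windows-1252"),
) -> Tuple[Optional[str], Optional[str], List[str]]:
    """Return (text, encoding_used, tried_encodings).

    Hypothetical helper: caller-supplied encodings are tried first,
    then the fallbacks, each at most once.
    """
    tried: List[str] = []
    for encoding in list(override_encodings) + list(fallbacks):
        if encoding in tried:
            continue
        tried.append(encoding)
        try:
            return markup.decode(encoding), encoding, tried
        except (LookupError, UnicodeDecodeError):
            # Unknown codec name or bytes that do not fit: try the next one.
            continue
    return None, None, tried


text, used, tried = decode_with_fallbacks(b"caf\xe9", ["ascii"])
print(text, used, tried)
```

In this sketch the returned list plays a role similar to the `triedEncodings` attribute, and the second return value is comparable to `originalEncoding`.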
def BeautifulSoup.UnicodeDammit.__init__ ( self, markup, overrideEncodings = [], smartQuotesTo = 'xml', isHTML = False )
Definition at line 1748 of file BeautifulSoup.py.
References BeautifulSoup.UnicodeDammit._convertFrom(), BeautifulSoup.UnicodeDammit._detectEncoding(), BeautifulSoup.BeautifulStoneSoup.declaredHTMLEncoding, BeautifulSoup.BeautifulSoup.declaredHTMLEncoding, BeautifulSoup.UnicodeDammit.declaredHTMLEncoding, BeautifulSoup.BeautifulStoneSoup.markup, BeautifulSoup.UnicodeDammit.markup, BeautifulSoup.BeautifulStoneSoup.originalEncoding, BeautifulSoup.BeautifulSoup.originalEncoding, BeautifulSoup.UnicodeDammit.originalEncoding, BeautifulSoup.BeautifulStoneSoup.smartQuotesTo, BeautifulSoup.UnicodeDammit.smartQuotesTo, BeautifulSoup.UnicodeDammit.triedEncodings, and BeautifulSoup.UnicodeDammit.unicode.
def BeautifulSoup.UnicodeDammit._codec [private]
Definition at line 1924 of file BeautifulSoup.py.
Referenced by BeautifulSoup.UnicodeDammit.find_codec().
def BeautifulSoup.UnicodeDammit._convertFrom [private]
Definition at line 1795 of file BeautifulSoup.py.
References BeautifulSoup.UnicodeDammit._subMSChar(), BeautifulSoup.UnicodeDammit._toUnicode(), BeautifulSoup.UnicodeDammit.find_codec(), BeautifulSoup.BeautifulStoneSoup.markup, BeautifulSoup.UnicodeDammit.markup, BeautifulSoup.BeautifulStoneSoup.originalEncoding, BeautifulSoup.BeautifulSoup.originalEncoding, BeautifulSoup.UnicodeDammit.originalEncoding, BeautifulSoup.BeautifulStoneSoup.smartQuotesTo, BeautifulSoup.UnicodeDammit.smartQuotesTo, and BeautifulSoup.UnicodeDammit.triedEncodings.
Referenced by BeautifulSoup.UnicodeDammit.__init__().
def BeautifulSoup.UnicodeDammit._detectEncoding [private]
Given a document, tries to detect its XML encoding.
Definition at line 1848 of file BeautifulSoup.py.
References BeautifulSoup.UnicodeDammit._ebcdic_to_ascii(), BeautifulSoup.BeautifulStoneSoup.declaredHTMLEncoding, BeautifulSoup.BeautifulSoup.declaredHTMLEncoding, BeautifulSoup.UnicodeDammit.declaredHTMLEncoding, and BeautifulSoup.UnicodeDammit.unicode.
Referenced by BeautifulSoup.UnicodeDammit.__init__().
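One common way to detect an XML document's encoding is to look for a byte order mark first and then for the `encoding` pseudo-attribute of the XML declaration. The sketch below shows that approach in Python 3 under stated assumptions: `sniff_xml_encoding`, `_BOMS`, and `_XML_DECL` are hypothetical names, and the logic is a simplification rather than the method defined at line 1848.

```python
import codecs
import re
from typing import Optional

# UTF-32 marks come before UTF-16 because the UTF-32 LE mark
# begins with the same two bytes as the UTF-16 LE mark.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32le"),
    (codecs.BOM_UTF32_BE, "utf-32be"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
)

# Matches the encoding pseudo-attribute of an ASCII-compatible XML declaration.
_XML_DECL = re.compile(
    rb"^<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z0-9._-]+)[\"']"
)


def sniff_xml_encoding(data: bytes) -> Optional[str]:
    """Return a lowercase encoding name, or None if nothing was declared."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    match = _XML_DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    return None


print(sniff_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))
```

The sketch does not handle EBCDIC or UTF-16 documents without a byte order mark, which a fuller detector would need to consider.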
def BeautifulSoup.UnicodeDammit._ebcdic_to_ascii [private]
Definition at line 1935 of file BeautifulSoup.py.
Referenced by BeautifulSoup.UnicodeDammit._detectEncoding().
def BeautifulSoup.UnicodeDammit._subMSChar [private]
Changes an MS smart quote character to an XML or HTML entity.
Definition at line 1781 of file BeautifulSoup.py.
References BeautifulSoup.BeautifulStoneSoup.smartQuotesTo, and BeautifulSoup.UnicodeDammit.smartQuotesTo.
Referenced by BeautifulSoup.UnicodeDammit._convertFrom().
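To make the smart-quote substitution concrete, here is a minimal Python 3 sketch that replaces four windows-1252 quote bytes with either XML numeric character references or HTML named entities. The table `SMART_QUOTES` and the function `substitute_smart_quotes` are hypothetical and cover only a small subset of characters; they are not the `MS_CHARS` dictionary or the method documented here.

```python
import re

# Hypothetical, partial table:
# windows-1252 byte -> (hex code point for XML, HTML entity name).
SMART_QUOTES = {
    0x91: ("x2018", "lsquo"),  # left single quotation mark
    0x92: ("x2019", "rsquo"),  # right single quotation mark
    0x93: ("x201C", "ldquo"),  # left double quotation mark
    0x94: ("x201D", "rdquo"),  # right double quotation mark
}


def substitute_smart_quotes(data: bytes, smart_quotes_to: str = "xml") -> bytes:
    """Replace smart-quote bytes with XML or HTML entities."""

    def replace(match: "re.Match[bytes]") -> bytes:
        numeric, named = SMART_QUOTES[match.group(0)[0]]
        entity = named if smart_quotes_to == "html" else "#" + numeric
        return ("&" + entity + ";").encode("ascii")

    return re.sub(rb"[\x91-\x94]", replace, data)


print(substitute_smart_quotes(b"\x93hi\x94"))
print(substitute_smart_quotes(b"\x93hi\x94", "html"))
```

The `smart_quotes_to` argument mirrors the `'xml'` default of the `smartQuotesTo` constructor parameter; the `'html'` value is assumed from the class description.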
def BeautifulSoup.UnicodeDammit._toUnicode [private]
Given a string and its encoding, decodes the string into Unicode. encoding is a string recognized by encodings.aliases.
Definition at line 1823 of file BeautifulSoup.py.
References BeautifulSoup.UnicodeDammit.unicode.
Referenced by BeautifulSoup.UnicodeDammit._convertFrom().
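Decoding a byte string given an encoding name can be sketched as follows in Python 3. The sketch assumes that a leading byte order mark, when present, should take precedence over the supplied encoding and be stripped; the function name `to_unicode` is hypothetical, and the behaviour of the documented method may differ.

```python
import codecs

# Byte order marks checked by this sketch, with the codec each one implies.
_BOM_ENCODINGS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def to_unicode(data: bytes, encoding: str) -> str:
    """Decode data, letting a byte order mark override the given encoding."""
    for bom, bom_encoding in _BOM_ENCODINGS:
        if data.startswith(bom):
            # Strip the mark so it does not appear in the decoded text.
            return data[len(bom):].decode(bom_encoding)
    return data.decode(encoding)


print(to_unicode(b"caf\xe9", "latin-1"))
```

Any name accepted by Python's codec registry, including the aliases listed in `encodings.aliases`, can be passed as `encoding`.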
def BeautifulSoup.UnicodeDammit.find_codec ( self, charset )
Definition at line 1918 of file BeautifulSoup.py.
References BeautifulSoup.UnicodeDammit._codec().
Referenced by BeautifulSoup.UnicodeDammit._convertFrom().
dictionary BeautifulSoup.UnicodeDammit.CHARSET_ALIASES [static]
Definition at line 1744 of file BeautifulSoup.py.
BeautifulSoup.UnicodeDammit.declaredHTMLEncoding
Definition at line 1749 of file BeautifulSoup.py.
Referenced by BeautifulSoup.UnicodeDammit.__init__(), and BeautifulSoup.UnicodeDammit._detectEncoding().
BeautifulSoup.UnicodeDammit.EBCDIC_TO_ASCII_MAP = None [static]
Definition at line 1934 of file BeautifulSoup.py.
BeautifulSoup.UnicodeDammit.markup
Definition at line 1814 of file BeautifulSoup.py.
Referenced by BeautifulSoup.UnicodeDammit.__init__(), and BeautifulSoup.UnicodeDammit._convertFrom().
dictionary BeautifulSoup.UnicodeDammit.MS_CHARS [static]
Definition at line 1960 of file BeautifulSoup.py.
BeautifulSoup.UnicodeDammit.originalEncoding
Definition at line 1755 of file BeautifulSoup.py.
Referenced by BeautifulSoup.UnicodeDammit.__init__(), and BeautifulSoup.UnicodeDammit._convertFrom().
BeautifulSoup.UnicodeDammit.smartQuotesTo
Definition at line 1752 of file BeautifulSoup.py.
Referenced by BeautifulSoup.UnicodeDammit.__init__(), BeautifulSoup.UnicodeDammit._convertFrom(), and BeautifulSoup.UnicodeDammit._subMSChar().
BeautifulSoup.UnicodeDammit.triedEncodings
Definition at line 1753 of file BeautifulSoup.py.
Referenced by BeautifulSoup.UnicodeDammit.__init__(), and BeautifulSoup.UnicodeDammit._convertFrom().
BeautifulSoup.UnicodeDammit.unicode
Definition at line 1756 of file BeautifulSoup.py.
Referenced by BeautifulSoup.UnicodeDammit.__init__(), BeautifulSoup.UnicodeDammit._detectEncoding(), and BeautifulSoup.UnicodeDammit._toUnicode().