---------------------------------------------
Miscellaneous functions for manipulating text
---------------------------------------------
Collection of text functions that don't fit in another category.

guess_encoding(byte_string, disable_chardet=False)

Try to guess the encoding of a byte :class:`str`

:arg byte_string: byte :class:`str` to guess the encoding of
:kwarg disable_chardet: If this is True, we never attempt to use
    :mod:`chardet` to guess the encoding.  This is useful if you need to
    have reproducibility whether :mod:`chardet` is installed or not.
    Default: :data:`False`.
:raises TypeError: if :attr:`byte_string` is not a byte :class:`str` type
:returns: string containing a guess at the encoding of
    :attr:`byte_string`.  This is appropriate to pass as the encoding
    argument when encoding and decoding unicode strings.

We start by attempting to decode the byte :class:`str` as :term:`UTF-8`.
If this succeeds we tell the world it's :term:`UTF-8` text.  If it doesn't
and :mod:`chardet` is installed on the system and :attr:`disable_chardet`
is False this function will use it to try detecting the encoding of
:attr:`byte_string`.  If it is not installed or :mod:`chardet` cannot
determine the encoding with a high enough confidence then we rather
arbitrarily claim that it is ``latin-1``.  Since ``latin-1`` will encode
to every byte, decoding from ``latin-1`` to :class:`unicode` will not
cause :exc:`UnicodeErrors` although the output might be mangled.
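
The fallback chain described above can be sketched in a few lines. This is a minimal Python 3 analogue, not kitchen's implementation: ``bytes`` stands in for the Python 2 byte :class:`str`, the ``detector`` argument is a hypothetical hook assumed to return a dict shaped like the result of ``chardet.detect()``, and the ``threshold`` default of 0.6 is an assumed value.

```python
def guess_encoding_sketch(byte_string, detector=None, threshold=0.6):
    """Return a best-guess encoding name for a bytes object.

    ``detector`` and ``threshold`` are illustrative assumptions, not part
    of the documented signature.
    """
    if not isinstance(byte_string, bytes):
        raise TypeError("byte_string must be a bytes object")

    # Step 1: strict UTF-8 decoding either succeeds or raises.
    try:
        byte_string.decode("utf-8", "strict")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # Step 2: ask the optional detector, and trust it only when it reports
    # an encoding with confidence at or above the threshold.
    if detector is not None:
        info = detector(byte_string)
        if info.get("encoding") and info.get("confidence", 0) >= threshold:
            return info["encoding"]

    # Step 3: latin-1 maps every byte value to a code point, so decoding
    # with it cannot fail, although the text may come out mangled.
    return "latin-1"
```

Passing the detector in as a callable keeps the sketch free of third-party imports; with chardet installed, ``detector=chardet.detect`` would be the natural argument.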

str_eq(str1, str2, encoding='utf-8', errors='replace')

Compare two strings, converting to byte :class:`str` if one is :class:`unicode`

:arg str1: First string to compare
:arg str2: Second string to compare
:kwarg encoding: If we need to convert one string into a byte :class:`str`
    to compare, the encoding to use.  Default is :term:`utf-8`.
:kwarg errors: What to do if we encounter errors when encoding the string.
    See the :func:`kitchen.text.converters.to_bytes` documentation for
    possible values.  The default is ``replace``.

This function prevents :exc:`UnicodeError` (python-2.4 or less) and
:exc:`UnicodeWarning` (python 2.5 and higher) when we compare
a :class:`unicode` string to a byte :class:`str`.  The errors normally
arise because the conversion is done to :term:`ASCII`.  This function
lets you convert to :term:`utf-8` or another encoding instead.

.. note::

    When we need to convert one of the strings from :class:`unicode` in
    order to compare them we convert the :class:`unicode` string into
    a byte :class:`str`.  That means that strings can compare differently
    if you use different encodings for each.

Note that ``str1 == str2`` is faster than this function if you can accept
the following limitations:

* Limited to python-2.5+ (otherwise a :exc:`UnicodeDecodeError` may be
  thrown)
* Will generate a :exc:`UnicodeWarning` if non-:term:`ASCII` byte
  :class:`str` is compared to :class:`unicode` string.
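
The note about encodings is easier to see with a concrete comparison. The following is a rough Python 3 analogue rather than the library's code: ``str`` plays the role of :class:`unicode`, ``bytes`` plays the role of the byte :class:`str`, and the name ``str_eq_sketch`` is hypothetical.

```python
def str_eq_sketch(str1, str2, encoding="utf-8", errors="replace"):
    """Compare text and bytes by encoding the text side first."""
    # Only convert when the two arguments are of different string types.
    if isinstance(str1, str) and isinstance(str2, bytes):
        str1 = str1.encode(encoding, errors)
    elif isinstance(str1, bytes) and isinstance(str2, str):
        str2 = str2.encode(encoding, errors)
    return str1 == str2


text = "café"
# The same text compares equal or unequal depending on the encoding chosen.
same_as_utf8 = str_eq_sketch(text, text.encode("utf-8"))
same_as_latin1_default = str_eq_sketch(text, text.encode("latin-1"))
same_as_latin1_explicit = str_eq_sketch(text, text.encode("latin-1"),
                                        encoding="latin-1")
```

Here ``same_as_latin1_default`` is false only because the text side is encoded as UTF-8 while the byte side holds latin-1 bytes, which is the situation the note warns about.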

process_control_chars(string, strategy='replace')

Look for and transform :term:`control characters` in a string

:arg string: string to search for and transform :term:`control characters`
    within
:kwarg strategy: XML does not allow :term:`ASCII` :term:`control
    characters`.  When we encounter those we need to know what to do.
    Valid options are:

    :replace: (default) Replace the :term:`control characters` with ``"?"``
    :ignore: Remove the characters altogether from the output
    :strict: Raise a :exc:`~kitchen.text.exceptions.ControlCharError` when
        we encounter a control character
:raises TypeError: if :attr:`string` is not a unicode string.
:raises ValueError: if the strategy is not one of replace, ignore, or
    strict.
:raises kitchen.text.exceptions.ControlCharError: if the strategy is
    ``strict`` and a :term:`control character` is present in the
    :attr:`string`
:returns: :class:`unicode` string with no :term:`control characters` in it.
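
The three strategies map naturally onto a translation table. The sketch below is a Python 3 illustration under stated assumptions: the set of control codes follows the XML 1.0 rule that tab, line feed, and carriage return are the only permitted characters below U+0020 (the library's own set may differ), and ``ControlCharError`` here is a local stand-in for :exc:`kitchen.text.exceptions.ControlCharError`.

```python
# Assumed set: C0 control codes that XML 1.0 disallows (everything below
# 0x20 except tab 0x09, line feed 0x0A, and carriage return 0x0D).
CONTROL_CODES = frozenset(list(range(0, 9)) + [11, 12] + list(range(14, 32)))


class ControlCharError(Exception):
    """Local stand-in for kitchen.text.exceptions.ControlCharError."""


def process_control_chars_sketch(string, strategy="replace"):
    """Replace, drop, or reject control characters in a text string."""
    if not isinstance(string, str):
        raise TypeError("string must be a text (str) object")

    if strategy == "ignore":
        # Mapping a code point to None deletes it during translation.
        table = dict.fromkeys(CONTROL_CODES, None)
    elif strategy == "replace":
        table = dict.fromkeys(CONTROL_CODES, "?")
    elif strategy == "strict":
        if any(ord(char) in CONTROL_CODES for char in string):
            raise ControlCharError("ASCII control code present in string input")
        return string
    else:
        raise ValueError("strategy must be one of ignore, replace, or strict")

    return string.translate(table)
```

Using ``str.translate`` handles both the replace and ignore cases in a single pass over the string, so only the strict case needs an explicit scan.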

html_entities_unescape(string)

Substitute unicode characters for HTML entities

:arg string: :class:`unicode` string to substitute out html entities
:raises TypeError: if something other than a :class:`unicode` string is
    given
:rtype: :class:`unicode` string
:returns: The plain text without html entities
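
One way to picture the substitution is a regular-expression pass with a callback. This is a hedged Python 3 sketch, not the module's code: it uses the standard library's ``html.entities.name2codepoint`` table, the pattern ``(?s)<[^>]*>|&#?\w+;`` that is visible in the compiled module, and it assumes that matched tags are simply dropped.

```python
import re
from html.entities import name2codepoint

# Matches either a whole tag or a named/numeric character reference.
ENTITY_RE = re.compile(r"(?s)<[^>]*>|&#?\w+;")


def html_entities_unescape_sketch(string):
    """Return ``string`` with entities replaced by the characters they name."""
    if not isinstance(string, str):
        raise TypeError("string must be a text (str) object")

    def fixup(match):
        text = match.group(0)
        if text.startswith("<"):
            return ""  # assumption: tags are removed from the output
        if text.startswith("&#"):
            # Numeric reference: decimal (&#65;) or hexadecimal (&#x41;).
            try:
                if text[2:3] in ("x", "X"):
                    return chr(int(text[3:-1], 16))
                return chr(int(text[2:-1]))
            except (ValueError, OverflowError):
                return text  # leave malformed references untouched
        codepoint = name2codepoint.get(text[1:-1])
        return chr(codepoint) if codepoint is not None else text

    return ENTITY_RE.sub(fixup, string)
```

Unknown named entities are left as they are, so text that merely looks like an entity is not lost.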

byte_string_valid_xml(byte_string, encoding='utf-8')

Check that a byte :class:`str` would be valid in xml

:arg byte_string: Byte :class:`str` to check
:arg encoding: Encoding of the xml file.  Default: :term:`UTF-8`
:returns: :data:`True` if the string is valid.  :data:`False` if it would
    be invalid in the xml file

In some cases you'll have a whole bunch of byte strings and rather than
transforming them to :class:`unicode` and back to byte :class:`str` for
output to xml, you will just want to make sure they work with the xml file
you're constructing.  This function will help you do that.  Example::

    ARRAY_OF_MOSTLY_UTF8_STRINGS = [...]
    processed_array = []
    for string in ARRAY_OF_MOSTLY_UTF8_STRINGS:
        if byte_string_valid_xml(string, 'utf-8'):
            processed_array.append(string)
        else:
            processed_array.append(guess_bytes_to_xml(string,
                encoding='utf-8'))
    output_xml(processed_array)
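
The check itself can be thought of as two tests: the bytes must decode in the target encoding, and the decoded text must contain no control characters. A minimal Python 3 sketch follows; the control character set is an assumption based on the XML 1.0 rules, and the function name is hypothetical.

```python
# Assumed set: characters below U+0020 that XML 1.0 does not allow.
CONTROL_CHARS = frozenset(
    chr(code) for code in list(range(0, 9)) + [11, 12] + list(range(14, 32))
)


def byte_string_valid_xml_sketch(byte_string, encoding="utf-8"):
    """Return True if the bytes decode cleanly and hold no control chars."""
    if not isinstance(byte_string, bytes):
        return False
    try:
        text = byte_string.decode(encoding)
    except UnicodeError:
        return False
    # isdisjoint() is True when no character of text is a control char.
    return CONTROL_CHARS.isdisjoint(text)
```

Note that an unknown encoding name raises ``LookupError`` rather than returning ``False`` in this sketch.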

byte_string_valid_encoding(byte_string, encoding='utf-8')

Detect if a byte :class:`str` is valid in a specific encoding

:arg byte_string: Byte :class:`str` to test for bytes not valid in this
    encoding
:kwarg encoding: encoding to test against.  Defaults to :term:`UTF-8`.
:returns: :data:`True` if every byte sequence is valid in the specified
    encoding.  :data:`False` if an invalid sequence is detected.

.. note::

    This function checks whether the byte :class:`str` is valid in the
    specified encoding.  It **does not** detect whether the byte
    :class:`str` actually was encoded in that encoding.  If you want that
    sort of functionality, you probably want to use
    :func:`~kitchen.text.misc.guess_encoding` instead.
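
The distinction drawn in the note, valid in an encoding versus actually encoded in it, shows up as soon as the same bytes are tested against two encodings. A small Python 3 sketch, with ``bytes`` standing in for the byte :class:`str` and a hypothetical function name:

```python
def byte_string_valid_encoding_sketch(byte_string, encoding="utf-8"):
    """Return True if the bytes decode without error in ``encoding``."""
    try:
        byte_string.decode(encoding)
    except UnicodeError:
        return False
    return True


utf8_bytes = "café".encode("utf-8")

# Valid as UTF-8, which is how the bytes were produced.
valid_as_utf8 = byte_string_valid_encoding_sketch(utf8_bytes, "utf-8")
# Also "valid" as latin-1, even though that is not the real encoding:
# every byte value decodes in latin-1, so this check cannot tell them apart.
valid_as_latin1 = byte_string_valid_encoding_sketch(utf8_bytes, "latin-1")
# A lone 0xE9 byte is latin-1 text but not well-formed UTF-8.
latin1_valid_as_utf8 = byte_string_valid_encoding_sketch(b"caf\xe9", "utf-8")
```

Both of the first two checks succeed, which is why a validity test alone cannot identify the original encoding.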

__all__ = ('byte_string_valid_encoding', 'byte_string_valid_xml',
        'guess_encoding', 'html_entities_unescape', 'process_control_chars',
        'str_eq')