Header value parser implementing various email-related RFC parsing rules.

The parsing methods defined in this module implement various email-related
parsing rules. Principal among them is RFC 5322, which is the follow-on
to RFC 2822 and primarily a clarification of the former. It also implements
RFC 2047 encoded word decoding.
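
As a rough illustration of the RFC 2047 decoding mentioned above, the sketch below feeds one encoded word to `get_unstructured`. It assumes the private stdlib module `email._header_value_parser` is importable under that name; it is not a public API and its behaviour may vary between Python versions. The sample header text is made up.

```python
# Sketch only: relies on a private stdlib module, not a documented API.
from email import _header_value_parser as parser

# "caf=C3=A9" is the "q" (quoted-printable style) encoding of the UTF-8
# bytes for "café"; the charset and encoding sit between the "?" marks.
tree = parser.get_unstructured("=?utf-8?q?caf=C3=A9?= au lait")

decoded = tree.value                              # decoded, semantic value
token_types = [tok.token_type for tok in tree]    # top-level token names

print(decoded)
print(token_types)
```

The encoded word shows up as its own token in the tree, so a caller can tell decoded text apart from text that was literal in the header.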

RFC 5322 goes to considerable trouble to maintain backward compatibility with
RFC 822 in the parse phase, while cleaning up the structure on the generation
phase. This parser supports correct RFC 5322 generation by tagging white space
as folding white space only when folding is allowed in the non-obsolete rule
sets. In fact, the parser is even more generous when accepting input than RFC
5322 mandates, following the spirit of Postel's Law, which RFC 5322 encourages.
Where possible, deviations from the standard are annotated on the 'defects'
attribute of tokens that deviate.

The general structure of the parser follows RFC 5322, and uses its terminology
where there is a direct correspondence. Where the implementation requires a
somewhat different structure than that used by the formal grammar, new terms
that mimic the closest existing terms are used. Thus, it really helps to have
a copy of RFC 5322 handy when studying this code.

Input to the parser is a string that has already been unfolded according to
RFC 5322 rules. According to the RFC this unfolding is the very first step, and
this parser leaves the unfolding step to a higher level message parser, which
will have already detected the line breaks that need unfolding while
determining the beginning and end of each header.
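
The unfolding step that this parser expects to have happened already can be sketched as follows. The helper name `unfold` and the sample header are hypothetical; this is a minimal stand-in for what a higher level message parser does, not the code that parser actually uses.

```python
import re


def unfold(raw_header: str) -> str:
    """Remove every line break that is immediately followed by a space or tab.

    RFC 5322 folding inserts CRLF before whitespace, so dropping the line
    break (and keeping the whitespace) restores the logical header line.
    """
    return re.sub(r"\r?\n(?=[ \t])", "", raw_header)


folded = "Subject: a rather long\r\n subject line"
unfolded = unfold(folded)
print(unfolded)
```

Only the unfolded string would then be handed to the `get_*` routines described in this module.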

The output of the parser is a TokenList object, which is a list subclass. A
TokenList is a recursive data structure. The terminal nodes of the structure
are Terminal objects, which are subclasses of str. These do not correspond
directly to terminal objects in the formal grammar, but are instead more
practical higher level combinations of true terminals.
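
To make the recursive shape concrete, here is a small sketch that walks a parse tree down to its Terminal leaves. It assumes the private stdlib module `email._header_value_parser`; the `leaves` helper is hypothetical and written only for this illustration.

```python
# Sketch only: relies on a private stdlib module, not a documented API.
from email import _header_value_parser as parser

tree, rest = parser.get_addr_spec("fred@example.com")


def leaves(node):
    """Yield the Terminal (str subclass) leaves of a parse tree, left to right."""
    if isinstance(node, parser.Terminal):
        yield node
    else:
        # Anything that is not a Terminal is a TokenList, i.e. a list of nodes.
        for child in node:
            yield from leaves(child)


terminal_texts = [str(leaf) for leaf in leaves(tree)]
print(type(tree).__name__, terminal_texts)
```

Because the inner nodes are ordinary lists and the leaves are ordinary strings, the usual list and str operations work on every level of the tree.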

All TokenList and Terminal objects have a 'value' attribute, which produces the
semantically meaningful value of that part of the parse subtree. The value of
all whitespace tokens (no matter how many sub-tokens they may contain) is a
single space, as per the RFC rules. This includes 'CFWS', which is herein
included in the general class of whitespace tokens. There is one exception to
the rule that whitespace tokens are collapsed into single spaces in values: in
the value of a 'bare-quoted-string' (a quoted-string with no leading or
trailing whitespace), any whitespace that appeared between the quotation marks
is preserved in the returned value. Note that in all Terminal strings quoted
pairs are turned into their unquoted values.

All TokenList and Terminal objects also have a string value, which attempts to
be a "canonical" representation of the RFC-compliant form of the substring that
produced the parsed subtree, including minimal use of quoted pair quoting.
Whitespace runs are not collapsed.
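
The difference between the 'value' attribute and the string value can be seen in a short sketch. It assumes the private stdlib module `email._header_value_parser`; the input strings are invented for the example.

```python
# Sketch only: relies on a private stdlib module, not a documented API.
from email import _header_value_parser as parser

tree = parser.get_unstructured("Hello \t  world")
semantic = tree.value    # each whitespace token counts as one space
canonical = str(tree)    # the whitespace run is kept as written

# The one exception: whitespace inside the quotation marks is preserved.
quoted, _ = parser.get_quoted_string('"a   b"')
quoted_value = quoted.value

print(repr(semantic), repr(canonical), repr(quoted_value))
```

A caller that wants the meaning of a header would read `value`, while code that re-serializes the header would start from `str()`.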

Comment tokens also have a 'content' attribute providing the string found
between the parens (including any nested comments) with whitespace preserved.

All TokenList and Terminal objects have a 'defects' attribute which is a
possibly empty list of all the defects found while creating the token. Defects
may appear on any token in the tree, and a composite list of all defects in the
subtree is available through the 'all_defects' attribute of any node. (For
Terminal nodes, x.defects == x.all_defects.)
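
The relationship between 'defects' and 'all_defects' can be sketched with an input that uses obsolete syntax. The example assumes the private stdlib module `email._header_value_parser` and the public `email.errors` module; the address is invented.

```python
# Sketch only: relies on a private stdlib module, not a documented API.
from email import _header_value_parser as parser
from email import errors

# A bare "." in a display name is only allowed by the obsolete grammar,
# so the parser accepts it and records a defect instead of failing.
name_addr, rest = parser.get_name_addr("Fred A. Johnson <fred@example.com>")

own = name_addr.defects          # defects recorded on this node itself
subtree = name_addr.all_defects  # defects from this node and everything below

for defect in subtree:
    print(type(defect).__name__, defect)
```

Here the defect is attached to a child token, so it is visible through `all_defects` on the parent even though the parent's own `defects` list is empty.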

Each object in a parse tree is called a 'token', and each has a 'token_type'
attribute that gives the name from the RFC 5322 grammar that it represents.
Not all RFC 5322 nodes are produced, and there is one non-RFC 5322 node that
may be produced: 'ptext'. A 'ptext' is a string of printable ASCII characters.
It is returned in place of lists of (ctext/quoted-pair) and
(qtext/quoted-pair).
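
One way to see which grammar names a parse actually produces is to print the 'token_type' of every node. This sketch assumes the private stdlib module `email._header_value_parser`; the `outline` helper is hypothetical.

```python
# Sketch only: relies on a private stdlib module, not a documented API.
from email import _header_value_parser as parser

mailbox, _ = parser.get_mailbox("Fred <fred@example.com>")


def outline(node, depth=0):
    """Return (depth, token_type) pairs for a node and all of its descendants."""
    rows = [(depth, node.token_type)]
    if isinstance(node, parser.TokenList):
        for child in node:
            rows.extend(outline(child, depth + 1))
    return rows


rows = outline(mailbox)
for depth, token_type in rows:
    print("  " * depth + token_type)
```

The indentation of the printed names mirrors the nesting of the RFC 5322 rules that matched.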

XXX: provide complete list of token types.

Token classes recoverable from the compiled module, with the 'token_type'
value each one sets:

    UnstructuredTokenList      unstructured
    Phrase                     phrase
    Word                       word
    CFWSList                   cfws
    Atom                       atom
    Token                      token
    EncodedWord                encoded-word
    QuotedString               quoted-string
    BareQuotedString           bare-quoted-string
    Comment                    comment
    AddressList                address-list
    Address                    address
    MailboxList                mailbox-list
    GroupList                  group-list
    Group                      group
    NameAddr                   name-addr
    AngleAddr                  angle-addr
    ObsRoute                   obs-route
    Mailbox                    mailbox
    InvalidMailbox             invalid-mailbox
    Domain                     domain
    DotAtom                    dot-atom
    DotAtomText                dot-atom-text
    AddrSpec                   addr-spec
    ObsLocalPart               obs-local-part
    DisplayName                display-name
    LocalPart                  local-part
    DomainLiteral              domain-literal
    MIMEVersion                mime-version
    Parameter                  parameter
    InvalidParameter           invalid-parameter
    Attribute                  attribute
    Section                    section
    Value                      value
    MimeParameters             mime-parameters
    ContentType                content-type
    ContentDisposition         content-disposition
    ContentTransferEncoding    content-transfer-encoding
    HeaderLabel                header-label
    Header                     header

The module also defines TokenList, WhiteSpaceTokenList,
ParameterizedHeaderValue, Terminal, WhiteSpaceTerminal, ValueTerminal and
EWWhiteSpaceTerminal.

TokenList.as_ew_allowed
    True if all top level tokens of this part may be RFC2047 encoded.

_validate_xtext()
    If input token contains ASCII non-printables, register a defect.

_get_ptext_to_endchars()
    Scan printables/quoted-pairs until endchars and return unquoted ptext.

    This function turns a run of qcontent, ccontent-without-comments, or
    dtext-with-quoted-printables into a single string by unquoting any
    quoted printables. It returns the string, the remaining value, and
    a flag that is True iff there were any quoted printables decoded.

get_fws()
    FWS = 1*WSP

    This isn't the RFC definition. We're using fws to represent tokens where
    folding can be done, but when we are parsing the *un*folding has already
    been done so we don't need to watch out for CRLF.

get_encoded_word()
    encoded-word = "=?" charset "?" encoding "?" encoded-text "?="

get_unstructured()
    unstructured = (*([FWS] vchar) *WSP) / obs-unstruct
    obs-unstruct = *((*LF *CR *(obs-utext *LF *CR)) / FWS)
    obs-utext = %d0 / obs-NO-WS-CTL / LF / CR

    obs-NO-WS-CTL is control characters except WSP/CR/LF.

    So, basically, we have printable runs, plus control characters or nulls in
    the obsolete syntax, separated by whitespace. Since RFC 2047 uses the
    obsolete syntax in its specification, but requires whitespace on either
    side of the encoded words, I can see no reason to need to separate the
    non-printable-non-whitespace from the printable runs if they occur, so we
    parse this into xtext tokens separated by WSP tokens.

    Because an 'unstructured' value must by definition constitute the entire
    value, this 'get' routine does not return a remaining value, only the
    parsed TokenList.

get_qp_ctext()
    ctext = <printable ascii except \ ( )>

    This is not the RFC ctext, since we are handling nested comments in comment
    and unquoting quoted-pairs here. We allow anything except the '()'
    characters, but if we find any ASCII other than the RFC defined printable
    ASCII, a NonPrintableDefect is added to the token's defects list. Since
    quoted pairs are converted to their unquoted values, what is returned is
    a 'ptext' token. In this case it is a WhiteSpaceTerminal, so its value
    is ' '.

get_qcontent()
    qcontent = qtext / quoted-pair

    We allow anything except the DQUOTE character, but if we find any ASCII
    other than the RFC defined printable ASCII, a NonPrintableDefect is
    added to the token's defects list. Any quoted pairs are converted to their
    unquoted values, so what is returned is a 'ptext' token. In this case it
    is a ValueTerminal.

get_atext()
    atext = <matches _atext_matcher>

    We allow any non-ATOM_ENDS in atext, but add an InvalidATextDefect to
    the token's defects list if we find non-atext characters.

get_bare_quoted_string()
    bare-quoted-string = DQUOTE *([FWS] qcontent) [FWS] DQUOTE

    A quoted-string without the leading or trailing white space. Its
    value is the text between the quote marks, with whitespace
    preserved and quoted pairs decoded.

get_comment()
    comment = "(" *([FWS] ccontent) [FWS] ")"
    ccontent = ctext / quoted-pair / comment

    We handle nested comments here, and quoted-pair in our qp-ctext routine.

get_cfws()
    CFWS = (1*([FWS] comment) [FWS]) / FWS

get_quoted_string()
    quoted-string = [CFWS] <bare-quoted-string> [CFWS]

    'bare-quoted-string' is an intermediate class defined by this
    parser and not by the RFC grammar. It is the quoted string
    without any attached CFWS.

get_atom()
    atom = [CFWS] 1*atext [CFWS]

    An atom could be an rfc2047 encoded word.

get_dot_atom_text()
    dot-atom-text = 1*atext *("." 1*atext)

get_dot_atom()
    dot-atom = [CFWS] dot-atom-text [CFWS]

    Any place we can have a dot atom, we could instead have an rfc2047 encoded
    word.

get_word()
    word = atom / quoted-string

    Either atom or quoted-string may start with CFWS. We have to peel off this
    CFWS first to determine which type of word to parse. Afterward we splice
    the leading CFWS, if any, into the parsed sub-token.

    If neither an atom nor a quoted-string is found before the next special, a
    HeaderParseError is raised.

    The token returned is either an Atom or a QuotedString, as appropriate.
    This means the 'word' level of the formal grammar is not represented in the
    parse tree; this is because having that extra layer when manipulating the
    parse tree is more confusing than it is helpful.

get_phrase()
    phrase = 1*word / obs-phrase
    obs-phrase = word *(word / "." / CFWS)

    This means a phrase can be a sequence of words, periods, and CFWS in any
    order as long as it starts with at least one word. If anything other than
    words is detected, an ObsoleteHeaderDefect is added to the token's defect
    list. We also accept a phrase that starts with CFWS followed by a dot;
    this is registered as an InvalidHeaderDefect, since it is not supported by
    even the obsolete grammar.

get_local_part()
    local-part = dot-atom / quoted-string / obs-local-part

get_obs_local_part()
    obs-local-part = word *("." word)

get_dtext()
    dtext = <printable ascii except \ [ ]> / obs-dtext
    obs-dtext = obs-NO-WS-CTL / quoted-pair

    We allow anything except the excluded characters, but if we find any
    ASCII other than the RFC defined printable ASCII, a NonPrintableDefect is
    added to the token's defects list. Quoted pairs are converted to their
    unquoted values, so what is returned is a ptext token, in this case a
    ValueTerminal. If there were quoted-printables, an ObsoleteHeaderDefect is
    added to the returned token's defect list.

get_domain_literal()
    domain-literal = [CFWS] "[" *([FWS] dtext) [FWS] "]" [CFWS]

get_domain()
    domain = dot-atom / domain-literal / obs-domain
    obs-domain = atom *("." atom)

get_addr_spec()
    addr-spec = local-part "@" domain

get_obs_route()
    obs-route = obs-domain-list ":"
    obs-domain-list = *(CFWS / ",") "@" domain *("," [CFWS] ["@" domain])

    Returns an obs-route token with the appropriate sub-tokens (that is,
    there is no obs-domain-list in the parse tree).
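
The `get_*` routines other than `get_unstructured` hand back both the parsed token and whatever input they did not consume. A minimal sketch with `get_addr_spec`, assuming the private stdlib module `email._header_value_parser` and an invented input string:

```python
# Sketch only: relies on a private stdlib module, not a documented API.
from email import _header_value_parser as parser

# Parsing stops at the first character that cannot belong to the addr-spec;
# the unconsumed tail comes back as the second element of the result.
token, remainder = parser.get_addr_spec("fred@example.com> and more")

print(token.local_part, token.domain, token.addr_spec)
print(repr(remainder))
```

This return shape is what lets a higher rule such as angle-addr call `get_addr_spec` and then continue with the closing ">" itself.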

get_angle_addr()
    angle-addr = [CFWS] "<" addr-spec ">" [CFWS] / obs-angle-addr
    obs-angle-addr = [CFWS] "<" obs-route addr-spec ">" [CFWS]

get_display_name()
    display-name = phrase

    Because this is simply a name-rule, we don't return a display-name
    token containing a phrase, but rather a display-name token with
    the content of the phrase.

get_name_addr()
    name-addr = [display-name] angle-addr

get_mailbox()
    mailbox = name-addr / addr-spec

get_invalid_mailbox()
    Read everything up to one of the chars in endchars.

    This is outside the formal grammar. The InvalidMailbox TokenList that is
    returned acts like a Mailbox, but the data attributes are None.

get_mailbox_list()
    mailbox-list = (mailbox *("," mailbox)) / obs-mbox-list
    obs-mbox-list = *([CFWS] ",") mailbox *("," [mailbox / CFWS])

    For this routine we go outside the formal grammar in order to improve error
    handling. We recognize the end of the mailbox list only at the end of the
    value or at a ';' (the group terminator). This is so that we can turn
    invalid mailboxes into InvalidMailbox tokens and continue parsing any
    remaining valid mailboxes. We also allow all mailbox entries to be null,
    and this condition is handled appropriately at a higher level.

get_group_list()
group-list = mailbox-list / CFWS / obs-group-list
obs-group-list = 1*([CFWS] ",") [CFWS]
zend of header before group-listNrzend of header in
group-listrzgroup-list with empty
entries)rerrRrr�r�r�rrjr^r*r�)rZ
group_listr�r,rrr�get_group_list s8
rcCs"t�}t|�\}}|s$|ddkr4tjdj|���|j|�|jtdd��|dd�}|r�|ddkr�|jtdd��||dd�fSt|�\}}|j|�|s�|jjtj d ��n|ddkr�tjd
j|���|jtdd��|dd�}|�r|dt
k�rt|�\}}|j|�||fS)z7 group = display-name
":" [group-list] ";" [CFWS]
rrz8expected ':' at end of group display name but found
'{}'zgroup-display-name-terminatorriNrzgroup-terminatorzend
of header in groupz)expected ';' at end of group but found
{})rfr
rr�r!rRr�rrr�r�r�)rr`r,rrr� get_groupEs2
rcCsxt�}yt|�\}}WnNtjk
rdyt|�\}}Wn&tjk
r^tjdj|���YnXYnX|j|�||fS)a� address =
mailbox / group
Note that counter-intuitively, an address can be either a single address or
a list of addresses (a group). This is why the returned Address object has
a 'mailboxes' attribute which treats a single address as a list of length
one. When you need to differentiate between the two cases, extract the single
element, which is either a mailbox or a group token.
zexpected address but found
'{}')r_rrr�rr!rR)rr[r,rrr�get_addresscs
rcCs�t�}�x�|�r�yt|�\}}|j|�W�n$tjk
�rP}�zd}|dtkr�t|�\}}|sr|ddkr�|j|�|jjtjd��nFt |d�\}}|dk r�|g|dd�<|jt
|g��|jjtjd��nh|ddkr�|jjtjd��nHt |d�\}}|dk �r|g|dd�<|jt
|g��|jjtjd��WYdd}~XnX|�r�|ddk�r�|d
d}d|_t |d�\}}|j
|�|jjtjd��|r
|jtdd ��|dd�}q
W||fS)a� address_list = (address *("," address)) /
obs-addr-list
obs-addr-list = *([CFWS] ",") address *(","
[address / CFWS])
We depart from the formal grammar here by continuing to parse until the end
of the input, assuming the input to be entirely composed of an
address-list. This is always true in email parsing, and allows us
to skip invalid addresses to parse additional valid ones.
Nrr�z"address-list entry with no contentzinvalid address in
address-listzempty element in
address-listrizinvalid-mailboxzlist-separatorrk)rZrrRrr�r�r�rr�rr_r�r6r*r�)rZaddress_listr,�errr�rbrrr�get_address_list�sN
rcCs�t�}|s
|jjtjd��|S|dtkrXt|�\}}|j|�|sX|jjtjd��d}x8|r�|ddkr�|dtkr�||d7}|dd�}q^W|j�s�|jjtjdj |���|jt
|d ��nt|�|_|jt
|d
��|o�|dtk�r
t|�\}}|j|�|�s
|ddk�rX|jdk �r>|jjtjd��|�rT|jt
|d ��|S|jt
dd��|dd�}|�r�|dtk�r�t|�\}}|j|�|�s�|jdk �r�|jjtjd��|Sd}x2|�r�|dtk�r�||d7}|dd�}�q�W|j��s2|jjtjd
j |���|jt
|d ��nt|�|_
|jt
|d
��|�rv|dtk�rvt|�\}}|j|�|�r�|jjtjd��|jt
|d ��|S)zE mime-version = [CFWS] 1*digit [CFWS] "."
[CFWS] 1*digit [CFWS]
z%Missing MIME version number (eg: 1.0)rz0Expected MIME version number
but found only CFWSrr riNz1Expected MIME major version number but found
{!r}r��digitsz0Incomplete MIME version; found only major
numberzversion-separatorz1Expected MIME minor version number but found
{!r}z'Excess non-CFWS text after MIME
version)r�rrRr�HeaderMissingRequiredValuer�r��isdigitr�r!r��intr�r�)rZmime_versionr,rrrr�parse_mime_version�sv
rcCsht�}xX|r^|ddkr^|dtkrF|jt|dd��|dd�}qt|�\}}|j|�qW||fS)z�
Read everything up to the next ';'.
This is outside the formal grammar. The InvalidParameter TokenList that is
returned acts like a Parameter, but the data attributes are None.
rrzmisplaced-specialriN)r�r�rRr�r�)rZinvalid_parameterr,rrr�get_invalid_parametersrcCsNt|�}|stjdj|���|j�}|t|�d�}t|d�}t|�||fS)a8ttext
= <matches _ttext_matcher>
We allow any non-TOKEN_ENDS in ttext, but add defects to the token's
defects list if we find non-ttext characters. We also register defects for
*any* non-printables even though the RFC doesn't exclude all of them,
because we follow the spirit of RFC 5322.
zexpected ttext but found
'{}'N�ttext)�_non_token_end_matcherrr�r!r`rjr�r�)rr�rrrr� get_ttexts
r!cCs�t�}|r,|dtkr,t|�\}}|j|�|rL|dtkrLtjdj|���t|�\}}|j|�|r�|dtkr�t|�\}}|j|�||fS)z�token
= [CFWS] 1*ttext [CFWS]
The RFC equivalent of ttext is any US-ASCII chars except space, ctls, or
tspecials. We also exclude tabs even though the RFC doesn't.
The RFC implies the CFWS but is not explicit about it in the BNF.
rzexpected token but found '{}') rLr�r�rR�
TOKEN_ENDSrr�r!r!)rZmtokenr,rrr� get_token's
r#cCsNt|�}|stjdj|���|j�}|t|�d�}t|d�}t|�||fS)aQattrtext
= 1*(any non-ATTRIBUTE_ENDS character)
We allow any non-ATTRIBUTE_ENDS in attrtext, but add defects to the
token's defects list if we find non-attrtext characters. We also register
defects for *any* non-printables even though the RFC doesn't exclude all of
them, because we follow the spirit of RFC 5322.
z expected attrtext but found
{!r}Nr�)�_non_attribute_end_matcherrr�r!r`rjr�r�)rr�r�rrr�get_attrtext>s
r%cCs�t�}|r,|dtkr,t|�\}}|j|�|rL|dtkrLtjdj|���t|�\}}|j|�|r�|dtkr�t|�\}}|j|�||fS)aH
[CFWS] 1*attrtext [CFWS]
This version of the BNF makes the CFWS explicit, and as usual we use a
value terminal for the actual run of characters. The RFC equivalent of
attrtext is the token characters, with the subtraction of
'*', "'", and '%'.
We include tab in the excluded set just as we do for token.
rzexpected token but found
'{}') r�r�r�rR�ATTRIBUTE_ENDSrr�r!r%)rr�r,rrr�
get_attributeQs
r'cCsNt|�}|stjdj|���|j�}|t|�d�}t|d�}t|�||fS)z�attrtext
= 1*(any non-ATTRIBUTE_ENDS character plus '%')
This is a special parsing routine so that we get a value that
includes % escapes as a single string (which we decode as a single
string later).
z)expected extended attrtext but found
{!r}Nzextended-attrtext)�#_non_extended_attribute_end_matcherrr�r!r`rjr�r�)rr�r�rrr�get_extended_attrtexths
r)cCs�t�}|r,|dtkr,t|�\}}|j|�|rL|dtkrLtjdj|���t|�\}}|j|�|r�|dtkr�t|�\}}|j|�||fS)z�
[CFWS] 1*extended_attrtext [CFWS]
This is like the non-extended version except we allow % characters, so that
we can pick up an encoded value as a single string.
rzexpected token but found
'{}') r�r�r�rR�EXTENDED_ATTRIBUTE_ENDSrr�r!r))rr�r,rrr�get_extended_attributezs
r+cCs�t�}|s|ddkr(tjdj|���|jtdd��|dd�}|sX|dj�rhtjdj|���d}x,|r�|dj�r�||d7}|dd�}qnW|dd kr�|d kr�|jjtjd
��t |�|_
|jt|d��||fS)a6 '*' digits
The formal BNF is more complicated because leading 0s are not allowed. We
check for that and add a defect. We also assume no CFWS is allowed between
the '*' and the digits, though the RFC is not crystal clear on that.
The caller should already have dealt with leading CFWS.
r�*zExpected section but found {}zsection-markerriNz$Expected
section number but found {}r�0z'section number has an invalid
leading
0r)r�rr�r!rRr�rrZInvalidHeaderErrorrr�)rr�rrrr�get_section�s&
r.cCs�t�}|stjd��d}|dtkr0t|�\}}|sDtjdj|���|ddkr^t|�\}}nt|�\}}|dk r�|g|dd�<|j|�||fS)z
quoted-string / attribute
z&Expected value but found end of stringNrz Expected value but
found only
{}r) r�rr�r�r�r!r�r+rR)r�vr�r,rrr� get_value�s
r0cCs�t�}t|�\}}|j|�|s.|ddkrN|jjtjdj|���||fS|ddkr�y
t|�\}}d|_|j|�Wntj k
r�YnX|s�tj d��|ddkr�|jt
dd��|dd �}d|_|dd
kr�tj d��|jt
d
d��|dd �}d }|�r.|dtk�r.t
|�\}}|j|�d }|}|j�rH|�rH|dd
k�rHt|�\}}|j}d}|jdk�r�|�r�|ddk�r�d}n$t|�\}} | �r�| ddk�r�d}n(yt|�\}} WnYnX| �s�d}|�r2|jjtjd��|j|�x,|D]$}
|
jdk�rg|
d d �<|
}P�qW|}nd }|jjtjd��|�rb|ddk�rbd }nt|�\}}|j�s�|jdk�r�|�s�|ddk�r�|j|�|d k �r�|�s�t|��|}||fS|jjtjd��|�s|jjtjd��|j|�|d k�r||fSn�|d k �rVx|D]}
|
jdk�r"P�q"W|
jdk|j|
�|
j|_|ddk�rttj dj|���|jt
dd��|dd �}|�r�|ddk�r�t|�\}}|j|�|j|_|�s�|ddk�r�tj dj|���|jt
dd��|dd �}|d k �rZt�}x>|�rR|dtk�r8t|�\}}nt|�\}}|j|��qW|}nt|�\}}|j|�|d k �r�|�s�t|��|}||fS)aY
attribute [section] ["*"] [CFWS] "=" value
The CFWS is implied by the RFC but not made explicit in the BNF. This
simplified form of the BNF from the RFC is made to conform with the RFC BNF
through some extra checks. We do it this way because it makes both error
recovery and working with the resulting parse tree easier.
rrz)Parameter contains name ({}) but no valuer,TzIncomplete
parameterzextended-parameter-markerriN�=zParameter not followed by
'='zparameter-separatorrF�'z5Quoted string value for
extended parameter is invalidzbare-quoted-stringzZParameter marked as
extended but appears to have a quoted string value that is
non-encodedzcApparent initial-extended-value but attribute was not marked
as extended or was not initial sectionz(Missing required charset/lang
delimiterszextended-attrtextr�z=Expected RFC2231 char/lang encoding
delimiter, but found {!r}zRFC2231-delimiterz;Expected RFC2231 char/lang
encoding delimiter, but found
{})r�r'rRrrr�r!r.r�r�r�r�r�r�r�rUr�r%r)r6r0�AssertionErrorrrOrPr�r�r�r�)rr�r,r�r�ZappendtoZqstringZinner_valueZ
semi_validr��tr/rrr�
get_parameter�s�
r5cCsht�}�xZ|�rbyt|�\}}|j|�Wn�tjk
r�}z�d}|dtkrZt|�\}}|sl|j|�|S|ddkr�|dk r�|j|�|jjtjd��n@t |�\}}|r�|g|dd�<|j|�|jjtjdj
|���WYdd}~XnX|�r@|ddk�r@|d
}d|_t |�\}}|j|�|jjtjdj
|���|r
|jt
dd ��|dd�}q
W|S)a! parameter *( ";" parameter )
That BNF is meant to indicate this routine should only be called after
finding and handling the leading ';'. There is no corresponding rule in
the formal RFC grammar, but it is more convenient for us for the set of
parameters to be treated as its own TokenList.
This is a 'parse' routine because it consumes the remaining value, but it
would never be called to parse a full header. Instead it is called to
parse everything after the non-parameter value of a specific MIME header.
Nrrzparameter entry with no contentzinvalid parameter
{!r}rizinvalid-parameterz)parameter with invalid trailing text
{!r}zparameter-separatorrk)r�r5rRrr�r�r�rr�rr!r6r*r�)rZmime_parametersr,rr�r�rrr�parse_mime_parametersO sD
r6cCs�xX|rX|ddkrX|dtkr@|jt|dd��|dd�}qt|�\}}|j|�qW|sbdS|jtdd��|jt|dd���dS)zBDo
our best to find the parameters in an invalid MIME header
rrzmisplaced-specialriNzparameter-separator)r�rRr�r�r6)Z tokenlistrr,rrr�_find_mime_parameters� sr7cCs�t�}d}|s$|jjtjd��|Syt|�\}}Wn8tjk
rl|jjtjdj|���t ||�|SX|j|�|s�|ddkr�|jjtjd��|r�t ||�|S|j
j�j�|_
|jtdd��|dd �}yt|�\}}Wn:tjk
�r$|jjtjd
j|���t ||�|SX|j|�|j
j�j�|_|�sJ|S|ddk�r�|jjtjdj|���|`
|`t ||�|S|jtdd
��|jt|dd ���|S)z�
maintype "/" subtype *( ";" parameter )
The maintype and subtype are tokens. Theoretically they could
be checked against the official IANA list + x-token, but we
don't do that.
Fz"Missing content type specificationz(Expected content maintype
but found {!r}rr�zInvalid content
typezcontent-type-separatorriNz'Expected content subtype but found
{!r}rz<Only parameters are valid after content type, but found
{!r}zparameter-separator)r�rrRrrr#r�r�r!r7rr��lowerr�r�r�r6)rZctypeZrecoverr,rrr�parse_content_type_header� sX
r9c
Cs�t�}|s |jjtjd��|Syt|�\}}Wn8tjk
rh|jjtjdj|���t ||�|SX|j|�|j
j�j�|_
|s�|S|ddkr�|jjtjdj|���t ||�|S|jtdd��|jt|dd���|S) z*
disposition-type *( ";" parameter )
zMissing content dispositionz+Expected content disposition but found
{!r}rrzCOnly parameters are valid after content disposition, but found
{!r}zparameter-separatorriN)r�rrRrrr#r�r�r!r7rr�r8r�r�r6)rZdisp_headerr,rrr�
parse_content_disposition_header� s2
r:cCs�t�}|s |jjtjd��|Syt|�\}}Wn.tjk
r^|jjtjdj|���YnX|j|�|j j
�j�|_|s�|Sx^|r�|jjtjd��|dt
kr�|jt|dd��|dd�}q�t|�\}}|j|�q�W|S)z
mechanism
z!Missing content transfer encodingz1Expected content transfer encoding
but found {!r}z*Extra text after content transfer
encodingrzmisplaced-specialriN)r�rrRrrr#r�r�r!rr�r8rNr�r�r�)rZ
cte_headerr,rrr�&parse_content_transfer_encoding_header� s.
r;cCsDd}|r@|dr@|ddtkr@|dd}|ddd �|d
<|S)Nrrirkrkrkrkrkrkrkrk)r�)�linesZwsprrr�_steal_trailing_WSP_if_exists
s
r=cCs�|jptd�}|jrdnd}dg}d}d}d}tdd�}t|�} �xH| �r�| jd�}
|
|krf|d 8}qDt|
�}y|j|�|}Wn6tk
r�t d
d�|
j
D��r�d}nd}d
}YnX|
jdkr�t|
|||�qD|o�|�r�|
j
�sTd}d}|
j�rT|
j|d�dd�}
|j|
k�rTt|
�|t|d�k�rBt|�}|j|�|d|
7<qDt|
d��snt|
�| } nt|||||
j|�}d}qDt|�|t|d�k�r�|d|7<qD|
j�r�t|�d |k�r�t|�}|�s�|
j��r�|j||�qDt|
d��s.t|
�}|
j
�s$|d 7}|j|�|| } qD|
j
�rP|�rP| jd|
�d
}qDt|�}|�sh|
j��rx|j||�qD|d|7<qDW|jj|�|jS)zLReturn
string of contents of parse_tree folded according to RFC rules.
z+infzutf-8zus-asciirNrF�wrap_as_ew_blockedricss|]}t|tj�VqdS)N)r�rr�)rrrrrr7
sz%_refold_parse_tree.<locals>.<genexpr>zunknown-8bitTzmime-parameters)r-r�rkrkrkrkrkrk)Zmax_line_length�float�utf8r�r�r�r
r�r�rr$r6�_fold_mime_parametersr'r:r/�lineseprjr=rRr7�_fold_as_ewr;r&�insertr)Z
parse_treer-�maxlenr�r<�last_ewr>Z
want_encodingZend_ew_not_allowedr�r(�tstrrOZencoded_part�newlineZnewpartsrrrr.
s�
r.cCs�|dk r<|r<tt|d
|d�|��}|dd|�|d<|dtkr�|d}|dd�}t|d
�|krz|jt|��|d|7<d}|dtkr�|d}|dd�}|dkr�t|d�n|}x�|�r�|t|d�} |dkr�dn|}
| t|
�d}|dk�r|jd�q�|d|�}tj||
d �}
t|
�| }|dk�r\|d|�}tj|�}
|d|
7<|t|�d�}|r�|jd�t|d�}q�W|d|7<|�r�|SdS)a�Fold
string to_encode into lines as encoded word, combining if allowed.
Return the new value for last_ew, or None if ew_combine_allowed is False.
If there is already an encoded word in the last line of lines (indicated by
a non-None value for last_ew) and ew_combine_allowed is true, decode the
existing ew, combine it with to_encode, and re-encode. Otherwise, encode
to_encode. In either case, split to_encode as necessary so that the
encoded segments fit within maxlen.
Nrirrzus-asciizutf-8�r?)rOrkrkrkrkrkrkrkrkrkrkrkrkrk)r
r�r�rjrRr=r�r�)Z to_encoder<rErFr;rOZleading_wspZtrailing_wspZnew_last_ewZremaining_spaceZ encode_asZ
text_spaceZ
first_partr�ZexcessrrrrC�
sF
rCcCs��x�|jD�]�\}}|dj�jd�s6|dd7<|}d}y|j|�d}Wn0tk
r�d}tj|�rxd}d}nd}YnX|r�tjj |d |d
�} dj
||| �}
ndj
|t|��}
t|d�t|
�d|kr�|dd
|
|d<q
n"t|
�d|k�r|j
d
|
�q
d}|d}x�|�r�t|�tt|��dt|�}
||
dk�rTd}||
d}}x<|d|�}tjj |d |d
�} t| �|k�r�P|d8}�qfW|j
dj
|||| ��d }|d7}||d�}|�r|dd7<�qWq
WdS)a>Fold TokenList 'part' into the 'lines' list
as mime parameters.
Using the decoded list of parameters and values, format them according to
the RFC rules, including using RFC2231 encoding if the value cannot be
expressed in 'encoding' and/or the parameter+value is too long to fit
within 'maxlen'.
rir�strictFTzunknown-8bitr�zutf-8r)Zsaferz
{}*={}''{}z{}={}r?rgrz''rz�NNz
{}*{}*={}{}rkrkrkrkrkrk)r�r{r�r�r�rr�r�r�rXr!rrjrRr
)r(r<rEr�r�rrOZ
error_handlerZencoding_requiredZ
encoded_valuerGr�Zextra_chromeZ
chrome_lenZ
splitpointZmaxchars�partialrrrrA�
s\
rA)��__doc__�rer��stringr�collectionsr�operatorrZemailrr�rrr~r�r�r�r�rr�Z TSPECIALSr"Z ASPECIALSr&r*rr�rr>rCrErGrHrJrLrMrQrVrWrZr_rcrerfrhrprrrsrtrurwrxryr�r�r�r�r�r�r�r�r�r�r�r�r�r�r�r�r�r
r�r�r�r�r�rr�compiler!rrvr�r�matchr��findallr�r
r$r(r�r�r�r�r�r�r�r�r�r�r�r�r�r�r�r�r�r�r�r�rrrrr rr
rrrrrrrrrrr!r#r%r'r)r+r.r0r5r6r7r9r:r;r=r.rCrArrrr�<module>DsC"
!($
V +
*8"
&'/'&).9%>D49/c7