I think the unicode versions of those attrs should be separate, and I'd
like to suggest the names upath_info and uscript_name (plus an alias,
ubody, for unicode_body). My experience with non-ASCII URIs and form
data made me stick to ASCII and, where that's not possible, UTF-8.
Still, links from other websites to non-ASCII URIs sometimes make the
user agent send the request in some other encoding. So I try to keep
script_name / path_info in ASCII and use POST for forms. Google seems
to use an additional field in its search form to specify what encoding
the form used (ie=..., probably meaning "input encoding"; oe=... is
for the encoding of the returned page), and I think they only use
ASCII for the path component.
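
To illustrate the "UTF-8, but sometimes some other encoding" problem,
here is a minimal sketch in plain Python 3 (standard library only, not
webob's API). The function name decode_path and the latin-1 fallback
are assumptions made for the example; a real application would have to
pick its own fallback, and the fallback result is only a guess.

```python
from urllib.parse import unquote_to_bytes


def decode_path(raw_path, fallback="latin-1"):
    """Percent-decode a URL path, then decode the bytes as text.

    UTF-8 is tried first (the encoding RFC 3986 recommends). If the
    bytes are not valid UTF-8, the assumed `fallback` encoding is used.
    Returns a (text, encoding_used) tuple.
    """
    raw_bytes = unquote_to_bytes(raw_path)
    try:
        return raw_bytes.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # latin-1 maps every byte to a code point, so this cannot fail,
        # but it is only a guess at what the sender meant.
        return raw_bytes.decode(fallback), fallback


print(decode_path("/b%C3%AAte"))  # path percent-encoded from UTF-8 bytes
print(decode_path("/b%EAte"))     # same path percent-encoded from latin-1
```

Both calls give the same text but report different encodings, which is
why I'd rather keep the raw path_info and a separate decoded attribute
than silently replace one with the other.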
On 2009-07-24, Ian Bicking <ianbicking@???> wrote:
>
> On Thu, Jul 23, 2009 at 2:37 PM, Jim Fulton <jim@???> wrote:
> >
> > On Thu, Jul 23, 2009 at 5:06 PM, Ian Bicking <ianbicking@???> wrote:
> > > On Thu, Jul 23, 2009 at 4:06 AM, Jim Fulton <jim@???> wrote:
> > >>
> > >> webob doesn't convert URLs to Unicode. RFC3986 specifies that UCS
> > >> characters should be encoded in URIs using a UTF-8 encoding followed
> > >> by a URL encoding, so the reverse decoding is straightforward.
> > >> Reading the RFC, I can see how the decision of whether to interpret
> > >> URLs (or URL path segments) as encoded UCS characters might be
> > >> application specific.
> > >>
> > >> My question is whether it was a design decision to leave URLs
> > >> un-decoded, and, if so, what the rationale is. I'm not necessarily
> > >> disagreeing with such a decision. :)
> > >
> > > I have intended to decode them, specifically req.path_info and
> > > req.script_name, using the same encoding as req.GET etc. (req.charset).
> >
> > That would be inconsistent with RFC3986, which specifies utf-8.
>
>
> I guess it really depends on What The World Actually Does, and I'm not
> sure in this case. For instance, QUERY_STRING is encoded with the
> page encoding I'm pretty sure, so then presumably it could be
> /UTF8-urlencoded-data?latin1-urlencoded-data -- which of course may
> actually be the case (after all, the browser doesn't generate the
> path). Also, what happens when you have <a href="/bête"> or something
> in a page? The browser encodes unsafe characters in these cases.
> So... I'm hoping someone who has experience with the more challenging
> situations with encodings could say what happens.
>
>
> --
>
> Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker
--
Best Regards,
Sergey Schetinin
http://s3bk.com/ -- S3 Backup
http://word-to-html.com/ -- Word to HTML Converter