diff --git a/CHANGES.rst b/CHANGES.rst index 9dc622e04b..51703a4d21 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -22,6 +22,10 @@ Features registration of a ``ZPublisher.interfaces.IXmlrpcChecker`` utility (`#620 `_). +- Document request parameter handling + (`#636 `_). + + Fixes +++++ diff --git a/docs/zdgbook/ObjectPublishing.rst b/docs/zdgbook/ObjectPublishing.rst index 24803de248..6d141ea4bd 100644 --- a/docs/zdgbook/ObjectPublishing.rst +++ b/docs/zdgbook/ObjectPublishing.rst @@ -626,19 +626,54 @@ marshals arguments. Marshalling Arguments from the Request -------------------------------------- -The publisher marshals form data from GET and POST requests. Simple -form fields are made available as Python strings. Multiple fields -such as form check boxes and multiple selection lists become sequences -of strings. File upload fields are represented with **FileUpload** -objects. **FileUpload** objects behave like normal Python file objects -and additionally have a **filename** attribute which is the name of the -file and a **headers** attribute which is a dictionary of file upload -headers. +Zope responds to requests. A request is specified by a URL, request headers +and an optional request body. A URL consists of +various parts, among them a *path* and a *query*; see +`RFC 2396 `_ for details. + +Zope uses the *path* to locate an object, method or view for +producing the response (this process is called *traversal*) +and the *query* (if present) as a specification for +request parameters. Additionally, request parameters can come from +the optional request body. + +Zope preprocesses the incoming request information and makes +the result available in the so-called *request* object. +This way, the response generation code can access all relevant request +information in an easy and natural (pythonic) way. +Preprocessing transforms the request *parameters* into request (or form) +*variables*.
+They are made available via the request object's ``form`` attribute +(a ``dict``) or directly via the request object itself, as long as they are +not hidden by other request information. + +The request parameters coming from the *query* have the form +*name*\ ``=``\ *value* and are separated by ``&``; +request parameters from a request body can have different forms +and can be separated in different ways depending on the +request ``Content-Type``, but they, too, have a *name* and a *value*. + +All request parameter names and values are strings. +A parameter value, however, often designates a value of a specific type, +e.g. an integer or a datetime. The response generation code can +be simplified significantly when it does not need to perform the +type conversion itself. In addition, in some cases the request parameters +are not independent of one another but related. In those +cases it can help if the related parameters +are aggregated into a single object. Zope supports both cases, but it needs +directives to guide the process. It uses *name* suffixes of the form +``:``\ *directive* to specify such directives. For example, +the parameter ``i:int=1`` tells Zope to convert the value ``'1'`` to an +integer and use it as the value of request variable ``i``; the parameter sequence +``x.name:record=Peter&x.age:int:record=10`` tells Zope to construct +a record ``x`` with attributes ``name`` and ``age`` and respective values +``'Peter'`` and ``10``. The publisher also marshals arguments from CGI environment variables and cookies. When locating arguments, the publisher first looks in -CGI environment variables, then other request variables, then form -data, and finally cookies. Once a variable is found, no further +other (i.e. explicitly set or special) request variables, +then CGI environment variables, then form +variables, and finally cookies. Once a variable is found, no further searching is done.
So for example, if your published object expects to be called with a form variable named ``SERVER_URL``, it will fail, since this argument will be marshalled from the CGI environment first, @@ -652,7 +687,7 @@ Unfortunately, there is no current documentation for those variables. Argument Conversion -------------------- +~~~~~~~~~~~~~~~~~~~ The publisher supports argument conversion. For example consider this function:: @@ -668,8 +703,14 @@ conversion you name your form variables with a colon followed by a type conversion code. For example, to call the above function with 66 as the argument you -can use this URL ``one_third?number:int=66`` The publisher supports -many converters: +can use this URL ``one_third?number:int=66``. + +Some converters employ special logic for the conversion. +For example, both ``tokens`` and ``lines`` convert to +a list of strings, but ``tokens`` splits the input at whitespace and ``lines`` +at newlines. + +The publisher supports many converters: - **boolean** -- Converts a variable to ``True`` or ``False``. Variables that are 0, None, an empty string, or an empty sequence @@ -695,9 +736,6 @@ many converters: - **required** -- Raises an exception if the variable is not present or is an empty string. -- **ignore_empty** -- Excludes a variable from the request if the - variable is an empty string. - - **date** -- Converts a string to a **DateTime** object. The formats accepted are fairly flexible, for example ``10/16/2000``, ``12:01:13 pm``. @@ -706,18 +744,12 @@ many converters: but especially treats ambiguous dates as "days before month before year". This useful if you need to parse non-US dates. -- **list** -- Converts a variable to a Python list of values, even if - there is only one value. - -- **tuple** -- Converts a variable to a Python tuple of values, even if - there is only one value. - - **lines** -- Converts a variable to a Python list of native strings by splitting the string on line breaks.
Also converts list/tuple of variables to list/tuple of native strings. - **tokens** -- Converts a variable to a Python list of native strings - by splitting the variable on spaces. + by splitting the variable on whitespace. - **text** -- Converts a variable to a native string with normalized line breaks. Different browsers on various platforms encode line @@ -727,7 +759,10 @@ many converters: - **ulines**, **utokens**, **utext** -- like **lines**, **tokens**, **text**, but always converts into unicode strings. -If the publisher cannot coerce a request variable into the type +The full list of supported converters can be found +in ``ZPublisher.Converters.type_converters``. + +If the publisher cannot coerce a request parameter into the type required by the type converter it will raise an error. This is useful for simple applications, but restricts your ability to tailor error messages. If you wish to provide your own error messages, you should @@ -746,100 +781,49 @@ could create a list of integers like so:: -In addition to the mentioned type converters, the publisher also supports -both method and record arguments and specifying character encodings. - - -Character Encodings for Arguments ---------------------------------- - -The publisher needs to know what character encoding was used by the -browser to encode the submitted form fields. In the past, this could -have been a complicated topic. - -Nowadays, as UTF-8 is the de facto standard for encoding on the -Internet, things are much simpler. -**Best practice** +Aggregators +~~~~~~~~~~~ -If you are using Python 3 and set the the ``charset`` meta tag to -``utf-8``, the publisher takes ``utf-8`` as the default encoding, and -thus you do not have to set it manually. +An aggregator directive tells Zope how to process parameters with the same or +a similar name. +Zope supports the following aggregators: -.. note:: +- **list** -- collect all values with this name into a list. 
+ If there are two or more parameters with the same name + they are collected into a list by default. + The ``list`` aggregator is mainly used to ensure that + the parameter leads to a list value even when + there is only one of them. - Further information on how to set the charset: +- **tuple** -- collect all values with this name into a tuple. - https://developer.mozilla.org/de/docs/Web/HTML/Element/meta#attr-charset +- **default** -- use the value of this parameter as a default value; it + can be overridden by a parameter of the same name without + the ``default`` directive. +- **record** -- this directive assumes that the parameter name starts + with *var*\ ``.``\ *attr*. + It tells Zope to create a request variable *var* of type record + (more precisely, a ``ZPublisher.HTTPRequest.record`` instance) and + set its attribute *attr* to the parameter value. + If such a request variable already exists, + then only its attribute *attr* is updated. -.. attention:: +- **records** -- this directive is similar to ``record``. However, *var* + receives a list of records as its value rather than a single record. + Zope starts a new record (and appends it to the list) + when the current request parameter would override an attribute + in the last record of the list constructed so far (or this list + is empty). - The encoding also can be set by the web server, which would take - precedence over the meta tag. +- **ignore_empty** -- this directive causes Zope to ignore the parameter + if its value is empty. -**Special cases** -If you are still on Python 2 or your pages use a different encoding, -such as ``Windows-1252`` or ``ISO-8859-1``, which was the default -encoding for HTML 4, you have to add the encoding, eg ``:cp1252``, for -all argument type converts, such as follows:: - - - - - -.
note:: - - For a full list of supported encodings, please have a look at: - - https://docs.python.org/3.7/library/codecs.html#standard-encodings - -If your pages all use a character encoding which has ASCII as a subset, -such as Latin-1, UTF-8, etc., then you do not need to specify any -character encoding for boolean, int, long, float and date types. - -.. note:: - - The **form submission encoding** can be overridden by the - ``accept-charset`` attribute of the ``form`` tag: - - https://www.w3.org/TR/html5/sec-forms.html#selecting-a-form-submission-encoding - - -Method Arguments ----------------- - -Sometimes you may wish to control which object is published based on -form data. For example, you might want to have a form with a select -list that calls different methods depending on the item chosen. -Similarly, you might want to have multiple submit buttons which invoke -a different method for each button. - -The publisher provides a way to select methods using form variables -through the use of the ``method`` argument type. The method type allows -the request variable ``PATH_INFO`` to be augmented using information -from a form item's name or value. - -If the name of a form field is ``:method``, then the value of the field -is added to ``PATH_INFO``. For example, if the original ``PATH_INFO`` -is ``foo/bar`` and the value of a ``:method`` field is ``x/y``, then -``PATH_INFO`` is transformed to ``foo/bar/x/y``. This is useful when -presenting a select list. Method names can be placed in the select -option values. - -If the name of a form field **ends** in ``:method`` then the part of -the name before ``:method`` is added to ``PATH_INFO``. For example, if -the original ``PATH_INFO`` is ``foo/bar`` and there is a ``x/y:method`` -field, then ``PATH_INFO`` is transformed to ``foo/bar/x/y``. In this -case, the form value is ignored. 
This is useful for mapping submit -buttons to methods, since submit button values are displayed and -should therefore not contain method names. - - -Record Arguments ----------------- +An aggregator in detail: the `record` argument +++++++++++++++++++++++++++++++++++++++++++++++ Sometimes you may wish to consolidate form data into a structure rather than pass arguments individually. **Record arguments** allow you @@ -951,6 +935,220 @@ simple as possible. +Specifying argument character encodings +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +An encoding directive tells the converter which encoding +the parameter value uses. Typical encodings are ``utf8`` and ``latin1``. + +An encoding directive is ignored if the parameter does not +have a converter directive as well. +If there is no encoding directive, the converter uses the +default encoding as specified by the Zope configuration option +``zpublisher-default-encoding``. The default value for this configuration +option in Zope 4 is ``utf-8``. + +In principle, Zope supports any encoding known by the ``codecs`` +module. However, the converter may impose restrictions. + + +**Special cases** + +If you are still on Python 2 or your pages use a different encoding, +such as ``Windows-1252`` or ``ISO-8859-1``, which was the default +encoding for HTML 4, you have to add the encoding, e.g. ``:cp1252``, for +all argument type converters, as follows:: + + + + + +.. note:: + + For a full list of supported encodings, please have a look at: + + https://docs.python.org/3.7/library/codecs.html#standard-encodings + +If your pages all use a character encoding which has ASCII as a subset, +such as Latin-1, UTF-8, etc., then you do not need to specify any +character encoding for boolean, int, long, float and date types. + +..
note:: + + The **form submission encoding** can be overridden by the + ``accept-charset`` attribute of the ``form`` tag: + + https://www.w3.org/TR/html5/sec-forms.html#selecting-a-form-submission-encoding + + +Method Arguments +~~~~~~~~~~~~~~~~ + +Normally, a request parameter is transformed into a request variable +and made available via the ``form`` attribute of the request object. The +*method* directive, in contrast, tells Zope to extend the path used for traversal. + +You can use a ``method`` directive to control which object is published based on +form data. For example, you might want to have a form with a select +list that calls different methods depending on the item chosen. +Similarly, you might want to have multiple submit buttons which invoke +a different method for each button. + +The publisher provides a way to select methods using form variables +through the use of the ``method`` argument type. The method type allows +the request variable ``PATH_INFO`` to be augmented using information +from a form item's name or value. + +If the name of a form field is ``:method``, then the value of the field +is added to ``PATH_INFO``. For example, if the original ``PATH_INFO`` +is ``foo/bar`` and the value of a ``:method`` field is ``x/y``, then +``PATH_INFO`` is transformed to ``foo/bar/x/y``. This is useful when +presenting a select list. Method names can be placed in the select +option values. + +If the name of a form field **ends** in ``:method``, then the part of +the name before ``:method`` is added to ``PATH_INFO``. For example, if +the original ``PATH_INFO`` is ``foo/bar`` and there is an ``x/y:method`` +field, then ``PATH_INFO`` is transformed to ``foo/bar/x/y``. In this +case, the form value is ignored. This is useful for mapping submit +buttons to methods, since submit button values are displayed and +should therefore not contain method names.
+ +Zope supports the following method directives: +``method`` (synonym ``action``) and ``default_method`` +(synonym ``default_action``). A path extension specified by a +``default_method`` directive is overridden by a ``method`` directive. + + +Processing model for request data marshaling +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Zope processes the request parameters in +``ZPublisher.HTTPRequest.HTTPRequest.processInputs``. + +This section describes the complex processing model in some detail, as its +various steps and peculiar logic may lead to surprises. If you are developing +*with* Zope as opposed to developing Zope itself, you may skip over these +details. + +In a preliminary step the request parameters are collected +from the potential sources, i.e. the *query* and the +request body (if present), and normalized. The result is a sequence of +name/value pairs, each describing a single request parameter. + +Zope then sets up some variables: + +form + as target for the collected form variables + +defaults + as target for the collected form variable defaults + +tuple_items + to remember which form variables should be tuples + +method + as target for the path extension from method directives. + +It then loops over the request parameter sequence. + + +For each request parameter, the processing consists of the following steps: + +1. Some variables are set up: + + isFileUpload + does the parameter represent an uploaded file? + + converter_type + the most recently seen converter from a converter directive + + character_encoding + the most recently seen encoding from an encoding directive + + flags + to indicate which processing types are requested via directives + + Processing types are "ignore", "aggregate as sequence", + "aggregate as record", "aggregate as records", "use as default", + "convert" (using ``converter_type`` and ``character_encoding``). + +2. The parameter value is checked to see if it is a file upload.
+ In this case, it is wrapped into a ``FileUpload``, and ``isFileUpload`` + is updated. + +3. All directives in the parameter name are examined from right to left + and the variables set up in step 1 are updated accordingly. + ``:tuple`` directives update ``flags`` and ``tuple_items``, and method + directives update ``flags`` and ``method``. + +4. The actions stored in ``flags`` during step 3 are executed. + + If ``flags`` indicates use as a default, the step operates + on ``defaults``, otherwise on ``form``. + +After all request parameters have been processed, +request variables from ``defaults`` are put into ``form`` as long as it +does not contain that variable already. +If a method directive has been encountered, the traversal +path is extended accordingly. + +As a security measure, mainly for DTML use, request variables +are not only made available in the request attribute ``form``. +A (somewhat) secured version of them is also stored in +the attribute ``taintedform``. In the *tainted* request variable +variant, strings potentially containing HTML fragments use +``TaintedString`` as their data type rather than the normal ``str``. +DTML will automatically quote those values to give some +protection against cross-site scripting attacks via HTML injection. +With the more modern page templates, all values (not only tainted ones) +are quoted by default. They typically do not use the tainted +form of the request variables. + +Known issues and caveats +~~~~~~~~~~~~~~~~~~~~~~~~ + +1. There is almost no error handling: + + - unrecognized directives are silently ignored + + - if a request parameter contains several converter directives, the + leftmost wins + + - if a request parameter contains several encoding directives, the + leftmost wins + + - if a request parameter contains an encoding but no converter + directive, the encoding directive is silently ignored + + - some directive combinations do not make sense (e.g.
``:record:records``); + for them, some of the directives are silently ignored + +2. Usually, the order of aggregator directives in a request parameter does + not matter. However, this is not the case for the ``:tuple`` directive. + To actually produce a tuple request variable, it must be the leftmost + directive; otherwise, it is equivalent to ``:list``. + + In addition, ``:tuple`` is always equivalent to ``:list`` for + request variables aggregated as record or sequence of records. + +3. The main use case for the ``:default`` directive is to provide a + default value for form controls (e.g. checkboxes) for which the browser may + or may not pass on a value when the form is submitted. + Unfortunately, this only works at the top level. + It does not work for subcomponents, e.g. an attribute of a "record". + As a consequence, if a request parameter combines ``:default`` with + another aggregator directive, the result may be unexpected. + +4. The request preprocessing happens at a very early stage, before + traversal has taken place. As a consequence, + important configuration for application-specific error handling + may not yet have taken effect. Exceptions raised during this stage + are reported and tracked only via "root level" error handling. + For this reason, it is typically better to use a form framework such as + ``z3c.form`` or ``zope.formlib`` for form processing + rather than the built-in features described in this document. Exceptions ---------- @@ -1006,7 +1204,7 @@ and by default **waitress** returns an error message as follows:: Exceptions and Transactions ---------------------------- +~~~~~~~~~~~~~~~~~~~~~~~~~~~ When Zope receives a request it begins a transaction. Then it begins the process of traversal.
Zope automatically commits the transaction diff --git a/src/ZPublisher/HTTPRequest.py b/src/ZPublisher/HTTPRequest.py index ae76da18a1..77cdeee4f5 100644 --- a/src/ZPublisher/HTTPRequest.py +++ b/src/ZPublisher/HTTPRequest.py @@ -483,6 +483,11 @@ def processInputs( setattr=setattr): """Process request inputs + See the `Zope Developer Guide Object Publishing chapter + `_ + for a detailed explanation in the section `Marshalling Arguments from + the Request`. + We need to delay input parsing so that it is done under publisher control for error handling purposes. """
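The parameter-name directive syntax documented in this patch can be illustrated with a small, self-contained Python sketch. It is a hypothetical toy, not Zope's ``processInputs``.

- The names ``CONVERTERS``, ``split_directives`` and ``marshal`` are invented for illustration.
- Records are plain ``dict`` objects rather than ``ZPublisher.HTTPRequest.record`` instances.
- Encodings, ``:default``, ``:tuple``, ``:ignore_empty``, method directives and file uploads are not handled.

It only mirrors the rules the documentation states:

- Directives are examined from right to left, so the leftmost converter wins.
- Unrecognized directives are ignored.
- Repeated plain parameters become a list.
- ``:records`` starts a new record when an attribute would otherwise be overridden.

```python
from urllib.parse import parse_qsl

# Hypothetical, heavily simplified converter table (an assumption for this
# sketch). Zope's real table is ZPublisher.Converters.type_converters.
CONVERTERS = {
    "int": int,
    "float": float,
    "lines": lambda value: value.splitlines(),
    "tokens": lambda value: value.split(),
}
AGGREGATORS = {"list", "record", "records"}


def split_directives(name):
    """Split 'x.age:int:record' into ('x.age', ['int', 'record'])."""
    base, *directives = name.split(":")
    return base, directives


def marshal(query):
    """Toy marshaller for a query string; records are plain dicts here."""
    form = {}
    for name, value in parse_qsl(query, keep_blank_values=True):
        base, directives = split_directives(name)
        converter = aggregator = None
        # Scan right to left: a directive seen later (further left) replaces
        # an earlier one of the same kind, so the leftmost converter wins.
        for directive in reversed(directives):
            if directive in CONVERTERS:
                converter = CONVERTERS[directive]
            elif directive in AGGREGATORS:
                aggregator = directive
            # Unrecognized directives are silently ignored.
        if converter is not None:
            value = converter(value)

        if aggregator == "record":
            var, _, attr = base.partition(".")
            form.setdefault(var, {})[attr] = value
        elif aggregator == "records":
            var, _, attr = base.partition(".")
            records = form.setdefault(var, [])
            # Start a new record if the list is empty or the attribute
            # would otherwise be overridden in the last record.
            if not records or attr in records[-1]:
                records.append({})
            records[-1][attr] = value
        elif aggregator == "list":
            form.setdefault(base, []).append(value)
        elif base in form:
            # Repeated plain parameters are collected into a list.
            previous = form[base]
            if not isinstance(previous, list):
                previous = [previous]
            form[base] = previous + [value]
        else:
            form[base] = value
    return form


print(marshal("i:int=1&x.name:record=Peter&x.age:int:record=10"))
# {'i': 1, 'x': {'name': 'Peter', 'age': 10}}
```

Running the sketch on the record example from the documentation yields ``i`` as an integer and ``x`` as a record with ``name`` and ``age``. Here ``x`` is shown as a ``dict`` purely for simplicity.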