Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tutorial(s) and communicating on using PROJ>=6/GDAL3 in R #43

Open
florisvdh opened this issue Sep 22, 2020 · 37 comments
Open

Tutorial(s) and communicating on using PROJ>=6/GDAL3 in R #43

florisvdh opened this issue Sep 22, 2020 · 37 comments

Comments

@florisvdh
Copy link
Member

Opening a new discussion, as I've been trying to write a tutorial on current good practice in specifying a CRS in R (with focus on the commonly used sf, sp and raster). It would be great if some people could have a critical look at this tutorial. Comments / pull requests are most welcome, and that includes language improvement. Getting feedback on it was the primary reason for me to open this issue.

But, feel free to widen up discussion, about communication (needs?) to R users on practical PROJ>=6/GDAL3 aspects they must know about. (And, who knows, I may have missed existing communications so I'd be happy to hear about it!) While currently most developers and more advanced R users will be aware of these changes, and have found where to look on the internet, I think that many R users are still unaware (even though they should be alarmed by the proj4strings warnings). At least that's my experience - after all these are recent changes.

The context of the above linked tutorial is the following. In our organization (Research Institute for Nature and Forest - www.inbo.be/en) we are gradually extending a tutorials website (https://inbo.github.io/tutorials/), mostly oriented towards reproducible science (spatial topics are found by clicking on the 'gis' tag.). Its primary aim is to support and stimulate scientists within the organization, but of course it is available to everyone and open to contributions. And next to the website, some very nice colleagues organize coding clubs in order to accelerate hands-on experience with R. Overall, we have about 150 R users (as derived from our internal useR mailing list), with a large variety in their R experience and in their effective R usage. Not all scientists are enough self-reliant to find out about technical aspects they need to know, hence they desire being supported in some coordinated way. I imagine this is within normal expectations and it is the case elsewhere.

In order to fill that gap we believe organization-level tutorials are of help. In some cases we just make a tutorial to provide a link to another tutorial elsewhere, in order to prevent unnecessary duplication and keep maintenance low.

@rsbivand
Copy link
Member

Thanks, useful. I'll be speaking at whyR 2020 on Sunday 27 September at 11.00 - some parts of the talk and the (to be posted soon) code extend the celebRation 2020 talk you reference. I'll post a link once I have one.

I haven't read through your tutorial carefully yet, but a question that is becoming more visible is that EPSG:4326 is by definition Latitude-Longitude, while OGC:CRS84 is by definition Longitude-Latitude. From the GDAL barn-raising, PROJ/GDAL are trying to be authority-compliant, but this isn't resolved yet on our side. In the talk, I pivot to using OGC:CRS84 instead of EPSG:4326; you may think about what works for your target readership, but probably EPSG:4326 is a bad idea for GIS/visualization from the point of view of axis order.

@florisvdh
Copy link
Member Author

That's very interesting to learn about @rsbivand , it was beyond my knowledge (I thought we could always reside to EPSG). I will definitely have look at that - looking forward to your talk.

@edzer
Copy link
Member

edzer commented Sep 22, 2020

I agree that writing OGC:CRS84 instead of EPSG:4326 with long-lat data is a good idea, as it makes axis order unambiguous and is authority compliant. I don't agree that "GDAL tries to be authority compliant", but it allows you to be (but by default isn't, I believe). There is still "what most software does" (interpret epsg:4326 as long-lat) versus "what EPSG tells you do to" (interpret epsg:4326 as lat long - which has always been like that). Before assuming that EPSG:4326 is lat long by default in R, we may want to align with QGIS, GRASS, PostGIS and other OSGEO projects.

@rsbivand
Copy link
Member

rsbivand commented Sep 23, 2020

Link to my Sunday whyR talk (11.00 CEST) probably https://youtu.be/mEAyQ8bv1zU. Followed by interesting panel. Presentation files on https://github.com/rsbivand/whyR20_files; output from current rsbivand/sp and rev 1066 of rgdal, sf 0.9-6 CRAN.

@florisvdh
Copy link
Member Author

Thanks Roger for the interesting extra slides (42-73), which you couldn't discuss in the talk, about (among others) proj4string stripping, axis order, WKT representation by GDAL versus PROJ (and, some new rgdal functions). Well explained and I liked reading it!

@rhijmans
Copy link

It is helpful, except, I think, for the statement that one should "specify the CRS by using the EPSG code, but do so without using a proj4string".

That is not generally usable advice. While you can use these codes for the few established CRSs, you cannot use it to define an appropriate system for the data you are working with (and there is an infinite number of these). While we can still use PROJ notation, at least with the WGS84 datum, that seems to be the only generally usable approach available; unless you are able to create your own WKT. I am not aware of any effort to write software to support creation of WKT from basic components (like a PROJ string), but that could be very useful. Either way, it would be good if your tutorial explained how to do that, if there is a way.

@mdsumner
Copy link
Member

mdsumner commented May 24, 2021

While you can use these codes for the few established CRSs, you cannot use it to define an appropriate system for the data you are working with (and there is an infinite number of these)

indeed, in fact I've been considering writing out a short explanation of centring a local projection on a lon,lat location and using metres to buffer an extent around it. It's handy to show the cosine hack one would use to get an approximately correct aspect ratio with angular coordinates, but a throw way proj-string projection and a metres-either-side range is much simpler. (sometimes you care more about arbitrary centre and extent than you do about the projection family, and in addition you may not care at all about epoch-related inaccuracy). I've been using this as a target spec with the GDAL warper and it's clear it's the best way to generate a local projection for general use.

It could be expanded into a "we use this epoch in such-and-such context", so here's the WKT2 version of a throw away proj string used in the more casual ways ....

@edzer
Copy link
Member

edzer commented May 24, 2021

I may be wrong but think the issue is the difference between using proj4 strings as coordinate reference systems, or as coordinate transformations. If we use them as CRS, they get +datum=WGS84 attached which may happen to be what you want but in general not be a good idea, e.g. if you're working in another datum and only want to project. The pivotal WGS84 is the thing that PROJ > 5 managed to get away from. You can use "proj4" notation to create conversion or transformation pipelines, e.g. to do a projection without a datum transformation; this is what PROJ & GDAL (still) do, internally. Both sp and sf accept proj.4 string-defined CRS's as input, just no longer as CRS representation.

@mdsumner
Copy link
Member

mdsumner commented May 24, 2021

Yes, except code is also representation, and round-tripping has no hope to exist. The problems are real ones, and the benefits are clear only in local and primarily terrestrial contexts, exactly the ones where a local authority grid likely already exists and exploring the properties of projections has less value.

The WKT2 rigour in the closed contexts of sf and terra with their built-in GDAL stacks is unproblematic afaic, they just aren't the entire story.

@rhijmans
Copy link

rhijmans commented May 24, 2021

If we use them as CRS, they get +datum=WGS84 attached which may happen to be what you want

If you set up your own CRS parameters, changes are that this is what you want.

but in general not be a good idea, e.g. if you're working in another datum and only want to project

But what other option would I have? That way I could get at least get a WKT2 for the WGS84 variant; and then I could try to use that as template by replacing WGS84 with the datum of choice. But I have never needed that; for the reasons that Mike alludes to.

If we use them as CRS, they get +datum=WGS84 attached which may happen to be what you want, but in general not be a good idea

To extend this discussion a bit: when using the WGS84 datum, the +proj notation is the "in script" representation that I use and would recommend. This is because the proj strings are self-explanatory, and that makes it easy to interpret them and to spot errors. In contrast, EPSG codes are opaque, and their use leads to inferior code that is harder to read and more likely to contain errors. I admit that I am uncertain about the downside of this approach; about all the things that could go wrong, nasty things like axis reversal --- and the degree to which our software could prevent such accidents from happening.

So is it fair to say that:

  1. You can no longer use +proj with datum != WGS84
  2. You can use EPSG codes (or similar) in that situation
  3. If there is no code for what you want, you can use +proj= , but only with datum=WGS84
    (and you can use that to create a WKT2 from which it should be possible to derive a variant with another datum of choice).
  4. if you use datum=WGS84, you can use the "+proj=" notation instead of EPSG codes to be more explicit.
    but be aware of certain risks.... (which are?)
  5. It is not foreseen that the "+proj=... datum=WGS84" route will be dropped by the library maintainers (I sure hope this is true)

?

@mdsumner
Copy link
Member

mdsumner commented May 25, 2021

That summary accords with my understanding, which has been slow - you can also use the modern equivalent of +sphere which is +R=, with an explicit radius, and that works fine as long as you don't stray too far from the earth's radius (at some threshold an error triggers from PROJ itself). I can't find a direct doc of this rn but it's on the proj site, I think.

I think the risks are at sub-metre level for ensemble datums, if you don't have the time stamp of your coordinate measures and ability to choose the right epoch. Other risks are the standard ones we always had (100s or so metres for failure to specify the actual datum).

You also can't use +init=<auth>:<code> anymore, but that's a PROJ lib version thing, modern equivalent is <auth>:<code>. (Just worth a mention with the above list, I think).

@rsbivand
Copy link
Member

Is it possible that an R function might be needed for custom CRS to either just write a WKT2-2019 or PROJJSON string from user Proj4-like input? If PROJ was available, the representation could be validated, and/or proj_create() could be used with a Proj4 string, and/or proj_create_argv() ?

I recall that PROJJSON was suggested as somewhat easier to handle than WKT2, but for user input, Proj4 strings are clearly easiest.

The downside is of course that EPSG coded CRS are much easier to fit into SQL search for transformation pipelines. I've seen 1-2m given as the EPSG:4326 ensemble inaccuracy, with further comments on https://lists.osgeo.org/pipermail/gdal-dev/2021-May/054112.html , https://github.com/rouault/gdal/blob/rfc_81/gdal/doc/source/development/rfc/rfc81_coordinate_epoch.rst . I don't think things have settled at all. If we were already an OSGeo community, maybe we'd be able to put forward points of view, I'm unsure about that, though.

@florisvdh
Copy link
Member Author

florisvdh commented May 26, 2021

Thanks @rhijmans for the feedback! I'd be happy to improve the tutorial where possible. Related to @mdsumner 's remark, its scope is indeed limited to officially registered (say EPSG) CRSs - probably we should state that more clearly. But I feel it is the scope most users are confronted with. So importantly, the tutorial was not intended to explain on defining CRSs from scratch. I think that would best fit in a separate tutorial - cf. post above.

The tutorial's aim is to give generally applicable, basic advice, easy for (beginning) R users (in the light of the recent proj4string obsoletion). In that regard (keeping things simple and robust), I doubt whether it's a good idea to showcase that one can also use +proj to specify CRSs that use the WGS84 datum (datum:EPSG::6326) - the latter is then implicitly assumed by the (current) software implementation. But we could say something about it (see below).

IMO the easiest / failsafe way in specifying an 'official' CRS (for spatial data) is referring the CRS from an official database (like proj.db) instead of manually providing (some of) the parameters.

The WKT2 rigour in the closed contexts of sf and terra with their built-in GDAL stacks is unproblematic afaic, they just aren't the entire story.

Thanks! In order to make the tutorial more complete, it may indeed be a good idea to add a short note on the WGS84 datum case in parentheses or as a footnote (referring to +proj ), or just refer to the PROJ documentation if someone wishes to use projstrings. In that way things don't get too complicated for the beginner, and only full CRS definitions are considered in the tutorial (either defined by the EPSG key or the OGC WKT standard). Note, the EPSG:xxxx format is covered in the tutorial.

In contrast, EPSG codes are opaque, and their use leads to inferior code that is harder to read and more likely to contain errors

Sure one could make mistakes and pass the wrong key. But I think it's a better managed approach - to comply with EPSG CRSs that is. I don't understand 'harder to read' in your explanation.

I am not aware of any effort to write software to support creation of WKT from basic components (like a PROJ string), but that could be very useful. Either way, it would be good if your tutorial explained how to do that, if there is a way.

There is much merit indeed in explaining more on what can be done (and not) with projstrings. It is not an area I'm familiar with though. IMO this deserves its own tutorial (you're also welcome to add it to the inbo/tutorials repo, even though it's linked to my organization, but it would then sit next to the EPSG-CRS tutorial).

--

As a sidenote, and referring to aforementioned WGS84 inaccuracies, IMHO a CRS with a specific WGS84 datum realization (latest realization is G1762: EPSG:1156), such as the EPSG:9057 CRS, seems preferable (when possible) to use over EPSG:4326 (or OGC:CRS84) CRS which uses the WGS84 ensemble. The latter (datum:EPSG::6326) is an ensemble datum implicitly refers to the current WGS84 datum realization, so this datum definition is not constant in time. IIUC this will have an effect regardless of correctly setting the epoch of the coordinates (which I think can be done for each WGS84 datum realization independently).

(PS: I'm not often online these weeks, sorry for the late reply)

@rhijmans
Copy link

rhijmans commented May 26, 2021

My main point was that saying that the general principle is to

specify the CRS by using the EPSG code, but do so without using a proj4string.

seems extreme and exaggerated. Instead you could recommend to use EPSG codes when available; and note that you can use PROJ.4 notation as long as you do not stray from WGS84. There is, by the way, nothing "official" about these codes. I am not suggesting to explain how to roll your own WKT2 CRS with another datum (as we do not seem to know how to easily do this!)

I feel it is the scope most users are confronted with

There might be many that have other needs. I have worked a lot with species distribution data. These do not tend to follow national boundaries and so the appropriate CRS is not in the databases. There is a national grid for Belgium, one for the Netherlands and (presumably) one for Luxembourg. What do you use when you map the Benelux? So I think the situation in which you want to define your own parameters is very common. Of course you could pick one that is "close enough" from a list, but is that good advice?

I don't understand 'harder to read' in your explanation.

To me this is hard to read: ESRI:102004; it means nothing to me. Whereas +proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96 is very clear and expressive --- (it takes some getting used to at first, of course). I do not know what EPSG:6326 is, but I know what +proj=longlat stands for. Thus I think that a script that uses the latter is better, unless the former is truly needed, e.g. to deal with datum realizations (now explain that to your general user ;)

((CRSs create a lot of misunderstanding. Using codes does not educate. It would be have been much better to expand PROJ notation, it seems to me. I do not know the limits of that, but the point is moot, we are going to have to live with it.))

CRS with a specific WGS84 datum realization seems preferable

How would that matter in data analysis (as opposed to the e.g. the perfect drone delivery service, be it of goods or bads); as long as the original datum is known and the data correctly transformed to the same (appropriate) datum?

@mdsumner
Copy link
Member

Yes, this is so important. The idea that you must find a code is incredibly toxic, and you can see this idea propagated online in so many communities where experienced users should know better (it's similar to the "find your UTM zone" advice, which is hopelessly naive and off-base).

@rsbivand
Copy link
Member

If it the case that the position data are only going to be used by the person choosing the CRS using a custom Proj4 string, without any datum transformation, then by all means use Proj4 strings. If the data are going to be written to file/db, then the CRS representation matters much more, as is the case for datum transformation.

Another question - for vector data, an s2 approach with only GEOGCRS on a sphere is probably easiest - then everybody is in the same situation. But how do raster and s2 work together? This is, I feel, somewhere in the gdistance area, isn't it? s2 to WebMercator isn't accurate, but at scales of several hundred km, does it matter? Potentially yes? If so (data integration from multiple sources), we are still stuck. Admittedly, GRASS locations could be defined somewhat like Proj4, but in that case, one transformed all non-compliant input to the location CRS. Do we still need to do that, or is s2 spherical without worrying about epochs or datum ensembles close enough? I'd be very interested in seeing @paleolimbot 's views. For species distribution data, is a planar representation essential, or is a spherical representation adequate, and side-steps the CRS representation question?

@paleolimbot
Copy link

Honoured that anybody is interested in my views!

s2 shines when the data are already provided as longitude/latitude, which is and I imagine will be common for a long time. Correcting for epoch or other CRS issues will probably always be at the beginning or end of a script...s2 will always sit in the middle. The other place where s2 shines, regardless of the initial form of the data, is when you need to ask a global question (e.g., "within 500 km of a coastline but not on land"). Those questions rarely have an accuracy requirement that is impacted by the spherical assumption.

Maybe a good way to summarize my feelings about this are that there are no "innaccurate" reference systems, only inaccurate transformations, inaccurate distances between points, and inaccurate representations of edges (i.e., how you draw a line between two coordinates). s2 can help by minimizing the number of times one has to invoke a transformation and has tools that can make edge representation explicit by adding points (at the expense of all distances in s2 being innaccurate/spherical only).

@rsbivand
Copy link
Member

@paleolimbot thanks! My feeling is that maybe we can lessen the need for custom projections by choosing to use spherical representations to a greater extent from now on, rather than having to go planar to handle distances/areas/intersections. However, perhaps species distribution models, if they presume planar, might need re-visiting? Are perhaps also global spherical grid systems, like the now-archived dggridR package https://github.com/r-barnes/dggridR/ or similar? I feel that, unless analysts need high precision, spherical is not worse than planar some distance from the central lon-lat point.

@rhijmans
Copy link

I agree that using lon/lat can be very good for some types of data analysis, but that is a different discussion. There are many instances in which someone would reasonably want to define a custom CRS. As-is, in the GDAL/PROJ world we only have the proj4 string for that, even if its usability has degraded. In the tutorial, Floris says that the proj4 notation should not be used, in part because he believes it will likely be removed. That may be true or not, but if we could only use EPSG (etc) codes we would not have competent general purpose spatial data analysis software. We would need additional software that translates a (perhaps expanded) proj4 string (or similar) to wkt2. (and, who knows, it could then be included in the next version of PROJ? )

@rsbivand
Copy link
Member

rsbivand commented May 28, 2021

I think all these cases are already covered in both sf and sp workflows. The relevant problem is that GDAL degrades input file CRS when exporting to Proj4, so the internal R-side CRS representation in sp or sf workflows cannot be Proj4 when reading from file or checking the representation for validity against PROJ. PROJ and GDAL both import Proj4 strings, but deprecate +towgs84=, +nadgrids= and except WGS84, NAD83 and NAD27 +datum= (and possibly others, unsure). So for the example above:

> sf::st_crs("+proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96")
Coordinate Reference System:
  User input: +proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96 
  wkt:
PROJCRS["unknown",
    BASEGEOGCRS["unknown",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8901]]],
    CONVERSION["unknown",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",39,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",-96,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",33,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",45,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
> sf::st_crs("+proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96 +ellps=WGS84")
Coordinate Reference System:
  User input: +proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96 +ellps=WGS84 
  wkt:
PROJCRS["unknown",
    BASEGEOGCRS["unknown",
        DATUM["Unknown based on WGS84 ellipsoid",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1],
                ID["EPSG",7030]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8901]]],
    CONVERSION["unknown",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",39,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",-96,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",33,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",45,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
> sf::sf_extSoftVersion()
          GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H 
   "3.10.0dev"        "3.3.0"        "8.0.1"         "true"         "true" 
          PROJ 
       "8.0.1" 

with default insertion of WGS84 as datum in the WKT2-2019 representation if +ellps= is not given. For sp workflows:

> (o <- sp::CRS("+proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96"))
CRS arguments:
 +proj=lcc +lat_0=39 +lon_0=-96 +lat_1=33 +lat_2=45 +x_0=0 +y_0=0
+datum=WGS84 +units=m +no_defs 
> cat(sp::wkt(o), "\n")
PROJCRS["unknown",
    BASEGEOGCRS["unknown",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8901]]],
    CONVERSION["unknown",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",39,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",-96,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",33,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",45,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]] 
> (o <- sp::CRS("+proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96 +ellps=WGS84"))
CRS arguments:
 +proj=lcc +lat_0=39 +lon_0=-96 +lat_1=33 +lat_2=45 +x_0=0 +y_0=0
+ellps=WGS84 +units=m +no_defs 
Warning message:
In showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj = prefer_proj) :
  Discarded datum Unknown based on WGS84 ellipsoid in Proj4 definition
> cat(sp::wkt(o), "\n")
PROJCRS["unknown",
    BASEGEOGCRS["unknown",
        DATUM["Unknown based on WGS84 ellipsoid",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1],
                ID["EPSG",7030]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8901]]],
    CONVERSION["unknown",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",39,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",-96,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",33,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",45,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]] 
> (o <- sp::CRS("+proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96 +ellps=WGS84"))
CRS arguments:
 +proj=lcc +lat_0=39 +lon_0=-96 +lat_1=33 +lat_2=45 +x_0=0 +y_0=0
+ellps=WGS84 +units=m +no_defs 
Warning message:
In showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj = prefer_proj) :
  Discarded datum Unknown based on WGS84 ellipsoid in Proj4 definition
> cat(sp::wkt(o), "\n")
PROJCRS["unknown",
    BASEGEOGCRS["unknown",
        DATUM["Unknown based on WGS84 ellipsoid",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1],
                ID["EPSG",7030]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8901]]],
    CONVERSION["unknown",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",39,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",-96,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",33,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",45,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]] 
> rgdal::rgdal_extSoftVersion()
          GDAL GDAL_with_GEOS           PROJ             sp           EPSG 
       "3.3.0"         "TRUE"        "8.0.1"        "1.4-6"      "v10.018" 

So R-side WKT2-2019 can be created from Proj4 string user input, usually without an authority for the PROJCRS. The WKT2-2019 should also be OK to write to file as part of an object, and should be retrieved as written.

It would be useful to check transformation pipelines between different +ellps= values; as a starter:

> o <-sf::st_crs("+proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96 +ellps=GRS80")
> o0 <-sf::st_crs("+proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96 +ellps=airy")
> sf::sf_proj_pipelines(o0, o)
Candidate coordinate operations found:  1 
Strict containment:     FALSE 
Axis order auth compl:  FALSE 
...
Best instantiable operation has only ballpark accuracy 
Description: Inverse of unknown + Ballpark geographic offset from unknown to
             unknown + unknown
Definition:  +proj=pipeline +step +inv +proj=lcc +lat_0=39 +lon_0=-96
             +lat_1=33 +lat_2=45 +x_0=0 +y_0=0 +ellps=airy
             +step +proj=lcc +lat_0=39 +lon_0=-96 +lat_1=33
             +lat_2=45 +x_0=0 +y_0=0 +ellps=GRS80

@mdsumner
Copy link
Member

mdsumner commented May 29, 2021

My take, merely for transformations available to general purpose software as Robert mentioned, is to have a basic interface to what PROJ does. @paleolimbot provides this in libproj but has cryptic and painful problems keeping that on CRAN, they have been difficult and unhelpful with failed communications from their side and failed to indicate an impending archive step because of problems in the package. Communications have been ambiguous, and it depends who on the CRAN team is replying. That's typical in my experience, the more in the club you are the better, with no transparency about what they support, help with, etc. They try to be robotic, objective, but they are not.

If you can use PROJ as-is, you get everything you need, input proj4, wkt, whatever format the library provides, and the user/developer can make choices. If we had that shared resource then it'd be easy to write an adjunct to the blog post discussed.

@edzer
Copy link
Member

edzer commented May 29, 2021

pkg libproj made the proj sources part of the package, which sounds attractive because you loose the need to install proj besides R when doing source installs, but which also puts all the PROJ sources under the CRAN src checking, leading to problems similar to e.g. those we now face with s2 (g++11 issues in the upstream library, us needing to push google to move for our case, everyone waiting).

I don't see why the way libproj interfaced PROJ can't be done without including all of PROJ in src but simply using the way sf/rgdal/terra etc use PROJ. If that is what you need then why don't you do that? I also don't see how complaining about CRAN contributes to the discussion here.

@rsbivand
Copy link
Member

This also relates for PROJ to the size of its distributed data; having a projdata package has been suggested, but is version-fragile as Windows and MacOS CRAN binaries may use different versions. This also impacts OSGeo4W, as users may see one PROJ version in R (even if looking at the OSGeo4W-bundled data) and a different version of the data. This matters now because EPSG 10 changed the definitions of the database table and field structures in some places (so R/C code built against later PROJ through PROJ itself bites the dust when looking at an earlier DB structure).

@florisvdh
Copy link
Member Author

florisvdh commented May 29, 2021

Thanks @rhijmans for shedding light on your usecase (or data analysis in general), and for advocating being able to define one's own CRS (I like the Benelux example 🙂). Currently I'm unsure what's the best advice to easily define a new CRS (see below) and this may need discussion at PROJ directly. It seems it is not currently a purpose of PROJ to make this easy.

Enlisting here some things noted, after reading through pages on https://proj.org (but nothing more):

  • projstrings are seen as the definition of a coordinate operation (https://proj.org/usage/quickstart.html)

    • in that sense, I don't believe that projstrings will be 'removed' - although they'll likely evolve further. But an important question is what to use for defining a CRS.
    • Since projected CRSs are, in part, defined by projection parameters, it is still possible to use projstrings to define certain projected CRSs (though currently not recommended by the developers, see further). EDIT: doubting this Various examples on coordinate operations (with proj and with cs2cs), in the PROJ documentation, actually apply it.
      • Since the datum cannot be set freely, for the input coordinates WGS84 datum is assumed by default (?), at least that seems the case according to (partly outdated?) examples in the documentation. I'm unsure on that, as for PROJ 6.0.0 a breaking change was: "In case no ellipsoid or datum specification is provided in the PROJ string, the default ellipsoid is GRS80 (was WGS84 in previous PROJ versions)." Given that the WGS84 datum (datum:EPSG::6326) implies (by definition) the WGS84 ellipsoid, I'm confused here. And an ellipsoid does not imply a single datum (unless the software again makes an assumption).
    • IMHO the documentation is not very clear about how datum shifts work (https://proj.org/usage/transformation.html#transformation-pipelines and https://proj.org/operations/pipeline.html), where PROJ seems to only specify ellipsoids in a transformation pipeline. Hence for each ellipsoid assuming a default associated datum? EDIT: no. And how to construct a conversion pipeline within the ETRS89 datum, which uses GRS80 ellipsoid (e.g. in the Belgian-Dutch border region, for converting between ETRS89-based projections) then seems not immediately apparent, unless the ETRS89 datum will be taken from an explicitly declared input CRS (EPSG/WKT) and not changed. On the other hand it is clear how to set an ellipsoid fixed to a whole pipeline, i.e. by putting ellps=GRS80 before the +step parts: +proj=pipeline +ellps=GRS80 +step ....
      This also relates to Roger's post above.
    • Sidenote: the name 'proj4strings' is avoided - the project is on version 8. It seems best to also mention the generalized 'proj-string' / 'PROJ string' name when documenting/communicating on the 'proj4string' slot/method/function in R. This is just a suggestion of course but to me 'proj4string' sounds as if we somehow emulate PROJ.4 behaviour in R.
  • specifying CRSs: in an example's code comment, explicit advice against projstrings is seen for specifying CRSs. Rather the authority code (or the name) can be provided. From https://proj.org/development/migration.html#code-example:

       /* NOTE: the use of PROJ strings to describe CRS is strongly discouraged */
       /* in PROJ 6, as PROJ strings are a poor way of describing a CRS, and */
       /* more precise its geodetic datum. */
       /* Use of codes provided by authorities (such as "EPSG:4326", etc...) */
       /* or WKT strings will bring the full power of the "transformation */
       /* engine" used by PROJ to determine the best transformation(s) between */
       /* two CRS. */
    

    Similar advice is seen in https://proj.org/usage/transformation.html#proj-4-x-5-x-paradigm.

All in all, it seems that the particular case of creating a new CRS is not specifically envisaged, but I may be wrong. The general idea seems to be that an actual CRSs should be defined in full (WKT or equivalent). So this ultimately seems a debate between full versus partial CRS specifications.

  • The only 'easy' way to specify a full CRS is for existing ones I guess (using EPSG code).
  • That is, unless one could explicitly refer an authority-CRS and alter a few specific parameters. The latter is essentially what projstrings did/do, but an obligatory reference to the base CRS (which you alter) might match the needs of the current PROJ / OGC approach while being sufficient for those wishing to easily define a new CRS. Any ideas about this? A feature request?

Anyway, I intend to add a bit of nuance to the tutorial, regarding projstrings. Probably it's safer to say 'goodbye PROJ.4' than 'goodbye proj4strings' 😉.

@paleolimbot
Copy link

Lots of good stuff here! All my bits are off topic...I'll respond here but feel free to open threads elsewhere if there's any interest in further discussion.

  • @rsbivand mentioned spherical grid systems...the latest s2 will expose the s2 cell system for binning in this way. The H3 index system also has a very portable C API that does something similar. I think for packages to "go spherical" there needs to be more exposed at the C level so that Cartesian algorithms can be adapted. I imagine the best way to do that would be to replicate the GEOS C API with a drop-in S2 version. I probably won't get around to this, though.
  • The current hold up on libproj is actually just static linking libtiff on Solaris. More pressing things came along so I paused development, but it's a worthy experiment that solves a lot of problems. I will finish it and keep it up to date, but am by no means offended by anybody who chooses not to use it. Notably, it will solve the problems @rsbivand noted regarding PROJ and PROJ data versions introducing non-reproducible transformations between users.
  • For what it's worth, I think of proj4 strings as "constructors"...nobody is going to (or should) type out the full WKT2 when they're trying to fit a projection to some data (which is a good thing that people should do more often!). sf objects do it right...they store the user input and calculate the WKT2 when requested to do so.

@florisvdh
Copy link
Member Author

florisvdh commented Jun 17, 2021

After reading further in PROJ documentation I conclude that some thoughts in my previous post, about PROJ assuming an implicit datum, and hence confusion in the documentation, were incorrect - sorry for that. I'll try to clarify that part here (corresponding parts are crossed out above).

As I understand now, (modern) proj-strings only define the coordinate operation itself, i.e. without defining the input and output CRS (and hence, their datum). So if a datum shift is needed (because input & output CRS have different datum), then the corresponding shift must be defined in the pipeline, but the actual CRS datums don't. Another way to say this would be that the geoid's position relative to the ellipsoidal CS are (generally) neglected in proj-strings. For operations with geographical coordinates only the ellipsoid model and relative positions between ellipsoidal CSs (e.g. by Helmert transformation) are defined by proj-strings: one doesn't need the geoid position (relative to the ellipsoidal CS) to define these coordinate operations. Of course you still do need the datum of your input and output CRS in order to match coordinates with real locations.

If I remember correctly, PROJ.4 did assume a WGS84 datum default and allowed an alternative datum definition (e.g. directly with +datum or through 'early binding' with the WGS84 hub using +towgs84). In my previous post there was an assumption that PROJ still uses (or assumes) an input datum, but that seems wrong.

Meanwhile, I'm preparing some questions, to address at PROJ, about the role of proj-strings in declaring CRSs (including custom ones).

@florisvdh
Copy link
Member Author

florisvdh commented Jun 18, 2021

Below follow some results from attempts to declare a CRS from a proj-string, with recent PROJ (using projinfo). I hope this sheds further light on the discussion. Main conclusion: it can be done, but it is discouraged and is considered legacy (unsure how long this remains in place). Importantly, support for direct datum definition is generally absent for proj-strings, except indirectly through using +towgs84. From most of PROJ documentation it is also apparent that proj-strings are not used or advertised as a means to declare a CRS, but for coordinate operations instead.

--

While the declaration of a CRS using a proj-string seems not documented on proj.org for the main PROJ programs (projinfo etc.), it can be done but with little support for datum (*). Taking Roger's first example CRS (while running PROJ 7.2.1):

  • Consistent with expectation based on PROJ documentation, a regular proj-string now corresponds to a coordinate operation:

    $ projinfo -o WKT2:2019 "+proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96"
    WKT2:2019 string:
    CONVERSION["PROJ-based coordinate operation",
        METHOD["PROJ-based operation method: +proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96"]]
    
  • One needs to add a +type=crs element to the proj-string in order to declare a CRS (click command to see its output):

    $ projinfo -o WKT2:2019 "+proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96+type=crs"
    WKT2:2019 string:
    PROJCRS["unknown",
        BASEGEOGCRS["unknown",
            DATUM["World Geodetic System 1984",
                ELLIPSOID["WGS 84",6378137,298.257223563,
                    LENGTHUNIT["metre",1]],
                ID["EPSG",6326]],
            PRIMEM["Greenwich",0,
                ANGLEUNIT["degree",0.0174532925199433],
                ID["EPSG",8901]]],
        CONVERSION["unknown",
            METHOD["Lambert Conic Conformal (2SP)",
                ID["EPSG",9802]],
            PARAMETER["Latitude of false origin",39,
                ANGLEUNIT["degree",0.0174532925199433],
                ID["EPSG",8821]],
            PARAMETER["Longitude of false origin",-96,
                ANGLEUNIT["degree",0.0174532925199433],
                ID["EPSG",8822]],
            PARAMETER["Latitude of 1st standard parallel",33,
                ANGLEUNIT["degree",0.0174532925199433],
                ID["EPSG",8823]],
            PARAMETER["Latitude of 2nd standard parallel",45,
                ANGLEUNIT["degree",0.0174532925199433],
                ID["EPSG",8824]],
            PARAMETER["Easting at false origin",0,
                LENGTHUNIT["metre",1],
                ID["EPSG",8826]],
            PARAMETER["Northing at false origin",0,
                LENGTHUNIT["metre",1],
                ID["EPSG",8827]]],
        CS[Cartesian,2],
            AXIS["(E)",east,
                ORDER[1],
                LENGTHUNIT["metre",1,
                    ID["EPSG",9001]]],
            AXIS["(N)",north,
                ORDER[2],
                LENGTHUNIT["metre",1,
                    ID["EPSG",9001]]]]
    

    This is documented - but once again discouraged - for the PROJ functions proj_create() and proj_create_argv():

    If a proj-string contains a +type=crs option, then it is interpreted as a CRS definition. In particular geographic CRS are assumed to have axis in the longitude, latitude order and with degree angular unit. The use of proj-string to describe a CRS is discouraged. It is a legacy means of conveying CRS descriptions: use of object codes (EPSG:XXXX typically) or WKT description is recommended for better expressivity.

    If a proj-string does not contain +type=crs, then it is interpreted as a coordination operation / transformation.

    Also, the +type=crs at the end is what you get when running projinfo -o proj EPSG:XXXX when the EPSG code is a CRS object, e.g. projinfo -o proj EPSG:27700 (OSGB 1936 / British National Grid).

  • From above output it can be seen that the WGS84 ensemble datum (datum:EPSG::6326) is used here. As can also be learnt from Roger's post, this default WGS84 datum seems to come up provided that no +ellps element is passed (same applies for +ellps=WGS84). Otherwise, the datum is considered as 'unknown':

    $ projinfo -o WKT2:2019 "+proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96+ellps=GRS80 +type=crs"
    WKT2:2019 string:
    PROJCRS["unknown",
        BASEGEOGCRS["unknown",
            DATUM["Unknown based on GRS80 ellipsoid",
                ELLIPSOID["GRS 1980",6378137,298.257222101,
                    LENGTHUNIT["metre",1],
                    ID["EPSG",7019]]],
            PRIMEM["Greenwich",0,
                ANGLEUNIT["degree",0.0174532925199433],
                ID["EPSG",8901]]],
        CONVERSION["unknown",
            METHOD["Lambert Conic Conformal (2SP)",
                ID["EPSG",9802]],
            PARAMETER["Latitude of false origin",39,
                ANGLEUNIT["degree",0.0174532925199433],
                ID["EPSG",8821]],
            PARAMETER["Longitude of false origin",-96,
                ANGLEUNIT["degree",0.0174532925199433],
                ID["EPSG",8822]],
            PARAMETER["Latitude of 1st standard parallel",33,
                ANGLEUNIT["degree",0.0174532925199433],
                ID["EPSG",8823]],
            PARAMETER["Latitude of 2nd standard parallel",45,
                ANGLEUNIT["degree",0.0174532925199433],
                ID["EPSG",8824]],
            PARAMETER["Easting at false origin",0,
                LENGTHUNIT["metre",1],
                ID["EPSG",8826]],
            PARAMETER["Northing at false origin",0,
                LENGTHUNIT["metre",1],
                ID["EPSG",8827]]],
        CS[Cartesian,2],
            AXIS["(E)",east,
                ORDER[1],
                LENGTHUNIT["metre",1,
                    ID["EPSG",9001]]],
            AXIS["(N)",north,
                ORDER[2],
                LENGTHUNIT["metre",1,
                    ID["EPSG",9001]]]]
    

(*) 'little support for datum': that is, except when you're happy to use a BOUNDCRS, which is generated when adding a +towgs84 key and hence with a datum indirectly implied by an 'abridged transformation' to target CRS WGS84 (EPSG:4326). That corresponds to the 'early binding' method of PROJ.4 IIUC. Documented here in the context of a transformation pipeline. Note however that +towgs84 is generally not returned anymore by projinfo -o proj, so you should use older sources to get at the needed parameters (+towgs84 is sometimes still returned, e.g. with zeros for EPSG:4258). Example with EPSG:31370 (Belge 1972 / Belgian Lambert 72):

  • The +towgs84 is lost:

    $ projinfo -o proj EPSG:31370
    PROJ.4 string:
    +proj=lcc +lat_0=90 +lon_0=4.36748666666667 +lat_1=51.1666672333333 +lat_2=49.8333339 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +units=m +no_defs +type=crs
    
  • Re-inputting this proj-string results in an unknown datum (here we actually need datum "Reseau National Belge 1972" = EPSG:4313, as returned by projinfo -o WKT2:2019 EPSG:31370) (click command to see its output):

    $ projinfo -o WKT2:2019 "+proj=lcc +lat_0=90 +lon_0=4.36748666666667 +lat_1=51.1666672333333 +lat_2=49.8333339 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +units=m +no_defs +type=crs"
    WKT2:2019 string:
    PROJCRS["unknown",
        BASEGEOGCRS["unknown",
            DATUM["Unknown based on International 1909 (Hayford) ellipsoid",
                ELLIPSOID["International 1909 (Hayford)",6378388,297,
                    LENGTHUNIT["metre",1,
                        ID["EPSG",9001]]]],
            PRIMEM["Greenwich",0,
                ANGLEUNIT["degree",0.0174532925199433],
                ID["EPSG",8901]]],
        CONVERSION["unknown",
            METHOD["Lambert Conic Conformal (2SP)",
                ID["EPSG",9802]],
            PARAMETER["Latitude of false origin",90,
                ANGLEUNIT["degree",0.0174532925199433],
                ID["EPSG",8821]],
            PARAMETER["Longitude of false origin",4.36748666666667,
                ANGLEUNIT["degree",0.0174532925199433],
                ID["EPSG",8822]],
            PARAMETER["Latitude of 1st standard parallel",51.1666672333333,
                ANGLEUNIT["degree",0.0174532925199433],
                ID["EPSG",8823]],
            PARAMETER["Latitude of 2nd standard parallel",49.8333339,
                ANGLEUNIT["degree",0.0174532925199433],
                ID["EPSG",8824]],
            PARAMETER["Easting at false origin",150000.013,
                LENGTHUNIT["metre",1],
                ID["EPSG",8826]],
            PARAMETER["Northing at false origin",5400088.438,
                LENGTHUNIT["metre",1],
                ID["EPSG",8827]]],
        CS[Cartesian,2],
            AXIS["(E)",east,
                ORDER[1],
                LENGTHUNIT["metre",1,
                    ID["EPSG",9001]]],
            AXIS["(N)",north,
                ORDER[2],
                LENGTHUNIT["metre",1,
                    ID["EPSG",9001]]]]
    
  • On an older installation (Xenial), using gdalsrsinfo from GDAL 1.11.3, we can get the original PROJ.4 string for EPSG:31370, that is, with the +towgs84 element:

    $ gdalsrsinfo -o proj4 EPSG:31370
    '+proj=lcc +lat_1=51.16666723333333 +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +towgs84=-106.869,52.2978,-103.724,0.3366,-0.457,1.8422,-1.2747 +units=m +no_defs '
    
  • Back in PROJ 7.2.1, re-inputting this proj-string returns a coordinate operation, as expected:

    $ projinfo -o WKT2:2019 "+proj=lcc +lat_1=51.16666723333333 +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +towgs84=-106.869,52.2978,-103.724,0.3366,-0.457,1.8422,-1.2747 +units=m +no_defs"
    WKT2:2019 string:
    CONVERSION["PROJ-based coordinate operation",
      METHOD["PROJ-based operation method: +proj=lcc +lat_1=51.16666723333333 +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +towgs84=-106.869,52.2978,-103.724,0.3366,-0.457,1.8422,-1.2747 +units=m +no_defs"]]
    
  • After adding the +type=crs, as explained earlier, we get the BOUNDCRS corresponding to the old proj-string:

    $ projinfo -o WKT2:2019 "+proj=lcc +lat_1=51.16666723333333 +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +towgs84=-106.869,52.2978,-103.724,0.3366,-0.457,1.8422,-1.2747 +units=m +no_defs +type=crs"
    WKT2:2019 string:
    BOUNDCRS[
        SOURCECRS[
            PROJCRS["unknown",
                BASEGEOGCRS["unknown",
                    DATUM["Unknown based on International 1909 (Hayford) ellipsoid",
                        ELLIPSOID["International 1909 (Hayford)",6378388,297,
                            LENGTHUNIT["metre",1,
                                ID["EPSG",9001]]]],
                    PRIMEM["Greenwich",0,
                        ANGLEUNIT["degree",0.0174532925199433],
                        ID["EPSG",8901]]],
                CONVERSION["unknown",
                    METHOD["Lambert Conic Conformal (2SP)",
                        ID["EPSG",9802]],
                    PARAMETER["Latitude of false origin",90,
                        ANGLEUNIT["degree",0.0174532925199433],
                        ID["EPSG",8821]],
                    PARAMETER["Longitude of false origin",4.36748666666667,
                        ANGLEUNIT["degree",0.0174532925199433],
                        ID["EPSG",8822]],
                    PARAMETER["Latitude of 1st standard parallel",51.1666672333333,
                        ANGLEUNIT["degree",0.0174532925199433],
                        ID["EPSG",8823]],
                    PARAMETER["Latitude of 2nd standard parallel",49.8333339,
                        ANGLEUNIT["degree",0.0174532925199433],
                        ID["EPSG",8824]],
                    PARAMETER["Easting at false origin",150000.013,
                        LENGTHUNIT["metre",1],
                        ID["EPSG",8826]],
                    PARAMETER["Northing at false origin",5400088.438,
                        LENGTHUNIT["metre",1],
                        ID["EPSG",8827]]],
                CS[Cartesian,2],
                    AXIS["(E)",east,
                        ORDER[1],
                        LENGTHUNIT["metre",1,
                            ID["EPSG",9001]]],
                    AXIS["(N)",north,
                        ORDER[2],
                        LENGTHUNIT["metre",1,
                            ID["EPSG",9001]]]]],
        TARGETCRS[
            GEOGCRS["WGS 84",
                DATUM["World Geodetic System 1984",
                    ELLIPSOID["WGS 84",6378137,298.257223563,
                        LENGTHUNIT["metre",1]]],
                PRIMEM["Greenwich",0,
                    ANGLEUNIT["degree",0.0174532925199433]],
                CS[ellipsoidal,2],
                    AXIS["latitude",north,
                        ORDER[1],
                        ANGLEUNIT["degree",0.0174532925199433]],
                    AXIS["longitude",east,
                        ORDER[2],
                        ANGLEUNIT["degree",0.0174532925199433]],
                ID["EPSG",4326]]],
        ABRIDGEDTRANSFORMATION["Transformation from unknown to WGS84",
            METHOD["Position Vector transformation (geog2D domain)",
                ID["EPSG",9606]],
            PARAMETER["X-axis translation",-106.869,
                ID["EPSG",8605]],
            PARAMETER["Y-axis translation",52.2978,
                ID["EPSG",8606]],
            PARAMETER["Z-axis translation",-103.724,
                ID["EPSG",8607]],
            PARAMETER["X-axis rotation",0.3366,
                ID["EPSG",8608]],
            PARAMETER["Y-axis rotation",-0.457,
                ID["EPSG",8609]],
            PARAMETER["Z-axis rotation",1.8422,
                ID["EPSG",8610]],
            PARAMETER["Scale difference",0.9999987253,
                ID["EPSG",8611]]]]
    

IMO the last steps (support for BOUNDCRS with +towgs84) feel contradictory to the deprecation of +towgs84, mentioned in several places in PROJ documentation. It think it's a relevant question to ask where this is heading to in the future.

For the R side I understand that "The relevant problem is that GDAL degrades input file CRS when exporting to Proj4" (Roger's comment) so this would mean that adding +towgs84 in R won't help anyway (cf. the rgdal warnings about discarded elements) - being unsure about this route.

@rsbivand
Copy link
Member

Recent sp/rgdal provide get_source_if_boundcrs=TRUE and in rgdal/R/Class-CRSx.R:

    wkt2 <- try(showSRID(uprojargs, format="WKT2", multiline="YES",
        prefer_proj=prefer_proj), silent=TRUE)
    if (!inherits(wkt2, "try-error")) {
      if (get_source_if_boundcrs) {
        if (length(grep("^BOUNDCRS", wkt2)) > 0L) {
          wkt2a <- try(.Call("get_source_crs", wkt2, PACKAGE="rgdal"),
            silent=TRUE)
          if (!inherits(wkt2a, "try-error")) wkt2 <- wkt2a
        }
      }
      if (get_enforce_xy()) {
        wkt2a <- try(.Call("proj_vis_order", wkt2, PACKAGE="rgdal"),
          silent=TRUE)
          if (!inherits(wkt2a, "try-error")) {
            wkt2 <- wkt2a
          }
      }
      res[[3]] <- wkt2
    }
  }

where the C code is:

SEXP get_source_crs(SEXP source) {

    PJ_CONTEXT *ctx = proj_context_create();
    PJ *source_crs, *target_crs;
    SEXP res;

    source_crs = proj_create(ctx, CHAR(STRING_ELT(source, 0)));

    if (source_crs == NULL) {
        proj_context_destroy(ctx);
        error("source crs not created");
    }

    target_crs = proj_get_source_crs(ctx, source_crs);

    if (target_crs == NULL) {
        proj_context_destroy(ctx);
        error("target crs not created");
    }

    PROTECT(res = NEW_CHARACTER(1));
    SET_STRING_ELT(res, 0,
#if PROJ_VERSION_MAJOR == 6  && PROJ_VERSION_MINOR < 3
        COPY_TO_USER_STRING(proj_as_wkt(ctx, target_crs, PJ_WKT2_2018, NULL))
#else
        COPY_TO_USER_STRING(proj_as_wkt(ctx, target_crs, PJ_WKT2_2019, NULL))
#endif
    );
    UNPROTECT(1);
    proj_destroy(target_crs);
    proj_destroy(source_crs);
    proj_context_destroy(ctx);

    return(res);


}

BOUNDCRS are found most often when a file contains a +towgs84= tag on reading, so unless the argument is set FALSE, the remnants of this tag will be removed. As you say, it is no longer the role of the CRS definition to provide a specific pipeline to GEOGCRS "WGS84".

@florisvdh
Copy link
Member Author

florisvdh commented Jun 18, 2021

Thanks for pointing at the argument. Since for below example the argument does not have an effect, it seems as if the +towgs84 element was already discarded at an earlier stage (by GDAL itself?). Correct?

Reprex
library(sp)
rgdal::rgdal_extSoftVersion()
#>           GDAL GDAL_with_GEOS           PROJ             sp 
#>        "3.2.1"         "TRUE"        "7.2.1"        "1.4-5"

# old proj4string (with towgs84) for EPSG:31370:
##################################################
crs1 <- CRS("+proj=lcc +lat_1=51.16666723333333 +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +towgs84=-106.869,52.2978,-103.724,0.3366,-0.457,1.8422,-1.2747 +units=m +no_defs +type=crs", 
            get_source_if_boundcrs = TRUE)
#> Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj
#> = prefer_proj): Discarded datum Unknown based on International 1909 (Hayford)
#> ellipsoid in Proj4 definition

# same, but get_source_if_boundcrs = FALSE
##################################################
crs2 <- CRS("+proj=lcc +lat_1=51.16666723333333 +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +towgs84=-106.869,52.2978,-103.724,0.3366,-0.457,1.8422,-1.2747 +units=m +no_defs +type=crs", 
            get_source_if_boundcrs = FALSE)
#> Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj
#> = prefer_proj): Discarded datum Unknown based on International 1909 (Hayford)
#> ellipsoid in Proj4 definition

# current proj-string (without towgs84) for EPSG:31370:
##################################################
crs3 <- CRS("+proj=lcc +lat_1=51.16666723333333 +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +units=m +no_defs +type=crs", 
            get_source_if_boundcrs = TRUE)
#> Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj
#> = prefer_proj): Discarded datum Unknown based on International 1909 (Hayford)
#> ellipsoid in Proj4 definition

# same, but get_source_if_boundcrs = FALSE
##################################################
crs4 <- CRS("+proj=lcc +lat_1=51.16666723333333 +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +units=m +no_defs +type=crs", 
            get_source_if_boundcrs = FALSE)
#> Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj
#> = prefer_proj): Discarded datum Unknown based on International 1909 (Hayford)
#> ellipsoid in Proj4 definition

all.equal(crs1, crs2)
#> [1] TRUE
all.equal(crs1, crs3)
#> [1] TRUE
all.equal(crs1, crs4)
#> [1] TRUE

cat(wkt(crs1), sep = "\n")
#> PROJCRS["unknown",
#>     BASEGEOGCRS["unknown",
#>         DATUM["Unknown based on International 1909 (Hayford) ellipsoid",
#>             ELLIPSOID["International 1909 (Hayford)",6378388,297,
#>                 LENGTHUNIT["metre",1,
#>                     ID["EPSG",9001]]]],
#>         PRIMEM["Greenwich",0,
#>             ANGLEUNIT["degree",0.0174532925199433],
#>             ID["EPSG",8901]]],
#>     CONVERSION["unknown",
#>         METHOD["Lambert Conic Conformal (2SP)",
#>             ID["EPSG",9802]],
#>         PARAMETER["Latitude of false origin",90,
#>             ANGLEUNIT["degree",0.0174532925199433],
#>             ID["EPSG",8821]],
#>         PARAMETER["Longitude of false origin",4.36748666666667,
#>             ANGLEUNIT["degree",0.0174532925199433],
#>             ID["EPSG",8822]],
#>         PARAMETER["Latitude of 1st standard parallel",51.1666672333333,
#>             ANGLEUNIT["degree",0.0174532925199433],
#>             ID["EPSG",8823]],
#>         PARAMETER["Latitude of 2nd standard parallel",49.8333339,
#>             ANGLEUNIT["degree",0.0174532925199433],
#>             ID["EPSG",8824]],
#>         PARAMETER["Easting at false origin",150000.013,
#>             LENGTHUNIT["metre",1],
#>             ID["EPSG",8826]],
#>         PARAMETER["Northing at false origin",5400088.438,
#>             LENGTHUNIT["metre",1],
#>             ID["EPSG",8827]]],
#>     CS[Cartesian,2],
#>         AXIS["(E)",east,
#>             ORDER[1],
#>             LENGTHUNIT["metre",1,
#>                 ID["EPSG",9001]]],
#>         AXIS["(N)",north,
#>             ORDER[2],
#>             LENGTHUNIT["metre",1,
#>                 ID["EPSG",9001]]]]

Created on 2021-06-18 by the reprex package (v2.0.0)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.1.0 (2021-05-18)
#>  os       Linux Mint 20               
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language nl_BE:nl                    
#>  collate  nl_BE.UTF-8                 
#>  ctype    nl_BE.UTF-8                 
#>  tz       Europe/Brussels             
#>  date     2021-06-18                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date       lib source        
#>  cli           2.5.0   2021-04-26 [1] CRAN (R 4.1.0)
#>  digest        0.6.27  2020-10-24 [1] CRAN (R 4.1.0)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.1.0)
#>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.1.0)
#>  glue          1.4.2   2020-08-27 [1] CRAN (R 4.1.0)
#>  highr         0.9     2021-04-16 [1] CRAN (R 4.1.0)
#>  htmltools     0.5.1.1 2021-01-22 [1] CRAN (R 4.1.0)
#>  knitr         1.33    2021-04-24 [1] CRAN (R 4.1.0)
#>  lattice       0.20-44 2021-05-02 [4] CRAN (R 4.1.0)
#>  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.1.0)
#>  reprex        2.0.0   2021-04-02 [1] CRAN (R 4.1.0)
#>  rgdal         1.5-23  2021-02-03 [1] CRAN (R 4.1.0)
#>  rlang         0.4.11  2021-04-30 [1] CRAN (R 4.1.0)
#>  rmarkdown     2.8     2021-05-07 [1] CRAN (R 4.1.0)
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.1.0)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.1.0)
#>  sp          * 1.4-5   2021-01-10 [1] CRAN (R 4.1.0)
#>  stringi       1.6.2   2021-05-17 [1] CRAN (R 4.1.0)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.1.0)
#>  withr         2.4.2   2021-04-18 [1] CRAN (R 4.1.0)
#>  xfun          0.23    2021-05-15 [1] CRAN (R 4.1.0)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.1.0)
#> 
#> [1] /home/floris/lib/R/library
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library

@rsbivand
Copy link
Member

rsbivand commented Jun 18, 2021

Interesting. I cannot check without re-installing sp and rgdal from CRAN, as the development versions (''sp** my github fork, rgdal R-forge) do different things, including caching CRS objects (some packages make repeated calls to CRS() with the same input). When nothing else is running, I'll try. rgdal::compare_CRS() gives more output than all.equal().

@rsbivand
Copy link
Member

rsbivand commented Jun 23, 2021

I don't see any difference after re-installing sp and rgdal from CRAN. The +towgs84= tag is removed by default in rgdal::checkCRSArgs_ng() irrespective of get_source_if_boundcrs= when rgdal::get_prefer_proj() - the default. This uses PROJ to instantiate the object internally.

> crs2 <- CRS("+proj=lcc +lat_1=51.16666723333333 +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +towgs84=-106.869,52.2978,-103.724,0.3366,-0.457,1.8422,-1.2747 +units=m +no_defs +type=crs", get_source_if_boundcrs = FALSE)
Warning message:
In showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj = prefer_proj) :
  Discarded datum Unknown based on International 1924 (Hayford 1909, 1910) ellipsoid in Proj4 definition
> crs2
CRS arguments:
 +proj=lcc +lat_0=90 +lon_0=4.36748666666667 +lat_1=51.1666672333333
+lat_2=49.8333339 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +units=m
+no_defs 
> rgdal::set_prefer_proj(FALSE)
> crs2_OSR <- CRS("+proj=lcc +lat_1=51.16666723333333 +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +towgs84=-106.869,52.2978,-103.724,0.3366,-0.457,1.8422,-1.2747 +units=m +no_defs +type=crs", get_source_if_boundcrs = FALSE)
Warning message:
In showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj = prefer_proj) :
  Discarded datum Unknown based on International 1924 (Hayford 1909, 1910) ellipsoid in Proj4 definition,
 but +towgs84= values preserved
> crs2_OSR
CRS arguments:
 +proj=lcc +lat_0=90 +lon_0=4.36748666666667 +lat_1=51.1666672333333
+lat_2=49.8333339 +x_0=150000.013 +y_0=5400088.438 +ellps=intl
+towgs84=-106.869,52.2978,-103.724,0.3366,-0.457,1.8422,-1.2747
+units=m +no_defs 
> cat(wkt(crs2), sep = "\n")
PROJCRS["unknown",
    BASEGEOGCRS["unknown",
        DATUM["Unknown based on International 1924 (Hayford 1909, 1910) ellipsoid",
            ELLIPSOID["International 1924 (Hayford 1909, 1910)",6378388,297,
                LENGTHUNIT["metre",1,
                    ID["EPSG",9001]]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8901]]],
    CONVERSION["unknown",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",90,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",4.36748666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",51.1666672333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",49.8333339,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",150000.013,
            LENGTHUNIT["metre",1],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",5400088.438,
            LENGTHUNIT["metre",1],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
> cat(wkt(crs2_OSR), sep = "\n")
BOUNDCRS[
    SOURCECRS[
        PROJCRS["unknown",
            BASEGEOGCRS["unknown",
                DATUM["Unknown based on International 1924 (Hayford 1909, 1910) ellipsoid",
                    ELLIPSOID["International 1924 (Hayford 1909, 1910)",6378388,297,
                        LENGTHUNIT["metre",1,
                            ID["EPSG",9001]]]],
                PRIMEM["Greenwich",0,
                    ANGLEUNIT["degree",0.0174532925199433],
                    ID["EPSG",8901]]],
            CONVERSION["unknown",
                METHOD["Lambert Conic Conformal (2SP)",
                    ID["EPSG",9802]],
                PARAMETER["Latitude of false origin",90,
                    ANGLEUNIT["degree",0.0174532925199433],
                    ID["EPSG",8821]],
                PARAMETER["Longitude of false origin",4.36748666666667,
                    ANGLEUNIT["degree",0.0174532925199433],
                    ID["EPSG",8822]],
                PARAMETER["Latitude of 1st standard parallel",51.1666672333333,
                    ANGLEUNIT["degree",0.0174532925199433],
                    ID["EPSG",8823]],
                PARAMETER["Latitude of 2nd standard parallel",49.8333339,
                    ANGLEUNIT["degree",0.0174532925199433],
                    ID["EPSG",8824]],
                PARAMETER["Easting at false origin",150000.013,
                    LENGTHUNIT["metre",1],
                    ID["EPSG",8826]],
                PARAMETER["Northing at false origin",5400088.438,
                    LENGTHUNIT["metre",1],
                    ID["EPSG",8827]]],
            CS[Cartesian,2],
                AXIS["(E)",east,
                    ORDER[1],
                    LENGTHUNIT["metre",1,
                        ID["EPSG",9001]]],
                AXIS["(N)",north,
                    ORDER[2],
                    LENGTHUNIT["metre",1,
                        ID["EPSG",9001]]]]],
    TARGETCRS[
        GEOGCRS["WGS 84",
            DATUM["World Geodetic System 1984",
                ELLIPSOID["WGS 84",6378137,298.257223563,
                    LENGTHUNIT["metre",1]]],
            PRIMEM["Greenwich",0,
                ANGLEUNIT["degree",0.0174532925199433]],
            CS[ellipsoidal,2],
                AXIS["geodetic latitude (Lat)",north,
                    ORDER[1],
                    ANGLEUNIT["degree",0.0174532925199433]],
                AXIS["geodetic longitude (Lon)",east,
                    ORDER[2],
                    ANGLEUNIT["degree",0.0174532925199433]],
            ID["EPSG",4326]]],
    ABRIDGEDTRANSFORMATION["Transformation from unknown to WGS84",
        METHOD["Position Vector transformation (geog2D domain)",
            ID["EPSG",9606]],
        PARAMETER["X-axis translation",-106.869,
            ID["EPSG",8605]],
        PARAMETER["Y-axis translation",52.2978,
            ID["EPSG",8606]],
        PARAMETER["Z-axis translation",-103.724,
            ID["EPSG",8607]],
        PARAMETER["X-axis rotation",0.3366,
            ID["EPSG",8608]],
        PARAMETER["Y-axis rotation",-0.457,
            ID["EPSG",8609]],
        PARAMETER["Z-axis rotation",1.8422,
            ID["EPSG",8610]],
        PARAMETER["Scale difference",0.9999987253,
            ID["EPSG",8611]]]]
> crs2_OSR_not_bound <- CRS("+proj=lcc +lat_1=51.16666723333333 +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +towgs84=-106.869,52.2978,-103.724,0.3366,-0.457,1.8422,-1.2747 +units=m +no_defs +type=crs")
Warning message:
In showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj = prefer_proj) :
  Discarded datum Unknown based on International 1924 (Hayford 1909, 1910) ellipsoid in Proj4 definition,
 but +towgs84= values preserved
> crs2_OSR_not_bound
CRS arguments:
 +proj=lcc +lat_0=90 +lon_0=4.36748666666667 +lat_1=51.1666672333333
+lat_2=49.8333339 +x_0=150000.013 +y_0=5400088.438 +ellps=intl
+towgs84=-106.869,52.2978,-103.724,0.3366,-0.457,1.8422,-1.2747
+units=m +no_defs 
> cat(wkt(crs2_OSR_not_bound), sep = "\n")
PROJCRS["unknown",
    BASEGEOGCRS["unknown",
        DATUM["Unknown based on International 1924 (Hayford 1909, 1910) ellipsoid",
            ELLIPSOID["International 1924 (Hayford 1909, 1910)",6378388,297,
                LENGTHUNIT["metre",1,
                    ID["EPSG",9001]]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8901]]],
    CONVERSION["unknown",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",90,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",4.36748666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",51.1666672333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",49.8333339,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",150000.013,
            LENGTHUNIT["metre",1],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",5400088.438,
            LENGTHUNIT["metre",1],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]

The chronology by reconstruction was possibly that I switched all instantiation via rgdal::SRS_string() from OSR to PROJ (get/set_prefer_proj()) after adding get_source_if_boundcrs=, but feeling that PROJ is "closer to the metal" than OSR in GDAL. It was obvious that PROJ was more predictable - if something like +towgs84= is "gone", it is "gone". Similarly, the PROJ transformation pipelines seem closer to the metal than GDAL, so rgdal uses them, not GDALs. It looks also as though the development versions of sp (my fork) and rgdal remove the BOUNDCRS anyway, which could be a logic branching error in the get_source_if_boundcrs = FALSE and prefer_proj FALSE branch:

> library(sp)
> crs2 <- CRS("+proj=lcc +lat_1=51.16666723333333 +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +towgs84=-106.869,52.2978,-103.724,0.3366,-0.457,1.8422,-1.2747 +units=m +no_defs +type=crs", get_source_if_boundcrs = FALSE)
Warning message:
In showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj = prefer_proj) :
  Discarded datum Unknown based on International 1924 (Hayford 1909, 1910) ellipsoid in Proj4 definition
> crs2
Coordinate Reference System:
Deprecated Proj.4 representation:
 +proj=lcc +lat_0=90 +lon_0=4.36748666666667 +lat_1=51.1666672333333
+lat_2=49.8333339 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +units=m
+no_defs 
WKT2 2019 representation:
PROJCRS["unknown",
    BASEGEOGCRS["unknown",
        DATUM["Unknown based on International 1924 (Hayford 1909, 1910) ellipsoid",
            ELLIPSOID["International 1924 (Hayford 1909, 1910)",6378388,297,
                LENGTHUNIT["metre",1,
                    ID["EPSG",9001]]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8901]]],
    CONVERSION["unknown",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",90,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",4.36748666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",51.1666672333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",49.8333339,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",150000.013,
            LENGTHUNIT["metre",1],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",5400088.438,
            LENGTHUNIT["metre",1],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]] 
> rgdal::set_prefer_proj(FALSE)
> crs2_OSR <- CRS("+proj=lcc +lat_1=51.16666723333333 +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +towgs84=-106.869,52.2978,-103.724,0.3366,-0.457,1.8422,-1.2747 +units=m +no_defs +type=crs", get_source_if_boundcrs = FALSE)
> crs2_OSR
Coordinate Reference System:
Deprecated Proj.4 representation:
 +proj=lcc +lat_0=90 +lon_0=4.36748666666667 +lat_1=51.1666672333333
+lat_2=49.8333339 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +units=m
+no_defs 
WKT2 2019 representation:
PROJCRS["unknown",
    BASEGEOGCRS["unknown",
        DATUM["Unknown based on International 1924 (Hayford 1909, 1910) ellipsoid",
            ELLIPSOID["International 1924 (Hayford 1909, 1910)",6378388,297,
                LENGTHUNIT["metre",1,
                    ID["EPSG",9001]]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8901]]],
    CONVERSION["unknown",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",90,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",4.36748666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",51.1666672333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",49.8333339,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",150000.013,
            LENGTHUNIT["metre",1],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",5400088.438,
            LENGTHUNIT["metre",1],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]] 
> crs2_OSR_no_bnds <- CRS("+proj=lcc +lat_1=51.16666723333333 +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +towgs84=-106.869,52.2978,-103.724,0.3366,-0.457,1.8422,-1.2747 +units=m +no_defs +type=crs", get_source_if_boundcrs = TRUE)
> crs2_OSR_no_bnds
Coordinate Reference System:
Deprecated Proj.4 representation:
 +proj=lcc +lat_0=90 +lon_0=4.36748666666667 +lat_1=51.1666672333333
+lat_2=49.8333339 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +units=m
+no_defs 
WKT2 2019 representation:
PROJCRS["unknown",
    BASEGEOGCRS["unknown",
        DATUM["Unknown based on International 1924 (Hayford 1909, 1910) ellipsoid",
            ELLIPSOID["International 1924 (Hayford 1909, 1910)",6378388,297,
                LENGTHUNIT["metre",1,
                    ID["EPSG",9001]]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8901]]],
    CONVERSION["unknown",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",90,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",4.36748666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",51.1666672333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",49.8333339,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",150000.013,
            LENGTHUNIT["metre",1],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",5400088.438,
            LENGTHUNIT["metre",1],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]] 

@rsbivand
Copy link
Member

In development sp, identical results are caused not by a logic error, but by (new) CRS look-up caching. Repeated look-ups retrieve output CRS from a local cache if the same input is received.

@florisvdh
Copy link
Member Author

Thanks a lot Roger for demonstrating the effect of rgdal::set_prefer_proj(), and explaining on how rgdal itself decides on removing +towgs84 + changes in dev versions - learning a lot today 🙂!!! This indeed explains why first there was no effect of get_source_if_boundcrs (default being rgdal::set_prefer_proj(TRUE)).

In a context of changing use of proj-strings, IMHO I also think it is natural to consider PROJ as reference. Currently it still supports +towgs84 (BOUNDCRS) but at the same time +towgs84 is considered deprecated (I intend to ask the PROJ developers (shortly) where that is going to). So, dropping +towgs84 for rgdal::set_prefer_proj(TRUE) could feel counterintuitive at the moment, since it's still "not gone". But understood.

Some things to note:

  • with sp / rgdal, including or dropping +type=crs does not seem to have an effect: a CRS is always returned, albeit a 'defective' one with regard to the unknown datum (due to current PROJ). On the other hand PROJ's projinfo -o WKT2:2019 will return a coordinate operation WKT instead of a CRS WKT, if +type=crs is missing (which is in line with the PROJ documentation: proj-strings are seen as coordinate operations, not CRSs). This seems a little confusing.
    • See also the absence of +type=crs in the 'Deprecated Proj.4 string' when printing a CRS object with the sp dev version, while projinfo -o proj seems to always return a proj-string with +type=crs for an inputted CRS object (such as EPSG:27700).
  • in the example with the dev versions: as you note crs2_OSR is (prints) a PROJCRS, instead of a BOUNDCRS as with sp and rgdal from CRAN. I may not understand your explanation too well on that (:thinking:), but I guess you mean it should have been BOUNDCRS so it can be ignored here. (?)

@florisvdh
Copy link
Member Author

florisvdh commented Jul 2, 2021

Sharing here some outcomes from the PROJ mailing list, earlier this week, on the subject of using PROJ strings to specify a CRS (answers from Even Rouault and others). PROJ strings for CRSs will get continued (though reduced) support, for backward compatibility reasons.

After clarifying some aspects, following advice is considered appropriate (see also https://lists.osgeo.org/pipermail/proj/2021-June/010301.html):

  1. Using a PROJ string to specify a CRS is discouraged and currently not much documented by PROJ (PROJ strings are used for coordinate operations instead). At least for registered CRSs, always use AUTHORITY:CODE or WKT2, not PROJ strings.
  2. Also custom CRSs are ideally specified as WKT2, or more conveniently as (currently not yet official) PROJJSON, which follows the WKT2 structure.
  3. When specifying a custom CRS with a PROJ string, the WGS84 ensemble datum (datum:EPSG::6326) will automatically be adopted if no +ellps is specified (and with +ellps, the datum is considered as 'unknown'). If these conditions are fine for your usecase, then there's no problem.
  4. When specifying a custom CRS that doesn't use the WGS84, NAD27 or NAD83 datum (which can be specified with +datum), it is better to use WKT2 or PROJJSON and properly refer the geodetic datum. A typically less accurate alternative is defining as a BOUNDCRS by specifying +towgs84 transformation parameters.

I intend to use that as basis for a few updates in our tutorial on CRSs in R.

Further notes:

  • For backward compatibility reasons, PROJ strings with +towgs84 or +datum, although deprecated, will remain supported by PROJ and GDAL.
    • sf::st_crs() supports this by default. IIUC currently sp::CRS() needs rgdal::set_prefer_proj(FALSE) and its argument get_source_if_boundcrs = FALSE to support +towgs84.
  • As said before, PROJ needs +type=crs for a PROJ string to be interpreted as CRS instead of a coordinate operation. GDAL doesn't have this need, always interpreting as CRS, which is for backward compatibility with GDAL < 3.
  • projinfo -o proj <AUTHORITY>:<CODE> will only return +towgs84 in the PROJ string when there's only one transformation known from the datum of the CRS to WGS84.

florisvdh added a commit to inbo/tutorials that referenced this issue Jul 9, 2021
Adding some insights on the current and expected state of
PROJ string support, since several people
stick to PROJ strings for convenience despite this approach
is discouraged.

See r-spatial/discuss#43 (comment)
and https://lists.osgeo.org/pipermail/proj/2021-June/010301.html .
@florisvdh
Copy link
Member Author

Just updated the discussed CRS tutorial (link). I essentially added a bit more info on current state of PROJ string support for CRSs (partly in footnotes), and referred to PROJJSON, meanwhile trying to stay in line with PROJ documentation and trying not to complicate things too much. Basic advice for specifying EPSG CRSs of course remained the same.

@rsbivand
Copy link
Member

rsbivand commented Jul 9, 2021

Maybe replace proj4string(cities2) <- crs_wgs84 by slot(cities2, "proj4string") <- crs_wgs84.

Also consider introducing something like:

EPSG_db <- rgdal::make_EPSG()
EPSG_db[grep("Belg", EPSG_db$note), 1:2]

to help in finding EPSG codes. Another optional extra is:

data(GridsDatums, package="rgdal")
GridsDatums[grep("Belg", GridsDatums$country),]

The latter needs updating to link directly to https://www.asprs.org/wp-content/uploads/2018/03/10-16-GD-Belgium.pdf for the 2016 link from https://www.asprs.org/asprs-publications/grids-and-datums. These short articles give useful context on how countries have adapted and modernised their standards.

@florisvdh
Copy link
Member Author

Maybe replace proj4string(cities2) <- crs_wgs84 by slot(cities2, "proj4string") <- crs_wgs84.

OK, thank you Roger.

Also consider introducing something like:

EPSG_db <- rgdal::make_EPSG()
EPSG_db[grep("Belg", EPSG_db$note), 1:2]

to help in finding EPSG codes.

Yes, that's a good fit indeed!

The ASPRS Grids and Datums articles are an interesting source, thanks for reminding. IIRC for Belgium it focuses less (resp. not) on the two currently used CRSs: 1) "Belge 1972 / Belgian Lambert 72" (EPSG:31370, based on the "Reseau National Belge 1972" datum) - still used most often - and 2) "ETRS89 / Belgian Lambert 2008" (EPSG:3812, based on the ETRS89 datum) which has accuracy advantages in using GPS data and for conversion to other ETRS89-based CRSs within Europe (EPSG or custom CRSs). In time, it can be expected that the second will have more priority; it is also required by the EU INSPIRE directive. (Comments by the Flemish Geographic Information Agency are at https://www.geopunt.be/voor-experts/lambert-2008 - only available in Dutch).

I hope to delve deeper into this later on (consulting official Belgian sources, including related topics such as the Flemish Positioning Service FLEPOS and the unfortunate ESRI:102499 mismatch), and perhaps start a separate page about it, for Belgian R users and other interested people 😉.

florisvdh added a commit to inbo/tutorials that referenced this issue Jan 31, 2022
Inspired on:

- a (long-standing) suggestion of Roger Bivand in r-spatial/discuss#43 (comment)
- since rgdal will retire, the hint at the end of an email by Michael Sumner:
https://stat.ethz.ch/pipermail/r-sig-geo/2021-September/028778.html

The rgdal approach has been included as a markdown comment.
florisvdh added a commit to inbo/tutorials that referenced this issue Jan 31, 2022
This applies the request by Roger Bivand in r-spatial/discuss#43 (comment).
Apologies for this late update.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants