SEO: resolve potential duplicate content #19

marekcierny · 2016-01-04T21:35:22Z

Several examples of potential duplicate content exist:

1. switching between language versions: https://anatom.cz/en/ - https://practiceanatomy.com/
2. user registration: anatom.cz/view/LE/?sessionid= - anatom.cz/view/LE/
3. view of a particular image: https://anatom.cz/view/04/?context=svaly-krkusvg - https://anatom.cz/view/04/

Duplicate content should be a) avoided if possible, b) resolved by a 301 redirect, or c) resolved by <link rel="canonical" (https://support.google.com/webmasters/answer/139066).

slaweet · 2016-01-05T05:58:47Z

1. has been resolved by a 301 redirect for some time, but Google still hasn't reindexed it. More recently I tried disallowing the /en/ and /cs/ URLs in robots.txt.
2. has been resolved for some time as well; it's just still hanging in the Google index.
3. is a TODO.

marekcierny · 2016-01-05T07:43:05Z

I would argue against disallowing the /en/ and /cs/ URLs in robots.txt, as any link to a URL that robots cannot access leads to a loss of page rank (the disallow rule might also prevent the crawler from seeing the redirect).
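
To illustrate the concern, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content below is an assumption that mirrors the /en/ and /cs/ rules discussed in this thread, not the site's actual file. A crawler that honors these rules never requests the disallowed URL, so it never receives the 301 response either.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt rules, mirroring the /en/ and /cs/ disallow discussed above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /en/
Disallow: /cs/
"""


def crawler_can_fetch(url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a generic crawler ("*") may request the given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", url)


# The redirecting URL is blocked, so a compliant crawler cannot see the redirect.
print(crawler_can_fetch("https://anatom.cz/en/"))
# Other pages remain reachable.
print(crawler_can_fetch("https://anatom.cz/view/LE/"))
```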

Ultimately, I think c), the "canonical" link, should be added to every page to resolve any potential duplicate content we might miss (e.g. tracking campaigns and traffic sources).

marekcierny · 2016-01-09T20:42:00Z

I wrote a simple PHP function that rewrites any url into a "canonical url".
canonical.TXT

If "echo get_canonical_meta($url)" can be added to every page, it can help us explain our duplicate content to search engines.

papousek · 2016-01-09T21:37:43Z

Unfortunately, the application is written in Python, so we cannot include your script in every page view directly. On the other hand, I assume we are able to rewrite it in Python (@slaweet?)

slaweet · 2016-01-10T09:55:19Z

I added canonical URLs (f9a7454).
I'm just stripping the query string (everything after ?). Because of that, I changed /overview/?tab=location to /overview/tab/location.
I didn't implement the part that changes the domain in the canonical URL, because it would never be executed: the 301 redirect runs first, and by then we are already on the correct domain.
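
For readers following along, the query-string stripping described above can be sketched roughly as follows. This is not the code from f9a7454; the function names `canonical_url` and `canonical_link_tag` are hypothetical, and only the standard library is used.

```python
import html
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Drop everything after '?' (query string and fragment); keep the rest."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_link_tag(url: str) -> str:
    """Render the <link rel="canonical"> element for a requested URL."""
    href = html.escape(canonical_url(url), quote=True)
    return '<link rel="canonical" href="%s">' % href


print(canonical_link_tag("https://anatom.cz/view/04/?context=svaly-krkusvg"))
```

With this approach any page state that should be indexed separately has to live in the path, which is why /overview/?tab=location had to become /overview/tab/location.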

slaweet · 2016-01-10T10:06:57Z

As for disallowing /en/ and /cs/, I removed it from robots.txt, but I don't see why it should influence the page rank of any page other than the ones with /en/ and /cs/, which we don't want in search results anyway. And IMO we don't want Google to see the redirect, but rather to reach the alternative language version directly through <link rel="alternate" ...

marekcierny · 2016-01-10T11:09:16Z

OK.
As for disallowing /en/ and /cs/ in robots: http://webmasters.stackexchange.com/questions/54240/is-it-safe-to-block-redirected-but-still-linked-urls-with-robots-txt (In general, my understanding is that disallowing robots from any URL we link to within our site is not good.)

The canonical form of the URL is also related to <link rel="alternate" sitemap: only canonical forms of URLs should be linked as another language version.
For example, on https://anatom.cz/practice//, the canonical URL is https://anatom.cz/practice/, and the alternate language versions should also end with only one /.
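
A rough sketch of the idea, assuming the path has already been reduced to its canonical form. The language-to-domain mapping is inferred from the two domains mentioned in this thread, and the function name `alternate_link_tags` is hypothetical.

```python
from html import escape
from typing import List

# Assumption: one domain per language, taken from the two domains in this thread.
LANGUAGE_DOMAINS = {"cs": "anatom.cz", "en": "practiceanatomy.com"}


def alternate_link_tags(canonical_path: str) -> List[str]:
    """Build one <link rel="alternate"> per language from a canonical path."""
    tags = []
    for lang, domain in LANGUAGE_DOMAINS.items():
        href = escape("https://%s%s" % (domain, canonical_path), quote=True)
        tags.append('<link rel="alternate" hreflang="%s" href="%s">' % (lang, href))
    return tags


# Every language version points at the canonical path, with a single trailing "/".
for tag in alternate_link_tags("/practice/"):
    print(tag)
```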


slaweet · 2016-01-10T11:46:35Z

I've updated <link rel="alternate" (1d33303), even though I don't think it matters what is on the non-canonical pages, as Google is only going to look at (index) the canonical ones.
I've also added a '//' -> '/' replacement to the canonical URL.
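
A sketch of such a replacement, for illustration only (not the actual commit). It assumes the replacement should apply to the path alone, since a plain string replace on the full URL would also damage the "https://" prefix. The name `collapse_slashes` is hypothetical.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url: str) -> str:
    """Replace runs of '/' in the path with a single '/', leaving 'https://' intact."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(collapse_slashes("https://anatom.cz/practice//"))
```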

marekcierny · 2016-01-12T07:15:44Z

Thank you, Víťo.
Do you use www.google.com/webmasters/tools/ to check for SEO warnings/errors? (I think it's a great tool, especially as we want to add more languages and content in the future.)
I've just noticed that when logged in, view-source:https://anatom.cz/ shows the canonical address "https://anatom.cz/overview/". But when logged out, it's correct.

marekcierny · 2016-01-13T10:52:34Z

I might be too picky, but other potential duplicate content is:
4. URLs with and without "/" at the end (e.g. https://anatom.cz/practice/A [chapter selected with a tick] and https://anatom.cz/practice/A/ [chapter selected by clicking an arrow])
5. selection of chapters for practice (e.g. https://anatom.cz/practice/09/LE and https://anatom.cz/practice/LE/09 [the second URL is accessible from anatom.cz/view/LE/ via "vybrat podkapitolu" (select subchapter)])

slaweet · 2016-01-13T14:18:09Z

view-source:https://anatom.cz/ for logged in users actually redirects to view-source:https://anatom.cz/overview (notice address bar). Hopefully, search engines cannot log in :-)

I use www.google.com/webmasters/tools/ every now and then, and I haven't noticed any SEO warnings or errors there. I've linked Webmaster Tools with GA, so it probably displays the errors in GA as well.

Ad 4 and 5: I see the problem, I'll have to think about how to solve it technically.
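
One possible direction, purely as a sketch and not a decided solution: canonicalize the path so that it always ends with "/" and the chapter segments under /practice/ appear in a fixed (here: sorted) order. Both the function name `normalize_practice_path` and the choice of sorting are assumptions made for illustration.

```python
def normalize_practice_path(path: str) -> str:
    """Return a path with a trailing '/' and, under /practice/, sorted segments."""
    segments = [segment for segment in path.split("/") if segment]
    if segments[:1] == ["practice"]:
        # Assumption: the order of chapter segments carries no meaning.
        segments = ["practice"] + sorted(segments[1:])
    return "/" + "/".join(segments) + "/" if segments else "/"


# Both variants from item 4 and both variants from item 5 map to one form each.
print(normalize_practice_path("/practice/A"))
print(normalize_practice_path("/practice/LE/09"))
```

The non-canonical variants could then either redirect to the normalized path or simply declare it in their canonical link.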

marekcierny · 2016-01-15T10:07:05Z

Although there is no link to such a page, I'm not sure whether this could be a problem for search engines or for users/brand/security:
https://anatom.cz/overview/V%C3%ADt%C3%A1%20v%C3%A1s%20blbe%C4%8Dek
https://anatom.cz/view/02/V%C3%ADt%C3%A1%20v%C3%A1s%20blbe%C4%8Dek
(a random URL parameter is recognized as canonical, and the random text is displayed in the heading)

slaweet · 2016-01-15T10:38:13Z

Re #19 (comment):
Good catch.
That URL is actually a link to view knowledge of a user, e.g.
https://anatom.cz/overview/slaweet
https://anatom.cz/overview/cierny.m

The problem is that we don't check whether the given string is a valid username. If it isn't, the page should return an error.
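
A framework-independent sketch of the missing check. The set `KNOWN_USERNAMES` is a hypothetical stand-in for a real user lookup (the application would presumably query its database), and `overview_status` is an illustrative name.

```python
from http import HTTPStatus

# Hypothetical stand-in for the user table; the two names come from the examples above.
KNOWN_USERNAMES = {"slaweet", "cierny.m"}


def overview_status(username: str) -> HTTPStatus:
    """Return 200 only for existing users; anything else is a 404."""
    if username in KNOWN_USERNAMES:
        return HTTPStatus.OK
    # Unknown strings get an error page instead of being echoed into the heading.
    return HTTPStatus.NOT_FOUND


print(overview_status("slaweet"))
print(overview_status("Vítá vás blbeček"))
```

Returning an error status also keeps arbitrary text out of the canonical URL, because error pages are not indexed.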

marekcierny · 2016-02-02T16:33:49Z

Víťo, when I suggested making a separate URL for /overview/?tab=location so that the crawler could see our main content tree, I didn't know that Google can understand AJAX.
Now I think it wasn't a good idea from the start, and we might be better off without it. I am sorry for making it complicated.

slaweet · 2016-02-03T07:39:40Z

Marku, I don't think Google's AJAX crawling scheme is applicable here. Anything we want to appear in search results (like /overview/?tab=location) has to be on a separate URL.

slaweet · 2016-02-03T07:44:04Z

And FYI, your example with "Vítá vás blbeček" has been indexed by Google, because Google crawled our GitHub :-)
FYI no. 2: the problem with SEO in GA was just a reporting issue caused by the http -> https migration in December. Our impressions moved to the https version of anatom.cz, and those were not listed.

marekcierny · 2016-06-07T12:20:18Z

First, I am concerned that we have very similar content (and identical ) when a user views an image under different chapters/body parts (e.g. practiceanatomy.com/view/UE/image/casti-lidskeho-telasvg and practiceanatomy.com/view/LE/image/casti-lidskeho-telasvg). Can we change the URL to practiceanatomy.com/view/LE/#image/casti-lidskeho-telasvg or practiceanatomy.com/view/LE/#image/5 ?

Second, I've found a simple SEO guide, and there are several things we do not do yet:

the description tag is the same across the whole site. Can we make it unique for the pages we would like users to land on?
a useful (custom) 404 page (suggested text here)
an HTML sitemap and an XML sitemap (Create a sitemap.xml #17).

slaweet added a commit that referenced this issue Jan 10, 2016

Add canonical urls. issue #19

f9a7454

slaweet added a commit that referenced this issue Jan 10, 2016

Remove /cs/ and /en/ from robots.txt. Issue #19

61ae0ac

slaweet added a commit that referenced this issue Jan 10, 2016

Change <link alternative to use canonical paths

1d33303

Issue #19

marekcierny added the FI label Jan 13, 2016
