I was looking earlier for a collection of posts about Proton Mail and the whole controversy with its CEO. When I opened one of the results, the Lemmy instance the search engine suggested was lemmy.zip, but the community and the poster were both from lemmy.world, which made me ask myself a bunch of questions. Reference link
Note: I used duckduckgo
Here are some questions I have:
- How does the search engine decide which instance to link you to, given that it could in theory show every instance for the same post?
- Could you get a results page where every result is the same post, just on different instances?
- Do you think that could deter new people from finding out about Lemmy through search results?
- How can an instance make itself more visible in the search results (for exposure)?
- I did not get any results from Lemmy clients such as vger.app; the only results were direct instances. Will this always be the case?
I remember learning about search engines a while back, but I don’t know how relevant that information is any more: crawlers, the idea that the more a website is linked from other websites the higher it ranks in the search results, and the whole robots.txt thing.
I know that if I wanted to search for something specific on Lemmy I could just use its own search function, but what about people who ask general questions that happen to be answered in a Lemmy post? I wanted to know how exposed we are, or will be, to people who don’t yet know about Lemmy.
Search engines don’t treat Lemmy specially. They index the pages just like any other site. If it’s discoverable through the crawling process, it’ll be indexed.
Instances that don’t want to be found in search engines are not shown. Instance admins can configure their robots.txt by adding lemmy-search. All other instances can theoretically be found. I think their ranking depends on the laws of SEO (Search Engine Optimization). This probably means that a post on myownlemmy1337 that is federated with lemmy.world will be found as a post on lemmy.world.
So, if Lemmy were very famous, I guess it’s possible to get pages and pages of the same result from different instances. However, search engines usually have a way to exclude “similar” results.
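To make the robots.txt point concrete, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the lemmy.example URL, and the reading of lemmy-search as a crawler user-agent name are all assumptions for illustration; nothing here is taken from a real instance's configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt an instance admin might serve. Treating
# "lemmy-search" as a crawler user-agent is an assumption made here.
ROBOTS_TXT = """\
User-agent: lemmy-search
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    post = "https://lemmy.example/post/1"  # placeholder URL
    print(can_crawl(ROBOTS_TXT, "lemmy-search", post))
    print(can_crawl(ROBOTS_TXT, "DuckDuckBot", post))
```

With these rules the named crawler is blocked while every other crawler is still allowed in. Keep in mind that robots.txt is only a request, and it is up to each crawler to honor it.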
For Voyager it may be possible that they do not want to be found; I don’t know about this, though. You could add site:vger.app to your search query to test this.
I think most search engines are not optimized for this. I’m sure it’s changing but it might take some time.
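If you want to repeat the site: test for several domains, the query can be built in a few lines. This is only a sketch: the helper name site_search_url is made up for illustration, and it assumes DuckDuckGo's usual ?q= query parameter.

```python
from urllib.parse import urlencode


def site_search_url(site: str, terms: str) -> str:
    """Build a DuckDuckGo search URL restricted to one domain via site:."""
    query = f"site:{site} {terms}"
    return "https://duckduckgo.com/?" + urlencode({"q": query})


if __name__ == "__main__":
    # Compare a client domain with an instance domain for the same terms.
    for domain in ("vger.app", "lemmy.world"):
        print(site_search_url(domain, "proton mail"))
```

Opening each printed URL in a browser shows whether the search engine has indexed anything under that domain for those terms.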
Google historically penalizes duplicate content and selects one source as canonical, usually whichever domain is the most authoritative. When it comes to lemmy, whichever instance hosts the community should probably be the canonical source.
Every post has a <link rel="canonical" href="https://lemmy.instance/whatever"> tag on it which links to the version of the post on the author’s instance.
TIL, thanks for the insight. This is as it should be and Google can deal with it no problem.
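For anyone who wants to check the canonical tag on a post themselves, here is a rough sketch using Python's standard html.parser. The sample page is a stand-in built around the placeholder URL from the comment above, and CanonicalFinder and find_canonical are names made up for this example, not part of Lemmy.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated values, so split it.
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.canonical = attr.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in html, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Stand-in page using the placeholder URL from the comment above.
PAGE = (
    '<html><head><link rel="canonical" '
    'href="https://lemmy.instance/whatever"></head><body></body></html>'
)

if __name__ == "__main__":
    print(find_canonical(PAGE))
```

Running the same function on the HTML of one post fetched from two different instances should, if the comment above is right, return the same URL both times.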