It seems that self-hosting a federated service like Lemmy just for yourself would only increase traffic in the network, rather than help balance load between servers.
As far as I understand it, the way federation is supposed to work is that each server caches all the content locally and then serves it to the people registered on that server. That way the servers only have to transmit a minimal amount of data between themselves, which lowers the overhead for small servers, so a small server doesn't get overwhelmed by a ton of people requesting from it. Now if, instead, everyone self-hosts their own server, you go right back to having everyone sending a ton of requests to small servers, thereby overwhelming them. It seems that it's really only beneficial to the network if you have, say, hundreds of medium-sized servers instead of thousands of very small ones. While there is the resilience factor, the overhead on the network would be rather overwhelming.
Perhaps one way of fixing this is to use something like IPFS to spread the requests more evenly, but I am nowhere near knowledgeable enough about that to say anything definitive.
ActivityPub is not a mesh network: servers don't all communicate with each other. The server that owns a community (e.g. fediverse@lemmy.world) pushes out JSON data to every instance that has subscribers.
Small servers won't talk directly to each other unless they're subscribed to communities on each other, so having a lot of small servers doesn't add load on one another. It only adds load on the larger servers that host the more active communities.
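To put rough numbers on that, here is a minimal sketch assuming the host sends one delivery per subscribing instance for each activity (post, comment, vote). The function name and all the figures are made up for illustration; they are not measurements from any real instance.

```python
def total_deliveries(activities: int, subscriber_instances: int) -> int:
    """Deliveries the hosting server makes, assuming one POST per
    subscribing instance per activity (an assumption, not Lemmy's code)."""
    return activities * subscriber_instances


# Hypothetical community: 10,000 subscribers, 5,000 activities per day.
users = 10_000
activities_per_day = 5_000

# Case A: subscribers spread across 100 medium instances of 100 users each.
medium = total_deliveries(activities_per_day, users // 100)

# Case B: every subscriber runs a single-user instance.
single = total_deliveries(activities_per_day, users)

print(f"medium instances: {medium:,} deliveries/day")
print(f"single-user instances: {single:,} deliveries/day")
print(f"ratio: {single // medium}x")
```

Under these assumptions the extra load lands entirely on the server hosting the community, not on the small instances themselves.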
And even then, the JSON requests are going to have a lower impact than a user actively browsing the site, though probably only marginally and maybe not in all cases.
One big difference between the JSON requests and a user calling the site directly is that your instance pulls all the data all the time, whereas a user only pulls the data they actually use.
Just to be pedantic, it’s not pull, it’s push: the data is POSTed from the server that hosts the community.
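As a rough sketch of what "push" means here: the hosting server builds a JSON activity and POSTs it to the subscriber's inbox. The inbox URL, the object URL and the `build_delivery` helper below are hypothetical, and the sketch leaves out the HTTP signatures that real servers generally require. It only constructs the request and never sends it.

```python
import json
import urllib.request


def build_delivery(inbox_url: str, activity: dict) -> urllib.request.Request:
    """Build (but do not send) the POST that delivers an activity to an inbox."""
    body = json.dumps(activity).encode("utf-8")
    return urllib.request.Request(
        inbox_url,
        data=body,
        headers={"Content-Type": "application/activity+json"},
        method="POST",
    )


# Illustrative activity: the community announces a post to its followers.
activity = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Announce",
    "actor": "https://lemmy.world/c/fediverse",
    "object": "https://lemmy.world/post/1",  # hypothetical post URL
}

# Hypothetical inbox on a small subscribing instance.
req = build_delivery("https://small.example/inbox", activity)
print(req.get_method(), req.full_url)
```

The subscribing instance never has to ask for anything; it just accepts whatever arrives in its inbox.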
Right now, loading a page makes a bunch of API queries to pull all the related data for the posts, votes, sidebar info, and so on. On top of that, the API is very untuned and sends way more data than the WebUI or a client needs to actually generate a page. Hence my 'it's less efficient' comment, though this is certainly something that can be tweaked to improve performance between the backend and frontend.
I will, however, admit that this is only true if someone is actually reading the content they’re subscribed to. The ‘subscribe to everything’ scripts turn this math on its head because now you are using resources to gather data you don’t care about.
Actually, the instance you federate with will push the data, and it does so at its own leisure (configurable by the instance admin). The data itself has already been created in the queue (minimal database load), so it definitely has a lower impact than actual users browsing the site.
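A minimal sketch of that idea, assuming a simple in-memory queue drained in fixed-size batches. The `OutboundQueue` class and its `batch_size` setting are hypothetical and are not Lemmy's actual implementation or configuration.

```python
from collections import deque
from typing import Callable


class OutboundQueue:
    """Hypothetical outbound queue: payloads are serialized once when
    enqueued, then delivered in batches whenever the sender chooses."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size  # stand-in for an admin-tunable setting
        self.pending: deque = deque()

    def enqueue(self, inbox_url: str, payload: bytes) -> None:
        # The payload is already built, so delivery needs no further DB work.
        self.pending.append((inbox_url, payload))

    def drain_once(self, send: Callable[[str, bytes], None]) -> int:
        """Deliver up to batch_size queued items; return how many were sent."""
        sent = 0
        while self.pending and sent < self.batch_size:
            inbox_url, payload = self.pending.popleft()
            send(inbox_url, payload)
            sent += 1
        return sent


queue = OutboundQueue(batch_size=2)
for host in ("a.example", "b.example", "c.example"):
    queue.enqueue(f"https://{host}/inbox", b"{}")

delivered = []
count = queue.drain_once(lambda url, body: delivered.append(url))
print(count, "sent;", len(queue.pending), "still queued")
```

Because the sender decides when to call `drain_once`, a busy server can slow deliveries down instead of being hit with requests on someone else's schedule.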