cross-posted from: https://lemmy.ninja/post/30492

Summary

We started a Lemmy instance on June 13 during the Reddit blackout. While we were configuring the site, we accumulated a few thousand bot accounts, leading some sites to defederate with us. Read on to see how we cleaned up the mess.

Introduction

Like many of you, we came to Lemmy during the Great Reddit Blackout. @MrEUser started Lemmy.ninja on the 13th, and the rest of us on the site got to work populating some initial rules and content, learning how Lemmy worked, and finding workarounds for bugs and issues in the software. Unfortunately for us, one of the challenges in getting the site up turned out to be getting email validation to work. So, assuming we were small and beneath notice, we opened our registration for a few days until we could figure out whether the problems we were experiencing were configuration-related or software bugs.

In that brief time, malicious actors discovered us and began creating bot accounts on the site by the hundreds. Of course we had no idea, since Lemmy provides no user management features. We couldn’t see them, and the bots didn’t participate in any of our local content.

Discovering the Bots

Within a couple of days, we discovered some third-party tools that gave us the only insights we had into our user base. Lemmy Explorer and The Federation were showing us that a huge number of users had registered. It took a while, but we eventually tracked down a post that described how to output a list of users from our Lemmy database. After some investigation, we were able to see which users were actually registered at lemmy.ninja. Sure enough, there were thousands, just like the third-party tools told us.

Meanwhile…

While we were figuring this out, others in Lemmy had noticed a coordinated bot attack, and some were rightly taking steps to cordon off the sites with bots as they began to interact with federated content. Unfortunately for us, this news never made it to us because our site was still young, and young Lemmy servers don’t automatically download all federated content right away. (In fact, despite daily efforts to connect lemmy.ninja to as many communities as possible, I didn’t even learn about the lemm.ee mitigation efforts until today.)

We know now that the bots began to interact with other Mastodon and Lemmy instances at some point, because we learned (again, today) that we had been blocked by a few of them. (Again, this required third-party tools to even discover.) At the time, we were completely unaware of the attack, that we had been blocked, or that the bots were doing anything at all.

Cleaning Up

The moment we learned that the bots were in our database, we set out to eliminate them. The first step, of course, was to enable a captcha and activate email validation so that no new bots could sign up. [Note: The captcha feature was eliminated in Lemmy 0.18.0.] Then we had to delete the bot users.

Before deleting anything, we made a backup. Always make a backup! After that, we asked the database to output all the users so we could manually review the data. After logging into the database Docker container, we ran the following query:


SELECT
  p.name,
  p.display_name,
  a.person_id,
  a.email,
  a.email_verified,
  a.accepted_application
FROM
  local_user a
  JOIN person p ON a.person_id = p.id;

That showed us that yes, every user after #8 or so was indeed a bot.
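On an instance with more real users, scrolling through the full list may not be practical. As a rough sketch (not part of the original cleanup), the same two tables and columns from the listing query can be grouped to show how many accounts fall into each verification state and which person_id range they cover. The output column names here are made up for illustration.

```sql
-- Summarize local accounts by verification state.
-- Uses only the tables and columns from the listing query;
-- the aliases (accounts, lowest_person_id, highest_person_id) are illustrative.
SELECT
  a.email_verified,
  a.accepted_application,
  count(*)         AS accounts,
  min(a.person_id) AS lowest_person_id,
  max(a.person_id) AS highest_person_id
FROM
  local_user a
  JOIN person p ON a.person_id = p.id
GROUP BY
  a.email_verified,
  a.accepted_application
ORDER BY
  accounts DESC;
```

A large group of unverified accounts packed into one contiguous person_id range would be consistent with a bot wave, though it is only a hint and still deserves a manual look.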

Next, we composed a SQL statement to wipe all the bots.


BEGIN;
CREATE TEMP TABLE temp_ids AS
SELECT person_id FROM local_user WHERE person_id > 85347;
DELETE FROM local_user WHERE person_id IN (SELECT person_id FROM temp_ids);
DELETE FROM person WHERE id IN (SELECT person_id FROM temp_ids);
DROP TABLE temp_ids;
COMMIT;

And to finalize the change:


UPDATE site_aggregates SET users = (SELECT count(*) FROM local_user) WHERE site_id = 1;

If you read the code, you’ll see that we deleted records whose person_id was > 85347. That’s the approach that worked for us. But you could just as easily delete all users who haven’t passed email verification, for example. If that’s the approach you want to use, try this SQL statement:


BEGIN;
CREATE TEMP TABLE temp_ids AS
SELECT person_id FROM local_user WHERE email_verified = 'f';
DELETE FROM local_user WHERE person_id IN (SELECT person_id FROM temp_ids);
DELETE FROM person WHERE id IN (SELECT person_id FROM temp_ids);
DROP TABLE temp_ids;
COMMIT;

And to finalize the change:


UPDATE site_aggregates SET users = (SELECT count(*) FROM local_user) WHERE site_id = 1;
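Either deletion can be rehearsed before it is made permanent. The sketch below reuses the email-verification variant unchanged, adds two count(*) queries for illustration, and ends with ROLLBACK instead of COMMIT so that nothing is actually removed. It assumes the same schema as the statements above.

```sql
-- Dry run: perform the deletion inside a transaction, inspect the counts,
-- then roll everything back so the database is left untouched.
BEGIN;
CREATE TEMP TABLE temp_ids AS
SELECT person_id FROM local_user WHERE email_verified = 'f';

-- How many accounts would be removed?
SELECT count(*) AS accounts_to_delete FROM temp_ids;

DELETE FROM local_user WHERE person_id IN (SELECT person_id FROM temp_ids);
DELETE FROM person WHERE id IN (SELECT person_id FROM temp_ids);

-- How many local accounts would be left?
SELECT count(*) AS remaining_local_users FROM local_user;

-- Undo everything, including the temp table created above.
ROLLBACK;
```

If the remaining count matches the number of real users you expect, the same statements can be rerun with COMMIT. If it is lower than expected, check whether any real accounts (including admins) are still unverified before going further.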

Even more aggressive mods could put these commands into a nightly cron job, wiping accounts every day if they don’t finish their registration process. We chose not to do that (yet). Our user count has remained stable with email verification on.
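For anyone who does go the nightly route, a grace period would give genuine new users time to click their verification link. The following is only a sketch under assumptions: it assumes the person table has a published timestamp recording when the account was created, the three-day window is arbitrary, and stale_ids is a made-up table name. Check the column against your own schema before relying on it.

```sql
-- Remove unverified local accounts older than a grace period.
-- ASSUMPTIONS: person.published exists and holds the creation time;
-- the 3-day interval and the stale_ids name are illustrative.
BEGIN;
CREATE TEMP TABLE stale_ids AS
SELECT a.person_id
FROM
  local_user a
  JOIN person p ON a.person_id = p.id
WHERE
  a.email_verified = 'f'
  AND p.published < now() - interval '3 days';

DELETE FROM local_user WHERE person_id IN (SELECT person_id FROM stale_ids);
DELETE FROM person WHERE id IN (SELECT person_id FROM stale_ids);
DROP TABLE stale_ids;
COMMIT;

-- Keep the public user count in sync, as in the manual cleanup.
UPDATE site_aggregates SET users = (SELECT count(*) FROM local_user) WHERE site_id = 1;
```

As with the manual cleanup, a backup beforehand is the safety net, since an unattended job gives nobody a chance to review the list first.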

After that, the bots were gone. Third-party tools reflected the change in about 12 hours. We did some testing to make sure we hadn’t destroyed the site, but found that everything worked flawlessly.

Wrapping Up

We chose to write this up for the rest of the new Lemmy administrators out there who may unwittingly be hosts of bots. Hopefully having all of the details in one place will help speed their discovery and elimination. Feel free to ask questions, but understand that we aren’t experts. Hopefully other, more knowledgeable people can respond to your questions in the comments here.

  • shoe
    13 points · 1 year ago

    Thanks for the writeup! No plans of my own to host a server at the moment, but there’s some good takeaways here.

    also saw lemmy.ninja has a boomer shooters community so ofc had to subscribe

    • MrEUser (OP)
      6 points · 1 year ago

      Boomer Shooter is my community, you have made my night!

  • DrWeevilJammer
    12 points · 1 year ago

    Thanks for the thorough writeup! It’s worth noting that the captcha will be back in the next version, but I’m not sure exactly when it will be released.

    They removed it during the switch from web sockets (which apparently took a lot of time and effort to keep updated), but someone submitted a pull request for a non-web socket version of the captcha code, which was accepted.

    So hopefully we’ll all be able to update to the new version soon.

    • MrEUser (OP)
      7 points · 1 year ago

      Thanks for the good news. I’ll let rotarykeyboard know. He’ll be ecstatic.

  • @bdonvr@thelemmy.club
    11 points · 1 year ago

    As an FYI - all you need to do is delete user entries in the person table.

    It will delete the local_user entry and also update the site_aggregates count automatically.

    • @imaqtpie@sh.itjust.works
      1 point · 1 year ago

      For others looking for info about fixing the problem, @Saigonauticon@voltage.vn was also able to delete tens of thousands of bots from his server. He gave a more succinct explanation that may be helpful to others with this problem:

      I’ve wiped the bots out of my instance. In case you meet someone else with the same issue, here’s what I did:

      1. Log in to my server over ssh
      2. Run ‘docker ps’ to get the name of the container running postgresql
      3. docker exec -it <container name> /bin/bash
      4. psql -h localhost -p 5432 -U lemmy -d lemmy
      5. The bot users in my case were all users where id>=3.
      6. So I ran DELETE FROM local_user WHERE id >= 3;
      7. Done

      Of course this is not practical for large instances with a lot of actual users. They might need a user ID range or list to delete. Maybe I ought to write a script to automate this or something.

  • @broken_chatbot@vlemmy.net
    6 points · 1 year ago

    Thank you for contributing to what will become “best practices” for any new instance when the surge of new users from Reddit (as well as spambots) subsides and the situation becomes less chaotic. I hope other instances which blocked yours over the spambots reconsider their decision.

  • Ulu-Mulu-no-die
    5 points · 1 year ago

    Fantastic write-up! Thanks for sharing so hopefully other people can avoid the same problem.

  • @highspire@sopuli.xyz
    3 points · 1 year ago

    I want to say that I saw a post at some point about one of the popular instances (was it beehaw or lemmy.world?) not requiring users to answer a challenge question during signup. Does it seem like email verification is sufficient to mitigate bots or does a challenge question help too, do you think? Sopuli required both. Unverified users can apparently post?

    • MrEUser (OP)
      1 point · 1 year ago

      I’m not positive what combination prevents bots from being dumped into databases. There may even be an exploit that makes all of these measures irrelevant. I don’t know. For right now, I am vigilant about checking our database for sudden large changes.