In case anyone was wondering: yes, we were down for about 2 hours. I apologize for the inconvenience.

The upgrade from 0.17.4 to 0.18.0 went wrong. I spent some time debugging but eventually gave up and restored a snapshot (taken on Saturday Jun 24, 2023 @ 11:00 UTC).

We’ll likely stick to 0.17.4 till I can figure out a safe path to upgrade to a bigger (and up-to-date) instance and carry over all the user data. Any help/advice welcome. Hopefully this doesn’t occur again!

    • lemmyrs@lemmyrs.orgOPM
      1 year ago

      Well, the lemmy container kept running into:

      lemmy  | 2023-06-24T23:28:52.716586Z  INFO lemmy_server::code_migrations: Running apub_columns_2021_02_02
      lemmy  | 2023-06-24T23:28:52.760510Z  INFO lemmy_server::code_migrations: Running instance_actor_2021_09_29
      lemmy  | 2023-06-24T23:28:52.763723Z  INFO lemmy_server::code_migrations: Running regenerate_public_keys_2022_07_05
      lemmy  | 2023-06-24T23:28:52.801409Z  INFO lemmy_server::code_migrations: Running initialize_local_site_2022_10_10
      lemmy  | 2023-06-24T23:28:52.803303Z  INFO lemmy_server::code_migrations: No Local Site found, creating it.
      lemmy  | thread 'main' panicked at 'couldnt create local user: DatabaseError(UniqueViolation, "duplicate key value violates unique constraint \"local_user_person_id_key\"")', crates/db_schema/src/impls/local_user.rs:157:8
      

      despite the fact that:

      lemmy=# select id, site_id from local_site;
       id | site_id
      ----+---------
        1 |       1
      (1 row)
      

      So even though a local_site row already exists, the migration went ahead with first-time site setup anyway and hit a unique constraint violation on local_user (local_user_person_id_key). I further narrowed it down to this piece of code:

      ///
      /// If a site already exists, the DB migration should generate a local_site row.
      /// This will only be run for brand new sites.
      async fn initialize_local_site_2022_10_10(
        pool: &DbPool,
        settings: &Settings,
      ) -> Result<(), LemmyError> {
        info!("Running initialize_local_site_2022_10_10");
      
        // Check to see if local_site exists
        if LocalSite::read(pool).await.is_ok() {
          return Ok(());
        }
        info!("No Local Site found, creating it.");
      

      At this point I gave up because I couldn’t really tell why LocalSite::read(pool).await.is_ok() was, well…not ok.
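
One thing worth noting about that check: `is_ok()` is false for any error, not only for "row not found". So if the read failed for some other reason, the code would still fall through to the creation path. I can't tell from the logs above which error the read actually returned, so the following is only a minimal sketch of the difference. `ReadError`, `LocalSiteRow`, `Action`, and both `decide_*` functions are hypothetical stand-ins and not Lemmy's real types or API.

```rust
/// Hypothetical stand-in for the error a database read can return.
enum ReadError {
    /// The query ran fine but matched no row.
    NotFound,
    /// Anything else, e.g. a connection or row-decoding failure.
    Other(String),
}

/// Hypothetical stand-in for a successfully loaded local_site row.
struct LocalSiteRow;

/// What the migration would do next.
#[derive(Debug, PartialEq)]
enum Action {
    Skip,
    Create,
    Abort(String),
}

/// Same shape as the quoted check: every error is treated as "no site yet".
fn decide_with_is_ok(read: &Result<LocalSiteRow, ReadError>) -> Action {
    if read.is_ok() {
        Action::Skip
    } else {
        Action::Create
    }
}

/// Stricter variant: only a genuine "not found" leads to creation,
/// any other error stops the migration and surfaces the message.
fn decide_strict(read: &Result<LocalSiteRow, ReadError>) -> Action {
    match read {
        Ok(_) => Action::Skip,
        Err(ReadError::NotFound) => Action::Create,
        Err(ReadError::Other(msg)) => Action::Abort(msg.clone()),
    }
}
```

With the first variant, a read that fails with `Other` is indistinguishable from a brand new site, which would match the symptom above (a row exists, yet the "No Local Site found" branch runs). Logging the error before falling through would at least show which case it was.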