I made this in response to the gripes about some of the silent failures with federation. It might help users choose other servers, and it might help admins troubleshoot. Open to comments and criticisms!

  • maegul (he/they)@lemmy.ml
    ↑ 22 · 1 year ago

    Oooohhh … Nice!! I’m repeatedly impressed at how many hackers are going ahead and just getting some stuff done here!!

    Questions/thoughts:

    1. What instance is used as a reference for the delay? One you self-host (lemmy.management)?
    2. Sooo … what’s the deal with lemmy.ml … that seems to have gone beyond lag and is basically falling over … seems like the devs have neglected their own instance’s health?
    3. What’s that Redash? Is it a plotly thing or some other product that just uses their graphing library? How have you found it?
    • hawkwind@lemmy.management (OP)
      ↑ 10 · 1 year ago

      What instance is used as a reference for the delay? One you self-host (lemmy.management)?

      Yes, lemmy.management. It is purposely subscribed to as many communities as possible (via automation). This doesn’t correct for network lag, but the idea is to capture the “federation” lag. There’s no code I’m aware of that allows admins to prioritize outbound federation traffic, though I could be wrong.
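
      A minimal sketch of how per-instance federation lag could be computed at a reference instance. It assumes (this is not stated above) that lag is the gap between an activity's `published` timestamp and the moment the reference instance receives it, summarized as a median per sending instance. The function names and the `(instance, published, received)` tuple shape are hypothetical, and the sample values are made up.

```python
from datetime import datetime, timezone
from statistics import median

def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; naive values are assumed to be UTC."""
    # Python < 3.11 rejects a trailing "Z", so normalise it first.
    ts = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return ts if ts.tzinfo is not None else ts.replace(tzinfo=timezone.utc)

def federation_lag_seconds(published: str, received: datetime) -> float:
    """Seconds between an activity's `published` time and its arrival here."""
    return (received - parse_ts(published)).total_seconds()

def lag_by_instance(samples):
    """samples: iterable of (instance, published_iso, received_datetime)."""
    per_instance = {}
    for instance, published, received in samples:
        per_instance.setdefault(instance, []).append(
            federation_lag_seconds(published, received))
    # The median is less sensitive than the mean to a single stuck activity.
    return {inst: median(lags) for inst, lags in per_instance.items()}

# Made-up samples for illustration, not real measurements.
received = datetime(2023, 6, 20, 12, 0, 30, tzinfo=timezone.utc)
samples = [
    ("lemmy.ml", "2023-06-20T12:00:00Z", received),
    ("lemmy.ml", "2023-06-20T11:59:00+00:00", received),
    ("lemm.ee", "2023-06-20T12:00:25Z", received),
]
print(lag_by_instance(samples))  # {'lemmy.ml': 60.0, 'lemm.ee': 5.0}
```

      This only measures the gap as seen by one receiver, so, as noted, it mixes network lag in with federation lag.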

      Sooo … what’s the deal with lemmy.ml … that seems to have gone beyond lag and is basically falling over … seems like the devs have neglected their own instance’s health?

      I just collect the data.

      What’s that Redash? Is it a plotly thing or some other product that just uses their graphing library? How have you found it?

      https://redash.io I don’t remember how I found it. Probably an “awesome” list on github.

  • tenth@lemmy.world
    ↑ 10 · 1 year ago

    Great idea. I was trying to figure out whether it was lemmy.world struggling with new users or a bug in the Memmy app that caused random errors.

    Is it possible to have the lag metrics by instance in a table format? It’s so hard to view your site on mobile.

    • hawkwind@lemmy.management (OP)
      ↑ 5 · 1 year ago

      I didn’t even load it on mobile. I will check it out tonight and maybe just create a separate “mobile friendly” dashboard.

      • Wailzy@lemmy.world
        ↑ 8 · 1 year ago

        Not the person you’re replying to, but I didn’t find it awful on mobile. The zoom by dragging worked well, as did the double tap to view the whole dataset.

        For a quick browse I wasn’t frustrated at all and found the information I wanted in a short amount of time!

  • UncleStewart@lemmy.world
    ↑ 10 · 1 year ago

    On mobile, touching the “Federation Lag-o-meter (now - 1h)” statistics makes the page hard to scroll. Other than that, the page is gold.

    • hawkwind@lemmy.management (OP)
      ↑ 7 · 1 year ago (edited)

      Fixed! The regex was not getting content from < 0.18.0 instances. Thanks!

      EDIT: I was wrong. It was something else in feddit.de’s messages that I THOUGHT was a version thing, but it must be a localization thing: a string in the JSON was breaking some regex. Regardless… fixed.
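
      The general alternative to regex-scraping inbound messages is to parse the payload with a real JSON parser and read fields by key, so quotes, braces, or non-ASCII characters in localized content cannot break extraction. A minimal sketch; the field names (`type`, `actor`, `object.published`) are assumed from the ActivityStreams vocabulary, the payload is hypothetical, and this is not the dashboard's actual parser.

```python
import json
from typing import Optional

def extract_fields(raw: str) -> Optional[dict]:
    """Read a few fields from an inbound activity by key, not by regex."""
    try:
        activity = json.loads(raw)
    except json.JSONDecodeError:
        return None  # malformed payload: skip it instead of failing the batch
    if not isinstance(activity, dict):
        return None
    obj = activity.get("object")
    if not isinstance(obj, dict):
        obj = {}  # `object` may be a bare URL string; no nested fields then
    return {
        "type": activity.get("type"),
        "actor": activity.get("actor"),
        "published": obj.get("published"),
    }

# Hypothetical payload: localized content with quotes and braces no longer matters.
raw = json.dumps({
    "type": "Create",
    "actor": "https://feddit.de/u/example",
    "object": {"content": 'Grüße, "Zitat" {Klammern}', "published": "2023-06-20T12:00:00Z"},
}, ensure_ascii=False)
print(extract_fields(raw))
```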

  • Ranger@programming.dev
    ↑ 9 · 1 year ago

    The graph should exclude the outlier: it skews the scale for every other instance and keeps the smaller numbers from showing up.

    Or it should move to a log scale so that everything displays legibly.
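
    Both suggestions fit in a few lines. A minimal sketch, assuming lags are held in a hypothetical `{instance: seconds}` dict: Tukey's IQR upper fence (via `statistics.quantiles` with `method="inclusive"`, Python 3.8+) to split off outliers, and a `log10(1 + x)` transform for a log-scale axis. The 1.5 multiplier is the conventional default, not something the dashboard is known to use, and the numbers are made up.

```python
import math
from statistics import quantiles

def split_outliers(lags: dict, k: float = 1.5):
    """Split {instance: lag_seconds} into (typical, outliers) via Tukey's upper fence.

    Only the upper fence is used, since the problem is one very tall bar.
    Needs at least two instances.
    """
    q1, _, q3 = quantiles(lags.values(), n=4, method="inclusive")
    upper = q3 + k * (q3 - q1)
    typical = {i: v for i, v in lags.items() if v <= upper}
    outliers = {i: v for i, v in lags.items() if v > upper}
    return typical, outliers

def log_scale(lags: dict) -> dict:
    """log10(1 + x) keeps zero-lag instances at 0 instead of -infinity."""
    return {i: math.log10(1 + v) for i, v in lags.items()}

# Made-up numbers for illustration.
lags = {"a.example": 2, "b.example": 3, "c.example": 4, "d.example": 5, "e.example": 3600}
typical, outliers = split_outliers(lags)
print(outliers)  # {'e.example': 3600}
print(log_scale({"x": 0, "y": 9, "z": 99}))  # {'x': 0.0, 'y': 1.0, 'z': 2.0}
```

    Excluding outliers hides the most interesting instance, so showing it separately (or using the log scale) may be the better trade-off.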

  • aleph@lemm.ee
    ↑ 9 · 1 year ago

    When I saw the bar looking like the Burj Khalifa, I assumed it was .world instead of .ml. Interesting.

    Props to Ruud@lemmy.world for dealing admirably with the Rexxit hug of death.

    • nsfw_alt_2023@lemmynsfw.com
      ↑ 10 · 1 year ago (edited)

      I suspect JSON parsing is a huge overhead in the fediverse. I work on a SaaS product that needs to do all its internal processing in under 10 ms, and serializing/deserializing ends up being a sizable chunk of server time. I saw a 40% reduction in runtime using simdjson for deserializing, and a Rust crate for it exists, but I haven’t had time to look over the Lemmy code.

      Can anyone with an overloaded instance get on their command line and gather a decent flamegraph so the performance folks can aim optimizations in the right direction?

      https://github.com/brendangregg/FlameGraph

      • aleph@lemm.ee
        ↑ 1 · 1 year ago

        Yep, it seems completely different to when I last looked.

        It seems everyone gets a turn at the top.

  • thegiddystitcher@lemm.ee
    ↑ 7 · 1 year ago

    It’ll be interesting to see how this changes through the day! I know .world tends to slow down later in the day when the US contingent is getting going.

    (also, yay lemm.ee)

  • possum@lemmy.ml
    ↑ 7 · 1 year ago

    This is awesome! Hopefully it’ll help spread the load among instances. Definitely going to use this to see which instance to move to (and which to avoid).

    • hawkwind@lemmy.management (OP)
      ↑ 7 · 1 year ago

      Keep in mind this is a one hour snapshot. I am working on a historical rating as well to give a better indication of overall long term stability.
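
      One possible shape for a longer-term rating, purely as an illustration and not the planned historical rating: keep each instance's per-hour median lag and report the median over the window plus the share of hours under a threshold. The function name and the 60-second threshold are hypothetical.

```python
from statistics import median
from typing import Optional, Sequence, Tuple

def stability_summary(hourly_lags: Sequence[float],
                      threshold_s: float = 60.0) -> Optional[Tuple[float, float]]:
    """Summarize one instance's per-hour lag snapshots over a longer window.

    Returns (median lag in seconds, share of hours at or under threshold_s),
    or None if there is no data yet. The threshold is an arbitrary example.
    """
    if not hourly_lags:
        return None
    share_ok = sum(1 for lag in hourly_lags if lag <= threshold_s) / len(hourly_lags)
    return median(hourly_lags), share_ok

# Made-up hourly values for illustration.
print(stability_summary([10, 20, 30, 600]))  # (25.0, 0.75)
```

      The share-of-good-hours figure separates an instance that is usually fine but has one bad spike from one that lags all day, which a single one-hour snapshot cannot do.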

  • adrian@kbin.social
    ↑ 6 · 1 year ago

    This looks great. Is there any chance this could be extended to include kbin as well, since those instances federate with Lemmy, too?

    • hawkwind@lemmy.management (OP)
      ↑ 1 · 1 year ago

      Posts from kbin DO show up in the details table, but you would need to know the IP they are coming from. They don’t include their instance host name in the header, which is why it’s not in the table and the instance is null for some IPs. Also, I don’t currently scrape and subscribe to kbin magazines like I do for Lemmy communities, so the traffic will be low, probably just a few posts from kbin.social.
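
      A minimal sketch of one possible workaround for the missing host name, assuming the inbound payload is an ActivityPub activity whose `actor` field is the sender's URL (or an embedded object with an `id`): take the host from that URL instead of from a request header. Whether kbin payloads always carry `actor` in this form is an assumption, and the function name and payload are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlparse

def instance_from_activity(raw: str) -> Optional[str]:
    """Best-effort sender host taken from an activity's `actor` field."""
    try:
        activity = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(activity, dict):
        return None
    actor = activity.get("actor")
    # `actor` may be a plain URL or an embedded object carrying an `id`.
    if isinstance(actor, dict):
        actor = actor.get("id")
    if not isinstance(actor, str):
        return None
    return urlparse(actor).hostname

# Hypothetical payload shape.
print(instance_from_activity('{"type": "Create", "actor": "https://kbin.social/u/example"}'))  # kbin.social
```

      Since `actor` is self-reported, the host it yields should only be trusted after the request's HTTP signature has been verified.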

  • FakeJake@fr3diver.se
    ↑ 5 · 1 year ago

    This looks really good.

    As an admin of a small kbin instance, I’ll be keeping an eye on updates from you as this will be very handy!

  • cornflour@lemmy.ca
    ↑ 2 · 1 year ago

    This is really cool! Would it be possible to grab this data as JSON, CSV, or some other equivalent format? I’m working on my own Lemmy client, and I think it would be very helpful to be able to display it.
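
    If the underlying rows were exposed, serializing them to JSON or CSV takes only the Python standard library. A minimal sketch with hypothetical column names (`instance`, `lag_s`) and made-up values; it does not assume any particular export feature of the dashboard or of Redash.

```python
import csv
import io
import json

def to_csv(rows, fields):
    """Serialize a list of dicts to CSV text (header row first)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def to_json(rows):
    """Serialize a list of dicts to pretty-printed JSON text."""
    return json.dumps(rows, indent=2)

# Hypothetical rows; not real measurements.
rows = [{"instance": "lemm.ee", "lag_s": 5}, {"instance": "lemmy.ml", "lag_s": 60}]
print(to_csv(rows, ["instance", "lag_s"]))
print(to_json(rows))
```

    Note that `csv.DictWriter` ends rows with `\r\n` by default, which most CSV consumers expect.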