The news mod team has asked to no longer be a part of the project until we have a composite tool that polls multiple sources for a more balanced view.

It will take a few hours to take effect, but FOR NOW there won’t be a bot giving reviews of the source.

The goal was simple: make it easier to show biased sources. This was to give you and the mods a better view of what we were looking at.

The mod team is in agreement: one source of truth isn’t enough. We are working on a tool to give a composite score, from multiple sources, all open source.

  • AbouBenAdhem@lemmy.world
    3 months ago

    I was thinking of something like the graph of subreddits from this paper—although I think that’s based on active user overlap, and I don’t know if there’s a similar metric that would cover all news sites.

    • steventhedev@lemmy.world
      3 months ago

      I don’t see an easy way to accomplish this without pulling in the full text of every article over some period, computing something like paragraph/doc/site vectors, and then clustering by site vector.

      That’s putting a lot of faith in unsupervised learning, and it’s probably just as likely to pick up on stylistic conventions like byline and date formats as it is to cluster by some common thematic pattern like political leaning.

      • AbouBenAdhem@lemmy.world
        3 months ago

        Maybe you could use a source site’s posts and upvotes in different fediverse communities as a proxy (assuming you could find representative communities with a similar range of biases).

        • steventhedev@lemmy.world
          3 months ago

          That’s…actually not a bad idea. Take the (user, domain name) pairs and weight the edges between domains by the number of unique users who posted from both domains.

          Producing clusters from the resulting graph should be easy, but aside from just saying “these are similar websites”, does it really say much?

          You could do something similar with comment/upvote/downvote-based linkages; maybe they’ll have some deeper semantic meaning.
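
The user-overlap graph described above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions, not anything the mod team has built: the input format (a list of `(user, domain)` pairs), the function names `domain_overlap_graph` and `cluster_domains`, the `min_shared` threshold, and the sample data are all hypothetical. Clustering here is just connected components over edges that meet the threshold, which is the simplest possible choice; a real tool would likely want a proper community-detection method.

```python
from collections import defaultdict
from itertools import combinations


def domain_overlap_graph(posts):
    """Build edge weights from (user, domain) pairs.

    Returns {(domain_a, domain_b): number of unique users who posted both},
    with each key sorted so a pair appears only once.
    """
    domains_by_user = defaultdict(set)
    for user, domain in posts:
        # A set ignores repeat posts, so each user counts once per domain.
        domains_by_user[user].add(domain)

    weights = defaultdict(int)
    for domains in domains_by_user.values():
        for a, b in combinations(sorted(domains), 2):
            weights[(a, b)] += 1
    return dict(weights)


def cluster_domains(weights, min_shared=2):
    """Group domains into connected components, keeping only edges
    backed by at least `min_shared` unique users (hypothetical threshold)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), shared in weights.items():
        root_a, root_b = find(a), find(b)
        if shared >= min_shared and root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for domain in list(parent):
        groups[find(domain)].add(domain)
    return sorted(sorted(group) for group in groups.values())


# Made-up sample data, purely for illustration.
posts = [
    ("alice", "a.example"),
    ("alice", "b.example"),
    ("alice", "a.example"),  # repeat post, counted once
    ("bob", "a.example"),
    ("bob", "b.example"),
    ("bob", "c.example"),
]
weights = domain_overlap_graph(posts)
groups = cluster_domains(weights, min_shared=2)
```

Note that a domain only appears in the graph if at least one user posted it alongside some other domain, and that the output only says which sites share posters, which is exactly the "these are similar websites" limitation raised above. Swapping the `(user, domain)` pairs for upvote or comment pairs would reuse the same two functions unchanged.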