Basically exactly what the title says. In case there isn’t a great place, or this post ends up getting more visibility than wherever I end up asking, I will explain my approximate competency level and the question below.

In terms of competency I have an engineering background and degree, which means I had a single class in statistics. Technically I was one class short of a math minor (Graph Theory) when I graduated. Unlike most engineers and Six Sigma “graduates” I don’t think this automatically makes me some kind of math/stats wizard. I’m aware I know just enough that I can unintentionally massage data to fit my bias (mini rant over).

My question is: when looking at a human population and trying to find the approximate subset of people with certain attributes, how are correlations handled to avoid double counting?

For example, let’s say I am looking at a specific city and my data sets are the most recent census, BLS.gov, and Pew Research. With the above sources I can pretty easily estimate something along the lines of

The number of men in a US city who:

  • Are between the ages of 22 and 44
  • Have a STEM degree

However, if I then wanted to add another factor:

  • Are/Vote liberal

I know that is going to interfere with the original criteria because higher levels of education are correlated with people being more liberal. If I just multiplied together the percentages from all three sources, the resulting number would likely be much smaller than reality.

Is there a term or method I can read up on for how to account for overlaps/correlations between population subsets? Does this make sense or am I asking the wrong kind of question?

FWIW none of this is related to my job, an argument, a shit post, a data graphic, or anything else I will ever really make. It’s just for something specific (not actually the above example, but something like it using the sources I mentioned) that I am personally curious about. I have also been wondering more generally about how to account for this kind of overlap for a couple of years now.

Regardless, thanks for taking the time to at least read all this.

Cheers!

  • lolola
    10 days ago

    I don’t know of such a community, but I’ll point you in the direction of conditional probability.
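    To make the conditional probability pointer concrete, here is a rough sketch using the example from the question. It compares multiplying the three marginal percentages (which assumes the attributes are independent) against the chain rule, P(A and B and C) = P(A) * P(B | A) * P(C | A and B). Every number below is a made-up placeholder, not a real census, BLS, or Pew figure, and the function names are hypothetical.

```python
# Sketch only: all figures are invented placeholders for illustration.
men_in_city = 500_000

# Marginal shares among men in the city (assumed values).
p_age = 0.35        # P(age 22-44)
p_stem = 0.08       # P(STEM degree)
p_liberal = 0.45    # P(liberal)

# Conditional shares (assumed values); these are the numbers you would
# need to find in cross-tabulated data rather than in separate sources.
p_stem_given_age = 0.12          # P(STEM | age 22-44)
p_liberal_given_age_stem = 0.60  # P(liberal | age 22-44 and STEM)


def naive_estimate(population, *marginals):
    """Multiply marginal shares together, which assumes independence."""
    estimate = population
    for p in marginals:
        estimate *= p
    return estimate


def chain_rule_estimate(population, p_a, p_b_given_a, p_c_given_ab):
    """Apply P(A and B and C) = P(A) * P(B | A) * P(C | A and B)."""
    return population * p_a * p_b_given_a * p_c_given_ab


naive = naive_estimate(men_in_city, p_age, p_stem, p_liberal)
chained = chain_rule_estimate(
    men_in_city, p_age, p_stem_given_age, p_liberal_given_age_stem
)

print(f"Independence assumption: {naive:,.0f}")
print(f"Chain rule:              {chained:,.0f}")
```

    With these placeholder inputs the chain-rule estimate comes out larger than the naive product, which matches the intuition in the question: when attributes are positively correlated, multiplying marginals undercounts the overlap. The hard part in practice is finding the conditional shares, since separate sources usually only publish marginals.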

    Also:

    Six Sigma “graduates”

    Lol fair description