Most often I just get it straight-up misunderstanding how the test framework itself works, but I've definitely had it make strange decisions like that too. I'm a little convinced the only reason I put up with it for unit tests is that I probably wouldn't write them otherwise haha.
I've used it most extensively for non-professional projects, where if I wasn't using this kind of tooling the tests simply wouldn't get written. That means no tickets to close either. That said, I am aware that the AI is almost always, at best, testing for regressions (I have had it correctly realise my logic was incorrect and write tests that caught it, but that is by no means reliable). Part of the "hand holding" I mentioned involves making sure it has sufficient coverage of use cases and edge cases, and that what it expects to be the correct result actually is correct according to intent.
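To be concrete about that last part, here's roughly what I mean by checking the expected value against intent rather than against whatever the code currently returns. This is a made-up sketch (pytest; `split_invoice` and its rounding rule are invented for illustration, not code from a real project):

```python
# Toy example of verifying the AI's expected values against intent.
# The AI's first draft pinned whatever the code returned at the time;
# the expected list below is the one I re-derived by hand from the
# intended rule ("leftover cents go to the earliest payers").


def split_invoice(total_cents: int, parties: int) -> list[int]:
    # Minimal implementation so the example runs.
    base, remainder = divmod(total_cents, parties)
    return [base + (1 if i < remainder else 0) for i in range(parties)]


def test_split_invoice_gives_leftover_cents_to_first_payers():
    # Intent: 1000 cents split three ways, first payer absorbs the extra cent.
    assert split_invoice(1000, parties=3) == [334, 333, 333]
```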
I essentially use the AI to generate a variety of scenarios and complementary test data, then evaluate their validity and expand from there.
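In practice that usually ends up looking like a parametrized scenario table: the AI proposes the first rows, I check each expected value by hand, and then I add the edge cases it missed. Another toy sketch (pytest; the function and cases are invented):

```python
import unicodedata

import pytest


def normalize_username(raw: str) -> str | None:
    # Minimal implementation so the example runs: trim, strip accents,
    # lowercase, and reject empty results.
    cleaned = unicodedata.normalize("NFKD", raw.strip()).encode("ascii", "ignore").decode()
    return cleaned.lower() or None


@pytest.mark.parametrize(
    ("raw", "expected"),
    [
        ("Alice", "alice"),    # AI-generated: simple lowercasing
        ("  Bob  ", "bob"),    # AI-generated: surrounding whitespace
        ("Élodie", "elodie"),  # hand-checked: accents stripped per intent
        ("", None),            # added by me: empty input rejected
        ("   ", None),         # added by me: whitespace-only rejected
    ],
)
def test_normalize_username(raw, expected):
    assert normalize_username(raw) == expected
```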