I have started to bump up against the limits of many LLMs on programming tasks. Beyond a certain scale, their responses degrade and often introduce regressions. I have been impressed with o1, though. Its ability to "reason" by employing a "chain of thought" has significantly raised the ceiling.