You don't need an autonomous car to take pictures, record video, develop machine vision, develop object recognition, or do mapping. There are many different ways to get the data being used, and with cloud computing you don't need to build much infrastructure.
I don't have any inside info here, but simulation seems better because you can create situations that are unsafe or difficult to generate in real life. The Nvidia demo looks pretty sophisticated at generating different lighting conditions and simulating actual sensor/camera data given their locations on the simulated vehicle. You can simulate sensors failing. You can let accidents happen. Waymo does 1000 times more simulated miles than road miles.
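To be clear about what I mean by simulating a sensor failing: I have no idea how Nvidia's pipeline is actually built, but a failure scenario can be as crude as injecting dropout and noise into the simulated sensor stream. The point cloud, function names, and numbers below are all invented for illustration:

```python
import random

def simulate_lidar_frame(ground_truth_points, dropout_prob=0.0, noise_std=0.0):
    """Toy fault injection: drop returns and perturb ranges to mimic a
    degraded or failing lidar (purely illustrative)."""
    frame = []
    for (x, y, z) in ground_truth_points:
        if random.random() < dropout_prob:   # lost return
            continue
        frame.append((x + random.gauss(0, noise_std),
                      y + random.gauss(0, noise_std),
                      z + random.gauss(0, noise_std)))
    return frame

# Toy scene: a wall of points 10 m ahead of the simulated vehicle
wall = [(10.0, y * 0.1, z * 0.1) for y in range(-50, 50) for z in range(0, 20)]

healthy = simulate_lidar_frame(wall)                                    # nominal sensor
failing = simulate_lidar_frame(wall, dropout_prob=0.6, noise_std=0.05)  # degraded sensor
print(len(healthy), len(failing))
```

The planner downstream never knows which frames were corrupted, so you can test how it behaves when a sensor quietly goes bad - something you'd never deliberately do on a public road.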
I wonder if some small company could seemingly come out of nowhere with a superior solution. It would get snatched up by a manufacturer or large auto supplier and then take a little while to make its way into the manufacturing line, which is basically what happened with Cruise. There are many companies like Cruise out there.
The fundamental problem with development using only simulation (setting aside machine vision/perception issues) is that the system (traffic) is extremely non-linear. In real life, as soon as the machine decides what to do and takes some action, that action affects the perceptions and actions of everyone else - and so on. So without putting the car's decision making into action, with its real errors, precision, latency, etc., the scene evolves in a way that cannot be simulated by the time you get a few seconds beyond that initial decision. It's like trying to figure out what the board will look like many moves ahead in a chess game while watching someone else play your side. That kind of learning - watching someone else play while you think about what you would do - only gets you so far.
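Here is a toy version of that feedback loop, with all of the dynamics invented for illustration: the "other driver" reacts to whatever the ego car just did, so a different ego decision produces a different scene a few steps later - which is exactly what a recorded log or a watch-someone-else-play setup can't give you.

```python
def other_driver(gap, other_speed):
    """Crude reactive model: the following car backs off or closes up
    depending on how much gap the ego car leaves (invented for illustration)."""
    if gap < 10.0:
        return max(other_speed - 2.0, 0.0)   # back off
    return other_speed + 1.0                 # close the gap

def ego_policy(ego_speed, brake):
    """The ego car either holds speed or brakes."""
    return max(ego_speed - 3.0, 0.0) if brake else ego_speed

def rollout(brake, steps=5):
    """Closed-loop rollout: the other car's behavior depends on the ego's actions."""
    ego_pos, ego_speed = 0.0, 20.0
    oth_pos, oth_speed = -15.0, 22.0          # follower starts 15 m behind, slightly faster
    for _ in range(steps):
        gap = ego_pos - oth_pos
        ego_speed = ego_policy(ego_speed, brake)
        oth_speed = other_driver(gap, oth_speed)   # reacts to the gap the ego created
        ego_pos += ego_speed
        oth_pos += oth_speed
    return ego_pos - oth_pos                  # final gap

print(rollout(brake=False))   # one decision, one future scene
print(rollout(brake=True))    # a different decision, a very different scene
```

Run it and the final gap differs by tens of meters after only five steps; an open-loop replay of logged data would show the same "other driver" trajectory no matter what the ego car decided.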
Accidents certainly do happen (just ask GM and Waymo), but you are correct that a lot of disengagements probably cut short what would have been a major learning opportunity, and without letting the scenario play out you don't really know what would have happened. Here again, those bold enough to let the machines do as much as possible will progress faster.
As far as Nvidia goes, you can check their disengagement report at the link I posted, and it's not pretty: about 100 disengagements in only 500 miles. Their simulation of lighting and atmospheric effects may look great, but human vision is pretty special at screening out artifacts that can drive machine vision crazy without you even being aware of it. What looks "real" to you may be a far cry from real lighting, atmospheric conditions, and textural effects as far as machine vision is concerned. It's even more complicated in lidar space, where the artifacts are non-intuitive for us and it's difficult to judge the fidelity required for useful simulation.
As for Cruise, you can see the improvement they've achieved by comparing the 2016 and 2017 reports. They were basically completely in the weeds when they started public testing (about where Nvidia is now) compared to where they are today - almost a factor of 1000 improvement in both total miles and miles/disengagement.
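The arithmetic behind that comparison is simple; here it is spelled out, using only the rough figures quoted above (the 1000x line is just restating the ratio, not numbers pulled from the reports):

```python
def miles_per_disengagement(miles, disengagements):
    """The coarse progress metric you can pull out of the DMV reports."""
    return miles / disengagements

# Nvidia, using the rough figures quoted above
nvidia_rate = miles_per_disengagement(500, 100)   # ~5 miles per disengagement

# A "factor of 1000" improvement from a similar starting point means roughly
# three orders of magnitude more miles between disengagements
print(nvidia_rate)          # 5.0
print(nvidia_rate * 1000)   # 5000.0 -- ballpark only; the actual figures are in the reports
```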
There are other small startups similar to Cruise, and as I mentioned I know some people involved in one such effort that is about where GM/Cruise was roughly a year ago. They insist that their road testing is critical. Some of these guys will probably still get bought up, but others profess to want to go all the way with their ideas. The window for startups is closing fast, though. I'm not sure any of the major manufacturers are interested in buying up outside IP anymore, and even the best-funded among them can't go up against the big players in building and deploying large fleets of AVs for ridesharing services, which is coming up fast.