Unless a photo is geo-tagged with latitude/longitude coordinates and tagged with a precise heading from a magnetometer, there's no automated way to stitch millions of photos from random people together. Again, people use different settings, different depths of field and focus, and different lenses, and they shoot at different times of day, so there are lighting and shadow discrepancies, and so on. As a professional photographer, I'd love to get my hands on this technology you speak of.
Street View uses 360° photos taken at regular intervals to give you the smooth transitions you see. That cannot be replicated by random photos taken by people around the world.
What can be done, and is being done, is to make people's photos of an area viewable in context in Street View, as whatever explained, but not as the main 360° view of an area. Google's system works great. I can't wait to see Toronto's Street View go live.