Unsupervised Semantic Parsing of Video Collections