Arun Murthy has put up Yahoo!'s recommended Hadoop best practices.
These are good as they show what things are bad: generally, anything that bothers the namenode too much, or any work where the input or intermediate files aren't that big. Small jobs are the enemy. Presumably Arun's team are monitoring stats and identifying the troublemakers. What you could do there is recognise these "inefficient" jobs and schedule them differently: allow them, but penalise the caller.
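As a rough sketch of that "allow but penalise" idea, here's what a classifier might look like: inspect a job's stats and route anything too small to justify MapReduce overhead to a low-priority queue. The `JobStats` fields, queue names, and thresholds here are all illustrative assumptions, not real Hadoop APIs.

```python
# Hypothetical sketch: flag "small" jobs and route them to a penalty queue,
# so they still run but queue behind well-behaved large jobs.
# Thresholds and the JobStats fields are illustrative, not real Hadoop APIs.
from dataclasses import dataclass

SMALL_INPUT_BYTES = 256 * 1024 * 1024   # input under a couple of HDFS blocks
MAX_CHEAP_TASKS = 10                    # very few map tasks

@dataclass
class JobStats:
    name: str
    input_bytes: int
    map_tasks: int

def pick_queue(job: JobStats) -> str:
    """Penalise jobs whose input is too small to justify the job overhead."""
    if job.input_bytes < SMALL_INPUT_BYTES or job.map_tasks <= MAX_CHEAP_TASKS:
        return "penalty"    # allowed, but scheduled behind real work
    return "default"

print(pick_queue(JobStats("tiny-report", 4 * 1024 * 1024, 2)))   # penalty
print(pick_queue(JobStats("daily-etl", 2 * 1024**4, 5000)))      # default
```

The point isn't the exact thresholds; it's that the scheduler, not the submitter, decides where inefficient work lands.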