06 October 2007

Computer Science paper titles

Take the DBLP database of Computer Science papers, grab all the titles, count how often each word occurs, and keep only the nouns. The result is this list of the 20 most used nouns in CS paper titles:

  1. systems
  2. system
  3. data
  4. analysis
  5. networks
  6. model
  7. design
  8. algorithm
  9. approach
  10. information
  11. time
  12. software
  13. distributed
  14. learning
  15. network
  16. performance
  17. parallel
  18. web
  19. control
  20. algorithms
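
The counting procedure described above can be sketched roughly as follows. This is only an illustration, not the script actually used for the list: the `TITLES` sample is made up, the helper names `word_counts` and `top_words` are hypothetical, and a small hand-written `STOPWORDS` set stands in for the noun filter (a real run would read every title from DBLP and use a part-of-speech tagger).

```python
import re
from collections import Counter

# Hypothetical sample; the real input would be every paper title in DBLP.
TITLES = [
    "A Distributed Algorithm for Data Analysis in Sensor Networks",
    "Performance Analysis of Parallel Systems",
    "Learning Systems for Web Data",
    "Design of a Control System for Data Networks",
]

# Assumption: a crude stand-in for "keep only nouns". It merely drops a few
# function words; it does not do real part-of-speech tagging.
STOPWORDS = {"a", "an", "the", "of", "for", "in", "on", "and", "to", "with"}


def word_counts(titles):
    """Count lowercase words across all titles, skipping stopwords.

    Singular and plural forms are counted separately, as in the list above.
    """
    counts = Counter()
    for title in titles:
        for word in re.findall(r"[a-z]+", title.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts


def top_words(titles, n=20):
    """Return the n most frequent (word, count) pairs.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    counts = word_counts(titles)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    for rank, (word, count) in enumerate(top_words(TITLES, 5), start=1):
        print(f"{rank}. {word} ({count})")
```

On the four sample titles, "data" comes out on top, and "system" and "systems" are kept as separate entries.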

(Note that I did not conflate singular and plural forms.) Recall Shannon's information theory, in which the most frequent symbols carry the least information: the next time you see a title made up almost entirely of these words, you will know it is a CS paper, but the title reveals little more than that.
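
To make the Shannon remark concrete, here is a minimal sketch of the self-information of a word, I(w) = -log2 p(w), where p(w) is the word's relative frequency. The counts in `COUNTS` are invented for illustration (they are not DBLP figures), the helper names are hypothetical, and scoring a whole title by summing per-word values assumes the words are independent, which real titles are not.

```python
import math

# Hypothetical word counts, chosen so the probabilities are powers of two.
COUNTS = {
    "systems": 512,
    "data": 256,
    "analysis": 128,
    "model": 127,
    "zeitgeist": 1,
}


def surprisal_bits(word, counts):
    """Self-information of a word in bits: -log2 of its relative frequency."""
    total = sum(counts.values())
    return -math.log2(counts[word] / total)


def title_bits(title, counts):
    """Sum of per-word surprisal over the words of a title found in counts.

    Assumption: words are treated as independent; unknown words are skipped.
    """
    words = title.lower().split()
    return sum(surprisal_bits(w, counts) for w in words if w in counts)


if __name__ == "__main__":
    for word in COUNTS:
        print(f"{word}: {surprisal_bits(word, COUNTS):.2f} bits")
    print(f"title: {title_bits('Data Analysis Systems', COUNTS):.2f} bits")
```

With these made-up counts, a title built from the three most common words carries about 6 bits in total, while the single rare word alone carries about 10.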

