I often need to extract a set of keywords from a web page or article. After looking at OpenCalais and Alchemy, I stumbled upon YQL (Yahoo Query Language) from the Yahoo Developer Network.
This is such an awesome service that it’s a shame it hasn’t gained more popularity!
To analyze a text, web page or RSS feed and extract the keywords, all you have to enter in the YQL Console is:
# extract keywords from text
select * from search.termextract where context="Italian sculptors and painters of the renaissance favored the Virgin Mary for inspiration"

# generate keywords from a web page
select * from search.termextract where context in (select content from html where url="http://en.wikipedia.org/wiki/Black_Friday_(1945)")

# extract keywords from an RSS feed
select * from search.termextract where context in (select title from rss where url="http://rss.cnn.com/rss/edition.rss")

# generate keywords from an RSS feed and sort the results
select * from search.termextract where context in (select title from rss where url="http://rss.cnn.com/rss/edition.rss") | sort(field="Result")

# remove duplicates and sort the results
select * from search.termextract where context in (select title from rss where url="http://rss.cnn.com/rss/edition.rss") | unique(field="Result") | sort(field="Result")
It’s pretty easy to include this in your scripts thanks to the free API they offer.
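In a script, the statements above can be assembled from a few parts instead of being typed out each time. The following is only a minimal sketch in Python: the helper names (`from_text`, `from_page`, `from_feed`, `with_pipes`) are made up for illustration and are not part of YQL, and because YQL's string escaping rules are not covered here, values containing double quotes are simply rejected.

```python
# Hypothetical helpers that build the search.termextract statements shown above.
BASE = "select * from search.termextract where context"


def _quoted(value):
    """Wrap a value in double quotes for use inside a statement."""
    if '"' in value:
        # Assumption: escaping is out of scope for this sketch, so refuse.
        raise ValueError("value must not contain double quotes")
    return f'"{value}"'


def from_text(text):
    """Statement that extracts keywords from a literal piece of text."""
    return f"{BASE}={_quoted(text)}"


def from_page(url):
    """Statement that extracts keywords from the content of a web page."""
    return f"{BASE} in (select content from html where url={_quoted(url)})"


def from_feed(url):
    """Statement that extracts keywords from the titles of an RSS feed."""
    return f"{BASE} in (select title from rss where url={_quoted(url)})"


def with_pipes(statement, *functions):
    """Append piped functions such as unique or sort on the Result field."""
    for name in functions:
        statement += f' | {name}(field="Result")'
    return statement


if __name__ == "__main__":
    feed = "http://rss.cnn.com/rss/edition.rss"
    print(with_pipes(from_feed(feed), "unique", "sort"))
```

Calling `with_pipes(from_feed(...), "unique", "sort")` reproduces the last statement in the list above, with duplicates removed before sorting.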
Here is how to make a REST query:
http://query.yahooapis.com/v1/public/yql?q=select * from search.termextract where context in (select content from html where url="http://en.wikipedia.org/wiki/Black_Friday_(1945)") | unique(field="Result") | sort(field="Result")
or, URL-encoded (here without the unique and sort steps):
http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20search.termextract%20where%20context%20in%20(select%20content%20from%20html%20where%20url%3D%22http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FBlack_Friday_%25281945%2529%22)
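Encoding the query by hand is error-prone, so a script would normally let a library do it. Here is a rough sketch using only Python's standard `urllib.parse`. The function names `build_request_url` and `read_statement` are hypothetical, the endpoint is the one used above, and the sketch only builds the URL without sending any request. Note that `quote` also escapes characters such as `*` and the parentheses, so the result will not be character-for-character identical to the hand-encoded URL above.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Public endpoint as used in the REST examples above.
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def build_request_url(statement):
    """Return the REST URL with the YQL statement percent-encoded in q."""
    # quote_via=quote turns spaces into %20 rather than "+".
    return YQL_ENDPOINT + "?" + urlencode({"q": statement}, quote_via=quote)


def read_statement(request_url):
    """Decode the q parameter again, useful for checking the encoding."""
    return parse_qs(urlsplit(request_url).query)["q"][0]


if __name__ == "__main__":
    statement = (
        "select * from search.termextract where context in "
        '(select content from html where '
        'url="http://en.wikipedia.org/wiki/Black_Friday_(1945)") '
        '| unique(field="Result") | sort(field="Result")'
    )
    url = build_request_url(statement)
    print(url)
    # Round trip: decoding the URL gives back the original statement.
    print(read_statement(url) == statement)
```

Decoding the generated URL yields the original statement again, which is a quick way to confirm that nothing was lost or encoded twice.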