The idea for the Describing Words engine came when I was building the engine for Related Words (it's like a thesaurus, but gives you a much broader set of related words, rather than just synonyms). While playing around with word vectors and the "HasProperty" API of ConceptNet, I had a bit of fun trying to get the adjectives which commonly describe a word. Eventually I realised that there's a much better way of doing this: parse books! Project Gutenberg was the initial corpus, but the parser got greedier and greedier and I ended up feeding it somewhere around 100 gigabytes of text files - mostly fiction, including many contemporary works. The parser simply looks through each book and pulls out the various descriptions of nouns.
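To give a rough feel for what "pulling out descriptions of nouns" could look like, here is a minimal sketch. It is not the actual Describing Words parser: the `ADJECTIVES` and `NOUNS` sets and the `describe_nouns` function are hypothetical stand-ins, and the only pattern it catches is an adjective sitting directly before a noun. A real parser would more likely lean on a part-of-speech tagger than on hand-written word lists.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy lexicons, for illustration only.
# A real parser would use a part-of-speech tagger instead.
ADJECTIVES = {"dark", "old", "stormy", "quiet", "cold"}
NOUNS = {"night", "house", "sea", "street"}


def describe_nouns(text):
    """Count adjectives that appear directly before a known noun."""
    words = re.findall(r"[a-z']+", text.lower())
    descriptions = defaultdict(Counter)
    # Walk over consecutive word pairs and keep adjective + noun pairs.
    for adjective, noun in zip(words, words[1:]):
        if adjective in ADJECTIVES and noun in NOUNS:
            descriptions[noun][adjective] += 1
    return descriptions


sample = (
    "It was a dark night. The old house stood by the cold sea, "
    "and the dark night went on."
)
result = describe_nouns(sample)
print(result["night"].most_common(1))  # [('dark', 2)]
```

Run over a whole library of books instead of one sentence, counts like these would give a ranking of the adjectives most often used to describe each noun.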