Welcome! Log In Create A New Profile

Advanced

smile How to use context-free grammar

Posted by RiaanR 
Announcements Last Post
Announcement SoC Curricula 09/30/2017 01:08PM
Announcement Demarcation or scoping of examinations and assessment 02/13/2017 07:59AM
Announcement School of Computing Short Learning Programmes 11/24/2014 08:37AM
Announcement Unisa contact information 07/28/2011 01:28PM
avatar smile How to use context-free grammar
May 09, 2007 11:58AM
How to use context-free grammar, in a "constructive" way.

pdos.csail.mit.edu

One of there "generated papers" got accepted to WMSCI 2005. grinning smiley
avatar smile Re: How to use context-free grammar
May 09, 2007 12:47PM
a buddy of mine released an opensource implementation of that, it's pretty fun to use (choosing which bodies of text to mix and looking at the result).
avatar smile Re: Re: How to use context-free grammar
May 09, 2007 02:14PM
Man, that's excellent!

Only problem, some of the "generated" papers look very similar to my
own ideas for my "deliverables" in the coming year.

Phrase to remember -> "deployment of moores law"
avatar smile Re: Re: Re: How to use context-free grammar
May 09, 2007 09:00PM
i don't recall how my friend did it, but you can easily implement this in c++ (or java i guess) using a multimap: http://www.cppreference.com/cppmultimap/index.html

simply loop over your text, and add each (word,word) pair to the multimap. that's it for the (simplistic) building.

to generate the body:

1. pick a random word in the mmap
2. construct a discrete probability density function on the current word and use it to pick the next:
2a. find the sum S of the counts of all associated words
2b. pick a random integer K in the range [1,S]
2c. use that to pick the next word (i.e. map K into the bin to which a word belongs)
3. emit the word and goto 2 until your thesis is done winking smiley

what we're doing here is constructing a markov chain: http://en.wikipedia.org/wiki/Markov_chain
let it be known to all that the monte carlo method is the greatest thing on earth.

edit: i just thought of a rather natural generalisation of the algorithm, which ought to produce much more coherent text. what we do is start taking context into account (i.e. this will no longer be a CFG); this has the downside that you'll probably start running into strings of words that are too structured. how to get around it? switch between the two modes. how to do that? probabilistically, based on the word PDF (ie if the word-pair matching is a poor one for the current one, we'll tend to not use it). since this is probably not making sense to anyone, here's the algorithm (completely untested), which is easily implemented using the stl pair and tuple templates:

0: make two multimaps, one for (word,word) pairs and one for ((word,word),word) pairs, call them C0 and C1.

1. pick a random word in the mmap

2. construct (up to) two discrete probability density functions on the current word and use them to pick the next word:

2a. find the sum S1 of the counts of all associated word-pairs from C1

2b. pick a random integer U in the range [1,S1] and get the probability P1 of the word-pair to which it maps (P1 = count(wordpair) / S1)).

2c. if P1 is greater than some "context binding" threshold (i.e. is a good match), goto 3

2d. repeat the process using the PDF given by C0 to get the next word

3. emit the word and goto 2 until your lecture notes on quantum field theory are done winking smiley

this process of "dependent context boosting" can be extended to arbitrarily high orders, with the higher orders cascading down to the lowest (0th order) using some exponential distribution for "context binding" thresholds (instead of just a constant) to compensate for the P^n (where P < 1) falloff of probabilities given n sequential words.

i'd be surprised if more than *thumbsuck* 4 words of context makes a noticable difference, but then any REAL results are more than welcome winking smiley of course the more context one uses, the larger the body of text used to build the PDFs should be, to give the context matching "more legroom" in which to work.
Sorry, only registered users may post in this forum.

Click here to login