i don't recall how my friend did it, but you can easily implement this in c++ (or java i guess) using a multimap:
http://www.cppreference.com/cppmultimap/index.html
simply loop over your text and add each pair of consecutive words as a (word, next-word) entry to the multimap. that's it for the (simplistic) building.
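the build loop really is just a few lines. a rough sketch in c++ (untested; reading whitespace-separated words from stdin is just my choice of input):

    #include <iostream>
    #include <map>
    #include <string>
    #include <utility>

    int main() {
        std::multimap<std::string, std::string> model;
        std::string prev, word;
        if (std::cin >> prev) {
            while (std::cin >> word) {
                // one entry per occurrence, so duplicates act as the counts
                model.insert(std::make_pair(prev, word));
                prev = word;
            }
        }
        std::cout << "recorded " << model.size() << " transitions\n";
        return 0;
    }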
to generate the body:
1. pick a random word in the mmap
2. construct a discrete probability density function (PDF) over the words that follow the current word, and use it to pick the next:
2a. find the sum S of the counts of all associated words
2b. pick a random integer K in the range [1,S]
2c. use that to pick the next word (i.e. walk the cumulative counts and take the word whose bin K falls into)
3. emit the word and goto 2 until your thesis is done
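a minimal end-to-end sketch (untested; the sample text, the random seed word, and the 20-word cutoff are all mine). the nice bit: since the multimap stores one entry per occurrence, equal_range already hands you the PDF bins, and a uniform index into that range is exactly the weighted pick of steps 2a-2c:

    #include <cstdlib>
    #include <ctime>
    #include <iostream>
    #include <iterator>
    #include <map>
    #include <sstream>
    #include <string>
    #include <utility>

    typedef std::multimap<std::string, std::string> Chain;

    // steps 2a-2c: duplicate entries under a key ARE the counts, so a
    // uniform index into equal_range is the binned PDF pick
    std::string next_word(const Chain& model, const std::string& cur) {
        std::pair<Chain::const_iterator, Chain::const_iterator> r =
            model.equal_range(cur);
        int S = std::distance(r.first, r.second);   // 2a: total count S
        if (S == 0) return "";                      // dead end, no successor seen
        Chain::const_iterator it = r.first;
        std::advance(it, std::rand() % S);          // 2b+2c: K in [1,S] -> bin
        return it->second;
    }

    int main() {
        std::srand(std::time(0));
        std::istringstream text("the cat sat on the mat and the dog sat on the cat");
        Chain model;
        std::string prev, word;
        text >> prev;
        while (text >> word) { model.insert(std::make_pair(prev, word)); prev = word; }

        Chain::const_iterator start = model.begin();      // step 1: random word
        std::advance(start, std::rand() % model.size());
        std::string cur = start->first;

        for (int i = 0; i < 20 && !cur.empty(); ++i) {    // step 3: emit, loop
            std::cout << cur << ' ';
            cur = next_word(model, cur);
        }
        std::cout << '\n';
        return 0;
    }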
what we're doing here is constructing a markov chain:
http://en.wikipedia.org/wiki/Markov_chain
let it be known to all that the monte carlo method is the greatest thing on earth.
edit: i just thought of a rather natural generalisation of the algorithm, which ought to produce much more coherent text. what we do is start taking context into account (i.e. it's no longer context-free); the downside is that with enough context you'll start reproducing long verbatim runs of the source text, i.e. output that is too structured. how to get around it? switch between the two modes. how to do that? probabilistically, based on the word PDF (i.e. if the word-pair match is a poor one for the current word, we'll tend not to use it). since this is probably not making sense to anyone, here's the algorithm (completely untested), which is easily implemented using the stl pair and tuple templates:
0. make two multimaps, one for (word, word) pairs and one for ((word, word), word) pairs; call them C0 and C1.
1. pick a random word in C0 (C1 can only be consulted once two words have been emitted, so the first step necessarily uses C0)
2. construct (up to) two discrete probability density functions on the current context and use them to pick the next word:
2a. find the sum S1 of the counts of all words associated with the current word-pair in C1
2b. pick a random integer U in the range [1, S1] and get the probability P1 of the word it maps to (P1 = count(word) / S1).
2c. if P1 is greater than some "context binding" threshold (i.e. is a good match), goto 3
2d. otherwise repeat the process using the PDF given by C0 (keyed on the last word alone) to get the next word
3. emit the word and goto 2 until your lecture notes on quantum field theory are done
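since i claimed this is easy with the stl pair template, here's a rough sketch of the two-order version (completely untested, as promised; the sample text, seed words, and the 0.3 threshold are all choices i'm making up):

    #include <cstdlib>
    #include <ctime>
    #include <iostream>
    #include <iterator>
    #include <map>
    #include <sstream>
    #include <string>
    #include <utility>

    typedef std::multimap<std::string, std::string> C0Map;
    typedef std::multimap<std::pair<std::string, std::string>, std::string> C1Map;

    // weighted pick from whatever equal_range gives us; one entry per
    // occurrence means a uniform index is already the binned PDF pick.
    // optionally reports P = count(picked word) / S for the threshold test.
    template <class Map, class Key>
    std::string pick(const Map& m, const Key& key, double* prob = 0) {
        std::pair<typename Map::const_iterator, typename Map::const_iterator> r =
            m.equal_range(key);
        int S = std::distance(r.first, r.second);
        if (S == 0) return "";
        typename Map::const_iterator it = r.first;
        std::advance(it, std::rand() % S);
        if (prob) {
            int c = 0;  // count how many of the S entries are the picked word
            for (typename Map::const_iterator j = r.first; j != r.second; ++j)
                if (j->second == it->second) ++c;
            *prob = double(c) / S;
        }
        return it->second;
    }

    int main() {
        std::srand(std::time(0));
        std::istringstream text("the cat sat on the mat and the cat ate the mat");
        C0Map c0; C1Map c1;
        std::string a, b, c;
        text >> a >> b;
        c0.insert(std::make_pair(a, b));
        while (text >> c) {                                  // step 0: build both
            c0.insert(std::make_pair(b, c));
            c1.insert(std::make_pair(std::make_pair(a, b), c));
            a = b; b = c;
        }

        const double threshold = 0.3;          // the "context binding" threshold
        std::string w1 = "the", w2 = "cat";    // seed words, chosen by hand here
        std::cout << w1 << ' ' << w2;
        for (int i = 0; i < 15; ++i) {
            double p1 = 0;
            std::string next = pick(c1, std::make_pair(w1, w2), &p1);  // 2a-2b
            if (next.empty() || p1 <= threshold)   // 2d: poor pair match, fall
                next = pick(c0, w2);               // back to the 0th-order PDF
            if (next.empty()) break;
            std::cout << ' ' << next;              // step 3
            w1 = w2; w2 = next;
        }
        std::cout << '\n';
        return 0;
    }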
this process of "dependent context boosting" can be extended to arbitrarily high orders, with the higher orders cascading down to the lowest (0th order). the "context binding" thresholds should then follow some exponential schedule (instead of just a constant) to compensate for the P^n falloff (P < 1) in the probability of any particular run of n sequential words.
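to make the schedule concrete, something like this is what i have in mind (t0 and lambda are knobs pulled out of thin air):

    #include <cmath>

    // order-n "context binding" threshold, decaying exponentially to mirror
    // the roughly P^n falloff of an n-word context's probability
    double binding_threshold(int n, double t0 = 0.5, double lambda = 0.6) {
        return t0 * std::pow(lambda, n);
    }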
i'd be surprised if more than *thumbsuck* 4 words of context makes a noticeable difference, but then any REAL results are more than welcome.
of course the more context one uses, the larger the body of text used to build the PDFs should be, to give the context matching "more legroom" in which to work.