[an error occurred while processing this directive] [an error occurred while processing this directive]       Comp200 Final Project [an error occurred while processing this directive]
COMP 200 Final Project:  Text Riffing

Once upon a time, there was a man named Mark V. Shaney...

In the days of yore, long ago before Twitter and Facebook and even MySpace, the computer lonely hearts hung out on-line at a place called net.singles (link purposely omitted for propriety reasons) which was a Usenet newsgroup, a text-only discussion group sans all the bells and whistles we now take for granted.

In 1984, net.singles was joined by a new member, Mark V. Shaney.  An off-beat, sometimes humorous, philosophical poet, his posts ran the gamut of thought:

I'm not prone to either violence or obscenties, but the outside pressure was too much for both of us.  I live for the moment.  I find (among yuppies) that the readership of net.singles is the same indecision, embarassment, and confusion that they refer to themselves as the yuppies- you figure it out!  You know lately I have ever directed personally insulting remarks to anyone in my brief (and very enjoyable) net career.  Now, let's get the definition of teasing to be meeting a lot of people.  We've been together a year now, but I won't go back through old news articles to get your opinions.  (I bet you can guess how much mail I've been getting recently:-) Send in your responses and hopefully we can come up with better ones.

-- Oct 2 1984, 3:44 pm 

All right, I'm going to post this.  I've been in the spiritual world. There are many levels of relationship.  You can't give love to another person until you have a surplus in the lousy situation you now find yourself stuck in, no matter how bad it looks, you aren't really stuck in the past...but not overdo it.

"Recognizing" that something is "wrong" without having a logical reason behind that statement to back it up is indeed identical to being rather distant and at times most unpleasant.  A woman focuses on the other hand...  I'm still looking.

You have the right to want and to have my cake and eat it too (no responsibilities -- I guess) I don't fit the full sense of the spiritual energy into each others bodies, rather than letting it dissipate into the DMZ, with mutual assurance and safety.  The best way I know to do this is either going to be learned, growth opportunites.


-- Oct 16 1984, 3:01 pm

And what's a philosopher without angst?:

Lately I've had an experience which has to do with a mental and emotional template of how their "mate" should be.  I asked her if I have just two questions - when is the next one, and who are you?  She responded "yes" (without hesitation).  When is the impression you have to expect feedback like 'suggested areas I should "work on"'?  I asked her out to dinner for a couple of days (despite what people would be gone for about a week and that sometime after that would be saying about us)?  So I asked her if she called up and reached the machine. She said she would.  I then told her that I had met once before at a concert.
 
I have just two questions - when is the way mating occurs.


-- Oct 25 1984, 3:23 pm

Admired by some, but often criticized, he once responded

I'm sorry I raised my voice there - but I stand by my statement.  I pointed out "one person" because I have not encountered difficulties with someone feeling "insecure" about "inferior intelligence", I have encountered a quite different problem, numerous times: that of the current political situation.

For my final comments on this subject in this broader question would be vehemently opposed by a number of people, who would feel that you cross-posted your comment to net.philosophy and net.religion.  This says a lot.

On the other person trying to "stifle" my thinking.  This typically involved critical comments like "You think too much. You are absolutely right".


-- Jun 26 1985, 1:52 pm

For a complete search of Mark V. Shaney's net.singles posts click here.

But who was Mark V. Shaney????

Mark's "parents" were Bruce Ellis, Rob Pike and Don Mitchell.    This is less progressive than it sounds--it was 1984 remember.   On the other hand, it was 1984....

Mark was born in 1984 at  AT&T Bell Laboratories in Murray Hill, NJ.  He's not bad for a someone less than 1 year old, isn't he?

Mark lives inside of a computer.   He's a computer program.   Here's the Wikipedia article on him.

 

The Turing Test

In 1950, Alan Turing proposed a test to see if a computer program could demonstrate true intelligence.   The idea of this famous "Turing Test" was simple:  A human interrogator holds a conversation with two unseen entities, one of which is human and one of which is a computer.  If the human interrogator cannot deduce from the conversation which of the conversationalists is human and which is a machine, the computer will be said to have "passed" the test.

Did Mark V. Shaney pass the Turing Test?

The other participants in the net.singles group certainly interacted with Mark V. Shaney as if he was a real person, commenting on his posts and arguing with his prose.    This was not unlike how people interacted with the earlier 1966 "Eliza" program that simulated a patient, though not always very helpful psychotherapist.   Try out Eliza for yourself here.

 

How did Mark V. Shaney do it?

The Mark V. Shaney program is an implementation of what is called a Markov Chain.  His name is actually a play on "Markov Chain"  (you have to be a bit of a geek to get it).     In a nutshell, a Markov Chain is a process where one progresses along from one state to another based on probabilities determined by the state you are in and possibly some of the states you have been in.  That is, the probability of where you are going depends on the where you have been.

The Mark V. Shaney program is generates text based on existing text, in this case, the other posts to the net.singles discussion group.  Essentially, the program looks at pairs of words and calculates the probability that the pair is followed by a particular third word.    Thus, a new text can be generated by starting with any existing pair of words and then randomly choosing, as per the previously calculated probabilities, the third word.  Then the second and third words are used as the pair, and a new third word is chosen.   The process is then repeated, with the pair of words used to calculated the next word sliding along, creating a text of words as long as you wish.    This is referred to as a "2nd order Markov Chain" because the latest two states (words) are used to calculate the next state (word). 

Here's the text of a Scientific American article on the Mark V. Shaney algorithm (sorry, the illustrations were lost -- go find the paper copy in the library)

Here are some sample Markov Chain analyses:

Generating the word probabilities:

This process is fairly straightforward.  For an n'th order Markov Chain, use a preferably large, reference text and

  1. Take the first n words as a set-- in Mark V. Shaney's case, that would be 2 words. 

  2. Record the set and the fact that it was followed by the next word with a count of 1. 

  3. Drop the first word of the set and add the next word as the last word of the set.  This is your new set. 

  4. Look at the word following the current set of words  and record the new set and the word following it

  5. Repeat the above process from steps 3-4 until all the words in the reference text are processed.

  6. After you are done, normalize the counts by dividing by the total following word counts for each word set.   This will give you the probability of each following word.

Generating the new text riff:

Generating a text riff is very similar to the process of generating the word probabilities.  So, using the word probabilities calculated above,

  1. Randomly select any existing word set.  Record this as the first words of the text riff.

  2. Based on the probabilities for the following words for that set, pick one of the following words.  Record this as the next word of the text riff.

  3. Drop the first word of the set and add the chosen following word as the last word of the set.   Why is this new set guaranteed to being the word probabilities data?

  4. Repeat steps 2-3 until the desired number of words in the text riff is achieved.

To try out Mark V. Shaney for yourself, try the following on-line version.   Cut and paste text from the samples provided in the Final Project folder in the Owlspace Resources site into the appropriate text box in the following sites:

Markov Chains in Real Life

Markov chains have wide-ranging applications in almost every aspect of science and technology.  Since they model a sequence of events, many dynamic process, especially those where there is uncertainty involved, are potentially describable by Markov chains.   The following is just a sampling of its applications:

 

The Final Project

The final project for Comp200 will be to develop, over the course of several milestones, the necessary codebase to perform text riffing based on n'th order Markov models of arbitrary given texts.   In the end, you should be able to take a give text file, e.g. the complete works of Shakespeare, and produce a text riff of a given length based on a given Markov chain order.   

In essence, this is a generalization of the Mark V. Shaney program and with it, you will be able to explore how the order of the Markov chain affects the our ability to create a "believable" text riff as well as to gain some interesting insights into the statistical nature of literary works.

In a nutshell:

The algorithm we will pursue to solve this text riffing problem consists of the following basic outline:

  1. Read  a text file into Scheme, removing all whitespace and present it as a list of words.   This capability is provided in the supplied utilities, so there's no additional work to be done here.
  2. Given a desired order, n, for the Markov chain, create a dictionary (hash table) that relates n words in sequence as a key to a value that is another hash table that relates each word that follows the associated key word set in the text with the relative probability of that word with respect to another other subsequent words.
  3. Given the above dictionary, randomly pick a starting word set from the set of keys.  This becomes the first key word set and is also, when converted to a string, the initial result value. 
  4. Given a key word set, a subsequent word is then randomly based on each available word's probability.  This word is appended to the result with the appropriate number of spaces in between.
  5. A new key word set is created by appending the chosen subsequent word and then removing the first word (the oldest one) in the word set.  
  6. Given this new word set key, the algorithm is repeated from steps 4 -6 until the desired number of words has been produced.

Step 0: Create a directory called "final project" that holds all your materials for the final project.

A Peek Ahead...

More supporting materials and instructions will be available shortly, but here is a list of topics that you will need to understand in order to complete the project.   We have not yet covered all of these topics, so expect them in the near future, but do brush up on those that we have already covered.

Required concepts and skills:

Milestones

The final project will consist of two milestones: an intermediate check-point ("Milestone 1") and the final product ("Milestone 2").  Milestone 1 is essentially step #3 in the algorithm described above.   Milestone 2 is essentially steps 4-6 in the above algorithm.

Please see the Owlspace site for the due dates for each milestone.  

For detailed information for the requirements for each milestone, see the following pages:

START EARLY!!!  This project may take many, many days to complete, so plan accordingly!

Acknowledgement

The authors would like to acknowledge and thank Dr. Devika Subramanian of the Rice Computer Science Dept., for first introducing this topic to the authors.  The credit for the term "text riffing" should go to her as well.  To see Dr. Subramanian's often more eloquent and definitely more colorful treatment of this topic, please click here.   Please do note that Dr. Subramanian's discussion, besides being in Python rather than Scheme, is also slightly different that ours will be, in the way that the probabilities are represented, in the functional breakdown and in that our treatment is for the general, n'th order case.