HCC class lecture 8 John Canny 2/23/09
Vygotsky's Genetic Planes: phylogenetic, social-historical, ontogenetic, microgenetic. What did he mean by genetic?
Internalization. Social Plane: social functions. Internalization (scaffolding; showing, explaining; listening and reading). Internal (mental) Plane: internal (mental) functions.
Externalization. Social/historical Plane: social/historical artifacts. Externalization (talking, writing). Internal (mental) Plane: internal (mental) functions.
Internalization/Externalization
Power Laws. Pick a corpus, such as English (a collection of many samples), the works of Shakespeare, or James Joyce's Ulysses, and count the occurrences of each word. Sort the words in decreasing order of count, and let r be the rank in this order. Then $f(r) = c / r^{\alpha}$, where $f(r)$ is the frequency of the word of rank r.
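The counting and sorting steps above can be sketched in Python. This is a minimal sketch, not the lecture's method: the function names `rank_frequency` and `fit_alpha` are hypothetical, and the exponent is estimated with a plain least-squares line on the log-log plot, which is only a rough estimator.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return word counts sorted in decreasing order (index 0 holds rank 1)."""
    return sorted(Counter(tokens).values(), reverse=True)


def fit_alpha(freqs):
    """Fit log f(r) = log c - alpha * log r by least squares.

    Assumes at least two ranks and strictly positive frequencies.
    Returns the pair (alpha, c).
    """
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


tokens = "the cat and the dog and the bird".split()
print(rank_frequency(tokens))  # [3, 2, 1, 1, 1]
```

On a real corpus, the fitted line is usually dominated by the many low-frequency words, so the estimate of α should be read alongside the plot rather than on its own.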
Power Law, alternate form. Instead of plotting frequency against rank, we can plot, for each frequency i, the number g(i) of items that occur with that frequency: $g(i) = c' / i^{\beta}$. The value β in this form is related to α via $\beta = 1/\alpha + 1$. This was Zipf's original form, and the one analyzed by Newell.
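One way to see where the relation between the two exponents comes from is the following sketch. It treats rank and frequency as continuous quantities, which is an approximation and not part of the slide.

```latex
\begin{align*}
f(r) &= \frac{c}{r^{\alpha}} = i
  \quad\Longrightarrow\quad
  r(i) = \left(\frac{c}{i}\right)^{1/\alpha}
  \quad\text{(number of items with frequency at least } i\text{)} \\
g(i) &\approx -\frac{dr}{di}
  = \frac{c^{1/\alpha}}{\alpha}\, i^{-(1/\alpha + 1)}
  = \frac{c'}{i^{\beta}},
  \qquad \beta = \frac{1}{\alpha} + 1 .
\end{align*}
```

For example, α = 1 gives β = 2.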
Examples of Power Laws. Note: size vs. frequency of that size (Zipf's original form).
Examples of Power Laws. These are in rank-frequency form.
Examples of Power Laws. Also: number of users' Facebook friends; the popularity of Facebook apps; number of pages in web sites; number of links into a web site; number of links out of a web site.
Preferential Attachment
Yule's law (1925). Pure birth process: only new species are added. Figure axes: number of species in each genus vs. genera.
Literary Theory: Structuralism. Looks for structures in the domain of study, e.g. literature or anthropology, and their relation to other [...]. Structure includes local (sentence) structure, as on the next slide. It also includes deeper structures such as role and plot, e.g. West Side Story has the same plot structure as Romeo and Juliet. Structuralists often look for universal structures, e.g. Freud's Oedipal complex.
Literary Theory: Structuralism
Bakhtin: The Dialogic Imagination. Multiple voices are evident in a text: heteroglossia, or multivocality, or polyphony.
Kristeva: Intertextuality. Kristeva elaborated Bakhtin's ideas into the theory of intertextuality: texts borrowed and adapted from other texts. Allusion, characters, plot, form, scene.
Barthes: S/Z. "A text is... a multidimensional space in which a variety of writings, none of them original, blend and clash. The text is a tissue of quotations... The writer can only imitate a gesture that is always anterior, never original. His only power is to mix writings, to counter the ones with the others, in such a way as never to rest on any one of them." Lexia.
Simon's model of texts. Text is built by sampling earlier texts. Association: sampling earlier passages in the same corpus. Imitation: sampling segments of word sequences from other works he has written, from works of other authors, and, of course, from sequences he has heard.
Simon's model of texts. Stratified sampling: sampling and re-assembly of small segments of text. The choice of which segments to assemble does not have to be random.
Simon's model of texts. Simon's model explains the familiar Zipf curve. Limitations: pure birth process*; should work for different notions of strata. * But birth-death processes in equilibrium also produce Zipf curves.
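A pure birth process in the spirit of Simon's model can be simulated in a few lines. This is a minimal sketch under stated assumptions: the function name `simon_text` and the fixed innovation probability `p_new` are hypothetical, words are integer ids, and the "stratum" being sampled is a single token. Copying a token chosen uniformly from the text so far means a word is re-used with probability proportional to its current count.

```python
import random
from collections import Counter


def simon_text(n_tokens, p_new=0.1, seed=0):
    """Generate a token sequence by sampling from the text written so far.

    With probability p_new a brand-new word is introduced (a birth);
    otherwise an earlier token is copied, picked uniformly at random.
    Assumes n_tokens >= 1. Words are never removed (pure birth).
    """
    rng = random.Random(seed)
    text = [0]        # the text starts with a single word, id 0
    next_word = 1     # id that the next new word will receive
    while len(text) < n_tokens:
        if rng.random() < p_new:
            text.append(next_word)
            next_word += 1
        else:
            text.append(rng.choice(text))
    return text


text = simon_text(20000)
freqs = sorted(Counter(text).values(), reverse=True)
print(freqs[:10])  # counts of the ten most frequent words
```

Plotting `freqs` against rank on log-log axes is one way to compare the simulated text with the Zipf curve.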
Genetic Laws. We have given an explanation of power law behavior in texts via internalization/externalization.
Genetic Laws. Other similar phenomena may be explained in this way: sales of books, or many other items; citations of scientific articles; number of pages in web sites; number of links into a web site; number of links out of a web site; number of users' Facebook friends; the popularity of Facebook apps.
Language as Action. What we have seen so far: many choice phenomena show the fingerprint of internalization/externalization and genetic origin. This includes language, both collective and individual. Is there a more general link between language and action, as Vygotsky and others have suggested?
Georgia Tech Home: 26 occupancy sensors; data recorded over several weeks.
N-grams. N-grams are sequences of n tokens, in this case n sensors. The following is a 6-gram sequence of locations: 3-11-27-12-19-20
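Extracting n-grams from a sequence of sensor firings can be sketched as a sliding window. This is a minimal sketch: the function name `ngrams` is hypothetical, and the sensor log is assumed to be already reduced to an ordered list of sensor ids.

```python
def ngrams(sequence, n):
    """Return every run of n consecutive tokens, as tuples, in order."""
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]


path = [3, 11, 27, 12, 19, 20]   # the 6-gram of locations from the slide
print(ngrams(path, 6))           # [(3, 11, 27, 12, 19, 20)]
print(ngrams(path, 2))           # five overlapping 2-grams
```

Tuples are used so that each n-gram can be counted directly, e.g. as a dictionary key.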
N-gram statistics. Not only words in English, but n-grams of words in English follow power laws*. In the smart home data, n-grams are a more reasonable unit of analysis than individual sensor sites. We might expect to see power law behavior if movement about the house is governed by familiar habit rather than optimal movement or planning. * For small corpora, the n-gram statistics for n > 1 are often closer to an exact power law than for 1-grams (words).
N-gram statistics. Here is the data from the smart home experiment in Zipf's original form. All plots show a β close to 2, which corresponds to α close to 1. Slope β increases slightly as n increases (so α decreases).
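The quantities behind plots of this kind can be sketched as follows. This is a minimal sketch with hypothetical function names; it tabulates g(i), the number of distinct items seen exactly i times, and converts between the two exponents using β = 1/α + 1. It is not the analysis code used for the smart home data.

```python
from collections import Counter


def frequency_of_frequencies(items):
    """Map each frequency i to g(i), the number of distinct items seen i times."""
    counts = Counter(items)                   # item -> number of occurrences
    return dict(sorted(Counter(counts.values()).items()))


def alpha_from_beta(beta):
    """Invert beta = 1/alpha + 1 (requires beta > 1)."""
    return 1.0 / (beta - 1.0)


def beta_from_alpha(alpha):
    """Exponent of the size-frequency form, given the rank-frequency exponent."""
    return 1.0 / alpha + 1.0


print(frequency_of_frequencies("a b a c a b d".split()))  # {1: 2, 2: 1, 3: 1}
print(alpha_from_beta(2.0))                               # 1.0
```

Because α = 1/(β - 1), a β slightly above 2 corresponds to an α slightly below 1.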
Conclusions. There appears to be a genetic mechanism at play, even in simple physical movement about the house. At least from one perspective (n-gram analysis), language and one type of action are remarkably similar. Many other human phenomena show power law behavior, either through internalization/externalization or purely internal mechanisms.
Discussion questions
1. Suggest another measure of human behavior that might show genetic dynamics, and research whether it shows power law behavior (do a web search). Be prepared to explain the genetic mechanism.
2. Discuss the freedom of the author given the statistical similarities of new texts to old ones.