Automatic tracking of tongue movement in speech utterances by a computer
In July 1973, I landed at Newark Airport with my father, one of the leading phonetics research scientists in the world. Just before the airplane skidded onto the runway, feeling a bit queasy, I looked toward the east and saw the Twin Towers, freshly built. I put my head on my father’s lap, and he stroked my head. I still remember the slight aroma of his cologne, mixed with the distinct musk of his breath. This aroma was one of the few things I knew about my father as a child. This is America, the towers seemed to say then, the land of technology and commerce.
My father was one of many pure research scientists recruited to work at Bell Telephone Laboratories, Murray Hill, New Jersey. He was to help create their Speech Sciences department, and ended up spending over 15 years there. The AT&T divestiture would split the company in 1984, ending an era of the highest accomplishments in pure research in modern times. He left then to head up the Speech and Hearing Sciences department at Ohio State University.
Max Mathews, my father’s boss at Bell Labs, picked us up that hot day from the airport. Out of the window of their boxy sedan, my eyes were filled with green: the trees and grass seemed so sumptuous and luxurious to me. I saw an old man with a Scottish corduroy hat with burgundy crossed patterns, mowing the lawn. I had never seen anyone mow a lawn in Japan. I stayed with the Mathews for the summer, while my parents got ready to move to New Providence, N.J. Though this land seemed so new, and the language seemed so foreign, I was merely reacquainting myself with the country of my birth.
When I was born in Boston, in 1960, my father was finishing up his postdoctoral work at the Research Laboratory of Electronics at M.I.T., working with Noam Chomsky. My father brought "generative grammar theory" to Tokyo University as a result. It seemed, though, that he had always planned on coming back to the States. Even with his tenured position at Tokyo University, he knew he would not be able to quell his mission to pursue pure research. He would become the first tenured professor at the prestigious university to abandon his post, to the outrage of many of his fellow academics.
He had a humble beginning at Tokyo University, though. As an undergraduate student, in post-war Japan, he almost failed out as a physics major. He caught the attention of one professor who was creating a new area of information science research, and he began to do rudimentary studies with him. As it turned out, that was the best milieu for my father to exercise his enormous creative and linguistic gifts.
He and my mother had another reason for coming to New Jersey. My brother and I were to be raised bilingually and biculturally. We were always told, even as small children, that eventually we would move back to the States. The move, my mother told us, would be a hard adjustment, but it would be better than staying in the regimented, exam-filled environment in Japan. My parents, it seemed, saw this as their primary mission for us.
For the first few years, I ended up earning pocket money by mowing lawns on bright Saturday mornings in New Providence. I liked the smell of the fresh-cut grass, the fragrance of American suburbia. But I would soon find out that I was highly allergic to grass. That discomfort, of course, did not stop me from working around the yards of my father’s colleagues, making neat crossing patterns on the lawns of New Jersey.
My father also hired me to create spectrographic prints as my summer job. I would put thermo-sensitive paper around the drum of the device. As the drum rotated, the voice data would translate itself into a visual dance in front of me, the needle jumping up and down like a seismic meter. The needle also made this faint scratching sound, and smelled like burning rubber. I made hundreds of these prints, so that my father could catalogue the patterns. He was working on a computer simulation of the tongue and mouth as an acoustic chamber.
Bell Labs was a bit like the Starship Enterprise. The main building had gigantic wings spreading over the green hills. The copper roofs of the building seemed to hover in the vast landscape, and they gleamed in the rain. When I went to help my father in the morning, I remember the hallway wings filled with conversation. Once, when I picked up my father at night, I was surprised how many scientists were still there so late. Bell Labs was the birthplace of many inventions, including the transistor and laser technology, not to mention the UNIX operating system and the C programming language. When my father worked there, there had already been seven Nobel Prizes given to Bell Labs scientists. I suppose at that level of intellect and creativity, a regimented time schedule would not be needed. Today we are still benefiting from what my father and his colleagues developed in the Seventies. We will, no doubt, soon run out of that Research & Development capital. Looking back, Bell Labs was one of the most creative zones in the world.
At night, Max and Marj hosted gatherings of researchers at their home, who often brought violins and played chamber music. Often Lillian Schwartz, one of the world’s first artists to use computer-morphed images (http://www.lillian.com), stopped by. She created a portrait of Lincoln with blocked shapes using computer analysis, which hung in Max’s living room. Max often took out a machine he had invented that turned violin music into trumpet blasts, delighting us. These things were just a normal part of the milieu that I grew up in. My brother and Max’s children talked about computer programming, and later attended colleges like M.I.T. and Dartmouth College.
On lazy summer afternoons, for lunch, Marj would often heat up Campbell’s tomato soup and make grilled cheese sandwiches. I’d never had those in Japan. I got used to them after a while, even learning to use the electric stove, and I would eat them before going out to the long tree swings set up in the front yard. I got rope burns on my hands swinging from a huge oak tree, a tree large enough that one imagines it being a witness to the American Revolution.
The scientists were convinced that within a decade they would be able to simulate human speech so completely as to be indistinguishable from actual voices. I avidly perused (I could not read English that well then) the Popular Mechanics magazines sitting around in the Mathews’ home, imagining what the world would be like by the time I was thirty. There was an assumed optimism in the air, symbolized by the World Trade Center, a twin confidence in information technology and commerce.
In Jeremy Bernstein’s “Three Degrees Above Zero: Bell Labs in the Information Age” (one of the first published copies of which my father signed and gave to me in 1983), he mentions my father’s young colleague:
Mitchell Marcus, a thirty-three-year-old linguist and computer scientist at the Linguistics and Speech Analysis Research Department of the Acoustical and Behavioral Research Center in Murray Hill, conjectured that it might be within ten or fifteen years. He noted, “We are going to have fairly soon enormously powerful machines with the hardware almost for free. They will be in our TV sets, in our Waring Blenders, and in our microwave ovens, and the right way to communicate with many of them will be to talk to them.” (pg. 49)
Mitchell was one of the young scientists my father recruited to work for him. Mitch left his post at MIT to come to work for my father. I asked my father what his hiring policy was. “You take your time to get only the best, and you give them complete freedom once they arrive.”
Mitch Marcus: “I just didn’t believe I would have carte blanche (at Bell). It took me awhile after I got here to realize that it really is true.” (pg. 66)
It is strange to think that while these things were going on, my father never tried to teach me, or attempted to groom me to understand scientific thinking. It seemed that he gave me and my brother carte blanche, too. It was not until I was in college that I started to inquire about his work. While he drove me from home to college (a three-hour ride from Bucknell University, Lewisburg, PA), we spoke about these things. I felt I had come to know my father as a person for the first time. Before then, he was a distant but generous figure, one that a teenager would look up to as brilliant, but a bit of a mystery. The abundant layers of green leaves bursting out on Route 80 echoed in my mind as we talked, as his generative world began to open up, layer by layer, to my awareness as an artist.
In the 1980s, in his early fifties, my father began to send a series of notes to his colleagues questioning the basic tenets of acoustics research, as he found them flawed and inadequate for the goals pursued. In my simplified understanding (an over-simplification, I am sure), what the early research assumed was that by segmenting speech patterns, you could have enough data to rebuild speech. It would be a bit like dissecting a frog, stitching it back together, and expecting it to jump again: a typical reductionist/modernist assumption.
My father’s Converter/Distributor theory (C/D theory) assumes that computer technology is now capable of using contextual patterns of speech and of simulating an architectural structure to account for the morphing of speech production. In contrast to the segmental approach, he calls his new thinking prosodic, as it accounts for the complexity of speech and language. But it would take years of research to get to a point of presenting his new ideas to the linguistics/phonetics community. My father, who had rarely had problems finding support for his research, was in for a battle. Many tenured professors, I am sure, found his simple claims rather threatening to their own assumptions. He could not find funding, and found himself fighting the establishment of the research world, the very establishment he had helped to build. After my father’s many futile attempts to secure funding for his new research, my brother, a successful entrepreneur in Silicon Valley, stepped in to fund a post for a graduate student at The Ohio State University, to help my father compile enough data to be able to begin his research.
After retiring from Ohio State, he began a research stint at the International Institute for Advanced Studies in Kyoto. Around that time he updated me on his recent findings and on a presentation he had been preparing for a major conference. I asked him what he was interested in pursuing for the next six months at the institute. He listed off five issues he was concerned with, including how we need to begin a movement toward a non-air-conditioned world (just plant more trees, he implores), re-examining the basic premise of Kanji representation in Japanese computer software, and the overlap of the architectural structure with evolutionary studies to account for the complexity and mystery of living beings. Apparently, he has not slowed down at all, even after his supposed “retirement.”
In Creativity in Science, Dean Keith Simonton notes: “Highly creative individuals are said to have a flat hierarchy of associations in comparison to the steep hierarchy of associations of those with low creativity.” (pg. 105, Creativity in Science, Cambridge University Press) And creative scientists display a definitive “associative richness” of divergent thinking. My father’s mind works in this divergence, noting the prosodic details of reality but often going beyond the normative associations. He has the ability to seek and detect underlying assumptions, a quality that Thomas Kuhn mentions in “The Structure of Scientific Revolutions,” an ability that is essential in major paradigm shifts in scientific thinking.
“Discovery commences,” Kuhn notes (pg. 52, The Structure of Scientific Revolutions, second edition, University of Chicago Press), “with the awareness of anomaly, i.e., with the recognition that nature has somehow violated the paradigm-induced expectations that govern normal science.”
During my recent collaboration with Susie Ibarra at Brecht Forum (http://www.makotofujimura.com/writings/refractions-32-emanuels-heartbeat/), I found myself thinking of my father as a great influence, an example of the creativity that leads to great paradigm shifts in the sciences and in the arts. “The awareness of anomaly” must exist in the process of creation, whether in the arts or in the sciences. Such awareness, for my father, extended into his relationship with his sons. When I told my father that I wanted to pursue being an artist full time, he immediately replied, “Oh, that’s what I wanted to be!” He even bought, for $100, a painting of mine that was selected for an invitational exhibit in Hartford, CT, in 1984, but I did not know about it until I visited his home in Ohio many years later. He was my first patron. To him, the “anomaly” of a son desiring to become an artist was a generative opportunity.
My father’s portrait is included in the collection of portraits in the Tokyo University hallways. But we need to note that it is a self-portrait, painted by my father himself. Yes, when asked by the University to hire a portrait artist, he hired himself! Such was the vision of a self-assured man whom I consider now to be an artist, as well as a scientist. Adventurous and even mischievous, my father continued into his late eighties to challenge the status quo. He began to lose his hearing later in life, and became frustrated that he could no longer enjoy string quartets because he could not hear the “timbre,” but he refused to wear hearing aids, as he disagreed with the assumptions behind the technology. Refracting through the layers of his research and his life is a resonance of his creative impulses, and the word “generativity” flows in my veins. In my art, too, his influence constantly flickers, like the needle of a spectrographic drum spinning around and around in front of my teenage eyes.