blogcutter

When I was studying for my Master of Library Science in the mid-1970s, there was a transition underway in libraries from "traditional" card catalogues to computerized ones that could be searched remotely from other computers or from "dumb" terminals. ISBD (International Standard Bibliographic Description) was in its infancy and the idea was that a specific combination of punctuation would pave the way for computers' "recognizing" the elements of bibliographic description so that even if you didn't understand the language of the work in question, you (and the computer) could identify the most important elements of the work - title, author, place of publication, publisher and date published (to list just the basics). I found the whole field quite fascinating.
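To make the punctuation idea concrete, here is a minimal sketch in Python of how a program might pick out those elements from punctuation alone. It assumes a much-simplified ISBD-style layout ("Title / Author. -- Place : Publisher, Date"), with the dash typed as two hyphens; the function name `parse_isbd` and the sample record are invented for illustration, and real ISBD records have many more areas and exceptions than this handles.

```python
def parse_isbd(record):
    """Split a simplified ISBD-style string using punctuation alone."""
    # Areas are separated by ". -- " (point, space, dash, space).
    title_area, publication_area = record.split(". -- ", 1)
    # " / " introduces the statement of responsibility (the author).
    title, _, author = title_area.partition(" / ")
    # " : " introduces the publisher; the last ", " introduces the date.
    place, _, rest = publication_area.partition(" : ")
    publisher, _, date = rest.rpartition(", ")
    return {
        "title": title.strip(),
        "author": author.strip(),
        "place": place.strip(),
        "publisher": publisher.strip(),
        "date": date.strip(),
    }


# An invented record: no knowledge of the language is needed to parse it.
SAMPLE = "Przykladowa ksiazka / Jan Kowalski. -- Warszawa : Wydawnictwo Przykladowe, 1975"
fields = parse_isbd(SAMPLE)
for name, value in fields.items():
    print(f"{name:10} {value}")
```

The parser never looks at the words themselves, only at the punctuation around them, which is the whole point of the scheme.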

In terms of subject access, we were in transition between (on the one hand) "controlled vocabulary" subject headings and (on the other hand) "descriptors", which were typically KWIC (keyword-in-context) or KWOC (keyword-out-of-context); those descriptors, I should add, could still be "controlled vocabulary" as many were drawn from some sort of thesaurus possessing its own infrastructure of broader, narrower and related terms.
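For anyone who has never seen one, the difference between the two index types can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the stopword list, the function names `kwic` and `kwoc`, and the sample titles are all made up, and real indexes were considerably more elaborate.

```python
STOPWORDS = {"a", "an", "and", "in", "of", "on", "the", "to"}


def kwic(titles, width=25):
    """KWIC: each keyword shown in place, lined up in a centre column."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOPWORDS:
                continue
            left = " ".join(words[:i])
            right = " ".join(words[i + 1:])
            entries.append((word.lower(), left[-width:], word, right[:width]))
    entries.sort(key=lambda e: e[0])  # file alphabetically by keyword
    return [f"{left:>{width}}  {word.upper()}  {right}"
            for _, left, word, right in entries]


def kwoc(titles):
    """KWOC: each keyword pulled out as a heading, followed by the full title."""
    entries = []
    for title in titles:
        for word in title.split():
            if word.lower() not in STOPWORDS:
                entries.append((word.lower(), title))
    return sorted(entries)


TITLES = ["Indexing of chemical literature", "Literature searching in libraries"]
kwic_lines = kwic(TITLES)
kwoc_entries = kwoc(TITLES)
print("\n".join(kwic_lines))
for keyword, title in kwoc_entries:
    print(f"{keyword.upper():12} {title}")
```

Neither function knows anything about meaning: every non-stopword in a title becomes an access point, which is exactly why such descriptors count as "uncontrolled" unless a thesaurus sits behind them.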

Fast-forward to the Internet Age. Some people now believe that if you can't get to it through Google, then it doesn't exist. Even Google has been evolving. I used to always go to the "Advanced" settings and adjust the number of "hits" per page to the maximum of 100 (instead of 10). Now you're not even given that option. And the capacity to combine search terms through Boolean operators seems to me to have been watered down as well. All that before you even begin to consider the oft-insidious matter of paid ads (both overt and covert) and kickbacks offered by private businesses for getting their sites regularly listed in the top 10 "hits", often with seemingly little relevance to the search terms entered.

The more they gear search engines to "natural" language, the more opaque those search engines become, because "natural" language is notoriously vague and fuzzy. If you want a language that's relatively specific, you should look to a dead language like Latin or ancient Greek, or an artificial language like Esperanto or Eurolang.

This merits another blog in itself. Adult learners rely mainly on DEDUCTIVE reasoning in learning a second or subsequent non-native language - but more on that later (in this and probably future blog entries).

Fact is, you don't really know if a search engine is defaulting to the Boolean "AND" operator, the (inclusive) "OR" operator or something else. You don't know if it's including both British and American (not to mention Canadian) English spelling or not. And then there's the whole area of truncation and "wildcards".
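A toy example in Python shows how much rides on those hidden defaults. Everything here is an assumption for the sake of illustration: the three-document "database", the `search` function, and the convention that a trailing asterisk means right-hand truncation. It is not how any particular search engine works.

```python
import re

# A made-up collection mixing British and American spellings.
DOCS = {
    1: "colour plates in the library catalogue",
    2: "color theory for librarians",
    3: "card catalog maintenance",
}


def term_to_regex(term):
    """Turn one search term into a whole-word pattern."""
    if term.endswith("*"):
        # Trailing "*" = truncation: "librar*" matches library, librarians...
        return re.compile(r"\b" + re.escape(term[:-1]) + r"\w*", re.IGNORECASE)
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)


def search(query, docs, default_op="AND"):
    """Return ids of documents matching all (AND) or any (OR) of the terms."""
    patterns = [term_to_regex(t) for t in query.split()]
    combine = all if default_op == "AND" else any
    return sorted(doc_id for doc_id, text in docs.items()
                  if combine(p.search(text) for p in patterns))


print(search("colour catalog", DOCS, "AND"))    # no single document has both spellings
print(search("colour catalog", DOCS, "OR"))
print(search("librar* catalog*", DOCS, "AND"))
print(search("librar* catalog*", DOCS, "OR"))
```

The same two words return nothing under an implicit AND and two documents under an implicit OR, and neither search finds the American "color" at all. Unless the interface tells you which rules it applies, you have no way of knowing what you missed.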

It may seem that search interfaces are more "sophisticated" if you can interact with them in "natural" language. But "natural" (i.e. living, and real vs. artificial) languages are notoriously imprecise, as I already pointed out. So really, we need both "controlled vocabulary" and uncontrolled vocabulary (e.g. KWIC and KWOC indexes and synonyms and current lingo) in order to get the maximum utility from search interfaces.
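Here is a rough Python sketch of how the two kinds of vocabulary can work together. The thesaurus entry, the "use" references and the function name `expand` are all invented for illustration: the searcher types whatever word comes to mind, a "use" reference leads to the preferred heading, and the heading in turn brings in its narrower terms and the other everyday synonyms.

```python
# Hypothetical thesaurus: one preferred heading with broader (BT),
# narrower (NT) and related (RT) terms.
THESAURUS = {
    "motion pictures": {
        "BT": ["performing arts"],
        "NT": ["silent films"],
        "RT": ["cinematography"],
    },
}

# Hypothetical "use" references from current lingo to the preferred heading.
USE = {
    "movies": "motion pictures",
    "films": "motion pictures",
    "flicks": "motion pictures",
}


def expand(term):
    """Expand one search term into the heading, its narrower terms and its synonyms."""
    key = term.lower()
    preferred = USE.get(key, key)  # unknown terms pass through unchanged
    entry = THESAURUS.get(preferred, {})
    synonyms = sorted(s for s, p in USE.items() if p == preferred)
    return [preferred] + entry.get("NT", []) + synonyms


print(expand("flicks"))
print(expand("Esperanto"))
```

A term the thesaurus has never heard of still gets searched as typed, so the controlled vocabulary adds precision without shutting out new or unanticipated words.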

I remember how whole scholarly articles used to be written about how to sidestep the dreaded $1 Dialog Print Fee. Librarians got very good at constructing detailed and highly specific search expressions to get optimal results for their clients, before resorting to actually printing anything (and in the 1970s, we would print out only a handful of references and then have the rest printed remotely in California and mailed to us via snail-mail). Once we were ready to print, we would get TWO copies printed and mailed to us - one to give to the client, the other to put in our files, with our own quick-and-dirty card-indexing system, in case another client wanted something similar. Nowadays, some folks would probably argue that that was a copyright violation. But back then, we reasoned that we had bought the information fair and square, with taxpayers' money, and if we could re-use it and fully recoup our investment, why not? Besides, it meant that the library became a sort of node for researchers - if other groups were working on a similar project, we could put them in touch with each other so that they didn't reinvent the wheel. We offered a similar type of service if clients needed scientific articles translated into English or French - we'd tap into CISTI's central file and if the article had already been translated, there was no need to get it translated again.

In those days, you got the librarian to do the search for you, sometimes while you looked on and offered helpful suggestions. You and the librarian were effectively research partners. Nowadays, we live in a self-service era. We have "site licences" for the major online database aggregators. So all the employees of a given organization have hot-and-cold-running Dialog, Lexis-Nexis, and so forth - and we think this is an improvement. But in the olden days, those same employees would have come to the library, had the search done FOR them, and the charges would have been per hour (making them LOOK misleadingly high), not per person served. And since it was only a handful of actual persons doing the searches (on behalf of a much larger group), and these were people who did this kind of searching every day, who prepared their strategies in advance and got into and out of the databases in record time (and whose salaries, in those pre-pay equity days, were quite a bit lower than those of most doctors and scientific researchers), the charges would have been considerably less.

Now I'll put in my plug for (second, third, etc.) language learning in adulthood. There's this odd notion that you don't learn languages as easily once you've passed adolescence. I beg to differ. Babies and young children generally learn language through INDUCTIVE reasoning. They hear a bunch of utterances or examples of speech and then they unconsciously extrapolate from that to be able to form utterances they have never heard before. It all comes down to the Saussurean notion of "langue" (the SYSTEM or INFRASTRUCTURE of language) vs. "parole" (specific instances of speech or text), or, in Chomsky's terms, "competence" vs. "performance". But for young children, learning language is at least a full-time (if not a 24/7) job! Even then, they typically take at least five or six years to become reasonably competent in speaking their native language. Adults do not have the luxury of devoting all their time to learning a second or additional language, so they use DEDUCTIVE reasoning (which means learning traditional prescriptive grammar and usage and then extrapolating from that). It's not completely that simple, of course. We actually use a variety of deductive and inductive processes when we learn another language. But to say that inductive is better than deductive, or that descriptive is better than prescriptive, is obviously nonsense and fails to capture the complex processes of the human mind.

I'd like to see the prescriptivists and the controlled-vocabulary advocates gain more traction in the modern world. But perhaps I'm fighting a losing battle.


Jan. 7th, 2013

On the opacity of search engines, the vaguery of "natural" language and other musings
