blogcutter

Looking back over the past fourteen months, it's hard to imagine how we would have coped if we didn't have reliable internet service. Time was, people without a proper computer set-up at home could at least flock to their local public library and book a time to use the computers there. But with most libraries, schools, community centres and the like shuttered because of the pandemic, the options are much more limited. These institutions can now only loan out laptops and such to one family at a time and realistically, with our lives moved almost entirely online, even one computer to a household is likely insufficient.

So this week, I decided to direct my donation to the National Capital Freenet's Community Access Fund:

https://www.ncf.ca/en/high-speed-internet/community-access-fund/

Not sure to what extent it will help people in rural and remote areas where the infrastructure is in many cases inadequate, but initiatives such as this one are at least a step towards helping some low-income folk to tune in to the online world.

Current Mood: content
Current Location: home

How do you select a good internet search engine? Or do you just go with the flow and use whatever the default or requisite search engine is for your particular device and set-up? Those younger than a certain age likely will not remember the pre-web days of ftp and telnet and archie and veronica searches. If they know something about those days and those techniques, they may be inclined to dismiss them as primitive or archaic. And yet, there was a real value in forcing users to pin down what they were really looking for and refine their search strategy.

Ideally we should be able to meld the skills that humans excel at with those tasks that computers do best to form a kind of super-human skill set or tool kit. And as humans, let's not abdicate the responsibility for using our human judgement to pick the right tool for the job at hand!

So far, I have yet to find the ideal search engine, or preferably a cluster of them, along the lines of the metasearch sites that let you use several search engines simultaneously. I do sometimes find it helpful to read review articles that rate search engines according to certain criteria. There are plenty out there although with the search tool landscape constantly changing and evolving, I like to check regularly and find the most recent reviews. Here's one example:

https://www.lifewire.com/best-search-engines-2483352

For any kind of in-depth research, I consider it absolutely vital to be able to "look under the hood" and determine as far as I can the search algorithm that applies. That means at a minimum some sort of "advanced search" functionality. Can you use Boolean logic and wildcards and date ranges? What exactly is being trawled here? Titles, keywords, controlled subject headings or categories, full text? Can you weight your search terms relative to each other? Can you search by format? Will it tell you about content behind a paywall, even if further steps are needed to actually access that content? Who is behind the search engine? A purely commercial enterprise? An academic institution? A government department? Is it paid by other interests to advertise their products or services near the beginning of your search results, regardless of their relevance?

Then of course there's the problem of privacy and security and what they do with your information. DuckDuckGo has gained a respectable following from stating that it doesn't track you. And I do find it useful for quick, simple searches. But it's a bit of a one-trick pony as far as I can see. It's very opaque; I don't really know what-all it's searching and what search operators it understands. There's no "advanced search" capability that I've been able to find. I do sometimes read its news releases but they're very U.S.-oriented and again, they focus only on the privacy aspect, not on any features that might make them particularly efficient at teasing out appropriate and relevant search results.

I'll leave it at that for today. For once, this entry is not primarily about Covid-19, although most of us are probably spending much more time online (and less in bricks-and-mortar research facilities like libraries) since the pandemic swept into our lives. And it IS very topical too, given that the U.S. Department of Justice is taking Google to court.

Current Mood: contemplative
Current Location: home library
Current Music: The Searchers

When I was studying for my Masters of Library Science in the mid-1970s, there was a transition underway in libraries from "traditional" card catalogues to computerized ones that could be searched remotely from other computers or from "dumb" terminals. ISBD (International Standard Bibliographic Description) was in its infancy and the idea was that a specific combination of punctuation would pave the way for computers' "recognizing" the elements of bibliographic description so that even if you didn't understand the language of the work in question, you (and the computer) could identify the most important elements of the work - title, author, place of publication, publisher and date published (to list just the basics). I found the whole field quite fascinating.

In terms of subject access, we were in transition between (on the one hand) "controlled vocabulary" subject headings and (on the other hand) "descriptors", which were typically KWIC (keyword-in-context) or KWOC (keyword-out-of-context); those descriptors, I should add, could still be "controlled vocabulary" as many were drawn from some sort of thesaurus possessing its own infrastructure of broader, narrower and related terms.
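For anyone who has never seen one, the difference between the two index styles is easy to show on a couple of made-up titles. What follows is only a rough sketch: the function names, the stopword list and the sample titles are my own inventions for illustration, not any standard or any particular library system's method. A KWIC index shows each keyword in place with the words around it, while a KWOC index pulls the keyword out as a heading and lists the full titles beneath it.

```python
# Illustrative only: STOPWORDS, kwic_index and kwoc_index are hypothetical names.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to"}
PUNCTUATION = '.,;:"()'


def kwic_index(titles, width=30):
    """Keyword-in-context: one line per keyword, shown with its surrounding words."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            key = word.strip(PUNCTUATION).lower()
            if not key or key in STOPWORDS:
                continue
            left = " ".join(words[:i])
            right = " ".join(words[i + 1:])
            # Keep at most `width` characters of context on each side.
            entries.append((key, left[-width:].rjust(width), word, right[:width]))
    entries.sort(key=lambda entry: entry[0])  # alphabetical by keyword
    return [f"{left} | {word} {right}".rstrip() for _, left, word, right in entries]


def kwoc_index(titles):
    """Keyword-out-of-context: keyword as heading, full titles listed under it."""
    index = {}
    for title in titles:
        for word in title.split():
            key = word.strip(PUNCTUATION).lower()
            if key and key not in STOPWORDS:
                bucket = index.setdefault(key, [])
                if title not in bucket:
                    bucket.append(title)
    return dict(sorted(index.items()))


if __name__ == "__main__":
    sample = ["Cataloguing the Library", "Library Automation in Canada"]
    for line in kwic_index(sample):
        print(line)
    for keyword, found in kwoc_index(sample).items():
        print(keyword.upper(), "->", found)
```

Neither index is "controlled" in this sketch; the keywords are simply whatever words happen to appear in the titles, which is exactly why a thesaurus of broader, narrower and related terms was so useful alongside them.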

Fast-forward to the Internet Age. Some people now believe that if you can't get to it through Google, then it doesn't exist. Even Google has been evolving. I used to always go to the "Advanced" settings and adjust the number of "hits" per page to the maximum of 100 (instead of 10). Now you're not even given that option. And the capacity to combine search terms through Boolean operators seems to me to have been watered down as well. All that before you even begin to consider the oft-insidious matter of paid ads (both overt and covert) and kickbacks offered by private businesses for getting their sites regularly listed in the top 10 "hits", often with seemingly little relevance to the search terms entered.

The more they gear search engines to "natural" language, the more opaque those search engines become. Because "natural" language is notoriously vague and fuzzy. If you want a language that's relatively specific, you should look to a dead language like Latin or ancient Greek, or an artificial language like Esperanto or Eurolang.

This merits another blog in itself. Adult learners rely mainly on DEDUCTIVE reasoning in learning a second or subsequent non-native language - but more on that later (in this and probably future blog entries).

Fact is, you don't really know if a search engine is defaulting to the Boolean "AND" operator, the (inclusive) "OR" operator or something else. You don't know if it's including both British and American (not to mention Canadian) English spelling or not. And then there's the whole area of truncation and "wildcards".
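To show how much rides on those hidden defaults, here is a toy example. It is a minimal sketch and does not describe how any real search engine works: the three sample records, the function names and the rule that a trailing asterisk means truncation are all assumptions made up for the purpose.

```python
import re

# Hypothetical sample records, invented for illustration.
DOCS = {
    1: "Card catalogue automation in public libraries",
    2: "Online catalog searching for librarians",
    3: "Colour printing costs",
}


def term_to_regex(term):
    """Whole-word match, unless the term ends in * (treated here as truncation)."""
    if term.endswith("*"):
        return re.compile(r"\b" + re.escape(term[:-1]) + r"\w*", re.IGNORECASE)
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)


def search(query, docs, default_op="AND"):
    """Return the ids of matching records, combining terms with AND or inclusive OR."""
    terms = query.split()
    combine = all if default_op == "AND" else any
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if combine(term_to_regex(term).search(text) for term in terms)
    )


if __name__ == "__main__":
    print(search("catalog librar*", DOCS, default_op="AND"))
    print(search("catalog librar*", DOCS, default_op="OR"))
    print(search("catalog*", DOCS))
```

With these made-up records, the same two-word query retrieves only record 2 when the default is AND but records 1 and 2 when it is OR. The American spelling "catalog" on its own misses the British and Canadian "catalogue" in record 1, while the truncated "catalog*" picks up both.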

It may seem that search interfaces are more "sophisticated" if you can interact with them in "natural" language. But "natural" (i.e. living, and real vs. artificial) languages are notoriously imprecise, as I already pointed out. So really, we need both "controlled vocabulary" and uncontrolled vocabulary (e.g. KWIC and KWOC indexes and synonyms and current lingo) in order to get the maximum utility from search interfaces.

I remember how whole scholarly articles used to be written about how to sidestep the dreaded $1 Dialog Print Fee. Librarians got very good at constructing detailed and highly specific search expressions to get optimal results for their clients, before resorting to actually printing anything (and in the 1970s, we would print out only a handful of references and then have the rest printed remotely in California and mailed to us via snail-mail). Once we were ready to print, we would get TWO copies printed and mailed to us - one to give to the client, the other to put in our files, with our own quick-and-dirty card-indexing system, in case another client wanted something similar. Nowadays, some folks would probably argue that that was a copyright violation. But back then, we reasoned that we had bought the information fair and square, with taxpayers' money, and if we could re-use it and fully recoup our investment, why not? Besides, it meant that the library became a sort of node for researchers - if other groups were working on a similar project, we could put them in touch with each other so that they didn't reinvent the wheel. We offered a similar type of service if clients needed scientific articles translated into English or French - we'd tap into CISTI's central file and if the article had already been translated, there was no need to get it translated again.

In those days, you got the librarian to do the search for you, sometimes while you looked on and offered helpful suggestions. You and the librarian were effectively research partners. Nowadays, we live in a self-service era. We have "site licences" for the major online database aggregators. So all the employees of a given organization have hot-and-cold-running Dialog, Lexis-Nexis, and so forth - and we think this is an improvement. But in the olden days, those same employees would have come to the library, had the search done FOR them, and the charges would have been per hour (making them LOOK misleadingly high), not per person served. And since it was only a handful of actual persons doing the searches (on behalf of a much larger group), and these were people who did this kind of searching every day, who prepared their strategies in advance and got into and out of the databases in record time (and whose salaries, in those pre-pay equity days, were quite a bit lower than those of most doctors and scientific researchers), the charges would have been considerably less.

Now I'll put in my plug for (second, third, etc.) language learning in adulthood. There's this odd notion that you don't learn languages as easily once you've passed adolescence. I beg to differ. Because babies and young children generally learn language through INDUCTIVE reasoning. They hear a bunch of utterances or examples of speech and then they unconsciously extrapolate from that to be able to form utterances they have never heard before. It all comes down to the Saussurean notion of "langue" (the SYSTEM or INFRASTRUCTURE of language) vs. "parole" (specific instances of speech or text), or in Chomsky's terms, "competence" vs. "performance". But for young children, learning language is at least a full-time (if not a 24/7) job! Even then, they typically take at least five or six years to become reasonably competent in speaking their native language. Adults do not have the luxury of devoting all their time to learning a second or additional language, so they use DEDUCTIVE reasoning (which means learning traditional prescriptive grammar and usage and then extrapolating from that). It's not completely that simple, of course. We actually use a variety of deductive and inductive processes when we learn another language. But to say that inductive is better than deductive, or that descriptive is better than prescriptive, is obviously nonsense and fails to capture the complex processes of the human mind.

I'd like to see the prescriptivists and the controlled-vocabulary advocates gain more traction in the modern world. But perhaps I'm fighting a losing battle.

Current Mood: cynical
Current Location: my home library
Current Music: "Madam Librarian" from the Music Man


Entries tagged with internet

Philanthropic Phriday 62: Freenet's Community Access Fund

Dismantling the Google-plex, or, Search Engineering 101

On the opacity of search engines, the vaguery of "natural" language and other musings