tutorial

Solr for Drupal Developers, Part 1: Intro to Apache Solr

It's common knowledge in the Drupal community that Apache Solr (like other text-optimized search engines such as Elasticsearch) blows database-backed search out of the water in terms of speed, relevance, and functionality. But most developers don't really know why, or just how much an engine like Solr can help them.

I'm going to be writing a series of blog posts on Apache Solr and Drupal. While some parts of the series will be very Drupal-centric, I hope to illuminate why Solr itself (and other search engines like it) is so effective, and why you should use it instead of simple database-backed search (the kind Drupal core's Search module uses by default), even for small sites where search isn't a primary feature.

As an aside, I'm writing this series from the perspective of a Drupal developer who has worked on large-scale, highly customized Solr search for Mercy (example), and with a variety of small-to-medium sites that use Hosted Apache Solr, a service I've been running as part of Midwestern Mac since early 2011.

Why Not a Database?

Apache Solr's wiki leads off its Why Use Solr page with the following:

If your use case requires a person to type words into a search box, you want a text search engine like Solr.

At a basic level, databases are optimized for storing and retrieving bits of data, usually either a record at a time or in batches. And relational databases like MySQL, MariaDB, PostgreSQL, and SQLite are set up so that data is stored in various tables and fields, rather than in one large bucket per record.

In Drupal, a typical node entity will have a title in the node table, a body in the field_data_body table, maybe an image with a description in another table, an author whose name is in the users table, etc. Usually, you want to allow users of your site to enter a keyword in a search box and search through all the data stored across all those fields.

Drupal's Search module avoids making ugly and slow search queries by building an index of all the search terms on the site, and storing that index inside a separate database table, which is then used to map keywords to entities that match those keywords. Drupal's venerable Views module will even enable you to bypass the search indexing and search directly in multiple tables for a certain keyword. So what's the downside?
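To make the idea of a search index concrete, here is a minimal sketch of an inverted index in Python: it maps each keyword to the set of entity IDs whose text contains it, so a keyword lookup becomes a dictionary access rather than a scan of every row. This illustrates the general technique under simplifying assumptions (lowercase tokenizing, no stemming, no relevance scoring); it is not how Drupal's Search module actually stores its index, and the sample `nodes` data is hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Simplifying assumption: lowercase, then split on anything that isn't a letter or digit.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each keyword to the set of document (entity) IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return IDs of documents containing every keyword in the query (AND search)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical nodes: title, body, and author name already flattened into one string,
# roughly what an indexer does when it gathers fields stored in several tables.
nodes = {
    1: "Intro to Apache Solr. Body text about search engines. Author: jeff",
    2: "Standing desk build. Body text about workspaces. Author: jeff",
    3: "Solr and Drupal. Body text about faceted search. Author: admin",
}

index = build_index(nodes)
print(sorted(search(index, "solr search")))  # [1, 3]
```

Real search engines like Solr layer stemming, relevance scoring, and much more on top of this basic structure.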

Mainly, performance. Databases are built to be efficient query engines—provide a specific set of parameters, and the database returns a specific set of data. Most databases are not optimized for arbitrary string-based search. Queries where you use LIKE '%keyword%' are not that well optimized, and will be slow—especially if the query is being used across multiple JOINed tables! And even if you use the Search module or some other method of pre-indexing all the keyword data, relational databases will still be less efficient (and require much more work on a developer's part) for arbitrary text searches.
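One way to see this for yourself is to ask the database how it plans to run a query. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `node` table (SQLite stands in for MySQL here, and the exact plan wording varies between database engines and versions). Even with an index on `title`, a leading-wildcard `LIKE '%keyword%'` query can't use that index to narrow anything down, so SQLite reports a scan, while an exact-match lookup on the same column searches the index directly.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a query, joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row has four columns; the last one describes the access strategy.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE INDEX idx_node_title ON node (title)")
conn.executemany(
    "INSERT INTO node (title) VALUES (?)",
    [("Intro to Apache Solr",), ("Standing desk build",), ("Solr and Drupal",)],
)

# Leading wildcard: the index can't narrow the search, so every row (or index entry) is checked.
like_plan = query_plan(conn, "SELECT nid FROM node WHERE title LIKE ?", ("%solr%",))

# Exact match: the database can jump straight to the matching entries via the index.
exact_plan = query_plan(conn, "SELECT nid FROM node WHERE title = ?", ("Solr and Drupal",))

print("LIKE '%solr%':", like_plan)
print("title = ?:    ", exact_plan)
```

Multiply that full scan across several JOINed tables and thousands of rows, and the slowdown described above becomes easy to picture.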

If you're simply building lists of data based on very specific parameters (especially where the conditions for your query all utilize speedy indexes in the database), a relational database like MySQL will be highly effective. But usually, for search, you don't just have a couple options and maybe a custom sort—you have a keyword field (primarily), and end users have high expectations that they'll find what they're looking for by simply entering a few keywords and clicking 'Search'.

How to Build a Simple $50 Standing Desk

Detail of standing desk surface

I'm no stranger to experimenting with my workspace; since I work at a computer for at least 8 hours a day, I try to find ways to prevent RSI and joint pain. I've tried almost everything, from über-expensive fancy mesh desk chairs to ergonomic keyboards and vertical mice. But nothing has made as large (and quick) a difference as working from a standing desk.

Checklist for Setting up a CentOS 6 LAMP Server

I have to set up a new LAMP server for different clients here and there, but not with enough frequency to warrant using a particular scripted solution or 'stack' from a particular hosting company. Plus, I like to have a portable solution that is flexible to the needs (and constraints) of a client's website.

Note on hosting providers: For hosting, I've used a very wide variety of hosts. I typically use and recommend Hot Drupal VPSes or Linode VPSes [affiliate link] running CentOS for a good LAMP server. Shared servers are only good for nonessential or low-traffic sites, but they are a bit cheaper and easier to use for simpler needs!

So, here's a typical step-by-step process for how I set up a CentOS 6 server (the process is similar for CentOS 5) for LAMP (Linux, Apache, MySQL, and PHP), often for low-to-moderate-traffic Drupal sites (one or many):

Minecraft Patching Guide for Macs

I've watched a few episodes of 'The Minecraft Project' on YouTube for inspiration, and I occasionally play Minecraft for an hour or two as a diversion (it's like LEGOs on a computer, but much more fun, because there are zombies!).

My humble little Minecraft farm.

One thing I've always liked is The Minecraft Project's look and feel, mostly due to syndicate's use of the DokuCraft Light texture pack. However, getting that texture pack to work along with other mods and patches (especially the automatic tool switcher mod) took some work on my Mac, and I thought I'd post my process for getting everything to work here, for the benefit of others having the same troubles (especially those getting the 'Use the patcher noob' messages where water, lava, etc. are supposed to appear):

How to Repair Your Intel iMac — DIY Guide from Lifeisaprayer.com

Over on Lifeisaprayer.com, I posted a detailed tutorial/guide on how to replace the hard drive inside a 24" Intel iMac with an aluminum enclosure (the process is similar on other aluminum iMacs). It's a rather intricate process, so in addition to a few illustrations, I posted a video of the process on YouTube (it's embedded over on Lifeisaprayer.com as well!).

Intel iMac Teardown and Hard Drive Replacement - DIY/Guide

 

iMac - Intel - Guts exposed

Have fun repairing your iMac! (Please be sure to leave comments on the Lifeisaprayer.com post, and not here).

Simple Steps to Protect Your Online Identity/Data

[Update: Back when this was written, very nice password managers like 1Password and LastPass didn't exist or were not very capable of managing passwords as well as they are today—please ignore the advice below and use a password manager to generate very long, random passwords, and use the password manager instead of memorizing anything.]
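For the curious, the sketch below shows roughly what a password manager's generator does, using Python's `secrets` module (which is intended for cryptographically secure randomness). The 24-character default length and the character set are assumptions chosen for illustration, not the settings of any particular password manager.

```python
import secrets
import string

# Assumed character set: letters, digits, and punctuation. Some sites reject certain
# symbols, which is one reason password managers let you customize this.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 24, alphabet: str = ALPHABET) -> str:
    """Return a random password of `length` characters drawn from `alphabet`."""
    if length < 1:
        raise ValueError("length must be positive")
    return "".join(secrets.choice(alphabet) for _ in range(length))


# Generate a separate password for each site instead of reusing one everywhere.
for site in ("example.com", "example.org"):
    print(site, generate_password())
```

The point is the same one the update makes: let software produce long, random, unique passwords, and let the password manager remember them.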

Every month or so, another scary story about a huge security compromise (a.k.a. a hack) surfaces on the Internet, and this month is no exception. Earlier this month, the whole Twitter corporate hierarchy had a lot to worry about: a hacker (that's kind of a misnomer... hackers are usually nothing more than persistent, patient, and sly computer users) accessed many Twitter employees' email, iTunes, Google, and other accounts, all because one of the employees (probably not the only one, though) left an open door through a few small security missteps.

The hacker, after gathering tons of personal information from all over the web, was able to recover a user's Gmail password by guessing the answers to a few personal questions Gmail asks on the password recovery form (e.g., "Who was your favorite actor?" or "What is your maiden name?"). Then the hacker simply searched through the user's emails for something like "username password," because he knew that a lot of websites (like the Joomla! forums, some gaming sites, online stores, etc.) send an email upon new user registration that contains the person's username and password. Once the hacker got hold of a few more passwords this way, he was on his way to 'hacking' all of the user's accounts... because, like most people online, the user had only one or maybe two passwords he used for everything.

...but using the same password for multiple sites/services isn't necessarily a bad thing. Not if you follow these steps:

How to Save 20 Watts while Running an iMac (or another Mac)

Something you don't think about every day, but that could save you enough change to buy a Big Gulp every now and then: you can take a few simple steps to drastically reduce the amount of power your computer consumes, especially when you're doing many things at once with multiple hard drives and the screen at full brightness!

This article is written specifically for the 24" iMac (late 2008), but applies to pretty much any Mac that uses electricity (read: ALL of them). By following the steps in this article, you can save a bit of power, which translates into saving a small amount of change each month. And who wouldn't like a few extra nickels in this economy?

The Discovery

I recently purchased the APC Back-UPS NS 1250, and one of the most amazing features of the UPS is the ability to see how many watts are being actively consumed by a device plugged into it.

I found the results of my testing quite interesting. With the screen at full brightness, the iMac was drawing about as much power as an old 100-watt tungsten (i.e., 'energy sucker') light bulb! I don't usually run the screen that bright, though, because the lighting in my computer room is typically subdued. So I turned the brightness all the way down (a comfortable level for my vision) and looked again. This time, the computer was using about 75 watts. NICE!
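To put those two readings in money terms, here is a quick back-of-the-envelope calculation. Only the ~100 W and ~75 W figures come from the UPS readings above; the 8 hours of use per day, 30-day month, and $0.12 per kWh electricity rate are assumptions, so plug in your own numbers.

```python
def monthly_savings(watts_saved: float, hours_per_day: float, days: int,
                    price_per_kwh: float) -> tuple[float, float]:
    """Return (kWh saved, money saved) over the given number of days."""
    kwh = watts_saved * hours_per_day * days / 1000  # watt-hours -> kilowatt-hours
    return kwh, kwh * price_per_kwh


# Measured on the UPS: ~100 W at full brightness, ~75 W with the screen dimmed.
watts_saved = 100 - 75
# Assumptions (not measured): 8 hours of use per day, 30 days, $0.12 per kWh.
kwh, dollars = monthly_savings(watts_saved, hours_per_day=8, days=30, price_per_kwh=0.12)
print(f"{kwh:.1f} kWh saved, about ${dollars:.2f} per month")  # 6.0 kWh saved, about $0.72 per month
```

Under those assumptions, dimming the screen alone saves about 6 kWh a month: not a fortune, but those are the nickels this article is after.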

iMac Power Chart (in Watts)
(Big bright chart for visual learners).

Backup Strategy for Mac OS X Using Disk Utility, Carbon Copy Cloner, etc.

A blast from the past! The following article is from one of my first websites, ca. 1999, and was updated a couple of times throughout its history. I am re-posting it here because my old website will be deprecated quite soon.

A few notes before we begin: since this article was written, Time Machine came into being (along with Mac OS X 10.5) and has revolutionized the way I maintain backups. My scheme now is a daily local Time Machine backup to an external hard drive (I recommend a simple 1-2 TB external USB hard drive), plus a once-a-month DVD backup (stored offsite) of my most important files. For most home and small business users, this should be adequate.

Another revolution in data backup is the idea of backing up 'to the cloud.' With the prevalence of broadband Internet access and the plethora of options for online storage, many companies now offer online backup solutions that were only dreamt of back in the late nineties. Some solutions I recommend: MobileMe (what I use, but not for everyone), Mozy, BackJack, and JungleDisk. (No, those aren't referral links; would I try pulling that on you?)

Backup Strategies for OS X

A question often asked on the Apple Discussion boards and by my fellow Mac users is: "How and when should I back up my Mac, and what is the best, quickest, and most reliable way to do it?" This is a complicated question, as there are many different ways to go about backing up OS X.

There are three basic ways that I would like to cover in this article:

  1. Using Disk Utility to quickly and easily make a complete, bootable backup to an external drive;
  2. Using Carbon Copy Cloner to either (a) do the same thing as Disk Utility, or (b) to clone a certain folder or group of folders (another program that does a great job is SuperDuper!);
  3. Drag-and-drop copy files and folders for a quick backup of important files.
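The third option, copying important folders by hand, is also easy to script. The sketch below is a hypothetical Python version that uses `shutil.copytree` to copy a list of folders into a dated folder on a backup drive; the function name and paths are placeholders. Unlike Disk Utility or Carbon Copy Cloner, it does not produce a bootable clone, and Python's documentation notes that resource forks and some other Mac-specific metadata are not copied.

```python
import shutil
from datetime import date
from pathlib import Path


def backup_folders(sources: list[Path], backup_root: Path, stamp: str = "") -> Path:
    """Copy each source folder into backup_root/<stamp>/<folder name>; return the dated folder."""
    stamp = stamp or date.today().isoformat()  # defaults to today's date as YYYY-MM-DD
    dest_root = backup_root / stamp
    dest_root.mkdir(parents=True, exist_ok=True)
    for src in sources:
        # copytree copies files with shutil.copy2 by default, which keeps modification times.
        # dirs_exist_ok (Python 3.8+) lets a second run on the same day overwrite the copy.
        shutil.copytree(src, dest_root / src.name, dirs_exist_ok=True)
    return dest_root


# Example usage (placeholder paths; adjust for your own folders and external drive):
# backup_folders([Path.home() / "Documents", Path.home() / "Pictures"],
#                Path("/Volumes/Backup/Manual"))
```

Keeping each run in its own dated folder means an accidental deletion today doesn't wipe out last month's copy.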

W3C Validation & Why You Should Use It

Whenever you're designing a website, one of your primary goals, besides communicating the mission of the organization for whom the website is being made, should be to make the website accessible to all visitors, no matter what kind of computer or browser they have, and no matter what kind of disabilities they have (whether it be blindness, deafness, or other problems).

Luckily for you, there's a free and easy-to-use tool on the web that lets you check how well your website conforms to coding standards:

W3C Markup Validation Service Banner

The W3C generously provides this service to further its mission of an open, accessible, and free web. The tool is dead simple to use: just type in your website's URL and click Validate. Errors will then show up, and you can go back to your source code and fix the little mistakes you've made. But there's a lot more about validation that needs to be said!

Taming Mac OS X Mail - Previous Recipients

Mac OS X's Mail program has a very handy feature called 'Previous Recipients' that does a very nice thing: It saves a list of every person and email address you've ever sent an email to. Then, it automatically fills in that person's email address when you type it or the person's name in the 'To' field in a new message. This is usually a good thing, because it saves you time (you don't have to look up the address again!).

However, there are times when you want to send an email to a specific email address for that person, and the email address that Mail automatically inserts is—gasp!—the wrong address. For example, I want to send an email to my friend John, so I type in "John" in the To field. Mail fills in the address I usually send emails to: [email protected]. But I want to send the mail to John's alternate address, [email protected]... and I want to start sending emails to that address rather than to his first email address all the time. There are two easy solutions to this problem: