Remote debugging of PHP with IntelliJ IDEA

Just finished setting this up so that I could debug some PHP code running under XAMPP on my OS X Lion laptop. The steps are based on those from PHP debugging on IntelliJ IDEA 10 with Xdebug and Building and Installing Xdebug on XAMPP for OS X 10.6.

1. Get the output from the phpinfo() function on your installation. Xdebug has a nice configuration process that uses this info. If you don’t have that available already you need to create a simple file, for example info.php containing

<?php phpinfo(); ?>

and navigate to it in your browser, for example http://localhost:8080/util/info.php. Just select all the contents of the page and copy to the clipboard.

2. Go to the Xdebug Tailored Installation Instructions and paste the configuration info from the webpage into the text box then click “analyse my phpinfo() output”. The output will be customized install instructions that you should follow, including the custom phpize faq entry replacement for step 5. There is one exception. Before step 6, running make, you will need to modify the Makefile because you need a 32 bit build but the output from ./configure will give you a 64 bit one. If you just run make and then use otool to check the resulting xdebug.so module you get output like

21:26:02 $ otool -hv modules/xdebug.so
modules/xdebug.so:
Mach header
      magic cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
MH_MAGIC_64  X86_64        ALL  0x00      BUNDLE    10       1440 DYLDLINK

but if you check one of the modules in the XAMPP PHP extensions directory, in my case “/Applications/XAMPP/xamppfiles/lib/php/php-5.2.9/extensions/no-debug-non-zts-20060613”, you’ll see

21:20:58 $ otool -hv sqlite.so
sqlite.so (architecture i386):
Mach header
      magic cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
   MH_MAGIC    I386        ALL  0x00      BUNDLE     8       1248 DYLDLINK SUBSECTIONS_VIA_SYMBOLS
sqlite.so (architecture ppc):
Mach header
      magic cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
   MH_MAGIC     PPC        ALL  0x00      BUNDLE     7       1192 DYLDLINK

To fix this, edit the Makefile to add “-m32” to the CFLAGS and LDFLAGS definitions. I ended up with “CFLAGS = -g -O0 -m32” and “LDFLAGS = -m32”.

3. Copy the xdebug.so to the correct location, which is in the install instructions.

4. Update your php.ini. For XAMPP this is in /Applications/XAMPP/etc/php.ini. I added the following lines

zend_extension="/Applications/XAMPP/xamppfiles/lib/php/php-5.2.9/extensions/no-debug-non-zts-20060613/xdebug.so"
xdebug.remote_enable=true
xdebug.remote_port=9000
xdebug.profiler_enable=1
xdebug.profiler_output_dir="/logs"

5. Configure an IntelliJ debug configuration. This is simple. You’ll need to create a “Server” which records the host and port IDEA should connect to and the debugger to use. Pick Xdebug for the Debugger and use the xdebug.remote_port value from your php.ini file. I’m using localhost for the host but you could debug across machines if you wanted to. The other thing the configuration needs is an “Ide key(session id)” value. This can be any string, but you’re going to have to supply it as a request parameter, so stick to alphanumeric characters with no spaces and you’ll be fine. I used possumhead.

6. Set your breakpoints and navigate to the page you want to debug, adding XDEBUG_SESSION_START=possumhead to the URL to set the IDE key. This gets stored in a cookie, so it works across multiple pages. The Xdebug documentation on Starting the Debugger has more information about how the communication works and other values you can set in php.ini.


The difference between ‘good enough’ and better.

From Dustin Curtis, a look at voice control in Android and iPhone. Really, it’s a look at how voice control is presented in the Android and iPhone advertisements. It may well be that the functionality is the same on both platforms, but the way it’s presented in the two advertisements gives me the impression that Apple has thought much harder about the use cases and personas involved, as well as caring enough to employ a much better advertising agency.


JAX-WS client transport logging and debugging

To log the request and response at the transport level the recommended solution you’ll find on the internet is to set

-Dcom.sun.xml.ws.transport.http.client.HttpTransportPipe.dump=true

However, this didn’t work for me. In Java 1.6 the HttpTransportPipe class is in the com.sun.xml.internal.ws.transport.http.client package. Note the extra “internal” in the package name. So, I tried

-Dcom.sun.xml.internal.ws.transport.http.client.HttpTransportPipe.dump=true

and it worked great.
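
If you can’t easily change the JVM arguments, the same property can be set from code. Here is a minimal sketch; the class and method names are made up for illustration, and it assumes (as far as I know) that the property is only read when HttpTransportPipe is first loaded, so it has to be set before any JAX-WS client code runs.

```java
public class EnableJaxWsDump {

    // Property name from the post: the JDK-internal copy of
    // HttpTransportPipe, note the "internal" in the package name.
    static final String DUMP_PROPERTY =
            "com.sun.xml.internal.ws.transport.http.client.HttpTransportPipe.dump";

    // Hypothetical helper. Call it before the first JAX-WS client is
    // created (assumption: the flag is read once, at class load time).
    static void enableTransportDump() {
        System.setProperty(DUMP_PROPERTY, "true");
    }

    public static void main(String[] args) {
        enableTransportDump();
        // Create and use the JAX-WS client after this point.
        System.out.println(DUMP_PROPERTY + "="
                + System.getProperty(DUMP_PROPERTY));
    }
}
```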


Contract-first web services in Jetty 7 with JAX-WS

I recently had to implement a web service endpoint using Jetty 7 as the server. There are reasons we’re not using a simple REST approach for this that I’m not going to go into. Most of the information I’m using came from this post on Using the JAX-WS Endpoint API but there are a few wrinkles, obvious in retrospect I suppose, that tripped me up for a while. Here’s my heavily commented version of the source.

Update: Even with this configuration JAX-WS will rewrite the endpoint URL in the provided WSDL so that the protocol and host match those of the server. That is not what you want when you’re behind, for instance, an Apache proxy that does the SSL de/encryption and the actual host name of the Jetty instance isn’t the public face you want to present. Fortunately this is easy to fix with a bit of configuration. This tutorial on using Jetty with Apache shows the important configurations. The directives are “ProxyPreserveHost On”, so that Apache passes the originally requested host name through to Jetty, and the mod_headers directive “RequestHeader append X-Forwarded-Proto 'https'”, which informs Jetty that it should behave as if it were being connected to over https.

 
// Tell JAX-WS to use the Jetty implementation of the server provider.
// For this to work you need the jetty-jaxws2spi packages. I'm
// referencing these in the pom as
// 
//    <groupId>org.mortbay.jetty</groupId>
//    <artifactId>jetty-jaxws2spi</artifactId>
//    <version>7.0.1.v20091125</version>
// 
System.setProperty("com.sun.net.httpserver.HttpServerProvider",
        "org.eclipse.jetty.jaxws2spi.JettyHttpServerProvider");

// Create a server in the normal way.
Server server = new Server(8080);

// Tell Jetty's JAX-WS implementation that it should use the server we
// created as the server instead of creating another one.
JettyHttpServerProvider.setServer(server);

// A ContextHandlerCollection is needed somewhere in the tree of
// handlers provided to the server for Endpoint.publish to work. This
// is because we want to map the endpoint to a url and that
// requires a ContextHandlerCollection to select the right
// ContextHandler for the url.
HandlerCollection serverHandlers = new ContextHandlerCollection();
server.setHandler(serverHandlers);

// We need to serve exactly the wsdl we used to generate the service,
// and that we exchanged with our partners, not a version that
// JAX-WS generates. This requires providing metadata and properties
// to the endpoint.

// First read the wsdl from the classpath and create a Source.
URL exampleWsdlResource =
        ExampleServer.class
                .getResource("/wsdl/Example.wsdl");
Source exampleWsdlSource =
        new StreamSource(exampleWsdlResource.openStream());
exampleWsdlSource.setSystemId(exampleWsdlResource.toExternalForm());

// Create a list containing the wsdl. This will be the metadata.
List<Source> metadata = new ArrayList<Source>(1);
metadata.add(exampleWsdlSource);

// Now for the properties.
Map<String, Object> properties = new HashMap<String, Object>();

// The namespace for the QNames must be the target namespace in the
// wsdl loaded as exampleWsdlResource above. See the create method in
// SDDocumentImpl in the JAX-WS code. This is where it checks the
// QNames provided here against the metadata to identify which of
// the entries in the metadata is the correct wsdl.
properties.put(Endpoint.WSDL_PORT,
        new QName("http://vast.com/example/v1.0",
                "ExamplePort"));
properties.put(Endpoint.WSDL_SERVICE,
        new QName("http://vast.com/example/v1.0",
                "ExampleService"));

// Create the Endpoint and provide the metadata and properties.
Endpoint exampleEndpoint =
        Endpoint.create(new ExampleServiceImpl());
exampleEndpoint.setMetadata(metadata);
exampleEndpoint.setProperties(properties);

// Now publish the endpoint. This says where incoming calls should
// be made to. Port number matches the one used to create the
// server.
exampleEndpoint.publish("http://localhost:8080/alerts");

// Start the server and wait till it finishes.
server.start();
server.join();

Talking Points Memo’s app for front page layout

A very interesting and technical post from Erik Hinton about Talking Points Memo’s new application for managing the content of their front page. It’s a great demonstration of what can be done with contentEditable. Being able to easily change the front page content is clearly an important feature for TPM so well worth developing in-house if there is no app available that satisfies the requirements.


MongoDB with replica sets on AWS – Part Three

I’ve now finished a Logback appender that writes to MongoDB. The source is available on github as the Logback-on-MongoDB project. I’ve not got round to testing it against a MongoDB replica set yet but it works fine against a local single MongoDB install.

The appender is configured via the usual logback configuration mechanism and functions in the same way as other appenders.

 
  <appender name="MONGO" class="com.zanthan.logback.MongoDbAppender">
    <mongoServerAddress>
      <address>localhost</address>
      <port>27017</port>
    </mongoServerAddress>
    <database>Logs</database>
    <collection>LogbackTest</collection>
  </appender>

Errors will be thrown when the appender is initialized if the configuration data is missing or if a connection to MongoDB cannot be established. Take a look at the start method in the MongoDBAppender class for more information.

The appender doesn’t support an encoder to convert the logged data to a string for printing. Instead it stores the data as a document in the configured collection. Here’s the method that creates the document.

    /**
     * Create a new document recording the information
     * from the logging event. Five properties are set
     * "w", the timestamp, "v" the log level, "t" the
     * thread name, "l" the logger name and "m" the
     * message. "v" has one of five possible values,
     * "D", "I", "W", "E" or "X" (X is used if the
     * level is not recognized as debug, info, warning,
     * or error).
     *
     * @param event The event to log.
     */
    @Override
    protected void append(E event) {
        BasicDBObject doc = new BasicDBObject();
        doc.put("w", event.getTimeStamp());
        doc.put("v", logLevel(event.getLevel()));
        doc.put("t", event.getThreadName());
        doc.put("l", event.getLoggerName());
        doc.put("m", event.getFormattedMessage());
        dbCol.insert(doc);
    }

I used short names for the keys because there is no compression in data transmission or storage with MongoDB, so I’d prefer not to transmit unnecessary data.
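
The logLevel method called above isn’t shown here. As a rough sketch of the mapping the Javadoc describes, the following stand-in works on level names as plain strings so that it runs without Logback on the classpath. The class name and the String parameter are assumptions for illustration; the method in the project takes the event’s level object.

```java
public class LevelCodes {

    // Hypothetical stand-in for the appender's logLevel method.
    // Maps a level name to the single-character code stored under
    // the "v" key; anything unrecognized becomes "X".
    static String logLevel(String levelName) {
        if ("DEBUG".equals(levelName)) {
            return "D";
        } else if ("INFO".equals(levelName)) {
            return "I";
        } else if ("WARN".equals(levelName)) {
            return "W";
        } else if ("ERROR".equals(levelName)) {
            return "E";
        }
        return "X";
    }

    public static void main(String[] args) {
        String[] names = {"DEBUG", "INFO", "WARN", "ERROR", "TRACE"};
        for (String name : names) {
            System.out.println(name + " -> " + logLevel(name));
        }
    }
}
```

Under the rule in the Javadoc, a level such as TRACE falls through to “X”.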


MongoDB with replica sets on AWS – Part Two

Part One describes the manual process for setting up a MongoDB replica set using EC2 instances. This part presents the program that automates the operation. It’s written in Java in what I’d describe as a scripting style, such as you might use to write a Ruby or Python program that performs the same sort of operation. It’s not designed for completely robust unattended operation, and it doesn’t need to be for its use cases. The complete source code for the MongoDB on AWS project, including the pom etc., is available on github. Rather than write a post describing the program I’ve just added lots of comments and posted the significant parts here. It should be easy enough to follow along just by reading from the top.

p.s. Some interesting and useful information from Foursquare describing how they use replica sets. Extra configuration wrinkles on top of what I’ve done in this experiment.

package com.zanthan.aws.scripting;

// imports removed

/**
 * Create a new MongoDB replica set using two Amazon EC2
 * instances. This program has to be run from the correct
 * directory because it uses relative paths for some files.
 * It is deliberately written in a scripting style, the
 * sort of program you might think of writing in Ruby or
 * Python.
 *
 * @author amoffat Alex Moffat
 */
public class StartMongoDBReplicaSet {

    /**
     * Id of the image to use for each of the instances.
     */
    private static final String IMAGE_ID =
            "ami-74f0061d";

    /**
     * Type of instance to use.
     */
    private static final String INSTANCE_TYPE =
            "t1.micro";

    /**
     * Availability zone to start the instances in.
     */
    private static final String AVAILABILITY_ZONE =
            "us-east-1a";

    /**
     * Security group that defines firewall rules for
     * the MongoDB instances.
     */
    private static final String SECURITY_GROUP =
            "MongoDB";

    /**
     * Name of the key pair to use for the instances.
     */
    private static final String KEY_NAME =
            "KeyPair20110224";

    /**
     * Location of the file containing the private key from
     * the key pair used for launching instances.
     */
    private static final String KEY_FILE =
            "keys/KeyPair20110224.pem";

    /**
     * Id of the EBS snapshot containing the MongoDB executables.
     * This will be used to create the disks mounted by the AWS
     * instances.
     */
    private static final String SNAPSHOT_ID =
            "snap-a9c5e0c6";

    /**
     * Name of the device the disks created from the snapshot will
     * be attached to. This is referenced in the MongoDBInit.txt file.
     */
    private static final String DEVICE_NAME =
            "/dev/sdf";

    /**
     * File of CloudInit instructions.
     */
    private static final String LOCAL_USER_DATA_FILE =
            "MongoDBInit.txt";

    /**
     * Command that is executed on each Amazon EC2 instance to start
     * MongoDB. The private ip address of the other machine in the
     * replica set is appended to the command.
     */
    private static final String REMOTE_MONGO_DB_START_CMD =
            "/mongodb/mongodb-linux-x86_64-1.8.1/bin/mongod " +
                    "--dbpath /home/ec2-user/data/db " +
                    "--logpath /home/ec2-user/mongodb.log " +
                    "--nohttpinterface " +
                    "--fork " +
                    "--replSet logSet/";

    /**
     * Path to the local MongoDB mongo executable. This is used
     * to issue the commands to configure the replica set.
     */
    private static final String LOCAL_MONGO_EXECUTABLE =
            "mongodb-osx-x86_64-1.8.1/bin/mongo";

    /**
     * The command to configure the replica set. Need additional
     * single quotes because of Java MessageFormat formatting rules.
     * {0} and {1} are substituted with the private ip addresses
     * of the two machines in the set.
     */
    private static final String START_REPLICASET_CMD =
            "'db.runCommand({\"replSetInitiate\" : {\n" +
                    "\"_id\" : \"logSet\",\n" +
                    "\"members\" : [\n" +
                    "{\n" +
                    "\"_id\" : 1,\n" +
                    "\"host\" : \"'{0}'\"\n" +
                    "},\n" +
                    "{\n" +
                    "\"_id\" : 2,\n" +
                    "\"host\" : \"'{1}'\"\n" +
                    "}\n" +
                    "]}})'";

    /**
     * Start the whole process.
     *
     * @param args Arguments passed to the program are ignored.
     * @throws IOException In the event of an error.
     */
    public static void main(String[] args)
            throws IOException {

        // Create the AmazonEC2 client to communicate with EC2
        AmazonEC2 ec2 = createEC2Client();

        // Create object to execute ssh commands on the
        // EC2 instances started by the program.
        SshCommandExecutor sshCommandExecutor =
                createSSHCommandExecutor();

        // Create the replica set.
        new StartMongoDBReplicaSet(ec2, sshCommandExecutor).run();
    }

    /**
     * Create the object that will use SSH to execute commands
     * on the EC2 instances.
     *
     * @return New executor.
     * @throws IOException If errors occur.
     */
    private static SshCommandExecutor createSSHCommandExecutor()
            throws IOException {

        // Check that the key file we want to use exists.
        File keyFile = new File(KEY_FILE);
        if (!keyFile.exists()) {
            throw new IllegalStateException("Key file " +
                    keyFile.getPath() + " does not exist.");
        }

        // Create a temporary hosts file. The ssh command will
        // automatically add hosts to this file. A new one is
        // used instead of the default one. EC2 reuses IP addresses
        // so you may find you get different host fingerprints
        // for the same IP over the course of multiple executions.
        // There is no way to tell SSH not to blow up in this case
        // but using a new hosts file for each replica set guarantees
        // it won't happen.
        File hostsFile = File.createTempFile("Hosts", ".txt");
        hostsFile.deleteOnExit();

        return new SshCommandExecutorImpl(keyFile, hostsFile);
    }

    /**
     * Create Amazon EC2 client to use to communicate with AWS. The
     * AWS_CREDENTIAL_FILE environment variable that is used by the
     * Amazon command line tools is used here to find the access
     * and secret key we need.
     *
     * @return The client.
     * @throws IOException If things go wrong.
     */
    private static AmazonEC2 createEC2Client()
            throws IOException {

        // Find the name of the credentials file and make sure
        // it exists.
        String credentialFileName =
                System.getenv("AWS_CREDENTIAL_FILE");
        if (credentialFileName == null) {
            throw new IllegalStateException("No value for environment " +
                    "variable AWS_CREDENTIAL_FILE");
        }
        File propertiesFile = new File(credentialFileName);
        if (!propertiesFile.exists()) {
            throw new IllegalStateException("Properties file " +
                    propertiesFile.getPath() + " does not exist.");
        }

        // Load the credentials file as a properties file, closing
        // the reader when done.
        FileReader propertiesReader = new FileReader(propertiesFile);
        Properties awsKeys = new Properties();
        try {
            awsKeys.load(propertiesReader);
        } finally {
            propertiesReader.close();
        }

        // Pull out the correct properties to create the basic
        // credentials we need.
        BasicAWSCredentials credentials =
                new BasicAWSCredentials(awsKeys.getProperty("AWSAccessKeyId"),
                        awsKeys.getProperty("AWSSecretKey"));

        return new AmazonEC2Client(credentials);
    }

    /**
     * Communication with EC2.
     */
    private final AmazonEC2 ec2;

    /**
     * Execute commands on remote hosts using SSH.
     */
    private final SshCommandExecutor sshCommandExecutor;

    /**
     * Create a new object.
     *
     * @param ec2 Communication with EC2.
     * @param sshCommandExecutor Remote command execution
     */
    public StartMongoDBReplicaSet(AmazonEC2 ec2,
                                  SshCommandExecutor sshCommandExecutor) {
        this.ec2 = ec2;
        this.sshCommandExecutor = sshCommandExecutor;
    }

    /**
     * The top level entry point. This is the basic script that sequences
     * the necessary operations.
     *
     * @throws IOException In the event of an error.
     */
    private void run()
            throws IOException {

        // Start the instances.
        List<Instance> instances = startInstances();

        // Even after the instances have started we need
        // to wait a little while for the CloudInit
        // processing to complete so that we can connect
        // to them with SSH.
        waitABit(45, "for initialization to complete.");

        // Get the public and private ip addresses of
        // the two instances.
        String machineOnePublicIp =
                instances.get(0).getPublicIpAddress();
        String machineOnePrivateIp =
                instances.get(0).getPrivateIpAddress();

        String machineTwoPublicIp =
                instances.get(1).getPublicIpAddress();
        String machineTwoPrivateIp =
                instances.get(1).getPrivateIpAddress();

        // Start MongoDB on the first machine.
        startMongoDb(machineOnePublicIp,
                machineTwoPrivateIp);

        // Start MongoDB on the second machine.
        startMongoDb(machineTwoPublicIp,
                machineOnePrivateIp);

        // Wait till everything is ready.
        waitABit(30, "for MongoDB to become available.");

        // Start the replica set. We'll contact machineOne.
        startReplicaSet(machineOnePublicIp,
                machineOnePrivateIp,
                machineTwoPrivateIp);

    }

    /**
     * Start the 2 instances needed for the replica set.
     *
     * @return List of the instances. All will be in the
     * running state.
     * @throws IOException If things go wrong.
     */
    private List<Instance> startInstances()
            throws IOException {

        // New block device to create from snapshot
        // containing MongoDB executables.
        EbsBlockDevice blockDevice =
                new EbsBlockDevice()
                        .withSnapshotId(SNAPSHOT_ID)
                        .withDeleteOnTermination(true);
        // Mapping for the new block device to map it
        // to /dev/sdf
        BlockDeviceMapping blockDeviceMapping =
                new BlockDeviceMapping()
                        .withDeviceName(DEVICE_NAME)
                        .withEbs(blockDevice);
        // Data for CloudInit to use when configuring
        // the started instance.
        String userData = readUserData();
        // Configure the request using the block device
        // mapping, user data and other parameters.
        RunInstancesRequest runRequest =
                new RunInstancesRequest(IMAGE_ID, 2, 2)
                        .withSecurityGroups(SECURITY_GROUP)
                        .withKeyName(KEY_NAME)
                        .withInstanceType(INSTANCE_TYPE)
                        .withPlacement(new Placement(AVAILABILITY_ZONE))
                        .withBlockDeviceMappings(blockDeviceMapping)
                        .withUserData(userData);

        // Ask AWS to start the instances.
        RunInstancesResult runResult = ec2.runInstances(runRequest);

        // Wait for them to start.
        List<Instance> instances =
                waitForInstancesToStart(runResult.getReservation()
                        .getInstances());

        System.out.println("Started instances.");

        return instances;
    }

    /**
     * Read the CloudInit configuration data from a file and return it.
     *
     * @return The config data, base64 encoded.
     * @throws IOException If there are problems.
     */
    private String readUserData() throws IOException {
        // Make sure the file exists.
        File dataFile = new File(LOCAL_USER_DATA_FILE);
        if (!dataFile.exists()) {
            throw new IllegalStateException("Can not find ec2 init " +
                    "data file " + dataFile.getPath());
        }
        // Read and convert to base64
        Base64InputStream inputStream =
                new Base64InputStream(new FileInputStream(dataFile),
                        true);
        String userData =
                IOUtils.toString(inputStream);
        inputStream.close();
        return userData;
    }

    /**
     * Wait until all of the instances in the list report their state
     * as running, or 1.5 minutes have elapsed. If they aren't all
     * running after 1.5 minutes an IllegalStateException is thrown.
     *
     * @param instances The instances to check.
     * @return The instances, all in the running state.
     */
    private List<Instance> waitForInstancesToStart(
            List<Instance> instances) {
        // Number of times we've checked
        int count = 0;
        // Start by checking, just in case.
        boolean allRunning = checkIfAllRunning(instances);
        while (!allRunning && count < 6) {
            // Wait 15 seconds.
            waitABit(15, "for instances to start.");
            // Get the status of the instances by providing
            // a list of their ids.
            DescribeInstancesRequest describeRequest =
                    new DescribeInstancesRequest()
                            .withInstanceIds(extractIds(instances));
            DescribeInstancesResult describeResult =
                    ec2.describeInstances(describeRequest);
            // We know there will be a single reservation.
            instances = describeResult.getReservations()
                    .get(0).getInstances();
            allRunning = checkIfAllRunning(instances);
            ++count;
        }

        if (!allRunning) {
            throw new IllegalStateException("All instances did not start.");
        }
        return instances;
    }

    /**
     * Wait some number of seconds.
     *
     * @param timeInSeconds Number of seconds to wait.
     * @param reason The reason we're waiting.
     */
    private void waitABit(long timeInSeconds, String reason) {
        System.out.println("Waiting " + timeInSeconds + " seconds " +
                reason);
        synchronized (this) {
            try {
                wait(timeInSeconds * 1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    /**
     * From a list of instances extract a list of their ids.
     *
     * @param instances The instances.
     * @return The ids.
     */
    private List<String> extractIds(List<Instance> instances) {
        List<String> instanceIds =
                new ArrayList<String>(instances.size());
        for (Instance instance : instances) {
            instanceIds.add(instance.getInstanceId());
        }
        return instanceIds;
    }

    /**
     * If all of the instances in the list have a state of "running"
     * return true, otherwise return false.
     *
     * @param instances The instances to check.
     * @return True or false.
     */
    private boolean checkIfAllRunning(List<Instance> instances) {
        System.out.println("Checking that all instances are running.");
        boolean allRunning;
        allRunning = true;
        for (Instance instance : instances) {
            System.out.println("Instance " + instance.getInstanceId() +
                    " is " + instance.getState().getName() + ".");
            if (!"running".equals(instance.getState().getName())) {
                allRunning = false;
            }
        }
        return allRunning;
    }

    /**
     * Start a MongoDB server on first machine telling it that it's part
     * of a replica set that includes second machine. Communication
     * between the members of the replica set is done via their private
     * IP addresses.
     *
     * @param firstMachinePublicIp Public IP of the first machine.
     * @param secondMachinePrivateIp Private IP of the second machine.
     * @throws IOException If errors occur.
     */
    private void startMongoDb(String firstMachinePublicIp,
                              String secondMachinePrivateIp)
            throws IOException {
        // Need to create an address to log on to the first
        // machine.
        String firstMachineAddress = "ec2-user@" +
                firstMachinePublicIp;
        System.out.println("Going to start MongoDB as " +
                firstMachineAddress);

        // Variables to collect results starting MongoDB.
        final String[] processNumber = new String[] {null};
        final StringBuilder sb = new StringBuilder();

        // Number of times we've tried to start MongoDB.
        int count = 0;

        // It may take a while for the instance to be fully initialized
        // and reachable via ssh so try several times.
        while (processNumber[0] == null && count < 4) {
            // Use SSH to issue the command to start MongoDB on
            // the remote machine. Look at the output and if
            // we see "forked process: " assume success.
            sshCommandExecutor.execute(firstMachineAddress,
                    REMOTE_MONGO_DB_START_CMD + secondMachinePrivateIp,
                    new SshCommandExecutor.OutputHandler() {
                        public void handle(String line) {
                            sb.append(line);
                            sb.append('\n');
                            if (line.startsWith("forked process: ")) {
                                processNumber[0] =
                                        line.substring("forked process: ".length()).trim();
                            }
                        }
                    });
            ++count;
            if (processNumber[0] == null) {
                waitABit(30, "for instance to be reachable via ssh.");
            }
        }

        // If we didn't find a process then print out all of
        // the output from SSH and throw an exception.
        if (processNumber[0] == null) {
            System.out.println("Response from starting MongoDB was:");
            System.out.println(sb.toString());
            throw new IllegalStateException("Could not start MongoDB as " +
                    firstMachineAddress);
        }

        System.out.println("MongoDB started process " + processNumber[0] +
                " as " + firstMachineAddress);
    }

    /**
     * Start a MongoDB replica set by contacting the admin database on machine
     * one.
     *
     * @param firstMachinePublicIp Public IP of first machine.
     * @param firstMachinePrivateIp Private IP of first machine.
     * @param secondMachinePrivateIp Private IP of second machine.
     * @throws IOException In the event of error.
     */
    private void startReplicaSet(String firstMachinePublicIp,
                                 String firstMachinePrivateIp,
                                 String secondMachinePrivateIp)
            throws IOException {

        // Substitute the two private ip addresses into the command
        // to start the replica set.
        String cmd =
                MessageFormat.format(START_REPLICASET_CMD,
                        firstMachinePrivateIp, secondMachinePrivateIp);

        // Write the start command into a file with the
        // correct extension.
        File jsFile = File.createTempFile("replicaset", ".js");
        jsFile.deleteOnExit();
        FileUtils.writeStringToFile(jsFile, cmd);

        // Run the command and write out the results.
        ProcessBuilder pb = new ProcessBuilder(
                LOCAL_MONGO_EXECUTABLE,
                firstMachinePublicIp + "/admin",
                jsFile.getPath()
        );
        pb.redirectErrorStream(true);

        Process p = pb.start();

        BufferedReader rdr =
                new BufferedReader(new InputStreamReader(p.getInputStream()));
        try {
            String line;
            while ((line = rdr.readLine()) != null) {
                System.out.println(line);
            }
        } finally {
            rdr.close();
        }
    }
}
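
The quoting in START_REPLICASET_CMD is the least obvious part of the listing, so here is a small standalone sketch of the same trick. In a MessageFormat pattern a single quote starts and ends a literal section, which is what lets the JSON braces through untouched, and the quoted section is closed around {0} so that the argument is still substituted. The pattern, class name and IP address below are made up for illustration.

```java
import java.text.MessageFormat;

public class MessageFormatQuoting {

    // Same quoting style as START_REPLICASET_CMD, cut down to one
    // host entry: literal text sits inside single quotes and the
    // quotes are closed around the {0} placeholder.
    static final String HOST_PATTERN =
            "'{\"host\" : \"'{0}'\"}'";

    // Substitute the address into the pattern.
    static String hostDocument(String privateIp) {
        return MessageFormat.format(HOST_PATTERN, privateIp);
    }

    public static void main(String[] args) {
        // Prints {"host" : "10.0.0.1"}
        System.out.println(hostDocument("10.0.0.1"));
    }
}
```

The single quotes themselves do not appear in the output. Without them MessageFormat would try to parse the JSON braces as format elements.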

MongoDB with replica sets on AWS – Part One

This describes how to run MongoDB on Amazon Web Services, in particular using Amazon Elastic Compute Cloud (EC2) to support a MongoDB replica set.

Choice of machine image

A 64-bit machine is needed if a mongod is to manage more than 2 GB of data. It will also need medium or high I/O performance, which indicates a Large instance as the minimum realistic starting point. The 850 GB of instance storage would be used for the MongoDB data storage. This will cost $0.34 per hour (pricing for the US East region), so a replica set with two machines (one master and one slave) will cost $0.68 an hour or about $5,957 a year. That’s the on-demand pricing; because we want to run 24×7, reserved instances may be cheaper. Using reserved instances with a one-year term in the same region is $0.12 per hour with a $910 down payment each, giving a total of about $3,922 a year.
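
As a quick check of the arithmetic, here is a minimal Java sketch of the yearly cost calculation. The class and method names are made up for illustration; the rates are the ones quoted above and a year is taken as 24 × 365 = 8,760 hours.

```java
import java.util.Locale;

// Hypothetical helper: names are illustrative, rates are the ones quoted in the post.
final class ReplicaSetCost {
    static final int HOURS_PER_YEAR = 24 * 365; // 8,760 hours, running 24x7

    // Yearly cost of running `machines` on-demand instances around the clock.
    static double onDemandYearly(double hourlyRate, int machines) {
        return hourlyRate * HOURS_PER_YEAR * machines;
    }

    // Yearly cost with one-year reserved instances: a reduced hourly rate
    // plus an upfront payment for each machine.
    static double reservedYearly(double hourlyRate, double upfront, int machines) {
        return (hourlyRate * HOURS_PER_YEAR + upfront) * machines;
    }

    public static void main(String[] args) {
        System.out.printf(Locale.US, "on-demand: $%,.2f%n", onDemandYearly(0.34, 2));
        System.out.printf(Locale.US, "reserved:  $%,.2f%n", reservedYearly(0.12, 910.0, 2));
    }
}
```

Running it prints $5,956.80 for two on-demand instances and $3,922.40 for two reserved ones.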

Location

For reliability in the event of the failure of an availability zone there should be a second slave in a different availability zone in the same region, or even in a different region. This will incur data transfer charges between the master and the second slave, which are much cheaper between availability zones in the same region ($0.01 per GB transferred) than between regions ($0.10 per GB transferred in, plus from $0.15 per GB transferred out).

If the load is low it would be nice to be able to use the machines for other purposes as well, for instance a primary web server with a slave MongoDB server and a backup web server with the master MongoDB server.

Experimental

To start with I want to experiment with using Micro instances, even though these have slower EBS storage, to see what the throughput is.

Cookbook

This describes the actual steps I took to get a MongoDB replica set working using manual commands. The next post will build on this to show how to automate the process of starting and stopping a replica set. You have to have automation to work successfully with cloud services.

Security Group

Create a security group to restrict connections into the instances running MongoDB. Both master and slave instances will use this same group because a slave could need to take over from a master so no distinction should exist between them.

#
ec2-authorize MongoDB -P tcp -p 22
ec2-authorize MongoDB -P tcp -p 27017 -u 844613644011 -o MongoDB
ec2-authorize MongoDB -P tcp -p 27017 -u 844613644011 -o AppServer
ec2-authorize MongoDB -P tcp -p 27017 -s 24.153.207.123/32

Install MongoDB

Create a volume for the MongoDB executables. This isn’t going to hold the data so it doesn’t have to be very big. Create the smallest one we can, which is 1 GB. Record the volume id.

#
ec2-create-volume --size 1 --availability-zone us-east-1a

Start a micro instance in the same availability zone. KeyPair20110224 is the key I’m going to use to log on with SSH later.

#
ec2-run-instances ami-74f0061d -g MongoDB -k KeyPair20110224 -t t1.micro --availability-zone us-east-1a

Wait till it’s running.

#
ec2-describe-instances

Attach the new volume to the instance. Instance id from the output of ec2-describe-instances above, volume id from when the volume was created. You could also use ec2-describe-volumes.

#
ec2-attach-volume vol-77815b1c --instance i-73c2221d --device /dev/sdf

Logon to the instance. The address comes from ec2-describe-instances output.

#
ssh -i keys/KeyPair20110224.pem ec2-user@ec2-174-129-155-243.compute-1.amazonaws.com

Format the volume and mount it.

#
sudo mkfs -t ext3 /dev/sdf
sudo mkdir /mongodb
sudo mount /dev/sdf /mongodb

Download and unpack mongodb package onto the volume we mounted. Using the 64 bit linux package.

#
curl http://fastdl.mongodb.org/linux/mongodb-linux-x86_64-1.8.1.tgz > mongodb.tgz
sudo mv mongodb.tgz /mongodb/
cd /mongodb
sudo tar xzf mongodb.tgz
sudo rm mongodb.tgz

Create a data directory under the ec2-user home directory.

#
cd
mkdir -p data/db

Start mongod.

#
cd /mongodb/mongodb-linux-x86_64-1.8.1
bin/mongod --dbpath /home/ec2-user/data/db --logpath /home/ec2-user/mongodb.log --fork

Test that mongod is working and can be reached locally using the shell. Start mongo and try some commands.

#
bin/mongo
MongoDB shell version: 1.8.1
connecting to: test
> db
test
> post = {"title" : "My Test Post", "content" : "Some stuff I wrote"}
{ "title" : "My Test Post", "content" : "Some stuff I wrote" }
> db.blog.insert(post)
> db.blog.find()
{ "_id" : ObjectId("4dcc9ed4052e568e1f58cb00"), "title" : "My Test Post", "content" : "Some stuff I wrote" }
> exit
bye

Stop mongod. Look in the log to get the process id or record it when the instance starts.

#
head /home/ec2-user/mongodb.log
sudo kill -2 1032
tail /home/ec2-user/mongodb.log

Create a snapshot

Unmount the EBS volume. This is so we can take a snapshot to use to create volumes when running instances.

#
cd
sudo umount -d /mongodb

Logoff the instance and detach the volume. Use ec2-describe-volumes to check that the detach has completed.

#
ec2-detach-volume vol-77815b1c
ec2-describe-volumes vol-77815b1c

Terminate the instance.

#
ec2-terminate-instances i-73c2221d

Take a snapshot of the volume. This is so we can launch other instances using the snapshot to create and attach a volume.

#
ec2-create-snapshot vol-77815b1c -d 'MongoDB installation'

Test the snapshot

Create a CloudInit configuration file called MongoDBInit.txt to start an instance, contents are shown below. For this experiment I’ll use the space on the default drive associated with the instance to store the data. This configuration mounts the volume with the mongodb executables, creates the data directory and starts mongod.

#cloud-config
mounts:
 - [ /dev/sdf, /mongodb, "auto", "defaults", "0", "2" ]
runcmd:
 - cd /home/ec2-user
 - mkdir -p data/db
 - cd /mongodb/mongodb-linux-x86_64-1.8.1
 - bin/mongod --dbpath /home/ec2-user/data/db --logpath /home/ec2-user/mongodb.log --nohttpinterface --fork

Start an ec2 micro instance using the snapshot and CloudInit configuration file. This is to let us test that the automatic mounting works and the installed mongod will run successfully.

#
ec2-run-instances ami-74f0061d -g MongoDB -k KeyPair20110224 -t t1.micro --availability-zone us-east-1a -b "/dev/sdf=snap-a9c5e0c6::true" --user-data-file MongoDBInit.txt

Wait till the new instance is running

#
ec2-describe-instances

Check that it’s possible to connect to mongod on the new instance using the mongo shell from your local machine. Address of the machine comes from ec2-describe-instances.

#
mongodb-osx-x86_64-1.8.1/bin/mongo ec2-67-202-13-208.compute-1.amazonaws.com

Terminate the instance.

#
ec2-terminate-instances i-8fc02be1

Modify CloudInit configuration to support replica sets

To create a replica set we need to know the IP addresses of the machines in the set, so we’ll have to start the instances and then use ssh to start mongod. The commands run by ssh will be run as ec2-user, so the directories that MongoDBInit.txt creates will have to have their owner changed. Modify MongoDBInit.txt by removing the last two lines so that mongod isn’t automatically started and adding “chown -R ec2-user data” as the last line.

Start another instance with the new MongoDBInit.txt

#
ec2-run-instances ami-74f0061d -g MongoDB -k KeyPair20110224 -t t1.micro --availability-zone us-east-1a -b "/dev/sdf=snap-a9c5e0c6::true" --user-data-file MongoDBInit.txt

Use SSH to start mongod.

#
ssh -i keys/KeyPair20110224.pem ec2-user@ec2-204-236-247-165.compute-1.amazonaws.com "/mongodb/mongodb-linux-x86_64-1.8.1/bin/mongod --dbpath /home/ec2-user/data/db --logpath /home/ec2-user/mongodb.log --nohttpinterface --fork"

Test that mongod is running by reaching it from the admin machine.

#
mongodb-osx-x86_64-1.8.1/bin/mongo ec2-204-236-247-165.compute-1.amazonaws.com

Shutdown mongod using ssh and the process id reported when it was started with ssh.

#
ssh -i keys/KeyPair20110224.pem ec2-user@ec2-204-236-247-165.compute-1.amazonaws.com "kill 1049"

Terminate the instance.

#
ec2-terminate-instances i-8fc02be1

Starting a replica set

Start two instances to host the two servers of the replica set.

#
ec2-run-instances -n 2 ami-74f0061d -g MongoDB -k KeyPair20110224 -t t1.micro --availability-zone us-east-1a -b "/dev/sdf=snap-a9c5e0c6::true" --user-data-file MongoDBInit.txt

Start mongodb on both machines. Use the internal ip addresses to communicate between the instances. So the first command tells machine one to talk to machine two and the second tells two to talk to one. The replica set is called logSet.

#
ssh -i keys/KeyPair20110224.pem ec2-user@ec2-50-17-110-242.compute-1.amazonaws.com "/mongodb/mongodb-linux-x86_64-1.8.1/bin/mongod --dbpath /home/ec2-user/data/db --logpath /home/ec2-user/mongodb.log --nohttpinterface --fork --replSet logSet/domU-12-31-39-04-0C-AF.compute-1.internal"

ssh -i keys/KeyPair20110224.pem ec2-user@ec2-50-19-70-14.compute-1.amazonaws.com "/mongodb/mongodb-linux-x86_64-1.8.1/bin/mongod --dbpath /home/ec2-user/data/db --logpath /home/ec2-user/mongodb.log --nohttpinterface --fork --replSet logSet/domU-12-31-39-09-88-8B.compute-1.internal"

Use mongo to configure the replica set. Until this is done the replica set isn’t created. Connect to one of the two instances. The internal addresses of the two instances are used in the configuration. It takes a short while before the set is configured and the response appears.

#
mongodb-osx-x86_64-1.8.1/bin/mongo ec2-50-19-70-14.compute-1.amazonaws.com/admin
MongoDB shell version: 1.8.1
connecting to: ec2-50-19-70-14.compute-1.amazonaws.com/admin
> db.runCommand({"replSetInitiate" : {
... "_id" : "logSet",
... "members" : [
... {
... "_id" : 1,
... "host" : "domU-12-31-39-04-0C-AF.compute-1.internal"
... },
... {
... "_id" : 2,
... "host" : "domU-12-31-39-09-88-8B.compute-1.internal"
... }
... ]}})
{
	"info" : "Config now saved locally.  Should come online in about a minute.",
	"ok" : 1
}
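
The initiate command typed into the shell above could also be generated for any number of members. This is only a sketch under my own assumptions: ReplSetInitiate and command are hypothetical names, and member ids are simply numbered from 1 in list order, as in the session above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of MongoDB or of the code in these posts.
final class ReplSetInitiate {
    // Build the shell command that initiates a replica set with the given
    // name, giving each host an _id starting at 1 in list order.
    static String command(String setName, List<String> privateHosts) {
        StringBuilder sb = new StringBuilder();
        sb.append("db.runCommand({\"replSetInitiate\" : {");
        sb.append("\"_id\" : \"").append(setName).append("\", ");
        sb.append("\"members\" : [");
        for (int i = 0; i < privateHosts.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append("{\"_id\" : ").append(i + 1)
              .append(", \"host\" : \"").append(privateHosts.get(i)).append("\"}");
        }
        sb.append("]}})");
        return sb.toString();
    }

    public static void main(String[] args) {
        // The two internal host names used in the manual session.
        System.out.println(command("logSet", Arrays.asList(
                "domU-12-31-39-04-0C-AF.compute-1.internal",
                "domU-12-31-39-09-88-8B.compute-1.internal")));
    }
}
```

The resulting string could be written to a .js file and passed to the mongo shell instead of being typed by hand.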

Look at the mongodb.log files on the two machines. You can see which has become the primary and which the secondary. Also, when you connect to an instance you can see if it is the primary or not.

#
mongodb-osx-x86_64-1.8.1/bin/mongo ec2-50-19-70-14.compute-1.amazonaws.com/admin
MongoDB shell version: 1.8.1
connecting to: ec2-50-19-70-14.compute-1.amazonaws.com/admin
logSet:PRIMARY>

Now shutdown the slave and then the master using the process numbers we recorded when we started them.

#
ssh -i keys/KeyPair20110224.pem ec2-user@ec2-50-17-110-242.compute-1.amazonaws.com "kill 1044"
ssh -i keys/KeyPair20110224.pem ec2-user@ec2-50-19-70-14.compute-1.amazonaws.com "kill 1065"

Terminate the instances

#
ec2-terminate-instances i-bb7780d5 i-b97780d7

Clearly, executing all of these commands manually isn’t going to work for more than this initial setup; it’s just too complex. The next step is to try to automate the process of bringing up and shutting down a replica set.

Posted in Amazon Web Services, MongoDB

Command line argument parsing in scala

This is an adaptation of an answer to the Stack Overflow question “Scala: Best way to parse command-line parameters (CLI)?”. The highest ranked answer listed a number of external libraries but I prefer the simpler functional approach that didn’t get as many votes. Here’s my version of that with additional comments to explain what’s going on.

I want to be able to parse command lines like

  MyProg --port 8090
  MyProg directory --port 8089
  MyProg directory
  MyProg --port 8999 directory

First, using the function. I’m using Symbol("foo") instead of 'foo just so that the formatting works in this post.

def main(args: Array[String]) = {

  val options = nextOption(Map(), args.toList)

  val port = options.getOrElse(Symbol("port"), 8090)
    .asInstanceOf[Int]

  val dataDirectory = options.getOrElse(Symbol("dir"), "src/main/site")
    .asInstanceOf[String]
}

The more interesting bit is the nextOption function itself. It uses pattern matching on lists to recursively parse elements of the argument list. A nice simple system for the common case of simple argument handling.

  // Parsed options keyed by name; values are an Int (port) or a String (dir).
  type OptionMap = Map[Symbol, Any]

  // Recursively parse the arguments provided in remainingArguments
  // adding them to the parsedArguments map and returning the
  // completed map when done.
  def nextOption(parsedArguments: OptionMap,
                 remainingArguments: List[String]): OptionMap = {
    // Does a string look like it could be an option?
    def isOption(s: String) = s.startsWith("--")
    // Match the remaining arguments.
    remainingArguments match {
      // Nothing left so just return the parsed arguments
      case Nil => parsedArguments
      // Option defining the port to listen on. Use the value after the
      // --port option as the number and continue parsing with the
      // remainder of the list.
      case "--port" :: value :: tail =>
        nextOption(parsedArguments ++ Map(Symbol("port") -> value.toInt),
          tail)
      // The data directory. This case matches if the directory comes
      // before the port option, the directory doesn't look like an
      // option (doesn't start with --) and the string after it
      // does. Here parsing needs to continue with tail of the
      // arguments provided to this call as the next
      // iteration must consider possibleOption.
      case dir :: possibleOption :: tail
        if !isOption(dir) && isOption(possibleOption) =>
        nextOption(parsedArguments ++ Map(Symbol("dir") -> dir),
          remainingArguments.tail)
      // Data directory. This matches the last element in the list if it
      // doesn't look like an option. As we know there is nothing
      // left in the list use Nil for the remainingArguments passed
      // to the next iteration.
      case dir :: Nil
        if !isOption(dir) =>
        nextOption(parsedArguments ++ Map(Symbol("dir") -> dir),
          Nil)
      // Nothing else matched so this must be an unknown option.
      case unknownOption :: tail =>
        error("Unknown option " + unknownOption)
        exit(1)
    }
  }

The error method comes from the Loggable trait described in Simple scala trait for logging with slf4j.

Posted in Scala

Hitting the IE Selector Limit

Ran into an interesting problem testing today. Over the weekend something changed and some elements of our app weren’t looking correct any more on IE6. Worse, this was only happening in the QA verification builds, not on developer boxes. The major difference in CSS between QA and developer builds is that QA builds use a single combined and compressed browser specific CSS file with a strong name that’s a hash of its contents and permanent caching, for speed in production; whereas developer builds use individual files, that are never cached, for easier development.

So, the first suspect was the combination process I wrote, or the YUI Compressor we use. However, the styles were in the compressed file that was being served. It just looked like IE wasn’t finding them. Looking at the change history for the development file containing the styles didn’t show any likely recent changes. By manually editing the combined CSS file to move the selector that wasn’t being found, it looked like we could isolate the problem: move it to line 1074 and it failed, move it to line 1072 and it worked, so what’s wrong with line 1073? The problem was that there just didn’t seem to be anything wrong with the lines in question.

Then a thought: perhaps there is a file size limit we’re running into. Delete some random lines above the missing selector and all is fine, put them back and the problem comes back. Now, to Google, where we find “Yet another IE6 limitation: the Selector Limit”. We were running into a limit of 4095 selectors in a single CSS file in IE browsers.

Once we knew the problem the solution was pretty easy. Well, the real solution is to rationalize our selectors some more, but this will do for now. I modified the CSS combination program to start a new file every time it had processed 4000 selectors, using a simple regular expression to approximate the start of a selector. Works very well.
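
For illustration, here is a minimal Java sketch of that kind of splitting. It is not the actual combination program: CssSplitter and split are hypothetical names, the regular expression only handles flat rules (no nested blocks such as @media, no braces inside strings or comments), and selectors are counted by splitting each rule's selector list on commas.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: split combined CSS into chunks that each stay
// under a maximum number of selectors.
final class CssSplitter {
    // Approximates one flat rule: a selector list followed by a braced body.
    private static final Pattern RULE = Pattern.compile("([^{}]+)\\{[^{}]*\\}");

    static List<String> split(String css, int maxSelectors) {
        List<String> chunks = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        int count = 0;
        Matcher m = RULE.matcher(css);
        while (m.find()) {
            // Selectors within one rule are comma separated.
            int selectors = m.group(1).split(",").length;
            // Start a new chunk rather than let this rule push us over the limit.
            if (count > 0 && count + selectors > maxSelectors) {
                chunks.add(current.toString());
                current.setLength(0);
                count = 0;
            }
            current.append(m.group().trim()).append('\n');
            count += selectors;
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String css = "a,b{color:red}\nc{color:blue}\nd,e{margin:0}\n";
        List<String> chunks = split(css, 3);
        for (int i = 0; i < chunks.size(); i++) {
            System.out.println("/* file " + (i + 1) + " */");
            System.out.print(chunks.get(i));
        }
    }
}
```

With a limit of 3 the sample input ends up in two chunks, the first holding the rules for a,b and c and the second the rule for d,e. For the real files the limit would be 4000, leaving some headroom below IE's 4095.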

Posted in CSS