Home About The Codist RSS Feed

A Sql Challenge: Find Available Appointment Times
May 14, 2007 12:37 perm link Readers: 1679

I am working on a service appointment application, where a user can choose a service taking some multiple of 30 minutes, and be given a set of possible appointment times. It's easy to solve in code, but it seems an interesting challenge to create this in a single sql statement.

Since my readers are such an intelligent lot, I wonder if anyone has a solution to this.

The table has half hour slots, 48 per day. Times where the service is not available are marked as blocked, times with existing appointments have a foreign key to the appointment. I left out the irrelevant columns:

create table schedule
(
  id int auto_increment primary key,

  blocked boolean default true,
  halfhour time not null,
  day date not null,

  appointment_id int
);
create view openslots as select * from schedule where not blocked and appointment_id is null;

The schedule table is preallocated by day and halfhour so the id's are consecutive. The requirement is to find all sets of consecutive slots for a given multiple of half hours. Note that appointments over a day boundary are not allowed.

The actual database is H2, basically assume SQL92 features.

Any ideas are welcome, I have some as well but need to move on instead of playing around with the sql; this has an easy code solution.

Changing the table definitions are also possible.

My Tags:

  • Babu: May 15, 2007 05:31

    I've explained this in http://vsbabu.org/mt/archives/2004/04/30/when_is_he_free.html - note that this is for Oracle

    You might find this useful.

  • Jeff: May 15, 2007 06:54

    Here you go:

    <a href="http://weblogs.sqlteam.com/jeffs/archive/2007/05/15/60207.aspx">http://weblogs.sqlteam.com/jeffs/archive/2007/05/15/60207.aspx</a>

    Should work in any SQL dialect that supports correlated subqueries. I can modify it to remove that requirement as well, but derived tables must be supported. both are pretty basic SQL features (i think even MySQL supports these.)

    Let me know what you think.

  • codist: May 15, 2007 08:33

    I tried the solution from Jeff, translated to H2

    select day, min(halfhour) as StartTime, max(halfhour) as EndTime
    from
      (select s.day, s.halfhour, s.blocked,
        (select count(*) from hair.schedule s2 
         where s2.day = s.day and 
               s2.halfhour <= s.halfhour and 
               s2.blocked != s.blocked
         ) as RunGroup
       from hair.schedule s
      ) x
    where
      not x.blocked
    group by
      day, RunGroup
    having count(*) >= 3
    order by day
    

    But I only get 1 result per day (basically the whole block of time >= 3). Could be I made a mistake in the sql...

  • Jeff: May 15, 2007 08:43

    It is hard to guess without some sample data what might be happening. Your code looks fine to to me, but I am not familiar with H2.

  • Jeff: May 15, 2007 09:17

    This update will return multiple results instead of 1 big block. I think that this is more of what you are looking for. It is actually even shorter and maybe more efficient, I didn't check yet. Note that this code applies to my example and schema, you'll need to make minor tweaks to modify to yours. more on my blog in a few minutes.

    declare @timeslotsNeeded int

    set @timeSlotsNeeded = 3

    select s.Day, s.TimeSlot, max(s2.TimeSlot) as EndTime

    from schedule s

    inner join schedule s2

    on s2.day =s.day and

    s2.TimeSlot between s.TimeSlot and s.TimeSlot @TimeSlotsNeeded- 1

    where

    s.Booked = 0 and s2.Booked = 0

    group by s.Day, s.TimeSlot

    having count(*) = @TimeSlotsNeeded

  • jeff: May 15, 2007 09:29

    Blog post has been updated. Note that in my previous comment, above, this line:

    s2.TimeSlot between s.TimeSlot and s.TimeSlot @TimeSlotsNeeded- 1

    should read:

    s2.TimeSlot between s.TimeSlot and s.TimeSlot @TimeSlotsNeeded- 1

    (there should be a PLUS ( ) before the variable ... it got edited out for some reason when posted)

  • jeff: May 15, 2007 09:29

    .. and it was edited out again! Well, see my blog for the code! :)

  • codist: May 15, 2007 11:55

    yeah that's a bug I need to fix. This code works

    select s.day, s.slot, max(s2.slot) as EndTime
    from hair.schedule s
    inner join hair.schedule s2
       on s2.day =s.day and
          s2.slot between s.slot and s.slot PLUSSIGN 3 - 1
    where
      not s.blocked and not s2.blocked
    group by s.day, s.slot
    having count(*) = 3
    order by day,slot
    
  • Jeff: May 15, 2007 11:58

    So what do I win? :)

    Actually, just a "thank you" would be kind of nice ....

  • codist: May 16, 2007 05:45

    Thank you for your help! The project itself has been shelved for a bit. It is a useful sql concept that wasn't covered well elsewhere.

  • Add Comment

SQL Injections: How Not To Get Stuck
May 08, 2007 17:38 perm link Readers: 2723

This guest post was written by Thomas Mueller, the author of the open source H2 relational database, an earlier database Hypersonic SQL, and Pointbase Micro.

SQL injection is one of the most common security vulnerabilities for web applications today. Recently a test (see How Prevalent Are SQL Injection Vulnerabilities?) showed how widespread the problem is. This article shows ways to solve the problem, including a new technique called "Disabling Literals", and lists new attacks.

SQL Injection Example

What is SQL injection? You can skip this section if you already know it. We are using Java code and the JDBC API here, but SQL injection is possible in all programming languages, even when using O/R mapping tools. Here is some code from "Insecure Systems Inc.":

stat.executeQuery("SELECT * FROM USERS WHERE " + 
        "NAME='" + name + "' AND PASSWORD='" + password + "'");

An attacker could use the following password:

' OR ''=' 

In this case the SQL statement becomes

SELECT * FROM USERS WHERE 
NAME='admin' AND PASSWORD='' OR ''='' 

This condition is true in every case. If this was the password checking logic, the attacker just broke in.

Prevention with Stored Procedures and Views

One suggestion is to stored procedures to prevent SQL injection. Here we replace the SELECT statement with a stored procedure: GET_USER(userName, password) returning a result set. Now the application may call:

stat.executeQuery(
        "CALL GET_USER('" + name + "', '" + password + "')");

Of course one stored procedure is not enough for an application. For changing the password of an existing user there might be another stored procedure: CHANGE_PASSWORD(userName, password), returning the new password. To break in, the attacker only needs to providing the following password:

'||CHANGE_PASSWORD('admin','123')||'

So using stored procedures and/or views by themselves does not guarantee safety from SQL injections. Stored procedures and views have other benefits, for example the ability to change the behavior while the application is running, but they don't solve the problem.

Using Access Rights

Let's say we don't do anything to prevent SQL injection, but use very strict access rights, combined with views and/or stored procedures. The idea is to limit the effects of SQL injections. In the extreme case, it is possible to create a "sandbox" so a potential hacker can do no harm. Creating such a sandbox may have high costs:

1) Setting the correct access rights for tables/columns is not easy.

2) Row based access rights is even harder.

3) Each session needs its own database connection.

4) You need to write most of the business logic in SQL.

5) When using dynamic queries, SQL injection is still possible.

6) There is no way to do it in a database independent way.

You may wonder how you could possibly limit the access to certain rows in table. It is possible but complicated. Take a look at the attached source code (method limitRowAccess) on how to do it. You store the current user id in the database (as a session variable, if your database supports this), then use a view to show only the rows with this id.

Using Placeholders

Lets go back and try to prevent SQL injection, instead of just limiting the effects. A good way to solve the problem is to use placeholders (also called parameters). So here is the code from "Secure Systems Inc.":

PreparedStatement prep = conn.prepareStatement(
        "SELECT * FROM USERS WHERE " + 
        "NAME=? AND PASSWORD=?");
prep.setString(1, name);
prep.setString(2, password);
ResultSet rs = prep.executeQuery();

Whatever the user name and the password is, SQL injection is not possible. The attacker has no chance. The problem seems to be solved. So we are done? Not yet. Programmers are lazy, and don't always use placeholders. As you see, the secure code is quite a big longer than the insecure code. People tend to use the most simple solution for a given problem (there are some exceptions, for example excessive use of XML, but that's another story). So if you are a manager, how do you make sure all developers of your system use placeholders? And how do you ensure placeholders are used in existing code: old code that works, nobody wants to read, much less touch (for good reason)?

Disabling Literals

With "disabling literals" the database engine runs in a mode where text and number literals are not allowed as part of SQL statements. Only placeholders are allowed. Currently only the H2 database engine supports this feature. To activate it, execute the SQL statement:

SET ALLOW_LITERALS NONE

The default would mode is ALLOW_LITERALS ALL, and there is also mode to allow only numbers. After disabling literals, the insecure code does not run, instead the database simply throws an exception:

"Literals of this kind are not allowed"

Allowing numbers is not as strict as disabling all kind of literals, and may be seen as a compromise, however other cases of SQL injection are still possible in this mode. Lets say the application allows login based on user id instead of user name. The developer of the company "Half Secure Systems Inc." might write:

PreparedStatement prep = conn.prepareStatement(
        "SELECT * FROM USERS WHERE " + 
        "ID=" + id + " AND PASSWORD=?");
prep.setString(1, password);
ResultSet rs = prep.executeQuery();

Here a PreparedStatement is used, but the user id is not set via placeholder for whatever reason. In the database, user IDs are numbers, not text, so this works even when allowing number literals in the database engine. But here is a user id that works always:

1 OR 1=1

In this case, the SQL statement becomes

SELECT * FROM USER WHERE 
ID=1 OR 1=1 AND PASSWORD=?

So disabling all kinds of literals is clearly better. With all literals disabled, a working solution is:

PreparedStatement prep = conn.prepareStatement(
        "SELECT * FROM USERS WHERE " + 
        "ID=? AND PASSWORD=?");
prep.setInt(1, Integer.parseInt(id));
prep.setString(2, password);
ResultSet rs = prep.executeQuery();

Now number literals are also disabled. Disabling numbers may be a problem in legacy applications where numbers are hardcoded in the application. Let's say we have an ITEMS table, and this code lists all active items:

SELECT NAME FROM ITEMS WHERE ACTIVE=1

This statement is secure (no user input is embedded in the statement), however the database rejects it if number literals are disabled. What to do now? By using another new feature, called Constants. Constants are very common in "regular" programming languages such as Java (static final), but not common in SQL. Constants are very handy, they are descriptive and using them avoids typos. Among SQL databases only the H2 Database Engine currently supports constants. In other databases you could use user defined functions. In H2 constants are set like this:

CREATE CONSTANT TYPE_ACTIVE VALUE 1

The constant needs to be created while value literals are still allowed, of course. By the way, you can also create constants with prefix, like Const.ACTIVE if you like this better. The constants may be used as follows:

SELECT NAME FROM ITEM WHERE ACTIVE=TYPE_ACTIVE

So now we have enforced the use of placeholders and constants, and the problem is solved. Only one database engine (H2) currently supports those features, however. You have two options:

Plan A: Use H2 as your production database engine

Plan B: Use H2 to run unit tests

Plan A may not be your choice, as maybe you don't want to use such a 'young' database for production. But Plan B may be a working solution for those who don't want to wait until their database of choice supports Disabling Literals and Constants. Both features are not patented, and if you tell your database vendor to support them they might do it.

So the SQL injection problem is solved? Not fully, unfortunately.

ORDER BY Injection

A cool feature, not only in web applications, is the ability to sort a list on each column. Sometimes sorting is done in the application tier, or even (using Javascript) on the client side. However sometimes sorting is done in the database. An easy solution (and developers like easy solutions) is to place a link on each column header, and include the column name in the link. Usually 'table tools' do the same. The link to order by 'name' might look like this: list.jsp?order=name. The application then dynamically builds the SQL statement to order by the specified column:

ResultSet rs = stat.executeQuery(
        "SELECT ID, NAME FROM ITEMS ORDER BY " + order);

Now, manipulating links is simple. Some people might say: "Oh, what can possibly happen, an attacker can only sort on other columns, no problem". Well, a clever attacker could do other things as well, for example retrieve the admin password. Using the following 'order by column':

CASE WHEN (SELECT PASSWORD FROM USERS WHERE NAME='ADMIN') LIKE 'S%' THEN ID ELSE -ID END

If the id in the resulting list is ordered ascending, the admin password starts with 's'. The attacker can retrieve even an 10 character password after at most 10*64=640 tries. And less than 70 tries, if he uses binary search. If the process is automated this might take at most a few seconds. If literals are disabled the attack would be much, much harder. But SQL Injection is still possible.

Sidenote: Storing only the (cryptographic) hash code of the password would help. For details see the method storePasswordHashWithSalt() in the source code. Many systems don't store the hash code of the password so that they can send you your old password by email. Technically it is not possible to compute the password from a hash code so systems that can give you your password keep the password in plain text or encrypt it in some reversible way. This is not really secure.

Can placeholders be used? In SQL, 'ORDER BY 1' means 'order by the first column'. But unfortunately, most databases don't support placeholders here (ORDER BY ?). A solution is to verify the column name is a valid identifier. Technically, this does not prevent a SQL injection, but prevents attacks. In Java this could be done using:

if (!order.matches("[a-zA-Z0-9_]*")) {
    order = "1";
}

Other kinds of SQL Injection

The cases listed above are not the only SQL injection risks. Some creative web developers might put stored procedure names, table names, or even complete SQL statements in links or in hidden fields. Luckily this is much less common. One might be tempted to prohibit dynamic statements (i.e. only allow fixed statements) but when filtering and searching data, it is difficult to avoid dynamic statements. The best protection here is probably code analysis.

In a Nutshell

SQL Injection is possibly the biggest security risk for web applications. Most attacks can be prevented using a technique called 'Disabling Literals', which is currently only available in the H2 database engine.

Links

Wikipedia: SQL Injection

H2 Database Engine: Disabling Literals to solve SQL Injection

How Prevalent Are SQL Injection Vulnerabilities?

Source Code

The code is written for Java and JDBC. It is mostly database independent, but some parts only work with the H2 database.

package org.h2.samples;

import java.io.*;
import java.sql.*;

public class SQLInjection {
    
    Connection conn;
    Statement stat;
    
    public static void main(String[] args) throws Exception {
        new SQLInjection().run("org.h2.Driver", 
                "jdbc:h2:test", "sa", "sa");
        new SQLInjection().run("org.postgresql.Driver", 
                "jdbc:postgresql:jpox2", "sa", "sa");
        new SQLInjection().run("com.mysql.jdbc.Driver", 
                "jdbc:mysql://localhost/test", "sa", "sa");
        new SQLInjection().run("org.hsqldb.jdbcDriver", 
                "jdbc:hsqldb:test", "sa", "");
        new SQLInjection().run(
                "org.apache.derby.jdbc.EmbeddedDriver", 
                "jdbc:derby:test3;create=true", "sa", "sa");
    }
    
    void run(String driver, String url, String user, String password) throws Exception {
        Class.forName(driver);
        conn = DriverManager.getConnection(url, user, password);
        stat = conn.createStatement();
        try {
            stat.execute("DROP TABLE USERS");
        } catch (SQLException e) {
            // ignore
        }
        stat.execute("CREATE TABLE USERS(ID INT PRIMARY KEY, " + 
                "NAME VARCHAR(255), PASSWORD VARCHAR(255))");
        stat.execute("INSERT INTO USERS VALUES(1, 'admin', 'super')");
        stat.execute("INSERT INTO USERS VALUES(2, 'guest', '123456')");
        stat.execute("INSERT INTO USERS VALUES(3, 'test', 'abc')");

        loginByNameInsecure();
        
        if(url.startsWith("jdbc:h2:")) {
            loginStoredProcedureInsecure();
            limitRowAccess();
        }
        
        loginByNameSecure();
        
        if(url.startsWith("jdbc:h2:")) {
            stat.execute("SET ALLOW_LITERALS NONE");
            stat.execute("SET ALLOW_LITERALS NUMBERS");
            stat.execute("SET ALLOW_LITERALS ALL");
        }

        loginByIdInsecure();
        loginByIdSecure();
        
        try {
            stat.execute("DROP TABLE ITEMS");
        } catch (SQLException e) {
            // ignore
        }

        stat.execute("CREATE TABLE ITEMS(ID INT PRIMARY KEY, " + 
                "NAME VARCHAR(255), ACTIVE INT)");
        stat.execute("INSERT INTO ITEMS VALUES(0, 'XBox', 0)");
        stat.execute("INSERT INTO ITEMS VALUES(1, 'XBox 360', 1)");
        stat.execute("INSERT INTO ITEMS VALUES(2, 'PlayStation 1', 0)");
        stat.execute("INSERT INTO ITEMS VALUES(3, 'PlayStation 2', 1)");
        stat.execute("INSERT INTO ITEMS VALUES(4, 'PlayStation 3', 1)");

        listActiveItems();

        if(url.startsWith("jdbc:h2:")) {
            stat.execute("DROP CONSTANT IF EXISTS TYPE_INACTIVE");
            stat.execute("DROP CONSTANT IF EXISTS TYPE_ACTIVE");
            stat.execute("CREATE CONSTANT TYPE_INACTIVE VALUE 0");
            stat.execute("CREATE CONSTANT TYPE_ACTIVE VALUE 1");
            listActiveItemsUsingConstants();
        }
        
        listItemsSortedInsecure();
        listItemsSortedSecure();
        
        if(url.startsWith("jdbc:h2:")) {
            storePasswordHashWithSalt();
        }
        
        conn.close();
    }
    
    void loginByNameInsecure() throws Exception {
        System.out.println("Insecure Systems Inc. - login");
        String name = input("Name?");
        String password = input("Password?");
        ResultSet rs = stat.executeQuery("SELECT * FROM USERS WHERE " + 
                "NAME='" + name + "' AND PASSWORD='" + password + "'");
        if (rs.next()) {
            System.out.println("Welcome!");
        } else {
            System.out.println("Access denied!");
        }
    }
    
    public static ResultSet getUser(Connection conn, String userName, String password) throws Exception {
        PreparedStatement prep = conn.prepareStatement(
                "SELECT * FROM USERS WHERE NAME=? AND PASSWORD=?");
        prep.setString(1, userName);
        prep.setString(2, password);
        return prep.executeQuery();
    }

    public static String changePassword(Connection conn, String userName, String password) throws Exception {
        PreparedStatement prep = conn.prepareStatement(
                "UPDATE USERS SET PASSWORD=? WHERE NAME=?");
        prep.setString(1, password);
        prep.setString(2, userName);
        prep.executeUpdate();
        return password;
    }

    void loginStoredProcedureInsecure() throws Exception {
        System.out.println("Insecure Systems Inc. - login using a stored procedure");
        stat.execute("CREATE ALIAS IF NOT EXISTS " + 
                "GET_USER FOR \"org.h2.samples.SQLInjection.getUser\"");
        stat.execute("CREATE ALIAS IF NOT EXISTS " + 
                "CHANGE_PASSWORD FOR \"org.h2.samples.SQLInjection.changePassword\"");
        String name = input("Name?");
        String password = input("Password?");
        ResultSet rs = stat.executeQuery(
                "CALL GET_USER('" + name + "', '" + password + "')");
        if (rs.next()) {
            System.out.println("Welcome!");
        } else {
            System.out.println("Access denied!");
        }
    }
    
    void loginByNameSecure() throws Exception {
        System.out.println("Secure Systems Inc. - login using placeholders");
        String name = input("Name?");
        String password = input("Password?");
        PreparedStatement prep = conn.prepareStatement(
                "SELECT * FROM USERS WHERE " + 
                "NAME=? AND PASSWORD=?");
        prep.setString(1, name);
        prep.setString(2, password);
        ResultSet rs = prep.executeQuery();
        if (rs.next()) {
            System.out.println("Welcome!");
        } else {
            System.out.println("Access denied!");
        }
    }

    void limitRowAccess() throws Exception {
        System.out.println("Secure Systems Inc. - limit row access");
        stat.execute("DROP TABLE IF EXISTS SESSION_USER");
        stat.execute("CREATE TABLE SESSION_USER(ID INT, USER INT)");
        stat.execute("DROP VIEW IF EXISTS MY_USER");
        stat.execute("CREATE VIEW MY_USER AS " + 
                "SELECT U.* FROM SESSION_USER S, USERS U " + 
                "WHERE S.ID=SESSION_ID() AND S.USER=U.ID");
        stat.execute("INSERT INTO SESSION_USER VALUES(SESSION_ID(), 1)");
        ResultSet rs = stat.executeQuery("SELECT ID, NAME FROM MY_USER");
        while (rs.next()) {
            System.out.println(rs.getString(1) + ": " + rs.getString(2));
        }
    }

    void loginByIdInsecure() throws Exception {
        System.out.println("Half Secure Systems Inc. - login by id");
        String id = input("User ID?");
        String password = input("Password?");
        try {
            PreparedStatement prep = conn.prepareStatement(
                    "SELECT * FROM USERS WHERE " + 
                    "ID=" + id + " AND PASSWORD=?");
            prep.setString(1, password);
            ResultSet rs = prep.executeQuery();
            if (rs.next()) {
                System.out.println("Welcome!");
            } else {
                System.out.println("Access denied!");
            }
        } catch(SQLException e) {
            System.out.println(e);
        }
    }

    void loginByIdSecure() throws Exception {
        System.out.println("Secure Systems Inc. - login by id");
        String id = input("User ID?");
        String password = input("Password?");
        try {
            PreparedStatement prep = conn.prepareStatement(
                    "SELECT * FROM USERS WHERE " + 
                    "ID=? AND PASSWORD=?");
            prep.setInt(1, Integer.parseInt(id));
            prep.setString(2, password);
            ResultSet rs = prep.executeQuery();
            if (rs.next()) {
                System.out.println("Welcome!");
            } else {
                System.out.println("Access denied!");
            }
        } catch(Exception e) {
            System.out.println(e);
        }
    }
    
    void listActiveItems() throws Exception {
        System.out.println("Half Secure Systems Inc. - list active items");
        ResultSet rs = stat.executeQuery(
                "SELECT NAME FROM ITEMS WHERE ACTIVE=1");
        while (rs.next()) {
            System.out.println("Name: " + rs.getString(1));
        }
    }

    void listActiveItemsUsingConstants() throws Exception {
        System.out.println("Secure Systems Inc. - list active items");
        ResultSet rs = stat.executeQuery(
                "SELECT NAME FROM ITEMS WHERE ACTIVE=TYPE_ACTIVE");
        while (rs.next()) {
            System.out.println("Name: " + rs.getString(1));
        }
    }

    void listItemsSortedInsecure() throws Exception {
        System.out.println("Insecure Systems Inc. - list items");
        String order = input("order (id, name)?");
        try {
            ResultSet rs = stat.executeQuery(
                    "SELECT ID, NAME FROM ITEMS ORDER BY " + order);
            while (rs.next()) {
                System.out.println(rs.getString(1) + ": " + rs.getString(2));
            }
        } catch(SQLException e) {
            System.out.println(e);
        }
    }
    
    void listItemsSortedSecure() throws Exception {
        System.out.println("Secure Systems Inc. - list items");
        String order = input("order (id, name)?");
        if (!order.matches("[a-zA-Z0-9_]*")) {
            order = "id";
        }
        try {
            ResultSet rs = stat.executeQuery(
                    "SELECT ID, NAME FROM ITEMS ORDER BY " + order);
            while (rs.next()) {
                System.out.println(rs.getString(1) + ": " + rs.getString(2));
            }
        } catch(SQLException e) {
            System.out.println(e);
        }
    }

    void storePasswordHashWithSalt() throws Exception {
        System.out.println("Very Secure Systems Inc. - login");
        stat.execute("DROP TABLE IF EXISTS USERS2");
        stat.execute("CREATE TABLE USERS2(ID INT PRIMARY KEY, " + 
            "NAME VARCHAR, SALT BINARY, HASH BINARY)");
        stat.execute("INSERT INTO USERS2 VALUES" + 
                "(1, 'admin', SECURE_RAND(16), NULL)");
        stat.execute("DROP CONSTANT IF EXISTS HASH_ITERATIONS");
        stat.execute("DROP CONSTANT IF EXISTS HASH_ALGORITHM");
        stat.execute("CREATE CONSTANT HASH_ITERATIONS VALUE 100");
        stat.execute("CREATE CONSTANT HASH_ALGORITHM VALUE 'SHA256'");
        stat.execute("UPDATE USERS2 SET " +
                "HASH=HASH(HASH_ALGORITHM, STRINGTOUTF8('abc' || SALT), HASH_ITERATIONS) " +
                "WHERE ID=1");
        String user = input("user?");
        String password = input("password?");
        stat.execute("SET ALLOW_LITERALS NONE");
        PreparedStatement prep = conn.prepareStatement(
                "SELECT * FROM USERS2 WHERE NAME=? AND " + 
                "HASH=HASH(HASH_ALGORITHM, STRINGTOUTF8(? || SALT), HASH_ITERATIONS)");
        prep.setString(1, user);
        prep.setString(2, password);
        ResultSet rs = prep.executeQuery();
        while (rs.next()) {
            System.out.println("name: " + rs.getString("NAME"));
            System.out.println("salt: " + rs.getString("SALT"));
            System.out.println("hash: " + rs.getString("HASH"));
        }
        stat.execute("SET ALLOW_LITERALS ALL");
    }

    String input(String prompt) throws Exception {
        System.out.print(prompt);
        BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
        return reader.readLine();
    }
    
}

My Tags:

  • Stephen: May 09, 2007 07:42

    For some systems, i've implemented the authentication with dbm, gdbm, or even a plain text file. There's no possibility for SQL insertion because, well, there's no SQL.

    My interface to the SQL database is entirely dynamic. Dynamic strings going in. Dynamic data coming out, without meaningful restrictions. So my library has some heroics to prevent insertion. Few restrictions for the caller - like exactly one sql statement per call. It was a pain to write, once. And, it's complicated enough that it's difficult to be sure i've covered every possible attack. Yet, i'm pretty confident. It's a library, so i can easily reuse my mistakes.

  • Add Comment

Anatomy Of A Successful Project #1, Fuzzy Vehicle Search Engine
Feb 19, 2007 08:51 perm link Readers: 1329

I write a lot of posts on software project disasters and stupidity which can be both instructive and entertaining. It's time to begin a series on projects from the other side of the coin: they actually succeeded.

The first post in this series covers the development of the Consumer's Digest Online website in the late 1990's. Despite the success of the project itself, don't both looking for the site, as CD Online failed to pay their bills for the development and hosting of the site in 2001 and it was killed. The magazine still exists but has never built a replacement website since as far as I know, the owner of the debt may still have a valid claim. So in a way the project both succeeded and failed.

The company I worked for, Tensor Information Systems (a consulting firm which last quite a while but ultimately died in 2001) received a contract from CD Online to build an extremely challenging website and product search engine. The core idea was to provide the customers on the web the ability to search for products based on wide-ranging set of criteria, with the ability to find vehicles that could be configured to meet the criteria as the major feature. The key to the vehicle search would be to allow the user to specify features, and then have the search engine attempt to find cars that could be configured based on the auto maker's own rules for packages and features. This had never been done (even by the auto makers themselves) and as far as I know, hasn't been repeated since. This feature had scared every firm that CD had approached (one had actually tried to do it earlier and gave it up after a year of effort) but our firm was aggressive (and possibly foolhardy as well). The magazine had printed in the magazine that the site would be available in November around the time that we accepted the project, which gave us a six month development window. Unfortunately contract negotiations whittled the time to an almost impossible 3 months before coding could actually start.

Since our company had no actual experience with search engines we hired a contractor who supposedly had some experience while the rest of the team worked on the remaining site features. The technology we used at the time was Apple/NEXT's WebObjects written in Objective-C. We would be hosting the site ourselves on HP/UX and using Oracle as a database. CD Online would be responsible for entering all the data from their product databases in addition to all the data the vehicle manufacturers provided (model information, feature/package rules, pricing, etc) once we had built a data entry tool.

I wasn't a part of the team to start with.

The contractor we hired wasn't up to designing the core search database so one of our senior people was dumped into the project and had to design a database on the fly so that development could continue. The core problem was that the rules determining what feature and packages were combinable for vehicles were complex and not easily coded into a relational database. Basically features (like leather steering wheels, heavy-duty suspensions, engines, etc) are combined into packages (collections of features) that can be combined according to rules (package X requires packages Y and Z and optionally G or E, but disallows packages Q and T). The rules were inheritable, so if you had package X and required package Z, you would also require whatever package Z required. Some vehicles had as many as 10,000,000 different potential package combinations. The rules existed for both the obvious (you can't have two engines) and the sublime (you could only have leather seats with sport mirrors).

The database would ultimately have ten's of thousands of features, products, packages and their rules.

Building a search engine for such a fuzzy set of rules coded in a relational database seemed impossible (to all the other firms who had turned down or failed to build this). The contractor we had hired to build this core technology delivered some working code halfway throughout the short schedule. It did a preliminary search of the database and then sorted through the results in memory, which worked well for him in his development database which had only a few items in it. Once it was run on the growing production database (data was still being entered) it became unusable. Running on production hardware it took an average of 70 seconds to return a single set of results, and often the results were completely wrong. Now the project was six weeks away from being public and the core (and star) feature was unusable. Bad karma. He was fired.

Our resident DBA (not really a designer) and the architect (who had designed the database in a hurry) now tried to optimize the SQL query at the core of the problem, which when printed out was 5 pages long and appeared to to be the main culprit. No matter what they tried nothing really changed. It was clear that a redesign of the database might help but there was no time since so much of the code (and the data entry) was already complete. The hardware was already purchased and setup, and adding 10x hardware was not affordable to CD Online (and in case would not drop the response time to an acceptable level anyway).

I shared an office with the architect and overheard the wailing and gnashing every day (I was in charge of a project for the post office) and offered to help around 4 weeks before the deadline. My idea was a way to solve a problem where you couldn't change the code, the database design, or the hardware, and do it in only 4 weeks. It took some mighty convincing arguments before I was even allowed to investigate the idea (they continued to try the optimization of sql). My guess was that even though the number of combinations appeared almost infinite, the might be a way to "predigest" the data in such a way that it could be search entirely in memory instead of in the database and bypass Oracle entirely.

My first plan was to read the requisite data from the database and then do some statistics on it. Basically the rules data resulted in a tree of possibilities (and/or/not) for each vehicle model (non vehicles were also in the database but rarely had rules for configuration). With hundreds of models and about a dozen manufacturers the trees appeared infinite but I found I if could winnow them down to sets of recurring subtrees, the tree of trees could be represented fully in memory in around 20MB. The data was sorted by price (so as you went down a tree path you knew the price of the vehicle based on the current configuration) and only sufficient data was kept to allow one to run the search; at display time Oracle was consulted to fetch the actual displayable detail.

The key here was to extract the data from Oracle weekly into a set of static files which were digested into a form which could be read into memory when the application launched (actually into each instance). Then I wrote the actual search engine in C that traversed the trees based on the existing code that collected the user's requests. Another complicating factor was the the HP/UX memory allocated was horrible on deleting objects (I had written a commercial memory allocator library and knew all the tricks) so I created a suballocator for my own use and pre-allocated enough space at launch for both the data and the searches to avoid the HP/UX overhead. After the search ran I would grab all the data from Oracle and pass the result back to the existing code which displayed it to the user.

All of this happened in four long weeks of work. It worked, returning searches in less that 1 second on any set of criteria (which were extensive) and no code changes were necessary in the rest of the application so we were able to start serving CD Online around the original magazine promise date. The website was an immediate hit and would serve around one to two thousand users simultaneously on most days.

The most amazing thing to me (other than this worked at all!) was that no bugs were ever found in the runtime search engine during its lifetime. Later on to increase the number of instances available to serve the site, someone else built a shared memory version of the data so only one copy would load (it was static); this allowed us to double the instances without adding more RAM.

Sadly CD Online ran up a bill of around $800,000 before Tensor pulled the plug. This killed CD Online and ultimately contributed to Tensor's downfall as well. Since then a lot of vehicle search engines have appeared (think vehix.com for example) but no one has attempted anything like ours again (I could be wrong).

The lesson to be learned from this project is that sometimes an impossible problem can be solved, if you only look differently at the problem. Another lesson I learned is that fuzzy data and relational databases are a nasty combination, and that often RAM is your friend. Today, I would probably look at a distributed solution (think Google) but in 1999, and facing the limit of unchangeable code, database and hardware), this turned out to be the only successful solution.

Next time: Sabre save a bundle.

My Tags:

  • Bob Rossney: Feb 19, 2007 12:54

    Not to be a jerk, but I think that calling a project that never went into production and helped put the company which developed it out of business "successful" simply because the software it generated happened to work is using a definition of "success" that ultimately is not very useful.

    Not only did it fail from a business perspective, you really don't know (since it never went into production) that it succeeded from a technical perspective. There are any number of reasons that a program that works in your test environment will die in the wild, and I'm sure you can think of more than one or two.

    Steve Jobs (who wasn't worried about being thought a jerk) famously said "Real artists ship." Having run a fantastically successful failed project myself, I know exactly what he meant. I think you do too.

  • Philo: Feb 19, 2007 14:34

    Reread the article carefully.

    The author doesn't indicate an exact start date, but mentions that the site was designed (in six months) in "the late 90's." It was killed for lack of payment in 2001 - so it seems to me it ran for a few years, at least.

    In addition: "so we were able to start serving CD Online around the original magazine promise date. The website was an immediate hit and would serve around one to two thousand users simultaneously on most days." (followed by a paragraph about improvements that were made later)

    I'd be hard-pressed to indict a project that worked and was popular just because the bean counters couldn't figure out a business plan or pay their bills. The vehicle search may have been a failed business effort, but it sounds to me like a very successful project.

  • codist: Feb 20, 2007 08:20

    CD Online ran from around November 1998 until sometime around the start of 2001.

  • Bob Rossney: Feb 20, 2007 12:08

    My bad. I overlooked the part about it being up and running for a couple of years; the last 3 paragraphs sure make the project sound like what I was describing. Yeah, I'd call that a win too, then.

  • Add Comment

WTF IT Stories #1: It's Not The Database, Stupid!
Feb 11, 2007 21:03 perm link Readers: 17757

I went to work for a financial services company and soon found myself in the middle of a long war between the AS/400 (now called the System i) group and the java development group. Like all war stories one can learn a lot in the retelling.

Apparently in this company the AS/400 had been the database of record for a long time, and even when java web apps began to be built, they were required to access the data via the DB/2 variant that ran on top of the AS/400 operating system. Now this system is superb for running batch applications written in RPG, even though the team actually wrote all of their applications in a 4GL type of environment. The operating system is actually built around a database of sorts, but its not relational and is really designed for fast sequential I/O. The DB/2 variant actually sits on top of the OS database more as a client than anything else. The files beneath the DB/2 tables are not managed by DB/2 at all as the batch applications don't even know it exists. This makes for much fun.

The AS/400 team was considered the cream of the IT group and thus had a lot of authority and clout with upper management.

One issue facing the Java team was that the 4GL tool managed its constraints internal to the tool, and this was not available to any outside application including DB/2. Thus any constraints defined in the relational database were not adhered to by any batch applications accessing the same files (ie tables). Thus there was no requirement for primary keys or unique columns or foreign keys in the data to actually be unique or even point to anything (the batch applications had no such concepts). Writing web apps on top of such data became a massively frustrating exercise where any query could return multiple results for primary key or unique queries, dangling foreign key references, etc. On top of that the 4GL only used a few datatypes, and all others were defined internal to the tool, thus there were 7 different kinds of dates none of which were real SQL dates. All of this had to be dealt with manually in java code. The column and table names were also tool generated and were basically random names until someone manually built a view layer which gave real names to everything.

Needless to say the users of the web apps were eternally unhappy as data became missing, or random exceptions happened randomly, and such problems caused no end of complaints. The field staff (which were contract) had to pay for every application use and even their hardware and thus this caused them financial pain. Naturally the Java team (being the relatively new folks on the block) were considered talentless, malingering and slow.

But this was only a sidelight to the real battle. The main web app used by the field staff (basically their only tool to do everything) was mind-numbingly slow. Logging in took minutes, searching for a client was an exercise best done over lunch. No one dared to use the app with a client in the office for fear of looking like an idiot. And of course it was the Java developer's fault.

Oddly enough the application ran very fast when run against the Dev and QA partitions in the AS/400 (the system supports multiple partitions with own memory pool, fractional CPU usage, and disk mappings). When the same code ran against the Production partition the performance turned to stone. However this fact was ignored by the AS/400 team, as their platform was known to be incredibly fast and self-tuning, and thus it had to the the fault of Java. Or the Java programmers. Or JDBC. Or anything except the database.

So they hired J2EE consultants, Java performance experts, basically anyone who claimed they could fix the web apps (since the Java programmers were talentless, malingering and slow). They found nothing, which only showed that Java was the problem anyway (must be stupid consultants). Then they hired an AS/400 expert from IBM who went over the entire system and found everything to be in excellent shape and very fast (however he admitted knowing nothing about DB/2 and only evaluated the OS and the batch environment). Thus the proof was there that the Java team and their language was, well, talentless, malingering and slow. It's not the database, stupid!

OK, now I showed up as a new hire and wanted to find some way to speed things up. Of course it was obvious to me (and everyone else in the Java team) that the problem was in the AS/400 but how to prove it? It turned out one of the DBA's (Oracle, really smart guy) had built a mirror of the web tables from the AS/400 on an old desktop running Oracle. Now now one in the Java team was allowed access to the Production partition on the AS/400 but the DBA did. So I had an idea.

I built a simple Java application which scanned a schema using JDBC, read the names of all the tables, then built a simple "SELECT * FROM XXX FETCH FIRST 1 ROWS ONLY" for both databases and ran this for every table, tracking the time necessary (thus measuring only the time to access the first row of the table and overhead). I got access from the DBA's to both the production DB/2 on the AS/400 and the Oracle running on the old desktop. We did this without telling anyone what I was doing.

After running the tests a number of times I produced a chart of the results. The Oracle version ran an order of magnitude faster, even though it was on a pathetic piece of hardware, and the AS/400 was the database hardware of record. I then sent it to everyone I could think of. All we heard from the AS/400 group then was "hummana-hummana-hummana" as there wasn't anything to direct blame to. Same code, same machine, same network, same SQL, using the production JDBC drivers from IBM and Oracle. Oopsy.

So they were ordered to have their DBA (really just an analyst) run the code I gave him (same results) and then he watched the console from the AS/400 while the production web app ran (which they could have done years earlier) and low and behold, as soon as a big query came in (like find a client!) the system began to page like mad for minutes. It seemed that the minimum memory for the partition running DB/2 through which the entire fieldstaff ran their work had only 80MB of RAM. Ouch. The AS/400 would gradually increase the memory but it was tuned for batch applications, not instant web queries. Oddly enough if several queries came in sequence it got faster as some more memory would be applied. But then it would drop back down again.

After a bit more time of complaining there wasn't enough RAM in the AS/400 (like 16GB) they finally gave in and increased the memory to 1GB as a floor. And all of the performance problems went away just like that.

And no one ever said a thing after that, like the war never happened. Of course the data problems continue to this day (I no longer work there of course) but at least you can find a client now in a few seconds.

And yeah, it was the stupid database!

My Tags:

  • Adam: Feb 12, 2007 09:06

    I have experience developing web applications accessing data on both System i machines and Oracle database servers. Generally the code is the same or very similar. I also have some experience developing native AS/400 programs in RPG. I have not encountered any problems like the one described, but I can see how this problem could happen. In most larger System i (AS/400) shops, there is a big division between the development and operations staff. There generally isn't anyone with actual DBA experience because, to a large extent, the system is self-tuning. Operations doesn't have to worry about most DBA tasks, and batch programmers don't have to either. Having said all of that, the root cause of the problem should have been found by operations (of course I'm a developer, other people's opinions may vary).

    I have to respond to some of the article. Here are some of the things which jumped out at me:

    * ... the team actually wrote all of their applications in a 4GL type of environment.

    * On top of that the 4GL only used a few datatypes, and all others were defined internal to the tool, thus there were 7 different kinds of dates none of which were real SQL dates.

    Ugh! There's the first problem. Of the (AS/400) 4GL environments that I have seen, all of them produce junk code. This sounds like an especially bad 4GL. This probably lead to many of the data problems which are mentioned in the article. On the other hand, the author is wrong about a few things:

    * The DB/2 variant actually sits on top of the OS database more as a client than anything else. The files beneath the DB/2 tables are not managed by DB/2 at all as the batch applications don't even know it exists.

    This isn't really true. The database is built into the operating system, there's just more than one way to access it. The files 'beneath' the DB/2 tables can be managed by DB/2. Apparently, in this case, they weren't. You can add primary and foreign key contraints to System i files, and you'll get errors if you try to break them. It was probably too much trouble to change their 4GL programs to handle those errors, so they didn't add the constraints. From here, the author describes what the consultants found:

    * They found nothing, which only showed that Java was the problem anyway (must be stupid consultants). Then they hired an AS/400 expert from IBM who went over the entire system and found everything to be in excellent shape and very fast (however he admitted knowing nothing about DB/2 and only evaluated the OS and the batch environment).

    Wow, there was definitely some stupidity here. Either they should have asked IBM for someone with experience in setting up the System i as a database server, or they should have gotten their money back from IBM. Because the actual problem that they eventually found is not real complex. However, all of the blame shouldn't be heaped on the 'AS/400 expert'. The Java experts which were brought in could have done what the author did (built a copy of the database in Oracle and compared times to access it versus the System i database). I think this whole mess shows the value of asking the right questions. It seems like both teams just asked if their stuff was okay. When they were told that their stuff was fine, they assumed it was the other team's (or system's) fault. Nobody asked how can we make the existing system work better.

  • rkeene: Feb 12, 2007 11:29

    Similar problem on another project. The only difference was that it was the number of WebSphere threads allocated. Soon as we raised it preformance problems evaporated.

    Definately the same problem with the 'expert' WebSphere consultant that came in.

  • Al: Feb 16, 2007 15:19

    ""And yeah, it was the stupid database!". No it wasn't!

    This article is interesting, but the author clearly is not truly familiar with the AS/400 and makes a number of erroneous statements, coming across as just a little too superior.

    "The DB/2 variant actually sits on top of the OS database more as a client than anything else."

    Well, not really. In fact, not at all. If there is one defining characteristic of the AS/400 integrated relational database, it is that it does NOT sit on top of the operating system, as in ALL other platforms, but is integrated into the operating system. Much of the database functionality on an AS/400 sits BELOW what is called the TIMI (Technology Independent Machine Interface), which, for Java people can be thought of as roughly equivalent to the virtual machine/bytecode interface of a JVM.

    The database is relational, has been since most modern relational databases were just glints in the eyes of their creators. The AS/400 database often leads all other DB2 products in capability. For example, it was the first DB2 on any platform to deliver Vector Encoded Indexes, which is a technology similar in purpose, but technically superior to, the bitmap indexes offered by other database vendors.

    To deal with the relational integrity issues the author describes, there would have been no problem in adding foreign key and other integrity constraints to the underlying AS/400 tables, as the previous poster suggested.

    The data types problem the author describes is a limitation of the 4 GL tool, perhaps, but mechanisms exist on the AS/400 to do "virtual data types" if you like, so the date conversion could have been done there before being made available to the users as views, so the conversions didn't HAVE to be done in Java code. SQL data types are available, so they could have been used, but it seems the 4 GL tool stood in the way of redefining the underlying tables.

    Most importantly, though, the problem he describes is not a database problem, but one in an are on the AS/400 call "Work Management", which includes the concepts of tuning and how resources are allocated to executing jobs, and how those jobs are placed on appropriate queues for execution etc.

    It is true that AS/400 sites often do not have deep ranks of highly skilled "tuning" and "DBA" gurus like, for example, Oracle sites do. This is because the AS/400 does a remarkably good job of tuning itself, and most of the time these people are not needed. Most Oracle shops however, would go down in a few weeks or months if the "gurus" went on vacation!

    The subsystem memory allocation problem the author describes is unusual in that it is an exception to this rule - the result of insufficient thought being given to the Work Management parameters for the production environment. This is a configuration and tuning issue, and NOT "the stupid database" as the author claims.

    This shop clearly could have done with someone with a few more clues in tuning the AS/400. The fact that the application had good response times for the development and QA environments should have been enough for people to realize that it wasn't "Java" that was the source of the problem. (But what can you do - true rocket scientists go build rockets for more money. We, on the other hand, who build and run software systems, keep being squeezed by the bean counters trying to send our jobs to India. In the end, some of the more talented are persuaded to pursue other careers that are more satisfying.)

    I'm glad for this author, and his client, that he found the problem when others didn't. But this doesn't justify his ill-informed and supercilious posturing about what remains one of the most elegant architectures, and robust, enterprise-level operating systems the industry has ever seen.

    If the author truly wants to understand the AS/400 and its capabilities, I suggest he glance at "Inside the AS/400" by Frank G. Soltis. You'll never again look at competing PC or server architectures with the same respect!

  • DoctorEternal: Feb 21, 2007 17:10

    Ah, the memories... Over the years I've had more "AS/400 converting to Java" students than I can count (I'm a Sun Java Instructor). I personally still love the ole AS/400s. Clink, clank clunk!

    Dr.E

    http://www.turingshop.com/reports/01Java/

  • Add Comment

Write Your Own Database, Again: An Interview With The Author Of H2, Thomas Mueller
Nov 13, 2006 18:30 perm link Readers: 1190

This is an interview I did via email with Thomas Mueller, the author of the open source H2 relational database, an earlier database Hypersonic SQL, and Pointbase Micro. I use H2 in all of my projects (including the software behind this blog). Very few people have written a relational database, much less more than one.

A brief biography (ie your education, where you live, work, whatever you want to share)

My name is Thomas Mueller, I was born and now live in Switzerland. For two years I lived in California while working for PointBase. I miss the weather there but otherwise Switzerland is a very nice place, I like it a lot. I now live together with my girlfriend in a small town in Switzerland. I have a bachelor's degree in computer science.

What do you do for a living?

I now work for Day, where I work on their products (CRX and Communiqué) and also on Apache Jackrabbit, the JCR reference implementation. So, the H2 project is my hobby (however it takes most of my free time).

Why did you start writing a relational database, HSQLDB?

Actually I didn't write HSQLDB, but the predecessor of HSQLDB: Hypersonic SQL. I wanted to learn Java back in 1996. First I wrote some cool applets (for example Hypersonic Fractals). Afterwards, I started to write Hypersonic SQL. I didn't really know a lot about database engines at that time, but it was fun learning it.

How long did it take you? Did you have any help?

Hypersonic SQL started in 1998 and it took around two years until it was somewhat useful. Hypersonic SQL was almost completely my source code, however I had a lot of help from people using it in the form of bug reports and feedback.

HSQLDB was started in 2001 using the source code of Hypersonic SQL, and there were multiple people working on this project. I was only involved in this project a little bit. The HSQLDB project was started when the company I was working for (PointBase) told me I should stop working on Hypersonic SQL. That was a bit strange, because the original plan was that I would continue to work on it. Anyway, that is the past. I will not make the same mistake again and let someone tell me I should stop working on H2.

Why did you write yet another database, H2?

I was not satisfied with the other databases.

What are the most important features in H2? What distinguishes it from other open source databases?

There is no single most important feature. For me, H2 is a combination of: fast, stable, easy to use, and features. But I'm sure there is still a lot of room for improvement for all those points. So I will continue to fix bugs, write additional tests, improve the performance, and add new functionality. I think compared to other databases, H2 is already very good, but it is far away from 'done'.

Do you think you will ever be able to work full time on H2?

I don't know, time will tell. My plan is to continue to work on it like now for a few years until H2 has enough market share. Then I hope to provide commercial support and / or consulting. My plan is to start slowly. I think it will take about 5 years until I work full time on it. In any case I will try to do that in cooperation with my employer. This is the current plan, and it may still change.

 

H2 relational database

My Tags:

Name:


Optional URL:


Comment:


Save Cancel

Copyright © 2007 By Andrew Wulf