API pricing and usage

From: Philip David Harvey [mailto:mail@5000wpm.co.uk]
Sent: 01 October 2011 15:16
To: rjdedwards@yahoo.com; Richard Edwards; Phil Harvey
Cc: mail@5000wpm.co.uk
Subject: API pricing and usage

I thought this was quite an interesting layout (it is for Weather Underground, the open source weather system; I was scraping this, but I decided to sign up because it is 'free' for commercial use).
Thought it might be of interest

Phil.

From Evernote:

Screenshot


Oracle docs show plans for Hadoop, NoSQL — Cloud Computing News

There has been speculation for a while now that Oracle might someday make its foray into the Hadoop and NoSQL spaces, and next week looks like that time.

With regard to Hadoop, CEO Larry Ellison made it clear during last week’s earnings call that the company is working on a connector that will let customers load unstructured data from Hadoop into their Oracle Exadata appliances. Now we have proof — and Oracle’s big data plans don’t stop with Hadoop.

Updated: I’ve seen some Oracle-produced content highlighting the company’s plans for a big data platform, apparently slated for launch in the second half of 2012, that not only includes the Hadoop connector — called Oracle Loader for Hadoop — but also a NoSQL database. The goal, it seems, is to let customers acquire data from whatever sources they please and then feed it into an Oracle Exadata data warehouse system. Once there, data can be analyzed via a number of means, including existing Oracle technologies such as in-database MapReduce, data mining, and statistical analysis with R.

Reaffirming this information Thursday, Larry Dignan at ZDNet highlighted that the Oracle Loader for Hadoop is the topic of multiple sessions at next week’s Oracle OpenWorld conference, as is “Oracle NoSQL Database.”

What remains to be seen, though, is how heavily Oracle — which has actually enabled Hadoop integration for some time — will invest in Hadoop and NoSQL now that it appears interested in productizing them. Will Oracle sell a physical Hadoop appliance, as Ellison alluded to and as competitor EMC is doing, or is the connector as far as it goes? Will Oracle support any Hadoop or NoSQL distributions, or will it create its own, as it did with Linux?

That database kingpin Oracle is getting into these markets is great validation for the technologies — and certainly will please Oracle customers wanting formal support for Oracle-Hadoop-NoSQL environments — but how it decides to do business within these spaces could be even more meaningful. Oracle buying up a company or two would be a very big deal to everyone else in these spaces, as it would both add and eliminate competition in one fell swoop. On the other hand, Oracle going too proprietary could limit its effectiveness in spaces dominated by open source technologies with plenty of hype and investment to thrive on their own.

Image courtesy of Flickr user RachScottHalls.


API design for humans - (37signals)

Noah, Sep 28


One of the things about working with data at 37signals is that I end up interacting with a lot of different APIs—I’ve used at least ten third-party APIs in the last few months, as well as all of our public APIs and a variety of internal interfaces. I’ve used wrappers in a couple different languages, and written a few of my own. It’s fair to say I’ve developed some strong opinions about API design and documentation from a data consumer’s perspective.

From my experience, there are a few things that really end up mattering from an API usability perspective (I’ll leave arguments about what is truly REST, or whether XML or JSON is actually better technically to someone else).

Tell me more: documentation is king

I have some preferences for actual API design (see below), but I will completely trade them for clear documentation. Clear documentation includes:

  • Examples that show the full request. This can be a full example using curl like we provide in our API documentation, or just a clear statement of the request like Campaign Monitor does for each of their methods.
  • Examples that show what the expected response is. One of the most frustrating things when reading API documentation is not knowing what I’m going to get back when I use the API—showing mock data goes a long way towards this. Really good API documentation like this would let you write an entire wrapper without ever making a single request to the API. Campaign Monitor and MailChimp both have good, but very different, takes on this.
  • A listing of error codes, what they mean, and what the most common cause of receiving them is. I’m generally not the biggest fan of the Adwords API in many ways, but they are a great example of exhaustively documenting every single response code they return.
  • A searchable HTML interface. Whether it’s visually appealing doesn’t really matter much, and Google indexing it provides plenty of search. What doesn’t work for me is when the API documentation is a PDF, or I have to authenticate to get access to it.
  • Communication of versioning and deprecation schedules. There’s some debate about whether versioning is better than gradual evolution, but regardless, any time you’re changing something in a way that might break someone’s existing code, fair warning is required, and it should be on your documentation site. Sometimes you have to make a change for security reasons that doesn’t allow much advance notice, but wherever possible, providing a couple of weeks’ notice goes a long way. The Github API clearly shows what will be removed when, and clearly shows the differences between versions.
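To illustrate the first two points, here is a sketch of what example-driven documentation lets a consumer do. The endpoint, field names, and mock response below are all invented for illustration, but the point stands: with a documented full request and a mock response, you can write a working wrapper without a single live request.

```python
import json

# A hypothetical documentation example for a fictional /posts/23 endpoint.
# The full request is spelled out, curl-style...
DOC_EXAMPLE = """
GET /posts/23.json HTTP/1.1
Host: api.example.com
Authorization: Basic dXNlcjpwYXNz
"""

# ...and the documented mock response shows exactly what comes back.
MOCK_RESPONSE = """
{
  "post": {"id": 23, "title": "Hello", "comments_count": 4},
  "status": "ok"
}
"""

def parse_post(body: str) -> dict:
    """Wrapper code written purely against the documented mock response."""
    data = json.loads(body)
    return data["post"]

post = parse_post(MOCK_RESPONSE)
print(post["id"], post["title"])
```

The wrapper above was "tested" entirely against the documentation's mock data, which is exactly the property good API docs should have.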

Let me in: all about authentication

Most APIs either use OAuth/OAuth2, user/password or API token via HTTP basic or digest authorization, or a special parameter included in each request. These all have advantages and disadvantages:

  • OAuth and the still-developing OAuth2 are becoming the de facto standard for any API that primarily expects the API consumer to be a third-party application. It’s secure, relatively consumer-friendly, and relatively easy to use. The downsides: implementations vary slightly across service providers (I often hear that “the documentation is the implementation”), and it’s a lot of overhead when you just want a single piece of data.
  • User/password or special API token via HTTP basic or digest authorization is fast, doesn’t require anything more than curl, and you can even make some requests from a browser. The downside here is that it’s not especially “friendly” to ask an end user to go find an API token.
  • User/password or API token as a URL parameter fortunately isn’t very common. It’s more confusing than basic authentication, has all of the same downsides, and no real benefit.

There’s clearly a tradeoff here—OAuth is great for delegating authorization for a third-party, but API tokens are better for quick access to your own data. My preference is for an API to offer multiple methods of authentication so you can choose the best method for what you’re trying to do.
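As a small sketch of the mechanics from the consumer side (the base URL and parameter name here are hypothetical, not from any real API): Basic authorization is just a base64-encoded header, while the token-as-URL-parameter variant puts the credential in the URL itself, which is part of why it has all the downsides and no real benefit.

```python
import base64
from urllib.parse import urlencode

def basic_auth_header(user: str, password: str) -> dict:
    """HTTP Basic authorization: base64 of 'user:password' in a header."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}

def token_in_url(base_url: str, api_token: str) -> str:
    """The less friendly variant: the credential rides along as a URL
    parameter (parameter name 'api_key' is illustrative)."""
    return f"{base_url}?{urlencode({'api_key': api_token})}"

print(basic_auth_header("user", "secret"))
print(token_in_url("https://api.example.com/v1/items", "abc123"))
```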

Underlying design: REST or something like it

There’s a formal definition of REST, and endless debates have erupted over whether a given API is truly RESTful or not. As a consumer of APIs, I don’t really care whether it’s technically RESTful, but there are a few “RESTlike” attributes that do matter:

  • Not SOAP. I know there are legitimate reasons to have a SOAP API, and I can imagine that translating a massive API from a legacy SOAP to REST interface is quite difficult, but it’s incredibly difficult to consume SOAP if you have to develop a client library from scratch or do anything non-standard.
  • Use HTTP verbs to mean something. Any API consumer is capable of sending GET, POST, PUT, and DELETE verbs, and they greatly enhance the clarity of what a given request does. It’s terrifying to use an API where a GET request can change the underlying data.
  • Sensible resource names. Having sensible resource names/paths (e.g., /posts/23 instead of /api?type=posts&id=23) improves the clarity of what a given request does. Using URL parameters is fantastic for filtering, but if everything is based on a single API endpoint and tons of parameters, the mental model required to use it gets too complex very quickly. I really like the way Assistly’s API has implemented resource paths and uses HTTP verbs.
  • XML and JSON. I like JSON. Some people like XML. Unless the costs of offering both are staggering, offer both. Ideally, you’ll let me switch between just by changing an extension from .xml to .json.
  • Use abstraction where it’s helpful. Your API implementation does not have to mimic your underlying application architecture. For example, the Adwords API has 25 distinct “services”, each of which seems like one module of the underlying implementation. This is logical if you implemented it, but that doesn’t make it particularly friendly to use—if you wanted to retrieve your click-through rate for each ad you’re running, you’d end up making (literally) dozens of distinct requests. On the other hand, the Google Analytics data export API lets you define reports so that you rarely have to make more than one request, but you’re still only fetching what you need. If you think there’s some common action that people will go to the API for, you should make that action easy, even if it means compromising on ideology.
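As a rough sketch of the "meaningful verbs plus sensible resource paths" idea (the routes and handler names below are invented for illustration, not from any real API), a routing table makes the intent of each request obvious at a glance: a GET can never mutate, and a DELETE announces itself.

```python
import re

# Hypothetical routing table: sensible resource paths plus meaningful
# HTTP verbs make it obvious what each request does.
ROUTES = [
    ("GET",    re.compile(r"^/posts/(?P<id>\d+)$"), "show_post"),
    ("PUT",    re.compile(r"^/posts/(?P<id>\d+)$"), "update_post"),
    ("DELETE", re.compile(r"^/posts/(?P<id>\d+)$"), "delete_post"),
    ("GET",    re.compile(r"^/posts$"),             "list_posts"),
]

def dispatch(verb: str, path: str):
    """Return the (action, params) a given request maps to, or (None, {})."""
    for v, pattern, action in ROUTES:
        m = pattern.match(path)
        if v == verb and m:
            return action, m.groupdict()
    return None, {}

print(dispatch("GET", "/posts/23"))     # a read, never a mutation
print(dispatch("DELETE", "/posts/23"))  # destructive intent is explicit
```

Contrast this with a single `/api?type=posts&id=23` endpoint, where nothing in the URL or verb tells you whether data is about to change.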

These are my preferences, and there are no hard-and-fast rules of API design. Regardless of whether you apply any of these principles in your next API, I would absolutely encourage you to look at your API from the standpoint of a consumer. Maybe try building something small using your own API – it can be an eye-opening experience.

Behind the scenes: A/B testing part 2: How we test - (37signals)

A few weeks ago, we shared some of what we’ve been testing with the Highrise marketing page. We’ve continued to test different concepts for that page and we’ll be sharing some of the results from those tests in the next few weeks, but before we do that, I wanted to share some of how we approach and implement A/B tests like this.

Deciding what to test

Our ideas for what to test come from everywhere: from reading industry blogs (some examples: Visual Website Optimizer, ABtests.com), a landing page someone saw, an ad in the newspaper (our long form experiments were inspired in part by the classic “Amish heater” ads you frequently see in the newspaper), etc. Everyone brings ideas to the table, and we have a rough running list of ideas – big and small – to test.

My general goal is to have at least one, and preferably several A/B tests running at any given time across one or more of our marketing sites. There’s no “perfect” when it comes to marketing sites, and the only way you learn about what works and doesn’t work is to continuously test.

We might be simultaneously testing a different landing page, the order of plans on the plan selection page, and the wording on a signup form. These tests aren’t always big changes, and may only be exposed to a small portion of traffic, but any time you aren’t testing is an opportunity you’re wasting. People have been testing multiple ‘layers’ in their sites and applications like this for a long time, but Google has really popularized it lately (some great reading on their infrastructure is available here).

Implementing the tests

We primarily use two services and some homegrown glue to run our A/B tests. Essentially, our “tech stack” for running A/B tests goes like this:

  1. We set up the test using Optimizely, which makes it incredibly easy for anyone to set up tests – it doesn’t take any knowledge of HTML or CSS to change the headline on a page, for example. At the same time, it’s powerful enough for almost anything you could want to do (it’s using jQuery underneath, so you’re only limited by the power of the selector), and for wholesale rewrites of a page we can deploy an alternate version and redirect to that page. There are plenty of alternatives to Optimizely as well – Visual Website Optimizer, Google Website Optimizer, etc. – but we’ve been quite happy with Optimizely.
  2. We add to the stock Optimizely setup a Javascript snippet, inserted on all pages (experimental and original), that identifies the test and variation to Clicky, which we use for tracking behavior on the marketing sites. Optimizely’s tracking is quite good (and has improved drastically over the last few months), but we still primarily use Clicky for this tracking since it’s already nicely set up for our conversion “funnel” and offers API access.
  3. We also add to Optimizely another piece of Javascript to rewrite all the URLs on the marketing pages to “tag” each visitor that’s part of an experiment with the experimental group. When a visitor completes signup, Queenbee – our admin and billing system – stores that tag in a database. This lets us easily track plan mix, retention, etc. across experimental groups (and we’re able to continue to do this far into the future).
  4. Finally, we do set up some click and conversion goals in Optimizely itself. This primarily serves as a validation—visitor tracking is not an exact science, and so I like to verify that the results we tabulate from our Clicky tracking are at least similar to what Optimizely measures directly.
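Step 3 above is Javascript in the actual stack; as a sketch of the same idea (the `ab` parameter name and the tag format are my invention, not 37signals’), here is what rewriting a marketing-page URL to tag a visitor with an experiment and variation might look like:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def tag_url(url: str, experiment: str, variation: str) -> str:
    """Append an experiment tag to a URL, preserving any query
    parameters already present (parameter name 'ab' is illustrative)."""
    scheme, netloc, path, query, frag = urlsplit(url)
    params = parse_qsl(query)
    params.append(("ab", f"{experiment}:{variation}"))
    return urlunsplit((scheme, netloc, path, urlencode(params), frag))

print(tag_url("https://highrisehq.com/signup?plan=basic",
              "landing-r5", "long-form"))
```

On signup, the billing system would only need to read that one parameter back out and store it, which is what makes long-term cohort tracking cheap.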

Evaluating the results

Once we start a test, our Campfire bot ‘tally’ takes center stage to help us evaluate the test.

We’ve set up tally to respond to a phrase like “tally abtest highrise landing page round 5” with two sets of information:

  1. The “conversion funnel” for each variation—what portion of visitors reached the plan selection page, reached the signup form, and completed signup. For each variation, we compare these metrics to the original for statistical significance. In addition, tally estimates the required sample size to detect a 10% difference in performance, and we let the experiment run to that point (for a nice explanation of why you should let tests run based on a sample size as opposed to stopping when you think you’ve hit a significant result, see here).
  2. The profile of each variation’s “cohort” that has completed signup. This includes the portion of signups that were for paying plans, the average price of those plans, and the net monthly value of a visitor to any given variation’s landing page (we also have a web-based interface to let us dig deeper into these cohorts’ retention and usage profiles). These numbers are important—we’d rather have lower overall signups if it means we’re getting a higher value signup.
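The post doesn’t say how tally estimates the required sample size; one common approach, shown here purely as an illustration (not necessarily tally’s method), is the two-proportion normal approximation for detecting a 10% relative lift:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variation(p: float, rel_lift: float = 0.10,
                              alpha: float = 0.05, power: float = 0.80) -> int:
    """Rough per-variation sample size to detect a relative lift in a
    baseline conversion rate p, via the standard two-proportion
    normal approximation."""
    p2 = p * (1 + rel_lift)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance
    z_b = NormalDist().inv_cdf(power)           # desired power
    pbar = (p + p2) / 2
    n = ((z_a * (2 * pbar * (1 - pbar)) ** 0.5
          + z_b * (p * (1 - p) + p2 * (1 - p2)) ** 0.5) ** 2) / (p2 - p) ** 2
    return ceil(n)

# A 5% baseline signup rate needs tens of thousands of visitors per
# variation to reliably detect a 10% relative improvement.
print(sample_size_per_variation(0.05))
```

This is also why letting the test run to a pre-computed sample size matters: with rates this low, early "significant" results are mostly noise.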

Tally sits in a few of our Campfire rooms, and anyone at 37signals can check on the results of any test that’s going on or recently finished anytime in just a few seconds.

Once a test has finished, we don’t just sit back and bask in our higher conversion rates or increased average signup value—we try to infer what worked and what didn’t work, design a new test, and get back to experimenting and learning.

Interesting insight into A/B testing at 37S. Mostly relates to the Mktg site, but a good reference for all if we decide to engage in A/B testing for the MS platform.