I need to get a better handle on the advantages of a RESTful web-based API. People tell me they scale better than other methods, but I don't see how they reduce database access or have any other feature that helps handle large amounts of traffic. The RESTful naming convention certainly helps keep them organized and 'orthogonal', so if you DO have a web API it makes sense to keep it RESTful, but my main question is: if you don't plan on your database being accessed from anything other than the server/client apps you write yourself, is there any advantage?


Replies to This Discussion

Your question about "the advantages of a RESTful web-based API" leads me to ask the following: advantage over what other type of API? Most people compare and contrast REST with SOAP/XML and argue the advantages of one over the other.

The REST payload is much smaller than that of most other APIs, especially XML over SOAP. REST is well suited for handheld devices and other devices that have a low-speed connection. The handheld may communicate with a server using REST to deliver the payload from the handheld to the server. From that point on, the server may communicate with a target system such as a database using an alternate protocol or method.
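To make the size difference concrete, here is a rough illustration of the same "fetch customer 42" call in both styles (the URLs, element names and payloads are invented for this example):

```
REST (JSON):

  GET /customers/42 HTTP/1.1
  Accept: application/json

  {"id": 42, "name": "Acme Corp"}

XML over SOAP (tunnelled through a POST):

  POST /CustomerService HTTP/1.1
  Content-Type: text/xml

  <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
    <soap:Body>
      <GetCustomer><id>42</id></GetCustomer>
    </soap:Body>
  </soap:Envelope>
```

Even in this toy case the SOAP envelope carries several times the bytes, and real-world WS-* headers (security, addressing, etc.) widen the gap further.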

REST, with its popular JSON format, is more dynamic and less rigid than XML.

 

XML over SOAP continues to be widely used for integration between enterprise packaged applications. Most enterprise packaged applications include, out of the box, an XML over SOAP API for the purpose of integrating that application with other applications in the enterprise. I do see newer packaged applications now including both flavors, REST and SOAP, but for the most part enterprise packaged applications still provide an XML over SOAP API.

I believe the learning curve for SOAP is very steep because it pulls in many different W3C specifications, often referred to as WS-*, that are included in the SOAP spec.

One size does not fit all. I may have no choice but to use XML because that's what the enterprise packaged application includes out of the box. On the other hand, if it's a mobile handheld device, a lightweight approach such as REST is superior.

 

Applications today (and over the last 20 years) are multi-tier. Your client app may communicate with a server via some protocol such as REST or XML, and the server may then communicate with a database via JDBC. I worked on a system where a handheld communicated with an API gateway hosted in the DMZ using REST; the gateway communicated with an ESB middleware server via XML, and the ESB then communicated with a packaged application via yet another protocol. With this application, security was most important, and for scalability the back-end server had to be sized to handle the load.

I hope this helps.

I find RESTful apps to be easier to maintain and easier to debug when compared to a framework that uses some templating language like JSP or Django. It is much easier if you use some library like jQuery or AngularJS, although these come with a learning curve ... but once you have moderate skill it is just more fun to design applications. And frankly, in an industry that can be "de-humanizing", it is important to enjoy and take pride in the work you create.

Another handy trick is that you can read JSON or XML files while doing initial development rather than having to simultaneously build the server. Then, when the client is sufficiently complete, these data files serve as a "contract" that defines what the server needs to provide.
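For example, here is a minimal Python sketch of the idea (the fixtures/parts.json file and the endpoint URL are hypothetical; the same trick works client-side with jQuery's $.getJSON):

```python
import json

import requests  # assumed available (pip install requests)

USE_FIXTURES = True  # flip to False once the real server exists


def get_parts():
    """Fetch the parts list; the fixture file doubles as the API contract."""
    if USE_FIXTURES:
        # Early client development: read a canned response from disk.
        with open("fixtures/parts.json") as f:
            return json.load(f)
    # Later: the real endpoint must return the same JSON shape as the fixture.
    return requests.get("https://example.com/api/parts").json()
```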

I'm currently converting an application from Flash to AngularJS and I have to admit, the learning curve is much steeper than for more conventional libraries like jQuery. But the resulting code elegance is worth it if you have the time to learn.

Of course that's just my opinion, I could be wrong :)

Fred.

As some have already said, it's a simpler implementation that is typically easier to maintain and has a smaller payload, typically in JSON format. I think the more important question is: what are the disadvantages?

But if you're building a simple server/client app, there may not be any advantage to implementing REST, unless you're taking the SPA (single-page application) approach.

I guess it all boils down to what you want to accomplish. 

Actually, REST is not an API, and there is no such thing as a "REST API", although for some reason we use SOAP, XML-RPC and REST in our vocabulary as if they were interchangeable forms of API design, and they are not.

REST is about architectural design and resource representation, and it's a far deeper concept than "API", one that requires careful reading and thoughtful understanding of the PhD dissertation that coined the term, and of all the related IETF RFCs. So to really discuss this topic, all of the participants in the discussion would need to be fully familiar with the aforementioned dissertation, and to have a deep technical understanding of the original design goals of HTTP 1.0 and the evolution of this protocol leading up to the recent 2.0 proposal, by reading __and understanding__ all the related RFCs.

For one thing, a REST design can represent resources in any form, whether HTML, XML, JSON or a JPEG file, and in general any representation that makes sense to the user agent per the suggestions of the Accept and Content-Type headers. So representation is one aspect, whilst the question actually relates to scalability and database access, and it wrongly assumes that all resources will be pulled from a database, which is the __least__ ideal case.
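As a small, hypothetical illustration of that point (a Flask sketch I made up for this discussion, not anything from the dissertation):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

PART = {"id": 7, "desc": "bearing"}


@app.route("/part/7")
def part():
    # One resource, several representations, selected via the Accept header.
    if "text/html" in request.headers.get("Accept", ""):
        return f"<h1>Part {PART['id']}</h1><p>{PART['desc']}</p>"
    return jsonify(PART)  # default representation: JSON
```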

The scalability of REST comes primarily from taking advantage of caching, but there are many other important reasons (related to the CAP theorem). The representation of a resource must include all the information required to understand it (including its "state", hence the term "REpresentational State Transfer"). This means that no network component needs to look beyond the resource metadata (basically the HTTP verb and headers) to decide what to do with it, so it is faster and more secure just by definition. This, by the way, is the main drawback and main weakness of using SOAP, XML-RPC or other tunnelling-style protocols over HTTP, which require peeking into the content of the package and understanding the tunnelled protocol (e.g. SOAP) to understand the intention of the request. Which brings us to the most important misunderstanding about HTTP: it is a TRANSFER protocol and NOT a transport protocol. Hence SOAP and XML-RPC are by definition trojan horses tunnelled over HTTP. They are extremely unreliable for anything beyond the scope of a LAN and should be banned from use in large-scale implementations, or for anything that should be considered Internet-scale.
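A toy sketch of why that matters (my own illustration; only the verb and header names are real HTTP): an intermediary can make its caching decision from the request metadata alone, whereas every SOAP call looks like an opaque POST:

```python
def can_serve_from_cache(method: str, headers: dict) -> bool:
    """A proxy or cache decides using only the request metadata."""
    if method != "GET":  # only safe, read-style requests are cacheable
        return False
    return "no-store" not in headers.get("Cache-Control", "")


# REST: decidable without ever touching the body.
can_serve_from_cache("GET", {"Cache-Control": "max-age=3600"})  # -> True

# SOAP: the intent (read? write?) is buried inside the XML envelope, so a
# proxy would have to parse the body to make the same decision.
can_serve_from_cache("POST", {"Content-Type": "text/xml"})      # -> False
```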

Back to the original question, perhaps the best way to __begin__ to understand this is by example:

Suppose you have a typical business system which handles inventory and sales. In any such system you will find that there are many more READ requests than WRITE ones, in a ratio that usually exceeds 10:1. If you map this to your UI, it is likely that you will have about 10 GETs for every one POST, PUT or DELETE. Furthermore, viewed at the resource-modelling level, most of this data is either non-mutable or slowly mutable, which lends itself perfectly to caching.

Take an inventory item, for example. You may think that the inventory level is critical, but in reality most business models can deal with a stale value; it depends on the nature of the item. For example, if you are selling items, then unless the lead time is excessive you are not going to stop a sale because you don't have sufficient inventory: businesses will generally prefer to sell and then handle a back-order process, so things like negative inventory quantities are quite common. On the other hand, a spare part in a nuclear facility requires exact, real-time access to the inventory level of that part (known as safety stock), and this holds true for most reliability-centred operations.

So what is the resource? In this case it is usually the part record, or simply put, in REST terms, "the part" or "the item". How cacheable is it? It depends. Some parts will be more cacheable than others, and some may be non-cacheable, forcing a request to the database to obtain the real-time value. This resource-cache definition maps directly to HTTP GET and the caching directives; in other words, both the lead time and the criticality of the part determine how long the network components (including the user agent) may cache each item resource. This means that only the first GET /part/{id} will actually fetch the info from the DB and construct a representation, in some format, that is cacheable by any component of the network: for example, a JSON representation that is stored and propagated in a document store such as MongoDB. In other words, the resource lives in a relational DB, but after the first GET it gets cached (for a time frame set by the HTTP caching directives, which are determined by the item's lead time and criticality as stated above). From then on, any network component and/or user agent requiring this resource can obtain a local copy, whether from its own cache (e.g. the browser's cache) or from some nearby network component that also holds a copy (e.g. a caching network proxy).
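Here is a rough server-side sketch of that first GET (my own Flask illustration; fetch_part() and the criticality rule are stand-ins, not anything prescribed by REST):

```python
from flask import Flask, jsonify

app = Flask(__name__)


def fetch_part(part_id):
    # Stand-in for the real relational query.
    return {"id": part_id, "desc": "bearing", "qty": 42, "critical": False}


@app.route("/part/<int:part_id>")
def get_part(part_id):
    part = fetch_part(part_id)
    resp = jsonify(part)
    # Critical parts need near-real-time stock levels; others may be stale.
    max_age = 0 if part["critical"] else 3600
    resp.headers["Cache-Control"] = f"public, max-age={max_age}"
    return resp
```

Only the first GET per cache lifetime reaches the database; any intermediary that honours Cache-Control can answer the rest.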

As another example, take an invoice. An invoice is stored in the RDBMS as a master-detail record (a.k.a. a one-to-many relationship). This means that EVERY TIME I need to fetch the invoice I must first query for the header data, get some other related data (e.g. customer info, billing and shipping addresses, etc.) and then query for each individual invoice line to construct the "invoice resource", which is ultimately what the consumer wants; this may need to bring in some other related data as well (usually in the form of a complex join or sub-query). In other words, an RDBMS is very effective at storing complex, relational data, but it's really ineffective at extracting, constructing and distributing this data across multiple nodes for millions of GET requests. And the worst part is that most system designers and programmers don't even realize that, in the case of the invoice, this data is NON-MUTABLE, EVER. Once stored in the RDBMS by the invoicing code, a simple trigger could de-normalize this data and serialize it to a JSON structure that is easily distributable across thousands of nodes, so you NEVER have to go to the database again to retrieve this information.

This means that once the invoice resource is created it can carry a cache-forever header, so user agents do not have to wait for the request to make it all the way down to the RDBMS and for the de-normalized object to be constructed each time. This resource will ultimately be cached in the browser's cache and NEVER again (unless the user clears their cache) need even to consult the network to obtain a copy.
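A hedged sketch of that pipeline (every name here is hypothetical, including doc_store): denormalize the invoice once, right after it is committed, then serve it with an effectively permanent cache header:

```python
import json


def on_invoice_created(header, lines, customer, doc_store):
    """Runs once, right after the invoice rows are committed to the RDBMS."""
    doc = {
        "number": header["number"],
        "date": header["date"],
        "customer": customer,  # related data, already joined in
        "lines": lines,        # every invoice line, inlined
        "total": sum(line["amount"] for line in lines),
    }
    # Store the ready-to-serve representation outside the RDBMS.
    doc_store.put(f"/invoice/{header['number']}", json.dumps(doc))


# When serving GET /invoice/{number}, the response can then carry e.g.
#   Cache-Control: public, max-age=31536000, immutable
# so browsers and proxies never need to re-fetch it.
```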

And this is just the tip of the iceberg, but the point is that REST is a complete architectural design, and you must be familiar with all the levels involved (e.g. network, user agents, HTTP), not only talk about API design.

So, basically, to answer your question: in a typical RESTful design, most POST, PUT and DELETE requests will follow a different path than GETs. In other words, the unsafe (or "write") methods will usually route through some code and then to an RDBMS, while the safe (or "read") methods will route to nearby, cached resource storage (even local storage, as in the case of a browser), thereby dealing with the CAP theorem. Basically, a real RESTful system must be carefully designed to take advantage of caching and eventual consistency, and be modelled in terms of resources, not APIs.
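To close with a toy model of that split (all names here are mine, and the dict stands in for the whole caching tier):

```python
CACHE = {}  # stand-in for proxies, CDNs and the browser cache


def write_to_rdbms(path, body):
    ...  # real system: validation, business rules, SQL writes


def load_and_denormalize(path):
    ...  # real system: joins and sub-queries -> one JSON document
    return {"resource": path}


def handle(method, path, body=None):
    if method in ("POST", "PUT", "DELETE"):  # the "write" path
        result = write_to_rdbms(path, body)
        CACHE.pop(path, None)  # invalidate; readers converge eventually
        return result
    if path not in CACHE:  # only the first GET reaches the RDBMS
        CACHE[path] = load_and_denormalize(path)
    return CACHE[path]  # later GETs never touch the database
```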

 



Alejandro IMass said:

Actually, REST is not an API, and there is no such thing as a "REST API" […]

Thank you so much for that detailed answer!  

 

