The Blog from the DBA Classroom

By: Joel Goodman

SCAN you LISTEN to this?

Posted by Joel Goodman on 18/03/2010


My good friend and colleague Harald van Breederode recently posted an article, How to Set Up a Private DNS for Your Virtual Cluster, in which he discussed DNS and the steps he took to create both master and slave DNS servers on his virtual RAC cluster. The article showed how to define IP addresses statically in the DNS configuration files and how to generate the configuration and zone files. One of the “hostnames” listed had three IP addresses: the “Single Client Access Name”, or SCAN as it is known.

Last week I taught the 11gR2 Grid Infrastructure and RAC course in EMEA for the first time. Many questions arose about SCANs and SCAN VIPs, because many of the delegates were experienced DBAs who had been using Oracle 10g or 11gR1 Clusterware and RAC. Some of the questions were conceptual, and there were some interesting discussions about SCAN VIPs and SCAN listeners compared with the VIPs and listeners of 10g and 11gR1, so I thought them worth sharing.

Architecture for VIPs and Listeners in 10g and 11gR1

In Oracle 10g and 11gR1, each node in the cluster has a Virtual IP address (VIP), which is activated on the public adaptor and used by the listener on that node. The VIP is also a resource registered in the Oracle Cluster Registry (OCR). If ASM is used, the listener usually runs from the ASM home, although this can be changed with SRVCTL.

Here is some output from the LSNRCTL utility:

LSNRCTL for Linux: Version 10.2.0.4.0 - Production on 30-SEP-2009 15:37:15

Copyright (c) 1991, 2008, Oracle. All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 10.2.0.4.0 - Production
Start Date                30-SEP-2009 14:42:04
Uptime                    0 days 0 hr. 55 min. 11 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/10.2.0/asm_1/network/admin/listener.ora
Listener Log File         /u01/app/10.2.0/network/log/lsnr.log
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.226.201)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.226.130)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "ora10g.mynode.com" has 1 instance(s).
  Instance "ora10g1", status READY, has 1 handler(s) for this service...
Service "ora10gXDB.mynode.com" has 1 instance(s).
  Instance "ora10g1", status READY, has 1 handler(s) for this service...
The command completed successfully
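If you want to pull the listening addresses out of output like this in a script, a minimal Python sketch can match the TCP endpoint descriptors with a regular expression. It assumes the text layout shown above (other versions may format the output differently), and the function name tcp_endpoints is hypothetical.

```python
import re

# Matches TCP endpoint descriptors such as
# (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.226.201)(PORT=1521)))
TCP_ENDPOINT = re.compile(
    r"\(PROTOCOL=tcp\)\(HOST=([^)]+)\)\(PORT=(\d+)\)", re.IGNORECASE
)


def tcp_endpoints(status_text: str) -> list[tuple[str, int]]:
    """Return (host, port) pairs for each TCP endpoint in lsnrctl status output.

    Lines with an empty HOST, such as the "Connecting to" line, are skipped
    because the pattern requires at least one character after HOST=.
    """
    return [(host, int(port)) for host, port in TCP_ENDPOINT.findall(status_text)]


sample = """\
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.226.201)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.226.130)(PORT=1521)))
"""

print(tcp_endpoints(sample))
# [('192.168.226.201', 1521), ('192.168.226.130', 1521)]
```

The IPC endpoint is ignored because only TCP addresses are useful to remote clients.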

10g and 11gR1 VIP Usage and Implications

In the output above, IP address 192.168.226.201 is the VIP normally used by this listener on this node (say, node 1). If node 1 fails, the VIP fails over to a surviving cluster node, which activates it. If adaptor eth0 is the public interface, then on the surviving node (say, node 3), eth0:1 will be the virtual adaptor for the VIP used by node 3's own listener, and eth0:2 will carry the VIP failed over from node 1.

This failover of VIPs facilitates connect-time failover: a client or middle tier trying to connect to that listener with a load-balanced connection gets an error returned from the node to which the VIP failed over, because no listener is using that VIP there. In my example, node 3 has eth0:2 for IP address 192.168.226.201, but no listener is listening on that address on node 3, so an error is returned to the client immediately, rather than the client waiting for a TCP timeout. The client then tries another listener address, chosen at random, from the tnsnames.ora entry.

Once the failed node 1 is restarted, the VIP is deactivated on node 3, adaptor eth0:2 is no longer active, the VIP is reactivated on node 1 as adaptor eth0:1, and the listener on node 1 once again listens on that VIP.

So a VIP is used differently when it is failed over than when it is on its normal “home” node.

Note: This is true for database VIPs. Application VIPs behave the same way on whichever node they are activated, but that is beyond the scope of this discussion.
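The failover behaviour described above can be modelled in a few lines of Python. This is a toy sketch, not an Oracle interface: the Node class, the helper functions and the VIP address 192.168.226.203 for node 3 are hypothetical. It shows that a failed-over VIP with no listener behind it produces an immediate refusal, so the client moves straight on to the next address, whereas an address owned by no live host would leave the client waiting for a TCP timeout.

```python
import random
from dataclasses import dataclass, field

# Toy model of pre-11gR2 node VIPs; nothing here is an Oracle API.


@dataclass
class Node:
    name: str
    up: bool = True
    # VIP address -> True if a listener on this node is listening on it.
    vips: dict[str, bool] = field(default_factory=dict)


def fail_node(failed: Node, survivor: Node) -> None:
    """Fail a node and relocate its VIPs to a survivor (e.g. as eth0:2)."""
    failed.up = False
    for vip in failed.vips:
        survivor.vips[vip] = False  # VIP is active there, but no listener uses it
    failed.vips.clear()


def try_connect(cluster: list[Node], vip: str) -> str:
    """Outcome of one connection attempt to a VIP."""
    for node in cluster:
        if node.up and vip in node.vips:
            # Either a listener answers, or the host rejects the request at once.
            return "connected" if node.vips[vip] else "refused"
    return "timeout"  # no live host owns the address: wait for the TCP timeout


def connect_with_failover(cluster: list[Node], address_list: list[str],
                          rng: random.Random) -> str:
    """Try the addresses in random order, as with a load-balanced address list."""
    addresses = list(address_list)
    rng.shuffle(addresses)
    for vip in addresses:
        if try_connect(cluster, vip) == "connected":
            return vip
    raise ConnectionError("no listener reachable")


node1 = Node("node1", vips={"192.168.226.201": True})
node3 = Node("node3", vips={"192.168.226.203": True})
cluster = [node1, node3]
fail_node(node1, node3)
print(try_connect(cluster, "192.168.226.201"))  # refused
print(connect_with_failover(cluster, ["192.168.226.201", "192.168.226.203"],
                            random.Random(0)))  # 192.168.226.203
```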

10g and 11gR1 Listener Usage and Implications

From Oracle 8i onwards, cluster database listeners (Oracle Parallel Server in 8i, RAC from 9i) have had several jobs to do in supporting connection requests made by clients and middle tiers:

1. Managing incoming connection requests, in both normal and failover situations

Connection requests may arrive at any time, but they peak when many logins occur simultaneously. This happens at instance failure, for example, if Transparent Application Failover (TAF) is used with BASIC rather than pre-connected sessions. At such a time the listeners must handle many requests; how many depends on how many connections existed to the failed instance and how many surviving nodes and listeners there are. For each connection request, the listener must make a load-balancing decision (see 2 below) and then either spawn and bequeath or redirect the request (see 3 and 4 below). Occasionally listeners may reject requests if too many are queued; this can be seen on a per-service basis with the “lsnrctl services” command. This is why TAF has the RETRIES and DELAY parameters. The result is delayed reconnection, which reduces availability to the user.
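A client-side reconnect loop in the spirit of TAF's RETRIES and DELAY settings might look like the following minimal sketch. It does not use any Oracle library: ListenerBusy, reconnect and flaky_listener are hypothetical, and treating RETRIES as the number of extra attempts after the first one is an assumption of the sketch.

```python
import time
from typing import Callable

# Hypothetical sketch: no Oracle library is used. "retries" is treated here as
# the number of attempts allowed after the first one fails.


class ListenerBusy(Exception):
    """A (simulated) listener refused the request because its queue was full."""


def reconnect(connect: Callable[[], str], retries: int, delay: float,
              sleep: Callable[[float], None] = time.sleep) -> str:
    """Call connect(), retrying up to `retries` times with `delay` seconds between tries."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return connect()
        except ListenerBusy:
            if attempt == retries:
                raise  # out of retries: the user sees the failure
            sleep(delay)
    raise AssertionError("unreachable")


def flaky_listener(refusals: int) -> Callable[[], str]:
    """Simulated listener that refuses its first `refusals` requests (a login storm)."""
    remaining = [refusals]

    def connect() -> str:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise ListenerBusy("listener queue full")
        return "session established"

    return connect


waits: list[float] = []
print(reconnect(flaky_listener(2), retries=5, delay=3.0, sleep=waits.append))
print(waits)  # [3.0, 3.0]: six seconds of reconnection delay for the user
```

Injecting the sleep function keeps the example fast to run and makes the accumulated delay, which is what the user actually experiences, easy to inspect.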

2. Making Load Balancing decisions

The listener makes “connection-time” load-balancing decisions about which instance will be used for each connection. The decision is made by whichever listener the client or middle tier connects to, because the listeners on all the nodes should be referenced in the REMOTE_LISTENER parameter of every database instance, so every listener knows about every instance. Since Oracle 10g, the load-balancing decision may be made on a per-service basis, as each service can choose from among different load-balancing methods:

  • Session count, for long-duration sessions
  • Node run-queue length, for short sessions with no metrics
  • Service metrics, based on either ELAPSED TIME PER CALL or CPU TIME PER CALL
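As a rough illustration of per-service load balancing, the sketch below picks the least-loaded instance according to whichever method the service uses. The InstanceLoad fields, the method names and the numbers are made up, and a plain minimum is a simplification; it is not the listener's actual algorithm.

```python
from dataclasses import dataclass

# Hypothetical per-instance figures; the field names, method names and the
# "pick the minimum" rule are illustrative, not Oracle's internal algorithm.


@dataclass(frozen=True)
class InstanceLoad:
    name: str
    sessions: int               # sessions connected to this instance for the service
    run_queue: float            # run-queue length of the instance's node
    elapsed_per_call_ms: float  # service metric: elapsed time per call
    cpu_per_call_ms: float      # service metric: CPU time per call


METHODS = {
    "session_count": lambda i: i.sessions,   # long-duration sessions
    "run_queue": lambda i: i.run_queue,      # short sessions, no metrics
    "elapsed_time_per_call": lambda i: i.elapsed_per_call_ms,
    "cpu_time_per_call": lambda i: i.cpu_per_call_ms,
}


def choose_instance(candidates: list[InstanceLoad], method: str) -> InstanceLoad:
    """Return the least-loaded candidate according to the service's method."""
    if not candidates:
        raise ValueError("no instance is offering this service")
    return min(candidates, key=METHODS[method])


# Made-up numbers for a two-instance service.
instances = [
    InstanceLoad("orcl1", sessions=120, run_queue=0.5,
                 elapsed_per_call_ms=40.0, cpu_per_call_ms=9.0),
    InstanceLoad("orcl2", sessions=80, run_queue=2.0,
                 elapsed_per_call_ms=25.0, cpu_per_call_ms=12.0),
]
for method in METHODS:
    print(method, choose_instance(instances, method).name)
```

With these figures, session count and elapsed time per call favour orcl2, while run-queue length and CPU time per call favour orcl1, which is why the choice of method per service matters.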

3. Performing a “Spawn and Bequeath” request if the connection is to be made to an instance on the same cluster node

If the listener decides to connect the client to an instance on the same node, either because of a load-balancing decision or because the only instance accepting logins for the service is on that node, then the traditional spawn-and-bequeath method may be used. The listener spawns an oracle process from the Oracle home associated with the instance hosting the service and bequeaths the transport connection between the client and the listener to the new process, which then communicates with the client directly.

4. Performing a “Redirect” request if the connection is to be made to an instance on another cluster node

Since spawn and bequeath is not possible when the connection is to an instance on a remote node, the listener making the load-balancing decision must enlist the help of the listener on the node hosting the chosen instance. The listener first contacted by the client returns the address of the listener on the chosen node to the client, which in turn contacts that listener, specifying that it must connect to the specific instance on that node.
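The choice between points 3 and 4 can be sketched as follows, again as a hypothetical model rather than Oracle code. The Instance class, the dispose and connect functions and the node1-vip/node2-vip addresses are illustrative. A remote instance costs the client an extra hop to a second listener, whereas a local one does not.

```python
from dataclasses import dataclass

# Hypothetical model of how a pre-11gR2 listener hands off a request once an
# instance has been chosen; not Oracle code.


@dataclass(frozen=True)
class Instance:
    name: str
    node: str


def dispose(listener_node: str, chosen: Instance,
            listener_addresses: dict[str, str]) -> tuple[str, str]:
    """Action taken by the listener on listener_node for the chosen instance."""
    if chosen.node == listener_node:
        # Same node: spawn a server process and bequeath the connection to it.
        return ("SPAWN_AND_BEQUEATH", chosen.name)
    # Remote node: send the client the address of that node's listener.
    return ("REDIRECT", listener_addresses[chosen.node])


def connect(first_node: str, chosen: Instance,
            listener_addresses: dict[str, str]) -> tuple[list[str], str, str]:
    """Follow at most one redirect and record which listeners the client visited."""
    visited = [first_node]
    action, target = dispose(first_node, chosen, listener_addresses)
    if action == "REDIRECT":
        visited.append(chosen.node)  # the client now contacts the remote listener
        action, target = dispose(chosen.node, chosen, listener_addresses)
    return visited, action, target


addresses = {"node1": "node1-vip:1521", "node2": "node2-vip:1521"}
print(connect("node1", Instance("orcl1", "node1"), addresses))
print(connect("node1", Instance("orcl2", "node2"), addresses))
```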

The behaviour of the VIPs and the listeners prior to 11gR2 has the following implications:

  • Database VIPs do not behave as normal VIPs. They fail over only to avoid the TCP timeout delay, and they are not used in the same way as when they are not failed over.
  • The listeners perform many different functions, and when a “login storm” occurs after a failure there may be delays and retries, because the same listeners are both making load-balancing decisions and performing spawn-and-bequeath connection processing.
  • The code path and time required to establish a connection to an instance vary depending on whether the instance chosen for the service is hosted on the same node as the chosen listener.

Enter SCAN VIPs and SCANs in 11gR2

From 11gR2, the behaviour of the VIPs and the listeners changes with the introduction of SCAN technology, which includes:

  • Single Client Access Name
  • SCAN VIPs
  • SCAN Listeners
  • Node or Local Listeners

Part of this architecture follows from the implications listed above. Essentially, there are now two layers of listeners:

  • Up to three SCAN listeners, which manage the original client connection requests and make the load-balancing decisions. Client connect requests always go to the SCAN listeners (except in the upgrade situation mentioned below). Much has already been written about SCAN addresses, but to summarise, there can be:
  1. A single SCAN address, SCAN VIP resource and SCAN listener, if the server-side hosts file is used to resolve the SCAN name. In this case, however, the SCAN listener is not highly available if the hosting node fails or the listener goes down.
  2. Three SCAN addresses defined statically in DNS under the same SCAN name, as demonstrated in Harald’s article. Note: it is also possible to manage the three static SCAN addresses in the corporate DNS rather than in a separate subdomain or zone. This works perfectly well for fairly static clusters and is probably a good choice for many customers.
  3. Three SCAN addresses acquired automatically from DHCP and managed by the clusterware using GNS (Grid Naming Service) and mDNS. This requires a delegated subdomain in the corporate DNS server configuration, because the IP addresses and host names are not known outside the cluster until they have been resolved at least once and cached in the DNS servers for their TTL duration. This choice facilitates Grid Plug and Play (GPnP), whereby new nodes may be added to the cluster without any changes to the corporate DNS configuration: DHCP assigns the IP addresses, and the host-name-to-address mapping is updated dynamically using the mDNS protocol, so that GNS can then resolve the new host names properly. It requires only a one-off setup of the delegated subdomain in the corporate DNS and of DHCP. Note that this requires the “Advanced” configuration to be chosen when installing the Grid Infrastructure.
  4. Each SCAN listener has a SCAN VIP, and both are so-called “CLUSTER RESOURCES” in the OCR, which may be displayed using the CRSCTL command.
  5. Three SCANs and SCAN listeners were chosen as the optimal number, to ensure that at least two SCANs and SCAN listeners remain active to handle login storms if a node hosting a SCAN fails.
  • Node listeners, one on each node, which connect clients to the chosen instance. These listeners are not normally contacted by clients for the initial connection request, and they are not referenced by the REMOTE_LISTENER parameter except in RAC databases upgraded from prior releases. Each node listener also has a VIP; the node listeners are examples of “LOCAL RESOURCES” in the OCR.
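To see how one SCAN name backed by three DNS records can spread clients across SCAN listeners, here is a toy round-robin zone in Python. The name cluster-scan.example.com and the three addresses are hypothetical, and real DNS servers may order answers differently. The system_resolver helper shows how a client could list the addresses a real SCAN name resolves to, using the standard socket.getaddrinfo call.

```python
import socket
from collections import deque


def system_resolver(name: str, port: int) -> list[str]:
    """List the distinct addresses the OS resolver returns for a name, in order."""
    infos = socket.getaddrinfo(name, port, type=socket.SOCK_STREAM)
    return list(dict.fromkeys(info[4][0] for info in infos))


class RoundRobinZone:
    """Toy DNS zone: each lookup returns a name's records, then rotates them by one."""

    def __init__(self, records: dict[str, list[str]]) -> None:
        self._records = {name: deque(addrs) for name, addrs in records.items()}

    def resolve(self, name: str, port: int) -> list[str]:
        # port is unused; it is kept so the signature mirrors system_resolver.
        addrs = self._records[name]
        answer = list(addrs)
        addrs.rotate(-1)  # the next query starts at the next address
        return answer


# Hypothetical SCAN name and addresses.
zone = RoundRobinZone({
    "cluster-scan.example.com": ["192.168.226.211", "192.168.226.212", "192.168.226.213"],
})
for client in range(4):
    first = zone.resolve("cluster-scan.example.com", 1521)[0]
    print(f"client {client} tries {first} first")
```

Calling system_resolver with your own SCAN name from a client machine is one way to check whether the DNS setup really returns all three addresses.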

These components behave differently from the pre-11gR2 listeners and VIPs described above:

  1. Clients always connect to SCAN listeners (except in the upgrade situation mentioned above), and DNS resolution of the SCAN name provides round-robin, client-side load balancing. This relieves the client of needing a tnsnames.ora entry with an address list for the listeners, and permits load balancing with simple connect strings such as EZ Connect style or Java-style syntax.
  2. SCAN listeners always use the REDIRECT method to pass the connection request on to a node listener, even if the node listener is on the same node as the SCAN listener. This evens out the code path for connection requests.
  3. If a node fails, the SCAN VIP fails over to another node, as in the pre-11gR2 case, but the SCAN listener fails over as well. This means that the SCAN VIP behaves the same on every node, i.e. there is always a listener listening on it, in marked contrast to the situation prior to 11gR2.
  4. Node listeners only connect clients or middle tiers to the instance, chosen by the load-balancing SCAN listener, that hosts the requested service. Separating load balancing and initial connection handling from “spawn and bequeath” processing means that SCAN listeners have less to do when a login storm occurs, since all they do is send out redirect packets. With two layers of listeners, the system can be more responsive than with a single layer, because the handling of connection requests is effectively parallelised.
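The two-layer flow can be sketched as a small Python model. Everything here is hypothetical (class, function and address names), and the session-count rule stands in for whatever load-balancing method the service uses. The point it illustrates is that every request follows the same two steps, a redirect from the SCAN listener followed by spawn and bequeath at the node listener, regardless of which node hosts the SCAN listener.

```python
from dataclasses import dataclass

# Hypothetical two-layer model of 11gR2 connection handling; not Oracle code.


@dataclass(frozen=True)
class Instance:
    name: str
    node: str
    sessions: int


def scan_listener(scan_node: str, instances: list[Instance],
                  node_listeners: dict[str, str]) -> tuple[str, str, Instance]:
    """Choose an instance and always redirect, even if it runs on scan_node."""
    # scan_node is deliberately ignored: where the SCAN listener runs
    # does not change what it does.
    chosen = min(instances, key=lambda i: i.sessions)  # stand-in load-balancing rule
    return "REDIRECT", node_listeners[chosen.node], chosen


def node_listener(instance: Instance) -> tuple[str, str]:
    """The node listener only spawns a server process and bequeaths the connection."""
    return "SPAWN_AND_BEQUEATH", instance.name


def connect(scan_node: str, instances: list[Instance],
            node_listeners: dict[str, str]) -> list[str]:
    action, address, chosen = scan_listener(scan_node, instances, node_listeners)
    steps = [f"{action} -> {address}"]
    action, name = node_listener(chosen)
    steps.append(f"{action} -> {name}")
    return steps


instances = [Instance("orcl1", "node1", 50), Instance("orcl2", "node2", 70)]
node_listeners = {"node1": "node1-vip:1521", "node2": "node2-vip:1521"}
for scan_node in ("node1", "node2"):
    print(scan_node, connect(scan_node, instances, node_listeners))
```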

A final word about “CLUSTER RESOURCES” and “LOCAL RESOURCES”.

When you run “crsctl stat res -t”, the output is divided into these two categories:

  1. CLUSTER RESOURCES are those that can run on any node of the cluster and may fail over. Examples are SCAN VIPs, SCAN listeners, the GNS VIP, GNS, database instances, database services and more. A resource does not need multiple occurrences to be considered a CLUSTER resource. For example, there is only ever one GNS and one GNS VIP, but both can run anywhere in the cluster. Similarly, there are three SCAN VIPs and SCAN listeners regardless of the number of cluster nodes, and they may run anywhere. And the instances of a database may run on any of the servers in the server pool to which the database is assigned.
  2. LOCAL RESOURCES are those that either run on a specific node or do not run at all. Examples are local (node) listeners, which either run on their node or don’t run, but which, unlike SCAN listeners, don’t fail over; and ASM instances, which run one per node, or don’t run if the node is down, but do not fail over. A local resource may be running on only one node, for example when another node is down or the resource is down on another node.

So both local and cluster resources can run on multiple nodes, and some of either kind may be running on only one node.
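The difference between the two categories shows up most clearly at node failure. The following minimal sketch models it; the Resource class and node_down are hypothetical, the resource labels only loosely follow the ora.* naming seen in crsctl output, and always relocating to the first surviving node is a simplification of real placement.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model of the two categories in "crsctl stat res -t" output.


@dataclass
class Resource:
    name: str
    kind: str             # "cluster" or "local"
    node: Optional[str]   # node it is running on; None means OFFLINE


def node_down(resources: list[Resource], failed: str, survivors: list[str]) -> None:
    """Apply a node failure: cluster resources relocate, local resources stop."""
    for r in resources:
        if r.node != failed:
            continue
        if r.kind == "cluster" and survivors:
            r.node = survivors[0]  # simplistic placement on the first survivor
        else:
            r.node = None  # local resources never fail over


resources = [
    Resource("ora.scan1.vip", "cluster", "node1"),
    Resource("ora.LISTENER_SCAN1.lsnr", "cluster", "node1"),
    Resource("ora.LISTENER.lsnr (node1)", "local", "node1"),
    Resource("ora.asm (node1)", "local", "node1"),
    Resource("ora.LISTENER.lsnr (node2)", "local", "node2"),
]
node_down(resources, "node1", ["node2"])
for r in resources:
    print(f"{r.name:28} {r.kind:8} {r.node or 'OFFLINE'}")
```

After the failure, the SCAN VIP and SCAN listener are running on node 2, node 1's listener and ASM instance are simply offline, and node 2's own listener is unaffected.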

We certainly had fun last week discussing and debating the SCAN architecture and the changes from previous releases, and I hope readers will enjoy some of these observations.

Joel

18/03/2010


6 Responses to “SCAN you LISTEN to this?”

  1. Kalib said

    Thank you so much for the great article. I have 3 static IPs for SCAN and my application is Oracle Forms 6i. If I use the SCAN ports in the tnsnames.ora it does not even connect, but using the VIPs works (I have a 2-node cluster). Even with apps that use Oracle client 10g and 11gR1, using the SCAN ports works but seems to be slower! Any comment on that?

    Also, I now have an ORA-12542 problem and don’t know how to define QUEUESIZE in the 11gR2 listener configuration, since the listener is part of the cluster services and the options I see out there only show how to change the port or ORACLE_HOME. So how can you add QUEUESIZE?

    One last question outside the scope of listeners: I configured the MTU of my bond1 (the interconnect) to 9000. The bond works and tests fine, but when I try to start the cluster services they fail. Are there any known issues in 11gR2 with that, or is there any other file or service that needs to be configured in addition to the bond?

  2. Pascal

    Clients for older database releases can still use the node VIPs for backwards compatibility, but officially new databases are not meant to do so. The node VIPs fail over upon node failure as they did in 11gR1 and earlier, but the node listeners do not fail over, as they are not meant to: there is a node listener on each node.

    In your situation, you could either allow certain IPs through the firewall or hard-code one of the SCAN IPs as the host name in the external DNS. As you are not using GNS, the IP addresses are fixed and this would work as well, although at node failure any connection requests would be delayed until the SCAN VIP and the SCAN listener for that SCAN VIP failed over to another node. To avoid this, you could add the three SCAN IP addresses to the external DNS server and allow those IPs through the firewall, as you are allowing the 4 node VIP addresses at the moment. As long as the IPs are fixed and not assigned by DHCP, this should work.

    Joel

  3. Pascal said

    Hi Joel,

    Thank you for a great explanation about VIPs and SCANs.

    We did a fresh RAC 11gR2 database install last week, using SCAN with 3 static IPs configured round-robin in the company’s DNS.

    I have noticed one interesting behavior:
    Sometimes we have Oracle clients outside the DMZ, and these clients do not have access to the internal DNS server. External clients have access to the external DNS server, which uses another domain, e.g. *.company.de, whereas the internal clients use the domain *.company.de

    So, in the pre-11gR2 environment, we used to configure host resolution in the local /etc/hosts file for Linux clients or in the Windows registry. But we could no longer do so with the 3 SCAN IPs in the /etc/hosts file. So we just configured the VIPs in the /etc/hosts file, but the connection to the cluster worked nevertheless.

    Could you please explain how the client is able to make a Connection to the Cluster, even without configuring the SCAN IPs in the local Client’s /etc/hosts File or DNS?

    Thanks.

    Regards,

    Pascal
