Tuesday, September 20, 2016

HANA High Availability


Availability, the measure of a system's operational continuity, is expressed as a percentage of time,
inversely proportional to downtime. For example, if a given system is designed to be available for 99.9% of the time (sometimes called "three nines"), its downtime per year must be less than 0.1%, or roughly 8.8 hours.
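
The arithmetic behind the "nines" can be checked with a few lines of Python. This is only a small sketch: the function name is made up for illustration, and a year is assumed to have 365 days.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, leap years ignored (assumption)


def max_downtime_hours(availability_percent: float) -> float:
    """Return the yearly downtime budget, in hours, for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return (100 - availability_percent) / 100 * HOURS_PER_YEAR


# Print the downtime budget for two, three and four nines.
for target in (99.0, 99.9, 99.99):
    print(f"{target}% available -> {max_downtime_hours(target):.2f} hours of downtime per year")
```

Each additional nine cuts the allowed downtime by a factor of ten, which is why higher availability targets quickly become expensive.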

Downtime is the consequence of outages, which may be intentional (e.g. for system upgrades) or caused
by unplanned faults. A fault can be due to equipment malfunction, software or network failures, or due to a
major disaster such as a fire, a regional power loss or a construction accident, which may decommission
the entire data-center.

High Availability is a set of techniques, engineering practices and design principles for Business
Continuity. This is achieved by eliminating single points of failure (fault tolerance), and providing the ability
to rapidly resume operations after a system outage with minimal business loss (fault resilience).

Fault Recovery is the process of recovering and resuming operations after an outage due to a fault.
Disaster Recovery is the process of recovering operations after an outage due to a prolonged data center or site failure. Preparing for disasters may require backing up data across longer distances, and may thus be
more complex and costly.



HANA High Availability:

As an in-memory database, SAP HANA must not only concern itself with maintaining the reliability of its data in the event of failures, but also with resuming operations with most of that data loaded back in memory as quickly as possible.

The following figure shows the phases of High Availability. The first phase is readiness, being prepared for the inevitable fault. During this time, data is backed up and standby systems are ready to take over. A fault must be detected, either automatically or administratively (to avoid false positives), and a recovery process is put in action. Finally, the fault must be repaired, and the system may need to be reverted to the original configuration (failed back), to be ready again for the next fault.

Different RPO (Recovery Point Objective, the maximum tolerable data loss) and RTO (Recovery Time Objective, the maximum tolerable time to restore operations) values can be associated with different kinds of faults. Business-critical systems are expected to operate with an RPO of zero data loss in the case of local faults, and often even in the case of a disaster. But the challenges of disaster recovery are different from those of locally recoverable faults: to achieve zero RPO and low RTO, data must be replicated synchronously over longer distances, which impacts regular system performance and may require more expensive standby and failover solutions. All of this leads to tradeoff decisions around the attributes of fault recovery functionality, cost and complexity. SAP accordingly offers complementary design options, including three levels of Disaster Recovery support and two automatic Fault Recovery support features, summarized in the following table and further discussed in the sections below.






SAP HANA BACKUP:

Even though SAP HANA is an in-memory database, i.e. its data resides in memory, it has its own persistence.
Data is always saved from memory to disk at savepoints, and data changes are written to redo log files, so that they can be used for recovery in case of a failure.

                   

Data in SAP HANA is regularly written from memory to disk at savepoints. A data backup can be performed while the database is online.

To perform a data backup in SAP HANA, the following prerequisites are essential:
DATABASE USER: A backup in SAP HANA can only be performed with a database user, so it is necessary to create a user, or use an existing one, with the necessary privileges and authorizations.
BACKUP ADMIN: The system privilege that the database user performing the backup must have.
CATALOG READ: The system privilege that the database user needs in order to collect the information required by the backup wizard.

A data backup can be taken manually or can be scheduled.

SAP HANA backs up the data of its services (nameserver, nameserver topology, indexserver, xsengine and statisticsserver).
The backups are stored in the destinations set in the configuration file global.ini
by the parameters basepath_databackup and basepath_logbackup.

By default, the parameters point to the following locations:
basepath_databackup=$(DIR_INSTANCE)/backup/data (used for data backups)
basepath_logbackup=$(DIR_INSTANCE)/backup/log (used for log backups)
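
To see which directories these parameters resolve to, the ini file can be read with Python's standard configparser module. The following is a sketch only: the sample text stands in for a real global.ini, the [persistence] section name and the instance directory /usr/sap/XYZ/HDB00 are assumptions for illustration, and the function name is made up.

```python
import configparser

# Stand-in for a real global.ini; the [persistence] section name is an assumption.
SAMPLE_GLOBAL_INI = """
[persistence]
basepath_databackup = $(DIR_INSTANCE)/backup/data
basepath_logbackup = $(DIR_INSTANCE)/backup/log
"""


def backup_paths(ini_text: str, dir_instance: str) -> dict:
    """Return both backup base paths with $(DIR_INSTANCE) filled in."""
    # Interpolation is disabled so the $(...) placeholder is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    section = parser["persistence"]
    return {
        key: section[key].replace("$(DIR_INSTANCE)", dir_instance)
        for key in ("basepath_databackup", "basepath_logbackup")
    }


# Hypothetical instance directory, used only for this example.
for name, path in backup_paths(SAMPLE_GLOBAL_INI, "/usr/sap/XYZ/HDB00").items():
    print(f"{name} -> {path}")
```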

A data backup can be performed with the following SQL statement:

BACKUP DATA USING FILE ('<path><prefix>')

Performing a data backup of the SAP HANA database generates backup files such as the following:
COMPLETE_DATA_BACKUP_databackup_0
COMPLETE_DATA_BACKUP_databackup_1
COMPLETE_DATA_BACKUP_databackup_2
COMPLETE_DATA_BACKUP_databackup_3
COMPLETE_DATA_BACKUP_databackup_4

Here COMPLETE_DATA_BACKUP is the default prefix, which can be changed to any required prefix when performing a data backup.
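
When backups are scripted, the statement and the expected file names can be put together from a path and a prefix. The helper names below are hypothetical, the file name pattern simply follows the listing above, and the number of files is an assumption (in practice it depends on the services being backed up). The sketch only builds strings; it does not connect to a database.

```python
def backup_statement(path: str, prefix: str = "COMPLETE_DATA_BACKUP") -> str:
    """Build a BACKUP DATA statement for a file-based data backup."""
    target = path.rstrip("/") + "/" + prefix
    # Double any single quote so the target stays a single SQL string literal.
    target = target.replace("'", "''")
    return f"BACKUP DATA USING FILE ('{target}')"


def expected_file_names(prefix: str = "COMPLETE_DATA_BACKUP", count: int = 5) -> list:
    """List file names following the <prefix>_databackup_<n> pattern shown above."""
    return [f"{prefix}_databackup_{index}" for index in range(count)]


# Example with a hypothetical path and a custom prefix.
print(backup_statement("/backup/data", "MONDAY"))
for name in expected_file_names("MONDAY", count=3):
    print(name)
```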



Log Modes in SAP HANA:

There are three log modes in SAP HANA, which influence log backup:
• Normal: In this mode, a log backup is taken when a log segment becomes full, which prevents a log-full situation. Log backup is automatic by default; this can be changed through the global.ini parameter enable_auto_log_backup. This mode is recommended for point-in-time recovery.
• Overwrite: In this mode, log segments are freed at each savepoint and no log backup takes place, so no point-in-time recovery is possible.
• Legacy: In this mode, log segments are freed once a full data backup has been performed. No log backup is performed.

A log backup generates backup files of the log segments at the interval specified by the global.ini parameter log_backup_timeout_s. If a log segment becomes full before the log backup timeout has elapsed, a log backup is performed at that point. The timeout only takes effect if the database is running with automatic log backup enabled.
By default, log backups are written to $(DIR_INSTANCE)/backup/log.
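
The conditions described above can be summed up as a small decision function. This is a simplified model of the rules in this section, not SAP HANA's actual implementation; the function name and the timeout value used in the example are illustrative assumptions.

```python
def log_backup_due(
    seconds_since_last_backup: float,
    segment_full: bool,
    log_backup_timeout_s: float,
    enable_auto_log_backup: bool = True,
    log_mode: str = "normal",
) -> bool:
    """Return True if, under the rules described above, a log backup is written."""
    # Only the normal log mode with automatic log backup writes log backups.
    if log_mode != "normal" or not enable_auto_log_backup:
        return False
    # A full segment triggers a backup even before the timeout has elapsed.
    return segment_full or seconds_since_last_backup >= log_backup_timeout_s


# Illustrative timeout of 900 seconds (assumed value for this example).
print(log_backup_due(100, segment_full=False, log_backup_timeout_s=900))
print(log_backup_due(100, segment_full=True, log_backup_timeout_s=900))
print(log_backup_due(1000, segment_full=False, log_backup_timeout_s=900, log_mode="overwrite"))
```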

Configuration Files Backup:

SAP HANA configuration files are not backed up automatically as part of a database backup. All configuration files, such as global.ini, indexserver.ini, nameserver.ini, sapprofile.ini, daemon.ini and any other customer-specific configuration files, therefore need to be backed up manually.



SAP HANA RECOVERY

Pre-steps:

<sid>adm user password: required to shut down the target system on which the
backups are to be recovered.

Configuration between source and target system: relevant if the backups are to be
recovered on a different target system.

Data backup and log backups: taken before the system failure, or before the point in time
to which the system needs to be recovered.

Recovery types available in SAP HANA:

Recovering the database to the most recent state: used for recovering the database
to a time as close as possible to the current time. A data backup, the log backups made
since that data backup, and the log area are required for this type of recovery.

Recovering the database to a point in time: used for recovering the database to a specific
point in time. A data backup, the log backups made since that data backup, and the log area are
required for this type of recovery.

Recovering the database to a specific data backup: used for recovering the database
to a specified data backup. Only the specific data backup is required for this
type of recovery.
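
Before starting a recovery, it helps to check that everything the chosen recovery type needs is actually at hand. The sketch below encodes the three recovery types as a lookup table; the keys and the function name are made up for this illustration and are not SAP HANA identifiers.

```python
# What each recovery type needs, as described above (labels are illustrative).
REQUIRED_INPUTS = {
    "most_recent_state": {"data_backup", "log_backups", "log_area"},
    "point_in_time": {"data_backup", "log_backups", "log_area"},
    "specific_data_backup": {"data_backup"},
}


def missing_inputs(recovery_type: str, available: set) -> list:
    """Return the inputs still missing for the given recovery type, sorted by name."""
    return sorted(REQUIRED_INPUTS[recovery_type] - set(available))


# With only a data backup at hand, point-in-time recovery is not yet possible.
print(missing_inputs("point_in_time", {"data_backup"}))
print(missing_inputs("specific_data_backup", {"data_backup"}))
```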


Backup and recovery can both be performed using SAP HANA studio.


Storage Replication:

One drawback of backups is the potential loss of data from the time of the last backup to the time of the
failure. A preferred solution, therefore, is to provide continuous replication of all persisted data. Several SAP
HANA hardware partners offer a storage-level replication solution, which delivers a backup of the volumes or
file-system to a remote, networked storage system. In some of these vendor-specific solutions, which are
certified by SAP, the SAP HANA transaction only completes when the locally persisted transaction log has
been replicated remotely. This is called synchronous storage replication.




System Replication is an alternative HA solution for SAP HANA. It employs an "N+N" approach,
with a secondary standby SAP HANA system with the same number of nodes as the active, primary system.
Each service and instance of the primary SAP HANA system communicates pair-wise with a counterpart in
the secondary system.


The secondary system can be located near the primary system to serve as a rapid failover solution for
planned downtime, or to handle storage corruption or other local faults. Alternatively, or additionally,
a secondary system can be installed in a remote site for disaster recovery. Like Storage
Replication, this Disaster Recovery option requires a reliable link between the primary and secondary sites.

The instances in the secondary system operate in live replication mode. In this mode, all secondary system
services constantly communicate with their primary counterparts, replicate and persist data and logs, and
typically load data to memory. The secondary system does not accept requests or queries.

In an alternative configuration, called system replication without data-preload, the secondary system
does not pre-load data, and hence consumes very little memory. This allows the secondary system to serve
dual purposes, for instance as a development or test/QA system with separate storage. Before failover,
these activities must of course be turned off. The tradeoff is a longer RTO in case of failover.

Here is how system replication works. When the secondary system is brought up in live replication mode,
each service component establishes a connection with its primary system counterpart, and requests a
snapshot of the data. From then on, all logged changes in the primary system are replicated. Whenever logs
are persisted in the primary system, they are also sent to the secondary system.

A transaction in the primary system is not committed until the logs are replicated, as determined by a log replication option:

Synchronous: The primary system waits with committing the transaction until it receives a reply
that the log is persisted in the secondary system. This mode guarantees immediate consistency
between both systems, at a cost of delaying the transaction by the time for data transmission and
persisting in the secondary system.

Synchronous in-memory: The primary system commits the transaction after it receives a reply that
the log was received by the secondary system, but before it was persisted. The transaction delay in
the primary system is shorter, because it only includes the data transmission time.

Asynchronous (from SPS6): the primary system commits the transaction after sending the log
without waiting for a response. This eliminates the synchronization latency, at the risk of minor
theoretical data-loss during failure. This mode is most useful when the secondary site is hundreds
of kilometers away from the primary site, or when reducing latency is critical.
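
The tradeoff between the three options can be made concrete with a rough latency model. This is a deliberately simplified sketch, not a measurement or SAP's implementation: it assumes that a reply takes as long as the original transmission, and the mode labels, function name and example numbers are chosen only for illustration.

```python
def commit_delay_ms(mode: str, one_way_ms: float, remote_persist_ms: float) -> float:
    """Estimate the extra commit delay that log replication adds on the primary."""
    if mode == "sync":
        # Wait for transmission, persisting on the secondary, and the reply.
        return 2 * one_way_ms + remote_persist_ms
    if mode == "syncmem":
        # Wait only for transmission and the reply; persisting happens afterwards.
        return 2 * one_way_ms
    if mode == "async":
        # The log is sent without waiting for any response.
        return 0.0
    raise ValueError(f"unknown replication mode: {mode}")


# Hypothetical link: 1 ms each way, 0.5 ms to persist on the secondary.
for mode in ("sync", "syncmem", "async"):
    print(f"{mode}: {commit_delay_ms(mode, 1.0, 0.5)} ms added per commit")
```

With a distant secondary site the one-way time dominates, which is why the asynchronous option becomes attractive over long distances.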
