Thursday, December 16, 2021

An Introduction to OAuth 2.0

 

OAuth 2 is an authorization framework that enables applications to obtain limited access to user accounts on an HTTP service, such as Facebook, GitHub, and DigitalOcean. It works by delegating user authentication to the service that hosts the user account, and authorizing third-party applications to access the user account. OAuth 2 provides authorization flows for web and desktop applications, and mobile devices.

This informational guide is geared towards application developers, and provides an overview of OAuth 2 roles, authorization grant types, use cases, and flows.
Let's get started with OAuth Roles! OAuth defines four roles:
  1. Resource Owner : User - The resource owner is the user who authorizes an application to access their account. The application's access to the user's account is limited to the "scope" of the authorization granted (e.g. read or write access).
  2. Client : Application - The client is the application that wants to access the user's account. Before it may do so, it must be authorized by the user, and the authorization must be validated by the API.
  3. Resource Server - The server that hosts the resource owner's protected resources. It is capable of accepting and responding to protected resource requests using access tokens.
  4. Authorization Server : API - The authorization server verifies the identity of the user and then issues access tokens to the application.

Abstract Protocol Flow

Now that you have an idea of what the OAuth roles are, let's look at a diagram of how they generally interact with each other:

[Diagram: Abstract Protocol Flow between the application, the user, the authorization server, and the resource server]

Here is a more detailed explanation of the steps in the diagram:
  1. The application requests authorization to access service resources from the user
  2. If the user authorizes the request, the application receives an authorization grant
  3. The application requests an access token from the authorization server (API) by presenting authentication of its own identity, and the authorization grant
  4. If the application identity is authenticated and the authorization grant is valid, the authorization server (API) issues an access token to the application. Authorization is complete.
  5. The application requests the resource from the resource server (API) and presents the access token for authentication
  6. If the access token is valid, the resource server (API) serves the resource to the application.
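
As a concrete illustration, here is a minimal sketch of steps 3 to 6 in Python using the requests library. The endpoint URLs, client credentials, and authorization code are hypothetical placeholders; the real values come from the service you are integrating with.

import requests

# Steps 3-4: exchange the authorization grant (here, an authorization code)
# for an access token at the authorization server's token endpoint.
token_response = requests.post(
    "https://auth.example.com/oauth/token",          # hypothetical token endpoint
    data={
        "grant_type": "authorization_code",
        "code": "AUTH_CODE_FROM_STEP_2",             # the authorization grant
        "redirect_uri": "https://myapp.example.com/callback",
        "client_id": "CLIENT_ID",
        "client_secret": "CLIENT_SECRET",
    },
)
access_token = token_response.json()["access_token"]

# Steps 5-6: present the access token to the resource server.
profile = requests.get(
    "https://api.example.com/v1/me",                 # hypothetical protected resource
    headers={"Authorization": f"Bearer {access_token}"},
)
print(profile.json())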

Application Registration

Before using OAuth with your application, you must register your application with the service. This is done through a registration form in the "developer" or "API" portion of the service's website, where you will provide the following information (and probably details about your application):
  1. Application Name
  2. Application Website
  3. Redirect URI or Callback URL
The redirect URI is where the service will redirect the user after they authorize (or deny) your application, and therefore the part of your application that will handle authorization codes or access tokens.
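
As a rough sketch of that handler (using Flask; the /callback route and app structure are assumptions, not something the service dictates), the redirect URI endpoint simply reads the authorization code, or the error, from the query string:

from flask import Flask, request

app = Flask(__name__)

# The path must match the Redirect URI registered with the service.
@app.route("/callback")
def oauth_callback():
    error = request.args.get("error")   # present when the user denies access
    if error:
        return f"Authorization was denied: {error}", 400
    code = request.args.get("code")     # the authorization code to exchange for a token
    return "Authorization code received."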

Client ID and Client Secret

Once your application is registered, the service will issue "client credentials" in the form of a client identification number and a client secret. 

The Client ID is a publicly exposed string that is used by the service API to identify the application, and is also used to build authorization URLs that are presented to users.

The Client Secret is used to authenticate the identity of the application to the service API when the application requests to access a user's account, and must be kept private between the application and the API.
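
For example, an authorization URL is typically assembled from the Client ID, redirect URI, and requested scope, so the secret never appears in anything shown to the user. A sketch with hypothetical endpoint and parameter values:

from urllib.parse import urlencode

params = {
    "response_type": "code",                            # request an authorization code
    "client_id": "CLIENT_ID",                           # public identifier
    "redirect_uri": "https://myapp.example.com/callback",
    "scope": "read",                                    # requested level of access
}
authorize_url = "https://auth.example.com/oauth/authorize?" + urlencode(params)
# The Client Secret is only sent later, in the back-channel token request.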

Authorization Grant

In the abstract flow above, the first four steps cover obtaining an authorization grant and access token. The authorization grant type depends on the method used by the application to request authorization, and the grant types supported by the API. OAuth 2 defines four grant types, each of which is useful in different cases:
  1. Authorization Code: used with server-side Applications
  2. Implicit: used with Mobile Apps or Web Applications (applications that run on the user's device)
  3.  Resource Owner Password Credentials: used with trusted Applications, such as those owned by the service itself
  4. Client Credentials: used with application API access (when an application accesses its own account rather than a user's)
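
To make the last grant type concrete: a client credentials request involves no user at all; the application authenticates with its own credentials and receives a token for its own account. The endpoint and credentials below are placeholders:

import requests

token_response = requests.post(
    "https://auth.example.com/oauth/token",   # hypothetical token endpoint
    data={"grant_type": "client_credentials"},
    auth=("CLIENT_ID", "CLIENT_SECRET"),      # HTTP Basic authentication
)
app_token = token_response.json()["access_token"]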

Wednesday, December 15, 2021

Critical vulnerability in Apache Log4j library

Researchers discovered a critical vulnerability in the Apache Log4j library, which scores a perfect 10 out of 10 in CVSS. Here’s how to protect against it.

Why CVE-2021-44228 is so dangerous

CVE-2021-44228, also named Log4Shell or LogJam, is a Remote Code Execution (RCE) class vulnerability. If attackers manage to exploit it on one of the servers, they gain the ability to execute arbitrary code and potentially take full control of the system.

What makes CVE-2021-44228 especially dangerous is the ease of exploitation: even an inexperienced hacker can successfully execute an attack using this vulnerability. According to the researchers, attackers only need to force the application to write just one string to the log, and after that they are able to upload their own code into the application due to the message lookup substitution function.

Which versions of the Log4j library are vulnerable, and how can you protect your servers from attack?

Almost all versions of Log4j are vulnerable, starting from 2.0-beta9 to 2.14.1. The simplest and most effective protection method is to install the most recent version of the library, 2.15.0.

How I passed the CKA (Certified Kubernetes Administrator) Exam

Getting Prepared

First, I advise you to download the exam study guide and get a thorough understanding of what is required of you.

I followed the CKA prep course offered by Linux Academy here to gather and fine-tune my knowledge according to the requirements of the exam study guide. Please note that watching this course alone will not guarantee that you pass.

Take it easy, and all the very best!

Don’t stress out much about the exam. True, it’s hard, but don’t panic. I panicked at the last minute before starting the exam and tried to reschedule for a later date, thinking I was not ready. However, it was too late, so I just took a deep breath and thought, what the hell, I’ll just go for it, and I did. Don’t be scared, as you get a free retry on the exam. Thankfully, I passed on the first attempt.

Sunday, November 5, 2017

Data Masking Simplified

What Does Data Masking Mean?

Data Masking is the replacement of existing sensitive information in test or development databases with information that looks real but is of no use to anyone who might wish to misuse it. In general, the users of the test, development or training databases do not need to see the actual information as long as what they are looking at looks real and is consistent.

Data Masking Techniques

  1. Substitution
  2. Shuffling
  3. Redaction / Null
  4. Number and Date Variance
  5. Blurring
  6. Masking Out Data
  7. Table Internal Synchronization
  8. Cross Schema Synchronization
  9. Selective Masking: Ability to Apply a WHERE Clause
  10. User Defined SQL Commands

Substitution

This technique consists of randomly replacing the contents of a column of data with information that looks similar but is completely unrelated to the real details. For example, the surnames in a customer database could be sanitized by replacing the real last names with surnames drawn from a largish random list.

Substitution is very effective in terms of preserving the look and feel of the existing data. The downside is that a largish store of substitutable information must be available for each column to be substituted. For example, to sanitize surnames by substitution, a list of random last names must be available. Then to sanitize telephone numbers, a list of phone numbers must be available. Frequently, the ability to generate known invalid data (credit card numbers that will pass the checksum tests but never work) is a nice-to-have feature.
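
A minimal sketch of surname substitution (the surname list and the customer row are made up for illustration):

import random

SURNAMES = ["Wilson", "Clark", "Patel", "Garcia", "Nguyen", "Okafor"]

def substitute_surname(row):
    # Replace the real surname with a random entry from a prepared dataset.
    masked = dict(row)
    masked["last_name"] = random.choice(SURNAMES)
    return masked

customers = [{"first_name": "Robert", "last_name": "Smith", "phone": "555-0101"}]
masked_customers = [substitute_surname(r) for r in customers]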

Substitution data can sometimes be very hard to find in large quantities - however any data masking software should contain datasets of commonly required items. When evaluating data masking software, the size, scope and variety of the datasets should be considered. Another useful feature to look for is the ability to build your own custom datasets and add them for use in the masking rules.

Shuffling

Shuffling is similar to substitution except that the substitution data is derived from the column itself. Essentially the data in a column is randomly moved between rows until there is no longer any reasonable correlation with the remaining information in the row. 

There is a certain danger in the shuffling technique. It does not prevent people from asking questions like “I wonder if so-and-so is on the supplier list?” In other words, the original data is still present and sometimes meaningful questions can still be asked of it. Another consideration is the algorithm used to shuffle the data. If the shuffling method can be determined, then the data can easily be “un-shuffled”. For example, if the shuffle algorithm simply ran down the table swapping the column data between every group of two rows, it would not take much work from an interested party to revert things to their un-shuffled state.

Shuffling is rarely effective when used on small amounts of data. For example, if there are only 5 rows in a table it probably will not be too difficult to figure out which of the shuffled data really belongs to which row. On the other hand, if a column of numeric data is shuffled, the sum and average of the column still work out to the same amount. This can sometimes be useful.

Shuffle rules are best used on large tables and leave the look and feel of the data intact. They are fast, but great care must be taken to use a sophisticated algorithm to randomize the shuffling of the rows.
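
A sketch of a column shuffle that relies on a proper pseudo-random permutation rather than a predictable swap pattern (the table and column names are illustrative):

import random

def shuffle_column(rows, column):
    # Randomly permute the values of one column across all rows,
    # leaving the rest of each row untouched.
    values = [row[column] for row in rows]
    random.shuffle(values)              # Fisher-Yates shuffle, not a fixed swap pattern
    for row, new_value in zip(rows, values):
        row[column] = new_value
    return rows

employees = [{"name": "Ann", "salary": 52000},
             {"name": "Bob", "salary": 61000},
             {"name": "Cho", "salary": 58000}]
shuffle_column(employees, "salary")     # the column's sum and average are unchanged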


Redaction / Null

Data redaction is the destruction of sensitive data, such as any personally identifiable information (PII). PII can be used on its own or with other information to identify or locate a single person, or to identify an individual in context. Enabling redaction allows you to transform PII to a pattern that does not contain any identifiable information. For example, you could replace all Social Security numbers (SSN) like 123-45-6789 with an unintelligible pattern like XXX-XX-XXXX, or replace only part of the SSN (XXX-XX-6789).
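
A minimal sketch of an SSN redaction rule along those lines (the full and partial patterns follow the example above):

import re

SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")

def redact_ssn(text, keep_last_four=False):
    # Replace SSNs with an unintelligible pattern, optionally keeping the last 4 digits.
    if keep_last_four:
        return re.sub(r"\d{3}-\d{2}-(\d{4})", r"XXX-XX-\1", text)
    return SSN_PATTERN.sub("XXX-XX-XXXX", text)

print(redact_ssn("SSN 123-45-6789 on file"))                        # SSN XXX-XX-XXXX on file
print(redact_ssn("SSN 123-45-6789 on file", keep_last_four=True))   # SSN XXX-XX-6789 on file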

Although encryption techniques are available to protect Hadoop data, the underlying problem with using encryption is that an admin who has complete access to the cluster also has access to unencrypted sensitive user data. Even users with appropriate ACLs on the data could have access to logs and queries where sensitive data might have leaked.

Data redaction provides compliance with industry regulations such as PCI and HIPAA, which require that access to PII be restricted to only those users whose jobs require such access. PII or other sensitive data must not be available through any other channels to users like cluster administrators or data analysts. However, if you already have permissions to access PII through queries, the query results will not be redacted. Redaction only applies to any incidental leak of data. Queries and query results must not show up in clear text in logs, configuration files, UIs, or other unprotected areas.


Number and Date Variance

The Number Variance technique is useful on numeric or date data. Simply put, the algorithm involves modifying each number or date value in a column by some random percentage of its real value.

This technique has the nice advantage of providing a reasonable disguise for the data while still keeping the range and distribution of values in the column within existing limits. For example, a column of salary details might have a random variance of ±10% placed on it. Some values would be higher, some lower, but all would remain close to their original range. Date fields are also a good candidate for variance techniques. Birth dates, for example, could be varied within an arbitrary range of ±120 days, which effectively disguises the personally identifiable information while still preserving the distribution.

The variance technique can prevent attempts to discover true records using known date data, as well as the exposure of sensitive numeric or date data.
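
A sketch of both variants, using the ±10% salary and ±120 day birth-date limits from the examples above:

import random
from datetime import date, timedelta

def vary_number(value, pct=0.10):
    # Shift the value by a random percentage of itself, here within +/-10%.
    return value * (1 + random.uniform(-pct, pct))

def vary_date(d, max_days=120):
    # Shift the date by a random number of days, here within +/-120 days.
    return d + timedelta(days=random.randint(-max_days, max_days))

print(vary_number(50000))             # e.g. 47234.81...
print(vary_date(date(1985, 6, 14)))   # e.g. 1985-09-02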


Blurring

Alter the existing value randomly within a defined range.
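
A minimal sketch, reading "defined range" as a fixed absolute spread rather than a percentage (the spread value is made up):

import random

def blur(value, spread=500):
    # Add random noise drawn from a fixed absolute range, here +/-500.
    return value + random.uniform(-spread, spread)

print(blur(52000))    # e.g. 51873.2...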


Masking Out Data

The generic term for this process is data anonymization; it means replacing certain fields with a mask character (such as an X). This effectively disguises the data content while preserving the same formatting on front-end screens and reports. For example, a column of credit card numbers might look like:

4346 6454 0020 5379
4493 9238 7315 5787
4297 8296 7496 8724

and after the masking operation the information would appear as:

4346 XXXX XXXX 5379
4493 XXXX XXXX 5787
4297 XXXX XXXX 8724
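
A sketch of the operation shown above (the card numbers are the sample values from this example, not real accounts):

def mask_card(number, mask_char="X"):
    # Keep the first and last four digits and mask everything in between.
    digits = number.replace(" ", "")
    masked = digits[:4] + mask_char * (len(digits) - 8) + digits[-4:]
    # Re-insert the spacing so the formatting survives on screens and reports.
    return " ".join(masked[i:i + 4] for i in range(0, len(masked), 4))

print(mask_card("4346 6454 0020 5379"))   # 4346 XXXX XXXX 5379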

The masking characters effectively remove much of the sensitive content from the record while still preserving the look and feel. Take care to ensure that enough of the data is masked to preserve security. It would not be hard to regenerate the original credit card number from a masking operation such as: 4297 8296 7496 87XX since the numbers are generated with a specific and well known checksum algorithm. Also care must be taken not to mask out potentially required information.

A masking operation such as XXXX XXXX XXXX 5379 would strip the card issuer details from the credit card number. This may, or may not, be desirable.

If the data is in a specific, invariable format, then Masking Out is a powerful and fast option. If numerous special cases must be dealt with then masking can be slow, extremely complex to administer and can potentially leave some data items inappropriately masked.

Table Internal Synchronization

Sometimes the same data appears in multiple rows within the same table. For example, the name Robert Smith may appear in the FIRST_NAME and LAST_NAME columns of multiple rows.

In other words, some of the data items are de-normalized because of repetitions in multiple rows. If the name Robert Smith changes to Albert Wilson after masking, then the same Robert Smith referenced in other rows must also change to Albert Wilson in a consistent manner. This requirement is necessary to preserve the relationships between the data rows and is called Table-Internal Synchronization.

A Table-Internal Synchronization operation will update columns in groups of rows within a table to contain identical values. This means that every occurrence of Robert Smith in the table will contain Albert Wilson. Good data anonymization software should provide support for this requirement.
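
A sketch of the idea: a shared lookup table guarantees that every occurrence of the same original value receives the same replacement (the names and columns follow the example above):

import random

FIRST_NAMES = ["Albert", "Maria", "Dev", "Lena"]
SURNAMES = ["Wilson", "Clark", "Patel", "Garcia"]
replacements = {}   # original (first, last) -> masked (first, last)

def synchronized_mask(row):
    key = (row["FIRST_NAME"], row["LAST_NAME"])
    if key not in replacements:
        # First occurrence of this person: pick a replacement and remember it.
        replacements[key] = (random.choice(FIRST_NAMES), random.choice(SURNAMES))
    row["FIRST_NAME"], row["LAST_NAME"] = replacements[key]
    return row

rows = [{"FIRST_NAME": "Robert", "LAST_NAME": "Smith"},
        {"FIRST_NAME": "Robert", "LAST_NAME": "Smith"}]
rows = [synchronized_mask(r) for r in rows]   # both rows now share the same new name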

Cross Schema Synchronization

Many databases contain de-normalized data in tables which are located in multiple schemas. If this data is related, a Table-To-Table Synchronization operation may be required after the data masking operations have concluded. The analysis phase conducted before the construction of the masking routines should pay attention to this requirement and the masking software should be able to support it if required.

Selective Masking: Ability to Apply a WHERE Clause

It is essential to be able to use specific criteria to choose the rows on which the masking operations are performed. In effect, this means that it must be possible to apply a Where Clause to a set of data and have the masking operations apply only to that subset of the table.

As an example of a Where Clause requirement, consider a masking operation on a column containing first names. These names are gender specific, and the end users of the database may well require male and female names to be present in the appropriate rows after the masking operations complete. Two rules, each with a Where Clause based on the gender column, will be required here. There is a potential trap here – note the discussion entitled Where Clause Skips in the Data Masking Issues section of this document.
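
A sketch of two gender-specific rules scoped by a Where Clause, using Python's built-in sqlite3 module (the database, table, column names, and replacement names are illustrative):

import random
import sqlite3

MALE_NAMES = ["John", "Omar", "Liam"]

conn = sqlite3.connect("test_copy.db")    # a test/development copy, never production

# Rule 1: only rows selected by the WHERE clause are touched.
male_rows = conn.execute("SELECT rowid FROM customers WHERE gender = 'M'").fetchall()
for (rowid,) in male_rows:
    conn.execute("UPDATE customers SET first_name = ? WHERE rowid = ?",
                 (random.choice(MALE_NAMES), rowid))
# Rule 2 would do the same with WHERE gender = 'F' and a list of female names.
conn.commit()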

User Defined SQL Commands

It is very helpful, essential in many circumstances, to be able to run user defined SQL statements. Such statements could be used, for example, to create an index which speeds up the operation of other rules or to assist with a complex synchronization operation.

Note that many databases use different internal mechanisms for the creation and execution of a block of statements (i.e. a procedure) than they do for simple SQL statements. It is usually important that the solution chosen be able to support both block SQL constructs as well as simple insert, update and delete statements.
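
For example, again with sqlite3 (the index, table, and column names are illustrative), a rule set might mix a simple user-defined statement with a block of statements run as a script:

import sqlite3

conn = sqlite3.connect("test_copy.db")

# Simple user-defined statement: an index that speeds up later masking rules.
conn.execute("CREATE INDEX IF NOT EXISTS idx_customers_gender ON customers (gender)")

# Block of statements executed together; note that the driver uses a different
# call for multi-statement scripts than for single statements.
conn.executescript("""
    UPDATE customers SET phone = NULL;
    UPDATE orders    SET notes = NULL;
""")
conn.commit()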