Wednesday, 10 June 2015

Real Time Interview Questions on Joiner Transformation

Joiner Transformation
1. What is a Joiner Transformation and why is it an Active one?
Answer:
A Joiner is an Active and Connected transformation used to join two source data streams that come from the same or heterogeneous databases or files.
The Joiner transformation joins sources that have at least one matching column, using a condition that matches one or more pairs of columns between the two sources.
In the Joiner transformation, we must configure the Join Condition and Join Type properties, and we can optionally enable the Sorted Input option to improve Integration Service performance.
The join condition contains ports from both input sources that must match for the Integration Service to join two rows. Depending on the join condition and the type of join selected, the Integration Service either adds the row to the result set or discards the row. For this reason, the number of rows in the Joiner output may differ from the number of rows in the Joiner input, which is why the Joiner is considered an Active transformation.
2. State the limitations where we cannot use Joiner in the mapping pipeline.
Answer:
The Joiner transformation accepts input from most transformations. However, the following limitations apply:

  1.  Joiner transformation cannot be used when either of the input pipelines contains an Update Strategy transformation.
  2.  Joiner transformation cannot be used if we connect a Sequence Generator transformation directly before the Joiner transformation.

3. Out of the two input pipelines of a joiner, which one will we set as the master pipeline?
Answer:
During a session run, the Integration Service compares each row of the master source against the detail source. The master and detail sources need to be configured for optimal performance.
When the Integration Service processes an unsorted Joiner transformation, it blocks the detail source while it caches rows from the master source. Once the Integration Service finishes reading and caching all master rows, it unblocks the detail source and reads the detail rows. This is why placing the source with fewer rows in the master pipeline keeps the cache smaller and improves performance.
For a sorted Joiner transformation, use the source with fewer duplicate key values as the master source for optimal performance and disk storage. When the Integration Service processes a sorted Joiner transformation, it caches rows for one hundred unique keys at a time. If the master source contains many rows with the same key value, the Integration Service must cache more rows, which can slow performance.
Blocking logic is possible if the master and detail inputs to the Joiner transformation originate from different sources. Otherwise, the Integration Service does not use blocking logic and instead stores more rows in the cache.
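The effect of the master choice on cache size for an unsorted join can be illustrated with a minimal Python sketch. This is a generic "cache one side, stream the other" join, not the Integration Service's actual implementation; the function name `unsorted_normal_join` and the sample rows are assumptions made for illustration.

```python
from collections import defaultdict

def unsorted_normal_join(master_rows, detail_rows, key):
    """Cache every master row by key, then stream detail rows past the cache.

    Returns the joined rows and the number of rows that had to be cached.
    """
    cache = defaultdict(list)
    for m in master_rows:              # phase 1: the detail side waits while master is cached
        cache[m[key]].append(m)

    joined = []
    for d in detail_rows:              # phase 2: detail rows are read and probed one by one
        for m in cache.get(d[key], []):
            joined.append({**m, **d})

    cached = sum(len(rows) for rows in cache.values())
    return joined, cached

# Hypothetical data: 3 departments, 12 employees.
small = [{"dept_id": i, "dept_name": f"D{i}"} for i in range(3)]
large = [{"emp_id": n, "dept_id": n % 3} for n in range(12)]

rows_a, cached_a = unsorted_normal_join(small, large, "dept_id")  # small side as master
rows_b, cached_b = unsorted_normal_join(large, small, "dept_id")  # large side as master

print(len(rows_a), cached_a)
print(len(rows_b), cached_b)
```

Both calls produce the same number of joined rows, but the first one caches only the 3 department rows while the second caches all 12 employee rows.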
4. What are the different types of Joins available in Joiner Transformation?
Answer:
In SQL, a join is a relational operator that combines data from multiple tables into a single result set. The Joiner transformation is similar to an SQL join except that data can originate from different types of sources.
The Joiner transformation supports the following types of joins:

  1.  Normal
  2.  Master Outer
  3.  Detail Outer
  4.  Full Outer

A normal or master outer join performs faster than a full outer or detail outer join.
5. Define the various Join Types of Joiner Transformation.
Answer:

  1.  In a normal join, the Integration Service discards all rows of data from the master and detail source that do not match, based on the join condition.
  2.  A master outer join keeps all rows of data from the detail source and the matching rows from the master source. It discards the unmatched rows from the master source.
  3.  A detail outer join keeps all rows of data from the master source and the matching rows from the detail source. It discards the unmatched rows from the detail source.
  4.  A full outer join keeps all rows of data from both the master and detail sources.
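The four join types can be compared side by side with a small Python sketch. It only mimics the row-keeping rules described above; the function name `joiner`, the join type strings, and the sample rows are hypothetical, and null keys are deliberately left out of the example.

```python
def joiner(master, detail, key, join_type="normal"):
    """Join two lists of dict rows on `key`, following the Joiner's naming.

    master_outer keeps all DETAIL rows; detail_outer keeps all MASTER rows.
    Assumes the key values are never None.
    """
    master_keys = {m[key] for m in master}
    detail_keys = {d[key] for d in detail}
    master_cols = {c for m in master for c in m}
    detail_cols = {c for d in detail for c in d}

    # Matching rows are part of every join type.
    out = [{**m, **d} for d in detail for m in master if m[key] == d[key]]

    if join_type in ("master_outer", "full_outer"):
        # Unmatched detail rows survive; master columns are filled with None.
        out += [{**dict.fromkeys(master_cols), **d}
                for d in detail if d[key] not in master_keys]

    if join_type in ("detail_outer", "full_outer"):
        # Unmatched master rows survive; detail columns are filled with None.
        out += [{**dict.fromkeys(detail_cols), **m}
                for m in master if m[key] not in detail_keys]

    return out

master = [{"id": 1, "m_val": "a"}, {"id": 2, "m_val": "b"}]
detail = [{"id": 2, "d_val": "x"}, {"id": 3, "d_val": "y"}]

counts = {jt: len(joiner(master, detail, "id", jt))
          for jt in ("normal", "master_outer", "detail_outer", "full_outer")}
print(counts)
```

With this data only `id` 2 matches, so the normal join returns one row, each one-sided outer join returns two, and the full outer join returns three.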

6. Describe the impact of the number of join conditions and their order in a Joiner.
Answer:
We can define one or more conditions based on equality between the specified master and detail sources. Both ports in a condition must have the same data type.
If we need to use two ports with non-matching data types in the join condition, we must convert the data types so that they match. The Designer validates data types in a join condition.
Each additional port in the join condition increases the time needed to join the two sources.
The order of the ports in the join condition can affect the performance of the Joiner transformation. If we use multiple ports in the join condition, the Integration Service compares the ports in the order we specify.
Only the equality operator is available in a Joiner join condition.
7. How does Joiner transformation treat NULL value matching?
Answer:
The Joiner transformation does not match null values.
For example, if both EMP_ID1 and EMP_ID2 contain a row with a null value, the Integration Service does not consider them a match and does not join the two rows.
To join rows with null values, replace null input with default values in the Ports tab of the Joiner, and then join on the default values.
If a result set includes fields that do not contain data in either of the sources, the Joiner transformation populates the empty fields with null values. If we know that a field will return a NULL and we do not want to insert NULLs in the target, we can set a default value on the Ports tab for the corresponding port.
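The null-matching behaviour and the default-value workaround can be sketched in Python, with `None` standing in for NULL. The helper names `join_skip_nulls` and `with_default`, the default value `-1`, and the sample rows are assumptions for illustration only.

```python
def join_skip_nulls(master, detail, m_key, d_key):
    """Normal join in which a None key never matches anything, including another None."""
    out = []
    for d in detail:
        if d[d_key] is None:
            continue
        for m in master:
            if m[m_key] is not None and m[m_key] == d[d_key]:
                out.append({**m, **d})
    return out

def with_default(rows, key, default):
    """Return copies of the rows with None in `key` replaced by `default`."""
    return [{**r, key: default if r[key] is None else r[key]} for r in rows]

master = [{"EMP_ID1": 10, "name": "Ann"}, {"EMP_ID1": None, "name": "Bob"}]
detail = [{"EMP_ID2": 10, "dept": "HR"}, {"EMP_ID2": None, "dept": "IT"}]

# Null keys are not joined.
plain = join_skip_nulls(master, detail, "EMP_ID1", "EMP_ID2")

# After replacing nulls with a default (-1 is an arbitrary choice), they join.
defaulted = join_skip_nulls(
    with_default(master, "EMP_ID1", -1),
    with_default(detail, "EMP_ID2", -1),
    "EMP_ID1", "EMP_ID2",
)
print(len(plain), len(defaulted))
```

The default must be a value that cannot occur as a real key, otherwise null rows would be joined to genuine rows that happen to carry that value.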
8. When we configure the join condition, what guidelines must we follow to maintain the sort order?
Suppose we configure Sorter transformations in the master and detail pipelines with the following sorted ports in order: ITEM_NO, ITEM_NAME and PRICE.
Answer:
If both the master and detail pipelines are sorted on the ports ITEM_NO, ITEM_NAME and PRICE, in that order, we must follow these guidelines:

  1.  Use ITEM_NO in the first join condition.
  2.  If we add a second join condition, it must use ITEM_NAME.
  3.  If we want to use PRICE in a join condition, we must also use ITEM_NAME in the second join condition.
  4.  If we skip ITEM_NAME and join on ITEM_NO and PRICE, we lose the input sort order and the Integration Service fails the session.
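These guidelines amount to a prefix rule: the join ports, taken in order, must be a leading prefix of the sorted ports. A minimal Python check of that rule, assuming a hypothetical helper `preserves_sort_order` (not part of any Informatica tool), could look like this:

```python
def preserves_sort_order(sorted_ports, join_ports):
    """True if join_ports is a non-empty leading prefix of sorted_ports, in the same order."""
    join_ports = list(join_ports)
    if not join_ports:
        return False
    return join_ports == list(sorted_ports)[:len(join_ports)]

SORTED_PORTS = ["ITEM_NO", "ITEM_NAME", "PRICE"]

ok_one = preserves_sort_order(SORTED_PORTS, ["ITEM_NO"])
ok_two = preserves_sort_order(SORTED_PORTS, ["ITEM_NO", "ITEM_NAME"])
ok_three = preserves_sort_order(SORTED_PORTS, ["ITEM_NO", "ITEM_NAME", "PRICE"])
bad_skip = preserves_sort_order(SORTED_PORTS, ["ITEM_NO", "PRICE"])   # skips ITEM_NAME
bad_start = preserves_sort_order(SORTED_PORTS, ["ITEM_NAME"])         # does not start with ITEM_NO
print(ok_one, ok_two, ok_three, bad_skip, bad_start)
```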
9. Which transformations must not be placed between the sort origin and the Joiner transformation, so that we do not lose the input sort order?
Answer:
The best option is to place the Joiner transformation directly after the sort origin to maintain sorted data. However, do not place any of the following transformations between the sort origin and the Joiner transformation:
  1.  Custom
  2.  Unsorted Aggregator
  3.  Normalizer
  4.  Rank
  5.  Union transformation
  6.  XML Parser transformation
  7.  XML Generator transformation
  8.  Mapplet [if it contains any one of the above mentioned transformations]

10. What is the use of sorted input in the Joiner transformation?
Answer:
It is recommended to join sorted data when possible. Configuring the Joiner transformation to use sorted input improves session performance by minimizing disk input and output. The improvement is greatest when we work with large data sets.
For an unsorted Joiner transformation, designate the source with fewer rows as the master source for optimal performance and disk storage. During a session, the Joiner transformation compares each row of the master source against the detail source. The fewer unique rows in the master, the fewer iterations of the join comparison occur, which speeds up the join process.
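Why sorted input reduces caching can be seen in a generic sort-merge join, sketched below in Python. This is the textbook technique, not the Integration Service's internal algorithm; the function name `sorted_merge_join` and the sample rows are assumptions. Because both inputs arrive in key order, only the master rows for the current key need to be held at any moment.

```python
def sorted_merge_join(master, detail, key):
    """Normal join of two lists of dict rows that are already sorted ascending by `key`."""
    out = []
    i = j = 0
    while i < len(master) and j < len(detail):
        mk, dk = master[i][key], detail[j][key]
        if mk < dk:
            i += 1                      # master key has no partner yet, advance master
        elif mk > dk:
            j += 1                      # detail key has no partner, advance detail
        else:
            # Find the group of master rows that share this key.
            i_end = i
            while i_end < len(master) and master[i_end][key] == mk:
                i_end += 1
            # Join every detail row with this key against that group only.
            while j < len(detail) and detail[j][key] == mk:
                for m in master[i:i_end]:
                    out.append({**m, **detail[j]})
                j += 1
            i = i_end
    return out

master = [{"k": 1, "m": "a"}, {"k": 2, "m": "b"}, {"k": 2, "m": "c"}, {"k": 4, "m": "d"}]
detail = [{"k": 2, "d": "x"}, {"k": 2, "d": "y"}, {"k": 3, "d": "z"}, {"k": 4, "d": "w"}]

result = sorted_merge_join(master, detail, "k")
print([r["k"] for r in result])
```

The master group for a key grows with the number of duplicate master rows for that key, which is consistent with the advice to choose the source with fewer duplicate key values as the master for a sorted join.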
11. Can we join two tables on join columns that have different data types?
For example, table 1 has EMPNO (string) and table 2 has EMPNUM (number).
Answer:
Yes, this is possible. With a Joiner, we convert one of the columns explicitly in an Expression transformation before the join, so that both join ports have the same data type.
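The convert-then-join idea can be sketched in Python, where a list comprehension plays the role of the Expression transformation. The helper `to_int_or_none`, the derived column `EMPNO_NUM`, and the sample rows are hypothetical, and the sketch assumes the numeric column holds integers.

```python
def to_int_or_none(text):
    """Convert a string such as '0042' to an int; return None if it is not numeric."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return None

table1 = [
    {"EMPNO": "0042", "name": "Ann"},
    {"EMPNO": "7", "name": "Bob"},
    {"EMPNO": "n/a", "name": "Cy"},     # cannot be converted, so it will not join
]
table2 = [{"EMPNUM": 42, "dept": "HR"}, {"EMPNUM": 7, "dept": "IT"}]

# Conversion step: add a numeric version of EMPNO before joining.
converted = [{**row, "EMPNO_NUM": to_int_or_none(row["EMPNO"])} for row in table1]

# Join step: both sides of the condition are now numbers.
joined = [
    {**left, **right}
    for left in converted
    for right in table2
    if left["EMPNO_NUM"] is not None and left["EMPNO_NUM"] == right["EMPNUM"]
]
print([(r["name"], r["dept"]) for r in joined])
```

Converting the string side to a number also normalizes values such as leading zeros, which a string comparison would treat as different keys.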
12. Implementation Scenario 1: A Joiner transformation joins two tables, s1 and s2. s1 has 10,000 rows and s2 has 1,000 rows. Which table will you set as the master for better Joiner performance, and why?
Answer:
Set table s2 as the master, because the Informatica server keeps the master rows in the cache. For an unsorted Joiner, caching 1,000 rows instead of 10,000 gives a smaller cache and better performance.
DWBIConcepts
