The CLM Book - Optimized Component Lifecycle Management with Sonatype CLM

4.5. Component Identification

One of the most important things you can do to understand the components in your application is to identify them. Anything that remains unidentified is an obvious concern.


Figure 4.29. Unknown Component


Sonatype CLM allows you to identify components in a number of ways, including:

  • Extensive matching via Sonatype CLM algorithms
  • Claiming components
  • Establishing proprietary components

In this section, we describe each of these in detail, in the context of identifying components with the application composition report, and offer our suggested best practices.

4.5.1. Matching Components

When a Sonatype CLM analysis is performed, a hash is created for each component in your application. In many ways this hash is like a fingerprint that is unique to a component. We then compare that fingerprint against our database of components, which includes general component information, usage statistics, and security vulnerability and license information. All of this information can be used as parameters in your policies, which gives you a better understanding of component usage in your organization. However, this data can only be linked to a component through a matching hash. A match can be exact or similar, and in some cases the component remains unknown. We discuss these three match types below.
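To make the fingerprint idea concrete, here is a minimal Java sketch. It is not how Sonatype CLM is implemented: the SHA-1 algorithm, the class name `Fingerprint`, and the in-memory map standing in for the component database are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: fingerprint a component's bytes and look the
// fingerprint up in a map that stands in for the component database.
// SHA-1 is an illustrative assumption, not necessarily what CLM uses.
class Fingerprint {

    // Returns the lowercase hexadecimal SHA-1 digest of the given bytes.
    static String sha1Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-1.
            throw new IllegalStateException(e);
        }
    }

    // Returns the coordinates recorded for this fingerprint, if any.
    static Optional<String> lookup(Map<String, String> knownComponents, byte[] content) {
        return Optional.ofNullable(knownComponents.get(sha1Hex(content)));
    }

    public static void main(String[] args) {
        byte[] component = "abc".getBytes(StandardCharsets.UTF_8);
        byte[] modified = "abd".getBytes(StandardCharsets.UTF_8);
        // Hypothetical database containing only the unmodified component.
        Map<String, String> database = Map.of(sha1Hex(component), "org.example:demo:1.0");
        System.out.println(lookup(database, component).orElse("no exact match"));
        System.out.println(lookup(database, modified).orElse("no exact match"));
    }
}
```

Because changing even one byte produces a completely different digest, the modified component in this sketch finds no exact match, which mirrors why a recompiled or modified component may no longer match the database.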


Figure 4.30. Filter and Matching Options


Exact
An exact match means that a one-to-one link was found between a component hash in your application and our database. All the data you see corresponds to a component we have identified and collect information for. This is the best-case scenario for component identification, and most components should fall into this category.
Similar
A similar match is found using proprietary matching algorithms and represents our best guess for a component in your application. In some cases multiple matches are found, which is where the Similar section of the Component Information Panel (CIP) is important. While the most likely match is used to display information about a similar-matched component, you can see all other matches in this section of the application composition report. An example is displayed in Figure 4.13, “CIP, Similar Section”.
Unknown
There are instances where not even a similar component match can be determined. This should be considered a serious situation, or at least one that needs to be investigated. It could be a case of a component that has been recompiled or modified so that a match to our database is no longer possible.

However, there is also a chance that the component is something malicious introduced into the application. Either way, an unknown component is one that Sonatype CLM has no information for. If your investigation does identify the component, you can claim it via the Claim Component section of the CIP, which we walk through in more detail later in this section. An example is displayed in Figure 4.29, “Unknown Component”.
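The relationship between the three match types can be summarized in a small Java model. The class `MatchClassifier`, the `Candidate` type, and the numeric score are hypothetical: Sonatype CLM's similar-matching algorithms are proprietary, so this sketch simply assumes a scored candidate list is already available and shows how Exact, Similar, and Unknown relate.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical model of the Exact / Similar / Unknown match types.
// The scores stand in for CLM's proprietary similarity algorithms.
class MatchClassifier {

    enum MatchType { EXACT, SIMILAR, UNKNOWN }

    // A candidate component and how closely it resembles the scanned one.
    static final class Candidate {
        final String coordinates;
        final double score;

        Candidate(String coordinates, double score) {
            this.coordinates = coordinates;
            this.score = score;
        }
    }

    static MatchType classify(Optional<String> exactMatch, List<Candidate> similar) {
        if (exactMatch.isPresent()) {
            return MatchType.EXACT;  // one-to-one hash match
        }
        // No exact hash match: any candidate makes it Similar, none makes it Unknown.
        return similar.isEmpty() ? MatchType.UNKNOWN : MatchType.SIMILAR;
    }

    // The most likely candidate supplies the displayed data; the others
    // correspond to the entries listed in the CIP's Similar section.
    static Optional<Candidate> mostLikely(List<Candidate> similar) {
        return similar.stream().max(Comparator.comparingDouble((Candidate c) -> c.score));
    }

    public static void main(String[] args) {
        List<Candidate> candidates = List.of(
            new Candidate("org.example:lib:1.2", 0.91),
            new Candidate("org.example:lib:1.3", 0.87));
        System.out.println(classify(Optional.empty(), candidates));
        System.out.println(mostLikely(candidates).map(c -> c.coordinates).orElse("none"));
    }
}
```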

[Note]

Unknown components will not be displayed in the License tab until they have been claimed.

In addition to the main filters above, you can also control whether all violations for each component are displayed. By default, the summary of violations is shown: only the worst violation for a component is displayed, and the component appears only once in the list. Choosing All or Waived shows all violations (including those waived) or only the waived violations, respectively.

[Note]

Changing the Violations filter can result in the components being displayed in the component list more than once.
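The behavior of the Violations filter can be sketched as follows. This is a hypothetical Java model, not the report's implementation: the `Violation` type, the integer threat level (higher meaning more severe), and the choice to ignore waived violations in the summary view are assumptions, the last one inferred from the description of All as including waived violations.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of the Summary / All / Waived violation views.
class ViolationsFilter {

    enum Mode { SUMMARY, ALL, WAIVED }

    static final class Violation {
        final String component;
        final int threatLevel;  // assumption: a higher value is more severe
        final boolean waived;

        Violation(String component, int threatLevel, boolean waived) {
            this.component = component;
            this.threatLevel = threatLevel;
            this.waived = waived;
        }
    }

    static List<Violation> filter(List<Violation> violations, Mode mode) {
        if (mode == Mode.ALL) {
            // Every violation, waived ones included; a component may repeat.
            return new ArrayList<>(violations);
        }
        if (mode == Mode.WAIVED) {
            return violations.stream().filter(v -> v.waived).collect(Collectors.toList());
        }
        // SUMMARY: keep only the worst unwaived violation per component,
        // so each component appears at most once (insertion order kept).
        Map<String, Violation> worst = new LinkedHashMap<>();
        for (Violation v : violations) {
            if (!v.waived) {
                worst.merge(v.component, v,
                    (current, next) -> next.threatLevel > current.threatLevel ? next : current);
            }
        }
        return new ArrayList<>(worst.values());
    }

    public static void main(String[] args) {
        List<Violation> violations = List.of(
            new Violation("lib-a", 5, false),
            new Violation("lib-a", 9, false),
            new Violation("lib-a", 10, true),
            new Violation("lib-b", 3, false));
        for (Mode mode : Mode.values()) {
            System.out.println(mode + ": " + filter(violations, mode).size() + " rows");
        }
    }
}
```

Only the All and Waived views can list lib-a more than once here, which is the situation the note above warns about.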

4.5.2. Managing Proprietary Components

As with matched components, Proprietary is one of the options in the Filter on the Policy tab of the application composition report. Unfortunately, there is often some confusion around identifying a proprietary component, so let's start with what a proprietary component is.

Simply put, proprietary components are those components that are unique to your organization. In many cases these are actually developed by your organization and distributed among the applications you develop.


Figure 4.31. Proprietary Component


In most cases, components unique to your organization simply display as Unknown. In reality, however, they are well known to your team and unknown only to Sonatype CLM.

To address this, you can set up the Sonatype CLM server to automatically identify proprietary components when an application is scanned. This will then place them into the Proprietary filter.

You still need to claim the components, but this helps you distinguish truly unknown components from those that simply aren't known to Sonatype CLM. To set up proprietary identification:

  1. Make sure you are logged into the Sonatype CLM Server with admin-level permissions (member of the Global Role, Admin).
  2. Click the System Preferences icon, and then click the Proprietary Components option.
  3. There are two methods for telling Sonatype CLM how to identify proprietary components:

    1. The first option is to enter a group parameter (or component) that should be considered proprietary. For example, if you enter com.sonatype, everything found under the path com/sonatype is marked as a proprietary component and is therefore not evaluated.

      [Note]

      This method follows a traditional Ant glob pattern.

    2. The second option is to enter a regular expression. If you choose this option, make sure to select the Regular expression check box. For more information on regular expressions, see Oracle's Java documentation. For example, the regular expression test\.zip matches a file named test.zip in the top-level directory, which would then be excluded from the evaluation.

      [Note]

      An occurrence inside an archive makes the containing binary proprietary as well. For example, if a proprietary zip file is found inside a jar, the jar is also considered proprietary.

  4. After entering your proprietary component identifier, click the Add button. This queues the new identifier for saving. To remove an entry, click its remove icon (a minus symbol) in the list. No changes are persisted to the server until you click the Save button.
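To illustrate the first option in step 3, the following Java sketch translates a proprietary group such as com.sonatype into the Ant-style glob com/sonatype/** and tests archive paths against it. The class and method names are hypothetical, and the hand-rolled translation (supporting only *, ** and ?) is an assumption about Ant glob semantics rather than the CLM server's code.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: match archive-relative paths against an Ant-style
// glob derived from a proprietary group. Only *, ** and ? are supported.
class ProprietaryGlob {

    // Translates an Ant-style glob into an equivalent Java regular expression.
    static Pattern antGlobToRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < glob.length(); i++) {
            char c = glob.charAt(i);
            if (c == '*' && i + 1 < glob.length() && glob.charAt(i + 1) == '*') {
                regex.append(".*");     // "**" crosses directory boundaries
                i++;                    // skip the second '*'
            } else if (c == '*') {
                regex.append("[^/]*");  // "*" stays within one path segment
            } else if (c == '?') {
                regex.append("[^/]");   // "?" matches exactly one character
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // Assumed mapping: "com.sonatype" becomes "com/sonatype/**".
    static String groupToGlob(String group) {
        return group.replace('.', '/') + "/**";
    }

    static boolean isProprietary(String group, String path) {
        return antGlobToRegex(groupToGlob(group)).matcher(path).matches();
    }

    public static void main(String[] args) {
        System.out.println(isProprietary("com.sonatype", "com/sonatype/clm/Scanner.class"));
        System.out.println(isProprietary("com.sonatype", "org/example/Util.class"));
    }
}
```

Requiring the trailing slash in com/sonatype/ keeps a sibling package such as com/sonatypex from being caught by accident.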
[Tip]

When using regular expressions, use the format .*{some_identifying_text}\.zip to search the entire directory being evaluated for proprietary components. For example, .*data\.zip matches a data.zip file at any depth, whereas data\.zip matches only a data.zip file in the top-level directory.
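The difference described in the tip can be checked with `java.util.regex.Pattern`. This sketch assumes that an expression must match an entry's entire path relative to the evaluated directory, which is what `Pattern.matches` does; the class name and the sample paths are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, assuming a proprietary regex must match the whole
// path of an entry relative to the directory being evaluated.
class ProprietaryRegex {

    static boolean matchesProprietary(String regex, String relativePath) {
        // Pattern.matches requires the entire input to match the expression.
        return Pattern.matches(regex, relativePath);
    }

    public static void main(String[] args) {
        String[] paths = {"data.zip", "lib/data.zip", "lib/olddata.zip", "data.zip.bak"};
        for (String path : paths) {
            System.out.printf("%-16s data\\.zip=%b  .*data\\.zip=%b%n",
                path,
                matchesProprietary("data\\.zip", path),
                matchesProprietary(".*data\\.zip", path));
        }
    }
}
```

Note that .*data\.zip also matches names such as olddata.zip, so make the identifying text specific enough to avoid catching unrelated files.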


Figure 4.32. Proprietary Packages Configuration via the Sonatype CLM Server


Once your proprietary components are configured, Sonatype CLM looks at each component and the directory structure of the application being evaluated. If a component matches your proprietary component configuration, it is identified as proprietary and displayed accordingly in the reports.
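The nested-archive rule from the earlier note (a proprietary zip inside a jar makes the jar proprietary) can be modeled as a recursive walk. In this hypothetical Java sketch, the `Entry` tree, the regular-expression test, and the decision to start the walk at each top-level component rather than at the application itself are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: an entry is proprietary if its own path matches,
// or if any entry nested inside it is proprietary.
class NestedArchives {

    static final class Entry {
        final String path;
        final List<Entry> children = new ArrayList<>();
        boolean proprietary;

        Entry(String path, Entry... children) {
            this.path = path;
            this.children.addAll(List.of(children));
        }
    }

    // Marks matching entries, then propagates the mark up to their containers.
    static boolean mark(Entry entry, Pattern proprietaryPattern) {
        boolean result = proprietaryPattern.matcher(entry.path).matches();
        for (Entry child : entry.children) {
            // Visit every child (no short-circuit) so all nested entries get marked.
            if (mark(child, proprietaryPattern)) {
                result = true;
            }
        }
        entry.proprietary = result;
        return result;
    }

    public static void main(String[] args) {
        Entry zip = new Entry("internal/data.zip");
        Entry jar = new Entry("lib/app.jar", zip);
        Entry other = new Entry("lib/commons.jar");
        Pattern pattern = Pattern.compile(".*data\\.zip");
        // Walk each top-level component of the application separately.
        mark(jar, pattern);
        mark(other, pattern);
        System.out.println(jar.proprietary + " " + other.proprietary);
    }
}
```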

[Note]

Changes to the proprietary component configuration are not applied retroactively to existing reports.

4.5.3. Claiming a Component

When a component is matched as Similar or Unknown, yet you are certain your organization recognizes it, Sonatype CLM lets you prevent that component from being identified as Similar or Unknown in future reports. In other words, Sonatype CLM allows you to claim the component as your own.

Once claimed, the component is known to the CLM server. It is no longer treated as Similar or Unknown, and instead results in an Exact match.
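One way to picture this behavior is a server-wide registry of claims keyed by component fingerprint. The following Java sketch is hypothetical (the class, its methods, and the fingerprint strings are assumptions), but it shows why a claim turns a Similar or Unknown result into an Exact one, and why a single claim can apply to every application scanned by the same server.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: a shared store of claims keyed by fingerprint.
// Because it is shared, a claim also applies to scans of other applications.
class ClaimRegistry {

    enum MatchType { EXACT, SIMILAR, UNKNOWN }

    private final Map<String, String> claimsByHash = new HashMap<>();

    void claim(String hash, String coordinates) {
        claimsByHash.put(hash, coordinates);  // also serves as an update
    }

    void revoke(String hash) {
        claimsByHash.remove(hash);
    }

    // A claimed fingerprint resolves as EXACT, whatever the database said.
    MatchType resolve(String hash, MatchType databaseResult) {
        return claimsByHash.containsKey(hash) ? MatchType.EXACT : databaseResult;
    }

    Optional<String> claimedCoordinates(String hash) {
        return Optional.ofNullable(claimsByHash.get(hash));
    }

    public static void main(String[] args) {
        ClaimRegistry registry = new ClaimRegistry();
        String hash = "hypothetical-fingerprint";
        System.out.println(registry.resolve(hash, MatchType.UNKNOWN));
        registry.claim(hash, "com.example:internal-lib:2.1");
        System.out.println(registry.resolve(hash, MatchType.UNKNOWN));
    }
}
```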


Figure 4.33. Claim a Component


  1. Access an application composition report.
  2. Click the Policy tab, and then click the Unknown or Similar component filter.
  3. Click the row of the component you wish to claim in the list. The Component Information Panel (CIP) is displayed.
  4. Click the Claim Component section of the CIP.
  5. Enter values for the Group ID, Artifact ID, and Version (GAV), all of which are mandatory.
  6. Optionally, enter the Maven coordinates classifier and extension (equivalent to type or packaging), the Created Date, and/or a Comment. The Created Date is initialized with the date of the youngest entry in the component to be claimed.
  7. Click the Claim button to officially stake your claim for the component.
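The rules in the steps above (mandatory GAV fields, an optional classifier, extension, and comment, and a Created Date that defaults to the youngest entry date) can be captured in a small Java model. The class `ComponentClaim` and its methods are hypothetical and do not reflect the CLM server's API.

```java
import java.time.LocalDate;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the Claim Component form.
class ComponentClaim {
    final String groupId;
    final String artifactId;
    final String version;
    final String classifier;    // optional, may be null
    final String extension;     // optional, may be null
    final LocalDate createdDate;
    final String comment;       // optional, may be null

    ComponentClaim(String groupId, String artifactId, String version,
                   String classifier, String extension,
                   LocalDate createdDate, String comment) {
        this.groupId = requireValue("Group ID", groupId);
        this.artifactId = requireValue("Artifact ID", artifactId);
        this.version = requireValue("Version", version);
        this.classifier = classifier;
        this.extension = extension;
        this.createdDate = createdDate;
        this.comment = comment;
    }

    // Rejects missing or blank values for the mandatory GAV fields.
    private static String requireValue(String field, String value) {
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException(field + " is mandatory");
        }
        return value;
    }

    // Default Created Date: the youngest (most recent) entry date.
    // Throws NoSuchElementException if the list is empty.
    static LocalDate youngestEntryDate(List<LocalDate> entryDates) {
        return Collections.max(entryDates);
    }

    String gav() {
        return groupId + ":" + artifactId + ":" + version;
    }

    public static void main(String[] args) {
        List<LocalDate> entryDates = List.of(
            LocalDate.of(2013, 1, 5), LocalDate.of(2014, 3, 1), LocalDate.of(2012, 12, 31));
        ComponentClaim claim = new ComponentClaim("com.example", "internal-lib", "2.1",
            null, "jar", youngestEntryDate(entryDates), "Built by the platform team");
        System.out.println(claim.gav() + " created " + claim.createdDate);
    }
}
```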

When you review the existing report, as well as future reports, an indicator now shows that information about the component has been edited. Hovering over it displays a tooltip identifying the component as claimed.

We refer to this indicator as the edited component tick mark (a small red triangle). It appears on all future scans for this application, as well as for any other application with a valid Application ID on the CLM Server.


Figure 4.34. Claimed Component Indicator


In addition, the Component Info section for the claimed component now has two new fields: Identification Source, which is set to Manual, and Identification Comment, which includes any comment that was entered. Any policy violations are still displayed, but the component graph is not.

Finally, if you have made a mistake and want to revoke or edit the claim, click the Claim Component tab, and then use the Revoke or Update button, respectively.


Figure 4.35. Update or Revoke Claimed Component Indicator


[Tip]

Use the Cancel button to undo any changes you made but haven't saved.

4.5.4. Summary

Component identification may seem like a slightly tedious task at first, but its rewards pay off in the long term. Once you identify (claim) a component, it becomes known to the Sonatype CLM server, so any other application that includes that component will no longer treat it as unknown either. This section also covered a few other topics; here are the highlights:

  • Understanding how components are matched.
  • Claiming an unknown component.
  • Setting up and identifying proprietary components.