Maven: The Complete Reference

8.5. Controlling the Contents of an Assembly

In theory, id and format are the only absolute requirements for a valid assembly descriptor; however, many assembly archivers will fail if they do not have at least one file to include in the output archive. The task of defining the files to be included in the assembly is handled by the five main sections of the assembly descriptor: files, fileSets, dependencySets, repositories, and moduleSets. To explore these sections most effectively, we’ll start by discussing the most elemental section: files. Then, we’ll move on to the two most commonly used sections, fileSets and dependencySets. Once you understand the workings of fileSets and dependencySets, it’s easier to understand repositories and moduleSets.

8.5.1. Files Section

The files section is the simplest part of the assembly descriptor, it is designed for files that have a definite location relative to your project’s directory. Using this section, you have absolute control over the exact set of files that are included in your assembly, exactly what they are named, and where they will reside in the archive.

Including a JAR file in an Assembly using files

<assembly>
    ...
    <files>
        <file>
            <source>target/my-app-1.0.jar</source>
            <outputDirectory>lib</outputDirectory>
            <destName>my-app.jar</destName>
            <fileMode>0644</fileMode>
        </file>
    </files>
    ...
</assembly>

Assuming you were building a project called my-app with a version of 1.0, Including a JAR file in an Assembly using files would include your project’s JAR in the assembly’s lib/ directory, trimming the version from the file name in the process so the final file name is simply my-app.jar. It would then make the JAR readable by everyone and writable by the user that owns it (this is what the mode 0644 means for files, using Unix four-digit Octal permission notation). For more information about the format of the value in fileMode, please see the Wikipedia’s explanation of four-digit Octal notation.

You could build a very complex assembly using file entries, if you knew the full list of files to be included. Even if you didn’t know the full list before the build started, you could probably use a custom Maven plugin to discover that list and generate the assembly descriptor using references like the one above. While the files section gives you fine-grained control over the permission, location, and name of each file in the assembly archive, listing a file element for every file in a large archive would be a tedious exercise. For the most part, you will be operating on groups of files and dependencies using fileSets. The remaining four file-inclusion sections are designed to help you include entire sets of files that match a particular criteria.

8.5.2. FileSets Section

Similar to the files section, fileSets are intended for files that have a definite location relative to your project’s directory structure. However, unlike the files section, fileSets describe sets of files, defined by file and path patterns they match (or don’t match), and the general directory structure in which they are located. The simplest fileSet just specifies the directory where the files are located:

<assembly>
    ...
    <fileSets>
        <fileSet>
            <directory>src/main/java</directory>
        </fileSet>
    </fileSets>
    ...
</assembly>

This file set simply includes the contents of the src/main/java directory from our project. It takes advantage of many default settings in the section, so let’s discuss those briefly.

First, you’ll notice that we haven’t told the file set where within the assembly matching files should be located. By default, the destination directory (specified with outputDirectory) is the same as the source directory (in our case, src/main/java). Additionally, we haven’t specified any inclusion or exclusion file patterns. When these are empty, the file set assumes that all files within the source directory are included, with some important exceptions. The exceptions to this rule pertain mainly to source-control metadata files and directories, and are controlled by the useDefaultExcludes flag, which is defaulted to true. When active, useDefaultExcludes will keep directories like .svn/ and CVS/ from being added to the assembly archive. Section 8.5.3, “Default Exclusion Patterns for” provides a detailed list of the default exclusion patterns.

If we want more control over this file set, we can specify it more explicitly. Including Files with fileSet shows a fileSet element with all of the default elements specified.

Including Files with fileSet

<assembly>
    ...
    <fileSets>
        <fileSet>
            <directory>src/main/java</directory>
            <outputDirectory>src/main/java</outputDirectory>
            <includes>
                <include>**</include>
            </includes>
            <useDefaultExcludes>true</useDefaultExcludes>
            <fileMode>0644</fileMode>
            <directoryMode>0755</directoryMode>
        </fileSet>
    </fileSets>
    ...
</assembly>

The includes section uses a list of include elements, which contain path patterns. These patterns may contain wildcards such as ‘**’ which matches one or more directories or ‘*’ which matches part of a file name, and ‘?’ which matches a single character in a file name. Including Files with fileSet uses a fileMode entry to specify that files in this set should be readable by all, but only writable by the owner. Since the fileSet includes directories, we also have the option of specifying a directoryMode that works in much the same way as the fileMode. Since a directories’ execute permission is what allows users to list their contents, we want to make sure directories are executable in addition to being readable. Like files, only the owner can write to directories in this set.

The fileSet entry offers some other options as well. First, it allows for an excludes section with a form identical to the includes section. These exclusion patterns allow you to exclude specific file patterns from a fileSet. Include patterns take precedence over exclude patterns. Additionally, you can set the filtering flag to true if you want to substitute property values for expressions within the included files. Expressions can be delimited either by ${ and } (standard Maven expressions like ${project.groupId}) or by @ and @ (standard Ant expressions like @project.groupId@). You can adjust the line ending of your files using the lineEnding element; valid values for lineEnding are:

keep
Preserve line endings from original files. (This is the default value.)
unix
Unix-style line endings
lf
Only a Line Feed Character
dos
MS-DOS-style line endings
crlf
Carriage-return followed by a Line Feed

Finally, if you want to ensure that all file-matching patterns are used, you can use the useStrictFiltering element with a value of true (the default is false). This can be especially useful if unused patterns may signal missing files in an intermediary output directory. When useStrictFiltering is set to true, the Assembly plugin will fail if an include pattern is not satisfied. In other words, if you have an include pattern which includes a file from a build, and that file is not present, setting useStrictFiltering to true will cause a failure if Maven cannot find the file to be included.

8.5.3. Default Exclusion Patterns for

When you use the default exclusion patterns, the Maven Assembly plugin is going to be ignoring more than just SVN and CVS information. By default the exclusion patterns are defined by the DirectoryScanner class in the plexus-utils project hosted at Codehaus. The array of exclude patterns is defined as a static, final String array named DEFAULTEXCLUDES in DirectoryScanner. The contents of this variable are shown in Definition of Default Exclusion Patterns from Plexus Utils.

Definition of Default Exclusion Patterns from Plexus Utils. 

public static final String[] DEFAULTEXCLUDES = {
// Miscellaneous typical temporary files
"**/*~",
"**/#*#",
"**/.#*",
"**/%*%",
"**/._*",

// CVS
"**/CVS",
"**/CVS/**",
"**/.cvsignore",

// SCCS
"**/SCCS",
"**/SCCS/**",

// Visual SourceSafe
"**/vssver.scc",

// Subversion
"**/.svn",
"**/.svn/**",

// Arch
"**/.arch-ids",
"**/.arch-ids/**",

//Bazaar
"**/.bzr",
"**/.bzr/**",

//SurroundSCM
"**/.MySCMServerInfo",

// Mac
"**/.DS_Store"
};

This default array of patterns excludes temporary files from editors like GNU Emacs, and other common temporary files from Macs and a few common source control systems (although Visual SourceSafe is more of a curse than a source control system). If you need to override these default exclusion patterns you set useDefaultExcludes to false and then define a set of exclusion patterns in your own assembly descriptor.

8.5.4. dependencySets Section

One of the most common requirements for assemblies is the inclusion of a project’s dependencies in an assembly archive. Where files and fileSets deal with files in your project, dependency files don’t have a location in your project. The artifacts your project depends on have to be resolved by Maven during the build. Dependency artifacts are abstract, they lack a definite location, and are resolved using a symbolic set of Maven coordinates. Since file and fileSet specifications require a concrete source path, dependencies are included or excluded from an assembly using a combination of Maven coordinates and dependency scopes.

The simplest dependencySet is an empty element:

<assembly>
    ...
    <dependencySets>
        <dependencySet/>
    </dependencySets>
    ...
</assembly>

The dependencySet above will match all runtime dependencies of your project (runtime scope includes the compile scope implicitly), and it will add these dependencies to the root directory of your assembly archive. It will also copy the current project’s main artifact into the root of the assembly archive, if it exists.

Note

Wait? I thought dependencySet was about including my project’s dependencies, not my project’s main archive? This counterintuitive side-effect was a widely-used bug in the 2.1 version of the Assembly plugin, and, because Maven puts an emphasis on backward compatibility, this counterintuitive and incorrect behavior needed to be preserved between a 2.1 and 2.2 release. You can control this behavior by changing the useProjectArtifact flag to false.

While the default dependency set can be quite useful with no configuration whatsoever, this section of the assembly descriptor also supports a wide array of configuration options, allowing your to tailor its behavior to your specific requirements. For example, the first thing you might do to the dependency set above is exclude the current project artifact, by setting the useProjectArtifact flag to false (again, its default value is true for legacy reasons). This will allow you to manage the current project’s build output separately from its dependency files. Alternatively, you might choose to unpack the dependency artifacts using by setting the unpack flag to true (this is false by default). When unpack is set to true, the Assembly plugin will combine the unpacked contents of all matching dependencies inside the archive’s root directory.

From this point, there are several things you might choose to do with this dependency set. The next sections discuss how to define the output location for dependency sets and how include and exclude dependencies by scope. Finally, we’ll expand on the unpacking functionality of the dependency set by exploring some advanced options for unpacking dependencies.

Customizing Dependency Output Location

There are two configuration options that are used in concert to define the location for a dependency file within the assembly archive: outputDirectory and outputFileNameMapping. You may want to customize the location of dependencies in your assembly using properties of the dependency artifacts themselves. Let’s say you want to put all the dependencies in directories that match the dependency artifact’s groupId. In this case, you would use the outputDirectory element of the dependencySet, and you would supply something like:

<assembly>
    ...
    <dependencySets>
        <dependencySet>
            <outputDirectory>${artifact.groupId}</outputDirectory>
        </dependencySet>
    </dependencySets>
    ...
</assembly>

This would have the effect of placing every single dependency in a subdirectory that matched the name of each dependency artifact’s groupId.

If you wanted to perform a further customization and remove the version numbers from all dependencies. You could customize the output file name for each dependency using the outputFileNameMapping element as follows:

<assembly>
    ...
    <dependencySets>
        <dependencySet>
            <outputDirectory>${artifact.groupId}</outputDirectory>
            <outputFileNameMapping>
                ${artifact.artifactId}.${artifact.extension}
            </outputFileNameMapping>
        </dependencySet>
    </dependencySets>
    ...
</assembly>

In the previous example, a dependency on commons:commons-codec version 1.3, would end up in the file commons/commons-codec.jar.

Interpolation of Properties in Dependency Output

As mentioned in the Assembly Interpolation section above, neither of these elements are interpolated with the rest of the assembly descriptor, because their raw values have to be interpreted using additional, artifact-specific expression resolvers.

The artifact expressions available for these two elements vary only slightly. In both cases, all of the ${project.*}, ${pom.**}, and ${*} expressions that are available in the POM and the rest of the assembly descriptor are also available here. For the outputFileNameMapping element, the following process is applied to resolve expressions:

  1. If the expression matches the pattern ${artifact.\*}:

    1. Match against the dependency’s Artifact instance (resolves: groupId, artifactId, version, baseVersion, scope, classifier, and file.*)
    2. Match against the dependency’s ArtifactHandler instance (resolves: expression)
    3. Match against the project instance associated with the dependency’s Artifact (resolves: mainly POM properties)
  2. If the expression matches the patterns ${pom.*} or ${project.\*}:

    1. Match against the project instance (MavenProject) of the current build.
  3. If the expression matches the pattern $ and the Artifact instance contains a non-null classifier, resolve to the classifier preceded by a dash (-classifier). Otherwise, resolve to an empty string.

    1. Attempt to resolve the expression against the project instance of the current build.
    2. Attempt to resolve the expression against the POM properties of the current build.
    3. Attempt to resolve the expression against the available system properties.
    4. Attempt to resolve the expression against the available operating-system environment variables.

The outputDirectory value is interpolated in much the same way, with the difference being that there is no available ${artifact.*} information, only the ${project.\*} instance for the particular artifact. Therefore, the expressions listed above associated with those classes (1a, 1b, and 3 in the process listing above) are unavailable.

How do you know when to use outputDirectory and outputFileNameMapping? When dependencies are unpacked only the outputDirectory is used to calculate the output location. When dependencies are managed as whole files (not unpacked), both outputDirectory and outputFileNameMapping can be used together. When used together, the result is the equivalent of:

<archive-root-dir>/<outputDirectory>/<outputFileNameMapping>

When outputDirectory is missing, it is not used. When outputFileNameMapping is missing, its default value is: ${artifact.artifactId}-${artifact.version}-$.${artifact.extension}

Including and Excluding Dependencies by Scope

In Section 3.4, “Project Dependencies”, it was noted that all project dependencies have one scope or another. Scope determines when in the build process that dependency normally would be used. For instance, test-scoped dependencies are not included in the classpath during compilation of the main project sources; but they are included in the classpath when compiling unit test sources. This is because your project’s main source code should not contain any code specific to testing, since testing is not a function of the project (it’s a function of the project’s build process). Similarly, provided-scoped dependencies are assumed to be present in the environment of any eventual deployment. However, if a project depends on a particular provided dependency, it is likely to require that dependency in order to compile. Therefore, provided-scoped dependencies are present in the compilation classpath, but not in the dependency set that should be bundled with the project’s artifact or assembly.

Also from Section 3.4, “Project Dependencies”, recall that some dependency scopes imply others. For instance, the runtime dependency scope implies the compile scope, since all compile-time dependencies (except for those in the provided scope) will be required for the code to execute. There are a number of complex relationships between the various dependency scopes which control how the scope of a direct dependency affects the scope of a transitive dependency. In a Maven Assembly descriptor, we can use scopes to apply different settings to different sets of dependencies accordingly.

For instance, if we plan to bundle a web application with Jetty to create a completely self-contained application, we’ll need to include all provided-scope dependencies somewhere in the jetty directory structure we’re including. This ensures those provided dependencies actually are present in the runtime environment. Non-provided, runtime dependencies will still land in the WEB-INF/lib directory, so these two dependency sets must be processed separately. These dependency sets might look similar to the following XML.

Defining Dependency Sets Using Scope. 

<assembly>
    ...
    <dependencySets>
        <dependencySet>
            <scope>provided</scope>
            <outputDirectory>lib/${project.artifactId}</outputDirectory>
        </dependencySet>
        <dependencySet>
            <scope>runtime</scope>
            <outputDirectory>
                webapps/${webContextName}/WEB-INF/lib
            </outputDirectory>
        </dependencySet>
    </dependencySets>
    ...
</assembly>

Provided-scoped dependencies are added to the lib/ directory in the assembly root, which is assumed to be a libraries directory that will be included in the Jetty global runtime classpath. We’re using a subdirectory named for the project’s artifactId in order to make it easier to track the origin of a particular library. Runtime dependencies are included in the WEB-INF/lib path of the web application, which is located within a subdirectory of the standard Jetty webapps/ directory that is named using a custom POM property called webContextName. What we’ve done in the previous example is separate application-specific dependencies from dependencies which will be present in a Servlet contains global classpath.

However, simply separating according to scope may not be enough, particularly in the case of a web application. It’s conceivable that one or more runtime dependencies will actually be bundles of standardized, non-compiled resources for use in the web application. For example, consider a set of web application which reuse a common set of Javascript, CSS, SWF, and image resources. To make these resources easy to standardize, it’s a common practice to bundle them up in an archive and deploy them to the Maven repository. At that point, they can be referenced as standard Maven dependencies - possibly with a dependency type of zip - that are normally specified with a runtime scope. Remember, these are resources, not binary dependencies of the application code itself; therefore, it’s not appropriate to blindly include them in the WEB-INF/lib directory. Instead, these resource archives should be separated from binary runtime dependencies, and unpacked into the web application document root somewhere. In order to achieve this kind of separation, we’ll need to use inclusion and exclusion patterns that apply to the coordinates of a specific dependency.

In other words, say you have three or four web application which reuse the same resources and you want to create an assembly that puts provided dependencies into lib/, runtime dependencies into webapps/<contextName>/WEB-INF/lib, and then unpacks a specific runtime dependency into your web application’s document root. You can do this because the Assembly allows you to define multiple include and exclude patterns for a given dependencySet element. Read the next section for more development of this idea.

Fine Tuning: Dependency Includes and Excludes

A resource dependency might be as simple as a set of resources (CSS, Javascript, and Images) in a project that has an assembly which creates a ZIP archive. Depending on the particulars of our web application, we might be able to distinguish resource dependencies from binary dependencies solely according to type. Most web applications are going to depend on other dependencies of type jar, and it is possible that we can state with certainty that all dependencies of type zip are resource dependencies. Or, we might have a situation where resources are stored in jar format, but have a classifier of something like resources. In either case, we can specify an inclusion pattern to target these resource dependencies and apply different logic than that used for binary dependencies. We’ll specify these tuning patterns using the includes and excludes sections of the dependencySet.

Both includes and excludes are list sections, meaning they accept the sub-elements include and exclude respectively. Each include or exclude element contains a string value, which can contain wildcards. Each string value can match dependencies in a few different ways. Generally speaking, three identity pattern formats are supported:

groupId:artifactId - version-less key
You would use this pattern to match a dependency by only the groupId and the artifactId.
groupId:artifactId:type[:classifier] - conflict id
The pattern allows you to specify a wider set of coordinates to create a more specific include/exclude pattern.
groupId:artifactId:type[:classifier]:version - full artifact identity
If you need to get really specific, you can specify all the coordinates.

All of these pattern formats support the wildcard character ‘*’, which can match any subsection of the identity and is not limited to matching single identity parts (sections between ‘:’ characters). Also, note that the classifier section above is optional, in that patterns matching dependencies that don’t have classifiers do not need to account for the classifier section in the pattern.

In the example given above, where the key distinction is the artifact type zip, and none of the dependencies have classifiers, the following pattern would match resource dependencies assuming that they were of type zip:

*:zip

The pattern above makes use of the second dependency identity: the dependency’s conflict id. Now that we have a pattern that distinguishes resource dependencies from binary dependencies, we can modify our dependency sets to handle resource archives differently:

Using Dependency Excludes and Includes in dependencySets

<assembly>
    ...
    <dependencySets>
        <dependencySet>
            <scope>provided</scope>
            <outputDirectory>lib/${project.artifactId}</outputDirectory>
        </dependencySet>
        <dependencySet>
            <scope>runtime</scope>
            <outputDirectory>
                webapps/${webContextName}/WEB-INF/lib
            </outputDirectory>
            <excludes>
                <exclude>*:zip</exclude>
            </excludes>
        </dependencySet>
        <dependencySet>
            <scope>runtime</scope>
            <outputDirectory>
                webapps/${webContextName}/resources
            </outputDirectory>
            <includes>
                <include>*:zip</include>
            </includes>
            <unpack>true</unpack>
        </dependencySet>
    </dependencySets>
    ...
</assembly>

In Using Dependency Excludes and Includes in dependencySets, the runtime-scoped dependency set from our last example has been updated to exclude resource dependencies. Only binary dependencies (non-zip dependencies) should be added to the WEB-INF/lib directory of the web application. Resource dependencies now have their own dependency set, which is configured to include these dependencies in the resources directory of the web application. The includes section in the last dependencySet reverses the exclusion from the previous dependencySet, so that resource dependencies are included using the same identity pattern (i.e. *:zip). The last dependencySet refers to the shared resource dependency and it is configured to unpack the shared resource dependency in the document root of the web application.

Using Dependency Excludes and Includes in dependencySets was based upon the assumption that our shared resources project dependency had a type which differed from all of the other dependencies. What if the share resource dependency had the same type as all of the other dependencies? How could you differentiate the dependency? In this case if the shared resource dependency had been bundled as a JAR with the classifier resources, you would match that dependency with the following identity pattern:

*:jar:resources

Instead of matching on artifacts with a type of zip and no classifier, we’re matching on artifacts with a classifier of resources and a type of jar.

Just like the fileSets section, dependencySets support the useStrictFiltering flag. When enabled, any specified patterns that don’t match one or more dependencies will cause the assembly - and consequently, the build - to fail. This can be particularly useful as a safety valve, to make sure your project dependencies and assembly descriptors are synchronized and interacting as you expect them to. By default, this flag is set to false for the purposes of backward compatibility.

Transitive Dependencies, Project Attachments, and Project

The dependencySet section supports two more general mechanisms for tuning the subset of matching artifacts: transitive selection options, and options for working with project artifacts. Both of these features are a product of the need to support legacy configurations that applied a somewhat more liberal definition of the word “dependency”. As a prime example, consider the project’s own main artifact. Typically, this would not be considered a dependency; yet older versions of the Assembly plugin included the project artifact in calculations of dependency sets. To provide backward compatibility with this “feature”, the 2.2 releases (currently at 2.2-beta-2) of the Assembly plugin support a flag in the dependencySet called useProjectArtifact, whose default value is true. By default, dependency sets will attempt to include the project artifact itself in calculations about which dependency artifacts match and which don’t. If you’d rather deal with the project artifact separately, set this flag to false.

Tip

The authors of this book recommend that you always set useProjectArtifact to false.

As a natural extension to the inclusion of the project artifact, the project’s attached artifacts can also be managed within a dependencySet using the useProjectAttachments flag (whose default value is false). Enabling this flag allows patterns that specify classifiers and types to match on artifacts that are “attached” to the main project artifact; that is, they share the same basic groupId/artifactId/version identity, but differ in type and classifier from the main artifact. This could be useful for including JavaDoc or source jars in an assembly.

Aside from dealing with the project’s own artifacts, it’s also possible to fine-tune the dependency set using two transitive-resolution flags. The first, called useTransitiveDependencies (and set to true by default) simply specifies whether the dependency set should consider transitive dependencies at all when determining the matching artifact set to be included. As an example of how this could be used, consider what happens when your POM has a dependency on another assembly. That assembly (most likely) will have a classifier that separates it from the main project artifact, making it an attachment. However, one quirk of the Maven dependency-resolution process is that the transitive-dependency information for the main artifact is still used when resolving the assembly artifact. If the assembly bundles its project dependencies inside itself, using transitive dependency resolution here would effectively duplicate those dependencies. To avoid this, we simply set useTransitiveDependencies to false for the dependency set that handles that assembly dependency.

The other transitive-resolution flag is far more subtle. It’s called useTransitiveFiltering, and has a default value of false. To understand what this flag does, we first need to understand what information is available for any given artifact during the resolution process. When an artifact is a dependency of a dependency (that is, removed at least one level from your own POM), it has what Maven calls a "dependency trail", which is maintained as a list of strings that correspond to the full artifact identities (groupId:artifactId:type:[classifier:]version) of all dependencies between your POM and the artifact that owns that dependency trail. If you remember the three types of artifact identities available for pattern matching in a dependency set, you’ll notice that the entries in the dependency trail - the full artifact identity - correspond to the third type. When useTransitiveFiltering is set to true, the entries in an artifact’s dependency trail can cause the artifact to be included or excluded in the same way its own identity can.

If you’re considering using transitive filtering, be careful! A given artifact can be included from multiple places in the transitive-dependency graph, but as of Maven 2.0.9, only the first inclusion’s trail will be tracked for this type of matching. This can lead to subtle problems when collecting the dependencies for your project.

Warning

Most assemblies don’t really need this level of control over dependency sets; consider carefully whether yours truly does. Hint: It probably doesn’t.

Advanced Unpacking Options

As we discussed previously, some project dependencies may need to be unpacked in order to create a working assembly archive. In the examples above, the decision to unpack or not was simple. It didn’t take into account what needed to be unpacked, or more importantly, what should not be unpacked. To gain more control over the dependency unpacking process, we can configure the unpackOptions element of the dependencySet. Using this section, we have the ability to choose which file patterns to include or exclude from the assembly, and whether included files should be filtered to resolve expressions using current POM information. In fact, the options available for unpacking dependency sets are fairly similar to those available for including files from the project directory structure, using the file sets descriptor section.

To continue our web-application example, suppose some of the resource dependencies have been bundled with a file that details their distribution license. In the case of our web application, we’ll handle third-party license notices by way of a NOTICES file included in our own bundle, so we don’t want to include the license file from the resource dependency. To exclude this file, we simply add it to the unpack options inside the dependency set that handles resource artifacts:

Excluding Files from a Dependency Unpack. 

<asembly>
    ...
    <dependencySets>
        <dependencySet>
            <scope>runtime</scope>
            <outputDirectory>
                webapps/${webContextName}/resources
            </outputDirectory>
            <includes>
                <include>*:zip</include>
            </includes>
            <unpack>true</unpack>
            <unpackOptions>
                <excludes>
                    <exclude>**/LICENSE*</exclude>
                </excludes>
            </unpackOptions>
        </dependencySet>
    </dependencySets>
    ...
</assembly>

Notice that the exclude we’re using looks very similar to those used in fileSet declarations. Here, we’re blocking any file starting with the word LICENSE in any directory within our resource artifacts. You can think of the unpack options section as a lightweight fileSet applied to each dependency matched within that dependency set. In other words, it is a fileSet by way of an unpacked dependency. Just as we specified an exclusion pattern for files within resource dependencies in order to block certain files, you can also choose which restricted set of files to include using the includes section. The same code that processes inclusions and exclusions on fileSets has been reused for processing unpackOptions.

In addition to file inclusion and exclusion, the unpack options on a dependency set also provides a filtering flag, whose default value is false. Again, this should be familiar from our discussion of file sets above. In both cases, expressions using either the Maven syntax supported. Filtering is a particularly nice feature to have for dependency sets, though, since it effectively allows you to create standardized, versioned resource templates that are then customized to each assembly as they are included. Once you start mastering the use of filtered, unpacked dependencies which store shared resources, you will be able to start abstracting repeated resources into common resource projects.

Summarizing Dependency Sets

Finally, it’s worth mentioning that dependency sets support the same fileMode and directoryMode configuration options that file sets do, though you should remember that the directoryMode setting will only be used when dependencies are unpacked.

8.5.5. moduleSets Sections

Multi-module builds are generally stitched together using the parent and modules sections of interrelated POMs. Typically, parent POMs specify their children in a modules section, which under normal circumstances causes the child POMs to be included in the build process of the parent. Exactly how this relationship is constructed can have important implications for the ways in which the Assembly plugin can participate in this process, but we’ll discuss that more later. For now, it’s enough to keep in mind this parent-module relationship as we discuss the moduleSets section.

Projects are stitched together into multi-module builds because they are part of a larger system. These projects are designed to be used together, and single module in a larger build has little practical value on its own. In this way, the structure of the project’s build is related to the way we expect the project (and its modules) to be used. If consider the project from the user’s perspective, it makes sense that the ideal end goal of that build would be a single, distributable file that the user can consume directly with minimum installation hassle. Since Maven multi-module builds typically follow a top-down structure, where dependency information, plugin configurations, and other information trickles down from parent to child, it seems natural that the task of rolling all of these modules into a single distribution file should fall to the topmost project. This is where the moduleSet comes into the picture.

Module sets allow the inclusion of resources that belong to each module in the project structure into the final assembly archive. Just like you can select a group of files to include in an assembly using a fileSet and a dependencySet, you can include a set of files and resources using a moduleSet to refer to modules in a multi-module build. They achieve this by enabling two basic types of module-specific inclusion: file-based, and artifact-based. Before we get into the specifics and differences between file-based and artifact-based inclusion of module resources into an assembly, let’s talk a little about selecting which modules to process.

Module Selection

By now, you should be familiar with includes/excludes patterns as they are used throughout the assembly descriptor to filter files and dependencies. When you are referring to modules in an assembly descriptor, you will also use the includes/excludes patterns to define rules which apply to different sets of modules. The difference in moduleSet includes and excludes is that these rules do not allow for wildcard patterns. (As of the 2.2-beta-2 release, this feature has not really seen much demand, so it hasn’t been implemented.) Instead, each include or exclude value is simply the groupId and artifactId for the module, separated by a colon, like this:

groupId:artifactId

In addition to includes and excludes, the moduleSet also supports an additional selection tool: the includeSubModules flag (whose default value is true). The parent-child relationship in any multi-module build structure is not strictly limited to two tiers of projects. In fact, you can include any number of tiers, or layers, in your build. Any project that is a module of a module of the current project is considered a sub-module. In some cases, you may want to deal with each individual module in the build separately (including sub-modules). For example, this is often simplest when dealing with artifact-based contributions from these modules. To do this, you would simply leave the useSubModules flag set to the default of true.

When you’re trying to include files from each module’s directory structure, you may wish to process that module’s directory structure only once. If your project directory structure mirrors that of the parent-module relationships that are included in the POMs, this approach would allow file patterns like **/src/main/java to apply not only to that direct module’s project directory, but also to the directories of its own modules as well. In this case you don’t want to process sub-modules directly (they will be processed as subdirectories within your own project’s modules instead), you should set the useSubModules flag to false.

Once we’ve determined how module selection should proceed for the module set in question, we’re ready to choose what to include from each module. As mentioned above, this can include files or artifacts from the module project.

Sources Section

Suppose you want to include the source of all modules in your project’s assembly, but you would like to exclude a particular module. Maybe you have a project named secret-sauce which contains secret and sensitive code that you don’t want to distribute with your project. The simplest way to accomplish this is to use a moduleSet which includes each project’s directory in ${module.basedir.name} and which excludes the secret-sauce module from the assembly.

Includes and Excluding Modules with a moduleSet

<assembly>
    ...
    <moduleSets>
        <moduleSet>
            <includeSubModules>false</includeSubModules>
            <excludes>
                <exclude>
                    com.mycompany.application:secret-sauce
                </exclude>
            </excludes>
            <sources>
                <outputDirectoryMapping>
                    ${module.basedir.name}
                </outputDirectoryMapping>
                <excludeSubModuleDirectories>
                    false
                </excludeSubModuleDirectories>
                <fileSets>
                    <fileSet>
                        <directory>/</directory>
                        <excludes>
                            <exclude>**/target</exclude>
                        </excludes>
                    </fileSet>
                </fileSets>
            </sources>
        </moduleSet>
    </moduleSets>
    ...
</assembly>

In Includes and Excluding Modules with a moduleSet, since we’re dealing with each module’s sources it’s simpler to deal only with direct modules of the current project, handling sub-modules using file-path wildcard patterns in the file set. We set the includeSubModules element to false so we don’t have to worry about submodules showing up in the root directory of the assembly archive. The exclude element will take care of excluding the secret-sauce module. We’re not going to include the project sources for the secret-sauce module; they’re, well, secret.

Normally, module sources are included in the assembly under a subdirectory named after the module’s artifactId. However, since Maven allows modules that are not in directories named after the module project’s artifactId, it’s often better to use the expression ${module.basedir.name} to preserve the module directory’s actual name (${module.basedir.name} is the same as calling MavenProject.getBasedir().getName()). It is critical to remember that modules are not required to be subdirectories of the project that declares them. If your project has a particularly strange directory structure, you may need to resort to special moduleSet declarations that include specific project and account for your own project’s idiosyncrasies.

Warning

Try to minimize your own project’s idiosyncrasies, while Maven is flexible, if you find yourself doing too much configuration there is likely an easier way.

Continuing through Includes and Excluding Modules with a moduleSet, since we’re not processing sub-modules explicitly in this module set, we need to make sure sub-module directories are not excluded from the source directories we consider for each direct module. By setting the excludeSubModuleDirectories flag to false, this allows us to apply the same file pattern to directory structures within a sub-module of the one we’re processing. Finally in Includes and Excluding Modules with a moduleSet, we’re not interested in any output of the build process for this module set. We exclude the target/ directory from all modules.

It’s also worth mentioning that the sources section supports fileSet-like elements directly within itself, in addition to supporting nested fileSets. These configuration elements are used to provide backward compatibility to previous versions of the Assembly plugin (versions 2.1 and under) that didn’t support multiple distinct file sets for the same module without creating a separate module set declaration. They are deprecated, and should not be used.

Interpolation of outputDirectoryMapping in

In the section called “Customizing Dependency Output Location”, we used the element outputDirectoryMapping to change the name of the directory under which each module’s sources would be included. The expressions contained in this element are resolved in exactly the same way as the outputFileNameMapping, used in dependency sets (see the explanation of this algorithm in Section 8.5.4, “dependencySets Section”).

In Includes and Excluding Modules with a moduleSet, we used the expression ${module.basedir.name}. You might notice that the root of that expression, module, is not listed in the mapping-resolution algorithm from the dependency sets section; this object root is specific to configurations within moduleSets. It works in exactly the same way as the ${artifact.*} references available in the outputFileNameMapping element, except it is applied to the module’s MavenProject, Artifact, and ArtifactHandler instances instead of those from a dependency artifact.

Binaries section

Just as the sources section is primarily concerned with including a module in its source form, the binaries section is primarily concerned with including the module’s build output, or its artifacts. Though this section functions primarily as a way of specifying dependencySets that apply to each module in the set, there are a few additional features unique to module artifacts that are worth exploring: attachmentClassifier and includeDependencies. In addition, the binaries section contains options similar to the dependencySet section, that relate to the handling of the module artifact itself. These are: unpack, outputFileNameMapping, outputDirectory, directoryMode, and fileMode. Finally, module binaries can contain a dependencySets section, to specify how each module’s dependencies should be included in the assembly archive. First, let’s take a look at how the options mentioned here can be used to manage the module’s own artifacts.

Suppose we want to include the javadoc jars for each of our modules inside our assembly. In this case, we don’t care about including the module dependencies; we just want the javadoc jar. However, since this particular jar is always going to be present as an attachment to the main project artifact, we need to specify which classifier to use to retrieve it. For simplicity, we won’t cover unpacking the module javadoc jars, since this configuration is exactly the same as what we used for dependency sets earlier in this chapter. The resulting module set might look similar to Including JavaDoc from Modules in an Assembly.

Including JavaDoc from Modules in an Assembly. 

<assembly>
    ...
    <moduleSets>
        <moduleSet>
            <binaries>
                <attachmentClassifier>javadoc</attachmentClassifier>
                <includeDependencies>false</includeDependencies>
                <outputDirectory>apidoc-jars</outputDirectory>
            </binaries>
        </moduleSet>
    </moduleSets>
    ...
</assembly>

In Including JavaDoc from Modules in an Assembly, we don’t explicitly set the includeSubModules flag, since it’s true by default. However, we definitely want to process all modules - even sub-modules - using this module set, since we’re not using any sort of file pattern that could match on sub-module directory structures within. The attachmentClassifier grabs the attached artifact with the javadoc classifier for each module processed. The includeDependencies element tells the Assembly plugin that we’re not interested in any of the module’s dependencies, just the javadoc attachment. Finally, the outputDirectory element tells the Assembly plugin to put all of the javadoc jars into a directory named apidoc-jars/ off of the assembly root directory.

Although we’re not doing anything too complicated in this example, it’s important to understand that the same changes to the expression-resolution algorithm discussed for the outputDirectoryMapping element of the sources section also applies here. That is, whatever was available as ${artifact.*} inside a dependencySet+’s +outputFileNameMapping configuration is also available here as ${module.*}. The same applies for outputFileNameMapping when used directly within a binaries section.

Finally, let’s examine an example where we simply want to process the module’s artifact and its runtime dependencies. In this case, we want to separate the artifact set for each module into separate directory structures, according to the module’s artifactId and version. The resulting module set is surprisingly simply, and it looks like the listing in Including Module Artifacts and Dependencies in an Assembly:

Including Module Artifacts and Dependencies in an Assembly. 

<assembly>
    ...
    <moduleSets>
        <moduleSet>
            <binaries>
                <outputDirectory>
                    ${module.artifactId}-${module.version}
                </outputDirectory>
                <dependencySets>
                    <dependencySet/>
                </dependencySets>
            </binaries>
        </moduleSet>
    </moduleSets>
    ...
</assembly>

In Including Module Artifacts and Dependencies in an Assembly, we’re using the empty dependencySet element here, since that should include all runtime dependencies by default, with no configuration. With the outputDirectory specified at the binaries level, all dependencies should be included alongside the module’s own artifact in the same directory, so we don’t even need to specify that in our dependency set.

For the most part, module binaries are fairly straightforward. In both parts - the main part, concerned with handling the module artifact itself, and the dependency sets, concerned with the module’s dependencies - the configuration options are very similar to those in a dependency set. Of course, the binaries section also provides options for controlling whether dependencies are included, and which main-project artifact you want to use.

Like the sources section, the binaries section contains a couple of configuration options that are provided solely for backward compatibility, and should be considered deprecated. These include the includes and excludes sub-sections.

moduleSets, Parent POMs

Finally, we close the discussion about module handling with a strong warning. There are subtle interactions between Maven’s internal design as it relates to parent-module relationships and the execution of a module-set’s binaries section. When a POM declares a parent, that parent must be resolved in some way or other before the POM in question can be built. If the parent is in the Maven repository, there is no problem. However, as of Maven 2.0.9 this can cause big problems if that parent is a higher-level POM in the same build, particularly if that parent POM expects to build an assembly using its modules’ binaries.

Maven 2.0.9 sorts projects in a multi-module build according to their dependencies, with a given project’s dependencies being built ahead of itself. The problem is the parent element is considered a dependency, which means the parent project’s build must complete before the child project is built. If part of that parent’s build process includes the creation of an assembly that uses module binaries, those binaries will not exist yet, and therefore cannot be included, causing the assembly to fail. This is a complex and subtle issue, which severely limits the usefulness of the module binaries section of the assembly descriptor. In fact, it has been filed in the bug tracker for the Assembly plugin at: http://jira.codehaus.org/browse/MASSEMBLY-97. Hopefully, future versions of Maven will find a way to restore this functionality, since the parent-first requirement may not be completely necessary.

8.5.6. Repositories Section

The repositories section represents a slightly more exotic feature in the assembly descriptor, since few applications other than Maven can take full advantage of a Maven-repository directory structure. For this reason, and because many of its features closely resemble those in the dependencySets section, we won’t spend too much time on the repositories section of the assembly descriptor. In most cases, users who understand dependency sets should have no trouble constructing repositories via the Assembly plugin. We’re not going to motivate the repositories section; we’re not going to go through a the business of setting up a use case and walking you through the process. We’re just going to bring up a few caveats for those of you who find the need to use the repositories section.

Having said that, there are a two features particular to the repositories section that deserve some mention. The first is the includeMetadata flag. When set to true it includes metadata such as the list of real versions that correspond to -SNAPSHOT virtual versions, and by default it’s set to false. At present, the only metadata included when this flag is true is the information downloaded from Maven’s central repository.

The second feature is called groupVersionAlignments. Again, this section is a list of individual groupVersionAlignment configurations, whose purpose is to normalize all included artifacts for a particular groupId to use a single version. Each alignment entry consists of two mandatory elements - id and version - along with an optional section called excludes that supplies a list of artifactId string values which are to be excluded from this realignment. Unfortunately, this realignment doesn’t seem to modify the POMs involved in the repository, neither those related to realigned artifacts nor those that depend on realigned artifacts, so it’s difficult to imagine what the practical application for this sort of realignment would be.

In general, it’s simplest to apply the same principles you would use in dependency sets to repositories when adding them to your assembly descriptor. While the repositories section does support the above extra options, they are mainly provided for backward compatibility, and will probably be deprecated in future releases.

8.5.7. Managing the Assembly’s Root Directory

Now that we’ve made it through the main body of the assembly descriptor, we can close the discussion of content-related descriptor sections with something lighter: root-directory naming and site-directory handling.

Some may consider it a stylistic concern, but it’s often important to have control over the name of the root directory for your assembly, or whether the root directory is there at all. Fortunately, two configuration options in the root of the assembly descriptor make managing the archive root directory simple: includeBaseDirectory and baseDirectory. In cases like executable jar files, you probably don’t want a root directory at all. To skip it, simply set the includeBaseDirectory flag to false (it’s true by default). This will result in an archive that, when unpacked, may create more than one directory in the unpack target directory. While this is considered bad form for archives that are meant to be unpacked before use, it’s not so bad for archives that are consumable as-is.

In other cases, you may want to guarantee the name of the archive root directory regardless of the POM’s version or other information. By default, the baseDirectory element has a value equal to ${project.artifactId}-${project.version}. However, we can easily set this element to any value that consists of literal strings and expressions which can be interpolated from the current POM, such as ${project.groupId}-${project.artifactId}. This could be very good news for your documentation team! (We all have those, right?)

Another configuration available is the includeSiteDirectory flag, whose default value is false. If your project build has also constructed a website document root using the site lifecycle or the Site plugin goals, that output can be included by setting this flag to true. However, this feature is a bit limited, since it only includes the outputDirectory from the reporting section of the current POM (by default, target/site) and doesn’t take into consideration any site directories that may be available in module projects. Use it if you want, but a good fileSet specification or moduleSet specification with sources configured could serve equally well, if not better. This is yet another example of legacy configuration currently supported by the Assembly plugin for the purpose of backward compatibility. Your mileage may vary. If you really want to include a site that is aggregated from many modules, you’ll want to consider using a fileSet or moduleSet instead of setting includeSiteDirectory to true.

8.5.8. componentDescriptors and

To round out our exploration of the assembly descriptor, we should touch briefly on two other sections: containerDescriptorHandlers and componentDescriptors. The containerDescriptorHandlers section refers to custom components that you use to extend the capabilities of the Assembly plugin. Specifically, these custom components allow you to define and handle special files which may need to be merged from the multiple constituents used to create your assembly. A good example of this might be a custom container-descriptor handler that merged web.xml files from constituent war or war-fragment files included in your assembly, in order to create the single web-application descriptor required for you to use the resulting assembly archive as a war file.

The componentDescriptors section allows you to reference external assembly-descriptor fragments and include them in the current descriptor. Component references can be any of the following:

  1. Relative file paths: src/main/assembly/component.xml
  2. Artifact references: groupId:artifactId:version[:type[:classifier]]
  3. Classpath resources: /assemblies/component.xml
  4. URLs: http://www.sonatype.com/component.xml

Incidentally, when resolving a component descriptor, the Assembly plugin tries those different strategies in that exact order. The first one to succeed is used.

Component descriptors can contain many of the same content-oriented sections available in the assembly descriptor itself, with the exception of moduleSets, which is considered so specific to each project that it’s not a good candidate for reuse. Also included in a component descriptor is the containerDescriptorHandlers section, which we briefly discussed above. Component descriptors cannot contain formats, assembly id’s, or any configuration related to the base directory of the assembly archive, all of which are also considered unique to a particular assembly descriptor. While it may make sense to allow sharing of the formats section, this has not been implemented as of the 2.2-beta-2 Assembly-plugin release.