Customizing the Microsoft .NET Framework Common Language Runtime

As described, writing a host using the CLR hosting APIs offers you the most control over how assemblies are loaded into an application domain. To demonstrate the range of customizations available, I write a host that runs applications encased in the cocoons described earlier. The host, runcocoon.exe, takes the name of the cocoon to run as input and uses the methods in the System.Reflection namespace to invoke the application's main entry point to start it running. As the application runs, you'll load its assemblies out of the .cocoon file instead of letting the CLR follow its default rules.

You'll also implement different versioning rules than the ones the CLR would normally enforce. Specifically, you always use the assemblies that are contained in the cocoon as you're running the application. If a different version of one of the assemblies is placed on disk somewhere, and version policy has been set that would normally cause that version to be used, you'll generally ignore it. This keeps the cocoon static and isolated: changes made to the system by other applications won't affect it. However, there is one scenario in which you would consider version policy. (This is why I said you would only generally ignore policy.) The CLR default version policy system comprises three levels as discussed in Chapter 7: application, publisher, and administrator. In this versioning scheme, you ignore application and publisher policy, but pay attention to administrator policy. A primary purpose of this policy level is to give an administrator a way to specify that a particular version of an assembly should not be used on the system because of a security vulnerability, a consistent crash, or some other fatal flaw. So if a banned assembly is contained in a cocoon you are trying to run, you'll refuse to load it. Instead, you'll print an error message and stop running the application. This new policy system is reasonable behavior and gives me a good chance to demonstrate how the hosting API can be used to implement a custom version policy scheme.

Recall from Chapter 2 that the COM interfaces in the hosting API are grouped into a set of managers. All of the interfaces in a given manager work together to provide a coherent set of functionality. One of these managers, the assembly loading manager, contains the two COM interfaces the host must implement to satisfy the requirements of the cocoon scenario: IHostAssemblyManager and IHostAssemblyStore. In addition to the implementations of these two interfaces, the runcocoon.exe host also contains an application domain manager, a host control object, and the main program logic that ties it all together (see Figure 8-4). In the next several sections, I describe each of these primary components of the host.

Figure 8-4. The architecture of the runcocoon.exe host

Implementing the IHostAssemblyManager Interface

IHostAssemblyManager is the primary interface in the assembly loading manager. That is, it is the interface the CLR asks for through the host control mechanism to determine whether you'd like to customize how the CLR loads assemblies. (Recall from Chapter 2 that the CLR calls the host's implementation of IHostControl::GetHostManager at startup once for every primary interface to determine which managers a host supports.) The methods on IHostAssemblyManager are described in Table 8-2.

Table 8-2. The Methods on IHostAssemblyManager

Method | Description

GetNonHostStoreAssemblies | Returns a list of assemblies that should be loaded by the CLR rather than by the host.

GetAssemblyStore | Returns the host's implementation of the IHostAssemblyStore interface. The CLR calls methods on IHostAssemblyStore to enable the host to load an assembly.

GetHostApplicationPolicy | In Chapter 6, I discuss how application-level policy can be specified for an application domain using the ConfigurationFile property on AppDomainSetup. GetHostApplicationPolicy provides an alternate way to specify application-level policy.

In addition to its role as the primary interface in the assembly loading manager, IHostAssemblyManager provides two key capabilities. First, it allows the host to specify the list of assemblies that should be loaded by the CLR instead of being redirected to the host. Second, IHostAssemblyManager allows the host to return its implementation of the other interface in this manager, IHostAssemblyStore. I discuss these capabilities in the next two sections.

Specifying Non-Host-Loaded Assemblies

When a host provides an implementation of the assembly loading manager, the CLR calls the host directly to load an assembly instead of going through its normal resolution and loading process. Although this is exactly what is needed for the assemblies you've stored in the cocoon, or for the assemblies users have stored in the database in the SQL Server scenario (for example), it's almost always the case that the host does not want to take over the responsibility for loading some assemblies, namely, those assemblies that are shipped by Microsoft as part of the .NET Framework platform. These include assemblies such as System, System.Windows.Forms, and System.Web. Although it's possible for a host to load these assemblies, doing so leads to complications downstream. To imagine the issues you can run into, consider what would happen if the cocoons you are running contained the .NET Framework assemblies in addition to the assemblies that make up the application. Although this would make the cocoons even more self-contained, it causes some additional implementation concerns that you, as the writer of the host, might not be willing to tackle. For example, recall from Chapter 3 that the CLR automatically enforces that the .NET Framework assemblies loaded into a process are the versions that were built and tested with the CLR that has been loaded. If a host were to load the .NET Framework itself, this benefit would be lost. In theory, you could figure out which assemblies to load based on a list of published version numbers, but even so, the host would just be guessing: only Microsoft, as the builder of the .NET Framework, knows the exact set of assemblies that are meant to work together.

The second complication involves handling servicing releases to the .NET Framework assemblies. Occasionally, Microsoft releases updates to the .NET Framework assemblies in the form of single bug fix releases, service packs, and so on. These updates are made directly to the .NET Framework assemblies stored in the global assembly cache. If a host were to package these assemblies in a custom format and load them from that format as I discussed doing with cocoons, the applications run by the host would not pick up these bug fix releases because the versions of those assemblies stored in the global assembly cache would not be used. Although you could argue this extra isolation is desired, there are clearly cases when the host would want the applications it runs to use the updated .NET Framework assemblies. The classic case when this behavior is desired is to pick up a bug fix that closes a security vulnerability, for example. If a host did want to be responsible for loading the .NET Framework assemblies, it could work around the servicing issue by loading the assemblies directly out of the global assembly cache itself. However, the process of doing so is not straightforward and, therefore, the benefits aren't likely worth the extra work. The .NET Framework SDK does include a set of APIs that enables you to enumerate the contents of the global assembly cache (see fusion.h in the Include directory of the SDK), but there are no APIs that enable you to load an assembly directly from the cache. It might be tempting to think that a managed API such as System.Reflection.Assembly.Load could be used to load the .NET Framework assembly, but because you've hooked the loading process by implementing an assembly loading manager, that call would just get redirected back to your host anyway!

Because of the complexities of dealing with service releases and of guaranteeing the .NET Framework assemblies match the CLR that is loaded, most hosts choose to load only those assemblies that their users have built as part of the applications they are hosting and let the CLR load the .NET Framework assemblies.

In the cocoon scenario, it's now clear that you'll load the assemblies built as part of the application out of the cocoon, but you'll let the CLR load the .NET Framework assemblies out of the global assembly cache. However, there is one more assembly I haven't considered yet: the CocoonHostRuntime assembly that contains the application domain manager. This assembly is neither a .NET Framework assembly nor written by the user as part of the application. Rather, it is part of the host. You must decide whether you should load it yourself or leave it up to the CLR. In this case, there is no clear-cut answer. You could include a copy of CocoonHostRuntime with each cocoon and load it yourself, or you could carry it along with the runcocoon.exe host and have the CLR load it. For this sample, I chose the latter. The CocoonHostRuntime assembly will be deployed to the same directory as runcocoon.exe and loaded by the CLR from there. Figure 8-5 gives a summary of the various assemblies involved in running a cocoon, including from where and by whom they are loaded.

Figure 8-5. Assembly loading in the cocoon scenario

The CLR determines which assemblies it should load as opposed to which assemblies it should ask the host to load by calling the host's implementation of IHostAssemblyManager::GetNonHostStoreAssemblies. As the host, you have two choices for specifying how the CLR should behave regarding assemblies you'd like it to load. First, you can provide an exact list of assemblies you'd like the CLR to load. In this scenario, the CLR will load all assemblies in the list you provide and will call your implementation of IHostAssemblyStore for all others. Your other option is to let the CLR try to load all assemblies first by looking in the global assembly cache. If an assembly is found in the cache, it is loaded and the host is never asked. On the other hand, if the assembly could not be found in the global assembly cache, the host is asked to resolve the reference. In this case, if the host doesn't successfully resolve the reference, the CLR continues to look for the assembly by probing in the ApplicationBase directory structure.

These options both have their pros and cons. The advantage of telling the CLR exactly which assemblies you'd like it to load is that you, as the host, always have complete control over the assemblies that you'd like to load yourself. For example, in the cocoon scenario, this means that the CLR will never load an assembly from the global assembly cache that is also contained in the cocoon file. This preserves the isolation in that you always know that the assemblies encased in the cocoon files are the ones that are loaded.

On the other hand, the disadvantage of providing an exact list of assemblies for the CLR to load is that this list can become stale as the deployment environment changes around you. For example, say you've asked the CLR to load version 2.0.5000 of System.XML. Later, a version policy statement is issued (either by the publisher or the administrator) that redirects all references to System.XML from version 2.0.5000 to 2.0.6000. The CLR will apply this version policy and look to see whether the resulting reference is to an assembly you've asked it to load. In this case, the resulting reference will not be in the list of assemblies you've asked the CLR to load (because the version is different), so the CLR will call your implementation of IHostAssemblyStore to load the assembly. In this particular case, you can work around this by not providing a version number when you tell the CLR to load System.XML. Doing so, however, results in looser binding semantics than you might want. Either way, you can see how the installation of new assemblies and the presence of version policy can invalidate the list you provide.

As discussed, the alternative is to let the CLR look for all assemblies in the global assembly cache before giving the host the opportunity to load the assembly. Although this approach gets around the fragility problems you might see when you're providing a full list, it can result in the CLR loading some assemblies you wish it wouldn't. For example, say that an application in a cocoon file uses a statistical package in an assembly named AcmeStatistics. AcmeStatistics has a strong name and is packaged in the cocoon file along with the application that uses it. Furthermore, assume that another, completely unrelated application installed the AcmeStatistics assembly in the global assembly cache. If the CLR is given the first chance to load all assemblies, it's possible that the copy of AcmeStatistics in the global assembly cache will be loaded instead of the copy contained in the cocoon file. If the AcmeStatistics assembly in the global assembly cache is exactly the same as the copy in the cocoon file, it really doesn't matter from where it is loaded. However, because you are allowing the CLR to load AcmeStatistics from a location other than the cocoon, it is possible that the assembly that is loaded differs from the one contained in the cocoon file. For example, it could be that the copy of AcmeStatistics in the global assembly cache is a service release that just happens to have the same version number as the one in the cocoon. It's also possible that a version policy statement is present that redirects all references to the version of AcmeStatistics contained in the cocoon to the version in the global assembly cache.
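The trade-off between the two options can be modeled in a few lines. The following is a minimal sketch, not CLR code: it tracks assemblies by friendly name only, ignores versions and policy, and assumes that a reference the host cannot satisfy in exact-list mode is simply unresolved. The names Environment, Loader, and ResolveLoader are hypothetical and are not part of the hosting API.

```cpp
#include <cassert>
#include <set>
#include <string>

// Who ends up loading a given assembly reference in this simplified model.
enum class Loader { Clr, Host, Unresolved };

// Hypothetical description of the deployment environment (friendly names only).
struct Environment {
    std::set<std::string> gac;        // assemblies in the global assembly cache
    std::set<std::string> hostStore;  // assemblies the host can supply (the cocoon)
    std::set<std::string> appBase;    // assemblies found by probing ApplicationBase
};

// nonHostList == nullptr models returning NULL from GetNonHostStoreAssemblies
// (the CLR looks in the global assembly cache first). A non-null list models
// the exact-list option.
Loader ResolveLoader(const Environment& env,
                     const std::set<std::string>* nonHostList,
                     const std::string& name)
{
    assert(!name.empty());

    if (nonHostList != nullptr) {
        // Exact-list mode: the CLR loads only what is on the list and
        // hands every other reference to the host.
        if (nonHostList->count(name) != 0) return Loader::Clr;
        return env.hostStore.count(name) != 0 ? Loader::Host : Loader::Unresolved;
    }

    // Cache-first mode: global assembly cache, then the host, then probing.
    if (env.gac.count(name) != 0) return Loader::Clr;
    if (env.hostStore.count(name) != 0) return Loader::Host;
    if (env.appBase.count(name) != 0) return Loader::Clr;
    return Loader::Unresolved;
}
```

With AcmeStatistics present in both the cache and the cocoon, the model returns Loader::Host in exact-list mode and Loader::Clr in cache-first mode, which is the isolation problem described above.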

GetNonHostStoreAssemblies returns a list of the assemblies you'd like the CLR to load. The list of assemblies is in the form of a pointer to an interface called ICLRAssemblyReferenceList, as you can see in the following definition from mscoree.idl:

    interface IHostAssemblyManager: IUnknown
    {
        HRESULT GetNonHostStoreAssemblies(
            [out] ICLRAssemblyReferenceList **ppReferenceList);

        // Other methods omitted.
    }

Telling the CLR to attempt to load all assemblies first is straightforward: you just return NULL for *ppReferenceList as shown in the following example:

    HRESULT STDMETHODCALLTYPE CCocoonAssemblyManager::GetNonHostStoreAssemblies(
        ICLRAssemblyReferenceList **ppReferenceList)
    {
        *ppReferenceList = NULL;
        return S_OK;
    }

If, on the other hand, you'd like to supply the CLR with an exact list of assemblies to load, you must obtain a pointer to an ICLRAssemblyReferenceList that describes your list. Obtaining such an interface is done using the GetCLRAssemblyReferenceList method from ICLRAssemblyIdentityManager discussed earlier. Here's the definition of GetCLRAssemblyReferenceList from mscoree.idl:

    interface ICLRAssemblyIdentityManager : IUnknown
    {
        HRESULT GetCLRAssemblyReferenceList(
            [in]  LPCWSTR *ppwzAssemblyReferences,
            [in]  DWORD dwNumOfReferences,
            [out] ICLRAssemblyReferenceList **ppReferenceList);

        // Other methods omitted.
    }

GetCLRAssemblyReferenceList takes an array of string-based identities describing the assemblies you'd like the CLR to load. These identities are in the standard string-based form used in Chapter 7; that is:

"<assemblyName>, Version=<version>, PublicKeyToken=<token>, culture=<culture>"

Given an array of assembly identities, along with a count of the number of items in the array, GetCLRAssemblyReferenceList returns an interface of type ICLRAssemblyReferenceList that you can pass to GetNonHostStoreAssemblies.

The string-based identities you pass to GetCLRAssemblyReferenceList can be either fully qualified (that is, they contain values for the public key token, version, and culture in addition to the required friendly name) or partial. The ability to specify partial identities in this case comes in handy, especially when referring to the .NET Framework assemblies. Recall from Chapter 3 that the CLR ensures that the .NET Framework assemblies that are loaded match the CLR that is running in the process. As a result, there's really no need to specify a version number when referring to an assembly that is part of the .NET Framework. In this case, it's much better to leave it to the CLR to determine which version to load.
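To make the difference between partial and fully qualified identities concrete, here is a minimal sketch of how such strings could be split and compared. It is an illustration only: the real parsing and matching is done by the CLR, and the names AssemblyIdentity, ParseIdentity, IsFullyQualified, and Matches are hypothetical. The sketch assumes that keywords and friendly names compare without regard to case and that a partial reference matches any identity that agrees on every element the reference specifies.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical in-memory form of a string-based assembly identity.
struct AssemblyIdentity {
    std::string name;                          // friendly name (required)
    std::map<std::string, std::string> attrs;  // lower-cased keyword -> lower-cased value
};

static std::string Trim(const std::string& s) {
    const std::size_t b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return std::string();
    const std::size_t e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

static std::string Lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Splits "Name, Keyword=value, ..." into its elements.
AssemblyIdentity ParseIdentity(const std::string& text) {
    assert(!text.empty());  // the friendly name is required
    AssemblyIdentity id;
    std::istringstream in(text);
    std::string part;
    bool first = true;
    while (std::getline(in, part, ',')) {
        part = Trim(part);
        if (first) { id.name = part; first = false; continue; }
        const std::size_t eq = part.find('=');
        if (eq == std::string::npos) continue;  // skip malformed elements
        id.attrs[Lower(Trim(part.substr(0, eq)))] = Lower(Trim(part.substr(eq + 1)));
    }
    return id;
}

// Fully qualified means version, public key token, and culture are all present.
bool IsFullyQualified(const AssemblyIdentity& id) {
    return id.attrs.count("version") != 0 &&
           id.attrs.count("publickeytoken") != 0 &&
           id.attrs.count("culture") != 0;
}

// A reference matches when the names agree and every element the reference
// specifies has the same value in the candidate.
bool Matches(const AssemblyIdentity& reference, const AssemblyIdentity& candidate) {
    if (Lower(reference.name) != Lower(candidate.name)) return false;
    for (const auto& kv : reference.attrs) {
        const auto it = candidate.attrs.find(kv.first);
        if (it == candidate.attrs.end() || it->second != kv.second) return false;
    }
    return true;
}
```

Under these assumptions, a reference such as "System, PublicKeyToken=b77a5c561934e089" matches any version of System signed with that key, which is why leaving the version out lets the CLR pick the one that belongs with the running runtime.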

Runcocoon.exe's implementation of GetNonHostStoreAssemblies tells the CLR to load the mscorlib, System, and CocoonHostRuntime assemblies as shown in the following code snippet. References to all other assemblies are redirected to the implementation of IHostAssemblyStore.

    // The names of the assemblies you'd like the CLR to load
    const wchar_t *wszNonHostAssemblies[] =
    {
        L"CocoonHostRuntime, PublicKeyToken=38c3b24e4a6ee45e",
        L"mscorlib, PublicKeyToken=b77a5c561934e089",
        L"System, PublicKeyToken=b77a5c561934e089",
    };

    // RunCocoon's implementation of GetNonHostStoreAssemblies
    HRESULT STDMETHODCALLTYPE CCocoonAssemblyManager::GetNonHostStoreAssemblies(
        ICLRAssemblyReferenceList **ppReferenceList)
    {
        // GetIdentityManager is a private method that uses GetRealProcAddress to
        // call GetCLRIdentityManager to get the ICLRAssemblyIdentityManager
        // interface.
        ICLRAssemblyIdentityManager *pIdentityManager = GetIdentityManager();

        DWORD dwCount = sizeof(wszNonHostAssemblies)/sizeof(wszNonHostAssemblies[0]);

        HRESULT hr = pIdentityManager->GetCLRAssemblyReferenceList(
            wszNonHostAssemblies,
            dwCount,
            ppReferenceList);
        assert(SUCCEEDED(hr));

        pIdentityManager->Release();

        // Return the actual result so a failure isn't masked in builds
        // where assert is compiled out.
        return hr;
    }

Another, less obvious use for GetNonHostStoreAssemblies is to enable a host to prevent a particular assembly from ever being loaded into a process. I discuss some of the motivation for this in Chapter 12, but for now, suffice it to say that some .NET Framework assemblies just don't make sense in certain hosting environments. For example, it probably doesn't make sense for the System.Windows.Forms assembly ever to be loaded in a server environment such as Microsoft ASP.NET or SQL Server. By not including such an assembly in the list returned from GetNonHostStoreAssemblies, and then refusing to load it yourself when your implementation of IHostAssemblyStore is called, you can prevent particular assemblies from ever being loaded.
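The blocking technique amounts to a check at the top of the host's ProvideAssembly logic. The sketch below models only that decision, under stated assumptions: it compiles without the Windows headers, so HResult, kOk, and kFileNotFound are stand-ins for HRESULT, S_OK, and HRESULT_FROM_WIN32(ERROR_FILE_NOT_FOUND), and the helper names FriendlyName and DecideProvideAssembly are hypothetical. It also compares friendly names exactly, which a real host might want to relax.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>

// Stand-ins for HRESULT values so the sketch compiles without Windows headers.
using HResult = std::uint32_t;
const HResult kOk = 0u;                     // S_OK
const HResult kFileNotFound = 0x80070002u;  // HRESULT_FROM_WIN32(ERROR_FILE_NOT_FOUND)

// Extracts the friendly name (the text before the first comma) from an identity string.
std::string FriendlyName(const std::string& identity) {
    assert(!identity.empty());
    const std::string name = identity.substr(0, identity.find(','));
    const std::size_t b = name.find_first_not_of(" \t");
    if (b == std::string::npos) return std::string();
    const std::size_t e = name.find_last_not_of(" \t");
    return name.substr(b, e - b + 1);
}

// Models the decision a host makes when asked for an assembly: refuse blocked
// assemblies even if the store could supply them, otherwise report whether
// the store contains the assembly.
HResult DecideProvideAssembly(const std::set<std::string>& blocked,
                              const std::set<std::string>& store,
                              const std::string& identity)
{
    const std::string name = FriendlyName(identity);
    if (blocked.count(name) != 0) return kFileNotFound;
    return store.count(name) != 0 ? kOk : kFileNotFound;
}
```

Because the blocked assembly is also left out of the list returned from GetNonHostStoreAssemblies, every reference to it reaches this check, and the "file not found" result keeps it out of the process.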

Returning an Assembly Store

GetAssemblyStore, the final method to discuss on IHostAssemblyManager, is used to return an implementation of the IHostAssemblyStore interface. IHostAssemblyStore is the interface you implement to load assemblies out of the cocoon file as described in the next section. All hosts that implement GetNonHostStoreAssemblies will likely want to implement GetAssemblyStore, too. After all, without an implementation of IHostAssemblyStore, there would be no way to load any assembly other than those returned from GetNonHostStoreAssemblies. GetAssemblyStore's only parameter is a pointer into which you'll return the implementation of IHostAssemblyStore as shown in the following method signature:

    interface IHostAssemblyManager: IUnknown
    {
        // Other methods omitted ...

        HRESULT GetAssemblyStore([out] IHostAssemblyStore **ppAssemblyStore);
    };

Runcocoon.exe's implementation of IHostAssemblyStore is contained in a class called CCocoonAssemblyStore. The implementation of GetAssemblyStore is straightforward: simply create a new instance of CCocoonAssemblyStore, cast it to a pointer to the IHostAssemblyStore interface, and return it through the out parameter as shown in the following code:

    HRESULT STDMETHODCALLTYPE CCocoonAssemblyManager::GetAssemblyStore(
        IHostAssemblyStore **ppAssemblyStore)
    {
        m_pAssemblyStore = new CCocoonAssemblyStore(m_pRootStorage);
        *ppAssemblyStore = (IHostAssemblyStore *)m_pAssemblyStore;
        return S_OK;
    }

Implementing the IHostAssemblyStore Interface

The IHostAssemblyStore interface contains methods that hosts can use to load assemblies from formats other than standard PE files stored in the file system. It is this interface that enables SQL Server to load assemblies directly out of the database and the runcocoon.exe host to load assemblies directly from OLE structured storage files. Instead of returning the assembly as a filename on disk, implementers of IHostAssemblyStore return assemblies in the form of a pointer to an IStream interface. This enables you to load assemblies from literally anywhere that you can store or construct a contiguous set of bytes. It is IHostAssemblyStore that also enables you to customize how the default CLR version policy system works.

As you've seen, the CLR determines whether a host wishes to implement a custom assembly store using a two-step process. First, it calls the host's implementation of IHostControl::GetHostManager, passing in the IID for IHostAssemblyManager. If the host returns an implementation, the CLR knows that the host implements the assembly loading manager. Next, the CLR calls IHostAssemblyManager::GetAssemblyStore to get a pointer to the IHostAssemblyStore interface representing the custom store.

IHostAssemblyStore contains two methods: one the CLR calls to resolve references to assemblies (ProvideAssembly), and another that is called to resolve references to individual files within an assembly (ProvideModule). This latter method is called only for assemblies that consist of more than one file.

Take a look at how to implement these two methods. As before, I describe them in the context of the runcocoon.exe host.

Resolving Assembly References

If a host provides an implementation of IHostAssemblyStore, the CLR will call the ProvideAssembly method to resolve all references to assemblies not contained in the non-host-store assemblies list. The input to ProvideAssembly is a structure called AssemblyBindInfo that contains not only the identity of the assembly to load, but also information about how the default CLR version policy system would affect the reference to that assembly. Assemblies resolved out of a custom store are returned to the CLR in the form of a pointer to an IStream interface. If you're not familiar with how IStream works, there's plenty of information on MSDN or in the platform SDK.

In addition to the AssemblyBindInfo structure and the stream through which the assembly is returned, ProvideAssembly also contains parameters that enable you to return debugging information, to associate any host-specific context data with a particular assembly bind, and to specify a unique identity for the assembly you are returning (more on this later). Here's the definition for ProvideAssembly from mscoree.idl:

    interface IHostAssemblyStore: IUnknown
    {
        HRESULT ProvideAssembly(
            [in]  AssemblyBindInfo *pBindInfo,
            [out] UINT64 *pAssemblyId,
            [out] UINT64 *pContext,
            [out] IStream **ppStmAssemblyImage,
            [out] IStream **ppStmPDB);

        // Other method definitions omitted...
    }

The AssemblyBindInfo Structure

The AssemblyBindInfo structure has four fields:

    typedef struct _AssemblyBindInfo
    {
        DWORD   dwAppDomainId;
        LPCWSTR lpReferencedIdentity;
        LPCWSTR lpPostPolicyIdentity;
        DWORD   ePolicyLevel;
    } AssemblyBindInfo;

The first field, dwAppDomainId, identifies the application domain into which the assembly will be loaded. This field isn't particularly useful in the runcocoon.exe host because there is only one application domain. To understand why this field is needed, consider what would happen if the host were capable of running multiple cocoons simultaneously in the same process. In this case, you'd probably choose to load each cocoon into its own application domain. Given the fact that there is only one implementation of IHostAssemblyStore per process, you'd have no way of identifying which cocoon file to load the requested assembly from without the dwAppDomainId field. The way you'd likely implement this is to keep a table that maps application domain IDs to .cocoon files. Then when ProvideAssembly was called, you'd use the dwAppDomainId you're passed to find the appropriate cocoon file in the table. The unique identifier for an application domain can be obtained using the Id property on System.AppDomain.
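The table described above could look something like the following. This is a minimal sketch of a host that runs several cocoons, which runcocoon.exe does not do; the class name CocoonTable and its methods are hypothetical. The sketch also leaves out synchronization, which a real host would likely need if the table can be reached from more than one thread.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical table that maps application domain IDs to .cocoon files.
class CocoonTable {
public:
    // Records which cocoon file backs a newly created application domain.
    void Register(std::uint32_t appDomainId, const std::wstring& cocoonPath) {
        m_paths[appDomainId] = cocoonPath;
    }

    // Forgets an application domain, for example after it is unloaded.
    void Unregister(std::uint32_t appDomainId) {
        m_paths.erase(appDomainId);
    }

    // Returns true and fills *path if the application domain is known.
    // A ProvideAssembly implementation would pass pBindInfo->dwAppDomainId here.
    bool Lookup(std::uint32_t appDomainId, std::wstring* path) const {
        assert(path != nullptr);
        const auto it = m_paths.find(appDomainId);
        if (it == m_paths.end()) return false;
        *path = it->second;
        return true;
    }

private:
    std::map<std::uint32_t, std::wstring> m_paths;
};
```

The managed part of the host would report each new application domain's Id value to the unmanaged part so that an entry can be registered before any assemblies are requested for that domain.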

The rest of the fields in the AssemblyBindInfo structure identify the assembly that the host needs to load. This information is contained in three fields: lpReferencedIdentity, ePolicyLevel, and lpPostPolicyIdentity. The first field, lpReferencedIdentity, contains the identity of the original assembly as referenced by its caller. The ePolicyLevel field indicates whether that original reference would be redirected by any version policy were the CLR to load that assembly. (The values for ePolicyLevel are defined by the EBindPolicyLevels enumeration discussed later in the chapter.) That is, the ePolicyLevel field tells you whether any version policy that would affect the reference is present on the system. Finally, the lpPostPolicyIdentity field contains the assembly that would be referenced if the policy identified in ePolicyLevel were actually applied. Look at the following example to see how the values of these fields work together. Consider the case in which code is running in one of the cocoons that loads an assembly like so:

Assembly a = Assembly.Load("Customers, Version=1.1.0.0, PublicKeyToken=865931ab473067d1, Culture=neutral");

Furthermore, say the administrator of the machine has used the .NET Configuration tool to specify some version policy for the Customers assembly. Specifically, the administrator has specified policy that redirects all references to version 1.1 of Customers to version 2.0. In XML, that policy would look like the following:

    <configuration>
      <runtime>
        <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
          <dependentAssembly>
            <assemblyIdentity name="Customers"
                              publicKeyToken="865931ab473067d1" />
            <bindingRedirect oldVersion="1.1.0.0"
                             newVersion="2.0.0.0" />
          </dependentAssembly>
        </assemblyBinding>
      </runtime>
    </configuration>

In this situation, the relevant fields in the AssemblyBindInfo structure would have the following values when ProvideAssembly is called:

    lpReferencedIdentity = "Customers, Version=1.1.0.0, PublicKeyToken=865931ab473067d1, Culture=neutral";
    lpPostPolicyIdentity = "Customers, version=2.0.0.0, culture=neutral, publickeytoken=865931ab473067d1, processorarchitecture=msil";
    ePolicyLevel = ePolicyLevelAdmin

Note

You might notice that the format of lpPostPolicyIdentity is slightly different from the format of lpReferencedIdentity. Specifically, the keywords version, publickeytoken, and culture have a different case, and a new element called processorarchitecture appears. lpPostPolicyIdentity looks a bit different because it is a binding identity, whereas lpReferencedIdentity is the literal string from the assembly reference (the call to Assembly.Load in the example).

Implementers of ProvideAssembly can use lpReferencedIdentity, lpPostPolicyIdentity, and ePolicyLevel for informational purposes only. That is, the CLR will enforce that the assembly you return from ProvideAssembly has a binding identity that matches lpPostPolicyIdentity; you are not free to return an assembly with any identity you want. In some ways this restriction is unfortunate because it limits the flexibility of what you can do with an assembly loading manager. Nevertheless, a host still has control over version policy because you control how policy is applied within your application domain.

Even though the implementation of ProvideAssembly cannot return an assembly the CLR doesn't expect, you can still implement some versioning rules quite easily. To demonstrate what you can do, let's implement some versioning rules to ensure that the assemblies stored in the cocoon file are the exact ones you load at run time. Specifically, you won't load a different version of an assembly based on the existence of version policy. You can avoid application-level policy easily because you control that for your application domain. As for publisher policy, there are a few ways to keep that from affecting you. As discussed in Chapter 6, the AppDomainSetup object has a property called DisallowPublisherPolicy you can set to cause all publisher policy statements to be ignored for code running in a particular application domain. Alternatively, you can specify the same setting using the <publisherPolicy> element of your application configuration file. This is the approach I've taken with runcocoon.exe. If you download the samples for this book, you'll find a file called runcocoon.exe.config in the same directory as runcocoon.exe. This file uses the <publisherPolicy> element to turn off publisher policy for the applications contained in cocoon files:

    <?xml version="1.0"?>
    <configuration>
      <runtime>
        <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
          <publisherPolicy apply="no" />
        </assemblyBinding>
      </runtime>
    </configuration>

Now that I've discussed how to prevent application and publisher policy from affecting your host, turn your attention to policy specified by the administrator. The primary use of administrator-specified version policy is to provide a mechanism that administrators can use to prevent a particular version of an assembly from being used. Administrators use this policy to prevent any application from using an assembly that has a known security vulnerability, causes a fatal crash, and so on. It's generally good practice to honor any policy set by an administrator. To that end, the runcocoon.exe host will not load any assembly that an administrator has explicitly disallowed through version policy. However, instead of loading the alternate version the administrator calls for, you can simply print an error message and discontinue execution. In a way, you're taking the middle ground here: you are honoring what the administrator says by not loading the referenced assembly, but you're not opening yourself up to the possibility of unintended behavior by executing an assembly that wasn't originally tested as part of the application contained in the cocoon.

Testing to see whether the administrator has issued a version policy statement for an assembly in the cocoon is easy. Simply check for the appropriate value in the AssemblyBindInfo structure and return a "file not found" HRESULT to tell the CLR you can't find the assembly (that is, you won't load it). At this point, execution of the cocoon would stop with an exception. The following snippet from runcocoon's implementation of ProvideAssembly shows how to do this:

    HRESULT STDMETHODCALLTYPE CCocoonAssemblyStore::ProvideAssembly(
        AssemblyBindInfo *pBindInfo,
        UINT64 *pAssemblyId,
        UINT64 *pContext,
        IStream **ppStmAssemblyImage,
        IStream **ppStmPDB)
    {
        // Check to see if administrator policy was applied. If so, print an error
        // to the command line and return "file not found." This will cause the
        // execution of the cocoon to stop with an exception.
        if (pBindInfo->ePolicyLevel == ePolicyLevelAdmin)
        {
            wprintf(L"Administrator Version Policy is present that redirects: %s to %s . Stopping execution\n",
                    pBindInfo->lpReferencedIdentity,
                    pBindInfo->lpPostPolicyIdentity);
            return HRESULT_FROM_WIN32(ERROR_FILE_NOT_FOUND);
        }

        // Rest of the function omitted for brevity...
    }

The EBindPolicyLevels Enumeration

Most of the values in the EBindPolicyLevels enumeration are self-explanatory because they map directly to the levels of the default version policy scheme, such as application, publisher, or administrator. A few, however, don't fit that pattern and require additional explanation. Here's the definition of the enumeration from mscoree.idl:

    typedef enum
    {
        ePolicyLevelNone         = 0x0,
        ePolicyLevelRetargetable = 0x1,
        ePolicyUnifiedToCLR      = 0x2,
        ePolicyLevelApp          = 0x4,
        ePolicyLevelPublisher    = 0x8,
        ePolicyLevelHost         = 0x10,
        ePolicyLevelAdmin        = 0x20
    } EBindPolicyLevels;

The first value that might not look familiar is ePolicyLevelRetargetable. Although it's not likely you'll ever see this value in your implementation of ProvideAssembly, it's worth spending a few minutes understanding what it could be used for. ePolicyLevelRetargetable is related to a feature in the CLR to support the different implementations of the CLR as described in the European Computer Manufacturers Association (ECMA) standard. Because the CLR is part of an international standard, anyone can produce an implementation of it on any platform. The ePolicyLevelRetargetable value shows up if an implementation of the CLR other than the version shipped as part of the full .NET Framework chose to reference a different assembly than the one the application originally referenced. This is useful, for example, in alternate implementations that have different names for the .NET Framework assemblies.

The second value of EBindPolicyLevels that doesn't fit the pattern of the familiar policy levels is ePolicyUnifiedToCLR. This value relates to the feature I discuss in Chapters 3 and 7 whereby a given CLR will load the matching versions of the .NET Framework assemblies. The term Unified comes from the sense that the CLR is unifying all references to the .NET Framework assemblies such that the set of assemblies that were shipped together are always used together. Two things would have to be true before you'd see this value passed to ProvideAssembly. First, your host would have to run an application that contains assemblies built with an older version of the CLR than the one running in the process (and hence would have references to the older .NET Framework assemblies). Second, your host would have to be responsible for loading some of the .NET Framework assemblies. In most scenarios, hosts don't load the .NET Framework assemblies as I concluded during the discussion of the IHostAssemblyManager::GetNonHostStoreAssemblies method earlier in the chapter.
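To make the values concrete, here is a minimal sketch of how a host might turn a policy level into a name for diagnostic output and apply runcocoon's rule of rejecting only administrator redirects. The enumeration is copied into a local sketch namespace so the example builds without mscoree.h. DescribePolicyLevel and ShouldRejectBind are hypothetical helper names, not part of the hosting API. The sketch compares with ==, as the ProvideAssembly excerpt does; any value it doesn't recognize is reported as "unknown".

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Local copy of EBindPolicyLevels so this example compiles without mscoree.h.
enum EBindPolicyLevels {
    ePolicyLevelNone         = 0x0,
    ePolicyLevelRetargetable = 0x1,
    ePolicyUnifiedToCLR      = 0x2,
    ePolicyLevelApp          = 0x4,
    ePolicyLevelPublisher    = 0x8,
    ePolicyLevelHost         = 0x10,
    ePolicyLevelAdmin        = 0x20
};

// Hypothetical helper: returns a readable name for diagnostic output.
inline std::wstring DescribePolicyLevel(EBindPolicyLevels level)
{
    switch (level) {
    case ePolicyLevelNone:         return L"none";
    case ePolicyLevelRetargetable: return L"retargetable";
    case ePolicyUnifiedToCLR:      return L"unified to CLR";
    case ePolicyLevelApp:          return L"application";
    case ePolicyLevelPublisher:    return L"publisher";
    case ePolicyLevelHost:         return L"host";
    case ePolicyLevelAdmin:        return L"administrator";
    }
    return L"unknown";
}

// Hypothetical helper: mirrors runcocoon's rule that only an
// administrator-level redirect stops the assembly from loading.
inline bool ShouldRejectBind(EBindPolicyLevels level)
{
    return level == ePolicyLevelAdmin;
}

} // namespace sketch
```

A host with a different versioning scheme would change only ShouldRejectBind, for example to honor publisher policy as well.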

Associating Host-Specific Data with an Assembly

The pContext parameter to ProvideAssembly enables you to communicate any host-specific data about an assembly from the unmanaged portion of your host to the managed portion. pContext is a pointer to a 64-bit unsigned integer in which you can store any host-specific data to associate with the assembly you return from ProvideAssembly. This data can be retrieved in managed code using the HostContext property on System.Reflection.Assembly.

The SQL Server host provides a good example of how host-specific data for an assembly can be used. When an administrator registers an assembly in the SQL Server catalog, she indicates which predefined set of security permissions that assembly should be granted when it is run. SQL Server records this information in its catalog along with the contents of the assembly. When an assembly is returned from ProvideAssembly, SQL Server reads the data describing the requested permission set from the catalog and returns it in *pContext. On the managed side, this information is obtained from the HostContext property on Assembly and is used as input into the security policy system to make sure the proper permission set is granted to the assembly. More details about how to associate permissions with an assembly are provided in Chapter 10.
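The idea of carrying a small piece of host data in the 64-bit context value can be sketched as follows. This is an illustration only: the PermissionSet values and the EncodeContext and DecodeContext helpers are hypothetical and do not describe how SQL Server actually encodes its data. The decoding step is written in C++ for symmetry; in a real host the managed side would apply the same mapping to the value it reads from the HostContext property.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Hypothetical permission-set identifiers; a real host defines its own.
// Zero is reserved for "no host data", matching a host that sets *pContext to 0.
enum class PermissionSet : std::uint64_t {
    Unknown        = 0,
    Safe           = 1,
    ExternalAccess = 2,
    Unrestricted   = 3
};

// Unmanaged side: produces the value a host would store in *pContext.
inline std::uint64_t EncodeContext(PermissionSet set)
{
    return static_cast<std::uint64_t>(set);
}

// Receiving side: maps the 64-bit value back to a permission set and
// treats anything unrecognized as Unknown rather than trusting it.
inline PermissionSet DecodeContext(std::uint64_t context)
{
    switch (context) {
    case 1:  return PermissionSet::Safe;
    case 2:  return PermissionSet::ExternalAccess;
    case 3:  return PermissionSet::Unrestricted;
    default: return PermissionSet::Unknown;
    }
}

} // namespace sketch
```

Because the value is opaque to the CLR, a host could just as well store a catalog row number or any other 64-bit key and look up the details on the managed side.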

Assigning Assembly Identity

Before you look at runcocoon's implementation of ProvideAssembly, I have one more topic to discuss: how assemblies are uniquely identified within the CLR data structures at run time. When the CLR loads an assembly from disk, it uses the fully qualified pathname of the file, in addition to the assembly's name, to uniquely identify the assembly. The pathname is used as part of the internal identity of an assembly partly to ensure application correctness, but also as a performance optimization. If the CLR is asked to load the same physical file from disk multiple times, it can reuse the memory and data structures it has already set up for the assembly instead of loading the assembly multiple times.

When hosts take over the assembly loading process and return pointers to streams from ProvideAssembly, there is nothing (at least that can be computed cheaply) to uniquely identify the bytes pointed to by that stream within the CLR. It's up to the host to associate a unique piece of data with each stream that serves the same purpose that the filename does for an assembly loaded from the file system. That is, it enables the CLR to uniquely identify the assembly internally so performance can be increased by preventing multiple loads of the same assembly. The ability for the host to provide this unique identity is the purpose of the pAssemblyId parameter to ProvideAssembly.

Upon return from ProvideAssembly, *pAssemblyId is intended to hold a 64-bit number that uniquely identifies the assembly. If multiple calls to ProvideAssembly result in the same number being returned in *pAssemblyId, the CLR assumes the assemblies are the same and reuses the bytes and data structures it already has instead of mapping the contents of the stream again. The CLR treats the unique number assigned by the host as an opaque identity: it never interprets the number in any way. Therefore, the semantics of this unique identifier are completely up to the host. The host can choose to use a value from an internal data structure, it can generate a unique value based on the assembly (such as a hash of the name), and so on.

The implementation of ProvideAssembly in runcocoon.exe uses a value from one of its internal data structures to uniquely identify assemblies to the CLR. Recall that each cocoon file has a stream named _index that contains a mapping of binding identities to the names of the streams containing the assemblies. When the CLR calls the implementation of ProvideAssembly, you would look through the index to find the name of the stream containing the assembly corresponding to the binding identity specified in pBindInfo->lpPostPolicyIdentity. When you find the appropriate index entry, remember its place in the index and use that as the assembly's unique identifier. Given that each binding identity is unique within the index, the position makes a perfect unique identifier for an assembly. As an example, consider the contents of the _index stream shown in Figure 8-6.

Figure 8-6. The _index stream for the HRTracker application

When the CLR calls ProvideAssembly with a binding identity of

Payroll, version=10.0.0.0, culture=neutral, publickeytoken=3d9829272b3b00b1, processorarchitecture=msil

look in the index and find the requested identity in entry number 1. Return the number 1 in *pAssemblyId. Subsequent requests for the same binding identity will cause you to find the same entry in the index and therefore to return a pointer to the same stream within the cocoon file.

Again, the value you return in *pAssemblyId can be any number that serves to uniquely identify a given assembly in your host. The assembly's position within the _index stream makes a perfect unique identifier in the cocoon scenario.
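The position-as-identifier idea can be sketched in a few lines. The following is not the CStreamIndex helper class from runcocoon, only a simplified stand-in: IndexEntry and FindAssembly are hypothetical names, the index is assumed to be already parsed into memory, positions are counted from zero, and identities are compared as exact strings (a real host might need to normalize them first).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

namespace sketch {

// One row of an in-memory copy of the _index stream (hypothetical layout).
struct IndexEntry {
    std::wstring bindingIdentity; // the post-policy binding identity
    std::wstring streamName;      // the stream in the cocoon holding the assembly
};

// Searches the index for the requested binding identity. On success, stores
// the entry's zero-based position in *pAssemblyId, stores the stream name in
// *pStreamName, and returns true. Returns false if the identity is not present.
inline bool FindAssembly(const std::vector<IndexEntry> &index,
                         const std::wstring &identity,
                         std::uint64_t *pAssemblyId,
                         std::wstring *pStreamName)
{
    assert(pAssemblyId && pStreamName);
    for (std::size_t i = 0; i < index.size(); ++i) {
        if (index[i].bindingIdentity == identity) {
            *pAssemblyId = static_cast<std::uint64_t>(i);
            *pStreamName = index[i].streamName;
            return true;
        }
    }
    return false;
}

} // namespace sketch
```

Because the same identity always resolves to the same position, repeated requests yield the same identifier, which is the property the CLR relies on to avoid mapping the same assembly twice.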

Loading Assemblies from a Cocoon

Now that I've covered the concepts needed to implement ProvideAssembly, look at the implementation in runcocoon.exe. The facts that assemblies are returned from ProvideAssembly in the form of pointers to streams and the cocoons are constructed of streams named after the assemblies they contain make the implementation pretty straightforward. After all, OLE structured storage files support streams directly, so there's no need for you to provide a custom implementation of IStream. All you need to do is use the structured storage APIs to open streams based on assembly name and return those streams directly from ProvideAssembly.

To recap, the implementation of ProvideAssembly in runcocoon.exe contains the following logic:

  1. Check to see whether administrator policy is set for the referenced assembly. If so, display an error message, set the appropriate error code, and return.

  2. Extract the binding identity of the requested assembly from the AssemblyBindInfo.lpPostPolicyIdentity field. Given this binding identity, look in the _index stream to find the name of the stream in the cocoon that contains the assembly corresponding to the requested binding identity. The implementation of ProvideAssembly does this using a helper class called CStreamIndex.

  3. Set a unique identifier for the assembly to the position of the requested assembly in the index.

  4. Open the stream that contains the assembly you're looking for using the IStorage::OpenStream structured storage API.

The implementation of ProvideAssembly from runcocoon.exe is shown in the following code. As described, ProvideAssembly uses some helper classes to get its job done. The source for these helper classes, along with the full source for runcocoon, can be found on this book's companion Web site.

HRESULT STDMETHODCALLTYPE CCocoonAssemblyStore::ProvideAssembly(
    AssemblyBindInfo *pBindInfo,
    UINT64 *pAssemblyId,
    UINT64 *pContext,
    IStream **ppStmAssemblyImage,
    IStream **ppStmPDB)
{
    assert(m_pCocoonStorage);

    wprintf(L"ProvideAssembly called for binding identity: %s\n",
            pBindInfo->lpPostPolicyIdentity);

    // Check to see if administrator policy was applied. If so, print an error
    // to the command line and return "file not found." This will cause the
    // execution of the cocoon to stop with an exception.
    if (pBindInfo->ePolicyLevel == ePolicyLevelAdmin)
    {
        wprintf(L"Administrator Version Policy is present that redirects: %s to %s . Stopping execution\n",
                pBindInfo->lpReferencedIdentity,
                pBindInfo->lpPostPolicyIdentity);
        return HRESULT_FROM_WIN32(ERROR_FILE_NOT_FOUND);
    }

    // The CStreamIndex class contains the contents of the _index stream.
    // Call this class to find and open the stream containing the
    // assembly described by AssemblyBindInfo.lpPostPolicyIdentity.
    HRESULT hr = m_pStreamIndex->GetStreamForBindingIdentity(
        pBindInfo->lpPostPolicyIdentity,
        pAssemblyId,
        ppStmAssemblyImage);

    // Don't use pContext for any host-specific data - set it to 0.
    // Also, don't return a stream containing debugging information
    // for this assembly.
    *pContext = 0;
    *ppStmPDB = NULL;

    return hr;
}

Resolving Module References

The ProvideModule method on IHostAssemblyStore exists to support assemblies that consist of multiple files. Before I get into the details of how ProvideModule works, some clarification of the terms assembly and module would be useful. Strictly speaking, an assembly is a collection of types and resources that act as a consistent unit in terms of versioning, deployment, security, and type visibility, among other characteristics. Nothing in the formal definition of an assembly says anything about how its contents are physically packaged. That is, the definition of an assembly does not dictate that all of an assembly's contents must be contained in a single file. In practice, though, this is almost always the case. The primary reason for this is that most development tools present assemblies as single physical files. Nevertheless, some tools can build assemblies consisting of multiple files. For example, both the C# and Visual Basic .NET compilers support a command-line option called /addmodule that enables you to construct an assembly from multiple stand-alone files. In addition, you can use the SDK tool al.exe to build multi-file assemblies.

When an assembly consists of multiple files, one of those files contains an assembly manifest. A manifest is metadata that describes various aspects of the assembly, including its name and the files that make up the assembly. For example, consider the case of an assembly called Statistics that consists of three files: a file called statistics.dll, which contains the manifest, and two other code files called curves.dll and probability.dll. A high-level view of the contents of each of the files in this assembly is shown in Figure 8-7.

Figure 8-7. The contents of the files in the Statistics assembly

For purposes of the discussion of the IHostAssemblyStore interface, the file containing the assembly manifest (statistics.dll in the example) is called the assembly, whereas the other files in the assembly are called modules. When the Statistics assembly is initially referenced in code, the CLR calls the ProvideAssembly method to get the stream that contains the contents of statistics.dll. Then, if code contained in either curves.dll or probability.dll is referenced, the CLR calls ProvideModule to get the contents of those files.

Now that you understand when ProvideModule would be used, look at how to implement it. Many of the concepts I cover in the discussion of ProvideAssembly apply to ProvideModule as well. For example, the contents of modules are returned as pointers to IStream interfaces just as they are for assemblies. Also, the concept of assigning a unique identity to the stream that is returned applies here as well. So, many of the parameters to ProvideModule should look familiar. Here's its definition from mscoree.idl:

interface IHostAssemblyStore: IUnknown
{
    // Other method definitions omitted...

    HRESULT ProvideModule(
        [in]  ModuleBindInfo *pBindInfo,
        [out] DWORD          *pdwModuleId,
        [out] IStream        **ppStmModuleImage,
        [out] IStream        **ppStmPDB);
}

As you can probably guess, the pdwModuleId parameter is used to assign a unique identity to the stream, the ppStmModuleImage parameter is used to return the IStream pointer representing the module, and ppStmPDB is an IStream pointer to the debugging information. The parameter that's new is the ModuleBindInfo parameter. This parameter serves the same logical purpose as the AssemblyBindInfo parameter to ProvideAssembly: it identifies the module to be loaded. The ModuleBindInfo structure has three fields as shown in the following definition:

typedef struct _ModuleBindInfo
{
    DWORD   dwAppDomainId;
    LPCWSTR lpAssemblyIdentity;
    LPCWSTR lpModuleName;
} ModuleBindInfo;

The first field, dwAppDomainId, identifies the application domain into which the module will be loaded. This field serves the same purpose as the dwAppDomainId field in the AssemblyBindInfo structure. Because modules are always contained as part of an assembly, the implementer of ProvideModule must know which assembly contains the module being requested. The lpAssemblyIdentity field provides this information in the form of the string name of the containing assembly. The final field, lpModuleName, is the name of the module to load.
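For a host that does choose to support modules, the two string fields together name the file to return. The following is a minimal sketch of one way to assign stable 32-bit module identifiers keyed on that pair. ModuleIndex and its methods are hypothetical names introduced for illustration, and the identity strings a real host sees would be full binding identities rather than the short names used in the usage note.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

namespace sketch {

// A module is identified by its containing assembly plus its own name,
// mirroring lpAssemblyIdentity and lpModuleName in ModuleBindInfo.
using ModuleKey = std::pair<std::wstring, std::wstring>;

class ModuleIndex {
public:
    // Registers a module and returns its identifier. Registering the same
    // module twice returns the identifier that was assigned the first time.
    std::uint32_t Add(const std::wstring &assemblyIdentity,
                      const std::wstring &moduleName)
    {
        ModuleKey key(assemblyIdentity, moduleName);
        auto it = m_ids.find(key);
        if (it != m_ids.end())
            return it->second;
        std::uint32_t id = m_nextId++;
        m_ids.emplace(key, id);
        return id;
    }

    // Looks up a module. Returns false when the host has no such module,
    // in which case a real ProvideModule would report "file not found".
    bool Find(const std::wstring &assemblyIdentity,
              const std::wstring &moduleName,
              std::uint32_t *pdwModuleId) const
    {
        assert(pdwModuleId);
        auto it = m_ids.find(ModuleKey(assemblyIdentity, moduleName));
        if (it == m_ids.end())
            return false;
        *pdwModuleId = it->second;
        return true;
    }

private:
    std::map<ModuleKey, std::uint32_t> m_ids;
    std::uint32_t m_nextId = 1;
};

} // namespace sketch
```

With the Statistics example, a host would register curves.dll and probability.dll under the Statistics assembly and return the stored identifier in *pdwModuleId each time the CLR asks for one of them.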

Not all CLR hosts support multi-file assemblies. As the creator of a new host, it's up to you to decide how important multi-file assemblies are to your scenario. As I said earlier, the tools support for creating multi-file assemblies isn't great, so in practice you don't see many of these assemblies. If you look at the popular hosts that exist today, you'll see a mixed bag of support: the ASP.NET, Microsoft Internet Explorer, and Default Host all support multi-file assemblies, but the SQL Server host doesn't. For purposes of this example, I've chosen not to support multi-file assemblies in runcocoon.exe. Opting not to support this scenario is easy from a coding perspective. All you need to do is return the HRESULT corresponding to ERROR_FILE_NOT_FOUND from IHostAssemblyStore::ProvideModule as shown in the following example:

HRESULT STDMETHODCALLTYPE CCocoonAssemblyStore::ProvideModule(
    ModuleBindInfo *pBindInfo,
    DWORD *pdwModuleId,
    IStream **ppStmModuleImage,
    IStream **ppStmPDB)
{
    return HRESULT_FROM_WIN32(ERROR_FILE_NOT_FOUND);
}

Bringing It All Together

The bulk of the implementation of the runcocoon.exe host is contained in the assembly loading manager I've been discussing in the last several sections. Let me now take that implementation and show what else is needed to make a fully functional runcocoon.exe. You need to take the following steps to complete the host:

  1. Open the .cocoon file passed as a command-line argument to runcocoon.exe.

  2. Initialize the CLR using CorBindToRuntimeEx.

  3. Create the objects that contain the implementation of the assembly loading manager and notify the CLR of their existence using a host control object.

  4. Use the application domain manager from the CocoonHostRuntime assembly to invoke the application contained in the cocoon. As the application runs, assemblies will be referenced and the implementation of IHostAssemblyStore will be called to load them from the cocoon.

The next several sections describe each step in detail.

Opening the Cocoon File

Because cocoons are OLE structured storage files, you can use the StgOpenStorage API from the platform SDK to open them. StgOpenStorage returns an IStorage pointer that you'll save and use later to open the streams corresponding to the application's assemblies. The following code from the main routine of runcocoon.exe uses StgOpenStorage to open the cocoon:

int wmain(int argc, wchar_t* argv[])
{
    HRESULT hr = S_OK;

    // Make sure a cocoon file was passed as a command-line argument.
    if (argc != 2)
    {
        wprintf(L"Usage: RunCocoon <cocoon file name>\n");
        return 0;
    }

    // Open the cocoon using the structured storage APIs.
    IStorage *pRootStorage = NULL;
    hr = StgOpenStorage(argv[1], NULL,
                        STGM_READ | STGM_DIRECT | STGM_SHARE_EXCLUSIVE,
                        NULL, 0, &pRootStorage);
    if (!SUCCEEDED(hr))
    {
        wprintf(L"Error opening cocoon file: %s\n", argv[1]);
        return 0;
    }

    // The rest of main omitted for brevity...
}

Initializing the CLR

After you've verified that you can open the cocoon, it's time to initialize the CLR using CorBindToRuntimeEx. Your use of this API is straightforward: make sure the .NET Framework 2.0 version of the CLR gets loaded, then save the pointer to ICLRRuntimeHost so you can use it later to start the CLR, set the host control object, and access the ICLRControl interface to register your application domain manager with the CLR:

// Start the CLR. Make sure .NET Framework 2.0 build is used.
ICLRRuntimeHost *pCLR = NULL;
hr = CorBindToRuntimeEx(
    L"v2.0.41013",
    L"wks",
    STARTUP_CONCURRENT_GC,
    CLSID_CLRRuntimeHost,
    IID_ICLRRuntimeHost,
    (PVOID*) &pCLR);

Creating the Assembly Loading Manager and Host Control Object

It's now time to hook your implementation of the assembly loading manager into the CLR so you get called to load assemblies out of the cocoon. Do this in three steps:

  1. Create an instance of the CCocoonAssemblyManager class. This class provides your implementation of IHostAssemblyManager. Recall from earlier in the chapter that this interface contains the GetNonHostStoreAssemblies method and also provides the CLR with your custom assembly store implementation through the GetAssemblyStore method.

  2. Create an instance of your host control object that provides the CLR with the implementation of your assembly loading manager. Runcocoon's host control object is contained in the class CHostControl. This class implements the IHostControl interface that the CLR uses to discover which managers a host supports. When the CLR calls IHostControl::GetHostManager with the IID for IHostAssemblyManager, CHostControl returns your instance of CCocoonAssemblyManager (cast to a pointer to an IHostAssemblyManager interface, of course).

  3. Register your host control object with the CLR. Do this by passing an instance of CHostControl to ICLRRuntimeHost::SetHostControl.

The following code snippet from runcocoon's main routine demonstrates these three steps:

int wmain(int argc, wchar_t* argv[])
{
    // The first part of main omitted...

    // Create an instance of CCocoonAssemblyManager. This class contains your
    // implementation of the assembly loading manager, specifically the
    // IHostAssemblyStore interface. Pass the IStorage for the cocoon's
    // root storage object to the constructor. CCocoonAssemblyManager saves
    // this pointer and uses it later to load assemblies from the cocoon using
    // IHostAssemblyStore.
    CCocoonAssemblyManager *pAsmManager = new CCocoonAssemblyManager(pRootStorage);
    assert(pAsmManager);

    // Create a host control that takes the new assembly loading manager. The
    // CHostControl class implements IHostControl, which the CLR calls at
    // startup to determine which managers you support. In this case,
    // support just the assembly loading manager.
    CHostControl *pHostControl = new CHostControl(NULL, NULL, NULL, NULL, NULL,
                                                  (IHostAssemblyManager *)pAsmManager,
                                                  NULL, NULL, NULL);

    // Tell the CLR about your host control object. Remember that you must do
    // this before calling ICLRRuntimeHost::Start.
    pCLR->SetHostControl((IHostControl *)pHostControl);

    // The rest of main omitted...
}
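The source for CHostControl isn't reproduced in this chapter, so the following is only a simplified model of the dispatch a host control performs when the CLR asks which managers are supported. It uses stand-in types so it builds without the COM and hosting headers: ManagerKind replaces the interface IID, Result replaces the HRESULT (with NoInterface standing in for E_NOINTERFACE), and AssemblyManager replaces CCocoonAssemblyManager. All of these names are hypothetical, and reference counting is left out.

```cpp
#include <cassert>

namespace sketch {

// Stand-in for the interface IID the CLR passes to GetHostManager.
enum class ManagerKind { Assembly, Memory, Task };

// Stand-in for the HRESULT a real implementation returns.
enum class Result { Ok, NoInterface };

// Stand-in for the host's assembly loading manager.
struct AssemblyManager {};

class HostControl {
public:
    explicit HostControl(AssemblyManager *pAsmManager)
        : m_pAsmManager(pAsmManager) {}

    // Returns the requested manager if this host implements it. For every
    // other manager, clears the output pointer and reports that the
    // interface is not supported, so the CLR falls back to its defaults.
    Result GetHostManager(ManagerKind kind, void **ppObject) const
    {
        assert(ppObject);
        if (kind == ManagerKind::Assembly && m_pAsmManager != nullptr) {
            *ppObject = m_pAsmManager;
            return Result::Ok;
        }
        *ppObject = nullptr;
        return Result::NoInterface;
    }

private:
    AssemblyManager *m_pAsmManager;
};

} // namespace sketch
```

This matches the shape of the runcocoon host, which passes NULL for every manager except the assembly loading manager when it constructs its host control object.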

Invoking the Hosted Application

The application contained in the cocoon is invoked from runcocoon's application domain manager. Your application domain manager is implemented by a class called CocoonDomainManager in the CocoonHostRuntime assembly. CocoonDomainManager has a Run method that takes the name of the assembly containing the application's executable and the name of the type within that assembly that contains the main method. Run loads the assembly containing the executable using the Assembly.Load method. After the assembly is loaded, Run uses other methods in the System.Reflection namespace to launch the application. The code for the CocoonHostRuntime assembly is shown in Listing 8-2.

Listing 8-2. CocoonHostRuntime.cs

using System;
using System.Text;
using System.Reflection;

namespace CocoonHostRuntime
{
    public interface ICocoonDomainManager
    {
        void Run(string assemblyName, string typeName);
    }

    public class CocoonDomainManager : AppDomainManager, ICocoonDomainManager
    {
        public override void InitializeNewDomain(AppDomainSetup appDomainInfo)
        {
            // Set the flags so that the unmanaged portion of your
            // host gets notified of your domain manager via
            // IHostControl::SetAppDomainManager.
            InitializationFlags = DomainManagerInitializationFlags.RegisterWithHost;
        }

        // Run the "main" method from <assemblyName>.<typeName>.
        public void Run(string assemblyName, string typeName)
        {
            try
            {
                Assembly asm = Assembly.Load(assemblyName);
                Type t = asm.GetType(typeName, true, true);
                MethodInfo m = t.GetMethod("Main");
                m.Invoke(null, null);
            }
            catch (Exception e)
            {
                Console.WriteLine("Exception executing entry point: " + e.Message);
            }
        }
    }
}

Before you can use CocoonDomainManager to execute your hosted application, you need to get the name of the assembly and the type containing the application's main from the cocoon. Recall that these names are contained in the _exeBindingIdentity and _entryPoint streams, respectively. The complete code for runcocoon's main program is shown in Listing 8-3.

Listing 8-3. Runcocoon.cpp

//
// Runcocoon.cpp : The main program for the runcocoon host.
//

#include "stdafx.h"
#include "CHostControl.h"
#include "CStreamIndex.h"
#include "CCocoonAssemblyStore.h"
#include "CCocoonAssemblyManager.h"

// Reads the contents of pszStreamName and returns it in pszString.
// This method is used to read the _exeBindingIdentity and _entryPoint
// streams, which contain the binding identity of the assembly and the name of
// the type containing the application's entry point.
HRESULT GetStringFromStream(IStorage *pStorage, wchar_t *pszStreamName,
                            wchar_t *pszString)
{
    IStream *pStream = NULL;
    HRESULT hr = pStorage->OpenStream(pszStreamName, 0,
                                      STGM_READ | STGM_DIRECT | STGM_SHARE_EXCLUSIVE,
                                      0, &pStream);
    assert(SUCCEEDED(hr));

    // Determine how many bytes to read based on the size of the stream.
    STATSTG stats;
    pStream->Stat(&stats, STATFLAG_DEFAULT);

    // Read the bytes into pszString.
    DWORD dwBytesRead = 0;
    hr = pStream->Read(pszString, stats.cbSize.LowPart, &dwBytesRead);
    assert(stats.cbSize.LowPart == dwBytesRead);
    assert(SUCCEEDED(hr));

    pStream->Release();
    return S_OK;
}

int wmain(int argc, wchar_t* argv[])
{
    HRESULT hr = S_OK;

    // Make sure a cocoon file was passed as a command-line argument.
    if (argc != 2)
    {
        wprintf(L"Usage: RunCocoon <cocoon file name>\n");
        return 0;
    }

    // Open the cocoon using the structured storage APIs.
    IStorage *pRootStorage = NULL;
    hr = StgOpenStorage(argv[1], NULL,
                        STGM_READ | STGM_DIRECT | STGM_SHARE_EXCLUSIVE,
                        NULL, 0, &pRootStorage);
    if (!SUCCEEDED(hr))
    {
        wprintf(L"Error opening cocoon file: %s\n", argv[1]);
        return 0;
    }

    // Start .NET Framework 2.0 version of the CLR.
    ICLRRuntimeHost *pCLR = NULL;
    hr = CorBindToRuntimeEx(
        L"v2.0.41013",
        L"wks",
        STARTUP_CONCURRENT_GC,
        CLSID_CLRRuntimeHost,
        IID_ICLRRuntimeHost,
        (PVOID*) &pCLR);
    assert(SUCCEEDED(hr));

    // Create an instance of CCocoonAssemblyManager. This class contains your
    // implementation of the assembly loading manager, specifically the
    // IHostAssemblyStore interface. Pass the IStorage for the cocoon's
    // root storage object to the constructor. CCocoonAssemblyManager saves
    // this pointer and uses it later to load assemblies from the cocoon using
    // IHostAssemblyStore.
    CCocoonAssemblyManager *pAsmManager = new CCocoonAssemblyManager(pRootStorage);
    assert(pAsmManager);

    // Create a host control object that takes the new assembly loading
    // manager. The CHostControl class implements IHostControl, which
    // the CLR calls at startup to determine which managers you support.
    // In this case, support just the assembly loading manager.
    CHostControl *pHostControl = new CHostControl(NULL, NULL, NULL, NULL, NULL,
                                                  (IHostAssemblyManager *)pAsmManager,
                                                  NULL, NULL, NULL);

    // Tell the CLR about your host control object. Remember that you
    // must do this before calling ICLRRuntimeHost::Start.
    hr = pCLR->SetHostControl((IHostControl *)pHostControl);
    assert(SUCCEEDED(hr));

    // Get the CLRControl object. Use this to set your AppDomainManager.
    ICLRControl *pCLRControl = NULL;
    hr = pCLR->GetCLRControl(&pCLRControl);
    assert(SUCCEEDED(hr));

    hr = pCLRControl->SetAppDomainManagerType(
        L"CocoonHostRuntime, Version=5.0.0.0, PublicKeyToken=38c3b24e4a6ee45e, Culture=neutral",
        L"CocoonHostRuntime.CocoonDomainManager");
    assert(SUCCEEDED(hr));

    // Start the CLR.
    hr = pCLR->Start();
    assert(SUCCEEDED(hr));

    // Get the binding identity for the exe contained in the cocoon.
    wchar_t wszExeIdentity[MAX_PATH];
    ZeroMemory(wszExeIdentity, MAX_PATH*sizeof(wchar_t));
    hr = GetStringFromStream(pRootStorage, L"_exeBindingIdentity", wszExeIdentity);

    // Get the name of the type containing the application's main method.
    wchar_t wszEntryType[MAX_PATH];
    ZeroMemory(wszEntryType, MAX_PATH*sizeof(wchar_t));
    hr = GetStringFromStream(pRootStorage, L"_entryPoint", wszEntryType);

    // Launch the application using your domain manager.
    ICocoonDomainManager *pDomainManager =
        pHostControl->GetDomainManagerForDefaultDomain();
    assert(pDomainManager);

    hr = pDomainManager->Run(wszExeIdentity, wszEntryType);
    assert(SUCCEEDED(hr));

    pDomainManager->Release();
    pCLRControl->Release();
    pHostControl->Release();

    return 0;
}

The complete source code for the runcocoon host can be found on this book's companion Web site.
