| Copyright | [2014..2017] Trevor L. McDonell, [2014..2014] Vinod Grover (NVIDIA Corporation)  | 
|---|---|
| License | BSD3 | 
| Maintainer | Trevor L. McDonell <tmcdonell@cse.unsw.edu.au> | 
| Stability | experimental | 
| Portability | non-portable (GHC extensions) | 
| Safe Haskell | None | 
| Language | Haskell2010 | 
Data.Array.Accelerate.LLVM.Native
Description
This module implements a backend for the Accelerate language targeting
 multicore CPUs. Expressions are translated to LLVM code on the fly, which is
 just-in-time compiled and executed in parallel across the available CPUs. Functions are
 automatically parallelised over all available cores, unless you set the
 environment variable 'ACCELERATE_LLVM_NATIVE_THREADS=N', in which case N
 threads will be used.
Programs must be compiled with '-threaded', otherwise you will get a "thread blocked indefinitely in an MVar operation" error.
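The '-threaded' flag is a GHC option; in a Cabal project it might be set as follows (a minimal sketch; the executable name in the runtime example is hypothetical):

```
ghc-options: -threaded
```

The worker-thread count can then be fixed at launch, e.g. ACCELERATE_LLVM_NATIVE_THREADS=4 ./my-program.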
Synopsis
- data Acc a
 - class (Typeable a, Typeable (ArrRepr a)) => Arrays a
 - class Afunction f
 - type family AfunctionR f :: *
 - run :: Arrays a => Acc a -> a
 - runWith :: Arrays a => Native -> Acc a -> a
 - run1 :: (Arrays a, Arrays b) => (Acc a -> Acc b) -> a -> b
 - run1With :: (Arrays a, Arrays b) => Native -> (Acc a -> Acc b) -> a -> b
 - runN :: Afunction f => f -> AfunctionR f
 - runNWith :: Afunction f => Native -> f -> AfunctionR f
 - stream :: (Arrays a, Arrays b) => (Acc a -> Acc b) -> [a] -> [b]
 - streamWith :: (Arrays a, Arrays b) => Native -> (Acc a -> Acc b) -> [a] -> [b]
 - data Async a
 - wait :: Async a -> IO a
 - poll :: Async a -> IO (Maybe a)
 - cancel :: Async a -> IO ()
 - runAsync :: Arrays a => Acc a -> IO (Async a)
 - runAsyncWith :: Arrays a => Native -> Acc a -> IO (Async a)
 - run1Async :: (Arrays a, Arrays b) => (Acc a -> Acc b) -> a -> IO (Async b)
 - run1AsyncWith :: (Arrays a, Arrays b) => Native -> (Acc a -> Acc b) -> a -> IO (Async b)
 - runNAsync :: (Afunction f, RunAsync r, AfunctionR f ~ RunAsyncR r) => f -> r
 - runNAsyncWith :: (Afunction f, RunAsync r, AfunctionR f ~ RunAsyncR r) => Native -> f -> r
 - runQ :: Afunction f => f -> ExpQ
 - runQWith :: Afunction f => f -> ExpQ
 - runQAsync :: Afunction f => f -> ExpQ
 - runQAsyncWith :: Afunction f => f -> ExpQ
 - data Native
 - type Strategy = Gang -> Executable
 - createTarget :: [Int] -> Strategy -> IO Native
 - balancedParIO :: Int -> Strategy
 - unbalancedParIO :: Strategy
 
Documentation
Accelerate is an embedded language that distinguishes between vanilla arrays (e.g. in Haskell memory on the CPU) and embedded arrays (e.g. in device memory on a GPU), as well as the computations on both of these. Since Accelerate is an embedded language, programs written in Accelerate are not compiled by the Haskell compiler (GHC). Rather, each Accelerate backend is a runtime compiler which generates and executes parallel SIMD code of the target language at application runtime.
The type constructor Acc represents embedded collective array operations.
 A term of type Acc a is an Accelerate program which, once executed, will
 produce a value of type a (an Array or a tuple of Arrays). Collective
 operations of type Acc a comprise many scalar expressions, wrapped in
 type constructor Exp, which will be executed in parallel. Although
 collective operations comprise many scalar operations executed in parallel,
 scalar operations cannot initiate new collective operations: this
 stratification between scalar operations in Exp and array operations in
 Acc helps statically exclude nested data parallelism, which is difficult
 to execute efficiently on constrained hardware such as GPUs.
- A simple example
 
As a simple example, to compute a vector dot product we can write:
dotp :: Num a => Vector a -> Vector a -> Acc (Scalar a)
dotp xs ys =
  let
      xs' = use xs
      ys' = use ys
  in
  fold (+) 0 ( zipWith (*) xs' ys' )

The function dotp consumes two one-dimensional arrays (Vectors) of
 values, and produces a single (Scalar) result as output. As the return type
 is wrapped in the type Acc, we see that it is an embedded Accelerate
 computation - it will be evaluated in the object language of dynamically
 generated parallel code, rather than the meta language of vanilla Haskell.
As the arguments to dotp are plain Haskell arrays, to make these available
 to Accelerate computations they must be embedded with the
 use function.
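To make the semantics concrete, here is a plain-Haskell (non-Accelerate) reference implementation of the same dot product, using ordinary lists in place of Acc vectors; the name dotpRef is illustrative only:

```haskell
-- Plain-list reference semantics for dotp: multiply pointwise, then sum.
-- The Accelerate version instead *builds* a program, which a backend such
-- as accelerate-llvm-native compiles and executes in parallel.
dotpRef :: Num a => [a] -> [a] -> a
dotpRef xs ys = sum (zipWith (*) xs ys)
```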
An Accelerate backend is used to evaluate the embedded computation and return
 the result back to vanilla Haskell. Calling the run function of a backend
 will generate code for the target architecture, compile, and execute it. For
 example, the following backends are available:
- accelerate-llvm-native: for execution on multicore CPUs
 - accelerate-llvm-ptx: for execution on NVIDIA CUDA-capable GPUs
 
See also Exp, which encapsulates embedded scalar computations.
- Avoiding nested parallelism
 
As mentioned above, embedded scalar computations of type Exp cannot
 initiate further collective operations.
Suppose we wanted to extend our above dotp function to matrix-vector
 multiplication. First, let's rewrite our dotp function to take Acc arrays
 as input (which is typically what we want):
dotp :: Num a => Acc (Vector a) -> Acc (Vector a) -> Acc (Scalar a)
dotp xs ys = fold (+) 0 ( zipWith (*) xs ys )
We might then be inclined to lift our dot-product program to the following
 (incorrect) matrix-vector product, by applying dotp to each row of the
 input matrix:
mvm_ndp :: Num a => Acc (Matrix a) -> Acc (Vector a) -> Acc (Vector a)
mvm_ndp mat vec =
  let Z :. rows :. cols  = unlift (shape mat)  :: Z :. Exp Int :. Exp Int
  in  generate (index1 rows)
               (\row -> the $ dotp vec (slice mat (lift (row :. All))))

Here, we use generate to create a one-dimensional
 vector by applying at each index a function to slice
 out the corresponding row of the matrix to pass to the dotp function.
 However, since both generate and
 slice are data-parallel operations, and moreover that
 slice depends on the argument row given to it by
 the generate function, this definition requires
 nested data-parallelism, and is thus not permitted. The clue that this
 definition is invalid is that in order to create a program which will be
 accepted by the type checker, we must use the function
 the to retrieve the result of the dotp operation,
 effectively concealing that dotp is a collective array computation in order
 to match the type expected by generate, which is that
 of scalar expressions. Additionally, since we have fooled the type-checker,
 this problem will only be discovered at program runtime.
In order to avoid this problem, we can make use of the fact that operations
 in Accelerate are rank polymorphic. The fold
 operation reduces along the innermost dimension of an array of arbitrary
 rank, reducing the rank (dimensionality) of the array by one. Thus, we can
 replicate the input vector across as many rows as there
 are in the input matrix, and perform the dot-product of the vector with every
 row simultaneously:
mvm :: A.Num a => Acc (Matrix a) -> Acc (Vector a) -> Acc (Vector a)
mvm mat vec =
  let Z :. rows :. cols = unlift (shape mat) :: Z :. Exp Int :. Exp Int
      vec'              = A.replicate (lift (Z :. rows :. All)) vec
  in
  A.fold (+) 0 ( A.zipWith (*) mat vec' )

Note that the intermediate, replicated array vec' is never actually created
 in memory; it will be fused directly into the operation which consumes it. We
 discuss fusion next.
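As a plain-Haskell reference for the semantics of mvm, using lists of lists in place of Acc matrices (the name mvmRef is illustrative only):

```haskell
-- Plain-list reference semantics for mvm: each matrix row is paired with
-- the (conceptually replicated) vector, and the rank-reducing fold sums
-- along the row, yielding one dot product per row.
mvmRef :: Num a => [[a]] -> [a] -> [a]
mvmRef mat vec = [ sum (zipWith (*) row vec) | row <- mat ]
```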
- Fusion
 
Array computations of type Acc will be subject to array fusion;
 Accelerate will combine individual Acc computations into a single
 computation, which reduces the number of traversals over the input data and
 thus improves performance. As such, it is often useful to have some intuition
 on when fusion should occur.
The main idea is to first partition array operations into two categories:
- Element-wise operations, such as map, generate, and backpermute. Each element of these operations can be computed independently of all others.
- Collective operations, such as fold, scanl, and stencil. Computing each output element of these operations requires reading multiple elements from the input array(s).
Element-wise operations fuse together whenever the consumer operation uses a single element of the input array. Element-wise operations can both fuse their inputs into themselves, as well as be fused into later operations. Both of these examples should fuse into a single loop:
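The original examples are missing from this rendering; as a hedged stand-in, here are two programs of the kind that should each fuse into a single loop, written as plain-list analogues (in Accelerate these would be map and zipWith over Acc arrays):

```haskell
-- Two element-wise stages collapse into one traversal (map/map fusion):
fuse1 :: [Int] -> [Int]
fuse1 xs = map (+ 1) (map (* 2) xs)

-- An element-wise consumer of an element-wise producer also fuses:
fuse2 :: [Int] -> [Int] -> [Int]
fuse2 xs ys = map (+ 1) (zipWith (*) xs ys)
```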


If the consumer operation uses more than one element of the input array
 (typically, via generate indexing an array multiple
 times), then the input array will be completely evaluated first; no fusion
 occurs in this case, because fusing the first operation into the second
 implies duplicating work.
On the other hand, collective operations can fuse their input arrays into themselves, but on output always evaluate to an array; collective operations will not be fused into a later step. For example:
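The original example is missing from this rendering; the program discussed in the following paragraph can be sketched as a plain-list analogue (use, generate, zipWith, fold and map replaced by their list counterparts):

```haskell
-- "use" an input list, "generate" an index vector, combine with zipWith,
-- reduce with a fold (the first loop ends here), then map over the fold's
-- one-element result (a separate, second loop).
twoLoops :: [Int] -> [Int]
twoLoops xs =
  let ys = take (length xs) [0 ..]          -- "generate"
      s  = foldl (+) 0 (zipWith (*) xs ys)  -- fuses with the stages above
  in  map (* 2) [s]                         -- second loop over fold's result
```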

Here the element-wise sequence (use
 + generate + zipWith) will
 fuse into a single operation, which then fuses into the collective
 fold operation. At this point in the program the
 fold must now be evaluated. In the final step the
 map reads in the array produced by
 fold. As there is no fusion between the
 fold and map steps, this
 program consists of two "loops"; one for the use
 + generate + zipWith
 + fold step, and one for the final
 map step.
You can see how many operations will be executed in the fused program by
 Show-ing the Acc program, or by using the debugging option -ddump-dot
 to save the program as a graphviz DOT file.
As a special note, the operations unzip and
 reshape, when applied to a real array, are executed
 in constant time, so in this situation these operations will not be fused.
- Tips
 
- Since Acc represents embedded computations that will only be executed when evaluated by a backend, we can programmatically generate these computations using the meta language Haskell; for example, unrolling loops or embedding input values into the generated code.
- It is usually best to keep all intermediate computations in Acc, and only run the computation at the very end to produce the final result. This enables optimisations between intermediate results (e.g. array fusion) and, if the target architecture has a separate memory space, as is the case with GPUs, prevents excessive data transfers.
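As an illustration of the first tip, Haskell itself can unroll a fixed number of iterations at the meta level; a minimal sketch (with Accelerate, f would have type Acc a -> Acc a, and the composed function would describe one large fused program):

```haskell
-- Compose a step function with itself n times at the Haskell (meta) level,
-- "unrolling" the iterations into the generated computation rather than
-- looping at runtime.
unrollN :: Int -> (a -> a) -> (a -> a)
unrollN n f = foldr (.) id (replicate n f)
```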
Instances
| Arrays b => Afunction (Acc b) | |
Associated Types type AfunctionR (Acc b) :: * # Methods aconvert :: Config -> Layout aenv aenv -> Acc b -> OpenAfun aenv (AfunctionR (Acc b))  | |
| (Arrays a, Afunction r) => Afunction (Acc a -> r) | |
Associated Types type AfunctionR (Acc a -> r) :: * # Methods aconvert :: Config -> Layout aenv aenv -> (Acc a -> r) -> OpenAfun aenv (AfunctionR (Acc a -> r))  | |
| type AfunctionR (Acc b) | |
| type AfunctionR (Acc a -> r) | |
class (Typeable a, Typeable (ArrRepr a)) => Arrays a #
Arrays consists of nested tuples of individual Arrays, currently up to
 15 elements wide. Accelerate computations can thereby return multiple
 results.
Minimal complete definition
arrays, flavour, toArr, fromArr
Instances
| Arrays () | |
| (Arrays a, Arrays b) => Arrays (a, b) | |
| (Shape sh, Elt e) => Arrays (Array sh e) | |
| (Arrays a, Arrays b, Arrays c) => Arrays (a, b, c) | |
| (Arrays a, Arrays b, Arrays c, Arrays d) => Arrays (a, b, c, d) | |
| (Arrays a, Arrays b, Arrays c, Arrays d, Arrays e) => Arrays (a, b, c, d, e) | |
| (Arrays a, Arrays b, Arrays c, Arrays d, Arrays e, Arrays f) => Arrays (a, b, c, d, e, f) | |
| (Arrays a, Arrays b, Arrays c, Arrays d, Arrays e, Arrays f, Arrays g) => Arrays (a, b, c, d, e, f, g) | |
| (Arrays a, Arrays b, Arrays c, Arrays d, Arrays e, Arrays f, Arrays g, Arrays h) => Arrays (a, b, c, d, e, f, g, h) | |
Methods arrays :: (a, b, c, d, e, f, g, h) -> ArraysR (ArrRepr (a, b, c, d, e, f, g, h)) flavour :: (a, b, c, d, e, f, g, h) -> ArraysFlavour (a, b, c, d, e, f, g, h) toArr :: ArrRepr (a, b, c, d, e, f, g, h) -> (a, b, c, d, e, f, g, h) fromArr :: (a, b, c, d, e, f, g, h) -> ArrRepr (a, b, c, d, e, f, g, h)  | |
| (Arrays a, Arrays b, Arrays c, Arrays d, Arrays e, Arrays f, Arrays g, Arrays h, Arrays i) => Arrays (a, b, c, d, e, f, g, h, i) | |
Methods arrays :: (a, b, c, d, e, f, g, h, i) -> ArraysR (ArrRepr (a, b, c, d, e, f, g, h, i)) flavour :: (a, b, c, d, e, f, g, h, i) -> ArraysFlavour (a, b, c, d, e, f, g, h, i) toArr :: ArrRepr (a, b, c, d, e, f, g, h, i) -> (a, b, c, d, e, f, g, h, i) fromArr :: (a, b, c, d, e, f, g, h, i) -> ArrRepr (a, b, c, d, e, f, g, h, i)  | |
| (Arrays a, Arrays b, Arrays c, Arrays d, Arrays e, Arrays f, Arrays g, Arrays h, Arrays i, Arrays j) => Arrays (a, b, c, d, e, f, g, h, i, j) | |
Methods arrays :: (a, b, c, d, e, f, g, h, i, j) -> ArraysR (ArrRepr (a, b, c, d, e, f, g, h, i, j)) flavour :: (a, b, c, d, e, f, g, h, i, j) -> ArraysFlavour (a, b, c, d, e, f, g, h, i, j) toArr :: ArrRepr (a, b, c, d, e, f, g, h, i, j) -> (a, b, c, d, e, f, g, h, i, j) fromArr :: (a, b, c, d, e, f, g, h, i, j) -> ArrRepr (a, b, c, d, e, f, g, h, i, j)  | |
| (Arrays a, Arrays b, Arrays c, Arrays d, Arrays e, Arrays f, Arrays g, Arrays h, Arrays i, Arrays j, Arrays k) => Arrays (a, b, c, d, e, f, g, h, i, j, k) | |
Methods arrays :: (a, b, c, d, e, f, g, h, i, j, k) -> ArraysR (ArrRepr (a, b, c, d, e, f, g, h, i, j, k)) flavour :: (a, b, c, d, e, f, g, h, i, j, k) -> ArraysFlavour (a, b, c, d, e, f, g, h, i, j, k) toArr :: ArrRepr (a, b, c, d, e, f, g, h, i, j, k) -> (a, b, c, d, e, f, g, h, i, j, k) fromArr :: (a, b, c, d, e, f, g, h, i, j, k) -> ArrRepr (a, b, c, d, e, f, g, h, i, j, k)  | |
| (Arrays a, Arrays b, Arrays c, Arrays d, Arrays e, Arrays f, Arrays g, Arrays h, Arrays i, Arrays j, Arrays k, Arrays l) => Arrays (a, b, c, d, e, f, g, h, i, j, k, l) | |
Methods arrays :: (a, b, c, d, e, f, g, h, i, j, k, l) -> ArraysR (ArrRepr (a, b, c, d, e, f, g, h, i, j, k, l)) flavour :: (a, b, c, d, e, f, g, h, i, j, k, l) -> ArraysFlavour (a, b, c, d, e, f, g, h, i, j, k, l) toArr :: ArrRepr (a, b, c, d, e, f, g, h, i, j, k, l) -> (a, b, c, d, e, f, g, h, i, j, k, l) fromArr :: (a, b, c, d, e, f, g, h, i, j, k, l) -> ArrRepr (a, b, c, d, e, f, g, h, i, j, k, l)  | |
| (Arrays a, Arrays b, Arrays c, Arrays d, Arrays e, Arrays f, Arrays g, Arrays h, Arrays i, Arrays j, Arrays k, Arrays l, Arrays m) => Arrays (a, b, c, d, e, f, g, h, i, j, k, l, m) | |
Methods arrays :: (a, b, c, d, e, f, g, h, i, j, k, l, m) -> ArraysR (ArrRepr (a, b, c, d, e, f, g, h, i, j, k, l, m)) flavour :: (a, b, c, d, e, f, g, h, i, j, k, l, m) -> ArraysFlavour (a, b, c, d, e, f, g, h, i, j, k, l, m) toArr :: ArrRepr (a, b, c, d, e, f, g, h, i, j, k, l, m) -> (a, b, c, d, e, f, g, h, i, j, k, l, m) fromArr :: (a, b, c, d, e, f, g, h, i, j, k, l, m) -> ArrRepr (a, b, c, d, e, f, g, h, i, j, k, l, m)  | |
| (Arrays a, Arrays b, Arrays c, Arrays d, Arrays e, Arrays f, Arrays g, Arrays h, Arrays i, Arrays j, Arrays k, Arrays l, Arrays m, Arrays n) => Arrays (a, b, c, d, e, f, g, h, i, j, k, l, m, n) | |
Methods arrays :: (a, b, c, d, e, f, g, h, i, j, k, l, m, n) -> ArraysR (ArrRepr (a, b, c, d, e, f, g, h, i, j, k, l, m, n)) flavour :: (a, b, c, d, e, f, g, h, i, j, k, l, m, n) -> ArraysFlavour (a, b, c, d, e, f, g, h, i, j, k, l, m, n) toArr :: ArrRepr (a, b, c, d, e, f, g, h, i, j, k, l, m, n) -> (a, b, c, d, e, f, g, h, i, j, k, l, m, n) fromArr :: (a, b, c, d, e, f, g, h, i, j, k, l, m, n) -> ArrRepr (a, b, c, d, e, f, g, h, i, j, k, l, m, n)  | |
| (Arrays a, Arrays b, Arrays c, Arrays d, Arrays e, Arrays f, Arrays g, Arrays h, Arrays i, Arrays j, Arrays k, Arrays l, Arrays m, Arrays n, Arrays o) => Arrays (a, b, c, d, e, f, g, h, i, j, k, l, m, n, o) | |
Methods arrays :: (a, b, c, d, e, f, g, h, i, j, k, l, m, n, o) -> ArraysR (ArrRepr (a, b, c, d, e, f, g, h, i, j, k, l, m, n, o)) flavour :: (a, b, c, d, e, f, g, h, i, j, k, l, m, n, o) -> ArraysFlavour (a, b, c, d, e, f, g, h, i, j, k, l, m, n, o) toArr :: ArrRepr (a, b, c, d, e, f, g, h, i, j, k, l, m, n, o) -> (a, b, c, d, e, f, g, h, i, j, k, l, m, n, o) fromArr :: (a, b, c, d, e, f, g, h, i, j, k, l, m, n, o) -> ArrRepr (a, b, c, d, e, f, g, h, i, j, k, l, m, n, o)  | |
| (Arrays a, Arrays b, Arrays c, Arrays d, Arrays e, Arrays f, Arrays g, Arrays h, Arrays i, Arrays j, Arrays k, Arrays l, Arrays m, Arrays n, Arrays o, Arrays p) => Arrays (a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p) | |
Methods arrays :: (a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p) -> ArraysR (ArrRepr (a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p)) flavour :: (a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p) -> ArraysFlavour (a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p) toArr :: ArrRepr (a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p) -> (a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p) fromArr :: (a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p) -> ArrRepr (a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p)  | |
Minimal complete definition
aconvert
Instances
| Arrays b => Afunction (Acc b) | |
Associated Types type AfunctionR (Acc b) :: * # Methods aconvert :: Config -> Layout aenv aenv -> Acc b -> OpenAfun aenv (AfunctionR (Acc b))  | |
| (Arrays a, Afunction r) => Afunction (Acc a -> r) | |
Associated Types type AfunctionR (Acc a -> r) :: * # Methods aconvert :: Config -> Layout aenv aenv -> (Acc a -> r) -> OpenAfun aenv (AfunctionR (Acc a -> r))  | |
type family AfunctionR f :: * #
Instances
| type AfunctionR (Acc b) | |
| type AfunctionR (Acc a -> r) | |
Synchronous execution
run :: Arrays a => Acc a -> a Source #

Compile and run a complete embedded array program.

runWith :: Arrays a => Native -> Acc a -> a Source #
As run, but execute using the specified target (thread gang).
run1 :: (Arrays a, Arrays b) => (Acc a -> Acc b) -> a -> b Source #
This is runN, specialised to an array program of one argument.
run1With :: (Arrays a, Arrays b) => Native -> (Acc a -> Acc b) -> a -> b Source #
As run1, but execute using the specified target (thread gang).
runN :: Afunction f => f -> AfunctionR f Source #
Prepare and execute an embedded array program.
This function can be used to improve performance in cases where the array
 program is constant between invocations, because it enables us to bypass
 front-end conversion stages and move directly to the execution phase. If you
 have a computation applied repeatedly to different input data, use this,
 specifying any changing aspects of the computation via the input parameters.
 If the function is only evaluated once, this is equivalent to run.
In order to use runN you must express your Accelerate program as a function
 of array terms:
f :: (Arrays a, Arrays b, ... Arrays c) => Acc a -> Acc b -> ... -> Acc c
This function then returns the compiled version of f:
runN f :: (Arrays a, Arrays b, ... Arrays c) => a -> b -> ... -> c
As an example, rather than:

step :: Acc (Vector a) -> Acc (Vector b)
step = ...

simulate :: Vector a -> Vector b
simulate xs = run $ step (use xs)
Instead write:
simulate = runN step
You can use the debugging options to check whether this is working
 successfully. For example, running with the -ddump-phases flag should show
 that the compilation steps only happen once, not on the second and subsequent
 invocations of simulate. Note that this typically relies on GHC knowing
 that it can lift out the function returned by runN and reuse it.
See the programs in the 'accelerate-examples' package for examples.
See also runQ, which compiles the Accelerate program at _Haskell_ compile
 time, thus eliminating the runtime overhead altogether.
runNWith :: Afunction f => Native -> f -> AfunctionR f Source #
As runN, but execute using the specified target (thread gang).
stream :: (Arrays a, Arrays b) => (Acc a -> Acc b) -> [a] -> [b] Source #
Stream a lazily read list of input arrays through the given program, collecting results as we go.
streamWith :: (Arrays a, Arrays b) => Native -> (Acc a -> Acc b) -> [a] -> [b] Source #
As stream, but execute using the specified target (thread gang).
Asynchronous execution
wait :: Async a -> IO a #

Block the calling thread until the computation completes, then return the result.
poll :: Async a -> IO (Maybe a) #
Test whether the asynchronous computation has already completed. If so,
 return the result, else Nothing.
runAsyncWith :: Arrays a => Native -> Acc a -> IO (Async a) Source #
As runAsync, but execute using the specified target (thread gang).
run1Async :: (Arrays a, Arrays b) => (Acc a -> Acc b) -> a -> IO (Async b) Source #
As run1, but execute asynchronously.
run1AsyncWith :: (Arrays a, Arrays b) => Native -> (Acc a -> Acc b) -> a -> IO (Async b) Source #
As run1Async, but execute using the specified target (thread gang).
runNAsync :: (Afunction f, RunAsync r, AfunctionR f ~ RunAsyncR r) => f -> r Source #
As runN, but execute asynchronously.
runNAsyncWith :: (Afunction f, RunAsync r, AfunctionR f ~ RunAsyncR r) => Native -> f -> r Source #
As runNWith, but execute asynchronously.
Ahead-of-time compilation
runQ :: Afunction f => f -> ExpQ Source #
Ahead-of-time compilation for an embedded array program.
At Haskell compile time, this function will generate and compile code to
 execute the given Accelerate computation, and link it into the final
 executable. This eliminates any runtime overhead associated with the other
 run* operations. The generated code will be optimised for the architecture
 of the compiling machine.
Since the Accelerate program will be generated at Haskell compile time,
 construction of the Accelerate program, in particular via meta-programming,
 will be limited to operations available to that phase. Also note that any
 arrays which are embedded into the program via use
 will be stored as part of the final executable.
Usage of this function in your program is similar to that of runN. First,
 express your Accelerate program as a function of array terms:
f :: (Arrays a, Arrays b, ... Arrays c) => Acc a -> Acc b -> ... -> Acc c
This function then returns a compiled version of f as a Template Haskell
 splice, to be added into your program at Haskell compile time:
{-# LANGUAGE TemplateHaskell #-}
f' :: a -> b -> ... -> c
f' = $( runQ f )

Note that at the splice point the usage of f must be monomorphic; i.e. the
 types a, b and c must be at some known concrete type.
In order to link the final program together, the included GHC plugin must be used when compiling and linking the program. Add the following option to the .cabal file of your project:
ghc-options: -fplugin=Data.Array.Accelerate.LLVM.Native.Plugin
Similarly, the plugin must also run when loading modules in ghci.
Additionally, when building a _library_ with Cabal which utilises runQ, you
 will need to use the following custom build Setup.hs to ensure that the
 library is linked together properly:
import Data.Array.Accelerate.LLVM.Native.Distribution.Simple

main = defaultMain
And in the .cabal file:
build-type: Custom
custom-setup
  setup-depends:
      base
    , Cabal
    , accelerate-llvm-native

The custom Setup.hs is only required when building a library with Cabal.
 Building executables with cabal requires only the GHC plugin.
See the lulesh-accelerate project for an example.
- Note:
 
Due to GHC#13587, this must currently be used as an untyped splice.
The correct type of this function is similar to that of runN:
runQ :: Afunction f => f -> Q (TExp (AfunctionR f))
Since: 1.1.0.0
runQAsyncWith :: Afunction f => f -> ExpQ Source #
Ahead-of-time analogue of runNAsyncWith. See runQ for more information.
The correct type of this function is:
runQAsyncWith :: (Afunction f, RunAsync r, AfunctionR f ~ RunAsyncR r) => f -> Q (TExp (Native -> r))
Since: 1.1.0.0
Execution targets
data Native Source #

Native machine code JIT execution target
Instances
| Skeleton Native | |
Methods generate :: (Shape sh, Elt e) => Native -> UID -> Gamma aenv -> IRFun1 Native aenv (sh -> e) -> CodeGen (IROpenAcc Native aenv (Array sh e)) transform :: (Shape sh, Shape sh', Elt a, Elt b) => Native -> UID -> Gamma aenv -> IRFun1 Native aenv (sh' -> sh) -> IRFun1 Native aenv (a -> b) -> IRDelayed Native aenv (Array sh a) -> CodeGen (IROpenAcc Native aenv (Array sh' b)) map :: (Shape sh, Elt a, Elt b) => Native -> UID -> Gamma aenv -> IRFun1 Native aenv (a -> b) -> IRDelayed Native aenv (Array sh a) -> CodeGen (IROpenAcc Native aenv (Array sh b)) fold :: (Shape sh, Elt e) => Native -> UID -> Gamma aenv -> IRFun2 Native aenv (e -> e -> e) -> IRExp Native aenv e -> IRDelayed Native aenv (Array (sh :. Int) e) -> CodeGen (IROpenAcc Native aenv (Array sh e)) fold1 :: (Shape sh, Elt e) => Native -> UID -> Gamma aenv -> IRFun2 Native aenv (e -> e -> e) -> IRDelayed Native aenv (Array (sh :. Int) e) -> CodeGen (IROpenAcc Native aenv (Array sh e)) foldSeg :: (Shape sh, Elt e, Elt i, IsIntegral i) => Native -> UID -> Gamma aenv -> IRFun2 Native aenv (e -> e -> e) -> IRExp Native aenv e -> IRDelayed Native aenv (Array (sh :. Int) e) -> IRDelayed Native aenv (Segments i) -> CodeGen (IROpenAcc Native aenv (Array (sh :. Int) e)) fold1Seg :: (Shape sh, Elt e, Elt i, IsIntegral i) => Native -> UID -> Gamma aenv -> IRFun2 Native aenv (e -> e -> e) -> IRDelayed Native aenv (Array (sh :. Int) e) -> IRDelayed Native aenv (Segments i) -> CodeGen (IROpenAcc Native aenv (Array (sh :. Int) e)) scanl :: (Shape sh, Elt e) => Native -> UID -> Gamma aenv -> IRFun2 Native aenv (e -> e -> e) -> IRExp Native aenv e -> IRDelayed Native aenv (Array (sh :. Int) e) -> CodeGen (IROpenAcc Native aenv (Array (sh :. Int) e)) scanl' :: (Shape sh, Elt e) => Native -> UID -> Gamma aenv -> IRFun2 Native aenv (e -> e -> e) -> IRExp Native aenv e -> IRDelayed Native aenv (Array (sh :. Int) e) -> CodeGen (IROpenAcc Native aenv (Array (sh :. 
Int) e, Array sh e)) scanl1 :: (Shape sh, Elt e) => Native -> UID -> Gamma aenv -> IRFun2 Native aenv (e -> e -> e) -> IRDelayed Native aenv (Array (sh :. Int) e) -> CodeGen (IROpenAcc Native aenv (Array (sh :. Int) e)) scanr :: (Shape sh, Elt e) => Native -> UID -> Gamma aenv -> IRFun2 Native aenv (e -> e -> e) -> IRExp Native aenv e -> IRDelayed Native aenv (Array (sh :. Int) e) -> CodeGen (IROpenAcc Native aenv (Array (sh :. Int) e)) scanr' :: (Shape sh, Elt e) => Native -> UID -> Gamma aenv -> IRFun2 Native aenv (e -> e -> e) -> IRExp Native aenv e -> IRDelayed Native aenv (Array (sh :. Int) e) -> CodeGen (IROpenAcc Native aenv (Array (sh :. Int) e, Array sh e)) scanr1 :: (Shape sh, Elt e) => Native -> UID -> Gamma aenv -> IRFun2 Native aenv (e -> e -> e) -> IRDelayed Native aenv (Array (sh :. Int) e) -> CodeGen (IROpenAcc Native aenv (Array (sh :. Int) e)) permute :: (Shape sh, Shape sh', Elt e) => Native -> UID -> Gamma aenv -> IRPermuteFun Native aenv (e -> e -> e) -> IRFun1 Native aenv (sh -> sh') -> IRDelayed Native aenv (Array sh e) -> CodeGen (IROpenAcc Native aenv (Array sh' e)) backpermute :: (Shape sh, Shape sh', Elt e) => Native -> UID -> Gamma aenv -> IRFun1 Native aenv (sh' -> sh) -> IRDelayed Native aenv (Array sh e) -> CodeGen (IROpenAcc Native aenv (Array sh' e)) stencil :: (Stencil sh a stencil, Elt b) => Native -> UID -> Gamma aenv -> IRFun1 Native aenv (stencil -> b) -> IRBoundary Native aenv (Array sh a) -> IRDelayed Native aenv (Array sh a) -> CodeGen (IROpenAcc Native aenv (Array sh b)) stencil2 :: (Stencil sh a stencil1, Stencil sh b stencil2, Elt c) => Native -> UID -> Gamma aenv -> IRFun2 Native aenv (stencil1 -> stencil2 -> c) -> IRBoundary Native aenv (Array sh a) -> IRDelayed Native aenv (Array sh a) -> IRBoundary Native aenv (Array sh b) -> IRDelayed Native aenv (Array sh b) -> CodeGen (IROpenAcc Native aenv (Array sh c))  | |
| Persistent Native | |
Methods  | |
| Embed Native | |
| Execute Native | |
Methods map :: (Shape sh, Elt b) => ExecutableR Native -> Gamma aenv -> AvalR Native aenv -> StreamR Native -> sh -> LLVM Native (Array sh b) generate :: (Shape sh, Elt e) => ExecutableR Native -> Gamma aenv -> AvalR Native aenv -> StreamR Native -> sh -> LLVM Native (Array sh e) transform :: (Shape sh, Elt e) => ExecutableR Native -> Gamma aenv -> AvalR Native aenv -> StreamR Native -> sh -> LLVM Native (Array sh e) backpermute :: (Shape sh, Elt e) => ExecutableR Native -> Gamma aenv -> AvalR Native aenv -> StreamR Native -> sh -> LLVM Native (Array sh e) fold :: (Shape sh, Elt e) => ExecutableR Native -> Gamma aenv -> AvalR Native aenv -> StreamR Native -> (sh :. Int) -> LLVM Native (Array sh e) fold1 :: (Shape sh, Elt e) => ExecutableR Native -> Gamma aenv -> AvalR Native aenv -> StreamR Native -> (sh :. Int) -> LLVM Native (Array sh e) foldSeg :: (Shape sh, Elt e) => ExecutableR Native -> Gamma aenv -> AvalR Native aenv -> StreamR Native -> (sh :. Int) -> DIM1 -> LLVM Native (Array (sh :. Int) e) fold1Seg :: (Shape sh, Elt e) => ExecutableR Native -> Gamma aenv -> AvalR Native aenv -> StreamR Native -> (sh :. Int) -> DIM1 -> LLVM Native (Array (sh :. Int) e) scanl :: (Shape sh, Elt e) => ExecutableR Native -> Gamma aenv -> AvalR Native aenv -> StreamR Native -> (sh :. Int) -> LLVM Native (Array (sh :. Int) e) scanl1 :: (Shape sh, Elt e) => ExecutableR Native -> Gamma aenv -> AvalR Native aenv -> StreamR Native -> (sh :. Int) -> LLVM Native (Array (sh :. Int) e) scanl' :: (Shape sh, Elt e) => ExecutableR Native -> Gamma aenv -> AvalR Native aenv -> StreamR Native -> (sh :. Int) -> LLVM Native (Array (sh :. Int) e, Array sh e) scanr :: (Shape sh, Elt e) => ExecutableR Native -> Gamma aenv -> AvalR Native aenv -> StreamR Native -> (sh :. Int) -> LLVM Native (Array (sh :. Int) e) scanr1 :: (Shape sh, Elt e) => ExecutableR Native -> Gamma aenv -> AvalR Native aenv -> StreamR Native -> (sh :. Int) -> LLVM Native (Array (sh :. 
Int) e) scanr' :: (Shape sh, Elt e) => ExecutableR Native -> Gamma aenv -> AvalR Native aenv -> StreamR Native -> (sh :. Int) -> LLVM Native (Array (sh :. Int) e, Array sh e) permute :: (Shape sh, Shape sh', Elt e) => ExecutableR Native -> Gamma aenv -> AvalR Native aenv -> StreamR Native -> Bool -> sh -> Array sh' e -> LLVM Native (Array sh' e) stencil1 :: (Shape sh, Elt e) => ExecutableR Native -> Gamma aenv -> AvalR Native aenv -> StreamR Native -> sh -> LLVM Native (Array sh e) stencil2 :: (Shape sh, Elt e) => ExecutableR Native -> Gamma aenv -> AvalR Native aenv -> StreamR Native -> sh -> sh -> LLVM Native (Array sh e) aforeign :: (Arrays as, Arrays bs) => String -> (StreamR Native -> as -> LLVM Native bs) -> StreamR Native -> as -> LLVM Native bs  | |
| Link Native | |
| Compile Native | |
| Foreign Native | |
| Intrinsic Native | |
Methods intrinsicForTarget :: Native -> HashMap ShortByteString Label  | |
| Target Native Source # | |
Methods  | |
| Remote Native | Data instance for arrays in the native backend. We assume a shared-memory machine, and just manipulate the underlying Haskell array directly.  | 
Methods allocateRemote :: (Shape sh, Elt e) => sh -> LLVM Native (Array sh e) useRemoteR :: (ArrayElt e, ArrayPtrs e ~ Ptr a, Storable a, Typeable a, Typeable e) => Int -> Maybe (StreamR Native) -> ArrayData e -> LLVM Native () copyToRemoteR :: (ArrayElt e, ArrayPtrs e ~ Ptr a, Storable a, Typeable a, Typeable e) => Int -> Int -> Maybe (StreamR Native) -> ArrayData e -> LLVM Native () copyToHostR :: (ArrayElt e, ArrayPtrs e ~ Ptr a, Storable a, Typeable a, Typeable e) => Int -> Int -> Maybe (StreamR Native) -> ArrayData e -> LLVM Native () copyToPeerR :: (ArrayElt e, ArrayPtrs e ~ Ptr a, Storable a, Typeable a, Typeable e) => Int -> Int -> Native -> Maybe (StreamR Native) -> ArrayData e -> LLVM Native () indexRemote :: Array sh e -> Int -> LLVM Native e  | |
| Async Native | |
| Marshalable Native Int | |
| ArrayElt e => Marshalable Native (ArrayData e) | |
| data ExecutableR Native | |
| data ObjectR Native | |
| type ArgR Native | |
| type EventR Native | |
type EventR Native = ()  | |
| type StreamR Native | |
type StreamR Native = ()  | |
| data KernelMetadata Native | |
type Strategy = Gang -> Executable Source #
The strategy for balancing work amongst the available worker threads.
createTarget :: [Int] -> Strategy -> IO Native Source #

Arguments
| :: [Int] | CPU IDs to launch worker threads on  | 
| -> Strategy | Strategy to balance parallel workloads  | 
| -> IO Native | 
Create a Native execution target by spawning a worker thread on each of the given capabilities, and using the given strategy to load balance the workers when executing parallel operations.
balancedParIO :: Int -> Strategy Source #

Execute a computation where threads use work stealing (based on lazy splitting of work-stealing queues and exponential backoff) in order to automatically balance the workload amongst themselves.
unbalancedParIO :: Strategy Source #
Execute a computation without load balancing. Each thread computes an equally sized chunk of the input. No work stealing occurs.