Commit b7ea071c by Fakher F. Assaad

Updated Build system and documentation

1 parent a19f5d6b
@@ -10,5 +10,5 @@ $(TARGET): $(OBJS)
 	$(FC) $(LF) -o $(TARGET) $(OBJS) $(LIBS)
 clean:
-	rm $(OBJS)
+	rm -f $(OBJS)
@@ -10,5 +10,5 @@ $(TARGET): $(OBJS)
 	$(FC) $(LF) -o $(TARGET) $(OBJS) $(LIBS)
 clean:
-	rm $(OBJS)
+	rm -f $(OBJS)
@@ -10,5 +10,5 @@ $(TARGET): $(OBJS)
 	$(FC) $(LF) -o $(TARGET) $(OBJS) $(LIBS)
 clean:
-	rm $(OBJS)
+	rm -f $(OBJS)
@@ -15,4 +15,4 @@ clean:
 	(make -f Compile_cov clean );\
 	(make -f Compile_scal clean );\
 	(make -f Compile_eq clean );\
-	rm *.mod *~ \#* *.out
+	rm -f *.mod *~ \#* *.out
#Node real time speedup
1 156.789 28
2 79.2239 55.4137324721454
4 39.7976 110.31047098317485
8 19.93456 220.22517677841896
16 9.9785 439.95510347246574
32 5.028 873.1288782816229
64 2.506 1751.832402234637
#OMP real-time speedup (pinned)
1 41487.85 1
2 24350.843 1.703754157504937
4 14902.294 2.7839908406048086
7 8567.924 4.842228992694146
14 4244.074 9.775477524661445
28 3827.053 10.84067819285492
#L Realtime per bin DOF
4 46.5952 256
5 157.7983 400
6 517.6544 576
7 1350.915 784
8 3214.6523 1024
9 5929.47 1296
% Copyright (c) 2016 The ALF project.
% This is a part of the ALF project documentation.
% The ALF project documentation by the ALF contributors is licensed
% under a Creative Commons Attribution-ShareAlike 4.0 International License.
% For the licensing details of the documentation see license.CCBYSA.
% !TEX root = Doc.tex
%-------------------------------------------------------------------------------------
\section*{Acknowledgments}
%-------------------------------------------------------------------------------------
We are very grateful to S. Beyl, M. Hohenadler, F. Parisen Toldin, M. Raczkowski, J. Schwab, T. Sato, Z. Wang and M. Weber for their constant support during the development of this project. FFA would also like to thank T.~Lang and Z.~Y.~Meng for their work on the development of the auxiliary field code, as well as T.~Grover.
MB thanks the Bavarian Competence Network for Technical and Scientific High Performance Computing (KONWIHR) for financial support. FG and JH thank the SFB-1170 for financial support under projects Z03 and C01. FFA thanks the DFG-funded FOR1807 and FOR1346 for partial financial support.
Part of the optimization of the code was carried out during the Porting and Tuning Workshop 2016 offered by the Forschungszentrum J\"ulich.
Calculations to extensively test this package were carried out both on SuperMUC at the Leibniz Supercomputing Centre and on JURECA \cite{Jureca16} at the J\"ulich Supercomputing Centre. We thank both institutions for generous allocation of computing time.
%The authors gratefully acknowledge the computing time granted by the John von Neumann Institute for Computing (NIC) and provided on the supercomputer JURECA \cite{Jureca16} at Jülich Supercomputing Centre (JSC). The authors gratefully acknowledge the Gauss Centre for Supercomputing e.V. (www.gauss-centre.eu) for funding this project by providing computing time on the GCS Supercomputer SuperMUC at the Leibniz Supercomputing Centre (LRZ, www.lrz.de).
% Copyright (c) 2016 The ALF project.
% This is a part of the ALF project documentation.
% The ALF project documentation by the ALF contributors is licensed
% under a Creative Commons Attribution-ShareAlike 4.0 International License.
% For the licensing details of the documentation see license.CCBYSA.
% !TEX root = Doc.tex
%-------------------------------------------------------------------------------------
\subsection{Analysis programs}\label{sec:analysis}
%-------------------------------------------------------------------------------------
%
\begin{table}[h]
\begin{tabular}{@{} l l @{}}\toprule
Program & Description \\\midrule
\texttt{cov\_scal.f90} & In combination with the script \texttt{analysis.sh}, the bin files with suffix \texttt{\_scal} are read in, \\
& and the corresponding files with suffix \texttt{\_scalJ} are produced. They contain the result \\
& of the Jackknife rebinning analysis (see Sec.~\ref{sec:sampling}). \\
\texttt{cov\_eq.f90} & In combination with the script \texttt{analysis.sh}, the bin files with suffix \texttt{\_eq} are read in, \\
& and the corresponding files with suffix \texttt{\_eqJR} and \texttt{\_eqJK} are produced. They correspond \\
& to correlation functions in real and Fourier space, respectively. \\
\texttt{cov\_tau.f90} & In combination with the script \texttt{analysis.sh}, the bin files \texttt{X\_tau} are read in, \\
& and the directories \texttt{X\_kx\_ky} are produced for all \texttt{kx} and \texttt{ky} greater than or equal to zero. \\
& Here \texttt{X} is a placeholder for \texttt{Green}, \texttt{SpinXY}, etc., as specified in \texttt{Alloc\_obs(Ltau)} \\
& (see Sec.~\ref{Alloc_obs_sec}). Each directory contains a file \texttt{g\_kx\_ky} containing the \\
& time displaced correlation function traced over the orbitals. It also contains the \\
& covariance matrix if \texttt{N\_cov} is set to unity in the parameter file (see Sec.~\ref{sec:input}). \\
& Equally, a directory \texttt{X\_R0} for the local time displaced correlation function is generated. \\\bottomrule
\end{tabular}
\caption{ Overview of analysis programs that are called within the script \texttt{analysis.sh}. \label{table:analysis_programs}}
\end{table}
%
Here we briefly discuss the analysis programs which read in bins and carry out the error analysis. (See Sec.~\ref{sec:sampling} for a more detailed discussion.)
Error analysis is based on the central limit theorem, which requires bins to be statistically independent, and also the existence of a well-defined variance for the observable under consideration.
The former will be the case if bins are longer than the autocorrelation time. The latter has to be checked by the user. In the parameter file listed in Sec.~\ref{sec:input}, the user can specify how many initial bins should be omitted (variable \texttt{n\_skip}).
This number should be comparable to the autocorrelation time.
The rebinning variable \texttt{N\_rebin} will merge \texttt{N\_rebin} bins into a single new bin.
If the autocorrelation time is smaller than the effective bin size, the error should become independent of the bin size and thereby of the variable \texttt{N\_rebin}.
Our analysis is based on the Jackknife resampling.
As listed in Table \ref{table:analysis_programs} we provide three analysis programs to account for the three observable types. The programs can be found in the directory \texttt{Analysis} and are executed by running the bash shell script
\texttt{analysis.sh}.
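In practice, a post-processing session may then look as follows (a sketch; it assumes that the \texttt{Analysis} programs have been compiled and that the Monte Carlo bins reside in the current directory):
\begin{verbatim}
./analysis.sh
less Ener_scalJ
\end{verbatim}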
%
\begin{table}[h]
\begin{tabular}{@{} l l @{}}\toprule
File & Description \\\midrule
\texttt{parameters} & Also contains the variables for the error analysis:\\
& \texttt{n\_skip}, \texttt{N\_rebin} and \texttt{N\_Cov} (see Sec.~\ref{sec:input}) \\
\texttt{X\_scal}, \texttt{Y\_eq}, \texttt{Y\_tau} & Monte Carlo bins (see Table \ref{table:output}) \\\bottomrule
\end{tabular}
\caption{Standard input files for the error analysis. \label{table:analysis_input}}
\end{table}
%
\begin{table}[h]
\begin{tabular}{@{} l l @{}}\toprule
File & Description \\\midrule
\texttt{X\_scalJ} & Jackknife mean and error of \texttt{X}, where \texttt{X} stands for \texttt{Kin, Pot, Part}, and \texttt{Ener}.\\
\texttt{Y\_eqJR} and \texttt{Y\_eqJK} & Jackknife mean and error of \texttt{Y}, where \texttt{Y} stands for \texttt{Green, SpinZ, SpinXY}, and \texttt{Den}.\\
& The suffixes \texttt{R} and \texttt{K} refer to real and reciprocal space, respectively.\\
\texttt{Y\_R0/g\_R0} & Time-resolved and spatially local Jackknife mean and error of \texttt{Y},\\
& where \texttt{Y} stands for \texttt{Green, SpinZ, SpinXY}, and \texttt{Den}.\\
\texttt{Y\_kx\_ky/g\_kx\_ky} & Time-resolved and $\vec{k}$-dependent Jackknife mean and error of \texttt{Y},\\
& where \texttt{Y} stands for \texttt{Green, SpinZ, SpinXY}, and \texttt{Den}.\\\bottomrule
\end{tabular}
\caption{ Standard output files of the error analysis. \label{table:analysis_output}}
\end{table}
%
In the following, we describe the formatting of the output files mentioned in Table \ref{table:analysis_output}.
\begin{itemize}
\item For the scalar quantities \texttt{X}, the output files \texttt{X\_scalJ} have the following formatting (see also the example after this list):
\begin{alltt}
Effective number of bins, and bins: <N_bin - n_skip> <N_bin>
OBS : 1 <mean(X)> <error(X)>
OBS : 2 <mean(sign)> <error(sign)>
\end{alltt}
\item For the equal time correlation functions \texttt{Y}, the formatting of the output files \texttt{Y\_eqJR} and \texttt{Y\_eqJK} follows this structure:
\begin{alltt}
do i = 1, N_unit_cell
<k_x(i)> <k_y(i)>
do alpha = 1, N_orbital
do beta = 1, N_orbital
alpha beta Re<mean(Y)> Re<error(Y)> Im<mean(Y)> Im<error(Y)>
enddo
enddo
enddo
\end{alltt}
where \texttt{Re} and \texttt{Im} refer to the real and imaginary part, respectively.
\item The imaginary-time displaced correlation functions \texttt{Y} are written to the output files \texttt{Y\_R0/g\_R0}, when measured locally in space,
and to the output files \texttt{Y\_kx\_ky/g\_kx\_ky} when they are measured $\vec{k}$-resolved.
Both output files have the following formatting:
\begin{alltt}
do i = 0, Ltau
tau(i) <mean( Tr[Y] )> <error( Tr[Y])>
enddo
\end{alltt}
where \texttt{Tr} corresponds to the trace over the orbital degrees of freedom.
\end{itemize}
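As a simple illustration of the \texttt{X\_scalJ} layout shown above, the mean and error of the total energy can be extracted with standard shell tools, e.g., with the hypothetical one-liner
\begin{verbatim}
grep "OBS : 1" Ener_scalJ
\end{verbatim}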
% Copyright (c) 2016 The ALF project.
% This is a part of the ALF project documentation.
% The ALF project documentation by the ALF contributors is licensed
% under a Creative Commons Attribution-ShareAlike 4.0 International License.
% For the licensing details of the documentation see license.CCBYSA.
In its present form, the auxiliary field QMC code of the ALF project allows one to simulate a large class of non-trivial models, both efficiently and at minimal programming cost. There are many possible extensions that deserve to be considered in future releases. The model Hamiltonians we have presented so far are independent of imaginary time. This can easily be generalized to imaginary-time dependent model Hamiltonians, thus allowing one, for example, to access entanglement properties of interacting fermionic systems \cite{Broecker14,Assaad14,Assaad13a,Assaad15}. Generalizations to include global moves are equally desirable. This is a prerequisite for experimenting with recent ideas on self-learning algorithms \cite{Xu16a}, so as to possibly avoid the issue of critical slowing down. At present, the QMC code of this package is restricted to discrete HS fields, such that implementations of the long-range Coulomb repulsion -- as introduced in \cite{Hohenadler14,Ulybyshev2013,Brower12} -- are not yet included. Extensions to continuous HS fields are certainly possible but require an efficient upgrading scheme. Finally, an implementation of the ground state projective QMC method is equally desirable.
% Copyright (c) 2016 The ALF project.
% This is a part of the ALF project documentation.
% The ALF project documentation by the ALF contributors is licensed
% under a Creative Commons Attribution-ShareAlike 4.0 International License.
% For the licensing details of the documentation see license.CCBYSA.
% !TEX root = Doc.tex
%-------------------------------------------------------------------------------------
\section*{License}
%-------------------------------------------------------------------------------------
The ALF code is provided as open source software such that it is available to all, and we hope that it
will be useful. If you benefit from this code, we ask that you acknowledge the ALF collaboration as mentioned on our
homepage \url{alf.physik.uni-wuerzburg.de}. The git repository at \url{alf.physik.uni-wuerzburg.de} gives us the tools to
create a small but vibrant community around the code and provides a suitable entry point for future contributors and developments.
The homepage is also the place where the original source files can be found.
With the coming public release it was necessary to add copyright headers to our source files.
%and to think about the
%social contract that comes into existence between us, our users and our software and therefore the question was on
%the table of how to make those ideas part of our licensing scheme.
The Creative Commons licenses are a good way to share our documentation, and they are also well
accepted by publishers. Therefore, this documentation is licensed to you under a CC-BY-SA license.
This means you can share and redistribute it as long as you cite the original source and
license your changes under the same license. The details can be found in the file license.CCBYSA that you should have received with this documentation.
The source code itself is licensed under a GPL license to keep the source, as well as any future work, in the community.
To express our desire for proper attribution, we decided to make this a visible part of the license.
To that end, we have exercised the rights of Section 7 of GPL version 3 and have amended
the license terms with an additional paragraph that expresses our wish that authors who have benefited from this code
consider giving back a citation as specified on \url{alf.physik.uni-wuerzburg.de}.
This is not meant to restrict your freedom of use, but is something that we strongly expect as good scientific conduct.
The original GPL license can be found in the file license.GPL and the additional terms in license.additional.
As a service to our users, the ALF code contains parts of the LAPACK implementation, version 3.6.1, from \url{http://www.netlib.org/lapack}.
LAPACK is licensed under the modified BSD license, whose full text can be found in license.BSD.\\
With that being said, we hope that the ALF code will prove to be a suitable and high-performance tool that enables
you to perform quantum Monte Carlo studies of solid state models of unprecedented complexity.\\
\\
The ALF project's contributors.\\
%-------------------------------------------------------------------------------------
\subsection*{COPYRIGHT}
%-------------------------------------------------------------------------------------
Copyright \textcopyright ~2016, The \textit{ALF} Project.\\
The ALF Project Documentation
is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
You are free to share and benefit from this documentation as long as this license is preserved
and proper attribution to the authors is given. For details see the ALF project
homepage \url{alf.physik.uni-wuerzburg.de} and the file \texttt{license.CCBYSA}.
% Copyright (c) 2016 The ALF project.
% This is a part of the ALF project documentation.
% The ALF project documentation by the ALF contributors is licensed
% under a Creative Commons Attribution-ShareAlike 4.0 International License.
% For the licensing details of the documentation see license.CCBYSA.
% !TEX root = Doc.tex
%-------------------------------------------------------------------------------------
\subsection{Other models}
%-------------------------------------------------------------------------------------
\label{sec:other_models}
The aim of this section is to briefly mention a small selection of other models that can be studied using the QMC code of the ALF project.
%-------------------------------------------------------------------------------------
\subsubsection{Kondo lattice model}
%-------------------------------------------------------------------------------------
Simulating the Kondo lattice with the QMC code of the ALF project requires rewriting the model along the lines of Refs.~\cite{Assaad99a,Capponi00,Beach04}.
Adopting the notation of these articles, the Hamiltonian that one will simulate reads:
\begin{equation}\label{eqn:ham_kondo}
\hat{\mathcal{H}} =
\underbrace{-t \sum_{\langle \vec{i},\vec{j} \rangle,\sigma} \left( \hat{c}_{\vec{i},\sigma}^{\dagger} \hat{c}_{\vec{j},\sigma}^{\phantom\dagger} + \text{H.c.} \right) }_{\equiv \hat{\mathcal{H}}_t} - \frac{J}{4}
\sum_{\vec{i}} \left( \sum_{\sigma} \hat{c}_{\vec{i},\sigma}^{\dagger} \hat{f}_{\vec{i},\sigma}^{\phantom\dagger} +
\hat{f}_{\vec{i},\sigma}^{\dagger} \hat{c}_{\vec{i},\sigma}^{\phantom\dagger} \right)^{2} +
\underbrace{\frac{U}{2} \sum_{\vec{i}} \left( \hat{n}^{f}_{\vec{i}} -1 \right)^2}_{\equiv \hat{\mathcal{H}}_U}.
\end{equation}
This form is included in the general Hamiltonian (\ref{eqn:general_ham}) such that the above Hamiltonian can be implemented in our program package.
The relation to the Kondo lattice model follows from expanding the square of the hybridization to obtain:
\begin{equation}
\hat{\mathcal{H}} =\hat{\mathcal{H}}_t
+ J \sum_{\vec{i}} \left( \hat{\vec{S}}^{c}_{\vec{i}} \cdot \hat{\vec{S}}^{f}_{\vec{i}} + \hat{\eta}^{z,c}_{\vec{i}} \cdot \hat{\eta}^{z,f}_{\vec{i}}
- \hat{\eta}^{x,c}_{\vec{i}} \cdot \hat{\eta}^{x,f}_{\vec{i}} - \hat{\eta}^{y,c}_{\vec{i}} \cdot \hat{\eta}^{y,f}_{\vec{i}} \right)
+ \hat{\mathcal{H}}_U,
\end{equation}
where the $\eta$-operators are related to the spin operators via a particle-hole transformation in one spin sector:
\begin{equation}
\hat{\eta}^{\alpha}_{\vec{i}} = \hat{P}^{-1} \hat{S}^{\alpha}_{\vec{i}} \hat{P} \; \text{ with } \;
\hat{P}^{-1} \hat{c}^{\phantom\dagger}_{\vec{i},\uparrow} \hat{P} = (-1)^{i_x+i_y} \hat{c}^{\dagger}_{\vec{i},\uparrow} \; \text{ and } \;
\hat{P}^{-1} \hat{c}^{\phantom\dagger}_{\vec{i},\downarrow} \hat{P} = \hat{c}^{\phantom\dagger}_{\vec{i},\downarrow}
\end{equation}
Since the $\hat{\eta}^{f} $- and $ \hat{S}^{f} $-operators do not alter the parity [$(-1)^{\hat{n}^{f}_{\vec{i}}}$ ] of the $f$-sites,
\begin{equation}
\left[ \hat{\mathcal{H}}, \hat{\mathcal{H}}_U \right] = 0.
\end{equation}
Thereby, and for positive values of $U$, doubly occupied or empty $f$-sites -- corresponding to even parity sites -- are suppressed by a Boltzmann factor
$e^{-\beta U/2}$ in comparison to odd parity sites. Choosing $\beta U$ adequately essentially allows one to restrict the Hilbert space to odd parity $f$-sites.
In this Hilbert space $\hat{\eta}^{x,f} = \hat{\eta}^{y,f} = \hat{\eta}^{z,f} =0$ such that the Hamiltonian (\ref{eqn:ham_kondo}) reduces to the Kondo lattice model.
%-------------------------------------------------------------------------------------
\subsubsection{$SU(N)$ Hubbard-Heisenberg models}
%-------------------------------------------------------------------------------------
$SU(2N)$ Hubbard-Heisenberg \cite{Assaad04,Lang13} models can be written as:
\begin{equation}
\hat{\mathcal{H}} =
\underbrace{ - t \sum_{ \langle \vec{i},\vec{j} \rangle } \left( \vec{\hat{c}}^{\dagger}_{\vec{i}} \vec{\hat{c}}^{\phantom{\dagger}}_{\vec{j}} + \text{H.c.} \right) }_{\equiv \hat{\mathcal{H}}_t} \; \;
\underbrace{ -\frac{J}{2 N} \sum_{ \langle \vec{i},\vec{j} \rangle } \left(
\hat{D}^{\dagger}_{ \vec{i},\vec{j} }\hat{D}^{\phantom\dagger}_{ \vec{i},\vec{j}} +
\hat{D}^{\phantom\dagger}_{ \vec{i},\vec{j} } \hat{D}^{\dagger}_{ \vec{i},\vec{j} } \right) }_{\equiv\hat{\mathcal{H}}_J}
+
\underbrace{\frac{U}{N} \sum_{\vec{i}} \left(
\vec{\hat{c}}^{\dagger}_{\vec{i}} \vec{\hat{c}}^{\phantom\dagger}_{\vec{i}} - {\frac{N}{2} } \right)^2}_{\equiv \hat{\mathcal{H}}_U}
\end{equation}
Here,
$ \vec{\hat{c}}^{\dagger}_{\vec{i}} =
(\hat{c}^{\dagger}_{\vec{i},1}, \hat{c}^{\dagger}_{\vec{i},2}, \cdots, \hat{c}^{\dagger}_{\vec{i}, N } ) $ is an
$N$-flavored spinor, and $ \hat{D}_{ \vec{i},\vec{j}} = \vec{\hat{c}}^{\dagger}_{\vec{i}}
\vec{\hat{c}}_{\vec{j}} $.
To use the QMC code of the ALF project to simulate this model, one rewrites the $J$-term as a difference of perfect squares,
\begin{equation}
\hat{\mathcal{H}}_J = -\frac{J}{4 N} \sum_{ \langle \vec{i}, \vec{j} \rangle } \left[
\left(\hat{D}^{\dagger}_{ \vec{i},\vec{j} } + \hat{D}^{\phantom\dagger}_{ \vec{i},\vec{j} } \right)^2 -
\left(\hat{D}^{\dagger}_{ \vec{i},\vec{j} } - \hat{D}^{\phantom\dagger}_{ \vec{i},\vec{j} } \right)^2 \right],
\end{equation}
so as to manifestly bring it into the form of the general Hamiltonian (\ref{eqn:general_ham}).
It is amusing to note that if one sets the hopping $t=0$, charge fluctuations are suppressed by the Boltzmann factor $e^{-\beta \frac{U}{N} \left( \vec{\hat{c}}^{\dagger}_{\vec{i}} \vec{\hat{c}}^{\phantom\dagger}_{\vec{i}} - \frac{N}{2} \right)^2 }$,
since in this case $ \left[ \hat{\mathcal{H}}_J, \hat{\mathcal{H}}_U \right] = 0 $.
This provides a route to use the auxiliary field QMC algorithm to simulate -- free of the sign problem -- $SU(2N)$ Heisenberg models in the self-adjoint antisymmetric representation.\footnote{This corresponds to a Young tableau with a single column and $N/2$ rows.}
For odd values of $N$, recent progress in our understanding of the origins of the sign problem \cite{Wei16} allows us to simulate a set of non-trivial Hamiltonians \cite{Li15,Assaad16} without encountering the sign problem.
% Copyright (c) 2016 The ALF project.
% This is a part of the ALF project documentation.
% The ALF project documentation by the ALF contributors is licensed
% under a Creative Commons Attribution-ShareAlike 4.0 International License.
% For the licensing details of the documentation see license.CCBYSA.
% !TEX root = Doc.tex
%-------------------------------------------------------------------------------------
\subsection{Performance, memory requirements and parallelization}
%-------------------------------------------------------------------------------------
As mentioned in the introduction, the auxiliary field QMC algorithm scales linearly in the inverse temperature $\beta$ and cubically in the volume $N_{\text{dim}}$. Using fast updates, a single spin flip requires $(N_{\text{dim}})^2$ operations to update the Green function upon acceptance. As there are $L_{\text{Trotter}}\times N_{\text{dim}}$ spins to be visited, the total computational cost of one sweep is of the order of $\beta (N_{\text{dim}})^3$. This operation dominates the performance; see Fig.~\ref{fig_scaling_size}. A profiling analysis of our code shows that 80-90\% of the CPU time is spent in ZGEMM calls of the BLAS library provided in the MKL package by Intel. Consequently, the single-core performance is next to optimal.
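To make the $(N_{\text{dim}})^2$ count concrete: an accepted flip amounts to a rank-one modification of the Green function. The following fragment is only a schematic illustration with made-up variable names (the conventions and prefactors of the actual code differ); it shows such an update through the BLAS call \texttt{zgeru}:
\begin{verbatim}
program rank1_update
  implicit none
  integer, parameter :: ndim = 4
  complex(8) :: G(ndim,ndim), x(ndim), y(ndim), alpha

  G = (0.d0,0.d0); x = (1.d0,0.d0); y = (1.d0,0.d0)  ! placeholder data
  alpha = (0.5d0,0.d0)                               ! ratio-dependent prefactor
  ! An accepted spin flip changes G by a rank-one term,
  ! G <- G + alpha * x * y^T, at a cost of O(ndim^2) operations:
  call zgeru(ndim, ndim, alpha, x, 1, y, 1, G, ndim)
  print *, G(1,1)
end program rank1_update
\end{verbatim}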
\begin{figure}[h]
\begin{center}
\includegraphics[scale=.8]{Figures/Size_scaling_ALF_2.pdf}
\end{center}
\caption{\label{fig_scaling_size}Volume scaling behavior of the auxiliary field QMC code of the ALF project on SuperMUC (phase 2/Haswell nodes) at the LRZ in Munich. The number of sites $N_{\text{dim}}$ corresponds to the system volume.
The plot confirms that the leading scaling order is due to matrix multiplications such that the runtime is dominated by calls to ZGEMM. }
\end{figure}
For the implementation that scales linearly in $\beta$, one has to store $L_{\text{Trotter}}/\texttt{NWrap}$ intermediate propagation matrices of dimension $N_{\text{dim}}\times N_{\text{dim}}$. For large lattices and/or low temperatures, this dominates the total memory requirements, which can exceed 2~GB for a sequential version.
At the heart of Monte Carlo schemes lies a random walk through the given configuration space. This is easily parallelized via MPI by associating one random walker with each MPI task. For each task, we start from a random configuration and have to invest the autocorrelation time $T_\mathrm{auto}$ to produce an equilibrated configuration.
Additionally, we can profit from an OpenMP-parallelized version of the BLAS/LAPACK library for an additional speedup, which also reduces the total equilibration overhead $N_\text{MPI}\times T_\text{auto} / N_\text{OMP}$, where $N_{\text{MPI}}$ is the number of MPI tasks and $N_{\text{OMP}}$ the number of OpenMP threads per task.
For a given number of independent measurements $N_\text{meas}$, we therefore need a wall-clock time given by
\begin{equation}\label{eqn:scaling}
T = \frac{T_\text{auto}}{N_\text{OMP}} \left( 1 + \frac{N_\text{meas}}{N_\text{MPI}} \right) \,.
\end{equation}
As we typically have $ N_\text{meas}/N_\text{MPI} \gg 1 $,
the speedup is expected to be almost perfect, in accordance with
the performance test results for the auxiliary field
QMC code on SuperMUC (see Fig.~\ref{fig_scaling} (left)).
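As an illustration with hypothetical numbers: for $N_\text{meas} = 1000$ independent measurements distributed over $N_\text{MPI} = 100$ tasks, with $N_\text{OMP} = 1$, Eq.~(\ref{eqn:scaling}) gives $T = T_\text{auto} \, (1 + 10) = 11 \, T_\text{auto}$, i.e., the equilibration overhead accounts for less than $10\%$ of the total wall-clock time.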
For many problem sizes, 2~GB of memory per MPI task (random walker) suffices, such that we typically start as many MPI tasks as there are physical cores per node. Due to the large amount of CPU time spent in MKL routines, we do not profit from the hyper-threading option. For large systems, the memory requirement increases; this is tackled by increasing the number of OpenMP threads, which decreases the stress on the memory system and simultaneously reduces the equilibration overhead (see Fig.~\ref{fig_scaling} (right)). For the displayed speedup, it was crucial to pin the MPI tasks as well as the OpenMP threads in a pattern that keeps the threads as compact as possible, in order to profit from a shared cache. This also explains the drop in efficiency from 14 to 28 threads, where the OpenMP threads are spread over both sockets.
We store the field configurations of the random walker as checkpoints, such that a long simulation can be easily split into several short simulations. This procedure allows us to take advantage of chained jobs using the dependency chains provided by the batch system.
\begin{figure}[H]
\begin{center}
\includegraphics[scale=0.6]{Figures/MPI_scaling_ALF_2.pdf}
\includegraphics[scale=0.6]{Figures/OMP_scaling_ALF_2.pdf}
\end{center}
\caption{\label{fig_scaling} MPI (left) and OpenMP (right) scaling behavior of the auxiliary field QMC code of the ALF project on SuperMUC (phase 2/Haswell nodes) at the LRZ in Munich.
The MPI performance data was normalized to 28 cores and was obtained using a problem size of $N_{\text{dim}}=400$. This is a medium to small system size that is the least favorable in terms of MPI synchronization effects.
The OpenMP performance data was obtained using a problem size of $N_{\text{dim}}=1296$. Employing 2 and 4 OpenMP threads introduces some synchronization/management overhead, such that the per-core performance is slightly reduced compared to the single-thread efficiency. Further increasing the number of threads to 7 and 14 keeps the efficiency constant. The drop in performance of the 28-thread configuration is due to the architecture, as the threads are now spread over both sockets of the node. To obtain the above results, it was crucial to pin the processes in a fashion that keeps the OpenMP threads as compact as possible.}
\end{figure}
%Next to the entire computational time is spent in BLAS routines such that the performance of the code will depend on the particular implementation of this library.
%We have found that the code performs well, and that an efficient OpenMP version of the library can be obtained merely by loading the corresponding BLAS and LAPACK routines.
%\mycomment{MB: Do we want to say more about OpenMP here, i.e. that it can be useful when warm-up time is a problem (and getting many CPUs is not).
%In all other cases, the MPI parallelization is always better than the trivial OpenMP parallelization of library algos.}
% Copyright (c) 2016 The ALF project.
% This is a part of the ALF project documentation.
% The ALF project documentation by the ALF contributors is licensed
% under a Creative Commons Attribution-ShareAlike 4.0 International License.
% For the licensing details of the documentation see license.CCBYSA.
% !TEX root = Doc.tex
%-------------------------------------------------------------------------------------
\subsection{Running the code}\label{sec:running}
%-------------------------------------------------------------------------------------
In this section we describe the steps to compile and run the code and to perform the error analysis of the data.
%-------------------------------------------------------------------------------------
\subsubsection{Compilation}
%-------------------------------------------------------------------------------------
The environment variables are defined in the bash script \texttt{set\_env.sh} as follows:
\lstset{style=bash}
\begin{lstlisting}
# Description of PROGRAMMCONFIGURATION:
# -DMPI selects MPI.
# Setting nothing compiles without mpi.
# -DQRREF selects a reference implementation of the QR decomposition.
# Setting nothing selects system lapack for the QR decomposition.
# -DSTAB1, -DSTAB2 select alternative stabilization schemes.
# Setting nothing selects the default stabilization.
PROGRAMMCONFIGURATION=""
f90="gfortran"
export f90
F90OPTFLAGS="-O3"
export F90OPTFLAGS
FL="-c ${F90OPTFLAGS} ${PROGRAMMCONFIGURATION}"
export FL
DIR=`pwd`
export DIR
Libs=${DIR}"/Libraries/"
export Libs
LIB_BLAS_LAPACK="-llapack -lblas"
export LIB_BLAS_LAPACK
\end{lstlisting}
In the above, the GNU Fortran compiler \texttt{gfortran} is set.\footnote{A known issue with the alternative Intel Fortran compiler \texttt{ifort} is the handling of automatic, temporary arrays
which \texttt{ifort} allocates on the stack. For large system sizes and/or low temperatures this may lead to
a runtime error. One solution is to demand allocation of arrays above a certain size on the heap instead of the stack.
This is accomplished by the \texttt{ifort} compiler flag \texttt{-heap-arrays [n]} where \texttt{[n]} is the minimal size (in kilobytes, for example \texttt{n=1024}) of arrays
that are allocated on the heap.}
The program can be compiled and run either in serial mode (default) or
in parallel (define \texttt{-DMPI}), using the MPI standard for parallelization.
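For example, to build the MPI version one would adapt the corresponding lines of \texttt{set\_env.sh} as follows (a sketch; it assumes that an MPI compiler wrapper such as \texttt{mpif90} is installed on the system):
\lstset{style=bash}
\begin{lstlisting}
PROGRAMMCONFIGURATION="-DMPI"
f90="mpif90"
\end{lstlisting}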
To compile the libraries, the analysis programs and the quantum Monte Carlo program, the following steps should be executed:
\begin{enumerate}
\item Export the environment variables:
\begin{verbatim}
source set_env.sh
\end{verbatim}
\item Compile the libraries and the error analysis routines
\begin{verbatim}
cd Libraries
make
cd ..
cd Analysis
make
cd ..
\end{verbatim}
\item Compile the quantum Monte Carlo code
\begin{verbatim}
cd Prog
make
cd ..
\end{verbatim}
\end{enumerate}
%-------------------------------------------------------------------------------------
\subsubsection{Starting a simulation}
%-------------------------------------------------------------------------------------
To start a simulation from scratch, the following files have to be present: \texttt{parameters} and \texttt{seeds}.
To run a serial simulation, for example using the parameters of one of the Hubbard models described in Sec.~\ref{sec:ex}, issue the command
\begin{verbatim}
./Prog/Examples.out
\end{verbatim}
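If the code was compiled with \texttt{-DMPI}, the executable is instead launched through the MPI starter of the given platform; a hypothetical example with four tasks reads
\begin{verbatim}
mpiexec -n 4 ./Prog/Examples.out
\end{verbatim}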
To restart the code using an existing simulation as a starting point, first run the script \texttt{out\_to\_in.sh} to set
the input configuration files.
%-------------------------------------------------------------------------------------
\subsubsection{Error analysis}
%-------------------------------------------------------------------------------------
To perform an error analysis (based on the jackknife scheme) of the Monte Carlo bins for all observables run the script \texttt{analysis.sh}
(see Sec.~\ref{sec:analysis}).
% !TEX root = doc.tex
% Copyright (c) 2016 The ALF project.
% This is a part of the ALF project documentation.
% The ALF project documentation by the ALF contributors is licensed
% under a Creative Commons Attribution-ShareAlike 4.0 International License.
% For the licensing details of the documentation see license.CCBYSA.
%
%------------------------------------------------------------
\subsection{Monte Carlo sampling}\label{sec:sampling}
%------------------------------------------------------------
%
The default updating scheme consists of local moves which, upon acceptance, change only one of the $L_{\mathrm{Trotter}}(M_I+M_V)$ fields (see Sec.~\ref{sec:updating}).
To generate an independent configuration $C$, one has to visit at least each field once. Our unit, one \textit{sweep}, is defined such that each field is visited twice in a sequential propagation from $\tau = 0$ to $\tau = L_{\text{Trotter}}$ and back. A single sweep will generically not suffice to produce an independent configuration.
% This is however only the lower bound as there can be a region in the spin space where the fields are correlated and it requires a larger or even global move to significantly change the configuration to an independent one. One might imagine a ferromagnet due to spontaneous symmetry breaking. All spins are parallel aligned and, let' say, point upwards. The configuration of only down spins is equally justified, but rotating one to the other requires a global operation. Flipping the spins individually one after another generates intermediate states of relative high energy which corresponds to a low probability in the QMC algorithm.
In fact, the autocorrelation time $T_\mathrm{auto}$ characterizes the time scale required to generate an independent configuration $C$ and hence independent values $\langle\langle\hat{O}\rangle\rangle_C$ of the observable $\hat{O}$.
This has several consequences for the Monte Carlo simulation:
\begin{itemize}
\item First of all, we start from a randomly chosen field configuration, such that one has to invest \textit{at least} one $T_\mathrm{auto}$ to generate relevant, equilibrated configurations before reliable measurements are possible. This phase of the simulation is known as the warm-up. In order to keep the code as flexible as possible (different simulations might have different autocorrelation times), measurements are taken from the very beginning; instead of discarding them at runtime, we provide the parameter \path{n_skip} for the analysis, which ignores the first \path{n_skip} bins.
\item Secondly, our implementation averages over a given number of measurements, set by the variable \texttt{NSWEEPS}, before storing the result -- known as one bin -- on disk. A bin corresponds to \texttt{NSWEEPS} sweeps. The error analysis requires statistically independent bins to generate reliable confidence estimates. If bins are too small (averaged over a period shorter than $T_\mathrm{auto}$), the error bars are typically underestimated. Most of the time, the autocorrelation time is unknown before the simulation is started, and sometimes the compute cluster used does not allow for single runs long enough to generate appropriately sized bins. Therefore, we provide the \path{N_rebin} parameter, which specifies how many bins are combined into a new bin during the error analysis; a schematic implementation of this rebinning, combined with the jackknife analysis, is sketched after this list. In general, one should check that a further increase of the bin size does not change the error estimate (for an explicit example, the reader is referred to the Appendix of Ref.~\cite{Assaad02}).
The \path{N_rebin} variable can be used to control a second issue: the distribution of the Monte Carlo estimates $\langle\langle\hat{O}\rangle\rangle_C$ is unknown, while the result in the form $(\mathrm{mean}\pm \mathrm{error})$ assumes a Gaussian distribution. Luckily, every original distribution with a finite variance turns into a Gaussian one once it is folded often enough (central limit theorem). Due to the internal averaging (folding) within one bin, many observables are already quite Gaussian. Otherwise, one can increase \path{N_rebin} further, even if the bins are already independent~\cite{Bercx17}.
\item The third issue concerns time displaced correlation functions. Even if the configurations are independent, the fields within a configuration are still correlated. Hence, the data for $S_{\alpha,\beta}(\vec{k},\tau)$ (see Sec.~\ref{sec:obs}, Eqn.~\ref{eqn:s}) and $S_{\alpha,\beta}(\vec{k},\tau+\Delta\tau)$ are also correlated. Setting the switch \path{N_Cov = 1} triggers the calculation of the covariance matrix in addition to the usual error analysis. The covariance is defined by
\begin{equation}
\text{COV}_{\tau \tau'}=\frac{1}{N_{\text{bins}}}\left\langle\left(S_{\alpha,\beta}(\vec{k},\tau)-\langle S_{\alpha,\beta}(\vec{k},\tau)\rangle\right)\left(S_{\alpha,\beta}(\vec{k},\tau')-\langle S_{\alpha,\beta}(\vec{k},\tau')\rangle\right)\right\rangle\,.
\end{equation}
An example where this information is necessary is the calculation of mass gaps extracted by fitting the tail of the time displaced correlation function. Omitting the covariance matrix will underestimate the error.
\end{itemize}
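As a schematic illustration of the rebinning and jackknife steps described above, the following self-contained sketch (with made-up placeholder data; it is not part of the package) computes the mean and error of a single scalar observable:
\begin{verbatim}
program jackknife_sketch
  implicit none
  integer, parameter :: n_bins = 100, n_rebin = 5
  integer, parameter :: n_eff = n_bins / n_rebin
  real(8) :: bins(n_bins), reb(n_eff), jack(n_eff), mean, err
  integer :: i

  call random_number(bins)          ! placeholder bin data
  do i = 1, n_eff                   ! merge N_rebin bins into one
     reb(i) = sum(bins((i-1)*n_rebin+1 : i*n_rebin)) / dble(n_rebin)
  end do
  do i = 1, n_eff                   ! jackknife samples: leave one bin out
     jack(i) = (sum(reb) - reb(i)) / dble(n_eff - 1)
  end do
  mean = sum(jack) / dble(n_eff)
  err  = sqrt( dble(n_eff-1)/dble(n_eff) * sum((jack - mean)**2) )
  print *, mean, '+/-', err
end program jackknife_sketch
\end{verbatim}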
% !TEX root = doc.tex
% Copyright (c) 2017 The ALF project.
% This is a part of the ALF project documentation.
% The ALF project documentation by the ALF contributors is licensed
% under a Creative Commons Attribution-ShareAlike 4.0 International License.
% For the licensing details of the documentation see license.CCBYSA.
%
%-----------------------------------------------------------------------------------
\subsection{Stabilization - a peculiarity of the BSS algorithm}\label{sec:stable}
%-----------------------------------------------------------------------------------
%
From \eqref{eqn:partition_2} it can be seen that for the calculation of the Monte Carlo weight
and for the observables a long product of matrix exponentials has to be formed.
On top of that, we need to be able to extract the single-particle Green function for a given flavor index at, say, time slice $\tau = 0$. As mentioned above in Eq.~(\ref{eqn:Green_eq}), this quantity is given by:
\begin{equation}
\bm{G}= \left( \mathds{1} + \prod_{ \tau= 1}^{L_{\text{Trotter}}} \bm{B}_\tau \right)^{-1}.
\end{equation}
To boil this down to more familiar terms from linear algebra, we remark that this problem can be recast as that of solving the linear system
\begin{equation}
(\mathds{1} + \prod_\tau \bm{B}_\tau) x = b.
\end{equation}
The matrices $\bm{B}_\tau \in \mathbb{C}^{n\times n}$ depend on the lattice size as well as on other physical parameters, which can be chosen such that a matrix norm of $\bm{B}_\tau$ can take arbitrarily large or small values.
From standard perturbation theory for linear systems it is known that the computed solution $\tilde{x}$ would
contain a relative error of
\begin{equation}
\frac{|\tilde{x} - x|}{|x|} = \mathcal{O}\left(\epsilon \kappa_p\left(\mathds{1} + \prod_\tau \bm{B}_\tau\right)\right).
\end{equation}
Here, $\epsilon$ denotes the machine precision, which is $2^{-53}$ for IEEE double precision numbers,
and $\kappa_p(\bm{M})$ is the condition number of the matrix $\bm{M}$ with respect to the matrix $p$-norm.
What makes straightforward inversion so ill-suited is the fact that $\prod_\tau \bm{B}_\tau$ contains exponentially large and exponentially small scales, as can be seen in Eq.~\eqref{eqn:partition_2}. Thereby, as a function of increasing inverse temperature,
the condition number grows exponentially, so that the computed solution $\tilde{x}$
will often contain no correct digits at all.
To circumvent this, more sophisticated methods have to be employed. We will first of all assume that the product of \texttt{NWrap} consecutive $\bm{B}$ matrices has an acceptable condition number.
Assuming for simplicity that $L_{\text{Trotter}}$ is a multiple of \texttt{NWrap}, we can write:
\begin{equation}
\bm{G} = \left( \mathds{1} + \prod\limits_{ i = 0}^{L_{\text{Trotter}}/\texttt{NWrap} \, - 1} \underbrace{\prod_{\tau=1}^{\texttt{NWrap}} \bm{B}_{i \cdot \texttt{NWrap}+ \tau} }_{ \equiv \mathcal{\bm{B}}_i}\right)^{-1}.
\end{equation}
Within the auxiliary field QMC implementation of the ALF project, we employ by default
the strategy of forming a product of QR decompositions, which was proven to be weakly backwards stable in \cite{Bai2011}.
The key idea is to efficiently separate the scales of a matrix from its orthogonal part.
This can be achieved using a QR decomposition of the form $\bm{A}_i = \bm{Q}_i \bm{R}_i$. The matrix $\bm{Q}_i$ is unitary, and hence in the usual $2$-norm it holds that $\kappa_2(\bm{Q}_i) = 1$.
To get a handle on the condition number of $\bm{R}_i$ we will form the
diagonal matrix
\begin{equation}
(\bm{D}_i)_{n,n} = |(\bm{R}_i)_{n,n}|
\label{eq:diagnorm}
\end{equation}
and set $\tilde{\bm{R}}_i = \bm{D}_i^{-1} \bm{R}_i$.
This gives the decomposition
\begin{equation}
\bm{A}_i = \bm{Q}_i \bm{D}_i \tilde{\bm{R}}_i.
\end{equation}
$\bm{D}_i$ now contains the row norms of the original $\bm{R}_i$ matrix and hence attempts to separate off the total scales of the problem from $\bm{R}_i$.
This is similar in spirit to so-called matrix equilibration, which tries to improve the condition number of a matrix by suitably chosen column and row scalings.
Due to a theorem by van der Sluis \cite{vanderSluis1969}, we know that the choice in \eqref{eq:diagnorm} is almost optimal among all diagonal matrices $\bm{D}$ from the space $\mathcal{D}$ of diagonal matrices,
in the sense that
\begin{equation*}
\kappa_p((\bm{D}_i)^{-1} \bm{R}_i ) \leq n^{1/p} \min_{\bm{D} \in \mathcal{D}} \kappa_p(\bm{D}^{-1} \bm{R}_i).
\end{equation*}
Now, given an initial decomposition $\bm{A}_{j-1} = \prod_i \mathcal{\bm{B}}_i = \bm{Q}_{j-1} \bm{D}_{j-1} \bm{T}_{j-1}$, the update
$\mathcal{\bm{B}}_j \bm{A}_{j-1}$ is formed in the following three steps:
\begin{enumerate}
\item Form $ \bm{M}_j = (\mathcal{\bm{B}}_j \bm{Q}_{j-1}) \bm{D}_{j-1}$. Note the parentheses.
\item Do a QR decomposition of $\bm{M}_j = \bm{Q}_j \bm{D}_j \bm{R}_j$. This gives the final $\bm{Q}_j$ and $\bm{D}_j$.
\item Form the updated $\bm{T}$ matrices $\bm{T}_j = \bm{R}_j \bm{T}_{j-1}$.
\end{enumerate}
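The following sketch illustrates the basic scale separation $\bm{A}_i = \bm{Q}_i \bm{D}_i \tilde{\bm{R}}_i$ using the LAPACK routines \texttt{zgeqrf} and \texttt{zungqr}. It is only a minimal illustration with hypothetical names, assuming a nonsingular input matrix; the routines packaged with ALF differ:
\begin{verbatim}
subroutine qrd_separate(A, n, Q, D, Rt)
  ! On exit: A = Q * diag(D) * Rt, with unitary Q and |Rt(i,i)| = 1.
  implicit none
  integer, intent(in)       :: n
  complex(8), intent(inout) :: A(n,n)        ! destroyed on exit
  complex(8), intent(out)   :: Q(n,n), Rt(n,n)
  real(8), intent(out)      :: D(n)
  complex(8) :: tau(n), work(4*n)
  integer    :: info, i, j

  call zgeqrf(n, n, A, n, tau, work, 4*n, info)
  Rt = (0.d0, 0.d0)
  do j = 1, n                    ! copy out the upper triangular factor R
     do i = 1, j
        Rt(i,j) = A(i,j)
     end do
  end do
  do i = 1, n
     D(i)    = abs(Rt(i,i))      ! diagonal scaling: D_i = |R_ii|
     Rt(i,:) = Rt(i,:) / D(i)    ! Rtilde = D^{-1} R  (assumes D(i) > 0)
  end do
  Q = A                          ! reflectors as returned by zgeqrf
  call zungqr(n, n, n, Q, n, tau, work, 4*n, info)
end subroutine qrd_separate
\end{verbatim}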
%While this provides provides a stable method to calculate the involved matrix product
%it can be pretty expensive. Therefore the user can specify to skip a certain number of
%QR Decompositions and perform plain multiplications instead. This is specified in the parameters file by the \path{NWrap} parameter.
%\path{NWrap}~=~1 corresponds to always performing QR decompositions whereas larger integers give longer intervals where no QR decomposition will be performed.
The effectiveness of the stabilization \emph{has} to be judged for every simulation from the output file \path{info} (Sec.~\ref{sec:output_prec}). For most simulations there are two values to look out for:
\begin{itemize}
\item \texttt{Precision Green}
\item \texttt{Precision Phase}
\end{itemize}
The Green function as well as the average phase are usually numbers with a magnitude of $\mathcal{O} (1)$.
For that reason we recommend that \path{NWrap} is chosen such that the mean precision is of the order of $10^{-8}$ or better.
We have included typical values of \texttt{Precision Phase} and of the mean and the maximal values of \texttt{Precision Green} in the
discussion of example simulations, see Sec.~\ref{sec:prec_charge} and Sec.~\ref{sec:prec_spin}.
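Both quantities can be conveniently monitored, for instance via
\begin{verbatim}
grep Precision info
\end{verbatim}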
% Copyright (c) 2016 The ALF project.
% This is a part of the ALF project documentation.
% The ALF project documentation by the ALF contributors is licensed
% under a Creative Commons Attribution-ShareAlike 4.0 International License.
% For the licensing details of the documentation see license.CCBYSA.
% !TEX root = Doc.tex
%------------------------------------------------------------
\subsection{Updating schemes}\label{sec:updating}
%------------------------------------------------------------
%
The program allows for different types of updating schemes. Given a configuration $C$ we propose a new one, $C'$, with probability $T_0(C \rightarrow C')$ and accept it according to the Metropolis-Hastings acceptance-rejection probability,
\begin{equation}
P(C \rightarrow C') = \text{min} \left( 1, \frac{T_0(C' \rightarrow C) W(C')}{T_0(C \rightarrow C') W(C)} \right),
\end{equation}
so as to guarantee the stationarity condition. Here, $ W(C) = \left| \Re \left[ e^{-S(C)} \right] \right| $.
\begin{table}[h]
\begin{tabular}{@{} l l l @{}}\toprule
Variable & Type & Description \\\midrule
\texttt{Propose\_S0} & Logical & If true, proposes local moves according to the probability $e^{-S_0}$ \\
% \texttt{Global\_moves} & Logical & If true, allows for global moves. \\
% \texttt{N\_Global } & Integer & Number of global moves per sweep of single spin flips. \\
% \texttt{TEMPERING} & Compiling option & Requires MPI and runs the code in a parallel tempering mode.
\bottomrule
\end{tabular}
\caption{ Variables required to control the updating scheme \label{table:Updating_schemes}}
\end{table}
%
%------------------------------------------------------------
\subsubsection{The default: sequential single spin flips}
%------------------------------------------------------------
%
The default updating scheme is a sequential single spin flip algorithm. Consider the Ising spin $s_{i,\tau}$: we flip it with probability one, such that for this local move the proposal matrix is symmetric. If we are considering the Hubbard-Stratonovich field $l_{i,\tau}$, we propose, with probability $1/3$, one of the other three possible field values. Again, for this local move, the proposal matrix is symmetric. Hence, in both cases we accept or reject the move according to
\begin{equation}
P(C \rightarrow C') = \text{min} \left( 1, \frac{ W(C')}{W(C)} \right).
\end{equation}
It is worth noting that this type of sequential spin flip updating does not satisfy detailed balance but the more fundamental stationarity condition \cite{Sokal89}.
%
%------------------------------------------------------------
\subsubsection{Sampling of $e^{-S_0}$}
%------------------------------------------------------------
%
Consider an Ising spin at space-time coordinates $(i,\tau)$ in the configuration $C$. Flipping this spin generates the configuration $C'$, and we propose the move according to
\begin{equation}
T_0(C \rightarrow C') = \frac{e^{-S_0(C')}}{ e^{-S_0(C')} + e^{-S_0(C)} } = 1 - \frac{1}{1 + e^{-S_0(C')} /e^{-S_0(C)}}.
\end{equation}
Note that the function \texttt{S0} in the \texttt{Hamiltonian\_example.f90} module computes precisely the ratio\\
${e^{-S_0(C')} /e^{-S_0(C)}}$, so that $T_0(C \rightarrow C')$ does not require any further programming.
Thereby one will accept the proposed move with the probability:
\begin{equation}
P(C \rightarrow C') = \text{min} \left( 1, \frac{e^{-S_0(C)} W(C')}{ e^{-S_0(C')} W(C)} \right).
\end{equation}
With Eq.~\ref{eqn:partition_2}, one sees that the bare action $S_0(C)$, which determines the dynamics of the Ising spins in the absence of coupling to the fermions, does not enter the Metropolis acceptance-rejection step. This sampling scheme is used if the logical variable \texttt{Propose\_S0} is switched on.
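A schematic of this combined proposal and acceptance step is given below. The interface is hypothetical (the package's internal routines differ): \texttt{r0} stands for the ratio $e^{-S_0(C')}/e^{-S_0(C)}$ returned by \texttt{S0}, and \texttt{w\_ratio} for $W(C')/W(C)$ obtained from the fermion determinants.
\begin{verbatim}
logical function accept_move(r0, w_ratio)
  implicit none
  real(8), intent(in) :: r0, w_ratio
  real(8) :: T0, rnd
  accept_move = .false.
  T0 = 1.d0 - 1.d0/(1.d0 + r0)     ! proposal probability T_0(C -> C')
  call random_number(rnd)
  if (rnd < T0) then               ! the move is proposed ...
     call random_number(rnd)       ! ... and accepted with probability
     accept_move = rnd < min(1.d0, w_ratio/r0)   ! min(1, W(C')/(r0 W(C)))
  end if
end function accept_move
\end{verbatim}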
%
%------------------------------------------------------------
%\input{global_alf_2.0.tex}
\subsection{Global updates and parallel tempering}
Global updates and parallel tempering are already implemented in the ALF code and will be included in a future release.
@@ -15,4 +15,4 @@ errors.o: mat_mod.o
 maxent.o: mat_mod.o errors.o
 clean:
-	rm $(OBJS)
+	rm -f $(OBJS)
@@ -8,4 +8,4 @@ all:
 clean:
 	(make -f Compile clean ) ;\
-	rm *.mod *~ \#*
+	rm -f *.mod *~ \#*
@@ -7,4 +7,4 @@ all: $(OBJS)
 	$(FC) $(SUFFIX) $(FLAGS) -c $<
 clean:
-	rm $(OBJS)
+	rm -f $(OBJS)
# -DMPI selects MPI.
# -DSTAB1 Alternative stabilization, using the singular value decomposition.
# -DSTAB2 Alternative stabilization, lapack QR with manual pivoting. Packed form of QR factorization is not used.
# (Noflag) Default stabilization, using lapack QR with pivoting. Packed form of QR factorization is used.
# -DQRREF Enables reference lapack implementation of QR decomposition.
# Recommendation: just use the -DMPI flag if you want to run in parallel or leave it empty for serial jobs.
# The default stabilization, no flag, is generically the best.
PROGRAMCONFIGURATION = -DMPI
PROGRAMCONFIGURATION =
f90 = gfortran
export f90
F90OPTFLAGS = -O3 -Wconversion -fcheck=all
F90OPTFLAGS = -O3
export F90OPTFLAGS
F90USEFULFLAGS = -cpp -std=f2003
F90USEFULFLAGS = -cpp
export F90USEFULFLAGS
FL = -c ${F90OPTFLAGS} ${PROGRAMCONFIGURATION}
export FL
DIR = ${CURDIR}
export DIR
Libs = ${DIR}/Libraries/
export Libs
LIB_BLAS_LAPACK = -llapack -lblas
export LIB_BLAS_LAPACK
all: lib ana program
lib:
cd Libraries && $(MAKE)
ana:
cd Analysis && $(MAKE)
program:
cd Prog && $(MAKE)
clean: cleanall
cleanall: cleanprog cleanlib cleanana
cleanprog:
cd Prog && $(MAKE) clean
cleanlib:
cd Libraries && $(MAKE) clean
cleanana:
cd Analysis && $(MAKE) clean
help:
@echo "The following are some of the valid targets of this Makefile"
@echo "all, program, lib, ana, clean, cleanall, cleanprog, cleanlib, cleanana"
@@ -14,6 +14,6 @@ $(TARGET): $(OBJS)
 	$(FC) $(LF) -o $(TARGET) $(OBJS) $(LIBS)
 clean:
-	rm $(OBJS)
+	rm -f $(OBJS)
@@ -12,4 +12,4 @@ Examples:
 clean:
 	(make -f Compile_Examples clean );\
-	rm *.mod *~ \#*
+	rm -f *.mod *~ \#*
#define GIT
#define GIT_COMMIT_HASH "bc2f74f"
#define GIT_BRANCH "master"
#define GIT_COMMIT_HASH "adc3482"
#define GIT_BRANCH "ALF-1.0"
# Setting QRREF has the highest priority. Setting nothing selects the system LAPACK for the QR decomposition.
# -DQRREF sets reference QR
# -DMPI selects MPI.
# -DSTAB1, -DSTAB2 select an alternative stabilization scheme.
PROGRAMMCONFIGURATION="-DQRREF "
PROGRAMMCONFIGURATION="-DMPI"
PROGRAMMCONFIGURATION=""
f90="gfortran"
export f90
F90OPTFLAGS="-O3 -Wconversion -fcheck=all"
F90OPTFLAGS="-O3 "
export F90OPTFLAGS
F90USEFULFLAGS="-cpp -std=f2003"
F90USEFULFLAGS="-cpp "
export F90USEFULFLAGS
FL="-c ${F90OPTFLAGS} ${PROGRAMMCONFIGURATION}"
export FL
DIR=`pwd`
export DIR
Libs=${DIR}"/Libraries/"
export Libs
LIB_BLAS_LAPACK="-llapack -lblas"
export LIB_BLAS_LAPACK