How can I set LD_PRELOAD for a Hadoop cluster?

I am testing two scripts, one that uses Spark and one that uses MapReduce, on a Hadoop cluster running Red Hat Enterprise Linux 7.6. I am receiving the following error:
Error: failed /usr/lib/jvm/java-1.8.0-openjdk-, ...
because /apps/matlab/R2020b/sys/os/glnxa64/ undefined symbol: __cxa_thread_atexit_impl
This seems to be because the nodes have GLIBC 2.17 installed, and this symbol is only defined from GLIBC 2.18 onwards. I understand that there is a shim library that lets MATLAB R2020b work with GLIBC 2.17, but how can I force the cluster to find and load it?
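For reference, one way to confirm the GLIBC version on a worker node is to run "ldd --version" there. A minimal sketch, calling it through MATLAB's system function (the check itself is standard glibc tooling, nothing MATLAB-specific):
% Run on a worker node to confirm its GLIBC version;
% __cxa_thread_atexit_impl is only exported from GLIBC 2.18 onwards.
[status, out] = system('ldd --version');
disp(out)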

Accepted Answer

MathWorks Support Team on 5 Jan 2022 at 5:00
Edited: MathWorks Support Team on 20 Jan 2022 at 14:51
You can preload the library from the MATLAB scripts through cluster properties, without modifying the system environment variables. The two scripts require slightly different code because they are built on different technologies. In both cases, replace $MATLAB_ROOT with the full path to the MATLAB installation on the worker nodes.
For "mapreduce", modify your code as below:
cluster = parallel.cluster.Hadoop(..)
...
cluster.HadoopProperties('mapred.child.env') = 'LD_PRELOAD=$MATLAB_ROOT/bin/glnxa64/'
...
mapreducer(cluster)
For the Spark-based script, modify the code as below:
cluster = parallel.cluster.Hadoop(..)
...
cluster.SparkProperties('spark.executorEnv.LD_PRELOAD') = '$MATLAB_ROOT/bin/glnxa64/'
...
mapreducer(cluster)
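If you would rather not hand-edit $MATLAB_ROOT in each property, the value can also be built in the script. A minimal sketch, assuming the worker-side installation path shown in the error message above (/apps/matlab/R2020b):
workerMatlabRoot = '/apps/matlab/R2020b';         % assumed install path on the workers
preloadLib = [workerMatlabRoot '/bin/glnxa64/'];  % value for the Spark property
preloadEnv = ['LD_PRELOAD=' preloadLib];          % value for the Hadoop property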
This error may still appear after running the "mapreduce" script with the code above. The issue is likely related to "mapred.child.env", which is the legacy name for this configuration property and is the part of the workaround intended to allow mapreduce to work. Switching to the following configuration properties resolves the issue:
preloadEnv = 'LD_PRELOAD=$MATLAB_ROOT/bin/glnxa64/';
cluster.HadoopProperties('mapreduce.map.env') = preloadEnv;
cluster.HadoopProperties('mapreduce.reduce.env') = preloadEnv;
cluster.HadoopProperties('yarn.app.mapreduce.am.env') = preloadEnv;
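For context, here is a minimal end-to-end sketch of the "mapreduce" setup with these properties. The Hadoop install folder, HDFS path, and mapper/reducer functions are placeholders for illustration, not part of the original answer:
% Hypothetical cluster setup; adjust HadoopInstallFolder for your cluster.
cluster = parallel.cluster.Hadoop('HadoopInstallFolder', '/usr/lib/hadoop');
preloadEnv = 'LD_PRELOAD=$MATLAB_ROOT/bin/glnxa64/';
cluster.HadoopProperties('mapreduce.map.env') = preloadEnv;
cluster.HadoopProperties('mapreduce.reduce.env') = preloadEnv;
cluster.HadoopProperties('yarn.app.mapreduce.am.env') = preloadEnv;
mapreducer(cluster);
% Placeholder datastore and map/reduce functions, only to show where mapreduce runs.
ds = datastore('hdfs:///data/example.csv');
result = mapreduce(ds, @myMapper, @myReducer);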
  1 Comment
Jared Evans on 22 Oct 2021
In case it spares anyone some pain: my cluster was upgraded and I started having this issue again. What ultimately fixed it was to add:
cluster.SparkProperties('spark.yarn.appMasterEnv.LD_PRELOAD') = '$MATLAB_ROOT/bin/glnxa64/'
in addition to the
cluster.SparkProperties('spark.executorEnv.LD_PRELOAD') = '$MATLAB_ROOT/bin/glnxa64/'
that was already in place. For some reason, after the upgrade YARN started complaining while shipping the containers, which it had never done before.
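Combined, the MATLAB-side settings look like this (same $MATLAB_ROOT substitution as in the accepted answer):
% Set LD_PRELOAD for the YARN application master as well as the Spark executors.
cluster.SparkProperties('spark.yarn.appMasterEnv.LD_PRELOAD') = '$MATLAB_ROOT/bin/glnxa64/'
cluster.SparkProperties('spark.executorEnv.LD_PRELOAD') = '$MATLAB_ROOT/bin/glnxa64/'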
In case this doesn't translate as I expect, my approach was actually done with pyspark as:
import os
import pyspark
from pyspark.sql import SparkSession
# Propagate the driver's LD_PRELOAD to both the YARN application master and the executors.
LD_PRELOAD = os.environ['LD_PRELOAD']
config_list = [..., ('spark.yarn.appMasterEnv.LD_PRELOAD', f"{LD_PRELOAD}"),
               ('spark.executorEnv.LD_PRELOAD', f"{LD_PRELOAD}"), ...]
conf = pyspark.SparkConf().setAll(config_list)
spark = SparkSession.builder.master('yarn')\
    .config(conf=conf).getOrCreate()





