Author: dmsong · 2021-05-26 17:19
System Engineer · IPS

Core Dump When Building ELK in an Alpine Container: Troubleshooting and Fix


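Recently, one of our users was running an Alpine-based container on an OpenPOWER server (FP5280G2). Building ELK inside the Alpine container failed with a segmentation fault (core dumped).
Switching between Alpine 3.11.6 and 3.13.5, OpenJDK 8 and OpenJDK 11, and Logstash 7.6.1 and 6.6.2 made no difference: every combination showed the same behavior.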

The crash happens when running this step:

```
/usr/share/logstash/bin/logstash-plugin install logstash-filter-translate
```

which fails with a core dump:

```
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x0000000000000000, pid=44, tid=64
#
# JRE version: OpenJDK Runtime Environment (11.0.9+11) (build 11.0.9+11-alpine-r1)
# Java VM: OpenJDK 64-Bit Server VM (11.0.9+11-alpine-r1, mixed mode, tiered, compressed oops, concurrent mark sweep gc, linux-ppc64le)
# Problematic frame:
# C 0x0000000000000000
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e" (or dumping to /root/dist/core.44)
#
# An error report file with more information is saved as:
# /root/dist/hs_err_pid44.log
Compiled method (c1) 1718 168 3 java.util.Collections$UnmodifiableCollection$1::next (10 bytes)
total in heap [0x00007fff7ab67710,0x00007fff7ab67c80] = 1392
relocation [0x00007fff7ab67870,0x00007fff7ab67898] = 40
constants [0x00007fff7ab67900,0x00007fff7ab67980] = 128
main code [0x00007fff7ab67980,0x00007fff7ab67b80] = 512
stub code [0x00007fff7ab67b80,0x00007fff7ab67be8] = 104
metadata [0x00007fff7ab67be8,0x00007fff7ab67c00] = 24
scopes data [0x00007fff7ab67c00,0x00007fff7ab67c18] = 24
scopes pcs [0x00007fff7ab67c18,0x00007fff7ab67c68] = 80
dependencies [0x00007fff7ab67c68,0x00007fff7ab67c70] = 8
nul chk table [0x00007fff7ab67c70,0x00007fff7ab67c80] = 16
#
# If you would like to submit a bug report, please visit:
#   https://gitlab.alpinelinux.org/alpine/aports/issues
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
Aborted (core dumped)
```

The log shows that the crash happened outside the JVM, in native C code.
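When triaging several such crashes, it can help to pull the key fields out of the report header automatically. Below is a minimal Python sketch, assuming the header layout shown above; the helper `parse_hs_err_header` is hypothetical and not part of any JDK tool. The frame-type letters follow the legend HotSpot prints in its hs_err files (`C` = native code, `V`/`v` = VM code, `J`/`j` = compiled/interpreted Java), so the `C` in this log is what marks the frame as native.

```python
import re
from typing import Dict, Union

# Frame-type letters from the legend HotSpot prints in hs_err files.
FRAME_KINDS = {
    "C": "native code",
    "V": "VM code",
    "v": "VM-generated code",
    "J": "compiled Java code",
    "j": "interpreted Java code",
}

# "# SIGSEGV (0xb) at pc=0x0000000000000000, pid=44, tid=64"
SIGNAL_RE = re.compile(r"#\s+(SIG\w+)\s+\((0x[0-9a-fA-F]+)\)\s+at pc=(0x[0-9a-fA-F]+)")
# "# Problematic frame:" followed by "# C 0x0000000000000000"
FRAME_RE = re.compile(r"#\s+Problematic frame:\s*\n#\s+(\S)\s+(.*)")


def parse_hs_err_header(text: str) -> Dict[str, Union[str, int]]:
    """Extract the signal, faulting pc and problematic-frame kind from a crash header."""
    info: Dict[str, Union[str, int]] = {}
    sig = SIGNAL_RE.search(text)
    if sig:
        info["signal"] = sig.group(1)
        info["pc"] = int(sig.group(3), 16)
    frame = FRAME_RE.search(text)
    if frame:
        info["frame_kind"] = FRAME_KINDS.get(frame.group(1), "unknown")
        info["frame"] = frame.group(2).strip()
    return info


if __name__ == "__main__":
    header = (
        "# SIGSEGV (0xb) at pc=0x0000000000000000, pid=44, tid=64\n"
        "# Problematic frame:\n"
        "# C 0x0000000000000000\n"
    )
    print(parse_hs_err_header(header))
```

For the crash above it reports a SIGSEGV in a native frame with pc 0; a pc of 0 commonly means execution jumped through a null function pointer.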

Inside the container, we analyzed the resulting core file with gdb:

```
# gdb java core.1621924577.java.44
(gdb) bt full
#0 0x00007fffaf1dec4c in abort () from /lib/ld-musl-powerpc64le.so.1
#1 0x00007fffaeb05fe4 in ?? () from /usr/lib/jvm/java-11-openjdk/lib/server/libjvm.so
```

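It looks like the segmentation fault occurred when the JVM called into Alpine's C library through JNI, after which the process aborted.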
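To see at a glance which shared object each frame belongs to (here musl's loader and libjvm.so), the `bt` output can be split into fields. A minimal sketch, assuming gdb's `#N 0xADDR in FUNC (ARGS) from LIB` frame format shown above; `parse_bt` is a hypothetical helper, not a gdb feature.

```python
import re
from typing import List, Optional, Tuple

# Matches gdb frame lines such as
#   #0 0x00007fffaf1dec4c in abort () from /lib/ld-musl-powerpc64le.so.1
# The address and the "from LIB" part are optional in gdb's output.
FRAME_RE = re.compile(
    r"^#(?P<num>\d+)\s+"
    r"(?:(?P<addr>0x[0-9a-fA-F]+)\s+in\s+)?"
    r"(?P<func>\S+)\s+\(.*?\)"
    r"(?:\s+from\s+(?P<lib>\S+))?"
)

# (frame number, address or None, function name, shared object or None)
Frame = Tuple[int, Optional[str], str, Optional[str]]


def parse_bt(text: str) -> List[Frame]:
    """Return one tuple per frame line; other lines (locals, prompts) are skipped."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append((int(m["num"]), m["addr"], m["func"], m["lib"]))
    return frames


if __name__ == "__main__":
    bt = (
        "(gdb) bt full\n"
        "#0 0x00007fffaf1dec4c in abort () from /lib/ld-musl-powerpc64le.so.1\n"
        "#1 0x00007fffaeb05fe4 in ?? () from "
        "/usr/lib/jvm/java-11-openjdk/lib/server/libjvm.so\n"
    )
    for num, addr, func, lib in parse_bt(bt):
        print(f"#{num} {func:<6} {lib}")
```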

Logstash ships its own JNI library, libjffi-1.2.so (bundled with JRuby). Let's check its linkage:

```
/tmp # ldd /usr/share/logstash/vendor/jruby/lib/jni/ppc64le-Linux/libjffi-1.2.so
/lib/ld-musl-powerpc64le.so.1 (0x7fff94340000)
libc.so.6 => /lib/ld-musl-powerpc64le.so.1 (0x7fff94340000)
Error relocating /usr/share/logstash/vendor/jruby/lib/jni/ppc64le-Linux/libjffi-1.2.so: __sprintf_chk: symbol not found
Error relocating /usr/share/logstash/vendor/jruby/lib/jni/ppc64le-Linux/libjffi-1.2.so: __snprintf_chk: symbol not found
Error relocating /usr/share/logstash/vendor/jruby/lib/jni/ppc64le-Linux/libjffi-1.2.so: __vsnprintf_chk: symbol not found
```

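This shows that libjffi-1.2.so is linked against ld-musl-powerpc64le.so.1, and the two are not fully compatible: the library needs __sprintf_chk, __snprintf_chk and __vsnprintf_chk, glibc's _FORTIFY_SOURCE checking functions, which musl does not provide. Our guess is that the libjffi-1.2.so bundled with Logstash was built in a GNU libc (glibc) environment. Used as-is on Alpine, it is resolved against Alpine's C library, musl, and this incompatibility makes the JVM crash with a core dump.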
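The same check can be scripted so that several native libraries (for example everything under Logstash's vendor/jruby/lib/jni directory) can be screened in one go. A minimal sketch, assuming musl's ldd reports failures in the `Error relocating <lib>: <symbol>: symbol not found` form shown above; `missing_symbols`, `looks_glibc_built` and the script name check_ldd.py are hypothetical. The heuristic relies on glibc naming its _FORTIFY_SOURCE wrappers `__<func>_chk`.

```python
import re
import subprocess
import sys
from typing import List

# musl's ldd reports unresolved symbols as:
#   Error relocating <library>: <symbol>: symbol not found
MISSING_RE = re.compile(
    r"Error relocating (?P<lib>[^:]+): (?P<sym>[^:]+): symbol not found"
)


def missing_symbols(ldd_output: str) -> List[str]:
    """Return the unresolved symbol names in the order ldd reported them."""
    return [m["sym"] for m in MISSING_RE.finditer(ldd_output)]


def looks_glibc_built(symbols: List[str]) -> bool:
    """Heuristic: glibc's _FORTIFY_SOURCE wrappers are named __<func>_chk."""
    return any(s.startswith("__") and s.endswith("_chk") for s in symbols)


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 check_ldd.py lib1.so [lib2.so ...]
    for path in sys.argv[1:]:
        # ldd may exit non-zero on relocation errors, so the return code is ignored;
        # stderr is merged into stdout because the errors may be printed there.
        out = subprocess.run(
            ["ldd", path], stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        ).stdout
        missing = missing_symbols(out)
        verdict = "likely glibc-built" if looks_glibc_built(missing) else "no fortify symbols"
        print(f"{path}: {len(missing)} missing ({verdict})")
```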

We then searched Alpine's official packages, found the jffi-related ones, and installed them:

```
apk -U --no-cache add libffi java-jffi libffi-dev java-jffi-native
```

After installation, libjffi-1.2.so appears under /usr/lib/:

```
/home/jffi # ls /usr/lib/libjffi-1.2.so -l
-rwxr-xr-x 1 root root 132776 Jul 15 2019 /usr/lib/libjffi-1.2.so
```

Check its linkage with ldd:

```
/home/jffi # ldd /usr/lib/libjffi-1.2.so
/lib/ld-musl-powerpc64le.so.1 (0x7fffa9c40000)
libc.musl-ppc64le.so.1 => /lib/ld-musl-powerpc64le.so.1 (0x7fffa9c40000)
```

As shown, this build links against musl without any missing symbols.
We then replaced the libjffi-1.2.so under Logstash with this library, keeping the original as a .bak backup:

```
mv /usr/share/logstash/vendor/jruby/lib/jni/ppc64le-Linux/libjffi-1.2.so /usr/share/logstash/vendor/jruby/lib/jni/ppc64le-Linux/libjffi-1.2.so.bak
ln -s /usr/lib/libjffi-1.2.so /usr/share/logstash/vendor/jruby/lib/jni/ppc64le-Linux/libjffi-1.2.so
```
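If the workaround has to be repeated, for example in an image build, the backup-and-symlink step can be scripted. A minimal Python sketch using `pathlib`; `swap_in_library` is a hypothetical helper that mirrors the `mv` and `ln -s` commands above and refuses to overwrite an existing backup.

```python
from pathlib import Path


def swap_in_library(bundled: Path, system: Path) -> Path:
    """Move the bundled library aside as *.bak and symlink the system copy in its place.

    Pass an absolute path for `system`: a relative symlink target would be
    resolved relative to the link's own directory. Returns the backup path.
    """
    backup = bundled.with_name(bundled.name + ".bak")
    if backup.exists():
        raise FileExistsError(f"backup already exists: {backup}")
    if not system.exists():
        raise FileNotFoundError(f"system library missing: {system}")
    bundled.rename(backup)      # same as: mv libjffi-1.2.so libjffi-1.2.so.bak
    bundled.symlink_to(system)  # same as: ln -s /usr/lib/libjffi-1.2.so libjffi-1.2.so
    return backup
```

In the container from this article it would be called as `swap_in_library(Path("/usr/share/logstash/vendor/jruby/lib/jni/ppc64le-Linux/libjffi-1.2.so"), Path("/usr/lib/libjffi-1.2.so"))`. A symlink rather than a copy means a later upgrade of the java-jffi-native package is picked up automatically, as long as the file name stays the same.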

Then we ran the build command again:

```
~/dist # /usr/share/logstash/bin/logstash-plugin install logstash-filter-translate
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.headius.backport9.modules.Modules to method sun.nio.ch.NativeThread.signal(long)
WARNING: Please consider reporting this to the maintainers of com.headius.backport9.modules.Modules
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Validating logstash-filter-translate
Installing logstash-filter-translate
Installation successful
```

This time the build succeeded.

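**Conclusion**

Alpine's C library is musl, not glibc. Programs or shared libraries built with gcc on a glibc-based system may be incompatible with musl when run directly on Alpine. If you hit a segmentation fault or core dump in this situation, check for such an incompatibility, for example by running ldd on the native libraries involved and looking for unresolved symbols.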
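Before dropping a prebuilt native library into an Alpine image, a quick screen for glibc-specific markers can flag likely problems. This is a rough heuristic sketch, not a replacement for ldd: the hypothetical `glibc_markers` helper only searches raw bytes for glibc symbol-version tags such as `GLIBC_2.17` and for _FORTIFY_SOURCE wrappers such as `__sprintf_chk`, so it can miss some glibc dependencies or match unrelated strings.

```python
import re
import sys
from pathlib import Path
from typing import Dict, List

# Byte patterns that usually indicate linkage against glibc:
#   - symbol-version requirements such as GLIBC_2.17
#   - _FORTIFY_SOURCE wrappers such as __sprintf_chk
VERSION_RE = re.compile(rb"GLIBC_2\.\d+(?:\.\d+)?")
FORTIFY_RE = re.compile(rb"__[a-z0-9_]+_chk\b")


def glibc_markers(data: bytes) -> Dict[str, List[str]]:
    """Return sorted, de-duplicated glibc version tags and fortify symbols in data."""
    versions = sorted({m.decode() for m in VERSION_RE.findall(data)})
    fortify = sorted({m.decode() for m in FORTIFY_RE.findall(data)})
    return {"versions": versions, "fortify": fortify}


if __name__ == "__main__":
    # Pass one or more library paths on the command line.
    for arg in sys.argv[1:]:
        found = glibc_markers(Path(arg).read_bytes())
        flagged = found["versions"] or found["fortify"]
        print(f"{arg}: {'glibc markers found' if flagged else 'none found'} {found}")
```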




References:
https://pkgs.alpinelinux.org/packages
https://github.com/telekom-security/tpotce/tree/19.03.3/docker/elk 
