solr - Yokozuna shutting down and taking Riak with it - Can't seem to find why


We are currently experiencing an issue on a 10-node cluster whereby, after approximately a day of running, 3 nodes drop out (always a random 3).

Riak version: 2.1.4

10 VMs with 10 GB RAM each, running Oracle Linux 7.3.

Java version:

[riak@pp2xria01trd001 riak$] java -version
openjdk version "1.8.0_121"
openjdk runtime environment (build 1.8.0_121-b13)
openjdk 64-bit server vm (build 25.121-b13, mixed mode)

Our usual Riak guy is on holiday at the moment, so we don't have much resource to look into this. Any help or guidance on where to start looking would be appreciated.

Crash dump details:

slogan: kernel pid terminated (application_controller) ({application_terminated,yokozuna,shutdown})
system version: erlang r16b02_basho10 (erts-5.10.3) [source] [64-bit] [smp:2:2] [async-threads:64] [hipe] [kernel-poll:true] [frame-pointer]

There is not much detail in solr.log as to why:

2017-04-06 21:04:13,958 [info] <qtp1924582348-828>@logupdateprocessorfactory.java:198 [marketblueprints_index] webapp=/internal_solr path=/update params={} {} 0 0
2017-04-06 21:04:18,567 [info] <qtp1924582348-855>@solrdispatchfilter.java:732 [admin] webapp=null path=/admin/cores params={action=status&wt=json} status=0 qtime=2
2017-04-06 21:04:23,573 [info] <qtp1924582348-1161>@solrdispatchfilter.java:732 [admin] webapp=null path=/admin/cores params={action=status&wt=json} status=0 qtime=2
2017-04-06 21:04:28,578 [info] <qtp1924582348-865>@solrdispatchfilter.java:732 [admin] webapp=null path=/admin/cores params={action=status&wt=json} status=0 qtime=2
2017-04-06 21:04:33,584 [info] <qtp1924582348-848>@solrdispatchfilter.java:732 [admin] webapp=null path=/admin/cores params={action=status&wt=json} status=0 qtime=2
2017-04-06 21:04:38,589 [info] <qtp1924582348-641>@solrdispatchfilter.java:732 [admin] webapp=null path=/admin/cores params={action=status&wt=json} status=0 qtime=2
2017-04-06 21:04:54,242 [info] <thread-1>@monitor.java:41 yokozuna has exited - shutting down solr
2017-04-06 21:04:55,219 [info] <thread-2>@server.java:320 graceful shutdown socketconnector@0.0.0.0:8093
2017-04-06 21:04:56,027 [info] <thread-2>@server.java:329 graceful shutdown o.e.j.w.webappcontext{/internal_solr,file:/var/lib/riak/yz_temp/solr-webapp/webapp/},/usr/lib64/riak/lib/yokozuna-2.1.7-0-g6cf80ad/priv/solr/webapps/solr.war
2017-04-06 21:04:59,288 [info] <thread-2>@corecontainer.java:314 shutting down corecontainer instance=1916575798
2017-04-06 21:04:59,710 [info] <thread-2>@solrcore.java:1040 [feed_mapping_index]  closing solrcore org.apache.solr.core.solrcore@78acc5b

However, after a number of merge processes in solr.log, we get the following (which I suspect is preventing the supervisor from restarting Solr a second time, and hence stopping Riak):

2017-04-06 21:05:13,546 [info] <thread-2>@cachingdirectoryfactory.java:305 closing directory: /var/lib/riak/yz/endpoint_mappings_index/data
2017-04-06 21:05:13,547 [info] <thread-2>@cachingdirectoryfactory.java:236 looking close /var/lib/riak/yz/endpoint_mappings_index/data/index [cacheddir<<refcount=0;path=/var/lib/riak/yz/endpoint_mappings_index/data/index;done=false>>]
2017-04-06 21:05:13,547 [info] <thread-2>@cachingdirectoryfactory.java:305 closing directory: /var/lib/riak/yz/endpoint_mappings_index/data/index
2017-04-06 21:05:14,657 [info] <thread-2>@contexthandler.java:832 stopped o.e.j.w.webappcontext{/internal_solr,file:/var/lib/riak/yz_temp/solr-webapp/webapp/},/usr/lib64/riak/lib/yokozuna-2.1.7-0-g6cf80ad/priv/solr/webapps/solr.war
2017-04-06 21:05:15,298 [warn] <thread-2>@queuedthreadpool.java:145 79 threads not stopped

erlang.log contains:

2017-04-06 21:04:54.193 [error] <0.5934.108> gen_server yz_solr_proc terminated reason: {timeout,{gen_server,call,[<0.1306.0>,{spawn_connection,{url,"http://localhost:8093/internal_solr/admin/cores?action=status&wt=json","localhost",8093,undefined,undefined,"/internal_solr/admin/cores?action=status&wt=json",http,hostname},100,1,{[],false},[]}]}}
2017-04-06 21:04:54.198 [error] <0.5934.108> crash report process yz_solr_proc 0 neighbours exited reason: {timeout,{gen_server,call,[<0.1306.0>,{spawn_connection,{url,"http://localhost:8093/internal_solr/admin/cores?action=status&wt=json","localhost",8093,undefined,undefined,"/internal_solr/admin/cores?action=status&wt=json",http,hostname},100,1,{[],false},[]}]}} in gen_server:terminate/6 line 744
2017-04-06 21:04:54.201 [error] <0.1150.0> supervisor yz_solr_sup had child yz_solr_proc started yz_solr_proc:start_link("/var/lib/riak/yz", "/var/lib/riak/yz_temp", 8093, 8985) @ <0.5934.108> exit reason {timeout,{gen_server,call,[<0.1306.0>,{spawn_connection,{url,"http://localhost:8093/internal_solr/admin/cores?action=status&wt=json","localhost",8093,undefined,undefined,"/internal_solr/admin/cores?action=status&wt=json",http,hostname},100,1,{[],false},[]}]}} in context child_terminated
2017-04-06 21:04:57.422 [info] <0.1102.0>@riak_ensemble_peer:leading:631 {{kv,1141798154164767904846628775559596109106197299200,3,1141798154164767904846628775559596109106197299200},'riak@pp2xria01trd001.pp2.williamhill.plc'}: leading
2017-04-06 21:04:57.422 [info] <0.1090.0>@riak_ensemble_peer:leading:631 {{kv,685078892498860742907977265335757665463718379520,3,685078892498860742907977265335757665463718379520},'riak@pp2xria01trd001.pp2.williamhill.plc'}: leading
2017-04-06 21:04:57.780 [info] <0.1072.0>@riak_ensemble_peer:leading:631 {{kv,0,3,0},'riak@pp2xria01trd001.pp2.williamhill.plc'}: leading
2017-04-06 21:05:01.432 [info] <0.8030.232>@yz_solr_proc:init:119 starting solr: "/usr/bin/riak/java" ["-djava.awt.headless=true","-djetty.home=/usr/lib64/riak/lib/yokozuna-2.1.7-0-g6cf80ad/priv/solr","-djetty.temp=/var/lib/riak/yz_temp","-djetty.port=8093","-dsolr.solr.home=/var/lib/riak/yz","-dhostcontext=/internal_solr","-cp","/usr/lib64/riak/lib/yokozuna-2.1.7-0-g6cf80ad/priv/solr/start.jar","-dlog4j.configuration=file:///etc/riak/solr-log4j.properties","-dyz.lib.dir=/usr/lib64/riak/lib/yokozuna-2.1.7-0-g6cf80ad/priv/java_lib","-d64","-xms4g","-xmx4g","-xx:+usestringcache","-xx:+usecompressedoops","-dcom.sun.management.jmxremote.port=8985","-dcom.sun.management.jmxremote.authenticate=false","-dcom.sun.management.jmxremote.ssl=false","org.eclipse.jetty.start.main"]
2017-04-06 21:05:01.483 [info] <0.1108.0>@riak_ensemble_peer:leading:631 {{kv,1370157784997721485815954530671515330927436759040,3,1370157784997721485815954530671515330927436759040},'riak@pp2xria01trd001.pp2.williamhill.plc'}: leading
2017-04-06 21:05:02.032 [info] <0.8030.232>@yz_solr_proc:handle_info:184 solr stdout/err: openjdk 64-bit server vm warning: ignoring option usesplitverifier; support removed in 8.0
openjdk 64-bit server vm warning: ignoring option usestringcache; support removed in 8.0
2017-04-06 21:05:04.212 [info] <0.1110.0>@riak_ensemble_peer:leading:631 {{kv,1415829711164312202009819681693899175291684651008,3,0},'riak@pp2xria01trd001.pp2.williamhill.plc'}: leading
2017-04-06 21:05:10.798 [info] <0.1096.0>@riak_ensemble_peer:leading:631 {{kv,913438523331814323877303020447676887284957839360,3,913438523331814323877303020447676887284957839360},'riak@pp2xria01trd001.pp2.williamhill.plc'}: leading
2017-04-06 21:05:17.001 [info] <0.8030.232>@yz_solr_proc:handle_info:184 solr stdout/err: error: exception thrown agent : java.rmi.server.exportexception: port already in use: 8985; nested exception is:
        java.net.bindexception: address in use (bind failed)
2017-04-06 21:05:17.964 [error] <0.8030.232> gen_server yz_solr_proc terminated reason: {"solr os process exited",1}
2017-04-06 21:05:17.964 [error] <0.8030.232> crash report process yz_solr_proc 0 neighbours exited reason: {"solr os process exited",1} in gen_server:terminate/6 line 744
2017-04-06 21:05:17.964 [error] <0.1150.0> supervisor yz_solr_sup had child yz_solr_proc started yz_solr_proc:start_link("/var/lib/riak/yz", "/var/lib/riak/yz_temp", 8093, 8985) @ <0.8030.232> exit reason {"solr os process exited",1} in context child_terminated
2017-04-06 21:05:17.964 [error] <0.1150.0> supervisor yz_solr_sup had child yz_solr_proc started yz_solr_proc:start_link("/var/lib/riak/yz", "/var/lib/riak/yz_temp", 8093, 8985) @ <0.8030.232> exit reason reached_max_restart_intensity in context shutdown
2017-04-06 21:05:17.964 [error] <0.1119.0> supervisor yz_sup had child yz_solr_sup started yz_solr_sup:start_link() @ <0.1150.0> exit reason shutdown in context child_terminated
2017-04-06 21:05:17.964 [error] <0.1119.0> supervisor yz_sup had child yz_solr_sup started yz_solr_sup:start_link() @ <0.1150.0> exit reason reached_max_restart_intensity in context shutdown
2017-04-06 21:05:23.072 [error] <0.1551.0> supervisor yz_index_hashtree_sup had child ignored started yz_index_hashtree:start_link() @ undefined exit reason killed in context shutdown_error
2017-04-06 21:05:24.353 [info] <0.745.0>@yz_app:prep_stop:74 stopping application yokozuna.
2017-04-06 21:05:27.582 [error] <0.745.0>@yz_app:prep_stop:82 stopping application yokozuna - exit:{noproc,{gen_server,call,[yz_solrq_drain_mgr,{drain,[]},infinity]}}.
2017-04-06 21:05:27.582 [info] <0.745.0>@yz_app:stop:88 stopped application yokozuna.
2017-04-06 21:05:27.940 [info] <0.7.0> application yokozuna exited reason: shutdown
2017-04-06 21:05:28.165 [info] <0.431.0>@riak_kv_app:prep_stop:228 stopping application riak_kv - marked service down.
2017-04-06 21:05:28.252 [info] <0.431.0>@riak_kv_app:prep_stop:232 unregistered pb services
2017-04-06 21:05:28.408 [info] <0.431.0>@riak_kv_app:prep_stop:237 unregistered webmachine routes
2017-04-06 21:05:28.459 [info] <0.431.0>@riak_kv_app:prep_stop:239 active put fsms completed
2017-04-06 21:05:29.665 [info] <0.540.0>@riak_kv_js_vm:terminate:237 spidermonkey vm (pool: riak_kv_js_hook) host stopping (<0.540.0>)
2017-04-06 21:05:29.665 [info] <0.539.0>@riak_kv_js_vm:terminate:237 spidermonkey vm (pool: riak_kv_js_hook) host stopping (<0.539.0>)
2017-04-06 21:05:30.379 [info] <0.532.0>@riak_kv_js_vm:terminate:237 spidermonkey vm (pool: riak_kv_js_reduce) host stopping (<0.532.0>)
2017-04-06 21:05:31.116 [info] <0.534.0>@riak_kv_js_vm:terminate:237 spidermonkey vm (pool: riak_kv_js_reduce) host stopping (<0.534.0>)
2017-04-06 21:05:31.362 [info] <0.533.0>@riak_kv_js_vm:terminate:237 spidermonkey vm (pool: riak_kv_js_reduce) host stopping (<0.533.0>)
2017-04-06 21:05:32.153 [info] <0.536.0>@riak_kv_js_vm:terminate:237 spidermonkey vm (pool: riak_kv_js_reduce) host stopping (<0.536.0>)
2017-04-06 21:05:32.245 [info] <0.537.0>@riak_kv_js_vm:terminate:237 spidermonkey vm (pool: riak_kv_js_reduce) host stopping (<0.537.0>)
2017-04-06 21:05:32.676 [info] <0.535.0>@riak_kv_js_vm:terminate:237 spidermonkey vm (pool: riak_kv_js_reduce) host stopping (<0.535.0>)
2017-04-06 21:05:33.450 [info] <0.431.0>@riak_kv_app:stop:250 stopped  application riak_kv.
2017-04-06 21:05:41.701 [info] <0.195.0>@riak_core_app:stop:116 stopped  application riak_core.
2017-04-06 21:05:43.061 [info] <0.93.0> alarm_handler: {clear,system_memory_high_watermark}
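
For anyone reading the erlang.log excerpt, the `reached_max_restart_intensity` entries come from the standard OTP supervisor rule: if more than MaxR restarts happen within MaxT seconds, the supervisor gives up and shuts itself down, which then propagates upwards. The sketch below only models that rule in Python to show why a second Solr failure 23 seconds after the first could be enough. The limits used (1 restart per 60 seconds) are an assumption for illustration; I have not checked the values yz_solr_sup actually uses, and `RestartIntensity` is a made-up name.

```python
from collections import deque


class RestartIntensity:
    """Toy model of the OTP supervisor restart-intensity rule.

    max_restarts and period_seconds stand in for MaxR and MaxT.
    The values passed in are chosen by the caller; nothing here is
    read from Riak or Yokozuna.
    """

    def __init__(self, max_restarts, period_seconds):
        self.max_restarts = max_restarts
        self.period_seconds = period_seconds
        self._restarts = deque()  # timestamps of recent restarts

    def record_restart(self, now):
        """Record a child restart at time `now` (in seconds).

        Returns True if the supervisor would restart the child,
        False if the restart intensity has been exceeded.
        """
        self._restarts.append(now)
        # Forget restarts that fall outside the sliding window.
        while now - self._restarts[0] > self.period_seconds:
            self._restarts.popleft()
        return len(self._restarts) <= self.max_restarts


# Hypothetical limits: at most 1 restart per 60 seconds.
sup = RestartIntensity(max_restarts=1, period_seconds=60)
first = sup.record_restart(0)    # first crash at 21:04:54
second = sup.record_restart(23)  # second crash at 21:05:17
print(first, second)
```

Under those assumed limits the first crash is tolerated and the second is not, which matches the order of events in the log, but the real limits may differ.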

We have the following options added to riak.conf:

search = on
search.solr.jmx_port = 8985
search.solr.jvm_options = -d64 -Xms4g -Xmx4g -XX:+UseStringCache -XX:+UseCompressedOops
search.solr.port = 8093
search.solr.start_timeout = 180s
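
Since riak.conf pins the JMX port to 8985 and the restarted JVM in erlang.log fails with a bind error on that same port, one thing that could be checked on an affected node is whether something is still holding the port after Solr has been told to shut down. A minimal sketch, assuming Python 3 is available on the node; `port_in_use` is a hypothetical helper and not part of Riak:

```python
import errno
import socket


def port_in_use(port, host="0.0.0.0"):
    """Return True if binding host:port fails with 'address already in use'.

    This tries the same kind of bind the JVM attempts at startup.
    Any other socket error is re-raised rather than guessed at.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise
    return False
```

Calling `port_in_use(8985)` and `port_in_use(8093)` shortly after the "yokozuna has exited - shutting down solr" message would show whether the old JVM has actually released its ports. It does not say which process holds them.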

There is no sign of OOM errors, or of processes being killed by the oom_killer.

