• Switched opensm back to jwsubnet11 by restarting opensmd on jwsubnet12. Observed similar behavior:

    Jan 13 08:12:36 184984 [A0BE9700] 0x80 -> Entering STANDBY state
    Jan 13 09:16:26 256916 [9F3E6700] 0x01 -> log_rcv_cb_error: ERR 3111: Received MAD with error status = 0xC
                            SubnGetResp(SMInfo), attr_mod 0x0, TID 0x26411
                            Initial path: 0,1,59,3,70 Return path: 0,71,2,57,1
    Jan 13 09:17:16 257737 [E3E6F700] 0x80 -> Entering DISCOVERING state
    Jan 13 09:17:19 357467 [A0BE9700] 0x80 -> Entering MASTER state

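    Routing was configured about 4 seconds after entering MASTER state, but SUBNET UP was only reached about 41 seconds after it: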

    Jan 13 09:17:23 558144 [A0BE9700] 0x02 -> osm_ucast_mgr_process: updn tables configured on all switches
    Jan 13 09:17:23 620654 [A0BE9700] 0x02 -> osm_ucast_mgr_process: chain tables configured on all switches
    Jan 13 09:18:00 079821 [A0BE9700] 0x02 -> SUBNET UP

    Pings were also interrupted during that period (no replies for icmp_seq 63 to 103, about 43 seconds):

    [1610525837.997448] 64 bytes from jwb0001i.juwels (10.13.23.11): icmp_seq=61 ttl=64 time=0.083 ms
    [1610525839.021447] 64 bytes from jwb0001i.juwels (10.13.23.11): icmp_seq=62 ttl=64 time=0.082 ms
    [1610525882.477339] 64 bytes from jwb0001i.juwels (10.13.23.11): icmp_seq=104 ttl=64 time=448 ms
    [1610525883.030185] 64 bytes from jwb0001i.juwels (10.13.23.11): icmp_seq=105 ttl=64 time=0.082 ms

    The bracketed timestamps come from ping -D, which prints the Unix time before each reply.

  • Today's test with the adjusted parameters gave the following results:

    Feb 09 09:49:28 273720 [FF17D700] 0x80 -> Entering MASTER state
    Feb 09 09:50:39 772438 [FF17D700] 0x02 -> SUBNET UP
    --
    Feb 09 10:04:27 454906 [1F730700] 0x80 -> Entering MASTER state
    Feb 09 10:04:44 283030 [1F730700] 0x02 -> SUBNET UP
    --
    Feb 09 10:38:20 151123 [1215B700] 0x80 -> Entering MASTER state
    Feb 09 10:38:36 560428 [1215B700] 0x02 -> SUBNET UP

    The first run is the reference without tuning. After adjusting some parameters, the time from entering MASTER state to SUBNET UP dropped from 1:11 to 16-17 seconds. The following was changed in opensm.conf:

    max_wire_smps 32
    max_wire_smps2 32
    routing_threads_num 0

    and in updn_lid_tracking.conf:

    routing_threads_num 0

    Due to a bug, routing_threads_num has to be set in both files. This should no longer be necessary once we move to opensm 5.8.1.
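
The failover durations and the ping outage discussed above can be recomputed from the excerpts instead of by hand. Below is a minimal Python sketch. It assumes the OpenSM log line layout shown here, with month, day, time and microseconds at the start of each line. It also assumes the [unix.usec] prefix that ping -D adds. The script name and function names are hypothetical. OpenSM does not log the year, so an interval that crosses New Year would come out wrong.

```python
import sys
from datetime import datetime


def opensm_time(line):
    """Parse the 'Mon DD HH:MM:SS usec' prefix of an OpenSM log line.

    The year is not logged, so strptime defaults to 1900; this is only
    correct for intervals that do not cross a year boundary.
    """
    month, day, clock, usec = line.split()[:4]
    return datetime.strptime(f"{month} {day} {clock} {usec}", "%b %d %H:%M:%S %f")


def master_to_subnet_up(log_lines):
    """Seconds from each 'Entering MASTER state' to the next 'SUBNET UP'."""
    durations, master = [], None
    for line in log_lines:
        if "Entering MASTER state" in line:
            master = opensm_time(line)
        elif "SUBNET UP" in line and master is not None:
            durations.append((opensm_time(line) - master).total_seconds())
            master = None
    return durations


def ping_gaps(ping_lines):
    """Find holes in 'ping -D' output as (last_seq, next_seq, seconds, lost)."""
    gaps, prev = [], None
    for line in ping_lines:
        line = line.strip()
        if not line.startswith("[") or "icmp_seq=" not in line:
            continue  # skip the PING header, statistics and other noise
        ts = float(line[1:line.index("]")])
        seq = int(line.split("icmp_seq=")[1].split()[0])
        if prev is not None and seq - prev[1] > 1:
            gaps.append((prev[1], seq, ts - prev[0], seq - prev[1] - 1))
        prev = (ts, seq)
    return gaps


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python failover_timing.py <opensm log excerpt> <ping -D output>
    with open(sys.argv[1]) as f:
        for d in master_to_subnet_up(f):
            print(f"MASTER -> SUBNET UP: {d:.1f} s")
    with open(sys.argv[2]) as f:
        for last, nxt, secs, lost in ping_gaps(f):
            print(f"no replies between icmp_seq {last} and {nxt}: {secs:.1f} s, {lost} lost")
```

On the excerpts in this comment, the script gives about 40.7 s for Jan 13. For the three Feb 09 runs it gives 71.5 s, 16.8 s and 16.4 s. It also reports a ping hole of about 43.5 s with 41 replies missing.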
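
Until the move to opensm 5.8.1, routing_threads_num has to be kept in sync in two files. A small pre-restart check could catch the two files drifting apart. Below is a minimal Python sketch. It assumes both files use the plain "key value" format of opensm.conf with # comments, which has not been verified for updn_lid_tracking.conf. The script name and function names are hypothetical, and the two file paths are passed as arguments.

```python
import sys
from pathlib import Path

# Keys that, because of the bug noted above, must currently be set
# identically in opensm.conf and updn_lid_tracking.conf.
SHARED_KEYS = ("routing_threads_num",)


def parse_conf(text):
    """Parse 'key value' lines into a dict, skipping blank lines and '#' comments.

    Assumption: both files use this simple whitespace-separated format.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def mismatches(a, b, keys=SHARED_KEYS):
    """Describe every shared key whose value differs (or is missing) between a and b."""
    return [
        f"{key}: {a.get(key)!r} vs {b.get(key)!r}"
        for key in keys
        if a.get(key) != b.get(key)
    ]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_shared_keys.py <opensm.conf> <updn_lid_tracking.conf>
    first, second = (parse_conf(Path(p).read_text()) for p in sys.argv[1:3])
    problems = mismatches(first, second)
    for problem in problems:
        print("MISMATCH", problem)
    sys.exit(1 if problems else 0)
```

The script exits non-zero on any mismatch, so it could gate an opensmd restart. After the upgrade, the key can simply be dropped from SHARED_KEYS.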
