Hello
We have a problem where a Python script using the OpenVINO Inference Engine in GPU mode causes Linux to enter a "zombie mode". The PC stops responding to anything: ACPI shutdown does not work, the screen freezes, and even the Magic SysRq keys have no effect. At the same time, more network traffic is generated than our 100 Mbit switches can handle. After some investigation, a problem with the Intel GPU driver seems likely. See https://software.intel.com/en-us/node/804844?page=0
Problem description:
We are running the attached Python script multiple times in separate Docker containers. The hardware is an Intel NUC7BNH, NUC7i7DNH, or NUC8BEH (no freeze has been observed on the NUC8 so far). The OS is Ubuntu 16.04 with either the patched kernel 4.7.0.intel.r5.0 or kernel 4.15.0-15-generic (freezes happen less frequently with kernel 4.15).
Linux freezes randomly after some time. On the NUC7i7DNH with the patched kernel 4.7.0.intel.r5.0, it happens after a few minutes; with the 4.15 kernel, it takes a few hours or even days until the freeze occurs. When it freezes, ACPI shutdown does not work, the screen freezes, and even the Magic SysRq keys have no effect. A strange side effect is that a lot of network traffic is generated, so much that the network goes down and no PC on the switch can communicate. The logs (kern.log, syslog) show nothing unusual.
If anyone has observed a similar problem or has an idea of what could cause this behaviour, please let me know.
Greetings,
Thomas