How to track down and fix network device issues
Last Update: March 22, 2025
One of the most frustrating network issues to track down is when there is a misbehaving device inside your network.
This can manifest as your entire network seemingly not working right, or perhaps even more frustratingly, just some devices aren't working right. Ever had your wife walk in and say, "The internet isn't working on my phone"? And you respond with, "My phone, laptop, and every other device seem to be working fine. It must be you." How'd that go for you?
The larger your network is, with more network switches, the more likely you are to run into the scenario where just some, but not all, devices are affected. If you're running a home lab, have a Plex, NAS, Minecraft, Home Assistant, and Pi-hole server (among others), then you can probably relate.
Since you're already monitoring your network with Latency Llama (you are monitoring your network with Latency Llama, right?! If you're not, your network issues get a lot harder to solve. Just saying.), you can visually see what part of your network is having issues. Those issues will show up as high latency, high packet loss, or both.
Quick detour on how monitoring works with Latency Llama: You install a small client program on a machine (Raspberry Pi, gaming rig, NAS, whatever), ideally hardwired, inside your network. You then specify other machines in your network (a.k.a. "targets") and Latency Llama will monitor network statistics (e.g. latency, packet loss) between the client and the targets. To get the best view of your network, the client machine should be hardwired directly to your router, but that isn't a requirement.
The targets can really be anything on your network, from access points to a Roku. To get visibility on your entire network, specify targets down each branch of your network relative to the client machine. In a typical network, you have a router connected to one or more larger network switches (often managed switches on larger networks), which fan out to small, unmanaged switches that end devices are plugged into (like that 8-port switch behind your TV that your TV, Roku, Xbox, PlayStation, and Switch are all plugged into). So your targets will be one device behind every one of those tiny unmanaged switches. Even if you have a smaller network, you probably have a few of them. Don't target every device in your house, since that just adds a bunch of traffic to your network and confuses things.
Even if you have primarily a wireless network, target your mesh nodes or access points. Pro-tip: use wired backhaul (i.e. use wires to connect your nodes instead of meshing) on your mesh nodes if you can; it'll greatly reduce latency on your network and reduce the likelihood of more exotic problems.
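To make the "one target per small switch" idea concrete, here is a minimal Python sketch. The topology dictionary and every device name in it are made up for illustration, and this is not how Latency Llama itself is configured; it only shows one way to reason about which devices to pick.

```python
# Hypothetical topology: each key is a router or switch, and each value
# lists what is plugged into it. All names are invented for illustration.
TOPOLOGY = {
    "router": ["core-switch"],
    "core-switch": ["tv-switch", "lab-switch"],
    "tv-switch": ["tv", "roku", "xbox"],
    "lab-switch": ["nas", "plex", "pi-hole"],
}


def pick_targets(topology, root="router"):
    """Return one end device behind every switch that has end devices."""
    targets = []
    stack = [root]
    while stack:
        node = stack.pop()
        children = topology.get(node, [])
        # Anything that is not itself a key in the topology is an end device.
        end_devices = [c for c in children if c not in topology]
        if end_devices:
            targets.append(end_devices[0])  # one per switch is enough
        stack.extend(c for c in children if c in topology)
    return sorted(targets)


print(pick_targets(TOPOLOGY))
```

With this example topology, the result is one device behind the TV switch and one behind the lab switch, rather than all six end devices.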
It's worth noting that everyone's go-to method for tracking down network issues, the internet speed test, would likely be useless for tracking down issues on your LAN.
Gotchas
The reason you need to start measuring latency BEFORE you have issues is to establish a baseline. Theoretically, latency should be the measure of the time data spends traveling through your network, but the CPU of your target devices also takes time to respond. This means that measured latency can sometimes appear higher than you're expecting for a given target device, even though everything is perfectly fine. If you know this baseline, it's easier to understand when issues arise.
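As a rough illustration of why a per-target baseline matters, here is a small Python sketch. The sample values, the `factor` and `floor_ms` thresholds, and the function names are all assumptions made up for this example; Latency Llama shows this visually, and this is not its API.

```python
from statistics import median


def baseline_ms(samples):
    """Median latency (ms) from samples taken while the network was healthy."""
    return median(samples)


def is_abnormal(latency_ms, baseline, factor=3.0, floor_ms=5.0):
    """Flag a reading that is well above this target's own baseline.

    factor and floor_ms are illustrative guesses, not recommended values.
    """
    return latency_ms > max(baseline * factor, baseline + floor_ms)


# Hypothetical healthy readings: a slow-CPU camera and a fast NAS.
camera_baseline = baseline_ms([38.0, 41.5, 40.2, 39.7, 44.1])
nas_baseline = baseline_ms([0.4, 0.5, 0.4, 0.6, 0.5])

# The same 45 ms reading means very different things for each target.
print(is_abnormal(45.0, camera_baseline))  # camera: within its normal range
print(is_abnormal(45.0, nas_baseline))     # NAS: far above its baseline
```

The point is that "45 ms" is only meaningful relative to what that specific target normally reports.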
You might think something like a high-end managed network switch makes a great target device to measure latency off of, but you may be surprised. Due to the way network switches work, data passing through the switch is crazy fast, but the switch itself has a really tiny CPU, so a ping request (the way latency is measured) can take a long time to respond to. That means that the measured latency of traffic going through your switches to devices beyond them can appear to be lower than the measured latency stopping at the switch, which logically doesn't make sense since going through the switch is a longer path. If you don't know what to expect out of your network, it's hard to know when something is wrong vs just unexpected.
Similar issues can be seen with other network devices like security cameras or other IoT devices. They have extremely low-powered CPUs, so they may have slow responses to pings when they are working just fine and there is no issue with your network.
Even the exact same computer running Windows vs Linux can give a different latency reading.
TLDR: Continuously measure your network with Latency Llama.
Solutions
While the issue can certainly be one of the network devices themselves, the more likely culprit is a misconfigured device connected to the network. It's largely trial and error to find the problem device. Basically, just disconnect smaller and smaller branches of your network until you're disconnecting individual devices to find the issues.
Between tests, you'll likely need to restart a bunch of stuff to get everything back to working order. Often just unplugging the problem device won't stop the issues once they've started.
Wireless devices complicate this process a lot. If you see an issue with one mesh node or access point and unplug it, the devices attached to it, including the misbehaving one, may just jump to another node or access point within range. So now the issues present somewhere else in your network.
Also, mesh nodes... mesh... meaning that unplugging the wired backhaul will just cause the node to attach via wireless meshing. A lot of modern access points will do this too. (In case you're wondering, there isn't a lot of difference between access points and mesh nodes these days. If they have wired backhaul, they're essentially access points. If they are using wireless backhaul, they are basically mesh nodes.)
For wireless, you generally need to completely power off devices to get them off your network. You can log into your access point, router, etc. to see what devices are attached to each mesh node/access point.
This is where the Latency Llama dashboard will become your best friend. You can visually see which branch of your network is misbehaving, so you're not having to wait around for something to feel wonky. You can directly see when the issue is presenting itself and often see generally where in your network as well.
Case Studies
If you've never actually tracked down a network issue, it can sometimes be hard to know what to look for (sadly, the answer is rarely 42), so here are some real-world examples to get you started.
Wireless Washing Machine
It's still not clear why every single device in our homes needs to have Wi-Fi, but resistance appears to be futile. (Wi-Fi-enabled IoT spoons are probably next, because that's something the world needs...)
This one was a little easier to track down because the washing machine always seemed to have issues connecting to the network (it was eventually just used as a dumb washing machine, but the Wi-Fi was left on), so it was an obvious place to start looking when the access point it was connected to started showing high latency. Despite never actually working as a "smart" device, the washing machine was still connected to the wireless network and apparently flooding it with junk traffic.
The steps to prove that this was the misbehaving device:
- Power off the washing machine as well as all the network hardware
- Turn everything on except the washing machine
- Watch the Latency Llama dashboard for a day to verify there are no network issues
- Turn on the washing machine and see what happens
In this case, within minutes, latency spiked to the access point the washing machine was connected to, so it was clear that this was indeed the problem.
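The before/after comparison in those steps can be sketched in a few lines of Python. Everything here is hypothetical: the sample readings are invented, `None` is used to mark a lost ping, and the thresholds are arbitrary. In practice the Latency Llama dashboard shows the same thing at a glance.

```python
from statistics import median


def summarize(samples):
    """Summarize ping results in ms; None marks a lost ping."""
    if not samples:
        raise ValueError("need at least one sample")
    replies = [s for s in samples if s is not None]
    loss_pct = 100.0 * (len(samples) - len(replies)) / len(samples)
    return {"median_ms": median(replies) if replies else None,
            "loss_pct": loss_pct}


def device_looks_guilty(before, after, latency_factor=3.0, loss_margin_pct=5.0):
    """Compare a quiet window (device off) with a window after powering it on.

    Assumes the 'before' window has at least one reply. The thresholds are
    illustrative guesses only.
    """
    b, a = summarize(before), summarize(after)
    if a["median_ms"] is None:  # every ping was lost after power-on
        return True
    latency_worse = a["median_ms"] > b["median_ms"] * latency_factor
    loss_worse = a["loss_pct"] > b["loss_pct"] + loss_margin_pct
    return latency_worse or loss_worse


# Hypothetical readings to the access point, before and after power-on.
before = [4.1, 3.9, 4.3, 4.0, 4.2, 4.1, 3.8, 4.0]
after = [4.2, 95.0, None, 120.3, 88.7, None, 101.4, 97.9]
print(device_looks_guilty(before, after))
```

Running the quiet window for a full day, as in the steps above, matters because it makes the "before" numbers trustworthy.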
The solution was just to go into the router settings and block the washing machine from being able to connect.
Satellite TV Box
For an unnamed satellite TV provider, there is a box sitting next to every TV to provide service to that TV. All the data comes in over a combination of Wi-Fi and coax. However, the boxes have ethernet ports.
A well-meaning individual (not to point any fingers... Grandpa) used that unnecessary ethernet port to plug the box into a nearby network switch. Turns out that ethernet port was completely unconfigured and was just dumping random electrical signals onto the network. You can probably guess how well that went. To all the other devices on the network, it basically looked like a bunch of junk data, and it caused all kinds of problems.
Fortunately, this was one of those situations where unplugging the misconfigured device would immediately fix the problem, so the troubleshooting steps were:
- Wait for the problem to manifest by watching the Latency Llama dashboard
- Then start unplugging while watching the dashboard, starting big by unplugging whole branches of the network and working down to individual devices
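The "start big, then narrow down" search above is essentially a walk down the network tree. Here is a minimal Python sketch of that logic. The topology, the device names, and the `healthy_without` callback are all hypothetical; in real life that callback is you unplugging something, watching the dashboard, and plugging it back in. It also assumes, as in this case, that unplugging the culprit fixes things right away.

```python
# Hypothetical topology; all names are invented for illustration.
TOPOLOGY = {
    "router": ["core-switch"],
    "core-switch": ["tv-switch", "lab-switch"],
    "tv-switch": ["tv", "roku", "sat-box"],
    "lab-switch": ["nas", "plex"],
}


def find_culprit(topology, node, healthy_without):
    """Unplug one branch at a time under `node` and narrow down.

    healthy_without(x) should return True if the network recovers while x
    (and everything behind it) is unplugged.
    """
    for child in topology.get(node, []):
        if healthy_without(child):
            if child in topology:
                # A switch: plug it back in and test what hangs off it.
                # If nothing behind it is to blame, suspect the switch itself.
                return find_culprit(topology, child, healthy_without) or child
            return child
    return None


def subtree(topology, node):
    """Every device at or behind `node`."""
    nodes = {node}
    for child in topology.get(node, []):
        nodes |= subtree(topology, child)
    return nodes


def simulated_check(culprit):
    """Stand-in for watching the dashboard: healthy once the culprit is unplugged."""
    return lambda unplugged: culprit in subtree(TOPOLOGY, unplugged)


print(find_culprit(TOPOLOGY, "router", simulated_check("sat-box")))
```

Testing whole branches first means far fewer unplug-and-wait cycles than testing every device one by one.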
Ubiquiti Unifi Devices
This was an interesting case where the network infrastructure itself was the problem, but not because it was broken or underpowered. Sometimes, if you can't isolate an end device as the problem, try removing/replacing network infrastructure like switches and access points/mesh nodes.
On a network with multiple Unifi devices (Dream Machine Pro, several U6 Pro access points, several Flex switches, and a bunch of cameras), the access points had an off-brand, unmanaged switch (all Unifi switches are at least L2 managed) between them and the router (UDM Pro). The network would degrade at some point within 48 hours of powering everything up. There seemed to be no logic to when or how the network would fall apart. While watching the Latency Llama dashboard, latency would spike to several random targets. It required rebooting all access points, managed switches, and the router to get everything working again.
Removing the unmanaged switch and sticking in a Unifi managed switch magically fixed the problem (multiple unmanaged switches were tried, but only the Unifi managed switch fixed the issue). Ubiquiti networking gear is excellent, so this is not a typical requirement. It could have been a bug in a specific version of Unifi software, or there could have been something else about the network architecture that caused the issue to manifest. Regardless, this is why removing basically everything in your network, piece by piece, is really important as you work to isolate any issues.
So Long, and Thanks for All the Llamas
Once you are using Latency Llama to visualize your network, you can see whether the changes you're making have an effect, and you don't have to rely on running ping from a command line or going by whether the network "feels" fixed.
Q.E.D.