PubNub Chat: a Cautionary Tale

PubNub advertises their chat as a feature-rich foundation for building scalable, real-time messaging experiences:
PubNub Chat enables developers to quickly build and scale fast, powerful live chat experiences without the hassle of managing real-time infrastructure, ensuring low latency performance.
Their JavaScript SDK promises to abstract away the technical complexity:
[…] Meant to be easy & intuitive to use as it focuses on features you would most likely build in your chat app, not PubNub APIs and all the technicalities behind them.
After thousands in overages and a mountain of user complaints, I can say with confidence: this isn’t true.
Rewinding to August 2024
This article was written on July 4th, 2025, but to understand our issues we need to go back to August 2024, when the warning signs first appeared. That month I opened a bug on the main repository of the PubNub JavaScript SDK highlighting a critical issue: refreshing the app spiked CPU usage to 99%.
The bug was trivial to reproduce with the PubNub Chat SDK, and it almost killed our application before it got off the ground. Our core application is built in Tauri, so users rely on their OS’s native webview instead of a bundled Chromium as in Electron.
Up to 20% of our users were on macOS, and something as simple as refreshing the application would immediately trigger a memory leak that consumed all of their system resources. Refreshes happen during version updates, wake-from-sleep events, or when Vite encounters chunk-loading errors.
Patching The Issue
We built a custom Tauri plugin to hook into navigation events and forcibly disconnect PubNub if the user refreshed the page. We hooked Control+R and then kicked off the disconnect, which had to be extremely thorough. This wasn’t just about disconnecting the SDK: we had to manually garbage collect every related listener and entity before allowing a refresh.
disconnect = () => {
  if (!this.chat) {
    return;
  }
  // Each stored function stops one stream or listener; call whichever exist.
  if (this.currentChannelParticipantListStreamDisconnectFunction) {
    this.currentChannelParticipantListStreamDisconnectFunction();
  }
  if (this.inviteListenerFn) {
    this.inviteListenerFn();
  }
  if (this.customStatusListenerFn) {
    this.customStatusListenerFn();
  }
  if (this.customEventListenerFn) {
    this.customEventListenerFn();
  }
  if (this.currentChannelListDisconnectFunction) {
    this.currentChannelListDisconnectFunction();
  }
  // Finally tear down the underlying SDK client and reset the local chat store.
  this.chat.sdk.destroy(true);
  this.chatStore.$reset();
};
These are all promises, so it was almost always a race. What’s going on under the hood? Will skipping the wait to make the refresh faster cause the leak? (Spoiler: yes, the leak still occurred randomly.) But who wants to wait for all these promises to resolve before refreshing?
Lastly, our users don’t necessarily know to open Activity Monitor, and should not be expected to.
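A minimal sketch of one way to bound that wait, assuming each teardown callback may return a promise. The helper name `teardownAll` and the timeout value are hypothetical and not part of any SDK. It waits for every teardown to settle but gives up after a fixed budget so the refresh is never blocked indefinitely:

```javascript
// Hypothetical helper: run every teardown callback, wait for the async ones,
// and stop waiting after `timeoutMs` so a refresh is never blocked for long.
async function teardownAll(teardownFns, timeoutMs = 1000) {
  const pending = teardownFns
    .filter((fn) => typeof fn === "function")
    .map((fn) => {
      try {
        // Promise.resolve lets sync and async teardown functions share one path.
        return Promise.resolve(fn());
      } catch (err) {
        return Promise.reject(err);
      }
    });

  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve("timeout"), timeoutMs);
  });

  // allSettled never rejects, so one failing teardown cannot hide the others.
  const settled = Promise.allSettled(pending).then(() => "done");
  const outcome = await Promise.race([settled, timeout]);
  clearTimeout(timer);
  return outcome; // "done" if everything settled in time, otherwise "timeout"
}
```

A refresh hook could await this and proceed either way. The trade-off is the one described above: a short budget keeps refreshes fast but leaves a window in which cleanup has not finished.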
Quick Sidebar: Abusive Overage Charges
Before diving deeper, it’s worth pausing to talk about the billing. I told them up front that we needed support for 3,000 active users. They sold us bundled transactions that “should have been enough.” To be clear, that may well be true on paper, but once you factor in our network environment and the issues caused by the SDK’s architecture, it was not enough. Every solution they pitched pushed complexity and cost-saving effort onto us. To quote their pitch again:
[…] Meant to be easy & intuitive to use as it focuses on features you would most likely build in your chat app, not PubNub APIs and all the technicalities behind them.
Security & Access Control
Security inside of PubNub is handled by their Access Manager, which issues a signed token that also carries the user’s permissions. This design has serious flaws, and PubNub’s own permission model contradicts itself.
Channel Permissions Are Horrible
If a user can invite, they can also kick. There is no separation of roles: both set channel members and remove channel members require the user to be granted Manage on the channel itself.
There are two approaches here: you can assign channels one at a time in the token with Manage, or you can use a wildcard regex as supported by the Access Manager. If you assign one channel at a time, the token grows to an unsustainable size after about 100 channels (see the HTTP Nonsense section), and YOU then need to track every single channel a user can join.
If you choose the regex route, you will not be able to add users to channels dynamically, because the regex approach simply cannot support it.
We handled channels on our own infra, naming them using channelId patterns to keep tokens small.
d.[myid].[other-user-id]
But this does not work for team channels. If you have a group of users and want them to add another user, the channel id must change. If the channel id changes, then everyone must be notified of this change, disconnect from the channel, and connect to the new channel id.
Additionally, if a user gets booted from a channel, the id must be reverted, or else the user can technically still join it, because it still exists in their token and they still have permissions.
For large team channels, this essentially means a free-for-all shit-show. There is no mechanism for inviting users to a large channel that controls who can invite or edit the channel. So again, your server has to do all the access control.
Keep in mind that for each channel you assign, you must also assign a -pnpres permission, because the built-in SDK automatically joins the presence channel. This, again, swells the token size. Theoretically, you could just let them join any *-pnpres channel. From a security perspective? Absolutely not.
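To make the growth concrete, here is a rough sketch that counts entries and bytes for a per-channel grant map. The grant shape, the permission names, and the `team.N` channel IDs are all hypothetical. A real token is a signed, encoded structure, so absolute sizes will differ and only the doubling from the presence entries carries over:

```javascript
// Hypothetical grant shape, used only to illustrate growth; the real token
// is a signed, encoded structure and will differ in exact size.
function buildChannelGrants(channelIds) {
  const grants = {};
  for (const id of channelIds) {
    grants[id] = { read: true, write: true, manage: true };
    // The presence companion channel needs its own entry.
    grants[`${id}-pnpres`] = { read: true };
  }
  return grants;
}

// Size of the serialized grant map in UTF-8 bytes.
function grantSizeInBytes(grants) {
  return new TextEncoder().encode(JSON.stringify(grants)).length;
}

const channelIds = Array.from({ length: 100 }, (_, i) => `team.${i}`);
const grants = buildChannelGrants(channelIds);
console.log(Object.keys(grants).length); // 200 entries for 100 channels
console.log(grantSizeInBytes(grants));
```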
Note: Bad Memberships Break The Entire SDK
If a user lacks permissions or a channel no longer exists, the SDK doesn’t degrade gracefully - it implodes. We implemented a second-tier validation that checks each membership against the token before joining; without it, the SDK crashes instead of simply skipping the problematic channel. I suspect this is because the underlying SDK adds it to the subscription and, since that subscription is run alongside others in the listener, the whole listener just terminates.
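A minimal sketch of that kind of pre-check, written against plain data rather than SDK objects. The `allowed` shape (exact channel IDs plus regex patterns mirrored from whatever the server put in the token) and the function name are assumptions for illustration:

```javascript
// Hypothetical shape: `allowed` holds exact channel IDs and regex patterns
// mirrored from whatever your server put into the user's token.
function splitJoinable(channelIds, allowed) {
  const exact = new Set(allowed.channels);
  const patterns = allowed.patterns.map((p) => new RegExp(p));
  const joinable = [];
  const skipped = [];
  for (const id of channelIds) {
    const ok = exact.has(id) || patterns.some((re) => re.test(id));
    (ok ? joinable : skipped).push(id);
  }
  return { joinable, skipped };
}

const result = splitJoinable(
  ["d.alice.bob", "team.ops", "team.deleted"],
  { channels: ["team.ops"], patterns: ["^d\\.alice\\..+$"] },
);
console.log(result.joinable); // d.alice.bob and team.ops
console.log(result.skipped); // team.deleted
```

Only the `joinable` list would be handed to the SDK; the `skipped` list can be logged or reconciled server-side.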
Naming Channels Is Awful
Naming is equally complicated in the context of chat. Users expect channel names to reflect who they’re talking to. Here’s what it takes to get a proper channel name in PubNub Chat:
- Get memberships (and iterate until you run out, because there is no real mechanism to order these)
- For each membership, get the members via channel.getMembers()
- Only now can you build the channel name, because the members are available
So, let’s say you have 100 channels in your application. Your application will:
- call getMemberships() to get all 100 channels (or call it multiple times)
- call getMembers() 100 times to get members for each of the 100 channels
At this point, you manually strip your own ID from each member list and construct names yourself. Could you substring out your own name? Sure. But you’d also need to update all channel names every time a username changes.
And to make that work, you’d need to subscribe to every channel with streamUpdatesOn, or else you’d never see the updates.
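The naming step and the request count can be sketched with plain data. Here `members` is assumed to be a list of `{ id, name }` records already fetched for one channel. The function names and the "Just you" fallback are made up for the example:

```javascript
// Plain data stands in for SDK objects here; `members` is assumed to be a
// list of { id, name } records already fetched for one channel.
function displayNameFor(members, ownUserId) {
  const others = members.filter((m) => m.id !== ownUserId);
  if (others.length === 0) return "Just you";
  return others.map((m) => m.name || m.id).join(", ");
}

// Request count for the flow above, assuming one page of memberships
// is enough: one memberships call plus one members call per channel.
function requestsToNameChannels(channelCount, membershipPages = 1) {
  return membershipPages + channelCount;
}

console.log(displayNameFor([{ id: "u1", name: "Ana" }, { id: "u2", name: "Ben" }], "u1")); // Ben
console.log(requestsToNameChannels(100)); // 101
```

Because the name is derived from member records, it goes stale whenever a member renames themselves, which is why the update subscription is needed on top.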
Joining Channels
So let’s assume that we’ve now made it this far - we have 100 channels and members for each channel.
In order to know if we get a message, we must watch() every single channel we have. That’s expected. It’s basic chat functionality. Just because I am not looking at a channel does not mean I don’t need to be notified of new messages on it.
Want to send a message? You can’t just watch, you have to join. Joining does three operations:
- Set the memberships
- Subscribe (to both the channel and -pnpres)
- Update the LastReadMessageTimetoken timestamp
From there, you can then send an actual message. But the crux of the issue is the watch mechanic, because it only watches the messages, not reactions or edits.
Watching Message Updates \ Reactions \ New Members
To get message updates or reactions, you have to set up a second watcher - on every individual message. This means you also have to clean these up - or risk more memory leaks, since they’re active subscriptions.
You can technically watch a chunk of messages, but if your application is getting new messages in a higher-velocity channel, you might as well just streamUpdatesOn the message(s) as they come in.
Message.streamUpdatesOn(messages, (messages) => {
  // This chunks them all btw - not just the ones that changed. RIP.
  console.log("Updated messages: ", messages)
})
To complicate it further, imagine you joined a channel and, while you were not looking, a message arrived and then received several emotes. Unless you watch() the message that just arrived, you’ll never know when you switch to that channel that a message with emotes was sent.
- When you join a channel and get the first page of history (max 25), you have to watch every message in the viewport yourself
- When you get a message on another channel you joined(), but have not looked at yet, you must also watch that message so if you were to switch to looking at that channel you don’t miss any actual message updates
- You can say “that sounds bad, I’ll just get message history again when I join the channel” (what we did, and it is kinda gross)
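If you do keep per-message watchers, something has to remember how to stop them. A minimal sketch of such bookkeeping, assuming each watcher hands back a function that stops it (the pattern the disconnect code earlier relies on). The `WatcherRegistry` class is hypothetical and not part of the SDK:

```javascript
// Hypothetical registry: stores the stop function returned by each watcher
// so a whole channel's watchers can be released in one call.
class WatcherRegistry {
  constructor() {
    this.byChannel = new Map(); // channelId -> Set of stop functions
  }

  add(channelId, stopFn) {
    if (!this.byChannel.has(channelId)) this.byChannel.set(channelId, new Set());
    this.byChannel.get(channelId).add(stopFn);
  }

  // Stops every watcher for one channel and returns how many were stopped.
  release(channelId) {
    const stops = this.byChannel.get(channelId);
    if (!stops) return 0;
    let released = 0;
    for (const stop of stops) {
      try {
        stop();
        released += 1;
      } catch (err) {
        // One failing stop function should not keep the others alive.
        console.warn("failed to stop watcher", err);
      }
    }
    this.byChannel.delete(channelId);
    return released;
  }

  releaseAll() {
    let total = 0;
    for (const channelId of [...this.byChannel.keys()]) total += this.release(channelId);
    return total;
  }
}
```

Calling `release(channelId)` when the user navigates away from a channel, and `releaseAll()` before a refresh, keeps the cleanup in one place instead of scattered across components.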
Handling New Members
I left out a really important part above: if we also want to know when a member joins a channel, we need to watch memberships on the channel. Given how many watchers we already have (and have to clean up by hand), this is feeling a bit gross. Ultimately, we just watched for messages, added those same messages to the Message Watch, and if we did not recognize the user_id on the message, we dynamically fetched the user from PubNub.
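The fetch-on-miss part can be sketched as a small cache that also de-duplicates concurrent lookups, so a burst of messages from one unknown user triggers a single request. `fetchUser` is an injected, hypothetical async function and not a specific SDK call:

```javascript
// `fetchUser` is injected and hypothetical: any async function that takes a
// user ID and resolves to a user record.
function createUserResolver(fetchUser) {
  const cache = new Map(); // userId -> Promise of a user record

  return function resolveUser(userId) {
    if (!cache.has(userId)) {
      const request = fetchUser(userId).catch((err) => {
        cache.delete(userId); // allow a retry on the next message
        throw err;
      });
      cache.set(userId, request);
    }
    return cache.get(userId);
  };
}
```

Caching the promise rather than the resolved value is what collapses simultaneous lookups into one. The cache never expires entries here, so renamed users would still need an update mechanism.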
PubNub Chat Is Fundamentally Broken \ HTTP Nonsense
PubNub Chat relies on HTTP, and their SDK makes zero effort to handle URL size limits. Once users join too many channels, things just break. The problem gets worse with Access Manager, which appends massive tokens to each request URL.
How bad it gets depends on how you approached the previous issues (Access Manager, channel ID conventions). Depending on your channel ID structure, you hit the limit fast.
Adding Channel Group Support
To deal with the above issue, I ended up extending the SDK and implementing channel groups myself (let’s be real: this should have been done by their SDK team). I dynamically detect the size of the URL so the app does not implode. However, this also hit the size limit, as we have users who talk to several hundred people throughout the course of their duties in a given month (think HR, double digits per month). They need to be able to be messaged, so we can’t simply have disappearing channels. Likewise, we need message history.
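A minimal sketch of the size-budget idea (not the production code): split channel IDs into batches whose comma-joined, URL-encoded form fits a byte budget. The function name is hypothetical, the budget is something you would measure yourself from base URL plus token length, and counting each comma as `%2C` is a deliberately pessimistic assumption:

```javascript
// Splits channel IDs into batches whose comma-joined, URL-encoded form stays
// under `maxBytes`. The budget is an assumption: pick it from your own
// measurements of base URL + token length against the limit you hit.
function chunkChannelsByUrlBudget(channelIds, maxBytes) {
  const batches = [];
  let current = [];
  let currentBytes = 0;
  for (const id of channelIds) {
    // encodeURIComponent output is ASCII, so its length equals its byte size.
    const idBytes = encodeURIComponent(id).length;
    const separatorBytes = current.length > 0 ? 3 : 0; // "," encodes to "%2C"
    if (current.length > 0 && currentBytes + separatorBytes + idBytes > maxBytes) {
      batches.push(current);
      current = [];
      currentBytes = 0;
    }
    currentBytes += (current.length > 0 ? 3 : 0) + idBytes;
    current.push(id);
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

console.log(chunkChannelsByUrlBudget(["aaaa", "bbbb", "cccc"], 11));
// two batches: ["aaaa", "bbbb"] and ["cccc"]
```

An ID that is larger than the budget on its own still gets a batch to itself, so the caller should treat an oversized single-ID batch as an error case.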
Channel groups solved it for about a month before things started breaking again, because there is a limit of 10 groups. In practice, we could join ~140 channels. PubNub claims you can join thousands. Technically you can, but that claim deserves a dozen asterisks for technical considerations.
The Issue Of Unstable Connections
PubNub handles unstable internet poorly. As you can see, during the boot phase of the application the app makes dozens of HTTP calls just to reach a usable state.
- Authenticate and connect
- Get all memberships()
- Get all members in the channels with memberships()
- Get history for the channel we’re looking at
- Join() the current channel
- Add more subscribers for message edits on the messages we can see
- Subscribe to participant channel changes
In our core regions, network instability is common, prompting users to keep backup connections. We also switch VPNs on and off heavily, which means interruptions are common. PubNub does have a linear backoff + retry mechanism, but given the number of HTTP calls going on here, we were not able to stabilize the connection at all for users with unstable connectivity.
For instance, a PNTimeoutCategory error would occur during one of these operations and, depending on where it occurred (particularly getMemberships()), we would be stuck failing to boot our entire application and need to manually wait and retry. If during that time the underlying connection to some channels failed (common), the application would broadcast a “giving up” status.
$2800.00 In Overages
This is where most of our overages piled up. Even though these calls were timing out, PubNub reported that nearly half of all our overages were in the getMembers() operation. The only thing I can assume is that the linearRetryPolicy was kicking in and re-issuing the getMembers() operation before giving up.
What makes this particularly difficult to debug is that our logs tell a different story from what support communicated to us directly. Our logs show a PNTimeoutCategory for most operations, which suggests that PubNub was still accepting those calls and billing us for them.
[ERROR] Failed to get channel members
[ERROR] {"status":{"error":true,"category":"PNTimeoutCategory","statusCode":0,"errorData":"REDACTED"},"name":"PubNubError"}
[ERROR] Failed to get channel members
[ERROR] {"status":{"error":true,"category":"PNTimeoutCategory","statusCode":0,"errorData":"REDACTED"},"name":"PubNubError"}
[ERROR] Failed to get channel members
[ERROR] {"status":{"error":true,"category":"PNTimeoutCategory","statusCode":0,"errorData":"REDACTED"},"name":"PubNubError"}
[ERROR] Failed to get channel members
[ERROR] {"status":{"error":true,"category":"PNTimeoutCategory","statusCode":0,"errorData":"REDACTED"},"name":"PubNubError"}
[ERROR] Failed to get channel members
[ERROR] {"status":{"error":true,"category":"PNTimeoutCategory","statusCode":0,"errorData":"REDACTED"},"name":"PubNubError"}
[ERROR] Failed to get channel members
[ERROR] {"status":{"error":true,"category":"PNTimeoutCategory","statusCode":0,"errorData":"REDACTED"},"name":"PubNubError"}
[ERROR] Failed to get channel members
[ERROR] {"status":{"error":true,"category":"PNTimeoutCategory","statusCode":0,"errorData":"REDACTED"},"name":"PubNubError"}
[ERROR] Failed to get channel members
[ERROR] {"status":{"error":true,"category":"PNTimeoutCategory","statusCode":0,"errorData":"REDACTED"},"name":"PubNubError"}
[ERROR] Failed to get channel members
[ERROR] {"status":{"error":true,"category":"PNTimeoutCategory","statusCode":0,"errorData":"REDACTED"},"name":"PubNubError"}
[ERROR] Failed to get channel members
[ERROR] {"status":{"error":true,"category":"PNTimeoutCategory","statusCode":0,"errorData":"REDACTED"},"name":"PubNubError"}
[ERROR] Failed to get channel members
[ERROR] {"status":{"error":true,"category":"PNTimeoutCategory","statusCode":0,"errorData":"REDACTED"},"name":"PubNubError"}
[ERROR] Failed to get channel members
const linearRetryPolicy = LinearRetryPolicy({
  delay: 5000,
  maximumRetry: 10,
});
const [chatInitError, chat] = await withCatch(
  Chat.init({
    // ...
    restore: true,
    listenToBrowserNetworkEvents: true,
    retryConfiguration: linearRetryPolicy,
    // needed or else Safari will hang up the connection if nothing is happening for 60 seconds
    subscribeRequestTimeout: 55
  }),
);
This user has 300 channels, sorted into 10 channel groups. We still need to iterate over every channel to build an accurate member list. We could cache some of it, but the other user could have changed their chat name, so we must loop and query them all. We also have no way of knowing where the instability was, and having our app remember which call failed seems like a crazy approach.
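Some back-of-the-envelope arithmetic for why this adds up quickly. It rests on an unconfirmed assumption, the same one as the suspicion above: every timed-out call is retried up to maximumRetry times and each attempt counts as a separate request. The function name is made up for the example:

```javascript
// Worst-case request count if every call fails and is retried to the limit.
// Assumes each attempt counts as a separate request, which matches the
// suspicion above but is not confirmed.
function worstCaseRequests(channelCount, maximumRetry) {
  return channelCount * (1 + maximumRetry);
}

console.log(worstCaseRequests(300, 10)); // 3300
console.log(worstCaseRequests(300, 0)); // 300
```

Under that assumption, a single bad boot for this one user could account for 3,300 member lookups instead of 300.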
The JavaScript Chat SDK is basically abandoned
I opened several issues between August and November (example, example 2, leak issue). None were fixed.
So what did PubNub do instead, across three quarters? They deprecated the JavaScript Chat SDK and started over in Kotlin. Still no channel groups as of today.
What Now?
We migrated to GetStream.io and ripped out nearly all the code we wrote to manage, recover, and clean up PubNub connections.
Factoring in nearly $3,000 in PubNub overages, GetStream pays for itself - even at a higher base price.