Pathing Analysis in Fullstory - See The What and Why

Fullstory Data Scientist, Greg Larchev, and Solutions Engineer, Patrick Brandt, demonstrate how to use Pathing Utils to apply pathing analysis to data from Fullstory Data Export.

For e-commerce managers, pathing analysis is critical for understanding the steps users take to either convert or fall off. It can also be applied beyond e-commerce. In fact, we recently used it on our engineering jobs page, as described in "How to answer the what and the why using Fullstory data export."

In this video, Greg and Patrick demonstrate how you can use Pathing Utils to dial in on e-commerce conversions, and will:

Give you an overview of Pathing Utils and Fullstory Data Export.
Use a Jupyter notebook to walk you through how to apply pathing analysis to e-commerce data from OODAtime.com.
Discuss how user journeys can be further analyzed with Session Replay to find revenue-impacting points of friction.

You can find Greg’s Jupyter notebook and supporting Python packages on the Fullstory GitHub.


Read the Transcript

Patrick Brandt: Hi, I'm Patrick. I'm a solutions engineer here at Fullstory, and today I'm joined by Greg from our data science team.

Greg Larchev: Pleasure to be here today.

Patrick Brandt: Today Greg and I are going to discuss an open source project he created that demonstrates how Fullstory data can be paired with pathing analysis to visualize how users travel through websites or mobile apps. Before we begin, I want to make sure everyone understands what Fullstory is. Fullstory is an analytics platform that records user experiences in websites and mobile apps. It aggregates this data so that it can be analyzed further and used to build analytics reports, and it makes individual session replays searchable. Our customers can use analytics to recognize where their apps or sites might not be performing well, then dive right in and watch individual session replays to see where the friction in the user experience is coming from, so they know exactly what they need to fix.

Patrick Brandt: We also have a product called Data Export that allows our customers to export the raw event data out of Fullstory and land it in their data analytics platform of choice. That might be a data warehouse like Redshift or BigQuery, or their file system. I should also add that Fullstory provides robust privacy controls, so that our customers can ensure their users' data is protected. Greg, first, tell us where you got your data from.

Greg Larchev: A lot of our customers are e-commerce companies, and those are the ones who are particularly interested in pathing analysis. They want to see what sort of journeys their users take through their websites. In order to simulate some of that data, we've put together a fake e-commerce site called OODAtime.com. It pretends to sell watches, clocks, and other timepieces. Even though you can't actually buy anything there, it generates some very neat pathing data for us.

Patrick Brandt: For those in the audience who would like to visit a fake e-commerce site and not purchase anything, you can go to OODAtime.com. Just know that your data will be used for research purposes. It's worth mentioning that, in order to generate a viable data set for us to do research on, we worked with another company to script bots that generate traffic on the site. The vast majority of the data we'll be looking at today comes from those bots, but we have also interspersed some human activity. So we have our data source. Tell me how you got the data out of Fullstory and where you put it.

Greg Larchev: We have another open source tool called Hauser. It was created and is maintained by Fullstory, and it's part of the Data Export package you talked about. Hauser allows you to export your Fullstory session data to a warehouse of your choice. It supports Redshift and BigQuery; in this case, we just used it to download some of that session data to a local machine. A slice of that data is also included with the open source package we provided.

Patrick Brandt: Okay, great. You used Hauser to export data out of Fullstory, and you landed it as JSON files in your local file system. We've stored some of these files on GitHub for the purposes of this project. I think there are about 26,000 events represented in these files. Cool. Now that we have the data, what are some of the features you're looking for? Where do you begin?

Greg Larchev: An e-commerce customer would typically be interested in, like you mentioned, the paths or journeys their users take through their websites. That is well represented by something called a funnel. A funnel is just a collection of URLs that a user would typically navigate through in succession. Each step of a funnel represents a single URL that a user would navigate to.

Patrick Brandt: All right, great. A good way to think about a funnel is as a path through a website that has some kind of business value. If you're an e-commerce site, the first step of the funnel would be a user hitting the homepage, OODAtime.com in this case. The second step might be someone viewing a product detail page, like the red watch. The third and final step in our example would be someone going to the cart, checking out, and paying for their watch. You would expect fewer people to get through at each step of the funnel: 100% of your audience would go to OODAtime.com, a smaller percentage would reach the second step and look at the red watch product detail page, and fewer still would go through the transaction and get to the cart. When you stack these three steps on top of each other, they take the shape of a funnel. Cool. Now we know what funnels are and where the data is. Let's look at the Jupyter notebook you put together.

Greg Larchev: Sounds good.

Patrick Brandt: This is the pathing demo notebook that Greg built. We're going to step through the blocks one at a time and discuss what's going on. In the first block, I noticed that you're loading some very common libraries that data analysts use when they're writing Python to understand data: Matplotlib, NumPy, and Pandas. We'll spend a lot of time with those as we go through this notebook. I also noticed that you're importing a lot of modules from the Path Utils package. What are some of the things you provide through these modules?

Greg Larchev: Sure. The Path Utils package is the all-encompassing package for this project. Each module represents a different set of capabilities for the funnel and path analysis you'd like to perform. You can look at the most popular URLs, get statistics for a single funnel, plot a Sankey Diagram, get session links for a funnel of interest, do some timing analysis, and so on.

Patrick Brandt: Wow, that is awesome. You've got a lot going on here. In the next block, it looks like you're just pointing to where the JSON files exist; in this case, the sample data directory in this project. Let's load those JSON files into memory. We've got our handful of files there, and again, this represents about 26,000 discrete events. If we look at the first 15 of them, we'll see some patterns. First, I want to cover some of the fields that are inherent to Data Export. This is essentially the Data Export schema: all of these column names that start with a capital letter come out of the Data Export tool. You'll notice that we have a handful of IDs here. We're going to circle back to those in a second.
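For readers who want to reproduce the loading step outside the notebook, here is a minimal sketch that reads a directory of exported JSON files into one Pandas DataFrame. It assumes each file holds a JSON array of event objects keyed by Data Export column names; the function name and the "sample_data" directory are illustrative, not the Pathing Utils API.

```python
import glob
import json
import os

import pandas as pd


def load_export_files(directory: str) -> pd.DataFrame:
    """Load every *.json file in `directory` into a single DataFrame.

    Assumes each file holds a JSON array of event objects whose keys are
    Data Export column names (e.g. SessionId, UserId, EventType).
    """
    frames = []
    for path in sorted(glob.glob(os.path.join(directory, "*.json"))):
        with open(path, encoding="utf-8") as fh:
            frames.append(pd.DataFrame(json.load(fh)))
    if not frames:
        return pd.DataFrame()  # no matching files found
    return pd.concat(frames, ignore_index=True)


# Hypothetical directory name; point it at wherever your export landed.
# events = load_export_files("sample_data")
# events.head(15)
```

Concatenating with ignore_index=True gives each event a fresh row number, which keeps later grouping steps simple.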

Patrick Brandt: There's an event start column, which is the timestamp at which the event occurred, and an event type column that represents the several different event types we record. Then there's some data about the page itself, page performance, and the URL on which the event occurred. If you want to dive deep into what these fields are and what they do, you can follow the link in the notebook to the API reference and scroll down to the data dictionary. There's a description of each field and its type. The event type field in particular has a lot of detail about the events we record, and some of these events are actually enrichments. We'll talk about that more in a minute, but to get really familiar with Data Export, it'd be a good idea to spend a lot of time on this page. All right. It looks like you've introduced some other fields here. This looks like a multi-index data frame, so you've got some records rolling up under another ID, the SID. What is that? How did you create it?

Greg Larchev: Sure. As you can see here, every row of the data frame represents a single event, and each of those has several columns associated with it. Some of those are the session ID and user ID, a set of unique identifiers for each event. A user, for instance, might have several sessions associated with them. What we do here is combine the session ID and the user ID for each event into a single master index. This is what our SID value represents.
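The combined session/user key Greg describes can be sketched in Pandas as follows. This assumes the Data Export columns are named UserId, SessionId, and EventStart; the "sid" column name and the "UserId:SessionId" string format are illustrative choices, not necessarily how Pathing Utils builds its index.

```python
import pandas as pd


def index_by_sid(events: pd.DataFrame) -> pd.DataFrame:
    """Group events under a combined user/session key, in time order.

    The "sid" name and "<UserId>:<SessionId>" format are illustrative,
    not necessarily what Pathing Utils uses.
    """
    out = events.copy()
    out["sid"] = out["UserId"].astype(str) + ":" + out["SessionId"].astype(str)
    # (sid, EventStart) MultiIndex: one group per session, events sorted by time.
    return out.set_index(["sid", "EventStart"]).sort_index()
```

With this index, `df.loc["<sid>"]` returns one session's events in chronological order.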

Patrick Brandt: Got it. If I look at the code here, you've got this preproc events method. Within it, you're creating a compound key of session ID and user ID and then grouping all of the discrete events under that key. Those are the events a particular user generated as they went through their session. Cool. Since we're only concerned with analyzing funnels at the moment, we want to strip out any events that aren't navigation. You've included a remove non-navigation method. If I run that, we'll see that the only event type left in this data frame is the navigate event type. It also happens that we get a few more groups of sessions since we've limited the amount of data we're working with. Cool. Let's get an idea of how we can start to use this data. You've created your first visualization here, plotting the most visited URLs. How would I use this if I'm interested in exploring and discovering particular funnels and user journeys?
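The remove-non-navigation step, followed by collecting each session's page sequence, might look like this minimal sketch. It assumes an EventType value of "navigate" and columns named UserId, SessionId, EventStart, and PageUrl; the function is a hypothetical helper, not the notebook's remove non-navigation method.

```python
import pandas as pd


def navigation_paths(events: pd.DataFrame) -> dict:
    """Map (UserId, SessionId) to that session's ordered list of page URLs.

    Keeps only rows whose EventType is "navigate" (assumed value), then
    orders each session's navigations by EventStart.
    """
    nav = events[events["EventType"] == "navigate"]
    nav = nav.sort_values(["UserId", "SessionId", "EventStart"])
    return nav.groupby(["UserId", "SessionId"])["PageUrl"].apply(list).to_dict()
```

Reducing each session to an ordered list of URLs is a convenient shape for the funnel counting that follows.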

Greg Larchev: Sure. If you're, for instance, a product manager for OODAtime.com and you're looking for a good place to start diving into your data, this would probably be it. The first thing you might want to figure out is which URLs on your website your users are actually visiting. This histogram gives you that information: it ranks the most frequently visited URLs on your site. As you would expect, the homepage is the most frequent one, which is probably the case for most websites. From here, you can pick some of the interesting URLs, then proceed to the next step and create some interesting funnels with them.
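The ranking behind this histogram can be approximated with `Series.value_counts`, assuming the same EventType and PageUrl column names as the Data Export schema. This is a sketch of the idea, not the Pathing Utils plotting function.

```python
import pandas as pd


def top_visited_urls(events: pd.DataFrame, n: int = 10) -> pd.Series:
    """Rank URLs by number of navigation events, most visited first."""
    nav = events[events["EventType"] == "navigate"]  # assumed event type value
    return nav["PageUrl"].value_counts().head(n)


# top_visited_urls(events).plot.barh()  # horizontal bar chart via Matplotlib
```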

Patrick Brandt: All right, that's great. Let's do that. I noticed that you picked out one of these URLs, the cart path. If I run this code block, I see a collection of URL triplets. What am I looking at here?

Greg Larchev: Sure. One of the things our customers often ask us is, "What makes a good funnel, and how do I create one?" This code block helps you do that. Let's go back to our product manager example. We chose a cart URL here because it's pretty important: this is where your customers actually buy your products and pay for them. You might want to look at where the users who visit the cart come from and where they go. This code block helps you find those funnels of interest. Inside, we take the original data frame and filter it to only include the sessions that contain the cart URL, and then we look at all the possible funnels that include it. We aggregate them all together, and this is the list you're seeing here.
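One way to enumerate and rank candidate funnels around a target URL is sketched below in plain Python. It assumes a funnel is a run of consecutive page views within a session and counts each distinct funnel at most once per session; Pathing Utils may define and count funnels differently.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple


def funnels_containing(
    paths: Dict[Hashable, List[str]], target: str, length: int = 3
) -> List[Tuple[Tuple[str, ...], int]]:
    """Rank every run of `length` consecutive URLs that includes `target`.

    `paths` maps a session key to its ordered list of visited URLs.
    Each distinct funnel is counted at most once per session.
    """
    counts: Counter = Counter()
    for urls in paths.values():
        if target not in urls:
            continue  # only sessions that reach the target URL
        seen = set()
        for i in range(len(urls) - length + 1):
            window = tuple(urls[i : i + length])
            if target in window:
                seen.add(window)
        counts.update(seen)
    return counts.most_common()  # most frequent funnels first
```

Because the target can sit anywhere in the window, this covers funnels where the cart is the first, second, or third step.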

Patrick Brandt: Got it. Basically, you look at all the permutations involving the cart URL that are three levels deep. The cart could come at the top of the funnel, or it could be the second or third step. Then you analyze which of those permutations happen most often and stack rank them. To see this example in action: the most popular funnel using the cart URL was a journey where a user went to the men's collection, then to the red watch product from the men's collection, and then to the cart page from the red watch. Then you've got a few other funnels here, in decreasing popularity. In the next code block, it looks like you've plucked out one of these funnels, the one that goes through the blue watch, which is the fourth most popular funnel, and you're running it through a get funnel stats function. If I run this, it looks like another histogram. What am I looking at?

Greg Larchev: Right. This is the actual representation of the funnel we chose. We chose the blue watch funnel because, like you said, it's the fourth most common funnel in the list above. If you're a product manager, maybe you want to figure out why the blue watch product doesn't convert as well as some of the other ones, so you want to dive a little deeper into it. Here's what we're looking at: the first step of the funnel is the men's collection, which is everybody who visits the men's collection product page. It looks like about a third to a half of those users proceed to the blue watch product, and then almost all of them, in this case, go ahead and proceed to the cart.
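The per-step counts in a funnel histogram like this can be sketched as follows. This version assumes a session reaches step k when the first k funnel URLs appear in order in its navigation path, even with other pages in between; whether get funnel stats uses the same rule is not stated in the discussion.

```python
from typing import Dict, Hashable, List, Sequence


def funnel_counts(paths: Dict[Hashable, List[str]], funnel: Sequence[str]) -> List[int]:
    """Count how many sessions reach each step of `funnel`, in order.

    A session reaches step k if funnel steps 1..k appear in that order
    (not necessarily back to back) in its list of visited URLs.
    """
    counts = [0] * len(funnel)
    for urls in paths.values():
        step = 0
        for url in urls:
            if step < len(funnel) and url == funnel[step]:
                step += 1  # greedy match finds the longest reachable prefix
        for k in range(step):
            counts[k] += 1
    return counts
```

For example, with `paths = {"s1": ["/men", "/blue", "/cart"], "s2": ["/men", "/blue"], "s3": ["/men", "/red"]}`, calling `funnel_counts(paths, ["/men", "/blue", "/cart"])` returns `[3, 2, 1]`, and dividing each count by the first gives step-by-step conversion.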

Patrick Brandt: Yeah. Side note: the fact that there isn't a bigger drop-off between the blue watch product page and the cart is a function of the way we scripted the bots. In a real-world scenario, you would expect a much bigger drop-off. We have a slight drop-off because a few bots decided not to purchase. To me, this is the most exciting part of the project you built. Getting back to your example, Greg, of a product owner who wants to know why they're not getting more traffic on blue watches: maybe I've run a promotion or spent money on media, and I'm really trying to drive traffic down this specific path. By generating a Sankey Diagram using your plot funnel method here, I can start to see where else that expected traffic went. Talk to me a little bit about how you made this and how I, as a product owner, might use it.

Greg Larchev: Sure. We used a Python visualization library called Plotly. It's a very powerful visualization tool and is great for producing charts, and this Sankey Diagram is one example of what it can do. As you can see here, the original funnel is still shown in pink, and each node still represents the relative size of each funnel step, but the diagram provides quite a bit of additional information. You can see where users come into your funnel and where they go from each step of the funnel, which gives you more context around your funnel conversion. For instance, here you can see that once they navigate to the men's collection, a larger percentage of your users proceed to the red watch product rather than the blue watch.
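The inflow and outflow around a funnel step, which is the data a Sankey Diagram draws as links, can be tallied with a sketch like this. The "(session start)" and "(session end)" labels are illustrative placeholders. The resulting counts could then be turned into node indices and values for Plotly's `go.Sankey` link `source`, `target`, and `value` arrays; that wiring is left out here and is not necessarily how plot funnel builds its chart.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple


def step_flows(paths: Dict[Hashable, List[str]], step_url: str) -> Tuple[Counter, Counter]:
    """Count where visitors to `step_url` came from and where they went next."""
    inflow: Counter = Counter()
    outflow: Counter = Counter()
    for urls in paths.values():
        for i, url in enumerate(urls):
            if url != step_url:
                continue
            # Previous page, or a placeholder if the session started here.
            inflow[urls[i - 1] if i > 0 else "(session start)"] += 1
            # Next page, or a placeholder if the session ended here.
            outflow[urls[i + 1] if i + 1 < len(urls) else "(session end)"] += 1
    return inflow, outflow
```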

Patrick Brandt: Right. That would be a really interesting discovery for me. I wouldn't expect a lot of folks to go to my red watch if I'm running a blue watch campaign. I'd now like to explore why this might be happening, and using the Fullstory Session Replay capability I mentioned at the top of the conversation, I can create some deep links to sessions to watch exactly what was going on. Very quickly, walk me through how you're creating these links.

Greg Larchev: Absolutely. You wouldn't be a Fullstory user if you didn't want to actually see some sessions. Here, we're taking our blue watch funnel and using this code block to generate session links for it. As we mentioned earlier, each event has a session ID and a user ID associated with it. We also provide something called an org ID, which is the internal Fullstory identifier for each of our customers. Using those three parameters and the session URL template, we can generate links to the sessions that include our funnel of interest.
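As Patrick notes next, building the link is string templating over three parameters. The sketch below assumes a URL shape of the form `https://app.fullstory.com/ui/<org_id>/session/<user_id>:<session_id>`; treat that template as an assumption and compare it with a session link copied from your own account before relying on it.

```python
from typing import Iterable, List, Tuple

# Assumed session replay URL shape; verify against a link from your account.
SESSION_URL_TEMPLATE = "https://app.fullstory.com/ui/{org_id}/session/{user_id}:{session_id}"


def session_links(org_id: str, sessions: Iterable[Tuple[str, str]]) -> List[str]:
    """Build one replay link per (user_id, session_id) pair."""
    return [
        SESSION_URL_TEMPLATE.format(org_id=org_id, user_id=user_id, session_id=session_id)
        for user_id, session_id in sessions
    ]
```

The `sessions` argument would typically be the keys of sessions already filtered to the funnel of interest.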

Patrick Brandt: Right. The org ID is like an account ID; each of our customers has one that's unique to their account. At that point, it's just a matter of concatenating a few parameters into a string template, and you've got a session replay URL. In the next block, you're doing something really interesting: we're stepping outside of navigation events. You're interested in finding the session replays where someone produced what we call a frustration event. As part of our analysis of the event data that streams into Fullstory, we apply some heuristics that detect when users may be unhappy or otherwise demonstrating frustration. We've got things like a Dead Click, which means someone clicked on something and nothing happened. We've got an Error Click, where someone clicked on something and we detected a JavaScript runtime error. Then we've got Rage Clicks, a heuristic that detects someone rapidly clicking on a button or some spot in the website or app. We surface these separately. Looking at this block, it looks like that's exactly what you're doing. You've defined a Rage Click type, and you've defined a function that takes a funnel and a click type. Walk me through how you're doing this analysis.

Greg Larchev: Right. Like you mentioned, a lot of our customers are specifically interested in our frustration heuristics because this really is the one thing that can tell you that there is something wrong with your website or a portion of your website. We just thought it would be a good idea to incorporate that into our final analysis. Let's say in this case we still are interested in our blue watch funnel, but we only want to see the sessions that include a Rage Click in this case. What's happening underneath the hood is we're filtering the original full data frame, the one that includes all of the events to only include the sessions which contain a Rage Click. Then we go ahead and generate the session links for the sessions, which do have the blue watch funnel that we've provided earlier.
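The two-stage filter Greg describes (sessions with a Rage Click first, then sessions containing the funnel) can be sketched in plain Python. Events here are simplified `(event_type, page_url)` pairs, and the "rage_click" and "navigate" labels are hypothetical; in real Data Export rows, frustration signals are represented by specific fields documented in the data dictionary.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Event = Tuple[str, str]  # (event_type, page_url); labels are illustrative


def contains_in_order(urls: Sequence[str], funnel: Sequence[str]) -> bool:
    """True if the funnel URLs appear in `urls` in order (gaps allowed)."""
    it = iter(urls)
    return all(step in it for step in funnel)  # `in` consumes the iterator


def sessions_with_frustration(
    sessions: Dict[Hashable, List[Event]],
    funnel: Sequence[str],
    click_type: str = "rage_click",  # hypothetical event label
) -> List[Hashable]:
    """Keys of sessions that contain `click_type` and complete `funnel`."""
    matches = []
    for key, events in sessions.items():
        if not any(etype == click_type for etype, _ in events):
            continue  # stage 1: keep only sessions with the frustration event
        nav_urls = [url for etype, url in events if etype == "navigate"]
        if contains_in_order(nav_urls, funnel):  # stage 2: funnel match
            matches.append(key)
    return matches
```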

Patrick Brandt: All right, cool. Let's dive into one of these. This was the first link you generated. I'll pause real quick and give a quick tour of Fullstory. The session replay is here in the middle of the screen; this is what the user saw. Over here on the right side is our interpretation of the events, as the user produced them. If I play this, we'll see that someone is scrolling around, looking at watches, and then bam, bam, bam: on the right-hand side of the screen I'm seeing a lot of clicks happen in rapid succession, and ultimately we have a Rage Click.

Patrick Brandt: If I go back just a little bit, I can tell by watching the session replay that someone was clicking on this watch face multiple times, expecting it to do something, and nothing happened. Now I, as a product owner, can understand that there's some kind of intrinsic behavior, or some kind of affordance, that we're not providing. I now have the opportunity to go and maybe make that watch face linkable. Very cool. Going back to the notebook, it looks like there are some other things you've done here. Unfortunately, we won't have time to look at these in detail, but our audience might want to check them out as homework. Quick summary: what else can we do with this utility you built?

Greg Larchev: You can also do things like timing analysis. This is where you look at how long it takes your users to navigate through your funnel and whether some funnel steps take longer than others. You can also dive deeper into the inflow and outflow statistics for a funnel. This information is similar to what you saw earlier in the Sankey Diagram, but these tools provide more depth.
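A basic version of this timing analysis is sketched below: for each session that completes the funnel, measure the seconds between consecutive steps, then take the median per step across sessions. It assumes each session's visits are `(timestamp_seconds, page_url)` pairs already sorted by time, and it uses the median because dwell times are usually skewed; this is an illustration, not the Pathing Utils timing module.

```python
from statistics import median
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

Visit = Tuple[float, str]  # (timestamp in seconds, page_url), sorted by time


def step_durations(visits: List[Visit], funnel: Sequence[str]) -> Optional[List[float]]:
    """Seconds between consecutive funnel steps, or None if not completed."""
    times: List[float] = []
    for ts, url in visits:
        if len(times) < len(funnel) and url == funnel[len(times)]:
            times.append(ts)  # first time this session reaches the next step
    if len(times) < len(funnel):
        return None
    return [later - earlier for earlier, later in zip(times, times[1:])]


def median_step_durations(
    sessions: Dict[Hashable, List[Visit]], funnel: Sequence[str]
) -> List[float]:
    """Median time spent on each step transition across completing sessions."""
    per_session = []
    for visits in sessions.values():
        durations = step_durations(visits, funnel)
        if durations is not None:
            per_session.append(durations)
    # zip(*...) regroups durations by step transition across sessions.
    return [median(step) for step in zip(*per_session)]
```

A step whose median is much larger than the others is a natural candidate for watching session replays.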

Patrick Brandt: Awesome. All of this code is on GitHub, in the Fullstory dev organization, in the Path Utils project. Anyone, please feel free to clone it and play with it. If you're a Fullstory customer, you can apply it to your own projects and your own needs. Also, if there's a feature you'd like to see in the open source project, we welcome pull requests; all contributions are welcome. Thank you, Greg. This is amazing work. Thanks so much for demonstrating how powerful Data Export is, and happy hunting.