Accessing iframe content and JavaScript variables from Puppeteer

Following on from the previous post about logging in and saving cookies with Puppeteer, I also needed to access content and, more specifically, a JavaScript variable present within the iframe itself from within Puppeteer as this contained information I was hunting down.

A working example of this code can be found in this git repository.

In this example, we will be loading Wikipedia with an iframe tester. Wikipedia has a rtlLangs variable available on the page which we will be accessing

What is Puppeteer?

Puppeteer is a Node/NPM package which allows you to create & control a headless Chrome instance, allowing you to do front-end/UI based tasks programmatically. It is hugely powerful and worth investigating if that is your thing. One of the most common examples is opening a page and taking a screenshot or submitting a form for testing.

Setup

For this we are going to be working in a single JavaScript file - make a new one called iframe.js in a fresh folder (or one where you are adding this functionality)

Install the dependencies

The only dependency we need for this is puppeteer.

npm i puppeteer --save

Set up the script

Inside your iframe.js add the following skeleton Puppeteer code

const puppeteer = require('puppeteer');

// Our main function
const run = async () => {
    // Create a new puppeteer browser
    const browser = await puppeteer.launch({
        // Change to `false` if you want to open the window
        headless: 'new',
    });

    // Create a new page in the browser
    const page = await browser.newPage();


    // Close the browser once you have finished
    browser.close();
}

// Run it all
run();

Once saved, you can run the following to start your script

node iframe.js

Find your iframe

Once you have your code set up and running, the next step is to load (goto) the page with the iframe and locate it in the source. The location can be either via an ID or a HTML selector.

Note: When selecting your iframe, be careful of who has control over the HTML and consider if the structure could change or if more than one iframe could appear on the page. Have a look at the docs about what kind of selectors you can use.

const puppeteer = require('puppeteer');

const run = async () => {
    // Create a new puppeteer browser
    const browser = await puppeteer.launch({
        // Change to `false` if you want to open the window
        headless: 'new',
    });

    // Create a new page in the browser
    const page = await browser.newPage();

    // Go to the page and wait for everything to load - this ensures the iframe has loaded
   await page.goto('https://iframetester.com/?url=https://www.wikipedia.org/', {
        waitUntil: ['domcontentloaded', 'networkidle2'],
        timeout: 0
    });

+   // Get the iframe
+   const elementHandle = await page.$('#iframe-window');
+
+   // Get the `src` property to verify we have the iframe
+   const src = await (await elementHandle.getProperty('src')).jsonValue();
+
+   // Output the src
+   console.log(src);

    // Close the browser once you have finished
    browser.close();
};

run();

Access the iframe content & variables

With our iframe loaded and verified, we can now access the content on the iframe. This can be done with the contentFrame() function on our iframe variable.

const frame = await elementHandle.contentFrame();

Once in our frame, we can run evaluate , which is a function which allows you to evaluate JavaScript on the page (or in this instance, frame).

The rtlLangs paramter is the name of the JavaScript variable on the page

const rtlLangs = await frame.evaluate('rtlLangs');

With that, the final code looks like:

const puppeteer = require('puppeteer');

const run = async () => {
    // Create a new puppeteer browser
    const browser = await puppeteer.launch({
        // Change to `false` if you want to open the window
        headless: 'new',
    });

    // Create a new page in the browser
    const page = await browser.newPage();

    // Go to the page and wait for everything to load - this ensures the iframe has loaded
   await page.goto('https://iframetester.com/?url=https://www.wikipedia.org/', {
        waitUntil: ['domcontentloaded', 'networkidle2'],
        timeout: 0
    });

    // Get the iframe
    const elementHandle = await page.$('#iframe-window');

    // Access the frame content of the selected iframe
    const frame = await elementHandle.contentFrame();

    // Evaluate JavaScript variable available and store the output
	const rtlLangs = await frame.evaluate('rtlLangs');

    // Log the output of the variable
	console.log(rtlLangs);

    // Close the browser once you have finished
    browser.close();
};

run();

Once we have access to the frame, we can load JavaScript variables, access the HTML or navigate as you would a normal page.

View this post on Github

You might also enjoy…

Mike Street

Written by Mike Street

Mike is a CTO and Lead Developer from Brighton, UK. He spends his time writing, cycling and coding. You can find Mike on Mastodon.