How to Save a File and Get Dimensions in Puppeteer JS
If you're here, you probably need to know how to save a file and get its dimensions using Puppeteer JS. Well, you came to the right place! I've had to do this myself, and I'll share with you how I accomplished it.
Saving a File
The first step is to save the file. Puppeteer makes this easy: its page.pdf() method saves the current page as a PDF file.
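const puppeteer = require('puppeteer');

// Render a URL and write it to disk as an A4 PDF.
async function savePageAsPDF(url, path) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // networkidle0: wait until there have been no network connections for at least 500 ms
    await page.goto(url, {waitUntil: 'networkidle0'});
    await page.pdf({path: path, format: 'A4'});
  } finally {
    // Close the browser even if navigation or rendering fails
    await browser.close();
  }
}

savePageAsPDF('https://example.com', './example.pdf').catch(console.error);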
In this example code, we first import the Puppeteer module and define an async function called savePageAsPDF(). This function takes two arguments: the URL of the page we want to save, and the path and filename of the PDF file we want to create.
We then launch a new browser instance, create a new page, and navigate to the URL we passed in. The waitUntil: 'networkidle0' option makes goto() wait until the page has had no network connections for at least 500 ms, so the page has finished loading before we call page.pdf(). This method accepts an options object that specifies the path and paper format of the PDF file we want to create.
Finally, we close the browser.
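Since the PDF is saved with format: 'A4', it helps to know roughly what dimensions to expect before measuring the file. The sketch below converts standard ISO paper sizes to PDF points (72 per inch) or CSS pixels (96 per inch). The PAPER_MM table and the paperSize() helper are my own names for illustration, not part of Puppeteer, and Puppeteer may define its formats with slightly rounded inch values, so compare with a small tolerance rather than expecting an exact match.

```javascript
// ISO 216 paper sizes in millimetres (standard values, not read from Puppeteer)
const PAPER_MM = {
  A3: { width: 297, height: 420 },
  A4: { width: 210, height: 297 },
  A5: { width: 148, height: 210 },
};

const MM_PER_INCH = 25.4;

// Convert a named format to a size at the given resolution.
// 72 units per inch gives PDF points; 96 gives CSS pixels.
function paperSize(format, unitsPerInch = 72) {
  const mm = PAPER_MM[format];
  if (!mm) throw new Error(`Unknown paper format: ${format}`);
  // Round to two decimals to keep the output readable
  const convert = (value) => Number(((value / MM_PER_INCH) * unitsPerInch).toFixed(2));
  return { width: convert(mm.width), height: convert(mm.height) };
}

console.log(paperSize('A4'));     // size in PDF points
console.log(paperSize('A4', 96)); // size in CSS pixels
```

An A4 page comes out at about 595 x 842 points, which is a handy reference when checking the numbers reported later on.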
Getting Dimensions
Now that we have our PDF file saved, we can use another library to get its dimensions. One such library is pdf2json. This library parses PDF files and converts them to JSON format, making it easy to extract data from them.
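const fs = require('fs');
const pdf2json = require('pdf2json');

const pdfBuffer = fs.readFileSync('./example.pdf');
const pdfParser = new pdf2json();

// Report parse failures instead of failing silently
pdfParser.on('pdfParser_dataError', function(errData) {
  console.error(errData.parserError);
});

pdfParser.on('pdfParser_dataReady', function(pdfData) {
  // Assumes the pdf2json 2.x+ output, where each page carries its own Width and Height.
  // Older 1.x releases nested the data under pdfData.formImage instead.
  const firstPage = pdfData.Pages[0];
  const dimensions = {
    // Values are in pdf2json's own page units, not points or pixels
    width: firstPage.Width,
    height: firstPage.Height
  };
  console.log(dimensions);
});

pdfParser.parseBuffer(pdfBuffer);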
In this code, we first import the fs and pdf2json modules. We then use fs.readFileSync() to read the PDF file we just created and store it in a buffer.
We then create a new instance of pdf2json and register a listener for the pdfParser_dataReady event, which fires when parsing is complete. Only after that do we call parseBuffer(), passing in the PDF buffer.
Inside the event listener, we create a new object called dimensions and set its width and height properties from the page size reported in the parsed PDF data. Note that pdf2json reports sizes in its own page units, so the numbers will not match points or pixels directly.
We then log the dimensions object to the console. You can use this data however you like.
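If you want a quick cross-check without any extra library, a PDF stores each page's size in a /MediaBox entry, measured in points (1/72 inch). The following is only a rough sketch, not a real PDF parser: the readMediaBoxes() helper is a name I made up, and it assumes the page dictionaries are stored uncompressed in the file. It also ignores /CropBox and /Rotate, and a /MediaBox can be inherited from a parent node, so the number of matches does not have to equal the number of pages.

```javascript
// Sketch: find /MediaBox entries in a PDF's raw bytes.
function readMediaBoxes(pdfBuffer) {
  // latin1 maps every byte to one character, so binary stream data cannot break the decode
  const text = pdfBuffer.toString('latin1');
  const pattern = /\/MediaBox\s*\[\s*(-?[\d.]+)\s+(-?[\d.]+)\s+(-?[\d.]+)\s+(-?[\d.]+)\s*\]/g;
  const boxes = [];
  for (const match of text.matchAll(pattern)) {
    // A MediaBox is [lowerLeftX lowerLeftY upperRightX upperRightY]
    const [x1, y1, x2, y2] = match.slice(1).map(Number);
    boxes.push({ width: Math.abs(x2 - x1), height: Math.abs(y2 - y1) });
  }
  return boxes;
}

// Usage with a hand-written fragment standing in for a real file
const sample = Buffer.from('<< /Type /Page /MediaBox [0 0 612 792] >>', 'latin1');
console.log(readMediaBoxes(sample));
```

With a real file you would pass in the result of fs.readFileSync('./example.pdf') instead of the sample buffer. If the array comes back empty, the file most likely keeps its page objects in compressed form and you need a proper parser after all.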
Alternative Method
If you don't want to use an external library like pdf2json, you can also use Puppeteer's page.evaluate() method to execute JavaScript code on the page itself and extract the dimensions that way.
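// Assumes puppeteer is already required, as in the first example
const path = require('path');
const { pathToFileURL } = require('url');

async function getPDFDimensions(pdfPath) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // file:// URLs need an absolute path, so resolve relative input first
    await page.goto(pathToFileURL(path.resolve(pdfPath)).href, {waitUntil: 'networkidle0'});
    return await page.evaluate(() => {
      const pdf = document.querySelector('embed');
      // No <embed> means the browser did not open its PDF viewer
      if (!pdf) return null;
      // Size of the viewer element in CSS pixels
      return {
        width: pdf.offsetWidth,
        height: pdf.offsetHeight
      };
    });
  } finally {
    // Close the browser even if navigation or evaluation fails
    await browser.close();
  }
}

getPDFDimensions('./example.pdf').then(dimensions => console.log(dimensions)).catch(console.error);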
In this code, we define a new async function called getPDFDimensions(), which takes in the path and filename of the PDF file we want to extract dimensions from.
We launch a new instance of Puppeteer, create a new page, and navigate to the PDF file using a file URL. We wait until the page is loaded before calling page.evaluate().
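One thing to watch out for with file URLs: pasting a path straight into a template string is fragile, because a file URL needs an absolute path and characters such as spaces have to be percent-encoded. Here is a small sketch of a safer conversion using Node's built-in path and url modules. The toFileUrl() helper name is hypothetical, but path.resolve() and url.pathToFileURL() are standard Node APIs.

```javascript
const path = require('path');
const { pathToFileURL } = require('url');

// Hypothetical helper: convert any filesystem path to a file:// URL.
function toFileUrl(filePath) {
  // path.resolve() makes the path absolute, relative to the current working directory;
  // pathToFileURL() percent-encodes characters such as spaces
  return pathToFileURL(path.resolve(filePath)).href;
}

// Naive interpolation keeps the relative path and the raw space
console.log(`file://${'./my report.pdf'}`);
// The helper produces an absolute, percent-encoded file URL
console.log(toFileUrl('./my report.pdf'));
```

The result of toFileUrl() can be passed directly to page.goto().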
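Inside page.evaluate(), we use JavaScript to select the <embed> element that contains the PDF and read its width and height from its offsetWidth and offsetHeight properties. Keep in mind that these are the on-screen size of the viewer element in CSS pixels, so they follow the browser viewport rather than the page size stored in the PDF. Depending on the Chrome version and headless mode, navigating straight to a PDF may also trigger a download instead of opening the viewer, in which case there is no <embed> element to select.
We then close the browser and return the dimensions. Because getPDFDimensions() is an async function, the caller receives a Promise, and we log the resolved value using then().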
And that's it! Those are two ways you can save a file and get its dimensions using Puppeteer JS.