javascript - PhantomJS Web Scraping Cisco Switch Web Interface
I have just started using PhantomJS in my first developer job.
I've been tasked with web scraping network switch information (hostname, product ID, IP address, MAC address, etc.) from an old Cisco Catalyst 2960-X switch connected to my PC via a LAN cable.
I got HTTP authentication working fine with the PhantomJS headless browser, and I can open the first switch page, which leads to the startup report page. [Image: Cisco switch startup report]
This startup page appears the first time you log in to the switch, after which the user must click the Continue button. The button is a form input, as shown below (written in an AJAX way).
<form method="get"> <input type="button" name="button1" value="continue" onclick="setcookiesandloadciscodevicemanager()"></form>
Usually, in the Chrome browser, I just click on it and move on. That brings up the main page of interest, the Cisco Device Manager page containing the switch information. (I'm not allowed to post the picture here; it is available on the PhantomJS group discussion page.)
My question is: to bypass the startup report with the PhantomJS headless browser, which is the best approach? Either...
- simulate the button press on the form above, triggering the link to the next page ($.ajax() comes to mind), or...
- call the function setcookiesandloadciscodevicemanager() directly through the .js file (more on the latter below). This is more of a hacking approach.
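The second option could be sketched roughly as follows. This is only a sketch under assumptions: the function name is spelled exactly as it appears in this post (the real casing in preflight.js may differ), `callContinueHandler` is a hypothetical helper name, and the handler is assumed to be a global of whichever frame the code is evaluated in.

```javascript
// Hypothetical helper: call the switch's "continue" handler directly if the
// current window defines it. Returns true when the handler was called.
function callContinueHandler(win) {
  // page.evaluate passes no argument, so fall back to the page's own window.
  win = win || window;
  // Name copied from the post as-is; adjust it to the spelling used on the switch.
  var handler = win.setcookiesandloadciscodevicemanager;
  if (typeof handler !== 'function') {
    return false;
  }
  handler.call(win);
  return true;
}

// PhantomJS wiring; skipped when the file is loaded outside PhantomJS.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  page.settings.userName = 'star';
  page.settings.password = '----------';
  page.open('http://10.44.39.252/', function (status) {
    if (status !== 'success') {
      console.log('connect fail');
      phantom.exit(1);
      return;
    }
    // page.evaluate serializes the function and runs it in the current frame.
    var called = page.evaluate(callContinueHandler);
    console.log('handler called: ' + called);
    phantom.exit();
  });
}
```

As written, this only looks at the main frame. If the handler is defined in a child frame, the script would need to switch to that frame (for example with page.switchToFrame) before evaluating.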
The architecture of the switch web pages is outlined here.
When the URL 10.44.39.252 is first requested, three frame sources are called. I know this through the PhantomJS callback page.onNavigationRequested:
- frmwrkresource.htm
- topbannernofpv.shtml
- setup_report.htm
The input "button1" exists inside the setup_report.htm frame. When "button1" is pressed,
setcookiesandloadciscodevicemanager();
is called.
This function exists in preflight.js, among the JavaScript resources that are called when transitioning between the startup report and the Cisco Device Manager (10.44.39.252/xhome.htm). I'm thinking browser cookies are a major part of the problem.
My source code is attached, at various levels of completion.
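Since the startup report shows up as its own frame request, one way to tell whether the switch is presenting it could be to watch for setup_report.htm in page.onNavigationRequested. The following is a minimal sketch, not a tested solution; `isSetupReport` and `sawSetupReport` are hypothetical names introduced for illustration.

```javascript
// Hypothetical helper: true when a requested URL points at the startup report.
function isSetupReport(url) {
  // Drop any fragment and query string, then match the last path segment.
  var path = String(url).split('#')[0].split('?')[0];
  return /(^|\/)setup_report\.htm$/i.test(path);
}

// PhantomJS wiring; skipped when the file is loaded outside PhantomJS.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  var sawSetupReport = false;
  page.settings.userName = 'star';
  page.settings.password = '----------';
  page.onNavigationRequested = function (url, type, willNavigate, main) {
    // Frame loads are reported with main === false.
    if (!main && isSetupReport(url)) {
      sawSetupReport = true;
    }
  };
  page.open('http://10.44.39.252/', function (status) {
    console.log('status: ' + status + ', startup report requested: ' + sawSetupReport);
    phantom.exit();
  });
}
```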
    var page = require('webpage').create();
    var fs = require('fs');

    console.log("\n:welcome crawler scrapper:");

    var url = 'http://10.44.39.252/';
    page.settings.userName = 'star';
    page.settings.password = '----------';
    page.customHeaders = {'Authorization': 'Basic ' + btoa('star:----------')};
    page.settings.userAgent = 'pmg web crawler bot/1.0';

    page.onNavigationRequested = function (url, type, willNavigate, main) {
        console.log("\n----------------------------------------------");
        console.log("navigation request information:\n");
        console.log('trying to navigate to: ' + url); // where are we going?
        console.log('caused by: ' + type); // request type
        console.log('will navigate: ' + willNavigate);
        console.log('sent from page\'s main frame: ' + main);
        console.log("----------------------------------------------\n");
    };

    page.onResourceError = function (resourceError) {
        console.log("\nhold up, we have errors!");
        console.log("resource error information: \n");
        console.log('resource error id:' + resourceError.id + '\nurl:' + resourceError.url);
        console.log('resource error code: ' + resourceError.errorCode + '\ndescription: ' + resourceError.errorString);
    };

    page.onConsoleMessage = function (msg) {
        console.log("the browser replied:" + msg);
    };

    //////////////////////////////////////////////////////////////////
    page.onLoadStarted = function () {
        console.log("loading page...");
    };
    page.onLoadFinished = function () {
        console.log("loading finished:\n");
    };
    //////////////////////////////////////////////////////////////////

    page.viewportSize = { width: 1920, height: 1200 };

    var sel = 'button1'; // DOM element to manipulate, selector
    var type = 'click';  // action

    // webpage.open
    page.open(url, function (status) {
        if (status === "success") {
            page.includeJs("http://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js", function () {
                // jQuery syntax has been included
                setTimeout(function () {
                    var t = page.evaluate(function (sel) {
                        var a = $('title').text();
                        return a;
                    }, 0, sel);
                    console.log("title: " + t + "\n\n");

                    phantom.addCookie({
                        cisco_devicemanager: 'value', /* required property */
                        sslpreference: 2, /* required property */
                        gettingstarted: 1
                    });

                    page.open('http://10.44.39.252/xhome.htm', function (status) {
                        $(document).ready(function () {
                            console.log("your document is ready:" + document.title + "\n");
                            /* ajax asynchronous http request
                            $.ajax({
                                async: false, // blocks ajax call, synchronous ajax request
                                url: 'http://10.44.39.252/setup_report.htm?button1=continue',
                                type: 'get',
                                data: {button1: 'continue'},
                                success: function (out) {
                                    console.log("request sent!\n\n");
                                    console.log(typeof(out));
                                    $('button1').trigger(sel);
                                    console.log($('.homecontent').text);
                                    //$("button1").click(function(){
                                    //    $("input").trigger("select");
                                    //});
                                },
                                error: function () {
                                    console.log("nein!");
                                }
                            });
                            */
                        });
                    });
                }, 3000);

                setTimeout(function () {
                    page.render("phantomspecs1.jpg");
                    console.log("\nnow gtfo!");
                    phantom.exit();
                }, 20000);

                console.log("wait async..."); // prints first!
            }, 0); // closes includeJs, which doesn't operate in the next open...
        } else {
            console.log("connect fail");
            phantom.exit();
        }
    });
I need PhantomJS to bypass the startup page and go to the Cisco Device Manager so that I can render the switch information. My knowledge of JavaScript, jQuery and AJAX is still lacking (I'm not natively a programmer, but I landed myself a coder job after college, so I have the basic concepts).
If any of you guys can point me in the right direction on the next step, I can finish this task and write documentation on it. It would no doubt be valuable to the Phantom community (of which I'm proud to be a part).
Sincerely, Afiq Abdul Hamid, Cyberjaya, Malaysia
As you're using a headless browser for this work, the logical approach is to use the headless browser in the same way a normal user would. Don't do crazy cookie manipulation and so on; it just creates more work than you need to do.
PhantomJS is used to automate browser interactions using JavaScript, so you just need to inject some simple JavaScript to interact with the UI.
The form displays only once, when the user first logs in, so it should be trivial to deal with.
After the user logs in, attempt to get the button element and, if it exists, click on it.
    var btn1 = document.querySelector('input[name="button1"]');
    if (btn1 !== null) {
        // The Continue button exists, so trigger a click.
        btn1.click();
    }
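Wired into a PhantomJS script, the snippet above might look something like this. It is a sketch under assumptions rather than a verified solution: because the question says the button lives inside a frame, it tries the main frame and then each direct child frame; `clickContinueIfPresent`, the 5 second delay and the output file name are illustrative choices, and nested framesets would need a deeper search.

```javascript
// Runs in page context: click the Continue button if the document has one.
function clickContinueIfPresent(doc) {
  // page.evaluate passes no argument, so fall back to the frame's own document.
  doc = doc || document;
  var btn1 = doc.querySelector('input[name="button1"]');
  if (btn1 === null) {
    return false;
  }
  btn1.click();
  return true;
}

// PhantomJS wiring; skipped when the file is loaded outside PhantomJS.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  page.settings.userName = 'star';
  page.settings.password = '----------';
  page.open('http://10.44.39.252/', function (status) {
    if (status !== 'success') {
      console.log('connect fail');
      phantom.exit(1);
      return;
    }
    // Try the main frame first, then each direct child frame.
    var clicked = page.evaluate(clickContinueIfPresent);
    var frames = page.framesCount;
    for (var i = 0; !clicked && i < frames; i += 1) {
      page.switchToFrame(i);
      clicked = page.evaluate(clickContinueIfPresent);
      page.switchToMainFrame();
    }
    console.log('continue clicked: ' + clicked);
    // Assumed delay so the Device Manager can load before rendering.
    setTimeout(function () {
      page.render('devicemanager.png');
      phantom.exit();
    }, 5000);
  });
}
```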
Also, since you're doing scraping work, there's a fantastic library called CasperJS that you can install on top of PhantomJS; it abstracts away a lot of the complexity.
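For comparison, the same click might be expressed with CasperJS roughly as follows (run with the casperjs launcher, not plain phantomjs). This is a hedged sketch: `continuePastReport` is a hypothetical helper, the frame index 2 is only a guess based on setup_report.htm being the third frame listed in the question, and the wait time and file name are placeholders.

```javascript
// Hypothetical helper: inside the given frame, click Continue if it is there.
function continuePastReport(casper, frameInfo) {
  casper.withFrame(frameInfo, function () {
    var selector = 'input[name="button1"]';
    if (this.exists(selector)) {
      this.click(selector);
    }
  });
}

// CasperJS wiring; skipped when the file is loaded outside PhantomJS/CasperJS.
if (typeof phantom !== 'undefined') {
  var casper = require('casper').create();
  casper.start();
  casper.setHttpAuth('star', '----------');
  casper.thenOpen('http://10.44.39.252/', function () {
    // Assumed frame index; a frame name could be passed instead.
    continuePastReport(this, 2);
  });
  // Assumed delay so the Device Manager can load before the screenshot.
  casper.wait(5000, function () {
    this.capture('devicemanager.png');
  });
  casper.run();
}
```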