arrays - Simple PHP Crawler Query
I am looking to build a simple PHP web crawler. I have the basics, but I want to know how I can continue to loop through the URLs I have found in order to crawl more pages. I am adding the crawled and found URLs to separate arrays and checking to make sure they are not duplicated. This is what I have:
    public function runcrawl(){
        $url = 'https://www.bbc.co.uk/';
        $pagelimit = 50;
        $crawledpages[] = $foundpages[] = $url;

        // Collect the internal links on the start page
        $document = new Document($url, true);
        foreach($document->find('a') as $link){
            // strpos() must be compared with === 0, because false == '0' is true in PHP
            if(stristr($link->href, parse_url($url, PHP_URL_HOST)) || strpos($link->href, "/") === 0){
                if($this->filterinternallinks($link->href) && $link->href != ''){
                    if(!in_array($link->href, $foundpages)){
                        $foundpages[] = $this->cleanurl($url, $link->href);
                    }
                }
            }
        }

        // Crawl every found page that has not been crawled yet
        foreach($foundpages as $l){
            if(!in_array($l, $crawledpages)){
                $document = new Document($l, true);
                foreach($document->find('a') as $link){
                    if(stristr($link->href, parse_url($url, PHP_URL_HOST)) || strpos($link->href, "/") === 0){
                        if($this->filterinternallinks($link->href)){
                            if(!in_array($link->href, $foundpages)){
                                $foundpages[] = $link->href;
                            }
                        }
                    }
                }
                $crawledpages[] = $l;
            }
        }

        dd($crawledpages, $foundpages);
    }

$this->filterinternallinks removes things like # and tel: links, and $this->cleanurl formats the URLs so they are uniform, e.g. hrefs starting with / are converted to the full URL.

I want to keep running foreach($foundpages as $l) until I have crawled them all. Any ideas on the quickest way to do this?

I am using https://github.com/imangazaliev/didom to get the links from the page, and I want to continue to use it because I need to grab other data from the page.
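One way to keep going until every found URL has been crawled is to replace the two foreach passes with a single queue-driven loop. The following is only a minimal sketch, not a tested solution for the code above: crawlAll and its $extractLinks parameter are hypothetical names, and the callable stands in for the DiDOM lookup plus filterinternallinks and cleanurl. The demo uses a small in-memory link map instead of real HTTP requests.

```php
<?php
/**
 * Breadth-first crawl: pull URLs off a queue until it is empty
 * or the page limit is reached.
 *
 * $extractLinks is a hypothetical callable that takes a URL and returns
 * the cleaned, internal links found on that page.
 */
function crawlAll(string $startUrl, callable $extractLinks, int $pageLimit = 50): array
{
    $queue = new SplQueue();
    $queue->enqueue($startUrl);
    $found = [$startUrl => true]; // URLs as keys, so isset() replaces in_array()
    $crawled = [];

    while (!$queue->isEmpty() && count($crawled) < $pageLimit) {
        $url = $queue->dequeue();
        $crawled[] = $url;

        foreach ($extractLinks($url) as $link) {
            if (!isset($found[$link])) {
                $found[$link] = true;   // mark as found before queueing
                $queue->enqueue($link); // newly found pages get crawled later
            }
        }
    }

    return ['crawled' => $crawled, 'found' => array_keys($found)];
}

// Demo data: a made-up four-page site (assumption, for illustration only).
$site = [
    'https://example.test/'  => ['https://example.test/a', 'https://example.test/b'],
    'https://example.test/a' => ['https://example.test/', 'https://example.test/c'],
    'https://example.test/b' => ['https://example.test/c'],
    'https://example.test/c' => [],
];

$result = crawlAll(
    'https://example.test/',
    fn(string $url): array => $site[$url] ?? []
);

print_r($result['crawled']);
```

In the crawler itself, the callable would presumably wrap new Document($url, true), the find('a') loop, and the two helper methods, and return the cleaned URLs. Storing the cleaned URL (rather than the raw href) in both the duplicate check and the queue keeps the two consistent, and $pagelimit can be passed as the third argument so the loop stops after that many pages.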