arrays - Simple PHP Crawler Query


I am looking to build a simple PHP web crawler. I have the basics, but I want to know how I can continue to loop through the URLs I have found in order to crawl more pages. I am adding the crawled and found URLs to separate arrays and checking to make sure they are not duplicated. This is what I have:

    public function runcrawl(){
        $url = 'https://www.bbc.co.uk/';
        $pagelimit = 50;

        $crawledpages[] = $foundpages[] = $url;

        // First pass: collect internal links from the start page.
        $document = new Document($url, true);

        foreach($document->find('a') as $link){
            // Keep links on the same host, or relative links starting with "/".
            if(stristr($link->href, parse_url($url, PHP_URL_HOST)) || strpos($link->href, "/") === 0){
                if($this->filterinternallinks($link->href) && $link->href != ''){
                    if(!in_array($link->href, $foundpages)){
                        $foundpages[] = $this->cleanurl($url, $link->href);
                    }
                }
            }
        }

        // Second pass: crawl each found page that has not been crawled yet.
        foreach($foundpages as $l){
            if(!in_array($l, $crawledpages)){
                $document = new Document($l, true);
                foreach($document->find('a') as $link){
                    if(stristr($link->href, parse_url($url, PHP_URL_HOST)) || strpos($link->href, "/") === 0){
                        if($this->filterinternallinks($link->href)){
                            if(!in_array($link->href, $foundpages)){
                                $foundpages[] = $link->href;
                            }
                        }
                    }
                }
                $crawledpages[] = $l;
            }
        }

        dd($crawledpages, $foundpages);
    }

$this->filterinternallinks removes things like # and tel: links etc., and $this->cleanurl formats the URLs uniformly, e.g. hrefs starting with / are converted to a full URL. I want to keep running foreach($foundpages as $l) until I have crawled them all. Any ideas on the quickest way? I am using https://github.com/imangazaliev/didom to get the links from the page, and I want to continue to use it as I need to grab other data from the page.
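One possible way to keep going until every found page has been crawled is to replace the second foreach with a queue that is drained in a while loop. A by-value foreach iterates over a copy of the array, so URLs appended to $foundpages during the loop are never visited. The sketch below is only an illustration under some assumptions: crawl() and the $fetchLinks callback are hypothetical names, not part of DiDOM, and the callback is assumed to return links that are already filtered and converted to absolute URLs. A small in-memory link graph stands in for real pages so the example runs without network access.

```php
<?php
declare(strict_types=1);

/**
 * Breadth-first crawl driven by a queue of pending URLs.
 *
 * $fetchLinks is a hypothetical callback: given a URL, it returns an array of
 * filtered, absolute URLs found on that page.
 */
function crawl(string $startUrl, callable $fetchLinks, int $pageLimit = 50): array
{
    $queue   = [$startUrl];          // URLs waiting to be crawled
    $found   = [$startUrl => true];  // every URL seen so far, keyed for fast lookup
    $crawled = [];                   // URLs actually fetched, in crawl order

    // Stop when nothing is left to crawl or the page limit is reached.
    while ($queue !== [] && count($crawled) < $pageLimit) {
        $current   = array_shift($queue);
        $crawled[] = $current;

        foreach ($fetchLinks($current) as $link) {
            // Queue each URL only the first time it is seen.
            if (!isset($found[$link])) {
                $found[$link] = true;
                $queue[]      = $link;
            }
        }
    }

    return ['crawled' => $crawled, 'found' => array_keys($found)];
}

// Stand-in link graph (made-up URLs) used instead of fetching real pages.
$pages = [
    'https://example.test/'  => ['https://example.test/a', 'https://example.test/b'],
    'https://example.test/a' => ['https://example.test/', 'https://example.test/c'],
    'https://example.test/b' => ['https://example.test/c'],
    'https://example.test/c' => [],
];

$result = crawl(
    'https://example.test/',
    fn (string $url): array => $pages[$url] ?? []
);

print_r($result['crawled']);
```

In the code from the question, the callback would presumably wrap new Document($url, true), $document->find('a'), and the existing filterinternallinks and cleanurl helpers, so the same Document can still be used to grab other data from each page. Keying $found by URL avoids a linear in_array scan on every link, and $pagelimit finally gets enforced by the loop condition.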

