PHP - Crawl Internet Through Socks Proxy
So, I have this web crawler that I want to use to index .onion websites through the TOR network.
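A minimal sketch of one way to send a single request through Tor from PHP. It assumes the crawler can fetch pages with the cURL extension and that Tor's SOCKS listener is on 127.0.0.1:9050, the port mentioned in this thread. The names `isOnionUrl` and `fetchViaTor` are made up for illustration and are not taken from the poster's spider. `CURLPROXY_SOCKS5_HOSTNAME` asks cURL to let the proxy resolve the hostname, which .onion addresses require.

```php
<?php
// Hypothetical helper: true when the URL's host ends in ".onion".
function isOnionUrl(string $url): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) && substr(strtolower($host), -6) === '.onion';
}

// Hypothetical helper: fetch one page through Tor's SOCKS listener.
// Assumes the cURL extension is loaded and Tor is listening on $proxy.
function fetchViaTor(string $url, string $proxy = '127.0.0.1:9050')
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_PROXY, $proxy);
    // SOCKS5 with hostname resolution done by the proxy (needed for .onion).
    curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5_HOSTNAME);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
    $body = curl_exec($ch);   // string on success, false on failure
    curl_close($ch);
    return $body;
}
```

With Tor running, `fetchViaTor('http://rz7ocnxxu7ka6ncv.onion/')` would return the page body or `false`; `isOnionUrl` can be used to skip links that do not end in .onion.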
Ok, here's what I need. I have a PHP-based web crawler. It is accessible here: http://rz7ocnxxu7ka6ncv.onion/ Now, my problem is that my spider, which actually crawls the pages, needs to do so through SOCKS port 9050. I have to tunnel its connection through TOR so that it can resolve .onion domains, which is what I'm indexing (only domains ending in .onion). I call this script from the command line using php crawl.php, and I add the appropriate parameters to crawl the page.

Here is what I think: Is there any way to force it to use TOR? Or can I force my ENTIRE MACHINE to tunnel things through TOR, and how (like forcing all traffic through 127.0.0.1:9050)? Perhaps if I set up global proxy settings, PHP would respect them? If any of my solutions work, how would I do it? (Step-by-step instructions please, I am a noob.) I just want to create my own TOR search engine. (Don't recommend P2P search engines to me; they're not what I want for this. I know they exist, I did my homework.)

Here is the crawler source if you are interested in taking a look. Perhaps someone with a kind heart can modify it to use 127.0.0.1:9050 for all crawling requests?
spider.php: http://pastebin.com/kscGJCc5
spiderfuncs.php: http://pastebin.com/m5y54RUh
PLEASE someone help me! I am desperate.

Similar Tutorials

Guys, I recently came across this idea of using a Tor SOCKS proxy as the default proxy server in my local home network. So, on my centos-box, I've set the service up with the default SOCKS port 9050, and the local IP address of this machine is 10.10.1.5. Here's a part of Tor's config file:
SocksPort 10.10.1.5:9050
[jazz@centos-box ~]$ top -u jazz | grep tor
3413 jazz 20 0 76256 32m 9720 S 0.0 0.3 0:01.55 tor
[jazz@centos-box ~]$ nmap -Pn 10.10.1.5 | grep 9050
9050/tcp open tor-socks

Now, I'm completely able to use that SOCKS proxy from the centos-box with my default browser, curl, or whatever else, but if I go to my laptop and set that proxy socket in its browser, I get the message "TOR is not an HTTP proxy" and half or more (not all) of my bookmarked websites don't work. However, when I start the service it prints this message:

Sep 12 13:16:01.769 [notice] You configured a non-loopback address '10.10.1.5:9050' for SocksPort. This allows everybody on your local network to use your machine as a proxy. Make sure this is what you wanted.
Sep 12 13:16:01.769 [notice] Opening Socks listener on 10.10.1.5:9050

Ideas?
Edited by jazzman1, 12 September 2014 - 12:39 PM.

Is it possible to log into a secure page to perform a web crawl? I have the code to crawl; it's just that the content is on a password-protected page. Also, is it secure to do this? Or can someone potentially hack into the secure page?

I'm trying to crawl for links in a specific website and show them at the end. The problem I'm facing is that it only shows the links from the specific page, not from all the pages in the website. I tried several loops with no success. Please give some advice.
Here is the code:

<?php
if (isset($_POST['Submit'])) {
    function getLinks($link)
    {
        /*** return array ***/
        $ret = array();
        /*** a new dom object ***/
        $dom = new domDocument;
        /*** get the HTML (suppress errors) ***/
        @$dom->loadHTML(file_get_contents($link));
        /*** remove silly white space ***/
        $dom->preserveWhiteSpace = false;
        /*** get the links from the HTML ***/
        $links = $dom->getElementsByTagName('a');
        /*** loop over the links ***/
        foreach ($links as $tag) {
            $ret[$tag->getAttribute('href')] = $tag->childNodes->item(0)->nodeValue;
        }
        return $ret;
    }

    /*** a link to search ***/
    $link = $_POST['address'];
    /*** get the links ***/
    $urls = getLinks($link);
    /*** check for results ***/
    if(sizeof($urls) > 0) {
        foreach($urls as $key=>$value) {
            if (preg_match('/^(http|https):\/\/([a-z0-9-]\.+)*/i',$key)) {
                echo '<span style="color:RED;">' . $key .' - external</span><br >';
            } else {
                echo '<span style="color:BLUE;">' . $link . $key . ' - internal</span><br >';
            }
        }
    } else {
        echo "No links found at $link";
    }
}
?>
<br /><br />
<form action="" method="post" enctype="multipart/form-data" name="link">
<input name="address" type="text" value="" />
<input name="Submit" type="Submit" />
</form>

This topic has been moved to Miscellaneous. http://www.phpfreaks.com/forums/index.php?topic=317436.0

Could someone give me a cross-domain proxy script? I am trying to post data to MySQL databases on two servers.

Using curl, I've managed to get my program to log me into a proboards site. I can view the main forum page.
The problem is, the links for viewing a page look something like href="index.cgi?board=general&thread=1111&page=45". I did a str_replace on index.cgi to get href="link_processor?board=general&thread=1111&page=45". The idea was that link_processor would contain the data "board=general&thread=1111&page=45". However, I now realise that PHP would see that as 4 different GET variables:
link processor = board=general
thread = 1111
page = 45
How could I make it all part of the link_processor variable? If I can keep the string intact, I just have to pass it to a curl function and I can display the page easily!

I need a proxy that would enable me to use curl with another IP address. How do I find a paid proxy server that supports curl?

Hi, I'm trying to understand how I can block all users trying to view my website through proxies. With the following code, I have done a quick version in PHP (checking headers and ports) rather than at the firewall, which isn't exactly the best way but still stops a lot of them.
<?php
$user_ip = $_SERVER['REMOTE_ADDR'];
$headers = array('CLIENT_IP','FORWARDED','FORWARDED_FOR','FORWARDED_FOR_IP','VIA','X_FORWARDED','X_FORWARDED_FOR','HTTP_CLIENT_IP','HTTP_FORWARDED','HTTP_FORWARDED_FOR','HTTP_FORWARDED_FOR_IP','HTTP_PROXY_CONNECTION','HTTP_VIA','HTTP_X_FORWARDED','HTTP_X_FORWARDED_FOR');
foreach ($headers as $header) {
    if (isset($_SERVER[$header])) {
        header("Location: /proxy-not-allowed/");
        die;
    }
}
$queryIP = "SELECT `user_ip_address` FROM `my_table` WHERE `user_ip_address` = :user_ip_address AND `user_blocked` = :user_blocked LIMIT 1";
$queryIP1 = $pdo->prepare($queryIP);
$queryIP1->execute(array(':user_ip_address' => $user_ip, ':user_blocked' => 'No'));
$queryIP2 = $queryIP1->rowCount();
if ($queryIP2 === 0) {
    $ports = array(80, 81, 553, 554, 1080, 3128, 4480, 6588, 8000, 8080);
    foreach ($ports as $port) {
        $connection = @fsockopen($user_ip, $port, $errno, $errstr, 0.1);
        if (is_resource($connection)) {
            header("Location: /proxy-not-allowed/");
            die;
        }
    }
}
?>

The headers check blocks any proxy sending those headers, while the ports check blocks visitors whose IP has any of the listed ports open. I have tested this and it seems to work well, though it won't block all proxies because it only covers the ports I have listed. Is this the best way to go about blocking scripts if I don't have access to the firewall? What I am trying to do is allow users to view my HTTPS website normally and block all proxies. Even if I have some users blocked, I do not want them to be cheeky and use a proxy, or even register on my website through a proxy. I was thinking of just using port 443, as my website is HTTPS (is that wise?). Any advice would be great.
Edited January 4, 2019 by Cobra23

I have this PHP script to fetch the whois information of a domain. It works, but when I try to connect to the whois server via a proxy, it doesn't work. The proxy IP is taken from proxylist.hidemyass.com. What am I doing wrong? Thank you for your help.
$server = "whois.nic.cz";
$domain = "klikzone.cz";

function QueryWhoisServer($server, $domain){
    $proxy = "85.111.25.189:8080";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $server);
    curl_setopt($ch, CURLOPT_PORT, 43);
    curl_setopt($ch, CURLOPT_PROXY, $proxy);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    curl_setopt($ch, CURLOPT_CUSTOMREQUEST, $domain . "\r\n");
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

Hi, I need a script to hide my IP address with a proxy and read a web page.
$username="myuser";
The script doesn't work; it doesn't show me the page output. Any solution?

Hello. Is there any way to find out the type of a proxy? I mean anonymous, transparent, etc.

I am using Slim to route endpoints to my application. In addition, I have many endpoints (mostly accessed via XHR) which need to be forwarded to another server, and I am using Guzzle to do so. Not only do I have to transfer text/JSON, I also have to send and retrieve files (currently only CSV files, but I will later add PDF). Accomplishing this was easier than I expected, but I expect I may still be doing certain portions wrong. Does anything look off, especially with the multipart forms for file uploads and with the downloading of files? Thank you.

<?php
$app = new \Slim\App($container);

//Local requests
$app->get('/settings', function (Request $request, Response $response) {
    return $this->view->render($response, 'somePage.html',$this->bla->getData());
});
//more local endpoints...

$proxyEndpoints=[
    '/bla'=>['put'],
    '/bla/bla/{id:[0-9]+}'=>['delete','put'],
    '/foo/{id:[0-9]+}'=>['get','put','delete','post'],
    //more proxy endpoints...
];
foreach ($proxyEndpoints as $route=>$methods) {
    foreach ($methods as $method) {
        $app->$method($route, function(Request $request, Response $response) {
            return $this->remoteServer->proxy($request, $response); //add content type if desired.
        });
    }
}

<?php
class RemoteServer
{
    protected $httpClient, $contentType;

    public function __construct(\GuzzleHttp\Client $httpClient, string $contentType='application/json')
    {
        $this->httpClient=$httpClient;
        $this->contentType=$contentType;
    }

    public function proxy(\Slim\Http\Request $request, \Slim\Http\Response $response, string $contentType=null, \Closure $callback=null):\Slim\Http\Response
    {
        $contentType=$contentType??$this->contentType;
        if($contentType!=='application/json' && $callback) {
            throw new \Exception('Callback can only be used with contentType application/json');
        }
        $method=$request->getMethod();
        $bodyParams=in_array($method,['PUT','POST'])?(array)$request->getParsedBody():[]; //Ignore body for GET and DELETE methods
        $queryParams=$request->getQueryParams();
        $data=array_merge($queryParams, $bodyParams); //Would be better to write slim's body to guzzle's body so that get parameters are preserved and not overridden by body parameters.
        $path=$request->getUri()->getPath();
        $contentTypeHeader=$request->getContentType();
        if(substr($contentTypeHeader, 0, 19)==='multipart/form-data'){
            syslog(LOG_INFO, 'contentType: '.$contentTypeHeader);
            $files = $request->getUploadedFiles();
            $multiparts=[];
            $errors=[];
            foreach($files as $name=>$file) {
                if ($error=$file->getError()) {
                    $errors[]=[
                        'name'=> $name,
                        'filename'=> $file->getClientFilename(),
                        'error' => $this->getFileErrorMessage($error)
                    ];
                } else {
                    $multiparts[]=[
                        'name'=> $name,
                        'filename'=> $file->getClientFilename(),
                        'contents' => $file->getStream(),
                        'headers' => [
                            //Not needed, right?
                            'Size' => $file->getSize(),
                            'Content-Type' => $file->getClientMediaType()
                        ]
                    ];
                }
            }
            if($errors) return $response->withJson($errors, 422);
            $multiparts[]=[
                'name'=> 'data',
                'contents' => json_encode($data),
                'headers' => ['Content-Type' => 'application/json']
            ];
            $options=['multipart' => $multiparts];
        } else {
            $options = in_array($method,['PUT','POST'])?['json'=>$data]:['query'=>$data];
        }
        try {
            $curlResponse = $this->httpClient->request($method, $path, $options);
        } catch (\GuzzleHttp\Exception\RequestException $e) {
            //Errors only return JSON
            //Networking error which includes ConnectException and TooManyRedirectsException
            syslog(LOG_ERR, 'Proxy error: '.$e->getMessage());
            if ($e->hasResponse()) {
                $curlResponse=$e->getResponse();
                return $response->withJson(json_decode($curlResponse->getBody()), $curlResponse->getStatusCode());
            } else {
                return $response->withJson($e->getMessage(), $e->getMessage());
            }
        }
        $statusCode=$curlResponse->getStatusCode();
        switch($contentType) {
            case 'application/json':
                //Application and server error messages will be returned. Consider hiding server errors.
                $content=json_decode($curlResponse->getBody());
                if($callback) {
                    $content=$callback($content, $statusCode);
                }
                return $response->withJson($content, $statusCode);
            case 'text/html':
            case 'text/plain':
                //Application and server error messages will be returned. Consider hiding server errors.
                $response = $response->withStatus($statusCode);
                return $response->getBody()->write($curlResponse->getBody());
            case 'text/csv':
                foreach ($response->getHeaders() as $name => $values) {
                    syslog(LOG_INFO, "headers: $name: ". implode(', ', $values));
                }
                if($statusCode===200) {
                    return $response->withHeader('Content-Type', 'application/force-download')
                        ->withHeader('Content-Type', 'application/octet-stream')
                        ->withHeader('Content-Type', 'application/download')
                        ->withHeader('Content-Description', 'File Transfer')
                        ->withHeader('Content-Transfer-Encoding', 'binary')
                        ->withHeader('Content-Disposition', 'attachment; filename="data.csv"')
                        ->withHeader('Expires', '0')
                        ->withHeader('Cache-Control', 'must-revalidate, post-check=0, pre-check=0')
                        ->withHeader('Pragma', 'public')
                        ->withBody($curlResponse->getBody());
                } else {
                    return $response->withJson(json_decode($curlResponse->getBody()), $statusCode);
                }
                break;
            default:
                throw new \Exception("Invalid proxy contentType: $contentType");
        }
    }

    private function getFileErrorMessage($code){
        switch ($code) {
            case UPLOAD_ERR_INI_SIZE:
                $message = "The uploaded file exceeds the upload_max_filesize directive in php.ini";
                break;
            case UPLOAD_ERR_FORM_SIZE:
                $message = "The uploaded file exceeds the MAX_FILE_SIZE directive that was specified in the HTML form";
                break;
            case UPLOAD_ERR_PARTIAL:
                $message = "The uploaded file was only partially uploaded";
                break;
            case UPLOAD_ERR_NO_FILE:
                $message = "No file was uploaded";
                break;
            case UPLOAD_ERR_NO_TMP_DIR:
                $message = "Missing a temporary folder";
                break;
            case UPLOAD_ERR_CANT_WRITE:
                $message = "Failed to write file to disk";
                break;
            case UPLOAD_ERR_EXTENSION:
                $message = "File upload stopped by extension";
                break;
            default:
                $message = "Unknown upload error";
                break;
        }
        return $message;
    }

    public function callApi(\GuzzleHttp\Psr7\Request $request, array $data=[]):\GuzzleHttp\Psr7\Response
    {
        try {
            $response = $this->httpClient->send($request, $data);
        } catch (\GuzzleHttp\Exception\ClientException $e) {
            $response=$e->getResponse();
        } catch (\GuzzleHttp\Exception\RequestException $e) {
            //Networking error which includes ConnectException and TooManyRedirectsException
            if ($e->hasResponse()) {
                $response=$e->getResponse();
            } else {
                $response=new \GuzzleHttp\Psr7\Response($e->getCode(), [], $e->getMessage());
            }
        } catch (\GuzzleHttp\Exception\ServerException $e) {
            //Consider not including all information back to client
            $response=$e->getResponse();
        }
        return $response;
    }
}
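One thing the comment inside `proxy()` already hints at: `array_merge` on string keys lets a body parameter silently replace a query parameter of the same name. Below is a minimal sketch of keeping the two apart, assuming the remote server is happy to read them separately. `buildForwardOptions` is a hypothetical helper, not part of the posted class; `query` and `json` are standard Guzzle request options.

```php
<?php
// Hypothetical helper: build Guzzle request options without merging
// query-string parameters into the body.
function buildForwardOptions(string $method, array $queryParams, array $bodyParams): array
{
    $options = [];
    if ($queryParams) {
        $options['query'] = $queryParams;   // sent in the URL
    }
    if (in_array(strtoupper($method), ['PUT', 'POST'], true)) {
        $options['json'] = $bodyParams;     // sent as a JSON body
    }
    return $options;
}

// What the merged version does with a duplicated key:
$merged = array_merge(['id' => '1', 'page' => '2'], ['id' => '9']);
// $merged['id'] is now '9'; the query-string value '1' is gone.

// The separated version keeps both values.
$options = buildForwardOptions('PUT', ['id' => '1', 'page' => '2'], ['id' => '9']);
```

The resulting array could be passed as the third argument to `$this->httpClient->request($method, $path, $options)` in place of the merged `$data`.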
I'm going to write a script that determines whether a proxy is good or not through cURL, and I would like to know if anyone knows what qualifies a proxy as being good. Let's assume I need to do some surfing through a proxy, and that's it. Is there a way in PHP to determine the 'type' of proxy (for example elite, codeen, etc.)? If the proxy page exists, is that all I need to run cURL through it, or otherwise consider it as being good? Or should I focus my attention on simply going through the whole process of getting a 'dummy' page using cURL through the proxy and, should it succeed, considering it good? I suppose if I can avoid the latter then the script would be more efficient. Advice and suggestions are always greatly appreciated here.

Hi guys, I am creating a script that I am using to detect proxy server levels.

<?php
//proxy levels
//Level 3 Elite Proxy, connection looks like a regular client
//Level 2 Anonymous Proxy, no ip is forwarded but target site could still tell it's a proxy
//Level 1 Transparent Proxy, ip is forwarded and target site would be able to tell it's a proxy
if (empty($_SERVER['HTTP_X_FORWARDED_FOR']) && empty($_SERVER['HTTP_VIA']) && empty($_SERVER['HTTP_PROXY_CONNECTION'])) {
    echo '3';
} elseif (empty($_SERVER['HTTP_X_FORWARDED_FOR'])) {
    echo '2';
} else {
    echo '1';
}
?>

I want the script to check the IP to see whether the proxy server is a Codeen/PlanetLab or botnet proxy server, then place it on level one, and report whether it is safe or unsafe to use. I cannot find the code to do this. Please help me! Thanks in advance.

I was wondering if there is a way, or if it's even possible, to determine the type of a proxy using PHP. When I say type I mean HTTP, SOCKS4 or SOCKS5. Using cURL, I think it's safe to assume that if a proxy returns a code of 200 then that proxy is good and HTTP, correct? However, how would I go about determining the type of the proxies I have in a list, assuming they are good and SOCKS4 and/or SOCKS5?
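For the SOCKS5 part of that question, one option is to speak the first step of the protocol directly: a SOCKS5 server answers the client greeting with two bytes, the first of which is the version number 5 (RFC 1928). The sketch below assumes plain TCP access to the proxy; `isSocks5Reply` and `probeSocks5` are hypothetical names. An HTTP proxy or a SOCKS4 server will not answer this way, and a SOCKS4 check would need its own request format, which is not shown here.

```php
<?php
// Interpret the 2-byte reply to a SOCKS5 greeting (RFC 1928):
// byte 0 is the protocol version, byte 1 the auth method the server chose.
function isSocks5Reply(string $reply): bool
{
    return strlen($reply) >= 2 && ord($reply[0]) === 5;
}

// Hypothetical probe: true if host:port answers like a SOCKS5 server.
function probeSocks5(string $host, int $port, float $timeout = 5.0): bool
{
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if (!is_resource($fp)) {
        return false;
    }
    stream_set_timeout($fp, (int) ceil($timeout));
    // Greeting: version 5, one method offered, method 0 = "no authentication".
    fwrite($fp, "\x05\x01\x00");
    $reply = fread($fp, 2);
    fclose($fp);
    return is_string($reply) && isSocks5Reply($reply);
}
```

A call such as `probeSocks5('127.0.0.1', 9050)` could then be run over each entry in the list before falling back to an HTTP test with cURL.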
Received HTTP code 403 from proxy after CONNECT

<?php
function getPage($proxy, $url, $referer, $agent, $header, $timeout) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, $header);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_PROXY, $proxy);
    curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_REFERER, $referer);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    $result['EXE'] = curl_exec($ch);
    $result['INF'] = curl_getinfo($ch);
    $result['ERR'] = curl_error($ch);
    curl_close($ch);
    return $result;
}

$result = getPage(
    '89.106.13.93:80', // use valid proxy
    'http://www.northplanet.co.uk',
    'http://www.youtexv.com/',
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.8',
    1,
    5);
print_r ($result);
?>

I have some videos stored above the web root. I then have a PHP file that acts as a proxy to hand the videos to the user. The reason for this is so that not just anybody can download the videos; they have to be authenticated. If I am on the PC, it prompts for a download, so I know the script is working. When I try it on my iPad, I get the big play button but with a cross through it. When I try it on my iPhone, the movie interface loads, but I am then prompted with "Cannot play movie: The server is not correctly configured". What could that be referring to?
My script is below:

<?php
include("config.php"); //just includes session_start and db connection
if ($_SESSION['user']['authed'] == true) {
    session_write_close();
    $id = $_GET['id'];
    $query = mysql_query("SELECT filename FROM episodes WHERE id = '$id'");
    $row = mysql_fetch_array($query);
    $filename = "../../media/".$row['filename'];
    header( 'Content-Description: File Transfer' );
    header( 'Content-Type: application/octet-stream' );
    header( 'Content-Disposition: attachment; filename='.basename( $filename ) );
    header( 'Content-Transfer-Encoding: binary' );
    header( 'Expires: 0' );
    header( 'Cache-Control: must-revalidate, post-check=0, pre-check=0' );
    header( 'Pragma: public' );
    header( 'Content-Length: ' . filesize( $filename ) );
    ob_clean();
    flush();
    readfile( $filename );
    exit;
}
?>

Hi,
I want to use a proxy in PHP with cURL for scraping content, but some proxies do not support POST requests.
Please tell me how to check, before using a proxy, whether POST requests are supported. I also want the proxy speed in ms.
Please help me out.
Thanks.
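A rough sketch of one way to test both things at once: send a small POST through the proxy to a page you control and read the status code and total time from `curl_getinfo()`. The names `summarizeProbe` and `probeProxyPost`, the test URL argument, and the rule "any 2xx status means POST works" are all assumptions made for illustration.

```php
<?php
// Turn curl_getinfo()-style data into a small verdict.
// Treating any 2xx status as "POST accepted" is an assumption.
function summarizeProbe(array $info): array
{
    $code = (int) ($info['http_code'] ?? 0);
    $ms   = (int) round(((float) ($info['total_time'] ?? 0)) * 1000);
    return ['post_ok' => $code >= 200 && $code < 300, 'ms' => $ms];
}

// Hypothetical probe: POST a dummy field to $testUrl through $proxy.
// Requires the cURL extension; $testUrl should be a page you control.
function probeProxyPost(string $proxy, string $testUrl, int $timeout = 10): array
{
    $ch = curl_init($testUrl);
    curl_setopt($ch, CURLOPT_PROXY, $proxy);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, ['probe' => '1']);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    curl_exec($ch);
    $info = curl_getinfo($ch);   // includes 'http_code' and 'total_time' (seconds)
    curl_close($ch);
    return summarizeProbe($info);
}
```

The `ms` value covers the whole request through the proxy, so it reflects the target server's speed as well as the proxy's.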
Edited by ShivaGupta, 23 May 2014 - 04:49 PM.

Hello, I need to use curl with a proxy, but I get an "undefined variable offset 1" error. Here is the standard code I use. Please help me with it. Thanks.

curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, $header);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_PROXY, $proxy);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_REFERER, $referer);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);

Cannot Get IP Address Behind A Proxy Server.

I am using

$aph = apache_request_headers();
print $aph['PC-Remote-Addr'] . "<br/>";
print "HTTP_CLIENT_IP: " . $_SERVER["HTTP_CLIENT_IP"] . "<br/>";
print "HTTP_X_FORWARDED_FOR: " . $_SERVER["HTTP_X_FORWARDED_FOR"] . "<br/>";
print "REMOTE_ADDR: " . $_SERVER["REMOTE_ADDR"] . "<br/>";
print "CLUSTER CLIENT " . $_SERVER["HTTP_X_CLUSTER_CLIENT_IP"] . "<br/>";
exit;

but it does not give any output. Please suggest how to get the IP behind a proxy server.
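For the last question, note that headers such as X-Forwarded-For only exist in `$_SERVER` when the proxy actually sends them, so printing them unconditionally may show nothing. A minimal sketch of a defensive lookup follows; `clientIp` and the list of trusted proxy addresses are assumptions made for illustration.

```php
<?php
// Hypothetical helper: pick the client address from a $_SERVER-style array.
// X-Forwarded-For is only consulted when the direct peer is a proxy you
// trust, because any client can send that header with arbitrary content.
function clientIp(array $server, array $trustedProxies = []): string
{
    $remote    = $server['REMOTE_ADDR'] ?? '';
    $forwarded = $server['HTTP_X_FORWARDED_FOR'] ?? '';
    if ($forwarded === '' || !in_array($remote, $trustedProxies, true)) {
        return $remote;
    }
    // The header may hold a comma-separated chain; the first entry is the
    // address reported for the original client.
    $first = trim(explode(',', $forwarded)[0]);
    return filter_var($first, FILTER_VALIDATE_IP) !== false ? $first : $remote;
}
```

In a page this would be called as `clientIp($_SERVER, ['10.0.0.1'])`, where `10.0.0.1` stands in for the address of your own reverse proxy.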