PHP - Some Questions About Site Crawling.
Hello everyone,
I developed a PHP application that crawls a site and generates an XML sitemap from the gathered information. It works, but as of now I am using brute-force tactics. I have a class that crawls and stores the links in a tree by fetching each page with file_get_contents and using preg_match to find the a tags. Is there a quicker method? I've seen people talking about cURL, but I don't know if that will make my program any better. My application seems to get results a bit quicker than some others I have seen.

My main concern is the sorting. Is there a way to tell if a link on a page is an RSS feed, or a downloadable image, zip file or something similar? For files, I explode at '/' and check the last array element for a '.', then I check it against an array of file names I think I want to include. For feeds, I just check whether the exploded array contains feed, feeds, rss or ?feed=rss2 before storage. This works fine for sites I administer and WordPress sites, but it could filter out a link on, say, a cooking site that happens to have a feed directory. It also seems to be one of the most time-consuming parts.

I think what I am trying to ask is: is there a good way to filter these results? Will cURL or anything else let me check for actual pages and filter out .mp3 files and all the other junk you don't want in a sitemap? Thanks in advance for your time.

Similar Tutorials

Hello, I made a website (PHP) for a music company a few years back. They basically sell songs from their site. Now they face a problem: their site has been crawled by abmp3.com, which not only lets users download songs from our site but also gives the full path of each song to download. I wonder how this happened and how to stop them from crawling our site. It's likely that other sites are doing the same. So could anybody please help me overcome this problem? Maybe some PHP code can do this? Thanks, watsmyname

I'm using cURL to crawl and scrape data from a website. This website contains tables with rows of data.
When I send a cURL POST for the underlying data at a specific row (A), it returns the expected data. But when I move to the second row (B), the data comes back blank or, more specifically, as tons of spaces (or &nbsp;'s). When I access the cURL POST location in a browser, I can see (B)'s data. The only difference between the two POSTs is the location IDs for the data. I don't think it's a problem with JavaScript, since I can successfully return data from row (A), as I mentioned.

I have set up a site for a local orchestra. I have a simple MySQL db from which 'News' items are pulled using PHP. I have set up the following pages:
1. view_news.php
2. insert_news_item.php
3. delete_news_item.php
4. edit_news_item.php
All pages are working fine. Now, rather than continually having to do news updates for the orchestra myself, I want to give a person on the orchestra committee access to the above pages so that he/she can do news updates for the orchestra, as required. What is the best way of proceeding from here? I thought about creating a password-protected directory, putting pages 2, 3 and 4 above into that directory, and then giving the committee member the protected directory password. Is that the way to go? What is the conventional way of doing this sort of thing? Two things to note: I'm new to PHP and the job is non-paying. I've set up the site as the willing parent of kids in the orchestra. Any advice will be much appreciated.

I have several "sites" located in my html directory, and each has a "general" access point and an "administrator" access point:
/var/www/html/site1/index.php
/var/www/html/site1/administrator/index.php
/var/www/html/site2/index.php
/var/www/html/site2/administrator/index.php
/var/www/html/site3/index.php
/var/www/html/site3/administrator/index.php
All sites are similar except that data will be specific to site1, site2, or site3, etc. Users who log on to /var/www/html/siteX/index.php are totally unrelated to those who log on to /var/www/html/siteX/administrator/index.php: they will have different logon credentials, are stored in different DB tables, and each should have their own session. If a user logs off of either the general or administrator site, it should not affect the other site even if they were previously logged on to both on the same PC (and of course should not affect other sites). When a user logs off, I would like to destroy their previous cookie and associated session. Users of either will only use https. I am using Apache to rewrite https://www.mysite.com/ to https://mysite.com/. While I named the administrator site "administrator" above, the administrator user has the ability to change the directory name. I am thinking I need to use session_set_cookie_params to specify where I wish the session cookie to be stored, since /var/www/html/siteX/administrator/index.php is in a sub-directory of /var/www/html/siteX/, but I am not really sure. Sorry for the cryptic post, but I am not very well versed in this subject. How would you recommend setting up cookies/sessions for this scenario? Thank you

Right now I use this code to show where visitors came to my site from:
<?php $referer=$_SERVER['HTTP_REFERER']; echo $referer; ?>
Now I want to show the referrer URLs of the 5 latest visitors on my site. How can I do that?

Not sure if I'm trying to achieve something totally crazy here, or if this is something pretty standard. I didn't have much luck with searching, as I'm not fully familiar with all the terms.
(A) I have one site providing an RSS feed.
(B) I have one site I want to search, once for each of the items in the feed (A).
(C) I want the results of the search in (B) to be displayed on page (C).
So for example, the feed on (A) says: apples, bananas, oranges, cheese. I want site (B) to search for each of those terms (by passing the item in the feed (A) to the ?search= part of the URL of that page) and then show the results from THAT search on page (C). Bit of a complex one, so let me know if you need me to clarify. Thanks for any help!

Hi, my first post here is a cry for help. I have a Windows 2003 server running IIS6/PHP5, and the server hosts multiple web sites. The problem is that include files for site A are showing on site B (each site has its own includes as part of the site files in its own site folder). It does not happen every time; it's very random. Sometimes the correct includes show, sometimes ones from another site on the same server. This only occurs where the include files for both sites have the same name, such as 'inc-header.php'. I can only assume PHP is caching includes and, because they have the same name, is sometimes showing the wrong one on other sites. If I rename them to something unique the problem goes away, but it's not a practical solution to rename all include files to unique names, so I find myself looking for a 'real' fix. I have a feeling it's to do with the include_path in php.ini, but right now it's disabled with a semi-colon, and I don't want to set one as I have no global includes; all includes are site specific. Any help would be very much appreciated! Phil

I'm currently running a classified ads site and planning to display my own content from the database combined with an external site's RSS. So here is what I have right now after the db query for the job ads (procedural php),
while ($row = mysqli_fetch_array($results, MYSQLI_ASSOC)) {
    echo '<div class="media margin-none"> <a class="pull-left bg-inverse innerAll text-center" href="#"><img src="'.$foto.'" share_alt="" width="100" height="100"></a> <div class="media-body innerAll"> <h4 class="media-heading innerT"> <a href="' . $row['title'] .'-da' . $row['id_ad'] . '" class="text-inverse">'
        . $remuneracion .' ' . substr(ucfirst(strtolower($row['title'])), 0, 53)
        . '</a> <small class="pull-right label label-default"><i class="fa fa-fw fa-calendar-o"></i> ' . $row['date_created'] . '</small></h4> <p>'
        . substr(ucfirst(strtolower($row['description'])), 0, 80) . ' ...</p>';
    echo '</div> </div> <div class="col-separator-h"></div>';
}
echo pagination($statement,$per_page,$page, $url_filtros, $filtros);
?>
This is the while loop that I use to display ads from my database. What would be the best way to display (in this same loop?) another site's RSS feed, so I can show my content combined with the external RSS? Thanks

Transferring data from sub-domain.site.com
Reading sub-domain.site.com
What is this all about? I'm going to put all images into a separate sub-domain, e.g. images.site.com. This would create a folder inside my public_HTML called "images". Now, when sites show that "Transferring data" and "Reading..." text, is that related to what I want to do? Facebook also does it, and they get the images for their site from a sub-domain. How is it all done? I'm not sure if it's entirely PHP, but I hope someone can help. Thanks

Hi, I made a new design for my website and I made some changes. I want to use the layout for my second site.
I'd like to know if my site is easier to browse through now and if you like the design better. I tested my site on Internet Explorer, Chrome, and Firefox. It is best to use the site on better browsers like Firefox and Chrome to get a better experience. Thanks.
http://adjade.com
Hello everyone, I have been doing web development for a little while and just recently decided to make the leap to developing standalone applications. I started learning C++ and Java for this purpose, but quickly learned that PHP can also be used to this end, and since I am quite familiar with PHP, I thought it would make sense to start with PHP GTK. But before I jump right in, I have a few questions that I would greatly appreciate some answers to:
Does PHP have any significant advantages/disadvantages compared with lower-level languages such as C++? I would imagine that PHP, being originally designed for web programming, would be less suited for standalone applications.
I'm a little confused as to whether GTK is graphical user interface software, like Qt and NetBeans, or a markup language like HTML, where the widgets are generated with text commands. I need a little clarification on that.
Also, are there any other tools that need to be downloaded to get started besides GTK?
Finally, am I supposed to learn OOP PHP to get going, or is traditional procedural PHP sufficient?
Answers to any or all of the above questions and any other advice would be highly appreciated. Thanks.

I've been coding PHP for some time and would consider myself to be at an intermediate level. I can write code to do what I need, but it's probably not the best way to do it. I rarely see any code that I am not able to read, understand, or follow. I've created modifications for everything from vBulletin, WordPress, Kayako Support Suite, Magento, and more. However, I've never really built a strong understanding of OOP. For example, let's say you have the following classes:
_main
- db
- admin
- - modules
- - - dashboard
How would you share the db connection with the dashboard class? I've been trying to read up on Dependency Injection and Singletons, but I haven't found an article that explains them on a level that I can understand.
I get a feeling that most people who use OOP in PHP have a background in Java or C++ and are much more familiar with everything. Could someone please explain this to me in simple terms, or link me to an extremely well explained article that I'd be able to understand without a background in computer science? Thanks

Hi. I'd like to ask a few questions about PHP, as I think they are related to it.
I've come across some websites where each "page" displays different content depending on an argument in the URL, and there is no filename.
For example:
"http://www.website.com/?home" is a home-like webpage; changing "/?home" to "/?anotherpage" lands me on another page of their website, and so on. My question is: how is it done? Is it done with PHP?
Another question I wanted to ask: I went on an InvisionPower.Board forum (such as this one, PHP Freaks). How do you force "folders" to be displayed as "files"?
For example:
"http://forums.phpfre...ks-on-facebook/" which links to a thread.
Thanks in advance
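
For the first question, one common approach (a sketch, not necessarily what that particular site does) is to have a single index.php read the query string and map it to a page file. The page names and the pages/ paths below are hypothetical, and the whitelist is there so that arbitrary input can never reach include. The folder-style forum URLs are usually produced by web server URL rewriting (for example Apache mod_rewrite) that hands the path to a PHP script; that part is not shown here.

```php
<?php
// Hypothetical page table: query-string key => template file (names are assumptions).
const PAGES = [
    'home'        => 'pages/home.php',
    'anotherpage' => 'pages/anotherpage.php',
];

/**
 * Resolve a raw query string such as "home" or "anotherpage&x=1"
 * to a template path, falling back to the home page for anything unknown.
 */
function resolvePage(string $queryString): string
{
    // For "/?home" PHP sees QUERY_STRING = "home"; keep only the first key.
    $key = strtok($queryString, '&=');

    // strtok() returns false for an empty string; unknown keys are rejected too.
    if ($key === false || !array_key_exists($key, PAGES)) {
        $key = 'home';
    }

    return PAGES[$key];
}
```

In index.php this could then be used roughly as `include resolvePage($_SERVER['QUERY_STRING'] ?? '');`, so that "/?home" and "/?anotherpage" are both served by the same script.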
Howdy, I am new to SEO. Could you please help me?
1. I like to post programming tutorials to both my website and various programming forums. Is that going to hurt my website's ranking in Google?
2. The editor of my history website, who sometimes posts essays there, also posts them in some history forums. Is that bad for SEO?
3. I made a Facebook page for my history website. It says there "Promote your page": basically you pay $5 for around 100 likes. Has anyone tried that? Does it work? $5 seems like little money for an additional 100 likes, which would increase the traffic considerably.
Thank you so much for the help!

I basically have a picture uploading system for users. I have two questions:
1) What CHMOD should I use for the folders that files are uploaded to? Currently it is 755, but I want it to be accessible and safe.
2) When I use the standard mkdir() function to create folders in my main parent folder, the folders don't get created. Is this because my parent folder is CHMOD 755?
Thanks

First, what are the possible values of $_FILES['file']['type'] for a .zip? I know one is "application/x-zip-compressed", but are there any others? (Basically I need to check if the uploaded file is a .zip.) Second question: how could I extract the contents of a .zip to a directory on the server without the use of FTP? Thanks

Good day, I have 2 questions. Here is the context: I have a list of items that I query from a database and insert in a table. The last field of the table is an input box to type in the quantity ("qty"). My first question: how can I associate the input box with the product_id from the database for that item?
Code:
//database connecting is working
<?
while($row = mysql_fetch_array($result)) {
echo "<tr><td>" . $row['product_code'] . "</td><td>" . $row['product_name'] . "</td><td><input type='text' maxlength = '3' value='0'></td></tr>";
}
And my second question (any tutorial reference is welcome): how do I select only the items whose qty is not 0 and pass them to another page, either by sessions or other means? Thank you

I am trying to create a simple form for inserting into a database table. It seems the data from the form is passed through to the "insert" script, but nothing is added to the table. My question is: what is required to insert new data into a table? Must all fields have a value for the new data to be added?
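
On the last question, a small experiment can show which fields are actually required. The sketch below is only an illustration under assumptions: it uses PDO with an in-memory SQLite database instead of the poster's real database so that it runs on its own, and the news table, its columns and the insertNews() helper are all hypothetical. The idea is that a column declared NOT NULL without a default must be given a value, while nullable columns and columns with a default can be left out, and that turning on exceptions makes a failed insert visible instead of silent. MySQL can behave differently depending on its SQL mode.

```php
<?php
// Assumption: in-memory SQLite stands in for the real database (needs the pdo_sqlite extension).
$pdo = new PDO('sqlite::memory:');
// Make failed queries throw instead of failing silently.
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical table: "title" is required, "body" may be NULL, "status" has a default.
$pdo->exec("CREATE TABLE news (
    id     INTEGER PRIMARY KEY AUTOINCREMENT,
    title  TEXT NOT NULL,
    body   TEXT,
    status TEXT NOT NULL DEFAULT 'draft'
)");

/**
 * Try to insert one row. Returns null on success,
 * or the database error message if the insert was rejected.
 */
function insertNews(PDO $pdo, ?string $title, ?string $body = null): ?string
{
    try {
        // Prepared statement: form values are bound, never concatenated into the SQL.
        $stmt = $pdo->prepare('INSERT INTO news (title, body) VALUES (:title, :body)');
        $stmt->execute([':title' => $title, ':body' => $body]);
        return null;
    } catch (PDOException $e) {
        return $e->getMessage();
    }
}

// Works: "body" is nullable and "status" falls back to its default.
$ok = insertNews($pdo, 'Spring concert');

// Rejected: "title" is NOT NULL and has no default.
$error = insertNews($pdo, null);
```

Here $ok ends up as null, while $error holds the message explaining why the second row was refused. Logging that message in the real "insert" script is usually the quickest way to find out why nothing is added to the table.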