PHP - Parse Large Xml Files!
I need to parse large XML files ranging in size from ~ 500 to ~ 1700 Mb.
I use XMLReader Code: [Select] set_time_limit(0); $start_time = microtime(true); include_once 'inc/Misc.php'; include_once 'inc/Database.php'; $files = array('xml/large_file.xml'); foreach($files as $file) { echo "\n"; echo 'Filename: '.basename($file)."\n"; echo 'Filesize: '.convert(filesize($file))."\n"; echo 'Start parsing...'."\n"; echo "\n"; $reader = new XMLReader(); $reader->open($file); while ($reader->read()) { switch ($reader->nodeType) { case (XMLREADER::ELEMENT): if ($reader->localName == "element-name") { $dom = new DomDocument(); $n = $dom->importNode($reader->expand(),true); $dom->appendChild($n); $sxe = simplexml_import_dom($n); $tess->file_big->insert($sxe); echo "Insert done! "; benchmark(); } } } } Everything is fine in the beginning ... Parsed file and slowly inserted my desired data, but is gradually growing memory consumption and has run out of resources. That is, I took the file to 400 Mb and as long as it is spent parsing of 2000 Mb of RAM and all the resources ran out and the script is stopped. How to deal with large files? ~ 500 to ~ 1700 Mb. Will there XML Parser? Yes, and how to apply it to my problem? Another option could have? Similar TutorialsI have an xml file thats about 20MB, I want to parse it all into an array. SimpleXML seems to have trouble processing it, I just get "Fatal error: Balloc() failed to allocate memory". I've googled this topic and can't seem to find a straight forward solution. There must be someone here who has had to do this before. Is there an easy way to parse this? I need to process a large CSV file (40 MB - 300,000 rows). While I have been working with smaller files my existing code is not able to work with large files. All I need to do is - read a particular column from file and then count total number of rows and add all the values from column. My exisitng piece of code imports whole CSV file into an array (a class is used) and then using 'ForEach' loop, reads the required column and values into another array. Once the data is in this array i can simply sum or count it. While this served me well for smaller files, i am not able to use this approach to read a larger php file. I have already increased the memory allocated to php and max_execution_time but the script just keeps on running I am no php expert but usualy get around with trial and error.......your thoughts and help will be greatly appreciated Exisiting code: Once data has been processed by initial class (Class used is freely available and known as 'parsecsv', available at http://code.google.com/p/parsecsv-for-php/) After calling the class and processing the csv file: ?php ini_set('max_execution_time', 3000); $init = array(0); //Initialize a dummy array, which hold value '0' foreach ($csv->data as $key => $col): //Get value from array, 'Data' is an array processed by class and holds csv data $ColValue = $col[SALARY']; //retrieves the column you want { $SAL= $col['SALARY']; //Column that you want to process from csv array_push ($init, $SAL); // Push value into dummy array created above echo "<pre>"; } endforeach; $total_rows = (Count($init) -1); //Count total number of value, '-1' to remove the first initilaized value in array echo "Total # of rows: ". $total_rows . "\n"; echo "Total Sum: ". array_sum($init) . "\n"; ?> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Hey Guys, I need a solution for uploading very large files. As I found PHP has some memory limits. Is it even possible to upload files with a size of 4GB? I have a flash application that talks to upload.php Say I upload a 500mb file; it will obviously take a little while to upload. Will the max_execution_time settings cause this to fail? Its set at 60 right now and the upload is obviously taking longer than 1 minute. <!doctype html> <html> <head> <meta content="text/html; charset=utf-8" http-equiv="content-type"> <title>Untitled Document</title> </head> <body> <form action="upload_image.php" method="post" enctype="multipart/form-data"> <label>select page<input type="file" name="image"> <input type="submit" name="upload"> </label> </form> <?php if(isset($_POST['upload'])) { $image_name=$_FILES['image']['name']; //return the name ot image $image_type=$_FILES['image']['type']; //return the value of image $image_size=$_FILES['image']['size']; //return the size of image $image_tmp_name=$_FILES['image']['tmp_name'];//return the value if($image_name=='') { echo '<script type="text/javascript">alert("please select image")</script>'; exit(); } else { $ex=move_uploaded_file($image_tmp_name,"image/".$image_name); if($ex) { echo 'image upload done "<br>"'; echo $image_name.'<br>'; echo $image_size.'<br>'; echo $image_type.'<br>'; echo $image_tmp_name.'<br>'; } else { echo 'error'; } } } ?> </body> </html>I make this simple script for upload files like photo , It's work correctly ,but not upload the large files ex 8MB images 12MB images Edited by Ch0cu3r, 04 October 2014 - 09:50 AM. Modified topic title - changed download to upload Hi I am running debian lenny, apache2 php5. My fail upload fails for large file sizes. A 6.2 MB file uploads file, but a 10.2Mb file fails. I have set the Max file size to 50MB and max_execution to 600 etc in php.ini, but still have the same problems. I have noticed many others having similar problems. Is there a solution? Hi I have a function that strips out lines from files. I'm handling with large files(more than 100Mb). I have the PHP Memory with 256MB but the function that handles with the strip out of lines blows up with a 100MB CSV File. What the function must do is this: Originally I have the CSV like: Code: [Select] Copyright (c) 2007 MaxMind LLC. All Rights Reserved. locId,country,region,city,postalCode,latitude,longitude,metroCode,areaCode 1,"O1","","","",0.0000,0.0000,, 2,"AP","","","",35.0000,105.0000,, 3,"EU","","","",47.0000,8.0000,, 4,"AD","","","",42.5000,1.5000,, 5,"AE","","","",24.0000,54.0000,, 6,"AF","","","",33.0000,65.0000,, 7,"AG","","","",17.0500,-61.8000,, 8,"AI","","","",18.2500,-63.1667,, 9,"AL","","","",41.0000,20.0000,, When I pass the CSV file to this function I got: Code: [Select] locId,country,region,city,postalCode,latitude,longitude,metroCode,areaCode 1,"O1","","","",0.0000,0.0000,, 2,"AP","","","",35.0000,105.0000,, 3,"EU","","","",47.0000,8.0000,, 4,"AD","","","",42.5000,1.5000,, 5,"AE","","","",24.0000,54.0000,, 6,"AF","","","",33.0000,65.0000,, 7,"AG","","","",17.0500,-61.8000,, 8,"AI","","","",18.2500,-63.1667,, 9,"AL","","","",41.0000,20.0000,, It only strips out the first line, nothing more. The problem is the performance of this function with large files, it blows up the memory. The function is: public function deleteLine($line_no, $csvFileName) { // this function strips a specific line from a file // if a line is stripped, functions returns True else false // // e.g. // deleteLine(-1, xyz.csv); // strip last line // deleteLine(1, xyz.csv); // strip first line // Assigna o nome do ficheiro $filename = $csvFileName; $strip_return=FALSE; $data=file($filename); $pipe=fopen($filename,'w'); $size=count($data); if($line_no==-1) $skip=$size-1; else $skip=$line_no-1; for($line=0;$line<$size;$line++) if($line!=$skip) fputs($pipe,$data[$line]); else $strip_return=TRUE; return $strip_return; } It is possible to refactor this function to not blow up with the 256MB PHP Memory? Give me some clues. Best Regards, I have been trying to get my files to upload onto a computer and I receive this message: Parse error: syntax error, unexpected T_STRING in /home/content/19/6550319/html/listing.php on line 27. Line 27 is how the php logs into my SQL. The problem is that I was able to log in before. I just made changes to the form by adding a dropdown menu and price and now it says it doesnt parse. Can anyone figure this out. I will include the code without the login information because the forum is public but I did put the words left out for you to see where I took out the passcodes. Code: [Select] <?php //This is the directory where images will be saved $target = "potofiles/"; $target = $target . basename( $_FILES['photo']['name']); //This gets all the other information from the form $price=$_POST['price']; $gig=$_POST['giga']; $yesg=$_POST['yesg']; $pic=($_FILES['photo']['name']); $pic2=($_FILES['phototwo']['name']); $pic3=($_FILES['photothree']['name']); $pic4=($_FILES['photofour']['name']); $description=$_POST['iPadDescription']; $condition=$_POST['condition']; $fname=$_POST['firstName']; $lname=$_POST['lastName']; $email=$_POST['email'] // Connects to your Database mysql_connect ("left out", "left out", "left out") or die(mysql_error()) ; mysql_select_db("left out") or die(mysql_error()) ; //Writes the information to the database mysql_query("INSERT INTO listing (price,giga,yesg,photo,phototwo,photothree,photofour,iPadDescription,condition,firstName,lastName,email) VALUES ('$price', '$gig', '$yesg', '$pic', '$pic2', '$pic3', '$pic4', '$description', '$condition', '$fname', '$lname', '$email')") ; //Writes the photo to the server if(move_uploaded_file($_FILES['photo']['tmp_name'], $target)) { //Tells you if its all ok echo "The file ". basename( $_FILES['uploadedfile']['name']). " has been uploaded, and your information has been added to the directory"; } else { //Gives and error if its not echo "Sorry, there was a problem uploading your file."; } echo date("m/d/y : H:i:s", time()) ?> hi all am new to this forum help me to overcome from this error Parse error: parse error, expecting `T_STRING' or `T_VARIABLE' or `T_NUM_STRING' in D:\wamp\www\quiz1\quiz1.php on line 12 Code: [Select] <?php include("contentdb.php"); $display = mysql_query("SELECT * FROM $table ORDER BY id",$db); if (!$submit) { echo "<form method=post action=$_SERVER['PHP_SELF']>"; echo "<table border=0>"; while ($row = mysql_fetch_array($display)) { $id = $row["id"]; $question = $row["question"]; $opt1 = $row["opt1"]; $opt2 = $row["opt2"]; $opt3 = $row["opt3"]; $answer = $row["answer"]; echo "<tr><td colspan=3><br><b>$question</b></td></tr>"; echo "<tr><td>$opt1 <input type=radio name=q$id value=\"$opt1\"></td><td>$opt2 <input type=radio name=q$id value=\"$opt2\"></td><td>$opt3 <input type=radio name=q$id value=\"$opt3\"></td></tr>"; } echo "</table>"; echo "<input type='submit' value='See how you did' name='submit'>"; echo "</form>"; } elseif ($submit) { $score = 0; $total = mysql_num_rows($display); while ($result = mysql_fetch_array($display)) { $answer = $result["answer"]; $q = $result["q"]; if ($$q == $answer) { $score++; } } echo "<p align=center><b>You scored $score out of $total</b></p>"; echo "<p>"; if ($score == $total) { echo "Congratulations! You got every question right!"; } elseif ($score/$total < 0.34) { echo "Oh dear. Not the best score, but don't worry, it's only a quiz."; } elseif ($score/$total > 0.67) { echo "Well done! You certainly know your stuff."; } else { echo "Not bad - but there were a few that caught you out!"; } echo "</p>"; echo "<p>Here are the answers:"; echo "<table border=0>"; $display = mysql_query("SELECT * FROM $table ORDER BY id",$db); while ($row = mysql_fetch_array($display)) { $question = $row["question"]; $answer = $row["answer"]; $q = $row["q"]; echo "<tr><td><br>$question</td></tr>"; if ($$q == $answer) { echo "<tr><td>»you answered ${$q}, which is correct</td></tr>"; } elseif ($$q == "") { echo "<tr><td>»you didn't select an answer. The answer is $answer</td></tr>"; } else { echo "<tr><td>»you answered ${$q}. The answer is $answer</td></tr>"; } } echo "</table></p>"; } ?> thanks in adavance Hello I have one problem with fwrite() I have one script to get width and height from javascript and echo it with PHP. echo "Screen width is: ". $_GET['width'] ."<br />\n"; echo "Screen height is: ". $_GET['height'] ."<br />\n"; It works but i want to store the result in a file fwrite($info,"Height: $_GET['height'] <br />"); But then I get error Parse error: parse error, expecting `T_STRING' or `T_VARIABLE' or `T_NUM_STRING //////////////////////// it would be even better if I don't have to print the values but directly store them in $iinfo sorry if something similar was soved but those examples were different or i was unable to transform it to solve my problem I am trying to put a href on a line to link me to another php module, but get an error: Parse error: parse error in C:\wamp\www\editpolicy.php on line 47 My code: while ($row2 = mysql_fetch_array($result2) ) { print "<tr><td>"; <a href='editpayment.php?payid=$row2[PaymentID]'>$row2[PaymentID]</a> .........THIS IS LINE 47 ??? print "</td><td>"; print $row2['Actioned']; print "</td><td align='right'>"; print $row2['PremiumPaidAmount']; print "</td><td align='right'>"; print $row2['Risk']; print "</td></tr>"; } So far I have managed to create an upload process which uploads a picture, updates the database on file location and then tries to upload the db a 2nd time to update the Thumbnails file location (i tried updating the thumbnails location in one go and for some reason this causes failure) But the main problem is that it doesn't upload some files Here is my upload.php <?php include 'dbconnect.php'; $statusMsg = ''; $Title = $conn -> real_escape_string($_POST['Title']) ; $BodyText = $conn -> real_escape_string($_POST['ThreadBody']) ; // File upload path $targetDir = "upload/"; $fileName = basename($_FILES["file"]["name"]); $targetFilePath = $targetDir . $fileName; $fileType = pathinfo($targetFilePath,PATHINFO_EXTENSION); $Thumbnail = "upload/Thumbnails/'$fileName'"; if(isset($_POST["submit"]) && !empty($_FILES["file"]["name"])){ // Allow certain file formats $allowTypes = array('jpg','png','jpeg','gif','pdf', "webm", "mp4"); if(in_array($fileType, $allowTypes)){ // Upload file to server if(move_uploaded_file($_FILES["file"]["tmp_name"], $targetFilePath)){ // Insert image file name into database $insert = $conn->query("INSERT into Threads (Title, ThreadBody, filename) VALUES ('$Title', '$BodyText', '$fileName')"); if($insert){ $statusMsg = "The file ".$fileName. " has been uploaded successfully."; $targetFilePathArg = escapeshellarg($targetFilePath); $output=null; $retval=null; //exec("convert $targetFilePathArg -resize 300x200 ./upload/Thumbnails/'$fileName'", $output, $retval); exec("convert $targetFilePathArg -resize 200x200 $Thumbnail", $output, $retval); echo "REturned with status $retval and output:\n" ; if ($retval == null) { echo "Retval is null\n" ; echo "Thumbnail equals $Thumbnail\n" ; } }else{ $statusMsg = "File upload failed, please try again."; } }else{ $statusMsg = "Sorry, there was an error uploading your file."; } }else{ $statusMsg = 'Sorry, only JPG, JPEG, PNG, GIF, mp4, webm & PDF files are allowed to upload.'; } }else{ $statusMsg = 'Please select a file to upload.'; } //Update SQL db by setting the thumbnail column to equal $Thumbnail $update = $conn->query("update Threads set thumbnail = '$Thumbnail' where filename = '$fileName'"); if($update){ $statusMsg = "Updated the thumbnail to sql correctly."; echo $statusMsg ; } else { echo "\n Failed to update Thumbnail. Thumbnail equals $Thumbnail" ; } // Display status message echo $statusMsg; ?> And this does work on most files however it is not working on a 9.9mb png fileĀ which is named "test.png" I tested on another 3.3 mb gif file and that failed too? For some reason it returns the following Updated the thumbnail to sql correctly.Updated the thumbnail to sql correctly. Whereas on the files it works on it returns REturned with status 0 and output: Retval is null Thumbnail equals upload/Thumbnails/'rainbow-trh-stache.gif' Failed to update Thumbnail. Thumbnail equals upload/Thumbnails/'rainbow-trh-stache.gif'The file rainbow-trh-stache.gif has been uploaded successfully. Any idea on why this is? Hello I have a simple question about file handling... Is it possible to list all files in directories / subdirectories, and then read ALL files in those dirs, and put the content of their file into an array? Like this: array: [SomePath/test.php] = "All In this php file is being read by a new smart function!"; [SomePath/Weird/hello.txt = "Hello world. This is me and im just trying to get some help!";and so on, until no further files exists in that rootdir. All my attempts went totally crazy and none of them works... therefore i need to ask you for help. Do you have any ideas how to do this? If so, how can I be able to do it? Thanks in Advance, pros echo "</select></td>\r\n </tr>\r\n \r\n <tr>\r\n <td align=\"right\">Total Views Allowed:</td>\r\n <td><input type=\"text\" id=\"ad_total_views\" style=\"width:50px;\" value=\"\" disabled=\"disabled\" /> <label><input type=\"checkbox\" checked=\"checked\" onclick=\"if(this.checked==true){$('#ad_total_views').val('');$('#ad_total_views').attr('disabled','disabled');}else{$('#ad_total_views').val('');$('#ad_total_views').removeAttr('disabled');$('#ad_total_views').focus()}\" />Unlimited</label></td>\r\n </tr>\r\n \r\n <tr>\r\n <td align=\"right\">Total Clicks Allowed:</td>\r\n <td><input type=\"text\" id=\"ad_total_clicks\" style=\"width:50px;\" value=\"\" disabled=\"disabled\" /> <label><input type=\"checkbox\" checked=\"checked\" onclick=\"if(this.checked==true){$('#ad_total_clicks').val('');$('#ad_total_clicks').attr('disabled','disabled');}else{$('#ad_total_clicks').val('');$('#ad_total_clicks').removeAttr('disabled');$('#ad_total_clicks').focus()}\" />Unlimited</label></td>\r\n </tr>\r\n </table></td>\r\n </tr>\r\n <tr bgcolor=\"#FFFFFF\">\r\n <td class=\"td_th\" align=\"center\">Smarty Code (Developer)</td>\r\n <td>{\$ads->getAdCode(<span id=\"smartycode\">1</span>)}</td>\r\n </tr>\r\n <tr bgcolor=\"#FFFFFF\">\r\n <td class=\"td_th\" align=\"center\"> </td>\r\n <td><input type=\"button\" value=\"Create New Campaign\" onclick=\"new_ad()\" /></td>\r\n </tr>\r\n </TBODY></TABLE>\r\n<br />\r\n\r\n<TABLE cellSpacing=1 cellPadding=4 width=\"100%\" border=0>\r\n <TBODY>\r\n <TR class=\"td_title\">\r\n <TD colSpan=7>Ad Campaigns</TD></TR>\r\n <TR bgColor=#ffffff>\r\n \r\n <TD width=\"10%\" align=\"center\" class=\"td_th\"> </TD>\r\n <TD width=\"4%\" align=\"center\" class=\"td_th\">ID</TD>\r\n <TD width=\"29%\" align=\"center\" class=\"td_th\">Campaign Name</TD>\r\n <TD width=\"12%\" align=\"center\" class=\"td_th\">Start Date</TD>\r\n <TD width=\"11%\" align=\"center\" class=\"td_th\">End Date</TD>\r\n <TD width=\"17%\" align=\"center\" class=\"td_th\">Viewed / Views Allowed</TD>\r\n <TD width=\"17%\" align=\"center\" class=\"td_th\">Clicked / Clicks Allowed</TD>\r\n </TR>\r\n "; $BoxSize = array("smallbox" = array("length" => 12, "width" => 10, "depth" => 2.5), "mediumbox" = array("length" => 30, "width" => 20, "depth" => 4), "largebox" = array("length" => 60, "width" => 40, "depth" => 11.5)); I am using WPSQT plugin in my blog site .I code some files in PHP also.how to add that files in plugin files.
Hi all So... I am creating an import script for putting contacts into a database. The script we had worked ok for 500kb / 20k row CSV files, but anything much bigger than that and it started to run into the max execution limit. Rather than alter this I wish to create something that will run in the background and work as efficiently as possible. So basically the CSV file is uploaded, then you choose if the duplicates should be ignored / overwritten, and you match up the fields in the CSV (by the first line being a field title row), to the fields in the database. The field for the email address is singled out as this is to be checked for duplicates that already exist in the system. It then saves these values, along with the filename, and puts it all into an import queue table, which is processed by a CRON job. Each batch of the CRON job will look in the queue, find the first import that is incomplete, then start work on that file from where it left off last. When the batch is complete it will update the row to give a pointer in the file for the next batch, and update how many contacts were imported / how many duplicates there were So far so good, but when checking for duplicity it is massively slowing down the script. I can run 1000 lines of the file in 0.04 seconds without checking, but with checking that increases to 14-15 seconds, and gets longer the more contacts are in the db. For every line it tries to import its doing a SELECT query on the contact table, and although I am not doing SELECT * its still adding up to a lot of DB activity. One thought was to load every email address in the contacts table into an array before hand, but this table could be massive so thats likely to be just as inefficient. Any ideas on optimising this process? Hi guys, I am currently receiving a large text file ( > 500mb), once per week which I have been manually splitting then processing to obtain the required CSV files. However, this is taking in the region of 2 to 3 hours. Very soon, these files will be sent daily and I really dont have the time to split and process this everyday I have been playing for a while to try and parse everything properly/automatically with fopen, feof and fgets ( and other 'f' options), but the script never seems to read the file all the way to the end - I assume this is due to memory usage. The data received in the file follows a strict pattern throughout the file which is: Code: [Select] BSNY990141112271112270100000 POO2C35 122354000 DMUS 075 O BX NTY LOLANCSTR 1132 11322 TB LIMORCMSJ 1135 00000000 LICRNFNJN 1140 00000000 H LICRNF 1141H1142H 11421142 T LISDAL 1147H1148H 11481148 T LIARNSIDE 1152H1153 11531153 T LIGOVS 1158 1159 11581159 T LIKTBK 1202 1202H 12021202 T LICARK 1206 1207 12061207 T LIULVRSTN 1214H1215H 12151215 T LIDALTON 1223 1223H 12231223 T LIDALTONJ 1225 00000000 LIROOSE 1229 1229H 12291229 T 2 LTBAROW 1237 12391 TF That is just one record of informaton (1 of around 140,000 records), each record has no fixed amount of lines but each line in each record is fixed to 80 characters and all lines in each record need to have the same unique 'id', at present, Im using an md5 hash of microtime. The first line of every record starts with 'BS' and the last line of each record starts with 'LT' terminating with 'TF'. All the other stuff between also follows a certain pattern of which I can break down effectively. The record above show one train service schedule, hence why each line in each record needs the same unique id. Anyone got any ideas on how I could process such a file effectively?? Many thanks Dave Hi All, Having issues uploading files larger than 1mb. This is what I have currently as default when I ran phpinfo() (working locally on my machine)... upload_max_filesize: 432M post_max_size: 432M memory_limit: 8M max_input_time: 60 max_execution_time: 30 I'm looking for the file to be converted into a blob, it works perfectly fine for files less than 1mb, but doesn't even run the mysql query above that. Any Ideas anyone? include("../../connect.php"); # these settings should help set_time_limit(0); # going in as a blob from now on $stamp = mktime(); $safename = $_FILES['Filedata']['tmp_name']; $filename = $_FILES['Filedata']['name']; $size = $_FILES['Filedata']['size']; $type = $_FILES['Filedata']['type']; $fk = $_REQUEST['fk']; $sqlname = $stamp . "-" . $_FILES['Filedata']['name']; # open and code in $fp = fopen($safename, 'r'); $content = fread($fp, filesize($safename)); $content = addslashes($content); fclose($fp); $insertS = "INSERT INTO $tableb (pal, afield, bfield, cfield, dfield, efield, ffield, ablob) VALUES ('6', '$fk', '$filename', '$size', '$type', '$width', '$height', '$content')"; $insertQ = mysql_query($insertS); print "1"; Hi, i am php programmer , i need help from php expert to create php apllication for large database . I have database table called "profiles" which contains millions(1.5 to 2 million) of profile of the business companies. This table has 10 fields and there is one field named as "bname" which is name of company , i made this column full-text index for full-text search . Now , i have to use this table for profile searching (using full-text search), profiles within particular cities , profiles within particular categories etc. This table contains millions of records so it will take lots of time for searching and fetching the reocrd(s) from this table. Can anybody help me that how can i manage this large table to improve the performance and fast searching with php ? Is there any other technique (algorithm) to manage large database (like facebook,twiiter,orkut)? |