PHP - Working With Large Datasets
I have an application that takes some time to run, and I would like to take steps to improve its execution speed. Sample data is provided as JSON, shown below, where the values array has few columns and many rows. My desired outcome is three PHP objects, for mean_P51, mean_P55, and max_P56, which all hold a reference to the time array as well as their own values array.

{
  "name": "L2",
  "columns": ["time", "mean_P51", "mean_P55", "max_P56"],
  "values": [
    ["2020-06-13T14:02:02Z", 4.3527550826446255, 5.668302919254657, 0.6175362252066116],
    ["2020-06-13T14:02:12Z", 4.472219604166665, 5.493282520833331, 0.6095558604166668],
    ["2020-06-23T14:02:22Z", 4.332343173277662, 5.477678517745302, 0.6014520167014615],
    ...
    ["2020-06-23T14:02:22Z", 4.272219604166665, 5.468302919254657, 0.6195558604166668]
  ]
}

Originally, I thought it would be more efficient to iterate over the big values array once and process each column on each iteration (I envision myself walking one mile and snapping my fingers three times each foot, versus walking three miles and snapping my fingers once every foot). What I witness, however, is that it is faster to iterate over the big values array multiple times, once per object. I've done a few simple tests comparing big loops inside little loops to little loops inside big loops, and I get similar results. I guess this makes sense, since I am not really stopping when I snap my fingers, and if I did, walking three miles would likely be quicker. Is this expected behavior?

If there is too much data, I've needed to use a JSON stream parser and either a generator or an iterator. I haven't tested it yet, but I expect a generator would be more efficient than an iterator, and that using PHP's built-in json_decode() and an array would be more efficient than either. Do I need to test this hypothesis, or is it likely correct?

Are there any other general strategies one should take when working with large datasets?
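For reference, the two loop orders being compared can be sketched roughly as below. This is only an illustration under assumptions: the data is small enough to decode with json_decode(), the sample values are shortened, and the function names splitSinglePass() and splitPerColumn() are made up for this post. Since PHP arrays are copy-on-write, handing the same time array to three objects should not duplicate it unless one of them modifies it.

```php
<?php
// Hypothetical, shortened sample in the same shape as the JSON above.
$json = '{"name":"L2","columns":["time","mean_P51","mean_P55","max_P56"],'
      . '"values":[["2020-06-13T14:02:02Z",4.35,5.66,0.61],'
      . '["2020-06-13T14:02:12Z",4.47,5.49,0.60]]}';
$data = json_decode($json, true);

// Approach A: walk the rows once and fill every column on each row.
function splitSinglePass(array $data): array
{
    $out = array_fill_keys($data['columns'], []);
    foreach ($data['values'] as $row) {
        foreach ($data['columns'] as $i => $name) {
            $out[$name][] = $row[$i];
        }
    }
    return $out;
}

// Approach B: walk the rows once per column, using the built-in array_column().
function splitPerColumn(array $data): array
{
    $out = [];
    foreach ($data['columns'] as $i => $name) {
        $out[$name] = array_column($data['values'], $i);
    }
    return $out;
}

$a = splitSinglePass($data);
$b = splitPerColumn($data);
```

Both functions return the same structure, so wrapping each call in microtime(true) measurements on the real data would be one way to compare them.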
Thanks

Similar Tutorials

WordPress gave me the following error when uploading this ThemeForest theme:

failed opening '/srv/htdocs/wp-content/themes/QuickFixWebiste(Wordpress)/../../../wp-admin/includes/class-walker-nav-menu-edit.php' for inclusion (include_path='.:/usr/local/php7.2/lib/php') in /srv/htdocs/wp-content/themes/QuickFixWebiste(Wordpress)/library/core/admin/edit-menu-walker.php

Can anyone tell me how to resolve this? The source code in edit-menu-walker.php on lines 1-4 is:

<?php
if (!class_exists('Walker_Nav_Menu_Edit')) {
    include_once(get_template_directory() . '/../../../wp-admin/includes/class-walker-nav-menu-edit.php');
}
Hi all

So... I am creating an import script for putting contacts into a database. The script we had worked OK for 500 KB / 20k-row CSV files, but with anything much bigger it started to run into the max execution limit. Rather than alter this limit, I wish to create something that will run in the background and work as efficiently as possible.

Basically, the CSV file is uploaded, then you choose whether duplicates should be ignored or overwritten, and you match up the fields in the CSV (the first line being a field title row) to the fields in the database. The field for the email address is singled out, as this is to be checked for duplicates that already exist in the system. The script then saves these values, along with the filename, and puts it all into an import queue table, which is processed by a CRON job. Each batch of the CRON job will look in the queue, find the first import that is incomplete, then start work on that file from where it left off last. When the batch is complete, it will update the row to give a pointer in the file for the next batch, and update how many contacts were imported and how many duplicates there were.

So far so good, but checking for duplicates is massively slowing down the script. I can run 1000 lines of the file in 0.04 seconds without checking, but with checking that increases to 14-15 seconds, and it gets longer the more contacts are in the db. For every line it tries to import, it's doing a SELECT query on the contact table, and although I am not doing SELECT *, it's still adding up to a lot of DB activity. One thought was to load every email address in the contacts table into an array beforehand, but this table could be massive, so that's likely to be just as inefficient. Any ideas on optimising this process?

Hi guys, I am currently receiving a large text file (> 500 MB) once per week, which I have been manually splitting and then processing to obtain the required CSV files. However, this is taking in the region of 2 to 3 hours.
Very soon, these files will be sent daily, and I really don't have the time to split and process this every day. I have been playing for a while to try and parse everything properly and automatically with fopen, feof and fgets (and other 'f' functions), but the script never seems to read the file all the way to the end. I assume this is due to memory usage. The data received in the file follows a strict pattern throughout the file, which is:

Code:
BSNY990141112271112270100000 POO2C35 122354000 DMUS 075 O BX NTY LOLANCSTR 1132 11322 TB LIMORCMSJ 1135 00000000 LICRNFNJN 1140 00000000 H LICRNF 1141H1142H 11421142 T LISDAL 1147H1148H 11481148 T LIARNSIDE 1152H1153 11531153 T LIGOVS 1158 1159 11581159 T LIKTBK 1202 1202H 12021202 T LICARK 1206 1207 12061207 T LIULVRSTN 1214H1215H 12151215 T LIDALTON 1223 1223H 12231223 T LIDALTONJ 1225 00000000 LIROOSE 1229 1229H 12291229 T 2 LTBAROW 1237 12391 TF

That is just one record of information (1 of around 140,000 records). Each record has no fixed number of lines, but each line in each record is fixed to 80 characters, and all lines in each record need to have the same unique 'id'; at present, I'm using an md5 hash of microtime. The first line of every record starts with 'BS' and the last line of each record starts with 'LT', terminating with 'TF'. All the other stuff in between also follows a certain pattern, which I can break down effectively. The record above shows one train service schedule, hence why each line in each record needs the same unique id. Anyone got any ideas on how I could process such a file effectively?

Many thanks
Dave

Have a client who has a WordPress site that is running slowly and wants performance improved. It is averaging on the order of 7 seconds or so to render basic pages. No video, few images, etc. The database has about 1 million records. My primary question is, "When does WordPress hit a point that the db is too large and will slow down a site?"
I understand there is no definitive answer; I'm just trying to get a ballpark. It's noteworthy that about 700,000 of those records are in a table for the REDIRECTION plugin. I've deactivated that plugin to eliminate those records as a factor.
As an FYI to anyone with other ideas on how to speed the site up:
- I've deactivated all plugins, with no effect
- activated the Twentyfourteen theme, with no improvement (to eliminate theme issues)
- uploaded a PHP page outside of WordPress, and it rendered immediately.
I have a Flash application that talks to upload.php. Say I upload a 500 MB file; it will obviously take a little while to upload. Will the max_execution_time setting cause this to fail? It's set at 60 right now, and the upload is obviously taking longer than 1 minute.

I have a SOAP client written up that works well on small requests, but on large ones I face out-of-memory issues. I want to be able to write the response directly to a file as an XML document, so I can use something like XPath. When I try this, even on small responses, it seems to go into a loop:

$infile = $client->Retrieve($criteria);
$outfile = fopen('./sites/all/modules/cvent/data.txt', 'w');
while (!feof($infile)) {
    fwrite($outfile, fread($infile, 2048));
}

How can I put the data into a file straight from the response and still be economical on memory?

Hello All

This was an online test, which I didn't do very well with, and I am looking for guidance on where I went wrong. The idea is you'd have a matrix $A with values -1, 0, 1 in each position, with M rows and N columns, where M and N can be up to 1000. The code is evaluated $k times, where $k can be up to 1000 as well. Starting at $A[0][0], evaluate the matrix, where -1 means you go down a row and +1 means you go right one column. A 0 value means you continue along your previous direction.
After exiting a position, that position's value is multiplied by -1: 1 = (-1); (-1) = 1; 0 = 0. Once you exit the matrix, either on the right or below, you stop and start over. The goal of the code is to return how many times you exited the bottom right corner, ending up below (not to the right of) the matrix; that is, when your spot is [M], [N+1]. Here is the code I wrote:

Code:
function eval_matrix( $A, $k ) {
    $x = 0;
    $y = 0;
    // dir is x or y for up/down //
    $dir = "y";
    $ball_count = 0;
    // Get columns //
    $maxX = count($A);
    // Get rows //
    $maxY = count($A[0]);
    // for $k times
    for ($i = 0; $i < $k; $i++) {
        // while we're within the matrix boundaries
        while ($x < $maxX && $y < $maxY) {
            // get the value of our current position
            $mode = $A[$x][$y];
            // evaluate the value 0,1,-1
            switch ($mode) {
                case 0:
                    if ($dir == "y") { $y++; } else { $x++; }
                    break;
                case -1:
                    // Change position value
                    $A[$x][$y] = 1;
                    $y++;
                    $dir = "y";
                    break;
                case 1:
                    // Change position value
                    $A[$x][$y] = -1;
                    $x++;
                    $dir = "x";
                    break;
            }
        }
        if ($x == $maxX-1 && $y == $maxY) {
            $ball_count++;
        }
        $x = 0;
        $y = 0;
        $dir = "y";
    }
    return $ball_count;
}

It may be fairly ugly, but I'm looking for help with how to properly code this type of solution. Any pointers, resources, or suggestions are greatly appreciated!

Hey everybody, I have a script that I think should be working, but it's not... go figure. Here's the snippet that is causing an epic fail:

<?php
foreach ($Array as $key => $value) {
    $$value = $key;
}
$checkVal = 'someValue';
$output = isset($$checkVal) ? TRUE : FALSE;
?>

As you can see, it basically turns the value of each array element into a variable variable and then checks against an input word. If the input word matches the name of a set variable, we can then assume that word was in the array and return TRUE. Pretty straightforward, and I've tried about 3 different approaches to this, including in_array, and flipping the array and checking isset($array['value']). The array that is being checked against is usually upwards of 15000 elements.
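For reference, the flipped-array lookup mentioned above could look roughly like this. It is a minimal sketch, assuming every value in the array is a string or an integer (array_flip() only accepts those); $haystack and its sample values are made up for illustration.

```php
<?php
// Hypothetical sample data; the real array has upwards of 15000 elements.
$haystack = ['alpha', 'beta', 'someValue', 'gamma'];

// Flip once so each value becomes a key; key lookups go through the hash table
// instead of scanning the array the way in_array() does.
$lookup = array_flip($haystack);

$checkVal = 'someValue';
$output   = isset($lookup[$checkVal]);   // word is present
$missing  = isset($lookup['notThere']);  // word is absent
```

One caveat with variable variables: a value that is not a valid variable name, or that matches a variable that already exists in scope, can give misleading results, which the flipped array avoids.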
I would appreciate any knowledge that helps me understand the issues in searching large arrays and good ways to get around them, or, if it's just an error in my coding/logic, let me know! Thank you all in advance.
E

The script in question works perfectly on my WAMP installation. It is designed to help a computer-challenged historian publish to the web using text CSV files without her having to use FTP or edit HTML files. The largest data set is about 400 KB. The script uses TEXTAREA form input and uploads via POST. Smaller files upload okay. The larger ones fail with a blank screen (empty HTML), usually with only a few hundred bytes missing. Example: a 380 KB data POST fails with 367 KB stored on the remote host. No error message is saved to the remote host directory. I suspected suhosin.post.max_value_length, as it is set to 64K, but more than four times that amount is being stored on the remote host. Can suhosin.post.max_value_length still be the problem?

The remote host is running:
PHP 5.2.5
Apache 2.2.11
Linux O/S

PHPINFO():
max_execution_time - 30
max_input_time - 60
memory_limit - 64M
post_max_size - 16M
upload_max_filesize - 16M
suhosin.post.max_value_length - 65384

Any suggestions much appreciated.

Hi guys
Been a while since I've been on here, completely lost my login details for my old account which is a shame because it was a well established one but hey ho that's life.
Anyhow, I'm starting a project as of tomorrow and I was wondering if I could get some advice on the best methods/routes to take.
The Brief
+ 3 full websites
+ no single admin
+ one main admin
The Plan
Once all 3 websites are complete, none of them will have the standard admin panel that the average website has. All 3 websites (this number is going to grow in time too) will have one central admin, held on a separate and isolated server, which in essence will *remotely* administrate the websites I connect to it.
The Question
I have never done a project with multiple websites that run from a central admin. My questions are if anyone could shine some light....
Should I make the admin panel audit the website selected by sending commands through api, crud, rest or simply by direct db access?
What is the most secure way of doing this central admin (bear in mind the admin panel will have a minimum of 10 different admin levels/permission sets)?
How should I have the database layout/hierarchy for this (3 completely separate site - again this will grow + a central admin site)?
Last question (that I can think of) - how much of a mammoth task really is this?
Any help would be greatly appreciated!
Thanks guys
James
Edited by jamesmp, 09 July 2014 - 06:12 PM.

Hey Guys, I need a solution for uploading very large files. As I found, PHP has some memory limits. Is it even possible to upload files with a size of 4 GB?

Hi All, I'm having issues uploading files larger than 1 MB. This is what I currently have as default when I run phpinfo() (working locally on my machine):

upload_max_filesize: 432M
post_max_size: 432M
memory_limit: 8M
max_input_time: 60
max_execution_time: 30

I'm looking for the file to be converted into a blob. It works perfectly fine for files less than 1 MB, but doesn't even run the MySQL query above that. Any ideas, anyone?

include("../../connect.php");
# these settings should help
set_time_limit(0);
# going in as a blob from now on
$stamp = mktime();
$safename = $_FILES['Filedata']['tmp_name'];
$filename = $_FILES['Filedata']['name'];
$size = $_FILES['Filedata']['size'];
$type = $_FILES['Filedata']['type'];
$fk = $_REQUEST['fk'];
$sqlname = $stamp . "-" . $_FILES['Filedata']['name'];
# open and code in
$fp = fopen($safename, 'r');
$content = fread($fp, filesize($safename));
$content = addslashes($content);
fclose($fp);
$insertS = "INSERT INTO $tableb (pal, afield, bfield, cfield, dfield, efield, ffield, ablob) VALUES ('6', '$fk', '$filename', '$size', '$type', '$width', '$height', '$content')";
$insertQ = mysql_query($insertS);
print "1";

Hi Guys, there's always been one issue that I just cannot overcome (maybe because I keep pushing it to the back burner). Whenever I debug other people's code, I do the usual var_dump() or print_r() of vars, which (sometimes) displays a monolithic array. I just find it very intimidating and can never really figure out what's going on. Is anyone else like this? Any good reads on how to 'decode' such large arrays?

Hi, I am a PHP programmer, and I need help from a PHP expert to create a PHP application for a large database. I have a database table called "profiles" which contains millions (1.5 to 2 million) of profiles of business companies.
This table has 10 fields, and there is one field named "bname", which is the name of the company. I gave this column a full-text index for full-text search. Now I have to use this table for profile searching (using full-text search), for profiles within particular cities, for profiles within particular categories, etc. Since the table contains millions of records, searching and fetching records from it will take a lot of time. Can anybody tell me how I can manage this large table to improve performance and get fast searching with PHP? Is there any other technique (algorithm) for managing a large database (like Facebook, Twitter, Orkut)?

I need to process a large CSV file (40 MB, 300,000 rows). While I have been working with smaller files, my existing code is not able to work with large files. All I need to do is read a particular column from the file, then count the total number of rows and add up all the values from that column. My existing piece of code imports the whole CSV file into an array (a class is used) and then, using a foreach loop, reads the required column and its values into another array. Once the data is in this array, I can simply sum or count it. While this served me well for smaller files, I am not able to use this approach to read a larger CSV file.
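One approach that avoids holding the whole file in memory is to read it row by row with fgetcsv() and keep only a running count and sum. The sketch below is an illustration under assumptions: a small temporary file stands in for the real 40 MB CSV, the first row is a header, and the SALARY column name is taken from the code in this post.

```php
<?php
// Hypothetical sample file standing in for the real CSV.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "NAME,SALARY\nAnn,1000\nBob,2500.5\nCid,300\n");

$handle = fopen($path, 'r');

// Read the header row and find the position of the column to total.
$header = fgetcsv($handle, 0, ',', '"', '\\');
$column = array_search('SALARY', $header, true);

$totalRows = 0;
$totalSum  = 0.0;

// Only one row is held in memory at a time.
while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
    if ($row === [null]) {
        continue; // fgetcsv() returns [null] for a blank line
    }
    $totalRows++;
    $totalSum += (float) $row[$column];
}

fclose($handle);
unlink($path);

echo "Total # of rows: $totalRows\n";
echo "Total Sum: $totalSum\n";
```

Because nothing is accumulated per row, memory use should stay roughly flat regardless of file size.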
I have already increased the memory allocated to PHP and max_execution_time, but the script just keeps on running. I am no PHP expert but usually get around with trial and error... your thoughts and help will be greatly appreciated.

Existing code (the class used is freely available and known as 'parsecsv', available at http://code.google.com/p/parsecsv-for-php/). After calling the class and processing the CSV file:

<?php
ini_set('max_execution_time', 3000);
$init = array(0); // Initialize a dummy array, which holds value '0'
foreach ($csv->data as $key => $col): // 'data' is an array processed by the class and holds the CSV data
    $ColValue = $col['SALARY']; // retrieves the column you want
    {
        $SAL = $col['SALARY']; // Column that you want to process from csv
        array_push($init, $SAL); // Push value into dummy array created above
        echo "<pre>";
    }
endforeach;
$total_rows = (count($init) - 1); // Count total number of values, '-1' to remove the first initialized value in array
echo "Total # of rows: " . $total_rows . "\n";
echo "Total Sum: " . array_sum($init) . "\n";
?>

----------------------------------------

I need to parse large XML files ranging in size from ~ 500 to ~ 1700 Mb.
I use XMLReader:

Code:
set_time_limit(0);
$start_time = microtime(true);
include_once 'inc/Misc.php';
include_once 'inc/Database.php';
$files = array('xml/large_file.xml');
foreach ($files as $file) {
    echo "\n";
    echo 'Filename: '.basename($file)."\n";
    echo 'Filesize: '.convert(filesize($file))."\n";
    echo 'Start parsing...'."\n";
    echo "\n";
    $reader = new XMLReader();
    $reader->open($file);
    while ($reader->read()) {
        switch ($reader->nodeType) {
            case (XMLReader::ELEMENT):
                if ($reader->localName == "element-name") {
                    $dom = new DomDocument();
                    $n = $dom->importNode($reader->expand(), true);
                    $dom->appendChild($n);
                    $sxe = simplexml_import_dom($n);
                    $tess->file_big->insert($sxe);
                    echo "Insert done! ";
                    benchmark();
                }
        }
    }
}

Everything is fine in the beginning: the file is parsed and my data is slowly inserted, but memory consumption gradually grows until resources run out. For example, I took a 400 Mb file, and while parsing it the script used 2000 Mb of RAM, exhausted all resources, and stopped. How should I deal with large files of ~ 500 to ~ 1700 Mb? Would XML Parser help? If so, how would I apply it to my problem? Are there other options?

Hello, I'm trying to find a way to check around 500-600 links to see if they are alive. It works fine for 5-6 links, but once I add more links it just times out. Is there a way I could process this so it does 1 link at a time or something?

<?php
include("config.php");
$query = "SELECT * FROM `games` WHERE `r_fileserve` <> \"\" LIMIT 500";
$result = mysql_query($query);
while ($row = mysql_fetch_assoc($result)) {
    $link_str = file_get_contents("$row[r_fileserve]");
    $pattern = '<input type="hidden" name="download" value="normal"/>';
    preg_match($pattern, $link_str, $match);
    if ($match[0] != null) {
        echo "Working <br />";
    } else {
        echo "File Down <br />";
    }
}
?>

I want to allow a user to upload any photo that they might have taken with their camera. I can't get photos with large file sizes to upload.
I have changed the settings in php5.ini and set them extremely high. This has always worked for me before. I have also changed the code on the form:

<input type="hidden" name="MAX_FILE_SIZE" value="99000000" />

Here is the content of php5.ini:

register_globals = on
allow_url_fopen = on
expose_php = Off
max_input_time = 500
variables_order = "EGPCS"
extension_dir = ./
upload_tmp_dir = /tmp
precision = 12
SMTP = relay-hosting.secureserver.net
url_rewriter.tags = "a=href,area=href,frame=src,input=src,form=,fieldset="
[Zend]
zend_extension=/usr/local/zo/ZendExtensionManager.so
zend_extension=/usr/local/zo/4_3/ZendOptimizer.so
register_long_arrays = on
max_file_uploads = 8M
post_max_size = 8M

Maybe the problem is not in php5.ini?

I am using Guzzle as an HTTP client, and the following script results in the following error:

$response = $this->httpClient->request('GET', "http://$this->host:$this->port/query", ['query' => $data]);
$body = $response->getBody();
$rs=json_decode($body, true);

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 136956008 bytes) in /var/www/vendor/guzzlehttp/psr7/src/Stream.php on line 80

What are the workarounds? Instead of trying to convert it into an array all at once, how can I do so in pieces? I know the expected format, so it seems I will need to read just the appropriate number of bytes and then decode parts at a time. Seems like a pain. Are there any classes designed to do so, or can any of the following Guzzle built-in methods be used?
Thanks

Guzzle Response methods: __construct getStatusCode getReasonPhrase withStatus getProtocolVersion withProtocolVersion getHeaders hasHeader getHeader getHeaderLine withHeader withAddedHeader withoutHeader getBody withBody

Guzzle Body methods: __construct __destruct __toString getContents close detach getSize isReadable isWritable isSeekable eof tell rewind seek read write getMetadata

<!doctype html>
<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="content-type">
<title>Untitled Document</title>
</head>
<body>
<form action="upload_image.php" method="post" enctype="multipart/form-data">
<label>select page<input type="file" name="image"> <input type="submit" name="upload"> </label>
</form>
<?php
if (isset($_POST['upload'])) {
    $image_name = $_FILES['image']['name'];         // name of the image
    $image_type = $_FILES['image']['type'];         // MIME type of the image
    $image_size = $_FILES['image']['size'];         // size of the image
    $image_tmp_name = $_FILES['image']['tmp_name']; // temporary path of the uploaded file
    if ($image_name == '') {
        echo '<script type="text/javascript">alert("please select image")</script>';
        exit();
    } else {
        $ex = move_uploaded_file($image_tmp_name, "image/".$image_name);
        if ($ex) {
            echo 'image upload done <br>';
            echo $image_name.'<br>';
            echo $image_size.'<br>';
            echo $image_type.'<br>';
            echo $image_tmp_name.'<br>';
        } else {
            echo 'error';
        }
    }
}
?>
</body>
</html>

I made this simple script for uploading files such as photos. It works correctly, but it does not upload large files, e.g. 8 MB or 12 MB images.

Edited by Ch0cu3r, 04 October 2014 - 09:50 AM. Modified topic title - changed download to upload
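When large uploads fail like this, the limits that usually matter are upload_max_filesize and post_max_size. The sketch below reports the smaller of the two as PHP currently sees it. It is only an illustration: iniToBytes() is a helper written for this post, not a PHP built-in, and it does not treat a post_max_size of 0 (no limit) specially.

```php
<?php
// Hypothetical helper: convert a php.ini shorthand size such as "8M" into bytes.
function iniToBytes(string $value): int
{
    $value  = trim($value);
    $number = (int) $value;                   // leading digits, e.g. 8 for "8M"
    $unit   = strtoupper(substr($value, -1)); // trailing unit letter, if any

    switch ($unit) {
        case 'G':
            return $number * 1024 ** 3;
        case 'M':
            return $number * 1024 ** 2;
        case 'K':
            return $number * 1024;
        default:
            return $number;                   // plain byte count
    }
}

// The effective ceiling for one uploaded file is the smaller of the two settings.
$limit = min(
    iniToBytes((string) ini_get('upload_max_filesize')),
    iniToBytes((string) ini_get('post_max_size'))
);
echo "Largest upload PHP will currently accept: about $limit bytes\n";

// In the upload handler, the error code says why a file was rejected.
if (isset($_FILES['image']) && $_FILES['image']['error'] === UPLOAD_ERR_INI_SIZE) {
    echo "The file exceeds upload_max_filesize\n";
}
```

Note that when a request is larger than post_max_size, PHP leaves both $_POST and $_FILES empty, so a check such as isset($_POST['upload']) fails without printing any error.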