[JS/PHP] How to scrape data off a page & insert it into DB?

Verahta
Posts: 440
Joined: Wed Aug 24, 2011 1:50 am

Post by Verahta »

Does anyone know how to make a simple JS script that will allow you to grab specific text off a web page and send it to a PHP file for inserting it into a database?

I'm trying to do a simple hobby project for a game I play, but I can't seem to get the specific info I need. This is a screenshot of the type of web page I want to grab text from:
http://i.imm.io/nlXM.png

I want to be able to load this page and, via JS or jQuery, grab that list of players and their ships and insert it into a MySQL database on my own personal website to track, organize and analyze.

This is an example of the source code on that page:

<TR><TD ALIGN=CENTER><IMG SRC=images/clear.gif HEIGHT=8 WIDTH=1 BORDER=0><BR>
Starships in Grid<BR>
<IMG SRC=images/clear.gif HEIGHT=5 WIDTH=1 BORDER=0><BR>
<TABLE CELLPADDING=2 CELLSPACING=0 WIDTH=100% BORDER=0>
<TR>
<TD BGCOLOR=#151515 WIDTH=16><IMG SRC=images/user/54413_mini1.gif WIDTH=16 HEIGHT=16 BORDER=0></TD>
<TD BGCOLOR=#151515 WIDTH=1><IMG SRC=images/clear.gif HEIGHT=1 WIDTH=1 BORDER=0></TD>
<TD BGCOLOR=#151515><FONT COLOR=#9D9DA1><A HREF='index.php?go=ship_info&ship_id=79623' ONMOUSEOVER="window.status=''; return true">Boreas</A> (<A HREF='index.php?go=class_info&class_id=14' ONMOUSEOVER="window.status=''; return true">BTR</A>)</FONT></TD>
<TD BGCOLOR=#151515 WIDTH=1><IMG SRC=images/clear.gif HEIGHT=1 WIDTH=1 BORDER=0></TD>
<TD BGCOLOR=#151515><FONT COLOR=#9D9DA1>owned by Cpt. <A HREF='index.php?go=user_info&user_id=54413' ONMOUSEOVER="window.status=''; return true">Mindless</A> of the <A HREF='index.php?go=faction_info&faction_id=464' ONMOUSEOVER="window.status=''; return true">Black Beach Alliance</A>
</FONT></TD>
<TD BGCOLOR=#151515 WIDTH=1><IMG SRC=images/clear.gif HEIGHT=1 WIDTH=1 BORDER=0></TD>
<TD BGCOLOR=#151515 ALIGN=RIGHT><FONT COLOR=#9D9DA1>Scan</FONT></TD>
</TR>
<TR>
<TD BGCOLOR=#101010 WIDTH=16><IMG SRC=images/faction/464_mini12.gif WIDTH=16 HEIGHT=16 BORDER=0></TD>
<TD BGCOLOR=#101010 WIDTH=1><IMG SRC=images/clear.gif HEIGHT=1 WIDTH=1 BORDER=0></TD>
<TD BGCOLOR=#101010><FONT COLOR=#9D9DA1><A HREF='index.php?go=ship_info&ship_id=75207' ONMOUSEOVER="window.status=''; return true">SCSY Ventura</A> (<A HREF='index.php?go=class_info&class_id=14' ONMOUSEOVER="window.status=''; return true">BTR</A>)</FONT></TD>
<TD BGCOLOR=#101010 WIDTH=1><IMG SRC=images/clear.gif HEIGHT=1 WIDTH=1 BORDER=0></TD>
<TD BGCOLOR=#101010><FONT COLOR=#9D9DA1>owned by Cpt. <A HREF='index.php?go=user_info&user_id=181' ONMOUSEOVER="window.status=''; return true">Andym88</A> of the <A HREF='index.php?go=faction_info&faction_id=464' ONMOUSEOVER="window.status=''; return true">Black Beach Alliance</A>
</FONT></TD>
<TD BGCOLOR=#101010 WIDTH=1><IMG SRC=images/clear.gif HEIGHT=1 WIDTH=1 BORDER=0></TD>
<TD BGCOLOR=#101010 ALIGN=RIGHT><FONT COLOR=#9D9DA1>Scan</FONT></TD>
</TR>
<TR>
<TD BGCOLOR=#151515 WIDTH=16><IMG SRC=images/faction/464_mini12.gif WIDTH=16 HEIGHT=16 BORDER=0></TD>
<TD BGCOLOR=#151515 WIDTH=1><IMG SRC=images/clear.gif HEIGHT=1 WIDTH=1 BORDER=0></TD>
<TD BGCOLOR=#151515><FONT COLOR=#9D9DA1><A HREF='index.php?go=ship_info&ship_id=68408' ONMOUSEOVER="window.status=''; return true">GAR Pulseing Heart</A> (<A HREF='index.php?go=class_info&class_id=14' ONMOUSEOVER="window.status=''; return true">BTR</A>)</FONT></TD>
<TD BGCOLOR=#151515 WIDTH=1><IMG SRC=images/clear.gif HEIGHT=1 WIDTH=1 BORDER=0></TD>
<TD BGCOLOR=#151515><FONT COLOR=#9D9DA1>owned by Cpt. <A HREF='index.php?go=user_info&user_id=57070' ONMOUSEOVER="window.status=''; return true">Remus Cross</A> of the <A HREF='index.php?go=faction_info&faction_id=464' ONMOUSEOVER="window.status=''; return true">Black Beach Alliance</A>
</FONT></TD>
<TD BGCOLOR=#151515 WIDTH=1><IMG SRC=images/clear.gif HEIGHT=1 WIDTH=1 BORDER=0></TD>
<TD BGCOLOR=#151515 ALIGN=RIGHT><FONT COLOR=#9D9DA1>Scan</FONT></TD>
</TR>

I don't want to alter the game pages, only collect specific info from them, such as captain name and ship info, and send it to a database for organization and analysis.

JS script (like one users can install from GreaseMonkey or similar) --> PHP file (on my own 'clan website') --> MySQL Database --> PHP display page for viewing the organized data.
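
The first step of that pipeline, pulling ship, class, captain and faction out of markup like the sample above, could be sketched roughly as follows. This is only an illustration: `parseShips` and `INFO_LINK` are made-up names, and it assumes every row keeps the `go=..._info&..._id=N` link pattern seen in the sample. In a userscript the input would probably come from something like `document.body.innerHTML`, which is why the pattern also accepts `&amp;`.

```javascript
// Hypothetical helper, not part of the game or of any library.
// Matches links such as
//   <A HREF='index.php?go=ship_info&ship_id=79623' ...>Boreas</A>
// and captures the link kind (ship, class, user, faction), the numeric id
// and the link text.
const INFO_LINK = /<a\s[^>]*?href=['"]?[^'">\s]*?go=(\w+?)_info&(?:amp;)?\w+?_id=(\d+)[^>]*>([^<]*)<\/a>/gi;

function parseShips(html) {
  const ships = [];
  let current = null;
  for (const [, kind, id, text] of html.matchAll(INFO_LINK)) {
    if (kind === 'ship') {
      // A ship link starts a new record; the links after it describe that ship.
      current = { shipId: Number(id), ship: text.trim() };
      ships.push(current);
    } else if (current) {
      current[kind] = text.trim();       // e.g. current.user = 'Mindless'
      current[kind + 'Id'] = Number(id); // e.g. current.userId = 54413
    }
  }
  return ships;
}

// One row from the sample source, shortened to the cells that carry links.
const sample = `
<TD><A HREF='index.php?go=ship_info&ship_id=79623' ONMOUSEOVER="window.status=''; return true">Boreas</A>
 (<A HREF='index.php?go=class_info&class_id=14' ONMOUSEOVER="window.status=''; return true">BTR</A>)</TD>
<TD>owned by Cpt. <A HREF='index.php?go=user_info&user_id=54413' ONMOUSEOVER="window.status=''; return true">Mindless</A>
 of the <A HREF='index.php?go=faction_info&faction_id=464' ONMOUSEOVER="window.status=''; return true">Black Beach Alliance</A></TD>`;

console.log(parseShips(sample));
```

A regular expression is used here only because the links follow one fixed pattern; walking the table with DOM methods would be the sturdier choice if the markup varies.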


Thank you if you can help me with this.
"In order to understand recursion, one must first understand recursion".
Jackolantern
Posts: 10891
Joined: Wed Jul 01, 2009 11:00 pm

Re: [JS/PHP] How to scrape data off a page & insert it into

Post by Jackolantern »

I don't know where JS really fits into it. You probably could use JS for some step in there. However, PHP can do this all itself, since this is primarily a server-side operation due to connecting to other websites (aka domains). You can retrieve other websites' content as a string variable, and then convert it to another format or work with it as text to get the info you need. Here is a nice little tutorial on using the file_get_contents() function, as well as more info on getting the content of other pages.
The indelible lord of tl;dr
Verahta
Posts: 440
Joined: Wed Aug 24, 2011 1:50 am

Re: [JS/PHP] How to scrape data off a page & insert it into

Post by Verahta »

Well, other factions/clans in the game have JavaScript scripts that automatically grab data off the page when a player enters the area and send it to a PHP file to be inserted into a database on the faction/clan website, so I was trying to figure out how to do the same thing since they won't share with me :lol:

It just seems like making a JS file that everyone in the faction can install on the pages, either via GreaseMonkey or Chrome, would make it easy. This is a game, so it requires you to log in and have sessions and everything. Is there no security risk with a PHP file accessing the game/website this way? Obviously I would not want to get banned if the game admin thought I was changing something or hacking or whatever. We are allowed to grab text off the pages, but obviously it's against the rules to alter the page (duh haha).

Is there any way you could grab the text off the page with JS and send it to PHP to be converted into the string?
Jackolantern
Posts: 10891
Joined: Wed Jul 01, 2009 11:00 pm

Re: [JS/PHP] How to scrape data off a page & insert it into

Post by Jackolantern »

Verahta wrote:Well, other factions/clans in the game have JavaScript scripts that automatically grab data off the page when a player enters the area and send it to a PHP file to be inserted into a database on the faction/clan website, so I was trying to figure out how to do the same thing since they won't share with me :lol:
That is done with AJAX, but it only works when the script runs on the same page as the data. They can pull data off their own page with JS and AJAX it back to their own server, where it is simply inserted into their database. That is not scraping, but rather simple AJAX data passing.
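
A minimal sketch of that kind of AJAX data passing, assuming the script and the collecting PHP file live on the same site. The `/collect.php` path and the `reportShips` name are made up for illustration, and `fetch()` is used in place of the jQuery or `XMLHttpRequest` calls a script of this era would more likely have used.

```javascript
// Hypothetical path on the same site as the page running this script.
const COLLECT_URL = '/collect.php';

// Serialise the collected rows as JSON and POST them to the server.
// `send` defaults to the standard fetch(); it is a parameter only so the
// sketch can be tried out without a network connection.
async function reportShips(ships, send = fetch) {
  const response = await send(COLLECT_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ ships }),
  });
  if (!response.ok) {
    throw new Error('collector replied with HTTP ' + response.status);
  }
  return response.status;
}

// Usage, from a page on the same site:
// reportShips([{ ship: 'Boreas', user: 'Mindless' }]).then(console.log);
```

On the PHP side, a JSON body like this is not placed in `$_POST`; it would be read from `php://input` and passed through `json_decode()` before the insert.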
Verahta wrote:It just seems like making a JS file that everyone in the faction can install on the pages, either via GreaseMonkey or Chrome, would make it easy. This is a game, so it requires you to log in and have sessions and everything. Is there no security risk with a PHP file accessing the game/website this way? Obviously I would not want to get banned if the game admin thought I was changing something or hacking or whatever. We are allowed to grab text off the pages, but obviously it's against the rules to alter the page (duh haha).
What access exactly do you have to the pages you want the info from? Do you want data off the page of everyone who is in a specific clan? Are you giving them the script to add to their own pages (assuming they have the access to alter the pages on the server)? If not, then you will have to scrap that plan. A script may be runnable through a JS console (like Firebug or Chrome Dev Tools) on a page one time (edit: and rereading your post, I think this is what you meant), but by default the same-origin policy would not allow you to AJAX the data anywhere but the original site host's server, not to mention that the script would only run one time while on that single page. Downloading the page, altering the JS on it and saving it could work for the JS side, but that would unhook you from the server side, thus probably negating any of the stats you wanted to grab. Unless of course you spoofed access to the server with your JS files intact, but that would most likely get you banned if they found out (many sites consider this "hacking" lol, and it is one of the main reasons JS is considered insecure). And even then the same-origin policy is going to be a pain, since you want to send the data to another server through AJAX.
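
To make the same-origin rule concrete: two URLs share an origin only when scheme, host and port all match. The sketch below shows just that comparison, not the browser's enforcement of it, and `game.example` and `clan.example` are placeholder hosts, not the real sites.

```javascript
// Two URLs are same-origin when scheme, host and port are all identical.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

// Placeholder hosts standing in for the game and the clan website.
const gamePage = 'http://game.example/index.php';

// Another page of the game: same origin, a plain AJAX call is allowed.
console.log(sameOrigin(gamePage, 'http://game.example/index.php?go=ship_info&ship_id=79623')); // true

// The clan site: different host, so a plain AJAX call is blocked by default.
console.log(sameOrigin(gamePage, 'http://clan.example/collect.php')); // false

// Same host over https: the scheme differs, so this is a different origin too.
console.log(sameOrigin(gamePage, 'https://game.example/index.php')); // false
```

The "by default" matters: a receiving server can opt in to cross-origin requests by sending CORS headers, so whether the clan site could accept such a request depends on how it is configured.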
Verahta wrote:Is there any way you could grab the text off the page with JS and send it to PHP to be converted into the string?
JS's role is pretty much covered above. It will likely be of little help unless you have direct access to alter the pages on the server you want the info from so you can add your JS, or unless you download the pages, alter the JS and spoof website access. When you download a page into PHP with the function I outlined in my previous response, the PHP behind that page will already have been executed, since that is how the HTML you are downloading was generated. However, the JS on the page will not be executed, because that happens client-side, and in this case the client is the PHP engine, which under normal circumstances does not execute JS. So your PHP text variable would contain the page's JS just as it appears in the browser's page source: either as an inline block of text or as a reference to a script file. You could then use the same PHP function to fetch the linked JS files if you also wanted them in text format.
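
As a rough illustration of that last point, this is how the script file references could be picked out of a page that has been downloaded as plain text. The sketch is in JavaScript rather than PHP, `findScriptSources` is a made-up name, and the file names in the example are placeholders, not the game's real files.

```javascript
// Collect the src value of every <script> tag in a page held as a string.
// Inline scripts (no src attribute) are skipped. The pattern accepts
// double-quoted, single-quoted and unquoted attribute values.
function findScriptSources(html) {
  const tag = /<script\b[^>]*?\bsrc=(?:"([^"]*)"|'([^']*)'|([^\s>]+))/gi;
  return Array.from(html.matchAll(tag), (m) => m[1] ?? m[2] ?? m[3]);
}

// Placeholder markup in the same style as the game's pages.
const page = `
<SCRIPT SRC=js/menu.js></SCRIPT>
<script type="text/javascript" src="js/grid.js"></script>
<script>var inlineOnly = 1;</script>`;

console.log(findScriptSources(page)); // [ 'js/menu.js', 'js/grid.js' ]
```

Each entry is only a link as written in the page, so a relative path would still need to be resolved against the page's URL before the file itself could be fetched.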