Mining Google Web Services: Building Applications with the Google API
Google doesn't require you to refresh your data. You can store the links you retrieve from Google as long as needed. In fact, you can build link histories if you want to provide a basis for analysis. However, you'll eventually need to refresh the data you receive from Google because the links will become outdated. The technique you use to refresh the data depends on how you're using the data within your application. A research firm that deals with relatively stable data might not need to refresh its links as often as someone who works with computers for a living. The stability of the data makes a difference in the technique used to refresh the data, as well as the refresh interval.
All of the examples in this book take a dynamic approach. In response to a user request, the application checks the date on which it last retrieved the data in its database from Google Web Services. If the data is too old, the application requests the information from Google Web Services again. Unfortunately, this means that some users will observe an inconsistent delay in responses. You could also build a database of links and refresh those links every night at a convenient time, when no one is likely to need the information. The idea is to refresh the data at a convenient interval using the technique that best suits your organization.
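The dynamic approach can be sketched in a few lines of Python. This is only an illustration of the idea, not the code used by the examples in this book: the `LinkCache` class, the `fetch` callable that stands in for a call to Google Web Services, and the `max_age` setting are all hypothetical names chosen for this sketch.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Tuple


class LinkCache:
    """Stores links per query and refreshes them only when they are too old."""

    def __init__(
        self,
        fetch: Callable[[str], List[str]],   # hypothetical stand-in for the Google request
        max_age: timedelta,                  # how old stored links may become
        clock: Callable[[], datetime] = datetime.now,
    ) -> None:
        self._fetch = fetch
        self._max_age = max_age
        self._clock = clock
        # Maps a query to (time of last retrieval, links retrieved).
        self._entries: Dict[str, Tuple[datetime, List[str]]] = {}

    def get(self, query: str) -> List[str]:
        now = self._clock()
        entry = self._entries.get(query)
        # Serve the stored links while they are still fresh enough.
        if entry is not None and now - entry[0] <= self._max_age:
            return entry[1]
        # Otherwise pay the delay of a new request and record when it happened.
        links = self._fetch(query)
        self._entries[query] = (now, links)
        return links
```

A caller would construct the cache once, for example `LinkCache(fetch=my_search, max_age=timedelta(days=7))`, and call `get()` for each user request. Only the request that finds stale data pays the cost of contacting the service, which is the source of the inconsistent delay mentioned above. A nightly refresh would instead loop over the stored queries on a schedule.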
Make sure you keep up with current Google policy regarding offline data storage. Although the licensing agreement doesn't currently require you to update your data, a future agreement might add this requirement. The reason that I mention this particular potential change is that most Web services do require you to update your data at regular intervals to ensure your application accurately represents that Web service.
Sometimes you need to consider the source of Google data. None of the links that Google provides are based on Google data; they point to pages that other people create and maintain. You can find more than a few books on the market that purport to help you change your Web page characteristics to make the Google search engine work in your favor. In a few cases, Web sites include search terms that have nothing to do with the content of their sites. Consequently, a link that looks like it has good information might not have any useful data. For this reason, you'll want to maintain a personal database of good links when accuracy is essential and validate that database against Google at reasonable intervals.
You can't extend permanent storage to volatile information such as precise page content unless you work in an area of research where the data remains relatively stable. A Web page owner can update and modify data at any point, which would make your locally stored cache inaccurate. When your business depends on the accuracy of the data you derive from Internet sites, you need to keep that data updated to reflect changes in technology. As people learn new facts, they'll update their site to reflect these changes and you need to keep apprised of them. However, a local cache does afford you the opportunity to compare the page with your cached page to see what changed, reducing the time you spend researching new information.
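As a rough illustration of comparing a page with its cached copy, the following Python sketch uses the standard `difflib` module to list only the lines that differ. The function name `page_changes` is hypothetical, and the sketch assumes you already hold both versions of the page as plain text; how you obtain and store them depends on your application.

```python
import difflib
from typing import List


def page_changes(cached_text: str, current_text: str) -> List[str]:
    """Return the lines removed from ("- ") or added to ("+ ") the cached page."""
    comparison = difflib.ndiff(
        cached_text.splitlines(),
        current_text.splitlines(),
    )
    # ndiff also emits unchanged lines ("  ") and hint lines ("? ");
    # keep only the lines that actually differ between the two versions.
    return [line for line in comparison if line.startswith(("- ", "+ "))]
```

An empty result means the page hasn't changed since you cached it, so nothing needs review. Otherwise, you only have to read the listed lines rather than the whole page.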