Webalizer tools merge

It's kind of a basic question for you, but it's important to me. One of my colleagues was working on my Linux-based web server, and he has since left the job. If so, where is it creating the log file for hits and views? I don't mean the HTML output, but the file that is used to generate the output.

Defines the maximum number of lines in the HTTP error report. If the number of actual errors is greater than this value, the rest of the errors will either be discarded or generated as a separate HTTP error report, depending on the value of AllErrors.

These keywords allow you to hide user agents, referrers, hosts, URLs and usernames from the various "Top" tables.

The values for these keywords are the same as those used in their command line counterparts. You can specify as many of these as you want, without limit. Refer to the section above on "Command Line Options" for a description of the string formatting used as the value. Values cannot exceed 80 characters in length.

This allows specified user agents to be hidden from the "Top User Agents" table.

This is not very useful, since browsers go by a zillion different names today, but it could be useful if there is a particular user agent (e.g., robots, spiders, real-audio) that you would rather not see. This keyword is useless if (1) your log file does not provide user agent information or (2) you disable the user agent table.
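The text above does not spell out the string formatting used by hide values. The following minimal sketch assumes three rules. A plain value matches anywhere in the field. A trailing asterisk means "starts with", which is how the SpamReferrer examples later in this document behave. A leading asterisk means "ends with", which is an assumption. The helper names `matches` and `hidden` are hypothetical and do not come from Webalizer's source.

```python
# Sketch only: the matching rules are assumptions, not Webalizer source code.
MAX_VALUE_LEN = 80  # configuration values cannot exceed 80 characters


def matches(pattern: str, value: str) -> bool:
    """Return True if a hide/ignore value matches a log field (assumed rules).

    'abc*' -> the field must start with 'abc'
    '*abc' -> the field must end with 'abc'
    'abc'  -> 'abc' may appear anywhere in the field
    """
    if len(pattern) > MAX_VALUE_LEN:
        raise ValueError("configuration values cannot exceed 80 characters")
    if pattern.endswith("*"):
        return value.startswith(pattern[:-1])
    if pattern.startswith("*"):
        return value.endswith(pattern[1:])
    return pattern in value


def hidden(hide_values: list[str], value: str) -> bool:
    """An item is left out of a "Top" table if any hide value matches it."""
    return any(matches(p, value) for p in hide_values)


hide_agents = ["Googlebot", "bingbot*"]  # hypothetical hide values
print(hidden(hide_agents, "Mozilla/5.0 (compatible; Googlebot/2.1)"))  # True
print(hidden(hide_agents, "Mozilla/5.0 bingbot/2.0"))  # False: not a prefix
```

Hiding only affects what is displayed. The record is still counted in the totals, unlike the ignore keywords described later.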

This allows you to hide specified referrers from the "Top Referrers" table. Normally, you would only specify your own web server to be hidden, as it is usually the top generator of references to your own pages.

Of course, this keyword is useless if (1) your log file does not include referrer information or (2) you disable the top referrers table.

This allows you to hide specified hosts from the "Top Hosts" table. Normally, you would only specify your own web server or other local machines to be hidden, as they are usually the highest hitters of your web site, especially if your browsers' home pages point to it.

This allows hiding all individual hosts from the display, which can be useful when a lot of groupings are being used, since grouped records cannot be hidden. It is particularly useful in conjunction with the GroupDomains feature, but can be useful in other situations as well. Value can be either yes or no, with no being the default.

Normally, this is used to hide items such as graphic files, audio files or other non-HTML files that are transferred to the visiting user.

This allows you to hide usernames from the "Top Usernames" table. Usernames are only available if you use HTTP-based authentication on your web server.

If set to yes, this option allows you to hide all robots from the Top Hosts and Top Agents reports.

Robot groups, if there are any, will still be displayed in the Top Agents report. Use the Robot configuration parameter to identify robots.

Controls whether items that are currently being grouped with one of the grouping configuration variables are also added to the corresponding hide item list or not. This configuration value may be used multiple times between multiple sets of group configuration values.

For example, in this configuration Firefox, Chrome and Opera user agents will be reported both in a group and individually, but Internet Explorer will be reported only as a group. Group processing is only done after the individual record has been fully processed, so name mangling and site total updates have already been performed.

Because of this, groups are not counted in the main site total as that would cause duplication. Groups can be displayed in bold and shaded as well. Grouped records are not, by default, hidden from the report. This allows you to display a grouped total, while still being able to see the individual records, even if they are part of the group. There are no command line switches for these keywords. This label should be separated from the value by at least one whitespace character, such as a space or tab character.

See the sample.

GroupReferrer allows grouping Referrers. This can be handy for some of the major search engines that have multiple host names a referral could come from.

This keyword allows grouping Sites. It is mostly used for grouping top-level domains and unresolved IP addresses for local dial-ups, etc.

Groups User Agents. Make sure you put Edge first, because it is based on Chrome and also lists Chrome in its user agent string.
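The Edge-before-Chrome advice implies that a record joins the first group whose name it contains. The minimal sketch below assumes exactly that and uses plain substring matching. The line format `GroupAgent name [Label]` is assumed by analogy with the other group keywords; the user agent keyword name is elided in the text above. The helpers `parse_group_entry` and `group_of` are hypothetical.

```python
from typing import Optional


def parse_group_entry(line: str) -> tuple[str, str]:
    """Split a 'GroupAgent name [Label]' style line into (name, label).

    Assumption: the label is everything after the name, separated by
    whitespace; without a label, the name doubles as the label.
    """
    _keyword, rest = line.split(None, 1)
    parts = rest.split(None, 1)
    name = parts[0]
    label = parts[1].strip() if len(parts) > 1 else name
    return name, label


def group_of(groups: list[tuple[str, str]], agent: str) -> Optional[str]:
    """Return the label of the first group whose name occurs in the agent."""
    for name, label in groups:
        if name in agent:
            return label
    return None


# A legacy Edge user agent string: it contains both "Chrome" and "Edge".
EDGE_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
           "(KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.17763")

edge_first = [parse_group_entry(entry) for entry in
              ["GroupAgent Edge Microsoft Edge", "GroupAgent Chrome Google Chrome"]]
chrome_first = list(reversed(edge_first))
print(group_of(edge_first, EDGE_UA))    # Microsoft Edge
print(group_of(chrome_first, EDGE_UA))  # Google Chrome (the wrong group)
```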

Allows automatic grouping of domains. The numeric value represents the level of grouping, and can be thought of as 'the number of dots' to display.

A value of 1 will display second-level domains only xxx. The default value of 0 disables any domain grouping.

Allows grouping of usernames. Combined with a group name, this can be handy for displaying statistics on a particular group of users without displaying their real usernames.
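The automatic domain grouping level described above ("the number of dots" to display) can be sketched as follows. The sketch assumes that level *n* keeps the last *n* + 1 labels of a host name. It also assumes that unresolved numeric addresses are left ungrouped. `domain_group` is a hypothetical helper, not a Webalizer function.

```python
import ipaddress
from typing import Optional


def domain_group(host: str, level: int) -> Optional[str]:
    """Group a host name by domain, keeping `level` dots (assumed semantics).

    level 1 -> 'example.com', level 2 -> 'sales.example.com'; 0 disables grouping.
    """
    if level <= 0:
        return None  # the default value of 0 disables domain grouping
    try:
        ipaddress.ip_address(host)
        return None  # unresolved IP address: nothing to group by (assumption)
    except ValueError:
        pass
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-(level + 1):])


print(domain_group("www.sales.example.com", 1))  # example.com
print(domain_group("www.sales.example.com", 2))  # sales.example.com
```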

Allows shading of table rows for groups. Value can be yes or no, with the default being yes.

GroupHighlight allows bolding of table rows for groups.

If set to yes, this option instructs Stone Steps Webalizer to group automated user agents (robots) in the Top Agents report.

Each group will be assigned the CSS class robot to distinguish it from non-robot user agents.

These keywords allow you to completely ignore log records when generating statistics, or to force their inclusion regardless of ignore criteria. Records can be ignored or included based on host, URL, user agent, referrer and username.

Be aware that by choosing to ignore records, the accuracy of the generated statistics becomes skewed, making it impossible to produce an accurate representation of the load on the web server. These keywords do not have any command line switch counterparts, so they may only be specified in a configuration file.
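The ignore/include decision can be sketched as a small predicate. Include entries win over ignore entries, which matches the "Takes precedence over Ignore keywords" note in the keyword reference. Plain substring matching and the field names (`host`, `url`) are assumptions made for illustration.

```python
def should_process(record: dict[str, str],
                   ignore: dict[str, list[str]],
                   include: dict[str, list[str]]) -> bool:
    """Decide whether a log record is counted (assumed semantics).

    Include* entries take precedence over Ignore* entries; plain substring
    matching is assumed for simplicity.
    """
    def any_match(rules: dict[str, list[str]]) -> bool:
        return any(p in record.get(field, "")
                   for field, patterns in rules.items()
                   for p in patterns)

    if any_match(include):
        return True
    return not any_match(ignore)


ignore = {"url": ["/dev/"]}          # e.g. an IgnoreURL-style entry
include = {"host": ["203.0.113.5"]}  # e.g. an IncludeSite-style entry
print(should_process({"host": "203.0.113.5", "url": "/dev/a.html"}, ignore, include))  # True
```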

Try grep'ing the records into a separate file and processing that file instead.

This allows specified URLs to be completely ignored from the generated statistics. One use for this keyword would be to ignore all hits to a temporary directory where development work is being done and which is not accessible to the outside world. Unlike other ignore keywords, IgnoreURL can take optional search argument names and values.

If multiple IgnoreURL values are used, they must follow one another in the configuration file or those that are out of order will be ignored. The entire search argument name is matched, not a part of it, so abc will only match abc and not abcd.

For example, if you would like to ignore index. IgnoreURL may also include search argument values, in which case both the name and the value must match for a log line to be ignored. For example, if the following IgnoreURL entry is used in the configuration file: Note that search argument filtering is done before the ignore logic is applied, so if you filtered out the argument that is used in one of the IgnoreURL entries, log records containing excluded search arguments will not be ignored.
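The IgnoreURL search-argument rules above can be sketched as follows. The whole argument name must match, and when a value is given, the value must match too. The rule representation (an exact `path` plus a dictionary of argument names to optional values) is a hypothetical stand-in for the real configuration syntax, which is not shown here. Wildcard path patterns are also left out.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlsplit


def url_ignored(url: str, path: str, args: dict[str, Optional[str]]) -> bool:
    """Hypothetical IgnoreURL check with optional search arguments.

    `path` must equal the URL path. With no `args`, the path alone decides.
    Otherwise the record is ignored when any search argument name matches in
    full and, if a value is configured (not None), the value matches too.
    """
    parts = urlsplit(url)
    if parts.path != path:
        return False
    if not args:
        return True
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        if name in args and (args[name] is None or args[name] == value):
            return True
    return False


print(url_ignored("/index.php?abcd=1", "/index.php", {"abc": None}))  # False: abc != abcd
```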

This allows specified User Agent records to be completely ignored from the statistics. This may be useful if you really don't want to see all those hits from MSIE. :)

This allows specified username records to be completely ignored from the statistics. Usernames can only be used if you use HTTP authentication on your server.

Force the record to be processed based on hostname. Force the record to be processed based on URL. IncludeReferrer forces the record to be processed based on referrer.

Force the record to be processed based on user agent. Force the record to be processed based on username. Usernames are only available if you use HTTP-based authentication on your server.

If set to yes, forces all records submitted by a robot user agent to be completely ignored.

The file is a standard tab-delimited text file, meaning that each column is separated by a tab (0x09) character. A header record may be included if required, using the DumpHeader keyword.
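The dump format described above (tab-separated columns, with an optional header record) can be sketched with Python's `csv` module. The column names are hypothetical. Python's writer quotes a field only if it contains a tab, a double quote or a newline; the text does not say how Webalizer escapes such values.

```python
import csv
import io


def dump_tab(columns: list[str], rows: list[list[object]], header: bool = False) -> str:
    """Render rows as a tab-delimited dump, optionally with a header record."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    if header:
        writer.writerow(columns)  # column names as the first record
    writer.writerows(rows)
    return buf.getvalue()


columns = ["hits", "files", "kbytes", "host"]  # hypothetical column names
rows = [[120, 95, 2048, "www.example.com"], [37, 30, 512, "192.0.2.10"]]
print(dump_tab(columns, rows, header=True), end="")
```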

Since these files contain all records that have been processed, including normally hidden records, an alternate location for the files can be specified using the DumpPath keyword, otherwise they will be located in the default output directory. Specifies an alternate location for the dump files.

The default output location will be used otherwise. Allows the dump filename extensions to be specified. The default extension is tab, but it may be changed with this option. Allows a header record to be written as the first record of the file. Value can be either yes or no, with the default being no. Dump tab-delimited hosts file.

Dump tab-delimited URL file. Dump tab-delimited referrer file. Referrer information is only available if present in the log file (i.e., a combined web server log). Dump tab-delimited user agent file. User agent information is only available if present in the log file (i.e., a combined web server log).

Dump tab-delimited username file. The username data is only available if HTTP authentication is used on the web server and that information is present in the log. Dump tab-delimited search string file. If this configuration parameter is set to yes, Stone Steps Webalizer will generate a tab-delimited file listing all downloads for the current month. If this configuration parameter is set to yes, Stone Steps Webalizer will generate a tab-delimited file listing all HTTP errors for the current month.

Generate a tab-delimited data file for all countries. Generate a tab-delimited data file for all cities. Generate a tab-delimited data file for all ASN entries.

These keywords allow you to customize the HTML code that The Webalizer produces, such as adding a corporate logo or links to other web pages. You can specify as many of these keywords as you like, and they will be used in the order that they are found in the file.

Values cannot exceed 80 characters in length, so you may have to break long lines up into two or more lines. There are no command line counterparts to these keywords. Allows generated pages to use something other than the default html extension for the filenames. Allows code to be inserted at the very beginning of the HTML files.

Use it for server-side scripting capabilities, such as php3, to insert scripting files and other directives. There is no default. Keep in mind the placement of this code in relation to the title and other aspects of the web page.

A typical use is to add a corporate logo graphic in the top right. Normally this keyword isn't needed, but is provided in case you included a large graphic or some other weird formatting tag in the HTMLHead section that needs to be cleaned up or terminated before the main report section.

This keyword defines HTML code that is placed at the bottom of the report. Normally this keyword is used to provide a link back to your home page or insert a small graphic at the bottom right of the page. Specifies a URL path to the webalizer. The path must be a URL path, even if it refers to a local file. You can reference one CSS file in many reports to make it easier to change report layout in one place. You can reference one JavaScript file in many reports to make it easier to change report layout in one place.

Configures Stone Steps Webalizer to append the current language code to the generated HTML and image files, so Apache language extensions can be used to browse language-specific reports. For example, if the current language is Japanese, index.

Avoid processing the same log files more than once. In every subsequent run, Stone Steps Webalizer uses the latest processed log time stamp to skip log lines that have already been processed in previous runs. Modern log files will most likely contain multiple log lines with the same time stamp value. Some of the lines sharing that time stamp may have been processed before and others may not have been processed yet, but the unprocessed ones will be discarded anyway because of their matching time stamp values.
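This minimal sketch shows why lines that share the last processed time stamp are lost. It assumes that a line is skipped unless its time stamp is strictly newer than the last one processed, which is consistent with the description above. The growing-log scenario is illustrative, and `lines_to_process` is a hypothetical helper.

```python
from datetime import datetime
from typing import Optional


def lines_to_process(lines: list[tuple[datetime, str]],
                     last_processed: Optional[datetime]) -> list[tuple[datetime, str]]:
    """Skip every line whose time stamp is not newer than the last processed one."""
    if last_processed is None:
        return list(lines)
    return [(ts, text) for ts, text in lines if ts > last_processed]


first_run = [(datetime(2024, 1, 5, 10, 0, 0), "GET /a"),
             (datetime(2024, 1, 5, 10, 0, 1), "GET /b")]
last = max(ts for ts, _ in first_run)

# The same log file was appended to after the first run; "GET /c" shares
# the 10:00:01 time stamp of the last processed line.
second_run = first_run + [(datetime(2024, 1, 5, 10, 0, 1), "GET /c"),
                          (datetime(2024, 1, 5, 10, 0, 2), "GET /d")]
kept = lines_to_process(second_run, last)
print([text for _, text in kept])  # ['GET /d']: "GET /c" is never counted
```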

When configuring your web server to rotate log files, keep in mind that as soon as a log line with the time stamp from the next month is processed, the current month will be ended and reports will be generated. If you would like to end the current month after processing the log file you know to be the last one, use --end-month switch to do so and then --prepare-report to generate the monthly report from the rolled over state database.

If you choose to use this workflow and intend to process multiple log files one after another, you will achieve better performance using the --batch switch, which prevents Stone Steps Webalizer from generating intermediate monthly reports after each log file.

Stone Steps Webalizer recognizes Fields directives and dynamically reconfigures its parser to process the log file entries following each directive in the matching field order. The IIS log format mostly follows the W3C standard, with one exception: it outputs the request processing time (time-taken) in milliseconds instead of seconds.
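The Fields-driven parsing described above can be sketched as follows. Each `#Fields:` line redefines the column order for the entries that follow it. The sketch takes the space-separated W3C layout as given and uses an assumed caller-supplied `iis` flag to convert `time-taken` from milliseconds to seconds; how Stone Steps Webalizer actually detects IIS logs is not described here.

```python
def parse_w3c(lines: list[str], iis: bool = False) -> list[dict[str, object]]:
    """Parse W3C extended log lines, re-reading the column order at each #Fields."""
    fields: list[str] = []
    records: list[dict[str, object]] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#"):
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()  # new column order
            continue  # other directives (#Software, #Date, ...) are skipped
        values = dict(zip(fields, line.split()))
        record: dict[str, object] = dict(values)
        if "time-taken" in values:
            seconds = float(values["time-taken"])
            record["time-taken"] = seconds / 1000.0 if iis else seconds
        records.append(record)
    return records


sample = [
    "#Software: Microsoft Internet Information Services 10.0",
    "#Fields: date time cs-uri-stem sc-status time-taken",
    "2024-01-05 10:00:00 /index.html 200 250",
    "#Fields: date time cs-uri-stem time-taken",
    "2024-01-05 10:05:00 /about.html 1500",
]
print(parse_w3c(sample, iis=True)[1]["time-taken"])  # 1.5
```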

Apache logs may be customized using the LogFormat and CustomLog directives (these are Apache configuration keywords, not those used by Stone Steps Webalizer). Stone Steps Webalizer can parse the log format used in a CustomLog directive if that format is specified in its own configuration using the ApacheLogFormat configuration parameter.

For example (the line is broken for display purposes; it would actually appear as a single line in the configuration file):

It is important to understand that Apache log files do not contain log format information (unlike log files in W3C extended format), and switching the log file format without renaming the current log file will result in a log file that contains log information in mixed formats.

Such log files cannot be analyzed unless they are split into multiple consistently formatted log files. If log formats specified in httpd. The Webalizer supports CLF log formats, which should work for just about everyone.
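A CLF line, and the combined variant that appends the quoted referrer and user agent (described below), can be recognized with one regular expression. This is a simplified sketch. It does not handle escaped quotes inside fields, and it is not the parser Webalizer uses.

```python
import re
from typing import Optional

# CLF fields, optionally followed by the quoted referrer and user agent
# that the combined format appends to each record.
LOG_LINE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?$'
)


def parse_line(line: str) -> Optional[dict[str, Optional[str]]]:
    """Return the named fields of a CLF/combined line, or None if it does not match."""
    m = LOG_LINE.match(line.rstrip("\n"))
    return m.groupdict() if m else None


combined = ('203.0.113.7 - - [05/Jan/2024:10:00:00 +0000] "GET /index.html HTTP/1.1" '
            '200 5120 "https://www.example.com/start" "Mozilla/5.0 (X11; Linux x86_64)"')
print(parse_line(combined)["referrer"])  # https://www.example.com/start
```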

If you want User Agent or Referrer information, you need to make sure your web server supplies this information in its log file, and in a format that the Webalizer can understand.

While The Webalizer will try to handle many of the subtle variations in log formats, some will not work at all. Most web servers output CLF format logs by default. For Apache, in order to produce the proper log format, add the following to the httpd. This instructs the Apache web server to produce a combined log that includes the referrer and user agent information at the end of each record, enclosed in quotes. This is the standard recommended by both Apache and NCSA.

Referrers are weird critters. They take many shapes and forms, which makes them much harder to analyze than a typical URL, which at least has some standardization.

What is contained in the referrer field of your log files varies depending on many factors, such as which site made the referral, what type of system it comes from and how the actual referral was generated. Why is this? Well, because a user can get to your site in many ways. They may have your site bookmarked in their browser, they may simply type your site's URL into their browser, they could have clicked on a link on some remote web page, or they may have found your site from one of the many search engines and site indexes found on the web.

The Webalizer attempts to deal with all this variation in an intelligent way by doing certain things to the referrer string that make it easier to analyze. Of course, if your web server doesn't provide referrer information, you probably don't really care and are asking yourself why you are reading this section. To complicate things even more, dynamic HTML documents and HTML documents that are generated by CGI scripts or external programs produce lots of extra information, which is tacked on to the end of the referrer string in an almost infinite number of ways.

If the user just typed your URL into their browser or clicked on a bookmark, the referrer field will contain no information and will take the form -. In order to handle all these variations, The Webalizer parses the referrer field in a certain way.

The rest of the referrer field is left alone. This follows standard convention, as the actual method (HTTP) and hostname are always case-insensitive, while the document name portion is case-sensitive.

Referrers that came from search engines, dynamic HTML documents, CGI scripts and other external programs usually tack on additional information that is used to create the page. A common example of this can be found in referrals that come from search engines and site indexes common on the web. Sometimes these referrer URLs can be several hundred characters long and include all the information that the user typed in to search for your site. The Webalizer deals with this type of referrer by stripping off all the query information, which starts with a question mark (?).

When a user comes to your site by using one of their bookmarks or by typing in your URL directly into their browser, the referrer field is blank, and looks like -. Most sites will get more of these referrals than any other type. The Webalizer converts this type of referral into the string - Direct Request. This is done in order to make it easier to hide via a command line option or configuration file option.

This is because the character - is a valid character elsewhere in a referrer field, and if not turned into something unique, could not be hidden without possibly hiding other referrers that shouldn't be.
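The referrer handling described in this section can be sketched as one function. A lone `-` becomes `- Direct Request`, the query part starting at `?` is dropped, and the method and host name are lowercased while the document name keeps its case. This is an illustration of those rules only. `normalize_referrer` is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit

DIRECT_REQUEST = "- Direct Request"


def normalize_referrer(ref: str) -> str:
    """Apply the referrer rules described in the text (sketch)."""
    if ref == "-":
        return DIRECT_REQUEST  # bookmark or typed-in URL
    ref = ref.split("?", 1)[0]  # strip query information
    parts = urlsplit(ref)
    if parts.scheme and parts.netloc:
        # method and host name are case-insensitive; the path is not
        return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                           parts.path, "", parts.fragment))
    return ref


print(normalize_referrer("HTTP://WWW.Example.COM/Search/Results.html?q=webalizer"))
# http://www.example.com/Search/Results.html
```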

Stone Steps Webalizer supports a configuration parameter, SpamReferrer, which lists referrer patterns considered to be spam. Visitors submitting these requests will be red-flagged and marked in the hosts report as spammers. Multiple SpamReferrer entries may be used to specify more than one pattern.

For example, the first two entries below will red-flag all requests with the referrer URL containing words gambling or poker anywhere in the referrer URL. The third entry will match only if the referrer URL begins with the string of characters preceding the asterisk.

Once a visitor is identified as a spammer, all requests from this IP address will be treated as spam for the rest of the currently reported month. Spam requests will be counted as usual in all reports except the referrer report, to prevent spam referrers from appearing in the report as clickable links. Spamming hosts will also be highlighted in red in the hosts report. If you would like to change the color of the highlighting, locate the following line in webalizer.
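The spam-flagging behavior described above can be sketched as a small tracker. A referrer matching any SpamReferrer-style pattern flags its host, and every later request from that host counts as spam until the month ends. Substring and trailing-asterisk prefix matching follow the examples given earlier. The class and its methods are hypothetical.

```python
class SpamTracker:
    """Hypothetical tracker: once a host sends a spam referrer, every request
    from that host counts as spam until the month ends."""

    def __init__(self, patterns: list[str]) -> None:
        self.patterns = patterns
        self.spam_hosts: set[str] = set()

    def _spam_referrer(self, referrer: str) -> bool:
        for p in self.patterns:
            if p.endswith("*"):
                if referrer.startswith(p[:-1]):  # 'prefix*' pattern
                    return True
            elif p in referrer:  # word anywhere in the referrer URL
                return True
        return False

    def is_spam(self, host: str, referrer: str) -> bool:
        if host in self.spam_hosts:
            return True
        if self._spam_referrer(referrer):
            self.spam_hosts.add(host)
            return True
        return False

    def end_month(self) -> None:
        self.spam_hosts.clear()  # flags last for the reported month only


tracker = SpamTracker(["gambling", "poker", "http://spam.example/*"])
tracker.is_spam("198.51.100.7", "http://x.example/online-poker")
print(tracker.is_spam("198.51.100.7", "-"))  # True: the host is already flagged
```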

In addition to highlighting, the all-hosts and the tab-separated host reports will have an asterisk output next to the spammer's host.

In general, URLs are supposed to be uniformly encoded in a way that keeps them simple, but still usable even if they are printed on paper, pronounced on the radio, or appear in other contexts where it may be impossible to distinguish characters from different languages. This encoding is described by the Internet standard RFC, which defines which characters may appear in their natural representation and which should be percent-encoded as one or more sequences of a percent character followed by two hexadecimal digits that represent that character e.

Sometimes URL characters may be encoded incorrectly, for various historical reasons, because of bugs in user agents, or in an attempt to avoid simple spam filters that do not percent-decode URL sequences before looking for spam keywords. In any case, having the same URL encoded differently fragments reports by creating aliases e. In order to deal with these issues, Stone Steps Webalizer normalizes all URLs extracted from log files to reduce aliasing and improve report readability.

URL normalization is applied to all URLs before any other work is done and follows the rules described below, so all configuration filters should use normalized characters in all ignore, hide and group URL patterns. If any of these characters is percent-encoded, it will be decoded. The following characters have special meaning within URLs and will be neither encoded nor decoded; they will remain in their current form, whatever it is i.
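The normalization rules in this section can be sketched in one pass over the URL. The exact character lists are elided in this copy, so the sketch assumes that the RFC 3986 "unreserved" characters (letters, digits, `-._~`) are the ones that get decoded and that every other escape is left untouched. It also applies the control-character and stray-percent rules stated in the text: encoded control characters stay encoded, raw control characters get encoded, and a lone `%` becomes `%25`. Non-ASCII characters are passed through unchanged, which is also an assumption.

```python
import string

UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")
HEXDIGITS = frozenset(string.hexdigits)


def normalize_url(url: str) -> str:
    """Normalize percent-encoding in a URL (sketch under assumed character sets)."""
    out: list[str] = []
    i = 0
    while i < len(url):
        ch = url[i]
        if ch == "%":
            seq = url[i + 1:i + 3]
            if len(seq) == 2 and all(c in HEXDIGITS for c in seq):
                decoded = chr(int(seq, 16))
                # decode unreserved characters; keep every other escape as-is
                out.append(decoded if decoded in UNRESERVED else "%" + seq)
                i += 3
            else:
                out.append("%25")  # percent sign outside an escape sequence
                i += 1
            continue
        if ord(ch) < 0x20 or ord(ch) == 0x7F:
            out.append(f"%{ord(ch):02X}")  # raw control character
        else:
            out.append(ch)
        i += 1
    return "".join(out)


print(normalize_url("/%7Euser/a%2Fb"))  # /~user/a%2Fb
```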

Percent-encoded control characters will not be decoded, and unencoded control characters will be percent-encoded. A percent character that is not part of a percent-encoding sequence in a URL will be percent-encoded e. Note that once a URL path pattern is matched, only the search arguments of that path pattern will be checked, and no further patterns will be tried.

This may produce unexpected results if broader URL path patterns e. Consider these filters: This is done for performance reasons, so that a long list of ignore filters does not slow down log processing too much. One way to work around this is to put those broader filters at the very end, so all other patterns are matched first, but this approach is error-prone if more than one catch-all pattern is needed.

The Webalizer will do a minimal analysis on referrer strings that it finds, looking for well-known search string patterns.
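The search-string analysis can be sketched in the style of the SearchEngine keyword, which pairs a name matched against the referrer with the CGI variable holding the query. The engine list below is a hypothetical example. The sketch works on the raw referrer, before the query part is stripped for the referrer report.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical (name, variable) pairs in the spirit of SearchEngine entries.
SEARCH_ENGINES = [("www.google.", "q"), ("search.yahoo.", "p")]


def search_string(referrer: str) -> Optional[str]:
    """Return the decoded search phrase from a search engine referrer, if any."""
    query = urlsplit(referrer).query
    for name, variable in SEARCH_ENGINES:
        if name in referrer:
            values = parse_qs(query).get(variable)
            if values:
                return " ".join(values[0].split())  # collapse extra spaces
    return None


print(search_string("https://www.google.com/search?q=stone+steps+webalizer&hl=en"))
# stone steps webalizer
```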

Most of the major search engines are supported, such as Yahoo! However, it should be accurate enough to give a good indication of what users were searching for when they stumbled across your site. Note: as of version 1. The majority of data analyzed and reported on by The Webalizer is as accurate and correct as possible based on the input log file.

However, due to the limitations of the HTTP protocol, the use of firewalls, proxy servers, multi-user systems, the rotation of your log files, and a myriad of other conditions, some of these numbers cannot be calculated with absolute accuracy.

In particular, Visits, Entry Pages and Exit Pages are subject to random errors due to the above and other conditions. The reason for this is twofold. Because log files are finite, they have a beginning and an ending, which can be represented as a fixed time period.

There is no way of knowing what happened before this time period, nor is it possible to predict future events based on it. Also, because it is impossible to tell individual users apart, multiple users that have the same IP address all appear to be a single user, and are treated as such. Dynamic IP assignment used with dial-up Internet accounts also presents a problem, since the same user will appear to come from multiple places. For example, suppose two users visit your server from XYZ company, which has its network connected to the Internet by a proxy server fw.

All requests from the network look as though they originated from fw. The Webalizer would see these requests as from the same location, and would record only 1 visit, when in reality, there were two.

Because entry and exit pages are calculated in conjunction with visits, this situation would also only record 1 entry and 1 exit page, when in reality, there should be 2.
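The proxy example can be reproduced with a sketch of visit counting that follows the VisitTimeout definition in the keyword reference. A request starts a new visit when no earlier request from that host exists, or when the gap since the host's previous request is greater than or equal to the timeout. The 1800-second default is an assumption here, and `count_visits` is a hypothetical helper.

```python
def count_visits(requests: list[tuple[str, int]], timeout: int = 1800) -> int:
    """Count visits from time-ordered (host, unix_seconds) pairs."""
    last_seen: dict[str, int] = {}
    visits = 0
    for host, ts in requests:
        prev = last_seen.get(host)
        if prev is None or ts - prev >= timeout:
            visits += 1  # first request, or gap >= timeout: a new visit
        last_seen[host] = ts
    return visits


# Two different people behind the same proxy, one minute apart:
print(count_visits([("fw.xyz.com", 0), ("fw.xyz.com", 60)]))  # 1, not 2
```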

As another example, say a single user at XYZ company is surfing around your website. They arrive at pm the last day of the month, and continue surfing until am, which is now a new day in a new month.

Since a common practice is to rotate (save, then clear) the server logs at the end of the month, you now have the user's visit logged in two different files (current and previous months).

Because of this and the fact that the Webalizer clears history between months , the first page the user requests after midnight will be counted as an entry page. This is unavoidable, since it is the first request seen by that particular IP address in the new month.

They do provide a good indication of overall trends, and shouldn't be far enough off from the real numbers to matter much. You should probably consider them the minimum possible values, since the actual values should always be equal to or greater than these in all cases.

The Webalizer now has the ability to dump all object tables to tab-delimited ASCII text files, which can then be imported into most popular database and spreadsheet programs.

The filename extensions default to. Since this data contains all items, even those normally hidden, it may not be desirable to have these files located in the output directory, where they may be visible to normal web users.

For this reason, the DumpPath configuration keyword is available, and allows these files to be placed somewhere outside the normal web server document tree. An optional header record may be written to these files as well, which is useful when the data is to be imported into a spreadsheet. If enabled, the header is simply the column names, tab-separated, as the first record of the file.

Stone Steps Webalizer supports language files loaded dynamically at run time.

If the language file is found, its content will be used to produce reports and progress messages. A new configuration variable, LanguageFile, can be used to specify the location of the file. For example. The name identifies a text variable used by Stone Steps Webalizer and the value provides language-specific text. For example, the English version of the error message reported if a log file cannot be opened is defined as follows:. Some language file entries, such as the list of months shown below, may contain multiple elements.

In this case, individual elements must be separated by commas. The whitespace between the end of each element and the comma is preserved and may be used for padding purposes.
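The list syntax used by language files can be sketched as a small parser that follows the rules given in this section. Whitespace before a comma is kept. Whitespace after a comma is dropped unless the element is enclosed in double quotes. An element that contains a comma must be quoted. This is an illustration, not the Stone Steps Webalizer parser, and it assumes the `name =` part has already been removed.

```python
def parse_list(value: str) -> list[str]:
    """Split a comma-separated language file value into its elements."""
    items: list[str] = []
    i, n = 0, len(value)
    while True:
        while i < n and value[i] in " \t":
            i += 1  # whitespace following a comma is stripped
        if i < n and value[i] == '"':
            end = value.find('"', i + 1)
            if end == -1:
                raise ValueError("unterminated quoted element")
            items.append(value[i + 1:end])  # quoted: kept verbatim, commas allowed
            comma = value.find(",", end + 1)
        else:
            comma = value.find(",", i)
            # whitespace before the comma is preserved (padding)
            items.append(value[i:] if comma == -1 else value[i:comma])
        if comma == -1:
            return items
        i = comma + 1


print(parse_list('Jan ,Feb, Mar'))  # ['Jan ', 'Feb', 'Mar']
```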

The whitespace following the comma is stripped off, unless the element is enclosed in double quotes. If an individual element of a comma-separated list contains a comma, as shown in the example below, this element must be enclosed in double quotes. All existing language files have been converted to UTF-8. If you would like to convert some other character encoding to UTF-8, you can use the iconv utility. For example, the following command converts a Japanese language file from EUC-JP to UTF-8:

Each of these configuration variables must be a fully-qualified path to the selected TrueType font file(s).

For example, the following two lines configure Stone Steps Webalizer to use Lucida Console for all graph legends and axis markers and Tahoma Bold for all graph titles:. If GraphFontNormal and GraphFontBold are not specified, or if the associated font files cannot be found, Stone Steps Webalizer will use the default raster fonts to generate text for the graphs.

Note that raster fonts may not have suitable character representation for non-Latin characters. You can control the appearance of the generated text using three configuration variables shown below.

The first two variables define the size of the normal and bold fonts in points. The third one instructs Stone Steps Webalizer whether to smooth font edges or not. If you would like to use non-Latin UTF-8 characters in your language files, make sure that the TrueType font you selected contains the characters you need.

For example, Lucida Console shipped with the English version of Windows does not have Japanese characters and if used to generate graphs will result in unusable graphs.

Robots are identified before user agents are mangled. Some robot-related features, such as highlighting robots in the Top Agents report, may be disabled if agent mangling is active. Log records matching IgnoreRobot entries are completely ignored, and none of the robot-related entries are updated in this case. Hosts are marked as robots when the user agent matches one of the Robot entries, and only when a host is seen for the first time i.

If a human and a robot share the same IP address, this address will be marked as robot or non-robot depending on which user agent was active when the first hit was logged by the web server. Active visits are marked as robot visits when the user agent matches one of the Robot entries, regardless of whether the corresponding hosts are marked as robots or not.

The visit robot flag is used when user agents are classified as robots or not, and when website and country totals are updated. Country totals do not include robot activity. Stone Steps Webalizer computes country totals when visits end. Consequently, in incremental mode, active visit data is not included in country totals until the last log file for the month is processed.

All active visits are terminated at the end of the month, so that the final pie chart accurately depicts the percentage of other countries. The Webalizer makes liberal use of memory for internal data structures during analysis. Lack of real physical memory will noticeably degrade performance by doing lots of swapping between memory and disk.

One user who had a rather large log file noticed that The Webalizer took over 7 hours to run with only 16 Meg of memory. Once memory was increased, the time was reduced to a few minutes.

The reason for this is that every log record must be scanned for each item in each list. On really large log files, this can have a profound impact. It is recommended that you use as few of these configuration options as you can, as this will greatly improve performance.

A lot of time and effort went into making The Webalizer, and to ensure that the results are as accurate as possible. If you find any abnormalities or inconsistent results, bugs, errors, omissions or anything else that doesn't look right, please let me know so I can investigate the problem or correct the error. This goes for the minimal documentation as well. Suggestions for future versions are also welcome and appreciated.

What is The Webalizer?

Stone Steps Webalizer v6.

Installing the Webalizer

Windows

The Windows pre-built package contains all run-time dependencies and can be used as-is.

The script makes use of the following set of directories.

Run sudo make uninstall to uninstall.

Running the Webalizer

The Webalizer was designed to be run from a Linux or Windows command line prompt or as a cron job.

The format of the command line is: webalizer [options]

Stone Steps Webalizer is a fast command line application for web server and web proxy log file analysis.

Display the top num sites table.
Display the top num countries table.
Display the top num entry pages table.
Display the top num exit pages table.

The file sample.

LogType name  Specify log file type as name. Values can be either web, squid or ftp, with the default being web.
OutputDir dir  Create output in the directory dir. If none is specified, the current directory will be used.
HistoryName name  Filename to use for the history file. Defaults to 'webalizer.
ReportTitle name  Use the title string name for the report title. If none is specified, the default (in English, "Usage Statistics for") is used.
Hostname name  Set the hostname for the report as name. If none is specified, an attempt will be made to gather the hostname via a uname(2) system call. If that fails, localhost will be used.
Quiet yes/no  Suppress informational messages. Warning and Error messages will not be suppressed.
ReallyQuiet yes/no  Suppress all messages, including Warning and Error messages.
Debug yes/no  Print extra debugging information on Warnings and Errors.
TimeMe yes/no  Force timing information at the end of processing.
IgnoreHist yes/no  Ignore the previous monthly history file. Does not prevent Incremental file processing.
Normally, out-of-sequence log records are ignored.
DailyGraph yes/no  Display the Daily Graph in the output report.
DailyStats yes/no  Display Daily Statistics in the output report.
HourlyGraph yes/no  Display the Hourly Graph in the output report.
HourlyStats yes/no  Display Hourly Statistics in the output report.
PageType name  Define the file extensions to consider as a page. If a file is found to have the same extension as name, it will be counted as a page (sometimes called a pageview).
GraphLines num  Specify the number of background reference lines displayed on the graphs produced.
VisitTimeout num  Specifies the visit timeout value, in seconds. A visit is determined by looking at the difference in time between the current and last request from a specific site. If the difference is greater than or equal to the timeout value, the request is counted as a new visit.
IndexAlias name  Use name as an additional alias for index.
MangleAgents num  Mangle user agent names based on mangle level num. See the -M command line switch for mangle levels and their meaning.
SearchEngine name variable  Allows the specification of search engines and their query strings. The name is the name to match against the referrer string for a given search engine. The variable is the CGI variable that the search engine uses for queries. See the sample.
Incremental yes/no  Enable Incremental mode processing.
IncrementalName name  Filename to use for incremental data.
Specify zero (0) to disable.
Use zero to disable.
TopReferrers num  Display the top num Referrers table.
TopSites num  Display the top num Sites table.
TopCountries num  Display the top num Countries table.
TopEntry num  Display the top num Entry Pages table.
TopExit num  Display the top num Exit Pages table.
TopSearch num  Display the top num Search Strings table.
TopUsers num  Display the top num Usernames table. Usernames are only available if using HTTP-based authentication.
HideReferrer name  Hide Referrers that match name.
HideSite name  Hide Sites that match name.
HideAllSites yes/no  Hide all individual sites. This causes only grouped sites to be displayed.
HideUser name  Hide Usernames that match name.
IgnoreAgent name  Ignore User Agents that match name.
IgnoreReferrer name  Ignore Referrers that match name.
IgnoreSite name  Ignore Sites that match name.
IgnoreUser name  Ignore Usernames that match name.
GroupReferrer name [Label]  Group Referrers that match name.
GroupSite name [Label]  Group Sites that match name.
GroupDomains num  Automatically group sites by domain. The default value of 0 disables domain grouping.
GroupUser name [Label]  Group Usernames that match name.
IncludeSite name  Force inclusion of sites that match name. Takes precedence over Ignore keywords.
IncludeReferrer name  Force inclusion of Referrers that match name.
IncludeAgent name  Force inclusion of User Agents that match name.
IncludeUser name  Force inclusion of Usernames that match name.
Default is html. Do not include the leading period!


