The price listing for a journal in the Ulrich directory is a set of prices for different types of customers. In this document we refer to this set of prices as P, and to the conjoined string of these prices that we start with as S. Each listing in P contains keywords that designate which type of customer it applies to. The prices contained in the listings of P are either words like "free" or a currency/amount format such as "USD 10" or "GBP 50", and one listing in P may contain several such prices. We use one process of elimination, a function called getPrice, to choose which listing in P is appropriate, and another, called individualPriceListing, to determine which price in this listing to use. First we describe the steps used in getPrice to eliminate all but one listing from P. The steps are presented in the order in which they are applied, because both functions behave differently if the steps are applied in a different order.
Note: In the code provided, $str is the string S, @prices is P, and $country is the country in which the journal is published (a string). "PRICEUNKNOWN" is the string returned when the price cannot be determined. "ERROR" is the error string returned when something goes wrong, such as when the function eliminates all listings in P without deciding to use one. The tags "(members)", "(pervolume)" and "(perissue)" are appended to the returned price when only the member price, per volume price, or per issue price can be determined. Several helper functions are used as well. isInArray takes a string and an array and returns true iff that string appears in any of the array elements. hasFieldWithout takes the same arguments and returns true iff at least one of the array elements does not contain the string. hasDuplicateRecords takes an array and returns true iff two of the array elements are equal.
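The bodies of the helper functions are not reproduced in this document. The following is a minimal sketch that is consistent with the descriptions above. It assumes a case-sensitive substring test, which is only a guess suggested by the way getPrice checks both "institution" and "Institution"; the original helpers may differ.

```perl
use strict;
use warnings;

# Sketch only: the original helper bodies are not shown in this document.
# Assumption: matching is a case-sensitive substring test.

# True iff $needle occurs inside at least one element of @haystack.
sub isInArray {
    my ($needle, @haystack) = @_;
    foreach my $element (@haystack) {
        return 1 if index($element, $needle) >= 0;
    }
    return 0;
}

# True iff at least one element of @haystack does NOT contain $needle.
sub hasFieldWithout {
    my ($needle, @haystack) = @_;
    foreach my $element (@haystack) {
        return 1 if index($element, $needle) < 0;
    }
    return 0;
}

# True iff two elements of @records are equal strings.
sub hasDuplicateRecords {
    my (@records) = @_;
    my %seen;
    foreach my $record (@records) {
        return 1 if $seen{$record}++;
    }
    return 0;
}

my @listings = ("USD 10 to individuals", "USD 50 to institutions");
print isInArray("institution", @listings)       ? "yes\n" : "no\n";
print hasFieldWithout("institution", @listings) ? "yes\n" : "no\n";
print hasDuplicateRecords("USD", "GBP", "USD")  ? "yes\n" : "no\n";
```

All three calls print "yes": one listing mentions institutions, one does not, and "USD" appears twice in the last list.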
To start with, we check whether the price is given for a 6 month subscription or a year-long subscription. If it is a 6 month subscription, S contains the phrase "for 6 mos." somewhere, whereas the year-long subscription listings have no extra modifiers. In the first case, we strip the phrase from S, recursively call getPrice on the remainder, and double the result.
if ($str =~ s/for 6 mos//ig){
my $halfprice = getPrice($str,$_[1]);
if ($halfprice =~ /([A-Z]{3} )(\d+\.?\d*)(.*)/){
return $1 . (2*toNumber($2)) . $3;
} else {
return $halfprice;
}
}
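To make the doubling step concrete, here is a small sketch that isolates it. The name doubleHalfYearPrice is hypothetical, and plain numeric multiplication stands in for toNumber, whose definition is not shown in this document.

```perl
use strict;
use warnings;

# Hypothetical helper that isolates the doubling step of getPrice.
# Assumption: plain numeric multiplication stands in for toNumber.
sub doubleHalfYearPrice {
    my ($halfprice) = @_;
    if ($halfprice =~ /([A-Z]{3} )(\d+\.?\d*)(.*)/) {
        # $1 = currency code and space, $2 = amount, $3 = trailing tag, if any
        return $1 . (2 * $2) . $3;
    }
    # "PRICEUNKNOWN" and "ERROR" carry no amount and pass through unchanged.
    return $halfprice;
}

print doubleHalfYearPrice("USD 12.50"), "\n";         # USD 25
print doubleHalfYearPrice("GBP 40 (members)"), "\n";  # GBP 80 (members)
print doubleHalfYearPrice("PRICEUNKNOWN"), "\n";      # PRICEUNKNOWN
```

Note that with plain multiplication the result loses trailing zeros ("USD 25" rather than "USD 25.00"); whether toNumber behaves the same way is not known from this document.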
Next, we check whether S lacks a currency/amount price (like "USD 10") but instead has the keyword "membership", which appears in variants of the phrase "subscription is included with membership". The case in which the keyword "free" is the only price in S is handled later on, so we make sure to ignore those cases in this block. All other strings S without a currency/amount are of unknown format, and for these we return the PRICEUNKNOWN keyword.
if ($str !~ /[A-Z]{3} \d{1,}\.?\d{0,2}/) {
if ($str =~ /membership/i){
return "USD 0 (members)";
} elsif ($str !~ /free/i){
return "PRICEUNKNOWN";
}
}
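The three possible outcomes of this block can be sketched as a standalone function. The name classifyWithoutAmount is hypothetical; it returns nothing when processing should continue with the later steps.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the check above. Returns a final answer,
# or an empty return (undef in scalar context) when getPrice would
# simply continue to the next step.
sub classifyWithoutAmount {
    my ($str) = @_;
    return if $str =~ /[A-Z]{3} \d{1,}\.?\d{0,2}/;   # has an amount: continue
    return "USD 0 (members)" if $str =~ /membership/i;
    return "PRICEUNKNOWN"    if $str !~ /free/i;
    return;                                          # only "free": handled later
}

foreach my $s ("Subscription included with membership", "price varies",
               "free", "USD 10 to individuals") {
    my $result = classifyWithoutAmount($s);
    print "$s => ", (defined $result ? $result : "(continue)"), "\n";
}
```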
This next codeblock handles all the ways in which the word "free" can appear in P. "1 free journal" appears when the subscription comes with one free issue. "free archival", "free access", "online free", and "free online" appear when a subscription includes access to back issues or to an online version of the journal. "subscription is free" appears when a list of countries is given in which the subscription is free (usually developing countries, never the United States). "free iop" designates free internet access, and "free aapm" designates a journal that is free for AAPM members. None of these situations has a bearing on the relevant price, so if such a phrase sits in its own price listing in P we drop that listing; otherwise we delete the instances of "free" so that the remainder of the program isn't confused by that keyword.
Next, we replace instances of "free" with the properly formatted "USD 0" when it appears with the keyword "members", meaning that the journal is free to members, or when the only price in P is the one containing the word "free". (If P contains more prices than the one containing the word "free" we do not attempt to interpret the price as this format is not well-defined.) The rest of the program then deciphers the price from there, or we return "PRICEUNKNOWN" if we don't know what to do at that point.
for(my $i=0; $i<=$#prices; $i++){
if ($prices[$i] =~ /1 free journal/i || $prices[$i] =~ /free archival/i
|| $prices[$i] =~ /free online/i || $prices[$i] =~ /subscription is free in/i
|| $prices[$i] =~ /free access/i || $prices[$i] =~ /free iop/i
|| $prices[$i] =~ /free aapm/i || $prices[$i] =~ /online free/i){
if ($prices[$i] !~ /[A-Z]{3} \d+/){
splice(@prices,$i--,1);
} else {
$prices[$i] =~ s/free//ig;
}
} elsif ($prices[$i] =~ /free/i && $prices[$i] =~ /members/i){
$prices[$i] =~ s/free/USD 0/ig;
} elsif ($i == 0 && $i == $#prices && $prices[$i] =~ s/free/USD 0/ig){
} elsif ($prices[$i] =~ /free/i){
return "PRICEUNKNOWN";
}
}
At this point, and at every point after we eliminate some listings in P, if there is only one listing left in P we return the price contained in that listing. To determine what that price is, we use individualPriceListing, which is described below. If there are no listings left in P, we return the "ERROR" string.
Now we get rid of listings in P that we know to be irrelevant. First we take care of a particular case in which a journal has two parts, both of which are available in one subscription for a higher price. In this case the phrase "Includes A & B" appears in the listing of P that comes directly after the listing containing the price it applies to. (This is because an extra field delimiter is used non-standardly in the string S in this context.) In this code block we therefore remove the listing that precedes each occurrence of the phrase.
for(my $i=1; $i<=$#prices; $i++){
splice(@prices,$i-- -1,1) if ($prices[$i] =~ /Includes A \& B/);
}
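The expression `$i-- -1` is easy to misread, so a short trace may help. The sketch below wraps the same loop in a hypothetical function, dropTwoPartPrices, and runs it on made-up listings.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the loop above, applied to a copy of the list.
sub dropTwoPartPrices {
    my @prices = @_;
    for (my $i = 1; $i <= $#prices; $i++) {
        # The condition is tested with the current $i. The splice then
        # removes element $i-1 (the priced listing just before the phrase),
        # and the post-decrement keeps $i on the element that shifted down.
        splice(@prices, $i-- - 1, 1) if ($prices[$i] =~ /Includes A \& B/);
    }
    return @prices;
}

# Made-up example data, not taken from the directory.
my @result = dropTwoPartPrices("USD 100", "USD 150", "Includes A & B");
print join(" | ", @result), "\n";   # USD 100 | Includes A & B
```

The "Includes A & B" listing itself survives this step; since it contains no currency/amount price, it is removed by the later filter that drops listings without one.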
We are also uninterested in the newsstand price of a journal (unless that is the ONLY price given), and the way in which a price is designated as the newsstand price requires special handling at this point. If the word "newsstand" ever appears in S, it applies not only to the listing of P in which it occurs, but to all later listings. We therefore get rid of all such listings, unless the only listing is for a newsstand price. In this case, the "(perissue)" tag is later applied by individualPriceListing. We also get rid of all listings in P that do not contain correctly formatted currency/amount prices, since we have already dealt with the ones that contain useful information ("free" and "membership").
if (isInArray("newsstand",@prices)){
for (my $i=1; $i<= $#prices; $i++){
splice(@prices,$i,$#prices - $i +1) if ($prices[$i] =~ /newsstand/i);
}
}
for(my $i=0;$i<=$#prices;$i++){
splice(@prices,$i--,1) if($prices[$i] !~ /[A-Z]{3} \d{1,}\.?\d{0,2}/);
}
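As an illustration of these two steps, the sketch below applies them to a copy of the list. The function name dropNewsstandAndUnpriced is hypothetical, and the second step is written with grep instead of a splice loop; this is an alternative formulation, not the code used above.

```perl
use strict;
use warnings;

# Hypothetical helper combining the two eliminations described above.
sub dropNewsstandAndUnpriced {
    my @prices = @_;
    # Cut the list at the first "newsstand" listing found after index 0,
    # since the label applies to that listing and to all later ones.
    for (my $i = 1; $i <= $#prices; $i++) {
        if ($prices[$i] =~ /newsstand/i) {
            splice(@prices, $i);   # removes element $i and everything after it
            last;
        }
    }
    # Keep only listings that contain a currency/amount price.
    return grep { /[A-Z]{3} \d{1,}\.?\d{0,2}/ } @prices;
}

# Made-up example data.
my @kept = dropNewsstandAndUnpriced("USD 100 to institutions", "price varies",
                                    "USD 5 newsstand", "USD 7");
print join(" | ", @kept), "\n";   # USD 100 to institutions
```

Because the search starts at index 1, a newsstand price in the first listing is never removed, which is how a journal with only a newsstand price keeps it.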
If we haven't eliminated all but one price listing by now, we determine whether the journal distinguishes between prices for individuals or students and prices for institutions. If the word "institution" is mentioned in P then we know the journal provides this information, and so we can exclude all listings that aren't labeled "institution". The same applies to the keywords "library", "non-member", and "institutions(academic)", although the last two must be handled separately: there is such a thing as a "member institution", and whenever something is designated "institutions(academic)" there is also another price specifically for "institutions(other)". Similarly, if there are listings that aren't specifically for students or individuals, we can exclude the listings that are.
if(isInArray("non-members",@prices) || isInArray("Non-Members",@prices)){
for(my $i=0;$i<=$#prices;$i++){
splice(@prices,$i--,1) if ($prices[$i] =~ / members/i);
}
}
if (isInArray("institution",@prices) || isInArray("Institution",@prices) || isInArray("libraries",@prices) || isInArray("Libraries",@prices)){
for (my $i=0; $i<=$#prices; $i++){
splice(@prices,$i--,1) if ($prices[$i] !~ /institution/i && $prices[$i] !~ /libraries/i);
}
}
if (hasFieldWithout("student",@prices)){
for(my $i=0;$i<=$#prices;$i++){
splice(@prices,$i--,1) if ($prices[$i] =~ /student/i);
}
}
if (isInArray("institutions(academic)",@prices)){
for (my $i=0; $i<=$#prices; $i++){
splice(@prices,$i--,1) if ($prices[$i] !~ /institutions\(academic\)/i);
}
}
The next criterion for elimination takes into account the country of origin, and identifies the keywords "domestic" and "foreign". If the country is the U.S. and some price is tagged domestic we can eliminate all those that aren't, and in any case we can eliminate all prices tagged as foreign. If the country isn't the U.S. we can eliminate all domestic prices.
if ($country eq "United-States"){
if (isInArray("domestic",@prices)){
for(my $i=0; $i<=$#prices; $i++){
splice(@prices,$i--,1) if ($prices[$i] !~ /domestic/);
}
} else {
for (my $i=0; $i<=$#prices; $i++){
splice(@prices,$i--,1) if ($prices[$i] =~ /foreign/);
}
}
} else {
for (my $i=0; $i<=$#prices; $i++){
splice(@prices,$i--,1) if ($prices[$i] =~ /domestic/i);
}
}
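The domestic/foreign rule can be summarized as a single function. This is a sketch using grep rather than the splice loops above; the name filterByOrigin is hypothetical, and the case sensitivity of each pattern follows the code above.

```perl
use strict;
use warnings;

# Hypothetical restatement of the domestic/foreign elimination.
sub filterByOrigin {
    my ($country, @prices) = @_;
    if ($country eq "United-States") {
        # A domestic price exists: keep only the domestic listings.
        if (grep { /domestic/ } @prices) {
            return grep { /domestic/ } @prices;
        }
        # Otherwise drop anything explicitly marked foreign.
        return grep { !/foreign/ } @prices;
    }
    # Journal published outside the U.S.: domestic prices do not apply.
    return grep { !/domestic/i } @prices;
}

# Made-up example data.
my @listings = ("USD 10 domestic", "USD 20 foreign");
print join(" | ", filterByOrigin("United-States", @listings)), "\n"; # USD 10 domestic
print join(" | ", filterByOrigin("Canada", @listings)), "\n";        # USD 20 foreign
```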
In addition to this relative foreign/domestic location designation, many journals use exact location descriptors. If any price listings in P are designated as for the "United States" or the "U.S." or the "Usa" or "US" or the "Americas" or "North America", we can exclude all the other prices. Otherwise, one of the descriptors "elsewhere", "foreign", "Except Europe and Japan", or "rest of world" applies to the United States, and we can eliminate fields that don't have one of these labels. We can also eliminate prices for developing nations.
if(isInArray("United States",@prices) || isInArray("North America",@prices) || isInArray("Americas",@prices) || isInArray("Usa",@prices) || isInArray("U.S.",@prices) || isInArray("US ",@prices)){
if (isInArray("United States",@prices) || isInArray("U.S.",@prices) || isInArray("Usa",@prices) || isInArray("US ",@prices)){
for (my $i=0; $i<=$#prices; $i++){
splice(@prices, $i--,1) if ($prices[$i] !~ /united states/i && $prices[$i] !~ /U\.S\./i && $prices[$i] !~ /Usa/i && $prices[$i] !~ /US /);
}
} elsif (isInArray("North America",@prices)){
for (my $i=0; $i<=$#prices; $i++){
splice(@prices,$i--,1) if ($prices[$i] !~ /north america/i);
}
} elsif (isInArray("Americas",@prices)){
for (my $i=0; $i<=$#prices; $i++){
splice(@prices,$i--,1) if ($prices[$i] !~ /americas/i);
}
}
} elsif (isInArray("elsewhere",@prices) || isInArray("foreign",@prices) || isInArray("Except Europe and Japan",@prices) || isInArray("rest of world",@prices)){
for (my $i=0; $i<= $#prices; $i++){
if ($prices[$i] !~ /elsewhere/i && $prices[$i] !~ /foreign/i && $prices[$i] !~ /rest of world/i && $prices[$i] !~ /Except Europe and Japan/i){
splice(@prices,$i--,1);
}
}
}
for (my $i=0; $i<=$#prices; $i++){
splice(@prices,$i--,1) if ($prices[$i] =~ /in developing nations/i);
}
In the next codeblock we are not eliminating any price listings but rather building several arrays of information that will be helpful later on. First we build boolean arrays that designate whether each entry in P is for a combined (print and online) edition or exclusively online. This can be accomplished by checking whether the phrases "combined subscription" or "print and online" occur in each listing of P. We then build a similar array designating whether each listing only contains per issue prices. This is more complicated since every listing may include several currency/amount prices, some of which are labeled per issue and some that aren't. We therefore count the number of currency/amount prices and the number of "per issue" labels and if the latter is greater than zero and at least the number of prices, then every price is labeled per issue, so that entire listing is of the per issue price.
my @combined = ();
my @perissue = ();
my @online = ();
for(my $i=0;$i<=$#prices;$i++){
if (($prices[$i] =~ /combined subscription/i) || ($prices[$i] =~ /print \& online/i) || ($prices[$i] =~ /print and online/i)){
push @combined, "1";
push @online, "1";
} elsif (($prices[$i] =~ /online ed/i && $prices[$i] !~ /print ed/i) || $prices[$i] =~ /Print Or Online/){
push @combined, "0";
push @online, "1";
} else {
push @combined, "0";
push @online, "0";
}
my $perissuecount = 0;
my $pricescount = 0;
$perissuecount++ while ($prices[$i] =~ /(per issue)/ig);
$pricescount++ while ($prices[$i] =~ /[A-Z]{3} \d{1,}\.?\d{0,2}/g);
if ($perissuecount != 0 && $perissuecount >= $pricescount){
push @perissue, "1";
} else {
push @perissue, "0";
}
}
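The per issue test is the least obvious part of this block, so here it is in isolation. The function name isPerIssueOnly is hypothetical; the counting rule is the one described above.

```perl
use strict;
use warnings;

# Hypothetical helper: true (1) iff every currency/amount price in the
# listing is labeled "per issue".
sub isPerIssueOnly {
    my ($listing) = @_;
    # Assigning a global match to an empty list and then to a scalar
    # yields the number of matches.
    my $perissuecount = () = $listing =~ /per issue/ig;
    my $pricescount   = () = $listing =~ /[A-Z]{3} \d{1,}\.?\d{0,2}/g;
    return ($perissuecount != 0 && $perissuecount >= $pricescount) ? 1 : 0;
}

# Made-up example data.
print isPerIssueOnly("USD 5 per issue, GBP 3 per issue"), "\n";  # 1
print isPerIssueOnly("USD 50, USD 5 per issue"), "\n";           # 0
print isPerIssueOnly("USD 50"), "\n";                            # 0
```

In the second example only one of the two prices is labeled, so the listing still contains a usable subscription price and is not treated as per issue.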
Next, we use these three arrays to select online editions over print editions over combined editions, and all over per issue prices. This is because online editions, when available, are always cheaper than the paper editions, and print editions are less than or equal to the price of combined editions. For each journal we therefore choose the lowest price that gives full access to the institution.
if (isInArray("0",@perissue)){
for (my $i=0;$i<=$#prices;$i++){
if($perissue[$i]=="1"){
splice(@perissue,$i,1);
splice(@combined,$i,1);
splice(@online,$i,1);
splice(@prices,$i--,1);
}
}
}
if (isInArray("0",@combined)){
for (my $i=0;$i<=$#prices;$i++){
if($combined[$i]=="1"){
splice(@perissue,$i,1);
splice(@combined,$i,1);
splice(@online,$i,1);
splice(@prices,$i--,1);
}
}
}
if (isInArray("1",@online)){
for (my $i=0;$i<=$#prices;$i++){
if($online[$i]=="0"){
splice(@perissue,$i,1);
splice(@combined,$i,1);
splice(@online,$i,1);
splice(@prices,$i--,1);
}
}
}
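Keeping four parallel arrays in step requires four splices per removal. One alternative, sketched here as a design variation and not as the code used above, is to store each listing with its flags in a single hash record. The function name chooseEdition and the record layout are assumptions made for illustration.

```perl
use strict;
use warnings;

# Each record is a hash reference such as
#   { text => "...", combined => 0, online => 1, perissue => 0 }
# (hypothetical layout). The three eliminations mirror the blocks above.
sub chooseEdition {
    my @records = @_;
    # Drop per issue listings if anything else is available.
    if (grep { !$_->{perissue} } @records) {
        @records = grep { !$_->{perissue} } @records;
    }
    # Drop combined editions if anything else is available.
    if (grep { !$_->{combined} } @records) {
        @records = grep { !$_->{combined} } @records;
    }
    # Prefer online editions when at least one is listed.
    if (grep { $_->{online} } @records) {
        @records = grep { $_->{online} } @records;
    }
    return map { $_->{text} } @records;
}

# Made-up example data.
my @left = chooseEdition(
    { text => "USD 100 print ed.",             combined => 0, online => 0, perissue => 0 },
    { text => "USD 80 online ed.",             combined => 0, online => 1, perissue => 0 },
    { text => "USD 120 combined subscription", combined => 1, online => 1, perissue => 0 },
    { text => "USD 10 per issue",              combined => 0, online => 0, perissue => 1 },
);
print join(" | ", @left), "\n";   # USD 80 online ed.
```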
The last type of price listing we can eliminate is that for a journal with additional supplements or subscriptions to other journals. These are identifiable by the phrases "(Includes" which is then followed by a list of the supplements or other journals, or "For Full Set". At this point, some journals only list a "full set" price, so we also have to check that there are other prices still listed before eliminating that type.
for(my $i=0;$i<=$#prices;$i++){
splice(@prices,$i--,1) if ($prices[$i] =~ /\(Includes /i);
}
if (hasFieldWithout("For Full Set",@prices)){
for (my $i=0; $i<=$#prices; $i++){
splice(@prices,$i--,1) if ($prices[$i]=~ /For Full Set/i);
}
}
There are also a few journals that list member prices separately from non-member prices (in a different listing of P), so we exclude the member listings here.
if (hasFieldWithout("member",@prices)){
for (my $i=0; $i<=$#prices; $i++){
splice(@prices,$i--,1) if ($prices[$i]=~ /member/i);
}
}
We also want to exclude prices that are for countries in the AAAP member area, and prices that include IP-number access to the online edition, because in this case they also offer password-access for a lower price.
for(my $i=0; $i<=$#prices; $i++){
splice(@prices,$i--,1) if ($prices[$i] =~ /Aaap Area/i || $prices[$i] =~ /Ip-Number access/i)
}
At this point, we should have narrowed P down to one possibility. If not, the price is unknown and we return the error tag.
As described above, each price listing in P may contain several currency/amount prices; for example, "USD 10, GBP 8 to institutions" is just the same price for the same customer type listed in two currencies. We therefore have a second function, individualPriceListing, for interpreting these prices. Starting here, @priceshere is the name for the array of prices, which is created by splitting the listing chosen from P above into pieces that each contain a single currency/amount price. So for example, if the listing decided on above was "USD 5, GBP 4, CND 6 for institutions", @priceshere would contain "USD 5, ", "GBP 4, ", and "CND 6 for institutions".
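The code that builds @priceshere is not shown in this document. One way to obtain the pieces described above, offered only as a guess, is to split at every position where a currency code followed by a digit begins. The function name splitListing is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical reconstruction of the splitting step. The zero-width
# lookahead splits before each "XXX <digit>" without consuming any text,
# so separators such as ", " stay attached to the preceding piece.
sub splitListing {
    my ($listing) = @_;
    return split /(?=[A-Z]{3} \d)/, $listing;
}

my @priceshere = splitListing("USD 5, GBP 4, CND 6 for institutions");
print "[$_]\n" foreach @priceshere;
# [USD 5, ]
# [GBP 4, ]
# [CND 6 for institutions]
```

A listing that begins with text before its first amount would yield a first piece without any price, so a real implementation may need to handle that case differently.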
First of all, as a check we return the unknown price string "PRICEUNKNOWN" if there is no currency/amount price in the listing. Note that $_[1] is the second argument passed to the function, which is the listing in P that was chosen earlier.
if (!($_[1] =~ /([A-Z]{3} \d{1,}\.?[0-9]{0,2})/)){
return "PRICEUNKNOWN";
}
The most common reason for having multiple currency/amount prices listed together is to give a price in several currencies. We therefore first see if a price is given in USD, and if so we ignore everything else.
if (isInArray("USD ", @priceshere)){
for (my $i=0;$i<=$#priceshere;$i++){
splice(@priceshere,$i--,1) if ($priceshere[$i] !~ /USD/);
}
}
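Applied to the example pieces from above, the USD preference behaves as follows. This sketch uses grep in place of the splice loop, and the function name preferUSD is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical restatement of the USD preference: if any piece is priced
# in USD, keep only the USD pieces; otherwise leave the list untouched.
sub preferUSD {
    my @priceshere = @_;
    return @priceshere unless grep { /USD / } @priceshere;
    return grep { /USD/ } @priceshere;
}

print join("|", preferUSD("USD 5, ", "GBP 4, ", "CND 6 for institutions")), "\n"; # USD 5, 
print join("|", preferUSD("GBP 4, ", "EUR 6")), "\n";                             # GBP 4, |EUR 6
```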
We then exclude any prices that are per issue, for members only, or for a combined subscription.
for (my $i=0; $i<=$#priceshere; $i++){
splice(@priceshere,$i--,1) if ($priceshere[$i] =~ /\(per issue\)/);
}
for (my $i=0; $i<=$#priceshere; $i++){
splice(@priceshere,$i--,1) if ($priceshere[$i] =~ /to members/);
}
for (my $i=0; $i<=$#priceshere; $i++){
splice(@priceshere,$i--,1) if ($priceshere[$i] =~ /combined subscription/);
}
If after any of the above eliminations we are left with a single possible price, we return that amount along with the qualifier "perissue", "members", or "pervolume" if applicable. If there are still several choices at this point, the original price determination function getPrice can be useful, so we combine the remaining choices and apply that function to them. First, however, we delete any instances of keywords like "domestic" or "institutions", since at this point these apply to all of the remaining choices, regardless of which array element they are spelled out in. We also attach a tag "PREPROCESSED" to each of the choices to designate that they have already been through the second price function, so that an indeterminate price does not get stuck in an infinite loop of indecision.
There is one additional segment of getPrice that handles the case when that function still can't decide on a price at this point. There is one case in which it doesn't matter which price we choose, and that is when there are several prices listed together in different currencies, none of which is USD. We detect this case by checking that no currency code appears twice among the remaining choices. If that is so, we choose the price given in EUR if available, and otherwise just the first price listed. In every other case we return the unknown price string.
if (isInArray("PREPROCESSED",@prices)){
my @isolatedprices = ();
foreach my $price (@prices){
# Record the currency code only when the match succeeds, so that a stale $1 is never pushed.
push @isolatedprices, $1 if ($price =~ /([A-Z]{3}) \d{1,}\.?[0-9]{0,2}/);
}
if (hasDuplicateRecords(@isolatedprices)){
return "PRICEUNKNOWN";
} else {
for (my $i=0; $i<=$#prices; $i++){
if ($prices[$i] =~ /EUR/){
return individualPriceListing($country,$prices[$i]);
}
}
return individualPriceListing($country,$prices[0]);
}
}
return "PRICEUNKNOWN";
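The fallback choice can be sketched on its own as follows. The function name pickFallbackListing is hypothetical; it returns the chosen listing instead of calling individualPriceListing, and returns nothing where getPrice would return "PRICEUNKNOWN".

```perl
use strict;
use warnings;

# Hypothetical restatement of the fallback: if every remaining choice is in
# a different currency, prefer EUR, otherwise take the first choice.
sub pickFallbackListing {
    my @prices = @_;
    my %seen;
    foreach my $price (@prices) {
        next unless $price =~ /([A-Z]{3}) \d{1,}\.?[0-9]{0,2}/;
        return if $seen{$1}++;   # same currency twice: genuinely ambiguous
    }
    foreach my $price (@prices) {
        return $price if $price =~ /EUR/;
    }
    return $prices[0];
}

# Made-up example data.
print pickFallbackListing("GBP 8 PREPROCESSED", "EUR 12 PREPROCESSED"), "\n"; # EUR 12 PREPROCESSED
print pickFallbackListing("GBP 8 PREPROCESSED", "CND 6 PREPROCESSED"), "\n";  # GBP 8 PREPROCESSED
```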
As noted previously on the database information page, there are several ways in which a journal can be omitted from the database because of its price listing. If it only has member prices, per issue prices, or per volume prices, we do not include the journal. If the above method cannot decide on a price, the journal is also not included in the database. This happens when there is an obvious typo, such as "USD 10, CND 12 domestic; USD 15, USD 20 foreign", when the price is something indeterminate like "price varies", or when the format of the price isn't well defined, such as "free; USD 15".