Unintended technological consequences



You never know what nefarious activity might be associated with your IP address — as it turns out, companies that try to map the physical locations of computers and electronic devices have a crude workaround for searches that don’t return a valid location. They instead return a default location, which is the geographic center of the country or state or county. The problem is that sometimes there are people living there, and suddenly the police trying to trace a stolen laptop are knocking on your door with a search warrant.

Back in the day when I was programming stuff, this was considered extremely lazy, sloppy programming. You were supposed to return a sentinel value or NaN for an invalid search, not throw in an arbitrary answer. This is a case where it’s a good idea to tell someone making a query “not found”, rather than making something up.

So now there’s a farmhouse in Kansas that is associated with 600 million IP addresses. The owner has an old Gateway computer that she uses to write Sunday School lessons, but apparently she’s the nexus of all the high-tech evil that goes on in America.

The company that sells this IP mapping service says they’re going to fix it.

Now that I’ve made MaxMind aware of the consequences of the default locations it’s chosen, Mather says they’re going to change them. They are picking new default locations for the U.S. and Ashburn, Virginia that are in the middle of bodies of water, rather than people’s homes.

They’ve learned nothing. That’s not how you do it. I’m kind of shocked that these security companies should be taking advice from an old geezer biologist: don’t substitute in any fake map coordinates. Your functions ought to be returning an indication that the IP address does not have a known physical location.
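
Here is a minimal sketch, in Python, of what "not found" could look like. The `locate` function and its lookup table are made up for illustration; nothing here is MaxMind's actual code or API.

```python
from typing import NamedTuple, Optional


class Location(NamedTuple):
    latitude: float
    longitude: float


# Hypothetical lookup table; a real service would consult a database.
KNOWN_LOCATIONS = {
    "203.0.113.7": Location(44.98, -93.27),
}


def locate(ip: str) -> Optional[Location]:
    """Return the known location for an IP address, or None if there is none."""
    return KNOWN_LOCATIONS.get(ip)


result = locate("198.51.100.1")
if result is None:
    # The caller is forced to handle the unknown case explicitly.
    print("not found")
else:
    print(f"{result.latitude}, {result.longitude}")
```

The point of the `Optional` return type is that a caller cannot mistake "we don't know" for a real place on the map.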

Next unintended consequence: county police all over the country start lobbying homeland security for the money to buy military surplus submarines, because of the sudden rash of high-tech meth labs being built at the bottom of the lake in the old gravel pit, you know, the one down the road near the center of the county? Yeah, that one. There are a million computers installed in that facility, so you better make sure you get us a nucyuler sub, with the best torpedos and missile tubes.

Comments

  1. blf says

    Back in the day when I was programming stuff, this was considered extremely lazy, sloppy programming.

    It still is considered such.

    Proper engineering discipline has not made much impact in too many software houses.

    (Disclaimer: Precluding this shite from ever happening in the first place is one of my hobbyhorses.)

  2. happyrabo says

    I’m afraid I can’t agree on your proposed solution, PZ. This database is only intended to be precise down to the level of a zip code. It was made so advertisers would know if you live somewhere vaguely near one of their stores. It was never intended to be used to lead FBI agents to a particular house. If they returned no results any time they couldn’t be precise to the house level, they wouldn’t even have a product.

    Now that they know how their software is being misused, changing the default location to an uninhabited area like a body of water in the area is a good call. But they need to do that everywhere, not just in those two places.

  3. says

    It’s not sloppy. Sometimes close is good enough. I’d rather return a geolocation that’s within 20 miles than nowhere or null. If you wanted to mitigate this a bit, you could also return a confidence factor.

  4. komarov says

    Surplus submarines? Law enforcement is a dangerous job, these people deserve the best. In this case, bespoke submarines designed from scratch specifically for … whatever their purpose would be. With that there would also be a need for police cruisers – the naval sort – destroyers, minesweepers, minelayers so the minesweepers have something to do and vice versa, among other things. Last, but certainly not least, the CIAircraft carrier (or two) with all the bells and whistles, not to mention a few fighter squadrons. You can never invest too much money in keeping your taxpayers safe. This entirely impartial policy advice is courtesy of your weapon manufacturers defence industry.

    Re: happyrabo, #2:

    At best that still makes it sound like someone reused code without making sure it was fit for its new purpose. PZ’s point stands: the software gives the impression of an ‘accurate solution’ when it should be showing an error. It’s like a calculator that gives out ’42’ instead of ‘syntax error’ when you’ve made a mistake. Not very helpful at all.

  5. says

    WordPress tries to map all the time. Whenever someone follows a blog, you get an email, which purports to tell you that person’s location. It’s almost always wrong.

  6. slithey tove (twas brillig (stevem)) says

    re 2:
    then return the Zip Code, nothing more. Not even the lat-long coordinates of the center of the zip code area. (maybe ‘city, state’ in addition, only)
    Pinpointing a faux location falls in the “false identification” category, especially if the faux location has an actual house and residents there.

  7. Nerd of Redhead, Dances OM Trolls says

    I live in Chiwaukee north of the Great Lakes Naval Training Base. Google interprets my IP address as coming from Highland Park (south of GLNTB), down near the Lake County/Cook County border. It moves further south every time Comcast upgrades their service. I should live in Chicago about ten years from now. Not very accurate.

  8. wzrd1 says

    Now, now. You forget today’s business world, results are mandatory, even if they’re nonsense, they’re an answer.
    Or something.

    Of course, some of these fine, ah hem, companies are nothing more than copyright trolls, sending threatening letters that insist that if you don’t pay up, your friends and neighbors will learn that you’ve been downloading all manner of pornography that you’ve never even heard of the genre of, then go to court and pay hundreds of thousands of dollars in damages. A few were dismantled by court investigations, but they pop back up under new names, as all of the partners are never identified.
    I got a couple of such letters, to which I sent a reply with my attorney’s name and instructions that any future contact must only be with a court date. As I maintain the same general logging setup that I support at work for our information security department, my logs can beat up their logs any day. It also helps that I spent quite a few years as a DoD trusted agent. :)
    Interestingly enough, the organization that sent me those two letters had their corporate officers, first disbarred, then incarcerated for fraud upon the courts and all decisions in their favors reversed.
    I guess now, they’ll be depth charging a lake, just to serve a warrant or cease and desist letter to the fish.

  9. cartomancer says

    Wouldn’t a secret villain lair at the bottom of one of these placeholder lakes be just about the best place to avoid electronic surveillance then? Once everyone knows that the technology just points you there because it’s a designated spot with no human activity, wouldn’t that be great cover for aspiring online ne’er-do-wells?

  10. Owlmirror says

    After a few seconds of thinking about it (ie, still probably kind of lazy) . . .

    If it returns a zip code, the lat-long should be the coordinates of the post office, and the address field should read:

    NO ADDRESS FOUND
    LAT-LONG IS LOCAL POST OFFICE
    Whatever, XX, 00000

  11. says

    As a software engineer, I must say that sometimes it is the case that returning some sort of invalid indicator may not be possible based on the data structure. I personally have some experience working with navigation systems and we are rather limited on what we can do with an invalid position because our data is restricted to be within latitude and longitude values that actually exist. We generally do have a status flag to indicate if our numerical data is bad, but the numerical data itself does have to be that of a valid position.
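
    A rough sketch of that status-flag pattern, assuming a made-up `PositionFix` record (the names are hypothetical and not taken from any real navigation system): the coordinates must always be in range, and a separate flag says whether they can be trusted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionFix:
    latitude: float   # always within [-90, 90]
    longitude: float  # always within [-180, 180]
    valid: bool       # False means the coordinates are only a placeholder


def make_fix(latitude: float, longitude: float, valid: bool) -> PositionFix:
    """Build a fix, rejecting coordinates the data format cannot represent."""
    if not (-90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0):
        raise ValueError("coordinates outside the representable range")
    return PositionFix(latitude, longitude, valid)


def usable_position(fix: PositionFix):
    """Return (lat, lon) only when the status flag says the data is good."""
    return (fix.latitude, fix.longitude) if fix.valid else None
```

    The scheme only works if every consumer checks `valid` before using the numbers, which is the weak point in practice.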

    I would also agree with happyrabo @2 in a general sense that some data may not have been intended to ever be invalid. Where I may disagree is that good programmers should be more thoughtful about their use cases. In this case, if the idea was to help advertisers know if a customer is in their area, as happyrabo claims, it should have been obvious that some users would catch on to this and try to hide from those advertisers and that these systems would need to handle that.

  12. ftltachyon says

    It’s probably too late to replace no answer with NaN or None or something. The time to do that was at the design stage. And at the design stage, the thinking presumably was that you give the best answer you can:

    If you know the location, give the location.
    If you don’t know the location but know the zip code, give the center of the zip code.
    If you don’t know the zip code but know the state, give the center of the state.
    If you don’t know the state but know the country, give the center of the country.
    Etc.

    Basically, take whatever information you actually have and return the best possible guess given that.
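
    The fallback chain above might be sketched like this. The function and parameter names are invented for illustration, and the `precision` label is an addition that the list above does not include; it is there so a caller can tell a real fix from a centroid.

```python
from typing import NamedTuple, Optional


class Estimate(NamedTuple):
    latitude: float
    longitude: float
    precision: str  # "exact", "zip", "state", or "country"


def best_guess(exact=None, zip_center=None, state_center=None,
               country_center=None) -> Optional[Estimate]:
    """Return the most specific available (lat, lon) pair, labelled by level."""
    candidates = [
        ("exact", exact),
        ("zip", zip_center),
        ("state", state_center),
        ("country", country_center),
    ]
    for precision, coords in candidates:
        if coords is not None:
            return Estimate(coords[0], coords[1], precision)
    return None  # nothing at all is known


# Illustrative coordinates only: all that is known here is the country.
print(best_guess(country_center=(39.8, -98.6)))
```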

    Of course, that works when it’s being used in an approximate way. It’s perfectly fine for ad targeting or something like that, or for showing a pretty picture of where in the world your blog visitors are coming from. It’s terrible when it’s used to send SWAT teams to people’s houses.

    But it’s become entrenched, so it’s too hard to change. If suddenly the service starts returning values that are not valid coordinates, who knows what’ll break? So hence “hotfixes” like changing the default location to be inside a body of water.

  13. says

    Caine #5:

    WordPress tries to map all the time. Whenever someone follows a blog, you get an email, which purports to tell you that person’s location. It’s almost always wrong.

    You do? I just get a notice that So-And-So is now following my blog, along with a link to their gravatar page and links to some of their blog-posts, if they have a blog.

    ———————————-
    A flash-games site I used to frequent had a geo-locator widget purporting to show the locations of people currently using the site. Mine always showed up as Bournemouth, which is seventy-odd miles away, and in the next county.

  14. whheydt says

    If they are using a database, the obviously correct result is to return NULL, and properly test for that result. If they are constrained to return a numerical lat/long result, given the physical space they are dealing with, the answer ought to be lat 0, long 0. Fortunately, there is nothing there but ocean, and outside US territory, at that.
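
    A small sketch of the NULL approach using Python's built-in `sqlite3` module. The table layout and the `lookup` helper are assumptions made up for this example, not MaxMind's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ip_location (ip TEXT PRIMARY KEY, lat REAL, lon REAL)")
conn.execute("INSERT INTO ip_location VALUES ('203.0.113.7', 44.98, -93.27)")
# An address with no known position: the coordinates are stored as NULL.
conn.execute("INSERT INTO ip_location VALUES ('198.51.100.1', NULL, NULL)")


def lookup(ip: str):
    """Return (lat, lon), or None when the row is missing or holds NULLs."""
    row = conn.execute(
        "SELECT lat, lon FROM ip_location WHERE ip = ?", (ip,)
    ).fetchone()
    if row is None or row[0] is None or row[1] is None:
        return None
    return (row[0], row[1])
```

    SQL NULL comes back as Python `None`, so the test for "no location" is explicit rather than a comparison against some magic coordinate.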

  15. Sastra says

    So now there’s a farmhouse in Kansas that is associated with 600 million IP addresses. The owner has an old Gateway computer that she uses to write Sunday School lessons, but apparently she’s the nexus of all the high-tech evil that goes on in America.

    Kansas?? Sunday school lessons????? Well, duh. Nexus of All the High-Tech Evil in America should know how to hide better than that.

  16. says

    Daz:

    You do? I just get a notice that So-And-So is now following my blog, along with a link to their gravatar page and links to some of their blog-posts, if they have a blog.

    Yeah. Maybe they stopped, I haven’t paid attention in a long time. Back when I noticed, I noted that a rather overwhelming amount of people were mapped to Minnesota, but only a few were actually based there.

  17. says

    Wait, are people actually making excuses for this bad programming? Even back in the distant ages, when we banged rocks to make holes in wood pulp in order to program giant machines that filled the whole back of the cave, we used NaNs in FORTRAN to flag invalid results. Are you telling me that modern programming languages are worse than FORTRAN, to excuse this laziness by software designers?

    Now I’m wondering what the code MaxMind is running is written in. Dare I guess…COBOL?

  18. Matt says

    @7: Zip code is culture specific, what about outside of the US and Canada? Lat-long works anywhere in the world and isn’t dependent on geopolitical markers. It can even be problematic to label what is and is not a country. China has gotten really upset with software vendors before just for labeling Taiwan as a “country” in drop down boxes. That’s why the wording these days is usually something like “region.” It makes sense to keep the data simple and avoid all of those potential problems.

    Maybe a better solution would have been to devise some addressing scheme that could locate regions anywhere on the globe, but who’s going to go through the hassle of that when you can just use lat-long, which is already well established and understood? Lat-long is simple data that any system can take and understand with ease. You can expect to pass lat-long into other systems and they’ll be able to understand it, etc etc. It’s pretty universal and it makes sense they’d stick with it.

    I’m a professional software developer and also quite a curmudgeon about industry practices, and usually agree that this sort of thing is just sloppy engineering. But in this case I think it may just be more of an unfortunate evolution of technology, one that wasn’t reasonably able to be foreseen.

  19. Menyambal says

    This was part of my master’s thesis. Computer mapping was really getting popular, and I argued against using it in our case. We were mapping groundcovers that were poorly defined, hard to measure precisely, and that varied with the seasons and year-to-year. But if someone put an area into a computer, it was going to be rigidly divided, designated to decimal places, never checked again, and assumed to be perfect.

    I sat in on a mapping conference where some were arguing for a confidence rating. After they got done explaining, I asked how confident they were in their confidence ratings.

  20. wzrd1 says

    @PZ. likely C+++ATH, one of the few faithful C derivatives.

    Apparently, as I hinted to, bullshit results trump factually accurate results. Hence, the poor sap at the nexus of evil or the soon to be depth charged reservoir.
    Worse, programmers are now being conditioned to not deliver an error or null, but erroneous information that is presented as accurate information or placeholder information.

    For the record, my wife was trained to punch those holes in cellulose with a rock. I could read them by eye.
    Hell, if I really dig around, I’m pretty sure I still have an old magnetic core memory laying around in the back room.

  21. blf says

    Wait, are people actually making excuses for this bad programming?

    Unfortunately, they are. Which simply means they are incompetent. There is no excuse for this shite, there is no excuse for this shite, and anyone who thinks there is is either a software manager (who by definition is an incompetent) or, quite possibly, a wanna-be software manager (and therefore incompetent).

    As I remarked in @1, “Proper engineering discipline has not made much impact in too many software houses” — and we now observe this incompetence in action. Again.

  22. kaleberg says

    Drat, and I’ve just finished building my underwater evil genius hide out for nefarious activities under the waters of Lake [Redacted]. This means I’ll have to use a VPN.

  23. blf says

    I think it may just be more of an unfortunate evolution of technology, one that wasn’t reasonably able to be foreseen.

    Bullshite. Tell me, right now, what my Lat-Long is.

    You can’t. It’s impossible. Therefore, there is always the “don’t know” case.

    Fecking software-managing incompetents.

  24. says

    I’m surprised one of the southern California or Florida law enforcement agencies doesn’t have a submarine of some sort. I’m sure some of the bigger agencies probably have remotely piloted underwater vehicles. The Canada Border Services Agency has Benthos Stingrays to survey the bottoms of ships for contraband, and I’m sure US Customs does as well.

  25. bcwebb says

    How about demanding they add a random offset to their default location, perhaps scaled by some measure of confidence: where crimes or harassing email are involved there are likely to be multiple traces of the IP address – a result that keeps changing would deter people from thinking the address is a specific location.
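
    A rough sketch of that idea, with a hypothetical `jitter` helper. It uses a flat-earth approximation (about 111 km per degree of latitude), so it is only meant for modest radii away from the poles.

```python
import math
import random


def jitter(latitude, longitude, radius_km, rng=random):
    """Return a point picked at random within roughly radius_km of the input."""
    # sqrt keeps the points evenly spread over the disc instead of
    # bunching them up near the center.
    distance_km = radius_km * math.sqrt(rng.random())
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    km_per_degree = 111.0  # rough length of one degree of latitude
    dlat = (distance_km * math.cos(bearing)) / km_per_degree
    dlon = (distance_km * math.sin(bearing)) / (
        km_per_degree * math.cos(math.radians(latitude))
    )
    return latitude + dlat, longitude + dlon


# Two lookups of the same default location give two different answers.
print(jitter(39.8, -98.6, 50.0))
print(jitter(39.8, -98.6, 50.0))
```

    The larger the uncertainty, the larger `radius_km` would be, so a country-level guess would wander over a much wider area than a city-level one.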

    — It would kind of be like airline pricing where each time you check you get a different price which tells you when you are ready to book a ticket it’s going to be expensive. :>).

  26. newbery says

    > Wait, are people actually making excuses for this bad programming?

    No… I believe people are only disagreeing with you about the original requirements. Whether this is bad programming depends on how you define bad in this case. A stronger argument can be made that it’s the consumers of this data that are perhaps guilty of bad programming… not understanding and miscommunicating the limitations of the information they are delivering to the end users.

    Most IP addresses are not actually mappable to the specific geographic address of the computer using the IP address. The requirement was to provide an “estimate” of the geo address based on the best available information, with as much specificity that can be obtained. In some cases, the best they could do was to say that the IP address was used somewhere in the U.S., hence the default middle-of-US address.

    It turns out that the MaxMind database does indicate the specificity of the data, from country, to region, and to city. What kind of information you get back is dependent on what you ask for but you can indeed tell whether you only have country-level specificity. I’m sure the default values are to help in those use cases that expect a standard datatype for geomapping software.

  27. says

    kaleberg #23:

    Drat, and I’ve just finished building my underwater evil genius hide out for nefarious activities under the waters of Lake [Redacted].

    Close it down and conceal the entrance.
    Wait ’til the various three-letter agencies have all realised that “Lake [Redacted]” is a system-created red-herring.
    Reopen your hide-out, safe in the knowledge that you’re in the one place they will now ignore.

  28. Rich Woods says

    600 million IP addresses is 14% of the total number of IPv4 addresses, and 40% of the total allocated to the US. That is a significant failure rate at the national scale. It must be repeated — to some degree — at the local scale too.
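
    The first percentage is easy to check with a little arithmetic. The 40% figure is taken as given; the US allocation computed below is only what that figure implies, not an independently sourced number.

```python
TOTAL_IPV4 = 2 ** 32           # size of the whole IPv4 address space
default_mapped = 600_000_000   # addresses mapped to the default location

share_of_all = default_mapped / TOTAL_IPV4

# If 600 million is 40% of the US allocation, the allocation would be:
implied_us_allocation = default_mapped / 0.40

print(f"share of all IPv4 addresses: {share_of_all:.1%}")
print(f"implied US allocation: {implied_us_allocation:,.0f}")
```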

    Do the consumers of MaxMind’s mapping service understand what they’re paying for?

  29. wzrd1 says

    Bleh, with all of that shutting down, hiding, then reopening and hoping it went unnoticed.
    I set my hideout right next door to the NSA, where they’d never look. :)
    If they did attack, they’d destroy their own water supply and potentially breach my escape tunnels under the NSA.

  30. Carl Muckenhoupt says

    PZ’s initial suggestion of just returning a null value seems too extreme in cases where you actually do know something, even if it’s just the country. But when all you know is the country, returning an exact address is clearly wrong — if all they know is that the IP is somewhere in the USA, the result of the query should be “USA” and nothing else. Sure, some services have a need to map vague queries to precise locations: if you search for “USA” in Google Maps, it has to center the map somewhere, and the geographic center of the country is a good choice for that. But MaxMind is not under such a constraint.

    The thing is, though, that’s very likely not good enough for MaxMind. They’ve built a business on the idea that if you give them an IP address, they’ll give you a street address. Maybe they’ve never technically said that, but it’s clear that this is what a lot of their customers think they’re getting, and that therefore MaxMind has reason to believe that they’ll lose business if they stop maintaining that illusion.

    In other words, this most likely isn’t simply a case of bad engineering, but rather, a case of engineering being dragged down by marketing.

  31. ck, the Irate Lump says

    Looking at the MaxMind site, their geolocation database only really advertises city or country-level accuracy (depending on product). Anyone who uses a city-level geolocation database and assumes that the IP can target all the way down to a single residence is clearly misusing the data.

  32. ck, the Irate Lump says

    Also, this isn’t to excuse MaxMind who clearly should be returning an estimated accuracy result with every database query, if for no other reason than to make it clear that it doesn’t locate down to street address, despite how accurate the numbers may seem.

  33. A Masked Avenger says

    Geocoding software that determines coordinates from street addresses has the same problem: the failure mode is generally to show the center of the town or zip code.

    In that case, at least, I don’t think they actually know when they don’t know. “Google Brain” is good at figuring things out, but I don’t think it has any ability to introspect its own conclusions.

    Going further on a limb, I suspect Maxmind’s Geoip data is populated from multiple sources that also lack introspection, so I suspect they’re incapable of noticing when the location is bogus.

    What they COULD do is build a database of reverse lookups and notice when there’s an unusual number of IP addresses at the same address. Which may be correct for, e.g., data centers, so they would then need to start making inferences some more, using things like a database of center points of counties, zip codes, and municipalities. They’re probably proposing to do just that, and then translate to the middle of the nearest body of water.

    NULL values would be more logically correct, but would cause millions of people to be blocked from using all manner of web services, from Netflix to Amazon, that restrict by country. Most people prefer answers that are only accurate within 100 miles over no answer at all.

  34. A Masked Avenger says

    Grr. I meant a reverse lookup linking coordinates to IP addresses. What I typed above is gibberish.
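
    A minimal sketch of that coordinates-to-IP reverse lookup. The record format, the function name and the threshold are all assumptions for illustration; as noted above, a real version would still have to tell default locations apart from data centers.

```python
from collections import defaultdict


def suspicious_coordinates(records, threshold):
    """Count distinct IPs per coordinate pair and flag the crowded ones.

    records is an iterable of (ip, latitude, longitude) tuples. The result
    maps each (latitude, longitude) shared by more than `threshold` distinct
    IPs to the number of IPs found there.
    """
    by_point = defaultdict(set)
    for ip, lat, lon in records:
        by_point[(lat, lon)].add(ip)
    return {
        point: len(ips)
        for point, ips in by_point.items()
        if len(ips) > threshold
    }


sample = [
    ("198.51.100.1", 39.8, -98.6),
    ("198.51.100.2", 39.8, -98.6),
    ("198.51.100.3", 39.8, -98.6),
    ("203.0.113.7", 44.98, -93.27),
]
print(suspicious_coordinates(sample, threshold=2))
```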

  35. newbery says

    Also, this isn’t to excuse MaxMind who clearly should be returning an estimated accuracy result with every database query, if for no other reason than to make it clear that it doesn’t locate down to street address, despite how accurate the numbers may seem.

    Really? Hmm… Other than as a reminder, how would anyone use these “estimated accuracy” values? If these extra numbers are actually useful, then how about also an “estimated error” of the “estimated accuracy” of the “estimated geo-location”? I’m only being slightly facetious here. Considering the immense size of this database, and the likely complexity of the geo-location heuristics involved, and the limited utility of the final result, I suspect that this is a very hard problem and one likely to be fraught with its own inaccuracies.

    In any case, it turns out that the MaxMind database can indeed return “confidence” estimates at each level from country, to region, to city, and to postal code. They also provide an “accuracy radius” for each geoip location. Is this not enough?

    http://dev.maxmind.com/geoip/geoip2/web-services/

  36. Pseudonym says

    Geographic data like this should always come with a precision indicator (see for example the browser geolocation API), and visualizations should indicate the precision of those results. Back in the early days of the web it was harder to do the latter, but there’s no excuse now. It’s not as simple as just returning a null or NaN because these kinds of results will always have a certain level of imprecision and uncertainty. If you’re showing a world map of the locations where blog visitors are coming from, then indicating the middle of the contiguous US for US IPs without any more precise location data is a defensible choice. If you’re zooming in to a single map marker for a given IP then it isn’t. Still, a lot of the blame should be placed on the police who sought and judge who approved the search warrant based on that imprecise data.

  37. wzrd1 says

    The problem with the lack of precision is simply, maxmind doesn’t get IP to address log information from ISP’s, they extrapolate that information via metadata and other sources of data.
    The inaccuracy results from an IP pool being assigned to an area by an ISP, where the pool may serve anywhere from a block to a county; when a lease expires, the address is reassigned to a different host in the network (typically, the “modem”/router at the customer premises). Most ISP’s don’t issue out static IP’s, as that’s more of a business service than a consumer grade service.
    So, IP 10.10.10.1 is assigned yesterday morning at 1341 Drive Drive, tomorrow, it’s assigned across the county to 1322 IEEE drive.
    So, when the warrant is issued and executed, who gets searched? Even money, the wrong address gets the flash bang tossed into the bedroom window.
    Resulting in an incinerated bedroom.

  38. Pseudonym says

    Well, IP 10.10.10.1 is probably assigned to lots of different places at the same time… (=
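
    For anyone who wants to check, Python's standard `ipaddress` module will confirm that 10.10.10.1 sits in a private range, which is why the same address can be in use on any number of unrelated networks at once:

```python
import ipaddress

addr = ipaddress.ip_address("10.10.10.1")

# 10.0.0.0/8 is one of the blocks reserved for private networks (RFC 1918).
print(addr.is_private)
print(addr in ipaddress.ip_network("10.0.0.0/8"))
```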

    But the police should really be subpoenaing ISPs before they go seeking search warrants, not relying on a marketing geolocation service.

  39. wzrd1 says

    You’d get no argument out of me, Pseudonym. The problem is, jurists have no clue about networks and errors in mapping networks and simply take law enforcement’s word on what address is to be searched.
    Add to that a militarized police force, where previously a warrant was knock and enter and is now a SWAT affair, and yeah, it can get interesting. A few innocent people have been gunned down in an early morning drug raid on the wrong house, with a big “whoopsie” from the SWAT team that summarily executed a US citizen for the temerity of being upset and groggy in the wee hours of the morning.

  40. newbery says

    The problem with the lack of precision is simply, maxmind doesn’t get IP to address log information from ISP’s, they extrapolate that information via metadata and other sources of data.

    FWIW, it does not appear that MaxMind provides a geo-location-to-street-address service at all. They only provide the IP-to-geo-location service.

  41. fakeusername says

    ftltachyon is correct: this looks like a design issue, not a coding issue. If the design specified that the API always returns a valid lat/long, then software that uses the method is allowed to assume that the lat/long returned is always valid. Modifying the software to return NaN, NULL, or some other sentinel would be an API violation and would break API clients in unpredictable ways.

    There are plenty of scenarios where returning the lat/long of a county or city is perfectly adequate, such as identifying the most relevant advertising to show to people. I don’t know anything about the developers of this software, but I’d bet that they get a lot more money from advertisers than they get from cops.

    The main fault here is in the police, who seem to be assuming that the lat/long are always exact; regardless of how the API is documented, that’s not a reasonable capability for it to have. If the API has a problem, it’s in its design or documentation. The design should provide a measure of how accurate the position estimation is (eg, 0=probably the correct house, 1=probably the correct block, 2=probably the correct municipality, 3=probably the correct county/etc, 4=probably the correct state/province/etc, 5=probably the correct country, 6=known proxy server, 7=complete garbage, etc.), and what it does should be clearly documented. I have no idea what information the API actually provides or how it is documented; it might already do exactly this.
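
    To make that concrete, here is one way the accuracy scale could be sketched. The `Accuracy` enum and the `accept` helper are hypothetical names; this is not what the actual API provides.

```python
from enum import IntEnum


class Accuracy(IntEnum):
    """Lower numbers mean a more precise position estimate."""
    HOUSE = 0
    BLOCK = 1
    MUNICIPALITY = 2
    COUNTY = 3
    STATE = 4
    COUNTRY = 5
    KNOWN_PROXY = 6
    GARBAGE = 7


def accept(results, worst_allowed):
    """Keep only (lat, lon, accuracy) results at least as precise as worst_allowed."""
    return [r for r in results if r[2] <= worst_allowed]


results = [
    (44.98, -93.27, Accuracy.MUNICIPALITY),
    (39.8, -98.6, Accuracy.COUNTRY),
]
# An ad server might accept both; anything needing a specific town would not.
print(accept(results, Accuracy.COUNTRY))
print(accept(results, Accuracy.MUNICIPALITY))
```

    A client that needs house-level precision would pass `Accuracy.HOUSE` and, for most lookups, get an empty list back, which is the honest answer.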

    PZM: You’re a biologist, not a software developer. What I see here is just like what you see when some non-biologist says something about biology that seems reasonable to non-experts but is in fact quite wrong. Feel free to have your own opinions, but be ready to take correction from people who actually understand the field.

  42. ck, the Irate Lump says

    newbery wrote:

    In any case, it turns out that the MaxMind database can indeed return “confidence” estimates at each level from country, to region, to city, and to postal code. They also provide an “accuracy radius” for each geoip location. Is this not enough?

    That’s great that the API supports it, but take a look at their demo page: https://www.maxmind.com/en/locate-my-ip-address

    It gets my city right, but makes no clear indication that it’s city-level accuracy, and returns a postal code that does not match mine with no indication that the postal code may be inaccurate.

  43. ck, the Irate Lump says

    I should mention that the JavaScript API also returns the incorrect postal code and also provides no indication that it is wrong or that it is just one of several assigned to my city. If you blindly tried to use that data you might get incorrect results.

  44. newbery says

    That’s great that the API supports it, but take a look at their demo page: https://www.maxmind.com/en/locate-my-ip-address
    It gets my city right, but makes no clear indication that it’s city-level accuracy, and returns a postal code that does not match mine with no indication that the postal code may be inaccurate.

    So, you’re complaining that this demo page doesn’t display all the data that can be queried from their various products? I’m not sure I understand your point.

    I should mention that the JavaScript API also returns the incorrect postal code and also provides no indication that it is wrong or that it is just one of several assigned to my city. If you blindly tried to use that data you might get incorrect results.

    I’m assuming you are referring again to the demo page? I suggest looking at http://dev.maxmind.com/geoip/geoip2/web-services/#postal , where it’s indicated that the “confidence” data is available in their “Insights” product, which I’m guessing is not what is being demonstrated on that page?

    Regarding the postal code errors in general, I don’t believe they promise a terribly high accuracy rate and in fact they seem to do a fair job of pointing that out in several places on their website. So again, I’m not sure I understand your point.

    It seems that if this confidence-level or accuracy-radius information is important for your use case, they do claim to offer an enhanced product, called Insights, that delivers that. I’m guessing it’s just not relevant for most of their customers.

  45. crosswind says

    I do have to disagree with your technical criticisms. Wild approximations are almost never acceptable in the kinds of scientific applications you’d be familiar with, but this is not a product designed for scientific applications. It’s designed for cheap applications that need a best guess of the geographical location associated with an IP address. It’s the misuse of that data beyond its confidence level that’s the problem. Furthermore, if it’s an industry standard that clients simply expect (I’d lay dimes to donuts this is the case, but can’t find confirmation of that), then MaxMind wouldn’t even have a choice in the matter. As an example of when an inaccurate value would be desirable, consider a map application that just wants a default geographical location to center on. Or how about a service provider who wants a best guess of your time zone. I’m sure there are hundreds of other applications I can’t think of off the top of my head that would appreciate a legitimate “best guess” even if it’s wildly inaccurate in some cases.

    The problem is applications using this data that overstate the confidence, and use it in situations which are inappropriate for that level of confidence. A cursory look at the API indicates that MaxMind does provide at least some confidence level for its data, meaning any application that does require at least state-level or county-level accuracy could discard results that lack sufficient confidence for their purpose. For most business applications this is probably harmless even if it is used inappropriately, and it only becomes a serious issue when it’s used by internet vigilantes or investigators of questionable competence.
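
    To make the "discard results that lack sufficient confidence" idea concrete, here is a minimal sketch. The `GeoResult` type, its field names, and the sample coordinates are hypothetical and chosen for illustration; they are not MaxMind's actual schema or data. It assumes only that a lookup result carries some accuracy radius the caller can compare against its own tolerance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeoResult:
    """Hypothetical lookup result; field names are illustrative, not a real vendor schema."""
    latitude: float
    longitude: float
    accuracy_radius_km: Optional[float]  # None means no accuracy estimate was supplied


def usable_location(result: GeoResult, max_radius_km: float) -> Optional[GeoResult]:
    """Return the result only if its stated accuracy is tight enough for the caller."""
    if result.accuracy_radius_km is None:
        # No estimate at all: treat the location as unknown rather than trust it.
        return None
    if result.accuracy_radius_km > max_radius_km:
        # Too coarse for this use case (e.g. a country-level default).
        return None
    return result


# Illustrative values only: a country-level default versus a city-level fix.
country_default = GeoResult(38.0, -97.0, accuracy_radius_km=1000.0)
city_fix = GeoResult(34.05, -118.25, accuracy_radius_km=20.0)
```

    With a 50 km tolerance, `usable_location(country_default, 50.0)` yields `None` while `usable_location(city_fix, 50.0)` passes the result through. The filtering lives entirely in the client, so nothing about the data format has to change.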

    Given the situation, MaxMind’s solution is a very graceful one. MaxMind doesn’t sell software, it sells a subscription to its data. The software itself is maintained by and proprietary to its clients. Any change to their data format specifications would require software changes on the part of their clients, potentially breaking older applications that could not be easily updated and causing anger and consternation among those who are forced to expend development resources to update applications. Worse, there’s the very real risk that the clients might just replicate the old geographical center logic and reintroduce the very same problem. Regardless of what the situation was around the original specifications or how you may feel about that, changing it now is not a realistic option. MaxMind’s fix is a simple and straightforward one that stands to fix the problem at hand. And if an investigator wants to make a national showing of his incompetence by requisitioning a submarine to chase underwater hackers, then that’s a feature and not a bug.

  46. Victor says

    In my case, the demo page gave the correct ISP and state, but gave a city which is about 90 miles east of my location.

  47. Victor says

    I found a website which uses a number of different products to do geolocation for an ip address, including MaxMind GeoLiteCity. The location from MaxMind was actually quite close – looks like the lat-long for my local post office. The site listed five products; three got my local post office, one was the community 4 miles south of me, and the fifth was about 70 miles south of me where the ISP has a regional hub.

    The difference was that when I went to the demo site, it used my ipv6 address, while the web site I went to used my ipv4 address. When I entered my ipv6 ip address into that website, only three of the products they used returned information; two said I was in Richardson, TX, and one said I was in the United States.

    FYI, I’m in Los Angeles, in a community near the western boundary of the city and county.

  48. says

    newbery #47:

    That’s great that the API supports it, but take a look at their demo page: [link]
    It gets my city right, but makes no clear indication that it’s city-level accuracy, and returns a postal code that does not match mine with no indication that the postal code may be inaccurate.

    So, you’re complaining that this demo page doesn’t display all the data that can be queried from their various products? I’m not sure I understand your point.

    Well if you read a complaint of inaccuracy as a complaint of incompleteness, you’re not likely to understand the point being made.

    I’d also point out that it gives lat–long coordinates to four decimal places, which implies an accuracy within something like 11 metres, yet at the same time doesn’t even place me in the correct postal district (the first, “general area,” part of the UK postal code). I couldn’t give a fig for all the technical discussion; as a potential end-user, this would not inspire me to pay good money for the service being offered. And it definitely shouldn’t, IMO, be being used by law enforcement bodies. As far as I can tell from the demo—which is, after all, supposed to persuade me that the product is worth paying for—it’s a heap of shite.
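
    The "four decimal places is something like 11 metres" figure can be checked with a little arithmetic. This is a rough sketch that assumes about 111.32 km per degree of latitude (an approximation) and scales the east-west step by the cosine of the latitude. The function name is made up for illustration. Note that it measures the precision implied by the digits shown, which says nothing about how accurate the underlying data is.

```python
import math

# Approximate length of one degree of latitude, in metres (an assumption; it
# varies slightly with latitude on the real Earth).
METRES_PER_DEGREE_LAT = 111_320.0


def decimal_place_resolution_m(places: int, latitude_deg: float = 0.0):
    """Return (north_south, east_west) size in metres of one step in the last decimal place."""
    step_deg = 10 ** -places
    north_south = step_deg * METRES_PER_DEGREE_LAT
    # Lines of longitude converge towards the poles, so the east-west step shrinks.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns_m, ew_m = decimal_place_resolution_m(4, latitude_deg=52.0)
```

    At four decimal places the north-south step comes out at roughly 11 metres, and the east-west step is smaller still at UK latitudes, so the displayed digits suggest far more precision than a result several miles off can justify.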

  49. anym says

    As Psuedonym mentioned above, the issue is one of the police using a low-resolution mapping service for a purpose it clearly wasn’t intended for. They have the authority to ask the actual ISPs involved who a given IP address was allocated to at a given time, and get their address (or at least the address of the nearest cell tower). The fact that they are using easily available (and possibly free) services which outright state their lack of accuracy and up-to-dateness is a sign of laziness, incompetence or gross negligence (or one or more of the above), and not on the part of the company providing the database.

    If the rest of maxmind’s actual customers are happy with ‘probably accurate to the nearest country’ (which is hardly useless, as it can give you stuff like a good guess at the currency to display, for example), then by breaking such a service maxmind aren’t doing themselves or their customers any favors, are they?
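
    As a sketch of the kind of use where country-level accuracy is plenty, the snippet below picks a display currency from a country code. The table and the function name are hypothetical and deliberately tiny; it assumes the lookup hands back an ISO 3166-1 alpha-2 country code, or nothing at all when the country is unknown.

```python
from typing import Optional

# Tiny illustrative table; a real application would use a complete mapping.
CURRENCY_BY_COUNTRY = {"US": "USD", "GB": "GBP", "DE": "EUR", "JP": "JPY"}


def guess_currency(country_code: Optional[str], fallback: str = "USD") -> str:
    """Pick a display currency from a country-level guess, with a harmless fallback."""
    if country_code is None:
        # Unknown location: a wrong default here costs nothing worse than a
        # price shown in the wrong currency.
        return fallback
    return CURRENCY_BY_COUNTRY.get(country_code.upper(), fallback)
```

    A wrong guess here is cheap and easily corrected by the user, which is exactly why a coarse answer is acceptable for this purpose and not for deciding whose door to knock on.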

  50. Amphiox says

    Well, it seems to me that this is most likely an example of a deficiency that arises as a result of historical contingency. It was either an error that occurred in an early iteration of the software, or a deliberately designed feature suitable for an initially intended range of applications, which now produces a problem when the software is applied to a new application not initially intended.

  51. Marshall says

    It’s not really a question of bad vs good programming in this case, it’s whether the solution is practical. Many programmers do indeed know what the *correct* way to implement a solution is, but you have to weigh the costs against the benefits.

    Changing the default location may be a 1-hour job for a junior developer, and costs $50 to fix.

    Changing the underlying mechanism by which the technology returns bad locations may cause hundreds of hours of labor, not to mention it may break the products of customers who use your technology, which of course will require further debugging, and may even break quality-of-service contracts.

    So now you have lawyers, thousands of hours of labor (and the billing time that goes with that), headaches, and a huge backlog of customers with new problems–all to do it the “right” way, as opposed to a slightly sub-optimal way which costs $50 and one hour. If I were running a business faced with these two options, this would be a no-brainer.

  52. newbery says

    So, you’re complaining that this demo page doesn’t display all the data that can be queried from their various products? I’m not sure I understand your point.

    Well if you read a complaint of inaccuracy as a complaint of incompleteness, you’re not likely to understand the point being made.

    In this example, you assert that the postal code was incorrect but there was “no indication that it may be inaccurate”. The “indication” you are looking for (the “confidence” value) appears to be available from one of their enhanced products. So… again, this demo page appears to only demonstrate their more basic product and yet somehow the results are an indictment of their entire product line. This is an odd position to take.

    I couldn’t give a fig for all the technical discussion; as a potential end-user, this would not inspire me to pay good money for the service being offered. And it definitely shouldn’t, IMO, be being used by law enforcement bodies. As far as I can tell from the demo—which is, after all, supposed to persuade me that the product is worth paying for—it’s a heap of shite.

    There is the crux of the matter. You are not the target market. This product is NOT being sold to “end-users” but to developers that build tools on top of this data. Presumably, good developers would try to understand the limitations and proper usage of this data before they pay for it.

  53. says

    newbery #55:

    So… again, this demo page appears to only demonstrate their more basic product and yet somehow the results are an indictment of their entire product line. This is an odd position to take.

    The demo includes postal district data. This implies that it should be possible to display the correct postal district, yet the district displayed is incorrect. The demo includes lat and long data which implies that the result could be accurate to as little as 11 metres, yet places me three miles away from my actual location. The demo, therefore, directly implies either that they cannot produce the degree of accuracy their own chart implies is possible, or that they are trying to imply an impossible degree of accuracy. Either way, that makes it a terrible demo. And if they can’t produce a good demo, why on Earth should anyone assume that the rest of their work is any better?

    If a grocery shop’s window-display was full of rotten fruit, would you even bother to walk inside?

  54. rietpluim says

    Heh. Lately I found my own cell phone number on the Internet. It appears I have got the latest iPhone.

  55. newbery says

    The demo includes postal district data. This implies…

    Let’s just stop there shall we? A real professional in this field would not base their purchase decision on what the demo “implies” when the seller’s website already provides very detailed specifications about what they are selling. Even their basic product does provide real value for many real world use cases. If all you see is rotten fruit, perhaps the problem may be with your eyesight?

  56. says

    newbery #58:

    Let’s just stop there shall we?

    No, let’s not. If a demo of a product which claims to identify something includes specific parameters, then it is not at all unreasonable to assume that the identification produced is expected, by the demonstrator, to fall within, or reasonably close to, those parameters. That’s what a demo is for; to show the performance which may be expected from the product. This product may well be reasonably good at identifying what town a person is in, but if that’s the limit of its accuracy, then a statement to that effect should be included in the demo. It isn’t. Instead, they include a purported level of accuracy which is not met. Which makes it a shitty demonstration.

    Even their basic product does provide real value for many real world use cases.

    It may well do, if all the user is interested in is a very rough guide to what town someone lives in or near. Which has nothing to do with the fact that the demonstration, by including much finer-tuned location parameters which it then fails spectacularly to meet, is seriously flawed as an advertising tool.

  57. newbery says

    Daz #59:

    That’s what a demo is for; to show the performance which may be expected from the product.

    Again, I’m bewildered why this has got you all bunched up. A product demo is usually just to show that something “works” for some arbitrary definition of works… sometimes this is done by just posting a video of the product in action. Whether this particular demo fails to “perform” this task largely depends on your expectations.

    You claim to feel misled by the product advertising. I’m guessing you found their demo via the link on this page, https://www.maxmind.com/en/geoip2-precision-city-service, which includes a link to their accuracy page (in the asterisk on the postal code). I suppose an argument can be made that the link could have been more prominent. Shrug. It’s there at least. And the same information is sprinkled about in several other places too. I don’t feel misled.

    I posted the link to the accuracy page before up-thread but, for what it’s worth, here it is again,
    https://www.maxmind.com/en/geoip2-city-database-accuracy?country=United+States&resolution=postal

    In any case, I think my contributions to this thread seem well past the point of diminishing returns. Back to lurker mode…

  58. ianjs says

    I have to agree with newbery; this is not Maxmind’s problem and it is not “bad programming” (at least, not by Maxmind).

    The bad programming is entirely on the part of the agencies who decide to use this information as though it was pinpoint accurate. Maxmind never claimed it was anything like that.

    Anyone with a basic knowledge of IP addresses would realise that they are not traceable to a specific street address. The best you could do is to trace it to the ISP (maybe), which is obviously not the customer’s address.

    It beggars belief that an agency would use this information in a decision to kick down someone’s door. Did they even test it by, say, using their own IP address?

  59. ianjs says

    Ok, when I say, “not Maxmind’s problem” I mean technically of course. Obviously making some people’s life a misery should be dealt with, but it sounds like they have tried to deal with this in a sensible way once they found out about it.