jordoh · on Feb 20, 2024

Code delivery happens in desktop apps too: when you download the binary from evilsite.com, or when you receive an auto-update, they can give you a different binary than the one security professionals reviewed. That's assuming the professionals even reviewed the binary, and not the source evilsite.com claimed it was built from.

It would also be difficult for those professionals to detect IP- or IP-range-specific backdoors (with as much obfuscation as you like: only send on Tuesdays, encrypted using a string constant elsewhere in the binary) in App Store-delivered binaries, which are harder to vary per downloader.

Some web apps ([Cryptee](https://crypt.ee/threat-model) is a notable example) address this with a "trust on first use" approach that requires approval for any change to the (web) code, but that's in the same realm as a desktop app: you've trusted it on the first download, and you trust it to actually follow through on that promise.
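The trust-on-first-use idea can be sketched in a few lines of Python. This is a minimal illustration, not Cryptee's implementation: it assumes the client pins a SHA-256 hash of the code it received on first use and refuses to run a different version until the user approves it. `TofuPinStore` and `check_bundle` are hypothetical names.

```python
import hashlib
from typing import Dict, Optional


class TofuPinStore:
    """Hypothetical trust-on-first-use store: pins the first code hash seen per app."""

    def __init__(self) -> None:
        self._pins: Dict[str, str] = {}

    def check_bundle(self, app_id: str, bundle: bytes, approve_change: bool = False) -> bool:
        """Return True if the bundle may run, False if a code change needs approval."""
        digest = hashlib.sha256(bundle).hexdigest()
        pinned: Optional[str] = self._pins.get(app_id)
        if pinned is None:
            # First use: trust whatever arrived and pin its hash.
            self._pins[app_id] = digest
            return True
        if digest == pinned:
            # Unchanged code runs without prompting.
            return True
        if approve_change:
            # The user explicitly approved the new code, so re-pin it.
            self._pins[app_id] = digest
            return True
        # Changed code is blocked until the user approves it.
        return False


store = TofuPinStore()
store.check_bundle("notes-app", b"console.log('v1')")  # first use: pinned and allowed
store.check_bundle("notes-app", b"console.log('v2')")  # changed: blocked without approval
```

The first call trusts whatever arrives, which is the weakness described above: the pin only protects you if that first download was honest.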

jordoh · on May 5, 2021

It should be noted that this extension strips ETag headers from all responses by default, which can break sites in surprising ways. As a developer of a web application that relies on ETag headers for vital functionality, I see not-infrequent support inquiries from ClearURLs users who don't understand the technical ramifications of this feature - nor do they understand why so many of the websites they use are so broken.

Khalos · on May 5, 2021

Have you considered using something other than ETag for your use case? It seems like ETag has been compromised by trackers, and unfortunately this is why we can't have nice things.

jordoh · on May 5, 2021

We use the ETag header to make use of browser caching, not just for performance, but as a component of offline support. Yes, we could add an additional header carrying the same information to work around this extension for our application-specific functionality, but that would still leave the browser-based features broken.
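For readers unfamiliar with the mechanism, here is a minimal Python sketch of ETag revalidation: the server sends an `ETag`, the browser echoes it back in `If-None-Match`, and the server answers `304 Not Modified` with an empty body when nothing has changed. `make_etag` and `handle_get` are hypothetical helpers, not this application's code; deriving the tag from a content hash and sending `Cache-Control: no-cache` (revalidate before reuse) are assumed choices.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    # Strong validator derived from the content (an assumption: any value
    # that changes whenever the content changes would work).
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def _opaque(tag: str) -> str:
    # If-None-Match uses weak comparison, so ignore a leading W/ prefix.
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def handle_get(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a conditional GET."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "no-cache"}
    if if_none_match is not None:
        candidates = [_opaque(t) for t in if_none_match.split(",")]
        if "*" in candidates or _opaque(etag) in candidates:
            # The client's cached copy is still current: skip the body.
            return 304, headers, b""
    return 200, headers, body
```

If an extension strips the `ETag` response header, the browser never has this validator to send back (unless another one, such as `Last-Modified`, is present), so every request falls back to a full `200` download.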

While the ETag header may have been usable for cross site tracking at some point in the past [1], browser caches are isolated per-origin in Firefox, so there's no longer a cross-site tracking concern. That leaves it usable to identify you across sessions only in a first-party context, just like cookies, IP addresses (to a lesser extent), the Last-Modified header, and any number of other identification techniques ClearURLs doesn't block.

[1] I'd be interested to see any credible evidence of ETag headers being used for tracking in the wild - I've only seen theorizing that it _could_ be used as such, prior to cache isolation being implemented in Firefox and Chrome.
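Why cache isolation defeats cross-site use can be shown with a toy Python model. It assumes, as a simplification of what browsers actually do, that the cache key is the top-level site plus the resource URL (real browsers compute "site" differently than the scheme-plus-host shortcut used here). A resource embedded on two different sites gets two separate cache entries, so a validator stored while visiting one site is never sent while visiting another. `PartitionedCache` and `site_of` are illustrative names, not a browser API.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlsplit

CacheKey = Tuple[str, str]


def site_of(url: str) -> str:
    # Simplification: treat scheme + host as the "site".
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.hostname}"


class PartitionedCache:
    """Toy HTTP cache keyed by (top-level site, resource URL)."""

    def __init__(self) -> None:
        self._etags: Dict[CacheKey, str] = {}

    def store(self, top_level_url: str, resource_url: str, etag: str) -> None:
        self._etags[(site_of(top_level_url), resource_url)] = etag

    def validator_for(self, top_level_url: str, resource_url: str) -> Optional[str]:
        # The If-None-Match value the browser would send, if any.
        return self._etags.get((site_of(top_level_url), resource_url))


cache = PartitionedCache()
pixel = "https://tracker.example/pixel.gif"
cache.store("https://news.example/article", pixel, '"id-123"')
cache.validator_for("https://news.example/other", pixel)  # same top-level site: tag is reused
cache.validator_for("https://shop.example/", pixel)       # different site: nothing to send
```

Only the same-site case still gets the stored tag back, which is the first-party identification that cookies and `Last-Modified` also allow.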

Khalos · on May 5, 2021

According to https://en.wikipedia.org/wiki/HTTP_ETag#Tracking_using_ETags:

> ETags can be used to track unique users, as HTTP cookies are increasingly being deleted by privacy-aware users. In July 2011, Ashkan Soltani and a team of researchers at UC Berkeley reported that a number of websites, including Hulu, were using ETags for tracking purposes. Hulu and KISSmetrics have both ceased "respawning" as of 29 July 2011, as KISSmetrics and over 20 of its clients are facing a class-action lawsuit over the use of "undeletable" tracking cookies partially involving the use of ETags.

It appears that there have been at least a few cases of this in the wild.

The main distinction (at least to me) between ETag and the other tracking methods you mention is that ETag doesn't appear to be easily clearable by a user (although that sounds like something browsers should fix if they haven't already).

It's unfortunate that features like this end up getting co-opted by trackers, which leads to breaking legitimate use cases like your app in the process.

jordoh · on May 5, 2021

That's certainly credible evidence of past use that I overlooked, though it remains unlikely to be useful with the advent of per-origin cache isolation.

The Last-Modified header can be used in exactly the same way, and isn't blocked by this extension, which harkens back to my original point: this is an extension that appears to see significant use by non-technical users, yet it breaks a browser feature by default. There are plenty of other methods of identifying a unique user that it doesn't prevent, so this seems like a pretty unexpected feature users should take note of.

jordoh · on Sept 10, 2020

`_` being assigned the result of the last evaluated statement is an IRB feature [1], not a Ruby language feature, hence the note that it is not a universal solution.

[1]: https://github.com/ruby/irb/blob/master/lib/irb/context.rb#L...

stouset · on Sept 11, 2020

So?

Why does this need to be in the language when both major REPLs already support it? Why must this be yet another alternative way to accomplish the same thing as the way that’s existed since the first days of the language?

There is no argument in favor of this addition. It doesn’t allow anything new, it provides a pointless alternative to the current way of doing things, and the one situation where it’s useful—REPLs—there’s already a feature that renders it completely unnecessary.

jordoh · on Sept 11, 2020

For better or worse, providing multiple ways to do the same thing is part of Ruby's philosophy. [1]

[1]: https://www.artima.com/intv/ruby.html

jordoh · on Aug 10, 2019

It's used in Rails to reduce the likelihood of unsanitized user input ending up in SQL fragments [1]. I think it would see a lot more use if more input sources were marked as tainted [2].

[1] https://api.rubyonrails.org/classes/ActiveRecord/Base.html#c...

[2] http://www.jkfill.com/2012/03/10/preventing-mass-assignment-...
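The taint idea can be sketched in Python, which has no built-in taint mark (Ruby's `taint`/`tainted?` is what the comment refers to). This is a toy under stated assumptions, not how Rails implements its check: a hypothetical `TaintedStr` class marks request input, the mark survives `+` concatenation (a real taint-tracking runtime propagates it through far more operations), and a hypothetical `where_clause` helper refuses tainted fragments until they have been quoted.

```python
class TaintedStr(str):
    """A str marked as coming from an untrusted source; the mark survives `+`."""

    def __add__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        return TaintedStr(str(self) + other)

    def __radd__(self, other):
        # Called for `"literal" + tainted`, so the result stays tainted.
        if not isinstance(other, str):
            return NotImplemented
        return TaintedStr(other + str(self))


def from_request(value: str) -> TaintedStr:
    # Hypothetical input boundary: everything from the user starts tainted.
    return TaintedStr(value)


def quote_literal(value: str) -> str:
    # Escape single quotes; the result is a plain str, i.e. untainted.
    return "'" + str(value).replace("'", "''") + "'"


def where_clause(fragment: str) -> str:
    # Refuse to splice tainted text straight into SQL.
    if isinstance(fragment, TaintedStr):
        raise ValueError("tainted input in SQL fragment; quote or parameterize it")
    return "WHERE " + fragment


name = from_request("O'Brien")
safe = where_clause("name = " + quote_literal(name))  # allowed: the value was quoted first
```

Passing `"name = " + name` to `where_clause` directly raises `ValueError`, because concatenation keeps the taint mark.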

jordoh · on July 18, 2019

I tried running the source images through FineReader Online, but the images with handwriting resulted in "was not processed: the recognized document contains errors". The website image worked, but was missing a few elements, like the other headings on the line with "Minimalist editor".

jordoh · on July 18, 2019

Running tesseract (4.0.0 using the LSTM engine) on the same images leaves a lot to be desired for handwriting, but does well on the (non-handwriting) website image (the source images are linked in the "OCR Image Processing Results" section).

ocrcustomserver · on July 20, 2019

From the Tesseract FAQ:

"Can I use Tesseract for handwriting recognition?

You can, but it won’t work very well, as Tesseract is designed for printed text. Look for projects focused on handwriting recognition."

https://github.com/tesseract-ocr/tesseract/wiki/FAQ#can-i-us...

jordoh · on July 18, 2019

Have a specific image you'd be interested in seeing tested? The article only contains a few examples that could be freely used, but images with sparse random text (e.g. [1]) do tend to have good results across all the services.

[1] https://www.gettyimages.com/detail/news-photo/ken-griffey-jr...

jordoh · on July 18, 2019

That's a good point not explained in the article: there are a huge number of use cases for OCR. In this case, the use is extracting words that can be used in full-text search, so structural extraction isn't a key criterion.

Edit: and now it's hopefully clarified in the article itself. :)

ocrcustomserver · on July 19, 2019

  In this case, the use is extracting words that can be used in full-text search, so structural extraction isn't a key criterion.

In case someone wants to know more, the former is known as "full page OCR" and the latter as "data capture"/"document processing" (or IDP, intelligent document processing).

Full page OCR for machine printed text is considered a solved problem (but not for handwritten text). Data capture is hard to do and involves extracting specific fields from documents.

The first big cloud company to move into data capture territory was Amazon with AWS Textract (calling it OCR++). There are also Document Understanding AI (Google) and Azure Form Recognizer in beta, as mentioned by others in this thread.

The big 3 RPA companies (UiPath, Automation Anywhere, Blue Prism) have also gone into data capture (calling it cognitive or intelligent RPA).

ABBYY (with FlexiCapture) and Kofax (who recently acquired Nuance's imaging division, the 2nd most popular OCR engine after ABBYY's) are the traditional IDP players.

jordoh · on March 25, 2013

This seems like a pretty clear-cut case of assuming that there is some intentional malice or favoritism in actions that are the result of an automated system.

- Google adds some terms like "assisted opening knife" and "assist folding knife" so they are recognized as prohibited knife ads. Adding these terms could very well have been automated based on the terms having a strong association with other terms found alongside prohibited items.

- Knife Depot's account suddenly contains X% disallowed knife ads based on the new terms, where X is a relatively large percentage. Account automatically disabled.

- Amazon and Walmart also have X% disallowed knife ads, but X is an extremely small percent of their overall number of items. Accounts remain active.
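The purely percentage-based rule described in the list above can be sketched in Python. The 5% threshold and the item counts are hypothetical (the real X isn't known); the counts reuse the 50-of-100 versus 50-of-1,000,000 illustration from later in this thread. `Account` and `should_disable` are illustrative names, not an AdWords API.

```python
from dataclasses import dataclass


@dataclass
class Account:
    name: str
    total_items: int
    disallowed_items: int

    @property
    def disallowed_ratio(self) -> float:
        return self.disallowed_items / self.total_items if self.total_items else 0.0


def should_disable(account: Account, threshold: float) -> bool:
    # The same rule for every account: disable once the share of
    # disallowed items reaches the threshold.
    return account.disallowed_ratio >= threshold


THRESHOLD = 0.05  # hypothetical X
small = Account("small-retailer", total_items=100, disallowed_items=50)
large = Account("large-retailer", total_items=1_000_000, disallowed_items=50)
decisions = {a.name: should_disable(a, THRESHOLD) for a in (small, large)}
```

Both accounts carry the same 50 disallowed items, but only the small one crosses the threshold, so an identical rule produces very different outcomes.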

Fortunately, AdWords is one of the few Google properties where you can actually get a human on the phone and have them intervene with the automated results (though it can certainly take a lot of back and forth, in my personal experience).

In a more general sense, this is something you constantly run into if your automated systems perform any action that a user could view as punitive. I've yet to see a site that was open about which actions are automated (likely because they don't want to make it too easy to automate getting around the automated rules), but it does seem like a reasonable amount of explanation of the system could defuse these assumptions of persecution.

ChuckMcM · on March 25, 2013

I don't think it works that way.

When you say "Amazon and Walmart also have X% disallowed knife ads, but X is an extremely small percent of their overall number of items. Accounts remain active." the implication is that the ratio of banned ads to total ad spend is what decides. That would suggest that the knife guys could start advertising flowers, buy 90% of their AdWords for their buddy the florist and only 10% for their knives, and be okay. I don't think it would work out.

Google is just acting poorly here; why doesn't matter. As the original poster points out, 'assisted open' knives are legal for sale in the US (even in California, which is kind of picky about such things). They aren't explicitly mentioned in the terms of service, so either they are prohibited or they aren't. If they are, they are prohibited for everybody, and if they aren't, they aren't prohibited for anybody.

I'm sure if .01% of WalMart's ad spend was for Canadian pharmaceuticals that they would be shut down in a heartbeat (because the Government really came down hard on Google for that).

It's common knowledge that one of the ways larger, successful sellers on eBay harass smaller sellers is by reporting them for various rule violations. When a "Power Seller" has a dedicated account manager inside eBay, they don't have to put up with random reports the way the small guy does, who gets whoever happens to be on the problem-reporting staff. That asymmetry is exploited to blunt small sellers' effectiveness. I have no idea if this goes on among AdWords competitors, but some of the lawsuits I've read from various people (especially over contested keywords) suggest the advertisers (or their agencies) aren't above such tactics.

jordoh · on March 25, 2013

Do you have evidence to support the assertion that it isn't purely percentage based? Perhaps there is some account out there that has the same ratio of prohibited items as Knife Depot but has not been banned?

Whether AdWords should allow "assisted open knife" ads is beside the point. Correct or not, AdWords is counting those as prohibited, and the allegation of the OP is that AdWords' banning criteria are applied differently to large accounts. I'm suggesting that the criteria are applied _exactly the same_, based on a percentage.

What I'm describing is also how Google Product Listing Ads get moderated: if X% of your items are in violation, your account is automatically shut down, pending appeal. Google sends a warning when you reach (X - Y)%, and another email when you reach X% and your account is shut down. I administrate 30,000+ PLA accounts and deal with this on a daily basis. This banning process is completely automated.
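The two-stage process described above (a warning at (X - Y)%, shutdown at X%) can be sketched in Python. The real X and Y aren't stated, so the values used here are hypothetical, and `moderate` is an illustrative name rather than any Google API.

```python
from enum import Enum


class Status(Enum):
    OK = "ok"
    WARNED = "warning email sent"
    SHUT_DOWN = "shut down pending appeal"


def moderate(disallowed: int, total: int, x_pct: float, y_pct: float) -> Status:
    """Classify an account: shutdown at X%, warning from (X - Y)%."""
    if total == 0:
        return Status.OK
    pct = 100.0 * disallowed / total
    if pct >= x_pct:
        return Status.SHUT_DOWN
    if pct >= x_pct - y_pct:
        return Status.WARNED
    return Status.OK


# Hypothetical thresholds: X = 10%, Y = 3%, so the warning starts at 7%.
statuses = [moderate(bad, 100, 10, 3) for bad in (5, 7, 10)]
```

With these made-up thresholds, an account with 5 of 100 items flagged is left alone, 7 of 100 triggers the warning, and 10 of 100 is shut down.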

ChuckMcM · on March 25, 2013

"Do you have evidence to support the assertion that it isn't purely percentage based?"

No. I reasoned to it by flipping it over. If the banning is purely percentage based then a viable business would be to create an entity that laundered AdWords spend. This is how it would work.

Let's call our company "Ads-R-Us." It contracts with Knife Depot and 1-800-Flowers. It charges Knife Depot a 'premium' to get its 'ban-inducing ads' placed, and it gives 1-800-Flowers a discount because its ads are "clean." It structures the premium and the discount so that there is a bit of cream in the middle for it to keep. Then our entity goes off and buys ad insertions at various bid points. Knife Depot can advertise forever, since there is no risk of it being banned because Ads-R-Us is keeping the percentages in check.

I looked around for these guys and I don't see them. (And as a web search engine they aren't talking to me either.) So either they don't exist (which I reason is unlikely given how much thought people put into 'scamming' the advertising business on internet ads) or such a scheme wouldn't work. And if it really were strictly percentage based, it would work.

From that my thinking was that it might not be purely percentage based. No one is picking up the Canadian Pharma ads, and they have a LOT of money to throw around.
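The arithmetic behind the hypothetical "Ads-R-Us" scheme can be sketched in Python: pooling a client whose spend is all on flagged ads with a "clean" client dilutes the ratio a percentage rule would see. The 10/90 split mirrors the knife/florist example earlier in the thread; everything else, including `pooled_ratio`, is illustrative.

```python
from typing import List, Tuple


def pooled_ratio(clients: List[Tuple[float, float]]) -> float:
    """Share of flagged ad spend when several clients share one account.

    Each entry is (flagged_spend, total_spend) for one hypothetical client.
    """
    flagged = sum(f for f, _ in clients)
    total = sum(t for _, t in clients)
    return flagged / total if total else 0.0


knife_seller = (10, 10)  # all of this client's spend is on flagged ads
florist = (0, 90)        # "clean" spend bought on the florist's behalf

print(pooled_ratio([knife_seller]))           # 1.0
print(pooled_ratio([knife_seller, florist]))  # 0.1
```

Under a purely percentage-based rule with any threshold above 10%, the pooled account passes while the knife seller alone would not; a requirement that resellers use one account per client would rule this pooling out.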

jordoh · on March 25, 2013

There could be such a service flying under the radar, but third-party AdWords resellers are required by the TOS to only use one account per client.

ISL · on March 25, 2013

"- Amazon and Walmart also have X% disallowed knife ads, but X is an extremely small percent of their overall number of items. Accounts remain active."

And the little guy, selling the same product in competition with the big guy, gets killed.

It's often better to have a level playing field.

jordoh · on March 25, 2013

How do you level the playing field in this case? One account has 50% disallowed items (50 of 100) and another account has 0.005% disallowed items (50 of 1,000,000). You could bias _against_ large accounts by banning when they reach a fixed number of disallowed items - but then you let little guys (or any big guy that makes lots of small accounts) skate by under some arbitrary limit.
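The trade-off above can be made concrete with a small Python sketch comparing the two policies on the 50-of-100 and 50-of-1,000,000 accounts. The 5% share limit and the 10-item count limit are hypothetical values chosen for illustration.

```python
def banned_by_share(disallowed: int, total: int, max_share: float) -> bool:
    # Percentage rule: the limit scales with the size of the account.
    return total > 0 and disallowed / total >= max_share


def banned_by_count(disallowed: int, max_count: int) -> bool:
    # Fixed-count rule: the same absolute limit for every account.
    return disallowed >= max_count


MAX_SHARE = 0.05  # hypothetical percentage limit
MAX_COUNT = 10    # hypothetical fixed-count limit

accounts = {"small": (50, 100), "large": (50, 1_000_000)}
for name, (bad, total) in accounts.items():
    print(name, f"{bad / total:.3%}",
          banned_by_share(bad, total, MAX_SHARE),
          banned_by_count(bad, MAX_COUNT))

# The same 50 disallowed items split across ten small accounts of 5 each.
split_banned = [banned_by_count(5, MAX_COUNT) for _ in range(10)]
```

The share rule flags only the small account, the count rule flags both, and splitting the same 50 items into ten accounts of 5 escapes the count rule entirely.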

ISL · on March 26, 2013

Agreed, unless the fixed number is zero.

You ban anyone selling banned items. I think vendors would respond quickly by self-censoring, especially the big guys.

Assisted-opening knives are a tiny fraction of Amazon's sales. If assisted-opening knife advertisements cut off Amazon's entire account, as has happened with Knife Depot, I'm pretty sure that Amazon would de-list assisted-opening knives.

A zero-tolerance policy, coupled with ample forgiveness for infractions, would be fair. Alternatively, you let anyone advertise anything.

Anything in-between turns Google into a kingmaker.

(NB - the very notion of censoring listings at all carries its own troubles, not at issue here)

ypeterholmes · on March 25, 2013

So we wait to see how it plays out...

jordoh · on Feb 1, 2013

The writing has been on the wall for a while about the end of XNA development by MS: none of the development tools for XNA have been updated to work in Visual Studio 2012. Fortunately, there's an open-source replacement that runs on a much broader range of platforms: https://github.com/mono/MonoGame