eZDebug / Firefox Add-on no longer supported

Tags : ezdebug , ezpublish
Maxime THOMAS on 24 of Jan 2011 - 08am

From now on, the eZDebug extension for Firefox 3.0 and higher is no longer supported.

Add eZPublish Documentation to your Chrome search engines

Tags : chrome , tip , ezpublish
Administrator User on 23 of Jan 2011 - 11am

A quick tip for those using Google Chrome.

  • Right click on the address bar.
  • Choose Modify the search engines...
  • Set things like this :
[Screenshot : eZPublish Documentation addon for Google Chrome]

How does it work ? In your address bar, type ez site.ini and you will get the search results page from doc.ez.no.

Some security points you've got to check

Tags : security , xss , ezpublish
Maxime THOMAS on 23 of Jan 2011 - 10am

Here are some points you should definitely look at before announcing to everybody that your website is going live. This article is not exhaustive, and if you think it lacks something important, just share it in a comment.

How can I mess up your website ?

This is the standard question a potential attacker asks. "How can I hijack your website to make it do something else ?"

Some simple avenues for doing that :

  • Use login forms to log in as someone else
  • Use standard HTTP methods (GET and POST) to access data and simulate normal operations
  • Use content to generate bad things : bugs or bad content
  • Use scripts / robots to generate bad content or degrade your performance
  • Rely on the default configuration of the underlying software having been left in place

These avenues are handled more or less well by software. In the next part, I will take one example, the eZPublish CMS, and show how it protects each point.

An example : eZPublish

The editor user

A common mistake is to forget to remove the default users (or to change their passwords). For example, try to log in with editor/editor on a website running eZPublish. It is also very common to set passwords that are the name of the project or common values ("test", "ezpublish" and so on). The best option is to remove these users and generate real passwords with a dedicated tool.
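As an illustration of what a "dedicated tool" does, here is a minimal sketch in Python using the standard secrets module. The function name generate_password and the 12-character minimum are my own assumptions, not something eZPublish provides.

```python
import secrets
import string


def generate_password(length: int = 20) -> str:
    """Return a random password drawn from letters, digits and punctuation."""
    if length < 12:  # arbitrary floor chosen for this sketch
        raise ValueError("use at least 12 characters")
    alphabet = string.ascii_letters + string.digits + string.punctuation
    # secrets.choice uses a cryptographically secure random source
    return "".join(secrets.choice(alphabet) for _ in range(length))


print(generate_password())
```

A password manager gives you the same thing with less effort; the point is only that the value is random rather than derived from the project name.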

The user view

In eZPublish, URLs give direct access to the modules and features of the website. For example, if features are not explicitly disabled, we can easily reach the following URLs : /user/login, /user/register and /user/forgotpassword. It means that by typing the correct /module/action URI, I can access features you don't want to be shown on your website. The best choice is to use rights and policies to disable them and to check the PolicyOmitList setting.

The Ajax based feature

Having an Ajax feature does not mean that curious people cannot see how things work. Most modern browsers have an XHR sniffer (see Firebug for Firefox and the Inspector for Google Chrome) that shows which URL is called and what the response is. If you use Ajax only to fetch rendered HTML, that is fine, but if you use JSON, it can be tricky : what happens if I change the parameters of the call and analyze what it returns ?
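One way to limit what such tampering can reveal is to validate every parameter on the server against an allow-list before using it. The sketch below is in Python and is not eZPublish code; the parameter names (sort, limit) and the constants are hypothetical.

```python
ALLOWED_SORT_FIELDS = {"published", "name"}  # hypothetical allow-list
MAX_LIMIT = 50                               # hypothetical page-size cap


def clean_params(raw: dict) -> dict:
    """Validate untrusted query parameters; raise ValueError on anything unexpected."""
    unknown = set(raw) - {"sort", "limit"}
    if unknown:
        raise ValueError(f"unexpected parameters: {sorted(unknown)}")
    sort = raw.get("sort", "published")
    if sort not in ALLOWED_SORT_FIELDS:
        raise ValueError(f"sort field not allowed: {sort!r}")
    limit = int(raw.get("limit", 10))  # int() raises ValueError on non-numeric text
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError("limit out of range")
    return {"sort": sort, "limit": limit}
```

With this in place, a call with sort=password_hash or limit=100000 is rejected instead of being passed on to the data layer.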

The utility of a captcha

A captcha keeps non-human contributors from contributing. For example, if you write an article and someone creates a robot to automatically post generated comments on it, you may have two problems : first, the amount of fake comments will devalue your article, and second, they can show things you don't want on your website (bad words).

A captcha introduces a random field that is used to check that the person on the other side is a human. Note that some very clever robots can read the characters in the generated picture using character recognition.

I personally recommend the excellent reCaptcha service that is free and efficient.

The mail bomb

Another similar point is the use of the system, eZPublish, to send mail to other people. A tip-a-friend feature without any restriction can be very dangerous. For example, a robot can send tip-a-friend messages with bad words to a list of people.

The safeguard is to ensure that mail is sent via the system only when users are logged in (with a form that uses a captcha), or to limit the number of messages sent.
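Limiting the number of messages can be as simple as a sliding-window counter per sender. Here is a minimal in-memory sketch in Python; the class name SendLimiter and its defaults are assumptions, and a real site would keep the counters in a database or cache rather than in process memory.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SendLimiter:
    """Allow at most max_sends per sender within a sliding window of `window` seconds."""

    def __init__(self, max_sends: int = 5, window: float = 3600.0):
        self.max_sends = max_sends
        self.window = window
        self._history = defaultdict(deque)  # sender key -> timestamps of accepted sends

    def allow(self, sender: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        stamps = self._history[sender]
        # Forget sends that have fallen out of the window
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.max_sends:
            return False
        stamps.append(now)
        return True
```

The sender key can be a user id or an IP address; allow() is called before each mail and the mail is dropped when it returns False.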

The XSS

If you do extra development in eZPublish, just take care to follow the philosophy of this software : store everything the user has typed as is, and process it when rendering it. If you don't process stored data, you may have the following issues :

  • The user has put HTML in the body field of his comment. If you render it directly, extra styling can appear because the user put an h1 tag in the input. Mostly, it's unintended, because people do a lot of copy/paste.
  • The user has put Javascript in the body field of his comment. Rendering it directly can have bad implications : the user gets access to the DOM and potentially to anything that is loaded dynamically. For example, the user typed a window.close() inside the body of the comment. Each time the article is loaded, the window will be closed. Hmmm. Not so good.

To prevent this, you can use the wash operator in eZPublish. It escapes the special characters so that any markup in the value is displayed as plain text instead of being interpreted.
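To make the idea concrete outside of eZPublish templates, here is the same principle sketched in Python with the standard html module. This is an analogy, not the wash operator itself, and the function name escape_for_html is my own.

```python
import html


def escape_for_html(text: str) -> str:
    """Escape HTML special characters so user input renders as plain text."""
    return html.escape(text, quote=True)


comment = "<h1>Hello</h1><script>window.close()</script>"
print(escape_for_html(comment))
# &lt;h1&gt;Hello&lt;/h1&gt;&lt;script&gt;window.close()&lt;/script&gt;
```

The stored comment is left untouched; only the rendered output is escaped, which matches the "store as typed, process on output" rule above.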

The same kind of crafted input can also be used to try to access restricted areas of the website, using SQL injection for example. In eZPublish, this is prevented by going through the eZDB class.
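The general defence against SQL injection is to keep user input out of the SQL text. The sketch below shows this with Python's sqlite3 module and bound parameters; it illustrates the principle only and says nothing about how eZDB is implemented. The table and data are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (login TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('editor', 0), ('admin', 1)")


def find_user(login: str):
    # The ? placeholder sends the value separately from the SQL statement,
    # so quotes inside `login` cannot change the query structure.
    query = "SELECT login, is_admin FROM users WHERE login = ?"
    return conn.execute(query, (login,)).fetchall()


malicious = "x' OR '1'='1"
print(find_user("editor"))   # the one matching row
print(find_user(malicious))  # no rows: the whole string is treated as a login
```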

Conclusion

The best way to secure a website is always to understand how people will try to attack it. This post has listed some known avenues; feel free to add your own.

How to use SolR for a custom implementation

Tags : ezfind , solr , status , ezpublish
Maxime THOMAS on 23 of Jan 2011 - 09am

This is the first post on the subject, but as it deals with the WSE Status extension for eZPublish, it can be interesting to have some feedback on this feature.

So, this post is about the Search component from the eZ Components / Apache Zeta Components (eZC / AZC) stack. In the following parts, we will explain what the need was, how to install and set up SolR, and how the component can be used with our configuration.

As our project mainly relies on eZPublish and eZFind, we won't detail how they work but only how they have been modified to make our extension work.

The aim

Our goal was to provide a powerful search engine to index extra data we put in specific tables in eZPublish. For example, let's say we have the following table :

Example table

Field      Type     Description
id         Integer  Primary key
field one  Text     Text
field two  Integer  Integer

eZFind looked perfect for this, because it is the best search engine for eZPublish. However, after a few moments diving into the code, it appeared that it was not directly usable : its implementation was designed to index only eZ Content Objects, so it would not do the trick.

The best option was finally to use the instance of SolR packed in eZFind and configure it to add our own data.

The solution : a custom SolR schema

This was a bit tricky because of the state of the SolR documentation. There's a Wiki reachable here, but important points are not always documented and you have to dig into the XML files to see how things work.

Here are some references that can be helpful before going further :

eZFind configuration

eZFind comes with two configuration sets :

  • the normal one with only one core / index : your website
  • the shared one with multiple cores / indexes : your website in fre-FR, eng-GB, esp-SP and so on

The normal set is located in extension/ezfind/java/solr, while the multicore one is located in extension/ezfind/java/solr.multicore.

A set is made of two directories :

  • conf : holds the configuration
  • data : holds the binary data

You may also have other directories that are helpful for specific filters or external features, but SolR needs at least those two directories.

By default, eZFind uses only the single-core configuration set, so we must enable the multicore one.

In extension/ezfind/settings/ezfind.ini :

MultiCore=enabled
DefaultCore=eng-GB
LanguagesCoresMap[eng-GB]=eng-GB
LanguagesCoresMap[fre-FR]=fre-FR
LanguagesCoresMap[nor-NO]=nor-NO
LanguagesCoresMap[example]=example

These settings allow you to map each language to a SolR core.

In extension/ezfind/settings/solr.ini :

Shards[]
Shards[eng-GB]=http://localhost:8983/solr/eng-GB
Shards[fre-FR]=http://localhost:8983/solr/fre-FR
Shards[nor-NO]=http://localhost:8983/solr/nor-NO
Shards[example]=http://localhost:8983/solr/example
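Read together, the two files define a chain : language, then core, then shard URL. The Python sketch below mirrors the settings above as plain dictionaries to show how I understand the lookup. The function name shard_url_for is hypothetical, and falling back to DefaultCore for an unmapped language is my assumption about the behaviour.

```python
# Values copied from the ezfind.ini and solr.ini excerpts above
DEFAULT_CORE = "eng-GB"
LANGUAGES_CORES_MAP = {
    "eng-GB": "eng-GB",
    "fre-FR": "fre-FR",
    "nor-NO": "nor-NO",
    "example": "example",
}
SHARDS = {
    core: f"http://localhost:8983/solr/{core}"
    for core in ("eng-GB", "fre-FR", "nor-NO", "example")
}


def shard_url_for(language: str) -> str:
    """Resolve a language code to the URL of the core that indexes it."""
    # Assumption: unmapped languages fall back to the default core
    core = LANGUAGES_CORES_MAP.get(language, DEFAULT_CORE)
    return SHARDS[core]


print(shard_url_for("fre-FR"))
```

Note that the example entry is not a language at all : it simply reuses the same mechanism to expose our custom core.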

SolR Configuration

SolR has another cool feature called sharding that allows you to run one query against several cores / indexes. It's useful when you have several heterogeneous indexes : you can look up one term and get results from all the dictionaries at once. In eZFind, it's used to return translated results : you search for banana and you get results for banana in English and banane in French.
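At the HTTP level, a sharded query is an ordinary select request with an extra shards parameter listing the cores to consult, written as host:port/path without the scheme. Here is a small Python sketch that builds such a URL from the shard URLs used in this post; the function name sharded_query_url is my own, and the sketch only builds the URL without sending it.

```python
from urllib.parse import urlencode, urlsplit


def sharded_query_url(entry_core_url: str, shard_urls: list, query: str) -> str:
    """Build a select URL that asks one core to fan the query out to several shards."""
    # The shards parameter expects host:port/path entries, comma separated
    shards = ",".join(urlsplit(u).netloc + urlsplit(u).path for u in shard_urls)
    params = urlencode({"q": query, "shards": shards}, safe="/:,")
    return f"{entry_core_url}/select?{params}"


url = sharded_query_url(
    "http://localhost:8983/solr/eng-GB",
    ["http://localhost:8983/solr/eng-GB", "http://localhost:8983/solr/fre-FR"],
    "banana",
)
print(url)
```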

In SolR, there are three XML files to set up to have a full configuration :

  • solr.xml
  • solrconfig.xml
  • schema.xml

solr.xml

This file is simple : it declares the cores of the SolR system (sorry about this one :) ). A core is an index. By way of comparison, a core is a reference work, like a dictionary. You can have several dictionaries : English, French, Spanish, Portuguese and so on. But you can also have several application domains, like a dictionary about medicine, about computer science or whatever.

Our only modification to this file was the addition of a specific core :

<core name="example" instanceDir="example" />

The attributes are defined like this :

  • name : name of your index / core, will be available at http://localhost:8983/solr/<name>
  • instanceDir : directory that contains all conf for this index / core (conf and data)

Then, copy the directory extension/ezfind/java/solr to extension/ezfind/java/solr.multicore/example.
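To check that every declared core has a matching directory, you can read the core list back out of solr.xml. The Python sketch below parses a cut-down file; the wrapping solr and cores elements and the eng-GB entry are my assumptions about the surrounding file, and I assume instanceDir is resolved relative to the multicore directory.

```python
import xml.etree.ElementTree as ET

# Cut-down solr.xml; only the "example" core line comes from this post
SOLR_XML = """
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="eng-GB" instanceDir="eng-GB" />
    <core name="example" instanceDir="example" />
  </cores>
</solr>
"""


def list_cores(solr_xml: str, multicore_root: str = "extension/ezfind/java/solr.multicore") -> dict:
    """Map each declared core name to the directory expected to hold its conf and data."""
    root = ET.fromstring(solr_xml.strip())
    return {
        core.get("name"): f"{multicore_root}/{core.get('instanceDir')}"
        for core in root.iter("core")
    }


for name, path in list_cores(SOLR_XML).items():
    print(name, "->", path)
```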

solrconfig.xml

We left this file at its defaults, but you may want to review the language-specific settings it contains (search for English, for example).

schema.xml

This is the main file, which maps your data to fields inside SolR. But first, some explanation about the SolR concepts.

SolR can have several indexes, as we have just seen with the cores. This is useful because you can keep the indexes separate and still query several of them with one query (sharding).

For one index, SolR can handle several types of data :

  • Structured data : identified fields that are required for each piece of data you want to index. For example, if you want to index homogeneous documents, as in eZPublish, you need to provide data like the node_id, the section and so on.
  • Unstructured data : fields that are not identified in advance, are not required, and are still indexed. For example, if you want to index heterogeneous documents, you can index data about a video and data about a picture even if pictures and videos do not share any fields. Those fields are named dynamic fields.
  • Mixed data : you can have both identified, required fields and unidentified, optional fields.
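The Mixed data case can be pictured as a simple check : a document must carry every required field, and any other field must match one of the dynamic field patterns. The Python sketch below illustrates that rule only; the field names and the suffix patterns (*_s, *_i, *_t) are hypothetical and are not taken from the eZFind schema.

```python
from fnmatch import fnmatchcase

REQUIRED_FIELDS = {"id", "node_id"}       # hypothetical static, required fields
DYNAMIC_PATTERNS = ["*_s", "*_i", "*_t"]  # hypothetical dynamic field patterns


def check_document(doc: dict) -> list:
    """Return a list of problems; an empty list means the document fits the schema."""
    problems = [
        f"missing required field: {name}"
        for name in sorted(REQUIRED_FIELDS - set(doc))
    ]
    for name in doc:
        if name in REQUIRED_FIELDS:
            continue
        # Any other field is accepted only if a dynamic pattern matches its name
        if not any(fnmatchcase(name, pattern) for pattern in DYNAMIC_PATTERNS):
            problems.append(f"unknown field: {name}")
    return problems
```

A video and a picture can then share id and node_id while carrying different dynamic fields, such as duration_i for one and caption_t for the other.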

In eZFind, the configuration is set to Mixed data and all content fields are required. This is why the eZFind implementation is not so extendable. So I used another client, as suggested by Paul Borgermans.

ezcSearch

In ezcSearch, the eZ Components / Apache Zeta Components search component, the provided schema.xml is a base for what you want to do and is also Mixed data. ezcSearch also requires the classes you index to implement some interfaces that are compliant with the Persistent Object definition.

I did some testing with this client and found the following a bit strange (or I did not properly understand it) :

  • There's a hard-coded field called ezcsearch_type. If you know more about this one, please share on the forum.
  • The unique id field in the schema must be id.

More info about this in the Fisheye repository for the Apache Zeta Components Search component.

I finally subclassed the SolR manager class to change the index function so that requests work with my fields. I also removed all the static fields I didn't want from schema.xml and set my own fields. It's much better and you're free to do what you want.
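Whatever client you use, what reaches SolR in the end is an update message listing the fields of each document. The Python sketch below builds such a message for one row of the example table, to show its shape. It is not the ezcSearch code; the function name build_add_message and the field names field_one and field_two are hypothetical and would have to match what you declared in schema.xml.

```python
import xml.etree.ElementTree as ET


def build_add_message(rows: list) -> str:
    """Serialize table rows as a SolR XML <add> message, one <doc> per row."""
    add = ET.Element("add")
    for row in rows:
        doc = ET.SubElement(add, "doc")
        for name, value in row.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


message = build_add_message([{"id": 1, "field_one": "hello", "field_two": 42}])
print(message)
```

The resulting string is what you would POST to the update handler of the example core, followed by a commit.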

Conclusion

SolR is very powerful but not so accessible, because much of its documentation lives in the code. Maybe the best move would have been to buy a book on it before starting.

eZFind is an out-of-the-box solution that works only for eZPublish content, which is a bit restrictive in our case. According to Paul Borgermans, the next version of eZFind will be able to take care of extra, unrelated content fields !

ezcSearch is useful to generate all the queries sent to SolR, but it's too restrictive for out-of-the-box use. However, it exists, so thank you guys for having already done the job !

News #2011.1

Tags : wascou
Maxime THOMAS on 10 of Jan 2011 - 06pm

Happy New Year !

WSE Status is almost finished and we are completing some details about eZFind.

The documentation is written and we will publish the Documentation Page in the software section very soon.

The movie and screenshots have been planned and will be out at the end of the month.